Re: [RFC] cfg80211 and nl80211

2006-10-05 Thread Johannes Berg
Umm, looks like I skipped this paragraph in my earlier reply to you. Sorry
about that.

> I'd also argue that one specific BSSID is part of an initial
> configuration.  We should support that in config command.  It's an
> implicit SET_FIXED_BSSID, yes.  But one of the major points of
> nl80211/cfg80211 was that you could bundle up a set of configuration
> settings into a single atomic "packet", which you couldn't do with WE.
> 
> So if a specific BSSID isn't sent in the initial config command, when do
> you set a specific BSSID?  Before?  After?  The behavior starts getting
> complicated, and we're back to a situation where every driver implements
> the semantics in a slightly different manner.

Ah, good point. But then, why would you want to set a specific (initial)
BSSID at all? Either you enable userspace roaming (which you'd do before
setting the SSID), in which case the kernel can't do anything until you set
a BSSID, or you don't enable userspace roaming, in which case all the kernel
needs is the SSID.

I'm thinking you probably want something like 'list of BSSIDs to use for
userspace roaming' and possibly a blacklist too, although I'm inclined to
let userspace manage the blacklist by way of having a whitelist *only* and
having userspace simply add everything to the whitelist that it discovers
through scanning and isn't on the blacklist...

Hence, would you be satisfied with a BSSID-whitelist for kernel-controlled
roaming (userspace roaming doesn't need the kernel to know about the
whitelist)? Heck, you could even use a single-element whitelist for when you
want to force the kernel to associate to that AP... Maybe we should thus
drop the userspace roaming support? I think it's a simpler API though...
Then again, why do we need a BSSID-whitelist? Just have userspace control
roaming then...

Also, the use case you want could probably be achieved by turning on
userspace roaming, setting the BSSID for it, configuring the SSID and then
turning off userspace roaming again.

Or let me put it another way: I'm not sure what the use case actually is :)

johannes
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] cfg80211 and nl80211

2006-10-05 Thread Johannes Berg
On Wed, 2006-10-04 at 13:57 -0400, Dan Williams wrote:

> Are we talking about config changes when some other process pushes a new
> config to the card, or when something happens over the air, like new
> association or deauth?

Well, both actually :) Yeah, we should have different groups for that.

> Is it a problem to actually push the _entire_ scan list out to clients
> over netlink?  The scan list could be quite large, maybe even a few
> kilobytes when stuff like Information Elements, ratesets, etc is
> available.  I've seen 35-item scan lists that are already around 1.5K.

35-item list at 1.5K, heh, the allocated skbs are always at least 4k so
we can just push that out in full I guess. Though scan lists with
nl80211 will be slightly larger (some genetlink overhead) than the
bit-packed WE stuff.

> Ideally, we could push the whole scan list to clients, and then we avoid
> the race between getting the scan result notification and hitting the
> card.  That said, as long as the driver does proper locking, the race
> condition shouldn't matter at all.

Heh yeah.

> I'd vote for pushing results along with the notification, because one of
> the most annoying things in the past was the inconsistency between how
> drivers reported results and what BSSID attributes they sent.  If we can
> _standardize_ the result list and its construction inside
> cfg80211/nl80211 that would be a great benefit.

Well, I'm thinking that drivers provide an iterator that provides a scan
result structure for each result and nl80211 iterates over that building
up the netlink message. That way, they just have to fill in such a
structure for each iteration, which should ease things.

> If we can't push results with the notification, at least provide some
> functions to build up the GET_SCAN reply message, which you'll likely
> have to do anyway once you implement GET_SCAN.  We _really_ need drivers
> to be consistent here.

Same thing, get_scan() gives an iterator function that the driver calls
for each result it has with the same scan result struct.
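
A minimal sketch of that callback shape, assuming a hypothetical
struct scan_result and hypothetical function names (nothing here is an
existing cfg80211 interface): the driver walks its own BSS table and hands
each entry to an iterator supplied by nl80211, which would append it to the
netlink message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified scan result; a real one would also carry
 * information elements, rate sets and so on. */
struct scan_result {
    unsigned char bssid[6];
    char ssid[33];              /* up to 32 bytes of SSID plus NUL */
    int signal;
};

/* Iterator supplied by the (hypothetical) nl80211 side.
 * Returns non-zero to stop the walk early. */
typedef int (*scan_iter_fn)(void *ctx, const struct scan_result *res);

/* Hypothetical driver-side get_scan(): calls iter once per result.
 * Returns the number of results the iterator accepted. */
static int drv_get_scan(const struct scan_result *table, size_t n,
                        scan_iter_fn iter, void *ctx)
{
    size_t i;

    assert(iter != NULL);
    for (i = 0; i < n; i++) {
        if (iter(ctx, &table[i]))
            break;
    }
    return (int)i;
}

/* Stand-in for the netlink message builder: it only counts results and
 * remembers the last SSID it was given. */
struct msg_builder {
    int count;
    char last_ssid[33];
};

static int append_result(void *ctx, const struct scan_result *res)
{
    struct msg_builder *mb = ctx;

    mb->count++;
    strncpy(mb->last_ssid, res->ssid, sizeof(mb->last_ssid) - 1);
    mb->last_ssid[sizeof(mb->last_ssid) - 1] = '\0';
    return 0;
}
```

With this shape the driver only fills in one structure per result, and the
layout of the outgoing message stays in one place.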

> > >  * crypto and auth support
> 
> I've done a lot of thinking about crypto/auth this morning while beating
> the hell out of the libertas 8388 driver to clean up the ENCODE support.
> 
> There are several issues here.  They can be roughly split by encryption
> algorithm.  But the big question:
> 
> Is there a case for _multiple_ encryption algorithms enabled
> on a single "virtual" interface at one time?
> 
> I don't think there is, and I think that just complicates things in
> d80211 anyway.  If we agree that you can only set one of [none, WEP,
> WPA] on a virtual interface at any given time, it makes the crypto
> interface for nl80211 a lot easier.

I can't see a use case for that. Although maybe for AP interfaces? But
does it make sense to have some stations with, say, TKIP and others with
AES, and is that even negotiable? In any case, I think a STA inside an
AP should be treated mostly like a single "STA virtual interface", which
surely doesn't need multiple algorithms.

> Part of the problem of WE right now is that there's no clear API
> separation between the different options.  You can pass some WEP options
> through when you really want to do WPA (like key indexes).  That makes
> the driver handling code for ENCODE and ENCODEEXT too complex.
> 
> Taking one-at-a-time as a given, and the pseudo-structure
> 
> struct cmd_crypto {
>   enum crypto_alg alg;
>   union data {
>     none_data;
>     wep_data;
>     wpa_data;
>     ...
>   };
> };
> 
> Set alg == , set the options, and the driver will _enable_
> that crypto mode with the given options.  It makes no sense at all to,
> say, set the WEP transmit key index or WEP key when the card is in WPA
> mode or no-crypto mode.

That makes sense; in netlink it'd be represented by a message containing
an algorithm attribute followed by the attributes for that algorithm only,
so no WPA attributes if you use WEP.
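
To illustrate the one-algorithm-at-a-time rule, here is a rough sketch of a
check that rejects attributes belonging to a different algorithm. The
attribute names and the bitmask layout are assumptions made for the example,
not nl80211 definitions.

```c
#include <stdint.h>

enum crypto_alg { ALG_NONE, ALG_WEP, ALG_WPA };

/* Hypothetical flags describing which attributes a message carries. */
#define ATTR_WEP_TX_KEY_IDX (1u << 0)
#define ATTR_WEP_KEY        (1u << 1)
#define ATTR_WPA_IE         (1u << 2)
#define ATTR_WPA_PMK        (1u << 3)

/* Attributes that are legal for each algorithm. */
static uint32_t allowed_attrs(enum crypto_alg alg)
{
    switch (alg) {
    case ALG_WEP:
        return ATTR_WEP_TX_KEY_IDX | ATTR_WEP_KEY;
    case ALG_WPA:
        return ATTR_WPA_IE | ATTR_WPA_PMK;
    default:
        return 0;   /* ALG_NONE carries no crypto attributes */
    }
}

/* Returns 1 if the message only carries attributes valid for alg, else 0. */
static int crypto_msg_valid(enum crypto_alg alg, uint32_t present)
{
    return (present & ~allowed_attrs(alg)) == 0;
}
```

A message with ALG_WEP plus ATTR_WPA_IE would be rejected up front, so a
driver never has to decide what a WPA option means in WEP mode.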

> It's important to note that some options are independent of the initial
> operation that enabled the crypto, and need to be set later without
> triggering deauth and such.  Setting non-TX-index WEP key is one such
> operation.  I should be able to set WEP keys at indexes other than the
> transmit key index without affecting operation of the card (unless some
> hardware/firmware issue prevents this).
> 
> - No crypto
> - WEP encryption (following ops are independent of each other):
> - Set TX key index
> - Set privacy invoked

what is that?

> - Set exclude unencrypted packets
> - Set authentication mode (open, shared-key, or both)
> - Set (or clear) WEP key 1, 2, 3, or 4
> - WPA/WPA2/IEEE8021X
> - Jouni/others would know better and my brain is fried right now
> 
> All the WEP options should be independent attributes in nl80211.  You
> could even have a generic WEPKey attribute that is defined like so:
> 
> ATTR_WEP_KEY {
>   enum 

Re: [PATCH] wext

2006-10-05 Thread Johannes Berg
On Wed, 2006-10-04 at 14:45 -0700, Jouni Malinen wrote:

> SIOCSIWMLME was designed to allow additional MLME commands to be added.
> IMHO, a potential replacement in the future should not prevent us from
> extending WEXT at this point and stop all changes in something that is
> currently available.

Fine with me, it's really just a matter of switch()ing on the
sub-command in the cfg80211-we compat code. My answer was more political
in nature I guess -- even I need sleep, you know. :P

johannes


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Evgeniy Polyakov
On Wed, Oct 04, 2006 at 10:57:32AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) 
wrote:
> On 10/3/06, Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:
> >http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> >http://tservice.net.ru/~s0mbre/archive/kevent/evtest.c
> 
> These are simple programs which by themselves have problems.  For
> instance, I consider it a very bad idea to hardcode the size of the ring
> buffer.  Specifying macros in the header file counts as hardcoding.
> Systems grow over time and so will the demand of connections.  I have
> no problem with the kernel hardcoding the value internally (or having
> a /proc entry to select it) but programs should be able to dynamically
> learn about the value so they don't have to be recompiled.

Well, it is possible to create a /sys/proc entry for that, and even now
userspace can grow the mapped ring until the kernel forbids it, which
means the limit is reached.

Actually, the whole idea of a global limit on kevents does not sound very
good to me, but it is required to prevent overflow of the mapped buffer.

> But more problematic is that I don't see how the interfaces can be
> efficiently used in multi-threaded (or multi-process) programs.  How
> would multiple threads using the same kevent queue and running in the
> same kevent_get_events() loop work out?  How do they guarantee that
> each request is only handled once?

kqueue_dequeue_ready() is atomic: it removes the kevent from the
ready queue, so no other thread can get it.
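
The guarantee can be pictured with a small userspace model. This is only a
sketch under assumptions, not the kevent code: struct ready_queue,
enqueue_ready(), dequeue_ready() and the spinlock are invented for the
example. Because removal from the ready list happens under the lock, a
second caller finds the list already advanced and cannot see the same event.

```c
#include <stdatomic.h>
#include <stddef.h>

struct event {
    int id;
    struct event *next;
};

/* Hypothetical ready queue protected by a simple spinlock.
 * Initialise with { ATOMIC_FLAG_INIT, NULL }. */
struct ready_queue {
    atomic_flag lock;
    struct event *head;
};

static void rq_lock(struct ready_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void rq_unlock(struct ready_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Push an event onto the ready list (LIFO order, for brevity). */
static void enqueue_ready(struct ready_queue *q, struct event *ev)
{
    rq_lock(q);
    ev->next = q->head;
    q->head = ev;
    rq_unlock(q);
}

/* Unlink and return one ready event, or NULL if none is ready.
 * The unlink happens under the lock, so each event is returned once. */
static struct event *dequeue_ready(struct ready_queue *q)
{
    struct event *ev;

    rq_lock(q);
    ev = q->head;
    if (ev) {
        q->head = ev->next;
        ev->next = NULL;
    }
    rq_unlock(q);
    return ev;
}
```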

> From what I see now this means a second data structure is needed to
> keep track of the state of each entry.  But even then, how do we even
> recognize used ring buffer entries?
> 
> For instance, assume two threads.  Both call get_events, one event is
> reported, both threads are woken up (which is another thing to
> consider, more later).  One thread uses ring buffer entry, the other
> goes back to sleep in get_events.  Now, how does the kernel know when
> the other thread is done working on the ring buffer entry?  There
> might be lots of entries coming in overflowing the entire buffer.
> Heck, you don't even need two threads for this scenario.

Are you talking about the mapped buffer or the syscall interface?
The former has a special syscall, kevent_wait(), which reports the number
of 'processed' events and the first processed number, so the kernel can
remove all the appropriate events. The latter is described above:
kqueue_dequeue_ready() is atomic, so that event will be removed from the
ready queue and optionally from the whole kevent tree.

It is possible to work with both interfaces at the same time, since the
mapped buffer contains a copy of the event, which may already have been
processed and freed by another thread.

Actually, I do not like the idea of a mapped ring anyway: if an application
uses a lot of events, it will batch them into big chunks, so syscall
overhead is negligible; if an application uses a small number of events,
syscalls will be rare and will not hurt performance.

> When I was thinking about this (and discussing it in Ottawa) I was
> always assuming that we have a status field in the ring buffer entry
> which lets the userlevel code indicate whether the entry is free again
> or not.  This requires a writable mapping, yes, and potentially causes
> cache line ping-pong.  I think Zach mentioned he has some ideas about
> this.

As far as I can see, there are no other ideas on how to implement a ring
buffer, so I did it the way I wanted. It has some limitations indeed, but
since I do not see any other code, how can I say what is better or
worse?
 
> As for the multiple thread wakeup, I mentioned this before.  We have
> to avoid the trampling herd problem.  We cannot wakeup all waiters.
> But we also cannot assume that, without protocols, waking up just one
> for each available entry is sufficient.  So the first question is:
> what is the current policy?

It is a good practice to _not_ share the same queue between a lot of
threads. Currently all waiters are awakened.

> >AIO was removed from patchset by request of Christoph.
> >Timers, network AIO, fs AIO, socket notifications and poll/select
> >events work well with existing structures.
> 
> Well, excuse me if I don't take your word for it.  I agree, the AIO
> code should not be submitted along with this.  The same for any other
> code using the event handling.  But we need to check whether the
> interface is generic enough to accommodate them in a way which actually
> makes sense.  Again, think highly threaded processes or multiple
> processes sharing the same event queue.

You missed the point.
I implemented _all_ of the above and it does work,
although it was removed from the submitted patchset.
You can find all the patches on the kevent homepage; they were posted to
lkml@ and netdev@ too many times to miss them.
 
> >It is even possible to create variable sized kevents - each kevent
> >contain pointer to user's data, which can be considered as pointer to
> >additional area (it's size kernel implementation for given k

Re: [take19 0/4] kevent: Generic event handling mechanism.

2006-10-05 Thread Evgeniy Polyakov
On Wed, Oct 04, 2006 at 10:20:44AM -0700, Ulrich Drepper ([EMAIL PROTECTED]) 
wrote:
> Evgeniy Polyakov wrote:
> > It is completely possible to do what you describe without special
> > syscall parameters.
> 
> First of all, I don't see how this is efficiently possible.  The mask
> might change from call to call.

And you can add/remove signal events using existing kevent api between
calls.

> Second, hasn't it sunk in that inventing new ways to pass parameters is
> bad?  Programmers don't want to learn new ways for every new interface.
>  Reuse is good!

And creating special cases for usual events is bad.
There is a unified way to deal with events in kevent:
add/remove/modify/wait on them; signals are just usual events.

> This applies to the signal mask here.
> 
> But there is another parameter falling into that category and I meant to
> mention it before: the timeout value.  All other calls except poll and
> especially all modern interfaces use a timespec pointer.  This is the
> way times are kept in userland code.  Don't try to force people to do
> something else.
> 
> Using a timespec also has the advantage that we can add an absolute
> timeout value mode (optional) instead of the relative timeout value.
> 
> In this context, we should/must be able to specify which clock the
> timeout is for (not as part of the wait call, but another control
> operation perhaps).  It's important to distinguish between
> CLOCK_REALTIME and CLOCK_MONOTONIC.  Both have their use.

I think you wanted to say that 'all event mechanisms except the most
commonly used poll/select/epoll use timespec'.
I designed it to be similar to poll(), which is a really good interface.
The nature of waiting is to wait for some time, so I put that
'some time' there.
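
For relative timeouts the two styles carry the same information. A small
sketch (the helper name is made up for illustration) converting a
poll()-style millisecond timeout into the struct timespec form requested
above:

```c
#include <time.h>

/* Convert a non-negative poll()-style timeout in milliseconds into a
 * relative struct timespec. */
static struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;                  /* whole seconds */
    ts.tv_nsec = (ms % 1000) * 1000000L;    /* remainder in nanoseconds */
    return ts;
}
```

An absolute timeout would additionally have to say which clock it is
measured against, for example CLOCK_REALTIME or CLOCK_MONOTONIC, which a
plain millisecond count cannot express.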

> -- 
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
> 



-- 
Evgeniy Polyakov


Re: [RFC] Disable addrconf on ~multicast interfaces?

2006-10-05 Thread Jean-Philippe Andriot

This patch will break multicast forwarding using the pim6 daemon.
This daemon creates an interface called pim6reg which is not MULTICAST
enabled but needs to be configured by addrconf to get ff00::/8 and
fe80::/64 routes. This is required since the route lookup process has
been changed to strictly match input interfaces for link-local and
multicast packets.

Regards,
JP


On Thu, Oct 05, 2006 at 04:35:49PM +1000, Herbert Xu wrote:
> Pekka Savola <[EMAIL PROTECTED]> wrote:
> > On Thu, 5 Oct 2006, Herbert Xu wrote:
> >> Are there any non-multicast interfaces that require addrconf?
> >> In other words, what does the following patch break :)
> > 
> > Point-to-point (or NOARP) interfaces such as tunnels.   I'm not sure 
> > what are the right flags to check..
> 
> Tunnels shouldn't even get into that function so they aren't
> affected.  Are there any Ethernet-like interfaces which do not
> set IFF_MULTICAST yet still require addrconf?
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] Network Events Connector

2006-10-05 Thread Evgeniy Polyakov
On Thu, Oct 05, 2006 at 03:10:02AM +0200, Samir Bellabes ([EMAIL PROTECTED]) 
wrote:
> > You can also extend your module to be more generic and send all (or just
> > requested in config) state changes for all protocols (or those checked
> > in config).
> 
> Ok, so the next step now is to target all state changes for all
> protocols, *but* send only the states asked dynamically from the
> userspace, using the userspace-to-kernel's way of the netlink.
> What do you think about that ?

That sounds good, but as David mentioned, if there are other good
ways to do that, there is no need to invent a new one (although
sometimes it is much better to reinvent the wheel, if the existing one is
square).

> >> > Btw, you could also create netlink/connector based firewall rules
> >> > update, I think people with hundreds of rules in one table will bless
> >> > you after that.
> >> 
> >> This is the real goal, using ipset - http://ipset.netfilter.org/
> >> With this we can easily create a unique rule for iptables, and then
> >> add/remove ports from the 'set' involved.
> >
> > It is not the same as creating and updating existing rules.
> > I think the hipac project uses a fast rule update feature.
> > It is quite a major break for existing iptables, but it should be
> > eventually done...
> 
> Ok, now I understand your point clearly.
> But we are a bit far from the initial idea, even if it could be really
> good to do that. First, let's code the initial idea.

Agree.

-- 
Evgeniy Polyakov


Re: [PATCH] sky2: incorrect length on receive packets

2006-10-05 Thread Jeff Garzik

applied to #upstream-fixes



Re: [PATCH 1/5] ibmveth: Harden driver initilisation

2006-10-05 Thread Jeff Garzik

applied 1-5 to #upstream-fixes



Re: [PATCH] mv643xx_eth: Fix ethtool stats

2006-10-05 Thread Jeff Garzik

applied to #upstream-fixes



Re: [patch 1/6] 2.6.18: sb1250-mac: Broadcom PHY support

2006-10-05 Thread Jeff Garzik

Maciej W. Rozycki wrote:
> Hello,
>
> [...]
>
>  Please consider.
>
>   Maciej



Please don't include this in the patch description.  It must be 
hand-edited out, before applying with git-applymbox.  All comments 
should be placed AFTER the "---" separator, which terminates the patch 
description.


Applied patches 1-3, patch #4 failed due to drivers/net/Kconfig breakage



Re:

2006-10-05 Thread Jeff Garzik

Jay Vosburgh wrote:
> From: Karsten Keil <[EMAIL PROTECTED]>
>
> In bond_alb_monitor the bond->curr_slave_lock write lock is taken
> and then dev_set_promiscuity may be called, which can take some time
> depending on the network HW. If a network IRQ for this card comes in,
> the softirq handler may try to deliver more packets, which ends up in
> a request for the read lock of bond->curr_slave_lock -> deadlock.
> This issue was found by a test lab during network stress tests; this patch
> disables the softirq handler for this case, which solved the issue.
>
> Signed-off-by: Karsten Keil <[EMAIL PROTECTED]>
> Acked-by: Jay Vosburgh <[EMAIL PROTECTED]>


applied, though note that your email was slightly corrupted.  It 
included _two_ Subject headers, making the email non-compliant with RFC822.


Jeff





Re: [patch 1/6] 2.6.18: sb1250-mac: Broadcom PHY support

2006-10-05 Thread Jeff Garzik

Jeff Garzik wrote:
> Maciej W. Rozycki wrote:
> > Hello,
> >
> > [...]
> >
> >  Please consider.
> >
> >   Maciej
>
> Please don't include this in the patch description.  It must be
> hand-edited out, before applying with git-applymbox.  All comments
> should be placed AFTER the "---" separator, which terminates the patch
> description.
>
> Applied patches 1-3, patch #4 failed due to drivers/net/Kconfig breakage


Also, in your email subject line, the kernel version should be included 
in the [PATCH...] brackets.  Please see 
http://linux.yyz.us/patch-format.html and 
Documentation/SubmittingPatches for more info.


Jeff




Re: [PATCH] b44: fix multicast with >32 groups

2006-10-05 Thread Jeff Garzik

Bill Helfinstine wrote:
> The b44 driver has a bug where if there are more than
> B44_MCAST_TABLE_SIZE groups in the dev->mc_list, it will only listen to
> the first B44_MCAST_TABLE_SIZE that it sees.
>
> This patch makes the driver go into RXCONFIG_ALLMULTI mode if there are
> more than B44_MCAST_TABLE_SIZE groups being subscribed to.
>
> This patch is against 2.6.18, b44.c version 1.01.
>
> Signed-off-by: Bill Helfinstine <[EMAIL PROTECTED]>


applied manually, due to whitespace damage
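
The fallback described in the patch description can be sketched as follows.
This is an illustrative model, not the b44.c change itself: the structure,
the function name and the table size of 32 (taken from the subject line)
are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MCAST_TABLE_SIZE 32     /* assumed from the ">32 groups" subject */

/* Hypothetical description of how the receive filter gets programmed. */
struct rx_filter {
    bool allmulti;          /* accept every multicast frame */
    size_t programmed;      /* number of hardware table slots in use */
};

/* Decide how to program the filter for n_groups subscribed groups. */
static struct rx_filter choose_rx_filter(size_t n_groups, bool want_allmulti)
{
    struct rx_filter f = { false, 0 };

    if (want_allmulti || n_groups > MCAST_TABLE_SIZE) {
        /* Too many groups for the hardware table: accept all multicast
         * instead of silently ignoring the groups that do not fit. */
        f.allmulti = true;
    } else {
        f.programmed = n_groups;
    }
    return f;
}
```

Accepting all multicast costs some extra filtering in software, but no
subscribed group is lost.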



Re: [PATCH 2.6.18] AT91RM9200 Ethernet update

2006-10-05 Thread Jeff Garzik

ACK, but patch doesn't apply to 2.6.18



Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Jiri Benc
On Wed, 04 Oct 2006 18:51:40 +0200, Jan Kiszka wrote:
> Ok, I'm not promising success and I'm going to duck immediately if
> someone else feels like working on it, but I could try to patch in this
> direction.

Your patches are welcome!

> Now there just remains my precautionary question whether there are other
> services in the ieee80211_hw interface that may conflict with sleeping
> USB drivers. What about specifying the possible contexts in
> include/net/d80211.h?

Yes, that makes sense. Feel free to send a patch :-)

Thanks,

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Jiri Benc
On Wed, 4 Oct 2006 19:22:38 +0200, Ivo van Doorn wrote:
> Well another point of concern for me is the TSF handling, those handlers are
> called from interrupt context as well, and also cause problems for the USB
> drivers in case of adhoc mode.

Where is the problem with the TSF handlers? get_tsf is not called at all
(unless CONFIG_D80211_IBSS_DEBUG is set; well, that raises the question
why the function exists in the first place), reset_tsf returns void.

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Eric Dumazet
On Thursday 05 October 2006 10:57, Evgeniy Polyakov wrote:

> Well, it is possible to create a /sys/proc entry for that, and even now
> userspace can grow the mapped ring until the kernel forbids it, which
> means the limit is reached.

No need for yet another /sys/proc entry.

Right now, I (for example) may have a use for Generic event handling, but for 
a program that needs XXX.XXX handles, and about XX.XXX events per second.

Right now, this program uses epoll, and reaches no limit at all, once you pass 
the "ulimit -n", and other kernel wide tunes of course, not related to epoll.

With your current kevent, I cannot switch to it, because of hardcoded limits.

I may be wrong, but what is currently missing for me is :

- No hardcoded limit on the max number of events. (A process that can open 
XXX.XXX files should be allowed to open a kevent queue with at least XXX.XXX 
events). Right now it's not clear what happens IF the current limit is 
reached.

- In order to avoid touching the whole ring buffer, it might be good to be 
able to reset the indexes to the beginning when ring buffer is empty. (So if 
the user land is responsive enough to consume events, only first pages of the 
mapping would be used : that saves L1/L2 cpu caches)

A plus would be

- A working/usable mmap ring buffer implementation, but I think it's not 
mandatory. System calls are not that expensive, especially if you can batch 
XX events per syscall (like epoll). The nice thing with a ring buffer is that 
we touch fewer cache lines than, say, epoll, which has a lot of linked 
structures.

About mmap, I think you might want a hybrid thing :

One writable page where userland can write its index, (and hold one or more 
futex shared by kernel) (with appropriate thread locking in case multiple 
threads want to dequeue events). In fast path, no syscalls are needed to 
maintain this user index.

XXX readonly pages (for user, but r/w for kernel), where the kernel writes its 
own index, and events of course.

Using separate cache lines avoids false sharing : the kernel can update its own 
index and events without having to pay the price of cache line ping pongs.
It could use the futex infrastructure to wake up 'only' one thread instead of 
all threads waiting for an event.
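
A rough sketch of the layout described above, with invented names and an
assumed 64-byte cache line: the user index lives in its own writable area,
while the kernel index and the events live in an area that userland would
map read-only. Keeping the two indexes on separate cache lines is what
avoids the false sharing.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE   64     /* assumed cache line size */
#define RING_ENTRIES 256    /* arbitrary for the example */

/* Hypothetical event record; 16 bytes. */
struct ring_event {
    uint32_t type;
    uint32_t flags;
    uint64_t user_data;
};

/* Written by userland only (the writable page). */
struct user_area {
    alignas(CACHE_LINE) uint32_t user_index;    /* next entry to consume */
};

/* Written by the kernel only (read-only for userland). */
struct kernel_area {
    alignas(CACHE_LINE) uint32_t kernel_index;  /* next entry to fill */
    alignas(CACHE_LINE) struct ring_event events[RING_ENTRIES];
};

/* Number of events waiting to be consumed. The indexes are free-running
 * counters in this sketch; unsigned subtraction handles wraparound. */
static uint32_t ring_pending(const struct user_area *u,
                             const struct kernel_area *k)
{
    return k->kernel_index - u->user_index;
}
```

In the fast path userland compares the two indexes and advances its own
without a syscall; sleeping and waking a single waiter (for instance through
a futex) is left out of the sketch.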


Eric


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Evgeniy Polyakov
On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 05 October 2006 10:57, Evgeniy Polyakov wrote:
> 
> > Well, it is possible to create a /sys/proc entry for that, and even now
> > userspace can grow the mapped ring until the kernel forbids it, which
> > means the limit is reached.
> 
> No need for yet another /sys/proc entry.
> 
> Right now, I (for example) may have a use for Generic event handling, but for 
> a program that needs XXX.XXX handles, and about XX.XXX events per second.
> 
> Right now, this program uses epoll, and reaches no limit at all, once you 
> pass 
> the "ulimit -n", and other kernel wide tunes of course, not related to epoll.
> 
> With your current kevent, I cannot switch to it, because of hardcoded limits.
> 
> I may be wrong, but what is currently missing for me is :
> 
> - No hardcoded limit on the max number of events. (A process that can open 
> XXX.XXX files should be allowed to open a kevent queue with at least XXX.XXX 
> events). Right now it's not clear what happens IF the current limit is 
> reached.

That leads to overflows in the fixed-size memory-mapped buffer.
If we remove the memory-mapped buffer, or allow overflows (and
thus skipped entries), kevent can easily scale to those limits (tested with
xx.xxx events, though).

> - In order to avoid touching the whole ring buffer, it might be good to be 
> able to reset the indexes to the beginning when ring buffer is empty. (So if 
> the user land is responsive enough to consume events, only first pages of the 
> mapping would be used : that saves L1/L2 cpu caches)

And what happens when there are 3 empty entries at the beginning and we
need to put 4 ready events there?

> A plus would be
> 
> - A working/usable mmap ring buffer implementation, but I think it's not 
> mandatory. System calls are not that expensive, especially if you can batch 
> XX events per syscall (like epoll). Nice thing with a ring buffer is that we 
> touch less cache lines than say epoll that have lot of linked structures.
> 
> About mmap, I think you might want a hybrid thing :
> 
> One writable page where userland can write its index, (and hold one or more 
> futex shared by kernel) (with appropriate thread locking in case multiple 
> threads want to dequeue events). In fast path, no syscalls are needed to 
> maintain this user index.
> 
> XXX readonly pages (for user, but r/w for kernel), where kernel write its own 
> index, and events of course.

The problem is with those xxx pages: how many can we eat per kevent
descriptor? It is pinned memory, and thus it is possible to have a DoS.
If the xxx above is not enough to store all events, we will have
yet another broken behaviour, like rt-signal queue overflow.

> Using separate cache lines avoid false sharing : kernel can update its own 
> index and events without having to pay the price of cache line ping pongs.
> It could use futex infrastructure to wakeup one thread 'only' instead of all 
> threads waiting an event.
>
> 
> Eric

-- 
Evgeniy Polyakov


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Eric Dumazet
On Thursday 05 October 2006 12:21, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
> > I may be wrong, but what is currently missing for me is :
> >
> > - No hardcoded limit on the max number of events. (A process that can
> > open XXX.XXX files should be allowed to open a kevent queue with at least
> > XXX.XXX events). Right now it's not clear what happens IF the current
> > limit is reached.
>
> That leads to overflows in the fixed-size memory-mapped buffer.
> If we remove the memory-mapped buffer, or allow overflows (and
> thus skipped entries), kevent can easily scale to those limits (tested with
> xx.xxx events, though).

What is missing or not obvious is : if events are skipped because of 
overflows, what happens ? Are connections stuck forever ? Do we hope that 
everything will restore itself ? Is the kernel able to SIGNAL this problem to 
user land ?


>
> > - In order to avoid touching the whole ring buffer, it might be good to
> > be able to reset the indexes to the beginning when ring buffer is empty.
> > (So if the user land is responsive enough to consume events, only first
> > pages of the mapping would be used : that saves L1/L2 cpu caches)
>
> And what happens when there are 3 empty entries at the beginning and we
> need to put 4 ready events there?

Re-read what I said : when the ring buffer is empty.

When the ring buffer is empty, the kernel can reset the index right before 
adding XX new events. You read it as '3 events consumed'; I said : when the 
whole ring buffer is empty, because all previous events were consumed by user 
land, then we can reset indexes to 0.

>
> > A plus would be
> >
> > - A working/usable mmap ring buffer implementation, but I think its not
> > mandatory. System calls are not that expensive, especially if you can
> > batch XX events per syscall (like epoll). Nice thing with a ring buffer
> > is that we touch less cache lines than say epoll that have lot of linked
> > structures.
> >
> > About mmap, I think you might want a hybrid thing :
> >
> > One writable page where userland can write its index, (and hold one or
> > more futex shared by kernel) (with appropriate thread locking in case
> > multiple threads want to dequeue events). In fast path, no syscalls are
> > needed to maintain this user index.
> >
> > XXX readonly pages (for user, but r/w for kernel), where kernel write its
> > own index, and events of course.
>
> The problem is in that xxx pages - how many can we eat per kevent
> descriptor? It is pinned memory and thus it is possible to have a DoS.
> If xxx above is not enough to store all events, we will have
> yet-another-broken behaviour like rt-signal queue overflow.
>

Re-read: I have a process that has the right to open XXX.XXX handles,
allocating XXX.XXX TCP sockets, dentries, file structures, inodes and epoll
events. It is obviously already a DoS risk, but one controlled by 'ulimit -n'.

Allocating XXX.XXX * (32 or 64) bytes is a win if I can zap the epoll
structures (currently more than 256 bytes per event).

epoll structures are pinned too... what's wrong with that ?

# egrep "filp|poll|TCP|dentries|sock_inode" /proc/slabinfo |cut -c1-50
tw_sock_TCP         1302   2200    192   20    1 :
request_sock_TCP    2046   4260    128   30    1 :
TCP               151509 196910   1472    5    2 :
eventpoll_pwq     146718 199439     72   53    1 :
eventpoll_epi     146718 199360    192   20    1 :
sock_inode_cache  149182 197940    640    6    1 :
filp              149537 202515    256   15    1 :

If you want to protect from DOS, just use ulimit -n 100
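
To put rough numbers on the comparison above, here is a back-of-the-envelope sketch. It assumes the eventpoll_pwq and eventpoll_epi object sizes in the slabinfo listing read as 72 and 192 bytes, and it picks 100,000 events purely for illustration (the mail only says XXX.XXX). The function names are made up for this example.

```python
# Per-event pinned memory: epoll slab objects vs. a flat ring-buffer entry.
# Object sizes are an assumed reading of the slabinfo listing (72 and 192
# bytes); the event count is an illustrative stand-in for "XXX.XXX".
EPOLL_PWQ_BYTES = 72
EPOLL_EPI_BYTES = 192
RING_ENTRY_BYTES = (32, 64)   # the two entry sizes mentioned in the mail


def epoll_bytes(n_events):
    """Pinned bytes for n_events under epoll (one pwq plus one epi each)."""
    return n_events * (EPOLL_PWQ_BYTES + EPOLL_EPI_BYTES)


def ring_bytes(n_events, entry_bytes):
    """Pinned bytes for a ring buffer with one fixed-size entry per event."""
    return n_events * entry_bytes


if __name__ == "__main__":
    n = 100_000
    print("epoll:", epoll_bytes(n), "bytes")
    for size in RING_ENTRY_BYTES:
        print(f"ring ({size} B/entry):", ring_bytes(n, size), "bytes")
```

Under these assumptions the two epoll objects add up to 264 bytes per event, which matches the "more than 256 bytes per event" figure, and a 64-byte ring entry pins roughly a quarter of that.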

Eric


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Evgeniy Polyakov
On Thu, Oct 05, 2006 at 12:45:03PM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 05 October 2006 12:21, Evgeniy Polyakov wrote:
> > On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
> > > I may be wrong, but what is currently missing for me is :
> > >
> > > - No hardcoded limit on the max number of events. (A process that can
> > > open XXX.XXX files should be allowed to open a kevent queue with at least
> > > XXX.XXX events). Right now thats not clear what happens IF the current
> > > limit is reached.
> >
> > This forces overflows in the fixed-size memory-mapped buffer.
> > If we remove the memory-mapped buffer, or allow overflows (and thus
> > skipped entries), kevent can easily scale to those limits (tested with
> > xx.xxx events, though).
> 
> What is missing, or not obvious, is: if events are skipped because of
> overflows, what happens? Are connections stuck forever? Do we hope that
> everything will restore itself? Is the kernel able to SIGNAL this problem
> to user land?
 
The existing code does not overflow, by design, but it can consume a lot of
memory. I was talking about the case where there is some limit on the
number of entries put into the mapped buffer.

> > > - In order to avoid touching the whole ring buffer, it might be good to
> > > be able to reset the indexes to the beginning when ring buffer is empty.
> > > (So if the user land is responsive enough to consume events, only first
> > > pages of the mapping would be used : that saves L1/L2 cpu caches)
> >
> > And what happens when there are 3 empty slots at the beginning and we
> > need to put 4 ready events there?
> 
> Re-read what I said: when the ring buffer is empty.
> 
> When the ring buffer is empty, the kernel can reset the index right before
> adding XX new events. You read it as "3 events consumed"; I said: when the
> whole ring buffer is empty, because all previous events were consumed by
> user land, then we can reset the indexes to 0.

It is the same.
What if the ring buffer had grown up to 3 entries, is now empty, and we
need to put 4 entries there? Grow it again?
It can be done easily, but it looks like a workaround, not a solution.
And it is highly unlikely that the ring will be empty in a situation where
there are a lot of events.

> >
> > > A plus would be
> > >
> > > - A working/usable mmap ring buffer implementation, but I think its not
> > > mandatory. System calls are not that expensive, especially if you can
> > > batch XX events per syscall (like epoll). Nice thing with a ring buffer
> > > is that we touch less cache lines than say epoll that have lot of linked
> > > structures.
> > >
> > > About mmap, I think you might want a hybrid thing :
> > >
> > > One writable page where userland can write its index, (and hold one or
> > > more futex shared by kernel) (with appropriate thread locking in case
> > > multiple threads want to dequeue events). In fast path, no syscalls are
> > > needed to maintain this user index.
> > >
> > > XXX readonly pages (for user, but r/w for kernel), where kernel write its
> > > own index, and events of course.
> >
> > The problem is in that xxx pages - how many can we eat per kevent
> > descriptor? It is pinned memory and thus it is possible to have a DoS.
> > If xxx above is not enough to store all events, we will have
> > yet-another-broken behaviour like rt-signal queue overflow.
> >
> 
> Re-read: I have a process that has the right to open XXX.XXX handles,
> allocating XXX.XXX TCP sockets, dentries, file structures, inodes and epoll
> events. It is obviously already a DoS risk, but one controlled by
> 'ulimit -n'.
> 
> Allocating XXX.XXX * (32 or 64) bytes is a win if I can zap the epoll
> structures (currently more than 256 bytes per event).
> 
> epoll structures are pinned too... what's wrong with that ?
> 
> # egrep "filp|poll|TCP|dentries|sock_inode" /proc/slabinfo |cut -c1-50
> tw_sock_TCP         1302   2200    192   20    1 :
> request_sock_TCP    2046   4260    128   30    1 :
> TCP               151509 196910   1472    5    2 :
> eventpoll_pwq     146718 199439     72   53    1 :
> eventpoll_epi     146718 199360    192   20    1 :
> sock_inode_cache  149182 197940    640    6    1 :
> filp              149537 202515    256   15    1 :
> 
> If you want to protect from DOS, just use ulimit -n 100

epoll() does not have mmap.
Problem is not about how many events can be put into the kernel, but how
many of them can be put into mapped buffer.
There is no problem if mmap is turned off.

> Eric

-- 
Evgeniy Polyakov


Re: [PATCH 2.6.19-rc1] ehea bug fix (port state notification, default queue sizes)

2006-10-05 Thread Jeff Garzik

Which patch is to be applied first?

You failed to include an order, as described by 
http://linux.yyz.us/patch-format.html and Documentation/SubmittingPatches.


Also, stuff like "hi Jeff" and "Thanks, Jan-Bernd" must be hand-edited 
out before patch application.  All comments not intended to be DIRECTLY 
copied into the kernel changeset description should follow the "---" 
separator.


Jeff





Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Eric Dumazet
On Thursday 05 October 2006 12:55, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 12:45:03PM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
> >
> > What is missing, or not obvious, is: if events are skipped because of
> > overflows, what happens? Are connections stuck forever? Do we hope that
> > everything will restore itself? Is the kernel able to SIGNAL this problem
> > to user land?
>
> The existing code does not overflow, by design, but it can consume a lot of
> memory. I was talking about the case where there is some limit on the
> number of entries put into the mapped buffer.

You still haven't answered my question. Please answer the question.
Recap: you have the maximum number of events queued. A network message comes
in and the kernel wants to add another event. It cannot, because the limit is
reached. How does the user program know that this problem was hit?


> It is the same.
> What if the ring buffer had grown up to 3 entries, is now empty, and we
> need to put 4 entries there? Grow it again?
> It can be done easily, but it looks like a workaround, not a solution.
> And it is highly unlikely that the ring will be empty in a situation where
> there are a lot of events.

I am not talking about re-allocating the ring buffer. I don't mind allocating
a big enough buffer at startup.

Say you have allocated a ring buffer of 1024*1024 entries.
Then you queue 100 events per second, and dequeue them immediately.
There is no need to blindly use all 1024*1024 slots in the ring buffer by doing
index = (index+1)%(1024*1024)
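
The difference between the two indexing schemes can be shown with a toy user-space model. This is only a sketch: the `Ring` class and `slots_touched` helper are invented for illustration, and a real shared ring would need the kernel and user space to agree on when a reset is safe, which is not modelled here.

```python
class Ring:
    """Toy event ring that records which slots are ever written."""

    def __init__(self, size, reset_when_empty):
        self.size = size
        self.reset_when_empty = reset_when_empty
        self.head = 0          # next slot the producer writes
        self.tail = 0          # next slot the consumer reads
        self.count = 0         # events currently queued
        self.touched = set()   # slots written at least once

    def enqueue(self):
        if self.count == self.size:
            raise OverflowError("ring full")
        if self.reset_when_empty and self.count == 0:
            # Everything was consumed, so restart at slot 0.
            self.head = self.tail = 0
        self.touched.add(self.head)
        self.head = (self.head + 1) % self.size
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("ring empty")
        self.tail = (self.tail + 1) % self.size
        self.count -= 1


def slots_touched(size, n_events, reset_when_empty):
    """Queue and immediately dequeue n_events, one at a time."""
    ring = Ring(size, reset_when_empty)
    for _ in range(n_events):
        ring.enqueue()
        ring.dequeue()
    return len(ring.touched)
```

With plain modulo advance, `slots_touched(1024, 500, False)` walks through 500 distinct slots; with the reset, `slots_touched(1024, 500, True)` keeps reusing slot 0. A burst is still handled: after three events have been queued and consumed, four new ones simply occupy slots 0 to 3.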



> epoll() does not have mmap.
> Problem is not about how many events can be put into the kernel, but how
> many of them can be put into mapped buffer.
> There is no problem if mmap is turned off.

So zap mmap() support completely, since it is not usable at all. We won't
discuss it.


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Evgeniy Polyakov
On Thu, Oct 05, 2006 at 02:09:31PM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> On Thursday 05 October 2006 12:55, Evgeniy Polyakov wrote:
> > On Thu, Oct 05, 2006 at 12:45:03PM +0200, Eric Dumazet ([EMAIL PROTECTED]) 
> > >
> > > What is missing, or not obvious, is: if events are skipped because of
> > > overflows, what happens? Are connections stuck forever? Do we hope that
> > > everything will restore itself? Is the kernel able to SIGNAL this
> > > problem to user land?
> >
> > The existing code does not overflow, by design, but it can consume a lot
> > of memory. I was talking about the case where there is some limit on the
> > number of entries put into the mapped buffer.
> 
> You still haven't answered my question. Please answer the question.
> Recap: you have the maximum number of events queued. A network message comes
> in and the kernel wants to add another event. It cannot, because the limit
> is reached. How does the user program know that this problem was hit?

The existing design does not allow overflow.
If an event was added to the queue (for example, the user requested
notification when new data arrives), it is guaranteed that there will be room
to put that event into the mapped buffer when it becomes ready.

If the user wants to add another event (for example, after accept() the user
wants to add another socket with a request for notification about data
arriving on that socket), that can fail, though. This limit is introduced
only because of the mmap buffer.
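
One way to read the guarantee described above is "one slot in the mapped buffer is reserved per registered event". The following toy model illustrates that reading only; it is not the kevent code, and `KeventQueue` with its method names is hypothetical.

```python
class KeventQueue:
    """Toy model: every registered event reserves one mapped-buffer slot."""

    def __init__(self, mapped_slots):
        self.mapped_slots = mapped_slots
        self.registered = set()   # events user space asked to be told about
        self.ready = []           # events currently in the mapped buffer

    def add(self, event_id):
        """Register interest; fails once every slot is already reserved."""
        if event_id in self.registered:
            return True
        if len(self.registered) >= self.mapped_slots:
            return False          # the user sees the failure at add time
        self.registered.add(event_id)
        return True

    def mark_ready(self, event_id):
        """A registered event fired; its reserved slot guarantees room."""
        if event_id not in self.registered:
            raise KeyError(event_id)
        if event_id not in self.ready:
            self.ready.append(event_id)

    def remove(self, event_id):
        """Drop the registration and free its slot."""
        self.registered.discard(event_id)
        if event_id in self.ready:
            self.ready.remove(event_id)
```

In this model the ready list can never outgrow the mapped buffer, because it only ever holds registered events. The cost is that `add()` is where the limit shows up, for example when registering a freshly accepted socket.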
 
> > It is the same.
> > What if the ring buffer had grown up to 3 entries, is now empty, and we
> > need to put 4 entries there? Grow it again?
> > It can be done easily, but it looks like a workaround, not a solution.
> > And it is highly unlikely that the ring will be empty in a situation
> > where there are a lot of events.
> 
> I am not talking about re-allocating the ring buffer. I don't mind
> allocating a big enough buffer at startup.
> 
> Say you have allocated a ring buffer of 1024*1024 entries.
> Then you queue 100 events per second, and dequeue them immediately.
> There is no need to blindly use all 1024*1024 slots in the ring buffer by
> doing index = (index+1)%(1024*1024)

But what if they are not dequeued immediately? What if the rate is high and,
while one tries to dequeue, the system adds more events?

> > epoll() does not have mmap.
> > Problem is not about how many events can be put into the kernel, but how
> > many of them can be put into mapped buffer.
> > There is no problem if mmap is turned off.
> 
> So zap mmap() support completely, since it is not usable at all. We won't
> discuss it.

Initial implementation did not have it.
But I was requested to do it, and it is ready now.
No one likes it, but no one provides an alternative implementation.
We are stuck.

-- 
Evgeniy Polyakov


Re: [RFC] cfg80211 and nl80211

2006-10-05 Thread Stuffed Crust
On Wed, Oct 04, 2006 at 01:57:38PM -0400, Dan Williams wrote:
> * None
> - Crypto: None
> - 802.11 Auth: Open System
> 
> * Static WEP
> - Keys: up to 4 group keys
> - Crypto: WEP-40, WEP-104, WEP-152, WEP-256
> - 802.11 Auth: Open System or Shared Key
> - Key Mgmt/Auth: none
> 
> * Dynamic WEP (LEAP?)
> - Keys: up to 4 group keys
> - Crypto: WEP-40, WEP-104, WEP-152, WEP-256
> - 802.11 Auth: Open System (only?)
> - Key Mgmt/Auth: IEEE 802.1x with LEAP or EAP
> 
> * WPA PSK
> - Keys: pairwise & group
> - Use WPA IEs
> - 802.11 Auth: Open System
> - Crypto: TKIP or CCMP
> - Key Mgmt/Auth: WPA-PSK (elided 802.1x)
> 
> * WPA Enterprise
> - Keys: pairwise & group
> - Use WPA IEs
> - 802.11 Auth: Open System
> - Crypto: TKIP or CCMP
> - Key Mgmt/Auth: WPA-EAP (full 802.1x)
> 
> * WPA2 PSK
> - Keys: pairwise & group
> - Use RSN IEs
> - 802.11 Auth: Open System
> - Crypto: TKIP or CCMP
> - Key Mgmt/Auth: WPA-PSK (elided 802.1x)
> 
> * WPA2 Enterprise
> - Keys: pairwise & group
> - Use RSN IEs
> - 802.11 Auth: Open System
> - Crypto: TKIP or CCMP
> - Key Mgmt/Auth: WPA-EAP (full 802.1x)

This strikes me as overly complicated; to figure out what's necessary
you shouldn't be looking at the WEXT API -- the 802.11 standards are
all you need, and they lay things out fairly clearly, complete with
rx/tx path flowcharts.  :)

Essentially, you have two crypto paradigms, pre-802.11i and 
post-802.11i.  (WPA uses the latter, and LEAP/CCX v1 is mostly the 
former; newer ones use the latter)

(Leave out the RSNIE, AuthType and KeyMgmt stuff; while they're 
 used in the actual key negotiation/derivation, they're separate 
 problems and have no bearing on the crypto layer.  From the driver's 
 perspective the RSNIE is just an opaque blob to be appended to 
 beacons, probe responses and [re]assoc frames, KeyMgmt is purely a matter for
 the authenticator/supplicant, and AuthType is just a toggle that 
 happens to be off for post-802.11i, although LEAP v1 adds some 
 complications there..)

The old way:

* Four "default" keys. (used globally)
* PrivacyInvoked
* SetDefaultKeyIndex

The new way:

* PrivacyInvoked
* SetProtection (tx&|rx -- essentially "require crypto for a given macaddr")
* SetKeyMapping (one key per macaddr)

Each key has:

* Key type (WEP/TKIP/AES-CCMP/NONE)
* Key length (implied, but WEP can have varying key lengths)
* Key index (only '0' is generally used for unicast frames, but 802.11i 
 requires use of simultaneous broadcast keys)
* Macaddr (ucast addr or broadcast aka pairwise vs group)
* RxSequence (mainly for bcast aka group keys)

It's fairly easy to implement the old stuff in terms of the new stuff, 
if you assume that "if I don't have a per-sta key, just use the 
global/bcast key".   The 802.11i rx/tx frame path flow handles the old 
crypto style just fine.
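
The fallback rule above can be sketched as a small lookup table. This is an illustration under assumptions, not driver code: the `Key` fields mirror the per-key list above (minus RxSequence), and `KeyTable`, `set_key_mapping` and `lookup` are names invented for the example.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Key:
    key_type: str      # "WEP", "TKIP", "AES-CCMP" or "NONE"
    index: int         # key index
    macaddr: str       # unicast address (pairwise) or BROADCAST (group)
    material: bytes    # key bytes; the key length is len(material)


class KeyTable:
    def __init__(self):
        self.per_sta = {}      # macaddr -> Key, one key per station
        self.group = None      # the global/bcast key, if any

    def set_key_mapping(self, key):
        """Install a key for its macaddr (hypothetical SetKeyMapping)."""
        if key.macaddr == BROADCAST:
            self.group = key
        else:
            self.per_sta[key.macaddr] = key

    def lookup(self, macaddr):
        """Per-station key if one exists, else the global/bcast key."""
        return self.per_sta.get(macaddr, self.group)
```

An old-style WEP setup is then just a table holding only a group key, so every station falls through to it, while an 802.11i setup adds per-station entries that take precedence.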

...Meanwhile.  It's foolish to ignore the 802.11 MLME.  It lists out
pretty much everything that's necessary to get a working connection, and
looking at its evolution (and changes in the pipeline) shows that it's
impossible to do it all (right) the first time, and that changes, not
just additions, will be necessary.

(Did I mention that I really like how the ALSA people manage this?  The 
 userspace-kernelspace API is effectively private; apps write to the  
 libs, which do the hard work of maintaining backwards compatibility as 
 the internals change and get new features, but now I'm really just 
 armchair quarterbacking, so I'll shut up now.)

> Wheee!  So you basically have a bunch of buckets and you just pull shit
> out of them at random, stick it all together, and you've got a wireless
> connection :)  Thank you, Cisco.  Thank you, Wi-Fi Alliance.

You forgot the part about sacrificing rubber chickens with pulleys 
in the middle.  While hopping on one foot.  Under a new moon. 

Bah, it's too early in the morning to be thinking about this stuff.  

 - Solomon 
-- 
Solomon Peachy pizza at shaftnet dot org 
Melbourne, FL  ^^ (mail/jabber/gtalk) ^^
Quidquid latine dictum sit, altum viditur.  ICQ: 1318344





Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices

2006-10-05 Thread Or Gerlitz

Jay Vosburgh wrote:
> Or Gerlitz <[EMAIL PROTECTED]> wrote:
>
> > My understanding is that changing ifenslave and the bonding kernel code to
> > allow for enslaving while master is not up is enough, so actually no
> > change is needed to the sysconfig tools, correct?
>
> Incorrect.  The /sbin/ifup included with sysconfig (I'm looking
> at version 0.31-0-15.51) has logic to set the bonding master device up
> prior to adding any slaves.  E.g.,
>
>     # get up the bonding device before enslaving
>     #   if ! is_iface_up $INTERFACE; then
>     ip link set $INTERFACE up 2>&1
>     #   fi
>     # enslave available slave devices; if there is none -> hard break and log
>     MESSAGE=`/sbin/ifenslave $BONDING_OPTIONS $INTERFACE $BSINTERFACES 2>&1`
>
> For your purposes, this would cause it to register as an
> ethernet hardware type, not an IB type.  The /sbin/ifup included with
> initscripts operates a little differently, but also sets the bonding
> master up prior to adding any slaves.


OK, you are correct. I agree that /sbin/ifup would first attempt to
bring up the bonding device, so it breaks my assumptions...



> Yes.  Part of the difficulty is that the changes to the
> initscripts and sysconfig packages won't be compatible with versions of
> bonding prior to the bonding kernel changes (because older versions of
> bonding will refuse to add slaves if the master is down).  It might
> require adding another API version to bonding, and modifying ifenslave
> to work both ways (i.e., with the current "enslave with master up" API,
> as well as the new "enslave with master down" API).


Gee, sounds bad


> An alternate approach would be to undertake the more substantial
> task of converting the initscripts and sysconfig code to use sysfs to
> configure bonding.  This would permit changing the logic (to add slaves
> while the bonding master is down, then set it up), as well as remove the
> current hacks (present only in sysconfig) to load the bonding module
> once per configured bonding interface.  The initscripts currently don't
> do this (as far as I know), so it's generally only possible to have one
> bonding interface under initscripts control.


This sounds like a good way to get out of all these troubles...

So having the sysconfig and initscripts tools configure bonding through
sysfs, rather than through the ifenslave program, is a direction you were
considering regardless of the needs imposed by bonding support for
non-ARPHRD_ETHER netdevices? And do you think the distro package owners would
like this?


I will look into the current methods used by sysconfig to configure
bonding and see if I can come up with a sketch of how to do it with sysfs.


Basically, for my IPoIB bonding testing I now use my own script that works
with sysfs, following the directions in the bonding kernel documentation.
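
For illustration, the ordering discussed in this thread (create the master, add slaves while it is still down, then bring it up) could be planned like this. It is a sketch, not the script mentioned above: the helper name is made up, it only returns the steps rather than executing anything, and whether the bonding driver accepts slaves while the master is down depends on the kernel changes under discussion. The sysfs paths are the ones described in the bonding documentation.

```python
def bonding_sysfs_steps(master, slaves):
    """Return ordered (action, target, value) steps; nothing is executed.

    "write" means: write `value` to the sysfs file `target`.
    "run" means: run the command `target` with arguments `value`.
    """
    # Create the bonding master through sysfs instead of module parameters.
    steps = [("write", "/sys/class/net/bonding_masters", f"+{master}")]
    # Enslave while the master is still down (the proposed new behaviour).
    for slave in slaves:
        steps.append(
            ("write", f"/sys/class/net/{master}/bonding/slaves", f"+{slave}")
        )
    # Only now bring the master up.
    steps.append(("run", "ip", f"link set {master} up"))
    return steps


if __name__ == "__main__":
    for step in bonding_sysfs_steps("bond0", ["ib0", "ib1"]):
        print(step)
```

Keeping the plan separate from its execution makes it easy to support both orderings ("enslave with master up" and "enslave with master down") from the same tool.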


Thanks again for all the coaching...

Or.




Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Ivo van Doorn
On Thursday 05 October 2006 13:29, Jiri Benc wrote:
> On Wed, 04 Oct 2006 18:51:40 +0200, Jan Kiszka wrote:
> > Ok, I'm not promising success and I'm going to duck immediately if
> > someone else feels like working on it, but I could try to patch in this
> > direction.
> 
> Your patches are welcomed!
> 
> > Now there just remains my precautious question if there are other
> > services in the ieee_80211_hw interface that may conflict with sleeping
> > USB drivers. What about specifying the possible contexts in
> > include/net/d80211.h?
> 
> Yes, that makes sense. Feel free to send a patch :-)

The patch is currently being tested in the rt2x00 tree,
so it will be sent to the netdev list shortly. :)

Ivo


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Ivo van Doorn
On Thursday 05 October 2006 13:37, Jiri Benc wrote:
> On Wed, 4 Oct 2006 19:22:38 +0200, Ivo van Doorn wrote:
> > Well, another point of concern for me is the TSF handling: those handlers
> > are called from interrupt context as well, and also cause problems for
> > the USB drivers in case of adhoc mode.
> 
> Where is a problem with tsf handlers? get_tsf is not called at all
> (unless CONFIG_D80211_IBSS_DEBUG is set; well, that raises a question
> why the function exists in the first place), reset_tsf returns void.

Basically it comes down to this:

Sep 13 12:27:34 wz4a kernel: wlan0: Creating new IBSS network, BSSID 7a:b9:60:8a:84:39
Sep 13 12:27:34 wz4a kernel: BUG: scheduling while atomic: swapper/0x0100/0
Sep 13 12:27:34 wz4a kernel:   schedule+0x43/0xa84   extract_buf+0x97/0xc8
Sep 13 12:27:34 wz4a kernel:   wait_for_completion+0x6a/0x9f   default_wake_function+0x0/0xc
Sep 13 12:27:34 wz4a kernel:   usb_start_wait_urb+0x98/0xdc [usbcore]   timeout_kill+0x0/0x5 [usbcore]
Sep 13 12:27:34 wz4a kernel:   usb_control_msg+0xc3/0xde [usbcore]   rt2x00_vendor_request+0x7c/0xa6 [rt73usb]
Sep 13 12:27:34 wz4a kernel:   rt73usb_reset_tsf+0x30/0x59 [rt73usb]   ieee80211_sta_join_ibss+0x3a/0x572 [80211]
Sep 13 12:27:34 wz4a kernel:   printk+0x14/0x18   ieee80211_rx_bss_add+0x88/0x90 [80211]
Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_find_ibss+0x30e/0x366 [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_timer+0x7a/0x18f [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
Sep 13 12:27:34 wz4a kernel:   run_timer_softirq+0x10b/0x153   __do_softirq+0x58/0xc2
Sep 13 12:27:34 wz4a kernel:   do_softirq+0x2e/0x32   do_IRQ+0x1e/0x24
Sep 13 12:27:34 wz4a kernel:   common_interrupt+0x1a/0x20   acpi_processor_idle+0x18a/0x39e [processor]
Sep 13 12:27:34 wz4a kernel:   cpu_idle+0x8f/0xa8   start_kernel+0x355/0x35c

When d80211 is compiled, only CONFIG_D80211_DEBUG is set by default,
so CONFIG_D80211_IBSS_DEBUG is not set.

This does not happen in rt2500usb driver, since no TSF handling is possible
due to a lack of TSF registers in the device.

Ivo


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Jiri Benc
On Thu, 5 Oct 2006 17:00:31 +0200, Ivo van Doorn wrote:
> Basically it comes down to this:
> 
> Sep 13 12:27:34 wz4a kernel: wlan0: Creating new IBSS network, BSSID 
> 7a:b9:60:8a:84:39
> Sep 13 12:27:34 wz4a kernel: BUG: scheduling while atomic: 
> swapper/0x0100/0
> Sep 13 12:27:34 wz4a kernel:   schedule+0x43/0xa84   
> extract_buf+0x97/0xc8
> Sep 13 12:27:34 wz4a kernel:   wait_for_completion+0x6a/0x9f  
>  default_wake_function+0x0/0xc
> Sep 13 12:27:34 wz4a kernel:   usb_start_wait_urb+0x98/0xdc 
> [usbcore]   timeout_kill+0x0/0x5 [usbcore]
> Sep 13 12:27:34 wz4a kernel:   usb_control_msg+0xc3/0xde [usbcore]  
>  rt2x00_vendor_request+0x7c/0xa6 [rt73usb]
> Sep 13 12:27:34 wz4a kernel:   rt73usb_reset_tsf+0x30/0x59 
> [rt73usb]   ieee80211_sta_join_ibss+0x3a/0x572 [80211]
> Sep 13 12:27:34 wz4a kernel:   printk+0x14/0x18   
> ieee80211_rx_bss_add+0x88/0x90 [80211]
> Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_find_ibss+0x30e/0x366 
> [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
> Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_timer+0x7a/0x18f 
> [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
> Sep 13 12:27:34 wz4a kernel:   run_timer_softirq+0x10b/0x153  
>  __do_softirq+0x58/0xc2
> Sep 13 12:27:34 wz4a kernel:   do_softirq+0x2e/0x32   
> do_IRQ+0x1e/0x24
> Sep 13 12:27:34 wz4a kernel:   common_interrupt+0x1a/0x20  
>  acpi_processor_idle+0x18a/0x39e [processor]
> Sep 13 12:27:34 wz4a kernel:   cpu_idle+0x8f/0xa8   
> start_kernel+0x355/0x35c

So this will be solved for free when sta_timer is converted to a
workqueue.

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Jan Kiszka
Ivo van Doorn wrote:
> On Thursday 05 October 2006 13:37, Jiri Benc wrote:
>> On Wed, 4 Oct 2006 19:22:38 +0200, Ivo van Doorn wrote:
>>> Well, another point of concern for me is the TSF handling: those handlers
>>> are called from interrupt context as well, and also cause problems for
>>> the USB drivers in case of adhoc mode.
>> Where is a problem with tsf handlers? get_tsf is not called at all
>> (unless CONFIG_D80211_IBSS_DEBUG is set; well, that raises a question
>> why the function exists in the first place), reset_tsf returns void.
> 
> Basically it comes down to this:
> 
> Sep 13 12:27:34 wz4a kernel: wlan0: Creating new IBSS network, BSSID 
> 7a:b9:60:8a:84:39
> Sep 13 12:27:34 wz4a kernel: BUG: scheduling while atomic: 
> swapper/0x0100/0
> Sep 13 12:27:34 wz4a kernel:   schedule+0x43/0xa84   
> extract_buf+0x97/0xc8
> Sep 13 12:27:34 wz4a kernel:   wait_for_completion+0x6a/0x9f  
>  default_wake_function+0x0/0xc
> Sep 13 12:27:34 wz4a kernel:   usb_start_wait_urb+0x98/0xdc 
> [usbcore]   timeout_kill+0x0/0x5 [usbcore]
> Sep 13 12:27:34 wz4a kernel:   usb_control_msg+0xc3/0xde [usbcore]  
>  rt2x00_vendor_request+0x7c/0xa6 [rt73usb]
> Sep 13 12:27:34 wz4a kernel:   rt73usb_reset_tsf+0x30/0x59 
> [rt73usb]   ieee80211_sta_join_ibss+0x3a/0x572 [80211]
> Sep 13 12:27:34 wz4a kernel:   printk+0x14/0x18   
> ieee80211_rx_bss_add+0x88/0x90 [80211]
> Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_find_ibss+0x30e/0x366 
> [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
> Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_timer+0x7a/0x18f 
> [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
> Sep 13 12:27:34 wz4a kernel:   run_timer_softirq+0x10b/0x153  
>  __do_softirq+0x58/0xc2
> Sep 13 12:27:34 wz4a kernel:   do_softirq+0x2e/0x32   
> do_IRQ+0x1e/0x24
> Sep 13 12:27:34 wz4a kernel:   common_interrupt+0x1a/0x20  
>  acpi_processor_idle+0x18a/0x39e [processor]
> Sep 13 12:27:34 wz4a kernel:   cpu_idle+0x8f/0xa8   
> start_kernel+0x355/0x35c
> 
> With the compilation of d80211 the CONFIG_D80211_DEBUG is set by default,
> so no CONFIG_D80211_IBSS_DEBUG.
> 
> This does not happen in rt2500usb driver, since no TSF handling is possible
> due to a lack of TSF registers in the device.

This path would be fixed by my conversion patch of sta.timer into
sta.work that I sent you yesterday privately. Unfortunately, I don't
have a copy at hand ATM.

What about the other timers? Can they trigger any sleeping service of
rt2x00 drivers? Ok, waiting for a BUG is always possible... ;)

Jan





Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Ivo van Doorn
Hi,

> > This does not happen in rt2500usb driver, since no TSF handling is possible
> > due to a lack of TSF registers in the device.
> 
> This path would be fixed by my conversion patch of sta.timer into
> sta.work that I sent you yesterday privately. Unfortunately, I don't
> have a copy at hand ATM.

That is what I expect as well. I'll ask the person who submitted this bug
for confirmation.

> What about the other timers? Can they trigger any sleeping service of
> rt2x00 drivers? Ok, waiting for a BUG is always possible... ;)

Well, I currently have no time to check it, but can the config_interface
handler still be called from interrupt context, or has this also been fixed?

Ivo


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Ivo van Doorn
On Thursday 05 October 2006 17:13, Jiri Benc wrote:
> On Thu, 5 Oct 2006 17:00:31 +0200, Ivo van Doorn wrote:
> > Basically it comes down to this:
> > 
> > Sep 13 12:27:34 wz4a kernel: wlan0: Creating new IBSS network, BSSID 
> > 7a:b9:60:8a:84:39
> > Sep 13 12:27:34 wz4a kernel: BUG: scheduling while atomic: 
> > swapper/0x0100/0
> > Sep 13 12:27:34 wz4a kernel:   schedule+0x43/0xa84   
> > extract_buf+0x97/0xc8
> > Sep 13 12:27:34 wz4a kernel:   wait_for_completion+0x6a/0x9f  
> >  default_wake_function+0x0/0xc
> > Sep 13 12:27:34 wz4a kernel:   usb_start_wait_urb+0x98/0xdc 
> > [usbcore]   timeout_kill+0x0/0x5 [usbcore]
> > Sep 13 12:27:34 wz4a kernel:   usb_control_msg+0xc3/0xde 
> > [usbcore]   rt2x00_vendor_request+0x7c/0xa6 [rt73usb]
> > Sep 13 12:27:34 wz4a kernel:   rt73usb_reset_tsf+0x30/0x59 
> > [rt73usb]   ieee80211_sta_join_ibss+0x3a/0x572 [80211]
> > Sep 13 12:27:34 wz4a kernel:   printk+0x14/0x18   
> > ieee80211_rx_bss_add+0x88/0x90 [80211]
> > Sep 13 12:27:34 wz4a kernel:   
> > ieee80211_sta_find_ibss+0x30e/0x366 [80211]   
> > ieee80211_sta_timer+0x0/0x18f [80211]
> > Sep 13 12:27:34 wz4a kernel:   ieee80211_sta_timer+0x7a/0x18f 
> > [80211]   ieee80211_sta_timer+0x0/0x18f [80211]
> > Sep 13 12:27:34 wz4a kernel:   run_timer_softirq+0x10b/0x153  
> >  __do_softirq+0x58/0xc2
> > Sep 13 12:27:34 wz4a kernel:   do_softirq+0x2e/0x32   
> > do_IRQ+0x1e/0x24
> > Sep 13 12:27:34 wz4a kernel:   common_interrupt+0x1a/0x20  
> >  acpi_processor_idle+0x18a/0x39e [processor]
> > Sep 13 12:27:34 wz4a kernel:   cpu_idle+0x8f/0xa8   
> > start_kernel+0x355/0x35c
> 
> So this will be solved for free when sta_timer is converted to a
> workqueue.

Hi,

True, this is what I realized later as well. :)
I have asked the bug submitter for confirmation.

Ivo


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Ivo van Doorn
On Thursday 05 October 2006 17:39, Jiri Benc wrote:
> On Thu, 5 Oct 2006 17:32:39 +0200, Ivo van Doorn wrote:
> > Well I currently have no time to check it, but can
> > config_interface handler still be called from interrupt context or has this 
> > also been fixed?
> 
> Will be fixed by the sta_timer conversion as well.

Excellent news. :D

Thanks.

Ivo


Re: d80211: ieee80211_hw handlers in atomic context

2006-10-05 Thread Jiri Benc
On Thu, 5 Oct 2006 17:32:39 +0200, Ivo van Doorn wrote:
> Well I currently have no time to check it, but can
> config_interface handler still be called from interrupt context or has this 
> also been fixed?

Will be fixed by the sta_timer conversion as well.

 Jiri

-- 
Jiri Benc
SUSE Labs


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Hans Henrik Happe
On Thursday 05 October 2006 12:21, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> > On Thursday 05 October 2006 10:57, Evgeniy Polyakov wrote:
> > 
> > > Well, it is possible to create /sys/proc entry for that, and even now
> > > userspace can grow mapping ring until it is forbidden by kernel, which
> > > means limit is reached.
> > 
> > No need for yet another /sys/proc entry.
> > 
> > Right now, I (for example) may have a use for Generic event handling, but
> > for a program that needs XXX.XXX handles, and about XX.XXX events per
> > second.
> > 
> > Right now, this program uses epoll, and reaches no limit at all, once you
> > pass the "ulimit -n", and other kernel wide tunes of course, not related
> > to epoll.
> > 
> > With your current kevent, I cannot switch to it, because of hardcoded
> > limits.
> > 
> > I may be wrong, but what is currently missing for me is :
> > 
> > - No hardcoded limit on the max number of events. (A process that can
> > open XXX.XXX files should be allowed to open a kevent queue with at least
> > XXX.XXX events). Right now it's not clear what happens IF the current
> > limit is reached.
> 
> This forces overflows in the fixed-size memory-mapped buffer.
> If we remove the memory-mapped buffer, or allow overflows (and thus
> skipped entries), kevent can easily scale to those limits (tested with
> xx.xxx events, though).
> 
> > - In order to avoid touching the whole ring buffer, it might be good to
> > be able to reset the indexes to the beginning when ring buffer is empty.
> > (So if the user land is responsive enough to consume events, only first
> > pages of the mapping would be used : that saves L1/L2 cpu caches)
> 
> And what happens when there are 3 empty slots at the beginning and we need
> to put 4 ready events there?

Couldn't there be 3 areas in the mmap buffer:

- Unused: entries that the kernel can alloc from.
- Alloced: entries alloced by kernel but not yet used by user. Kernel can
  update these if new events require that.
- Consumed: entries that the user is processing.

The user takes a set of alloced entries and makes them consumed. Then it
processes the events, after which it makes them unused.

If there are no unused entries and the kernel needs some, it has to wait for
free entries. The user has to notify the kernel when unused entries become
available. It could set a flag in the mmap'ed area to avoid unnecessary
wakeups.

There are some details with indexing and wakeup notification that I have left
out, but I hope my idea is clear. I could give a more detailed description if
requested. Also, I'm a user-level programmer so I might not get the whole
picture.

Hans Henrik Happe


Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Evgeniy Polyakov
On Thu, Oct 05, 2006 at 04:01:19PM +0200, Hans Henrik Happe ([EMAIL PROTECTED]) wrote:
> > And what happens when there are 3 empty at the beginning and we need to
> > put there 4 ready events?
> 
> Couldn't there be 3 areas in the mmap buffer:
> 
> - Unused: entries that the kernel can alloc from.
> - Alloced: entries alloced by kernel but not yet used by user. Kernel can
>   update these if new events require that.
> - Consumed: entries that the user is processing.
> 
> The user takes a set of alloced entries and makes them consumed. Then it
> processes the events, after which it makes them unused.
> 
> If there are no unused entries and the kernel needs some, it has to wait for
> free entries. The user has to notify the kernel when unused entries become
> available. It could set a flag in the mmap'ed area to avoid unnecessary
> wakeups.
> 
> There are some details with indexing and wakeup notification that I have left
> out, but I hope my idea is clear. I could give a more detailed description if
> requested. Also, I'm a user-level programmer so I might not get the whole
> picture.

This looks good on a picture, but how can you put it into page-based
storage without major and complex shared structures, which should be
properly locked between kernelspace and userspace?

> Hans Henrik Happe

-- 
Evgeniy Polyakov


Re: [take19 0/4] kevent: Generic event handling mechanism.

2006-10-05 Thread Ulrich Drepper
Evgeniy Polyakov wrote:
> And you can add/remove signal events using existing kevent api between
> calls.

That's far more expensive than using a mask under control of the program.


> And creating special cases for usual events is bad.
> There is unified way to deal with events in kevent -
> add/remove/modify/wait on them, signals are just usual events.

How can this be unified?  The installation of the temporary signal mask
is unlike the handling of signals for the purpose of reporting them
through the signal queue.  It's equally completely new functionality.
Don't kid yourself in thinking that because this is signal stuff, too,
you're "unifying" something.  The way this signal mask is used has
nothing whatsoever to do with delivering signals via the event
queue.  For the latter the signals always must be blocked (similar to
sigwait's requirement).

As a result it means you want to introduce a new mechanism for the event
queue instead of using the well known and often used method of
optionally passing a signal mask to the syscall.  That's just insane.
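
As an illustration of the existing convention referred to above, POSIX pselect() takes an optional signal mask that is in effect only while the call is blocked. The sketch below is not kevent code; wait_readable() is a hypothetical helper name introduced here, and the only real interfaces used are pselect(), FD_ZERO() and FD_SET().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * If wait_mask is non-NULL, pselect() installs it as the signal mask for
 * the duration of the wait and restores the previous mask on return.
 * A NULL wait_mask leaves the caller's mask untouched.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 */
int wait_readable(int fd, long timeout_ms, const sigset_t *wait_mask)
{
    fd_set rfds;
    struct timespec ts;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    return pselect(fd + 1, &rfds, NULL, NULL, &ts, wait_mask);
}
```

The mask argument is optional, so callers that do not care about signals pass NULL and pay nothing extra, which is the property being argued for here.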


> I think you wanted to say, that 'all event mechanism except the most
> commonly used poll/select/epoll use timespec'.

Get your facts straight.  select uses timeval which is just the
predecessor of timespec.  And epoll is just (badly) designed after
poll.  Fact is therefore that poll plus its spawn is the only interface
using such a timeout method.


> I designed it to be similar to poll(), it is really good interface.

Not many people agree.  All the interfaces designed (not derived) in the
last years take a timespec parameter.

Plus, you chose to ignore all the nice things that using a timespec allows,
like absolute timeout modes etc.  See the clock_nanosleep() interface
for a way this can be useful.
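
A minimal sketch of the absolute-timeout mode mentioned above, using clock_nanosleep() with TIMER_ABSTIME. The helper names deadline_after() and sleep_until() are made up for this example; delay_ns is assumed to be non-negative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: return base + delay_ns with tv_nsec kept in
 * the range [0, 1e9). delay_ns is assumed to be >= 0. */
struct timespec deadline_after(struct timespec base, long long delay_ns)
{
    struct timespec d = base;

    d.tv_sec += (time_t)(delay_ns / NSEC_PER_SEC);
    d.tv_nsec += (long)(delay_ns % NSEC_PER_SEC);
    if (d.tv_nsec >= NSEC_PER_SEC) {
        d.tv_sec += 1;
        d.tv_nsec -= NSEC_PER_SEC;
    }
    return d;
}

/* Hypothetical helper: sleep until an absolute CLOCK_MONOTONIC deadline.
 * Because the deadline is absolute, an interrupted sleep is simply
 * restarted with the same argument; no remaining time is recomputed.
 * Returns 0 on success or an error number. */
int sleep_until(const struct timespec *deadline)
{
    int rc;

    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

With a relative millisecond timeout, as in poll(), the caller has to work out the remaining time after every early wakeup; with an absolute timespec the same value can be reused.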

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox
On Wed, 2006-10-04 at 18:08 -0700, Martin Bligh wrote:
> Andi Kleen wrote:
> >>I think most likely it would crash on 2.6.18. Keith mannthey had reported
> >>a different crash on 2.6.18-rc4-mm2 when this patch was introduced first
> >>time. Following is the link to the thread.
> > 
> > 
> > Then maybe trying 2.6.17 + the patch and then bisect between that and -rc4?
> 
> I think it's fixed already in -git22, or at least it is for the IBM box
> reporting to test.kernel.org. You might want to try that one ...

-git22 also panics for me.

-- 

Steve Fox
IBM Linux Technology Center


Re: 2.6.18-mm3 oops in xfrm_register_mode

2006-10-05 Thread Badari Pulavarty
On Wed, 2006-10-04 at 16:02 -0500, Steve Fox wrote:
> On Wed, 2006-10-04 at 09:57 -0700, Andrew Morton wrote:
> 
> > You might well find this bisection lands you on origin.patch.  ie: a
> > mainline bug.  I note that David merged a few more xfrm fixes this morning.
> > 
> > So to confirm that, first test just origin.patch and if that fails, test
> > git-of-the-moment.  If that doesn't fail, they fixed it.
> 
> origin.patch from --m3 failed. Unfortunately so did a fresh clone of
> Linus's git tree.
> 

I am not an expert in that area, but your stack trace made me curious.
Looking at the disassembly, the line of code in question is:

if (likely(modemap[mode->encap] == NULL)) {

Register contents indicate that it's called as

xfrm_register_mode(&xfrm4_tunnel_mode, AF_INET);
or
xfrm_register_mode(&xfrm4_transport_mode, AF_INET);

(family is AF_INET).

The invalid deref is due to modemap = 0x7ff (RAX: 07ff)

Since it's so easy to reproduce, can you add a printk before
this check to dump mode->encap and modemap, afinfo, family etc ?
Just curious ..
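
To make the reasoning above concrete: evaluating modemap[mode->encap] loads from modemap plus the index times the pointer size, so a bogus table base such as 0x7ff produces a fault at a small address. The sketch below is plain arithmetic, not kernel code, and slot_address() is a hypothetical name. In the 2.6.18-git22 oops posted elsewhere in this thread, RAX is 081f and CR2 is 0827, which would be consistent with index 1 and 8-byte pointers, though that reading is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the address a CPU would load from when evaluating
 * table[index], for a table whose slots are slot_size bytes wide.
 */
uint64_t slot_address(uint64_t table_base, unsigned int index,
                      unsigned int slot_size)
{
    return table_base + (uint64_t)index * slot_size;
}
```

For example, slot_address(0x81f, 1, 8) yields 0x827, and slot_address(0x7ff, 0, 8) yields 0x7ff.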


Thanks,
Badari



Re: [take19 1/4] kevent: Core files.

2006-10-05 Thread Hans Henrik Happe
On Thursday 05 October 2006 16:15, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 04:01:19PM +0200, Hans Henrik Happe ([EMAIL PROTECTED]) wrote:
> > > And what happens when there are 3 empty at the beginning and we need to
> > > put there 4 ready events?
> > 
> > Couldn't there be 3 areas in the mmap buffer:
> > 
> > - Unused: entries that the kernel can alloc from.
> > - Alloced: entries alloced by kernel but not yet used by user. Kernel can
> >   update these if new events require that.
> > - Consumed: entries that the user is processing.
> > 
> > The user takes a set of alloced entries and makes them consumed. Then it
> > processes the events, after which it makes them unused.
> > 
> > If there are no unused entries and the kernel needs some, it has to wait
> > for free entries. The user has to notify the kernel when unused entries
> > become available. It could set a flag in the mmap'ed area to avoid
> > unnecessary wakeups.
> > 
> > There are some details with indexing and wakeup notification that I have
> > left out, but I hope my idea is clear. I could give a more detailed
> > description if requested. Also, I'm a user-level programmer so I might not
> > get the whole picture.
> 
> This looks good on a picture, but how can you put it into page-based
> storage without major and complex shared structures, which should be
> properly locked between kernelspace and userspace?

I wasn't clear about the structure. I meant a ring-buffer with 3 areas. So
it's basically the same model as Eric Dumazet described, only with 3 indexes;
2 in the user-writeable page and 1 in kernel.

When the kernel has alloced an entry it should store it in a way that makes it
invalid after user consumption, which is simply an increment of an index.
Sliding-window like schemes should solve this.
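
A rough single-threaded sketch of the three-index ring described above, to show how the three areas fall out of three counters. All names (struct ring, ring_kernel_put() and so on) are invented for this illustration and are not part of kevent. A real mapping shared between kernel and userspace would also need memory barriers and a wakeup protocol, which are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* power of two, so counter wraparound stays consistent */

struct ring {
    uint32_t k_alloc;   /* kernel-owned: entries ever handed out */
    uint32_t u_consume; /* user-writable: entries ever taken for processing */
    uint32_t u_free;    /* user-writable: entries ever released */
    int events[RING_SIZE];
};

/* Sizes of the three areas, derived purely from the counters. */
uint32_t ring_unused(const struct ring *r)
{
    return RING_SIZE - (r->k_alloc - r->u_free);
}

uint32_t ring_alloced(const struct ring *r)
{
    return r->k_alloc - r->u_consume;
}

uint32_t ring_consumed(const struct ring *r)
{
    return r->u_consume - r->u_free;
}

/* Kernel side: publish one ready event. Returns false when no unused
 * entry exists, i.e. the kernel has to wait for the user to release one. */
bool ring_kernel_put(struct ring *r, int ev)
{
    if (ring_unused(r) == 0)
        return false;
    r->events[r->k_alloc % RING_SIZE] = ev;
    r->k_alloc++;
    return true;
}

/* User side: move the oldest alloced entry into the consumed area. */
bool ring_user_take(struct ring *r, int *ev)
{
    if (ring_alloced(r) == 0)
        return false;
    *ev = r->events[r->u_consume % RING_SIZE];
    r->u_consume++;
    return true;
}

/* User side: finished with the oldest consumed entry; it becomes unused. */
bool ring_user_release(struct ring *r)
{
    if (ring_consumed(r) == 0)
        return false;
    r->u_free++;
    return true;
}
```

Each transition is a single increment of one counter, which matches the "simply an increment of an index" remark; an entry taken by the user stays unavailable to the kernel until it is released.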

Hans Henrik Happe


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Badari Pulavarty
On Thu, 2006-10-05 at 09:53 -0500, Steve Fox wrote:
> On Wed, 2006-10-04 at 18:08 -0700, Martin Bligh wrote:
> > Andi Kleen wrote:
> > >>I think most likely it would crash on 2.6.18. Keith mannthey had reported
> > >>a different crash on 2.6.18-rc4-mm2 when this patch was introduced first
> > >>time. Following is the link to the thread.
> > > 
> > > 
> > > Then maybe trying 2.6.17 + the patch and then bisect between that and 
> > > -rc4?
> > 
> > I think it's fixed already in -git22, or at least it is for the IBM box
> > reporting to test.kernel.org. You might want to try that one ...
> 
> -git22 also panics for me.
> 

Steve,

Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ? 
Last time I couldn't match your instruction dump to any code segment
in the routine. And also, can you post your .config file. I have
an amd64 and em64t machine and both work fine...

Thanks,
Badari



Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox
On Thu, 2006-10-05 at 08:12 -0700, Badari Pulavarty wrote:

> Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ? 

CONFIG_DEBUG_KERNEL should be on

> Last time I couldn't match your instruction dump to any code segment
> in the routine. And also, can you post your .config file. I have
> an amd64 and em64t machine and both work fine...

Unable to handle kernel NULL pointer dereference at 0827 RIP:
 [] xfrm_register_mode+0x36/0x60
PGD 0
Oops:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-git22 #1
RIP: 0010:[]  [] xfrm_register_mode+0x36/0x60
RSP: :810bffcbded0  EFLAGS: 00010286
RAX: 081f RBX: 805588a0 RCX: 
RDX:  RSI: 0002 RDI: 80559550
RBP: ffef R08: 3f924371 R09: 
R10: 810bffcbdcb0 R11: 0154 R12: 
R13: 810bffcbdef0 R14:  R15: 
FS:  () GS:805d2000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0827 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb4e0)
Stack:   8061fb48  80207182
    
    0009

The base config file I'm using is at
http://flooterbu.net/kernel/elm3b239-2.6.17.config

-- 

Steve Fox
IBM Linux Technology Center


[PATCH 2.6.19-rc1 2/2] ehea: fix port state notification, default queue sizes

2006-10-05 Thread Jan-Bernd Themann
This patch includes a bug fix for the port state notification
and fixes the default queue sizes.


Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
---
 drivers/net/ehea/ehea.h      |   13 +++--
 drivers/net/ehea/ehea_main.c |    6 +++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 23b451a..b40724f 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@ #include 
 #include 
 
 #define DRV_NAME   "ehea"
-#define DRV_VERSION     "EHEA_0028"
+#define DRV_VERSION     "EHEA_0034"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -50,6 +50,7 @@ #define EHEA_MAX_ENTRIES_RQ3 16383
 #define EHEA_MAX_ENTRIES_SQ  32767
 #define EHEA_MIN_ENTRIES_QP  127
 
+#define EHEA_SMALL_QUEUES
 #define EHEA_NUM_TX_QP 1
 
 #ifdef EHEA_SMALL_QUEUES
@@ -59,11 +60,11 @@ #define EHEA_DEF_ENTRIES_RQ1    4095
 #define EHEA_DEF_ENTRIES_RQ2    1023
 #define EHEA_DEF_ENTRIES_RQ3    1023
 #else
-#define EHEA_MAX_CQE_COUNT     32000
-#define EHEA_DEF_ENTRIES_SQ    16000
-#define EHEA_DEF_ENTRIES_RQ1   32080
-#define EHEA_DEF_ENTRIES_RQ2    4020
-#define EHEA_DEF_ENTRIES_RQ3    4020
+#define EHEA_MAX_CQE_COUNT      4080
+#define EHEA_DEF_ENTRIES_SQ     4080
+#define EHEA_DEF_ENTRIES_RQ1    8160
+#define EHEA_DEF_ENTRIES_RQ2    2040
+#define EHEA_DEF_ENTRIES_RQ3    2040
 #endif
 
 #define EHEA_MAX_ENTRIES_EQ 20
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 263d1c5..0edb2f8 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -769,7 +769,7 @@ static void ehea_parse_eqe(struct ehea_a
if (EHEA_BMASK_GET(NEQE_PORT_UP, eqe)) {
if (!netif_carrier_ok(port->netdev)) {
ret = ehea_sense_port_attr(
-   adapter->port[portnum]);
+   port);
if (ret) {
ehea_error("failed resensing port "
   "attributes");
@@ -821,7 +821,7 @@ static void ehea_parse_eqe(struct ehea_a
netif_stop_queue(port->netdev);
break;
default:
-   ehea_error("unknown event code %x", ec);
+   ehea_error("unknown event code %x, eqe=0x%lX", ec, eqe);
break;
}
 }
@@ -1845,7 +1845,7 @@ static int ehea_start_xmit(struct sk_buf
 
if (netif_msg_tx_queued(port)) {
ehea_info("post swqe on QP %d", pr->qp->init_attr.qp_nr);
-   ehea_dump(swqe, sizeof(*swqe), "swqe");
+   ehea_dump(swqe, 512, "swqe");
}
 
ehea_post_swqe(pr->qp, swqe);



[PATCH][RFC] net/ipv6: seperate sit driver to extra module

2006-10-05 Thread Joerg Roedel
Is there a reason why the tunnel driver for IPv6-in-IPv4 is currently
compiled into the ipv6 module? This driver is only needed in gateways
between different IPv6 networks. On all other hosts with ipv6 enabled it
is not required. Having this driver in a separate module will save
memory on those machines.
I appended a small and trivial patch to 2.6.18 which does exactly this.

Joerg
diff -upr -X linux-2.6.18/Documentation/dontdiff linux-2.6.18-vanilla/net/ipv6/af_inet6.c linux-2.6.18/net/ipv6/af_inet6.c
--- linux-2.6.18-vanilla/net/ipv6/af_inet6.c  2006-09-20 05:42:06.0 +0200
+++ linux-2.6.18/net/ipv6/af_inet6.c  2006-10-05 16:55:02.0 +0200
@@ -849,7 +849,6 @@ static int __init inet6_init(void)
err = addrconf_init();
if (err)
goto addrconf_fail;
-   sit_init();
 
/* Init v6 extension headers. */
ipv6_rthdr_init();
@@ -920,7 +919,6 @@ static void __exit inet6_exit(void)
raw6_proc_exit();
 #endif
/* Cleanup code parts. */
-   sit_cleanup();
ip6_flowlabel_cleanup();
addrconf_cleanup();
ip6_route_cleanup();
diff -upr -X linux-2.6.18/Documentation/dontdiff linux-2.6.18-vanilla/net/ipv6/Kconfig linux-2.6.18/net/ipv6/Kconfig
--- linux-2.6.18-vanilla/net/ipv6/Kconfig  2006-09-20 05:42:06.0 +0200
+++ linux-2.6.18/net/ipv6/Kconfig  2006-10-05 17:07:11.0 +0200
@@ -126,6 +126,19 @@ config INET6_XFRM_MODE_TUNNEL
 
  If unsure, say Y.
 
+config IPV6_SIT
+   tristate "IPv6: IPv6-in-IPv4 tunnel (SIT driver)"
+   depends on IPV6
+   default n
+   ---help---
+ Tunneling means encapsulating data of one protocol type within
+ another protocol and sending it over a channel that understands the
+ encapsulating protocol. This driver implements encapsulation of IPv6
+ into IPv4 packets. This is usefull if you want to connect two IPv6
+ networks over an IPv4-only path.
+
+ Saying M here will produce a module called sit.ko. If unsure, say N.
+
 config IPV6_TUNNEL
tristate "IPv6: IPv6-in-IPv6 tunnel"
select INET6_TUNNEL
diff -upr -X linux-2.6.18/Documentation/dontdiff linux-2.6.18-vanilla/net/ipv6/Makefile linux-2.6.18/net/ipv6/Makefile
--- linux-2.6.18-vanilla/net/ipv6/Makefile  2006-09-20 05:42:06.0 +0200
+++ linux-2.6.18/net/ipv6/Makefile  2006-10-05 17:10:42.0 +0200
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_IPV6) += ipv6.o
 
-ipv6-objs :=   af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o sit.o \
+ipv6-objs :=   af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o raw.o \
protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \
exthdrs.o sysctl_net_ipv6.o datagram.o proc.o \
@@ -24,6 +24,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TRANSPORT) 
 obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_NETFILTER)+= netfilter/
 
+obj-$(CONFIG_IPV6_SIT) += sit.o
 obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o
 
 obj-y += exthdrs_core.o
diff -upr -X linux-2.6.18/Documentation/dontdiff linux-2.6.18-vanilla/net/ipv6/sit.c linux-2.6.18/net/ipv6/sit.c
--- linux-2.6.18-vanilla/net/ipv6/sit.c 2006-09-20 05:42:06.0 +0200
+++ linux-2.6.18/net/ipv6/sit.c 2006-10-05 16:55:02.0 +0200
@@ -850,3 +850,6 @@ int __init sit_init(void)
inet_del_protocol(&sit_protocol, IPPROTO_IPV6);
goto out;
 }
+
+module_init(sit_init);
+module_exit(sit_cleanup);


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen
On Thursday 05 October 2006 17:32, Steve Fox wrote:
> On Thu, 2006-10-05 at 08:12 -0700, Badari Pulavarty wrote:
> 
> > Can you post the latest panic stack again (with CONFIG_DEBUG_KERNEL) ? 
> 
> CONFIG_DEBUG_KERNEL should be on
> 
> > Last time I couldn't match your instruction dump to any code segment
> > in the routine. And also, can you post your .config file. I have
> > an amd64 and em64t machine and both work fine...
> 
> Unable to handle kernel NULL pointer dereference at 0827 RIP:
>  [] xfrm_register_mode+0x36/0x60
> PGD 0
> Oops:  [1] SMP
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18-git22 #1
> RIP: 0010:[]  [] 
> xfrm_register_mode+0x36/0x60
> RSP: :810bffcbded0  EFLAGS: 00010286
> RAX: 081f RBX: 805588a0 RCX: 
> RDX:  RSI: 0002 RDI: 80559550
> RBP: ffef R08: 3f924371 R09: 
> R10: 810bffcbdcb0 R11: 0154 R12: 
> R13: 810bffcbdef0 R14:  R15: 
> FS:  () GS:805d2000() knlGS:
> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 0827 CR3: 00201000 CR4: 06e0
> Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb4e0)
> Stack:   8061fb48  80207182
>     
>     0009

Please don't snip the Code: line. It is fairly important.

> 
> The base config file I'm using is at
> http://flooterbu.net/kernel/elm3b239-2.6.17.config

My guess is that something is wrong with the global variable it is accessing.
Can you post the output of grep -5 xfrm_policy_afinfo ? 

I wonder if that variable overlaps something else.

And please add a 
printk("global %p\n",  xfrm_policy_afinfo[family]);
at the beginning of net/xfrm/xfrm_policy.c:xfrm_policy_lock_afinfo
and post the output.

If not then it's possible
that some nearby variable is overflowing or similar. Adding some padding
around xfrm_policy_afinfo would show that. 

Another way, if that global is proven to be corrupted, would be to add
checks all over the boot process to track down where it gets corrupted.
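
The padding idea can be sketched in userspace C as follows. This goes one step further than plain padding by filling the pads with a known pattern so that a later check can tell whether a neighbour wrote into them. Everything here (struct guarded_table, guard_init(), guard_check(), the sizes and the pattern byte) is hypothetical and only illustrates the technique; it is not the xfrm code.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_BYTE 0xA5 /* arbitrary fill pattern */
#define GUARD_LEN  64   /* arbitrary pad size in bytes */
#define NSLOTS     4    /* hypothetical table size */

/* A table of pointers with pattern-filled padding on both sides. */
struct guarded_table {
    unsigned char pre[GUARD_LEN];
    void *slot[NSLOTS];
    unsigned char post[GUARD_LEN];
};

/* Fill both pads with the pattern and clear the table itself. */
void guard_init(struct guarded_table *t)
{
    memset(t->pre, GUARD_BYTE, sizeof(t->pre));
    memset(t->slot, 0, sizeof(t->slot));
    memset(t->post, GUARD_BYTE, sizeof(t->post));
}

/* Returns 0 if both pads are intact, -1 if the pad before the table was
 * overwritten, 1 if the pad after it was overwritten. */
int guard_check(const struct guarded_table *t)
{
    size_t i;

    for (i = 0; i < GUARD_LEN; i++)
        if (t->pre[i] != GUARD_BYTE)
            return -1;
    for (i = 0; i < GUARD_LEN; i++)
        if (t->post[i] != GUARD_BYTE)
            return 1;
    return 0;
}
```

Calling such a check at several points during boot would narrow down when the damage happens, and the sign of the result hints at which neighbour is overflowing.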

-Andi


[PATCH 2.6.19-rc1 1/2] ehea: firmware (hvcall) interface changes

2006-10-05 Thread Jan-Bernd Themann
This eHEA patch covers required changes related to Anton Blanchard's new
hvcall interface.

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
---

diff --git a/drivers/net/ehea/ehea_phyp.c b/drivers/net/ehea/ehea_phyp.c
index 4a85aca..0b51a8c 100644
--- a/drivers/net/ehea/ehea_phyp.c
+++ b/drivers/net/ehea/ehea_phyp.c
@@ -44,71 +44,99 @@ #define H_ALL_RES_TYPE_EQ    3
 #define H_ALL_RES_TYPE_MR    5
 #define H_ALL_RES_TYPE_MW    6
 
-static long ehea_hcall_9arg_9ret(unsigned long opcode,
-unsigned long arg1, unsigned long arg2,
-unsigned long arg3, unsigned long arg4,
-unsigned long arg5, unsigned long arg6,
-unsigned long arg7, unsigned long arg8,
-unsigned long arg9, unsigned long *out1,
-unsigned long *out2,unsigned long *out3,
-unsigned long *out4,unsigned long *out5,
-unsigned long *out6,unsigned long *out7,
-unsigned long *out8,unsigned long *out9)
+static long ehea_plpar_hcall_norets(unsigned long opcode,
+   unsigned long arg1,
+   unsigned long arg2,
+   unsigned long arg3,
+   unsigned long arg4,
+   unsigned long arg5,
+   unsigned long arg6,
+   unsigned long arg7)
 {
-   long hret;
+   long ret;
int i, sleep_msecs;
 
for (i = 0; i < 5; i++) {
-   hret = plpar_hcall_9arg_9ret(opcode,arg1, arg2, arg3, arg4,
-arg5, arg6, arg7, arg8, arg9, out1,
-out2, out3, out4, out5, out6, out7,
-out8, out9);
-   if (H_IS_LONG_BUSY(hret)) {
-   sleep_msecs = get_longbusy_msecs(hret);
+   ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+arg5, arg6, arg7);
+
+   if (H_IS_LONG_BUSY(ret)) {
+   sleep_msecs = get_longbusy_msecs(ret);
msleep_interruptible(sleep_msecs);
continue;
}
 
-   if (hret < H_SUCCESS)
-   ehea_error("op=%lx hret=%lx "
-  "i1=%lx i2=%lx i3=%lx i4=%lx i5=%lx i6=%lx "
-  "i7=%lx i8=%lx i9=%lx "
-  "o1=%lx o2=%lx o3=%lx o4=%lx o5=%lx o6=%lx "
-  "o7=%lx o8=%lx o9=%lx",
-  opcode, hret, arg1, arg2, arg3, arg4, arg5,
-  arg6, arg7, arg8, arg9, *out1, *out2, *out3,
-  *out4, *out5, *out6, *out7, *out8, *out9);
-   return hret;
+   if (ret < H_SUCCESS)
+   ehea_error("opcode=%lx ret=%lx"
+  " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
+  " arg5=%lx arg6=%lx arg7=%lx ",
+  opcode, ret,
+  arg1, arg2, arg3, arg4, arg5,
+  arg6, arg7);
+
+   return ret;
}
+
return H_BUSY;
 }
 
-u64 ehea_h_query_ehea_qp(const u64 adapter_handle, const u8 qp_category,
-const u64 qp_handle, const u64 sel_mask, void *cb_addr)
+static long ehea_plpar_hcall9(unsigned long opcode,
+ unsigned long *outs, /* array of 9 outputs */
+ unsigned long arg1,
+ unsigned long arg2,
+ unsigned long arg3,
+ unsigned long arg4,
+ unsigned long arg5,
+ unsigned long arg6,
+ unsigned long arg7,
+ unsigned long arg8,
+ unsigned long arg9)
 {
-   u64 dummy;
+   long ret;
+   int i, sleep_msecs;
 
-   if ((((u64)cb_addr) & (PAGE_SIZE - 1)) != 0) {
-   ehea_error("not on pageboundary");
-   return H_PARAMETER;
+   for (i = 0; i < 5; i++) {
+   ret = plpar_hcall9(opcode, outs,
+  arg1, arg2, arg3, arg4, arg5,
+  arg6, arg7, arg8, arg9);
+
+   if (H_IS_LONG_BUSY(ret)) {
+   sleep_msecs = get_longbusy_msecs(ret);
+   msleep_interruptible(sleep_msecs);
+   continue;
+   }
+
+   if (ret < H_SUCCESS)
+ 

Re: [PATCH][RFC] net/ipv6: seperate sit driver to extra module

2006-10-05 Thread Joerg Roedel
On Thu, Oct 05, 2006 at 11:49:38AM -0400, James Morris wrote:
> On Thu, 5 Oct 2006, Joerg Roedel wrote:
> 
> > Is there a reason why the tunnel driver for IPv6-in-IPv4 is currently
> > compiled into the ipv6 module? This driver is only needed in gateways
> > between different IPv6 networks. On all other hosts with ipv6 enabled it
> > is not required. Having this driver in a separate module will save
> > memory on those machines.
> > I appended a small and trivial patch to 2.6.18 which does exactly this.
> 
> Looks ok to me, although given that users used to get this by default when 
> selecting IPv6, perhaps the default in Kconfig should be y.

OK, good point, I'll change the default to y. I wrote n there
because of the "If unsure, say N" sentence in the description.

Joerg


[PATCH 2.6.18 4/6]: sb1250-mac: Driver model & phylib support

2006-10-05 Thread Maciej W. Rozycki
 This is an update including the following changes:

1. Some help text for Kconfig.

2. Removal of unused module options.

3. Phylib support and the resulting removal of generic bits for handling 
   the PHY.

4. Proper reserving of device resources and using ioremap()ped handles
   to access MAC registers rather than platform-specific macros.

5. Handling of the device using the driver model.

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
---

 This revision fixes the problem with drivers/net/Kconfig.

 Please consider.

  Maciej

patch-2.6.18-sb1250-mac-16
diff -up --recursive --new-file linux-2.6.18.macro/drivers/net/Kconfig linux-2.6.18/drivers/net/Kconfig
--- linux-2.6.18.macro/drivers/net/Kconfig  2006-09-20 03:42:06.0 +
+++ linux-2.6.18/drivers/net/Kconfig  2006-10-05 15:50:20.0 +
@@ -456,6 +456,15 @@ config MIPS_AU1X00_ENET
 config NET_SB1250_MAC
tristate "SB1250 Ethernet support"
depends on NET_ETHERNET && SIBYTE_SB1xxx_SOC
+   select PHYLIB
+   ---help---
+ This driver supports gigabit Ethernet interfaces based on the
+ Broadcom SiByte family of System-On-a-Chip parts.  They include
+ the BCM1120, BCM1125, BCM1125H, BCM1250, BCM1255, BCM1280, BCM1455
+ and BCM1480 chips.
+
+ To compile this driver as a module, choose M here: the module
+ will be called sb1250-mac.
 
 config SGI_IOC3_ETH
bool "SGI IOC3 Ethernet"
diff -up --recursive --new-file linux-2.6.18.macro/drivers/net/sb1250-mac.c linux-2.6.18/drivers/net/sb1250-mac.c
--- linux-2.6.18.macro/drivers/net/sb1250-mac.c  2006-09-20 03:42:06.0 +
+++ linux-2.6.18/drivers/net/sb1250-mac.c  2006-10-05 15:48:50.0 +
@@ -1,5 +1,6 @@
 /*
  * Copyright (C) 2001,2002,2003,2004 Broadcom Corporation
+ * Copyright (c) 2006  Maciej W. Rozycki
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -18,7 +19,11 @@
  *
  * This driver is designed for the Broadcom SiByte SOC built-in
  * Ethernet controllers. Written by Mitch Lichtenberg at Broadcom Corp.
+ *
+ * Updated to the driver model and the PHY abstraction layer
+ * by Maciej W. Rozycki.
  */
+
 #include 
 #include 
 #include 
@@ -32,9 +37,18 @@
 #include 
 #include 
 #include 
-#include  /* Processor type for cache alignment. */
-#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
 #include 
+#include 
+#include  /* Processor type for cache alignment. */
 
 /* This is only here until the firmware is ready.  In that case,
the firmware leaves the ethernet address in the register for us. */
@@ -48,7 +62,7 @@
 
 /* These identify the driver base version and may not be removed. */
 #if 0
-static char version1[] __devinitdata =
+static char version1[] __initdata =
 "sb1250-mac.c:1.00 1/11/2001 Written by Mitch Lichtenberg\n";
 #endif
 
@@ -57,8 +71,6 @@ static char version1[] __devinitdata =
 
 #define CONFIG_SBMAC_COALESCE
 
-#define MAX_UNITS 4/* More are supported, limit only on options */
-
 /* Time in jiffies before concluding the transmitter is hung. */
 #define TX_TIMEOUT  (2*HZ)
 
@@ -74,26 +86,6 @@ static int debug = 1;
 module_param(debug, int, S_IRUGO);
 MODULE_PARM_DESC(debug, "Debug messages");
 
-/* mii status msgs */
-static int noisy_mii = 1;
-module_param(noisy_mii, int, S_IRUGO);
-MODULE_PARM_DESC(noisy_mii, "MII status messages");
-
-/* Used to pass the media type, etc.
-   Both 'options[]' and 'full_duplex[]' should exist for driver
-   interoperability.
-   The media type is usually passed in 'options[]'.
-*/
-#ifdef MODULE
-static int options[MAX_UNITS] = {-1, -1, -1, -1};
-module_param_array(options, int, NULL, S_IRUGO);
-MODULE_PARM_DESC(options, "1-" __MODULE_STRING(MAX_UNITS));
-
-static int full_duplex[MAX_UNITS] = {-1, -1, -1, -1};
-module_param_array(full_duplex, int, NULL, S_IRUGO);
-MODULE_PARM_DESC(full_duplex, "1-" __MODULE_STRING(MAX_UNITS));
-#endif
-
 #ifdef CONFIG_SBMAC_COALESCE
 static int int_pktcnt = 0;
 module_param(int_pktcnt, int, S_IRUGO);
@@ -104,6 +96,7 @@ module_param(int_timeout, int, S_IRUGO);
 MODULE_PARM_DESC(int_timeout, "Timeout value");
 #endif
 
+#include 
 #include 
 #if defined(CONFIG_SIBYTE_BCM1x55) || defined(CONFIG_SIBYTE_BCM1x80)
 #include 
@@ -126,22 +119,43 @@ MODULE_PARM_DESC(int_timeout, "Timeout v
 #error invalid SiByte MAC configuation
 #endif
 
+#ifdef K_INT_PHY
+#define SBMAC_PHY_INT  K_INT_PHY
+#else
+#define SBMAC_PHY_INT  PHY_POLL
+#endif
+
 /**
  *  Simple types
  * */
 
-
-typedef enum { sbmac_speed_auto, sbmac_speed_10,
-  sbmac_speed_100, sbmac_speed_1000 } sbmac_speed_t;
-
-typedef enum { sbmac_duplex_auto, sbmac_duplex_half,
-  sbmac_duple

Re: [RFC] cfg80211 and nl80211

2006-10-05 Thread Jouni Malinen
On Wed, Oct 04, 2006 at 01:57:38PM -0400, Dan Williams wrote:
> On Wed, 2006-10-04 at 16:19 +0200, Johannes Berg wrote:
> > On Wed, 2006-10-04 at 09:41 +0200, Johannes Berg wrote:

> > Should cfg80211 do the chore of keeping track of the whole scan results?
> > On the other hand, that doesn't seem to be doable with legacy hardware
> > that does all the scanning. So probably one call for
> >cfg80211_notify_scan()
> > that takes a new scan result structure (taking a single BSSID etc.) and
> > notifies all listeners.
> > The same structure is used for get_scan() from the wiphy ops in an
> > iterator interface like some other calls.
> 
> Is it a problem to actually push the _entire_ scan list out to clients
> over netlink?  The scan list could be quite large, maybe even a few
> kilobytes when stuff like Information Elements, ratesets, etc is
> available.  I've seen 35-item scan lists that are already around 1.5K.

1.5 KB sounds like a small scan result set to me.. I'm hitting 100+
BSSes at work (well, not really your normal environment ;-), and 50 at
home.. These go way beyond 1.5 KB; closer to 32 KB at times, I'd guess.

> There are several issues here.  They can be roughly split by encryption
> algorithm.  But the big question:
> 
> Is there a case for _multiple_ encryption algorithms enabled
> on a single "virtual" interface at one time?

What exactly do you mean by this? WPA allows different STAs associated
with an AP to use different unicast encryption algorithms. This means
that a client may need to use CCMP with key index 0 for unicast and TKIP
with key index 1 for multicast.

> Taking one-at-a-time as a given, and the pseudo-structure
> 
> struct cmd_crypto {
>   enum crypto_alg alg;
>   union data {
>   none_data;
>   wep_data;
>   wpa_data;

wep vs. wpa in crypto configuration does not make sense to me. WPA uses
multiple ciphers; even WEP is allowed for group keys..

> Set alg == , set the options, and the driver will _enable_
> that crypto mode with the given options.  It makes no sense at all to,
> say, set the WEP transmit key index or WEP key when the card is in WPA
> mode or no-crypto mode.

What is this "WPA" mode? Please note that IEEE 802.11i allows WEP to be
used for group (multicast/broadcast) keys.. WPA should not be mixed in
here with encryption key configuration. There are different encryption
algorithms, like WEP, TKIP, CCMP, and they need to have keys and other
parameters like key index and seq# configured. This is regardless of
whether WPA is used or not.

> It's important to note that some options are independent of the initial
> operation that enabled the crypto, and need to be set later without
> triggering deauth and such.  Setting non-TX-index WEP key is one such
> operation.  I should be able to set WEP keys at indexes other than the
> transmit key index without affecting operation of the card (unless some
> hardware/firmware issue prevents this).

And same for TKIP and CCMP.

> - WEP encryption (following ops are independent of each other):
> - Set TX key index
> - Set privacy invoked

These two are not WEP specific in any way.

> - Set exclude unencrypted packets

I would consider this more as a global variable for the BSS to match
with the dot11ExcludeUnencrypted variable defined in IEEE 802.11. In theory,
this is not specific to WEP, but in practice, only WEP is sometimes used
in a mode which allows both encrypted and unencrypted frames.

> - Set authentication mode (open, shared-key, or both)

Not really WEP specific. Shared Key authentication can only be used with
static WEP keys, but still, this configuration is not really part of
crypto configuration. In addition, Cisco uses a proprietary "Network
EAP" authentication algorithm and IEEE 802.11r is adding a new
authentication algorithm, so there are more options to this
configuration variable.

> - Set (or clear) WEP key 1, 2, 3, or 4

Not specific to WEP.

> - WPA/WPA2/IEEE8021X
> - Jouni/others would know better and my brain is fried right now

This item should not be WPA/WPA2/IEEE8021X, but TKIP/CCMP, i.e.,
ciphers like WEP.. Just like with WEP, there would need to be key index
parameter. Default TX key could be set with separate operation (it is
valid to switch between two keys without changing either one). TKIP and
CCMP will also need options for setting and getting the sequence number
for replay protection (TSC/PN).

In other words, WEP, TKIP, CCMP (and likely all future ciphers added to
802.11) would be using the same configuration interface with same set of
parameters. Some of these parameters are just ignored for some of the
ciphers (e.g., WEP does not really need seq# get/set).
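
To make the "same interface for every cipher" idea concrete, here is a
minimal user-space sketch. Every name in it (cipher_alg, key_conf,
key_table, key_set, key_set_default_tx) is hypothetical and is not part of
any existing nl80211/cfg80211 API. The accepted key lengths are the usual
ones (5/13 bytes for WEP plus 16 for the non-standard 152-bit variant
mentioned in the quoted proposal, 32 for TKIP, 16 for CCMP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this is not an existing nl80211 API. */
enum cipher_alg { CIPHER_NONE, CIPHER_WEP, CIPHER_TKIP, CIPHER_CCMP };

#define KEY_MAX_LEN 32
#define KEY_NUM_IDX 4

struct key_conf {
	enum cipher_alg alg;
	uint8_t key[KEY_MAX_LEN];
	size_t len;
	uint64_t seq;		/* TSC/PN for TKIP/CCMP; ignored for WEP */
};

struct key_table {
	struct key_conf keys[KEY_NUM_IDX];
	int default_tx;		/* changed independently of the keys */
};

/* Accept only key lengths that make sense for the given cipher. */
static int key_len_ok(enum cipher_alg alg, size_t len)
{
	switch (alg) {
	case CIPHER_WEP:	return len == 5 || len == 13 || len == 16;
	case CIPHER_TKIP:	return len == 32;
	case CIPHER_CCMP:	return len == 16;
	default:		return len == 0;
	}
}

/* One entry point for every cipher: same parameters, some unused. */
int key_set(struct key_table *t, int idx, enum cipher_alg alg,
	    const uint8_t *key, size_t len, uint64_t seq)
{
	if (idx < 0 || idx >= KEY_NUM_IDX || !key_len_ok(alg, len))
		return -1;
	t->keys[idx].alg = alg;
	t->keys[idx].len = len;
	t->keys[idx].seq = (alg == CIPHER_WEP) ? 0 : seq;
	memset(t->keys[idx].key, 0, KEY_MAX_LEN);
	if (len)
		memcpy(t->keys[idx].key, key, len);
	return 0;
}

/* Switching the default TX key leaves all key material untouched. */
int key_set_default_tx(struct key_table *t, int idx)
{
	if (idx < 0 || idx >= KEY_NUM_IDX)
		return -1;
	t->default_tx = idx;
	return 0;
}
```

With this shape, the case mentioned earlier in this message (CCMP at key
index 0 for unicast, TKIP at key index 1 for multicast) is just two
key_set() calls, and key_set_default_tx() can flip between keys without
rewriting either one.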

> All the WEP options should be independent attributes in nl80211.  You
> could even have a generic WEPKey attribute that is defined like so:
> 
> ATTR_WEP_KEY {
>   enum type (one of DISABLE, TYPE_40, TYPE_104, TYPE_152)

I would rather use key length than come 

Request to postpone WE-21

2006-10-05 Thread Jean Tourrilhes
Hi John,

Based on the feedback, I formally request you to back out all
of WE-21 from 2.6.19. Rationale : it's probably too early. You can
keep it for a later date if you wish.
Regards,

Jean
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kernel-doc fix for sock.h

2006-10-05 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Fix kernel-doc warning in include/net/sock.h:
Warning(/var/linsrc/linux-2619-rc1-pv//include/net/sock.h:894): No description 
found for parameter 'rcu'

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/net/sock.h |3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

--- linux-2619-rc1-pv.orig/include/net/sock.h
+++ linux-2619-rc1-pv/include/net/sock.h
@@ -884,8 +884,7 @@ static inline int sk_filter(struct sock 
 
 /**
  * sk_filter_release: Release a socket filter
- * @sk: socket
- * @fp: filter to remove
+ * @rcu: rcu_head that contains the sk_filter info to remove
  *
  * Remove a filter from a socket and release its resources.
  */


---


Re: Offloading features in VLAN interfaces

2006-10-05 Thread Ben Greear

Olivier Crameri wrote:

Same thing but with the patch this time.
Since the VLAN device's features may also change in the handler, 
shouldn't we check and generate a feature-change

event for the VLAN device(s) as well?

Ben


--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com





[PATCH 2.6.18 7/6]: sb1250-mac: Remove "typedef" obfuscation

2006-10-05 Thread Maciej W. Rozycki
 This is a set of changes to remove unneeded type definitions that only 
make code less obvious.  It applies to all "enum" and "struct" types as 
well as to potentially unsafe use of them within sizeof().
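
As an illustration of the sizeof() point, here is a small user-space
sketch (calloc() stands in for the kernel allocator, and dscr_table_alloc()
is a made-up helper, not a function from the driver). Taking the size from
the pointer being assigned means the allocation stays correct even if the
pointer's type is changed later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the driver's descriptor, without the typedef. */
struct sbdmadscr {
	uint64_t dscr_a;
	uint64_t dscr_b;
};

/*
 * Hypothetical helper: allocate a zeroed table of n descriptors.
 * sizeof(*tbl) follows the pointer's type, so there is no separate
 * type name that could silently get out of sync with the pointer.
 */
struct sbdmadscr *dscr_table_alloc(size_t n)
{
	struct sbdmadscr *tbl;

	tbl = calloc(n, sizeof(*tbl));
	return tbl;
}
```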

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
---

 This applies on top of 4/6.  Please consider.

  Maciej

patch-mips-2.6.18-20060920-sb1250-mac-typedef-3
diff -up --recursive --new-file 
linux-mips-2.6.18-20060920.macro/drivers/net/sb1250-mac.c 
linux-mips-2.6.18-20060920/drivers/net/sb1250-mac.c
--- linux-mips-2.6.18-20060920.macro/drivers/net/sb1250-mac.c   2006-09-28 
02:51:29.0 +
+++ linux-mips-2.6.18-20060920/drivers/net/sb1250-mac.c 2006-10-05 
16:18:41.0 +
@@ -129,33 +129,33 @@ MODULE_PARM_DESC(int_timeout, "Timeout v
  *  Simple types
  * */
 
-typedef enum {
+enum sbmac_speed {
sbmac_speed_none = 0,
sbmac_speed_10 = SPEED_10,
sbmac_speed_100 = SPEED_100,
sbmac_speed_1000 = SPEED_1000,
-} sbmac_speed_t;
+};
 
-typedef enum {
+enum sbmac_duplex {
sbmac_duplex_none = -1,
sbmac_duplex_half = DUPLEX_HALF,
sbmac_duplex_full = DUPLEX_FULL,
-} sbmac_duplex_t;
+};
 
-typedef enum {
+enum sbmac_fc {
sbmac_fc_none,
sbmac_fc_disabled,
sbmac_fc_frame,
sbmac_fc_collision,
sbmac_fc_carrier,
-} sbmac_fc_t;
+};
 
-typedef enum {
+enum sbmac_state {
sbmac_state_uninit,
sbmac_state_off,
sbmac_state_on,
sbmac_state_broken,
-} sbmac_state_t;
+};
 
 
 /**
@@ -181,52 +181,58 @@ typedef enum {
  *  DMA Descriptor structure
  * */
 
-typedef struct sbdmadscr_s {
+struct sbdmadscr {
uint64_t  dscr_a;
uint64_t  dscr_b;
-} sbdmadscr_t;
-
-typedef unsigned long paddr_t;
+};
 
 /**
  *  DMA Controller structure
  * */
 
-typedef struct sbmacdma_s {
+struct sbmacdma {
 
/*
 * This stuff is used to identify the channel and the registers
 * associated with it.
 */
-
-   struct sbmac_softc *sbdma_eth;  /* back pointer to associated 
MAC */
-   int  sbdma_channel; /* channel number */
-   int  sbdma_txdir;   /* direction (1=transmit) */
-   int  sbdma_maxdescr;/* total # of descriptors in 
ring */
+   struct sbmac_softc  *sbdma_eth; /* back pointer to associated
+  MAC */
+   int sbdma_channel;  /* channel number */
+   int sbdma_txdir;/* direction (1=transmit) */
+   int sbdma_maxdescr; /* total # of descriptors
+  in ring */
 #ifdef CONFIG_SBMAC_COALESCE
-   int  sbdma_int_pktcnt;  /* # descriptors rx/tx before 
interrupt*/
-   int  sbdma_int_timeout; /* # usec rx/tx interrupt */
+   int sbdma_int_pktcnt;
+   /* # descriptors rx/tx
+  before interrupt */
+   int sbdma_int_timeout;
+   /* # usec rx/tx interrupt */
 #endif
-
-   volatile void __iomem *sbdma_config0;   /* DMA config register 0 */
-   volatile void __iomem *sbdma_config1;   /* DMA config register 1 */
-   volatile void __iomem *sbdma_dscrbase;  /* Descriptor base address */
-   volatile void __iomem *sbdma_dscrcnt; /* Descriptor count register 
*/
-   volatile void __iomem *sbdma_curdscr;   /* current descriptor address */
+   volatile void __iomem   *sbdma_config0; /* DMA config register 0 */
+   volatile void __iomem   *sbdma_config1; /* DMA config register 1 */
+   volatile void __iomem   *sbdma_dscrbase;
+   /* descriptor base address */
+   volatile void __iomem   *sbdma_dscrcnt; /* descriptor count register */
+   volatile void __iomem   *sbdma_curdscr; /* current descriptor
+  address */
 
/*
 * This stuff is for maintenance of the ring
 */
-
-   sbdmadscr_t *sbdma_dscrtable;   /* base of descriptor table */
-   sbdmadscr_t *sbdma_dscrtable_end; /* end of descriptor table */
-
-   struct sk_buff **sbdma_ctxtable;/* context table, one per descr */
-
-   paddr_t  sbdma_dscrtable_phys; /* and also the phys addr */
-   sbdmadscr_t *sbdma_addptr;  /* next dscr for sw to add */
-   sbdmadscr_t *sbdma_remptr;  /* next dscr for sw to remove */
-} sbmac

[PATCH 2.6.18 8/6]: sb1250-mac: Fix an incorrect use of kfree()

2006-10-05 Thread Maciej W. Rozycki
 The pointer obtained by kmalloc() is treated with ALIGN() before passing 
it to kfree().  This may or may not cause problems depending on the 
minimum alignment enforced by kmalloc() and is ugly anyway.  This change 
records the original pointer returned by kmalloc() so that kfree() may 
safely use it.
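
The pattern can be sketched in user space as follows, with malloc()/free()
standing in for kmalloc()/kfree(). The struct, field and macro names here
(ring, table_un, ALIGN_UP, ring_init, ring_uninit) are illustrative only
and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct dscr {
	uint64_t a;
	uint64_t b;
};

struct ring {
	void *table_un;		/* pointer exactly as returned by malloc() */
	struct dscr *table;	/* aligned view into the same buffer */
	size_t count;
};

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) \
	(((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

int ring_init(struct ring *r, size_t count)
{
	/* One spare element leaves room to round the start address up. */
	r->table_un = malloc((count + 1) * sizeof(*r->table));
	if (!r->table_un)
		return -1;
	r->table = (struct dscr *)ALIGN_UP((uintptr_t)r->table_un,
					   sizeof(*r->table));
	memset(r->table, 0, count * sizeof(*r->table));
	r->count = count;
	return 0;
}

void ring_uninit(struct ring *r)
{
	/* Free the original pointer, never the aligned one. */
	free(r->table_un);
	r->table_un = NULL;
	r->table = NULL;
}
```

Keeping both pointers costs one extra word per ring and removes any
dependence on what alignment the allocator happens to guarantee.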

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
---

 This applies on top of the "typedef" change (7/6).  Please consider.

  Maciej

patch-mips-2.6.18-20060920-sb1250-mac-kfree-0
diff -up --recursive --new-file 
linux-mips-2.6.18-20060920.macro/drivers/net/sb1250-mac.c 
linux-mips-2.6.18-20060920/drivers/net/sb1250-mac.c
--- linux-mips-2.6.18-20060920.macro/drivers/net/sb1250-mac.c   2006-10-05 
16:18:41.0 +
+++ linux-mips-2.6.18-20060920/drivers/net/sb1250-mac.c 2006-10-04 
23:07:27.0 +
@@ -220,6 +220,7 @@ struct sbmacdma {
/*
 * This stuff is for maintenance of the ring
 */
+   void*sbdma_dscrtable_un;
struct sbdmadscr*sbdma_dscrtable;
/* base of descriptor table */
struct sbdmadscr*sbdma_dscrtable_end;
@@ -640,15 +641,16 @@ static void sbdma_initctx(struct sbmacdm
 
d->sbdma_maxdescr = maxdescr;
 
-   d->sbdma_dscrtable = kmalloc((d->sbdma_maxdescr + 1) *
-sizeof(*d->sbdma_dscrtable), GFP_KERNEL);
+   d->sbdma_dscrtable_un = kmalloc((d->sbdma_maxdescr + 1) *
+   sizeof(*d->sbdma_dscrtable),
+   GFP_KERNEL);
 
/*
 * The descriptor table must be aligned to at least 16 bytes or the
 * MAC will corrupt it.
 */
d->sbdma_dscrtable = (struct sbdmadscr *)
-ALIGN((unsigned long)d->sbdma_dscrtable,
+ALIGN((unsigned long)d->sbdma_dscrtable_un,
   sizeof(*d->sbdma_dscrtable));
 
memset(d->sbdma_dscrtable, 0,
@@ -1309,9 +1311,9 @@ static int sbmac_initctx(struct sbmac_so
 
 static void sbdma_uninitctx(struct sbmacdma *d)
 {
-   if (d->sbdma_dscrtable) {
-   kfree(d->sbdma_dscrtable);
-   d->sbdma_dscrtable = NULL;
+   if (d->sbdma_dscrtable_un) {
+   kfree(d->sbdma_dscrtable_un);
+   d->sbdma_dscrtable = d->sbdma_dscrtable_un = NULL;
}
 
if (d->sbdma_ctxtable) {


Re: [RFC] cfg80211 and nl80211

2006-10-05 Thread Jouni Malinen
On Thu, Oct 05, 2006 at 09:13:53AM -0400, Stuffed Crust wrote:

> (Leave out the RSNIE, AuthType and KeyMgmt stuff; while they're 
>  used in the actual key negotiation/derivation, they're separate 
>  problems and have no bearing on the crypto layer.  From the driver's 
>  perspective the RSNIE is just an opaque blob to be appended to 
>  beacons,presps and [re]assoc frames, KeyMgmt is purely a matter for 
>  the authenticator/supplicant, and AuthType is just a toggle that 
>  happens to be off for post-802.11i, although LEAP v1 adds some 
>  complications there..)

They are separate problems, but they do need to be taken into account in
802.11 interface to user space. Some drivers generate WPA/RSN IE
internally and they need to be told about the allowed protocol version,
authenticated key management suite, and pairwise/group cipher suites. In
other words, key management is not purely for authenticator/supplicant.

> Each key has:
> 
> * Key type (WEP/TKIP/AES-CCMP/NONE)
> * Key length (implied, but WEP can have varying key lengths)
> * Key index (only '0' is generally used for unicast frames, but 802.11i 
>  requires use of simultaneous broadcast keys)

Pre-802.11i supported key mapping and multiple default keys.. To make
things complex, many Cisco APs are configured to use non-zero key
indexes with dynamic WEP keys..

> ...Meanwhile.  It's foolish to ignore the 802.11 MLME.  It lists out
> pretty much everything that's necessary to get a working connection, and
> looking at its evolution (and changes in the pipeline) shows that it's
> impossible to do it all (right) the first time, and that changes, not
> just additions, will be necessary.

There are non-standard WLAN security protocols (look at Cisco) and one
needs to keep in mind that just looking at 802.11 MLME may not cover all
cases that, in practice, have to be supported.. Anyway, I agree that
MLME primitives do change and there will be new commands needed to cover
needs of future amendments to 802.11 (see, e.g., 802.11r and 802.11w
drafts).

-- 
Jouni Malinen                                            PGP id EFC895FA


Re: [PATCH] prism54: wpa support for fullmac cards

2006-10-05 Thread chunkeey
On Wed, Oct 04, 2006 23:43 you wrote:
> On Wed, Oct 04, 2006 at 04:12:26PM +0200, [EMAIL PROTECTED] wrote:
> > the AP code never worked. And the hostapd-ioctl interface was designed
> > for prism2/2.5/3 cards, but not for "fullmac" prism54.
>
> What do you mean by never working? I have seen fullmac Prism54
> completing WPA authentication with hostapd.. This was using the
> driver_prism54.c in hostapd, not the Host AP driver interface.
> > (BTW, hostapd's backend for prism54 uses a "proprietary" interface -
> > PIMFOR -, which never made it into the kernel.)
>
> But it worked in the external driver. So yes, saying that the version in
> kernel tree never worked in AP mode would probably be valid.
>
ok, sorry, my fault; I should have put it this way:

it never worked for ME, linmax, roland warsow, ...
and I tried a lot of things (patches, how-tos, asking the maintainer, etc.),
but I only saw "Oops" or "mgt: queue full" ...
 
the PIMFOR-Interface is a direct "tunnel" to the hardware.
And guess what? it's very "crashy" .. (e.g "set/get the generic elements" does 
a very good job. ;) )

> And as far as the WEXT interface in hostapd is concerned, no, there is
> no such thing yet.

that's correct.
WEXT is not going anywhere anymore, but maybe cfg80211?

Chr
 


[PATCH 1/2] sk98lin: ethtool register dump

2006-10-05 Thread Stephen Hemminger
Add support for dumping the registers in the deprecated
sk98lin driver. This allows for easier comparison with
settings in the new skge driver.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- linux-2.6.orig/drivers/net/sk98lin/skethtool.c
+++ linux-2.6/drivers/net/sk98lin/skethtool.c
@@ -581,6 +581,30 @@ static int setRxCsum(struct net_device *
return 0;
 }
 
+static int getRegsLen(struct net_device *dev)
+{
+   return 0x4000;
+}
+
+/*
+ * Returns copy of whole control register region
+ * Note: skip RAM address register because accessing it will
+ *  cause bus hangs!
+ */
+static void getRegs(struct net_device *dev, struct ethtool_regs *regs,
+ void *p)
+{
+   DEV_NET *pNet = netdev_priv(dev);
+   const void __iomem *io = pNet->pAC->IoBase;
+
+   regs->version = 1;
+   memset(p, 0, regs->len);
+   memcpy_fromio(p, io, B3_RAM_ADDR);
+
+   memcpy_fromio(p + B3_RI_WTO_R1, io + B3_RI_WTO_R1,
+ regs->len - B3_RI_WTO_R1);
+}
+
 const struct ethtool_ops SkGeEthtoolOps = {
.get_settings   = getSettings,
.set_settings   = setSettings,
@@ -599,4 +623,6 @@ const struct ethtool_ops SkGeEthtoolOps 
.set_tx_csum= setTxCsum,
.get_rx_csum= getRxCsum,
.set_rx_csum= setRxCsum,
+   .get_regs   = getRegs,
+   .get_regs_len   = getRegsLen,
 };
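
The copy-around-a-hole idea in getRegs() above can be sketched in plain
user-space C. Ordinary memory stands in for the MMIO window, memcpy() for
memcpy_fromio(), and the function name and the skip_start/skip_end
parameters are made up for this example; in the patch the hole runs from
B3_RAM_ADDR up to B3_RI_WTO_R1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes of a register window into out[], never reading the
 * range [skip_start, skip_end).  The skipped bytes are reported as 0.
 */
void dump_regs_skipping(uint8_t *out, const uint8_t *regs, size_t len,
			size_t skip_start, size_t skip_end)
{
	assert(skip_start <= skip_end && skip_end <= len);

	memset(out, 0, len);
	/* Everything below the hazardous range. */
	memcpy(out, regs, skip_start);
	/* Everything from the end of the hazardous range onwards. */
	memcpy(out + skip_end, regs + skip_end, len - skip_end);
}
```

Zero-filling first keeps the dump the same length as the register window,
so offsets in the output still line up with register addresses.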


[PATCH 2/2] sk98lin: MII ioctl support

2006-10-05 Thread Stephen Hemminger
Add MII ioctl support to the deprecated sk98lin driver.
This allows comparison with skge driver's PHY settings.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

--- linux-2.6.orig/drivers/net/sk98lin/skge.c
+++ linux-2.6/drivers/net/sk98lin/skge.c
@@ -113,6 +113,7 @@
 #include   
 #include   
 #include   
+#include   
 
 #include   "h/skdrv1st.h"
 #include   "h/skdrv2nd.h"
@@ -2843,6 +2844,56 @@ unsigned longFlags;  /* for spin lock 
return(&pAC->stats);
 } /* SkGeStats */
 
+/*
+ * Basic MII register access
+ */
+static int SkGeMiiIoctl(struct net_device *dev,
+   struct mii_ioctl_data *data, int cmd)
+{
+   DEV_NET *pNet = netdev_priv(dev);
+   SK_AC *pAC = pNet->pAC;
+   SK_IOC IoC = pAC->IoBase;
+   int Port = pNet->PortNr;
+   SK_GEPORT *pPrt = &pAC->GIni.GP[Port];
+   unsigned long Flags;
+   int err = 0;
+   int reg = data->reg_num & 0x1f;
+   SK_U16 val = data->val_in;
+
+   if (!netif_running(dev))
+   return -ENODEV; /* Phy still in reset */
+
+   spin_lock_irqsave(&pAC->SlowPathLock, Flags);
+   switch(cmd) {
+   case SIOCGMIIPHY:
+   data->phy_id = pPrt->PhyAddr;
+
+   /* fallthru */
+   case SIOCGMIIREG:
+   if (pAC->GIni.GIGenesis)
+   SkXmPhyRead(pAC, IoC, Port, reg, &val);
+   else
+   SkGmPhyRead(pAC, IoC, Port, reg, &val);
+
+   data->val_out = val;
+   break;
+
+   case SIOCSMIIREG:
+   if (!capable(CAP_NET_ADMIN))
+   err = -EPERM;
+
+   else if (pAC->GIni.GIGenesis)
+   SkXmPhyWrite(pAC, IoC, Port, reg, val);
+   else
+   SkGmPhyWrite(pAC, IoC, Port, reg, val);
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   }
+   spin_unlock_irqrestore(&pAC->SlowPathLock, Flags);
+   return err;
+}
+
 
 /*
  *
@@ -2876,6 +2927,9 @@ int   HeaderLength = sizeof(SK_U32) + siz
pNet = netdev_priv(dev);
pAC = pNet->pAC;

+   if (cmd == SIOCGMIIPHY || cmd == SIOCSMIIREG || cmd == SIOCGMIIREG)
+   return SkGeMiiIoctl(dev, if_mii(rq), cmd);
+
if(copy_from_user(&Ioctl, rq->ifr_data, sizeof(SK_GE_IOCTL))) {
return -EFAULT;
}
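
For reference, this is roughly how a user-space tool could exercise the
new ioctls. It is a sketch only: mii_read_reg() is a made-up helper,
struct mii_data is a local mirror of the kernel's struct mii_ioctl_data
(declared here to avoid mixing <linux/mii.h> with <net/if.h>), and the
interface name is supplied by the caller.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>	/* SIOCGMIIPHY, SIOCGMIIREG */

/* Local mirror of the kernel's struct mii_ioctl_data (assumed layout). */
struct mii_data {
	uint16_t phy_id;
	uint16_t reg_num;
	uint16_t val_in;
	uint16_t val_out;
};

/* Read one PHY register; returns 0 on success or a negative errno. */
int mii_read_reg(const char *ifname, int reg, uint16_t *val)
{
	struct ifreq ifr;
	/* The MII data lives in the ifreq's union, as with if_mii(). */
	struct mii_data *mii = (struct mii_data *)&ifr.ifr_data;
	int fd, err = 0;

	assert(ifname && val);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	/* Ask the driver for the PHY address first. */
	if (ioctl(fd, SIOCGMIIPHY, &ifr) < 0) {
		err = -errno;
		goto out;
	}

	/* Then read the requested register (0..31). */
	mii->reg_num = reg & 0x1f;
	if (ioctl(fd, SIOCGMIIREG, &ifr) < 0) {
		err = -errno;
		goto out;
	}
	*val = mii->val_out;
out:
	close(fd);
	return err;
}
```

Calling mii_read_reg("eth0", 1, &v) on both an sk98lin and an skge
interface (interface names assumed) would give the two status register
values to compare; with the patch above, a call on an interface that is
down returns -ENODEV.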


Re: [IPROUTE2][PATCH] Add missing macros which was removed from kernel header. (Re: [GIT PATCH] NET: Fixes for net-2.6.19)

2006-10-05 Thread Stephen Hemminger
I applied a combined patch to fix all the headers to iproute2 (for the future
 2.6.19 based release).

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [RFC] [PATCH 3/3] enable IP multicast when bonding IPoIB devices

2006-10-05 Thread Jay Vosburgh
Or Gerlitz <[EMAIL PROTECTED]> wrote:

>Jay Vosburgh wrote:
[...]
>>  Yes.  Part of the difficulty is that the changes to the
>> initscripts and sysconfig packages won't be compatible with versions of
>> bonding prior to the bonding kernel changes (because older versions of
>> bonding will refuse to add slaves if the master is down).  It might
>> require adding another API version to bonding, and modifying ifenslave
>> to work both ways (i.e., with the current "enslave with master up" API,
>> as well as the new "enslave with master down" API).
>
>Gee, sounds bad

After some reflection, I suspect it wouldn't be all that awful.
The main concern is going to be whether or not the existing ifenslave
binaries supplied with distros will run with the new version of bonding.
Since the new version of bonding that you're proposing is really just
relaxing the rules (rather than imposing a different, incompatible set
of rules), that's probably not a really big deal.  I don't think it
would require a revision change to the bonding ifenslave API.

[...]
>So the direction to have sysconfig and initscripts tools configure bonding
>by sysfs and not by the enslave program is something you were considering
>regardless of the needs imposed by bonding support for non ARPHRD_ETHER
>netdevices? and you think the distro packages owners would like this?

Yes, the long term direction is to have the initscripts
configure bonding via sysfs, either directly or via the step of
converting ifenslave to a script that uses sysfs.  

I personally find ifenslave to be more convenient to use than
repeated "echo whatever > /sys/this/that/the/other", but there's no
reason that ifenslave couldn't do the various echo things itself under
the covers.  
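
A rough sketch of what "doing the echo itself" might look like in C is
below. The bonding/slaves sysfs file and the "+ifname" syntax come from the
bonding sysfs interface; the helper names (bond_slaves_path, bond_enslave)
are invented for this example, and real code would want more careful
validation of the interface names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<master>/bonding/slaves" into buf. */
int bond_slaves_path(char *buf, size_t len, const char *master)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/bonding/slaves",
			 master);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Equivalent of: echo +<slave> > /sys/class/net/<master>/bonding/slaves */
int bond_enslave(const char *master, const char *slave)
{
	char path[128];
	FILE *f;
	int err;

	err = bond_slaves_path(path, sizeof(path), master);
	if (err)
		return err;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "+%s\n", slave) < 0)
		err = -EIO;
	/* Any error reported for the write surfaces when stdio flushes. */
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}
```

A missing master shows up here as a failed fopen(); whether a rejected
enslave request is also visible as a write error depends on what the
driver's sysfs handler returns.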

One drawback to sysfs is that there's no real-time error
reporting; you have to look at dmesg to see if your request succeeded or
not.  I'm not sure offhand if, e.g., adding a sysfs file to bonding for
"last-request-status" is a kosher sysfs thing to do; if it is, then an
ifenslave script could check such a thing to figure out error returns.

It seems more logical to me to embed all of the bonding sysfs
magic stuff into a separate script, but the maintainers of initscripts or
sysconfig may see things differently.

The main advantage to either of these (initscripts/sysconfig
and/or ifenslave converted to sysfs) is that it eliminates the need to
load the bonding driver module multiple times to have more than one
bonding device with differing module parameters (because the sysfs
interface can create any number of bonding interfaces with arbitrary
settings).

>I will look into the current methods used by sysconfig to configure
>bonding and see if i can come up with sketch of how to do it with sysfs.

It's probably easier to first convert ifenslave to a sysfs-using
script that the existing initscripts can use.  

This allows the changes to be published in stages, rather than
requiring a single flag day changeover.  The first stage changes the
bonding driver itself to permit enslavement with the master down
(ensuring that existing ifenslave binaries supplied with reasonably
current distros continue to function).  Next, ifenslave is changed to
use sysfs (simultaneously removing the adjustment of the master or
slave's up/down state during enslavement).  The next stage either
changes the initscripts/sysconfig to use sysfs directly or changes its
use of ifenslave to not do multiple loads of the bonding driver. 

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]


[PATCH 1/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver, support files: Documentation, makefiles etc.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/Documentation/networking/vioc.txt
linux-2.6.17.vioc/Documentation/networking/vioc.txt
--- linux-2.6.17/Documentation/networking/vioc.txt  1969-12-31
16:00:00.0 -0800
+++ linux-2.6.17.vioc/Documentation/networking/vioc.txt 2006-09-01
10:09:49.0 -0700
@@ -0,0 +1,98 @@
+VIOC Driver Release Notes (07/12/06)
+
+ [EMAIL PROTECTED]
+
+
+Overview
+
+
+A Virtual Input-Output Controller (VIOC) is a PCI device that provides
+10Gbps of I/O bandwidth that can be shared by up to 16 virtual network
+interfaces (VNICs).  VIOC hardware supports several features such as
+large frames, checksum offload, gathered send, MSI/MSI-X, bandwidth
+control, interrupt mitigation, etc.
+
+VNICs are provisioned to a host partition via an out-of-band interface
+from the System Controller -- typically before the partition boots,
+although they can be dynamically added or removed from a running
+partition as well.
+
+Each provisioned VNIC appears as an Ethernet netdevice to the host OS,
+and maintains its own transmit ring in DMA memory.  VNICs are
+configured to share up to 4 of total 16 receive rings and 1 of total
+16 receive-completion rings in DMA memory.  VIOC hardware classifies
+packets into receive rings based on size, allowing more efficient use
+of DMA buffer memory.  The default, and recommended, configuration
+uses groups of 'receive sets' (rxsets), each with 3 receive rings, a
+receive completion ring, and a VIOC Rx interrupt.  The driver gives
+each rxset a NAPI poll handler associated with a phantom (invisible)
+netdevice, for concurrency.  VNICs are assigned to rxsets using a
+simple modulus.
+
+VIOC provides 4 interrupts in INTx mode: 2 for Rx, 1 for Tx, and 1 for
+out-of-band messages from the System Controller and errors.  VIOC also
+provides 19 MSI-X interrupts: 16 for Rx, 1 for Tx, 1 for out-of-band
+messages from the System Controller, and 1 for error signalling from
+the hardware.  The VIOC driver makes a determination whether MSI-X
+functionality is supported and initializes interrupts accordingly.
+[Note: The Linux kernel disables MSI-X for VIOCs on modules with AMD
+8131, even if the device is on the HT link.]
+
+
+Module loadable parameters
+==
+
+- poll_weight (default 8) - the number of received packets that will be
+  processed during one call into the NAPI poll handler.
+
+- rx_intr_timeout (default 1) - hardware rx interrupt mitigation
+  timer, in units of 5us.
+
+- rx_intr_pkt_cnt (default 64) - hardware rx interrupt mitigation
+  counter, in units of packets.
+
+- tx_pkts_per_irq (default 64) - hardware tx interrupt mitigation
+  counter, in units of packets.
+
+- tx_pkts_per_bell (default 1) - the number of packets to enqueue on a
+  transmit ring before issuing a doorbell to hardware.
+
+Performance Tuning
+==
+
+You may want to use the following sysctl settings to improve
+performance.  [NOTE: To be re-checked]
+
+# set in /etc/sysctl.conf
+
+net.ipv4.tcp_timestamps = 0
+net.ipv4.tcp_sack = 0
+net.ipv4.tcp_rmem = 1000 1000 1000
+net.ipv4.tcp_wmem = 1000 1000 1000
+net.ipv4.tcp_mem  = 1000 1000 1000
+
+net.core.rmem_max = 5242879
+net.core.wmem_max = 5242879
+net.core.rmem_default = 5242879
+net.core.wmem_default = 5242879
+net.core.optmem_max = 5242879
+net.core.netdev_max_backlog = 10
+
+Out-of-band Communications with System Controller
+=
+
+System operators can use the out-of-band facility to allow for remote
+shutdown or reboot of the host partition.  Upon receiving such a
+command, the VIOC driver executes "/sbin/reboot" or "/sbin/shutdown"
+via the usermodehelper() call.
+
+This same communications facility is used for dynamic VNIC
+provisioning (plug in and out).
+
+The VIOC driver also registers a callback with
+register_reboot_notifier().  When the callback is executed, the driver
+records the shutdown event and reason in a VIOC register to notify the
+System Controller.
+
+
+
diff -uprN linux-2.6.17/MAINTAINERS linux-2.6.17.vioc/MAINTAINERS
--- linux-2.6.17/MAINTAINERS2006-06-17 18:49:35.0 -0700
+++ linux-2.6.17.vioc/MAINTAINERS   2006-09-01 10:09:49.0 -0700
@@ -3106,6 +3106,11 @@ L:   [EMAIL PROTECTED]
 W: http://rio500.sourceforge.net
 S: Maintained

+VIOC NETWORK DRIVER
+P: [EMAIL PROTECTED]
+L: netdev@vger.kernel.org
+S: Maintained
+
 VIDEO FOR LINUX
 P: Mauro Carvalho Chehab
 M: [EMAIL PROTECTED]
diff -uprN linux-2.6.17/drivers/net/Kconfig
linux-2.6.17.vioc/drivers/net/Kconfig
--- linux-2.6.17/drivers/net/Kconfig2006-06-17 18:49:35.0 -0700
+++ linux-2.6.17.vioc/drivers/net/Kconfig   2006-09-01 10:19:35.0 
-0700
@@ -1818,9 +1818,

[PATCH 6/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Ethtool interface.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_ethtool.c
linux-2.6.17.vioc/drivers/net/vioc/vioc_ethtool.c
--- linux-2.6.17/drivers/net/vioc/vioc_ethtool.c1969-12-31
 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_ethtool.c   2006-10-04
10:36:10.0 -0700
@@ -0,0 +1,309 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "f7/vnic_hw_registers.h"
+#include "f7/vnic_defs.h"
+
+#include 
+#include "vioc_vnic.h"
+#include "vioc_api.h"
+#include "driver_version.h"
+
+/* ethtool support for vnic */
+
+#ifdef SIOCETHTOOL
+#include 
+
+#ifndef ETH_GSTRING_LEN
+#define ETH_GSTRING_LEN 32
+#endif
+
+#ifdef ETHTOOL_OPS_COMPAT
+#include "kcompat_ethtool.c"
+#endif
+
+#define VIOC_READ_REG(R, M, V, viocdev) (\
+   readl((viocdev->ba.virt + GETRELADDR(M, V, R))))
+
+#define VIOC_WRITE_REG(R, M, V, viocdev, value) (\
+   (writel(value, viocdev->ba.virt + GETRELADDR(M, V, R))))
+
+#ifdef ETHTOOL_GSTATS
+struct vnic_stats {
+   char stat_string[ETH_GSTRING_LEN];
+   int sizeof_stat;
+   int stat_offset;
+};
+
+#define VNIC_STAT(m) sizeof(((struct vnic_device *)0)->m), \
+ offsetof(struct vnic_device, m)
+
+static const struct vnic_stats vnic_gstrings_stats[] = {
+   {"rx_packets", VNIC_STAT(net_stats.rx_packets)},
+   {"tx_packets", VNIC_STAT(net_stats.tx_packets)},
+   {"rx_bytes", VNIC_STAT(net_stats.rx_bytes)},
+   {"tx_bytes", VNIC_STAT(net_stats.tx_bytes)},
+   {"rx_errors", VNIC_STAT(net_stats.rx_errors)},
+   {"tx_errors", VNIC_STAT(net_stats.tx_errors)},
+   {"rx_dropped", VNIC_STAT(net_stats.rx_dropped)},
+   {"tx_dropped", VNIC_STAT(net_stats.tx_dropped)},
+   {"multicast", VNIC_STAT(net_stats.multicast)},
+   {"collisions", VNIC_STAT(net_stats.collisions)},
+   {"rx_length_errors", VNIC_STAT(net_stats.rx_length_errors)},
+   {"rx_over_errors", VNIC_STAT(net_stats.rx_over_errors)},
+   {"rx_crc_errors", VNIC_STAT(net_stats.rx_crc_errors)},
+   {"rx_frame_errors", VNIC_STAT(net_stats.rx_frame_errors)},
+   {"rx_fifo_errors", VNIC_STAT(net_stats.rx_fifo_errors)},
+   {"rx_missed_errors", VNIC_STAT(net_stats.rx_missed_errors)},
+   {"tx_aborted_errors", VNIC_STAT(net_stats.tx_aborted_errors)},
+   {"tx_carrier_errors", VNIC_STAT(net_stats.tx_carrier_errors)},
+   {"tx_fifo_errors", VNIC_STAT(net_stats.tx_fifo_errors)},
+   {"tx_heartbeat_errors", VNIC_STAT(net_stats.tx_heartbeat_errors)},
+   {"tx_window_errors", VNIC_STAT(net_stats.tx_window_errors)},
+   {"rx_fragment_errors", VNIC_STAT(vnic_stats.rx_fragment_errors)},
+   {"rx_dropped", VNIC_STAT(vnic_stats.rx_dropped)},
+   {"tx_skb_equeued", VNIC_STAT(vnic_stats.skb_enqueued)},
+   {"tx_skb_freed", VNIC_STAT(vnic_stats.skb_freed)},
+   {"netif_stops", VNIC_STAT(vnic_stats.netif_stops)},
+   {"tx_on_empty_intr", VNIC_STAT(vnic_stats.tx_on_empty_interrupts)},
+   {"tx_headroom_misses", VNIC_STAT(vnic_stats.headroom_misses)},
+   {"tx_headroom_miss_drops", VNIC_STAT(vnic_stats.headroom_miss_drops)},
+   {"tx_ring_size", VNIC_STAT(txq.count)},
+   {"tx_ring_capacity", VNIC_STAT(txq.empty)},
+   {"pkts_till_intr", VNIC_STAT(txq.tx_pkts_til_irq)},
+   {"pkts_till_bell", VNIC_STAT(txq.tx_pkts_til_bell)},
+   {"bells", VNIC_STAT(txq.bells)},
+   {"next_to_use", VNIC_STAT(txq.next_to_use)},
+   {"next_to_clean", VNIC_STAT(txq.next_to_clean)},
+   {"tx_frags", VNIC_STAT(txq.frags)},
+   {"tx_ring_wraps", VNIC_STAT(txq.wraps)},
+   {"tx_ring_fulls", VNIC_STAT(txq.full)}
+};
+
+#define VNIC_STATS_LEN \
+   (sizeof(vnic_gstrings_stats) / sizeof(struct vnic_stats))
+#endif /* ETHTOOL_GS
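The stats table in this patch pairs each counter name with the size and offset of a field, so a single loop can export every counter. A minimal userspace sketch of that lookup follows. It assumes a hypothetical `demo_device` struct in place of `struct vnic_device`, and `demo_read_stat` is an illustrative helper, not a function from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vnic_device; fields are illustrative only. */
struct demo_device {
	unsigned long rx_packets;
	uint32_t bells;
};

/* Same shape as struct vnic_stats in the patch, minus the fixed-size name. */
struct demo_stat {
	const char *name;
	size_t size;
	size_t offset;
};

/* Expands to two initializers: the field's size and its byte offset. */
#define DEMO_STAT(m) \
	sizeof(((struct demo_device *)0)->m), offsetof(struct demo_device, m)

static const struct demo_stat demo_stats[] = {
	{"rx_packets", DEMO_STAT(rx_packets)},
	{"bells", DEMO_STAT(bells)},
};

#define DEMO_STATS_LEN (sizeof(demo_stats) / sizeof(demo_stats[0]))

/* Read table entry i from dev and widen it to 64 bits. */
static uint64_t demo_read_stat(const struct demo_device *dev, size_t i)
{
	const unsigned char *p;
	uint64_t v64;
	uint32_t v32;

	assert(i < DEMO_STATS_LEN);
	p = (const unsigned char *)dev + demo_stats[i].offset;
	if (demo_stats[i].size == sizeof(v64)) {
		memcpy(&v64, p, sizeof(v64));
		return v64;
	}
	/* This sketch only handles 8-byte and 4-byte fields. */
	memcpy(&v32, p, sizeof(v32));
	return v32;
}
```

Recording the size next to the offset is what lets one table mix `unsigned long` counters with 32-bit ring fields.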

[PATCH 7/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Interrupt handler.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_irq.c linux-2.6.17.vioc/drivers/net/vioc/vioc_irq.c
--- linux-2.6.17/drivers/net/vioc/vioc_irq.c 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_irq.c 2006-10-04 10:37:56.0 -0700
@@ -0,0 +1,538 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "f7/vnic_hw_registers.h"
+#include "f7/vnic_defs.h"
+#include "vioc_vnic.h"
+
+#define VIOC_INTERRUPTS_CNT 19 /* 16 Rx + 1 Tx + 1 BMC + 1 Error */
+#define VIOC_INTERRUPTS_CNT_PIN_IRQ 4 /* 2 Rx + 1 Tx + 1 BMC */
+
+#define VIOC_SLVAR(x) x spinlock_t vioc_driver_lock = SPIN_LOCK_UNLOCKED
+#define VIOC_CLI spin_lock_irq(&vioc_driver_lock)
+#define VIOC_STI spin_unlock_irq(&vioc_driver_lock)
+#define IRQRETURN return IRQ_HANDLED
+#define TX_IRQ_IDX 16
+#define BMC_IRQ_IDX 17
+#define ERR_IRQ_IDX 18
+#define HANDLER_TASKLET 1
+#define HANDLER_DIRECT 2
+#define HANDLER_TASKQ  3
+#define VIOC_RX0_PCI_FUNC   0
+#define VIOC_TX_PCI_FUNC 1
+#define VIOC_BMC_PCI_FUNC   2
+#define VIOC_RX1_PCI_FUNC   3
+#define VIOC_IRQ_NONE   (u16) -1
+#define VIOC_ID_NONE -1
+#define VIOC_IVEC_NONE  -1
+#define VIOC_INTR_NONE  -1
+
+
+struct vioc_msix_entry {
+   u16 vector;
+   u16 entry;
+};
+
+struct vioc_intreq {
+   char name[VIOC_NAME_LEN];
+   void (*intrFuncp) (void *);
+   void *intrFuncparm;
+   irqreturn_t (*hthandler) (int, void *, struct pt_regs *);
+   unsigned int irq;
+   unsigned int vec;
+   unsigned int intr_base;
+   unsigned int intr_offset;
+   unsigned int timeout_value;
+   unsigned int pkt_counter;
+   unsigned int rxc_mask;
+   struct work_struct taskq;
+   struct tasklet_struct tasklet;
+};
+
+struct viocdev_intreq {
+   int vioc_id;
+   struct pci_dev *pci_dev;
+   void *vioc_virt;
+   unsigned long long vioc_phy;
+   void *ioapic_virt;
+   unsigned long long ioapic_phy;
+   struct vioc_intreq intreq[VIOC_INTERRUPTS_CNT];
+   struct vioc_msix_entry irqs[VIOC_INTERRUPTS_CNT];
+};
+
+/* GLOBAL VIOC Interrupt table/structure */
+struct viocdev_intreq vioc_interrupts[VIOC_MAX_VIOCS];
+
+VIOC_SLVAR();
+
+static irqreturn_t taskq_handler(int i, void *p, struct pt_regs *r)
+{
+   int intr_id = VIOC_IRQ_PARAM_INTR_ID(p);
+   int vioc_id = VIOC_IRQ_PARAM_VIOC_ID(p);
+
+   schedule_work(&vioc_interrupts[vioc_id].intreq[intr_id].taskq);
+   IRQRETURN;
+}
+
+static irqreturn_t tasklet_handler(int i, void *p, struct pt_regs *r)
+{
+   int intr_id = VIOC_IRQ_PARAM_INTR_ID(p);
+   int vioc_id = VIOC_IRQ_PARAM_VIOC_ID(p);
+
+   tasklet_schedule(&vioc_interrupts[vioc_id].intreq[intr_id].tasklet);
+   IRQRETURN;
+}
+
+static irqreturn_t direct_handler(int i, void *p, struct pt_regs *r)
+{
+   int intr_id = VIOC_IRQ_PARAM_INTR_ID(p);
+   int vioc_id = VIOC_IRQ_PARAM_VIOC_ID(p);
+
+   vioc_interrupts[vioc_id].intreq[intr_id].
+   intrFuncp(vioc_interrupts[vioc_id].intreq[intr_id].intrFuncparm);
+   IRQRETURN;
+}
+
+static int vioc_enable_msix(u32 viocdev_idx)
+{
+   struct vioc_device *viocdev = vioc_viocdev(viocdev_idx);
+   int ret;
+
+#if defined(CONFIG_MSIX_MOD)
+   ret = pci_enable_msix(viocdev->pdev,
+ (struct msix_entry *)
+ &vioc_interrupts[viocdev_idx].irqs,
+ VIOC_INTERRUPTS_CNT);
+   if (ret == 0) {
+   dev_err(&viocdev->pdev->dev, "MSI-X OK\n");
+   return VIOC_INTERRUPTS_CNT;
+   } else {
+   dev_err(&viocdev->pdev->dev,
+  

[PATCH 4/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. VIOC hardware APIs.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_api.c linux-2.6.17.vioc/drivers/net/vioc/vioc_api.c
--- linux-2.6.17/drivers/net/vioc/vioc_api.c 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_api.c 2006-10-04 10:21:45.0 -0700
@@ -0,0 +1,384 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "f7/vnic_hw_registers.h"
+#include "f7/vnic_defs.h"
+
+#include "vioc_vnic.h"
+#include "vioc_api.h"
+
+int vioc_set_rx_intr_param(int viocdev_idx, int rx_intr_id, u32 timeout, u32 cntout)
+{
+   int ret = 0;
+   struct vioc_device *viocdev;
+   u64 regaddr;
+
+   viocdev = vioc_viocdev(viocdev_idx);
+
+   regaddr = GETRELADDR(VIOC_IHCU, 0, (VREG_IHCU_RXCINTTIMER +
+   (rx_intr_id << 2)));
+   vioc_reg_wr(timeout, viocdev->ba.virt, regaddr);
+
+   regaddr = GETRELADDR(VIOC_IHCU, 0, (VREG_IHCU_RXCINTPKTCNT +
+   (rx_intr_id << 2)));
+   vioc_reg_wr(cntout, viocdev->ba.virt, regaddr);
+
+   return ret;
+}
+
+
+int vioc_get_vnic_mac(int viocdev_idx, u32 vnic_id, u8 * p)
+{
+   struct vioc_device *viocdev = vioc_viocdev(viocdev_idx);
+   u64 regaddr;
+   u32 value;
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, VREG_VENG_MACADDRLO);
+   vioc_reg_rd(viocdev->ba.virt, regaddr, &value);
+   *((u32 *) & p[2]) = htonl(value);
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, VREG_VENG_MACADDRHI);
+   vioc_reg_rd(viocdev->ba.virt, regaddr, &value);
+   *((u16 *) & p[0]) = htons(value);
+
+   return 0;
+}
+
+int vioc_set_vnic_mac(int viocdev_idx, u32 vnic_id, u8 * p)
+{
+   struct vioc_device *viocdev = vioc_viocdev(viocdev_idx);
+   u64 regaddr;
+   u32 value;
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, VREG_VENG_MACADDRLO);
+   value = ntohl(*((u32 *) & p[2]));
+
+   vioc_reg_wr(value, viocdev->ba.virt, regaddr);
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, VREG_VENG_MACADDRHI);
+   value = (ntohl(*((u32 *) & p[0])) >> 16) & 0x;
+
+   vioc_reg_wr(value, viocdev->ba.virt, regaddr);
+
+   return 0;
+}
+
+int vioc_set_txq(int viocdev_idx, u32 vnic_id, u32 txq_id, dma_addr_t base,
+u32 num_elements)
+{
+   int ret = 0;
+   u32 value;
+   struct vioc_device *viocdev;
+   u64 regaddr;
+
+   viocdev = vioc_viocdev(viocdev_idx);
+   if (vnic_id >= VIOC_MAX_VNICS)
+   goto parm_err_ret;
+
+   if (txq_id >= VIOC_MAX_TXQ)
+   goto parm_err_ret;
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, (VREG_VENG_TXD_W0 + (txq_id << 5)));
+
+   value = base;
+   vioc_reg_wr(value, viocdev->ba.virt, regaddr);
+
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, (VREG_VENG_TXD_W1 + (txq_id << 5)));
+   value = (((base >> 16) >> 16) & 0x00ff) |
+   ((num_elements << 8) & 0x0000);
+   vioc_reg_wr(value, viocdev->ba.virt, regaddr);
+
+   /*
+* Enable Interrupt-on-Empty
+*/
+   regaddr = GETRELADDR(VIOC_VENG, vnic_id, VREG_VENG_TXINTCTL);
+   vioc_reg_wr(VREG_VENG_TXINTCTL_INTONEMPTY_MASK, viocdev->ba.virt,
+   regaddr);
+
+   return ret;
+
+  parm_err_ret:
+   return -EINVAL;
+}
+
+int vioc_set_rxc(int viocdev_idx, struct rxc *rxc)
+{
+   u32 value;
+   struct vioc_device *viocdev;
+   u64 regaddr;
+   int ret = 0;
+
+   viocdev = vioc_viocdev(viocdev_idx);
+
+   regaddr = GETRELADDR(VIOC_IHCU, 0, (VREG_IHCU_RXC_LO + (rxc->rxc_id << 4)));
+   value = rxc->dma;
+   vioc_reg_wr(value, viocdev->ba.virt, regaddr);
+
+   regaddr = GETRELADDR(VIOC_IHCU, 0, (VREG_IHCU_RXC_HI + (rxc->rxc_id << 4)));
+   value = (((rxc->dma >> 16) >> 16) & 0x00ff) |
+   
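The two MAC helpers in this patch split a 6-byte address across a 16-bit "HI" register and a 32-bit "LO" register by casting into the byte array. Below is a minimal sketch of the same split using explicit shifts, which avoids unaligned pointer casts. The names `sk_mac_to_regs` and `sk_regs_to_mac` are hypothetical, and the sketch assumes `mac[0]` is the most significant byte, which is what the `ntohl()` calls in the patch imply.

```c
#include <assert.h>
#include <stdint.h>

/* Split a MAC address into the values written to the HI and LO registers. */
static void sk_mac_to_regs(const uint8_t mac[6], uint32_t *hi, uint32_t *lo)
{
	assert(mac && hi && lo);
	/* Top two bytes form the 16-bit high part. */
	*hi = ((uint32_t)mac[0] << 8) | mac[1];
	/* Remaining four bytes form the 32-bit low part. */
	*lo = ((uint32_t)mac[2] << 24) | ((uint32_t)mac[3] << 16) |
	      ((uint32_t)mac[4] << 8) | mac[5];
}

/* Rebuild the MAC address from the two register values. */
static void sk_regs_to_mac(uint32_t hi, uint32_t lo, uint8_t mac[6])
{
	assert(mac);
	mac[0] = (uint8_t)(hi >> 8);
	mac[1] = (uint8_t)hi;
	mac[2] = (uint8_t)(lo >> 24);
	mac[3] = (uint8_t)(lo >> 16);
	mac[4] = (uint8_t)(lo >> 8);
	mac[5] = (uint8_t)lo;
}
```

Byte-wise shifts give the same result on little-endian and big-endian hosts, so no `htonl()`/`ntohl()` pairing is needed in this form.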

[PATCH 5/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Device driver initialization/termination.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_vnic.h linux-2.6.17.vioc/drivers/net/vioc/vioc_vnic.h
--- linux-2.6.17/drivers/net/vioc/vioc_vnic.h 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_vnic.h 2006-10-04 10:10:04.0 -0700
@@ -0,0 +1,498 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#ifndef _VIOC_VNIC_H
+#define _VIOC_VNIC_H
+
+#include 
+#include 
+#include 
+
+#include "f7/vnic_defs.h"
+#include "f7/vnic_hw_registers.h"
+#include "f7/vioc_pkts_defs.h"
+
+/*
+ * VIOC PCI constants
+ */
+#define PCI_VENDOR_ID_FABRIC7  0xfab7
+#define PCI_DEVICE_ID_VIOC_1   0x0001
+#define PCI_DEVICE_ID_VIOC_8   0x0008
+#define PCI_DEVICE_ID_IOAPIC   0x7459
+
+#define VIOC_DRV_MODULE_NAME   "vioc"
+
+#define F7PF_HLEN_MIN   8  /* Minimal (kl=0) header */
+#define F7PF_HLEN_STD   10 /* Standard (kl=1) header */
+
+#define VNIC_MAX_MTU   9180
+#define VNIC_STD_MTU   1500
+
+/* VIOC device constants */
+#define VIOC_MAX_RXDQ  16
+#define VIOC_MAX_RXCQ  16
+#define VIOC_MAX_RXQ   4
+#define VIOC_MAX_TXQ   4
+#define VIOC_NAME_LEN  16
+
+/*
+ * VIOC device state
+ */
+
+#define VIOC_STATE_INIT 0
+#define VIOC_STATE_UP  (VIOC_STATE_INIT + 1)
+
+#define RX_DESC_SIZE   sizeof (struct rx_pktBufDesc_Phys_w)
+#define RX_DESC_QUANT  (4096/RX_DESC_SIZE)
+
+#define RXC_DESC_SIZE  sizeof (struct rxc_pktDesc_Phys_w)
+#define RXC_DESC_QUANT (4096/RXC_DESC_SIZE)
+
+#define TX_DESC_SIZE   sizeof (struct tx_pktBufDesc_Phys_w)
+#define TX_DESC_QUANT  (4096/TX_DESC_SIZE)
+
+#define RXS_DESC_SIZE  sizeof (struct rxc_pktStatusBlock_w)
+
+#define VIOC_COPYOUT_THRESHOLD 128
+#define VIOC_RXD_BATCH_BITS 32
+#define ALL_BATCH_SW_OWNED 0
+#define ALL_BATCH_HW_OWNED 0x
+
+#define VIOC_ANY_VNIC  0
+#define VIOC_NONE_TO_HW (u32) -1
+
+/*
+ * Status of the Rx operation as reflected in Rx Completion Descriptor
+ */
+#define GET_VNIC_RXC_STATUS(rxcd)  (\
+   GET_VNIC_RXC_BADCRC(rxcd) |\
+   GET_VNIC_RXC_BADLENGTH(rxcd) |\
+   GET_VNIC_RXC_BADSMPARITY(rxcd) |\
+   GET_VNIC_RXC_PKTABORT(rxcd)\
+   )
+#define VNIC_RXC_STATUS_OK_W   0
+
+#define VNIC_RXC_STATUS_MASK (\
+   VNIC_RXC_ISBADLENGTH_W | \
+   VNIC_RXC_ISBADCRC_W | \
+   VNIC_RXC_ISBADSMPARITY_W | \
+   VNIC_RXC_ISPKTABORT_W \
+   )
+
+#define VIOC_IRQ_PARAM_VIOC_ID(param)  \
+   (int) (((u64) param >> 28) & 0xf)
+#define VIOC_IRQ_PARAM_INTR_ID(param)  \
+   (int) ((u64) param & 0x)
+#define VIOC_IRQ_PARAM_PARAM_ID(param) \
+   (int) (((u64) param >> 16) & 0xff)
+
+#define VIOC_IRQ_PARAM_SET(vioc, intr, param) \
+   ((((u64) vioc & 0xf) << 28) | \
+   (((u64) param & 0xff) << 16) | \
+   ((u64) intr & 0x))
+/*
+ * Return status codes
+ */
+#define E_VIOCOK   0
+#define E_VIOCMAX  1
+#define E_VIOCINTERNAL 2
+#define E_VIOCNETREGERR 3
+#define E_VIOCPARMERR  4
+#define E_VIOCNOOP 5
+#define E_VIOCTXFULL   6
+#define E_VIOCIFNOTFOUND 7
+#define E_VIOCMALLOCERR 8
+#define E_VIOCORDERR   9
+#define E_VIOCHWACCESS 10
+#define E_VIOCHWNOTREADY 11
+#define E_ALLOCERR 12
+#define E_VIOCRXHW 13
+#define E_VIOCRXCEMPTY 14
+
+/*
+ * From the HW standpoint, every VNIC has 4 RxQ (receive queues).
+ * Every RxQ is mapped to an RxDQ (a ring with buffers for Rx packets)
+ * and an RxC queue (a ring with descriptors that reflect the status of
+ * the receive). I.e. when the VIOC receives a packet on any of the 4 RxQ,
+ * it uses the mapping to determine where to get a buffer for the packet
+ * (RxDQ) and where to post the result of the operation (RxC).
+ */
+
+struct rxd_q_prov {
+   u32 buf_size;
+   u32 entries;
+   u8 id;
+   u8 state;
+};
+
+struct vnic_prov_def {
+   struct rxd_q_prov rxd_ring[4];
+   u32 tx_entries;
+   u32 rxc_en
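The `VIOC_IRQ_PARAM_*` macros in this header pack a VIOC id, a parameter id and an interrupt id into one cookie that is handed to the interrupt handlers. Here is a minimal sketch of that packing as plain functions. The 4-bit and 8-bit field widths and their shifts come from the header; the 16-bit width of the interrupt id is an assumption, because that mask is cut off in the archived text. All `sk_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_VIOC_SHIFT  28
#define SK_VIOC_MASK   0xfu
#define SK_PARAM_SHIFT 16
#define SK_PARAM_MASK  0xffu
#define SK_INTR_MASK   0xffffu /* assumption: width is elided in the archive */

/* Pack the three ids into one 64-bit cookie. */
static uint64_t sk_irq_param_set(unsigned int vioc, unsigned int intr,
				 unsigned int param)
{
	assert(vioc <= SK_VIOC_MASK);
	return ((uint64_t)(vioc & SK_VIOC_MASK) << SK_VIOC_SHIFT) |
	       ((uint64_t)(param & SK_PARAM_MASK) << SK_PARAM_SHIFT) |
	       (uint64_t)(intr & SK_INTR_MASK);
}

/* Extract the VIOC id (bits 28..31). */
static int sk_irq_param_vioc_id(uint64_t p)
{
	return (int)((p >> SK_VIOC_SHIFT) & SK_VIOC_MASK);
}

/* Extract the parameter id (bits 16..23). */
static int sk_irq_param_param_id(uint64_t p)
{
	return (int)((p >> SK_PARAM_SHIFT) & SK_PARAM_MASK);
}

/* Extract the interrupt id (low bits). */
static int sk_irq_param_intr_id(uint64_t p)
{
	return (int)(p & SK_INTR_MASK);
}
```

Under these assumptions the cookie fits in 32 bits, so it survives being carried in a `void *` on both 32-bit and 64-bit kernels.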

[PATCH 3/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Out-of-band provisioning protocol support code.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/f7/spp.h linux-2.6.17.vioc/drivers/net/vioc/f7/spp.h
--- linux-2.6.17/drivers/net/vioc/f7/spp.h 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/f7/spp.h 2006-09-06 16:22:59.0 -0700
@@ -0,0 +1,68 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#ifndef _SPP_H_
+#define _SPP_H_
+
+#include "vnic_hw_registers.h"
+
+#define SPP_MODULE VIOC_BMC
+
+#define SPP_CMD_REG_BANK   15
+#define SPP_SIM_PMM_BANK   14
+#define SPP_PMM_BMC_BANK 13
+
+/* communications COMMAND REGISTERS */
+#define SPP_SIM_PMM_CMDREG GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R1)
+#define VIOCCP_SPP_SIM_PMM_CMDREG  \
+   VIOCCP_GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R1)
+#define SPP_PMM_SIM_CMDREG GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R2)
+#define VIOCCP_SPP_PMM_SIM_CMDREG  \
+   VIOCCP_GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R2)
+#define SPP_PMM_BMC_HB_CMDREG  GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R3)
+#define SPP_PMM_BMC_SIG_CMDREG GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R4)
+#define SPP_PMM_BMC_CMDREG GETRELADDR(SPP_MODULE, SPP_CMD_REG_BANK, VREG_BMC_REG_R5)
+
+#define SPP_BANK_ADDR(bank) GETRELADDR(SPP_MODULE, bank, VREG_BMC_REG_R0)
+
+#define SPP_SIM_PMM_DATA GETRELADDR(SPP_MODULE, SPP_SIM_PMM_BANK, VREG_BMC_REG_R0)
+#define VIOCCP_SPP_SIM_PMM_DATA \
+   VIOCCP_GETRELADDR(SPP_MODULE, SPP_SIM_PMM_BANK, VREG_BMC_REG_R0)
+
+/* PMM-BMC Sensor register bits */
+#define SPP_PMM_BMC_HB_SENREG  GETRELADDR(SPP_MODULE, 0, VREG_BMC_SENSOR0)
+#define SPP_PMM_BMC_CTL_SENREG GETRELADDR(SPP_MODULE, 0, VREG_BMC_SENSOR1)
+#define SPP_PMM_BMC_SENREG GETRELADDR(SPP_MODULE, 0, VREG_BMC_SENSOR2)
+
+/* BMC Interrupt number used to alert PMM that message has been sent */
+#define SPP_SIM_PMM_INTR   1
+#define SPP_BANK_REGS  32
+
+
+#define SPP_OK 0
+#define SPP_CHKSUM_ERR 1
+#endif /* _SPP_H_ */
+
diff -uprN linux-2.6.17/drivers/net/vioc/f7/spp_msgdata.h linux-2.6.17.vioc/drivers/net/vioc/f7/spp_msgdata.h
--- linux-2.6.17/drivers/net/vioc/f7/spp_msgdata.h 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/f7/spp_msgdata.h 2006-09-06 16:22:59.0 -0700
@@ -0,0 +1,54 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#ifndef _SPPMSGDATA_H_
+#define _SPPMSGDATA_H_
+
+#include "spp.h"
+
+/* KEYs For SPP_FACILITY_VNIC */
+#define SPP_KEY_VNIC_CTL   1
+#define SPP_KEY_SET_PROV   2
+
+/* Data Register Offset for VIOC ID parameter */
+#define SPP_VIOC_ID_IDX 0
+#define SPP_VIOC_ID_OFFSET GETRELADDR(SPP_MODULE, SPP_SIM_PMM_BANK, (VREG_BMC_REG_R0 + (SPP_VIOC_ID_IDX << 2)))
+#define VIOCCP_SPP_VIOC_ID_OFFSET VIOCCP_GETRELADDR(SPP_MODULE, SPP_SIM_PMM_BANK, (VREG_BMC_REG_R0 + (SPP_VIOC_ID_IDX << 2)))
+
+/* KEYs for  SPP_FACILITY_SYS  */
+#define SPP_KEY_REQUEST_SIGNAL 1
+
+/* Data Register Offse

[PATCH 0/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
The following patch series introduces the VIOC device driver, which
provides a network device interface to the internal fabric-interconnected
network used on servers designed and built by Fabric 7 Systems.
--
Misha Tomushev
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 9/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Packet receive code.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_receive.c linux-2.6.17.vioc/drivers/net/vioc/vioc_receive.c
--- linux-2.6.17/drivers/net/vioc/vioc_receive.c 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_receive.c 2006-10-04 10:39:10.0 -0700
@@ -0,0 +1,365 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "f7/vnic_hw_registers.h"
+#include "f7/vnic_defs.h"
+
+#include "vioc_vnic.h"
+#include "vioc_api.h"
+
+/*
+ * Receive one packet.  The VIOC is read-locked.  Since RxDQs are
+ * partitioned into independent RxSets and VNICs assigned to exactly
+ * one RxSet, no locking is needed on RxDQs or RxCQs.
+ * Return true if we got a packet, false if the queue is empty.
+ */
+int vioc_rx_pkt(struct vioc_device *viocdev, struct rxc *rxc, u32 sw_idx)
+{
+   u32 rx_status;
+   u32 vnic_id;
+   u32 rxdq_id;
+   u32 rxd_id;
+   u32 pkt_len;
+   u32 dmap_idx;
+   struct sk_buff *in_skb, *out_skb;
+   struct vnic_device *vnicdev;
+   struct rxdq *rxdq;
+   struct rxc_pktDesc_Phys_w *rxcd;
+   struct rx_pktBufDesc_Phys_w *rxd;
+
+   rxcd = &rxc->desc[sw_idx];
+   if (GET_VNIC_RXC_FLAGGED(rxcd) != VNIC_RXC_FLAGGED_HW_W)
+   return 0;   /* ring empty */
+
+   vnic_id = GET_VNIC_RXC_VNIC_ID_SHIFTED(rxcd);
+   rxdq_id = GET_VNIC_RXC_RXQ_ID_SHIFTED(rxcd);
+   rxd_id = GET_VNIC_RXC_IDX_SHIFTED(rxcd);
+   rxdq = viocdev->rxd_p[rxdq_id];
+   rxd = &rxdq->desc[rxd_id];
+
+   in_skb = (struct sk_buff *)rxdq->vbuf[rxd_id].skb;
+   BUG_ON(in_skb == NULL);
+   out_skb = in_skb;   /* default it here */
+
+   /*
+* Reset HW FLAG in this RxC Descriptor, marking it as "SW
+* acknowledged HW completion".
+*/
+   CLR_VNIC_RXC_FLAGGED(rxcd);
+
+   if (!(viocdev->vnics_map & (1 << vnic_id)))
+   /* VNIC is not enabled - silently drop packet */
+   goto out;
+
+   in_skb->dev = viocdev->vnic_netdev[vnic_id];
+   vnicdev = in_skb->dev->priv;
+   BUG_ON(vnicdev == NULL);
+
+   rx_status = GET_VNIC_RXC_STATUS(rxcd);
+
+   if (likely(rx_status == VNIC_RXC_STATUS_OK_W)) {
+
+   pkt_len = GET_NTOH_VIOC_F7PF_PKTLEN_SHIFTED(in_skb->data);
+
+   /* Copy out mice packets in ALL rings, even small */
+   if (pkt_len <= VIOC_COPYOUT_THRESHOLD) {
+   out_skb = dev_alloc_skb(pkt_len);
+   if (!out_skb)
+   goto drop;
+   out_skb->dev = in_skb->dev;
+   memcpy(out_skb->data, in_skb->data, pkt_len);
+   }
+
+   vnicdev->net_stats.rx_bytes += pkt_len;
+   vnicdev->net_stats.rx_packets++;
+   /* Set ->len and ->tail to reflect packet size */
+   skb_put(out_skb, pkt_len);
+
+   skb_pull(out_skb, F7PF_HLEN_STD);
+   out_skb->protocol = eth_type_trans(out_skb, out_skb->dev);
+
+   /* Checksum offload */
+   if (GET_VNIC_RXC_ENTAG_SHIFTED(rxcd) ==
+   VIOC_F7PF_ET_ETH_IPV4_CKS)
+   out_skb->ip_summed = CHECKSUM_UNNECESSARY;
+   else {
+   out_skb->ip_summed = CHECKSUM_HW;
+   out_skb->csum =
+   ntohs(~GET_VNIC_RXC_CKSUM_SHIFTED(rxcd) & 0x);
+   }
+
+   netif_receive_skb(out_skb);
+   } else {
+   vnicdev->net_stats.rx_errors++;
+   if (rx_status & VNIC_RXC_ISBADLENGTH_W)
+   vnicdev->net_stats.rx_length_errors++;
+   if (rx_status & VNIC_RXC_ISBADCRC_W)
+   vnicdev
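The receive path in this patch copies small packets ("mice") into a fresh buffer, presumably so the ring buffer can stay with the ring, and passes larger packets up in the buffer they arrived in. A minimal userspace sketch of that decision is below. The threshold value 128 matches `VIOC_COPYOUT_THRESHOLD` from the header patch; the `sk_` names and the use of `malloc()` in place of `dev_alloc_skb()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_COPYOUT_THRESHOLD 128 /* same value as VIOC_COPYOUT_THRESHOLD */

enum sk_rx_action {
	SK_RX_PASS_ORIGINAL, /* large packet: hand up the ring buffer itself */
	SK_RX_COPIED,        /* small packet: *out holds a private copy */
	SK_RX_DROP           /* allocation failed: drop the packet */
};

/*
 * Decide how to deliver a received packet of len bytes stored in buf.
 * On SK_RX_COPIED the caller owns *out and must free() it.
 */
static enum sk_rx_action sk_rx_copyout(const uint8_t *buf, size_t len,
				       uint8_t **out)
{
	uint8_t *copy;

	assert(buf && out);
	*out = NULL;
	if (len > SK_COPYOUT_THRESHOLD)
		return SK_RX_PASS_ORIGINAL;

	copy = malloc(len ? len : 1);
	if (!copy)
		return SK_RX_DROP;
	memcpy(copy, buf, len);
	*out = copy;
	return SK_RX_COPIED;
}
```

The trade-off is a small `memcpy()` per short packet in exchange for not having to replace a full-size receive buffer in the ring.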

[PATCH 10/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Packet transmit code.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_transmit.c linux-2.6.17.vioc/drivers/net/vioc/vioc_transmit.c
--- linux-2.6.17/drivers/net/vioc/vioc_transmit.c 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_transmit.c 2006-10-04 10:51:49.0 -0700
@@ -0,0 +1,1032 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "f7/vnic_defs.h"
+#include "f7/vioc_pkts_defs.h"
+
+#include "vioc_vnic.h"
+#include "vioc_api.h"
+
+#define VNIC_MIN_MTU   64
+#define TXQ0 0
+#define NOT_SET -1
+
+static inline u32 vnic_rd_txd_ctl(struct txq *txq)
+{
+   return readl(txq->va_of_vreg_veng_txd_ctl);
+}
+
+static inline void vnic_ring_tx_bell(struct txq *txq)
+{
+   writel(txq->shadow_VREG_VENG_TXD_CTL | VREG_VENG_TXD_CTL_QRING_MASK,
+  txq->va_of_vreg_veng_txd_ctl);
+   txq->bells++;
+}
+
+static inline void vnic_reset_tx_ring_err(struct txq *txq)
+{
+   writel(txq->shadow_VREG_VENG_TXD_CTL |
+  (VREG_VENG_TXD_CTL_QENABLE_MASK | VREG_VENG_TXD_CTL_CLEARMASK),
+  txq->va_of_vreg_veng_txd_ctl);
+}
+
+static inline void vnic_enable_tx_ring(struct txq *txq)
+{
+   txq->shadow_VREG_VENG_TXD_CTL = VREG_VENG_TXD_CTL_QENABLE_MASK;
+   writel(txq->shadow_VREG_VENG_TXD_CTL, txq->va_of_vreg_veng_txd_ctl);
+}
+
+static inline void vnic_disable_tx_ring(struct txq *txq)
+{
+   txq->shadow_VREG_VENG_TXD_CTL = 0;
+   writel(0, txq->va_of_vreg_veng_txd_ctl);
+}
+
+static inline void vnic_pause_tx_ring(struct txq *txq)
+{
+   txq->shadow_VREG_VENG_TXD_CTL |= VREG_VENG_TXD_CTL_QPAUSE_MASK;
+   writel(txq->shadow_VREG_VENG_TXD_CTL, txq->va_of_vreg_veng_txd_ctl);
+}
+
+static inline void vnic_resume_tx_ring(struct txq *txq)
+{
+   txq->shadow_VREG_VENG_TXD_CTL &= ~VREG_VENG_TXD_CTL_QPAUSE_MASK;
+   writel(txq->shadow_VREG_VENG_TXD_CTL, txq->va_of_vreg_veng_txd_ctl);
+}
+
+
+/* TxQ must be locked */
+static void vnic_reset_txq(struct vnic_device *vnicdev, struct txq *txq)
+{
+
+   struct tx_pktBufDesc_Phys_w *txd;
+   int i;
+
+   vnic_reset_tx_ring_err(txq);
+
+   /* The rest of the code is not executing
+* because so far we can't reset individual VNICs.
+* Need to (SW) Reset the whole VIOC.
+*/
+
+   vnic_disable_tx_ring(txq);
+   wmb();
+   /*
+* Clean-up all Tx Descriptors, take ownership of all
+* descriptors
+*/
+   for (i = 0; i < txq->count; i++) {
+   if (txq->desc) {
+   txd = TXD_PTR(txq, i);
+   txd->word_1 = 0;
+   txd->word_0 = 0;
+   }
+   if (txq->vbuf) {
+   if (txq->vbuf[i].dma) {
+   pci_unmap_page(vnicdev->viocdev->pdev,
+  txq->vbuf[i].dma,
+  txq->vbuf[i].length,
+  PCI_DMA_TODEVICE);
+   txq->vbuf[i].dma = 0;
+   }
+
+   /* Free skb; should be for SOP (in case of frags) only */
+   if (txq->vbuf[i].skb) {
+   dev_kfree_skb_any((struct sk_buff *)txq->
+ vbuf[i].skb);
+   txq->vbuf[i].skb = NULL;
+   }
+   }
+   }
+   txq->next_to_clean = 0;
+   txq->next_to_use = 0;
+   txq->empty = txq->count;
+   wmb()
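The inline helpers in this patch keep a software "shadow" of the TxD control register, so enable and pause bits can be changed without reading the device back, while the doorbell bit is ORed in for a single write and never stored. A minimal sketch of that pattern against a fake register follows. The bit values and all `sk_` names are hypothetical; the real masks come from the hardware register header, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this sketch. */
#define SK_QENABLE 0x1u
#define SK_QPAUSE  0x2u
#define SK_QRING   0x4u

struct sk_txq {
	uint32_t shadow;        /* last persistent value written */
	volatile uint32_t *reg; /* stands in for the mapped control register */
	unsigned int bells;     /* doorbell counter, as in the patch */
};

static void sk_write(struct sk_txq *q, uint32_t v)
{
	assert(q && q->reg);
	*q->reg = v;
}

static void sk_enable_tx_ring(struct sk_txq *q)
{
	q->shadow = SK_QENABLE;
	sk_write(q, q->shadow);
}

static void sk_pause_tx_ring(struct sk_txq *q)
{
	q->shadow |= SK_QPAUSE;
	sk_write(q, q->shadow);
}

static void sk_resume_tx_ring(struct sk_txq *q)
{
	q->shadow &= ~SK_QPAUSE;
	sk_write(q, q->shadow);
}

/* The ring bit is a one-shot command, so it is not saved in the shadow. */
static void sk_ring_tx_bell(struct sk_txq *q)
{
	sk_write(q, q->shadow | SK_QRING);
	q->bells++;
}
```

Keeping the one-shot bit out of the shadow means a later pause or resume write cannot ring the doorbell again by accident.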

[PATCH 8/10] VIOC: New Network Device Driver

2006-10-05 Thread Misha Tomushev
Adding VIOC device driver. Device driver provisioning settings.

Signed-off-by: Misha Tomushev  <[EMAIL PROTECTED]>

diff -uprN linux-2.6.17/drivers/net/vioc/vioc_provision.c linux-2.6.17.vioc/drivers/net/vioc/vioc_provision.c
--- linux-2.6.17/drivers/net/vioc/vioc_provision.c 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.17.vioc/drivers/net/vioc/vioc_provision.c 2006-10-03 12:17:03.0 -0700
@@ -0,0 +1,226 @@
+/*
+ * Fabric7 Systems Virtual IO Controller Driver
+ * Copyright (C) 2003-2005 Fabric7 Systems.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ * USA
+ *
+ * http://www.fabric7.com/
+ *
+ * Maintainers:
+ *[EMAIL PROTECTED]
+ *
+ *
+ */
+#include "f7/vnic_hw_registers.h"
+#include "vioc_vnic.h"
+
+/*
+ * Standard parameters for ring provisioning.  Single TxQ per VNIC.
+ * Two RX sets per VIOC, with 3 RxDs, 1 RxC, 1 Rx interrupt per set.
+ */
+
+#define TXQ_ENTRIES 1024
+#define TX_INTR_ON_EMPTY   false
+
+/* RXDQ sizes (entry counts) must be multiples of this */
+#define RXDQ_ALIGN VIOC_RXD_BATCH_BITS
+#define RXDQ_ENTRIES 1024
+
+#define RXDQ_JUMBO_ENTRIES ALIGN(RXDQ_ENTRIES, RXDQ_ALIGN)
+#define RXDQ_STD_ENTRIES ALIGN(RXDQ_ENTRIES, RXDQ_ALIGN)
+#define RXDQ_SMALL_ENTRIES ALIGN(RXDQ_ENTRIES, RXDQ_ALIGN)
+#define RXDQ_EXTRA_ENTRIES ALIGN(RXDQ_ENTRIES, RXDQ_ALIGN)
+
+#define RXC_ENTRIES (RXDQ_JUMBO_ENTRIES+RXDQ_STD_ENTRIES+RXDQ_SMALL_ENTRIES+RXDQ_EXTRA_ENTRIES)
+
+#define RXDQ_JUMBO_BUFSIZE (VNIC_MAX_MTU+ETH_HLEN+F7PF_HLEN_STD)
+#define RXDQ_STD_BUFSIZE (VNIC_STD_MTU+ETH_HLEN+F7PF_HLEN_STD)
+#define RXDQ_SMALL_BUFSIZE (256+ETH_HLEN+F7PF_HLEN_STD)
+
+#define RXDQ_JUMBO_ALLOC_BUFSIZE ALIGN(RXDQ_JUMBO_BUFSIZE,64)
+#define RXDQ_STD_ALLOC_BUFSIZE ALIGN(RXDQ_STD_BUFSIZE,64)
+#define RXDQ_SMALL_ALLOC_BUFSIZE ALIGN(RXDQ_SMALL_BUFSIZE,64)
+
+/*
+ Every entry in this structure is defined as follows:
+
+struct vnic_prov_def {
+   struct rxd_q_prov rxd_ring[4];
+   u32  tx_entries;Size of Tx Ring
+   u32  rxc_entries;   Size of Rx Completion Ring
+   u8  rxc_id; Rx Completion queue ID
+   u8  rxc_intr_id;INTR servicing the above Rx Completion queue
+};
+
+The 4 rxd_q_prov structures of rxd_ring[] array define  Rx queues per VNIC.
+struct rxd_q_prov {
+   u32buf_size;Buffer size
+   u32entries; Size of the queue
+   u8 id;  Queue id
+   u8 state;   Provisioning state 1-ena, 0-dis
+};
+
+*/
+
+struct vnic_prov_def vnic_set_0 = {
+   .rxd_ring[0].buf_size = RXDQ_SMALL_ALLOC_BUFSIZE,
+   .rxd_ring[0].entries = RXDQ_SMALL_ENTRIES,
+   .rxd_ring[0].id = 0,
+   .rxd_ring[0].state = 1,
+   .rxd_ring[1].buf_size = RXDQ_STD_ALLOC_BUFSIZE,
+   .rxd_ring[1].entries = RXDQ_STD_ENTRIES,
+   .rxd_ring[1].id = 1,
+   .rxd_ring[1].state = 1,
+   .rxd_ring[2].buf_size = RXDQ_JUMBO_ALLOC_BUFSIZE,
+   .rxd_ring[2].entries = RXDQ_JUMBO_ENTRIES,
+   .rxd_ring[2].id = 2,
+   .rxd_ring[2].state = 1,
+   .tx_entries = TXQ_ENTRIES,
+   .rxc_entries = RXC_ENTRIES,
+   .rxc_id = 0,
+   .rxc_intr_id = 0
+};
+
+struct vnic_prov_def vnic_set_1 = {
+   .rxd_ring[0].buf_size = RXDQ_SMALL_ALLOC_BUFSIZE,
+   .rxd_ring[0].entries = RXDQ_SMALL_ENTRIES,
+   .rxd_ring[0].id = 4,
+   .rxd_ring[0].state = 1,
+   .rxd_ring[1].buf_size = RXDQ_STD_ALLOC_BUFSIZE,
+   .rxd_ring[1].entries = RXDQ_STD_ENTRIES,
+   .rxd_ring[1].id = 5,
+   .rxd_ring[1].state = 1,
+   .rxd_ring[2].buf_size = RXDQ_JUMBO_ALLOC_BUFSIZE,
+   .rxd_ring[2].entries = RXDQ_JUMBO_ENTRIES,
+   .rxd_ring[2].id = 6,
+   .rxd_ring[2].state = 1,
+   .tx_entries = TXQ_ENTRIES,
+   .rxc_entries = RXC_ENTRIES,
+   .rxc_id = 1,
+   .rxc_intr_id = 1
+};
+
+struct vnic_prov_def vnic_set_2 = {
+   .rxd_ring[0].buf_size = RXDQ_SMALL_ALLOC_BUFSIZE,
+   .rxd_ring[0].entries = RXDQ_SMALL_ENTRIES,
+   .rxd_ring[0].id = 8,
+   .rxd_ring[0].state = 1,
+   .rxd_ring[1].buf_size = RXDQ_STD_ALLOC_BUFSIZE,
+   .rxd_ring[1].e
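
The sizing arithmetic behind the provisioning macros above can be sketched in plain userspace C. This is only an illustration under assumptions: ALIGN_UP() is a local re-implementation of the kernel's power-of-two ALIGN(), rx_alloc_bufsize() and rxc_entries() are hypothetical helper names, and the fabric header length passed in is a made-up value because F7PF_HLEN_STD is not defined in the part of the patch shown here.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of a. Like the kernel's ALIGN(), this
 * only works when a is a power of two.
 */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Allocation size for one Rx buffer: payload plus Ethernet header plus
 * fabric header, rounded up to 64 bytes as the *_ALLOC_BUFSIZE macros do.
 * fabric_hlen is a placeholder for F7PF_HLEN_STD (value assumed).
 */
static size_t rx_alloc_bufsize(size_t mtu, size_t eth_hlen, size_t fabric_hlen)
{
    return ALIGN_UP(mtu + eth_hlen + fabric_hlen, 64);
}

/*
 * The completion ring must be able to hold one completion per descriptor
 * in every descriptor ring that feeds it, hence the sum.
 */
static size_t rxc_entries(const size_t *ring_entries, size_t nrings)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < nrings; i++)
        total += ring_entries[i];
    return total;
}
```

With a 14-byte Ethernet header and an assumed 8-byte fabric header, a 256-byte small buffer needs 278 bytes and is allocated as 320.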

Re: [PATCH][RFC] net/ipv6: separate sit driver to extra module

2006-10-05 Thread James Morris
On Thu, 5 Oct 2006, Joerg Roedel wrote:

> Is there a reason why the tunnel driver for IPv6-in-IPv4 is currently
> compiled into the ipv6 module? This driver is only needed in gateways
> between different IPv6 networks. On all other hosts with ipv6 enabled it
> is not required. Having this driver in a separate module will save
> memory on those machines.
> I appended a small and trivial patch to 2.6.18 which does exactly this.

Looks ok to me, although given that users used to get this by default when 
selecting IPv6, perhaps the default in Kconfig should be y.



- James
-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] net/ipv6: separate sit driver to extra module

2006-10-05 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Thu, 5 Oct 2006 11:49:38 -0400 (EDT)), James 
Morris <[EMAIL PROTECTED]> says:

> On Thu, 5 Oct 2006, Joerg Roedel wrote:
> 
> > Is there a reason why the tunnel driver for IPv6-in-IPv4 is currently
> > compiled into the ipv6 module? This driver is only needed in gateways
> > between different IPv6 networks. On all other hosts with ipv6 enabled it
> > is not required. Having this driver in a separate module will save
> > memory on those machines.
> > I appended a small and trivial patch to 2.6.18 which does exactly this.
> 
> Looks ok to me, although given that users used to get this by default when 
> selecting IPv6, perhaps the default in Kconfig should be y.

Agreed.  And, we could add #ifdef in addrconf.c.

--yoshfuji


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox
On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:

> Please don't snip the Code: line. It is fairly important.

Sorry about that. The remote console I was using appears to overwrite
some text after I force the reboot. Here's a clean one.

global 
Unable to handle kernel NULL pointer dereference at 0827 RIP:
 [] xfrm_register_mode+0x36/0x60
PGD 0
Oops:  [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18-git22 #3
RIP: 0010:[]  [] 
xfrm_register_mode+0x36/0x60
RSP: :810bffcbded0  EFLAGS: 00010286
RAX: 081f RBX: 805588a0 RCX: 
RDX:  RSI: 0046 RDI: 80559550
RBP: ffef R08: 7a02 R09: 000e
R10: 0006 R11: 80334660 R12: 
R13: 810bffcbdef0 R14:  R15: 
FS:  () GS:805d2000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0827 CR3: 00201000 CR4: 06e0
Process swapper (pid: 1, threadinfo 810bffcbc000, task 810bffcbb4e0)
Stack:   8061fb48  80207182
    
    0009
Call Trace:
 [] init+0x162/0x330
 [] child_rip+0xa/0x12
 [] acpi_ds_init_one_object+0x0/0x82
 [] init+0x0/0x330
 [] child_rip+0x0/0x12


Code: 48 83 78 08 00 75 06 48 89 58 08 31 ed 48 89 d7 e8 65 fd ff
RIP  [] xfrm_register_mode+0x36/0x60
 RSP 
CR2: 0827
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

> My guess is that something is wrong with the global variable it is accessing.
> Can you post the output of grep -5 xfrm_policy_afinfo ? 

elm3b239:/boot # grep -5 xfrm_policy_afinfo System.map-2.6.18-git22
805594c0 d xfrm4_state_afinfo
80559500 D xfrm_cfg_mutex
80559530 d xfrm_dev_notifier
80559548 d xfrm_policy_lock
8055954c d xfrm_policy_gc_lock
80559550 d xfrm_policy_afinfo_lock
80559560 d xfrm_hash_work
805595c0 d hash_resize_mutex
80559600 D sysctl_xfrm_aevent_etime
80559604 D sysctl_xfrm_aevent_rseqth
80559610 D km_waitq
--
8075bfd8 b idiagnl
8075bfe0 B xfrm_policy_count
8075bff8 b xfrm_policy_gc_list
8075c000 b dummy.28400
8075c038 b idx_generator.27450
8075c040 b xfrm_policy_afinfo
8075c140 b xfrm_policy_gc_work
8075c1a0 b xfrm_policy_inexact
8075c1e0 B xfrm_nl
8075c1e8 b xfrm_state_gc_list
8075c1f0 b acqseq.27386

> And please add a 
> printk("global %p\n",  xfrm_policy_afinfo[family]);
> at the beginning of net/xfrm/xfrm_policy.c:xfrm_policy_lock_afinfo
> and post the output.

Included above.

-- 

Steve Fox
IBM Linux Technology Center
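
A worked check of the oops above, offered as a sketch rather than a diagnosis. The first bytes of the Code: line, 48 83 78 08 00, decode as cmpq $0x0,0x8(%rax): REX.W prefix 48, opcode 83 with /7 (CMP), ModRM byte 78, disp8 08, imm8 00. With RAX = 0x81f the memory operand is at 0x81f + 8 = 0x827, which agrees with CR2. Two assumptions are made here: that the listed bytes start at the faulting instruction (the usual <..> marker does not appear in the log as archived), and that the archived values 081f and 0827 are the low bits of otherwise-zero registers. The helper names decode_modrm() and effective_address() are made up for this example.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = byte >> 6;
    m.reg = (byte >> 3) & 7;
    m.rm = byte & 7;
    return m;
}

/* Address of a [base + disp8] memory operand; disp8 is sign-extended. */
static uint64_t effective_address(uint64_t base, int8_t disp8)
{
    return base + (uint64_t)(int64_t)disp8;
}
```

For ModRM 0x78 this gives mod = 1 (8-bit displacement follows), reg = 7 (the /7 opcode extension) and rm = 0 (RAX as base), so the instruction reads memory at RAX + 8.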


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen
On Thursday 05 October 2006 19:57, Steve Fox wrote:
> On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
> 
> > Please don't snip the Code: line. It is fairly important.
> 
> Sorry about that. The remote console I was using appears to overwrite
> some text after I force the reboot. Here's a clean one.
> 
> global 

Ok that definitely shouldn't be in there.

I guess we need to track when it gets corrupted. Can you send the full
boot log with this patch applied?


-Andi

Index: linux-2.6.19-rc1-hack/init/main.c
===
--- linux-2.6.19-rc1-hack.orig/init/main.c
+++ linux-2.6.19-rc1-hack/init/main.c
@@ -75,6 +75,9 @@
 
 static int init(void *);
 
+extern void bugcheck(char *, int);
+#define CHECK bugcheck(__FILE__, __LINE__)
+
 extern void init_IRQ(void);
 extern void fork_init(unsigned long);
 extern void mca_init(void);
@@ -480,6 +483,8 @@ asmlinkage void __init start_kernel(void
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
 
+   CHECK;
+
smp_setup_processor_id();
 
/*
@@ -502,7 +507,9 @@ asmlinkage void __init start_kernel(void
page_address_init();
printk(KERN_NOTICE);
printk(linux_banner);
+   CHECK;
setup_arch(&command_line);
+   CHECK;
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
 
@@ -517,6 +524,7 @@ asmlinkage void __init start_kernel(void
 * fragile until we cpu_idle() for the first time.
 */
preempt_disable();
+   CHECK;
build_all_zonelists();
page_alloc_init();
printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
@@ -525,6 +533,7 @@ asmlinkage void __init start_kernel(void
   __stop___param - __start___param,
   &unknown_bootoption);
sort_main_extable();
+   CHECK;
trap_init();
rcu_init();
init_IRQ();
@@ -533,8 +542,10 @@ asmlinkage void __init start_kernel(void
hrtimers_init();
softirq_init();
timekeeping_init();
+   CHECK;
time_init();
profile_init();
+   CHECK;
if (!irqs_disabled())
printk("start_kernel(): bug: interrupts were enabled early\n");
early_boot_irqs_on();
@@ -568,7 +579,9 @@ asmlinkage void __init start_kernel(void
 #endif
vfs_caches_init_early();
cpuset_init_early();
+   CHECK;
mem_init();
+   CHECK;
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
@@ -577,6 +590,7 @@ asmlinkage void __init start_kernel(void
calibrate_delay();
pidmap_init();
pgtable_cache_init();
+   CHECK;
prio_tree_init();
anon_vma_init();
 #ifdef CONFIG_X86
@@ -586,12 +600,14 @@ asmlinkage void __init start_kernel(void
fork_init(num_physpages);
proc_caches_init();
buffer_init();
+   CHECK;
unnamed_dev_init();
key_init();
security_init();
vfs_caches_init(num_physpages);
radix_tree_init();
signals_init();
+   CHECK;
/* rootfs populating might need page-writeback */
page_writeback_init();
 #ifdef CONFIG_PROC_FS
@@ -599,6 +615,7 @@ asmlinkage void __init start_kernel(void
 #endif
cpuset_init();
taskstats_init_early();
+   CHECK;
delayacct_init();
 
check_bugs();
@@ -609,7 +626,7 @@ asmlinkage void __init start_kernel(void
rest_init();
 }
 
-static int __initdata initcall_debug;
+static int __initdata initcall_debug = 1;
 
 static int __init initcall_debug_setup(char *str)
 {
@@ -639,7 +656,11 @@ static void __init do_initcalls(void)
printk("\n");
}
 
+   CHECK;
+
result = (*call)();
+   
+   CHECK;
 
if (result && result != -ENODEV && initcall_debug) {
sprintf(msgbuf, "error code %d", result);
@@ -725,21 +746,32 @@ static int init(void * unused)
 
smp_prepare_cpus(max_cpus);
 
+   CHECK;
+
do_pre_smp_initcalls();
 
smp_init();
+
+   CHECK;
+
sched_init_smp();
 
cpuset_init_smp();
 
+   CHECK;
+
/*
 * Do this before initcalls, because some drivers want to access
 * firmware files.
 */
populate_rootfs();
 
+   CHECK;
+
do_basic_setup();
 
+   CHECK;
+
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
Index: linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
===
--- linux-2.6.19-rc1-hack.orig/net/xfrm/xfrm_policy.c
+++ linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
@@ -39,6 +39,16 @@ EXPORT_SYMBOL(xfrm_policy_count);
 static DEFINE_RWLOCK(xfrm_policy_afinfo_lock);
 sta

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox
On Thu, 2006-10-05 at 20:27 +0200, Andi Kleen wrote:

> I guess we need to track when it gets corrupted. Can you send the full
> boot log with this patch applied?

Here she blows!

root (hd0,0)
 Filesystem type is reiserfs, partition type 0x83
kernel /boot/vmlinuz-autobench root=/dev/sda1 vga=791
ip=9.47.67.239:9.47.67.5
0:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts console=tty0
console=ttyS0,
57600 autobench_args: root=/dev/sda1 ABAT:1160073474
   [Linux-bzImage, setup=0x1400, size=0x1dd755]
initrd /boot/initrd-autobench.img
   [Linux-initrd @ 0x37ceb000, 0x304c57 bytes]

Linux version 2.6.18-git22 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE
Linux)) #4 SMP Thu Oct 5 11:36:21 PDT 2006
Command line: root=/dev/sda1 vga=791
ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1
showopts console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1
ABAT:1160073474
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009ac00 (usable)
 BIOS-e820: 0009ac00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - bff764c0 (usable)
 BIOS-e820: bff764c0 - bff98880 (ACPI data)
 BIOS-e820: bff98880 - c000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
 BIOS-e820: 0001 - 000c (usable)
end_pfn_map = 12582912
DMI 2.3 present.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 -> 12582912
early_node_map[3] active PFN ranges
    0:        0 ->      154
    0:      256 ->   786294
    0:  1048576 -> 12582912
ACPI: PM-Timer IO Port: 0x9c
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x10] enabled)
Processor #16
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x11] enabled)
Processor #17
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x16] enabled)
Processor #22
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x17] enabled)
Processor #23
ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] enabled)
Processor #32
ACPI: LAPIC (acpi_id[0x11] lapic_id[0x21] enabled)
Processor #33
ACPI: LAPIC (acpi_id[0x12] lapic_id[0x26] enabled)
Processor #38
ACPI: LAPIC (acpi_id[0x13] lapic_id[0x27] enabled)
Processor #39
ACPI: LAPIC (acpi_id[0x14] lapic_id[0x30] enabled)
Processor #48
ACPI: LAPIC (acpi_id[0x15] lapic_id[0x31] enabled)
Processor #49
ACPI: LAPIC (acpi_id[0x16] lapic_id[0x36] enabled)
Processor #54
ACPI: LAPIC (acpi_id[0x17] lapic_id[0x37] enabled)
Processor #55
ACPI: LAPIC (acpi_id[0x20] lapic_id[0x40] enabled)
Processor #64
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x21] lapic_id[0x41] enabled)
Processor #65
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x22] lapic_id[0x46] enabled)
Processor #70
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x23] lapic_id[0x47] enabled)
Processor #71
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x24] lapic_id[0x50] enabled)
Processor #80
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x25] lapic_id[0x51] enabled)
Processor #81
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x26] lapic_id[0x56] enabled)
Processor #86
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x27] lapic_id[0x57] enabled)
Processor #87
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x30] lapic_id[0x60] enabled)
Processor #96
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x31] lapic_id[0x61] enabled)
Processor #97
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x32] lapic_id[0x66] enabled)
Processor #102
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x33] lapic_id[0x67] enabled)
Processor #103
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x34] lapic_id[0x70] enabled)
Processor #112
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x35] lapic_id[0x71] enabled)
Processor #113
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x36] lapic_id[0x76] enabled)
Processor #118
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC (acpi_id[0x37] lapic_id[0x77] enabled)
Processor #119
WARNING: NR_CPUS limit of 16 reached. Processor ignored.
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Vivek Goyal
On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
> On Thursday 05 October 2006 19:57, Steve Fox wrote:
> > On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
> > 
> > > Please don't snip the Code: line. It is fairly important.
> > 
> > Sorry about that. The remote console I was using appears to overwrite
> > some text after I force the reboot. Here's a clean one.
> > 
> > global 
> 
> Ok that definitely shouldn't be in there.
> 
> I guess we need to track when it gets corrupted. Can you send the full
> boot log with this patch applied?
> 

Just recalled one more observation about the problem when keith had
reported it last. If I just move .bss before .data_nosave instead
of it being at the end, keith's problem had disappeared.

Thanks
Vivek


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen
On Thursday 05 October 2006 20:51, Steve Fox wrote:
> On Thu, 2006-10-05 at 20:27 +0200, Andi Kleen wrote:
> 
> > I guess we need to track when it gets corrupted. Can you send the full
> > boot log with this patch applied?
> 
> Here she blows!

Can you please try it again with this patch to narrow it down further?

-Andi

Index: linux-2.6.19-rc1-hack/init/main.c
===
--- linux-2.6.19-rc1-hack.orig/init/main.c
+++ linux-2.6.19-rc1-hack/init/main.c
@@ -75,6 +75,9 @@
 
 static int init(void *);
 
+extern void bugcheck(char *, int);
+#define CHECK bugcheck(__FILE__, __LINE__)
+
 extern void init_IRQ(void);
 extern void fork_init(unsigned long);
 extern void mca_init(void);
@@ -480,6 +483,8 @@ asmlinkage void __init start_kernel(void
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
 
+   CHECK;
+
smp_setup_processor_id();
 
/*
@@ -502,7 +507,9 @@ asmlinkage void __init start_kernel(void
page_address_init();
printk(KERN_NOTICE);
printk(linux_banner);
+   CHECK;
setup_arch(&command_line);
+   CHECK;
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
 
@@ -517,6 +524,7 @@ asmlinkage void __init start_kernel(void
 * fragile until we cpu_idle() for the first time.
 */
preempt_disable();
+   CHECK;
build_all_zonelists();
page_alloc_init();
printk(KERN_NOTICE "Kernel command line: %s\n", saved_command_line);
@@ -525,6 +533,7 @@ asmlinkage void __init start_kernel(void
   __stop___param - __start___param,
   &unknown_bootoption);
sort_main_extable();
+   CHECK;
trap_init();
rcu_init();
init_IRQ();
@@ -533,8 +542,10 @@ asmlinkage void __init start_kernel(void
hrtimers_init();
softirq_init();
timekeeping_init();
+   CHECK;
time_init();
profile_init();
+   CHECK;
if (!irqs_disabled())
printk("start_kernel(): bug: interrupts were enabled early\n");
early_boot_irqs_on();
@@ -568,7 +579,9 @@ asmlinkage void __init start_kernel(void
 #endif
vfs_caches_init_early();
cpuset_init_early();
+   CHECK;
mem_init();
+   CHECK;
kmem_cache_init();
setup_per_cpu_pageset();
numa_policy_init();
@@ -577,6 +590,7 @@ asmlinkage void __init start_kernel(void
calibrate_delay();
pidmap_init();
pgtable_cache_init();
+   CHECK;
prio_tree_init();
anon_vma_init();
 #ifdef CONFIG_X86
@@ -586,12 +600,14 @@ asmlinkage void __init start_kernel(void
fork_init(num_physpages);
proc_caches_init();
buffer_init();
+   CHECK;
unnamed_dev_init();
key_init();
security_init();
vfs_caches_init(num_physpages);
radix_tree_init();
signals_init();
+   CHECK;
/* rootfs populating might need page-writeback */
page_writeback_init();
 #ifdef CONFIG_PROC_FS
@@ -599,6 +615,7 @@ asmlinkage void __init start_kernel(void
 #endif
cpuset_init();
taskstats_init_early();
+   CHECK;
delayacct_init();
 
check_bugs();
@@ -609,7 +626,7 @@ asmlinkage void __init start_kernel(void
rest_init();
 }
 
-static int __initdata initcall_debug;
+static int __initdata initcall_debug = 1;
 
 static int __init initcall_debug_setup(char *str)
 {
@@ -639,7 +656,11 @@ static void __init do_initcalls(void)
printk("\n");
}
 
+   CHECK;
+
result = (*call)();
+   
+   CHECK;
 
if (result && result != -ENODEV && initcall_debug) {
sprintf(msgbuf, "error code %d", result);
@@ -725,21 +746,32 @@ static int init(void * unused)
 
smp_prepare_cpus(max_cpus);
 
+   CHECK;
+
do_pre_smp_initcalls();
 
smp_init();
+
+   CHECK;
+
sched_init_smp();
 
cpuset_init_smp();
 
+   CHECK;
+
/*
 * Do this before initcalls, because some drivers want to access
 * firmware files.
 */
populate_rootfs();
 
+   CHECK;
+
do_basic_setup();
 
+   CHECK;
+
/*
 * check if there is an early userspace init.  If yes, let it do all
 * the work
Index: linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
===
--- linux-2.6.19-rc1-hack.orig/net/xfrm/xfrm_policy.c
+++ linux-2.6.19-rc1-hack/net/xfrm/xfrm_policy.c
@@ -39,6 +39,16 @@ EXPORT_SYMBOL(xfrm_policy_count);
 static DEFINE_RWLOCK(xfrm_policy_afinfo_lock);
 static struct xfrm_policy_afinfo *xfrm_policy_afinfo[NPROTO];
 
+void bugcheck(char *where, int line)
+{
+   int i;
+   for (i = 0; i < NPROTO; i++)
+ 
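
The checkpoint idea in this patch can be tried out in userspace. The sketch below is a stand-in, not Andi's code: the body of bugcheck() is cut off above, so the loop body, the int return value, the message text, the array name watched[] and the size NSLOTS are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define NSLOTS 32                /* hypothetical stand-in for NPROTO */

/* Table that is expected to stay all-NULL until something registers in it. */
static void *watched[NSLOTS];

/* Location of the first checkpoint that saw a non-NULL slot. */
static const char *first_bad_file;
static int first_bad_line;

/*
 * Scan the table and report the calling location if any slot is set.
 * Returns 1 if corruption was seen, 0 otherwise (the patch's version
 * returns void; the return value is added here to make it testable).
 */
static int bugcheck(const char *file, int line)
{
    int i;

    for (i = 0; i < NSLOTS; i++) {
        if (watched[i] == NULL)
            continue;
        if (first_bad_file == NULL) {
            first_bad_file = file;
            first_bad_line = line;
        }
        fprintf(stderr, "table corrupted at %s:%d (slot %d = %p)\n",
                file, line, i, watched[i]);
        return 1;
    }
    return 0;
}

/* Drop CHECK; between initialisation steps to bracket the culprit. */
#define CHECK bugcheck(__FILE__, __LINE__)
```

The last checkpoint that passes and the first one that fails bracket the code that scribbled on the table, which is why the patch sprinkles CHECK between the init calls.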

Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Andi Kleen
On Thursday 05 October 2006 20:52, Vivek Goyal wrote:
> On Thu, Oct 05, 2006 at 08:27:02PM +0200, Andi Kleen wrote:
> > On Thursday 05 October 2006 19:57, Steve Fox wrote:
> > > On Thu, 2006-10-05 at 17:40 +0200, Andi Kleen wrote:
> > > 
> > > > Please don't snip the Code: line. It is fairly important.
> > > 
> > > Sorry about that. The remote console I was using appears to overwrite
> > > some text after I force the reboot. Here's a clean one.
> > > 
> > > global 
> > 
> > Ok that definitely shouldn't be in there.
> > 
> > I guess we need to track when it gets corrupted. Can you send the full
> > boot log with this patch applied?
> > 
> 
> Just recalled one more observation about the problem when keith had
> reported it last. If I just move .bss before .data_nosave instead
> of it being at the end, keith's problem had disappeared.

Yes, it could well be something in the new bootmap
management.  Steve's box failed at

Using ACPI (MADT) for SMP configuration information
Nosave address range: 0009a000 - 0009b000
Nosave address range: 0009b000 - 000a
Nosave address range: 000a - 000e
Nosave address range: 000e - 0010
Nosave address range: bff76000 - bff77000
Nosave address range: bff77000 - bff98000
Nosave address range: bff98000 - bff99000
Nosave address range: bff99000 - c000
Nosave address range: c000 - fec0
Nosave address range: fec0 - 0001
Allocating PCI resources starting at c400 (gap: c000:3ec0)
afinfo corrupted at init/main.c:512

which is directly after that code does lots of stuff.

Mel might want to take a look (and perhaps
also cut down a little on the ugly printks ...) 

BTW, I have now found one of my own test systems that prints a lot of
the "Bad page state" messages shown below. I'm about to leave for
vacation, so I won't have time to track it down any time soon, but here
it is for reference.

-Andi

Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 800
Bad page state in process 'swapper'
page:810003ee5480 flags:0x mapping: 
mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:   

Call Trace:  
 [] show_trace+0x34/0x47
 [] dump_stack+0x12/0x17
 [] bad_page+0x57/0x81
 [] __free_pages_ok+0x64/0x247
 [] free_all_bootmem_core+0xcc/0x1a9
 [] numa_free_all_bootmem+0x3b/0x77
 [] mem_init+0x44/0x186
 [] start_kernel+0x17b/0x207
 [] _sinittext+0x168/0x16c

Bad page state in process 'swapper'
page:810003ee54b8 flags:0x mapping: 
mapcount:1 count:0
Trying to fix it up, but a reboot is needed
Backtrace:   

Call Trace:  
 [] show_trace+0x34/0x47
 [] dump_stack+0x12/0x17
 [] bad_page+0x57/0x81
 [] __free_pages_ok+0x64/0x247
 [] free_all_bootmem_core+0xcc/0x1a9
 [] numa_free_all_bootmem+0x3b/0x77
 [] mem_init+0x44/0x186
 [] start_kernel+0x17b/0x207
 [] _sinittext+0x168/0x16c


... lots more of those ...


Please pull bcm43xx-d80211 fixes

2006-10-05 Thread Michael Buesch
Hi John,

I updated the "for-linville" branch of my tree with one important
bugfix and a small cleanup.
In case you didn't pull, yet, this will pull all changes of my previous
pull request plus the following two.

bcm43xx-d80211: Wait for the firmware to respond, before we read revision codes.
http://bu3sch.de/gitweb?p=wireless-dev.git;a=commitdiff;h=faac518bf4a2d2846a7153b0b4f8b99ff8db4166

bcm43xx-d80211: Remove unused "err" variables.
http://bu3sch.de/gitweb?p=wireless-dev.git;a=commitdiff;h=455ae5bb4ee0b18ed06ffee0d89b92a8fca3f217


The old branch (as I submitted it in my previous pull request) is
still available for reference as "old-for-linville". But I don't think
you'll need it. Simply pull from "for-linville" now, please.

-- 
Greetings Michael.


Re: 2.6.18-mm2 boot failure on x86-64

2006-10-05 Thread Steve Fox
On Thu, 2006-10-05 at 21:08 +0200, Andi Kleen wrote:

> Mel might want to take a look (and perhaps
> also cut down a little on the ugly printks ...) 

I tested a patch from Mel which backs out the arch independent zone
sizing and got the same results (to my inexperienced eye). I've sent him
the boot log to verify they really are the same as without this
back-out.

-- 

Steve Fox
IBM Linux Technology Center


[PATCH 0/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
This version takes into account David Miller's comments
regarding the treatment of security layer errors in the case
of socket policies. Specifically, these errors are now
treated the same way as they are for the main/sub policies,
which is to return a full lookup failure.

 include/linux/security.h        |   24 ++-
 include/net/flow.h              |    2
 include/net/xfrm.h              |    3
 net/core/flow.c                 |   42
 net/ipv4/xfrm4_policy.c         |    2
 net/ipv6/xfrm6_policy.c         |    2
 net/key/af_key.c                |    5 -
 net/xfrm/xfrm_policy.c          |  101 ++
 net/xfrm/xfrm_user.c            |    9 --
 security/dummy.c                |    3
 security/selinux/include/xfrm.h |    3
 security/selinux/xfrm.c         |   53 ---
 12 files changed, 162 insertions(+), 87 deletions(-)


[PATCH 3/3] Fix for IPsec leakage with SELinux enabled - V.03: Process security errors for socket policies also

2006-10-05 Thread Venkat Yekkirala
This treats security errors encountered during socket policy
matching the same way they are treated for main/sub policies,
which is to return a full lookup failure.

Signed-off-by: Venkat Yekkirala <[EMAIL PROTECTED]>
---
 net/xfrm/xfrm_policy.c |   26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

--- net-2.6.sid5/net/xfrm/xfrm_policy.c 2006-10-05 14:36:07.0 -0500
+++ net-2.6/net/xfrm/xfrm_policy.c  2006-10-05 14:38:32.0 -0500
@@ -1013,12 +1013,16 @@ static struct xfrm_policy *xfrm_sk_polic
sk->sk_family);
int err = 0;
 
-   if (match)
-       err = security_xfrm_policy_lookup(pol, fl->secid, policy_to_flow_dir(dir));
-
-   if (match && !err)
-   xfrm_pol_hold(pol);
-   else
+   if (match) {
+   err = security_xfrm_policy_lookup(pol, fl->secid,
+   policy_to_flow_dir(dir));
+   if (!err)
+   xfrm_pol_hold(pol);
+   else if (err == -ESRCH)
+   pol = NULL;
+   else
+   pol = ERR_PTR(err);
+   } else
pol = NULL;
}
read_unlock_bh(&xfrm_policy_lock);
@@ -1310,8 +1314,11 @@ restart:
pol_dead = 0;
xfrm_nr = 0;
 
-   if (sk && sk->sk_policy[1])
+   if (sk && sk->sk_policy[1]) {
policy = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
+   if (IS_ERR(policy))
+   return PTR_ERR(policy);
+   }
 
if (!policy) {
/* To accelerate a bit...  */
@@ -1604,8 +1611,11 @@ int __xfrm_policy_check(struct sock *sk,
}
 
pol = NULL;
-   if (sk && sk->sk_policy[dir])
+   if (sk && sk->sk_policy[dir]) {
pol = xfrm_sk_policy_lookup(sk, dir, &fl);
+   if (IS_ERR(pol))
+   return 0;
+   }
 
if (!pol)
pol = flow_cache_lookup(&fl, family, fl_dir,
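
The three outcomes that the reworked xfrm_sk_policy_lookup() distinguishes (policy found and held, NULL for no match or -ESRCH, an error pointer for anything else) can be sketched in userspace. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers below are simplified re-implementations of the kernel's, and struct policy, sk_policy_lookup() and its match/sec_err parameters are hypothetical stand-ins for the real xfrm structures and for the result of the security hook.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, as the kernel helpers do. */
static inline void *ERR_PTR(long err)
{
    return (void *)(intptr_t)err;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct policy {
    int refcnt;
};

/*
 * match:   whether the selector matched the flow.
 * sec_err: result of the (hypothetical) security check, 0 on success.
 */
static struct policy *sk_policy_lookup(struct policy *pol, int match, int sec_err)
{
    if (!match)
        return NULL;            /* no policy applies */
    if (sec_err == 0) {
        pol->refcnt++;          /* stands in for xfrm_pol_hold() */
        return pol;
    }
    if (sec_err == -ESRCH)
        return NULL;            /* security layer says "no match" */
    return ERR_PTR(sec_err);    /* real error: caller must fail the lookup */
}
```

A caller then has to test IS_ERR() before the NULL check, exactly as the two call sites in the patch do, because an error pointer is non-NULL and would otherwise be used as a policy.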


[PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03: Fix xfrm code

2006-10-05 Thread Venkat Yekkirala
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.

The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.

This was first reported by Evgeniy Polyakov. James Morris saw the
problem when connecting via IPsec to a confined service on an
SELinux box (vsftpd), which did not have the appropriate SELinux
policy permissions to send packets via IPsec.

With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's an explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).

Secondly, when a security module is loaded (in this case, SELinux), the 
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL.  We were not handling that correctly, and were in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The solution for this is to first ensure that errno values are 
correctly propagated all the way back up through the various call chains 
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver 
fails (typically a permission denied via the security module), the flow 
cache entry is killed rather than having a null policy assigned (which 
indicates that the packet can pass freely).  This also forces any future 
lookups for the same flow to consult the security module (e.g. SELinux) 
for current security policy (rather than, say, caching the error on the 
flow cache entry).
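
The caching rule described above, that a failed resolve must invalidate the entry instead of leaving a null policy in it, can be illustrated with a one-entry cache. This is a sketch under assumptions and not the flow_cache_lookup() code: struct flow_entry, cache_lookup(), the resolver type and the two sample resolvers are all hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

struct flow_entry {
    int valid;              /* 1 if key/policy hold a cached result */
    unsigned key;
    const void *policy;     /* NULL means "no policy applies" */
};

typedef int (*resolver_t)(unsigned key, const void **policy);

static int resolver_calls;  /* counts resolver invocations, for illustration */

/* Sample resolver that always refuses, like a security module denial. */
static int deny_resolver(unsigned key, const void **policy)
{
    (void)key;
    (void)policy;
    resolver_calls++;
    return -EACCES;
}

/* Sample resolver that always finds a policy. */
static int allow_resolver(unsigned key, const void **policy)
{
    static const int dummy_policy = 1;

    (void)key;
    resolver_calls++;
    *policy = &dummy_policy;
    return 0;
}

static int cache_lookup(struct flow_entry *e, unsigned key,
                        resolver_t resolve, const void **out)
{
    const void *pol = NULL;
    int err;

    if (e->valid && e->key == key) {
        *out = e->policy;
        return 0;
    }

    err = resolve(key, &pol);
    if (err) {
        /* Kill the entry: never cache "no policy" for a failed resolve. */
        e->valid = 0;
        e->policy = NULL;
        *out = NULL;
        return err;
    }

    e->valid = 1;
    e->key = key;
    e->policy = pol;
    *out = pol;
    return 0;
}
```

Because the entry is left invalid after an error, a second lookup for the same key calls the resolver again rather than returning a cached null policy.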

This patch: Fix the selinux side of things.

This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.

Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.

Signed-off-by: Venkat Yekkirala <[EMAIL PROTECTED]>
---
 include/linux/security.h        |   24 +
 include/net/xfrm.h              |    3 +
 net/ipv4/xfrm4_policy.c         |    2 -
 net/ipv6/xfrm6_policy.c         |    2 -
 net/key/af_key.c                |    5 --
 net/xfrm/xfrm_policy.c          |    7 ++-
 net/xfrm/xfrm_user.c            |    9 -
 security/dummy.c                |    3 +
 security/selinux/include/xfrm.h |    3 +
 security/selinux/xfrm.c         |   53 +++---
 10 files changed, 62 insertions(+), 49 deletions(-)

--- net-2.6.sid3/include/linux/security.h   2006-10-01 15:18:23.0 -0500
+++ net-2.6.sid4/include/linux/security.h   2006-10-05 12:03:39.0 -0500
@@ -893,7 +893,8 @@ struct request_sock;
  * Check permission when a flow selects a xfrm_policy for processing
  * XFRMs on a packet.  The hook is called when selecting either a
  * per-socket policy or a generic xfrm policy.
- * Return 0 if permission is granted.
+ * Return 0 if permission is granted, -ESRCH otherwise, or -errno
+ * on other errors.
  * @xfrm_state_pol_flow_match:
  * @x contains the state to match.
  * @xp contains the policy to check for a match.
@@ -902,6 +903,7 @@ struct request_sock;
  * @xfrm_flow_state_match:
  * @fl contains the flow key to match.
  * @xfrm points to the xfrm_state to match.
+ * @xp points to the xfrm_policy to match.
  * Return 1 if there is a match.
  * @xfrm_decode_session:
  * @skb points to skb to decode.
@@ -1402,7 +1404,8 @@ struct security_operations {
int (*xfrm_policy_lookup)(struct xfrm_policy *xp, u32 fl_secid, u8 dir);
int (*xfrm_state_pol_flow_match)(struct xfrm_state *x,
struct xfrm_policy *xp, struct flowi *fl);
-   int (*xfrm_flow_state_match)(struct flowi *fl, struct xfrm_state *xfrm);
+   int (*xfrm_flow_state_match)(struct flowi *fl, struct xfrm_state *xfrm,
+   struct xfrm_policy *xp);
int (*xfrm_decode_session)(struct sk_buff *skb, u32 *secid, int ckall);
 #endif /* CONFIG_SECURITY_NETWORK_XFRM */
 
@@ -3168,11 +3171,6 @@ static inline int security_xfrm_policy_a
return security_ops->xfrm_policy_alloc_security(xp, sec_ctx, NULL);
 }
 
-static inline int security_xfrm_sock_policy_a

[PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
From: James Morris <[EMAIL PROTECTED]>

When a security module is loaded (in this case, SELinux), the 
security_xfrm_policy_lookup() hook can return an access denied permission 
(or other error).  We were not handling that correctly, and in fact 
inverting the return logic and propagating a false "ok" back up to 
xfrm_lookup(), which then allowed packets to pass as if they were not 
associated with an xfrm policy.

The way I was seeing the problem was when connecting via IPsec to a 
confined service on an SELinux box (vsftpd), which did not have the 
appropriate SELinux policy permissions to send packets via IPsec.

The first SYNACK would be blocked, because of an uncached lookup via 
flow_cache_lookup(), which would fail to resolve an xfrm policy because 
the SELinux policy is checked at that point via the resolver.

However, retransmitted SYNACKs would then find a cached flow entry when 
calling into flow_cache_lookup() with a null xfrm policy, which is 
interpreted by xfrm_lookup() as the packet not having any associated 
policy and similarly to the first case, allowing it to pass without 
transformation.

The solution presented here is to first ensure that errno values are 
correctly propagated all the way back up through the various call chains 
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver 
fails (typically a permission denied via the security module), the flow 
cache entry is killed rather than having a null policy assigned (which 
indicates that the packet can pass freely).  This also forces any future 
lookups for the same flow to consult the security module (e.g. SELinux) 
for current security policy (rather than, say, caching the error on the 
flow cache entry).

Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 include/net/flow.h |2 -
 net/core/flow.c|   42 
 net/xfrm/xfrm_policy.c |   68 ++-
 3 files changed, 82 insertions(+), 30 deletions(-)

diff -purN -X dontdiff net-2.6.o/include/net/flow.h net-2.6.w/include/net/flow.h
--- net-2.6.o/include/net/flow.h2006-09-29 11:33:58.0 -0400
+++ net-2.6.w/include/net/flow.h2006-09-30 23:50:32.0 -0400
@@ -97,7 +97,7 @@ struct flowi {
 #define FLOW_DIR_FWD   2
 
 struct sock;
-typedef void (*flow_resolve_t)(struct flowi *key, u16 family, u8 dir,
+typedef int (*flow_resolve_t)(struct flowi *key, u16 family, u8 dir,
   void **objp, atomic_t **obj_refp);
 
 extern void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir,
diff -purN -X dontdiff net-2.6.o/net/core/flow.c net-2.6.w/net/core/flow.c
--- net-2.6.o/net/core/flow.c   2006-09-29 11:33:59.0 -0400
+++ net-2.6.w/net/core/flow.c   2006-10-01 01:07:24.0 -0400
@@ -85,6 +85,14 @@ static void flow_cache_new_hashrnd(unsig
add_timer(&flow_hash_rnd_timer);
 }
 
+static void flow_entry_kill(int cpu, struct flow_cache_entry *fle)
+{
+   if (fle->object)
+   atomic_dec(fle->object_ref);
+   kmem_cache_free(flow_cachep, fle);
+   flow_count(cpu)--;
+}
+
 static void __flow_cache_shrink(int cpu, int shrink_to)
 {
struct flow_cache_entry *fle, **flp;
@@ -100,10 +108,7 @@ static void __flow_cache_shrink(int cpu,
}
while ((fle = *flp) != NULL) {
*flp = fle->next;
-   if (fle->object)
-   atomic_dec(fle->object_ref);
-   kmem_cache_free(flow_cachep, fle);
-   flow_count(cpu)--;
+   flow_entry_kill(cpu, fle);
}
}
 }
@@ -220,24 +225,33 @@ void *flow_cache_lookup(struct flowi *ke
 
 nocache:
{
+   int err;
void *obj;
atomic_t *obj_ref;
 
-   resolver(key, family, dir, &obj, &obj_ref);
+   err = resolver(key, family, dir, &obj, &obj_ref);
 
if (fle) {
-   fle->genid = atomic_read(&flow_cache_genid);
-
-   if (fle->object)
-   atomic_dec(fle->object_ref);
-
-   fle->object = obj;
-   fle->object_ref = obj_ref;
-   if (obj)
-   atomic_inc(fle->object_ref);
+   if (err) {
+   /* Force security policy check on next lookup */
+   *head = fle->next;
+   flow_entry_kill(cpu, fle);
+   } else {
+   fle->genid = atomic_read(&flow_cache_genid);
+   
+   if (fle->object)
+   atomic_dec(fle->object_ref);
+   
+   fle->object = 
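
The flow-cache change described in the commit message above can be modelled in userspace. What follows is only a minimal sketch under stated assumptions, not the kernel code: `toy_entry`, `toy_cache_lookup()`, `toy_resolver()` and the single-bucket list are hypothetical stand-ins, and the generation-id and refcount handling of the real cache are left out. It illustrates one point: when the resolver fails, the entry is removed instead of being left behind with a NULL object, so a retransmission consults the resolver again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a flow cache entry. */
struct toy_entry {
    struct toy_entry *next;
    int key;
    void *object;               /* NULL would mean "no policy, pass freely" */
};

/* Modelled on the new flow_resolve_t: returns 0 or a negative errno. */
typedef int (*toy_resolve_fn)(int key, void **objp);

static struct toy_entry *toy_head;  /* one bucket only, for brevity */

void *toy_cache_lookup(int key, toy_resolve_fn resolver, int *errp)
{
    struct toy_entry *e;
    void *obj = NULL;

    *errp = 0;
    for (e = toy_head; e != NULL; e = e->next)
        if (e->key == key)
            return e->object;   /* cache hit: resolver is not consulted */

    /* Miss: link a fresh entry in, then ask the resolver. */
    e = calloc(1, sizeof(*e));
    if (e != NULL) {
        e->key = key;
        e->next = toy_head;
        toy_head = e;
    }

    *errp = resolver(key, &obj);
    if (e != NULL) {
        if (*errp) {
            /* Kill the entry so the next lookup asks the resolver again. */
            toy_head = e->next;
            free(e);
        } else {
            e->object = obj;
        }
    }
    return *errp ? NULL : obj;
}

static int toy_deny;            /* toy "security policy" switch */
static int toy_resolver_calls;
static int toy_policy_object;

static int toy_resolver(int key, void **objp)
{
    (void)key;
    toy_resolver_calls++;
    if (toy_deny)
        return -EACCES;
    *objp = &toy_policy_object;
    return 0;
}

/* Exercises the model; returns 0 when every check holds. */
int toy_flow_cache_demo(void)
{
    int err;
    void *obj;

    toy_deny = 1;
    obj = toy_cache_lookup(42, toy_resolver, &err);
    assert(obj == NULL && err == -EACCES);
    obj = toy_cache_lookup(42, toy_resolver, &err);   /* "retransmission" */
    assert(obj == NULL && err == -EACCES);
    assert(toy_resolver_calls == 2);   /* the denial was not cached */

    toy_deny = 0;
    obj = toy_cache_lookup(42, toy_resolver, &err);
    assert(obj == &toy_policy_object && err == 0);
    obj = toy_cache_lookup(42, toy_resolver, &err);   /* served from cache */
    assert(obj == &toy_policy_object && toy_resolver_calls == 3);
    return 0;
}
```

Calling `toy_flow_cache_demo()` from a `main()` runs the checks. The second denied lookup is the case that matters: had the failed entry stayed in the list with a NULL object, that call would have been a cache hit and the resolver would never have been asked.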

Re: [PATCH] fix for system lockups in 2.6.18-rcX caused by bcm43xx

2006-10-05 Thread Randy Dunlap
On Thu, 14 Sep 2006 10:29:30 +0200 Jarek Poplawski wrote:

> On Thu, Sep 14, 2006 at 10:25:32AM +0200, Jarek Poplawski wrote:
> ...
> > "Attachments are discouraged, but some corporate mail systems 
> > provide no other way to send patches."
> > 
> > I thought they didn't read this but now I understand for whom 
> > Mozilla Firefox is breaking all those lines with no mercy!
>  
> Mozilla Thunderbird. Sorry.

see http://mbligh.org/linuxdocs/Email/Clients/Thunderbird

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread James Morris
On Thu, 5 Oct 2006, Venkat Yekkirala wrote:

> - if (xfrm_policy_match(pol, fl, type, family, dir)) {
> + err = xfrm_policy_match(pol, fl, type, family, dir);
> + if (err) {
> + if (err == -ESRCH)
> + continue;
> + else {
> + ret = ERR_PTR(err);
> + goto fail;
> + }
> + } else {

Semantics issue: if the exact policy match fails with -EACCES, should we 
then try an inexact match before failing?

>  #ifdef CONFIG_XFRM_SUB_POLICY
>   pol = xfrm_policy_lookup_bytype(XFRM_POLICY_TYPE_SUB, fl, family, dir);
> - if (pol)
> + if (IS_ERR(pol)) {
> + err = PTR_ERR(pol);
> + pol = NULL;
> + }
> + if (pol || err)
>   goto end;

Similarly, if the sub-policy lookup returns -EACCES, should we then try a 
main policy lookup before failing?

I would think yes to both.

Opinions?


- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH] Fix for IPsec leakage with SELinux enabled - V.02

2006-10-05 Thread David Miller
From: James Morris <[EMAIL PROTECTED]>
Date: Thu, 5 Oct 2006 16:58:31 -0400 (EDT)

> On Tue, 3 Oct 2006, David Miller wrote:
> 
> > The socket policy behavior deserves some scrutiny.  I say this because
> > if a matching socket policy is avoided due to security layer error,
> > this could potentially make key manager problems very hard to
> > diagnose.
> 
> In this case, AVC denial messages would be logged to the audit log, so 
> there'd be an indication of what's going wrong.

Ok.


Re: [PATCH 0/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread David Miller
From: Venkat Yekkirala <[EMAIL PROTECTED]>
Date: Thu, 05 Oct 2006 15:42:13 -0500

> This version takes into account David Miller's comments
> regarding treatment of security layer errors in the case
> of socket policies. Specifically, these errors will be
> treated like how these kind of errors are treated for
> the main/sub policies, which is to return a full lookup
> failure.

I only have patches "1" and "3" in my inbox, did you forget
to send the second one out or are they simply misnumbered?


Re: [PATCH] Fix for IPsec leakage with SELinux enabled - V.02

2006-10-05 Thread James Morris
On Tue, 3 Oct 2006, David Miller wrote:

> The socket policy behavior deserves some scrutiny.  I say this because
> if a matching socket policy is avoided due to security layer error,
> this could potentially make key manager problems very hard to
> diagnose.

In this case, AVC denial messages would be logged to the audit log, so 
there'd be an indication of what's going wrong.


- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: Request to postpone WE-21

2006-10-05 Thread John W. Linville
On Thu, Oct 05, 2006 at 09:31:13AM -0700, Jean Tourrilhes wrote:

>   Based on the feedback, I formally request you to back out all
> of WE-21 from 2.6.19. Rationale : it's probably too early. You can
> keep it for a later date if you wish.

Jean,

What about a patch like the one below?  It tries to detect WE-20
ESSID/NICKN accesses and adjust them to WE-21 style.  What am
I missing?

I haven't had a chance to test it yet -- just hacked it
up...YMMV... :-)

John

diff --git a/net/core/wireless.c b/net/core/wireless.c
index 0da..ba5fe77 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -748,11 +748,39 @@ #endif/* WE_SET_EVENT */
int extra_size;
int user_length = 0;
int err;
+   int essid_compat = 0;
 
/* Calculate space needed by arguments. Always allocate
 * for max space. Easier, and won't last long... */
extra_size = descr->max_tokens * descr->token_size;
 
+   /* Check need for ESSID compatibility for WE < 21 */
+   switch (cmd) {
+   case SIOCSIWESSID:
+   case SIOCGIWESSID:
+   case SIOCSIWNICKN:
+   case SIOCGIWNICKN:
+   if (iwr->u.data.length == descr->max_tokens + 1)
+   essid_compat = 1;
+   else if (IW_IS_SET(cmd)) {
+   char essid[IW_ESSID_MAX_SIZE + 1];
+
+   err = copy_from_user(essid, iwr->u.data.pointer,
+iwr->u.data.length *
+descr->token_size);
+   if (err)
+   return -EFAULT;
+
+   if (essid[iwr->u.data.length] == '\0')
+   essid_compat = 1;
+   }
+   break;
+   default:
+   break;
+   }
+
+   iwr->u.data.length -= essid_compat;
+
/* Check what user space is giving us */
if(IW_IS_SET(cmd)) {
/* Check NULL pointer */
@@ -795,7 +823,8 @@ #ifdef WE_IOCTL_DEBUG
 #endif /* WE_IOCTL_DEBUG */
 
/* Create the kernel buffer */
-   extra = kmalloc(extra_size, GFP_KERNEL);
+   /*kzalloc ensures NULL-termination for essid_compat */
+   extra = kzalloc(extra_size, GFP_KERNEL);
if (extra == NULL) {
return -ENOMEM;
}
@@ -819,6 +848,8 @@ #endif  /* WE_IOCTL_DEBUG */
/* Call the handler */
ret = handler(dev, &info, &(iwr->u), extra);
 
+   iwr->u.data.length += essid_compat;
+
/* If we have something to return to the user */
if (!ret && IW_IS_GET(cmd)) {
/* Check if there is enough buffer up there */

-- 
John W. Linville
[EMAIL PROTECTED]


Re: [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread David Miller
From: James Morris <[EMAIL PROTECTED]>
Date: Thu, 5 Oct 2006 16:54:38 -0400 (EDT)

> >  #ifdef CONFIG_XFRM_SUB_POLICY
> > pol = xfrm_policy_lookup_bytype(XFRM_POLICY_TYPE_SUB, fl, family, dir);
> > -   if (pol)
> > +   if (IS_ERR(pol)) {
> > +   err = PTR_ERR(pol);
> > +   pol = NULL;
> > +   }
> > +   if (pol || err)
> > goto end;
> 
> Similarly, if the sub-policy lookup returns -EACCES, should we then try a 
> main policy lookup before failing?

We're trying to fill the flow cache here.  In the case where we'd
have a match in both the sub-policy and main table, I think the
sub-policy is supposed to take precedence, and if you fail to get
this sub-policy you should fail the entire lookup.

The way the sub-policied entries work is that you find the sub-policy
as the primary object in the flow cache, and once you notice you have
a sub-policy you do an explicit lookup in the main table to put the
whole thing together.
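
The precedence rule described above can be sketched in userspace C. This is a hedged illustration, not the xfrm code: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are re-implemented here in the spirit of the kernel helpers of the same names, while `toy_policy`, `toy_policy_lookup()` and the three sample lookup callbacks are hypothetical. The sketch assumes that an error from the sub-policy lookup ends the whole lookup and that the main table is consulted only when no sub-policy exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementations of the kernel's error-pointer helpers
 * (assumption: the top 4095 address values are never valid pointers). */
#define MAX_ERRNO 4095
static void *ERR_PTR(intptr_t err)     { return (void *)err; }
static intptr_t PTR_ERR(const void *p) { return (intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_policy { const char *name; };
enum toy_type { TOY_TYPE_MAIN, TOY_TYPE_SUB };

/* Returns a policy, NULL for "no match", or ERR_PTR(-errno). */
typedef struct toy_policy *(*toy_lookup_fn)(enum toy_type type);

struct toy_policy *toy_policy_lookup(toy_lookup_fn bytype, int *errp)
{
    struct toy_policy *pol;

    *errp = 0;
    pol = bytype(TOY_TYPE_SUB);
    if (IS_ERR(pol)) {
        /* A failed sub-policy lookup fails the entire lookup. */
        *errp = (int)PTR_ERR(pol);
        return NULL;
    }
    if (pol)
        return pol;             /* sub-policy takes precedence */

    pol = bytype(TOY_TYPE_MAIN);
    if (IS_ERR(pol)) {
        *errp = (int)PTR_ERR(pol);
        return NULL;
    }
    return pol;
}

static struct toy_policy sub_pol = { "sub" }, main_pol = { "main" };

static struct toy_policy *sub_denied(enum toy_type t)
{
    return t == TOY_TYPE_SUB ? ERR_PTR(-EACCES) : &main_pol;
}

static struct toy_policy *sub_absent(enum toy_type t)
{
    return t == TOY_TYPE_SUB ? NULL : &main_pol;
}

static struct toy_policy *sub_present(enum toy_type t)
{
    return t == TOY_TYPE_SUB ? &sub_pol : &main_pol;
}

/* Exercises the three cases; returns 0 when every check holds. */
int toy_policy_demo(void)
{
    int err;
    struct toy_policy *pol;

    pol = toy_policy_lookup(sub_denied, &err);
    assert(pol == NULL && err == -EACCES);  /* main table never consulted */

    pol = toy_policy_lookup(sub_absent, &err);
    assert(pol == &main_pol && err == 0);

    pol = toy_policy_lookup(sub_present, &err);
    assert(pol == &sub_pol && err == 0);
    return 0;
}
```

The `sub_denied` case is the one under discussion: a matching main-table policy exists, yet the lookup as a whole reports the error.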


RE: [PATCH 0/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
> > This version takes into account David Miller's comments
> > regarding treatment of security layer errors in the case
> > of socket policies. Specifically, these errors will be
> > treated like how these kind of errors are treated for
> > the main/sub policies, which is to return a full lookup
> > failure.
> 
> I only have patches "1" and "3" in my inbox, did you forget
> to send the second one out or are they simply misnumbered?
> 

My apologies. The second one is also numbered 1, but has the
following distinct subject line:
[PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03: Fix xfrm code


RE: [PATCH 0/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
> > > This version takes into account David Miller's comments
> > > regarding treatment of security layer errors in the case
> > > of socket policies. Specifically, these errors will be
> > > treated like how these kind of errors are treated for
> > > the main/sub policies, which is to return a full lookup
> > > failure.
> > 
> > I only have patches "1" and "3" in my inbox, did you forget
> > to send the second one out or are they simply misnumbered?
> > 
> 
> My apologies. The second one is also numbered 1, but has the
> following distinct subject line:
> [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03: Fix xfrm code

In actuality, patch 2 in the series has the following subject line:

[PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03


socket/IP on Linux

2006-10-05 Thread Jingping Lin
Hello, Linux Kernel:
For a project I will work on for mobile, I am looking
into the IP stacks on Linux.

I have a few questions to bother you with:

1. Is "socket.c" the file handling the socket
interface?

2. Which function is for opening a socket?
It looks like "sock_map_fd()" is the one for
opening/creating a socket? Is that correct?
The "Linux IP Stacks Commentary" book suggested the
function is "int socket()", which I didn't find in
"socket.c" though.

3. Do you have documentation discussing in detail
the implemented socket interfaces?

Thanks a lot in advance for your help,
Jingping




RE: [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
> > -   if (xfrm_policy_match(pol, fl, type, family, dir)) {
> > +   err = xfrm_policy_match(pol, fl, type, family, dir);
> > +   if (err) {
> > +   if (err == -ESRCH)
> > +   continue;
> > +   else {
> > +   ret = ERR_PTR(err);
> > +   goto fail;
> > +   }
> > +   } else {
> 
> Semantics issue: if the exact policy match fails with -EACCES, should we
> then try an inexact match before failing?

I wonder what you mean by an inexact match here.

> 
> >  #ifdef CONFIG_XFRM_SUB_POLICY
> > pol = xfrm_policy_lookup_bytype(XFRM_POLICY_TYPE_SUB, fl, family, dir);
> > -   if (pol)
> > +   if (IS_ERR(pol)) {
> > +   err = PTR_ERR(pol);
> > +   pol = NULL;
> > +   }
> > +   if (pol || err)
> > goto end;
> 
> Similarly, if the sub-policy lookup returns -EACCES, should we then try a
> main policy lookup before failing?

I would think not, since we are already handling the more usual
"failure" of EACCES properly, and any other error would usually
have to be a near-fatal error concerning the whole LSM policy or
temporary memory pressure, for example. Usually the latter is
handled automatically when the callers reattempt the lookup.

While it is theoretically possible that the LSM might generate an error
for the sub but not for the main, we would have to first redefine the LSM
hook to communicate this differentiation. And at least in the case of the
current user of LSM (SELinux), I don't currently see the need for this
differentiation.
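
The three-way return convention this reply relies on can be sketched as follows. This is a hedged userspace model, assuming the convention stated in the patch's security.h comment: 0 means the policy matches and permission is granted, -ESRCH means "not this one, keep looking", and any other negative errno is a hard error that aborts the scan. `toy_rule` and `toy_scan()` are hypothetical names, and the stored `match_result` stands in for a call to xfrm_policy_match().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Each rule carries the result a real match function would compute. */
struct toy_rule { int match_result; };

/* Returns the index of the first matching rule, -ESRCH when nothing
 * matches, or the first hard error encountered. */
int toy_scan(const struct toy_rule *rules, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int err = rules[i].match_result;

        if (err == 0)
            return (int)i;      /* match */
        if (err == -ESRCH)
            continue;           /* no match or denied: try the next rule */
        return err;             /* hard error: fail the whole lookup */
    }
    return -ESRCH;
}

/* Exercises the three outcomes; returns 0 when every check holds. */
int toy_scan_demo(void)
{
    const struct toy_rule denied_then_match[] = { { -ESRCH }, { 0 } };
    const struct toy_rule hard_error_first[]  = { { -ENOMEM }, { 0 } };
    const struct toy_rule nothing_matches[]   = { { -ESRCH }, { -ESRCH } };

    assert(toy_scan(denied_then_match, 2) == 1);
    assert(toy_scan(hard_error_first, 2) == -ENOMEM);
    assert(toy_scan(nothing_matches, 2) == -ESRCH);
    return 0;
}
```

In the second case a later rule would have matched, but the hard error on the first rule ends the scan. That is the behaviour being defended here for errors such as memory pressure.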

> 
> I would think yes to both.
> 
> Opinions?
> 
> 
> - James
> -- 
> James Morris
> <[EMAIL PROTECTED]>
> 


Re: Request to postpone WE-21

2006-10-05 Thread Jean Tourrilhes
On Thu, Oct 05, 2006 at 04:49:54PM -0400, John W. Linville wrote:
> On Thu, Oct 05, 2006 at 09:31:13AM -0700, Jean Tourrilhes wrote:
> 
> > Based on the feedback, I formally request you to back out all
> > of WE-21 from 2.6.19. Rationale : it's probably too early. You can
> > keep it for a later date if you wish.
> 
> Jean,

Let me say I truly appreciate your effort to bring progress to
the discussion.

> What about a patch like the one below?  It tries to detect WE-20
> ESSID/NICKN accesses and adjust them to WE-21 style.  What am
> I missing?

The idea is clever.
The GET is no longer an issue. WE has had half the drivers doing
the GET "new style" since January, so in a sense the API change has
already happened, and I've already dealt with the bug reports. So, I
think we could drop the "GET" part.
As you may have noticed, detecting the API for the GET is
easy. On the other hand, detecting it for the SET is not clear-cut. As
Jouni was pointing out, '\0' is a valid ESSID character, and in the
long term we want to allow it, even if it's in the last position.
I'm also wondering whether this additional complexity could
bring additional trouble, but I'm not currently clear on that. I
usually prefer things to be a bit more explicit.

> I haven't had a chance to test it yet -- just hacked it
> up...YMMV... :-)

And I think there are a couple of ways we could refine the
implementation, if we ever decide to go that way.
For example, the correction could happen after the real
copy_from_user(), as the uncorrected iwr->u.data.length is always the
number of chars to pass between kernel and userspace. I think this
would drastically simplify the code.
I'll try to check that.

> John

Thanks again...

Jean


RE: [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread Venkat Yekkirala
> From: James Morris <[EMAIL PROTECTED]>
> Date: Thu, 5 Oct 2006 16:54:38 -0400 (EDT)
> 
> > >  #ifdef CONFIG_XFRM_SUB_POLICY
> > >   pol = xfrm_policy_lookup_bytype(XFRM_POLICY_TYPE_SUB, fl, family, dir);
> > > - if (pol)
> > > + if (IS_ERR(pol)) {
> > > + err = PTR_ERR(pol);
> > > + pol = NULL;
> > > + }
> > > + if (pol || err)
> > >   goto end;
> > 
> > Similarly, if the sub-policy lookup returns -EACCES, should we then try a
> > main policy lookup before failing?
> 
> We're trying to fill the flow cache here.  In the case where we'd
> have a match in both the sub-policy and main table, I think the
> sub-policy is supposed to take precedence, and if you fail to get
> this sub-policy you should fail the entire lookup.

Which is what's happening here, correct?

> 
> The way the sub-policied entries work is that you find the sub-policy
> as the primary object in the flow cache, and once you notice you have
> a sub-policy you do an explicit lookup in the main table to put the
> whole thing together.

Maybe James can help me understand this: when exactly would a sub-policy
be "notice"d here? What does "put the whole thing together" mean?


Re: [PATCH 0/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread David Miller
From: Venkat Yekkirala <[EMAIL PROTECTED]>
Date: Thu, 5 Oct 2006 17:07:59 -0400 

> My apologies. The second one is also numbered 1, but has the
> following distinct subject line:
> [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03: Fix xfrm code

I definitely deleted one of them, since I usually get N copies
of every single patch posting and two of them looked identical :)


Re: [PATCH 1/3] Fix for IPsec leakage with SELinux enabled - V.03

2006-10-05 Thread David Miller
From: Venkat Yekkirala <[EMAIL PROTECTED]>
Date: Thu, 5 Oct 2006 17:27:03 -0400 

> May be James can help me understand this; when exactly would a sub-policy
> be "notice"d here? What does "put the whole thing together" mean?

The code in xfrm_lookup() which does a flow cache lookup,
and then if it finds it has obtained a sub-policy it tries
to do an explicit main table policy lookup.  The sub-policy
and the main table policy thus found are "put together" to
form the full route.


Re: Request to postpone WE-21

2006-10-05 Thread Jouni Malinen
On Thu, Oct 05, 2006 at 04:49:54PM -0400, John W. Linville wrote:

> What about a patch like the one below?  It tries to detect WE-20
> ESSID/NICKN accesses and adjust them to WE-21 style.  What am
> I missing?

> diff --git a/net/core/wireless.c b/net/core/wireless.c

> + else if (IW_IS_SET(cmd)) {
> + char essid[IW_ESSID_MAX_SIZE + 1];
> +
> + err = copy_from_user(essid, iwr->u.data.pointer,
> +  iwr->u.data.length *
> +  descr->token_size);

> + if (essid[iwr->u.data.length] == '\0')
> + essid_compat = 1;

This looks somewhat confusing.. WE-20 (and older) included '\0' in both
the data value and length (well, at least in most drivers and user space
tools, if I remember correctly), i.e., essid[iwr->u.data.length] would
be pointing one byte after the '\0' termination.. And since '\0' is
valid character in SSID (it is just an arbitrary array of octets) it can
also be the last octet of the SSID and WE-21 style case could have
essid[iwr->u.data.length - 1] == '\0'..
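
The ambiguity can be made concrete with a small userspace sketch. It is only an illustration under stated assumptions: `looks_like_we20()` is a hypothetical heuristic, not code from the patch, and it tests the last counted octet (index length - 1), which is where a WE-20 style terminator would sit. Testing index length instead would read one byte past the counted data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic: treat a request as WE-20 style when the last
 * counted octet is zero, i.e. the terminator was included in the length. */
int looks_like_we20(const unsigned char *essid, size_t len)
{
    return len > 0 && essid[len - 1] == '\0';
}

/* Exercises the heuristic; returns 0 when every check holds. */
int essid_demo(void)
{
    /* WE-20 style: "net" plus a counted terminator, length 4. */
    const unsigned char we20_net[]  = { 'n', 'e', 't', '\0' };
    /* WE-21 style: a 4-octet SSID whose last octet happens to be zero. */
    const unsigned char we21_net0[] = { 'n', 'e', 't', '\0' };
    /* WE-21 style: plain "net", length 3, no terminator counted. */
    const unsigned char we21_net[]  = { 'n', 'e', 't' };

    int a = looks_like_we20(we20_net, sizeof(we20_net));
    int b = looks_like_we20(we21_net0, sizeof(we21_net0));
    int c = looks_like_we20(we21_net, sizeof(we21_net));

    assert(a == 1);
    assert(b == 1);   /* false positive: a valid WE-21 SSID is misread */
    assert(c == 0);
    /* The two ambiguous requests are byte-for-byte identical. */
    assert(memcmp(we20_net, we21_net0, sizeof(we20_net)) == 0);
    return 0;
}
```

Since the first two buffers are identical in content and length, no test on the data alone can tell the two conventions apart for a SET.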

-- 
Jouni Malinen                                            PGP id EFC895FA

