Re: specifying scopid's for link-local IPv6 addrs

2007-07-24 Thread Bill Fink
On Tue, 24 Jul 2007, Sridhar Samudrala wrote:

> On Tue, 2007-07-24 at 10:13 -0700, Rick Jones wrote:
> > > Rick,
> > > 
> > > I don't see any way around this.  For example, on one of my test
> > > systems, I have the following link local routes:
> > > 
> > > chance% netstat -A inet6 -rn | grep fe80::/64
> > > fe80::/64   ::
> > >   U 25600 eth0
> > > fe80::/64   ::
> > >   U 25600 eth2
> > > fe80::/64   ::
> > >   U 25600 eth3
> > > fe80::/64   ::
> > >   U 25600 eth4
> > > fe80::/64   ::
> > >   U 25600 eth5
> > > fe80::/64   ::
> > >   U 25600 eth6
> > > 
> > > So if I want to run a link local test to fe80::202:b3ff:fed4:cd1,
> > > the system has no way to choose which is the correct interface to
> > > use for the test, and will give an error if the interface isn't
> > > specified. 
> > 
> > Yeah, I was wondering about that.  I'm not sure if the attempts on "those 
> > other 
> > OSes" happened to involve multiple interfaces or not.  Even so, it "feels" 
> > unpleasant for an application to deal with and I wonder if there is a way 
> > for a 
> > stack to deal with it on the application's behalf.  I guess that might 
> > involve 
> > some sort of layer violation between neightbor discovery and routing 
> > (typing 
> > while I think about things I know little about...)
> > 
> > Is there RFC chapter and verse I might read about routing with multiple 
> > link-local's on a system?
> > 
> > > You must explicitly specify the desired interface.  For example,
> > > on my test system, the correct interface is eth6 which is interface 8
> > > (lo eth0 eth1 eth2 ... eth5 eth6).  Here is an example nuttcp test
> > > specifying interface 8:
> > > 
> > > chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8
> > >  1178.5809 MB /  10.02 sec =  986.2728 Mbps 12 %TX 15 %RX
> > > 
> > > nuttcp uses getaddrinfo() which parses the "%" field,
> > > and then copies the sin6_scope_id from the res structure to the
> > > server's sockaddr_in6 structure before initiating the connect().
> > 
> > OK, I'll give that a quick try with netperf:
> > 
> > [EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M 
> > -m 64K 
> > -H fe80::207:43ff:fe05:9d%2
> 
> We can even specify the interface name instead of the interface index
>%ethX
> 
> getaddrinfo() uses if_nametoindex() internally to get the index.
> 
> Thanks
> Sridhar

Cool!  That's much easier and works great.  :-)

chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%eth6
 1178.5468 MB /  10.02 sec =  986.3239 Mbps 13 %TX 15 %RX

Still learn something new every day.  Now if I just could remember
it all when I needed it later.  :-)

-Thanks

-Bill
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread David Miller
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2007 15:09:55 +1000

> Not pretty 8(

Not at all and cmpxchg() isn't really a generically usable
primitive.  Yes all platforms provide atomic_cmpxchg() in
one form or another, but it's really barbaric and inefficient
on some cpu types.

Let's can this idea for now.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread Rusty Russell
On Tue, 2007-07-24 at 21:29 -0700, David Miller wrote:
> From: Rusty Russell <[EMAIL PROTECTED]>
> Date: Wed, 25 Jul 2007 12:33:14 +1000
> 
> > Maybe by adding YA state bit?  Hold on, this might get ugly...
> > 
> > Say netif_rx_schedule_prep() sets the MORE_TODO bit (atomically instead
> > of setting __LINK_STATE_RX_SCHED) if it's going to fail, and
> > netif_rx_complete() returns 0 if it was set, or 1 if it's OK.  Now
> > callers do:
> > 
> > reenable_interrupts();
> > if (rx_pending() || !netif_rx_complete(netdev, napi))
> > disable_interrupts();
> 
> This is an interesting idea, and would work, but the extra atomics
> for something that 2 or 3 drivers actually need unless I
> misunderstand your suggestion?

Well, netif_rx_schedule_prep() becomes a cmpxchg instead of a
test_and_set_bit:

/* We either set MORE_TODO or RX_SCHED. */
unsigned long state;

again:
state = napi->state;
if (!(state & __LINK_STATE_RX_SCHED)) {
if (cmpxchg(&napi->state, state, state|__LINK_STATE_RX_SCHED) 
== state)
return 1;
} else {
if (cmpxchg(&napi->state, state, state|__LINK_STATE_MORE_TODO) 
== state)
return 0;
}
goto again;

netif_rx_complete would need something more complex... hmm... 

unsigned long state;
again:
state = napi->state;

if (state & __LINK_STATE_MORE_TODO)
return 0;

list_del(&napi->poll_list);
if (cmpxchg(&napi->state, state, state & ~__LINK_STATE_RX_SCHED) == 
state)
return 1;
list_add_tail(napi->poll_list, &__get_cpu_var(softnet_data).poll_list);
goto again;

Not pretty 8(

Rusty.


if (






-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 18:47:59 -0700 (PDT)

> EHEA is another case, in fact EHEA doesn't disable interrupts at
> all.
>
> The reason is that EHEA has several NAPI sources behind one single
> hardware interrupt.
> 
> That driver's issues were discussed long ago and actually were the
> initial impetus for Stephen to cut the first napi_struct patch.
> 
> Because of that weird layout EHEA needs special consideration, which I
> will get back to.

Some of the things I say above seem to be not true. :-)

I studied the EHEA driver a bit, enclosed at the end of this email is
the current version of the code in my napi_struct tree.

It seems there is one unique interrupt per port.  So this device's
layout is identical to the one sunvnet, and other virtualized net
devices, tend to have.  Multiple ports per netdev, each port
generating unique events.

It also seems that the EHEA interrupts are "one-shot" in nature, and
it seems that the:

ehea_reset_cq_ep(pr->recv_cq);
ehea_reset_cq_ep(pr->send_cq);
ehea_reset_cq_n1(pr->recv_cq);

operations are what get the interrupt going again.  If someone
reading this understands the hardware better, please let us know
if this analysis is accurate, thanks.

So it appears I subconsciously converted this driver to multi-NAPI
away from the dreaded "dummy netdevice" scheme a lot of multi-queue
drivers are using.  It's amusing because I don't remember consciously
doing this, but I'll take it nonetheless! :-)

This appears to resolve all of the NAPI limitations in the driver
without having to waste memory and resources on dummy net devices.

In short, EHEA is a great fit for napi_struct!

It may look corny how the spinlock in the interrupt handler
just sits around the netif_rx_schedule() call and nothing
else, but I do believe it is necessary so that the "reenable
interrupt, then maybe netif_rx_complete()" runs atomically
wrt. the interrupt NAPI trigger in order to avoid lost events.

EHEA is also a perfect candidate for the solutions being suggested
for multi-TX-queue situations like sunvnet.  I'll have to get back
to that at some point.

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 489c8b2..d4227be 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -351,6 +351,8 @@ struct ehea_q_skb_arr {
  * Port resources
  */
 struct ehea_port_res {
+   spinlock_t poll_lock;
+   struct napi_struct napi;
struct port_stats p_stats;
struct ehea_mr send_mr; /* send memory region */
struct ehea_mr recv_mr; /* receive memory region */
@@ -362,7 +364,6 @@ struct ehea_port_res {
struct ehea_cq *send_cq;
struct ehea_cq *recv_cq;
struct ehea_eq *eq;
-   struct net_device *d_netdev;
struct ehea_q_skb_arr rq1_skba;
struct ehea_q_skb_arr rq2_skba;
struct ehea_q_skb_arr rq3_skba;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 4c70a93..9244338 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -389,9 +389,9 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, 
int rq,
return 0;
 }
 
-static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
-   struct ehea_port_res *pr,
-   int *budget)
+static int ehea_proc_rwqes(struct net_device *dev,
+  struct ehea_port_res *pr,
+  int budget)
 {
struct ehea_port *port = pr->port;
struct ehea_qp *qp = pr->qp;
@@ -404,18 +404,16 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device 
*dev,
int skb_arr_rq2_len = pr->rq2_skba.len;
int skb_arr_rq3_len = pr->rq3_skba.len;
int processed, processed_rq1, processed_rq2, processed_rq3;
-   int wqe_index, last_wqe_index, rq, my_quota, port_reset;
+   int wqe_index, last_wqe_index, rq, port_reset;
 
processed = processed_rq1 = processed_rq2 = processed_rq3 = 0;
last_wqe_index = 0;
-   my_quota = min(*budget, dev->quota);
 
cqe = ehea_poll_rq1(qp, &wqe_index);
-   while ((my_quota > 0) && cqe) {
+   while ((processed < budget) && cqe) {
ehea_inc_rq1(qp);
processed_rq1++;
processed++;
-   my_quota--;
if (netif_msg_rx_status(port))
ehea_dump(cqe, sizeof(*cqe), "CQE");
 
@@ -430,14 +428,14 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device 
*dev,
if (netif_msg_rx_err(port))
ehea_error("LL rq1: skb=NULL");
 
-   skb = netdev_alloc_skb(port->netdev,
+   skb = netdev_alloc_skb(dev,
   EHEA_L_PKT_SIZE);

Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Mike Galbraith
On Tue, 2007-07-24 at 11:25 -0700, Linus Torvalds wrote:
> 
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > 
> > I guess this was the bug:
> 
> Looks very likely to me. Mike, Alexey, does this fix things for you?

I don't have very much runtime on it yet, but yes, it seems to have.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12 -Rev2] Changes to netdevice.h

2007-07-24 Thread Krishna Kumar2
Hi Patrick,

Krishna Kumar2/India/IBM wrote on 07/23/2007 08:27:53 AM:

> Hi Patrick,
>
> Patrick McHardy <[EMAIL PROTECTED]> wrote on 07/22/2007 10:36:51 PM:
>
> > Krishna Kumar wrote:
> > > @@ -472,6 +474,9 @@ struct net_device
> > > void *priv;   /* pointer to private data   */
> > > int (*hard_start_xmit) (struct sk_buff *skb,
> > >struct net_device *dev);
> > > +   int (*hard_start_xmit_batch) (struct net_device
> > > +   *dev);
> > > +
> >
> >
> > Os this function really needed? Can't you just call hard_start_xmit
with
> > a NULL skb and have the driver use dev->blist?

> Probably not. I will see how to do it this way and get back to you.

I think this is a good idea and makes code everywhere simpler. I
will try this change and test to make sure it doesn't have any
negative impact. Will mostly send out rev3 tomorrow.

Thanks,

- KK

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread David Miller
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2007 12:33:14 +1000

> Maybe by adding YA state bit?  Hold on, this might get ugly...
> 
> Say netif_rx_schedule_prep() sets the MORE_TODO bit (atomically instead
> of setting __LINK_STATE_RX_SCHED) if it's going to fail, and
> netif_rx_complete() returns 0 if it was set, or 1 if it's OK.  Now
> callers do:
> 
>   reenable_interrupts();
> if (rx_pending() || !netif_rx_complete(netdev, napi))
> disable_interrupts();

This is an interesting idea, and would work, but the extra atomics
for something that 2 or 3 drivers actually need unless I
misunderstand your suggestion?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] netxen: Load firmware during probe, dma watchdog fix.

2007-07-24 Thread Dhananjay Phadke

Jeff,

You committed old patch, which I had asked to ignore for two newer patches.

[PATCH 1/1] netxen: Load firmware during probe, dma watchdog fix.

is wrong patch that went in, instead please commit:

[PATCH 1/2] netxen: IMEZ multiport card 2nd port issue, dma watchdog fix
[PATCH 2/2] netxen: Fix interrupt handling for multiport adapters

Thanks,
Dhananjay

On 7/25/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] wrote:
> The firmware should be loaded after resetting hardware during PCI probe,
> besides module unload. This fixes issue with 2nd port of multiport adapter
> on powerpc blades. This patch also fixes a bug that PCI resources are not
> freed if dma watchdog shutdown failed. The dma watchdog poll messages
> during module unload are also suppressed.
>
> Signed-off-by: Dhananjay Phadke <[EMAIL PROTECTED]>
> Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
> Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>

applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] Implement batching skb API

2007-07-24 Thread Krishna Kumar2
Jamal,

This is silly. I am not responding to this type of presumptuous and
insulting mails.

Regards,

- KK

J Hadi Salim <[EMAIL PROTECTED]> wrote on 07/25/2007 12:58:20 AM:

> KK,
>
> On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote:
>
> >
> > J Hadi Salim <[EMAIL PROTECTED]> wrote on 07/23/2007 06:02:01 PM:
>
>
> > Actually you have not sent netperf results with prep and without prep.
>
> My results were based on pktgen (which i explained as testing the
> driver). I think depending on netperf without further analysis is
> simplistic. It was like me doing forwarding tests on these patches.
>
> > > So _which_ non-LLTX driver doesnt do that? ;->
> >
> > I have no idea since I haven't looked at all drivers. Can you tell
which
> > all non-LLTX drivers does that ? I stated this as the sole criterea.
>
> The few i have peeked at all do it. I also think the e1000 should be
> converted to be non-LLTX. The rest of netdev is screaming to kill LLTX.
>
> > > tun driver doesnt use it either - but i doubt that makes it "bloat"
> >
> > Adding extra code that is currently not usable (esp from a submission
> > point) is bloat.
>
> So far i have converted 3 drivers, 1 of them doesnt use it. Two more
> driver conversions are on the way, they will both use it. How is this
> bloat again?
> A few emails back you said if only IPOIB can use batching then thats
> good enough justification.
>
> > > You waltz in, have the luxury of looking at my code, presentations,
many
> > > discussions with me etc ...
> >
> > "luxury" ?
> > I had implemented the entire thing even before knowing that you
> > are working on something similar! and I had sent the first proposal to
> > netdev,
>
> I saw your patch at the end of may (or at least 2 weeks after you said
> it existed). That patch has very little resemblance to what you just
> posted conceptwise or codewise. I could post it if you would give me
> permission.
>
> > *after* which you told that you have your own code and presentations
(which
> > I had never seen earlier - I joined netdev a few months back, earlier I
was
> > working on RDMA, Infiniband as you know).
>
> I am gonna assume you didnt know of my work - which i have been making
> public for about 3 years. Infact i talked about this topic when i
> visited your office in 2006 on a day you were not present, so it is
> plausible you didnt hear of it.
>
> >  And it didn't give me any great
> > ideas either, remember I had posted results for E1000 at the time of
> > sending the proposals.
>
> In mid-June you sent me a series of patches which included anything from
> changing variable names to combining qdisc_restart and about everything
> i referred to as being "cosmetic differences" in your posted patches. I
> took two of those and incorporated them in. One was an "XXX" in my code
> already to allocate the dev->blist
> (Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63).
> The other one was a mechanical removal of the blist being passed
> (Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757).
> Some of the others i asked you to defer. For example, the reason i gave
> you for not merging any qdisc_restart_combine changes is because i was
> waiting for Dave to swallow the qdisc_restart changes i made; otherwise
> maintainance becomes extremely painful for me.
> Sridhar actually provided a lot more valuable comments and fixes but has
> not planted a flag on behalf of the queen of spain like you did.
>
> > However I do give credit in my proposal to you for what
> > ideas that your provided (without actual code), and the same I did for
other
> > people who did the same, like Dave, Sridhar. BTW, you too had
discussions with me,
> > and I sent some patches to improve your code too,
>
> I incorporated two of your patches and asked for deferal of others.
> These patches have now shown up in what you claim as "the difference". I
> just call them "cosmetic difference" not to downplay the importance of
> having an ethtool interface but because they do not make batching
> perform any better. The real differences are those two items. I am
> suprised you havent cannibalized those changes as well. I thought you
> renamed them to something else; according to your posting:
> "This patch will work with drivers updated by Jamal, Matt & Michael Chan
> with minor modifications - rename xmit_win to xmit_slots & rename batch
> handler". Or maybe thats a "future plan" you have in mind?
>
> > so it looks like a two
> > way street to me (and that is how open source works and should).
>
> Open source is a lot more transparent than that.
>
> You posted a question, which was part of your research. I responded and
> told you i have patches; you asked me for them and i promptly ported
> them from pre-2.6.18 to the latest kernel at the time.
>
> The nature of this batching work is one of performance. So numbers are
> important. If you had some strong disagreements on something in the
> architecture, then it would be of great value to explain it in a
> technical d

Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread Rusty Russell
On Tue, 2007-07-24 at 18:47 -0700, David Miller wrote:
> One thing that's peculiar is that when netif_rx_schedule_prep()
> fails, we don't disable interrupts but we also don't clear the
> condition that is causing the interrupt to occur.

I think we're ok, but this stuff is tricky.

In the driver in -rc1, I think it will only fail if racing with the
netif_rx_reschedule, which will do the right thing.

In your version, there is only one place where prep can fail (ie.
interrupts are enabled, but netif_rx_complete() hasn't been called):
when ibmveth_poll saw pending buffers and disabled the irq (which has
already been delivered and is spinning on the poll_lock)

In this case, we're ok because the irqs really are disabled now: the
running handler is a relic.

> Perhaps the lock is avoidable somehow, who knows :)

Maybe by adding YA state bit?  Hold on, this might get ugly...

Say netif_rx_schedule_prep() sets the MORE_TODO bit (atomically instead
of setting __LINK_STATE_RX_SCHED) if it's going to fail, and
netif_rx_complete() returns 0 if it was set, or 1 if it's OK.  Now
callers do:

reenable_interrupts();
if (rx_pending() || !netif_rx_complete(netdev, napi))
disable_interrupts();

I'm going to go absorb some more caffeine before you reply 8)

Rusty.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread David Miller
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2007 11:15:49 +1000

> If I understand correctly, you're looking at a general model like the
> following:
> 
>   while (more_packets()) { ... netif_receive_skb() }
> 
>   enable_rx_and_rxnobuf_ints();
> 
>   /* Lock protects against race w/ rx interrupt re-queueing us */
>   spin_lock_irq();
>   if (!more_packets())
>   netif_rx_complete(dev);
>   else
>   /* We'll be scheduled again. */
>   disable_rx_and_rxnobuff_ints();
>   spin_unlock_irq();
> 
> Seems pretty robust to me.  The race is probably pretty unusual, so the
> only downside is the locking overhead?  Even non-irq-problematic drivers
> could use this (ie. virt_net.c probably wants to do it even though
> virtio implementation may not have this issue).

Yes, interesting cases come up in the existing virtual drivers.
They do no locking at all :-)  EHEA is another case, in fact
EHEA doesn't disable interrupts at all.

The reason is that EHEA has several NAPI sources behind one single
hardware interrupt.

That driver's issues were discussed long ago and actually were the
initial impetus for Stephen to cut the first napi_struct patch.

Because of that weird layout EHEA needs special consideration, which I
will get back to.

Anyways, here is how ibmveth.c looks right now in my tree:

static int ibmveth_poll(struct napi_struct *napi, int budget)
{
struct ibmveth_adapter *adapter = container_of(napi, struct 
ibmveth_adapter, napi);
struct net_device *netdev = adapter->netdev;
int frames_processed = 0;
unsigned long lpar_rc;

do {
struct sk_buff *skb;

if (!ibmveth_rxq_pending_buffer(adapter))
break;

rmb();
if (!ibmveth_rxq_buffer_valid(adapter)) {
 ...
} else {
 ...
frames_processed++;
 ...
}
} while (frames_processed < budget);

ibmveth_replenish_task(adapter);

if (frames_processed < budget) {
/* We think we are done - reenable interrupts,
 * then check once more to make sure we are done.
 */
spin_lock_irq(&adapter->poll_lock);
lpar_rc = h_vio_signal(adapter->vdev->unit_address,
   VIO_IRQ_ENABLE);

ibmveth_assert(lpar_rc == H_SUCCESS);

if (ibmveth_rxq_pending_buffer(adapter))
lpar_rc = h_vio_signal(adapter->vdev->unit_address,
   VIO_IRQ_DISABLE);
else
netif_rx_complete(netdev, napi);
spin_unlock_irq(&adapter->poll_lock);
}

return frames_processed;
}

static irqreturn_t ibmveth_interrupt(int irq, void *dev_instance)
{
struct net_device *netdev = dev_instance;
struct ibmveth_adapter *adapter = netdev->priv;
unsigned long lpar_rc;

spin_lock(&adapter->poll_lock);
if (netif_rx_schedule_prep(netdev, &adapter->napi)) {
lpar_rc = h_vio_signal(adapter->vdev->unit_address, 
VIO_IRQ_DISABLE);
ibmveth_assert(lpar_rc == H_SUCCESS);
__netif_rx_schedule(netdev, &adapter->napi);
}
spin_unlock(&adapter->poll_lock);
return IRQ_HANDLED;
}

A part of me looks at cases like this and almost believes that
the new spinlock is even superfluous.  Consider:

1) ->poll() is guarenteed single-threaded wrt. itself
2) NAPI_STATE_SCHED provides implicit synchronization,
   it behaves as a lock

However the lock is necessary to synchronize the:

reenable_interrupts();
if (rx_pending())
disable_interrupts();
else
netif_rx_complete(netdev, napi);

sequence.

Any arriving interrupt, up until the moment netif_rx_complete()
is invoked, will fail the netif_rx_schedule_prep() test.  So we
could miss a NAPI schedule without the lock.

One thing that's peculiar is that when netif_rx_schedule_prep()
fails, we don't disable interrupts but we also don't clear the
condition that is causing the interrupt to occur.

This seems to suggest to me that we can loop a lot, or even get stuck
completely handling the interrupt forever, when these scenerios
trigger.  So at the very least we need a local IRQ disable around
the netif_rx_complete() sequence.

Perhaps the lock is avoidable somehow, who knows :)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Al Viro
On Wed, Jul 25, 2007 at 12:30:07AM +0100, Al Viro wrote:
> On Tue, Jul 24, 2007 at 04:26:21PM -0700, Brandon Philips wrote:
> > Could you point me to an example you have in mind?
> > 
> > I quickly searched through a handful of the PCI device drivers and
> > couldn't find an example where the .remove function didn't do something
> > to the tune of:
> > 
> > unregister_netdev(dev);
> > ... various un-allocs and cleanup ...
> > free_netdev(dev)
> 
> Now find the definition of free_netdev() and read it through, please.

PS: free_netdev() is certainly a bad name for that; the reasons are
mostly historical, since it used to free the damn thing.  When that
became impossible (read: previous time net_device has meat device
model; consequences had not been pretty), the patch series from hell
had been big enough without renaming that one on top of everything else.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread Rusty Russell
On Tue, 2007-07-24 at 17:45 -0700, David Miller wrote:
> I'm now going to go over the other resched cases and make sure
> things can be similarly handled in those drivers as well.
> To be honest I'm quite confident this will be the case.

If I understand correctly, you're looking at a general model like the
following:

while (more_packets()) { ... netif_receive_skb() }

enable_rx_and_rxnobuf_ints();

/* Lock protects against race w/ rx interrupt re-queueing us */
spin_lock_irq();
if (!more_packets())
netif_rx_complete(dev);
else
/* We'll be scheduled again. */
disable_rx_and_rxnobuff_ints();
spin_unlock_irq();

Seems pretty robust to me.  The race is probably pretty unusual, so the
only downside is the locking overhead?  Even non-irq-problematic drivers
could use this (ie. virt_net.c probably wants to do it even though
virtio implementation may not have this issue).

Cheers,
Rusty.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>Al Boldi wrote:
>>
>>>Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead
>>>of NF_CONNTRACK.
>>
>>I guess so, and that would have to select NF_CONNTRACK.
> 
> 
> Should I resend, or can you take care of it?


Please resend after testing.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFX]: napi_struct V3

2007-07-24 Thread David Miller
From: Rusty Russell <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 16:21:43 +1000

> On Mon, 2007-07-23 at 22:47 -0700, David Miller wrote:
> > Any objections?
> 
> On the contrary, this looks good.

It turns out the explicit restart logic isn't necessary.  On the first
driver I tried to "convert" this became apparent real fast.

The key is what the ->poll() caller does if you don't complete the
NAPI, from net_rx_action():

/* if napi_complete not called, reschedule */
if (test_bit(NAPI_STATE_SCHED, &n->state))
__napi_schedule(n);

Let's look at ep93xx_poll() as it sits in my current tree, which used
to use netif_rx_reschedule():

static int ep93xx_poll(struct napi_struct *napi, int budget)
{
struct ep93xx_priv *ep = container_of(napi, struct ep93xx_priv, napi);
struct net_device *dev = ep->dev;
int rx;

/*
 * @@@ Have to stop polling if device is downed while we
 * are polling.
 */

rx = ep93xx_rx(dev, 0, budget);
if (rx < budget) {
spin_lock_irq(&ep->rx_lock);
wrl(ep, REG_INTEN, REG_INTEN_TX | REG_INTEN_RX);
if (ep93xx_have_more_rx(ep))
wrl(ep, REG_INTEN, REG_INTEN_TX);
else
netif_rx_complete(dev, napi);
spin_unlock_irq(&ep->rx_lock);
}

return rx;
}

This driver handles TX in it's hardware interrupt handler, and RX
via NAPI.  So to NAPI poll it simply disables RX interrupts and
schedules NAPI.

if (status & REG_INTSTS_RX) {
spin_lock(&ep->rx_lock);
if (likely(__netif_rx_schedule_prep(dev, &ep->napi))) {
wrl(ep, REG_INTEN, REG_INTEN_TX);
__netif_rx_schedule(dev, &ep->napi);
}
spin_unlock(&ep->rx_lock);
}

anyways, if after re-enabling RX interrupts it sees some RX
work, it can simply re-disable RX interrupts and leave the NAPI
state alone.  And that's how I've coded things above.

The caller will requeue the NAPI instance onto the poll list,
nothing more needs to be done to prevent event loss.

I'm now going to go over the other resched cases and make sure
things can be similarly handled in those drivers as well.
To be honest I'm quite confident this will be the case.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-24 Thread David Miller
From: Shannon Nelson <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 17:36:06 -0700

> (repost - original eaten by vger?)

This one seems to have made it.

> Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was
> unreachable and thus unused.  The code originally was there to support
> in-kernel dma needs, but since it remains unused, we'll pull it out.
> 
> Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>

This looks OK, unless some objections come up I'll merge this in
during my next round of net fixes to Linus.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc

2007-07-24 Thread Patrick McHardy
PJ Waskiewicz wrote:
> This is a patch from Patrick McHardy to fix the sch_api code, which I
> went ahead and tested and made a slight fix to.  This also includes
> the fix to sch_prio based on Patrick's patch.
> 
> The sch->parent handle should contain the parent qdisc ID.  When the
> qdisc is the root qdisc (TC_H_ROOT), the parent handle should be the
> value TC_H_ROOT.  This fixes sch_api to set this correctly on
> qdisc_create() for both ingress and egress qdiscs.
> 
> Change this check in prio_tune() so that only the root qdisc can be
> multiqueue-enabled; use sch->parent instead of sch->handle.

Both look good, thanks.

Acked-by: Patrick McHardy <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-24 Thread Shannon Nelson
(repost - original eaten by vger?)

Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was
unreachable and thus unused.  The code originally was there to support
in-kernel dma needs, but since it remains unused, we'll pull it out.

Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>
---

 drivers/dma/iovlock.c |   27 ---
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index d637555..e763d72 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -143,29 +143,6 @@ void dma_unpin_iovec_pages(struct dma_pinned_list 
*pinned_list)
kfree(pinned_list);
 }
 
-static dma_cookie_t dma_memcpy_to_kernel_iovec(struct dma_chan *chan, struct
-   iovec *iov, unsigned char *kdata, size_t len)
-{
-   dma_cookie_t dma_cookie = 0;
-
-   while (len > 0) {
-   if (iov->iov_len) {
-   int copy = min_t(unsigned int, iov->iov_len, len);
-   dma_cookie = dma_async_memcpy_buf_to_buf(
-   chan,
-   iov->iov_base,
-   kdata,
-   copy);
-   kdata += copy;
-   len -= copy;
-   iov->iov_len -= copy;
-   iov->iov_base += copy;
-   }
-   iov++;
-   }
-
-   return dma_cookie;
-}
 
 /*
  * We have already pinned down the pages we will be using in the iovecs.
@@ -187,10 +164,6 @@ dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, 
struct iovec *iov,
if (!chan)
return memcpy_toiovec(iov, kdata, len);
 
-   /* -> kernel copies (e.g. smbfs) */
-   if (!pinned_list)
-   return dma_memcpy_to_kernel_iovec(chan, iov, kdata, len);
-
iovec_idx = 0;
while (iovec_idx < pinned_list->nr_iovecs) {
struct dma_page_list *page_list;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Tc filtering: broken 802_3 classifier?

2007-07-24 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
> [...]
> Now I add a filter for ethernet (802.3), and things aren't as happy:
> 
> # tc filter add dev eth0 protocol 802_3 parent 1: prio 2 u32 match u32
>  0x at 0 match u32  bytes of dst mac address> 0x at 4 flowid 1:1
> 
> This should match the destination MAC address of outgoing packets, and
> put it into flowid 1:1.  For pings, using the normal priomap, they go
> into 1:2, so ping should be a good candidate for seeing if it goes into
> 1:1.  In this case, it does not filter into 1:1.


The protocol match is on skb->protocol, so it case of ethernet its
on the ethernet protocol, which is ETH_P_IP or "ip" for IPv4.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Tc filtering: broken 802_3 classifier?

2007-07-24 Thread Waskiewicz Jr, Peter P
I've been trying to use tc filtering to filter on ethertype, among other
things in the MAC layer.  I'm running into multiple issues, and want to
put this out there in case I'm using the filters wrong, or if there
really is a bug in the filter code (I've stared at most of it today, and
my head hurts).

Here's the scenario.  I am running on a recent 2.6.23 GIT tree, and am
using sch_prio with no multiqueue turned on in the qdisc.  The network
interface in question is e1000 (no multiqueue).

# tc qdisc add dev eth0 root handle 1: prio

Now to see the flowid's packet counts incrementing, I add explicit
classful qdiscs to the leaves:

# tc qdisc add dev eth0 parent 1:1 handle 10: pfifo
# tc qdisc add dev eth0 parent 1:2 handle 20: pfifo
# tc qdisc add dev eth0 parent 1:3 handle 30: pfifo

Now packet counts can be seen with:

# tc -s qdisc ls dev eth0

I can add a filter for IP for ssh, and it works as intended:

# tc filter add dev eth0 protocol ip parent 1: prio 1 u32 match ip dport
22 0x flowid 1:3

This will shove ssh traffic into the 3rd pfifo queue, where by default
it will flow into flowid 1:1.  This is good.

Now I add a filter for ethernet (802.3), and things aren't as happy:

# tc filter add dev eth0 protocol 802_3 parent 1: prio 2 u32 match u32
 0x at 0 match u32  0x at 4 flowid 1:1

This should match the destination MAC address of outgoing packets, and
put it into flowid 1:1.  For pings, using the normal priomap, they go
into 1:2, so ping should be a good candidate for seeing if it goes into
1:1.  In this case, it does not filter into 1:1.

If I expand this into 8 flows on a multiqueue NIC, using sch_prio or
sch_rr, adding any 802_3 filter to the chain will cause any traffic that
hits the classifier (i.e. no other filters match first) to go into
flowid 1:1, regardless if it actually matches.  Remove the 802_3 filter
from the chain, and all filtering starts working again.

I'm trying to get state from the classifier code now when it's running,
but it's a really big mess of black magic.  I'm wondering if anyone is
also seeing this behavior, and if they've tried to fix it.  If not, I'll
continue to search for a solution, but I'm just polling the community to
see if this is a known issue, or if I'm doing something wrong.

Thanks,

-PJ Waskiewicz
Intel Corporation
[EMAIL PROTECTED] 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Thomas Gleixner
On Tue, 2007-07-24 at 22:04 +0200, Ingo Molnar wrote:
> Marcin, could you try the patch below too? [without having any other 
> patch applied.] It basically turns the critical section into an irqs-off 
> critical section and thus checks whether your problem is related to that 
> particular area of code.
> 

I read back on this thread and I think the problem is somewhere else:

delayed disable relies on the ability to re-trigger the interrupt in the
case that a real interrupt happens after the software disable was set.
In this case we actually disable the interrupt on the hardware level
_after_ it occurred.

On enable_irq, we need to re-trigger the interrupt. On i386 this relies
on a hardware resend mechanism (send_IPI_self()). 

Actually we only need the resend for edge type interrupts. Level type
interrupts come back once enable_irq() re-enables the interrupt line.

I assume that the interrupt in question is level triggered because it is
shared and above the legacy irqs 0-15:

17: 12   IO-APIC-fasteoi   eth1, eth0

Looking into the IO_APIC code, the resend via send_IPI_self() happens
unconditionally. So the resend is done for level and edge interrupts.
This makes the problem more mysterious.

The code in question lib8390.c does

disable_irq();
fiddle_with_the_network_card_hardware()
enable_irq();

The fiddle_with_the_network_card_hardware() might cause interrupts,
which are cleared in the same code path again,

Marcin found that when he disables the irq line on the hardware level
(removing the delayed disable) the card is kept alive.

So the difference is that we can get a resend on enable_irq, when an
interrupt happens during the time, where we are in the disabled region.

No idea how this affects the network card, as the code there must be
able to handle interrupts, which are not originated from the card due to
interrupt sharing.

Marcin, can you please try the patch below ? It's just a debugging aid
to gather some more data about that problem.

If the patch fixes the problem, then we should try to disable the resend
mechanism for not edge type irq lines on the irq_chip level (i.e. the
IOAPIC code)

Thanks,

tglx

--- linux-2.6.orig/kernel/irq/resend.c
+++ linux-2.6/kernel/irq/resend.c
@@ -62,6 +62,15 @@ void check_irq_resend(struct irq_desc *desc, unsigned int 
irq)
 */
desc->chip->enable(irq);
 
+   /*
+* Temporary hack to figure out more about the problem, which
+* is causing the ancient network cards to die.
+*/
+   if (desc->handle_irq != handle_edge_irq) {
+   printk(KERN_DEBUG "Skip resend for irq %u\n", irq);
+   return;
+   }
+
if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
 


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Waskiewicz Jr, Peter P
> * netdev_pci_remove_one() can replace simple pci device remove
>   functions
> 
> * devm_alloc_netdev() is like alloc_netdev but allocates 
> memory using devres.
>   alloc_netdev() can be removed once all drivers use devres.

Please look at the multiqueue network code that is in 2.6.23.  It
creates alloc_netdev_mq() for multiqueue network device drivers.  If you
want to add this devres feature (which based on the threads, might be
difficult), please make sure you comprehend the multiqueue features.

Thanks,

-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1 sky2 boot crash in sky2_mac_intr

2007-07-24 Thread Michal Piotrowski

Hi Florian,

On 24/07/07, Florian Lohoff <[EMAIL PROTECTED]> wrote:

On Tue, Jul 24, 2007 at 09:50:08AM +0100, Stephen Hemminger wrote:
> The problem is related to power management. The PHY has a number of PCI 
configuration
> registers for power control, and the function of these changes based on the 
version and
> revision of the chip. The driver does work on older versions of the EC-U, in
> Fujitsu laptop's, it is just the new rev that is broken.
>
> The driver should probably fail smarter (by not loading) if the PHY isn't 
powered
> up correctly, but that doesn't help your problem.
>
> The vendor has provided me with documentation on many versions
> of the chip, but I don't have doc's on the lastest revision differences of 
the EC Ultra,
> so a proper solution is not easily available.  The best method for resolving 
this would
> be to first try the vendor driver version of sk98lin and see if that fixes 
it. If so,
> then it is easy to change sky2, to match the phy setup in the vendor driver.
> Another possibility is to look for places in sky2 driver where there are 
places
> that compare version/revision.
>
> The most likely bits that need to change are in PCI registers: 0x80, 0x84 and 
0x88
> You could also load the windows driver and dump PCI config space (with lspci 
from
> cygwin), and see what the settings are there.
>
> I am away from my office for a month, and therefore away from any sky2
> hardware for testing.

I'll try the above and keep you posted. The crash itself seems to be a
2.6.23-rc1 regression though. I never experienced this with 2.6.22-rc5
which i was running before.


Can you try to figure out what is causing this crash and then use git-bisect?

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Al Viro
On Tue, Jul 24, 2007 at 04:26:21PM -0700, Brandon Philips wrote:
> Could you point me to an example you have in mind?
> 
> I quickly searched through a handful of the PCI device drivers and
> couldn't find an example where the .remove function didn't do something
> to the tune of:
> 
>   unregister_netdev(dev);
>   ... various un-allocs and cleanup ...
>   free_netdev(dev)

Now find the definition of free_netdev() and read it through, please.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Brandon Philips
On 00:09 Wed 25 Jul 2007, Al Viro wrote:
> On Tue, Jul 24, 2007 at 11:51:34PM +0100, Al Viro wrote:
> > On Tue, Jul 24, 2007 at 03:39:52PM -0700, [EMAIL PROTECTED] wrote:
> > > * netdev_pci_remove_one() can replace simple pci device remove
> > >   functions
> > > 
> > > * devm_alloc_netdev() is like alloc_netdev but allocates memory using 
> > > devres.
> > >   alloc_netdev() can be removed once all drivers use devres.
> >  
> > E...  What the hell for?  To make sure that we have struct device
> > for everything, whether we need it or not?  Have you actually read
> > through drivers/net?  I mean, _really_ read through it, looking for
> > ugly cases...
> > 
> > I've done just that several times and I'm sorry, but I would classify
> > that project as hopeless.  It's way, _way_ more diverse than SATA...
> 
> Actually, it's even worse - net_device itself simply *cannot* be
> dealt with that way.  Its lifetime can indefinitely exceed that
> of underlying e.g. PCI device.

Could you point me to an example you have in mind?

I quickly searched through a handful of the PCI device drivers and
couldn't find an example where the .remove function didn't do something
to the tune of:

unregister_netdev(dev);
... various un-allocs and cleanup ...
free_netdev(dev)
...
return;

Thank You,

Brandon
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Al Viro
On Tue, Jul 24, 2007 at 11:51:34PM +0100, Al Viro wrote:
> On Tue, Jul 24, 2007 at 03:39:52PM -0700, [EMAIL PROTECTED] wrote:
> > * netdev_pci_remove_one() can replace simple pci device remove
> >   functions
> > 
> > * devm_alloc_netdev() is like alloc_netdev but allocates memory using 
> > devres.
> >   alloc_netdev() can be removed once all drivers use devres.
>  
> E...  What the hell for?  To make sure that we have struct device
> for everything, whether we need it or not?  Have you actually read
> through drivers/net?  I mean, _really_ read through it, looking for
> ugly cases...
> 
> I've done just that several times and I'm sorry, but I would classify
> that project as hopeless.  It's way, _way_ more diverse than SATA...

Actually, it's even worse - net_device itself simply *cannot* be
dealt with that way.  Its lifetime can indefinitely exceed that
of underlying e.g. PCI device.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread Al Viro
On Tue, Jul 24, 2007 at 03:39:52PM -0700, [EMAIL PROTECTED] wrote:
> * netdev_pci_remove_one() can replace simple pci device remove
>   functions
> 
> * devm_alloc_netdev() is like alloc_netdev but allocates memory using devres.
>   alloc_netdev() can be removed once all drivers use devres.
 
E...  What the hell for?  To make sure that we have struct device
for everything, whether we need it or not?  Have you actually read
through drivers/net?  I mean, _really_ read through it, looking for
ugly cases...

I've done just that several times and I'm sorry, but I would classify
that project as hopeless.  It's way, _way_ more diverse than SATA...
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2][RFC] Update e100 driver to use devres.

2007-07-24 Thread bphilips
devres manages device resources and is currently used by all libata low level
drivers.   It can greatly reduce the complexity of the error handling on probe
and the device removal functions.

For example the e100_free() function and all of the gotos in e100_probe have
been removed.  Also, e100_remove() has been deleted and replaced with a much
simpler netdev_pci_remove_one() function that should be able to handle any PCI
net device that doesn't require any teardown besides resource deallocation.

Applies against 2.6.22.

Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>

---
 drivers/net/e100.c |   73 +
 1 file changed, 19 insertions(+), 54 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2514,18 +2514,11 @@ static int e100_do_ioctl(struct net_devi
 
 static int e100_alloc(struct nic *nic)
 {
-   nic->mem = pci_alloc_consistent(nic->pdev, sizeof(struct mem),
-   &nic->dma_addr);
-   return nic->mem ? 0 : -ENOMEM;
-}
+   struct device *dev = &nic->pdev->dev;
 
-static void e100_free(struct nic *nic)
-{
-   if(nic->mem) {
-   pci_free_consistent(nic->pdev, sizeof(struct mem),
-   nic->mem, nic->dma_addr);
-   nic->mem = NULL;
-   }
+   nic->mem = dmam_alloc_coherent(dev, sizeof(struct mem),
+   &nic->dma_addr, GFP_ATOMIC);
+   return nic->mem ? 0 : -ENOMEM;
 }
 
 static int e100_open(struct net_device *netdev)
@@ -2552,7 +2545,7 @@ static int __devinit e100_probe(struct p
struct nic *nic;
int err;
 
-   if(!(netdev = alloc_etherdev(sizeof(struct nic {
+   if (!(netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct nic {
if(((1 << debug) - 1) & NETIF_MSG_PROBE)
printk(KERN_ERR PFX "Etherdev alloc failed, abort.\n");
return -ENOMEM;
@@ -2582,26 +2575,26 @@ static int __devinit e100_probe(struct p
nic->msg_enable = (1 << debug) - 1;
pci_set_drvdata(pdev, netdev);
 
-   if((err = pci_enable_device(pdev))) {
+   if ((err = pcim_enable_device(pdev))) {
DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
-   goto err_out_free_dev;
+   return err;
}
 
if(!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
DPRINTK(PROBE, ERR, "Cannot find proper PCI device "
"base address, aborting.\n");
err = -ENODEV;
-   goto err_out_disable_pdev;
+   return err;
}
 
if((err = pci_request_regions(pdev, DRV_NAME))) {
DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
-   goto err_out_disable_pdev;
+   return err;
}
 
if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
-   goto err_out_free_res;
+   return err;
}
 
SET_MODULE_OWNER(netdev);
@@ -2610,11 +2603,11 @@ static int __devinit e100_probe(struct p
if (use_io)
DPRINTK(PROBE, INFO, "using i/o access mode\n");
 
-   nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
+   nic->csr = pcim_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
if(!nic->csr) {
DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
err = -ENOMEM;
-   goto err_out_free_res;
+   return err;
}
 
if(ent->driver_data)
@@ -2647,11 +2640,11 @@ static int __devinit e100_probe(struct p
 
if((err = e100_alloc(nic))) {
DPRINTK(PROBE, ERR, "Cannot alloc driver memory, aborting.\n");
-   goto err_out_iounmap;
+   return err;
}
 
if((err = e100_eeprom_load(nic)))
-   goto err_out_free;
+   return err;
 
e100_phy_init(nic);
 
@@ -2661,8 +2654,8 @@ static int __devinit e100_probe(struct p
if (!eeprom_bad_csum_allow) {
DPRINTK(PROBE, ERR, "Invalid MAC address from "
"EEPROM, aborting.\n");
-   err = -EAGAIN;
-   goto err_out_free;
+
+   return -EAGAIN;
} else {
DPRINTK(PROBE, ERR, "Invalid MAC address from EEPROM, "
"you MUST configure one.\n");
@@ -2682,7 +2675,7 @@ static int __devinit e100_probe(struct p
strcpy(netdev->name, "eth%d");
if((err = register_netdev(netdev))) {
DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");
-   goto err_out_free;
+   return err;
}
 
   

[PATCH 1/2][RFC] Update net core to use devres.

2007-07-24 Thread bphilips
* netdev_pci_remove_one() can replace simple pci device remove
  functions

* devm_alloc_netdev() is like alloc_netdev but allocates memory using devres.
  alloc_netdev() can be removed once all drivers use devres.

Applies against 2.6.22

Signed-off-by: Brandon Philips <[EMAIL PROTECTED]>

---
 include/linux/etherdevice.h |2 +
 include/linux/netdevice.h   |5 
 net/core/dev.c  |   48 
 net/ethernet/eth.c  |6 +
 4 files changed, 57 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/netdevice.h
===
--- linux-2.6.orig/include/linux/netdevice.h
+++ linux-2.6/include/linux/netdevice.h
@@ -622,6 +622,7 @@ extern int  dev_queue_xmit(struct sk_buf
 extern int register_netdevice(struct net_device *dev);
 extern voidunregister_netdevice(struct net_device *dev);
 extern voidfree_netdev(struct net_device *dev);
+extern voidnetdev_pci_remove_one(struct pci_dev *pdev);
 extern voidsynchronize_net(void);
 extern int register_netdevice_notifier(struct notifier_block *nb);
 extern int unregister_netdevice_notifier(struct notifier_block 
*nb);
@@ -992,6 +993,10 @@ static inline void netif_tx_disable(stru
 extern voidether_setup(struct net_device *dev);
 
 /* Support for loadable net-drivers */
+
+extern struct net_device *devm_alloc_netdev(struct device *dev,
+   int sizeof_priv, const char *name,
+   void (*setup)(struct net_device *));
 extern struct net_device *alloc_netdev(int sizeof_priv, const char *name,
   void (*setup)(struct net_device *));
 extern int register_netdev(struct net_device *dev);
Index: linux-2.6/net/core/dev.c
===
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3343,7 +3344,8 @@ static struct net_device_stats *internal
 }
 
 /**
- * alloc_netdev - allocate network device
+ * __alloc_netdev - allocate network device
+ * @dev:   managed device responsible for mem; NULL if not managed
  * @sizeof_priv:   size of private data to allocate space for
  * @name:  device name format string
  * @setup: callback to initialize device
@@ -3351,8 +3353,8 @@ static struct net_device_stats *internal
  * Allocates a struct net_device with private data area for driver use
  * and performs basic initialization.
  */
-struct net_device *alloc_netdev(int sizeof_priv, const char *name,
-   void (*setup)(struct net_device *))
+struct net_device *__alloc_netdev(struct device *gendev, int sizeof_priv,
+   const char *name, void (*setup)(struct net_device *))
 {
void *p;
struct net_device *dev;
@@ -3364,7 +3366,11 @@ struct net_device *alloc_netdev(int size
alloc_size = (sizeof(*dev) + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST;
alloc_size += sizeof_priv + NETDEV_ALIGN_CONST;
 
-   p = kzalloc(alloc_size, GFP_KERNEL);
+   if (dev == NULL)
+   p = kzalloc(alloc_size, GFP_KERNEL);
+   else
+   p = devm_kzalloc(gendev, alloc_size, GFP_KERNEL);
+
if (!p) {
printk(KERN_ERR "alloc_netdev: Unable to allocate device.\n");
return NULL;
@@ -3382,8 +3388,23 @@ struct net_device *alloc_netdev(int size
strcpy(dev->name, name);
return dev;
 }
+
+struct net_device *devm_alloc_netdev(struct device *dev, int sizeof_priv, const
+   char *name, void (*setup)(struct net_device *))
+{
+   return __alloc_netdev(dev, sizeof_priv, name, setup);
+}
+EXPORT_SYMBOL(devm_alloc_netdev);
+
+struct net_device *alloc_netdev(int sizeof_priv, const char *name,
+   void (*setup)(struct net_device *))
+{
+   return __alloc_netdev(NULL, sizeof_priv, name, setup);
+}
 EXPORT_SYMBOL(alloc_netdev);
 
+
+
 /**
  * free_netdev - free network device
  * @dev: device
@@ -3411,6 +3432,25 @@ void free_netdev(struct net_device *dev)
 #endif
 }
 
+/**
+ * free_netdev - free network device
+ * @dev: device
+ *
+ * This function does the last stage of destroying an allocated device
+ * interface. The reference to the device object is released.
+ * If this is the last reference then it will be freed.
+ */
+void netdev_pci_remove_one(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   if (netdev) {
+   unregister_netdev(netdev);
+   pci_set_drvdata(pdev, NULL);
+   }
+}
+EXPORT_SYMBOL(netdev_pci_remove_one);
+
+
 /* Synchronize with packet receive processing. */

[PATCH 0/2][RFC] Update network drivers to use devres

2007-07-24 Thread bphilips
These patches update the network driver core and the e100 driver to use devres.
Devres is a simple resource manager for device drivers, see
Documentation/driver-model/devres.txt for more information.

If the RFC goes well I will create patches to update all applicable network
drivers.

Applies against 2.6.22

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK]: Correctly report errors while registering a multicast group

2007-07-24 Thread David Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 11:44:27 +0200

> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Also applied, thanks Thomas.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc

2007-07-24 Thread PJ Waskiewicz
This is a patch from Patrick McHardy to fix the sch_api code, which I
went ahead and tested and made a slight fix to.  This also includes
the fix to sch_prio based on Patrick's patch.

The sch->parent handle should contain the parent qdisc ID.  When the
qdisc is the root qdisc (TC_H_ROOT), the parent handle should be the
value TC_H_ROOT.  This fixes sch_api to set this correctly on
qdisc_create() for both ingress and egress qdiscs.

Change this check in prio_tune() so that only the root qdisc can be
multiqueue-enabled; use sch->parent instead of sch->handle.

-- 
PJ Waskiewicz <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] NET: Fix sch_prio to properly detect the root qdisc on multiqueue

2007-07-24 Thread PJ Waskiewicz
Fix the check in prio_tune() to see if sch->parent is TC_H_ROOT instead of
sch->handle to load or reject the qdisc for multiqueue devices.

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 net/sched/sch_prio.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 2d8c084..06441db 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -239,11 +239,13 @@ static int prio_tune(struct Qdisc *sch, struct rtattr 
*opt)
/* If we're multiqueue, make sure the number of incoming bands
 * matches the number of queues on the device we're associating with.
 * If the number of bands requested is zero, then set q->bands to
-* dev->egress_subqueue_count.
+* dev->egress_subqueue_count.  Also, the root qdisc must be the
+* only one that is enabled for multiqueue, since it's the only one
+* that interacts with the underlying device.
 */
q->mq = RTA_GET_FLAG(tb[TCA_PRIO_MQ - 1]);
if (q->mq) {
-   if (sch->handle != TC_H_ROOT)
+   if (sch->parent != TC_H_ROOT)
return -EINVAL;
if (netif_is_multiqueue(sch->dev)) {
if (q->bands == 0)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK]: Fix adjustment of number of multicast groups

2007-07-24 Thread David Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 11:45:14 +0200

> The current calculation of the maximum number of genetlink
> multicast groups seems odd, fix it.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] NET: Fix sch_api to properly set sch->parent on the root qdisc

2007-07-24 Thread PJ Waskiewicz
From: Patrick McHardy <[EMAIL PROTECTED]>

Fix sch_api to correctly set sch->parent for both ingress and egress
qdiscs in qdisc_create().

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 net/sched/sch_api.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 13c09bc..dee0d5f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -380,6 +380,10 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned 
int n)
return;
while ((parentid = sch->parent)) {
sch = qdisc_lookup(sch->dev, TC_H_MAJ(parentid));
+   if (sch == NULL) {
+   WARN_ON(parentid != TC_H_ROOT);
+   return;
+   }
cops = sch->ops->cl_ops;
if (cops->qlen_notify) {
cl = cops->get(sch, parentid);
@@ -420,8 +424,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
unsigned long cl = cops->get(parent, classid);
if (cl) {
err = cops->graft(parent, cl, new, old);
-   if (new)
-   new->parent = classid;
cops->put(parent, cl);
}
}
@@ -436,7 +438,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc 
*parent,
  */
 
 static struct Qdisc *
-qdisc_create(struct net_device *dev, u32 handle, struct rtattr **tca, int 
*errp)
+qdisc_create(struct net_device *dev, u32 parent, u32 handle,
+  struct rtattr **tca, int *errp)
 {
int err;
struct rtattr *kind = tca[TCA_KIND-1];
@@ -482,6 +485,8 @@ qdisc_create(struct net_device *dev, u32 handle, struct 
rtattr **tca, int *errp)
goto err_out2;
}
 
+   sch->parent = parent;
+
if (handle == TC_H_INGRESS) {
sch->flags |= TCQ_F_INGRESS;
sch->stats_lock = &dev->ingress_lock;
@@ -758,9 +763,11 @@ create_n_graft:
if (!(n->nlmsg_flags&NLM_F_CREATE))
return -ENOENT;
if (clid == TC_H_INGRESS)
-   q = qdisc_create(dev, tcm->tcm_parent, tca, &err);
+   q = qdisc_create(dev, tcm->tcm_parent, tcm->tcm_parent,
+tca, &err);
else
-   q = qdisc_create(dev, tcm->tcm_handle, tca, &err);
+   q = qdisc_create(dev, tcm->tcm_parent, tcm->tcm_handle,
+tca, &err);
if (q == NULL) {
if (err == -EAGAIN)
goto replay;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread David Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Tue, 24 Jul 2007 23:52:57 +0200

> family->mcast_groups is protected by genl_lock so it must
> be held while accessing the list in genl_unregister_mc_groups().
> Requires adding a non-locking variant of genl_unregister_mc_group().
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Applied, thanks Thomas.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] TIPC: fix tipc_link_create error handling

2007-07-24 Thread Florian Westphal
if printbuf allocation or tipc_node_attach_link() fails, invalid
references to the link are left in the associated node and bearer
structures.
Fix by allocating printbuf early and moving timer initialization
and the addition of the new link to the b_ptr->links list after
tipc_node_attach_link() succeeded.

Signed-off-by: Florian Westphal <[EMAIL PROTECTED]>
---
 link.c |   28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

also move k_init_timer(), as suggested by Allan.

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 5adfdfd..1d674e0 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -423,6 +423,17 @@ struct link *tipc_link_create(struct bearer *b_ptr, const 
u32 peer,
return NULL;
}
 
+   if (LINK_LOG_BUF_SIZE) {
+   char *pb = kmalloc(LINK_LOG_BUF_SIZE, GFP_ATOMIC);
+
+   if (!pb) {
+   kfree(l_ptr);
+   warn("Link creation failed, no memory for print 
buffer\n");
+   return NULL;
+   }
+   tipc_printbuf_init(&l_ptr->print_buf, pb, LINK_LOG_BUF_SIZE);
+   }
+
l_ptr->addr = peer;
if_name = strchr(b_ptr->publ.name, ':') + 1;
sprintf(l_ptr->name, "%u.%u.%u:%s-%u.%u.%u:",
@@ -432,8 +443,6 @@ struct link *tipc_link_create(struct bearer *b_ptr, const 
u32 peer,
tipc_zone(peer), tipc_cluster(peer), tipc_node(peer));
/* note: peer i/f is appended to link name by reset/activate */
memcpy(&l_ptr->media_addr, media_addr, sizeof(*media_addr));
-   k_init_timer(&l_ptr->timer, (Handler)link_timeout, (unsigned 
long)l_ptr);
-   list_add_tail(&l_ptr->link_list, &b_ptr->links);
l_ptr->checkpoint = 1;
l_ptr->b_ptr = b_ptr;
link_set_supervision_props(l_ptr, b_ptr->media->tolerance);
@@ -459,21 +468,14 @@ struct link *tipc_link_create(struct bearer *b_ptr, const 
u32 peer,
 
l_ptr->owner = tipc_node_attach_link(l_ptr);
if (!l_ptr->owner) {
+   if (LINK_LOG_BUF_SIZE)
+   kfree(l_ptr->print_buf.buf);
kfree(l_ptr);
return NULL;
}
 
-   if (LINK_LOG_BUF_SIZE) {
-   char *pb = kmalloc(LINK_LOG_BUF_SIZE, GFP_ATOMIC);
-
-   if (!pb) {
-   kfree(l_ptr);
-   warn("Link creation failed, no memory for print 
buffer\n");
-   return NULL;
-   }
-   tipc_printbuf_init(&l_ptr->print_buf, pb, LINK_LOG_BUF_SIZE);
-   }
-
+   k_init_timer(&l_ptr->timer, (Handler)link_timeout, (unsigned 
long)l_ptr);
+   list_add_tail(&l_ptr->link_list, &b_ptr->links);
tipc_k_signal((Handler)tipc_link_start, (unsigned long)l_ptr);
 
dbg("tipc_link_create(): tolerance = %u,cont intv = %u, abort_limit = 
%u\n",
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Thomas Graf
family->mcast_groups is protected by genl_lock so it must
be held while accessing the list in genl_unregister_mc_groups().
Requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c2007-07-23 22:08:04.0 
+0200
+++ net-2.6/net/netlink/genetlink.c 2007-07-24 23:51:11.0 +0200
@@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 
+static void __genl_unregister_mc_group(struct genl_family *family,
+  struct genl_multicast_group *grp)
+{
+   BUG_ON(grp->family != family);
+   netlink_clear_multicast_users(genl_sock, grp->id);
+   clear_bit(grp->id, mc_groups);
+   list_del(&grp->list);
+   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
+   grp->id = 0;
+   grp->family = NULL;
+}
+
 /**
  * genl_unregister_mc_group - unregister a multicast group
  *
@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
  struct genl_multicast_group *grp)
 {
-   BUG_ON(grp->family != family);
genl_lock();
-   netlink_clear_multicast_users(genl_sock, grp->id);
-   clear_bit(grp->id, mc_groups);
-   list_del(&grp->list);
-   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-   grp->id = 0;
-   grp->family = NULL;
+   __genl_unregister_mc_group(family, grp);
genl_unlock();
 }
 
@@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st
 {
struct genl_multicast_group *grp, *tmp;
 
+   genl_lock();
list_for_each_entry_safe(grp, tmp, &family->mcast_groups, list)
-   genl_unregister_mc_group(family, grp);
+   __genl_unregister_mc_group(family, grp);
+   genl_unlock();
 }
 
 /**
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Thomas Graf
* Brian Haley <[EMAIL PROTECTED]> 2007-07-24 12:14
> Thomas Graf wrote:
> >@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
> > void genl_unregister_mc_group(struct genl_family *family,
> >   struct genl_multicast_group *grp)
> > {
> >-BUG_ON(grp->family != family);
> > genl_lock();
> >-netlink_clear_multicast_users(genl_sock, grp->id);
> >-clear_bit(grp->id, mc_groups);
> >-list_del(&grp->list);
> >-genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
> >-grp->id = 0;
> >-grp->family = NULL;
> >+genl_unregister_mc_group(family, grp);
> > genl_unlock();
> > }
> 
> Shouldn't this be __genl_unregister_mc_group(family, grp) ?

Yes, thank for you noticing.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-24 Thread Shannon Nelson
Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was
unreachable and thus unused.  The code originally was there to support
in-kernel dma needs, but since it remains unused, we'll pull it out.

Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>
---

 drivers/dma/iovlock.c |   27 ---
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index d637555..e763d72 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -143,29 +143,6 @@ void dma_unpin_iovec_pages(struct dma_pinned_list 
*pinned_list)
kfree(pinned_list);
 }
 
-static dma_cookie_t dma_memcpy_to_kernel_iovec(struct dma_chan *chan, struct
-   iovec *iov, unsigned char *kdata, size_t len)
-{
-   dma_cookie_t dma_cookie = 0;
-
-   while (len > 0) {
-   if (iov->iov_len) {
-   int copy = min_t(unsigned int, iov->iov_len, len);
-   dma_cookie = dma_async_memcpy_buf_to_buf(
-   chan,
-   iov->iov_base,
-   kdata,
-   copy);
-   kdata += copy;
-   len -= copy;
-   iov->iov_len -= copy;
-   iov->iov_base += copy;
-   }
-   iov++;
-   }
-
-   return dma_cookie;
-}
 
 /*
  * We have already pinned down the pages we will be using in the iovecs.
@@ -187,10 +164,6 @@ dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, 
struct iovec *iov,
if (!chan)
return memcpy_toiovec(iov, kdata, len);
 
-   /* -> kernel copies (e.g. smbfs) */
-   if (!pinned_list)
-   return dma_memcpy_to_kernel_iovec(chan, iov, kdata, len);
-
iovec_idx = 0;
while (iovec_idx < pinned_list->nr_iovecs) {
struct dma_page_list *page_list;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] forcedeth: new device ids in pci_ids.h

2007-07-24 Thread Jeff Garzik

Ayaz Abdulla wrote:

This patch contains new device ids for MCP73 chipset.

Signed-Off-By: Ayaz Abdulla <[EMAIL PROTECTED]>


applied 1-2


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/3] netdev: i82596 Ethernet needs

2007-07-24 Thread Jeff Garzik

Geert Uytterhoeven wrote:

netdev: i82596 Ethernet needs  on m68k

drivers/net/82596.c: In function 'init_rx_bufs':
drivers/net/82596.c:552: error: implicit declaration of function 'cache_clear'
drivers/net/82596.c: In function 'i596_start_xmit':
drivers/net/82596.c:1104: error: implicit declaration of function 'cache_push'

The driver still compiles on ia32 (CONFIG_APRICOT=y)

Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]>


applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] defxx: Use __maybe_unused rather than a local hack

2007-07-24 Thread Jeff Garzik

Maciej W. Rozycki wrote:
 This is a change to remove a local hack in favour to __maybe_unused that 
has been recently added.


Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>
---
Hi,

 It should be obvious.  The code builds, therefore it works.

 Please apply,

  Maciej



applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] drivers/net/acenic.c: fix check-after-use

2007-07-24 Thread Jeff Garzik

Adrian Bunk wrote:
The Coverity checker noted that we've already dereferenced "dev" when we 
check whether it's NULL.


Since it's impossible that "dev" is NULL at this place this patch 
removes the NULL check.


Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>


applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] eHEA: net_poll support

2007-07-24 Thread Jeff Garzik

Jan-Bernd Themann wrote:

net_poll support for eHEA added

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
---


 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |   22 +-
 2 files changed, 22 insertions(+), 2 deletions(-)


applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] ucc_geth: add ethtool support

2007-07-24 Thread Jeff Garzik

Li Yang wrote:

The patch enables statistics in ucc_geth and adds ethtool support to
ucc_geth driver.

Signed-off-by: Li Yang <[EMAIL PROTECTED]>
---
 drivers/net/Makefile   |2 +-
 drivers/net/ucc_geth.c |   19 +-
 drivers/net/ucc_geth.h |6 +
 drivers/net/ucc_geth_ethtool.c |  388 
 drivers/net/ucc_geth_mii.c |6 +-
 5 files changed, 407 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/ucc_geth_ethtool.c


applied 1-2


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] netxen: Load firmware during probe, dma watchdog fix.

2007-07-24 Thread Jeff Garzik

[EMAIL PROTECTED] wrote:

The firmware should be loaded after resetting hardware during PCI probe,
besides module unload. This fixes issue with 2nd port of multiport adapter
on powerpc blades. This patch also fixes a bug that PCI resources are not
freed if dma watchdog shutdown failed. The dma watchdog poll messages
during module unload are also suppressed.

Signed-off-by: Dhananjay Phadke <[EMAIL PROTECTED]>
Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>


applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/10] ps3: fix wrong calculation of rx descriptor address

2007-07-24 Thread Jeff Garzik

Masakazu Mokuno wrote:

Fixed the bug that calculation of the address of rx descriptor was
wrong.

Signed-off-by: Masakazu Mokuno <[EMAIL PROTECTED]>
---
 drivers/net/ps3_gelic_net.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


applied 1-10


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] atl1: change tpd_avail function name

2007-07-24 Thread Jeff Garzik

Jay Cliburn wrote:

Change tpd_avail() to atl1_tpd_avail().

Signed-off-by: Jay Cliburn <[EMAIL PROTECTED]>
---
 drivers/net/atl1/atl1_main.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


applied 1-5


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in mace_interrupt() in drivers/net/pcmcia/nmclan_cs.c

2007-07-24 Thread Jeff Garzik

Micah Gruber wrote:
This patch fixes a potential null dereference bug where we dereference 
DEV before a null check. This patch simply moves the dereferencing after 
the null check.


Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>

---

--- a/drivers/net/pcmcia/nmclan_cs.c+++ 
b/drivers/net/pcmcia/nmclan_cs.c@@ -996,7 +996,7 @@


PLEASE PLEASE PLEASE fix your mailer.

None of your patches are apply-able via script or patch(1), which is how 
all Linux maintainers apply patches.


If you want to contribute to the kernel, you -must- figure out how to 
send patches via email.


Linux kernel development is done almost exclusively through email, so it 
is very important to get the details right.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Alexey Dobriyan
On Tue, Jul 24, 2007 at 11:25:22AM -0700, Linus Torvalds wrote:
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > I guess this was the bug:
>
> Looks very likely to me. Mike, Alexey, does this fix things for you?

Yeah, box is running for more than hour, survived LTP, gdb testsuite,
portage sync and so on.

What new to me is that someone decided TSC is unstable but that's
probably OK on full debug kernel:

Clocksource tsc unstable (delta = 91724418 ns)
Time: pit clocksource has been installed.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Tue, 24 Jul 2007, Ingo Molnar wrote:
> > 
> > please try the patch below instead.
> 
> I'm hoping this is just a "let's see if the behavior changes" patch, 
> not something that you think should be applied if it fixes something?
> 
> This patch looks like it is trying to paper over (rather than fix) 
> some possible bug in the "->disable" logic. Makes sense as a "let's 
> see if it's that" kind of thing, but not as a "let's fix it".
> 
> Or am I missing something?

yeah - it's a totaly bad and unacceptable hack (i realized how bad it 
was when i wrote up that comment section ...), i just wanted to see 
which portion of ne2k/lib8390.c is sensitive to the fact whether an irq 
line is masked or not. The patch has no SOB line either.

the current best fix forward is to undo my original change, unless we 
find a better fix for this problem. (Note that the other patches posted 
in this thread are broken too: they only mask the irq but dont reliably 
unmask it.)

here's the current method of handling irqs for Marcin's card:

17: 12   IO-APIC-fasteoi   eth1, eth0

and fasteoi is a really simple sequence: no masking/unmasking by the 
flow handler itself but a NOP at entry and an APIC-EOI at the end. The 
disable/enable irq thing should thus have minimal effect if done within 
an irq handler.

now ne2k does something uncommon: for xmit (which is normally done 
outside of irq handlers) it will disable_irq_nosync()/enable_irq() the 
interrupt. It does it to exclude the handler from _that_ CPU, but due to 
the _nosync it does not exclude it from any other CPUs. So that's a bit 
weird already.

just in case, i've just re-checked all the genirq bits that change 
IRQ_DISABLED (that bit accidentally clear would be the only way to truly 
allow an IRQ handler to interrupt the disable_irq_nosync() critical 
section on that CPU) - but i can see no way for that to happen: we 
unconditionally detect and report unbalanced and underflowing 
irq_desc->depth, and the only other place (besides enable_irq()) that 
clears IRQ_DISABLED is __set_irq_handler(), and on x86 that is only used 
during bootup.

Marcin, could you try the patch below too? [without having any other 
patch applied.] It basically turns the critical section into an irqs-off 
critical section and thus checks whether your problem is related to that 
particular area of code.

Ingo

Index: linux/drivers/net/lib8390.c
===
--- linux.orig/drivers/net/lib8390.c
+++ linux/drivers/net/lib8390.c
@@ -297,9 +297,7 @@ static int ei_start_xmit(struct sk_buff 
 *  Slow phase with lock held.
 */
 
-   disable_irq_nosync_lockdep_irqsave(dev->irq, &flags);
-
-   spin_lock(&ei_local->page_lock);
+   spin_lock_irqsave(&ei_local->page_lock, flags);
 
ei_local->irqlock = 1;
 
@@ -376,8 +374,7 @@ static int ei_start_xmit(struct sk_buff 
ei_local->irqlock = 0;
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
-   spin_unlock(&ei_local->page_lock);
-   enable_irq_lockdep_irqrestore(dev->irq, &flags);
+   spin_unlock_irqrestore(&ei_local->page_lock, flags);
 
dev_kfree_skb (skb);
ei_local->stat.tx_bytes += send_length;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Al Boldi
Patrick McHardy wrote:
> Al Boldi wrote:
> > Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead
> > of NF_CONNTRACK.
>
> I guess so, and that would have to select NF_CONNTRACK.

Should I resend, or can you take care of it?


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] amso1100: QP init bug in amso driver

2007-07-24 Thread Tom Tucker
Roland:

The guys at UNH found this and fixed it. I'm surprised no
one has hit this before. I guess it only breaks when the 
refcount on the QP is non-zero.

Initialize the wait_queue_head_t in the c2_qp structure.

Signed-off-by: Ethan Burns <[EMAIL PROTECTED]>
Acked-by: Tom Tucker <[EMAIL PROTECTED]>

---
 drivers/infiniband/hw/amso1100/c2_qp.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c 
b/drivers/infiniband/hw/amso1100/c2_qp.c
index 420c138..01d0786 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -506,6 +506,7 @@ int c2_alloc_qp(struct c2_dev *c2dev,
qp->send_sgl_depth = qp_attrs->cap.max_send_sge;
qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge;
qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge;
+   init_waitqueue_head(&qp->wait);
 
/* Initialize the SQ MQ */
q_size = be32_to_cpu(reply->sq_depth);

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Add new timeval_to_sec function

2007-07-24 Thread David Stevens
Oliver Hartkopp <[EMAIL PROTECTED]> wrote on 07/23/2007 11:22:39 PM:

> When you like to create any timeout based on your calculated value, you
> might run into the problem that your calculated value is set to _zero_
> even if there was "some time" before the conversion. This might probably
> not what you indented to get.

BTW, these are stats (for human consumption), not timers.

+-DLS

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Adrian Bunk wrote:
> 
> > But do we 
> > care so much that it's worth inlining something like buffered_rmqueue()? 
> >...
> 
> Where is the problem with having buffered_rmqueue() inlined?

In this case, it was a pain to just even try to find the call chain, or 
read the asm.

I would encourage lots of kernel hackers to read the assembler code gcc 
generates. I suspect people being aware of code generation issues (and 
writing their code with that in mind) is a *much* bigger performance 
impact than gcc inlining random functions.

So maybe I'm old-fashioned and crazy, but "readability of the asm result" 
actually is a worthwhile goal. Not because we care directly, but because 
I'd like to encourage people to do it, due to the *indirect* benefits.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Andi Kleen wrote:
> 
> There's probably a --param where it can be tweaked exactly. The
> problem is that --params tend to be very gcc version specific
> and might do something completely different on a newer or 
> older version. So it's better not to use them.

I agree wholeheartedly with that sentiment. We've tried at times (because 
some gcc snapshots made some *truly* insane choices for a while), and 
maybe we still have some around. Not worth the pain.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Adrian Bunk
On Tue, Jul 24, 2007 at 12:15:48PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > 
> > fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
> > vmlinux .text from 928360 up to 955362 bytes (27k larger).
> > 
> > A surprisingly large increase - I wonder if it did something dumb.  It
> > appears to still correctly inline those things which we've manually marked
> > inline.  hm.
> 
> I think inlining small enough functions is worth it, and the thing is, the 
> kernel is actually pretty damn good at having lots of small functions. 
> It's one of the few things I really care about from a coding style 
> standpoint.
> 
> So I'm not surprised that "-fno-inline-functions-called-once" makes things 
> larger, because I think it's generally a good idea to inline things that 
> are just called once. But it does make things harder to debug, and the 
> performance advantages become increasingly small for bigger functions.
> 
> And that's a balancing act. Do we care about performance? Yes.

When using CONFIG_CC_OPTIMIZE_FOR_SIZE=y we even actively tell gcc that 
we only care about size and do not care about performance...

> But do we 
> care so much that it's worth inlining something like buffered_rmqueue()? 
>...

Where is the problem with having buffered_rmqueue() inlined?

>   Linus

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Andi Kleen
Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> So "called once" should probably make the inlining weight bigger (ie 
> inline *larger* functions than you would otherwise), it just shouldn't 
> make it "infinite". It's not worth it.

There's probably a --param where it can be tweaked exactly. The
problem is that --params tend to be very gcc version specific
and might do something completely different on a newer or 
older version. So it's better not to use them.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Ingo Molnar wrote:
> 
> please try the patch below instead.

I'm hoping this is just a "let's see if the behavior changes" patch, not 
something that you think should be applied if it fixes something?

This patch looks like it is trying to paper over (rather than fix) some 
possible bug in the "->disable" logic. Makes sense as a "let's see if it's 
that" kind of thing, but not as a "let's fix it".

Or am I missing something?

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] Implement batching skb API

2007-07-24 Thread jamal
KK,

On Tue, 2007-24-07 at 09:14 +0530, Krishna Kumar2 wrote:

> 
> J Hadi Salim <[EMAIL PROTECTED]> wrote on 07/23/2007 06:02:01 PM:


> Actually you have not sent netperf results with prep and without prep.

My results were based on pktgen (which i explained as testing the
driver). I think depending on netperf without further analysis is
simplistic. It was like me doing forwarding tests on these patches.

> > So _which_ non-LLTX driver doesnt do that? ;->
> 
> I have no idea since I haven't looked at all drivers. Can you tell which
> all non-LLTX drivers does that ? I stated this as the sole criterea.

The few i have peeked at all do it. I also think the e1000 should be
converted to be non-LLTX. The rest of netdev is screaming to kill LLTX. 

> > tun driver doesnt use it either - but i doubt that makes it "bloat"
> 
> Adding extra code that is currently not usable (esp from a submission
> point) is bloat.

So far i have converted 3 drivers, 1 of them doesnt use it. Two more
driver conversions are on the way, they will both use it. How is this
bloat again? 
A few emails back you said if only IPOIB can use batching then thats
good enough justification. 

> > You waltz in, have the luxury of looking at my code, presentations, many
> > discussions with me etc ...
> 
> "luxury" ? 
> I had implemented the entire thing even before knowing that you
> are working on something similar! and I had sent the first proposal to
> netdev,

I saw your patch at the end of may (or at least 2 weeks after you said
it existed). That patch has very little resemblance to what you just
posted conceptwise or codewise. I could post it if you would give me
permission.

> *after* which you told that you have your own code and presentations (which
> I had never seen earlier - I joined netdev a few months back, earlier I was
> working on RDMA, Infiniband as you know).

I am gonna assume you didnt know of my work - which i have been making
public for about 3 years. Infact i talked about this topic when i
visited your office in 2006 on a day you were not present, so it is
plausible you didnt hear of it.

>  And it didn't give me any great
> ideas either, remember I had posted results for E1000 at the time of
> sending the proposals. 

In mid-June you sent me a series of patches which included anything from
changing variable names to combining qdisc_restart and about everything
i referred to as being "cosmetic differences" in your posted patches. I
took two of those and incorporated them in. One was an "XXX" in my code
already to allocate the dev->blist 
(Commit: bb4464c5f67e2a69ffb233fcf07aede8657e4f63). 
The other one was a mechanical removal of the blist being passed
(Commit: 0e9959e5ee6f6d46747c97ca8edc91b3eefa0757). 
Some of the others i asked you to defer. For example, the reason i gave
you for not merging any qdisc_restart_combine changes is because i was
waiting for Dave to swallow the qdisc_restart changes i made; otherwise
maintainance becomes extremely painful for me. 
Sridhar actually provided a lot more valuable comments and fixes but has
not planted a flag on behalf of the queen of spain like you did. 

> However I do give credit in my proposal to you for what
> ideas that your provided (without actual code), and the same I did for other
> people who did the same, like Dave, Sridhar. BTW, you too had discussions 
> with me,
> and I sent some patches to improve your code too, 

I incorporated two of your patches and asked for deferal of others.
These patches have now shown up in what you claim as "the difference". I
just call them "cosmetic difference" not to downplay the importance of
having an ethtool interface but because they do not make batching
perform any better. The real differences are those two items. I am
suprised you havent cannibalized those changes as well. I thought you
renamed them to something else; according to your posting:
"This patch will work with drivers updated by Jamal, Matt & Michael Chan
with minor modifications - rename xmit_win to xmit_slots & rename batch
handler". Or maybe thats a "future plan" you have in mind?

> so it looks like a two
> way street to me (and that is how open source works and should).

Open source is a lot more transparent than that.

You posted a question, which was part of your research. I responded and
told you i have patches; you asked me for them and i promptly ported
them from pre-2.6.18 to the latest kernel at the time. 

The nature of this batching work is one of performance. So numbers are
important. If you had some strong disagreements on something in the
architecture, then it would be of great value to explain it in a
technical detail - and more importantly to provide some numbers to say
why it is a bad idea. You get numbers by running some tests. 
You did none of the above. Your effort has been to produce "your patch"
for whatever reasons. This would not have been problematic to me if it
actually was based within reasons of optimization because the end goal
would h

Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>Al Boldi wrote:
>>
>>>Patrick McHardy wrote:
>>>
But I vaguely recall having tried this myself and it broke somewhere,
maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
recall anymore. Al, if this also works without removal of
NF_CONNTRACK_ENABLED, please resend without that part.
>>>
>>>It doesn't.  But how about this, if you really can't live without
>>>NF_CONNTRACK_ENBLED:
>>>
>>>==
>>>--- Kconfig.old  2007-07-09 06:38:52.0 +0300
>>>+++ Kconfig  2007-07-24 20:24:27.0 +0300
>>>@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
>>>   and is also scheduled to replace the old syslog-based ipt_LOG
>>>   and ip6t_LOG modules.
>>>
>>>-# Rename this to NF_CONNTRACK in a 2.6.25
>>>-config NF_CONNTRACK_ENABLED
>>>+config NF_CONNTRACK
>>> tristate "Netfilter connection tracking support"
>>> help
>>>   Connection tracking keeps a record of what packets have passed
>>>@@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
>>>
>>>   To compile it as a module, choose M here.  If unsure, say N.
>>>
>>>-config NF_CONNTRACK
>>>+config NF_CONNTRACK_ENABLED
>>> tristate
>>>-default NF_CONNTRACK_ENABLED
>>>+default NF_CONNTRACK
>>>
>>> config NF_CT_ACCT
>>> bool "Connection tracking flow accounting"
>>
>>That defeats the only purpose why we kept it.
> 
> 
> I'm not sure how this would defeat the only purpose.  Isn't the purpose of 
> this to alias NF_CONNTRACK_ENABLED to NF_CONNTRACK?  And as such would yield 
> the same result.


The purpose is to avoid forcing people a second time to reconfigure
the conntrack options since we've completed nf_conntrack and removed
ip_conntrack. Previously NF_CONNTRACK was a bool (selecting the new
implementation) and NF_CONNTRACK_ENABLED specified whether to build
either nf_conntrack or ip_conntrack modular/static/not at all. So
old configs only have the information whether to build modular in
NF_CONNTRACK_ENABLED, but NF_CONNTRACK is what actually controls it.
With your change, old configs will still build nf_conntrack properly,
but they will always choose static linking.

> Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of 
> NF_CONNTRACK.

I guess so, and that would have to select NF_CONNTRACK.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Andrew Morton wrote:
> 
> fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
> vmlinux .text from 928360 up to 955362 bytes (27k larger).
> 
> A surprisingly large increase - I wonder if it did something dumb.  It
> appears to still correctly inline those things which we've manually marked
> inline.  hm.

I think inlining small enough functions is worth it, and the thing is, the 
kernel is actually pretty damn good at having lots of small functions. 
It's one of the few things I really care about from a coding style 
standpoint.

So I'm not surprised that "-fno-inline-functions-called-once" makes things 
larger, because I think it's generally a good idea to inline things that 
are just called once. But it does make things harder to debug, and the 
performance advantages become increasingly small for bigger functions.

And that's a balancing act. Do we care about performance? Yes. But do we 
care so much that it's worth inlining something like buffered_rmqueue()? 

So I would not be surprised if "-fno-inline-functions-called-once" will 
disable *all* the inlining heuristics, and say "oh, it's not an inline 
function, and it's only called once, so we won't inline it at all".

So "called once" should probably make the inlining weight bigger (ie 
inline *larger* functions than you would otherwise), it just shouldn't 
make it "infinite". It's not worth it.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Al Boldi
Patrick McHardy wrote:
> Al Boldi wrote:
> > Patrick McHardy wrote:
> >>But I vaguely recall having tried this myself and it broke somewhere,
> >>maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
> >>recall anymore. Al, if this also works without removal of
> >>NF_CONNTRACK_ENABLED, please resend without that part.
> >
> > It doesn't.  But how about this, if you really can't live without
> > NF_CONNTRACK_ENBLED:
> >
> > ==
> > --- Kconfig.old 2007-07-09 06:38:52.0 +0300
> > +++ Kconfig 2007-07-24 20:24:27.0 +0300
> > @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
> >   and is also scheduled to replace the old syslog-based ipt_LOG
> >   and ip6t_LOG modules.
> >
> > -# Rename this to NF_CONNTRACK in a 2.6.25
> > -config NF_CONNTRACK_ENABLED
> > +config NF_CONNTRACK
> > tristate "Netfilter connection tracking support"
> > help
> >   Connection tracking keeps a record of what packets have passed
> > @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
> >
> >   To compile it as a module, choose M here.  If unsure, say N.
> >
> > -config NF_CONNTRACK
> > +config NF_CONNTRACK_ENABLED
> > tristate
> > -   default NF_CONNTRACK_ENABLED
> > +   default NF_CONNTRACK
> >
> >  config NF_CT_ACCT
> > bool "Connection tracking flow accounting"
>
> That defeats the only purpose why we kept it.

I'm not sure how this would defeat the only purpose.  Isn't the purpose of 
this to alias NF_CONNTRACK_ENABLED to NF_CONNTRACK?  And as such would yield 
the same result.

Also, we could leave this as is, and select NF_CONNTRACK_ENABLED instead of 
NF_CONNTRACK.


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Andrew Morton
On Tue, 24 Jul 2007 11:14:21 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 24 Jul 2007, Adrian Bunk wrote:
> > 
> > buffered_rmqueue() and prep_new_page() are static functions with only 
> > one caller each, and for the normal non-debug case it's a really nice 
> > optimization to have them inlined automatically.
> 
> I'm not at all sure I agree.
> 
> Inlining big functions doesn't actually tend to generally generate any 
> better code, so if gcc's logic is "single callsite - always inline", then 
> that logic is likely not right.
> 

fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
vmlinux .text from 928360 up to 955362 bytes (27k larger).

A surprisingly large increase - I wonder if it did something dumb.  It
appears to still correctly inline those things which we've manually marked
inline.  hm.

It would be nice to defeat the autoinlining for debug purposes though.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Andrew Morton wrote:
> 
> I guess this was the bug:

Looks very likely to me. Mike, Alexey, does this fix things for you?

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: NET_DMA: where do we ever call dma_skb_copy_datagram_iovec() with NULL pinned_list?

2007-07-24 Thread Nelson, Shannon
Al Viro:
>
>   AFAICS, all callers of dma_skb_copy_datagram_iovec()
>are either
>   * recursive for fragments, pass pinned_list unchanged or
>   * called from tcp, with pinned_list coming from
>tp->ucopy.pinned_list and only when tp->ucopy.dma_chan is non-NULL.
>
>Now, all non-NULL assignments to ->dma_chan have the same form:
>   if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
>   tp->ucopy.dma_chan = get_softnet_dma();
>IOW, if ->ucopy.pinned_list stays NULL, ->ucopy.dma_chan will 
>do the same.
>
>Moreover, any place that resets ->ucopy.pinned_list will also reset
>->ucopy.dma_chan.
>
>IOW, we can't ever get non-NULL tp->ucopy.dma_chan while 
>tp->ucopy.pinned_list
>is NULL.  So how can we ever get to the dma_memcpy_to_kernel_iovec()?

It looks like this is "extra" code.  The history seems to be that it was
thought to be useful for internal copying, perhaps for smbfs or iSCSI,
as the comment suggests.  However, since no one is using it, it can
probably come out.  If there is no argument, I'll post a patch to remove
it.

sln
--
==
Mr. Shannon Nelson LAN Access Division, Intel Corp.
[EMAIL PROTECTED]I don't speak for Intel
(503) 712-7659Parents can't afford to be squeamish.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Linus Torvalds


On Tue, 24 Jul 2007, Adrian Bunk wrote:
> 
> buffered_rmqueue() and prep_new_page() are static functions with only 
> one caller each, and for the normal non-debug case it's a really nice 
> optimization to have them inlined automatically.

I'm not at all sure I agree.

Inlining big functions doesn't actually tend to generally generate any 
better code, so if gcc's logic is "single callsite - always inline", then 
that logic is likely not right.

Linus
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Adrian Bunk
On Mon, Jul 23, 2007 at 02:28:11PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 23 Jul 2007, Andrew Morton wrote:
> > 
> > It'd be nice to get a clean trace.  Are you able to obtain the full
> > trace with CONFIG_FRAME_POINTER=y?
> 
> If you are talking about
> 
>   http://userweb.kernel.org/~akpm/dsc03659.jpg
> 
> then I think that _is_ a full trace. It's certainly not very messy, and it 
> seems accurate. It's just that inlining makes it much harder to see the 
> call-graphs, but that's what inlining does..
> 
> For example, missing from the call graph is
> 
>   get_page_from_freelist ->
> buffered_rmqueue ->   [ missing - inlined ]
>   prep_new_page ->[ missing - inlined ]
> prep_zero_page -> [ missing - inlined ]
>   clear_highpage ->   [ missing - inlined ]
> kmap_atomic ->[ missing - tailcall ]
>   kmap_atomic_prot
> 
> but I'm pretty sure the call trace is good (and I'm also pretty sure gcc 
> is overly aggressive at inlining, and that it causes us pain for 
> debugging, but whatever)
>...

For prep_zero_page() and clear_highpage() we can't blame gcc since we 
force gcc to always inline them.

buffered_rmqueue() and prep_new_page() are static functions with only 
one caller each, and for the normal non-debug case it's a really nice 
optimization to have them inlined automatically. But it might make sense 
to add -fno-inline-functions-called-once to the CFLAGS depending on some 
debug option?

>   Linus

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Al Boldi wrote:
> Patrick McHardy wrote:
> 
>>But I vaguely recall having tried this myself and it broke somewhere,
>>maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
>>recall anymore. Al, if this also works without removal of
>>NF_CONNTRACK_ENABLED, please resend without that part.
> 
> 
> It doesn't.  But how about this, if you really can't live without 
> NF_CONNTRACK_ENBLED:
> 
> ==
> --- Kconfig.old   2007-07-09 06:38:52.0 +0300
> +++ Kconfig   2007-07-24 20:24:27.0 +0300
> @@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
> and is also scheduled to replace the old syslog-based ipt_LOG
> and ip6t_LOG modules.
>  
> -# Rename this to NF_CONNTRACK in a 2.6.25
> -config NF_CONNTRACK_ENABLED
> +config NF_CONNTRACK
>   tristate "Netfilter connection tracking support"
>   help
> Connection tracking keeps a record of what packets have passed
> @@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
>  
> To compile it as a module, choose M here.  If unsure, say N.
>  
> -config NF_CONNTRACK
> +config NF_CONNTRACK_ENABLED
>   tristate
> - default NF_CONNTRACK_ENABLED
> + default NF_CONNTRACK
>  
>  config NF_CT_ACCT
>   bool "Connection tracking flow accounting"


That defeats the only purpose why we kept it. How about we change this
once we remove it, in 2.6.25?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Al Boldi
Patrick McHardy wrote:
> Sam Ravnborg wrote:
> > On Tue, Jul 24, 2007 at 08:36:33AM +0300, Al Boldi wrote:
> >>Replaces NF_CONNTRACK_ENABLED with NF_CONNTRACK and selects it for
> >>NF_CONNTRACK_IPV4 and NF_CONNTRACK_IPV6
> >>
> >>This exposes IPv4/6 connection tracking options for easier Kconfig
> >> setup.
> >>
> >>Signed-off-by: Al Boldi <[EMAIL PROTECTED]>
> >>Cc: David Miller <[EMAIL PROTECTED]>
> >>Cc: Sam Ravnborg <[EMAIL PROTECTED]>
> >>Cc: Andrew Morton <[EMAIL PROTECTED]>
> >>---
> >>--- a/net/netfilter/Kconfig 2007-07-09 06:38:52.0 +0300
> >>+++ b/net/netfilter/Kconfig 2007-07-24 08:28:06.0 +0300
> >>@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
> >>  and is also scheduled to replace the old syslog-based ipt_LOG
> >>  and ip6t_LOG modules.
> >>
> >>-# Rename this to NF_CONNTRACK in a 2.6.25
> >>-config NF_CONNTRACK_ENABLED
> >>+config NF_CONNTRACK
>
> We kept this mainly for an easier upgrade. As the comment states, it
> should go in 2.6.25, at which time all people having reconfigured
> their kernel at least once since ip_conntrack was removed will have
> the NF_CONNTRACK option set to the same value as NF_CONNTRACK_ENABLED.
>
> >>--- a/net/ipv4/netfilter/Kconfig2007-07-09 06:38:50.0 +0300
> >>+++ b/net/ipv4/netfilter/Kconfig2007-07-24 08:27:39.0 +0300
> >>@@ -7,7 +7,7 @@ menu "IP: Netfilter Configuration"
> >>
> >> config NF_CONNTRACK_IPV4
> >>tristate "IPv4 connection tracking support (required for NAT)"
> >>-   depends on NF_CONNTRACK
> >>+   select NF_CONNTRACK
> >>---help---
> >>  Connection tracking keeps a record of what packets have passed
> >>  through your machine, in order to figure out how they are related
> >>--- a/net/ipv6/netfilter/Kconfig2007-07-09 06:38:51.0 +0300
> >>+++ b/net/ipv6/netfilter/Kconfig2007-07-24 08:27:54.0 +0300
> >>@@ -7,7 +7,8 @@ menu "IPv6: Netfilter Configuration (EXP
> >>
> >> config NF_CONNTRACK_IPV6
> >>tristate "IPv6 connection tracking support (EXPERIMENTAL)"
> >>-   depends on INET && IPV6 && EXPERIMENTAL && NF_CONNTRACK
> >>+   depends on INET && IPV6 && EXPERIMENTAL
> >>+   select NF_CONNTRACK
> >>---help---
> >>  Connection tracking keeps a record of what packets have passed
> >>  through your machine, in order to figure out how they are related
> >
> > This change looks wrong.
> > Due to the reverse nature of "select" kconfig cannot fulfill the
> > dependencies of selected symbols. So as a rule of thumb select should
> > only select symbols with no menu and no dependencies to avoid some of
> > the
> > problems that have popped up during the last months.
>
> In this case it looks OK since the dependencies of IPv4 connection
> tracking are (besides NF_CONNTRACK) are superset of those of
> nf_conntrack.
>
> But I vaguely recall having tried this myself and it broke somewhere,
> maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
> recall anymore. Al, if this also works without removal of
> NF_CONNTRACK_ENABLED, please resend without that part.

It doesn't.  But how about this, if you really can't live without 
NF_CONNTRACK_ENBLED:

==
--- Kconfig.old 2007-07-09 06:38:52.0 +0300
+++ Kconfig 2007-07-24 20:24:27.0 +0300
@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
  and is also scheduled to replace the old syslog-based ipt_LOG
  and ip6t_LOG modules.
 
-# Rename this to NF_CONNTRACK in a 2.6.25
-config NF_CONNTRACK_ENABLED
+config NF_CONNTRACK
tristate "Netfilter connection tracking support"
help
  Connection tracking keeps a record of what packets have passed
@@ -40,9 +39,9 @@ config NF_CONNTRACK_ENABLED
 
  To compile it as a module, choose M here.  If unsure, say N.
 
-config NF_CONNTRACK
+config NF_CONNTRACK_ENABLED
tristate
-   default NF_CONNTRACK_ENABLED
+   default NF_CONNTRACK
 
 config NF_CT_ACCT
bool "Connection tracking flow accounting"
==


Thanks!

--
Al

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: specifying scopid's for link-local IPv6 addrs

2007-07-24 Thread Rick Jones



You must explicitly specify the desired interface.  For example,
on my test system, the correct interface is eth6 which is interface 8
(lo eth0 eth1 eth2 ... eth5 eth6).  Here is an example nuttcp test
specifying interface 8:

chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8
 1178.5809 MB /  10.02 sec =  986.2728 Mbps 12 %TX 15 %RX

nuttcp uses getaddrinfo() which parses the "%" field,
and then copies the sin6_scope_id from the res structure to the
server's sockaddr_in6 structure before initiating the connect().



OK, I'll give that a quick try with netperf:

[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M 
-m 64K -H fe80::207:43ff:fe05:9d%2
TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::207:43ff:fe05:9d%2 (fe80::207:43ff:fe05:9d) port 0 AF_INET6 : 
+/-2.5% @ 99% conf.


Cool - it establishes the data connection just fine.


Well, I spoke too soon - while it got me past my EINVAL, the connection 
establishement timed-out.  Either I picked the wrong value for n, or I may yet 
need to make some tweaks to netperf.


rick jones
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: specifying scopid's for link-local IPv6 addrs

2007-07-24 Thread Sridhar Samudrala
On Tue, 2007-07-24 at 10:13 -0700, Rick Jones wrote:
> > Rick,
> > 
> > I don't see any way around this.  For example, on one of my test
> > systems, I have the following link local routes:
> > 
> > chance% netstat -A inet6 -rn | grep fe80::/64
> > fe80::/64   ::  
> > U 25600 eth0
> > fe80::/64   ::  
> > U 25600 eth2
> > fe80::/64   ::  
> > U 25600 eth3
> > fe80::/64   ::  
> > U 25600 eth4
> > fe80::/64   ::  
> > U 25600 eth5
> > fe80::/64   ::  
> > U 25600 eth6
> > 
> > So if I want to run a link local test to fe80::202:b3ff:fed4:cd1,
> > the system has no way to choose which is the correct interface to
> > use for the test, and will give an error if the interface isn't
> > specified. 
> 
> Yeah, I was wondering about that.  I'm not sure if the attempts on "those 
> other 
> OSes" happened to involve multiple interfaces or not.  Even so, it "feels" 
> unpleasant for an application to deal with and I wonder if there is a way for 
> a 
> stack to deal with it on the application's behalf.  I guess that might 
> involve 
> some sort of layer violation between neightbor discovery and routing (typing 
> while I think about things I know little about...)
> 
> Is there RFC chapter and verse I might read about routing with multiple 
> link-local's on a system?
> 
> > You must explicitly specify the desired interface.  For example,
> > on my test system, the correct interface is eth6 which is interface 8
> > (lo eth0 eth1 eth2 ... eth5 eth6).  Here is an example nuttcp test
> > specifying interface 8:
> > 
> > chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8
> >  1178.5809 MB /  10.02 sec =  986.2728 Mbps 12 %TX 15 %RX
> > 
> > nuttcp uses getaddrinfo() which parses the "%" field,
> > and then copies the sin6_scope_id from the res structure to the
> > server's sockaddr_in6 structure before initiating the connect().
> 
> OK, I'll give that a quick try with netperf:
> 
> [EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M 
> -m 64K 
> -H fe80::207:43ff:fe05:9d%2

We can even specify the interface name instead of the interface index
   %ethX

getaddrinfo() uses if_nametoindex() internally to get the index.

Thanks
Sridhar
> TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::207:43ff:fe05:9d%2 
> (fe80::207:43ff:fe05:9d) port 0 AF_INET6 : +/-2.5% @ 99% conf.
> 
> Cool - it establishes the data connection just fine.
> 
> 
> To further demonstrate my ignorance :)  is that %n suffix something one might 
> expect in most/all getaddrinfo()'s or is that unique to the one in Linux?
> 
> rick jones
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NET_DMA: where do we ever call dma_skb_copy_datagram_iovec() with NULL pinned_list?

2007-07-24 Thread Andrew Grover

On 7/20/07, Al Viro <[EMAIL PROTECTED]> wrote:

AFAICS, all callers of dma_skb_copy_datagram_iovec()
are either
* recursive for fragments, pass pinned_list unchanged or
* called from tcp, with pinned_list coming from
tp->ucopy.pinned_list and only when tp->ucopy.dma_chan is non-NULL.

Now, all non-NULL assignments to ->dma_chan have the same form:
if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
tp->ucopy.dma_chan = get_softnet_dma();
IOW, if ->ucopy.pinned_list stays NULL, ->ucopy.dma_chan will do the same.

Moreover, any place that resets ->ucopy.pinned_list will also reset
->ucopy.dma_chan.

IOW, we can't ever get non-NULL tp->ucopy.dma_chan while tp->ucopy.pinned_list
is NULL.  So how can we ever get to the dma_memcpy_to_kernel_iovec()?


Shannon what do you think? Is this a bug, or does it no longer support
non-userspace iovecs?

Thanks -- Andy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: specifying scopid's for link-local IPv6 addrs

2007-07-24 Thread Rick Jones

Rick,

I don't see any way around this.  For example, on one of my test
systems, I have the following link local routes:

chance% netstat -A inet6 -rn | grep fe80::/64
fe80::/64   ::  
U 25600 eth0
fe80::/64   ::  
U 25600 eth2
fe80::/64   ::  
U 25600 eth3
fe80::/64   ::  
U 25600 eth4
fe80::/64   ::  
U 25600 eth5
fe80::/64   ::  
U 25600 eth6

So if I want to run a link local test to fe80::202:b3ff:fed4:cd1,
the system has no way to choose which is the correct interface to
use for the test, and will give an error if the interface isn't
specified. 


Yeah, I was wondering about that.  I'm not sure if the attempts on "those other 
OSes" happened to involve multiple interfaces or not.  Even so, it "feels" 
unpleasant for an application to deal with and I wonder if there is a way for a 
stack to deal with it on the application's behalf.  I guess that might involve 
some sort of layer violation between neightbor discovery and routing (typing 
while I think about things I know little about...)


Is there RFC chapter and verse I might read about routing with multiple 
link-local's on a system?



You must explicitly specify the desired interface.  For example,
on my test system, the correct interface is eth6 which is interface 8
(lo eth0 eth1 eth2 ... eth5 eth6).  Here is an example nuttcp test
specifying interface 8:

chance% nuttcp -P5100 fe80::202:b3ff:fed4:cd1%8
 1178.5809 MB /  10.02 sec =  986.2728 Mbps 12 %TX 15 %RX

nuttcp uses getaddrinfo() which parses the "%" field,
and then copies the sin6_scope_id from the res structure to the
server's sockaddr_in6 structure before initiating the connect().


OK, I'll give that a quick try with netperf:

[EMAIL PROTECTED] ~]# netperf -H 192.168.2.107 -c -C -i 30,3 -- -s 1M -S 1M -m 64K 
-H fe80::207:43ff:fe05:9d%2
TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::207:43ff:fe05:9d%2 
(fe80::207:43ff:fe05:9d) port 0 AF_INET6 : +/-2.5% @ 99% conf.


Cool - it establishes the data connection just fine.


To further demonstrate my ignorance :)  is that %n suffix something one might 
expect in most/all getaddrinfo()'s or is that unique to the one in Linux?


rick jones
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: revised make xfrm_audit_log more generic patch

2007-07-24 Thread Steve Grubb
On Tuesday 24 July 2007 12:33:26 pm Joy Latten wrote:
> > It also wouldn't hurt to change the text being sent to this function to
> > have a hyphen instead of a space, so "SPD delete" becomes "SPD-delete".
> > This keeps the parser happy.
>
> Steve, more for my education, should all entries have this sort of
> syntax, that is, a hyphen in it?

Only if its something that is important to have associated in reports. More 
that 1 or 2 hyphens is probably not good.

> I imagine some entries might be a bit more wordy and so I was wondering
> ahead of time how to do it.

The audit logs should be short as possible but contain everything necessary. 
You can have language in the record that makes it more understandable for 
people reading the raw record, but it won't necessarily be picked up by 
report parsers for searching or presentation.

If you want me to help review the choices, let me know offline and we can work 
through it.

-Steve
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][30/37] Clean up duplicate includes in net/netfilter/

2007-07-24 Thread Jesper Juhl

On 24/07/07, Patrick McHardy <[EMAIL PROTECTED]> wrote:

Jesper Juhl wrote:
> This patch cleans up duplicate includes in
>   net/netfilter/


I've queued this one and the bridge-netfilter patch (27/37), thanks
Jesper.


Thanks for the feedback Patrick.

--
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please  http://www.expita.com/nomime.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Netfilter Kconfig: Expose IPv4/6 connection tracking options by selecting NF_CONNTRACK

2007-07-24 Thread Patrick McHardy
Sam Ravnborg wrote:
> On Tue, Jul 24, 2007 at 08:36:33AM +0300, Al Boldi wrote:
> 
>>Replaces NF_CONNTRACK_ENABLED with NF_CONNTRACK and selects it for 
>>NF_CONNTRACK_IPV4 and NF_CONNTRACK_IPV6
>>
>>This exposes IPv4/6 connection tracking options for easier Kconfig setup.
>>
>>Signed-off-by: Al Boldi <[EMAIL PROTECTED]>
>>Cc: David Miller <[EMAIL PROTECTED]>
>>Cc: Sam Ravnborg <[EMAIL PROTECTED]>
>>Cc: Andrew Morton <[EMAIL PROTECTED]>
>>---
>>--- a/net/netfilter/Kconfig   2007-07-09 06:38:52.0 +0300
>>+++ b/net/netfilter/Kconfig   2007-07-24 08:28:06.0 +0300
>>@@ -25,8 +25,7 @@ config NETFILTER_NETLINK_LOG
>>and is also scheduled to replace the old syslog-based ipt_LOG
>>and ip6t_LOG modules.
>> 
>>-# Rename this to NF_CONNTRACK in a 2.6.25
>>-config NF_CONNTRACK_ENABLED
>>+config NF_CONNTRACK


We kept this mainly for an easier upgrade. As the comment states, it
should go in 2.6.25, at which time all people having reconfigured
their kernel at least once since ip_conntrack was removed will have
the NF_CONNTRACK option set to the same value as NF_CONNTRACK_ENABLED.

>>--- a/net/ipv4/netfilter/Kconfig  2007-07-09 06:38:50.0 +0300
>>+++ b/net/ipv4/netfilter/Kconfig  2007-07-24 08:27:39.0 +0300
>>@@ -7,7 +7,7 @@ menu "IP: Netfilter Configuration"
>> 
>> config NF_CONNTRACK_IPV4
>>  tristate "IPv4 connection tracking support (required for NAT)"
>>- depends on NF_CONNTRACK
>>+ select NF_CONNTRACK
>>  ---help---
>>Connection tracking keeps a record of what packets have passed
>>through your machine, in order to figure out how they are related
>>--- a/net/ipv6/netfilter/Kconfig  2007-07-09 06:38:51.0 +0300
>>+++ b/net/ipv6/netfilter/Kconfig  2007-07-24 08:27:54.0 +0300
>>@@ -7,7 +7,8 @@ menu "IPv6: Netfilter Configuration (EXP
>> 
>> config NF_CONNTRACK_IPV6
>>  tristate "IPv6 connection tracking support (EXPERIMENTAL)"
>>- depends on INET && IPV6 && EXPERIMENTAL && NF_CONNTRACK
>>+ depends on INET && IPV6 && EXPERIMENTAL
>>+ select NF_CONNTRACK
>>  ---help---
>>Connection tracking keeps a record of what packets have passed
>>through your machine, in order to figure out how they are related
>>
> 
> This change looks wrong.
> Due to the reverse nature of "select" kconfig cannot fulfill the dependencies
> of selected symbols. So as a rule of thumb select should only select
> symbols with no menu and no dependencies to avoid some of the
> problems that have popped up during the last months.


In this case it looks OK since the dependencies of IPv4 connection
tracking are (besides NF_CONNTRACK) are superset of those of
nf_conntrack.

But I vaguely recall having tried this myself and it broke somewhere,
maybe it was because of the NF_CONNTRACK_ENABLED option, I can't
recall anymore. Al, if this also works without removal of
NF_CONNTRACK_ENABLED, please resend without that part.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] netxen: bug fixes for multiport adapters

2007-07-24 Thread Dhananjay Phadke

Jeff,

Any chance of these patches getting committed soon?

Thanks,
-Dhananjay Phadke

On 7/21/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

These patches include fix for problem with 2nd port of multiport adapters
on IBM blades. Also improves interrupt handling for multiport adapters
avoiding interrupt flood after interrupt is down.

Generated against upstream-fixes.

 drivers/net/netxen/netxen_nic.h  |3 +-
 drivers/net/netxen/netxen_nic_main.c |   85 +++---
 2 files changed, 43 insertions(+), 45 deletions(-)

--
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: revised make xfrm_audit_log more generic patch

2007-07-24 Thread Joy Latten
On Tue, 2007-07-24 at 11:04 -0400, Steve Grubb wrote:

> It also wouldn't hurt to change the text being sent to this function to have 
> a 
> hyphen instead of a space, so "SPD delete" becomes "SPD-delete". This keeps 
> the parser happy.
> 
Steve, more for my education, should all entries have this sort of
syntax, that is, a hyphen in it? I imagine some entries might be a 
bit more wordy and so I was wondering ahead of time how to do it.

Thanks!!

Joy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Brian Haley

Thomas Graf wrote:

@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
  struct genl_multicast_group *grp)
 {
-   BUG_ON(grp->family != family);
genl_lock();
-   netlink_clear_multicast_users(genl_sock, grp->id);
-   clear_bit(grp->id, mc_groups);
-   list_del(&grp->list);
-   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-   grp->id = 0;
-   grp->family = NULL;
+   genl_unregister_mc_group(family, grp);
genl_unlock();
 }


Shouldn't this be __genl_unregister_mc_group(family, grp) ?

-Brian
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e1000: repost work around 82571 completion timout on pseries HW

2007-07-24 Thread Andy Gospodarek
On Tue, Jul 24, 2007 at 08:31:08AM -0700, Kok, Auke wrote:
> Andy Gospodarek wrote:
> >There seems to have been some discussion about a patch like this in the
> >past but I still haven't noticed any platforms fixes or noticed that
> >this got included, so I'd like to propose this modified version.
> >
> >Thoughts?
> 
> I've written a completely new version for IBM based on the response that 
> implements this as a pci quirk (to tag the chipset, not the device).
> 
> However IBM has spotted an issue due to firmware hiding the id for linux of 
> the bridge involved (on pseries), and they're working on an alternative to 
> fix that.
> 
> So, the patch will look different, and I'll post it as soon as we have a 
> version for both pseries and xseries that works.
> 
> Auke
> 

Sounds great.  I just wanted to make sure this didn't slip through the
cracks.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Andrew Morton
On Tue, 24 Jul 2007 12:01:09 +0200 Mike Galbraith <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:
> 
> > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that 
> > out.
> 
> My box bugged during boot the first time I booted 23-rc1, but nothing
> made it to the console, and I didn't have a serial console running.  I
> didn't have DEBUG_PAGEALLOC or friends set.
> 
> > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > Both traces point up into the page allocator, but I _think_ that's stack
> > gunk.
> 
> I just enabled all debug options, and was just rewarded with the below.

doh.  It's a slab bug.

> [  119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
> [  119.558867] [ cut here ]
> [  119.572197] kernel BUG at arch/i386/mm/highmem.c:38!
> [  119.585804] invalid opcode:  [#1]
> [  119.598013] PREEMPT SMP DEBUG_PAGEALLOC
> [  119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp 
> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter 
> ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables 
> ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 
> snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer 
> snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore 
> snd_page_alloc i2c_i801 fan thermal processor
> [  119.698063] CPU:1
> [  119.698065] EIP:0060:[]Not tainted VLI
> [  119.698067] EFLAGS: 00010006   (2.6.23-rc1-smp #75)
> [  119.736358] EIP is at kmap_atomic_prot+0xa7/0xab
> [  119.749647] eax: 3d07f163   ebx: c166db80   ecx: c0750e60   edx: 0007
> [  119.765417] esi: 0022   edi: 0163   ebp: c069dcd4   esp: c069dcc8
> [  119.781273] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [  119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 
> task.ti=f477d000)
> [  119.804068] Stack: c166db80  c166db80 c069dcdc c011cd3f c069dd40 
> c015b6e0 0001 
> [  119.822272]0044 0163  0001 c165f4e0 0001 
> c165f4e0 0001 
> [  119.840762] 00028020 c061e71c c166db80 0046 0080 
> 0001 c011e4de 
> [  119.859389] Call Trace:
> [  119.881302]  [] show_trace_log_lvl+0x1a/0x30
> [  119.896319]  [] show_stack_log_lvl+0xa5/0xca
> [  119.911171]  [] show_registers+0x1fc/0x343
> [  119.925756]  [] die+0x122/0x249
> [  119.939241]  [] do_trap+0x84/0xad
> [  119.952897]  [] do_invalid_op+0x88/0x92
> [  119.967118]  [] error_code+0x72/0x78
> [  119.980948]  [] kmap_atomic+0xe/0x10
> [  119.994642]  [] get_page_from_freelist+0x39e/0x45e
> [  120.009485]  [] __alloc_pages+0x5b/0x2db
> [  120.023342]  [] cache_alloc_refill+0x380/0x6f2
> [  120.037623]  [] kmem_cache_alloc+0xa1/0xa5
> [  120.051426]  [] neigh_create+0x5f/0x506
> [  120.064894]  [] ndisc_dst_alloc+0x122/0x151
> [  120.078769]  [] __ndisc_send+0x8d/0x4fa
> [  120.092340]  [] ndisc_send_ns+0x5f/0x7d
> [  120.105848]  [] addrconf_dad_timer+0xdb/0xe0
> [  120.119758]  [] run_timer_softirq+0x130/0x191
> [  120.133717]  [] __do_softirq+0x76/0xe4
> [  120.147475]  [] do_softirq+0x63/0xac
> [  120.147488]  [] 


> (gdb) list *neigh_create+0x5f
> 0xc03fb397 is in neigh_create (include/linux/slab.h:259).
> 254 /*
> 255  * Shortcuts
> 256  */
> 257 static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t 
> flags)
> 258 {
> 259 return kmem_cache_alloc(k, flags | __GFP_ZERO);
> 260 }

See, networking's kmem_cache_alloc(..., __GFP_ZERO) ended up calling into
the page allocator with __GFP_ZERO.  This is the bug - slab isn't supposed
to do that: the __GFP_ZERO is supposed to be removed.

Now, it's not a highmem page, so prep_zero_page() won't actually establish
a kmap, but it will check that the kmap slot is presently unused on this
CPU.

But networking calls in here from softirq context (illegal for KM_USER0)
and sometimes that KM_USER0 slot *will* be in use, so kmap_atomic_prot()
will go BUG.

I must say it's really really scary that such a low-level function as
prep_zero_page() is using KM_USER0.  I don't think it has enough debugging
checks in there to prevent Bad Stuff from going undetected.

I guess this was the bug:

--- a/mm/slab.c~a
+++ a/mm/slab.c
@@ -2776,7 +2776,7 @@ static int cache_grow(struct kmem_cache 
 * 'nodeid'.
 */
if (!objp)
-   objp = kmem_getpages(cachep, flags, nodeid);
+   objp = kmem_getpages(cachep, local_flags, nodeid);
if (!objp)
goto failed;
 
_


I don't see why you later got fs corruption - afacit we won't actually
modify the KM_USER0 slot in this scenario.


> 262 /**
> 263  * kzalloc - allocate memory. The memory is set to zero.
> (gdb) list *kmem_cache_alloc+0xa1
> 0xc0172e7a is in kmem_cache_alloc (mm/slab.c:3176).
> 3171STATS_INC_ALLOCHIT(cachep);
> 3172   

Re: [PATCH]: revised make xfrm_audit_log more generic patch

2007-07-24 Thread Joy Latten
On Tue, 2007-07-24 at 11:04 -0400, Steve Grubb wrote:

> > +   audit_log_format(audit_buf, "%s: auid=%u", buf, auid);
> >  
> > if (sid != 0 &&
> > security_secid_to_secctx(sid, &secctx, &secctx_len) == 0)
> 
> The operation in buf will not be parsed by the user space tools. Let's 
> use "op=%s " where you have "%s: " above. Audit record fields are name=value 
> and fields separated by spaces. "op" is what we are using in other places to 
> mean operation. 
> 
> I know its a change from the records above, but we previously had some detail 
> about what operation was being performed by the record type and this did not 
> matter so much. Now that we only have one event type, the meaning of the 
> event being recorded needs to be parsable and in a field. 
> 
> It also wouldn't hurt to change the text being sent to this function to have 
> a 
> hyphen instead of a space, so "SPD delete" becomes "SPD-delete". This keeps 
> the parser happy.
> 
> This patch otherwise looks good.

Sounds good. I will make the changes and resend. 
Thanks!!

Joy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in xfrm_bundle_ok

2007-07-24 Thread Ken-ichirou MATSUZAWA
 Hello,

From: Ken-ichirou MATSUZAWA <[EMAIL PROTECTED]>
Subject: Oops in xfrm_bundle_ok
Date: Thu, 05 Jul 2007 02:47:10 +0900 (JST)

> I got Oops like below. I glanced xfrm_bundle_ok() in
> xfrm_policy.c and __xfrm4.bundle_create() in xfrm4_policy.c.

It seems this was fixed by below, thanks a lot.

> commit bd0bf0765ea1fba80d7085e1f0375ec045631dc1
> Author: Patrick McHardy <[EMAIL PROTECTED]>
> Date:   Wed Jul 18 01:55:52 2007 -0700
>
> [XFRM]: Fix crash introduced by struct dst_entry reordering


@@ -2141,7 +2141,7 @@ int xfrm_bundle_ok(struct xfrm_policy *pol, struct 
xfrm_dst *first,
if (last == first)
break;
 
-   last = last->u.next;
+   last = (struct xfrm_dst *)last->u.dst.next;
last->child_mtu_cached = mtu;
}
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][30/37] Clean up duplicate includes in net/netfilter/

2007-07-24 Thread Patrick McHardy
Jesper Juhl wrote:
> This patch cleans up duplicate includes in
>   net/netfilter/


I've queued this one and the bridge-netfilter patch (27/37), thanks
Jesper.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] e1000: repost work around 82571 completion timout on pseries HW

2007-07-24 Thread Kok, Auke

Andy Gospodarek wrote:

There seems to have been some discussion about a patch like this in the
past but I still haven't noticed any platforms fixes or noticed that
this got included, so I'd like to propose this modified version.

Thoughts?


I've written a completely new version for IBM based on the response that 
implements this as a pci quirk (to tag the chipset, not the device).


However IBM has spotted an issue due to firmware hiding the id for linux of the 
bridge involved (on pseries), and they're working on an alternative to fix that.


So, the patch will look different, and I'll post it as soon as we have a version 
for both pseries and xseries that works.


Auke




Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 e1000_hw.c |7 +++
 e1000_hw.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index 9be4469..7c75b8b 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -1030,6 +1030,13 @@ e1000_init_hw(struct e1000_hw *hw)
 break;
 }
 
+#if defined(CONFIG_PPC_PSERIES)

+if (hw->mac_type == e1000_82571) {
+uint32_t gcr = E1000_READ_REG(hw, GCR);
+gcr |= E1000_GCR_DISABLE_TIMEOUT_MECHANISM;
+E1000_WRITE_REG(hw, GCR, gcr);
+}
+#endif 
 
 if (hw->mac_type == e1000_82573) {

 uint32_t gcr = E1000_READ_REG(hw, GCR);
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index bd000b8..71f4e2d 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -2211,6 +2211,7 @@ struct e1000_host_command_info {
 #define PCI_EX_82566_SNOOP_ALL PCI_EX_NO_SNOOP_ALL
 
 #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x0800

+#define E1000_GCR_DISABLE_TIMEOUT_MECHANISM 0x8000
 /* Function Active and Power State to MNG */
 #define E1000_FACTPS_FUNC0_POWER_STATE_MASK 0x0003
 #define E1000_FACTPS_LAN0_VALID 0x0004

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] e1000: repost work around 82571 completion timout on pseries HW

2007-07-24 Thread Andy Gospodarek

There seems to have been some discussion about a patch like this in the
past but I still haven't noticed any platforms fixes or noticed that
this got included, so I'd like to propose this modified version.

Thoughts?

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 e1000_hw.c |7 +++
 e1000_hw.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index 9be4469..7c75b8b 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -1030,6 +1030,13 @@ e1000_init_hw(struct e1000_hw *hw)
 break;
 }
 
+#if defined(CONFIG_PPC_PSERIES)
+if (hw->mac_type == e1000_82571) {
+uint32_t gcr = E1000_READ_REG(hw, GCR);
+gcr |= E1000_GCR_DISABLE_TIMEOUT_MECHANISM;
+E1000_WRITE_REG(hw, GCR, gcr);
+}
+#endif 
 
 if (hw->mac_type == e1000_82573) {
 uint32_t gcr = E1000_READ_REG(hw, GCR);
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index bd000b8..71f4e2d 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -2211,6 +2211,7 @@ struct e1000_host_command_info {
 #define PCI_EX_82566_SNOOP_ALL PCI_EX_NO_SNOOP_ALL
 
 #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x0800
+#define E1000_GCR_DISABLE_TIMEOUT_MECHANISM 0x8000
 /* Function Active and Power State to MNG */
 #define E1000_FACTPS_FUNC0_POWER_STATE_MASK 0x0003
 #define E1000_FACTPS_LAN0_VALID 0x0004
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6.22-rc7] xfrm state selection update to use inner addresses

2007-07-24 Thread Patrick McHardy
Joakim Koskela wrote:
> This patch modifies the xfrm state selection logic to use the inner
> addresses where the outer have been (incorrectly) used. This is
> required for beet mode in general and interfamily setups in both
> tunnel and beet mode.


Looks good.

Acked-by: Patrick McHardy <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Dan Williams

On 7/24/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
[...]

> What about the new async crypto stuff? I've been looking, but is it
> guarenteed that async_memcpy() runs in process context with interrupts
> enabled always? If not, there's a km type bug there.

I think Shannon maintains that now.



I am looking after the async_tx API, I will send a patch to update
MAINTAINERS shortly.

--
Dan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Dan Williams

On 7/24/07, Jens Axboe <[EMAIL PROTECTED]> wrote:

What about the new async crypto stuff? I've been looking, but is it
guarenteed that async_memcpy() runs in process context with interrupts
enabled always? If not, there's a km type bug there.



Currently the only user is the MD raid456 driver, and yes, it only
performs copies from the handle_stripe routine which is always run in
process context with interrupts enabled.  However this is not
documented.  Would it be advisable to add a WARN_ON for this
condition?


In general, I think the highmem stuff could do with more safety checks:

- People ALWAYS get the atomic unmaps wrong, passing in the page instead
  of the address. I've seen tons of these. And since kunmap_atomic()
  takes a void pointer, nobody notices until it goes boom.
- People easily get the km type wrong - they use KM_USERx in interrupt
  context, or one of the irq variants without disabling interrupts.

If we could just catch these two types of bugs, we've got a lot of these
problems covered.


--
Jens Axboe


--
Dan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Add new timeval_to_sec function

2007-07-24 Thread Varun Chandramohan
Patrick McHardy wrote:
> Varun Chandramohan wrote:
>   
>> Oliver Hartkopp wrote:
>>
>> 
> I don't think you should round down timeout values.
>  
>
>  
>   
 Can you elaborate on that? As per the RFC of MIB ,we need only seconds
 granularity. Taking that as the case i dont understand why round down
 should not be done?
  

 
>>> When you like to create any timeout based on your calculated value, you
>>> might run into the problem that your calculated value is set to _zero_
>>> even if there was "some time" before the conversion. This might probably
>>> not what you indented to get.
>>>
>>> So what about rounding up with
>>>
>>> return (tv->tv_sec + (tv->tv_usec + 99)/100);
>>>
>>> ???
>>>
>>>  
>>>   
>> This can done.  Is this what you were ref to me, Patrick?
>> 
>
>
> Yes, timeouts should usually be at least as long as specified.
>   
Thanks Patrick and Oliver, ill round it up. :-)

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Add new timeval_to_sec function

2007-07-24 Thread Patrick McHardy
Varun Chandramohan wrote:
> Oliver Hartkopp wrote:
> 
I don't think you should round down timeout values.
  

  
>>>
>>>Can you elaborate on that? As per the RFC of MIB ,we need only seconds
>>>granularity. Taking that as the case i dont understand why round down
>>>should not be done?
>>>  
>>>
>>
>>When you like to create any timeout based on your calculated value, you
>>might run into the problem that your calculated value is set to _zero_
>>even if there was "some time" before the conversion. This might probably
>>not what you indented to get.
>>
>>So what about rounding up with
>>
>>return (tv->tv_sec + (tv->tv_usec + 99)/100);
>>
>>???
>>
>>  
> 
> This can done.  Is this what you were ref to me, Patrick?


Yes, timeouts should usually be at least as long as specified.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)

2007-07-24 Thread OBATA Noboru
From: Rick Jones <[EMAIL PROTECTED]>
Subject: Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)
Date: Thu, 12 Jul 2007 13:51:44 -0700

> > TCP's timeouts are perfectly fine, and the only thing you
> > might be showing above is that the application timeouts
> > are too short or that TCP needs notifications.
> 
> The application timeouts are probably being driven by external desires 
> for a given recovery time.

Agreed.

> TCP notifications don't solve anything unless the links in question are 
> local to the machine on which the TCP endpoint resides.

Agreed.  Thank you for a good explanation.

My original discussion using Dom-0 and Dom-U might be
misleading, but I was trying to say:

* Network failure and recovery(failover) are not necessarily
  visible locally.

  ** Dom-0 vs. Dom-U discussion is just an example of the case
 where a network failure is not visible locally.

  ** For another example, network switches or routers sitting
 somewhere in the middle of route are often duplicated with
 active-standby setting today.

* Quick response (retransmission) of TCP upon a recovery of such
  invisible devices as well is desired.

* If the failure and recovery are not visible locally, TCP
  notifications do not help.

Regards,

-- 
OBATA Noboru ([EMAIL PROTECTED])
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)

2007-07-24 Thread OBATA Noboru
From: Rick Jones <[EMAIL PROTECTED]>
Subject: Re: [PATCH 2.6.22] TCP: Make TCP_RTO_MAX a variable (take 2)
Date: Thu, 12 Jul 2007 15:27:30 -0700

> > So the problem is that RTO can grows to be twice the failover detection
> > time.  So back to the original mail, the scenario has a switch with failover
> > detection of 20seconds.  Worst case TCP RTO could grow to 40 seconds.
> > 
> > Going back in archive to original mail:
> > 
> > 
> >>Background
> >>==
> >>
> >>When designing a TCP/IP based network system on failover-capable
> >>network devices, people want to set timeouts hierarchically in
> >>three layers, network device layer, TCP layer, and application
> >>layer (bottom-up order), such that:
> >>
> >>1. Network device layer detects a failure first and switch to a
> >>   backup device (say, in 20sec).
> >>
> >>2. TCP layer timeout & retransmission comes next, _hopefully_
> >>   before the application layer timeout.
> >>
> >>3. Application layer detects a network failure last (by, say,
> >>   30sec timeout) and may trigger a system-level failover.
> > 
> > 
> > Sounds like the solution is to make the switch failover detection faster.
> > If you get switch failover down to 5sec then TCP RTO shouldn't be bigger
> > than 10sec, and application will survive.
> 
> That may indeed be the best solution, we'll have to wait to hear if 
> there is any freedom there.  When this sort of thing has crossed my path 
> in other contexts, the general answer is that the device failover time 
> is fixed, and the application layer time is similarly constrained by 
> end-user expectation/requirement.  Often as not, layer 8 and 9 issues 
> tend to dominate and expect to trump (in this case layer 4 issues).

I agree that application will survive if a user makes the
application timeout twice the failover timeout.  But I'm afraid
there is no such freedom.

Basically, to minimize downtime, shorter timeouts are preferred
as long as the probability of mis-detection is kept low at a
certain level.

In practice, failover timeouts for bonding, switches, or routers
are determined by heuristics.  Users know what timeout values and
retry counts of probe packets are suitable for detecting failure
of a certain combination of network equipments. (e.g., 5sec
timeout, retries 4 times)  Shorter is better.

And application timeout (or system timeout) is given as an
end-user requirement.  There is little change of negotiation,
really.  And again shorter (than requirement, if possible) is
better.

Regards,

-- 
OBATA Noboru ([EMAIL PROTECTED])
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in write_bulk_callback() in drivers/net/usb/pegasus.c

2007-07-24 Thread Petko Manolov

ACK :-)


cheers,
Petko


On Mon, 23 Jul 2007, Micah Gruber wrote:

This patch fixes a potential null dereference bug where we dereference 
pegasus before a null check. This patch simply moves the dereferencing after 
the null check.


Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>

---

--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -768,11 +768,13 @@
static void write_bulk_callback(struct urb *urb)
{
  pegasus_t *pegasus = urb->context;
-   struct net_device *net = pegasus->net;
+   struct net_device *net;

  if (!pegasus)
  return;

+   net = pegasus->net;
+
  if (!netif_device_present(net) || !netif_running(net))
  return;


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] virtio_net.c gso & feature support

2007-07-24 Thread Rusty Russell
Feedback welcome, as always!

(There's been talk of a virtualization git tree, in which case there'll
be a decent home for these patches soon).

Cheers,
Rusty.
==
Add feature and GSO support to virtio net driver.

If you don't do GSO, you can simply ignore the first sg element of
every outgoing packet, and tack a dummy one on every incoming.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/net/virtio_net.c   |  130 
 include/linux/virtio_net.h |   25 +++-
 2 files changed, 143 insertions(+), 12 deletions(-)

===
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -16,12 +16,13 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
-//#define DEBUG
+#define DEBUG
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /* FIXME: Make dynamic */
 #define MAX_PACKET_LEN (ETH_HLEN+ETH_DATA_LEN)
@@ -40,6 +41,18 @@ struct virtnet_info
struct sk_buff_head send;
 };
 
+static inline struct virtio_net_hdr *skb_vnet_hdr(struct sk_buff *skb)
+{
+   return (struct virtio_net_hdr *)skb->cb;
+}
+
+static void vnet_hdr_to_sg(struct scatterlist *sg, struct sk_buff *skb)
+{
+   sg->page = virt_to_page(skb_vnet_hdr(skb));
+   sg->offset = offset_in_page(skb_vnet_hdr(skb));
+   sg->length = sizeof(struct virtio_net_hdr);
+}
+
 static bool skb_xmit_done(struct virtqueue *vq)
 {
struct virtnet_info *vi = vq->priv;
@@ -52,12 +65,14 @@ static void receive_skb(struct net_devic
 static void receive_skb(struct net_device *dev, struct sk_buff *skb,
unsigned len)
 {
-   if (unlikely(len < ETH_HLEN)) {
+   struct virtio_net_hdr *hdr = skb_vnet_hdr(skb);
+
+   if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
-   dev_kfree_skb(skb);
-   return;
-   }
+   goto drop;
+   }
+   len -= sizeof(struct virtio_net_hdr);
BUG_ON(len > MAX_PACKET_LEN);
 
skb_trim(skb, len);
@@ -66,13 +81,70 @@ static void receive_skb(struct net_devic
 ntohs(skb->protocol), skb->len, skb->pkt_type);
dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;
+
+   if (hdr->flags & VIRTIO_NET_F_NEEDS_CSUM) {
+   pr_debug("Needs csum!\n");
+   skb->ip_summed = CHECKSUM_PARTIAL;
+   skb->csum_start = hdr->csum_start;
+   skb->csum_offset = hdr->csum_offset;
+   if (skb->csum_start > skb->len - 2
+   || skb->csum_offset > skb->len - 2) {
+   if (net_ratelimit())
+   printk(KERN_WARNING "%s: csum=%u/%u len=%u\n",
+  dev->name, skb->csum_start,
+  skb->csum_offset, skb->len);
+   goto frame_err;
+   }
+   }
+
+   if (hdr->gso_type != VIRTIO_NET_GSO_NONE) {
+   pr_debug("GSO!\n");
+   switch (hdr->gso_type) {
+   case VIRTIO_NET_GSO_TCP:
+   skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+   break;
+   case VIRTIO_NET_GSO_TCP_ECN:
+   skb_shinfo(skb)->gso_type = SKB_GSO_TCP_ECN;
+   break;
+   case VIRTIO_NET_GSO_UDP:
+   skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+   break;
+   case VIRTIO_NET_GSO_TCPV6:
+   skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+   break;
+   default:
+   if (net_ratelimit())
+   printk(KERN_WARNING "%s: bad gso type %u.\n",
+  dev->name, hdr->gso_type);
+   goto frame_err;
+   }
+
+   skb_shinfo(skb)->gso_size = hdr->gso_size;
+   if (skb_shinfo(skb)->gso_size == 0) {
+   if (net_ratelimit())
+   printk(KERN_WARNING "%s: zero gso size.\n",
+  dev->name);
+   goto frame_err;
+   }
+
+   /* Header must be checked, and gso_segs computed. */
+   skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
+   skb_shinfo(skb)->gso_segs = 0;
+   }
+
netif_receive_skb(skb);
+   return;
+
+frame_err:
+   dev->stats.rx_frame_errors++;
+drop:
+   dev_kfree_skb(skb);
 }
 
 static void try_fill_recv(struct virtnet_info *vi)
 {
struct sk_buff *skb;
-   struct scatterlist sg[MAX_SKB_FRAGS];
+   struct scatterlist sg[1+MAX_SKB_FRAGS];
int num, err;
 

Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-07-24 Thread Richard MUSIL
Thomas Graf wrote:
> Please provide a new overall patch which is not based on your
> initial patch so I can review your idea properly.

Here it goes (merging two previous patches). I have diffed
against v2.6.22, which I am using currently as my base:

 include/net/genetlink.h |1 +
 net/netlink/genetlink.c |  106 +++
 2 files changed, 80 insertions(+), 27 deletions(-)

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index b6eaca1..681ad13 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -25,6 +25,7 @@ struct genl_family
struct nlattr **attrbuf;/* private */
struct list_headops_list;   /* private */
struct list_headfamily_list;/* private */
+   struct mutexlock;   /* private */
 };
 
 /**
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index b9ab62f..0104267 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -38,6 +38,32 @@ static void genl_unlock(void)
genl_sock->sk_data_ready(genl_sock, 0);
 }
 
+static DEFINE_MUTEX(genl_fam_mutex);   /* serialization for family list 
management */
+
+static inline void genl_fam_lock(struct genl_family *family)
+{
+   mutex_lock(&genl_fam_mutex);
+   if (family)
+   mutex_lock(&family->lock);
+}
+
+static inline void genl_fam_unlock(struct genl_family *family)
+{
+   if (family)
+   mutex_unlock(&family->lock);
+   mutex_unlock(&genl_fam_mutex);
+}
+
+static inline void genl_onefam_lock(struct genl_family *family)
+{
+   mutex_lock(&family->lock);
+}
+
+static inline void genl_onefam_unlock(struct genl_family *family)
+{
+   mutex_unlock(&family->lock);
+}
+
 #define GENL_FAM_TAB_SIZE  16
 #define GENL_FAM_TAB_MASK  (GENL_FAM_TAB_SIZE - 1)
 
@@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct 
genl_ops *ops)
if (ops->policy)
ops->flags |= GENL_CMD_CAP_HASPOL;
 
-   genl_lock();
+   genl_fam_lock(family);
list_add_tail(&ops->ops_list, &family->ops_list);
-   genl_unlock();
+   genl_fam_unlock(family);
 
genl_ctrl_event(CTRL_CMD_NEWOPS, ops);
err = 0;
@@ -180,16 +206,16 @@ int genl_unregister_ops(struct genl_family *family, 
struct genl_ops *ops)
 {
struct genl_ops *rc;
 
-   genl_lock();
+   genl_fam_lock(family);
list_for_each_entry(rc, &family->ops_list, ops_list) {
if (rc == ops) {
list_del(&ops->ops_list);
-   genl_unlock();
+   genl_fam_unlock(family);
genl_ctrl_event(CTRL_CMD_DELOPS, ops);
return 0;
}
}
-   genl_unlock();
+   genl_fam_unlock(family);
 
return -ENOENT;
 }
@@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family)
goto errout;
 
INIT_LIST_HEAD(&family->ops_list);
+   mutex_init(&family->lock);
 
-   genl_lock();
+   genl_fam_lock(family);
 
if (genl_family_find_byname(family->name)) {
err = -EEXIST;
@@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family)
family->attrbuf = NULL;
 
list_add_tail(&family->family_list, genl_family_chain(family->id));
-   genl_unlock();
+   genl_fam_unlock(family);
 
genl_ctrl_event(CTRL_CMD_NEWFAMILY, family);
 
return 0;
 
 errout_locked:
-   genl_unlock();
+   genl_fam_unlock(family);
 errout:
return err;
 }
@@ -275,7 +302,7 @@ int genl_unregister_family(struct genl_family *family)
 {
struct genl_family *rc;
 
-   genl_lock();
+   genl_fam_lock(family);
 
list_for_each_entry(rc, genl_family_chain(family->id), family_list) {
if (family->id != rc->id || strcmp(rc->name, family->name))
@@ -283,14 +310,16 @@ int genl_unregister_family(struct genl_family *family)
 
list_del(&rc->family_list);
INIT_LIST_HEAD(&family->ops_list);
-   genl_unlock();
+
+   genl_fam_unlock(family);
+   mutex_destroy(&family->lock);
 
kfree(family->attrbuf);
genl_ctrl_event(CTRL_CMD_DELFAMILY, family);
return 0;
}
 
-   genl_unlock();
+   genl_fam_unlock(family);
 
return -ENOENT;
 }
@@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct 
nlmsghdr *nlh)
struct genlmsghdr *hdr = nlmsg_data(nlh);
int hdrlen, err;
 
+   genl_fam_lock(NULL);
family = genl_family_find_byid(nlh->nlmsg_type);
-   if (family == NULL)
+   if (family == NULL) {
+   genl_fam_unlock(NULL);
return -ENOENT;
+   }
+
+   /* get particular family lock, but release global family lock
+* so registering operati

Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Mike Galbraith
On Tue, 2007-07-24 at 12:01 +0200, Mike Galbraith wrote:
> On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:
> 
> > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that 
> > out.
> 
> My box bugged during boot the first time I booted 23-rc1, but nothing
> made it to the console, and I didn't have a serial console running.  I
> didn't have DEBUG_PAGEALLOC or friends set.
> 
> > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > Both traces point up into the page allocator, but I _think_ that's stack
> > gunk.
> 
> I just enabled all debug options, and was just rewarded with the below.

Hm.  I just also experienced filesystem corruption when I tried to send
from that kernel, and it bugged in the process.  My mount table ended up
in /etc/resolv.conf along with some binary goop, making nscd rather
unhappy after reboot.  fsck time.

.Mike

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

2007-07-24 Thread Mike Galbraith
On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:

> You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.

My box bugged during boot the first time I booted 23-rc1, but nothing
made it to the console, and I didn't have a serial console running.  I
didn't have DEBUG_PAGEALLOC or friends set.

> I haven't worked out where that kmap_atomic() call is coming from yet. 
> Both traces point up into the page allocator, but I _think_ that's stack
> gunk.

I just enabled all debug options, and was just rewarded with the below.

[  119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
[  119.558867] [ cut here ]
[  119.572197] kernel BUG at arch/i386/mm/highmem.c:38!
[  119.585804] invalid opcode:  [#1]
[  119.598013] PREEMPT SMP DEBUG_PAGEALLOC
[  119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp 
ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter 
ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables 
ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 
snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer 
snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore 
snd_page_alloc i2c_i801 fan thermal processor
[  119.698063] CPU:1
[  119.698065] EIP:0060:[]Not tainted VLI
[  119.698067] EFLAGS: 00010006   (2.6.23-rc1-smp #75)
[  119.736358] EIP is at kmap_atomic_prot+0xa7/0xab
[  119.749647] eax: 3d07f163   ebx: c166db80   ecx: c0750e60   edx: 0007
[  119.765417] esi: 0022   edi: 0163   ebp: c069dcd4   esp: c069dcc8
[  119.781273] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[  119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 
task.ti=f477d000)
[  119.804068] Stack: c166db80  c166db80 c069dcdc c011cd3f c069dd40 
c015b6e0 0001 
[  119.822272]0044 0163  0001 c165f4e0 0001 
c165f4e0 0001 
[  119.840762] 00028020 c061e71c c166db80 0046 0080 
0001 c011e4de 
[  119.859389] Call Trace:
[  119.881302]  [] show_trace_log_lvl+0x1a/0x30
[  119.896319]  [] show_stack_log_lvl+0xa5/0xca
[  119.911171]  [] show_registers+0x1fc/0x343
[  119.925756]  [] die+0x122/0x249
[  119.939241]  [] do_trap+0x84/0xad
[  119.952897]  [] do_invalid_op+0x88/0x92
[  119.967118]  [] error_code+0x72/0x78
[  119.980948]  [] kmap_atomic+0xe/0x10
[  119.994642]  [] get_page_from_freelist+0x39e/0x45e
[  120.009485]  [] __alloc_pages+0x5b/0x2db
[  120.023342]  [] cache_alloc_refill+0x380/0x6f2
[  120.037623]  [] kmem_cache_alloc+0xa1/0xa5
[  120.051426]  [] neigh_create+0x5f/0x506
[  120.064894]  [] ndisc_dst_alloc+0x122/0x151
[  120.078769]  [] __ndisc_send+0x8d/0x4fa
[  120.092340]  [] ndisc_send_ns+0x5f/0x7d
[  120.105848]  [] addrconf_dad_timer+0xdb/0xe0
[  120.119758]  [] run_timer_softirq+0x130/0x191
[  120.133717]  [] __do_softirq+0x76/0xe4
[  120.147475]  [] do_softirq+0x63/0xac
[  120.147488]  [] 

(gdb) list *neigh_create+0x5f
0xc03fb397 is in neigh_create (include/linux/slab.h:259).
254 /*
255  * Shortcuts
256  */
257 static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
258 {
259 return kmem_cache_alloc(k, flags | __GFP_ZERO);
260 }
261
262 /**
263  * kzalloc - allocate memory. The memory is set to zero.
(gdb) list *kmem_cache_alloc+0xa1
0xc0172e7a is in kmem_cache_alloc (mm/slab.c:3176).
3171STATS_INC_ALLOCHIT(cachep);
3172ac->touched = 1;
3173objp = ac->entry[--ac->avail];
3174} else {
3175STATS_INC_ALLOCMISS(cachep);
3176objp = cache_alloc_refill(cachep, flags);
3177}
3178return objp;
3179}
3180
(gdb) list *cache_alloc_refill+0x380
0xc0172872 is in cache_alloc_refill (include/linux/gfp.h:154).
149
150 /* Unknown node is current node */
151 if (nid < 0)
152 nid = numa_node_id();
153
154 return __alloc_pages(gfp_mask, order,
155 NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
156 }
157
158 #ifdef CONFIG_NUMA
(gdb) list *__alloc_pages+0x5b
0xc015b7fb is in __alloc_pages (mm/page_alloc.c:1248).
1243if (unlikely(*z == NULL)) {
1244/* Should this ever happen?? */
1245return NULL;
1246}
1247
1248page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
1249zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
1250if (page)
1251goto got_pg;
1252
(gdb) list *get_page_from_freelist+0x39e
0xc015b6e0 is in get_page_from_freelist (include/linux/highmem.h:122).
117 return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
118 }
119
120 static inline void clear_highpage(struct page *page)
121 {
122 void

  1   2   >