[tip:sched/core] sched/cputime: Remove extra cost in task_cputime()

2015-12-04 Thread tip-bot for Hiroshi Shimamoto
Commit-ID:  7877a0ba5ec63c7b0111b06c773f1696fa17b35a
Gitweb: http://git.kernel.org/tip/7877a0ba5ec63c7b0111b06c773f1696fa17b35a
Author: Hiroshi Shimamoto 
AuthorDate: Thu, 19 Nov 2015 16:47:29 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 4 Dec 2015 10:34:43 +0100

sched/cputime: Remove extra cost in task_cputime()

There is an extra cost in task_cputime() and task_cputime_scaled() when
nohz_full is not activated. When vtime accounting is not enabled, we
don't need to get the deltas of utime and stime under the vtime seqlock.

This patch removes that cost by adding a shortcut path that is taken
when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
running on some CPU; only in that case do we need to check the tickless
cputime delta.

Signed-off-by: Hiroshi Shimamoto 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Linus Torvalds 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E . McKenney 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1447948054-28668-3-git-send-email-fweis...@gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cputime.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 05de80b..1128d4b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -853,6 +853,14 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
 {
 	cputime_t udelta, sdelta;
 
+	if (!context_tracking_is_enabled()) {
+		if (utime)
+			*utime = t->utime;
+		if (stime)
+			*stime = t->stime;
+		return;
+	}
+
 	fetch_task_cputime(t, utime, stime, &t->utime,
 			   &t->stime, &udelta, &sdelta);
 	if (utime)
@@ -866,6 +874,14 @@ void task_cputime_scaled(struct task_struct *t,
 {
 	cputime_t udelta, sdelta;
 
+	if (!context_tracking_is_enabled()) {
+		if (utimescaled)
+			*utimescaled = t->utimescaled;
+		if (stimescaled)
+			*stimescaled = t->stimescaled;
+		return;
+	}
+
 	fetch_task_cputime(t, utimescaled, stimescaled,
 			   &t->utimescaled, &t->stimescaled, &udelta, &sdelta);
 	if (utimescaled)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

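The shortcut in the patch above can be sketched in user space. This is only a simplified model under stated assumptions: `struct task_model`, `tracking_enabled`, `seq_reads` and `task_cputime_model()` are hypothetical stand-ins, not kernel definitions, and the C11 sequence counter merely imitates the read side of the vtime seqlock in a single-threaded setting. It shows that the fast path returns the stored values without touching the sequence counter at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
typedef uint64_t cputime_model_t;

struct task_model {
	atomic_uint seq;         /* imitates t->vtime_seqlock (even = stable) */
	cputime_model_t utime;   /* user time accounted so far */
	cputime_model_t stime;   /* system time accounted so far */
	cputime_model_t udelta;  /* pending tickless user time */
	cputime_model_t sdelta;  /* pending tickless system time */
	unsigned int seq_reads;  /* counts slow-path passes, for illustration */
};

/* Models the result of context_tracking_is_enabled(). */
static bool tracking_enabled;

static void task_cputime_model(struct task_model *t,
			       cputime_model_t *utime, cputime_model_t *stime)
{
	unsigned int start;
	cputime_model_t u, s;

	/* Fast path: no vtime accounting anywhere, so no delta to add. */
	if (!tracking_enabled) {
		if (utime)
			*utime = t->utime;
		if (stime)
			*stime = t->stime;
		return;
	}

	/* Slow path: retry until a stable, even sequence number is seen. */
	do {
		start = atomic_load_explicit(&t->seq, memory_order_acquire);
		t->seq_reads++;
		u = t->utime + t->udelta;
		s = t->stime + t->sdelta;
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1u) ||
		 start != atomic_load_explicit(&t->seq, memory_order_relaxed));

	if (utime)
		*utime = u;
	if (stime)
		*stime = s;
}
```

With `tracking_enabled` false, a call leaves `seq_reads` at zero; with it true, each call makes at least one pass over the sequence counter, which is the cost the patch avoids on machines not running nohz_full.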

[tip:locking/core] sched/cputime: Fix invalid gtime in proc

2015-12-04 Thread tip-bot for Hiroshi Shimamoto
Commit-ID:  2541117b0cf79977fa11a0d6e17d61010677bd7b
Gitweb: http://git.kernel.org/tip/2541117b0cf79977fa11a0d6e17d61010677bd7b
Author: Hiroshi Shimamoto 
AuthorDate: Thu, 19 Nov 2015 16:47:28 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 4 Dec 2015 10:18:49 +0100

sched/cputime: Fix invalid gtime in proc

/proc/<pid>/stat shows an invalid gtime while the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated as now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is enabled at runtime.

This patch makes task_gtime() just return gtime, without computing the
bogus, non-existent tickless delta, when vtime accounting is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
running on some CPU; only in that case do we need to check the tickless
delta. This fixes the gtime value regression on machines not running
nohz_full.

The kernel was configured with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and booted without nohz_full.

I ran and then stopped a busy loop in a VM and watched gtime on the host.
The 43rd field, which holds gtime, is dumped every second:

 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759

While the busy loop is running, gtime reports a huge value.

After applying this patch, we see the right gtime:

 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726

Signed-off-by: Hiroshi Shimamoto 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Linus Torvalds 
Cc: Luiz Capitulino 
Cc: Mike Galbraith 
Cc: Paul E . McKenney 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweis...@gmail.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/cputime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 26a5446..05de80b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -788,6 +788,9 @@ cputime_t task_gtime(struct task_struct *t)
 	unsigned int seq;
 	cputime_t gtime;
 
+	if (!context_tracking_is_enabled())
+		return t->gtime;
+
 	do {
 		seq = read_seqbegin(&t->vtime_seqlock);
 

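The failure mode described in the commit message above can be modelled in a few lines. This is a sketch under assumptions, not the kernel code: `struct gtask_model`, `gtime_unpatched()` and `gtime_patched()` are hypothetical names, `in_guest` stands in for the PF_VCPU flag, and the time values are arbitrary units. It shows how a snapshot that is never updated turns `now - vtime_snap` into a huge delta, and how the early return avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields involved; not the kernel's task_struct. */
struct gtask_model {
	uint64_t gtime;       /* guest time accumulated by regular accounting */
	uint64_t vtime_snap;  /* last vtime snapshot; stale if vtime never ran */
	bool in_guest;        /* stands in for the PF_VCPU flag */
};

/* Before the fix: the tickless delta is always added for a guest thread. */
static uint64_t gtime_unpatched(const struct gtask_model *t, uint64_t now)
{
	uint64_t gtime = t->gtime;

	if (t->in_guest)
		gtime += now - t->vtime_snap;  /* bogus when vtime_snap is stale */
	return gtime;
}

/* After the fix: skip the delta when vtime accounting runs nowhere. */
static uint64_t gtime_patched(const struct gtask_model *t, uint64_t now,
			      bool tracking_enabled)
{
	if (!tracking_enabled)
		return t->gtime;
	return gtime_unpatched(t, now);
}
```

In this model a thread with `gtime = 4348`, a never-updated `vtime_snap = 0` and `in_guest` set reports `4348 + now` from the unpatched reader, which resembles the jump seen in the first trace, while the patched reader returns 4348 as long as tracking is disabled.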

RE: [PATCH v3 1/2] cputime: fix invalid gtime in proc

2015-11-10 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH v3 1/2] cputime: fix invalid gtime in proc
> 
> On Mon, Nov 02, 2015 at 05:13:51PM +0100, Peter Zijlstra wrote:
> > On Fri, Oct 30, 2015 at 12:46:39AM +, Hiroshi Shimamoto wrote:
> > > +++ b/kernel/sched/cputime.c
> > > @@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
> > >   unsigned int seq;
> > >   cputime_t gtime;
> > >
> > > + if (!context_tracking_is_enabled())
> > > + return t->gtime;
> > > +
> >
> > Yeah, not happy about that.. why do we have to touch context tracking
> > muck to find vtime state etc.
> 
> That's right, this is because it is deemed to be a quick and non invasive fix
> to be backported.
> 
> Then will come the more invasive but proper fix consisting in having
> vtime_accounting_enabled() telling if vtime is running on any CPU and
> vtime_accounting_cpu_enabled(). The first will be used for remote readers
> (as in this patch) and the second for writers.
> 
> Since we are dealing with a regression, it's better to minimize the changes.
> AFAICT, the regression got introduced in 2012:
> 
>   6a61671bb2f3a1bd12cd17b8fca811a624782632
>   ("cputime: Safely read cputime of full dynticks CPUs")

Is this patch going to be applied to fix the regression?

thanks,
Hiroshi

> 
> >
> > >   do {
> > >   seq = read_seqbegin(&t->vtime_seqlock);
> > >
> > > --
> > > 1.8.3.1
> > >


[PATCH v3 2/2] cputime: remove extra cost in task_cputime

2015-10-29 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

There is an extra cost in task_cputime() and task_cputime_scaled() when
nohz_full is not activated. This patch removes that cost. When vtime
accounting is not enabled, we don't need to get the deltas of utime and
stime under the seqlock.

This patch adds a shortcut path that is taken when vtime accounting is
not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on the current cpu. In the future we should check the state of
the cpu that is running the target thread.

Signed-off-by: Hiroshi Shimamoto 
Cc: sta...@vger.kernel.org
---
v3: this patch is newly added for related issue.

 kernel/sched/cputime.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 63904e7..ff0365d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -851,6 +851,14 @@ void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime)
 {
 	cputime_t udelta, sdelta;
 
+	if (!context_tracking_is_enabled()) {
+		if (utime)
+			*utime = t->utime;
+		if (stime)
+			*stime = t->stime;
+		return;
+	}
+
 	fetch_task_cputime(t, utime, stime, &t->utime,
 			   &t->stime, &udelta, &sdelta);
 	if (utime)
@@ -864,6 +872,14 @@ void task_cputime_scaled(struct task_struct *t,
 {
 	cputime_t udelta, sdelta;
 
+	if (!context_tracking_is_enabled()) {
+		if (utimescaled)
+			*utimescaled = t->utimescaled;
+		if (stimescaled)
+			*stimescaled = t->stimescaled;
+		return;
+	}
+
 	fetch_task_cputime(t, utimescaled, stimescaled,
 			   &t->utimescaled, &t->stimescaled, &udelta, &sdelta);
 	if (utimescaled)
-- 
1.8.3.1



[PATCH v3 1/2] cputime: fix invalid gtime in proc

2015-10-29 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

/proc/<pid>/stat shows an invalid gtime while the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated as now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is enabled.

This patch makes task_gtime() just return gtime when vtime accounting
is not enabled.

Use context_tracking_is_enabled() to check whether vtime accounting is
enabled on the current cpu. In the future we should check the state of
the cpu that is running the target thread.

The kernel was configured with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and booted without nohz_full.

I ran and then stopped a busy loop in a VM and watched gtime on the host.
The 43rd field, which holds gtime, is dumped every second:

 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759

While the busy loop is running, gtime reports a huge value.

After applying this patch, we see the right gtime:

 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726

Signed-off-by: Hiroshi Shimamoto 
Cc: sta...@vger.kernel.org
---
v2: Update ChangeLog to put the script and show the point.
v3: Use context_tracking_is_enabled() instead of vtime_accounting_enabled().

 kernel/sched/cputime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8cbc3db..63904e7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
 	unsigned int seq;
 	cputime_t gtime;
 
+	if (!context_tracking_is_enabled())
+		return t->gtime;
+
 	do {
 		seq = read_seqbegin(&t->vtime_seqlock);
 
-- 
1.8.3.1


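The awk one-liner in the changelog above picks out field 43 of the stat line, which proc(5) documents as guest_time. A C equivalent might look like the sketch below. The function name `stat_guest_time()` is hypothetical, and the parser assumes the usual layout where field 2 (the command name) is wrapped in parentheses and may itself contain spaces, so it resumes counting after the last `)`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return field 43 (guest_time) of one /proc/<pid>/stat line,
 * or -1 if the line has fewer than 43 fields or no ")" at all.
 * Hypothetical helper for illustration only.
 */
static long long stat_guest_time(const char *line)
{
	/* Field 2 may contain spaces, so restart after the last ')'. */
	const char *p = strrchr(line, ')');
	int field = 2;

	if (!p)
		return -1;
	p++;

	while (*p) {
		while (*p == ' ')
			p++;               /* skip separators */
		if (!*p)
			break;
		field++;                   /* p now points at the next field */
		if (field == 43)
			return strtoll(p, NULL, 10);
		while (*p && *p != ' ')
			p++;               /* skip over this field */
	}
	return -1;
}
```

A caller would read `/proc/<pid>/task/<tid>/stat` into a buffer with `fgets()` and pass the buffer to this function once per second to reproduce the traces shown in the changelog.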

RE: [PATCH v2] cputime: fix invalid gtime

2015-10-28 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH v2] cputime: fix invalid gtime
> 
> On Thu, Oct 29, 2015 at 01:10:01AM +, Hiroshi Shimamoto wrote:
> > > Obviously I completely messed up there. And task_cputime() has a similar issue
> > > but it happens to work due to vtime_snap_whence set to VTIME_SLEEPING when vtime
> > > doesn't run. Still it works at the cost of a seqcount read operation.
> > >
> > > Do you think you could fix it too (along with task_cputime_scaled())? I think those
> > > patches will also need a stable tag.
> >
> > Do you mean that task_cputime() and task_cputime_scaled() don't hit invalid behavior
> > but have some extra operation cost which could be removed?
> 
> Exactly.
> 
> >
> > Will look into it, and send patches with stable tag.
> 
> Thanks a lot!
> 
> Oh and another detail: vtime_accounting_enabled() checks if vtime
> accounting is done precisely on the current CPU. That's what we want to check
> when we account the time but not when we want to read the cputime of a task.
> 
> For example, CPU 0 never has vtime_accounting_enabled() because it plays the
> role of timekeeper and as such it keeps the tick periodic. So if task A runs on
> CPU 1 that has vtime accounting on, and we read the cputime of task A from CPU 0,
> vtime_accounting_enabled() will be false whereas we need to compute the delta.
> 
> So vtime_accounting_enabled() isn't suitable to check if vtime is running on _some_
> CPU such that we can't return utime/stime with a raw read.

I see the point: vtime accounting can be enabled on a dedicated CPU, and
there is no guarantee that the CPU of the reading thread is in the same state.

> 
> Ideally we should rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled()
> and have vtime_accounting_enabled() to check if vtime runs somewhere. But that would
> be too much an invasive change for a stable patch. So lets just use
> context_tracking_is_enabled() for now instead.

I have dug into the code.
My understanding is that vtime_accounting_enabled() checks the global flag
with context_tracking_is_enabled() and then checks the current CPU's state
with context_tracking_cpu_is_enabled(). For now, we check only the global
flag to fix the current issue, instead of checking both as
vtime_accounting_enabled() does. In the future we should make the check
more precise.
Is that correct?

thanks,
Hiroshi

> 
> I think task_gtime() is also buggy when the target task runs in a CPU that doesn't do
> vtime accounting (whereas another CPU does vtime accounting). We should fix that with
> using a new VTIME_GUEST value instead of (or along with) PF_VCPU. But that's another story,
> your fix is much more important for now.
> 
> >
> > thanks,
> > Hiroshi
> >
> > >
> > > Thanks!
> > >
> > >
> > >
> > > > do {
> > > > seq = read_seqbegin(&t->vtime_seqlock);
> > > >
> > > > --
> > > > 1.8.3.1
> > > >

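The distinction discussed in the exchange above, a global "vtime runs somewhere" flag versus a "vtime runs on this CPU" flag, can be illustrated with a toy model. Everything here is an assumption made for illustration: `tracking_global`, `tracking_cpu[]`, `global_check()` and `local_check()` are hypothetical names that only mimic the roles of context_tracking_is_enabled() and vtime_accounting_enabled() as they are described in the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 2

/* Hypothetical flags: "vtime runs on some CPU" and "vtime runs on CPU n". */
static bool tracking_global;
static bool tracking_cpu[NR_CPUS_MODEL];

/* Plays the role the thread gives to context_tracking_is_enabled(). */
static bool global_check(void)
{
	return tracking_global;
}

/* Plays the role the thread gives to vtime_accounting_enabled(). */
static bool local_check(int cpu)
{
	return tracking_global && tracking_cpu[cpu];
}

/*
 * A remote reader must add the tickless delta whenever the target task
 * may be running on a CPU that does vtime accounting. Deciding from the
 * reader's own CPU gives the wrong answer in the timekeeper case.
 */
static bool reader_must_add_delta(void)
{
	return global_check();
}
```

With CPU 0 acting as timekeeper (`tracking_cpu[0]` false) and CPU 1 doing vtime accounting, `local_check(0)` is false even though a task on CPU 1 has a pending delta, while `reader_must_add_delta()` is true. That is the case the global check is meant to cover.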

RE: [PATCH v2] cputime: fix invalid gtime

2015-10-28 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH v2] cputime: fix invalid gtime
> 
> On Wed, Oct 28, 2015 at 07:01:18AM +, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > /proc/stats shows invalid gtime when the thread is running in guest.
> > When vtime accounting is not enabled, we cannot get a valid delta.
> > The delta is calculated now - tsk->vtime_snap, but tsk->vtime_snap
> > is only updated when vtime accounting is enabled.
> >
> > This patch makes task_gtime() just return gtime when vtime accounting
> > is not enabled.
> >
> > The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
> > CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
> >
> > I ran and stop a busy loop in VM and see the gtime in host.
> > Dump the 43rd field which shows the gtime in every second.
> >  # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
> > S 4348
> > R 7064566
> > R 7064766
> > R 7064967
> > R 7065168
> > S 4759
> > S 4759
> >
> > During running busy loop, it returns large value.
> >
> > After applying this patch, we can see right gtime.
> >
> >  # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
> > S 5338
> > R 5365
> > R 5465
> > R 5566
> > R 5666
> > S 5726
> > S 5726
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > ---
> >  kernel/sched/cputime.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> > index 8cbc3db..f614ee9 100644
> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
> > unsigned int seq;
> > cputime_t gtime;
> >
> > +   if (!vtime_accounting_enabled())
> > +   return t->gtime;
> > +
> 
> Obviously I completely messed up there. And task_cputime() has a similar issue
> but it happens to work due to vtime_snap_whence set to VTIME_SLEEPING when vtime
> doesn't run. Still it works at the cost of a seqcount read operation.
> 
> Do you think you could fix it too (along with task_cputime_scaled())? I think those
> patches will also need a stable tag.

Do you mean that task_cputime() and task_cputime_scaled() don't hit invalid
behavior but have some extra operation cost which could be removed?

I will look into it and send patches with a stable tag.

thanks,
Hiroshi

> 
> Thanks!
> 
> 
> 
> > do {
> > seq = read_seqbegin(&t->vtime_seqlock);
> >
> > --
> > 1.8.3.1
> >


[PATCH v2] cputime: fix invalid gtime

2015-10-28 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

/proc/<pid>/stat shows an invalid gtime while the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated as now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is enabled.

This patch makes task_gtime() just return gtime when vtime accounting
is not enabled.

The kernel was configured with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and booted without nohz_full.

I ran and then stopped a busy loop in a VM and watched gtime on the host.
The 43rd field, which holds gtime, is dumped every second:

 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759

While the busy loop is running, gtime reports a huge value.

After applying this patch, we see the right gtime:

 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726

Signed-off-by: Hiroshi Shimamoto 
---
 kernel/sched/cputime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8cbc3db..f614ee9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
 	unsigned int seq;
 	cputime_t gtime;
 
+	if (!vtime_accounting_enabled())
+		return t->gtime;
+
 	do {
 		seq = read_seqbegin(&t->vtime_seqlock);
 
-- 
1.8.3.1



RE: [PATCH] cputime: fix invalid gtime

2015-09-17 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH] cputime: fix invalid gtime
> 
> On Thu, Sep 03, 2015 at 12:45:50AM +, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > /proc/stats shows invalid gtime when the thread is running in guest.
> 
> Why is this a problem?

On the host, when I monitored the guest's CPU usage, I noticed that the
reported CPU time was not stable.

> 
> > When vtime accounting is not enabled, we cannot get a valid delta.
> > Just return gtime when vtime accounting is not enabled in task_gtime().
> 
> But isn't other stuff then also broken, like fetch_task_cputime(). Tell
> me more about why you think your patch is the right one.

No. I think vtime_snap_whence stays at VTIME_SLEEPING until
vtime_accounting_enabled() returns true, so no delta is added there.

> 
> > Before
> > 10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 3776 460 0 0 20 0 24 0 11960 8090398720 151288 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 3554 0 0 0 0 0 0 0 0 0
> 
> It would have been helpful if you'd used a small script to take out the
> right column. As is I've no clue which field to look at.

Sorry for the inconvenience; my explanation was too short.
I watched /proc/<pid>/stat.
While the guest is busy, gtime shows a huge number.

10987 (qemu-kvm) S ... 3554 ...
10987 (qemu-kvm) R ... 21415 ...

I kept it busy. After 1 second:

10987 (qemu-kvm) R ... 21616 ...

A few seconds later, I stopped the busy program in the guest.
gtime went back to a sane value.

10987 (qemu-kvm) S ... 4084 ...


thanks,
Hiroshi



RE: [PATCH] cputime: fix invalid gtime

2015-09-17 Thread Hiroshi Shimamoto
Hi,

have you had time to look at it?

thanks,
Hiroshi

> Subject: [PATCH] cputime: fix invalid gtime
> 
> From: Hiroshi Shimamoto 
> 
> /proc/stats shows invalid gtime when the thread is running in guest.
> When vtime accounting is not enabled, we cannot get a valid delta.
> 
> Just return gtime when vtime accounting is not enabled in task_gtime().
> 
> Before
> 10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 3776 460 0 0 20 0 
> 24 0 11960 8090398720 151288 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 3554 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 3776 460 0 0 20 0 
> 17 0 11960 8031649792 150268 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 3554 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 3843 460 0 0 20 0 
> 17 0 11960 8031649792 150268 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21415 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 3943 460 0 0 20 0 
> 17 0 11960 8031649792 150268 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21616 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4044 460 0 0 20 0 
> 17 0 11960 8031649792 150268 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21816 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4144 460 0 0 20 0 
> 17 0 11960 8031649792 150268 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 22017 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4245 460 0 0 20 0 
> 11 0 11960 7981293568 149758 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 22218 0 
> 0 0 0 0 0 0 0 0
> 10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 4308 460 0 0 20 0 
> 11 0 11960 7981293568 149758 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 4084 0 
> 0 0 0 0 0 0 0 0
> 
> After
> 10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 202 0 0 0 2858 30 0 0 20 0 29 
> 0 7676 8511279104 148187 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2874 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 202 0 0 0 2858 30 0 0 20 0 29 
> 0 7676 8511279104 148187 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2874 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 2936 30 0 0 20 0 29 
> 0 7676 8511279104 148187 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2952 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3037 30 0 0 20 0 29 
> 0 7676 8511279104 152184 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3052 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3137 30 0 0 20 0 29 
> 0 7676 8511279104 152184 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3152 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3237 30 0 0 20 0 27 
> 0 7676 8511279104 152188 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3252 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 203 0 0 0 3262 31 0 0 20 0 11 
> 0 7676 8393781248 151156 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3277 0 
> 0 0 0 0 0 0 0 0
> 10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 203 0 0 0 3262 31 0 0 20 0 11 
> 0 7676 8393781248 151156 18446744073709551615
> 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3277 0 
> 0 0 0 0 0 0 0 0
> 
> Signed-off-by: Hiroshi Shimamoto 
> ---
>  kernel/sched/cputime.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 8cbc3db..f614ee9 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
>   unsigned int seq;
>   cputime_t gtime;
> 
> + if (!vtime_accounting_enabled())
> + return t->gtime;
> +
>   do {
>   seq = read_seqbegin(&t->vtime_seqlock);
> 
> --
> 1.8.3.1
> 

[PATCH] cputime: fix invalid gtime

2015-09-02 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

/proc/<pid>/stat shows an invalid gtime while the thread is running in a
guest. When vtime accounting is not enabled, we cannot get a valid delta.

Just return gtime from task_gtime() when vtime accounting is not enabled.

Before
10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 3776 460 0 0 20 0 24 0 11960 8090398720 151288 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 3554 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 3776 460 0 0 20 0 17 0 11960 8031649792 150268 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 3554 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 3843 460 0 0 20 0 17 0 11960 8031649792 150268 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21415 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 3943 460 0 0 20 0 17 0 11960 8031649792 150268 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21616 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4044 460 0 0 20 0 17 0 11960 8031649792 150268 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 21816 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4144 460 0 0 20 0 17 0 11960 8031649792 150268 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 22017 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) R 1 10923 10923 0 -1 138428624 7521 0 90 0 4245 460 0 0 20 0 11 0 11960 7981293568 149758 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 22218 0 0 0 0 0 0 0 0 0
10987 (qemu-kvm) S 1 10923 10923 0 -1 138428608 7521 0 90 0 4308 460 0 0 20 0 11 0 11960 7981293568 149758 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 9 0 0 0 4084 0 0 0 0 0 0 0 0 0

After
10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 202 0 0 0 2858 30 0 0 20 0 29 0 7676 8511279104 148187 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2874 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 202 0 0 0 2858 30 0 0 20 0 29 0 7676 8511279104 148187 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2874 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 2936 30 0 0 20 0 29 0 7676 8511279104 148187 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 2952 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3037 30 0 0 20 0 29 0 7676 8511279104 152184 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3052 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3137 30 0 0 20 0 29 0 7676 8511279104 152184 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3152 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) R 1 10792 10792 0 -1 138428624 203 0 0 0 3237 30 0 0 20 0 27 0 7676 8511279104 152188 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3252 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 203 0 0 0 3262 31 0 0 20 0 11 0 7676 8393781248 151156 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3277 0 0 0 0 0 0 0 0 0
10845 (qemu-kvm) S 1 10792 10792 0 -1 138428608 203 0 0 0 3262 31 0 0 20 0 11 0 7676 8393781248 151156 18446744073709551615 1 1 0 0 0 0 2147220671 4096 25155 18446744073709551615 0 0 -1 3 0 0 0 3277 0 0 0 0 0 0 0 0 0

Signed-off-by: Hiroshi Shimamoto 
---
 kernel/sched/cputime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8cbc3db..f614ee9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -786,6 +786,9 @@ cputime_t task_gtime(struct task_struct *t)
 	unsigned int seq;
 	cputime_t gtime;
 
+	if (!vtime_accounting_enabled())
+		return t->gtime;
+
 	do {
 		seq = read_seqbegin(&t->vtime_seqlock);
 
-- 
1.8.3.1



RE: [Intel-wired-lan] [PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-27 Thread Hiroshi Shimamoto
> Subject: Re: [Intel-wired-lan] [PATCH v5] ixgbe: Add module parameter to disable VLAN filter
> 
> On 05/26/2015 06:11 PM, Hiroshi Shimamoto wrote:
> >> On 05/21/2015 06:10 AM, Hiroshi Shimamoto wrote:
> >>> From: Hiroshi Shimamoto 
> >>>
> >>> Introduce module parameter "disable_hw_vlan_filter" to disable HW VLAN
> >>> filter on ixgbe module load.
> >>>
> >>>   From the hardware limitation, there are only 64 VLAN entries for HW VLAN
> >>> filter, and it leads to limit the number of VLANs up to 64 among PF and
> >>> VFs. For SDN/NFV case, we need to handle unlimited VLAN packets on VF.
> >>> In such case, every VLAN packet can be transmitted to each VF.
> >>>
> >>> When we try to make VLAN devices on VF, the 65th VLAN registration fails
> >>> and never be able to receive a packet with that VLAN tag.
> >>> If we do the below command on VM, ethX.65 to ethX.100 cannot be created.
> >>> # for i in `seq 1 100`; do \
> >>>   ip link add link ethX name ethX.$i type vlan id $i; done
> >>>
> >>> There is a capability to disable HW VLAN filter and that makes all VLAN
> >>> tagged packets can be transmitted to every VFs. After VLAN filter stage,
> >>> unicast packets are transmitted to VF which has the MAC address same as
> >>> the transmitting packet.
> >>>
> >>> With this patch and "disable_hw_vlan_filter=1", we can use unlimited
> >>> number of VLANs on VF.
> >>>
> >>> Disabling HW VLAN filter breaks some NIC features such as DCB and FCoE.
> >>> DCB and FCoE are disabled when HW VLAN filter is disabled by this module
> >>> parameter.
> >>> Because of that reason, the administrator has to know that before turning
> >>> off HW VLAN filter.
> >> You might also want to note that it makes the system susceptible to
> >> broadcast/multicast storms since it eliminates any/all VLAN isolation.
> >> So a broadcast or multicast packet on one VLAN is received on ALL
> >> interfaces regardless of their VLAN configuration. In addition the
> >> current VF driver is likely to just receive the packet as untagged, see
> >> ixgbevf_process_skb_fields().  As a result one or two VFs can bring the
> >> entire system to a crawl by saturating the PCIe bus via
> >> broadcast/multicast traffic since there is nothing to prevent them from
> >> talking to each other over VLANs that are no longer there.
> > that's right.
> >
> > On the other hand, an untagged packet is not isolated,
> > doesn't it same broadcast/multicast storm on untagged network?
> 
> Yes, that is one of the reasons for VLANs.  It provides isolation so
> that if you have two entities on the same network you won't have entity
> A able to talk to entity B.  The problem is with VLAN promiscuous
> enabled if entity B is a VF it will see the traffic but has no way to
> know that it was VLAN tagged and a part of entity A's VLAN.

Sorry, I think my question was unclear.
Occupying the PCIe bus with broadcast/multicast packets causes performance
degradation. The VLAN filter can isolate traffic and reduce PCIe bus usage,
but untagged broadcast/multicast traffic is still a problem, I think.
What is the difference between tagged and untagged packets in that respect?

> 
> >
> >> For the sake of backwards compatibility I would say that a feature like
> >> this should be mutually exclusive with SR-IOV as well since it will
> >> cause erratic behavior.  The VF will receive requests from all VLANs
> >> thinking the traffic is untagged, and then send replies back to VLAN 0
> >> even though that isn't where the message originated.
> > Sorry, I couldn't catch the above part.
> > Could you explain a bit more?
> >
> > thanks,
> > Hiroshi
> >
> >> Until the VF issue
> >> is fixed this type of feature is a no-go.
> >
> 
> The current behavior for a VF is that if it receives a VLAN that it
> didn't request it assumes it is operating in port VLAN mode.  The
> problem is with your patch the VF will be receiving all traffic but have
> no idea which VLAN it came from.  As a result it could be replying to
> multicast or broadcast requests on one VLAN with the wrong VLAN ID.
> 
> The VLAN behavior of the VF drivers will need to be fixed before
> something like that could be supported with ANY of the VFs.  As such you
> will probably need to fix the VF driver in order to allow any of them to
> come online when VLAN filtering is disabled, as the driver will need to
> report the VLAN tag ID up to the stack.

Thanks, that explanation is clearer; I think I understand the issue now.
I need to check the exact behavior on my box to understand it correctly; will do.

thanks,
Hiroshi
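
For readers following the thread, the VF issue described above can be reduced to a small userspace model. This is only a sketch, not ixgbevf code: `struct frame` and `reply_vid()` are hypothetical names, and the model assumes a reply is sent on whatever VLAN ID the receive path reported to the stack. If the receive path drops the tag, the reply goes out on VLAN 0 regardless of where the request came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a received frame; not a kernel structure. */
struct frame {
	uint16_t vid;	/* VLAN ID carried on the wire, 0 = untagged */
};

/*
 * VLAN ID the receiver would use for its reply.
 * report_tag == false models a driver that hands the frame to the
 * stack as untagged; report_tag == true models a driver that reports
 * the tag it received.
 */
static uint16_t reply_vid(const struct frame *rx, bool report_tag)
{
	assert(rx != NULL);
	return report_tag ? rx->vid : 0;
}
```

With a request received on VLAN 100, the first model answers on VLAN 0 and the second on VLAN 100, which is the mismatch the review points out.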

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [Intel-wired-lan] [PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-26 Thread Hiroshi Shimamoto
> On 05/21/2015 06:10 AM, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > Introduce module parameter "disable_hw_vlan_filter" to disable HW VLAN
> > filter on ixgbe module load.
> >
> >  From the hardware limitation, there are only 64 VLAN entries for HW VLAN
> > filter, and it leads to limit the number of VLANs up to 64 among PF and
> > VFs. For SDN/NFV case, we need to handle unlimited VLAN packets on VF.
> > In such case, every VLAN packet can be transmitted to each VF.
> >
> > When we try to make VLAN devices on VF, the 65th VLAN registration fails
> > and never be able to receive a packet with that VLAN tag.
> > If we do the below command on VM, ethX.65 to ethX.100 cannot be created.
> ># for i in `seq 1 100`; do \
> >  ip link add link ethX name ethX.$i type vlan id $i; done
> >
> > There is a capability to disable HW VLAN filter and that makes all VLAN
> > tagged packets can be transmitted to every VFs. After VLAN filter stage,
> > unicast packets are transmitted to VF which has the MAC address same as
> > the transmitting packet.
> >
> > With this patch and "disable_hw_vlan_filter=1", we can use unlimited
> > number of VLANs on VF.
> >
> > Disabling HW VLAN filter breaks some NIC features such as DCB and FCoE.
> > DCB and FCoE are disabled when HW VLAN filter is disabled by this module
> > parameter.
> > Because of that reason, the administrator has to know that before turning
> > off HW VLAN filter.
> 
> You might also want to note that it makes the system susceptible to
> broadcast/multicast storms since it eliminates any/all VLAN isolation.
> So a broadcast or multicast packet on one VLAN is received on ALL
> interfaces regardless of their VLAN configuration. In addition the
> current VF driver is likely to just receive the packet as untagged, see
> ixgbevf_process_skb_fields().  As a result one or two VFs can bring the
> entire system to a crawl by saturating the PCIe bus via
> broadcast/multicast traffic since there is nothing to prevent them from
> talking to each other over VLANs that are no longer there.

that's right.

On the other hand, untagged packets are not isolated either.
Can't the same broadcast/multicast storm happen on an untagged network?

> 
> For the sake of backwards compatibility I would say that a feature like
> this should be mutually exclusive with SR-IOV as well since it will
> cause erratic behavior.  The VF will receive requests from all VLANs
> thinking the traffic is untagged, and then send replies back to VLAN 0
> even though that isn't where the message originated.

Sorry, I couldn't catch the above part.
Could you explain a bit more?

thanks,
Hiroshi

> Until the VF issue
> is fixed this type of feature is a no-go.



RE: [PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-21 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH v5] ixgbe: Add module parameter to disable VLAN filter
> 
> From: Hiroshi Shimamoto 
> Date: Thu, 21 May 2015 13:10:49 +
> 
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index 263cb40..b45570f 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -158,6 +158,10 @@ module_param(allow_unsupported_sfp, uint, 0);
> >  MODULE_PARM_DESC(allow_unsupported_sfp,
> >  "Allow unsupported and untested SFP+ modules on 82599-based 
> > adapters");
> >
> > +static unsigned int disable_hw_vlan_filter;
> > +module_param(disable_hw_vlan_filter, uint, 0);
> > +MODULE_PARM_DESC(disable_hw_vlan_filter, "Disable HW VLAN filter");
> 
> Sorry, module parameters like this are not allowed.
> 
> You must use a generic, portable interface, to configure networking
> device settings.

Could you please tell me which interface is good for this?

> 
> Otherwise every other driver that wants to do something similar will
> have yet another module option with a different name, and every user
> will suffer because they will need to learn a different mechanism
> to perform this configuration for every driver.

Right, I agree.
But I thought this requirement was quite special and specific to the
ixgbe driver; that is the reason I tried it with a module parameter.

thanks,
Hiroshi



[PATCH v5] ixgbe: Add module parameter to disable VLAN filter

2015-05-21 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Introduce module parameter "disable_hw_vlan_filter" to disable the HW VLAN
filter on ixgbe module load.

Due to a hardware limitation, there are only 64 VLAN entries in the HW VLAN
filter, which limits the number of VLANs to 64 across the PF and VFs.
For the SDN/NFV case, we need to handle an unlimited number of VLANs on a VF.
In that case, every VLAN packet may be delivered to each VF.

When we try to create VLAN devices on a VF, the 65th VLAN registration fails
and the VF is never able to receive a packet with that VLAN tag.
If we run the command below in a VM, ethX.65 to ethX.100 cannot be created.
  # for i in `seq 1 100`; do \
      ip link add link ethX name ethX.$i type vlan id $i; done

There is a capability to disable the HW VLAN filter, which allows all VLAN
tagged packets to be delivered to every VF. After the VLAN filter stage,
unicast packets are delivered to the VF whose MAC address matches the
destination address of the packet.

With this patch and "disable_hw_vlan_filter=1", we can use an unlimited
number of VLANs on a VF.

Disabling the HW VLAN filter breaks some NIC features such as DCB and FCoE.
DCB and FCoE are disabled when the HW VLAN filter is disabled by this module
parameter.
For that reason, the administrator has to be aware of this before turning
off the HW VLAN filter.

Signed-off-by: Hiroshi Shimamoto 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 29 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5181a4d..492615d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -632,6 +632,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG_FCOE_ENABLED (u32)(1 << 21)
 #define IXGBE_FLAG_SRIOV_CAPABLE(u32)(1 << 22)
 #define IXGBE_FLAG_SRIOV_ENABLED(u32)(1 << 23)
+#define IXGBE_FLAG_VLAN_FILTER_ENABLED  (u32)(1 << 24)
 
u32 flags2;
 #define IXGBE_FLAG2_RSC_CAPABLE (u32)(1 << 0)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 263cb40..b45570f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -158,6 +158,10 @@ module_param(allow_unsupported_sfp, uint, 0);
 MODULE_PARM_DESC(allow_unsupported_sfp,
                  "Allow unsupported and untested SFP+ modules on 82599-based adapters");
 
+static unsigned int disable_hw_vlan_filter;
+module_param(disable_hw_vlan_filter, uint, 0);
+MODULE_PARM_DESC(disable_hw_vlan_filter, "Disable HW VLAN filter");
+
 #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK)
 static int debug = -1;
 module_param(debug, int, 0);
@@ -4159,6 +4163,9 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
hw->addr_ctrl.user_set_promisc = false;
}
 
+   if (!(adapter->flags & IXGBE_FLAG_VLAN_FILTER_ENABLED))
+   vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
+
/*
 * Write addresses to available RAR registers, if there is not
 * sufficient space to store all the addresses then enable
@@ -5251,6 +5258,22 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 #endif /* CONFIG_IXGBE_DCB */
 #endif /* IXGBE_FCOE */
 
+   if (likely(!disable_hw_vlan_filter)) {
+   /* HW VLAN filter is enabled by default */
+   adapter->flags |= IXGBE_FLAG_VLAN_FILTER_ENABLED;
+   } else {
+   e_dev_warn("Disabling HW VLAN filter. "
+  "DCB and FCoE are also disabled.\n");
+#ifdef IXGBE_FCOE
+   /* Disabling FCoE */
+   adapter->flags &= ~IXGBE_FLAG_FCOE_CAPABLE;
+   adapter->flags &= ~IXGBE_FLAG_FCOE_ENABLED;
+#ifdef CONFIG_IXGBE_DCB
+   adapter->fcoe.up = 0;
+#endif /* CONFIG_IXGBE_DCB */
+#endif /* IXGBE_FCOE */
+   }
+
adapter->mac_table = kzalloc(sizeof(struct ixgbe_mac_addr) *
 hw->mac.num_rar_entries,
 GFP_ATOMIC);
@@ -7733,6 +7756,9 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
+   /* Unable to use DCB if HW VLAN filter is disabled */
+   if (!(adapter->flags & IXGBE_FLAG_VLAN_FILTER_ENABLED))
+   tc = 0;
if (tc) {
netdev_set_num_tc(dev, tc);
ixgbe_set_prio_tc_map(adapter);
@@ -8562,7 +8588,8 @@ skip_sriov:
}
 
netdev->hw_features |= NETIF_F_RXALL;
-   netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+   if 

RE: [E1000-devel] [PATCH v3 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-04-08 Thread Hiroshi Shimamoto
> Subject: Re: [E1000-devel] [PATCH v3 3/3] ixgbe: Add new ndo to allow VF 
> multicast promiscuous mode
> 
> On Wed, 2015-04-08 at 15:15 -0700, Alexander Duyck wrote:
> > On 04/07/2015 10:38 PM, Hiroshi Shimamoto wrote:
> > > From: Hiroshi Shimamoto 
> > >
> > > Implements the new netdev op to allow VF multicast promiscuous mode.
> > >
> > > The multicast promiscuous mode is not allowed for all VFs by default.
> > >
> > > The administrator can allow to VF multicast promiscuous mode for only
> > > trusted VM. After allowing multicast promiscuous mode from the host,
> > > we can use over 30 IPv6 addresses on VM.
> > >  # ip link set dev eth0 vf 1 mc_promisc on
> > >
> > > When disallowing multicast promiscuous mode, ixgbevf can only handle 30
> > > IPv6 addresses at most.
> > >  # ip link set dev eth0 vf 1 mc_promisc off
> > >
> > > Signed-off-by: Hiroshi Shimamoto 
> > > Reviewed-by: Hayato Momma 
> > > CC: Choi, Sy Jong 
> > > ---
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  7 ++
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 32 
> > > --
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  2 ++
> > >  4 files changed, 40 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
> > > b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > > index 08e65b6..4a9f74d 100644
> > > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > > @@ -153,6 +153,7 @@ struct vf_data_storage {
> > >   u16 vlan_count;
> > >   u8 spoofchk_enabled;
> > >   bool rss_query_enabled;
> > > + u8 mc_promisc_allowed;
> > >   unsigned int vf_api;
> > >  };
> > >
> > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> > > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > index 2f41403..c0e07c5 100644
> > > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > @@ -3663,6 +3663,12 @@ static void ixgbe_configure_virtualization(struct 
> > > ixgbe_adapter *adapter)
> > >   ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
> > > 
> > > adapter->vfinfo[i].rss_query_enabled);
> > >   }
> > > +
> > > + /* Reconfigure multicast promiscuous mode */
> > > + for (i = 0; i < adapter->num_vfs; i++) {
> > > + ixgbe_ndo_set_vf_mc_promisc(adapter->netdev, i,
> > > + adapter->vfinfo[i].mc_promisc_allowed);
> > > + }
> > >  }
> > >
> > >  static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
> >
> > This doesn't need to be a separate loop.  You can push it up into the
> > block above since it is already looping through all VFs.
> >
> > Once that is fixed the rest of this patch and the other two looked fine
> > to me.
> 
> Hiroshi I am dropping this series and will await v4.
> 
> Remember to send them to intel-wired-lan mailing list, please.

yep, will do.

thanks,
Hiroshi
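
The review comment above (fold the new loop into the existing per-VF loop) might look roughly like the following. This is a standalone model, not the driver source: `struct adapter`, `struct vf_state` and the two stub setters are hypothetical stand-ins for `ixgbe_adapter`, `vf_data_storage` and the `ixgbe_ndo_set_vf_*` helpers, and `MAX_VFS` is an arbitrary bound chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VFS 64	/* illustrative bound for this sketch only */

/* Hypothetical stand-in for the per-VF fields touched in the patch. */
struct vf_state {
	bool rss_query_enabled;
	bool mc_promisc_allowed;
	bool rss_applied;	/* written by the stub below */
	bool mc_applied;	/* written by the stub below */
};

struct adapter {
	int num_vfs;
	struct vf_state vfinfo[MAX_VFS];
};

/* Stubs standing in for the ndo helpers named in the patch. */
static void set_vf_rss_query_en(struct adapter *a, int vf, bool on)
{
	a->vfinfo[vf].rss_applied = on;
}

static void set_vf_mc_promisc(struct adapter *a, int vf, bool on)
{
	a->vfinfo[vf].mc_applied = on;
}

/* A single pass over the VFs reapplies both settings. */
static void reconfigure_vfs(struct adapter *a)
{
	assert(a->num_vfs >= 0 && a->num_vfs <= MAX_VFS);

	for (int i = 0; i < a->num_vfs; i++) {
		set_vf_rss_query_en(a, i, a->vfinfo[i].rss_query_enabled);
		set_vf_mc_promisc(a, i, a->vfinfo[i].mc_promisc_allowed);
	}
}
```

The behavior is the same as with two loops; the only difference is that each VF is visited once.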


[PATCH v3 2/3] if_link: Add VF multicast promiscuous control

2015-04-07 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Add netlink directives and ndo entry to allow VF multicast promiscuous mode.

This controls the permission to enter VF multicast promiscuous mode.
The administrator explicitly allows multicast promiscuous mode per VF.

When a VF is in multicast promiscuous mode, all multicast packets are
delivered to that VF.

Do not allow VF multicast promiscuous mode if the VM is not fully trusted.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 include/linux/if_link.h  |  1 +
 include/linux/netdevice.h|  3 +++
 include/uapi/linux/if_link.h |  6 ++
 net/core/rtnetlink.c | 19 +--
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index da49299..df212f4 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -15,5 +15,6 @@ struct ifla_vf_info {
__u32 min_tx_rate;
__u32 max_tx_rate;
__u32 rss_query_en;
+   __u32 mc_promisc;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fc4da22..a444e1d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -873,6 +873,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
  *   int max_tx_rate);
  * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
+ * int (*ndo_set_vf_mc_promisc)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  * int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
@@ -1094,6 +1095,8 @@ struct net_device_ops {
   int max_tx_rate);
int (*ndo_set_vf_spoofchk)(struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_set_vf_mc_promisc)(struct net_device *dev,
+int vf, bool setting);
int (*ndo_get_vf_config)(struct net_device *dev,
 int vf,
 struct ifla_vf_info *ivf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d9cd192..44c3bbe 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -468,6 +468,7 @@ enum {
IFLA_VF_RSS_QUERY_EN,   /* RSS Redirection Table and Hash Key query
 * on/off switch
 */
+   IFLA_VF_MC_PROMISC, /* Multicast Promiscuous allow/disallow */
__IFLA_VF_MAX,
 };
 
@@ -517,6 +518,11 @@ struct ifla_vf_rss_query_en {
__u32 setting;
 };
 
+struct ifla_vf_mc_promisc {
+   __u32 vf;
+   __u32 setting;
+};
+
 /* VF ports management section
  *
  * Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 74431d6..f247bf2 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -819,7 +819,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
 nla_total_size(sizeof(struct ifla_vf_rate)) +
 nla_total_size(sizeof(struct ifla_vf_link_state)) +
-nla_total_size(sizeof(struct ifla_vf_rss_query_en)));
+nla_total_size(sizeof(struct ifla_vf_rss_query_en)) +
+nla_total_size(sizeof(struct ifla_vf_mc_promisc)));
return size;
} else
return 0;
@@ -1134,6 +1135,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_link_state vf_linkstate;
struct ifla_vf_rss_query_en vf_rss_query_en;
+   struct ifla_vf_mc_promisc vf_mc_promisc;
 
/*
 * Not all SR-IOV capable drivers support the
@@ -1143,6 +1145,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 */
ivi.spoofchk = -1;
ivi.rss_query_en = -1;
+   ivi.mc_promisc = -1;
memset(ivi.mac, 0, sizeof(ivi.mac));
/* The default value for VF link state is "auto"
 * IFLA_VF_LINK_STATE_AUTO which equals zero
@@ -1156,7 +1159,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
vf_tx_rate.vf =
  

[PATCH v3 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-04-07 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Implement the new netdev op to allow VF multicast promiscuous mode.

Multicast promiscuous mode is disallowed for all VFs by default.

The administrator can allow VF multicast promiscuous mode for trusted
VMs only. After multicast promiscuous mode is allowed from the host,
the VM can use more than 30 IPv6 addresses.
 # ip link set dev eth0 vf 1 mc_promisc on

When multicast promiscuous mode is disallowed, ixgbevf can handle at most
30 IPv6 addresses.
 # ip link set dev eth0 vf 1 mc_promisc off

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  7 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 32 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  2 ++
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 08e65b6..4a9f74d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -153,6 +153,7 @@ struct vf_data_storage {
u16 vlan_count;
u8 spoofchk_enabled;
bool rss_query_enabled;
+   u8 mc_promisc_allowed;
unsigned int vf_api;
 };
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2f41403..c0e07c5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3663,6 +3663,12 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
                ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
                                              adapter->vfinfo[i].rss_query_enabled);
}
+
+   /* Reconfigure multicast promiscuous mode */
+   for (i = 0; i < adapter->num_vfs; i++) {
+   ixgbe_ndo_set_vf_mc_promisc(adapter->netdev, i,
+   adapter->vfinfo[i].mc_promisc_allowed);
+   }
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -8165,6 +8171,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_set_vf_rate= ixgbe_ndo_set_vf_bw,
.ndo_set_vf_spoofchk= ixgbe_ndo_set_vf_spoofchk,
.ndo_set_vf_rss_query_en = ixgbe_ndo_set_vf_rss_query_en,
+   .ndo_set_vf_mc_promisc  = ixgbe_ndo_set_vf_mc_promisc,
.ndo_get_vf_config  = ixgbe_ndo_get_vf_config,
.ndo_get_stats64= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 615f651..42b24a0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -117,8 +117,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
 */
adapter->vfinfo[i].rss_query_enabled = 0;
 
-   /* Turn multicast promiscuous mode off for all VFs */
+   /* Disallow VF multicast promiscuous capability
+* and turn it off for all VFs
+*/
adapter->vfinfo[i].mc_promisc = false;
+   adapter->vfinfo[i].mc_promisc_allowed = false;
}
 
return 0;
@@ -1068,7 +1071,7 @@ static int ixgbe_set_vf_mc_promisc(struct ixgbe_adapter *adapter,
 
adapter->vfinfo[vf].mc_promisc = enable;
 
-   if (enable)
+   if (enable && adapter->vfinfo[vf].mc_promisc_allowed)
return ixgbe_enable_vf_mc_promisc(adapter, vf);
else
return ixgbe_disable_vf_mc_promisc(adapter, vf);
@@ -1492,6 +1495,30 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
return 0;
 }
 
+int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool setting)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(netdev);
+
+   if (vf >= adapter->num_vfs)
+   return -EINVAL;
+
+   /* nothing to do */
+   if (adapter->vfinfo[vf].mc_promisc_allowed == setting)
+   return 0;
+
+   adapter->vfinfo[vf].mc_promisc_allowed = setting;
+
+   /* if VF requests multicast promiscuous */
+   if (adapter->vfinfo[vf].mc_promisc) {
+   if (setting)
+   ixgbe_enable_vf_mc_promisc(adapter, vf);
+   else
+   ixgbe_disable_vf_mc_promisc(adapter, vf);
+   }
+
+   return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
int vf, struct ifla_vf_info *ivi)
 {
@@ -1506,5 +1533,6 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
ivi->qos = adapter->vfinfo[vf].pf_qos;
ivi->spoofchk 
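
The interaction between the VF's request (mc_promisc) and the host's permission (mc_promisc_allowed) in this patch reduces to a simple rule: the hardware bit ends up on only when both are set. A minimal sketch of that rule follows, using a hypothetical `struct vf_mc_state` and helper name rather than the driver's `vf_data_storage`; it models the decision only and does not touch any register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF state mirroring the two flags in the patch. */
struct vf_mc_state {
	bool requested;	/* VF asked for multicast promiscuous (mc_promisc) */
	bool allowed;	/* host permits it (mc_promisc_allowed) */
};

/* Multicast promiscuous is effective only when requested and allowed. */
static bool mc_promisc_effective(const struct vf_mc_state *s)
{
	assert(s != NULL);
	return s->requested && s->allowed;
}
```

Keeping the request and the permission as separate flags means the host can grant permission later and the earlier VF request then takes effect, which is what the `ixgbe_ndo_set_vf_mc_promisc()` hunk above does.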

[PATCH v3 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-04-07 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

The limit on the number of multicast addresses per VF is too low for
large-scale servers that use SR-IOV.
IPv6 requires a multicast MAC address for each IP address in order to
handle Neighbor Solicitation messages.
As a result, we cannot assign more than 30 IPv6 addresses to a single VF
interface.

The easy way to solve this is to enable multicast promiscuous mode.
It is useful to be able to enable multicast promiscuous mode for each VF
from the VF driver.

This patch introduces a new mbox API, IXGBE_VF_SET_MC_PROMISC, to
enable/disable multicast promiscuous mode for a VF. If multicast
promiscuous mode is enabled, the VF can receive all multicast packets.

With this patch, the ixgbevf driver automatically enables multicast
promiscuous mode, if possible, when the number of multicast addresses
exceeds 30.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---

This adds new mbox API, but doesn't change the version because
v1.3 was newly added in the current dev-queue.
Is that okay, or shall I increment the version?

 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 76 +++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  3 +
 drivers/net/ethernet/intel/ixgbevf/mbx.h  |  2 +
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 27 +++-
 drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
 7 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 636f9e3..08e65b6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -146,6 +146,7 @@ struct vf_data_storage {
u16 vlans_enabled;
bool clear_to_send;
bool pf_set_mac;
+   bool mc_promisc;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
u16 tx_rate;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index b1e4703..dd623ca 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -102,6 +102,8 @@ enum ixgbe_pfvf_api_rev {
 #define IXGBE_VF_GET_RETA  0x0a/* VF request for RETA */
 #define IXGBE_VF_GET_RSS_KEY   0x0b/* get RSS key */
 
+#define IXGBE_VF_SET_MC_PROMISC 0x0c /* VF requests PF to set MC promiscuous */
+
 /* length of permanent address message returned from PF */
 #define IXGBE_VF_PERMADDR_MSG_LEN 4
 /* word in permanent address message with the current multicast type */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 1d17b58..615f651 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -116,6 +116,9 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
 * we want to disable the querying by default.
 */
adapter->vfinfo[i].rss_query_enabled = 0;
+
+   /* Turn multicast promiscuous mode off for all VFs */
+   adapter->vfinfo[i].mc_promisc = false;
}
 
return 0;
@@ -318,6 +321,40 @@ int ixgbe_pci_sriov_configure(struct pci_dev *dev, int num_vfs)
return ixgbe_pci_sriov_enable(dev, num_vfs);
 }
 
+static int ixgbe_enable_vf_mc_promisc(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+
+   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
+
+   vmolr |= IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+   return 0;
+}
+
+static int ixgbe_disable_vf_mc_promisc(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+
+   e_info(drv, "VF %u: disabling multicast promiscuous\n", vf);
+
+   vmolr &= ~IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+   return 0;
+}
+
 static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
   u32 *msgbuf, u32 vf)
 {
@@ -332,6 +369,12 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
u32 mta_reg;
u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
 
+   /* Disable multicast promiscuous first */
+   if (adapter->vfinfo[vf].mc_promisc) {
+   ixgbe_disable_vf_mc_promisc(adapter, vf);
+   adapter->vfinfo[vf].mc_promisc = false;
+   }
+
/* only so many hash values supported */

RE: [E1000-devel] [PATCH v3] ixgbe: make VLAN filter conditional

2015-03-20 Thread Hiroshi Shimamoto
> On 03/16/2015 05:33 AM, Hiroshi Shimamoto wrote:
> >> On 03/11/2015 10:58 PM, Hiroshi Shimamoto wrote:
> >>>> On 03/10/2015 05:59 PM, Hiroshi Shimamoto wrote:
> >>>>> From: Hiroshi Shimamoto 
> >>>>>
> >>>>> Disable hardware VLAN filtering if netdev->features VLAN flag is 
> >>>>> dropped.
> >>>>>
> >>>>> In SR-IOV case, there is a use case which needs to disable VLAN filter.
> >>>>> For example, we need to make a network function with VF in virtualized
> >>>>> environment. That network function may be a software switch, a router
> >>>>> or etc. It means that that network function will be an end point which
> >>>>> terminates many VLANs.
> >>>>>
> >>>>> In the current implementation, VLAN filtering always be turned on and
> >>>>> VF can receive only 63 VLANs. It means that only 63 VLANs can be 
> >>>>> terminated
> >>>>> in one NIC.
> >>>> Technically it is 4096 VLANs that can be terminated in one NIC, only 63
> >>>> VLANs can be routed to VFs/VMDq pools though.  The PF receives all VLAN
> >>>> traffic that isn't routed to a specific VF, but does pass the VFTA
> >>>> registers.
> >>> Right, my explanation was not accurate.
> >>> From the hardware limitation, there are 64 entries in the shared VLAN
> >>> filter.
> >>> That means that only 64 VLANs can be used per port.
> >>>
> >>> Our requirement is that we want to use VLANs without limitation in VF.
> >>> Currently there is only this way, disabling VLAN filter, I could find.
> >> The problem is that unlike multicast promiscuous option that was
> >> supported by the hardware there is nothing to limit this to any one VF.
> >> So if you enable this feature you are not only allowing that one VF to
> >> ignore the VLAN filter rules, but you are disabling them for the PF and
> >> all VFs at once.
> > I'm afraid that I could not explain what we want.
> > We want to use 4k VLANs in a VM which has VF.
> >
> > I understand that when HW VLAN filter is disabled, all VFs and the PF loses
> > this functionality.
> >
> >>>>> On the other hand disabling HW VLAN filtering causes a SECURITY issue
> >>>>> that each VF can receive all VLAN packets. That means that a VF can see
> >>>>> any packet which is sent to other VF.
> >>>> It is worse than that.  Now you also receive all broadcast packets on
> >>>> all VFs.  It means that any of your systems could be buried in traffic
> >>>> with a simple ping flood since it will multiply each frame by the number
> >>>> of VFs you have enabled.
> >>> Is that VLAN filtering specific?
> >>> I understood that broadcast/multicast packets copied to VFs.
> >>> But I couldn't imagine the case each VF has and uses different VLAN.
> >> VLANs are used for isolation, that is kind of the point of a VLAN. So
> >> for example if you had a multi-tenant data center you might use VLANs to
> >> separate the systems that belong to each tenant.  This way it appears
> >> that they are off in their own little cloud and not affecting one
> >> another.  With VLANs disabled you strip that option away and as a result
> >> you end up with each VF being able to see all of the broadcast/multicast
> >> traffic from every other VF.
> > On the other hand, ixgbe chip can only have 64 VLANs and 64 VFs at most.
> > That means I think few number of VLANs can be used in each VF and some VLANs
> > or untagged VLAN may be shared among VFs, then there is broadcast/multicast
> > storm possibility already, that is just my feeling.
> 
> The idea is to only share VLANs between any given customer.  So for
> example if you have 63 VFs (upper limit for ixgbe as I recall), and 5
> customers you would typically break this up into 5 VLANs where each
> customer is assigned one VLAN to isolate their network from the others.
> As a result one customer couldn't send a broadcast storm to the others.
> 
> > By the way, I think, there is another possibility of DoS by requesting much
> > number of VLANs from VF. That causes that later VFs cannot have their VLAN
> > because there are only 64 VLAN entries.
> > The first VM creates 64 VLANs that id 1-64, then start the second VM and the
> > second one fails to have requesting VLAN id 65 because there is no room.
> 

RE: [E1000-devel] [PATCH v3] ixgbe: make VLAN filter conditional

2015-03-16 Thread Hiroshi Shimamoto
> On 03/11/2015 10:58 PM, Hiroshi Shimamoto wrote:
> >> On 03/10/2015 05:59 PM, Hiroshi Shimamoto wrote:
> >>> From: Hiroshi Shimamoto 
> >>>
> >>> Disable hardware VLAN filtering if netdev->features VLAN flag is dropped.
> >>>
> >>> In SR-IOV case, there is a use case which needs to disable VLAN filter.
> >>> For example, we need to make a network function with VF in virtualized
> >>> environment. That network function may be a software switch, a router
> >>> or etc. It means that that network function will be an end point which
> >>> terminates many VLANs.
> >>>
> >>> In the current implementation, VLAN filtering always be turned on and
> >>> VF can receive only 63 VLANs. It means that only 63 VLANs can be terminated
> >>> in one NIC.
> >> Technically it is 4096 VLANs that can be terminated in one NIC, only 63
> >> VLANs can be routed to VFs/VMDq pools though.  The PF receives all VLAN
> >> traffic that isn't routed to a specific VF, but does pass the VFTA
> >> registers.
> > Right, my explanation was not accurate.
> > From the hardware limitation, there are 64 entries in the shared VLAN filter.
> > That means that only 64 VLANs can be used per port.
> >
> > Our requirement is that we want to use VLANs without limitation in VF.
> > Currently there is only this way, disabling VLAN filter, I could find.
> 
> The problem is that unlike multicast promiscuous option that was
> supported by the hardware there is nothing to limit this to any one VF.
> So if you enable this feature you are not only allowing that one VF to
> ignore the VLAN filter rules, but you are disabling them for the PF and
> all VFs at once.

I'm afraid I did not explain clearly what we want.
We want to use 4k VLANs in a VM which has a VF.

I understand that when the HW VLAN filter is disabled, the PF and all VFs lose
this functionality.

> 
> >>> On the other hand disabling HW VLAN filtering causes a SECURITY issue
> >>> that each VF can receive all VLAN packets. That means that a VF can see
> >>> any packet which is sent to other VF.
> >> It is worse than that.  Now you also receive all broadcast packets on
> >> all VFs.  It means that any of your systems could be buried in traffic
> >> with a simple ping flood since it will multiply each frame by the number
> >> of VFs you have enabled.
> > Is that VLAN filtering specific?
> > I understood that broadcast/multicast packets copied to VFs.
> > But I couldn't imagine the case each VF has and uses different VLAN.
> 
> VLANs are used for isolation, that is kind of the point of a VLAN. So
> for example if you had a multi-tenant data center you might use VLANs to
> separate the systems that belong to each tenant.  This way it appears
> that they are off in their own little cloud and not affecting one
> another.  With VLANs disabled you strip that option away and as a result
> you end up with each VF being able to see all of the broadcast/multicast
> traffic from every other VF.
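
The isolation argument above can be illustrated with a small model. This is
only a sketch, not ixgbe code: the table vf_vlan[] and the helper
broadcast_receivers() are hypothetical names, and the model assumes each VF
belongs to exactly one VLAN.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VFS 6

/* Hypothetical assignment: three tenants, two VFs each, one VLAN per tenant. */
static const int vf_vlan[NUM_VFS] = { 10, 10, 20, 20, 30, 30 };

/*
 * Count the VFs that receive one broadcast frame tagged with @vlan.
 * With the filter enabled only members of that VLAN see the frame;
 * with the filter disabled every VF gets a copy.
 */
int broadcast_receivers(int vlan, bool hw_vlan_filter)
{
	int i, n = 0;

	assert(vlan >= 0 && vlan < 4096);	/* valid 12-bit VLAN ID */

	for (i = 0; i < NUM_VFS; i++)
		if (!hw_vlan_filter || vf_vlan[i] == vlan)
			n++;
	return n;
}
```

With this sample table, broadcast_receivers(10, true) returns 2 and
broadcast_receivers(10, false) returns 6, so one tenant's broadcast is
replicated to every VF once filtering is off.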

On the other hand, the ixgbe chip can have at most 64 VLANs and 64 VFs.
I think that means only a few VLANs can be used in each VF, and some VLANs
or the untagged VLAN may be shared among VFs, so the possibility of a
broadcast/multicast storm already exists. That is just my feeling.

By the way, I think there is another possible DoS: a VF requesting a large
number of VLANs. Later VFs then cannot get their VLANs
because there are only 64 VLAN entries.
For example, the first VM creates 64 VLANs with IDs 1-64; then the second VM
starts and fails when requesting VLAN ID 65 because there is no room.
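
The exhaustion scenario above can be sketched as follows. This is a
simplified model under stated assumptions, not the driver's data structure:
SHARED_VLAN_ENTRIES, struct vlan_pool_table and both functions are
hypothetical names, and the only fact taken from the thread is that the
shared filter has 64 entries.

```c
#include <assert.h>

#define SHARED_VLAN_ENTRIES 64	/* size of the shared filter, per the thread */

/* Hypothetical model of the shared per-port VLAN table. */
struct vlan_pool_table {
	int vid[SHARED_VLAN_ENTRIES];
	int used;
};

/*
 * Register @vid. Returns 0 on success (including when the VID is already
 * present) and -1 when every entry is taken.
 */
int vlan_table_add(struct vlan_pool_table *t, int vid)
{
	int i;

	assert(vid >= 0 && vid < 4096);

	for (i = 0; i < t->used; i++)
		if (t->vid[i] == vid)
			return 0;
	if (t->used == SHARED_VLAN_ENTRIES)
		return -1;
	t->vid[t->used++] = vid;
	return 0;
}

/* First VM takes VLAN IDs 1-64, then a second VM asks for VLAN ID 65. */
int second_vm_request_after_exhaustion(void)
{
	struct vlan_pool_table t = { .used = 0 };
	int vid;

	for (vid = 1; vid <= 64; vid++)
		if (vlan_table_add(&t, vid))
			return 1;	/* not expected: table filled too early */
	return vlan_table_add(&t, 65);
}
```

In this model second_vm_request_after_exhaustion() returns -1: entries are
handed out first come, first served, so one guest can starve the others.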

> 
> >>> Signed-off-by: Hiroshi Shimamoto 
> >>> Reviewed-by: Hayato Momma 
> >>> CC: Choi, Sy Jong 
> >>> ---
> >>>drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 26 
> >>> ++
> >>>drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
> >>>2 files changed, 30 insertions(+)
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> >>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> >>> index cd5a2c5..2f7bbb2 100644
> >>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> >>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> >>> @@ -4079,6 +4079,10 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
> >>>   hw->addr_ctrl.user_set_promisc = false;
> >>>   }
> >>>
> >>> +

RE: [E1000-devel] [PATCH v3] ixgbe: make VLAN filter conditional

2015-03-11 Thread Hiroshi Shimamoto
> On 03/10/2015 05:59 PM, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > Disable hardware VLAN filtering if netdev->features VLAN flag is dropped.
> >
> > In SR-IOV case, there is a use case which needs to disable VLAN filter.
> > For example, we need to make a network function with VF in virtualized
> > environment. That network function may be a software switch, a router
> > or etc. It means that that network function will be an end point which
> > terminates many VLANs.
> >
> > In the current implementation, VLAN filtering always be turned on and
> > VF can receive only 63 VLANs. It means that only 63 VLANs can be terminated
> > in one NIC.
> 
> Technically it is 4096 VLANs that can be terminated in one NIC, only 63
> VLANs can be routed to VFs/VMDq pools though.  The PF receives all VLAN
> traffic that isn't routed to a specific VF, but does pass the VFTA
> registers.

Right, my explanation was not accurate.
Due to a hardware limitation, there are 64 entries in the shared VLAN filter.
That means that only 64 VLANs can be used per port.

Our requirement is to use VLANs in a VF without this limitation.
Currently, disabling the VLAN filter is the only way I could find.

> 
> > On the other hand disabling HW VLAN filtering causes a SECURITY issue
> > that each VF can receive all VLAN packets. That means that a VF can see
> > any packet which is sent to other VF.
> 
> It is worse than that.  Now you also receive all broadcast packets on
> all VFs.  It means that any of your systems could be buried in traffic
> with a simple ping flood since it will multiply each frame by the number
> of VFs you have enabled.

Is that specific to VLAN filtering?
I understood that broadcast/multicast packets are copied to VFs.
But I couldn't imagine the case where each VF has and uses a different VLAN.

> 
> > This VLAN filtering can be turned off when SR-IOV is disabled, if not
> > the operation is rejected, to prevent unexpected behavior.
> 
> Yes, but you neglect to mention you allow enabling SR-IOV after it has
> been disabled.  In addition you neglected to address DCB and FCoE which
> are two other features that require VLAN support that are supported on
> these adapters.
> 
> > Signed-off-by: Hiroshi Shimamoto 
> > Reviewed-by: Hayato Momma 
> > CC: Choi, Sy Jong 
> > ---
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 26 
> > ++
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
> >   2 files changed, 30 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index cd5a2c5..2f7bbb2 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -4079,6 +4079,10 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
> > hw->addr_ctrl.user_set_promisc = false;
> > }
> >
> > +   /* Disable hardware VLAN filter if the feature flag is dropped */
> > +   if (!(netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
> > +   vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
> > +
> > /*
> >  * Write addresses to available RAR registers, if there is not
> >  * sufficient space to store all the addresses then enable
> 
> This is outright dangerous for end user configuration.  In addition
> there are other features such as FCoE and DCB that don't function if the
> VLAN filtering is disabled.  Have you even looked into those

Actually, I didn't take those features into account.
I'll take the other features into account next time.

> complications?  I am pretty certain that the fact that
> NETIF_F_HW_VLAN_CTAG_FILTER can even be toggled by the user is a bug
> since last I knew the only way to do VLAN promiscuous mode on ixgbe
> parts was to populate the entire VLAN table to all 1s.
> 
> > @@ -7736,6 +7740,28 @@ static int ixgbe_set_features(struct net_device 
> > *netdev,
> > netdev_features_t changed = netdev->features ^ features;
> > bool need_reset = false;
> >
> > +   if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
> > +   int vlan_filter = features & NETIF_F_HW_VLAN_CTAG_FILTER;
> > +
> > +   /* Prevent controlling VLAN filter if VFs exist */
> > +   if (adapter->num_vfs > 0) {
> > +   e_dev_info("%s HW VLAN filter is not allowed when "
> > +  "SR-IOV enabled.\n",
> > +  vlan_filter ? "Enabling"

[PATCH v3] ixgbe: make VLAN filter conditional

2015-03-10 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Disable hardware VLAN filtering if the VLAN filter flag in netdev->features
is dropped.

In the SR-IOV case, there is a use case which needs the VLAN filter disabled.
For example, we need to build a network function with a VF in a virtualized
environment. That network function may be a software switch, a router,
etc. It means that the network function will be an end point which
terminates many VLANs.

In the current implementation, VLAN filtering is always turned on and
a VF can receive only 63 VLANs. It means that only 63 VLANs can be terminated
in one NIC.

On the other hand, disabling HW VLAN filtering causes a SECURITY issue:
each VF can receive all VLAN packets. That means that a VF can see
any packet which is sent to another VF.

VLAN filtering can be turned off only while SR-IOV is disabled; otherwise
the operation is rejected, to prevent unexpected behavior.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 26 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
 2 files changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cd5a2c5..2f7bbb2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4079,6 +4079,10 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		hw->addr_ctrl.user_set_promisc = false;
 	}
 
+	/* Disable hardware VLAN filter if the feature flag is dropped */
+	if (!(netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
+
 	/*
 	 * Write addresses to available RAR registers, if there is not
 	 * sufficient space to store all the addresses then enable
@@ -7736,6 +7740,28 @@ static int ixgbe_set_features(struct net_device *netdev,
 	netdev_features_t changed = netdev->features ^ features;
 	bool need_reset = false;
 
+	if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
+		int vlan_filter = features & NETIF_F_HW_VLAN_CTAG_FILTER;
+
+		/* Prevent controlling VLAN filter if VFs exist */
+		if (adapter->num_vfs > 0) {
+			e_dev_info("%s HW VLAN filter is not allowed when "
+				   "SR-IOV enabled.\n",
+				   vlan_filter ? "Enabling" : "Disabling");
+			return -EINVAL;
+		}
+		if (!vlan_filter) {
+			e_dev_warn("Disabling HW VLAN filter. This cause "
+				   "SERIOUS SECURITY issues.\n");
+			e_dev_warn("Every VF users can receive a packet to "
+				   "other VFs.\n");
+			e_dev_warn("You cannot turn it on again if you are "
+				   "using SR-IOV.\n");
+		}
+		/* reset if HW VLAN filter is changed */
+		need_reset = true;
+	}
+
 	/* Make sure RSC matches LRO, reset if change */
 	if (!(features & NETIF_F_LRO)) {
 		if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 2d98ecd..f3a315c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -787,6 +787,10 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
 	u32 bits;
 	u8 tcs = netdev_get_num_tc(adapter->netdev);
 
+	/* Ignore if VLAN filter is disabled */
+	if (!(adapter->netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		return 0;
+
 	if (adapter->vfinfo[vf].pf_vlan || tcs) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set VLAN configuration\n"
-- 
2.1.0
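
The gating logic this patch adds to ixgbe_set_features() can be modeled in
isolation. The sketch below is not the driver code: struct model_adapter,
model_set_features() and F_HW_VLAN_FILTER are hypothetical stand-ins (the
last one for NETIF_F_HW_VLAN_CTAG_FILTER), and committing the new feature
mask inside the function is an assumption made to keep the model
self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define F_HW_VLAN_FILTER (1u << 0)	/* stand-in for NETIF_F_HW_VLAN_CTAG_FILTER */

/* Hypothetical reduced adapter state: current features and VF count. */
struct model_adapter {
	uint32_t features;
	int num_vfs;
};

/*
 * Apply @wanted. A change of the VLAN filter bit is rejected with -EINVAL
 * while VFs exist; otherwise it is accepted and a reset is requested.
 */
int model_set_features(struct model_adapter *a, uint32_t wanted,
		       bool *need_reset)
{
	uint32_t changed;

	assert(a && need_reset);

	changed = a->features ^ wanted;	/* bits that differ from the current state */
	*need_reset = false;

	if (changed & F_HW_VLAN_FILTER) {
		if (a->num_vfs > 0)
			return -EINVAL;	/* features are left untouched */
		*need_reset = true;
	}

	a->features = wanted;
	return 0;
}
```

The XOR picks out only the bits the caller is actually changing, so a
request that leaves the VLAN filter bit alone is never rejected, even with
VFs present.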


RE: [E1000-devel] [PATCH v2 2/3] if_link: Add VF multicast promiscuous control

2015-03-09 Thread Hiroshi Shimamoto
> On Mon, 2015-03-09 at 18:52 -0700, Jeff Kirsher wrote:
> > On Tue, 2015-03-10 at 01:42 +0000, Hiroshi Shimamoto wrote:
> > > > On 03/08/2015 02:15 PM, Or Gerlitz wrote:
> > > > > On Mon, Feb 23, 2015 at 11:14 PM, Jeff Kirsher
> > > > >  wrote:
> > > > > [...]
> > > > >> We discussed this during NetConf last week, and Don is correct
> > > that a
> > > > >> custom sysfs interface is not the way we want to handle this.  We
> > > agreed
> > > > >> upon a generic interface so that any NIC is able to turn on or
> > > off VF
> > > > >> multicast promiscuous mode.
> > > > >
> > > > > Jeff, please make sure to either respond to my comments on the V2
> > > > > thread (or better) address them for the V3 post.
> > > > >
> > > > >
> > > > > http://marc.info/?l=linux-netdev&m=142441852518152&w=2
> > > > > http://marc.info/?l=linux-netdev&m=142441867218183&w=2
> > > >
> > > > I agree with you that the patch descriptions should be cleaned up
> > > and
> > > > "beefed" up for that matter.
> > > >
> > > > If/when I look to push his series of patches, I will make sure that
> > > your
> > > > concerns are addressed so that we can get a accurate changelog.
> > >
> > > I see that the patchset should have better explanation in changelog.
> > > I will rewrite it and submit again.
> > >
> > > Jeff, are you planning to drop the patchset from your tree?
> > > I just concerned which tree I should create patches against for.
> >
> > Yes, I will drop the current patchset in my queue.  I am in the process
> > of updating my queue, go ahead and make your patches against the
> > following tree:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git
> > all-queue branch
> >
> > If you give me an hour or so, I should have my tree updated with all the
> > patches in my queue currently.
> 
> Ok, correction on the branch name.  After doing some cleanup and future
> planning, the following tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git
> is what you want to use and the branch name is:
> unstable-queue
> 
> The branch has all the patches currently in my queue.

OK, now I have the above branch:
From git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
   d7ed747..115403d  master -> jeff-next/master
 * [new branch]  unstable-queue -> jeff-next/unstable-queue

I will work against that tree.

thanks,
Hiroshi


Re: [E1000-devel] [PATCH v2 2/3] if_link: Add VF multicast promiscuous control

2015-03-09 Thread Hiroshi Shimamoto
> On 03/08/2015 02:15 PM, Or Gerlitz wrote:
> > On Mon, Feb 23, 2015 at 11:14 PM, Jeff Kirsher
> >  wrote:
> > [...]
> >> We discussed this during NetConf last week, and Don is correct that a
> >> custom sysfs interface is not the way we want to handle this.  We agreed
> >> upon a generic interface so that any NIC is able to turn on or off VF
> >> multicast promiscuous mode.
> >
> > Jeff, please make sure to either respond to my comments on the V2
> > thread (or better) address them for the V3 post.
> >
> >
> > http://marc.info/?l=linux-netdev&m=142441852518152&w=2
> > http://marc.info/?l=linux-netdev&m=142441867218183&w=2
> 
> I agree with you that the patch descriptions should be cleaned up and
> "beefed" up for that matter.
> 
> If/when I look to push his series of patches, I will make sure that your
> concerns are addressed so that we can get a accurate changelog.

I see that the patchset should have a better explanation in the changelog.
I will rewrite it and submit again.

Jeff, are you planning to drop the patchset from your tree?
I am just wondering which tree I should create the patches against.

thanks,
Hiroshi


RE: [PATCH v2] ixgbe: make VLAN filter conditional

2015-03-06 Thread Hiroshi Shimamoto
> On Fri, 2015-03-06 at 06:04 +0000, Hiroshi Shimamoto wrote:
> > > From: Hiroshi Shimamoto 
> > >
> > > Disable hardware VLAN filtering if netdev->features VLAN flag is
> > dropped.
> > >
> > > In SR-IOV case, there is a use case which needs to disable VLAN
> > filter.
> > > For example, we need to make a network function with VF in
> > virtualized
> > > environment. That network function may be a software switch, a
> > router
> > > or etc. It means that that network function will be an end point
> > which
> > > terminates many VLANs.
> > >
> > > In the current implementation, VLAN filtering always be turned on
> > and
> > > VF can receive only 63 VLANs. It means that only 63 VLANs can be
> > terminated
> > > in one NIC.
> > >
> > > With this patch, if the user turns VLAN filtering off on the host,
> > VF
> > > can receive every VLAN packet.
> > >
> > > This VLAN filtering can be turned on or off when SR-IOV is disabled,
> > if not
> > > the operation is rejected.
> >
> > Hi,
> >
> > any comment about this?
> > I added a warning message and prevent operation during SR-IOV is
> > enabled.
> 
> Yes, the warning message you added says nothing of the huge security
> hole this exposes.  We need a message the correctly expresses the
> dangers in turning this off.

Hm, okay.
You mean I should add a message like "this causes a SECURITY issue", right?

> 
> Also it does not appear that you addressed Ben Hutchings concerns, as I
> asked.  Correct me if I am wrong and you did address Ben's concerns.

I think Ben's suggestion is to prevent turning VLAN filtering back on while
VFs are in use, because that breaks the guests' behavior.
I added code that makes it impossible: the filter cannot be turned on (or off)
if the NIC has VFs.

thanks,
Hiroshi


RE: [PATCH v2] ixgbe: make VLAN filter conditional

2015-03-05 Thread Hiroshi Shimamoto
> From: Hiroshi Shimamoto 
> 
> Disable hardware VLAN filtering if netdev->features VLAN flag is dropped.
> 
> In SR-IOV case, there is a use case which needs to disable VLAN filter.
> For example, we need to make a network function with VF in virtualized
> environment. That network function may be a software switch, a router
> or etc. It means that that network function will be an end point which
> terminates many VLANs.
> 
> In the current implementation, VLAN filtering always be turned on and
> VF can receive only 63 VLANs. It means that only 63 VLANs can be terminated
> in one NIC.
> 
> With this patch, if the user turns VLAN filtering off on the host, VF
> can receive every VLAN packet.
> 
> This VLAN filtering can be turned on or off when SR-IOV is disabled, if not
> the operation is rejected.

Hi,

Any comments on this?
I added a warning message and prevented the operation while SR-IOV is enabled.


thanks,
Hiroshi


[PATCH v2] ixgbe: make VLAN filter conditional

2015-02-26 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Disable hardware VLAN filtering if the VLAN filter flag in netdev->features
is dropped.

In the SR-IOV case, there is a use case which needs the VLAN filter disabled.
For example, we need to build a network function with a VF in a virtualized
environment. That network function may be a software switch, a router,
etc. It means that the network function will be an end point which
terminates many VLANs.

In the current implementation, VLAN filtering is always turned on and
a VF can receive only 63 VLANs. It means that only 63 VLANs can be terminated
in one NIC.

With this patch, if the user turns VLAN filtering off on the host, a VF
can receive every VLAN packet.

VLAN filtering can be turned on or off only while SR-IOV is disabled;
otherwise the operation is rejected.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 23 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
 2 files changed, 27 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f690f5d..9593366 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4081,6 +4081,10 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		hw->addr_ctrl.user_set_promisc = false;
 	}
 
+	/* Disable hardware VLAN filter if the feature flag is dropped */
+	if (!(netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
+
 	/*
 	 * Write addresses to available RAR registers, if there is not
 	 * sufficient space to store all the addresses then enable
@@ -7734,6 +7738,26 @@ static int ixgbe_set_features(struct net_device *netdev,
 	netdev_features_t changed = netdev->features ^ features;
 	bool need_reset = false;
 
+	if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
+		int vlan_filter = features & NETIF_F_HW_VLAN_CTAG_FILTER;
+
+		/* Prevent controlling VLAN filter if VFs exist */
+		if (adapter->num_vfs > 0) {
+			e_dev_info("%s HW VLAN filter is not allowed when "
+				   "SR-IOV enabled.\n",
+				   vlan_filter ? "Enabling" : "Disabling");
+			return -EINVAL;
+		}
+		if (!vlan_filter) {
+			e_dev_warn("Disabling HW VLAN filter. All VFs cannot "
+				   "set VLAN filter from VF driver.\n");
+			e_dev_warn("All VLAN packets are delivered to "
+				   "every VF.\n");
+		}
+		/* reset if HW VLAN filter is changed */
+		need_reset = true;
+	}
+
 	/* Make sure RSC matches LRO, reset if change */
 	if (!(features & NETIF_F_LRO)) {
 		if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 288f39f..9ad45738 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -839,6 +839,10 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
 	u32 bits;
 	u8 tcs = netdev_get_num_tc(adapter->netdev);
 
+	/* Ignore if VLAN filter is disabled */
+	if (!(adapter->netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+		return 0;
+
 	if (adapter->vfinfo[vf].pf_vlan || tcs) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set VLAN configuration\n"
-- 
2.1.0



RE: [E1000-devel] [PATCH] ixgbe: make VLAN filter conditional in SR-IOV case

2015-02-24 Thread Hiroshi Shimamoto
> On Wed, 2015-02-25 at 00:51 +0000, Hiroshi Shimamoto wrote:
> > > Subject: Re: [E1000-devel] [PATCH] ixgbe: make VLAN filter
> > conditional in SR-IOV case
> > >
> > > On Thu, 2014-11-13 at 08:28 +0000, Hiroshi Shimamoto wrote:
> > > > From: Hiroshi Shimamoto 
> > > >
> > > > Disable hardware VLAN filtering if netdev->features VLAN flag is
> > > > dropped.
> > > >
> > > > In SR-IOV case, there is a use case which needs to disable VLAN
> > > > filter.
> > > > For example, we need to make a network function with VF in
> > virtualized
> > > > environment. That network function may be a software switch, a
> > router
> > > > or etc. It means that that network function will be an end point
> > which
> > > > terminates many VLANs.
> > > >
> > > > In the current implementation, VLAN filtering always be turned on
> > and
> > > > VF can receive only 63 VLANs. It means that only 63 VLANs can be
> > used
> > > > and it's not enough at all for building a virtual router.
> > > >
> > > > With this patch, if the user turns VLAN filtering off on the host,
> > VF
> > > > can receive every VLAN packet.
> > > > The behavior is changed only if VLAN filtering is turned off by
> > > > ethtool.
> > > >
> > > > Signed-off-by: Hiroshi Shimamoto 
> > > > CC: Choi, Sy Jong 
> > > > ---
> > > >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 10 ++
> > > >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
> > > >  2 files changed, 14 insertions(+)
> > >
> > > Thanks Hiroshi, I will add your patch to my queue.
> >
> > How about this patch?
> > It hasn't been in your tree,.
> > Is there any issue?
> 
> This patch was dropped for two reasons.  First was Ben Hutchings issues
> with the patch needed to be addressed.  Second, was due to a possible
> security hole which is why VLAN filtering was not disabled in SRIOV
> mode, where isolation is lost between VMs.
> 
> If you want to continue going forward with this change, a warning
> message should be added, at least, warning the user of the possible
> security issues.

Okay, I understand.
I will submit a patch which has a warning message.

thanks,
Hiroshi



RE: [E1000-devel] [PATCH] ixgbe: make VLAN filter conditional in SR-IOV case

2015-02-24 Thread Hiroshi Shimamoto
> Subject: Re: [E1000-devel] [PATCH] ixgbe: make VLAN filter conditional in SR-IOV case
> 
> On Thu, 2014-11-13 at 08:28 +0000, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > Disable hardware VLAN filtering if netdev->features VLAN flag is
> > dropped.
> >
> > In SR-IOV case, there is a use case which needs to disable VLAN
> > filter.
> > For example, we need to make a network function with VF in virtualized
> > environment. That network function may be a software switch, a router
> > or etc. It means that that network function will be an end point which
> > terminates many VLANs.
> >
> > In the current implementation, VLAN filtering always be turned on and
> > VF can receive only 63 VLANs. It means that only 63 VLANs can be used
> > and it's not enough at all for building a virtual router.
> >
> > With this patch, if the user turns VLAN filtering off on the host, VF
> > can receive every VLAN packet.
> > The behavior is changed only if VLAN filtering is turned off by
> > ethtool.
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > CC: Choi, Sy Jong 
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 10 ++
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
> >  2 files changed, 14 insertions(+)
> 
> Thanks Hiroshi, I will add your patch to my queue.

What is the status of this patch?
It has not appeared in your tree.
Is there any issue?

thanks,
Hiroshi


RE: [E1000-devel] [PATCH v2 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-02-24 Thread Hiroshi Shimamoto
> >-----Original Message-----
> >From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> >Sent: Thursday, February 19, 2015 5:01 PM
> > Subject: [E1000-devel] [PATCH v2 3/3] ixgbe: Add new ndo to allow VF 
> > multicast promiscuous mode
> >
> >From: Hiroshi Shimamoto 
> >
> >Implements the new netdev op to allow VF multicast promiscuous mode.
> >
> >The administrator can allow to VF multicast promiscuous mode for only
> >trusted VM. After allowing multicast promiscuous mode from the host,
> >we can use over 30 IPv6 addresses on VM.
> > # ./ip link set dev eth0 vf 1 mc_promisc on
> >
> >When disallowing multicast promiscuous mode, we can only use 30 IPv6 
> >addresses.
> > # ./ip link set dev eth0 vf 1 mc_promisc off
> >
> >Signed-off-by: Hiroshi Shimamoto 
> >Reviewed-by: Hayato Momma 
> >CC: Choi, Sy Jong 
> 
> 
> 
> +int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool 
> setting)
> +{
> + struct ixgbe_adapter *adapter = netdev_priv(netdev);
> + struct ixgbe_hw *hw = &adapter->hw;
> + u32 vmolr;
> 
> vmolr is unused variable in this function.
> 
> +
> + if (vf >= adapter->num_vfs)
> + return -EINVAL;
> +
> + /* nothing to do */
> + if (adapter->vfinfo[vf].mc_promisc_allowed == setting)
> + return 0;
> +
> + adapter->vfinfo[vf].mc_promisc_allowed = setting;
> +
> + /* if VF requests multicast promiscuous */
> + if (adapter->vfinfo[vf].mc_promisc) {
> + if (setting)
> + ixgbe_enable_vf_mc_promisc(adapter, vf);
> + else
> + ixgbe_disable_vf_mc_promisc(adapter, vf);
> + }
> +
> + return 0;
> +}

Thank you for pointing it out.
I noticed it too and am preparing a patch.

thanks,
Hiroshi
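
The interaction between the VF's request and the administrator's permission
in this patch can be summarized with a small model. It is a sketch only:
struct model_vf and the model_* functions are hypothetical, and the
hw_mc_promisc field stands in for whatever
ixgbe_enable_vf_mc_promisc()/ixgbe_disable_vf_mc_promisc() program into
the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VF state mirroring the two flags used by the patch. */
struct model_vf {
	bool mc_promisc;		/* requested by the VF driver */
	bool mc_promisc_allowed;	/* granted by the host administrator */
	bool hw_mc_promisc;		/* resulting hardware state in this model */
};

/* Hardware is promiscuous only when requested AND allowed. */
static void model_apply(struct model_vf *vf)
{
	vf->hw_mc_promisc = vf->mc_promisc && vf->mc_promisc_allowed;
}

/* VF side: remember the request even if it cannot be honoured yet. */
void model_vf_request(struct model_vf *vf, bool enable)
{
	assert(vf);
	vf->mc_promisc = enable;
	model_apply(vf);
}

/* Host side: corresponds to "ip link set dev eth0 vf N mc_promisc on|off". */
void model_admin_allow(struct model_vf *vf, bool setting)
{
	assert(vf);
	if (vf->mc_promisc_allowed == setting)
		return;		/* nothing to do */
	vf->mc_promisc_allowed = setting;
	model_apply(vf);
}
```

Because the request is stored independently of the permission, a VF that
asked for promiscuous mode before being trusted gets it as soon as the
administrator allows it, and loses it again when the permission is revoked.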


[PATCH v2 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-02-19 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Implement the new netdev op to allow VF multicast promiscuous mode.

The administrator can allow VF multicast promiscuous mode for trusted
VMs only. After multicast promiscuous mode is allowed from the host,
more than 30 IPv6 addresses can be used on the VM.
 # ./ip link set dev eth0 vf 1 mc_promisc on

When multicast promiscuous mode is disallowed, only 30 IPv6 addresses can be used.
 # ./ip link set dev eth0 vf 1 mc_promisc off

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  7 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 35 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  2 ++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 41ed5ab..05293d7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -152,6 +152,7 @@ struct vf_data_storage {
 	u16 tx_rate;
 	u16 vlan_count;
 	u8 spoofchk_enabled;
+	u8 mc_promisc_allowed;
 	unsigned int vf_api;
 };
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 70cc4c5..c169fba 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3604,6 +3604,12 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 		if (hw->mac.ops.set_ethertype_anti_spoofing)
 			hw->mac.ops.set_ethertype_anti_spoofing(hw, true, i);
 	}
+
+	/* Reconfigure multicast promiscuous mode */
+	for (i = 0; i < adapter->num_vfs; i++) {
+		ixgbe_ndo_set_vf_mc_promisc(adapter->netdev, i,
+				adapter->vfinfo[i].mc_promisc_allowed);
+	}
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -8052,6 +8058,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_vlan	= ixgbe_ndo_set_vf_vlan,
 	.ndo_set_vf_rate	= ixgbe_ndo_set_vf_bw,
 	.ndo_set_vf_spoofchk	= ixgbe_ndo_set_vf_spoofchk,
+	.ndo_set_vf_mc_promisc	= ixgbe_ndo_set_vf_mc_promisc,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 965ad29..288f39f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -108,8 +108,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
 		for (i = 0; i < adapter->num_vfs; i++) {
 			/* Enable spoof checking for all VFs */
 			adapter->vfinfo[i].spoofchk_enabled = true;
-			/* Turn multicast promiscuous mode off for all VFs */
+			/* Disallow VF multicast promiscuous capability
+			 * and turn it off for all VFs
+			 */
 			adapter->vfinfo[i].mc_promisc = false;
+			adapter->vfinfo[i].mc_promisc_allowed = false;
 		}
 		return 0;
 	}
@@ -1016,7 +1019,7 @@ static int ixgbe_set_vf_mc_promisc(struct ixgbe_adapter *adapter,
 
 	adapter->vfinfo[vf].mc_promisc = enable;
 
-	if (enable)
+	if (enable && adapter->vfinfo[vf].mc_promisc_allowed)
 		return ixgbe_enable_vf_mc_promisc(adapter, vf);
 	else
 		return ixgbe_disable_vf_mc_promisc(adapter, vf);
@@ -1414,6 +1417,32 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
 	return 0;
 }
 
+int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool setting)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(netdev);
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vmolr;
+
+	if (vf >= adapter->num_vfs)
+		return -EINVAL;
+
+	/* nothing to do */
+	if (adapter->vfinfo[vf].mc_promisc_allowed == setting)
+		return 0;
+
+	adapter->vfinfo[vf].mc_promisc_allowed = setting;
+
+	/* if VF requests multicast promiscuous */
+	if (adapter->vfinfo[vf].mc_promisc) {
+		if (setting)
+			ixgbe_enable_vf_mc_promisc(adapter, vf);
+		else
+			ixgbe_disable_vf_mc_promisc(adapter, vf);
+	}
+
+	return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi)
 {
@@ -1427,5 +1456,7 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
  

[PATCH v2 2/3] if_link: Add VF multicast promiscuous control

2015-02-19 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Add netlink directives and ndo entry to allow VF multicast promiscuous mode.

The administrator wants to allow multicast promiscuous mode for each VF individually.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
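For readers following the rtnetlink part: each per-VF attribute is reserved in
rtnl_vfinfo_size() with nla_total_size(), which is the attribute header plus
the payload, rounded up to the netlink alignment. The sketch below redoes that
arithmetic in plain C for the new struct. The SKETCH_* macros and sketch_*
names are local stand-ins written for this note, not the kernel helpers
themselves, and the 4-byte header and alignment are the usual netlink values.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's netlink alignment helpers (assumed 4-byte alignment). */
#define SKETCH_NLA_ALIGNTO   4u
#define SKETCH_NLA_ALIGN(n)  (((n) + SKETCH_NLA_ALIGNTO - 1u) & ~(SKETCH_NLA_ALIGNTO - 1u))
/* A netlink attribute header is a 16-bit length plus a 16-bit type. */
#define SKETCH_NLA_HDRLEN    SKETCH_NLA_ALIGN(4u)

/* Same layout as the struct ifla_vf_mc_promisc added by this patch. */
struct sketch_vf_mc_promisc {
	uint32_t vf;      /* VF index */
	uint32_t setting; /* allow (non-zero) or disallow (zero) */
};

/* Header plus payload, rounded up to the attribute alignment. */
static unsigned int sketch_nla_total_size(unsigned int payload)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/* Space one IFLA_VF_MC_PROMISC attribute takes in the per-VF dump. */
static unsigned int sketch_vf_mc_promisc_attr_size(void)
{
	return sketch_nla_total_size((unsigned int)sizeof(struct sketch_vf_mc_promisc));
}
```

Under these assumptions the attribute costs 12 bytes per VF, which is the
amount the extra nla_total_size() term adds to the size estimate.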
 include/linux/if_link.h  |  1 +
 include/linux/netdevice.h|  3 +++
 include/uapi/linux/if_link.h |  6 ++
 net/core/rtnetlink.c | 18 --
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 119130e..bc29ddf 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -14,5 +14,6 @@ struct ifla_vf_info {
__u32 linkstate;
__u32 min_tx_rate;
__u32 max_tx_rate;
+   __u32 mc_promisc;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d115256..fd15d87 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -870,6 +870,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
  *   int max_tx_rate);
  * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
+ * int (*ndo_set_vf_mc_promisc)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  * int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
@@ -1086,6 +1087,8 @@ struct net_device_ops {
   int max_tx_rate);
int (*ndo_set_vf_spoofchk)(struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_set_vf_mc_promisc)(struct net_device *dev,
+int vf, bool setting);
int (*ndo_get_vf_config)(struct net_device *dev,
 int vf,
 struct ifla_vf_info *ivf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 0deee3e..d7dc39c 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -458,6 +458,7 @@ enum {
IFLA_VF_SPOOFCHK,   /* Spoof Checking on/off switch */
IFLA_VF_LINK_STATE, /* link state enable/disable/auto switch */
IFLA_VF_RATE,   /* Min and Max TX Bandwidth Allocation */
+   IFLA_VF_MC_PROMISC, /* Multicast Promiscuous allow/disallow */
__IFLA_VF_MAX,
 };
 
@@ -502,6 +503,11 @@ struct ifla_vf_link_state {
__u32 link_state;
 };
 
+struct ifla_vf_mc_promisc {
+   __u32 vf;
+   __u32 setting;
+};
+
 /* VF ports management section
  *
  * Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5be499b..b668e96 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -818,7 +818,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 nla_total_size(sizeof(struct ifla_vf_vlan)) +
 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
 nla_total_size(sizeof(struct ifla_vf_rate)) +
-nla_total_size(sizeof(struct ifla_vf_link_state)));
+nla_total_size(sizeof(struct ifla_vf_link_state)) +
+nla_total_size(sizeof(struct ifla_vf_mc_promisc)));
return size;
} else
return 0;
@@ -,6 +1112,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
struct ifla_vf_tx_rate vf_tx_rate;
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_link_state vf_linkstate;
+   struct ifla_vf_mc_promisc vf_mc_promisc;
 
/*
 * Not all SR-IOV capable drivers support the
@@ -1119,6 +1121,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 * report anything.
 */
ivi.spoofchk = -1;
+   ivi.mc_promisc = -1;
memset(ivi.mac, 0, sizeof(ivi.mac));
/* The default value for VF link state is "auto"
 * IFLA_VF_LINK_STATE_AUTO which equals zero
@@ -1131,7 +1134,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
vf_rate.vf =
vf_tx_rate.vf =
vf_spoofchk.vf =
-   vf_linkstate.vf = ivi.vf;
+   vf_linkstate.vf =
+   vf_mc_

[PATCH v2 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-19 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

The number of multicast addresses a VF can register is too small for
large-scale servers that use SR-IOV.
IPv6 needs a multicast MAC address for each IP address in order to handle
Neighbor Solicitation messages, so we cannot assign more than 30 IPv6
addresses to a single VF interface.

The easy way to solve this is to enable multicast promiscuous mode.
It is useful to be able to enable multicast promiscuous mode for each VF
from the VF driver.

This patch introduces a new mbox API, IXGBE_VF_SET_MC_PROMISC, to
enable/disable multicast promiscuous mode for a VF. When multicast
promiscuous mode is enabled, the VF can receive all multicast packets.

With this patch, the ixgbevf driver automatically enables multicast
promiscuous mode, if possible, when the number of multicast addresses
exceeds 30.

This also bumps the API version up to 1.2 so that the VF can check whether
IXGBE_VF_SET_MC_PROMISC is available.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
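Background on why each IPv6 address costs one multicast filter entry: Neighbor
Solicitation for an address is sent to its solicited-node multicast group
(ff02::1:ffXX:XXXX, built from the low 24 bits of the address), and IPv6
multicast maps onto Ethernet as 33:33 followed by the low 32 bits of the group
address. A minimal sketch of that mapping follows; the function name is made
up for this note and is not part of either driver.

```c
#include <stdint.h>

/*
 * Derive the Ethernet multicast MAC a host has to listen on for the
 * solicited-node group of one IPv6 unicast address.
 *
 * addr: 16-byte IPv6 address in network byte order
 * mac:  6-byte output buffer
 */
static void sketch_solicited_node_mac(const uint8_t addr[16], uint8_t mac[6])
{
	mac[0] = 0x33;      /* fixed prefix for IPv6 multicast over Ethernet */
	mac[1] = 0x33;
	mac[2] = 0xff;      /* the "ff" of ff02::1:ffXX:XXXX */
	mac[3] = addr[13];  /* low 24 bits of the unicast address */
	mac[4] = addr[14];
	mac[5] = addr[15];
}
```

For 2001:db8::18:1 this gives 33:33:ff:18:00:01. Addresses that share their
low 24 bits share an entry; otherwise every additional address needs another
one, which is how a VF runs into the 30-address limit described above.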

This patchset is against Jeff's tree.

cfba326 e1000e: Fix 82574/82583 TimeSync errata handling for SYSTIM read

The tree doesn't have the fix for the IPv6 checksum issue yet,
so I cherry-picked that commit and tested with it.
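A note on the version check, since the enum in ixgbe_mbx.h is not sorted by
version (2.0 sits between 1.0 and 1.1): a simple "api >= 1.2" comparison on the
enum value would be fragile, so the sketch below matches the supported revision
explicitly. This is only a model of the decision described above, with
hypothetical sketch_* names and a hard-coded limit of 30; it is not the code
from the patch.

```c
#include <stdbool.h>

/* Per-VF multicast address limit quoted in the patch description. */
#define SKETCH_MAX_MC_ADDRS 30

/* Mirrors the order of enum ixgbe_pfvf_api_rev; values are not sorted by version. */
enum sketch_api_rev {
	SKETCH_API_10,      /* 1.0 */
	SKETCH_API_20,      /* 2.0 */
	SKETCH_API_11,      /* 1.1 */
	SKETCH_API_12,      /* 1.2, first revision with SET_MC_PROMISC */
	SKETCH_API_UNKNOWN, /* always last */
};

/* Only revisions known to carry the request are accepted. */
static bool sketch_api_has_mc_promisc(enum sketch_api_rev api)
{
	switch (api) {
	case SKETCH_API_12:
		return true;
	default:
		return false;
	}
}

/* True when the VF should ask the PF for multicast promiscuous mode. */
static bool sketch_want_mc_promisc(enum sketch_api_rev api, int mc_count)
{
	return mc_count > SKETCH_MAX_MC_ADDRS && sketch_api_has_mc_promisc(api);
}
```

With an older PF the function returns false and the VF stays limited to the
first 30 addresses, which matches the "if possible" wording above.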

 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  4 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 88 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++-
 drivers/net/ethernet/intel/ixgbevf/mbx.h  |  4 ++
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 28 +++-
 drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
 7 files changed, 135 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 7dcbbec..41ed5ab 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -146,6 +146,7 @@ struct vf_data_storage {
u16 vlans_enabled;
bool clear_to_send;
bool pf_set_mac;
+   bool mc_promisc;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
u16 tx_rate;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index a5cb755..2963557 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -73,6 +73,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_10,  /* API version 1.0, linux/freebsd VF driver */
ixgbe_mbox_api_20,  /* API version 2.0, solaris Phase1 VF driver */
ixgbe_mbox_api_11,  /* API version 1.1, linux/freebsd VF driver */
+   ixgbe_mbox_api_12,  /* API version 1.2, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
 };
@@ -91,6 +92,9 @@ enum ixgbe_pfvf_api_rev {
 /* mailbox API, version 1.1 VF requests */
 #define IXGBE_VF_GET_QUEUES 0x09 /* get queue configuration */
 
+/* mailbox API, version 1.2 VF requests */
+#define IXGBE_VF_SET_MC_PROMISC 0x0a /* VF requests PF to set MC promiscuous */
+
 /* GET_QUEUES return data indices within the mailbox */
 #define IXGBE_VF_TX_QUEUES 1   /* number of Tx queues supported */
 #define IXGBE_VF_RX_QUEUES 2   /* number of Rx queues supported */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 7f37fe7..965ad29 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -105,9 +105,12 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 IXGBE_FLAG2_RSC_ENABLED);
 
-   /* enable spoof checking for all VFs */
-   for (i = 0; i < adapter->num_vfs; i++)
+   for (i = 0; i < adapter->num_vfs; i++) {
+   /* Enable spoof checking for all VFs */
adapter->vfinfo[i].spoofchk_enabled = true;
+   /* Turn multicast promiscuous mode off for all VFs */
+   adapter->vfinfo[i].mc_promisc = false;
+   }
return 0;
}
 
@@ -308,6 +311,40 @@ int ixgbe_pci_sriov_configure(struct pci_dev *dev, int num_vfs)
return ixgbe_pci_sriov_enable(dev, num_vfs);
 }
 
+static int ixgbe_enable_vf_mc_promisc(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+
+   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
+
+   vmolr |= IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+

RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-15 Thread Hiroshi Shimamoto
> > Can you please fix up your patches based on my tree:
> > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/que
> > ue.git
> 
>  Yes. I haven't noticed your tree.
>  Will resend patches against it.
> 
> >>>
> >>> I encountered an issue with your tree, the commit id is below.
> >>>
> >>> $ git log | head
> >>> commit e6f1649780f8f5a87299bf6af04453f93d1e3d5e
> >>> Author: Rasmus Villemoes 
> >>> Date:   Fri Jan 23 20:43:14 2015 -0800
> >>>
> >>> ethernet: fm10k: Actually drop 4 bits
> >>>
> >>> The comment explains the intention, but vid has type u16. Before
> >> the
> >>> inner shift, it is promoted to int, which has plenty of space for 
> >>> all
> >>> vid's bits, so nothing is dropped. Use a simple mask instead.
> >>>
> >>>
> >>> I use the kernel from your tree in both host and guest.
> >>>
> >>> Assign an IPv6 for VF in guest.
> >>> # ip -6 addr add 2001:db8::18:1/64 dev ens0
> >>>
> >>> Send ping packet from other server to the VM.
> >>> # ping6  2001:db8::18:1 -I eth0
> >>>
> >>> The following message was shown.
> >>> ixgbevf :00:08.0: partial checksum but l4 proto=3a!
> >>>
> >>> If I did the same operation in the host, I saw the same error
> >>> message in
> > host too.
> >>> ixgbe :2d:00.0: partial checksum but l4 proto=3a!
> >>>
> >>> Do you have any idea about that?
> >>
> >> Ah, sorry about that, try this tree again:
> >> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> >>
> >> That patch was dropped for favor of a patch that Matthew Vick
> >> put together (and recently got pushed upstream).  So my queue no
> >> longer has that patch in the queue, since it got dropped.
> >
> > I still see the same error, the head id is the below
> >
> > $ git log | head
> > commit a072afb0b45904022b76deef3b770ee9a93cb13a
> > Author: Nicholas Krause 
> > Date:   Mon Feb 9 00:27:00 2015 -0800
> >
> > igb: Remove outdated fix me comment in the
> > function,gb_acquire_swfw_sync_i210
> >
> >
> > thanks,
> > Hiroshi
> 
>  I'm having our validation see if they can recreate the same issue
>  internally.  When they get back to me I'll let you
> >>> know
>  what we found.
> >>>
> >>> We did bisect, and the below looks the culprit;
> >>>
> >>> 32dce968dd987adfb0c00946d78dad9154f64759 is the first bad commit
> >>> commit 32dce968dd987adfb0c00946d78dad9154f64759
> >>> Author: Vlad Yasevich 
> >>> Date:   Sat Jan 31 10:40:18 2015 -0500
> >>>
> >>> ipv6: Allow for partial checksums on non-ufo packets
> >>>
> >>> Currntly, if we are not doing UFO on the packet, all UDP
> >>> packets will start with CHECKSUM_NONE and thus perform full
> >>> checksum computations in software even if device support
> >>> IPv6 checksum offloading.
> >>>
> >>> Let's start start with CHECKSUM_PARTIAL if the device
> >>> supports it and we are sending only a single packet at
> >>> or below mtu size.
> >>>
> >>> Signed-off-by: Vladislav Yasevich 
> >>> Signed-off-by: David S. Miller 
> >>>
> >>> :04 04 4437eaf7e944f5a6136ebf668a256fee688fda3d
> >> fade8da998d35c8da97a15f0556949ad371e5347 M  net
> >>
> >> When I reverted the commit, the issue was solved.
> >>
> >> thanks,
> >> Hiroshi
> >
> > I believe the issue is that this patch (32dce968dd98 - ipv6: Allow for
> > partial checksums on non-ufo packets) now sets CHECKSUM_PARTIAL on all
> > IPv6 packets, including ICMPv6 ones.  Our HW (82599) only supports
> > checksum offload on TCP/UDP (NETIF_F_IPV6_CSUM), so we get hung up on the
> > skb's protocol and the fact that it is CHECKSUM_PARTIAL.
> >
> > Another thing that confuses me is the feature test in this patch.  It
> > checks (rt->dst.dev->features & NETIF_F_V6_CSUM), but NETIF_F_V6_CSUM is
> > a two-bit field?
> >
> > #define NETIF_F_V6_CSUM (NETIF_F_GEN_CSUM | NETIF_F_IPV6_CSUM)
> >
> > So the test would succeed if either bit was high; that doesn't seem
> > right.  I cc'd the author so maybe he could clue us in.
> 
> This has been addressed by:
> commit bf250a1fa769f2eb8fc7a4e28b3b523e9cb67eef
> Author: Vlad Yasevich 
> Date:   Tue Feb 10 11:37:29 2015 -0500
> 
> ipv6: Partial checksum only UDP packets
> 
> 
> As for the 2-bit issue, GEN_CSUM (HW_SUM) and IPV6_CSUM cannot coexist at
> the same time.
> See netdev_fix_features().
> 

Thanks for pointing that out. I will test with that commit.

Jeff's tree hasn't included that commit yet, right?
Which branch has the commit?

thanks,
Hiroshi
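For anyone reading along, the two-bit question quoted above comes down to the
difference between "any bit of the mask is set" and "all bits of the mask are
set". The sketch below shows both tests. The SKETCH_F_* bit positions are
invented for illustration and are not the real NETIF_F_* values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this example. */
#define SKETCH_F_GEN_CSUM   (1u << 0)
#define SKETCH_F_IPV6_CSUM  (1u << 1)
#define SKETCH_F_V6_CSUM    (SKETCH_F_GEN_CSUM | SKETCH_F_IPV6_CSUM)

/* The form used in the feature test: true if at least one mask bit is set. */
static bool sketch_any_bit_set(uint32_t features, uint32_t mask)
{
	return (features & mask) != 0;
}

/* The stricter form: true only if every mask bit is set. */
static bool sketch_all_bits_set(uint32_t features, uint32_t mask)
{
	return (features & mask) == mask;
}
```

Since, per the reply quoted above, the two bits cannot be set at the same time,
the "all bits" form could never be true for this mask, and the "any bit" form
reads as "the device has one of the two IPv6 checksum capabilities".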

RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-12 Thread Hiroshi Shimamoto
> > > -Original Message-
> > > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > > Sent: Monday, February 09, 2015 6:29 PM
> > > To: Kirsher, Jeffrey T
> > > Cc: Alexander Duyck; Skidmore, Donald C; Bjørn Mork; e1000-
> > > de...@lists.sourceforge.net; net...@vger.kernel.org; Choi, Sy Jong; linux-
> > > ker...@vger.kernel.org; David Laight; Hayato Momma
> > > Subject: RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to
> > > enable MC promiscuous mode
> > >
> > > > > > > Can you please fix up your patches based on my tree:
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> > > > > >
> > > > > > Yes. I haven't noticed your tree.
> > > > > > Will resend patches against it.
> > > > > >
> > > > >
> > > > > I encountered an issue with your tree, the commit id is below.
> > > > >
> > > > > $ git log | head
> > > > > commit e6f1649780f8f5a87299bf6af04453f93d1e3d5e
> > > > > Author: Rasmus Villemoes 
> > > > > Date:   Fri Jan 23 20:43:14 2015 -0800
> > > > >
> > > > > ethernet: fm10k: Actually drop 4 bits
> > > > >
> > > > > The comment explains the intention, but vid has type u16. Before 
> > > > > the
> > > > > inner shift, it is promoted to int, which has plenty of space for 
> > > > > all
> > > > > vid's bits, so nothing is dropped. Use a simple mask instead.
> > > > >
> > > > >
> > > > > I use the kernel from your tree in both host and guest.
> > > > >
> > > > > Assign an IPv6 for VF in guest.
> > > > > # ip -6 addr add 2001:db8::18:1/64 dev ens0
> > > > >
> > > > > Send ping packet from other server to the VM.
> > > > > # ping6  2001:db8::18:1 -I eth0
> > > > >
> > > > > The following message was shown.
> > > > > ixgbevf :00:08.0: partial checksum but l4 proto=3a!
> > > > >
> > > > > If I did the same operation in the host, I saw the same error message 
> > > > > in
> > > host too.
> > > > > ixgbe :2d:00.0: partial checksum but l4 proto=3a!
> > > > >
> > > > > Do you have any idea about that?
> > > >
> > > > Ah, sorry about that, try this tree again:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> > > >
> > > > That patch was dropped for favor of a patch that Matthew Vick put
> > > > together (and recently got pushed upstream).  So my queue no longer
> > > > has that patch in the queue, since it got dropped.
> > >
> > > I still see the same error, the head id is the below
> > >
> > > $ git log | head
> > > commit a072afb0b45904022b76deef3b770ee9a93cb13a
> > > Author: Nicholas Krause 
> > > Date:   Mon Feb 9 00:27:00 2015 -0800
> > >
> > > igb: Remove outdated fix me comment in the
> > > function,gb_acquire_swfw_sync_i210
> > >
> > >
> > > thanks,
> > > Hiroshi
> >
> > I'm having our validation see if they can recreate the same issue 
> > internally.  When they get back to me I'll let you
> know
> > what we found.
> 
> We did bisect, and the below looks the culprit;
> 
> 32dce968dd987adfb0c00946d78dad9154f64759 is the first bad commit
> commit 32dce968dd987adfb0c00946d78dad9154f64759
> Author: Vlad Yasevich 
> Date:   Sat Jan 31 10:40:18 2015 -0500
> 
> ipv6: Allow for partial checksums on non-ufo packets
> 
> Currntly, if we are not doing UFO on the packet, all UDP
> packets will start with CHECKSUM_NONE and thus perform full
> checksum computations in software even if device support
> IPv6 checksum offloading.
> 
> Let's start start with CHECKSUM_PARTIAL if the device
> supports it and we are sending only a single packet at
> or below mtu size.
> 
> Signed-off-by: Vladislav Yasevich 
> Signed-off-by: David S. Miller 
> 
> :04 04 4437eaf7e944f5a6136ebf668a256fee688fda3d 
> fade8da998d35c8da97a15f0556949ad371e5347 M  net

When I reverted the commit, the issue was solved.

thanks,
Hiroshi


RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-11 Thread Hiroshi Shimamoto
> Subject: RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to 
> enable MC promiscuous mode
> 
> > -Original Message-
> > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > Sent: Monday, February 09, 2015 6:29 PM
> > To: Kirsher, Jeffrey T
> > Cc: Alexander Duyck; Skidmore, Donald C; Bjørn Mork; e1000-
> > de...@lists.sourceforge.net; net...@vger.kernel.org; Choi, Sy Jong; linux-
> > ker...@vger.kernel.org; David Laight; Hayato Momma
> > Subject: RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to
> > enable MC promiscuous mode
> >
> > > > > > Can you please fix up your patches based on my tree:
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> > > > >
> > > > > Yes. I haven't noticed your tree.
> > > > > Will resend patches against it.
> > > > >
> > > >
> > > > I encountered an issue with your tree, the commit id is below.
> > > >
> > > > $ git log | head
> > > > commit e6f1649780f8f5a87299bf6af04453f93d1e3d5e
> > > > Author: Rasmus Villemoes 
> > > > Date:   Fri Jan 23 20:43:14 2015 -0800
> > > >
> > > > ethernet: fm10k: Actually drop 4 bits
> > > >
> > > > The comment explains the intention, but vid has type u16. Before the
> > > > inner shift, it is promoted to int, which has plenty of space for 
> > > > all
> > > > vid's bits, so nothing is dropped. Use a simple mask instead.
> > > >
> > > >
> > > > I use the kernel from your tree in both host and guest.
> > > >
> > > > Assign an IPv6 for VF in guest.
> > > > # ip -6 addr add 2001:db8::18:1/64 dev ens0
> > > >
> > > > Send ping packet from other server to the VM.
> > > > # ping6  2001:db8::18:1 -I eth0
> > > >
> > > > The following message was shown.
> > > > ixgbevf :00:08.0: partial checksum but l4 proto=3a!
> > > >
> > > > If I did the same operation in the host, I saw the same error message in
> > host too.
> > > > ixgbe :2d:00.0: partial checksum but l4 proto=3a!
> > > >
> > > > Do you have any idea about that?
> > >
> > > Ah, sorry about that, try this tree again:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> > >
> > > That patch was dropped for favor of a patch that Matthew Vick put
> > > together (and recently got pushed upstream).  So my queue no longer
> > > has that patch in the queue, since it got dropped.
> >
> > I still see the same error, the head id is the below
> >
> > $ git log | head
> > commit a072afb0b45904022b76deef3b770ee9a93cb13a
> > Author: Nicholas Krause 
> > Date:   Mon Feb 9 00:27:00 2015 -0800
> >
> > igb: Remove outdated fix me comment in the
> > function,gb_acquire_swfw_sync_i210
> >
> >
> > thanks,
> > Hiroshi
> 
> I'm having our validation see if they can recreate the same issue internally. 
>  When they get back to me I'll let you know
> what we found.

We bisected, and the commit below looks like the culprit:

32dce968dd987adfb0c00946d78dad9154f64759 is the first bad commit
commit 32dce968dd987adfb0c00946d78dad9154f64759
Author: Vlad Yasevich 
Date:   Sat Jan 31 10:40:18 2015 -0500

ipv6: Allow for partial checksums on non-ufo packets

Currntly, if we are not doing UFO on the packet, all UDP
packets will start with CHECKSUM_NONE and thus perform full
checksum computations in software even if device support
IPv6 checksum offloading.

Let's start start with CHECKSUM_PARTIAL if the device
supports it and we are sending only a single packet at
or below mtu size.

Signed-off-by: Vladislav Yasevich 
Signed-off-by: David S. Miller 

:04 04 4437eaf7e944f5a6136ebf668a256fee688fda3d 
fade8da998d35c8da97a15f0556949ad371e5347 M  net

thanks,
Hiroshi

> 
> Thanks,
> -Don Skidmore 



RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-11 Thread Hiroshi Shimamoto
> > > > Can you please fix up your patches based on my tree:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> > >
> > > Yes. I haven't noticed your tree.
> > > Will resend patches against it.
> > >
> >
> > I encountered an issue with your tree, the commit id is below.
> >
> > $ git log | head
> > commit e6f1649780f8f5a87299bf6af04453f93d1e3d5e
> > Author: Rasmus Villemoes 
> > Date:   Fri Jan 23 20:43:14 2015 -0800
> >
> > ethernet: fm10k: Actually drop 4 bits
> >
> > The comment explains the intention, but vid has type u16. Before the
> > inner shift, it is promoted to int, which has plenty of space for all
> > vid's bits, so nothing is dropped. Use a simple mask instead.
> >
> >
> > I use the kernel from your tree in both host and guest.
> >
> > Assign an IPv6 for VF in guest.
> > # ip -6 addr add 2001:db8::18:1/64 dev ens0
> >
> > Send ping packet from other server to the VM.
> > # ping6  2001:db8::18:1 -I eth0
> >
> > The following message was shown.
> > ixgbevf :00:08.0: partial checksum but l4 proto=3a!
> >
> > If I did the same operation in the host, I saw the same error message in 
> > host too.
> > ixgbe :2d:00.0: partial checksum but l4 proto=3a!
> >
> > Do you have any idea about that?
> 
> Ah, sorry about that, try this tree again:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> 
> That patch was dropped for favor of a patch that Matthew Vick put
> together (and recently got pushed upstream).  So my queue no longer has
> that patch in the queue, since it got dropped.

I still see the same error; the HEAD commit is below.

$ git log | head
commit a072afb0b45904022b76deef3b770ee9a93cb13a
Author: Nicholas Krause 
Date:   Mon Feb 9 00:27:00 2015 -0800

igb: Remove outdated fix me comment in the 
function,gb_acquire_swfw_sync_i210


thanks,
Hiroshi


RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-08 Thread Hiroshi Shimamoto
> > Subject: Re: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to 
> > enable MC promiscuous mode
> >
> > On Fri, 2015-01-30 at 11:37 +, Hiroshi Shimamoto wrote:
> > > From: Hiroshi Shimamoto 
> > >
> > > The limitation of the number of multicast address for VF is not enough
> > > for the large scale server with SR-IOV feature.
> > > IPv6 requires the multicast MAC address for each IP address to handle
> > > the Neighbor Solicitation message.
> > > We couldn't assign over 30 IPv6 addresses to a single VF interface.
> > >
> > > The easy way to solve this is enabling multicast promiscuous mode.
> > > It is good to have a functionality to enable multicast promiscuous
> > > mode
> > > for each VF from VF driver.
> > >
> > > This patch introduces the new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> > > enable/disable multicast promiscuous mode in VF. If multicast
> > > promiscuous
> > > mode is enabled the VF can receive all multicast packets.
> > >
> > > With this patch, the ixgbevf driver automatically enable multicast
> > > promiscuous mode when the number of multicast addresses is over than
> > > 30
> > > if possible.
> > >
> > > This also bump the API version up to 1.2 to check whether the API,
> > > IXGBE_VF_SET_MC_PROMISC is available.
> > >
> > > Signed-off-by: Hiroshi Shimamoto 
> > > Reviewed-by: Hayato Momma 
> > > CC: Choi, Sy Jong 
> > > ---
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  4 +
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 89
> > > ++-
> > >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++-
> > >  drivers/net/ethernet/intel/ixgbevf/mbx.h  |  4 +
> > >  drivers/net/ethernet/intel/ixgbevf/vf.c   | 29 +++-
> > >  drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
> > >  7 files changed, 137 insertions(+), 4 deletions(-)
> >
> > Hiroshi, I tried to apply your patches to my queue but they do not apply
> > cleanly and they are in a DOS file format, not UNIX.  I also noted
> > several checkpatch.pl issues with your patches, so please fix those up
> > as well.
> 
> I'm sorry to bother you.
> Will fix.
> 
> >
> > Can you please fix up your patches based on my tree:
> > git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git
> 
> Yes. I haven't noticed your tree.
> Will resend patches against it.
> 

I encountered an issue with your tree, the commit id is below.

$ git log | head
commit e6f1649780f8f5a87299bf6af04453f93d1e3d5e
Author: Rasmus Villemoes 
Date:   Fri Jan 23 20:43:14 2015 -0800

ethernet: fm10k: Actually drop 4 bits

The comment explains the intention, but vid has type u16. Before the
inner shift, it is promoted to int, which has plenty of space for all
vid's bits, so nothing is dropped. Use a simple mask instead.


I use the kernel from your tree in both host and guest.

Assign an IPv6 for VF in guest.
# ip -6 addr add 2001:db8::18:1/64 dev ens0

Send ping packet from other server to the VM.
# ping6  2001:db8::18:1 -I eth0

The following message was shown.
ixgbevf :00:08.0: partial checksum but l4 proto=3a!

If I did the same operation in the host, I saw the same error message in host 
too.
ixgbe :2d:00.0: partial checksum but l4 proto=3a!

Do you have any idea about that?

thanks,
Hiroshi
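As an aside on the fm10k commit message quoted above: the effect it describes
is ordinary C integer promotion. The sketch below contrasts a shift pair of
the kind the message describes with a plain mask. Both helpers are written
for this note and are not the fm10k code, and the sketch assumes int is wider
than 16 bits, as it is on the platforms Linux supports.

```c
#include <stdint.h>

/*
 * Shift pair: vid is promoted to int before "<<", so the top four bits
 * move into bits 16..19 of the int instead of falling off, and ">>"
 * brings them straight back.
 */
static uint16_t sketch_drop_top4_by_shift(uint16_t vid)
{
	return (uint16_t)((vid << 4) >> 4);
}

/* Mask: clears the top four bits regardless of promotion. */
static uint16_t sketch_drop_top4_by_mask(uint16_t vid)
{
	return (uint16_t)(vid & 0x0fff);
}
```

For an input of 0xf123 the shift version returns 0xf123 unchanged, while the
mask version returns 0x0123.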


RE: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-02-04 Thread Hiroshi Shimamoto
> Subject: Re: [E1000-devel] [PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to 
> enable MC promiscuous mode
> 
> On Fri, 2015-01-30 at 11:37 +0000, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > The limitation of the number of multicast address for VF is not enough
> > for the large scale server with SR-IOV feature.
> > IPv6 requires the multicast MAC address for each IP address to handle
> > the Neighbor Solicitation message.
> > We couldn't assign over 30 IPv6 addresses to a single VF interface.
> >
> > The easy way to solve this is enabling multicast promiscuous mode.
> > It is good to have a functionality to enable multicast promiscuous
> > mode
> > for each VF from VF driver.
> >
> > This patch introduces the new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> > enable/disable multicast promiscuous mode in VF. If multicast
> > promiscuous
> > mode is enabled the VF can receive all multicast packets.
> >
> > With this patch, the ixgbevf driver automatically enable multicast
> > promiscuous mode when the number of multicast addresses is over than
> > 30
> > if possible.
> >
> > This also bump the API version up to 1.2 to check whether the API,
> > IXGBE_VF_SET_MC_PROMISC is available.
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > Reviewed-by: Hayato Momma 
> > CC: Choi, Sy Jong 
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  4 +
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 89
> > ++-
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++-
> >  drivers/net/ethernet/intel/ixgbevf/mbx.h  |  4 +
> >  drivers/net/ethernet/intel/ixgbevf/vf.c   | 29 +++-
> >  drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
> >  7 files changed, 137 insertions(+), 4 deletions(-)
> 
> Hiroshi, I tried to apply your patches to my queue but they do not apply
> cleanly and they are in a DOS file format, not UNIX.  I also noted
> several checkpatch.pl issues with your patches, so please fix those up
> as well.

I'm sorry to bother you.
Will fix.

> 
> Can you please fix up your patches based on my tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/queue.git

Yes. I haven't noticed your tree.
Will resend patches against it.

thanks,
Hiroshi

> 
> This my queue of all community patches against the Intel LAN drivers and
> will be where I queue up your patches while they are under review and
> testing.


[PATCH 3/3] ixgbe: Add new ndo to allow VF multicast promiscuous mode

2015-01-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Implements the new netdev op to allow VF multicast promiscuous mode.

The administrator can allow VF multicast promiscuous mode only for
trusted VMs. After multicast promiscuous mode is allowed from the host,
more than 30 IPv6 addresses can be used in the VM.
 # ./ip link set dev eth0 vf 1 mc_promisc on

When multicast promiscuous mode is disallowed, only 30 IPv6 addresses can be used.
 # ./ip link set dev eth0 vf 1 mc_promisc off

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
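To make the interaction between the VF request and the host setting easier to
review, here is a condensed model of the state this patch keeps per VF: the
hardware bit ends up set only when the VF has requested multicast promiscuous
mode and the administrator has allowed it. The struct and the sketch_* helpers
are hypothetical; hw_mpe merely stands in for the VMOLR MPE bit that the real
enable/disable helpers write.

```c
#include <stdbool.h>

struct sketch_vf {
	bool mc_promisc;         /* VF asked for it over the mailbox */
	bool mc_promisc_allowed; /* administrator allowed it from the host */
	bool hw_mpe;             /* stand-in for the VMOLR MPE bit */
};

/* Effective hardware state is the AND of request and permission. */
static void sketch_apply(struct sketch_vf *vf)
{
	vf->hw_mpe = vf->mc_promisc && vf->mc_promisc_allowed;
}

/* Mailbox path: the request is remembered even when not yet allowed. */
static void sketch_vf_request(struct sketch_vf *vf, bool enable)
{
	vf->mc_promisc = enable;
	sketch_apply(vf);
}

/* ndo path: changing the permission re-evaluates a pending request. */
static void sketch_admin_set(struct sketch_vf *vf, bool setting)
{
	if (vf->mc_promisc_allowed == setting)
		return; /* nothing to do */
	vf->mc_promisc_allowed = setting;
	sketch_apply(vf);
}
```

In this model a request made before "mc_promisc on" takes effect as soon as the
administrator allows it, and "mc_promisc off" clears the hardware bit while
leaving the request recorded.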
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  7 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 35 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  1 +
 4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index bfe..33fde2e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -150,6 +150,7 @@ struct vf_data_storage {
u16 tx_rate;
u16 vlan_count;
u8 spoofchk_enabled;
+   u8 mc_promisc_allowed;
unsigned int vf_api;
 };
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2ed2c7d..34924f7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3569,6 +3569,12 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
if (!adapter->vfinfo[i].spoofchk_enabled)
ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i, false);
}
+
+   /* Reconfigure multicast promiscuous mode */
+   for (i = 0; i < adapter->num_vfs; i++) {
+   ixgbe_ndo_set_vf_mc_promisc(adapter->netdev, i,
+   adapter->vfinfo[i].mc_promisc_allowed);
+   }
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -7955,6 +7961,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_set_vf_vlan= ixgbe_ndo_set_vf_vlan,
.ndo_set_vf_rate= ixgbe_ndo_set_vf_bw,
.ndo_set_vf_spoofchk= ixgbe_ndo_set_vf_spoofchk,
+   .ndo_set_vf_mc_promisc  = ixgbe_ndo_set_vf_mc_promisc,
.ndo_get_vf_config  = ixgbe_ndo_get_vf_config,
.ndo_get_stats64= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index c19b7b8..9f39a26 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -111,8 +111,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
for (i = 0; i < adapter->num_vfs; i++) {
/* Enable spoof checking for all VFs */
adapter->vfinfo[i].spoofchk_enabled = true;
-   /* Turn multicast promiscuous mode off for all VFs */
+   /*
+* Disallow VF multicast promiscuous capability
+* and turn it off for all VFs */
adapter->vfinfo[i].mc_promisc = false;
+   adapter->vfinfo[i].mc_promisc_allowed = false;
}
return 0;
}
@@ -1019,7 +1022,7 @@ static int ixgbe_set_vf_mc_promisc(struct ixgbe_adapter *adapter,
 
adapter->vfinfo[vf].mc_promisc = enable;
 
-   if (enable)
+   if (enable && adapter->vfinfo[vf].mc_promisc_allowed)
return ixgbe_enable_vf_mc_promisc(adapter, vf);
else
return ixgbe_disable_vf_mc_promisc(adapter, vf);
@@ -1415,6 +1418,32 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
return 0;
 }
 
+int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool setting)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(netdev);
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 vmolr;
+
+   if (vf >= adapter->num_vfs)
+   return -EINVAL;
+
+   /* nothing to do */
+   if (adapter->vfinfo[vf].mc_promisc_allowed == setting)
+   return 0;
+
+   adapter->vfinfo[vf].mc_promisc_allowed = setting;
+
+   /* if VF requests multicast promiscuous */
+   if (adapter->vfinfo[vf].mc_promisc) {
+   if (setting)
+   ixgbe_enable_vf_mc_promisc(adapter, vf);
+   else
+   ixgbe_disable_vf_mc_promisc(adapter, vf);
+   }
+
+   return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
int vf, struct ifla_vf_info *ivi)
 {
@@ -1428,5 +1457,7 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
  

[PATCH 2/3] if_link: Add VF multicast promiscuous control

2015-01-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Add netlink directives and an ndo entry to allow VF multicast promiscuous mode.

This lets the administrator allow multicast promiscuous mode individually for
each VF.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 include/linux/if_link.h  |  1 +
 include/linux/netdevice.h|  3 +++
 include/uapi/linux/if_link.h |  6 ++
 net/core/rtnetlink.c | 18 --
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 119130e..bc29ddf 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -14,5 +14,6 @@ struct ifla_vf_info {
__u32 linkstate;
__u32 min_tx_rate;
__u32 max_tx_rate;
+   __u32 mc_promisc;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 52fd8e8..12e88a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -868,6 +868,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
  *   int max_tx_rate);
  * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
+ * int (*ndo_set_vf_mc_promisc)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  * int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int 
link_state);
@@ -1084,6 +1085,8 @@ struct net_device_ops {
   int max_tx_rate);
int (*ndo_set_vf_spoofchk)(struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_set_vf_mc_promisc)(struct net_device *dev,
+int vf, bool setting);
int (*ndo_get_vf_config)(struct net_device *dev,
 int vf,
 struct ifla_vf_info *ivf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..a476aea 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -454,6 +454,7 @@ enum {
IFLA_VF_SPOOFCHK,   /* Spoof Checking on/off switch */
IFLA_VF_LINK_STATE, /* link state enable/disable/auto switch */
IFLA_VF_RATE,   /* Min and Max TX Bandwidth Allocation */
+   IFLA_VF_MC_PROMISC, /* Multicast Promiscuous allow/disallow */
__IFLA_VF_MAX,
 };
 
@@ -498,6 +499,11 @@ struct ifla_vf_link_state {
__u32 link_state;
 };
 
+struct ifla_vf_mc_promisc {
+   __u32 vf;
+   __u32 setting;
+};
+
 /* VF ports management section
  *
  * Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9cf6fe9..5992245 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -807,7 +807,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 nla_total_size(sizeof(struct ifla_vf_vlan)) +
 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
 nla_total_size(sizeof(struct ifla_vf_rate)) +
-nla_total_size(sizeof(struct ifla_vf_link_state)));
+nla_total_size(sizeof(struct ifla_vf_link_state)) +
+nla_total_size(sizeof(struct ifla_vf_mc_promisc)));
return size;
} else
return 0;
@@ -1099,6 +1100,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
struct ifla_vf_tx_rate vf_tx_rate;
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_link_state vf_linkstate;
+   struct ifla_vf_mc_promisc vf_mc_promisc;
 
/*
 * Not all SR-IOV capable drivers support the
@@ -1107,6 +1109,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 * report anything.
 */
ivi.spoofchk = -1;
+   ivi.mc_promisc = -1;
memset(ivi.mac, 0, sizeof(ivi.mac));
/* The default value for VF link state is "auto"
 * IFLA_VF_LINK_STATE_AUTO which equals zero
@@ -1119,7 +1122,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
vf_rate.vf =
vf_tx_rate.vf =
vf_spoofchk.vf =
-   vf_linkstate.vf = ivi.vf;
+   vf_linkstate.vf =
+   vf_mc_

[PATCH 1/3] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2015-01-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

The limit on the number of multicast addresses per VF is too low for
large-scale servers using the SR-IOV feature.
IPv6 requires a multicast MAC address for each IP address in order to handle
Neighbor Solicitation messages, so we cannot assign more than 30 IPv6
addresses to a single VF interface.
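
To illustrate why each IPv6 address costs a multicast MAC address: a host
listens on the solicited-node multicast group of every address it owns, and
that group maps to an Ethernet multicast address built from the low 24 bits
of the IPv6 address. The sketch below shows only that mapping. The function
name solicited_node_mac() is made up for this illustration and is not part of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only: derive the Ethernet multicast MAC for the
 * solicited-node group of an IPv6 unicast address.
 *
 * The solicited-node group is ff02::1:ffXX:XXXX, where XX:XXXX are the
 * low 24 bits of the unicast address. An IPv6 multicast address maps to
 * 33:33 followed by its low 32 bits, giving 33:33:ff:XX:XX:XX.
 */
static void solicited_node_mac(const uint8_t ip6[16], uint8_t mac[6])
{
    assert(ip6 != NULL && mac != NULL);

    mac[0] = 0x33;
    mac[1] = 0x33;
    mac[2] = 0xff;
    mac[3] = ip6[13];
    mac[4] = ip6[14];
    mac[5] = ip6[15];
}
```

Two addresses that share their low 24 bits map to the same MAC address, so
the limit is really on distinct solicited-node groups rather than strictly on
the number of addresses.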

The easy way to solve this is to enable multicast promiscuous mode.
It is useful to be able to enable multicast promiscuous mode for each VF
from the VF driver.

This patch introduces a new mbox API, IXGBE_VF_SET_MC_PROMISC, to
enable/disable multicast promiscuous mode for a VF. If multicast promiscuous
mode is enabled, the VF can receive all multicast packets.

With this patch, the ixgbevf driver automatically enables multicast
promiscuous mode, if possible, when the number of multicast addresses
exceeds 30.

This also bumps the API version to 1.2 so that the availability of the
IXGBE_VF_SET_MC_PROMISC API can be checked.
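
The VF-side behavior described in this message can be sketched as a small
decision function. This is only an illustration under stated assumptions:
plan_mc_update(), struct mc_plan and enum mbox_api are hypothetical names, the
limit of 30 comes from the description, and whether the real driver still
sends the first 30 hashes when it requests promiscuous mode is an assumption,
not something this message states.

```c
#include <assert.h>
#include <stdbool.h>

#define MC_LIST_MAX 30  /* hash entries that fit in one mailbox message */

/* Hypothetical API levels for this sketch only. */
enum mbox_api { MBOX_API_10, MBOX_API_11, MBOX_API_12 };

struct mc_plan {
    bool request_promisc;   /* ask the PF for multicast promiscuous mode */
    unsigned int nr_hashes; /* hash entries to send in the address list */
};

/* Decide what the VF should ask for, given the negotiated API level. */
static struct mc_plan plan_mc_update(enum mbox_api api, unsigned int nr_addrs)
{
    struct mc_plan plan = { false, nr_addrs };

    if (nr_addrs > MC_LIST_MAX) {
        /* The list overflows: cap it, and request promiscuous mode
         * only if the PF negotiated the API level that offers it. */
        plan.nr_hashes = MC_LIST_MAX;
        plan.request_promisc = (api == MBOX_API_12);
    }
    assert(plan.nr_hashes <= MC_LIST_MAX);
    return plan;
}
```

With an older PF the sketch falls back to the existing behavior, where the
extra addresses are simply not programmed.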

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 89 ++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++-
 drivers/net/ethernet/intel/ixgbevf/mbx.h  |  4 +
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 29 +++-
 drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
 7 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index b6137be..bfe 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -144,6 +144,7 @@ struct vf_data_storage {
u16 vlans_enabled;
bool clear_to_send;
bool pf_set_mac;
+   bool mc_promisc;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
u16 tx_rate;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index a5cb755..2963557 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -73,6 +73,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_10,  /* API version 1.0, linux/freebsd VF driver */
ixgbe_mbox_api_20,  /* API version 2.0, solaris Phase1 VF driver */
ixgbe_mbox_api_11,  /* API version 1.1, linux/freebsd VF driver */
+   ixgbe_mbox_api_12,  /* API version 1.2, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
 };
@@ -91,6 +92,9 @@ enum ixgbe_pfvf_api_rev {
 /* mailbox API, version 1.1 VF requests */
 #define IXGBE_VF_GET_QUEUES0x09 /* get queue configuration */
 
+/* mailbox API, version 1.2 VF requests */
+#define IXGBE_VF_SET_MC_PROMISC 0x0a /* VF requests PF to set MC promiscuous */
+
 /* GET_QUEUES return data indices within the mailbox */
 #define IXGBE_VF_TX_QUEUES 1   /* number of Tx queues supported */
 #define IXGBE_VF_RX_QUEUES 2   /* number of Rx queues supported */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index c76ba90..c19b7b8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -108,9 +108,12 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 IXGBE_FLAG2_RSC_ENABLED);
 
-   /* enable spoof checking for all VFs */
-   for (i = 0; i < adapter->num_vfs; i++)
+   for (i = 0; i < adapter->num_vfs; i++) {
+   /* Enable spoof checking for all VFs */
adapter->vfinfo[i].spoofchk_enabled = true;
+   /* Turn multicast promiscuous mode off for all VFs */
+   adapter->vfinfo[i].mc_promisc = false;
+   }
return 0;
}
 
@@ -311,6 +314,40 @@ int ixgbe_pci_sriov_configure(struct pci_dev *dev, int num_vfs)
return ixgbe_pci_sriov_enable(dev, num_vfs);
 }
 
+static int ixgbe_enable_vf_mc_promisc(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+
+   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
+
+   vmolr |= IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+   return 0;
+}
+
+static int ixgbe_disable_vf_mc_promisc(struct ixgbe_adapter *adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(v

RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-27 Thread Hiroshi Shimamoto
> > On 01/22/2015 04:32 PM, Hiroshi Shimamoto wrote:
> > >> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast
> > >> promiscuous mode control
> > >>> "Skidmore, Donald C"  writes:
> > >>>
> > >>>> My hang up is more related to: without the nob to enable it (off by
> > >>>> default) we are letting one VF dictate policy for all the other VFs
> > >>>> and the PF.  If one VF needs to be in promiscuous multicast so is
> > >>>> everyone else.  Their stacks now needs to deal with all the extra
> > >>>> multicast packets.  As you point out this might not be a direct
> > >>>> concern for isolation in that the VM could have 'chosen' to join
> > >>>> any Multicast group and seen this traffic.  My concern over
> > >>>> isolation is one VF has chosen that all the other VM now have to
> > >>>> see this multicast traffic.
> > >>> Apologies if this question is stupid, but I just have to ask about
> > >>> stuff I don't understand...
> > >>>
> > >>> Looking at the proposed implementation, the promiscous multicast
> > >>> flag seems to be a per-VF flag:
> > >>>
> > >>> +int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf,
> > >>> +bool
> > >>> +setting) {
> > >>> +   struct ixgbe_adapter *adapter = netdev_priv(netdev);
> > >>> +   struct ixgbe_hw *hw = &adapter->hw;
> > >>> +   u32 vmolr;
> > >>> +
> > >>> +   if (vf >= adapter->num_vfs)
> > >>> +   return -EINVAL;
> > >>> +
> > >>> +   adapter->vfinfo[vf].mc_promisc_enabled = setting;
> > >>> +
> > >>> +   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
> > >>> +   if (setting) {
> > >>> +   e_info(drv, "VF %u: enabling multicast promiscuous\n", 
> > >>> vf);
> > >>> +   vmolr |= IXGBE_VMOLR_MPE;
> > >>> +   } else {
> > >>> +   e_info(drv, "VF %u: disabling multicast promiscuous\n", 
> > >>> vf);
> > >>> +   vmolr &= ~IXGBE_VMOLR_MPE;
> > >>> +   }
> > >>> +
> > >>> +   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
> > >>> +
> > >>> +   return 0;
> > >>> +}
> > >>> +
> > >>>
> > >>> I haven't read the data sheet, but I took a quick look at the
> > >>> excellent high level driver docs:
> > >>> http://www.intel.com/content/dam/doc/design-guide/82599-sr-iov-driver-companion-guide.pdf
> > >>>
> > >>> It mentions "Multicast Promiscuous Enable" in its "Thoughts for
> > >>> Customization" section:
> > >>>
> > >>>  7.1 Multicast Promiscuous Enable
> > >>>
> > >>>  The controller has provisions to allow each VF to be put into
> > >>> Multicast Promiscuous mode.  The Intel reference driver does not
> > >>> configure this option .
> > >>>
> > >>>  The capability can be enabled/disabled by manipulating the MPE
> > >>> field  (bit
> > >>> 28) of the PF VF L2 Control Register (PFVML2FLT – 0x0F000)
> > >>>
> > >>> and showing a section from the data sheet describing the "PF VM L2
> > >>> Control Register - PFVML2FLT[n]  (0x0F000 + 4 * n, n=0...63; RW)"
> > >>>
> > >>> To me it looks like enabling Promiscuos Multicast for a VF won't
> > >>> affect any other VF at all.  Is this really not the case?
> > >>>
> > >>>
> > >>>
> > >>> Bjørn
> > >> Clearly not a dumb question at all and I'm glad you mentioned that.
> > >> :)  I was going off the assumption, been awhile since I read the
> > >> patch, that the patch was using FCTRL.MPE or MANC.MCST_PASS_L2
> > >> which would turn multicast promiscuous on for everyone.  Since the
> > >> patch is using PFVML2FLT.MPE this lessens my concern over effect on
> > >> the entire system.
> > > I believe the patches for this VF multicast promiscuous mode is per VF.
> > >
> > &

RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-22 Thread Hiroshi Shimamoto
> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous 
> mode control
> >
> > "Skidmore, Donald C"  writes:
> >
> > > My hang up is more related to: without the nob to enable it (off by
> > > default) we are letting one VF dictate policy for all the other VFs
> > > and the PF.  If one VF needs to be in promiscuous multicast so is
> > > everyone else.  Their stacks now needs to deal with all the extra
> > > multicast packets.  As you point out this might not be a direct
> > > concern for isolation in that the VM could have 'chosen' to join any
> > > Multicast group and seen this traffic.  My concern over isolation is
> > > one VF has chosen that all the other VM now have to see this multicast
> > > traffic.
> >
> > Apologies if this question is stupid, but I just have to ask about stuff I 
> > don't
> > understand...
> >
> > Looking at the proposed implementation, the promiscous multicast flag
> > seems to be a per-VF flag:
> >
> > +int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool
> > +setting) {
> > +   struct ixgbe_adapter *adapter = netdev_priv(netdev);
> > +   struct ixgbe_hw *hw = &adapter->hw;
> > +   u32 vmolr;
> > +
> > +   if (vf >= adapter->num_vfs)
> > +   return -EINVAL;
> > +
> > +   adapter->vfinfo[vf].mc_promisc_enabled = setting;
> > +
> > +   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
> > +   if (setting) {
> > +   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
> > +   vmolr |= IXGBE_VMOLR_MPE;
> > +   } else {
> > +   e_info(drv, "VF %u: disabling multicast promiscuous\n", vf);
> > +   vmolr &= ~IXGBE_VMOLR_MPE;
> > +   }
> > +
> > +   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
> > +
> > +   return 0;
> > +}
> > +
> >
> > I haven't read the data sheet, but I took a quick look at the excellent high
> > level driver docs:
> > http://www.intel.com/content/dam/doc/design-guide/82599-sr-iov-driver-companion-guide.pdf
> >
> > It mentions "Multicast Promiscuous Enable" in its "Thoughts for
> > Customization" section:
> >
> >  7.1 Multicast Promiscuous Enable
> >
> >  The controller has provisions to allow each VF to be put into Multicast
> > Promiscuous mode.  The Intel reference driver does not configure this
> > option .
> >
> >  The capability can be enabled/disabled by manipulating the MPE field  (bit
> > 28) of the PF VF L2 Control Register (PFVML2FLT – 0x0F000)
> >
> > and showing a section from the data sheet describing the "PF VM L2 Control
> > Register - PFVML2FLT[n]  (0x0F000 + 4 * n, n=0...63; RW)"
> >
> > To me it looks like enabling Promiscuos Multicast for a VF won't affect any
> > other VF at all.  Is this really not the case?
> >
> >
> >
> > Bjørn
> 
> Clearly not a dumb question at all and I'm glad you mentioned that. :)  I was
> going off the assumption, been awhile since I read the patch, that the patch
> was using FCTRL.MPE or MANC.MCST_PASS_L2 which would turn multicast
> promiscuous on for everyone.  Since the patch is using PFVML2FLT.MPE this
> lessens my concern over effect on the entire system.

I believe the VF multicast promiscuous mode in these patches is per VF.
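
As a rough illustration of the per-VF point, the quoted data sheet excerpt
gives one PFVML2FLT register per pool at 0x0F000 + 4 * n with the MPE field at
bit 28. The sketch below models that with a plain array standing in for the
device registers. The names vmolr_offset(), vmolr_regs and set_vf_mc_promisc()
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_POOLS   64          /* n = 0...63 in the quoted excerpt */
#define VMOLR_BASE  0x0F000u    /* offset of PFVML2FLT[0] */
#define VMOLR_MPE   (1u << 28)  /* Multicast Promiscuous Enable */

/* Register offset of PFVML2FLT[vf]. */
static uint32_t vmolr_offset(unsigned int vf)
{
    assert(vf < NUM_POOLS);
    return VMOLR_BASE + 4u * vf;
}

/* Simulated register file; a real driver would use MMIO accessors. */
static uint32_t vmolr_regs[NUM_POOLS];

/* Read-modify-write of the MPE bit for one pool only. */
static void set_vf_mc_promisc(unsigned int vf, bool enable)
{
    assert(vf < NUM_POOLS);
    if (enable)
        vmolr_regs[vf] |= VMOLR_MPE;
    else
        vmolr_regs[vf] &= ~VMOLR_MPE;
}
```

Because each pool has its own register, setting the bit for one VF leaves the
registers of the other VFs untouched. The replication cost raised in the
reply below is a separate matter that this model does not capture.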

> 
> That said I still would prefer having a way to override this behavior on the
> PF, although I admit my argument is weaker.  I'm still concerned about a VF
> changing the behavior of the PF without any way to prevent it.  This might be
> one part philosophical (PF sets policy not the VF) but this still could have a
> noticeable effect on the overall system.  If any other VFs (or the PF) are
> receiving MC packets these will have to be replicated which will be a
> performance hit.  When we use the MC hash this is limited vs. when anyone is
> in MC promiscuous every MC packet used by another pool would be replicated.
> I could imagine in some environments (i.e. public clouds) where you don't
> trust what is running in your VM you might want to block this from happening.

I understand your request, and I'm thinking of submitting these patches:
  1) Add a new mbox API between ixgbe and ixgbevf to turn MC promiscuous on,
     and enable it when ixgbevf needs more than 30 MC addresses.
  2) Add a policy knob on the PF to prevent a VF from enabling it.

Does that seem okay?
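
To make the interaction of the two patches concrete, here is a minimal sketch
of the intended policy: the hardware bit is set only when the VF has requested
promiscuous mode and the PF allows it. The struct and function names are
hypothetical, and the field names mc_promisc and mc_promisc_allowed are chosen
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vf_state {
    bool mc_promisc;          /* VF asked for it through the mailbox */
    bool mc_promisc_allowed;  /* administrator's policy on the PF */
    bool hw_mpe;              /* value that would be programmed in hardware */
};

/* Re-evaluate the effective setting from request and policy. */
static void vf_apply(struct vf_state *vf)
{
    assert(vf != NULL);
    vf->hw_mpe = vf->mc_promisc && vf->mc_promisc_allowed;
}

/* Mailbox path: remember the VF's request, then re-evaluate. */
static void vf_request_mc_promisc(struct vf_state *vf, bool enable)
{
    vf->mc_promisc = enable;
    vf_apply(vf);
}

/* Admin path: remember the policy, then re-evaluate. */
static void pf_allow_mc_promisc(struct vf_state *vf, bool allow)
{
    vf->mc_promisc_allowed = allow;
    vf_apply(vf);
}
```

Keeping the request separately from the policy means that a later change of
the knob takes effect without the VF having to ask again.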

BTW, I'm a bit worried about using the ndo interface for 2), because adding a
new hook complicates the core code.
Is it really reasonable to do it with ndo?
I haven't found any other suitable method so far, and using an ndo VF hook
looks like the standard way to control VF functionality.
So I think an ndo hook is the best way to implement this policy.

> 
> In some ways it is almost the mirror image of the issue you brought up:
> 
> Adding a new hook for this seems over-complicated to me.  And it still
> doesn't solve the real problems that
>  a) the user has to know about this limit, and
>  b) manually configure the feature
> 
> My reverse argument might be that if this happens automatically.  It might 
> take the VM provider a long time to r

RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-21 Thread Hiroshi Shimamoto
> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous 
> mode control
> 
> From: Hiroshi Shimamoto
> > My concern is what is the real issue that VF multicast promiscuous mode can 
> > cause.
> > I think there is the 4k entries to filter multicast address, and the 
> > current ixgbe/ixgbevf
> > can turn all bits on from VM. That is almost same as enabling multicast 
> > promiscuous mode.
> > I mean that we can receive all multicast addresses by an onerous operation 
> > in untrusted VM.
> > I think we should clarify what is real security issue in this context.
> 
> If you are worried about passing un-enabled multicasts to users then
> what about doing a software hash of received multicasts and checking
> against an actual list of multicasts enabled for that hash entry.
> Under normal conditions there is likely to be only a single address to check.
> 
> It may (or may not) be best to use the same hash as any hashing hardware
> filter uses.

Thanks for the comment, but I don't think that is the point.

I guess the concern is that introducing VF multicast promiscuous mode adds a
new privilege, letting a VM peek at every multicast packet, and that doesn't
look good.
On the other hand, I think the current ixgbe/ixgbevf implementation already
grants the same privilege, unless I'm misreading the code.
I'd like to clarify what the issue is with allowing a VF to receive all
multicast packets.
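
The hash filter argument can be sketched as follows. This is an illustration
under assumptions, not the driver code: it assumes a 12-bit hash taken from
the last two bytes of the MAC address (what I understand to be the default
filter type) and a 4096-bit table. The names mc_hash(), struct mta, mta_set()
and mta_match() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTA_BITS 4096  /* a 12-bit hash selects one of 4096 filter bits */

/* Assumed hash: upper nibble of byte 4 plus all of byte 5 (12 bits). */
static uint16_t mc_hash(const uint8_t mac[6])
{
    assert(mac != NULL);
    return (uint16_t)(((mac[4] >> 4) | ((uint16_t)mac[5] << 4)) & 0xFFF);
}

struct mta {
    uint32_t word[MTA_BITS / 32];
};

/* Mark one hash value as accepted. */
static void mta_set(struct mta *t, uint16_t hash)
{
    assert(t != NULL && hash < MTA_BITS);
    t->word[hash >> 5] |= 1u << (hash & 0x1F);
}

/* A frame passes the filter if the bit for its hash is set. */
static int mta_match(const struct mta *t, const uint8_t mac[6])
{
    uint16_t h = mc_hash(mac);

    return (int)((t->word[h >> 5] >> (h & 0x1F)) & 1u);
}
```

Any two multicast addresses with the same hash are indistinguishable to such
a filter, and a table with all 4096 bits set accepts every multicast frame,
which is the sense in which the existing interface is close to promiscuous
mode.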

thanks,
Hiroshi



RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-21 Thread Hiroshi Shimamoto
> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous 
> mode control
> 
> 
> 
> > -----Original Message-----
> > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > Sent: Tuesday, January 20, 2015 5:07 PM
> > To: Skidmore, Donald C; Bjørn Mork
> > Cc: e1000-de...@lists.sourceforge.net; net...@vger.kernel.org; Choi, Sy
> > Jong; linux-kernel@vger.kernel.org; Hayato Momma
> > Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous
> > mode control
> >
> > > Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast
> > > promiscuous mode control
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > > > Sent: Tuesday, January 20, 2015 3:40 PM
> > > > To: Bjørn Mork
> > > > Cc: e1000-de...@lists.sourceforge.net; net...@vger.kernel.org; Choi,
> > > > Sy Jong; linux-kernel@vger.kernel.org; Hayato Momma
> > > > Subject: Re: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast
> > > > promiscuous mode control
> > > >
> > > > > Subject: Re: [PATCH 1/2] if_link: Add VF multicast promiscuous
> > > > > mode control
> > > > >
> > > > > Hiroshi Shimamoto  writes:
> > > > >
> > > > > > From: Hiroshi Shimamoto 
> > > > > >
> > > > > > Add netlink directives and ndo entry to control VF multicast
> > > > > > promiscuous
> > > > mode.
> > > > > >
> > > > > > Intel ixgbe and ixgbevf driver can handle only 30 multicast MAC
> > > > > > addresses per VF. It means that we cannot assign over 30 IPv6
> > > > > > addresses to a single VF interface on VM. We want thousands IPv6
> > > > addresses in VM.
> > > > > >
> > > > > > There is capability of multicast promiscuous mode in Intel 82599 
> > > > > > chip.
> > > > > > It enables all multicast packets are delivered to the target VF.
> > > > > >
> > > > > > This patch prepares to control that VF multicast promiscuous
> > > > functionality.
> > > > >
> > > > > Adding a new hook for this seems over-complicated to me.  And it
> > > > > still doesn't solve the real problems that
> > > > >  a) the user has to know about this limit, and
> > > > >  b) manually configure the feature
> > > > >
> > > > > Most of us, lacking the ability to imagine such arbitrary hardware
> > > > > limitations, will go through a few hours of frustrating debugging
> > > > > before we figure this one out...
> > > > >
> > > > > Why can't the ixgbevf driver just automatically signal the ixgbe
> > > > > driver to enable multicast promiscuous mode whenever the list
> > > > > grows past the limit?
> > > >
> > > > I had submitted a patch to change ixgbe and ixgbevf driver for this 
> > > > issue.
> > > > https://lkml.org/lkml/2014/11/27/269
> > > >
> > > > The previous patch introduces API between ixgbe and ixgbevf driver
> > > > to enable multicast promiscuous mode, and ixgbevf enables it
> > > > automatically if the number of addresses is over than 30.
> > >
> > > I believe the issue is with allowing a VF to automatically enter
> > > Promiscuous Multicast without the PF's ok is concern
> >
> > So you mean that we should take care about enabling VF multicast
> > promiscuous mode in host side, right? The host allows multicast promiscuous
> > and VF requests it too, then enables VF multicast promiscuous mode.
> > So, what is preferred way to do in host do you think?
> 
> I think we are saying the same thing.  I believe it would be fine if the VF
> requests this to happen (through a mailbox message like you set up) and the PF
> will do it if the system's policy has been set up that way (as you did with the
> control mode).
> 
> This way the behavior (related to multicast) is the same as it has been,
> unless the system has been set up specifically to allow VF multicast
> promiscuous mode.

Now I understand what you're saying.
I will make patches that add the knob in the host/PF and the ixgbe/ixgbevf
interface.
I think I should try to find out whether there is a way to add the knob
without ndo.

> 
> I know it's been mentioned that th

RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-20 Thread Hiroshi Shimamoto
> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous 
> mode control
> 
> 
> 
> > -----Original Message-----
> > From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com]
> > Sent: Tuesday, January 20, 2015 3:40 PM
> > To: Bjørn Mork
> > Cc: e1000-de...@lists.sourceforge.net; net...@vger.kernel.org; Choi, Sy
> > Jong; linux-kernel@vger.kernel.org; Hayato Momma
> > Subject: Re: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous
> > mode control
> >
> > > Subject: Re: [PATCH 1/2] if_link: Add VF multicast promiscuous mode
> > > control
> > >
> > > Hiroshi Shimamoto  writes:
> > >
> > > > From: Hiroshi Shimamoto 
> > > >
> > > > Add netlink directives and ndo entry to control VF multicast promiscuous
> > mode.
> > > >
> > > > Intel ixgbe and ixgbevf driver can handle only 30 multicast MAC
> > > > addresses per VF. It means that we cannot assign over 30 IPv6
> > > > addresses to a single VF interface on VM. We want thousands IPv6
> > addresses in VM.
> > > >
> > > > There is capability of multicast promiscuous mode in Intel 82599 chip.
> > > > It enables all multicast packets are delivered to the target VF.
> > > >
> > > > This patch prepares to control that VF multicast promiscuous
> > functionality.
> > >
> > > Adding a new hook for this seems over-complicated to me.  And it still
> > > doesn't solve the real problems that
> > >  a) the user has to know about this limit, and
> > >  b) manually configure the feature
> > >
> > > Most of us, lacking the ability to imagine such arbitrary hardware
> > > limitations, will go through a few hours of frustrating debugging
> > > before we figure this one out...
> > >
> > > Why can't the ixgbevf driver just automatically signal the ixgbe
> > > driver to enable multicast promiscuous mode whenever the list grows
> > > past the limit?
> >
> > I had submitted a patch to change ixgbe and ixgbevf driver for this issue.
> > https://lkml.org/lkml/2014/11/27/269
> >
> > The previous patch introduces API between ixgbe and ixgbevf driver to
> > enable multicast promiscuous mode, and ixgbevf enables it automatically if
> > the number of addresses is over than 30.
> 
> I believe the issue is with allowing a VF to automatically enter Promiscuous 
> Multicast without the PF's ok is concern

So you mean that enabling VF multicast promiscuous mode should be controlled
on the host side, right? VF multicast promiscuous mode is enabled only when
the host allows it and the VF also requests it.
So, what do you think is the preferred way to do this on the host?

> over VM isolation.   Of course that isolation, when it comes to multicast, is 
> rather limited anyway given that our multicast
> filter uses only 12-bit of the address for a match.  Still this (or doing it 
> by default) would only open that up considerably
> more (all multicasts).  I assume for your application you're not concerned, 
> but are there other use cases that would worry
> about such things?

Sorry, I couldn't catch the point.

What is the issue? I think nothing changes for users who don't need many
multicast addresses in the guest. In the current implementation, multicast
addresses beyond the limit are silently discarded in ixgbevf. I believe no
user is using more than 30 multicast addresses now. If VF multicast
promiscuous mode is enabled for a certain VF, the behavior of the other VFs
does not change.

thanks,
Hiroshi

> 
> Thanks,
> -Don Skidmore 
> 
> >
> > I got some comment and I would like to clarify the point, but there was no
> > answer.
> > That's the reason I submitted this patch.
> >
> > Do you think a patch for the ixgbe/ixgbevf driver is preferred?
> >
> >
> > thanks,
> > Hiroshi
> >
> > >
> > > I'd also like to note that this comment in
> > > drivers/net/ethernet/intel/ixgbevf/vf.c
> > > indicates that the author had some ideas about how more than 30
> > > addresses could/should be handled:
> > >
> > > static s32 ixgbevf_update_mc_addr_list_vf(struct ixgbe_hw *hw,
> > > struct net_device *netdev)
> > > {
> > >   struct netdev_hw_addr *ha;
> > >   u32 msgbuf[IXGBE_VFMAILBOX_SIZE];
> > >   u16 *vector_list = (u16 *)&msgbuf[1];
> > >   u32 cnt, i;
> > >
> > >   /* Each entry in the list uses 1 16 bit word.  We have 30
> > >* 16 bit words available in our HW msg buffer (minus 1 for the
> > >* msg type).  That's 30 hash values if we pack 'em right.  If
> > >* there are more than 30 MC addresses to add then punt the
> > >* extras for now and then add code to handle more than 30 later.
> > >* It would be unusual for a server to request that many multi-cast
> > >* addresses except for in large enterprise network environments.
> > >*/
> > >
> > >
> > >
> > > The last 2 lines of that comment are of course totally bogus and
> > > pointless and should be deleted in any case...  It's obvious that 30
> > > multicast addresses is ridiculously low for lots of normal use cases.
> > >
> > >
> > > Bjørn



RE: [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-20 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH 1/2] if_link: Add VF multicast promiscuous mode control
> 
> Hiroshi Shimamoto  writes:
> 
> > From: Hiroshi Shimamoto 
> >
> > Add netlink directives and ndo entry to control VF multicast promiscuous 
> > mode.
> >
> > Intel ixgbe and ixgbevf driver can handle only 30 multicast MAC addresses
> > per VF. It means that we cannot assign over 30 IPv6 addresses to a single
> > VF interface on VM. We want thousands IPv6 addresses in VM.
> >
> > There is capability of multicast promiscuous mode in Intel 82599 chip.
> > It enables all multicast packets are delivered to the target VF.
> >
> > This patch prepares to control that VF multicast promiscuous functionality.
> 
> Adding a new hook for this seems over-complicated to me.  And it still
> doesn't solve the real problems that
>  a) the user has to know about this limit, and
>  b) manually configure the feature
> 
> Most of us, lacking the ability to imagine such arbitrary hardware
> limitations, will go through a few hours of frustrating debugging before
> we figure this one out...
> 
> Why can't the ixgbevf driver just automatically signal the ixgbe driver
> to enable multicast promiscuous mode whenever the list grows past the
> limit?

I had already submitted a patch that changes the ixgbe and ixgbevf drivers for
this issue.
https://lkml.org/lkml/2014/11/27/269

That patch introduces an API between the ixgbe and ixgbevf drivers to
enable multicast promiscuous mode, and ixgbevf enables it automatically
if the number of addresses exceeds 30.

I got some comments and wanted to clarify the point, but there was no
answer.
That's the reason I submitted this patch.

Do you think a patch for the ixgbe/ixgbevf driver is preferred?


thanks,
Hiroshi

> 
> I'd also like to note that this comment in
> drivers/net/ethernet/intel/ixgbevf/vf.c
> indicates that the author had some ideas about how more than 30
> addresses could/should be handled:
> 
> static s32 ixgbevf_update_mc_addr_list_vf(struct ixgbe_hw *hw,
> struct net_device *netdev)
> {
>   struct netdev_hw_addr *ha;
>   u32 msgbuf[IXGBE_VFMAILBOX_SIZE];
>   u16 *vector_list = (u16 *)&msgbuf[1];
>   u32 cnt, i;
> 
>   /* Each entry in the list uses 1 16 bit word.  We have 30
>* 16 bit words available in our HW msg buffer (minus 1 for the
>* msg type).  That's 30 hash values if we pack 'em right.  If
>* there are more than 30 MC addresses to add then punt the
>* extras for now and then add code to handle more than 30 later.
>* It would be unusual for a server to request that many multi-cast
>* addresses except for in large enterprise network environments.
>*/
> 
> 
> 
> The last 2 lines of that comment are of course totally bogus and
> pointless and should be deleted in any case...  It's obvious that 30
> multicast addresses is ridiculously low for lots of normal use cases.
> 
> 
> Bjørn


[PATCH 1/2] if_link: Add VF multicast promiscuous mode control

2015-01-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Add netlink directives and ndo entry to control VF multicast promiscuous mode.

The Intel ixgbe and ixgbevf drivers can handle only 30 multicast MAC addresses
per VF. This means that we cannot assign more than 30 IPv6 addresses to a
single VF interface in a VM. We want thousands of IPv6 addresses in a VM.

The Intel 82599 chip supports a multicast promiscuous mode.
It allows all multicast packets to be delivered to the target VF.

This patch prepares the interface for controlling VF multicast promiscuous mode.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 include/linux/if_link.h  |  1 +
 include/linux/netdevice.h|  3 +++
 include/uapi/linux/if_link.h |  6 ++
 net/core/rtnetlink.c | 18 --
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 119130e..bc29ddf 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -14,5 +14,6 @@ struct ifla_vf_info {
__u32 linkstate;
__u32 min_tx_rate;
__u32 max_tx_rate;
+   __u32 mc_promisc;
 };
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 52fd8e8..12e88a7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -868,6 +868,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
  *   int max_tx_rate);
  * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
+ * int (*ndo_set_vf_mc_promisc)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  * int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
@@ -1084,6 +1085,8 @@ struct net_device_ops {
   int max_tx_rate);
int (*ndo_set_vf_spoofchk)(struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_set_vf_mc_promisc)(struct net_device *dev,
+int vf, bool setting);
int (*ndo_get_vf_config)(struct net_device *dev,
 int vf,
 struct ifla_vf_info *ivf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..32b4b9e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -454,6 +454,7 @@ enum {
IFLA_VF_SPOOFCHK,   /* Spoof Checking on/off switch */
IFLA_VF_LINK_STATE, /* link state enable/disable/auto switch */
IFLA_VF_RATE,   /* Min and Max TX Bandwidth Allocation */
+   IFLA_VF_MC_PROMISC, /* Multicast Promiscuous on/off switch */
__IFLA_VF_MAX,
 };
 
@@ -498,6 +499,11 @@ struct ifla_vf_link_state {
__u32 link_state;
 };
 
+struct ifla_vf_mc_promisc {
+   __u32 vf;
+   __u32 setting;
+};
+
 /* VF ports management section
  *
  * Nested layout of set/get msg is:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9cf6fe9..5992245 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -807,7 +807,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
 nla_total_size(sizeof(struct ifla_vf_vlan)) +
 nla_total_size(sizeof(struct ifla_vf_spoofchk)) +
 nla_total_size(sizeof(struct ifla_vf_rate)) +
-nla_total_size(sizeof(struct ifla_vf_link_state)));
+nla_total_size(sizeof(struct ifla_vf_link_state)) +
+nla_total_size(sizeof(struct ifla_vf_mc_promisc)));
return size;
} else
return 0;
@@ -1099,6 +1100,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
struct ifla_vf_tx_rate vf_tx_rate;
struct ifla_vf_spoofchk vf_spoofchk;
struct ifla_vf_link_state vf_linkstate;
+   struct ifla_vf_mc_promisc vf_mc_promisc;
 
/*
 * Not all SR-IOV capable drivers support the
@@ -1107,6 +1109,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 * report anything.
 */
ivi.spoofchk = -1;
+   ivi.mc_promisc = -1;
memset(ivi.mac, 0, sizeof(ivi.mac));
/* The default value for VF link state is "auto"
 * IFLA_VF_LINK_STATE_AUTO which equals zero
@@ -1119,7 +1122,8 @@ static int rtnl_f

[PATCH 2/2] ixgbe: Add new ndo to control VF multicast promiscuous mode

2015-01-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Implement the new netdev op to turn VF multicast promiscuous mode on or off.

When VF multicast promiscuous mode is enabled, all multicast packets are
delivered to the VF.

After enabling multicast promiscuous mode from the host, more than 30
IPv6 addresses can be used in the VM.
 # ./ip link set dev eth0 vf 1 mc_promisc on

After disabling multicast promiscuous mode, only 30 IPv6 addresses can be used.
 # ./ip link set dev eth0 vf 1 mc_promisc off

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h   |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  7 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 34 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |  1 +
 4 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index b6137be..1975570 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -149,6 +149,7 @@ struct vf_data_storage {
u16 tx_rate;
u16 vlan_count;
u8 spoofchk_enabled;
+   u8 mc_promisc_enabled;
unsigned int vf_api;
 };
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2ed2c7d..6fb1753 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3569,6 +3569,12 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
if (!adapter->vfinfo[i].spoofchk_enabled)
ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i, false);
}
+
+   /* Reconfigure multicast promiscuous mode */
+   for (i = 0; i < adapter->num_vfs; i++) {
+   ixgbe_ndo_set_vf_mc_promisc(adapter->netdev, i,
+   adapter->vfinfo[i].mc_promisc_enabled);
+   }
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -7955,6 +7961,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_set_vf_vlan= ixgbe_ndo_set_vf_vlan,
.ndo_set_vf_rate= ixgbe_ndo_set_vf_bw,
.ndo_set_vf_spoofchk= ixgbe_ndo_set_vf_spoofchk,
+   .ndo_set_vf_mc_promisc  = ixgbe_ndo_set_vf_mc_promisc,
.ndo_get_vf_config  = ixgbe_ndo_get_vf_config,
.ndo_get_stats64= ixgbe_get_stats64,
 #ifdef CONFIG_IXGBE_DCB
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index c76ba90..3e83f03 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -108,9 +108,12 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 IXGBE_FLAG2_RSC_ENABLED);
 
-   /* enable spoof checking for all VFs */
-   for (i = 0; i < adapter->num_vfs; i++)
+   for (i = 0; i < adapter->num_vfs; i++) {
+   /* enable spoof checking for all VFs */
adapter->vfinfo[i].spoofchk_enabled = true;
+   /* disable multicast promiscuous for all VFs */
+   adapter->vfinfo[i].mc_promisc_enabled = false;
+   }
return 0;
}
 
@@ -1330,6 +1333,31 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
return 0;
 }
 
+int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf, bool setting)
+{
+   struct ixgbe_adapter *adapter = netdev_priv(netdev);
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 vmolr;
+
+   if (vf >= adapter->num_vfs)
+   return -EINVAL;
+
+   adapter->vfinfo[vf].mc_promisc_enabled = setting;
+
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+   if (setting) {
+   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
+   vmolr |= IXGBE_VMOLR_MPE;
+   } else {
+   e_info(drv, "VF %u: disabling multicast promiscuous\n", vf);
+   vmolr &= ~IXGBE_VMOLR_MPE;
+   }
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+   return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
int vf, struct ifla_vf_info *ivi)
 {
@@ -1343,5 +1371,7 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
ivi->vlan = adapter->vfinfo[vf].pf_vlan;
ivi->qos = adapter->vfinfo[vf].pf_qos;
ivi->spoofchk = adapter->vfinfo[vf].spoofchk_enabled;
+   ivi->mc_promisc = adapter->vfinfo[vf].mc_promisc_enabled;
+
return 0;
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h 
b/dr

RE: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2014-12-21 Thread Hiroshi Shimamoto
> > > Subject: Re: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to 
> > > enable MC promiscuous mode
> > >
> > > On 11/27/2014 02:39 AM, Hiroshi Shimamoto wrote:
> > > > From: Hiroshi Shimamoto 
> > > >
> > > > The limitation of the number of multicast address for VF is not enough
> > > > for the large scale server with SR-IOV feature.
> > > > IPv6 requires the multicast MAC address for each IP address to handle
> > > > the Neighbor Solicitation message.
> > > > We couldn't assign over 30 IPv6 addresses to a single VF interface.
> > > >
> > > > The easy way to solve this is enabling multicast promiscuous mode.
> > > > It is good to have a functionality to enable multicast promiscuous mode
> > > > for each VF from VF driver.
> > > >
> > > > This patch introduces the new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> > > > enable/disable multicast promiscuous mode in VF. If multicast 
> > > > promiscuous
> > > > mode is enabled the VF can receive all multicast packets.
> > > >
> > > > With this patch, the ixgbevf driver automatically enable multicast
> > > > promiscuous mode when the number of multicast addresses is over than 30
> > > > if possible.
> > > >
> > > > This also bump the API version up to 1.2 to check whether the API,
> > > > IXGBE_VF_SET_MC_PROMISC is available.
> > > >
> > > > Signed-off-by: Hiroshi Shimamoto 
> > > > CC: Choi, Sy Jong 
> > > > Reviewed-by: Hayato Momma 
> > >
> > > This is a REALLY bad idea unless you plan to limit this to privileged VFs.
> > >
> > > I would recommend looking at adding an ndo operation to control this
> > > feature so that it could be disabled by default in the PF and only
> > > enabled on the host side if specifically requested.  Otherwise the

Would it be acceptable to you to introduce ndo_set_vf_mc_promisc so that the
multicast promiscuous mode of a VF is controlled from the host?
If that's okay, I'm happy to post a new patch.

We need to be able to use thousands of IPv6 addresses on a VF.
I think turning multicast promiscuous mode on is the easiest way to do that.

thanks,
Hiroshi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2014-12-15 Thread Hiroshi Shimamoto
> > Subject: Re: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to 
> > enable MC promiscuous mode
> >
> > On 11/27/2014 02:39 AM, Hiroshi Shimamoto wrote:
> > > From: Hiroshi Shimamoto 
> > >
> > > The limitation of the number of multicast address for VF is not enough
> > > for the large scale server with SR-IOV feature.
> > > IPv6 requires the multicast MAC address for each IP address to handle
> > > the Neighbor Solicitation message.
> > > We couldn't assign over 30 IPv6 addresses to a single VF interface.
> > >
> > > The easy way to solve this is enabling multicast promiscuous mode.
> > > It is good to have a functionality to enable multicast promiscuous mode
> > > for each VF from VF driver.
> > >
> > > This patch introduces the new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> > > enable/disable multicast promiscuous mode in VF. If multicast promiscuous
> > > mode is enabled the VF can receive all multicast packets.
> > >
> > > With this patch, the ixgbevf driver automatically enable multicast
> > > promiscuous mode when the number of multicast addresses is over than 30
> > > if possible.
> > >
> > > This also bump the API version up to 1.2 to check whether the API,
> > > IXGBE_VF_SET_MC_PROMISC is available.
> > >
> > > Signed-off-by: Hiroshi Shimamoto 
> > > CC: Choi, Sy Jong 
> > > Reviewed-by: Hayato Momma 
> >
> > This is a REALLY bad idea unless you plan to limit this to privileged VFs.
> >
> > I would recommend looking at adding an ndo operation to control this
> > feature so that it could be disabled by default in the PF and only
> > enabled on the host side if specifically requested.  Otherwise the
> 
> Do you mean that PF driver should have the flag to enable or disable per VF
> and disallow the request from VF?

Could you please answer this question?

> 
> > problem is I can easily see this leading security issues as the VFs
> > might begin getting access to messages that they aren't supposed to.
> 
> OK, by the way, I think that the current ixgbe and ixgbevf implementation
> has already such issue. The guest can add hash entry to receive MAC and it
> can get every multicast MAC frame with the current mbox API.
> Does your concern come from the easiness of doing that?

There is a single MTA per PF, not one per VF.
When a VF asks the PF to register the hash of an MC MAC, the PF sets a bit in
the MTA and sets the VF's IXGBE_VMOLR_ROMPE flag, which enables switching
packets to the VF whenever the MC MAC hits a hash entry in the MTA.
If VM1 has VF1, which uses MC MAC1, and VM2 has VF2, which uses MC MAC2, both
VM1 and VM2 will receive MC MAC1. VM2 doesn't know why it receives MAC1.
In other words, in the current implementation, a VF receives all multicast
packets that have been registered by other VFs.
For that reason, I had not imagined that enabling MC promiscuous mode would
increase the number of MC messages a VF receives that it isn't supposed to.
I think that this patch doesn't change that behavior.
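
The behavior described above can be modelled in a few lines. This is only a sketch under stated assumptions, not the driver code: the table size, the hash over the last two MAC bytes, and the names pf_model, mta_hash(), vf_add_mc() and vf_receives() are all illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTA_BITS 4096	/* assumed size of the shared hash table */
#define MAX_VFS  8	/* illustrative only */

struct pf_model {
	uint32_t mta[MTA_BITS / 32];	/* one table for the whole PF */
	bool rompe[MAX_VFS];		/* per-VF "accept MTA hits" flag */
};

/* Simplified 12-bit hash over the last two MAC bytes (assumption). */
static unsigned int mta_hash(const uint8_t mac[6])
{
	return ((mac[4] >> 4) | ((unsigned int)mac[5] << 4)) & 0xFFF;
}

/* A VF asks the PF to register a multicast MAC. */
static void vf_add_mc(struct pf_model *pf, int vf, const uint8_t mac[6])
{
	unsigned int h = mta_hash(mac);

	pf->mta[h / 32] |= 1u << (h % 32);
	pf->rompe[vf] = true;
}

/* Would a frame with this destination MAC be switched to the VF? */
static bool vf_receives(const struct pf_model *pf, int vf, const uint8_t mac[6])
{
	unsigned int h = mta_hash(mac);

	return pf->rompe[vf] && (pf->mta[h / 32] & (1u << (h % 32)));
}
```

In this model, once VF1 registers MAC1 and VF2 registers MAC2, vf_receives() is true for VF2 with MAC1 as well, because the table is shared and only the per-VF flag is checked. A VF that never registered anything still receives nothing.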

thanks,
Hiroshi



RE: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2014-12-05 Thread Hiroshi Shimamoto
> Subject: Re: [E1000-devel] [PATCH] ixgbe, ixgbevf: Add new mbox API to enable 
> MC promiscuous mode
> 
> On 11/27/2014 02:39 AM, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > The limitation of the number of multicast address for VF is not enough
> > for the large scale server with SR-IOV feature.
> > IPv6 requires the multicast MAC address for each IP address to handle
> > the Neighbor Solicitation message.
> > We couldn't assign over 30 IPv6 addresses to a single VF interface.
> >
> > The easy way to solve this is enabling multicast promiscuous mode.
> > It is good to have a functionality to enable multicast promiscuous mode
> > for each VF from VF driver.
> >
> > This patch introduces the new mbox API, IXGBE_VF_SET_MC_PROMISC, to
> > enable/disable multicast promiscuous mode in VF. If multicast promiscuous
> > mode is enabled the VF can receive all multicast packets.
> >
> > With this patch, the ixgbevf driver automatically enable multicast
> > promiscuous mode when the number of multicast addresses is over than 30
> > if possible.
> >
> > This also bump the API version up to 1.2 to check whether the API,
> > IXGBE_VF_SET_MC_PROMISC is available.
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > CC: Choi, Sy Jong 
> > Reviewed-by: Hayato Momma 
> 
> This is a REALLY bad idea unless you plan to limit this to privileged VFs.
> 
> I would recommend looking at adding an ndo operation to control this
> feature so that it could be disabled by default in the PF and only
> enabled on the host side if specifically requested.  Otherwise the

Do you mean that the PF driver should have a per-VF flag to enable or disable
this, and disallow the request from the VF?

> problem is I can easily see this leading security issues as the VFs
> might begin getting access to messages that they aren't supposed to.

OK. By the way, I think the current ixgbe and ixgbevf implementation
already has this issue. The guest can add a hash entry for a MAC it wants to
receive, and it can get every multicast MAC frame with the current mbox API.
Is your concern that this patch makes it easier to do that?

thanks,
Hiroshi

> 
> - Alex


[PATCH] ixgbe, ixgbevf: Add new mbox API to enable MC promiscuous mode

2014-11-27 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

The limit on the number of multicast addresses per VF is too low
for large-scale servers that use SR-IOV.
IPv6 requires a multicast MAC address for each IP address in order to
handle Neighbor Solicitation messages.
We couldn't assign more than 30 IPv6 addresses to a single VF interface.
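
To illustrate why each IPv6 address consumes a multicast filter entry: the solicited-node group takes the low 24 bits of the unicast address, and Ethernet maps an IPv6 multicast group to 33:33 followed by the group's last four bytes. A minimal sketch of that mapping; solicited_node_mac() is a hypothetical helper, not part of the patch:

```c
#include <stdint.h>

/*
 * Derive the Ethernet multicast MAC of the solicited-node group
 * (ff02::1:ffXX:XXXX) for an IPv6 unicast address given as 16 bytes
 * in network order.
 */
static void solicited_node_mac(const uint8_t addr[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	mac[2] = 0xff;		/* fixed byte of the solicited-node prefix */
	mac[3] = addr[13];	/* low 24 bits of the unicast address */
	mac[4] = addr[14];
	mac[5] = addr[15];
}
```

Addresses that share their low 24 bits map to the same MAC, so the number of filter entries needed can be somewhat lower than the number of addresses.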

The easy way to solve this is to enable multicast promiscuous mode.
It is useful to be able to enable multicast promiscuous mode
for each VF from the VF driver.

This patch introduces a new mbox API, IXGBE_VF_SET_MC_PROMISC, to
enable/disable multicast promiscuous mode for a VF. If multicast promiscuous
mode is enabled, the VF can receive all multicast packets.

With this patch, the ixgbevf driver automatically enables multicast
promiscuous mode, if possible, when the number of multicast addresses
exceeds 30.

This also bumps the API version to 1.2 so that the driver can check whether
IXGBE_VF_SET_MC_PROMISC is available.

Signed-off-by: Hiroshi Shimamoto 
CC: Choi, Sy Jong 
Reviewed-by: Hayato Momma 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h  |  4 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c| 80 +++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 13 +++-
 drivers/net/ethernet/intel/ixgbevf/mbx.h  |  4 ++
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 29 +++-
 drivers/net/ethernet/intel/ixgbevf/vf.h   |  1 +
 7 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5032a60..2a5e3d3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -144,6 +144,7 @@ struct vf_data_storage {
u16 vlans_enabled;
bool clear_to_send;
bool pf_set_mac;
+   bool vf_mc_promisc;
u16 pf_vlan; /* When set, guest VLAN config not allowed. */
u16 pf_qos;
u16 tx_rate;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
index a5cb755..2963557 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h
@@ -73,6 +73,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_10,  /* API version 1.0, linux/freebsd VF driver */
ixgbe_mbox_api_20,  /* API version 2.0, solaris Phase1 VF driver */
ixgbe_mbox_api_11,  /* API version 1.1, linux/freebsd VF driver */
+   ixgbe_mbox_api_12,  /* API version 1.2, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
 };
@@ -91,6 +92,9 @@ enum ixgbe_pfvf_api_rev {
 /* mailbox API, version 1.1 VF requests */
 #define IXGBE_VF_GET_QUEUES0x09 /* get queue configuration */
 
+/* mailbox API, version 1.2 VF requests */
+#define IXGBE_VF_SET_MC_PROMISC 0x0a /* VF requests PF to set MC promiscuous */
+
 /* GET_QUEUES return data indices within the mailbox */
 #define IXGBE_VF_TX_QUEUES 1   /* number of Tx queues supported */
 #define IXGBE_VF_RX_QUEUES 2   /* number of Rx queues supported */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 01f7081..427993c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -108,6 +108,10 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter)
adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 IXGBE_FLAG2_RSC_ENABLED);
 
+   /* Disable multicast promiscuous mode of each VF at first */
+   for (i = 0; i < adapter->num_vfs; i++)
+   adapter->vfinfo[i].vf_mc_promisc = false;
+
/* enable spoof checking for all VFs */
for (i = 0; i < adapter->num_vfs; i++)
adapter->vfinfo[i].spoofchk_enabled = true;
@@ -310,6 +314,46 @@ int ixgbe_pci_sriov_configure(struct pci_dev *dev, int num_vfs)
return ixgbe_pci_sriov_enable(dev, num_vfs);
 }
 
+static int ixgbe_enable_vf_mc_promisc(struct ixgbe_adapter * adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   if (adapter->vfinfo[vf].vf_mc_promisc)
+   return 0;
+
+   hw = &adapter->hw;
+   vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
+
+   e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
+
+   vmolr |= IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
+
+   return 0;
+}
+
+static int ixgbe_disable_vf_mc_promisc(struct ixgbe_adapter * adapter, u32 vf)
+{
+   struct ixgbe_hw *hw;
+   u32 vmolr;
+
+   if (!adapter->vfinfo[vf].vf_mc_promisc)
+   return 0;
+
+   hw = &adapt

RE: [PATCH] ixgbe: make VLAN filter conditional in SR-IOV case

2014-11-21 Thread Hiroshi Shimamoto
> Subject: Re: [PATCH] ixgbe: make VLAN filter conditional in SR-IOV case
> 
> On Thu, 2014-11-13 at 08:28 +, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > Disable hardware VLAN filtering if netdev->features VLAN flag is dropped.
> >
> > In SR-IOV case, there is a use case which needs to disable VLAN filter.
> > For example, we need to make a network function with VF in virtualized
> > environment. That network function may be a software switch, a router
> > or etc. It means that that network function will be an end point which
> > terminates many VLANs.
> >
> > In the current implementation, VLAN filtering always be turned on and
> > VF can receive only 63 VLANs. It means that only 63 VLANs can be used
> > and it's not enough at all for building a virtual router.
> >
> > With this patch, if the user turns VLAN filtering off on the host, VF
> > can receive every VLAN packet.
> > The behavior is changed only if VLAN filtering is turned off by ethtool.
> [...]
> 
> What happens when VLAN filtering is turned back on and a VF uses too
> many VLANs?  It seems like that should either be prevented (you can't
> turn it back on) or the driver should log a message saying the VF is now
> broken.

That's reasonable.
I will submit an additional patch to take care of that.

thanks,
Hiroshi

> 
> Ben.
> 
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth

[PATCH] ixgbe: make VLAN filter conditional in SR-IOV case

2014-11-13 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto 

Disable hardware VLAN filtering if the VLAN filter flag in netdev->features
is dropped.

With SR-IOV, there are use cases that need the VLAN filter disabled.
For example, we may need to build a network function on top of a VF in a
virtualized environment. That network function may be a software switch, a
router, and so on. This means the network function will be an endpoint that
terminates many VLANs.

In the current implementation, VLAN filtering is always turned on and a
VF can receive only 63 VLANs. This means that only 63 VLANs can be used,
which is not nearly enough for building a virtual router.

With this patch, if the user turns VLAN filtering off on the host, a VF
can receive every VLAN packet.
The behavior changes only if VLAN filtering is turned off via ethtool.

Signed-off-by: Hiroshi Shimamoto 
CC: Choi, Sy Jong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 10 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |  4 
 2 files changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d2df4e3..91ce3a8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3948,6 +3948,12 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
hw->addr_ctrl.user_set_promisc = false;
}
 
+   /* Disable hardware VLAN filter if the feature flag is dropped */
+   if (!(netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
+   dev_info(&adapter->pdev->dev, "Disable HW VLAN filter\n");
+   vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
+   }
+
/*
 * Write addresses to available RAR registers, if there is not
 * sufficient space to store all the addresses then enable
@@ -7634,6 +7640,10 @@ static int ixgbe_set_features(struct net_device *netdev,
else
ixgbe_vlan_strip_disable(adapter);
 
+   /* reset if HW VLAN filter is changed */
+   if (changed & NETIF_F_HW_VLAN_CTAG_FILTER)
+   need_reset = true;
+
if (changed & NETIF_F_RXALL)
need_reset = true;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 13916d8..5508d8a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -776,6 +776,10 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
u32 bits;
u8 tcs = netdev_get_num_tc(adapter->netdev);
 
+   /* Ignore if VLAN filter is disabled */
+   if (!(adapter->netdev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+   return 0;
+
if (adapter->vfinfo[vf].pf_vlan || tcs) {
e_warn(drv,
   "VF %d attempted to override administratively set VLAN 
configuration\n"
-- 
1.9.0



[PATCH] latencytop: change /proc task_struct access method

2008-02-20 Thread Hiroshi Shimamoto
Hi Ingo,

Here is a delta patch against the sched-devel.git tree.

These two patches in the sched-devel.git tree fix the issues:
latencytop: fix kernel panic while reading latency proc file
latencytop: fix memory leak on latency proc file

However, I think this is a more appropriate way to fix them.

---
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Get the task_struct via get_proc_task() at read or write time,
and return -ESRCH if get_proc_task() returns NULL.
This is the same behavior as other /proc files.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |   40 
 1 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 64661c3..bebf9a8 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -314,9 +314,12 @@ static int proc_pid_schedstat(struct task_struct *task, char *buffer)
 static int lstats_show_proc(struct seq_file *m, void *v)
 {
int i;
-   struct task_struct *task = m->private;
-   seq_puts(m, "Latency Top version : v0.1\n");
+   struct inode *inode = m->private;
+   struct task_struct *task = get_proc_task(inode);
 
+   if (!task)
+   return -ESRCH;
+   seq_puts(m, "Latency Top version : v0.1\n");
for (i = 0; i < 32; i++) {
if (task->latency_record[i].backtrace[0]) {
int q;
@@ -341,43 +344,24 @@ static int lstats_show_proc(struct seq_file *m, void *v)
}
 
}
+   put_task_struct(task);
return 0;
 }
 
 static int lstats_open(struct inode *inode, struct file *file)
 {
-   int ret;
-   struct seq_file *m;
-   struct task_struct *task = get_proc_task(inode);
-
-   if (!task)
-   return -ENOENT;
-   ret = single_open(file, lstats_show_proc, NULL);
-   if (!ret) {
-   m = file->private_data;
-   m->private = task;
-   }
-   return ret;
-}
-
-static int lstats_release(struct inode *inode, struct file *file)
-{
-   struct seq_file *m = file->private_data;
-   struct task_struct *task = m->private;
-
-   put_task_struct(task);
-   return single_release(inode, file);
+   return single_open(file, lstats_show_proc, inode);
 }
 
 static ssize_t lstats_write(struct file *file, const char __user *buf,
size_t count, loff_t *offs)
 {
-   struct seq_file *m;
-   struct task_struct *task;
+   struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
 
-   m = file->private_data;
-   task = m->private;
+   if (!task)
+   return -ESRCH;
clear_all_latency_tracing(task);
+   put_task_struct(task);
 
return count;
 }
@@ -387,7 +371,7 @@ static const struct file_operations proc_lstats_operations = {
.read   = seq_read,
.write  = lstats_write,
.llseek = seq_lseek,
-   .release= lstats_release,
+   .release= single_release,
 };
 
 #endif
-- 
1.5.3.8



[PATCH 5/5] x86: unify cpu/proc|_64.c

2008-02-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

cpu/proc.c and cpu/proc_64.c are now identical,
so cpu/proc_64.c can be removed.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/Makefile  |5 +-
 arch/x86/kernel/cpu/proc_64.c |  180 -
 2 files changed, 2 insertions(+), 183 deletions(-)
 delete mode 100644 arch/x86/kernel/cpu/proc_64.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 8ba7d28..ee7c452 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -3,9 +3,9 @@
 #
 
 obj-y  := intel_cacheinfo.o addon_cpuid_features.o
-obj-y  += feature_names.o
+obj-y  += proc.o feature_names.o
 
-obj-$(CONFIG_X86_32)   += common.o proc.o bugs.o
+obj-$(CONFIG_X86_32)   += common.o bugs.o
 obj-$(CONFIG_X86_32)   += amd.o
 obj-$(CONFIG_X86_32)   += cyrix.o
 obj-$(CONFIG_X86_32)   += centaur.o
@@ -13,7 +13,6 @@ obj-$(CONFIG_X86_32)  += transmeta.o
 obj-$(CONFIG_X86_32)   += intel.o
 obj-$(CONFIG_X86_32)   += nexgen.o
 obj-$(CONFIG_X86_32)   += umc.o
-obj-$(CONFIG_X86_64)   += proc_64.o
 
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/proc_64.c b/arch/x86/kernel/cpu/proc_64.c
deleted file mode 100644
index 15043a3..000
--- a/arch/x86/kernel/cpu/proc_64.c
+++ /dev/null
@@ -1,180 +0,0 @@
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-/*
- * Get CPU information for use by the procfs.
- */
-#ifdef CONFIG_X86_32
-static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
- unsigned int cpu)
-{
-#ifdef CONFIG_X86_HT
-   if (c->x86_max_cores * smp_num_siblings > 1) {
-   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
-   seq_printf(m, "siblings\t: %d\n",
-  cpus_weight(per_cpu(cpu_core_map, cpu)));
-   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
-   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
-   }
-#endif
-}
-
-static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
-{
-   /*
-* We use exception 16 if we have hardware math and we've either seen
-* it or the CPU claims it is internal
-*/
-   int fpu_exception = c->hard_math && (ignore_fpu_irq || cpu_has_fpu);
-   seq_printf(m,
-  "fdiv_bug\t: %s\n"
-  "hlt_bug\t\t: %s\n"
-  "f00f_bug\t: %s\n"
-  "coma_bug\t: %s\n"
-  "fpu\t\t: %s\n"
-  "fpu_exception\t: %s\n"
-  "cpuid level\t: %d\n"
-  "wp\t\t: %s\n",
-  c->fdiv_bug ? "yes" : "no",
-  c->hlt_works_ok ? "no" : "yes",
-  c->f00f_bug ? "yes" : "no",
-  c->coma_bug ? "yes" : "no",
-  c->hard_math ? "yes" : "no",
-  fpu_exception ? "yes" : "no",
-  c->cpuid_level,
-  c->wp_works_ok ? "yes" : "no");
-}
-#else
-static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
- unsigned int cpu)
-{
-#ifdef CONFIG_SMP
-   if (c->x86_max_cores * smp_num_siblings > 1) {
-   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
-   seq_printf(m, "siblings\t: %d\n",
-  cpus_weight(per_cpu(cpu_core_map, cpu)));
-   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
-   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
-   }
-#endif
-}
-
-static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
-{
-   seq_printf(m,
-  "fpu\t\t: yes\n"
-  "fpu_exception\t: yes\n"
-  "cpuid level\t: %d\n"
-  "wp\t\t: yes\n",
-  c->cpuid_level);
-}
-#endif
-
-static int show_cpuinfo(struct seq_file *m, void *v)
-{
-   struct cpuinfo_x86 *c = v;
-   unsigned int cpu = 0;
-   int i;
-
-#ifdef CONFIG_SMP
-   cpu = c->cpu_index;
-#endif
-   seq_printf(m, "processor\t: %u\n"
-  "vendor_id\t: %s\n"
-  "cpu family\t: %d\n"
-  "model\t\t: %u\n"
-  "model name\t: %s\n",
-  cpu,
-  c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
-  c->x86,

[PATCH 4/5] x86: cosmetic unification cpu/proc|_64.c

2008-02-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Make cpu/proc.c and cpu/proc_64.c identical.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/proc.c|   36 ++
 arch/x86/kernel/cpu/proc_64.c |   49 +++-
 2 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index fd3823a..15043a3 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -8,6 +8,7 @@
 /*
  * Get CPU information for use by the procfs.
  */
+#ifdef CONFIG_X86_32
 static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
  unsigned int cpu)
 {
@@ -47,6 +48,31 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
   c->cpuid_level,
   c->wp_works_ok ? "yes" : "no");
 }
+#else
+static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
+ unsigned int cpu)
+{
+#ifdef CONFIG_SMP
+   if (c->x86_max_cores * smp_num_siblings > 1) {
+   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+   seq_printf(m, "siblings\t: %d\n",
+  cpus_weight(per_cpu(cpu_core_map, cpu)));
+   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
+   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
+   }
+#endif
+}
+
+static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
+{
+   seq_printf(m,
+  "fpu\t\t: yes\n"
+  "fpu_exception\t: yes\n"
+  "cpuid level\t: %d\n"
+  "wp\t\t: yes\n",
+  c->cpuid_level);
+}
+#endif
 
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
@@ -97,7 +123,17 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
   c->loops_per_jiffy/(500000/HZ),
   (c->loops_per_jiffy/(5000/HZ)) % 100);
+
+#ifdef CONFIG_X86_64
+   if (c->x86_tlbsize > 0)
+   seq_printf(m, "TLB size\t: %d 4K pages\n", c->x86_tlbsize);
+#endif
seq_printf(m, "clflush size\t: %u\n", c->x86_clflush_size);
+#ifdef CONFIG_X86_64
+   seq_printf(m, "cache_alignment\t: %d\n", c->x86_cache_alignment);
+   seq_printf(m, "address sizes\t: %u bits physical, %u bits virtual\n",
+  c->x86_phys_bits, c->x86_virt_bits);
+#endif
 
seq_printf(m, "power management:");
for (i = 0; i < 32; i++) {
diff --git a/arch/x86/kernel/cpu/proc_64.c b/arch/x86/kernel/cpu/proc_64.c
index ce1b08f..15043a3 100644
--- a/arch/x86/kernel/cpu/proc_64.c
+++ b/arch/x86/kernel/cpu/proc_64.c
@@ -8,6 +8,47 @@
 /*
  * Get CPU information for use by the procfs.
  */
+#ifdef CONFIG_X86_32
+static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
+ unsigned int cpu)
+{
+#ifdef CONFIG_X86_HT
+   if (c->x86_max_cores * smp_num_siblings > 1) {
+   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+   seq_printf(m, "siblings\t: %d\n",
+  cpus_weight(per_cpu(cpu_core_map, cpu)));
+   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
+   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
+   }
+#endif
+}
+
+static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
+{
+   /*
+* We use exception 16 if we have hardware math and we've either seen
+* it or the CPU claims it is internal
+*/
+   int fpu_exception = c->hard_math && (ignore_fpu_irq || cpu_has_fpu);
+   seq_printf(m,
+  "fdiv_bug\t: %s\n"
+  "hlt_bug\t\t: %s\n"
+  "f00f_bug\t: %s\n"
+  "coma_bug\t: %s\n"
+  "fpu\t\t: %s\n"
+  "fpu_exception\t: %s\n"
+  "cpuid level\t: %d\n"
+  "wp\t\t: %s\n",
+  c->fdiv_bug ? "yes" : "no",
+  c->hlt_works_ok ? "no" : "yes",
+  c->f00f_bug ? "yes" : "no",
+  c->coma_bug ? "yes" : "no",
+  c->hard_math ? "yes" : "no",
+  fpu_exception ? "yes" : "no",
+  c->cpuid_level,
+  c->wp_works_ok ? "yes" : "no");
+}
+#else
 static void 

[PATCH 3/5] x86_32: add power management line in /proc/cpuinfo

2008-02-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Change the x86_32 /proc/cpuinfo output so that it looks like x86_64's.
A 'power management' line is added, and the power management information
is printed on that line.
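
As a rough illustration of how such a line is built from a bitmask, here is
a minimal userspace sketch. It mirrors the loop in the patch, but
demo_power_flags, DEMO_ARRAY_SIZE and demo_format_power() are hypothetical
names made up for this example, and the flag table is not the kernel's
x86_power_flags table.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical flag names for illustration only. An empty string marks a
 * bit that has no printable name; bits past the end of the table are
 * printed as "[n]".
 */
static const char *const demo_power_flags[] = { "ts", "fid", "vid", "" };

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Format the "power management:" line for the given bitmask into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int demo_format_power(char *buf, size_t len, unsigned int power)
{
	size_t off;
	unsigned int i;
	int n;

	n = snprintf(buf, len, "power management:");
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (i = 0; i < 32; i++) {
		if (!(power & (1U << i)))
			continue;
		if (i < DEMO_ARRAY_SIZE(demo_power_flags) && demo_power_flags[i])
			/* Named bit: print " name", or nothing for "". */
			n = snprintf(buf + off, len - off, "%s%s",
				     demo_power_flags[i][0] ? " " : "",
				     demo_power_flags[i]);
		else
			/* Unnamed bit: fall back to its index. */
			n = snprintf(buf + off, len - off, " [%u]", i);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return 0;
}
```

With this demo table, a mask with bits 0, 2, 3 and 5 set would give the
names of bits 0 and 2, nothing for bit 3, and "[5]" for bit 5.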

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/proc.c |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 9bc3b04..fd3823a 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -94,7 +94,13 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
seq_printf(m, " %s", x86_cap_flags[i]);
 
-   for (i = 0; i < 32; i++)
+   seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
+  c->loops_per_jiffy/(500000/HZ),
+  (c->loops_per_jiffy/(5000/HZ)) % 100);
+   seq_printf(m, "clflush size\t: %u\n", c->x86_clflush_size);
+
+   seq_printf(m, "power management:");
+   for (i = 0; i < 32; i++) {
if (c->x86_power & (1 << i)) {
if (i < ARRAY_SIZE(x86_power_flags) &&
x86_power_flags[i])
@@ -104,11 +110,9 @@ static int show_cpuinfo(struct seq_file *m, void *v)
else
seq_printf(m, " [%d]", i);
}
+   }
 
-   seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
-  c->loops_per_jiffy/(500000/HZ),
-  (c->loops_per_jiffy/(5000/HZ)) % 100);
-   seq_printf(m, "clflush size\t: %u\n\n", c->x86_clflush_size);
+   seq_printf(m, "\n\n");
 
return 0;
 }
-- 
1.5.3.8


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/5] x86: make cpu/proc|_64.c similar

2008-02-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Clean up cpu/proc.c and cpu/proc_64.c in preparation for unification.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/proc.c|  120 +++-
 arch/x86/kernel/cpu/proc_64.c |   63 -
 2 files changed, 105 insertions(+), 78 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index af11d31..9bc3b04 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -8,75 +8,90 @@
 /*
  * Get CPU information for use by the procfs.
  */
+static void show_cpuinfo_core(struct seq_file *m, struct cpuinfo_x86 *c,
+ unsigned int cpu)
+{
+#ifdef CONFIG_X86_HT
+   if (c->x86_max_cores * smp_num_siblings > 1) {
+   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+   seq_printf(m, "siblings\t: %d\n",
+  cpus_weight(per_cpu(cpu_core_map, cpu)));
+   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
+   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
+   }
+#endif
+}
+
+static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
+{
+   /*
+* We use exception 16 if we have hardware math and we've either seen
+* it or the CPU claims it is internal
+*/
+   int fpu_exception = c->hard_math && (ignore_fpu_irq || cpu_has_fpu);
+   seq_printf(m,
+  "fdiv_bug\t: %s\n"
+  "hlt_bug\t\t: %s\n"
+  "f00f_bug\t: %s\n"
+  "coma_bug\t: %s\n"
+  "fpu\t\t: %s\n"
+  "fpu_exception\t: %s\n"
+  "cpuid level\t: %d\n"
+  "wp\t\t: %s\n",
+  c->fdiv_bug ? "yes" : "no",
+  c->hlt_works_ok ? "no" : "yes",
+  c->f00f_bug ? "yes" : "no",
+  c->coma_bug ? "yes" : "no",
+  c->hard_math ? "yes" : "no",
+  fpu_exception ? "yes" : "no",
+  c->cpuid_level,
+  c->wp_works_ok ? "yes" : "no");
+}
+
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
struct cpuinfo_x86 *c = v;
-   int i, n = 0;
-   int fpu_exception;
+   unsigned int cpu = 0;
+   int i;
 
 #ifdef CONFIG_SMP
-   n = c->cpu_index;
+   cpu = c->cpu_index;
 #endif
-   seq_printf(m, "processor\t: %d\n"
-   "vendor_id\t: %s\n"
-   "cpu family\t: %d\n"
-   "model\t\t: %d\n"
-   "model name\t: %s\n",
-   n,
-   c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
-   c->x86,
-   c->x86_model,
-   c->x86_model_id[0] ? c->x86_model_id : "unknown");
+   seq_printf(m, "processor\t: %u\n"
+  "vendor_id\t: %s\n"
+  "cpu family\t: %d\n"
+  "model\t\t: %u\n"
+  "model name\t: %s\n",
+  cpu,
+  c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
+  c->x86,
+  c->x86_model,
+  c->x86_model_id[0] ? c->x86_model_id : "unknown");
 
if (c->x86_mask || c->cpuid_level >= 0)
seq_printf(m, "stepping\t: %d\n", c->x86_mask);
else
seq_printf(m, "stepping\t: unknown\n");
 
-   if ( cpu_has(c, X86_FEATURE_TSC) ) {
-   unsigned int freq = cpufreq_quick_get(n);
+   if (cpu_has(c, X86_FEATURE_TSC)) {
+   unsigned int freq = cpufreq_quick_get(cpu);
+
if (!freq)
freq = cpu_khz;
seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
-   freq / 1000, (freq % 1000));
+  freq / 1000, (freq % 1000));
}
 
/* Cache size */
if (c->x86_cache_size >= 0)
seq_printf(m, "cache size\t: %d KB\n", c->x86_cache_size);
-#ifdef CONFIG_X86_HT
-   if (c->x86_max_cores * smp_num_siblings > 1) {
-   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
-   seq_printf(m, "siblings\t: %d\n",
-   cpus_weight(per_cpu(cpu_core_map, n)));
-   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
-   seq_printf(m, "cpu cores\t: %d\n"

[PATCH 1/5] x86_64: split cpuinfo from setup_64.c into cpu/proc_64.c

2008-02-20 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

x86 /proc/cpuinfo code can be unified.
This is the first step of unification.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/Makefile  |1 +
 arch/x86/kernel/cpu/proc_64.c |  126 +
 arch/x86/kernel/setup_64.c|  120 ---
 3 files changed, 127 insertions(+), 120 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/proc_64.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a0c4d7c..8ba7d28 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_X86_32)  += transmeta.o
 obj-$(CONFIG_X86_32)   += intel.o
 obj-$(CONFIG_X86_32)   += nexgen.o
 obj-$(CONFIG_X86_32)   += umc.o
+obj-$(CONFIG_X86_64)   += proc_64.o
 
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/proc_64.c b/arch/x86/kernel/cpu/proc_64.c
new file mode 100644
index 0000000..bf4a94b
--- /dev/null
+++ b/arch/x86/kernel/cpu/proc_64.c
@@ -0,0 +1,126 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Get CPU information for use by the procfs.
+ */
+
+static int show_cpuinfo(struct seq_file *m, void *v)
+{
+   struct cpuinfo_x86 *c = v;
+   int cpu = 0, i;
+
+#ifdef CONFIG_SMP
+   cpu = c->cpu_index;
+#endif
+
+   seq_printf(m, "processor\t: %u\n"
+  "vendor_id\t: %s\n"
+  "cpu family\t: %d\n"
+  "model\t\t: %d\n"
+  "model name\t: %s\n",
+  (unsigned)cpu,
+  c->x86_vendor_id[0] ? c->x86_vendor_id : "unknown",
+  c->x86,
+  (int)c->x86_model,
+  c->x86_model_id[0] ? c->x86_model_id : "unknown");
+
+   if (c->x86_mask || c->cpuid_level >= 0)
+   seq_printf(m, "stepping\t: %d\n", c->x86_mask);
+   else
+   seq_printf(m, "stepping\t: unknown\n");
+
+   if (cpu_has(c, X86_FEATURE_TSC)) {
+   unsigned int freq = cpufreq_quick_get((unsigned)cpu);
+
+   if (!freq)
+   freq = cpu_khz;
+   seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
+  freq / 1000, (freq % 1000));
+   }
+
+   /* Cache size */
+   if (c->x86_cache_size >= 0)
+   seq_printf(m, "cache size\t: %d KB\n", c->x86_cache_size);
+
+#ifdef CONFIG_SMP
+   if (smp_num_siblings * c->x86_max_cores > 1) {
+   seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+   seq_printf(m, "siblings\t: %d\n",
+  cpus_weight(per_cpu(cpu_core_map, cpu)));
+   seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
+   seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
+   }
+#endif
+
+   seq_printf(m,
+  "fpu\t\t: yes\n"
+  "fpu_exception\t: yes\n"
+  "cpuid level\t: %d\n"
+  "wp\t\t: yes\n"
+  "flags\t\t:",
+  c->cpuid_level);
+
+   for (i = 0; i < 32*NCAPINTS; i++)
+   if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
+   seq_printf(m, " %s", x86_cap_flags[i]);
+
+   seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
+  c->loops_per_jiffy/(500000/HZ),
+  (c->loops_per_jiffy/(5000/HZ)) % 100);
+
+   if (c->x86_tlbsize > 0)
+   seq_printf(m, "TLB size\t: %d 4K pages\n", c->x86_tlbsize);
+   seq_printf(m, "clflush size\t: %d\n", c->x86_clflush_size);
+   seq_printf(m, "cache_alignment\t: %d\n", c->x86_cache_alignment);
+
+   seq_printf(m, "address sizes\t: %u bits physical, %u bits virtual\n",
+  c->x86_phys_bits, c->x86_virt_bits);
+
+   seq_printf(m, "power management:");
+   for (i = 0; i < 32; i++) {
+   if (c->x86_power & (1 << i)) {
+   if (i < ARRAY_SIZE(x86_power_flags) &&
+   x86_power_flags[i])
+   seq_printf(m, "%s%s",
+  x86_power_flags[i][0]?" ":"",
+  x86_power_flags[i]);
+   else
+   seq_printf(m, " [%d]", i);
+   }
+   }
+
+   seq_printf(m, "\n\n");
+
+   return 0;
+}
+
+static void *c_start(struct seq_file *m, loff_t *pos)

Re: [PATCH] latencytop: fix kernel panic and memory leak on proc

2008-02-19 Thread Hiroshi Shimamoto
Ingo Molnar wrote:
> * Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
>> On Thu, 14 Feb 2008 14:51:19 -0800
>> Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I posted 2 patches to fix kernel panic and memory leak.
>>> http://lkml.org/lkml/2008/2/14/282
>>> http://lkml.org/lkml/2008/2/14/283
>>>
>>> But, I think this patch is better than old ones.
> 
> thanks Hiroshi, applied.

Hi Ingo,

I'd like a new patch to be applied for the latencytop fix.
I think it is better than the old patches.

Can you apply this patch for the latencytop issues?
http://lkml.org/lkml/2008/2/14/451

It's a replacement for the old patches you applied.
It makes the latency file behave the same as other proc files:
get_proc_task() and put_task_struct() are called at read time,
and the read returns -ESRCH if get_proc_task() fails.

If there is any problem, please let me know.

thanks,
Hiroshi Shimamoto


[RFC v3 PATCH] RTTIME watchdog timer proc interface

2008-02-15 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Introduce a new proc interface for the RTTIME watchdog.
It lets an administrator set the RTTIME watchdog on existing real-time
applications without modifying them. This is useful when we don't want to
change the software stack but still want to use the RTTIME watchdog for
that software.

New proc files:
 /proc//rttime
 /proc//task//rttime
These files have the same content.

$ cat /proc//rttime
1000 2000
It shows the current and max RLIMIT_RTTIME values; the unit is nsec.
If a value is RLIM_INFINITY, it prints "unlimited".

$ echo "1000" > /proc//rttime
It sets the RTTIME current value to 1000.

$ echo "1000 2000" > /proc//rttime
It sets the RTTIME current value to 1000 and the max value to 2000.

$ echo "0 0" > /proc//rttime
It sets the RTTIME values to unlimited.
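
The write format described above can be sketched in userspace roughly as
follows. This is only an illustration under stated assumptions, not the
kernel code: rttime_parse(), struct rttime_limits and RTTIME_UNLIMITED are
hypothetical names, strtoul() stands in for the kernel's string helpers, and
the CAP_SYS_RESOURCE check and locking done by the patch are left out.

```c
#include <errno.h>
#include <stdlib.h>

/* Stand-in for RLIM_INFINITY; hypothetical, for illustration only. */
#define RTTIME_UNLIMITED (~0UL)

struct rttime_limits {
	unsigned long cur;
	unsigned long max;
};

/*
 * Parse "CUR" or "CUR MAX" into *lim. A value of 0 means unlimited.
 * When MAX is omitted, lim->max keeps its previous value.
 * Returns 0 on success, -EINVAL on malformed input or if cur > max.
 * On failure *lim is left untouched.
 */
static int rttime_parse(const char *s, struct rttime_limits *lim)
{
	struct rttime_limits tmp = *lim;
	char *end;

	tmp.cur = strtoul(s, &end, 0);
	if (end == s)
		return -EINVAL;
	if (tmp.cur == 0)
		tmp.cur = RTTIME_UNLIMITED;

	if (*end == ' ') {
		const char *p = end + 1;

		tmp.max = strtoul(p, &end, 0);
		if (end == p)
			return -EINVAL;
		if (tmp.max == 0)
			tmp.max = RTTIME_UNLIMITED;
	}

	/* Allow the trailing newline that echo appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (tmp.cur > tmp.max)
		return -EINVAL;

	*lim = tmp;
	return 0;
}
```

Parsing into a temporary copy and committing only at the end keeps the old
limits intact when the input is rejected, which matches how the patch builds
new_rlim before assigning it.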

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |  103 
 1 files changed, 103 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 88f8edf..34b485e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -381,6 +381,107 @@ static const struct file_operations proc_lstats_operations = {
 
 #endif
 
+static int rttime_show_proc(struct seq_file *m, void *v)
+{
+   struct inode *inode = m->private;
+   struct task_struct *task = get_proc_task(inode);
+   struct rlimit *rt;
+
+   if (!task)
+   return -ESRCH;
+
+   rt = &task->signal->rlim[RLIMIT_RTTIME];
+
+   if (rt->rlim_cur == RLIM_INFINITY)
+   seq_printf(m, "unlimited ");
+   else
+   seq_printf(m, "%lu ", rt->rlim_cur);
+
+   if (rt->rlim_max == RLIM_INFINITY)
+   seq_printf(m, "unlimited\n");
+   else
+   seq_printf(m, "%lu\n", rt->rlim_max);
+
+   put_task_struct(task);
+
+   return 0;
+}
+
+static int rttime_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, rttime_show_proc, inode);
+}
+
+static ssize_t rttime_do_write(struct task_struct *task,
+  const char __user *buf,
+  size_t count)
+{
+   char buffer[PROC_NUMBUF], *end;
+   struct rlimit new_rlim, *old_rlim;
+   size_t bufsz;
+   int ret;
+
+   old_rlim = task->signal->rlim + RLIMIT_RTTIME;
+   new_rlim = *old_rlim;
+   memset(buffer, 0, sizeof(buffer));
+   bufsz = min(count, sizeof(buffer) - 1);
+   if (copy_from_user(buffer, buf, bufsz))
+   return -EFAULT;
+   new_rlim.rlim_cur = simple_strtoul(buffer, &end, 0);
+   if (end - buffer == 0)
+   return -EINVAL;
+   /* 0 means unlimited */
+   if (new_rlim.rlim_cur == 0)
+   new_rlim.rlim_cur = RLIM_INFINITY;
+   if (*end == ' ') {
+   ++end;
+   buf += end - buffer;
+   memset(buffer, 0, sizeof(buffer));
+   bufsz = min(count - (end - buffer), sizeof(buffer) - 1);
+   if (copy_from_user(buffer, buf, bufsz))
+   return -EFAULT;
+   ret = strict_strtoul(buffer, 0, &new_rlim.rlim_max);
+   if (ret)
+   return ret;
+   /* 0 means unlimited */
+   if (new_rlim.rlim_max == 0)
+   new_rlim.rlim_max = RLIM_INFINITY;
+   }
+   if (new_rlim.rlim_cur > new_rlim.rlim_max)
+   return -EINVAL;
+   if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
+   !__capable(task, CAP_SYS_RESOURCE))
+   return -EPERM;
+   task_lock(task->group_leader);
+   *old_rlim = new_rlim;
+   task_unlock(task->group_leader);
+
+   return count;
+}
+
+static ssize_t rttime_write(struct file *file,
+   const char __user *buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+   int ret;
+
+   if (!task)
+   return -ESRCH;
+   ret = rttime_do_write(task, buf, count);
+   put_task_struct(task);
+   return ret;
+}
+
+static const struct file_operations proc_rttime_operations = {
+   .open   = rttime_open,
+   .read   = seq_read,
+   .write  = rttime_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 /* The badness from the OOM killer */
 unsigned long badness(struct task_struct *p, unsigned long uptime);
 static int proc_oom_score(struct task_struct *task, char *buffer)
@@ -2293,6 +2394,7 @@ static const struct pid_entry tgid_base_stuff[] = {
LNK("exe",exe),
REG("mounts", S_IRUGO, mounts),
REG("mountstats", S_IRUSR, moun

[PATCH] latencytop: fix kernel panic and memory leak on proc

2008-02-14 Thread Hiroshi Shimamoto
Hi,

I posted 2 patches to fix a kernel panic and a memory leak.
http://lkml.org/lkml/2008/2/14/282
http://lkml.org/lkml/2008/2/14/283

However, I think this patch is better than the old ones.

---
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Reading /proc//latency or /proc//task//latency could cause a
NULL pointer dereference.

In lstats_open(), get_proc_task() can return NULL, in which case the kernel
will oops at lstats_show_proc() because m->private is NULL.

This can be reproduced by the following script.
while :
do
bash -c 'ls > ls.$$' &
pid=$!
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency
done

Also, the task struct obtained by get_proc_task() is never put;
put_task_struct() should be called.

This patch changes m->private to store the inode instead, and the task
struct is now obtained and put within the read or write function.
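
The get-at-use-time pattern can be modelled in userspace along these lines.
This is a toy sketch, not kernel code: toy_task, toy_inode, toy_get_task(),
toy_put_task() and toy_show() are hypothetical stand-ins for the task
struct, the proc inode, get_proc_task(), put_task_struct() and the show
handler.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; all names here are hypothetical. */
struct toy_task {
	int refcount;
	int alive;
};

struct toy_inode {
	struct toy_task *task;
};

/* Like get_proc_task(): returns a referenced task, or NULL if it is gone. */
static struct toy_task *toy_get_task(struct toy_inode *inode)
{
	struct toy_task *t = inode->task;

	if (!t || !t->alive)
		return NULL;
	t->refcount++;
	return t;
}

/* Like put_task_struct(): drops the reference taken above. */
static void toy_put_task(struct toy_task *t)
{
	t->refcount--;
}

/* Show handler: look the task up at read time and always put it again. */
static int toy_show(struct toy_inode *inode)
{
	struct toy_task *t = toy_get_task(inode);

	if (!t)
		return -ESRCH;
	/* ... emit the per-task data here ... */
	toy_put_task(t);
	return 0;
}
```

Because the reference is taken and dropped inside one function, no reference
is held while the file merely stays open, and a task that has already exited
yields -ESRCH instead of a NULL dereference.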

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |   27 +++
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7c6b4ec..5de8dd5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -314,9 +314,12 @@ static int proc_pid_schedstat(struct task_struct *task, char *buffer)
 static int lstats_show_proc(struct seq_file *m, void *v)
 {
int i;
-   struct task_struct *task = m->private;
-   seq_puts(m, "Latency Top version : v0.1\n");
+   struct inode *inode = m->private;
+   struct task_struct *task = get_proc_task(inode);
 
+   if (!task)
+   return -ESRCH;
+   seq_puts(m, "Latency Top version : v0.1\n");
for (i = 0; i < 32; i++) {
if (task->latency_record[i].backtrace[0]) {
int q;
@@ -341,32 +344,24 @@ static int lstats_show_proc(struct seq_file *m, void *v)
}
 
}
+   put_task_struct(task);
return 0;
 }
 
 static int lstats_open(struct inode *inode, struct file *file)
 {
-   int ret;
-   struct seq_file *m;
-   struct task_struct *task = get_proc_task(inode);
-
-   ret = single_open(file, lstats_show_proc, NULL);
-   if (!ret) {
-   m = file->private_data;
-   m->private = task;
-   }
-   return ret;
+   return single_open(file, lstats_show_proc, inode);
 }
 
 static ssize_t lstats_write(struct file *file, const char __user *buf,
size_t count, loff_t *offs)
 {
-   struct seq_file *m;
-   struct task_struct *task;
+   struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
 
-   m = file->private_data;
-   task = m->private;
+   if (!task)
+   return -ESRCH;
clear_all_latency_tracing(task);
+   put_task_struct(task);
 
return count;
 }
-- 
1.5.3.8



[PATCH] latencytop: fix memory leak on latency proc file

2008-02-14 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

In lstats_open(), get_proc_task() takes a reference to the task struct, but
it is never put. put_task_struct() should be called on release.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1710b03..dc651a9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -360,6 +360,15 @@ static int lstats_open(struct inode *inode, struct file *file)
return ret;
 }
 
+static int lstats_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *m = file->private_data;
+   struct task_struct *task = m->private;
+
+   put_task_struct(task);
+   return single_release(inode, file);
+}
+
 static ssize_t lstats_write(struct file *file, const char __user *buf,
size_t count, loff_t *offs)
 {
@@ -378,7 +387,7 @@ static const struct file_operations proc_lstats_operations = {
.read   = seq_read,
.write  = lstats_write,
.llseek = seq_lseek,
-   .release= single_release,
+   .release= lstats_release,
 };
 
 #endif
-- 
1.5.3.8



[PATCH] latencytop: fix kernel panic while reading latency proc file

2008-02-14 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Reading /proc//latency or /proc//task//latency could cause a
NULL pointer dereference.

In lstats_open(), get_proc_task() can return NULL, in which case the kernel
will oops at lstats_show_proc() because m->private is NULL.

When get_proc_task() returns NULL, the kernel should return -ENOENT.

This can be reproduced by the following script.
while :
do
date
bash -c 'ls > ls.$$' &
pid=$!
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency &
cat /proc/$pid/latency
done

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7c6b4ec..1710b03 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -350,6 +350,8 @@ static int lstats_open(struct inode *inode, struct file *file)
struct seq_file *m;
struct task_struct *task = get_proc_task(inode);
 
+   if (!task)
+   return -ENOENT;
ret = single_open(file, lstats_show_proc, NULL);
if (!ret) {
m = file->private_data;
-- 
1.5.3.8



Re: [RFC v2 PATCH] RTTIME watchdog timer proc interface

2008-02-13 Thread Hiroshi Shimamoto
Andrew Morton wrote:
> On Wed, 13 Feb 2008 09:45:54 -0800
> Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>>>> And /proc//task//rttime is also accessible.
>>> Please describe the format in the changelog.
>> I'm sorry I cannot catch your meaning.
> 
> Please include an example of the output of
> `cat /proc//task//rttime' in the changelog so that we
> can see precisely what interface you are proposing.

thanks, I see.



Re: [RFC PATCH] RTTIME watchdog timer proc interface

2008-02-13 Thread Hiroshi Shimamoto
Peter Zijlstra wrote:
> On Tue, 2008-02-12 at 14:21 -0800, Hiroshi Shimamoto wrote:
>> Peter Zijlstra wrote:
>>> On Mon, 2008-02-11 at 13:44 -0800, Hiroshi Shimamoto wrote:
>>>> Hi Ingo,
>>>>
>>>> I think an interface to access RLIMIT_RTTIME from outside is useful.
>>>> It makes administrator able to set RLIMIT_RTTIME watchdog to existing
>>>> real-time applications without impact.
>>>>
>>>> I implemented that interface with /proc filesystem.
>>> /proc//tasks//rttime might also make sense.
>>>
>> thanks, I'll add.
> 
> I just realized that because its an rlimit, we store these values
> process-wide. The per task thing was a feature request from someone, and
> I just jumped on your interface without proper consideration.

I know RLIMIT_RTTIME is process-wide.
I'm not sure whether anyone is actually requesting the per-task interface.

> 
> I'll need to think a bit more on this

OK.

Thanks,
Hiroshi Shimamoto


Re: [RFC v2 PATCH] RTTIME watchdog timer proc interface

2008-02-13 Thread Hiroshi Shimamoto
Andrew Morton wrote:
> On Tue, 12 Feb 2008 14:41:42 -0800 Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>>
>> Introduce new proc interface for RTTIME watchdog.
>> It makes administrator able to set RTTIME watchdog to existing
>> real-time applications without impact.
>>
>> $ echo 1000 > /proc//rttime
>> set RTTIME current value to 1000, it means 10sec.
>>
>> $ echo "1000 2000" > /proc//rttime
>> set RTTIME current value to 1000 and max value to 2000.
> 
> How does one set it to `unlimited'?

There is no way now. Will add.

> 
>> And /proc//task//rttime is also accessible.
> 
> Please describe the format in the changelog.

I'm sorry, I don't understand what you mean.

> 
>> Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>> ---
>>  fs/proc/base.c |   89 
>> 
>>  1 files changed, 89 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index 7c6b4ec..3212b44 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -381,6 +381,93 @@ static const struct file_operations proc_lstats_operations = {
>>  
>>  #endif
>>  
>> +static int rttime_show_proc(struct seq_file *m, void *v)
>> +{
>> +struct task_struct *task = m->private;
>> +struct signal_struct *signal = task->signal;
>> +struct rlimit *rt = &signal->rlim[RLIMIT_RTTIME];
>> +
>> +if (rt->rlim_cur == RLIM_INFINITY)
>> +seq_printf(m, "unlimited ");
>> +else
>> +seq_printf(m, "%lu ", rt->rlim_cur);
>> +
>> +if (rt->rlim_max == RLIM_INFINITY)
>> +seq_printf(m, "unlimited\n");
>> +else
>> +seq_printf(m, "%lu\n", rt->rlim_max);
>> +
>> +return 0;
>> +}
>> +
>> +static int rttime_open(struct inode *inode, struct file *file)
>> +{
>> +int ret;
>> +struct seq_file *m;
>> +struct task_struct *task = get_proc_task(inode);
>> +
>> +ret = single_open(file, rttime_show_proc, NULL);
>> +if (!ret) {
>> +m = file->private_data;
>> +m->private = task;
>> +}
>> +return ret;
>> +}
> 
> get_proc_task() can return NULL, in which case it appears that the kernel
> will later oops?

Yes, it could cause oops. Will fix.

> 
>> +static ssize_t rttime_write(struct file *file,
>> +const char __user *buf,
>> +size_t count,
>> +loff_t *ppos)
>> +{
>> +struct seq_file *m = file->private_data;
>> +struct task_struct *task = m->private;
>> +char buffer[PROC_NUMBUF], *end;
>> +struct rlimit new_rlim, *old_rlim;
>> +int n, ret;
> 
> `n' should be size_t.  And a better name would be nice.

Agree.

> 
>> +old_rlim = task->signal->rlim + RLIMIT_RTTIME;
>> +new_rlim = *old_rlim;
>> +memset(buffer, 0, sizeof(buffer));
>> +n = count;
>> +if (n > sizeof(buffer) - 1)
>> +n = sizeof(buffer) - 1;
> 
> min()

Thanks, I hadn't noticed min().

> 
>> +if (copy_from_user(buffer, buf, n))
>> +return -EFAULT;
>> +new_rlim.rlim_cur = simple_strtoul(buffer, &end, 0);
>> +if (*end == ' ') {
>> +++end;
>> +buf += end - buffer;
>> +memset(buffer, 0, sizeof(buffer));
>> +n = count - (end - buffer);
>> +if (n > sizeof(buffer) - 1)
>> +n = sizeof(buffer) - 1;
> 
> min()
> 
>> +if (copy_from_user(buffer, buf, n))
>> +return -EFAULT;
>> +new_rlim.rlim_max = simple_strtoul(buffer, &end, 0);
> 
> strict_strtoul()?

OK, I should look at it.

> 
>> +}
>> +if (new_rlim.rlim_cur > new_rlim.rlim_max)
>> +return -EINVAL;
>> +if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
>> +!capable(CAP_SYS_RESOURCE))
>> +return -EPERM;
>> +ret = security_task_setrlimit(RLIMIT_RTTIME, &new_rlim);
>> +if (ret)
>> +return ret;
>> +task_lock(task->group_leader);
>> +*old_rlim = new_rlim;
>> +task_unlock(task->group_leader);
> 
> hm.  Why do we lo

[RFC v2 PATCH] RTTIME watchdog timer proc interface

2008-02-12 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Introduce a new proc interface for the RTTIME watchdog.
It lets an administrator set the RTTIME watchdog on existing
real-time applications without modifying them.

$ echo 1000 > /proc//rttime
sets the RTTIME current value to 1000, which means 10sec.

$ echo "1000 2000" > /proc//rttime
sets the RTTIME current value to 1000 and the max value to 2000.

/proc//task//rttime is also accessible.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |   89 
 1 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7c6b4ec..3212b44 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -381,6 +381,93 @@ static const struct file_operations proc_lstats_operations = {
 
 #endif
 
+static int rttime_show_proc(struct seq_file *m, void *v)
+{
+   struct task_struct *task = m->private;
+   struct signal_struct *signal = task->signal;
+   struct rlimit *rt = &signal->rlim[RLIMIT_RTTIME];
+
+   if (rt->rlim_cur == RLIM_INFINITY)
+   seq_printf(m, "unlimited ");
+   else
+   seq_printf(m, "%lu ", rt->rlim_cur);
+
+   if (rt->rlim_max == RLIM_INFINITY)
+   seq_printf(m, "unlimited\n");
+   else
+   seq_printf(m, "%lu\n", rt->rlim_max);
+
+   return 0;
+}
+
+static int rttime_open(struct inode *inode, struct file *file)
+{
+   int ret;
+   struct seq_file *m;
+   struct task_struct *task = get_proc_task(inode);
+
+   ret = single_open(file, rttime_show_proc, NULL);
+   if (!ret) {
+   m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static ssize_t rttime_write(struct file *file,
+   const char __user *buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct task_struct *task = m->private;
+   char buffer[PROC_NUMBUF], *end;
+   struct rlimit new_rlim, *old_rlim;
+   int n, ret;
+
+   old_rlim = task->signal->rlim + RLIMIT_RTTIME;
+   new_rlim = *old_rlim;
+   memset(buffer, 0, sizeof(buffer));
+   n = count;
+   if (n > sizeof(buffer) - 1)
+   n = sizeof(buffer) - 1;
+   if (copy_from_user(buffer, buf, n))
+   return -EFAULT;
+   new_rlim.rlim_cur = simple_strtoul(buffer, &end, 0);
+   if (*end == ' ') {
+   ++end;
+   buf += end - buffer;
+   memset(buffer, 0, sizeof(buffer));
+   n = count - (end - buffer);
+   if (n > sizeof(buffer) - 1)
+   n = sizeof(buffer) - 1;
+   if (copy_from_user(buffer, buf, n))
+   return -EFAULT;
+   new_rlim.rlim_max = simple_strtoul(buffer, &end, 0);
+   }
+   if (new_rlim.rlim_cur > new_rlim.rlim_max)
+   return -EINVAL;
+   if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
+   !capable(CAP_SYS_RESOURCE))
+   return -EPERM;
+   ret = security_task_setrlimit(RLIMIT_RTTIME, &new_rlim);
+   if (ret)
+   return ret;
+   task_lock(task->group_leader);
+   *old_rlim = new_rlim;
+   task_unlock(task->group_leader);
+
+   return count;
+}
+
+static const struct file_operations proc_rttime_operations = {
+   .open   = rttime_open,
+   .read   = seq_read,
+   .write  = rttime_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 /* The badness from the OOM killer */
 unsigned long badness(struct task_struct *p, unsigned long uptime);
 static int proc_oom_score(struct task_struct *task, char *buffer)
@@ -2300,6 +2387,7 @@ static const struct pid_entry tgid_base_stuff[] = {
LNK("exe",exe),
REG("mounts", S_IRUGO, mounts),
REG("mountstats", S_IRUSR, mountstats),
+   REG("rttime", S_IRUSR|S_IWUSR, rttime),
 #ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps",  S_IRUGO, smaps),
@@ -2630,6 +2718,7 @@ static const struct pid_entry tid_base_stuff[] = {
LNK("root",  root),
LNK("exe",   exe),
REG("mounts",S_IRUGO, mounts),
+   REG("rttime",S_IRUSR|S_IWUSR, rttime),
 #ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps", S_IRUGO, smaps),
-- 
1.5.3.8
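
The parsing and validation done by rttime_write() in the patch above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: parse_rttime() and struct rt_limit are hypothetical names, strtoul() stands in for the kernel's simple_strtoul(), and the rejection of input without digits is stricter than what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rlimit. */
struct rt_limit {
	unsigned long cur;
	unsigned long max;
};

/*
 * Parse "cur" or "cur max" into *lim, mirroring the checks in the patch:
 * a missing max keeps the old value, and cur must not exceed max.
 * Returns 0 on success or -EINVAL on bad input; *lim is only updated
 * on success.
 */
static int parse_rttime(const char *s, struct rt_limit *lim)
{
	struct rt_limit new_lim;
	char *end;

	assert(s != NULL && lim != NULL);
	new_lim = *lim;

	new_lim.cur = strtoul(s, &end, 0);
	if (end == s)
		return -EINVAL;		/* no digits (assumption: reject) */
	if (*end == ' ') {
		const char *p = end + 1;

		new_lim.max = strtoul(p, &end, 0);
		if (end == p)
			return -EINVAL;
	}
	if (new_lim.cur > new_lim.max)
		return -EINVAL;

	*lim = new_lim;
	return 0;
}
```

Working on a copy and committing it only after every check passes is the same pattern the patch uses with new_rlim and old_rlim.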

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] RTTIME watchdog timer proc interface

2008-02-12 Thread Hiroshi Shimamoto
Peter Zijlstra wrote:
> On Mon, 2008-02-11 at 13:44 -0800, Hiroshi Shimamoto wrote:
>> Hi Ingo,
>>
>> I think an interface to access RLIMIT_RTTIME from outside is useful.
>> It makes administrator able to set RLIMIT_RTTIME watchdog to existing
>> real-time applications without impact.
>>
>> I implemented that interface with /proc filesystem.
> 
> /proc/<pid>/tasks/<tid>/rttime might also make sense.
> 
Thanks, I'll add it.



[RFC PATCH] RTTIME watchdog timer proc interface

2008-02-11 Thread Hiroshi Shimamoto
Hi Ingo,

I think an interface to access RLIMIT_RTTIME from outside the process is useful.
It lets an administrator set the RLIMIT_RTTIME watchdog on existing
real-time applications without otherwise affecting them.

I implemented that interface with the /proc filesystem.

---
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

Introduce a new proc interface for the RTTIME watchdog.
It lets an administrator set the RTTIME watchdog on existing applications.

$ echo 1000 > /proc/<pid>/rttime
sets the current RTTIME value to 1000, which means 10 seconds.

$ echo "1000 2000" > /proc/<pid>/rttime
sets the current RTTIME value to 1000 and the max value to 2000.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 fs/proc/base.c |   88 
 1 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 7c6b4ec..5689c0e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -381,6 +381,93 @@ static const struct file_operations proc_lstats_operations = {
 
 #endif
 
+static int rttime_show_proc(struct seq_file *m, void *v)
+{
+   struct task_struct *task = m->private;
+   struct signal_struct *signal = task->signal;
+   struct rlimit *rt = &signal->rlim[RLIMIT_RTTIME];
+
+   if (rt->rlim_cur == RLIM_INFINITY)
+   seq_printf(m, "unlimited ");
+   else
+   seq_printf(m, "%lu ", rt->rlim_cur);
+
+   if (rt->rlim_max == RLIM_INFINITY)
+   seq_printf(m, "unlimited\n");
+   else
+   seq_printf(m, "%lu\n", rt->rlim_max);
+
+   return 0;
+}
+
+static int rttime_open(struct inode *inode, struct file *file)
+{
+   int ret;
+   struct seq_file *m;
+   struct task_struct *task = get_proc_task(inode);
+
+   ret = single_open(file, rttime_show_proc, NULL);
+   if (!ret) {
+   m = file->private_data;
+   m->private = task;
+   }
+   return ret;
+}
+
+static ssize_t rttime_write(struct file *file,
+   const char __user *buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct seq_file *m = file->private_data;
+   struct task_struct *task = m->private;
+   char buffer[PROC_NUMBUF], *end;
+   struct rlimit new_rlim, *old_rlim;
+   int n, ret;
+
+   old_rlim = task->signal->rlim + RLIMIT_RTTIME;
+   new_rlim = *old_rlim;
+   memset(buffer, 0, sizeof(buffer));
+   n = count;
+   if (n > sizeof(buffer) - 1)
+   n = sizeof(buffer) - 1;
+   if (copy_from_user(buffer, buf, n))
+   return -EFAULT;
+   new_rlim.rlim_cur = simple_strtoul(buffer, &end, 0);
+   if (*end == ' ') {
+   ++end;
+   buf += end - buffer;
+   memset(buffer, 0, sizeof(buffer));
+   n = count - (end - buffer);
+   if (n > sizeof(buffer) - 1)
+   n = sizeof(buffer) - 1;
+   if (copy_from_user(buffer, buf, n))
+   return -EFAULT;
+   new_rlim.rlim_max = simple_strtoul(buffer, &end, 0);
+   }
+   if (new_rlim.rlim_cur > new_rlim.rlim_max)
+   return -EINVAL;
+   if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
+   !capable(CAP_SYS_RESOURCE))
+   return -EPERM;
+   ret = security_task_setrlimit(RLIMIT_RTTIME, &new_rlim);
+   if (ret)
+   return ret;
+   task_lock(task->group_leader);
+   *old_rlim = new_rlim;
+   task_unlock(task->group_leader);
+
+   return count;
+}
+
+static const struct file_operations proc_rttime_operations = {
+   .open   = rttime_open,
+   .read   = seq_read,
+   .write  = rttime_write,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 /* The badness from the OOM killer */
 unsigned long badness(struct task_struct *p, unsigned long uptime);
 static int proc_oom_score(struct task_struct *task, char *buffer)
@@ -2300,6 +2387,7 @@ static const struct pid_entry tgid_base_stuff[] = {
LNK("exe",exe),
REG("mounts", S_IRUGO, mounts),
REG("mountstats", S_IRUSR, mountstats),
+   REG("rttime", S_IRUSR|S_IWUSR, rttime),
 #ifdef CONFIG_PROC_PAGE_MONITOR
REG("clear_refs", S_IWUSR, clear_refs),
REG("smaps",  S_IRUGO, smaps),
-- 
1.5.3.8



[PATCH] x86_64: remove struct cpu_model_info

2008-01-23 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86_64: remove struct cpu_model_info

No one uses struct cpu_model_info on x86_64 now.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/setup_64.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index 2643a8f..eb722b0 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -881,12 +881,6 @@ static void __cpuinit get_cpu_vendor(struct cpuinfo_x86 *c)
c->x86_vendor = X86_VENDOR_UNKNOWN;
 }
 
-struct cpu_model_info {
-   int vendor;
-   int family;
-   char *model_names[16];
-};
-
 /* Do some early cpuid on the boot CPU to get some parameter that are
needed before check_bugs. Everything advanced is in identify_cpu
below. */
-- 
1.5.3.7



x86: kdump failure

2008-01-17 Thread Hiroshi Shimamoto
Hi Ingo,

A recent x86.git kernel fails to kdump with the following BUG.

SysRq : Trigger a crashdump
------------[ cut here ]------------
kernel BUG at include/linux/elfcore.h:105!
invalid opcode: 0000 [1] PREEMPT SMP

crash_save_cpu() calls elf_core_copy_regs(), which requires the
ELF_CORE_COPY_REGS macro because struct pt_regs and elf_gregset_t
are different.

---
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86: kdump needs ELF_CORE_COPY_REGS macro

kdump needs ELF_CORE_COPY_REGS in crash_save_cpu().
The lack of this macro causes the following BUG.

SysRq : Trigger a crashdump
------------[ cut here ]------------
kernel BUG at include/linux/elfcore.h:105!
invalid opcode: 0000 [1] PREEMPT SMP

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 include/asm-x86/elf.h |   62 +
 1 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/elf.h b/include/asm-x86/elf.h
index dab4744..d9c94e7 100644
--- a/include/asm-x86/elf.h
+++ b/include/asm-x86/elf.h
@@ -106,6 +106,31 @@ extern unsigned int vdso_enabled;
_r->ax = 0; \
 } while (0)
 
+/*
+ * regs is struct pt_regs, pr_reg is elf_gregset_t (which is
+ * now struct_user_regs, they are different)
+ */
+
+#define ELF_CORE_COPY_REGS(pr_reg, regs) do {  \
+   pr_reg[0] = regs->bx;   \
+   pr_reg[1] = regs->cx;   \
+   pr_reg[2] = regs->dx;   \
+   pr_reg[3] = regs->si;   \
+   pr_reg[4] = regs->di;   \
+   pr_reg[5] = regs->bp;   \
+   pr_reg[6] = regs->ax;   \
+   pr_reg[7] = regs->ds & 0xffff;  \
+   pr_reg[8] = regs->es & 0xffff;  \
+   pr_reg[9] = regs->fs & 0xffff;  \
+   savesegment(gs, pr_reg[10]);\
+   pr_reg[11] = regs->orig_ax; \
+   pr_reg[12] = regs->ip;  \
+   pr_reg[13] = regs->cs & 0xffff; \
+   pr_reg[14] = regs->flags;   \
+   pr_reg[15] = regs->sp;  \
+   pr_reg[16] = regs->ss & 0xffff; \
+} while (0);
+
 #define ELF_PLATFORM   (utsname()->machine)
 #define set_personality_64bit()do { } while (0)
 
@@ -165,6 +190,43 @@ static inline void elf_common_init(struct thread_struct *t,
} while (0)
 #define COMPAT_ELF_PLATFORM("i686")
 
+/*
+ * regs is struct pt_regs, pr_reg is elf_gregset_t (which is
+ * now struct_user_regs, they are different). Assumes current is the process
+ * getting dumped.
+ */
+
+#define ELF_CORE_COPY_REGS(pr_reg, regs)  do { \
+   unsigned v; \
+   (pr_reg)[0] = (regs)->r15;  \
+   (pr_reg)[1] = (regs)->r14;  \
+   (pr_reg)[2] = (regs)->r13;  \
+   (pr_reg)[3] = (regs)->r12;  \
+   (pr_reg)[4] = (regs)->bp;   \
+   (pr_reg)[5] = (regs)->bx;   \
+   (pr_reg)[6] = (regs)->r11;  \
+   (pr_reg)[7] = (regs)->r10;  \
+   (pr_reg)[8] = (regs)->r9;   \
+   (pr_reg)[9] = (regs)->r8;   \
+   (pr_reg)[10] = (regs)->ax;  \
+   (pr_reg)[11] = (regs)->cx;  \
+   (pr_reg)[12] = (regs)->dx;  \
+   (pr_reg)[13] = (regs)->si;  \
+   (pr_reg)[14] = (regs)->di;  \
+   (pr_reg)[15] = (regs)->orig_ax; \
+   (pr_reg)[16] = (regs)->ip;  \
+   (pr_reg)[17] = (regs)->cs;  \
+   (pr_reg)[18] = (regs)->flags;   \
+   (pr_reg)[19] = (regs)->sp;  \
+   (pr_reg)[20] = (regs)->ss;  \
+   (pr_reg)[21] = current->thread.fs;  \
+   (pr_reg)[22] = current->thread.gs;  \
+   asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v;   \
+   asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v;   \
+   asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v;   \
+   asm("movl %%gs,%0" : "=r" (v)); (pr_reg)[26] = v;   \
+} while (0);
+
 /* I'm not sure if we can use '-' here */
 #

[PATCH 2/2] x86_64: move select_idle_routine() call after detect_ht()

2008-01-15 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86_64: move select_idle_routine() call after detect_ht()

Move the select_idle_routine() call to after the detect_ht() call in
identify_cpu() on 64-bit.
This change lets the polling idle and HT enabled warning message be
printed properly.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/setup_64.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index c8dcdd2..8ebf990 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -1067,7 +1067,6 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
break;
}
 
-   select_idle_routine(c);
detect_ht(c);
 
/*
@@ -1085,6 +1084,8 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_MCE
mcheck_init(c);
 #endif
+   select_idle_routine(c);
+
if (c != &boot_cpu_data)
mtrr_ap_init();
 #ifdef CONFIG_NUMA
-- 
1.5.3.6



[PATCH 1/2] x86: move warning message of polling idle and HT enabled

2008-01-15 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86: move warning message of polling idle and HT enabled

The warning message at idle_setup() is never shown because smp_num_siblings
hasn't been updated at this point yet.

Move this polling idle and HT enabled warning to select_idle_routine().
I also implement this warning on the 64-bit kernel.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   18 --
 arch/x86/kernel/process_64.c |   17 -
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 69a69c3..d52c032 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,17 +287,27 @@ static void mwait_idle(void)
 
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
+   static int selected;
+
+   if (selected)
+   return;
+#ifdef CONFIG_X86_SMP
+   if (pm_idle == poll_idle && smp_num_siblings > 1) {
+   printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
+   " performance may degrade.\n");
+   }
+#endif
if (cpu_has(c, X86_FEATURE_MWAIT)) {
-   printk("monitor/mwait feature present.\n");
/*
 * Skip, if setup has overridden idle.
 * One CPU supports mwait => All CPUs supports mwait
 */
if (!pm_idle) {
-   printk("using mwait in idle threads.\n");
+   printk(KERN_INFO "using mwait in idle threads.\n");
pm_idle = mwait_idle;
}
}
+   selected = 1;
 }
 
 static int __init idle_setup(char *str)
@@ -305,10 +315,6 @@ static int __init idle_setup(char *str)
if (!strcmp(str, "poll")) {
printk("using polling idle threads.\n");
pm_idle = poll_idle;
-#ifdef CONFIG_X86_SMP
-   if (smp_num_siblings > 1)
-   printk("WARNING: polling idle and HT enabled, performance may degrade.\n");
-#endif
} else if (!strcmp(str, "mwait"))
force_mwait = 1;
else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5e12edd..8cff606 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -282,20 +282,27 @@ static void mwait_idle(void)
 
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
-   static int printed;
+   static int selected;
+
+   if (selected)
+   return;
+#ifdef CONFIG_X86_SMP
+   if (pm_idle == poll_idle && smp_num_siblings > 1) {
+   printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
+   " performance may degrade.\n");
+   }
+#endif
if (cpu_has(c, X86_FEATURE_MWAIT)) {
/*
 * Skip, if setup has overridden idle.
 * One CPU supports mwait => All CPUs supports mwait
 */
if (!pm_idle) {
-   if (!printed) {
-   printk(KERN_INFO "using mwait in idle threads.\n");
-   printed = 1;
-   }
+   printk(KERN_INFO "using mwait in idle threads.\n");
pm_idle = mwait_idle;
}
}
+   selected = 1;
 }
 
 static int __init idle_setup(char *str)
-- 
1.5.3.6
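
The two ideas in this patch, checking the sibling count only after it has been detected and guarding the routine with a static flag so it acts once, can be modeled in userspace. This is a sketch with hypothetical names (num_siblings, polling_idle, detect_topology(), select_idle()), not the kernel symbols, and a counter replaces printk().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real symbols. */
static int num_siblings = 1;	/* updated later by detect_topology() */
static bool polling_idle;	/* set early, while parsing boot options */
static int warnings_printed;

static void detect_topology(int siblings)
{
	num_siblings = siblings;
}

/* Runs its body once; later calls (e.g. for other CPUs) return early. */
static void select_idle(void)
{
	static bool selected;

	if (selected)
		return;
	if (polling_idle && num_siblings > 1)
		warnings_printed++;	/* the kernel would printk() here */
	selected = true;
}
```

Had the check run at option-parsing time, num_siblings would still be 1 and the warning could never fire, which is the situation the patch description reports.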



[PATCH 0/2] x86: move warning message of polling idle and HT enabled, take 3

2008-01-15 Thread Hiroshi Shimamoto
This patch set makes the following warning message print properly:
"polling idle and HT enabled, performance may degrade"

The warning message is never shown because smp_num_siblings hasn't
been updated yet at idle_setup().
So I moved the warning to select_idle_routine() and made it be called
after detect_ht().

It also needs Andi's patch for 32-bit.
i386: Move MWAIT idle check to generic CPU
http://lkml.org/lkml/2008/1/2/358

[PATCH 1/2] x86: move warning message of polling idle and HT enabled
[PATCH 2/2] x86_64: move select_idle_routine() call after detect_ht()

Thanks,
Hiroshi Shimamoto


Re: [PATCH] x86: move warning message of polling idle and HT enabled

2008-01-15 Thread Hiroshi Shimamoto
Ingo Molnar wrote:
> * Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>> Subject: [PATCH] x86: move warning message of polling idle and HT enabled
>>
>> This warning at idle_setup() is never shown because smp_num_sibling hasn't
>> been updated at this point yet.
>>
>> Move this polling idle and HT enabled warning message to 
>> select_idle_routine(). I also implement this warning on 64-bit kernel 
>> and make select_idle_routine() call after detect_ht() call.
> 
> looks good to me, but could you please split this up into two patches 
> instead - one that just moves/adds the printks, the other one that moves 
> the select_idle_routine() call? (Moving init calls around is notoriously 
> error-prone, so we want as small of a bisection target as possible.) 

Sure, I'll repost take3 soon.

Thanks,
Hiroshi Shimamoto



[PATCH] x86: move warning message of polling idle and HT enabled

2008-01-14 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86: move warning message of polling idle and HT enabled

This warning at idle_setup() is never shown because smp_num_siblings hasn't
been updated at this point yet.

Move this polling idle and HT enabled warning message to select_idle_routine().
I also implement this warning on the 64-bit kernel and make the
select_idle_routine() call come after the detect_ht() call.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
Ingo, this patch needs Andi's patch;
i386: Move MWAIT idle check to generic CPU
http://lkml.org/lkml/2008/1/2/358
I haven't found it in x86.git tree.

 arch/x86/kernel/process_32.c |   18 --
 arch/x86/kernel/process_64.c |   17 -
 arch/x86/kernel/setup_64.c   |3 ++-
 3 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 69a69c3..d52c032 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -287,17 +287,27 @@ static void mwait_idle(void)
 
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
+   static int selected;
+
+   if (selected)
+   return;
+#ifdef CONFIG_X86_SMP
+   if (pm_idle == poll_idle && smp_num_siblings > 1) {
+   printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
+   " performance may degrade.\n");
+   }
+#endif
if (cpu_has(c, X86_FEATURE_MWAIT)) {
-   printk("monitor/mwait feature present.\n");
/*
 * Skip, if setup has overridden idle.
 * One CPU supports mwait => All CPUs supports mwait
 */
if (!pm_idle) {
-   printk("using mwait in idle threads.\n");
+   printk(KERN_INFO "using mwait in idle threads.\n");
pm_idle = mwait_idle;
}
}
+   selected = 1;
 }
 
 static int __init idle_setup(char *str)
@@ -305,10 +315,6 @@ static int __init idle_setup(char *str)
if (!strcmp(str, "poll")) {
printk("using polling idle threads.\n");
pm_idle = poll_idle;
-#ifdef CONFIG_X86_SMP
-   if (smp_num_siblings > 1)
-   printk("WARNING: polling idle and HT enabled, performance may degrade.\n");
-#endif
} else if (!strcmp(str, "mwait"))
force_mwait = 1;
else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5e12edd..8cff606 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -282,20 +282,27 @@ static void mwait_idle(void)
 
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
-   static int printed;
+   static int selected;
+
+   if (selected)
+   return;
+#ifdef CONFIG_X86_SMP
+   if (pm_idle == poll_idle && smp_num_siblings > 1) {
+   printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
+   " performance may degrade.\n");
+   }
+#endif
if (cpu_has(c, X86_FEATURE_MWAIT)) {
/*
 * Skip, if setup has overridden idle.
 * One CPU supports mwait => All CPUs supports mwait
 */
if (!pm_idle) {
-   if (!printed) {
-   printk(KERN_INFO "using mwait in idle threads.\n");
-   printed = 1;
-   }
+   printk(KERN_INFO "using mwait in idle threads.\n");
pm_idle = mwait_idle;
}
}
+   selected = 1;
 }
 
 static int __init idle_setup(char *str)
diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index c8dcdd2..8ebf990 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -1067,7 +1067,6 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
break;
}
 
-   select_idle_routine(c);
detect_ht(c);
 
/*
@@ -1085,6 +1084,8 @@ void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_MCE
mcheck_init(c);
 #endif
+   select_idle_routine(c);
+
if (c != &boot_cpu_data)
mtrr_ap_init();
 #ifdef CONFIG_NUMA
-- 
1.5.3.6




Re: [PATCH] x86_32: remove warning message not used

2008-01-14 Thread Hiroshi Shimamoto
Ingo Molnar wrote:
> * Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>>
>> smp_num_siblings hasn't been updated at this point yet, so it's always 
>> 1. This polling and HT warning message is never shown.
> 
> hah, nice one. But could you perhaps move it to a place where it has a 
> chance to be printed? The warning still makes sense.
> 

Yeah, you're right. I'll do that.

Thanks,
Hiroshi Shimamoto


[PATCH] x86_32: remove warning message not used

2008-01-14 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>

smp_num_siblings hasn't been updated at this point yet, so it's always 1.
This polling and HT warning message is never shown.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 69a69c3..f449b6d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -305,10 +305,6 @@ static int __init idle_setup(char *str)
if (!strcmp(str, "poll")) {
printk("using polling idle threads.\n");
pm_idle = poll_idle;
-#ifdef CONFIG_X86_SMP
-   if (smp_num_siblings > 1)
-   printk("WARNING: polling idle and HT enabled, performance may degrade.\n");
-#endif
} else if (!strcmp(str, "mwait"))
force_mwait = 1;
else
-- 
1.5.3.6




[PATCH] x86_64: move out tick_nohz_stop_sched_tick() call from the loop

2008-01-09 Thread Hiroshi Shimamoto
Hello,

The tick_nohz_stop_sched_tick() and tick_nohz_restart_sched_tick()
pairing in cpu_idle() differs from the 32-bit version.

From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86_64: move out tick_nohz_stop_sched_tick() call from the loop

Move the tick_nohz_stop_sched_tick() call out of the loop in cpu_idle(),
the same as the 32-bit version.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_64.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 93ce4f3..6870208 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -170,14 +170,13 @@ void cpu_idle(void)
current_thread_info()->status |= TS_POLLING;
/* endless idle loop with no priority at all */
while (1) {
+   tick_nohz_stop_sched_tick();
while (!need_resched()) {
void (*idle)(void);
 
if (__get_cpu_var(cpu_idle_state))
__get_cpu_var(cpu_idle_state) = 0;
 
-   tick_nohz_stop_sched_tick();
-
rmb();
idle = pm_idle;
if (!idle)
-- 
1.5.3.6
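
The effect of moving the call out of the inner loop can be shown by counting invocations in a small model. The names stop_tick() and idle_pass() are hypothetical; the sketch only illustrates the hoisting and does not reproduce cpu_idle().

```c
#include <assert.h>

static int stop_tick_calls;	/* counts calls to the hypothetical hook */

static void stop_tick(void)
{
	stop_tick_calls++;
}

/*
 * One pass of an idle loop that polls `polls` times before rescheduling.
 * With hoisted != 0 the hook runs once per pass; otherwise once per poll.
 */
static void idle_pass(int polls, int hoisted)
{
	assert(polls >= 0);
	if (hoisted)
		stop_tick();
	for (int i = 0; i < polls; i++) {
		if (!hoisted)
			stop_tick();
		/* the real loop would call the idle routine here */
	}
}
```

Calling the hook once per pass also keeps it paired one-to-one with the restart call that follows the inner loop, which is the arrangement the patch description attributes to the 32-bit version.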




Re: [PATCH] x86_64: enable irq in default_idle

2008-01-08 Thread Hiroshi Shimamoto
Ingo Molnar wrote:
> * Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> Hi Ingo,
>>
>> I think local_irq_enable() is missing in default_idle() on x86_64. 
>> It's for recent x86 tree.
>>
>> From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>> Subject: [PATCH] x86_64: enable irq in default_idle
>>
>> local_irq_enable() is missing after sched_clock_idle_wakeup_event().
> 
> thanks Hiroshi, applied.
> 
> The effects of this bug should be increased latencies on 64-bit. Did you 
> notice these latencies, or did you find the bug in some other way (code 
> review)?

I found this when I was comparing 32-bit and 64-bit source code
for x86 unification.

Thanks,
Hiroshi Shimamoto



[PATCH] x86_64: enable irq in default_idle

2008-01-07 Thread Hiroshi Shimamoto
Hi Ingo,

I think local_irq_enable() is missing in default_idle() on x86_64.
This patch is against the recent x86 tree.

From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86_64: enable irq in default_idle

local_irq_enable() is missing after sched_clock_idle_wakeup_event().

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_64.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c6ad1a0..93ce4f3 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -119,8 +119,8 @@ void default_idle(void)
t1 = ktime_get();
t1n = ktime_to_ns(t1);
sched_clock_idle_wakeup_event(t1n - t0n);
-   } else
-   local_irq_enable();
+   }
+   local_irq_enable();
current_thread_info()->status |= TS_POLLING;
 }
 
-- 
1.5.3.6
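
The control-flow change in this patch can be modeled with a boolean standing in for the CPU interrupt flag. This is a simplified sketch with hypothetical names; it assumes, as the patch description implies, that the halt path ends with interrupts disabled for the idle-time accounting.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_enabled;	/* hypothetical model of the interrupt flag */

/* Before the fix: interrupts are re-enabled only on the else path. */
static void idle_before(bool went_idle)
{
	irqs_enabled = false;
	if (went_idle) {
		/* halt, wake up, account idle time with interrupts off */
	} else
		irqs_enabled = true;
}

/* After the fix: interrupts are re-enabled on both paths. */
static void idle_after(bool went_idle)
{
	irqs_enabled = false;
	if (went_idle) {
		/* halt, wake up, account idle time with interrupts off */
	}
	irqs_enabled = true;
}
```

In the model, idle_before(true) returns with the flag still cleared, which corresponds to the increased latencies mentioned in the follow-up discussion.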




Re: [PATCH] x86: clean up apic_32/64.c

2008-01-02 Thread Hiroshi Shimamoto
Ingo Molnar wrote:
> * Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> White space and coding style clean up. Make apic_32/64.c similar.
> 
> thanks, applied. FYI, there's still a bit left in apic_32.c:
> 
>  total: 5 errors, 1 warnings, 1566 lines checked
> 
> we might as well go for all of them? :-)

Thanks, I made a cleanup patch for it.

total: 0 errors, 0 warnings, 1567 lines checked

---
From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
Subject: [PATCH] x86: clean up apic_32.c

White space and coding style clean up.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/apic_32.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index 1e417df..3defb3c 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -46,7 +46,7 @@
 /*
  * Sanity check
  */
-#if (SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F
+#if ((SPURIOUS_APIC_VECTOR & 0x0F) != 0x0F)
 # error SPURIOUS_APIC_VECTOR definition error
 #endif
 
@@ -55,7 +55,7 @@
  *
  * -1=force-disable, +1=force-enable
  */
-static int enable_local_apic __initdata = 0;
+static int enable_local_apic __initdata;
 
 /* Local APIC timer verification ok */
 static int local_apic_timer_verify_ok;
@@ -432,7 +432,7 @@ void __init setup_boot_APIC_clock(void)
   "with PM Timer: %ldms instead of 100ms\n",
   (long)res);
/* Correct the lapic counter value */
-   res = (((u64) delta ) * pm_100ms);
+   res = (((u64) delta) * pm_100ms);
do_div(res, deltapm);
printk(KERN_INFO "APIC delta adjusted to PM-Timer: "
   "%lu (%ld)\n", (unsigned long) res, delta);
@@ -976,7 +976,8 @@ void __cpuinit setup_local_APIC(void)
value |= APIC_LVT_LEVEL_TRIGGER;
apic_write_around(APIC_LVT1, value);
 
-   if (integrated && !esr_disable) {   /* !82489DX */
+   if (integrated && !esr_disable) {
+   /* !82489DX */
maxlvt = lapic_get_maxlvt();
if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */
apic_write(APIC_ESR, 0);
@@ -1262,7 +1263,7 @@ void smp_error_interrupt(struct pt_regs *regs)
   6: Received illegal vector
   7: Illegal register address
*/
-   printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
+   printk(KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
smp_processor_id(), v , v1);
irq_exit();
 }
@@ -1349,7 +1350,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
value = apic_read(APIC_LVT0);
value &= ~(APIC_MODE_MASK | APIC_SEND_PENDING |
APIC_INPUT_POLARITY | APIC_LVT_REMOTE_IRR |
-   APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED );
+   APIC_LVT_LEVEL_TRIGGER | APIC_LVT_MASKED);
value |= APIC_LVT_REMOTE_IRR | APIC_SEND_PENDING;
value = SET_APIC_DELIVERY_MODE(value, APIC_MODE_EXTINT);
apic_write_around(APIC_LVT0, value);
-- 
1.5.3.6
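
One of the cleanups above drops the "= 0" initializer from a static variable. The C rule behind it is that objects with static storage duration are zero-initialized when no initializer is given, so the two forms below hold the same value. The variable names are hypothetical and the kernel's __initdata annotation is left out of this userspace sketch.

```c
#include <assert.h>

/* Static storage duration: starts out zero without an initializer. */
static int enable_flag;
/* Same value; checkpatch reports this explicit form as an error. */
static int enable_flag_explicit = 0;

/* Returns nonzero when both variables hold the same initial value, zero. */
static int flags_match(void)
{
	return enable_flag == 0 && enable_flag_explicit == 0;
}
```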



[PATCH] x86: clean up apic_32/64.c

2008-01-02 Thread Hiroshi Shimamoto
White space and coding style clean up.
Make apic_32/64.c similar.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/apic_32.c |5 ++---
 arch/x86/kernel/apic_64.c |   23 +--
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index 2865792..1e417df 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -577,7 +577,6 @@ static void local_apic_timer_interrupt(void)
  * [ if a single-CPU system runs an SMP kernel then we call the local
  *   interrupt as well. Thus we cannot inline the local irq ... ]
  */
-
 void smp_apic_timer_interrupt(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
@@ -1021,7 +1020,7 @@ void __cpuinit setup_local_APIC(void)
 /*
  * Detect and initialize APIC
  */
-static int __init detect_init_APIC (void)
+static int __init detect_init_APIC(void)
 {
u32 h, l, features;
 
@@ -1165,7 +1164,7 @@ fake_ioapic_page:
  * This initializes the IO-APIC and APIC hardware if this is
  * a UP kernel.
  */
-int __init APIC_init_uniprocessor (void)
+int __init APIC_init_uniprocessor(void)
 {
if (enable_local_apic < 0)
clear_cpu_cap(&boot_cpu_data, X86_FEATURE_APIC);
diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 9439aa3..286a396 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -23,33 +23,37 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 
-int apic_verbosity;
 int disable_apic_timer __cpuinitdata;
 static int apic_calibrate_pmtmr __initdata;
 int disable_apic;
 
-/* Local APIC timer works in C2? */
+/* Local APIC timer works in C2 */
 int local_apic_timer_c2_ok;
 EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
 
+/*
+ * Debug level, exported for io_apic.c
+ */
+int apic_verbosity;
+
 static struct resource lapic_resource = {
.name = "Local APIC",
.flags = IORESOURCE_MEM | IORESOURCE_BUSY,
@@ -355,6 +359,11 @@ static void __init calibrate_APIC_clock(void)
calibration_result = result / HZ;
 }
 
+/*
+ * Setup the boot APIC
+ *
+ * Calibrate and verify the result.
+ */
 void __init setup_boot_APIC_clock(void)
 {
/*
@@ -1109,8 +1118,8 @@ static struct sysdev_class lapic_sysclass = {
 };
 
 static struct sys_device device_lapic = {
-   .id = 0,
-   .cls= &lapic_sysclass,
+   .id = 0,
+   .cls= &lapic_sysclass,
 };
 
 static void __cpuinit apic_pm_activate(void)
@@ -1121,9 +1130,11 @@ static void __cpuinit apic_pm_activate(void)
 static int __init init_lapic_sysfs(void)
 {
int error;
+
if (!cpu_has_apic)
return 0;
/* XXX: remove suspend/resume procs if !apic_pm_state.active? */
+
error = sysdev_class_register(&lapic_sysclass);
if (!error)
error = sysdev_register(&device_lapic);
-- 
1.5.3.6



[PATCH] x86: clean up process_32/64.c

2007-11-27 Thread Hiroshi Shimamoto
White space and coding style clean up.
Make process_32/64.c similar.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/process_32.c |   20 ++--
 arch/x86/kernel/process_64.c |  307 +-
 2 files changed, 163 insertions(+), 164 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a20de7f..bd707db 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -133,7 +133,7 @@ EXPORT_SYMBOL(default_idle);
  * to poll the ->work.need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+static void poll_idle(void)
 {
cpu_relax();
 }
@@ -330,8 +330,8 @@ void __show_registers(struct pt_regs *regs, int all)
printk("ESI: %08lx EDI: %08lx EBP: %08lx ESP: %08lx\n",
regs->esi, regs->edi, regs->ebp, esp);
printk(" DS: %04x ES: %04x FS: %04x GS: %04x SS: %04x\n",
-  regs->xds & 0x, regs->xes & 0x,
-  regs->xfs & 0x, gs, ss);
+   regs->xds & 0x, regs->xes & 0x,
+   regs->xfs & 0x, gs, ss);
 
if (!all)
return;
@@ -426,7 +426,7 @@ void flush_thread(void)
struct task_struct *tsk = current;
 
memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
-   memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+   memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
clear_tsk_thread_flag(tsk, TIF_DEBUG);
/*
 * Forget coprocessor state..
@@ -451,8 +451,8 @@ void prepare_to_copy(struct task_struct *tsk)
 }
 
 int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
-   unsigned long unused,
-   struct task_struct * p, struct pt_regs * regs)
+   unsigned long unused,
+   struct task_struct * p, struct pt_regs * regs)
 {
struct pt_regs * childregs;
struct task_struct *tsk;
@@ -468,7 +468,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
 
p->thread.eip = (unsigned long) ret_from_fork;
 
-   savesegment(gs,p->thread.gs);
+   savesegment(gs, p->thread.gs);
 
tsk = current;
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
@@ -513,7 +513,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
for (i = 0; i < 8; i++)
-   dump->u_debugreg[i] = current->thread.debugreg[i];  
+   dump->u_debugreg[i] = current->thread.debugreg[i];
 
if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -528,7 +528,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
dump->regs.ds = regs->xds;
dump->regs.es = regs->xes;
dump->regs.fs = regs->xfs;
-   savesegment(gs,dump->regs.gs);
+   savesegment(gs, dump->regs.gs);
dump->regs.orig_eax = regs->orig_eax;
dump->regs.eip = regs->eip;
dump->regs.cs = regs->xcs;
@@ -540,7 +540,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
 }
 EXPORT_SYMBOL(dump_thread);
 
-/* 
+/*
  * Capture the user space registers if the task is not running (in user space)
  */
 int dump_task_regs(struct task_struct *tsk, elf_gregset_t *regs)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 87c8e7f..57167dc 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -3,7 +3,7 @@
  *
  *  Pentium III FXSR, SSE support
  * Gareth Hughes <[EMAIL PROTECTED]>, May 2000
- * 
+ *
  *  X86-64 port
  * Andi Kleen.
  *
@@ -19,19 +19,19 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -122,43 +122,12 @@ static void default_idle(void)
  * to poll the ->need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+static void poll_idle(void)
 {
local_irq_enable();
cpu_relax();
 }
 
-void cpu_idle_wait(void)
-{
-   unsigned int cpu, this_cpu = get_cpu();
-   cpumask_t map, tmp = current->cpus_allowed;
-
-   set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
-   put_cpu();
-
-   cpus_clear(map);
-   for_each_online_cpu(cpu) {
-   per_cpu(cpu_idle_state, cpu) = 1;
-   cpu_set(cpu, map);
-   }
-
-   __get_cpu_var(cpu_idle_state) = 0;
-
-   wmb(

[PATCH] x86: clean up nmi_32/64.c

2007-11-15 Thread Hiroshi Shimamoto
Subject: [PATCH] x86: clean up nmi_32/64.c

Clean up nmi_32/64.c and make the two files more similar.
- White space and coding style clean up.
- nmi_cpu_busy is only available on CONFIG_SMP.
- Move functions __acpi_nmi_enable, acpi_nmi_enable,
  __acpi_nmi_disable and acpi_nmi_disable.
- Make variable names more similar.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/nmi_32.c |   18 +
 arch/x86/kernel/nmi_64.c |   92 +++---
 2 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c
index 600fd40..d0da785 100644
--- a/arch/x86/kernel/nmi_32.c
+++ b/arch/x86/kernel/nmi_32.c
@@ -51,13 +51,13 @@ static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
 
 static int endflag __initdata = 0;
 
+#ifdef CONFIG_SMP
 /* The performance counters used by NMI_LOCAL_APIC don't trigger when
  * the CPU is idle. To make sure the NMI watchdog really ticks on all
  * CPUs during the test make them busy.
  */
 static __init void nmi_cpu_busy(void *data)
 {
-#ifdef CONFIG_SMP
local_irq_enable_in_hardirq();
/* Intentionally don't use cpu_relax here. This is
   to make sure that the performance counter really ticks,
@@ -67,8 +67,8 @@ static __init void nmi_cpu_busy(void *data)
   care if they get somewhat less cycles. */
while (endflag == 0)
mb();
-#endif
 }
+#endif
 
 static int __init check_nmi_watchdog(void)
 {
@@ -87,11 +87,13 @@ static int __init check_nmi_watchdog(void)
 
printk(KERN_INFO "Testing NMI watchdog ... ");
 
+#ifdef CONFIG_SMP
if (nmi_watchdog == NMI_LOCAL_APIC)
smp_call_function(nmi_cpu_busy, (void *)&endflag, 0, 0);
+#endif
 
for_each_possible_cpu(cpu)
-   prev_nmi_count[cpu] = per_cpu(irq_stat, cpu).__nmi_count;
+   prev_nmi_count[cpu] = nmi_count(cpu);
local_irq_enable();
mdelay((20*1000)/nmi_hz); // wait 20 ticks
 
@@ -113,12 +115,13 @@ static int __init check_nmi_watchdog(void)
atomic_dec(&nmi_active);
}
}
-   endflag = 1;
if (!atomic_read(&nmi_active)) {
kfree(prev_nmi_count);
atomic_set(&nmi_active, -1);
+   endflag = 1;
return -1;
}
+   endflag = 1;
printk("OK.\n");
 
/* now that we know it works we can reduce NMI frequency to
@@ -173,7 +176,6 @@ static int lapic_nmi_resume(struct sys_device *dev)
return 0;
 }
 
-
 static struct sysdev_class nmi_sysclass = {
set_kset_name("lapic_nmi"),
.resume = lapic_nmi_resume,
@@ -236,10 +238,10 @@ void acpi_nmi_disable(void)
on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
 }
 
-void setup_apic_nmi_watchdog (void *unused)
+void setup_apic_nmi_watchdog(void *unused)
 {
if (__get_cpu_var(wd_enabled))
-   return;
+   return;
 
/* cheap hack to support suspend/resume */
/* if cpu0 is not active neither should the other cpus */
@@ -328,7 +330,7 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
unsigned int sum;
int touched = 0;
int cpu = smp_processor_id();
-   int rc=0;
+   int rc = 0;
 
/* check for other users first */
if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
diff --git a/arch/x86/kernel/nmi_64.c b/arch/x86/kernel/nmi_64.c
index ce08111..b4f88d6 100644
--- a/arch/x86/kernel/nmi_64.c
+++ b/arch/x86/kernel/nmi_64.c
@@ -78,22 +78,22 @@ static __init void nmi_cpu_busy(void *data)
 }
 #endif
 
-int __init check_nmi_watchdog (void)
+int __init check_nmi_watchdog(void)
 {
-   int *counts;
+   int *prev_nmi_count;
int cpu;
 
-   if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DISABLED)) 
+   if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DISABLED))
return 0;
 
if (!atomic_read(&nmi_active))
return 0;
 
-   counts = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
-   if (!counts)
+   prev_nmi_count = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
+   if (!prev_nmi_count)
return -1;
 
-   printk(KERN_INFO "testing NMI watchdog ... ");
+   printk(KERN_INFO "Testing NMI watchdog ... ");
 
 #ifdef CONFIG_SMP
if (nmi_watchdog == NMI_LOCAL_APIC)
@@ -101,24 +101,24 @@ int __init check_nmi_watchdog (void)
 #endif
 
for (cpu = 0; cpu < NR_CPUS; cpu++)
-   counts[cpu] = cpu_pda(cpu)->__nmi_count;
+   prev_nmi_count[cpu] = cpu_pda(cpu)->__nmi_count;
local_irq_enable();
mdelay((20*1000)/nmi_hz); // wait 20 ticks
 
for_each_online_cpu(cpu) {
if (!per_cpu(wd_enabled, cpu))
cont
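
The CONFIG_SMP change in the commit message above (define the busy helper only on SMP builds and guard its call site, instead of compiling an empty function) can be illustrated with a small userspace sketch. This is only an assumption-laden model: MOCK_CONFIG_SMP, mock_nmi_cpu_busy() and mock_check_nmi_watchdog() are hypothetical stand-ins, not kernel symbols.

```c
/* Hypothetical stand-in for CONFIG_SMP; remove this line to model a UP build. */
#define MOCK_CONFIG_SMP 1

static int mock_busy_calls;	/* counts how often the busy helper ran */

#ifdef MOCK_CONFIG_SMP
/* Compiled only on "SMP" builds, like nmi_cpu_busy() after the patch. */
void mock_nmi_cpu_busy(void)
{
	mock_busy_calls++;
}
#endif

/* Returns the number of busy-helper calls made so far. */
int mock_check_nmi_watchdog(int use_local_apic)
{
	(void)use_local_apic;	/* unused when MOCK_CONFIG_SMP is not defined */
#ifdef MOCK_CONFIG_SMP
	/* The call site needs the same guard as the definition. */
	if (use_local_apic)
		mock_nmi_cpu_busy();
#endif
	return mock_busy_calls;
}
```

With the guard moved outside the function, a UP build no longer carries an empty helper, at the cost of a second #ifdef at the call site.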

[PATCH] x86: io_apic_64.c: remove unused config check

2007-11-09 Thread Hiroshi Shimamoto
CONFIG_IRQBALANCE doesn't exist on x86_64.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/io_apic_64.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/io_apic_64.c b/arch/x86/kernel/io_apic_64.c
index 953328b..04b90ce 100644
--- a/arch/x86/kernel/io_apic_64.c
+++ b/arch/x86/kernel/io_apic_64.c
@@ -1435,7 +1435,7 @@ static void ack_apic_level(unsigned int irq)
int do_unmask_irq = 0;
 
irq_complete_move(irq);
-#if defined(CONFIG_GENERIC_PENDING_IRQ) || defined(CONFIG_IRQBALANCE)
+#ifdef CONFIG_GENERIC_PENDING_IRQ
/* If we are moving the irq we need to mask it */
if (unlikely(irq_desc[irq].status & IRQ_MOVE_PENDING)) {
do_unmask_irq = 1;
-- 
1.5.3.4
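
As a rough check of the reasoning behind this one-line change: if the second macro is never defined, the old and new preprocessor guards select the same code. The sketch below assumes hypothetical macros MOCK_GENERIC_PENDING_IRQ and MOCK_IRQBALANCE (the latter deliberately left undefined); it is a userspace illustration, not kernel code.

```c
/* Hypothetical config macro; MOCK_IRQBALANCE is intentionally never defined,
 * mirroring the claim that CONFIG_IRQBALANCE does not exist on x86_64. */
#define MOCK_GENERIC_PENDING_IRQ 1

/* Guard as written before the patch. */
int old_guard_active(void)
{
#if defined(MOCK_GENERIC_PENDING_IRQ) || defined(MOCK_IRQBALANCE)
	return 1;
#else
	return 0;
#endif
}

/* Guard as written after the patch. */
int new_guard_active(void)
{
#ifdef MOCK_GENERIC_PENDING_IRQ
	return 1;
#else
	return 0;
#endif
}
```

Both functions return the same value whether or not MOCK_GENERIC_PENDING_IRQ is defined, as long as MOCK_IRQBALANCE stays undefined.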


Re: [PATCH] Revert x86: add lapic_shutdown for x86_64

2007-10-29 Thread Hiroshi Shimamoto
Arjan van de Ven wrote:
> On Mon, 29 Oct 2007 15:39:46 -0700
> Hiroshi Shimamoto <[EMAIL PROTECTED]> wrote:
> 
>> lapic_shutdown is useless on x86_64.
>>
> 
>  but since the goal is to get apic_32.c and apic_64.c to be more
> converging (to the point of becoming the same file)... isn't your patch
> going in the opposite direction?
> 
Hmm, I'm not sure that this revert affects x86 unification.
Vivek said that we probably don't have to introduce lapic_shutdown() for 64bit.
So I submitted this patch, which reverts my earlier patch; that one had been
applied before his comment arrived.

Thanks
Hiroshi Shimamoto


Re: [PATCH 1/3] x86: add lapic_shutdown for x86_64

2007-10-29 Thread Hiroshi Shimamoto
Eric W. Biederman wrote:
> Hiroshi Shimamoto <[EMAIL PROTECTED]> writes:
> 
>>> Do we really have to introduce this function for 64bit? I remember some
>>> issues were faced on i386 w.r.t kernel enabling the LAPIC against the
>>> wishes of BIOS hence kernel was disabling it while shutting down. No
>>> such problems were reported for x86_64 hence this function existed only
>>> for i386.
>> Thanks for the comment. I didn't know about those issues, so I simply added
>> this function for unification.
>>
>>> If that is the case, probably we don't have to introduce lapic_shutdown()
>>> for x86_64. Instead call lapic_shutdown() for X86_32, and disable_local_APIC()
>>> otherwise?
>> I will do that. I was wondering which approach was better when posting these patches.
> 
> I'm a little concerned here.  This sounds like forced unification.
> If we can't clean up the infrastructure so things are obviously better
> and cleanly factored for both architectures we should not unify the files.
> 
> As a general principle I would rather have two cruddy files side by
> side than one super cruddy file.
> 
> So for unification I suggest finally fixing this right and taking the
> apics completely out of the kexec on panic path.

Thanks for the suggestion.
But it's hard for me to picture how to do that.
I'll think about it.

Thanks
Hiroshi Shimamoto


[PATCH] Revert x86: add lapic_shutdown for x86_64

2007-10-29 Thread Hiroshi Shimamoto
lapic_shutdown is useless on x86_64.

Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
---
 arch/x86/kernel/apic_64.c |   14 --
 arch/x86/kernel/crash.c   |5 +
 include/asm-x86/apic_64.h |1 -
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index f28ccb5..f47bc49 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -287,20 +287,6 @@ void disable_local_APIC(void)
apic_write(APIC_SPIV, value);
 }
 
-void lapic_shutdown(void)
-{
-   unsigned long flags;
-
-   if (!cpu_has_apic)
-   return;
-
-   local_irq_save(flags);
-
-   disable_local_APIC();
-
-   local_irq_restore(flags);
-}
-
 /*
  * This is to verify that we're looking at a real local APIC.
  * Check these against your board if the CPUs aren't getting
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8bb482f..79a5a25 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -136,7 +136,12 @@ void machine_crash_shutdown(struct pt_regs *regs)
/* Make a note of crashing cpu. Will be used in NMI callback.*/
crashing_cpu = safe_smp_processor_id();
nmi_shootdown_cpus();
+#ifdef CONFIG_X86_32
lapic_shutdown();
+#else
+   if (cpu_has_apic)
+   disable_local_APIC();
+#endif
 #if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
 #endif
diff --git a/include/asm-x86/apic_64.h b/include/asm-x86/apic_64.h
index 2747a11..3c8f21e 100644
--- a/include/asm-x86/apic_64.h
+++ b/include/asm-x86/apic_64.h
@@ -69,7 +69,6 @@ extern void clear_local_APIC (void);
 extern void connect_bsp_APIC (void);
 extern void disconnect_bsp_APIC (int virt_wire_setup);
 extern void disable_local_APIC (void);
-extern void lapic_shutdown (void);
 extern int verify_local_APIC (void);
 extern void cache_APIC_registers (void);
 extern void sync_Arb_IDs (void);
-- 
1.5.3.4



Re: [PATCH 0/3] x86: unify crash_32/64.c

2007-10-26 Thread Hiroshi Shimamoto
Hiroshi Shimamoto wrote:
> Thomas Gleixner wrote:
>> On Fri, 26 Oct 2007, Hiroshi Shimamoto wrote:
>>
>> Added Venki to CC
>>
>>>> I'm now testing crash on 32bit, but there is an issue before
>>>> applying the patches. My machine stopped at checking 'hlt'
>>>> after kexec, showing the message below.
>>>>
>>>> CPU: Intel(R) Xeon(TM) CPU 3.80GHz stepping 0a
>>>> Checking 'hlt' instruction...
>>>>
>>>> v2.6.23.1 works fine for 1st kernel.
>>>> I'm investigating it..
>>> I found that the following patch makes my machine stop.
>>> bfe0c1cc6456bba1f4e3cc1fe29c0ea578ac763a
>>> x86: HPET force enable for ICH5
>>>
>>> It means that after this patch is applied, HPET is enabled
>>> automatically on 1st kernel and after crash/kexec the 2nd
>>> kernel stopped at checking 'hlt'.
>>>
>>> I also tested the latest kernel (2.6.24-rc1-gec3b67c1).
>>> Boot parameter "nohpet" resolves this issue and kdump
>>> works well on 32bit.
>>> So I guess HPET affects this.
>>> But I don't know why 64bit kernel with HPET is OK.
>> Hmm. Does the 64 bit code shutdown HPET and restore the IRQ routing to
>> PIT and 32 bit is missing this ?
> 
> Sorry, I'm not sure how I can get this information.
> Can you please tell me what I should do?
> I'll continue to dig into the issue.

I attached the .config files and console logs.
config32/64 are for the 1st kernel, and cap32/64 are for the 2nd (capture) kernel.
kdump1.log is a boot with nohpet on 32bit.
kdump2.log is a boot without nohpet on 32bit, where the 2nd kernel hangs.
kdump3.log is on 64bit. There the first kdump failed because noapic was
not given.

Thanks
Hiroshi Shimamoto


configs.tar.bz2
Description: Binary data


consolelog.tar.bz2
Description: Binary data


Re: [PATCH 0/3] x86: unify crash_32/64.c

2007-10-26 Thread Hiroshi Shimamoto
Thomas Gleixner wrote:
> On Fri, 26 Oct 2007, Hiroshi Shimamoto wrote:
> 
> Added Venki to CC
> 
>>> I'm now testing crash on 32bit, but there is an issue before
>>> applying the patches. My machine stopped at checking 'hlt'
>>> after kexec, showing the message below.
>>>
>>> CPU: Intel(R) Xeon(TM) CPU 3.80GHz stepping 0a
>>> Checking 'hlt' instruction...
>>>
>>> v2.6.23.1 works fine for 1st kernel.
>>> I'm investigating it..
>> I found that the following patch makes my machine stop.
>> bfe0c1cc6456bba1f4e3cc1fe29c0ea578ac763a
>> x86: HPET force enable for ICH5
>>
>> It means that after this patch is applied, HPET is enabled
>> automatically on 1st kernel and after crash/kexec the 2nd
>> kernel stopped at checking 'hlt'.
>>
>> I also tested the latest kernel (2.6.24-rc1-gec3b67c1).
>> Boot parameter "nohpet" resolves this issue and kdump
>> works well on 32bit.
>> So I guess HPET affects this.
>> But I don't know why 64bit kernel with HPET is OK.
> 
> Hmm. Does the 64 bit code shutdown HPET and restore the IRQ routing to
> PIT and 32 bit is missing this ?

Sorry, I'm not sure how I can get this information.
Can you please tell me what I should do?
I'll continue to dig into the issue.

I also get the following message.
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter

It appeared only on 64bit, when the 2nd kernel was booted without
the noapic boot parameter.


Thanks
Hiroshi Shimamoto


Re: [PATCH 0/3] x86: unify crash_32/64.c

2007-10-26 Thread Hiroshi Shimamoto
> I'm now testing crash on 32bit, but there is an issue before
> applying the patches. My machine stopped at checking 'hlt'
> after kexec, showing the message below.
> 
> CPU: Intel(R) Xeon(TM) CPU 3.80GHz stepping 0a
> Checking 'hlt' instruction...
> 
> v2.6.23.1 works fine for 1st kernel.
> I'm investigating it..

I found that the following patch makes my machine stop.
bfe0c1cc6456bba1f4e3cc1fe29c0ea578ac763a
x86: HPET force enable for ICH5

It means that after this patch is applied, HPET is enabled
automatically on 1st kernel and after crash/kexec the 2nd
kernel stopped at checking 'hlt'.

I also tested the latest kernel (2.6.24-rc1-gec3b67c1).
Boot parameter "nohpet" resolves this issue and kdump
works well on 32bit.
So I guess HPET affects this.
But I don't know why 64bit kernel with HPET is OK.

Thanks
Hiroshi Shimamoto


Re: [PATCH 0/3] x86: unify crash_32/64.c

2007-10-25 Thread Hiroshi Shimamoto
Hiroshi Shimamoto wrote:
> Vivek Goyal wrote:
>> On Fri, Oct 19, 2007 at 06:18:27PM -0700, Hiroshi Shimamoto wrote:
>>> Hi,
>>>
>>> I made patches to unify crash_32/64.c.
>>> There are three patches;
>>> 1. add lapic_shutdown for x86_64
>>> 2. add safe_smp_processor_id for x86_64
>>> 3. unify crash_32/64.c
>>>
>>> I'm not sure that it's good to split to these patches.
>>>
>>> I've compiled on both of 32bit and 64bit, and tested
>>> kdump on 64bit.
>>>
>> Hi Hiroshi,
>>
>> Thanks for the patches. Can you please also test it on 32bit to make
>> sure nothing is broken.
> 
> Okay, I'll test it on 32bit.
> A build problem has already been found on 32bit.
> 
I'm now testing crash on 32bit, but there is an issue before
applying the patches. My machine stopped at checking 'hlt'
after kexec, showing the message below.

CPU: Intel(R) Xeon(TM) CPU 3.80GHz stepping 0a
Checking 'hlt' instruction...

v2.6.23.1 works fine for 1st kernel.
I'm investigating it..

Thanks
Hiroshi Shimamoto


Re: [PATCH 1/3] x86: add lapic_shutdown for x86_64

2007-10-24 Thread Hiroshi Shimamoto
Vivek Goyal wrote:
> On Fri, Oct 19, 2007 at 06:21:11PM -0700, Hiroshi Shimamoto wrote:
>> From: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>>
>> Signed-off-by: Hiroshi Shimamoto <[EMAIL PROTECTED]>
>> ---
>>  arch/x86/kernel/apic_64.c |   14 ++
>>  include/asm-x86/apic_64.h |1 +
>>  2 files changed, 15 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
>> index f47bc49..f28ccb5 100644
>> --- a/arch/x86/kernel/apic_64.c
>> +++ b/arch/x86/kernel/apic_64.c
>> @@ -287,6 +287,20 @@ void disable_local_APIC(void)
>>  apic_write(APIC_SPIV, value);
>>  }
>>
>> +void lapic_shutdown(void)
>> +{
>> +unsigned long flags;
>> +
>> +if (!cpu_has_apic)
>> +return;
>> +
>> +local_irq_save(flags);
>> +
>> +disable_local_APIC();
>> +
>> +local_irq_restore(flags);
>> +}
>> +
>>  /*
> 
> Do we really have to introduce this function for 64bit? I remember some
> issues were faced on i386 w.r.t kernel enabling the LAPIC against the
> wishes of BIOS hence kernel was disabling it while shutting down. No
> such problems were reported for x86_64 hence this function existed only
> for i386.

Thanks for the comment. I didn't know about those issues, so I simply added
this function for unification.

> If that is the case, probably we don't have to introduce lapic_shutdown()
> for x86_64. Instead call lapic_shutdown() for X86_32, and disable_local_APIC()
> otherwise?

I will do that. I was wondering which approach was better when posting these patches.

Thanks
Hiroshi Shimamoto
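
The arrangement agreed on above (call lapic_shutdown() on X86_32, otherwise check for an APIC and call disable_local_APIC() directly) can be sketched in userspace as follows. This is a minimal model under stated assumptions: MOCK_X86_32 and every mock_ name are hypothetical stand-ins for the kernel's config symbol, cpu_has_apic, and the real APIC routines, and interrupt save/restore is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real kernel symbols. */
static bool mock_cpu_has_apic = true;
static int mock_apic_disable_calls;
static int mock_irq_save_calls;

/* Models disable_local_APIC(): callers must have checked for an APIC. */
void mock_disable_local_APIC(void)
{
	assert(mock_cpu_has_apic);
	mock_apic_disable_calls++;
}

/* Models the 32-bit lapic_shutdown(): guard, then disable with IRQs off. */
void mock_lapic_shutdown(void)
{
	if (!mock_cpu_has_apic)
		return;
	mock_irq_save_calls++;		/* stands in for local_irq_save(flags) */
	mock_disable_local_APIC();
	/* local_irq_restore(flags) would follow here */
}

/* Models the APIC step of the crash shutdown path. */
void mock_crash_shutdown_apic(void)
{
#ifdef MOCK_X86_32
	mock_lapic_shutdown();
#else
	if (mock_cpu_has_apic)
		mock_disable_local_APIC();
#endif
}
```

Either branch ends up disabling the local APIC at most once per call, and neither touches it when no APIC is present; the 64-bit branch simply skips the interrupt save/restore wrapper.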


Re: [PATCH 0/3] x86: unify crash_32/64.c

2007-10-24 Thread Hiroshi Shimamoto
Vivek Goyal wrote:
> On Fri, Oct 19, 2007 at 06:18:27PM -0700, Hiroshi Shimamoto wrote:
>> Hi,
>>
>> I made patches to unify crash_32/64.c.
>> There are three patches;
>> 1. add lapic_shutdown for x86_64
>> 2. add safe_smp_processor_id for x86_64
>> 3. unify crash_32/64.c
>>
>> I'm not sure that it's good to split to these patches.
>>
>> I've compiled on both of 32bit and 64bit, and tested
>> kdump on 64bit.
>>
> 
> Hi Hiroshi,
> 
> Thanks for the patches. Can you please also test it on 32bit to make
> sure nothing is broken.

Okay, I'll test it on 32bit.
A build problem has already been found on 32bit.

Thanks,
Hiroshi

