date:20140610

Re: [PATCH] ring-buffer: Fix polling on trace_pipe

2014-06-10 Thread Martin Lau

Hi Steve,

Attached is the modified test program.  Here is the sample output:

localhost ~ # ./ftrace-test-epoll-kafai
   <...>-1857  [000] ...1   720.174295: tracing_mark_write: some data
1857: waitting for more data..
1858: written more data


Thanks,
--Martin

On Tue, Jun 10, 2014 at 11:49:15AM -0400, Steven Rostedt wrote:
> On Mon, 9 Jun 2014 23:06:42 -0700
> Martin Lau  wrote:
> 
> > ring_buffer_poll_wait() should always put the poll_table to its wait_queue
> > even when there is immediate data available.  Otherwise, the following epoll and
> > read sequence will eventually hang forever:
> > 
> > 1. Put some data to make the trace_pipe ring_buffer read ready first
> > 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
> > 3. epoll_wait()
> > 4. read(trace_pipe_fd) till EAGAIN
> > 5. Add some more data to the trace_pipe ring_buffer
> > 6. epoll_wait() -> this epoll_wait() will block forever
> > 
> > ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
> >   ring_buffer_poll_wait() returns immediately without adding poll_table,
> >   which has poll_table->_qproc pointing to ep_poll_callback(), to its
> >   wait_queue.
> > ~ During the epoll_wait() call in step 3 and step 6,
> >   ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
> >   because the poll_table->_qproc is NULL and it is how epoll works.
> > ~ When there is new data available in step 6, ring_buffer does not know
> >   it has to call ep_poll_callback() because it is not in its wait queue.
> >   Hence, block forever.
> > 
> > Other poll implementations seem to call poll_wait() unconditionally as the
> > very first thing to do.  For example, tcp_poll() in tcp.c.
> 
> I'm trying to see the effect of this bug, but can't seem to reproduce
> it. Maybe I did something wrong. Attached is a test program I wrote
> trying to follow your instructions. I don't use epoll, so perhaps I
> didn't use it correctly.
> 
> Can you modify it to show me the problem this is trying to fix. That
> is, without this patch it hangs, but with the patch it does not.
> 
> Thanks!
> 
> -- Steve
> 
> > 
> > Signed-off-by: Martin Lau 
> > ---
> >  kernel/trace/ring_buffer.c | 4 
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index fd12cc5..a6e64e8 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -613,10 +613,6 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
> > struct ring_buffer_per_cpu *cpu_buffer;
> > struct rb_irq_work *work;
> >  
> > -   if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
> > -   (cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
> > -   return POLLIN | POLLRDNORM;
> > -
> > if (cpu == RING_BUFFER_ALL_CPUS)
> > work = >irq_work;
> > else {
> 
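The registration problem described in the patch can be illustrated with a small userspace model. This is not kernel code: the `model_*` types and functions are hypothetical stand-ins for the poll_table, poll_wait() and a driver's poll method. The only behaviour modelled is the one described above, namely that epoll supplies a non-NULL `_qproc` only during EPOLL_CTL_ADD, so a poll method that skips registration on that pass never gets another chance.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's poll machinery (hypothetical names). */
struct model_wait_queue {
	int nr_entries;  /* callbacks queued, e.g. ep_poll_callback() */
};

struct model_poll_table {
	/* Set only while epoll registers the file (EPOLL_CTL_ADD);
	 * later epoll_wait() passes use a table whose qproc is NULL. */
	void (*qproc)(struct model_wait_queue *wq);
};

static void model_queue_callback(struct model_wait_queue *wq)
{
	assert(wq);
	wq->nr_entries++;
}

/* Like poll_wait(): only queues something when qproc is set. */
static void model_poll_wait(struct model_wait_queue *wq,
			    struct model_poll_table *pt)
{
	if (pt && pt->qproc)
		pt->qproc(wq);
}

#define MODEL_POLLIN 1

/* Pre-patch shape: returns early when data is already available. */
int model_poll_early_return(struct model_wait_queue *wq, bool has_data,
			    struct model_poll_table *pt)
{
	if (has_data)
		return MODEL_POLLIN;  /* model_poll_wait() is skipped */
	model_poll_wait(wq, pt);
	return 0;
}

/* Patched shape: always register first, then report readiness. */
int model_poll_always_wait(struct model_wait_queue *wq, bool has_data,
			   struct model_poll_table *pt)
{
	model_poll_wait(wq, pt);
	return has_data ? MODEL_POLLIN : 0;
}

/* Runs the EPOLL_CTL_ADD pass with data already present (steps 1-2)
 * and reports how many callbacks ended up on the wait queue. */
int model_waiters_after_add(bool patched)
{
	struct model_poll_table add_pass = { .qproc = model_queue_callback };
	struct model_wait_queue wq = { 0 };

	if (patched)
		model_poll_always_wait(&wq, true, &add_pass);
	else
		model_poll_early_return(&wq, true, &add_pass);
	return wq.nr_entries;
}
```

With data already present, the early-return shape leaves the wait queue empty after the ADD pass, so no later wakeup can reach epoll; registering first, as the patch does, avoids that.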

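#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/epoll.h>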

static const char *debugfs_list[] = {
	"/debug/tracing",
	"/sys/kernel/debug/tracing",
	"/d/tracing",
	NULL,
};

static const char *debugfs;
static int markfd;
static int trace_pipe_fd;

static const char *find_debugfs(void)
{
	struct stat st;
	int i;
	int r;

	for (i = 0; debugfs_list[i]; i++) {
		r = stat(debugfs_list[i], &st);
		if (r < 0)
			continue;
		if (S_ISDIR(st.st_mode))
			return debugfs_list[i];
	}
	return NULL;
}

static char *make_path(const char *file)
{
	char *path;
	int size;

	size = strlen(debugfs) + strlen(file) + 2;
	path = malloc(size);
	if (!path) {
		perror("malloc");
		exit(-1);
	}
	sprintf(path, "%s/%s", debugfs, file);
	return path;
}

static void mark_write(const char *str)
{
	write(markfd, str, strlen(str));
}

static void read_trace_pipe(void)
{
	char buf[1024];
	int r;

	while ((r = read(trace_pipe_fd, buf, 1024)) > 0)
		printf("%.*s", r, buf);
}

int main(int argc, char **argv)
{
	struct epoll_event ee;
	char *marker;
	char *pipe;
	int efd;
	int ret;
	pid_t dwrt_pid;

	debugfs = find_debugfs();
	if (!debugfs) {
		fprintf(stderr, "Could not find debugfs\n");
		exit(-1);
	}

	marker = make_path("trace_marker");
	pipe = make_path("trace_pipe");

	markfd = open(marker, O_WRONLY);
	if (markfd < 0) {
		perror("marker");
		exit(-1);
	}
	trace_pipe_fd = open(pipe, O_RDONLY|O_NONBLOCK);
	if (trace_pipe_fd < 0) {
		perror("trace_pipe");
		exit(-1);
	}

	efd = epoll_create(1);
	if (efd < 0) {
		perror("epoll_create");
		exit(-1);

Possible netns creation and execution performance/scalability regression since v3.8 due to rcu callbacks being offloaded to multiple cpus

2014-06-10 Thread Rafael Tinoco

Paul E. McKenney, Eric Biederman, David Miller (and/or anyone else interested):

It was brought to my attention that netns creation/execution might
have suffered a scalability/performance regression after v3.8.

I would like you, or anyone interested, to review these charts/data
and check if there is something that could be discussed/said before I
move further.

The following script was used for all the tests and charts generation:


#!/bin/bash
IP=/sbin/ip

function add_fake_router_uuid() {
	j=`uuidgen`
	$IP netns add bar-${j}
	$IP netns exec bar-${j} $IP link set lo up
	$IP netns exec bar-${j} sysctl -w net.ipv4.ip_forward=1 > /dev/null
	k=`echo $j | cut -b -11`
	$IP link add qro-${k} type veth peer name qri-${k} netns bar-${j}
	$IP link add qgo-${k} type veth peer name qgi-${k} netns bar-${j}
}

for i in `seq 1 $1`; do
	if [ `expr $i % 250` -eq 0 ]; then
		echo "$i by `date +%s`"
	fi
	add_fake_router_uuid
done


This script measures how many "fake routers" are added per second (for
example, from the 0 to the 3000 router-creation mark). Using it together
with a git bisect on the kernel tree, I was led to one specific commit
causing the scalability/performance regression: #911af50 "rcu: Provide
compile-time control for no-CBs CPUs". Even though this change was
experimental at that point, it introduced a performance/scalability
regression (explained below) that still persists.
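The script prints a progress mark ("<count> by <epoch seconds>") every 250 routers, so a routers/sec figure can be derived from the gap between two consecutive marks. A minimal sketch of that arithmetic, assuming this is how the rates were computed (the chart-generation code is not shown in this message); `struct mark`, `parse_mark()` and `routers_per_sec()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* One progress line printed by the script: "<routers> by <epoch>". */
struct mark {
	long routers;
	long epoch;
};

/* Parses a line such as "250 by 1402400000"; returns 1 on success. */
int parse_mark(const char *line, struct mark *m)
{
	return sscanf(line, "%ld by %ld", &m->routers, &m->epoch) == 2;
}

/* Average creation rate between two marks, in routers per second.
 * Returns -1.0 for an empty interval: `date +%s` has one-second
 * resolution, so two marks can share a timestamp. */
double routers_per_sec(const struct mark *a, const struct mark *b)
{
	long dt = b->epoch - a->epoch;

	assert(b->routers >= a->routers);
	if (dt <= 0)
		return -1.0;
	return (double)(b->routers - a->routers) / (double)dt;
}
```

Because of the one-second timestamp resolution, rates at the high end (e.g. 250 netns/sec) rest on very short intervals and are correspondingly coarse.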

RCU-related code appeared to be responsible for the problem. So every
commit from tag v3.8 to master that changed any of these files:
"kernel/rcutree.c kernel/rcutree.h kernel/rcutree_plugin.h
include/trace/events/rcu.h include/linux/rcupdate.h" was checked out,
compiled and tested. The idea was to track performance regressions
during RCU development, if that was the case. In the worst case, if the
regression turned out not to be related to RCU, I would still have
chronological data to interpret.

All text below refers to 2 groups of charts generated during the study:


1) Kernel git tags from 3.8 to 3.14.
*** http://people.canonical.com/~inaddy/lp1328088/charts/250-tag.html ***

2) Kernel git commits for rcu development (111 commits) -> Clearly
shows regressions:
*** http://people.canonical.com/~inaddy/lp1328088/charts/250.html ***

Obs:

1) There is a general chart with 111 commits. With this chart you can
see the performance evolution/regression at each test mark. Test marks go
from 0 to 2500 and refer to "fake routers already created". Example:
throughput was 50 routers/sec at the 250 already-created mark and 30
routers/sec at the 1250 mark.

2) Clicking on a specific commit shows that commit's evolution from 0
to 2500 routers already created.


Since results differed depending on the number of cpus and on how the
no-cb cpus were configured, every measurement was taken with 3 kernel
configurations, on 1 and 4 cpus:


- CONFIG_RCU_NOCB_CPU (disabled): nocbno
- CONFIG_RCU_NOCB_CPU_ALL (enabled): nocball
- CONFIG_RCU_NOCB_CPU_NONE (enabled): nocbnone

Obs: For 1 cpu cases, nocbno, nocbnone and nocball behave the same (or
should), since with only 1 cpu there is no no-cb cpu.


Once the charts were generated, it was clear that NOCB_CPU_ALL (4 cpus)
affected the performance of the "fake routers" creation process, and
that this regression persists up to the upstream version. It was also
clear that, after commit #911af50, having more than 1 cpu does not
improve netns performance/scalability; it makes it worse.

#911af50

...
+#ifdef CONFIG_RCU_NOCB_CPU_ALL
+ pr_info("\tExperimental no-CBs for all CPUs\n");
+ cpumask_setall(rcu_nocb_mask);
+#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
...


Comparing the points that stand out (see charts):

#81e5949 - good
#911af50 - bad

From the script above, I was able to see that the following lines
cause the major impact on netns scalability/performance:

1) ip netns add -> huge performance regression:

 1 cpu: no regression
 4 cpus: regression for NOCB_CPU_ALL

 obs: regression from 250 netns/sec to 50 netns/sec at the 500 netns
already created mark

2) ip netns exec -> some performance regression

 1 cpu: no regression
 4 cpus: regression for NOCB_CPU_ALL

 obs: regression from 40 netns/sec (+1 exec per netns creation) to 20
netns/sec at the 500 netns created mark



FULL NOTE: http://people.canonical.com/~inaddy/lp1328088/

** Assumption: RCU callbacks being offloaded to multiple cpus
(cpumask_setall) caused a regression in
copy_net_ns<-create_new_namespaces or unshare(CLONE_NEWNET).

** Next Steps: I'll probably start tracing netns creation/execution with the function_graph tracer.
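A minimal sketch of that next step, assuming the standard ftrace control files (`tracing_on`, `set_graph_function`, `current_tracer`) under the debugfs tracing directory (e.g. /sys/kernel/debug/tracing). The helper names are hypothetical, and `copy_net_ns` is only an example target taken from the assumption above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes "<val>\n" to <dir>/<file>, like `echo val > dir/file`.
 * O_CREAT only matters when pointed at an ordinary directory for
 * testing; on debugfs the tracing files already exist. */
static int write_tracing_file(const char *dir, const char *file, const char *val)
{
	char path[512], line[256];
	int fd, n, ok;

	if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
		return -1;
	n = snprintf(line, sizeof(line), "%s\n", val);
	if (n < 0 || n >= (int)sizeof(line))
		return -1;
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	ok = write(fd, line, (size_t)n) == n;
	close(fd);
	return ok ? 0 : -1;
}

/* Reads up to n-1 bytes of <dir>/<file> into buf, NUL-terminated.
 * Returns the number of bytes read, or -1. */
int read_tracing_file(const char *dir, const char *file, char *buf, size_t n)
{
	char path[512];
	ssize_t r;
	int fd;

	assert(n > 0);
	if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	r = read(fd, buf, n - 1);
	close(fd);
	if (r < 0)
		return -1;
	buf[r] = '\0';
	return (int)r;
}

/* Hypothetical helper: graph-trace only the calls below `func`. */
int setup_graph_trace(const char *dir, const char *func)
{
	if (write_tracing_file(dir, "tracing_on", "0") < 0 ||
	    write_tracing_file(dir, "set_graph_function", func) < 0 ||
	    write_tracing_file(dir, "current_tracer", "function_graph") < 0)
		return -1;
	return write_tracing_file(dir, "tracing_on", "1");
}
```

Tracing is switched off while the filter and tracer are changed, so the captured graph starts only once both are in place.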
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: recvmmsg/sendmmsg result types inconsistent, integer overflows?

2014-06-10 Thread Eric Dumazet

On Wed, 2014-06-11 at 07:24 +0200, Mike Galbraith wrote:
> (CCs network wizard hangout)
> 
> On Wed, 2014-06-11 at 00:12 -0400, Rich Felker wrote: 
> > While looking to add support for the recvmmsg and sendmmsg syscalls in
> > musl libc, I ran into some disturbing findings on the kernel side. In
> > the struct mmsghdr, the field where the result for each message is
> > stored has type int, which is inconsistent with the return type
> > ssize_t of recvmsg/sendmsg. So I tried to track down what happens when
> > the result is or would be larger than 2GB, and quickly found an
> > explanation for why the type in the structure was defined wrong:
> > internally, the kernel uses int as the return type for recvmsg and
> > sendmsg. Oops.
> > 
> > A bit more RTFS'ing brought me to tcp_sendmsg in net/ipv4/tcp.c (I
> > figured let's look at a stream-based protocol, since datagrams can
> > likely never be that big for any existing protocol), and as far as I
> > can tell, it's haphazardly mixing int and size_t with no checks for
> > overflows. I looked for anywhere the kernel might try to verify before
> > starting that the sum of the lengths of all the iovec components
> > doesn't overflow INT_MAX or even SIZE_MAX, but didn't find any such
> > checks.
> > 
> > Is there some magic that makes this all safe, or is this a big mess of
> > possibly-security-relevant bugs?
> > 
> > Rich
> 


See commit 8acfe468b0384e834a303f08ebc4953d72fb690a
("net: Limit socket I/O iovec total length to INT_MAX.")

(or grep for verify_iovec() )
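Judging from that commit's title, the kernel caps the total iovec length at INT_MAX before the per-protocol handlers see it, which keeps the int return paths from overflowing. The sketch below shows one way to apply such a cap in userspace, by shortening the element that would cross INT_MAX and zeroing the rest; that truncation strategy and the name `clamp_iov_total()` are assumptions for illustration, not a copy of verify_iovec().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sums the iovec lengths, capping the running total at INT_MAX by
 * shortening the element that would cross it (later elements end up
 * with length 0). The returned total therefore always fits in int. */
int clamp_iov_total(struct iovec *iov, size_t iovcnt)
{
	int total = 0;
	size_t i;

	assert(iov != NULL || iovcnt == 0);
	for (i = 0; i < iovcnt; i++) {
		size_t len = iov[i].iov_len;
		size_t room = (size_t)(INT_MAX - total);

		if (len > room) {
			len = room;
			iov[i].iov_len = len;
		}
		total += (int)len;
	}
	return total;
}
```

Clamping rather than rejecting mirrors short-write semantics: the caller sees a partial transfer instead of an error.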



Re: [PATCH][RFC] err.h: silence sparse warning: dereference of noderef expression

2014-06-10 Thread Dan Carpenter

On Tue, Jun 10, 2014 at 05:38:49PM -0400, Jeff Layton wrote:
> From: Jeff Layton 
> 
> Lately, when I do a make with C=1, I get *tons* of these warnings:
> 
> include/linux/err.h:35:16: warning: dereference of noderef expression
> include/linux/err.h:30:23: warning: dereference of noderef expression

Which version of Sparse, which version of the kernel and which .c file
can I compile to reproduce this?

I built fs/cifs/ and I didn't see the sparse warning.

regards,
dan carpenter


Re: Mismatch in gmch_pfit.lvds_border_bits on EeePC 900

2014-06-10 Thread Sitsofe Wheeler

On Tue, Jun 10, 2014 at 08:26:55AM +0200, Daniel Vetter wrote:
> On Sun, Jun 08, 2014 at 10:30:15PM +0100, Sitsofe Wheeler wrote:
> > With a tree that is close to 3.15 final I'm regularly seeing the
> > following on my EeePC 900 when starting ioquake3:
> > 
> > [drm:intel_pipe_config_compare] *ERROR* mismatch in 
> > gmch_pfit.lvds_border_bits (expected 32768, found 0)
> 
> Hm, I've thought we've fixed that by now. Alas, no :(
> 
> Can you please add drm.debug=0xe to your kernel cmdline, reproduce the
> issue and attach the entire dmesg? Please make sure it contains everything
> since boot-up so that we can reconstruct the state properly (might need to
> grab it from logfiles if dmesg is cut off).

Please find kern.log.gz attached.

> Also, do you have any ideas when you reproduce this? Anything that changes
> the lvds output could be relevant ...

Doing
xrandr -s 800x600
xrandr -s 0

was enough to provoke the messages in the attached log.

-- 
Sitsofe | http://sucs.org/~sits/


kern.log.gz
Description: Binary data

Re: [PATCH] staging: usbip: stub_main.c: Cleaning up missing null-terminate after strncpy call

2014-06-10 Thread Dan Carpenter

On Tue, Jun 10, 2014 at 10:48:35PM +0200, Rickard Strandqvist wrote:
> Hi
> 
> True!
> Sorry  :-(
> 
> But then one would either use strcpy outright.
> 
> Or use strlcpy then the code would be:
> 
> /* strlcpy() handles not include \0 */
> len = strlcpy(busid, buf + 4, BUSID_SIZE);
> 
> /* busid needs to include \0 termination */
> if (!(len < BUSID_SIZE))

I don't like this condition.  Just say (len >= BUSID_SIZE).  The
comments here are obvious and could be left out.

> return -EINVAL;

I don't have strong feelings about a cleanup patch.  But I think that
cppcheck is not being very sophisticated here with the NUL termination
warning so we should not go out of our way to try to silence the
warning.

regards,
dan carpenter
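The truncation check Dan suggests relies on strlcpy() returning the length of the source string, so any value >= the buffer size means the copy was cut short. A userspace sketch of that pattern; `my_strlcpy()` is a local stand-in (glibc did not traditionally ship strlcpy()), `copy_busid()` is a hypothetical wrapper, and the BUSID_SIZE value of 32 is an assumption for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define BUSID_SIZE 32  /* assumed value for this sketch */

/* strlcpy()-style copy: copies at most size-1 bytes, NUL-terminates
 * when size > 0, and returns strlen(src), so truncation shows up as
 * a return value >= size. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(src);
	len = strlen(src);
	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Copies a bus id, rejecting input that does not fit with its NUL. */
int copy_busid(char busid[BUSID_SIZE], const char *src)
{
	if (my_strlcpy(busid, src, BUSID_SIZE) >= BUSID_SIZE)
		return -EINVAL;
	return 0;
}
```

A 31-character id fits exactly (31 bytes plus NUL); a 32-character one is rejected.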


Re: [PATCH v2] vmalloc: use rcu list iterator to reduce vmap_area_lock contention

2014-06-10 Thread Eric Dumazet

On Tue, 2014-06-10 at 23:32 -0400, Peter Hurley wrote:

> While rcu list traversal over the vmap_area_list is safe, this may
> arrive at different results than the spinlocked version. The rcu list
> traversal version will not be a 'snapshot' of a single, valid instant
> of the entire vmap_area_list, but rather a potential amalgam of
> different list states.
> 
> This is because the vmap_area_list can continue to change during
> list traversal.


As soon as we exit from get_vmalloc_info(), information can be obsolete
anyway, especially if we held a spinlock for the whole list traversal.

So using the spinlock is certainly not protecting anything in this
regard.




[PATCH 2/3] ARM: dts: Update the parent for Audss clocks in Exynos5420

2014-06-10 Thread Tushar Behera

Currently CLK_FOUT_EPLL is set as one of the parents of the AUDSS mux.
As per the user manual, it should be CLK_MAU_EPLL.

The problem surfaced when the bootloader on the Peach-pit board set
the EPLL clock as the parent of the AUDSS mux. While booting the kernel,
we used to get a system hang during late boot if CLK_MAU_EPLL was
disabled.

Signed-off-by: Tushar Behera 
Signed-off-by: Shaik Ameer Basha 
Reported-by: Kevin Hilman 
---
 arch/arm/boot/dts/exynos5420.dtsi |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
index e385322..79e9119 100644
--- a/arch/arm/boot/dts/exynos5420.dtsi
+++ b/arch/arm/boot/dts/exynos5420.dtsi
@@ -167,7 +167,7 @@
compatible = "samsung,exynos5420-audss-clock";
reg = <0x0381 0x0C>;
#clock-cells = <1>;
-   clocks = < CLK_FIN_PLL>, < CLK_FOUT_EPLL>,
+   clocks = < CLK_FIN_PLL>, < CLK_MAU_EPLL>,
 < CLK_SCLK_MAUDIO0>, < CLK_SCLK_MAUPCM0>;
clock-names = "pll_ref", "pll_in", "sclk_audio", "sclk_pcm_in";
};
-- 
1.7.9.5


[PATCH 3/3] ARM: dts: Enable audio support for Peach-pi board

2014-06-10 Thread Tushar Behera

Peach-pi board has MAX98090 audio codec connected on HSI2C-7 bus.

Signed-off-by: Tushar Behera 
---
 arch/arm/boot/dts/exynos5800-peach-pi.dts |   31 +
 1 file changed, 31 insertions(+)

diff --git a/arch/arm/boot/dts/exynos5800-peach-pi.dts b/arch/arm/boot/dts/exynos5800-peach-pi.dts
index f3af207..76f5966 100644
--- a/arch/arm/boot/dts/exynos5800-peach-pi.dts
+++ b/arch/arm/boot/dts/exynos5800-peach-pi.dts
@@ -78,9 +78,27 @@
pinctrl-0 = <_vbus_en>;
enable-active-high;
};
+
+   sound {
+   compatible = "google,snow-audio-max98090";
+
+   samsung,i2s-controller = <>;
+   samsung,audio-codec = <>;
+   };
+};
+
+ {
+   status = "okay";
 };
 
 _0 {
+   max98090_irq: max98090-irq {
+   samsung,pins = "gpx0-2";
+   samsung,pin-function = <0>;
+   samsung,pin-pud = <0>;
+   samsung,pin-drv = <0>;
+   };
+
tpm_irq: tpm-irq {
samsung,pins = "gpx1-0";
samsung,pin-function = <0>;
@@ -207,6 +225,19 @@
samsung,invert-vclk;
 };
 
+_7 {
+   status = "okay";
+
+   max98090: codec@10 {
+   compatible = "maxim,max98090";
+   reg = <0x10>;
+   interrupts = <2 0>;
+   interrupt-parent = <>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_irq>;
+   };
+};
+
 _9 {
status = "okay";
clock-frequency = <40>;
-- 
1.7.9.5


[PATCH 0/3] Fix boot-hang on Peach-pit and Enable audio

2014-06-10 Thread Tushar Behera

With next-20140610, the Peach-pit/Peach-pi boards hang during boot if we
run 'sound init' in u-boot. The following patches fix the issue.
While at it, also enable audio support for the Peach-pi board.

How to test audio on Peach-pi:
* On top of exynos_defconfig, enable SND_SOC_SNOW and PL330_DMA.
* Run 'sound init' at u-boot prompt.

Tushar Behera (3):
  clk: exynos-audss: Keep the parent of mout_audss always enabled
  ARM: dts: Update the parent for Audss clocks in Exynos5420
  ARM: dts: Enable audio support for Peach-pi board

 arch/arm/boot/dts/exynos5420.dtsi |2 +-
 arch/arm/boot/dts/exynos5800-peach-pi.dts |   31 +
 drivers/clk/samsung/clk-exynos-audss.c|   17 +---
 3 files changed, 46 insertions(+), 4 deletions(-)

-- 
1.7.9.5


[PATCH 1/3] clk: exynos-audss: Keep the parent of mout_audss always enabled

2014-06-10 Thread Tushar Behera

When the output clock of the AUDSS mux is disabled, we get a kernel
oops while doing a clk_get() on other clocks provided by AUDSS. Though
the user manual doesn't specify this dependency, we came across this
issue while disabling the parent of the AUDSS mux clocks.

Keeping the parents of AUDSS mux always enabled fixes this issue.

Signed-off-by: Tushar Behera 
Signed-off-by: Shaik Ameer Basha 
---
 drivers/clk/samsung/clk-exynos-audss.c |   17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/samsung/clk-exynos-audss.c b/drivers/clk/samsung/clk-exynos-audss.c
index 13eae14c..1542f30 100644
--- a/drivers/clk/samsung/clk-exynos-audss.c
+++ b/drivers/clk/samsung/clk-exynos-audss.c
@@ -30,6 +30,8 @@ static struct clk **clk_table;
 static void __iomem *reg_base;
 static struct clk_onecell_data clk_data;
 
+static struct clk *pll_ref, *pll_in;
+
 #define ASS_CLK_SRC 0x0
 #define ASS_CLK_DIV 0x4
 #define ASS_CLK_GATE 0x8
@@ -83,7 +85,7 @@ static int exynos_audss_clk_probe(struct platform_device *pdev)
const char *mout_audss_p[] = {"fin_pll", "fout_epll"};
const char *mout_i2s_p[] = {"mout_audss", "cdclk0", "sclk_audio0"};
const char *sclk_pcm_p = "sclk_pcm0";
-   struct clk *pll_ref, *pll_in, *cdclk, *sclk_audio, *sclk_pcm_in;
+   struct clk *cdclk, *sclk_audio, *sclk_pcm_in;
const struct of_device_id *match;
enum exynos_audss_clk_type variant;
 
@@ -113,10 +115,14 @@ static int exynos_audss_clk_probe(struct platform_device *pdev)
 
pll_ref = devm_clk_get(>dev, "pll_ref");
pll_in = devm_clk_get(>dev, "pll_in");
-   if (!IS_ERR(pll_ref))
+   if (!IS_ERR(pll_ref)) {
mout_audss_p[0] = __clk_get_name(pll_ref);
-   if (!IS_ERR(pll_in))
+   clk_prepare_enable(pll_ref);
+   }
+   if (!IS_ERR(pll_in)) {
mout_audss_p[1] = __clk_get_name(pll_in);
+   clk_prepare_enable(pll_in);
+   }
clk_table[EXYNOS_MOUT_AUDSS] = clk_register_mux(NULL, "mout_audss",
mout_audss_p, ARRAY_SIZE(mout_audss_p),
CLK_SET_RATE_NO_REPARENT,
@@ -217,6 +223,11 @@ static int exynos_audss_clk_remove(struct platform_device *pdev)
clk_unregister(clk_table[i]);
}
 
+   if (!IS_ERR(pll_in))
+   clk_disable_unprepare(pll_in);
+   if (!IS_ERR(pll_ref))
+   clk_disable_unprepare(pll_ref);
+
return 0;
 }
 
-- 
1.7.9.5
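Why taking an enable reference on pll_ref/pll_in in probe keeps them running can be seen with a toy model of enable counting in which the first enable of a clock also enables its parent and the last disable releases it. This is a simplified illustration, not the common clock framework API; `struct toy_clk` and the `toy_clk_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy clock with an enable count and a single parent (hypothetical). */
struct toy_clk {
	struct toy_clk *parent;
	int enable_count;
};

void toy_clk_enable(struct toy_clk *c)
{
	/* The first enable of a clock also enables its parent chain. */
	if (c->enable_count++ == 0 && c->parent)
		toy_clk_enable(c->parent);
}

void toy_clk_disable(struct toy_clk *c)
{
	assert(c->enable_count > 0);  /* an unbalanced disable is a bug */
	/* The last disable of a clock releases its parent too. */
	if (--c->enable_count == 0 && c->parent)
		toy_clk_disable(c->parent);
}

bool toy_clk_is_on(const struct toy_clk *c)
{
	return c->enable_count > 0;
}
```

Without its own reference, the parent switches off as soon as the mux output is disabled; with the reference taken at probe time (and dropped in remove, as the patch does), it stays on.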


[GIT PULL] xfs: updates for 3.16-rc1

2014-06-10 Thread Dave Chinner

Hi Linus,

Can you please pull the changes from the tree below. There are lots of
changes all over the place in XFS; the main addition is a new on-disk
btree for tracking free inodes and the associated optimised allocator
rework to make use of it. Most of the rest of the changes are
cleanups or reworking of existing functionality, as well as various
bug fixes.

-Dave.

The following changes since commit d6d211db37e75de2ddc3a4f979038c40df7cc79c:

  Linux 3.15-rc5 (2014-05-09 13:10:52 -0700)

are available in the git repository at:

  git://oss.sgi.com/xfs/xfs.git tags/xfs-for-linus-3.16-rc1

for you to fetch changes up to 7691283d0561a350b7517be94818669fb5e3d910:

  Merge branch 'xfs-misc-fixes-3-for-3.16' into for-next (2014-06-10 07:32:56 +1000)



xfs: update for 3.16-rc1

This update contains:
o cleanup removing unused function args
o rework of the filestreams allocator to use dentry cache parent lookups
o new on-disk free inode btree and optimised inode allocator
o various bug fixes
o rework of internal attribute API
o cleanup of superblock feature bit support to remove historic cruft
o more fixes and minor cleanups
o added a new directory/attribute geometry abstraction
o yet more fixes and minor cleanups.


Brian Foster (11):
  xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers
  xfs: reserve v5 superblock read-only compat. feature bit for finobt
  xfs: support the XFS_BTNUM_FINOBT free inode btree type
  xfs: update inode allocation/free transaction reservations for finobt
  xfs: insert newly allocated inode chunks into the finobt
  xfs: use and update the finobt on inode allocation
  xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper
  xfs: update the finobt on inode free
  xfs: add finobt support to growfs
  xfs: report finobt status in fs geometry
  xfs: enable the finobt feature on v5 superblocks

Christoph Hellwig (15):
  xfs: don't try to use the filestream allocator for metadata allocations
  xfs: split xfs_bmap_btalloc_nullfb
  xfs: handle duplicate entries in xfs_mru_cache_insert
  xfs: embedd mru_elem into parent structure
  xfs: remove XFS_IFILESTREAM
  xfs: rewrite the filestream allocator using the dentry cache
  xfs: don't create a slab cache for filestream items
  xfs: remove xfs_filestream_associate
  xfs: add filestream allocator tracepoints
  xfs: fold xfs_attr_set_int into xfs_attr_set
  xfs: fold xfs_attr_get_int into xfs_attr_get
  xfs: fold xfs_attr_remove_int into xfs_attr_remove
  xfs: simplify attr name setup
  xfs: pass struct da_args to xfs_attr_calc_size
  xfs: tone down writepage/releasepage WARN_ONs

Dan Carpenter (1):
  xfs: small cleanup in xfs_lowbit64()

Dave Chinner (50):
  xfs: remove dquot hints
  xfs: truncate_setsize should be outside transactions
  xfs: don't sleep in xlog_cil_force_lsn on shutdown
  xfs: fix directory readahead offset off-by-one
  xfs: xfs_dir_fsync() returns positive errno
  xfs: fix incorrect error sign in xfs_file_aio_read
  xfs: xfs_commit_metadata returns wrong errno
  xfs: correct error sign on COLLAPSE_RANGE errors
  xfs: fix wrong errno from xfs_initxattrs
  xfs: fix wrong err sign on xfs_set_acl()
  xfs: negate mount workqueue init error value
  xfs: negate xfs_icsb_init_counters error value
  xfs: list_lru_init returns a negative error
  Merge branch 'xfs-unused-args-cleanup' into for-next
  Merge branch 'xfs-filestreams-lookup' into for-next
  Merge branch 'xfs-free-inode-btree' into for-next
  Merge branch 'xfs-misc-fixes-1-for-3.16' into for-next
  Merge branch 'xfs-attr-cleanup' into for-next
  xfs: make superblock version checks reflect reality
  xfs: keep sb_bad_features2 the same a sb_features2
  xfs: turn NLINK feature on by default
  xfs: don't need dirv2 checks anymore
  xfs: remove shared supberlock feature checking
  xfs: log vector rounding leaks log space
  xfs: remove redundant checks from xfs_da_read_buf
  Merge branch 'xfs-misc-fixes-2-for-3.16' into for-next
  Merge branch 'xfs-feature-bit-cleanup' into for-next
  xfs: introduce directory geometry structure
  xfs: move directory block translatiosn to xfs_dir2_priv.h
  xfs: kill XFS_DIR2...FIRSTDB macros
  xfs: convert dir byte/off conversion to xfs_da_geometry
  xfs: convert directory dablk conversion to xfs_da_geometry
  xfs: convert directory db conversion to xfs_da_geometry
  xfs: convert directory segment limits to xfs_da_geometry
  xfs: convert m_dirblkfsbs to xfs_da_geometry
  xfs: convert m_dirblksize to xfs_da_geometry
  xfs: convert dir/attr btree threshold to xfs_da_geometry
  xfs: move node entry counts to xfs_da_geometry
  xfs: reduce direct

Re: recvmmsg/sendmmsg result types inconsistent, integer overflows?

2014-06-10 Thread Mike Galbraith

(CCs network wizard hangout)

On Wed, 2014-06-11 at 00:12 -0400, Rich Felker wrote: 
> While looking to add support for the recvmmsg and sendmmsg syscalls in
> musl libc, I ran into some disturbing findings on the kernel side. In
> the struct mmsghdr, the field where the result for each message is
> stored has type int, which is inconsistent with the return type
> ssize_t of recvmsg/sendmsg. So I tried to track down what happens when
> the result is or would be larger than 2GB, and quickly found an
> explanation for why the type in the structure was defined wrong:
> internally, the kernel uses int as the return type for recvmsg and
> sendmsg. Oops.
> 
> A bit more RTFS'ing brought me to tcp_sendmsg in net/ipv4/tcp.c (I
> figured let's look at a stream-based protocol, since datagrams can
> likely never be that big for any existing protocol), and as far as I
> can tell, it's haphazardly mixing int and size_t with no checks for
> overflows. I looked for anywhere the kernel might try to verify before
> starting that the sum of the lengths of all the iovec components
> doesn't overflow INT_MAX or even SIZE_MAX, but didn't find any such
> checks.
> 
> Is there some magic that makes this all safe, or is this a big mess of
> possibly-security-relevant bugs?
> 
> Rich



linux-next: Tree for Jun 11

2014-06-10 Thread Stephen Rothwell

Hi all,

The powerpc allyesconfig is again broken more than usual.

Changes since 20140610:

Dropped tree: drm-intel-fixes (build problems)

The drm-intel-fixes tree still had its build failure, so I dropped it at
the maintainer's request.

The pci tree gained a build failure so I used the version from
next-20140610.

The akpm tree lost several patches that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 3911
 3097 files changed, 122540 insertions(+), 56951 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 219 trees (counting Linus' and 29 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (2937f5efa575 Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86)
Merging fixes/master (4b660a7f5c80 Linux 3.15-rc6)
Merging kbuild-current/rc-fixes (38dbfb59d117 Linus 3.14-rc1)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (3f8517e7937d ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs)
Merging m68k-current/for-linus (e8d6dc5ad26e m68k/hp300: Convert printk to pr_foo())
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging powerpc-merge/merge (8212f58a9b15 powerpc: Wire renameat2() syscall)
Merging sparc/master (8ecc1bad4c9b sparc64: fix format string mismatch in arch/sparc/kernel/sysfs.c)
Merging net/master (87757a917b0b net: force a list_del() in unregister_netdevice_many())
Merging ipsec/master (6d004d6cc739 vti: Use the tunnel mark for lookup in the error handlers.)
Merging sound-current/for-linus (6538de03a98f ALSA: hda - Add quirk for ABit AA8XE)
Merging pci-current/for-linus (d0b4cc4e3270 PCI: Wrong register used to check pending traffic)
Merging wireless/master (2c316e699fa4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging driver-core.current/driver-core-linus (4b660a7f5c80 Linux 3.15-rc6)
Merging tty.current/tty-linus (d6d211db37e7 Linux 3.15-rc5)
Merging usb.current/usb-linus (5dc2808c4729 xhci: delete endpoints from bandwidth list before freeing whole device)
Merging usb-gadget-fixes/fixes (886c7c426d46 usb: gadget: at91-udc: fix irq and iomem resource retrieval)
Merging staging.current/staging-linus (9326c5ca0982 staging: r8192e_pci: fix htons error)
Merging char-misc.current/char-misc-linus (d1db0eea8524 Linux 3.15-rc3)
Merging input-current/for-linus (a292241cccb7 Merge branch 'next' into for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (3901c1124ec5 crypto: s390 - fix aes,des ctr mode concurrency finding.)
Merging ide/master (5b40dd30bbfa ide: Fix SC1200 dependencies)
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (4b660a7f5c80 Linux 3.15-rc6)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting module removal.)
Merging mfd-fixes/master (73beb63d290f mfd: rtsx_pcr: Disable interrupts before cancelling delayed works)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 'for-joerg/arm-smmu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-intel-fixes/for-linux-next-fixes (15d24aa5602f drm/i915: BDW: Adding missing cursor offsets.)
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem_gtt.c
CONFLICT (content): Merge co

Re: recvmmsg/sendmmsg result types inconsistent, integer overflows?

2014-06-10 Thread Michael Kerrisk

[adding developers of the two syscalls to CC; maybe they have some insights.]

On Wed, Jun 11, 2014 at 6:12 AM, Rich Felker  wrote:
> While looking to add support for the recvmmsg and sendmmsg syscalls in
> musl libc, I ran into some disturbing findings on the kernel side. In
> the struct mmsghdr, the field where the result for each message is
> stored has type int, which is inconsistent with the return type
> ssize_t of recvmsg/sendmsg. So I tried to track down what happens when
> the result is or would be larger than 2GB, and quickly found an
> explanation for why the type in the structure was defined wrong:
> internally, the kernel uses int as the return type for recvmsg and
> sendmsg. Oops.
>
> A bit more RTFS'ing brought me to tcp_sendmsg in net/ipv4/tcp.c (I
> figured let's look at a stream-based protocol, since datagrams can
> likely never be that big for any existing protocol), and as far as I
> can tell, it's haphazardly mixing int and size_t with no checks for
> overflows. I looked for anywhere the kernel might try to verify before
> starting that the sum of the lengths of all the iovec components
> doesn't overflow INT_MAX or even SIZE_MAX, but didn't find any such
> checks.
>
> Is there some magic that makes this all safe, or is this a big mess of
> possibly-security-relevant bugs?
>
> Rich
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
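A short userspace sketch of the kind of up-front check Rich describes looking for (and not finding): walk the iovec array and reject the request if the summed lengths would exceed INT_MAX. The helper names iov_total_checked() and iov_total_fits_int() are hypothetical. This is not the kernel's code, and it makes no claim about where, or whether, the kernel performs such a check.

```c
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Sum the iov_len fields of an iovec array, refusing any total above
 * limit. Returns 0 and stores the sum in *out on success, or -1 if the
 * total would exceed limit (*out is left untouched in that case).
 */
static int iov_total_checked(const struct iovec *iov, size_t iovcnt,
			     size_t limit, size_t *out)
{
	size_t total = 0;

	for (size_t i = 0; i < iovcnt; i++) {
		/* Compare before adding, so total itself can never wrap. */
		if (iov[i].iov_len > limit - total)
			return -1;
		total += iov[i].iov_len;
	}
	*out = total;
	return 0;
}

/* Same check with the limit set to INT_MAX: the total must fit in an int. */
static int iov_total_fits_int(const struct iovec *iov, size_t iovcnt,
			      size_t *out)
{
	return iov_total_checked(iov, iovcnt, (size_t)INT_MAX, out);
}
```

Testing iov_len > limit - total is the usual way to compare a running sum against a bound without first performing the addition that might overflow.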

[PATCH] Staging: rtl8192e: dot11d: Fixed printk coding style issues

2014-06-10 Thread A Raghavendra Rao

Replaced 'printk' with 'netdev_' function

Signed-off-by: A Raghavendra Rao 
---
 drivers/staging/rtl8192e/dot11d.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8192e/dot11d.c b/drivers/staging/rtl8192e/dot11d.c
index 53da610..bfcc935 100644
--- a/drivers/staging/rtl8192e/dot11d.c
+++ b/drivers/staging/rtl8192e/dot11d.c
@@ -133,12 +133,12 @@ void Dot11d_UpdateCountryIe(struct rtllib_device *dev, u8 *pTaddr,
pTriple = (struct chnl_txpow_triple *)(pCoutryIe + 3);
for (i = 0; i < NumTriples; i++) {
if (MaxChnlNum >= pTriple->FirstChnl) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
return;
}
if (MAX_CHANNEL_NUMBER < (pTriple->FirstChnl +
pTriple->NumChnls)) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
return;
}
 
@@ -165,7 +165,7 @@ u8 DOT11D_GetMaxTxPwrInDbm(struct rtllib_device *dev, u8 Channel)
u8 MaxTxPwrInDbm = 255;
 
if (MAX_CHANNEL_NUMBER < Channel) {
-   printk(KERN_INFO "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
+   netdev_info(dev->dev, "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
return MaxTxPwrInDbm;
}
if (pDot11dInfo->channel_map[Channel])
@@ -204,7 +204,7 @@ int ToLegalChannel(struct rtllib_device *dev, u8 channel)
}
 
if (MAX_CHANNEL_NUMBER < channel) {
-   printk(KERN_ERR "%s(): Invalid Channel\n", __func__);
+   netdev_err(dev->dev, "%s(): Invalid Channel\n", __func__);
return default_chn;
}
 
-- 
1.7.9.5


[PATCH] x86: Find correct 64 bit ramdisk address for microcode early update

2014-06-10 Thread Yinghai Lu

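When using kexec with a 64-bit kernel, the bzImage and ramdisk could be
loaded above 4G. In that case the high 32 bits of the ramdisk address
are in ext_ramdisk_image, and we need them to get the correct ramdisk
address.

Make get_ramdisk_image() and get_ramdisk_size() global and use them for
early microcode updating. Also make them take a boot_params pointer so
they can be used in different contexts.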
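The split that get_ramdisk_image() and get_ramdisk_size() reassemble can be illustrated with a small userspace C sketch. struct fake_boot_params is a hypothetical stand-in that keeps only the four 32-bit fields the patch reads (hdr.ramdisk_image, hdr.ramdisk_size, ext_ramdisk_image and ext_ramdisk_size); it does not reproduce the real struct boot_params layout.

```c
#include <stdint.h>

/* Hypothetical stand-in: only the four fields the patch combines. */
struct fake_boot_params {
	uint32_t ramdisk_image;     /* low 32 bits (hdr.ramdisk_image) */
	uint32_t ramdisk_size;      /* low 32 bits (hdr.ramdisk_size) */
	uint32_t ext_ramdisk_image; /* high 32 bits of the image address */
	uint32_t ext_ramdisk_size;  /* high 32 bits of the size */
};

/*
 * Widen to 64 bits before shifting: shifting a 32-bit value by 32 is
 * undefined behaviour in C.
 */
static uint64_t fake_get_ramdisk_image(const struct fake_boot_params *bp)
{
	return (uint64_t)bp->ramdisk_image |
	       ((uint64_t)bp->ext_ramdisk_image << 32);
}

static uint64_t fake_get_ramdisk_size(const struct fake_boot_params *bp)
{
	return (uint64_t)bp->ramdisk_size |
	       ((uint64_t)bp->ext_ramdisk_size << 32);
}
```

With a ramdisk at 0x100001000, the low field holds 0x1000 and ext_ramdisk_image holds 0x1. Reading only hdr.ramdisk_image, as the early microcode code did before this patch, yields 0x1000, a sub-4G address that is not where the ramdisk is.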

Signed-off-by: Yinghai Lu 

---
 arch/x86/include/asm/setup.h|3 +++
 arch/x86/kernel/cpu/microcode/amd_early.c   |   10 +-
 arch/x86/kernel/cpu/microcode/intel_early.c |8 
 arch/x86/kernel/setup.c |   28 ++--
 4 files changed, 26 insertions(+), 23 deletions(-)

Index: linux-2.6/arch/x86/include/asm/setup.h
===
--- linux-2.6.orig/arch/x86/include/asm/setup.h
+++ linux-2.6/arch/x86/include/asm/setup.h
@@ -105,6 +105,9 @@ void *extend_brk(size_t size, size_t ali
RESERVE_BRK(name, sizeof(type) * entries)
 
 extern void probe_roms(void);
+u64 get_ramdisk_image(struct boot_params *bp);
+u64 get_ramdisk_size(struct boot_params *bp);
+
 #ifdef __i386__
 
 asmlinkage void __init i386_start_kernel(void);
Index: linux-2.6/arch/x86/kernel/cpu/microcode/amd_early.c
===
--- linux-2.6.orig/arch/x86/kernel/cpu/microcode/amd_early.c
+++ linux-2.6/arch/x86/kernel/cpu/microcode/amd_early.c
@@ -51,12 +51,12 @@ static struct cpio_data __init find_ucod
 */
p   = (struct boot_params *)__pa_nodebug(&boot_params);
path= (char *)__pa_nodebug(ucode_path);
-   start   = (void *)p->hdr.ramdisk_image;
-   size= p->hdr.ramdisk_size;
+   start   = (void *)(unsigned long)get_ramdisk_image(p);
+   size= get_ramdisk_size(p);
 #else
path= ucode_path;
-   start   = (void *)(boot_params.hdr.ramdisk_image + PAGE_OFFSET);
-   size= boot_params.hdr.ramdisk_size;
+   start   = (void *)(get_ramdisk_image(&boot_params) + PAGE_OFFSET);
+   size= get_ramdisk_size(&boot_params);
 #endif
 
return find_cpio_data(path, start, size, );
@@ -371,7 +371,7 @@ int __init save_microcode_in_initrd_amd(
 */
if (relocated_ramdisk)
container = (u8 *)(__va(relocated_ramdisk) +
-(cont - boot_params.hdr.ramdisk_image));
+(cont - get_ramdisk_image(&boot_params)));
 
if (ucode_new_rev)
pr_info("microcode: updated early to new patch_level=0x%08x\n",
Index: linux-2.6/arch/x86/kernel/cpu/microcode/intel_early.c
===
--- linux-2.6.orig/arch/x86/kernel/cpu/microcode/intel_early.c
+++ linux-2.6/arch/x86/kernel/cpu/microcode/intel_early.c
@@ -733,8 +733,8 @@ load_ucode_intel_bsp(void)
struct boot_params *boot_params_p;
 
boot_params_p = (struct boot_params *)__pa_nodebug(&boot_params);
-   ramdisk_image = boot_params_p->hdr.ramdisk_image;
-   ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(boot_params_p);
+   ramdisk_size  = get_ramdisk_size(boot_params_p);
initrd_start_early = ramdisk_image;
initrd_end_early = initrd_start_early + ramdisk_size;
 
@@ -743,8 +743,8 @@ load_ucode_intel_bsp(void)
(unsigned long *)__pa_nodebug(_saved_in_initrd),
initrd_start_early, initrd_end_early, );
 #else
-   ramdisk_image = boot_params.hdr.ramdisk_image;
-   ramdisk_size  = boot_params.hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(&boot_params);
+   ramdisk_size  = get_ramdisk_size(&boot_params);
initrd_start_early = ramdisk_image + PAGE_OFFSET;
initrd_end_early = initrd_start_early + ramdisk_size;
 
Index: linux-2.6/arch/x86/kernel/setup.c
===
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -299,19 +299,19 @@ u64 relocated_ramdisk;
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
-static u64 __init get_ramdisk_image(void)
+u64 __init get_ramdisk_image(struct boot_params *bp)
 {
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+   u64 ramdisk_image = bp->hdr.ramdisk_image;
 
-   ramdisk_image |= (u64)boot_params.ext_ramdisk_image << 32;
+   ramdisk_image |= (u64)bp->ext_ramdisk_image << 32;
 
return ramdisk_image;
 }
-static u64 __init get_ramdisk_size(void)
+u64 __init get_ramdisk_size(struct boot_params *bp)
 {
-   u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_size = bp->hdr.ramdisk_size;
 
-   ramdisk_size |= (u64)boot_params.ext_ramdisk_size << 32;
+   ramdisk_size |= (u64)bp->ext_ramdisk_size << 32;
 
return ramdisk_size;
 }
@@ -320,8 +320,8 @@ static u64 __init get_ramdisk_size(void)
 static void __init relocate_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = get_ramdisk_image();
-   u64 ramdisk_size  =

[PATCH] devres: remove devm_request_and_ioremap()

2014-06-10 Thread Jingoo Han

devm_request_and_ioremap() was made obsolete by commit 7509657
("lib: devres: Introduce devm_ioremap_resource()") and has been
deprecated for a long time, so remove this function. In addition,
all remaining users of devm_request_and_ioremap() are converted to
devm_ioremap_resource().

Signed-off-by: Jingoo Han 
---
Based on the latest Linux kernel
(dfb9454 Merge git://www.linux-watchdog.org/linux-watchdog)

 Documentation/driver-model/devres.txt  |1 -
 drivers/bus/brcmstb_gisb.c |6 +-
 drivers/gpu/drm/armada/armada_crtc.c   |8 +-
 include/linux/device.h |2 -
 lib/devres.c   |   28 --
 scripts/coccinelle/api/devm_ioremap_resource.cocci |   90 
 6 files changed, 6 insertions(+), 129 deletions(-)
 delete mode 100644 scripts/coccinelle/api/devm_ioremap_resource.cocci

diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index 8947255..001740b 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -278,7 +278,6 @@ IOMAP
   devm_ioremap_nocache()
   devm_iounmap()
   devm_ioremap_resource() : checks resource, requests memory region, ioremaps
-  devm_request_and_ioremap() : obsoleted by devm_ioremap_resource()
   pcim_iomap()
   pcim_iounmap()
   pcim_iomap_table()   : array of mapped addresses indexed by BAR
diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c
index 6159b77..f2cd6a2d 100644
--- a/drivers/bus/brcmstb_gisb.c
+++ b/drivers/bus/brcmstb_gisb.c
@@ -212,9 +212,9 @@ static int brcmstb_gisb_arb_probe(struct platform_device *pdev)
mutex_init(&gdev->lock);
INIT_LIST_HEAD(&gdev->next);
 
-   gdev->base = devm_request_and_ioremap(&pdev->dev, r);
-   if (!gdev->base)
-   return -ENOMEM;
+   gdev->base = devm_ioremap_resource(&pdev->dev, r);
+   if (IS_ERR(gdev->base))
+   return PTR_ERR(gdev->base);
 
err = devm_request_irq(&pdev->dev, timeout_irq,
brcmstb_gisb_timeout_handler, 0, pdev->name,
diff --git a/drivers/gpu/drm/armada/armada_crtc.c b/drivers/gpu/drm/armada/armada_crtc.c
index 81c34f9..3aedf9e 100644
--- a/drivers/gpu/drm/armada/armada_crtc.c
+++ b/drivers/gpu/drm/armada/armada_crtc.c
@@ -1039,11 +1039,9 @@ int armada_drm_crtc_create(struct drm_device *dev, unsigned num,
if (ret)
return ret;
 
-   base = devm_request_and_ioremap(dev->dev, res);
-   if (!base) {
-   DRM_ERROR("failed to ioremap register\n");
-   return -ENOMEM;
-   }
+   base = devm_ioremap_resource(dev->dev, res);
+   if (IS_ERR(base))
+   return PTR_ERR(base);
 
dcrtc = kzalloc(sizeof(*dcrtc), GFP_KERNEL);
if (!dcrtc) {
diff --git a/include/linux/device.h b/include/linux/device.h
index af424ac..921fa0a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -631,8 +631,6 @@ extern unsigned long devm_get_free_pages(struct device *dev,
 extern void devm_free_pages(struct device *dev, unsigned long addr);
 
 void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res);
-void __iomem *devm_request_and_ioremap(struct device *dev,
-   struct resource *res);
 
 /* allows to add/remove a custom action to devres stack */
 int devm_add_action(struct device *dev, void (*action)(void *), void *data);
diff --git a/lib/devres.c b/lib/devres.c
index f562bf6..6a4aee8 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -142,34 +142,6 @@ void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res)
 }
 EXPORT_SYMBOL(devm_ioremap_resource);
 
-/**
- * devm_request_and_ioremap() - Check, request region, and ioremap resource
- * @dev: Generic device to handle the resource for
- * @res: resource to be handled
- *
- * Takes all necessary steps to ioremap a mem resource. Uses managed device, so
- * everything is undone on driver detach. Checks arguments, so you can feed
- * it the result from e.g. platform_get_resource() directly. Returns the
- * remapped pointer or NULL on error. Usage example:
- *
- * res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- * base = devm_request_and_ioremap(&pdev->dev, res);
- * if (!base)
- * return -EADDRNOTAVAIL;
- */
-void __iomem *devm_request_and_ioremap(struct device *dev,
-  struct resource *res)
-{
-   void __iomem *dest_ptr;
-
-   dest_ptr = devm_ioremap_resource(dev, res);
-   if (IS_ERR(dest_ptr))
-   return NULL;
-
-   return dest_ptr;
-}
-EXPORT_SYMBOL(devm_request_and_ioremap);
-
 #ifdef CONFIG_HAS_IOPORT_MAP
 /*
  * Generic iomap devres
diff --git a/scripts/coccinelle/api/devm_ioremap_resource.cocci b/scripts/coccinelle/api/devm_ioremap_resource.cocci
deleted file mode 100644
index 495daa3..000
--- a/scripts/coccinelle/api/devm_ioremap_resource.cocci
+++ /dev/null
@@ -1,90 +0,0

Re: [PATCH] Staging: rtl8192e: dot11d: Fixed printk coding style issues

2014-06-10 Thread Greg KH

On Wed, Jun 11, 2014 at 10:11:55AM +0530, A Raghavendra Rao wrote:
> Replaced 'printk' with 'netdev_' function
> 
> Signed-off-by: A Raghavendra Rao 
> ---
>  drivers/staging/rtl8192e/dot11d.c |9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/rtl8192e/dot11d.c b/drivers/staging/rtl8192e/dot11d.c
> index 53da610..ef9da86 100644
> --- a/drivers/staging/rtl8192e/dot11d.c
> +++ b/drivers/staging/rtl8192e/dot11d.c
> @@ -49,6 +49,7 @@ static struct channel_list ChannelPlan[] = {
>  void dot11d_init(struct rtllib_device *ieee)
>  {
>   struct rt_dot11d_info *pDot11dInfo = GET_DOT11D_INFO(ieee);
> +
>   pDot11dInfo->bEnabled = false;
>  
>   pDot11dInfo->State = DOT11D_STATE_NONE;

This change doesn't match what you said you did :(


Re: Disable bus's drivers_autoprobe before rootfs has mounted

2014-06-10 Thread Peter Chen

On Tue, Jun 10, 2014 at 11:35:07PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Tue, Jun 10, 2014 at 09:10:00PM -0700, Greg KH wrote:
> > > Let's take USB peripheral as an example: there is a device for the
> > > udc, and a device driver for the usb gadget driver. By default, we want
> > > the device to be bound to the driver automatically; this is what
> > > we have done now. But if there is more than one udc and gadget
> > > driver (e.g. one B port for mass storage, another B port for usb ethernet),
> > > the user may want a specific binding (e.g. udc-0 -> mass storage,
> > > udc-1 -> usb ethernet), so the binding will be established
> > > after rootfs has mounted. (This feature is being implemented.)
> > 
> > Then there better be a way to describe this on the kernel command line
> > (i.e. module parameters), right?  Which is a total mess, why not just
> > not bind anything in this case and let the user pick what they want?
> 
> you can also blacklist all gadget drivers and manually probe them or -
> get this - you can refrain from using gadget drivers and use libusbg to
> build the gadget drivers out of raw usb functions, then bind them to the
> UDC of your liking.
> 

I am just worried whether users will accept it if we change how the
gadget drivers behave. If you think it is acceptable provided we have
some documentation, we can implement manual binding for gadget drivers
from now on.

-- 

Best Regards,
Peter Chen

Re: [PATCH 3.13 131/160] Bluetooth: Fix redundant encryption request for reauthentication

2014-06-10 Thread Johan Hedberg

Hi,

On Tue, Jun 10, 2014, Kamal Mostafa wrote:
> 3.13.11.3 -stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Johan Hedberg 
> 
> commit 09da1f3463eb81d59685df723b1c5950b7570340 upstream.
> 
> When we're performing reauthentication (in order to elevate the
> security level from an unauthenticated key to an authenticated one) we
> do not need to issue any encryption command once authentication
> completes. Since the trigger for the encryption HCI command is the
> ENCRYPT_PEND flag this flag should not be set in this scenario.
> Instead, the REAUTH_PEND flag takes care of all necessary steps for
> reauthentication.
> 
> Signed-off-by: Johan Hedberg 
> Signed-off-by: Marcel Holtmann 
> Signed-off-by: Kamal Mostafa 
> ---
>  net/bluetooth/hci_conn.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

This one has a regression reported against it:

https://bugzilla.kernel.org/show_bug.cgi?id=77541

The report also has a working fix for the issue, which we'll be sending
to the stable trees (it's already in the Bluetooth subsystem tree). So
I'm not sure what the right way to proceed is here: ignore this patch until
the other patch is available, apply this one and wait for the other one,
or just forget about both patches for the stable trees.

Johan

Re: [PATCH] ARM: EXYNOS: mcpm: Don't rely on firmware's secondary_cpu_start

2014-06-10 Thread Chander Kashyap

Hi Doug,

On Tue, Jun 10, 2014 at 9:19 PM, Nicolas Pitre  wrote:
> On Tue, 10 Jun 2014, Doug Anderson wrote:
>
>> My S-state knowledge is not strong, but I believe that Lorenzo's
>> questions matter if we're using S2 for CPUidle (where we actually turn
>> off power and hot unplug CPUs) but not when we're using S1 for CPUidle
>> (where we just enter WFI/WFE).
>>

No, it's not plain WFI.

All cores in Exynos5420 can be powered off independently.
This functionality has been tested.

Below is the link for the posted patches.

https://lkml.org/lkml/2014/6/10/194

And as Nicolas wrote, these patches need MCPM for that.

>> I believe that in ChromeOS we use S1 CPUidle and that it works fine.
>> We've never implemented S2 that I'm aware of.
>
> You'll have to rely on MCPM for that.  That's probably why it hasn't
> been implemented before.
>
>
> Nicolas
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

[PATCH] Staging: rtl8192e: dot11d: Fixed printk coding style issues

2014-06-10 Thread A Raghavendra Rao

Replaced 'printk' with 'netdev_' function

Signed-off-by: A Raghavendra Rao 
---
 drivers/staging/rtl8192e/dot11d.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8192e/dot11d.c b/drivers/staging/rtl8192e/dot11d.c
index 53da610..ef9da86 100644
--- a/drivers/staging/rtl8192e/dot11d.c
+++ b/drivers/staging/rtl8192e/dot11d.c
@@ -49,6 +49,7 @@ static struct channel_list ChannelPlan[] = {
 void dot11d_init(struct rtllib_device *ieee)
 {
struct rt_dot11d_info *pDot11dInfo = GET_DOT11D_INFO(ieee);
+
pDot11dInfo->bEnabled = false;
 
pDot11dInfo->State = DOT11D_STATE_NONE;
@@ -133,12 +134,12 @@ void Dot11d_UpdateCountryIe(struct rtllib_device *dev, u8 *pTaddr,
pTriple = (struct chnl_txpow_triple *)(pCoutryIe + 3);
for (i = 0; i < NumTriples; i++) {
if (MaxChnlNum >= pTriple->FirstChnl) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
return;
}
if (MAX_CHANNEL_NUMBER < (pTriple->FirstChnl +
pTriple->NumChnls)) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
return;
}
 
@@ -165,7 +166,7 @@ u8 DOT11D_GetMaxTxPwrInDbm(struct rtllib_device *dev, u8 Channel)
u8 MaxTxPwrInDbm = 255;
 
if (MAX_CHANNEL_NUMBER < Channel) {
-   printk(KERN_INFO "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
+   netdev_info(dev->dev, "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
return MaxTxPwrInDbm;
}
if (pDot11dInfo->channel_map[Channel])
@@ -204,7 +205,7 @@ int ToLegalChannel(struct rtllib_device *dev, u8 channel)
}
 
if (MAX_CHANNEL_NUMBER < channel) {
-   printk(KERN_ERR "%s(): Invalid Channel\n", __func__);
+   netdev_err(dev->dev, "%s(): Invalid Channel\n", __func__);
return default_chn;
}
 
-- 
1.7.9.5


Re: Disable bus's drivers_autoprobe before rootfs has mounted

2014-06-10 Thread Peter Chen

On Tue, Jun 10, 2014 at 09:10:00PM -0700, Greg KH wrote:
> On Wed, Jun 11, 2014 at 10:14:40AM +0800, Peter Chen wrote:
> > Hi Greg,
> > 
> > Currently, we can't disable the auto-probe function during booting
> > if both the device and the device driver registration code are built
> > in, because .drivers_autoprobe is private to the bus core and can
> > only be changed through a sysfs entry.
> 
> Then don't build them into the kernel :)
> 
> > As a result, we can't implement a feature that lets the user choose
> > between manual binding and auto binding through module parameters.
> 
> Wait, you just asked about building the stuff into the kernel, not a
> module.

Yes, build the code into the kernel.
> 
> > E.g., the default binding is automatic, but the user can override
> > it with a module parameter.
> 
> Do we do that for any other "bus" anywhere?

I don't know.

> 
> > Let's take USB peripheral as an example: there is a device for the
> > udc, and a device driver for the usb gadget driver. By default, we want
> > the device to be bound to the driver automatically; this is what
> > we have done now. But if there is more than one udc and gadget
> > driver (e.g. one B port for mass storage, another B port for usb ethernet),
> > the user may want a specific binding (e.g. udc-0 -> mass storage,
> > udc-1 -> usb ethernet), so the binding will be established
> > after rootfs has mounted. (This feature is being implemented.)
> 
> Then there better be a way to describe this on the kernel command line
> (i.e. module parameters), right?  Which is a total mess, why not just
> not bind anything in this case and let the user pick what they want?

If users are used to doing nothing on the rootfs with current or earlier
kernels, is it OK to change the driver's behaviour so that writing a sysfs
entry becomes mandatory for them?

> 
> > From my reading of the code, we can't implement the above feature, but I
> > may be wrong; if you have some solutions, please give me some hints.
> > If there is no solution for the above feature, do we agree with exporting
> > .drivers_autoprobe for bus drivers, or something similar?
> 
> I don't understand what you mean by this, care to show me with code?

I mean that an individual bus driver can't change bus->p->drivers_autoprobe;
bus->p->drivers_autoprobe is handled in drivers/base/bus.c.

If an individual bus driver could change bus->p->drivers_autoprobe, we
could disable autoprobe (auto-binding) during booting.
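For reference, the existing sysfs route can be driven from userspace once the rootfs is mounted: writing 0 to /sys/bus/<bus>/drivers_autoprobe turns off automatic binding for that bus, and writing a device name to /sys/bus/<bus>/drivers/<driver>/bind binds it by hand. A minimal C helper is sketched below; sysfs_write() is a hypothetical name, and the bus, driver and device names in the usage note are placeholders.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write a single string to a sysfs-style attribute file.
 * Returns 0 on success, -1 on any failure.
 */
static int sysfs_write(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	size_t len;
	int ok;

	if (!f)
		return -1;
	len = strlen(value);
	ok = fwrite(value, 1, len, f) == len;
	/*
	 * fclose() flushes the buffered data, so an error from the kernel's
	 * store() handler may only be reported here.
	 */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Usage, with placeholder names: sysfs_write("/sys/bus/<bus>/drivers_autoprobe", "0"), then later sysfs_write("/sys/bus/<bus>/drivers/<driver>/bind", "<device>"). This only affects devices and drivers registered after the write, which is the limitation described above for built-in code that registers before userspace runs.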

-- 

Best Regards,
Peter Chen

Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress

2014-06-10 Thread Paul E. McKenney

On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
> Hi Paul,
> 
> On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
>  wrote:
> >>   if (rnp->gpnum != rnp->completed ||
> >> - ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> >> + ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
> >
> > At this point in the code, we are checking the current rcu_node structure,
> > which might or might not be the root.  If it is not the root, we absolutely
> > cannot compare against the root because we don't yet hold the root's lock.
> >
> 
> I was a bit thrown by the double check being done
> (rnp->gpnum != rnp->completed) in that if condition: once without
> ACCESS_ONCE and once with. Is there any particular reason for this?
> 
> I now understand that we are comparing ->gpnum and ->completed of the
> root node which might change from under us if we don't hold the root's
> lock. I will keep looking :)

Hmmm...  Now that you mention it, that does look a bit strange.

Thanx, Paul
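For readers following the ACCESS_ONCE() question: in kernels of this era, include/linux/compiler.h defined ACCESS_ONCE(x) as the volatile cast (*(volatile typeof(x) *)&(x)), which forces the compiler to emit a fresh load instead of reusing a value it already holds. The sketch below mirrors that definition in userspace (using __typeof__, a GCC/Clang extension) against a hypothetical struct fake_rcu_node. It shows what each half of the condition does; it is not a proposed fix for tree.c.

```c
/*
 * Mirrors the kernel's ACCESS_ONCE() definition of this era; needs a
 * compiler that supports __typeof__ (GCC or Clang).
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the two struct rcu_node fields in question. */
struct fake_rcu_node {
	unsigned long gpnum;     /* current grace-period number */
	unsigned long completed; /* last completed grace-period number */
};

/* Plain loads: the compiler may reuse values it read earlier. */
static int gp_in_progress_plain(const struct fake_rcu_node *rnp)
{
	return rnp->gpnum != rnp->completed;
}

/* Volatile loads: each field is re-read from memory at this point. */
static int gp_in_progress_once(struct fake_rcu_node *rnp)
{
	return ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed);
}
```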


Re: Disable bus's drivers_autoprobe before rootfs has mounted

2014-06-10 Thread Felipe Balbi

Hi,

On Tue, Jun 10, 2014 at 09:10:00PM -0700, Greg KH wrote:
> > Let's take USB peripheral as an example: there is a device for the
> > udc, and a device driver for the usb gadget driver. By default, we want
> > the device to be bound to the driver automatically; this is what
> > we have done now. But if there is more than one udc and gadget
> > driver (e.g. one B port for mass storage, another B port for usb ethernet),
> > the user may want a specific binding (e.g. udc-0 -> mass storage,
> > udc-1 -> usb ethernet), so the binding will be established
> > after rootfs has mounted. (This feature is being implemented.)
> 
> Then there better be a way to describe this on the kernel command line
> (i.e. module parameters), right?  Which is a total mess, why not just
> not bind anything in this case and let the user pick what they want?

you can also blacklist all gadget drivers and manually probe them or -
get this - you can refrain from using gadget drivers and use libusbg to
build the gadget drivers out of raw usb functions, then bind them to the
UDC of your liking.

-- 
balbi



[net-next PATCH] mrf24j40: add device managed APIs

2014-06-10 Thread Varka Bhadram

Use the device-managed APIs so that there is no need to worry about
freeing the resources.
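The idea behind the devm_* conversion can be modelled in a few lines of userspace C. Every managed allocation is recorded against the device, and all of them are released together, newest first, when the device goes away. That is why the patch can drop the kfree() calls and the err_alloc_dev, err_buf and err_devrec labels. Everything below (struct fake_device, fake_devm_kzalloc(), fake_devres_release_all()) is hypothetical and far simpler than the kernel's devres implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * One managed allocation; the payload follows the header. The flexible
 * array of max_align_t keeps the payload suitably aligned.
 */
struct fake_devres {
	struct fake_devres *next;
	max_align_t data[];
};

struct fake_device {
	struct fake_devres *head; /* newest allocation first */
	int nr;                   /* number of live managed allocations */
};

/* Zeroed allocation tied to dev's lifetime, loosely like devm_kzalloc(). */
static void *fake_devm_kzalloc(struct fake_device *dev, size_t size)
{
	struct fake_devres *dr = calloc(1, sizeof(*dr) + size);

	if (!dr)
		return NULL;
	dr->next = dev->head;
	dev->head = dr;
	dev->nr++;
	return dr->data;
}

/* Called on "detach": frees everything, in reverse allocation order. */
static void fake_devres_release_all(struct fake_device *dev)
{
	while (dev->head) {
		struct fake_devres *dr = dev->head;

		dev->head = dr->next;
		free(dr);
		dev->nr--;
	}
}
```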

Signed-off-by: Varka Bhadram 
---
 drivers/net/ieee802154/mrf24j40.c |   33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c
index 78a6552..4048062 100644
--- a/drivers/net/ieee802154/mrf24j40.c
+++ b/drivers/net/ieee802154/mrf24j40.c
@@ -618,12 +618,12 @@ static int mrf24j40_probe(struct spi_device *spi)
 
printk(KERN_INFO "mrf24j40: probe(). IRQ: %d\n", spi->irq);
 
-   devrec = kzalloc(sizeof(struct mrf24j40), GFP_KERNEL);
+   devrec = devm_kzalloc(&spi->dev, sizeof(struct mrf24j40), GFP_KERNEL);
if (!devrec)
-   goto err_devrec;
-   devrec->buf = kzalloc(3, GFP_KERNEL);
+   goto err_ret;
+   devrec->buf = devm_kzalloc(&spi->dev, 3, GFP_KERNEL);
if (!devrec->buf)
-   goto err_buf;
+   goto err_ret;
 
spi->mode = SPI_MODE_0; /* TODO: Is this appropriate for right here? */
if (spi->max_speed_hz > MAX_SPI_SPEED_HZ)
@@ -638,7 +638,7 @@ static int mrf24j40_probe(struct spi_device *spi)
 
devrec->dev = ieee802154_alloc_device(0, _ops);
if (!devrec->dev)
-   goto err_alloc_dev;
+   goto err_ret;
 
devrec->dev->priv = devrec;
devrec->dev->parent = >spi->dev;
@@ -676,12 +676,13 @@ static int mrf24j40_probe(struct spi_device *spi)
val &= ~0x3; /* Clear RX mode (normal) */
write_short_reg(devrec, REG_RXMCR, val);
 
-   ret = request_threaded_irq(spi->irq,
-  NULL,
-  mrf24j40_isr,
-  IRQF_TRIGGER_LOW|IRQF_ONESHOT,
-  dev_name(&spi->dev),
-  devrec);
+   ret = devm_request_threaded_irq(&spi->dev,
+   spi->irq,
+   NULL,
+   mrf24j40_isr,
+   IRQF_TRIGGER_LOW|IRQF_ONESHOT,
+   dev_name(&spi->dev),
+   devrec);
 
if (ret) {
dev_err(printdev(devrec), "Unable to get IRQ");
@@ -695,11 +696,7 @@ err_read_reg:
ieee802154_unregister_device(devrec->dev);
 err_register_device:
ieee802154_free_device(devrec->dev);
-err_alloc_dev:
-   kfree(devrec->buf);
-err_buf:
-   kfree(devrec);
-err_devrec:
+err_ret:
return ret;
 }
 
@@ -709,15 +706,11 @@ static int mrf24j40_remove(struct spi_device *spi)
 
dev_dbg(printdev(devrec), "remove\n");
 
-   free_irq(spi->irq, devrec);
ieee802154_unregister_device(devrec->dev);
ieee802154_free_device(devrec->dev);
/* TODO: Will ieee802154_free_device() wait until ->xmit() is
 * complete? */
 
-   /* Clean up the SPI stuff. */
-   kfree(devrec->buf);
-   kfree(devrec);
return 0;
 }
 
-- 
1.7.9.5


Re: drivers/char/random.c: more ruminations

2014-06-10 Thread George Spelvin

> So have you actually instrumented the kernel to demonstrate that in
> fact we have super deep stack call paths where the 128 bytes worth of
> stack actually matters?

I haven't got a specific call chain where 128 bytes pushes it
over a limit.  But kernel stack usage is a perennial problem.
Wasn't there some discussion about that just recently?
6538b8ea8: "x86_64: expand kernel stack to 16K"

I agree a 128 byte stack frame is not one of the worst offenders,
but it's enough to try to clean up if possible.

You can search LKML for a bunch of discussion of 176 bytes
in __alloc_pages_slowpath().

And in this case, it's so *easy*.  extract_buf() works 10 bytes at a
time anyway, and _mix_pool_bytes is byte at a time.

>> I hadn't tested the patch when I mailed it to you (I prepared it in
>> order to reply to your e-mail, and it's annoying to reboot the machine
>> I'm composing an e-mail on), but I have since.  It works.

> As an aside, I'd strongly suggest that you use kvm to do your kernel
> testing.  It means you can do a lot more testing, which is always a
> good thing.

H'mmm. I need to learn what KVM *is*.  Apparently there's a second
meaning other than "keyboard, video & mouse". :-)

Normally, I just test using modules.  Especially when working on a
driver for a hardware device, virtualization makes life difficult.
But /dev/random is (for good reasons) not modularizable.

(I can see how it'd be useful for filesystem development, however.)

Counting currently open file descriptors per process.

2014-06-10 Thread wmealing

G'day,

I'm seeking some guidance on how best (or whether) to implement a feature.

Please CC me on any reply; I am not subscribed to this list.

The feature: "an application would like to know how many files another
process has open".

From user space, the cheapest way would be to call
syscall(SYS_getdents, ...) on the /proc/pid/fd directory.
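A minimal user-space sketch of the getdents approach, assuming Linux with /proc mounted. It calls getdents64 through syscall(2), since older glibc releases provide no wrapper, and uses the record layout documented in the getdents(2) man page; the helper name count_dir_entries() is hypothetical. When pointed at /proc/self/fd, the descriptor opened to list the directory is itself one of the entries counted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), as documented in the man page. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Count the entries of a directory such as /proc/<pid>/fd, ignoring
 * "." and "..". Returns -1 if the directory cannot be read. */
long count_dir_entries(const char *path)
{
    uint64_t buf[512];          /* 4 KiB, 8-byte aligned for the records */
    char *base = (char *)buf;
    long count = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, base, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of directory */
            break;
        for (long off = 0; off < n;) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(base + off);
            if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
                count++;
            off += d->d_reclen;
        }
    }
    close(fd);
    return count;
}
```

For another process the call would be count_dir_entries("/proc/<pid>/fd"), subject to the usual /proc permission checks.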

Alternatively, from kernel space one could achieve similar behavior by
iterating through the task's fdtable, as I have attempted here:

https://gist.github.com/wmealing/c0836bc6a38f8f90aa0d

Colleagues of mine have pointed out that this may have a performance
impact on tools that frequently parse /proc/pid/status.

I have compiled a kernel with the above patch and here are the performance 
stats.

System settings
# sysctl -w fs.file-max=500
fs.file-max = 500

Increase this session's limits:
# ulimit -n 100

test.py had 52 files open each time.

Here are some of the performance benchmarks on an idle
system: 

# time cat /proc/`pidof python test.py`/status |grep FD
FDSize: 524288
FDCount: 52

real    0m0.008s
user    0m0.002s
sys     0m0.004s

# time ./test-getdents /proc/`pidof python test.py`/fd &> /dev/null 

real    0m0.631s
user    0m0.001s
sys     0m0.485s

Or, this time with readdir(3):

# time ./test-opendir /proc/`pidof python test.py`/fd &> /dev/null

real    0m0.129s
user    0m0.001s
sys     0m0.007s

(which oddly seems faster?)

My benchmark values above are not meant as precise micro-benchmarks,
but rather as a rough scale of how far behind the code is.

Is the current method of getting a live fd count acceptable? If
not, how should it be done?

Thanks for your time.

Wade Mealing.

[1] https://github.com/wmealing/live-fd-count/  
git repo with code used in the above.


Re: [PATCH v2] vmalloc: use rcu list iterator to reduce vmap_area_lock contention

2014-06-10 Thread Joonsoo Kim

On Tue, Jun 10, 2014 at 11:32:19PM -0400, Peter Hurley wrote:
> On 06/10/2014 10:19 PM, Joonsoo Kim wrote:
> >Richard Yao reported a month ago that his system had trouble
> >with vmap_area_lock contention during performance analysis
> >via /proc/meminfo. Andrew asked why his analysis reads /proc/meminfo
> >so heavily, but he didn't answer.
> >
> >https://lkml.org/lkml/2014/4/10/416
> >
> >Although I'm not sure whether this is the right usage, there is a solution
> >that reduces vmap_area_lock contention with no side effects: just
> >use the RCU list iterator in get_vmalloc_info().
> >
> >RCU can be used in this function because the RCU protocol is already
> >respected by all writers, since Nick Piggin's commit db64fe02258f1507e13fe5
> >("mm: rewrite vmap layer") back in linux-2.6.28.
> 
> While rcu list traversal over the vmap_area_list is safe, this may
> arrive at different results than the spinlocked version. The rcu list
> traversal version will not be a 'snapshot' of a single, valid instant
> of the entire vmap_area_list, but rather a potential amalgam of
> different list states.

Hello,

Yes, you are right, but I don't think that we need to be strict here.
Meminfo is already not a 'snapshot' at a specific time: while we are
getting certain stats, other stats can change.
And although we may arrive at different results than the spinlocked
version, the difference would not be large and would not cause serious
side effects.

Thanks.

Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress

2014-06-10 Thread Pranith Kumar

Hi Paul,

On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
 wrote:
>>   if (rnp->gpnum != rnp->completed ||
>> - ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
>> + ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
>
> At this point in the code, we are checking the current rcu_node structure,
> which might or might not be the root.  If it is not the root, we absolutely
> cannot compare against the root because we don't yet hold the root's lock.
>

I was a bit thrown by the double check (rnp->gpnum != rnp->completed)
being done in that if condition: once without ACCESS_ONCE and once
with it. Is there any particular reason for this?
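For reference, in kernels of this vintage ACCESS_ONCE() is defined in include/linux/compiler.h as a volatile cast, which forces the compiler to emit exactly one real load wherever it appears, instead of merging, reusing or repeating the access. The user-space sketch below reproduces that macro (GNU C, written with __typeof__) and mirrors the shape of the condition using a hypothetical struct toy_rcu_node; it only illustrates what the two halves mean to the compiler and makes no claim about why the RCU code is written this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the era's definition in include/linux/compiler.h. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the two rcu_node fields being compared. */
struct toy_rcu_node { unsigned long gpnum, completed; };

bool toy_gp_in_progress(struct toy_rcu_node *rnp)
{
    /* Plain loads: the compiler is free to merge, reuse or repeat these. */
    return rnp->gpnum != rnp->completed ||
           /* Volatile loads: each field is read from memory exactly once here. */
           ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed);
}
```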

I now understand that we are comparing ->gpnum and ->completed of the
root node which might change from under us if we don't hold the root's
lock. I will keep looking :)

Thanks!
-- 
Pranith

Re: [PATCH] Staging: rtl8192e: dot11d: Fixed coding style issues

2014-06-10 Thread Greg KH

On Wed, Jun 11, 2014 at 09:25:47AM +0530, A Raghavendra Rao wrote:
> From: Raghavendra 
> 
> Fixed coding style issues

Which specific coding style issue?  Be exact please.

And don't try to fix more than one type of coding style issue at a
time...

> 
> Signed-off-by: A Raghavendra Rao 

This name doesn't match the From: line :(

thanks,

greg k-h

recvmmsg/sendmmsg result types inconsistent, integer overflows?

2014-06-10 Thread Rich Felker

While looking to add support for the recvmmsg and sendmmsg syscalls in
musl libc, I ran into some disturbing findings on the kernel side. In
the struct mmsghdr, the field where the result for each message is
stored has type int, which is inconsistent with the return type
ssize_t of recvmsg/sendmsg. So I tried to track down what happens when
the result is or would be larger than 2GB, and quickly found an
explanation for why the type in the structure was defined wrong:
internally, the kernel uses int as the return type for recvmsg and
sendmsg. Oops.

A bit more RTFS'ing brought me to tcp_sendmsg in net/ipv4/tcp.c (I
figured let's look at a stream-based protocol, since datagrams can
likely never be that big for any existing protocol), and as far as I
can tell, it's haphazardly mixing int and size_t with no checks for
overflows. I looked for anywhere the kernel might try to verify before
starting that the sum of the lengths of all the iovec components
doesn't overflow INT_MAX or even SIZE_MAX, but didn't find any such
checks.

Is there some magic that makes this all safe, or is this a big mess of
possibly-security-relevant bugs?
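A user-space sketch of the kind of up-front check being looked for here: sum the iovec lengths and refuse any total that would exceed a chosen limit (INT_MAX to fit an int-typed result, SSIZE_MAX for an ssize_t one), testing before each addition so the running sum itself can never wrap. The helper name iov_total_checked() is hypothetical; this is not the kernel's code, only an illustration of the check.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Sum iov[0..cnt) into *total. Fails, leaving *total untouched, if the sum
 * would exceed `limit`, e.g. INT_MAX for an int-typed result field. */
bool iov_total_checked(const struct iovec *iov, size_t cnt,
                       size_t limit, size_t *total)
{
    size_t sum = 0;

    for (size_t i = 0; i < cnt; i++) {
        /* sum <= limit holds here, so limit - sum cannot underflow. */
        if (iov[i].iov_len > limit - sum)
            return false;
        sum += iov[i].iov_len;
    }
    *total = sum;
    return true;
}
```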

Rich

Re: [PATCH 06/13] staging: rtl8188eu: Remove unused funtion _rtw_read_mem()

2014-06-10 Thread navin patidar

Sometimes I get confused by the "one patch should do only one thing"
policy; for example, this patch removes
many other things along with _rtw_read_mem().
But you are also right that it's much easier to review when they are all
folded together.
I'm glad I did it right this time. :)

regards,
navin patidar

On Tue, Jun 10, 2014 at 12:58 PM, Dan Carpenter
 wrote:
> Thanks.  This is much nicer to review when they are all folded together
> like this.
>
> regards,
> dan carpenter
>

Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress

2014-06-10 Thread Paul E. McKenney

On Tue, Jun 10, 2014 at 11:20:19PM -0400, Pranith Kumar wrote:
> The comment above the code says that we are checking both the current node and
> the parent node to see if a grace period is in progress. Change the code
> accordingly.

Almost...  Please see below.

Thanx, Paul

> Signed-off-by: Pranith Kumar 
> ---
>  kernel/rcu/tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f1ba773..b632189 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1227,7 +1227,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>* need to explicitly start one.
>*/
>   if (rnp->gpnum != rnp->completed ||
> - ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> + ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {

At this point in the code, we are checking the current rcu_node structure,
which might or might not be the root.  If it is not the root, we absolutely
cannot compare against the root because we don't yet hold the root's lock.

So I cannot take this change.

That said, I do heartily encourage you to keep looking.  After all, there
are bound to be at least a few bugs in RCU somewhere.

>   rnp->need_future_gp[c & 0x1]++;
>   trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
>   goto out;


Re: [RFC PATCH V2] rt/aio: fix rcu garbage collection might_sleep() splat

2014-06-10 Thread Mike Galbraith

On Tue, 2014-06-10 at 13:50 -0400, Benjamin LaHaise wrote: 
> On Tue, Jun 10, 2014 at 05:47:28AM +0200, Mike Galbraith wrote:
> > On Mon, 2014-06-09 at 10:08 +0800, Lai Jiangshan wrote: 
> > > Hi, rt-people
> > > 
> > > I don't think it is the correct direction.
> > > Softirq (including local_bh_disable()) in RT kernel should be preemptible.
> > 
> > How about the below then?
> > 
> > I was sorely tempted to post a tiny variant that dropped taking ctx_lock
> > in free_ioctx_users() entirely, as someone diddling with no reference
> > didn't make sense.  Cc Ben, he would know.
> 
> That should be okay...  Let's ask Kent to chime in on whether this looks 
> safe to him on the percpu ref front as well, since he's the one who wrote 
> this code.

Looking at the gizzard of our in-tree user of kiocb_set_cancel_fn()
(gadget), cancel() leads to dequeue() methods, which take other sleeping
locks, so the tiniest variant is not an option; the patchlet stands.

-Mike


Re: Disable bus's drivers_autoprobe before rootfs has mounted

2014-06-10 Thread Greg KH

On Wed, Jun 11, 2014 at 10:14:40AM +0800, Peter Chen wrote:
> Hi Greg,
> 
> Currently, we can't disable the autoprobe function during boot
> if both the device and the device driver registration code are built in,
> because .drivers_autoprobe is a private value of the bus core and
> can only be changed through its sysfs entry.

Then don't build them into the kernel :)

> As a result, we can't implement a feature that lets the user choose
> between manual binding and automatic binding through module parameters.

Wait, you just asked about building the stuff into the kernel, not a
module.

> E.g., the default binding is automatic, but the user can override
> it with a module parameter.

Do we do that for any other "bus" anywhere?

> Take the USB peripheral side as an example: there is a device for
> the UDC, and a device driver for the USB gadget driver. By default, we want
> the device to be bound to the driver automatically, which is what
> we do now. But if there is more than one UDC and gadget driver
> (e.g. one B port for mass storage, another B port for USB Ethernet),
> the user may want a specific binding (e.g. udc-0 -> mass storage,
> udc-1 -> USB Ethernet), so the binding will be established
> after the rootfs has been mounted. (This feature is being implemented.)

Then there had better be a way to describe this on the kernel command line
(i.e. module parameters), right?  Which is a total mess; why not just
not bind anything in this case and let the user pick what they want?

> From my reading of the code, we can't implement the above feature, but I may
> be wrong; if you have a solution, please give me some hints.
> If there is no solution for the above feature, would you agree to exporting
> .drivers_autoprobe for bus drivers, or something similar?

I don't understand what you mean by this, care to show me with code?

thanks,

greg k-h

[PATCH 1/1] perf/amd: NULL return of kzalloc_node should be handled

2014-06-10 Thread Zhouyi Zhou


Signed-off-by: Zhouyi Zhou 
---
 arch/x86/kernel/cpu/perf_event_amd_uncore.c |   32 +++
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_uncore.c b/arch/x86/kernel/cpu/perf_event_amd_uncore.c
index 3bbdf4c..f60a50e 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_uncore.c
@@ -300,24 +300,28 @@ static void amd_uncore_cpu_up_prepare(unsigned int cpu)
 
if (amd_uncore_nb) {
uncore = amd_uncore_alloc(cpu);
-   uncore->cpu = cpu;
-   uncore->num_counters = NUM_COUNTERS_NB;
-   uncore->rdpmc_base = RDPMC_BASE_NB;
-   uncore->msr_base = MSR_F15H_NB_PERF_CTL;
-   uncore->active_mask = &amd_nb_active_mask;
-   uncore->pmu = &amd_nb_pmu;
-   *per_cpu_ptr(amd_uncore_nb, cpu) = uncore;
+   if (uncore) {
+   uncore->cpu = cpu;
+   uncore->num_counters = NUM_COUNTERS_NB;
+   uncore->rdpmc_base = RDPMC_BASE_NB;
+   uncore->msr_base = MSR_F15H_NB_PERF_CTL;
+   uncore->active_mask = &amd_nb_active_mask;
+   uncore->pmu = &amd_nb_pmu;
+   *per_cpu_ptr(amd_uncore_nb, cpu) = uncore;
+   }
}
 
if (amd_uncore_l2) {
uncore = amd_uncore_alloc(cpu);
-   uncore->cpu = cpu;
-   uncore->num_counters = NUM_COUNTERS_L2;
-   uncore->rdpmc_base = RDPMC_BASE_L2;
-   uncore->msr_base = MSR_F16H_L2I_PERF_CTL;
-   uncore->active_mask = &amd_l2_active_mask;
-   uncore->pmu = &amd_l2_pmu;
-   *per_cpu_ptr(amd_uncore_l2, cpu) = uncore;
+   if (uncore) {
+   uncore->cpu = cpu;
+   uncore->num_counters = NUM_COUNTERS_L2;
+   uncore->rdpmc_base = RDPMC_BASE_L2;
+   uncore->msr_base = MSR_F16H_L2I_PERF_CTL;
+   uncore->active_mask = &amd_l2_active_mask;
+   uncore->pmu = &amd_l2_pmu;
+   *per_cpu_ptr(amd_uncore_l2, cpu) = uncore;
+   }
}
 }
 
-- 
1.7.10.4


[RFC PATCH v2] perf/amd: Try to fix some mem allocation failure handling

2014-06-10 Thread Zhouyi Zhou

Following Peter's advice, move the failure handling into a goto chain.
Compile-tested on x86_64; could you check whether I missed anything?
 
Signed-off-by: Zhouyi Zhou 
---
 arch/x86/kernel/cpu/perf_event_amd_uncore.c |  111 ---
 1 file changed, 84 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_uncore.c b/arch/x86/kernel/cpu/perf_event_amd_uncore.c
index 3bbdf4c..30790d7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_uncore.c
@@ -294,31 +294,41 @@ static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
cpu_to_node(cpu));
 }
 
-static void amd_uncore_cpu_up_prepare(unsigned int cpu)
+static int amd_uncore_cpu_up_prepare(unsigned int cpu)
 {
-   struct amd_uncore *uncore;
+   struct amd_uncore *uncore_nb = NULL, *uncore_l2;
 
if (amd_uncore_nb) {
-   uncore = amd_uncore_alloc(cpu);
-   uncore->cpu = cpu;
-   uncore->num_counters = NUM_COUNTERS_NB;
-   uncore->rdpmc_base = RDPMC_BASE_NB;
-   uncore->msr_base = MSR_F15H_NB_PERF_CTL;
-   uncore->active_mask = &amd_nb_active_mask;
-   uncore->pmu = &amd_nb_pmu;
-   *per_cpu_ptr(amd_uncore_nb, cpu) = uncore;
+   uncore_nb = amd_uncore_alloc(cpu);
+   if (!uncore_nb)
+   goto fail;
+   uncore_nb->cpu = cpu;
+   uncore_nb->num_counters = NUM_COUNTERS_NB;
+   uncore_nb->rdpmc_base = RDPMC_BASE_NB;
+   uncore_nb->msr_base = MSR_F15H_NB_PERF_CTL;
+   uncore_nb->active_mask = &amd_nb_active_mask;
+   uncore_nb->pmu = &amd_nb_pmu;
+   *per_cpu_ptr(amd_uncore_nb, cpu) = uncore_nb;
}
 
if (amd_uncore_l2) {
-   uncore = amd_uncore_alloc(cpu);
-   uncore->cpu = cpu;
-   uncore->num_counters = NUM_COUNTERS_L2;
-   uncore->rdpmc_base = RDPMC_BASE_L2;
-   uncore->msr_base = MSR_F16H_L2I_PERF_CTL;
-   uncore->active_mask = &amd_l2_active_mask;
-   uncore->pmu = &amd_l2_pmu;
-   *per_cpu_ptr(amd_uncore_l2, cpu) = uncore;
+   uncore_l2 = amd_uncore_alloc(cpu);
+   if (!uncore_l2)
+   goto fail;
+   uncore_l2->cpu = cpu;
+   uncore_l2->num_counters = NUM_COUNTERS_L2;
+   uncore_l2->rdpmc_base = RDPMC_BASE_L2;
+   uncore_l2->msr_base = MSR_F16H_L2I_PERF_CTL;
+   uncore_l2->active_mask = &amd_l2_active_mask;
+   uncore_l2->pmu = &amd_l2_pmu;
+   *per_cpu_ptr(amd_uncore_l2, cpu) = uncore_l2;
}
+
+   return 0;
+
+fail:
+   kfree(uncore_nb);
+   return -ENOMEM;
 }
 
 static struct amd_uncore *
@@ -441,7 +451,7 @@ static void uncore_dead(unsigned int cpu, struct amd_uncore * __percpu *uncores)
 
if (!--uncore->refcnt)
kfree(uncore);
-   *per_cpu_ptr(amd_uncore_nb, cpu) = NULL;
+   *per_cpu_ptr(uncores, cpu) = NULL;
 }
 
 static void amd_uncore_cpu_dead(unsigned int cpu)
@@ -461,7 +471,8 @@ amd_uncore_cpu_notifier(struct notifier_block *self, unsigned long action,
 
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
-   amd_uncore_cpu_up_prepare(cpu);
+   if (amd_uncore_cpu_up_prepare(cpu))
+   return notifier_from_errno(-ENOMEM);
break;
 
case CPU_STARTING:
@@ -501,20 +512,33 @@ static void __init init_cpu_already_online(void *dummy)
amd_uncore_cpu_online(cpu);
 }
 
+static void cleanup_cpu_online(void *dummy)
+{
+   unsigned int cpu = smp_processor_id();
+
+   amd_uncore_cpu_dead(cpu);
+}
+
 static int __init amd_uncore_init(void)
 {
-   unsigned int cpu;
+   unsigned int cpu, cpu2;
int ret = -ENODEV;
 
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
-   return -ENODEV;
+   goto fail_nodev;
 
if (!cpu_has_topoext)
-   return -ENODEV;
+   goto fail_nodev;
 
if (cpu_has_perfctr_nb) {
amd_uncore_nb = alloc_percpu(struct amd_uncore *);
-   perf_pmu_register(&amd_nb_pmu, amd_nb_pmu.name, -1);
+   if (!amd_uncore_nb) {
+   ret = -ENOMEM;
+   goto fail_nb;
+   }
+   ret = perf_pmu_register(&amd_nb_pmu, amd_nb_pmu.name, -1);
+   if (ret)
+   goto fail_nb;
 
printk(KERN_INFO "perf: AMD NB counters detected\n");
ret = 0;
@@ -522,20 +546,28 @@ static int __init amd_uncore_init(void)
 
if (cpu_has_perfctr_l2) {
amd_uncore_l2 = alloc_percpu(struct amd_uncore *);
-   perf_pmu_register(&amd_l2_pmu, amd_l2_pmu.name, -1);
+   if (!amd_uncore_l2) {
+   ret =

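A user-space sketch of the goto-chain pattern for two allocations. One detail worth checking in a chain like this: in the v2 hunk above, uncore_nb has already been stored in its per-CPU slot when the L2 allocation is attempted, so an unwind path generally has to clear that slot before freeing, or a stale pointer is left behind. The struct toy_slots type, the toy_up_prepare() name and the limited_alloc() failure-injection hook are hypothetical, and the function returns -1 where the kernel code returns -ENOMEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_uncore { int cpu; };

/* Stand-ins for the NB and L2 per-CPU slots of a single CPU. */
struct toy_slots { struct toy_uncore *nb, *l2; };

/* Failure-injection hook: succeeds allocs_left times, then returns NULL. */
int allocs_left;

void *limited_alloc(size_t size)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return calloc(1, size);
}

/* Allocate and publish both structures, or neither. Returns 0 or -1. */
int toy_up_prepare(struct toy_slots *s, int cpu, void *(*alloc)(size_t))
{
    struct toy_uncore *nb, *l2;

    nb = alloc(sizeof(*nb));
    if (!nb)
        goto fail;
    nb->cpu = cpu;
    s->nb = nb;                 /* published before the second allocation */

    l2 = alloc(sizeof(*l2));
    if (!l2)
        goto fail_unpublish_nb;
    l2->cpu = cpu;
    s->l2 = l2;
    return 0;

fail_unpublish_nb:
    s->nb = NULL;               /* clear the slot first so nothing points at freed memory */
    free(nb);
fail:
    return -1;
}
```

With allocs_left set to 1, the first allocation succeeds and the second fails, and both slots end up NULL.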
[PATCH] Staging: rtl8192e: dot11d: Fixed coding style issues

2014-06-10 Thread A Raghavendra Rao

From: Raghavendra 

Fixed coding style issues

Signed-off-by: A Raghavendra Rao 
---
 drivers/staging/rtl8192e/dot11d.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8192e/dot11d.c b/drivers/staging/rtl8192e/dot11d.c
index 53da610..ef9da86 100644
--- a/drivers/staging/rtl8192e/dot11d.c
+++ b/drivers/staging/rtl8192e/dot11d.c
@@ -49,6 +49,7 @@ static struct channel_list ChannelPlan[] = {
 void dot11d_init(struct rtllib_device *ieee)
 {
struct rt_dot11d_info *pDot11dInfo = GET_DOT11D_INFO(ieee);
+
pDot11dInfo->bEnabled = false;
 
pDot11dInfo->State = DOT11D_STATE_NONE;
@@ -133,12 +134,12 @@ void Dot11d_UpdateCountryIe(struct rtllib_device *dev, u8 *pTaddr,
pTriple = (struct chnl_txpow_triple *)(pCoutryIe + 3);
for (i = 0; i < NumTriples; i++) {
if (MaxChnlNum >= pTriple->FirstChnl) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it1\n");
return;
}
if (MAX_CHANNEL_NUMBER < (pTriple->FirstChnl +
pTriple->NumChnls)) {
-   printk(KERN_INFO "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
+   netdev_info(dev->dev, "Dot11d_UpdateCountryIe(): Invalid country IE, skip it2\n");
return;
}
 
@@ -165,7 +166,7 @@ u8 DOT11D_GetMaxTxPwrInDbm(struct rtllib_device *dev, u8 Channel)
u8 MaxTxPwrInDbm = 255;
 
if (MAX_CHANNEL_NUMBER < Channel) {
-   printk(KERN_INFO "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
+   netdev_info(dev->dev, "DOT11D_GetMaxTxPwrInDbm(): Invalid Channel\n");
return MaxTxPwrInDbm;
}
if (pDot11dInfo->channel_map[Channel])
@@ -204,7 +205,7 @@ int ToLegalChannel(struct rtllib_device *dev, u8 channel)
}
 
if (MAX_CHANNEL_NUMBER < channel) {
-   printk(KERN_ERR "%s(): Invalid Channel\n", __func__);
+   netdev_err(dev->dev, "%s(): Invalid Channel\n", __func__);
return default_chn;
}
 
-- 
1.7.9.5


Re: drivers/char/random.c: more ruminations

2014-06-10 Thread George Spelvin

> Actually, it's **fine**.  That's because RNDADDENTROPY adds the
> entropy to the input pool, which has the limit flag set.  So we
> will never pull more entropy than the pool is credited as having.
> This means that race can't happen.  It ***is*** safe.
> 
> 1)  Assume the entropy count starts at 10 bytes.
> 
> 2)  Random writer mixes in 20 bytes of entropy into the entropy pool.
> 
> 3)  Random extractor tries to extract 32 bytes of entropy.  Since the
> entropy count is still 10, it will only get 10 bytes.  (And if the
> entropy count had started at zero, we wouldn't extract
> any entropy at all.)
> 
> 4) Random writer credits the entropy counter with the 20 bytes mixed in
> step #2.
> 
> See? no problems!

You can forbid underflows, but the code doesn't forbid overflows.

1. Assume the entropy count starts at 512 bytes (input pool full)
2. Random writer mixes in 20 bytes of entropy into the input pool.
2a. Input pool entropy is, however, capped at 512 bytes.
3. Random extractor extracts 32 bytes of entropy from the pool.
   Succeeds because 32 < 512.  Pool is left with 480 bytes of
   entropy.
3a. Random extractor decrements pool entropy estimate to 480 bytes.
This is accurate.
4. Random writer credits pool with 20 bytes of entropy.
5. Input pool entropy is now 480 bytes, estimate is 500 bytes.

Problem, no?
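Both sequences can be replayed with a small user-space model, a sketch under simplifying assumptions rather than random.c itself: `actual` is the entropy really in the pool, `estimate` is the credited count, both are capped at the pool size, mixing and crediting are separate steps, and an extractor may take no more than the current estimate. All names are hypothetical and all quantities are in bytes.

```c
#include <assert.h>

/* Toy accounting for one pool: what is really there vs. what is credited. */
struct toy_acct { int actual, estimate, cap; };

static int min_int(int a, int b) { return a < b ? a : b; }

/* The writer mixes data in; anything beyond the cap is simply lost. */
void acct_mix(struct toy_acct *p, int n)
{
    p->actual = min_int(p->actual + n, p->cap);
}

/* The writer credits the estimate afterwards, also capped. */
void acct_credit(struct toy_acct *p, int n)
{
    p->estimate = min_int(p->estimate + n, p->cap);
}

/* An extractor may take no more than the current estimate. */
int acct_extract(struct toy_acct *p, int n)
{
    int take = min_int(n, p->estimate);

    p->actual -= take;
    p->estimate -= take;
    return take;
}
```

Replaying the first sequence (count starts at 10, mix 20, extract 32, credit 20) leaves actual and estimate both at 20, while the second (pool starts full at 512) leaves actual at 480 and the estimate at 500.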

[PATCH] brcmfmac: prevent watchdog from interfering with scanning and connecting

2014-06-10 Thread Fu, Zhonghui

From 14485894add32aedacb3e486ebb2cc2b73861abf Mon Sep 17 00:00:00 2001
From: Fu zhonghui 
Date: Wed, 11 Jun 2014 11:06:55 +0800
Subject: [PATCH] brcmfmac: prevent watchdog from interfering with scanning and connecting

The watchdog in the brcmfmac driver may put the WiFi chip into sleep mode
before scanning or connecting has completed.

This leads to scanning or connecting failures.

Temporarily increasing the idle-time threshold during scanning or
connecting can ensure that scanning or connecting succeeds without
watchdog interference.

Signed-off-by: Fu zhonghui 
---
 drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c |   18 --
 .../net/wireless/brcm80211/brcmfmac/wl_cfg80211.c  |3 ++-
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
index 13c89a0..729deab 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,10 @@
 #include "sdio_host.h"
 #include "chip.h"
 #include "nvram.h"
+#include "dhd.h"
+#include "fwil_types.h"
+#include "p2p.h"
+#include "wl_cfg80211.h"
 
 #define DCMD_RESP_TIMEOUT  2000/* In milli second */
 
@@ -307,6 +312,7 @@ struct rte_console {
 * when idle
 */
 #define BRCMF_IDLE_INTERVAL 1
+#define BRCMF_IDLE_INTERVAL_SCANNING_CONNECTING 100
 
 #define KSO_WAIT_US 50
 #define MAX_KSO_ATTEMPTS (PMU_MAX_TRANSITION_DLY/KSO_WAIT_US)
@@ -3613,9 +3619,9 @@ void brcmf_sdio_isr(struct brcmf_sdio *bus)
 
 static bool brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus)
 {
-#ifdef DEBUG
struct brcmf_bus *bus_if = dev_get_drvdata(bus->sdiodev->dev);
-#endif /* DEBUG */
+   struct brcmf_cfg80211_info *cfg = bus_if->drvr->config;
+   struct brcmf_if *ifp = cfg->pub->iflist[0];
 
brcmf_dbg(TIMER, "Enter\n");
 
@@ -3678,6 +3684,14 @@ static bool brcmf_sdio_bus_watchdog(struct brcmf_sdio *bus)
 
/* On idle timeout clear activity flag and/or turn off clock */
if ((bus->idletime > 0) && (bus->clkstate == CLK_AVAIL)) {
+
+   if (test_bit(BRCMF_SCAN_STATUS_BUSY, &cfg->scan_status) ||
+   test_bit(BRCMF_VIF_STATUS_CONNECTING, &ifp->vif->sme_state)) {
+   bus->idletime = BRCMF_IDLE_INTERVAL_SCANNING_CONNECTING;
+   } else {
+   bus->idletime = BRCMF_IDLE_INTERVAL;
+   }
+
if (++bus->idlecount >= bus->idletime) {
bus->idlecount = 0;
if (bus->activity) {
diff --git a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
index be19852..e76517e 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
@@ -913,6 +913,8 @@ brcmf_cfg80211_escan(struct wiphy *wiphy, struct brcmf_cfg80211_vif *vif,
return -EAGAIN;
}
 
+   set_bit(BRCMF_SCAN_STATUS_BUSY, &cfg->scan_status);
+
/* If scan req comes for p2p0, send it over primary I/F */
if (vif == cfg->p2p.bss_idx[P2PAPI_BSSCFG_DEVICE].vif)
vif = cfg->p2p.bss_idx[P2PAPI_BSSCFG_PRIMARY].vif;
@@ -933,7 +935,6 @@ brcmf_cfg80211_escan(struct wiphy *wiphy, struct brcmf_cfg80211_vif *vif,
}
 
cfg->scan_request = request;
-   set_bit(BRCMF_SCAN_STATUS_BUSY, &cfg->scan_status);
if (escan_req) {
cfg->escan_info.run = brcmf_run_escan;
err = brcmf_p2p_scan_prep(wiphy, request, vif);
-- 1.7.1

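A simplified user-space model of the threshold switch this patch adds: each watchdog tick recomputes the idle threshold from the scan/connect state and only turns the clock off once the idle counter reaches it. This is a sketch under stated assumptions; it leaves out the driver's activity flag and locking, and struct toy_bus and toy_watchdog_tick() are hypothetical. The two interval values (1 and 100) are the ones the patch defines.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IDLE_INTERVAL       1    /* BRCMF_IDLE_INTERVAL in the patch */
#define TOY_IDLE_INTERVAL_BUSY  100  /* BRCMF_IDLE_INTERVAL_SCANNING_CONNECTING */

struct toy_bus {
    int idletime;    /* idle ticks tolerated before the clock is turned off */
    int idlecount;   /* idle ticks seen so far */
    bool clock_on;   /* stands in for clkstate == CLK_AVAIL */
};

/* One watchdog tick. Returns true if this tick turned the clock off. */
bool toy_watchdog_tick(struct toy_bus *bus, bool scanning, bool connecting)
{
    if (bus->idletime <= 0 || !bus->clock_on)
        return false;

    /* The patch's change: widen the window while a scan or connect runs. */
    bus->idletime = (scanning || connecting) ? TOY_IDLE_INTERVAL_BUSY
                                             : TOY_IDLE_INTERVAL;

    if (++bus->idlecount >= bus->idletime) {
        bus->idlecount = 0;
        bus->clock_on = false;   /* stands in for turning the bus clock off */
        return true;
    }
    return false;
}
```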

Disable bus's drivers_autoprobe before rootfs has mounted

2014-06-10 Thread Peter Chen

Hi Greg,

Currently, we can't disable the autoprobe function during boot
if both the device and the device driver registration code are built in,
because .drivers_autoprobe is a private value of the bus core and
can only be changed through its sysfs entry.

As a result, we can't implement a feature that lets the user choose
between manual binding and automatic binding through module parameters.
E.g., the default binding is automatic, but the user can override
it with a module parameter.

Take the USB peripheral side as an example: there is a device for
the UDC, and a device driver for the USB gadget driver. By default, we want
the device to be bound to the driver automatically, which is what
we do now. But if there is more than one UDC and gadget driver
(e.g. one B port for mass storage, another B port for USB Ethernet),
the user may want a specific binding (e.g. udc-0 -> mass storage,
udc-1 -> USB Ethernet), so the binding will be established
after the rootfs has been mounted. (This feature is being implemented.)

From my reading of the code, we can't implement the above feature, but I may
be wrong; if you have a solution, please give me some hints.
If there is no solution for the above feature, would you agree to exporting
.drivers_autoprobe for bus drivers, or something similar?
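For context, the runtime knob mentioned here is the per-bus sysfs attribute /sys/bus/<bus>/drivers_autoprobe, and a manual bind is done by writing a device name to /sys/bus/<bus>/drivers/<driver>/bind. The sketch below does both from C; the bus, driver and device names are placeholders supplied by the caller, and sysfs_write() and manual_bind() are hypothetical helpers. Writing 0 to drivers_autoprobe only affects devices and drivers registered afterwards and does not unbind anything, and since this can only run once userspace is up, it cannot prevent bindings that built-in drivers already made during boot, which is the limitation described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to an existing attribute file; returns 0 on success. */
int sysfs_write(const char *path, const char *val)
{
    size_t len = strlen(val);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, val, len);
    int ret = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        ret = -1;
    return ret;
}

/* Stop the bus core from auto-binding devices and drivers registered from
 * now on, then bind one device by hand. Returns 0 on success. */
int manual_bind(const char *bus, const char *drv, const char *dev)
{
    char path[256];
    int n;

    n = snprintf(path, sizeof path, "/sys/bus/%s/drivers_autoprobe", bus);
    if (n < 0 || (size_t)n >= sizeof path || sysfs_write(path, "0") != 0)
        return -1;

    n = snprintf(path, sizeof path, "/sys/bus/%s/drivers/%s/bind", bus, drv);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return sysfs_write(path, dev);
}
```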

-- 

Best Regards,
Peter Chen

Re: [PATCH 07/10] mm: rename allocflags_to_migratetype for clarity

2014-06-10 Thread Zhang Yanfei

On 06/11/2014 10:41 AM, Minchan Kim wrote:
> On Mon, Jun 09, 2014 at 11:26:19AM +0200, Vlastimil Babka wrote:
>> From: David Rientjes 
>>
>> The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like
>> ALLOC_CPUSET) that have separate semantics.
>>
>> The function allocflags_to_migratetype() actually takes gfp flags, not alloc
>> flags, and returns a migratetype.  Rename it to gfpflags_to_migratetype().
>>
>> Signed-off-by: David Rientjes 
>> Signed-off-by: Vlastimil Babka 
> 
> I was one of the people who got confused at times.

Some names in MM really confuse people. But sometimes thinking of
an appropriate name is also hard. For example, I once wanted to rename
the functions nr_free_zone_pages() and nr_free_buffer_pages(),
but it was hard to find good names for them, so in the end Andrew
suggested just adding detailed function descriptions to make them clear.

Reviewed-by: Zhang Yanfei 

> 
> Acked-by: Minchan Kim 
> 


-- 
Thanks.
Zhang Yanfei

Re: [PATCH v2] vmalloc: use rcu list iterator to reduce vmap_area_lock contention

2014-06-10 Thread Peter Hurley


On 06/10/2014 10:19 PM, Joonsoo Kim wrote:

Richard Yao reported a month ago that his system had trouble
with vmap_area_lock contention during performance analysis
via /proc/meminfo. Andrew asked why his analysis reads /proc/meminfo
so heavily, but he didn't answer.

https://lkml.org/lkml/2014/4/10/416

Although I'm not sure whether this is the right usage, there is a solution
that reduces vmap_area_lock contention with no side effects: just
use the rcu list iterator in get_vmalloc_info().

RCU can be used in this function because writers have already respected
the full RCU protocol since Nick Piggin's commit db64fe02258f1507e13fe5
("mm: rewrite vmap layer") back in linux-2.6.28.


While rcu list traversal over the vmap_area_list is safe, this may
arrive at different results than the spinlocked version. The rcu list
traversal version will not be a 'snapshot' of a single, valid instant
of the entire vmap_area_list, but rather a potential amalgam of
different list states.

This is because the vmap_area_list can continue to change during
list traversal.

Regards,
Peter Hurley


Specifically:
insertions use list_add_rcu(),
deletions use list_del_rcu() and kfree_rcu().

Note the rb tree is not used by RCU readers (it would not be safe);
only the vmap_area_list has full RCU protection.

Note that __purge_vmap_area_lazy() already uses this RCU protection.

	rcu_read_lock();
	list_for_each_entry_rcu(va, &vmap_area_list, list) {
		if (va->flags & VM_LAZY_FREE) {
			if (va->va_start < *start)
				*start = va->va_start;
			if (va->va_end > *end)
				*end = va->va_end;
			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
			list_add_tail(&va->purge_list, &valist);
			va->flags |= VM_LAZY_FREEING;
			va->flags &= ~VM_LAZY_FREE;
		}
	}
	rcu_read_unlock();

v2: add more commit description from Eric

[eduma...@google.com: add more commit description]
Reported-by: Richard Yao 
Acked-by: Eric Dumazet 
Signed-off-by: Joonsoo Kim 

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f64632b..fdbb116 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2690,14 +2690,14 @@ void get_vmalloc_info(struct vmalloc_info *vmi)

prev_end = VMALLOC_START;

-   spin_lock(&vmap_area_lock);
+   rcu_read_lock();

if (list_empty(&vmap_area_list)) {
vmi->largest_chunk = VMALLOC_TOTAL;
goto out;
}

-   list_for_each_entry(va, &vmap_area_list, list) {
+   list_for_each_entry_rcu(va, &vmap_area_list, list) {
unsigned long addr = va->va_start;

/*
@@ -2724,7 +2724,7 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
vmi->largest_chunk = VMALLOC_END - prev_end;

  out:
-   spin_unlock(&vmap_area_lock);
+   rcu_read_unlock();
  }
  #endif





[PATCH 2/3] random: remove unneeded hash of a portion of the entropy pool

2014-06-10 Thread Theodore Ts'o

We previously extracted a portion of the entropy pool in
mix_pool_bytes() and hashed it in to prevent racing CPUs from returning
duplicate random values.  Now that we are using a spinlock to prevent
this from happening, this is no longer necessary.  So remove it, to
simplify the code a bit.

Signed-off-by: Theodore Ts'o 
Cc: George Spelvin 
---
 drivers/char/random.c | 51 ---
 1 file changed, 20 insertions(+), 31 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 76d2f53..97f390b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -481,9 +481,9 @@ static __u32 const twist_table[8] = {
  * the entropy is concentrated in the low-order bits.
  */
 static void _mix_pool_bytes(struct entropy_store *r, const void *in,
-   int nbytes, __u8 out[64])
+   int nbytes)
 {
-   unsigned long i, j, tap1, tap2, tap3, tap4, tap5;
+   unsigned long i, tap1, tap2, tap3, tap4, tap5;
int input_rotate;
int wordmask = r->poolinfo->poolwords - 1;
const char *bytes = in;
@@ -525,27 +525,23 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
 
r->input_rotate = input_rotate;
r->add_ptr = i;
-
-   if (out)
-   for (j = 0; j < 16; j++)
-   ((__u32 *)out)[j] = r->pool[(i - j) & wordmask];
 }
 
 static void __mix_pool_bytes(struct entropy_store *r, const void *in,
-int nbytes, __u8 out[64])
+int nbytes)
 {
trace_mix_pool_bytes_nolock(r->name, nbytes, _RET_IP_);
-   _mix_pool_bytes(r, in, nbytes, out);
+   _mix_pool_bytes(r, in, nbytes);
 }
 
 static void mix_pool_bytes(struct entropy_store *r, const void *in,
-  int nbytes, __u8 out[64])
+  int nbytes)
 {
unsigned long flags;
 
trace_mix_pool_bytes(r->name, nbytes, _RET_IP_);
spin_lock_irqsave(&r->lock, flags);
-   _mix_pool_bytes(r, in, nbytes, out);
+   _mix_pool_bytes(r, in, nbytes);
spin_unlock_irqrestore(&r->lock, flags);
 }
 
@@ -737,13 +733,13 @@ void add_device_randomness(const void *buf, unsigned int size)
 
trace_add_device_randomness(size, _RET_IP_);
spin_lock_irqsave(&input_pool.lock, flags);
-   _mix_pool_bytes(&input_pool, buf, size, NULL);
-   _mix_pool_bytes(&input_pool, &time, sizeof(time), NULL);
+   _mix_pool_bytes(&input_pool, buf, size);
+   _mix_pool_bytes(&input_pool, &time, sizeof(time));
spin_unlock_irqrestore(&input_pool.lock, flags);
 
spin_lock_irqsave(&nonblocking_pool.lock, flags);
-   _mix_pool_bytes(&nonblocking_pool, buf, size, NULL);
-   _mix_pool_bytes(&nonblocking_pool, &time, sizeof(time), NULL);
+   _mix_pool_bytes(&nonblocking_pool, buf, size);
+   _mix_pool_bytes(&nonblocking_pool, &time, sizeof(time));
spin_unlock_irqrestore(&nonblocking_pool.lock, flags);
 }
 EXPORT_SYMBOL(add_device_randomness);
@@ -776,7 +772,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
sample.cycles = random_get_entropy();
sample.num = num;
r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
-   mix_pool_bytes(r, &sample, sizeof(sample), NULL);
+   mix_pool_bytes(r, &sample, sizeof(sample));
 
/*
 * Calculate number of bits of randomness we probably added.
@@ -864,7 +860,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
return;
}
fast_pool->last = now;
-   __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
+   __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
 
/*
 * If we have architectural seed generator, produce a seed and
@@ -873,7 +869,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
 */
credit = 1;
if (arch_get_random_seed_long(&seed)) {
-   __mix_pool_bytes(r, &seed, sizeof(seed), NULL);
+   __mix_pool_bytes(r, &seed, sizeof(seed));
credit += sizeof(seed) * 4;
}
spin_unlock(&r->lock);
@@ -954,7 +950,7 @@ static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
  ENTROPY_BITS(r), ENTROPY_BITS(r->pull));
bytes = extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_bits / 8, rsvd_bytes);
-   mix_pool_bytes(r, tmp, bytes, NULL);
+   mix_pool_bytes(r, tmp, bytes);
credit_entropy_bits(r, bytes*8);
 }
 
@@ -1029,7 +1025,6 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
unsigned long l[LONGS(20)];
} hash;
__u32 workspace[SHA_WORKSPACE_WORDS];
-   __u8 extract[64];
unsigned long flags;
 
/*
@@ -1058,15 +1053,9 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
 * brute-forcing the feedback as hard as brute-forcing the
 * hash.
 */
-   __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
+

[PATCH 1/3] random: always update the entropy pool under the spinlock

2014-06-10 Thread Theodore Ts'o

Instead of using the lockless techniques introduced in commit
902c098a3663, use spin_trylock to try to grab the entropy pool's lock.  If
we can't get the lock, then just try again on the next interrupt.

Based on discussions with George Spelvin.

Signed-off-by: Theodore Ts'o 
Cc: George Spelvin 
---
 drivers/char/random.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 102c50d..76d2f53 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -495,9 +495,8 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
tap4 = r->poolinfo->tap4;
tap5 = r->poolinfo->tap5;
 
-   smp_rmb();
-   input_rotate = ACCESS_ONCE(r->input_rotate);
-   i = ACCESS_ONCE(r->add_ptr);
+   input_rotate = r->input_rotate;
+   i = r->add_ptr;
 
/* mix one byte at a time to simplify size handling and churn faster */
while (nbytes--) {
@@ -524,9 +523,8 @@ static void _mix_pool_bytes(struct entropy_store *r, const void *in,
input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
}
 
-   ACCESS_ONCE(r->input_rotate) = input_rotate;
-   ACCESS_ONCE(r->add_ptr) = i;
-   smp_wmb();
+   r->input_rotate = input_rotate;
+   r->add_ptr = i;
 
if (out)
for (j = 0; j < 16; j++)
@@ -860,17 +858,31 @@ void add_interrupt_randomness(int irq, int irq_flags)
if ((fast_pool->count & 63) && !time_after(now, fast_pool->last + HZ))
return;
 
-   fast_pool->last = now;
-
r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+   if (!spin_trylock(&r->lock)) {
+   fast_pool->count--;
+   return;
+   }
+   fast_pool->last = now;
__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
 
/*
+* If we have architectural seed generator, produce a seed and
+* add it to the pool.  For the sake of paranoia count it as
+* 50% entropic.
+*/
+   credit = 1;
+   if (arch_get_random_seed_long(&seed)) {
+   __mix_pool_bytes(r, &seed, sizeof(seed), NULL);
+   credit += sizeof(seed) * 4;
+   }
+   spin_unlock(&r->lock);
+
+   /*
 * If we don't have a valid cycle counter, and we see
 * back-to-back timer interrupts, then skip giving credit for
 * any entropy, otherwise credit 1 bit.
 */
-   credit = 1;
if (cycles == 0) {
if (irq_flags & __IRQF_TIMER) {
if (fast_pool->last_timer_intr)
@@ -880,16 +892,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
fast_pool->last_timer_intr = 0;
}
 
-   /*
-* If we have architectural seed generator, produce a seed and
-* add it to the pool.  For the sake of paranoia count it as
-* 50% entropic.
-*/
-   if (arch_get_random_seed_long(&seed)) {
-   __mix_pool_bytes(r, &seed, sizeof(seed), NULL);
-   credit += sizeof(seed) * 4;
-   }
-
credit_entropy_bits(r, credit);
 }
 
-- 
2.0.0


[PATCH 3/3] random: only update the last_pulled time if we actually transferred entropy

2014-06-10 Thread Theodore Ts'o

In xfer_secondary_pool(), check to make sure we need to pull from the
secondary pool before checking and potentially updating the
last_pulled time.

Signed-off-by: Theodore Ts'o 
Cc: George Spelvin 
---
 drivers/char/random.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 97f390b..4bb6e37 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -919,6 +919,11 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
 static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes);
 static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 {
+   if (!r->pull ||
+   r->entropy_count >= (nbytes << (ENTROPY_SHIFT + 3)) ||
+   r->entropy_count > r->poolinfo->poolfracbits)
+   return;
+
if (r->limit == 0 && random_min_urandom_seed) {
unsigned long now = jiffies;
 
@@ -927,10 +932,8 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
return;
r->last_pulled = now;
}
-   if (r->pull &&
-   r->entropy_count < (nbytes << (ENTROPY_SHIFT + 3)) &&
-   r->entropy_count < r->poolinfo->poolfracbits)
-   _xfer_secondary_pool(r, nbytes);
+
+   _xfer_secondary_pool(r, nbytes);
 }
 
 static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
-- 
2.0.0


[PATCH 0/3] random driver improvements

2014-06-10 Thread Theodore Ts'o

After going through a very long thread, and trying to work out things in
words, I was frustrated enough that I decided a good way to improve the
conversation was to do it in code instead of words.

I don't think any of this should be controversial...

- Ted

Theodore Ts'o (3):
  random: always update the entropy pool under the spinlock
  random: remove unneeded hash of a portion of the entropy pool
  random: only update the last_pulled time if we actually transferred
entropy

 drivers/char/random.c | 98 ---
 1 file changed, 46 insertions(+), 52 deletions(-)

-- 
2.0.0


Re: [PATCH 05/10] mm, compaction: remember position within pageblock in free pages scanner

2014-06-10 Thread Zhang Yanfei

On 06/09/2014 05:26 PM, Vlastimil Babka wrote:
> Unlike the migration scanner, the free scanner remembers the beginning of the
> last scanned pageblock in cc->free_pfn. It might therefore rescan pages
> uselessly when called several times during a single compaction. This might have
> been useful when pages were returned to the buddy allocator after a failed
> migration, but this is no longer the case.
> 
> This patch changes the meaning of cc->free_pfn so that if it points to the
> middle of a pageblock, that pageblock is scanned only from cc->free_pfn to the
> end. isolate_freepages_block() will record the pfn of the last page it looked
> at, which is then used to update cc->free_pfn.
> 
> In the mmtests stress-highalloc benchmark, this has resulted in lowering the
> ratio between pages scanned by both scanners, from 2.5 free pages per migrate
> page, to 2.25 free pages per migrate page, without affecting success rates.
> 
> Signed-off-by: Vlastimil Babka 

Reviewed-by: Zhang Yanfei 

> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Joonsoo Kim 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> Cc: David Rientjes 
> ---
>  mm/compaction.c | 33 -
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 83f72bd..58dfaaa 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -297,7 +297,7 @@ static bool suitable_migration_target(struct page *page)
>   * (even though it may still end up isolating some pages).
>   */
>  static unsigned long isolate_freepages_block(struct compact_control *cc,
> - unsigned long blockpfn,
> + unsigned long *start_pfn,
>   unsigned long end_pfn,
>   struct list_head *freelist,
>   bool strict)
> @@ -306,6 +306,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   struct page *cursor, *valid_page = NULL;
>   unsigned long flags;
>   bool locked = false;
> + unsigned long blockpfn = *start_pfn;
>  
>   cursor = pfn_to_page(blockpfn);
>  
> @@ -314,6 +315,9 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   int isolated, i;
>   struct page *page = cursor;
>  
> + /* Record how far we have got within the block */
> + *start_pfn = blockpfn;
> +
>   /*
>* Periodically drop the lock (if held) regardless of its
>* contention, to give chance to IRQs. Abort async compaction
> @@ -424,6 +428,9 @@ isolate_freepages_range(struct compact_control *cc,
>   LIST_HEAD(freelist);
>  
>   for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
> + /* Protect pfn from changing by isolate_freepages_block */
> + unsigned long isolate_start_pfn = pfn;
> +
>   if (!pfn_valid(pfn) || cc->zone != page_zone(pfn_to_page(pfn)))
>   break;
>  
> @@ -434,8 +441,8 @@ isolate_freepages_range(struct compact_control *cc,
>   block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
>   block_end_pfn = min(block_end_pfn, end_pfn);
>  
> - isolated = isolate_freepages_block(cc, pfn, block_end_pfn,
> - &freelist, true);
> + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> + block_end_pfn, &freelist, true);
>  
>   /*
>* In strict mode, isolate_freepages_block() returns 0 if
> @@ -774,6 +781,7 @@ static void isolate_freepages(struct zone *zone,
>   block_end_pfn = block_start_pfn,
>   block_start_pfn -= pageblock_nr_pages) {
>   unsigned long isolated;
> + unsigned long isolate_start_pfn;
>  
>   /*
>* This can iterate a massively long zone without finding any
> @@ -807,12 +815,27 @@ static void isolate_freepages(struct zone *zone,
>   continue;
>  
>   /* Found a block suitable for isolating free pages from */
> - cc->free_pfn = block_start_pfn;
> - isolated = isolate_freepages_block(cc, block_start_pfn,
> + isolate_start_pfn = block_start_pfn;
> +
> + /*
> +  * If we are restarting the free scanner in this block, do not
> +  * rescan the beginning of the block
> +  */
> + if (cc->free_pfn < block_end_pfn)
> + isolate_start_pfn = cc->free_pfn;
> +
> + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
>   block_end_pfn, freelist, false);
>   nr_freepages += isolated;
>  
>   /*
> +  * Remember where the free scanner should restart next

[PATCH v6 5/9] seccomp: split mode set routines

2014-06-10 Thread Kees Cook

Extracts the common check/assign logic, and separates the two mode
setting paths to make things more readable with fewer #ifdefs within
function bodies.

Signed-off-by: Kees Cook 
---
 kernel/seccomp.c |  124 +-
 1 file changed, 85 insertions(+), 39 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7ec99b99e400..39d32c2904fc 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -195,7 +195,29 @@ static u32 seccomp_run_filters(int syscall)
}
return ret;
 }
+#endif /* CONFIG_SECCOMP_FILTER */
 
+static inline bool seccomp_check_mode(struct task_struct *task,
+ unsigned long seccomp_mode)
+{
+   BUG_ON(!spin_is_locked(&task->sighand->siglock));
+
+   if (task->seccomp.mode && task->seccomp.mode != seccomp_mode)
+   return false;
+
+   return true;
+}
+
+static inline void seccomp_assign_mode(struct task_struct *task,
+  unsigned long seccomp_mode)
+{
+   BUG_ON(!spin_is_locked(&task->sighand->siglock));
+
+   task->seccomp.mode = seccomp_mode;
+   set_tsk_thread_flag(task, TIF_SECCOMP);
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -486,69 +508,86 @@ long prctl_get_seccomp(void)
 }
 
 /**
- * seccomp_set_mode: internal function for setting seccomp mode
- * @seccomp_mode: requested mode to use
- * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
+ * seccomp_set_mode_strict: internal function for setting strict seccomp
  *
- * This function may be called repeatedly with a @seccomp_mode of
- * SECCOMP_MODE_FILTER to install additional filters.  Every filter
- * successfully installed will be evaluated (in reverse order) for each system
- * call the task makes.
+ * Once current->seccomp.mode is non-zero, it may not be changed.
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+static long seccomp_set_mode_strict(void)
+{
+   const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
+   unsigned long irqflags;
+   int ret = -EINVAL;
+
+   if (unlikely(!lock_task_sighand(current, &irqflags)))
+   return -EINVAL;
+
+   if (!seccomp_check_mode(current, seccomp_mode))
+   goto out;
+
+#ifdef TIF_NOTSC
+   disable_TSC();
+#endif
+   seccomp_assign_mode(current, seccomp_mode);
+   ret = 0;
+
+out:
+   unlock_task_sighand(current, &irqflags);
+
+   return ret;
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
+/**
+ * seccomp_set_mode_filter: internal function for setting seccomp filter
+ * @filter: struct sock_fprog containing filter
+ *
+ * This function may be called repeatedly to install additional filters.
+ * Every filter successfully installed will be evaluated (in reverse order)
+ * for each system call the task makes.
  *
  * Once current->seccomp.mode is non-zero, it may not be changed.
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
+static long seccomp_set_mode_filter(char __user *filter)
 {
+   const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
struct seccomp_filter *prepared = NULL;
unsigned long irqflags;
long ret = -EINVAL;
 
-#ifdef CONFIG_SECCOMP_FILTER
-   /* Prepare the new filter outside of the seccomp lock. */
-   if (seccomp_mode == SECCOMP_MODE_FILTER) {
-   prepared = seccomp_prepare_user_filter(filter);
-   if (IS_ERR(prepared))
-   return PTR_ERR(prepared);
-   }
-#endif
+   /* Prepare the new filter outside of any locking. */
+   prepared = seccomp_prepare_user_filter(filter);
+   if (IS_ERR(prepared))
+   return PTR_ERR(prepared);
 
if (unlikely(!lock_task_sighand(current, &irqflags)))
goto out_free;
 
-   if (current->seccomp.mode &&
-   current->seccomp.mode != seccomp_mode)
+   if (!seccomp_check_mode(current, seccomp_mode))
goto out;
 
-   switch (seccomp_mode) {
-   case SECCOMP_MODE_STRICT:
-   ret = 0;
-#ifdef TIF_NOTSC
-   disable_TSC();
-#endif
-   break;
-#ifdef CONFIG_SECCOMP_FILTER
-   case SECCOMP_MODE_FILTER:
-   ret = seccomp_attach_filter(prepared);
-   if (ret)
-   goto out;
-   /* Do not free the successfully attached filter. */
-   prepared = NULL;
-   break;
-#endif
-   default:
+   ret = seccomp_attach_filter(prepared);
+   if (ret)
goto out;
-   }
+   /* Do not free the successfully attached filter. */
+   prepared = NULL;
 
-   current->seccomp.mode = seccomp_mode;
-   set_thread_flag(TIF_SECCOMP);
+   seccomp_assign_mode(current, seccomp_mode);
 out:
unlock_task_sighand(current, &irqflags);
 out_free:

[PATCH v6 9/9] MIPS: add seccomp syscall

2014-06-10 Thread Kees Cook

Wires up the new seccomp syscall.

Signed-off-by: Kees Cook 
---
 arch/mips/include/uapi/asm/unistd.h |   15 +--
 arch/mips/kernel/scall32-o32.S  |1 +
 arch/mips/kernel/scall64-64.S   |1 +
 arch/mips/kernel/scall64-n32.S  |1 +
 arch/mips/kernel/scall64-o32.S  |1 +
 5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index 5805414777e0..9bc13eaf9d67 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -372,16 +372,17 @@
 #define __NR_sched_setattr (__NR_Linux + 349)
 #define __NR_sched_getattr (__NR_Linux + 350)
 #define __NR_renameat2 (__NR_Linux + 351)
+#define __NR_seccomp   (__NR_Linux + 352)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls 351
+#define __NR_Linux_syscalls 352
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux 4000
-#define __NR_O32_Linux_syscalls 351
+#define __NR_O32_Linux_syscalls 352
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -701,16 +702,17 @@
 #define __NR_sched_setattr (__NR_Linux + 309)
 #define __NR_sched_getattr (__NR_Linux + 310)
 #define __NR_renameat2 (__NR_Linux + 311)
+#define __NR_seccomp   (__NR_Linux + 312)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls 311
+#define __NR_Linux_syscalls 312
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux  5000
-#define __NR_64_Linux_syscalls 311
+#define __NR_64_Linux_syscalls 312
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -1034,15 +1036,16 @@
 #define __NR_sched_setattr (__NR_Linux + 313)
 #define __NR_sched_getattr (__NR_Linux + 314)
 #define __NR_renameat2 (__NR_Linux + 315)
+#define __NR_seccomp   (__NR_Linux + 316)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls 315
+#define __NR_Linux_syscalls 316
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux 6000
-#define __NR_N32_Linux_syscalls 315
+#define __NR_N32_Linux_syscalls 316
 
 #endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 3245474f19d5..ab02d14f1b5c 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -578,3 +578,4 @@ EXPORT(sys_call_table)
PTR sys_sched_setattr
PTR sys_sched_getattr   /* 4350 */
PTR sys_renameat2
+   PTR sys_seccomp
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index be2fedd4ae33..010dccf128ec 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -431,4 +431,5 @@ EXPORT(sys_call_table)
PTR sys_sched_setattr
PTR sys_sched_getattr   /* 5310 */
PTR sys_renameat2
+   PTR sys_seccomp
.size   sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index c1dbcda4b816..c3b3b6525df5 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -424,4 +424,5 @@ EXPORT(sysn32_call_table)
PTR sys_sched_setattr
PTR sys_sched_getattr
PTR sys_renameat2   /* 6315 */
+   PTR sys_seccomp
.size   sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index f1343ccd7ed7..bb1550b1f501 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -557,4 +557,5 @@ EXPORT(sys32_call_table)
PTR sys_sched_setattr
PTR sys_sched_getattr   /* 4350 */
PTR sys_renameat2
+   PTR sys_seccomp
.size   sys32_call_table,.-sys32_call_table
-- 
1.7.9.5


[PATCH v6 3/9] seccomp: introduce writer locking

2014-06-10 Thread Kees Cook

Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to
the seccomp fields are now protected by the sighand spinlock (which
is unique to the thread group). Read access remains lockless because
pointer updates themselves are atomic.  However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the tasklist_lock. Then parent and child are certain to have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook 
---
 include/linux/seccomp.h |6 +++---
 kernel/fork.c   |   40 
 kernel/seccomp.c|   22 --
 3 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 4054b0994071..9ff98b4bfe2e 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -14,11 +14,11 @@ struct seccomp_filter;
  *
  * @mode:  indicates one of the valid values above for controlled
  * system calls available to a process.
- * @filter: The metadata and ruleset for determining what system calls
- *  are allowed for a task.
+ * @filter: must always point to a valid seccomp-filter or NULL as it is
+ *  accessed without locking during system call entry.
  *
  *  @filter must only be accessed from the context of current as there
- *  is no locking.
+ *  is no read locking.
  */
 struct seccomp {
int mode;
diff --git a/kernel/fork.c b/kernel/fork.c
index d2799d1fc952..6b2a9add1079 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -315,6 +315,15 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
goto free_ti;
 
tsk->stack = ti;
+#ifdef CONFIG_SECCOMP
+   /*
+* We must handle setting up seccomp filters once we're under
+* the tasklist_lock in case orig has changed between now and
+* then. Until then, filter must be NULL to avoid messing up
+* the usage counts on the error path calling free_task.
+*/
+   tsk->seccomp.filter = NULL;
+#endif
 
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
@@ -1081,6 +1090,23 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
return 0;
 }
 
+static void copy_seccomp(struct task_struct *p)
+{
+#ifdef CONFIG_SECCOMP
+   /*
+* Must be called with sighand->lock held. Child lock not needed
+* since it is not yet in tasklist.
+*/
+   BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+   get_seccomp_filter(current);
+   p->seccomp = current->seccomp;
+
+   if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
+   set_tsk_thread_flag(p, TIF_SECCOMP);
+#endif
+}
+
 SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
 {
current->clear_child_tid = tidptr;
@@ -1142,6 +1168,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 {
int retval;
struct task_struct *p;
+   unsigned long irqflags;
 
if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
@@ -1196,7 +1223,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
goto fork_out;
 
ftrace_graph_init_task(p);
-   get_seccomp_filter(p);
 
rt_mutex_init_task(p);
 
@@ -1434,7 +1460,13 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->parent_exec_id = current->self_exec_id;
}
 
-   spin_lock(&current->sighand->siglock);
+   spin_lock_irqsave(&current->sighand->siglock, irqflags);
+
+   /*
+* Copy seccomp details explicitly here, in case they were changed
+* before holding tasklist_lock.
+*/
+   copy_seccomp(p);
 
/*
 * Process group and session signals need to be delivered to just the
@@ -1446,7 +1478,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
*/
recalc_sigpending();
if (signal_pending(current)) {
-   spin_unlock(&current->sighand->siglock);
+   spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
write_unlock_irq(&tasklist_lock);
retval = -ERESTARTNOINTR;
goto bad_fork_free_pid;
@@ -1486,7 +1518,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
}

[PATCH v6 7/9] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC

2014-06-10 Thread Kees Cook

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions are against another thread requesting TSYNC, another
thread performing a clone, and another thread changing its filter. The
sighand lock is sufficient for these cases, though the clone case is
assisted by the tasklist_lock so that new threads must have a duplicate
of their parent's seccomp state when they appear on the tasklist.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes 
Signed-off-by: Kees Cook 
---
 arch/Kconfig |1 +
 include/uapi/linux/seccomp.h |4 ++
 kernel/seccomp.c |  135 +-
 3 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 97ff872c7acc..0eae9df35b88 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -321,6 +321,7 @@ config HAVE_ARCH_SECCOMP_FILTER
  - secure_computing is called from a ptrace_event()-safe context
  - secure_computing return value is checked and a return value of -1
results in the system call being skipped immediately.
+ - seccomp syscall wired up
 
 config SECCOMP_FILTER
def_bool y
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index b258878ba754..3e651f757b48 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -14,6 +14,10 @@
 #define SECCOMP_SET_MODE_STRICT0
 #define SECCOMP_SET_MODE_FILTER1
 
+/* Valid flags for SECCOMP_SET_MODE_FILTER */
+#define SECCOMP_FILTER_FLAG_TSYNC  1
+#define SECCOMP_FILTER_FLAG_MASK   ~(SECCOMP_FILTER_FLAG_TSYNC)
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index c0cafa9e84af..d03d470ca36b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,7 @@
 #ifdef CONFIG_SECCOMP_FILTER
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -219,6 +220,107 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
+/* Returns 1 if the candidate is an ancestor. */
+static int is_ancestor(struct seccomp_filter *candidate,
+  struct seccomp_filter *child)
+{
+   /* NULL is the root ancestor. */
+   if (candidate == NULL)
+   return 1;
+   for (; child; child = child->prev)
+   if (child == candidate)
+   return 1;
+   return 0;
+}
+
+/**
+ * seccomp_can_sync_threads: checks if all threads can be synchronized
+ *
+ * Expects both tasklist_lock and current->sighand->siglock to be held.
+ *
+ * Returns 0 on success, -ve on error, or the pid of a thread which was
+ * either not in the correct seccomp mode or it did not have an ancestral
+ * seccomp filter.
+ */
+static pid_t seccomp_can_sync_threads(void)
+{
+   struct task_struct *thread, *caller;
+
+   BUG_ON(write_can_lock(&tasklist_lock));
+   BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+   if (current->seccomp.mode != SECCOMP_MODE_FILTER)
+   return -EACCES;
+
+   /* Validate all threads being eligible for synchronization. */
+   thread = caller = current;
+   for_each_thread(caller, thread) {
+   pid_t failed;
+
+   if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
+   (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
+is_ancestor(thread->seccomp.filter,
+caller->seccomp.filter)))
+   continue;
+
+   /* Return the first thread that cannot be synchronized. */
+   failed = task_pid_vnr(thread);
+   /* If the pid cannot be resolved, then return -ESRCH */
+   if (failed ==

[PATCH v6 4/9] seccomp: move no_new_privs into seccomp

2014-06-10 Thread Kees Cook

Since seccomp transitions between threads require that updates to the
no_new_privs flag be atomic, this moves the nnp flag into the seccomp
field as a separate unsigned long for atomic access.
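The reason for a dedicated word: a one-bit bitfield such as `unsigned no_new_privs:1` shares storage with neighbouring bitfields, so setting it is a plain read-modify-write that can clobber a concurrent update to a neighbour. With a whole unsigned long reserved for flags, the kernel can use atomic bit operations (set_bit()/test_bit()). The sketch below is a userspace analogue using C11 atomics; the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_FLAG_NO_NEW_PRIVS 0 /* bit number, like SECCOMP_FLAG_NO_NEW_PRIVS */

struct seccomp_model {
	atomic_ulong flags; /* whole word reserved for atomically updated flags */
};

/* Userspace analogue of task_set_no_new_privs(): atomic OR of one bit. */
static void model_set_no_new_privs(struct seccomp_model *s)
{
	atomic_fetch_or(&s->flags, 1UL << MODEL_FLAG_NO_NEW_PRIVS);
}

/* Userspace analogue of task_no_new_privs(): atomic read of one bit. */
static bool model_no_new_privs(struct seccomp_model *s)
{
	return (atomic_load(&s->flags) >> MODEL_FLAG_NO_NEW_PRIVS) & 1UL;
}
```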

Signed-off-by: Kees Cook 
Acked-by: Andy Lutomirski 
---
 fs/exec.c  |4 ++--
 include/linux/sched.h  |   13 ++---
 include/linux/seccomp.h|8 +++-
 kernel/seccomp.c   |2 +-
 kernel/sys.c   |4 ++--
 security/apparmor/domain.c |4 ++--
 6 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 238b7aa26f68..614fcb993739 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1233,7 +1233,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 * This isn't strictly necessary, but it makes it harder for LSMs to
 * mess up.
 */
-   if (current->no_new_privs)
+   if (task_no_new_privs(current))
bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
 
t = p;
@@ -1271,7 +1271,7 @@ int prepare_binprm(struct linux_binprm *bprm)
bprm->cred->egid = current_egid();
 
if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
-   !current->no_new_privs &&
+   !task_no_new_privs(current) &&
kuid_has_mapping(bprm->cred->user_ns, inode->i_uid) &&
kgid_has_mapping(bprm->cred->user_ns, inode->i_gid)) {
/* Set-uid? */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea74596014a2..50b41affb7b1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1307,9 +1307,6 @@ struct task_struct {
 * execve */
unsigned in_iowait:1;
 
-   /* task may not gain privileges */
-   unsigned no_new_privs:1;
-
/* Revert to default priority/policy when forking */
unsigned sched_reset_on_fork:1;
unsigned sched_contributes_to_load:1;
@@ -2525,6 +2522,16 @@ static inline void task_unlock(struct task_struct *p)
spin_unlock(&p->alloc_lock);
 }
 
+static inline bool task_no_new_privs(struct task_struct *p)
+{
+   return test_bit(SECCOMP_FLAG_NO_NEW_PRIVS, &p->seccomp.flags);
+}
+
+static inline void task_set_no_new_privs(struct task_struct *p)
+{
+   set_bit(SECCOMP_FLAG_NO_NEW_PRIVS, &p->seccomp.flags);
+}
+
 extern struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
unsigned long *flags);
 
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9ff98b4bfe2e..6a5e2d0ec912 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,6 +3,8 @@
 
 #include 
 
+#define SECCOMP_FLAG_NO_NEW_PRIVS  0   /* task may not gain privs */
+
 #ifdef CONFIG_SECCOMP
 
 #include 
@@ -16,6 +18,7 @@ struct seccomp_filter;
  * system calls available to a process.
  * @filter: must always point to a valid seccomp-filter or NULL as it is
  *  accessed without locking during system call entry.
+ * @flags: flags under task->sighand->siglock lock
  *
  *  @filter must only be accessed from the context of current as there
  *  is no read locking.
@@ -23,6 +26,7 @@ struct seccomp_filter;
 struct seccomp {
int mode;
struct seccomp_filter *filter;
+   unsigned long flags;
 };
 
 extern int __secure_computing(int);
@@ -51,7 +55,9 @@ static inline int seccomp_mode(struct seccomp *s)
 
 #include 
 
-struct seccomp { };
+struct seccomp {
+   unsigned long flags;
+};
 struct seccomp_filter { };
 
 static inline int secure_computing(int this_syscall) { return 0; }
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 33655302b658..7ec99b99e400 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -219,7 +219,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 * This avoids scenarios where unprivileged tasks can affect the
 * behavior of privileged children.
 */
-   if (!current->no_new_privs &&
+   if (!task_no_new_privs(current) &&
security_capable_noaudit(current_cred(), current_user_ns(),
 CAP_SYS_ADMIN) != 0)
return ERR_PTR(-EACCES);
diff --git a/kernel/sys.c b/kernel/sys.c
index 66a751ebf9d9..ce8129192a26 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1990,12 +1990,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
if (arg2 != 1 || arg3 || arg4 || arg5)
return -EINVAL;
 
-   current->no_new_privs = 1;
+   task_set_no_new_privs(current);
break;
case PR_GET_NO_NEW_PRIVS:
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
-   return current->no_new_privs ? 1 : 0;
+   return task_no_new_privs(current) ? 1 : 0;
case PR_GET_THP_DISABLE:
if (arg2 || arg3 || arg4

[PATCH v6 1/9] seccomp: create internal mode-setting function

2014-06-10 Thread Kees Cook

In preparation for having other callers of the seccomp mode setting
logic, split the prctl entry point away from the core logic that performs
seccomp mode setting.
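The refactoring pattern (a thin public entry point delegating to an internal core function, so that a later caller such as a new syscall can share the same logic) can be sketched in plain C. The mode values mirror SECCOMP_MODE_STRICT (1) and SECCOMP_MODE_FILTER (2) from uapi/linux/seccomp.h; the function names, the global standing in for current->seccomp.mode, and the -EFAULT placeholder for a failed filter copy are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MODE_STRICT 1 /* same value as SECCOMP_MODE_STRICT */
#define MODEL_MODE_FILTER 2 /* same value as SECCOMP_MODE_FILTER */

static int model_current_mode; /* hypothetical stand-in for current->seccomp.mode */

/* Internal core: validates the request and records the new mode. */
static long model_set_mode(unsigned long mode, const char *filter)
{
	switch (mode) {
	case MODEL_MODE_STRICT:
		break;
	case MODEL_MODE_FILTER:
		if (filter == NULL)
			return -EFAULT; /* placeholder for a failed user copy */
		break;
	default:
		return -EINVAL;
	}
	model_current_mode = (int)mode;
	return 0;
}

/* Thin entry point kept for the existing prctl-style caller. */
static long model_prctl_set_mode(unsigned long mode, const char *filter)
{
	return model_set_mode(mode, filter);
}
```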

Signed-off-by: Kees Cook 
---
 kernel/seccomp.c |   16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f6d76bebe69f..552b972b8f83 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -465,7 +465,7 @@ long prctl_get_seccomp(void)
 }
 
 /**
- * prctl_set_seccomp: configures current->seccomp.mode
+ * seccomp_set_mode: internal function for setting seccomp mode
  * @seccomp_mode: requested mode to use
  * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
  *
@@ -478,7 +478,7 @@ long prctl_get_seccomp(void)
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
+static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
 {
long ret = -EINVAL;
 
@@ -509,3 +509,15 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 out:
return ret;
 }
+
+/**
+ * prctl_set_seccomp: configures current->seccomp.mode
+ * @seccomp_mode: requested mode to use
+ * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
+{
+   return seccomp_set_mode(seccomp_mode, filter);
+}
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v6 2/9] seccomp: split filter prep from check and apply

2014-06-10 Thread Kees Cook

In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.
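After this split, the prepare step returns either a filter or an error encoded with ERR_PTR(): the kernel reserves the top MAX_ERRNO (4095) values of the address space for negative errno values, so one pointer can carry either result. The sketch below re-implements that convention in userspace under hypothetical model_* names and pairs it with a prepare step that only validates and allocates, which is the work that should stay outside any spinlock. BPF_MAXINSNS is 4096 in the kernel headers; struct prep_filter is a hypothetical stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO    4095 /* same bound as the kernel's MAX_ERRNO */
#define MODEL_BPF_MAXINSNS 4096 /* same value as BPF_MAXINSNS */

static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct prep_filter {
	struct prep_filter *prev;
	unsigned short len; /* number of BPF instructions */
};

/* Hypothetical prepare step: validate and allocate, but do not attach. */
static struct prep_filter *model_prepare(unsigned int len)
{
	struct prep_filter *f;

	if (len == 0 || len > MODEL_BPF_MAXINSNS)
		return model_err_ptr(-EINVAL);
	f = calloc(1, sizeof(*f));
	if (!f)
		return model_err_ptr(-ENOMEM);
	f->len = (unsigned short)len;
	return f;
}
```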

Signed-off-by: Kees Cook 
---
 kernel/seccomp.c |   86 --
 1 file changed, 58 insertions(+), 28 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 552b972b8f83..7a9257ddd69c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* #define SECCOMP_DEBUG 1 */
 
@@ -26,7 +27,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -197,27 +197,21 @@ static u32 seccomp_run_filters(int syscall)
 }
 
 /**
- * seccomp_attach_filter: Attaches a seccomp filter to current.
+ * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
  *
- * Returns 0 on success or an errno on failure.
+ * Returns filter on success or an ERR_PTR on failure.
  */
-static long seccomp_attach_filter(struct sock_fprog *fprog)
+static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
struct seccomp_filter *filter;
unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
-   unsigned long total_insns = fprog->len;
struct sock_filter *fp;
int new_len;
long ret;
 
if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
-   return -EINVAL;
-
-   for (filter = current->seccomp.filter; filter; filter = filter->prev)
-   total_insns += filter->len + 4;  /* include a 4 instr penalty */
-   if (total_insns > MAX_INSNS_PER_PATH)
-   return -ENOMEM;
+   return ERR_PTR(-EINVAL);
 
/*
 * Installing a seccomp filter requires that the task has
@@ -228,11 +222,11 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
if (!current->no_new_privs &&
security_capable_noaudit(current_cred(), current_user_ns(),
 CAP_SYS_ADMIN) != 0)
-   return -EACCES;
+   return ERR_PTR(-EACCES);
 
fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN);
if (!fp)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
/* Copy the instructions from fprog. */
ret = -EFAULT;
@@ -270,31 +264,26 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
atomic_set(&filter->usage, 1);
filter->len = new_len;
 
-   /*
-* If there is an existing filter, make it the prev and don't drop its
-* task reference.
-*/
-   filter->prev = current->seccomp.filter;
-   current->seccomp.filter = filter;
-   return 0;
+   return filter;
 
 free_filter:
kfree(filter);
 free_prog:
kfree(fp);
-   return ret;
+   return ERR_PTR(ret);
 }
 
 /**
- * seccomp_attach_user_filter - attaches a user-supplied sock_fprog
+ * seccomp_prepare_user_filter - prepares a user-supplied sock_fprog
  * @user_filter: pointer to the user data containing a sock_fprog.
  *
- * Returns 0 on success and non-zero otherwise.
+ * Returns filter on success and ERR_PTR otherwise.
  */
-static long seccomp_attach_user_filter(char __user *user_filter)
+static
+struct seccomp_filter *seccomp_prepare_user_filter(char __user *user_filter)
 {
struct sock_fprog fprog;
-   long ret = -EFAULT;
+   struct seccomp_filter *filter = ERR_PTR(-EFAULT);
 
 #ifdef CONFIG_COMPAT
if (is_compat_task()) {
@@ -307,9 +296,37 @@ static long seccomp_attach_user_filter(char __user *user_filter)
 #endif
if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
goto out;
-   ret = seccomp_attach_filter(&fprog);
+   filter = seccomp_prepare_filter(&fprog);
 out:
-   return ret;
+   return filter;
+}
+
+/**
+ * seccomp_attach_filter: validate and attach filter
+ * @filter: seccomp filter to add to the current process
+ *
+ * Returns 0 on success, -ve on error.
+ */
+static long seccomp_attach_filter(struct seccomp_filter *filter)
+{
+   unsigned long total_insns;
+   struct seccomp_filter *walker;
+
+   /* Validate resulting filter length. */
+   total_insns = filter->len;
+   for (walker = current->seccomp.filter; walker; walker = filter->prev)
+   total_insns += walker->len + 4;  /* include a 4 instr penalty */
+   if (total_insns > MAX_INSNS_PER_PATH)
+   return -ENOMEM;
+
+   /*
+* If there is an existing filter, make it the prev and don't drop its
+* task reference.
+*/
+   filter->prev = current->seccomp.filter;
+   current->seccomp.filter = filter;
+
+   return 0;
 }
 
 /* get_seccomp_filter - increments the reference count of the filter on @tsk */
@@ -480,8 +497,18 @@ long prctl_get_seccomp(void)
  */
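The length check that moves into seccomp_attach_filter() sums the new filter's length plus each already-attached filter's length with a 4-instruction penalty per filter, and rejects the total if it exceeds MAX_INSNS_PER_PATH. Below is a userspace sketch of that attach step. The limit is passed in as a parameter rather than assuming the kernel constant's value, and struct chain_filter and chain_attach() are hypothetical names. The sketch advances the cursor with `walker = walker->prev`. In the hunk as posted, the loop steps with `walker = filter->prev`; because the new filter's `prev` is not assigned until after the check, that appears to count only the first existing filter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct chain_filter {
	struct chain_filter *prev;
	unsigned long len; /* instructions in this filter */
};

/*
 * Sketch of the attach step: count instructions along the existing chain
 * (plus a 4-instruction penalty per filter), then link the new filter.
 * max_insns stands in for MAX_INSNS_PER_PATH.
 */
static long chain_attach(struct chain_filter **head, struct chain_filter *filter,
			 unsigned long max_insns)
{
	unsigned long total_insns = filter->len;
	struct chain_filter *walker;

	for (walker = *head; walker; walker = walker->prev)
		total_insns += walker->len + 4;
	if (total_insns > max_insns)
		return -ENOMEM;

	filter->prev = *head; /* existing chain becomes the parent */
	*head = filter;
	return 0;
}
```

With a limit of 30 and three 10-instruction filters, the third attach totals 10 + 14 + 14 = 38 and is rejected, leaving the chain unchanged.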

[PATCH v6 0/9] seccomp: add thread sync ability

2014-06-10 Thread Kees Cook

[re-send with smaller CC list]

This adds the ability for threads to request seccomp filter
synchronization across their thread group (at filter attach time).
For example, this lets Chrome make sure graphics driver threads are fully
confined after seccomp filters have been attached.

To support this, locking on seccomp changes is introduced, along with
refactoring of no_new_privs. Races with thread creation/death are handled
via tasklist_lock.

This includes a new syscall (instead of adding a new prctl option),
as suggested by Andy Lutomirski and Michael Kerrisk.

Thanks!

-Kees

v6:
 - switch from seccomp-specific lock to thread-group lock to gain atomicity
 - implement seccomp syscall across all architectures with seccomp filter
 - clean up sparse warnings around locking
v5:
 - move includes around (drysdale)
 - drop set_nnp return value (luto)
 - use smp_load_acquire/store_release (luto)
 - merge nnp changes to seccomp always, fewer ifdef (luto)
v4:
 - cleaned up locking further, as noticed by David Drysdale
v3:
 - added SECCOMP_EXT_ACT_FILTER for new filter install options
v2:
 - reworked to avoid clone races

[PATCH v6 6/9] seccomp: add "seccomp" syscall

2014-06-10 Thread Kees Cook

This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
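The argument handling for an operation/flags/pointer syscall can be modelled as a small dispatcher. This is a hypothetical userspace sketch, not the kernel function: the operation values are copied from the uapi hunk of this patch (renamed with an OP_ prefix to avoid clashing with real headers), it enforces the "flags must be 0" rule stated above, and it assumes strict mode also rejects a non-NULL pointer. The -EFAULT return for a NULL filter pointer is a placeholder for a failed user copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Operation values copied from the include/uapi/linux/seccomp.h hunk. */
#define OP_SET_MODE_STRICT 0
#define OP_SET_MODE_FILTER 1

/* Hypothetical model of the syscall's argument checks only. */
static long model_seccomp(unsigned int op, unsigned int flags,
			  const void *uargs)
{
	switch (op) {
	case OP_SET_MODE_STRICT:
		/* Assumed: strict mode takes neither flags nor a pointer. */
		if (flags != 0 || uargs != NULL)
			return -EINVAL;
		return 0;
	case OP_SET_MODE_FILTER:
		/* No filter flags are defined yet at this point in the series. */
		if (flags != 0)
			return -EINVAL;
		if (uargs == NULL)
			return -EFAULT; /* placeholder for a failed user copy */
		return 0;
	default:
		return -EINVAL;
	}
}
```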

Signed-off-by: Kees Cook 
Cc: linux-...@vger.kernel.org
---
 arch/x86/syscalls/syscall_32.tbl  |1 +
 arch/x86/syscalls/syscall_64.tbl  |1 +
 include/linux/syscalls.h  |2 ++
 include/uapi/asm-generic/unistd.h |4 ++-
 include/uapi/linux/seccomp.h  |4 +++
 kernel/seccomp.c  |   63 -
 kernel/sys_ni.c   |3 ++
 7 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index d6b867921612..7527eac24122 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -360,3 +360,4 @@
 351i386sched_setattr   sys_sched_setattr
 352i386sched_getattr   sys_sched_getattr
 353i386renameat2   sys_renameat2
+354i386seccomp sys_seccomp
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index ec255a1646d2..16272a6c12b7 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -323,6 +323,7 @@
 314common  sched_setattr   sys_sched_setattr
 315common  sched_getattr   sys_sched_getattr
 316common  renameat2   sys_renameat2
+317common  seccomp sys_seccomp
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b0881a0ed322..1713977ee26f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
 unsigned long idx1, unsigned long idx2);
 asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
+asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
+   const char __user *uargs);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 333640608087..65acbf0e2867 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -699,9 +699,11 @@ __SYSCALL(__NR_sched_setattr, sys_sched_setattr)
 __SYSCALL(__NR_sched_getattr, sys_sched_getattr)
 #define __NR_renameat2 276
 __SYSCALL(__NR_renameat2, sys_renameat2)
+#define __NR_seccomp 277
+__SYSCALL(__NR_seccomp, sys_seccomp)
 
 #undef __NR_syscalls
-#define __NR_syscalls 277
+#define __NR_syscalls 278
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index ac2dc9f72973..b258878ba754 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -10,6 +10,10 @@
 #define SECCOMP_MODE_STRICT1 /* uses hard-coded filter. */
 #define SECCOMP_MODE_FILTER2 /* uses user-supplied filter. */
 
+/* Valid operations for seccomp syscall. */
+#define SECCOMP_SET_MODE_STRICT0
+#define SECCOMP_SET_MODE_FILTER1
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 39d32c2904fc..c0cafa9e84af 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* #define SECCOMP_DEBUG 1 */
 
@@ -301,8 +302,8 @@ free_prog:
  *
  * Returns filter on success and ERR_PTR otherwise.
  */
-static
-struct seccomp_filter *seccomp_prepare_user_filter(char __user *user_filter)
+static struct seccomp_filter *
+seccomp_prepare_user_filter(const char __user *user_filter)
 {
struct sock_fprog fprog;
struct seccomp_filter *filter = ERR_PTR(-EFAULT);
@@ -325,19 +326,25 @@ out:
 
 /**
  * seccomp_attach_filter: validate and attach filter
+ * @flags:  flags to change filter behavior
  * @filter: seccomp filter to add to the current process
  *
  * Caller must be holding current->sighand->siglock lock.
  *
  * Returns 0 on success, -ve on error.
  */
-static long seccomp_attach_filter(struct seccomp_filter *filter)
+static long seccomp_attach_filter(unsigned int flags,
+ struct seccomp_filter *filter)
 {
unsigned long total_insns;
struct seccomp_filter *walker;
 
BUG_ON(!spin_is_locked(&current->sighand->siglock));
 
+   /* Validate flags. */
+   if (flags != 0)
+   return -EINVAL;
+
/* Validate resulting filter length. */
total_insns = filter->len;
for (walker = current->seccomp.filter; walker; walker = filter->prev)
@@ -541,6 +548,7 @@ out:
 #ifdef CONFIG_SECCOMP_FILTER
 /**
  * seccomp_set_mode_filter:

[PATCH v6 8/9] ARM: add seccomp syscall

2014-06-10 Thread Kees Cook

Wires up the new seccomp syscall.
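The tail of calls.S in this patch rounds the number of table entries up to a multiple of four with `((NR_syscalls + 3) & ~3) - NR_syscalls`. The arithmetic can be checked in isolation. The function name is hypothetical, and the inputs in the checks are arbitrary values, not ARM's actual NR_syscalls.

```c
#include <assert.h>
#include <limits.h>

/*
 * Mirrors the calls.S expression
 *   syscalls_padding = ((NR_syscalls + 3) & ~3) - NR_syscalls
 * i.e. how many filler entries round the table up to a multiple of 4.
 */
static unsigned int syscalls_padding(unsigned int nr_syscalls)
{
	assert(nr_syscalls <= UINT_MAX - 3); /* keep the +3 from wrapping */
	return ((nr_syscalls + 3) & ~3u) - nr_syscalls;
}
```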

Signed-off-by: Kees Cook 
---
 arch/arm/include/uapi/asm/unistd.h |1 +
 arch/arm/kernel/calls.S|1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ba94446c72d9..e21b4a069701 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -409,6 +409,7 @@
 #define __NR_sched_setattr (__NR_SYSCALL_BASE+380)
 #define __NR_sched_getattr (__NR_SYSCALL_BASE+381)
 #define __NR_renameat2 (__NR_SYSCALL_BASE+382)
+#define __NR_seccomp   (__NR_SYSCALL_BASE+383)
 
 /*
  * This may need to be greater than __NR_last_syscall+1 in order to
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 8f51bdcdacbb..bea85f97f363 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -392,6 +392,7 @@
 /* 380 */  CALL(sys_sched_setattr)
CALL(sys_sched_getattr)
CALL(sys_renameat2)
+   CALL(sys_seccomp)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
-- 
1.7.9.5

Re: [PATCH 0/4] KEYS: validate key trust with owner and builtin keys only

2014-06-10 Thread Matthew Garrett

On Tue, Jun 10, 2014 at 11:08:15PM -0400, Mimi Zohar wrote:
> On Wed, 2014-06-11 at 03:22 +0100, Matthew Garrett wrote: 
> > Providing a userspace mechanism for selectively dropping keys from the 
> > kernel seems like a good thing?
> 
> No, patch "KEYS: verify a certificate is signed by a 'trusted' key" adds
> signed public keys.

Yes. Wouldn't having a mechanism to allow userspace to drop keys that 
have otherwise been imported be a generally useful solution to the issue 
you have with that?

-- 
Matthew Garrett | mj...@srcf.ucam.org
Re: [PATCH] mm/vmscan.c: avoid recording the original scan targets in shrink_lruvec()

2014-06-10 Thread Chen Yucong

On Tue, 2014-06-10 at 16:33 -0700, Andrew Morton wrote:
> >   break;
> >  
> >   if (nr_file > nr_anon) {
> > - unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
> > -   targets[LRU_ACTIVE_ANON] + 1;
> > + nr_to_scan = nr_file - ratio * nr_anon;
> > + percentage = nr[LRU_FILE] * 100 / nr_file;
> 
> here, nr_file and nr_anon are derived from the contents of nr[].  But
> nr[] was modified in the for_each_evictable_lru() loop, so its
> contents
> now may differ from what was in targets[]? 

nr_to_scan is used for recording the number of pages that should be
scanned to keep the original *ratio*.

Assuming (nr_file > nr_anon) is true, nr_to_scan should be distributed to
nr[LRU_ACTIVE_FILE] and nr[LRU_INACTIVE_FILE] in proportion.

nr_file = nr[LRU_ACTIVE_FILE] + nr[LRU_INACTIVE_FILE];
percentage = nr[LRU_FILE] / nr_file;

Note that, compared with the *old* percentage, the "new" percentage has a
different meaning: it is only used to divide the nr_to_scan pages
appropriately.
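The proportional split described here can be sketched as a standalone helper. This is a hypothetical illustration of the distribution step only, not code from the patch: it computes an integer percentage for the inactive list as `nr * 100 / nr_file`, applies it to nr_to_scan, and (a choice made for this sketch) gives the remainder to the active list so the two shares always add up to nr_to_scan.

```c
#include <assert.h>

/*
 * Hypothetical helper: split nr_to_scan between the active and inactive
 * file lists in proportion to their current sizes, via an integer
 * percentage. The active list receives the rounding remainder.
 */
static void split_file_scan(unsigned long nr_active, unsigned long nr_inactive,
			    unsigned long nr_to_scan,
			    unsigned long *scan_active, unsigned long *scan_inactive)
{
	unsigned long nr_file = nr_active + nr_inactive;
	unsigned long percentage;

	if (nr_file == 0) {
		*scan_active = *scan_inactive = 0;
		return;
	}
	percentage = nr_inactive * 100 / nr_file;
	*scan_inactive = nr_to_scan * percentage / 100;
	*scan_active = nr_to_scan - *scan_inactive;
}
```

For example, with 100 active and 300 inactive file pages and nr_to_scan = 40, the inactive list gets 75%, i.e. 30 pages, and the active list gets 10.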

thx!
cyc 

[RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress

2014-06-10 Thread Pranith Kumar

The comment above the code says that we are checking both the current node and
the parent node to see if a grace period is in progress. Change the code
accordingly.
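The intended check, where a grace period counts as in progress if either the leaf rcu_node or the root rcu_node has gpnum != completed, can be modelled with a pair of counters per node. This is a simplified userspace sketch with hypothetical names. A volatile read stands in roughly for ACCESS_ONCE(), and both nodes are read that way here for simplicity, whereas the kernel code reads the leaf plainly under its lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the two struct rcu_node counters. */
struct node_model {
	unsigned long gpnum;     /* most recently started grace period */
	unsigned long completed; /* most recently completed grace period */
};

/* Volatile read, roughly what ACCESS_ONCE() does in the kernel. */
static unsigned long read_once_ul(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

static bool gp_in_progress(const struct node_model *n)
{
	return read_once_ul(&n->gpnum) != read_once_ul(&n->completed);
}

/* Corrected check: a GP is in progress if the leaf or the root says so. */
static bool gp_in_progress_leaf_or_root(const struct node_model *leaf,
					const struct node_model *root)
{
	return gp_in_progress(leaf) || gp_in_progress(root);
}
```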

Signed-off-by: Pranith Kumar 
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f1ba773..b632189 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1227,7 +1227,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 * need to explicitly start one.
 */
if (rnp->gpnum != rnp->completed ||
-   ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+   ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
rnp->need_future_gp[c & 0x1]++;
trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
goto out;
-- 
1.9.1

Re: [PATCH 0/4] KEYS: validate key trust with owner and builtin keys only

2014-06-10 Thread Mimi Zohar

On Wed, 2014-06-11 at 03:22 +0100, Matthew Garrett wrote: 
> On Tue, Jun 10, 2014 at 09:24:53PM -0400, Mimi Zohar wrote:
> > On Tue, 2014-06-10 at 22:40 +0100, Matthew Garrett wrote: 
> > > The hole is that the system trusts keys that you don't trust. The 
> > > appropriate thing to do is to remove that trust from the entire system, 
> > > not just one layer of the system. If people gain the impression that 
> > > they can simply pass a kernel parameter and avoid trusting the vendor 
> > > keys, they'll be upset to discover that it's easily circumvented.
> > 
> > Assuming I remove all the keys I don't trust, there are still keys that
> > are trusted while booting, but are not necessary afterwards.  We should
> > be able to limit the scope of where and when keys are trusted.
> 
> Providing a userspace mechanism for selectively dropping keys from the 
> kernel seems like a good thing?

No, patch "KEYS: verify a certificate is signed by a 'trusted' key" adds
signed public keys.

Mimi

Re: [PATCH] x86/tlb_uv: Fixing some memory allocation failure in x86 UV

2014-06-10 Thread Joe Perches

On Wed, 2014-06-11 at 09:46 +0800, Zhouyi Zhou wrote:
> Thanks for reviewing, I will work on a new version

If you do, please remove the "out of memory" messages.

These messages are redundant to a generic OOM and
stack dump from the memory subsystem.

Less code also makes it less likely to have an OOM.

btw: I added a new checkpatch test based on your patch.

https://lkml.org/lkml/2014/6/10/382


Re: Change in security maintainer for a few weeks

2014-06-10 Thread Serge E. Hallyn

Quoting Mimi Zohar (zo...@linux.vnet.ibm.com):
> On Tue, 2014-06-10 at 13:23 -0700, Greg KH wrote: 
> > On Tue, Jun 10, 2014 at 03:20:15PM +1000, James Morris wrote:
> > > On Thu, 5 Jun 2014, Greg KH wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > James has had to step back from doing kernel work for a few weeks, so
> > > > I've offered to step up and handle the security patches to get shuttled
> > > > to Linus for merging for a while.
> > > > 
> > > > I'll take his git tree on kernel.org and push those to Linus for
> > > > 3.16-rc1, as those look like they have had proper testing in linux-next.
> > > > 
> > > > But there only seems to be 14 patches in there.  Are there pending
> > > > patches that people have been sending and need to get in besides those?
> > > > 
> > > 
> > > Thanks for stepping up.
> > > 
> > > I had to take urgent leave and was unable to access email until this week.
> > > 
> > > We really should have a co-maintainer for the security subsystem -- please
> > > feel free to nominate someone.
> > > 
> > > There are several good candidates, including Serge, Kees and Paul Moore.
> > 
> > As Serge has done it in the past when you are on vacation, I suggest he
> > continue doing it.
> 
> As long as Serge agrees, that sounds good.

Sure, happy to help however I can.

-serge
Re: [PATCH 08/10] mm, compaction: pass gfp mask to compact_control

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:20AM +0200, Vlastimil Babka wrote:
> From: David Rientjes 
> 
> struct compact_control currently converts the gfp mask to a migratetype, but we
> need the entire gfp mask in a follow-up patch.
> 
> Pass the entire gfp mask as part of struct compact_control.
> 
> Signed-off-by: David Rientjes 
> Signed-off-by: Vlastimil Babka 
> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Joonsoo Kim 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> ---
>  mm/compaction.c | 12 +++-
>  mm/internal.h   |  2 +-
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c339ccd..d1e30ba 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -965,8 +965,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>   return ISOLATE_SUCCESS;
>  }
>  
> -static int compact_finished(struct zone *zone,
> - struct compact_control *cc)
> +static int compact_finished(struct zone *zone, struct compact_control *cc,
> + const int migratetype)

If we have gfp_mask, we could use gfpflags_to_migratetype(cc->gfp_mask).
What is your intention?
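The derivation Minchan refers to maps the movable and reclaimable GFP bits onto MIGRATE_UNMOVABLE (0), MIGRATE_RECLAIMABLE (1) and MIGRATE_MOVABLE (2), so once the full mask is kept in struct compact_control the migratetype can be recomputed wherever it is needed. The sketch below is a simplified userspace model: the flag bit values and all names are hypothetical (the real __GFP_* constants differ), and it ignores the kernel's special case for when grouping by mobility is disabled.

```c
#include <assert.h>

/* Hypothetical bit values; the real __GFP_* constants differ. */
#define MODEL_GFP_RECLAIMABLE 0x1u
#define MODEL_GFP_MOVABLE     0x2u

enum {
	MODEL_MIGRATE_UNMOVABLE,   /* 0 */
	MODEL_MIGRATE_RECLAIMABLE, /* 1 */
	MODEL_MIGRATE_MOVABLE,     /* 2 */
};

/*
 * Sketch of deriving a migratetype from a gfp mask: the movable bit
 * selects bit 1 of the result and the reclaimable bit selects bit 0.
 */
static int model_gfp_to_migratetype(unsigned int gfp_mask)
{
	return (((gfp_mask & MODEL_GFP_MOVABLE) != 0) << 1) |
	       ((gfp_mask & MODEL_GFP_RECLAIMABLE) != 0);
}
```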

>  {
>   unsigned int order;
>   unsigned long watermark;
> @@ -1012,7 +1012,7 @@ static int compact_finished(struct zone *zone,
>   struct free_area *area = &zone->free_area[order];
>  
>   /* Job done if page is free of the right migratetype */
> - if (!list_empty(&area->free_list[cc->migratetype]))
> + if (!list_empty(&area->free_list[migratetype]))
>   return COMPACT_PARTIAL;
>  
>   /* Job done if allocation would set block type */
> @@ -1078,6 +1078,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>   int ret;
>   unsigned long start_pfn = zone->zone_start_pfn;
>   unsigned long end_pfn = zone_end_pfn(zone);
> + const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
>   const bool sync = cc->mode != MIGRATE_ASYNC;
>  
>   ret = compaction_suitable(zone, cc->order);
> @@ -1120,7 +1121,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  
>   migrate_prep_local();
>  
> - while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
> + while ((ret = compact_finished(zone, cc, migratetype)) ==
> + COMPACT_CONTINUE) {
>   int err;
>  
>   switch (isolate_migratepages(zone, cc)) {
> @@ -1178,7 +1180,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
>   .nr_freepages = 0,
>   .nr_migratepages = 0,
>   .order = order,
> - .migratetype = gfpflags_to_migratetype(gfp_mask),
> + .gfp_mask = gfp_mask,
>   .zone = zone,
>   .mode = mode,
>   };
> diff --git a/mm/internal.h b/mm/internal.h
> index 584d04f..af15461 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -149,7 +149,7 @@ struct compact_control {
>   bool finished_update_migrate;
>  
>   int order;  /* order a direct compactor needs */
> - int migratetype;/* MOVABLE, RECLAIMABLE etc */
> + const gfp_t gfp_mask;   /* gfp mask of a direct compactor */
>   struct zone *zone;
>   enum compact_contended contended; /* Signal need_sched() or lock
>  * contention detected during
> -- 
> 1.8.4.5
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 

-- 
Kind regards,
Minchan Kim
Re: [PATCH 01/10] mm, compaction: do not recheck suitable_migration_target under lock

2014-06-10 Thread Zhang Yanfei

On 06/09/2014 05:26 PM, Vlastimil Babka wrote:
> isolate_freepages_block() rechecks if the pageblock is suitable to be a target
> for migration after it has taken the zone->lock. However, the check has been
> optimized to occur only once per pageblock, and compact_checklock_irqsave()
> might be dropping and reacquiring lock, which means somebody else might have
> changed the pageblock's migratetype meanwhile.
> 
> Furthermore, nothing prevents the migratetype from changing right after
> isolate_freepages_block() has finished isolating. Given how imperfect this is,
> it's simpler to just rely on the check done in isolate_freepages() without
> lock, and not pretend that the recheck under lock guarantees anything. It is
> just a heuristic after all.
> 
> Signed-off-by: Vlastimil Babka 

Reviewed-by: Zhang Yanfei 

> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Joonsoo Kim 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> Cc: David Rientjes 
> ---
> I suggest folding mm-compactionc-isolate_freepages_block-small-tuneup.patch into this
> 
>  mm/compaction.c | 13 -
>  1 file changed, 13 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5175019..b73b182 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -276,7 +276,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   struct page *cursor, *valid_page = NULL;
>   unsigned long flags;
>   bool locked = false;
> - bool checked_pageblock = false;
>  
>   cursor = pfn_to_page(blockpfn);
>  
> @@ -307,18 +306,6 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   if (!locked)
>   break;
>  
> - /* Recheck this is a suitable migration target under lock */
> - if (!strict && !checked_pageblock) {
> - /*
> -  * We need to check suitability of pageblock only once
> -  * and this isolate_freepages_block() is called with
> -  * pageblock range, so just check once is sufficient.
> -  */
> - checked_pageblock = true;
> - if (!suitable_migration_target(page))
> - break;
> - }
> -
>   /* Recheck this is a buddy page under lock */
>   if (!PageBuddy(page))
>   goto isolate_fail;
> 


-- 
Thanks.
Zhang Yanfei

Re: [PATCH 07/10] mm: rename allocflags_to_migratetype for clarity

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:19AM +0200, Vlastimil Babka wrote:
> From: David Rientjes 
> 
> The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like
> ALLOC_CPUSET) that have separate semantics.
> 
> The function allocflags_to_migratetype() actually takes gfp flags, not alloc
> flags, and returns a migratetype.  Rename it to gfpflags_to_migratetype().
> 
> Signed-off-by: David Rientjes 
> Signed-off-by: Vlastimil Babka 

I was one of the people who got confused by it at times.

Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim

Re: [PATCH V2] block: make nr_requests tunable for loop

2014-06-10 Thread Junxiao Bi

On 06/10/2014 11:12 AM, Jens Axboe wrote:
> On 2014-06-09 20:50, Junxiao Bi wrote:
>> On 06/10/2014 10:41 AM, Jens Axboe wrote:
>>> On 2014-06-09 20:31, Junxiao Bi wrote:
 commit 7b5a3522 (loop: Limit the number of requests in the bio list) limits
 the number of requests in the loop queue to 128. Since the "request_fn" of
 the loop device is NULL, the number of requests cannot be tuned. Making it
 tunable from sysfs can improve performance.

 The following test was done on a machine with 512M of memory. The backend
 of /dev/loop0 is an NFS file.

 [root@bijx mnt]# cat /sys/block/loop0/queue/nr_requests
 128
 [root@bijx mnt]# dd if=/dev/zero of=/dev/loop0 bs=1M count=5000
 5000+0 records in
 5000+0 records out
 5242880000 bytes (5.2 GB) copied, 501.572 s, 10.5 MB/s
 [root@bijx mnt]#
 [root@bijx mnt]# echo 1024 > /sys/block/loop0/queue/nr_requests
 [root@bijx mnt]# cat /sys/block/loop0/queue/nr_requests
 1024
 [root@bijx mnt]# dd if=/dev/zero of=/dev/loop0 bs=1M count=5000
 5000+0 records in
 5000+0 records out
 5242880000 bytes (5.2 GB) copied, 464.481 s, 11.3 MB/s

 Signed-off-by: Junxiao Bi 
 ---
block/blk-core.c  |6 ++
block/blk-sysfs.c |9 +++--
2 files changed, 9 insertions(+), 6 deletions(-)

 diff --git a/block/blk-core.c b/block/blk-core.c
 index 40d6548..58c4bd4 100644
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -851,6 +851,12 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
q->nr_requests = nr;
blk_queue_congestion_threshold(q);

 +/* for loop device, return after set its nr_requests */
 +if (!q->request_fn) {
 +spin_unlock_irq(q->queue_lock);
 +return 0;
 +}
>>>
>>> It'd be prettier to split this differently - something ala:
>>>
>>> if (request_fn)
>>>  blk_update_congestion_thresholds(q);
>> The congestion thresholds are needed by commit 7b5a3522 (loop: Limit the
>> number of requests in the bio list). So I think they need to be set even
>> if request_fn is null.
>
> I mean the request list thresholds, the part below where you currently
> just exit.
>
>>> But I think you have a larger issue here... For the request lists, we
>>> update the congestion thresholds and wakeup anyone waiting, if we need
>>> to. There's no way to do that for loop, since the waitqueue is
>>> internal to loop.
>> Loop does its own congestion control, in loop_make_request() /
>> loop_thread().
>
> Yes, that is my point! You update nr_congestion_off, but you don't
> wake anyone currently sitting in wait_event_lock_irq() on that value.
> See what the code below where you just exit does for request list
> based devices.
Jens, do you have an idea how to resolve it?


Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler

2014-06-10 Thread Yuyang Du

On Tue, Jun 10, 2014 at 12:16:22PM +0200, Peter Zijlstra wrote:
> What other target would you optimize for? The purpose here is to build
> an energy aware scheduler, one that schedules tasks so that the total
> amount of energy, for the given amount of work, is minimal.
> 
> So we can't measure in Watt, since if we forced the CPU into the lowest
> P-state (or even C-state for that matter) work would simply not
> complete. So we need a complete energy term.
> 
> Now. IPC is instructions/cycle, Watt is Joule/second, so IPC/Watt is
> 
>   instructions   second
>   ------------ * ------ ~ instructions / joule
>      cycle       joule
> 
> Seeing how both cycles and seconds are time units.
> 
> So for any given amount of instructions, the work needs to be done, we
> want the minimal amount of energy consumed, and IPC/Watt is the natural
> metric to measure this over an entire workload.

OK, I understand. Whether we take IPC/watt as an input metric for the scheduler
or as its goal, we definitely need to try both.

Thanks, Peter.

Yuyang

[PATCH] sctp: Fix sk_ack_backlog wrap-around problem

2014-06-10 Thread Xufeng Zhang

Consider the following scenario.
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity checks,
a new association is created in sctp_unpack_cookie(). If some later
processing fails, sctp_association_free() is called to free the
previously allocated association, and it decrements sk_ack_backlog for
this socket. Since the initial value of sk_ack_backlog is 0, the
decrement wraps it around to 65535. If we then try to establish new
associations on the same socket, ABORT is triggered because SCTP deems
the accept queue full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.
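As a rough userspace illustration of the wrap-around described above (a toy model, not the kernel's code): the 65535 in the report is what a 16-bit unsigned counter such as an unsigned short yields when decremented from 0. The names struct toy_sock, free_assoc_old() and free_assoc_new() are hypothetical, and the on_endpoint_list flag merely stands in for the !list_empty(&asoc->asocs) check the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for struct sock. The counter is a 16-bit
 * unsigned value, which is what produces the 65535 seen in the report.
 */
struct toy_sock {
	unsigned short sk_ack_backlog;
};

/* Before the fix: every non-temporary association decrements the backlog. */
static void free_assoc_old(struct toy_sock *sk)
{
	sk->sk_ack_backlog--;	/* 0 - 1 wraps around to 65535 */
}

/*
 * After the fix: decrement only if the association was actually linked on
 * the endpoint's list; on_endpoint_list models !list_empty(&asoc->asocs).
 */
static void free_assoc_new(struct toy_sock *sk, bool on_endpoint_list)
{
	if (on_endpoint_list)
		sk->sk_ack_backlog--;
}
```

In this model, an association that fails before it is ever linked on the endpoint's list leaves the counter at 0 with the second version instead of wrapping it.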

Fix-suggested-by: Neil Horman 
Signed-off-by: Xufeng Zhang 
---
 net/sctp/associola.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 39579c3..60564f2 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -330,7 +330,7 @@ void sctp_association_free(struct sctp_association *asoc)
/* Only real associations count against the endpoint, so
 * don't bother for if this is a temporary association.
 */
-   if (!asoc->temp) {
+   if (!asoc->temp && !list_empty(&asoc->asocs)) {
list_del(&asoc->asocs);
 
/* Decrement the backlog value for a TCP-style listening
-- 
1.7.0.2


Re: [PATCH] ACPI / hotplug / PCI: Add hotplug contexts to PCI host bridges

2014-06-10 Thread Bjorn Helgaas

On Tue, Jun 10, 2014 at 2:51 PM, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> After relatively recent changes in the ACPI-based PCI hotplug
> (ACPIPHP) code, the acpiphp_check_host_bridge() executed for PCI
> host bridges via acpi_pci_root_scan_dependent() doesn't do anything
> useful, because those bridges do not have hotplug contexts.  That
> happens by mistake, so fix it by making acpiphp_enumerate_slots()
> add hotplug contexts to PCI host bridges too and modify
> acpiphp_remove_slots() to drop those contexts for host bridges
> as appropriate.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=76901
> Fixes: 2d8b1d566a5f (ACPI / hotplug / PCI: Get rid of check_sub_bridges())
> Reported-and-tested-by: Gavin Guo 
> Cc: 3.15+  # 3.15+
> Signed-off-by: Rafael J. Wysocki 

Acked-by: Bjorn Helgaas 

Rafael, do you want to merge this via your tree, since you merged the
original acpiphp rework?

I do have a small cleanup of acpiphp_glue.c in my queue, but it won't
conflict with this.

Thanks a lot for fixing this!

Bjorn

> ---
>  drivers/pci/hotplug/acpiphp.h  |   10 ++
>  drivers/pci/hotplug/acpiphp_glue.c |   60 
> +
>  2 files changed, 52 insertions(+), 18 deletions(-)
>
> Index: linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> ===
> --- linux-pm.orig/drivers/pci/hotplug/acpiphp_glue.c
> +++ linux-pm/drivers/pci/hotplug/acpiphp_glue.c
> @@ -373,17 +373,13 @@ static acpi_status acpiphp_add_context(a
>
>  static struct acpiphp_bridge *acpiphp_dev_to_bridge(struct acpi_device *adev)
>  {
> -   struct acpiphp_context *context;
> struct acpiphp_bridge *bridge = NULL;
>
> acpi_lock_hp_context();
> -   context = acpiphp_get_context(adev);
> -   if (context) {
> -   bridge = context->bridge;
> +   if (adev->hp) {
> +   bridge = to_acpiphp_root_context(adev->hp)->root_bridge;
> if (bridge)
> get_bridge(bridge);
> -
> -   acpiphp_put_context(context);
> }
> acpi_unlock_hp_context();
> return bridge;
> @@ -881,7 +877,17 @@ void acpiphp_enumerate_slots(struct pci_
>  */
> get_device(&bus->dev);
>
> -   if (!pci_is_root_bus(bridge->pci_bus)) {
> +   acpi_lock_hp_context();
> +   if (pci_is_root_bus(bridge->pci_bus)) {
> +   struct acpiphp_root_context *root_context;
> +
> +   root_context = kzalloc(sizeof(*root_context), GFP_KERNEL);
> +   if (!root_context)
> +   goto err;
> +
> +   root_context->root_bridge = bridge;
> +   acpi_set_hp_context(adev, &root_context->hp, NULL, NULL, NULL);
> +   } else {
> struct acpiphp_context *context;
>
> /*
> @@ -890,21 +896,16 @@ void acpiphp_enumerate_slots(struct pci_
>  * parent is going to be handled by pciehp, in which case this
>  * bridge is not interesting to us either.
>  */
> -   acpi_lock_hp_context();
> context = acpiphp_get_context(adev);
> -   if (!context) {
> -   acpi_unlock_hp_context();
> -   put_device(&bus->dev);
> -   pci_dev_put(bridge->pci_dev);
> -   kfree(bridge);
> -   return;
> -   }
> +   if (!context)
> +   goto err;
> +
> bridge->context = context;
> context->bridge = bridge;
> /* Get a reference to the parent bridge. */
> get_bridge(context->func.parent);
> -   acpi_unlock_hp_context();
> }
> +   acpi_unlock_hp_context();
>
> /* Must be added to the list prior to calling acpiphp_add_context(). */
> mutex_lock(&bridge_mutex);
> @@ -919,6 +920,30 @@ void acpiphp_enumerate_slots(struct pci_
> cleanup_bridge(bridge);
> put_bridge(bridge);
> }
> +   return;
> +
> + err:
> +   acpi_unlock_hp_context();
> +   put_device(&bus->dev);
> +   pci_dev_put(bridge->pci_dev);
> +   kfree(bridge);
> +}
> +
> +void acpiphp_drop_bridge(struct acpiphp_bridge *bridge)
> +{
> +   if (pci_is_root_bus(bridge->pci_bus)) {
> +   struct acpiphp_root_context *root_context;
> +   struct acpi_device *adev;
> +
> +   acpi_lock_hp_context();
> +   adev = ACPI_COMPANION(bridge->pci_bus->bridge);
> +   root_context = to_acpiphp_root_context(adev->hp);
> +   adev->hp = NULL;
> +   acpi_unlock_hp_context();
> +   kfree(root_context);
> +   }
> +   cleanup_bridge(bridge);
> +   put_bridge(bridge);
>  }
>
>  /**
> @@ -936,8 +961,7 @@ void acpiphp_remove_slots(struct pci_bus
> list_for_each_entry(bridge, &bridge_list,

Re: linux-next: build failure after merge of the pci tree

2014-06-10 Thread Bjorn Helgaas

On Tue, Jun 10, 2014 at 8:02 PM, Stephen Rothwell  wrote:
> Hi Bjorn,
>
> After merging the pci tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
>
>
> ERROR: ".pci_try_set_mwi" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
> ERROR: ".pci_clear_mwi" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
> ERROR: ".pci_try_set_mwi" [drivers/scsi/lpfc/lpfc.ko] undefined!
>
> Caused by commit 9259d755975f ("PCI: Move EXPORT_SYMBOL so it
> immediately follows function/variable").
> arch/powerpc/include/asm/pci.h defines PCI_DISABLE_MWI and there are
> two versions of those functions depending on the setting of that symbol.
>
> I have used the pci tree from next-20140610 for today.

Thanks.  This is my fault, not Ryan's.  I made more similar changes,
but didn't notice the #ifdefs around the definitions.  I fixed it and
will update my "next" branch tomorrow, after Fengguang's buildbot
confirms the fix.

Bjorn

Re: [PATCH 0/4] KEYS: validate key trust with owner and builtin keys only

2014-06-10 Thread Matthew Garrett

On Tue, Jun 10, 2014 at 09:24:53PM -0400, Mimi Zohar wrote:
> On Tue, 2014-06-10 at 22:40 +0100, Matthew Garrett wrote: 
> > The hole is that the system trusts keys that you don't trust. The 
> > appropriate thing to do is to remove that trust from the entire system, 
> > not just one layer of the system. If people gain the impression that 
> > they can simply pass a kernel parameter and avoid trusting the vendor 
> > keys, they'll be upset to discover that it's easily circumvented.
> 
> Assuming I remove all the keys I don't trust, there are still keys that
> are trusted while booting, but are not necessary afterwards.  We should
> be able to limit the scope of where and when keys are trusted.

Providing a userspace mechanism for selectively dropping keys from the 
kernel seems like a good thing?

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: drivers/char/random.c: more ruminations

2014-06-10 Thread Theodore Ts'o

On Tue, Jun 10, 2014 at 08:10:03PM -0400, George Spelvin wrote:
> 
> But even I get annoyed when I have a 1-line comment typo fix and wonder
> if it really deserves its own commit or if I can just include it with
> the other changes I'm making to that file.

Unless you're actually modifying that section of code, I usually
recommend that people just not bother.  The fact that someone included
a one-line comment fix caused a merge conflict with the ext4 pull in
this merge window that Linus had to fix up.  Not that it's a big deal,
but unless it's something that's really going to confuse the reader, I
treat it as white space fixes; something that's only worth fixing if
that particular function is being modified for a "real" change.

> I have half a dozen patches to random.c already waiting.  For example,
> one is a bulk conversion of __u8 and __u32 to u8 and u32.  The underscore
> versions are only intended for public header files where namespace
> pollution is a problem.

And sorry, I consider this sort of thing to be just code wankery.
We use __u32 in a number of places in the kernel, and it doesn't
really affect code readability.  A change like this almost guarantees
that stable patches won't apply cleanly, and will have to be ported
manually.  It's just not worth it.

- Ted

Re: [PATCH] x86: numa: drop ZONE_ALIGN

2014-06-10 Thread Luiz Capitulino

On Tue, 10 Jun 2014 15:10:01 -0700 (PDT)
David Rientjes  wrote:

> On Mon, 9 Jun 2014, Luiz Capitulino wrote:
> 
> > > > > > diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> > > > > > index 4064aca..01b493e 100644
> > > > > > --- a/arch/x86/include/asm/numa.h
> > > > > > +++ b/arch/x86/include/asm/numa.h
> > > > > > @@ -9,7 +9,6 @@
> > > > > >  #ifdef CONFIG_NUMA
> > > > > >  
> > > > > >  #define NR_NODE_MEMBLKS(MAX_NUMNODES*2)
> > > > > > -#define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> > > > > >  
> > > > > >  /*
> > > > > >   * Too small node sizes may confuse the VM badly. Usually they
> > > > > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > > > > index 1d045f9..69f6362 100644
> > > > > > --- a/arch/x86/mm/numa.c
> > > > > > +++ b/arch/x86/mm/numa.c
> > > > > > @@ -200,8 +200,6 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
> > > > > > if (end && (end - start) < NODE_MIN_SIZE)
> > > > > > return;
> > > > > >  
> > > > > > -   start = roundup(start, ZONE_ALIGN);
> > > > > > -
> > > > > > printk(KERN_INFO "Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> > > > > >nid, start, end - 1);
> > > > > >  
> > > > > 
> > > > > What ensures this start address is page aligned from the BIOS?
> > > > 
> > > > Which start address are you referring to?
> > > 
> > > The start address displayed in the dmesg is not page aligned anymore with 
> > > your change, correct?  
> > 
> > I have to check that but I don't expect this to happen because my
> > understanding of the code is that what's rounded up here is just discarded
> > in free_area_init_node(). Am I wrong?
> > 
> 
> NODE_DATA(nid)->node_start_pfn needs to be accurate if 
> node_set_online(nid).  Since there is no guarantee about page alignment 
> from the ACPI spec, removing the roundup() entirely could cause the 
> address shift >> PAGE_SHIFT to be off by one.  I, like you, do not see the 
> need for the ZONE_ALIGN above, but I think we agree that it should be 
> replaced with PAGE_SIZE instead.

Agreed. I'm just not completely sure setup_node_data() is the best place
for it; shouldn't we do it in acpi_numa_memory_affinity_init(), which is
when the ranges are read off the SRAT table?
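A small sketch of the off-by-one David describes, assuming 4 KiB pages (a page shift of 12, as on x86). TOY_PAGE_SHIFT, round_up_pow2() and the two helper functions are hypothetical names, not the kernel's. Shifting an unaligned start address yields the pfn of a page that begins before the node's range, while rounding up to the page size first yields the first whole page inside it.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on x86. */
#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)

/* Round x up to a multiple of a, where a is a power of two. */
static unsigned long round_up_pow2(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Plain shift: an unaligned start lands on a page that begins before it. */
static unsigned long start_pfn_truncated(unsigned long start)
{
	return start >> TOY_PAGE_SHIFT;
}

/* Round up to the page size first, as proposed in place of ZONE_ALIGN. */
static unsigned long start_pfn_rounded(unsigned long start)
{
	return round_up_pow2(start, TOY_PAGE_SIZE) >> TOY_PAGE_SHIFT;
}
```

For an aligned start the two helpers agree; they differ by one only when the start address falls inside a page.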

[PATCH v2] vmalloc: use rcu list iterator to reduce vmap_area_lock contention

2014-06-10 Thread Joonsoo Kim

Richard Yao reported a month ago that his system has trouble
with vmap_area_lock contention during performance analysis
via /proc/meminfo. Andrew asked why his analysis reads /proc/meminfo
so heavily, but he didn't answer.

https://lkml.org/lkml/2014/4/10/416

Although I'm not sure whether this is the right usage, there is a solution
that reduces vmap_area_lock contention with no side effects: just
use the rcu list iterator in get_vmalloc_info().

RCU can be used in this function because all of the RCU protocol is already
respected by writers, since Nick Piggin's commit db64fe02258f1507e13fe5
("mm: rewrite vmap layer") back in linux-2.6.28.

Specifically:
   insertions use list_add_rcu(),
   deletions use list_del_rcu() and kfree_rcu().

Note that the rb tree is not used by the rcu reader (it would not be safe);
only the vmap_area_list has full RCU protection.

Note that __purge_vmap_area_lazy() already uses this rcu protection.

rcu_read_lock();
list_for_each_entry_rcu(va, &vmap_area_list, list) {
	if (va->flags & VM_LAZY_FREE) {
		if (va->va_start < *start)
			*start = va->va_start;
		if (va->va_end > *end)
			*end = va->va_end;
		nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
		list_add_tail(&va->purge_list, &valist);
		va->flags |= VM_LAZY_FREEING;
		va->flags &= ~VM_LAZY_FREE;
	}
}
rcu_read_unlock();

v2: add more commit description from Eric

[eduma...@google.com: add more commit description]
Reported-by: Richard Yao 
Acked-by: Eric Dumazet 
Signed-off-by: Joonsoo Kim 

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f64632b..fdbb116 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2690,14 +2690,14 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 
prev_end = VMALLOC_START;
 
-   spin_lock(&vmap_area_lock);
+   rcu_read_lock();
 
if (list_empty(&vmap_area_list)) {
vmi->largest_chunk = VMALLOC_TOTAL;
goto out;
}
 
-   list_for_each_entry(va, &vmap_area_list, list) {
+   list_for_each_entry_rcu(va, &vmap_area_list, list) {
unsigned long addr = va->va_start;
 
/*
@@ -2724,7 +2724,7 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
vmi->largest_chunk = VMALLOC_END - prev_end;
 
 out:
-   spin_unlock(&vmap_area_lock);
+   rcu_read_unlock();
 }
 #endif
 
-- 
1.7.9.5


Re: [RFC PATCH 07/16 v3] Init Workload Consolidation flags in sched_domain

2014-06-10 Thread Yuyang Du

On Tue, Jun 10, 2014 at 12:52:06PM +0100, Dietmar Eggemann wrote:

Hi Dietmar,

> Not in this sense but there is no functionality in the scheduler right
> now to check constantly if an sd flag has been set/unset via sysctl.

Sorry, I still don't understand. There are many "if (sd->flags & SD_XXX)"
checks in fair.c. What does that mean to you?

Perhaps you mean the SD_XX flags should be fixed at init and never changed via
sysctl thereafter. Ah... I don't know about this...

Overall, I think I should come up with a better way to implement the
SD_WORKLOAD_CONSOLIDATION policy (enabled or disabled) in load balancing (as
PeterZ also pointed out). But I just don't see how the current implementation
is any different from any other SD_XX flag.

Have you tried it on your platform?

Thanks a lot,
Yuyang

Re: [PATCH 05/10] mm, compaction: remember position within pageblock in free pages scanner

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:17AM +0200, Vlastimil Babka wrote:
> Unlike the migration scanner, the free scanner remembers the beginning of the
> last scanned pageblock in cc->free_pfn. It might therefore be rescanning pages
> uselessly when called several times during a single compaction. This might have
> been useful when pages were returned to the buddy allocator after a failed
> migration, but this is no longer the case.
> 
> This patch changes the meaning of cc->free_pfn so that if it points to a
> the middle of a pageblock, that pageblock is scanned only from cc->free_pfn to the
> end. isolate_freepages_block() will record the pfn of the last page it looked
> at, which is then used to update cc->free_pfn.
> 
> In the mmtests stress-highalloc benchmark, this has resulted in lowering the
> ratio between pages scanned by both scanners, from 2.5 free pages per migrate
> page, to 2.25 free pages per migrate page, without affecting success rates.
> 
> Signed-off-by: Vlastimil Babka 
Reviewed-by: Minchan Kim 
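The in/out start_pfn parameter the patch introduces can be illustrated with a toy scanner (hypothetical names, not mm/compaction.c). Like the patched isolate_freepages_block(), it records in *start_pfn the pfn of the last page it looked at, so the caller can resume there instead of rescanning the start of the block. The want limit is an assumption standing in for whatever makes the real scanner stop early.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BLOCK_PAGES 16UL

/* Toy "pageblock": true means the page at that pfn is free. */
static bool toy_free[TOY_BLOCK_PAGES];

/*
 * Scan [*start_pfn, end_pfn), isolating free pages until 'want' have been
 * taken. On return *start_pfn holds the pfn of the last page looked at.
 */
static unsigned long toy_isolate_block(unsigned long *start_pfn,
				       unsigned long end_pfn,
				       unsigned long want)
{
	unsigned long pfn, isolated = 0;

	for (pfn = *start_pfn; pfn < end_pfn; pfn++) {
		*start_pfn = pfn;	/* record how far we have got */
		if (!toy_free[pfn])
			continue;
		toy_free[pfn] = false;	/* "isolate" the page */
		if (++isolated == want)
			break;
	}
	return isolated;
}
```

With every page free and want set to 4, a first call from pfn 0 stops at pfn 3, and a second call resumes there and looks only at pfns 3 to 7 rather than starting again from pfn 0.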

Below is a nitpick.

> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Joonsoo Kim 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> Cc: David Rientjes 
> ---
>  mm/compaction.c | 33 -
>  1 file changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 83f72bd..58dfaaa 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -297,7 +297,7 @@ static bool suitable_migration_target(struct page *page)
>   * (even though it may still end up isolating some pages).
>   */
>  static unsigned long isolate_freepages_block(struct compact_control *cc,
> - unsigned long blockpfn,
> + unsigned long *start_pfn,
>   unsigned long end_pfn,
>   struct list_head *freelist,
>   bool strict)
> @@ -306,6 +306,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   struct page *cursor, *valid_page = NULL;
>   unsigned long flags;
>   bool locked = false;
> + unsigned long blockpfn = *start_pfn;
>  
>   cursor = pfn_to_page(blockpfn);
>  
> @@ -314,6 +315,9 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   int isolated, i;
>   struct page *page = cursor;
>  
> + /* Record how far we have got within the block */
> + *start_pfn = blockpfn;
> +

Couldn't we move this out of the loop for just one store?

>   /*
>* Periodically drop the lock (if held) regardless of its
>* contention, to give chance to IRQs. Abort async compaction
> @@ -424,6 +428,9 @@ isolate_freepages_range(struct compact_control *cc,
>   LIST_HEAD(freelist);
>  
>   for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
> + /* Protect pfn from changing by isolate_freepages_block */
> + unsigned long isolate_start_pfn = pfn;
> +
>   if (!pfn_valid(pfn) || cc->zone != page_zone(pfn_to_page(pfn)))
>   break;
>  
> @@ -434,8 +441,8 @@ isolate_freepages_range(struct compact_control *cc,
>   block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
>   block_end_pfn = min(block_end_pfn, end_pfn);
>  
> - isolated = isolate_freepages_block(cc, pfn, block_end_pfn,
> - &freelist, true);
> + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> + block_end_pfn, &freelist, true);
>  
>   /*
>* In strict mode, isolate_freepages_block() returns 0 if
> @@ -774,6 +781,7 @@ static void isolate_freepages(struct zone *zone,
>   block_end_pfn = block_start_pfn,
>   block_start_pfn -= pageblock_nr_pages) {
>   unsigned long isolated;
> + unsigned long isolate_start_pfn;
>  
>   /*
>* This can iterate a massively long zone without finding any
> @@ -807,12 +815,27 @@ static void isolate_freepages(struct zone *zone,
>   continue;
>  
>   /* Found a block suitable for isolating free pages from */
> - cc->free_pfn = block_start_pfn;
> - isolated = isolate_freepages_block(cc, block_start_pfn,
> + isolate_start_pfn = block_start_pfn;
> +
> + /*
> +  * If we are restarting the free scanner in this block, do not
> +  * rescan the beginning of the block
> +  */
> + if (cc->free_pfn < block_end_pfn)
> + isolate_start_pfn = cc->free_pfn;
> +
> + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
>   block_end_pfn, freelist, false);
>   nr_freepages +=

Re: [PATCH] mm/vmscan.c: avoid recording the original scan targets in shrink_lruvec()

2014-06-10 Thread Chen Yucong

On Tue, 2014-06-10 at 16:33 -0700, Andrew Morton wrote:
> On Mon,  9 Jun 2014 21:27:16 +0800 Chen Yucong  wrote:
> 
> > Via https://lkml.org/lkml/2013/4/10/334 , we can find that recording the
> > original scan targets introduces extra 40 bytes on the stack. This patch
> > is able to avoid this situation and the call to memcpy(). At the same time,
> > it does not change the relative design idea.
> > 
> > ratio = original_nr_file / original_nr_anon;
> > 
> > If (nr_file > nr_anon), then ratio = (nr_file - x) / nr_anon.
> >  x = nr_file - ratio * nr_anon;
> > 
> > if (nr_file <= nr_anon), then ratio = nr_file / (nr_anon - x).
> >  x = nr_anon - nr_file / ratio;
> > 
> > ...
> >
> 
> Are you sure this is an equivalent-to-before change?  If so, then I
> can't immediately see why :(
> 
The design idea is to keep

   ratio == scan_target[anon] : scan_target[file]
         == really_scanned_num[anon] : really_scanned_num[file]

The original implementation is

   ratio == (scan_target[anon] * percentage_anon) /
            (scan_target[file] * percentage_file)

To keep the original ratio, percentage_anon should equal
percentage_file. In other words, we need to calculate the difference
between percentage_anon and percentage_file, and for that we also have
to record the original scan targets.

Instead, we can calculate the *ratio* at the beginning of
shrink_lruvec(). As a result, this can avoid introducing the extra 40
bytes.

In short, we have the same goal: keep the same *ratio* from beginning to
end.

thx!
cyc
 
   


Re: drivers/char/random.c: more ruminations

2014-06-10 Thread Theodore Ts'o

On Tue, Jun 10, 2014 at 08:10:03PM -0400, George Spelvin wrote:
> What I wanted to do was eliminate that huge tmp buffer from
> _xfer_secondary_pool.  There's no good reason why it needs to be there,
> and several reasons for getting rid of it.

So have you actually instrumented the kernel to demonstrate that in
fact we have super deep stack call paths where the 128 bytes worth of
stack actually matters?

Premature optimization being the root of all evil (not to mention
wasting a lot of kernel developers' time) and all that.

> I hadn't tested the patch when I mailed it to you (I prepared it in
> order to reply to your e-mail, and it's annoying to reboot the machine
> I'm composing an e-mail on), but I have since.  It works.

As an aside, I'd strongly suggest that you use kvm to do your kernel
testing.  It means you can do a lot more testing which is always a
good thing.

> The *fundamental* race, as I see it, is the one between modifying pools
> and crediting entropy.
> 
> As I noted, you can't safely do the credit either before *or* after modifying
> the pool; you will always end up with the wrong answer in some situation.

Actually, it's **fine**.  That's because RNDADDENTROPY adds the
entropy to the input pool, which has the limit flag set.  So we
will never pull more entropy than the pool is credited as having.
This means that race can't happen.  It ***is*** safe.

1)  Assume the entropy count starts at 10 bytes.

2)  Random writer mixes in 20 bytes of entropy into the entropy pool.

3)  Random extractor tries to extract 32 bytes of entropy.  Since the
entropy count is still 10, it will only get 10 bytes.  (And if the
entropy count had started at zero, we wouldn't extract any entropy
at all.)

4) Random writer credit the entropy counter with the 20 bytes mixed in
step #2.

See? no problems!
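A minimal userspace model of the four steps above may make the ordering argument easier to check. It assumes a toy pool whose extraction is capped at the credited count (the effect of the limit flag); toy_pool, toy_credit() and toy_extract() are hypothetical names, not random.c functions, and the mixing step is left implicit because mixing alone does not change the credited count.

```c
#include <stdbool.h>

/*
 * Toy model of a "limited" entropy pool.  Hypothetical names; this is
 * not the random.c implementation, only the ordering argument.
 */
struct toy_pool {
	int entropy_count;	/* bytes of entropy credited so far */
	bool limit;		/* never hand out more than is credited */
};

/* Credit entropy for data that has already been mixed in. */
static void toy_credit(struct toy_pool *p, int nbytes)
{
	p->entropy_count += nbytes;
}

/*
 * Extract up to @wanted bytes.  With the limit set, the result is capped
 * at the credited count, so mixed-but-not-yet-credited input is never
 * handed out as if it were credited.
 */
static int toy_extract(struct toy_pool *p, int wanted)
{
	int got = wanted;

	if (p->limit && got > p->entropy_count)
		got = p->entropy_count;
	p->entropy_count -= got;
	return got;
}
```

Starting from a count of 10 with the limit set, toy_extract(&pool, 32) returns 10 even though 20 more bytes were already mixed in, and the later toy_credit(&pool, 20) leaves the count at 20, so nothing uncredited was ever handed out.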

- Ted

Re: [GIT PULL REQUEST] watchdog - v3.16 merge window

2014-06-10 Thread Linus Torvalds

On Tue, Jun 10, 2014 at 1:05 PM, Wim Van Sebroeck  wrote:
> Hi Linus,
>
> Please pull from 'master' branch of
> git://www.linux-watchdog.org/linux-watchdog.git

gmail has decided that you are a spammer, and the only reason I saw
this email was that BenH had the same fate and emailed me from another
account to check.

You and BenH have something in common: no SPF records. That tends to
be one of the things that makes gmail unhappy: it looks more closely at
such emails and is more likely to consider them spam:

  Received-SPF: none (google.com: wi...@spo001.leaseweb.com does not
designate permitted sender hosts) client-ip=83.149.101.17;
  Authentication-Results: mx.google.com;
 spf=neutral (google.com: wi...@spo001.leaseweb.com does not
designate permitted sender hosts) smtp.mail=wi...@spo001.leaseweb.com

Anyway, no need to resend since I found it, but you might want to talk
to your email provider about your SPF records...

   Linus

linux-next: build failure after merge of the pci tree

2014-06-10 Thread Stephen Rothwell

Hi Bjorn,

After merging the pci tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:


ERROR: ".pci_try_set_mwi" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: ".pci_clear_mwi" [drivers/scsi/qla2xxx/qla2xxx.ko] undefined!
ERROR: ".pci_try_set_mwi" [drivers/scsi/lpfc/lpfc.ko] undefined!

Caused by commit 9259d755975f ("PCI: Move EXPORT_SYMBOL so it
immediately follows function/variable").
arch/powerpc/include/asm/pci.h defines PCI_DISABLE_MWI and there are
two versions of those functions depending on the setting of that symbol.

I have used the pci tree from next-20140610 for today.
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au



[for-next][PATCH 1/4] ring-buffer: Check if buffer exists before polling

2014-06-10 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

The per_cpu buffers are created one per possible CPU. But that does
not mean those CPUs are online, or even that they exist.

With the addition of ring buffer polling, the code assumes that the
caller polls on an existing buffer. But this is not the case if
the user reads trace_pipe from a CPU that does not exist, and this
causes the kernel to crash.

The simple fix is to check the cpu against the buffer's cpumask to see
whether the buffer was allocated, and to return -ENODEV if it was
not.

The callers were also updated to pass the -ENODEV back up to userspace.
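The shape of this fix can be sketched in userspace. The sketch assumes a hypothetical toy_ring_buffer with a plain bitmask standing in for buffer->cpumask; toy_ring_buffer_wait() and toy_wait_on_pipe() are illustrative names that mirror, but are not, ring_buffer_wait() and wait_on_pipe().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 8

/*
 * Hypothetical stand-in for the ring buffer: one slot per possible CPU,
 * plus a bitmask of the CPUs whose per-cpu buffer was actually allocated.
 */
struct toy_ring_buffer {
	unsigned long cpumask;		/* bit n set => buffers[n] is valid */
	int *buffers[TOY_NR_CPUS];
};

static bool toy_cpu_allocated(const struct toy_ring_buffer *rb, int cpu)
{
	if (cpu < 0 || cpu >= TOY_NR_CPUS)
		return false;
	return rb->cpumask & (1UL << cpu);
}

/* Bail out with -ENODEV before buffers[cpu] is ever dereferenced. */
static int toy_ring_buffer_wait(struct toy_ring_buffer *rb, int cpu)
{
	if (!toy_cpu_allocated(rb, cpu))
		return -ENODEV;
	/* ... the real code would sleep on this CPU's wait queue here ... */
	return 0;
}

/* The caller hands the error back up instead of ignoring it. */
static int toy_wait_on_pipe(struct toy_ring_buffer *rb, int cpu)
{
	return toy_ring_buffer_wait(rb, cpu);
}
```

For a buffer where only CPU 0 was allocated, waiting on CPU 3 returns -ENODEV without touching buffers[3], and the caller simply propagates that value.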

Link: http://lkml.kernel.org/r/5393db61.6060...@oracle.com

Reported-by: Sasha Levin 
Cc: sta...@vger.kernel.org # 3.10+
Signed-off-by: Steven Rostedt 
---
 include/linux/ring_buffer.h |  2 +-
 kernel/trace/ring_buffer.c  |  5 -
 kernel/trace/trace.c| 22 --
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index d69cf637a15a..49a4d6f59108 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -97,7 +97,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key);   \
 })
 
-void ring_buffer_wait(struct ring_buffer *buffer, int cpu);
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu);
 int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
  struct file *filp, poll_table *poll_table);
 
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c634868c2921..7c56c3d06943 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -543,7 +543,7 @@ static void rb_wake_up_waiters(struct irq_work *work)
  * as data is added to any of the @buffer's cpu buffers. Otherwise
  * it will wait for data to be added to a specific cpu buffer.
  */
-void ring_buffer_wait(struct ring_buffer *buffer, int cpu)
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu)
 {
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
@@ -557,6 +557,8 @@ void ring_buffer_wait(struct ring_buffer *buffer, int cpu)
if (cpu == RING_BUFFER_ALL_CPUS)
work = &buffer->irq_work;
else {
+   if (!cpumask_test_cpu(cpu, buffer->cpumask))
+   return -ENODEV;
cpu_buffer = buffer->buffers[cpu];
work = &cpu_buffer->irq_work;
}
@@ -591,6 +593,7 @@ void ring_buffer_wait(struct ring_buffer *buffer, int cpu)
schedule();
 
finish_wait(&work->waiters, &wait);
+   return 0;
 }
 
 /**
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 16f7038d1f4d..56422f1decba 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1085,13 +1085,13 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
 }
 #endif /* CONFIG_TRACER_MAX_TRACE */
 
-static void wait_on_pipe(struct trace_iterator *iter)
+static int wait_on_pipe(struct trace_iterator *iter)
 {
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
-   return;
+   return 0;
 
-   ring_buffer_wait(iter->trace_buffer->buffer, iter->cpu_file);
+   return ring_buffer_wait(iter->trace_buffer->buffer, iter->cpu_file);
 }
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
@@ -4378,6 +4378,7 @@ tracing_poll_pipe(struct file *filp, poll_table *poll_table)
 static int tracing_wait_pipe(struct file *filp)
 {
struct trace_iterator *iter = filp->private_data;
+   int ret;
 
while (trace_empty(iter)) {
 
@@ -4399,10 +4400,13 @@ static int tracing_wait_pipe(struct file *filp)
 
mutex_unlock(&iter->mutex);
 
-   wait_on_pipe(iter);
+   ret = wait_on_pipe(iter);
 
mutex_lock(&iter->mutex);
 
+   if (ret)
+   return ret;
+
if (signal_pending(current))
return -EINTR;
}
@@ -5327,8 +5331,12 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
goto out_unlock;
}
mutex_unlock(&trace_types_lock);
-   wait_on_pipe(iter);
+   ret = wait_on_pipe(iter);
mutex_lock(&trace_types_lock);
+   if (ret) {
+   size = ret;
+   goto out_unlock;
+   }
if (signal_pending(current)) {
size = -EINTR;
goto out_unlock;
@@ -5538,8 +5546,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
goto out;
}
mutex_unlock(&trace_types_lock);
-   wait_on_pipe(iter);
+   ret = wait_on_pipe(iter);

[for-next][PATCH 3/4] tracing: Fix leak of per cpu max data in instances

2014-06-10 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

When an instance is created with max data configured, per cpu data
structures are created for its max buffer. But these are not freed when
the instance is deleted, which causes a memory leak.

A new helper function is added that frees the individual buffers within a
trace array, instead of duplicating the code. This way changes made for one
are applied to the other (normal buffer vs max buffer).
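A userspace sketch of the helper pattern, assuming a hypothetical toy_trace_buffer whose two fields are plain malloc()ed blocks standing in for the ring buffer and the percpu data. It frees both fields and clears both pointers, so calling it once for the normal buffer and once for the max buffer releases the same set of resources, and a repeated call is a no-op.

```c
#include <stdlib.h>

/* Hypothetical userspace analogue of struct trace_buffer. */
struct toy_trace_buffer {
	void *buffer;	/* stands in for the ring buffer */
	void *data;	/* stands in for the per-cpu data */
};

/*
 * Free both halves together and clear the pointers, so every buffer
 * passed through this helper is torn down the same way.
 */
static void toy_free_trace_buffer(struct toy_trace_buffer *buf)
{
	if (buf->buffer) {
		free(buf->buffer);
		buf->buffer = NULL;
		free(buf->data);
		buf->data = NULL;
	}
}
```

The leak in the old code came from the max buffer branch clearing only ->buffer and never freeing ->data; routing both buffers through one helper removes that asymmetry.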

Link: http://lkml.kernel.org/r/87k38pbake@sejong.aot.lge.com

Reported-by: Namhyung Kim 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2b458c60e0da..384ede311717 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6242,22 +6242,25 @@ static int allocate_trace_buffers(struct trace_array *tr, int size)
return 0;
 }
 
+static void free_trace_buffer(struct trace_buffer *buf)
+{
+   if (buf->buffer) {
+   ring_buffer_free(buf->buffer);
+   buf->buffer = NULL;
+   free_percpu(buf->data);
+   buf->data = NULL;
+   }
+}
+
 static void free_trace_buffers(struct trace_array *tr)
 {
if (!tr)
return;
 
-   if (tr->trace_buffer.buffer) {
-   ring_buffer_free(tr->trace_buffer.buffer);
-   tr->trace_buffer.buffer = NULL;
-   free_percpu(tr->trace_buffer.data);
-   }
+   free_trace_buffer(&tr->trace_buffer);
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-   if (tr->max_buffer.buffer) {
-   ring_buffer_free(tr->max_buffer.buffer);
-   tr->max_buffer.buffer = NULL;
-   }
+   free_trace_buffer(&tr->max_buffer);
 #endif
 }
 
-- 
2.0.0.rc2



[for-next][PATCH 2/4] tracing: Cleanup saved_cmdlines_size changes

2014-06-10 Thread Steven Rostedt

From: Namhyung Kim 

The recent addition of saved_cmdlines_size file had some remaining
(minor - mostly coding style) issues.  Fix them by passing pointer
name to sizeof() and using scnprintf().
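For readers unfamiliar with scnprintf(): its return value is the number of characters actually stored (excluding the trailing NUL), whereas sprintf()/snprintf() return the length the full output would have had. The userspace sketch below, under the hypothetical name toy_scnprintf(), reproduces that clamping on top of vsnprintf(); it is an illustration, not the kernel's own implementation.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Userspace sketch of scnprintf() semantics.  vsnprintf() returns the
 * length the output *would* have had; this returns what was actually
 * stored, so it is always safe to use as a copy length.
 */
static int toy_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (i < 0)
		return 0;			/* encoding error: nothing useful stored */
	if ((size_t)i >= size)
		return (int)size - 1;		/* output was truncated */
	return i;
}
```

With a 4-byte buffer and the value 12345, snprintf() reports 6 while toy_scnprintf() reports 3, which is the number of bytes that really ended up in the buffer.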

Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhy...@kernel.org

Cc: Namhyung Kim 
Cc: Ingo Molnar 
Cc: Yoshihiro YUNOMAE 
Signed-off-by: Namhyung Kim 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 56422f1decba..2b458c60e0da 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1338,7 +1338,7 @@ static int trace_create_savedcmd(void)
 {
int ret;
 
-   savedcmd = kmalloc(sizeof(struct saved_cmdlines_buffer), GFP_KERNEL);
+   savedcmd = kmalloc(sizeof(*savedcmd), GFP_KERNEL);
if (!savedcmd)
return -ENOMEM;
 
@@ -3840,7 +3840,7 @@ tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
int r;
 
arch_spin_lock(&trace_cmdline_lock);
-   r = sprintf(buf, "%u\n", savedcmd->cmdline_num);
+   r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num);
arch_spin_unlock(&trace_cmdline_lock);
 
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
@@ -3857,7 +3857,7 @@ static int tracing_resize_saved_cmdlines(unsigned int val)
 {
struct saved_cmdlines_buffer *s, *savedcmd_temp;
 
-   s = kmalloc(sizeof(struct saved_cmdlines_buffer), GFP_KERNEL);
+   s = kmalloc(sizeof(*s), GFP_KERNEL);
if (!s)
return -ENOMEM;
 
-- 
2.0.0.rc2



[for-next][PATCH 4/4] tracing: Fix check of ftrace_trace_arrays list_empty() check

2014-06-10 Thread Steven Rostedt

From: "Steven Rostedt (Red Hat)" 

The check that tests if ftrace_trace_arrays is empty in
top_trace_array(), uses the .prev pointer:

  if (list_empty(ftrace_trace_arrays.prev))

instead of testing the variable itself:

  if (list_empty(&ftrace_trace_arrays))

Although it is technically correct, it is awkward and confusing.
Use the proper method.
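Why the old test was still technically correct can be checked with a minimal userspace re-implementation of the circular list helpers (simplified, not the kernel's <linux/list.h>). When the list is empty, head.prev points back at the head itself; when it is not, head.prev is the last entry, whose next pointer is the head rather than itself, so both forms give the same answer.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Insert @entry just before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* A list is empty when the head points back at itself. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Testing the head directly says what is meant; testing head.prev only works because of the circular layout, which is why the patch calls it awkward rather than wrong.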

Link: http://lkml.kernel.org/r/87oay1bas8@sejong.aot.lge.com

Reported-by: Namhyung Kim 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9e82551dd566..9258f5a815db 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -252,7 +252,7 @@ static inline struct trace_array *top_trace_array(void)
 {
struct trace_array *tr;
 
-   if (list_empty(ftrace_trace_arrays.prev))
+   if (list_empty(&ftrace_trace_arrays))
return NULL;
 
tr = list_entry(ftrace_trace_arrays.prev,
-- 
2.0.0.rc2



[for-next][PATCH 0/4] tracing: Cleanups and fixes for 3.16

2014-06-10 Thread Steven Rostedt

Thanks to Namhyung Kim who pointed out some slight things with the code
that went to Linus already, I have some early cleanups.

There's also a bug that needs to be fixed in the ring buffer waiter
logic.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: da9c3413a27be5ba6f996e90495c836dd30b8841


Namhyung Kim (1)
  tracing: Cleanup saved_cmdlines_size changes

Steven Rostedt (Red Hat) (3)
  ring-buffer: Check if buffer exists before polling
  tracing: Fix leak of per cpu max data in instances
  tracing: Fix check of ftrace_trace_arrays list_empty() check


 include/linux/ring_buffer.h |2 -
 kernel/trace/ring_buffer.c  |5 +++-
 kernel/trace/trace.c|   49 +++-
 kernel/trace/trace.h|2 -
 4 files changed, 37 insertions(+), 21 deletions(-)

Re: [PATCH 04/10] mm, compaction: skip rechecks when lock was already held

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:16AM +0200, Vlastimil Babka wrote:
> Compaction scanners try to lock zone locks as late as possible by checking
> many page or pageblock properties opportunistically without lock and skipping
> them if unsuitable. For pages that pass the initial checks, some properties
> have to be checked again safely under lock. However, if the lock was already
> held from a previous iteration in the initial checks, the rechecks are
> unnecessary.
> 
> This patch therefore skips the rechecks when the lock was already held. This is
> now possible to do, since we don't (potentially) drop and reacquire the lock
> between the initial checks and the safe rechecks anymore.
> 
> Signed-off-by: Vlastimil Babka 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim

Re: [PATCH] x86/tlb_uv: Fixing some memory allocation failure in x86 UV

2014-06-10 Thread Zhouyi Zhou

Thanks for reviewing. I will work on a new version.

On Wed, Jun 11, 2014 at 7:28 AM, Thomas Gleixner  wrote:
> On Tue, 10 Jun 2014, H. Peter Anvin wrote:
>
>> On 06/10/2014 12:35 AM, Zhouyi Zhou wrote:
>> > Fixing some memory allocation failure handling in x86 UV
>> >
>> > Signed-off-by: Zhouyi Zhou 
>>
>> Sorry, this really isn't enough description for this size of a patch.
>
> Correction: This is not a proper description for any patch.
>
> "some" is wrong to begin with.
>
> Either we fix a particular issue or we address all of them, but "some"
> means: We fixed a few, but we did not care about the rest.
>
> Aside of that, I agree. The changelog is disjunct from the patch
> itself.
>
> Thanks,
>
> tglx
>
>

Re: [PATCH 6/7] cpufreq: intel_pstate: Trivial code cleanup

2014-06-10 Thread Joe Perches

On Wed, 2014-06-11 at 02:23 +0200, Rafael J. Wysocki wrote:
> On Tuesday, June 10, 2014 02:26:45 PM Joe Perches wrote:
> > c89 is 25 years ago now.
> Apparently, I'm old.

nah, just older than yesterday.
No doubt better too.

> > > Either way, in my opinion it's better to put the parens into the expression
> > > in this particular case to clearly state the intention.
> > 
> > I don't think so.
> 
> Of course, you're free to disagree, but I guess you'll admit that
> a * b / c is generally different from b / c * a and if you see something
> like this it is hard to say at first sight whether or not this is intentional
> or an expression ordering bug.

true enough.
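Rafael's point can be made concrete with integer arithmetic, where division truncates. The helper names below are hypothetical; the parentheses only spell out how C already groups each expression.

```c
/* Integer division truncates, so operand order changes the result. */
static int mul_then_div(int a, int b, int c)
{
	return (a * b) / c;	/* same as a * b / c */
}

static int div_then_mul(int a, int b, int c)
{
	return (b / c) * a;	/* same as b / c * a */
}
```

With a = 3, b = 10 and c = 4, the first form gives 30 / 4 = 7 while the second gives (10 / 4) * 3 = 6. The first form can also overflow for large a * b, which is one more reason to make the intended order explicit.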


Re: Change in security maintainer for a few weeks

2014-06-10 Thread Mimi Zohar

On Tue, 2014-06-10 at 13:23 -0700, Greg KH wrote: 
> On Tue, Jun 10, 2014 at 03:20:15PM +1000, James Morris wrote:
> > On Thu, 5 Jun 2014, Greg KH wrote:
> > 
> > > Hi all,
> > > 
> > > James has had to step back from doing kernel work for a few weeks, so
> > > I've offered to step up and handle the security patches to get shuttled
> > > to Linus for merging for a while.
> > > 
> > > I'll take his git tree on kernel.org and push those to Linus for
> > > 3.16-rc1, as those look like they have had proper testing in linux-next.
> > > 
> > > But there only seem to be 14 patches in there.  Are there pending
> > > patches that people have been sending and need to get in besides those?
> > > 
> > 
> > Thanks for stepping up.
> > 
> > I had to take urgent leave and was unable to access email until this week.
> > 
> > We really should have a co-maintainer for the security subsystem -- please 
> > feel free to nominate someone.
> > 
> > There are several good candidates, including Serge, Kees and Paul Moore.
> 
> As Serge has done it in the past when you are on vacation, I suggest he
> continue doing it.

As long as Serge agrees, that sounds good.

Mimi


Re: [PATCH 03/10] mm, compaction: periodically drop lock and restore IRQs in scanners

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:15AM +0200, Vlastimil Babka wrote:
> Compaction scanners regularly check for lock contention and need_resched()
> through the compact_checklock_irqsave() function. However, if there is no
> contention, the lock can be held and IRQs disabled for a potentially long time.
> 
> This has been addressed by commit b2eef8c0d0 ("mm: compaction: minimise the
> time IRQs are disabled while isolating pages for migration") for the migration
> scanner. However, the refactoring done by commit 748446bb6b ("mm: compaction:
> acquire the zone->lru_lock as late as possible") has changed the conditions so
> that the lock is dropped only when there's contention on the lock or
> need_resched() is true. Also, need_resched() is checked only when the lock is
> already held. The comment "give a chance to irqs before checking need_resched"
> is therefore misleading, as IRQs remain disabled when the check is done.
> 
> This patch restores the behavior intended by commit b2eef8c0d0 and also tries
> to better balance and make more deterministic the time spent by checking for
> contention vs the time the scanners might run between the checks. It also
> avoids situations where checking has not been done often enough before. The
> result should be avoiding both too frequent and too infrequent contention
> checking, and especially the potentially long-running scans with IRQs disabled
> and no checking of need_resched() or for fatal signal pending, which can happen
> when many consecutive pages or pageblocks fail the preliminary tests and do not
> reach the later call site to compact_checklock_irqsave(), as explained below.
> 
> Before the patch:
> 
> In the migration scanner, compact_checklock_irqsave() was called each loop, if
> reached. If not reached, some lower-frequency checking could still be done if
> the lock was already held, but this would not result in aborting contended
> async compaction until reaching compact_checklock_irqsave() or end of
> pageblock. In the free scanner, it was similar but completely without the
> periodical checking, so the lock could potentially be held until reaching the
> end of the pageblock.
> 
> After the patch, in both scanners:
> 
> The periodical check is done as the first thing in the loop on each
> SWAP_CLUSTER_MAX aligned pfn, using the new compact_unlock_should_abort()
> function, which always unlocks the lock (if locked) and aborts async compaction
> if scheduling is needed or someone else holds the lock. It also aborts any type
> of compaction when a fatal signal is pending.
> 
> The compact_checklock_irqsave() function is replaced with a slightly different
> compact_trylock_irqsave(). The biggest difference is that the function is not
> called at all if the lock is already held. The periodical contention checking
> is left solely to compact_unlock_should_abort(). If the lock is not held,
> however, the function does avoid a contended run for async compaction by
> aborting when trylock fails. Sync compaction does not use trylock.
> 
> Signed-off-by: Vlastimil Babka 
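The async/sync split in compact_trylock_irqsave() can be sketched in userspace with POSIX mutexes. This is a simplified model under stated assumptions: pthread_mutex_trylock()/pthread_mutex_lock() stand in for spin_trylock_irqsave()/spin_lock_irqsave() (no IRQ state is modelled), and the toy_ types are hypothetical stand-ins for struct compact_control and its fields.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the compaction mode and contention state. */
enum toy_mode { TOY_MIGRATE_ASYNC, TOY_MIGRATE_SYNC };
enum toy_contended { TOY_CONTENDED_NONE, TOY_CONTENDED_LOCK };

struct toy_compact_control {
	enum toy_mode mode;
	enum toy_contended contended;
};

/*
 * Async mode: try once, and on failure record lock contention and give up.
 * Sync mode: block until the lock is acquired.
 * Returns true if the lock is held on return.
 */
static bool toy_compact_trylock(pthread_mutex_t *lock,
				struct toy_compact_control *cc)
{
	if (cc->mode == TOY_MIGRATE_ASYNC) {
		if (pthread_mutex_trylock(lock) != 0) {
			cc->contended = TOY_CONTENDED_LOCK;
			return false;
		}
	} else {
		pthread_mutex_lock(lock);
	}
	return true;
}
```

The caller only reaches this helper when it does not already hold the lock, so periodic contention checks stay in a separate function, as the changelog describes.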

Generally, I like this, but I have a question below.

> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> Cc: David Rientjes 
> ---
> V2: do not consider need/cond_resched() in compact_trylock_irqsave(); spelling
> remove inline: compaction.o size reduced
>  mm/compaction.c | 121 
> 
>  1 file changed, 79 insertions(+), 42 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d37f4a8..e1a4283 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -185,54 +185,77 @@ static void update_pageblock_skip(struct compact_control *cc,
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> -enum compact_contended should_release_lock(spinlock_t *lock)
> +/*
> + * Compaction requires the taking of some coarse locks that are potentially
> + * very heavily contended. For async compaction, back out if the lock cannot
> + * be taken immediately. For sync compaction, spin on the lock if needed.
> + *
> + * Returns true if the lock is held
> + * Returns false if the lock is not held and compaction should abort
> + */
> +static bool compact_trylock_irqsave(spinlock_t *lock,
> + unsigned long *flags, struct compact_control *cc)
>  {
> - if (need_resched())
> - return COMPACT_CONTENDED_SCHED;
> - else if (spin_is_contended(lock))
> - return COMPACT_CONTENDED_LOCK;
> - else
> - return COMPACT_CONTENDED_NONE;
> + if (cc->mode == MIGRATE_ASYNC) {
> + if (!spin_trylock_irqsave(lock, *flags)) {
> + cc->contended = COMPACT_CONTENDED_LOCK;
> + return false;
> + }
> + } else {
> + spin_lock_irqsave(lock, *flags);
> + }
> +
> + return true;
>  }
>  
>  /*
>   * Compaction requires the taking of some coarse locks that are

Re: [PATCH ftrace/core 2/2] ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict

2014-06-10 Thread Masami Hiramatsu

(2014/06/10 22:53), Namhyung Kim wrote:
> Hi Masami,
> 
> 2014-06-10 (화), 10:50 +, Masami Hiramatsu:
>> Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among
>> ftrace users who may modify regs->ip to change the execution
>> path. This also adds the flag to kprobe_ftrace_ops, since
>> ftrace-based kprobes already modify regs->ip. Thus, if
>> another user modifies the regs->ip on the same function entry,
>> one of them will be broken. So both should add the IPMODIFY flag
>> and make sure that ftrace_set_filter_ip() succeeds.
>>
>> Note that currently conflicts of IPMODIFY are detected on the
>> filter hash. It does NOT care about the notrace hash. This means
>> that if you set the filter hash to all functions and notrace (mask)
>> some of them, the IPMODIFY flag will be applied to all
>> functions.
>>
> 
> [SNIP]
>> +static int __ftrace_hash_update_ipmodify(struct ftrace_ops *ops,
>> + struct ftrace_hash *old_hash,
>> + struct ftrace_hash *new_hash)
>> +{
>> +struct ftrace_page *pg;
>> +struct dyn_ftrace *rec, *end = NULL;
>> +int in_old, in_new;
>> +
>> +/* Only update if the ops has been registered */
>> +if (!(ops->flags & FTRACE_OPS_FL_ENABLED))
>> +return 0;
>> +
>> +if (!(ops->flags & FTRACE_OPS_FL_SAVE_REGS) ||
>> +!(ops->flags & FTRACE_OPS_FL_IPMODIFY))
>> +return 0;
>> +
>> +/* Update rec->flags */
>> +do_for_each_ftrace_rec(pg, rec) {
>> +/* We need to update only differences of filter_hash */
>> +in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip);
>> +in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip);
> 
> Why not use ftrace_hash_empty() here instead of checking NULL? 

Ah, there's a trick here. Since an empty filter_hash must hit all, we cannot
enable/disable filter_hash if we use ftrace_hash_empty() here.

To enable the new_hash, old_hash must be EMPTY_HASH, which means in_old is
always false. To disable, new_hash is EMPTY_HASH too.
Please see ftrace_hash_ipmodify_enable/disable/update().

> Also
> return value of ftrace_lookup_ip is not boolean..  maybe you need to
> add !! or convert type of the in_{old,new} to bool.

Yeah, I see. And there is '||' (logical OR) which evaluates the result
as boolean. :)

> 
> 
>> +if (in_old == in_new)
>> +continue;
>> +
>> +if (in_new) {
>> +/* New entries must ensure no others are using it */
>> +if (rec->flags & FTRACE_FL_IPMODIFY)
>> +goto rollback;
>> +rec->flags |= FTRACE_FL_IPMODIFY;
>> +} else /* Removed entry */
>> +rec->flags &= ~FTRACE_FL_IPMODIFY;
>> +} while_for_each_ftrace_rec();
>> +
>> +return 0;
>> +
>> +rollback:
>> +end = rec;
>> +
>> +/* Roll back what we did above */
>> +do_for_each_ftrace_rec(pg, rec) {
>> +if (rec == end)
>> +goto err_out;
>> +
>> +in_old = !old_hash || ftrace_lookup_ip(old_hash, rec->ip);
>> +in_new = !new_hash || ftrace_lookup_ip(new_hash, rec->ip);
>> +if (in_old == in_new)
>> +continue;
>> +
>> +if (in_new)
>> +rec->flags &= ~FTRACE_FL_IPMODIFY;
>> +else
>> +rec->flags |= FTRACE_FL_IPMODIFY;
>> +} while_for_each_ftrace_rec();
>> +
>> +err_out:
>> +return -EBUSY;
>> +}
>> +
>> +static int ftrace_hash_ipmodify_enable(struct ftrace_ops *ops)
>> +{
>> +struct ftrace_hash *hash = ops->filter_hash;
>> +
>> +if (ftrace_hash_empty(hash))
>> +hash = NULL;
>> +
>> +return __ftrace_hash_update_ipmodify(ops, EMPTY_HASH, hash);
>> +}
> 
> Please see above comment.  You can pass an empty hash as is, or pass
> NULL as second arg.  The same goes to below...

As I said above, that is by design :). EMPTY_HASH means it hits nothing,
NULL means it hits all.
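The NULL-versus-empty convention can be modelled with a toy filter. In this sketch, toy_hash and toy_lookup_ip() are hypothetical and much simpler than struct ftrace_hash and ftrace_lookup_ip(); like the real lookup, toy_lookup_ip() returns a pointer, and the logical || turns the combined test into 0 or 1.

```c
#include <stddef.h>

/*
 * Toy filter "hash": a plain array of addresses.
 *   NULL hash       -> matches every function
 *   empty hash      -> matches nothing
 */
struct toy_hash {
	const unsigned long *ips;
	size_t count;
};

/* Returns a pointer to the matching entry, or NULL if @ip is absent. */
static const unsigned long *toy_lookup_ip(const struct toy_hash *hash,
					  unsigned long ip)
{
	for (size_t i = 0; i < hash->count; i++)
		if (hash->ips[i] == ip)
			return &hash->ips[i];
	return NULL;
}

/* Same shape as "in_new = !new_hash || ftrace_lookup_ip(new_hash, ip)". */
static int toy_hits(const struct toy_hash *hash, unsigned long ip)
{
	return !hash || toy_lookup_ip(hash, ip);
}
```

So passing EMPTY_HASH as the old hash makes in_old false for every record (enable), passing it as the new hash makes in_new false for every record (disable), and NULL stands for "every function".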

> 
> Thanks,
> Namhyung
> 
> 
>> +
>> +/* Disabling always succeeds */
>> +static void ftrace_hash_ipmodify_disable(struct ftrace_ops *ops)
>> +{
>> +struct ftrace_hash *hash = ops->filter_hash;
>> +
>> +if (ftrace_hash_empty(hash))
>> +hash = NULL;
>> +
>> +__ftrace_hash_update_ipmodify(ops, hash, EMPTY_HASH);
>> +}
>> +
>> +static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops,
>> +   struct ftrace_hash *new_hash)
>> +{
>> +struct ftrace_hash *old_hash = ops->filter_hash;
>> +
>> +if (ftrace_hash_empty(old_hash))
>> +old_hash = NULL;
>> +
>> +if (ftrace_hash_empty(new_hash))
>> +new_hash = NULL;
>> +
>> +return __ftrace_hash_update_ipmodify(ops, old_hash, new_hash);
>> +}
>> +
> 
> 
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail:

Re: [PATCH] of/platform: Fix microblaze build failure

2014-06-10 Thread Michal Simek

On 06/10/2014 08:59 PM, Guenter Roeck wrote:
> Commit bf5db2f (microblaze: Use generic device.h) removes the
> microblaze specific pdev_archdata and dma_mask.
> 
> At the same time, commit 591c1ee (of: configure the platform
> device dma parameters) initializes the just removed field.
> This causes all microblaze builds to fail.
> 
> Drop the unnecessary initialization.
> 
> Cc: Michal Simek 
> Signed-off-by: Guenter Roeck 
> ---
>  drivers/of/platform.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 6c48d73..500436f 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -166,10 +166,6 @@ static void of_dma_configure(struct platform_device *pdev)
>   int ret;
>   struct device *dev = &pdev->dev;
>  
> -#if defined(CONFIG_MICROBLAZE)
> - pdev->archdata.dma_mask = 0xUL;
> -#endif
> -
>   /*
>* Set default dma-mask to 32 bit. Drivers are expected to setup
>* the correct supported dma_mask.
> 

Acked-by: Michal Simek 

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform





Re: [PATCH 0/4] KEYS: validate key trust with owner and builtin keys only

2014-06-10 Thread Mimi Zohar

On Tue, 2014-06-10 at 22:40 +0100, Matthew Garrett wrote: 
> On Wed, Jun 11, 2014 at 12:34:28AM +0300, Dmitry Kasatkin wrote:
> 
> > My statement is still valid. It is a hole...
> > 
> > To prevent the hole it should be explained that one might follow
> > certain instructions
> > to take ownership of your PC. Generate your own keys and remove MS and
> > Vendor ones...
> 
> The hole is that the system trusts keys that you don't trust. The 
> appropriate thing to do is to remove that trust from the entire system, 
> not just one layer of the system. If people gain the impression that 
> they can simply pass a kernel parameter and avoid trusting the vendor 
> keys, they'll be upset to discover that it's easily circumvented.

Assuming I remove all the keys I don't trust, there are still keys that
are trusted while booting, but are not necessary afterwards.  We should
be able to limit the scope of where and when keys are trusted.

Mimi


Re: [PATCH] asus-nb-wmi: set wapf=4 for ASUSTeK COMPUTER INC. X75VBP & X550CA

2014-06-10 Thread Matthew Garrett

Hm. Sorry, I thought I'd picked that one up. I'll send it in a couple of days.

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: Linux 3.15 .. and continuation of merge window

2014-06-10 Thread Al Viro

On Mon, Jun 09, 2014 at 12:30:34PM +0900, J. R. Okajima wrote:
> 
> Linus Torvalds:
> > So I ended up doing an rc8 because I was a bit worried about some
> > last-minute dcache fixes, but it turns out that nobody seemed to even
> > notice those. We did have other issues during the week, though, so it
>   :::
> 
> I am afraid there is a problem in dcache. Please read
> http://marc.info/?l=linux-fsdevel=140214911608925=2

There is a problem, all right, but your fix doesn't really fix it - just
narrows the race window ;-/  I would really like to detach the bugger
as soon as __dentry_kill() removes it from the list of children, but
unfortunately NFS wants it to be still valid in ->d_iput() (BTW,
nfs_can_unlink() is doing something very odd -
parent = dget_parent(dentry);
if (parent == NULL)
goto out_free;
is pointless, since dget_parent() never returns NULL; what was that
check trying to accomplish?)

Your scenario isn't feasible as described, but something similar can
happen with *two* shrinkers racing - dirB might've ended up on one list,
with fileC looked up a bit later, ending up on another list right after
that.  So the problem is real; unfortunately, DCACHE_DENTRY_KILLED might
have appeared right after we'd dropped ->d_lock.

So I suspect that the right fix is a bit trickier: in addition to the check on
the fast path (i.e. when trylock gets us the lock on parent), we need to
* take rcu_read_lock() before dropping ->d_lock;
* check whether the dentry is already doomed right after taking rcu_read_lock();
if it isn't, any value we see in ->d_parent afterwards will point to an object
that won't be freed until we drop rcu_read_lock().

IOW, something like the delta below.  Comments?

PS: apologies for being MIA; caught some crap, spent the last week being
very unhappy ;-/  I'll send a pull request tomorrow morning.

diff --git a/fs/dcache.c b/fs/dcache.c
index be2bea8..65ec10f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -532,10 +532,16 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 	struct dentry *parent = dentry->d_parent;
 	if (IS_ROOT(dentry))
 		return NULL;
+	if (unlikely((int)dentry->d_lockref.count < 0))
+		return NULL;
 	if (likely(spin_trylock(&parent->d_lock)))
 		return parent;
-	spin_unlock(&dentry->d_lock);
 	rcu_read_lock();
+	if (unlikely((int)dentry->d_lockref.count < 0)) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	spin_unlock(&dentry->d_lock);
 again:
 	parent = ACCESS_ONCE(dentry->d_parent);
 	spin_lock(&parent->d_lock);
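The ordering described above (bail out if the dentry is doomed, trylock the parent, otherwise back off and take parent before child) can be illustrated with a userspace analogue. This is a minimal sketch, not the dcache code: struct node, node_init() and lock_parent_sketch() are hypothetical names, pthread mutexes stand in for ->d_lock, a negative count stands in for a doomed d_lockref, and a self-pointing parent marks a root. There is no RCU here, so the sketch simply assumes nodes are never freed, which is the guarantee rcu_read_lock() provides in the kernel version.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry (not kernel code). */
struct node {
	pthread_mutex_t lock;          /* plays the role of ->d_lock */
	_Atomic(struct node *) parent; /* plays ->d_parent; a root points to itself */
	int count;                     /* negative once the node is doomed */
};

static void node_init(struct node *n, struct node *parent, int count)
{
	pthread_mutex_init(&n->lock, NULL);
	atomic_init(&n->parent, parent);
	n->count = count;
}

/*
 * Caller holds n->lock. On success returns the parent with parent->lock
 * held and n->lock still held. Returns NULL, with n->lock held, if n is
 * a root or already doomed. Lock order is always parent before child.
 */
static struct node *lock_parent_sketch(struct node *n)
{
	struct node *parent = atomic_load(&n->parent);

	if (parent == n)                /* like IS_ROOT() */
		return NULL;
	if (n->count < 0)               /* doomed: ->parent may be stale */
		return NULL;
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return parent;          /* fast path: got parent without waiting */

	/*
	 * Slow path: drop the child lock, then take parent -> child.
	 * ASSUMPTION: nodes are never freed, standing in for the
	 * rcu_read_lock() the kernel holds across this window.
	 */
	pthread_mutex_unlock(&n->lock);
	for (;;) {
		parent = atomic_load(&n->parent);
		pthread_mutex_lock(&parent->lock);
		/* ASSUMPTION: ->parent only changes under the old parent's
		 * lock, so this re-check is stable once we hold that lock. */
		if (parent == atomic_load(&n->parent))
			break;
		pthread_mutex_unlock(&parent->lock);
	}
	if (parent == n)                /* became a root meanwhile; n->lock is held */
		return NULL;
	pthread_mutex_lock(&n->lock);
	return parent;
}
```

The doomed check sits before the trylock because, once the child lock is dropped, nothing in this sketch (or in the kernel without RCU) would keep a stale parent pointer safe to dereference.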

Re: [PATCH 02/10] mm, compaction: report compaction as contended only due to lock contention

2014-06-10 Thread Minchan Kim

On Mon, Jun 09, 2014 at 11:26:14AM +0200, Vlastimil Babka wrote:
> Async compaction aborts when it detects zone lock contention or need_resched()
> is true. David Rientjes has reported that in practice, most direct async
> compactions for THP allocation abort due to need_resched(). This means that a
> second direct compaction is never attempted, which might be OK for a page
> fault, but khugepaged is intended to attempt a sync compaction in such a case and
> in these cases it won't.
> 
> This patch replaces "bool contended" in compact_control with an enum that
> distinguishes between aborting due to need_resched() and aborting due to lock
> contention. This allows propagating the abort through all compaction functions
> as before, but declaring the direct compaction as contended only when lock
> contention has been detected.
> 
> As a result, khugepaged will proceed with second sync compaction as intended,
> when the preceding async compaction aborted due to need_resched().
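The tri-state result described above can be modelled in a few lines of standalone C. This is a sketch, not the patch: the enumerator names come from the quoted diff, but their order and values are assumptions (the truncated mm/internal.h hunk does not show them). NONE has to be zero because compact_checklock_irqsave() tests the result with a plain `if (contended)`. classify() and report_contended() are hypothetical stand-ins for should_release_lock() and the compact_zone_order() change, with two booleans replacing need_resched() and spin_is_contended().

```c
#include <assert.h>
#include <stdbool.h>

/* Names from the patch; ORDER AND VALUES ARE ASSUMED. NONE must be 0
 * because the caller tests the result with "if (contended)". */
enum compact_contended {
	COMPACT_CONTENDED_NONE = 0,
	COMPACT_CONTENDED_SCHED,
	COMPACT_CONTENDED_LOCK,
};

/* Hypothetical model of should_release_lock(): the booleans stand in
 * for need_resched() and spin_is_contended(lock). need_resched() is
 * checked first, so it wins when both are true. */
static enum compact_contended classify(bool resched, bool lock_contended)
{
	if (resched)
		return COMPACT_CONTENDED_SCHED;
	if (lock_contended)
		return COMPACT_CONTENDED_LOCK;
	return COMPACT_CONTENDED_NONE;
}

/* Hypothetical model of the compact_zone_order() change: only lock
 * contention is reported back to the allocator as "contended". */
static bool report_contended(enum compact_contended c)
{
	return c == COMPACT_CONTENDED_LOCK;
}
```

With this split, an async pass that stopped because of need_resched() reports "not contended" to the allocator, which is what lets the caller go on to the sync pass.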

You said "second direct compaction is never attempted, which might be OK
for a page fault" and "khugepaged is intended to attempt a sync compaction",
so I feel you want to treat khugepaged specially, unlike other direct
compaction callers (e.g. page fault).

With this patch, direct compaction takes care only of lock contention, not
rescheduling, which raises some questions.

Is it really okay not to consider need_resched in direct compaction?
We take care of it in the direct reclaim path, so why is direct compaction
so special?

Why does khugepaged give up so easily if lock contention/need_resched happens?
As I read your description, khugepaged is important for the success ratio, so
IMO khugepaged should compact synchronously, without bailing out early on
lock contention/rescheduling.

If that causes problems, the user should increase
scan_sleep_millisecs/alloc_sleep_millisecs, which are exactly the knobs for
such cases.

So, my point is: how about making khugepaged always do dumb synchronous
compaction through PG_KHUGEPAGED or GFP_SYNC_TRANSHUGE?
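The proposal amounts to choosing the compaction mode from who is asking rather than from what went wrong. A hypothetical sketch: struct request, pick_mode() and should_abort() are made-up names, and from_khugepaged stands in for the proposed PG_KHUGEPAGED / GFP_SYNC_TRANSHUGE marker, neither of which is shown to exist in this thread. In this model khugepaged always runs sync and ignores both bail-out signals, while other callers keep the async early-abort behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modes for this model only (not the kernel's migrate_mode). */
enum mode { MODE_ASYNC, MODE_SYNC };

struct request {
	bool from_khugepaged; /* stands in for the proposed khugepaged marker */
};

/* khugepaged always gets sync compaction; everyone else starts async. */
static enum mode pick_mode(const struct request *r)
{
	return r->from_khugepaged ? MODE_SYNC : MODE_ASYNC;
}

/* In this model only async compaction bails out early, on either
 * need_resched() or lock contention; sync mode keeps going. */
static bool should_abort(enum mode m, bool resched, bool lock_contended)
{
	return m == MODE_ASYNC && (resched || lock_contended);
}
```

Under this scheme the cost of khugepaged's persistence would be paced by scan_sleep_millisecs/alloc_sleep_millisecs rather than by early aborts.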

> 
> Reported-by: David Rientjes 
> Signed-off-by: Vlastimil Babka 
> Cc: Minchan Kim 
> Cc: Mel Gorman 
> Cc: Joonsoo Kim 
> Cc: Michal Nazarewicz 
> Cc: Naoya Horiguchi 
> Cc: Christoph Lameter 
> Cc: Rik van Riel 
> ---
>  mm/compaction.c | 20 ++--
>  mm/internal.h   | 15 +++
>  2 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index b73b182..d37f4a8 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -185,9 +185,14 @@ static void update_pageblock_skip(struct compact_control *cc,
>  }
>  #endif /* CONFIG_COMPACTION */
>  
> -static inline bool should_release_lock(spinlock_t *lock)
> +enum compact_contended should_release_lock(spinlock_t *lock)
>  {
> - return need_resched() || spin_is_contended(lock);
> + if (need_resched())
> + return COMPACT_CONTENDED_SCHED;
> + else if (spin_is_contended(lock))
> + return COMPACT_CONTENDED_LOCK;
> + else
> + return COMPACT_CONTENDED_NONE;
>  }
>  
>  /*
> @@ -202,7 +207,9 @@ static inline bool should_release_lock(spinlock_t *lock)
>  static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
> bool locked, struct compact_control *cc)
>  {
> - if (should_release_lock(lock)) {
> + enum compact_contended contended = should_release_lock(lock);
> +
> + if (contended) {
>   if (locked) {
>   spin_unlock_irqrestore(lock, *flags);
>   locked = false;
> @@ -210,7 +217,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
>  
>   /* async aborts if taking too long or contended */
>   if (cc->mode == MIGRATE_ASYNC) {
> - cc->contended = true;
> + cc->contended = contended;
>   return false;
>   }
>  
> @@ -236,7 +243,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
>   /* async compaction aborts if contended */
>   if (need_resched()) {
>   if (cc->mode == MIGRATE_ASYNC) {
> - cc->contended = true;
> + cc->contended = COMPACT_CONTENDED_SCHED;
>   return true;
>   }
>  
> @@ -1095,7 +1102,8 @@ static unsigned long compact_zone_order(struct zone *zone, int order,
>   VM_BUG_ON(!list_empty(&cc.freepages));
>   VM_BUG_ON(!list_empty(&cc.migratepages));
>  
> - *contended = cc.contended;
> + /* We only signal lock contention back to the allocator */
> + *contended = cc.contended == COMPACT_CONTENDED_LOCK;
>   return ret;
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 7f22a11f..4659e8e 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -117,6 +117,13 @@ extern int user_min_free_kbytes;
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>  
> +/* Used to signal whether compaction
