[PATCH v3 0/2] arm & arm64: perf: Fix callchain parse error with kernel tracepoint events

2015-05-01 Thread Hou Pengyang
For arm & arm64, when tracing with tracepoint events, the IP and cpsr
are set to 0, preventing the perf code from parsing the callchain and
resolving the symbols correctly.

These two patches fix this by implementing perf_arch_fetch_caller_regs
for arm and arm64, which fills in the register state needed for
callchain unwinding and symbol resolving.

v2->v3:
 - split the original patch into two, one for arm and the other for arm64;
 - change '|=' to '=' when setting cpsr.

Hou Pengyang (2):
  arm: perf: Fix callchain parse error with kernel tracepoint events
  arm64: perf: Fix callchain parse error with kernel tracepoint events

 arch/arm/include/asm/perf_event.h   | 7 +++++++
 arch/arm64/include/asm/perf_event.h | 7 +++++++
 2 files changed, 14 insertions(+)

-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 1/2] arm: perf: Fix callchain parse error with kernel tracepoint events

2015-05-01 Thread Hou Pengyang
For ARM, when tracing with tracepoint events, the IP and cpsr are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
[ perf record: Captured and wrote 0.006 MB perf.data ]
 ./perf report -f
Samples: 5  of event 'sched:sched_switch', Event count (approx.): 5 
Children      Self  Command  Shared Object      Symbol
100.00%   100.00%  ls   [unknown] [.] 

The fix is to implement perf_arch_fetch_caller_regs for ARM, which fills
in the registers needed for callchain unwinding and symbol resolving:
pc, sp, fp and cpsr.

With this patch, the callchain can be parsed correctly as:

   .
-  100.00%   100.00%  ls   [kernel.kallsyms]  [k] __sched_text_start 
   + __sched_text_start 
+   20.00% 0.00%  ls   libc-2.18.so   [.] _dl_addr 
+   20.00% 0.00%  ls   libc-2.18.so   [.] write
   .

Jean Pihet found this on ARM and came up with a patch:
http://thread.gmane.org/gmane.linux.kernel/1734283/focus=1734280

This patch rewrites Jean's patch in C.

Signed-off-by: Hou Pengyang 
---
 arch/arm/include/asm/perf_event.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h
index d9cf138..4f9dec4 100644
--- a/arch/arm/include/asm/perf_event.h
+++ b/arch/arm/include/asm/perf_event.h
@@ -19,4 +19,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)  perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+   (regs)->ARM_pc = (__ip); \
+   (regs)->ARM_fp = (unsigned long) __builtin_frame_address(0); \
+   (regs)->ARM_sp = current_stack_pointer; \
+   (regs)->ARM_cpsr = SVC_MODE; \
+}
+
 #endif /* __ARM_PERF_EVENT_H__ */
-- 
1.8.3.4



[PATCH v3 2/2] arm64: perf: Fix callchain parse error with kernel tracepoint events

2015-05-01 Thread Hou Pengyang
For ARM64, when tracing with tracepoint events, the IP and pstate are set
to 0, preventing the perf code from parsing the callchain and resolving
the symbols correctly.

 ./perf record -e sched:sched_switch -g --call-graph dwarf ls
[ perf record: Captured and wrote 0.146 MB perf.data ]
 ./perf report -f
Samples: 194  of event 'sched:sched_switch', Event count (approx.): 194
Children      Self  Command  Shared Object      Symbol
100.00%   100.00%  ls   [unknown] [.] 

The fix is to implement perf_arch_fetch_caller_regs for ARM64, which fills
in the registers needed for callchain unwinding and symbol resolving:
pc, sp, fp and pstate.

With this patch, callchain can be parsed correctly as follows:

 ..
+2.63% 0.00%  ls   [kernel.kallsyms]  [k] vfs_symlink
+2.63% 0.00%  ls   [kernel.kallsyms]  [k] follow_down
+2.63% 0.00%  ls   [kernel.kallsyms]  [k] pfkey_get
+2.63% 0.00%  ls   [kernel.kallsyms]  [k] do_execveat_common.isra.33
-2.63% 0.00%  ls   [kernel.kallsyms]  [k] pfkey_send_policy_notify
 pfkey_send_policy_notify
 pfkey_get
 v9fs_vfs_rename
 page_follow_link_light
 link_path_walk
 el0_svc_naked
...

Signed-off-by: Hou Pengyang 
---
 arch/arm64/include/asm/perf_event.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index d26d1d5..cc92021 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -24,4 +24,11 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)  perf_misc_flags(regs)
 #endif
 
+#define perf_arch_fetch_caller_regs(regs, __ip) { \
+   (regs)->pc = (__ip); \
+   (regs)->regs[29] = (unsigned long) __builtin_frame_address(0); \
+   (regs)->sp = current_stack_pointer; \
+   (regs)->pstate = PSR_MODE_EL1h; \
+}
+
 #endif
-- 
1.8.3.4



[PATCH v2] rtc: s3c: Integrate Exynos3250 into S3C6410

2015-05-01 Thread Krzysztof Kozlowski
From: Krzysztof Kozlowski 

There are now no differences between the RTC on Exynos3250 and S3C6410.
Merge them into one so the duplicated code can be removed.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Chanwoo Choi 
Acked-by: Alexandre Belloni 
Reviewed-by: Javier Martinez Canillas 
Signed-off-by: Krzysztof Kozlowski 

---
Changes since v1:
1. Add acks and reviews.
---
 drivers/rtc/rtc-s3c.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/rtc/rtc-s3c.c b/drivers/rtc/rtc-s3c.c
index 76cbad7a99d3..a0f832362199 100644
--- a/drivers/rtc/rtc-s3c.c
+++ b/drivers/rtc/rtc-s3c.c
@@ -772,18 +772,6 @@ static struct s3c_rtc_data const s3c6410_rtc_data = {
.disable= s3c6410_rtc_disable,
 };
 
-static struct s3c_rtc_data const exynos3250_rtc_data = {
-   .max_user_freq  = 32768,
-   .needs_src_clk  = true,
-   .irq_handler= s3c6410_rtc_irq,
-   .set_freq   = s3c6410_rtc_setfreq,
-   .enable_tick= s3c6410_rtc_enable_tick,
-   .save_tick_cnt  = s3c6410_rtc_save_tick_cnt,
-   .restore_tick_cnt   = s3c6410_rtc_restore_tick_cnt,
-   .enable = s3c24xx_rtc_enable,
-   .disable= s3c6410_rtc_disable,
-};
-
 static const struct of_device_id s3c_rtc_dt_match[] = {
{
.compatible = "samsung,s3c2410-rtc",
@@ -799,7 +787,7 @@ static const struct of_device_id s3c_rtc_dt_match[] = {
.data = (void *)&s3c6410_rtc_data,
}, {
.compatible = "samsung,exynos3250-rtc",
-   .data = (void *)&exynos3250_rtc_data,
+   .data = (void *)&s3c6410_rtc_data,
},
{ /* sentinel */ },
 };
-- 
2.1.4



Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry

2015-05-01 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Fri, May 1, 2015 at 12:11 PM, Rik van Riel  wrote:
> > On 05/01/2015 02:40 PM, Ingo Molnar wrote:
> >
> >> Or we could do that in the syscall path with a single store of a
> >> constant flag to a location in the task struct. We have a number of
> >> natural flags that get written on syscall entry, such as:
> >>
> >> pushq_cfi $__USER_DS/* pt_regs->ss */
> >>
> >> That goes to a constant location on the kernel stack. On return from
> >> system calls we could write 0 to that location.
> 
> Huh?  IRET with zero there will fault, and we can't guarantee that 
> all syscalls return using sysret. [...]

So IRET is a slowpath - I was thinking about the fastpath mainly.

> [...]  Also, we'd have to audit all the entries, and 
> system_call_after_swapgs currently enables interrupts early enough 
> that an interrupt before all the pushes will do unpredictable things 
> to pt_regs.

An irq hardware frame won't push zero to that selector value, right? 
That's the only bad thing that would confuse the code.

> We could further abuse orig_ax, but that would require a lot of 
> auditing.  Honestly, though, I think keeping a flag in an 
> otherwise-hot cache line is a better bet. [...]

That would work too, at the cost of one more instruction, as now we'd 
have to both set and clear it.

> [...]  Also, making it per-cpu instead of per-task will probably be 
> easier on the RCU code, since otherwise the RCU code will have to do 
> some kind of synchronization (even if it's a wait-free probe of the 
> rq lock or similar) to figure out which pt_regs to look at.

So the synchronize_rcu() part is an even slower slow path, in 
comparison with system call entry overhead.

But yes, safely accessing the remote task is a complication, but with 
such a scheme we cannot avoid it, we'd still have to set TIF_NOHZ. So 
even if we do:

> If we went that route, I'd advocate sticking the flag in tss->sp1. 
> That cacheline is unconditionally read on kernel entry already, and 
> sp1 is available in tip:x86/asm (and maybe even in Linus' tree -- I 
> lost track). [1]
> 
> Alternatively, I'd suggest that we actually add a whole new word to 
> pt_regs.

... we'd still have to set TIF_NOHZ and are back to square one in 
terms of race complexity.

But it's not overly complex: by taking the remote CPU's rq-lock from 
synchronize_rcu() we could get a stable pointer to the currently 
executing task.

And if we do that, we might as well use the opportunity and take a 
look at pt_regs as well - this is why sticking it into pt_regs might 
be better.

So I'd:

  - enlarge pt_regs by a word and stick the flag there (this 
allocation is essentially free)

  - update the word from entry/exit

  - synchronize_rcu() avoids having to send an IPI by taking a 
 peak at rq->curr's pt_regs::flag, and if:

 - the flag is 0 then it has observed a quiescent state.

 - the flag is 1, then it would set TIF_NOHZ and wait for a 
   completion from a TIF_NOHZ callback.

synchronize_rcu() often involves waiting for (timer tick driven) grace 
periods anyway, so this is a relatively fast solution - and it would 
limit the overhead to 2 instructions.

On systems that have zero nohz-full CPUs (i.e. !context_tracking_enabled)
we could patch out those two instructions into NOPs, which would be
eliminated in the decoder.

Regarding the user/kernel execution time split measurement:

1) the same flag could be used to sample a remote CPU's statistics 
from another CPU and update the stats in the currently executing task. 
As long as there's at least one non-nohz-full CPU, this would work. Or 
are there systems where all CPUs are nohz-full?

2) Alternatively we could just drive user/kernel split statistics from 
context switches, which would be inaccurate if the workload is 
SCHED_FIFO that only rarely context switches.

How does this sound?

Thanks,

Ingo


Re: [PATCH 3/3] netconsole: implement extended console support

2015-05-01 Thread Tetsuo Handa
Tejun Heo wrote:
> +If a message doesn't fit in 1000 bytes, the message is split into
> +multiple fragments by netconsole. These fragments are transmitted with
> +"ncfrag" header field added.
> +
> + ncfrag=<byte-offset>/<total-bytes>
> +
> +For example,
> +
> + 6,416,1758426,-,ncfrag=0/33;the first chunk,
> + 6,416,1758426,-,ncfrag=16/33;the second chunk.
> +

Wouldn't total-bytes be > 1000 rather than 33 in this example?

> +/**
> + * send_ext_msg_udp - send extended log message to target
> + * @nt: target to send message to
> + * @msg: extended log message to send
> + * @msg_len: length of message
> + *
> + * Transfer extended log @msg to @nt.  If @msg is longer than
> + * MAX_PRINT_CHUNK, it'll be split and transmitted in multiple chunks with
> + * ncfrag header field added to identify them.
> + */
> +static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
> +  int msg_len)
> +{
> + static char buf[MAX_PRINT_CHUNK];
> + const int max_extra_len = sizeof(",ncfrag=/");
> + const char *header, *body;
> + int header_len = msg_len, body_len = 0;
> + int chunk_len, nr_chunks, i;
> +
> + if (msg_len <= MAX_PRINT_CHUNK) {
> + netpoll_send_udp(&nt->np, msg, msg_len);
> + return;
> + }
> +
> + /* need to insert extra header fields, detect header and body */
> + header = msg;
> + body = memchr(msg, ';', msg_len);
> + if (body) {
> + header_len = body - header;
> + body_len = msg_len - header_len - 1;
> + body++;
> + }
> +
> + chunk_len = MAX_PRINT_CHUNK - header_len - max_extra_len;
> + if (WARN_ON_ONCE(chunk_len <= 0))
> + return;

This path is executed only when msg_len > MAX_PRINT_CHUNK.
And since header_len == msg_len if body == NULL, chunk_len <= 0 is true.
We will hit this WARN_ON_ONCE() if memchr(msg, ';', msg_len) == NULL
which will fail to send the message. Is this what you want?

> +static void write_ext_msg(struct console *con, const char *msg,
> +   unsigned int len)
> +{
> + struct netconsole_target *nt;
> + unsigned long flags;
> +
> + if ((oops_only && !oops_in_progress) || list_empty(&target_list))
> + return;
> +
> + spin_lock_irqsave(&target_list_lock, flags);
> + list_for_each_entry(nt, &target_list, list)

Don't you need to call netconsole_target_get() here

> + if (nt->extended && nt->enabled && netif_running(nt->np.dev))
> + send_ext_msg_udp(nt, msg, len);

and netconsole_target_put() here as with write_msg()?

> + spin_unlock_irqrestore(&target_list_lock, flags);
> +}
> +


Re: [PATCH] MODSIGN: Change default key details [ver #2]

2015-05-01 Thread Linus Torvalds
On Fri, May 1, 2015 at 2:41 PM, Abelardo Ricart III  wrote:
>
> Here's my two-line patch strictly defining the build order, for your perusal.

Ok, so this looks possible and sounds like it could explain the issues.

But I'd like somebody who is much more familiar with these kinds of
subtleties in 'make' to take another look and ack it. Because I had
personally never even heard of (much less used) these magical GNU
make "order-only prerequisites". Live and learn.

> -signing_key.priv signing_key.x509: x509.genkey
> +signing_key.priv signing_key.x509: | x509.genkey
> +   $(warning *** X.509 module signing key pair not found in root of 
> source tree ***)

So we shouldn't warn about this. The "generate random key" should be
the normal action for just about everybody but actual production
vendor builds. It's definitely not an error condition.

But that ": |" syntax is interesting. A quick grep does show that we
do have a few previous uses, so I guess we really *do* use just about
every possible feature of GNU make even if I wasn't aware of this
one.

The "generate random key" does seem to be a similar "prep" phase as
the __dtbs_install_prep thing we do in the dtb install.

Adding Michal Marek to the cc, since I want an Ack from somebody who
knows the details of GNU make more than I do.  Anybody else who is a
makefile God?

Linus


[PATCH] drm/exynos: Fix build breakage on !DRM_EXYNOS_FIMD

2015-05-01 Thread Krzysztof Kozlowski
Selecting CONFIG_FB_S3C disables CONFIG_DRM_EXYNOS_FIMD, leading to a
build error:

drivers/built-in.o: In function `exynos_dp_dpms':
binder.c:(.text+0xd6a840): undefined reference to `fimd_dp_clock_enable'
binder.c:(.text+0xd6ab54): undefined reference to `fimd_dp_clock_enable'

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/gpu/drm/exynos/exynos_drm_fimd.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.h b/drivers/gpu/drm/exynos/exynos_drm_fimd.h
index b4fcaa568456..db67f3d9786d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fimd.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.h
@@ -10,6 +10,10 @@
 #ifndef _EXYNOS_DRM_FIMD_H_
 #define _EXYNOS_DRM_FIMD_H_
 
+#ifdef CONFIG_DRM_EXYNOS_FIMD
 extern void fimd_dp_clock_enable(struct exynos_drm_crtc *crtc, bool enable);
+#else
+static inline void fimd_dp_clock_enable(struct exynos_drm_crtc *crtc, bool enable) { }
+#endif
 
 #endif /* _EXYNOS_DRM_FIMD_H_ */
-- 
2.1.4



Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry

2015-05-01 Thread Mike Galbraith
On Fri, 2015-05-01 at 14:05 -0400, Rik van Riel wrote:
> On 05/01/2015 12:34 PM, Ingo Molnar wrote:
> > 
> > * Rik van Riel  wrote:
> > 
> >>> I can understand people running hard-RT workloads not wanting to 
> >>> see the overhead of a timer tick or a scheduler tick with variable 
> >>> (and occasionally heavy) work done in IRQ context, but the jitter 
> >>> caused by a single trivial IPI with constant work should be very, 
> >>> very low and constant.
> >>
> >> Not if the realtime workload is running inside a KVM guest.
> > 
> > I don't buy this:
> > 
> >> At that point an IPI, either on the host or in the guest, involves a 
> >> full VMEXIT & VMENTER cycle.
> > 
> > So a full VMEXIT/VMENTER costs how much, 2000 cycles? That's around 1 
> > usec on recent hardware, and I bet it will get better with time.
> > 
> > I'm not aware of any hard-RT workload that cannot take 1 usec 
> > latencies.
> 
> Now think about doing this kind of IPI from inside a guest,
> to another VCPU on the same guest.
> 
> Now you are looking at VMEXIT/VMENTER on the first VCPU,
> plus the cost of the IPI on the host, plus the cost of
> the emulation layer, plus VMEXIT/VMENTER on the second
> VCPU to trigger the IPI work, and possibly a second
> VMEXIT/VMENTER for IPI completion.
> 
> I suspect it would be better to do RCU callback offload
> in some other way.

I don't get it.  How the heck do people manage to talk about realtime in
virtual boxen, and not at least crack a smile.  Real, virtual, real,
virtual... what's wrong with this picture?

Why is virtual realtime not an oxymoron?

I did that for grins once, and it was either really funny, or really
sad, not sure which... but it did not look really really useful.

-Mike



Re: [PATCH 3.19 016/175] ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

2015-05-01 Thread Mike Galbraith
On Fri, 2015-05-01 at 22:52 +0200, Greg Kroah-Hartman wrote:
> On Fri, May 01, 2015 at 03:00:00PM -0500, Josh Hunt wrote:
> > On Wed, Mar 4, 2015 at 12:13 AM, Greg Kroah-Hartman
> >  wrote:
> > > 3.19-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > >
> > > --
> > >
> > > From: Calvin Owens 
> > >
> > > commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
> > >
> > > While debugging an issue with excessive softirq usage, I encountered the
> > > following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
> > > infrastructure"):
> > >
> > > [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
> > >
> > > ...but despite this note, the patch still calls RCU with IRQs disabled.
> > >
> > > This seemingly innocuous change caused a significant regression in softirq
> > > CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
> > > introducing 0.01% packet loss, the softirq usage would jump to around 25%,
> > > spiking as high as 50%. Before the change, the usage would never exceed 
> > > 5%.
> > >
> > > Moving the call to rcu_note_context_switch() after the cond_sched() call,
> > > as it was originally before the hotplug patch, completely eliminated this
> > > problem.
> > >
> > > Signed-off-by: Calvin Owens 
> > > Signed-off-by: Paul E. McKenney 
> > > Signed-off-by: Greg Kroah-Hartman 
> > >
> > > ---
> > >  kernel/softirq.c |6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > --- a/kernel/softirq.c
> > > +++ b/kernel/softirq.c
> > > @@ -656,9 +656,13 @@ static void run_ksoftirqd(unsigned int c
> > >  * in the task stack here.
> > >  */
> > > __do_softirq();
> > > -   rcu_note_context_switch();
> > > local_irq_enable();
> > > cond_resched();
> > > +
> > > +   preempt_disable();
> > > +   rcu_note_context_switch();
> > > +   preempt_enable();
> > > +
> > > return;
> > > }
> > > local_irq_enable();
> > >
> > >
> > 
> > Sorry for the delay in noticing this, but should this be applied to
> > 3.14-stable as well?
> 
> Why should it?

The regression inducing change arrived in 3.7-rc1.

>   And odds are, if I didn't apply it there, it was either
> because it didn't apply, or it broke the build.

a. [x]  rcu_note_context_switch(cpu) -> rcu_note_context_switch()

From 28423ad283d5348793b0c45cc9b1af058e776fd6 Mon Sep 17 00:00:00 2001
From: Calvin Owens 
Date: Tue, 13 Jan 2015 13:16:18 -0800
Subject: ksoftirqd: Enable IRQs and call cond_resched() before poking RCU

While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae24a706 ("softirq: Use hotplug thread
infrastructure"):

[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

...but despite this note, the patch still calls RCU with IRQs disabled.

This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.

Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.

Signed-off-by: Calvin Owens 
Cc: sta...@vger.kernel.org
Signed-off-by: Paul E. McKenney 
Signed-off-by: Mike Galbraith 
---
 kernel/softirq.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -657,9 +657,13 @@ static void run_ksoftirqd(unsigned int c
 * in the task stack here.
 */
__do_softirq();
-   rcu_note_context_switch(cpu);
local_irq_enable();
cond_resched();
+
+   preempt_disable();
+   rcu_note_context_switch(cpu);
+   preempt_enable();
+
return;
}
local_irq_enable();




RE: [PATCH v3 12/20] clk: tegra: pll: Add specialized logic for T210

2015-05-01 Thread Jim Lin

> +static void clk_plle_tegra210_is_enabled(struct struct clk_hw *hw) {
The return type should be "int" instead of "void".
Also, there is a duplicated "struct" before "clk_hw *hw".

> + struct tegra_clk_pll *pll = to_clk_pll(hw);
> + u32 val;
> +
> + val = pll_readl_base(pll);
> +
> + return val & PLLE_BASE_ENABLE ? 1 : 0; }
> +
--nvpublic


[GIT] Networking

2015-05-01 Thread David Miller

1) Receive packet length needs to be adjusted by 2 on RX to accommodate
   the two padding bytes in the altera_tse driver.  From Vlastimil Setka.

2) If an rx frame is dropped due to out of memory in the macb driver, we
   leave the receive ring descriptors in an undefined state.  From
   Punnaiah Choudary Kalluri.

3) Some netlink subsystems erroneously signal NLM_F_MULTI.  That is
   only for dumps.  Fix from Nicolas Dichtel.

4) Fix mis-use of raw rt->rt_pmtu value in ipv4, one must always go
   via the ipv4_mtu() helper.  From Herbert Xu.

5) Fix null deref in bridge netfilter, and miscalculated lengths in
   jump/goto nf_tables verdicts.  From Florian Westphal.

6) Unhash ping sockets properly.

7) Software implementation of BPF divide did 64/32 rather than 64/64
   bit divide.  The JITs got it right.  Fix from Alexei Starovoitov.

Please pull, thanks a lot!

The following changes since commit 2decb2682f80759f631c8332f9a2a34a02150a03:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-04-27 
14:05:19 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to a134f083e79fb4c3d0a925691e732c56911b4326:

  ipv4: Missing sk_nulls_node_init() in ping_unhash(). (2015-05-01 22:02:47 
-0400)


Alexei Starovoitov (1):
  bpf: fix 64-bit divide

Antonio Ospite (2):
  trivial: net: atl1e: atl1e_hw.h: fix 0x0x prefix
  trivial: net: systemport: bcmsysport.h: fix 0x0x prefix

Benjamin Poirier (2):
  mlx4: Fix tx ring affinity_mask creation
  mlx4_en: Use correct loop cursor in error path.

David Ahern (1):
  net/mlx4_core: Fix unaligned accesses

David S. Miller (3):
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'bnx2x'
  ipv4: Missing sk_nulls_node_init() in ping_unhash().

Florian Westphal (3):
  netfilter: nf_tables: fix wrong length for jump/goto verdicts
  netfilter: bridge: fix NULL deref in physin/out ifindex helpers
  net: sched: act_connmark: don't zap skb->nfct

Guenter Roeck (1):
  net: dsa: Fix scope of eeprom-length property

Hariprasad Shenai (1):
  cxgb4: Fix MC1 memory offset calculation

Herbert Xu (1):
  route: Use ipv4_mtu instead of raw rt_pmtu

Ido Shamay (1):
  net/mlx4_en: Schedule napi when RX buffers allocation fails

Jon Paul Maloy (1):
  tipc: fix problem with parallel link synchronization mechanism

KY Srinivasan (1):
  hv_netvsc: Fix a bug in netvsc_start_xmit()

Karicheri, Muralidharan (1):
  net: netcp: remove call to netif_carrier_(on/off) for MAC to Phy interface

Markus Pargmann (1):
  net: fec: Fix RGMII-ID mode

Michal Schmidt (3):
  bnx2x: mark LRO as a fixed disabled feature if disable_tpa is set
  bnx2x: merge fp->disable_tpa with fp->mode
  bnx2x: remove {TPA,GRO}_ENABLE_FLAG

Nicolas Dichtel (3):
  bridge/mdb: remove wrong use of NLM_F_MULTI
  bridge/nl: remove wrong use of NLM_F_MULTI
  tipc: remove wrong use of NLM_F_MULTI

Pai (1):
  net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table

Punnaiah Choudary Kalluri (1):
  net: macb: Fix race condition in driver when Rx frame is dropped

Simon Xiao (1):
  hv_netvsc: introduce netif-msg into netvsc module

Tony Camuso (1):
  netxen_nic: use spin_[un]lock_bh around tx_clean_lock

Vlastimil Setka (1):
  altera_tse: Correct rx packet length

Yuval Mintz (1):
  bnx2x: Delay during kdump load

 drivers/net/bonding/bond_main.c| 10 
 drivers/net/ethernet/altera/altera_tse_main.c  |  6 ++
 drivers/net/ethernet/atheros/atl1e/atl1e_hw.h  |  2 +-
 drivers/net/ethernet/broadcom/bcmsysport.h |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h|  4 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 67 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h|  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   | 17 +++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c   |  2 +-
 drivers/net/ethernet/cadence/macb.c|  3 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c |  2 +-
 drivers/net/ethernet/emulex/benet/be_main.c|  5 +-
 drivers/net/ethernet/freescale/fec_main.c  |  5 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|  7 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  4 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  3 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 26 -
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  8 ++-
 drivers/net/ethernet/mellanox/mlx4/fw.c| 18 --
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  1 +
 .../net/ethernet/qlogic/netxen/netxen_nic_init.c   |  4 +-
 drivers/net/ethernet/rocker/rocker.c   |  5 +-
 drivers/net/ethernet/ti/netcp_ethss.c  |  8 ++-
 drivers/net/hyperv/hyperv_net.h 

Re: Sharing credentials in general (Re: [GIT PULL] kdbus for 4.1-rc1)

2015-05-01 Thread Andy Lutomirski
On Mon, Apr 27, 2015 at 9:33 AM, David Herrmann  wrote:
> On Mon, Apr 27, 2015 at 6:13 PM, Andy Lutomirski  wrote:
>> 2.  This is a nice thought, but it doesn't work in practice.  Sorry.
>> I can give you a big pile of CVEs from last year if you like, or I can
>> try explaining again.
>>
>> The issue boils down to what type of privileges you want to assert and
>> over what object you want to assert them.  Suppose I have a method
>> "write".  When I call it, I do Write(destination, text).  In your
>> model, it's basically never safe to do:
>
> You're correct. So don't create such APIs.
> In fact, never ever accept FDs or file-paths from a less privileged
> caller. It might be FUSE backed and under their full control.

Really?

So writing to stdout is not okay?  So what if it's on FUSE?  At worst
it should be a DoS, unless you screw up the security model (as the
kernel did) and it's privilege escalation or system state corruption.

How about privileged services that mediate interactions between
client-provided objects and other client-inaccessible things?

FWIW, all these threads seem to have spawned tons of comparisons
between dbus, COM, Binder, etc.  I don't know too much about all of
these technologies, but one difference springs to mind.  COM and
Binder both have very clear concepts of capability-passing.  COM calls
it "marshalling an interface".  Binder calls it "IBinder".  Quoting
from the reference [1]:

The data sent through transact() is a Parcel, a generic buffer of data
that also maintains some meta-data about its contents. The meta data
is used to manage IBinder object references in the buffer, so that
those references can be maintained as the buffer moves across
processes.

Wikipedia suggests CORBA has "objects by reference".  I think that
Wayland's "new_id" type is effectively an object reference.  I have no
clue what Mac OS X and iOS do.

AFAICT D-Bus is completely missing any corresponding concept.  I'm
rather confused how Samsung thinks they can cleanly layer Binder over
kdbus unless they use kdbus as a completely dumb transport and
implement object references in userspace.  If they do that, then I
don't see the point of using kdbus at all -- sockets would be fine.

One can certainly debate the merits of capability-based security, but
I've (so far) never heard anyone claim that passing object references
around is a bad idea.

Havoc, am I missing something here?  If I'm right about this aspect of
D-Bus, then I'm a bit surprised.

[1] http://developer.android.com/reference/android/os/IBinder.html

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 0/6] arm64: Add kernel probes (kprobes) support

2015-05-01 Thread William Cohen
On 04/29/2015 06:23 AM, Will Deacon wrote:
> On Tue, Apr 28, 2015 at 03:58:21AM +0100, William Cohen wrote:
>> Hi All,
> 
> Hi Will,
> 
>> I have been experimenting with the patches for arm64 kprobes support.
>> On occasion the kernel gets stuck in a loop printing output:
>>
>>  Unexpected kernel single-step exception at EL1
>>
>> This message by itself is not that enlightening.  I added the attached
>> patch to get some additional information about register state when the
>> warning is printed out.  Below is an example output:
> 
> Given that we've got the pt_regs in our hands at that point, I'm happy to
> print something more useful if you like (e.g. the PC?).
> 
>> [14613.263536] Unexpected kernel single-step exception at EL1
>> [14613.269001] kcb->ss_ctx.ss_status = 1
>> [14613.272643] kcb->ss_ctx.match_addr = fdfffc001250 0xfdfffc001250
>> [14613.279324] instruction_pointer(regs) = fe093358 el1_da+0x8/0x70
>> [14613.286003] 
>> [14613.287487] CPU: 3 PID: 621 Comm: irqbalance Tainted: G   OE   
>> 4.0.0u4+ #6
>> [14613.295019] Hardware name: AppliedMicro Mustang/Mustang, BIOS 
>> 1.1.0-rh-0.15 Mar 13 2015
>> [14613.302982] task: fe01d6806780 ti: fe01d68ac000 task.ti: 
>> fe01d68ac000
>> [14613.310430] PC is at el1_da+0x8/0x70
>> [14613.313990] LR is at trampoline_probe_handler+0x188/0x1ec
> 
>> The really odd thing is the address of the PC it is in el1_da the code
>> to handle data aborts.  it looks like it is getting the unexpected
>> single_step exception right after the enable_debug in el1_da.  I think
>> what might be happening is:
>>
>> -an instruction is instrumented with kprobe
>> -the instruction is copied to a buffer
>> -a breakpoint replaces the instruction
>> -the kprobe fires when the breakpoint is encountered
>> -the instruction in the buffer is set to single step
>> -a single step of the instruction is attempted
>> -a data abort exception is raised
>> -el1_da is called
> 
> So that's the bit that I find weird. Can you take a look at what we're doing
> in trampoline_probe_handler, please? It could be that we're doing something
> like get_user and aborting on a faulting userspace address, but I think
> kprobes should handle that rather than us trying to get the generic
> single-step code to deal with it.
> 
>> It looks like commit 1059c6bf8534acda249e7e65c81e7696fb074dc1 from Mon
>> Sep 22   "arm64: debug: don't re-enable debug exceptions on return from 
>> el1_dbg"
>> was trying to address a similar problem for the el1_dbg
>> function.  Should el1_da and other el1_* functions have the enable_dbg
>> removed?
> 
> I don't think so. The current behaviour of the low-level debug handler is to
> step into traps, which is more flexible than trying to step over them (which
> could lead to us stepping over interrupts, or preemption points). It should
> be up to the higher-level debugger (e.g. kprobes, kgdb) to distinguish
> between the traps it does and does not care about.
> 
> An equivalent userspace example would be GDB stepping into signal handlers,
> I suppose.
> 
> Will
> 

Dave Long and I did some additional experimentation to better
understand what condition causes the kernel to sometimes spew:

Unexpected kernel single-step exception at EL1

The functioncallcount.stp test instruments the entry and return of
every function in the mm files, including kfree.  In most cases the
arm64 trampoline_probe_handler just determines which return probe
instance matches the current conditions, runs the associated handler,
and recycles the return probe instance for another use by placing it
on a hlist.  However, it is possible that a return probe instance has
been set up on function entry and the return probe is unregistered
before the return probe instance fires.  In this case kfree is called
by the trampoline handler to remove the return probe instances related
to the unregistered kretprobe.  This case, where the kprobed kfree
is called within the arm64 trampoline_probe_handler function, triggers
the problem.

The kprobe breakpoint for the kfree call from within the
trampoline_probe_handler is encountered and started, but things go
wrong when attempting the single step on the instruction.

It took a while to trigger this problem with the systemtap testsuite.
Dave Long came up with steps that reproduce this more quickly with a
probed function that is always called within the trampoline handler.
Trying the same on x86_64 doesn't trigger the problem.  It appears
that the x86_64 code can handle a single step from within the
trampoline_handler.

-Will Cohen





[PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection

2015-05-01 Thread Vikas Shivappa
This patch adds support for the new Cache Allocation Technology (CAT)
feature found in future Intel Xeon processors. CAT is part of Intel
Resource Director Technology(RDT) which enables sharing of processor
resources. This patch includes CPUID enumeration routines for CAT and
new values to track CAT resources to the cpuinfo_x86 structure.

Cache Allocation Technology(CAT) provides a way for the Software
(OS/VMM) to restrict cache allocation to a defined 'subset' of cache
which may be overlapping with other 'subsets'.  This feature is used
when allocating a line in cache, i.e. when pulling new data into the cache.
The programming of the h/w is done by programming MSRs.

More information about CAT can be found in the Intel (R) x86 Architecture
Software Developer's Manual, Volume 3, section 17.15.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/cpufeature.h |  6 +-
 arch/x86/include/asm/processor.h  |  3 +++
 arch/x86/kernel/cpu/Makefile  |  1 +
 arch/x86/kernel/cpu/common.c  | 15 +
 arch/x86/kernel/cpu/intel_rdt.c   | 44 +++
 init/Kconfig  | 11 ++
 6 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 3d6606f..30cb56c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -12,7 +12,7 @@
 #include 
 #endif
 
-#define NCAPINTS   13  /* N 32-bit words worth of info */
+#define NCAPINTS   14  /* N 32-bit words worth of info */
 #define NBUGINTS   1   /* N 32-bit bug flags */
 
 /*
@@ -229,6 +229,7 @@
 #define X86_FEATURE_RTM( 9*32+11) /* Restricted Transactional 
Memory */
 #define X86_FEATURE_CQM( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX( 9*32+14) /* Memory Protection 
Extension */
+#define X86_FEATURE_RDT( 9*32+15) /* Resource Allocation */
 #define X86_FEATURE_AVX512F( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX 
instructions */
@@ -252,6 +253,9 @@
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x000F:1 (edx), word 12 */
 #define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 
*/
 
+/* Intel-defined CPU features, CPUID level 0x0010:0 (ebx), word 13 */
+#define X86_FEATURE_CAT_L3 (13*32 + 1) /* Cache QOS Enforcement L3 */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 23ba676..7d9aee2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -114,6 +114,9 @@ struct cpuinfo_x86 {
int x86_cache_occ_scale;/* scale to bytes */
int x86_power;
unsigned long   loops_per_jiffy;
+   /* Cache Allocation Technology values */
+   u16 x86_cat_cbmlength;
+   u16 x86_cat_closs;
/* cpuid returned max cores value: */
u16  x86_max_cores;
u16 apicid;
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9bff687..4ff7a1f 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)+= 
perf_event_intel_uncore.o \
   perf_event_intel_uncore_nhmex.o
 endif
 
+obj-$(CONFIG_CGROUP_RDT)   += intel_rdt.o
 
 obj-$(CONFIG_X86_MCE)  += mcheck/
 obj-$(CONFIG_MTRR) += mtrr/
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..f39d948 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -670,6 +670,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
}
}
 
+   /* Additional Intel-defined flags: level 0x0010 */
+   if (c->cpuid_level >= 0x0010) {
+   u32 eax, ebx, ecx, edx;
+
+   cpuid_count(0x0010, 0, &eax, &ebx, &ecx, &edx);
+   c->x86_capability[13] = ebx;
+
+   if (cpu_has(c, X86_FEATURE_CAT_L3)) {
+
+   cpuid_count(0x0010, 1, &eax, &ebx, &ecx, &edx);
+   c->x86_cat_closs = edx + 1;
+   c->x86_cat_cbmlength = eax + 1;
+   }
+   }
+
	/* AMD-defined flags: level 0x80000001 */
	xlvl = cpuid_eax(0x80000000);
c->extended_cpuid_level = xlvl;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 000..901b6fa
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,44 @@
+/*
+ * Resource Director Technology(RDT)/Cache quality of service code.
+ *
+ * Copyright (C) 

[PATCH 2/7] x86/intel_rdt: Adds support for Class of service management

2015-05-01 Thread Vikas Shivappa
This patch adds a cgroup subsystem to support Intel Resource Director
Technology(RDT) or Platform Shared resources Control. The resources that
are currently supported for sharing is L3 cache
(Cache Allocation Technology or CAT).
When a RDT cgroup is created it has a CLOSid and CBM associated with it
which are inherited from its parent. A Class of service(CLOS) in Cache
Allocation is represented by a CLOSid. CLOSid is internal to the kernel
and not exposed to user. Cache bitmask(CBM) represents one global cache
'subset'. Tasks belonging to a cgroup would get to fill the L3 cache
represented by the CBM. Root cgroup would have all available bits set
for its CBM and would be assigned the CLOSid 0.

CLOSid allocation is tracked using a separate bitmap. The maximum number
of CLOSids is specified by the h/w during CPUID enumeration and the
kernel simply throws an -ENOSPC when it runs out of CLOSids.

Each CBM has an associated CLOSid. If multiple cgroups have the same CBM
they would also have the same CLOSid. The reference count parameter in
CLOSid-CBM map keeps track of how many cgroups are using each
CLOSid<->CBM mapping.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |  38 +++
 arch/x86/kernel/cpu/intel_rdt.c  | 100 ++-
 include/linux/cgroup_subsys.h|   4 ++
 3 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_rdt.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 000..87af1a5
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,38 @@
+#ifndef _RDT_H_
+#define _RDT_H_
+
+#ifdef CONFIG_CGROUP_RDT
+
+#include 
+
+struct rdt_subsys_info {
+   /* Clos Bitmap to keep track of available CLOSids.*/
+   unsigned long *closmap;
+};
+
+struct intel_rdt {
+   struct cgroup_subsys_state css;
+   /* Class of service for the cgroup.*/
+   unsigned int clos;
+};
+
+struct clos_cbm_map {
+   unsigned long cbm;
+   unsigned int cgrp_count;
+};
+
+/*
+ * Return rdt group corresponding to this container.
+ */
+static inline struct intel_rdt *css_rdt(struct cgroup_subsys_state *css)
+{
+   return css ? container_of(css, struct intel_rdt, css) : NULL;
+}
+
+static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
+{
+   return css_rdt(ir->css.parent);
+}
+
+#endif
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 901b6fa..eec57fe 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,17 +24,97 @@
 #include 
 #include 
 #include 
+#include 
+
+/*
+ * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ */
+static struct clos_cbm_map *ccmap;
+static struct rdt_subsys_info rdtss_info;
+static DEFINE_MUTEX(rdt_group_mutex);
+struct intel_rdt rdt_root_group;
+
+static inline bool cat_supported(struct cpuinfo_x86 *c)
+{
+   if (cpu_has(c, X86_FEATURE_CAT_L3))
+   return true;
+
+   return false;
+}
+
+/*
+* Called with the rdt_group_mutex held.
+*/
+static int rdt_free_closid(struct intel_rdt *ir)
+{
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   WARN_ON(!ccmap[ir->clos].cgrp_count);
+   ccmap[ir->clos].cgrp_count--;
+   if (!ccmap[ir->clos].cgrp_count)
+   clear_bit(ir->clos, rdtss_info.closmap);
+
+   return 0;
+}
+
+static struct cgroup_subsys_state *
+rdt_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+   struct intel_rdt *parent = css_rdt(parent_css);
+   struct intel_rdt *ir;
+
+   /*
+* Cannot return failure on systems with no Cache Allocation
+* as the cgroup_init does not handle failures gracefully.
+*/
+   if (!parent)
+   return &rdt_root_group.css;
+
+   ir = kzalloc(sizeof(struct intel_rdt), GFP_KERNEL);
+   if (!ir)
+   return ERR_PTR(-ENOMEM);
+
+   mutex_lock(&rdt_group_mutex);
+   ir->clos = parent->clos;
+   ccmap[parent->clos].cgrp_count++;
+   mutex_unlock(&rdt_group_mutex);
+
+   return &ir->css;
+}
 
 static int __init rdt_late_init(void)
 {
	struct cpuinfo_x86 *c = &boot_cpu_data;
+   static struct clos_cbm_map *ccm;
+   size_t sizeb;
int maxid, cbm_len;
 
-   if (!cpu_has(c, X86_FEATURE_CAT_L3))
+   if (!cat_supported(c)) {
+   rdt_root_group.css.ss->disabled = 1;
return -ENODEV;
-
+   }
maxid = c->x86_cat_closs;
cbm_len = c->x86_cat_cbmlength;
+   sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
+
+   rdtss_info.closmap = kzalloc(sizeb, GFP_KERNEL);
+   if (!rdtss_info.closmap)
+   return -ENOMEM;
+
+   sizeb = maxid * sizeof(struct clos_cbm_map);
+   ccmap = kzalloc(sizeb, GFP_KERNEL);
+   if (!ccmap) {
+   kfree(rdtss_info.closmap);
+   return -ENOMEM;
+   }
+
+   set_bit(0, rdtss_info.closmap);
+   rdt_root_group.clos = 0;
+

[PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR

2015-05-01 Thread Vikas Shivappa
This patch implements a common software cache for IA32_PQR_MSR (RMID 0:9,
CLOSid 32:63) to be used by both CMT and CAT. CMT updates the RMID,
whereas CAT updates the CLOSid in the software cache. When the new
RMID/CLOSid value is different from the cached values, IA32_PQR_MSR is
updated. Since the measured rdmsr latency for IA32_PQR_MSR is very
high (~250 cycles), this software cache is necessary to avoid reading the
MSR to compare the current CLOSid value.
During CPU hotplug the pqr cache is updated to zero.

Signed-off-by: Vikas Shivappa 

Conflicts:
arch/x86/kernel/cpu/perf_event_intel_cqm.c
---
 arch/x86/include/asm/intel_rdt.h   | 31 +++---
 arch/x86/include/asm/rdt_common.h  | 13 +
 arch/x86/kernel/cpu/intel_rdt.c|  3 +++
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 20 +++
 4 files changed, 39 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/include/asm/rdt_common.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 2fc496f..6aae109 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,12 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include 
+#include 
 
-#define MSR_IA32_PQR_ASSOC 0xc8f
 #define MAX_CBM_LENGTH 32
 #define IA32_L3_CBM_BASE   0xc90
 #define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
-DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
 extern struct static_key rdt_enable_key;
 
 struct rdt_subsys_info {
@@ -61,30 +62,30 @@ static inline struct intel_rdt *task_rdt(struct task_struct 
*task)
 static inline void rdt_sched_in(struct task_struct *task)
 {
struct intel_rdt *ir;
-   unsigned int clos;
+   struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+   unsigned long flags;
 
if (!rdt_enabled())
return;
 
-   /*
-* This needs to be fixed
-* to cache the whole PQR instead of just CLOSid.
-* PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
-* Should not write a 0 to the low 10 bits of PQR
-* and corrupt RMID.
-*/
-   clos = this_cpu_read(x86_cpu_clos);
-
+   raw_spin_lock_irqsave(&state->lock, flags);
rcu_read_lock();
ir = task_rdt(task);
-   if (ir->clos == clos) {
+   if (ir->clos == state->clos) {
rcu_read_unlock();
+   raw_spin_unlock_irqrestore(&state->lock, flags);
return;
}
 
-   wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
-   this_cpu_write(x86_cpu_clos, ir->clos);
+   /*
+* PQR has closid in high 32 bits and CQM-RMID
+* in low 10 bits. Rewrite the existing rmid from
+* software cache.
+*/
+   wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, ir->clos);
+   state->clos = ir->clos;
rcu_read_unlock();
+   raw_spin_unlock_irqrestore(&state->lock, flags);
 }
 
 #else
diff --git a/arch/x86/include/asm/rdt_common.h 
b/arch/x86/include/asm/rdt_common.h
new file mode 100644
index 000..33fd8ea
--- /dev/null
+++ b/arch/x86/include/asm/rdt_common.h
@@ -0,0 +1,13 @@
+#ifndef _X86_RDT_H_
+#define _X86_RDT_H_
+
+#define MSR_IA32_PQR_ASSOC 0x0c8f
+
+struct intel_pqr_state {
+   raw_spinlock_t  lock;
+   int rmid;
+   int clos;
+   int cnt;
+};
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 74b1e28..9da61b2 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -347,6 +347,9 @@ static inline bool rdt_update_cpumask(int cpu)
  */
 static inline void rdt_cpu_start(int cpu)
 {
+   struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
+
+   state->clos = 0;
	mutex_lock(&rdt_group_mutex);
if (rdt_update_cpumask(cpu))
cbm_update_msrs(cpu);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c 
b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index e4d1b8b..fd039899 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -7,22 +7,16 @@
 #include 
 #include 
 #include 
+#include 
 #include "perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC 0x0c8f
 #define MSR_IA32_QM_CTR0x0c8e
 #define MSR_IA32_QM_EVTSEL 0x0c8d
 
 static unsigned int cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 
-struct intel_cqm_state {
-   raw_spinlock_t  lock;
-   int rmid;
-   int cnt;
-};
-
-static DEFINE_PER_CPU(struct intel_cqm_state, cqm_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 
 /*
  * Protects cache_cgroups and cqm_rmid_free_lru and cqm_rmid_limbo_lru.
@@ -961,7 +955,7 @@ out:
 
 static void intel_cqm_event_start(struct perf_event *event, int mode)
 {
-   struct intel_cqm_state *state 

[PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide

2015-05-01 Thread Vikas Shivappa
Adds a description of Cache allocation technology, overview
of kernel implementation and usage of CAT cgroup interface.

Signed-off-by: Vikas Shivappa 
---
 Documentation/cgroups/rdt.txt | 180 ++
 1 file changed, 180 insertions(+)
 create mode 100644 Documentation/cgroups/rdt.txt

diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt
new file mode 100644
index 000..6051b73
--- /dev/null
+++ b/Documentation/cgroups/rdt.txt
@@ -0,0 +1,180 @@
+RDT
+---
+
+Copyright (C) 2014 Intel Corporation
+Written by vikas.shiva...@linux.intel.com
+(based on contents and format from cpusets.txt)
+
+CONTENTS:
+=
+
+1. Cache Allocation Technology
+  1.1 What is RDT and CAT?
+  1.2 Why is CAT needed?
+  1.3 CAT implementation overview
+  1.4 Assignment of CBM and CLOS
+  1.5 Scheduling and Context Switch
+2. Usage Examples and Syntax
+
+1. Cache Allocation Technology(CAT)
+===
+
+1.1 What is RDT and CAT
+---
+
+CAT is a part of Resource Director Technology(RDT) or Platform Shared
+resource control which provides support to control Platform shared
+resources like cache. Currently Cache is the only resource that is
+supported in RDT.
+More information can be found in the Intel SDM section 17.15.
+
+Cache Allocation Technology provides a way for the Software (OS/VMM)
+to restrict cache allocation to a defined 'subset' of cache which may
+be overlapping with other 'subsets'.  This feature is used when
+allocating a line in cache, i.e. when pulling new data into the cache.
+The programming of the h/w is done by programming MSRs.
+
+The different cache subsets are identified by CLOS identifier (class
+of service) and each CLOS has a CBM (cache bit mask).  The CBM is a
+contiguous set of bits which defines the amount of cache resource that
+is available for each 'subset'.
+
+1.2 Why is CAT needed
+-
+
+CAT enables more cache resources to be made available for higher
+priority applications based on guidance from the execution
+environment.
+
+The architecture also allows dynamically changing these subsets during
+runtime to further optimize the performance of the higher priority
+application with minimal degradation to the low priority app.
+Additionally, resources can be rebalanced for system throughput
+benefit.  (Refer to Section 17.15 in the Intel SDM)
+
+This technique may be useful in managing large computer systems with
+large LLC. Examples may be large servers running instances of
+webservers or database servers. In such complex systems, these subsets
+can be used for more careful placing of the available cache
+resources.
+
+1.3 CAT implementation Overview
+---
+
+Kernel implements a cgroup subsystem to support cache allocation.
+
+Each cgroup has a CLOSid <-> CBM(cache bit mask) mapping.
+A CLOS (Class of Service) is represented by a CLOSid. CLOSid is internal
+to the kernel and not exposed to the user.  Each cgroup would have one CBM
+and would just represent one cache 'subset'.
+
+The cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
+cgroup never fails.  When a child cgroup is created it inherits the
+CLOSid and the CBM from its parent.  When a user changes the default
+CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
+used before.  The changing of 'cache_mask' may fail with -ENOSPC once the
+kernel runs out of maximum CLOSids it can support.
+User can create as many cgroups as he wants but having different CBMs
+at the same time is restricted by the maximum number of CLOSids
+(multiple cgroups can have the same CBM).
+Kernel maintains a CLOSid<->cbm mapping which keeps reference counter
+for each cgroup using a CLOSid.
+
+The tasks in the cgroup would get to fill the LLC cache represented by
+the cgroup's 'cache_mask' file.
+
+Root directory would have all available bits set in 'cache_mask' file by
+default.
+
+1.4 Assignment of CBM,CLOS
+--
+
+The 'cache_mask' needs to be a subset of the parent node's 'cache_mask'.
+Any contiguous subset of these bits (with a minimum of 2 bits) may be
+set to indicate the cache mapping desired.  The 'cache_mask' between 2
+directories can overlap. The 'cache_mask' would represent the cache 'subset'
+of the CAT cgroup.  For ex: on a system with 16 bits of max cbm bits,
+if the directory has the least significant 4 bits set in its 'cache_mask'
+file(meaning the 'cache_mask' is just 0xf), it would be allocated the right
+quarter of the Last level cache which means the tasks belonging to
+this CAT cgroup can use the right quarter of the cache to fill. If it
+has the most significant 8 bits set, it would be allocated the left
+half of the cache (8 bits out of 16 represents 50%).
+
+The cache portion defined in the CBM file is available to all tasks
+within the cgroup to fill and these task are not allowed to allocate
+space in other parts of the 

[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-01 Thread Vikas Shivappa
Add support for cache bit mask manipulation. The change adds a file
cache_mask to the RDT cgroup which represents the CBM(cache bit mask)
  for the cgroup.

Update to the CBM is done by writing to the IA32_L3_MASK_n.
The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
cgroup never fails.  When a child cgroup is created it inherits the
CLOSid and the cache_mask from its parent.  When a user changes the
default CBM for a cgroup, a new CLOSid may be allocated if the
cache_mask was not used before. If the new CBM is the one that is
already used, the count for that CLOSid<->CBM is incremented. The
changing of 'cbm' may fail with -ENOSPC once the kernel runs out of
maximum CLOSids it can support.
User can create as many cgroups as he wants but having different CBMs
at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
Kernel maintains a CLOSid<->cbm mapping which keeps count
of cgroups using a CLOSid.

The tasks in the CAT cgroup would get to fill the L3 cache represented
by the cgroup's cache_mask file.

Reuse of CLOSids for cgroups with the same bitmask also has the following
advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, write to PQR-MSR is done
only when a task with a different bitmask is scheduled in.

During cpu bringup due to a hotplug event, IA32_L3_MASK_n MSR is
synchronized from the clos cbm map if it is used by any cgroup for the
package.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |   7 +-
 arch/x86/kernel/cpu/intel_rdt.c  | 364 ---
 2 files changed, 346 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..9e9dbbe 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
@@ -17,8 +20,8 @@ struct intel_rdt {
 };
 
 struct clos_cbm_map {
-   unsigned long cbm;
-   unsigned int cgrp_count;
+   unsigned long cache_mask;
+   unsigned int clos_refcnt;
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index eec57fe..58b39d6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,16 +24,25 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
- * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ * ccmap maintains 1:1 mapping between CLOSid and cache_mask.
  */
 static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+/*
+ * Mask of CPUs for writing CBM values. We only need one CPU per socket.
+ */
+static cpumask_t rdt_cpumask;
+
+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -42,22 +51,66 @@ static inline bool cat_supported(struct cpuinfo_x86 *c)
return false;
 }
 
+static void __clos_init(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   ccm->clos_refcnt = 1;
+}
+
 /*
-* Called with the rdt_group_mutex held.
-*/
-static int rdt_free_closid(struct intel_rdt *ir)
+ * Allocates a new closid from unused closids.
+ */
+static int rdt_alloc_closid(struct intel_rdt *ir)
 {
+   unsigned int id;
+   unsigned int maxid;
 
lockdep_assert_held(&rdt_group_mutex);
 
-   WARN_ON(!ccmap[ir->clos].cgrp_count);
-   ccmap[ir->clos].cgrp_count--;
-   if (!ccmap[ir->clos].cgrp_count)
-   clear_bit(ir->clos, rdtss_info.closmap);
+   maxid = boot_cpu_data.x86_cat_closs;
+   id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, rdtss_info.closmap);
+   __clos_init(id);
+   ir->clos = id;
 
return 0;
 }
 
+static void rdt_free_closid(unsigned int clos)
+{
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   clear_bit(clos, rdtss_info.closmap);
+}
+
+static void __clos_get(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   ccm->clos_refcnt += 1;
+}
+
+static void __clos_put(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+   WARN_ON(!ccm->clos_refcnt);
+
+   ccm->clos_refcnt -= 1;
+   if (!ccm->clos_refcnt)
+   rdt_free_closid(closid);
+}
+
 static struct cgroup_subsys_state *
 rdt_css_alloc(struct 

[PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration

2015-05-01 Thread Vikas Shivappa
CAT (Cache Allocation Technology) on hsw needs to be enumerated
separately, as CAT is only supported on certain HSW SKUs. This patch
probes for the feature on hsw CPUs by writing a CLOSid into the high 32
bits of IA32_PQR_MSR and checking whether the bits stick. The probe is
only done after confirming that the CPU is HSW.
HSW also requires the L3 cache bitmask to be at least two bits.
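The write-and-read-back probe can be modeled in userspace (a sketch under stated assumptions: `fake_msr` simulates a register whose writable bits depend on the feature; the real code uses rdmsr_safe()/wrmsr_safe() on IA32_PQR_MSR):

```c
#include <stdint.h>

/* A fake MSR: hardware ignores writes to bits outside 'writable'. */
struct fake_msr {
	uint64_t value;
	uint64_t writable;
};

static void msr_write(struct fake_msr *m, uint64_t v)
{
	m->value = (m->value & ~m->writable) | (v & m->writable);
}

/* Toggle bit 0 of the high-32-bit CLOSid field, write it, read it back,
 * and report whether the bit stuck; restore the old value either way.
 * Returns 1 when the feature is present in this model, else 0. */
int closid_bit_sticks(struct fake_msr *m)
{
	uint64_t old = m->value;
	uint64_t probe = old ^ (1ULL << 32);
	int stuck;

	msr_write(m, probe);
	stuck = (m->value == probe);
	msr_write(m, old);	/* leave the register as we found it */
	return stuck;
}
```

When the high half is read-only (no CAT), the probed bit falls back to zero and the probe reports absence.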

Signed-off-by: Vikas Shivappa 
---
 arch/x86/kernel/cpu/intel_rdt.c | 56 ++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 9da61b2..4c12e5b 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -38,6 +38,11 @@ struct static_key __read_mostly rdt_enable_key = 
STATIC_KEY_INIT_FALSE;
 DEFINE_PER_CPU(unsigned int, x86_cpu_clos);
 
 /*
+ * Minimum bits required in Cache bitmasks.
+ */
+static unsigned int min_bitmask_len = 1;
+
+/*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
  */
 static cpumask_t rdt_cpumask;
@@ -45,11 +50,54 @@ static cpumask_t rdt_cpumask;
 #define rdt_for_each_child(pos_css, parent_ir) \
css_for_each_child((pos_css), &(parent_ir)->css)
 
+/*
+ * hsw_probetest() - Probe test for Intel Haswell
+ * CPUs, which do not have CPUID enumeration
+ * support for CAT.
+ *
+ * Probes by writing to the high 32 bits (CLOSid)
+ * of IA32_PQR_MSR and testing whether the bits stick.
+ * Then hardcodes the max CLOS and max bitmask length on hsw.
+ * The minimum cache bitmask length allowed for HSW is 2 bits.
+ */
+static inline bool hsw_probetest(void)
+{
+   u32 l, h_old, h_new, h_tmp;
+
+   if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
+   return false;
+
+   /*
+* Default value is always 0 if feature is present.
+*/
+   h_tmp = h_old ^ 0x1U;
+   if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp) ||
+   rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_new))
+   return false;
+
+   if (h_tmp != h_new)
+   return false;
+
+   wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old);
+
+   boot_cpu_data.x86_cat_closs = 4;
+   boot_cpu_data.x86_cat_cbmlength = 20;
+   min_bitmask_len = 2;
+
+   return true;
+}
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
return true;
 
+   /*
+* Probe test for Haswell CPUs.
+*/
+   if (c->x86 == 0x6 && c->x86_model == 0x3f)
+   return hsw_probetest();
+
return false;
 }
 
@@ -153,7 +201,7 @@ static inline bool cbm_is_contiguous(unsigned long var)
unsigned long first_bit, zero_bit;
unsigned long maxcbm = MAX_CBM_LENGTH;
 
-   if (!var)
+   if (bitmap_weight(&var, maxcbm) < min_bitmask_len)
return false;
 
first_bit = find_next_bit(&var, maxcbm, 0);
@@ -180,7 +228,8 @@ static int validate_cbm(struct intel_rdt *ir, unsigned long 
cbmvalue)
unsigned long *cbm_tmp;
 
if (!cbm_is_contiguous(cbmvalue)) {
-   pr_err("bitmask should have >= 1 bits and be contiguous\n");
+   pr_err("bitmask should have >=%d bits and be contiguous\n",
+min_bitmask_len);
return -EINVAL;
}
 
@@ -236,7 +285,8 @@ static void __cpu_cbm_update(void *info)
 }
 
 /*
- * cbm_update_all() - Update the cache bit mask for all packages.
+ * cbm_update_all() - Update the cache bit mask for
+ * all packages.
  */
 static inline void cbm_update_all(unsigned int closid)
 {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology

2015-05-01 Thread Vikas Shivappa
This patch series adds a cgroup subsystem to support the new Cache
Allocation Technology (CAT) feature found in future Intel Xeon processors.
CAT is part of Resource Director Technology (RDT), or Platform Shared
Resource Control, which provides support for controlling the sharing of
platform resources such as the L3 cache.

Cache Allocation Technology provides a way for software (OS/VMM) to
restrict cache allocation to a defined 'subset' of the cache, which may
overlap with other 'subsets'. The feature takes effect when allocating a
line in the cache, i.e. when pulling new data into the cache. The
hardware is programmed via MSRs. This patch series adds support for
performing L3 cache allocation.

In today's processors the number of cores is continuously increasing,
which in turn increases the number of threads or workloads that can run
simultaneously. When multi-threaded applications run concurrently, they
compete for shared resources, including the L3 cache. At times this L3
cache contention may result in inefficient space utilization: for example,
a higher-priority thread may end up with less L3 cache, or a
cache-sensitive application may not get optimal cache occupancy, thereby
degrading performance. The CAT kernel patches provide a framework for
sharing the L3 cache so that users can allocate the resource according
to their requirements.

More information about the feature can be found in the Intel SDM, Volume 3,
section 17.15. The SDM does not use the 'RDT' term yet; this is planned to
change at a later time.

*All the patches will apply on 4.1-rc0*.
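The cache 'subsets' described above are contiguous runs of bits in a cache bit mask (CBM), which may overlap between groups. A hypothetical stand-alone helper sketching that validation rule (illustrative only, not the kernel's cbm_is_contiguous(); min_bits is 2 on HSW, 1 elsewhere):

```c
#include <stdbool.h>

/* A CBM is valid when it is non-empty, its set bits form one contiguous
 * run within cbm_len, and the run meets the minimum length. */
bool cbm_is_valid(unsigned long cbm, unsigned int min_bits, unsigned int cbm_len)
{
	unsigned int first = 0, count = 0;

	if (!cbm)
		return false;
	while (first < cbm_len && !(cbm & (1UL << first)))
		first++;			/* skip to the first set bit */
	while (first + count < cbm_len && (cbm & (1UL << (first + count))))
		count++;			/* length of the contiguous run */
	/* valid iff the run covers every set bit and is long enough */
	return count >= min_bits && (cbm >> first) == (1UL << count) - 1;
}
```

For example 0xf0 (ways 4-7) is a valid subset, while 0xf0f has a hole and is rejected.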

Changes in V6:
- Rebased to 4.1-rc1, which has the CMT (cache monitoring) support included.
- Fixed support for hot cpu handling for the IA32_L3_QOS MSRs (thanks to
  Marcelo's feedback). Although the MSRs need not be restored during deep
  C states, this is needed when a new package is physically added.
- Coding convention changes, including renaming to cache_mask and using a
  refcnt to track the number of cgroups using a closid in the clos_cbm map.
- 1b cbm support for non-hsw SKUs. HSW is an exception which needs the
  cache bit masks to be at least 2 bits.

Changes in v5:
- Added support to propagate the cache bit mask update for each 
package.
- Removed the cache bit mask reference in the intel_rdt structure as
  there was no need for that and we already maintain a separate
  closid<->cbm mapping.
- Made a few coding convention changes which include adding the 
assertion while freeing the CLOSID.

Changes in V4:
- Integrated with the latest V5 CMT patches.
- Changed naming of cgroup to rdt(resource director technology) from
  cat(cache allocation technology). This was done as the RDT is the
  umbrella term for platform shared resources allocation. Hence in
  future it would be easier to add resource allocation to the same 
  cgroup
- Naming changes also applied to a lot of other data structures/APIs.
- Added documentation on cgroup usage for cache allocation to address
  a lot of questions from academia and industry regarding cache
  allocation usage.

Changes in V3:
- Implements a common software cache for IA32_PQR_MSR
- Implements support for hsw CAT enumeration. This does not use brand 
strings like the earlier version, but does a probe test. The probe test is 
done only on the hsw family of processors
- Made a few coding convention, name changes
- Check for lock being held when ClosID manipulation happens

Changes in V2:
- Removed HSW specific enumeration changes. Plan to include it later as a
  separate patch.  
- Fixed the code in prep_arch_switch to be specific for x86 and removed
  x86 defines.
- Fixed cbm_write to not write all 1s when a cgroup is freed.
- Fixed one possible memory leak in init.  
- Changed some of manual bitmap
  manipulation to use the predefined bitmap APIs to make code more readable
- Changed name in sources from cqe to cat
- Global cat enable flag changed to static_key and disabled cgroup early_init
  
[PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection
[PATCH 2/7] x86/intel_rdt: Adds support for Class of service
[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
[PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR
[PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration
[PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide


[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT

2015-05-01 Thread Vikas Shivappa
Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.

The high 32 bits of the per-processor IA32_PQR_ASSOC MSR hold the
CLOSid. During a context switch, the kernel writes the CLOSid of the
cgroup to which the incoming task belongs into the CPU's
IA32_PQR_ASSOC MSR.

For Cache Allocation, this lets the task fill the cache 'subset'
represented by the cgroup's Cache bit mask (CBM).
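The "write only on change" behavior of rdt_sched_in() can be sketched in userspace (a model, not kernel code: `wrmsr_pqr` stands in for wrmsr(MSR_IA32_PQR_ASSOC, ...), and `msr_writes` counts writes so the saving is observable):

```c
#include <stdint.h>

/* Per-CPU cache of the last CLOSid written, as in x86_cpu_clos. */
static uint32_t cpu_clos_cache;
static unsigned int msr_writes;

static void wrmsr_pqr(uint32_t closid)
{
	msr_writes++;		/* the expensive MSR write */
	cpu_clos_cache = closid;
}

/* Called on sched-in: skip the MSR write when the incoming task's
 * CLOSid matches what the CPU already has. */
void sched_in(uint32_t task_closid)
{
	if (task_closid == cpu_clos_cache)
		return;
	wrmsr_pqr(task_closid);
}

unsigned int pqr_write_count(void)
{
	return msr_writes;
}
```

Switching between tasks in cgroups that share a CLOSid therefore costs nothing at the MSR level.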

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h | 54 
 arch/x86/include/asm/switch_to.h |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c  |  3 +++
 kernel/sched/core.c  |  1 +
 kernel/sched/sched.h |  3 +++
 5 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9e9dbbe..2fc496f 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,9 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+
+#define MSR_IA32_PQR_ASSOC 0xc8f
 #define MAX_CBM_LENGTH 32
 #define IA32_L3_CBM_BASE   0xc90
 #define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
+DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+extern struct static_key rdt_enable_key;
 
 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
@@ -24,6 +28,11 @@ struct clos_cbm_map {
unsigned int clos_refcnt;
 };
 
+static inline bool rdt_enabled(void)
+{
+   return static_key_false(&rdt_enable_key);
+}
+
 /*
  * Return rdt group corresponding to this container.
  */
@@ -37,5 +46,50 @@ static inline struct intel_rdt *parent_rdt(struct intel_rdt 
*ir)
return css_rdt(ir->css.parent);
 }
 
+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+   return css_rdt(task_css(task, rdt_cgrp_id));
+}
+
+/*
+ * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ * if the current Closid is different than the new one.
+ */
+static inline void rdt_sched_in(struct task_struct *task)
+{
+   struct intel_rdt *ir;
+   unsigned int clos;
+
+   if (!rdt_enabled())
+   return;
+
+   /*
+* This needs to be fixed
+* to cache the whole PQR instead of just CLOSid.
+* PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
+* Should not write a 0 to the low 10 bits of PQR
+* and corrupt RMID.
+*/
+   clos = this_cpu_read(x86_cpu_clos);
+
+   rcu_read_lock();
+   ir = task_rdt(task);
+   if (ir->clos == clos) {
+   rcu_read_unlock();
+   return;
+   }
+
+   wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
+   this_cpu_write(x86_cpu_clos, ir->clos);
+   rcu_read_unlock();
+}
+
+#else
+
+static inline void rdt_sched_in(struct task_struct *task) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 751bf4b..82ef4b3 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,9 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
  struct tss_struct *tss);
 
+#include <asm/intel_rdt.h>
+#define post_arch_switch(current)  rdt_sched_in(current)
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 58b39d6..74b1e28 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,6 +34,8 @@ static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+DEFINE_PER_CPU(unsigned int, x86_cpu_clos);
 
 /*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
@@ -433,6 +435,7 @@ static int __init rdt_late_init(void)
__hotcpu_notifier(rdt_cpu_notifier, 0);
 
cpu_notifier_register_done();
+   static_key_slow_inc(&rdt_enable_key);
pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);
 
return 0;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..cacb490 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct 
*prev)
prev_state = prev->state;
vtime_task_switch(prev);
finish_arch_switch(prev);
+   post_arch_switch(current);
perf_event_task_sched_in(prev, current);
finish_lock_switch(rq, prev);
finish_arch_post_lock_switch();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..9153747 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1045,6 +1045,9 @@ static inline int task_on_rq_migrating(struct task_struct 
*p)
 #ifndef finish_arch_switch
 # define finish_arch_switch(prev) 

[PATCH CORRECT 03/10] parse_integer: convert sscanf()

2015-05-01 Thread Alexey Dobriyan
Rewrite kstrto*() functions through parse_integer().

_kstrtoul() and _kstrtol() are removed because parse_integer()
can dispatch based on BITS_PER_LONG, saving a function call.

Also move function definitions and comment one instance.
Remove redundant boilerplate comments from elsewhere.

High bit base hack suggested by Andrew M.

Signed-off-by: Alexey Dobriyan 
---

I copied patch twice, lol.

 include/linux/kernel.h|  124 ---
 include/linux/parse-integer.h |  109 
 lib/kstrtox.c |  222 --
 lib/parse-integer.c   |   38 ++-
 4 files changed, 143 insertions(+), 350 deletions(-)

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -264,130 +264,6 @@ void do_exit(long error_code)
 void complete_and_exit(struct completion *, long)
__noreturn;
 
-/* Internal, do not use. */
-int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long 
*res);
-int __must_check _kstrtol(const char *s, unsigned int base, long *res);
-
-int __must_check kstrtoull(const char *s, unsigned int base, unsigned long 
long *res);
-int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
-
-/**
- * kstrtoul - convert a string to an unsigned long
- * @s: The start of the string. The string must be null-terminated, and may 
also
- *  include a single newline before its terminating null. The first character
- *  may also be a plus sign, but not a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- *  given as 0, then the base of the string is automatically detected with the
- *  conventional semantics - If it begins with 0x the number will be parsed as 
a
- *  hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- *  parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Used as a replacement for the obsolete simple_strtoull. Return code must
- * be checked.
-*/
-static inline int __must_check kstrtoul(const char *s, unsigned int base, 
unsigned long *res)
-{
-   /*
-* We want to shortcut function call, but
-* __builtin_types_compatible_p(unsigned long, unsigned long long) = 0.
-*/
-   if (sizeof(unsigned long) == sizeof(unsigned long long) &&
-   __alignof__(unsigned long) == __alignof__(unsigned long long))
-   return kstrtoull(s, base, (unsigned long long *)res);
-   else
-   return _kstrtoul(s, base, res);
-}
-
-/**
- * kstrtol - convert a string to a long
- * @s: The start of the string. The string must be null-terminated, and may 
also
- *  include a single newline before its terminating null. The first character
- *  may also be a plus sign or a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- *  given as 0, then the base of the string is automatically detected with the
- *  conventional semantics - If it begins with 0x the number will be parsed as 
a
- *  hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- *  parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Used as a replacement for the obsolete simple_strtoull. Return code must
- * be checked.
- */
-static inline int __must_check kstrtol(const char *s, unsigned int base, long 
*res)
-{
-   /*
-* We want to shortcut function call, but
-* __builtin_types_compatible_p(long, long long) = 0.
-*/
-   if (sizeof(long) == sizeof(long long) &&
-   __alignof__(long) == __alignof__(long long))
-   return kstrtoll(s, base, (long long *)res);
-   else
-   return _kstrtol(s, base, res);
-}
-
-int __must_check kstrtouint(const char *s, unsigned int base, unsigned int 
*res);
-int __must_check kstrtoint(const char *s, unsigned int base, int *res);
-
-static inline int __must_check kstrtou64(const char *s, unsigned int base, u64 
*res)
-{
-   return kstrtoull(s, base, res);
-}
-
-static inline int __must_check kstrtos64(const char *s, unsigned int base, s64 
*res)
-{
-   return kstrtoll(s, base, res);
-}
-
-static inline int __must_check kstrtou32(const char *s, unsigned int base, u32 
*res)
-{
-   return kstrtouint(s, base, res);
-}
-
-static inline int __must_check kstrtos32(const char *s, unsigned int base, s32 
*res)
-{
-   return kstrtoint(s, base, res);
-}
-
-int __must_check kstrtou16(const char *s, unsigned int base, u16 *res);
-int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
-int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
-int __must_check kstrtos8(const char *s, 

Re: Tux3 Report: How fast can we fsync?

2015-05-01 Thread David Lang

On Fri, 1 May 2015, Daniel Phillips wrote:


On Friday, May 1, 2015 8:38:55 AM PDT, Dave Chinner wrote:


Well, yes - I never claimed XFS is a general purpose filesystem.  It
is a high performance filesystem. It is also becoming more relevant
to general purpose systems as low cost storage gains capabilities
that used to be considered the domain of high performance storage...


OK. Well, Tux3 is general purpose and that means we care about single
spinning disk and small systems.


keep in mind that if you optimize only for the small systems you may not scale 
as well to the larger ones.



So, to demonstrate, I'll run the same tests but using a 256GB
samsung 840 EVO SSD and show how much the picture changes.


I will go you one better, I ran a series of fsync tests using
tmpfs, and I now have a very clear picture of how the picture
changes. The executive summary is: Tux3 is still way faster, and
still scales way better to large numbers of tasks. I have every
confidence that the same is true of SSD.


/dev/ramX can't be compared to an SSD.  Yes, they both have low
seek/IO latency but they have very different dispatch and IO
concurrency models.  One is synchronous, the other is fully
asynchronous.


I had ram available and no SSD handy to abuse. I was interested in
measuring the filesystem overhead with the device factored out. I
mounted loopback on a tmpfs file, which seems to be about the same as
/dev/ram, maybe slightly faster, but much easier to configure. I ran
some tests on a ramdisk just now and was mortified to find that I have
to reboot to empty the disk. It would take a compelling reason before
I do that again.


This is an important distinction, as we'll see later on


I regard it as predictive of Tux3 performance on NVM.


per the ramdisk, possibly not as relevant as you may think. This is why it's 
good to test on as many different systems as you can. As you run into different 
types of performance you can then pick ones to keep and test all the time.


Single spinning disk is interesting now, but will be less interesting later. 
multiple spinning disks in an array of some sort is going to remain very 
interesting for quite a while.


now, some things take a lot more work to test than others. Getting time on a 
system with a high performance, high capacity RAID is hard, but getting hold of 
an SSD from Fry's is much easier. If it's a budget item, ping me directly and I 
can donate one for testing (the cost of a drive is within my unallocated budget 
and using that to improve Linux is worthwhile)



Running the same thing on tmpfs, Tux3 is significantly faster:

Ext4:   1.40s
XFS:1.10s
Btrfs:  1.56s
Tux3:   1.07s


3% is not "significantly faster". It's within run to run variation!


You are right, XFS and Tux3 are within experimental error for single
syncs on the ram disk, while Ext4 and Btrfs are way slower:

  Ext4:   1.59s
  XFS:1.11s
  Btrfs:  1.70s
  Tux3:   1.11s

A distinct performance gap appears between Tux3 and XFS as parallel
tasks increase.


It will be interesting to see if this continues to be true on more systems. I 
hope it does.



You wish. In fact, Tux3 is a lot faster. ...


Yes, it's easy to be fast when you have simple, naive algorithms and
an empty filesystem.


No it isn't or the others would be fast too. In any case our algorithms
are far from naive, except for allocation. You can rest assured that
when allocation is brought up to a respectable standard in the fullness
of time, it will be competitive and will not harm our clean filesystem
performance at all.

There is no call for you to disparage our current achievements, which
are significant. I do not mind some healthy skepticism about the
allocation work, you know as well as anyone how hard it is. However your
denial of our current result is irritating and creates the impression
that you have an agenda. If you want to complain about something real,
complain that our current code drop is not done yet. I will humbly
apologize, and the same for enospc.


As I'm reading Dave's comments, he isn't attacking you the way you seem to think 
he is. He is pointing out that there are problems with your data, but he's also 
taking a lot of time to explain what's happening (and yes, some of this is 
probably because your simple tests with XFS made it look so bad)


the other filesystems don't use naive algorithms, they use something more 
complex, and while your current numbers are interesting, they are only 
preliminary until you add something to handle fragmentation. That can cause very 
significant problems. Remember how fabulous btrfs looked in the initial reports? 
and then corner cases were found that caused real problems and as the algorithms 
have been changed to prevent those corner cases from being so easy to hit, the 
common case has suffered somewhat. This isn't an attack on Tux3 or btrfs, it's 
just a reality of programming. If you are not accounting for all the corner 

[drm:check_crtc_state [i915]] *ERROR* mismatch in scaler_state.scaler_id

2015-05-01 Thread Sergey Senozhatsky
Hi,

linux-next 20150501

[1.968953] [drm:check_crtc_state [i915]] *ERROR* mismatch in 
scaler_state.scaler_id (expected 0, found -1)
[1.968953] [ cut here ]
[1.968983] WARNING: CPU: 0 PID: 6 at 
drivers/gpu/drm/i915/intel_display.c:12008 check_crtc_state+0xb15/0xb83 [i915]()
[1.968983] pipe state doesn't match!
[..]
[1.969005] CPU: 0 PID: 6 Comm: kworker/u16:0 Not tainted 
4.1.0-rc1-next-20150501-dbg-00011-gbcb7bed-dirty #49
[1.969010] Workqueue: events_unbound async_run_entry_fn
[1.969012]  0009 88041d9eb448 812e9753 

[1.969013]  88041d9eb498 88041d9eb488 81034c24 
88041d9eb490
[1.969015]  a03a81dc 88041d123000 88041cbbc800 
0001
[1.969015] Call Trace:
[1.969019]  [] dump_stack+0x45/0x57
[1.969022]  [] warn_slowpath_common+0x97/0xb1
[1.969050]  [] ? check_crtc_state+0xb15/0xb83 [i915]
[1.969052]  [] warn_slowpath_fmt+0x41/0x43
[1.969080]  [] check_crtc_state+0xb15/0xb83 [i915]
[1.969082]  [] ? update_curr+0x68/0xd1
[1.969112]  [] intel_modeset_check_state+0x603/0xa3d 
[i915]
[1.969140]  [] intel_crtc_set_config+0x8dc/0xc02 [i915]
[1.969147]  [] ? 
drm_atomic_helper_plane_set_property+0x6c/0xa4 [drm_kms_helper]
[1.969159]  [] drm_mode_set_config_internal+0x57/0xe3 
[drm]
[1.969164]  [] restore_fbdev_mode+0xb5/0xcf 
[drm_kms_helper]
[1.969169]  [] 
drm_fb_helper_restore_fbdev_mode_unlocked+0x22/0x59 [drm_kms_helper]
[1.969173]  [] drm_fb_helper_set_par+0x31/0x35 
[drm_kms_helper]
[1.969202]  [] intel_fbdev_set_par+0x15/0x58 [i915]
[1.969204]  [] fbcon_init+0x323/0x431
[1.969206]  [] visual_init+0xb7/0x10d
[1.969208]  [] do_bind_con_driver+0x1b1/0x2d8
[1.969209]  [] do_take_over_console+0x15a/0x184
[1.969212]  [] do_fbcon_takeover+0x5b/0x97
[1.969213]  [] fbcon_event_notify+0x419/0x740
[1.969215]  [] notifier_call_chain+0x3b/0x5f
[1.969217]  [] __blocking_notifier_call_chain+0x43/0x5f
[1.969219]  [] blocking_notifier_call_chain+0xf/0x11
[1.969220]  [] fb_notifier_call_chain+0x16/0x18
[1.969222]  [] register_framebuffer+0x28f/0x2c7
[1.969250]  [] ? intelfb_create+0x2e7/0x38a [i915]
[1.969256]  [] drm_fb_helper_initial_config+0x297/0x3e1 
[drm_kms_helper]
[1.969284]  [] intel_fbdev_initial_config+0x16/0x18 [i915]
[1.969286]  [] async_run_entry_fn+0x33/0xca
[1.969288]  [] process_one_work+0x192/0x2a8
[1.969290]  [] worker_thread+0x266/0x34c
[1.969291]  [] ? rescuer_thread+0x276/0x276
[1.969293]  [] kthread+0xcd/0xd5
[1.969294]  [] ? kthread_worker_fn+0x130/0x130
[1.969296]  [] ret_from_fork+0x42/0x70
[1.969297]  [] ? kthread_worker_fn+0x130/0x130
[1.969298] ---[ end trace 7f37d8e5ab4ee0a8 ]---

-ss


[PATCH 10/10] ext2, ext3, ext4: convert to parse_integer()/kstrto*()

2015-05-01 Thread Alexey Dobriyan
Convert away from deprecated simple_strto*() interfaces.

Signed-off-by: Alexey Dobriyan 
---

 fs/ext2/super.c |6 --
 fs/ext3/super.c |7 ---
 fs/ext4/super.c |   15 +++
 3 files changed, 15 insertions(+), 13 deletions(-)

--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -383,16 +383,18 @@ static unsigned long get_sb_block(void **data)
 {
unsigned long   sb_block;
char*options = (char *) *data;
+   int rv;
 
if (!options || strncmp(options, "sb=", 3) != 0)
return 1;   /* Default location */
options += 3;
-   sb_block = simple_strtoul(options, &options, 0);
-   if (*options && *options != ',') {
+   rv = parse_integer(options, 0, &sb_block);
+   if (rv < 0 || (options[rv] && options[rv] != ',')) {
printk("EXT2-fs: Invalid sb specification: %s\n",
   (char *) *data);
return 1;
}
+   options += rv;
if (*options == ',')
options++;
*data = (void *) options;
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -902,17 +902,18 @@ static ext3_fsblk_t get_sb_block(void **data, struct 
super_block *sb)
 {
ext3_fsblk_t    sb_block;
char*options = (char *) *data;
+   int rv;
 
if (!options || strncmp(options, "sb=", 3) != 0)
return 1;   /* Default location */
options += 3;
-   /*todo: use simple_strtoll with >32bit ext3 */
-   sb_block = simple_strtoul(options, &options, 0);
-   if (*options && *options != ',') {
+   rv = parse_integer(options, 0, &sb_block);
+   if (rv < 0 || (options[rv] && options[rv] != ',')) {
ext3_msg(sb, KERN_ERR, "error: invalid sb specification: %s",
   (char *) *data);
return 1;
}
+   options += rv;
if (*options == ',')
options++;
*data = (void *) options;
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1210,18 +1210,19 @@ static ext4_fsblk_t get_sb_block(void **data)
 {
ext4_fsblk_t    sb_block;
char*options = (char *) *data;
+   int rv;
 
if (!options || strncmp(options, "sb=", 3) != 0)
return 1;   /* Default location */
 
options += 3;
-   /* TODO: use simple_strtoll with >32bit ext4 */
-   sb_block = simple_strtoul(options, &options, 0);
-   if (*options && *options != ',') {
+   rv = parse_integer(options, 0, &sb_block);
+   if (rv < 0 || (options[rv] && options[rv] != ',')) {
printk(KERN_ERR "EXT4-fs: Invalid sb specification: %s\n",
   (char *) *data);
return 1;
}
+   options += rv;
if (*options == ',')
options++;
*data = (void *) options;
@@ -2491,10 +2492,10 @@ static ssize_t inode_readahead_blks_store(struct 
ext4_attr *a,
  struct ext4_sb_info *sbi,
  const char *buf, size_t count)
 {
-   unsigned long t;
+   unsigned int t;
int ret;
 
-   ret = kstrtoul(skip_spaces(buf), 0, &t);
+   ret = kstrtouint(skip_spaces(buf), 0, &t);
if (ret)
return ret;
 
@@ -2518,13 +2519,11 @@ static ssize_t sbi_ui_store(struct ext4_attr *a,
const char *buf, size_t count)
 {
unsigned int *ui = (unsigned int *) (((char *) sbi) + a->u.offset);
-   unsigned long t;
int ret;
 
-   ret = kstrtoul(skip_spaces(buf), 0, &t);
+   ret = kstrtouint(skip_spaces(buf), 0, ui);
if (ret)
return ret;
-   *ui = t;
return count;
 }
 


[PATCH 09/10] ocfs2: convert to parse_integer()/kstrto*()

2015-05-01 Thread Alexey Dobriyan
Convert away from deprecated simple_strto*() interfaces.

Autodetection of the value range from the destination type allows removing
disgusting checks like:

if ((major == LONG_MIN) || (major == LONG_MAX) ||
(major > (u8)-1) || (major < 1))
return -ERANGE;
if ((minor == LONG_MIN) || (minor == LONG_MAX) ||
(minor > (u8)-1) || (minor < 0))

Poof, they're gone!
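What "range from type" buys can be sketched in userspace: parsing into a u8 destination rejects out-of-range input in one step instead of parsing into a long and hand-checking LONG_MIN/LONG_MAX/(u8)-1. A hypothetical illustration built on strtoul, not the kernel's parse_integer() (and unlike kstrto*() it does not tolerate a trailing newline):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a u8 from a NUL-terminated string; the range check falls out of
 * the destination type. Returns 0, -EINVAL, or -ERANGE (negated errno). */
int parse_u8(const char *s, unsigned int base, uint8_t *res)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(s, &end, base);
	if (end == s || *end != '\0')
		return -EINVAL;		/* no digits, or trailing junk */
	if (errno == ERANGE || val > UINT8_MAX)
		return -ERANGE;		/* out of range for a u8 */
	*res = (uint8_t)val;
	return 0;
}
```

Callers keep only the domain-specific checks (e.g. minimum node number); the type-width check disappears.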

Signed-off-by: Alexey Dobriyan 
---

 fs/ocfs2/cluster/heartbeat.c   |   54 +
 fs/ocfs2/cluster/nodemanager.c |   50 -
 fs/ocfs2/stack_user.c  |   50 ++---
 3 files changed, 70 insertions(+), 84 deletions(-)

--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1484,13 +1484,12 @@ static int o2hb_read_block_input(struct o2hb_region 
*reg,
 unsigned long *ret_bytes,
 unsigned int *ret_bits)
 {
-   unsigned long bytes;
-   char *p = (char *)page;
-
-   bytes = simple_strtoul(p, &p, 0);
-   if (!p || (*p && (*p != '\n')))
-   return -EINVAL;
+   unsigned int bytes;
+   int rv;
 
+   rv = kstrtouint(page, 0, &bytes);
+   if (rv < 0)
+   return rv;
/* Heartbeat and fs min / max block sizes are the same. */
if (bytes > 4096 || bytes < 512)
return -ERANGE;
@@ -1543,18 +1542,14 @@ static ssize_t o2hb_region_start_block_write(struct 
o2hb_region *reg,
 const char *page,
 size_t count)
 {
-   unsigned long long tmp;
-   char *p = (char *)page;
+   int rv;
 
if (reg->hr_bdev)
return -EINVAL;
 
-   tmp = simple_strtoull(p, &p, 0);
-   if (!p || (*p && (*p != '\n')))
-   return -EINVAL;
-
-   reg->hr_start_block = tmp;
-
+   rv = kstrtoull(page, 0, &reg->hr_start_block);
+   if (rv < 0)
+   return rv;
return count;
 }
 
@@ -1568,20 +1563,19 @@ static ssize_t o2hb_region_blocks_write(struct 
o2hb_region *reg,
const char *page,
size_t count)
 {
-   unsigned long tmp;
-   char *p = (char *)page;
+   unsigned int tmp;
+   int rv;
 
if (reg->hr_bdev)
return -EINVAL;
 
-   tmp = simple_strtoul(p, &p, 0);
-   if (!p || (*p && (*p != '\n')))
-   return -EINVAL;
-
+   rv = kstrtouint(page, 0, &tmp);
+   if (rv < 0)
+   return rv;
if (tmp > O2NM_MAX_NODES || tmp == 0)
return -ERANGE;
 
-   reg->hr_blocks = (unsigned int)tmp;
+   reg->hr_blocks = tmp;
 
return count;
 }
@@ -1717,9 +1711,8 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region 
*reg,
 size_t count)
 {
struct task_struct *hb_task;
-   long fd;
+   int fd;
int sectsize;
-   char *p = (char *)page;
struct fd f;
struct inode *inode;
ssize_t ret = -EINVAL;
@@ -1733,10 +1726,9 @@ static ssize_t o2hb_region_dev_write(struct o2hb_region 
*reg,
if (o2nm_this_node() == O2NM_MAX_NODES)
goto out;
 
-   fd = simple_strtol(p, &p, 0);
-   if (!p || (*p && (*p != '\n')))
+   ret = kstrtoint(page, 0, &fd);
+   if (ret < 0)
goto out;
-
if (fd < 0 || fd >= INT_MAX)
goto out;
 
@@ -2210,12 +2202,12 @@ static ssize_t 
o2hb_heartbeat_group_threshold_store(struct o2hb_heartbeat_group
const char *page,
size_t count)
 {
-   unsigned long tmp;
-   char *p = (char *)page;
+   unsigned int tmp;
+   int rv;
 
-   tmp = simple_strtoul(p, &p, 10);
-   if (!p || (*p && (*p != '\n')))
-return -EINVAL;
+   rv = kstrtouint(page, 10, &tmp);
+   if (rv < 0)
+   return rv;
 
/* this will validate ranges for us. */
o2hb_dead_threshold_set((unsigned int) tmp);
--- a/fs/ocfs2/cluster/nodemanager.c
+++ b/fs/ocfs2/cluster/nodemanager.c
@@ -195,13 +195,12 @@ static ssize_t o2nm_node_num_write(struct o2nm_node 
*node, const char *page,
   size_t count)
 {
struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
-   unsigned long tmp;
-   char *p = (char *)page;
-
-   tmp = simple_strtoul(p, &p, 0);
-   if (!p || (*p && (*p != '\n')))
-   return -EINVAL;
+   unsigned int tmp;
+   int rv;
 
+   rv = parse_integer(page, 0, &tmp);
+   if (rv < 0)
+   return rv;
if (tmp >= O2NM_MAX_NODES)
return -ERANGE;
 
@@ -215,16 +214,15 @@ static ssize_t o2nm_node_num_write(struct o2nm_node 
*node, const char *page,
 
write_lock(&cluster->cl_nodes_lock);

Re: [v2,2/2] powerpc32: add support for csum_add()

2015-05-01 Thread Scott Wood
On Tue, 2015-04-28 at 21:01 +0200, christophe leroy wrote:
> 
> 
> Le 25/03/2015 02:30, Scott Wood a écrit :
> 
> > On Tue, Feb 03, 2015 at 12:39:27PM +0100, LEROY Christophe wrote:
> > > The C version of csum_add() as defined in include/net/checksum.h gives the
> > > following assembly:
> > >0:   7c 04 1a 14 add r0,r4,r3
> > >4:   7c 64 00 10 subfc   r3,r4,r0
> > >8:   7c 63 19 10 subfe   r3,r3,r3
   c:   7c 63 00 50 subf    r3,r3,r0
> > > 
> > > include/net/checksum.h also offers the possibility to define an arch 
> > > specific
> > > function.
> > > This patch provides a ppc32 specific csum_add() inline function.
> > What makes it 32-bit specific?
> > 
> > 
> As far as I understand, the 64-bit version will do a 64-bit addition,
> so we will have to handle the carry differently; it can't just be an
> addze like in 32-bit.

OK.  Earlier I couldn't find where this was ifdeffed to 32-bit, but it's
in patch 1/2.

> The generated code is most likely different on ppc64. I have no ppc64
> compiler so I can't check what gcc generates for the following code:
> 
> __wsum csum_add(__wsum csum, __wsum addend)
> {
>   u32 res = (__force u32)csum;
>   res += (__force u32)addend;
>   return (__force __wsum)(res + (res < (__force u32)addend));
> }
> 
> Can someone with a ppc64 compiler tell what we get ?

With CONFIG_GENERIC_CPU:

   0xc0001af8 <+0>: add r3,r3,r4
   0xc0001afc <+4>: cmplw   cr7,r3,r4
   0xc0001b00 <+8>: mfcr    r4
   0xc0001b04 <+12>:rlwinm  r4,r4,29,31,31
   0xc0001b08 <+16>:add r3,r4,r3
   0xc0001b0c <+20>:clrldi  r3,r3,32
   0xc0001b10 <+24>:blr

The mfcr is particularly nasty, at least on our chips.

With CONFIG_CPU_E6500:

   0xc0001b30 <+0>: add r3,r3,r4
   0xc0001b34 <+4>: cmplw   cr7,r3,r4
   0xc0001b38 <+8>: mfocrf  r4,1
   0xc0001b3c <+12>:rlwinm  r4,r4,29,31,31
   0xc0001b40 <+16>:add r3,r4,r3
   0xc0001b44 <+20>:clrldi  r3,r3,32
   0xc0001b48 <+24>:blr

Ideal (short of a 64-bit __wsum) would probably be something like (untested):

add r3,r3,r4
srdi    r5,r3,32
add r3,r3,r5
clrldi  r3,r3,32

Or in C code (which would let the compiler schedule it better):

static inline __wsum csum_add(__wsum csum, __wsum addend)
{
u64 res = (__force u64)csum;
res += (__force u32)addend;
return (__force __wsum)((u32)res + (res >> 32));
}

-Scott




[PATCH 08/10] fs/cachefiles/: convert to parse_integer()

2015-05-01 Thread Alexey Dobriyan
Convert away from deprecated simple_strto*() interfaces.

Switch "unsigned long" to "unsigned int" where possible.
kstrto*() functions can't be used because of trailing "%" sign. :^)

Signed-off-by: Alexey Dobriyan 
---

 fs/cachefiles/daemon.c |   84 ++---
 1 file changed, 45 insertions(+), 39 deletions(-)

--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -326,14 +326,15 @@ static int cachefiles_daemon_range_error(struct 
cachefiles_cache *cache,
  */
 static int cachefiles_daemon_frun(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long frun;
+   unsigned int frun;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   frun = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &frun);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -350,14 +351,15 @@ static int cachefiles_daemon_frun(struct cachefiles_cache 
*cache, char *args)
  */
 static int cachefiles_daemon_fcull(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long fcull;
+   unsigned int fcull;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   fcull = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &fcull);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -374,14 +376,15 @@ static int cachefiles_daemon_fcull(struct 
cachefiles_cache *cache, char *args)
  */
 static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long fstop;
+   unsigned int fstop;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   fstop = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &fstop);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -398,14 +401,15 @@ static int cachefiles_daemon_fstop(struct 
cachefiles_cache *cache, char *args)
  */
 static int cachefiles_daemon_brun(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long brun;
+   unsigned int brun;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   brun = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &brun);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -422,14 +426,15 @@ static int cachefiles_daemon_brun(struct cachefiles_cache 
*cache, char *args)
  */
 static int cachefiles_daemon_bcull(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long bcull;
+   unsigned int bcull;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   bcull = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &bcull);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -446,14 +451,15 @@ static int cachefiles_daemon_bcull(struct 
cachefiles_cache *cache, char *args)
  */
 static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long bstop;
+   unsigned int bstop;
+   int rv;
 
_enter(",%s", args);
 
-   if (!*args)
-   return -EINVAL;
-
-   bstop = simple_strtoul(args, &args, 10);
+   rv = parse_integer(args, 10, &bstop);
+   if (rv < 0)
+   return rv;
+   args += rv;
if (args[0] != '%' || args[1] != '\0')
return -EINVAL;
 
@@ -601,21 +607,21 @@ inval:
  */
 static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args)
 {
-   unsigned long mask;
+   unsigned int mask;
+   int rv;
 
_enter(",%s", args);
 
-   mask = simple_strtoul(args, &args, 0);
-   if (args[0] != '\0')
-   goto inval;
-
+   rv = parse_integer(args, 0, &mask);
+   if (rv < 0)
+   return rv;
+   if (args[rv] != '\0') {
+   pr_err("debug command requires mask\n");
+   return -EINVAL;
+   }
cachefiles_debug = mask;
_leave(" = 0");
return 0;
-
-inval:
-   pr_err("debug command requires mask\n");
-   return -EINVAL;
 }
 
 /*


[PATCH 07/10] parse_integer: convert misc fs/ code

2015-05-01 Thread Alexey Dobriyan
Convert random fs/ code away from simple_strto*() interfaces.

Note about the "struct simple_attr" conversion:
->set_buf is unneeded because everything can be done from the stack.
->get_buf is useless as well, but that's a separate patch.
The mutex is not removed, as it may guard readers from writers --
a separate story as well.
(code has been copied to arch/powerpc/.../spufs/, don't forget!)

binfmt_misc: file offset can't really be negative, type changed.

Signed-off-by: Alexey Dobriyan 
---

 fs/binfmt_misc.c |   12 +---
 fs/dcache.c  |2 +-
 fs/inode.c   |2 +-
 fs/libfs.c   |   26 ++
 fs/namespace.c   |4 ++--
 5 files changed, 23 insertions(+), 23 deletions(-)

--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -47,7 +47,7 @@ enum {Enabled, Magic};
 typedef struct {
struct list_head list;
unsigned long flags;/* type, status, etc. */
-   int offset; /* offset of magic */
+   unsigned int offset;/* offset of magic */
int size;   /* size of magic/mask */
char *magic;/* magic or filename extension */
char *mask; /* mask, NULL for exact match */
@@ -370,7 +370,13 @@ static Node *create_entry(const char __user *buffer, 
size_t count)
if (!s)
goto einval;
*s++ = '\0';
-   e->offset = simple_strtoul(p, &p, 10);
+   err = parse_integer(p, 10, &e->offset);
+   if (err < 0) {
+   kfree(e);
+   goto out;
+
+   }
+   p += err;
if (*p++)
goto einval;
pr_debug("register: offset: %#x\n", e->offset);
@@ -548,7 +554,7 @@ static void entry_status(Node *e, char *page)
if (!test_bit(Magic, >flags)) {
sprintf(dp, "extension .%s\n", e->magic);
} else {
-   dp += sprintf(dp, "offset %i\nmagic ", e->offset);
+   dp += sprintf(dp, "offset %u\nmagic ", e->offset);
dp = bin2hex(dp, e->magic, e->size);
if (e->mask) {
dp += sprintf(dp, "\nmask ");
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3374,7 +3374,7 @@ static int __init set_dhash_entries(char *str)
 {
if (!str)
return 0;
-   dhash_entries = simple_strtoul(str, &str, 0);
+   parse_integer(str, 0, &dhash_entries);
return 1;
 }
 __setup("dhash_entries=", set_dhash_entries);
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1787,7 +1787,7 @@ static int __init set_ihash_entries(char *str)
 {
if (!str)
return 0;
-   ihash_entries = simple_strtoul(str, &str, 0);
+   parse_integer(str, 0, &ihash_entries);
return 1;
 }
 __setup("ihash_entries=", set_ihash_entries);
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -752,7 +752,6 @@ struct simple_attr {
int (*get)(void *, u64 *);
int (*set)(void *, u64);
char get_buf[24];   /* enough to store a u64 and "\n\0" */
-   char set_buf[24];
void *data;
const char *fmt;/* format for read operation */
struct mutex mutex; /* protects access to these buffers */
@@ -830,31 +829,26 @@ ssize_t simple_attr_write(struct file *file, const char 
__user *buf,
  size_t len, loff_t *ppos)
 {
struct simple_attr *attr;
-   u64 val;
-   size_t size;
-   ssize_t ret;
+   s64 val;
+   int ret;
 
attr = file->private_data;
if (!attr->set)
return -EACCES;
 
+   ret = kstrtos64_from_user(buf, len, 0, &val);
+   if (ret < 0)
+   return ret;
+
ret = mutex_lock_interruptible(&attr->mutex);
if (ret)
return ret;
-
-   ret = -EFAULT;
-   size = min(sizeof(attr->set_buf) - 1, len);
-   if (copy_from_user(attr->set_buf, buf, size))
-   goto out;
-
-   attr->set_buf[size] = '\0';
-   val = simple_strtoll(attr->set_buf, NULL, 0);
ret = attr->set(attr->data, val);
-   if (ret == 0)
-   ret = len; /* on success, claim we got the whole input */
-out:
mutex_unlock(&attr->mutex);
-   return ret;
+   if (ret < 0)
+   return ret;
+   /* on success, claim we got the whole input */
+   return len;
 }
 EXPORT_SYMBOL_GPL(simple_attr_write);
 
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -37,7 +37,7 @@ static int __init set_mhash_entries(char *str)
 {
if (!str)
return 0;
-   mhash_entries = simple_strtoul(str, &str, 0);
+   parse_integer(str, 0, &mhash_entries);
return 1;
 }
 __setup("mhash_entries=", set_mhash_entries);
@@ -47,7 +47,7 @@ static int __init set_mphash_entries(char *str)
 {
if (!str)
return 0;
-   mphash_entries = simple_strtoul(str, &str, 0);
+   parse_integer(str, 0, &mphash_entries);
return 1;
 }
 

[PATCH 06/10] parse_integer: convert mm/

2015-05-01 Thread Alexey Dobriyan
Convert mm/ directory away from deprecated simple_strto*() interface.

One thing to note about parse_integer() and seemingly useless casts --
the range of accepted values depends on the result type.

int val;
parse_integer(s, 0, );

will accept negative integers, while

int val;
parse_integer(s, 0, (unsigned int *));

will accept only 0 and positive integers.

A cast is needed when the result variable has to be of a different type
for some reason.

This is very important and hopefully [knocks wood] obvious.

Signed-off-by: Alexey Dobriyan 
---

 mm/memcontrol.c |   19 +++
 mm/memtest.c|2 +-
 mm/page_alloc.c |2 +-
 mm/shmem.c  |   14 --
 4 files changed, 21 insertions(+), 16 deletions(-)

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4096,20 +4096,23 @@ static ssize_t memcg_write_event_control(struct 
kernfs_open_file *of,
struct fd efile;
struct fd cfile;
const char *name;
-   char *endp;
int ret;
 
buf = strstrip(buf);
 
-   efd = simple_strtoul(buf, &endp, 10);
-   if (*endp != ' ')
+   ret = parse_integer(buf, 10, &efd);
+   if (ret < 0)
+   return ret;
+   buf += ret;
+   if (*buf++ != ' ')
return -EINVAL;
-   buf = endp + 1;
-
-   cfd = simple_strtoul(buf, &endp, 10);
-   if ((*endp != ' ') && (*endp != '\0'))
+   ret = parse_integer(buf, 10, &cfd);
+   if (ret < 0)
+   return ret;
+   buf += ret;
+   if (*buf != ' ' && *buf != '\0')
return -EINVAL;
-   buf = endp + 1;
+   buf++;
 
event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event)
--- a/mm/memtest.c
+++ b/mm/memtest.c
@@ -93,7 +93,7 @@ static int memtest_pattern __initdata;
 static int __init parse_memtest(char *arg)
 {
if (arg)
-   memtest_pattern = simple_strtoul(arg, NULL, 0);
+   parse_integer(arg, 0, (unsigned int *)&memtest_pattern);
else
memtest_pattern = ARRAY_SIZE(patterns);
 
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6020,7 +6020,7 @@ static int __init set_hashdist(char *str)
 {
if (!str)
return 0;
-   hashdist = simple_strtoul(str, &str, 0);
+   parse_integer(str, 0, (unsigned int *)&hashdist);
return 1;
 }
 __setup("hashdist=", set_hashdist);
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2742,6 +2742,7 @@ static int shmem_parse_options(char *options, struct 
shmem_sb_info *sbinfo,
struct mempolicy *mpol = NULL;
uid_t uid;
gid_t gid;
+   int rv;
 
while (options != NULL) {
this_char = options;
@@ -2795,14 +2796,15 @@ static int shmem_parse_options(char *options, struct 
shmem_sb_info *sbinfo,
} else if (!strcmp(this_char,"mode")) {
if (remount)
continue;
-   sbinfo->mode = simple_strtoul(value, &rest, 8) & 07777;
-   if (*rest)
+   rv = parse_integer(value, 8, &sbinfo->mode);
+   if (rv < 0 || value[rv])
goto bad_val;
+   sbinfo->mode &= 07777;
} else if (!strcmp(this_char,"uid")) {
if (remount)
continue;
-   uid = simple_strtoul(value, &rest, 0);
-   if (*rest)
+   rv = parse_integer(value, 0, &uid);
+   if (rv < 0 || value[rv])
goto bad_val;
sbinfo->uid = make_kuid(current_user_ns(), uid);
if (!uid_valid(sbinfo->uid))
@@ -2810,8 +2812,8 @@ static int shmem_parse_options(char *options, struct 
shmem_sb_info *sbinfo,
} else if (!strcmp(this_char,"gid")) {
if (remount)
continue;
-   gid = simple_strtoul(value, &rest, 0);
-   if (*rest)
+   rv = parse_integer(value, 0, &gid);
+   if (rv < 0 || value[rv])
goto bad_val;
sbinfo->gid = make_kgid(current_user_ns(), gid);
if (!gid_valid(sbinfo->gid))


[PATCH 03/19] f2fs: move existing definitions into f2fs.h

2015-05-01 Thread Jaegeuk Kim
This patch moves some inode-related definitions from node.h to f2fs.h to
add new features.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h | 22 ++
 fs/f2fs/node.h | 22 --
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8be6cab..cd9748a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -320,6 +320,13 @@ struct extent_tree {
 #define FADVISE_COLD_BIT   0x01
 #define FADVISE_LOST_PINO_BIT  0x02
 
+#define file_is_cold(inode)is_file(inode, FADVISE_COLD_BIT)
+#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
+#define file_set_cold(inode)   set_file(inode, FADVISE_COLD_BIT)
+#define file_lost_pino(inode)  set_file(inode, FADVISE_LOST_PINO_BIT)
+#define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT)
+#define file_got_pino(inode)   clear_file(inode, FADVISE_LOST_PINO_BIT)
+
 #define DEF_DIR_LEVEL  0
 
 struct f2fs_inode_info {
@@ -1391,6 +1398,21 @@ static inline void f2fs_dentry_kunmap(struct inode *dir, 
struct page *page)
kunmap(page);
 }
 
+static inline int is_file(struct inode *inode, int type)
+{
+   return F2FS_I(inode)->i_advise & type;
+}
+
+static inline void set_file(struct inode *inode, int type)
+{
+   F2FS_I(inode)->i_advise |= type;
+}
+
+static inline void clear_file(struct inode *inode, int type)
+{
+   F2FS_I(inode)->i_advise &= ~type;
+}
+
 static inline int f2fs_readonly(struct super_block *sb)
 {
return sb->s_flags & MS_RDONLY;
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index c56026f..7427e95 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -343,28 +343,6 @@ static inline nid_t get_nid(struct page *p, int off, bool 
i)
  *  - Mark cold node blocks in their node footer
  *  - Mark cold data pages in page cache
  */
-static inline int is_file(struct inode *inode, int type)
-{
-   return F2FS_I(inode)->i_advise & type;
-}
-
-static inline void set_file(struct inode *inode, int type)
-{
-   F2FS_I(inode)->i_advise |= type;
-}
-
-static inline void clear_file(struct inode *inode, int type)
-{
-   F2FS_I(inode)->i_advise &= ~type;
-}
-
-#define file_is_cold(inode)is_file(inode, FADVISE_COLD_BIT)
-#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
-#define file_set_cold(inode)   set_file(inode, FADVISE_COLD_BIT)
-#define file_lost_pino(inode)  set_file(inode, FADVISE_LOST_PINO_BIT)
-#define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT)
-#define file_got_pino(inode)   clear_file(inode, FADVISE_LOST_PINO_BIT)
-
 static inline int is_cold_data(struct page *page)
 {
return PageChecked(page);
-- 
2.1.1



[PATCH 08/19] f2fs: clean up f2fs_lookup

2015-05-01 Thread Jaegeuk Kim
This patch cleans up f2fs_lookup() to avoid deep indentation.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/namei.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 658e807..a311c3c 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -232,31 +232,32 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
struct inode *inode = NULL;
struct f2fs_dir_entry *de;
struct page *page;
+   nid_t ino;
 
if (dentry->d_name.len > F2FS_NAME_LEN)
return ERR_PTR(-ENAMETOOLONG);
 
de = f2fs_find_entry(dir, &dentry->d_name, &page);
-   if (de) {
-   nid_t ino = le32_to_cpu(de->ino);
-   f2fs_dentry_kunmap(dir, page);
-   f2fs_put_page(page, 0);
+   if (!de)
+   return d_splice_alias(inode, dentry);
 
-   inode = f2fs_iget(dir->i_sb, ino);
-   if (IS_ERR(inode))
-   return ERR_CAST(inode);
+   ino = le32_to_cpu(de->ino);
+   f2fs_dentry_kunmap(dir, page);
+   f2fs_put_page(page, 0);
 
-   if (f2fs_has_inline_dots(inode)) {
-   int err;
+   inode = f2fs_iget(dir->i_sb, ino);
+   if (IS_ERR(inode))
+   return ERR_CAST(inode);
+
+   if (f2fs_has_inline_dots(inode)) {
+   int err;
 
-   err = __recover_dot_dentries(inode, dir->i_ino);
-   if (err) {
-   iget_failed(inode);
-   return ERR_PTR(err);
-   }
+   err = __recover_dot_dentries(inode, dir->i_ino);
+   if (err) {
+   iget_failed(inode);
+   return ERR_PTR(err);
}
}
-
return d_splice_alias(inode, dentry);
 }
 
-- 
2.1.1



[PATCH 05/10] parse_integer: convert lib/

2015-05-01 Thread Alexey Dobriyan
Convert away lib/ from deprecated simple_strto*() interfaces.

"&(int []){0}[0]" expression is anonymous variable, don't be scared.
Filesystem option parser code does parsing 1.5 times: first to know
boundaries of a value (args[0].from, args[0].to) and then actual
parsing with match_number(). Noody knows why it does that.

match_number() needlessly allocates/duplicates memory,
parsing can be done straight from original string.

Signed-off-by: Alexey Dobriyan 
---

 lib/cmdline.c |   36 ++--
 lib/parser.c  |   29 -
 lib/swiotlb.c |2 +-
 3 files changed, 31 insertions(+), 36 deletions(-)

--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -27,7 +27,7 @@ static int get_range(char **str, int *pint)
int x, inc_counter, upper_range;
 
(*str)++;
-   upper_range = simple_strtol((*str), NULL, 0);
+   parse_integer(*str, 0, &upper_range);
inc_counter = upper_range - *pint;
for (x = *pint; x < upper_range; x++)
*pint++ = x;
@@ -51,13 +51,14 @@ static int get_range(char **str, int *pint)
 
 int get_option(char **str, int *pint)
 {
-   char *cur = *str;
+   int len;
 
-   if (!cur || !(*cur))
+   if (!str || !*str)
return 0;
-   *pint = simple_strtol(cur, str, 0);
-   if (cur == *str)
+   len = parse_integer(*str, 0, pint);
+   if (len < 0)
return 0;
+   *str += len;
if (**str == ',') {
(*str)++;
return 2;
@@ -126,38 +127,37 @@ EXPORT_SYMBOL(get_options);
 
 unsigned long long memparse(const char *ptr, char **retptr)
 {
-   char *endptr;   /* local pointer to end of parsed string */
+   unsigned long long val;
 
-   unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
-
-   switch (*endptr) {
+   ptr += parse_integer(ptr, 0, &val);
+   switch (*ptr) {
case 'E':
case 'e':
-   ret <<= 10;
+   val <<= 10;
case 'P':
case 'p':
-   ret <<= 10;
+   val <<= 10;
case 'T':
case 't':
-   ret <<= 10;
+   val <<= 10;
case 'G':
case 'g':
-   ret <<= 10;
+   val <<= 10;
case 'M':
case 'm':
-   ret <<= 10;
+   val <<= 10;
case 'K':
case 'k':
-   ret <<= 10;
-   endptr++;
+   val <<= 10;
+   ptr++;
default:
break;
}
 
if (retptr)
-   *retptr = endptr;
+   *retptr = (char *)ptr;
 
-   return ret;
+   return val;
 }
 EXPORT_SYMBOL(memparse);
 
--- a/lib/parser.c
+++ b/lib/parser.c
@@ -44,7 +44,7 @@ static int match_one(char *s, const char *p, substring_t 
args[])
p = meta + 1;
 
if (isdigit(*p))
-   len = simple_strtoul(p, (char **) &p, 10);
+   p += parse_integer(p, 10, (unsigned int *)&len);
else if (*p == '%') {
if (*s++ != '%')
return 0;
@@ -68,19 +68,21 @@ static int match_one(char *s, const char *p, substring_t 
args[])
break;
}
case 'd':
-   simple_strtol(s, &args[argc].to, 0);
+   /* anonymous variable */
+   len = parse_integer(s, 0, &(int []){0}[0]);
goto num;
case 'u':
-   simple_strtoul(s, &args[argc].to, 0);
+   len = parse_integer(s, 0, &(unsigned int []){0}[0]);
goto num;
case 'o':
-   simple_strtoul(s, &args[argc].to, 8);
+   len = parse_integer(s, 8, &(unsigned int []){0}[0]);
goto num;
case 'x':
-   simple_strtoul(s, &args[argc].to, 16);
+   len = parse_integer(s, 16, &(unsigned int []){0}[0]);
num:
-   if (args[argc].to == args[argc].from)
+   if (len < 0)
return 0;
+   args[argc].to = args[argc].from + len;
break;
default:
return 0;
@@ -127,10 +129,8 @@ EXPORT_SYMBOL(match_token);
  */
 static int match_number(substring_t *s, int *result, int base)
 {
-   char *endp;
char *buf;
int ret;
-   long val;
size_t len = s->to - s->from;
 
buf = kmalloc(len + 1, GFP_KERNEL);
@@ -139,16 +139,11 @@ static int match_number(substring_t *s, int *result, int 
base)
memcpy(buf, s->from, len);
buf[len] = '\0';
 
-   ret = 0;
-   val = simple_strtol(buf, &endp, base);
-   if (endp == buf)
-   ret = -EINVAL;
-   else if (val < (long)INT_MIN || val > (long)INT_MAX)
- 

[PATCH 05/19] f2fs: add f2fs_map_blocks

2015-05-01 Thread Jaegeuk Kim
This patch introduces an f2fs_map_blocks structure, similar to
ext4_map_blocks. Now, f2fs uses f2fs_map_blocks when handling get_block.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c  | 72 ++---
 fs/f2fs/f2fs.h  | 16 ++
 include/trace/events/f2fs.h | 28 +-
 3 files changed, 71 insertions(+), 45 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 1e1aae6..aa3c079 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -251,19 +251,6 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t 
index)
return err;
 }
 
-static void f2fs_map_bh(struct super_block *sb, pgoff_t pgofs,
-   struct extent_info *ei, struct buffer_head *bh_result)
-{
-   unsigned int blkbits = sb->s_blocksize_bits;
-   size_t max_size = bh_result->b_size;
-   size_t mapped_size;
-
-   clear_buffer_new(bh_result);
-   map_bh(bh_result, sb, ei->blk + pgofs - ei->fofs);
-   mapped_size = (ei->fofs + ei->len - pgofs) << blkbits;
-   bh_result->b_size = min(max_size, mapped_size);
-}
-
 static bool lookup_extent_info(struct inode *inode, pgoff_t pgofs,
struct extent_info *ei)
 {
@@ -1208,18 +1195,18 @@ out:
 }
 
 /*
- * get_data_block() now supported readahead/bmap/rw direct_IO with mapped bh.
+ * f2fs_map_blocks() now supported readahead/bmap/rw direct_IO with
+ * f2fs_map_blocks structure.
  * If original data blocks are allocated, then give them to blockdev.
  * Otherwise,
  * a. preallocate requested block addresses
  * b. do not use extent cache for better performance
  * c. give the block addresses to blockdev
  */
-static int __get_data_block(struct inode *inode, sector_t iblock,
-   struct buffer_head *bh_result, int create, bool fiemap)
+static int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
+   int create, bool fiemap)
 {
-   unsigned int blkbits = inode->i_sb->s_blocksize_bits;
-   unsigned maxblocks = bh_result->b_size >> blkbits;
+   unsigned int maxblocks = map->m_len;
struct dnode_of_data dn;
int mode = create ? ALLOC_NODE : LOOKUP_NODE_RA;
pgoff_t pgofs, end_offset;
@@ -1227,11 +1214,16 @@ static int __get_data_block(struct inode *inode, 
sector_t iblock,
struct extent_info ei;
bool allocated = false;
 
-   /* Get the page offset from the block offset(iblock) */
-   pgofs = (pgoff_t)(iblock >> (PAGE_CACHE_SHIFT - blkbits));
+   map->m_len = 0;
+   map->m_flags = 0;
+
+   /* it only supports block size == page size */
+   pgofs = (pgoff_t)map->m_lblk;
 
if (f2fs_lookup_extent_cache(inode, pgofs, &ei)) {
-   f2fs_map_bh(inode->i_sb, pgofs, , bh_result);
+   map->m_pblk = ei.blk + pgofs - ei.fofs;
+   map->m_len = min((pgoff_t)maxblocks, ei.fofs + ei.len - pgofs);
+   map->m_flags = F2FS_MAP_MAPPED;
goto out;
}
 
@@ -1250,21 +1242,21 @@ static int __get_data_block(struct inode *inode, 
sector_t iblock,
goto put_out;
 
if (dn.data_blkaddr != NULL_ADDR) {
-   clear_buffer_new(bh_result);
-   map_bh(bh_result, inode->i_sb, dn.data_blkaddr);
+   map->m_flags = F2FS_MAP_MAPPED;
+   map->m_pblk = dn.data_blkaddr;
} else if (create) {
err = __allocate_data_block(&dn);
if (err)
goto put_out;
allocated = true;
-   set_buffer_new(bh_result);
-   map_bh(bh_result, inode->i_sb, dn.data_blkaddr);
+   map->m_flags = F2FS_MAP_NEW | F2FS_MAP_MAPPED;
+   map->m_pblk = dn.data_blkaddr;
} else {
goto put_out;
}
 
end_offset = ADDRS_PER_PAGE(dn.node_page, F2FS_I(inode));
-   bh_result->b_size = (((size_t)1) << blkbits);
+   map->m_len = 1;
dn.ofs_in_node++;
pgofs++;
 
@@ -1288,22 +1280,22 @@ get_next:
end_offset = ADDRS_PER_PAGE(dn.node_page, F2FS_I(inode));
}
 
-   if (maxblocks > (bh_result->b_size >> blkbits)) {
+   if (maxblocks > map->m_len) {
block_t blkaddr = datablock_addr(dn.node_page, dn.ofs_in_node);
if (blkaddr == NULL_ADDR && create) {
err = __allocate_data_block(&dn);
if (err)
goto sync_out;
allocated = true;
-   set_buffer_new(bh_result);
+   map->m_flags |= F2FS_MAP_NEW;
blkaddr = dn.data_blkaddr;
}
/* Give more consecutive addresses for the readahead */
-   if (blkaddr == (bh_result->b_blocknr + ofs)) {
+   if (map->m_pblk != NEW_ADDR && blkaddr == (map->m_pblk + ofs)) 

[PATCH 09/19] f2fs: add f2fs_may_inline_{data, dentry}

2015-05-01 Thread Jaegeuk Kim
This patch adds f2fs_may_inline_data and f2fs_may_inline_dentry.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h   |  3 ++-
 fs/f2fs/file.c   |  2 +-
 fs/f2fs/inline.c | 13 -
 fs/f2fs/namei.c  |  4 ++--
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 7ff3ac7..2bb9b57 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1839,7 +1839,8 @@ extern struct kmem_cache *inode_entry_slab;
 /*
  * inline.c
  */
-bool f2fs_may_inline(struct inode *);
+bool f2fs_may_inline_data(struct inode *);
+bool f2fs_may_inline_dentry(struct inode *);
 void read_inline_data(struct page *, struct page *);
 bool truncate_inline_inode(struct page *, u64);
 int f2fs_read_inline_data(struct inode *, struct page *);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3d6de54..ffd9b7e 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -560,7 +560,7 @@ void f2fs_truncate(struct inode *inode)
trace_f2fs_truncate(inode);
 
/* we should check inline_data size */
-   if (f2fs_has_inline_data(inode) && !f2fs_may_inline(inode)) {
+   if (f2fs_has_inline_data(inode) && !f2fs_may_inline_data(inode)) {
if (f2fs_convert_inline_inode(inode))
return;
}
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 8140e4f..99d5148 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -13,7 +13,7 @@
 
 #include "f2fs.h"
 
-bool f2fs_may_inline(struct inode *inode)
+bool f2fs_may_inline_data(struct inode *inode)
 {
if (!test_opt(F2FS_I_SB(inode), INLINE_DATA))
return false;
@@ -30,6 +30,17 @@ bool f2fs_may_inline(struct inode *inode)
return true;
 }
 
+bool f2fs_may_inline_dentry(struct inode *inode)
+{
+   if (!test_opt(F2FS_I_SB(inode), INLINE_DENTRY))
+   return false;
+
+   if (!S_ISDIR(inode->i_mode))
+   return false;
+
+   return true;
+}
+
 void read_inline_data(struct page *page, struct page *ipage)
 {
void *src_addr, *dst_addr;
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index a311c3c..c0ba8e3 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -56,9 +56,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
umode_t mode)
goto out;
}
 
-   if (f2fs_may_inline(inode))
+   if (f2fs_may_inline_data(inode))
set_inode_flag(F2FS_I(inode), FI_INLINE_DATA);
-   if (test_opt(sbi, INLINE_DENTRY) && S_ISDIR(inode->i_mode))
+   if (f2fs_may_inline_dentry(inode))
set_inode_flag(F2FS_I(inode), FI_INLINE_DENTRY);
 
trace_f2fs_new_inode(inode, 0);
-- 
2.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 10/19] f2fs: add sbi and page pointer in f2fs_io_info

2015-05-01 Thread Jaegeuk Kim
This patch adds f2fs_sb_info and page pointers to the f2fs_io_info structure.
With this change, we can reduce a lot of parameters for IO functions.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  9 +++--
 fs/f2fs/data.c   | 47 +--
 fs/f2fs/f2fs.h   | 18 --
 fs/f2fs/file.c   |  2 +-
 fs/f2fs/gc.c |  4 +++-
 fs/f2fs/inline.c |  4 +++-
 fs/f2fs/node.c   |  8 ++--
 fs/f2fs/segment.c| 38 --
 fs/f2fs/super.c  |  2 +-
 fs/f2fs/trace.c  |  6 +++---
 fs/f2fs/trace.h  |  2 +-
 11 files changed, 82 insertions(+), 58 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 72f64b3..6dbff2b 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -52,6 +52,7 @@ struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t 
index)
struct address_space *mapping = META_MAPPING(sbi);
struct page *page;
struct f2fs_io_info fio = {
+   .sbi = sbi,
.type = META,
.rw = READ_SYNC | REQ_META | REQ_PRIO,
.blk_addr = index,
@@ -65,7 +66,9 @@ repeat:
if (PageUptodate(page))
goto out;
 
-   if (f2fs_submit_page_bio(sbi, page, &fio))
+   fio.page = page;
+
+   if (f2fs_submit_page_bio(&fio))
goto repeat;
 
lock_page(page);
@@ -117,6 +120,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, 
int nrpages, int type
struct page *page;
block_t blkno = start;
struct f2fs_io_info fio = {
+   .sbi = sbi,
.type = META,
.rw = READ_SYNC | REQ_META | REQ_PRIO
};
@@ -160,7 +164,8 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, 
int nrpages, int type
continue;
}
 
-   f2fs_submit_page_mbio(sbi, page, &fio);
+   fio.page = page;
+   f2fs_submit_page_mbio(&fio);
f2fs_put_page(page, 0);
}
 out:
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 2a3a9cd..81d1fd5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -158,16 +158,16 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
  * Fill the locked page with data located in the block address.
  * Return unlocked page.
  */
-int f2fs_submit_page_bio(struct f2fs_sb_info *sbi, struct page *page,
-   struct f2fs_io_info *fio)
+int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 {
struct bio *bio;
+   struct page *page = fio->page;
 
trace_f2fs_submit_page_bio(page, fio);
-   f2fs_trace_ios(page, fio, 0);
+   f2fs_trace_ios(fio, 0);
 
/* Allocate a new bio */
-   bio = __bio_alloc(sbi, fio->blk_addr, 1, is_read_io(fio->rw));
+   bio = __bio_alloc(fio->sbi, fio->blk_addr, 1, is_read_io(fio->rw));
 
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
bio_put(bio);
@@ -179,9 +179,9 @@ int f2fs_submit_page_bio(struct f2fs_sb_info *sbi, struct 
page *page,
return 0;
 }
 
-void f2fs_submit_page_mbio(struct f2fs_sb_info *sbi, struct page *page,
-   struct f2fs_io_info *fio)
+void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
 {
+   struct f2fs_sb_info *sbi = fio->sbi;
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io;
bool is_read = is_read_io(fio->rw);
@@ -206,17 +206,17 @@ alloc_new:
io->fio = *fio;
}
 
-   if (bio_add_page(io->bio, page, PAGE_CACHE_SIZE, 0) <
+   if (bio_add_page(io->bio, fio->page, PAGE_CACHE_SIZE, 0) <
PAGE_CACHE_SIZE) {
__submit_merged_bio(io);
goto alloc_new;
}
 
io->last_block_in_bio = fio->blk_addr;
-   f2fs_trace_ios(page, fio, 0);
+   f2fs_trace_ios(fio, 0);
 
up_write(&io->io_rwsem);
-   trace_f2fs_submit_page_mbio(page, fio);
+   trace_f2fs_submit_page_mbio(fio->page, fio);
 }
 
 /*
@@ -925,6 +925,7 @@ struct page *find_data_page(struct inode *inode, pgoff_t 
index, bool sync)
struct extent_info ei;
int err;
struct f2fs_io_info fio = {
+   .sbi = F2FS_I_SB(inode),
.type = DATA,
.rw = sync ? READ_SYNC : READA,
};
@@ -971,7 +972,8 @@ got_it:
}
 
fio.blk_addr = dn.data_blkaddr;
-   err = f2fs_submit_page_bio(F2FS_I_SB(inode), page, &fio);
+   fio.page = page;
+   err = f2fs_submit_page_bio(&fio);
if (err)
return ERR_PTR(err);
 
@@ -998,6 +1000,7 @@ struct page *get_lock_data_page(struct inode *inode, 
pgoff_t index)
struct extent_info ei;
int err;
struct f2fs_io_info fio = {
+   .sbi = F2FS_I_SB(inode),
.type = DATA,
.rw = 

[PATCH 13/19] f2fs: fix race on allocating and deallocating a dentry block

2015-05-01 Thread Jaegeuk Kim
There are two threads:
 f2fs_delete_entry()  get_new_data_page()
  f2fs_reserve_block()
  dn.blkaddr = XXX
 lock_page(dentry_block)
 truncate_hole()
 dn.blkaddr = NULL
 unlock_page(dentry_block)
  lock_page(dentry_block)
  fill the block from XXX address
  add new dentries
  unlock_page(dentry_block)

Later, f2fs_write_data_page() will truncate the dentry_block, since
its block address is NULL.

The reason for this was the wrong lock order.
In this case, we should do f2fs_reserve_block() after locking its dentry block.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 81d1fd5..9ba30b4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1076,20 +1076,22 @@ struct page *get_new_data_page(struct inode *inode,
struct page *page;
struct dnode_of_data dn;
int err;
+repeat:
+   page = grab_cache_page(mapping, index);
+   if (!page)
+   return ERR_PTR(-ENOMEM);
 
set_new_dnode(&dn, inode, ipage, NULL, 0);
err = f2fs_reserve_block(&dn, index);
-   if (err)
+   if (err) {
+   f2fs_put_page(page, 1);
return ERR_PTR(err);
-repeat:
-   page = grab_cache_page(mapping, index);
-   if (!page) {
-   err = -ENOMEM;
-   goto put_err;
}
+   if (!ipage)
+   f2fs_put_dnode(&dn);
 
if (PageUptodate(page))
-   return page;
+   goto got_it;
 
if (dn.data_blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_CACHE_SIZE);
@@ -1104,20 +1106,19 @@ repeat:
};
err = f2fs_submit_page_bio(&fio);
if (err)
-   goto put_err;
+   return ERR_PTR(err);
 
lock_page(page);
if (unlikely(!PageUptodate(page))) {
f2fs_put_page(page, 1);
-   err = -EIO;
-   goto put_err;
+   return ERR_PTR(-EIO);
}
if (unlikely(page->mapping != mapping)) {
f2fs_put_page(page, 1);
goto repeat;
}
}
-
+got_it:
if (new_i_size &&
i_size_read(inode) < ((index + 1) << PAGE_CACHE_SHIFT)) {
i_size_write(inode, ((index + 1) << PAGE_CACHE_SHIFT));
@@ -1125,10 +1126,6 @@ repeat:
set_inode_flag(F2FS_I(inode), FI_UPDATE_DIR);
}
return page;
-
-put_err:
-   f2fs_put_dnode(&dn);
-   return ERR_PTR(err);
 }
 
 static int __allocate_data_block(struct dnode_of_data *dn)
-- 
2.1.1



[PATCH 04/19] f2fs: add feature facility in superblock

2015-05-01 Thread Jaegeuk Kim
This patch introduces a feature field in the superblock, which will indicate
any new features of f2fs.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h  | 7 +++
 include/linux/f2fs_fs.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cd9748a..e1dd986 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -70,6 +70,13 @@ struct f2fs_mount_info {
unsigned intopt;
 };
 
+#define F2FS_HAS_FEATURE(sb, mask) \
+   ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
+#define F2FS_SET_FEATURE(sb, mask) \
+   F2FS_SB(sb)->raw_super->feature |= cpu_to_le32(mask)
+#define F2FS_CLEAR_FEATURE(sb, mask)   \
+   F2FS_SB(sb)->raw_super->feature &= ~cpu_to_le32(mask)
+
 #define CRCPOLY_LE 0xedb88320
 
 static inline __u32 f2fs_crc32(void *buf, size_t len)
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index 8d345c2..d44e97f 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -90,7 +90,8 @@ struct f2fs_super_block {
__le32 cp_payload;
__u8 version[VERSION_LEN];  /* the kernel version */
__u8 init_version[VERSION_LEN]; /* the initial kernel version */
-   __u8 reserved[892]; /* valid reserved region */
+   __le32 feature; /* defined features */
+   __u8 reserved[888]; /* valid reserved region */
 } __packed;
 
 /*
-- 
2.1.1



[PATCH 12/19] f2fs: introduce dot and dotdot name check

2015-05-01 Thread Jaegeuk Kim
This patch adds an inline function to check dot and dotdot names.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h | 11 +++
 fs/f2fs/hash.c |  3 +--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e99a404..b8f99fd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1454,6 +1454,17 @@ static inline void f2fs_stop_checkpoint(struct 
f2fs_sb_info *sbi)
sbi->sb->s_flags |= MS_RDONLY;
 }
 
+static inline bool is_dot_dotdot(const struct qstr *str)
+{
+   if (str->len == 1 && str->name[0] == '.')
+   return true;
+
+   if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
+   return true;
+
+   return false;
+}
+
 #define get_inode_mode(i) \
((is_inode_flag_set(F2FS_I(i), FI_ACL_MODE)) ? \
 (F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
index a844fcf..71b7206 100644
--- a/fs/f2fs/hash.c
+++ b/fs/f2fs/hash.c
@@ -79,8 +79,7 @@ f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info)
const unsigned char *name = name_info->name;
size_t len = name_info->len;
 
-   if ((len <= 2) && (name[0] == '.') &&
-   (name[1] == '.' || name[1] == '\0'))
+   if (is_dot_dotdot(name_info))
return 0;
 
/* Initialize the default seed for the hash checksum functions */
-- 
2.1.1



[PATCH 18/19] f2fs: introduce discard_map for f2fs_trim_fs

2015-05-01 Thread Jaegeuk Kim
This patch adds a bitmap for discard issues from f2fs_trim_fs.
The rule is to issue discard commands only for blocks invalidated
after mount.
Once mount is done, f2fs_trim_fs trims out the whole invalid area.
After that, it will not issue the discards redundantly.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/debug.c   |  2 +-
 fs/f2fs/f2fs.h| 21 +
 fs/f2fs/segment.c | 52 +++-
 fs/f2fs/segment.h |  1 +
 4 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index f5388f3..f50acbc 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -143,7 +143,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->base_mem += sizeof(struct sit_info);
si->base_mem += MAIN_SEGS(sbi) * sizeof(struct seg_entry);
si->base_mem += f2fs_bitmap_size(MAIN_SEGS(sbi));
-   si->base_mem += 2 * SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi);
+   si->base_mem += 3 * SIT_VBLOCK_MAP_SIZE * MAIN_SEGS(sbi);
si->base_mem += SIT_VBLOCK_MAP_SIZE;
if (sbi->segs_per_sec > 1)
si->base_mem += MAIN_SECS(sbi) * sizeof(struct sec_entry);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 78a4300..98fc719 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -117,6 +117,8 @@ enum {
 #define DEF_BATCHED_TRIM_SECTIONS  32
 #define BATCHED_TRIM_SEGMENTS(sbi) \
(SM_I(sbi)->trim_sections * (sbi)->segs_per_sec)
+#define BATCHED_TRIM_BLOCKS(sbi)   \
+   (BATCHED_TRIM_SEGMENTS(sbi) << (sbi)->log_blocks_per_seg)
 
 struct cp_control {
int reason;
@@ -698,6 +700,7 @@ struct f2fs_sb_info {
block_t user_block_count;   /* # of user blocks */
block_t total_valid_block_count;/* # of valid blocks */
block_t alloc_valid_block_count;/* # of allocated blocks */
+   block_t discard_blks;   /* discard command candidates */
block_t last_valid_block_count; /* for recovery */
u32 s_next_generation;  /* for NFS support */
atomic_t nr_pages[NR_COUNT_TYPE];   /* # of pages, see count_type */
@@ -1225,6 +1228,24 @@ static inline int f2fs_test_bit(unsigned int nr, char 
*addr)
return mask & *addr;
 }
 
+static inline void f2fs_set_bit(unsigned int nr, char *addr)
+{
+   int mask;
+
+   addr += (nr >> 3);
+   mask = 1 << (7 - (nr & 0x07));
+   *addr |= mask;
+}
+
+static inline void f2fs_clear_bit(unsigned int nr, char *addr)
+{
+   int mask;
+
+   addr += (nr >> 3);
+   mask = 1 << (7 - (nr & 0x07));
+   *addr &= ~mask;
+}
+
 static inline int f2fs_test_and_set_bit(unsigned int nr, char *addr)
 {
int mask;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index df8bce5..5a4ec01 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -483,10 +483,12 @@ void discard_next_dnode(struct f2fs_sb_info *sbi, block_t 
blkaddr)
 }
 
 static void __add_discard_entry(struct f2fs_sb_info *sbi,
-   struct cp_control *cpc, unsigned int start, unsigned int end)
+   struct cp_control *cpc, struct seg_entry *se,
+   unsigned int start, unsigned int end)
 {
struct list_head *head = &SM_I(sbi)->discard_list;
struct discard_entry *new, *last;
+   unsigned int i;
 
if (!list_empty(head)) {
last = list_last_entry(head, struct discard_entry, list);
@@ -504,6 +506,10 @@ static void __add_discard_entry(struct f2fs_sb_info *sbi,
list_add_tail(&new->list, head);
 done:
SM_I(sbi)->nr_discards += end - start;
+   for (i = start; i < end; i++) {
+   f2fs_set_bit(i, se->discard_map);
+   sbi->discard_blks--;
+   }
cpc->trimmed += end - start;
 }
 
@@ -514,6 +520,7 @@ static void add_discard_addrs(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
struct seg_entry *se = get_seg_entry(sbi, cpc->trim_start);
unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
+   unsigned long *discard_map = (unsigned long *)se->discard_map;
unsigned long *dmap = SIT_I(sbi)->tmp_map;
unsigned int start = 0, end = -1;
bool force = (cpc->reason == CP_DISCARD);
@@ -523,8 +530,11 @@ static void add_discard_addrs(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
SM_I(sbi)->nr_discards >= SM_I(sbi)->max_discards))
return;
 
-   if (force && !se->valid_blocks) {
+   if (!se->valid_blocks) {
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
+
+   if (!force)
+   return;
/*
 * if this segment is registered in the prefree list, then
 * we should skip adding a discard candidate, and let the
@@ -537,18 +547,14 @@ static void add_discard_addrs(struct f2fs_sb_info 

[PATCH 17/19] f2fs: remove spin_lock for write_orphan_inodes

2015-05-01 Thread Jaegeuk Kim
This patch removes the spin_lock, since this is covered by f2fs_lock_op already.
Also, we should avoid using page operations inside a spin_lock.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 6dbff2b..d076e7e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -514,7 +514,12 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, 
block_t start_blk)
grab_meta_page(sbi, start_blk + index);
 
index = 1;
-   spin_lock(&im->ino_lock);
+
+   /*
* we don't need to do spin_lock(&im->ino_lock) here, since all the
+* orphan inode operations are covered under f2fs_lock_op().
+* And, spin_lock should be avoided due to page operations below.
+*/
head = &im->ino_list;
 
/* loop for each orphan inode entry and write them in Jornal block */
@@ -554,8 +559,6 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, 
block_t start_blk)
set_page_dirty(page);
f2fs_put_page(page, 1);
}
-
-   spin_unlock(&im->ino_lock);
 }
 
 static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
-- 
2.1.1



[PATCH 03/10] parse_integer: convert sscanf()

2015-05-01 Thread Alexey Dobriyan
Remove base second guessing.

Too liberal acceptance in the %lu/%ld cases is fixed uniformly in the next patch.

Signed-off-by: Alexey Dobriyan 
---

 lib/vsprintf.c |   36 ++--
 1 file changed, 10 insertions(+), 26 deletions(-)

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2470,8 +2470,6 @@ EXPORT_SYMBOL_GPL(bprintf);
 int vsscanf(const char *buf, const char *fmt, va_list args)
 {
const char *str = buf;
-   char *next;
-   char digit;
int num = 0;
u8 qualifier;
unsigned int base;
@@ -2483,6 +2481,8 @@ int vsscanf(const char *buf, const char *fmt, va_list 
args)
bool is_sign;
 
while (*fmt) {
+   int len;
+
/* skip any white space in format */
/* white space in format matchs any amount of
 * white space, including none, in the input.
@@ -2611,35 +2611,22 @@ int vsscanf(const char *buf, const char *fmt, va_list 
args)
 */
str = skip_spaces(str);
 
-   digit = *str;
-   if (is_sign && digit == '-')
-   digit = *(str + 1);
-
-   if (!digit
-   || (base == 16 && !isxdigit(digit))
-   || (base == 10 && !isdigit(digit))
-   || (base == 8 && (!isdigit(digit) || digit > '7'))
-   || (base == 0 && !isdigit(digit)))
-   break;
-
if (is_sign)
-   val.s = qualifier != 'L' ?
-   simple_strtol(str, &next, base) :
-   simple_strtoll(str, &next, base);
+   len = parse_integer(str, base, &val.s);
else
-   val.u = qualifier != 'L' ?
-   simple_strtoul(str, &next, base) :
-   simple_strtoull(str, &next, base);
+   len = parse_integer(str, base, &val.u);
+   if (len < 0)
+   break;
 
-   if (field_width > 0 && next - str > field_width) {
+   if (field_width > 0) {
if (base == 0)
_parse_integer_fixup_radix(str, &base);
-   while (next - str > field_width) {
+   while (len > field_width) {
if (is_sign)
val.s = div_s64(val.s, base);
else
val.u = div_u64(val.u, base);
-   --next;
+   len--;
}
}
 
@@ -2680,10 +2667,7 @@ int vsscanf(const char *buf, const char *fmt, va_list 
args)
break;
}
num++;
-
-   if (!next)
-   break;
-   str = next;
+   str += len;
}
 
return num;


[PATCH 04/10] sscanf: fix overflow

2015-05-01 Thread Alexey Dobriyan
Fun fact:

uint8_t val;
sscanf("256", "%hhu", &val);

will return 1 (as it should), and make val=0 (as it should not).

Apart from correctness, the patch allows removing checks and switching
to proper types in several (most?) cases:

grep -e 'scanf.*%[0-9]\+[dioux]' -n -r .

Such checks can be incorrect too: checking for 3 digits with %3u
is not enough to validate a uint8_t.

Signed-off-by: Alexey Dobriyan 
---

 lib/vsprintf.c |   45 ++---
 1 file changed, 34 insertions(+), 11 deletions(-)

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2632,44 +2632,67 @@ int vsscanf(const char *buf, const char *fmt, va_list 
args)
 
switch (qualifier) {
case 'H':   /* that's 'hh' in format */
-   if (is_sign)
+   if (is_sign) {
+   if (val.s != (signed char)val.s)
+   goto out;
*va_arg(args, signed char *) = val.s;
-   else
+   } else {
+   if (val.u != (unsigned char)val.u)
+   goto out;
*va_arg(args, unsigned char *) = val.u;
+   }
break;
case 'h':
-   if (is_sign)
+   if (is_sign) {
+   if (val.s != (short)val.s)
+   goto out;
*va_arg(args, short *) = val.s;
-   else
+   } else {
+   if (val.u != (unsigned short)val.u)
+   goto out;
*va_arg(args, unsigned short *) = val.u;
+   }
break;
case 'l':
-   if (is_sign)
+   if (is_sign) {
+   if (val.s != (long)val.s)
+   goto out;
*va_arg(args, long *) = val.s;
-   else
+   } else {
+   if (val.u != (unsigned long)val.u)
+   goto out;
*va_arg(args, unsigned long *) = val.u;
+   }
break;
case 'L':
-   if (is_sign)
+   if (is_sign) {
*va_arg(args, long long *) = val.s;
-   else
+   } else {
*va_arg(args, unsigned long long *) = val.u;
+   }
break;
case 'Z':
case 'z':
+   if (val.u != (size_t)val.u)
+   goto out;
*va_arg(args, size_t *) = val.u;
break;
default:
-   if (is_sign)
+   if (is_sign) {
+   if (val.s != (int)val.s)
+   goto out;
*va_arg(args, int *) = val.s;
-   else
+   } else {
+   if (val.u != (unsigned int)val.u)
+   goto out;
*va_arg(args, unsigned int *) = val.u;
+   }
break;
}
num++;
str += len;
}
-
+out:
return num;
 }
 EXPORT_SYMBOL(vsscanf);


[PATCH 15/19] f2fs: fix counting the number of inline_data inodes

2015-05-01 Thread Jaegeuk Kim
This patch fixes the counting to include the missing symlink case.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/namei.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index c0ba8e3..90a9640 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -61,6 +61,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
umode_t mode)
if (f2fs_may_inline_dentry(inode))
set_inode_flag(F2FS_I(inode), FI_INLINE_DENTRY);
 
+   stat_inc_inline_inode(inode);
+   stat_inc_inline_dir(inode);
+
trace_f2fs_new_inode(inode, 0);
mark_inode_dirty(inode);
return inode;
@@ -136,7 +139,6 @@ static int f2fs_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
 
alloc_nid_done(sbi, ino);
 
-   stat_inc_inline_inode(inode);
d_instantiate(dentry, inode);
unlock_new_inode(inode);
 
@@ -384,7 +386,6 @@ static int f2fs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
goto out_fail;
f2fs_unlock_op(sbi);
 
-   stat_inc_inline_dir(inode);
alloc_nid_done(sbi, inode->i_ino);
 
d_instantiate(dentry, inode);
@@ -770,7 +771,6 @@ static int f2fs_tmpfile(struct inode *dir, struct dentry 
*dentry, umode_t mode)
 
alloc_nid_done(sbi, inode->i_ino);
 
-   stat_inc_inline_inode(inode);
d_tmpfile(dentry, inode);
unlock_new_inode(inode);
return 0;
-- 
2.1.1



[PATCH 11/19] f2fs: move get_page for gc victims

2015-05-01 Thread Jaegeuk Kim
This patch moves getting the victim page into move_data_page.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/gc.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 72667a5..1bd11f0 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -518,14 +518,13 @@ static int check_dnode(struct f2fs_sb_info *sbi, struct 
f2fs_summary *sum,
return 1;
 }
 
-static void move_data_page(struct inode *inode, struct page *page, int gc_type)
+static void move_data_page(struct inode *inode, block_t bidx, int gc_type)
 {
-   struct f2fs_io_info fio = {
-   .sbi = F2FS_I_SB(inode),
-   .type = DATA,
-   .rw = WRITE_SYNC,
-   .page = page,
-   };
+   struct page *page;
+
+   page = get_lock_data_page(inode, bidx);
+   if (IS_ERR(page))
+   return;
 
if (gc_type == BG_GC) {
if (PageWriteback(page))
@@ -533,6 +532,12 @@ static void move_data_page(struct inode *inode, struct 
page *page, int gc_type)
set_page_dirty(page);
set_cold_data(page);
} else {
+   struct f2fs_io_info fio = {
+   .sbi = F2FS_I_SB(inode),
+   .type = DATA,
+   .rw = WRITE_SYNC,
+   .page = page,
+   };
f2fs_wait_on_page_writeback(page, DATA);
 
if (clear_page_dirty_for_io(page))
@@ -618,12 +623,9 @@ next_step:
/* phase 3 */
inode = find_gc_inode(gc_list, dni.ino);
if (inode) {
-   start_bidx = start_bidx_of_node(nofs, F2FS_I(inode));
-   data_page = get_lock_data_page(inode,
-   start_bidx + ofs_in_node);
-   if (IS_ERR(data_page))
-   continue;
-   move_data_page(inode, data_page, gc_type);
+   start_bidx = start_bidx_of_node(nofs, F2FS_I(inode))
+   + ofs_in_node;
+   move_data_page(inode, start_bidx, gc_type);
stat_inc_data_blk_count(sbi, 1, gc_type);
}
}
-- 
2.1.1



[PATCH 14/19] f2fs: add need_dentry_mark

2015-05-01 Thread Jaegeuk Kim
This patch introduces need_dentry_mark() to clean up and avoid redundant
node locks.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/node.c | 35 +--
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b8f99fd..9e43ddc 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1565,6 +1565,7 @@ struct dnode_of_data;
 struct node_info;
 
 bool available_free_memory(struct f2fs_sb_info *, int);
+int need_dentry_mark(struct f2fs_sb_info *, nid_t);
 bool is_checkpointed_node(struct f2fs_sb_info *, nid_t);
 bool need_inode_block_update(struct f2fs_sb_info *, nid_t);
 void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 880d578..62982e6 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -195,32 +195,35 @@ static unsigned int __gang_lookup_nat_set(struct 
f2fs_nm_info *nm_i,
start, nr);
 }
 
-bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
+int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
-   bool is_cp = true;
+   bool need = false;
 
down_read(&nm_i->nat_tree_lock);
e = __lookup_nat_cache(nm_i, nid);
-   if (e && !get_nat_flag(e, IS_CHECKPOINTED))
-   is_cp = false;
+   if (e) {
+   if (!get_nat_flag(e, IS_CHECKPOINTED) &&
+   !get_nat_flag(e, HAS_FSYNCED_INODE))
+   need = true;
+   }
up_read(&nm_i->nat_tree_lock);
-   return is_cp;
+   return need;
 }
 
-static bool has_fsynced_inode(struct f2fs_sb_info *sbi, nid_t ino)
+bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
-   bool fsynced = false;
+   bool is_cp = true;
 
down_read(&nm_i->nat_tree_lock);
-   e = __lookup_nat_cache(nm_i, ino);
-   if (e && get_nat_flag(e, HAS_FSYNCED_INODE))
-   fsynced = true;
+   e = __lookup_nat_cache(nm_i, nid);
+   if (e && !get_nat_flag(e, IS_CHECKPOINTED))
+   is_cp = false;
up_read(&nm_i->nat_tree_lock);
-   return fsynced;
+   return is_cp;
 }
 
 bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino)
@@ -1206,13 +1209,9 @@ continue_unlock:
/* called by fsync() */
if (ino && IS_DNODE(page)) {
set_fsync_mark(page, 1);
-   if (IS_INODE(page)) {
-   if (!is_checkpointed_node(sbi, ino) &&
-   !has_fsynced_inode(sbi, ino))
-   set_dentry_mark(page, 1);
-   else
-   set_dentry_mark(page, 0);
-   }
+   if (IS_INODE(page))
+   set_dentry_mark(page,
+   need_dentry_mark(sbi, ino));
nwritten++;
} else {
set_fsync_mark(page, 0);
-- 
2.1.1



[PATCH 19/19] f2fs: issue discard with finally produced len and minlen

2015-05-01 Thread Jaegeuk Kim
This patch decides whether to issue discard commands by comparing the given
minlen with the length of the finally produced candidates.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  2 +-
 fs/f2fs/f2fs.h   |  2 +-
 fs/f2fs/segment.c| 10 ++
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d076e7e..1da20a6 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1043,7 +1043,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
if (unlikely(f2fs_cp_error(sbi)))
return;
 
-   clear_prefree_segments(sbi);
+   clear_prefree_segments(sbi, cpc);
clear_sbi_flag(sbi, SBI_IS_DIRTY);
 }
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 98fc719..0b8c454 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1628,7 +1628,7 @@ int create_flush_cmd_control(struct f2fs_sb_info *);
 void destroy_flush_cmd_control(struct f2fs_sb_info *);
 void invalidate_blocks(struct f2fs_sb_info *, block_t);
 void refresh_sit_entry(struct f2fs_sb_info *, block_t, block_t);
-void clear_prefree_segments(struct f2fs_sb_info *);
+void clear_prefree_segments(struct f2fs_sb_info *, struct cp_control *);
 void release_discard_addrs(struct f2fs_sb_info *);
 void discard_next_dnode(struct f2fs_sb_info *, block_t);
 int npages_for_summary_flush(struct f2fs_sb_info *, bool);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 5a4ec01..667bbf2 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,7 +597,7 @@ static void set_prefree_as_free_segments(struct 
f2fs_sb_info *sbi)
mutex_unlock(_i->seglist_lock);
 }
 
-void clear_prefree_segments(struct f2fs_sb_info *sbi)
+void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 {
struct list_head *head = &(SM_I(sbi)->discard_list);
struct discard_entry *entry, *this;
@@ -630,7 +630,10 @@ void clear_prefree_segments(struct f2fs_sb_info *sbi)
 
/* send small discards */
list_for_each_entry_safe(entry, this, head, list) {
+   if (cpc->reason == CP_DISCARD && entry->len < cpc->trim_minlen)
+   goto skip;
f2fs_issue_discard(sbi, entry->blkaddr, entry->len);
+skip:
list_del(&entry->list);
SM_I(sbi)->nr_discards -= entry->len;
kmem_cache_free(discard_entry_slab, entry);
@@ -1072,8 +1075,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
fstrim_range *range)
unsigned int start_segno, end_segno;
struct cp_control cpc;
 
-   if (range->minlen > SEGMENT_SIZE(sbi) || start >= MAX_BLKADDR(sbi) ||
-   range->len < sbi->blocksize)
+   if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
return -EINVAL;
 
cpc.trimmed = 0;
@@ -1085,7 +1087,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
fstrim_range *range)
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
GET_SEGNO(sbi, end);
cpc.reason = CP_DISCARD;
-   cpc.trim_minlen = F2FS_BYTES_TO_BLK(range->minlen);
+   cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
 
/* do checkpoint to issue discard commands safely */
for (; start_segno <= end_segno; start_segno = cpc.trim_end + 1) {
-- 
2.1.1



[PATCH 16/19] f2fs: split find_data_page according to specific purposes

2015-05-01 Thread Jaegeuk Kim
This patch splits find_data_page as follows.

1. f2fs_gc
 - use get_read_data_page() with read only

2. find_in_level
 - use find_data_page without locked page

3. truncate_partial_page
 - In the cache_only case, just drop the cached page.
 - Otherwise, use get_lock_data_page() and guarantee the truncation.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 127 ++---
 fs/f2fs/dir.c  |   2 +-
 fs/f2fs/f2fs.h |   3 +-
 fs/f2fs/file.c |  26 +++-
 fs/f2fs/gc.c   |   5 +--
 5 files changed, 68 insertions(+), 95 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ba30b4..3b76261 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -917,7 +917,7 @@ void f2fs_update_extent_cache(struct dnode_of_data *dn)
sync_inode_page(dn);
 }
 
-struct page *find_data_page(struct inode *inode, pgoff_t index, bool sync)
+struct page *get_read_data_page(struct inode *inode, pgoff_t index, int rw)
 {
struct address_space *mapping = inode->i_mapping;
struct dnode_of_data dn;
@@ -927,84 +927,9 @@ struct page *find_data_page(struct inode *inode, pgoff_t 
index, bool sync)
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
.type = DATA,
-   .rw = sync ? READ_SYNC : READA,
+   .rw = rw,
};
 
-   /*
-* If sync is false, it needs to check its block allocation.
-* This is need and triggered by two flows:
-*   gc and truncate_partial_data_page.
-*/
-   if (!sync)
-   goto search;
-
-   page = find_get_page(mapping, index);
-   if (page && PageUptodate(page))
-   return page;
-   f2fs_put_page(page, 0);
-search:
-   if (f2fs_lookup_extent_cache(inode, index, &ei)) {
-   dn.data_blkaddr = ei.blk + index - ei.fofs;
-   goto got_it;
-   }
-
-   set_new_dnode(&dn, inode, NULL, NULL, 0);
-   err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
-   if (err)
-   return ERR_PTR(err);
-   f2fs_put_dnode(&dn);
-
-   if (dn.data_blkaddr == NULL_ADDR)
-   return ERR_PTR(-ENOENT);
-
-   /* By fallocate(), there is no cached page, but with NEW_ADDR */
-   if (unlikely(dn.data_blkaddr == NEW_ADDR))
-   return ERR_PTR(-EINVAL);
-
-got_it:
-   page = grab_cache_page(mapping, index);
-   if (!page)
-   return ERR_PTR(-ENOMEM);
-
-   if (PageUptodate(page)) {
-   unlock_page(page);
-   return page;
-   }
-
-   fio.blk_addr = dn.data_blkaddr;
-   fio.page = page;
-   err = f2fs_submit_page_bio(&fio);
-   if (err)
-   return ERR_PTR(err);
-
-   if (sync) {
-   wait_on_page_locked(page);
-   if (unlikely(!PageUptodate(page))) {
-   f2fs_put_page(page, 0);
-   return ERR_PTR(-EIO);
-   }
-   }
-   return page;
-}
-
-/*
- * If it tries to access a hole, return an error.
- * Because, the callers, functions in dir.c and GC, should be able to know
- * whether this page exists or not.
- */
-struct page *get_lock_data_page(struct inode *inode, pgoff_t index)
-{
-   struct address_space *mapping = inode->i_mapping;
-   struct dnode_of_data dn;
-   struct page *page;
-   struct extent_info ei;
-   int err;
-   struct f2fs_io_info fio = {
-   .sbi = F2FS_I_SB(inode),
-   .type = DATA,
-   .rw = READ_SYNC,
-   };
-repeat:
page = grab_cache_page(mapping, index);
if (!page)
return ERR_PTR(-ENOMEM);
@@ -1026,10 +951,11 @@ repeat:
f2fs_put_page(page, 1);
return ERR_PTR(-ENOENT);
}
-
 got_it:
-   if (PageUptodate(page))
+   if (PageUptodate(page)) {
+   unlock_page(page);
return page;
+   }
 
/*
 * A new dentry page is allocated but not able to be written, since its
@@ -1040,6 +966,7 @@ got_it:
if (dn.data_blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page);
+   unlock_page(page);
return page;
}
 
@@ -1048,7 +975,49 @@ got_it:
	err = f2fs_submit_page_bio(&fio);
if (err)
return ERR_PTR(err);
+   return page;
+}
+
+struct page *find_data_page(struct inode *inode, pgoff_t index)
+{
+   struct address_space *mapping = inode->i_mapping;
+   struct page *page;
+
+   page = find_get_page(mapping, index);
+   if (page && PageUptodate(page))
+   return page;
+   f2fs_put_page(page, 0);
+
+   page = get_read_data_page(inode, index, READ_SYNC);
+   if (IS_ERR(page))
+   return page;
+
+   if (PageUptodate(page))
+   return page;
+
+   wait_on_page_locked(page);
+   if 

[PATCH 07/19] f2fs: expose f2fs_mpage_readpages

2015-05-01 Thread Jaegeuk Kim
This patch implements f2fs_mpage_readpages for further optimization on
encryption support.

The basic code was taken from fs/mpage.c and simplified by relying on the fact
that block_size is equal to page_size in f2fs.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 157 +++--
 1 file changed, 154 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index aa3c079..2a3a9cd 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/cleancache.h>
 
 #include "f2fs.h"
 #include "node.h"
@@ -47,6 +48,30 @@ static void f2fs_read_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
+/*
+ * I/O completion handler for multipage BIOs.
+ * copied from fs/mpage.c
+ */
+static void mpage_end_io(struct bio *bio, int err)
+{
+   struct bio_vec *bv;
+   int i;
+
+   bio_for_each_segment_all(bv, bio, i) {
+   struct page *page = bv->bv_page;
+
+   if (!err) {
+   SetPageUptodate(page);
+   } else {
+   ClearPageUptodate(page);
+   SetPageError(page);
+   }
+   unlock_page(page);
+   }
+
+   bio_put(bio);
+}
+
 static void f2fs_write_end_io(struct bio *bio, int err)
 {
struct f2fs_sb_info *sbi = bio->bi_private;
@@ -1349,6 +1374,133 @@ int f2fs_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
start, len, get_data_block_fiemap);
 }
 
+/*
+ * This function was originally taken from fs/mpage.c, and customized for f2fs.
+ * Major change was from block_size == page_size in f2fs by default.
+ */
+static int f2fs_mpage_readpages(struct address_space *mapping,
+   struct list_head *pages, struct page *page,
+   unsigned nr_pages)
+{
+   struct bio *bio = NULL;
+   unsigned page_idx;
+   sector_t last_block_in_bio = 0;
+   struct inode *inode = mapping->host;
+   const unsigned blkbits = inode->i_blkbits;
+   const unsigned blocksize = 1 << blkbits;
+   sector_t block_in_file;
+   sector_t last_block;
+   sector_t last_block_in_file;
+   sector_t block_nr;
+   struct block_device *bdev = inode->i_sb->s_bdev;
+   struct f2fs_map_blocks map;
+
+   map.m_pblk = 0;
+   map.m_lblk = 0;
+   map.m_len = 0;
+   map.m_flags = 0;
+
+   for (page_idx = 0; nr_pages; page_idx++, nr_pages--) {
+
+   prefetchw(&page->flags);
+   if (pages) {
+   page = list_entry(pages->prev, struct page, lru);
+   list_del(&page->lru);
+   if (add_to_page_cache_lru(page, mapping,
+ page->index, GFP_KERNEL))
+   goto next_page;
+   }
+
+   block_in_file = (sector_t)page->index;
+   last_block = block_in_file + nr_pages;
+   last_block_in_file = (i_size_read(inode) + blocksize - 1) >>
+   blkbits;
+   if (last_block > last_block_in_file)
+   last_block = last_block_in_file;
+
+   /*
+* Map blocks using the previous result first.
+*/
+   if ((map.m_flags & F2FS_MAP_MAPPED) &&
+   block_in_file > map.m_lblk &&
+   block_in_file < (map.m_lblk + map.m_len))
+   goto got_it;
+
+   /*
+* Then do more f2fs_map_blocks() calls until we are
+* done with this page.
+*/
+   map.m_flags = 0;
+
+   if (block_in_file < last_block) {
+   map.m_lblk = block_in_file;
+   map.m_len = last_block - block_in_file;
+
+   if (f2fs_map_blocks(inode, &map, 0, false))
+   goto set_error_page;
+   }
+got_it:
+   if ((map.m_flags & F2FS_MAP_MAPPED)) {
+   block_nr = map.m_pblk + block_in_file - map.m_lblk;
+   SetPageMappedToDisk(page);
+
+   if (!PageUptodate(page) && !cleancache_get_page(page)) {
+   SetPageUptodate(page);
+   goto confused;
+   }
+   } else {
+   zero_user_segment(page, 0, PAGE_CACHE_SIZE);
+   SetPageUptodate(page);
+   unlock_page(page);
+   goto next_page;
+   }
+
+   /*
+* This page will go to BIO.  Do we need to send this
+* BIO off first?
+*/
+   if (bio && (last_block_in_bio != block_nr - 1)) {
+submit_and_realloc:
+   

[PATCH 06/19] f2fs: introduce f2fs_commit_super

2015-05-01 Thread Jaegeuk Kim
This patch introduces f2fs_commit_super to write updated superblock.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h  |  1 +
 fs/f2fs/super.c | 24 
 2 files changed, 25 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c98454d..7ff3ac7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1535,6 +1535,7 @@ static inline int f2fs_add_link(struct dentry *dentry, 
struct inode *inode)
 /*
  * super.c
  */
+int f2fs_commit_super(struct f2fs_sb_info *);
 int f2fs_sync_fs(struct super_block *, int);
 extern __printf(3, 4)
 void f2fs_msg(struct super_block *, const char *, const char *, ...);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b2dd1b0..8584168 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -966,6 +966,30 @@ retry:
return 0;
 }
 
+int f2fs_commit_super(struct f2fs_sb_info *sbi)
+{
+   struct buffer_head *sbh = sbi->raw_super_buf;
+   sector_t block = sbh->b_blocknr;
+   int err;
+
+   /* write back-up superblock first */
+   sbh->b_blocknr = block ? 0 : 1;
+   mark_buffer_dirty(sbh);
+   err = sync_dirty_buffer(sbh);
+
+   sbh->b_blocknr = block;
+   if (err)
+   goto out;
+
+   /* write current valid superblock */
+   mark_buffer_dirty(sbh);
+   err = sync_dirty_buffer(sbh);
+out:
+   clear_buffer_write_io_error(sbh);
+   set_buffer_uptodate(sbh);
+   return err;
+}
+
 static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 {
struct f2fs_sb_info *sbi;
-- 
2.1.1

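The commit order implemented above, back-up copy first and the live copy only afterwards, is what keeps at least one superblock valid at any point during the update. A minimal userspace sketch of that ordering (hypothetical names; plain memory stands in for the buffer heads, and sync_to() stands in for mark_buffer_dirty() plus sync_dirty_buffer()):

```c
#include <string.h>

struct sb_model {
	char loc[2][16];	/* the two on-disk superblock slots */
};

/* stand-in for mark_buffer_dirty() + sync_dirty_buffer() */
static int sync_to(struct sb_model *m, int slot, const char *data)
{
	strncpy(m->loc[slot], data, sizeof(m->loc[slot]) - 1);
	m->loc[slot][sizeof(m->loc[slot]) - 1] = '\0';
	return 0;
}

/*
 * Write the back-up slot first; if that fails, the currently valid
 * copy has not been touched yet.  Only then overwrite the live slot.
 */
static int commit_super_model(struct sb_model *m, int live, const char *data)
{
	int backup = live ? 0 : 1;
	int err;

	err = sync_to(m, backup, data);
	if (err)
		return err;
	return sync_to(m, live, data);
}
```

If the first sync fails, the function returns with the live slot untouched, which mirrors the early `goto out` path in the patch.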


[PATCH 02/19] f2fs: add missing version info in superblock

2015-05-01 Thread Jaegeuk Kim
mkfs.f2fs records the kernel version in the superblock, but the f2fs module has
not been adding it so far.

Signed-off-by: Jaegeuk Kim 
---
 include/linux/f2fs_fs.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index 591f8c3..8d345c2 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -50,6 +50,8 @@
 #define MAX_ACTIVE_NODE_LOGS   8
 #define MAX_ACTIVE_DATA_LOGS   8
 
+#define VERSION_LEN256
+
 /*
  * For superblock
  */
@@ -86,6 +88,9 @@ struct f2fs_super_block {
__le32 extension_count; /* # of extensions below */
__u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */
__le32 cp_payload;
+   __u8 version[VERSION_LEN];  /* the kernel version */
+   __u8 init_version[VERSION_LEN]; /* the initial kernel version */
+   __u8 reserved[892]; /* valid reserved region */
 } __packed;
 
 /*
-- 
2.1.1



[PATCH 01/19] f2fs: fix not to check IS_ERR for null pointer

2015-05-01 Thread Jaegeuk Kim
The acl can be a null pointer, an error pointer, or a valid pointer.
So, we should check for a null pointer too.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/acl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index c8f25f7..851ff98 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -190,7 +190,7 @@ static struct posix_acl *__f2fs_get_acl(struct inode 
*inode, int type,
acl = ERR_PTR(retval);
kfree(value);
 
-   if (!IS_ERR(acl))
+   if (!IS_ERR_OR_NULL(acl))
set_cached_acl(inode, type, acl);
 
return acl;
-- 
2.1.1



[PATCH 02/10] parse_integer: rewrite kstrto*()

2015-05-01 Thread Alexey Dobriyan
Rewrite kstrto*() functions through parse_integer().

_kstrtoul() and _kstrtol() are removed because parse_integer()
can dispatch based on sizeof(long), saving a function call.

Also move function definitions and comment one instance.
Remove redundant boilerplate comments from elsewhere.

High bit base hack suggested by Andrew M.

Signed-off-by: Alexey Dobriyan 
---

 include/linux/kernel.h|  124 ---
 include/linux/parse-integer.h |  111 +
 lib/kstrtox.c |  222 --
 lib/parse-integer.c   |   38 ++-
 4 files changed, 145 insertions(+), 350 deletions(-)

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -252,130 +252,6 @@ void do_exit(long error_code)
 void complete_and_exit(struct completion *, long)
__noreturn;
 
-/* Internal, do not use. */
-int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long 
*res);
-int __must_check _kstrtol(const char *s, unsigned int base, long *res);
-
-int __must_check kstrtoull(const char *s, unsigned int base, unsigned long 
long *res);
-int __must_check kstrtoll(const char *s, unsigned int base, long long *res);
-
-/**
- * kstrtoul - convert a string to an unsigned long
- * @s: The start of the string. The string must be null-terminated, and may 
also
- *  include a single newline before its terminating null. The first character
- *  may also be a plus sign, but not a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- *  given as 0, then the base of the string is automatically detected with the
- *  conventional semantics - If it begins with 0x the number will be parsed as 
a
- *  hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- *  parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Used as a replacement for the obsolete simple_strtoull. Return code must
- * be checked.
-*/
-static inline int __must_check kstrtoul(const char *s, unsigned int base, 
unsigned long *res)
-{
-   /*
-* We want to shortcut function call, but
-* __builtin_types_compatible_p(unsigned long, unsigned long long) = 0.
-*/
-   if (sizeof(unsigned long) == sizeof(unsigned long long) &&
-   __alignof__(unsigned long) == __alignof__(unsigned long long))
-   return kstrtoull(s, base, (unsigned long long *)res);
-   else
-   return _kstrtoul(s, base, res);
-}
-
-/**
- * kstrtol - convert a string to a long
- * @s: The start of the string. The string must be null-terminated, and may 
also
- *  include a single newline before its terminating null. The first character
- *  may also be a plus sign or a minus sign.
- * @base: The number base to use. The maximum supported base is 16. If base is
- *  given as 0, then the base of the string is automatically detected with the
- *  conventional semantics - If it begins with 0x the number will be parsed as 
a
- *  hexadecimal (case insensitive), if it otherwise begins with 0, it will be
- *  parsed as an octal number. Otherwise it will be parsed as a decimal.
- * @res: Where to write the result of the conversion on success.
- *
- * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
- * Used as a replacement for the obsolete simple_strtoull. Return code must
- * be checked.
- */
-static inline int __must_check kstrtol(const char *s, unsigned int base, long 
*res)
-{
-   /*
-* We want to shortcut function call, but
-* __builtin_types_compatible_p(long, long long) = 0.
-*/
-   if (sizeof(long) == sizeof(long long) &&
-   __alignof__(long) == __alignof__(long long))
-   return kstrtoll(s, base, (long long *)res);
-   else
-   return _kstrtol(s, base, res);
-}
-
-int __must_check kstrtouint(const char *s, unsigned int base, unsigned int 
*res);
-int __must_check kstrtoint(const char *s, unsigned int base, int *res);
-
-static inline int __must_check kstrtou64(const char *s, unsigned int base, u64 
*res)
-{
-   return kstrtoull(s, base, res);
-}
-
-static inline int __must_check kstrtos64(const char *s, unsigned int base, s64 
*res)
-{
-   return kstrtoll(s, base, res);
-}
-
-static inline int __must_check kstrtou32(const char *s, unsigned int base, u32 
*res)
-{
-   return kstrtouint(s, base, res);
-}
-
-static inline int __must_check kstrtos32(const char *s, unsigned int base, s32 
*res)
-{
-   return kstrtoint(s, base, res);
-}
-
-int __must_check kstrtou16(const char *s, unsigned int base, u16 *res);
-int __must_check kstrtos16(const char *s, unsigned int base, s16 *res);
-int __must_check kstrtou8(const char *s, unsigned int base, u8 *res);
-int __must_check kstrtos8(const char *s, unsigned int base, s8 *res);
-
-int 

[PATCH 01/10] Add parse_integer() (replacement for simple_strto*())

2015-05-01 Thread Alexey Dobriyan
kstrto*() and kstrto*_from_user() family of functions were added
to help with parsing one integer written as a string to proc/sysfs/debugfs
files. But they have a limitation: the string passed must end with \0 or \n\0.
There are enough places where kstrto*() functions can't be used because of
this limitation. Trivial example: major:minor "%u:%u".

Currently the only way to parse everything is simple_strto*() functions.
But they are suboptimal:
* they do not detect overflow (can be fixed, but no one bothered since 
~0.99.11),
* there are only 4 of them -- long and "long long" versions,
  This leads to silent truncation in the simplest case:

val = strtoul(s, NULL, 0);

* half of the people think that the "char **endp" argument is necessary and
  add an unnecessary variable.

OpenBSD people, fed up with how complex correct integer parsing is, added
strtonum(3) to fixup for deficiencies of libc-style integer parsing:
http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/strtonum.3?query=strtonum=i386

It'd be OK to copy that, but it relies on "errno" and fixed strings as the
error-reporting channel, which I think is not OK for the kernel.
strtonum() also doesn't report the number of characters consumed.

What to do?

Enter parse_integer().

int parse_integer(const char *s, unsigned int base, T *val);

Rationale:

* parse_integer() is exactly 1 (one) interface, not 4 or many,
  one for each type.

* parse_integer() reports -E errors reliably in a canonical kernel way:

rv = parse_integer(str, 10, );
if (rv < 0)
return rv;

* parse_integer() writes the result only if there were no errors; at least
  one digit has to be consumed,

* parse_integer() doesn't mix the error-reporting channel and the value channel;
  it does mix the error and number-of-characters-consumed channels, though.

* parse_integer() reports number of characters consumed, makes parsing
  multiple values easy:

rv = parse_integer(str, 0, );
if (rv < 0)
return rv;
str += rv;
if (*str++ != ':')
return -EINVAL;
rv = parse_integer(str, 0, );
if (rv < 0)
return rv;
if (str[rv] != '\0')
return -EINVAL;
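
As a rough illustration of the contract described above, here is a userspace model (parse_integer_model() is a hypothetical name, not the kernel implementation): it returns the number of characters consumed or a negative error, and writes the result only on success:

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>

/*
 * Userspace model of the proposed semantics for T == int:
 * return the number of characters consumed or -E, and write *val
 * only when at least one digit was consumed with no overflow.
 */
static int parse_integer_model(const char *s, unsigned int base, int *val)
{
	const char *p = s;
	long long acc = 0;
	int neg = 0, digits = 0;

	if (*p == '+' || *p == '-')
		neg = (*p++ == '-');
	if (base == 0) {	/* radix autodetection, after the sign */
		if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
			base = 16;
			p += 2;
		} else if (p[0] == '0') {
			base = 8;
		} else {
			base = 10;
		}
	}
	for (; *p; p++) {
		int d;

		if (isdigit((unsigned char)*p))
			d = *p - '0';
		else if (isalpha((unsigned char)*p))
			d = tolower((unsigned char)*p) - 'a' + 10;
		else
			break;
		if (d >= (int)base)
			break;
		acc = acc * base + d;
		if (acc > (long long)INT_MAX + 1)
			return -ERANGE;
		digits++;
	}
	if (!digits)
		return -EINVAL;
	if (neg)
		acc = -acc;
	if (acc < INT_MIN || acc > INT_MAX)
		return -ERANGE;
	*val = (int)acc;
	return (int)(p - s);
}
```

With this shape, parsing a "%u:%u"-style string works exactly as in the example above: call once, advance by the return value, expect ':', call again.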

There are several deficiencies in parse_integer() compared to strto*():
* can't be used in initializers:

const T x = strtoul();

* can't be used with bitfields,
* can't be used in expressions:

x = strtol() * 1024;
x = y = strtol() * 1024;
x += strtol();

* currently there is no support for _Bool, and there is at least one place
  where simple_strtoul() is directly assigned to a _Bool variable.
  It is trivial to add, but not clear if it should only accept "0" and "1",
  because, say, module param code accepts 0, 1, y, n, Y and N
  (which I personally think is stupid).

In defense of parse_integer(), all I can say is that using strtol() in an
expression or initializer promotes no error checking and thus probably
should not be encouraged in C, a language with no built-in error checking
anyway.

The amount of "x = y = strtol()" expressions in kernel is very small.
The amount of direct usage in expression is not small, but can be
counted as an acceptable loss.

Signed-off-by: Alexey Dobriyan 
---

 include/linux/kernel.h|7 +
 include/linux/parse-integer.h |   79 
 lib/Makefile  |1 
 lib/kstrtox.c |   76 ---
 lib/kstrtox.h |1 
 lib/parse-integer.c   |  203 ++
 6 files changed, 310 insertions(+), 57 deletions(-)

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include <linux/parse-integer.h>
 #include 
 #include 
 #include 
@@ -387,8 +388,10 @@ static inline int __must_check kstrtos32_from_user(const 
char __user *s, size_t
return kstrtoint_from_user(s, count, base, res);
 }
 
-/* Obsolete, do not use.  Use kstrto instead */
-
+/*
+ * Obsolete, do not use.
+ * Use parse_integer(), kstrto*(), kstrto*_from_user(), sscanf().
+ */
 extern unsigned long simple_strtoul(const char *,char **,unsigned int);
 extern long simple_strtol(const char *,char **,unsigned int);
 extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
--- /dev/null
+++ b/include/linux/parse-integer.h
@@ -0,0 +1,79 @@
+#ifndef _PARSE_INTEGER_H
+#define _PARSE_INTEGER_H
+#include 
+#include 
+
+/*
+ * int parse_integer(const char *s, unsigned int base, T *val);
+ *
+ * Convert integer string representation to an integer.
+ * Range of accepted values equals to that of type T.
+ *
+ * Conversion to unsigned integer accepts sign "+".
+ * Conversion to signed integer accepts sign "+" and sign "-".
+ *
+ * Radix 0 means autodetection: leading "0x" implies radix 16,
+ * leading "0" implies radix 8, otherwise radix is 10.
+ * Autodetection hint works after optional sign, but not before.
+ *
+ * Return number of characters parsed or -E.
+ *
+ * 

Re: [PATCH 0/1] speeding up cpu_up()

2015-05-01 Thread Aravind Gopalakrishnan


On 5/1/15 5:47 PM, Borislav Petkov wrote:

On Fri, May 01, 2015 at 02:42:39PM -0700, Len Brown wrote:

On Mon, Apr 20, 2015 at 12:15 AM, Ingo Molnar  wrote:


So instead of playing games with an ancient delay, I'd suggest we
install the 10 msec INIT assertion wait as a platform quirk instead,
and activate it for all CPUs/systems that we think might need it, with
a sufficiently robust and future-proof quirk cutoff condition.

New systems won't have the quirk active and thus won't have to have
this delay configurable either.

Okay, at this time, I think the quirk would apply to:

1. Intel family 5 (original pentium) -- some may actually need the quirk
2. Intel family F (pentium4) -- mostly b/c I don't want to bother
finding/testing p4
3. All AMD (happy to narrow down, if somebody can speak for AMD)

Aravind and I could probably test on a couple of AMD boxes to narrow down.

@Aravind, see here:

https://lkml.kernel.org/r/87d69aab88c14d65ae1e7be55050d1b689b59b4b.1429402494.git.len.br...@intel.com

You could ask around whether a timeout is needed between the assertion
and deassertion of INIT done by the BSP when booting other cores.


Sure, I'll ask around and try mdelay(0) on some systems as well.
I can gather Fam15h, Fam16h but don't have K8's or older.

Will let you know how it goes.

-Aravind.


If not, we probably should convert, at least modern AMD machines, to the
no-delay default.





Re: [PATCH 3/3] ipc/mqueue: lockless pipelined wakeups

2015-05-01 Thread Davidlohr Bueso
On Fri, 2015-05-01 at 17:52 -0400, George Spelvin wrote:
> In general, Acked-by, but you're making me fix all your comments. :-)
> 
> This is a nice use of the wake queue, since the code was already handling
> the same problem in a similar way with STATE_PENDING.
> 
> >  * The receiver accepts the message and returns without grabbing the queue
> >+ * spinlock. The used algorithm is different from sysv semaphores 
> >(ipc/sem.c):
> 
> Is that last sentence even wanted?

Yeah, we can probably remove it now.

> >+ *
> >+ * - Set pointer to message.
> >+ * - Queue the receiver task's for later wakeup (without the info->lock).
> 
> It's "task" singular, and the apostrophe would be wrong if it were plural.
> 
> >+ * - Update its state to STATE_READY. Now the receiver can continue.
> >+ * - Wake up the process after the lock is dropped. Should the process wake 
> >up
> >+ *   before this wakeup (due to a timeout or a signal) it will either see
> >+ *   STATE_READY and continue or acquire the lock to check the sate again.
> 
> "check the sTate again".
> 
> >+wake_q_add(wake_q, receiver->task);
> >+/*
> >+ * Rely on the implicit cmpxchg barrier from wake_q_add such
> >+ * that we can ensure that updating receiver->state is the last
> >+ * write operation: As once set, the receiver can continue,
> >+ * and if we don't have the reference count from the wake_q,
> >+ * yet, at that point we can later have a use-after-free
> >+ * condition and bogus wakeup.
> >+ */
> > receiver->state = STATE_READY;
> 
> How about:
>   /*
>* There must be a write barrier here; setting STATE_READY
>* lets the receiver proceed without further synchronization.
>* The cmpxchg inside wake_q_add serves as the barrier here.
>*/
> 
> The need for a wake queue to take a reference to avoid use-after-free
> is generic to wake queues, and handled in generic code; I don't see why
> it needs a comment here.

You are not wrong, but I'd rather leave the comment as is, as it will
vary from user to user. The comments in the sched wake_q bits are
already pretty clear, and if users cannot see the need for holding
reference and the task disappearing on their own they have no business
using wake_q. Furthermore, I think my comment serves better in mqueues
as the need for it isn't immediately obvious.
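
The ordering requirement being discussed, that the STATE_READY store must be the last write and everything before it must be visible to the woken receiver, is in userspace C11 terms a release store paired with an acquire load. A hedged sketch with hypothetical types (wake_q_add() and the actual task wakeup are elided):

```c
#include <stdatomic.h>
#include <stddef.h>

enum { STATE_NONE, STATE_READY };

struct receiver_model {
	void *msg;
	_Atomic int state;
};

/*
 * Sender side: set the message pointer, then publish STATE_READY as
 * the last write.  The release store plays the role of the implicit
 * barrier provided by wake_q_add() in the kernel code.
 */
static void publish(struct receiver_model *r, void *msg)
{
	r->msg = msg;
	atomic_store_explicit(&r->state, STATE_READY, memory_order_release);
}

/* Receiver side: once STATE_READY is observed, the message is valid. */
static void *consume(struct receiver_model *r)
{
	while (atomic_load_explicit(&r->state, memory_order_acquire)
	       != STATE_READY)
		;	/* in the kernel, the task would sleep here */
	return r->msg;
}
```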

 
> >@@ -1084,6 +1094,7 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char 
> >__user *, u_msg_ptr,
> > ktime_t expires, *timeout = NULL;
> > struct timespec ts;
> > struct posix_msg_tree_node *new_leaf = NULL;
> >+WAKE_Q(wake_q);
> > 
> > if (u_abs_timeout) {
> > int res = prepare_timeout(u_abs_timeout, &expires, &ts);
> >@@ -1155,8 +1166,9 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char 
> >__user *, u_msg_ptr,
> > CURRENT_TIME;
> > 
> > /* There is now free space in queue. */
> >-pipelined_receive(info);
> >+pipelined_receive(&wake_q, info);
> > spin_unlock(&info->lock);
> >+wake_up_q(&wake_q);
> > ret = 0;
> > }
> > if (ret == 0) {
> 
> Since WAKE_Q actually involves some initialization, would it make sense to
> move its declaration to inside the condition that needs it?
> 
> (I'm also a fan of declaring variables in the smallest scope possible,
> just on general principles.)

Agreed.

Thanks,
Davidlohr




RE: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader

2015-05-01 Thread Zheng, Lv
Hi,

> From: Borislav Petkov [mailto:b...@alien8.de]
> Sent: Thursday, April 30, 2015 4:49 PM
> 
> On Thu, Apr 30, 2015 at 08:05:12AM +, Zheng, Lv wrote:
> > Are there any such data around the SC and LL (MIPS)?
> 
> What, you can't search the internet yourself?

I mean: if LL can do exactly what you did with the "if", and no ABA problem can
break the atomic_add_unless() users, then no atomic_cmpxchg() users should be
broken for the same reason.
Why don't you put an "if" in atomic_cmpxchg() to let the other users have the
same benefit, given that data?

But this is out of the context for this patch.
I actually don't have any API usage preference now.

Thanks
-Lv

> 
> --
> Regards/Gruss,
> Boris.
> 
> ECO tip #101: Trim your mails when you reply.
> --

Re: [PATCH 02/11] drivers/clk: include <linux/module.h> for clk-max77xxx modular code

2015-05-01 Thread Paul Gortmaker
[Re: [PATCH 02/11] drivers/clk: include <linux/module.h> for clk-max77xxx modular code] On 01/05/2015 (Fri 14:37) Stephen Boyd wrote:

> On 04/30/15 18:47, Paul Gortmaker wrote:
> > These files are built off of the tristate COMMON_CLK_MAX77686 and
> > COMMON_CLK_MAX77802 respectively.  They also contains modular function
> > calls so they should explicitly include module.h to avoid compile
> > breakage during header shuffles done in the future.
> >
> > Cc: Mike Turquette 
> > Cc: Stephen Boyd 
> > Signed-off-by: Paul Gortmaker 
> 
> I assume you're taking this through some larger series?

Yep, there will be several series fed to Linus once I'm done here.

It is basically a refactoring of this series into more manageable chunks:

https://marc.info/?l=linux-kernel=139033951228828

Thanks,
Paul.
--

> 
> Acked-by: Stephen Boyd 
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


kernel >= 4.0: crashes when using traceroute6 with isatap

2015-05-01 Thread Wolfgang Walter
Hello,

kernel 4.0 (and 4.0.1) crashes immediately when I use traceroute6 with an 
isatap-tunnel.

I took an image of the message I got. It is not complete, as my vt does not
have enough lines.

3.19.3 works fine.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

Re: [dm-devel] Regression: Disk corruption with dm-crypt and kernels >= 4.0

2015-05-01 Thread Abelardo Ricart III
On Fri, 2015-05-01 at 22:47 +0100, Alasdair G Kergon wrote:
> On Fri, May 01, 2015 at 12:37:07AM -0400, Abelardo Ricart III wrote:
> > # first bad commit: [cf2f1abfbd0dba701f7f16ef619e4d2485de3366] dm crypt: 
> > don't
> > allocate pages for a partial request
>  
> That's not a particularly good commit to identify.
> 
> If you didn't already, can you confirm whether or not the code works at the
> patch immediately following?
> 
>   7145c241a1bf2841952c3e297c4080b357b3e52d
> 
> Alasdair
> 
Just built that revision and it failed almost immediately with more ata errors. 
It also corrupted my testing log.

As an aside, here's my fstab in case it's of any use

>8
/dev/mapper/root/ f2fs  
rw,relatime,flush_merge,background_gc=on,user_xattr,acl,active_logs=6   0 0

/dev/mapper/home/home   f2fs
rw,relatime,flush_merge,background_gc=on,user_xattr,acl,active_logs=6   0 2

/dev/sda2   /boot   ext4
rw,relatime,data=ordered0 2

/dev/sda1   /boot/EFI   vfat
rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro
0 2

tmpfs   /scratchtmpfs   nodev,nosuid,size=12G   
0 0
>8


Re: [PATCH] ktime: Fix ktime_divns to do signed division

2015-05-01 Thread Nicolas Pitre
On Fri, 1 May 2015, John Stultz wrote:

>  static inline u64 ktime_divns(const ktime_t kt, s64 div)
>  {
>   if (__builtin_constant_p(div) && !(div >> 32)) {
> - u64 ns = kt.tv64;
> + s64 ns = kt.tv64;
> + int neg = 0;
> +
> + if (ns < 0) {
> + neg = 1;
> + ns = -ns;
> + }

Minor comment: you could save some realestate with:

s64 ns = kt.tv64;
bool neg = (ns < 0);

if (neg)
ns = -ns;


Nicolas
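
The sign handling in the fix can be exercised in isolation; divns_model() below is a hypothetical userspace stand-in for the fast path, dividing the magnitude and restoring the sign exactly as the patch does:

```c
#include <stdint.h>

/*
 * Model of the signed fast path: take the magnitude, divide as
 * unsigned, then restore the sign, so negative values round toward
 * zero instead of being mangled by an unsigned divide.
 */
static int64_t divns_model(int64_t ns, int64_t div)
{
	int neg = (ns < 0);

	if (neg)
		ns = -ns;
	ns = (int64_t)((uint64_t)ns / (uint64_t)div);
	return neg ? -ns : ns;
}
```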


Re: [PATCH 0/13] Parallel struct page initialisation v4

2015-05-01 Thread Waiman Long

On 05/01/2015 06:02 PM, Waiman Long wrote:


Bad news!

I tried your patch on a 24-TB DragonHawk and got an out of memory 
panic. The kernel log messages were:

  :
[   80.126186] CPU  474: hi:  186, btch:  31 usd:   0
[   80.131457] CPU  475: hi:  186, btch:  31 usd:   0
[   80.136726] CPU  476: hi:  186, btch:  31 usd:   0
[   80.141997] CPU  477: hi:  186, btch:  31 usd:   0
[   80.147267] CPU  478: hi:  186, btch:  31 usd:   0
[   80.152538] CPU  479: hi:  186, btch:  31 usd:   0
[   80.157813] active_anon:0 inactive_anon:0 isolated_anon:0
[   80.157813]  active_file:0 inactive_file:0 isolated_file:0
[   80.157813]  unevictable:0 dirty:0 writeback:0 unstable:0
[   80.157813]  free:209 slab_reclaimable:7 slab_unreclaimable:42986
[   80.157813]  mapped:0 shmem:0 pagetables:0 bounce:0
[   80.157813]  free_cma:0
[   80.190428] Node 0 DMA free:568kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB 
managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB 
shmem:0kB slab_reclaimable:0kB slab_unreclaimable:14928kB 
kernel_stack:400kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

[   80.233475] lowmem_reserve[]: 0 0 0 0
[   80.237542] Node 0 DMA32 free:20kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1961924kB managed:1333604kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
slab_unreclaimable:101664kB kernel_stack:50176kB pagetables:0kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.281456] lowmem_reserve[]: 0 0 0 0
[   80.285527] Node 0 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1608515580kB managed:2097148kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB 
slab_unreclaimable:948kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.328958] lowmem_reserve[]: 0 0 0 0
[   80.333031] Node 1 Normal free:248kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612732kB managed:2228220kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB 
slab_unreclaimable:46240kB kernel_stack:3232kB pagetables:0kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.377256] lowmem_reserve[]: 0 0 0 0
[   80.381325] Node 2 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:612kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.424764] lowmem_reserve[]: 0 0 0 0
[   80.428842] Node 3 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:600kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.472293] lowmem_reserve[]: 0 0 0 0
[   80.476360] Node 4 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:620kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.519803] lowmem_reserve[]: 0 0 0 0
[   80.523875] Node 5 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1610612736kB managed:2097152kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:584kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes

[   80.567312] lowmem_reserve[]: 0 0 0 0
[   80.571379] Node 6 Normal free:0kB min:0kB low:0kB high:0kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB 

Re: [PATCH] ktime: Fix ktime_divns to do signed division

2015-05-01 Thread John Stultz
On Fri, May 1, 2015 at 4:54 PM, Nicolas Pitre  wrote:
> On Fri, 1 May 2015, John Stultz wrote:
>
>> It was noted that the 32bit implementation of ktime_divns
>> was doing unsigned division and didn't properly handle
>> negative values.
>>
>> This patch fixes the problem by checking and preserving
>> the sign bit, and then reapplying it if appropriate
>> after the division.
>>
>> Unfortunately there is some duplication since we have
>> the optimized version for constant 32bit divider. I
>> was considering reworking the __ktime_divns helper
>> to simplify the sign-handling logic, but then it
>> would likely just be a s64/s64 divide, and probably
>> should be more generic.
>>
>> Thoughts?
>
> Wouldn't it be better to simply forbid negative time altogether?  Given
> it's been broken for quite a while, there must not be that many
> instances of such usage and fixing them would avoid the useless sign
> handling overhead to 99.9% of the cases.

Well, ktime is basically an s64 and timespecs can be negative as well.
So I'm not sure it's reasonable to disqualify negative time intervals
from using this function.  Especially since on 64bit systems,
ktime_divns handles negative intervals just fine.


>> Nicolas also notes that the ktime_divns() function
>> breaks if someone passes in a negative divisor as
>> well. This patch doesn't yet address that issue.
>
> Granted, a negative divisor here would be even weirder and should
> definitely be rejected.  Maybe the infinite loop is a good thing in that
> case, probably better than producing wrong numbers.

Yea. I'm thinking a WARN_ON or a BUG would be good to have in both
32bit and 64bit cases so we avoid folks testing on 64bit and thinking
it works generally.

thanks
-john


Re: [PATCH] ktime: Fix ktime_divns to do signed division

2015-05-01 Thread Nicolas Pitre
On Fri, 1 May 2015, John Stultz wrote:

> It was noted that the 32bit implementation of ktime_divns
> was doing unsigned division and didn't properly handle
> negative values.
> 
> This patch fixes the problem by checking and preserving
> the sign bit, and then reapplying it if appropriate
> after the division.
> 
> Unfortunately there is some duplication since we have
> the optimized version for constant 32bit divider. I
> was considering reworking the __ktime_divns helper
> to simplify the sign-handling logic, but then it
> would likely just be a s64/s64 divide, and probably
> should be more generic.
> 
> Thoughts?

Wouldn't it be better to simply forbid negative time altogether?  Given 
it's been broken for quite a while, there must not be that many 
instances of such usage and fixing them would avoid the useless sign 
handling overhead to 99.9% of the cases.

> Nicolas also notes that the ktime_divns() function
> breaks if someone passes in a negative divisor as
> well. This patch doesn't yet address that issue.

Granted, a negative divisor here would be even weirder and should 
definitely be rejected.  Maybe the infinite loop is a good thing in that 
case, probably better than producing wrong numbers.

> Cc: Nicolas Pitre 
> Cc: Thomas Gleixner 
> Cc: Josh Boyer 
> Cc: One Thousand Gnomes 
> Reported-by: Trevor Cordes 
> Signed-off-by: John Stultz 
> ---
>  include/linux/ktime.h | 12 ++--
>  kernel/time/hrtimer.c | 11 +--
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 5fc3d10..d947263 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -166,12 +166,20 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2)
>  }
>  
>  #if BITS_PER_LONG < 64
> -extern u64 __ktime_divns(const ktime_t kt, s64 div);
> +extern s64 __ktime_divns(const ktime_t kt, s64 div);
>  static inline u64 ktime_divns(const ktime_t kt, s64 div)
>  {
>   if (__builtin_constant_p(div) && !(div >> 32)) {
> - u64 ns = kt.tv64;
> + s64 ns = kt.tv64;
> + int neg = 0;
> +
> + if (ns < 0) {
> + neg = 1;
> + ns = -ns;
> + }
>   do_div(ns, div);
> + if (neg)
> + ns = -ns;
>   return ns;
>   } else {
>   return __ktime_divns(kt, div);
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 76d4bd9..4c1b294 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -266,12 +266,17 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
>  /*
>   * Divide a ktime value by a nanosecond value
>   */
> -u64 __ktime_divns(const ktime_t kt, s64 div)
> +s64 __ktime_divns(const ktime_t kt, s64 div)
>  {
> - u64 dclc;
> + s64 dclc;
>   int sft = 0;
> + int neg = 0;
>  
>   dclc = ktime_to_ns(kt);
> + if (dclc < 0) {
> + neg = 1;
> + dclc = -dclc;
> + }
>   /* Make sure the divisor is less than 2^32: */
>   while (div >> 32) {
>   sft++;
> @@ -279,6 +284,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
>   }
>   dclc >>= sft;
>   do_div(dclc, (unsigned long) div);
> + if (neg)
> + dclc = -dclc;
>  
>   return dclc;
>  }
> -- 
> 1.9.1
> 


[GIT PULL] Ceph RBD fix for -rc2

2015-05-01 Thread Sage Weil
Hi Linus,

Please pull the following RBD fix from

  git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus

Thanks!
sage


Ilya Dryomov (1):
  rbd: end I/O the entire obj_request on error

 drivers/block/rbd.c |5 +
 1 file changed, 5 insertions(+)


Re: Regression: Disk corruption with dm-crypt and kernels >= 4.0

2015-05-01 Thread Abelardo Ricart III
On Fri, 2015-05-01 at 18:24 -0400, Abelardo Ricart III wrote:
> On Fri, 2015-05-01 at 17:17 -0400, Mike Snitzer wrote:
> > On Fri, May 01 2015 at 12:37am -0400,
> > Abelardo Ricart III  wrote:
> > 
> > > I made sure to run a completely vanilla kernel when testing why I was 
> > > suddenly
> > > seeing some nasty libata errors with all kernels >= v4.0. Here's a 
> > > snippet:
> > > 
> > > >8
> > > [  165.592136] ata5.00: exception Emask 0x60 SAct 0x7000 SErr 0x800 
> > > action 
> > > 0x6
> > > frozen
> > > [  165.592140] ata5.00: irq_stat 0x2000, host bus error
> > > [  165.592143] ata5: SError: { HostInt }
> > > [  165.592145] ata5.00: failed command: READ FPDMA QUEUED
> > > [  165.592149] ata5.00: cmd 60/08:60:a0:0d:89/00:00:07:00:00/40 tag 12 
> > > ncq 
> > > 4096
> > > in
> > > res 40/00:74:40:58:5d/00:00:00:00:00/40 Emask 0x60
> > > (host bus error)
> > > [  165.592151] ata5.00: status: { DRDY }
> > > >8
> > > 
> > > After a few dozen of these errors, I'd suddenly find my system in read
> > > -only
> > > mode with corrupted files throughout my encrypted filesystems (seemed like
> > > either a read or a write would corrupt a file, though I could be 
> > > mistaken). 
> > > I
> > > decided to do a git bisect with a random read-write-sync test to narrow 
> > > down
> > > the culprit, which turned out to be this commit (part of a series):
> > > 
> > > # first bad commit: [cf2f1abfbd0dba701f7f16ef619e4d2485de3366] dm crypt: 
> > > don't
> > > allocate pages for a partial request
> > > 
> > > Just to be sure, I created a patch to revert the entire nine patch series 
> > > that
> > > commit belonged to... and the bad behavior disappeared. I've now been 
> > > running
> > > kernel 4.0 for a few days without issue, and went so far as to stress 
> > > test 
> > > my
> > > poor SSD for a few hours to be 100% positive.
> > > 
> > > Here's some more info on my setup.
> > > 
> > > >8
> > > $ lsblk -f
> > > NAME FSTYPE  LABEL MOUNTPOINT
> > > sda  
> > > ├─sda1   vfat  /boot/EFI
> > > ├─sda2   ext4  /boot
> > > └─sda3   LVM2_member
> > >   ├─SSD-root crypto_LUKS
> > >   │ └─root   f2fs  /
> > >   └─SSD-home crypto_LUKS
> > > └─home   f2fs  /home
> > > 
> > > $ cat /proc/cmdline
> > > BOOT_IMAGE=/vmlinuz-linux-memnix cryptdevice=/dev/SSD/root:root:allow
> > > -discards
> > > root=/dev/mapper/root acpi_osi=Linux security=tomoyo
> > > TOMOYO_trigger=/usr/lib/systemd/systemd intel_iommu=on
> > > modprobe.blacklist=nouveau rw quiet
> > > 
> > > $ cat /etc/lvm/lvm.conf | grep "issue_discards"
> > > issue_discards = 1
> > > >8
> > > 
> > > If there's anything else I can do to help diagnose the underlying 
> > > problem, 
> > > I'm
> > > more than willing.
> > 
> > The patchset in question was tested quite heavily so this is a
> > surprising report.  I'm noticing you are opting in to dm-crypt discard
> > support.  Have you tested without discards enabled?
> 
> I've disabled discards universally and rebuilt a vanilla kernel. After running
> my heavy read-write-sync scripts, everything seems to be working fine now. I
> suppose this could be something that used to fail silently before, but now
> produces bad behavior? I seem to remember having something in my message log
> about "discards not supported on this device" when running with it enabled
> before.

Forgive me, but I spoke too soon. The corruption and libata errors are still
there, as was evidenced when I went to reboot and got treated to an eye full of
"read-only filesystem" and ata errors.

So no, disabling discards unfortunately did nothing to help.



Re: [1/2] RTC: Add core rtc support for Gemini Soc devices

2015-05-01 Thread Alexandre Belloni
Hi,

On 14/12/2010 at 16:08:26 +0100, Hans Ulli Kroll wrote:
> driver for the rtc device
> on Cortina Systems CS3516 or StormlinkSemi SL3516 aka Gemini SoC
> 
> Signed-off-by: Hans Ulli Kroll 

This driver has never been merged and the platform doesn't seem to be
active anymore. Is there still any interest in getting this driver
mainlined?

Only tree wide cleanups happened in mach-gemini since end of 2010, the
listed git repository (git://git.berlios.de/gemini-board) was on BerliOS
(closed since 2011) and the sourceforge mirror seems empty. Is there
still interest in keeping that platform in the mainline?

> ---
>  MAINTAINERS  |1 +
>  drivers/rtc/Kconfig  |9 ++
>  drivers/rtc/Makefile |1 +
>  drivers/rtc/rtc-gemini.c |  206 
> ++
>  4 files changed, 217 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/rtc/rtc-gemini.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0094224..1987d46 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -619,6 +619,7 @@ L:linux-arm-ker...@lists.infradead.org (moderated 
> for non-subscribers)
>  T:   git git://git.berlios.de/gemini-board
>  S:   Maintained
>  F:   arch/arm/mach-gemini/
> +F:   drivers/rtc/rtc-gemini.c
>  
>  ARM/EBSA110 MACHINE SUPPORT
>  M:   Russell King 
> diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
> index 2883428..da14253 100644
> --- a/drivers/rtc/Kconfig
> +++ b/drivers/rtc/Kconfig
> @@ -839,6 +839,15 @@ config RTC_DRV_BFIN
> This driver can also be built as a module. If so, the module
> will be called rtc-bfin.
>  
> +config RTC_DRV_GEMINI
> + tristate "Gemini SoC RTC"
> + help
> +   If you say Y here you will get support for the
> +   RTC found on Gemini SoCs.
> +
> +   This driver can also be built as a module. If so, the module
> +   will be called rtc-gemini.
> +
>  config RTC_DRV_RS5C313
>   tristate "Ricoh RS5C313"
>   depends on SH_LANDISK
> diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
> index 4c2832d..779e42c 100644
> --- a/drivers/rtc/Makefile
> +++ b/drivers/rtc/Makefile
> @@ -46,6 +46,7 @@ obj-$(CONFIG_RTC_DRV_DS3234)+= rtc-ds3234.o
>  obj-$(CONFIG_RTC_DRV_EFI)+= rtc-efi.o
>  obj-$(CONFIG_RTC_DRV_EP93XX) += rtc-ep93xx.o
>  obj-$(CONFIG_RTC_DRV_FM3130) += rtc-fm3130.o
> +obj-$(CONFIG_RTC_DRV_GEMINI) += rtc-gemini.o
>  obj-$(CONFIG_RTC_DRV_GENERIC)+= rtc-generic.o
>  obj-$(CONFIG_RTC_DRV_IMXDI)  += rtc-imxdi.o
>  obj-$(CONFIG_RTC_DRV_ISL1208)+= rtc-isl1208.o
> diff --git a/drivers/rtc/rtc-gemini.c b/drivers/rtc/rtc-gemini.c
> new file mode 100644
> index 000..111bb67
> --- /dev/null
> +++ b/drivers/rtc/rtc-gemini.c
> @@ -0,0 +1,206 @@
> +/*
> + *  Gemini OnChip RTC
> + *
> + *  Copyright (C) 2009 Janos Laube 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * Original code for older kernel 2.6.15 is from Stormlinksemi,
> + * first update from Janos Laube for > 2.6.29 kernels
> + *
> + * checkpatch fixes and usage of rtc-lib code
> + * Hans Ulli Kroll 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define DRV_NAME    "rtc-gemini"
> +#define DRV_VERSION "0.2"
> +
> +struct gemini_rtc {
> + struct rtc_device   *rtc_dev;
> + void __iomem*rtc_base;
> + int rtc_irq;
> +};
> +
> +enum gemini_rtc_offsets {
> + GEMINI_RTC_SECOND   = 0x00,
> + GEMINI_RTC_MINUTE   = 0x04,
> + GEMINI_RTC_HOUR = 0x08,
> + GEMINI_RTC_DAYS = 0x0C,
> + GEMINI_RTC_ALARM_SECOND = 0x10,
> + GEMINI_RTC_ALARM_MINUTE = 0x14,
> + GEMINI_RTC_ALARM_HOUR   = 0x18,
> + GEMINI_RTC_RECORD   = 0x1C,
> + GEMINI_RTC_CR   = 0x20
> +};
> +
> +static irqreturn_t gemini_rtc_interrupt(int irq, void *dev)
> +{
> + return IRQ_HANDLED;
> +}
> +
> +/*
> + * Looks like the RTC in the Gemini SoC is (totally) broken
> + * We can't read/write directly the time from RTC registers.
> + * We must do some "offset" calculation to get the real time
> + *
> + * This FIX works pretty fine and Stormlinksemi aka Cortina-Networks does
> + * the same thing, without the rtc-lib.c calls.
> + */
> +
> +static int 

Re: [PATCH v3 0/2] clk: improve handling of orphan clocks

2015-05-01 Thread Stephen Boyd
On 05/01/15 15:07, Heiko Stübner wrote:
> On Friday, 1 May 2015 at 13:52:47, Stephen Boyd wrote:
>
>>> Instead I guess we could hook it less deep into clk_get_sys, like in the
>>> following patch?
>> It looks like it will work at least, but still I'd prefer to keep the
>> orphan check contained to clk.c. How about this compile tested only patch?
> I gave this a spin on my rk3288-firefly board. It still boots, the clock tree 
> looks the same and it also still defers nicely in the scenario I needed it 
> for. The implementation also looks nice - and of course much more compact 
> than 
> my check in two places :-) . I don't know if you want to put this as 
> follow-up 
> on top or fold it into the original orphan-check, so in any case
>
> Tested-by: Heiko Stuebner 
> Reviewed-by: Heiko Stuebner 

Thanks. I'm leaning towards tossing your patch 2/2 and replacing it with
my patch and a note that it's based on an earlier patch from you.

>
>
>> This also brings up an existing problem with clk_unregister() where
>> orphaned clocks are sitting out there useable by drivers when their
>> parent is unregistered. That code could use some work to atomically
>> switch all the orphaned clocks over to use the nodrv_ops.
> Not sure I understand this correctly yet, but when these children get 
> orphaned, switched to the clk_nodrv_ops, they won't get their original ops 
> back if the parent reappears.
>
> So I guess we would need to store the original ops in a secondary property of 
> struct clk_core and simply bind the ops-switch to the orphan state 
> update?

Yep. We'll need to store away the original ops in case we need to put
them back. Don't feel obligated to fix this either. It would certainly
be nice if someone tried to fix this case at some point, but it's not
like things are any worse off right now.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] Fix an error that can cause fsl espi task blocked for more than 120 seconds

2015-05-01 Thread Jane Wan
An incorrect condition is used in spin_event_timeout().  When the TX is done,
the SPIE_NF bit in the ESPI_SPIE register is set to 1 to indicate that the Tx
FIFO is not full.  If the bit is 0, the Tx FIFO is full.

Due to this error, if the Tx FIFO is full at the beginning but becomes not
full after the Rx FIFO is handled (the SPIE_NF bit is set),
spin_event_timeout() returns as if a timeout had occurred.  This causes the
interrupt handler not to send the completion notification to the thread
waiting in wait_for_completion().

Signed-off-by: Jane Wan 
---
 drivers/spi/spi-fsl-espi.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c
index 9011e5d..333d5c2 100644
--- a/drivers/spi/spi-fsl-espi.c
+++ b/drivers/spi/spi-fsl-espi.c
@@ -551,9 +551,13 @@ void fsl_espi_cpu_irq(struct mpc8xxx_spi *mspi, u32 events)
 
/* spin until TX is done */
ret = spin_event_timeout(((events = mpc8xxx_spi_read_reg(
-   &reg_base->event)) & SPIE_NF) == 0, 1000, 0);
+   &reg_base->event)) & SPIE_NF), 1000, 0);
if (!ret) {
dev_err(mspi->dev, "tired waiting for SPIE_NF\n");
+
+   /* Clear the SPIE bits */
+   mpc8xxx_spi_write_reg(&reg_base->event, events);
+   complete(&mspi->done);
return;
}
}
-- 
1.7.9.5



Re: [PATCH 1/3] net: dsa: introduce module_switch_driver macro

2015-05-01 Thread Florian Fainelli
On 01/05/15 15:09, Vivien Didelot wrote:
> This commit introduces a new module_switch_driver macro, similar to
> module_platform_driver and such, to reduce boilerplate when declaring
> DSA switch drivers.
> 
> In order to use the module_driver macro, register_switch_driver needed
> to be changed to return an int instead of void, so make it return 0.

Do we get much benefit from having this change, the diffstat looks
pretty neutral, ultimately register_switch_driver() might be gone (see:
http://www.spinics.net/lists/netdev/msg326900.html) and mv88e6xxx cannot
be converted to it due to how it is designed. This is not a strong
objection though, the changes look fine to me.

> 
> Signed-off-by: Vivien Didelot 
> ---
>  include/net/dsa.h | 13 -
>  net/dsa/dsa.c |  4 +++-
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index fbca63b..927f16a 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -11,6 +11,7 @@
>  #ifndef __LINUX_NET_DSA_H
>  #define __LINUX_NET_DSA_H
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -304,8 +305,18 @@ struct dsa_switch_driver {
>  unsigned char *addr, bool *is_static);
>  };
>  
> -void register_switch_driver(struct dsa_switch_driver *type);
> +int register_switch_driver(struct dsa_switch_driver *type);
>  void unregister_switch_driver(struct dsa_switch_driver *type);
> +
> +/* module_switch_driver() - Helper macro for drivers that don't do anything
> + * special in module init/exit. This eliminates a lot of boilerplate. Each
> + * module may only use this macro once, and calling it replaces module_init()
> + * and module_exit()
> + */
> +#define module_switch_driver(__switch_driver) \
> + module_driver(__switch_driver, register_switch_driver, \
> + unregister_switch_driver)
> +
>  struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev);
>  
>  static inline void *ds_to_priv(struct dsa_switch *ds)
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index e6f6cc3..9630522 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -31,11 +31,13 @@ char dsa_driver_version[] = "0.1";
>  static DEFINE_MUTEX(dsa_switch_drivers_mutex);
>  static LIST_HEAD(dsa_switch_drivers);
>  
> -void register_switch_driver(struct dsa_switch_driver *drv)
> +int register_switch_driver(struct dsa_switch_driver *drv)
>  {
>   mutex_lock(&dsa_switch_drivers_mutex);
>   list_add_tail(&drv->list, &dsa_switch_drivers);
>   mutex_unlock(&dsa_switch_drivers_mutex);
> +
> + return 0;
>  }
>  EXPORT_SYMBOL_GPL(register_switch_driver);
>  
> 


-- 
Florian


Re: Tux3 Report: How fast can we fsync?

2015-05-01 Thread Daniel Phillips
On Friday, May 1, 2015 8:38:55 AM PDT, Dave Chinner wrote:
>
> Well, yes - I never claimed XFS is a general purpose filesystem.  It
> is a high performance filesystem. Is is also becoming more relevant
> to general purpose systems as low cost storage gains capabilities
> that used to be considered the domain of high performance storage...

OK. Well, Tux3 is general purpose and that means we care about single
spinning disk and small systems.

>>> So, to demonstrate, I'll run the same tests but using a 256GB
>>> samsung 840 EVO SSD and show how much the picture changes.
>>
>> I will go you one better, I ran a series of fsync tests using
>> tmpfs, and I now have a very clear picture of how the picture
>> changes. The executive summary is: Tux3 is still way faster, and
>> still scales way better to large numbers of tasks. I have every
>> confidence that the same is true of SSD.
>
> /dev/ramX can't be compared to an SSD.  Yes, they both have low
> seek/IO latency but they have very different dispatch and IO
> concurrency models.  One is synchronous, the other is fully
> asynchronous.

I had ram available and no SSD handy to abuse. I was interested in
measuring the filesystem overhead with the device factored out. I
mounted loopback on a tmpfs file, which seems to be about the same as
/dev/ram, maybe slightly faster, but much easier to configure. I ran
some tests on a ramdisk just now and was mortified to find that I have
to reboot to empty the disk. It would take a compelling reason before
I do that again.

> This is an important distinction, as we'll see later on

I regard it as predictive of Tux3 performance on NVM.

> These trees:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/daniel/linux-tux3.git
> git://git.kernel.org/pub/scm/linux/kernel/git/daniel/linux-tux3-test.git
>
> have not been updated for 11 months. I thought tux3 had died long
> ago.
>
> You should keep them up to date, and send patches for xfstests to
> support tux3, and then you'll get a lot more people running,
> testing and breaking tux3

People are starting to show up to do testing now, pretty much the first
time, so we must do some housecleaning. It is gratifying that Tux3 never
broke for Mike, but of course it will assert just by running out of
space at the moment. As you rightly point out, that fix is urgent and is
my current project.

>> Running the same thing on tmpfs, Tux3 is significantly faster:
>>
>> Ext4:   1.40s
>> XFS:    1.10s
>> Btrfs:  1.56s
>> Tux3:   1.07s
>
> 3% is not "significantly faster". It's within run-to-run variation!

You are right, XFS and Tux3 are within experimental error for single
syncs on the ram disk, while Ext4 and Btrfs are way slower:

   Ext4:   1.59s
   XFS:    1.11s
   Btrfs:  1.70s
   Tux3:   1.11s

A distinct performance gap appears between Tux3 and XFS as parallel
tasks increase.

>> You wish. In fact, Tux3 is a lot faster. ...
>
> Yes, it's easy to be fast when you have simple, naive algorithms and
> an empty filesystem.

No it isn't or the others would be fast too. In any case our algorithms
are far from naive, except for allocation. You can rest assured that
when allocation is brought up to a respectable standard in the fullness
of time, it will be competitive and will not harm our clean filesystem
performance at all.

There is no call for you to disparage our current achievements, which
are significant. I do not mind some healthy skepticism about the
allocation work, you know as well as anyone how hard it is. However your
denial of our current result is irritating and creates the impression
that you have an agenda. If you want to complain about something real,
complain that our current code drop is not done yet. I will humbly
apologize, and the same for enospc.

>> triple checked and reproducible:
>>
>>   Tasks:     10     100    1,000   10,000
>>   Ext4:    0.05    0.14     1.53    26.56
>>   XFS:     0.05    0.16     2.10    29.76
>>   Btrfs:   0.08    0.37     3.18    34.54
>>   Tux3:    0.02    0.05     0.18     2.16
>
> Yet I can't reproduce those XFS or ext4 numbers you are quoting
> there. eg. XFS on a 4GB ram disk:
>
> $ for i in 10 100 1000 1; do rm /mnt/test/foo* ; time
> ./test-fsync /mnt/test/foo 10 $i; done
>
> real0m0.030s
> user0m0.000s
> sys 0m0.014s
>
> real0m0.031s
> user0m0.008s
> sys 0m0.157s
>
> real0m0.305s
> user0m0.029s
> sys 0m1.555s
>
> real0m3.624s
> user0m0.219s
> sys 0m17.631s
> $
>
> That's roughly 10x faster than your numbers. Can you describe your
> test setup in detail? e.g.  post the full log from block device
> creation to benchmark completion so I can reproduce what you are
> doing exactly?

Mine is a lame i5 minitower with 4GB from Fry's. Yours is clearly way
more substantial, so I can't compare my numbers directly to yours.

Clearly the curve is the same: your numbers increase 10x going from 100
to 1,000 tasks and 12x going from 1,000 to 10,000. The Tux3 

Re: [PATCH 1/3] mtd: sh_flctl: fixup wait_for_completion_timeout return handling

2015-05-01 Thread Laurent Pinchart
Hi Nicholas,

Thank you for the patch.

On Friday 01 May 2015 16:16:01 Nicholas Mc Guire wrote:
> wait_for_completion_timeout() returns unsigned long not int so the check
> for <= should be == and the type unsigned long. This fixes up the return
> value handling and returns -ETIMEDOUT on timeout rather than 0, and 1
> on success rather than a more or less random remaining number of jiffies.
> 
> Signed-off-by: Nicholas Mc Guire 
> ---
> 
> call sites:
read_fiforeg, write_ec_fiforeg assume > 0 == success
and the comment in flctl_dma_fifo0_transfer() states
> /* ret > 0 is success */
> return ret;
> since it is only checking for > 0 in the call-sites
> returning -ETIMEDOUT should be fine.
> 
> Patch was compile tested with ap325rxa_defconfig (implies
> CONFIG_MTD_NAND_SH_FLCTL=y)
> 
> Patch is against 4.1-rc1 (localversion-next is -next-20150501)
> 
>  drivers/mtd/nand/sh_flctl.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/nand/sh_flctl.c b/drivers/mtd/nand/sh_flctl.c
> index c3ce81c..4450864 100644
> --- a/drivers/mtd/nand/sh_flctl.c
> +++ b/drivers/mtd/nand/sh_flctl.c
> @@ -354,6 +354,7 @@ static int flctl_dma_fifo0_transfer(struct sh_flctl *flctl, unsigned long *buf,
>   dma_cookie_t cookie = -EINVAL;
>   uint32_t reg;
>   int ret;
> + unsigned long time_left;
> 
>   if (dir == DMA_FROM_DEVICE) {
>   chan = flctl->chan_fifo0_rx;
> @@ -388,14 +389,16 @@ static int flctl_dma_fifo0_transfer(struct sh_flctl *flctl, unsigned long *buf,
>   goto out;
>   }
> 
> - ret =
> + time_left =
>   wait_for_completion_timeout(&flctl->dma_complete,
>   msecs_to_jiffies(3000));
> 
> - if (ret <= 0) {
> + if (time_left == 0) {
>   dmaengine_terminate_all(chan);
>   dev_err(&flctl->pdev->dev, "wait_for_completion_timeout\n");
> - }
> + ret = -ETIMEDOUT;
> + } else
> + ret = 1;/* completion succeeded */

I'd go one step further and make this function return < 0 on error and 0 on 
success, as is usually done throughout the kernel API. You could then simplify 
the code using something like

if (!wait_for_completion_timeout(&flctl->dma_complete,
 msecs_to_jiffies(3000))) {
dmaengine_terminate_all(chan);
dev_err(&flctl->pdev->dev, "wait_for_completion_timeout\n");
ret = -ETIMEDOUT;
}

(pre-initializing ret to 0)
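Laurent's suggested convention, reduced to a standalone userspace sketch (wait_ms() and do_transfer() are made-up stand-ins for wait_for_completion_timeout() and the driver function, not real kernel code):

```c
#include <errno.h>

/* wait_ms() stands in for wait_for_completion_timeout(): it returns
 * the jiffies left on completion, or 0 on timeout.  The caller injects
 * the outcome so the control flow can be exercised in userspace. */
static unsigned long wait_ms(unsigned long time_left_stub)
{
	return time_left_stub;
}

/* Return 0 on success and a negative errno on failure, as usually done
 * through the kernel API, instead of 1 / -ETIMEDOUT. */
static int do_transfer(unsigned long wait_outcome)
{
	int ret = 0;	/* pre-initialize to success */

	if (!wait_ms(wait_outcome)) {
		/* dmaengine_terminate_all(chan); dev_err(...); */
		ret = -ETIMEDOUT;
	}
	return ret;	/* call sites then check ret < 0, not ret > 0 */
}
```

With this shape the call sites no longer need the `/* ret > 0 is success */` comment at all.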

> 
>  out:
>   reg = readl(FLINTDMACR(flctl));

-- 
Regards,

Laurent Pinchart



Re: NFS Freezer and stuck tasks

2015-05-01 Thread Jeff Layton
On Fri, 1 May 2015 17:10:34 -0400 (EDT)
Benjamin Coddington  wrote:

> On Fri, 1 May 2015, Benjamin Coddington wrote:
> 
> > On Wed, 4 Mar 2015, Shawn Bohrer wrote:
> >
> > > Hello,
> > >
> > > We're using the Linux cgroup Freezer on some machines that use NFS and
> > > have run into what appears to be a bug where frozen tasks are blocking
> > > running tasks and preventing them from completing.  On one of our
> > > machines which happens to be running an older 3.10.46 kernel we have
> > > frozen some of the tasks on the system using the cgroup Freezer.  We
> > > also have a separate set of tasks which are NOT frozen which are stuck
> > > trying to open some files on NFS.
> > >
> > > Looking at the frozen tasks there are several that have the following
> > > stack:
> > >
> > > [] rpc_wait_bit_killable+0x35/0x80
> > > [] __rpc_wait_for_completion_task+0x2d/0x30
> > > [] nfs4_run_open_task+0x11d/0x170
> > > [] _nfs4_open_and_get_state+0x53/0x260
> > > [] nfs4_do_open+0x121/0x400
> > > [] nfs4_atomic_open+0x31/0x50
> > > [] nfs4_file_open+0xac/0x180
> > > [] do_dentry_open.isra.19+0x1ee/0x280
> > > [] finish_open+0x1e/0x30
> > > [] do_last.isra.64+0x2c2/0xc40
> > > [] path_openat.isra.65+0x2c9/0x490
> > > [] do_filp_open+0x38/0x80
> > > [] do_sys_open+0xe4/0x1c0
> > > [] SyS_open+0x1e/0x20
> > > [] system_call_fastpath+0x16/0x1b
> > > [] 0x
> > >
> > > Here it looks like we are waiting in a wait queue inside
> > > rpc_wait_bit_killable() for RPC_TASK_ACTIVE.
> > >
> > > And there is a single task with a stack that looks like the following:
> > >
> > > [] __refrigerator+0x55/0x150
> > > [] rpc_wait_bit_killable+0x66/0x80
> > > [] __rpc_wait_for_completion_task+0x2d/0x30
> > > [] nfs4_run_open_task+0x11d/0x170
> > > [] _nfs4_open_and_get_state+0x53/0x260
> > > [] nfs4_do_open+0x121/0x400
> > > [] nfs4_atomic_open+0x31/0x50
> > > [] nfs4_file_open+0xac/0x180
> > > [] do_dentry_open.isra.19+0x1ee/0x280
> > > [] finish_open+0x1e/0x30
> > > [] do_last.isra.64+0x2c2/0xc40
> > > [] path_openat.isra.65+0x2c9/0x490
> > > [] do_filp_open+0x38/0x80
> > > [] do_sys_open+0xe4/0x1c0
> > > [] SyS_open+0x1e/0x20
> > > [] system_call_fastpath+0x16/0x1b
> > > [] 0x
> > >
> > > This looks similar but the different offset into
> > > rpc_wait_bit_killable() shows that we have returned from the
> > > schedule() call in freezable_schedule() and are now blocked in
> > > __refrigerator() inside freezer_count().
> > >
> > > Similarly if you look at the tasks that are NOT frozen but are stuck
> > > opening a NFS file, they also have the following stack showing they are
> > > waiting in the wait queue for RPC_TASK_ACTIVE.
> > >
> > > [] rpc_wait_bit_killable+0x35/0x80
> > > [] __rpc_wait_for_completion_task+0x2d/0x30
> > > [] nfs4_run_open_task+0x11d/0x170
> > > [] _nfs4_open_and_get_state+0x53/0x260
> > > [] nfs4_do_open+0x121/0x400
> > > [] nfs4_atomic_open+0x31/0x50
> > > [] nfs4_file_open+0xac/0x180
> > > [] do_dentry_open.isra.19+0x1ee/0x280
> > > [] finish_open+0x1e/0x30
> > > [] do_last.isra.64+0x2c2/0xc40
> > > [] path_openat.isra.65+0x2c9/0x490
> > > [] do_filp_open+0x38/0x80
> > > [] do_sys_open+0xe4/0x1c0
> > > [] SyS_open+0x1e/0x20
> > > [] system_call_fastpath+0x16/0x1b
> > > [] 0x
> > >
> > > We have hit this a couple of times now and know that if we THAW all of
> > > the frozen tasks that running tasks will unwedge and finish.
> > >
> > > Additionally we have also tried thawing the single task that is frozen
> > > in __refrigerator() inside rpc_wait_bit_killable().  This usually
> > > results in different frozen task entering the __refrigerator() state
> > > inside rpc_wait_bit_killable().  It looks like each one of those tasks
> > > must wake up another letting it progress.  Again if you thaw enough of
> > > the frozen tasks eventually everything unwedges and everything
> > > completes.
> > >
> > > I've looked through the 3.10 stable patches since 3.10.46 and don't
> > > see anything that looks like it addresses this.  Does anyone have any
> > > idea what might be going on here, and what the fix might be?
> > >
> > > Thanks,
> > > Shawn
> >
> > Hi Shawn, just started looking at this myself, and as Frank Sorensen points
> > out in https://bugzilla.redhat.com/show_bug.cgi?id=1209143 the problem is
> > that a task takes the xprt lock and then ends up in the refrigerator
> > effectively blocking other tasks from proceeding.
> >
> > Jeff, any suggestions on how to proceed here?
> 
> Sorry for the noise, and self-reply..  Looks like there's additional context
> here: http://marc.info/?t=13676151217=1=2
> 
> Due to a number of locking problems the answer to this problem is likely to
> be "don't do that" for now.
> 
> Ben

Yeah, that's definitely the answer for now.

NFS and the freezer basically cooperate if you are freezing the whole
system, but freezing some tasks and not others is fraught with peril.
The problem is that by the time you get a freeze "signal" you might be
very 

[PATCH] virtio: fix typo in vring_need_event() doc comment

2015-05-01 Thread Rusty Russell
From: Stefan Hajnoczi 

Here the "other side" refers to the guest or host.

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Rusty Russell 
---
 include/uapi/linux/virtio_ring.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index a3318f31e8e7..915980ac68df 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -155,7 +155,7 @@ static inline unsigned vring_size(unsigned int num, 
unsigned long align)
 }
 
 /* The following is used with USED_EVENT_IDX and AVAIL_EVENT_IDX */
-/* Assuming a given event_idx value from the other size, if
+/* Assuming a given event_idx value from the other side, if
  * we have just incremented index from old to new_idx,
  * should we trigger an event? */
 static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
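The body of the function (unchanged by this patch; it appears in include/uapi/linux/virtio_ring.h right below the corrected comment) relies on unsigned 16-bit wraparound. A userspace rendering, with uint16_t standing in for __u16:

```c
#include <stdint.h>

/* An event is needed iff the event_idx published by the other side
 * lies in the half-open window (old, new_idx] that we just advanced
 * through.  Unsigned 16-bit subtraction makes the comparison correct
 * across index wraparound. */
static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx,
				   uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

So the "other side" in the fixed comment is exactly the party that wrote event_idx into the ring.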


[PATCH] virtio: pass baton to Michael Tsirkin

2015-05-01 Thread Rusty Russell
With my job change kernel work will be "own time"; I'm keeping lguest
and modules (and the virtio standards work), but virtio kernel has to
go.

This makes it clear that Michael is in charge.  He's good, but having
me watch over his shoulder won't help.

Good luck Michael!

Signed-off-by: Rusty Russell 

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e5bbc0d68b2..16227759dfa8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10523,7 +10523,6 @@ F:  include/linux/virtio_console.h
 F: include/uapi/linux/virtio_console.h
 
 VIRTIO CORE, NET AND BLOCK DRIVERS
-M: Rusty Russell 
 M: "Michael S. Tsirkin" 
 L: virtualizat...@lists.linux-foundation.org
 S: Maintained


[PATCH v1] x86: punit_atom: punit device state debug driver

2015-05-01 Thread Srinivas Pandruvada
The patch adds a debug driver, which dumps the power states
of all the North complex (NC) devices. This debug interface is
useful for figuring out the NC IPs which block the S0ix
transitions on the platform. This is extremely useful when
enabling PM on customer platforms and derivatives.

This submission is not a complete rewrite of the original
submission from Kumar P, Mahesh.
https://lkml.org/lkml/2014/11/5/367
The changes are:
- Dropped changes to config for PMC_ATOM, as PMC_ATOM
is not just a debug driver as suggested by the change. It has
additional power off interface also.
- Instead of just using nc ("North complex") use punit_..
similar to south complex PMC. All the interfaces exposed
by this driver are provided by PUNIT.
- Removed pmc_config structure, as we don't need to predefine
the number of registers we want to dump. This way new registers
can be added without changing NC_NUM_DEVICES.
- prefixed function with punit_
- The debugfs directory will be punit_atom, which is NC equivalent
of pmc_atom, which we already exposed by pmc_atom driver.

Signed-off-by: Srinivas Pandruvada 
---
 arch/x86/Kconfig.debug |   9 ++
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/punit_atom_debug.c | 170 +
 3 files changed, 180 insertions(+)
 create mode 100644 arch/x86/kernel/punit_atom_debug.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 20028da..9b85dfe 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -336,4 +336,13 @@ config X86_DEBUG_STATIC_CPU_HAS
 
  If unsure, say N.
 
+config PUNIT_ATOM_DEBUG
+   tristate "ATOM Punit debug driver"
+   depends on DEBUG_FS
+   select IOSF_MBI
+   ---help---
+ This is a debug driver, which gets the power states
+ of all Punit North Complex devices. The power states of
+ each IP are exposed as part of the debugfs interface.
+
 endmenu
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index cdb1b70..2ecbd55 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_PERF_EVENTS)   += perf_regs.o
 obj-$(CONFIG_TRACING)  += tracepoint.o
 obj-$(CONFIG_IOSF_MBI) += iosf_mbi.o
 obj-$(CONFIG_PMC_ATOM) += pmc_atom.o
+obj-$(CONFIG_PUNIT_ATOM_DEBUG) += punit_atom_debug.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/punit_atom_debug.c 
b/arch/x86/kernel/punit_atom_debug.c
new file mode 100644
index 000..6e7d4e5
--- /dev/null
+++ b/arch/x86/kernel/punit_atom_debug.c
@@ -0,0 +1,170 @@
+/*
+ * Intel SOC Punit device state debug driver
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PUNIT_PORT 0x04
+#define PWRGT_STATUS   0x61 /* Power gate status reg */
+#define VED_SS_PM0 0x32 /* Subsystem config/status Video */
+#define ISP_SS_PM0 0x39 /* Subsystem config/status ISP */
+#define MIO_SS_PM  0x3B /* Subsystem config/status MIO */
+#define SSS_SHIFT  24
+#define RENDER_POS 0
+#define MEDIA_POS  2
+#define VLV_DISPLAY_POS6
+#define CHT_DSP_SSS0x36 /* Subsystem config/status DSP */
+#define CHT_DSP_SSS_POS16
+
+struct punit_device {
+   char *name;
+   int reg;
+   int sss_pos;
+};
+
+static struct punit_device *punit_device;
+
+static const struct punit_device punit_device_byt[] = {
+   { "GFX RENDER", PWRGT_STATUS, RENDER_POS },
+   { "GFX MEDIA", PWRGT_STATUS, MEDIA_POS },
+   { "DISPLAY", PWRGT_STATUS, VLV_DISPLAY_POS },
+   { "VED", VED_SS_PM0, SSS_SHIFT},
+   { "ISP", ISP_SS_PM0, SSS_SHIFT},
+   { "MIO", MIO_SS_PM, SSS_SHIFT},
+   { NULL}
+};
+
+static const struct punit_device punit_device_cht[] = {
+   { "GFX RENDER", PWRGT_STATUS, RENDER_POS },
+   { "GFX MEDIA", PWRGT_STATUS, MEDIA_POS },
+   { "DSP", CHT_DSP_SSS,  CHT_DSP_SSS_POS },
+   { "VED", VED_SS_PM0, SSS_SHIFT},
+   { "ISP", ISP_SS_PM0, SSS_SHIFT},
+   { "MIO", MIO_SS_PM, SSS_SHIFT},
+   { NULL}
+};
+
+static const char * const dstates[] = {"D0", "D0i1", "D0i2", "D0i3"};
+
+static int punit_dev_state_show(struct seq_file *seq_file, void *unused)
+{
+   u32 punit_pwr_status;
+   int index;
+   int status;
+
+   seq_puts(seq_file, "\n\nPUNIT NORTH COMPLEX DEVICES :\n");
+  
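The hunk is cut off in the archive here, but the decoding step the show function performs follows from the SSS_SHIFT / sss_pos defines and the dstates[] table above: each entry's sss_pos selects a two-bit subsystem-state field in the register read over IOSF-MBI. A userspace sketch (decode_dstate() and the register values are illustrative, not driver code):

```c
#include <stdint.h>

/* Same table as dstates[] in the patch. */
static const char * const dstates_demo[] = {"D0", "D0i1", "D0i2", "D0i3"};

/* Mirrors the (reg >> sss_pos) & 0x3 lookup implied by the defines:
 * the 2-bit field at sss_pos indexes the D-state names. */
static const char *decode_dstate(uint32_t reg, int sss_pos)
{
	return dstates_demo[(reg >> sss_pos) & 0x3];
}
```

For example, with SSS_SHIFT = 24 a VED_SS_PM0 value of 0x03000000 decodes to "D0i3" (fully gated).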

[PATCH v1] x86: punit_atom: punit device state debug driver

2015-05-01 Thread Srinivas Pandruvada
v1
 Based on review comments
 - Changed to tristate instead of bool
 - Moved config to kconfig.debug
 - Added debug in module name
 - Returning -ENXIO on debugfs file create error

v0:
Base version

Srinivas Pandruvada (1):
  x86: punit_atom: punit device state debug driver

 arch/x86/Kconfig.debug |   9 ++
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/punit_atom_debug.c | 170 +
 3 files changed, 180 insertions(+)
 create mode 100644 arch/x86/kernel/punit_atom_debug.c

-- 
1.9.1



Re: [PATCH 2/3] mtd: sh_flctl: fix alignment of function argument

2015-05-01 Thread Laurent Pinchart
Hi Nicholas,

Thank you for the patch.

On Friday 01 May 2015 16:16:02 Nicholas Mc Guire wrote:
> This just is a minor coding style cleanup - align the function arguments
> with the opening (.

I would just squash this into patch 1/3.

> Signed-off-by: Nicholas Mc Guire 
> ---
> 
> Patch was compile tested with ap325rxa_defconfig (implies
> CONFIG_MTD_NAND_SH_FLCTL=y)
> 
> Patch is against 4.1-rc1 (localversion-next is -next-20150501)
> 
>  drivers/mtd/nand/sh_flctl.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/sh_flctl.c b/drivers/mtd/nand/sh_flctl.c
> index 4450864..b9f265a 100644
> --- a/drivers/mtd/nand/sh_flctl.c
> +++ b/drivers/mtd/nand/sh_flctl.c
> @@ -391,7 +391,7 @@ static int flctl_dma_fifo0_transfer(struct sh_flctl
> *flctl, unsigned long *buf,
> 
>   time_left =
>   wait_for_completion_timeout(&flctl->dma_complete,
> - msecs_to_jiffies(3000));
> + msecs_to_jiffies(3000));
> 
>   if (time_left == 0) {
>   dmaengine_terminate_all(chan);

-- 
Regards,

Laurent Pinchart



Re: [PATCH 3/3] mtd: sh_flctl: remove unused variable and assignment

2015-05-01 Thread Laurent Pinchart
Hi Nicholas,

Thank you for the patch.

On Friday 01 May 2015 16:16:03 Nicholas Mc Guire wrote:
> Fixes a compile warning [-Wunused-but-set-variable] only.
> 
> Signed-off-by: Nicholas Mc Guire 
> ---
> 
> This fixes the compile time warning:
> drivers/mtd/nand/sh_flctl.c: In function 'flctl_dma_fifo0_transfer':
> drivers/mtd/nand/sh_flctl.c:354:15: warning: variable 'cookie' set but not
> used [-Wunused-but-set-variable]
> 
> as dmaengine_submit only reads and returns desc->tx_submit(desc) which
> is then unused in flctl_dma_fifo0_transfer, this should be side-effect
> free to remove.

You can ignore its return value, but you can't remove the call to 
dmaengine_submit, as it's part of the DMA engine API and desc->tx_submit() 
does important work.

> Patch was compile tested with ap325rxa_defconfig (implies
> CONFIG_MTD_NAND_SH_FLCTL=y)
> 
> Patch is against 4.1-rc1 (localversion-next is -next-20150501)
> 
>  drivers/mtd/nand/sh_flctl.c |2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/sh_flctl.c b/drivers/mtd/nand/sh_flctl.c
> index b9f265a..6a5c4f7 100644
> --- a/drivers/mtd/nand/sh_flctl.c
> +++ b/drivers/mtd/nand/sh_flctl.c
> @@ -351,7 +351,6 @@ static int flctl_dma_fifo0_transfer(struct sh_flctl
> *flctl, unsigned long *buf, struct dma_chan *chan;
>   enum dma_transfer_direction tr_dir;
>   dma_addr_t dma_addr;
> - dma_cookie_t cookie = -EINVAL;
>   uint32_t reg;
>   int ret;
>   unsigned long time_left;
> @@ -377,7 +376,6 @@ static int flctl_dma_fifo0_transfer(struct sh_flctl
> *flctl, unsigned long *buf,
> 
>   desc->callback = flctl_dma_complete;
>   desc->callback_param = flctl;
> - cookie = dmaengine_submit(desc);
> 
>   dma_async_issue_pending(chan);
>   } else {

-- 
Regards,

Laurent Pinchart



[PATCH v2] vfio: Fix runaway interruptible timeout

2015-05-01 Thread Alex Williamson
Commit 13060b64b819 ("vfio: Add and use device request op for vfio
bus drivers") incorrectly makes use of an interruptible timeout.
When interrupted, the signal remains pending resulting in subsequent
timeouts occurring instantly.  This makes the loop spin at a much
higher rate than intended.

Instead of making this completely non-interruptible, we can change
this into a sort of interruptible-once behavior and use the "once"
to log debug information.  The driver API doesn't allow us to abort
and return an error code.

Signed-off-by: Alex Williamson 
Fixes: 13060b64b819
Cc: sta...@vger.kernel.org # v4.0
---
 drivers/vfio/vfio.c |   21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 0d33662..e1278fe 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -710,6 +710,8 @@ void *vfio_del_group_dev(struct device *dev)
void *device_data = device->device_data;
struct vfio_unbound_dev *unbound;
unsigned int i = 0;
+   long ret;
+   bool interrupted = false;
 
/*
 * The group exists so long as we have a device reference.  Get
@@ -755,9 +757,22 @@ void *vfio_del_group_dev(struct device *dev)
 
vfio_device_put(device);
 
-   } while (wait_event_interruptible_timeout(vfio.release_q,
- !vfio_dev_present(group, dev),
- HZ * 10) <= 0);
+   if (interrupted) {
+   ret = wait_event_timeout(vfio.release_q,
+   !vfio_dev_present(group, dev), HZ * 10);
+   } else {
+   ret = wait_event_interruptible_timeout(vfio.release_q,
+   !vfio_dev_present(group, dev), HZ * 10);
+   if (ret == -ERESTARTSYS) {
+   interrupted = true;
+   dev_warn(dev,
+"Device is currently in use, task"
+" \"%s\" (%d) "
+"blocked until device is released",
+current->comm, task_pid_nr(current));
+   }
+   }
+   } while (ret <= 0);
 
vfio_group_put(group);
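The control flow of the patched loop can be modeled in userspace to see why the once-only interruptible wait stops the spin. Here fake_wait() and wait_for_release() are stand-ins for wait_event_interruptible_timeout() and the driver loop; the stub returns -ERESTARTSYS once to mimic a pending signal that would otherwise make every subsequent interruptible timeout fire instantly:

```c
#define ERESTARTSYS 512	/* kernel-internal errno, not in userspace <errno.h> */

static int signal_pending_stub = 1;	/* pretend a signal is pending */

/* Stand-in for wait_event_interruptible_timeout(): an interruptible
 * wait is cut short by the pending "signal"; otherwise the condition
 * is reported as having become true. */
static long fake_wait(int interruptible)
{
	if (interruptible && signal_pending_stub)
		return -ERESTARTSYS;
	return 1;
}

/* Model of the patched loop: switch to the non-interruptible wait
 * after the first interruption, warning exactly once.  Returns the
 * number of warnings issued. */
static int wait_for_release(void)
{
	long ret;
	int interrupted = 0, warnings = 0;

	do {
		if (interrupted) {
			ret = fake_wait(0);
		} else {
			ret = fake_wait(1);
			if (ret == -ERESTARTSYS) {
				interrupted = 1;
				warnings++;	/* dev_warn() in the driver */
			}
		}
	} while (ret <= 0);
	return warnings;
}
```

The model terminates with a single warning whether or not a signal was pending, which is the intended "interruptible-once" behavior.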
 



Re: [PATCH] sysfs: tightened sysfs permission checks

2015-05-01 Thread Rusty Russell
Gobinda Charan Maji  writes:
> There were some inconsistency in restriction to VERIFY_OCTAL_PERMISSIONS().
> Previously the test was "User perms >= group perms >= other perms". The
> permission field of User, Group or Other consists of three bits. LSB is
> EXECUTE permission, MSB is READ permission and the middle bit is WRITE
> permission. But logically WRITE is "more privileged" than READ.
>
> Say for example, permission value is "0430". Here User has only READ
> permission whereas Group has both WRITE and EXECUTE permission.
>
> So, the checks could be tightened and the tests are separated to
> USER_READABLE >= GROUP_READABLE >= OTHER_READABLE,
> USER_WRITABLE >= GROUP_WRITABLE and OTHER_WRITABLE is not permitted.
>
> Signed-off-by: Gobinda Charan Maji 

Thanks, applied!

Cheers,
Rusty.

> ---
>  include/linux/kernel.h | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 3a5b48e..cd54b35 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -818,13 +818,15 @@ static inline void ftrace_dump(enum ftrace_dump_mode 
> oops_dump_mode) { }
>  #endif
>  
>  /* Permissions on a sysfs file: you didn't miss the 0 prefix did you? */
> -#define VERIFY_OCTAL_PERMISSIONS(perms)				\
> -	(BUILD_BUG_ON_ZERO((perms) < 0) +				\
> -	 BUILD_BUG_ON_ZERO((perms) > 0777) +				\
> -	 /* User perms >= group perms >= other perms */			\
> -	 BUILD_BUG_ON_ZERO(((perms) >> 6) < (((perms) >> 3) & 7)) +	\
> -	 BUILD_BUG_ON_ZERO((((perms) >> 3) & 7) < ((perms) & 7)) +	\
> -	 /* Other writable?  Generally considered a bad idea. */	\
> -	 BUILD_BUG_ON_ZERO((perms) & 2) +				\
> +#define VERIFY_OCTAL_PERMISSIONS(perms)				\
> +	(BUILD_BUG_ON_ZERO((perms) < 0) +				\
> +	 BUILD_BUG_ON_ZERO((perms) > 0777) +				\
> +	 /* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */	\
> +	 BUILD_BUG_ON_ZERO((((perms) >> 6) & 4) < (((perms) >> 3) & 4)) + \
> +	 BUILD_BUG_ON_ZERO((((perms) >> 3) & 4) < ((perms) & 4)) +	\
> +	 /* USER_WRITABLE >= GROUP_WRITABLE */				\
> +	 BUILD_BUG_ON_ZERO((((perms) >> 6) & 2) < (((perms) >> 3) & 2)) + \
> +	 /* OTHER_WRITABLE?  Generally considered a bad idea. */	\
> +	 BUILD_BUG_ON_ZERO((perms) & 2) +				\
>  	 (perms))
>  #endif
> -- 
> 1.8.1.4
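The tightened conditions can be exercised at runtime with an illustrative helper. ok_perms() below simply mirrors the BUILD_BUG_ON_ZERO() tests as plain booleans so the effect on a value like 0430 is visible; the kernel macro itself rejects bad values at compile time:

```c
/* Returns 1 iff perms passes the tightened checks from the patch. */
static int ok_perms(unsigned int perms)
{
	if (perms > 0777)
		return 0;
	/* USER_READABLE >= GROUP_READABLE >= OTHER_READABLE */
	if ((((perms) >> 6) & 4) < (((perms) >> 3) & 4))
		return 0;
	if ((((perms) >> 3) & 4) < ((perms) & 4))
		return 0;
	/* USER_WRITABLE >= GROUP_WRITABLE */
	if ((((perms) >> 6) & 2) < (((perms) >> 3) & 2))
		return 0;
	/* OTHER_WRITABLE?  Generally considered a bad idea. */
	if (perms & 2)
		return 0;
	return 1;
}
```

0644 and 0444 pass; 0430 now fails (group writable but user not), as does 0666 (other writable).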


RE: [HPDD-discuss] [PATCH 2/11] Staging: lustre: fld: Use kzalloc and kfree

2015-05-01 Thread Simmons, James A.
>> Yes the LARGE functions do the switching. I was expecting also patches to 
>> remove the 
>> OBD_ALLOC_LARGE functions as well which is not the case here.  I do have one 
>> question still. The
>> macro __OBD_MALLOC_VERBOSE allowed the ability to simulate memory allocation 
>> failures at
>> a certain percentage rate. Does something exist in the kernel to duplicate 
>> that functionality?
>> Once these macros are gone we lose the ability to simulate high memory 
>> allocation failures.
>
>Yes, there are things like https://lkml.org/lkml/2014/12/25/64
>So I think the API is even richer compared to what our old wrapper code
>was able to do.

We should look to integrating that into the test suite.


Re: [PATCH] staging: media: omap4iss: Constify platform_device_id

2015-05-01 Thread Laurent Pinchart
Hi Krzysztof,

Thank you for the patch.

On Saturday 02 May 2015 00:43:07 Krzysztof Kozlowski wrote:
> The platform_device_id is not modified by the driver and core uses it as
> const.
> 
> Signed-off-by: Krzysztof Kozlowski 

Acked-by: Laurent Pinchart 

and applied to my tree. I'll send a pull request for v4.2.

> ---
>  drivers/staging/media/omap4iss/iss.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/media/omap4iss/iss.c
> b/drivers/staging/media/omap4iss/iss.c index e0ad5e520e2d..68867f286afd
> 100644
> --- a/drivers/staging/media/omap4iss/iss.c
> +++ b/drivers/staging/media/omap4iss/iss.c
> @@ -1478,7 +1478,7 @@ static int iss_remove(struct platform_device *pdev)
>   return 0;
>  }
> 
> -static struct platform_device_id omap4iss_id_table[] = {
> +static const struct platform_device_id omap4iss_id_table[] = {
>   { "omap4iss", 0 },
>   { },
>  };

-- 
Regards,

Laurent Pinchart



RE: [HPDD-discuss] [PATCH 2/11] Staging: lustre: fld: Use kzalloc and kfree

2015-05-01 Thread Simmons, James A.
>> Yes the LARGE functions do the switching. I was expecting also patches to 
>> remove the 
>> OBD_ALLOC_LARGE functions as well which is not the case here.  I do have one 
>> question still. The
>> macro __OBD_MALLOC_VERBOSE allowed the ability to simulate memory allocation 
>> failures at
>> a certain percentage rate. Does something exist in the kernel to duplicate 
>> that functionality?
>
>Yes, no need for lustre to duplicate yet-another-thing the kernel
>already provides :)

The reason for this is that libcfs was written 10+ years ago, before Linux
had such nice features. At that time it was needed to fill the missing
gaps, which is no longer the case. Libcfs is really showing its age :-)


Re: [PATCH 0/1] speeding up cpu_up()

2015-05-01 Thread Borislav Petkov
On Fri, May 01, 2015 at 02:42:39PM -0700, Len Brown wrote:
> On Mon, Apr 20, 2015 at 12:15 AM, Ingo Molnar  wrote:
> 
> > So instead of playing games with an ancient delay, I'd suggest we
> > install the 10 msec INIT assertion wait as a platform quirk instead,
> > and activate it for all CPUs/systems that we think might need it, with
> > a sufficiently robust and future-proof quirk cutoff condition.
> >
> > New systems won't have the quirk active and thus won't have to have
> > this delay configurable either.
> 
> Okay, at this time, I think the quirk would apply to:
> 
> 1. Intel family 5 (original pentium) -- some may actually need the quirk
> 2. Intel family F (pentium4) -- mostly b/c I don't want to bother
> finding/testing p4
> 3. All AMD (happy to narrow down, if somebody can speak for AMD)

Aravind and I could probably test on a couple of AMD boxes to narrow down.

@Aravind, see here:

https://lkml.kernel.org/r/87d69aab88c14d65ae1e7be55050d1b689b59b4b.1429402494.git.len.br...@intel.com

You could ask around whether a timeout is needed between the assertion
and deassertion of INIT done by the BSP when booting other cores.

If not, we probably should convert, at least modern AMD machines, to the
no-delay default.

> I'd keep the cmdline override, in case we break something,
> or somebody wants to optimize/test.  (Though I'll update units to
> usec, rather than msec.,
> so we can go below 1ms without going to 0)
> I don't think we need the config option, just a #define to document the quirk.
> 
> What do you think?
> 
> Len Brown, Intel Open Source Technology Center
> 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH] ktime: Fix ktime_divns to do signed division

2015-05-01 Thread John Stultz
Bah. I forgot to add [RFC] to the subject. This patch isn't yet ready
for submission; I just wanted to get some initial feedback on it.

thanks
-john

On Fri, May 1, 2015 at 3:41 PM, John Stultz  wrote:
> It was noted that the 32bit implementation of ktime_divns
> was doing unsigned division and didn't properly handle
> negative values.
>
> This patch fixes the problem by checking and preserving
> the sign bit, and then reapplying it if appropriate
> after the division.
>
> Unfortunately there is some duplication since we have
> the optimized version for constant 32bit divider. I
> was considering reworking the __ktime_divns helper
> to simplify the sign-handling logic, but then it
> would likely just be a s64/s64 divide, and probably
> should be more generic.
>
> Thoughts?
>
> Nicolas also notes that the ktime_divns() function
> breaks if someone passes in a negative divisor as
> well. This patch doesn't yet address that issue.
>
> Cc: Nicolas Pitre 
> Cc: Thomas Gleixner 
> Cc: Josh Boyer 
> Cc: One Thousand Gnomes 
> Reported-by: Trevor Cordes 
> Signed-off-by: John Stultz 
> ---
>  include/linux/ktime.h | 12 ++--
>  kernel/time/hrtimer.c | 11 +--
>  2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index 5fc3d10..d947263 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -166,12 +166,20 @@ static inline bool ktime_before(const ktime_t cmp1, 
> const ktime_t cmp2)
>  }
>
>  #if BITS_PER_LONG < 64
> -extern u64 __ktime_divns(const ktime_t kt, s64 div);
> +extern s64 __ktime_divns(const ktime_t kt, s64 div);
>  static inline u64 ktime_divns(const ktime_t kt, s64 div)
>  {
> if (__builtin_constant_p(div) && !(div >> 32)) {
> -   u64 ns = kt.tv64;
> +   s64 ns = kt.tv64;
> +   int neg = 0;
> +
> +   if (ns < 0) {
> +   neg = 1;
> +   ns = -ns;
> +   }
> do_div(ns, div);
> +   if (neg)
> +   ns = -ns;
> return ns;
> } else {
> return __ktime_divns(kt, div);
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 76d4bd9..4c1b294 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -266,12 +266,17 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned 
> long *flags)
>  /*
>   * Divide a ktime value by a nanosecond value
>   */
> -u64 __ktime_divns(const ktime_t kt, s64 div)
> +s64 __ktime_divns(const ktime_t kt, s64 div)
>  {
> -   u64 dclc;
> +   s64 dclc;
> int sft = 0;
> +   int neg = 0;
>
> dclc = ktime_to_ns(kt);
> +   if (dclc < 0) {
> +   neg = 1;
> +   dclc = -dclc;
> +   }
> /* Make sure the divisor is less than 2^32: */
> while (div >> 32) {
> sft++;
> @@ -279,6 +284,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
> }
> dclc >>= sft;
> do_div(dclc, (unsigned long) div);
> +   if (neg)
> +   dclc = -dclc;
>
> return dclc;
>  }
> --
> 1.9.1
>
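The failure mode is easy to reproduce in userspace: do_div() performs an unsigned 64-by-32 division, so a negative nanosecond count reinterpreted as u64 divides into a huge positive quotient, while the strip-sign / divide / reapply-sign sequence from the patch gives the right answer. udiv64() and div_ns_signed() below are stand-ins, not kernel functions:

```c
#include <stdint.h>

/* Stand-in for do_div(): plain unsigned 64-by-32 division (do_div()
 * also yields the remainder, which is unused here). */
static uint64_t udiv64(uint64_t n, uint32_t base)
{
	return n / base;
}

/* The patch's approach: strip the sign, divide unsigned, reapply. */
static int64_t div_ns_signed(int64_t ns, uint32_t div)
{
	int neg = 0;

	if (ns < 0) {
		neg = 1;
		ns = -ns;
	}
	ns = (int64_t)udiv64((uint64_t)ns, div);
	return neg ? -ns : ns;
}
```

Dividing -1 second by NSEC_PER_MSEC this way yields -1000, whereas the raw unsigned division yields a value near 2^64 / 10^6.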


Re: [PATCH 15/17] ACPICA/ARM: ACPI 5.1: Update for GTDT table changes.

2015-05-01 Thread Timur Tabi
On Tue, Jul 29, 2014 at 11:21 PM, Lv Zheng  wrote:
> From: Tomasz Nowicki 
>
> New fields and new subtables. Tomasz Nowicki.
> tomasz.nowi...@linaro.org
>
> Signed-off-by: Tomasz Nowicki 
> Signed-off-by: Hanjun Guo 
> Signed-off-by: Bob Moore 
> Signed-off-by: Lv Zheng 

Hi, I know this patch is old, but something confuses me about it:

> +/* Common GTDT subtable header */
> +
> +struct acpi_gtdt_header {
> +   u8 type;
> +   u16 length;
> +};

I'm trying to write a function that parses the watchdog structure
(acpi_gtdt_watchdog).  The first entry in that structure is
acpi_gtdt_header.  Looking at the ACPI specification, I see that this
is correct: the type is one byte, and the length is two bytes.

However, this means that I cannot use acpi_parse_entries() to parse
the watchdog subtable:

int __init
acpi_parse_entries(char *id, unsigned long table_size,
acpi_tbl_entry_handler handler,
struct acpi_table_header *table_header,
int entry_id, unsigned int max_entries)

acpi_tbl_entry_handler takes an acpi_subtable_header as its first
parameter.  However, that structure looks like this:

struct acpi_subtable_header {
u8 type;
u8 length;
};

This is not compatible, so I'm confused now.  How do I properly parse
the watchdog subtable, if I cannot use acpi_parse_entries?

For context, here is my patch:

http://www.spinics.net/lists/linux-watchdog/msg06240.html

Scroll down to function arm_sbsa_wdt_parse_gtdt().  The typecast in
first line is invalid:

+struct acpi_gtdt_watchdog *wdg = (struct acpi_gtdt_watchdog *)header;

because of the mismatch.  I don't know how to fix this.
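One way around the mismatch is to walk the GTDT subtables by hand rather than through acpi_parse_entries(), using the 3-byte header (u8 type, u16 length) the ACPI 5.1 table actually defines. A userspace sketch of that walk (the structure layout follows the spec; the buffer in the test is a made-up example, not real firmware data, and a little-endian host is assumed since ACPI tables are little-endian):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* GTDT subtable header per ACPI 5.1: length is 16 bits, unlike the
 * 8-bit length in the generic acpi_subtable_header. */
struct gtdt_header {
	uint8_t type;
	uint16_t length;
} __attribute__((packed));

/* Count subtables of a given type in a packed GTDT subtable region,
 * advancing by each entry's own 16-bit length. */
static int gtdt_count_type(const uint8_t *p, size_t len, uint8_t type)
{
	int count = 0;
	size_t off = 0;
	struct gtdt_header h;

	while (off + sizeof(h) <= len) {
		memcpy(&h, p + off, sizeof(h));
		if (h.length < sizeof(h))
			break;		/* malformed entry, stop */
		if (h.type == type)
			count++;
		off += h.length;
	}
	return count;
}
```

The same loop can hand each matching entry to a watchdog-specific handler instead of counting, which sidesteps the acpi_tbl_entry_handler typecast entirely.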


[PATCH] ktime: Fix ktime_divns to do signed division

2015-05-01 Thread John Stultz
It was noted that the 32bit implementation of ktime_divns
was doing unsigned division and didn't properly handle
negative values.

This patch fixes the problem by checking and preserving
the sign bit, and then reapplying it if appropriate
after the division.

Unfortunately there is some duplication, since we have
the optimized version for a constant 32bit divisor. I
was considering reworking the __ktime_divns helper
to simplify the sign-handling logic, but then it
would likely just be an s64/s64 divide, and probably
should be more generic.

Thoughts?

Nicolas also notes that the ktime_divns() function
breaks if someone passes in a negative divisor as
well. This patch doesn't yet address that issue.
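
The check-negate-divide-renegate step can be modeled in plain userspace
C. In this sketch the function name is made up, and ordinary unsigned
64-bit division stands in for do_div(), which likewise only handles an
unsigned dividend:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the fix: strip the sign of the dividend, do the unsigned
 * division, then reapply the sign.  Plain u64 division stands in for
 * do_div(); a negative divisor is (deliberately) still not handled,
 * matching the note above.
 */
static int64_t divns_signed(int64_t ns, int64_t div)
{
	int neg = 0;
	uint64_t uns;

	if (ns < 0) {
		neg = 1;
		ns = -ns;	/* undefined for INT64_MIN; ignored here */
	}
	uns = (uint64_t)ns / (uint64_t)div;
	return neg ? -(int64_t)uns : (int64_t)uns;
}
```

Without the sign handling, the old code would reinterpret a negative
ktime as a huge unsigned value and return garbage.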

Cc: Nicolas Pitre 
Cc: Thomas Gleixner 
Cc: Josh Boyer 
Cc: One Thousand Gnomes 
Reported-by: Trevor Cordes 
Signed-off-by: John Stultz 
---
 include/linux/ktime.h | 12 ++--
 kernel/time/hrtimer.c | 11 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 5fc3d10..d947263 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -166,12 +166,20 @@ static inline bool ktime_before(const ktime_t cmp1, const ktime_t cmp2)
 }
 
 #if BITS_PER_LONG < 64
-extern u64 __ktime_divns(const ktime_t kt, s64 div);
+extern s64 __ktime_divns(const ktime_t kt, s64 div);
 static inline u64 ktime_divns(const ktime_t kt, s64 div)
 {
if (__builtin_constant_p(div) && !(div >> 32)) {
-   u64 ns = kt.tv64;
+   s64 ns = kt.tv64;
+   int neg = 0;
+
+   if (ns < 0) {
+   neg = 1;
+   ns = -ns;
+   }
do_div(ns, div);
+   if (neg)
+   ns = -ns;
return ns;
} else {
return __ktime_divns(kt, div);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 76d4bd9..4c1b294 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -266,12 +266,17 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
 /*
  * Divide a ktime value by a nanosecond value
  */
-u64 __ktime_divns(const ktime_t kt, s64 div)
+s64 __ktime_divns(const ktime_t kt, s64 div)
 {
-   u64 dclc;
+   s64 dclc;
int sft = 0;
+   int neg = 0;
 
dclc = ktime_to_ns(kt);
+   if (dclc < 0) {
+   neg = 1;
+   dclc = -dclc;
+   }
/* Make sure the divisor is less than 2^32: */
while (div >> 32) {
sft++;
@@ -279,6 +284,8 @@ u64 __ktime_divns(const ktime_t kt, s64 div)
}
dclc >>= sft;
do_div(dclc, (unsigned long) div);
+   if (neg)
+   dclc = -dclc;
 
return dclc;
 }
-- 
1.9.1



[PATCH v1 1/1] iio: ltr501: Add light channel support

2015-05-01 Thread Kuppuswamy Sathyanarayanan
Added support to calculate the lux value from the visible
and IR spectrum ADC count values. Also added an IIO_LIGHT
channel to let the user read the lux value directly
from the device using the illuminance input ABI.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
 drivers/iio/light/ltr501.c | 71 +-
 1 file changed, 64 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/light/ltr501.c b/drivers/iio/light/ltr501.c
index ca4bf47..242cf20 100644
--- a/drivers/iio/light/ltr501.c
+++ b/drivers/iio/light/ltr501.c
@@ -66,6 +66,9 @@
 
 #define LTR501_REGMAP_NAME "ltr501_regmap"
 
+#define LTR501_LUX_CONV(vis_coeff, vis_data, ir_coeff, ir_data) \
+   ((vis_coeff * vis_data) - (ir_coeff * ir_data))
+
 static const int int_time_mapping[] = {10, 5, 20, 40};
 
 static const struct reg_field reg_field_it =
@@ -298,6 +301,29 @@ static int ltr501_ps_read_samp_period(struct ltr501_data *data, int *val)
return IIO_VAL_INT;
 }
 
+/* IR and visible spectrum coeff's are given in data sheet */
+static unsigned long ltr501_calculate_lux(u16 vis_data, u16 ir_data)
+{
+   unsigned long ratio, lux;
+
+   if (vis_data == 0)
+   return 0;
+
+   /* multiply numerator by 100 to avoid handling ratio < 1 */
+   ratio = DIV_ROUND_UP(ir_data * 100, ir_data + vis_data);
+
+   if (ratio < 45)
+   lux = LTR501_LUX_CONV(1774, vis_data, -1105, ir_data);
+   else if (ratio >= 45 && ratio < 64)
+   lux = LTR501_LUX_CONV(3772, vis_data, 1336, ir_data);
+   else if (ratio >= 64 && ratio < 85)
+   lux = LTR501_LUX_CONV(1690, vis_data, 169, ir_data);
+   else
+   lux = 0;
+
+   return lux / 1000;
+}
+
 static int ltr501_drdy(struct ltr501_data *data, u8 drdy_mask)
 {
int tries = 100;
@@ -548,11 +574,24 @@ static const struct iio_event_spec ltr501_pxs_event_spec[] = {
.num_event_specs = _evsize,\
 }
 
+#define LTR501_LIGHT_CHANNEL(_idx) { \
+   .type = IIO_LIGHT, \
+   .info_mask_separate = BIT(IIO_CHAN_INFO_PROCESSED), \
+   .scan_index = 0, \
+   .scan_type = { \
+   .sign = 'u', \
+   .realbits = 16, \
+   .storagebits = 16, \
+   .endianness = IIO_CPU, \
+   }, \
+}
+
 static const struct iio_chan_spec ltr501_channels[] = {
-   LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
+   LTR501_LIGHT_CHANNEL(0),
+   LTR501_INTENSITY_CHANNEL(1, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
 ltr501_als_event_spec,
 ARRAY_SIZE(ltr501_als_event_spec)),
-   LTR501_INTENSITY_CHANNEL(1, LTR501_ALS_DATA1, IIO_MOD_LIGHT_IR,
+   LTR501_INTENSITY_CHANNEL(2, LTR501_ALS_DATA1, IIO_MOD_LIGHT_IR,
 BIT(IIO_CHAN_INFO_SCALE) |
 BIT(IIO_CHAN_INFO_INT_TIME) |
 BIT(IIO_CHAN_INFO_SAMP_FREQ),
@@ -562,7 +601,7 @@ static const struct iio_chan_spec ltr501_channels[] = {
.address = LTR501_PS_DATA,
.info_mask_separate = BIT(IIO_CHAN_INFO_RAW) |
BIT(IIO_CHAN_INFO_SCALE),
-   .scan_index = 2,
+   .scan_index = 3,
.scan_type = {
.sign = 'u',
.realbits = 11,
@@ -572,19 +611,20 @@ static const struct iio_chan_spec ltr501_channels[] = {
.event_spec = ltr501_pxs_event_spec,
.num_event_specs = ARRAY_SIZE(ltr501_pxs_event_spec),
},
-   IIO_CHAN_SOFT_TIMESTAMP(3),
+   IIO_CHAN_SOFT_TIMESTAMP(4),
 };
 
 static const struct iio_chan_spec ltr301_channels[] = {
-   LTR501_INTENSITY_CHANNEL(0, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
+   LTR501_LIGHT_CHANNEL(0),
+   LTR501_INTENSITY_CHANNEL(1, LTR501_ALS_DATA0, IIO_MOD_LIGHT_BOTH, 0,
 ltr501_als_event_spec,
 ARRAY_SIZE(ltr501_als_event_spec)),
-   LTR501_INTENSITY_CHANNEL(1, LTR501_ALS_DATA1, IIO_MOD_LIGHT_IR,
+   LTR501_INTENSITY_CHANNEL(2, LTR501_ALS_DATA1, IIO_MOD_LIGHT_IR,
 BIT(IIO_CHAN_INFO_SCALE) |
 BIT(IIO_CHAN_INFO_INT_TIME) |
 BIT(IIO_CHAN_INFO_SAMP_FREQ),
 NULL, 0),
-   IIO_CHAN_SOFT_TIMESTAMP(2),
+   IIO_CHAN_SOFT_TIMESTAMP(3),
 };
 
 static int ltr501_read_raw(struct iio_dev *indio_dev,
@@ -596,6 +636,23 @@ static int ltr501_read_raw(struct iio_dev *indio_dev,
int ret, i;
 
switch (mask) {
+   case IIO_CHAN_INFO_PROCESSED:
+   if (iio_buffer_enabled(indio_dev))
+   return -EBUSY;
+
+   switch (chan->type) {
+   case IIO_LIGHT:
+   mutex_lock(&data->lock_als);
+   ret = 
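
The ratio-banded conversion added by this patch can be exercised
standalone. In the sketch below, the coefficients and ratio bands are
the ones from the patch; the signed return type and the open-coded
DIV_ROUND_UP are illustrative stand-ins:

```c
#include <assert.h>

/* Same coefficients and ratio bands as the patch; everything else is
 * a userspace stand-in. */
#define LUX_CONV(vis_coeff, vis, ir_coeff, ir) \
	(((long)(vis_coeff) * (long)(vis)) - ((long)(ir_coeff) * (long)(ir)))

static long calc_lux(unsigned int vis_data, unsigned int ir_data)
{
	unsigned long ratio;
	long lux;

	if (vis_data == 0)
		return 0;

	/* DIV_ROUND_UP(ir_data * 100, ir_data + vis_data), open-coded:
	 * the numerator is scaled by 100 so a ratio below 1 stays
	 * representable in integer math */
	ratio = (ir_data * 100UL + ir_data + vis_data - 1) /
		(ir_data + vis_data);

	if (ratio < 45)
		lux = LUX_CONV(1774, vis_data, -1105, ir_data);
	else if (ratio < 64)
		lux = LUX_CONV(3772, vis_data, 1336, ir_data);
	else if (ratio < 85)
		lux = LUX_CONV(1690, vis_data, 169, ir_data);
	else
		lux = 0;

	return lux / 1000;	/* coefficients are scaled by 1000 */
}
```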

Re: Regression: Disk corruption with dm-crypt and kernels >= 4.0

2015-05-01 Thread Abelardo Ricart III
On Fri, 2015-05-01 at 17:17 -0400, Mike Snitzer wrote:
> On Fri, May 01 2015 at 12:37am -0400,
> Abelardo Ricart III  wrote:
> 
> > I made sure to run a completely vanilla kernel when testing why I was 
> > suddenly
> > seeing some nasty libata errors with all kernels >= v4.0. Here's a snippet:
> > 
> > >8
> > [  165.592136] ata5.00: exception Emask 0x60 SAct 0x7000 SErr 0x800 action 
> > 0x6
> > frozen
> > [  165.592140] ata5.00: irq_stat 0x2000, host bus error
> > [  165.592143] ata5: SError: { HostInt }
> > [  165.592145] ata5.00: failed command: READ FPDMA QUEUED
> > [  165.592149] ata5.00: cmd 60/08:60:a0:0d:89/00:00:07:00:00/40 tag 12 ncq 
> > 4096
> > in
> > res 40/00:74:40:58:5d/00:00:00:00:00/40 Emask 0x60
> > (host bus error)
> > [  165.592151] ata5.00: status: { DRDY }
> > >8
> > 
> > After a few dozen of these errors, I'd suddenly find my system in read-only
> > mode with corrupted files throughout my encrypted filesystems (seemed like
> > either a read or a write would corrupt a file, though I could be mistaken). 
> > I
> > decided to do a git bisect with a random read-write-sync test to narrow down
> > the culprit, which turned out to be this commit (part of a series):
> > 
> > # first bad commit: [cf2f1abfbd0dba701f7f16ef619e4d2485de3366] dm crypt: 
> > don't
> > allocate pages for a partial request
> > 
> > Just to be sure, I created a patch to revert the entire nine patch series 
> > that
> > commit belonged to... and the bad behavior disappeared. I've now been 
> > running
> > kernel 4.0 for a few days without issue, and went so far as to stress test 
> > my
> > poor SSD for a few hours to be 100% positive.
> > 
> > Here's some more info on my setup.
> > 
> > >8
> > $ lsblk -f
> > NAME FSTYPE  LABEL MOUNTPOINT
> > sda  
> > ├─sda1   vfat  /boot/EFI
> > ├─sda2   ext4  /boot
> > └─sda3   LVM2_member
> >   ├─SSD-root crypto_LUKS
> >   │ └─root   f2fs  /
> >   └─SSD-home crypto_LUKS
> > └─home   f2fs  /home
> > 
> > $ cat /proc/cmdline
> > BOOT_IMAGE=/vmlinuz-linux-memnix cryptdevice=/dev/SSD/root:root:allow
> > -discards
> > root=/dev/mapper/root acpi_osi=Linux security=tomoyo
> > TOMOYO_trigger=/usr/lib/systemd/systemd intel_iommu=on
> > modprobe.blacklist=nouveau rw quiet
> > 
> > $ cat /etc/lvm/lvm.conf | grep "issue_discards"
> > issue_discards = 1
> > >8
> > 
> > If there's anything else I can do to help diagnose the underlying problem, 
> > I'm
> > more than willing.
> 
> The patchset in question was tested quite heavily so this is a
> surprising report.  I'm noticing you are opting in to dm-crypt discard
> support.  Have you tested without discards enabled?

I've disabled discards universally and rebuilt a vanilla kernel. After running
my heavy read-write-sync scripts, everything seems to be working fine now. I
suppose this could be something that used to fail silently before, but now
produces bad behavior? I seem to remember having something in my message log
about "discards not supported on this device" when running with it enabled
before.


Re: [PATCH] x86: Optimize variable_test_bit()

2015-05-01 Thread Vladimir Makarov



On 01/05/15 04:49 PM, Linus Torvalds wrote:

On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov  wrote:

   GCC RA is a major reason to prohibit output operands for asm goto.

Hmm.. Thinking some more about it, I think that what would actually
work really well at least for the kernel is:

(a) allow *memory* operands (ie "=m") as outputs and having them be
meaningful even at any output labels (obviously with the caveat that
the asm instructions that write to memory would have to happen before
the branch ;)

This covers the somewhat common case of having magic instructions that
result in conditions that can't be tested at a C level. Things like
"bit clear and test" on x86 (with or without the lock).

  (b) allow other operands to be meaningful only for the fallthrough case.

From a register allocation standpoint, these should be the easy cases.
(a) doesn't need any register allocation of the output (only on the
input to set up the effective address of the memory location), and (b)
would explicitly mean that an "asm goto" would leave any non-memory
outputs undefined in any of the goto cases, so from a RA standpoint it
ends up being equivalent to a non-goto asm..

Thanks for explaining what you need in the most common case.

A big part of GCC RA (at least the local register allocators -- the
reload pass and LRA), besides assigning hard registers to pseudos, is
making transformations to satisfy insn constraints.  If there are not
enough hard registers, a pseudo can be allocated to a stack slot, and if
an insn using the pseudo needs a hard register, a load and/or store
should be generated before/after the insn.  And the problem for both the
old RA (reload pass) and the new RA (LRA) is that they were not designed
to put new insns after an insn changing control flow.  Assigning hard
registers itself is not an issue for the asm goto case.


If I understood you correctly, you assume that just permitting =m will
make GCC generate the correct code. Unfortunately, it is more
complicated.  The operand may not be memory, or may be memory that does
not satisfy the memory constraint 'm'.  So insns moving memory that
satisfies 'm' into the output operand location might still be necessary
after the asm goto.


We could make asm goto semantics require that a user provide
memory for such an output operand (e.g. a pointer dereference in your
case) and generate an error otherwise.  By the way, the same could be
done for an output *register* operand: to avoid the error, the user
should use a local register variable (a GCC extension) as the operand.
But it might be a bad idea from a code performance point of view.


Unfortunately, the operand can be substituted by an equivalent value
during different transformations, and even if a user thinks it will be
memory before RA, that might be wrong.  Although I believe there are
some cases where we can be sure that it will be memory (e.g.
dereferencing a pointer which is a function argument and is not used
anywhere else in the function).  Still, it makes asm goto semantics
complicated imho.


We could prevent the equivalent-value substitution for an output memory
operand of asm goto through all the optimizations, but that is probably
an even harder task than implementing output reloads in the *reload*
pass (it is a 28-year-old pass with so many changes during its life that
practically nobody can understand it well now or change it w/o
introducing a new bug).  As for LRA, as I wrote, implementing output
reloads is a doable task.



Hmm?

So an example of something that the kernel does, and which wants to
have an output register, is a load from user space that can
fault. When it faults, we obviously simply don't *have* an actual
result, and we return an error. But for the successful fallthrough
case, we get a value in a register.

I'd love to be able to write it as (this is simplified, and doesn't
worry about all the different access sizes, or the "stac/clac"
sequence to enable user accesses on modern Intel CPU's):

 asm goto(
 "1:"
 "\tmovl %0,%1\n"
 _ASM_EXTABLE(1b,%l[error])
 : "=r" (val)
 : "m" (*userptr)
 : : error);

where that "_ASM_EXTABLE()" is our magic macro for generating an
exception entry for that instruction, so that if the load takes an
exception, it will instead go to the "error" label.

But if it goes to the error label, the "val" output register really
doesn't contain anything, so we wouldn't even *want* gcc to try to do
any register allocation for the "jump to label from assembly" case.

So at least for one of the major cases that I'd like to use "asm goto"
with an output, I actually don't *want* any register allocation for
anything but the fallthrough case. And I suspect that's a
not-too-uncommon pattern - it's probably often about error handling.


As I wrote already if we implement output reloads after the control flow 
insn, it does not matter what operand constraint should be (memory or 
register).  Implementing it only for fall-through case 

[PATCH 1/3] net: dsa: introduce module_switch_driver macro

2015-05-01 Thread Vivien Didelot
This commit introduces a new module_switch_driver macro, similar to
module_platform_driver and such, to reduce boilerplate when declaring
DSA switch drivers.

In order to use the module_driver macro, register_switch_driver needed
to be changed to return an int instead of void, so make it return 0.

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h | 13 -
 net/dsa/dsa.c |  4 +++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63b..927f16a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -11,6 +11,7 @@
 #ifndef __LINUX_NET_DSA_H
 #define __LINUX_NET_DSA_H
 
+#include <linux/device.h>
 #include 
 #include 
 #include 
@@ -304,8 +305,18 @@ struct dsa_switch_driver {
   unsigned char *addr, bool *is_static);
 };
 
-void register_switch_driver(struct dsa_switch_driver *type);
+int register_switch_driver(struct dsa_switch_driver *type);
 void unregister_switch_driver(struct dsa_switch_driver *type);
+
+/* module_switch_driver() - Helper macro for drivers that don't do anything
+ * special in module init/exit. This eliminates a lot of boilerplate. Each
+ * module may only use this macro once, and calling it replaces module_init()
+ * and module_exit()
+ */
+#define module_switch_driver(__switch_driver) \
+   module_driver(__switch_driver, register_switch_driver, \
+   unregister_switch_driver)
+
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev);
 
 static inline void *ds_to_priv(struct dsa_switch *ds)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index e6f6cc3..9630522 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -31,11 +31,13 @@ char dsa_driver_version[] = "0.1";
 static DEFINE_MUTEX(dsa_switch_drivers_mutex);
 static LIST_HEAD(dsa_switch_drivers);
 
-void register_switch_driver(struct dsa_switch_driver *drv)
+int register_switch_driver(struct dsa_switch_driver *drv)
 {
	mutex_lock(&dsa_switch_drivers_mutex);
	list_add_tail(&drv->list, &dsa_switch_drivers);
	mutex_unlock(&dsa_switch_drivers_mutex);
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(register_switch_driver);
 
-- 
2.3.7
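
The reason patch 1/3 changes register_switch_driver() to return int is
visible if one writes out what module_driver() stamps out. Here is a
userspace sketch of the same init/exit-pair pattern -- all names and
bodies are stand-ins, not the kernel's code:

```c
#include <assert.h>

/* Toy stand-ins for the DSA registration pair. */
struct switch_driver { const char *name; };

static int nregistered;

static int register_switch_driver(struct switch_driver *drv)
{
	(void)drv;
	nregistered++;
	return 0;	/* must return int: the generated init propagates it */
}

static void unregister_switch_driver(struct switch_driver *drv)
{
	(void)drv;
	nregistered--;
}

/*
 * Userspace analogue of the module_driver() pattern: generate an
 * init/exit pair around the register/unregister calls.  In the kernel
 * these would be wired into module_init()/module_exit(), which is why
 * the register function's int return value is required.
 */
#define module_switch_driver(drv) \
	static int drv##_init(void) \
	{ return register_switch_driver(&(drv)); } \
	static void drv##_exit(void) \
	{ unregister_switch_driver(&(drv)); }

static struct switch_driver demo_switch_driver = { .name = "demo" };
module_switch_driver(demo_switch_driver)
```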



[PATCH 3/3] net: dsa: mv88e6060: use module_switch_driver

2015-05-01 Thread Vivien Didelot
Use the module_switch_driver helper macro to declare the driver and thus
reduce boilerplate.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6060.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index c29aebe..c58d5c9 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -283,18 +283,7 @@ static struct dsa_switch_driver mv88e6060_switch_driver = {
.poll_link  = mv88e6060_poll_link,
 };
 
-static int __init mv88e6060_init(void)
-{
-   register_switch_driver(&mv88e6060_switch_driver);
-   return 0;
-}
-module_init(mv88e6060_init);
-
-static void __exit mv88e6060_cleanup(void)
-{
-   unregister_switch_driver(&mv88e6060_switch_driver);
-}
-module_exit(mv88e6060_cleanup);
+module_switch_driver(mv88e6060_switch_driver);
 
 MODULE_AUTHOR("Lennert Buytenhek ");
 MODULE_DESCRIPTION("Driver for Marvell 88E6060 ethernet switch chip");
-- 
2.3.7



[PATCH 2/3] net: dsa: sf2: use module_switch_driver

2015-05-01 Thread Vivien Didelot
Use the module_switch_driver helper macro to declare the driver and thus
reduce boilerplate.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/bcm_sf2.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index cedb572..2b438fb 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -1071,19 +1071,7 @@ static struct dsa_switch_driver bcm_sf2_switch_driver = {
.port_stp_update= bcm_sf2_sw_br_set_stp_state,
 };
 
-static int __init bcm_sf2_init(void)
-{
-   register_switch_driver(&bcm_sf2_switch_driver);
-
-   return 0;
-}
-module_init(bcm_sf2_init);
-
-static void __exit bcm_sf2_exit(void)
-{
-   unregister_switch_driver(&bcm_sf2_switch_driver);
-}
-module_exit(bcm_sf2_exit);
+module_switch_driver(bcm_sf2_switch_driver);
 
 MODULE_AUTHOR("Broadcom Corporation");
 MODULE_DESCRIPTION("Driver for Broadcom Starfighter 2 ethernet switch chip");
-- 
2.3.7



Re: [PATCH v3 0/2] clk: improve handling of orphan clocks

2015-05-01 Thread Heiko Stübner
On Friday, 1 May 2015, 13:52:47, Stephen Boyd wrote:
> On 05/01/15 12:59, Heiko Stübner wrote:
> > On Thursday, 30 April 2015, 17:19:01, Stephen Boyd wrote:
> >> On 04/22, Heiko Stuebner wrote:
> >>> Using orphan clocks can introduce strange behaviour as they don't have
> >>> rate information at all and also of course don't track
> >>> 
> >>> This v2/v3 takes into account suggestions from Stephen Boyd to not try
> >>> to
> >>> walk the clock tree at runtime but instead keep track of orphan states
> >>> on clock tree changes and making it mandatory for everybody from the
> >>> start as orphaned clocks should not be used at all.
> >>> 
> >>> 
> >>> This fixes an issue on most rk3288 platforms, where some soc-clocks
> >>> are supplied by a 32khz clock from an external i2c-chip which often
> >>> is only probed later in the boot process and maybe even after the
> >>> drivers using these soc-clocks like the tsadc temperature sensor.
> >>> In this case the driver using the clock should of course defer probing
> >>> until the clock is actually usable.
> >>> 
> >>> 
> >>> As this changes the behaviour for orphan clocks, it would of course
> >>> benefit from more testing than on my Rockchip boards. To keep the
> >>> recipent-list reasonable and not spam to much I selected one (the
> >>> topmost)
> >>> from the get_maintainer output of each drivers/clk entry.
> >>> Hopefully some will provide Tested-by-tags :-)
> >> 
> >>  I don't see any Tested-by: tags yet. I've
> >> put these two patches on a separate branch "defer-orphans" and
> >> pushed it to clk-next so we can give it some more exposure.
> >> 
> >> Unfortunately this doesn't solve the orphan problem for non-OF
> >> providers. What if we did the orphan check in __clk_create_clk()
> >> instead and returned an error pointer for orphans? I suspect that
> >> will solve all cases, right?
> > 
> > hmm, clk_register also uses __clk_create_clk, which in turn would prevent
> > registering orphan-clocks at all, I'd think.
> > As on my platform I'm dependent on orphan clocks (the soc-level clock gets
> > registered as part of the big clock controller way before the i2c-based
> > supplying clock), I'd rather not touch this :-).
> 
> Have no fear! We should just change clk_register() to call a
> __clk_create_clk() type function that doesn't check for orphan status.

ok :-D


> > Instead I guess we could hook it less deep into clk_get_sys, like in the
> > following patch?
> 
> It looks like it will work at least, but still I'd prefer to keep the
> orphan check contained to clk.c. How about this compile tested only patch?

I gave this a spin on my rk3288-firefly board. It still boots, the clock tree 
looks the same and it also still defers nicely in the scenario I needed it 
for. The implementation also looks nice - and of course much more compact than 
my check in two places :-) . I don't know if you want to put this as follow-up 
on top or fold it into the original orphan-check, so in any case

Tested-by: Heiko Stuebner 
Reviewed-by: Heiko Stuebner 


> This also brings up an existing problem with clk_unregister() where
> orphaned clocks are sitting out there useable by drivers when their
> parent is unregistered. That code could use some work to atomically
> switch all the orphaned clocks over to use the nodrv_ops.

Not sure I understand this correctly yet, but when these children get 
orphaned, switched to the clk_nodrv_ops, they won't get their original ops 
back if the parent reappears.

So I guess we would need to store the original ops in secondary property of 
struct clk_core and I guess simply bind the ops-switch to the orphan state 
update?


> 
> 8<-
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 30d45c657a07..1d23daa42dd2 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -2221,14 +2221,6 @@ static inline void clk_debug_unregister(struct
> clk_core *core) }
>  #endif
> 
> -static bool clk_is_orphan(const struct clk *clk)
> -{
> - if (!clk)
> - return false;
> -
> - return clk->core->orphan;
> -}
> -
>  /**
>   * __clk_init - initialize the data structures in a struct clk
>   * @dev: device initializing this clk, placeholder for now
> @@ -2420,15 +2412,11 @@ out:
>   return ret;
>  }
> 
> -struct clk *__clk_create_clk(struct clk_hw *hw, const char *dev_id,
> -  const char *con_id)
> +static struct clk *clk_hw_create_clk(struct clk_hw *hw, const char *dev_id,
> +  const char *con_id)
>  {
>   struct clk *clk;
> 
> - /* This is to allow this function to be chained to others */
> - if (!hw || IS_ERR(hw))
> - return (struct clk *) hw;
> -
>   clk = kzalloc(sizeof(*clk), GFP_KERNEL);
>   if (!clk)
>   return ERR_PTR(-ENOMEM);
> @@ -2445,6 +2433,19 @@ struct clk *__clk_create_clk(struct clk_hw *hw, const
> char *dev_id, return clk;
>  }
> 
> +struct clk *__clk_create_clk(struct clk_hw *hw, const 

Re: [GIT PULL] VFIO fixes for v4.1-rc2

2015-05-01 Thread Alex Williamson
On Fri, 2015-05-01 at 13:23 -0700, Linus Torvalds wrote:
> On Fri, May 1, 2015 at 11:48 AM, Alex Williamson
>  wrote:
> >
> > Ok.  It seemed like useful behavior to be able to provide some response
> > to the user in the event that a ->remove handler is blocked by a device
> > in-use and the user attempts to abort the action.
> 
> Well, that kind of notification *might* be useful, but at the cost of
> saying "somebody tried to send you a signal, so I am now telling you
> about it, and then deleting that signal, and you'll never know what it
> actually was"?
> 
> That's not useful, that's just wrong.

Yep, it was a bad idea.

> Now, what might in theory be useful - but I haven't actually seen
> anybody do anything like that - is to start out with an interruptible
> sleep, warn if you get interrupted, and then continue with an
> un-interruptible sleep (leaving the signal active).

I was considering doing exactly this.

> But even that sounds like a very special case, and I don't think
> anything has ever done that.
> 
> In general, our signal handling falls into three distinct categories:
> 
>  (a) interruptible (and you can cancel the operation and return "try again")
> 
>  (b) killable (you can cancel the operation, knowing that the
> requester will be killed and won't try again)
> 
>  (c) uninterruptible
> 
> where that (b) tends to be a special case of an operation that
> technically isn't really interruptible (because the ABI doesn't allow
> for retrying or error returns), but knowing that the caller will never
> see the error case because it's killed means that you can do it. The
> classic example of that is an NFS mount that is mounted "nointr" - you
> can't return EINTR for a read or a write (because that invalidates
> POSIX) but you want to let SIGKILL just kill the process in the middle
> when the network is hung.

I think we're in that (c) case unless we want to change our driver API
to allow driver remove callbacks to return error.  Killing the caller
doesn't really help the situation without being able to back out of the
remove path.  Killing the task with the device open would help, but
seems rather harsh.  I expect we eventually want to be able to escalate
to revoking the device from the user, but currently we only have a
notifier to request the device from cooperative users.  In the event of
an uncooperative user, we block, which can be difficult to figure out,
especially when we're dealing with SR-IOV devices and a PF unbind
implicitly induces a VF unbind.  The interruptible component here is
simply a logging mechanism which should have turned into an
"interruptible_once" rather than a signal flush.

I try to avoid vfio being a special case, but maybe in this instance
it's worthwhile.  If you have other suggestions, please let me know.
Thanks,

Alex


