date:20130120

Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP

2013-01-20 Thread Feng Tang

On Mon, Jan 21, 2013 at 02:27:29AM -0500, Chen Gong wrote:
> On Mon, Jan 21, 2013 at 02:38:41PM +0800, Feng Tang wrote:
> > Date:   Mon, 21 Jan 2013 14:38:41 +0800
> > From: Feng Tang 
> > To: Thomas Gleixner , John Stultz
> >  , Ingo Molnar , "H. Peter Anvin"
> >  , x...@kernel.org, Len Brown ,
> >  "Rafael J. Wysocki" ,
> >  linux-kernel@vger.kernel.org
> > Cc: Feng Tang 
> > Subject: [RFC PATCH 1/5] x86: Add cpu capability flag
> >  X86_FEATURE_TSC_S3_NOTSTOP
> > X-Mailer: git-send-email 1.7.9.5
> > 
> > On some new Intel Atom processors (Penwell and Cloverview), there is
> > a feature that the TSC won't stop in S3, i.e. the TSC value won't be
> > reset to 0 after resume. This feature makes TSC a more reliable
> > clocksource and could benefit the timekeeping code during system
> > suspend/resume cycle, so add a flag for it.
> > 
> > Signed-off-by: Feng Tang 
> > ---
> >  arch/x86/include/asm/cpufeature.h |1 +
> >  arch/x86/kernel/cpu/intel.c   |   12 
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 2d9075e..f7e1eac 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -100,6 +100,7 @@
> >  #define X86_FEATURE_AMD_DCM (3*32+27) /* multi-node processor */
> >  #define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */
> >  #define X86_FEATURE_EAGER_FPU  (3*32+29) /* "eagerfpu" Non lazy FPU restore */
> > +#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
> >  
> We already have a "TSC always running in C3+" feature named
> X86_FEATURE_NONSTOP_TSC, so how about naming this one in the same style,
> like X86_FEATURE_NONSTOP_TSC_S3?

Yeah, I actually started with a name of the form X86_FEATURE_xxx_TSC, but
then I did a grep and found there is no unified naming convention for the
TSC flags, so I chose this name.

--
#grep _TSC arch/x86/include/asm/cpufeature.h
#define X86_FEATURE_TSC (0*32+ 4) /* Time Stamp Counter */
#define X86_FEATURE_CONSTANT_TSC (3*32+ 8) /* TSC ticks at a constant rate */
#define X86_FEATURE_TSC_RELIABLE (3*32+23) /* TSC is known to be reliable */
#define X86_FEATURE_NONSTOP_TSC (3*32+24) /* TSC does not stop in C states */
#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
#define X86_FEATURE_TSC_DEADLINE_TIMER  (4*32+24) /* Tsc deadline timer */
#define X86_FEATURE_TSCRATEMSR  (8*32+ 9) /* "tsc_scale" AMD TSC scaling support */
#define X86_FEATURE_TSC_ADJUST  (9*32+ 1) /* TSC adjustment MSR 0x3b */

Thanks,
Feng
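
For readers unfamiliar with the (word*32+bit) notation in the grep output
above, here is a minimal user-space sketch of how such a flag number maps
onto a bitmap of 32-bit capability words. The struct, the helper names and
NCAPINTS_DEMO are assumptions made up for illustration; they are not the
kernel's own API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the grep output; the encoding is word * 32 + bit. */
#define X86_FEATURE_NONSTOP_TSC     (3*32+24)
#define X86_FEATURE_TSC_S3_NOTSTOP  (3*32+30)

#define NCAPINTS_DEMO 10  /* hypothetical number of 32-bit capability words */

/* Hypothetical stand-in for a per-CPU capability bitmap. */
struct cpu_caps_demo {
    uint32_t word[NCAPINTS_DEMO];
};

/* Set the bit selected by a feature number. */
static void demo_set_cap(struct cpu_caps_demo *c, unsigned int feature)
{
    assert(feature / 32 < NCAPINTS_DEMO);
    c->word[feature / 32] |= UINT32_C(1) << (feature % 32);
}

/* Test the bit selected by a feature number. */
static bool demo_has_cap(const struct cpu_caps_demo *c, unsigned int feature)
{
    assert(feature / 32 < NCAPINTS_DEMO);
    return (c->word[feature / 32] >> (feature % 32)) & 1u;
}
```

With this layout, X86_FEATURE_TSC_S3_NOTSTOP lands in word 3, bit 30, right
after the EAGER_FPU bit (3*32+29) shown in the quoted hunk.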






--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/3] pwm-backlight: add subdrivers & Tegra support

2013-01-20 Thread Thierry Reding

On Sat, Jan 19, 2013 at 07:30:17PM +0900, Alexandre Courbot wrote:
> This series introduces a way to use pwm-backlight hooks with platforms
> that use the device tree through a subdriver system. It also adds support
> for the Tegra-based Ventana board, adding the last missing block to enable
> its panel. Support for other Tegra board can thus be easily added.
> 
> I have something else in mind to properly support this (power
> sequences), but this work relies on the GPIO subsystem redesign which will
> take some time. The pwm-backlight subdrivers can do the job by the meantime.
> 
> There are a few design points that might need to be discussed:
> 1) Link order is important: subdrivers register themselves in their
> module_init function, which must be called before pwm-backlight's probe.
> This forbids linking subdrivers as separate modules from pwm-backlight.
> 2) The subdriver's data is temporarily passed through the backlight
> device's driver data. This should not hurt, but maybe there is a better way
> to do this.
> 3) Subdrivers must add themselves into pwm-backlight's own of_device_id
> table. It would be cleaner to not have to list subdrivers into
> pwm-backlight's main file, but I cannot think of a way to do otherwise.
> 
> Suggestions for the 3 points listed above are very welcome - in any case,
> I hope to make this converge into something mergeable quickly.
> 
> Note that these patches are the last missing block to get a functional
> panel on Tegra boards. Using 3.8rc4 and these patches, the internal panel
> on Ventana is usable out-of-the-box. Yay.

Hi Alexandre,

It's great to see you pick this up. I've been meaning to do this myself
but I just can't find the time right now. Generally I think the approach
you've chosen looks good, but I don't think doing it in pwm-backlight is
the right way.

Eventually this should all be covered by the CDF, but since that's not
ready yet we want something ad-hoc to get the hardware supported. As
such I would like to see this go into some sort of minimalistic, Tegra-
specific display/panel framework. I'd prefer to keep the pwm-backlight
driver as simple and generic as possible, that is, a driver for a PWM-
controlled backlight.

Another advantage of moving this into a sort of display framework is
that it may help in defining the requirements for a CDF and that moving
the code to the CDF should be easier once it is done.

Last but not least, abstracting away the panel allows other things such
as physical dimensions and display modes to be properly encapsulated. I
think that power-on/off timing requirements for panels also belong to
this set since they are usually specific to a given panel.

Maybe adding these drivers to tegra-drm for now would be a good option.
That way the corresponding glue can be added without a need for inter-
tree dependencies.

Thierry


[v2][PATCH 5/6] powerpc/book3e: support kgdb for kernel space

2013-01-20 Thread Tiejun Chen

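When KGDB is enabled we need to skip the userspace-only check in the normal
debug exception path, so that debug exceptions taken in kernel space are
handled too.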

Signed-off-by: Tiejun Chen 
---
 arch/powerpc/kernel/exceptions-64e.S |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 423a936..6204681 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -589,11 +589,14 @@ kernel_dbg_exc:
rfdi
 
/* Normal debug exception */
+1:
+#ifndef CONFIG_KGDB
/* XXX We only handle coming from userspace for now since we can't
 * quite save properly an interrupted kernel state yet
 */
-1: andi.   r14,r11,MSR_PR; /* check for userspace again */
+   andi.   r14,r11,MSR_PR; /* check for userspace again */
beq kernel_dbg_exc; /* if from kernel mode */
+#endif
 
/* Now we mash up things to make it look like we are coming on a
 * normal exception
-- 
1.7.9.5


[v2][PATCH 0/6] powerpc/book3e: make kgdb to work well

2013-01-20 Thread Tiejun Chen

This patchset is used to support kgdb/gdb on book3e.

v2:

* Make sure we cover CONFIG_PPC_BOOK3E_64 safely
* Use LOAD_REG_IMMEDIATE() to properly load the value of the constant
  expression when loading the debug exception stack
* Copy thread info from the kernel stack when coming from user mode
* Rebase onto the latest powerpc git tree

v1:
* Copy thread info only when we come from !user mode, since when coming from
  user mode we get the kernel stack directly.
* Remove the save/restore of EX_R14/EX_R15, since DBG_EXCEPTION_PROLOG already
  covers this.
* Use CURRENT_THREAD_INFO() to get the thread info.
* Fix some typos.
* Add a patch to make sure gdb can generate a single step properly to invoke a
  kgdb state.
* Add a patch so that, if we need to replay an interrupt, we don't restore the
  previous backup thread info, making sure the interrupt can be replayed later
  with proper thread info.
* Rebase onto the latest powerpc git tree.

v0:
This patchset is used to support kgdb for book3e.


Tiejun Chen (6):
  powerpc/book3e: load critical/machine/debug exception stack
  powerpc/book3e: store critical/machine/debug exception thread info
  book3e/kgdb: update thread's dbcr0
  book3e/kgdb: Fix a single step case of lazy IRQ
  powerpc/book3e: support kgdb for kernel space
  kgdb/kgdbts: support ppc64

 arch/powerpc/kernel/exceptions-64e.S |   60 +++---
 arch/powerpc/kernel/irq.c|   10 ++
 arch/powerpc/kernel/kgdb.c   |   16 ++---
 drivers/misc/kgdbts.c|2 ++
 4 files changed, 80 insertions(+), 8 deletions(-)

Tiejun

[v2][PATCH 4/6] book3e/kgdb: Fix a single step case of lazy IRQ

2013-01-20 Thread Tiejun Chen

When we're in kgdb_singlestep(), we have to work around getting
thread_info by copying it from the kernel stack before calling
kgdb_handle_exception(), then copying it back afterwards.

But PPC64 has a lazy interrupt implementation. So after copying
thread info from the kernel stack, if we need to replay an
interrupt, we shouldn't restore the previous backup thread info,
so that the interrupt can be replayed later with proper thread
info.

This patch uses __check_irq_replay() to guarantee this.

Signed-off-by: Tiejun Chen 
---
 arch/powerpc/kernel/irq.c  |   10 ++
 arch/powerpc/kernel/kgdb.c |3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4f97fe3..bb8d27a 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -339,7 +339,17 @@ bool prep_irq_for_idle(void)
return true;
 }
 
+notrace unsigned int check_irq_replay(void)
+{
+   return __check_irq_replay();
+}
+#else
+notrace unsigned int check_irq_replay(void)
+{
+   return 0;
+}
 #endif /* CONFIG_PPC64 */
+EXPORT_SYMBOL(check_irq_replay);
 
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index eb30a40..2f22807 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -151,6 +151,7 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
return 1;
 }
 
+extern notrace unsigned int check_irq_replay(void);
 static int kgdb_singlestep(struct pt_regs *regs)
 {
struct thread_info *thread_info, *exception_thread_info;
@@ -181,7 +182,7 @@ static int kgdb_singlestep(struct pt_regs *regs)
 
kgdb_handle_exception(0, SIGTRAP, 0, regs);
 
-   if (thread_info != exception_thread_info)
+   if ((thread_info != exception_thread_info) && (!check_irq_replay()))
/* Restore current_thread_info lastly. */
memcpy(exception_thread_info, backup_current_thread_info, 
sizeof *thread_info);
 
-- 
1.7.9.5
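
The restore condition changed by the last hunk can be sketched in isolation
as follows. This is a simplified user-space model, not the kernel code: the
struct and function names are hypothetical, and the pending-replay state is
passed in as a plain argument instead of being read via check_irq_replay().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct thread_info. */
struct demo_thread_info {
    unsigned long flags;
    int preempt_count;
};

/*
 * Mirrors the condition added by the patch: the backup is copied back over
 * the exception stack's thread_info only when the two thread_info pointers
 * differ and no interrupt is waiting to be replayed.
 * Returns true if the backup was restored.
 */
static bool demo_restore_after_step(struct demo_thread_info *thread_info,
                                    struct demo_thread_info *exception_thread_info,
                                    const struct demo_thread_info *backup,
                                    unsigned int irq_replay_pending)
{
    assert(thread_info && exception_thread_info && backup);

    if (thread_info != exception_thread_info && !irq_replay_pending) {
        memcpy(exception_thread_info, backup, sizeof(*backup));
        return true;
    }
    return false;
}
```

When a replay is pending the exception stack's thread_info is left as it is,
which is the behaviour the commit message asks for.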


Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Michael Wang

On 01/21/2013 03:09 PM, Mike Galbraith wrote:
> On Mon, 2013-01-21 at 07:42 +0100, Mike Galbraith wrote: 
>> On Mon, 2013-01-21 at 13:07 +0800, Michael Wang wrote:
> 
>>> May be we could try change this back to the old way later, after the aim
>>> 7 test on my server.
>>
>> Yeah, something funny is going on.
> 
> Never entering balance path kills the collapse.  Asking wake_affine()
> wrt the pull as before, but allowing us to continue should no idle cpu
> be found, still collapsed.  So the source of funny behavior is indeed in
> balance_path.

The patch below, based on the patch set, avoids entering the balance path
when affine_sd is found, just like the old logic. Would you like to give it
a try and see whether it helps fix the collapse?

Regards,
Michael Wang

---
 kernel/sched/fair.c |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d600708..4e95bb0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3297,6 +3297,8 @@ next:
sg = sg->next;
} while (sg != sd->groups);
}
+
+   return -1;
 done:
return target;
 }
@@ -3349,7 +3351,7 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 * some cases.
 */
new_cpu = select_idle_sibling(p, prev_cpu);
-   if (idle_cpu(new_cpu))
+   if (new_cpu != -1)
goto unlock;

/*
@@ -3363,15 +3365,15 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
goto balance_path;

new_cpu = select_idle_sibling(p, cpu);
-   if (!idle_cpu(new_cpu))
-   goto balance_path;
-
/*
 * Invoke wake_affine() finally since it is no doubt a
 * performance killer.
 */
-   if (wake_affine(sbm->affine_map[prev_cpu], p, sync))
-   goto unlock;
+   if (new_cpu == -1 ||
+   !wake_affine(sbm->affine_map[prev_cpu], p, sync))
+   new_cpu = prev_cpu;
+
+   goto unlock;
}

 balance_path:
-- 
1.7.4.1
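
As a reading aid, the decision order that the patch above arrives at can be
modelled as a plain function. This is only a sketch: the function name and
parameters are hypothetical, the results of select_idle_sibling() and
wake_affine() are passed in precomputed (the real code only calls
wake_affine() when an idle cpu was found), and the intermediate check that
can still jump to balance_path is left out because its condition is not
visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* With the patch, select_idle_sibling() returns -1 when no idle cpu is found. */
#define DEMO_NO_IDLE_CPU (-1)

/*
 * idle_near_prev: result of select_idle_sibling(p, prev_cpu)
 * idle_near_cur:  result of select_idle_sibling(p, cpu)
 * affine_ok:      result of wake_affine()
 */
static int demo_pick_wake_cpu(int prev_cpu, int idle_near_prev,
                              int idle_near_cur, bool affine_ok)
{
    assert(prev_cpu >= 0);

    /* An idle cpu near prev_cpu wins immediately. */
    if (idle_near_prev != DEMO_NO_IDLE_CPU)
        return idle_near_prev;

    /* No idle cpu near the waker, or the pull is refused: stay on prev_cpu. */
    if (idle_near_cur == DEMO_NO_IDLE_CPU || !affine_ok)
        return prev_cpu;

    return idle_near_cur;
}
```

In every branch a cpu is returned directly, which corresponds to the
unconditional "goto unlock" that replaces the old "goto balance_path".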


> 
> -Mike
> 


[v2][PATCH 3/6] book3e/kgdb: update thread's dbcr0

2013-01-20 Thread Tiejun Chen

gdb always needs to generate a single step properly to invoke
a kgdb state. But with lazy interrupts, book3e can't always
trigger a debug exception with a single step, since the current
task is blocked handling the pending exceptions, and we then lose
the dbcr configuration that was expected to generate the debug
exception.

So here we also update the thread's dbcr0 to make sure the current
task can resume with the missed dbcr0 configuration.

Signed-off-by: Tiejun Chen 
---
 arch/powerpc/kernel/kgdb.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 8747447..eb30a40 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -409,7 +409,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int err_code,
   struct pt_regs *linux_regs)
 {
char *ptr = &remcom_in_buffer[1];
-   unsigned long addr;
+   unsigned long addr, dbcr0;
 
switch (remcom_in_buffer[0]) {
/*
@@ -426,8 +426,15 @@ int kgdb_arch_handle_exception(int vector, int signo, int err_code,
/* set the trace bit if we're stepping */
if (remcom_in_buffer[0] == 's') {
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
-   mtspr(SPRN_DBCR0,
- mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
+   dbcr0 = mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM;
+   mtspr(SPRN_DBCR0, dbcr0);
+#ifdef CONFIG_PPC_BOOK3E_64
+   /* With lazy interrupt we have to update thread dbcr0 here
+    * to make sure we can set debug properly at last to invoke
+    * kgdb again to work well.
+    */
+   current->thread.dbcr0 = dbcr0;
+#endif
linux_regs->msr |= MSR_DE;
 #else
linux_regs->msr |= MSR_SE;
-- 
1.7.9.5
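
Why writing only the SPR is not enough can be shown with a small model. It
assumes that the debug control register is reloaded from the thread's saved
copy when the task resumes; that reload, the struct names and the bit values
below are assumptions for illustration only, not taken from the kernel
headers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DBCR0_IDM UINT32_C(0x40000000) /* placeholder bit value */
#define DEMO_DBCR0_IC  UINT32_C(0x08000000) /* placeholder bit value */

struct demo_cpu    { uint32_t dbcr0; };  /* models the DBCR0 SPR */
struct demo_thread { uint32_t dbcr0; };  /* models current->thread.dbcr0 */

/* Assumed behaviour: resuming a task reloads the SPR from the saved copy. */
static void demo_resume_task(struct demo_cpu *cpu, const struct demo_thread *t)
{
    assert(cpu && t);
    cpu->dbcr0 = t->dbcr0;
}

/* What the patch does for a single-step request: write both places. */
static void demo_arm_single_step(struct demo_cpu *cpu, struct demo_thread *t)
{
    uint32_t dbcr0;

    assert(cpu && t);
    dbcr0 = cpu->dbcr0 | DEMO_DBCR0_IC | DEMO_DBCR0_IDM;
    cpu->dbcr0 = dbcr0;  /* mtspr(SPRN_DBCR0, dbcr0) in the patch */
    t->dbcr0 = dbcr0;    /* current->thread.dbcr0 = dbcr0 in the patch */
}
```

Under that assumption, a single-step request that only touched the SPR would
be lost on the next reload, while the saved copy keeps it alive.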


[v2][PATCH 6/6] kgdb/kgdbts: support ppc64

2013-01-20 Thread Tiejun Chen

We can't look up the address of the entry point of the function simply
via that function symbol for all architectures.

In the PPC64 ABI, a function symbol actually refers to a function descriptor
structure.

A function descriptor is a three doubleword data structure that contains
the following values:
* The first doubleword contains the address of the entry point of
the function.
* The second doubleword contains the TOC base address for
the function.
* The third doubleword contains the environment pointer for
languages such as Pascal and PL/1.

So we should call dereference_function_descriptor(), a wrapper, to get
the address of the entry point of the function.

Note this is also safe for other architectures, per
"include/asm-generic/sections.h":

dereference_function_descriptor(p) is always just (p) when there is no
arch-specific definition.

Signed-off-by: Tiejun Chen 
---
 drivers/misc/kgdbts.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index 3aa9a96..4799e1f 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -103,6 +103,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define v1printk(a...) do { \
if (verbose) \
@@ -222,6 +223,7 @@ static unsigned long lookup_addr(char *arg)
addr = (unsigned long)do_fork;
else if (!strcmp(arg, "hw_break_val"))
addr = (unsigned long)&hw_break_val;
+   addr = (unsigned long )dereference_function_descriptor((void *)addr);
return addr;
 }
 
-- 
1.7.9.5
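
The descriptor layout from the commit message can be written down as a small
user-space sketch. The struct and the two helpers are hypothetical names used
only to illustrate the idea; they are not the kernel's
dereference_function_descriptor() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout described in the commit message: three doublewords. */
struct demo_func_desc {
    uint64_t entry;  /* address of the function's entry point */
    uint64_t toc;    /* TOC base address for the function */
    uint64_t env;    /* environment pointer */
};

/* Descriptor-based ABI: the symbol points at a descriptor, not at code. */
static uint64_t demo_deref_descriptor(const void *p)
{
    assert(p != NULL);
    return ((const struct demo_func_desc *)p)->entry;
}

/* Fallback when an architecture has no descriptors: the identity. */
static const void *demo_deref_identity(const void *p)
{
    return p;
}
```

This is why an unconditional call in lookup_addr() is harmless on
architectures without descriptors: the address simply comes back unchanged.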


[v2][PATCH 2/6] powerpc/book3e: store critical/machine/debug exception thread info

2013-01-20 Thread Tiejun Chen

We need to store thread info into the thread info of these exception stacks,
as we already do for PPC32.

Signed-off-by: Tiejun Chen 
---
 arch/powerpc/kernel/exceptions-64e.S |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 767f856..423a936 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -58,6 +58,18 @@
std r10,PACA_##level##_STACK(r13);
 #endif
 
+/* Store something to exception thread info */
+#define BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(type) \
+   ld  r1,PACAKSAVE(r13); \
+   CURRENT_THREAD_INFO(r14, r14); \
+   CURRENT_THREAD_INFO(r15, r1); \
+   ld  r10,TI_FLAGS(r14); \
+   std r10,TI_FLAGS(r15); \
+   ld  r10,TI_PREEMPT(r14); \
+   std r10,TI_PREEMPT(r15); \
+   ld  r10,TI_TASK(r14); \
+   std r10,TI_TASK(r15);
+
 /* Exception prolog code for all exceptions */
 #define EXCEPTION_PROLOG(n, intnum, type, addition)\
mtspr   SPRN_SPRG_##type##_SCRATCH,r13; /* get spare registers */   \
@@ -95,6 +107,7 @@
    BOOK3E_LOAD_EXC_LEVEL_STACK(CRIT); \
    ld  r1,PACA_CRIT_STACK(r13); \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(CRIT); \
 1:
 #define SPRN_CRIT_SRR0 SPRN_CSRR0
 #define SPRN_CRIT_SRR1 SPRN_CSRR1
@@ -105,6 +118,7 @@
    BOOK3E_LOAD_EXC_LEVEL_STACK(DBG); \
    ld  r1,PACA_DBG_STACK(r13); \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(DBG); \
 1:
 #define SPRN_DBG_SRR0  SPRN_DSRR0
 #define SPRN_DBG_SRR1  SPRN_DSRR1
@@ -115,6 +129,7 @@
    BOOK3E_LOAD_EXC_LEVEL_STACK(MC); \
    ld  r1,PACA_MC_STACK(r13); \
    subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+   BOOK3E_STORE_EXC_LEVEL_THEAD_INFO(MC); \
 1:
 #define SPRN_MC_SRR0   SPRN_MCSRR0
 #define SPRN_MC_SRR1   SPRN_MCSRR1
-- 
1.7.9.5
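
In C terms, the store macro added by this patch copies three fields from one
thread_info to another. The sketch below uses a hypothetical, cut-down struct
and names the two sides simply src and dst; which register plays which role
is as written in the macro and is not restated here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_task;  /* opaque, stands in for struct task_struct */

/* Hypothetical, cut-down thread_info with one field that is not copied. */
struct demo_thread_info {
    struct demo_task *task;
    unsigned long flags;
    int preempt_count;
    int cpu;
};

/* Copy the same three fields the macro handles: flags, preempt count, task. */
static void demo_copy_thread_info_fields(struct demo_thread_info *dst,
                                         const struct demo_thread_info *src)
{
    assert(dst != NULL && src != NULL);
    dst->flags = src->flags;                  /* TI_FLAGS */
    dst->preempt_count = src->preempt_count;  /* TI_PREEMPT */
    dst->task = src->task;                    /* TI_TASK */
}
```

Everything else in the destination is left untouched, so this is a partial
copy rather than a full memcpy of the structure.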


[v2][PATCH 1/6] powerpc/book3e: load critical/machine/debug exception stack

2013-01-20 Thread Tiejun Chen

We always allocate separate stacks for critical/machine check/debug
exceptions. This differs from normal exceptions, so we should load these
exception stacks properly, as we do for booke.

Signed-off-by: Tiejun Chen 
---
 arch/powerpc/kernel/exceptions-64e.S |   40 +++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index ae54553..767f856 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -36,6 +36,28 @@
  */
 #define SPECIAL_EXC_FRAME_SIZE  INT_FRAME_SIZE
 
+/* only on book3e */
+#define DBG_STACK_BASE  dbgirq_ctx
+#define MC_STACK_BASE   mcheckirq_ctx
+#define CRIT_STACK_BASE critirq_ctx
+
+#ifdef CONFIG_SMP
+#define BOOK3E_LOAD_EXC_LEVEL_STACK(level) \
+   mfspr   r14,SPRN_PIR; \
+   slwi    r14,r14,3; \
+   LOAD_REG_IMMEDIATE(r10, level##_STACK_BASE); \
+   add r10,r10,r14; \
+   ld  r10,0(r10); \
+   addi    r10,r10,THREAD_SIZE; \
+   std r10,PACA_##level##_STACK(r13);
+#else
+#define BOOK3E_LOAD_EXC_LEVEL_STACK(level) \
+   LOAD_REG_IMMEDIATE(r10, level##_STACK_BASE); \
+   ld  r10,0(r10); \
+   addi    r10,r10,THREAD_SIZE; \
+   std r10,PACA_##level##_STACK(r13);
+#endif
+
 /* Exception prolog code for all exceptions */
 #define EXCEPTION_PROLOG(n, intnum, type, addition)\
mtspr   SPRN_SPRG_##type##_SCRATCH,r13; /* get spare registers */   \
@@ -68,20 +90,32 @@
 #define SPRN_GDBELL_SRR1   SPRN_GSRR1
 
 #define CRIT_SET_KSTACK \
+   andi.   r10,r11,MSR_PR; \
+   bne 1f; \
+   BOOK3E_LOAD_EXC_LEVEL_STACK(CRIT); \
    ld  r1,PACA_CRIT_STACK(r13); \
-   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+1:
 #define SPRN_CRIT_SRR0 SPRN_CSRR0
 #define SPRN_CRIT_SRR1 SPRN_CSRR1
 
 #define DBG_SET_KSTACK \
+   andi.   r10,r11,MSR_PR; \
+   bne 1f; \
+   BOOK3E_LOAD_EXC_LEVEL_STACK(DBG); \
    ld  r1,PACA_DBG_STACK(r13); \
-   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+1:
 #define SPRN_DBG_SRR0  SPRN_DSRR0
 #define SPRN_DBG_SRR1  SPRN_DSRR1
 
 #define MC_SET_KSTACK \
+   andi.   r10,r11,MSR_PR; \
+   bne 1f; \
+   BOOK3E_LOAD_EXC_LEVEL_STACK(MC); \
    ld  r1,PACA_MC_STACK(r13); \
-   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE;
+   subi    r1,r1,SPECIAL_EXC_FRAME_SIZE; \
+1:
 #define SPRN_MC_SRR0   SPRN_MCSRR0
 #define SPRN_MC_SRR1   SPRN_MCSRR1
 
-- 
1.7.9.5
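
The address arithmetic in the SMP variant of BOOK3E_LOAD_EXC_LEVEL_STACK can
be sketched in C as below. The array, the constants and the function name are
hypothetical stand-ins; in particular DEMO_THREAD_SIZE is an arbitrary value,
since the real THREAD_SIZE depends on the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS     4
#define DEMO_THREAD_SIZE 16384u  /* hypothetical; not the kernel's value */

/* Stand-in for dbgirq_ctx / mcheckirq_ctx / critirq_ctx: one base per cpu. */
static unsigned char *demo_irq_ctx[DEMO_NR_CPUS];

/*
 * Returns the initial stack pointer for the given cpu's exception stack.
 * The macro's "slwi r14,r14,3" scales the cpu id by 8, the size of one
 * 64-bit pointer; plain array indexing does the equivalent scaling here.
 * Adding the stack size gives the top, since the stack grows downwards.
 */
static unsigned char *demo_exc_stack_top(unsigned int cpu)
{
    assert(cpu < DEMO_NR_CPUS);
    assert(demo_irq_ctx[cpu] != NULL);
    return demo_irq_ctx[cpu] + DEMO_THREAD_SIZE;
}
```

The non-SMP variant of the macro is the same computation with the cpu index
fixed at 0.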


Re: [PATCH] cifs: fix srcip_matches() for ipv6

2013-01-20 Thread Steve French

merged into cifs-2.6.git

On Wed, Jan 16, 2013 at 10:04 PM, Nickolai Zeldovich
 wrote:
> On Wed, Jan 16, 2013 at 10:51 PM, Steve French  wrote:
>> How did you discover this - did you have an ipv6 test case or by
>> inspection or ...?
>
> By mostly-automated inspection (i.e., with the help of a static
> program analysis tool).
>
> Nickolai.



-- 
Thanks,

Steve

Re: [RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP

2013-01-20 Thread Chen Gong

On Mon, Jan 21, 2013 at 02:38:41PM +0800, Feng Tang wrote:
> Date: Mon, 21 Jan 2013 14:38:41 +0800
> From: Feng Tang 
> To: Thomas Gleixner , John Stultz
>  , Ingo Molnar , "H. Peter Anvin"
>  , x...@kernel.org, Len Brown ,
>  "Rafael J. Wysocki" ,
>  linux-kernel@vger.kernel.org
> Cc: Feng Tang 
> Subject: [RFC PATCH 1/5] x86: Add cpu capability flag
>  X86_FEATURE_TSC_S3_NOTSTOP
> X-Mailer: git-send-email 1.7.9.5
> 
> On some new Intel Atom processors (Penwell and Cloverview), there is
> a feature that the TSC won't stop in S3, i.e. the TSC value won't be
> reset to 0 after resume. This feature makes TSC a more reliable
> clocksource and could benefit the timekeeping code during system
> suspend/resume cycle, so add a flag for it.
> 
> Signed-off-by: Feng Tang 
> ---
>  arch/x86/include/asm/cpufeature.h |1 +
>  arch/x86/kernel/cpu/intel.c   |   12 
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h 
> b/arch/x86/include/asm/cpufeature.h
> index 2d9075e..f7e1eac 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -100,6 +100,7 @@
>  #define X86_FEATURE_AMD_DCM (3*32+27) /* multi-node processor */
>  #define X86_FEATURE_APERFMPERF   (3*32+28) /* APERFMPERF */
>  #define X86_FEATURE_EAGER_FPU  (3*32+29) /* "eagerfpu" Non lazy FPU restore */
> +#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
>  
We already have a "TSC always running in C3+" feature named
X86_FEATURE_NONSTOP_TSC, so how about naming this one in the same style,
like X86_FEATURE_NONSTOP_TSC_S3?




Re: [PATCH 2/3] tegra: pwm-backlight: add tegra pwm-bl driver

2013-01-20 Thread Mark Zhang

On 01/19/2013 06:30 PM, Alexandre Courbot wrote:
> Add a PWM-backlight subdriver for Tegra boards, with support for
> Ventana.
> 
> Signed-off-by: Alexandre Courbot 
> ---
[...]
>  
> + backlight {
> + compatible = "pwm-backlight-ventana";
> + brightness-levels = <0 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 255>;
> + default-brightness-level = <12>;
> +
> + pwms = < 2 500>;

After reading the code of the tegra pwm driver and the pwm framework, I
understood the meaning of this property. So I think we need to add a doc (e.g.
Documentation/devicetree/bindings/video/backlight/nvidia,tegra20-bl.txt)
to explain it. "Documentation/devicetree/bindings/pwm/pwm.txt" doesn't
explain it, because it may differ between pwm drivers.

> + pwm-names = "backlight";
> +
> + power-supply = <_bl_reg>;
> + panel-supply = <_pnl_reg>;
> + bl-gpio = < 28 0>;
> + bl-panel = < 10 0>;
> + };
> +
[...]
> diff --git a/drivers/video/backlight/pwm_bl_tegra.c b/drivers/video/backlight/pwm_bl_tegra.c
> new file mode 100644
> index 000..8f2195b
> --- /dev/null
> +++ b/drivers/video/backlight/pwm_bl_tegra.c

So, going by the filename, I think we can put the code for all Tegra boards
here, right? Just like what you do for Ventana, if I want to add support
for cardhu, I can define similar functions, say "init_cardhu",
"exit_cardhu", "notify_cardhu" and "notify_after_cardhu", right?

But I think if we do it this way, the file will become very long soon,
with a lot of redundant code in it. So do you have any suggestions?

Mark
> @@ -0,0 +1,159 @@
> +/*
> + * pwm-backlight subdriver for Tegra.
> + *
> + * Copyright (c) 2013 NVIDIA CORPORATION.  All rights reserved.
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
[...]
> +MODULE_DESCRIPTION("Backlight Driver for Tegra boards");
> +MODULE_LICENSE("GPL");
> +MODULE_ALIAS("platform:pwm-tegra-backlight");
> +
> +
> 

Re: [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift

2013-01-20 Thread Chen Gong

On Mon, Jan 21, 2013 at 02:38:44PM +0800, Feng Tang wrote:
> Date: Mon, 21 Jan 2013 14:38:44 +0800
> From: Feng Tang 
> To: Thomas Gleixner , John Stultz
>  , Ingo Molnar , "H. Peter Anvin"
>  , x...@kernel.org, Len Brown ,
>  "Rafael J. Wysocki" ,
>  linux-kernel@vger.kernel.org
> Cc: Feng Tang 
> Subject: [RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when
>  configuring the scale and shift
> X-Mailer: git-send-email 1.7.9.5
> 
> On our x86 platform, we see a failure case when calling clocksource_cyc2ns(),
> which returns a negative value. The reason is that the time interval was large
> (more than 1000 seconds) while the TSC frequency is 2GHz, so the following
> formula overflowed:
>   ((u64) cycles * mult) >> shift
> 
> So enlarge the time interval from 10 mins to 40 mins to fix the bug.
> 
> Another solution may be adding a "max_interval" in struct clocksource, and
> use a default value (like current 10 minutes) when clocksource driver
> doesn't set it.
> 
As you said, going from 10m -> 40m looks a little bit arbitrary. I think
max_interval is a better choice, if the timer guys don't mind too many
control knobs :-).

> Signed-off-by: Feng Tang 
> ---
>  kernel/time/clocksource.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index c958338..48fbfcb 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -663,7 +663,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>* Calc the maximum number of seconds which we can run before
>* wrapping around. For clocksources which have a mask > 32bit
>* we need to limit the max sleep time to have a good
> -  * conversion precision. 10 minutes is still a reasonable
> +  * conversion precision. 40 minutes is still a reasonable
>* amount. That results in a shift value of 24 for a
>* clocksource with mask >= 40bit and f >= 4GHz. That maps to
>* ~ 0.06ppm granularity for NTP. We apply the same 12.5%
> @@ -674,8 +674,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>   do_div(sec, scale);
>   if (!sec)
>   sec = 1;
> - else if (sec > 600 && cs->mask > UINT_MAX)
> - sec = 600;
> + else if (sec > 2400 && cs->mask > UINT_MAX)
> + sec = 2400;
>  
>   clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
>  NSEC_PER_SEC / scale, sec * scale);
> -- 
> 1.7.9.5
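
To make the overflow in the quoted formula concrete, here is a small
user-space sketch of the same multiply-and-shift with unsigned 64-bit
arithmetic. The helper names are hypothetical, and the mult/shift pair used
in the note below is picked purely for illustration (0.5 ns per cycle, i.e. a
2 GHz counter); it is not claimed to be what the kernel computed on the
failing machine.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the formula above: (cycles * mult) >> shift. */
static uint64_t demo_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (cycles * mult) >> shift;  /* the product wraps modulo 2^64 */
}

/* Largest cycle count whose product with mult still fits in 64 bits. */
static uint64_t demo_max_safe_cycles(uint32_t mult)
{
    assert(mult != 0);
    return UINT64_MAX / mult;
}
```

With mult = 4194304 and shift = 23, the safe limit is 2^42 - 1 cycles, which
is roughly 2199 seconds at 2 GHz; one cycle more and the product wraps. A
larger maximum interval makes the scale calculation pick a smaller mult,
which is how both the 40 minute change and a per-clocksource max_interval
would push that limit out.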
> 



Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Michael Wang

On 01/21/2013 02:42 PM, Mike Galbraith wrote:
> On Mon, 2013-01-21 at 13:07 +0800, Michael Wang wrote:
> 
>> That seems like the default one, could you please show me the numbers in
>> your datapoint file?
> 
> Yup, I do not touch the workfile.  Datapoints is what you see in the
> tabulated result...
> 
> 1
> 1
> 1
> 5
> 5
> 5
> 10
> 10
> 10
> ...
> 
> so it does three consecutive runs at each load level.  I quiesce the
> box, set governor to performance, echo 250 32000 32 4096 >
> /proc/sys/kernel/sem, then ./multitask -nl -f, and point it
> at ./datapoints.

I have changed the "/proc/sys/kernel/sem" to:

20002048000 256 1024

and ran a few rounds; it seems I can't reproduce this issue on my 12-cpu
X86 server:

          prev        post
Tasks     jobs/min    jobs/min
    1       508.39      506.69
    5      2792.63     2792.63
   10      5454.55     5449.64
   20     10262.49    10271.19
   40     18089.55    18184.55
   80     28995.22    28960.57
  160     41365.19    41613.73
  320     53099.67    52767.35
  640     61308.88    61483.83
 1280     66707.95    66484.96
 2560     69736.58    69350.02

Almost nothing changed...I would like to find another machine and do the
test again later.

> 
>> I'm not familiar with this benchmark, but I'd like to have a try on my
>> server, to make sure whether it is a generic issue.
> 
> One thing I didn't like about your changes is that you don't ask
> wake_affine() if it's ok to pull cross node or not, which I though might
> induce imbalance, but twiddling that didn't fix up the collapse, pretty
> much leaving only the balance path.

wake_affine() will still be asked before we try to use the idle sibling
selected from the current cpu's domain, won't it? It has just been delayed
since its cost is too high.

But you reminded me that I missed the case when prev == current. I'm not
sure whether it's the killer, but I will correct it.

> 
>>>> And I'm confusing about how those new parameter value was figured out
>>>> and how could them help solve the possible issue?
>>>
>>> Oh, that's easy.  I set sched_min_granularity_ns such that last_buddy
>>> kicks in when a third task arrives on a runqueue, and set
>>> sched_wakeup_granularity_ns near minimum that still allows wakeup
>>> preemption to occur.  Combined effect is reduced over-scheduling.
>>
>> That sounds very hard, to catch the timing, whatever, it could be an
>> important clue for analysis.
> 
> (Play with the knobs with a bunch of different loads, I think you'll
> find that those settings work well)
> 
>>>> Do you have any idea about which part in this patch set may cause the
>>>> issue?
>>>
>>> Nope, I'm as puzzled by that as you are.  When the box had 40 cores,
>>> both virgin and patched showed over-scheduling effects, but not like
>>> this.  With 20 cores, symptoms changed in a most puzzling way, and I
>>> don't see how you'd be directly responsible.
>>
>> Hmm...
>>
>>>
>>>> One change by designed is that, for old logical, if it's a wake up and
>>>> we found affine sd, the select func will never go into the balance path,
>>>> but the new logical will, in some cases, do you think this could be a
>>>> problem?
>>>
>>> Since it's the high load end, where looking for an idle core is most
>>> likely to be a waste of time, it makes sense that entering the balance
>>> path would hurt _some_, it isn't free.. except for twiddling preemption
>>> knobs making the collapse just go away.  We're still going to enter that
>>> path if all cores are busy, no matter how I twiddle those knobs.
>>
>> May be we could try change this back to the old way later, after the aim
>> 7 test on my server.
> 
> Yeah, something funny is going on.  I'd like select_idle_sibling() to
> just go away, that task be integrated into one and only one short and
> sweet balance path.  I don't see why find_idlest* needs to continue
> traversal after seeing a zero.  It should be just fine to say gee, we're
> done.

Yes, that's true :)

> Hohum, so much for pure test and report, twiddle twiddle tweak,
> bend spindle mutilate ;-)

The scheduler is sometimes impossible to analyze; the only way to prove
anything is painful, endless testing...and usually we still miss something
in the end...

Regards,
Michael Wang


>
> -Mike
> 

[PATCH 1/1] scripts/package/Makefile: remove useless KBUILD_OUTPUT test

2013-01-20 Thread Bin Wang

The test of KBUILD_OUTPUT in "rpm-pkg rpm" target is useless.
KBUILD_OUTPUT is always empty here.

Signed-off-by: Bin Wang 
---
 scripts/package/Makefile | 6 --
 1 file changed, 6 deletions(-)

diff --git a/scripts/package/Makefile b/scripts/package/Makefile
index 87bf080..ba073a6 100644
--- a/scripts/package/Makefile
+++ b/scripts/package/Makefile
@@ -36,12 +36,6 @@ $(objtree)/kernel.spec: $(MKSPEC) $(srctree)/Makefile
$(CONFIG_SHELL) $(MKSPEC) > $@
 
 rpm-pkg rpm: $(objtree)/kernel.spec FORCE
-   @if test -n "$(KBUILD_OUTPUT)"; then \
-   echo "Building source + binary RPM is not possible outside the"; \
-   echo "kernel source tree. Don't set KBUILD_OUTPUT, or use the"; \
-   echo "binrpm-pkg target instead."; \
-   false; \
-   fi
$(MAKE) clean
$(PREV) ln -sf $(srctree) $(KERNELPATH)
$(CONFIG_SHELL) $(srctree)/scripts/setlocalversion --save-scmversion
-- 
1.7.11.7


Re: thoughts on requiring multi-arch support for arm drm drivers?

2013-01-20 Thread Thierry Reding

On Sun, Jan 20, 2013 at 04:42:55PM +0100, Daniel Vetter wrote:
> On Sun, Jan 20, 2013 at 4:08 PM, Rob Clark  wrote:
> > One thing I've run into in the past when trying to make changes in drm
> > core, and Daniel Vetter has mentioned the same, is that it is a bit of
> > a pain to compile test things for the arm drivers that do not support
> > CONFIG_ARCH_MULTIPLATFORM.  I went through a while back and fixed up
> > the low hanging fruit (basically the drivers that just needed a
> > Kconfig change).  But, IIRC some of the backlight related code in
> > shmob had some non-trivial plat dependencies.  And I think when tegra
> > came in, it introduced some non-trivial plat dependencies.
> >
> > What do others think about requiring multiarch or no arch dependencies
> > for new drivers, and cleaning up existing drivers.  Even if it is at
> > reduced functionality (like maybe #ifdef CONFIG_ARCH_SHMOBILE for some
> > of the backlight code in shmob) or doesn't even work but is just for
> > the purpose of being able to compile test the rest of the code?
> >
> > Thoughts?
> 
> Definitely in favour of this. Also, I think the arm world _really_
> needs something like Wu Fengguang's 0-day kernel testing/building
> machines, which check every commit pushed to around 150 git kernel
> maintainer repos with randconfigs, sparse (and iirc other static
> checkers like coccinelle), and test-boot them on kvm. It's not just
> that every driver seems to need its own special defconfig/platform to
> even be selectable in Kconfig, they also seem to randomly (and often)
> break compilation if you're on the wrong tree or don't have the
> exactly required golden config ...

That's true. Unfortunately due to the many repositories involved there
seem to be quite a few dependencies involved to get all the pieces to
build properly. linux-next is usually in pretty good shape, however.
I've been running an automated build over at least all ARM defconfigs in
linux-next for a few days and sent out patches for build failures. But
I'm not sure if I can keep that up, or at least not on a daily basis.

Obviously it doesn't help the DRM problem all that much. But I agree
with Rob that the only thing that will really help is multi-platform
support.

Thierry



[PATCH v2 1/1] page_alloc: Bootmem limit with movablecore_map

2013-01-20 Thread Tang Chen

This patch makes sure bootmem will not allocate memory from areas that
may be ZONE_MOVABLE. The map info comes from the movablecore_map boot option.

Signed-off-by: Tang Chen 
Reviewed-by: Wen Congyang 
Reviewed-by: Lai Jiangshan 
Tested-by: Lin Feng 
---
 include/linux/memblock.h |    2 +
 mm/memblock.c            |   50 ++
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index d452ee1..ac52bbc 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -60,6 +60,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+extern struct movablecore_map movablecore_map;
+
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
  unsigned long *out_end_pfn, int *out_nid);
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 88adc8a..0218231 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -92,9 +92,58 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
+ * If we have CONFIG_HAVE_MEMBLOCK_NODE_MAP defined, we need to check if the
+ * memory we found is not in hotpluggable ranges.
+ *
  * RETURNS:
  * Found address on success, %0 on failure.
  */
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
+   phys_addr_t end, phys_addr_t size,
+   phys_addr_t align, int nid)
+{
+   phys_addr_t this_start, this_end, cand;
+   u64 i;
+   int curr = movablecore_map.nr_map - 1;
+
+   /* pump up @end */
+   if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
+   end = memblock.current_limit;
+
+   /* avoid allocating the first page */
+   start = max_t(phys_addr_t, start, PAGE_SIZE);
+   end = max(start, end);
+
+   for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
+   this_start = clamp(this_start, start, end);
+   this_end = clamp(this_end, start, end);
+
+restart:
+   if (this_end <= this_start || this_end < size)
+   continue;
+
+   for (; curr >= 0; curr--) {
+   if ((movablecore_map.map[curr].start_pfn << PAGE_SHIFT)
+   < this_end)
+   break;
+   }
+
+   cand = round_down(this_end - size, align);
+   if (curr >= 0 &&
+   cand < movablecore_map.map[curr].end_pfn << PAGE_SHIFT) {
+   this_end = movablecore_map.map[curr].start_pfn
+  << PAGE_SHIFT;
+   goto restart;
+   }
+
+   if (cand >= this_start)
+   return cand;
+   }
+
+   return 0;
+}
+#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align, int nid)
@@ -123,6 +172,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
}
return 0;
 }
+#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
  * memblock_find_in_range - find free area in given range
-- 
1.7.1
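
For illustration only (not part of the patch): a simplified user-space
model of the search in the CONFIG_HAVE_MEMBLOCK_NODE_MAP branch above, for
a single free range. It works on byte addresses instead of PFNs and assumes
that the movable ranges are sorted and non-overlapping and that align is a
power of two. struct range and find_top_down() are hypothetical names, not
kernel API.

```c
#include <stdint.h>

/* Half-open byte range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Search one free range [free_start, free_end) from the top down for
 * size bytes aligned to align, skipping the movable ranges.
 * Returns the address found, or 0 on failure (same convention as the patch).
 */
uint64_t find_top_down(uint64_t free_start, uint64_t free_end,
		       uint64_t size, uint64_t align,
		       const struct range *movable, int nr)
{
	uint64_t this_end = free_end;
	int curr = nr - 1;

	for (;;) {
		uint64_t cand;

		if (this_end <= free_start || this_end < size)
			return 0;

		/* Find the highest movable range that starts below this_end. */
		while (curr >= 0 && movable[curr].start >= this_end)
			curr--;

		cand = round_down_pow2(this_end - size, align);
		if (curr >= 0 && cand < movable[curr].end) {
			/* Candidate overlaps a movable range: retry below it. */
			this_end = movable[curr].start;
			continue;
		}

		return cand >= free_start ? cand : 0;
	}
}
```

With one movable range [0x8000, 0xc000), a one-page request in the free
range [0x1000, 0xc000) cannot use the top of the range and lands at 0x7000.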


Re: [RFC] Creating an eeprom class

2013-01-20 Thread Thomas De Schampheleire

On Sun, Jan 20, 2013 at 11:39 PM, Greg KH  wrote:
> On Sun, Jan 20, 2013 at 07:08:28PM +0100, Thomas De Schampheleire wrote:
>> [plaintext and fixed address of David Brownell]
>
> David passed away a year or so ago, so that's really not going to help :(

So sorry to hear that, I was not aware...

>
>> Hi,
>>
>> Several of the eeprom drivers that live in drivers/misc/eeprom export
>> a binary sysfs file 'eeprom'. If a userspace program or script wants
>> to access this file, it needs to know the full path, for example:
>>
>> /sys/bus/spi/devices/spi32766.0/eeprom
>>
>> The problem with this approach is that it requires knowledge about the
>> hardware configuration: is the eeprom on the SPI bus, the I2C bus, or
>> maybe memory mapped?
>>
>> It would therefore be more interesting to have a bus-agnostic way to
>> access this eeprom file, for example:
>> /sys/class/eeprom/eeprom0/eeprom
>>
>> Maybe it'd be even better to use a more generic class name than
>> 'eeprom', since there are several types of eeprom-like devices that
>> you could export this way.
>
> Does all of the existing "eeprom" devices use the same userspace
> interface?  If so, yes, having a "class" would make sense.

All but one do. That one (eeprom_93cx6.c) exports its read/write
functions to other kernel code, and is used in several
wireless/ethernet drivers.
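
As a rough illustration of the user-space side (a sketch, not code from any
of the drivers): reading the binary sysfs file is an ordinary file read, so
a class-based path would only change the string passed in. The helper name
read_eeprom() is made up for this example, and
/sys/class/eeprom/eeprom0/eeprom is only the path proposed in the quoted
mail, not an existing interface.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Read up to len bytes from an eeprom binary sysfs file.
 * Returns the number of bytes read, or -1 if the file cannot be opened.
 */
long read_eeprom(const char *path, unsigned char *buf, size_t len)
{
	FILE *f = fopen(path, "rb");
	size_t got;

	if (!f)
		return -1;

	got = fread(buf, 1, len, f);
	fclose(f);
	return (long)got;
}
```

Today the caller has to pass something like
/sys/bus/spi/devices/spi32766.0/eeprom; with a class it could pass a
bus-agnostic path instead, with no other change.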

>
>> Or should we rather hook the eeprom code into the mtd subsystem?
>
> Why mtd?

Because an eeprom is a piece of memory. Maybe mtd is overkill in terms
of the operations supported, but from a high-level perspective an
eeprom is a memory technology device, right?

Thanks,
Thomas

Re: [PATCH] tty: Only wakeup the line discipline idle queue when queue is active

2013-01-20 Thread Ivo Sieben

Hi,

2013/1/18 Oleg Nesterov :
>
> I can't understand why do you dislike Ivo's simple patch. There are
> a lot of "if (waitqueue_active) wake_up" examples. Even if we add the
> new helpers (personally I don't think this makes sense) , we can do
> this later. Why should we delay this fix?
>

FYI: Greg has added my patch to his tty-next branch, so my fix has
been approved.
Thank you all for reviewing.

Regards,
Ivo Sieben

Re: linux-next: build failure after merge of the final tree (akpm tree related)

2013-01-20 Thread Tang Chen


Hi Stephen,

On 01/21/2013 02:08 PM, Stephen Rothwell wrote:

> Hi all,
>
> After merging the final tree, today's linux-next build (arm defconfig)
> failed like this:
>
> mm/memblock.c: In function 'memblock_find_in_range_node':
> mm/memblock.c:104:2: error: invalid use of undefined type 'struct movablecore_map'
> mm/memblock.c:123:4: error: invalid use of undefined type 'struct movablecore_map'
> mm/memblock.c:130:7: error: invalid use of undefined type 'struct movablecore_map'
> mm/memblock.c:131:4: error: invalid use of undefined type 'struct movablecore_map'
>
> Caused by commit "page_alloc: bootmem limit with movablecore_map" from
> the akpm tree.  The definition of struct movablecore_map is protected by
> CONFIG_HAVE_MEMBLOCK_NODE_MAP but its use is not.
>
> I have reverted that commit for today.


Thank you very much for reporting this. It was my mistake to miss this
definition.

I will post a new version of "page_alloc: bootmem limit with
movablecore_map" since you have reverted it.

CONFIG_HAVE_MEMBLOCK_NODE_MAP is selected by x86=y, but I don't have any
non-x86 box. So I didn't test it. Please tell me if you have any problem
with it on other platforms.


Thanks. :)

Re: [RFC PATCH 1/3] staging, zsmalloc: introduce zs_mem_[read/write]

2013-01-20 Thread Joonsoo Kim

Hello, Minchan.

On Thu, Jan 17, 2013 at 08:59:22AM +0900, Minchan Kim wrote:
> Hi Joonsoo,
> 
> On Wed, Jan 16, 2013 at 05:08:55PM +0900, Joonsoo Kim wrote:
> > If object is on boundary of page, zs_map_object() copy content of object
> > to pre-allocated page and return virtual address of
> 
> IMHO, to make it easier for reviewers to read, it would be better to use
> an explicit term instead of an abstract one.
> 
> pre-allocated pages : vm_buf which is reserved pages for zsmalloc
> 
> > this pre-allocated page. If user inform zsmalloc of memcpy region,
> > we can avoid this copy overhead.
> 
> That's a good idea!
> 
> > This patch implement two API and these get information of memcpy region.
> > Using this information, we can do memcpy without overhead.
> 
> For the clarification,
> 
>   we can reduce copy overhead with this patch
>   in !USE_PGTABLE_MAPPING case.
> 
> > 
> > For USE_PGTABLE_MAPPING case, we can avoid flush cache and tlb overhead
> > via these API.
> 
> Yeb!
> 
> > 
> > Signed-off-by: Joonsoo Kim 
> > ---
> > These are [RFC] patches, because I don't test and
> > I don't have test environment, yet. Just compile test done.
> > If there is positive comment, I will setup test env and check correctness.
> > These are based on v3.8-rc3.
> > If rebase is needed, please notify me what tree I should rebase.
> 
> Whenever you send zsmalloc/zram/zcache, you have to based on recent 
> linux-next.
> But I hope we send the patches to akpm by promoting soon. :(
> 
> > 
> > diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c 
> > b/drivers/staging/zsmalloc/zsmalloc-main.c
> > index 09a9d35..e3ef5a5 100644
> > --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> > +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> > @@ -1045,6 +1045,118 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
> >  }
> >  EXPORT_SYMBOL_GPL(zs_unmap_object);
> >  
> 
> It's exported function. Please write description.
> 
> > +void zs_mem_read(struct zs_pool *pool, unsigned long handle,
> > +   void *dest, unsigned long src_off, size_t n)
> 
> n is meaningless, please use meaningful word.
> How about this?
> void *buf, unsigned long offset, size_t count
> 
> > +{
> > +   struct page *page;
> > +   unsigned long obj_idx, off;
> > +
> > +   unsigned int class_idx;
> > +   enum fullness_group fg;
> > +   struct size_class *class;
> > +   struct page *pages[2];
> > +   int sizes[2];
> > +   void *addr;
> > +
> > +   BUG_ON(!handle);
> > +
> > +   /*
> > +* Because we use per-cpu mapping areas shared among the
> > +* pools/users, we can't allow mapping in interrupt context
> > +* because it can corrupt another users mappings.
> > +*/
> > +   BUG_ON(in_interrupt());
> > +
> > +   obj_handle_to_location(handle, &page, &obj_idx);
> > +   get_zspage_mapping(get_first_page(page), &class_idx, &fg);
> > +   class = &pool->size_class[class_idx];
> > +   off = obj_idx_to_offset(page, obj_idx, class->size);
> > +   off += src_off;
> > +
> > +   BUG_ON(class->size < n);
> > +
> > +   if (off + n <= PAGE_SIZE) {
> > +   /* this object is contained entirely within a page */
> > +   addr = kmap_atomic(page);
> > +   memcpy(dest, addr + off, n);
> > +   kunmap_atomic(addr);
> > +   return;
> > +   }
> > +
> > +   /* this object spans two pages */
> > +   pages[0] = page;
> > +   pages[1] = get_next_page(page);
> > +   BUG_ON(!pages[1]);
> > +
> > +   sizes[0] = PAGE_SIZE - off;
> > +   sizes[1] = n - sizes[0];
> > +
> > +   addr = kmap_atomic(pages[0]);
> > +   memcpy(dest, addr + off, sizes[0]);
> > +   kunmap_atomic(addr);
> > +
> > +   addr = kmap_atomic(pages[1]);
> > +   memcpy(dest + sizes[0], addr, sizes[1]);
> > +   kunmap_atomic(addr);
> > +}
> > +EXPORT_SYMBOL_GPL(zs_mem_read);
> > +
> 
> Ditto. Write description.
> 
> > +void zs_mem_write(struct zs_pool *pool, unsigned long handle,
> > +   const void *src, unsigned long dest_off, size_t n)
> > +{
> > +   struct page *page;
> > +   unsigned long obj_idx, off;
> > +
> > +   unsigned int class_idx;
> > +   enum fullness_group fg;
> > +   struct size_class *class;
> > +   struct page *pages[2];
> > +   int sizes[2];
> > +   void *addr;
> > +
> > +   BUG_ON(!handle);
> > +
> > +   /*
> > +* Because we use per-cpu mapping areas shared among the
> > +* pools/users, we can't allow mapping in interrupt context
> > +* because it can corrupt another users mappings.
> > +*/
> > +   BUG_ON(in_interrupt());
> > +
> > +   obj_handle_to_location(handle, &page, &obj_idx);
> > +   get_zspage_mapping(get_first_page(page), &class_idx, &fg);
> > +   class = &pool->size_class[class_idx];
> > +   off = obj_idx_to_offset(page, obj_idx, class->size);
> > +   off += dest_off;
> > +
> > +   BUG_ON(class->size < n);
> > +
> > +   if (off + n <= PAGE_SIZE) {
> > +   /* this object is contained entirely within a page */
> > +   addr = kmap_atomic(page);
> > +
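
For readers following the two-page case in the quoted zs_mem_read(): a
small user-space model of the same split copy, with plain buffers standing
in for the kmap_atomic()'d pages. copy_spanning() and its parameters are
hypothetical names for this sketch, not zsmalloc API, and it assumes
off < page_size and n <= page_size.

```c
#include <string.h>
#include <stddef.h>

/*
 * Copy n bytes that start at offset off in page0 and, if the object
 * crosses the page boundary, continue at the start of page1.
 */
void copy_spanning(void *dest, const void *page0, const void *page1,
		   size_t off, size_t n, size_t page_size)
{
	size_t first;

	if (off + n <= page_size) {
		/* The object is contained entirely within the first page. */
		memcpy(dest, (const char *)page0 + off, n);
		return;
	}

	/* The object spans two pages: copy the tail of page0, then page1. */
	first = page_size - off;
	memcpy(dest, (const char *)page0 + off, first);
	memcpy((char *)dest + first, page1, n - first);
}
```

With 8-byte "pages" holding "ABCDEFGH" and "ijklmnop", copying 4 bytes from
offset 6 yields "GHij": two bytes from the first page and two from the
second.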

Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Mike Galbraith

On Mon, 2013-01-21 at 07:42 +0100, Mike Galbraith wrote: 
> On Mon, 2013-01-21 at 13:07 +0800, Michael Wang wrote:

> > May be we could try change this back to the old way later, after the aim
> > 7 test on my server.
> 
> Yeah, something funny is going on.

Never entering balance path kills the collapse.  Asking wake_affine()
wrt the pull as before, but allowing us to continue should no idle cpu
be found, still collapsed.  So the source of funny behavior is indeed in
balance_path.

-Mike


Re: [PATCH v2 1/2] ARM: shmobile: sh73a0: Use generic irqchip_init()

2013-01-20 Thread Thierry Reding

On Mon, Jan 21, 2013 at 09:54:39AM +0900, Simon Horman wrote:
> On Fri, Jan 18, 2013 at 08:16:12AM +0100, Thierry Reding wrote:
> > The asm/hardware/gic.h header does no longer exist and the corresponding
> > functionality was moved to linux/irqchip.h and linux/irqchip/arm-gic.h
> > respectively. gic_handle_irq() and of_irq_init() are no longer available
> > either and have been replaced by irqchip_init().
> 
> asm/hardware/gic.h Seems to still exist in Linus's tree.
> Could you let me know which tree of which branch I should depend on
> in order to apply this change?

I found this when doing an automated build over all ARM defconfigs on
linux-next.

Commit 520f7bd73354f003a9a59937b28e4903d985c420 "irqchip: Move ARM gic.h
to include/linux/irqchip/arm-gic.h" moved the file and was merged
through Olof Johansson's next/cleanup and for-next branches.

Adding Olof on Cc since I'm not quite sure myself about how this is
handled.

Thierry


Re: [RFC PATCH v1 0/3] kdump, vmcore: Map vmcore memory in direct mapping region

2013-01-20 Thread HATAYAMA Daisuke

From: Vivek Goyal 
Subject: Re: [RFC PATCH v1 0/3] kdump, vmcore: Map vmcore memory in direct mapping region
Date: Fri, 18 Jan 2013 15:54:13 -0500

> On Fri, Jan 18, 2013 at 11:06:59PM +0900, HATAYAMA Daisuke wrote:
> 
> [..]
>> > These are impressive improvements. I missed the discussion on mmap().
>> > So why couldn't we provide mmap() interface for /proc/vmcore. If that
>> > works then application can select to mmap/unmap bigger chunks of file
>> > (instead ioremap mapping/remapping a page at a time). 
>> > 
>> > And if application controls the size of mapping, then it can vary the
>> > size of mapping based on available amount of free memory. That way if
>> > somebody reserves less amount of memory, we could still dump but with
>> > some time penalty.
>> > 
>> 
>> mmap() needs user-space page table in addition to kernel-space's,
> 
> [ CC Rik van Riel] 
> 
> I was chatting with Rik and it does not look like that there is any
> fundamental requirement that range of pfn being mapped in user tables
> has to be mapped in kernel tables too. Did you run into specific issue.
> 

No, I was simply confused about this.

>> and
>> it looks that remap_pfn_range() that creates the user-space page
>> table, doesn't support large pages, only 4KB pages.
> 
> This indeed looks like the case. May be we can enhance remap_pfn_range()
> to take an argument and create larger size mappings.
> 

Adding a new argument to remap_pfn_range would not easily be accepted
because it changes the function's signature, and the function is
exported to modules.

As init_memory_mapping does, it should internally and automatically divide
a given range of kernel address space into properly aligned ones and then
remap them.
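
To make the idea concrete, here is a minimal user-space sketch of such a
split, assuming only 4KB and 2MB page sizes and a 4KB-aligned input range.
struct chunk and split_range() are invented for illustration; this is not
how init_memory_mapping() is actually written.

```c
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)

/* One piece of the range, to be mapped with pages of page_size bytes. */
struct chunk {
	uint64_t start;
	uint64_t end;
	uint64_t page_size;
};

static uint64_t round_up_to(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static uint64_t round_down_to(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/*
 * Split [start, end) into at most three chunks: an unaligned 4KB head,
 * a 2MB-aligned body and a 4KB tail. out must have room for 3 entries.
 * Returns the number of chunks written.
 */
int split_range(uint64_t start, uint64_t end, struct chunk *out)
{
	uint64_t big_start = round_up_to(start, SZ_2M);
	uint64_t big_end = round_down_to(end, SZ_2M);
	int n = 0;

	if (start >= end)
		return 0;

	if (big_start >= big_end) {
		/* No full 2MB block inside the range: 4KB pages only. */
		out[n++] = (struct chunk){ start, end, SZ_4K };
		return n;
	}

	if (start < big_start)
		out[n++] = (struct chunk){ start, big_start, SZ_4K };
	out[n++] = (struct chunk){ big_start, big_end, SZ_2M };
	if (big_end < end)
		out[n++] = (struct chunk){ big_end, end, SZ_4K };

	return n;
}
```

For example, [0x1ff000, 0x601000) becomes a 4KB head up to 0x200000, a 2MB
body up to 0x600000 and a 4KB tail. A 1GB level could be added the same
way.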

Also, if we extend this in the future, we need to have some feature
for userland to know a given kernel can use 2MB/1GB pages for
remapping. makedumpfile needs to estimate how much memory is required
for the remapping.

>> If mmaping small
>> chunks only for small memory programming, then we would again face the
>> same issue as with ioremap.
> 
> Even if it is 4KB pages, I think it will still be faster than the current
> interface, because we will not be issuing this many tlb flushes.
> (Assuming makedumpfile has been modified to map/unmap large areas of
> /proc/vmcore).
> 

OK, I'll go in this direction first. From my local investigation, I'm
beginning to think that my idea of mapping whole DIMM ranges in the
direct mapping region is difficult due to some memory hot-plug issues,
and that an mmap interface is more useful than keeping page table
handling in /proc/vmcore when we process /proc/vmcore in parallel, with
each process reading a different range.

Assuming we can use 4KB pages only, if we use a 1MB buffer for page
tables, we can cover about a 500MB memory region. Then, remapping is done
about 2000 times. In the ioremap case, remapping is done 268435456
times. Performance should improve considerably. We should benchmark this
first.
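
The numbers above can be checked with a few lines of C. This sketch assumes
8-byte page table entries and a 1TB vmcore; the 1TB figure is only inferred
from the 268435456 count (2^40 / 4KB) and is not stated above. The function
names are made up.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL
#define PTE_SIZE     8ULL	/* assumption: 8-byte page table entries */

/* Bytes of memory that a page-table buffer of buf_bytes can map with 4KB pages. */
uint64_t covered_bytes(uint64_t buf_bytes)
{
	return buf_bytes / PTE_SIZE * PAGE_SIZE_4K;
}

/* Number of remaps needed to walk total_bytes, window_bytes at a time. */
uint64_t remap_count(uint64_t total_bytes, uint64_t window_bytes)
{
	return (total_bytes + window_bytes - 1) / window_bytes;
}
```

covered_bytes(1 << 20) is 512MB, so walking 1TB takes
remap_count(1ULL << 40, 512MB) = 2048 remaps, against
remap_count(1ULL << 40, 4096) = 268435456 when remapping one page at a
time.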

Thanks.
HATAYAMA, Daisuke


Re: [PATCH v2 16/76] ARC: Syscall support (no-legacy-syscall ABI)

2013-01-20 Thread Vineet Gupta

On Saturday 19 January 2013 08:39 AM, Al Viro wrote:
> Please, collapse your #36--#40 into that one (and I'd probably fold #17
> here as well, to simplify that reordering).  Sure, it's not a bisection
> hazard, but...
> 

I kept #16 and #17 distinct and

  * squashed switch-to-generic-kernel-thread #36 into process creation patch #17
  * split generic kernel_execve and sys_execve #37 into two
    * squashed sys_execve bits into syscall patch #16
    * squashed kernel_execve patch into #17
  * squashed switch-to-saner-execve patches #38 and #39 into #17
  * squashed generic clone patch #40 into #16


Re: [PATCH v9 11/11] PCI: Put pci dev to device tree as early as possible

2013-01-20 Thread Yinghai Lu

On Sun, Jan 20, 2013 at 3:23 PM, Rafael J. Wysocki  wrote:
> On Thursday, January 17, 2013 11:53:22 PM Yinghai Lu wrote:
>> We want to put created pci device in the device tree as soon as possible.
>> - just after we find it and create pci_dev struct for it.
>> so for_pci_dev iteration will not miss them.
>>
>> But at that time, we cannot load drivers for them yet. That needs to happen
>> after pci_assign_unassigned_resources() etc. to make sure all pci devices
>> get resources allocated first.
>>
>> Move out device registering out of pci_bus_add_devices, and
>> new pci_bus_add_devices() will do the device_attach work to load pci drivers
>>
>> Signed-off-by: Yinghai Lu 
>> ---
>>  drivers/pci/bus.c   |   47 +++
>>  drivers/pci/iov.c   |7 ---
>>  drivers/pci/probe.c |   34 +++---
>>  3 files changed, 30 insertions(+), 58 deletions(-)
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 18c1c6d..0a55845 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -178,22 +178,9 @@ static void pci_bus_attach_device(struct pci_dev *dev)
>>   */
..
>> @@ -205,21 +192,9 @@ int pci_bus_add_device(struct pci_dev *dev)
>>   */
>>  int pci_bus_add_child(struct pci_bus *bus)
>>  {
>> - int retval;
>> -
>> - if (bus->bridge)
>> - bus->dev.parent = bus->bridge;
>> -
>> - retval = device_register(&bus->dev);
>> - if (retval)
>> - return retval;
>> -
>>   bus->is_added = 1;
>>
>> - /* Create legacy_io and legacy_mem files for this bus */
>> - pci_create_legacy_files(bus);
>> -
>> - return retval;
>> + return 0;
>>  }
>
> Well, what sense does this make to keep that function as is after removing
> almost all of the code from it?

ok, will remove that function.

...
>>   list_for_each_entry(dev, &bus->devices, bus_list) {
>>   BUG_ON(!dev->is_added);
>>
>>   child = dev->subordinate;
>> - /*
>> -  * If there is an unattached subordinate bus, attach
>> -  * it and then scan for unattached PCI devices.
>> -  */
>> +
>>   if (!child)
>>   continue;
>> - if (list_empty(&child->node)) {
>> - down_write(&pci_bus_sem);
>> - list_add_tail(&child->node, &dev->bus->children);
>> - up_write(&pci_bus_sem);
>> - }
>
> This doesn't seem to have a replacement.  Why isn't it necessary any more?
>

Will add that to the changelog, so the related changelog will be:

---
Also remove unattached child bus handling in pci_bus_add_devices().
That is not needed because a child bus created via pci_add_new_bus() is
already in the parent bus's children list.
---

Re: [PATCH v3 0/2] Adding USB 3.0 DRD-phy support for exynos5250

2013-01-20 Thread Vivek Gautam

Hi Felipe,


On Mon, Jan 14, 2013 at 6:29 PM, Vivek Gautam  wrote:
> Changes from v2:
>  - Renaming 'samsung-usbphy.c' driver to 'samsung-usb2.c' indicating
>    usb 2.0 phy controller's driver for Samsung's SoCs.
>  - Moving the register definitions and structure definitions to
>    common header file 'samsung-usbphy.h' to be used across
>    usb 2.0 and usb 3.0 phy.
>  - Keeping common exported function definitions in samsung-usbphy.c
>    which can be used across usb 2.0 and usb 3.0 phy.
>  - Writing separate driver file for Samsung's USB 3.0 phy controller
>    and making it dependent on USB_DWC3.
>

Is the re-organization being done here fine as per requirements for
separate drivers for usb 2.0 type PHY and usb 3.0 type PHY ?

> Rebased on top of usb-next followed by following patches/patch-threads:
> -- [PATCH v9 1/2] usb: phy: samsung: Introducing usb phy driver for hsotg
> -- [PATCH] usb: phy: samsung: Add support to set pmu isolation (version 6)
> -- [PATCH v6 0/4] Adding usb2.0 host-phy support for exynos5250
>
> Changes from v1:
>  - Moved architecture related patch out of this patch-set.
>  - Replaced unnecessary multi-line macro definitions by
>    single line definitions.
>  - Creating new data structure for USB 3.0 phy type and embedding
>    it in 'samsung_usbphy' structure.
>  - Adding a flag in 'samsung_usbphy' structure to check if device
>    has usb 3.0 type phy or not.
>  - Restructuring probe sequence for USB 3.0 phy, such that we are
>    initializing only when device has usb3.0 type phy.
>
> Vivek Gautam (2):
>   usb: phy: samsung: Common out the generic stuff
>   usb: phy: samsung: Add PHY support for USB 3.0 controller
>
>  drivers/usb/phy/Kconfig          |    8 +
>  drivers/usb/phy/Makefile         |    3 +-
>  drivers/usb/phy/samsung-usb2.c   |  511 +++
>  drivers/usb/phy/samsung-usb3.c   |  349 +++
>  drivers/usb/phy/samsung-usbphy.c |  713 +-
>  drivers/usb/phy/samsung-usbphy.h |  328 +
>  6 files changed, 1205 insertions(+), 707 deletions(-)
>  create mode 100644 drivers/usb/phy/samsung-usb2.c
>  create mode 100644 drivers/usb/phy/samsung-usb3.c
>  create mode 100644 drivers/usb/phy/samsung-usbphy.h
>



-- 
Thanks & Regards
Vivek

Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Mike Galbraith

On Mon, 2013-01-21 at 13:07 +0800, Michael Wang wrote:

> That seems like the default one, could you please show me the numbers in
> your datapoint file?

Yup, I do not touch the workfile.  Datapoints is what you see in the
tabulated result...

1
1
1
5
5
5
10
10
10
...

so it does three consecutive runs at each load level.  I quiesce the
box, set governor to performance,
echo 250 32000 32 4096 > /proc/sys/kernel/sem, then ./multitask -nl -f,
and point it at ./datapoints.

> I'm not familiar with this benchmark, but I'd like to have a try on my
> server, to make sure whether it is a generic issue.

One thing I didn't like about your changes is that you don't ask
wake_affine() if it's ok to pull cross node or not, which I thought might
induce imbalance, but twiddling that didn't fix up the collapse, pretty
much leaving only the balance path.

> >> And I'm confusing about how those new parameter value was figured out
> >> and how could them help solve the possible issue?
> > 
> > Oh, that's easy.  I set sched_min_granularity_ns such that last_buddy
> > kicks in when a third task arrives on a runqueue, and set
> > sched_wakeup_granularity_ns near minimum that still allows wakeup
> > preemption to occur.  Combined effect is reduced over-scheduling.
> 
> That sounds very hard, to catch the timing, whatever, it could be an
> important clue for analysis.

(Play with the knobs with a bunch of different loads, I think you'll
find that those settings work well)

> >> Do you have any idea about which part in this patch set may cause the 
> >> issue?
> > 
> > Nope, I'm as puzzled by that as you are.  When the box had 40 cores,
> > both virgin and patched showed over-scheduling effects, but not like
> > this.  With 20 cores, symptoms changed in a most puzzling way, and I
> > don't see how you'd be directly responsible.
> 
> Hmm...
> 
> > 
> >> One change by design is that, for the old logic, if it's a wakeup and
> >> we found an affine sd, the select func will never go into the balance path,
> >> but the new logic will, in some cases. Do you think this could be a
> >> problem?
> > 
> > Since it's the high load end, where looking for an idle core is most
> > likely to be a waste of time, it makes sense that entering the balance
> > path would hurt _some_, it isn't free.. except for twiddling preemption
> > knobs making the collapse just go away.  We're still going to enter that
> > path if all cores are busy, no matter how I twiddle those knobs.
> 
> May be we could try change this back to the old way later, after the aim
> 7 test on my server.

Yeah, something funny is going on.  I'd like select_idle_sibling() to
just go away, that task be integrated into one and only one short and
sweet balance path.  I don't see why find_idlest* needs to continue
traversal after seeing a zero.  It should be just fine to say gee, we're
done.  Hohum, so much for pure test and report, twiddle twiddle tweak,
bend spindle mutilate ;-) 
   
-Mike


Re: [PATCH] fat: eliminate iterations in fat_search_long in case of EOD

2013-01-20 Thread Namjae Jeon

2013/1/21, OGAWA Hirofumi :
> Namjae Jeon  writes:
>
>> 2013/1/20, OGAWA Hirofumi :
>>> Namjae Jeon  writes:
>>>
>>>> From: Namjae Jeon
>>>>
>>>> When searching a directory for names, we can stop checking for further
>>>> entries if we detect End of Directory, i.e. if (de->name[0] == 0x00).
>>>> The current code traverses the cluster chain of a directory until a hit
>>>> is found or till the last cluster for that directory, ignoring the EOD
>>>> mark.
>>>> Fix this.
>>>
>>> f_pos still works fine after this change?
>> Hi OGAWA.
>> I cannot find any f_pos usage in the fat_search_long function.
>> Maybe you were looking at another function, such as __fat_readdir?
>> Let me know your opinion.
>
> Ah, I see. Only ->lookup. So, this makes behavior more strange.
> I.e. readdir() returns beyond 0, but lookup() can't find it?
Yes, good point. I will check the other places, including readdir.
Thanks for the review!
>
> Thanks.
> --
> OGAWA Hirofumi 
>
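
For readers following the thread, here is a minimal user-space sketch of
the early exit being discussed. It is not the kernel's fat_search_long();
struct dir_slot and search_dir() are hypothetical names made up for this
illustration. The only facts taken from FAT itself are that a first name
byte of 0x00 marks end of directory and 0xE5 marks a deleted slot.

```c
#include <stddef.h>
#include <string.h>

#define DIR_ENTRY_EOD     0x00  /* first name byte: no entries follow */
#define DIR_ENTRY_DELETED 0xE5  /* first name byte: free slot, keep scanning */

/* Hypothetical, simplified 8.3 slot (not the kernel's msdos_dir_entry). */
struct dir_slot {
	unsigned char name[11];
};

/*
 * Return the index of the slot whose 11-byte name equals `want`,
 * or -1 if it is not found. *visited reports how many slots were examined.
 */
static int search_dir(const struct dir_slot *slots, size_t n,
		      const unsigned char want[11], size_t *visited)
{
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		*visited = i + 1;
		if (slots[i].name[0] == DIR_ENTRY_EOD)
			return -1;	/* stop here: nothing valid follows */
		if (slots[i].name[0] == DIR_ENTRY_DELETED)
			continue;	/* skip the hole, keep looking */
		if (memcmp(slots[i].name, want, 11) == 0)
			return (int)i;
	}
	return -1;
}
```

Note that with this early exit, a slot sitting beyond the EOD mark is never
found by the search, which is the same kind of lookup/readdir mismatch
raised in the review above if readdir keeps walking past the mark.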

[RFC PATCH 5/5] timekeeping: Add support for clocksource which doesn't stop during suspend

2013-01-20 Thread Feng Tang

There are some new processors whose TSC clocksource won't stop during
suspend. Currently, after the system resumes from a sleep state, the kernel
uses the persistent clock or RTC to compensate for the sleep time, but for
these new clocksources we can skip the compensation from external sources
and just use the current clocksource to count the time spent asleep.

This can solve some time drift bugs caused by not-so-accurate RTC
devices.

Signed-off-by: Feng Tang 
---
 kernel/time/timekeeping.c |   23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cbc6acb..628c9ba 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -749,22 +749,36 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 static void timekeeping_resume(void)
 {
 	struct timekeeper *tk = &timekeeper;
+	struct clocksource *clock = tk->clock;
 	unsigned long flags;
 	struct timespec ts;
+	cycle_t cycle_now, cycle_delta;
+	s64 nsec;
 
 	read_persistent_clock(&ts);
-
 	clockevents_resume();
 	clocksource_resume();
 
 	write_seqlock_irqsave(&tk->lock, flags);
 
-	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
+	if (clock->flags & CLOCK_SOURCE_SUSPEND_NOTSTOP) {
+		cycle_now = clock->read(clock);
+		cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
+		clock->cycle_last = cycle_now;
+
+		nsec = clocksource_cyc2ns(cycle_delta, clock->mult, clock->shift);
+		ts = ns_to_timespec(nsec);
+	} else if (timespec_compare(&ts, &timekeeping_suspend_time) > 0)
 		ts = timespec_sub(ts, timekeeping_suspend_time);
-		__timekeeping_inject_sleeptime(tk, &ts);
+	else {
+		ts.tv_sec = 0;
+		ts.tv_nsec = 0;
 	}
+
+	__timekeeping_inject_sleeptime(tk, &ts);
+
 	/* re-base the last cycle value */
-	tk->clock->cycle_last = tk->clock->read(tk->clock);
+	clock->cycle_last = clock->read(clock);
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, false);
@@ -1134,7 +1148,6 @@ static inline void old_vsyscall_fixup(struct timekeeper *tk)
 #endif
 
 
-
 /**
  * update_wall_time - Uses the current clocksource to increment the wall time
  *
-- 
1.7.9.5
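
A rough user-space sketch of the arithmetic the new resume branch relies on
may help when reviewing it: take the counter delta across suspend, mask it
to the counter width, and scale it to nanoseconds with mult/shift. The
struct cs_sketch type and the sleep_ns() helper are hypothetical stand-ins
for this note, not kernel API; only the (cycles * mult) >> shift form is
taken from clocksource_cyc2ns().

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct clocksource fields used here. */
struct cs_sketch {
	uint64_t mask;        /* valid counter bits (all ones for a 64-bit TSC) */
	uint64_t cycle_last;  /* counter value recorded before suspend */
	uint32_t mult;        /* ns per cycle, scaled by 2^shift */
	uint32_t shift;
};

/* Same arithmetic as clocksource_cyc2ns(): (cycles * mult) >> shift. */
static int64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (int64_t)((cycles * mult) >> shift);
}

/*
 * Nanoseconds elapsed since cycle_last for a counter that kept running.
 * The mask makes the subtraction safe across one wrap of a narrow counter.
 */
static int64_t sleep_ns(struct cs_sketch *cs, uint64_t cycle_now)
{
	uint64_t delta = (cycle_now - cs->cycle_last) & cs->mask;

	cs->cycle_last = cycle_now;
	return cyc2ns(delta, cs->mult, cs->shift);
}
```

The multiplication is done in 64 bits without any overflow check, so the
result is only meaningful while delta * mult stays below 2^63. That is the
limit patch 4/5 in this series is concerned with.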


[RFC PATCH 3/5] x86: tsc: Add support for new S3_NOTSTOP feature

2013-01-20 Thread Feng Tang

Signed-off-by: Feng Tang 
---
 arch/x86/kernel/tsc.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 06ccb50..4cc33ca 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -767,7 +767,8 @@ static cycle_t read_tsc(struct clocksource *cs)
 
 static void resume_tsc(struct clocksource *cs)
 {
-	clocksource_tsc.cycle_last = 0;
+	if (!boot_cpu_has(X86_FEATURE_TSC_S3_NOTSTOP))
+		clocksource_tsc.cycle_last = 0;
 }
 
 static struct clocksource clocksource_tsc = {
@@ -938,6 +939,9 @@ static int __init init_tsc_clocksource(void)
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_TSC_S3_NOTSTOP))
+		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NOTSTOP;
+
 	/*
 	 * Trust the results of the earlier calibration on systems
 	 * exporting a reliable TSC.
-- 
1.7.9.5


[RFC PATCH 4/5] clocksource: Enlarge the maxim time interval when configuring the scale and shift

2013-01-20 Thread Feng Tang

On our x86 platform, we see a failure case where clocksource_cyc2ns()
returns a negative value. The reason is that the time interval was large
(more than 1000 seconds) while the TSC frequency is 2GHz, so the following
formula overflowed:
	((u64) cycles * mult) >> shift

So enlarge the maximum time interval from 10 minutes to 40 minutes to fix
the bug.

Another solution may be to add a "max_interval" field to struct clocksource
and use a default value (like the current 10 minutes) when the clocksource
driver doesn't set it.

Signed-off-by: Feng Tang 
---
 kernel/time/clocksource.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c958338..48fbfcb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -663,7 +663,7 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * Calc the maximum number of seconds which we can run before
 	 * wrapping around. For clocksources which have a mask > 32bit
 	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
+	 * conversion precision. 40 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
 	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
@@ -674,8 +674,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	do_div(sec, scale);
 	if (!sec)
 		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
+	else if (sec > 2400 && cs->mask > UINT_MAX)
+		sec = 2400;
 
 	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
 			       NSEC_PER_SEC / scale, sec * scale);
-- 
1.7.9.5
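
To sanity-check the "more than 1000 seconds" figure, here is a user-space
sketch modelled on clocks_calc_mult_shift(). It is a re-typed approximation
for illustration, not the kernel source, and it assumes the TSC is
registered in kHz (scale = 1000, freq = 2000000 for 2GHz), so the call is
made with from = freq, to = NSEC_PER_SEC / 1000 and maxsec = sec * 1000.
secs_until_negative() is a hypothetical helper added for this note.

```c
#include <stdint.h>

/* Choose mult/shift so that ns = (cycles * mult) >> shift, with
 * cycles * mult fitting in 64 bits for up to maxsec * from cycles. */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
			    uint32_t from, uint32_t to, uint32_t maxsec)
{
	uint64_t tmp;
	uint32_t sft, sftacc = 32;

	/* Each bit of maxsec * from above bit 31 costs one bit of mult. */
	tmp = ((uint64_t)maxsec * from) >> 32;
	while (tmp) {
		tmp >>= 1;
		sftacc--;
	}

	/* Take the largest shift whose mult still fits in sftacc bits. */
	for (sft = 32; sft > 0; sft--) {
		tmp = (uint64_t)to << sft;
		tmp += from / 2;
		tmp /= from;
		if ((tmp >> sftacc) == 0)
			break;
	}
	*mult = (uint32_t)tmp;
	*shift = sft;
}

/* Whole seconds at freq_khz before cycles * mult reaches 2^63, i.e.
 * before the signed 64-bit result of the conversion turns negative. */
static uint64_t secs_until_negative(uint32_t freq_khz, uint32_t mult)
{
	uint64_t max_cycles = (UINT64_C(1) << 63) / mult;

	return max_cycles / ((uint64_t)freq_khz * 1000);
}
```

Under those assumptions the 600 second cap yields mult = 2^22, shift = 23,
and the product crosses 2^63 after about 1099 seconds, which agrees with
the failure described in the changelog. With the 2400 second cap the values
become mult = 2^20, shift = 21, and the limit moves out to about 4398
seconds, at the cost of two bits of conversion precision.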


[RFC PATCH 0/5] Add support for S3 non-stop TSC support.

2013-01-20 Thread Feng Tang

Hi All,

On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop in S3, i.e. the TSC value won't be
reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycles.

The enabling efforts include adding new flags for this feature and
modifying clocksource.c and timekeeping.c to support and use it.

One remaining question is that inside timekeeping_resume() we don't
know whether we are resuming from suspend (s2ram) or from
hibernate (s2disk), as there is currently no easy way to check it.
But it doesn't hurt, as these Penwell/Cloverview platforms only have
the S3 state, and no S4.

Please help to review them, thanks!

- Feng

-

Feng Tang (5):
  x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP
  clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP
  x86: tsc: Add support for new S3_NOTSTOP feature
  clocksource: Enlarge the maxim time interval when configuring the
scale and shift
  timekeeping: Add support for clocksource which doesn't stop during
suspend

 arch/x86/include/asm/cpufeature.h |1 +
 arch/x86/kernel/cpu/intel.c   |   12 
 arch/x86/kernel/tsc.c |6 +-
 include/linux/clocksource.h   |1 +
 kernel/time/clocksource.c |6 +++---
 kernel/time/timekeeping.c |   23 ++-
 6 files changed, 40 insertions(+), 9 deletions(-)

-- 
1.7.9.5


[RFC PATCH 1/5] x86: Add cpu capability flag X86_FEATURE_TSC_S3_NOTSTOP

2013-01-20 Thread Feng Tang

On some new Intel Atom processors (Penwell and Cloverview), there is
a feature that the TSC won't stop in S3, i.e. the TSC value won't be
reset to 0 after resume. This feature makes TSC a more reliable
clocksource and could benefit the timekeeping code during system
suspend/resume cycles, so add a flag for it.

Signed-off-by: Feng Tang 
---
 arch/x86/include/asm/cpufeature.h |1 +
 arch/x86/kernel/cpu/intel.c   |   12 
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 2d9075e..f7e1eac 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -100,6 +100,7 @@
 #define X86_FEATURE_AMD_DCM (3*32+27) /* multi-node processor */
 #define X86_FEATURE_APERFMPERF (3*32+28) /* APERFMPERF */
 #define X86_FEATURE_EAGER_FPU  (3*32+29) /* "eagerfpu" Non lazy FPU restore */
+#define X86_FEATURE_TSC_S3_NOTSTOP (3*32+30) /* TSC doesn't stop in S3 state */
 
 /* Intel-defined CPU features, CPUID level 0x0001 (ecx), word 4 */
 #define X86_FEATURE_XMM3   (4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fcaabd0..532f873 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -97,6 +97,18 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		sched_clock_stable = 1;
 	}
 
+	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
+	if (c->x86 == 6) {
+		switch (c->x86_model) {
+		case 0x27:	/* Penwell */
+		case 0x35:	/* Cloverview */
+			set_cpu_cap(c, X86_FEATURE_TSC_S3_NOTSTOP);
+			break;
+		default:
+			;
+		}
+	}
+
 	/*
 	 * There is a known erratum on Pentium III and Core Solo
 	 * and Core Duo CPUs.
-- 
1.7.9.5


[RFC PATCH 2/5] clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NOTSTOP

2013-01-20 Thread Feng Tang

Some x86 processors have a TSC clocksource which continues to work while
the system is suspended. Add a feature flag so that it can be utilized.

Signed-off-by: Feng Tang 
---
 include/linux/clocksource.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 4dceaf8..2d53a8a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -206,6 +206,7 @@ struct clocksource {
 #define CLOCK_SOURCE_WATCHDOG  0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES0x20
 #define CLOCK_SOURCE_UNSTABLE  0x40
+#define CLOCK_SOURCE_SUSPEND_NOTSTOP   0x80
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
-- 
1.7.9.5


scripts/package/Makefile: KBUILD_OUTPUT is useless in rpm build

2013-01-20 Thread Bin Wang

I found that the KBUILD_OUTPUT variable is useless in the rpm-pkg and rpm
targets.

Yes, there is a comment that says:

# Note that the rpm-pkg target cannot be used with KBUILD_OUTPUT,
# but the binrpm-pkg target can; for some reason O= gets ignored.

It does not say for what reason. Also, the code under rpm-pkg checks if
KBUILD_OUTPUT is defined.

> @if test -n "$(KBUILD_OUTPUT)"; then \
> echo "Building source + binary RPM is not possible outside the"; \
> echo "kernel source tree. Don't set KBUILD_OUTPUT, or use the"; \
> echo "binrpm-pkg target instead."; \
> false; \
> fi

But the fact is, whether or not the user uses the "O=" option, KBUILD_OUTPUT
is always empty. I tried to figure out why, but the big Makefile drives me
crazy. If the "O=" option really doesn't affect KBUILD_OUTPUT here, I think
we should at least remove this code.

-- 
Bin Wang

[PATCH 2/2] ARM: davinci: da850: add OF_DEV_AUXDATA entry for eth0.

2013-01-20 Thread Prabhakar Lad

From: Lad, Prabhakar 

Add an OF_DEV_AUXDATA entry for the eth0 driver in the da850 board DT
file so that it uses the emac clock.

Signed-off-by: Lad, Prabhakar 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: davinci-linux-open-sou...@linux.davincidsp.com
Cc: net...@vger.kernel.org
Cc: devicetree-disc...@lists.ozlabs.org
Cc: Sekhar Nori 
Cc: Heiko Schocher 
---
 arch/arm/mach-davinci/da8xx-dt.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-davinci/da8xx-dt.c b/arch/arm/mach-davinci/da8xx-dt.c
index 37c27af..d548a38 100644
--- a/arch/arm/mach-davinci/da8xx-dt.c
+++ b/arch/arm/mach-davinci/da8xx-dt.c
@@ -37,11 +37,18 @@ static void __init da8xx_init_irq(void)
of_irq_init(da8xx_irq_match);
 }
 
+struct of_dev_auxdata da850_evm_auxdata_lookup[] __initdata = {
+   OF_DEV_AUXDATA("ti,davinci-dm6467-emac", 0x01e2, "davinci_emac.1",
+  NULL),
+   {}
+};
+
 #ifdef CONFIG_ARCH_DAVINCI_DA850
 
 static void __init da850_init_machine(void)
 {
-   of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
+   of_platform_populate(NULL, of_default_bus_match_table,
+da850_evm_auxdata_lookup, NULL);
 
da8xx_uart_clk_enable();
 }
-- 
1.7.4.1


[PATCH 1/2] ARM: davinci: da850: add DT node for eth0.

2013-01-20 Thread Prabhakar Lad

From: Lad, Prabhakar 

Add the eth0 device tree node to da850, providing the interrupt
details and the local MAC address of eth0.

Signed-off-by: Lad, Prabhakar 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: davinci-linux-open-sou...@linux.davincidsp.com
Cc: net...@vger.kernel.org
Cc: devicetree-disc...@lists.ozlabs.org
Cc: Sekhar Nori 
Cc: Heiko Schocher 
---
 arch/arm/boot/dts/da850-evm.dts |3 +++
 arch/arm/boot/dts/da850.dtsi|   15 +++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/arm/boot/dts/da850-evm.dts b/arch/arm/boot/dts/da850-evm.dts
index 37dc5a3..a1d6e3e 100644
--- a/arch/arm/boot/dts/da850-evm.dts
+++ b/arch/arm/boot/dts/da850-evm.dts
@@ -24,5 +24,8 @@
serial2: serial@1d0d000 {
status = "okay";
};
+   eth0: emac@1e2 {
+   status = "okay";
+   };
};
 };
diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 640ab75..309cc99 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -56,5 +56,20 @@
interrupt-parent = <>;
status = "disabled";
};
+   eth0: emac@1e2 {
+   compatible = "ti,davinci-dm6467-emac";
+   reg = <0x22 0x4000>;
+   ti,davinci-ctrl-reg-offset = <0x3000>;
+   ti,davinci-ctrl-mod-reg-offset = <0x2000>;
+   ti,davinci-ctrl-ram-offset = <0>;
+   ti,davinci-ctrl-ram-size = <0x2000>;
+   local-mac-address = [ 00 00 00 00 00 00 ];
+   interrupts = <33
+   34
+   35
+   36
+   >;
+   interrupt-parent = <>;
+   };
};
 };
-- 
1.7.4.1


[PATCH 0/2] ARM: davinci: da850: add ethernet driver DT support

2013-01-20 Thread Prabhakar Lad

From: Lad, Prabhakar 

This patch set enables Ethernet support through device tree model.

Patches are available on [1] for testing.

[1] 
http://git.linuxtv.org/mhadli/v4l-dvb-davinci_devices.git/shortlog/refs/heads/da850_dt

Lad, Prabhakar (2):
  ARM: davinci: da850: add DT node for eth0.
  ARM: davinci: da850: add OF_DEV_AUXDATA entry for eth0.

 arch/arm/boot/dts/da850-evm.dts  |3 +++
 arch/arm/boot/dts/da850.dtsi |   15 +++
 arch/arm/mach-davinci/da8xx-dt.c |9 -
 3 files changed, 26 insertions(+), 1 deletions(-)

-- 
1.7.4.1


linux-next: Tree for Jan 21

2013-01-20 Thread Stephen Rothwell

Hi all,

Changes since 20130118:

The powerpc tree still had a build failure.

The security tree gained a conflict against Linus' tree.

The driver-core tree lost its build failure.

The tty tree gained a conflict against Linus' tree.

The usb tree gained a conflict against Linus' tree and a build failure so
I used the version from next-20130118.

The gpio-lw tree lost its build failure.

The samsung tree gained a conflict against the gpio-lw tree.

The akpm tree gained a conflict against the drm tree and a build failure
for which I reverted a commit.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 211 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (3a142ed Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal)
Merging fixes/master (d287b87 Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging kbuild-current/rc-fixes (02f3e53 Merge branch 'yem-kconfig-rc-fixes' of 
git://gitorious.org/linux-kconfig/linux-kconfig into kbuild/rc-fixes)
Merging arm-current/fixes (210b184 Merge branch 'for-rmk/virt/hyp-boot/fixes' 
of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into fixes)
Merging m68k-current/for-linus (e7e29b4 m68k: Wire up finit_module)
Merging powerpc-merge/merge (e6449c9 powerpc: Add missing NULL terminator to 
avoid boot panic on PPC40x)
Merging sparc/master (b7c13f7 sparc: remove __devinit, __devexit annotations)
Merging net/master (b74aa93 tcp: fix incorrect LOCKDROPPEDICMPS counter)
Merging sound-current/for-linus (e043403 ALSA: hda - Fix mute led for another 
HP machine)
Merging pci-current/for-linus (444ee9b PCI: remove depends on 
CONFIG_EXPERIMENTAL)
Merging wireless/master (4668cce ath9k: disable the tasklet before taking the 
PCU lock)
Merging driver-core.current/driver-core-linus (7d1f9ae Linux 3.8-rc4)
Merging tty.current/tty-linus (ebebd49 8250/16?50: Add support for Broadcom 
TruManage redirected serial port)
Merging usb.current/usb-linus (1ee0a22 USB: io_ti: Fix NULL dereference in 
chase_port())
Merging staging.current/staging-linus (7dfc833 staging/sb105x: PARPORT config 
is not good enough must use PARPORT_PC)
Merging char-misc.current/char-misc-linus (33080c1 Drivers: hv: balloon: Fix a 
memory leak)
Merging input-current/for-linus (b666263 Input: document that unregistering 
managed devices is not necessary)
Merging md-current/for-linus (a9add5d md/raid5: add blktrace calls)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (a2c0911 crypto: caam - Updated SEC-4.0 device 
tree binding for ERA information.)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (084a0ec x86: add CONFIG_X86_MOVBE option)
CONFLICT (content): Merge conflict in arch/x86/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline functions)
Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6)
Merging devicetree-current/devicetree/merge (ab28698 of: define struct device 
in of_platform.h if !OF_DEVICE and !OF_ADDRESS)
Merging

[BUG] Bug in netprio_cgroup and netcls_cgroup ?

2013-01-20 Thread Li Zefan

I'm not a network developer, so correct me if I'm wrong.

Since commit 406a3c638ce8b17d9704052c07955490f732c2b8
("net: netprio_cgroup: rework update socket logic"), sock->sk->sk_cgrp_prioidx
is set when the socket is created, and won't be updated unless the task is
moved to another cgroup.

Now the problem is, a socket can be _shared_ by multiple processes (fork,
SCM_RIGHTS).
If we place those processes in different cgroups, and each cgroup has a
different config, all of the processes will still send data via this socket
with the same network priority.

Similar with cls cgroup.
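
A toy model of the behaviour described above, for illustration only. The
task_model and sock_model structs and both helpers are hypothetical and
deliberately ignore everything else the kernel does (including the update
that happens when a task is moved). The sketch only assumes what the report
states: the index is stored in the socket when it is created, and sending
uses the stored value.

```c
#include <stdint.h>

/* Toy model only: these structs are illustrative, not the kernel's. */
struct task_model { uint32_t cgroup_prioidx; };
struct sock_model { uint32_t sk_cgrp_prioidx; };

/* The index is copied from the creating task once, at socket creation. */
static void sock_create_model(struct sock_model *sk,
			      const struct task_model *creator)
{
	sk->sk_cgrp_prioidx = creator->cgroup_prioidx;
}

/* The send path uses the socket's cached index, not the sender's cgroup. */
static uint32_t xmit_prioidx(const struct sock_model *sk,
			     const struct task_model *sender)
{
	(void)sender;	/* the sender's own cgroup is never consulted */
	return sk->sk_cgrp_prioidx;
}
```

In this model two tasks with different cgroup_prioidx values that share one
sock_model always get the same answer from xmit_prioidx(), which is the
mismatch being reported.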

linux-next: build failure after merge of the final tree (akpm tree related)

2013-01-20 Thread Stephen Rothwell

Hi all,

After merging the final tree, today's linux-next build (arm defconfig)
failed like this:

mm/memblock.c: In function 'memblock_find_in_range_node':
mm/memblock.c:104:2: error: invalid use of undefined type 'struct 
movablecore_map'
mm/memblock.c:123:4: error: invalid use of undefined type 'struct 
movablecore_map'
mm/memblock.c:130:7: error: invalid use of undefined type 'struct 
movablecore_map'
mm/memblock.c:131:4: error: invalid use of undefined type 'struct 
movablecore_map'

Caused by commit "page_alloc: bootmem limit with movablecore_map" from
the akpm tree.  The definition of struct movablecore_map is protected by
CONFIG_HAVE_MEMBLOCK_NODE_MAP but its use is not.

I have reverted that commit for today.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgpfZzvezWCFY.pgp
Description: PGP signature

Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2013-01-20 Thread Al Viro

On Sun, Jan 20, 2013 at 06:39:09PM -0800, Linus Torvalds wrote:

> And right now, that HAVE_SYSCALL_WRAPPERS does make it much harder to
> think about the header file changes.

Agreed.

> > FWIW, there's another bit of ugliness around that area - all these
> > #define __SC_BLAH3, etc., all of the same form.  This stuff begs for
> > something like
> > #define __MAP1(m,t,a) m(t,a)
> > #define __MAP2(m,t,a,...) m(t,a) __MAP1(m,__VA_ARGS__)
> > #define __MAP3(m,t,a,...) m(t,a) __MAP2(m,__VA_ARGS__)
> > #define __MAP4(m,t,a,...) m(t,a) __MAP3(m,__VA_ARGS__)
> > #define __MAP5(m,t,a,...) m(t,a) __MAP4(m,__VA_ARGS__)
> > #define __MAP6(m,t,a,...) m(t,a) __MAP5(m,__VA_ARGS__)
> > #define __MAP(n,...) __MAP##n(__VA_ARGS__)
> > with __MAP(x,__SC_DECL,__VA_ARGS__) instead of __SC_DECL##x(__VA_ARGS__)
> > etc. in users...

... with missing commas added, of course.
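
For anyone who has not seen the pattern a million times, a small user-space
sketch of the idea with the commas in place. The names here drop the
leading underscores (MAP, SC_DECL, SC_ARG) because double-underscore
identifiers are reserved outside the kernel, and DEFINE_WRAPPED is a
hypothetical macro invented for this example, only loosely in the spirit of
SYSCALL_DEFINEx.

```c
/* Apply macro m to each (type, name) pair, separated by commas. */
#define MAP1(m,t,a) m(t,a)
#define MAP2(m,t,a,...) m(t,a), MAP1(m,__VA_ARGS__)
#define MAP3(m,t,a,...) m(t,a), MAP2(m,__VA_ARGS__)
#define MAP(n,...) MAP##n(__VA_ARGS__)

#define SC_DECL(t, a) t a	/* "type name", for a parameter list */
#define SC_ARG(t, a)  a		/* just the name, for forwarding */

/*
 * Hypothetical wrapper generator: declares do_<name>, emits <name> as a
 * thin wrapper that forwards its arguments, then opens do_<name>'s body.
 */
#define DEFINE_WRAPPED(n, name, ...)				\
	static long do_##name(MAP(n, SC_DECL, __VA_ARGS__));	\
	static long name(MAP(n, SC_DECL, __VA_ARGS__))		\
	{							\
		return do_##name(MAP(n, SC_ARG, __VA_ARGS__));	\
	}							\
	static long do_##name(MAP(n, SC_DECL, __VA_ARGS__))

/* Expands to add3(int a, long b, short c) forwarding to do_add3(a, b, c). */
DEFINE_WRAPPED(3, add3, int, a, long, b, short, c)
{
	return a + b + c;
}
```

The same MAP chain serves both the declaration form and the forwarding
form, which is where the duplication in per-arity macros goes away.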

> Well, I can see both sides. The above is the nice and dense
> declaration model with less duplication, but christ, it's hard for
> people to wrap their minds around unless they've seen it a million
> times. It really does take some getting used to, and the long-form can
> be easier to understand.

Umm...  Even with
/*
 * __MAP - apply a given macro to all syscall arguments.
 * __MAP(n, m, t1, a1, ..., tn, an) will expand to
 *  m(t1,a1), m(t2,a2), ..., m(tn, an)
 * Note that the first argument of __MAP must be equal to the number of
 * type, name pairs in the list.  The list itself (all arguments of __MAP
 * starting with the 3rd one) is in the form we pass to SYSCALL_DEFINE.
 */
slapped on top of it?

> That said, we have so many of those things now when it comes to the
> syscall stuff that the dense form seems to be called for just to be
> consistent.
> 
> So go wild if you have the energy for it. I'm not going to pull that
> for 3.8, though.

No, that's obviously next cycle fodder, along with the sick tricks for
generating compat wrappers on s390 if Martin can live with those.

BTW, grep for asmlinkage; it's amazing how much cargo-culting is going
on with it ;-/  Some of the instances are syscalls yet to be converted
to SYSCALL_DEFINE; even more of COMPAT_SYSCALL_DEFINE-to-be.  We
also have a bunch of declarations in syscalls.h and compat.h - those
are fine.  _Some_ of the rest might be legitimate - ia64 and i386 have
non-trivial asmlinkage expansion and some (but not all) of arch/{x86,ia64}
instances do make sense.  Not all of those - e.g. things like
FPU_divide_by_zero() have no business being regparm(0); they are only called
from C code and forcing their arguments on stack is a pure pessimization for
no reason whatsoever.  Everything else in arch/* is magic green marker,
AFAICS...

There are some borderline cases - e.g. I'm not sure if having sys_recv
done *not* via SYSCALL_DEFINE() is deliberate; it might cut down on
some overhead (the sucker's calling sys_recvfrom(), which does normalizations,
which make normalizing in sys_recv() pointless).  OTOH, sys_send *is*
done as SYSCALL_DEFINE, even though it ends up calling sys_sendto()...

Re: Compilation problem with drivers/staging/zsmalloc when !SMP on ARM

2013-01-20 Thread Minchan Kim

On Fri, Jan 18, 2013 at 11:46:02PM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 18, 2013 at 07:11:32PM -0600, Matt Sealey wrote:
> > On Fri, Jan 18, 2013 at 3:08 PM, Russell King - ARM Linux
> >  wrote:
> > > On Fri, Jan 18, 2013 at 02:24:15PM -0600, Matt Sealey wrote:
> > >> Hello all,
> > >>
> > >> I wonder if anyone can shed some light on this linking problem I have
> > >> right now. If I configure my kernel without SMP support (it is a very
> > >> lean config for i.MX51 with device tree support only) I hit this error
> > >> on linking:
> > >
> > > Yes, I looked at this, and I've decided that I will _not_ fix this export,
> > > neither will I accept a patch to add an export.
> > 
> > Understood..
> > 
> > > > As far as I can see, this code is buggy in a SMP environment.  There's
> > > > apparently no guarantee that:
> > > >
> > > > 1. the mapping will be created on a particular CPU.
> > > > 2. the mapping will then be used only on this specific CPU.
> > > > 3. no guarantee that another CPU won't speculatively prefetch from this
> > > >    region.
> > > > 4. when the mapping is torn down, no guarantee that it's the same CPU that
> > > >    used the mapping.
> > >
> > > So, the use of the local TLB flush leaves all the other CPUs potentially
> > > containing TLB entries for this mapping.
> > 
> > I'm gonna put this out to the maintainers (Konrad, and Seth since he
> > committed it) that if this code is buggy it gets taken back out, even
> > if it makes zsmalloc "slow" on ARM, for the following reasons:
> 
> Just to make sure I understand, you mean don't use page table
> mapping but instead use copying?
> 
> > 
> > * It's buggy on SMP as Russell describes above
> > * It might not be buggy on UP (opposite to Russell's description above
> > as the restrictions he states do not exist), but that would imply an
> > export for a really core internal MM function nobody should be using
> > anyway
> > * By that assessment, using that core internal MM function on SMP is
> > also bad voodoo that zsmalloc should not be doing
> 
>  'local_tlb_flush' is bad voodoo?
> 
> > 
> > It also either smacks of a lack of comprehensive testing or defiance
> > of logic that nobody ever built the code without CONFIG_SMP, which
> > means it was only tested on a bunch of SMP ARM systems (I'm guessing..
> > Pandaboard? :) or UP systems with SMP/SMP_ON_UP enabled (to expand on
> > that guess, maybe Beagleboard in some multiplatform Beagle/Panda
> > hybrid kernel). I am sure I was reading the mailing lists when that
> > patch was discussed, coded and committed and my guess is correct. In
> > this case, what we have here anyway is code which when PROPERLY
> > configured as so..
> 
> > The initial patches were done on x86. Then Seth did the work to make sure
> it worked on PPC. Munchin looked on ARM and that is it.

s/Munchin/Minchan

> 
> If you have an ARM server that you would be willing to part with I would
> be thrilled to look at it.
> 
> > 
> > > diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> > index 09a9d35..ecf75fb 100644
> > > --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> > +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> > @@ -228,7 +228,7 @@ struct zs_pool {
> >   * mapping rather than copying
> >   * for object mapping.
> >  */
> > -#if defined(CONFIG_ARM)
> > +#if defined(CONFIG_ARM) && defined(CONFIG_SMP)
> >  #define USE_PGTABLE_MAPPING

I don't get it. How does this prevent the problem Russell described?
The problem is that another CPU can prefetch _speculatively_ under us.

> >  #endif
> > 
> > .. such that it even compiles in both "guess" configurations, the
> > slower Cortex-A8 600MHz single core system gets to use the slow copy
> > path and the dual-core 1GHz+ Cortex-A9 (with twice the RAM..) gets to
> > use the fast mapping path. Essentially all the patch does is "improve
> > performance" on the fastest, best-configured, large-amounts-of-RAM,
> > lots-of-CPU-performance ARM systems (OMAP4+, Snapdragon, Exynos4+,
> > marvell armada, i.MX6..) while introducing the problems Russell
> > describes, and leave performance exactly the same and potentially far
> > more stable on the slower, memory-limited ARM machines.
> 
> Any ideas on how to detect that?
> > 
> > Given the purpose of zsmalloc, zram, zcache etc. this somewhat defies
> > logic. If it's not making the memory-limited, slow ARM systems run
> > better, what's the point?
> > 
> > So in summary I suggest "we" (Greg? or is it Seth's responsibility?)
> > should just back out that whole USE_PGTABLE_MAPPING chunk of code
> > introduced with f553646. Then Russell can carry on randconfiging and I
> > can build for SMP and UP and get the same code.. with less bugs.
> 
> I get that you want to have this fixed right now. I think having it
> fixed the right way is a better choice. Lets discuss that first
> before we start tossing patches to disable parts of it.

If I'm not missing something, we have two choices.

1) use

Re: linux-next: build failure after merge of the gpio-lw tree

2013-01-20 Thread Stephen Rothwell

Hi Shawn,

On Mon, 21 Jan 2013 14:20:13 +0800 Shawn Guo  wrote:
>
> On Sat, Jan 19, 2013 at 10:40:45AM +1100, Stephen Rothwell wrote:
> > 
> > On Fri, 18 Jan 2013 16:02:13 +0800 Shawn Guo  wrote:
> > >
> > > My bad, sorry for that.  I just sent a v2 in reply to this message
> > > for fixing the error.  I spent some time trying to install a ppc64
> > > toolchain for testing, but unfortunately with no luck.  So Stephen,
> > > I have to rely on linux-next to give it a test again.  Thanks.
> > 
> > Cross compilers suitable for building kernels are available at
> > http://www.kernel.org/pub/tools/crosstool/ .
> 
> Thanks for the link, Stephen.  I installed the compiler and verified
> that the v2 fixes the error.

Thanks.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: linux-next: build failure after merge of the gpio-lw tree

2013-01-20 Thread Shawn Guo

On Sat, Jan 19, 2013 at 10:40:45AM +1100, Stephen Rothwell wrote:
> Hi Shawn,
> 
> On Fri, 18 Jan 2013 16:02:13 +0800 Shawn Guo  wrote:
> >
> > My bad, sorry for that.  I just sent a v2 in reply to this message
> > for fixing the error.  I spent some time trying to install a ppc64
> > toolchain for testing, but unfortunately with no luck.  So Stephen,
> > I have to rely on linux-next to give it a test again.  Thanks.
> 
> Cross compilers suitable for building kernels are available at
> http://www.kernel.org/pub/tools/crosstool/ .

Thanks for the link, Stephen.  I installed the compiler and verified
that the v2 fixes the error.

Shawn

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 00/76] Synopsys ARC Linux kernel Port

2013-01-20 Thread Vineet Gupta

On Sunday 20 January 2013 11:45 AM, H. Peter Anvin wrote:
> On 01/18/2013 04:24 AM, Vineet Gupta wrote:
>> This patchset based off-of 3.8-rc4, adds the Linux kernel port to ARC700
>> processor family (750D and 770D) from Synopsys. I would be grateful for
>> further review and feedback.
> 
> One thing: ARC, as I understand it, is a whole family of architectures, which
> mostly have in common their origin at Synopsys.  

ARC has had a long history, starting as a startup in the '90s. There used to be
the ARCtangent instruction set (and cores A4, A5... based on it), which about
10 years ago was deprecated in favor of the current ARCompact ISA (600/700
cores). So yes, it is a family of architectures, but all we care about at
Synopsys is ARCompact and the 600/700 cores.

However, I don't think there were sub-architectures or forks floating around.
The MIPS "ARC" seems to be some sort of firmware standard, but not related to
ARC. Ralf, would you care to shed some light? I don't know of any ARC arch
other than these.

> Can we make this arch/arc700
> since that is what it is?
> 
> -hpa
> 

As of now yes, but in the near future we may have a new instruction set and
cores based on it, so it will be better if we keep a non-specific name.

Thx,
-Vineet

Re: [PATCH 0/4] OPP usage fixes for RCU locking

2013-01-20 Thread MyungJoo Ham

On Sat, Jan 19, 2013 at 7:28 AM, Rafael J. Wysocki  wrote:
> On Friday, January 18, 2013 01:52:31 PM Nishanth Menon wrote:
>> Hi,
>> Despite being documented in function documentation and in
>> Documentation/power/opp.txt, many of the users of OPP APIs
>> don't honor RCU lock usage appropriately.
>>
>> This recently appeared in IRC discussion earlier today [1].
>> I did an audit of current usage and the following series
>> is a result of this.
>>
>> NOTE:
>> 1. The patch "PM / devfreq: exynos4_bus: honor RCU lock usage" has only
>>    been build tested as I don't have an exynos platform to try it on. I have
>>    tried to make it as unintrusive as possible and have at least reviewed it
>>    to ensure I haven't screwed anything up.
>>
>> Other than this, I have added appropriate tested by information in requisite
>> patches.
>
> Thanks for the fixes.
>
> MyungJoo, do you want me to take the devfreq ones too?
>
> Rafael

Yes, please take the RCU-OPP patches. Having those patches split up
doesn't seem beneficial.

I'll let other devfreq patches be based on this after you get them applied.




Cheers,
MyungJoo

>
>
>> Series is based off: v3.8-rc4 tag
>> Also available in the following location[2]:
>> https://github.com/nmenon/linux-2.6-playground branch: post/pm/opp-fixes-v1
>>
>> Nishanth Menon (4):
>>   cpufreq: OMAP: use RCU locks around usage of OPP
>>   cpufreq: cpufreq-cpu0: use RCU locks around usage of OPP
>>   PM / devfreq: add locking documentation for recommend_opp
>>   PM / devfreq: exynos4_bus: honor RCU lock usage
>>
>>  drivers/cpufreq/cpufreq-cpu0.c |5 +++
>>  drivers/cpufreq/omap-cpufreq.c |3 ++
>>  drivers/devfreq/devfreq.c  |5 +++
>>  drivers/devfreq/exynos4_bus.c  |   94 
>> 
>>  4 files changed, 80 insertions(+), 27 deletions(-)
>>
>> [1] http://www.beagleboard.org/irclogs/index.php?date=2013-01-18#T14:14:07
>> [2] 
>> https://github.com/nmenon/linux-2.6-playground/commits/post/pm/opp-fixes-v1
>>
>> Regards,
>> Nishanth Menon
>>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics

linux-next: manual merge of the akpm tree with the drm tree

2013-01-20 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in
drivers/gpu/drm/drm_fb_helper.c between commit 848499032504 ("drm: add
drm_modeset_lock|unlock_all") from the drm tree and commit
"drivers/gpu/drm/drm_fb_helper.c: avoid sleeping in unblank_screen() if
oops in progress" from the akpm tree.

I can't see an easy way to resolve these, so I just dropped the akpm tree
patch.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH] fat: eliminate iterations in fat_search_long in case of EOD

2013-01-20 Thread OGAWA Hirofumi

Namjae Jeon  writes:

> 2013/1/20, OGAWA Hirofumi :
>> Namjae Jeon  writes:
>>
>>> From: Namjae Jeon 
>>>
>>> When searching a directory for names, we can stop checking for further
>>> entries if we detect End of Directory, i.e. if (de->name[0] == 0x00). The
>>> current code traverses the cluster chain of a directory until a hit is
>>> found or till the last cluster for that directory, ignoring the EOD mark.
>>> Fix this.
>>
>> f_pos still works fine after this change?
> Hi OGAWA.
> I cannot find any f_pos usage in the fat_search_long function.
> Perhaps you were looking at another function, such as __fat_readdir?
> Let me know your opinion.

Ah, I see. Only ->lookup is affected. So this makes the behavior stranger:
readdir() returns entries beyond the 0x00 entry, but lookup() can't find them?
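
To make the concern above concrete, here is a minimal user-space sketch. It is
not the kernel code: struct toy_dirent, toy_search() and the stop_at_eod switch
are hypothetical stand-ins for the directory walk in fat_search_long(), and the
directory is modelled as a flat array of 11-byte names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory slot; NOT the kernel's on-disk struct. */
struct toy_dirent {
	unsigned char name[11];	/* 8.3 name, space padded */
};

#define TOY_EOD		0x00	/* first name byte: nothing is expected after this */
#define TOY_DELETED	0xE5	/* first name byte: free slot, scanning continues */

/*
 * Look for `want` (11 bytes) in dir[0..nslots).
 * Returns the slot index or -1; *visited reports how many slots were examined.
 * stop_at_eod = 0 models a scan that ignores the EOD mark,
 * stop_at_eod = 1 models a scan that stops at it.
 */
static int toy_search(const struct toy_dirent *dir, size_t nslots,
		      const char *want, int stop_at_eod, size_t *visited)
{
	size_t i;

	assert(dir != NULL && want != NULL && visited != NULL);
	*visited = 0;
	for (i = 0; i < nslots; i++) {
		*visited = i + 1;
		if (dir[i].name[0] == TOY_EOD) {
			if (stop_at_eod)
				break;		/* proposed behavior: give up here */
			continue;		/* old behavior: keep walking */
		}
		if (dir[i].name[0] == TOY_DELETED)
			continue;
		if (memcmp(dir[i].name, want, sizeof(dir[i].name)) == 0)
			return (int)i;
	}
	return -1;
}
```

With a valid entry placed after a 0x00 slot, the stop_at_eod variant examines
fewer slots but no longer finds that entry, while a walker that does not stop
(as readdir is assumed to behave here) would still return it.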

Thanks.
-- 
OGAWA Hirofumi 

RE: [LSF/MM TOPIC] Re: [dm-devel] Announcement: STEC EnhanceIO SSD caching software for Linux kernel

2013-01-20 Thread Amit Kale

> -----Original Message-----
> From: Mike Snitzer [mailto:snit...@redhat.com]
> Sent: Saturday, January 19, 2013 3:08 AM
> To: Darrick J. Wong
> Cc: device-mapper development; Amit Kale; linux-bca...@vger.kernel.org;
> kent.overstr...@gmail.com; LKML; lsf...@lists.linux-foundation.org; Joe
> Thornber
> Subject: Re: [LSF/MM TOPIC] Re: [dm-devel] Announcement: STEC EnhanceIO
> SSD caching software for Linux kernel
> 
> On Fri, Jan 18 2013 at  4:25pm -0500,
> Darrick J. Wong  wrote:
> 
> > Since Joe is putting together a testing tree to compare the three
> > caching things, what do you all think of having a(nother) session
> > about ssd caching at this year's LSFMM Summit?
> >
> > [Apologies for hijacking the thread.]
> > [Adding lsf-pc to the cc list.]
> 
> Hopefully we'll have some findings on the comparisons well before LSF
> (since we currently have some momentum).  But yes it may be worthwhile
> to discuss things further and/or report findings.

We should have performance comparisons presented well before the summit. It'll
be good to have an SSD caching session in any case. The likelihood that one of
them will be included in the Linux kernel before April is very low.

-Amit


Re: [PATCH v9 10/11] PCI: Add match_driver in struct pci_dev

2013-01-20 Thread Yinghai Lu

On Sun, Jan 20, 2013 at 3:15 PM, Rafael J. Wysocki  wrote:
> On Thursday, January 17, 2013 11:53:21 PM Yinghai Lu wrote:
>> with that we could move out attaching driver for pci device,
>> out of device_add for pci hot add path.
>>
>> pci_bus_attach_device() will attach driver to pci device.
>
> Acked-by: Rafael J. Wysocki 
>
> for the code, but you still aren't saying in the changelog why the change
> is needed.

Thanks.

Please check whether the changelog below looks good to you.

---
Subject: [PATCH] PCI: Skip attaching driver in device_add()

We want to add a PCI device to the device tree as early as possible, but
delay attaching its driver until a later step in the path.

To make that follow-up patch smaller, this patch does the following:

Add a match_driver field to struct pci_dev, with a default value of false.
While it is false, pci_bus_match() fails, so device_add() skips attaching a
driver. pci_bus_attach_device() then sets match_driver to true, so
pci_bus_match() can succeed and device_attach() attaches the driver to the
PCI device.
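
The ordering described above can be illustrated with a small user-space model.
This is only a sketch: struct toy_dev and the toy_* functions are hypothetical
stand-ins for struct pci_dev, pci_bus_match(), device_add(), device_attach()
and pci_bus_attach_device(), and none of the real driver-core logic is
reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the fields the sketch needs. */
struct toy_dev {
	bool match_driver;	/* false by default: matching is refused */
	bool registered;	/* set once the device is in the "device tree" */
	const char *driver;	/* NULL while no driver is bound */
};

/* Models pci_bus_match(): refuse to match until match_driver is set. */
static bool toy_bus_match(const struct toy_dev *dev)
{
	return dev->match_driver;
}

/* Models device_attach(): bind a driver only if the bus match succeeds. */
static void toy_device_attach(struct toy_dev *dev)
{
	if (toy_bus_match(dev))
		dev->driver = "toy_driver";
}

/* Models device_add(): register early; the attach attempt finds no match. */
static void toy_device_add(struct toy_dev *dev)
{
	dev->registered = true;
	toy_device_attach(dev);
}

/* Models pci_bus_attach_device(): enable matching, then attach for real. */
static void toy_bus_attach_device(struct toy_dev *dev)
{
	assert(dev->registered);
	dev->match_driver = true;
	toy_device_attach(dev);
}
```

Calling toy_device_add() leaves the device registered but unbound; only the
later toy_bus_attach_device() call binds the driver.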

---

[PATCH v3 3/4] zram: get rid of lockdep warning

2013-01-20 Thread Minchan Kim

Lockdep complains about a recursive deadlock on zram->init_lock.
[1] made it a false positive, because we can't request I/O to zram
before setting disksize. Anyway, we should silence lockdep to
avoid many reports from users.

Cc: Jerome Marchand 
Cc: Nitin Gupta 
Signed-off-by: Minchan Kim 
---
 drivers/staging/zram/zram_drv.c   |  115 +++--
 drivers/staging/zram/zram_drv.h   |   12 +++-
 drivers/staging/zram/zram_sysfs.c |   10 +++-
 3 files changed, 79 insertions(+), 58 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index e95e37c..1f6938a 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -462,19 +462,12 @@ error:
 void __zram_reset_device(struct zram *zram)
 {
size_t index;
+   struct zram_meta meta;
 
if (!zram->init_done)
goto out;
 
zram->init_done = 0;
-
-   /* Free various per-device buffers */
-   kfree(zram->compress_workmem);
-   free_pages((unsigned long)zram->compress_buffer, 1);
-
-   zram->compress_workmem = NULL;
-   zram->compress_buffer = NULL;
-
/* Free all pages that are still in this zram device */
for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
unsigned long handle = zram->table[index].handle;
@@ -484,11 +477,11 @@ void __zram_reset_device(struct zram *zram)
zs_free(zram->mem_pool, handle);
}
 
-   vfree(zram->table);
-   zram->table = NULL;
-
-   zs_destroy_pool(zram->mem_pool);
-   zram->mem_pool = NULL;
+   meta.compress_workmem = zram->compress_workmem;
+   meta.compress_buffer = zram->compress_buffer;
+   meta.table = zram->table;
+   meta.mem_pool = zram->mem_pool;
+   zram_meta_free(&meta);
 
/* Reset stats */
memset(&zram->stats, 0, sizeof(zram->stats));
@@ -505,12 +498,59 @@ void zram_reset_device(struct zram *zram)
up_write(&zram->init_lock);
 }
 
-/* zram->init_lock should be held */
-int zram_init_device(struct zram *zram)
+void zram_meta_free(struct zram_meta *meta)
+{
+   zs_destroy_pool(meta->mem_pool);
+   kfree(meta->compress_workmem);
+   free_pages((unsigned long)meta->compress_buffer, 1);
+   vfree(meta->table);
+   kfree(meta);
+}
+
+int zram_meta_alloc(struct zram_meta *meta, u64 disksize)
 {
-   int ret;
size_t num_pages;
 
+   meta->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
+   if (!meta->compress_workmem) {
+   pr_err("Error allocating compressor working memory!\n");
+   goto out;
+   }
+
+   meta->compress_buffer =
+   (void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, 1);
+   if (!meta->compress_buffer) {
+   pr_err("Error allocating compressor buffer space\n");
+   goto free_workmem;
+   }
+
+   num_pages = disksize >> PAGE_SHIFT;
+   meta->table = vzalloc(num_pages * sizeof(*meta->table));
+   if (!meta->table) {
+   pr_err("Error allocating zram address table\n");
+   goto free_buffer;
+   }
+
+   meta->mem_pool = zs_create_pool("zram", GFP_NOIO | __GFP_HIGHMEM);
+   if (!meta->mem_pool) {
+   pr_err("Error creating memory pool\n");
+   goto free_table;
+   }
+
+   return 0;
+
+free_table:
+   vfree(meta->table);
+free_buffer:
+   free_pages((unsigned long)meta->compress_buffer, 1);
+free_workmem:
+   kfree(meta->compress_workmem);
+out:
+   return -ENOMEM;
+}
+
+void zram_init_device(struct zram *zram, struct zram_meta *meta)
+{
if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
pr_info(
"There is little point creating a zram of greater than "
@@ -525,51 +565,16 @@ int zram_init_device(struct zram *zram)
);
}
 
-   zram->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
-   if (!zram->compress_workmem) {
-   pr_err("Error allocating compressor working memory!\n");
-   ret = -ENOMEM;
-   goto fail_no_table;
-   }
-
-   zram->compress_buffer =
-   (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
-   if (!zram->compress_buffer) {
-   pr_err("Error allocating compressor buffer space\n");
-   ret = -ENOMEM;
-   goto fail_no_table;
-   }
-
-   num_pages = zram->disksize >> PAGE_SHIFT;
-   zram->table = vzalloc(num_pages * sizeof(*zram->table));
-   if (!zram->table) {
-   pr_err("Error allocating zram address table\n");
-   ret = -ENOMEM;
-   goto fail_no_table;
-   }
-
/* zram devices sort of resembles non-rotational disks */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
 
-   zram->mem_pool = zs_create_pool("zram", GFP_NOIO | __GFP_HIGHMEM);
-   if

[PATCH v3 1/4] zram: force disksize setting before using zram

2013-01-20 Thread Minchan Kim

Now the zram document says "set disksize is optional",
but that is partly wrong. When you first use zram after
booting, you must set disksize; otherwise zram can't work, because
the zram gendisk's size is 0. But once you have done so, you can use zram
freely after a reset because, paradoxically, reset doesn't reset the size
to zero. So in that case, setting disksize is optional. :(
This is inconsistent behavior for users and not straightforward.

This patch forces disksize to always be set before zram is used.
Yes, it changes current behavior, so someone could complain when
he upgrades zram. That could be a problem if zram were in mainline,
but it still lives in staging, so behavior can be changed to go the
right way. I hope they will excuse it.

Cc: Nitin Gupta 
Acked-by: Dan Magenheimer 
Signed-off-by: Minchan Kim 
---
 drivers/staging/zram/zram.txt |   27 +--
 drivers/staging/zram/zram_drv.c   |   52 ++---
 drivers/staging/zram/zram_drv.h   |5 +---
 drivers/staging/zram/zram_sysfs.c |6 +
 4 files changed, 35 insertions(+), 55 deletions(-)

diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
index 5f75d29..765d790 100644
--- a/drivers/staging/zram/zram.txt
+++ b/drivers/staging/zram/zram.txt
@@ -23,17 +23,17 @@ Following shows a typical sequence of steps for using zram.
This creates 4 devices: /dev/zram{0,1,2,3}
(num_devices parameter is optional. Default: 1)
 
-2) Set Disksize (Optional):
-   Set disk size by writing the value to sysfs node 'disksize'
-   (in bytes). If disksize is not given, default value of 25%
-   of RAM is used.
-
-   # Initialize /dev/zram0 with 50MB disksize
-   echo $((50*1024*1024)) > /sys/block/zram0/disksize
-
-   NOTE: disksize cannot be changed if the disk contains any
-   data. So, for such a disk, you need to issue 'reset' (see below)
-   before you can change its disksize.
+2) Set Disksize
+Set disk size by writing the value to sysfs node 'disksize'.
+The value can be either in bytes or you can use mem suffixes.
+Examples:
+# Initialize /dev/zram0 with 50MB disksize
+echo $((50*1024*1024)) > /sys/block/zram0/disksize
+
+# Using mem suffixes
+echo 256K > /sys/block/zram0/disksize
+echo 512M > /sys/block/zram0/disksize
+echo 1G > /sys/block/zram0/disksize
 
 3) Activate:
mkswap /dev/zram0
@@ -65,8 +65,9 @@ Following shows a typical sequence of steps for using zram.
echo 1 > /sys/block/zram0/reset
echo 1 > /sys/block/zram1/reset
 
-   (This frees all the memory allocated for the given device).
-
+   This frees all the memory allocated for the given device and
+   resets the disksize to zero. You must set the disksize again
+   before reusing the device.
 
 Please report any problems at:
  - Mailing list: linux-mm-cc at laptop dot org
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 61fb8f1..1d45401 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -94,34 +94,6 @@ static int page_zero_filled(void *ptr)
return 1;
 }
 
-static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
-{
-   if (!zram->disksize) {
-   pr_info(
-   "disk size not provided. You can use disksize_kb module "
-   "param to specify size.\nUsing default: (%u%% of RAM).\n",
-   default_disksize_perc_ram
-   );
-   zram->disksize = default_disksize_perc_ram *
-   (totalram_bytes / 100);
-   }
-
-   if (zram->disksize > 2 * (totalram_bytes)) {
-   pr_info(
-   "There is little point creating a zram of greater than "
-   "twice the size of memory since we expect a 2:1 compression "
-   "ratio. Note that zram uses about 0.1%% of the size of "
-   "the disk when not in use so a huge zram is "
-   "wasteful.\n"
-   "\tMemory Size: %zu kB\n"
-   "\tSize you selected: %llu kB\n"
-   "Continuing anyway ...\n",
-   totalram_bytes >> 10, zram->disksize >> 10);
-   }
-
-   zram->disksize &= PAGE_MASK;
-}
-
 static void zram_free_page(struct zram *zram, size_t index)
 {
unsigned long handle = zram->table[index].handle;
@@ -495,6 +467,9 @@ void __zram_reset_device(struct zram *zram)
 {
size_t index;
 
+   if (!zram->init_done)
+   goto out;
+
zram->init_done = 0;
 
/* Free various per-device buffers */
@@ -522,7 +497,9 @@ void __zram_reset_device(struct zram *zram)
/* Reset stats */
memset(&zram->stats, 0, sizeof(zram->stats));
 
+out:
zram->disksize = 0;
+   set_capacity(zram->disk, 0);
 }
 
 void zram_reset_device(struct zram *zram)
@@ -544,7 +521,19 @@ int zram_init_device(struct

[PATCH v3 4/4] zram: Fix deadlock bug in partial write

2013-01-20 Thread Minchan Kim

Now zram allocates a new page with GFP_KERNEL in the zram I/O path
if the I/O is partial. Unfortunately, that may cause a deadlock with
the reclaim path, so this patch solves the problem.

Cc: Nitin Gupta 
Cc: Jerome Marchand 
Signed-off-by: Minchan Kim 
---
We could use GFP_IO instead of GFP_ATOMIC in zram_bvec_read with
some modification to buffer allocation in the partial I/O case.
But that needs more churn and would hinder merging this patch into stable,
if we should send it to stable, so I'd like to keep it as simple
as possible. GFP_IO usage could be a separate patch after this is merged.
Thanks.

 drivers/staging/zram/zram_drv.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 1f6938a..e00397f 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -192,7 +192,7 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
user_mem = kmap_atomic(page);
if (is_partial_io(bvec))
/* Use  a temporary buffer to decompress the page */
-   uncmem = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   uncmem = kmalloc(PAGE_SIZE, GFP_ATOMIC);
else
uncmem = user_mem;
 
@@ -240,7 +240,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 * This is a partial IO. We need to read the full page
 * before to write the changes.
 */
-   uncmem = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   uncmem = kmalloc(PAGE_SIZE, GFP_NOIO);
if (!uncmem) {
pr_info("Error allocating temp memory!\n");
ret = -ENOMEM;
-- 
1.7.9.5


[PATCH v3 2/4] zram: give up lazy initialization of zram metadata

2013-01-20 Thread Minchan Kim

1) Users of zram normally run mkfs.xxx or mkswap before using
   the zram block device (e.g. at boot time).
   That ends up allocating zram's metadata before any real usage, so the
   benefit of lazy initialization is mitigated.

2) Some users want to use zram only when memory pressure is high (i.e. load
   zram dynamically, NOT at boot time). That makes sense because people don't
   want to waste memory until memory pressure is high (i.e. when zram is
   really helpful). In this case, lazy initialization could easily fail
   because we will use GFP_NOIO instead of GFP_KERNEL to avoid deadlock.
   So the benefit of lazy initialization is mitigated here, too.

3) Metadata overhead is not critical, and Nitin has a plan to slim it down.
   4K : 12 bytes (64-bit machine) -> 64G : 192M, so 0.3% isn't a big overhead.
   If an insane user uses up to 20 such big zram devices, it could consume 6%
   of RAM, but the efficiency of zram will cover the waste.
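
The figures in point 3 can be checked with a few lines of arithmetic. The
sketch below is illustrative only: toy_table_bytes() and toy_overhead_bp() are
hypothetical helpers, and the 12-byte table entry and 4K page size are simply
the numbers quoted above, not values read from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of table metadata needed for a device of `disksize` bytes,
 * assuming one `entry_size`-byte table entry per `page_size`-byte page.
 */
static uint64_t toy_table_bytes(uint64_t disksize, uint64_t page_size,
				uint64_t entry_size)
{
	assert(page_size > 0);
	return (disksize / page_size) * entry_size;
}

/* Metadata overhead relative to disksize, in hundredths of a percent. */
static uint64_t toy_overhead_bp(uint64_t page_size, uint64_t entry_size)
{
	assert(page_size > 0);
	return entry_size * 10000 / page_size;
}
```

For a 64G device this gives (64G / 4K) * 12 = 192M of table, i.e. 12 / 4096 or
roughly 0.29% of the disksize. Assuming each of 20 devices is as large as RAM,
the tables together come to about 20 * 0.29%, close to the 6% quoted.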

So this patch gives up lazy initialization and instead we initialize metadata
at disksize setting time.

Cc: Nitin Gupta 
Signed-off-by: Minchan Kim 
---
 drivers/staging/zram/zram_drv.c   |   20 
 drivers/staging/zram/zram_sysfs.c |1 +
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 1d45401..e95e37c 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -440,16 +440,13 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 {
struct zram *zram = queue->queuedata;
 
-   if (unlikely(!zram->init_done) && zram_init_device(zram))
-   goto error;
-
down_read(&zram->init_lock);
if (unlikely(!zram->init_done))
-   goto error_unlock;
+   goto error;
 
if (!valid_io_request(zram, bio)) {
zram_stat64_inc(zram, &zram->stats.invalid_io);
-   goto error_unlock;
+   goto error;
}
 
__zram_make_request(zram, bio, bio_data_dir(bio));
@@ -457,9 +454,8 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio)
 
return;
 
-error_unlock:
-   up_read(&zram->init_lock);
 error:
+   up_read(&zram->init_lock);
bio_io_error(bio);
 }
 
@@ -509,18 +505,12 @@ void zram_reset_device(struct zram *zram)
up_write(&zram->init_lock);
 }
 
+/* zram->init_lock should be held */
 int zram_init_device(struct zram *zram)
 {
int ret;
size_t num_pages;
 
-   down_write(&zram->init_lock);
-
-   if (zram->init_done) {
-   up_write(&zram->init_lock);
-   return 0;
-   }
-
if (zram->disksize > 2 * (totalram_pages << PAGE_SHIFT)) {
pr_info(
"There is little point creating a zram of greater than "
@@ -569,7 +559,6 @@ int zram_init_device(struct zram *zram)
}
 
zram->init_done = 1;
-   up_write(&zram->init_lock);
 
pr_debug("Initialization done!\n");
return 0;
@@ -579,7 +568,6 @@ fail_no_table:
zram->disksize = 0;
 fail:
__zram_reset_device(zram);
-   up_write(&zram->init_lock);
pr_err("Initialization failed: err=%d\n", ret);
return ret;
 }
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 4143af9..369db12 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -71,6 +71,7 @@ static ssize_t disksize_store(struct device *dev,
 
zram->disksize = PAGE_ALIGN(disksize);
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+   zram_init_device(zram);
up_write(&zram->init_lock);
 
return len;
-- 
1.7.9.5


Re: [PATCH 3/3] zram: get rid of lockdep warning

2013-01-20 Thread Minchan Kim

On Fri, Jan 18, 2013 at 01:34:18PM -0800, Nitin Gupta wrote:
> On Wed, Jan 16, 2013 at 6:12 PM, Minchan Kim  wrote:
> > Lockdep complains about recursive deadlock of zram->init_lock.
> > [1] made it false positive because we can't request IO to zram
> > before setting disksize. Anyway, we should shut lockdep up to
> > avoid many reporting from user.
> >
> > This patch allocates zram's metadata out of lock so we can fix it.
> > In addition, this patch replace GFP_KERNEL with GFP_NOIO/GFP_ATOMIC
> > in request handle path for partial I/O.
> >
> > [1] zram: give up lazy initialization of zram metadata
> >
> > Signed-off-by: Minchan Kim 
> > ---
> >  drivers/staging/zram/zram_drv.c   |  194 
> > +++--
> >  drivers/staging/zram/zram_drv.h   |   12 ++-
> >  drivers/staging/zram/zram_sysfs.c |   13 ++-
> >  3 files changed, 118 insertions(+), 101 deletions(-)
> >
> > diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> > index 3693780..eb1bc37 100644
> > --- a/drivers/staging/zram/zram_drv.c
> > +++ b/drivers/staging/zram/zram_drv.c
> > @@ -71,22 +71,22 @@ static void zram_stat64_inc(struct zram *zram, u64 *v)
> > zram_stat64_add(zram, v, 1);
> >  }
> >
> > -static int zram_test_flag(struct zram *zram, u32 index,
> > +static int zram_test_flag(struct zram_meta *meta, u32 index,
> > enum zram_pageflags flag)
> >  {
> > -   return zram->table[index].flags & BIT(flag);
> > +   return meta->table[index].flags & BIT(flag);
> >  }
> >
> > -static void zram_set_flag(struct zram *zram, u32 index,
> > +static void zram_set_flag(struct zram_meta *meta, u32 index,
> > enum zram_pageflags flag)
> >  {
> > -   zram->table[index].flags |= BIT(flag);
> > +   meta->table[index].flags |= BIT(flag);
> >  }
> >
> > -static void zram_clear_flag(struct zram *zram, u32 index,
> > +static void zram_clear_flag(struct zram_meta *meta, u32 index,
> > enum zram_pageflags flag)
> >  {
> > -   zram->table[index].flags &= ~BIT(flag);
> > +   meta->table[index].flags &= ~BIT(flag);
> >  }
> >
> >  static int page_zero_filled(void *ptr)
> > @@ -106,16 +106,17 @@ static int page_zero_filled(void *ptr)
> >
> >  static void zram_free_page(struct zram *zram, size_t index)
> >  {
> > -   unsigned long handle = zram->table[index].handle;
> > -   u16 size = zram->table[index].size;
> > +   struct zram_meta *meta = zram->meta;
> > +   unsigned long handle = meta->table[index].handle;
> > +   u16 size = meta->table[index].size;
> >
> > if (unlikely(!handle)) {
> > /*
> >  * No memory is allocated for zero filled pages.
> >  * Simply clear zero page flag.
> >  */
> > -   if (zram_test_flag(zram, index, ZRAM_ZERO)) {
> > -   zram_clear_flag(zram, index, ZRAM_ZERO);
> > +   if (zram_test_flag(meta, index, ZRAM_ZERO)) {
> > +   zram_clear_flag(meta, index, ZRAM_ZERO);
> > zram_stat_dec(&zram->stats.pages_zero);
> > }
> > return;
> > @@ -124,17 +125,17 @@ static void zram_free_page(struct zram *zram, size_t index)
> > if (unlikely(size > max_zpage_size))
> > zram_stat_dec(&zram->stats.bad_compress);
> >
> > -   zs_free(zram->mem_pool, handle);
> > +   zs_free(meta->mem_pool, handle);
> >
> > if (size <= PAGE_SIZE / 2)
> > zram_stat_dec(&zram->stats.good_compress);
> >
> > zram_stat64_sub(zram, &zram->stats.compr_size,
> > -   zram->table[index].size);
> > +   meta->table[index].size);
> > zram_stat_dec(&zram->stats.pages_stored);
> >
> > -   zram->table[index].handle = 0;
> > -   zram->table[index].size = 0;
> > +   meta->table[index].handle = 0;
> > +   meta->table[index].size = 0;
> >  }
> >
> >  static void handle_zero_page(struct bio_vec *bvec)
> > @@ -159,20 +160,21 @@ static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
> > int ret = LZO_E_OK;
> > size_t clen = PAGE_SIZE;
> > unsigned char *cmem;
> > -   unsigned long handle = zram->table[index].handle;
> > +   struct zram_meta *meta = zram->meta;
> > +   unsigned long handle = meta->table[index].handle;
> >
> > -   if (!handle || zram_test_flag(zram, index, ZRAM_ZERO)) {
> > +   if (!handle || zram_test_flag(meta, index, ZRAM_ZERO)) {
> > memset(mem, 0, PAGE_SIZE);
> > return 0;
> > }
> >
> > -   cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
> > -   if (zram->table[index].size == PAGE_SIZE)
> > +   cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
> > +   if (meta->table[index].size == PAGE_SIZE)
> > memcpy(mem, cmem, PAGE_SIZE);
> > else
> > -   ret =

Re: [PATCH] dw_dmac: move soft LLP code from tasklet to dwc_scan_descriptors

2013-01-20 Thread Vinod Koul

On Fri, Jan 18, 2013 at 02:14:15PM +0200, Andy Shevchenko wrote:
> The proper place for the main logic of the soft LLP mode is
> dwc_scan_descriptors. This prevents the transfer from being unexpectedly
> aborted in case the user calls dwc_tx_status.
> 
> Signed-off-by: Andy Shevchenko 
Applied, Thanks

--
~Vinod
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4] dw_dmac: don't exceed AHB master number in dwc_get_data_width

2013-01-20 Thread Vinod Koul

On Thu, Jan 17, 2013 at 01:35:47PM +0530, Viresh Kumar wrote:
> On Thu, Jan 17, 2013 at 1:33 PM, Andy Shevchenko
>  wrote:
> > The driver assumes that hardware has two AHB masters which might not be
> > always true. In such cases we must not exceed number of the AHB masters
> > present in the hardware. In the proposed scheme in this patch, we would
> > choose the master with highest possible number whenever we exceed max AHB
> > masters.
> >
> > Signed-off-by: Andy Shevchenko 
> > Acked-by: Viresh Kumar 
Applied, Thanks
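
The clamping rule from the patch description can be sketched in plain C as follows. This is only an illustration: demo_clamp_master() and its parameters are hypothetical names, not the driver's actual helper.

```c
#include <assert.h>

/*
 * Hypothetical helper, not the dw_dmac code: pick an AHB master index,
 * falling back to the highest one present when the request is too large.
 */
static unsigned int demo_clamp_master(unsigned int requested,
				      unsigned int nr_masters)
{
	assert(nr_masters > 0);	/* the hardware has at least one master */

	if (requested >= nr_masters)
		return nr_masters - 1;	/* highest master that exists */
	return requested;
}
```

With a single master present, a request for master 1 falls back to master 0.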

--
~Vinod

Re: [PATCH v2] dw_dmac: allocate dma descriptors from DMA_COHERENT memory

2013-01-20 Thread Vinod Koul

On Wed, Jan 16, 2013 at 03:48:50PM +0200, Andy Shevchenko wrote:
> Currently descriptors are allocated from normal cacheable memory and that
> slows down filling the descriptors, as we need to call cache_coherency
> routines afterwards. It would be better to allocate memory for these
> descriptors from DMA_COHERENT memory. This would make code much cleaner too.
> 
> Signed-off-by: Andy Shevchenko 
> Tested-by: Mika Westerberg 
> Acked-by: Viresh Kumar 
Applied Thanks

--
~Vinod

Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Michael Wang

On 01/21/2013 12:38 PM, Mike Galbraith wrote:
> On Mon, 2013-01-21 at 10:50 +0800, Michael Wang wrote: 
>> On 01/20/2013 12:09 PM, Mike Galbraith wrote:
>>> On Thu, 2013-01-17 at 13:55 +0800, Michael Wang wrote: 
>>>> Hi, Mike
>>>>
>>>> I've sent out the v2, which I suppose will fix the BUG below and
>>>> perform better; please do let me know if it still causes issues on your
>>>> arm7 machine.
>>>
>>> s/arm7/aim7
>>>
>>> Someone swiped half of CPUs/ram, so the box is now 2 10 core nodes vs 4.
>>>
>>> stock scheduler knobs
>>>
>>>       3.8-wang-v2                         avg 3.8-virgin                          avg vs wang
>>> Tasks jobs/min
>>>     1    436.29    435.66    435.97    435.97    437.86    441.69    440.09    439.88   1.008
>>>     5   2361.65   2356.14   2350.66   2356.15   2416.27   2563.45   2374.61   2451.44   1.040
>>>    10   4767.90   4764.15   4779.18   4770.41   4946.94   4832.54   4828.69   4869.39   1.020
>>>    20   9672.79   9703.76   9380.80   9585.78   9634.34   9672.79   9727.13   9678.08   1.009
>>>    40  19162.06  19207.61  19299.36  19223.01  19268.68  19192.40  19056.60  19172.56    .997
>>>    80  37610.55  37465.22  37465.22  37513.66  37263.64  37120.98  37465.22  37283.28    .993
>>>   160  69306.65  69655.17  69257.14  69406.32  69257.14  69306.65  69257.14  69273.64    .998
>>>   320 111512.36 109066.37 111256.45 110611.72 108395.75 107913.19 108335.20 108214.71    .978
>>>   640 142850.83 148483.92 150851.81 147395.52 151974.92 151263.65 151322.67 151520.41   1.027
>>>  1280  52788.89  52706.39  67280.77  57592.01 189931.44 189745.60 189792.02 189823.02   3.295
>>>  2560  75403.91  52905.91  45196.21  57835.34 217368.64 217582.05 217551.54 217500.74   3.760
>>>
>>> sched_latency_ns = 24ms
>>> sched_min_granularity_ns = 8ms
>>> sched_wakeup_granularity_ns = 10ms
>>>
>>>       3.8-wang-v2                         avg 3.8-virgin                          avg vs wang
>>> Tasks jobs/min
>>>     1    436.29    436.60    434.72    435.87    434.41    439.77    438.81    437.66   1.004
>>>     5   2382.08   2393.36   2451.46   2408.96   2451.46   2453.44   2425.94   2443.61   1.014
>>>    10   5029.05   4887.10   5045.80   4987.31   4844.12   4828.69   4844.12   4838.97    .970
>>>    20   9869.71   9734.94   9758.45   9787.70   9513.34   9611.42   9565.90   9563.55    .977
>>>    40  19146.92  19146.92  19192.40  19162.08  18617.51  18603.22  18517.95  18579.56    .969
>>>    80  37177.91  37378.57  37292.31  37282.93  36451.13  36179.10  36233.18  36287.80    .973
>>>   160  70260.87  69109.05  69207.71  69525.87  68281.69  68522.97  68912.58  68572.41    .986
>>>   320 114745.56 113869.64 114474.62 114363.27 114137.73 114137.73 114137.73 114137.73    .998
>>>   640 164338.98 164338.98 164618.00 164431.98 164130.34 164130.34 164130.34 164130.34    .998
>>>  1280 209473.40 209134.54 209473.40 209360.44 210040.62 210040.62 210097.51 210059.58   1.003
>>>  2560 242703.38 242627.46 242779.34 242703.39 244001.26 243847.85 243732.91 243860.67   1.004
>>>
>>> As you can see, the load collapsed at the high load end with stock
>>> scheduler knobs (desktop latency).  With knobs set to scale, the delta
>>> disappeared.
>>
>> Thanks for the testing, Mike, please allow me to ask few questions.
>>
>> What are those tasks actually doing? what's the workload?
> 
> It's the canned aim7 compute load, mixed bag load weighted toward
> compute.  Below is the workfile, should give you an idea.
> 
> # @(#) workfile.compute:1.3 1/22/96 00:00:00
> # Compute Server Mix
> FILESIZE: 100K
> POOLSIZE: 250M
> 50  add_double
> 30  add_int
> 30  add_long
> 10  array_rtns
> 10  disk_cp
> 30  disk_rd
> 10  disk_src
> 20  disk_wrt
> 40  div_double
> 30  div_int
> 50  matrix_rtns
> 40  mem_rtns_1
> 40  mem_rtns_2
> 50  mul_double
> 30  mul_int
> 30  mul_long
> 40  new_raph
> 40  num_rtns_1
> 50  page_test
> 40  series_1
> 10  shared_memory
> 30  sieve
> 20  stream_pipe
> 30  string_rtns
> 40  trig_rtns
> 20  udp_test
> 

That seems like the default one, could you please show me the numbers in
your datapoint file?

I'm not familiar with this benchmark, but I'd like to have a try on my
server, to make sure whether it is a generic issue.

>> And I'm confused about how those new parameter values were figured out
>> and how they could help solve the possible issue?
> 
> Oh, that's easy.  I set sched_min_granularity_ns such that last_buddy
> kicks in when a third task arrives on a runqueue, and set
> sched_wakeup_granularity_ns near minimum that still allows wakeup
> preemption to occur.  Combined effect is reduced over-scheduling.

That sounds very hard, to catch the timing,

Re: [PATCH] perf evsel: fix NULL pointer deference when evsel->counts is NULL

2013-01-20 Thread Namhyung Kim

Hi Colin,

On Sat, 19 Jan 2013 16:36:54 +, Colin King wrote:
> From: Colin Ian King 
>
> __perf_evsel__read_on_cpu() only bails out with -ENOMEM if
> evsel->counts is NULL and perf_evsel__alloc_counts() has returned
> an error.  If perf_evsel__alloc_counts() does not return an error
> we get a NULL pointer dereference on evsel->counts->cpu[cpu]
> if evsel->counts is NULL.

perf_evsel__alloc_counts() should allocate evsel->counts when it sees
evsel->counts is NULL and return negative error code if the allocation
fails.

So I don't see any problem in current code.  With your code, it won't
try to allocate if ->counts is NULL but overwrite existing ->counts?
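
The difference between the two conditions can be seen in a small user-space sketch. Everything here is hypothetical stand-in code (struct demo_evsel, demo_alloc_counts() and the two read helpers are not perf's own); it only assumes that the allocator sets ->counts on success and returns a negative value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_evsel; only what the demo needs. */
struct demo_evsel {
	long *counts;	/* NULL until allocated */
	int allocs;	/* how many times the allocator ran */
};

/* Assumed contract: set ->counts on success, return < 0 on failure. */
static int demo_alloc_counts(struct demo_evsel *evsel, int ncpus)
{
	assert(ncpus > 0);
	evsel->counts = calloc((size_t)ncpus, sizeof(*evsel->counts));
	if (evsel->counts == NULL)
		return -ENOMEM;
	evsel->allocs++;
	return 0;
}

/* "&&" form: allocate only when ->counts is still missing. */
static int demo_read_and(struct demo_evsel *evsel, int cpu)
{
	if (evsel->counts == NULL && demo_alloc_counts(evsel, cpu + 1) < 0)
		return -ENOMEM;
	evsel->counts[cpu] = 1;
	return 0;
}

/* "||" form: bails out when ->counts is NULL, reallocates otherwise. */
static int demo_read_or(struct demo_evsel *evsel, int cpu)
{
	if (evsel->counts == NULL || demo_alloc_counts(evsel, cpu + 1) < 0)
		return -ENOMEM;
	evsel->counts[cpu] = 1;
	return 0;
}
```

In this sketch the "&&" form never reaches the store with a NULL ->counts, because the right-hand side runs exactly when the pointer is NULL. The "||" form returns -ENOMEM without ever allocating, and replaces a buffer that already exists.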

Thanks,
Namhyung

>
> Signed-off-by: Colin Ian King 
> ---
>  tools/perf/util/evsel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1b16dd1..93acd06 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -640,7 +640,7 @@ int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
>   if (FD(evsel, cpu, thread) < 0)
>   return -EINVAL;
>  
> - if (evsel->counts == NULL && perf_evsel__alloc_counts(evsel, cpu + 1) < 0)
> + if (evsel->counts == NULL || perf_evsel__alloc_counts(evsel, cpu + 1) < 0)
>   return -ENOMEM;
>  
>   if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) < 0)

linux-next: manual merge of the samsung tree with the gpio-lw tree

2013-01-20 Thread Stephen Rothwell

Hi Kukjin,

Today's linux-next merge of the samsung tree got a conflict in
drivers/gpio/gpio-samsung.c between commit 6948ce588bd7 ("gpio: samsung:
skip gpio lib registration for EXYNOS5440") from the gpio-lw tree and
commit bda7f6d4e198 ("gpio: samsung: skip gpiolib registration if pinctrl
support is enabled for exynos5250") from the samsung tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/gpio/gpio-samsung.c
index 76be7ee,0d46db6..000
--- a/drivers/gpio/gpio-samsung.c
+++ b/drivers/gpio/gpio-samsung.c
@@@ -3025,7 -3025,7 +3024,8 @@@ static __init int samsung_gpiolib_init(
static const struct of_device_id exynos_pinctrl_ids[] = {
{ .compatible = "samsung,pinctrl-exynos4210", },
{ .compatible = "samsung,pinctrl-exynos4x12", },
+   { .compatible = "samsung,pinctrl-exynos5250", },
 +  { .compatible = "samsung,pinctrl-exynos5440", },
};
for_each_matching_node(pctrl_np, exynos_pinctrl_ids)
if (pctrl_np && of_device_is_available(pctrl_np))



Re: [PATCH 5/6] OF: Introduce Device Tree resolve support.

2013-01-20 Thread David Gibson

On Fri, Jan 04, 2013 at 09:31:09PM +0200, Pantelis Antoniou wrote:
> Introduce support for dynamic device tree resolution.
> Using it, it is possible to prepare a device tree that's
> been loaded on runtime to be modified and inserted at the kernel
> live tree.
> 
> Signed-off-by: Pantelis Antoniou 
> ---
>  .../devicetree/dynamic-resolution-notes.txt|  25 ++
>  drivers/of/Kconfig |   9 +
>  drivers/of/Makefile|   1 +
>  drivers/of/resolver.c  | 394 +
>  include/linux/of.h |  17 +
>  5 files changed, 446 insertions(+)
>  create mode 100644 Documentation/devicetree/dynamic-resolution-notes.txt
>  create mode 100644 drivers/of/resolver.c
> 
> diff --git a/Documentation/devicetree/dynamic-resolution-notes.txt b/Documentation/devicetree/dynamic-resolution-notes.txt
> new file mode 100644
> index 000..0b396c4
> --- /dev/null
> +++ b/Documentation/devicetree/dynamic-resolution-notes.txt
> @@ -0,0 +1,25 @@
> +Device Tree Dynamic Resolver Notes
> +----------------------------------
> +
> +This document describes the implementation of the in-kernel
> +Device Tree resolver, residing in drivers/of/resolver.c and is a
> +companion document to Documentation/devicetree/dt-object-internal.txt[1]
> +
> +How the resolver works
> +----------------------
> +
> +The resolver is given as an input an arbitrary tree compiled with the
> +proper dtc option and having a /plugin/ tag. This generates the
> +appropriate __fixups__ & __local_fixups__ nodes as described in [1].
> +
> +In sequence the resolver works by the following steps:
> +
> +1. Get the maximum device tree phandle value from the live tree + 1.
> +2. Adjust all the local phandles of the tree to resolve by that amount.
> +3. Using the __local_fixups__ node information adjust all local references
> +   by the same amount.
> +4. For each property in the __fixups__ node locate the node it references
> +   in the live tree. This is the label used to tag the node.
> +5. Retrieve the phandle of the target of the fixup.
> +6. For each fixup in the property locate the node:property:offset location
> +   and replace it with the phandle value.
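
The first three steps of the quoted notes can be sketched on flat arrays as below. This is a simplified illustration, not the resolver from the patch: demo_phandle_delta() and demo_shift() are hypothetical helpers, phandles are plain uint32_t values, and the symbolic fixups of the later steps are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: the delta is the live tree's maximum phandle plus one. */
static uint32_t demo_phandle_delta(const uint32_t *live, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (live[i] > max)
			max = live[i];
	return max + 1;
}

/* Steps 2 and 3: shift local phandles and local references alike. */
static void demo_shift(uint32_t *values, size_t n, uint32_t delta)
{
	size_t i;

	for (i = 0; i < n; i++) {
		assert(values[i] <= UINT32_MAX - delta);	/* no wrap */
		values[i] += delta;
	}
}
```

Because the same delta is applied to the phandles and to every local reference, references inside the new tree keep pointing at the same nodes after the shift.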

Hrm.  So, I'm really still not convinced by this approach.

First, I think it's unwise to allow overlays to change
essentially anything in the base tree, rather than having the base
tree define sockets of some sort where things can be attached.

Second, even allowing overlays to change anything, I don't see
a lot of reason to do this kind of resolution within the kernel and
with data stored in the dtb itself, rather than doing the resolution
in userspace from an annotated overlay dts or dtb, then inserting the
fully resolved product into the kernel.  In either case, the overlay
needs to be constructed with pretty intimate knowledge of the base
tree.

That said, I have some implementation comments below.

[snip]
> +/**
> + * Find a subtree's maximum phandle value.
> + */
> +static phandle __of_get_tree_max_phandle(struct device_node *node,
> + phandle max_phandle)
> +{
> + struct device_node *child;
> +
> + if (node->phandle != 0 && node->phandle != OF_PHANDLE_ILLEGAL &&
> + node->phandle > max_phandle)
> + max_phandle = node->phandle;
> +
> + __for_each_child_of_node(node, child)
> + max_phandle = __of_get_tree_max_phandle(child, max_phandle);

Recursion is best avoided given the kernel's limited stack space.
This is also trivial to implement non-recursively, using the allnext
pointer.
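
A non-recursive walk along those lines might look like the sketch below. It assumes every node is chained through a single allnext pointer, as the comment above suggests; struct demo_node and the value chosen for DEMO_PHANDLE_ILLEGAL are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PHANDLE_ILLEGAL UINT32_MAX	/* assumed value, demo only */

/* Hypothetical cut-down node: all nodes are chained through allnext. */
struct demo_node {
	uint32_t phandle;
	struct demo_node *allnext;
};

/* Walk the flat allnext chain instead of recursing into children. */
static uint32_t demo_max_phandle(const struct demo_node *first)
{
	const struct demo_node *np;
	uint32_t max = 0;

	for (np = first; np != NULL; np = np->allnext) {
		if (np->phandle != 0 &&
		    np->phandle != DEMO_PHANDLE_ILLEGAL &&
		    np->phandle > max)
			max = np->phandle;
	}
	return max;
}
```

The loop uses constant stack space regardless of how deep the tree is.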

> +
> + return max_phandle;
> +}
> +
> +/**
> + * Find live tree's maximum phandle value.
> + */
> +static phandle of_get_tree_max_phandle(void)
> +{
> + struct device_node *node;
> + phandle phandle;
> +
> + /* get root node */
> + node = of_find_node_by_path("/");
> + if (node == NULL)
> + return OF_PHANDLE_ILLEGAL;
> +
> + /* now search recursively */
> + read_lock(&devtree_lock);
> + phandle = __of_get_tree_max_phandle(node, 0);
> + read_unlock(&devtree_lock);
> +
> + of_node_put(node);
> +
> + return phandle;
> +}
> +
> +/**
> + * Adjust a subtree's phandle values by a given delta.
> + * Makes sure not to just adjust the device node's phandle value,
> + * but modify the phandle properties values as well.
> + */
> +static void __of_adjust_tree_phandles(struct device_node *node,
> + int phandle_delta)
> +{
> + struct device_node *child;
> + struct property *prop;
> + phandle phandle;
> +
> + /* first adjust the node's phandle direct value */
> + if (node->phandle != 0 && node->phandle != OF_PHANDLE_ILLEGAL)
> + node->phandle += phandle_delta;

You need to have some kind of check for overflow here, or the adjusted
phandle could be one of the illegal values (0 or -1) - or wrap around
and collide with existing phandle values.
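
One possible shape for such a check is sketched below. It is an assumption-laden illustration: demo_adjust_phandle() is a hypothetical helper, phandles are treated as 32-bit unsigned values, and the illegal value is taken to be all-ones (the "-1" mentioned above). It does not check for collisions with phandles already in the live tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_PHANDLE_ILLEGAL UINT32_MAX	/* assumed: the "-1" value */

/*
 * Add delta to *phandle only if the result stays below the illegal
 * value and does not wrap. Returns 0 on success, -ERANGE otherwise.
 */
static int demo_adjust_phandle(uint32_t *phandle, uint32_t delta)
{
	/* Unset or illegal phandles are left alone, as in the patch. */
	if (*phandle == 0 || *phandle == DEMO_PHANDLE_ILLEGAL)
		return 0;

	/* *phandle is in [1, ILLEGAL - 1], so this cannot underflow. */
	if (delta > DEMO_PHANDLE_ILLEGAL - 1 - *phandle)
		return -ERANGE;

	*phandle += delta;
	return 0;
}
```

Comparing delta against the remaining headroom avoids performing the addition before knowing whether it would wrap.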

Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Mike Galbraith

On Mon, 2013-01-21 at 10:50 +0800, Michael Wang wrote: 
> On 01/20/2013 12:09 PM, Mike Galbraith wrote:
> > On Thu, 2013-01-17 at 13:55 +0800, Michael Wang wrote: 
> >> Hi, Mike
> >>
> >> I've sent out the v2, which I suppose will fix the BUG below and
> >> perform better; please do let me know if it still causes issues on your
> >> arm7 machine.
> > 
> > s/arm7/aim7
> > 
> > Someone swiped half of CPUs/ram, so the box is now 2 10 core nodes vs 4.
> > 
> > stock scheduler knobs
> > 
> >       3.8-wang-v2                         avg 3.8-virgin                          avg vs wang
> > Tasks jobs/min
> >     1    436.29    435.66    435.97    435.97    437.86    441.69    440.09    439.88   1.008
> >     5   2361.65   2356.14   2350.66   2356.15   2416.27   2563.45   2374.61   2451.44   1.040
> >    10   4767.90   4764.15   4779.18   4770.41   4946.94   4832.54   4828.69   4869.39   1.020
> >    20   9672.79   9703.76   9380.80   9585.78   9634.34   9672.79   9727.13   9678.08   1.009
> >    40  19162.06  19207.61  19299.36  19223.01  19268.68  19192.40  19056.60  19172.56    .997
> >    80  37610.55  37465.22  37465.22  37513.66  37263.64  37120.98  37465.22  37283.28    .993
> >   160  69306.65  69655.17  69257.14  69406.32  69257.14  69306.65  69257.14  69273.64    .998
> >   320 111512.36 109066.37 111256.45 110611.72 108395.75 107913.19 108335.20 108214.71    .978
> >   640 142850.83 148483.92 150851.81 147395.52 151974.92 151263.65 151322.67 151520.41   1.027
> >  1280  52788.89  52706.39  67280.77  57592.01 189931.44 189745.60 189792.02 189823.02   3.295
> >  2560  75403.91  52905.91  45196.21  57835.34 217368.64 217582.05 217551.54 217500.74   3.760
> > 
> > sched_latency_ns = 24ms
> > sched_min_granularity_ns = 8ms
> > sched_wakeup_granularity_ns = 10ms
> > 
> >       3.8-wang-v2                         avg 3.8-virgin                          avg vs wang
> > Tasks jobs/min
> >     1    436.29    436.60    434.72    435.87    434.41    439.77    438.81    437.66   1.004
> >     5   2382.08   2393.36   2451.46   2408.96   2451.46   2453.44   2425.94   2443.61   1.014
> >    10   5029.05   4887.10   5045.80   4987.31   4844.12   4828.69   4844.12   4838.97    .970
> >    20   9869.71   9734.94   9758.45   9787.70   9513.34   9611.42   9565.90   9563.55    .977
> >    40  19146.92  19146.92  19192.40  19162.08  18617.51  18603.22  18517.95  18579.56    .969
> >    80  37177.91  37378.57  37292.31  37282.93  36451.13  36179.10  36233.18  36287.80    .973
> >   160  70260.87  69109.05  69207.71  69525.87  68281.69  68522.97  68912.58  68572.41    .986
> >   320 114745.56 113869.64 114474.62 114363.27 114137.73 114137.73 114137.73 114137.73    .998
> >   640 164338.98 164338.98 164618.00 164431.98 164130.34 164130.34 164130.34 164130.34    .998
> >  1280 209473.40 209134.54 209473.40 209360.44 210040.62 210040.62 210097.51 210059.58   1.003
> >  2560 242703.38 242627.46 242779.34 242703.39 244001.26 243847.85 243732.91 243860.67   1.004
> > 
> > As you can see, the load collapsed at the high load end with stock
> > scheduler knobs (desktop latency).  With knobs set to scale, the delta
> > disappeared.
> 
> Thanks for the testing, Mike, please allow me to ask few questions.
> 
> What are those tasks actually doing? what's the workload?

It's the canned aim7 compute load, mixed bag load weighted toward
compute.  Below is the workfile, should give you an idea.

# @(#) workfile.compute:1.3 1/22/96 00:00:00
# Compute Server Mix
FILESIZE: 100K
POOLSIZE: 250M
50  add_double
30  add_int
30  add_long
10  array_rtns
10  disk_cp
30  disk_rd
10  disk_src
20  disk_wrt
40  div_double
30  div_int
50  matrix_rtns
40  mem_rtns_1
40  mem_rtns_2
50  mul_double
30  mul_int
30  mul_long
40  new_raph
40  num_rtns_1
50  page_test
40  series_1
10  shared_memory
30  sieve
20  stream_pipe
30  string_rtns
40  trig_rtns
20  udp_test

> And I'm confused about how those new parameter values were figured out
> and how they could help solve the possible issue?

Oh, that's easy.  I set sched_min_granularity_ns such that last_buddy
kicks in when a third task arrives on a runqueue, and set
sched_wakeup_granularity_ns near minimum that still allows wakeup
preemption to occur.  Combined effect is reduced over-scheduling.
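
One way to read the "third task" remark is sketched below. It rests on two assumptions about CFS of that era that are not stated in this thread: that the number of tasks fitting one latency period is sched_latency_ns divided by sched_min_granularity_ns, rounded up, and that the buddy logic is enabled once the runqueue holds at least that many tasks. The helper names are hypothetical.

```c
#include <assert.h>

/*
 * Assumption: tasks per latency period = latency / min_granularity,
 * rounded up.
 */
static unsigned int demo_nr_latency(unsigned long long latency_ns,
				    unsigned long long min_gran_ns)
{
	assert(min_gran_ns > 0);
	return (unsigned int)((latency_ns + min_gran_ns - 1) / min_gran_ns);
}

/* Assumption: buddy hints are honoured once the runqueue is this busy. */
static int demo_last_buddy_active(unsigned int nr_running,
				  unsigned int nr_latency)
{
	return nr_running >= nr_latency;
}
```

With the values quoted earlier in the thread (24ms latency, 8ms minimum granularity) the threshold works out to three runnable tasks under these assumptions.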
> Do you have any idea about which part in this patch set may cause the issue?

Nope, I'm as puzzled by that as you are.  When the box had 40 cores,
both virgin and patched showed over-scheduling effects, but not like
this.  With 20 cores, symptoms changed in a most puzzling way, and I
don't see how you'd be directly responsible.

> One change by designed is that, for old logical, if it's

Re: [patch] module: potential deadlock in error path

2013-01-20 Thread Linus Torvalds

On Sun, Jan 20, 2013 at 7:52 PM, Rusty Russell  wrote:
>
> You've now conflated two completely different lock paths into a single
> unlock.

We have that elsewhere too. And it's what we used to have before too.

So the simple fact is that commit 1fb9341ac348 just introduced this
bug, and moving the goto target around is the obvious fix for it, and
makes it match the old code that was simply incorrectly modified.

The suggested patch instead has *some* cleanup inside the
if-statement, and some at the goto target. That makes no sense to
humans, and just makes it harder for the compiler to generate better
code.

> module_bug_cleanup() should really lock internally, but doesn't
> so we wrap it.  And that mutex_unlock of yours has nothing to do with
> cleaning up ddebug, so the label's misnamed, at best.

Bah, humbug. It's called "ddebug_cleanup" because it's called after
the debug setup, so it needs to clean up the state set up by that.
The fact that it needs to unlock is secondary, and is simply because
the lock is taken at that point, so needs to be released. The naming
is not wonderful, but it's not hugely illogical, and again, that's
what it used to be (except "ddebug" has been renamed to
"ddebug_cleanup"). You could rename it if you want to (we used to have
a target called "unlock" at that point), but that's *still* no excuse
for just creating code that does cleanup in two totally unrelated
places.
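
The pattern being argued for, a single unlock at the goto target that every error path shares, can be sketched in user space as below. The toy lock and demo_load() are hypothetical and only model the control flow; they are not the load_module() code.

```c
#include <assert.h>

/* Toy lock that asserts on double lock or double unlock. */
struct demo_lock {
	int held;
};

static void demo_lock_take(struct demo_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void demo_lock_drop(struct demo_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * fail_stage: 0 = no failure, 1 = fail while the lock is held,
 * 2 = fail after the lock has been dropped.
 */
static int demo_load(struct demo_lock *l, int fail_stage)
{
	int err = 0;

	demo_lock_take(l);
	if (fail_stage == 1) {
		err = -1;
		goto cleanup_locked;	/* lock is still held here */
	}
	demo_lock_drop(l);

	if (fail_stage == 2) {
		err = -2;
		goto late_cleanup;
	}
	return 0;

late_cleanup:
	demo_lock_take(l);	/* this cleanup step needs the lock */
cleanup_locked:
	demo_lock_drop(l);	/* the one unlock both paths share */
	return err;
}
```

If the unlock sat above the cleanup_locked label instead, the early failure path would return with the lock still held.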

> Not that it matters much: this is going to change for next merge window.

Now, agreed, that looks better, although I suspect you could have
taken the "split that ugly function up" further still.

   Linus

Re: linux-next: manual merge of the security tree with Linus' tree

2013-01-20 Thread Stephen Rothwell

Hi Mimi,

On Sun, 20 Jan 2013 22:10:23 -0500 Mimi Zohar  wrote:
>
> Sorry Stephen, the merged result should look like what's contained in
> linux-integrity/next-upstreamed-patches:
> 
> int ima_module_check(struct file *file)
> {
> if (!file) {
> if ((ima_appraise & IMA_APPRAISE_MODULES) &&
> (ima_appraise & IMA_APPRAISE_ENFORCE)) {
> #ifndef CONFIG_MODULE_SIG_FORCE
> return -EACCES; /* INTEGRITY_UNKNOWN */
> #endif
> }
> return 0;
> }
> return process_measurement(file, file->f_dentry->d_name.name,
>MAY_EXEC, MODULE_CHECK);
> }

OK, I will use that version tomorrow.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



linux-next: build failure after merge of the usb tree

2013-01-20 Thread Stephen Rothwell

Hi Greg,

After merging the usb tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

drivers/usb/core/port.c: In function 'usb_port_device_release':
drivers/usb/core/port.c:25:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
drivers/usb/core/port.c: In function 'usb_hub_create_port_device':
drivers/usb/core/port.c:38:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
drivers/usb/core/port.c:38:11: warning: assignment makes pointer from integer without a cast [enabled by default]

Caused by commit 6e30d7cba992 ("usb: Add driver/usb/core/(port.c,hub.h)
files").  See Rule 1 in Documentation/SubmitChecklist.

I have used the usb tree from next-20130118 for today.
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: splice() giving unexpected EOF in 3.7.3 and 3.8-rc4+

2013-01-20 Thread David Miller

From: Eric Dumazet 
Date: Fri, 18 Jan 2013 22:13:16 -0800

> On Fri, 2013-01-18 at 21:54 -0800, Eric Dumazet wrote:
> 
>> 
>> Hmm, this might be already fixed in net-next tree, could you try it ?
>> 
> 
> Yes, running your program on net-next seems OK.
> 
> David, we need the two following commits.

Tossed into 'net' and queued up for -stable, thanks.

Re: [PATCH] firewire net: Use LL_RESERVED_SPACE(), HH_DATA_OFF().

2013-01-20 Thread David Miller

From: YOSHIFUJI Hideaki 
Date: Sun, 20 Jan 2013 17:03:07 +0900

> Signed-off-by: YOSHIFUJI Hideaki 

Applied.

Re: [PATCH] firewire net: Ensure checksumming in upper layer.

2013-01-20 Thread David Miller

From: YOSHIFUJI Hideaki 
Date: Sun, 20 Jan 2013 16:43:40 +0900

> It is wrong to set skb->ip_summed to CHECKSUM_UNNECESSARY unless
> the device has already checked it.
> 
> Signed-off-by: YOSHIFUJI Hideaki 

Applied.

Re: [RFC] kernel config template for running inside virtual machine

2013-01-20 Thread Mulyadi Santosa

On Mon, Jan 21, 2013 at 11:03 AM, Mulyadi Santosa
 wrote:
> Hello everybody
>
> With the significant usage of virtualization in recent years, I
> personally think there might be a need to easily generate somewhat
> more optimal kernel for running as VM guest.

To make it clearer, it will be saved under arch/x86/configs using name
like vm_defconfig or alike.

PS: I am not subscribed to linux-kernel@vger right now, so kindly cc:
me in your reply.


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

Re: [PATCH] fat: eliminate iterations in fat_search_long in case of EOD

2013-01-20 Thread Namjae Jeon

2013/1/20, OGAWA Hirofumi :
> Namjae Jeon  writes:
>
>> From: Namjae Jeon 
>>
>> When searching a directory for names, we can stop checking for further
>> entries if we detect End of Directory, i.e. if (de->name[0] == 0x00).
>> The current code traverses the cluster chain of a directory until a hit
>> is found or until the last cluster for that directory, ignoring the EOD
>> mark. Fix this.
>
> f_pos still works fine after this change?
Hi OGAWA.
I cannot find any f_pos usage in the fat_search_long function.
Were you perhaps looking at another function, such as __fat_readdir?
Let me know your opinion.

Thanks.
>
>> Signed-off-by: Namjae Jeon 
>> Signed-off-by: Ravishankar N 
>> ---
>>  fs/fat/dir.c |4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/fat/dir.c b/fs/fat/dir.c
>> index 58bf744..cde0e69 100644
>> --- a/fs/fat/dir.c
>> +++ b/fs/fat/dir.c
>> @@ -484,10 +484,10 @@ parse_record:
>>  nr_slots = 0;
>>  if (de->name[0] == DELETED_FLAG)
>>  continue;
>> +if (!de->name[0])
>> +goto end_of_dir;
>>  if (de->attr != ATTR_EXT && (de->attr & ATTR_VOLUME))
>>  continue;
>> -if (de->attr != ATTR_EXT && IS_FREE(de->name))
>> -continue;
>>  if (de->attr == ATTR_EXT) {
>>  int status = fat_parse_long(inode, &cpos, &bh, &de,
>>  &unicode, &_slots);
>
> --
> OGAWA Hirofumi 
>
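
The early exit described in the patch can be illustrated on an in-memory array of short-name entries. This is a sketch only: struct demo_dirent and demo_search() are hypothetical, and the real code walks clusters and long-name slots, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_DELETED_FLAG 0xe5	/* first name byte of a deleted entry */
#define DEMO_NAME_LEN 11	/* 8.3 short name, space padded */

struct demo_dirent {
	unsigned char name[DEMO_NAME_LEN];
};

/*
 * Return the index of the entry named `name`, or -1 if it is not found.
 * *visited reports how many entries were examined.
 */
static int demo_search(const struct demo_dirent *ents, size_t n,
		       const char *name, size_t *visited)
{
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		*visited = i + 1;
		if (ents[i].name[0] == DEMO_DELETED_FLAG)
			continue;	/* deleted slot, keep scanning */
		if (ents[i].name[0] == 0x00)
			return -1;	/* End of Directory: stop here */
		if (memcmp(ents[i].name, name, DEMO_NAME_LEN) == 0)
			return (int)i;
	}
	return -1;
}
```

Entries that happen to sit after the 0x00 marker are never examined, which is the saving the patch is after.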

linux-next: manual merge of the usb tree with Linus' tree

2013-01-20 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the usb tree got a conflict in
drivers/usb/serial/io_ti.c between commit 1ee0a224bc9a ("USB: io_ti: Fix
NULL dereference in chase_port()") from Linus' tree and commit
f40d781554ef ("USB: io_ti: kill custom closing_wait implementation") from
the usb tree.

I fixed it up (the latter removed the code fixed by the former, so I just
used that) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/



Re: [PATCH 5/5] drivers: atm: checkpatch.pl fixed coding style issues in eni.c

2013-01-20 Thread David Miller

From: Patrik Karlin 
Date: Mon, 21 Jan 2013 00:12:55 +0100

> This patch fixes statement placement around if/else/for statements
> as suggested by checkpatch.pl
> 
> Signed-off-by: Patrik Kårlin 

This patch set is a good example of why nobody should
fix up coding style in such a robotic way in response
to checkpatch.pl complaints.

> - ATM_MAX_AAL5_PDU) eff = (length+3) >> 2;
> + ATM_MAX_AAL5_PDU) 

I bet you didn't even notice that in this change you are adding
trailing whitespace, the exact problem you fixed up for this file in a
previous patch of the series.

I really would encourage you to work on something else entirely.

Thanks.

Re: [patch] module: potential deadlock in error path

2013-01-20 Thread Rusty Russell

Linus Torvalds  writes:

> On Sun, Jan 20, 2013 at 5:20 PM, Rusty Russell  wrote:
>> Dan Carpenter  writes:
>>> We take the lock twice if we hit this goto.
>>>
>>> Signed-off-by: Dan Carpenter 
>>
>> Damn, just pushed that to Linus: should have read mail first.
>>
>> I've added this, thanks.
>
> I'm not pulling this. It seems stupid.
>
> Why isn't the fix just this (whitespace-damaged, cut-and-pasted)
> one-liner instead? I may be blind, but as far as I can tell, there's
> exactly one single place we do that "goto ddebug_cleanup", and it
> wants to unlock the mutex, so we should just move the unlock down one
> line instead.
>
> Hmm? Is there some hidden magic going on that I can't see?

TBH, I find your change marginally less clear.

You've now conflated two completely different lock paths into a single
unlock.  module_bug_cleanup() should really lock internally, but doesn't
so we wrap it.  And that mutex_unlock of yours has nothing to do with
cleaning up ddebug, so the label's misnamed, at best.

> diff --git a/kernel/module.c b/kernel/module.c
> index d25e359279ae..eab08274ec9b 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3274,8 +3274,8 @@ again:
> /* module_bug_cleanup needs module_mutex protection */
> mutex_lock(&module_mutex);
> module_bug_cleanup(mod);
> -   mutex_unlock(&module_mutex);
>   ddebug_cleanup:
> +   mutex_unlock(&module_mutex);
> dynamic_debug_remove(info->debug);
> synchronize_sched();
> kfree(mod->args);

Not that it matters much: this is going to change for next merge window.
See below for freshly-minted patch (compiled, untested).

Nice to make module_bug_cleanup() lock internally but it's in bug.c,
and I've avoided making the module mutex non-static due to a history of
abuse...

Thanks,
Rusty.

module: clean up load_module a little more.

1fb9341ac34825aa40354e74d9a2c69df7d2c304 made our locking in
load_module more complicated: we grab the mutex once to insert the
module in the list, then again to upgrade it once it's formed.

Since the locking is self-contained, it's neater to do this in
separate functions.

diff --git a/kernel/module.c b/kernel/module.c
index 2b1d517..c0bc9b9 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3145,12 +3145,72 @@ static int may_init_module(void)
return 0;
 }
 
+/*
+ * We try to place it in the list now to make sure it's unique before
+ * we dedicate too many resources.  In particular, temporary percpu
+ * memory exhaustion.
+ */
+static int add_unformed_module(struct module *mod)
+{
+   int err;
+   struct module *old;
+
+   mod->state = MODULE_STATE_UNFORMED;
+
+again:
+   mutex_lock(_mutex);
+   if ((old = find_module_all(mod->name, true)) != NULL) {
+   if (old->state == MODULE_STATE_COMING
+   || old->state == MODULE_STATE_UNFORMED) {
+   /* Wait in case it fails to load. */
+   mutex_unlock(_mutex);
+   err = wait_event_interruptible(module_wq,
+  finished_loading(mod->name));
+   if (err)
+   goto out_unlocked;
+   goto again;
+   }
+   err = -EEXIST;
+   goto out;
+   }
+   list_add_rcu(>list, );
+   err = 0;
+
+out:
+   mutex_unlock(_mutex);
+out_unlocked:
+   return err;
+}
+
+static int complete_formation(struct module *mod, struct load_info *info)
+{
+   int err;
+
+   mutex_lock(_mutex);
+
+   /* Find duplicate symbols (must be called under lock). */
+   err = verify_export_symbols(mod);
+   if (err < 0)
+   goto out;
+
+   /* This relies on module_mutex for list integrity. */
+   module_bug_finalize(info->hdr, info->sechdrs, mod);
+
+   /* Mark state as coming so strong_try_module_get() ignores us,
+* but kallsyms etc. can see us. */
+   mod->state = MODULE_STATE_COMING;
+
+out:
+   mutex_unlock(_mutex);
+   return err;
+}
+
 /* Allocate and load the module: note that size of section 0 is always
zero, and we rely on this for optional sections. */
 static int load_module(struct load_info *info, const char __user *uargs,
   int flags)
 {
-   struct module *mod, *old;
+   struct module *mod;
long err;
 
err = module_sig_check(info);
@@ -3168,31 +3228,10 @@ static int load_module(struct load_info *info, const char __user *uargs,
goto free_copy;
}
 
-   /*
-* We try to place it in the list now to make sure it's unique
-* before we dedicate too many resources.  In particular,
-* temporary percpu memory exhaustion.
-*/
-   mod->state = MODULE_STATE_UNFORMED;
-again:
-   mutex_lock(_mutex);
-   if ((old = find_module_all(mod->name, true)) != NULL) {
-   if (old->state == MODULE_STATE_COMING
-   ||
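
The locking shape that the patch above gives add_unformed_module() (take the lock once inside the helper, leave through a single "out" label that unlocks) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: toy_lock, registry and registry_add_unique() are hypothetical names, the toy lock merely counts depth so that balance can be checked, and the wait-and-retry path of the real function is left out.

```c
#include <errno.h>
#include <string.h>

/* Toy lock that only counts depth, so lock/unlock balance is easy to
 * check. It stands in for module_mutex; it is not a real mutex. */
struct toy_lock {
	int depth;
};

static void toy_lock_acquire(struct toy_lock *l) { l->depth++; }
static void toy_lock_release(struct toy_lock *l) { l->depth--; }

#define MAX_NAMES 8

/* Hypothetical stand-in for the module list. */
struct registry {
	struct toy_lock lock;
	const char *names[MAX_NAMES];
	int count;
};

/* Add a name if it is unique. All locking stays inside this helper,
 * and every path that took the lock leaves through "out". */
static int registry_add_unique(struct registry *r, const char *name)
{
	int err = 0;
	int i;

	toy_lock_acquire(&r->lock);
	for (i = 0; i < r->count; i++) {
		if (strcmp(r->names[i], name) == 0) {
			err = -EEXIST;
			goto out;
		}
	}
	if (r->count == MAX_NAMES) {
		err = -ENOMEM;
		goto out;
	}
	r->names[r->count++] = name;
out:
	toy_lock_release(&r->lock);
	return err;
}
```

Because the caller never touches the lock, a label in the caller such as ddebug_cleanup no longer has to double as an unlock point, which is the objection raised earlier in this message.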

[git pull] drm fixes

2013-01-20 Thread Dave Airlie


Hi Linus,

A bunch of intel and radeon fixes, along with two fixes to TTM code.

The correct fix for the Intel ironlake failure is in this, and should make 
things more stable, along with some misc radeon fixes.

Dave.

The following changes since commit 7b4cf994e4c6ba48872bb25253cc393b7fb74c82:

  udldrmfb: udl_get_edid: drop unneeded i-- (2013-01-14 08:45:27 +1000)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux.git drm-fixes

for you to fetch changes up to 014b34409fb2015f63663b6cafdf557fdf289628:

  ttm: on move memory failure don't leave a node dangling (2013-01-21 13:45:23 +1000)


Alex Deucher (2):
  drm/radeon: clear reset flags if engines are idle
  Revert "drm/radeon: do not move bo to different placement at each cs"

Chris Wilson (2):
  drm/i915: Record DERRMR, FORCEWAKE and RING_CTL in error-state
  drm/i915: Invalidate the relocation presumed_offsets along the slow path

Dave Airlie (4):
  Merge branch 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux into drm-next
  Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
  ttm: don't destroy old mm_node on memcpy failure
  ttm: on move memory failure don't leave a node dangling

Jani Nikula (2):
  drm/i915/eDP: do not write power sequence registers for ghost eDP
  drm/i915: fix FORCEWAKE posting reads

Jerome Glisse (1):
  drm/radeon: improve semaphore debugging on lockup

Marek Olšák (1):
  drm/radeon: allow FP16 color clear registers on r500

 drivers/gpu/drm/i915/i915_debugfs.c|  3 ++
 drivers/gpu/drm/i915/i915_drv.h|  3 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 21 +
 drivers/gpu/drm/i915/i915_irq.c| 11 +++
 drivers/gpu/drm/i915/i915_reg.h|  2 ++
 drivers/gpu/drm/i915/intel_dp.c| 47 --
 drivers/gpu/drm/i915/intel_pm.c| 17 +++
 drivers/gpu/drm/radeon/evergreen.c |  6 
 drivers/gpu/drm/radeon/ni.c|  6 
 drivers/gpu/drm/radeon/r600.c  |  6 
 drivers/gpu/drm/radeon/radeon.h|  3 +-
 drivers/gpu/drm/radeon/radeon_drv.c|  3 +-
 drivers/gpu/drm/radeon/radeon_object.c | 18 +++-
 drivers/gpu/drm/radeon/radeon_ring.c   |  2 ++
 drivers/gpu/drm/radeon/radeon_semaphore.c  |  4 +++
 drivers/gpu/drm/radeon/reg_srcs/rv515  |  2 ++
 drivers/gpu/drm/radeon/si.c|  6 
 drivers/gpu/drm/ttm/ttm_bo.c   |  1 +
 drivers/gpu/drm/ttm/ttm_bo_util.c  | 11 +--
 19 files changed, 140 insertions(+), 32 deletions(-)

linux-next: manual merge of the tty tree with Linus' tree

2013-01-20 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the tty tree got a conflict in
drivers/tty/serial/vt8500_serial.c between commit a6dd114e16cb ("tty:
serial: vt8500: fix return value check in vt8500_serial_probe()") from
Linus' tree and commit 12faa35ae5cb ("serial: vt8500: UART uses gated
clock rather than 24Mhz reference") from the tty tree.

I fixed it up (I just used the tty tree version - which included the
former fix) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



RE: USB: storage: optimize the matching rules and support new switch command for Huawei USB storage devices

2013-01-20 Thread Fangxiaozhi (Franko)

Dear Greg:

> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Saturday, January 19, 2013 7:42 AM
> To: Fangxiaozhi (Franko)
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Xueguiying (Zihan);
> Linlei (Lei Lin); Yili (Neil); Wangyuhua (Roger, Credit); Huqiao (C); ba...@ti.com;
> mdharm-...@one-eyed-alien.net; sebast...@breakpoint.cc
> Subject: Re: USB: storage: optimize the matching rules and support new switch
> command for Huawei USB storage devices
> 
> On Mon, Jan 14, 2013 at 10:55:48AM +0800, fangxiaozhi 00110321 wrote:
> >
> > From: fangxiaozhi 
> >
> > 1. Optimize the matching rules with new macro for Huawei USB storage
> >devices, to avoid to load USB storage driver for the modem interface
> >with Huawei devices.
> > 2. Add to support new switch command for new Huawei USB dongles.
> >
> > Signed-off-by: fangxiaozhi 
> 
> Next time, please always use the scripts/checkpatch.pl tool to find any
> problems you might have made in your patch (you had trailing whitespace in
> this one, which I have fixed.)
> 
-Yes, I checked my patch with the scripts/checkpatch.pl tool before submitting.
-As for the trailing whitespace error, I think the patch code is more readable
with the whitespace left in. Isn't it?

> Also, you might want to use git, it makes creating the patches easier, that
> way you don't end up with lines in the patch like this one:
> 
> > Binary files linux-3.8-rc3_orig/drivers/usb/storage/initializers.o and
> > linux-3.8-rc3/drivers/usb/storage/initializers.o differ
> 
> thanks,
> 
> greg k-h

Best Regards,
Franko Fang

Re: [PATCH v2 2/3] dma: edma: add device_channel_caps() support

2013-01-20 Thread Vinod Koul

On Sun, Jan 20, 2013 at 11:51:08AM -0500, Matt Porter wrote:
> The explanation in the cover letter mentions that dmaengine_slave_config() is
> required to be called prior to dmaengine_get_channel_caps(). If we
> switch to the alternative API, then that would go away including the
> dependency on direction.
Nope you got that wrong!

--
~Vinod
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2 0/3] dmaengine: add per channel capabilities api

2013-01-20 Thread Vinod Koul

On Sun, Jan 20, 2013 at 11:37:35AM -0500, Matt Porter wrote:
> On Sun, Jan 20, 2013 at 12:37:34PM +, Vinod Koul wrote:
> > On Thu, Jan 10, 2013 at 02:07:03PM -0500, Matt Porter wrote:
> > > The call is implemented as follows:
> > > 
> > >   struct dmaengine_chan_caps
> > >   *dma_get_channel_caps(struct dma_chan *chan,
> > > enum dma_transfer_direction dir);
> > > 
> > > The dma transfer direction parameter may appear a bit out of place
> > > but it is necessary since the direction field in struct
> > > dma_slave_config was deprecated. In some cases, EDMA for one, it
> > > is necessary for the dmaengine driver to have the burst and address
> > > width slave configuration parameters available in order to compute
> > > the maximum segment size that can be handled. Due to this requirement,
> > > the calling order of this api is as follows:
> > Well you are passing direction as argument so even in EDMA it doesn't seem to
> > help you as you seem to need burst and width! So why do you even need the
> > direction to compute the capabilities?
> 
> Yes, I need burst and width, but they are dependent on direction (dst vs
> src, as stored in the slave channel config). Ok, so I think I know where
> this is leading...the problem is probably that I made an implicit
> dependency on burst and width here. The expectation in this
And also due to wrong documentation. This is the flow you have put up:
Due to this requirement,
the calling order of this api is as follows:

1. Allocate a DMA slave channel
1a. [Optionally] Get channel capabilities
2. Set slave and controller specific parameters
3. Get a descriptor for transaction
4. Submit the transaction
5. Issue pending requests and wait for callback notification

Now when we query capabilities, slave parameters _are_not_set_.
So it seems like you have thought one thing and written another!

Which brings me to the point of what we are trying to query:
a) API capability: we don't need slave parameters for that.
b) Sg segment length and numbers: well, these are capabilities, so they tell
you the maximum I can do. IMO it doesn't make sense to tie this down to
burst, width etc. For that configuration you are checking the maximum. What this
needs to return is the maximum length it supports and the maximum number of
sg list entries the h/w can use. Also, if you return your burst and width
capability, then any client can easily find out the length in bytes it can hold.

If you feel this computation is client specific, though looking at it doesn't
make me think so, you can add a callback for this computation given the
parameters.

--
~Vinod

RE: USB: storage: optimize the matching rules and support new switch command for Huawei USB storage devices

2013-01-20 Thread Fangxiaozhi (Franko)

Dear Greg:


> -Original Message-
> From: Greg KH [mailto:g...@kroah.com]
> Sent: Saturday, January 19, 2013 7:44 AM
> To: Fangxiaozhi (Franko)
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Xueguiying (Zihan);
> Linlei (Lei Lin); Yili (Neil); Wangyuhua (Roger, Credit); Huqiao (C); ba...@ti.com;
> mdharm-...@one-eyed-alien.net; sebast...@breakpoint.cc
> Subject: Re: USB: storage: optimize the matching rules and support new switch
> command for Huawei USB storage devices
> 
> On Mon, Jan 14, 2013 at 10:55:48AM +0800, fangxiaozhi 00110321 wrote:
> >
> > From: fangxiaozhi 
> >
> > 1. Optimize the matching rules with new macro for Huawei USB storage
> >devices, to avoid to load USB storage driver for the modem interface
> >with Huawei devices.
> > 2. Add to support new switch command for new Huawei USB dongles.
> >
> > Signed-off-by: fangxiaozhi 
> 
> This patch breaks the build, did you test it out?
> 
> I get the following errors:
> 
> drivers/usb/storage/unusual_devs.h:1530:1: error: implicit declaration of
> function ‘UNUSUAL_VENDOR_INTF’ [-Werror=implicit-function-declaration]
> drivers/usb/storage/unusual_devs.h:1534:3: warning: missing braces around
> initializer [-Wmissing-braces]
> drivers/usb/storage/unusual_devs.h:1534:3: warning: (near initialization for
> ‘us_unusual_dev_list[186]’) [-Wmissing-braces]
> drivers/usb/storage/unusual_devs.h:1534:3: error: initializer element is not
> constant
> drivers/usb/storage/unusual_devs.h:1534:3: error: (near initialization for
> ‘us_unusual_dev_list[186].vendorName’)
> drivers/usb/storage/unusual_devs.h:1537:1: warning: braces around scalar
> initializer [enabled by default]
> 
> And it goes on and on...
--For the macro definition, please see another patch:
[PATCH 1/1]linux-usb:Define a new macro for USB storage match rules
http://www.spinics.net/lists/linux-usb/msg76629.html
> Care to fix this up and resend it?
> 
> thanks,
> 
> greg k-h

Best Regards,
Franko Fang

Re: sched: Consequences of integrating the Per Entity Load Tracking Metric into the Load Balancer

2013-01-20 Thread Alex Shi

On 01/21/2013 10:40 AM, Preeti U Murthy wrote:
> Hi Alex,
> Thank you very much for running the below benchmark on
> blocked_load+runnable_load:) Just a few queries.
> 
> How did you do the wake up balancing? Did you iterate over the L3
> package looking for an idle cpu? Or did you just query the L2 package
> for an idle cpu?
> 

Just used the current select_idle_sibling function, so it searches the L3
package.
> I think when you are using blocked_load+runnable_load it would be better
> if we just query the L2 package as Vincent had pointed out because the
> fundamental behind using blocked_load+runnable_load is to keep a steady
> state across cpus unless we could reap the advantage of moving the
> blocked load to a sibling core when it wakes up.
> 
> And the drop of performance is relative to what?

It is 2 (below) vs. 3.8-rc3.
> 1.Your v3 patchset with runnable_load_avg in weighted_cpu_load().
> 2.Your v3 patchset with runnable_load_avg+blocked_load_avg in
> weighted_cpu_load().
> 
> Are the above two what you are comparing? And in the above two versions
> have you included your [PATCH] sched: use instant load weight in burst
> regular load balance?

No, that patch is not included.
> 
> On 01/20/2013 09:22 PM, Alex Shi wrote:
> The blocked load of a cluster will be high if the blocked tasks have
> run recently. The contribution of a blocked task will be divided by 2
> each 32ms, so it means that a high blocked load will be made of recent
> running tasks and the long sleeping tasks will not influence the load
> balancing.
> The load balance period is between 1 tick (10ms for idle load balance
> on ARM) and up to 256 ms (for busy load balance) so a high blocked
> load should imply some tasks that have run recently otherwise your
> blocked load will be small and will not have a large influence on your
> load balance
>>>
>>> Just tried using cfs's runnable_load_avg + blocked_load_avg in
>>> weighted_cpuload() with my v3 patchset, aim9 shared workfile testing
>>> show the performance dropped 70% more on the NHM EP machine. :(
>>>
>>
>> Oops, the performance is still worse than just counting runnable_load_avg.
>> But the drop is not so big: it dropped 30%, not 70%.
>>
> 
> Thank you
> 
> Regards
> Preeti U Murthy
> 


-- 
Thanks
Alex
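
The decay rule quoted in this thread (a blocked task's contribution is divided by 2 every 32 ms) can be illustrated with a deliberately coarse sketch. The function name blocked_contrib() is hypothetical, and the model only halves once per full 32 ms period; the scheduler itself decays geometrically in much finer steps, so treat this as arithmetic for intuition rather than the kernel's implementation.

```c
/* Coarse model: halve the contribution once per full 32 ms slept.
 * Assumption: whole periods only; the fractional decay that happens
 * inside a 32 ms window is ignored. */
static unsigned long blocked_contrib(unsigned long load, unsigned int slept_ms)
{
	unsigned int halvings = slept_ms / 32;

	/* Shifting by the full width of the type is undefined in C. */
	if (halvings >= sizeof(load) * 8)
		return 0;
	return load >> halvings;
}
```

With this model, a task that has slept for 256 ms (the longest busy balance period mentioned above) contributes 1/256 of its original load, which matches the claim that long sleepers barely influence load balancing.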

Re: [PULL] Module fixes, and a virtio block fix.

2013-01-20 Thread Linus Torvalds

On Sun, Jan 20, 2013 at 6:57 PM, Rusty Russell  wrote:
>
> I'm confused.  The default argument is HEAD: what does it know about tag
> names?

Ugh. I actually thought that if you give it the tag name directly (as
the "end") it will use that.

But no. It figures it out with "git describe --exact" internally.
Regardless, if your HEAD is actually tagged, it *will* have the
tag-name in git-request-pull.

And it will have it based on your *local* repo, so the fact that it
hasn't been mirrored out yet doesn't really matter. git request-pull
knows that tag name regardless of mirroring issues.

> The bug is that if it can't find that commit at the remote end, it
> still generates a valid-looking request (with a warning at the end),
> where it guesses you're talking about the master branch.

It really shouldn't do that any more, but you seem to have the older
version with the bug.

At least one of the annoying problems was fixed in the 1.7.11 series,
you have 1.7.10.

The nice thing about git is that it is *really* easy to upgrade. Just
fetch the sources, do "make; make install" all as a normal user, and
you do not need to worry about package management or distro issues or
any crap like that. It installs into your $(HOME)/bin, and as long as
your PATH has that first, you'll get it. I've long suggested that as
the workaround for distros having old versions (some more so than
others).

> Since I use a wrapper script now for your pull requests I can use sed to
> unscrew it:
>
> [alias]
> for-linus = !check-commits && TAGNAME=`git symbolic-ref HEAD | cut 
> -d/ -f3`-for-linus && git tag -f -u D1ADB8F1 $TAGNAME HEAD && git push korg 
> tag $TAGNAME && git request-pull master korg | sed 
> s,gitol...@ra.kernel.org:/pub,git://git.kernel.org/pub, && git log --stat 
> --reverse master..$TAGNAME | emails-from-log | grep -v 'rusty@rustcorp' | 
> grep -v 'sta...@kernel.org' | sed 's/^/Cc: /'

Heh. Ok. That will at least hide the breakage. But I suspect you could
fix it by just updating git.

 Linus

[PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-20 Thread paul . szabo

When calculating the amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.

Using an "extern int" because that is the only interface to some such
sysctl values.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo 
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo 

--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-21 13:57:05.0 +1100
@@ -343,12 +343,16 @@
 unsigned long determine_dirtyable_memory(void)
 {
unsigned long x;
+   extern int min_free_kbytes;
 
x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
 
+   /* Subtract min_free_kbytes */
+   x -= min(x, min_free_kbytes >> (PAGE_SHIFT - 10));
+
return x + 1;   /* Ensure that we never return 0 */
 }
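
The arithmetic in the added lines (convert a kilobyte count to pages, then subtract without letting the unsigned total wrap) can be tried out in user space with a sketch like the one below. The names kbytes_to_pages(), dirtyable_after_reserve() and SKETCH_PAGE_SHIFT are hypothetical, and a 4 KiB page size is assumed; the real PAGE_SHIFT depends on the architecture.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Convert a kilobyte count to whole pages (1 KiB is 1 << 10 bytes). */
static unsigned long kbytes_to_pages(unsigned long kbytes)
{
	return kbytes >> (SKETCH_PAGE_SHIFT - 10);
}

/* Subtract the reserve without letting the unsigned value wrap, then
 * add 1 so the result is never 0, as the patched function does. */
static unsigned long dirtyable_after_reserve(unsigned long pages,
					     unsigned long reserve_kbytes)
{
	unsigned long reserve = kbytes_to_pages(reserve_kbytes);

	pages -= (reserve < pages) ? reserve : pages;
	return pages + 1;
}
```

Clamping the subtrahend to the current total is what keeps the result from wrapping to a huge value when the reserve exceeds the dirtyable pages.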


Re: linux-next: manual merge of the security tree with Linus' tree

2013-01-20 Thread Mimi Zohar

On Mon, 2013-01-21 at 13:12 +1100, Stephen Rothwell wrote:
> Hi James,
> 
> Today's linux-next merge of the security tree got a conflict in
> security/integrity/ima/ima_main.c between commit a7f2a366f623 ("ima:
> fallback to MODULE_SIG_ENFORCE for existing kernel module syscall") from
> Linus' tree and commit 750943a30714 ("ima: remove enforce checking
> duplication") from the security tree.
> 
> I think I fixed it up (see below).

Sorry Stephen, the merged result should look like what's contained in
linux-integrity/next-upstreamed-patches:

int ima_module_check(struct file *file)
{
if (!file) {
if ((ima_appraise & IMA_APPRAISE_MODULES) &&
(ima_appraise & IMA_APPRAISE_ENFORCE)) {
#ifndef CONFIG_MODULE_SIG_FORCE
return -EACCES; /* INTEGRITY_UNKNOWN */
#endif
}
return 0;
}
return process_measurement(file, file->f_dentry->d_name.name,
   MAY_EXEC, MODULE_CHECK);
}

thanks,

Mimi


[PATCH] MAX_PAUSE to be at least 4

2013-01-20 Thread paul . szabo

Ensure MAX_PAUSE is 4 or larger, so limits in
return clamp_val(t, 4, MAX_PAUSE);
(the only use of it) are not back-to-front.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo 
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo 

--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-21 13:57:05.0 +1100
@@ -39,7 +39,7 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE  max(HZ/5, 1)
+#define MAX_PAUSE  max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
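
What "back-to-front" limits do can be seen with a small user-space sketch. It assumes a clamp that applies the lower bound first and the upper bound second; clamp_long(), max_pause_old() and max_pause_new() are hypothetical helper names, and the small HZ value used to trigger the case is chosen purely for illustration.

```c
/* Apply the lower bound, then the upper bound. */
static long clamp_long(long val, long lo, long hi)
{
	long v = (val < lo) ? lo : val;

	return (v > hi) ? hi : v;
}

/* MAX_PAUSE before and after the patch, as a function of HZ. */
static long max_pause_old(long hz) { return (hz / 5 > 1) ? hz / 5 : 1; }
static long max_pause_new(long hz) { return (hz / 5 > 4) ? hz / 5 : 4; }
```

With hz = 10 the old definition gives an upper limit of 2, so clamp_long(t, 4, 2) returns 2 for every t, below the intended minimum of 4. With the new definition the upper limit is never below 4, and for common values such as hz = 1000 both definitions agree on 200.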

Re: [PATCH 0/3] pwm-backlight: add subdrivers & Tegra support

2013-01-20 Thread Mark Zhang

The patch applies OK on 3.8-rc4.

Hmm, but I think it would be better if the patch could also be applied on
linux-next.

Mark
On 01/21/2013 10:09 AM, Mark Zhang wrote:
> Hi Alex,
> 
> This patch set fails to apply on ToT linux-next (0118). Here is the log:
> 
> markz@markz-hp6200:~/tegradrm/official-upstream-kernel$ git am
> ~/Desktop/*.eml
> Applying: pwm-backlight: add subdriver mechanism
> error: patch failed: drivers/video/backlight/pwm_bl.c:35
> error: drivers/video/backlight/pwm_bl.c: patch does not apply
> Patch failed at 0001 pwm-backlight: add subdriver mechanism
> When you have resolved this problem run "git am --resolved".
> If you would prefer to skip this patch, instead run "git am --skip".
> To restore the original branch and stop patching run "git am --abort".
> 
> Anyway, I'll try to apply this on 3.8-rc4.
> 
> Mark
> On 01/19/2013 06:30 PM, Alexandre Courbot wrote:
>> This series introduces a way to use pwm-backlight hooks with platforms
>> that use the device tree through a subdriver system. It also adds support
>> for the Tegra-based Ventana board, adding the last missing block to enable
>> its panel. Support for other Tegra board can thus be easily added.
>>
>> I have something else in mind to properly support this (power
>> sequences), but this work relies on the GPIO subsystem redesign which will
>> take some time. The pwm-backlight subdrivers can do the job by the meantime.
>>
>> There are a few design points that might need to be discussed:
>> 1) Link order is important: subdrivers register themselves in their
>> module_init function, which must be called before pwm-backlight's probe.
>> This forbids linking subdrivers as separate modules from pwm-backlight.
>> 2) The subdriver's data is temporarily passed through the backlight
>> device's driver data. This should not hurt, but maybe there is a better way
>> to do this.
>> 3) Subdrivers must add themselves into pwm-backlight's own of_device_id
>> table. It would be cleaner to not have to list subdrivers into
>> pwm-backlight's main file, but I cannot think of a way to do otherwise.
>>
>> Suggestions for the 3 points listed above are very welcome - in any case,
>> I hope to make this converge into something mergeable quickly.
>>
>> Note that these patches are the last missing block to get a functional
>> panel on Tegra boards. Using 3.8rc4 and these patches, the internal panel
>> on Ventana is usable out-of-the-box. Yay.
>>
>> Alexandre Courbot (3):
>>   pwm-backlight: add subdriver mechanism
>>   tegra: pwm-backlight: add tegra pwm-bl driver
>>   tegra: ventana: of: add host1x device to DT
>>
>>  arch/arm/boot/dts/tegra20-ventana.dts  |  29 +-
>>  arch/arm/configs/tegra_defconfig   |   1 +
>>  drivers/video/backlight/Kconfig|   7 ++
>>  drivers/video/backlight/Makefile   |   4 +
>>  drivers/video/backlight/pwm_bl.c   |  70 ++-
>>  drivers/video/backlight/pwm_bl_tegra.c | 159 +
>>  include/linux/pwm_backlight.h  |  15 
>>  7 files changed, 281 insertions(+), 4 deletions(-)
>>  create mode 100644 drivers/video/backlight/pwm_bl_tegra.c
>>

Re: [PULL] Module fixes, and a virtio block fix.

2013-01-20 Thread Rusty Russell

Linus Torvalds  writes:

> On Sun, Jan 20, 2013 at 5:32 PM, Rusty Russell  wrote:
>>
>> Due to the delay on git.kernel.org, git request-pull fails.  It *looks*
>> like it succeeds, except the warning, but (as we learned last time I
>> screwed up), it doesn't put the branchname because it can't know.
>
> I think this should be fixed in modern git versions.
>
> And it sure as hell knows the proper tag name, since you *gave* it the
> name and it used it for generating the actual contents. The fact that
> some versions then screw that up and re-write the tag-name to
> something randomly matching that isn't a tag was just a bug.

I'm confused.  The default argument is HEAD: what does it know about tag
names?

git request-pull master korg

The bug is that if it can't find that commit at the remote end, it
still generates a valid-looking request (with a warning at the end),
where it guesses you're talking about the master branch.

>> For want of a better solution, I'll now resort to sending pull requests
>> with the anti-social gitolite URL in it, like so:
>
> That's even worse, fwiw. It means that the pull request address makes
> no sense to anybody who doesn't have a kernel.org address, and then
> I'm forced to just edit things by hand instead to not pollute the
> kernel changelog history with crap.

Since I use a wrapper script now for your pull requests I can use sed to
unscrew it:

[alias]
for-linus = !check-commits && TAGNAME=`git symbolic-ref HEAD | cut -d/ 
-f3`-for-linus && git tag -f -u D1ADB8F1 $TAGNAME HEAD && git push korg tag 
$TAGNAME && git request-pull master korg | sed 
s,gitol...@ra.kernel.org:/pub,git://git.kernel.org/pub, && git log --stat 
--reverse master..$TAGNAME | emails-from-log | grep -v 'rusty@rustcorp' | grep 
-v 'sta...@kernel.org' | sed 's/^/Cc: /'

> Junio, didn't "git request-pull" get fixed so that it *warns* about
> missing tagnames/branches, but never actually corrupts the pull
> request? Or did it just get "fixed" to be a hard error instead of
> corrupting things? Because this is annoying.

Here: git version 1.7.10.4

Cheers,
Rusty.

Re: [PATCH] tty: Only wakeup the line discipline idle queue when queue is active

2013-01-20 Thread Preeti U Murthy

On 01/18/2013 09:15 PM, Oleg Nesterov wrote:
> On 01/17, Preeti U Murthy wrote:
>>
>> On 01/16/2013 05:32 PM, Ivo Sieben wrote:
>>>
>>> I don't have a problem that there is a context switch to the high
>>> priority process: it has a higher priority, so it probably is more
>>> important.
>>> My problem is that even when the waitqueue is empty, the high priority
>>> thread has a risk to block on the spinlock needlessly (causing context
>>> switches to low priority task and back to the high priority task)
>>>
>> Fair enough Ivo. I think you should go ahead with merging the
>> waitqueue_active()
>>   wake_up()
>> logic into the wake_up() variants.
> 
> This is not easy. We can't simply change wake_up*() helpers or modify
> __wake_up().

Hmm. I need to confess that I don't really know what goes into a change
such as this. Since there are a lot of waitqueue_active()+wake_up()
calls, I was wondering why we have waitqueue_active() as separate logic
at all, if we could do what it does inside wake_up*(). But you
guys can decide this best.
> 
> I can't understand why do you dislike Ivo's simple patch. There are
> a lot of "if (waitqueue_active) wake_up" examples. Even if we add the
> new helpers (personally I don't think this makes sense) , we can do
> this later. Why should we delay this fix?

Personally I was concerned about how this could cause scheduler
overhead. There does not seem to be much of a problem here. Ivo's patch
adding a waitqueue_active() check for his specific problem would also do
well, unless there is a dire requirement for a cleanup, which I am unable
to evaluate.

> 
> Oleg.
> 

Thank you

Regards
Preeti U Murthy


Re: [PATCH v4 2/2] perf stat: add interval printing

2013-01-20 Thread Namhyung Kim

Hi Stephane,

On Sat, 19 Jan 2013 00:13:59 +0100, Stephane Eranian wrote:
> This patch adds a new printing mode for perf stat.
> It allows interval printing. That means perf stat
> can now print event deltas at regular time intervals.
> This is useful to detect phases in programs.
>
> The -I option enables interval printing. It expects
> an interval duration in milliseconds. Minimum is
> 100ms. Once activated, perf stat prints event deltas
> since the last printout. All modes are supported.
>
> $ perf stat -I 1000 -e cycles noploop 10
> noploop for 10 seconds

Is this line an output from perf stat?

In addition, how about adding a head line like:

#        time      count  event
#
> 1.86918 2385155642 cycles#0.000 GHz
> 2.000267937 2392279774 cycles#0.000 GHz
> 3.000385400 2390971450 cycles#0.000 GHz
> 4.000504408 2390996752 cycles#0.000 GHz
> 5.000626878 2390853097 cycles#0.000 GHz
>
> The output format makes it easy to feed into a plotting program
> such as gnuplot when the -I option is used in combination with the -x
> option:
>
> $ perf stat -x, -I 1000 -e cycles noploop 10
> noploop for 10 seconds
> 1.84113,2378775498,cycles
> 2.000245798,2391056897,cycles
> 3.000354445,2392089414,cycles
> 4.000459115,2390936603,cycles
> 5.000565341,2392108173,cycles
>
> Signed-off-by: Stephane Eranian 
> ---
[snip]
> @@ -877,6 +977,8 @@ static void print_counter(struct perf_evsel *counter)
>  static void print_stat(int argc, const char **argv)
>  {
>   struct perf_evsel *counter;
> + struct timespec ts, rs;
> + char prefix[64] = { 0, };
>   int i;
>  
>   fflush(stdout);
> @@ -899,12 +1001,18 @@ static void print_stat(int argc, const char **argv)
>   fprintf(output, ":\n\n");
>   }
>  
> + if (interval) {
> + clock_gettime(CLOCK_MONOTONIC, );
> + diff_timespec(, , _time);
> + sprintf(prefix, "%lu.%09lu%s", rs.tv_sec, rs.tv_nsec, csv_sep);
> + }

AFAICS the only caller of print_stat() is cmd_stat() and it'll call this
only if interval is 0.  So why not just set prefix to NULL then?
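
For readers following the hunk under discussion, the elapsed-time prefix it builds can be sketched in user space as below. Several arguments in the quoted hunk were lost in the archive, so the operand order here (current time minus a reference time) is an assumption, and timespec_sub() and format_prefix() are hypothetical names rather than the helpers used by perf.

```c
#include <stdio.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* r = a - b, assuming a is not earlier than b. */
static void timespec_sub(struct timespec *r, const struct timespec *a,
			 const struct timespec *b)
{
	r->tv_sec = a->tv_sec - b->tv_sec;
	r->tv_nsec = a->tv_nsec - b->tv_nsec;
	if (r->tv_nsec < 0) {
		/* Borrow one second so tv_nsec stays in [0, 1e9). */
		r->tv_sec--;
		r->tv_nsec += SKETCH_NSEC_PER_SEC;
	}
}

/* Write "<seconds>.<nanoseconds, 9 digits><sep>" into buf. */
static void format_prefix(char *buf, size_t len,
			  const struct timespec *elapsed, const char *sep)
{
	snprintf(buf, len, "%lu.%09lu%s",
		 (unsigned long)elapsed->tv_sec,
		 (unsigned long)elapsed->tv_nsec, sep);
}
```

The zero-padded nine-digit nanosecond field is what produces timestamps of the form shown in the sample output, followed by the CSV separator when one is in use.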


> +
>   if (no_aggr) {
>   list_for_each_entry(counter, _list->entries, node)
> - print_counter(counter);
> + print_counter(counter, prefix);
>   } else {
>   list_for_each_entry(counter, _list->entries, node)
> - print_counter_aggr(counter);
> + print_counter_aggr(counter, prefix);
>   }
>  
>   if (!csv_output) {
> @@ -925,7 +1033,7 @@ static volatile int signr = -1;
>  
>  static void skip_signal(int signo)
>  {
> - if(child_pid == -1)
> + if((child_pid == -1) || interval)

Looks like it needs a whitespace :)


>   done = 1;
>  
>   signr = signo;
> @@ -1145,6 +1253,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
>   "command to run prior to the measured command"),
>   OPT_STRING(0, "post", _cmd, "command",
>   "command to run after to the measured command"),
> + OPT_INTEGER('I', "interval-print", ,
> + "print counts at regular interval in ms (>= 100)"),
>   OPT_END()
>   };
>   const char * const stat_usage[] = {
> @@ -1245,12 +1355,23 @@ int cmd_stat(int argc, const char **argv, const char 
> *prefix __maybe_unused)
>   usage_with_options(stat_usage, options);
>   return -1;
>   }
> + if (interval < 0 || (interval > 0 && interval < 100)) {
> + pr_err("print interval must be >= 100ms\n");
> + usage_with_options(stat_usage, options);
> + return -1;
> + }

How about making 'interval' unsigned and simplifying the condition a bit:

if (interval && interval < 100) {
...
}
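
Spelled out as a standalone function, the simplified check could look roughly like the sketch below. check_interval() and MIN_INTERVAL_MS are hypothetical names; only the 100 ms lower bound and the error text come from the quoted patch.

```c
#include <stdio.h>

#define MIN_INTERVAL_MS 100u    /* lower bound taken from the quoted patch */

/*
 * Returns 0 if the interval is acceptable, -1 otherwise.
 * An interval of 0 means "no interval printing" and is always accepted.
 */
static int check_interval(unsigned int interval_ms)
{
    if (interval_ms && interval_ms < MIN_INTERVAL_MS) {
        fprintf(stderr, "print interval must be >= 100ms\n");
        return -1;
    }
    return 0;
}
```

One thing to keep in mind with the unsigned variant: a negative value from the command line would no longer be caught by this check if it ends up converted to a large unsigned number. Whether that can happen depends on how the option parser handles the value.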

Thanks,
Namhyung

>  
>   list_for_each_entry(pos, _list->entries, node) {
>   if (perf_evsel__alloc_stat_priv(pos) < 0 ||
>   perf_evsel__alloc_counts(pos, perf_evsel__nr_cpus(pos)) < 0)
>   goto out_free_fd;
>   }
> + if (interval) {
> + list_for_each_entry(pos, _list->entries, node) {
> + if (perf_evsel__alloc_prev_raw_counts(pos) < 0)
> + goto out_free_fd;
> + }
> + }

It's not about your patch, but I can't find where it frees evsel->counts,
the counterpart of perf_evsel__alloc_counts().  It seems we leak that?


>  
>   /*
>* We dont want to block the signals - that would cause
> @@ -1260,6 +1381,7 @@ int cmd_stat(int argc, const char **argv, const char 
> *prefix __maybe_unused)
>*/
>   atexit(sig_atexit);
>   signal(SIGINT,  skip_signal);
> + signal(SIGCHLD, skip_signal);
>   signal(SIGALRM,

Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

2013-01-20 Thread Michael Wang

On 01/20/2013 12:09 PM, Mike Galbraith wrote:
> On Thu, 2013-01-17 at 13:55 +0800, Michael Wang wrote: 
>> Hi, Mike
>>
>> I've send out the v2, which I suppose it will fix the below BUG and
>> perform better, please do let me know if it still cause issues on your
>> arm7 machine.
> 
> s/arm7/aim7
> 
> Someone swiped half of CPUs/ram, so the box is now 2 10 core nodes vs 4.
> 
> stock scheduler knobs
> 
> Tasks  jobs/min
>        3.8-wang-v2                             avg  3.8-virgin                              avg  vs wang
>     1     436.29     435.66     435.97     435.97     437.86     441.69     440.09     439.88    1.008
>     5    2361.65    2356.14    2350.66    2356.15    2416.27    2563.45    2374.61    2451.44    1.040
>    10    4767.90    4764.15    4779.18    4770.41    4946.94    4832.54    4828.69    4869.39    1.020
>    20    9672.79    9703.76    9380.80    9585.78    9634.34    9672.79    9727.13    9678.08    1.009
>    40   19162.06   19207.61   19299.36   19223.01   19268.68   19192.40   19056.60   19172.56     .997
>    80   37610.55   37465.22   37465.22   37513.66   37263.64   37120.98   37465.22   37283.28     .993
>   160   69306.65   69655.17   69257.14   69406.32   69257.14   69306.65   69257.14   69273.64     .998
>   320  111512.36  109066.37  111256.45  110611.72  108395.75  107913.19  108335.20  108214.71     .978
>   640  142850.83  148483.92  150851.81  147395.52  151974.92  151263.65  151322.67  151520.41    1.027
>  1280   52788.89   52706.39   67280.77   57592.01  189931.44  189745.60  189792.02  189823.02    3.295
>  2560   75403.91   52905.91   45196.21   57835.34  217368.64  217582.05  217551.54  217500.74    3.760
> 
> sched_latency_ns = 24ms
> sched_min_granularity_ns = 8ms
> sched_wakeup_granularity_ns = 10ms
> 
> Tasks  jobs/min
>        3.8-wang-v2                             avg  3.8-virgin                              avg  vs wang
>     1     436.29     436.60     434.72     435.87     434.41     439.77     438.81     437.66    1.004
>     5    2382.08    2393.36    2451.46    2408.96    2451.46    2453.44    2425.94    2443.61    1.014
>    10    5029.05    4887.10    5045.80    4987.31    4844.12    4828.69    4844.12    4838.97     .970
>    20    9869.71    9734.94    9758.45    9787.70    9513.34    9611.42    9565.90    9563.55     .977
>    40   19146.92   19146.92   19192.40   19162.08   18617.51   18603.22   18517.95   18579.56     .969
>    80   37177.91   37378.57   37292.31   37282.93   36451.13   36179.10   36233.18   36287.80     .973
>   160   70260.87   69109.05   69207.71   69525.87   68281.69   68522.97   68912.58   68572.41     .986
>   320  114745.56  113869.64  114474.62  114363.27  114137.73  114137.73  114137.73  114137.73     .998
>   640  164338.98  164338.98  164618.00  164431.98  164130.34  164130.34  164130.34  164130.34     .998
>  1280  209473.40  209134.54  209473.40  209360.44  210040.62  210040.62  210097.51  210059.58    1.003
>  2560  242703.38  242627.46  242779.34  242703.39  244001.26  243847.85  243732.91  243860.67    1.004
> 
> As you can see, the load collapsed at the high load end with stock
> scheduler knobs (desktop latency).  With knobs set to scale, the delta
> disappeared.

Thanks for the testing, Mike. Please allow me to ask a few questions.

What are those tasks actually doing? What's the workload?

And I'm confused about how those new parameter values were figured out,
and how they could help solve the possible issue.

Do you have any idea which part of this patch set may cause the issue?

One change by design is that, with the old logic, if it's a wakeup and
we found an affine sd, the select function would never go into the balance
path, but with the new logic it will in some cases. Do you think this could
be a problem?

> 
> I thought perhaps the bogus (shouldn't exist) CPU domain in mainline
> somehow contributes to the strange behavioral delta, but killing it made
> zero difference.  All of these numbers for both trees were logged with
> the below applies, but as noted, it changed nothing. 

The patch set was supposed to speed things up by reducing the cost of
select_task_rq(), so it should be harmless under all conditions.

Regards,
Michael Wang

> 
> From: Alex Shi 
> Date: Mon, 17 Dec 2012 09:42:57 +0800
> Subject: [PATCH 01/18] sched: remove SD_PERFER_SIBLING flag
> 
> The flag was introduced in commit b5d978e0c7e79a. Its purpose seems
> trying to fullfill one node first in NUMA machine via pulling tasks
> from other nodes when the node has capacity.
> 
> Its advantage is when few tasks share memories among them, pulling
> together is helpful on locality, so has performance gain. The shortage
> is it will keep unnecessary task migrations thrashing among different
> nodes, that reduces the performance gain, and just hurt performance if
> tasks has no memory cross.
> 
> Thinking about the sched numa balancing patch is coming. The small
>

Re: [PATCH v9 09/11] PCI, acpiphp: Don't bailout even no slots found yet.

2013-01-20 Thread Yinghai Lu

>
> If that's the case:
>
> Acked-by: Rafael J. Wysocki 
>
> but please say something like this in the changelog:
>
> "The result returned by acpiphp_get_num_slots() is meaningless, because
>  the bridge the slots are under may be added after this function has been
>  called, so drop acpiphp_get_num_slots() and the code using it."

Yes, I have added your input to the changelog.

Thanks a lot
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: sched: Consequences of integrating the Per Entity Load Tracking Metric into the Load Balancer

2013-01-20 Thread Preeti U Murthy

Hi Alex,
Thank you very much for running the below benchmark on
blocked_load+runnable_load:) Just a few queries.

How did you do the wake up balancing? Did you iterate over the L3
package looking for an idle cpu? Or did you just query the L2 package
for an idle cpu?

I think when you are using blocked_load+runnable_load it would be better
if we just query the L2 package, as Vincent had pointed out, because the
fundamental idea behind using blocked_load+runnable_load is to keep a
steady state across cpus unless we could reap the advantage of moving the
blocked load to a sibling core when it wakes up.

And the performance drop is relative to what?
1. Your v3 patchset with runnable_load_avg in weighted_cpuload().
2. Your v3 patchset with runnable_load_avg+blocked_load_avg in
weighted_cpuload().

Are the above two what you are comparing? And in the above two versions,
have you included your [PATCH] sched: use instant load weight in burst
regular load balance?

On 01/20/2013 09:22 PM, Alex Shi wrote:
 The blocked load of a cluster will be high if the blocked tasks have
 run recently. The contribution of a blocked task will be divided by 2
 each 32ms, so it means that a high blocked load will be made of recent
 running tasks and the long sleeping tasks will not influence the load
 balancing.
 The load balance period is between 1 tick (10ms for idle load balance
 on ARM) and up to 256 ms (for busy load balance) so a high blocked
 load should imply some tasks that have run recently otherwise your
 blocked load will be small and will not have a large influence on your
 load balance
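
As a worked illustration of the "divided by 2 each 32ms" statement above, here is a rough sketch that only handles whole 32 ms periods. The function name and the whole-period simplification are assumptions of this sketch; the real load tracking code also decays within a period.

```c
#include <stdint.h>

#define HALF_LIFE_MS 32u    /* period after which a blocked contribution halves */

/*
 * Hypothetical helper: contribution left from 'load' after the task has
 * been blocked for 'ms_blocked' milliseconds, counting whole periods only.
 */
static uint64_t decayed_blocked_load(uint64_t load, unsigned int ms_blocked)
{
    unsigned int periods = ms_blocked / HALF_LIFE_MS;

    /* Shifting a 64-bit value by 64 or more is undefined, so cut off early. */
    if (periods >= 64)
        return 0;
    return load >> periods;
}
```

With a starting contribution of 1024, this leaves 512 after 32 ms and 4 after 256 ms, so by the time the longest busy load balance period mentioned above has passed, a task that went to sleep contributes well under 1% of its original load.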
>>
>> Just tried using cfs's runnable_load_avg + blocked_load_avg in
>> weighted_cpuload() with my v3 patchset, aim9 shared workfile testing
>> show the performance dropped 70% more on the NHM EP machine. :(
>>
> 
> Ops, the performance is still worse than just count runnable_load_avg.
> But dropping is not so big, it dropped 30%, not 70%.
> 

Thank you

Regards
Preeti U Murthy


Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2013-01-20 Thread Linus Torvalds

On Sun, Jan 20, 2013 at 6:30 PM, Al Viro  wrote:
>
> Neither do I, to be honest.  It might be saving us a few cycles on
> some architectures, but I'd like to see examples of that.  amd64
> doesn't seem to be one, at least...

I think that the inlining of the body should make it basically be
pretty much free even on architectures that would want to do something
about the casts.

.. and thinking about it, the architectures that do actually generate
code for casting to a narrower type should already have selected that
HAVE_SYSCALL_WRAPPERS option anyway, so the only reason *not* to
select it is for an architecture that doesn't generate any extra
code.

And right now, that HAVE_SYSCALL_WRAPPERS does make it much harder to
think about the header file changes.

> FWIW, there's another bit of ugliness around that area - all these
> #define __SC_BLAH3, etc., all of the same form.  This stuff begs for
> something like
> #define __MAP1(m,t,a) m(t,a)
> #define __MAP2(m,t,a,...) m(t,a) __MAP1(m,__VA_ARGS__)
> #define __MAP3(m,t,a,...) m(t,a) __MAP2(m,__VA_ARGS__)
> #define __MAP4(m,t,a,...) m(t,a) __MAP3(m,__VA_ARGS__)
> #define __MAP5(m,t,a,...) m(t,a) __MAP4(m,__VA_ARGS__)
> #define __MAP6(m,t,a,...) m(t,a) __MAP5(m,__VA_ARGS__)
> #define __MAP(n,...) __MAP##n(__VA_ARGS__)
> with __MAP(x,__SC_DECL,__VA_ARGS__) instead of __SC_DECL##x(__VA_ARGS__)
> etc. in users...

Well, I can see both sides. The above is the nice and dense
declaration model with less duplication, but christ, it's hard for
people to wrap their minds around unless they've seen it a million
times. It really does take some getting used to, and the long-form can
be easier to understand.

That said, we have so many of those things now when it comes to the
syscall stuff that the dense form seems to be called for just to be
consistent.

So go wild if you have the energy for it. I'm not going to pull that
for 3.8, though.

 Linus

Re: [PATCH v9 08/11] PCI, ACPI: debug print for installation of acpi root bridge's notifier

2013-01-20 Thread Yinghai Lu

On Sun, Jan 20, 2013 at 3:00 PM, Rafael J. Wysocki  wrote:
> On Thursday, January 17, 2013 11:53:19 PM Yinghai Lu wrote:
>> From: Tang Chen 
>>
>> acpi_install_notify_handler() could fail. So check the exit status
>> and give a better debug info.
>>
>> Signed-off-by: Tang Chen 
>> Signed-off-by: Yinghai Lu 
>> ---
>>  drivers/acpi/pci_root.c |   12 +---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 3ce5d80..f3ceb61 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -762,6 +762,7 @@ static void handle_hotplug_event_root(acpi_handle 
>> handle, u32 type,
>>  static acpi_status __init
>>  find_root_bridges(acpi_handle handle, u32 lvl, void *context, void **rv)
>>  {
>> + acpi_status status;
>>   char objname[64];
>>   struct acpi_buffer buffer = { .length = sizeof(objname),
>> .pointer = objname };
>> @@ -774,9 +775,14 @@ find_root_bridges(acpi_handle handle, u32 lvl, void 
>> *context, void **rv)
>>
>>   acpi_get_name(handle, ACPI_FULL_PATHNAME, );
>>
>> - acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
>> - handle_hotplug_event_root, NULL);
>> - printk(KERN_DEBUG "acpi root: %s notify handler installed\n", objname);
>> + status = acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
>> + handle_hotplug_event_root, NULL);
>> + if (ACPI_FAILURE(status))
>> + printk(KERN_DEBUG "acpi root: %s notify handler is not 
>> installed, exit status: %u\n",
>
> Can you break that line, please?  And use pr_debug()?

A long line should be OK, and checkpatch.pl is not complaining about it.

Also, keeping the complete message on one line lets git grep find that
code exactly.

Actually I really hate pr_debug(): it makes the generated code differ
depending on whether DEBUG is defined or not, and the end user needs to
recompile the kernel to get the debug output when it is needed.
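
To illustrate the difference being described, here is a simplified userspace sketch. All sketch_* names are hypothetical stand-ins and not kernel APIs, and the sketch ignores CONFIG_DYNAMIC_DEBUG, which can switch pr_debug() call sites on at run time.

```c
#include <stdio.h>

/*
 * Stand-in for pr_debug(): the call only produces code when DEBUG is
 * defined at build time, so enabling it later means rebuilding.
 */
#ifdef DEBUG
#define sketch_pr_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define sketch_pr_debug(...) (0)
#endif

#define SKETCH_LEVEL_DEBUG 7    /* numeric level of KERN_DEBUG */

/* Run-time setting; messages with a level below it are shown. */
static int sketch_console_loglevel = 7;

/*
 * Stand-in for printk(KERN_DEBUG ...): always compiled in, and whether
 * the message is shown is decided at run time.
 * Returns the number of characters written, 0 if the message was filtered.
 */
static int sketch_printk_debug(const char *msg)
{
    if (SKETCH_LEVEL_DEBUG >= sketch_console_loglevel)
        return 0;
    return fprintf(stderr, "%s", msg);
}
```

In this sketch, raising sketch_console_loglevel to 8 makes the printk-style message visible without a rebuild, while the pr_debug-style macro stays a no-op unless the file is compiled with -DDEBUG.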

Thanks

Yinghai

Re: [PATCH v9 07/11] PCI, acpiphp: Move and enhance hotplug support of pci host bridge

2013-01-20 Thread Yinghai Lu

On Sun, Jan 20, 2013 at 2:55 PM, Rafael J. Wysocki  wrote:
> On Thursday, January 17, 2013 11:53:18 PM Yinghai Lu wrote:
>> We have partial hot-add support in acpiphp driver, and it is confusing.
>>
>> Move host bridge hot-add support to pci_root.c, and keep acpiphp simple,
>> also add hot-remove support in pci_root.c.
>>
>> How to test it: if sci_emu patch is applied,
>>
>> Find out root bus number to acpi root name mapping from dmesg or /sys
>>
>>   echo "\_SB.PCIB 3" > /sys/kernel/debug/acpi/sci_notify
>> to remove root bus
>>
>>   echo "\_SB.PCIB 1" > /sys/kernel/debug/acpi/sci_notify
>> to add back root bus
>>
>> -v2: put back pci_root_hp change in one patch
>> -v3: add pcibios_resource_survey_bus() calling
>> -v4: remove not needed code with remove_bridge
>> -v5: put back support for acpiphp support for slots just on root bus.
>> -v6: change some functions to *_p2p_* to make it more clean.
>> -v7: split hot_added change out.
>> -v8: Move to pci_root.c instead of adding another file requested by Bjorn.
>> -v9: Fold three following patches into this one for easy review:
>>   a: Add missing hot_remove support for root device.
>>   b: Tang Chen noticed that hotplug through container will not update
>>  acpi_root_bridge list. After closely checking, we don't need
>>  that for struct for tracking and could use acpi_pci_root directly.
>>   c: Tang Chen found handle_root_bridge_removal is very similiar to
>>  acpi_bus_hot_remove_device().  Change to handle_root_bridge_removal
>>  to use acpi_bus_hot_remove_device.
>>
>> Signed-off-by: Yinghai Lu 
>> ---
>>  drivers/acpi/pci_root.c|  139 
>> 
>>  drivers/pci/hotplug/acpiphp_glue.c |   59 ---
>>  2 files changed, 154 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index bf5108a..3ce5d80 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -655,3 +655,142 @@ int __init acpi_pci_root_init(void)
>>
>>   return 0;
>>  }
>> +
>> +/* Support root bridge hotplug */
>> +
>> +static void handle_root_bridge_insertion(acpi_handle handle)
>> +{
>> + struct acpi_device *device, *pdevice;
>> + acpi_handle phandle;
>> + int ret_val;
>> +
>> + acpi_get_parent(handle, );
>> + if (acpi_bus_get_device(phandle, )) {
>> + printk(KERN_DEBUG "no parent device, assuming NULL\n");
>> + pdevice = NULL;
>> + }
>> + if (!acpi_bus_get_device(handle, )) {
>> + /* check if  pci root_bus is removed */
>> + struct acpi_pci_root *root = acpi_driver_data(device);
>> + if (pci_find_bus(root->segment, root->secondary.start))
>> + return;
>> +
>> + printk(KERN_DEBUG "bus exists... trim\n");
>> + /* this shouldn't be in here, so remove
>> +  * the bus then re-add it...
>> +  */
>> + ret_val = acpi_bus_trim(device);
>
> You said that this followed acpiphp, but the purpose of the trimming in there
> seems to be to handle surprise removal and re-insertion, which I'm not sure is
> OK with something like a host bridge.

ok, will just bail out if it is there.

>
> The drawback is that if we have a spurious ACPI_NOTIFY_BUS_CHECK or
> ACPI_NOTIFY_DEVICE_CHECK, we'll be trying to remove the whole bus here in
> response.  That doesn't sound quite right.
>
>> + printk(KERN_DEBUG "acpi_bus_trim return %x\n", ret_val);
>> + }
>> + if (acpi_bus_add(handle))
>> + printk(KERN_ERR "cannot add bridge to acpi list\n");
>> +}
>> +
>> +static void handle_root_bridge_removal(struct acpi_device *device)
>> +{
>> + struct acpi_eject_event *ej_event;
>> +
>> + ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
>> + if (!ej_event)
>
> Shouldn't we do acpi_evaluate_hotplug_ost() here?

ok. will add

/* Inform firmware the hot-remove operation has error */
(void) acpi_evaluate_hotplug_ost(device->handle,
ACPI_NOTIFY_EJECT_REQUEST,
ACPI_OST_SC_NON_SPECIFIC_FAILURE,
NULL);

before return.

>
>> + return;
>> +
>> + ej_event->device = device;
>> + ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
>> +
>> + acpi_bus_hot_remove_device(ej_event);
>> +}
>> +
>> +static void _handle_hotplug_event_root(struct work_struct *work)
>> +{
>> + struct acpi_pci_root *root;
>> + char objname[64];
>> + struct acpi_buffer buffer = { .length = sizeof(objname),
>> +   .pointer = objname };
>> + struct acpi_hp_work *hp_work;
>> + acpi_handle handle;
>> + u32 type;
>> +
>> + hp_work = container_of(work, struct acpi_hp_work, work);
>> + handle = hp_work->handle;
>> + type = hp_work->type;
>> +
>> + root =

Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2013-01-20 Thread Al Viro

On Sun, Jan 20, 2013 at 05:40:28PM -0800, Linus Torvalds wrote:
> On Sun, Jan 20, 2013 at 5:22 PM, Al Viro  wrote:
> >
> > Anyway, that's a separate story - semctl(2) is going to be ugly, no matter
> > what we do, but the rest of those guys doesn't have to.  How about the
> > following (completely untested):
> 
> Hmm.  Looks like the RightThing(tm) to me.
> 
> The thing that stands out that I question the value of that
> HAVE_SYSCALL_WRAPPERS thing. Is there any reason we don't just make
> all architectures use it? What's the downside? I'm not sure I see the
> point of the non-wrapper version.

Neither do I, to be honest.  It might be saving us a few cycles on
some architectures, but I'd like to see examples of that.  amd64
doesn't seem to be one, at least...

FWIW, there's another bit of ugliness around that area - all these
#define __SC_BLAH3, etc., all of the same form.  This stuff begs for
something like
#define __MAP1(m,t,a) m(t,a)
#define __MAP2(m,t,a,...) m(t,a) __MAP1(m,__VA_ARGS__)
#define __MAP3(m,t,a,...) m(t,a) __MAP2(m,__VA_ARGS__)
#define __MAP4(m,t,a,...) m(t,a) __MAP3(m,__VA_ARGS__)
#define __MAP5(m,t,a,...) m(t,a) __MAP4(m,__VA_ARGS__)
#define __MAP6(m,t,a,...) m(t,a) __MAP5(m,__VA_ARGS__)
#define __MAP(n,...) __MAP##n(__VA_ARGS__)
with __MAP(x,__SC_DECL,__VA_ARGS__) instead of __SC_DECL##x(__VA_ARGS__)
etc. in users...
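
To make the expansion concrete, here is a small standalone sketch of the same recursive pattern. The MAP*, DECL and ARG names are hypothetical (the leading double underscore is dropped because such identifiers are reserved outside the kernel), the chain is cut at three pairs, and a comma is emitted between elements so that the result can be used directly as a parameter or argument list; that comma is an addition of this sketch.

```c
/* Apply macro m to each (type, name) pair; one definition per pair count. */
#define MAP1(m, t, a)      m(t, a)
#define MAP2(m, t, a, ...) m(t, a), MAP1(m, __VA_ARGS__)
#define MAP3(m, t, a, ...) m(t, a), MAP2(m, __VA_ARGS__)
#define MAP(n, ...)        MAP##n(__VA_ARGS__)

#define DECL(t, a) t a    /* (int, x) -> int x */
#define ARG(t, a)  a      /* (int, x) -> x     */

/* Parameter list expands to: int x, long y, short z */
static long add3_impl(MAP(3, DECL, int, x, long, y, short, z))
{
    return x + y + z;
}

/* The same pair list yields both the declaration and the forwarded call. */
static long add3(MAP(3, DECL, int, x, long, y, short, z))
{
    return add3_impl(MAP(3, ARG, int, x, long, y, short, z));
}
```

The point of the pattern is visible in add3(): one list of type/name pairs is written once per use and mapped through different per-pair macros, instead of keeping a separate numbered family of macros for each mapping.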

Re: [PATCH] udf: add extent cache support in case of file reading

2013-01-20 Thread Namjae Jeon

2013/1/19, Cong Ding :
> On Sat, Jan 19, 2013 at 11:17:14AM +0900, Namjae Jeon wrote:
>> From: Namjae Jeon 
>>
>> This patch implements extent caching in case of file reading.
>> While reading a file, currently, UDF reads metadata serially
>> which takes a lot of time depending on the number of extents present
>> in the file. Caching last accessd extent improves metadata read time.
>> Instead of reading file metadata from start, now we read from
>> the cached extent.
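
The idea described above can be sketched in plain C as follows. This is not the code from the patch: the struct and function names are hypothetical, the extents are assumed to be sorted by starting block, and only the "lstart == -1 means invalid" convention is borrowed from the quoted hunks.

```c
#include <stddef.h>

/* Hypothetical in-memory extent: first logical block and length in blocks. */
struct sketch_extent {
    long lstart;
    long len;
};

/* Hypothetical one-entry cache; lstart == -1 marks it as invalid. */
struct sketch_ext_cache {
    long lstart;
    size_t index;
};

/*
 * Find the extent containing 'block'. The walk starts at the cached extent
 * when the cache is valid and the block is not before it, otherwise at the
 * first extent. '*visited' reports how many extents were examined.
 * Returns the extent index, or -1 if the block is not mapped.
 */
static long sketch_find_extent(const struct sketch_extent *ext, size_t count,
                               struct sketch_ext_cache *cache, long block,
                               size_t *visited)
{
    size_t i = 0;

    if (cache->lstart != -1 && block >= cache->lstart)
        i = cache->index;

    *visited = 0;
    for (; i < count; i++) {
        (*visited)++;
        if (block >= ext[i].lstart && block < ext[i].lstart + ext[i].len) {
            /* Remember where we ended up for the next lookup. */
            cache->lstart = ext[i].lstart;
            cache->index = i;
            return (long)i;
        }
    }
    return -1;
}
```

For a sequential read, each lookup after the first one starts at or next to the extent that was found last time, which is where the reduction in metadata walking would come from.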
>>
>> This patch considerably improves the time spent by CPU in kernel mode.
>> For example, while reading a 10.9 GB file using dd:
>> Time before applying patch:
>> 11677022208 bytes (10.9GB) copied, 1529.748921 seconds, 7.3MB/s
>> real25m 29.85s
>> user0m 12.41s
>> sys 15m 34.75s
>>
>> Time after applying patch:
>> 11677022208 bytes (10.9GB) copied, 1469.338231 seconds, 7.6MB/s
>> real24m 29.44s
>> user0m 15.73s
>> sys 3m 27.61s
> did you have any test on lots of small files?
Hi, Cong.

I created 2048 files of 512KB each to test for any performance drop
caused by the extent cache feature.

Used this script to read every file:
index=0
while [ $index != 2048 ]
do
dd if=file.$index of=/dev/zero 1> /dev/null 2>/dev/null
index=$(($index + 1))
done

Performance without patch:
VDLinux#> echo 3 > /proc/sys/vm/drop_caches
VDLinux#> time ./script2.sh
real0m 55.13s
user0m 1.40s
sys 0m 25.17s

Performance with patch:
VDLinux#> time ./script2.sh
real0m 53.70s
user0m 1.60s
sys 0m 25.11s

I cannot find any performance drop with the extent cache patch.
Thanks.
>
>  - cong
>>
>> Signed-off-by: Namjae Jeon 
>> Signed-off-by: Ashish Sangwan 
>> Signed-off-by: Bonggil Bak 
>> ---
>>  fs/udf/ialloc.c  |4 +++
>>  fs/udf/inode.c   |   79
>> +-
>>  fs/udf/udf_i.h   |   16 +++
>>  fs/udf/udfdecl.h |   10 +++
>>  4 files changed, 98 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/udf/ialloc.c b/fs/udf/ialloc.c
>> index 7e5aae4..0cb208e 100644
>> --- a/fs/udf/ialloc.c
>> +++ b/fs/udf/ialloc.c
>> @@ -117,6 +117,10 @@ struct inode *udf_new_inode(struct inode *dir,
>> umode_t mode, int *err)
>>  iinfo->i_lenAlloc = 0;
>>  iinfo->i_use = 0;
>>  iinfo->i_checkpoint = 1;
>> +memset(>cached_extent, 0, sizeof(struct udf_ext_cache));
>> +spin_lock_init(&(iinfo->i_extent_cache_lock));
>> +/* Mark extent cache as invalid for now */
>> +iinfo->cached_extent.lstart = -1;
>>  if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_AD_IN_ICB))
>>  iinfo->i_alloc_type = ICBTAG_FLAG_AD_IN_ICB;
>>  else if (UDF_QUERY_FLAG(inode->i_sb, UDF_FLAG_USE_SHORT_AD))
>> diff --git a/fs/udf/inode.c b/fs/udf/inode.c
>> index e78ef48..86e0469 100644
>> --- a/fs/udf/inode.c
>> +++ b/fs/udf/inode.c
>> @@ -91,6 +91,7 @@ void udf_evict_inode(struct inode *inode)
>>  }
>>  kfree(iinfo->i_ext.i_data);
>>  iinfo->i_ext.i_data = NULL;
>> +udf_clear_extent_cache(iinfo);
>>  if (want_delete) {
>>  udf_free_inode(inode);
>>  }
>> @@ -106,6 +107,7 @@ static void udf_write_failed(struct address_space
>> *mapping, loff_t to)
>>  truncate_pagecache(inode, to, isize);
>>  if (iinfo->i_alloc_type != ICBTAG_FLAG_AD_IN_ICB) {
>>  down_write(>i_data_sem);
>> +udf_clear_extent_cache(iinfo);
>>  udf_truncate_extents(inode);
>>  up_write(>i_data_sem);
>>  }
>> @@ -373,7 +375,7 @@ static int udf_get_block(struct inode *inode, sector_t
>> block,
>>  iinfo->i_next_alloc_goal++;
>>  }
>>
>> -
>> +udf_clear_extent_cache(iinfo);
>>  phys = inode_getblk(inode, block, , );
>>  if (!phys)
>>  goto abort;
>> @@ -1172,6 +1174,7 @@ set_size:
>>  } else {
>>  if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) {
>>  down_write(>i_data_sem);
>> +udf_clear_extent_cache(iinfo);
>>  memset(iinfo->i_ext.i_data + iinfo->i_lenEAttr + 
>> newsize,
>> 0x00, bsize - newsize -
>> udf_file_entry_alloc_offset(inode));
>> @@ -1185,6 +1188,7 @@ set_size:
>>  if (err)
>>  return err;
>>  down_write(>i_data_sem);
>> +udf_clear_extent_cache(iinfo);
>>  truncate_setsize(inode, newsize);
>>  udf_truncate_extents(inode);
>>  up_write(>i_data_sem);
>> @@ -1302,6 +1306,9 @@ static void udf_fill_inode(struct inode *inode,
>> struct buffer_head *bh)
>>  iinfo->i_lenAlloc = 0;
>>  iinfo->i_next_alloc_block = 0;
>>  iinfo->i_next_alloc_goal = 0;
>> +memset(>cached_extent, 0, sizeof(struct udf_ext_cache));
>> +spin_lock_init(&(iinfo->i_extent_cache_lock));
>> +iinfo->cached_extent.lstart = -1;
>>  if (fe->descTag.tagIdent == cpu_to_le16(TAG_IDENT_EFE)) {
>>
