Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Chen Gang
On 07/25/2013 02:03 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-25 at 15:51 +1000, Benjamin Herrenschmidt wrote:
>> On Thu, 2013-07-25 at 13:24 +0800, Chen Gang wrote:
>>> For an extern function, if the performance is not sensible, better to
>>> have the return value which can indicate the failure with the negative
>>> number.
>>
>> The return value is meaningless.
>>
>> We don't have a good way to handle it. It has no defined semantics. What
>> does "failure" means in that case ? Nothing !
>>
>> So just remove it.
> 
> Note: If you want to create a concept of smp_ops->probe() failing, then
> not only you need to check all the implementations, but *also* add
> something sensible to do when it fails ... such as disabling bringup of
> CPUs.
> 

Hmm... if it is critical, use BUG(); otherwise (not critical), just
print a warning message?

> In this case however, we have put the burden of doing whatever makes
> sense in the probe() function itself. It can adjust the possible map if
> it fails.
> 

Excuse me, my English is not very good. I guess your meaning is: "it can
fail in its internal implementation, but that has no effect on the final
result for the caller". Is that correct?

If my understanding is correct, the caller does not need to know about it.


Thanks.
-- 
Chen Gang
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Chen Gang
On 07/25/2013 01:51 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-25 at 13:24 +0800, Chen Gang wrote:
>> For an extern function, if the performance is not sensible, better to
>> have the return value which can indicate the failure with the negative
>> number.
> 
> The return value is meaningless.
> 
> We don't have a good way to handle it. It has no defined semantics. What
> does "failure" means in that case ? Nothing !
> 
> So just remove it.
> 

Hmm... for an extern function (especially one that has been implemented
in various modules), we can normally assume it may fail in some cases
(although right now we don't know which cases can cause a failure).

If "we don't have a good way to handle the failure", then "print the
related warning message" is a workable choice (or "BUG_ON()", if it is
critical).

So, if the code is not performance-sensitive, I still suggest letting the
extern function have a return value.


Thanks.
-- 
Chen Gang


Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Peter LaDow
On Wed, Jul 24, 2013 at 3:08 PM, Benjamin Herrenschmidt wrote:
> No, they resolve to the same thing under the hood. Did you do other
> changes ? Could it be another unrelated kernel bug causing something
> like use-after-free of network buffer or similar oddity unrelated to the
> network driver ?

There are other items, such as drivers for our custom hardware modules
implemented on the FPGA.  Perhaps I'll pull our drivers and run a
stock kernel.  Maybe a stock 83xx configuration (such as the
MPC8349E-MITX).  If we have problems even on a stock configuration...

> Have you tried with different kernel versions ?

Funny you mention it.  I just tried 3.10.2 today and we still get the
same memory corruption.  I was hoping that perhaps something had
changed between 3.0 and 3.10 that might clear up the problem, and then
I could bisect to find where it failed.  But unfortunately, 3.10.2
exhibits the same issue.

So clearly this isn't an issue specific to the kernel version.  Though
the e1000 driver looks largely unchanged in 3.10.  So if the problem
is driver related, it would still be there.

Thanks,
Pete


Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Benjamin Herrenschmidt
On Thu, 2013-07-25 at 15:51 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-25 at 13:24 +0800, Chen Gang wrote:
> > For an extern function, if the performance is not sensible, better to
> > have the return value which can indicate the failure with the negative
> > number.
> 
> The return value is meaningless.
> 
> We don't have a good way to handle it. It has no defined semantics. What
> does "failure" means in that case ? Nothing !
> 
> So just remove it.

Note: If you want to create a concept of smp_ops->probe() failing, then
not only you need to check all the implementations, but *also* add
something sensible to do when it fails ... such as disabling bringup of
CPUs.

In this case however, we have put the burden of doing whatever makes
sense in the probe() function itself. It can adjust the possible map if
it fails.

Cheers,
Ben.




Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Benjamin Herrenschmidt
On Thu, 2013-07-25 at 13:24 +0800, Chen Gang wrote:
> For an extern function, if the performance is not sensible, better to
> have the return value which can indicate the failure with the negative
> number.

The return value is meaningless.

We don't have a good way to handle it. It has no defined semantics. What
does "failure" means in that case ? Nothing !

So just remove it.

Ben.




Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Chen Gang
On 07/25/2013 01:16 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-25 at 13:15 +1000, Michael Ellerman wrote:
>>> But for API (also include the internal API), at least, better to always
>>> provide the return value which can indicate failure by negative number
>>> (if succeed can return the meanness value, e.g. the number of cpus).
>>
>> Are we still talking about this?
>>
>> There is no point returning a value when no one checks it. Which is the
>> case here.
> 
> Right. The return value is historical, it dates from when we didn't have
> cpu_possible_mask etc...
> 
> Nowadays, the probe() routine is just some early init, and might also
> affect those masks if needed, the return value has become obsolete.
> 
> You are welcome to post a patch removing it.
> 

For an extern function, if the performance is not sensible, better to
have the return value which can indicate the failure with the negative
number.


Thanks.
-- 
Chen Gang


Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Benjamin Herrenschmidt
On Thu, 2013-07-25 at 13:15 +1000, Michael Ellerman wrote:
> > But for API (also include the internal API), at least, better to always
> > provide the return value which can indicate failure by negative number
> > (if succeed can return the meanness value, e.g. the number of cpus).
> 
> Are we still talking about this?
> 
> There is no point returning a value when no one checks it. Which is the
> case here.

Right. The return value is historical, it dates from when we didn't have
cpu_possible_mask etc...

Nowadays, the probe() routine is just some early init, and might also
affect those masks if needed, the return value has become obsolete.

You are welcome to post a patch removing it.

Cheers,
Ben.




Re: [PATCH] of: Feed entire flattened device tree into the random pool

2013-07-24 Thread David Gibson
On Thu, Jul 25, 2013 at 02:30:31PM +1000, Anton Blanchard wrote:
> 
> Hi Michael,
> 
> > But why not put the initcall in drivers/of/fdt.c, that way it's not
> > early but it's still common ?
> 
> Good idea! How does this look? So long as it happens before
> module_init(rand_initialize) we should be good.

This must be some strange new meaning of the word "random" of which I
was not previously aware.  But I guess it's marginally better than
nothing.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[PATCH] of: Feed entire flattened device tree into the random pool

2013-07-24 Thread Anton Blanchard

Hi Michael,

> But why not put the initcall in drivers/of/fdt.c, that way it's not
> early but it's still common ?

Good idea! How does this look? So long as it happens before
module_init(rand_initialize) we should be good.

Anton
--

We feed the entire DMI table into the random pool to provide
better random data during early boot, so do the same with the
flattened device tree.

Signed-off-by: Anton Blanchard 
---

v2: move to drivers/of/fdt.c as suggested by Michael Ellerman

Index: b/drivers/of/fdt.c
===
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include   /* for COMMAND_LINE_SIZE */
 #ifdef CONFIG_PPC
@@ -714,3 +715,14 @@ void __init unflatten_device_tree(void)
 }
 
 #endif /* CONFIG_OF_EARLY_FLATTREE */
+
+/* Feed entire flattened device tree into the random pool */
+static int __init add_fdt_randomness(void)
+{
+   if (initial_boot_params)
+   add_device_randomness(initial_boot_params,
+ initial_boot_params->totalsize);
+
+   return 0;
+}
+core_initcall(add_fdt_randomness);


Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Chen Gang
On 07/25/2013 11:15 AM, Michael Ellerman wrote:
> On Wed, Jul 24, 2013 at 10:09:33AM +0800, Chen Gang wrote:
> > On 07/24/2013 09:16 AM, Michael Ellerman wrote:
> > > On Wed, Jul 24, 2013 at 08:28:07AM +0800, Chen Gang wrote:
> > > > On 07/23/2013 09:44 PM, Michael Ellerman wrote:
> > > > > On Mon, Jul 22, 2013 at 12:21:16PM +0530, Srivatsa S. Bhat wrote:
> > > > > > On 07/22/2013 12:10 PM, Chen Gang wrote:
> > > > > > > Since not need 'max_cpus' after the related commit, the related code
> > > > > > > are useless too, need be removed.
> > > > >
> > > > > A good follow up patch, or actually series of patches, would be to
> > > > > change the prototype of smp_ops->probe() to return void, and fix all the
> > > > > implementations to no longer return anything.
> > > >
> > > > Hmm... normally, a function need have a return value, it will make it
> > > > more extensible (especially, it is an API which need be implemented in
> > > > various sub modules).
> > >
> > > A function doesn't need a return value, and if it needs one in future then
> > > we'll add it then. We don't carry code around "just in case".
> >
> > But for API (also include the internal API), at least, better to always
> > provide the return value which can indicate failure by negative number
> > (if succeed can return the meanness value, e.g. the number of cpus).
>
> Are we still talking about this?
> 
> There is no point returning a value when no one checks it. Which is the
> case here.
> 
> For a published API maybe it's a good idea to have a return value "just
> in case", but this is kernel internal and we own both the implementation
> and the callers of the API.
> 

An API sits between caller and callee. It is independent of who uses it
and who implements it (they may be the same person), and also
independent of whether it is "between kernel and user" or not.

Today you really are responsible for both the caller and the callee, but
in the future that may not be the case.

In our case it really is an API: it is declared extern at the upper level
and implemented in various sub-modules.


> > > > Even though the return value may be useless, now, if the performance is
> > > > not quite important in our case, I still suggest to have it (especially
> > > > each various original implementation already has it).
> > >
> > > It's dead code, it should be removed.
> >
> > For API, if not cause the real world issue, better to keep compatible
> > (especially, the return value still can indicate failure by negative
> > number).
>
> No. Dead code is a real world issue. If we ever need a return value
> we'll add one then.

Hmm... my original wording "real world issue" was not precise. It should
be: "if it is not an 'urgent' issue (it may still be a real world issue),
then for an API we should keep it compatible (leave it untouched) for now".

"Dead code in an API" is not an 'urgent' issue; it is an 'important'
issue.  When we restructure the source code, we can also remove it in
that window (at least, that does not happen often).

(Although in our case, I don't think the "return value of an API" is "dead
code in an API".)


Thanks.
-- 
Chen Gang


Re: [PATCH] powerpc: Feed entire flattened device tree into the random pool

2013-07-24 Thread Michael Ellerman
On Thu, Jul 25, 2013 at 12:51:22PM +1000, Anton Blanchard wrote:
> 
> We feed the entire DMI table into the random pool to provide
> better random data during early boot, so do the same with the
> flattened device tree.
> 
> Signed-off-by: Anton Blanchard 
> ---
> 
> It might be worth doing this somewhere common, but the only place
> I could find (unflatten_device_tree) is almost certainly too
> early in the boot process.

Nice.

But why not put the initcall in drivers/of/fdt.c, that way it's not
early but it's still common ?

cheers


Re: [PATCH v2] powerpc: kernel: remove useless code which related with 'max_cpus'

2013-07-24 Thread Michael Ellerman
On Wed, Jul 24, 2013 at 10:09:33AM +0800, Chen Gang wrote:
> On 07/24/2013 09:16 AM, Michael Ellerman wrote:
> > On Wed, Jul 24, 2013 at 08:28:07AM +0800, Chen Gang wrote:
> > > On 07/23/2013 09:44 PM, Michael Ellerman wrote:
> > > > On Mon, Jul 22, 2013 at 12:21:16PM +0530, Srivatsa S. Bhat wrote:
> > > > > On 07/22/2013 12:10 PM, Chen Gang wrote:
> > > > > > Since not need 'max_cpus' after the related commit, the related code
> > > > > > are useless too, need be removed.
> > > >
> > > > A good follow up patch, or actually series of patches, would be to
> > > > change the prototype of smp_ops->probe() to return void, and fix all the
> > > > implementations to no longer return anything.
> > >
> > > Hmm... normally, a function need have a return value, it will make it
> > > more extensible (especially, it is an API which need be implemented in
> > > various sub modules).
> >
> > A function doesn't need a return value, and if it needs one in future then
> > we'll add it then. We don't carry code around "just in case".
> 
> But for API (also include the internal API), at least, better to always
> provide the return value which can indicate failure by negative number
> (if succeed can return the meanness value, e.g. the number of cpus).

Are we still talking about this?

There is no point returning a value when no one checks it. Which is the
case here.

For a published API maybe it's a good idea to have a return value "just
in case", but this is kernel internal and we own both the implementation
and the callers of the API.

> > > Even though the return value may be useless, now, if the performance is
> > > not quite important in our case, I still suggest to have it (especially
> > > each various original implementation already has it).

> > It's dead code, it should be removed.
> 
> For API, if not cause the real world issue, better to keep compatible
> (especially, the return value still can indicate failure by negative
> number).

No. Dead code is a real world issue. If we ever need a return value
we'll add one then.

cheers


[PATCH] powerpc: Feed entire flattened device tree into the random pool

2013-07-24 Thread Anton Blanchard

We feed the entire DMI table into the random pool to provide
better random data during early boot, so do the same with the
flattened device tree.

Signed-off-by: Anton Blanchard 
---

It might be worth doing this somewhere common, but the only place
I could find (unflatten_device_tree) is almost certainly too
early in the boot process.

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 63d051f..6914851 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -752,3 +753,13 @@ void arch_setup_pdev_archdata(struct platform_device *pdev)
pdev->dev.dma_mask = &pdev->archdata.dma_mask;
set_dma_ops(&pdev->dev, &dma_direct_ops);
 }
+
+/* Feed entire flattened device tree into the random pool */
+static int __init add_fdt_randomness(void)
+{
+   add_device_randomness(initial_boot_params,
+ initial_boot_params->totalsize);
+
+   return 0;
+}
+core_initcall(add_fdt_randomness);


Re: [PATCH] powerpc/pmac: Early debug output on screen on 64-bit macs

2013-07-24 Thread Benjamin Herrenschmidt
On Thu, 2013-07-25 at 12:12 +1000, Benjamin Herrenschmidt wrote:
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -10,7 +10,7 @@
>   *  2 of the License, or (at your option) any later version.
>   */
>  
> -#undef DEBUG
> +#define DEBUG

Ooops... sent the wrong version. Will send an updated one later,
if there are no other comments.

Cheers,
Ben.




Re: [PATCH v5 4/4] DMA: Freescale: eliminate a compiling warning

2013-07-24 Thread Hongbo Zhang

On 07/25/2013 03:33 AM, Scott Wood wrote:
> On 07/24/2013 01:21:09 AM, hongbo.zh...@freescale.com wrote:
>> From: Hongbo Zhang 
>>
>> The variable cookie is initialized in a list_for_each_entry loop; if
>> (unlikely) the list is empty, this variable will be used uninitialized,
>> so we get a gcc compiling warning about this. This patch fixes this
>> defect by setting an initial value to the variable cookie.
>>
>> Signed-off-by: Hongbo Zhang 
>> ---
>>  drivers/dma/fsldma.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
>> index 16a9a48..14d68a4 100644
>> --- a/drivers/dma/fsldma.c
>> +++ b/drivers/dma/fsldma.c
>> @@ -406,7 +406,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
>>  struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
>>  struct fsl_desc_sw *child;
>>  unsigned long flags;
>> -dma_cookie_t cookie;
>> +dma_cookie_t cookie = 0;
>>
>>  spin_lock_irqsave(&chan->desc_lock, flags);
>
> This patch is unrelated to the rest of the patch series...
>
> What are the semantics of this function if there are multiple entries
> in the list?  Returning the last cookie seems a bit odd.
>
> Is zero the proper error value?  include/linux/dmaengine.h suggests
> that cookies should be < 0 to indicate error.

I have seen this compile warning since the beginning of this work. It is
better that somebody fixes it sooner or later, so I finally took it on.
Yes, it was a bit hard to choose the initial value. I looked at
dmaengine.h, and I searched all the other DMA drivers that set an initial
value before making the decision:

drivers/dma/mv_xor.c:dma_cookie_t cookie = 0;
drivers/dma/sh/shdma-base.c:dma_cookie_t cookie = 0;
drivers/dma/mmp_pdma.c:dma_cookie_t cookie = -EBUSY;
drivers/dma/ppc4xx/adma.c:dma_cookie_t cookie = 0;
drivers/dma/iop-adma.c:dma_cookie_t cookie = 0;

Most of them use 0, and only one uses a negative value. Would that be
better? But I don't think -EBUSY is very accurate.
My plan is to drop this patch in the next iteration, and come back to it
after the first 3 get merged.

> -Scott






[PATCH] powerpc/pmac: Early debug output on screen on 64-bit macs

2013-07-24 Thread Benjamin Herrenschmidt
We have a bunch of CONFIG_PPC_EARLY_DEBUG_* options that are intended
for bringup/debug only. They hard wire a machine specific udbg backend
very early on (before we even probe the platform), and use whatever
tricks are available on each machine/cpu to be able to get some kind
of output out there early on.

So far, on powermac with no serial ports, we have CONFIG_PPC_EARLY_DEBUG_BOOTX
to use the low-level btext engine on the screen, but it doesn't do much, at
least on 64-bit. It only really gets enabled after the platform has been
probed and the MMU enabled.

This adds a way to enable it much earlier. From prom_init.c (while still
running with Open Firmware), we grab the screen details and set things up
using the physical address of the frame buffer.

Then btext itself uses the "rm_ci" feature of the 970 processor (Real
Mode Cache Inhibited) to access it while in real mode.

We need to do a little bit of reorg of the btext code to inline things
better, in order to limit how much we touch memory while in this mode as
the consequences might be ... interesting.

This successfully allowed me to debug problems early on with the G5
(related to gold being broken vs. ppc64 kernels).

Signed-off-by: Benjamin Herrenschmidt 
---

diff --git a/arch/powerpc/include/asm/btext.h b/arch/powerpc/include/asm/btext.h
index 906f46e..89fc382 100644
--- a/arch/powerpc/include/asm/btext.h
+++ b/arch/powerpc/include/asm/btext.h
@@ -13,6 +13,7 @@ extern void btext_update_display(unsigned long phys, int width, int height,
 extern void btext_setup_display(int width, int height, int depth, int pitch,
unsigned long address);
 extern void btext_prepare_BAT(void);
+extern void btext_map(void);
 extern void btext_unmap(void);
 
 extern void btext_drawchar(char c);
diff --git a/arch/powerpc/kernel/btext.c b/arch/powerpc/kernel/btext.c
index ac8f527..0428992 100644
--- a/arch/powerpc/kernel/btext.c
+++ b/arch/powerpc/kernel/btext.c
@@ -25,11 +25,6 @@
 static void scrollscreen(void);
 #endif
 
-static void draw_byte(unsigned char c, long locX, long locY);
-static void draw_byte_32(unsigned char *bits, unsigned int *base, int rb);
-static void draw_byte_16(unsigned char *bits, unsigned int *base, int rb);
-static void draw_byte_8(unsigned char *bits, unsigned int *base, int rb);
-
 #define __force_data __attribute__((__section__(".data")))
 
 static int g_loc_X __force_data;
@@ -52,6 +47,26 @@ static unsigned char vga_font[cmapsz];
 int boot_text_mapped __force_data = 0;
 int force_printk_to_btext = 0;
 
+extern void rmci_on(void);
+extern void rmci_off(void);
+
+static inline void rmci_maybe_on(void)
+{
+#ifdef CONFIG_PPC_EARLY_DEBUG_BOOTX
+   if (!(mfmsr() & MSR_DR))
+   rmci_on();
+#endif
+}
+
+static inline void rmci_maybe_off(void)
+{
+#ifdef CONFIG_PPC_EARLY_DEBUG_BOOTX
+   if (!(mfmsr() & MSR_DR))
+   rmci_off();
+#endif
+}
+
+
 #ifdef CONFIG_PPC32
 /* Calc BAT values for mapping the display and store them
  * in disp_BAT.  Those values are then used from head.S to map
@@ -134,7 +149,7 @@ void __init btext_unmap(void)
  *changes.
  */
 
-static void map_boot_text(void)
+void btext_map(void)
 {
unsigned long base, offset, size;
unsigned char *vbase;
@@ -209,7 +224,7 @@ int btext_initialize(struct device_node *np)
dispDeviceRect[2] = width;
dispDeviceRect[3] = height;
 
-   map_boot_text();
+   btext_map();
 
return 0;
 }
@@ -283,7 +298,7 @@ void btext_update_display(unsigned long phys, int width, int height,
iounmap(logicalDisplayBase);
boot_text_mapped = 0;
}
-   map_boot_text();
+   btext_map();
g_loc_X = 0;
g_loc_Y = 0;
g_max_loc_X = width / 8;
@@ -298,6 +313,7 @@ void btext_clearscreen(void)
(dispDeviceDepth >> 3)) >> 2;
int i,j;
 
+   rmci_maybe_on();
for (i=0; i<(dispDeviceRect[3] - dispDeviceRect[1]); i++)
{
unsigned int *ptr = base;
@@ -305,6 +321,7 @@ void btext_clearscreen(void)
*(ptr++) = 0;
base += (dispDeviceRowBytes >> 2);
}
+   rmci_maybe_off();
 }
 
 void btext_flushscreen(void)
@@ -355,6 +372,8 @@ static void scrollscreen(void)
   (dispDeviceDepth >> 3)) >> 2;
int i,j;
 
+   rmci_maybe_on();
+
for (i=0; i<(dispDeviceRect[3] - dispDeviceRect[1] - 16); i++)
{
unsigned int *src_ptr = src;
@@ -371,9 +390,116 @@ static void scrollscreen(void)
*(dst_ptr++) = 0;
dst += (dispDeviceRowBytes >> 2);
}
+
+   rmci_maybe_off();
 }
 #endif /* ndef NO_SCROLL */
 
+static unsigned int expand_bits_8[16] = {
+   0x00000000,
+   0x000000ff,
+   0x0000ff00,
+   0x0000ffff,
+   0x00ff0000,
+   0x00ff00ff,
+   0x00ffff00,
+   0x00ffffff,
+   0xff000000,
+  

[PATCH 2/2 V2] mmc: esdhc: get voltage from dts file

2013-07-24 Thread Haijun Zhang
Add voltage-range support to the eSDHC of T4, so that we can optionally
read the supported voltages from the dts file.
If we can get a valid voltage-range from the device node, we use
it as the final supported voltage. Otherwise we still read it
from the capabilities or from another provider.

Signed-off-by: Haijun Zhang 
Signed-off-by: Anton Vorontsov 
---
changes for V2:
- change dev_info to dev_err
- share function in pltfm.c

 drivers/mmc/host/sdhci-of-esdhc.c |  1 +
 drivers/mmc/host/sdhci-pltfm.c| 32 
 drivers/mmc/host/sdhci-pltfm.h|  1 +
 drivers/mmc/host/sdhci.c  |  3 +++
 include/linux/mmc/sdhci.h |  1 +
 5 files changed, 38 insertions(+)

diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
index 15039e2..cdfb08b 100644
--- a/drivers/mmc/host/sdhci-of-esdhc.c
+++ b/drivers/mmc/host/sdhci-of-esdhc.c
@@ -304,6 +304,7 @@ static int sdhci_esdhc_probe(struct platform_device *pdev)
return PTR_ERR(host);
 
sdhci_get_of_property(pdev);
+   sdhci_get_voltage(pdev);
 
np = pdev->dev.of_node;
if (of_device_is_compatible(np, "fsl,p2020-esdhc")) {
diff --git a/drivers/mmc/host/sdhci-pltfm.c b/drivers/mmc/host/sdhci-pltfm.c
index e2065a4..4682aba 100644
--- a/drivers/mmc/host/sdhci-pltfm.c
+++ b/drivers/mmc/host/sdhci-pltfm.c
@@ -109,10 +109,42 @@ void sdhci_get_of_property(struct platform_device *pdev)
host->mmc->pm_caps |= MMC_PM_WAKE_SDIO_IRQ;
}
 }
+
+void sdhci_get_voltage(struct platform_device *pdev)
+{
+   struct sdhci_host *host = platform_get_drvdata(pdev);
+   const u32 *voltage_ranges;
+   int num_ranges, i;
+   struct device_node *np;
+
+   np = pdev->dev.of_node;
+   voltage_ranges = of_get_property(np, "voltage-ranges", &num_ranges);
+   num_ranges = num_ranges / sizeof(*voltage_ranges) / 2;
+   if (!voltage_ranges || !num_ranges) {
+   dev_info(&pdev->dev, "OF: voltage-ranges unspecified\n");
+   return;
+   }
+
+   for (i = 0; i < num_ranges; i++) {
+   const int j = i * 2;
+   u32 mask;
+
+   mask = mmc_vddrange_to_ocrmask(be32_to_cpu(voltage_ranges[j]),
+   be32_to_cpu(voltage_ranges[j + 1]));
+   if (!mask) {
+   dev_err(&pdev->dev,
+   "OF: voltage-range #%d is invalid\n", i);
+   return;
+   }
+   host->ocr_mask |= mask;
+   }
+}
 #else
 void sdhci_get_of_property(struct platform_device *pdev) {}
+void sdhci_get_voltage(struct platform_device *pdev) {}
 #endif /* CONFIG_OF */
 EXPORT_SYMBOL_GPL(sdhci_get_of_property);
+EXPORT_SYMBOL_GPL(sdhci_get_voltage);
 
 struct sdhci_host *sdhci_pltfm_init(struct platform_device *pdev,
const struct sdhci_pltfm_data *pdata,
diff --git a/drivers/mmc/host/sdhci-pltfm.h b/drivers/mmc/host/sdhci-pltfm.h
index e15ced79..aba8253 100644
--- a/drivers/mmc/host/sdhci-pltfm.h
+++ b/drivers/mmc/host/sdhci-pltfm.h
@@ -92,6 +92,7 @@ static inline void sdhci_be32bs_writeb(struct sdhci_host *host, u8 val, int reg)
 #endif /* CONFIG_MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER */
 
 extern void sdhci_get_of_property(struct platform_device *pdev);
+extern void sdhci_get_voltage(struct platform_device *pdev);
 
 extern struct sdhci_host *sdhci_pltfm_init(struct platform_device *pdev,
  const struct sdhci_pltfm_data *pdata,
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index a78bd4f..57541e0 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -3119,6 +3119,9 @@ int sdhci_add_host(struct sdhci_host *host)
   SDHCI_MAX_CURRENT_MULTIPLIER;
}
 
+   if (host->ocr_mask)
+   ocr_avail = host->ocr_mask;
+
mmc->ocr_avail = ocr_avail;
mmc->ocr_avail_sdio = ocr_avail;
if (host->ocr_avail_sdio)
diff --git a/include/linux/mmc/sdhci.h b/include/linux/mmc/sdhci.h
index e3c6a74..3e781b8 100644
--- a/include/linux/mmc/sdhci.h
+++ b/include/linux/mmc/sdhci.h
@@ -171,6 +171,7 @@ struct sdhci_host {
unsigned intocr_avail_sdio; /* OCR bit masks */
unsigned intocr_avail_sd;
unsigned intocr_avail_mmc;
+   u32 ocr_mask;   /* available voltages */
 
wait_queue_head_t   buf_ready_int;  /* Waitqueue for Buffer Read Ready interrupt */
unsigned inttuning_done;/* Condition flag set when CMD19 succeeds */
-- 
1.8.0




[PATCH v3] powerpc: VPHN topology change updates all siblings

2013-07-24 Thread Robert Jennings
When an associativity level change is found for one thread, the
sibling threads need to be updated as well.  This is done today
for PRRN in stage_topology_update() but is missing for VPHN in
update_cpu_associativity_changes_mask().  This patch will correctly
update all thread siblings during a topology change.

Without this patch a topology update can result in a CPU in
init_sched_groups_power() getting stuck indefinitely in a loop.

This loop is built in build_sched_groups(). As a result of the thread
moving to a node separate from its siblings the struct sched_group will
have its next pointer set to point to itself rather than the sched_group
struct of the next thread.  This happens because we have a domain without
the SD_OVERLAP flag, which is correct, and a topology that doesn't conform
with reality (threads on the same core assigned to different numa nodes).
When this list is traversed by init_sched_groups_power() it will reach
the thread's sched_group structure and loop indefinitely; the cpu will
be stuck at this point.

The bug was exposed when VPHN was enabled in commit b7abef0 (v3.9).

Cc: 
Reported-by: Jan Stancek 
Signed-off-by: Robert Jennings 
---
v2. cpu_sibling_mask is now defined for UP which fixes that build break.
v3. Corrected with Cc:stable under signed-off-by and improved description
of impact of this issue.

 - While re-enabled only in v3.9, hardware VPHN support was available
 prior to this.  This could be a pervasive issue and should be considered
 for the stable tree.
---
 arch/powerpc/include/asm/smp.h |  4 +++
 arch/powerpc/mm/numa.c         | 59 +++---
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ffbaabe..48cfc85 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -145,6 +145,10 @@ extern void __cpu_die(unsigned int cpu);
 #define smp_setup_cpu_maps()
 static inline void inhibit_secondary_onlining(void) {}
 static inline void uninhibit_secondary_onlining(void) {}
+static inline const struct cpumask *cpu_sibling_mask(int cpu)
+{
+   return cpumask_of(cpu);
+}
 
 #endif /* CONFIG_SMP */
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0839721..5850798 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1318,7 +1319,8 @@ static int update_cpu_associativity_changes_mask(void)
}
}
if (changed) {
-   cpumask_set_cpu(cpu, changes);
+   cpumask_or(changes, changes, cpu_sibling_mask(cpu));
+   cpu = cpu_last_thread_sibling(cpu);
}
}
 
@@ -1426,7 +1428,7 @@ static int update_cpu_topology(void *data)
if (!data)
return -EINVAL;
 
-   cpu = get_cpu();
+   cpu = smp_processor_id();
 
for (update = data; update; update = update->next) {
if (cpu != update->cpu)
@@ -1446,12 +1448,12 @@ static int update_cpu_topology(void *data)
  */
 int arch_update_cpu_topology(void)
 {
-   unsigned int cpu, changed = 0;
+   unsigned int cpu, sibling, changed = 0;
struct topology_update_data *updates, *ud;
unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
cpumask_t updated_cpus;
struct device *dev;
-   int weight, i = 0;
+   int weight, new_nid, i = 0;
 
weight = cpumask_weight(&cpu_associativity_changes_mask);
if (!weight)
@@ -1464,19 +1466,46 @@ int arch_update_cpu_topology(void)
cpumask_clear(&updated_cpus);
 
for_each_cpu(cpu, &cpu_associativity_changes_mask) {
-   ud = &updates[i++];
-   ud->cpu = cpu;
-   vphn_get_associativity(cpu, associativity);
-   ud->new_nid = associativity_to_nid(associativity);
-
-   if (ud->new_nid < 0 || !node_online(ud->new_nid))
-   ud->new_nid = first_online_node;
+   /*
+* If siblings aren't flagged for changes, updates list
+* will be too short. Skip on this update and set for next
+* update.
+*/
+   if (!cpumask_subset(cpu_sibling_mask(cpu),
+   &cpu_associativity_changes_mask)) {
+   pr_info("Sibling bits not set for associativity "
+   "change, cpu%d\n", cpu);
+   cpumask_or(&cpu_associativity_changes_mask,
+   &cpu_associativity_changes_mask,
+   cpu_sibling_mask(cpu));
+   cpu = cpu_last_thread_sibling(cpu);
+   continue;
+   }
 
-   ud->old_nid = numa_cpu_lookup_table[cpu];
-   c

Re: [PATCH] module: ppc64 module CRC relocation fix causes perf issues

2013-07-24 Thread Benjamin Herrenschmidt
On Thu, 2013-07-25 at 08:34 +1000, Anton Blanchard wrote:
> > Apart from the annoying colors, is there anything specific I should
> > be looking for?  Some sort of error message, or output that actually
> > makes sense?
> 
> Thanks for testing! Ben, I think the patch is good to go.

Sent it yesterday to Linus, it's upstream already :-)

Cheers,
Ben.




Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 15:43 -0700, Andrew Morton wrote:
> For what?  The three lines of comment in page-flags.h?   ack :)
> 
> Manipulating page->_count directly is considered poor form.  Don't
> blame us if we break your code ;)
> 
> Actually, the manipulation in realmode_get_page() duplicates the
> existing get_page_unless_zero() and the one in realmode_put_page()
> could perhaps be placed in mm.h with a suitable name and some
> documentation.  That would improve your form and might protect the code
> from getting broken later on.

Yes, this stuff makes me really nervous :-) If it didn't provide an order
of magnitude performance improvement in KVM I would avoid it but heh...

Alexey, I like having that stuff in generic code.

However the meaning of the words "real mode" can be ambiguous across
architectures, so it might be best to name it "mmu_off_put_page" to
make things a bit clearer, along with a comment explaining that this is
called in a context where none of the virtual mappings are accessible
(vmalloc, vmemmap, IOs, ...), and that in the case of sparsemem vmemmap
the caller must have taken care of getting the physical address of the
struct page and of ensuring it isn't split across two vmemmap blocks.

Cheers,
Ben.




Re: [PATCH v2] powerpc: VPHN topology change updates all siblings

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 10:00 -0500, Robert Jennings wrote:
> When an associativity level change is found for one thread, the
> siblings threads need to be updated as well.  This is done today
> for PRRN in stage_topology_update() but is missing for VPHN in
> update_cpu_associativity_changes_mask().
> 
> All threads should be updated to move to the new node.  Without this
> patch, a single thread may be flagged for a topology change, leaving it
> in a different node from its siblings, which is incorrect.  This causes
> problems for the scheduler where overlapping scheduler groups are created
> and a loop is formed in those groups.
> 
> Signed-off-by: Robert Jennings 
> ---

This is big for a CC stable ... Can you be a bit more verbose on what
the consequences are of not having this patch ? Ie, what happens when "a
loop is formed in [the scheduler] groups" ?

Also you shouldn't CC stable on the actual patch email. You should add a
CC:  tag along with your Signed-off-by:

Also how far back in stable should this go ?

Cheers,
Ben.

> cpu_sibling_mask is now defined for UP which fixes that build break.
> ---
>  arch/powerpc/include/asm/smp.h |  4 +++
>  arch/powerpc/mm/numa.c         | 59 +++---
>  2 files changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index ffbaabe..48cfc85 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -145,6 +145,10 @@ extern void __cpu_die(unsigned int cpu);
>  #define smp_setup_cpu_maps()
>  static inline void inhibit_secondary_onlining(void) {}
>  static inline void uninhibit_secondary_onlining(void) {}
> +static inline const struct cpumask *cpu_sibling_mask(int cpu)
> +{
> + return cpumask_of(cpu);
> +}
>  
>  #endif /* CONFIG_SMP */
>  
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 0839721..5850798 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1318,7 +1319,8 @@ static int update_cpu_associativity_changes_mask(void)
>   }
>   }
>   if (changed) {
> - cpumask_set_cpu(cpu, changes);
> + cpumask_or(changes, changes, cpu_sibling_mask(cpu));
> + cpu = cpu_last_thread_sibling(cpu);
>   }
>   }
>  
> @@ -1426,7 +1428,7 @@ static int update_cpu_topology(void *data)
>   if (!data)
>   return -EINVAL;
>  
> - cpu = get_cpu();
> + cpu = smp_processor_id();
>  
>   for (update = data; update; update = update->next) {
>   if (cpu != update->cpu)
> @@ -1446,12 +1448,12 @@ static int update_cpu_topology(void *data)
>   */
>  int arch_update_cpu_topology(void)
>  {
> - unsigned int cpu, changed = 0;
> + unsigned int cpu, sibling, changed = 0;
>   struct topology_update_data *updates, *ud;
>   unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
>   cpumask_t updated_cpus;
>   struct device *dev;
> - int weight, i = 0;
> + int weight, new_nid, i = 0;
>  
>   weight = cpumask_weight(&cpu_associativity_changes_mask);
>   if (!weight)
> @@ -1464,19 +1466,46 @@ int arch_update_cpu_topology(void)
>   cpumask_clear(&updated_cpus);
>  
>   for_each_cpu(cpu, &cpu_associativity_changes_mask) {
> - ud = &updates[i++];
> - ud->cpu = cpu;
> - vphn_get_associativity(cpu, associativity);
> - ud->new_nid = associativity_to_nid(associativity);
> -
> - if (ud->new_nid < 0 || !node_online(ud->new_nid))
> - ud->new_nid = first_online_node;
> + /*
> +  * If siblings aren't flagged for changes, updates list
> +  * will be too short. Skip on this update and set for next
> +  * update.
> +  */
> + if (!cpumask_subset(cpu_sibling_mask(cpu),
> + &cpu_associativity_changes_mask)) {
> + pr_info("Sibling bits not set for associativity "
> + "change, cpu%d\n", cpu);
> + cpumask_or(&cpu_associativity_changes_mask,
> + &cpu_associativity_changes_mask,
> + cpu_sibling_mask(cpu));
> + cpu = cpu_last_thread_sibling(cpu);
> + continue;
> + }
>  
> - ud->old_nid = numa_cpu_lookup_table[cpu];
> - cpumask_set_cpu(cpu, &updated_cpus);
> + /* Use associativity from first thread for all siblings */
> + vphn_get_associativity(cpu, associativity);
> + new_nid = associativity_to_nid(associativity);
> + if (new_nid < 0 || !node_online(new_nid))
> +

Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Andrew Morton
On Tue, 23 Jul 2013 12:22:59 +1000 Alexey Kardashevskiy  wrote:

> Ping, anyone, please?

ew, you top-posted.

> Ben needs ack from any of MM people before proceeding with this patch. Thanks!

For what?  The three lines of comment in page-flags.h?   ack :)

Manipulating page->_count directly is considered poor form.  Don't
blame us if we break your code ;)

Actually, the manipulation in realmode_get_page() duplicates the
existing get_page_unless_zero() and the one in realmode_put_page()
could perhaps be placed in mm.h with a suitable name and some
documentation.  That would improve your form and might protect the code
from getting broken later on.
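
For illustration, the "take a reference unless the count is zero" pattern
mentioned above can be sketched in userspace with C11 atomics. This is not
the kernel's get_page_unless_zero(); struct fake_page and
ref_get_unless_zero() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the reference count matters here. */
struct fake_page {
	atomic_int count;
};

/*
 * Take a reference only if the count is currently non-zero.
 * A count of zero means the object is already on its way to being freed,
 * so the caller must not bring it back to life.
 */
static bool ref_get_unless_zero(struct fake_page *p)
{
	int old;

	assert(p != NULL);

	old = atomic_load(&p->count);
	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}
```

Keeping such a helper in one shared, documented place, rather than
open-coding the counter manipulation at each call site, is the kind of
protection against later breakage suggested above.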



Re: [PATCH] module: ppc64 module CRC relocation fix causes perf issues

2013-07-24 Thread Anton Blanchard

Hi Scott,

> I'm not really sure what it's supposed to look like when "perf  
> annotate" works.  It spits a bunch of unreadable[1]
> dark-blue-on-black assembly code at me, all with "0.00 :" in the left
> column.
> 
> Oh, wait -- some lines have "100.00 : " on the left, in  
> even-more-unreadable dark-red-on-black.
> 
> Apart from the annoying colors, is there anything specific I should
> be looking for?  Some sort of error message, or output that actually
> makes sense?

Thanks for testing! Ben, I think the patch is good to go.

Anton


Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 08:39 -0700, Peter LaDow wrote:
> A bit of history that may help.  We were using an e100 (an 82559)
> part, but Intel EOL'd that part so we picked up the 82540EP (which
> they have also recently EOL'd).  The e100 driver uses a different DMA
> model.  It uses pci_map_single/pci_unmap_single along with
> pci_dma_sync_single_for* calls (as well as other PCI calls).  The
> e1000 driver, however, does not use the pci_* calls.  We have never
> had a problem with the e100 parts.  I don't suppose the use of
> pci_map_* vs dma_map_* makes a difference does it?

No, they resolve to the same thing under the hood. Did you do other
changes ? Could it be another unrelated kernel bug causing something
like use-after-free of network buffer or similar oddity unrelated to the
network driver ?

Have you tried with different kernel versions ?

Cheers,
Ben.




Re: [PATCH 04/11] PCI/hotplug: Needn't remove EEH cache again

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 12:02 -0600, Bjorn Helgaas wrote:
> [+cc linux-pci]
> 
> On Tue, Jul 23, 2013 at 8:24 PM, Gavin Shan  wrote:
> > Since pcibios_release_device() called by pci_stop_and_remove_bus_device()
> > has removed the EEH cache, we needn't do that again.
> >
> > Cc: Bjorn Helgaas 
> > Acked-by: Bjorn Helgaas 
> > Signed-off-by: Gavin Shan 
> 
> I'll be happy to merge this if you want, or since you have my Ack
> already, you can merge it with the rest of the series.  I didn't get
> the rest of the series, so I don't know if it depends on this.
> 
> Just let me know what you want me to do.

Already merged :-)

Thanks !

Cheers,
Ben.

> > ---
> >  drivers/pci/hotplug/rpadlpar_core.c |1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> > index b29e20b..bb7af78 100644
> > --- a/drivers/pci/hotplug/rpadlpar_core.c
> > +++ b/drivers/pci/hotplug/rpadlpar_core.c
> > @@ -388,7 +388,6 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
> > /* Remove the EADS bridge device itself */
> > BUG_ON(!bus->self);
> > pr_debug("PCI: Now removing bridge device %s\n", pci_name(bus->self));
> > -   eeh_remove_bus_device(bus->self, true);
> > pci_stop_and_remove_bus_device(bus->self);
> >
> > return 0;
> > --
> > 1.7.5.4
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Scott Wood

On 07/24/2013 04:39:59 AM, Alexander Graf wrote:


On 24.07.2013, at 11:35, Gleb Natapov wrote:

> On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
>>> Are not we going to use page_is_ram() from
>>> e500_shadow_mas2_attrib() as Scott commented?
>>
>> Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?
>>
>>
> Because it is much slower and, IIRC, actually used to build pfn map
> that allow us to check quickly for valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the  
rest of the kvm code is doing.


I don't understand "actually used to build pfn map...".  What code is  
this?  I don't see any calls to page_is_ram() in the KVM code, or in  
generic mm code.  Is this a statement about what x86 does?


On PPC page_is_ram() is only called (AFAICT) for determining what  
attributes to set on mmaps.  We want to be sure that KVM always makes  
the same decision.  While pfn_valid() seems like it should be  
equivalent, it's not obvious from the PPC code that it is.


If pfn_valid() is better, why is that not used for mmap?  Why are there  
two different names for the same thing?


-Scott


Re: [PATCH v5 4/4] DMA: Freescale: eliminate a compiling warning

2013-07-24 Thread Scott Wood

On 07/24/2013 01:21:09 AM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

The variable cookie is initialized in a list_for_each_entry loop; if
(unlikely) the list is empty, this variable will be used uninitialized,
so we get a gcc compiling warning about this. This patch fixes this
defect by setting an initial value to the variable cookie.

Signed-off-by: Hongbo Zhang 
---
 drivers/dma/fsldma.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 16a9a48..14d68a4 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -406,7 +406,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
struct fsl_desc_sw *child;
unsigned long flags;
-   dma_cookie_t cookie;
+   dma_cookie_t cookie = 0;

spin_lock_irqsave(&chan->desc_lock, flags);


This patch is unrelated to the rest of the patch series...

What are the semantics of this function if there are multiple entries  
in the list?  Returning the last cookie seems a bit odd.


Is zero the proper error value?  include/linux/dmaengine.h suggests  
that cookies should be < 0 to indicate error.
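
To make the question concrete, here is a minimal userspace sketch of a
submit loop whose result starts out as a negative error value, so that an
empty list yields an error rather than zero. It is an illustration under
assumptions, not the fsldma code: cookie_t, struct desc and submit_list()
are hypothetical names, and -EINVAL is only one possible choice of error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int cookie_t;	/* hypothetical stand-in for dma_cookie_t */

struct desc {
	struct desc *next;
	cookie_t cookie;
};

/*
 * Assign increasing cookies to every descriptor in the list and return
 * the last one assigned. Starting from -EINVAL means an empty list gives
 * back a negative (error) value instead of an uninitialized one or zero.
 */
static cookie_t submit_list(struct desc *head, cookie_t *last_used)
{
	cookie_t cookie = -EINVAL;
	struct desc *d;

	assert(last_used != NULL);

	for (d = head; d != NULL; d = d->next) {
		cookie = ++(*last_used);
		d->cookie = cookie;
	}

	return cookie;
}
```

A caller can then treat any negative return as "nothing was submitted",
which matches the convention that cookies below zero indicate an error.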


-Scott


Re: [PATCH v5 3/4] DMA: Freescale: update driver to support 8-channel DMA engine

2013-07-24 Thread Scott Wood

On 07/24/2013 01:21:08 AM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

This patch adds support for the 8-channel DMA engine, so the driver works
for both the new 8-channel and the legacy 4-channel DMA engines.

Signed-off-by: Hongbo Zhang 
---
 drivers/dma/Kconfig  |9 +
 drivers/dma/fsldma.c |9 ++---
 drivers/dma/fsldma.h |2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6825957..1b78272 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -89,14 +89,15 @@ config AT_HDMAC
  Support the Atmel AHB DMA controller.

 config FSL_DMA
-   tristate "Freescale Elo and Elo Plus DMA support"
+   tristate "Freescale Elo series DMA support"
depends on FSL_SOC
select DMA_ENGINE
select ASYNC_TX_ENABLE_CHANNEL_SWITCH
---help---
-	  Enable support for the Freescale Elo and Elo Plus DMA controllers.
-	  The Elo is the DMA controller on some 82xx and 83xx parts, and the
-	  Elo Plus is the DMA controller on 85xx and 86xx parts.
+	  Enable support for the Freescale Elo series DMA controllers.
+	  The Elo is the DMA controller on some mpc82xx and mpc83xx parts, the
+	  EloPlus is on mpc85xx and mpc86xx and Pxxx parts, and the Elo3 is on
+	  some Txxx and Bxxx parts. Look up user manuals for details anyway.


The user manuals do not use the "elo" terminology.  I also don't  
understand the tone you're trying to convey with "anyway".


-Scott


Re: [PATCH v5 2/4] DMA: Freescale: Add new 8-channel DMA engine device tree nodes

2013-07-24 Thread Scott Wood

On 07/24/2013 01:21:07 AM, hongbo.zh...@freescale.com wrote:

From: Hongbo Zhang 

Freescale QorIQ T4 and B4 introduce new 8-channel DMA engines; this
patch adds the device tree nodes for them.

Signed-off-by: Hongbo Zhang 
---
 .../devicetree/bindings/powerpc/fsl/dma.txt|   66
 arch/powerpc/boot/dts/fsl/b4si-post.dtsi   |4 +-
 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi  |   81
 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi  |   81
 arch/powerpc/boot/dts/fsl/t4240si-post.dtsi|4 +-
 5 files changed, 232 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
index ed703d9..54a023b2 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/dma.txt
@@ -130,6 +130,72 @@ Example:
};
};

+** Freescale Elo3 DMA Controller
+   This is EloPlus controller with 8 channels, used in Freescale Txxx and Bxxx
+   series chips, such as t1040, t4240, b4860.
+
+Required properties:
+
+- compatible: should be "fsl,elo3-dma"


Should include "fsl,elo3-dma".  There's nothing different about elo3  
versus elo/eloplus regarding whether fsl,CHIP-dma is allowed.  I'd just  
drop the references to fsl,CHIP-dma throughout the binding, and phrase  
the compatible description as "must include" rather than "must be" so  
that additional strings are allowed.
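
As an illustration of why "must include" is the more useful wording: a
compatible property is a list of NUL-terminated strings, and a consumer
normally asks whether a given string appears anywhere in that list. The
sketch below shows such a check in plain C. compat_includes() is a
hypothetical helper, not a kernel API, and "fsl,t4240-dma" in the usage
note is only an assumed example of the fsl,CHIP-dma pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * 'prop' is a compatible-style property value: 'len' bytes holding one or
 * more NUL-terminated strings back to back. Returns true if any entry is
 * exactly equal to 'want'.
 */
static bool compat_includes(const char *prop, size_t len, const char *want)
{
	size_t off = 0;

	assert(prop != NULL && want != NULL);

	while (off < len) {
		const char *entry = prop + off;
		const char *end = memchr(entry, '\0', len - off);

		if (end == NULL)
			break;	/* malformed: last entry is not terminated */
		if (strcmp(entry, want) == 0)
			return true;
		off += (size_t)(end - entry) + 1;
	}

	return false;
}
```

With a property value of "fsl,t4240-dma" followed by "fsl,elo3-dma", a
lookup for "fsl,elo3-dma" succeeds even though it is not the only entry,
which is the behaviour a "must include" binding describes.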


-Scott


Re: [PATCH] module: ppc64 module CRC relocation fix causes perf issues

2013-07-24 Thread Scott Wood

On 07/23/2013 08:30:32 AM, Michael Ellerman wrote:

On Fri, Jul 19, 2013 at 05:59:30PM -0500, Scott Wood wrote:
> On 07/17/2013 11:00:45 PM, Anton Blanchard wrote:
> >
> >Hi Scott,
> >
> >> What specifically should I do to test it?
> >
> >Could you double check perf annotate works? I'm 99% sure it will but
> >that is what was failing on ppc64.
>
> I'm not really sure what it's supposed to look like when "perf
> annotate" works.  It spits a bunch of unreadable[1]
> dark-blue-on-black assembly code at me, all with "0.00 :" in the
> left column.
>
> Oh, wait -- some lines have "100.00 : " on the left, in
> even-more-unreadable dark-red-on-black.
>
> Apart from the annoying colors, is there anything specific I should
> be looking for?  Some sort of error message, or output that actually
> makes sense?

The colours look fine on my terminal, so I don't know what you've done
there.


It probably looks better if the terminal is configured to have a light  
background (which of course makes some other programs look worse), or  
(as I noted) if you've got your monitor set to be very bright.  I now  
see that xfce4-terminal lets me redefine the standard colors, though,  
so that should help.



If you care you can use "--stdio" to use the plainer interface,
though it still uses colours.

That output looks fine in terms of the bug Anton was chasing. As far as
only ever hitting one instruction that does look weird.


OK.  I'll add "investigate weird e500 perf annotate results" to the  
TODO list...


-Scott


[PATCH 8/8] Remove no longer needed powerpc memory node update handler

2013-07-24 Thread Nathan Fontenot
Remove the update_node handler for powerpc/pseries.

Now that we can do memory dlpar in the kernel we no longer need the of
update node notifier to update the ibm,dynamic-memory property of the
ibm,dynamic-reconfiguration-memory node. This work is now handled by
the memory notification handlers for powerpc/pseries.

This patch also conditionally registers the handler for of node remove
if we are not using the ibm,dynamic-reconfiguration-memory device tree
layout. That handler is only needed for handling memory@XXX nodes
in the device tree.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   60 +++-
 1 file changed, 8 insertions(+), 52 deletions(-)

Index: linux/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux.orig/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ linux/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -166,67 +166,15 @@ static inline int pseries_remove_memory(
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
-{
-   struct of_drconf_cell *new_drmem, *old_drmem;
-   unsigned long memblock_size;
-   u32 entries;
-   u32 *p;
-   int i, rc = -EINVAL;
-
-   memblock_size = get_memblock_size();
-   if (!memblock_size)
-   return -EINVAL;
-
-   p = (u32 *)of_get_property(pr->dn, "ibm,dynamic-memory", NULL);
-   if (!p)
-   return -EINVAL;
-
-   /* The first int of the property is the number of lmb's described
-* by the property. This is followed by an array of of_drconf_cell
-* entries. Get the niumber of entries and skip to the array of
-* of_drconf_cell's.
-*/
-   entries = *p++;
-   old_drmem = (struct of_drconf_cell *)p;
-
-   p = (u32 *)pr->prop->value;
-   p++;
-   new_drmem = (struct of_drconf_cell *)p;
-
-   for (i = 0; i < entries; i++) {
-   if ((old_drmem[i].flags & DRCONF_MEM_ASSIGNED) &&
-   (!(new_drmem[i].flags & DRCONF_MEM_ASSIGNED))) {
-   rc = pseries_remove_memblock(old_drmem[i].base_addr,
-memblock_size);
-   break;
-   } else if ((!(old_drmem[i].flags & DRCONF_MEM_ASSIGNED)) &&
-  (new_drmem[i].flags & DRCONF_MEM_ASSIGNED)) {
-   rc = memblock_add(old_drmem[i].base_addr,
- memblock_size);
-   rc = (rc < 0) ? -EINVAL : 0;
-   break;
-   }
-   }
-
-   return rc;
-}
-
 static int pseries_memory_notifier(struct notifier_block *nb,
   unsigned long action, void *node)
 {
-   struct of_prop_reconfig *pr;
int err = 0;
 
switch (action) {
case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_memory(node);
break;
-   case OF_RECONFIG_UPDATE_PROPERTY:
-   pr = (struct of_prop_reconfig *)node;
-   if (!strcmp(pr->prop->name, "ibm,dynamic-memory"))
-   err = pseries_update_drconf_memory(pr);
-   break;
}
return notifier_from_errno(err);
 }
@@ -237,6 +185,14 @@ static struct notifier_block pseries_mem
 
 static int __init pseries_memory_hotplug_init(void)
 {
+   struct device_node *dn;
+
+   dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (dn) {
+   of_node_put(dn);
+   return 0;
+   }
+
if (firmware_has_feature(FW_FEATURE_LPAR))
of_reconfig_notifier_register(&pseries_mem_nb);
 




[PATCH 7/8] Add memory hot add/remove notifier handlers for powerpc

2013-07-24 Thread Nathan Fontenot
Add memory hot add/remove notifier handlers for powerpc/pseries.

This patch allows the powerpc/pseries platforms to perform memory DLPAR
in the kernel. The handlers for add and remove do the work of
acquiring/releasing the memory to firmware and updating the device tree.

This is only used when memory is specified in the
ibm,dynamic-reconfiguration-memory device tree node so the memory notifiers
are registered contingent on its existence.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/platforms/pseries/dlpar.c |  103 +
 1 file changed, 103 insertions(+)

Index: linux/arch/powerpc/platforms/pseries/dlpar.c
===
--- linux.orig/arch/powerpc/platforms/pseries/dlpar.c
+++ linux/arch/powerpc/platforms/pseries/dlpar.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "offline_states.h"
@@ -531,11 +532,113 @@ out:
return rc ? rc : count;
 }
 
+static struct of_drconf_cell *dlpar_get_drconf_cell(struct device_node *dn,
+   unsigned long phys_addr)
+{
+   struct of_drconf_cell *drmem;
+   u32 entries;
+   u32 *prop;
+   int i;
+
+   prop = (u32 *)of_get_property(dn, "ibm,dynamic-memory", NULL);
+   of_node_put(dn);
+   if (!prop)
+   return NULL;
+
+   entries = *prop++;
+   drmem = (struct of_drconf_cell *)prop;
+
+   for (i = 0; i < entries; i++) {
+   if (drmem[i].base_addr == phys_addr)
+   return &drmem[i];
+   }
+
+   return NULL;
+}
+
+static int dlpar_mem_probe(unsigned long phys_addr)
+{
+   struct device_node *dn;
+   struct of_drconf_cell *drmem;
+   int rc;
+
+   dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (!dn)
+   return -EINVAL;
+
+   drmem = dlpar_get_drconf_cell(dn, phys_addr);
+   of_node_put(dn);
+
+   if (!drmem)
+   return -EINVAL;
+
+   if (drmem->flags & DRCONF_MEM_ASSIGNED)
+   return 0;
+
+   drmem->flags |= DRCONF_MEM_ASSIGNED;
+
+   rc = dlpar_acquire_drc(drmem->drc_index);
+   return rc;
+}
+
+static int dlpar_mem_release(unsigned long phys_addr)
+{
+   struct device_node *dn;
+   struct of_drconf_cell *drmem;
+   int rc;
+
+   dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (!dn)
+   return -EINVAL;
+
+   drmem = dlpar_get_drconf_cell(dn, phys_addr);
+   of_node_put(dn);
+
+   if (!drmem)
+   return -EINVAL;
+
+   if (!(drmem->flags & DRCONF_MEM_ASSIGNED))
+   return 0;
+
+   drmem->flags &= ~DRCONF_MEM_ASSIGNED;
+
+   rc = dlpar_release_drc(drmem->drc_index);
+   return rc;
+}
+
+static int pseries_dlpar_mem_callback(struct notifier_block *nb,
+ unsigned long action, void *hp_arg)
+{
+   struct memory_notify *arg = hp_arg;
+   unsigned long phys_addr = arg->start_pfn << PAGE_SHIFT;
+   int rc = 0;
+
+
+   switch (action) {
+   case MEM_BEING_HOT_ADDED:
+   rc = dlpar_mem_probe(phys_addr);
+   break;
+   case MEM_HOT_REMOVED:
+   rc = dlpar_mem_release(phys_addr);
+   break;
+   }
+
+   return notifier_from_errno(rc);
+}
+
 static int __init pseries_dlpar_init(void)
 {
+   struct device_node *dn;
+
ppc_md.cpu_probe = dlpar_cpu_probe;
ppc_md.cpu_release = dlpar_cpu_release;
 
+   dn = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+   if (dn) {
+   hotplug_memory_notifier(pseries_dlpar_mem_callback, 0);
+   of_node_put(dn);
+   }
+
return 0;
 }
 machine_device_initcall(pseries, pseries_dlpar_init);




[PATCH 6/8] Update the powerpc arch specific memory add/remove handlers

2013-07-24 Thread Nathan Fontenot
In order to properly hot add and remove memory for powerpc the arch
specific callouts need to now complete all of the required work to
fully add or remove the memory.

With this update we can also remove the handler for memory node add
because the powerpc arch specific memory add handler will do all the
work needed. We do still need the memory node remove handler because
on systems with memory specified in the memory@XXX nodes of the device
tree we have to use the removal of the node to trigger memory hot remove.

For systems on newer firmware with memory specified in the
ibm,dynamic-reconfiguration-memory node of the device tree this is not an
issue.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/mm/mem.c   |   33 +++---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   35 
 2 files changed, 29 insertions(+), 39 deletions(-)

Index: linux/arch/powerpc/mm/mem.c
===
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -120,17 +121,24 @@ int arch_add_memory(int nid, u64 start,
struct zone *zone;
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   u64 va_start;
+   int ret;
 
pgdata = NODE_DATA(nid);
 
-   start = (unsigned long)__va(start);
-   if (create_section_mapping(start, start + size))
+   va_start = (unsigned long)__va(start);
+   if (create_section_mapping(va_start, va_start + size))
return -EINVAL;
 
/* this should work for most non-highmem platforms */
zone = pgdata->node_zones;
 
-   return __add_pages(nid, zone, start_pfn, nr_pages);
+   ret = __add_pages(nid, zone, start_pfn, nr_pages);
+   if (ret)
+   return ret;
+
+   ret = memblock_add(start, size);
+   return ret;
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
@@ -138,10 +146,27 @@ int arch_remove_memory(u64 start, u64 si
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   unsigned long va_addr;
struct zone *zone;
+   int ret;
 
zone = page_zone(pfn_to_page(start_pfn));
-   return __remove_pages(zone, start_pfn, nr_pages);
+   ret = __remove_pages(zone, start_pfn, nr_pages);
+   if (ret)
+   return ret;
+
+   memblock_remove(start, size);
+
+   /* remove htab bolted mappings */
+   va_addr = (unsigned long)__va(start);
+   ret = remove_section_mapping(va_addr, va_addr + size);
+
+   /* Ensure all vmalloc mappings are flushed in case they also
+* hit that section of memory.
+*/
+   vm_unmap_aliases();
+
+   return ret;
 }
 #endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
Index: linux/arch/powerpc/platforms/pseries/hotplug-memory.c
===
--- linux.orig/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ linux/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -166,38 +166,6 @@ static inline int pseries_remove_memory(
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
-static int pseries_add_memory(struct device_node *np)
-{
-   const char *type;
-   const unsigned int *regs;
-   unsigned long base;
-   unsigned int lmb_size;
-   int ret = -EINVAL;
-
-   /*
-* Check to see if we are actually adding memory
-*/
-   type = of_get_property(np, "device_type", NULL);
-   if (type == NULL || strcmp(type, "memory") != 0)
-   return 0;
-
-   /*
-* Find the base and size of the memblock
-*/
-   regs = of_get_property(np, "reg", NULL);
-   if (!regs)
-   return ret;
-
-   base = *(unsigned long *)regs;
-   lmb_size = regs[3];
-
-   /*
-* Update memory region to represent the memory add
-*/
-   ret = memblock_add(base, lmb_size);
-   return (ret < 0) ? -EINVAL : 0;
-}
-
 static int pseries_update_drconf_memory(struct of_prop_reconfig *pr)
 {
struct of_drconf_cell *new_drmem, *old_drmem;
@@ -251,9 +219,6 @@ static int pseries_memory_notifier(struc
int err = 0;
 
switch (action) {
-   case OF_RECONFIG_ATTACH_NODE:
-   err = pseries_add_memory(node);
-   break;
case OF_RECONFIG_DETACH_NODE:
err = pseries_remove_memory(node);
break;



[PATCH 5/8] Add notifiers for memory hot add/remove

2013-07-24 Thread Nathan Fontenot
In order to allow architectures or other subsystems to do any needed
work prior to hot adding or hot removing memory the memory notifier
chain should be updated to provide notifications of these events.

This patch adds the notifications for memory hot add and hot remove.
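
As a rough illustration of the intended flow, the announce / confirm /
cancel pattern can be sketched in userspace as follows. This is not the
kernel notifier API: every name here (hp_chain, hp_notify(), hp_add(), the
HP_* events and the example listener and work functions) is hypothetical
and only modelled on the MEM_* actions added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes, modelled on the MEM_* actions in this patch. */
enum hp_event { HP_BEING_ADDED, HP_ADDED, HP_CANCEL_ADD };

/* A listener returns 0 on success or a negative value to veto the event. */
typedef int (*hp_listener)(enum hp_event ev, void *ctx);

struct hp_chain {
	hp_listener fn[4];
	void *ctx[4];
	size_t n;
};

/* Call every listener in order; stop at and return the first error. */
static int hp_notify(const struct hp_chain *c, enum hp_event ev)
{
	size_t i;

	assert(c != NULL);

	for (i = 0; i < c->n; i++) {
		int rc = c->fn[i](ev, c->ctx[i]);

		if (rc)
			return rc;
	}
	return 0;
}

/*
 * Hot-add flow: announce the add, do the work, then confirm it.
 * Any failure is followed by a cancel event so listeners can undo
 * whatever they did when the add was announced.
 */
static int hp_add(const struct hp_chain *c, int (*do_add)(void))
{
	int rc = hp_notify(c, HP_BEING_ADDED);

	if (!rc)
		rc = do_add();
	if (rc) {
		hp_notify(c, HP_CANCEL_ADD);
		return rc;
	}
	hp_notify(c, HP_ADDED);
	return 0;
}

/* Example listener: records each event it sees in the log passed as ctx. */
struct hp_log {
	int ev[8];
	size_t n;
};

static int hp_record(enum hp_event ev, void *ctx)
{
	struct hp_log *log = ctx;

	if (log->n < 8)
		log->ev[log->n++] = ev;
	return 0;
}

/* Example work functions standing in for the real add operation. */
static int hp_work_ok(void)   { return 0; }
static int hp_work_fail(void) { return -1; }
```

A successful add is seen by the listener as HP_BEING_ADDED followed by
HP_ADDED; a failed add is seen as HP_BEING_ADDED followed by
HP_CANCEL_ADD.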

Signed-off-by: Nathan Fontenot 
---
 Documentation/memory-hotplug.txt |   26 +++---
 include/linux/memory.h   |6 ++
 mm/memory_hotplug.c  |   25 ++---
 3 files changed, 51 insertions(+), 6 deletions(-)

Index: linux/include/linux/memory.h
===
--- linux.orig/include/linux/memory.h
+++ linux/include/linux/memory.h
@@ -50,6 +50,12 @@ int arch_get_memory_phys_device(unsigned
 #define MEM_GOING_ONLINE       (1<<3)
 #define MEM_CANCEL_ONLINE      (1<<4)
 #define MEM_CANCEL_OFFLINE     (1<<5)
+#define MEM_BEING_HOT_REMOVED  (1<<6)
+#define MEM_HOT_REMOVED        (1<<7)
+#define MEM_CANCEL_HOT_REMOVE  (1<<8)
+#define MEM_BEING_HOT_ADDED    (1<<9)
+#define MEM_HOT_ADDED          (1<<10)
+#define MEM_CANCEL_HOT_ADD     (1<<11)

 struct memory_notify {
unsigned long start_pfn;
Index: linux/mm/memory_hotplug.c
===
--- linux.orig/mm/memory_hotplug.c
+++ linux/mm/memory_hotplug.c
@@ -1073,17 +1073,25 @@ out:
 int __ref add_memory(int nid, u64 start, u64 size)
 {
pg_data_t *pgdat = NULL;
-   bool new_pgdat;
+   bool new_pgdat = false;
bool new_node;
-   struct resource *res;
+   struct resource *res = NULL;
+   struct memory_notify arg;
int ret;

lock_memory_hotplug();

+   arg.start_pfn = start >> PAGE_SHIFT;
+   arg.nr_pages = size / PAGE_SIZE;
+   ret = memory_notify(MEM_BEING_HOT_ADDED, &arg);
+   ret = notifier_to_errno(ret);
+   if (ret)
+   goto error;
+
res = register_memory_resource(start, size);
ret = -EEXIST;
if (!res)
-   goto out;
+   goto error;

{   /* Stupid hack to suppress address-never-null warning */
void *p = NODE_DATA(nid);
@@ -1119,9 +1127,12 @@ int __ref add_memory(int nid, u64 start,
/* create new memmap entry */
firmware_map_add_hotplug(start, start + size, "System RAM");

+   memory_notify(MEM_HOT_ADDED, &arg);
goto out;

 error:
+   memory_notify(MEM_CANCEL_HOT_ADD, &arg);
+
/* rollback pgdat allocation and others */
if (new_pgdat)
rollback_node_hotadd(nid, pgdat);
@@ -1784,10 +1795,15 @@ EXPORT_SYMBOL(try_offline_node);

 void __ref remove_memory(int nid, u64 start, u64 size)
 {
+   struct memory_notify arg;
int ret;

lock_memory_hotplug();

+   arg.start_pfn = start >> PAGE_SHIFT;
+   arg.nr_pages = size / PAGE_SIZE;
+   memory_notify(MEM_BEING_HOT_REMOVED, &arg);
+
/*
 * All memory blocks must be offlined before removing memory.  Check
 * whether all memory blocks in question are offline and trigger a BUG()
@@ -1796,6 +1812,7 @@ void __ref remove_memory(int nid, u64 st
ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
is_memblock_offlined_cb);
if (ret) {
+   memory_notify(MEM_CANCEL_HOT_REMOVE, &arg);
unlock_memory_hotplug();
BUG();
}
@@ -1807,6 +1824,8 @@ void __ref remove_memory(int nid, u64 st

try_offline_node(nid);

+   memory_notify(MEM_HOT_REMOVED, &arg);
+
unlock_memory_hotplug();
 }
 EXPORT_SYMBOL_GPL(remove_memory);
Index: linux/Documentation/memory-hotplug.txt
===
--- linux.orig/Documentation/memory-hotplug.txt
+++ linux/Documentation/memory-hotplug.txt
@@ -371,7 +371,9 @@ Need more implementation yet
 
 8. Memory hotplug event notifier
 
-Memory hotplug has event notifier. There are 6 types of notification.
+Memory hotplug has event notifier. There are 12 types of notification: the
+first six relate to memory online/offline and the second six relate to memory
+hot add/remove.

 MEMORY_GOING_ONLINE
   Generated before new memory becomes available in order to be able to
@@ -398,6 +400,24 @@ MEMORY_CANCEL_OFFLINE
 MEMORY_OFFLINE
   Generated after offlining memory is complete.

+MEMORY_BEING_HOT_REMOVED
+  Generated prior to the process of hot removing memory.
+
+MEMORY_CANCEL_HOT_REMOVE
+  Generated if MEMORY_BEING_HOT_REMOVED fails.
+
+MEMORY_HOT_REMOVED
+  Generated when memory has been successfully hot removed.
+
+MEMORY_BEING_HOT_ADDED
+  Generated prior to the process of hot adding memory.
+
+MEMORY_CANCEL_HOT_ADD
+  Generated if MEMORY_BEING_HOT_ADDED fails.
+
+MEMORY_HOT_ADDED
+  Generated when memory has successfully been hot added.
+
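
For readers following the new events, here is a minimal userspace sketch of
how a notifier callback might dispatch on them. The MEM_* values are copied
from the include/linux/memory.h hunk above; struct memory_notify_sketch,
example_mem_callback() and the page counters are hypothetical names made up
for illustration and are not part of the patch. In the kernel the callback
would be hooked up through the memory notifier chain and would return
NOTIFY_* codes rather than a plain errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event values as proposed in the include/linux/memory.h hunk. */
#define MEM_BEING_HOT_REMOVED  (1 << 6)
#define MEM_HOT_REMOVED        (1 << 7)
#define MEM_CANCEL_HOT_REMOVE  (1 << 8)
#define MEM_BEING_HOT_ADDED    (1 << 9)
#define MEM_HOT_ADDED          (1 << 10)
#define MEM_CANCEL_HOT_ADD     (1 << 11)

/* Hypothetical stand-in for struct memory_notify (two fields only). */
struct memory_notify_sketch {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical subsystem state. */
static unsigned long pending_pages;  /* announced but not yet added */
static unsigned long tracked_pages;  /* pages this subsystem knows about */

/* Hypothetical callback: returns 0 on success or a negative errno to veto. */
int example_mem_callback(unsigned long action,
			 const struct memory_notify_sketch *arg)
{
	assert(arg != NULL);

	switch (action) {
	case MEM_BEING_HOT_ADDED:
		if (arg->nr_pages == 0)
			return -EINVAL;         /* veto an empty range */
		pending_pages = arg->nr_pages;
		return 0;
	case MEM_HOT_ADDED:
		tracked_pages += pending_pages;
		pending_pages = 0;
		return 0;
	case MEM_CANCEL_HOT_ADD:
		/* Sent on the error path, including after a veto. */
		pending_pages = 0;
		return 0;
	case MEM_HOT_REMOVED:
		if (arg->nr_pages > tracked_pages)
			tracked_pages = 0;
		else
			tracked_pages -= arg->nr_pages;
		return 0;
	case MEM_BEING_HOT_REMOVED:
	case MEM_CANCEL_HOT_REMOVE:
	default:
		return 0;                       /* nothing to prepare or undo */
	}
}

/* Accessor so callers can inspect the sketch's bookkeeping. */
unsigned long example_tracked_pages(void)
{
	return tracked_pages;
}
```

Note that in add_memory() as patched, MEM_CANCEL_HOT_ADD is also sent when
the MEM_BEING_HOT_ADDED notification itself fails, so a callback along these
lines has to tolerate a cancel for work it never started.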
 

[PATCH 4/8] Create a sysfs release file for hot removing memory

2013-07-24 Thread Nathan Fontenot
Provide a sysfs interface to hot remove memory.

This patch updates the sysfs interface for hot add of memory to also
provide a sysfs interface to hot remove memory. The use of this interface
is controlled with the ARCH_MEMORY_PROBE config option, currently used
by x86 and powerpc. This patch also updates the name of this option to
CONFIG_ARCH_MEMORY_PROBE_RELEASE to indicate that it controls the probe
and release sysfs interfaces.

Signed-off-by: Nathan Fontenot 
---
 Documentation/memory-hotplug.txt |   34 
 arch/powerpc/Kconfig |2 
 arch/x86/Kconfig |2 
 drivers/base/memory.c|   81 ++-
 4 files changed, 100 insertions(+), 19 deletions(-)

Index: linux/drivers/base/memory.c
===
--- linux.orig/drivers/base/memory.c
+++ linux/drivers/base/memory.c
@@ -129,22 +129,30 @@ static ssize_t show_mem_end_phys_index(s
return sprintf(buf, "%08lx\n", phys_index);
 }
 
+static int is_memblock_removable(unsigned long start_section_nr)
+{
+   unsigned long pfn;
+   int i, ret = 1;
+
+   for (i = 0; i < sections_per_block; i++) {
+   pfn = section_nr_to_pfn(start_section_nr + i);
+   ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
+   }
+
+   return ret;
+}
+
 /*
  * Show whether the section of memory is likely to be hot-removable
  */
 static ssize_t show_mem_removable(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   unsigned long i, pfn;
-   int ret = 1;
+   int ret;
struct memory_block *mem =
container_of(dev, struct memory_block, dev);
 
-   for (i = 0; i < sections_per_block; i++) {
-   pfn = section_nr_to_pfn(mem->start_section_nr + i);
-   ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
-   }
-
+   ret = is_memblock_removable(mem->start_section_nr);
return sprintf(buf, "%d\n", ret);
 }
 
@@ -421,7 +429,7 @@ static DEVICE_ATTR(block_size_bytes, 044
  * as well as ppc64 will do all of their discovery in userspace
  * and will require this interface.
  */
-#ifdef CONFIG_ARCH_MEMORY_PROBE
+#ifdef CONFIG_ARCH_MEMORY_PROBE_RELEASE
 static ssize_t
 memory_probe_store(struct device *dev, struct device_attribute *attr,
   const char *buf, size_t count)
@@ -444,6 +452,60 @@ memory_probe_store(struct device *dev, s
 }
 
 static DEVICE_ATTR(probe, S_IWUSR, NULL, memory_probe_store);
+
+static int is_memblock_offline(struct memory_block *mem, void *arg)
+{
+   if (mem->state == MEM_ONLINE)
+   return 1;
+
+   return 0;
+}
+
+static ssize_t
+memory_release_store(struct device *dev, struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   u64 phys_addr;
+   int nid, ret = 0;
+   unsigned long block_size, pfn;
+   unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
+
+   lock_device_hotplug();
+
+   ret = kstrtoull(buf, 0, &phys_addr);
+   if (ret)
+   goto out;
+
+   if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1)) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   block_size = get_memory_block_size();
+   nid = memory_add_physaddr_to_nid(phys_addr);
+
+   /* Ensure memory is offline and removable before removing it. */
+   ret = walk_memory_range(PFN_DOWN(phys_addr),
+   PFN_UP(phys_addr + block_size - 1), NULL,
+   is_memblock_offline);
+   if (!ret) {
+   pfn = phys_addr >> PAGE_SHIFT;
+   ret = !is_memblock_removable(pfn_to_section_nr(pfn));
+   }
+
+   if (ret) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   remove_memory(nid, phys_addr, block_size);
+
+out:
+   unlock_device_hotplug();
+   return ret ? ret : count;
+}
+
+static DEVICE_ATTR(release, S_IWUSR, NULL, memory_release_store);
 #endif
 
 #ifdef CONFIG_MEMORY_FAILURE
@@ -694,8 +756,9 @@ bool is_memblock_offlined(struct memory_
 }
 
 static struct attribute *memory_root_attrs[] = {
-#ifdef CONFIG_ARCH_MEMORY_PROBE
+#ifdef CONFIG_ARCH_MEMORY_PROBE_RELEASE
&dev_attr_probe.attr,
+   &dev_attr_release.attr,
 #endif
 
 #ifdef CONFIG_MEMORY_FAILURE
Index: linux/arch/powerpc/Kconfig
===
--- linux.orig/arch/powerpc/Kconfig
+++ linux/arch/powerpc/Kconfig
@@ -438,7 +438,7 @@ config SYS_SUPPORTS_HUGETLBFS
 
 source "mm/Kconfig"
 
-config ARCH_MEMORY_PROBE
+config ARCH_MEMORY_PROBE_RELEASE
def_bool y
depends on MEMORY_HOTPLUG
 
Index: linux/arch/x86/Kconfig
===
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -1343,7 +1343,7 @@ config ARCH_SELECT_MEMORY_MOD
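
The alignment test in memory_release_store() above,
phys_addr & ((pages_per_block << PAGE_SHIFT) - 1), rejects any address that
does not sit on a memory block boundary. A small userspace sketch of the same
mask arithmetic follows; is_block_aligned() is a hypothetical helper name and
the 256 MiB block size used in the usage note is only an assumed example
value, since the real size comes from get_memory_block_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if phys_addr lies on a block_size boundary.
 * The mask trick is only valid for power-of-two block sizes.
 */
bool is_block_aligned(uint64_t phys_addr, uint64_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
	return (phys_addr & (block_size - 1)) == 0;
}
```

With an assumed 256 MiB block (0x10000000), address 0x10000000 passes while
0x10001000 would be rejected with -EINVAL by the check in the patch.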

[PATCH 3/8] Add all memory via sysfs probe interface at once

2013-07-24 Thread Nathan Fontenot
When doing memory hot add via the 'probe' interface in sysfs we do not
need to loop through and add memory one section at a time. I think this
was originally done for powerpc, but is not needed. This patch removes
the loop and just calls add_memory for all of the memory to be added.

Signed-off-by: Nathan Fontenot 
---
 drivers/base/memory.c |   20 ++--
 1 file changed, 6 insertions(+), 14 deletions(-)

Index: linux/drivers/base/memory.c
===
--- linux.orig/drivers/base/memory.c
+++ linux/drivers/base/memory.c
@@ -427,8 +427,8 @@ memory_probe_store(struct device *dev, s
   const char *buf, size_t count)
 {
u64 phys_addr;
-   int nid;
-   int i, ret;
+   int nid, ret;
+   unsigned long block_size;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
 
phys_addr = simple_strtoull(buf, NULL, 0);
@@ -436,19 +436,11 @@ memory_probe_store(struct device *dev, s
if (phys_addr & ((pages_per_block << PAGE_SHIFT) - 1))
return -EINVAL;
 
-   for (i = 0; i < sections_per_block; i++) {
-   nid = memory_add_physaddr_to_nid(phys_addr);
-   ret = add_memory(nid, phys_addr,
-PAGES_PER_SECTION << PAGE_SHIFT);
-   if (ret)
-   goto out;
+   block_size = get_memory_block_size();
+   nid = memory_add_physaddr_to_nid(phys_addr);
+   ret = add_memory(nid, phys_addr, block_size);
 
-   phys_addr += MIN_MEMORY_BLOCK_SIZE;
-   }
-
-   ret = count;
-out:
-   return ret;
+   return ret ? ret : count;
 }
 
 static DEVICE_ATTR(probe, S_IWUSR, NULL, memory_probe_store);




[PATCH 2/8] Mark powerpc memory resources as busy

2013-07-24 Thread Nathan Fontenot
Memory I/O resources need to be marked as busy or else we cannot remove
them when doing memory hot remove.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/mm/mem.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/powerpc/mm/mem.c
===
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -523,7 +523,7 @@ static int add_system_ram_resources(void
res->name = "System RAM";
res->start = base;
res->end = base + size - 1;
-   res->flags = IORESOURCE_MEM;
+   res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
WARN_ON(request_resource(&iomem_resource, res) < 0);
}
}
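
To illustrate why the BUSY bit matters, here is a rough userspace model. It
assumes (my reading, not something stated in the patch) that the resource
release path skips entries that are not marked busy and only frees an exact
match. struct sketch_resource, find_releasable() and the SKETCH_* flag values
are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values standing in for the kernel's IORESOURCE_* bits. */
#define SKETCH_IORESOURCE_MEM   0x00000200u
#define SKETCH_IORESOURCE_BUSY  0x80000000u

/* Hypothetical, flattened stand-in for struct resource. */
struct sketch_resource {
	uint64_t start;
	uint64_t end;           /* inclusive, like res->end = base + size - 1 */
	unsigned int flags;
};

/*
 * Return the index of the entry that releasing [start, start + size - 1]
 * would free, or -1 if no entry qualifies. Entries without the BUSY bit
 * are skipped; that is the behaviour this sketch assumes.
 */
int find_releasable(const struct sketch_resource *tbl, size_t n,
		    uint64_t start, uint64_t size)
{
	uint64_t end;
	size_t i;

	assert(tbl != NULL && size > 0);
	end = start + size - 1;

	for (i = 0; i < n; i++) {
		if (!(tbl[i].flags & SKETCH_IORESOURCE_BUSY))
			continue;
		if (tbl[i].start == start && tbl[i].end == end)
			return (int)i;
	}
	return -1;
}
```

Under that assumption, a "System RAM" entry registered with only the MEM flag
is never found on removal, which matches the symptom described in the
changelog.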



[PATCH 1/8] register bootmem pages for powerpc when sparse vmemmap is not defined

2013-07-24 Thread Nathan Fontenot
Previous commit 46723bfa540... introduced a new config option
HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for powerpc
when sparse vmemmap is not defined.

This patch defines HAVE_BOOTMEM_INFO_NODE for powerpc and adds the call to
register_page_bootmem_info_node. Without this patch we get a BUG_ON for memory
hot remove in put_page_bootmem().

This also adds a stub for register_page_bootmem_memmap to allow powerpc to
build with sparse vmemmap defined.

Signed-off-by: Nathan Fontenot 
---

---
 arch/powerpc/mm/init_64.c |6 ++
 arch/powerpc/mm/mem.c |9 +
 mm/Kconfig|2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

Index: linux/arch/powerpc/mm/init_64.c
===
--- linux.orig/arch/powerpc/mm/init_64.c
+++ linux/arch/powerpc/mm/init_64.c
@@ -300,5 +300,11 @@ void vmemmap_free(unsigned long start, u
 {
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   WARN_ONCE(1, KERN_INFO
+ "Sparse Vmemmap not fully supported for bootmem info nodes\n");
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
Index: linux/arch/powerpc/mm/mem.c
===
--- linux.orig/arch/powerpc/mm/mem.c
+++ linux/arch/powerpc/mm/mem.c
@@ -297,12 +297,21 @@ void __init paging_init(void)
 }
 #endif /* ! CONFIG_NEED_MULTIPLE_NODES */
 
+static void __init register_page_bootmem_info(void)
+{
+   int i;
+
+   for_each_online_node(i)
+   register_page_bootmem_info_node(NODE_DATA(i));
+}
+
 void __init mem_init(void)
 {
 #ifdef CONFIG_SWIOTLB
swiotlb_init(0);
 #endif
 
+   register_page_bootmem_info();
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
set_max_mapnr(max_pfn);
free_all_bootmem();
Index: linux/mm/Kconfig
===
--- linux.orig/mm/Kconfig
+++ linux/mm/Kconfig
@@ -183,7 +183,7 @@ config MEMORY_HOTPLUG_SPARSE
 config MEMORY_HOTREMOVE
bool "Allow for memory hot remove"
select MEMORY_ISOLATION
-   select HAVE_BOOTMEM_INFO_NODE if X86_64
+   select HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
depends on MIGRATION
 




[PATCH 0/8] Correct memory hot add/remove for powerpc

2013-07-24 Thread Nathan Fontenot
The current implementation of memory hot add and remove for powerpc is broken.
This patch set both corrects this issue and updates the memory hot add and
remove code for powerpc so that it can be done properly in the kernel.

The first two patches update the powerpc hot add and remove code to work with
all of the updates that have gone in to enable memory remove with sparse
vmemmap enabled. With these two patches applied the powerpc code is back to
working, but not working properly.

The remaining patches update the powerpc memory add and remove code so the
work can be done in the kernel and all while holding the memory hotplug lock.
The current powerpc implementation does some of the work in the kernel and
some of the work in userspace. While this code did work at one time, it has
a problem in that it does part of the work to add and remove memory without
holding the memory hotplug lock. In this scheme memory could be added and
removed fast enough to cause the system to crash. This was a result of
doing part of the add or remove without holding the lock.

In order to do memory hot remove in the kernel, this patch set introduces
a sysfs release file (/sys/devices/system/memory/release) to which one
can write the physical address of the memory to be removed. Additionally
there is a new set of flags defined for the memory notification chain to
indicate that memory is being hot added or hot removed. This allows any work
that may need to be done prior to or after memory is hot added or removed
to be performed.

The remaining patches in the patch set update the powerpc code to properly do
memory hot add and remove in the kernel.

Nathan Fontenot
---
 Documentation/memory-hotplug.txt  |   26 
 arch/powerpc/mm/mem.c |   35 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c   |   95 +---
 drivers/base/memory.c |   81 --
 linux/Documentation/memory-hotplug.txt|   34 -
 linux/arch/powerpc/Kconfig|2 
 linux/arch/powerpc/mm/init_64.c   |6 +
 linux/arch/powerpc/mm/mem.c   |9 +
 linux/arch/powerpc/platforms/pseries/dlpar.c  |  103 ++
 linux/arch/powerpc/platforms/pseries/hotplug-memory.c |   60 +-
 linux/arch/x86/Kconfig|2 
 linux/drivers/base/memory.c   |   20 +--
 linux/include/linux/memory.h  |6 +
 linux/mm/Kconfig  |2 
 linux/mm/memory_hotplug.c |   25 +++-
 15 files changed, 322 insertions(+), 184 deletions(-)
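
As a rough idea of how userspace might drive the proposed release file, here
is a small sketch. write_phys_addr() is a hypothetical helper name; the path
is a parameter so the function can be exercised against an ordinary file, and
the sysfs release file itself would only exist on a kernel carrying this
patch set. The "0x..." hex format is chosen because the store routine in
patch 4/8 parses its input with kstrtoull(buf, 0, ...).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write phys_addr as a hex string followed by a newline to the file at
 * path. Returns 0 on success and -1 on any failure.
 */
int write_phys_addr(const char *path, uint64_t phys_addr)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "0x%" PRIx64 "\n", phys_addr) > 0;
	/* A sysfs store error would typically surface at flush/close time. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a patched kernel the call would presumably look like
write_phys_addr("/sys/devices/system/memory/release", addr), with addr
aligned to the memory block size and the block already offlined.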



Re: [PATCH 04/11] PCI/hotplug: Needn't remove EEH cache again

2013-07-24 Thread Bjorn Helgaas
[+cc linux-pci]

On Tue, Jul 23, 2013 at 8:24 PM, Gavin Shan  wrote:
> Since pcibios_release_device() called by pci_stop_and_remove_bus_device()
> has removed the EEH cache, we needn't do that again.
>
> Cc: Bjorn Helgaas 
> Acked-by: Bjorn Helgaas 
> Signed-off-by: Gavin Shan 

I'll be happy to merge this if you want, or since you have my Ack
already, you can merge it with the rest of the series.  I didn't get
the rest of the series, so I don't know if it depends on this.

Just let me know what you want me to do.

> ---
>  drivers/pci/hotplug/rpadlpar_core.c |1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> index b29e20b..bb7af78 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -388,7 +388,6 @@ int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
> /* Remove the EADS bridge device itself */
> BUG_ON(!bus->self);
> pr_debug("PCI: Now removing bridge device %s\n", pci_name(bus->self));
> -   eeh_remove_bus_device(bus->self, true);
> pci_stop_and_remove_bus_device(bus->self);
>
> return 0;
> --
> 1.7.5.4
>


Re: [PATCH] Update compilation flags with core specific options

2013-07-24 Thread Scott Wood

On 07/24/2013 11:25:52 AM, Udma Catalin-Dan-B32721 wrote:
> > This breaks the vdso for e500v1/v2 (userspace dies with SIGILL), since
> > KBUILD_CFLAGS doesn't get used when building asm files, and the vdso
> > uses mftbu/mftbl which are not being assembled to the form that
> > e500v1/v2 support.
> >
> > We should be setting -mcpu=whatever and -msoft-float in both CFLAGS and
> > AFLAGS, since we don't call "as" directly, and target selection should
> > not differ based on whether we're building a C file or an asm file.
> >
> > -Scott
> [CU] Thank you, Scott. I'll update also AFLAGS.
>
> I have some questions about how you reproduce the SIGILL issue.
> I tried to reproduce the issue looking to gettimeofday.S, that uses
> mftbu/mftbl. In my tests, I obtained the same output when compiling this file
> for p1021rdb before and after this patch, and also after adding -mcpu to
> AFLAGS: "objdump -d arch/powerpc/kernel/vdso32/gettimeofday.o" looks the same
> for the cases mentioned above:
>    "mftbu r3" from .S file is decoded to "mfspr   r3,269" in "objdump -d" output
>
> Indeed, for -mcpu=601/power3, the "objdump -d" output is "mftbu r3" and
> according to powerISA this instruction would cause Illegal Instruction error
> handler to be invoked and permits the sw to emulate the instruction.


It probably depends on what the default is for your toolchain.

-Scott


RE: [PATCH] Update compilation flags with core specific options

2013-07-24 Thread Udma Catalin-Dan-B32721
> This breaks the vdso for e500v1/v2 (userspace dies with SIGILL), since
> KBUILD_CFLAGS doesn't get used when building asm files, and the vdso
> uses mftbu/mftbl which are not being assembled to the form that
> e500v1/v2 support.
> 
> We should be setting -mcpu=whatever and -msoft-float in both CFLAGS and
> AFLAGS, since we don't call "as" directly, and target selection should
> not differ based on whether we're building a C file or an asm file.
> 
> -Scott
[CU] Thank you, Scott. I'll update also AFLAGS.

I have some questions about how you reproduce the SIGILL issue.
I tried to reproduce the issue looking to gettimeofday.S, that uses
mftbu/mftbl. In my tests, I obtained the same output when compiling this file
for p1021rdb before and after this patch, and also after adding -mcpu to
AFLAGS: "objdump -d arch/powerpc/kernel/vdso32/gettimeofday.o" looks the same
for the cases mentioned above:
   "mftbu r3" from .S file is decoded to "mfspr   r3,269" in "objdump -d" output

Indeed, for -mcpu=601/power3, the "objdump -d" output is "mftbu r3" and
according to powerISA this instruction would cause Illegal Instruction error
handler to be invoked and permits the sw to emulate the instruction.

Regards,
Catalin





Re: Inbound PCI and Memory Corruption

2013-07-24 Thread Peter LaDow
On Tue, Jul 23, 2013 at 9:27 PM, Benjamin Herrenschmidt
 wrote:
> CONFIG_NOT_COHERENT_CACHE will do it for you (in
> arch/powerpc/kernel/dma.c) provided the driver does the right things vs.
> the DMA accessors but afaik e1000 does.

Well, when I went to make the changes I noted a few things.  First,
the e1000 driver does a dma_unmap_single() prior to processing the
descriptor.  So it would seem that the dma_sync_single_for_cpu() isn't
necessary in that case.  And when allocating descriptors, it does
dma_map_single() after setting up the descriptor, so
dma_sync_single_for_device() probably isn't necessary either.

But regardless, I put in the dma_sync_single_* calls and we still get
the same behavior.  So, even with CONFIG_NOT_COHERENT_CACHE we are
getting this error.

> If that helps, that might hint at either a missing barrier or some kind
> of HW (or HW configuration) bug with cache coherency.

And unfortunately it didn't help.  We have a few other things we are
trying, but I'm not hopeful that any will change the behavior.

A bit of history that may help.  We were using an e100 (an 82559)
part, but Intel EOL'd that part so we picked up the 82540EP (which
they have also recently EOL'd).  The e100 driver uses a different DMA
model.  It uses pci_map_single/pci_unmap_single along with
pci_dma_sync_single_for* calls (as well as other PCI calls).  The
e1000 driver, however, does not use the pci_* calls.  We have never
had a problem with the e100 parts.  I don't suppose the use of
pci_map_* vs dma_map_* makes a difference, does it?

Thanks,
Pete


Re: [PATCH v2] powerpc: VPHN topology change updates all siblings

2013-07-24 Thread Greg KH
On Wed, Jul 24, 2013 at 10:00:05AM -0500, Robert Jennings wrote:
> When an associativity level change is found for one thread, the
> siblings threads need to be updated as well.  This is done today
> for PRRN in stage_topology_update() but is missing for VPHN in
> update_cpu_associativity_changes_mask().
> 
> All threads should be updated to move to the new node.  Without this
> patch, a single thread may be flagged for a topology change, leaving it
> in a different node from its siblings, which is incorrect.  This causes
> problems for the scheduler where overlapping scheduler groups are created
> and a loop is formed in those groups.
> 
> Signed-off-by: Robert Jennings 
> ---
> cpu_sibling_mask is now defined for UP which fixes that build break.
> ---
>  arch/powerpc/include/asm/smp.h |  4 +++
>  arch/powerpc/mm/numa.c | 59 +++---
>  2 files changed, 48 insertions(+), 15 deletions(-)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.




[PATCH v2] powerpc: VPHN topology change updates all siblings

2013-07-24 Thread Robert Jennings
When an associativity level change is found for one thread, the
siblings threads need to be updated as well.  This is done today
for PRRN in stage_topology_update() but is missing for VPHN in
update_cpu_associativity_changes_mask().

All threads should be updated to move to the new node.  Without this
patch, a single thread may be flagged for a topology change, leaving it
in a different node from its siblings, which is incorrect.  This causes
problems for the scheduler where overlapping scheduler groups are created
and a loop is formed in those groups.
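
The sibling handling described above can be modelled in userspace roughly as
follows. This is only a sketch: it uses a 64-bit word in place of a cpumask,
assumes threads of one core are numbered contiguously with a power-of-two
threads_per_core, and sibling_mask() and flag_changes() are hypothetical
names rather than the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask of all hardware threads sharing a core with 'cpu'. */
uint64_t sibling_mask(unsigned int cpu, unsigned int threads_per_core)
{
	unsigned int first;
	uint64_t bits;

	assert(cpu < 64);
	assert(threads_per_core >= 1 && threads_per_core <= 64);
	assert((threads_per_core & (threads_per_core - 1)) == 0);

	/* First thread of the core: clear the low thread-index bits. */
	first = cpu & ~(threads_per_core - 1);
	bits = (threads_per_core == 64)
		? UINT64_MAX
		: ((UINT64_C(1) << threads_per_core) - 1);
	return bits << first;
}

/*
 * For every CPU whose associativity changed, flag the whole core and
 * skip ahead to the core's last thread, mirroring the cpumask_or() plus
 * cpu_last_thread_sibling() step in the patch.
 */
uint64_t flag_changes(uint64_t changed_cpus, unsigned int threads_per_core)
{
	uint64_t changes = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(changed_cpus & (UINT64_C(1) << cpu)))
			continue;
		changes |= sibling_mask(cpu, threads_per_core);
		cpu |= threads_per_core - 1;   /* jump to last sibling */
	}
	return changes;
}
```

For example, with four threads per core a change seen on CPU 5 flags CPUs 4
through 7, so the whole core moves to the new node together.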

Signed-off-by: Robert Jennings 
---
cpu_sibling_mask is now defined for UP which fixes that build break.
---
 arch/powerpc/include/asm/smp.h |  4 +++
 arch/powerpc/mm/numa.c | 59 +++---
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ffbaabe..48cfc85 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -145,6 +145,10 @@ extern void __cpu_die(unsigned int cpu);
 #define smp_setup_cpu_maps()
 static inline void inhibit_secondary_onlining(void) {}
 static inline void uninhibit_secondary_onlining(void) {}
+static inline const struct cpumask *cpu_sibling_mask(int cpu)
+{
+   return cpumask_of(cpu);
+}
 
 #endif /* CONFIG_SMP */
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0839721..5850798 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1318,7 +1319,8 @@ static int update_cpu_associativity_changes_mask(void)
}
}
if (changed) {
-   cpumask_set_cpu(cpu, changes);
+   cpumask_or(changes, changes, cpu_sibling_mask(cpu));
+   cpu = cpu_last_thread_sibling(cpu);
}
}
 
@@ -1426,7 +1428,7 @@ static int update_cpu_topology(void *data)
if (!data)
return -EINVAL;
 
-   cpu = get_cpu();
+   cpu = smp_processor_id();
 
for (update = data; update; update = update->next) {
if (cpu != update->cpu)
@@ -1446,12 +1448,12 @@ static int update_cpu_topology(void *data)
  */
 int arch_update_cpu_topology(void)
 {
-   unsigned int cpu, changed = 0;
+   unsigned int cpu, sibling, changed = 0;
struct topology_update_data *updates, *ud;
unsigned int associativity[VPHN_ASSOC_BUFSIZE] = {0};
cpumask_t updated_cpus;
struct device *dev;
-   int weight, i = 0;
+   int weight, new_nid, i = 0;
 
weight = cpumask_weight(&cpu_associativity_changes_mask);
if (!weight)
@@ -1464,19 +1466,46 @@ int arch_update_cpu_topology(void)
cpumask_clear(&updated_cpus);
 
for_each_cpu(cpu, &cpu_associativity_changes_mask) {
-   ud = &updates[i++];
-   ud->cpu = cpu;
-   vphn_get_associativity(cpu, associativity);
-   ud->new_nid = associativity_to_nid(associativity);
-
-   if (ud->new_nid < 0 || !node_online(ud->new_nid))
-   ud->new_nid = first_online_node;
+   /*
+* If siblings aren't flagged for changes, updates list
+* will be too short. Skip on this update and set for next
+* update.
+*/
+   if (!cpumask_subset(cpu_sibling_mask(cpu),
+   &cpu_associativity_changes_mask)) {
+   pr_info("Sibling bits not set for associativity "
+   "change, cpu%d\n", cpu);
+   cpumask_or(&cpu_associativity_changes_mask,
+   &cpu_associativity_changes_mask,
+   cpu_sibling_mask(cpu));
+   cpu = cpu_last_thread_sibling(cpu);
+   continue;
+   }
 
-   ud->old_nid = numa_cpu_lookup_table[cpu];
-   cpumask_set_cpu(cpu, &updated_cpus);
+   /* Use associativity from first thread for all siblings */
+   vphn_get_associativity(cpu, associativity);
+   new_nid = associativity_to_nid(associativity);
+   if (new_nid < 0 || !node_online(new_nid))
+   new_nid = first_online_node;
+
+   if (new_nid == numa_cpu_lookup_table[cpu]) {
+   cpumask_andnot(&cpu_associativity_changes_mask,
+   &cpu_associativity_changes_mask,
+   cpu_sibling_mask(cpu));
+   cpu = cpu_last_thread_sibling(cpu);
+   continue;
+   }
 
-   if (i < weight)
-   ud->next = &updates[i];
+   for_each_cpu(sibling, cpu_sibli

Re: [PATCH] powerpc: VPHN topology change updates all siblings

2013-07-24 Thread Robert Jennings
* Benjamin Herrenschmidt (b...@kernel.crashing.org) wrote:
> On Tue, 2013-07-23 at 07:33 -0500, Robert Jennings wrote:
> > When an associativity level change is found for one thread, the
> > siblings threads need to be updated as well.  This is done today
> > for PRRN in stage_topology_update() but is missing for VPHN in
> > update_cpu_associativity_changes_mask().
> > 
> > All threads should be updated to move to the new node.  Without this
> > patch, a single thread may be flagged for a topology change, leaving it
> > in a different node from its siblings, which is incorrect.  This causes
> > problems for the scheduler where overlapping scheduler groups are created
> > and a loop is formed in those groups.
> > 
> > Reported-by: Jan Stancek 
> > Signed-off-by: Robert Jennings 
> > Cc: 
> 
> Patch breaks the UP build ...
> 
> Cheers,
> Ben.

Very sorry about that, a fix is on the way.



Re: [PATCH 2/3] cpufreq: pmac64: re-estimate G5 cpufreq transition latency

2013-07-24 Thread Viresh Kumar
On 24 July 2013 15:48, Aaro Koskinen  wrote:
> Hi,
>
> On Wed, Jul 24, 2013 at 11:04:50AM +0530, Viresh Kumar wrote:
>> On 24 July 2013 01:54, Aaro Koskinen  wrote:
>> > The patch also enables to use ondemand governor on the latter.
>>
>> How? I can't see anything obvious here. :(
>
> It replaces CPUFREQ_ETERNAL with a proper value on older PowerMacs.
> ondemand does not accept CPUFREQ_ETERNAL transition latency.

Ahh.. In case you are sending it again, just add this info in log as people
might miss it.


RE: [PATCH 1/2] cpuidle: fix cpu idle driver as a module can not remove

2013-07-24 Thread Li Yang-R58472


> -Original Message-
> From: linux-pm-ow...@vger.kernel.org [mailto:linux-pm-
> ow...@vger.kernel.org] On Behalf Of Wang Dongsheng-B40534
> Sent: Wednesday, July 24, 2013 10:26 AM
> To: Rafael J. Wysocki
> Cc: daniel.lezc...@linaro.org; linux...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: RE: [PATCH 1/2] cpuidle: fix cpu idle driver as a module can not
> remove
> 
> 
> 
> > -Original Message-
> > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > Sent: Wednesday, July 24, 2013 5:33 AM
> > To: Wang Dongsheng-B40534
> > Cc: daniel.lezc...@linaro.org; linux...@vger.kernel.org; linuxppc-
> > d...@lists.ozlabs.org
> > Subject: Re: [PATCH 1/2] cpuidle: fix cpu idle driver as a module can
> > not remove
> >
> > On Tuesday, July 23, 2013 05:28:00 PM Dongsheng Wang wrote:
> > > From: Wang Dongsheng 
> > >
> > > The module can not be removed when execute "rmmod". rmmod not use
> > > "--force".
> > >
> > > Log:
> > > root:~# rmmod cpuidle-e500
> > > incs[9], decs[1]
> > > rmmod: can't unload 'cpuidle_e500': Resource temporarily unavailable
> > >
> > > Signed-off-by: Wang Dongsheng 
> >
> > Can you please check the current linux-next branch of the linux-pm.git
> > tree and see if that doesn't conflict with the material in there?
> >
> > Also please explain in the changelog how your changes help to fix the
> > problem.
> >
> Yes, Linux-next branch also have this problem.
> 
> Should I base on Linux-next to fix this problem?

I think Dongsheng is trying to build the platform cpuidle driver as a kernel
module.

My questions are:
Is the cpuidle driver supposed to work as a module?
Or can it only be built in, like many current drivers?

Regards,
Leo


Re: [PATCH 2/3] cpufreq: pmac64: re-estimate G5 cpufreq transition latency

2013-07-24 Thread Aaro Koskinen
Hi,

On Wed, Jul 24, 2013 at 11:04:50AM +0530, Viresh Kumar wrote:
> On 24 July 2013 01:54, Aaro Koskinen  wrote:
> > The patch also enables to use ondemand governor on the latter.
> 
> How? I can't see anything obvious here. :(

It replaces CPUFREQ_ETERNAL with a proper value on older PowerMacs.
ondemand does not accept CPUFREQ_ETERNAL transition latency.

A.

> 
> >
> > Signed-off-by: Aaro Koskinen 
> > ---
> >  drivers/cpufreq/pmac64-cpufreq.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cpufreq/pmac64-cpufreq.c b/drivers/cpufreq/pmac64-cpufreq.c
> > index 674807d..f9e399b 100644
> > --- a/drivers/cpufreq/pmac64-cpufreq.c
> > +++ b/drivers/cpufreq/pmac64-cpufreq.c
> > @@ -85,7 +85,8 @@ static int (*g5_query_freq)(void);
> >
> >  static DEFINE_MUTEX(g5_switch_mutex);
> >
> > -static unsigned long transition_latency;
> > +/* A conservative estimate, based on Xserve G5 and iMac G5 (iSight). */
> > +static const unsigned long transition_latency = 10 * NSEC_PER_MSEC;
> >
> >  #ifdef CONFIG_PMAC_SMU
> >
> > @@ -499,7 +500,6 @@ static int __init g5_neo2_cpufreq_init(struct device_node *cpus)
> > g5_cpu_freqs[1].frequency = max_freq/2;
> >
> > /* Set callbacks */
> > -   transition_latency = 12000;
> > g5_switch_freq = g5_scom_switch_freq;
> > g5_query_freq = g5_scom_query_freq;
> > freq_method = "SCOM";
> > @@ -675,7 +675,6 @@ static int __init g5_pm72_cpufreq_init(struct device_node *cpus)
> > g5_cpu_freqs[1].frequency = min_freq;
> >
> > /* Set callbacks */
> > -   transition_latency = CPUFREQ_ETERNAL;
> > g5_switch_volt = g5_pfunc_switch_volt;
> > g5_switch_freq = g5_pfunc_switch_freq;
> > g5_query_freq = g5_pfunc_query_freq;
> > --
> > 1.8.3.2
> >


Re: [PATCH 1/3] cpuidle/powernv: cpuidle backend driver for powernv

2013-07-24 Thread Deepthi Dharwar
On 07/23/2013 07:36 PM, Michael Ellerman wrote:
> On Tue, Jul 23, 2013 at 02:31:41PM +0530, Deepthi Dharwar wrote:
>> This patch implements a back-end cpuidle driver for
>> powernv calling power7_nap and snooze idle states.
>> This can be extended by adding more idle states
>> in the future to the existing framework.
> 
> Other than the state table and a few minor details this looks almost
> identical to the pseries driver. Can we not have a single version in
> sysdev and isolate just the differences?
>

Hi Michael,

Yes, I was actually looking at consolidating and moving all the powerpc
cpuidle driver code to drivers/cpuidle/. sysdev also seems fine. Let me
rework this to merge the drivers and keep a single version of the code in
sysdev for both the pseries and powernv platforms.

Thanks !
Deepthi


> cheers
> 



RE: Inbound PCI and Memory Corruption

2013-07-24 Thread David Laight
> On Fri, Jul 19, 2013 at 6:46 AM, Gerhard Sittig  wrote:
> > So:  No, not having to fiddle with DMA stuff when doing PCI need
> > not be a problem, it's actually expected.  But since a DMA engine
> > might be involved (that's just not under your command), the
> > accompanying problems may arise.  You may need to flush CPU
> > provided data upon write before telling an external entity to
> > access it, and may need to invalidate caches (to have data
> > re-fetched) before the CPU accesses what an external entity did
> > manipulate.  And this applies to both payload data as well as
> > management data (descriptors) if the latter apply to the former.
> 
> This is something I've been exploring today.  But what is unclear is
> _how_ to flush/invalidate the caches.  I was going to tweak the
> driver to set up the descriptors, flush the cache, then enable the
> hardware (and when taking the device down, disable the hardware, flush
> the cache, then deallocate the descriptors).  But this is in the
> network code and it isn't obvious how to make this happen.

FWIW it is almost impossible to code for non-coherent descriptors
(even ignoring problems with speculative cache line reads).
You don't even want to try to do it except for hardware where you
have no choice.

The problem is that you have no control over the device writes
into the descriptors. In order not to lose the device writes
the cpu must not write to any cache lines that contain active
descriptors.

For the receive side this can be arranged by initialising cache
line sized blocks of descriptors (if the cache line write isn't
atomic you still have problems).

The send side is much more tricky: you either have to set up a
full cache line of descriptors or wait until the transmit is idle.
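
The cache-line rule above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the 64-byte line size, the
16-byte descriptor layout, the DESC_OWNED_BY_DEVICE flag and the helper
cpu_may_write_desc() are all hypothetical, and the ring is assumed to start
on a cache line boundary. Real hardware will differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define CACHE_LINE_SIZE      64u
#define DESC_OWNED_BY_DEVICE 0x1u

/* Hypothetical 16-byte descriptor, so four descriptors share a cache line. */
struct ring_desc {
    uint32_t flags;
    uint32_t length;
    uint64_t buffer_addr;
};

#define DESCS_PER_LINE (CACHE_LINE_SIZE / sizeof(struct ring_desc))

/*
 * Return true only if no descriptor sharing a cache line with ring[idx]
 * is currently owned by the device. Writing the line back while the
 * device owns one of its descriptors could overwrite the device's update.
 */
static bool cpu_may_write_desc(const struct ring_desc *ring,
                               size_t ring_len, size_t idx)
{
    size_t first, last, i;

    if (idx >= ring_len)
        return false;

    /* Range of descriptors that live in the same cache line as idx. */
    first = idx - (idx % DESCS_PER_LINE);
    last = first + DESCS_PER_LINE;
    if (last > ring_len)
        last = ring_len;

    for (i = first; i < last; i++) {
        if (ring[i].flags & DESC_OWNED_BY_DEVICE)
            return false;
    }
    return true;
}
```

A transmit path built on this would either fill a whole line of
descriptors before handing any of them to the device, or hold off until
the check passes, which matches the two options described above.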

David




[git pull] Please pull powerpc.git merge branch

2013-07-24 Thread Benjamin Herrenschmidt
Hi Linus !

Here is a series of powerpc fixes. It's a bit big, mostly because of the
series of 11 "EEH" patches from Gavin. The EEH (Our IBM specific
PCI/PCIe Enhanced Error Handling) code had been rotting for a while and
this merge window saw a significant rework & fixing of it by Gavin Shan.

However, that wasn't complete and left some open issues. There were
still a few corner cases that didn't work properly, for example in
relation to hotplug and devices without explicit error handlers. We had
some patches but they weren't quite good enough yet so I left them off
the 3.11 merge window.

Gavin since then fixed it all up, we ran quite a few rounds of testing
and it seems fairly solid (at least probably more than it has ever
been). This should probably have made -rc1 but both Gavin and I took
some vacation so it had to wait for -rc2.

The rest is more bug fixes, mostly to new features recently added, for
example, we missed the cpu table entry for one of the two models of P8
(we didn't realize they had different PVR [Processor Version Register]
values), some module CRC issues, etc...

Please apply,

Cheers,
Ben.

The following changes since commit 3b2f64d00c46e1e4e9bd0bb9bb12619adac27a4b:

  Linux 3.11-rc2 (2013-07-21 12:05:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to ff3d79dc12c2ed38483f6c1e0f26fde430f27c9d:

  powerpc/perf: BHRB filter configuration should follow the task (2013-07-24 14:42:34 +1000)


Aneesh Kumar K.V (2):
  powerpc/mm: Fix fallthrough bug in hpte_decode
  powerpc/mm: Use the correct SLB(LLP) encoding in tlbie instruction

Anshuman Khandual (2):
  powerpc/perf: Ignore separate BHRB privilege state filter request
  powerpc/perf: BHRB filter configuration should follow the task

Anton Blanchard (1):
  powerpc/modules: Module CRC relocation fix causes perf issues

Bjorn Helgaas (1):
  powerpc/powernv: Mark pnv_pci_init_ioda2_phb() as __init

Denis Kirjanov (1):
  powerpc/pseries: Fix a typo in pSeries_lpar_hpte_insert()

Gavin Shan (11):
  powerpc/eeh: Remove reference to PCI device
  powerpc/eeh: Export functions for hotplug
  powerpc/pci: Override pcibios_release_device()
  powerpc/pci/hotplug: Don't need to remove from EEH cache twice
  powerpc/eeh: Keep PE during hotplug
  powerpc/eeh: Use safe list traversal when walking EEH devices
  powerpc/pci: Partial tree hotplug support
  powerpc/eeh: Use partial hotplug for EEH unaware drivers
  powerpc/eeh: Don't use pci_dev during BAR restore
  powerpc/eeh: Fix unbalanced enable for IRQ
  powerpc/eeh: Introdce flag to protect sysfs

Mahesh Salgaonkar (1):
  powerpc: Fix the corrupt r3 error during MCE handling.

Michael Ellerman (1):
  powerpc/perf: Set PPC_FEATURE2_EBB when we register the power8 PMU

Michael Neuling (1):
  powerpc: Add second POWER8 PVR entry

Paul Bolle (1):
  powerpc/pseries: Drop "select HOTPLUG"

Tiejun Chen (1):
  powerpc: Access local paca after hard irq disabled

 arch/powerpc/include/asm/eeh.h   | 30 ---
 arch/powerpc/include/asm/hw_irq.h|  7 +--
 arch/powerpc/include/asm/module.h|  5 +-
 arch/powerpc/include/asm/pci-bridge.h|  1 -
 arch/powerpc/include/asm/reg.h   |  3 +-
 arch/powerpc/kernel/cputable.c   | 20 +++-
 arch/powerpc/kernel/eeh.c| 70 -
 arch/powerpc/kernel/eeh_cache.c  | 18 ++-
 arch/powerpc/kernel/eeh_driver.c | 77 ++--
 arch/powerpc/kernel/eeh_pe.c | 58 +
 arch/powerpc/kernel/eeh_sysfs.c  | 21 
 arch/powerpc/kernel/pci-common.c |  2 +
 arch/powerpc/kernel/pci-hotplug.c| 49 +-
 arch/powerpc/kernel/pci_of_scan.c| 56 ++--
 arch/powerpc/kernel/prom_init.c  |  5 +-
 arch/powerpc/kernel/vmlinux.lds.S|  3 --
 arch/powerpc/mm/hash_native_64.c | 12 -
 arch/powerpc/perf/core-book3s.c  |  5 +-
 arch/powerpc/perf/power8-pmu.c   | 24 +
 arch/powerpc/platforms/powernv/eeh-powernv.c | 17 --
 arch/powerpc/platforms/powernv/pci-ioda.c|  2 +-
 arch/powerpc/platforms/pseries/Kconfig   |  1 -
 arch/powerpc/platforms/pseries/eeh_pseries.c | 67 ++--
 arch/powerpc/platforms/pseries/lpar.c|  2 +-
 arch/powerpc/platforms/pseries/ras.c |  3 ++
 drivers/pci/hotplug/rpadlpar_core.c  |  1 -
 26 files changed, 390 insertions(+), 169 deletions(-)




Re: [PATCH v5 0/4] DMA: Freescale: Add support for 8-channel DMA engine

2013-07-24 Thread Li Yang
On Wed, Jul 24, 2013 at 2:21 PM,  wrote:

> From: Hongbo Zhang 
>
> Hi Vinod, Dan, Scott and Leo, please have a look at these V2 patches.
>

Looks good now after rounds of review.

Acked-by: Li Yang 


>
> Freescale QorIQ T4 and B4 introduce new 8-channel DMA engines; this patch
> set adds support for this DMA engine.
>
> V4->V5 changes:
> - update description in the dt binding document, to make it more reasonable
> - add new patch [4/4] to eliminate a compile warning that has existed
>   for a long time
>
> V3->V4 changes:
> - introduce new patch [1/3] to revise the legacy dma binding document
> - and then add new paragraph to describe new dt node binding in [2/3]
> - rebase to latest kernel v3.11-rc1
>
> V2->V3 changes:
> - edit Documentation/devicetree/bindings/powerpc/fsl/dma.txt
> - edit text string in Kconfig and the driver files, using "elo series" to
>   mention all the current "elo*"
>
> V1->V2 changes:
> - removed the code handling the dgsr1 register, since it isn't currently
>   used
> - renamed the DMA DT compatible to "fsl,elo3-dma"
> - renamed the new dts files to "elo3-dma-.dtsi"
>
> Hongbo Zhang (4):
>   DMA: Freescale: revise device tree binding document
>   DMA: Freescale: Add new 8-channel DMA engine device tree nodes
>   DMA: Freescale: update driver to support 8-channel DMA engine
>   DMA: Freescale: eliminate a compiling warning
>
>  .../devicetree/bindings/powerpc/fsl/dma.txt|  118 +++-
>  arch/powerpc/boot/dts/fsl/b4si-post.dtsi   |4 +-
>  arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi  |   81 ++
>  arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi  |   81 ++
>  arch/powerpc/boot/dts/fsl/t4240si-post.dtsi|4 +-
>  drivers/dma/Kconfig|9 +-
>  drivers/dma/fsldma.c   |   11 +-
>  drivers/dma/fsldma.h   |2 +-
>  8 files changed, 269 insertions(+), 41 deletions(-)
>  create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-0.dtsi
>  create mode 100644 arch/powerpc/boot/dts/fsl/elo3-dma-1.dtsi
>
> --
> 1.7.9.5
>
>
>
>



-- 
- Leo