[PATCH] ALSA: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Wei Yongjun
From: Wei Yongjun 

Use list_move_tail() instead of list_del() + list_add_tail().

Signed-off-by: Wei Yongjun 
---
 sound/pci/emu10k1/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/pci/emu10k1/memory.c b/sound/pci/emu10k1/memory.c
index 0a43662..1dce0a39 100644
--- a/sound/pci/emu10k1/memory.c
+++ b/sound/pci/emu10k1/memory.c
@@ -263,8 +263,8 @@ int snd_emu10k1_memblk_map(struct snd_emu10k1 *emu, struct snd_emu10k1_memblk *b
spin_lock_irqsave(&emu->memblk_lock, flags);
if (blk->mapped_page >= 0) {
/* update order link */
-   list_del(&blk->mapped_order_link);
-   list_add_tail(&blk->mapped_order_link, &emu->mapped_order_link_head);
+   list_move_tail(&blk->mapped_order_link,
+  &emu->mapped_order_link_head);
spin_unlock_irqrestore(&emu->memblk_lock, flags);
return 0;
}


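For readers unfamiliar with the helper, here is a minimal userspace sketch of what the patch relies on. list_move_tail() unlinks an entry from its current list and appends it to another list, which is what the removed list_del() + list_add_tail() pair did. The list code below is a simplified model written for illustration, not the kernel's include/linux/list.h, and list_count() is a hypothetical test helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of the kernel's circular doubly linked
 * list; names follow include/linux/list.h, bodies are a sketch. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Link @entry between two adjacent nodes @prev and @next. */
static void __list_add(struct list_head *entry, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Unlink @entry by joining its neighbours to each other. */
static void __list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head->prev, head);
}

/* Same effect as list_del() followed by list_add_tail(), minus the
 * pointer poisoning that list_del() performs in the kernel. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);	/* moving a list head is a caller bug */
	__list_del_entry(entry);
	list_add_tail(entry, head);
}

/* Count the entries on @head (hypothetical test helper). */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Because the entry is relinked immediately, the intermediate "deleted" state that list_del() leaves behind is never observable, and the call site reads as a single operation.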
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 07/11] kexec: Disable in a secure boot environment

2012-09-05 Thread Eric W. Biederman
Matthew Garrett  writes:

> On Tue, Sep 04, 2012 at 09:33:31PM -0700, Eric W. Biederman wrote:
>> Matthew Garrett  writes:
>> > The full implementation should trust keys that are trusted by the 
>> > platform, so it'd boot any kexec image you cared to sign. Or simply 
>> > patch this code out and rebuild and self-sign, or disable the code that 
>> > turns off the capability when in secure boot mode. I've no objection to 
>> > putting that behind an #ifdef.
>> 
>> I will be happy to see a version of kexec that accepts signed images,
>> allowing the functionality to work in your brave new world where
>> everything must be signed.
>> 
>> Until then I don't see a point in merging anything else.
>
> Fine. We'll just carry this one out of tree for now.

It is your tree.

I am disappointed to learn that you aren't enthusiastic about
implementing verification of signatures for all code that goes into
ring 0.

Eric




Re: [PATCH 07/11] kexec: Disable in a secure boot environment

2012-09-05 Thread Matthew Garrett
On Wed, Sep 05, 2012 at 12:00:31AM -0700, Eric W. Biederman wrote:
> Matthew Garrett  writes:
> > Fine. We'll just carry this one out of tree for now.
> 
> It is your tree.
> 
> I am disappointed to learn that you aren't enthusiastic about
> implementing verification of signatures for all code that goes into
> ring 0.

I am enthusiastic, but October 26th is a date outside my control and 
kexec isn't at the top of the priority list. We ship with the code we 
have, not the code we want.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

2012-09-05 Thread Paolo Bonzini
On 04/09/2012 22:11, Nicholas A. Bellinger wrote:
>>> As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
>>> atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
>>> smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
>>> accessors properly, no..?
>>
>> No, only a single "thing" is being accessed, and there is no need to
>> order the decrement with respect to preceding or subsequent accesses to
>> other locations.
>>
>> In other words, tgt->reqs is already synchronized with itself, and that
>> is enough.
>>
> 
> However, it's still my understanding that the use of atomic_dec() in the
> completion path means that smp_mb__after_atomic_dec() is a requirement to
> be proper portable atomic.h code, no..?  Otherwise tgt->reqs should be
> using something other than an atomic_t, right..?

Memory barriers aren't _always_ required, only when you need to order
accesses to multiple locations.

In this case, there is no other location that the
queuecommand/completion handlers need to synchronize against, so no
barrier is required.  You can see plenty of atomic_inc/atomic_dec in the
code without a barrier afterwards (the typical case is the opposite of
this patch: a refcount increment needs no barrier, while a refcount
decrement uses atomic_dec_return).
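A rough C11 analogue of the distinction drawn above, assuming that relaxed atomics correspond to the kernel's unordered atomic_inc()/atomic_dec(), and that acq_rel ordering corresponds to the ordering a refcount release path needs. struct target, struct obj and all helper names are hypothetical; this is not the virtio-scsi code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-target request counter, loosely modelled on
 * tgt->reqs: only the counter itself is shared, so relaxed
 * read-modify-write operations are sufficient. */
struct target {
	atomic_int reqs;
};

/* Submission path: returns the new value, like atomic_inc_return(). */
static int target_get(struct target *tgt)
{
	return atomic_fetch_add_explicit(&tgt->reqs, 1,
					 memory_order_relaxed) + 1;
}

/* Completion path: no other memory location is published together
 * with this decrement, so no ordering is requested (cf. atomic_dec()). */
static void target_put(struct target *tgt)
{
	atomic_fetch_sub_explicit(&tgt->reqs, 1, memory_order_relaxed);
}

/* Contrast: a reference-counted object freed on the last put. Earlier
 * accesses to the object must be ordered before free(), so the
 * decrement uses acq_rel and inspects the old value. */
struct obj {
	atomic_int refcount;
	int payload;
};

static bool obj_put(struct obj *o)
{
	if (atomic_fetch_sub_explicit(&o->refcount, 1,
				      memory_order_acq_rel) == 1) {
		free(o);
		return true;	/* last reference dropped */
	}
	return false;
}
```

The counter alone is always consistent with itself; ordering only matters once its value is used to decide something about other memory, as with the free() in obj_put().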

>> virtio-scsi multiqueue has a performance benefit up to 20% (for a single
>> LUN) or 40% (on overall bandwidth across multiple LUNs).  I doubt that a
>> single memory barrier can have that much impact. :)
>>
> 
> I've no doubt that this series increases the large block high bandwidth
> for virtio-scsi, but historically that has always been the easier
> workload to scale.  ;)

This is with a mixed workload (random 4k-64k) and tmpfs backend on the host.

> Yes, I think Jens' new approach is providing some pretty significant
> gains for raw block drivers with extremely high packet (small block
> random I/O) workloads, esp with hw block drivers that support genuine mq
> with hw num_queues > 1.

I need to look into it, to understand how the queue steering here can be
adapted to his code.

>> Have you measured the host_lock to be a bottleneck in high-iops
>> benchmarks, even for a modern driver that does not hold it in
>> queuecommand?  (Certainly it will become more important as the
>> virtio-scsi queuecommand becomes thinner and thinner).
> 
> This is exactly why it would make such a good vehicle to re-architect
> SCSI core.  I'm thinking it can be the first sw LLD we attempt to get
> running on an (currently) future scsi-mq prototype.

Agreed.

Paolo


Re: [PATCH] ALSA: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Clemens Ladisch
Wei Yongjun wrote:
> Using list_move_tail() instead of list_del() + list_add_tail().
>
> Signed-off-by: Wei Yongjun 

Acked-by: Clemens Ladisch 


[PATCH] Staging: ipack: fix build failure in powerpc allyesconfig

2012-09-05 Thread Samuel Iglesias Gonsálvez
Caused by commit 187e47824013 ("Staging: ipack: Read the ID space during
device registration").

drivers/staging/ipack/ipack.c: In function 'ipack_device_read_id':
drivers/staging/ipack/ipack.c:291:2: error: implicit declaration of function
'ioread8' [-Werror=implicit-function-declaration]
drivers/staging/ipack/ipack.c:309:3: error: implicit declaration of function
'ioread16be' [-Werror=implicit-function-declaration]

Reported-by: Stephen Rothwell 
Signed-off-by: Samuel Iglesias Gonsálvez 
---
 drivers/staging/ipack/ipack.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/ipack/ipack.c b/drivers/staging/ipack/ipack.c
index da9e7bd..8cce6c4 100644
--- a/drivers/staging/ipack/ipack.c
+++ b/drivers/staging/ipack/ipack.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ipack.h"
 
 #define to_ipack_dev(device) container_of(device, struct ipack_device, dev)
-- 
1.7.10.4



[PATCH 0/1 v3] leds: Add new LED driver for lm355x chips

2012-09-05 Thread G.Shark Jeong
From: "G.Shark Jeong" 

LM3554 and LM3556 have similar functions but very different register maps.
This driver is a common driver for both LM3554 and LM3556, LED chips from TI.
The lm3556 driver can be replaced by this driver.

LM3554 :
The LM3554 is a 2 MHz fixed-frequency synchronous boost
converter with 1.2A dual high side led drivers.
Datasheet: www.ti.com/lit/ds/symlink/lm3554.pdf

LM3556 :
The LM3556 is a 4 MHz fixed-frequency synchronous boost
converter plus 1.5A constant current driver for a high-current white LED.
Datasheet: www.national.com/ds/LM/LM3556.pdf

G.Shark Jeong (1):
  leds: Add LED driver for lm355x chips

 drivers/leds/Kconfig  |8 +-
 drivers/leds/Makefile |2 +-
 drivers/leds/leds-lm3556.c|  512 --
 drivers/leds/leds-lm355x.c|  572 +
 include/linux/platform_data/leds-lm3556.h |   50 ---
 include/linux/platform_data/leds-lm355x.h |   66 
 6 files changed, 643 insertions(+), 567 deletions(-)
 delete mode 100644 drivers/leds/leds-lm3556.c
 create mode 100644 drivers/leds/leds-lm355x.c
 delete mode 100644 include/linux/platform_data/leds-lm3556.h
 create mode 100644 include/linux/platform_data/leds-lm355x.h

-- 
1.7.5.4



[PATCH 1/1 v3] leds: Add new LED driver for lm355x chips

2012-09-05 Thread G.Shark Jeong
From: "G.Shark Jeong" 

This driver is a common driver for the LM355x family (LM3554 and LM3556), LED chips from TI.

LM3554 :
The LM3554 is a 2 MHz fixed-frequency synchronous boost
converter with 1.2A dual high side led drivers.
Datasheet: www.ti.com/lit/ds/symlink/lm3554.pdf

LM3556 :
The LM3556 is a 4 MHz fixed-frequency synchronous boost
converter plus 1.5A constant current driver for a high-current white LED.
Datasheet: www.national.com/ds/LM/LM3556.pdf

Signed-off-by: G.Shark Jeong 
---
 drivers/leds/Kconfig  |8 +-
 drivers/leds/Makefile |2 +-
 drivers/leds/leds-lm3556.c|  512 --
 drivers/leds/leds-lm355x.c|  572 +
 include/linux/platform_data/leds-lm3556.h |   50 ---
 include/linux/platform_data/leds-lm355x.h |   66 
 6 files changed, 643 insertions(+), 567 deletions(-)
 delete mode 100644 drivers/leds/leds-lm3556.c
 create mode 100644 drivers/leds/leds-lm355x.c
 delete mode 100644 include/linux/platform_data/leds-lm3556.h
 create mode 100644 include/linux/platform_data/leds-lm355x.h

diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index c96bbaa..4f6ced2 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -422,13 +422,13 @@ config LEDS_MAX8997
  This option enables support for on-chip LED drivers on
  MAXIM MAX8997 PMIC.
 
-config LEDS_LM3556
-   tristate "LED support for LM3556 Chip"
+config LEDS_LM355x
+   tristate "LED support for LM355x Chips, LM3554 and LM3556"
depends on LEDS_CLASS && I2C
select REGMAP_I2C
help
- This option enables support for LEDs connected to LM3556.
- LM3556 includes Torch, Flash and Indicator functions.
+ This option enables support for LEDs connected to LM355x.
+ LM355x includes Torch, Flash and Indicator functions.
 
 config LEDS_OT200
tristate "LED support for the Bachmann OT200"
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index a4429a9..b57a021 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -48,7 +48,7 @@ obj-$(CONFIG_LEDS_NETXBIG)+= leds-netxbig.o
 obj-$(CONFIG_LEDS_ASIC3)   += leds-asic3.o
 obj-$(CONFIG_LEDS_RENESAS_TPU) += leds-renesas-tpu.o
 obj-$(CONFIG_LEDS_MAX8997) += leds-max8997.o
-obj-$(CONFIG_LEDS_LM3556)  += leds-lm3556.o
+obj-$(CONFIG_LEDS_LM355x)  += leds-lm355x.o
 obj-$(CONFIG_LEDS_BLINKM)  += leds-blinkm.o
 
 # LED SPI Drivers
diff --git a/drivers/leds/leds-lm3556.c b/drivers/leds/leds-lm3556.c
deleted file mode 100644
index 3062abd..000
--- a/drivers/leds/leds-lm3556.c
+++ /dev/null
@@ -1,512 +0,0 @@
-/*
- * Simple driver for Texas Instruments LM3556 LED Flash driver chip (Rev0x03)
- * Copyright (C) 2012 Texas Instruments
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Please refer Documentation/leds/leds-lm3556.txt file.
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define REG_FILT_TIME  (0x0)
-#define REG_IVFM_MODE  (0x1)
-#define REG_NTC(0x2)
-#define REG_INDIC_TIME (0x3)
-#define REG_INDIC_BLINK(0x4)
-#define REG_INDIC_PERIOD   (0x5)
-#define REG_TORCH_TIME (0x6)
-#define REG_CONF   (0x7)
-#define REG_FLASH  (0x8)
-#define REG_I_CTRL (0x9)
-#define REG_ENABLE (0xA)
-#define REG_FLAG   (0xB)
-#define REG_MAX(0xB)
-
-#define IVFM_FILTER_TIME_SHIFT (3)
-#define UVLO_EN_SHIFT  (7)
-#define HYSTERSIS_SHIFT(5)
-#define IVM_D_TH_SHIFT (2)
-#define IVFM_ADJ_MODE_SHIFT(0)
-#define NTC_EVENT_LVL_SHIFT(5)
-#define NTC_TRIP_TH_SHIFT  (2)
-#define NTC_BIAS_I_LVL_SHIFT   (0)
-#define INDIC_RAMP_UP_TIME_SHIFT   (3)
-#define INDIC_RAMP_DN_TIME_SHIFT   (0)
-#define INDIC_N_BLANK_SHIFT(4)
-#define INDIC_PULSE_TIME_SHIFT (0)
-#define INDIC_N_PERIOD_SHIFT   (0)
-#define TORCH_RAMP_UP_TIME_SHIFT   (3)
-#define TORCH_RAMP_DN_TIME_SHIFT   (0)
-#define STROBE_USUAGE_SHIFT(7)
-#define STROBE_PIN_POLARITY_SHIFT  (6)
-#define TORCH_PIN_POLARITY_SHIFT   (5)
-#define TX_PIN_POLARITY_SHIFT  (4)
-#define TX_EVENT_LVL_SHIFT (3)
-#define IVFM_EN_SHIFT  (2)
-#define NTC_MODE_SHIFT (1)
-#define INDIC_MODE_SHIFT   (0)
-#define INDUCTOR_I_LIMIT_SHIFT (6)
-#define FLASH_RAMP_TIME_SHIFT  (3)
-#define FLASH_TOUT_TIME_SHIFT   

[PATCH] dma: tegra: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Wei Yongjun
From: Wei Yongjun 

Use list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun 
---
 drivers/dma/tegra20-apb-dma.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index 24acd71..6ed3f43 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -475,8 +475,7 @@ static void tegra_dma_abort_all(struct tegra_dma_channel *tdc)
while (!list_empty(&tdc->pending_sg_req)) {
sgreq = list_first_entry(&tdc->pending_sg_req,
typeof(*sgreq), node);
-   list_del(&sgreq->node);
-   list_add_tail(&sgreq->node, &tdc->free_sg_req);
+   list_move_tail(&sgreq->node, &tdc->free_sg_req);
if (sgreq->last_sg) {
dma_desc = sgreq->dma_desc;
dma_desc->dma_status = DMA_ERROR;
@@ -570,8 +569,7 @@ static void handle_cont_sngl_cycle_dma_done(struct tegra_dma_channel *tdc,
 
/* If not last req then put at end of pending list */
if (!list_is_last(&sgreq->node, &tdc->pending_sg_req)) {
-   list_del(&sgreq->node);
-   list_add_tail(&sgreq->node, &tdc->pending_sg_req);
+   list_move_tail(&sgreq->node, &tdc->pending_sg_req);
sgreq->configured = false;
st = handle_continuous_head_request(tdc, sgreq, to_terminate);
if (!st)

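The second hunk above passes the same list as source and destination, so list_move_tail() rotates a request to the end of the pending list it is already on. A minimal userspace sketch of that idiom follows, again using a simplified list model rather than include/linux/list.h; requeue_first() is a hypothetical helper, not part of the tegra driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace model of the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Append @e just before @h, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static bool list_is_last(const struct list_head *e, const struct list_head *h)
{
	return e->next == h;
}

/* Unlink @e and re-append it at the tail of @h; @h may be the very
 * list that @e is currently on. */
static void list_move_tail(struct list_head *e, struct list_head *h)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_add_tail(e, h);
}

/* Hypothetical helper mirroring the hunk: if the first request is not
 * the last one, rotate it to the end of the pending list. */
static void requeue_first(struct list_head *pending)
{
	struct list_head *first = pending->next;

	if (first != pending && !list_is_last(first, pending))
		list_move_tail(first, pending);
}
```

With three queued entries a, b, c, one call to requeue_first() leaves the list as b, c, a; a single-entry list is left untouched.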



[PATCH] drbd: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Wei Yongjun
From: Wei Yongjun 

Use list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun 
---
 drivers/block/drbd/drbd_worker.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 6bce2cc..a196281 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -141,8 +141,7 @@ static void drbd_endio_write_sec_final(struct drbd_epoch_entry *e) __releases(lo
 
spin_lock_irqsave(&mdev->req_lock, flags);
mdev->writ_cnt += e->size >> 9;
-   list_del(&e->w.list); /* has been on active_ee or sync_ee */
-   list_add_tail(&e->w.list, &mdev->done_ee);
+   list_move_tail(&e->w.list, &mdev->done_ee);
 
/* No hlist_del_init(&e->collision) here, we did not send the Ack yet,
 * neither did we wake possibly waiting conflicting requests.




Re: [PATCH 1/3] proc: return -ENOMEM when inode allocation failed

2012-09-05 Thread Cong Wang

On 09/04/2012 05:22 PM, yan yan wrote:

2012/9/4 Cong Wang :

On 09/03/2012 10:14 PM, yan wrote:


Signed-off-by: yan 



Please provide a changelog to explain why we need this patch.


I think the title is self-explanatory.



---
   fs/proc/generic.c |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index b3647fe..9e8f631 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -427,7 +427,7 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
 if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
 pde_get(de);
 spin_unlock(&proc_subdir_lock);
-   error = -EINVAL;
+   error = -ENOMEM;



Why is the !memcmp() case related to ENOMEM?


We are presetting 'error' here. The following proc_get_inode() will try
to get an inode, either from the inode cache or by allocating a new one
(and filling it).

If we get a NULL inode, that means allocation failed. That's how
ENOMEM is involved.


Then the following patch is probably better than yours:


diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index b3647fe..6b22913 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -427,12 +427,16 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,

if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
pde_get(de);
spin_unlock(&proc_subdir_lock);
-   error = -EINVAL;
inode = proc_get_inode(dir->i_sb, de);
+   if (!inode) {
+   error = -ENOMEM;
+   goto out_put;
+   }
goto out_unlock;
}
}
spin_unlock(&proc_subdir_lock);
+
 out_unlock:

if (inode) {
@@ -440,6 +444,8 @@ out_unlock:
d_add(dentry, inode);
return NULL;
}
+out_put:
+
if (de)
pde_put(de);
return ERR_PTR(error);

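Cong Wang's version sets the error code only on the path where the failure actually happens, instead of presetting one code and reusing it. A userspace sketch of that pattern: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins modelled on include/linux/err.h, get_inode() and lookup() are hypothetical stand-ins for proc_get_inode() and proc_lookup_de(), and returning -ENOENT for a missing entry is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers: small negative
 * errno values are encoded in the top page of the address space. */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode {
	const char *name;
};

/* Hypothetical stand-in for proc_get_inode(): may fail to allocate. */
static struct inode *get_inode(const char *name, int simulate_oom)
{
	struct inode *inode;

	if (simulate_oom)
		return NULL;
	inode = malloc(sizeof(*inode));
	if (inode)
		inode->name = name;
	return inode;
}

/* Each failure path returns its own error code. */
static void *lookup(const char *want, const char *have, int simulate_oom)
{
	struct inode *inode;

	if (strcmp(want, have) != 0)
		return ERR_PTR(-ENOENT);	/* no such entry */

	inode = get_inode(have, simulate_oom);
	if (!inode)
		return ERR_PTR(-ENOMEM);	/* allocation failed */

	return inode;
}
```

Keeping the error assignment next to the check that produces it makes it impossible for an unrelated failure to inherit a stale code, which is the confusion the original -EINVAL preset caused.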



[PATCH 0/3] memory-hotplug: handle page race between allocation and isolation

2012-09-05 Thread Minchan Kim
Memory hotplug has a subtle race, and this patchset fixes it.
(Look at [3/3] for details, and please confirm the problem before reviewing
the other patches in this series.)

 [1/3] is just a cleanup that helps [2/3].
 [2/3] keeps the migratetype information in the freed page's index field,
   and [3/3] uses that information.
 [3/3] fixes the race using [2/3]'s information.

After applying [2/3], the migratetype argument to __free_one_page
and free_one_page is redundant, so it could be removed, but I decided
not to touch them because doing so increases code size by about 50 bytes.

Minchan Kim (3):
  mm: use get_page_migratetype instead of page_private
  mm: remain migratetype in freed page
  memory-hotplug: bug fix race between isolation and allocation

 include/linux/mm.h  |   12 
 mm/page_alloc.c |   16 ++--
 mm/page_isolation.c |7 +--
 3 files changed, 27 insertions(+), 8 deletions(-)

-- 
1.7.9.5



[PATCH 2/3] mm: remain migratetype in freed page

2012-09-05 Thread Minchan Kim
The page allocator doesn't keep the migratetype information in the page
when the page is freed. This patch stores that information in the
freed page's index field, which isn't used during free/alloc
preparation, so it shouldn't change any behavior except the one below.

This patch adds a new call site in __free_pages_ok, so it might add a
little overhead, but that path is only used for high-order pages,
so I believe the cost is negligible.

Signed-off-by: Minchan Kim 
---
 include/linux/mm.h |6 --
 mm/page_alloc.c|7 ---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 86d61d6..8fd32da 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -251,12 +251,14 @@ struct inode;
 
 static inline void set_page_migratetype(struct page *page, int migratetype)
 {
-   set_page_private(page, migratetype);
+   VM_BUG_ON((unsigned int)migratetype >= MIGRATE_TYPES);
+   page->index = migratetype;
 }
 
 static inline int get_page_migratetype(struct page *page)
 {
-   return page_private(page);
+   VM_BUG_ON((unsigned int)page->index >= MIGRATE_TYPES);
+   return page->index;
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 103ba66..32985dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -723,6 +723,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 {
unsigned long flags;
int wasMlocked = __TestClearPageMlocked(page);
+   int migratetype;
 
if (!free_pages_prepare(page, order))
return;
@@ -731,9 +732,9 @@ static void __free_pages_ok(struct page *page, unsigned int order)
if (unlikely(wasMlocked))
free_page_mlock(page);
__count_vm_events(PGFREE, 1 << order);
-   free_one_page(page_zone(page), page, order,
-   get_pageblock_migratetype(page));
-
+   migratetype = get_pageblock_migratetype(page);
+   set_page_migratetype(page, migratetype);
+   free_one_page(page_zone(page), page, order, migratetype);
local_irq_restore(flags);
 }
 
-- 
1.7.9.5

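The idea behind [2/3] can be sketched in userspace as follows. It assumes, as in kernels of this era, that page->private holds the buddy order of a free page, which is why the migratetype needs another field. The enum values and the trimmed struct page are hypothetical simplifications, and assert() stands in for VM_BUG_ON().

```c
#include <assert.h>

/* Hypothetical, trimmed-down migrate types; the real enum lives in
 * include/linux/mmzone.h and depends on the kernel configuration. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

/* Only the fields this sketch needs. */
struct page {
	unsigned long private;	/* buddy order while the page is free */
	unsigned long index;	/* reused to remember the migratetype */
};

/* assert() stands in for VM_BUG_ON(); the unsigned cast also rejects
 * negative values with a single comparison. */
static void set_page_migratetype(struct page *page, int migratetype)
{
	assert((unsigned int)migratetype < MIGRATE_TYPES);
	page->index = migratetype;
}

static int get_page_migratetype(const struct page *page)
{
	assert(page->index < MIGRATE_TYPES);
	return (int)page->index;
}
```

Recording the migratetype leaves page->private, and therefore the buddy order, untouched.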


[PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation

2012-09-05 Thread Minchan Kim
As shown below, memory hotplug has a race between page isolation
and page allocation, so it can hit the BUG_ON in __offline_isolated_pages.

CPU A                                   CPU B

start_isolate_page_range
set_migratetype_isolate
spin_lock_irqsave(zone->lock)

                                        free_hot_cold_page(Page A)
                                        /* without zone->lock */
                                        migratetype = get_pageblock_migratetype(Page A);
                                        /*
                                         * Page could be moved into MIGRATE_MOVABLE
                                         * of per_cpu_pages
                                         */
                                        list_add_tail(&page->lru, &pcp->lists[migratetype]);

set_pageblock_isolate
move_freepages_block
drain_all_pages

                                        /* Page A could be in MIGRATE_MOVABLE of free_list. */

check_pages_isolated
__test_page_isolated_in_pageblock
/*
 * We can't catch freed page which
 * is free_list[MIGRATE_MOVABLE]
 */
if (PageBuddy(page A))
        pfn += 1 << page_order(page A);

                                        /* So, Page A could be allocated */

__offline_isolated_pages
/*
 * BUG_ON hit or offline page
 * which is used by someone
 */
BUG_ON(!PageBuddy(page A));

Signed-off-by: Minchan Kim 
---
 mm/page_isolation.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index acf65a7..4699d1f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
continue;
}
page = pfn_to_page(pfn);
-   if (PageBuddy(page))
+   if (PageBuddy(page)) {
+   if (get_page_migratetype(page) != MIGRATE_ISOLATE)
+   break;
pfn += 1 << page_order(page);
+   }
else if (page_count(page) == 0 &&
get_page_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
-- 
1.7.9.5

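A toy model of the check in __test_page_isolated_in_pageblock() after this patch. The page descriptor and the enum values are hypothetical simplifications. The point is that a free buddy block whose recorded migratetype is not MIGRATE_ISOLATE (for example, one freed onto the MIGRATE_MOVABLE list during the race) now makes the range test fail, whereas the unpatched loop would have skipped over it.

```c
#include <assert.h>
#include <stdbool.h>

enum { MIGRATE_MOVABLE = 2, MIGRATE_ISOLATE = 3 };	/* hypothetical values */

/* Toy page descriptor: only the state the check looks at. */
struct toy_page {
	bool buddy;		/* PageBuddy(): head of a free buddy block */
	unsigned int order;	/* page_order() when buddy is true */
	int count;		/* page_count() */
	int migratetype;	/* get_page_migratetype() */
};

/* Walk [pfn, end_pfn) following the patched loop and report whether
 * every page is free and marked MIGRATE_ISOLATE. */
static bool range_isolated(const struct toy_page *pages,
			   unsigned long pfn, unsigned long end_pfn)
{
	while (pfn < end_pfn) {
		const struct toy_page *page = &pages[pfn];

		if (page->buddy) {
			/* New check: a free block on a non-isolate list. */
			if (page->migratetype != MIGRATE_ISOLATE)
				break;
			pfn += 1UL << page->order;
		} else if (page->count == 0 &&
			   page->migratetype == MIGRATE_ISOLATE) {
			pfn += 1;
		} else {
			break;
		}
	}
	/* Isolated only if the scan reached the end of the range. */
	return pfn >= end_pfn;
}
```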


[PATCH 1/3] mm: use get_page_migratetype instead of page_private

2012-09-05 Thread Minchan Kim
The page allocator uses set_page_private and page_private to handle
the migratetype when it frees a page. Let's replace them with
[set|get]_page_migratetype to make this clearer.

Signed-off-by: Minchan Kim 
---
 include/linux/mm.h  |   10 ++
 mm/page_alloc.c |   11 +++
 mm/page_isolation.c |2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5c76634..86d61d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -249,6 +249,16 @@ struct inode;
 #define page_private(page) ((page)->private)
 #define set_page_private(page, v)  ((page)->private = (v))
 
+static inline void set_page_migratetype(struct page *page, int migratetype)
+{
+   set_page_private(page, migratetype);
+}
+
+static inline int get_page_migratetype(struct page *page)
+{
+   return page_private(page);
+}
+
 /*
  * FIXME: take this include out, include page-flags.h in
  * files which need it (119 of them)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 710d91c..103ba66 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
-   __free_one_page(page, zone, 0, page_private(page));
-   trace_mm_page_pcpu_drain(page, 0, page_private(page));
+   __free_one_page(page, zone, 0,
+   get_page_migratetype(page));
+   trace_mm_page_pcpu_drain(page, 0,
+   get_page_migratetype(page));
} while (--to_free && --batch_free && !list_empty(list));
}
__mod_zone_page_state(zone, NR_FREE_PAGES, count);
@@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
__count_vm_events(PGFREE, 1 << order);
free_one_page(page_zone(page), page, order,
get_pageblock_migratetype(page));
+
local_irq_restore(flags);
 }
 
@@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
mt = migratetype;
}
-   set_page_private(page, mt);
+   set_page_migratetype(page, mt);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
return;
 
migratetype = get_pageblock_migratetype(page);
-   set_page_private(page, migratetype);
+   set_page_migratetype(page, migratetype);
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 64abb33..acf65a7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
if (PageBuddy(page))
pfn += 1 << page_order(page);
else if (page_count(page) == 0 &&
-   page_private(page) == MIGRATE_ISOLATE)
+   get_page_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
else
break;
-- 
1.7.9.5



[RFC] memory-hotplug: remove MIGRATE_ISOLATE from free_area->free_list

2012-09-05 Thread Minchan Kim
Normally, the MIGRATE_ISOLATE type is used for memory hotplug.
But it's an ironic type: isolated pages still sit as free pages
in free_area->free_list[MIGRATE_ISOLATE], so people may think of
them as allocatable pages, but they are *never* allocatable.
This makes the NR_FREE_PAGES vmstat inaccurate, so code that
depends on that vmstat can make wrong decisions.

There was already a report about it.[1]
[1] 702d1a6e, memory-hotplug: fix kswapd looping forever problem

Then there was another report about a different problem.[2]
[2] http://www.spinics.net/lists/linux-mm/msg41251.html

I believe it can cause problems in the future, too,
so I would like to remove this ironic type with a different design.

I hope this patch solves the problem, so that we can revert [1]
and won't need [2].
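A counter-only sketch of the accounting this RFC aims for, assuming a zone is reduced to two counters: isolating a free block removes it from what NR_FREE_PAGES would report, instead of leaving it counted on free_list[MIGRATE_ISOLATE]. toy_zone and both helpers are hypothetical and ignore locking, real lists, and per-migratetype state.

```c
#include <assert.h>

/* Toy zone: counters only, no real lists. */
struct toy_zone {
	unsigned long nr_free;		/* what NR_FREE_PAGES would report */
	unsigned long nr_isolated;	/* pages parked outside the buddy lists */
};

/* Hypothetical counterpart of isolate_free_page(page, order): take a
 * free 2^order block out of the allocator, so it stops counting as free. */
static void toy_isolate_free_block(struct toy_zone *z, unsigned int order)
{
	unsigned long n = 1UL << order;

	assert(z->nr_free >= n);
	z->nr_free -= n;
	z->nr_isolated += n;
}

/* Hypothetical undo: hand the block back to the allocator. */
static void toy_undo_isolate_block(struct toy_zone *z, unsigned int order)
{
	unsigned long n = 1UL << order;

	assert(z->nr_isolated >= n);
	z->nr_isolated -= n;
	z->nr_free += n;
}
```

In this model nr_free always equals the number of pages the allocator could actually hand out, which is the property the RFC wants NR_FREE_PAGES to have.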

Cc: Mel Gorman 
Cc: Kamezawa Hiroyuki 
Cc: Yasuaki Ishimatsu 
Cc: Konrad Rzeszutek Wilk 
Signed-off-by: Minchan Kim 
---

This is a very early version that shows the concept; I have only tested it
with a simple test, and it works. This patch needs an in-depth review from the
memory-hotplug developers at Fujitsu, because they have recently sent lots of
memory-hotplug patches. Please take a look at this patch.

 drivers/xen/balloon.c  |3 +-
 include/linux/mmzone.h |2 +-
 include/linux/page-isolation.h |   11 ++-
 mm/internal.h  |4 +
 mm/memory_hotplug.c|   38 +
 mm/page_alloc.c|   35 
 mm/page_isolation.c|  184 +++-
 7 files changed, 218 insertions(+), 59 deletions(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 31ab82f..617d7a3 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -66,7 +67,6 @@
 #include 
 #include 
 #include 
-
 /*
  * balloon_process() state:
  *
@@ -268,6 +268,7 @@ static void xen_online_page(struct page *page)
else
--balloon_stats.balloon_hotplug;
 
+   delete_from_isolated_list(page);
mutex_unlock(&balloon_mutex);
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..977dceb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -57,7 +57,7 @@ enum {
 */
MIGRATE_CMA,
 #endif
-   MIGRATE_ISOLATE,/* can't allocate from here */
+   MIGRATE_ISOLATE,
MIGRATE_TYPES
 };
 
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 105077a..a26eb8a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -1,11 +1,16 @@
 #ifndef __LINUX_PAGEISOLATION_H
 #define __LINUX_PAGEISOLATION_H
 
+extern struct list_head isolated_pages;
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
int migratetype);
+
+void isolate_free_page(struct page *page, unsigned int order);
+void delete_from_isolated_list(struct page *page);
+
 /*
  * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
  * If specified range includes migrate types other than MOVABLE or CMA,
@@ -20,9 +25,13 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 unsigned migratetype);
 
 /*
- * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
+ * Changes MIGRATE_ISOLATE to @migratetype.
  * target range is [start_pfn, end_pfn)
  */
+void
+undo_isolate_pageblock(unsigned long start_pfn, unsigned long end_pfn,
+   unsigned migratetype);
+
 int
 undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
unsigned migratetype);
diff --git a/mm/internal.h b/mm/internal.h
index 3314f79..4551179 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -96,6 +96,7 @@ extern void putback_lru_page(struct page *page);
  */
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
+extern int destroy_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
 extern bool is_free_buddy_page(struct page *page);
 #endif
@@ -144,6 +145,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
  * function for dealing with page's order in buddy system.
  * zone->lock is already acquired when we use these.
  * So, we don't need atomic page->flags operations here.
+ *
+ * Page order should be stored in page->private because
+ * memory-hotplug depends on it. See mm/page_isolation.c.
  */
 static inline unsigned long page_order(struct page *page)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..e297370 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -410,26 +410,29 @@ void __online_page_set_lim

Re: [eBeam PATCH 2/2] input: misc: New USB eBeam input driver.

2012-09-05 Thread Oliver Neukum
On Sunday 02 September 2012 00:52:03 Yann Cantin wrote:

Hi,

Before we add yet another sysfs interface, we should ask whether calibration
isn't a problem that should be solved with a common API.

Regards
Oliver

> +static ssize_t ebeam_calibrated_set(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf,
> +   size_t count)
> +{
> +   struct ebeam_device *ebeam = dev_get_drvdata(dev);
> +   int err, c;
> +
> +   err = kstrtoint(buf, 10, &c);
> +   if (err)
> +   return err;
> +
> +   if (c == 1) {
> +   memcpy(&ebeam->cursetting, &ebeam->newsetting,
> +  sizeof(struct ebeam_settings));
> +   ebeam->calibrated = true;
> +   ebeam_setup_input(ebeam, ebeam->input);
> +   } else {
> +   memcpy(&ebeam->newsetting, &ebeam->cursetting,
> +  sizeof(struct ebeam_settings));
> +   ebeam->calibrated = false;
> +   ebeam_setup_input(ebeam, ebeam->input);
> +   }
> +
> +   return count;
> +}
> +
> +static DEVICE_ATTR(calibrated, S_IRUGO | S_IWUGO,
> +  ebeam_calibrated_get, ebeam_calibrated_set);
> +
> +static struct attribute *ebeam_attrs[] = {
> +   &dev_attr_min_x.attr,
> +   &dev_attr_min_y.attr,
> +   &dev_attr_max_x.attr,
> +   &dev_attr_max_y.attr,
> +   &dev_attr_h1.attr,
> +   &dev_attr_h2.attr,
> +   &dev_attr_h3.attr,
> +   &dev_attr_h4.attr,
> +   &dev_attr_h5.attr,
> +   &dev_attr_h6.attr,
> +   &dev_attr_h7.attr,
> +   &dev_attr_h8.attr,
> +   &dev_attr_h9.attr,
> +   &dev_attr_calibrated.attr,
> +   NULL
> +};
> +
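The quoted store handler stages calibration values and then either commits or discards them. A minimal userspace sketch of that stage-then-commit scheme follows. `struct settings`, `struct calib_state`, `parse_int()` and `calibrated_store()` are hypothetical names, the settings are reduced to four fields, and `parse_int()` only approximates `kstrtoint(buf, 10, ...)` (it is built on `strtol()`, which also accepts leading whitespace).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subset of the driver's calibration settings. */
struct settings {
	int min_x, min_y, max_x, max_y;
};

struct calib_state {
	struct settings cur;     /* settings currently applied */
	struct settings pending; /* values staged via the other attributes */
	bool calibrated;
};

/* Rough userspace analogue of kstrtoint(buf, 10, &v). */
static int parse_int(const char *buf, int *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(buf, &end, 10);
	if (end == buf)
		return -EINVAL;         /* no digits at all */
	if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
		return -ERANGE;         /* does not fit in an int */
	if (*end == '\n')               /* sysfs writes usually end in '\n' */
		end++;
	if (*end != '\0')
		return -EINVAL;         /* trailing garbage */
	*out = (int)v;
	return 0;
}

/* Writing 1 commits the staged settings; any other value discards them. */
static int calibrated_store(struct calib_state *s, const char *buf)
{
	int c;
	int err = parse_int(buf, &c);

	if (err)
		return err;
	if (c == 1) {
		s->cur = s->pending;    /* struct copy, like the memcpy() */
		s->calibrated = true;
	} else {
		s->pending = s->cur;    /* roll the staged values back */
		s->calibrated = false;
	}
	return 0;
}
```

Keeping two copies lets a user stage several attribute writes and apply them atomically with a single write to `calibrated`.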
-- 
- - - 
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 
Maxfeldstraße 5 
90409 Nürnberg 
Germany 
- - - 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] i2c: nomadik: Add Device Tree support to the Nomadik I2C driver

2012-09-05 Thread Lee Jones
Author: Lee Jones 
Date:   Mon Aug 6 11:09:57 2012 +0100

i2c: nomadik: Add Device Tree support to the Nomadik I2C driver

Here we apply the bindings required for successful Device Tree
probing of the i2c-nomadik driver.

Cc: linux-...@vger.kernel.org
Signed-off-by: Lee Jones 

diff --git a/drivers/i2c/busses/i2c-nomadik.c b/drivers/i2c/busses/i2c-nomadik.c
index 61b00ed..5d1a970 100644
--- a/drivers/i2c/busses/i2c-nomadik.c
+++ b/drivers/i2c/busses/i2c-nomadik.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_NAME "nmk-i2c"
 
@@ -920,18 +921,42 @@ static struct nmk_i2c_controller u8500_i2c = {
.sm = I2C_FREQ_MODE_FAST,
 };
 
+static void nmk_i2c_of_probe(struct device_node *np,
+   struct nmk_i2c_controller *pdata)
+{
+   of_property_read_u32(np, "clock-frequency", &pdata->clk_freq);
+
+   /* This driver only supports 'standard' and 'fast' modes of operation. */
+   if (pdata->clk_freq <= 10)
+   pdata->sm = I2C_FREQ_MODE_STANDARD;
+   else
+   pdata->sm = I2C_FREQ_MODE_FAST;
+}
+
 static atomic_t adapter_id = ATOMIC_INIT(0);
 
 static int nmk_i2c_probe(struct amba_device *adev, const struct amba_id *id)
 {
int ret = 0;
struct nmk_i2c_controller *pdata = adev->dev.platform_data;
+   struct device_node *np = adev->dev.of_node;
struct nmk_i2c_dev  *dev;
struct i2c_adapter *adap;
 
-   if (!pdata)
-   /* No i2c configuration found, using the default. */
-   pdata = &u8500_i2c;
+   if (!pdata) {
+   if (np) {
+   pdata = devm_kzalloc(&adev->dev, sizeof(*pdata), GFP_KERNEL);
+   if (!pdata) {
+   ret = -ENOMEM;
+   goto err_no_mem;
+   }
+   /* Provide the default configuration as a base. */
+   memcpy(pdata, &u8500_i2c, sizeof(struct nmk_i2c_controller));
+   nmk_i2c_of_probe(np, pdata);
+   } else
+   /* No i2c configuration found, using the default. */
+   pdata = &u8500_i2c;
+   }
 
dev = kzalloc(sizeof(struct nmk_i2c_dev), GFP_KERNEL);
if (!dev) {
diff --git a/include/linux/platform_data/i2c-nomadik.h b/include/linux/platform_data/i2c-nomadik.h
index c2303c3..3a8be9c 100644
--- a/include/linux/platform_data/i2c-nomadik.h
+++ b/include/linux/platform_data/i2c-nomadik.h
@@ -28,7 +28,7 @@ enum i2c_freq_mode {
  * @sm:speed mode
  */
 struct nmk_i2c_controller {
-   unsigned long   clk_freq;
+   u32 clk_freq;
unsigned short  slsu;
unsigned char   tft;
unsigned char   rft;
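The DT path copies the built-in defaults and then lets the `clock-frequency` property override the bus speed, deriving the speed mode from the result. The sketch below models that in userspace. `struct i2c_cfg`, `default_cfg` and `cfg_from_dt()` are hypothetical, the 400 kHz default is assumed, and the 100 kHz cutoff is the I2C standard-mode limit, taken here as the threshold the driver intends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's enum and platform data. */
enum i2c_freq_mode {
	I2C_FREQ_MODE_STANDARD,	/* up to 100 kHz */
	I2C_FREQ_MODE_FAST,	/* up to 400 kHz */
};

struct i2c_cfg {
	uint32_t clk_freq;	/* Hz, as read from "clock-frequency" */
	enum i2c_freq_mode sm;
};

/* Defaults used as the base configuration (values assumed). */
static const struct i2c_cfg default_cfg = {
	.clk_freq = 400000,
	.sm = I2C_FREQ_MODE_FAST,
};

#define I2C_STANDARD_MODE_MAX_HZ 100000u

/*
 * Start from the defaults, then apply a DT-provided frequency.
 * @have_prop mimics of_property_read_u32() succeeding; when it fails,
 * the default frequency is kept.
 */
static struct i2c_cfg cfg_from_dt(int have_prop, uint32_t dt_freq)
{
	struct i2c_cfg cfg = default_cfg;	/* struct copy, like the memcpy() */

	if (have_prop)
		cfg.clk_freq = dt_freq;

	/* Only standard and fast modes are supported. */
	cfg.sm = (cfg.clk_freq <= I2C_STANDARD_MODE_MAX_HZ) ?
		 I2C_FREQ_MODE_STANDARD : I2C_FREQ_MODE_FAST;
	return cfg;
}
```

Copying the defaults first means a device tree node only has to describe what differs from the built-in configuration.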


Re: [PATCH] ALSA: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Takashi Iwai
At Wed, 5 Sep 2012 14:33:21 +0800,
Wei Yongjun wrote:
> 
> From: Wei Yongjun 
> 
> Using list_move_tail() instead of list_del() + list_add_tail().
> 
> Signed-off-by: Wei Yongjun 

Thanks, applied.


Takashi

> ---
>  sound/drivers/opl4/opl4_synth.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/sound/drivers/opl4/opl4_synth.c b/sound/drivers/opl4/opl4_synth.c
> index 49b9e24..4b91adc 100644
> --- a/sound/drivers/opl4/opl4_synth.c
> +++ b/sound/drivers/opl4/opl4_synth.c
> @@ -504,8 +504,7 @@ void snd_opl4_note_on(void *private_data, int note, int vel, struct snd_midi_cha
>   spin_lock_irqsave(&opl4->reg_lock, flags);
>   for (i = 0; i < voices; i++) {
>   voice[i] = snd_opl4_get_voice(opl4);
> - list_del(&voice[i]->list);
> - list_add_tail(&voice[i]->list, &opl4->on_voices);
> + list_move_tail(&voice[i]->list, &opl4->on_voices);
>   voice[i]->chan = chan;
>   voice[i]->note = note;
>   voice[i]->velocity = vel & 0x7f;
> @@ -555,8 +554,7 @@ void snd_opl4_note_on(void *private_data, int note, int vel, struct snd_midi_cha
>  
>  static void snd_opl4_voice_off(struct snd_opl4 *opl4, struct opl4_voice *voice)
>  {
> - list_del(&voice->list);
> - list_add_tail(&voice->list, &opl4->off_voices);
> + list_move_tail(&voice->list, &opl4->off_voices);
>  
>   voice->reg_misc &= ~OPL4_KEY_ON_BIT;
>   snd_opl4_write(opl4, OPL4_REG_MISC + voice->number, voice->reg_misc);
> @@ -571,8 +569,7 @@ void snd_opl4_note_off(void *private_data, int note, int vel, struct snd_midi_ch
>  
>  static void snd_opl4_terminate_voice(struct snd_opl4 *opl4, struct opl4_voice *voice)
>  {
> - list_del(&voice->list);
> - list_add_tail(&voice->list, &opl4->off_voices);
> + list_move_tail(&voice->list, &opl4->off_voices);
>  
>   voice->reg_misc = (voice->reg_misc & ~OPL4_KEY_ON_BIT) | OPL4_DAMP_BIT;
>   snd_opl4_write(opl4, OPL4_REG_MISC + voice->number, voice->reg_misc);
> 
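`list_move_tail()` unlinks an entry and re-links it just before the list head, so it ends in the same state as the `list_del()` + `list_add_tail()` pair it replaces (without the pointer poisoning `list_del()` performs in between). The sketch below is a userspace re-implementation modelled on `<linux/list.h>`, for illustration only; `list_count()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Link @entry between two nodes that are known to be adjacent. */
static void __list_add(struct list_head *entry,
		       struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert @entry before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head->prev, head);
}

/* Unlink @entry from whatever list it is on (no poisoning here). */
static void __list_del_entry(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Move @entry from its current list to the tail of @head. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);	/* moving the head itself makes no sense */
	__list_del_entry(entry);
	list_add_tail(entry, head);
}

/* Count the entries on @head, excluding the head itself. */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Using the single helper also makes the intent (move, not delete) explicit at each call site.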


Re: [PATCH] ALSA: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Takashi Iwai
At Wed, 5 Sep 2012 15:00:15 +0800,
Wei Yongjun wrote:
> 
> From: Wei Yongjun 
> 
> Using list_move_tail() instead of list_del() + list_add_tail().
> 
> Signed-off-by: Wei Yongjun 

Applied this one, too.  Thanks.


Takashi


> ---
>  sound/pci/emu10k1/memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/sound/pci/emu10k1/memory.c b/sound/pci/emu10k1/memory.c
> index 0a43662..1dce0a39 100644
> --- a/sound/pci/emu10k1/memory.c
> +++ b/sound/pci/emu10k1/memory.c
> @@ -263,8 +263,8 @@ int snd_emu10k1_memblk_map(struct snd_emu10k1 *emu, struct snd_emu10k1_memblk *b
>   spin_lock_irqsave(&emu->memblk_lock, flags);
>   if (blk->mapped_page >= 0) {
>   /* update order link */
> - list_del(&blk->mapped_order_link);
> - list_add_tail(&blk->mapped_order_link, &emu->mapped_order_link_head);
> + list_move_tail(&blk->mapped_order_link,
> +&emu->mapped_order_link_head);
>   spin_unlock_irqrestore(&emu->memblk_lock, flags);
>   return 0;
>   }
> 
> 


Re: [PATCH 1/1 v3] leds: Add new LED driver for lm355x chips

2012-09-05 Thread Bryan Wu
On Wed, Sep 5, 2012 at 3:05 PM, G.Shark Jeong  wrote:
> From: "G.Shark Jeong" 
>
> This driver is a general version for LM355x,lm3554 and lm3556,led chips of TI.
>
> LM3554 :
> The LM3554 is a 2 MHz fixed-frequency synchronous boost
> converter with 1.2A dual high side led drivers.
> Datasheet: www.ti.com/lit/ds/symlink/lm3554.pdf
>
> LM3556 :
> The LM3556 is a 4 MHz fixed-frequency synchronous boost
> converter plus 1.5A constant current driver for a high-current white LED.
> Datasheet: www.national.com/ds/LM/LM3556.pdf
>
> Signed-off-by: G.Shark Jeong 
> ---
>  drivers/leds/Kconfig  |8 +-
>  drivers/leds/Makefile |2 +-
>  drivers/leds/leds-lm3556.c|  512 --
>  drivers/leds/leds-lm355x.c|  572 +
>  include/linux/platform_data/leds-lm3556.h |   50 ---
>  include/linux/platform_data/leds-lm355x.h |   66 
>  6 files changed, 643 insertions(+), 567 deletions(-)
>  delete mode 100644 drivers/leds/leds-lm3556.c
>  create mode 100644 drivers/leds/leds-lm355x.c
>  delete mode 100644 include/linux/platform_data/leds-lm3556.h
>  create mode 100644 include/linux/platform_data/leds-lm355x.h
>
> diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
> index c96bbaa..4f6ced2 100644
> --- a/drivers/leds/Kconfig
> +++ b/drivers/leds/Kconfig
> @@ -422,13 +422,13 @@ config LEDS_MAX8997
>   This option enables support for on-chip LED drivers on
>   MAXIM MAX8997 PMIC.
>
> -config LEDS_LM3556
> -   tristate "LED support for LM3556 Chip"
> +config LEDS_LM355x
> +   tristate "LED support for LM355x Chips, LM3554 and LM3556"
> depends on LEDS_CLASS && I2C
> select REGMAP_I2C
> help
> - This option enables support for LEDs connected to LM3556.
> - LM3556 includes Torch, Flash and Indicator functions.
> + This option enables support for LEDs connected to LM355x.
> + LM355x includes Torch, Flash and Indicator functions.
>
>  config LEDS_OT200
> tristate "LED support for the Bachmann OT200"
> diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
> index a4429a9..b57a021 100644
> --- a/drivers/leds/Makefile
> +++ b/drivers/leds/Makefile
> @@ -48,7 +48,7 @@ obj-$(CONFIG_LEDS_NETXBIG)	+= leds-netxbig.o
>  obj-$(CONFIG_LEDS_ASIC3)   += leds-asic3.o
>  obj-$(CONFIG_LEDS_RENESAS_TPU) += leds-renesas-tpu.o
>  obj-$(CONFIG_LEDS_MAX8997) += leds-max8997.o
> -obj-$(CONFIG_LEDS_LM3556)  += leds-lm3556.o
> +obj-$(CONFIG_LEDS_LM355x)  += leds-lm355x.o
>  obj-$(CONFIG_LEDS_BLINKM)  += leds-blinkm.o
>
>  # LED SPI Drivers
> diff --git a/drivers/leds/leds-lm3556.c b/drivers/leds/leds-lm3556.c
> deleted file mode 100644
> index 3062abd..000
> --- a/drivers/leds/leds-lm3556.c
> +++ /dev/null
> @@ -1,512 +0,0 @@
> -/*
> - * Simple driver for Texas Instruments LM3556 LED Flash driver chip (Rev0x03)
> - * Copyright (C) 2012 Texas Instruments
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Please refer Documentation/leds/leds-lm3556.txt file.
> - */
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define REG_FILT_TIME  (0x0)
> -#define REG_IVFM_MODE  (0x1)
> -#define REG_NTC	(0x2)
> -#define REG_INDIC_TIME (0x3)
> -#define REG_INDIC_BLINK	(0x4)
> -#define REG_INDIC_PERIOD   (0x5)
> -#define REG_TORCH_TIME (0x6)
> -#define REG_CONF   (0x7)
> -#define REG_FLASH  (0x8)
> -#define REG_I_CTRL (0x9)
> -#define REG_ENABLE (0xA)
> -#define REG_FLAG   (0xB)
> -#define REG_MAX	(0xB)
> -
> -#define IVFM_FILTER_TIME_SHIFT (3)
> -#define UVLO_EN_SHIFT  (7)
> -#define HYSTERSIS_SHIFT	(5)
> -#define IVM_D_TH_SHIFT (2)
> -#define IVFM_ADJ_MODE_SHIFT	(0)
> -#define NTC_EVENT_LVL_SHIFT	(5)
> -#define NTC_TRIP_TH_SHIFT  (2)
> -#define NTC_BIAS_I_LVL_SHIFT   (0)
> -#define INDIC_RAMP_UP_TIME_SHIFT   (3)
> -#define INDIC_RAMP_DN_TIME_SHIFT   (0)
> -#define INDIC_N_BLANK_SHIFT	(4)
> -#define INDIC_PULSE_TIME_SHIFT (0)
> -#define INDIC_N_PERIOD_SHIFT   (0)
> -#define TORCH_RAMP_UP_TIME_SHIFT   (3)
> -#define TORCH_RAMP_DN_TIME_SHIFT   (0)
> -#define STROBE_USUAGE_SHIFT	(7)
> -#define STROBE_PIN_POLARITY_SHIFT  (6)
> -#define TORCH_PIN_POLARITY_SHIFT   (5)
> -#define TX_PIN_POLARITY_SHIFT  (4)
> -#def

Re: [PATCH] watchdog/imx2+: add support for pretimeout interrupt

2012-09-05 Thread Oskar Schirmer
Hi Wim,

On Tue, Jul 03, 2012 at 09:10:08 +, Oskar Schirmer wrote:
> This watchdog device provides pretimeout facilities:
> Set a timeout value and get notified of imminent
> watchdog activity through an interrupt.

I sent this patch a while ago, and it is as yet unprocessed.
Could you give it an ack?
If not, please let me know the reasons, so I can rework it.

thanks a lot,
  Oskar

> Allow the user to wait for this interrupt through poll(2)
> and to clear it through read(2).
> 
> Signed-off-by: Oskar Schirmer 
> Cc: Wim Van Sebroeck 
> Cc: Wolfram Sang 
> Cc: Andrew Morton 
> ---
> Resubmitted as it probably got lost on first attempt (2012/05/31)
> 
>  drivers/watchdog/imx2_wdt.c |  129 ++-
>  1 files changed, 128 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c
> index bcfab2b..09172c8 100644
> --- a/drivers/watchdog/imx2_wdt.c
> +++ b/drivers/watchdog/imx2_wdt.c
> @@ -21,11 +21,16 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -49,6 +54,11 @@
>  #define IMX2_WDT_WRSR		0x04		/* Reset Status Register */
>  #define IMX2_WDT_WRSR_TOUT	(1 << 1)	/* -> Reset due to Timeout */
>  
> +#define IMX2_WDT_WICT		0x06		/* Interrupt Control Reg */
> +#define IMX2_WDT_WICT_WIE	(1 << 15)	/* -> Interrupt Enable */
> +#define IMX2_WDT_WICT_WTIS	(1 << 14)	/* -> Timer Interrupt Status */
> +#define IMX2_WDT_WICT_WICT	(0xFF << 0)	/* -> Interrupt Count Timeout */
> +
>  #define IMX2_WDT_MAX_TIME	128
>  #define IMX2_WDT_DEFAULT_TIME	60		/* in seconds */
>  
> @@ -64,6 +74,11 @@ static struct {
>   unsigned timeout;
>   unsigned long status;
>   struct timer_list timer;/* Pings the watchdog when closed */
> + int irq;
> + spinlock_t read_lock;
> + wait_queue_head_t read_q;
> + unsigned char pretimer_once;
> + unsigned char pretimer_data;
>  } imx2_wdt;
>  
>  static struct miscdevice imx2_wdt_miscdev;
> @@ -81,7 +96,8 @@ MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default="
>  
>  static const struct watchdog_info imx2_wdt_info = {
>   .identity = "imx2+ watchdog",
> - .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE,
> + .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE |
> + WDIOF_PRETIMEOUT,
>  };
>  
>  static inline void imx2_wdt_setup(void)
> @@ -148,6 +164,28 @@ static void imx2_wdt_set_timeout(int new_timeout)
>   __raw_writew(val, imx2_wdt.base + IMX2_WDT_WCR);
>  }
>  
> +static void imx2_wdt_set_pretimeout(int new_pretimeout)
> +{
> + u16 val = new_pretimeout;
> +
> + if (val > 0)
> + val = (val << 1) | IMX2_WDT_WICT_WIE;
> + __raw_writew(val, imx2_wdt.base + IMX2_WDT_WICT);
> +}
> +
> +static void imx2_wdt_interrupt(int irq, void *dev_id)
> +{
> + u16 val;
> + spin_lock(&imx2_wdt.read_lock);
> + val = __raw_readw(imx2_wdt.base + IMX2_WDT_WICT);
> + if (val & IMX2_WDT_WICT_WTIS) {
> + __raw_writew(val, imx2_wdt.base + IMX2_WDT_WICT);
> + imx2_wdt.pretimer_data = 1;
> + }
> + wake_up_interruptible(&imx2_wdt.read_q);
> + spin_unlock(&imx2_wdt.read_lock);
> +}
> +
>  static int imx2_wdt_open(struct inode *inode, struct file *file)
>  {
>   if (test_and_set_bit(IMX2_WDT_STATUS_OPEN, &imx2_wdt.status))
> @@ -210,6 +248,26 @@ static long imx2_wdt_ioctl(struct file *file, unsigned int cmd,
>   case WDIOC_GETTIMEOUT:
>   return put_user(imx2_wdt.timeout, p);
>  
> + case WDIOC_SETPRETIMEOUT:
> + if (get_user(new_value, p))
> + return -EFAULT;
> + if ((new_value < 0) || (new_value >= IMX2_WDT_MAX_TIME))
> + return -EINVAL;
> + if (imx2_wdt.irq < 0)
> + return -EINVAL;
> + if (imx2_wdt.pretimer_once)
> + return -EPERM;
> + imx2_wdt.pretimer_once = 1;
> + imx2_wdt_set_pretimeout(new_value);
> +
> + case WDIOC_GETPRETIMEOUT:
> + val = __raw_readw(imx2_wdt.base + IMX2_WDT_WICT);
> + if (val & IMX2_WDT_WICT_WIE)
> + val = (val & IMX2_WDT_WICT_WICT) >> 1;
> + else
> + val = 0;
> + return put_user(val, p);
> +
>   default:
>   return -ENOTTY;
>   }
> @@ -237,6 +295,59 @@ static ssize_t imx2_wdt_write(struct file *file, const char __user *data,
>   return len;
>  }
>  
> +static ssize_t imx2_wdt_read(struct file *file, char __user *data,
> + size_t count, loff_t *ppos)
> +{
> + wait_queue_t wait;
> + unsigned long flags;
> +
> + if (count <= 0)
> + ret
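The quoted ioctl stores the pretimeout in the WICT register as `(seconds << 1) | WIE` and reads it back by masking the low byte and shifting right. The sketch below round-trips that encoding in userspace. `wict_encode()` and `wict_decode()` are hypothetical helpers, and reading the shift as half-second units is an inference from the code, not something the patch states. Because the ioctl rejects values of `IMX2_WDT_MAX_TIME` (128) and above, the largest accepted value, 127, encodes to 254 and still fits in the 8-bit count field.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as defined in the quoted patch. */
#define IMX2_WDT_WICT_WIE	(1 << 15)	/* interrupt enable */
#define IMX2_WDT_WICT_WICT	(0xFF << 0)	/* interrupt count field */
#define IMX2_WDT_MAX_TIME	128

/* Encode as imx2_wdt_set_pretimeout() does: 0 disables the pretimeout. */
static uint16_t wict_encode(unsigned int seconds)
{
	uint16_t val = (uint16_t)seconds;

	assert(seconds < IMX2_WDT_MAX_TIME);	/* the ioctl rejects >= 128 */
	if (val > 0)
		val = (uint16_t)((val << 1) | IMX2_WDT_WICT_WIE);
	return val;
}

/* Decode as the WDIOC_GETPRETIMEOUT branch does. */
static unsigned int wict_decode(uint16_t val)
{
	if (val & IMX2_WDT_WICT_WIE)
		return (val & IMX2_WDT_WICT_WICT) >> 1;
	return 0;
}
```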

Re: [GIT PULL] oprofile: fixes and updates

2012-09-05 Thread Ingo Molnar

* Robert Richter  wrote:

> Ingo,
> 
> one patch each for perf/urgent and perf/core, please pull from:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git urgent

Pulled into tip:perf/urgent,

> 
> and
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

Pulled into tip:perf/core.

Thanks Robert!

Ingo


Re: [PATCH 1/3] proc: return -ENOMEM when inode allocation failed

2012-09-05 Thread yan yan
2012/9/5 Cong Wang :
>>> Why the !memcmp() case is related with ENOMEM ??
>>
>>
>> We are presetting 'error' here. The following proc_get_inode() will try
>> to get an inode, either from inode cache or allocate a new one (and fill
>> it).
>>
>> If we get a NULL inode, that means allocation failed. That's how
>> ENOMEM involved.
>
>
> Then the following patch is probably better than yours:
>
>
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index b3647fe..6b22913 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -427,12 +427,16 @@ struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
>
> if (!memcmp(dentry->d_name.name, de->name, de->namelen)) {
> pde_get(de);
> spin_unlock(&proc_subdir_lock);
> -   error = -EINVAL;
> inode = proc_get_inode(dir->i_sb, de);
> +   if (!inode) {
> +   error = -ENOMEM;
> +   goto out_put;
> +   }
> goto out_unlock;
> }
> }
> spin_unlock(&proc_subdir_lock);
> +
>  out_unlock:
>
> if (inode) {
> @@ -440,6 +444,8 @@ out_unlock:
> d_add(dentry, inode);
> return NULL;
> }
> +out_put:
> +
> if (de)
> pde_put(de);
> return ERR_PTR(error);
>
>

That changes so many lines just to save an assignment to 'error'...

That's a style issue. I prefer a simple change, though your
change seems OK to me.
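The two positions in this exchange are two common C error-handling styles: presetting `error` just before a call whose only failure mode is known, versus assigning it at the point of failure. The sketch below contrasts them on a simplified lookup. `get_inode()`, `lookup_preset()` and `lookup_explicit()` are hypothetical stand-ins for `proc_get_inode()` and `proc_lookup_de()`, and the -ENOENT default for the not-found case is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator stub: returns NULL when @fail is set. */
static int dummy_inode;

static void *get_inode(int fail)
{
	return fail ? NULL : &dummy_inode;
}

/* Style A: preset the error before the call that may fail. */
static int lookup_preset(int found, int fail, void **out)
{
	int error = -ENOENT;	/* default: no matching entry */
	void *inode = NULL;

	assert(out != NULL);
	if (found) {
		error = -ENOMEM;	/* reported only if the allocation fails */
		inode = get_inode(fail);
	}
	if (inode) {
		*out = inode;
		return 0;
	}
	return error;
}

/* Style B: assign the error where the failure is detected. */
static int lookup_explicit(int found, int fail, void **out)
{
	void *inode;

	assert(out != NULL);
	if (!found)
		return -ENOENT;
	inode = get_inode(fail);
	if (!inode)
		return -ENOMEM;
	*out = inode;
	return 0;
}
```

Both functions return the same code for every input; the difference is only how far the reader has to look to see where -ENOMEM comes from.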

Thanks


Re: [PATCH -v3 14/14] x86, mm: Map ISA area with connected ram range at the same time

2012-09-05 Thread Pekka Enberg
On Wed, Sep 5, 2012 at 8:46 AM, Yinghai Lu  wrote:
> so could reduce one loop.
>
> Signed-off-by: Yinghai Lu 

How significant is the speed gain? The "isa_done" flag makes code flow
more difficult to follow.

> ---
>  arch/x86/mm/init.c |   21 ++---
>  1 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 6663f61..e69f832 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -248,20 +248,27 @@ static void __init walk_ram_ranges(
> void *data)
>  {
> unsigned long start_pfn, end_pfn;
> +   bool isa_done = false;
> int i;
>
> -   /* the ISA range is always mapped regardless of memory holes */
> -   work_fn(0, ISA_END_ADDRESS, data);
> -
> for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, NULL) {
> u64 start = start_pfn << PAGE_SHIFT;
> u64 end = end_pfn << PAGE_SHIFT;
>
> -   if (end <= ISA_END_ADDRESS)
> -   continue;
> +   if (!isa_done && start > ISA_END_ADDRESS) {
> +   work_fn(0, ISA_END_ADDRESS, data);
> +   isa_done = true;
> +   } else {
> +   if (end < ISA_END_ADDRESS)
> +   continue;
> +
> +   if (start <= ISA_END_ADDRESS &&
> +   end >= ISA_END_ADDRESS) {
> +   start = 0;
> +   isa_done = true;
> +   }
> +   }
>
> -   if (start < ISA_END_ADDRESS)
> -   start = ISA_END_ADDRESS;
>  #ifdef CONFIG_X86_32
> /* on 32 bit, we only map up to max_low_pfn */
> if ((start >> PAGE_SHIFT) >= max_low_pfn)
> --
> 1.7.7
>
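The patched loop is easier to follow as a standalone model. The sketch below re-implements the control flow of the new `walk_ram_ranges()` over a plain array of `[start, end)` ranges. `struct range`, `walk_ranges()`, `struct rec` and `record()` are hypothetical, and the 32-bit `max_low_pfn` clamp is left out. With RAM below 640 KiB and a second range starting exactly at 1 MiB, the ISA area is folded into the second range, so the callback runs once where the old code ran it twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ISA_END_ADDRESS 0x100000ULL	/* 1 MiB, as on x86 */

struct range {
	uint64_t start, end;	/* [start, end) */
};

typedef void (*work_fn_t)(uint64_t start, uint64_t end, void *data);

/* Simplified model of the patched walk_ram_ranges(). */
static void walk_ranges(const struct range *r, size_t n,
			work_fn_t work_fn, void *data)
{
	bool isa_done = false;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].start, end = r[i].end;

		if (!isa_done && start > ISA_END_ADDRESS) {
			/* first range above 1 MiB: map the ISA area on its own */
			work_fn(0, ISA_END_ADDRESS, data);
			isa_done = true;
		} else {
			if (end < ISA_END_ADDRESS)
				continue;	/* wholly inside the ISA area */
			if (start <= ISA_END_ADDRESS && end >= ISA_END_ADDRESS) {
				start = 0;	/* extend down to cover ISA */
				isa_done = true;
			}
		}
		work_fn(start, end, data);
	}
}

/* Recording callback used to observe which ranges get mapped. */
struct rec {
	size_t n;
	struct range calls[8];
};

static void record(uint64_t start, uint64_t end, void *data)
{
	struct rec *rc = data;

	assert(rc->n < 8);
	rc->calls[rc->n].start = start;
	rc->calls[rc->n].end = end;
	rc->n++;
}
```

One property of the model worth checking against the real patch: if every RAM range ends below `ISA_END_ADDRESS`, neither branch ever maps the ISA area, whereas the old code called `work_fn(0, ISA_END_ADDRESS, data)` unconditionally.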


Re: [PATCH -v3 13/14] x86, mm: Use func pointer to table size calculation and mapping

2012-09-05 Thread Pekka Enberg
On Wed, Sep 5, 2012 at 8:46 AM, Yinghai Lu  wrote:
> They all need to go over ram range in same sequence. So add shared function
> to reduce duplicated code.
>
> -v2: Change to walk_ram_ranges() according to Pekka Enberg.
>
> Signed-off-by: Yinghai Lu 

Reviewed-by: Pekka Enberg 


Re: [PATCH] xen: fix logical error in tlb flushing

2012-09-05 Thread Jan Beulich
>>> On 05.09.12 at 07:34, Alex Shi  wrote:
> On 08/25/2012 03:45 AM, Jan Beulich wrote:
> On 24.08.12 at 20:17, Konrad Rzeszutek Wilk wrote:
>>> How can I reproduce this
>>
>> I don't know, I spotted this while looking at the code.
> 
> Again, since the old buggy code doesn't cause trouble in PV guest, guess
> the hypercall for MMUEXT_INVLPG_MULTI was translated or treated as
> MMUEXT_TLB_FLUSH_MULTI. If so, believe correct this will bring a big
> performance benefit.

It's not clear to me what was buggy with the code prior to your
change. And no, there's no magic widening of the scope of these
MMU operations - if you ask the hypervisor for a single page
invalidation, that's what it's going to do. But of course, there
are cases where extra (full) invalidations need to be done
without a guest asking for them. But that's nothing a guest can
validly make itself dependent upon.

Jan



Re: [PATCH v7 1/1] ieee802154: MRF24J40 driver

2012-09-05 Thread Alexander Smirnov
Dear colleagues,

2012/9/4 David Miller :
> From: Alan Ott 
> Date: Sun,  2 Sep 2012 21:44:13 -0400
>
>> Driver for the Microchip MRF24J40 802.15.4 WPAN module.

I was on vacation, so I had no chance to review the code.
Alan, thank you for the contribution!

>>
>> Signed-off-by: Alan Ott 
>
> Applied to net-next, thanks.

Thanks David.

Alex


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 01:46 AM, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote:
>> As we have been extensively discussing, the cost and pain points for cgroups
>> come from many places. But at least one of those is the arbitrary nature of
>> hierarchies. Many people, including at least Tejun and me would like this to go
>> away altogether. Problem so far, is breaking compatibility with existing setups
>>
>> I am proposing here a default-n Kconfig option that will guarantee that the cpu
>> cgroups (for now) will be comounted. I started with them because the
>> cpu/cpuacct division is clearly the worst offender. Also, the default-n is here
>> so distributions will have time to adapt: Forcing this flag to be on without
>> userspace changes will just lead to cgroups failing to mount, which we don't
>> want.
>>
>> Although I've tested it and it works, I haven't compile-tested all possible
>> config combinations. So this is mostly for your eyes. If this gets traction,
>> I'll submit it properly, along with any changes that you might require.
> 
> As I said during the discussion, I'm skeptical about how useful this
> is.  This can't nudge existing users in any meaningfully gradual way.
> Kconfig doesn't make it any better.  It's still an abrupt behavior
> change when seen from userland.
>

The goal here is to have distributions do it, because they tend to
have well-defined lifecycle management, much more so than upstream. Whoever
sets this option can coordinate with upstream.

Aside from enforcing it, we can pretty much warn() as well, to direct
people towards flipping the switch.

> Also, I really don't see much point in enforcing this almost arbitrary
> grouping of controllers.  It doesn't simplify anything and using
> cpuacct in more granular way than cpu actually is one of the better
> justified use of multiple hierarchies.  Also, what about memcg and
> blkcg?  Do they *really* coincide?  Note that both blkcg and memcg
> involve non-trivial overhead and blkcg is essentially broken
> hierarchy-wise.
> 

Where did I mention memcg or blkcg in this patch?



[PATCH 1/2] mm: change enum migrate_mode with bitwise type

2012-09-05 Thread Minchan Kim
This patch changes migrate_mode to a bitwise type, because the
next patch will add MIGRATE_DISCARD, which can be ORed with other
attributes.
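Turning the mode into a set of bits lets callers combine an attribute such as MIGRATE_DISCARD with a base mode and test each one with a mask, as the `mode & MIGRATE_ASYNC` check in the follow-up patch does. A minimal userspace sketch follows. The 0x4 and 0x8 values come from the quoted patches, but the values for MIGRATE_ASYNC and MIGRATE_SYNC_LIGHT are assumptions, the empty `__bitwise__`/`__force` definitions stand in for the sparse annotations, and `mode_is_async()`/`mode_wants_discard()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Outside the kernel there is no sparse, so these expand to nothing.
 * Under sparse, __bitwise__ makes migrate_mode_t a distinct type that
 * cannot be mixed with plain integers without a __force cast.
 */
#define __bitwise__
#define __force

typedef unsigned __bitwise__ migrate_mode_t;

/* ASYNC and SYNC_LIGHT values are assumed for illustration. */
#define MIGRATE_ASYNC		((__force migrate_mode_t)0x1)
#define MIGRATE_SYNC_LIGHT	((__force migrate_mode_t)0x2)
#define MIGRATE_SYNC		((__force migrate_mode_t)0x4)
#define MIGRATE_DISCARD		((__force migrate_mode_t)0x8)

/* Once modes can be ORed, each attribute is tested with a mask. */
static bool mode_is_async(migrate_mode_t mode)
{
	return (mode & MIGRATE_ASYNC) != 0;
}

static bool mode_wants_discard(migrate_mode_t mode)
{
	return (mode & MIGRATE_DISCARD) != 0;
}
```

With an enum, an equality test such as `mode == MIGRATE_ASYNC` would stop matching as soon as MIGRATE_DISCARD is ORed in, which is why the comparisons become mask tests.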

Cc: Rik van Riel 
Cc: Mel Gorman 
Suggested-by: Michal Nazarewicz 
Signed-off-by: Minchan Kim 
---
 fs/btrfs/disk-io.c   |2 +-
 fs/hugetlbfs/inode.c |2 +-
 fs/nfs/internal.h|2 +-
 fs/nfs/write.c   |2 +-
 include/linux/fs.h   |4 ++--
 include/linux/migrate.h  |   10 +-
 include/linux/migrate_mode.h |   15 +--
 mm/migrate.c |   38 +++---
 8 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62e0caf..70fbbe1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -901,7 +901,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
struct page *newpage, struct page *page,
-   enum migrate_mode mode)
+   migrate_mode_t mode)
 {
/*
 * we can't safely write a btree page from here,
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7f8..2b254f9 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -604,7 +604,7 @@ static int hugetlbfs_set_page_dirty(struct page *page)
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
-   enum migrate_mode mode)
+   migrate_mode_t mode)
 {
int rc;
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 31fdb03..d554438 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -452,7 +452,7 @@ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-   struct page *, struct page *, enum migrate_mode);
+   struct page *, struct page *, migrate_mode_t);
 #else
 #define nfs_migrate_page NULL
 #endif
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e3b5537..093889b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1783,7 +1783,7 @@ out_error:
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-   struct page *page, enum migrate_mode mode)
+   struct page *page, migrate_mode_t mode)
 {
/*
 * If PagePrivate is set, then the page is currently associated with
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0b25c5d..a7fbdc6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -637,7 +637,7 @@ struct address_space_operations {
 * is false, it must not block.
 */
int (*migratepage) (struct address_space *,
-   struct page *, struct page *, enum migrate_mode);
+   struct page *, struct page *, migrate_mode_t);
int (*launder_page) (struct page *);
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
@@ -2734,7 +2734,7 @@ extern int generic_check_addressable(unsigned, u64);
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
struct page *, struct page *,
-   enum migrate_mode);
+   migrate_mode_t);
 #else
 #define buffer_migrate_page NULL
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..f7a50f5 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -11,13 +11,13 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 
 extern void putback_lru_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
-   struct page *, struct page *, enum migrate_mode);
+   struct page *, struct page *, migrate_mode_t);
 extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
-   enum migrate_mode mode);
+   migrate_mode_t mode);
 extern int migrate_huge_page(struct page *, new_page_t x,
unsigned long private, bool offlining,
-   enum migrate_mode mode);
+   migrate_mode_t mode);
 
 extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
@@ -35,10 +35,10 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 static inline void putback_lru_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
-   en

[PATCH 2/2] mm: support MIGRATE_DISCARD

2012-09-05 Thread Minchan Kim
This patch introduces a MIGRATE_DISCARD mode for migration.
It drops *clean cache pages* instead of migrating them, so that
migration latency can be reduced by avoiding the memcpy and page remapping.
This is useful for CMA, where migration latency matters more than
evicting the working set of background processes. In addition, it needs
fewer free pages as migration targets, so it can avoid reclaiming memory
to get free pages, which is another factor that increases latency.
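The decision added to `__unmap_and_move()` reduces to a single predicate: discard only when MIGRATE_DISCARD is requested, the page is file cache, and the page is clean; everything else still takes the TTU_MIGRATION path. The sketch below isolates that predicate. `struct page_model`, `MODE_DISCARD` and `should_discard()` are hypothetical stand-ins for `struct page`, `page_is_file_cache()`, `PageDirty()` and MIGRATE_DISCARD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: only the properties the decision looks at. */
struct page_model {
	bool file_backed;	/* page_is_file_cache() */
	bool dirty;		/* PageDirty() */
};

#define MODE_DISCARD 0x8u	/* stands in for MIGRATE_DISCARD */

/*
 * A page is dropped instead of copied only when discard was requested
 * and the page is clean page cache; anonymous or dirty pages still
 * have to be migrated, since their contents cannot be re-read later.
 */
static bool should_discard(unsigned int mode, const struct page_model *p)
{
	return (mode & MODE_DISCARD) && p->file_backed && !p->dirty;
}
```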

Cc: Marek Szyprowski 
Cc: Michal Nazarewicz 
Cc: Rik van Riel 
Cc: Mel Gorman 
Signed-off-by: Minchan Kim 
---
 include/linux/migrate_mode.h |7 +++
 mm/migrate.c |   41 ++---
 mm/page_alloc.c  |2 +-
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 8848cad..4eb1646 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -14,6 +14,13 @@
  */
 #define MIGRATE_SYNC   ((__force migrate_mode_t)0x4)
 
+/*
+ * MIGRATE_DISCARD will discard clean cache page instead of migration.
+ * MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC shouldn't be used
+ * together with OR flag in current implementation.
+ */
+#define MIGRATE_DISCARD   ((__force migrate_mode_t)0x8)
+
 typedef unsigned __bitwise__ migrate_mode_t;
 
 #endif /* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index 28d464b..2de7709 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -678,6 +678,19 @@ static int move_to_new_page(struct page *newpage, struct page *page,
return rc;
 }
 
+static int discard_page(struct page *page)
+{
+   int ret = -EAGAIN;
+
+   struct address_space *mapping = page_mapping(page);
+   if (page_has_private(page))
+   if (!try_to_release_page(page, GFP_KERNEL))
+   return ret;
+   if (remove_mapping(mapping, page))
+   ret = 0;
+   return ret;
+}
+
 static int __unmap_and_move(struct page *page, struct page *newpage,
int force, bool offlining, migrate_mode_t mode)
 {
@@ -685,6 +698,9 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
int remap_swapcache = 1;
struct mem_cgroup *mem;
struct anon_vma *anon_vma = NULL;
+   enum ttu_flags ttu_flags;
+   bool discard_mode = false;
+   bool file = false;
 
if (!trylock_page(page)) {
if (!force || (mode & MIGRATE_ASYNC))
@@ -799,12 +815,31 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
goto skip_unmap;
}
 
+   file = page_is_file_cache(page);
+   ttu_flags = TTU_IGNORE_ACCESS;
+retry:
+   if (!(mode & MIGRATE_DISCARD) || !file || PageDirty(page))
+   ttu_flags |= (TTU_MIGRATION | TTU_IGNORE_MLOCK);
+   else
+   discard_mode = true;
+
/* Establish migration ptes or remove ptes */
-   try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+   rc = try_to_unmap(page, ttu_flags);
 
 skip_unmap:
-   if (!page_mapped(page))
-   rc = move_to_new_page(newpage, page, remap_swapcache, mode);
+   if (rc == SWAP_SUCCESS) {
+   if (!discard_mode) {
+   rc = move_to_new_page(newpage, page,
+   remap_swapcache, mode);
+   } else {
+   rc = discard_page(page);
+   goto uncharge;
+   }
+   } else if (rc == SWAP_MLOCK && discard_mode) {
+   mode &= ~MIGRATE_DISCARD;
+   discard_mode = false;
+   goto retry;
+   }
 
if (rc && remap_swapcache)
remove_migration_ptes(page, page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ba3100a..e14b960 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5670,7 +5670,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 
ret = migrate_pages(&cc.migratepages,
__alloc_contig_migrate_alloc,
-   0, false, MIGRATE_SYNC);
+   0, false, MIGRATE_SYNC|MIGRATE_DISCARD);
}
 
putback_lru_pages(&cc.migratepages);
-- 
1.7.9.5



Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hello, Glauber.

On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote:
> The goal here is to have distributions to do it, because they tend to
> have a well defined lifecycle management, much more than upstream. Whoever
> sets this option, can coordinate with upstream.

Distros can just co-mount them during boot.  What's the point of the
config options?

> > Also, I really don't see much point in enforcing this almost arbitrary
> > grouping of controllers.  It doesn't simplify anything and using
> > cpuacct in more granular way than cpu actually is one of the better
> > justified use of multiple hierarchies.  Also, what about memcg and
> > blkcg?  Do they *really* coincide?  Note that both blkcg and memcg
> > involve non-trivial overhead and blkcg is essentially broken
> > hierarchy-wise.
> 
> Where did I mention memcg or blkcg in this patch ?

Differing hierarchies in memcg and blkcg currently is the most
prominent case where the intersection in writeback is problematic and
your proposed solution doesn't help one way or the other.  What's the
point?

Thanks.

-- 
tejun


Re: [PATCH v2] memcg: first step towards hierarchical controller

2012-09-05 Thread Glauber Costa
On 09/04/2012 08:25 PM, Michal Hocko wrote:
> On Tue 04-09-12 18:54:08, Glauber Costa wrote:
> [...]
 I'd personally believe merging both our patches together would achieve a
 good result.
>>>
>>> I am still not sure we want to add a config option for something that is
>>> meant to go away. But let's see what others think.
>>>
>>
>> So what you propose in the end is that we add a userspace tweak for
>> something that could go away, instead of a Kconfig for something that go
>> away.
> 
> The tweak is necessary only if you want to have use_hierarchy=1 for all
> cgroups without taking care about that (aka setting the attribute for
> the first level under the root). All the users that use only one level
> bellow root don't have to do anything at all.
> 
>> Way I see it, Kconfig is better because it is totally transparent, under
>> the hood, and will give us a single location to unpatch in case/when it
>> really goes away.
> 
> I guess that by the single location you mean that no other user space
> changes would have to be done, right? If yes then this is not true
> because there will be a lot of configurations setting this up already
> (either by cgconfig or by other scripts). All of them will have to be
> fixed some day.
> 

Some userspaces, not all. As for the ones that do set it:

They are either explicitly setting it to 0, and those are the ones we need
to find, or they are setting it to 1, which is harmless. If they were
all mandated to set it, fine. But they are not, and many others exist
that don't touch it at all. What you are proposing is that *all*
userspace tools that use it go and flip it, instead of doing it in the
kernel.

As I've said before, distributions have lifecycles where changes in
behavior like this are tolerated. Some of those lifecycles are
incredibly long, in the 5+ years range. It would be really nice if they
never saw use_hierarchy=0 *at all*, which is much better accomplished
by a kernel-side switch. With a Kconfig option, the choice is between
carrying either an upstream patch or no patch at all (depending on
timing), instead of carrying a non-standard patch.





[PATCH 01/10] net/macb: Add support for Gigabit Ethernet mode

2012-09-05 Thread Nicolas Ferre
From: Patrice Vilchez 

Add Gigabit Ethernet mode to the Cadence GEM IP and enable the RGMII connection.
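The link-change hunk below clears the speed/duplex bits and then sets SPD for 100 Mbit/s or, new with this patch, GBE for 1000 Mbit/s. A small sketch of that bit arithmetic, assuming GBE at bit 10 (GEM_GBE_OFFSET in the patch) and, as an assumption not shown in this excerpt, SPD at bit 0 and FD at bit 1. `ncfgr_for_link()` is hypothetical, and its `is_gem` guard on setting GBE is an addition of this sketch (in the driver, non-GEM PHYs are masked to PHY_BASIC_FEATURES, so 1000 Mbit/s should not occur there).

```c
#include <stdint.h>

#define SKETCH_BIT(off)   (1u << (off))
#define SKETCH_SPD_OFFSET 0  /* assumed */
#define SKETCH_FD_OFFSET  1  /* assumed */
#define SKETCH_GBE_OFFSET 10 /* GEM_GBE_OFFSET from the patch */

/* Compute a new NCFGR value from the old one and the PHY link state. */
static uint32_t ncfgr_for_link(uint32_t reg, int is_gem, int speed,
			       int full_duplex)
{
	/* Clear previous speed and duplex settings. */
	reg &= ~(SKETCH_BIT(SKETCH_SPD_OFFSET) | SKETCH_BIT(SKETCH_FD_OFFSET));
	if (is_gem)
		reg &= ~SKETCH_BIT(SKETCH_GBE_OFFSET);

	if (full_duplex)
		reg |= SKETCH_BIT(SKETCH_FD_OFFSET);
	if (speed == 100)
		reg |= SKETCH_BIT(SKETCH_SPD_OFFSET);
	if (speed == 1000 && is_gem)
		reg |= SKETCH_BIT(SKETCH_GBE_OFFSET);
	return reg;
}
```

For example, a GEM at 1000 Mbit/s full duplex ends up with bits 10 and 1 set (0x402), and dropping to 100 Mbit/s half duplex clears both and sets only SPD.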

Signed-off-by: Patrice Vilchez 
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   15 ---
 drivers/net/ethernet/cadence/macb.h |4 
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 033064b..9a10f69 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -152,13 +152,17 @@ static void macb_handle_link_change(struct net_device *dev)
 
reg = macb_readl(bp, NCFGR);
reg &= ~(MACB_BIT(SPD) | MACB_BIT(FD));
+   if (macb_is_gem(bp))
+   reg &= ~GEM_BIT(GBE);
 
if (phydev->duplex)
reg |= MACB_BIT(FD);
if (phydev->speed == SPEED_100)
reg |= MACB_BIT(SPD);
+   if (phydev->speed == SPEED_1000)
+   reg |= GEM_BIT(GBE);
 
-   macb_writel(bp, NCFGR, reg);
+   macb_or_gem_writel(bp, NCFGR, reg);
 
bp->speed = phydev->speed;
bp->duplex = phydev->duplex;
@@ -216,7 +220,10 @@ static int macb_mii_probe(struct net_device *dev)
}
 
/* mask with MAC supported features */
-   phydev->supported &= PHY_BASIC_FEATURES;
+   if (macb_is_gem(bp))
+   phydev->supported &= PHY_GBIT_FEATURES;
+   else
+   phydev->supported &= PHY_BASIC_FEATURES;
 
phydev->advertising = phydev->supported;
 
@@ -1384,7 +1391,9 @@ static int __init macb_probe(struct platform_device *pdev)
bp->phy_interface = err;
}
 
-   if (bp->phy_interface == PHY_INTERFACE_MODE_RMII)
+   if (bp->phy_interface == PHY_INTERFACE_MODE_RGMII)
+   macb_or_gem_writel(bp, USRIO, GEM_BIT(RGMII));
+   else if (bp->phy_interface == PHY_INTERFACE_MODE_RMII)
 #if defined(CONFIG_ARCH_AT91)
macb_or_gem_writel(bp, USRIO, (MACB_BIT(RMII) |
   MACB_BIT(CLKEN)));
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 335e288..f69ceef 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -145,6 +145,8 @@
 #define MACB_IRXFCS_SIZE   1
 
 /* GEM specific NCFGR bitfields. */
+#define GEM_GBE_OFFSET 10
+#define GEM_GBE_SIZE   1
 #define GEM_CLK_OFFSET 18
 #define GEM_CLK_SIZE   3
 #define GEM_DBW_OFFSET 21
@@ -246,6 +248,8 @@
 /* Bitfields in USRIO (AT91) */
 #define MACB_RMII_OFFSET   0
 #define MACB_RMII_SIZE 1
+#define GEM_RGMII_OFFSET   0   /* GEM gigabit mode */
+#define GEM_RGMII_SIZE 1
 #define MACB_CLKEN_OFFSET  1
 #define MACB_CLKEN_SIZE    1
 
-- 
1.7.10



[PATCH 03/10] net/macb: change debugging messages

2012-09-05 Thread Nicolas Ferre
From: Havard Skinnemoen 

Convert some noisy netdev_dbg() statements to netdev_vdbg(). Defining
DEBUG will no longer fill up the logs; VERBOSE_DEBUG still does.
Add one more verbose debug for ISR status.
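To illustrate the split, here is a userspace sketch with hypothetical `sketch_dbg()`/`sketch_vdbg()` macros standing in for netdev_dbg()/netdev_vdbg(). It assumes a build with DEBUG defined but not VERBOSE_DEBUG, so only the non-verbose call emits anything. The disabled variants keep their arguments inside `if (0)` so the compiler still type-checks the format string, which is roughly how the kernel's no-op netdev_vdbg() behaves.

```c
#include <stdio.h>

/* Assumed configuration: DEBUG on, VERBOSE_DEBUG off. */
#ifndef DEBUG
#define DEBUG
#endif

static int sketch_lines_logged; /* counts messages actually emitted */

#ifdef DEBUG
#define sketch_dbg(...)                          \
	do {                                     \
		sketch_lines_logged++;           \
		fprintf(stderr, __VA_ARGS__);    \
	} while (0)
#else
#define sketch_dbg(...) do { if (0) fprintf(stderr, __VA_ARGS__); } while (0)
#endif

#ifdef VERBOSE_DEBUG
#define sketch_vdbg(...) sketch_dbg(__VA_ARGS__)
#else
/* Compiled out, but the format string is still checked. */
#define sketch_vdbg(...) do { if (0) fprintf(stderr, __VA_ARGS__); } while (0)
#endif
```

Per-packet messages such as the TX status line would use the verbose variant, leaving plain DEBUG output for rarer events.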

Signed-off-by: Havard Skinnemoen 
[nicolas.fe...@atmel.com: split patch in topics, add ISR status]
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 26ca01e..2228dfc 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -313,7 +313,7 @@ static void macb_tx(struct macb *bp)
status = macb_readl(bp, TSR);
macb_writel(bp, TSR, status);
 
-   netdev_dbg(bp->dev, "macb_tx status = %02lx\n", (unsigned long)status);
+   netdev_vdbg(bp->dev, "macb_tx status = %02lx\n", (unsigned long)status);
 
if (status & (MACB_BIT(UND) | MACB_BIT(TSR_RLE))) {
int i;
@@ -380,7 +380,7 @@ static void macb_tx(struct macb *bp)
if (!(bufstat & MACB_BIT(TX_USED)))
break;
 
-   netdev_dbg(bp->dev, "skb %u (data %p) TX complete\n",
+   netdev_vdbg(bp->dev, "skb %u (data %p) TX complete\n",
   tail, skb->data);
dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
 DMA_TO_DEVICE);
@@ -406,7 +406,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 
len = MACB_BFEXT(RX_FRMLEN, bp->rx_ring[last_frag].ctrl);
 
-   netdev_dbg(bp->dev, "macb_rx_frame frags %u - %u (len %u)\n",
+   netdev_vdbg(bp->dev, "macb_rx_frame frags %u - %u (len %u)\n",
   first_frag, last_frag, len);
 
skb = netdev_alloc_skb(bp->dev, len + RX_OFFSET);
@@ -453,7 +453,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 
bp->stats.rx_packets++;
bp->stats.rx_bytes += len;
-   netdev_dbg(bp->dev, "received skb of length %u, csum: %08x\n",
+   netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
   skb->len, skb->csum);
netif_receive_skb(skb);
 
@@ -535,7 +535,7 @@ static int macb_poll(struct napi_struct *napi, int budget)
 
work_done = 0;
 
-   netdev_dbg(bp->dev, "poll: status = %08lx, budget = %d\n",
+   netdev_vdbg(bp->dev, "poll: status = %08lx, budget = %d\n",
   (unsigned long)status, budget);
 
work_done = macb_rx(bp, budget);
@@ -574,6 +574,8 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
break;
}
 
+   netdev_vdbg(bp->dev, "isr = 0x%08lx\n", (unsigned long)status);
+
if (status & MACB_RX_INT_FLAGS) {
/*
 * There's no point taking any more interrupts
@@ -585,7 +587,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
macb_writel(bp, IDR, MACB_RX_INT_FLAGS);
 
if (napi_schedule_prep(&bp->napi)) {
-   netdev_dbg(bp->dev, "scheduling RX softirq\n");
+   netdev_vdbg(bp->dev, "scheduling RX softirq\n");
__napi_schedule(&bp->napi);
}
}
@@ -647,8 +649,8 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
u32 ctrl;
unsigned long flags;
 
-#ifdef DEBUG
-   netdev_dbg(bp->dev,
+#if defined(DEBUG) && defined(VERBOSE_DEBUG)
+   netdev_vdbg(bp->dev,
   "start_xmit: len %u head %p data %p tail %p end %p\n",
   skb->len, skb->head, skb->data,
   skb_tail_pointer(skb), skb_end_pointer(skb));
@@ -670,12 +672,12 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
entry = bp->tx_head;
-   netdev_dbg(bp->dev, "Allocated ring entry %u\n", entry);
+   netdev_vdbg(bp->dev, "Allocated ring entry %u\n", entry);
mapping = dma_map_single(&bp->pdev->dev, skb->data,
 len, DMA_TO_DEVICE);
bp->tx_skb[entry].skb = skb;
bp->tx_skb[entry].mapping = mapping;
-   netdev_dbg(bp->dev, "Mapped skb data %p to DMA addr %08lx\n",
+   netdev_vdbg(bp->dev, "Mapped skb data %p to DMA addr %08lx\n",
   skb->data, (unsigned long)mapping);
 
ctrl = MACB_BF(TX_FRMLEN, len);
-- 
1.7.10



[PATCH 04/10] net/macb: Fix a race in macb_start_xmit()

2012-09-05 Thread Nicolas Ferre
From: Havard Skinnemoen 

Fix a race in macb_start_xmit() where we unconditionally set the TSTART bit.
If an underrun just happened (we do this with interrupts disabled, so it might
not have been handled yet), the controller starts transmitting from the first
entry in the ring, which is usually wrong.
Restart the controller after error handling.
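The two conditions the patch relies on can be sketched as pure functions over ring indices. This assumes a power-of-two ring where NEXT_TX() wraps with a mask (the size of 128 below is an assumption for illustration), and the `sketch_` names are hypothetical: transmit only sets TSTART when the frame just queued is the only one outstanding, and the interrupt handler restarts the controller when frames remain but TGO reports it idle.

```c
#include <stdbool.h>

#define SKETCH_TX_RING_SIZE 128 /* assumed; must be a power of two */
#define SKETCH_NEXT_TX(n)   (((n) + 1) & (SKETCH_TX_RING_SIZE - 1))

/* After filling the descriptor at the old head and advancing tx_head:
 * start the controller only if this frame is the sole pending one. */
static bool xmit_should_start(unsigned int tx_tail, unsigned int tx_head)
{
	return SKETCH_NEXT_TX(tx_tail) == tx_head;
}

/* After reclaiming completed descriptors in the interrupt handler:
 * restart if work remains but the transmitter is not running (TGO clear). */
static bool irq_should_restart(unsigned int tail, unsigned int head,
			       bool tx_go)
{
	return head != tail && !tx_go;
}
```

If a second frame is queued while an underrun interrupt is still pending, `xmit_should_start()` returns false and the restart is left to the interrupt path.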

Signed-off-by: Havard Skinnemoen 
[nicolas.fe...@atmel.com: split patch in topics]
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 2228dfc..f4b8adf 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -390,6 +390,13 @@ static void macb_tx(struct macb *bp)
dev_kfree_skb_irq(skb);
}
 
+   /*
+* Someone may have submitted a new frame while this interrupt
+* was pending, or we may just have handled an error.
+*/
+   if (head != tail && !(status & MACB_BIT(TGO)))
+   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+
bp->tx_tail = tail;
if (netif_queue_stopped(bp->dev) &&
TX_BUFFS_AVAIL(bp) > MACB_TX_WAKEUP_THRESH)
@@ -696,7 +703,18 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
skb_tx_timestamp(skb);
 
-   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+   /*
+* Only start the controller if the queue was empty; otherwise
+* we may race against the hardware resetting the ring pointer
+* due to a transmit error.
+*
+* If the controller is idle but the queue isn't empty, there
+* must be a pending interrupt that will trigger as soon as we
+* re-enable interrupts, and the interrupt handler will make
+* sure the controller is started.
+*/
+   if (NEXT_TX(bp->tx_tail) == bp->tx_head)
+   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 
if (TX_BUFFS_AVAIL(bp) < 1)
netif_stop_queue(dev);
-- 
1.7.10



Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 12:14 PM, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote:
>> The goal here is to have distributions to do it, because they tend to
>> have a well defined lifecycle management, much more than upstream. Whoever
>> sets this option, can coordinate with upstream.
> 
> Distros can just co-mount them during boot.  What's the point of the
> config options?
> 

Pretty simple. The kernel can't assume the distro did. And then we still
need to pay a stupid big price in the scheduler.

After this patchset, we can assume this. And cpuusage can be derived
entirely from the cpu cgroup, because, much more than "they can
comount", we can assume they did.

>>> Also, I really don't see much point in enforcing this almost arbitrary
>>> grouping of controllers.  It doesn't simplify anything and using
>>> cpuacct in more granular way than cpu actually is one of the better
>>> justified use of multiple hierarchies.  Also, what about memcg and
>>> blkcg?  Do they *really* coincide?  Note that both blkcg and memcg
>>> involve non-trivial overhead and blkcg is essentially broken
>>> hierarchy-wise.
>>
>> Where did I mention memcg or blkcg in this patch ?
> 
> Differing hierarchies in memcg and blkcg currently is the most
> prominent case where the intersection in writeback is problematic and
> your proposed solution doesn't help one way or the other.  What's the
> point?
> 

The point is that I am focusing on one problem at a time. But FWIW, I
don't see why memcg/blkcg can't use a step just like this one in a
separate pass.

If the goal is comounting them eventually, at some point when the issues
are sorted out, just do it. Get a switch like this one, and then you
will start being able to assume a lot of things in the code. Miracles
can happen.




Re: [PATCH] staging/rts_pstor: remove braces {} in sd.c

2012-09-05 Thread Toshiaki Yamane
On Wed, Sep 5, 2012 at 4:30 AM, Greg Kroah-Hartman  wrote:
> On Sat, Sep 01, 2012 at 10:43:00PM +0900, Toshiaki Yamane wrote:
>> fixed below checkpatch warnings.
>> -WARNING: braces {} are not necessary for single statement blocks
>> -WARNING: braces {} are not necessary for any arm of this statement
>>
>> Signed-off-by: Toshiaki Yamane 
>> ---
>>  drivers/staging/rts_pstor/sd.c | 1112 +---
>>  1 file changed, 469 insertions(+), 643 deletions(-)
>
> Why is the object file size changing with this patch applied?  That
> implies that something went wrong with your patch, care to redo it in a
> format that I can properly review it?

I understand.
I will redo it.

Thanks


Re: [RFC] module: signature infrastructure

2012-09-05 Thread Rusty Russell
"Kasatkin, Dmitry"  writes:
> Hi,
>
> Please read bellow...
>
> On Tue, Sep 4, 2012 at 8:55 AM, Rusty Russell  wrote:
>> OK, I took a look at the module.c parts of David and Dmitry's patchsets,
>> and didn't really like either, but I stole parts of David's to make
>> this.
>>
>> So, here's the module.c part of module signing.  I hope you two got time
>> to discuss the signature format details?  Mimi suggested a scheme where
>> the private key would never be saved on disk (even temporarily), but I
>> didn't see patches.  Frankly it's something we can do later; let's aim
>> at getting the format right for the next merge window.
>
> In our patches key is stored on the disc in encrypted format...

Oh, I missed that twist.  Thanks for the explanation.

On consideration, I prefer signing to be the final part of the "modules"
target rather than modules_install.  I run the latter as root, and that
is wrong for doing any code generation.

>> +   for (i = 0; i < *len - (sizeof(MODULE_SIG_STRING)-1); i++) {
>> +   /* Our memcmp is dumb, speed it up a little. */
>> +   if (((char *)mod)[i] != MODULE_SIG_STRING[0])
>> +   continue;
>> +   if (memcmp(mod, MODULE_SIG_STRING, strlen(MODULE_SIG_STRING)))
>
> should be (mod+i)?

Yes, indeed.  Thanks, fixed.

>> +   continue;
>> +
>> +   sig = mod + i + strlen(MODULE_SIG_STRING);
>> +   siglen = *len - i - strlen(MODULE_SIG_STRING);
>> +   *len = i;
>> +   break;
>> +   }
>
> In general please clarify why do you need such parsing at all?
> Why not to have MODULE_SIG_STRING as a last octets of the module and
> have signature length field before?
> Then it is easy to get the signature and rest of the module?
> That will be super fast...
>
> Please clarify.

Ignore performance, it's just not an issue here.  So the simplest code
wins.

And it's also simpler to sign a module this way.

(echo '~Module signature appended~'; gpg --sign ) >> mod.ko

Cheers,
Rusty.


Re: [PATCH 3/3] i2c: nomadik: Add Device Tree support to the Nomadik I2C driver

2012-09-05 Thread Linus Walleij
On Wed, Sep 5, 2012 at 9:33 AM, Lee Jones  wrote:

> Author: Lee Jones 
> Date:   Mon Aug 6 11:09:57 2012 +0100
>
> i2c: nomadik: Add Device Tree support to the Nomadik I2C driver
>
> Here we apply the bindings required for successful Device Tree
> probing of the i2c-nomadik driver.
>
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Lee Jones 

Excellent :-)
Reviewed-by: Linus Walleij 

Wolfram are you picking this up?

Yours,
Linus Walleij


Re: [RFC] module: signature infrastructure

2012-09-05 Thread Rusty Russell
Lucas De Marchi  writes:
> Hi Rusty,
>
> On Tue, Sep 4, 2012 at 2:55 AM, Rusty Russell  wrote:
>> @@ -2399,7 +2437,50 @@ static inline void kmemleak_load_module(const struct module *mod,
>>  }
>>  #endif
>>
>> -/* Sets info->hdr and info->len. */
>> +#ifdef CONFIG_MODULE_SIG
>> +static int module_sig_check(struct load_info *info,
>> +   void *mod, unsigned long *len)
>> +{
>> +   int err;
>> +   unsigned long i, siglen;
>> +   char *sig = NULL;
>> +
>> +   /* This is not a valid module: ELF header is larger anyway. */
>> +   if (*len < sizeof(MODULE_SIG_STRING))
>> +   return -ENOEXEC;
>> +
>> +   for (i = 0; i < *len - (sizeof(MODULE_SIG_STRING)-1); i++) {
>> +   /* Our memcmp is dumb, speed it up a little. */
>> +   if (((char *)mod)[i] != MODULE_SIG_STRING[0])
>> +   continue;
>
> Since the signature is appended to the module, why don't you go
> backwards, starting from *len - strlen(sizeof(MODULE_SIG_STRING)) and
> making this first comparison?

We've had this discussion multiple times.  Simple wins.  It's so
marginal, I don't really care, but I've changed it to:

int err;
unsigned long i, siglen, markerlen;
char *sig = NULL;

markerlen = strlen(MODULE_SIG_STRING);
/* This is not a valid module: ELF header is larger anyway. */
if (*len < markerlen)
return -ENOEXEC;

for (i = *len - markerlen; i > 0; i--) {
/* Our memcmp is dumb, speed it up a little. */
if (((char *)mod)[i] != MODULE_SIG_STRING[0])
continue;
if (memcmp(mod+i, MODULE_SIG_STRING, markerlen))
continue;

sig = mod + i + markerlen;
siglen = *len - i - markerlen;
*len = i;
break;
}

We could also implement memrchr(), or memrmem().  Hell, if we had
memmem() in the kernel I'd gladly use it.
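The backward scan described here is essentially a small memrmem(). A userspace sketch, assuming the marker text used in the `echo` signing command earlier in the thread (the kernel's actual MODULE_SIG_STRING may differ, for example by a trailing newline); `find_last_marker()` is hypothetical. Unlike the loop above, which stops at `i > 0`, this version also examines offset 0, which only matters for a general-purpose search since a real module starts with its ELF header.

```c
#include <stddef.h>
#include <string.h>

/* Assumed marker text; see the lead-in. */
static const char sketch_marker[] = "~Module signature appended~";

/* Return the offset of the last occurrence of the marker in buf, or -1. */
static long find_last_marker(const unsigned char *buf, size_t len)
{
	size_t mlen = sizeof(sketch_marker) - 1;
	size_t i;

	if (len < mlen)
		return -1;
	/* Walk backwards from the last possible start, down to offset 0. */
	for (i = len - mlen + 1; i-- > 0; ) {
		/* Cheap first-byte check before the full compare. */
		if (buf[i] != (unsigned char)sketch_marker[0])
			continue;
		if (memcmp(buf + i, sketch_marker, mlen) == 0)
			return (long)i;
	}
	return -1;
}
```

Given an offset `off`, the module body would be the first `off` bytes and the signature the `len - off - mlen` bytes after the marker, matching the `*len = i` / `siglen` split in the snippet above.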

> Or let the magic string as the last thing in the module and store the
> signature length, too. In this case no scanning is needed

Yes, they did that too, but append is simpler.  I don't even have to
think about endianness (Dmitry chose be32) or parsing (David chose
5-digit ascii numeric encoding).

Scanning the module is the least of our issues since we've just copied
it and we're about to SHA it.

Cheers,
Rusty.


[PATCH 00/10] net/macb: driver enhancement concerning GEM support, ring logic and cleanup

2012-09-05 Thread Nicolas Ferre
This is enhancement work that began several years ago. I am trying to catch
up with some performance improvements that Havard implemented at the time.
The ring index logic and the TX error path modifications are the biggest
changes, but some cleanup and debugging changes have been added along the way.
The GEM revision will benefit from the Gigabit support.

The series has been tested on several Atmel AT91 SoCs with the two MACB/GEM
flavors.

Havard Skinnemoen (5):
  net/macb: memory barriers cleanup
  net/macb: change debugging messages
  net/macb: Fix a race in macb_start_xmit()
  net/macb: clean up ring buffer logic
  net/macb: Offset first RX buffer by two bytes

Nicolas Ferre (4):
  net/macb: better manage tx errors
  net/macb: tx status is more than 8 bits now
  net/macb: macb_get_drvinfo: add GEM/MACB suffix to differentiate
revision
  net/macb: ethtool interface: add register dump feature

Patrice Vilchez (1):
  net/macb: Add support for Gigabit Ethernet mode

 drivers/net/ethernet/cadence/macb.c |  408 ---
 drivers/net/ethernet/cadence/macb.h |   29 ++-
 2 files changed, 304 insertions(+), 133 deletions(-)

-- 
1.7.10



[PATCH 02/10] net/macb: memory barriers cleanup

2012-09-05 Thread Nicolas Ferre
From: Havard Skinnemoen 

Remove a couple of unneeded barriers and document the remaining ones.
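The barriers being documented enforce a publish/consume order on DMA descriptors: write the payload fields, then hand over ownership; read ownership, then the payload. As a loose userspace analogy only (C11 release/acquire orders accesses between CPU threads and is not a substitute for wmb()/rmb() against a DMA-capable device), the same pattern looks like this; the descriptor layout and the ownership bit are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_OWNED (1u << 31) /* hypothetical "ready for consumer" bit */

struct sketch_desc {
	uint32_t addr;         /* payload, written before ownership flips */
	_Atomic uint32_t ctrl; /* length plus ownership bit */
};

/* Producer: fill the payload, then publish with release ordering
 * (the role wmb() plays before handing a descriptor to hardware). */
static void publish(struct sketch_desc *d, uint32_t addr, uint32_t len)
{
	d->addr = addr;
	atomic_store_explicit(&d->ctrl, len | SKETCH_OWNED,
			      memory_order_release);
}

/* Consumer: check ownership with acquire ordering, then read the payload
 * (the role rmb() plays after seeing a hardware-updated descriptor). */
static int consume(struct sketch_desc *d, uint32_t *addr, uint32_t *len)
{
	uint32_t ctrl = atomic_load_explicit(&d->ctrl, memory_order_acquire);

	if (!(ctrl & SKETCH_OWNED))
		return 0;
	*addr = d->addr;
	*len = ctrl & ~SKETCH_OWNED;
	return 1;
}
```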

Signed-off-by: Havard Skinnemoen 
[nicolas.fe...@atmel.com: split patch in topics]
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 9a10f69..26ca01e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -372,7 +372,9 @@ static void macb_tx(struct macb *bp)
 
BUG_ON(skb == NULL);
 
+   /* Make hw descriptor updates visible to CPU */
rmb();
+
bufstat = bp->tx_ring[tail].ctrl;
 
if (!(bufstat & MACB_BIT(TX_USED)))
@@ -415,7 +417,10 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
if (frag == last_frag)
break;
}
+
+   /* Make descriptor updates visible to hardware */
wmb();
+
return 1;
}
 
@@ -436,12 +441,14 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
   frag_len);
offset += RX_BUFFER_SIZE;
bp->rx_ring[frag].addr &= ~MACB_BIT(RX_USED);
-   wmb();
 
if (frag == last_frag)
break;
}
 
+   /* Make descriptor updates visible to hardware */
+   wmb();
+
skb->protocol = eth_type_trans(skb, bp->dev);
 
bp->stats.rx_packets++;
@@ -461,6 +468,8 @@ static void discard_partial_frame(struct macb *bp, unsigned int begin,
 
for (frag = begin; frag != end; frag = NEXT_RX(frag))
bp->rx_ring[frag].addr &= ~MACB_BIT(RX_USED);
+
+   /* Make descriptor updates visible to hardware */
wmb();
 
/*
@@ -479,7 +488,9 @@ static int macb_rx(struct macb *bp, int budget)
for (; budget > 0; tail = NEXT_RX(tail)) {
u32 addr, ctrl;
 
+   /* Make hw descriptor updates visible to CPU */
rmb();
+
addr = bp->rx_ring[tail].addr;
ctrl = bp->rx_ring[tail].ctrl;
 
@@ -674,6 +685,8 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
bp->tx_ring[entry].addr = mapping;
bp->tx_ring[entry].ctrl = ctrl;
+
+   /* Make newly initialized descriptor visible to hardware */
wmb();
 
entry = NEXT_TX(entry);
@@ -782,9 +795,6 @@ static void macb_init_rings(struct macb *bp)
 
 static void macb_reset_hw(struct macb *bp)
 {
-   /* Make sure we have the write buffer for ourselves */
-   wmb();
-
/*
 * Disable RX and TX (XXX: Should we halt the transmission
 * more gracefully?)
-- 
1.7.10



Re: [PATCH V2] regulator: tps6586x: register regulator even if no init data

2012-09-05 Thread Laxman Dewangan

On Wednesday 05 September 2012 02:17 AM, Stephen Warren wrote:

On 08/29/2012 09:01 AM, Laxman Dewangan wrote:

Register all TPS6586x regulators even if there is no regulator
init data for platform i.e. without any user-supplied constraints.

Signed-off-by: Laxman Dewangan

Tested-by: Stephen Warren

Note that this patch depends on the patch I just posted titled
"regulator: tps6586x: add support for SYS rail". I also believe Laxman
will be posting another patch based on these 2 soon (it will move the
regulator DT parsing out of the MFD driver into the regulator driver),
so I guess it makes sense to take them all through the same TPS6586x
topic branch in the regulator tree.


About the next patch (moving the regulator DT parsing out of the MFD
driver): do you want to support Harmony as well?
If so, the same change should also include the board-harmony-power.c
changes.



Re: [PATCH] pinctrl: tegra: move pinconf-tegra.h content into drivers/pinctrl

2012-09-05 Thread Linus Walleij
On Wed, Sep 5, 2012 at 12:39 AM, Stephen Warren  wrote:

> From: Stephen Warren 
>
> Now that Tegra's pinmux is configured solely from device tree, there's
> no need for the pinconf types to be defined in arch/arm/mach-tegra/.
> Move it into the pinctrl directory to clean up mach-tegra, as a pre-
> requisite for single-zImage.
>
> Signed-off-by: Stephen Warren 
> ---
> I'll need to take this through the Tegra tree, since it depends on some
> patches that remove inclusion of the deleted header.

Go for it:
Acked-by: Linus Walleij 

Yours,
Linus Walleij


Re: Why exported const value modified by another driver not updated in original driver

2012-09-05 Thread Manavendra Nath Manav
On Tue, Sep 4, 2012 at 5:55 PM, Dan Carpenter  wrote:
> On Tue, Sep 04, 2012 at 03:58:20PM +0530, Manavendra Nath Manav wrote:
>> Is the above a genuine kernel bug, or i am missing something out here. Pls help.
>>
>
> When you declare something as const then the compiler assumes it
> really is const and uses a literal instead of reading from memory.
> I'm surprised the compiler doesn't print a warning message.
>
> It has to do with compilers, nothing to do with kernels.
>
> regards,
> dan carpenter

Thanks All,
I understood the problem and the current gcc behaviour after looking at
the objdump output of driver.ko, first with the variable declared as
"const" and then as "const volatile". In the first case the compiler
optimises by passing the value directly; in the second it passes the
address of the variable. Thanks for all the help and clarification.
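A minimal way to reproduce the observation, assuming gcc and compiling with something like `gcc -O2 -S` to inspect the output: both functions below return 123, but only the load of the `const volatile` object is required to remain in the generated code. The names are hypothetical stand-ins for the exported variable. Note that writing to an object defined `const` is undefined behaviour in C regardless, so `volatile` only forces the re-read; it does not make the modification from another module legitimate.

```c
/* Hypothetical stand-ins for the exported variable. */
const int sketch_plain = 123;        /* value may be folded into callers */
const volatile int sketch_vol = 123; /* every read must access memory */

int read_plain(void)
{
	/* The compiler may substitute the constant 123 here. */
	return sketch_plain;
}

int read_vol(void)
{
	/* volatile forces an actual load on each call. */
	return sketch_vol;
}
```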

push   $0x7b // 123 in decimal
push   $0x0
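A minimal userspace sketch of the behaviour described above, assuming GCC with optimisation enabled (e.g. -O2). The variable and function names are hypothetical stand-ins for the exported variable in this thread. Because the initialiser of a plain "const" object is visible, the compiler may substitute the literal 123, which matches the "push $0x7b" above; adding "volatile" forces a load from the object's address on every read. Note that modifying an object that was defined "const", even through a cast pointer from another module, is undefined behaviour in C, so "const volatile" explains the generated code but does not make such a write legitimate.

```c
#include <assert.h>

/* Hypothetical stand-ins for the exported variable in this thread. */
const int plain_value = 123;
const volatile int volatile_value = 123;

/*
 * The initialiser is visible and the object is const, so with
 * optimisation the compiler may fold this to "return 123" (the
 * "push $0x7b" seen in the objdump output).
 */
int read_plain(void)
{
    return plain_value;
}

/* volatile: every call must load the value from the object's address. */
int read_volatile(void)
{
    return volatile_value;
}
```

Comparing "objdump -d" output for the two functions at -O2 should typically show an immediate operand in the first and a memory load in the second.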

-- 
Manavendra Nath Manav


Re: [RFC PATCH 5/17] input: rmidev character driver for RMI4 sensors

2012-09-05 Thread Linus Walleij
On Wed, Sep 5, 2012 at 2:26 AM, Christopher Heiny  wrote:
> On 08/27/2012 11:49 AM, Linus Walleij wrote:
>>
>> You need to patch your desired major number into
>> Documentation/devices.txt.
>
> We were going by the recommendation in Linux Device Drivers (3rd edition) to
> use dynamic major number allocation via alloc_chrdev_region.  In particular
> in section 3.2.3 it says "new numbers are not being assigned".  I guess at
> this point we need to know whether the info in LDD3 is authoritative or not.

You're right, go for dynamic numbers. I was plain wrong.

>>> +static struct class *rmidev_device_class;
>>
>>
>> Last time discussed with Greg, class devices were deprecated,
>> and you should just use a bus instead. (But not sure.)
>
> The references I found online weren't clear on this, so more investigation
> is required.  We'll defer that till we find out if the regmap changes
> eliminate the need for this.

Just push Greg to review next version and he'll tell you what to
do about this, no problem.

Yours,
Linus Walleij


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hello, Glauber.

On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote:
> > Distros can just co-mount them during boot.  What's the point of the
> > config options?
> 
> Pretty simple. The kernel can't assume the distro did. And then we still
> need to pay a stupid big price in the scheduler.
> 
> After this patchset, we can assume this. And cpuusage can totally be
> derived from the cpu cgroup. Because much more than "they can comount",
> we can assume they did.

As long as cpuacct and cpu are separate, I think it makes sense to
assume that they at least could be at different granularity.  As for
optimization for co-mounted case, if that is *really* necessary,
couldn't it be done dynamically?  It's not like CONFIG_XXX blocks are
pretty things and they're worse for runtime code path coverage.

> > Differing hierarchies in memcg and blkcg currently is the most
> > prominent case where the intersection in writeback is problematic and
> > your proposed solution doesn't help one way or the other.  What's the
> > point?
> 
> The point is that I am focusing on one problem at a time. But FWIW, I
> don't see why memcg/blkcg can't use a step just like this one in a
> separate pass.
> 
> If the goal is comounting them eventually, at some point when the issues
> are sorted out, just do it. Get a switch like this one, and then you
> will start being able to assume a lot of things in the code. Miracles
> can happen.

The problem is that I really don't see how this leads to where we
eventually wanna be.  Orthogonal hierarchies are bad because,

* It complicates the code.  This doesn't really help there much.

* Intersections between controllers are cumbersome to handle.  Again,
  this doesn't help much.

And this restricts the only valid use case for multiple hierarchies
which is applying differing levels of granularity depending on
controllers.  So, I don't know.  Doesn't seem like a good idea to me.

Thanks.

-- 
tejun


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 12:29 PM, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote:
>>> Distros can just co-mount them during boot.  What's the point of the
>>> config options?
>>
>> Pretty simple. The kernel can't assume the distro did. And then we still
>> need to pay a stupid big price in the scheduler.
>>
>> After this patchset, we can assume this. And cpuusage can totally be
>> derived from the cpu cgroup. Because much more than "they can comount",
>> we can assume they did.
> 
> As long as cpuacct and cpu are separate, I think it makes sense to
> assume that they at least could be at different granularity.  

If they are comounted, and more: forcibly comounted, I don't see how to
call them separate. At the very best, they are this way for
compatibility purposes only, to lay a path that would allow us to get
rid of the separation eventually.

> As for
> optimization for co-mounted case, if that is *really* necessary,
> couldn't it be done dynamically?  It's not like CONFIG_XXX blocks are
> pretty things and they're worse for runtime code path coverage.
> 

I've done it dynamically, as you know. But if you think that complicated
the code less than this, we're operating by very different standards...

CONFIG options can make the code uglier, but it is a lot more
predictable. It also guarantees that no state changes will happen during
the lifecycle of the machine. Doing it dynamically makes the code prettier,
but it is still large and prone to subtle bugs, as we've already
seen in practice.

>>> Differing hierarchies in memcg and blkcg currently is the most
>>> prominent case where the intersection in writeback is problematic and
>>> your proposed solution doesn't help one way or the other.  What's the
>>> point?
>>
>> The point is that I am focusing on one problem at a time. But FWIW, I
>> don't see why memcg/blkcg can't use a step just like this one in a
>> separate pass.
>>
>> If the goal is comounting them eventually, at some point when the issues
>> are sorted out, just do it. Get a switch like this one, and then you
>> will start being able to assume a lot of things in the code. Miracles
>> can happen.
> 
> The problem is that I really don't see how this leads to where we
> eventually wanna be.  Orthogonal hierarchies are bad because,
> 
> * It complicates the code.  This doesn't really help there much.
> 

Way I see it, it is the price we pay for having screwed up before.
And Kconfig options don't necessarily complicate the code. They make
it bigger, and possibly slightly harder to follow. But I myself

> * Intersections between controllers are cumbersome to handle.  Again,
>   this doesn't help much.
>

They are only cumbersome because we can't assume anything. The cpuacct is
the perfect example. Once we can start assuming, they become a lot less so.


> And this restricts the only valid use case for multiple hierarchies
> which is applying differing levels of granularity depending on
> controllers.  So, I don't know.  Doesn't seem like a good idea to me.
> 
> Thanks.
> 



Re: [PATCH] decnet: fix shutdown parameter checking

2012-09-05 Thread Steven Whitehouse
Hi,

On Fri, 2012-08-31 at 15:57 -0400, David Miller wrote:
> From: Steven Whitehouse 
> Date: Mon, 27 Aug 2012 10:16:41 +0100
> 
> > On Sun, 2012-08-26 at 22:37 -0400, Xi Wang wrote:
> >> The allowed value of "how" is SHUT_RD/SHUT_WR/SHUT_RDWR (0/1/2),
> >> rather than SHUTDOWN_MASK (3).
> >> 
> >> Signed-off-by: Xi Wang 
> > Acked-by: Steven Whitehouse 
> 
> Applied to net-next.
> 
> > Although it could be argued that we should also continue to accept the
> > value 3 just in case there is any userland software out there which
> > sends that value,
> 
> True, but this is a rather standard BSD socket interface with a very
> specific small set of legitimate input parameters.  Allowing
> deviation, even for compatibility for specific protocols, is largely
> unwise.

Yes, I'd agree on the whole, and certainly if this was a recent
addition. However since this code has been around for somewhere close to
16 years now, I'd say that means that either (a) nobody calls shutdown
for DECnet or (b) existing users are buggy too.

We do have a precedent for this kind of compatibility, such as the AX.25
use of SOCK_SEQPACKET.

However, I'm not overly worried and we'll soon know if it will cause any
problems or not.

Steve.
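A minimal userspace sketch of the rule the patch enforces, assuming a POSIX system where SHUT_RD, SHUT_WR and SHUT_RDWR are 0, 1 and 2, as the patch description states. The helper name check_shutdown_how() is hypothetical and this is not the DECnet code; it only shows that the old SHUTDOWN_MASK value of 3 falls outside the accepted set.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: shutdown()'s "how" argument must be one of
 * SHUT_RD (0), SHUT_WR (1) or SHUT_RDWR (2). The kernel-internal
 * SHUTDOWN_MASK value (3) is not a legal argument.
 */
static int check_shutdown_how(int how)
{
    switch (how) {
    case SHUT_RD:
    case SHUT_WR:
    case SHUT_RDWR:
        return 0;
    default:
        return -EINVAL;
    }
}
```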





[PATCH][trivial]vfs: Fix a typo in fs/libfs.c

2012-09-05 Thread ycnian
From: Yanchuan Nian 

Just a typo in the description of function generic_fh_to_parent. Please apply.

Signed-off-by: Yanchuan Nian 
---
 fs/libfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a74cb17..7cc37ca 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -874,7 +874,7 @@ struct dentry *generic_fh_to_dentry(struct super_block *sb, struct fid *fid,
 EXPORT_SYMBOL_GPL(generic_fh_to_dentry);
 
 /**
- * generic_fh_to_dentry - generic helper for the fh_to_parent export operation
+ * generic_fh_to_parent - generic helper for the fh_to_parent export operation
  * @sb:filesystem to do the file handle conversion on
  * @fid:   file handle to convert
  * @fh_len:length of the file handle in bytes
-- 
1.7.4.4



RE: [PATCH] DMA/RaidEngine: Enable FSL RaidEngine

2012-09-05 Thread Shi Xuelin-B29237
Hi Dan,

Do you have any comments about this RaidEngine patch?

Thanks,
Forrest

-----Original Message-----
From: Linuxppc-dev 
[mailto:linuxppc-dev-bounces+qiang.liu=freescale@lists.ozlabs.org] On 
Behalf Of Shi Xuelin-B29237
Sent: August 22, 2012 14:24
To: dan.j.willi...@gmail.com; vinod.k...@intel.com; 
linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
Cc: Rai Harninder-B01044; Rai Harninder-B01044; i...@ovro.caltech.edu; Burmi 
Naveen-B16502; Burmi Naveen-B16502; Shi Xuelin-B29237
Subject: [PATCH] DMA/RaidEngine: Enable FSL RaidEngine

From: Xuelin Shi 

The RaidEngine is new FSL hardware used for hardware acceleration of
RAID5/6.

This patch enables the RaidEngine functionality and provides hardware
offloading for memcpy, xor and raid6 pq computation. It works under
dmaengine control with the async_layer interface.
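For reference, a software sketch of the two RAID6 syndromes that the engine offloads, assuming the conventional formulation used by Linux's generic lib/raid6 code: P is the XOR of the data blocks, and Q is the sum of g^i * D_i over GF(2^8) with generator g = {02} and field polynomial 0x11d. This is not the RaidEngine driver code, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator {02} in GF(2^8), polynomial x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * Compute P = D_0 ^ D_1 ^ ... and Q = g^0*D_0 ^ g^1*D_1 ^ ... byte by byte.
 * data[i] points at data block i; every block, p and q are len bytes long.
 */
static void raid6_gen_pq(size_t ndisks, size_t len,
                         const uint8_t *const *data, uint8_t *p, uint8_t *q)
{
    assert(ndisks > 0);

    for (size_t off = 0; off < len; off++) {
        uint8_t wp = data[ndisks - 1][off];
        uint8_t wq = wp;

        /* Horner's rule: start at the highest disk, multiply by g each step */
        for (size_t i = ndisks - 1; i-- > 0; ) {
            wp ^= data[i][off];
            wq = (uint8_t)(gf_mul2(wq) ^ data[i][off]);
        }
        p[off] = wp;
        q[off] = wq;
    }
}
```

Recovering a single lost data block needs only P (XOR it with the surviving blocks); Q is what allows a second simultaneous failure to be corrected.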

Signed-off-by: Harninder Rai 
Signed-off-by: Naveen Burmi 
Signed-off-by: Xuelin Shi 
---
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi|1 +
 arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi |6 +
 arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi |   85 ++
 drivers/dma/Kconfig|   14 +
 drivers/dma/Makefile   |1 +
 drivers/dma/fsl_raid.c | 1090 
 drivers/dma/fsl_raid.h |  294 +++
 7 files changed, 1491 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
 create mode 100644 drivers/dma/fsl_raid.c
 create mode 100644 drivers/dma/fsl_raid.h

diff --git a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
index 64b6abe..5d7205b 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
@@ -354,4 +354,5 @@
 /include/ "qoriq-sata2-0.dtsi"
 /include/ "qoriq-sata2-1.dtsi"
 /include/ "qoriq-sec4.2-0.dtsi"
+/include/ "qoriq-raid1.0-0.dtsi"
 };
diff --git a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
index ae823a4..d54cd90 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
@@ -70,6 +70,12 @@
rtic_c = &rtic_c;
rtic_d = &rtic_d;
sec_mon = &sec_mon;
+
+   raideng = &raideng;
+   raideng_jr0 = &raideng_jr0;
+   raideng_jr1 = &raideng_jr1;
+   raideng_jr2 = &raideng_jr2;
+   raideng_jr3 = &raideng_jr3;
};
 
cpus {
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
new file mode 100644
index 000..8d2e8aa
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
@@ -0,0 +1,85 @@
+/*
+ * QorIQ RAID 1.0 device tree stub [ controller @ offset 0x32 ]
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+raideng: raideng@32 {
+   compatible = "fsl,raideng-v1.0";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   reg = <0x32 0x1>;
+   ranges = <0 0x32 0x1>;
+
+   raideng_jq0@1000 {
+   compatible = "fsl,rai

Re: [Drbd-dev] FLUSH/FUA documentation & code discrepancy

2012-09-05 Thread Philipp Reisner
> Currently, FLUSH/FUA doesn't enforce any ordering requirement.  File
> systems are responsible for draining all writes which have to happen
> before and not issue further writes which should come after.

Ok. That is a clear statement. So we will do it that way.

The "Currently" in your statement suggests that there might be something
more mighty in the future. Is that true?

We are looking for a method that allows us to submit some writes, then
an IO-barrier, and then further writes. 

Best,
 Phil


Re: [PATCH] perf/x86: Disable uncore on virtualized CPU.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 08:35 +0200, Ingo Molnar wrote:
> * Yan, Zheng  wrote:
> 
> > From: "Yan, Zheng" 
> > 
> > Initializing uncore PMU on virtualized CPU may hang the kernel.
> > This is because kvm does not emulate the entire hardware. There
> > are lots of uncore related MSRs, making kvm enumerate them all
> > is a non-trivial task. So just disable uncore on virtualized CPU.
> > 
> > Signed-off-by: Yan, Zheng 
> > ---
> >  arch/x86/kernel/cpu/perf_event_intel_uncore.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > index 0a55710..2f005ba 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > @@ -2898,6 +2898,9 @@ static int __init intel_uncore_init(void)
> > if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > return -ENODEV;
> >  
> > +   if (cpu_has_hypervisor)
> > +   return -ENODEV;
> > +
> > ret = uncore_pci_init();
> > if (ret)
> > goto fail;
> 
> Cannot the presence of the uncore hardware be detected in a 
> cleaner fashion, via the PCI config space and such?

No, part of the uncore PMUs are in MSR space and aren't discoverable.
CPUID model checks + hard assumptions of presence are all that we are
left with.

Now Avi suggested we teach KVM about these MSRs and then modify the
uncore driver to test if the MSRs actually work -- as in retain values
written to them and aren't always 0.

That's a larger patch though, partly because enumerating the gazillion
MSRs consumed by the various uncore PMUs is a tedious job, and we can
always do this later.

This patch is a minimal patch to at least make things 'work' for now.
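For illustration, a userspace sketch of what the cpu_has_hypervisor test in the patch checks, assuming an x86 target and a GCC or Clang toolchain that provides <cpuid.h>: CPUID leaf 1 reports a "hypervisor present" flag in bit 31 of ECX, which is reserved (zero) on physical CPUs and set by most hypervisors. The function name is hypothetical. The bit only says that a hypervisor is present, not which uncore MSRs it emulates.

```c
#include <assert.h>
#include <cpuid.h>
#include <stdbool.h>

/*
 * Userspace counterpart of the kernel's cpu_has_hypervisor test:
 * read CPUID leaf 1 and return ECX bit 31.
 */
static bool running_under_hypervisor(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;   /* leaf 1 not available */
    return (ecx >> 31) & 1u;
}
```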



Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hello, Glauber.

On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote:
> > As long as cpuacct and cpu are separate, I think it makes sense to
> > assume that they at least could be at different granularity.  
> 
> If they are comounted, and more: forcibly comounted, I don't see how to
> call them separate. At the very best, they are this way for
> compatibility purposes only, to lay a path that would allow us to get
> rid of the separation eventually.

I think this is where we disagree.  I didn't mean that all controllers
should be using exactly the same hierarchy when I was talking about
unified hierarchy.  I do think it's useful and maybe even essential to
allow differing levels of granularity.  cpu and cpuacct could be a
valid example for this.  Likely blkcg and memcg too.

So, I think it's desirable for all controllers to be able to handle
hierarchies the same way and to have the ability to tag something as
belonging to a certain group in the hierarchy for all controllers but I
don't think it's desirable or feasible to require all of them to
follow exactly the same grouping at all levels.

Thanks.

-- 
tejun


Re: [Drbd-dev] FLUSH/FUA documentation & code discrepancy

2012-09-05 Thread Tejun Heo
On Wed, Sep 05, 2012 at 10:44:55AM +0200, Philipp Reisner wrote:
> > Currently, FLUSH/FUA doesn't enforce any ordering requirement.  File
> > systems are responsible for draining all writes which have to happen
> > before and not issue further writes which should come after.
> 
> Ok. That is a clear statement. So we will do it that way.
> 
> The "Currently" in your statement suggests that there might be something
> more mighty in the future. Is that true?

Heh, I was more thinking about the past.  We used to have barrier
support with much stricter ordering.  I don't think we're gonna change
the ordering requirement in any foreseeable future.
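As a rough userspace analogue of the rule above (the submitter, not FLUSH/FUA, enforces ordering), this sketch writes a payload, waits for it with fdatasync(), and only then issues the dependent commit record. It assumes a POSIX system and a regular file descriptor; inside the block layer the corresponding step would be waiting for completion of the earlier writes before submitting the later ones. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * "Some writes, then a barrier, then further writes", done by hand:
 * fdatasync() asks for the payload to reach stable storage and waits
 * for it before the dependent commit record is written.
 */
static int ordered_commit(int fd, const char *payload, const char *commit)
{
    if (write_all(fd, payload, strlen(payload)) != 0)
        return -1;
    if (fdatasync(fd) != 0)     /* drain and flush before continuing */
        return -1;
    if (write_all(fd, commit, strlen(commit)) != 0)
        return -1;
    return fdatasync(fd);
}
```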

Thanks.

-- 
tejun


Re: [PATCH v2] powerpc: fix personality handling in ppc64_personality()

2012-09-05 Thread Jiri Kosina
On Wed, 5 Sep 2012, Benjamin Herrenschmidt wrote:

> > Directly comparing current->personality against PER_LINUX32 doesn't work
> > in cases when any of the personality flags stored in the top three bytes
> > are used.
> > 
> > Forcefully setting the personality to PER_LINUX32 or PER_LINUX
> > discards any flags stored in the top three bytes.
> > 
> > Use personality() macro to compare only PER_MASK bytes and make sure that
> > we are setting only the bits that should be set, instead of
> > overwriting the whole value.
> > 
> > Signed-off-by: Jiri Kosina 
> > ---
> > 
> > changed since v1: fix the bit ops to reflect the fact that PER_LINUX is 
> > actually 0
> 
> Had already merged v1 (oops.. didn't spot the issue with PER_LINUX being
> 0). Can you send an incremental fixup ?

Hi Benjamin,

actually commit 7256a5d2da56 seems to contain the correct PER_LINUX 
handling, so seems like you picked the right one :)
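A small userspace sketch of the masking the patch describes. The constants are local copies of the values in the kernel's personality header (PER_MASK 0x00ff, PER_LINUX 0, PER_LINUX32 0x0008, ADDR_NO_RANDOMIZE 0x0040000), renamed with a hypothetical EX_ prefix to avoid clashing with the C library's personality() function; the helper names are hypothetical as well. Since PER_LINUX is 0, OR-ing it in cannot clear PER_LINUX32, so the old base value has to be masked out first.

```c
#include <assert.h>

/* Local copies of values from the kernel's personality header. */
#define EX_PER_MASK          0x00ffu
#define EX_PER_LINUX         0x0000u
#define EX_PER_LINUX32       0x0008u
#define EX_ADDR_NO_RANDOMIZE 0x0040000u

/* Equivalent of the kernel's personality() macro: keep only the base type. */
static unsigned int per_base(unsigned int pers)
{
    return pers & EX_PER_MASK;
}

/* Replace the base type while preserving the flag bits above PER_MASK. */
static unsigned int per_set_base(unsigned int pers, unsigned int base)
{
    return (pers & ~EX_PER_MASK) | base;
}
```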

Thanks,

-- 
Jiri Kosina
SUSE Labs


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 12:47 PM, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote:
>>> As long as cpuacct and cpu are separate, I think it makes sense to
>>> assume that they at least could be at different granularity.  
>>
>> If they are comounted, and more: forcibly comounted, I don't see how to
>> call them separate. At the very best, they are this way for
>> compatibility purposes only, to lay a path that would allow us to get
>> rid of the separation eventually.
> 
> I think this is where we disagree.  I didn't mean that all controllers
> should be using exactly the same hierarchy when I was talking about
> unified hierarchy.  I do think it's useful and maybe even essential to
> allow differing levels of granularity.  cpu and cpuacct could be a
> valid example for this.  Likely blkcg and memcg too.
> 
> So, I think it's desirable for all controllers to be able to handle
> hierarchies the same way and to have the ability to tag something as
> belonging to a certain group in the hierarchy for all controllers but I
> don't think it's desirable or feasible to require all of them to
> follow exactly the same grouping at all levels.
> 

By "different levels of granularity" do you mean having just a subset of
them turned on at a particular place?

If yes, having them guaranteed to be comounted is still perceived by me
as a good first step. A natural following would be to turn them on/off
on a per-group basis.






[PATCH 05/10] net/macb: clean up ring buffer logic

2012-09-05 Thread Nicolas Ferre
From: Havard Skinnemoen 

Instead of masking head and tail every time we increment them, just let them
wrap through UINT_MAX and mask them when subscripting. Add simple accessor
functions to do the subscripting properly to minimize the chances of messing
this up.

This makes the code slightly smaller, and hopefully faster as well.  Also,
doing the ring buffer management this way will simplify things a lot when
making the ring sizes configurable in the future.
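A minimal sketch of the free-running index scheme described above, assuming a power-of-two ring size so that masking stays consistent when the unsigned counters wrap through UINT_MAX. The names are illustrative and do not come from the driver. The number of entries in use is simply head - tail in unsigned arithmetic, so all RING_SIZE slots can be used without keeping a gap entry.

```c
#include <assert.h>
#include <limits.h>

#define RING_SIZE 8u   /* must be a power of two */
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
    unsigned int head;   /* next slot to fill; never masked */
    unsigned int tail;   /* next slot to drain; never masked */
    int slot[RING_SIZE];
};

/* Mask only when subscripting. */
static unsigned int ring_wrap(unsigned int index)
{
    return index & (RING_SIZE - 1);
}

/* Unsigned subtraction stays correct across wrap-around. */
static unsigned int ring_used(const struct ring *r)
{
    return r->head - r->tail;
}

static unsigned int ring_avail(const struct ring *r)
{
    return RING_SIZE - ring_used(r);
}

static int ring_push(struct ring *r, int value)
{
    if (ring_avail(r) == 0)
        return -1;
    r->slot[ring_wrap(r->head++)] = value;
    return 0;
}

static int ring_pop(struct ring *r, int *value)
{
    if (ring_used(r) == 0)
        return -1;
    *value = r->slot[ring_wrap(r->tail++)];
    return 0;
}
```

Starting both indices at UINT_MAX - 1 and pushing three entries leaves head at 1, while ring_used() still reports 3.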

Signed-off-by: Havard Skinnemoen 
[nicolas.fe...@atmel.com: split patch in topics, adapt to newer kernel]
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |  170 ++-
 drivers/net/ethernet/cadence/macb.h |   22 +++--
 2 files changed, 123 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index f4b8adf..3d3a077 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -31,24 +31,13 @@
 
 #define RX_BUFFER_SIZE 128
 #define RX_RING_SIZE   512
-#define RX_RING_BYTES  (sizeof(struct dma_desc) * RX_RING_SIZE)
+#define RX_RING_BYTES  (sizeof(struct macb_dma_desc) * RX_RING_SIZE)
 
 /* Make the IP header word-aligned (the ethernet header is 14 bytes) */
 #define RX_OFFSET  2
 
 #define TX_RING_SIZE   128
-#define DEF_TX_RING_PENDING(TX_RING_SIZE - 1)
-#define TX_RING_BYTES  (sizeof(struct dma_desc) * TX_RING_SIZE)
-
-#define TX_RING_GAP(bp)\
-   (TX_RING_SIZE - (bp)->tx_pending)
-#define TX_BUFFS_AVAIL(bp) \
-   (((bp)->tx_tail <= (bp)->tx_head) ? \
-(bp)->tx_tail + (bp)->tx_pending - (bp)->tx_head : \
-(bp)->tx_tail - (bp)->tx_head - TX_RING_GAP(bp))
-#define NEXT_TX(n) (((n) + 1) & (TX_RING_SIZE - 1))
-
-#define NEXT_RX(n) (((n) + 1) & (RX_RING_SIZE - 1))
+#define TX_RING_BYTES  (sizeof(struct macb_dma_desc) * TX_RING_SIZE)
 
 /* minimum number of free TX descriptors before waking up TX process */
 #define MACB_TX_WAKEUP_THRESH  (TX_RING_SIZE / 4)
@@ -56,6 +45,51 @@
 #define MACB_RX_INT_FLAGS  (MACB_BIT(RCOMP) | MACB_BIT(RXUBR)  \
 | MACB_BIT(ISR_ROVR))
 
+/* Ring buffer accessors */
+static unsigned int macb_tx_ring_wrap(unsigned int index)
+{
+   return index & (TX_RING_SIZE - 1);
+}
+
+static unsigned int macb_tx_ring_avail(struct macb *bp)
+{
+   return TX_RING_SIZE - (bp->tx_head - bp->tx_tail);
+}
+
+static struct macb_dma_desc *macb_tx_desc(struct macb *bp, unsigned int index)
+{
+   return &bp->tx_ring[macb_tx_ring_wrap(index)];
+}
+
+static struct macb_tx_skb *macb_tx_skb(struct macb *bp, unsigned int index)
+{
+   return &bp->tx_skb[macb_tx_ring_wrap(index)];
+}
+
+static dma_addr_t macb_tx_dma(struct macb *bp, unsigned int index)
+{
+   dma_addr_t offset;
+
+   offset = macb_tx_ring_wrap(index) * sizeof(struct macb_dma_desc);
+
+   return bp->tx_ring_dma + offset;
+}
+
+static unsigned int macb_rx_ring_wrap(unsigned int index)
+{
+   return index & (RX_RING_SIZE - 1);
+}
+
+static struct macb_dma_desc *macb_rx_desc(struct macb *bp, unsigned int index)
+{
+   return &bp->rx_ring[macb_rx_ring_wrap(index)];
+}
+
+static void *macb_rx_buffer(struct macb *bp, unsigned int index)
+{
+   return bp->rx_buffers + RX_BUFFER_SIZE * macb_rx_ring_wrap(index);
+}
+
 static void __macb_set_hwaddr(struct macb *bp)
 {
u32 bottom;
@@ -335,17 +369,18 @@ static void macb_tx(struct macb *bp)
bp->tx_ring[TX_RING_SIZE - 1].ctrl |= MACB_BIT(TX_WRAP);
 
/* free transmit buffer in upper layer*/
-   for (tail = bp->tx_tail; tail != head; tail = NEXT_TX(tail)) {
-   struct ring_info *rp = &bp->tx_skb[tail];
-   struct sk_buff *skb = rp->skb;
-
-   BUG_ON(skb == NULL);
+   for (tail = bp->tx_tail; tail != head; tail++) {
+   struct macb_tx_skb  *tx_skb;
+   struct sk_buff  *skb;
 
rmb();
 
-   dma_unmap_single(&bp->pdev->dev, rp->mapping, skb->len,
-DMA_TO_DEVICE);
-   rp->skb = NULL;
+   tx_skb = macb_tx_skb(bp, tail);
+   skb = tx_skb->skb;
+
+   dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
+   skb->len, DMA_TO_DEVICE);
+   tx_skb->skb = NULL;
dev_kfree_skb_irq(skb);
}
 
@@ -365,28 +400,32 @@ static void macb_tx(struct macb *bp)
return;
 
head = bp->tx_head;
-   for (tail = bp->tx_tail; tail != head; tail = NEXT_TX(tail)) {
-

[PATCH 07/10] net/macb: tx status is more than 8 bits now

2012-09-05 Thread Nicolas Ferre
On some revisions of GEM, the TSR status register carries more than 8 bits of information.

Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index af71151..bd331fd 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -390,7 +390,7 @@ static void macb_tx_interrupt(struct macb *bp)
status = macb_readl(bp, TSR);
macb_writel(bp, TSR, status);
 
-   netdev_vdbg(bp->dev, "macb_tx_interrupt status = %02lx\n",
+   netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
(unsigned long)status);
 
head = bp->tx_head;
-- 
1.7.10



[PATCH 06/10] net/macb: better manage tx errors

2012-09-05 Thread Nicolas Ferre
Handle all TX errors, not only underruns.
Reinitialize the TX ring after skipping all remaining frames, and
restart the controller when everything has been cleaned up properly.

Original idea from a patch by Havard Skinnemoen.
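A simplified sketch of the clean-up step described above, under stated assumptions: the descriptor and slot structures, the TX_USED bit position and the release callback are hypothetical, and resetting tail to head at the end is this sketch's choice rather than something shown in the patch. Every frame from the failing entry up to head is released and its descriptor handed back to software; in the driver a write barrier (wmb()) then makes the descriptor updates visible before the controller is restarted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_RING_SIZE 8u          /* power of two */
#define TX_USED      (1u << 31)  /* assumed "owned by software" bit */

struct tx_desc { unsigned int ctrl; };
struct tx_slot { void *buf; };

struct tx_ring {
    unsigned int head, tail;     /* free-running indices */
    struct tx_desc desc[TX_RING_SIZE];
    struct tx_slot slot[TX_RING_SIZE];
};

static unsigned int tx_wrap(unsigned int index)
{
    return index & (TX_RING_SIZE - 1);
}

/*
 * Drop the failing frame and everything queued after it: release each
 * buffer and mark its descriptor as used so the hardware will not send it.
 * Returns the number of entries dropped.
 */
static unsigned int tx_drop_from(struct tx_ring *r, unsigned int err_tail,
                                 void (*release)(void *))
{
    unsigned int dropped = 0;

    for (; err_tail != r->head; err_tail++, dropped++) {
        struct tx_slot *s = &r->slot[tx_wrap(err_tail)];

        release(s->buf);
        s->buf = NULL;
        r->desc[tx_wrap(err_tail)].ctrl |= TX_USED;
    }
    /* A real driver would issue wmb() here before restarting the MAC. */
    r->tail = r->head;   /* sketch's choice: the ring is now empty */
    return dropped;
}
```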

Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |  124 ---
 1 file changed, 71 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 3d3a077..af71151 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -44,6 +44,10 @@
 
 #define MACB_RX_INT_FLAGS  (MACB_BIT(RCOMP) | MACB_BIT(RXUBR)  \
 | MACB_BIT(ISR_ROVR))
+#define MACB_TX_INT_FLAGS  (MACB_BIT(ISR_TUND) \
+   | MACB_BIT(ISR_RLE) \
+   | MACB_BIT(TXERR)   \
+   | MACB_BIT(TCOMP))
 
 /* Ring buffer accessors */
 static unsigned int macb_tx_ring_wrap(unsigned int index)
@@ -338,66 +342,56 @@ static void macb_update_stats(struct macb *bp)
*p += __raw_readl(reg);
 }
 
-static void macb_tx(struct macb *bp)
+static void macb_handle_tx_error(struct macb *bp, unsigned int err_tail, u32 ctrl)
 {
-   unsigned int tail;
-   unsigned int head;
-   u32 status;
-
-   status = macb_readl(bp, TSR);
-   macb_writel(bp, TSR, status);
+   struct macb_tx_skb  *tx_skb;
+   struct sk_buff  *skb;
+   unsigned int        head = bp->tx_head;
 
-   netdev_vdbg(bp->dev, "macb_tx status = %02lx\n", (unsigned long)status);
+   netdev_dbg(bp->dev, "TX error: ctrl 0x%08x, head %u, error tail %u\n",
+  ctrl, head, err_tail);
 
-   if (status & (MACB_BIT(UND) | MACB_BIT(TSR_RLE))) {
-   int i;
-   netdev_err(bp->dev, "TX %s, resetting buffers\n",
-  status & MACB_BIT(UND) ?
-  "underrun" : "retry limit exceeded");
-
-   /* Transfer ongoing, disable transmitter, to avoid confusion */
-   if (status & MACB_BIT(TGO))
-   macb_writel(bp, NCR, macb_readl(bp, NCR) & ~MACB_BIT(TE));
-
-   head = bp->tx_head;
-
-   /*Mark all the buffer as used to avoid sending a lost buffer*/
-   for (i = 0; i < TX_RING_SIZE; i++)
-   bp->tx_ring[i].ctrl = MACB_BIT(TX_USED);
-
-   /* Add wrap bit */
-   bp->tx_ring[TX_RING_SIZE - 1].ctrl |= MACB_BIT(TX_WRAP);
+   /*
+* "Buffers exhausted mid-frame" errors may only happen if the
+* driver is buggy, so complain loudly about those. Statistics
+* are updated by hardware.
+*/
+   if (ctrl & MACB_BIT(TX_BUF_EXHAUSTED))
+   netdev_err(bp->dev, "BUG: TX buffers exhausted mid-frame\n");
 
-   /* free transmit buffer in upper layer*/
-   for (tail = bp->tx_tail; tail != head; tail++) {
-   struct macb_tx_skb  *tx_skb;
-   struct sk_buff  *skb;
+   /*
+* Drop the frames that caused the error plus all remaining in queue.
+* Free transmit buffers in upper layer.
+*/
+   for (; err_tail != head; err_tail++) {
+   struct macb_dma_desc    *desc;
 
-   rmb();
+   tx_skb = macb_tx_skb(bp, err_tail);
+   skb = tx_skb->skb;
+   dma_unmap_single(&bp->pdev->dev, tx_skb->mapping, skb->len,
+DMA_TO_DEVICE);
+   dev_kfree_skb_irq(skb);
+   tx_skb->skb = NULL;
 
-   tx_skb = macb_tx_skb(bp, tail);
-   skb = tx_skb->skb;
+   desc = macb_tx_desc(bp, err_tail);
+   desc->ctrl |= MACB_BIT(TX_USED);
+   }
 
-   dma_unmap_single(&bp->pdev->dev, tx_skb->mapping,
-   skb->len, DMA_TO_DEVICE);
-   tx_skb->skb = NULL;
-   dev_kfree_skb_irq(skb);
-   }
+   /* Make descriptor updates visible to hardware */
+   wmb();
+}
 
-   bp->tx_head = bp->tx_tail = 0;
+static void macb_tx_interrupt(struct macb *bp)
+{
+   unsigned int tail;
+   unsigned int head;
+   u32 status;
 
-   /* Enable the transmitter again */
-   if (status & MACB_BIT(TGO))
-   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TE));
-   }
+   status = macb_readl(bp, TSR);
+   macb_writel(bp, TSR, status);
 
-   if (!(status & MACB_BIT(COMP)))
-   /*
-* This may happen when a buffer becomes complete
-* between reading the ISR and scanning the
-* descriptors. 
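
A minimal userspace sketch of the new error-recovery loop, assuming (as the loop suggests) that `head` and `err_tail` are free-running counters that the ring accessors wrap with a power-of-two mask. The ring size, the TX_USED bit position and all `sketch_*` names are hypothetical; the real driver also unmaps the DMA buffer, frees the skb and issues wmb() before the hardware may look at the descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: sizes and bit positions are illustrative only. */
#define SKETCH_TX_RING_SIZE 8u          /* must be a power of two */
#define SKETCH_TX_USED      (1u << 31)  /* "descriptor owned by software" */

struct sketch_tx_slot {
	uint32_t ctrl;     /* descriptor control word */
	bool     has_skb;  /* models tx_skb->skb != NULL */
};

/* Map a free-running counter onto a ring slot. */
static unsigned int sketch_ring_wrap(unsigned int index)
{
	return index & (SKETCH_TX_RING_SIZE - 1);
}

/* Drop every queued frame from err_tail up to (but not including) head:
 * release the buffer and mark the descriptor as used so the MAC skips it.
 * Returns the number of slots dropped. */
static unsigned int sketch_drop_tx_frames(struct sketch_tx_slot *ring,
					  unsigned int err_tail,
					  unsigned int head)
{
	unsigned int dropped = 0;

	assert((SKETCH_TX_RING_SIZE & (SKETCH_TX_RING_SIZE - 1)) == 0);
	for (; err_tail != head; err_tail++) {
		struct sketch_tx_slot *slot = &ring[sketch_ring_wrap(err_tail)];

		slot->has_skb = false;          /* unmap + free in the driver */
		slot->ctrl |= SKETCH_TX_USED;   /* hand the slot back as used */
		dropped++;
	}
	return dropped;
}
```

With err_tail = 6 and head = 10 on an 8-entry ring, slots 6, 7, 0 and 1 are dropped, which is why the counters only need wrapping at the point of indexing.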

[PATCH 09/10] net/macb: ethtool interface: add register dump feature

2012-09-05 Thread Nicolas Ferre
Add macb_get_regs() ethtool function and its helper function:
macb_get_regs_len().

Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   40 +++
 drivers/net/ethernet/cadence/macb.h |3 +++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index c7c39f1..f31c0a7 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1321,10 +1321,50 @@ static void macb_get_drvinfo(struct net_device *dev,
strcpy(info->bus_info, dev_name(&bp->pdev->dev));
 }
 
+static int macb_get_regs_len(struct net_device *netdev)
+{
+   return MACB_GREGS_LEN * sizeof(u32);
+}
+
+static void macb_get_regs(struct net_device *dev, struct ethtool_regs *regs,
+ void *p)
+{
+   struct macb *bp = netdev_priv(dev);
+   unsigned int tail, head;
+   u32 *regs_buff = p;
+
+   memset(p, 0, MACB_GREGS_LEN * sizeof(u32));
+   regs->version = MACB_BFEXT(IDNUM, macb_readl(bp, MID));
+
+   tail = macb_tx_ring_wrap(bp->tx_tail);
+   head = macb_tx_ring_wrap(bp->tx_head);
+
+   regs_buff[0]  = macb_readl(bp, NCR);
+   regs_buff[1]  = macb_or_gem_readl(bp, NCFGR);
+   regs_buff[2]  = macb_readl(bp, NSR);
+   regs_buff[3]  = macb_readl(bp, TSR);
+   regs_buff[4]  = macb_readl(bp, RBQP);
+   regs_buff[5]  = macb_readl(bp, TBQP);
+   regs_buff[6]  = macb_readl(bp, RSR);
+   regs_buff[7]  = macb_readl(bp, IMR);
+
+   regs_buff[8]  = tail;
+   regs_buff[9]  = head;
+   regs_buff[10] = macb_tx_dma(bp, tail);
+   regs_buff[11] = macb_tx_dma(bp, head);
+
+   if (macb_is_gem(bp)) {
+   regs_buff[12] = gem_readl(bp, USRIO);
+   regs_buff[13] = gem_readl(bp, DMACFG);
+   }
+}
+
 static const struct ethtool_ops macb_ethtool_ops = {
.get_settings   = macb_get_settings,
.set_settings   = macb_set_settings,
.get_drvinfo= macb_get_drvinfo,
+   .get_regs_len   = macb_get_regs_len,
+   .get_regs   = macb_get_regs,
.get_link   = ethtool_op_get_link,
.get_ts_info= ethtool_op_get_ts_info,
 };
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 8a4ee2f..d509e88 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -10,6 +10,9 @@
 #ifndef _MACB_H
 #define _MACB_H
 
+
+#define MACB_GREGS_LEN 32
+
 /* MACB register offsets */
 #define MACB_NCR   0x
 #define MACB_NCFGR 0x0004
-- 
1.7.10
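
To show how the dump laid out by macb_get_regs() could be consumed, here is a hypothetical userspace decoder. The slot order is read off the patch; the names and `sketch_*` helpers are illustrative and are not part of the ethtool utility. `words` would be `regs->len / sizeof(u32)`, and slots 12 and 13 are only filled on GEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_GREGS_LEN 32   /* mirrors MACB_GREGS_LEN in the patch */

/* Slot names in the order macb_get_regs() fills regs_buff[]. */
static const char *const sketch_macb_reg_names[] = {
	"NCR", "NCFGR", "NSR", "TSR", "RBQP", "TBQP", "RSR", "IMR",
	"tx_tail", "tx_head", "tx_tail_dma", "tx_head_dma",
	"USRIO (GEM only)", "DMACFG (GEM only)",
};

#define SKETCH_NAMED_REGS \
	(sizeof(sketch_macb_reg_names) / sizeof(sketch_macb_reg_names[0]))

/* Print every named slot present in the dump. The remaining slots up to
 * SKETCH_GREGS_LEN are zero because the driver memset()s the buffer.
 * Returns the number of lines printed. */
static size_t sketch_print_macb_regs(const uint32_t *regs, size_t words,
				     FILE *out)
{
	size_t i, printed = 0;

	assert(SKETCH_NAMED_REGS <= SKETCH_GREGS_LEN);
	for (i = 0; i < words && i < SKETCH_NAMED_REGS; i++) {
		fprintf(out, "%-18s 0x%08lx\n", sketch_macb_reg_names[i],
			(unsigned long)regs[i]);
		printed++;
	}
	return printed;
}
```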



[PATCH 0/2] pinctrl: pinctrl-single: new type: pinctrl-single,bits

2012-09-05 Thread Peter Ujfalusi
Hello,

When configuring pinmux with pinctrl-single there could be a case when one
register is used to configure mux for more than one pin.
In this case the use of pinctrl-single,pins is a bit problematic since we can
only update the whole register (restricted by the mask).
In such a situation, pinctrl-single,bits can provide a safe way to handle
the mux.

pinctrl-single,bits takes three parameters: the register offset, the value, and a sub mask.
The sub mask is used to mask part of the register to make sure we do not change
bits outside of the scope of this pin.

The first patch in this series fixes the previous pinctrl-single,pins
implementation, which did not apply the mask to the value and could therefore
change bits outside of the mask.

Regards,
Peter
---
Peter Ujfalusi (2):
  pinctrl: pinctrl-single: Make sure we do not change bits outside of
mask
  pinctrl: pinctrl-single: Add pinctrl-single,bits type of mux

 .../devicetree/bindings/pinctrl/pinctrl-single.txt |  9 +
 drivers/pinctrl/pinctrl-single.c   | 42 --
 2 files changed, 41 insertions(+), 10 deletions(-)

-- 
1.7.12
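
The problem described in this cover letter can be sketched with a hypothetical layout: a 16-bit register with `pinctrl-single,function-mask = <0x00ff>` in which pin A owns bits 0-3 and pin B owns bits 4-7. Writing pin A through the whole function mask wipes pin B's field, while a sub mask of 0x000f leaves it intact. The helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Read-modify-write restricted to 'mask': bits of 'reg' outside 'mask'
 * keep their value and bits of 'val' outside 'mask' are ignored. */
static uint16_t sketch_masked_update(uint16_t reg, uint16_t val, uint16_t mask)
{
	return (uint16_t)((reg & ~mask) | (val & mask));
}
```

Starting from 0x0030 (pin B set to 3), updating pin A to 2 with the function mask alone yields 0x0002, losing pin B; with the sub mask 0x000f it yields 0x0032.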



[PATCH 1/2] pinctrl: pinctrl-single: Make sure we do not change bits outside of mask

2012-09-05 Thread Peter Ujfalusi
Use the pcs->fmask to make sure that the value is not changing (setting)
bits in areas where it should not.
To avoid situations like this:

pmx_dummy: pinmux@4a100040 {
compatible = "pinctrl-single";
reg = <0x4a100040 0x0196>;
#address-cells = <1>;
#size-cells = <0>;
pinctrl-single,register-width = <16>;
pinctrl-single,function-mask = <0x00ff>;
};

&pmx_dummy {
pinctrl-names = "default";
pinctrl-0 = <&board_pins>;

board_pins: pinmux_board_pins {
pinctrl-single,pins = <
0x6c 0xf0f
0x6e 0x10f
0x70 0x23f
0x72 0xa5f
>;
};
};

Signed-off-by: Peter Ujfalusi 
---
 drivers/pinctrl/pinctrl-single.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 76a4260..3508631 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -337,7 +337,7 @@ static int pcs_enable(struct pinctrl_dev *pctldev, unsigned fselector,
vals = &func->vals[i];
val = pcs->read(vals->reg);
val &= ~pcs->fmask;
-   val |= vals->val;
+   val |= (vals->val & pcs->fmask);
pcs->write(val, vals->reg);
}
 
-- 
1.7.12
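
The one-line fix can be checked against the `board_pins` example in the commit message. This sketch mirrors the old and new lines of pcs_enable() for a 16-bit register with `function-mask = <0x00ff>`; the `sketch_*` helpers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the value is OR-ed in unmasked. */
static uint16_t sketch_pcs_old(uint16_t reg, uint16_t val, uint16_t fmask)
{
	reg &= (uint16_t)~fmask;
	reg |= val;                        /* bits outside fmask leak in */
	return reg;
}

/* Fixed behaviour: the value is limited to the function mask. */
static uint16_t sketch_pcs_new(uint16_t reg, uint16_t val, uint16_t fmask)
{
	reg &= (uint16_t)~fmask;
	reg |= (uint16_t)(val & fmask);
	return reg;
}
```

All four `board_pins` values (0xf0f, 0x10f, 0x23f, 0xa5f) carry bits above 0x00ff, so each of them would have set bits outside the function mask before this fix.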



[PATCH 08/10] net/macb: macb_get_drvinfo: add GEM/MACB suffix to differentiate revision

2012-09-05 Thread Nicolas Ferre
Add an indication about which revision of the hardware we are running in
info->driver string.

Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index bd331fd..c7c39f1 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1313,6 +1313,10 @@ static void macb_get_drvinfo(struct net_device *dev,
struct macb *bp = netdev_priv(dev);
 
strcpy(info->driver, bp->pdev->dev.driver->name);
+   if (macb_is_gem(bp))
+   strcat(info->driver, " GEM");
+   else
+   strcat(info->driver, " MACB");
strcpy(info->version, "$Revision: 1.14 $");
strcpy(info->bus_info, dev_name(&bp->pdev->dev));
 }
-- 
1.7.10
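
`info->driver` is a fixed 32-byte array in `struct ethtool_drvinfo`, so the strcpy()/strcat() pair relies on the platform driver name being short. A bounded variant is sketched below in userspace with a hypothetical `sketch_fill_driver()` helper; it truncates instead of overflowing. snprintf() is also available in the kernel for the same purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DRVINFO_LEN 32  /* size of driver[] in struct ethtool_drvinfo */

/* Compose "<name> GEM" or "<name> MACB" without overrunning the field. */
static void sketch_fill_driver(char *out, const char *name, bool is_gem)
{
	snprintf(out, SKETCH_DRVINFO_LEN, "%s %s", name, is_gem ? "GEM" : "MACB");
}
```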



[PATCH 2/2] pinctrl: pinctrl-single: Add pinctrl-single,bits type of mux

2012-09-05 Thread Peter Ujfalusi
With pinctrl-single,bits it is possible to update just part of the register
within the pinctrl-single,function-mask area.
This is useful when one register configures more than one pin's mux.

pinctrl-single,bits takes three parameters: the register offset, the value, and a sub mask.


Signed-off-by: Peter Ujfalusi 
---
 .../devicetree/bindings/pinctrl/pinctrl-single.txt |  9 +
 drivers/pinctrl/pinctrl-single.c   | 42 --
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
index 5187f0d..287801d 100644
--- a/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-single.txt
@@ -31,6 +31,15 @@ device pinctrl register, and 0x118 contains the desired value of the
 pinctrl register. See the device example and static board pins example
 below for more information.
 
+In case when one register changes more than one pin's mux the
+pinctrl-single,bits can be used which takes three parameters:
+
+   pinctrl-single,bits = <0xdc 0x18 0xff>;
+
+Where 0xdc is the offset from the pinctrl register base address for the
+device pinctrl register, 0x18 is the desired value, and 0xff is the sub mask to
+be used when applying this change to the register.
+
 Example:
 
 /* SoC common file */
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 3508631..aec338e 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -26,7 +26,8 @@
 #include "core.h"
 
 #define DRIVER_NAME"pinctrl-single"
-#define PCS_MUX_NAME   "pinctrl-single,pins"
+#define PCS_MUX_PINS_NAME  "pinctrl-single,pins"
+#define PCS_MUX_BITS_NAME  "pinctrl-single,bits"
 #define PCS_REG_NAME_LEN   ((sizeof(unsigned long) * 2) + 1)
 #define PCS_OFF_DISABLED   ~0U
 
@@ -54,6 +55,7 @@ struct pcs_pingroup {
 struct pcs_func_vals {
void __iomem *reg;
unsigned val;
+   unsigned mask;
 };
 
 /**
@@ -332,12 +334,17 @@ static int pcs_enable(struct pinctrl_dev *pctldev, unsigned fselector,
 
for (i = 0; i < func->nvals; i++) {
struct pcs_func_vals *vals;
-   unsigned val;
+   unsigned val, mask;
 
vals = &func->vals[i];
val = pcs->read(vals->reg);
-   val &= ~pcs->fmask;
-   val |= (vals->val & pcs->fmask);
+   if (!vals->mask)
+   mask = pcs->fmask;
+   else
+   mask = pcs->fmask & vals->mask;
+
+   val &= ~mask;
+   val |= (vals->val & mask);
pcs->write(val, vals->reg);
}
 
@@ -657,18 +664,29 @@ static int pcs_parse_one_pinctrl_entry(struct pcs_device *pcs,
 {
struct pcs_func_vals *vals;
const __be32 *mux;
-   int size, rows, *pins, index = 0, found = 0, res = -ENOMEM;
+   int size, params, rows, *pins, index = 0, found = 0, res = -ENOMEM;
struct pcs_function *function;
 
-   mux = of_get_property(np, PCS_MUX_NAME, &size);
-   if ((!mux) || (size < sizeof(*mux) * 2)) {
-   dev_err(pcs->dev, "bad data for mux %s\n",
-   np->name);
+   mux = of_get_property(np, PCS_MUX_PINS_NAME, &size);
+   if (mux) {
+   params = 2;
+   } else {
+   mux = of_get_property(np, PCS_MUX_BITS_NAME, &size);
+   if (!mux) {
+   dev_err(pcs->dev, "no valid property for %s\n",
+   np->name);
+   return -EINVAL;
+   }
+   params = 3;
+   }
+
+   if (size < (sizeof(*mux) * params)) {
+   dev_err(pcs->dev, "bad data for %s\n", np->name);
return -EINVAL;
}
 
size /= sizeof(*mux);   /* Number of elements in array */
-   rows = size / 2;/* Each row is a key value pair */
+   rows = size / params;   /* Each row is a key value pair */
 
vals = devm_kzalloc(pcs->dev, sizeof(*vals) * rows, GFP_KERNEL);
if (!vals)
@@ -686,6 +704,10 @@ static int pcs_parse_one_pinctrl_entry(struct pcs_device *pcs,
val = be32_to_cpup(mux + index++);
vals[found].reg = pcs->base + offset;
vals[found].val = val;
+   if (params == 3) {
+   val = be32_to_cpup(mux + index++);
+   vals[found].mask = val;
+   }
 
pin = pcs_get_pin_by_offset(pcs, offset);
if (pin < 0) {
-- 
1.7.12
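
The row parsing and mask selection added by this patch can be modelled in userspace as below. For simplicity the cells are taken as host-order integers (the driver converts each one with be32_to_cpup()), and the struct and function names are hypothetical. As in the patch, a third cell of 0 cannot be told apart from "no sub mask" and falls back to the function mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_func_vals {
	uint32_t offset;
	uint32_t val;
	uint32_t mask;   /* 0 means "no sub mask": use the function mask */
};

/* Split a flat cell array into rows of 'params' cells (2 for
 * pinctrl-single,pins, 3 for pinctrl-single,bits). Returns rows parsed. */
static size_t sketch_parse_rows(const uint32_t *cells, size_t ncells,
				unsigned int params,
				struct sketch_func_vals *out, size_t max_rows)
{
	size_t rows = ncells / params, i;

	assert(params == 2 || params == 3);
	if (rows > max_rows)
		rows = max_rows;
	for (i = 0; i < rows; i++) {
		const uint32_t *row = cells + i * params;

		out[i].offset = row[0];
		out[i].val = row[1];
		out[i].mask = (params == 3) ? row[2] : 0;
	}
	return rows;
}

/* Mask selection as in the patched pcs_enable(). */
static uint32_t sketch_effective_mask(uint32_t fmask, uint32_t sub_mask)
{
	return sub_mask ? (fmask & sub_mask) : fmask;
}
```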



Re: [PATCH v3] UDF: Add support for O_DIRECT

2012-09-05 Thread Ian Abbott

On 2012-09-04 16:11, Ian Abbott wrote:

On 2012-09-04 15:39, Jan Kara wrote:

On Tue 04-09-12 10:49:39, Ian Abbott wrote:

Add support for the O_DIRECT flag.  There are two cases to deal with:

   Out of curiosity, do you have a use for this feature or is it mostly
academic interest?


I'm planning to use it for an embedded project that needs to stream
large files off a CompactFlash card, but the data doesn't need to be in
the buffer cache as it's only read once, and the system has very limited
memory bandwidth so I can't afford the extra copy.  The old version
of this project only supported FAT, but that limited the file size to
about 4GiB.  The filesystem needs to be something reasonably
Windows-friendly, at least for adding the files to the CompactFlash card
in the first place.


Actually, remembering back (the old project was about 3 years ago), the 
main reason for using O_DIRECT was it was causing too much memory 
fragmentation on my MMU-less embedded system.  That and the extra 
overhead of managing the buffer cache for data that was only read once.
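
For the streaming use case described here, a minimal userspace sketch of an O_DIRECT read follows. It assumes the buffer address, length and file offset must be aligned to the logical block size of the underlying device (512 bytes is common, but it should be queried rather than assumed); the helper names are hypothetical.

```c
#define _GNU_SOURCE          /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Round len up to a multiple of align (align must be a power of two). */
static size_t sketch_round_up(size_t len, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Read up to len bytes from the start of path, bypassing the page cache,
 * into a buffer aligned to 'align'. On success *bufp must be freed by the
 * caller; on failure *bufp is NULL and -1 is returned. */
static ssize_t sketch_read_direct(const char *path, size_t len, size_t align,
				  void **bufp)
{
	size_t alloc = sketch_round_up(len, align);
	void *buf = NULL;
	ssize_t n;
	int fd;

	*bufp = NULL;
	if (posix_memalign(&buf, align, alloc) != 0)
		return -1;
	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		free(buf);
		return -1;
	}
	n = read(fd, buf, alloc);   /* may be short at end of file */
	close(fd);
	if (n < 0) {
		free(buf);
		return -1;
	}
	*bufp = buf;
	return n;
}
```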


--
-=( Ian Abbott @ MEV Ltd.E-mail: )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587 )=-


[PATCH 10/10] net/macb: Offset first RX buffer by two bytes

2012-09-05 Thread Nicolas Ferre
From: Havard Skinnemoen 

Make the ethernet frame payload word-aligned, possibly making the
memcpy into the skb a bit faster. This will be even more important
after we eliminate the copy altogether.

Also eliminate the redundant RX_OFFSET constant -- it has the same
definition and purpose as NET_IP_ALIGN.

Signed-off-by: Havard Skinnemoen 
[nicolas.fe...@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c |   23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index f31c0a7..f7716b6 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -33,9 +33,6 @@
 #define RX_RING_SIZE   512
 #define RX_RING_BYTES  (sizeof(struct macb_dma_desc) * RX_RING_SIZE)
 
-/* Make the IP header word-aligned (the ethernet header is 14 bytes) */
-#define RX_OFFSET  2
-
 #define TX_RING_SIZE   128
 #define TX_RING_BYTES  (sizeof(struct macb_dma_desc) * TX_RING_SIZE)
 
@@ -466,7 +463,7 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
 {
unsigned int len;
unsigned int frag;
-   unsigned int offset = 0;
+   unsigned int offset;
struct sk_buff *skb;
struct macb_dma_desc *desc;
 
@@ -477,7 +474,16 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
macb_rx_ring_wrap(first_frag),
macb_rx_ring_wrap(last_frag), len);
 
-   skb = netdev_alloc_skb(bp->dev, len + RX_OFFSET);
+   /*
+* The ethernet header starts NET_IP_ALIGN bytes into the
+* first buffer. Since the header is 14 bytes, this makes the
+* payload word-aligned.
+*
+* Instead of calling skb_reserve(NET_IP_ALIGN), we just copy
+* the two padding bytes into the skb so that we avoid hitting
+* the slowpath in memcpy(), and pull them off afterwards.
+*/
+   skb = netdev_alloc_skb(bp->dev, len + NET_IP_ALIGN);
if (!skb) {
bp->stats.rx_dropped++;
for (frag = first_frag; ; frag++) {
@@ -493,7 +499,8 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
return 1;
}
 
-   skb_reserve(skb, RX_OFFSET);
+   offset = 0;
+   len += NET_IP_ALIGN;
skb_checksum_none_assert(skb);
skb_put(skb, len);
 
@@ -517,10 +524,11 @@ static int macb_rx_frame(struct macb *bp, unsigned int first_frag,
/* Make descriptor updates visible to hardware */
wmb();
 
+   __skb_pull(skb, NET_IP_ALIGN);
skb->protocol = eth_type_trans(skb, bp->dev);
 
bp->stats.rx_packets++;
-   bp->stats.rx_bytes += len;
+   bp->stats.rx_bytes += skb->len;
netdev_vdbg(bp->dev, "received skb of length %u, csum: %08x\n",
   skb->len, skb->csum);
netif_receive_skb(skb);
@@ -985,6 +993,7 @@ static void macb_init_hw(struct macb *bp)
__macb_set_hwaddr(bp);
 
config = macb_mdc_clk_div(bp);
+   config |= MACB_BF(RBOF, NET_IP_ALIGN);  /* Make eth data aligned */
config |= MACB_BIT(PAE);/* PAuse Enable */
config |= MACB_BIT(DRFCS);  /* Discard Rx FCS */
config |= MACB_BIT(BIG);/* Receive oversized frames */
-- 
1.7.10
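
The alignment arithmetic behind this change, as a small userspace sketch. With the receive buffer offset (RBOF) set to NET_IP_ALIGN, the 14-byte Ethernet header ends 16 bytes into a word-aligned buffer, so the IP header lands on a 4-byte boundary. The second helper models copying the two padding bytes along with the frame (so source and destination share alignment) and then pulling them off. NET_IP_ALIGN is 2 by default, though some architectures override it; the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NET_IP_ALIGN 2   /* generic kernel default */
#define SKETCH_ETH_HLEN     14  /* Ethernet header length */

/* Offset of the IP header from the start of a DMA buffer when the MAC
 * writes the frame rx_buf_offset bytes into it. */
static size_t sketch_ip_header_offset(size_t rx_buf_offset)
{
	return rx_buf_offset + SKETCH_ETH_HLEN;
}

/* Copy padding + frame in one memcpy(), then skip the padding, as the
 * driver does with __skb_pull(). Returns the Ethernet header in dst. */
static uint8_t *sketch_copy_and_pull(uint8_t *dst, const uint8_t *dma_buf,
				     size_t frame_len)
{
	memcpy(dst, dma_buf, frame_len + SKETCH_NET_IP_ALIGN);
	return dst + SKETCH_NET_IP_ALIGN;
}
```

After the pull, the remaining length equals the frame length, which is why the patch switches the rx_bytes statistic to skb->len.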



Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 01:47 -0700, Tejun Heo wrote:
> I think this is where we disagree.  I didn't mean that all controllers
> should be using exactly the same hierarchy when I was talking about
> unified hierarchy.  I do think it's useful and maybe even essential to
> allow differing levels of granularity.  cpu and cpuacct could be a
> valid example for this.  Likely blkcg and memcg too.
> 
> So, I think it's desirable for all controllers to be able to handle
> hierarchies the same way and to have the ability to tag something as
> belonging to certain group in the hierarchy for all controllers but I
> don't think it's desirable or feasible to require all of them to
> follow exactly the same grouping at all levels. 

*confused* I always thought that was exactly what you meant with unified
hierarchy.

Doing all this runtime is just going to make the mess even bigger,
because now we have to deal with even more stupid cases.

So either we go and try to contain this mess as proposed by Glauber or
we go delete controllers.. I've had it with this crap.

---
 Documentation/cgroups/00-INDEX|   2 -
 Documentation/cgroups/cpuacct.txt |  49 
 include/linux/cgroup_subsys.h |   6 -
 init/Kconfig  |   6 -
 kernel/sched/core.c   | 247 --
 kernel/sched/fair.c   |   1 -
 kernel/sched/rt.c |   1 -
 kernel/sched/sched.h  |  45 ---
 kernel/sched/stop_task.c  |   1 -
 9 files changed, 358 deletions(-)

diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
index 3f58fa3..9f100cc 100644
--- a/Documentation/cgroups/00-INDEX
+++ b/Documentation/cgroups/00-INDEX
@@ -2,8 +2,6 @@
- this file
 cgroups.txt
- Control Groups definition, implementation details, examples and API.
-cpuacct.txt
-   - CPU Accounting Controller; account CPU usage for groups of tasks.
 cpusets.txt
- documents the cpusets feature; assign CPUs and Mem to a set of tasks.
 devices.txt
diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
deleted file mode 100644
index 9d73cc0..000
--- a/Documentation/cgroups/cpuacct.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-CPU Accounting Controller
--
-
-The CPU accounting controller is used to group tasks using cgroups and
-account the CPU usage of these groups of tasks.
-
-The CPU accounting controller supports multi-hierarchy groups. An accounting
-group accumulates the CPU usage of all of its child groups and the tasks
-directly present in its group.
-
-Accounting groups can be created by first mounting the cgroup filesystem.
-
-# mount -t cgroup -ocpuacct none /sys/fs/cgroup
-
-With the above step, the initial or the parent accounting group becomes
-visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
-the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
-/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained
-by this group which is essentially the CPU time obtained by all the tasks
-in the system.
-
-New accounting groups can be created under the parent group /sys/fs/cgroup.
-
-# cd /sys/fs/cgroup
-# mkdir g1
-# echo $$ > g1/tasks
-
-The above steps create a new group g1 and move the current shell
-process (bash) into it. CPU time consumed by this bash and its children
-can be obtained from g1/cpuacct.usage and the same is accumulated in
-/sys/fs/cgroup/cpuacct.usage also.
-
-cpuacct.stat file lists a few statistics which further divide the
-CPU time obtained by the cgroup into user and system times. Currently
-the following statistics are supported:
-
-user: Time spent by tasks of the cgroup in user mode.
-system: Time spent by tasks of the cgroup in kernel mode.
-
-user and system are in USER_HZ unit.
-
-cpuacct controller uses percpu_counter interface to collect user and
-system times. This has two side effects:
-
-- It is theoretically possible to see wrong values for user and system times.
-  This is because percpu_counter_read() on 32bit systems isn't safe
-  against concurrent writes.
-- It is possible to see slightly outdated values for user and system times
-  due to the batch processing nature of percpu_counter.
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index dfae957..73b7cc1 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -25,12 +25,6 @@ SUBSYS(cpu_cgroup)
 
 /* */
 
-#ifdef CONFIG_CGROUP_CPUACCT
-SUBSYS(cpuacct)
-#endif
-
-/* */
-
 #ifdef CONFIG_MEMCG
 SUBSYS(mem_cgroup)
 #endif
diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..3ac9e1c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -674,12 +674,6 @@ config PROC_PID_CPUSET
depends on CPUSETS
default y
 
-config CGROUP_CPUACCT
-   bool "Simple CPU accounting cgroup subsystem"
-   help
- Provides a simple Resource Controller for monitoring the
- total CPU consumed by 

Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Peter Zijlstra
On Wed, 2012-09-05 at 11:06 +0200, Peter Zijlstra wrote:
> 
> So either we go and try to contain this mess as proposed by Glauber or
> we go delete controllers.. I've had it with this crap.
> 
> 

Glauber, the other approach is sending a patch that doesn't touch
cgroup.c but only the controllers and I'll merge it regardless of what
tj thinks.

We need some movement here.


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hello, Glauber.

On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote:
> > So, I think it's desirable for all controllers to be able to handle
> > hierarchies the same way and to have the ability to tag something as
> > belonging to certain group in the hierarchy for all controllers but I
> > don't think it's desirable or feasible to require all of them to
> > follow exactly the same grouping at all levels.
> 
> By "different levels of granularity" do you mean having just a subset of
> them turned on at a particular place?

Heh, this is tricky to describe and I'm not really following what you
mean.  They're all on the same tree but a controller should be able to
handle a given subtree as single group.  e.g. if you draw the tree,
different controllers should be able to draw different enclosing
circles and operate on the simplified tree.  How flexible that should
be, I don't know.  Maybe it would be enough to be able to say "treat
all children of this node as belonging to this node for controllers X
and Y".

> If yes, having them guaranteed to be comounted is still perceived by me
> as a good first step. A natural following would be to turn them on/off
> on a per-group basis.

I don't agree with that.  If we do it that way, we would lose
differing granularity from forcing co-mounting and then restore it
later when the subtree handling is implemented.  If we can do away
with differing granularity, that's fine; otherwise, it doesn't make
much sense to remove and then restore it.

Thanks.

-- 
tejun
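
The "enclosing circles" idea can be made concrete with a toy model. This is purely illustrative and is not an existing or proposed cgroup interface: every controller sees the same tree, but a node may flag, per controller, that its whole subtree is to be treated as that node, so the group a controller acts on is the highest such ancestor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical controller ids for the model. */
enum sketch_ctrl { SKETCH_CPU, SKETCH_CPUACCT, SKETCH_MEMCG, SKETCH_NR_CTRL };

struct sketch_cgroup {
	const char *name;
	struct sketch_cgroup *parent;
	unsigned int collapse_mask;  /* bit c: children count as this node for c */
};

/* Group that controller c operates on for node cg: the highest ancestor
 * that collapses its subtree for c, or cg itself if there is none. */
static const struct sketch_cgroup *
sketch_effective_group(const struct sketch_cgroup *cg, enum sketch_ctrl c)
{
	const struct sketch_cgroup *eff = cg, *p;

	assert(c < SKETCH_NR_CTRL);
	for (p = cg->parent; p; p = p->parent)
		if (p->collapse_mask & (1u << c))
			eff = p;
	return eff;
}
```

With root → a → b and `a` collapsing only for cpuacct, cpuacct accounts `b` as `a` while cpu still sees `b` as its own group.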


Re: [PATCH] perf/x86: Disable uncore on virtualized CPU.

2012-09-05 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, 2012-09-05 at 08:35 +0200, Ingo Molnar wrote:
> > * Yan, Zheng  wrote:
> > 
> > > From: "Yan, Zheng" 
> > > 
> > > Initializing uncore PMU on virtualized CPU may hang the kernel.
> > > This is because kvm does not emulate the entire hardware. There
> > > are lots of uncore related MSRs, making kvm enumerate them all
> > > is a non-trivial task. So just disable uncore on virtualized CPU.
> > > 
> > > Signed-off-by: Yan, Zheng 
> > > ---
> > >  arch/x86/kernel/cpu/perf_event_intel_uncore.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > > index 0a55710..2f005ba 100644
> > > --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > > +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> > > @@ -2898,6 +2898,9 @@ static int __init intel_uncore_init(void)
> > >   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > >   return -ENODEV;
> > >  
> > > + if (cpu_has_hypervisor)
> > > + return -ENODEV;
> > > +
> > >   ret = uncore_pci_init();
> > >   if (ret)
> > >   goto fail;
> > 
> > Cannot the presence of the uncore hardware be detected in a 
> > cleaner fashion, via the PCI config space and such?
> 
> No, part of the uncore PMUs are in MSR space and aren't 
> discoverable. CPUID model checks + hard assumptions of 
> presence are all that we are left with.
>
> Now Avi suggested we teach KVM about these MSRs and then 
> modify the uncore driver to test if the MSRs actually work -- 
> as in retain values written to them and aren't always 0.
> 
> That's a larger patch though, partly because enumerating the 
> gazillion MSRs consumed by the various uncore PMUs is a 
> tedious job, and we can always do this later.
> 
> This patch is a minimal patch to at least make things 'work' 
> for now.

Ok, no objections.

Thanks,

Ingo
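
For reference, cpu_has_hypervisor tests the "hypervisor present" feature bit, bit 31 of ECX returned by CPUID leaf 1. The same check can be sketched from userspace as below, using `__get_cpuid()` from GCC/Clang's <cpuid.h> on x86 only; the `sketch_*` names are hypothetical. Avi's write-and-read-back MSR probe cannot be shown this way, because RDMSR/WRMSR are privileged instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CPUID1_ECX_HYPERVISOR (1u << 31)

/* Pure bit test, usable on any ECX value. */
static bool sketch_ecx_has_hypervisor(uint32_t ecx)
{
	return (ecx & SKETCH_CPUID1_ECX_HYPERVISOR) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
/* Query CPUID leaf 1 on the running CPU (x86 only). */
static bool sketch_running_under_hypervisor(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	return sketch_ecx_has_hypervisor(ecx);
}
#endif
```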


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 01:07 PM, Tejun Heo wrote:
> Hello, Glauber.
> 
> On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote:
>>> So, I think it's desirable for all controllers to be able to handle
>>> hierarchies the same way and to have the ability to tag something as
>>> belonging to certain group in the hierarchy for all controllers but I
>>> don't think it's desirable or feasible to require all of them to
>>> follow exactly the same grouping at all levels.
>>
>> By "different levels of granularity" do you mean having just a subset of
>> them turned on at a particular place?
> 
> Heh, this is tricky to describe and I'm not really following what you
> mean. 

Do we really want to start cleaning up all this by changing the
interface to something that is described as "tricky" ?




Re: [PATCH 1/3] mm: use get_page_migratetype instead of page_private

2012-09-05 Thread Mel Gorman
On Wed, Sep 05, 2012 at 04:26:00PM +0900, Minchan Kim wrote:
> page allocator uses set_page_private and page_private for handling
> migratetype when it frees page. Let's replace them with [set|get]
> _page_migratetype to make it more clear.
> 
> Signed-off-by: Minchan Kim 

Maybe it's because I'm used to setting set_page_private() in the page
allocator and what it means but I fear that it'll be very easy to confuse
get_page_migratetype() with get_pageblock_migratetype(). The former only
works while the page is in the buddy allocator. The latter can be called
at any time. I'm not against the patch as such but I'm not convinced
either :)

One nit below

> ---
>  include/linux/mm.h  |   10 ++
>  mm/page_alloc.c |   11 +++
>  mm/page_isolation.c |2 +-
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5c76634..86d61d6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -249,6 +249,16 @@ struct inode;
>  #define page_private(page)   ((page)->private)
>  #define set_page_private(page, v)((page)->private = (v))
>  
> +static inline void set_page_migratetype(struct page *page, int migratetype)
> +{
> + set_page_private(page, migratetype);
> +}
> +
> +static inline int get_page_migratetype(struct page *page)
> +{
> + return page_private(page);
> +}
> +
>  /*
>   * FIXME: take this include out, include page-flags.h in
>   * files which need it (119 of them)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 710d91c..103ba66 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>   /* must delete as __free_one_page list manipulates */
>   list_del(&page->lru);
>   /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> - __free_one_page(page, zone, 0, page_private(page));
> - trace_mm_page_pcpu_drain(page, 0, page_private(page));
> + __free_one_page(page, zone, 0,
> + get_page_migratetype(page));
> + trace_mm_page_pcpu_drain(page, 0,
> + get_page_migratetype(page));
>   } while (--to_free && --batch_free && !list_empty(list));
>   }
>   __mod_zone_page_state(zone, NR_FREE_PAGES, count);
> @@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>   __count_vm_events(PGFREE, 1 << order);
>   free_one_page(page_zone(page), page, order,
>   get_pageblock_migratetype(page));
> +
>   local_irq_restore(flags);
>  }
>  

Unnecessary whitespace change.

> @@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>   if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
>   mt = migratetype;
>   }
> - set_page_private(page, mt);
> + set_page_migratetype(page, mt);
>   list = &page->lru;
>   }
>   __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
>   return;
>  
>   migratetype = get_pageblock_migratetype(page);
> - set_page_private(page, migratetype);
> + set_page_migratetype(page, migratetype);
>   local_irq_save(flags);
>   if (unlikely(wasMlocked))
>   free_page_mlock(page);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 64abb33..acf65a7 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>   if (PageBuddy(page))
>   pfn += 1 << page_order(page);
>   else if (page_count(page) == 0 &&
> - page_private(page) == MIGRATE_ISOLATE)
> + get_page_migratetype(page) == MIGRATE_ISOLATE)
>   pfn += 1;
>   else
>   break;

-- 
Mel Gorman
SUSE Labs
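
Mel's concern can be seen in a reduced model: the proposed wrappers only rename accesses to `page->private`, so their value is meaningful only while the page sits on a free list and nothing else has reused the field, whereas get_pageblock_migratetype() reads per-pageblock state. The struct and helpers below are hypothetical stand-ins, not the kernel's `struct page`.

```c
#include <assert.h>

/* Heavily reduced, hypothetical stand-in for struct page. */
struct sketch_page {
	unsigned long private;
};

#define sketch_page_private(page)        ((page)->private)
#define sketch_set_page_private(page, v) ((page)->private = (v))

/* Named wrappers in the style the patch proposes: they document intent
 * but still alias page->private. */
static inline void sketch_set_page_migratetype(struct sketch_page *page, int mt)
{
	assert(mt >= 0);
	sketch_set_page_private(page, (unsigned long)mt);
}

static inline int sketch_get_page_migratetype(const struct sketch_page *page)
{
	return (int)sketch_page_private(page);
}
```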


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hello, Peter.

On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
> *confused* I always thought that was exactly what you meant with unified
> hierarchy.

No, I never counted out differing granularity.

> Doing all this runtime is just going to make the mess even bigger,
> because now we have to deal with even more stupid cases.
> 
> So either we go and try to contain this mess as proposed by Glauber or
> we go delete controllers.. I've had it with this crap.

If cpuacct can really go away, that's great, but I don't think the
problem at hand is unsolvable, so let's not jump it.  cpuacct and cpu
aren't the onlfy problem cases after all.  We need to solve it for
other controllers too.

Thanks.

-- 
tejun


Re: [PATCH 3/3] Add support to M54xx DMA FEC Driver

2012-09-05 Thread Philippe De Muyter
Hi Stany & Greg

Seeing that I was not the only one wanting to have the m54xx fec dma
driver merged in, and hoping to compare Stany's version to mine,
I have rebased (step by step) my patch from v2.6.38 to v3.6rc2.
The driver still works and perhaps even better due to some fixes
in other m68k area.

Unfortunately I have not yet been able to fully compare it with Stany's
version because Stany's patch 2/2 did not apply (using `git am') to v3.5
or v3.6rc2.

I have checked my patch using a recent version of checkpatch.pl (not the
v3.5 version, because the v3.5 version of checkpatch.pl fails with:
Nested quantifiers in regex; marked by <-- HERE in m/(\((?:[^\(\)]++ <-- HERE |(?-1))*\))/ at scripts/checkpatch.pl line 340.)

and I am now at :
464 WARNING: line over 80 characters
 90 WARNING: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt

Many "volatile" warnings are about such definitions :

#define FEC_FECFRST(x) (*(volatile unsigned int *)(x + 0x1C4))
which are afterwards used with

+   FEC_FECFRST(base_addr) |= FEC_SW_RST;
+   FEC_FECFRST(base_addr) &= ~FEC_SW_RST;
+   FEC_FECFRST(base_addr) |= FEC_SW_RST;
+   FEC_FECFRST(base_addr) &= ~FEC_SW_RST;
+   FEC_FECFRST(base_addr) |= FEC_SW_RST | FEC_RST_CTL;
+   FEC_FECFRST(base_addr) &= ~FEC_SW_RST;

Any advice about those ones ?

while many "80 characters" ones are about :
#4014: FILE: arch/m68k/platform/coldfire/MCD_tasks.c:2406:
+   0x600b, /* 0098(:1560):  DRD2A: EU0=0 EU1=0 EU2=0 EU3=11 EXT init=0 WS=0 RS=0 */

I would like to keep those lines intact because the comment seems to actually
be the assembler source of the hex value at left, which seems to be a
microcode, and it makes sense to me to keep that on one line.  What do
you think about that ?

I did not include the current status of the patch because of its size
(I did not separate the dma part from the ethernet driver part because
the dma part is useless without the ethernet driver, and linking the
ethernet driver cannot succeed without the dma part), but if you ask,
I'll send it privately.

Best regards

Philippe

-- 
Philippe De Muyter +32 2 6101532 Macq SA rue de l'Aeronef 2 B-1140 Bruxelles


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
On Wed, Sep 05, 2012 at 01:06:39PM +0400, Glauber Costa wrote:
> > Heh, this is tricky to describe and I'm not really following what you
> > mean. 
> 
> Do we really want to start cleaning up all this by changing the
> interface to something that is described as "tricky" ?

The concept is not tricky.  I just can't find the appropriate words.
I *suspect* this can mostly re-use the existing css_set thing.  It
mostly becomes that css_set belongs to the unified hierarchy rather
than each task.  The user interface part isn't trivial and maybe
"don't nest beyond this level" is the only thing reasonable.  Not sure
yet whether that would be enough tho.  Need to think more about it.

Thanks.

-- 
tejun


Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Glauber Costa
On 09/05/2012 01:11 PM, Tejun Heo wrote:
> Hello, Peter.
> 
> On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
>> *confused* I always thought that was exactly what you meant with unified
>> hierarchy.
> 
> No, I never counted out differing granularity.
> 

Can you elaborate on which interface you envision to make it work?
They will clearly be mounted in the same hierarchy, or as said
alternatively, comounted.

If you can turn them on/off on a per-subtree basis, which interface
exactly do you propose for that?

Would a pair of cgroup core files like available_controllers and
current_controllers, as a lot of drivers do, suffice?


[PATCH 1/3] partitions: efi: compare first and last usable LBAs

2012-09-05 Thread Davidlohr Bueso
When verifying GPT header integrity, make sure that the
first usable LBA is not greater than the last usable LBA.

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 6296b40..7795bb4 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -344,6 +344,12 @@ static int is_gpt_valid(struct parsed_partitions *state, u64 lba,
 * within the disk.
 */
lastlba = last_lba(state->bdev);
+   if (le64_to_cpu((*gpt)->last_usable_lba) < le64_to_cpu((*gpt)->first_usable_lba)) {
+   pr_debug("GPT: last_usable_lba incorrect: %lld < %lld\n",
+(unsigned long long)le64_to_cpu((*gpt)->last_usable_lba),
+(unsigned long long)le64_to_cpu((*gpt)->first_usable_lba));
+   goto fail;
+   }
if (le64_to_cpu((*gpt)->first_usable_lba) > lastlba) {
pr_debug("GPT: first_usable_lba incorrect: %lld > %lld\n",
 (unsigned long long)le64_to_cpu((*gpt)->first_usable_lba),
-- 
1.7.4.1






[PATCH 2/3] partitions: efi: verify header is outside usable area

2012-09-05 Thread Davidlohr Bueso
The first usable logical block can be used by a GUID partition
entry, and therefore cannot be used by the header.

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 7795bb4..abf33a2 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -363,6 +363,13 @@ static int is_gpt_valid(struct parsed_partitions *state, u64 lba,
goto fail;
}
 
+   /* The header must be outside usable range */
+   if (lba >= le64_to_cpu((*gpt)->first_usable_lba) &&
+   lba <= le64_to_cpu((*gpt)->last_usable_lba)) {
+   pr_debug("GPT: Header is inside usable area\n");
+   goto fail;
+   }
+
/* Check that sizeof_partition_entry has the correct value */
if (le32_to_cpu((*gpt)->sizeof_partition_entry) != sizeof(gpt_entry)) {
pr_debug("GUID Partitition Entry Size check failed.\n");
-- 
1.7.4.1






[PATCH 3/3] partitions: efi: check minimum header size

2012-09-05 Thread Davidlohr Bueso
As per UEFI specs 2.3.1 (June 2012),
"The Header Size must be greater than or equal to 92 and must be less than
or equal to the logical block size"

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index abf33a2..3a5114e 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -25,6 +25,9 @@
  * TODO:
  *
  * Changelog:
+ * Sept. 2012 Davidlohr Bueso 
+ * - tighten GPT header integrity verification.
+ *
  * Mon Nov 09 2004 Matt Domsch 
  * - test for valid PMBR and valid PGPT before ever reading
  *   AGPT, allow override with 'gpt' kernel command line option.
@@ -311,8 +314,8 @@ static int is_gpt_valid(struct parsed_partitions *state, u64 lba,
}
 
/* Check the GUID Partition Table header size */
-   if (le32_to_cpu((*gpt)->header_size) >
-   bdev_logical_block_size(state->bdev)) {
+   if (le32_to_cpu((*gpt)->header_size) < 92 ||
+   le32_to_cpu((*gpt)->header_size) > bdev_logical_block_size(state->bdev)) {
pr_debug("GUID Partition Table Header size is wrong: %u > %u\n",
le32_to_cpu((*gpt)->header_size),
bdev_logical_block_size(state->bdev));
-- 
1.7.4.1






Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote:
> > No, I never counted out differing granularity.
> 
> Can you elaborate on which interface do you envision to make it work?
> They will clearly be mounted in the same hierarchy, or as said
> alternatively, comounted.

I'm not sure yet.  At the simplest, mask of controllers which should
honor (or ignore) nesting beyond the node.  That should be
understandable enough.  Not sure whether that would be flexible enough
yet tho.  In the end, they should be comounted but again I don't think
enforcing comounting at the moment is a step towards that.  It's more
like a step sideways.

Thanks.

-- 
tejun


Re: [RFC PATCH] ARM: OMAP2+: omap-device: Do not overwrite resources allocated by OF layer

2012-09-05 Thread Peter Ujfalusi
On 08/29/2012 12:48 PM, Vaibhav Hiremath wrote:
> With the new devices (like, AM33XX and OMAP5) we now only support
> DT boot mode of operation and now it is the time to start killing
> slowly the dependency on hwmod, so with this patch, we are starting
> with device resources.
> The idea here is implemented considering to both boot modes -
>   - DT boot mode
> OF framework will construct the resource structure (currently
> does for MEM & IRQ resource) and we should respect/use these
> resources, killing hwmod dependency.
> If pdev->num_resources > 0, we assume that MEM & IRQ resources
> have been allocated by OF layer already (through DTB).
> 
> Once DMA resource is available from OF layer, we should
> kill filling any resources from hwmod.
> 
>   - Non-DT boot mode
> Here, pdev->num_resources = 0, and we should get all the
> resources from hwmod (following existing steps)
> 
> Signed-off-by: Vaibhav Hiremath 
> Cc: Benoit Cousson 
> Cc: Tony Lindgren 
> Cc: Paul Walmsley 
> Cc: Kevin Hilman 
> ---
> This patch is tested on BeagleBone and AM37xEVM.

I tried this on OMAP3 (with McBSP/twl4030-audio/omap-twl4030 DT boot), OMAP4
(McPDM, DMIC DT), and on OMAP5 (McPDM, DMIC DT).
I have sent the patches needed for the dtsi files to probe the audio related
IPs with this patch.

Tested-by: Peter Ujfalusi 

> 
> Why RFC?
> Still we have function duplication omap_device_fill_resources() and
> omap_device_fill_dma_resources(), we can actually split the function
> into 3 resources and avoid duplication -
>   - omap_device_fill_dma_resources()
>   - omap_device_fill_mem_resources()
>   - omap_device_fill_irq_resources()
> 
> Actually I wanted to clean it further but thought of getting
> feedback first and then proceed further.
> 
>  arch/arm/mach-omap2/omap_hwmod.c |   27 ++
>  arch/arm/plat-omap/include/plat/omap_hwmod.h |1 +
>  arch/arm/plat-omap/omap_device.c |   72 +
>  3 files changed, 88 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> index 31ec283..edabfb3 100644
> --- a/arch/arm/mach-omap2/omap_hwmod.c
> +++ b/arch/arm/mach-omap2/omap_hwmod.c
> @@ -3330,6 +3330,33 @@ int omap_hwmod_fill_resources(struct omap_hwmod *oh, struct resource *res)
>  }
> 
>  /**
> + * omap_hwmod_fill_dma_resources - fill struct resource array with dma data
> + * @oh: struct omap_hwmod *
> + * @res: pointer to the array of struct resource to fill
> + *
> + * Fill the struct resource array @res with dma resource data from the
> + * omap_hwmod @oh.  Intended to be called by code that registers
> + * omap_devices.  See also omap_hwmod_count_resources().  Returns the
> + * number of array elements filled.
> + */
> +int omap_hwmod_fill_dma_resources(struct omap_hwmod *oh, struct resource *res)
> +{
> + int i, sdma_reqs_cnt;
> + int r = 0;
> +
> + sdma_reqs_cnt = _count_sdma_reqs(oh);
> + for (i = 0; i < sdma_reqs_cnt; i++) {
> + (res + r)->name = (oh->sdma_reqs + i)->name;
> + (res + r)->start = (oh->sdma_reqs + i)->dma_req;
> + (res + r)->end = (oh->sdma_reqs + i)->dma_req;
> + (res + r)->flags = IORESOURCE_DMA;
> + r++;
> + }
> +
> + return r;
> +}
> +
> +/**
>   * omap_hwmod_get_resource_byname - fetch IP block integration data by name
>   * @oh: struct omap_hwmod * to operate on
>   * @type: one of the IORESOURCE_* constants from include/linux/ioport.h
> diff --git a/arch/arm/plat-omap/include/plat/omap_hwmod.h b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> index 9b9646c..0533073 100644
> --- a/arch/arm/plat-omap/include/plat/omap_hwmod.h
> +++ b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> @@ -615,6 +615,7 @@ int omap_hwmod_softreset(struct omap_hwmod *oh);
> 
>  int omap_hwmod_count_resources(struct omap_hwmod *oh);
>  int omap_hwmod_fill_resources(struct omap_hwmod *oh, struct resource *res);
> +int omap_hwmod_fill_dma_resources(struct omap_hwmod *oh, struct resource *res);
>  int omap_hwmod_get_resource_byname(struct omap_hwmod *oh, unsigned int type,
>  const char *name, struct resource *res);
> 
> diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c
> index c490240..fd15a3a 100644
> --- a/arch/arm/plat-omap/omap_device.c
> +++ b/arch/arm/plat-omap/omap_device.c
> @@ -486,6 +486,33 @@ static int omap_device_fill_resources(struct omap_device *od,
>  }
> 
>  /**
> + * omap_device_fill_dma_resources - fill in array of struct resource with dma resources
> + * @od: struct omap_device *
> + * @res: pointer to an array of struct resource to be filled in
> + *
> + * Populate one or more empty struct resource pointed to by @res with
> + * the dma resource data for this omap_device @od.  Used by
> + * omap_device_alloc() after calling omap_device_count_resources().
> + *
> + * Ideally this function would not 

[RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device

2012-09-05 Thread wency
From: Wen Congyang 

The memory device has only one node id. Store the node id when
the memory device is enabled, so that we can reuse it when removing
the memory device.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
Reviewed-by: Yasuaki Ishimatsu 
---
 drivers/acpi/acpi_memhotplug.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 2a7beac..7873832 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
 struct acpi_memory_device {
struct acpi_device * device;
unsigned int state; /* State of the memory device */
+   int nid;
struct list_head res_list;
 };
 
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
info->enabled = 1;
num_enabled++;
}
+
+   mem_device->nid = node;
+
if (!num_enabled) {
printk(KERN_ERR PREFIX "add_memory failed\n");
mem_device->state = MEMORY_INVALID_STATE;
-- 
1.7.1



[RFC v9 PATCH 08/21] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2012-09-05 Thread wency
From: Yasuaki Ishimatsu 

When (hot)adding memory into the system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created, but there is no code to remove these files. This patch
implements a function to remove them.

Note: The code does not free a firmware_map_entry that was allocated from
   bootmem, since there is no way to free memory allocated by bootmem.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/firmware/memmap.c|   98 +-
 include/linux/firmware-map.h |6 +++
 mm/memory_hotplug.c  |9 +++-
 3 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index c1cdc92..6740d26 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Data types 
--
@@ -41,6 +42,7 @@ struct firmware_map_entry {
const char  *type;  /* type of the memory range */
struct list_headlist;   /* entry for the linked list */
struct kobject  kobj;   /* kobject for each entry */
+   unsigned intbootmem:1; /* allocated from bootmem */
 };
 
 /*
@@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
 };
 
+
+static inline struct firmware_map_entry *
+to_memmap_entry(struct kobject *kobj)
+{
+   return container_of(kobj, struct firmware_map_entry, kobj);
+}
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+
+   if (entry->bootmem)
+   /* There is no way to free memory allocated from bootmem */
+   return;
+
+   kfree(entry);
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = {
  * in firmware initialisation code in one single thread of execution.
  */
 static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);
 
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
@@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 start, u64 end,
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
 
+   spin_lock(&map_entries_lock);
list_add_tail(&entry->list, &map_entries);
+   spin_unlock(&map_entries_lock);
 
return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   spin_lock(&map_entries_lock);
+   list_del(&entry->list);
+   spin_unlock(&map_entries_lock);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   spin_lock(&map_entries_lock);
+   list_for_each_entry(entry, &map_entries, list)
+   if ((entry->start == start) && (entry->end == end) &&
+   (!strcmp(entry->type, type))) {
+   spin_unlock(&map_entries_lock);
+   return entry;
+   }
+
+   spin_unlock(&map_entries_lock);
+   return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -193,9 +258,36 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
if (WARN_ON(!entry))
return -ENOMEM;
 
+   entry->bootmem = 1;
return firmware_map_add_entry(start, end, type, entry);
 }
 
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end:   End of the memory range.
+ * @type:  Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   entry = firmware_map_find_entry(start, end - 1, type);
+   if (!entry)
+   return -EINVAL;
+
+   firmware_map_remove_entry(entry);
+
+   /* remove the memmap

[RFC v9 PATCH 21/21] memory-hotplug: auto offline page_cgroup when onlining memory block failed

2012-09-05 Thread wency
From: Wen Congyang 

When a memory block is onlined, we try to allocate memory on that node
to store page_cgroup. If onlining the memory block fails, we don't
offline the page_cgroup, and we have no chance to offline it unless
the memory block is later onlined successfully. As a result, we can't
hot-remove the memory device on that node, because some of its memory
is still used to store page_cgroup. If onlining the memory block fails,
there is no need to store page_cgroup for this memory, so automatically
offline page_cgroup when onlining the memory block fails.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 mm/page_cgroup.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..44db00e 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -251,6 +251,9 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
mn->nr_pages, mn->status_change_nid);
break;
case MEM_CANCEL_ONLINE:
+   offline_page_cgroup(mn->start_pfn,
+   mn->nr_pages, mn->status_change_nid);
+   break;
case MEM_GOING_OFFLINE:
break;
case MEM_ONLINE:
-- 
1.7.1



[RFC v9 PATCH 20/21] memory-hotplug: clear hwpoisoned flag when onlining pages

2012-09-05 Thread wency
From: Wen Congyang 

The HWPoison flag may be set when we offline a page via the sysfs interface
/sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. If we don't clear
this flag when onlining pages, the page can't be freed and will
not be put on the free list, so we can't offline these pages again.
We should therefore clear this flag when onlining pages.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 mm/memory_hotplug.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 270c249..140c080 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -661,6 +661,11 @@ EXPORT_SYMBOL_GPL(__online_page_increment_counters);
 
 void __online_page_free(struct page *page)
 {
+#ifdef CONFIG_MEMORY_FAILURE
+   /* The page may be marked HWPoisoned by soft/hard offline page */
+   ClearPageHWPoison(page);
+#endif
+
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
-- 
1.7.1



[PATCH v2 0/2] btrfs-progs: some bugfixes

2012-09-05 Thread Zhi Yong Wu
  I found some miscellaneous bugs while working on other tasks.
Sending them out for review, thanks.

Zhi Yong Wu (2):
  btrfs-progs: Close file descriptor on exit
  btrfs-progs: Fix up memory leakage

 cmds-filesystem.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

-- 
1.7.6.5



[PATCH v2 1/2] btrfs-progs: Close file descriptor on exit

2012-09-05 Thread Zhi Yong Wu
  Close the file descriptor on every exit path.

Signed-off-by: Zhi Yong Wu 
---
 cmds-filesystem.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index b1457de..e62c4fd 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -77,18 +77,23 @@ static int cmd_df(int argc, char **argv)
if (ret) {
fprintf(stderr, "ERROR: couldn't get space info on '%s' - %s\n",
path, strerror(e));
+   close(fd);
free(sargs);
return ret;
}
-   if (!sargs->total_spaces)
+   if (!sargs->total_spaces) {
+   close(fd);
return 0;
+   }
 
count = sargs->total_spaces;
 
sargs = realloc(sargs, sizeof(struct btrfs_ioctl_space_args) +
(count * sizeof(struct btrfs_ioctl_space_info)));
-   if (!sargs)
+   if (!sargs) {
+   close(fd);
return -ENOMEM;
+   }
 
sargs->space_slots = count;
sargs->total_spaces = 0;
@@ -148,6 +153,7 @@ static int cmd_df(int argc, char **argv)
printf("%s: total=%s, used=%s\n", description, total_bytes,
   used_bytes);
}
+   close(fd);
free(sargs);
 
return 0;
-- 
1.7.6.5



[PATCH v2 2/2] btrfs-progs: Fix up memory leakage

2012-09-05 Thread Zhi Yong Wu
  Some code paths forget to free memory on exit.

Changelog from v1:
  Fix incorrect variable usage. [Ram Pai]

Signed-off-by: Zhi Yong Wu 
---
 cmds-filesystem.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index e62c4fd..9c43d35 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -47,7 +47,7 @@ static const char * const cmd_df_usage[] = {
 
 static int cmd_df(int argc, char **argv)
 {
-   struct btrfs_ioctl_space_args *sargs;
+   struct btrfs_ioctl_space_args *sargs, *sargs_orig;
u64 count = 0, i;
int ret;
int fd;
@@ -65,7 +65,7 @@ static int cmd_df(int argc, char **argv)
return 12;
}
 
-   sargs = malloc(sizeof(struct btrfs_ioctl_space_args));
+   sargs_orig = sargs = malloc(sizeof(struct btrfs_ioctl_space_args));
if (!sargs)
return -ENOMEM;
 
@@ -83,6 +83,7 @@ static int cmd_df(int argc, char **argv)
}
if (!sargs->total_spaces) {
close(fd);
+   free(sargs);
return 0;
}
 
@@ -92,6 +93,7 @@ static int cmd_df(int argc, char **argv)
(count * sizeof(struct btrfs_ioctl_space_info)));
if (!sargs) {
close(fd);
+   free(sargs_orig);
return -ENOMEM;
}
 
-- 
1.7.6.5



Re: [RFC 0/5] forced comounts for cgroups.

2012-09-05 Thread Tejun Heo
Hey,

On Wed, Sep 05, 2012 at 11:07:21AM +0200, Peter Zijlstra wrote:
> Glauber, the other approach is sending a patch that doesn't touch
> cgroup.c but only the controllers and I'll merge it regardless of what
> tj thinks.
> 
> We need some movement here.

Peter, I don't think the proposed patch is helpful at this point.
While movement is necessary, it's not like moving towards any
direction is helpful.  They might just become another cruft which
needs to be maintained.

Thanks.

-- 
tejun


Re: [PATCH v2 2/2] x86: reimplement mem boot option

2012-09-05 Thread Wen Congyang
Hi, H. Peter Anvin

Do you have time to review this patch?

Thanks
Wen Congyang

At 08/21/2012 04:11 PM, Wen Congyang Wrote:
> Current mem boot option only can work for non efi environment. If the user
> specifies add_efi_memmap, it cannot work for efi environment. In
> the efi environment, we call e820_add_region() to add the memory map. So
> we can modify __e820_add_region() and the mem boot option can work for
> efi environment.
> 
> Signed-off-by: Wen Congyang 
> ---
>  arch/x86/kernel/e820.c |   29 +
>  1 files changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 4185797..20bc467 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -47,6 +47,7 @@ unsigned long pci_mem_start = 0xaeedbabe;
>  #ifdef CONFIG_PCI
>  EXPORT_SYMBOL(pci_mem_start);
>  #endif
> +static u64 mem_limit = ~0ULL;
>  
>  /*
>   * This function checks if any part of the range  is mapped
> @@ -119,6 +120,20 @@ static void __init __e820_add_region(struct e820map *e820x, u64 start, u64 size,
>   return;
>   }
>  
> + if (start >= mem_limit) {
> + printk(KERN_ERR "e820: ignoring [mem %#010llx-%#010llx]\n",
> +(unsigned long long)start,
> +(unsigned long long)(start + size - 1));
> + return;
> + }
> +
> + if (mem_limit - start < size) {
> + printk(KERN_ERR "e820: ignoring [mem %#010llx-%#010llx]\n",
> +(unsigned long long)mem_limit,
> +(unsigned long long)(start + size - 1));
> + size = mem_limit - start;
> + }
> +
>   e820x->map[x].addr = start;
>   e820x->map[x].size = size;
>   e820x->map[x].type = type;
> @@ -809,7 +824,7 @@ static int userdef __initdata;
>  /* "mem=nopentium" disables the 4MB page tables. */
>  static int __init parse_memopt(char *p)
>  {
> - u64 mem_size;
> + char *oldp;
>  
>   if (!p)
>   return -EINVAL;
> @@ -825,11 +840,11 @@ static int __init parse_memopt(char *p)
>   }
>  
>   userdef = 1;
> - mem_size = memparse(p, &p);
> + oldp = p;
> + mem_limit = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
> - if (mem_size == 0)
> + if (mem_limit == 0 || p == oldp)
>   return -EINVAL;
> - e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
>  
>   return 0;
>  }
> @@ -881,6 +896,12 @@ early_param("memmap", parse_memmap_opt);
>  
>  void __init finish_e820_parsing(void)
>  {
> + if (mem_limit != ~0ULL) {
> + userdef = 1;
> + e820_remove_range(mem_limit, ULLONG_MAX - mem_limit,
> +   E820_RAM, 1);
> + }
> +
>   if (userdef) {
>   u32 nr = e820.nr_map;
>  



[RFC v9 PATCH 07/21] memory-hotplug: call acpi_bus_remove() to remove memory device

2012-09-05 Thread wency
From: Wen Congyang 

The memory device has been ejected and powered off, so we can call
acpi_bus_remove() to remove the memory device from the ACPI bus.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 9d47458..b152767 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -425,8 +425,9 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
}
 
/*
-* TBD: Invoke acpi_bus_remove to cleanup data structures
+* Invoke acpi_bus_remove() to remove memory device
 */
+   acpi_bus_remove(device, 1);
 
/* _EJ0 succeeded; _OST is not necessary */
return;
-- 
1.7.1



[RFC v9 PATCH 06/21] memory-hotplug: export the function acpi_bus_remove()

2012-09-05 Thread wency
From: Wen Congyang 

The function acpi_bus_remove() removes an ACPI device from the ACPI bus.
When an ACPI device is removed, we need to call this function to remove
the device from the ACPI bus, so export it.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/scan.c |3 ++-
 include/acpi/acpi_bus.h |1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d1ecca2..1cefc34 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1224,7 +1224,7 @@ static int acpi_device_set_context(struct acpi_device 
*device)
return -ENODEV;
 }
 
-static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
if (!dev)
return -EINVAL;
@@ -1246,6 +1246,7 @@ static int acpi_bus_remove(struct acpi_device *dev, int 
rmdevice)
 
return 0;
 }
+EXPORT_SYMBOL(acpi_bus_remove);
 
 static int acpi_add_single_object(struct acpi_device **child,
  acpi_handle handle, int type,
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index bde976e..2ccf109 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -360,6 +360,7 @@ bool acpi_bus_power_manageable(acpi_handle handle);
 bool acpi_bus_can_wakeup(acpi_handle handle);
 int acpi_power_resource_register_device(struct device *dev, acpi_handle 
handle);
 void acpi_power_resource_unregister_device(struct device *dev, acpi_handle 
handle);
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice);
 #ifdef CONFIG_ACPI_PROC_EVENT
 int acpi_bus_generate_proc_event(struct acpi_device *device, u8 type, int 
data);
 int acpi_bus_generate_proc_event4(const char *class, const char *bid, u8 type, 
int data);
-- 
1.7.1



[RFC v9 PATCH 15/21] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap

2012-09-05 Thread wency
From: Yasuaki Ishimatsu 

To remove the sparse-vmemmap memmap region, which is allocated from bootmem,
the pages backing that region must first be registered with get_page_bootmem().
This patch therefore walks the pages of the virtual mapping and registers them
with get_page_bootmem().

Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.
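The x86 walk advances through the vmemmap virtual range in different steps depending on the mapping size. Without PSE it moves forward one 4 KiB page at a time. With PSE it jumps to the next PMD boundary via pmd_addr_end(). Below is a minimal userspace sketch of only that stepping logic. It assumes x86_64's 4 KiB pages and 2 MiB PMDs, and it assumes every page-table level is populated. The SKETCH_* constants and count_walk_steps() are hypothetical. sketch_pmd_addr_end() reproduces the formula of the generic pmd_addr_end().

```c
#include <stdint.h>

/* Assumed x86_64 geometry: 4 KiB base pages, 2 MiB PMD (PSE) mappings. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PMD_SIZE  (2ULL << 20)
#define SKETCH_PMD_MASK  (~(SKETCH_PMD_SIZE - 1))

/* Same formula as the kernel's generic pmd_addr_end(addr, end). */
static uint64_t sketch_pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Count the iterations register_page_bootmem_memmap() would make over
 * [addr, end) if every level is populated: one per 4 KiB page without
 * PSE, one per (possibly partial) 2 MiB PMD region with PSE.
 */
static unsigned long count_walk_steps(uint64_t addr, uint64_t end, int has_pse)
{
	unsigned long steps = 0;
	uint64_t next;

	for (; addr < end; addr = next) {
		if (has_pse)
			next = sketch_pmd_addr_end(addr, end);
		else
			next = (addr + SKETCH_PAGE_SIZE) & SKETCH_PAGE_MASK;
		steps++;
	}
	return steps;
}
```

Under these assumptions, a 4 MiB range takes 1024 steps without PSE and 2 steps with it. A range that starts in the middle of a PMD is cut at the next 2 MiB boundary.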

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 arch/ia64/mm/discontig.c   |6 
 arch/powerpc/mm/init_64.c  |6 
 arch/s390/mm/vmem.c|6 
 arch/sparc/mm/init_64.c|6 
 arch/x86/mm/init_64.c  |   52 
 include/linux/memory_hotplug.h |2 +
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   31 +--
 8 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index c641333..33943db 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct page *start_page,
 {
return vmemmap_populate_basepages(start_page, size, node);
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 620b7ac..3690c44 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -298,5 +298,11 @@ int __meminit vmemmap_populate(struct page *start_page,
 
return 0;
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6f896e7..eda55cd 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,12 @@ out:
return ret;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
+
 /*
  * Add memory segment to the segment list if it doesn't overlap with
  * an already present segment.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d58edf5..add1cc7 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2077,6 +2077,12 @@ void __meminit vmemmap_populate_print_last(void)
node_start = 0;
}
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 static void prot_init_common(unsigned long page_none,
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e0d88ba..0075592 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,58 @@ vmemmap_populate(struct page *start_page, unsigned long 
size, int node)
return 0;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   unsigned long addr = (unsigned long)start_page;
+   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long next;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+
+   for (; addr < end; addr = next) {
+   pte_t *pte = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   continue;
+   get_page_bootmem(section_nr, pmd_page(*pmd),
+MIX_SECTION_INFO);
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   continue;
+   get_page_bootmem(section_nr, pte_page(*pte),
+SECTION_INFO);
+   } else {
+   next = pmd_addr_end(addr, end);
+
+

[RFC v9 PATCH 02/21] memory-hotplug: implement offline_memory()

2012-09-05 Thread wency
From: Wen Congyang 

The function offline_memory() will be called when hot-removing a
memory device. The memory device may contain more than one memory
block. If a memory block has already been offlined, __offline_pages()
will fail on it, so we try to offline one memory block at a
time.

If a memory block is offlined in offline_memory(), we also
update its state and notify userspace that its state has
changed.
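The patch lets offline_memory() update the block state by splitting memory_block_change_state() into two functions: a lock-taking wrapper and an unlocked __memory_block_change_state(). With that split, offline_memory_block() can test the state and change it while holding state_mutex only once. The userspace sketch below shows the same locking pattern with pthreads. It uses a simplified two-state model (the kernel also has MEM_GOING_OFFLINE). All names are hypothetical.

```c
#include <pthread.h>

/* Simplified two-state model; the kernel also has MEM_GOING_OFFLINE. */
enum sketch_state { SKETCH_ONLINE, SKETCH_OFFLINE };

struct sketch_block {
	pthread_mutex_t state_mutex;
	enum sketch_state state;
};

/* Caller must hold blk->state_mutex (the role of __memory_block_change_state()). */
static int sketch_change_state_locked(struct sketch_block *blk,
				      enum sketch_state to_state,
				      enum sketch_state from_state_req)
{
	if (blk->state != from_state_req)
		return -1;	/* the kernel returns -EINVAL here */
	blk->state = to_state;	/* the real code onlines/offlines pages first */
	return 0;
}

/* Lock-taking wrapper (the role of memory_block_change_state()). */
static int sketch_change_state(struct sketch_block *blk,
			       enum sketch_state to_state,
			       enum sketch_state from_state_req)
{
	int ret;

	pthread_mutex_lock(&blk->state_mutex);
	ret = sketch_change_state_locked(blk, to_state, from_state_req);
	pthread_mutex_unlock(&blk->state_mutex);
	return ret;
}

/*
 * The role of offline_memory_block(): the state test and the change happen
 * under one hold of the mutex, so no other thread can change the state in
 * between. An already-offline block is left alone and reported as success.
 */
static int sketch_offline_block(struct sketch_block *blk)
{
	int ret = 0;

	pthread_mutex_lock(&blk->state_mutex);
	if (blk->state != SKETCH_OFFLINE)
		ret = sketch_change_state_locked(blk, SKETCH_OFFLINE, SKETCH_ONLINE);
	pthread_mutex_unlock(&blk->state_mutex);
	return ret;
}
```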

The function offline_memory() also checks each memory block's
state, so there is no need to check the memory block's state
before calling offline_memory().
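The section-to-block walk in offline_memory() can be modelled in userspace as shown below. This sketch makes several assumptions:

- A hypothetical present[] array stands in for present_section_nr().
- Block membership is computed as section / 4 instead of via find_memory_block_hinted().
- The function only lists the blocks that would be handed to offline_memory_block(), each one once.

It does not model what the (truncated) loop does when offlining fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed block geometry for illustration: 4 sections per memory block. */
#define SKETCH_SECTIONS_PER_BLOCK 4UL

/*
 * Walk sections [first_sec, last_sec) the way offline_memory() walks PFNs
 * in PAGES_PER_SECTION steps. Absent sections are skipped, sections inside
 * the block handled last are skipped, and every other block is appended to
 * out[] once. Returns the number of blocks written to out[].
 */
static size_t sketch_blocks_to_offline(unsigned long first_sec,
				       unsigned long last_sec,
				       const bool *present,
				       unsigned long *out)
{
	bool have_block = false;
	unsigned long cur_block = 0;
	unsigned long sec;
	size_t n = 0;

	for (sec = first_sec; sec < last_sec; sec++) {
		unsigned long block = sec / SKETCH_SECTIONS_PER_BLOCK;

		if (!present[sec])
			continue;
		/* same memory block as the one just handled? */
		if (have_block && block == cur_block)
			continue;

		have_block = true;
		cur_block = block;
		out[n++] = block;	/* offline_memory_block() would run here */
	}
	return n;
}
```

For example, if sections 0, 1, 8, 10 and 11 are present, only blocks 0 and 2 are visited. Block 1 has no present sections, and sections 10 and 11 fall in block 2, which has already been handled.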

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
CC: Vasilis Liaskovitis 
Signed-off-by: Wen Congyang 
---
 drivers/base/memory.c  |   31 +++
 include/linux/memory_hotplug.h |2 ++
 mm/memory_hotplug.c|   37 -
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 44e7de6..86c8821 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -275,13 +275,11 @@ memory_block_action(unsigned long phys_index, unsigned 
long action)
return ret;
 }
 
-static int memory_block_change_state(struct memory_block *mem,
+static int __memory_block_change_state(struct memory_block *mem,
unsigned long to_state, unsigned long from_state_req)
 {
int ret = 0;
 
-   mutex_lock(&mem->state_mutex);
-
if (mem->state != from_state_req) {
ret = -EINVAL;
goto out;
@@ -309,10 +307,20 @@ static int memory_block_change_state(struct memory_block 
*mem,
break;
}
 out:
-   mutex_unlock(&mem->state_mutex);
return ret;
 }
 
+static int memory_block_change_state(struct memory_block *mem,
+   unsigned long to_state, unsigned long from_state_req)
+{
+   int ret;
+
+   mutex_lock(&mem->state_mutex);
+   ret = __memory_block_change_state(mem, to_state, from_state_req);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
 static ssize_t
 store_mem_state(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
@@ -653,6 +661,21 @@ int unregister_memory_section(struct mem_section *section)
 }
 
 /*
+ * offline one memory block. If the memory block has been offlined, do nothing.
+ */
+int offline_memory_block(struct memory_block *mem)
+{
+   int ret = 0;
+
+   mutex_lock(&mem->state_mutex);
+   if (mem->state != MEM_OFFLINE)
+   ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
+
+/*
  * Initialize the sysfs support for memory devices...
  */
 int __init memory_dev_init(void)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c183f39..0b040bb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -10,6 +10,7 @@ struct page;
 struct zone;
 struct pglist_data;
 struct mem_section;
+struct memory_block;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 
@@ -234,6 +235,7 @@ extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory_block(struct memory_block *mem);
 extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bb42316..6fc1908 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1001,7 +1001,42 @@ int offline_pages(unsigned long start_pfn, unsigned long 
nr_pages)
 
 int offline_memory(u64 start, u64 size)
 {
-   return -EINVAL;
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+   int ret;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = start_pfn + PFN_DOWN(size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr >= mem->start_section_nr) &&
+   (section_nr <= mem->end_section_nr))
+   continue;
+
+   mem = find_memory_block_hinted(section, mem);
+   if (!mem)
+   continue;
+
+   ret = offline

[RFC v9 PATCH 05/21] memory-hotplug: check whether memory is present or not

2012-09-05 Thread wency
From: Yasuaki Ishimatsu 

If the system supports memory hot-remove, online_pages() may try to online pages
that have already been removed. So online_pages() needs to check whether the
pages being onlined are present.
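The check added by this patch fails as soon as one PFN in [pfn, pfn + nr_pages) is not present. Below is a userspace sketch with the same contract. It assumes a hypothetical per-section section_present[] table in place of pfn_present(), and 32768 pages per section (128 MiB sections of 4 KiB pages). It uses an unsigned long loop index so the index has the same type as nr_pages.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed geometry: 128 MiB sections of 4 KiB pages = 32768 pages each. */
#define SKETCH_PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for pfn_present(): is the pfn's section present? */
static bool sketch_pfn_present(const bool *section_present,
			       unsigned long nr_sections, unsigned long pfn)
{
	unsigned long sec = pfn / SKETCH_PAGES_PER_SECTION;

	return sec < nr_sections && section_present[sec];
}

/* Same contract as pfns_present(): 0 if every pfn is present, else -EINVAL. */
static int sketch_pfns_present(const bool *section_present,
			       unsigned long nr_sections,
			       unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		if (!sketch_pfn_present(section_present, nr_sections, pfn + i))
			return -EINVAL;
	return 0;
}
```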

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
 include/linux/mmzone.h |   19 +++
 mm/memory_hotplug.c|   13 +
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ac3ae30 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1180,6 +1180,25 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   int i;
+   for (i = 0; i < nr_pages; i++) {
+   if (pfn_present(pfn + i))
+   continue;
+   else
+   return -EINVAL;
+   }
+   return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7747..299747d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* If system supports memory hot-remove, the memory may have been
+* removed. So we check whether the memory has been removed or not.
+*
+* Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
+*   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+*   always returns 0.
+*/
+   ret = pfns_present(pfn, nr_pages);
+   if (ret) {
+   unlock_memory_hotplug();
+   return ret;
+   }
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
-- 
1.7.1



[RFC v9 PATCH 04/21] memory-hotplug: offline and remove memory when removing the memory device

2012-09-05 Thread wency
From: Yasuaki Ishimatsu 

We should offline and remove memory when removing the memory device.
The memory device can be removed in 2 ways:
1. send an eject request via SCI
2. echo 1 >/sys/bus/acpi/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called. In the 2nd
case, acpi_memory_device_remove() will be called. acpi_memory_device_remove()
will also be called when we unbind the memory device from the driver
acpi_memhotplug. If the type is ACPI_BUS_REMOVAL_EJECT, it means
that the user wants to eject the memory device, and we should offline
and remove memory in acpi_memory_device_remove().

The function remove_memory() is not fully implemented yet. For now, it only
checks whether all memory has been offlined.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |   45 +--
 drivers/base/memory.c  |   39 ++
 include/linux/memory.h |5 
 include/linux/memory_hotplug.h |5 
 mm/memory_hotplug.c|   22 +++
 5 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 7873832..9d47458 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -310,25 +311,44 @@ static int acpi_memory_powerdown_device(struct 
acpi_memory_device *mem_device)
return 0;
 }
 
-static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+static int
+acpi_memory_device_remove_memory(struct acpi_memory_device *mem_device)
 {
int result;
struct acpi_memory_info *info, *n;
+   int node = mem_device->nid;
 
-
-   /*
-* Ask the VM to offline this memory range.
-* Note: Assume that this function returns zero on success
-*/
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
result = offline_memory(info->start_addr, info->length);
if (result)
return result;
+
+   result = remove_memory(node, info->start_addr,
+  info->length);
+   if (result)
+   return result;
}
+
+   list_del(&info->list);
kfree(info);
}
 
+   return 0;
+}
+
+static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+{
+   int result;
+
+   /*
+* Ask the VM to offline this memory range.
+* Note: Assume that this function returns zero on success
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+
/* Power-off and eject the device */
result = acpi_memory_powerdown_device(mem_device);
if (result) {
@@ -477,12 +497,23 @@ static int acpi_memory_device_add(struct acpi_device 
*device)
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
struct acpi_memory_device *mem_device = NULL;
-
+   int result;
 
if (!device || !acpi_driver_data(device))
return -EINVAL;
 
mem_device = acpi_driver_data(device);
+
+   if (type == ACPI_BUS_REMOVAL_EJECT) {
+   /*
+* offline and remove memory only when the memory device is
+* ejected.
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+   }
+
kfree(mem_device);
 
return 0;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..038be73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(struct 
notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(start + size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr >= mem->start_section_n

[RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-05 Thread wency
From: Wen Congyang 

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info  : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource: [RFC PATCH 9/19]
  - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory  : [RFC PATCH 12/19]
  - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
   ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device (how to do this depends on your hardware)
   You will see the memory device under the directory /sys/bus/acpi/devices/.
   Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
   You can write online/offline to /sys/devices/system/memory/memoryX/state to
   online/offline pages provided by this memory device
5. hotremove the memory device
   You can hotremove the memory device by the hardware, or writing 1 to
   /sys/bus/acpi/devices/PNP0C80:XX/eject.
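Step 4 can also be done from a program. The hedged C sketch below does the equivalent of `echo offline > /sys/devices/system/memory/memoryX/state`. sketch_state_path() and sketch_set_block_state() are hypothetical helpers. The write needs root and a kernel built with MEMORY_HOTPLUG.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the state path for memory block idx, e.g.
 * /sys/devices/system/memory/memory8/state.
 * Returns 0, or -1 if buf is too small.
 */
static int sketch_state_path(char *buf, size_t len, unsigned int idx)
{
	int n = snprintf(buf, len, "/sys/devices/system/memory/memory%u/state", idx);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Equivalent of "echo offline > .../memoryX/state" (state is "online" or
 * "offline"). With stdio buffering, the actual write(2), and any error the
 * kernel returns for it, happens at fclose(), so that result is checked.
 */
static int sketch_set_block_state(unsigned int idx, const char *state)
{
	char path[128];
	FILE *f;
	int ret = 0;

	if (sketch_state_path(path, sizeof(path), idx))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(state, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```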

Note: if the memory provided by the memory device is used by the kernel, it
cannot be offlined. This is not a bug.

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
   and memory11 under the directory /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store the page cgroup
   when we online pages. When we online memory8, the memory that stores its
   page cgroup is not provided by this memory device. But when we online
   memory9, the memory that stores its page cgroup may be provided by memory8,
   so memory8 cannot be offlined first. We should offline the memory in the
   reverse order.
   When the memory device is hot-removed, we automatically offline the memory
   provided by this memory device. But we don't know which memory was onlined
   first, so offlining may fail. In such a case, you should offline the memory
   by hand before hot-removing the memory device.
2. hot-removing a memory device may cause a kernel panic
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1
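The memoryX numbers in the first example come from dividing the address range by the memory block size. The sketch below lists the blocks of a range in the reverse order suggested above. It assumes 128 MiB blocks, the usual x86_64 size at the time; the real value can be read from /sys/devices/system/memory/block_size_bytes. sketch_reverse_offline_order() is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed memory block size: 128 MiB (see block_size_bytes in sysfs). */
#define SKETCH_BLOCK_SIZE (128ULL << 20)

/*
 * Write the memoryX indices covering [start, end) into out[], highest first,
 * i.e. the reverse of onlining them in ascending order.
 * Returns the number of indices written (at most max).
 */
static size_t sketch_reverse_offline_order(unsigned long long start,
					   unsigned long long end,
					   unsigned long *out, size_t max)
{
	unsigned long long first = start / SKETCH_BLOCK_SIZE;
	unsigned long long last = (end + SKETCH_BLOCK_SIZE - 1) / SKETCH_BLOCK_SIZE;
	size_t n = 0;

	while (last > first && n < max)
		out[n++] = (unsigned long)--last;
	return n;
}
```

For [1G, 1.5G), 1G / 128M = 8 and 1.5G / 128M = 12, so the blocks are memory8 through memory11. The sketch returns them as 11, 10, 9, 8.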

change log of v9:
 [RFC PATCH v9 8/21]
   * add a lock to protect the list map_entries
   * add an indicator to firmware_map_entry to remember whether the memory
 is allocated from bootmem
 [RFC PATCH v9 10/21]
   * change the macro to inline function
 [RFC PATCH v9 19/21]
   * don't offline the node if a CPU on the node is online
 [RFC PATCH v9 21/21]
   * create new patch: automatically offline page_cgroup when onlining a memory
 block fails

change log of v8:
 [RFC PATCH v8 17/20]
   * Fix problems when one node's range include the other nodes
 [RFC PATCH v8 18/20]
   * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
 is not defined.
 [RFC PATCH v8 19/20]
   * don't offline node when some memory sections are not removed
 [RFC PATCH v8 20/20]
   * create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
 [RFC PATCH v7 4/19]
   * do not continue if acpi_memory_device_remove_memory() fails.
 [RFC PATCH v7 15/19]
   * handle usemap in register_page_bootmem_info_section() too.

change log of v6:
 [RFC PATCH v6 12/19]
   * fix build error on architectures other than x86

 [RFC PATCH v6 15-16/19]
   * fix build error on architectures other than x86

change log of v5:
 * merge the patchset to clear page tables and the patchset to hot-remove
   memory (from Ishimatsu) into one big patchset.

 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()

 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines pages,
 updates the memory block's state, and notifies userspace that the memory
 block's state has changed.

 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too.

 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things done
 in the function __add_zone().

 [RFC PATCH v5 18/19]
   * flush work before resetting the node device.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being discussed
   in another thread. But the patch is needed for testing the patch series,
   so I added it as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check whether memory is online in remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() to
 get the node id
 
 [RFC PATCH v4 3/13]
   * create new patch: check whether memory is online in online_pages()


[RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-05 Thread wency
From: Yasuaki Ishimatsu 

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline >/sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change the memory block's state and notify
userspace that the memory block's state has changed after offlining the
pages.

So rename remove_memory() to offline_memory()/offline_pages().
offline_memory() will be used in the 1st case; it is only a stub in this
patch. offline_pages() will be used in the 2nd case.
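Besides the rename, memory_block_action() now passes a PFN range directly to offline_pages(). Before, it converted the PFN to a byte address and remove_memory() converted it back. The userspace sketch below assumes 4 KiB pages (PAGE_SHIFT 12) and uses hypothetical helper names. It shows that both paths produce the same [start_pfn, end_pfn) for a page-aligned memory block.

```c
/* Assumed 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PFN_DOWN(x) ((unsigned long)((x) >> SKETCH_PAGE_SHIFT))

/* A [start_pfn, end_pfn) range as handed to __offline_pages(). */
struct sketch_pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Old path: remove_memory(start, size) converted bytes to PFNs. */
static struct sketch_pfn_range sketch_range_from_bytes(unsigned long long start,
						       unsigned long long size)
{
	struct sketch_pfn_range r;

	r.start_pfn = SKETCH_PFN_DOWN(start);
	r.end_pfn = r.start_pfn + SKETCH_PFN_DOWN(size);
	return r;
}

/* New path: offline_pages(start_pfn, nr_pages) passes start_pfn + nr_pages. */
static struct sketch_pfn_range sketch_range_from_pfns(unsigned long start_pfn,
						      unsigned long nr_pages)
{
	struct sketch_pfn_range r = { start_pfn, start_pfn + nr_pages };

	return r;
}
```

For a 128 MiB block at 1 GiB, both paths give PFNs [262144, 294912).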

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
 drivers/acpi/acpi_memhotplug.c |2 +-
 drivers/base/memory.c  |9 +++--
 include/linux/memory_hotplug.h |3 ++-
 mm/memory_hotplug.c|   22 ++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long 
start_pfn,
 static int
 memory_block_action(unsigned long phys_index, unsigned long action)
 {
-   unsigned long start_pfn, start_paddr;
+   unsigned long start_pfn;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
struct page *first_page;
int ret;
 
first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+   start_pfn = page_to_pfn(first_page);
 
switch (action) {
case MEM_ONLINE:
-   start_pfn = page_to_pfn(first_page);
-
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
 
ret = online_pages(start_pfn, nr_pages);
break;
case MEM_OFFLINE:
-   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_pages(start_pfn, nr_pages);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long 
pfn,
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..bb42316 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long 
end_pfn)
return offlined;
 }
 
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
  unsigned long end_pfn, unsigned long timeout)
 {
unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-   unsigned long start_pfn, end_pfn;
+   return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
 
-   start_pfn = PFN_DOWN(start);
-   end_pfn = start_pfn + PFN_DOWN(size);
-   return offline_pages(start_pfn, end_pfn, 120 * HZ);
+int offline_memory(u64 start, u64 size)
+{
+   return -EINVAL;
 }
 #else
-int remove_memory(u64 start, u64 size)
+int offline_pages(u
