date:20120808

RE: [PATCH 1/1] proc: add /proc/pid/shmaps

2012-08-08 Thread Ren, Qiaowei

Thanks for your reply. /proc/pid/maps contains a great many entries, and
usually only a small minority of them describe shared memory in a process's
address space. So I hoped that a new file might provide some convenience.
Could you tell me how to get this information other than by parsing the
'maps' file?

-----Original Message-----
From: David Rientjes [mailto:rient...@google.com] 
Sent: Thursday, August 09, 2012 5:10 AM
To: Ren, Qiaowei
Cc: Andrew Morton; Al Viro; Oleg Nesterov; Cyrill Gorcunov; Vasiliy Kulikov; 
Hugh Dickins; Naoya Horiguchi; Konstantin Khlebnikov; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] proc: add /proc/pid/shmaps

On Wed, 8 Aug 2012, Qiaowei Ren wrote:

> Add a shmaps entry to /proc/pid: show information about shared memory in an 
> address space.
> 
> People who use shared memory may want to analyze it, for example to judge
> whether a given memory address is shared. This file contains just the
> 'shared' part of /proc/pid/maps. maps has so many entries that a lot of
> parsing is needed to obtain the relevant information every time.
> 
> Signed-off-by: Qiaowei Ren 

Nack as unnecessary; /proc/pid/maps already explicitly emits 's' for 
VM_MAYSHARE and 'p' otherwise so this information is already available to 
userspace.
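
For reference, the filtering described above can be done in userspace with
very little code. The following is only a minimal sketch, assuming the usual
maps layout in which the second column is a four-character permission field
ending in 's' (shared) or 'p' (private). The helper names
maps_line_is_shared() and print_shared_maps() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a /proc/<pid>/maps line describes a shared mapping,
 * 0 if it is private, -1 if the line cannot be parsed.
 * The permission field is the second column, e.g. "rw-s" or "r-xp".
 */
static int maps_line_is_shared(const char *line)
{
	char perms[5];

	/* Skip the address range, then read at most 4 permission chars. */
	if (sscanf(line, "%*s %4s", perms) != 1 || strlen(perms) != 4)
		return -1;
	if (perms[3] == 's')
		return 1;
	if (perms[3] == 'p')
		return 0;
	return -1;
}

/*
 * Print only the shared mappings found in the given maps file.
 * Returns the number of shared mappings, or -1 if the file
 * cannot be opened.  The buffer size is an assumption chosen to
 * hold one line with a long path name.
 */
static int print_shared_maps(const char *path)
{
	char line[8192];
	int n = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (maps_line_is_shared(line) == 1) {
			fputs(line, stdout);
			n++;
		}
	}
	fclose(f);
	return n;
}
```

Calling print_shared_maps("/proc/self/maps") from a small main() lists the
shared mappings of the calling process.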
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] perf: Teach perf tool to profile sleep times (v2)

2012-08-08 Thread Namhyung Kim

On Wed, 8 Aug 2012 11:24:34 +0400, Andrey Wagin wrote:
> 2012/8/8 Namhyung Kim :
>> On Wed, 8 Aug 2012 09:02:18 +0400, Andrey Wagin wrote:
>>> 2012/8/8 Namhyung Kim :
>>>>> $ ./perf record -e sched:sched_stat_sleep -e sched:sched_switch \
>>>>>   -e sched:sched_process_exit -gP -o ~/perf.data.raw ~/foo
>>>
>>> Actually this string is not completed, because sched:sched_switch
>>> should be filtered by state.
>>>
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 0.015 MB /root/perf.data.raw (~661 samples) ]
>>>>> $ ./perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
>>>>> $ ./perf report -i ~/perf.data
>>>>
>>>> The usage like this is too specific and hard to use IMHO. How about
>>>> putting it somehow into perf sched or new command?
>>>>
>>>> /me don't have an idea though. :-)
>>>>
>>>
>>> I'm going to add a script, so the usage will look like this:
>>> $ perf script record sched-stat -e sched:sched_stat_sleep 
>>> This command will collect sched_stat_* and proper sched_switch events
>>
>> ???  That means '-e sched:sched_stat_sleep' part can be removed from
>> command line, no?
>
> No. My method works for all kind of sched_stat_* events, so you need
> to specify an event type which should be traced.

Ok, so can it be like 'perf script record sched-stat -t sleep '?

Thanks,
Namhyung

Re: [PATCH] thermal: Fix potential NULL pointer accesses

2012-08-08 Thread Zhang Rui

On Tue, 2012-08-07 at 22:36 -0700, Guenter Roeck wrote:
> The type parameter in thermal_zone_device_register and
> thermal_cooling_device_register can be NULL, indicating that no sysfs
> attribute for the type should be created. Only call strlen() and strcpy()
> on type if it is not NULL.
> 
> This patch addresses Coverity #102180 and #102182: Dereference before null
> check
> 
> Signed-off-by: Guenter Roeck 

Acked-by: Zhang Rui 

> ---
> Applies on top of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git (thermal).
> 
>  drivers/thermal/thermal_sys.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> index 5be8728..e69f76d 100644
> --- a/drivers/thermal/thermal_sys.c
> +++ b/drivers/thermal/thermal_sys.c
> @@ -900,7 +900,7 @@ thermal_cooling_device_register(char *type, void *devdata,
>   struct thermal_zone_device *pos;
>   int result;
>  
> - if (strlen(type) >= THERMAL_NAME_LENGTH)
> + if (type && strlen(type) >= THERMAL_NAME_LENGTH)
>   return ERR_PTR(-EINVAL);
>  
>   if (!ops || !ops->get_max_state || !ops->get_cur_state ||
> @@ -917,7 +917,7 @@ thermal_cooling_device_register(char *type, void *devdata,
>   return ERR_PTR(result);
>   }
>  
> - strcpy(cdev->type, type);
> + strcpy(cdev->type, type ? : "");
>   mutex_init(&cdev->lock);
>   INIT_LIST_HEAD(&cdev->thermal_instances);
>   cdev->ops = ops;
> @@ -1343,7 +1343,7 @@ struct thermal_zone_device 
> *thermal_zone_device_register(const char *type,
>   int count;
>   int passive = 0;
>  
> - if (strlen(type) >= THERMAL_NAME_LENGTH)
> + if (type && strlen(type) >= THERMAL_NAME_LENGTH)
>   return ERR_PTR(-EINVAL);
>  
>   if (trips > THERMAL_MAX_TRIPS || trips < 0 || mask >> trips)
> @@ -1365,7 +1365,7 @@ struct thermal_zone_device 
> *thermal_zone_device_register(const char *type,
>   return ERR_PTR(result);
>   }
>  
> - strcpy(tz->type, type);
> + strcpy(tz->type, type ? : "");
>   tz->ops = ops;
>   tz->device.class = &thermal_class;
>   tz->devdata = devdata;
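
The pattern in the patch above can be tried outside the kernel. This is a
minimal userspace sketch, not the thermal code itself: NAME_LENGTH,
struct named_dev and set_type() are illustrative stand-ins, and the GNU
shorthand `type ? : ""` is spelled out as `type ? type : ""` so that it
builds with any C compiler.

```c
#include <errno.h>
#include <string.h>

#define NAME_LENGTH 20	/* stand-in for THERMAL_NAME_LENGTH; value is illustrative */

struct named_dev {
	char type[NAME_LENGTH];
};

/*
 * Copy an optional type string into dev->type.
 * A NULL type is allowed and is stored as the empty string.
 * Returns 0 on success, -EINVAL if the string does not fit.
 */
static int set_type(struct named_dev *dev, const char *type)
{
	/* Test for NULL first: strlen(NULL) is undefined behaviour. */
	if (type && strlen(type) >= NAME_LENGTH)
		return -EINVAL;

	/* Portable spelling of the GNU extension `type ? : ""`. */
	strcpy(dev->type, type ? type : "");
	return 0;
}
```

Because the length check runs before the copy, strcpy() can never write more
than NAME_LENGTH bytes, including the terminating NUL.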



Re: [dm-devel] [PATCH v2 2/2] dm: verity support data device offset (Linux 3.4.7)

2012-08-08 Thread Wesley Miaw

On Aug 8, 2012, at 1:56 PM, Milan Broz wrote:

> On 08/08/2012 10:46 PM, Wesley Miaw wrote:
> 
>> I did modify veritysetup on my own so the format and verify commands will 
>> work with regular files on disk instead of having to mount through loop 
>> devices.
> 
> Which veritysetup? In upstream (cryptsetup repository) it allocates loop 
> automatically.
> (And for userspace verification it doesn't need loop at all.)
> 
> Anyway, please send a patch for userspace as well then ;-)

This isn't as polished because I pretty much just added support to do what I 
needed. I'm not sure if the LKML is the right place to post, so let me know if 
I should send this somewhere else.

Your previous email implied that veritysetup would need a way to determine if 
the data offset option is supported; I did not modify veritysetup to support 
this idea as I didn't need it.

Thanks.

From: Wesley Miaw 

Allow veritysetup format and verify commands to directly operate on regular
files instead of requiring mounts through loop devices.

Signed-off-by: Wesley Miaw 
---
 cryptsetup/lib/internal.h|1 
 cryptsetup/lib/libcryptsetup.h   |   22 
 cryptsetup/lib/libcryptsetup.sym |2 
 cryptsetup/lib/setup.c   |  133 -
 cryptsetup/lib/utils.c   |   12 ++
 cryptsetup/src/veritysetup.c |   23 +++--
 6 files changed, 183 insertions(+), 10 deletions(-)
--- a/cryptsetup/lib/internal.h 2012-08-08 17:11:20.366392301 -0700
+++ b/cryptsetup/lib/internal.h 2012-08-06 16:17:35.154719491 -0700
@@ -76,6 +76,7 @@ ssize_t read_blockwise(int fd, void *_bu
 ssize_t write_lseek_blockwise(int fd, char *buf, size_t count, off_t offset);
 int device_ready(struct crypt_device *cd, const char *device, int mode);
 int device_size(const char *device, uint64_t *size);
+int file_size(const char *filename, uint64_t *size);
 
 unsigned crypt_getpagesize(void);
 
--- a/cryptsetup/lib/libcryptsetup.h2012-08-08 17:11:20.375392929 -0700
+++ b/cryptsetup/lib/libcryptsetup.h2012-08-06 16:17:35.159720699 -0700
@@ -56,6 +57,19 @@ struct crypt_device; /* crypt device han
 int crypt_init(struct crypt_device **cd, const char *device);
 
 /**
+ * Initial crypt device handle from a file and check if provided file exists.
+ *
+ * @param cd Returns pointer to crypt device handle.
+ * @param filename Path to the backing file.
+ *
+ * @return @e 0 on success or negative errno value otherwise.
+ *
+ * @note Note that logging is not initialized here, possible messages uses
+ *   default log function.
+ */
+int crypt_initfile(struct crypt_device **cd, const char *filename);
+
+/**
  * Initialize crypt device handle from provided active device name,
  * and, optionally, from separate metadata (header) device
  * and check if provided device exists.
@@ -237,6 +251,15 @@ void crypt_set_password_verify(struct cr
 int crypt_set_data_device(struct crypt_device *cd, const char *device);
 
 /**
+ * Set data file
+ * For VERITY it is data file when hash device is separated.
+ *
+ * @param cd crypt device handle
+ * @param filename path to data file
+ */
+int crypt_set_data_file(struct crypt_device *cd, const char *device);
+
+/**
  * @defgroup rng "Cryptsetup RNG"
  *
  * @addtogroup rng
--- a/cryptsetup/lib/libcryptsetup.sym  2012-08-08 17:11:20.375392930 -0700
+++ b/cryptsetup/lib/libcryptsetup.sym  2012-08-06 16:17:35.160720941 -0700
@@ -1,6 +1,7 @@
 CRYPTSETUP_1.0 {
global:
crypt_init;
+   crypt_initfile;
crypt_init_by_name;
crypt_init_by_name_and_header;
crypt_set_log_callback;
@@ -13,6 +14,7 @@ CRYPTSETUP_1.0 {
crypt_set_password_verify;
crypt_set_uuid;
crypt_set_data_device;
+   crypt_set_data_file;
 
crypt_memory_lock;
crypt_format;
--- a/cryptsetup/lib/setup.c2012-08-08 17:11:20.428396640 -0700
+++ b/cryptsetup/lib/setup.c2012-08-06 16:17:35.192728669 -0700
@@ -25,6 +25,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "libcryptsetup.h"
 #include "luks.h"
@@ -585,6 +587,56 @@ bad:
return r;
 }
 
+int crypt_initfile(struct crypt_device **cd, const char *filename)
+{
+   struct crypt_device *h = NULL;
+   int fd;
+   struct stat st;
+   int r;
+
+   if (!cd)
+   return -EINVAL;
+
+   if (stat(filename, &st) < 0) {
+   log_err(NULL, _("File %s doesn't exist or access denied.\n"), 
filename);
+   return -EINVAL;
+   }
+
+   log_dbg("Trying to open and write file %s.", filename);
+   fd = open(filename, O_RDWR);
+   if (fd < 0) {
+   log_err(NULL, _("Cannot open file %s for writeable access.\n"), 
filename);
+   return -EINVAL;
+   }
+   close(fd);
+
+   log_dbg("Allocating crypt device %s context.", filename);
+
+   if (!(h = malloc(sizeof(struct crypt_device
+   return

Re: [dm-devel] [PATCH v2 1/2] dm: verity support data device offset (Linux 3.4.7)

2012-08-08 Thread Wesley Miaw

On Aug 8, 2012, at 1:31 PM, Milan Broz wrote:

> On 08/08/2012 08:46 PM, Mikulas Patocka wrote:
> 
>> The problem with the patch is that it changes interface to the userspace 
>> tool. The userspace tool veritysetup already exists in recent cryptsetup 
>> package, so we can't change the interface - you should change the patch so 
>> that the starting data block is the last argument and the argument is 
>> optional - so that it is compatible with the existing userspace too.
> 
> yes. Please never change interface without at least increasing target version.
> 
> I have to add userspace support as well to veritysetup and we need a way
> how to detect that option is supported by running kernel.

Apologies if the version increment is incorrect; I was not sure if the minor or 
patch number should be incremented. I assume the different version number is 
what would be used to detect if the data offset option is supported. Thanks.

From: Wesley Miaw 

Add data device start block index as optional dm-verity target parameters to
support verity targets where the data does not begin at sector 0 of the block
device.

Also fix the hash block index computations so they take into account any data
offset.

Signed-off-by: Wesley Miaw 
---
 Documentation/device-mapper/verity.txt |8 ++-
 drivers/md/dm-verity.c |   24 ++-
 2 files changed, 26 insertions(+), 6 deletions(-)
--- a/drivers/md/dm-verity.c2012-08-07 16:03:03.778759000 -0700
+++ b/drivers/md/dm-verity.c2012-08-08 17:04:16.344682266 -0700
@@ -477,7 +477,7 @@ static int verity_map(struct dm_target *
return -EIO;
}
 
-   if ((bio->bi_sector + bio_sectors(bio)) >>
+   if ((bio->bi_sector - v->data_start + bio_sectors(bio)) >>
(v->data_dev_block_bits - SECTOR_SHIFT) > v->data_blocks) {
DMERR_LIMIT("io out of range");
return -EIO;
@@ -491,7 +491,7 @@ static int verity_map(struct dm_target *
io->bio = bio;
io->orig_bi_end_io = bio->bi_end_io;
io->orig_bi_private = bio->bi_private;
-   io->block = bio->bi_sector >> (v->data_dev_block_bits - SECTOR_SHIFT);
+   io->block = (bio->bi_sector - v->data_start) >> (v->data_dev_block_bits 
- SECTOR_SHIFT);
io->n_blocks = bio->bi_size >> v->data_dev_block_bits;
 
bio->bi_end_io = verity_end_io;
@@ -646,6 +646,7 @@ static void verity_dtr(struct dm_target 
  * 
  * 
  *   Hex string or "-" if no salt.
+ *   Optional. The default is zero.
  */
 static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 {
@@ -671,8 +672,8 @@ static int verity_ctr(struct dm_target *
goto bad;
}
 
-   if (argc != 10) {
-   ti->error = "Invalid argument count: exactly 10 arguments 
required";
+   if (argc != 10 && argc != 11) {
+   ti->error = "Invalid argument count: 10 or 11 arguments 
required";
r = -EINVAL;
goto bad;
}
@@ -793,6 +794,19 @@ static int verity_ctr(struct dm_target *
}
}
 
+   if (argc == 11) {
+   if (sscanf(argv[10], "%llu%c", &num_ll, &dummy) != 1 ||
+   num_ll << (v->data_dev_block_bits - SECTOR_SHIFT) !=
+   (sector_t)num_ll << (v->data_dev_block_bits - 
SECTOR_SHIFT)) {
+   ti->error = "Invalid data start";
+   r = -EINVAL;
+   goto bad;
+   }
+   v->data_start = num_ll << (v->data_dev_block_bits - 
SECTOR_SHIFT);
+   } else {
+   v->data_start = 0;
+   }
+
v->hash_per_block_bits =
fls((1 << v->hash_dev_block_bits) / v->digest_size) - 1;
 
@@ -875,7 +889,7 @@ bad:
 
 static struct target_type verity_target = {
.name   = "verity",
-   .version= {1, 0, 0},
+   .version= {1, 1, 0},
.module = THIS_MODULE,
.ctr= verity_ctr,
.dtr= verity_dtr,
--- a/Documentation/device-mapper/verity.txt2012-08-08 11:02:48.558883756 
-0700
+++ b/Documentation/device-mapper/verity.txt2012-08-08 16:50:04.114864090 
-0700
@@ -11,6 +11,7 @@ Construction Parameters
  
  
   
+[]
 
 
 This is the type of the on-disk hash format.
@@ -62,6 +63,10 @@ Construction Parameters
 
 The hexadecimal encoding of the salt value.
 
+
+This is the offset, in -blocks, from the start of data_dev
+to the first block of the data.
+
 Theory of operation
 ===
 
@@ -138,7 +143,8 @@ Set up a device:
   # dmsetup create vroot --readonly --table \
 "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
 "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
-"1234"
+"1234000
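
The arithmetic changed by the kernel patch above can be sketched in
userspace as follows. This is only an illustration under assumptions: the
function names are invented, SECTOR_SHIFT is taken to be 9 as in the kernel,
and the explicit "sector before data start" check is an addition of this
sketch that does not appear in the patch.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Convert a start offset given in data blocks into 512-byte sectors. */
static uint64_t data_start_sectors(uint64_t start_block, unsigned block_bits)
{
	return start_block << (block_bits - SECTOR_SHIFT);
}

/*
 * Map a sector on the data device to the data block index used for
 * the hash lookup.  Returns -1 if the sector lies before the data
 * area (check added for this sketch) or if the request runs past the
 * last data block (the "io out of range" case in the patch).
 */
static int64_t verity_block(uint64_t sector, uint64_t nr_sectors,
			    uint64_t data_start, uint64_t data_blocks,
			    unsigned block_bits)
{
	unsigned shift = block_bits - SECTOR_SHIFT;

	if (sector < data_start)
		return -1;
	if (((sector - data_start + nr_sectors) >> shift) > data_blocks)
		return -1;
	return (int64_t)((sector - data_start) >> shift);
}
```

With 4096-byte blocks (block_bits = 12) each block spans 8 sectors, so a
start offset of 100 blocks corresponds to sector 800, and sector 816 maps to
data block 2.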

Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v6)

2012-08-08 Thread Namhyung Kim

Hi, Arun

On Wed, 8 Aug 2012 12:16:30 -0700, Arun Sharma wrote:
> On 3/30/12 10:43 PM, Arun Sharma wrote:
>> [ Meant to include v6 ChangeLog as well. Technical difficulties.. ]
>>
>> v6 ChangeLog:
>>
>> rebased to tip:perf/core and fixed a minor problem in computing
>> the total period in hists__remove_entry_filter(). Needed to
>> use period_self instead of period.
>
> This patch breaks perf top (symptom: percentages > 100%). Fixed by the
> following patch.
>
> Namhyung: if you're still working on forward porting this, please add
> this fix to your queue.
>
Will do, thanks.
Namhyung


>  -Arun
>
> commit 75a1c409a529c9741f8a2f493868d1fc7ce7e06d
> Author: Arun Sharma 
> Date:   Wed Aug 8 11:47:02 2012 -0700
>
>     perf: update period_self as well on collapsing
>
>     When running perf top, we have a series of incoming samples,
>     which get aggregated in various user specified ways.
>
>     Suppose function "foo" had the following samples:
>
>     101, 103, 99, 105, ...
>
>     ->period for the corresponding entry looks as follows:
>
>     101, 204, 303, 408, ...
>
>     However, due to this bug, ->period_self contains:
>
>     101, 103, 99, 105, ...
>
>     and therefore breaks the invariant period == period_self
>     in the default mode (no sort inclusive).
>
>     Since total_period is computed by summing up period_self,
>     period/total_period can be > 100%
>
>     Fix the bug by updating period_self as well.
>
>     Signed-off-by: Arun Sharma 
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index a2a8d91..adc891e 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -462,6 +462,7 @@ static bool hists__collapse_insert_entry(struct
> hists *hists,
>
>   if (!cmp) {
>   iter->period += he->period;
> + iter->period_self += he->period_self;
>   iter->nr_events += he->nr_events;
>   if (symbol_conf.use_callchain) {
>   
> callchain_cursor_reset(&hists->callchain_cursor);
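
The effect of the one-line fix can be seen with a toy model. This is a
simplified sketch, not perf's code: struct entry stands in for hist_entry,
there is only one aggregated entry, and without the fix period_self simply
stays at its first value, which may not match exactly what perf top shows.

```c
#include <stdint.h>

/* Minimal stand-in for perf's hist_entry; only the fields discussed here. */
struct entry {
	uint64_t period;	/* samples attributed to the entry */
	uint64_t period_self;	/* samples in the entry itself */
};

/* Merge sample `he` into the aggregated entry `iter`. */
static void collapse(struct entry *iter, const struct entry *he, int fix)
{
	iter->period += he->period;
	if (fix)
		iter->period_self += he->period_self;	/* the one-line fix */
}

/*
 * Aggregate n samples into one entry and return period as a percentage
 * of total_period.  With a single entry, total_period is just that
 * entry's period_self.
 */
static uint64_t percent_after(const uint64_t *samples, int n, int fix)
{
	struct entry iter = { samples[0], samples[0] };
	int i;

	for (i = 1; i < n; i++) {
		struct entry he = { samples[i], samples[i] };

		collapse(&iter, &he, fix);
	}
	return iter.period * 100 / iter.period_self;
}
```

For the samples 101, 103, 99, 105 the fixed variant reports 100, while the
unfixed one reports a value well above 100 because period has grown to 408
and period_self has not.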

Re: [PATCH v5 01/12] block: Generalized bio pool freeing

2012-08-08 Thread Kent Overstreet

On Wed, Aug 08, 2012 at 03:25:15PM -0700, Tejun Heo wrote:
> On Mon, Aug 06, 2012 at 03:08:30PM -0700, Kent Overstreet wrote:
> > @@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
> > if (atomic_dec_and_test(&bio->bi_cnt)) {
> > bio_disassociate_task(bio);
> > bio->bi_next = NULL;
> > -   bio->bi_destructor(bio);
> > +
> > +   if (bio->bi_pool)
> > +   bio_free(bio, bio->bi_pool);
> > +   else
> > +   bio->bi_destructor(bio);
> 
> So, this bi_pool overriding caller specified custom bi_destructor is
> rather unusual.  I know why it's like that - the patch series is
> gradually replacing bi_destructor with bi_pool and removes
> bi_destructor eventually, but it would be far better if at least patch
> description says why this is unusual like this.

Ok, I'll stick a comment in there:

	if (atomic_dec_and_test(&bio->bi_cnt)) {
		bio_disassociate_task(bio);
		bio->bi_next = NULL;

		/*
		 * This if statement is temporary - bi_pool is replacing
		 * bi_destructor, but bi_destructor will be taken out in another
		 * patch.
		 */
		if (bio->bi_pool)
			bio_free(bio, bio->bi_pool);
		else
			bio->bi_destructor(bio);
	}

> 
> Thanks.
> 
> -- 
> tejun

Re: [PATCH v5 05/12] block: Kill bi_destructor

2012-08-08 Thread Kent Overstreet

On Wed, Aug 08, 2012 at 03:22:23PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 06, 2012 at 03:08:34PM -0700, Kent Overstreet wrote:
> > Now that we've got generic code for freeing bios allocated from bio
> > pools, this isn't needed anymore.
> > 
> > This also changes the semantics of bio_free() a bit - it now also frees
> > bios allocated by bio_kmalloc(). It's also no longer exported, as
> > without bi_destructor there should be no need for it to be called
> > anywhere else.
> > 
> > v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
> > 
> > Signed-off-by: Kent Overstreet 
> > ---
> > diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
> > index 920ede2..19bf632 100644
> > --- a/drivers/block/drbd/drbd_main.c
> > +++ b/drivers/block/drbd/drbd_main.c
> > @@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops 
> > = {
> > .release = drbd_release,
> >  };
> >  
> > -static void bio_destructor_drbd(struct bio *bio)
> > -{
> > -   bio_free(bio, drbd_md_io_bio_set);
> > -}
> > -
> >  struct bio *bio_alloc_drbd(gfp_t gfp_mask)
> >  {
> > -   struct bio *bio;
> > -
> > if (!drbd_md_io_bio_set)
> > return bio_alloc(gfp_mask, 1);
> >  
> > -   bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> > -   if (!bio)
> > -   return NULL;
> > -   bio->bi_destructor = bio_destructor_drbd;
> > -   return bio;
> > +   return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> >  }
> 
> Does this chunk belong to this patch?

Hrm, that should've been in the first patch. Will move it.

> 
> > @@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] 
> > __read_mostly = {
> >   */
> >  struct bio_set *fs_bio_set;
> >  
> > +#define BIO_KMALLOC_POOL ((void *) ~0)
> 
> What's wrong with good ol' NULL?

If it's NULL, we can't distinguish bios where that field wasn't
set (i.e. bios that were statically allocated somewhere) from bios that
were allocated by bio_kmalloc().

It's just there to make debugging easier - if bi_cnt goes to 0 on a bio
where it shouldn't we'll catch it at the BUG_ON() in bio_free() instead
of kfreeing a bad pointer.
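
To illustrate the idea in isolation, here is a minimal userspace sketch of
the sentinel. Everything in it is a stand-in: struct pool plays the role of
struct bio_set, KMALLOC_POOL mirrors BIO_KMALLOC_POOL, and obj_origin() is a
hypothetical helper that only classifies an object instead of freeing it.

```c
#include <stddef.h>
#include <stdint.h>

struct pool {
	int id;			/* placeholder content */
};

/* All-ones pointer used as a sentinel; never dereferenced. */
#define KMALLOC_POOL ((struct pool *)~(uintptr_t)0)

enum origin { ORIGIN_UNSET, ORIGIN_KMALLOC, ORIGIN_POOL };

struct obj {
	struct pool *pool;	/* NULL, KMALLOC_POOL, or a real pool */
};

/* Tell the three cases apart; a plain NULL could only encode two. */
static enum origin obj_origin(const struct obj *o)
{
	if (!o->pool)
		return ORIGIN_UNSET;	/* e.g. statically allocated: a free here is a bug */
	if (o->pool == KMALLOC_POOL)
		return ORIGIN_KMALLOC;	/* would be released with kfree() */
	return ORIGIN_POOL;		/* would be returned to its pool */
}
```

A free routine built on this can assert on ORIGIN_UNSET instead of passing a
bad pointer to the allocator.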

> 
> Thanks.
> 
> -- 
> tejun

Re: [PATCH v5 05/12] block: Kill bi_destructor

2012-08-08 Thread Kent Overstreet

On Mon, Aug 06, 2012 at 11:19:21PM -0400, Mike Snitzer wrote:
> On Mon, Aug 06 2012 at  6:08pm -0400,
> Kent Overstreet  wrote:
> 
> > Now that we've got generic code for freeing bios allocated from bio
> > pools, this isn't needed anymore.
> > 
> > This also changes the semantics of bio_free() a bit - it now also frees
> > bios allocated by bio_kmalloc(). It's also no longer exported, as
> > without bi_destructor there should be no need for it to be called
> > anywhere else.
> 
> Seems you forgot to remove bio_free's EXPORT_SYMBOL

Whoops - thanks, fixed.

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-08 Thread Minchan Kim

Hi Mel,

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount of work the free page
> scanner does in compaction. However, it has a problem. Consider two processes
> simultaneously scanning free pages
> 
>   C
> Process A M S F
>   |---|
> Process B M   FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_free_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
> if (cc->order > 0 && (!cc->wrapped ||
>   zone->compact_cached_free_pfn >
>   cc->start_free_pfn))
> pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
>   of the zone. When a wrapped scanner isolates a page, it updates
>   compact_cached_free_pfn to point to the highest pageblock it
>   can isolate pages from.

Okay until here.

> 
> If a scanner has not wrapped when it has finished isolating pages it
>   checks if compact_cached_free_pfn is pointing to the end of the
>   zone. If so, the value is updated to point to the highest
>   pageblock that pages were isolated from. This value will not
>   be updated again until a free page scanner wraps and resets
>   compact_cached_free_pfn.

I tried to understand the intention of this part but unfortunately failed.
With this part, couldn't the problem you mentioned happen again?

C
 Process A  M S F
|---|
 Process B  M   FS
 
 C is zone->compact_cached_free_pfn
 S is cc->start_free_pfn
 M is cc->migrate_pfn
 F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn to the end of the zone accordingly.

Simultaneously, Process B finishes isolating in a block, peeks at
compact_cached_free_pfn, sees that it is at the end of the zone, and so
updates compact_cached_free_pfn to the highest pageblock that pages were
isolated from.

Process A then updates compact_cached_free_pfn to the highest pageblock that
was set by process B, because process A has wrapped. Process A ends up making
a big jump without scanning anything.

No?

> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman 
> Reviewed-by: Rik van Riel 
> ---
>  mm/compaction.c |   54 --
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> + unsigned long free_pfn;
> + free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> + free_pfn &= ~(pageblock_nr_pages-1);
> + return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isola
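
The rounding done by start_free_pfn() in the hunk above can be checked with
plain integers. This is a sketch under assumptions: PAGEBLOCK_NR_PAGES is
fixed at 512 for illustration (in the kernel pageblock_nr_pages depends on
the configuration), and last_pageblock_start() is an invented name.

```c
/* Illustrative pageblock size; must be a power of two for the mask to work. */
#define PAGEBLOCK_NR_PAGES 512UL

/*
 * Round the end pfn of a zone down to a pageblock boundary, using the
 * same arithmetic as the quoted start_free_pfn().
 */
static unsigned long last_pageblock_start(unsigned long zone_start_pfn,
					  unsigned long spanned_pages)
{
	/* One past the last pfn in the zone. */
	unsigned long pfn = zone_start_pfn + spanned_pages;

	/* Clear the low bits to land on a pageblock boundary. */
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}
```

Note that when the zone end is already pageblock aligned, this arithmetic
returns the end pfn itself rather than the start of the block before it.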

Re: [PATCH v5 04/12] pktcdvd: Switch to bio_kmalloc()

2012-08-08 Thread Kent Overstreet

On Wed, Aug 08, 2012 at 03:13:59PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 06, 2012 at 03:08:33PM -0700, Kent Overstreet wrote:
> > This is prep work for killing bi_destructor - previously, pktcdvd had
> > its own pkt_bio_alloc which was basically duplicating bio_kmalloc(),
> > necessitating its own bi_destructor implementation.
> > 
> > v5: Un-reorder some functions, to make the patch easier to review
> > 
> > Signed-off-by: Kent Overstreet 
> 
> Please Cc: the maintainers.  Cc'ing Peter Osterlund and keeping the
> whole body for him.

Whoops, thanks.

> Generally looks good to me.  How is this tested?

Untested - no hardware for it.

> 
> Thanks.
> 
> > ---
> >  drivers/block/pktcdvd.c |   67 
> > +++---
> >  1 files changed, 16 insertions(+), 51 deletions(-)
> > 
> > diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> > index ba66e44..ae55f08 100644
> > --- a/drivers/block/pktcdvd.c
> > +++ b/drivers/block/pktcdvd.c
> > @@ -101,6 +101,8 @@ static struct dentry*pkt_debugfs_root = NULL; /* 
> > /sys/kernel/debug/pktcdvd */
> >  static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
> >  static int pkt_remove_dev(dev_t pkt_dev);
> >  static int pkt_seq_show(struct seq_file *m, void *p);
> > +static void pkt_end_io_read(struct bio *bio, int err);
> > +static void pkt_end_io_packet_write(struct bio *bio, int err);
> >  
> >  
> >  
> > @@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
> > }
> >  }
> >  
> > -static void pkt_bio_destructor(struct bio *bio)
> > -{
> > -   kfree(bio->bi_io_vec);
> > -   kfree(bio);
> > -}
> > -
> > -static struct bio *pkt_bio_alloc(int nr_iovecs)
> > -{
> > -   struct bio_vec *bvl = NULL;
> > -   struct bio *bio;
> > -
> > -   bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
> > -   if (!bio)
> > -   goto no_bio;
> > -   bio_init(bio);
> > -
> > -   bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
> > -   if (!bvl)
> > -   goto no_bvl;
> > -
> > -   bio->bi_max_vecs = nr_iovecs;
> > -   bio->bi_io_vec = bvl;
> > -   bio->bi_destructor = pkt_bio_destructor;
> > -
> > -   return bio;
> > -
> > - no_bvl:
> > -   kfree(bio);
> > - no_bio:
> > -   return NULL;
> > -}
> > -
> >  /*
> >   * Allocate a packet_data struct
> >   */
> > @@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int 
> > frames)
> > goto no_pkt;
> >  
> > pkt->frames = frames;
> > -   pkt->w_bio = pkt_bio_alloc(frames);
> > +   pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
> > if (!pkt->w_bio)
> > goto no_bio;
> >  
> > +   pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> > +   pkt->w_bio->bi_private = pkt;
> > +
> > for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
> > pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
> > if (!pkt->pages[i])
> > @@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int 
> > frames)
> > bio_list_init(&pkt->orig_bios);
> >  
> > for (i = 0; i < frames; i++) {
> > -   struct bio *bio = pkt_bio_alloc(1);
> > +   struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
> > if (!bio)
> > goto no_rd_bio;
> > +
> > +   bio->bi_end_io = pkt_end_io_read;
> > +   bio->bi_private = pkt;
> > pkt->r_bios[i] = bio;
> > }
> >  
> > @@ -,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device 
> > *pd, struct packet_data *pkt)
> >  * Schedule reads for missing parts of the packet.
> >  */
> > for (f = 0; f < pkt->frames; f++) {
> > -   struct bio_vec *vec;
> > -
> > int p, offset;
> > +
> > if (written[f])
> > continue;
> > +
> > bio = pkt->r_bios[f];
> > -   vec = bio->bi_io_vec;
> > -   bio_init(bio);
> > -   bio->bi_max_vecs = 1;
> > -   bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> > -   bio->bi_bdev = pd->bdev;
> > -   bio->bi_end_io = pkt_end_io_read;
> > -   bio->bi_private = pkt;
> > -   bio->bi_io_vec = vec;
> > -   bio->bi_destructor = pkt_bio_destructor;
> > +   bio_reset(bio);
> > +   bio->bi_sector  = pkt->sector + f * (CD_FRAMESIZE >> 9);
> > +   bio->bi_bdev= pd->bdev;
> >  
> > p = (f * CD_FRAMESIZE) / PAGE_SIZE;
> > offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
> > @@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device 
> > *pd, struct packet_data *pkt)
> > }
> >  
> > /* Start the write request */
> > -   bio_init(pkt->w_bio);
> > -   pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
> > +   bio_reset(pkt->w_bio);
> > pkt->w_bio->bi_sector = pkt->sector;
> > pkt->w_bio->bi_bdev = pd->bdev;
> > -   pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> > -   pkt->w_bio->bi_private = pkt;
> > -   pkt->w_bio->bi_io_vec = bvec;
> > -   pkt->w_bio->bi_d

Re: [PATCH v5 03/12] block: Add bio_reset()

2012-08-08 Thread Kent Overstreet

On Wed, Aug 08, 2012 at 03:11:29PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 06, 2012 at 03:08:32PM -0700, Kent Overstreet wrote:
> > Reusing bios is something that's been highly frowned upon in the past,
> > but driver code keeps doing it anyways. If it's going to happen anyways,
> > we should provide a generic method.
> > 
> > This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> > was open coding it, by doing a bio_init() and resetting bi_destructor.
> > 
> > v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
> > bio->bi_flags are saved.
> > 
> > Signed-off-by: Kent Overstreet 
> > Change-Id: I4eb2975bd678d3be811d5423d0620b08020be9ff
> 
> Please drop Change-Id.  Die gerrit die.

Bah, missed that one. 

> > +void bio_reset(struct bio *bio)
> > +{
> > +   unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
> 
> How many flags are we talking about?  If there aren't too many, I'd
> prefer explicit BIO_FLAGS_PRESERVED or whatever.

It mostly isn't actual flags that are preserved - the high bits of the
flags are used for indicating what slab the bvec was allocated from, and
that's the main thing that has to be preserved.

So that's why I went with defining the things that are reset instead of
the things that are preserved.

I would prefer if bitfields were used for at least BIO_POOL_IDX, but the
problem is flags is used as an atomic bit vector for BIO_UPTODATE.

But flags isn't treated as an atomic bit vector elsewhere -
bio_flagged() doesn't use test_bit(), and flags are set/cleared with
atomic bit operations in some places but not in others (probably _most_
of them are technically safe, but... ick).
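
To make the masking concrete, here is a minimal user-space sketch of the idea. The bit positions, the flag names, and the pool index layout (everything prefixed DEMO_/demo_) are made up for illustration and are not the values from the patch.

```c
#include <limits.h>

/* Hypothetical layout: the low DEMO_RESET_BITS bits are ordinary flags that
 * a reset clears; everything above them (here, a pool index kept in the top
 * four bits) has to survive the reset. */
#define DEMO_RESET_BITS    12
#define DEMO_FLAG_UPTODATE (1UL << 0)
#define DEMO_FLAG_CLONED   (1UL << 4)
#define DEMO_POOL_SHIFT    (sizeof(unsigned long) * CHAR_BIT - 4)

struct demo_bio {
	unsigned long flags;
	unsigned long sector;
};

/* Reset the bio but keep the bits at or above DEMO_RESET_BITS. */
static void demo_bio_reset(struct demo_bio *bio)
{
	unsigned long preserved = bio->flags & (~0UL << DEMO_RESET_BITS);

	bio->sector = 0;
	bio->flags = preserved;
}

/* Read back the pool index stored in the high bits. */
static unsigned long demo_pool_idx(const struct demo_bio *bio)
{
	return bio->flags >> DEMO_POOL_SHIFT;
}
```

Defining what gets reset rather than what gets preserved means a new low flag is cleared by default, while the pool index never has to be listed explicitly.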

> 
> Thanks.
> 
> -- 
> tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

linux-next: manual merge of the osd tree with the nfs tree

2012-08-08 Thread Stephen Rothwell

Hi Boaz,

Today's linux-next merge of the osd tree got a conflict in
fs/nfs/nfs4proc.c between commit 47fbf7976e0b ("NFSv4.1: Remove a bogus
BUG_ON() in nfs4_layoutreturn_done") from the nfs tree and commit
d8e8b68405db ("pnfs: Don't BUG on info received from Server") from the
osd tree.

I just used the nfs tree version.
-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au



Re: [PATCH] x86/pci: Allow x86 platforms to use translation offsets

2012-08-08 Thread Yinghai Lu

On Wed, Aug 8, 2012 at 4:42 PM,   wrote:
> From: Charlie Mear 
>
> The memory range descriptors in the _CRS control method contain an
> address translation offset for host bridges.  This value is used to
> translate addresses across the bridge.  The support to use _TRA values
> is present for other architectures but not for X86 platforms.
>
> For existing X86 platforms the _TRA value is zero.  Non zero _TRA values
> are expected on future X86 platforms and this change will register that
> value with the resource.
>
> Signed-off-by: Charlie Mear 
> ---
>  arch/x86/pci/acpi.c |   18 ++++++++++++++++--
>  1 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index 505acdd..37acbae 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -12,6 +12,7 @@ struct pci_root_info {
> char name[16];
> unsigned int res_num;
> struct resource *res;
> +   u64 *res_offset;

resource_size_t * ?

> struct pci_sysdata sd;
>  #ifdef CONFIG_PCI_MMCONFIG
> bool mcfg_added;
> @@ -306,6 +307,7 @@ setup_resource(struct acpi_resource *acpi_res, void *data)
> res->start = start;
> res->end = end;
> res->child = NULL;
> +   info->res_offset[info->res_num] = addr.translation_offset;
>
> if (!pci_use_crs) {
> dev_printk(KERN_DEBUG, &info->bridge->dev,
> @@ -375,7 +377,8 @@ static void add_resources(struct pci_root_info *info,
>  "ignoring host bridge window %pR (conflicts 
> with %s %pR)\n",
>  res, conflict->name, conflict);
> else
> -   pci_add_resource(resources, res);
> +   pci_add_resource_offset(resources, res,
> +   info->res_offset[i]);
> }
>  }
>
> @@ -383,6 +386,8 @@ static void free_pci_root_info_res(struct pci_root_info 
> *info)
>  {
> kfree(info->res);
> info->res = NULL;
> +   kfree(info->res_offset);
> +   info->res_offset = NULL;
> info->res_num = 0;
>  }
>
> @@ -433,11 +438,20 @@ probe_pci_root_info(struct pci_root_info *info, struct 
> acpi_device *device,
> return;
>
> size = sizeof(*info->res) * info->res_num;
> -   info->res_num = 0;
> info->res = kmalloc(size, GFP_KERNEL);
> if (!info->res)
you need to set info->res_num = 0 here
> return;
>
> +   size = sizeof(*info->res_offset) * info->res_num;
> +   info->res_offset = kmalloc(size, GFP_KERNEL);
> +   if (!info->res_offset) {
> +   kfree(info->res);
> +   info->res = NULL;
> +   return;
> +   }
> +   info->res_num = 0;

need to move it before: if (!info->res_offset) {
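
One way to satisfy both comments is to capture the count in a local and reset res_num before either allocation can fail. The following is only a user-space sketch under that assumption: the struct and function names are hypothetical stand-ins for pci_root_info and probe_pci_root_info(), and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <stdlib.h>

struct demo_root_info {
	unsigned int res_num;
	int *res;                        /* stand-in for struct resource * */
	unsigned long long *res_offset;  /* one offset per resource */
};

/* Allocate both arrays sized by the counted res_num, then leave res_num at 0
 * so the fill pass can use it as an index. On any failure res_num is 0 and
 * no array stays allocated. Returns 0 on success, -1 on failure. */
static int demo_alloc_arrays(struct demo_root_info *info)
{
	unsigned int count = info->res_num;

	info->res_num = 0;

	info->res = malloc(sizeof(*info->res) * count);
	if (!info->res)
		return -1;

	info->res_offset = malloc(sizeof(*info->res_offset) * count);
	if (!info->res_offset) {
		free(info->res);
		info->res = NULL;
		return -1;
	}
	return 0;
}
```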

Thanks

Yinghai

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Casey Schaufler

On 8/8/2012 2:54 PM, Eric Dumazet wrote:

By the way, once this proved to be an issue that involved
more than just SELinux it needed to go onto the LSM list as
well.

> On Wed, 2012-08-08 at 16:46 -0400, Paul Moore wrote:
>> On Wednesday, August 08, 2012 10:32:52 PM Eric Dumazet wrote:
>>> On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote:
>>>> On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote:
>>>>> Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
>>>>> code.
>>>> Sure but it seems include file misses an accessor for this.
>>>>
>>>> We could add it on a future cleanup patch, as Paul mentioned.
>>> I cooked following patch.
>>> But smack/smack_lsm.c makes a reference to
>>> smk_of_current()... so it seems we are in a hole...
>>>
>>> It makes little sense to me to have any kind of security on this
>>> internal sockets.
>>>
>>> Maybe selinux should not crash if sk->sk_security is NULL ?
>> I realize our last emails probably passed each other mid-flight, but 
>> hopefully 
>> it explains why we can't just pass packets when sk->sk_security is NULL.
>>
>> Regardless, some quick comments below ...
>>
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 6c77f63..459eca6 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -4289,10 +4289,13 @@ out:
>>> return 0;
>>>  }
>>>
>>> -static int selinux_sk_alloc_security(struct sock *sk, int family, ...
>>> +static int selinux_sk_alloc_security(struct sock *sk, int family, ...
>>>  {
>>> struct sk_security_struct *sksec;
>>>
>>> +   if (check && sk->sk_security)
>>> +   return 0;
>>> +
>>> sksec = kzalloc(sizeof(*sksec), priority);
>>> if (!sksec)
>>> return -ENOMEM;
>> I think I might replace the "check" boolean with a "kern/kernel" boolean so 
>> that in addition to the allocation we can also initialize the socket to 
>> SECINITSID_KERNEL/kernel_t here in the case when the boolean is set.  The 
>> only 
>> place that would set the boolean to true would be ip_send_unicast_reply(), 
>> all 
>> other callers would set it to false.
>>
>>> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
>>> index 8221514..8965cf1 100644
>>> --- a/security/smack/smack_lsm.c
>>> +++ b/security/smack/smack_lsm.c
>>> @@ -1754,11 +1754,14 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode)
>>>   *
>>>   * Returns 0 on success, -ENOMEM is there's no memory
>>>   */
>>> -static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags)
>>> +static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags, bool check)
>>>  {
>>> char *csp = smk_of_current();
>>> struct socket_smack *ssp;
>>>
>>> +   if (check && sk->sk_security)
>>> +   return 0;
>>> +
>>> ssp = kzalloc(sizeof(struct socket_smack), gfp_flags);
>>> if (ssp == NULL)
>>> return -ENOMEM;
>> In the case of Smack, when the kernel boolean is true I think the right 
>> solution is to use smack_net_ambient.

I confess that my understanding of unicast is limited.
If the intention is to send an unlabeled packet then
indeed smack_net_ambient is the way to go.
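
A rough user-space sketch of the "kernel" boolean idea discussed above may help. All types and label values here are made up (demo_ prefix) and only stand in for the real LSM structures; it is not the code from either patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical labels: one for ordinary sockets, one for kernel-internal
 * sockets such as the one used for unicast replies. */
enum demo_sid { DEMO_SID_UNLABELED = 0, DEMO_SID_KERNEL = 1 };

struct demo_sksec {
	enum demo_sid sid;
};

struct demo_sock {
	struct demo_sksec *security;
};

/* Allocate the security blob and pick the initial label from the boolean,
 * so a kernel-internal socket never ends up with a NULL blob. */
static int demo_sk_alloc_security(struct demo_sock *sk, bool kernel)
{
	struct demo_sksec *sksec = calloc(1, sizeof(*sksec));

	if (!sksec)
		return -ENOMEM;

	sksec->sid = kernel ? DEMO_SID_KERNEL : DEMO_SID_UNLABELED;
	sk->security = sksec;
	return 0;
}
```

In this sketch only the caller creating the kernel-internal socket would pass true; every other caller passes false and gets the ordinary default label.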

>>
> cool, here the last version :
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 4e5a73c..4d8e454 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1601,7 +1601,7 @@ struct security_operations {
>   int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb);
>   int (*socket_getpeersec_stream) (struct socket *sock, char __user 
> *optval, int __user *optlen, unsigned len);
>   int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff 
> *skb, u32 *secid);
> - int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority);
> + int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority, 
> bool kernel);

Is there no information already available in the sock
that will tell us this is a unicast operation?

>   void (*sk_free_security) (struct sock *sk);
>   void (*sk_clone_security) (const struct sock *sk, struct sock *newsk);
>   void (*sk_getsecid) (struct sock *sk, u32 *secid);
> @@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct 
> sk_buff *skb);
>  int security_socket_getpeersec_stream(struct socket *sock, char __user 
> *optval,
> int __user *optlen, unsigned len);
>  int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff 
> *skb, u32 *secid);
> -int security_sk_alloc(struct sock *sk, int family, gfp_t priority);
> +int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool 
> kernel);
>  void security_sk_free(struct sock *sk);
>  void security_sk_clone(const struct sock *sk, struct sock *newsk);
>  void security_sk_classify_flow(struct sock *sk, struct flowi *fl);
> @@ -2667,7 +2667,7 @@ static inline int 
> security_socket_getpeersec_dgram(struct s

Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-08-08 Thread Kent Overstreet

On Wed, Aug 08, 2012 at 03:06:12PM -0700, Tejun Heo wrote:
> Hello,
> 
> On Mon, Aug 06, 2012 at 03:08:31PM -0700, Kent Overstreet wrote:
> > Previously, dm_rq_clone_bio_info needed to be freed by the bio's
> > destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
> > This gets rid of a memory allocation and means we can kill
> > dm_rq_bio_destructor.
> > 
> > Signed-off-by: Kent Overstreet 
> > ---
> >  drivers/md/dm.c |   31 +++++--------------------------
> >  1 files changed, 5 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 40b7735..4014696 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -92,6 +92,7 @@ struct dm_rq_target_io {
> >  struct dm_rq_clone_bio_info {
> > struct bio *orig;
> > struct dm_rq_target_io *tio;
> > +   struct bio clone;
> >  };
> ...
> > @@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned 
> > type, unsigned integrity)
> > if (!pools->tio_pool)
> > goto free_io_pool_and_out;
> >  
> > -   pools->bs = bioset_create(pool_size, 0);
> > +   pools->bs = bioset_create(pool_size,
> > + offsetof(struct dm_rq_clone_bio_info, orig));
> > if (!pools->bs)
> > goto free_tio_pool_and_out;
> 
> I do like this approach much better but this isn't something
> super-obvious.  Can we please explain what's going on?  Especially,
> the comment above dm_rq_clone_bio_info is outright misleading now.

This look better to you?

/*
 * For request-based dm - the bio clones we allocate are embedded in these
 * structs.
 *
 * We allocate these with bio_alloc_bioset, using the front_pad parameter when
 * the bioset is created - this means the bio has to come at the end of the
 * struct.
 */
struct dm_rq_clone_bio_info {
struct bio *orig;
struct dm_rq_target_io *tio;
struct bio clone;
};
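
To illustrate why the bio has to be the last member, here is a small user-space sketch of a front-padded allocation. It assumes the pad is the offset of the embedded member, which is what the layout rule in the comment implies; the demo_ names are hypothetical and calloc()/free() stand in for the bioset.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_bio {
	unsigned long sector;
};

struct demo_clone_info {
	void *orig;
	void *tio;
	struct demo_bio clone;   /* embedded bio, must be the last member */
};

/* Bytes that have to precede the bio in every allocation. */
#define DEMO_FRONT_PAD offsetof(struct demo_clone_info, clone)

/* Allocate pad + bio in one chunk and hand back only the bio, roughly the
 * way a pool created with a front pad would. */
static struct demo_bio *demo_bio_alloc(void)
{
	char *chunk = calloc(1, DEMO_FRONT_PAD + sizeof(struct demo_bio));

	return chunk ? (struct demo_bio *)(chunk + DEMO_FRONT_PAD) : NULL;
}

/* Step back from the embedded bio to the enclosing struct. */
static struct demo_clone_info *demo_info_from_bio(struct demo_bio *bio)
{
	return (struct demo_clone_info *)((char *)bio - DEMO_FRONT_PAD);
}

/* Free the whole chunk, starting at the front pad. */
static void demo_bio_free(struct demo_bio *bio)
{
	free(demo_info_from_bio(bio));
}
```

Since the bio and its per-clone bookkeeping live in one allocation, no separate allocation (and no destructor to free it) is needed.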

> Can someone more familiar review this one?  Alasdair, Mike?
> 
> Also, how was this tested?

Well, AFAICT the only request based dm target is multipath, and from the
documentation I've seen it doesn't appear to work without multipath
hardware, or at least I haven't seen it documented how. So, unless
there's another user I missed it's not been tested.

> 
> Thanks.
> 
> -- 
> tejun

Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

2012-08-08 Thread Minchan Kim

On Wed, Aug 08, 2012 at 09:51:12AM +0100, Mel Gorman wrote:
> On Wed, Aug 08, 2012 at 05:27:38PM +0900, Minchan Kim wrote:
> > On Wed, Aug 08, 2012 at 08:55:26AM +0100, Mel Gorman wrote:
> > > On Wed, Aug 08, 2012 at 10:48:24AM +0900, Minchan Kim wrote:
> > > > Hi Mel,
> > > > 
> > > > Just out of curiosity.
> > > > What problem did you see? (i.e., what problem does this patch
> > > > solve?)
> > > 
> > > Everything in this series is related to the problem in the leader - high
> > > order allocation success rates are lower. This patch increases the success
> > > rates when allocating under load.
> > > 
> > > > AFAIUC, it seems to solve consecutive allocation success ratio through
> > > > getting several free pageblocks all at once in a process/kswapd
> > > > reclaim context. Right?
> > > 
> > > Only pageblocks if it is order-9 on x86, it reclaims an amount that 
> > > depends
> > > on an allocation size. This only happens during reclaim/compaction context
> > > when we know that a high-order allocation has recently failed. The 
> > > objective
> > > is to reclaim enough order-0 pages so that compaction can succeed again.
> > 
> > Your patch increases the number of pages to be reclaimed with considering
> > the number of fail case during deferring period and your test proved it's
> > really good. Without your patch, why can't VM reclaim enough pages?
> 
> It could reclaim enough pages but it doesn't. nr_to_reclaim is
> SWAP_CLUSTER_MAX and that gets short-cutted in direct reclaim at least
> by 
> 
> if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> goto out;
> 
> I could set nr_to_reclaim in try_to_free_pages() of course and drive
> it from there but that's just different, not better. If driven from
> do_try_to_free_pages(), it is also possible that priorities will rise.
> When they reach DEF_PRIORITY-2, it will also start stalling and setting
> pages for immediate reclaim which is more disruptive than is desirable
> in this case. That is a more wide-reaching change than I would expect for
> this problem and could cause another regression related to THP requests
> causing interactive jitter.

Agreed.
I hope this gets added to the changelog.

> 
> > Other processes steal the pages reclaimed?
> 
> Or the pages it reclaimed were in pageblocks that could not be used.
> 
> > Why I ask a question is that I want to know what's the problem at current
> > VM.
> > 
> 
> We cannot reliably tell in advance whether compaction is going to succeed
> in the future without doing a full scan of the zone which would be both
> very heavy and race with any allocation requests. Compaction needs free
> pages to succeed so the intention is to scale the number of pages reclaimed
> with the number of recent compaction failures.

> If allocation fails after compaction then compaction may be deferred for
> a number of allocation attempts. If there are subsequent failures,
> compact_defer_shift is increased to defer for longer periods. This patch
> uses that information to scale the number of pages reclaimed with
> compact_defer_shift until allocations succeed again.
> 
> Signed-off-by: Mel Gorman 
> ---
>  mm/vmscan.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 66e4310..0cb2593 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1708,6 +1708,7 @@ static inline bool should_continue_reclaim(struct 
> lruvec *lruvec,
>  {
>   unsigned long pages_for_compaction;
>   unsigned long inactive_lru_pages;
> + struct zone *zone;
>  
>   /* If not in reclaim/compaction mode, stop */
>   if (!in_reclaim_compaction(sc))
> @@ -1741,6 +1742,15 @@ static inline bool should_continue_reclaim(struct 
> lruvec *lruvec,
>* inactive lists are large enough, continue reclaiming
>*/
>   pages_for_compaction = (2UL << sc->order);
> +
> + /*
> +  * If compaction is deferred for this order then scale the number of

this order? sc->order?

> +  * pages reclaimed based on the number of consecutive allocation
> +  * failures
> +  */
> + zone = lruvec_zone(lruvec);
> + if (zone->compact_order_failed >= sc->order)

I can't understand this part.
We don't defer orders lower than compact_order_failed, as of aff62249.
Do you mean a lower-order compaction context should be the sacrificial lamb
so that a deferred higher-order allocation request can succeed? I don't think
that's fair, and I still can't understand the rationale for scaling the number
of pages reclaimed with the number of recent compaction failures.
Your changelog just says "What we have to do", NOT "Why we have to do it".
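
For anyone following the arithmetic, the hunk under discussion boils down to the sketch below. The function and parameter names are hypothetical; only the two expressions mirror the quoted patch lines.

```c
/* Target number of pages to reclaim before compaction is retried.
 * order:        order of the allocation being serviced (sc->order)
 * order_failed: order recorded for the zone's last compaction failure
 * defer_shift:  how many times compaction deferral has been extended */
static unsigned long demo_pages_for_compaction(unsigned int order,
					       unsigned int order_failed,
					       unsigned int defer_shift)
{
	/* Baseline: twice the allocation size in pages. */
	unsigned long pages = 2UL << order;

	/* Scale only when the recorded failed order is at least this order. */
	if (order_failed >= order)
		pages <<= defer_shift;

	return pages;
}
```

With order 9 the baseline is 1024 pages, and a defer shift of 3 raises the target to 8192 pages. With a failed order of 4 and a request of order 9, the target stays at 1024.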


> + pages_for_compaction <<= zone->compact_defer_shift;


>   inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
>   if (nr_swap_pages > 0)
>   inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> -- 
> 1.7.9.2
> 


> 
> -- 
> Mel Gorman
> SUSE Labs
> 

Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

2012-08-08 Thread Muthu Kumar

Tejun,

This is changing the semantics of the clone. Sorry, I missed this
thread and replied separately. Anyway, replying again here:


On Wed, Aug 8, 2012 at 4:28 PM, Tejun Heo  wrote:
> On Mon, Aug 06, 2012 at 07:16:33PM -0400, Mikulas Patocka wrote:
>> Hi Kent
>>
>> When you change the semantics of an exported function, rename that
>> function. There may be external modules that use __bio_clone and this
>> change could silently introduce bugs in them.
>>
>> Otherwise, the patchset looks fine.
>
> I don't know.  This doesn't change the main functionality and should
> be transparent unless the caller is doing something crazy.  It *might*
> be nice to rename but I don't think that's a must here.
>
> Thanks.

--
You are changing the meaning of __bio_clone() here. In the old code, the
number of io_vecs, bi_idx and bi_vcnt are preserved. But in this modified
code, you are mapping bio_src's bi_io_vec[bi_idx] to bio_dest's
bi_io_vec[0] and also restricting the number of allocated io_vecs of
the clone. There may be cases where we would like an identical copy of the
original bio (maybe not in the current code base), and this implementation
is definitely not what one would expect from the name "clone".

Maybe call this new implementation something else (and use it for bcache)?
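
The difference between the two behaviours can be sketched with a toy structure. This is only an illustration of the description above; the demo_ types are hypothetical and far simpler than struct bio.

```c
#include <string.h>

#define DEMO_MAX_VECS 8

struct demo_bio {
	unsigned int idx;    /* first vec still to be processed */
	unsigned int vcnt;   /* number of valid vecs */
	int vecs[DEMO_MAX_VECS];
};

/* Old behaviour as described above: copy the vecs, keep idx and vcnt. */
static void demo_clone_full(struct demo_bio *dst, const struct demo_bio *src)
{
	memcpy(dst->vecs, src->vecs, sizeof(src->vecs[0]) * src->vcnt);
	dst->idx = src->idx;
	dst->vcnt = src->vcnt;
}

/* New behaviour as described above: copy only vecs idx..vcnt-1, so that
 * src->vecs[src->idx] lands in dst->vecs[0]. */
static void demo_clone_in_use(struct demo_bio *dst, const struct demo_bio *src)
{
	unsigned int n = src->vcnt - src->idx;

	memcpy(dst->vecs, src->vecs + src->idx, sizeof(src->vecs[0]) * n);
	dst->idx = 0;
	dst->vcnt = n;
}
```

A caller that indexes the clone with the source's bi_idx gets the same vec from the first variant but a different one, or none, from the second. That is the silent breakage being warned about.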

---

Like Mikulas pointed out, this is an exported function and silently
changing the semantics will break external modules.

Regards,
Muthu


>
> --
> tejun

Re: [PATCH v5 12/12] block: Only clone bio vecs that are in use

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
>   bio->bi_sector = bio_src->bi_sector;
>   bio->bi_bdev = bio_src->bi_bdev;
>   bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);

This isn't obvious at all.  Why no explanation anywhere?  Also it
would be nice to update comments of the updated functions so that it's
clear that only partial cloning happens.

Thanks.

-- 
tejun

Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

2012-08-08 Thread Tejun Heo

On Mon, Aug 06, 2012 at 07:16:33PM -0400, Mikulas Patocka wrote:
> Hi Kent
> 
> When you change the semantics of an exported function, rename that 
> function. There may be external modules that use __bio_clone and this 
> change could silently introduce bugs in them.
> 
> Otherwise, the patchset looks fine.

I don't know.  This doesn't change the main functionality and should
be transparent unless the caller is doing something crazy.  It *might*
be nice to rename but I don't think that's a must here.

Thanks.

-- 
tejun

Re: [PATCH v5 11/12] block: Add bio_clone_bioset()

2012-08-08 Thread Tejun Heo

On Mon, Aug 06, 2012 at 03:08:40PM -0700, Kent Overstreet wrote:
> This consolidates some code, and will help in a later patch changing how
> bio cloning works.

I think it would be better to introduce bio_clone*() functions in a
separate patch and convert the users in a different one.

>  /**
> - *   bio_clone   -   clone a bio
> + *   bio_clone_bioset -  clone a bio
>   *   @bio: bio to clone
>   *   @gfp_mask: allocation priority
> + *   @bs: bio_set to allocate from
>   *
>   *   Like __bio_clone, only also allocates the returned bio
>   */
> -struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> +struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
> +  struct bio_set *bs)
>  {
> - struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
> + struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
>  
>   if (!b)
>   return NULL;
> @@ -485,7 +487,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>   if (bio_integrity(bio)) {
>   int ret;
>  
> - ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
> + ret = bio_integrity_clone(b, bio, gfp_mask, bs);
>  
>   if (ret < 0) {
>   bio_put(b);
> @@ -495,6 +497,12 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>  
>   return b;
>  }
> +EXPORT_SYMBOL(bio_clone_bioset);
> +
> +struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> +{
> + return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
> +}

So, bio_clone() loses its function comment.  Also, does it even make
sense to call bio_clone() from fs_bio_set?  Let's say it's so, then
what's the difference from using _kmalloc variant?

Thanks.

-- 
tejun

Re: [PATCH] PM QoS: Add a metric : Bus Throughput.

2012-08-08 Thread Kyungmin Park

+ Myungjoo Ham,

It is used in devfreq. Mr. Ham, can you explain it in detail?

Thank you,
Kyungmin Park
On 8/9/12, Rafael J. Wysocki  wrote:
> On Wednesday, August 08, 2012, Jonghwa Lee wrote:
>> Bus throughput metric is added to PM QoS in order to control the
>> frequency of memory interfaces and busses with PM QoS.
>>
>> Signed-off-by: Jonghwa Lee 
>> Signed-off-by: Kyungmin Park 
>
> I said some time ago I didn't want any new global PM QoS classes to be
> added this way.
>
> Can you please post a driver patch using this new thing?
>
> Rafael
>
>
>> ---
>>  include/linux/pm_qos.h |2 ++
>>  kernel/power/qos.c |   15 ++-
>>  2 files changed, 16 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
>> index 233149c..6db4939 100644
>> --- a/include/linux/pm_qos.h
>> +++ b/include/linux/pm_qos.h
>> @@ -15,6 +15,7 @@ enum {
>>  PM_QOS_CPU_DMA_LATENCY,
>>  PM_QOS_NETWORK_LATENCY,
>>  PM_QOS_NETWORK_THROUGHPUT,
>> +PM_QOS_BUS_DMA_THROUGHPUT,
>>
>>  /* insert new class ID */
>>  PM_QOS_NUM_CLASSES,
>> @@ -26,6 +27,7 @@ enum {
>>  #define PM_QOS_NETWORK_LAT_DEFAULT_VALUE (2000 * USEC_PER_SEC)
>>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE 0
>>  #define PM_QOS_DEV_LAT_DEFAULT_VALUE 0
>> +#define PM_QOS_BUS_DMA_THROUGHPUT_DEFAULT_VALUE 0
>>
>>  struct pm_qos_request {
>>  struct plist_node node;
>> diff --git a/kernel/power/qos.c b/kernel/power/qos.c
>> index 6a031e6..75322cc 100644
>> --- a/kernel/power/qos.c
>> +++ b/kernel/power/qos.c
>> @@ -100,12 +100,25 @@ static struct pm_qos_object
>> network_throughput_pm_qos = {
>>  .name = "network_throughput",
>>  };
>>
>> +static BLOCKING_NOTIFIER_HEAD(bus_dma_throughput_notifier);
>> +static struct pm_qos_constraints bus_dma_tput_constraints = {
>> +.list = PLIST_HEAD_INIT(bus_dma_tput_constraints.list),
>> +.target_value = PM_QOS_BUS_DMA_THROUGHPUT_DEFAULT_VALUE,
>> +.default_value = PM_QOS_BUS_DMA_THROUGHPUT_DEFAULT_VALUE,
>> +.type = PM_QOS_MAX,
>> +.notifiers = &bus_dma_throughput_notifier,
>> +};
>> +static struct pm_qos_object bus_dma_throughput_pm_qos = {
>> +.constraints = &bus_dma_tput_constraints,
>> +.name = "bus_dma_throughput",
>> +};
>>
>>  static struct pm_qos_object *pm_qos_array[] = {
>>  &null_pm_qos,
>>  &cpu_dma_pm_qos,
>>  &network_lat_pm_qos,
>> -&network_throughput_pm_qos
>> +&network_throughput_pm_qos,
>> +&bus_dma_throughput_pm_qos,
>>  };
>>
>>  static ssize_t pm_qos_power_write(struct file *filp, const char __user
>> *buf,
>>
>

Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

2012-08-08 Thread Tejun Heo

On Mon, Aug 06, 2012 at 03:08:39PM -0700, Kent Overstreet wrote:

How about the following?

There was no API to kmalloc bio and clone and osdblk was using
explicit bio_kmalloc() + __bio_clone().  (my guess here) As this is
inconvenient and there will be more users of it in the future, add
bio_clone_kmalloc() and use it in osdblk.

> Acked-by: Boaz Harrosh 
> Signed-off-by: Kent Overstreet 
> ---
>  drivers/block/osdblk.c |3 +--
>  fs/bio.c   |   13 +
>  fs/exofs/ore.c |5 ++---
>  include/linux/bio.h|1 +
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> index 87311eb..1bbc681 100644
> --- a/drivers/block/osdblk.c
> +++ b/drivers/block/osdblk.c
> @@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio 
> *old_chain, gfp_t gfpmask)
>   struct bio *tmp, *new_chain = NULL, *tail = NULL;
>  
>   while (old_chain) {
> - tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
> + tmp = bio_clone_kmalloc(old_chain, gfpmask);
>   if (!tmp)
>   goto err_out;
>  
> - __bio_clone(tmp, old_chain);
>   tmp->bi_bdev = NULL;
>   gfpmask &= ~__GFP_WAIT;
>   tmp->bi_next = NULL;
> diff --git a/fs/bio.c b/fs/bio.c
> index f0c865b..77b9313 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -497,6 +497,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(bio_clone);
>  

/**

PLEASE.

> +struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> +{
> + struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);

Can't we use %NULL bioset as an indication to allocate from kmalloc
instead of duping interfaces like this?
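
Roughly what I mean, as a user-space sketch. The demo_ pool is a made-up stand-in for a bio_set and malloc() stands in for kmalloc(), so take the names as illustration only.

```c
#include <stdlib.h>

#define DEMO_POOL_SLOTS 4

struct demo_bio {
	int payload;
};

/* Hypothetical fixed-size pool standing in for a bio_set. */
struct demo_pool {
	struct demo_bio slots[DEMO_POOL_SLOTS];
	unsigned int used;
};

/* One allocator for both cases: a NULL pool means "use the heap". */
static struct demo_bio *demo_bio_alloc(struct demo_pool *pool)
{
	if (!pool)
		return malloc(sizeof(struct demo_bio));
	if (pool->used == DEMO_POOL_SLOTS)
		return NULL;
	return &pool->slots[pool->used++];
}

/* A single clone helper then covers the pooled and the heap variant. */
static struct demo_bio *demo_bio_clone(const struct demo_bio *src,
				       struct demo_pool *pool)
{
	struct demo_bio *b = demo_bio_alloc(pool);

	if (b)
		*b = *src;
	return b;
}
```

The caller picks the backing store by what it passes in, so no separate _kmalloc entry point is needed.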

Thanks.

-- 
tejun

Re: [PATCH v5 09/12] block: Rework bio_pair_split()

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:38PM -0700, Kent Overstreet wrote:
> This changes bio_pair_split() to use the new bio_split() underneath,
> which gets rid of the single page bio limitation. The various callers
> are fixed up for the slightly different struct bio_pair, and to remove
> the unnecessary checks.
> 
> v5: Move extern declaration to proper patch, per Boaz

I don't get this.  Why can't bio_split() chain the split to the
original one thus make bio_pair unnecessary?  It's not like completing
the split bio with the same end_io ever makes sense.

Thanks.

-- 
tejun

[PATCH 1/3] remoteproc: add rproc_report_crash function to notify rproc crashes

2012-08-08 Thread Fernando Guzman Lugo

This patch exports the rproc_report_crash function, which can be used
to report a rproc crash to the remoteproc core. This function is intended
to be called by low-level remoteproc driver code when it detects a crash
(that is, the remoteproc is not functional anymore). Using this function
from another driver (a non-rproc driver) should be analyzed very carefully;
most of the time that will be considered wrong.

The rproc_report_crash function can be called from any context, which
means it can be called from atomic context without any problem. The reporter
function defers the work to a separate thread (workqueue work) in charge of
handling the crash (if possible).

Deferring to this thread is done for two main reasons. The first reason is
to be able to call the function from atomic context: many crashes trigger
an interrupt, so this function can be called directly from ISR context. The
second reason is to avoid any deadlock condition which could happen if the
rproc_report_crash function is called from a function which is indirectly
holding a rproc lock.
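
The split between reporting and handling can be pictured with the following user-space sketch. It is a single-threaded simulation with hypothetical demo_ names: a pending flag stands in for the queued work item, and nothing here is the code from this patch.

```c
#include <stdbool.h>

enum demo_crash_type { DEMO_MMUFAULT };

struct demo_rproc {
	bool crash_pending;               /* stand-in for a queued work item */
	enum demo_crash_type crash_type;
	unsigned int crashes_handled;
};

/* Reporter: only records the crash and marks work as pending. It never
 * sleeps and never takes a lock, so an interrupt handler could call it. */
static void demo_report_crash(struct demo_rproc *rproc,
			      enum demo_crash_type type)
{
	rproc->crash_type = type;
	rproc->crash_pending = true;
}

/* Handler: invoked later by the worker, where sleeping and locking are
 * allowed. Returns true if it handled a pending crash. */
static bool demo_crash_handler_work(struct demo_rproc *rproc)
{
	if (!rproc->crash_pending)
		return false;

	rproc->crash_pending = false;
	rproc->crashes_handled++;
	return true;
}
```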

The reporter function schedules the crash handler task. This task
is intended to provide features such as:

-remoteproc register dump
-remoteproc stack dump
-remoteproc core dump
-Saving of the remoteproc traces in order to be visible after the crash
-Resetting the remoteproc in order to make it functional again (hard recovery)

Right now, it is only printing the crash type which was detected. The
types of crashes are represented by an enum. I have only added mmufault
crash type. Remoteproc low-level drivers can add more types when needed.

Signed-off-by: Fernando Guzman Lugo 
---
 Documentation/remoteproc.txt |7 +++
 drivers/remoteproc/remoteproc_core.c |   80 +++---
 include/linux/remoteproc.h   |   18 
 3 files changed, 98 insertions(+), 7 deletions(-)

diff --git a/Documentation/remoteproc.txt b/Documentation/remoteproc.txt
index 23a09b8..e6469fd 100644
--- a/Documentation/remoteproc.txt
+++ b/Documentation/remoteproc.txt
@@ -129,6 +129,13 @@ int dummy_rproc_example(struct rproc *my_rproc)
 
   Returns 0 on success and -EINVAL if @rproc isn't valid.
 
+  void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
+- Report a crash in a remoteproc
+  This function must be called every time a crash is detected by the
+  platform specific rproc implementation. This should not be called from a
+  non-remoteproc driver. This function can be called from atomic/interrupt
+  context.
+
 5. Implementation callbacks
 
 These callbacks should be provided by platform-specific remoteproc
diff --git a/drivers/remoteproc/remoteproc_core.c 
b/drivers/remoteproc/remoteproc_core.c
index d5c2dbf..3a6f1a1 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -50,6 +50,18 @@ typedef int (*rproc_handle_resource_t)(struct rproc *rproc, 
void *, int avail);
 /* Unique indices for remoteproc devices */
 static DEFINE_IDA(rproc_dev_index);
 
+static const char * const rproc_crash_names[] = {
+   [RPROC_MMUFAULT]= "mmufault",
+};
+
+/* translate rproc_crash_type to string */
+static const char *rproc_crash_to_string(enum rproc_crash_type type)
+{
+   if (type < ARRAY_SIZE(rproc_crash_names))
+   return rproc_crash_names[type];
+   return "unknown";
+}
+
 /*
  * This is the IOMMU fault handler we register with the IOMMU API
  * (when relevant; not all remote processors access memory through
@@ -57,19 +69,17 @@ static DEFINE_IDA(rproc_dev_index);
  *
  * IOMMU core will invoke this handler whenever the remote processor
  * will try to access an unmapped device address.
- *
- * Currently this is mostly a stub, but it will be later used to trigger
- * the recovery of the remote processor.
  */
 static int rproc_iommu_fault(struct iommu_domain *domain, struct device *dev,
unsigned long iova, int flags, void *token)
 {
+   struct rproc *rproc = token;
+
dev_err(dev, "iommu fault: da 0x%lx flags 0x%x\n", iova, flags);
 
-   /*
-* Let the iommu core know we're not really handling this fault;
-* we just plan to use this as a recovery trigger.
-*/
+   rproc_report_crash(rproc, RPROC_MMUFAULT);
+
+   /* Let the iommu core know we're not really handling this fault; */
return -ENOSYS;
 }
 
@@ -872,6 +882,34 @@ out:
 }
 
 /**
+ * rproc_crash_handler_work() - handle a crash
+ *
+ * This function needs to handle everything related to a crash, like cpu
+ * registers and stack dump, information to help to debug the fatal error, etc.
+ */
+static void rproc_crash_handler_work(struct work_struct *work)
+{
+   struct rproc *rproc = container_of(work, struct rproc, crash_handler);
+   struct device *dev = &rproc->dev;
+
+   dev_dbg(dev, "enter %s\n", __func__);
+
+   mutex_lock(&rproc->lock);
+   if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLIN
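
For illustration only, here is a minimal userspace sketch of the lookup
pattern used by rproc_crash_to_string() above: a name table built with
designated initializers plus a bounds check. All identifiers in the sketch,
including the crash types other than the mmufault one, are hypothetical and
not part of the patch. The sketch also guards against holes in the table,
which a sparse enum would otherwise turn into NULL strings.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical crash types; only an mmufault type exists in the patch. */
enum crash_type {
	CRASH_MMUFAULT,
	CRASH_WATCHDOG,		/* illustrative, deliberately has no name below */
	CRASH_FATAL,		/* illustrative */
};

static const char * const crash_names[] = {
	[CRASH_MMUFAULT]	= "mmufault",
	[CRASH_FATAL]		= "fatal error",
};

/* Translate a crash type to a string, never returning NULL. */
static const char *crash_to_string(enum crash_type type)
{
	/* The cast makes a negative value fail the bounds check as well. */
	if ((size_t)type < ARRAY_SIZE(crash_names) && crash_names[type])
		return crash_names[type];
	return "unknown";
}
```

With this shape, a low-level driver that adds a crash type but forgets the
table entry still gets a printable string rather than a NULL pointer handed
to a format specifier.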

[PATCH 2/3] remoteproc: recover a remoteproc when it has crashed

2012-08-08 Thread Fernando Guzman Lugo

This patch introduces the rproc_trigger_recover() function, which is in
charge of recovering the rproc. One way to recover the rproc after a crash
is to reset all its virtio devices. Doing so restores all rpmsg drivers
along with the rpmsg devices, and also resets the remoteproc, making rpmsg
communication with the remoteproc functional again. So far,
rproc_trigger_recover() only resets the virtio devices; if other rproc
features that also need to be reset are introduced in the future,
rproc_trigger_recover() should take care of them as well.

Signed-off-by: Fernando Guzman Lugo 
---
 drivers/remoteproc/remoteproc_core.c |   28 +++-
 drivers/remoteproc/remoteproc_internal.h |1 +
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c 
b/drivers/remoteproc/remoteproc_core.c
index 3a6f1a1..c879069 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -882,6 +882,32 @@ out:
 }
 
 /**
+ * rproc_trigger_recover() - recover a remoteproc
+ * @rproc: the remote processor
+ *
+ * The recovery is done by resetting all the virtio devices, that way all the
+ * rpmsg drivers will be reset along with the remote processor making the
+ * remoteproc functional again.
+ *
+ * This function can sleep, so it cannot be called from atomic context.
+ */
+int rproc_trigger_recover(struct rproc *rproc)
+{
+   struct rproc_vdev *rvdev, *rvtmp;
+
+   dev_err(&rproc->dev, "recovering %s\n", rproc->name);
+
+   /* clean up remote vdev entries */
+   list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
+   rproc_remove_virtio_dev(rvdev);
+
+   /* run rproc_fw_config_virtio to create vdevs again */
+   return request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG,
+   rproc->firmware, &rproc->dev, GFP_KERNEL,
+   rproc, rproc_fw_config_virtio);
+}
+
+/**
  * rproc_crash_handler_work() - handle a crash
  *
  * This function needs to handle everything related to a crash, like cpu
@@ -906,7 +932,7 @@ static void rproc_crash_handler_work(struct work_struct 
*work)
++rproc->crash_cnt, rproc->name);
mutex_unlock(&rproc->lock);
 
-   /* TODO: handle crash */
+   rproc_trigger_recover(rproc);
 }
 
 /**
diff --git a/drivers/remoteproc/remoteproc_internal.h 
b/drivers/remoteproc/remoteproc_internal.h
index a690ebe..d9c0730 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -63,6 +63,7 @@ void rproc_free_vring(struct rproc_vring *rvring);
 int rproc_alloc_vring(struct rproc_vdev *rvdev, int i);
 
 void *rproc_da_to_va(struct rproc *rproc, u64 da, int len);
+int rproc_trigger_recover(struct rproc *rproc);
 
 static inline
 int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
-- 
1.7.1
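
The cleanup loop in rproc_trigger_recover() above uses
list_for_each_entry_safe() because every iteration removes the entry it is
standing on. As a rough userspace sketch of why the extra cursor is needed,
assuming a plain singly linked list and hypothetical names (struct vdev,
vdev_add, vdev_remove_all) that do not exist in remoteproc:

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct rproc_vdev on a singly linked list. */
struct vdev {
	int id;
	struct vdev *next;
};

/* Push a new entry on the front of the list; returns 0 on success. */
static int vdev_add(struct vdev **head, int id)
{
	struct vdev *v = malloc(sizeof(*v));

	if (!v)
		return -1;
	v->id = id;
	v->next = *head;
	*head = v;
	return 0;
}

/*
 * Remove and free every entry, returning how many were removed.
 * The successor is saved before the current node is freed, which is
 * the job the second cursor does in list_for_each_entry_safe().
 */
static int vdev_remove_all(struct vdev **head)
{
	struct vdev *v, *tmp;
	int removed = 0;

	for (v = *head; v; v = tmp) {
		tmp = v->next;	/* read the link before v is freed */
		free(v);
		removed++;
	}
	*head = NULL;
	return removed;
}
```

Reading v->next after free(v) would be a use-after-free; the plain
list_for_each_entry() form has the same problem when the loop body deletes
the current entry.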


Re: [PATCH 07/11] net/stmmac: mark probe function as __devinit

2012-08-08 Thread David Miller

From: Arnd Bergmann 
Date: Wed,  8 Aug 2012 16:47:24 +0200

> Driver probe functions are generally __devinit so they will be
> discarded after initialization for non-hotplug kernels.
> This was found by a new warning after patch 6a228452d "stmmac: Add
> device-tree support" adds a new __devinit function that is called
> from stmmac_pltfr_probe.
> 
> Without this patch, building socfpga_defconfig results in:
> 
> WARNING: drivers/net/ethernet/stmicro/stmmac/stmmac.o(.text+0x5d4c): Section 
> mismatch in reference from the function stmmac_pltfr_probe() to the function 
> .devinit.text:stmmac_probe_config_dt()
> The function stmmac_pltfr_probe() references
> the function __devinit stmmac_probe_config_dt().
> This is often because stmmac_pltfr_probe lacks a __devinit
> annotation or the annotation of stmmac_probe_config_dt is wrong.
> 
> Signed-off-by: Arnd Bergmann 
> Cc: Stefan Roese 
> Cc: Giuseppe Cavallaro 
> Cc: David S. Miller 
> Cc: net...@vger.kernel.org

Applied, thanks.

[PATCH 3/3] remoteproc: create debugfs entry to disable/enable recovery dynamically

2012-08-08 Thread Fernando Guzman Lugo

Add a debugfs entry (named recovery) so that recovery can be disabled
dynamically at runtime. This entry is very useful when you are trying to
debug an rproc crash. Without it, a recovery will take place, making it
impossible to debug the issue.

Original idea from Ohad Ben-Cohen and contributions from
Subramaniam Chanderashekarapuram

Example:
-disabling recovery:
$ echo disabled > /remoteproc/remoteproc0/recovery

-enabling recovery:
$ echo enabled > /remoteproc/remoteproc0/recovery

-in case you have disabled recovery and you want to continue
 debugging, you can recover the remoteproc once using recover.
 This will not change the state of the recovery entry; it will
 only recover the rproc if its state is RPROC_CRASHED
$ echo recover > /remoteproc/remoteproc0/recovery

Signed-off-by: Fernando Guzman Lugo 
---
 drivers/remoteproc/remoteproc_core.c|3 +-
 drivers/remoteproc/remoteproc_debugfs.c |   83 +++
 include/linux/remoteproc.h  |2 +
 3 files changed, 87 insertions(+), 1 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c 
b/drivers/remoteproc/remoteproc_core.c
index c879069..0b52169 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -932,7 +932,8 @@ static void rproc_crash_handler_work(struct work_struct 
*work)
++rproc->crash_cnt, rproc->name);
mutex_unlock(&rproc->lock);
 
-   rproc_trigger_recover(rproc);
+   if (!rproc->recovery_disabled)
+   rproc_trigger_recover(rproc);
 }
 
 /**
diff --git a/drivers/remoteproc/remoteproc_debugfs.c 
b/drivers/remoteproc/remoteproc_debugfs.c
index 0383385..aa95cde 100644
--- a/drivers/remoteproc/remoteproc_debugfs.c
+++ b/drivers/remoteproc/remoteproc_debugfs.c
@@ -28,6 +28,9 @@
 #include 
 #include 
 #include 
+#include 
+
+#include "remoteproc_internal.h"
 
 /* remoteproc debugfs parent dir */
 static struct dentry *rproc_dbg;
@@ -111,6 +114,84 @@ static const struct file_operations rproc_name_ops = {
.llseek = generic_file_llseek,
 };
 
+/* expose recovery flag via debugfs */
+static ssize_t rproc_recovery_read(struct file *filp, char __user *userbuf,
+   size_t count, loff_t *ppos)
+{
+   struct rproc *rproc = filp->private_data;
+   char *buf = rproc->recovery_disabled ? "disabled\n" : "enabled\n";
+
+   return simple_read_from_buffer(userbuf, count, ppos, buf, strlen(buf));
+}
+
+
+/*
+ * Writing to the recovery debugfs entry we can change the behavior of the
+ * recovery dynamically. The default value of this entry is "enabled".
+ *
+ * There are 3 possible options you can write to the recovery debug entry:
+ * "enabled", "disabled" and "recover"
+ *
+ * enabled:    In this case recovery will be enabled, every time a rproc
+ *             crashes it will be recovered. If recovery was disabled and
+ *             the rproc has crashed, it will be recovered as soon as you
+ *             enable recovery.
+ * disabled:   In this case recovery will be disabled, that means if a rproc
+ *             crashes it will remain in crashed state. Therefore the rproc
+ *             won't be functional any more. But this option is used for
+ *             debugging purposes. Otherwise, debugging a crash would not be
+ *             possible.
+ * recover:    This option will trigger a recovery without taking care of
+ *             the recovery state (enabled/disabled) and without changing it.
+ *             This is useful for the cases when you are debugging a crash and
+ *             after enabling recovery you get another crash immediately. As
+ *             the recovery state will be enabled it will recover the rproc
+ *             without letting you debug the new crash. So, it is recommended
+ *             to disable recovery, then start debugging and use the "recover"
+ *             command while still debugging, and when you are done you
+ *             can use the "enabled" command.
+ */
+static ssize_t rproc_recovery_write(struct file *filp,
+   const char __user *user_buf, size_t count, loff_t *ppos)
+{
+   struct rproc *rproc = filp->private_data;
+   char buf[10];
+   int ret;
+
+   if (count > sizeof(buf))
+   return count;
+
+   ret = copy_from_user(buf, user_buf, count);
+   if (ret)
+   return ret;
+
+   /* remove end of line */
+   if (buf[count - 1] == '\n')
+   buf[count - 1] = '\0';
+
+   if (!strncmp(buf, "enabled", count)) {
+   rproc->recovery_disabled = false;
+   /* if rproc has crashed trigger recovery */
+   if (rproc->state == RPROC_CRASHED)
+   rproc_trigger_recover(rproc);
+   } else if (!strncmp(buf, "disabled", count)) {
+   rproc->recovery_disabled = true;
+   } else if (!strncmp(buf, "recover", count)) {
+   /* if rproc has crashed trigger recovery */
+

Re: [PATCH] lpc_eth: remove obsolete ifdefs

2012-08-08 Thread David Miller

From: Roland Stigge 
Date: Wed,  8 Aug 2012 15:18:54 +0200

> The #ifdefs regarding CONFIG_ARCH_LPC32XX_MII_SUPPORT and
> CONFIG_ARCH_LPC32XX_IRAM_FOR_NET are obsolete since the symbols have been
> removed from Kconfig and replaced by devicetree based configuration.
> 
> Signed-off-by: Roland Stigge 

Applied, thanks.

[PATCH 0/3] remoteproc: introduce rproc recovery

2012-08-08 Thread Fernando Guzman Lugo

This set of patches makes it possible for the remoteproc to recover after a
crash. This is a hard recovery: the remoteproc is reset and starts from the
beginning. When a crash happens, all the virtio devices are destroyed.
Therefore, both rpmsg drivers and devices are gracefully removed, which also
causes the rproc user count to drop to 0 and the remoteproc to be turned
off. After the virtio devices are destroyed, the crash handler function
reads the virtio information from the firmware in order to recreate the
virtio devices, which boots the remoteproc again and makes everything
functional.

Fernando Guzman Lugo (3):
  remoteproc: add rproc_report_crash function to notify rproc crashes
  remoteproc: recover a remoteproc when it has crashed
  remoteproc: create debugfs entry to disable/enable recovery
dynamically

 Documentation/remoteproc.txt |7 ++
 drivers/remoteproc/remoteproc_core.c |  107 --
 drivers/remoteproc/remoteproc_debugfs.c  |   83 +++
 drivers/remoteproc/remoteproc_internal.h |1 +
 include/linux/remoteproc.h   |   20 ++
 5 files changed, 211 insertions(+), 7 deletions(-)


Re: [PATCH] net/core: Fix potential memory leak in dev_set_alias()

2012-08-08 Thread David Miller

From: Alexey Khoroshilov 
Date: Wed,  8 Aug 2012 14:33:25 +0400

> Do not leak memory by updating pointer with potentially NULL realloc return 
> value.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Applied, thanks.

Re: [PATCH v5 08/12] block: Introduce new bio_split()

2012-08-08 Thread Tejun Heo

One more thing.

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> + if (bio_integrity(bio)) {
> + bio_integrity_clone(ret, bio, gfp, bs);
> + bio_integrity_trim(ret, 0, bio_sectors(ret));
> + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));

Is this equivalent to bio_integrity_split() performance-wise?

Thanks.

-- 
tejun

Re: [PATCH] cdc-phonet: Don't leak in usbpn_open

2012-08-08 Thread David Miller

From: "Rémi Denis-Courmont" 
Date: Wed, 8 Aug 2012 10:12:06 +0300

> Le mercredi 8 août 2012 00:56:26 Jesper Juhl, vous avez écrit :
>> We allocate memory for 'req' with usb_alloc_urb() and then test
>> 'if (!req || rx_submit(pnd, req, GFP_KERNEL | __GFP_COLD))'.
>> If we enter that branch due to '!req' then there is no problem. But if
>> we enter the branch due to 'req' being != 0 and the 'rx_submit()' call
>> being false, then we'll leak the memory we allocated.
>> Deal with the leak by always calling 'usb_free_urb(req)' when entering
>> the branch. If 'req' happens to be 0 then the call is harmless, if it
>> is not 0 then we free the memory we allocated but don't need.
>>
>> Signed-off-by: Jesper Juhl 
> 
> Acked-by: Rémi Denis-Courmont 

Applied.

Re: [PATCH] batman-adv: Fix mem leak in the batadv_tt_local_event() function

2012-08-08 Thread David Miller

From: Antonio Quartulli 
Date: Tue, 7 Aug 2012 20:50:36 +0200

> On Tue, Aug 07, 2012 at 08:32:34PM +0200, Jesper Juhl wrote:
>> Memory is allocated for 'tt_change_node' with kmalloc().
> >> 'tt_change_node' may go out of scope without really being used for anything
>> (except have a few members initialized) if we hit the 'del:' label.
>> This patch makes sure we free the memory in that case.
>> 
>> Signed-off-by: Jesper Juhl 
> 
> Acked-by: Antonio Quartulli 

Applied, thanks.

Re: [PATCH RESEND] sched: add missing group change to qfq_change_class

2012-08-08 Thread David Miller

From: Paolo Valente 
Date: Tue, 7 Aug 2012 19:27:25 +0200

> [Resending again, as the text was corrupted by the email client]
> 
> To speed up operations, QFQ internally divides classes into
> groups. Which group a class belongs to depends on the ratio between
> the maximum packet length and the weight of the class. Unfortunately
> the function qfq_change_class lacks the steps for changing the group
> of a class when the ratio max_pkt_len/weight of the class changes.
> 
> For example, when the last of the following three commands is
> executed, the group of class 1:1 is not correctly changed:
> 
> tc qdisc add dev XXX root handle 1: qfq
> tc class add dev XXX parent 1: qfq classid 1:1 weight 1
> tc class change dev XXX parent 1: classid 1:1 qfq weight 4
> 
> Not changing the group of a class does not affect the long-term
> bandwidth guaranteed to the class, as the latter is independent of the
> maximum packet length, and correctly changes (only) if the weight of
> the class changes. In contrast, if the group of the class is not
> updated, the class is still guaranteed the short-term bandwidth and
> packet delay related to its old group, instead of the guarantees that
> it should receive according to its new weight and/or maximum packet
> length. This may also break service guarantees for other classes.
> This patch adds the missing operations.
> 
> Signed-off-by: Paolo Valente 

Applied, thanks.

Re: [PATCH v5 08/12] block: Introduce new bio_split()

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
>  /**
> + * bio_split - split a bio
> + * @bio: bio to split
> + * @sectors: number of sectors to split from the front of @bio
> + * @gfp: gfp mask
> + * @bs:  bio set to allocate from
> + *
> + * Allocates and returns a new bio which represents @sectors from the start of
> + * @bio, and updates @bio to represent the remaining sectors.
> + *
> + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> + * unchanged.

Umm I don't know.  This is rather confusing.  The function may
return new or old bios?  What's the rationale behind it?  Return
ERR_PTR(-EINVAL) instead?

> + *
> + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> + * bvec boundary; it is the caller's responsibility to ensure that @bio is not
> + * freed before the split.

This is somewhat error-prone.  Given how splits are used now, this
might not be a big issue but it isn't difficult to imagine how this
could go subtly wrong.  More on this.

> + *
> + * BIG FAT WARNING:
> + *
> + * If you're calling this from under generic_make_request() (i.e.
> + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> + * workqueue if the allocation fails. Otherwise, your code will probably
> + * deadlock.

If the condition is detectable, WARN_ON_ONCE() please.

> + * You can't allocate more than once from the same bio pool without submitting
> + * the previous allocations (so they'll eventually complete and deallocate
> + * themselves), but if you're under generic_make_request() those previous
> + * allocations won't submit until you return . And if you have to split bios,
   ^
   extra space
> + * you should expect that some bios will require multiple splits.
> + */
> +struct bio *bio_split(struct bio *bio, int sectors,
> +   gfp_t gfp, struct bio_set *bs)
> +{
> + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> + struct bio_vec *bv;
> + struct bio *ret = NULL;
> +
> + BUG_ON(sectors <= 0);
> +
> + if (sectors >= bio_sectors(bio))
> + return bio;
> +
> + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> +   bio->bi_sector + sectors);
> +
> + bio_for_each_segment(bv, bio, idx) {
> + vcnt = idx - bio->bi_idx;
> +
> + if (!nbytes) {
> + ret = bio_alloc_bioset(gfp, 0, bs);
> + if (!ret)
> + return NULL;
> +
> + ret->bi_io_vec = bio_iovec(bio);
> + ret->bi_flags |= 1 << BIO_CLONED;
> + break;
> + } else if (nbytes < bv->bv_len) {
> + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> + if (!ret)
> + return NULL;
> +
> + memcpy(ret->bi_io_vec, bio_iovec(bio),
> +sizeof(struct bio_vec) * vcnt);
> +
> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> + bv->bv_offset   += nbytes;
> + bv->bv_len  -= nbytes;
> + break;
> + }

Ummm... ISTR reviewing this code and getting confused by bio_alloc
inside bio_for_each_segment() loop and commenting something about
that.  Yeah, this one.

  http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/15790/focus=370

So, I actually have reviewed this but didn't get any response and
majority of the issues I raised aren't addressed and you sent the
patch to me again?  What the hell, Kent?

> +
> + nbytes -= bv->bv_len;
> + }
> +
> + ret->bi_bdev= bio->bi_bdev;
> + ret->bi_sector  = bio->bi_sector;
> + ret->bi_size= sectors << 9;
> + ret->bi_rw  = bio->bi_rw;
> + ret->bi_vcnt= vcnt;
> + ret->bi_max_vecs = vcnt;
> + ret->bi_end_io  = bio->bi_end_io;

Is this safe?  Why isn't this chaining completion of split bio to the
original one?

Thanks.

-- 
tejun
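
To make the point about allocating inside the segment loop concrete, here is
a rough userspace sketch of one alternative shape: first locate the split
point, then allocate once outside the loop. This is not Kent's code and not
a proposed replacement; struct seg and find_split() are hypothetical names,
and only segment lengths are modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: only the length matters here. */
struct seg {
	unsigned int len;
};

/*
 * Find where a split of 'nbytes' lands in segs[0..n).
 * On success returns 0 and sets:
 *   *vcnt    - number of segments the front part needs
 *   *partial - bytes taken from the last of those segments, or 0 if the
 *              split falls exactly on a segment boundary
 * Returns -1 if nbytes is 0 or not smaller than the total length.
 */
static int find_split(const struct seg *segs, size_t n, unsigned int nbytes,
		      size_t *vcnt, unsigned int *partial)
{
	size_t i;

	if (nbytes == 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (nbytes < segs[i].len) {
			/* split inside segment i: both halves need it */
			*vcnt = i + 1;
			*partial = nbytes;
			return 0;
		}
		nbytes -= segs[i].len;
		if (nbytes == 0) {
			/* a boundary at the very end is not a split */
			if (i + 1 == n)
				return -1;
			*vcnt = i + 1;
			*partial = 0;
			return 0;
		}
	}
	return -1;
}
```

The caller would then decide between sharing the original vector
(partial == 0) and copying vcnt entries (partial != 0) with a single
allocation and a single failure path.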

[PATCH v6 2/3] virtio_balloon: introduce migration primitives to balloon pages

2012-08-08 Thread Rafael Aquini

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
 - balloon_lock (mutex) : synchronizes the access demand to elements of
  struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon pages list against
  concurrent list handling operations;

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 
+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
 * The pages we've told the Host we're not using.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+   __GFP_NORETRY | __GFP_NOWARN |
+   __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(&pages_lock);
list_add(&page->lru, &vb->pages);
+   page->mapping = balloon_mapping;
+   spin_unlock(&pages_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t 
num)
return;
 
tell_host(vb, vb->inflate_vq);
+   mutex_unlock(&balloon_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, 
size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(&balloon_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(&pages_lock);
+   if (unlikely(list_empty(&vb->pages))) {
+   spin_unlock(&pages_lock);
+   break;
+   }
page = list_first_entry(&vb->pages, struct page, lru);
+   page->mapping = NULL;
list_del(&page->lru);
+   spin_unlock(&pages_lock);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t 
num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to
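
A note on the fill_balloon() hunks above: mutex_lock(&balloon_lock) is taken
before the loop and mutex_unlock() is added after tell_host(), while the
"Didn't get any?" early return appears to sit between the two. If that
reading of the diff is right, the early return would leave balloon_lock
held. The usual idiom for keeping lock and unlock paired is a single exit
label. A minimal userspace sketch, with a counting fake lock and
hypothetical names (struct fake_lock, lock_acquire, lock_release, fill):

```c
#include <stddef.h>

/* Hypothetical lock that only counts, so the pairing can be checked. */
struct fake_lock {
	int depth;
};

static void lock_acquire(struct fake_lock *l) { l->depth++; }
static void lock_release(struct fake_lock *l) { l->depth--; }

/*
 * Collect up to 'want' items out of 'avail' and report them.
 * Returns the number collected; *reported is set when the "host" was told.
 */
static size_t fill(struct fake_lock *l, size_t want, size_t avail,
		   int *reported)
{
	size_t got;

	*reported = 0;
	lock_acquire(l);

	got = want < avail ? want : avail;

	/* Didn't get any? Leave through the common exit, not a bare return. */
	if (got == 0)
		goto out;

	*reported = 1;	/* stands in for tell_host() */
out:
	lock_release(l);
	return got;
}
```

Every path through fill() leaves the lock depth where it found it, which is
the property the early return needs to preserve.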

[PATCH v6 0/3] make balloon pages movable by compaction

2012-08-08 Thread Rafael Aquini

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

Rafael Aquini (3):
  mm: introduce compaction and migration for virtio ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c | 139 +---
 include/linux/mm.h  |  17 +
 include/linux/virtio_balloon.h  |   4 ++
 include/linux/vm_event_item.h   |   2 +
 mm/compaction.c | 132 --
 mm/migrate.c|  32 -
 mm/vmstat.c |   4 ++
 7 files changed, 302 insertions(+), 28 deletions(-)

Change log:
v6:
 * rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
 * address Andrew Morton's review comments on the patch series;
 * address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4:
 * address Rusty Russell's review comments on PATCH 02;
 * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
v3:
 * address reviewers' nitpick suggestions on PATCH 01 (Mel, Minchan);
v2:
 * address Mel Gorman's review comments on PATCH 01;


Preliminary test results:
(2 VCPU 1024 MB RAM KVM guest running 3.6.0_rc1+ -- after a reboot)

* 64 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3520
compact_pages_moved 47548
compact_pagemigrate_failed 120
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 16378
compact_balloon_failed 0
compact_balloon_isolated 16378
compact_balloon_freed 16378

* 128 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 3356
compact_pages_moved 47099
compact_pagemigrate_failed 158
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 26275
compact_balloon_failed 42
compact_balloon_isolated 26317
compact_balloon_freed 26275

-- 
1.7.11.2
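
In both runs above the balloon counters are consistent with
isolated = migrated + failed (16378 = 16378 + 0 and 26317 = 26275 + 42).
A small sketch of how such a check could be scripted in C against text in
the /proc/vmstat "name value" format; vmstat_lookup() is a hypothetical
helper, not an existing interface, and it works on a string so that it can
be tried without the patched kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up counter 'name' in text formatted like /proc/vmstat
 * ("name value" per line). Returns 0 and stores the value on success,
 * -1 if the counter is not present.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long *val)
{
	const char *line = text;
	size_t len = strlen(name);

	while (line && *line) {
		/* match the whole name, followed by a single space */
		if (!strncmp(line, name, len) && line[len] == ' ' &&
		    sscanf(line + len, "%lu", val) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline */
	}
	return -1;
}
```

Reading the real file into a buffer first and passing it to vmstat_lookup()
would give the same result on a kernel that exports these counters.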


[PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages

2012-08-08 Thread Rafael Aquini

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini 
---
 include/linux/mm.h |  17 +++
 mm/compaction.c| 131 +
 mm/migrate.c   |  30 +++-
 3 files changed, 158 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..18f978b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1662,5 +1662,22 @@ static inline unsigned int 
debug_guardpage_minorder(void) { return 0; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+   defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern bool putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool movable_balloon_page(struct page *page)
+{
+   return (page->mapping && page->mapping == balloon_mapping);
+}
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline bool putback_balloon_page(struct page *page) { return false; }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..7372592 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,90 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need on utilizing struct address_space locking schemes for
+ * balloon_mapping as, once it gets initialized at balloon driver, it will
+ * remain just like a static reference that helps us on identifying a guest
+ * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
+ * pointers to the functions that will execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ *   .migratepage- used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage   - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+   if (WARN_ON(!movable_balloon_page(page)))
+   return false;
+
+   if (likely(get_page_unless_zero(page))) {
+   /*
+* As balloon pages are not isolated from LRU lists, concurrent
+* compaction threads can race against page migration functions
+* move_to_new_page() & __unmap_and_move().
+* In order to avoid having an already isolated balloon page
+* being (wrongly) re-isolated while it is under migration,
+* lets be sure we have the page lock before proceeding with
+* the balloon page isolation steps.
+*/
+   if (likely(trylock_page(page))) {
+   /*
+* A ballooned page, by default, has just one refcount.
+* Prevent concurrent compaction threads from isolating
+* an already isolated balloon page.
+*/
+   if (movable_balloon_page(page) &&
+   (page_count(page) == 2)) {
+   __isolate_balloon_page(page);
+   unlock_page(page);
+   return true;
+   }
+   unlock_page(page);
+   }
+   /* Drop refcount t

[PATCH v6 3/3] mm: add vm event counters for balloon pages compaction

2012-08-08 Thread Rafael Aquini

This patch is only for test reporting purposes and shall be dropped if
the rest of this patchset gets accepted for merging.

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 1 +
 include/linux/vm_event_item.h   | 2 ++
 mm/compaction.c | 1 +
 mm/migrate.c| 6 --
 mm/vmstat.c | 4 
 5 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7c937a0..b8f7ea5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -414,6 +414,7 @@ int virtballoon_migratepage(struct address_space *mapping,
 
mutex_unlock(&balloon_lock);
 
+   count_vm_event(COMPACTBALLOONMIGRATED);
return 0;
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..a632a5d 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,6 +41,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+   COMPACTBALLOONMIGRATED, COMPACTBALLOONFAILED,
+   COMPACTBALLOONISOLATED, COMPACTBALLOONFREED,
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7372592..5d6a344 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -77,6 +77,7 @@ bool isolate_balloon_page(struct page *page)
(page_count(page) == 2)) {
__isolate_balloon_page(page);
unlock_page(page);
+   count_vm_event(COMPACTBALLOONISOLATED);
return true;
}
unlock_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 871a304..4115875 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,9 +79,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
-   if (unlikely(movable_balloon_page(page)))
+   if (unlikely(movable_balloon_page(page))) {
+   count_vm_event(COMPACTBALLOONFAILED);
WARN_ON(!putback_balloon_page(page));
-   else
+   } else
putback_lru_page(page);
}
 }
@@ -872,6 +873,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned 
long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+   count_vm_event(COMPACTBALLOONFREED);
return rc;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..8d80f60 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,6 +768,10 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
+   "compact_balloon_migrated",
+   "compact_balloon_failed",
+   "compact_balloon_isolated",
+   "compact_balloon_freed",
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
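
For readers skimming the patch above: the counters work because the enum
in vm_event_item.h and the strings in vmstat_text[] are kept in the same
order. Below is a minimal userspace sketch of that pairing. All sketch_*
and SK_* names are hypothetical, and the plain array stands in for the
kernel's per-cpu counters, so treat it as an illustration rather than the
kernel implementation.

```c
#include <string.h>

/* Hypothetical event ids, in the same order as the names below. */
enum sketch_event {
	SK_BALLOON_MIGRATED,
	SK_BALLOON_FAILED,
	SK_BALLOON_ISOLATED,
	SK_BALLOON_FREED,
	SK_NR_EVENTS
};

/* Names reported to userspace; index i describes event id i. */
static const char *const sketch_event_text[] = {
	"compact_balloon_migrated",
	"compact_balloon_failed",
	"compact_balloon_isolated",
	"compact_balloon_freed",
};

/* Fail the build if an id is added without a matching name. */
_Static_assert(sizeof(sketch_event_text) / sizeof(sketch_event_text[0])
	       == SK_NR_EVENTS, "event names out of sync with enum");

/* One global counter per event (the kernel keeps these per cpu). */
static unsigned long sketch_events[SK_NR_EVENTS];

void sketch_count_event(enum sketch_event e)
{
	sketch_events[e]++;
}

unsigned long sketch_event_value(enum sketch_event e)
{
	return sketch_events[e];
}

const char *sketch_event_name(enum sketch_event e)
{
	return sketch_event_text[e];
}
```

Calling sketch_count_event(SK_BALLOON_ISOLATED) at the point where a page
is isolated mirrors the count_vm_event() calls added by the patch.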

Re: [PATCH] netvm: check for page == NULL when propogating the skb->pfmemalloc flag

2012-08-08 Thread David Miller

From: Mel Gorman 
Date: Tue, 7 Aug 2012 09:55:55 +0100

> Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
> for the following bug triggered by a xen network driver
 ...
> The problem is that the xenfront driver is passing a NULL page to
> __skb_fill_page_desc() which was unexpected. This patch checks that
> there is a page before dereferencing.
> 
> Reported-and-Tested-by: Konrad Rzeszutek Wilk 
> Signed-off-by: Mel Gorman 

That call to __skb_fill_page_desc() in xen-netfront.c looks completely bogus.
It's the only driver passing NULL here.

That whole song and dance figuring out what to do with the head
fragment page, depending upon whether the length is greater than the
RX_COPY_THRESHOLD, is completely unnecessary.

Just use something like a call to __pskb_pull_tail(skb, len) and all
that other crap around that area can simply be deleted.
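
For context, the quoted fix amounts to "check that there is a page before
dereferencing". A minimal userspace sketch of that guard follows. The
sketch_* types and function are hypothetical stand-ins, not the kernel's
struct page, struct sk_buff or __skb_fill_page_desc().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins carrying only the flag under discussion. */
struct sketch_page { bool pfmemalloc; };
struct sketch_skb  { bool pfmemalloc; };

/* Propagate the flag only when a page was actually supplied. */
void sketch_fill_page_desc(struct sketch_skb *skb,
			   const struct sketch_page *page)
{
	if (page && page->pfmemalloc)
		skb->pfmemalloc = true;
}
```

This only illustrates the guard; it says nothing about whether a caller
should be passing NULL in the first place, which is the point raised
above.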

Re: [RESEND][PATCH] drivers: net: irda: bfin_sir: fix compile error

2012-08-08 Thread David Miller

From: Bob Liu 
Date: Tue, 7 Aug 2012 10:08:36 +0800

> From: Sonic Zhang 
> 
> Bit IREN was replaced by UMOD_IRDA and UMOD_MASK when blackfin 60x support
> was added, but this driver was not updated, which causes a bfin_sir build error:
> 
> drivers/net/irda/bfin_sir.c:161:9: error: 'IREN' undeclared (first use in this
> function)
> drivers/net/irda/bfin_sir.c:435:18: error: 'IREN' undeclared (first use in
> this function)
> drivers/net/irda/bfin_sir.c:521:11: error: 'IREN' undeclared (first use in
> this function)
> 
> This patch fixes it.
> 
> Signed-off-by: Sonic Zhang 
> Signed-off-by: Bob Liu 
> Acked-by: Samuel Ortiz 

Applied, thanks.

Re: [PATCH] perf: Add a new sort order: SORT_INCLUSIVE (v6)

2012-08-08 Thread Arun Sharma


On 8/8/12 12:16 PM, Arun Sharma wrote:


and therefore breaks the invariant period == period_self
in the default mode (no sort inclusive).



hist_entry__decay() also needs an update to maintain the invariant.

--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -138,6 +138,7 @@ static void hist_entry__add_cpumode_period(struct 
hist_entry *he,

 static void hist_entry__decay(struct hist_entry *he)
 {
he->period = (he->period * 7) / 8;
+   he->period_self = (he->period_self * 7) / 8;
he->nr_events = (he->nr_events * 7) / 8;
 }

 -Arun
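
A small userspace sketch of why the extra line matters. The struct below
is a hypothetical stand-in that models only the three fields touched by
the decay step; it is not perf's struct hist_entry.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields that decay. */
struct hist_entry_sketch {
	uint64_t period;      /* total period of the entry */
	uint64_t period_self; /* period attributed to the entry itself */
	uint64_t nr_events;
};

/*
 * Scale every counter by the same 7/8 factor, so an entry that starts
 * with period == period_self still satisfies that after the decay.
 */
void hist_entry_sketch_decay(struct hist_entry_sketch *he)
{
	he->period      = (he->period * 7) / 8;
	he->period_self = (he->period_self * 7) / 8;
	he->nr_events   = (he->nr_events * 7) / 8;
}
```

Without the period_self line, period would shrink on every decay while
period_self stayed put, and the two would drift apart.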

Re: [tip:x86:fpu 2/2] arch/x86/kernel/signal.c:626:4: error: implicit declaration of function '__setup_frame'

2012-08-08 Thread H. Peter Anvin

On 07/26/2012 10:48 AM, Suresh Siddha wrote:

Appended the patch for this. Thanks!
---
From: Suresh Siddha 
Subject: x86, fpu: fix x86_64 build without CONFIG_IA32_EMULATION

Fengguang's automated build reported some compilation failures:

arch/x86/kernel/signal.c: In function 'setup_rt_frame':
arch/x86/kernel/signal.c:626:4: error: implicit declaration of function 
'__setup_frame'
arch/x86/kernel/xsave.c: In function 'save_fsave_header':
arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
...

Fix x86_64 kernel build without CONFIG_IA32_EMULATION.

Code saving fsave prefix is applicable only for CONFIG_X86_32 or
CONFIG_IA32_EMULATION. Use config_enabled() checks to remove the unnecessary
code at compile time for x86_64 kernels built without CONFIG_IA32_EMULATION.

Also while we are at this, fix a spurious warning:

arch/x86/kernel/xsave.c:209:15: warning: ignoring return value of 
‘__clear_user’, declared with attribute warn_unused_result

Signed-off-by: Suresh Siddha 

With this patch applied I get an error at:

/home/hpa/kernel/tip.x86-fpu/arch/x86/kernel/signal.c:792:12: error: static declaration of ‘x32_setup_rt_frame’ follows non-static declaration
In file included from /home/hpa/kernel/tip.x86-fpu/arch/x86/kernel/signal.c:29:0:
/home/hpa/kernel/tip.x86-fpu/arch/x86/include/asm/fpu-internal.h:62:5: note: previous declaration of ‘x32_setup_rt_frame’ was here
/home/hpa/kernel/tip.x86-fpu/arch/x86/kernel/signal.c:792:12: warning: ‘x32_setup_rt_frame’ defined but not used [-Wunused-function]

make[4]: *** [arch/x86/kernel/signal.o] Error 1
make[3]: *** [arch/x86/kernel] Error 2
make[2]: *** [arch/x86] Error 2
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/home/hpa/kernel/tip.x86-fpu'

... for an x86-64 allyesconfig.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: linux-next: removal of some trees

2012-08-08 Thread Stephen Rothwell

Hi,

On Tue, 07 Aug 2012 17:08:18 +0900 Kukjin Kim  wrote:
>
> I think, now, the 'samsung' can be removed from linux-next but I'm not sure
> about 'bjdooks-i2c'. If so, please change the name of my tree 's5p' to
> 'samsung'.
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git
> #for-next

Done.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH v5 06/12] block: Add an explicit bio flag for bios that own their bvec

2012-08-08 Thread Tejun Heo

On Mon, Aug 06, 2012 at 03:08:35PM -0700, Kent Overstreet wrote:
> This is for the new bio splitting code. When we split a bio, if the
> split occurred on a bvec boundary we reuse the bvec for the new bio. But
> that means bio_free() can't free it, hence the explicit flag.
> 
> Signed-off-by: Kent Overstreet 

Sans how the flag is preserved,

 Acked-by: Tejun Heo 

Thanks.

-- 
tejun

Re: [GIT PULL for 3.6-rc1] media updates part 2

2012-08-08 Thread David Rientjes

On Tue, 31 Jul 2012, Mauro Carvalho Chehab wrote:

>   [media] radio-shark: New driver for the Griffin radioSHARK USB radio 
> receiver

This one gives me a build error if CONFIG_LEDS_CLASS is disabled:

ERROR: "led_classdev_register" [drivers/media/radio/shark2.ko] undefined!
ERROR: "led_classdev_unregister" [drivers/media/radio/shark2.ko] undefined!

Re: [PATCH v5 01/12] block: Generalized bio pool freeing

2012-08-08 Thread Tejun Heo

On Mon, Aug 06, 2012 at 03:08:30PM -0700, Kent Overstreet wrote:
> @@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
>   if (atomic_dec_and_test(&bio->bi_cnt)) {
>   bio_disassociate_task(bio);
>   bio->bi_next = NULL;
> - bio->bi_destructor(bio);
> +
> + if (bio->bi_pool)
> + bio_free(bio, bio->bi_pool);
> + else
> + bio->bi_destructor(bio);

So, this bi_pool overriding caller specified custom bi_destructor is
rather unusual.  I know why it's like that - the patch series is
gradually replacing bi_destructor with bi_pool and removes
bi_destructor eventually, but it would be far better if the patch
description at least said why it is done this way.

Thanks.

-- 
tejun
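
To make the ordering in the quoted hunk concrete, here is a minimal
userspace sketch of the dispatch on the final put. The sketch_* types are
hypothetical stand-ins (a counter instead of a real bio_set, a flag
instead of a real free), so this only shows which path wins, not how the
block layer frees memory.

```c
#include <stddef.h>

/* Hypothetical pool stand-in: just counts how many bios it got back. */
struct sketch_pool { int frees; };

struct sketch_bio {
	int refcnt;
	struct sketch_pool *pool;                /* stands in for bi_pool */
	void (*destructor)(struct sketch_bio *); /* stands in for bi_destructor */
	int destroyed_by_callback;
};

/* Example caller-supplied destructor; records that it ran. */
void sketch_custom_destructor(struct sketch_bio *bio)
{
	bio->destroyed_by_callback = 1;
}

/*
 * Drop one reference. On the last put, a non-NULL pool takes precedence
 * over any caller-supplied destructor, as in the transitional code.
 */
void sketch_bio_put(struct sketch_bio *bio)
{
	if (--bio->refcnt != 0)
		return;
	if (bio->pool)
		bio->pool->frees++;
	else
		bio->destructor(bio);
}
```

A bio with both a pool and a destructor set never sees its destructor
called here, which is the behaviour worth spelling out in the patch
description.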

Re: [PATCH v5 05/12] block: Kill bi_destructor

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:34PM -0700, Kent Overstreet wrote:
> Now that we've got generic code for freeing bios allocated from bio
> pools, this isn't needed anymore.
> 
> This also changes the semantics of bio_free() a bit - it now also frees
> bios allocated by bio_kmalloc(). It's also no longer exported, as
> without bi_destructor there should be no need for it to be called
> anywhere else.
> 
> v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
> 
> Signed-off-by: Kent Overstreet 
> ---
> diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
> index 920ede2..19bf632 100644
> --- a/drivers/block/drbd/drbd_main.c
> +++ b/drivers/block/drbd/drbd_main.c
> @@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops = {
>   .release = drbd_release,
>  };
>  
> -static void bio_destructor_drbd(struct bio *bio)
> -{
> - bio_free(bio, drbd_md_io_bio_set);
> -}
> -
>  struct bio *bio_alloc_drbd(gfp_t gfp_mask)
>  {
> - struct bio *bio;
> -
>   if (!drbd_md_io_bio_set)
>   return bio_alloc(gfp_mask, 1);
>  
> - bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> - if (!bio)
> - return NULL;
> - bio->bi_destructor = bio_destructor_drbd;
> - return bio;
> + return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
>  }

Does this chunk belong to this patch?

> @@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] 
> __read_mostly = {
>   */
>  struct bio_set *fs_bio_set;
>  
> +#define BIO_KMALLOC_POOL ((void *) ~0)

What's wrong with good ol' NULL?

Thanks.

-- 
tejun

[PATCH] x86, pci: Fix all early PCI scans to check the vendor ID first

2012-08-08 Thread Andi Kleen

From: Andi Kleen 

According to the Intel PCI experts it's not safe to check any
other field than vendor ID for 0xffff when doing PCI scans
to see if the device exists.

Several of the early PCI scans violated this. I changed
them all to always check the vendor ID first.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/aperture_64.c|5 +
 arch/x86/kernel/early-quirks.c   |3 +++
 arch/x86/kernel/pci-calgary_64.c |8 ++--
 arch/x86/pci/early.c |3 +++
 drivers/firewire/init_ohci1394_dma.c |3 +++
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index d5fd66f..e1ca7cd 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -206,6 +206,11 @@ static u32 __init search_agp_bridge(u32 *order, int 
*valid_agp)
for (func = 0; func < 8; func++) {
u32 class, cap;
u8 type;
+
+   if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID)
+   == 0xffff)
+   continue;
+
class = read_pci_config(bus, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 3755ef4..f76b930 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -250,6 +250,9 @@ static int __init check_dev_quirk(int num, int slot, int 
func)
 
vendor = read_pci_config_16(num, slot, func, PCI_VENDOR_ID);
 
+   if (vendor == 0xffff)
+   return -1;
+
device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID);
 
for (i = 0; early_qrk[i].f != NULL; i++) {
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index 299d493..05798a0 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -1324,8 +1324,9 @@ static void __init get_tce_space_from_tar(void)
unsigned short pci_device;
u32 val;
 
-   val = read_pci_config(bus, 0, 0, 0);
-   pci_device = (val & 0xFFFF0000) >> 16;
+   if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
+   continue;
+   pci_device = read_pci_config_16(bus, 0, 0, PCI_DEVICE_ID);
 
if (!is_cal_pci_dev(pci_device))
continue;
@@ -1426,6 +1427,9 @@ int __init detect_calgary(void)
unsigned short pci_device;
u32 val;
 
+   if (read_pci_config_16(bus, 0, 0, PCI_VENDOR_ID) == 0xffff)
+   continue;
+
val = read_pci_config(bus, 0, 0, 0);
pci_device = (val & 0xFFFF0000) >> 16;
 
diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
index d1067d5..4fb6847 100644
--- a/arch/x86/pci/early.c
+++ b/arch/x86/pci/early.c
@@ -91,6 +91,9 @@ void early_dump_pci_devices(void)
u32 class;
u8 type;
 
+   if (read_pci_config_16(bus, slot, func, PCI_VENDOR_ID) == 0xffff)
+   continue;
+
class = read_pci_config(bus, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
diff --git a/drivers/firewire/init_ohci1394_dma.c 
b/drivers/firewire/init_ohci1394_dma.c
index a9a347a..dd3bd84 100644
--- a/drivers/firewire/init_ohci1394_dma.c
+++ b/drivers/firewire/init_ohci1394_dma.c
@@ -279,6 +279,9 @@ void __init init_ohci1394_dma_on_all_controllers(void)
for (num = 0; num < 32; num++) {
for (slot = 0; slot < 32; slot++) {
for (func = 0; func < 8; func++) {
+   if (read_pci_config_16(num, slot, func, PCI_VENDOR_ID) == 0xffff)
+   continue;
+
class = read_pci_config(num, slot, func,
PCI_CLASS_REVISION);
if (class == 0xffffffff)
-- 
1.7.7.6
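
The pattern the patch applies in each scan loop can be sketched in
userspace as follows. The sketch_* names are hypothetical and the reader
function is a tiny fake standing in for read_pci_config_16(); the only
fact relied on is that an absent device reads back all ones, so the
16-bit vendor ID comes back as 0xffff.

```c
#include <stdint.h>

#define SKETCH_PCI_VENDOR_ID 0x00   /* offset of the 16-bit vendor ID */
#define SKETCH_NO_DEVICE     0xffff /* all ones: nothing responded */

/*
 * Hypothetical config-space reader. Only bus 0, slot 3, function 0 is
 * populated; every other location behaves like an empty slot.
 */
uint16_t sketch_read_config_16(int bus, int slot, int func, int offset)
{
	if (bus == 0 && slot == 3 && func == 0 &&
	    offset == SKETCH_PCI_VENDOR_ID)
		return 0x8086;
	return SKETCH_NO_DEVICE;
}

/*
 * Count the functions present on one bus, testing the vendor ID before
 * any other register would be read.
 */
int sketch_count_functions(int bus)
{
	int slot, func, found = 0;

	for (slot = 0; slot < 32; slot++) {
		for (func = 0; func < 8; func++) {
			if (sketch_read_config_16(bus, slot, func,
						  SKETCH_PCI_VENDOR_ID)
			    == SKETCH_NO_DEVICE)
				continue; /* empty, skip further reads */
			found++;
		}
	}
	return found;
}
```

Any class, header type or capability reads would go after the continue,
so they are only ever issued to functions that exist.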


Re: [PATCH 18/41] TTY: pty, switch to tty_alloc_driver

2012-08-08 Thread Alan Cox

> this and 19/41. The merge with DEVPTS_MEM (the termios case) needs
> devpts_kill_index to be moved from tty_release to
> pty_driver->ops->cleanup/shutdown, but I don't feel comfortable to do it
> now since it needs some testing. So I would add this to TODO and will
> send it after the next merge window. If I understood your point correctly?

That seems sensible - it's on my list to sort as well. We don't need to
allocate some of the pointer arrays for many of the drivers (anything
which doesn't save termios state for example).

Alan

Re: [PATCH v5 04/12] pktcdvd: Switch to bio_kmalloc()

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:33PM -0700, Kent Overstreet wrote:
> This is prep work for killing bi_destructor - previously, pktcdvd had
> its own pkt_bio_alloc which was basically duplicating bio_kmalloc(),
> necessitating its own bi_destructor implementation.
> 
> v5: Un-reorder some functions, to make the patch easier to review
> 
> Signed-off-by: Kent Overstreet 

Please Cc: the maintainers.  Cc'ing Peter Osterlund and keeping the
whole body for him.

Generally looks good to me.  How is this tested?

Thanks.

> ---
>  drivers/block/pktcdvd.c |   67 +++---
>  1 files changed, 16 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index ba66e44..ae55f08 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -101,6 +101,8 @@ static struct dentry  *pkt_debugfs_root = NULL; /* 
> /sys/kernel/debug/pktcdvd */
>  static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
>  static int pkt_remove_dev(dev_t pkt_dev);
>  static int pkt_seq_show(struct seq_file *m, void *p);
> +static void pkt_end_io_read(struct bio *bio, int err);
> +static void pkt_end_io_packet_write(struct bio *bio, int err);
>  
>  
>  
> @@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
>   }
>  }
>  
> -static void pkt_bio_destructor(struct bio *bio)
> -{
> - kfree(bio->bi_io_vec);
> - kfree(bio);
> -}
> -
> -static struct bio *pkt_bio_alloc(int nr_iovecs)
> -{
> - struct bio_vec *bvl = NULL;
> - struct bio *bio;
> -
> - bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
> - if (!bio)
> - goto no_bio;
> - bio_init(bio);
> -
> - bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
> - if (!bvl)
> - goto no_bvl;
> -
> - bio->bi_max_vecs = nr_iovecs;
> - bio->bi_io_vec = bvl;
> - bio->bi_destructor = pkt_bio_destructor;
> -
> - return bio;
> -
> - no_bvl:
> - kfree(bio);
> - no_bio:
> - return NULL;
> -}
> -
>  /*
>   * Allocate a packet_data struct
>   */
> @@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int 
> frames)
>   goto no_pkt;
>  
>   pkt->frames = frames;
> - pkt->w_bio = pkt_bio_alloc(frames);
> + pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
>   if (!pkt->w_bio)
>   goto no_bio;
>  
> + pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> + pkt->w_bio->bi_private = pkt;
> +
>   for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
>   pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
>   if (!pkt->pages[i])
> @@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int 
> frames)
>   bio_list_init(&pkt->orig_bios);
>  
>   for (i = 0; i < frames; i++) {
> - struct bio *bio = pkt_bio_alloc(1);
> + struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
>   if (!bio)
>   goto no_rd_bio;
> +
> + bio->bi_end_io = pkt_end_io_read;
> + bio->bi_private = pkt;
>   pkt->r_bios[i] = bio;
>   }
>  
> @@ -,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device 
> *pd, struct packet_data *pkt)
>* Schedule reads for missing parts of the packet.
>*/
>   for (f = 0; f < pkt->frames; f++) {
> - struct bio_vec *vec;
> -
>   int p, offset;
> +
>   if (written[f])
>   continue;
> +
>   bio = pkt->r_bios[f];
> - vec = bio->bi_io_vec;
> - bio_init(bio);
> - bio->bi_max_vecs = 1;
> - bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> - bio->bi_bdev = pd->bdev;
> - bio->bi_end_io = pkt_end_io_read;
> - bio->bi_private = pkt;
> - bio->bi_io_vec = vec;
> - bio->bi_destructor = pkt_bio_destructor;
> + bio_reset(bio);
> + bio->bi_sector  = pkt->sector + f * (CD_FRAMESIZE >> 9);
> + bio->bi_bdev= pd->bdev;
>  
>   p = (f * CD_FRAMESIZE) / PAGE_SIZE;
>   offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
> @@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, 
> struct packet_data *pkt)
>   }
>  
>   /* Start the write request */
> - bio_init(pkt->w_bio);
> - pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
> + bio_reset(pkt->w_bio);
>   pkt->w_bio->bi_sector = pkt->sector;
>   pkt->w_bio->bi_bdev = pd->bdev;
> - pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> - pkt->w_bio->bi_private = pkt;
> - pkt->w_bio->bi_io_vec = bvec;
> - pkt->w_bio->bi_destructor = pkt_bio_destructor;
>   for (f = 0; f < pkt->frames; f++)
>   if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, 
> bvec[f].bv_offset))
>   BUG();
> -- 
> 1.7.7.3
> 

-- 
tejun

Re: [PATCH v5 03/12] block: Add bio_reset()

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:32PM -0700, Kent Overstreet wrote:
> Reusing bios is something that's been highly frowned upon in the past,
> but driver code keeps doing it anyways. If it's going to happen anyways,
> we should provide a generic method.
> 
> This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> was open coding it, by doing a bio_init() and resetting bi_destructor.
> 
> v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
> bio->bi_flags are saved.
> 
> Signed-off-by: Kent Overstreet 
> Change-Id: I4eb2975bd678d3be811d5423d0620b08020be9ff

Please drop Change-Id.  Die gerrit die.

> +void bio_reset(struct bio *bio)
> +{
> + unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);

How many flags are we talking about?  If there aren't too many, I'd
prefer explicit BIO_FLAGS_PRESERVED or whatever.

Thanks.

-- 
tejun
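
The two styles being compared can be sketched side by side. Everything
named SKETCH_* or sketch_* below is hypothetical, including the split
point of 12 and the two flag bits; none of it is taken from the kernel
headers.

```c
/* Hypothetical split point: bits below it are per-I/O state. */
#define SKETCH_RESET_BITS 12

/* Hypothetical flags that should survive a reset. */
#define SKETCH_FLAG_OWNS_VEC   (1UL << 12)
#define SKETCH_FLAG_POOL_IDX   (1UL << 13)
#define SKETCH_FLAGS_PRESERVED (SKETCH_FLAG_OWNS_VEC | SKETCH_FLAG_POOL_IDX)

/* Shift style: keep every bit at or above the split point. */
unsigned long sketch_preserved_by_shift(unsigned long flags)
{
	return flags & (~0UL << SKETCH_RESET_BITS);
}

/* Explicit style: keep only the bits that are named in the mask. */
unsigned long sketch_preserved_by_mask(unsigned long flags)
{
	return flags & SKETCH_FLAGS_PRESERVED;
}
```

The shift form silently preserves any flag later added above the split
point, while the explicit mask has to be extended by hand, which is the
trade-off behind the question above.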

Re: UIO: missing resource mapping

2012-08-08 Thread Hans J. Koch

On Wed, Jul 18, 2012 at 12:40:47PM +0200, Dominic Eschweiler wrote:
> Am Montag, den 16.07.2012, 23:58 +0200 schrieb Hans J. Koch:
> > Try to hack up a patch to add generic BAR mapping to uio_pci_generic.c
> > and post it for review.
> > 
> 
> Here we go ...

Thank you very much for your work. I'm really sorry for the long delay,
but I was busy finishing a project because I'm going on vacation tomorrow.
Sorry, that might cause further delay since I don't know yet how often
I can read my mail...
Greg, can you review the next one?

Here's a first review.

Thanks,
Hans

> > 
> Signed-off-by: Dominic Eschweiler 
> diff --git a/drivers/uio/uio_pci_generic.c
> b/drivers/uio/uio_pci_generic.c
> index 0bd08ef..e25991e 100644
> --- a/drivers/uio/uio_pci_generic.c
> +++ b/drivers/uio/uio_pci_generic.c
> @@ -25,10 +25,12 @@
>  #include 
>  #include 
>  
> -#define DRIVER_VERSION   "0.01.0"
> +#define DRIVER_VERSION   "0.02.0"
>  #define DRIVER_AUTHOR"Michael S. Tsirkin "
>  #define DRIVER_DESC  "Generic UIO driver for PCI 2.3 devices"
>  
> +#define DRV_NAME "uio_pci_generic"
> +
>  struct uio_pci_generic_dev {
>   struct uio_info info;
>   struct pci_dev *pdev;
> @@ -58,6 +60,7 @@ static int __devinit probe(struct pci_dev *pdev,
>  {
>   struct uio_pci_generic_dev *gdev;
>   int err;
> + int i;
>  
>   err = pci_enable_device(pdev);
>   if (err) {
> @@ -67,8 +70,7 @@ static int __devinit probe(struct pci_dev *pdev,
>   }
>  
>   if (!pdev->irq) {
> - dev_warn(&pdev->dev, "No IRQ assigned to device: "
> -  "no support for interrupts?\n");
> + dev_warn(&pdev->dev, "No IRQ assigned to device: no support for
> interrupts?\n");

Please configure your mail client not to break lines when sending a patch.
It can't be applied like this.

Why did you make that change anyway? If it's just coding style, please send
another patch, don't mix functional changes with coding style fixes.

>   pci_disable_device(pdev);
>   return -ENODEV;
>   }
> @@ -91,10 +93,31 @@ static int __devinit probe(struct pci_dev *pdev,
>   gdev->info.handler = irqhandler;
>   gdev->pdev = pdev;
>  
> + /* request regions */
> + err = pci_request_regions(pdev, DRV_NAME);
> + if (err) {
> + dev_err(&pdev->dev, "Couldn't get PCI resources, aborting\n");
> + return err;
> + }
> +
> + /* create attributes for BAR mappings */
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> + if (pdev->resource[i].flags &&
> + (pdev->resource[i].flags & IORESOURCE_MEM)) {
> + gdev->info.mem[i].addr = pci_resource_start(pdev, i);
> + gdev->info.mem[i].size = pci_resource_len(pdev, i);
> + gdev->info.mem[i].internal_addr = NULL;
> + gdev->info.mem[i].memtype = UIO_MEM_PHYS;
> + }
> + }
> +
>   if (uio_register_device(&pdev->dev, &gdev->info))
>   goto err_register;
>   pci_set_drvdata(pdev, gdev);
>  
> + pr_info("UIO_PCI_GENERIC : initialized new device (%x %x)\n",

Please use dev_info()

> + pdev->vendor, pdev->device);
> +
>   return 0;
>  err_register:
>   kfree(gdev);
> @@ -107,17 +130,21 @@ err_verify:
>  static void remove(struct pci_dev *pdev)
>  {
>   struct uio_pci_generic_dev *gdev = pci_get_drvdata(pdev);
> -
>   uio_unregister_device(&gdev->info);
> +
> + pci_release_regions(pdev);
>   pci_disable_device(pdev);
>   kfree(gdev);
> +
> + pr_info("UIO_PCI_GENERIC : removed device (%x %x)\n",

ditto

> + pdev->vendor, pdev->device);
>  }
>  
>  static struct pci_driver driver = {
> - .name = "uio_pci_generic",
> + .name = DRV_NAME,
>   .id_table = NULL, /* only dynamic id's */
> - .probe = probe,
> - .remove = remove,
> + .probe= probe,
> + .remove   = remove,

As above: Please put coding style fixes in an extra patch (if you really
insist on tabs instead of spaces...)

>  };
>  
>  static int __init init(void)
> 
> -- 
> Gruß
>   Dominic
> 
> Frankfurt Institute for Advanced Studies (FIAS)
> Ruth-Moufang-Straße 1
> D-60438 Frankfurt am Main
> Germany
> 
> Phone:  +49 69 79844114
> 
> 

Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

2012-08-08 Thread Tejun Heo

Hello,

On Mon, Aug 06, 2012 at 03:08:31PM -0700, Kent Overstreet wrote:
> Previously, dm_rq_clone_bio_info needed to be freed by the bio's
> destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
> This gets rid of a memory allocation and means we can kill
> dm_rq_bio_destructor.
> 
> Signed-off-by: Kent Overstreet 
> ---
>  drivers/md/dm.c |   31 +--
>  1 files changed, 5 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 40b7735..4014696 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -92,6 +92,7 @@ struct dm_rq_target_io {
>  struct dm_rq_clone_bio_info {
>   struct bio *orig;
>   struct dm_rq_target_io *tio;
> + struct bio clone;
>  };
...
> @@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned 
> type, unsigned integrity)
>   if (!pools->tio_pool)
>   goto free_io_pool_and_out;
>  
> - pools->bs = bioset_create(pool_size, 0);
> + pools->bs = bioset_create(pool_size,
> +   offsetof(struct dm_rq_clone_bio_info, orig));
>   if (!pools->bs)
>   goto free_tio_pool_and_out;

I do like this approach much better but this isn't something
super-obvious.  Can we please explain what's going on?  Especially,
the comment above dm_rq_clone_bio_info is outright misleading now.

Can someone more familiar review this one?  Alasdair, Mike?

Also, how was this tested?

Thanks.

-- 
tejun
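
Since the front_pad idea is not obvious, here is a userspace sketch of
the general technique: allocate extra bytes in front of the bio and get
back to the enclosing structure from the bio pointer. All sketch_* names
are hypothetical. Note that the sketch uses the offset of the embedded
member as the pad size; that is an assumption about the technique in
general and is not copied from the quoted hunk.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_bio { long sector; };

/*
 * Hypothetical per-clone bookkeeping with the bio embedded last, so the
 * bytes in front of it form the "front pad".
 */
struct sketch_clone_info {
	void *orig;
	void *tio;
	struct sketch_bio clone;
};

/*
 * Allocate front_pad zeroed bytes followed by a bio and return the bio,
 * roughly what a bio pool created with a front pad hands out.
 */
struct sketch_bio *sketch_alloc_bio(size_t front_pad)
{
	char *mem = calloc(1, front_pad + sizeof(struct sketch_bio));

	return mem ? (struct sketch_bio *)(mem + front_pad) : NULL;
}

/* Recover the enclosing structure from the embedded bio. */
struct sketch_clone_info *sketch_info_of(struct sketch_bio *bio)
{
	return (struct sketch_clone_info *)
		((char *)bio - offsetof(struct sketch_clone_info, clone));
}

/* Only valid for bios allocated with front_pad == offsetof(..., clone). */
void sketch_free_bio(struct sketch_bio *bio)
{
	free(sketch_info_of(bio));
}
```

One allocation then covers both the bookkeeping and the bio, and freeing
the bio frees the bookkeeping with it, which is how the separate
allocation and the destructor become unnecessary.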

Re: [PATCH] regulator: tps6586x: correct vin pin for sm0/sm1/sm2

2012-08-08 Thread Stephen Warren

On 07/24/2012 02:18 AM, Laxman Dewangan wrote:
> As per datasheet, the vin pin for the regulator is named
> as vin_sm0, vin_sm1, vin_sm2 for sm0, sm1 and sm2 respectively.
> 
> Correcting the names in driver and documentation to match with
> datasheet.

Mark,

This patch was in next-20120803, but seems to have been dropped from
next-20120806 and later:

> git log next-20120803 --oneline -- drivers/regulator/tps6586x-regulator.c|cat
> c7bc4e5 regulator: tps6586x: correct vin pin for sm0/sm1/sm2
> 7c7fac3 regulator: tps6586x: add support for input supply
> f464703 regulator: tps6586x: Convert to regulator_list_voltage_table

> git log next-20120806 --oneline -- drivers/regulator/tps6586x-regulator.c|cat
> 4c79c8d regulator: tps6586x: Convert to regulator_[enable|disable|is_enabled|get_voltage_sel]_regmap
> 7c7fac3 regulator: tps6586x: add support for input supply
> f464703 regulator: tps6586x: Convert to regulator_list_voltage_table

I assume that was just an accident?


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet

On Wed, 2012-08-08 at 16:46 -0400, Paul Moore wrote:
> On Wednesday, August 08, 2012 10:32:52 PM Eric Dumazet wrote:
> > On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote:
> > > On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote:
> > > > Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
> > > > code.
> > > 
> > > Sure but it seems include file misses an accessor for this.
> > > 
> > > We could add it on a future cleanup patch, as Paul mentioned.
> > 
> > I cooked following patch.
> > But smack/smack_lsm.c makes a reference to
> > smk_of_current()... so it seems we are in a hole...
> > 
> > > It makes little sense to me to have any kind of security on these
> > > internal sockets.
> > 
> > Maybe selinux should not crash if sk->sk_security is NULL ?
> 
> I realize our last emails probably passed each other mid-flight, but
> hopefully it explains why we can't just pass packets when sk->sk_security
> is NULL.
> 
> Regardless, some quick comments below ...
> 
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 6c77f63..459eca6 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -4289,10 +4289,13 @@ out:
> > return 0;
> >  }
> > 
> > -static int selinux_sk_alloc_security(struct sock *sk, int family, ...
> > +static int selinux_sk_alloc_security(struct sock *sk, int family, ...
> >  {
> > struct sk_security_struct *sksec;
> > 
> > +   if (check && sk->sk_security)
> > +   return 0;
> > +
> > sksec = kzalloc(sizeof(*sksec), priority);
> > if (!sksec)
> > return -ENOMEM;
> 
> I think I might replace the "check" boolean with a "kern/kernel" boolean so
> that in addition to the allocation we can also initialize the socket to
> SECINITSID_KERNEL/kernel_t here in the case when the boolean is set.  The
> only place that would set the boolean to true would be
> ip_send_unicast_reply(), all other callers would set it to false.
> 
> > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> > index 8221514..8965cf1 100644
> > --- a/security/smack/smack_lsm.c
> > +++ b/security/smack/smack_lsm.c
> > @@ -1754,11 +1754,14 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode)
> >   *
> >   * Returns 0 on success, -ENOMEM is there's no memory
> >   */
> > -static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags)
> > +static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags, bool check)
> >  {
> > char *csp = smk_of_current();
> > struct socket_smack *ssp;
> > 
> > +   if (check && sk->sk_security)
> > +   return 0;
> > +
> > ssp = kzalloc(sizeof(struct socket_smack), gfp_flags);
> > if (ssp == NULL)
> > return -ENOMEM;
> 
> In the case of Smack, when the kernel boolean is true I think the right 
> solution is to use smack_net_ambient.
> 

Cool, here is the latest version:

diff --git a/include/linux/security.h b/include/linux/security.h
index 4e5a73c..4d8e454 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1601,7 +1601,7 @@ struct security_operations {
int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb);
int (*socket_getpeersec_stream) (struct socket *sock, char __user 
*optval, int __user *optlen, unsigned len);
int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff 
*skb, u32 *secid);
-   int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority);
+   int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority, 
bool kernel);
void (*sk_free_security) (struct sock *sk);
void (*sk_clone_security) (const struct sock *sk, struct sock *newsk);
void (*sk_getsecid) (struct sock *sk, u32 *secid);
@@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct sk_buff 
*skb);
 int security_socket_getpeersec_stream(struct socket *sock, char __user *optval,
  int __user *optlen, unsigned len);
 int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, 
u32 *secid);
-int security_sk_alloc(struct sock *sk, int family, gfp_t priority);
+int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool 
kernel);
 void security_sk_free(struct sock *sk);
 void security_sk_clone(const struct sock *sk, struct sock *newsk);
 void security_sk_classify_flow(struct sock *sk, struct flowi *fl);
@@ -2667,7 +2667,7 @@ static inline int security_socket_getpeersec_dgram(struct 
socket *sock, struct s
return -ENOPROTOOPT;
 }
 
-static inline int security_sk_alloc(struct sock *sk, int family, gfp_t 
priority)
+static inline int security_sk_alloc(struct sock *sk, int family, gfp_t 
priority, bool kernel)
 {
return 0;
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f67ced..e00cadf 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1186,7 +1186,7 @@ static struct sock *sk_prot_alloc(struct proto *p

Re: [dm-devel] [PATCH] dm: verity support data device offset (Linux 3.4.7)

2012-08-08 Thread Wesley Miaw

On Aug 8, 2012, at 1:56 PM, Milan Broz wrote:

> On 08/08/2012 10:46 PM, Wesley Miaw wrote:
> 
>> I did modify veritysetup on my own so the format and verify commands will 
>> work with regular files on disk instead of having to mount through loop 
>> devices.
> 
> Which veritysetup? In upstream (cryptsetup repository) it allocates loop 
> automatically.
> (And for userspace verification it doesn't need loop at all.)
> 
> Anyway, please send a patch for userspace as well then ;-)

I grabbed cryptsetup from http://code.google.com/p/cryptsetup because what I 
read said that is the most recent. I then modified the code in there because 
the final block device images need to combine the file system, hash data, and 
some metadata into a single image, and I don't want my users to need root 
privileges.

I can send a separate patch with those changes, but I'm not sure where to send 
it. Also to LKML?

Thanks,
--
Wesley Miaw


[PATCH 2/7] workqueue: make deferrable delayed_work initializer names consistent

2012-08-08 Thread Tejun Heo

Initializer names for deferrable delayed_work are inconsistent.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo 
---
 arch/powerpc/platforms/cell/cpufreq_spudemand.c |2 +-
 drivers/cpufreq/cpufreq_conservative.c  |2 +-
 drivers/cpufreq/cpufreq_ondemand.c  |2 +-
 drivers/devfreq/devfreq.c   |2 +-
 drivers/net/ethernet/mellanox/mlx4/sense.c  |2 +-
 drivers/power/ab8500_btemp.c|2 +-
 drivers/power/ab8500_charger.c  |8 
 drivers/power/ab8500_fg.c   |8 
 drivers/power/abx500_chargalg.c |4 ++--
 drivers/power/max17040_battery.c|2 +-
 drivers/video/omap2/displays/panel-taal.c   |6 +++---
 drivers/video/omap2/dss/dsi.c   |4 ++--
 include/linux/workqueue.h   |8 
 mm/slab.c   |2 +-
 mm/vmstat.c |2 +-
 net/core/neighbour.c|2 +-
 net/ipv4/inetpeer.c |2 +-
 net/sunrpc/cache.c  |2 +-
 18 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/platforms/cell/cpufreq_spudemand.c 
b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
index 23bc9db..82607d6 100644
--- a/arch/powerpc/platforms/cell/cpufreq_spudemand.c
+++ b/arch/powerpc/platforms/cell/cpufreq_spudemand.c
@@ -76,7 +76,7 @@ static void spu_gov_work(struct work_struct *work)
 static void spu_gov_init_work(struct spu_gov_info_struct *info)
 {
int delay = usecs_to_jiffies(info->poll_int);
-   INIT_DELAYED_WORK_DEFERRABLE(&info->work, spu_gov_work);
+   INIT_DEFERRABLE_WORK(&info->work, spu_gov_work);
schedule_delayed_work_on(info->policy->cpu, &info->work, delay);
 }
 
diff --git a/drivers/cpufreq/cpufreq_conservative.c 
b/drivers/cpufreq/cpufreq_conservative.c
index 235a340..55f0354 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -466,7 +466,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s 
*dbs_info)
delay -= jiffies % delay;
 
dbs_info->enable = 1;
-   INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
+   INIT_DEFERRABLE_WORK(&dbs_info->work, do_dbs_timer);
schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work, delay);
 }
 
diff --git a/drivers/cpufreq/cpufreq_ondemand.c 
b/drivers/cpufreq/cpufreq_ondemand.c
index 836e9b0..14c1af5 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -644,7 +644,7 @@ static inline void dbs_timer_init(struct cpu_dbs_info_s 
*dbs_info)
delay -= jiffies % delay;
 
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
-   INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
+   INIT_DEFERRABLE_WORK(&dbs_info->work, do_dbs_timer);
schedule_delayed_work_on(dbs_info->cpu, &dbs_info->work, delay);
 }
 
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 70c31d4..b146d76 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -607,7 +607,7 @@ static int __init devfreq_start_polling(void)
mutex_lock(&devfreq_list_lock);
polling = false;
devfreq_wq = create_freezable_workqueue("devfreq_wq");
-   INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+   INIT_DEFERRABLE_WORK(&devfreq_work, devfreq_monitor);
mutex_unlock(&devfreq_list_lock);
 
devfreq_monitor(&devfreq_work.work);
diff --git a/drivers/net/ethernet/mellanox/mlx4/sense.c 
b/drivers/net/ethernet/mellanox/mlx4/sense.c
index 8024982..37b2378 100644
--- a/drivers/net/ethernet/mellanox/mlx4/sense.c
+++ b/drivers/net/ethernet/mellanox/mlx4/sense.c
@@ -153,5 +153,5 @@ void  mlx4_sense_init(struct mlx4_dev *dev)
for (port = 1; port <= dev->caps.num_ports; port++)
sense->do_sense_port[port] = 1;
 
-   INIT_DELAYED_WORK_DEFERRABLE(&sense->sense_poll, mlx4_sense_port);
+   INIT_DEFERRABLE_WORK(&sense->sense_poll, mlx4_sense_port);
 }
diff --git a/drivers/power/ab8500_btemp.c b/drivers/power/ab8500_btemp.c
index bba3cca..3041514 100644
--- a/drivers/power/ab8500_btemp.c
+++ b/drivers/power/ab8500_btemp.c
@@ -1018,7 +1018,7 @@ static int __devinit ab8500_btemp_probe(struct 
platform_device *pdev)
}
 
/* Init work for measuring temperature periodically */
-   INIT_DELAYED_WORK_DEFERRABLE(&di->btemp_periodic_work,
+   INIT_DEFERRABLE_WORK(&di->btemp_periodic_work,
ab8500_btemp_periodic_work);
 
/* Identify the battery */
diff --git a/drivers/power/ab8500_charger.c b/drivers/

[PATCH 4/7] workqueue: use irqsafe timer for delayed_work

2012-08-08 Thread Tejun Heo

Up to now, for delayed_works, try_to_grab_pending() couldn't be used
from IRQ handlers because IRQs may happen while
delayed_work_timer_fn() is in progress leading to indefinite -EAGAIN.

This patch makes delayed_work use the new TIMER_IRQSAFE flag for
delayed_work->timer.  This makes try_to_grab_pending() and thus
mod_delayed_work_on() safe to call from IRQ handlers.

Signed-off-by: Tejun Heo 
---
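
Not part of the patch: a minimal userspace sketch of why the -EAGAIN
window matters, under simplifying assumptions.  The demo_* names and
the five-state model are hypothetical and far coarser than the real
try_to_grab_pending(); in particular, any state that can't be grabbed
is reported as -EAGAIN here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states of a delayed work item; a simplification. */
enum demo_state {
	DEMO_IDLE,		/* PENDING clear */
	DEMO_TIMER_ARMED,	/* PENDING set, timer queued */
	DEMO_IN_TRANSIT,	/* timer fired, work not on a worklist yet */
	DEMO_QUEUED,		/* PENDING set, work on a worklist */
	DEMO_OWNED		/* caller holds PENDING */
};

/*
 * Returns 0 if the item was idle, 1 if it was pending and got stolen,
 * and -EAGAIN if it is in a state where it can't be grabbed right now.
 */
static int demo_try_to_grab_pending(enum demo_state *state)
{
	assert(state);

	switch (*state) {
	case DEMO_IDLE:
		*state = DEMO_OWNED;
		return 0;
	case DEMO_TIMER_ARMED:
	case DEMO_QUEUED:
		*state = DEMO_OWNED;
		return 1;
	default:
		return -EAGAIN;
	}
}

/* Model of the timer callback finishing: the work lands on a worklist. */
static void demo_timer_fn_finish(enum demo_state *state)
{
	if (*state == DEMO_IN_TRANSIT)
		*state = DEMO_QUEUED;
}
```

In this model an IRQ handler that interrupts the timer callback on the
same CPU would spin on DEMO_IN_TRANSIT forever, because the callback
can't make progress until the handler returns.  Running the callback
with IRQs off means such a handler can never observe that state
locally, so a retry loop only has to wait for another CPU to finish.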
 include/linux/workqueue.h |8 +---
 kernel/workqueue.c|   20 +++-
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 83755f4..093968e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -136,7 +136,8 @@ struct execute_work {
 #define __DELAYED_WORK_INITIALIZER(n, f, tflags) { \
.work = __WORK_INITIALIZER((n).work, (f)),  \
.timer = __TIMER_INITIALIZER(delayed_work_timer_fn, \
-   0, (unsigned long)&(n), (tflags)),  \
+0, (unsigned long)&(n),\
+(tflags) | TIMER_IRQSAFE), \
}
 
 #define DECLARE_WORK(n, f) \
@@ -214,7 +215,8 @@ static inline unsigned int work_static(struct work_struct 
*work) { return 0; }
do {\
INIT_WORK(&(_work)->work, (_func)); \
__setup_timer(&(_work)->timer, delayed_work_timer_fn,   \
- (unsigned long)(_work), (_tflags));   \
+ (unsigned long)(_work),   \
+ (_tflags) | TIMER_IRQSAFE);   \
} while (0)
 
 #define __INIT_DELAYED_WORK_ONSTACK(_work, _func, _tflags) \
@@ -223,7 +225,7 @@ static inline unsigned int work_static(struct work_struct 
*work) { return 0; }
__setup_timer_on_stack(&(_work)->timer, \
   delayed_work_timer_fn,   \
   (unsigned long)(_work),  \
-  (_tflags));  \
+  (_tflags) | TIMER_IRQSAFE);  \
} while (0)
 
 #define INIT_DELAYED_WORK(_work, _func)
\
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 11723c5..9087599 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1041,16 +1041,14 @@ static void cwq_dec_nr_in_flight(struct 
cpu_workqueue_struct *cwq, int color,
  * for arbitrarily long
  *
  * On >= 0 return, the caller owns @work's PENDING bit.  To avoid getting
- * preempted while holding PENDING and @work off queue, preemption must be
- * disabled on entry.  This ensures that we don't return -EAGAIN while
- * another task is preempted in this function.
+ * interrupted while holding PENDING and @work off queue, irq must be
+ * disabled on entry.  This, combined with delayed_work->timer being
+ * irqsafe, ensures that we return -EAGAIN for finite short period of time.
  *
  * On successful return, >= 0, irq is disabled and the caller is
  * responsible for releasing it using local_irq_restore(*@flags).
  *
- * This function is safe to call from any context other than IRQ handler.
- * An IRQ handler may run on top of delayed_work_timer_fn() which can make
- * this function return -EAGAIN perpetually.
+ * This function is safe to call from any context including IRQ handler.
  */
 static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
   unsigned long *flags)
@@ -1065,6 +1063,11 @@ static int try_to_grab_pending(struct work_struct *work, 
bool is_dwork,
if (is_dwork) {
struct delayed_work *dwork = to_delayed_work(work);
 
+   /*
+* dwork->timer is irqsafe.  If del_timer() fails, it's
+* guaranteed that the timer is not queued anywhere and not
+* running on the local CPU.
+*/
if (likely(del_timer(&dwork->timer)))
return 1;
}
@@ -1318,9 +1321,8 @@ void delayed_work_timer_fn(unsigned long __data)
struct delayed_work *dwork = (struct delayed_work *)__data;
struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
-   local_irq_disable();
+   /* should have been called from irqsafe timer with irq already off */
__queue_work(dwork->cpu, cwq->wq, &dwork->work);
-   local_irq_enable();
 }
 EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
@@ -1429,7 +1431,7 @@ EXPORT_SYMBOL_GPL(queue_delayed_work);
  * Returns %false if @dwork was idle and queued, %true if @dwork was
  * pending and its timer was modified.
  *
- * This function is safe to call from any

[PATCH 6/7] workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()

2012-08-08 Thread Tejun Heo

cancel_delayed_work() can't be called from IRQ handlers due to its use
of del_timer_sync() and can't cancel work items which are already
transferred from timer to worklist.

Also, unlike other flush and cancel functions, a canceled delayed_work
would still point to the last associated cpu_workqueue.  If the
workqueue is destroyed afterwards and the work item is re-used on a
different workqueue, the queueing code can oops trying to dereference
already freed cpu_workqueue.

This patch reimplements cancel_delayed_work() using
try_to_grab_pending() and set_work_cpu_and_clear_pending().  This
allows the function to be called from IRQ handlers and makes its
behavior consistent with other flush / cancel functions.

Signed-off-by: Tejun Heo 
Cc: Linus Torvalds 
Cc: Ingo Molnar 
Cc: Andrew Morton 
---
 include/linux/workqueue.h |   17 +
 kernel/workqueue.c|   30 ++
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 093968e..6306157 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -417,6 +417,7 @@ extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
 extern bool flush_delayed_work_sync(struct delayed_work *work);
+extern bool cancel_delayed_work(struct delayed_work *dwork);
 extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
 
 extern void workqueue_set_max_active(struct workqueue_struct *wq,
@@ -426,22 +427,6 @@ extern unsigned int work_cpu(struct work_struct *work);
 extern unsigned int work_busy(struct work_struct *work);
 
 /*
- * Kill off a pending schedule_delayed_work().  Note that the work callback
- * function may still be running on return from cancel_delayed_work(), unless
- * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
- * cancel_work_sync() to wait on it.
- */
-static inline bool cancel_delayed_work(struct delayed_work *work)
-{
-   bool ret;
-
-   ret = del_timer_sync(&work->timer);
-   if (ret)
-   work_clear_pending(&work->work);
-   return ret;
-}
-
-/*
  * Like above, but uses del_timer() instead of del_timer_sync(). This means,
  * if it returns 0 the timer function may be running and the queueing is in
  * progress.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9087599..7413242 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3031,6 +3031,36 @@ bool flush_delayed_work_sync(struct delayed_work *dwork)
 EXPORT_SYMBOL(flush_delayed_work_sync);
 
 /**
+ * cancel_delayed_work - cancel a delayed work
+ * @dwork: delayed_work to cancel
+ *
+ * Kill off a pending delayed_work.  Returns %true if @dwork was pending
+ * and canceled; %false if wasn't pending.  Note that the work callback
+ * function may still be running on return, unless it returns %true and the
+ * work doesn't re-arm itself.  Explicitly flush or use cancel_work_sync()
+ * to wait on it.
+ *
+ * This function is safe to call from any context including IRQ handler.
+ */
+bool cancel_delayed_work(struct delayed_work *dwork)
+{
+   unsigned long flags;
+   int ret;
+
+   do {
+   ret = try_to_grab_pending(&dwork->work, true, &flags);
+   } while (unlikely(ret == -EAGAIN));
+
+   if (unlikely(ret < 0))
+   return false;
+
+   set_work_cpu_and_clear_pending(&dwork->work, work_cpu(&dwork->work));
+   local_irq_restore(flags);
+   return true;
+}
+EXPORT_SYMBOL(cancel_delayed_work);
+
+/**
  * cancel_delayed_work_sync - cancel a delayed work and wait for it to finish
  * @dwork: the delayed work cancel
  *
-- 
1.7.7.3


[PATCH 5/7] workqueue: use mod_delayed_work() instead of __cancel + queue

2012-08-08 Thread Tejun Heo

Now that mod_delayed_work() is safe to call from IRQ handlers,
__cancel_delayed_work() followed by queue_delayed_work() can be
replaced with mod_delayed_work().

Most conversions are straightforward except for the following.

* net/core/link_watch.c: linkwatch_schedule_work() was doing a quite
  elaborate dance around its delayed_work.  Collapse it such that
  linkwatch_work is queued for immediate execution if LW_URGENT and
  the existing timer is kept otherwise.

Signed-off-by: Tejun Heo 
Cc: "David S. Miller" 
Cc: Tomi Valkeinen 
---
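
Not part of the patch: a minimal userspace sketch of why the two-step
sequence collapses into one call.  The demo_* names are hypothetical
and the model ignores concurrency entirely; it only tracks a pending
flag and a deadline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work item: a flag and a deadline. */
struct demo_dwork {
	bool pending;
	unsigned long expires;
};

/* Like queue_delayed_work(): does nothing if the item is already pending. */
static bool demo_queue(struct demo_dwork *dw, unsigned long now,
		       unsigned long delay)
{
	assert(dw);
	if (dw->pending)
		return false;
	dw->pending = true;
	dw->expires = now + delay;
	return true;
}

/* Like __cancel_delayed_work(): drop a pending timer, report if any. */
static bool demo_cancel(struct demo_dwork *dw)
{
	bool was_pending;

	assert(dw);
	was_pending = dw->pending;
	dw->pending = false;
	return was_pending;
}

/* Like mod_delayed_work(): (re)arm always, report whether it was pending. */
static bool demo_mod(struct demo_dwork *dw, unsigned long now,
		     unsigned long delay)
{
	bool was_pending = demo_cancel(dw);

	demo_queue(dw, now, delay);
	return was_pending;
}
```

A plain demo_queue() on a pending item leaves the old deadline in
place, which is why callers needed the cancel first; demo_mod() gives
the same end state in one step.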
 block/blk-core.c|6 ++
 block/blk-throttle.c|7 +--
 drivers/block/floppy.c  |3 +--
 drivers/infiniband/core/mad.c   |   14 +-
 drivers/input/keyboard/qt2160.c |3 +--
 drivers/input/mouse/synaptics_i2c.c |7 +--
 net/core/link_watch.c   |   21 ++---
 7 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..4b8b606 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -319,10 +319,8 @@ EXPORT_SYMBOL(__blk_run_queue);
  */
 void blk_run_queue_async(struct request_queue *q)
 {
-   if (likely(!blk_queue_stopped(q))) {
-   __cancel_delayed_work(&q->delay_work);
-   queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
-   }
+   if (likely(!blk_queue_stopped(q)))
+   mod_delayed_work(kblockd_workqueue, &q->delay_work, 0);
 }
 EXPORT_SYMBOL(blk_run_queue_async);
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index e287c19..3d3dcae 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -930,12 +930,7 @@ throtl_schedule_delayed_work(struct throtl_data *td, 
unsigned long delay)
 
/* schedule work if limits changed even if no bio is queued */
if (total_nr_queued(td) || td->limits_changed) {
-   /*
-* We might have a work scheduled to be executed in future.
-* Cancel that and schedule a new one.
-*/
-   __cancel_delayed_work(dwork);
-   queue_delayed_work(kthrotld_workqueue, dwork, delay);
+   mod_delayed_work(kthrotld_workqueue, dwork, delay);
throtl_log(td, "schedule work. delay=%lu jiffies=%lu",
delay, jiffies);
}
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index a7d6347..55a5bc0 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -672,7 +672,6 @@ static void __reschedule_timeout(int drive, const char 
*message)
 
if (drive == current_reqD)
drive = current_drive;
-   __cancel_delayed_work(&fd_timeout);
 
if (drive < 0 || drive >= N_DRIVE) {
delay = 20UL * HZ;
@@ -680,7 +679,7 @@ static void __reschedule_timeout(int drive, const char 
*message)
} else
delay = UDP->timeout;
 
-   queue_delayed_work(floppy_wq, &fd_timeout, delay);
+   mod_delayed_work(floppy_wq, &fd_timeout, delay);
if (UDP->flags & FD_DEBUG)
DPRINT("reschedule timeout %s\n", message);
timeout_message = message;
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b0d0bc8..b593814 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2013,13 +2013,11 @@ static void adjust_timeout(struct ib_mad_agent_private 
*mad_agent_priv)
if (time_after(mad_agent_priv->timeout,
   mad_send_wr->timeout)) {
mad_agent_priv->timeout = mad_send_wr->timeout;
-   __cancel_delayed_work(&mad_agent_priv->timed_work);
delay = mad_send_wr->timeout - jiffies;
if ((long)delay <= 0)
delay = 1;
-   queue_delayed_work(mad_agent_priv->qp_info->
-  port_priv->wq,
-  &mad_agent_priv->timed_work, delay);
+   mod_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
+&mad_agent_priv->timed_work, delay);
}
}
 }
@@ -2052,11 +2050,9 @@ static void wait_for_response(struct 
ib_mad_send_wr_private *mad_send_wr)
list_add(&mad_send_wr->agent_list, list_item);
 
/* Reschedule a work item if we have a shorter timeout */
-   if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list) {
-   __cancel_delayed_work(&mad_agent_priv->timed_work);
-   queue_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
-  &mad_agent_priv->timed_work, delay);
-   }
+   if (mad_agent_priv->wait_list.next == &mad_send_wr->agent_list)
+   mod_delayed_work(mad_agent_priv->qp_info->port_priv->wq,
+

[PATCH 7/7] workqueue: deprecate __cancel_delayed_work()

2012-08-08 Thread Tejun Heo

Now that cancel_delayed_work() can be safely called from IRQ handlers,
there's no reason to use __cancel_delayed_work().  Use
cancel_delayed_work() instead of __cancel_delayed_work() and mark the
latter deprecated.

Signed-off-by: Tejun Heo 
Cc: Jens Axboe 
Cc: Jiri Kosina 
Cc: Roland Dreier 
Cc: Tomi Valkeinen 
---
 block/blk-core.c  |2 +-
 drivers/block/floppy.c|2 +-
 drivers/infiniband/core/mad.c |2 +-
 drivers/video/omap2/dss/dsi.c |2 +-
 include/linux/workqueue.h |2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4b8b606..dc04a90 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -262,7 +262,7 @@ EXPORT_SYMBOL(blk_start_queue);
  **/
 void blk_stop_queue(struct request_queue *q)
 {
-   __cancel_delayed_work(&q->delay_work);
+   cancel_delayed_work(&q->delay_work);
queue_flag_set(QUEUE_FLAG_STOPPED, q);
 }
 EXPORT_SYMBOL(blk_stop_queue);
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 55a5bc0..17c675c 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -890,7 +890,7 @@ static void unlock_fdc(void)
 
raw_cmd = NULL;
command_status = FD_COMMAND_NONE;
-   __cancel_delayed_work(&fd_timeout);
+   cancel_delayed_work(&fd_timeout);
do_floppy = NULL;
cont = NULL;
clear_bit(0, &fdc_busy);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index b593814..dc3fd1e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2004,7 +2004,7 @@ static void adjust_timeout(struct ib_mad_agent_private 
*mad_agent_priv)
unsigned long delay;
 
if (list_empty(&mad_agent_priv->wait_list)) {
-   __cancel_delayed_work(&mad_agent_priv->timed_work);
+   cancel_delayed_work(&mad_agent_priv->timed_work);
} else {
mad_send_wr = list_entry(mad_agent_priv->wait_list.next,
 struct ib_mad_send_wr_private,
diff --git a/drivers/video/omap2/dss/dsi.c b/drivers/video/omap2/dss/dsi.c
index fd40f26..05ee046 100644
--- a/drivers/video/omap2/dss/dsi.c
+++ b/drivers/video/omap2/dss/dsi.c
@@ -4306,7 +4306,7 @@ static void dsi_framedone_irq_callback(void *data, u32 
mask)
 * and is sending the data.
 */
 
-   __cancel_delayed_work(&dsi->framedone_timeout_work);
+   cancel_delayed_work(&dsi->framedone_timeout_work);
 
dsi_handle_framedone(dsidev, 0);
 }
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 6306157..6cd8f91 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -431,7 +431,7 @@ extern unsigned int work_busy(struct work_struct *work);
  * if it returns 0 the timer function may be running and the queueing is in
  * progress.
  */
-static inline bool __cancel_delayed_work(struct delayed_work *work)
+static inline bool __deprecated __cancel_delayed_work(struct delayed_work 
*work)
 {
bool ret;
 
-- 
1.7.7.3


[PATCH 3/7] workqueue: clean up delayed_work initializers and add missing one

2012-08-08 Thread Tejun Heo

Reimplement delayed_work initializers using new timer initializers
which take timer flags.  This reduces code duplications and will ease
further initializer changes.  This patch also adds a missing
initializer - INIT_DEFERRABLE_WORK_ONSTACK().

Signed-off-by: Tejun Heo 
---
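
Not part of the patch: a minimal userspace sketch of the consolidation
idea, where one flag-taking initializer backs every variant.  All
demo_* and DEMO_* names, the structures and the flag value are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TIMER_DEFERRABLE	0x1u	/* hypothetical flag value */

struct demo_work {
	void (*func)(struct demo_work *work);
};

struct demo_timer {
	void (*function)(uintptr_t data);
	uintptr_t data;
	unsigned int flags;
};

struct demo_delayed_work {
	struct demo_work work;
	struct demo_timer timer;
};

static int demo_calls;

/* Example work function: just counts how often it ran. */
static void demo_count(struct demo_work *work)
{
	(void)work;
	demo_calls++;
}

/* Timer callback: recover the delayed work and run its function. */
static void demo_timer_fn(uintptr_t data)
{
	struct demo_delayed_work *dwork = (struct demo_delayed_work *)data;

	assert(dwork->work.func);
	dwork->work.func(&dwork->work);
}

/* The single flag-taking initializer that the variants build on. */
#define DEMO_INIT_DELAYED_WORK_FLAGS(_work, _func, _tflags)		\
	do {								\
		(_work)->work.func = (_func);				\
		(_work)->timer.function = demo_timer_fn;		\
		(_work)->timer.data = (uintptr_t)(_work);		\
		(_work)->timer.flags = (_tflags);			\
	} while (0)

#define DEMO_INIT_DELAYED_WORK(_work, _func)				\
	DEMO_INIT_DELAYED_WORK_FLAGS(_work, _func, 0)

#define DEMO_INIT_DEFERRABLE_WORK(_work, _func)				\
	DEMO_INIT_DELAYED_WORK_FLAGS(_work, _func, DEMO_TIMER_DEFERRABLE)
```

Adding another variant then costs one line, which is the point of
routing everything through the flag-taking form.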
 include/linux/workqueue.h |   48 +---
 1 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 1c1a65b..83755f4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -133,26 +133,20 @@ struct execute_work {
__WORK_INIT_LOCKDEP_MAP(#n, &(n))   \
}
 
-#define __DELAYED_WORK_INITIALIZER(n, f) { \
+#define __DELAYED_WORK_INITIALIZER(n, f, tflags) { \
.work = __WORK_INITIALIZER((n).work, (f)),  \
-   .timer = TIMER_INITIALIZER(delayed_work_timer_fn,   \
-   0, (unsigned long)&(n)),\
-   }
-
-#define __DEFERRABLE_WORK_INITIALIZER(n, f) {  \
-   .work = __WORK_INITIALIZER((n).work, (f)),  \
-   .timer = TIMER_DEFERRED_INITIALIZER(delayed_work_timer_fn,  \
-   0, (unsigned long)&(n)),\
+   .timer = __TIMER_INITIALIZER(delayed_work_timer_fn, \
+   0, (unsigned long)&(n), (tflags)),  \
}
 
 #define DECLARE_WORK(n, f) \
struct work_struct n = __WORK_INITIALIZER(n, f)
 
 #define DECLARE_DELAYED_WORK(n, f) \
-   struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f)
+   struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, 0)
 
 #define DECLARE_DEFERRABLE_WORK(n, f)  \
-   struct delayed_work n = __DEFERRABLE_WORK_INITIALIZER(n, f)
+   struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f, 
TIMER_DEFERRABLE)
 
 /*
  * initialize a work item's function pointer
@@ -216,29 +210,33 @@ static inline unsigned int work_static(struct work_struct 
*work) { return 0; }
__INIT_WORK((_work), (_func), 1);   \
} while (0)
 
-#define INIT_DELAYED_WORK(_work, _func)
\
+#define __INIT_DELAYED_WORK(_work, _func, _tflags) \
do {\
INIT_WORK(&(_work)->work, (_func)); \
-   init_timer(&(_work)->timer);\
-   (_work)->timer.function = delayed_work_timer_fn;\
-   (_work)->timer.data = (unsigned long)(_work);   \
+   __setup_timer(&(_work)->timer, delayed_work_timer_fn,   \
+ (unsigned long)(_work), (_tflags));   \
} while (0)
 
-#define INIT_DELAYED_WORK_ONSTACK(_work, _func)
\
+#define __INIT_DELAYED_WORK_ONSTACK(_work, _func, _tflags) \
do {\
INIT_WORK_ONSTACK(&(_work)->work, (_func)); \
-   init_timer_on_stack(&(_work)->timer);   \
-   (_work)->timer.function = delayed_work_timer_fn;\
-   (_work)->timer.data = (unsigned long)(_work);   \
+   __setup_timer_on_stack(&(_work)->timer, \
+  delayed_work_timer_fn,   \
+  (unsigned long)(_work),  \
+  (_tflags));  \
} while (0)
 
+#define INIT_DELAYED_WORK(_work, _func)
\
+   __INIT_DELAYED_WORK(_work, _func, 0)
+
+#define INIT_DELAYED_WORK_ONSTACK(_work, _func)
\
+   __INIT_DELAYED_WORK_ONSTACK(_work, _func, 0)
+
 #define INIT_DEFERRABLE_WORK(_work, _func) \
-   do {\
-   INIT_WORK(&(_work)->work, (_func)); \
-   init_timer_deferrable(&(_work)->timer); \
-   (_work)->timer.function = delayed_work_timer_fn;\
-   (_work)->timer.data = (unsigned long)(_work);   \
-   } while (0)
+   __INIT_DELAYED_WORK(_work, _func, TIMER_DEFERRABLE)
+
+#define INIT_DEFERRABLE_WORK_ONSTACK(_work, _func) \
+   __INIT_DELAYED_WORK_ONSTACK(_work, _func, TIMER_DEFERRABLE)
 
 /**
  * work_pending - Find out whether a work item is currently pending
-- 
1.7.7.3


[PATCH 1/7] workqueue: cosmetic whitespace updates for macro definitions

2012-08-08 Thread Tejun Heo

Consistently use the last tab position for '\' line continuation in
complex macro definitions.  This is to help the following patches.

This patch is cosmetic.

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |  126 ++--
 1 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index b14d5d5..0b94714 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -126,43 +126,43 @@ struct execute_work {
 #define __WORK_INIT_LOCKDEP_MAP(n, k)
 #endif
 
-#define __WORK_INITIALIZER(n, f) { \
-   .data = WORK_DATA_STATIC_INIT(),\
-   .entry  = { &(n).entry, &(n).entry },   \
-   .func = (f),\
-   __WORK_INIT_LOCKDEP_MAP(#n, &(n))   \
+#define __WORK_INITIALIZER(n, f) { \
+   .data = WORK_DATA_STATIC_INIT(),\
+   .entry  = { &(n).entry, &(n).entry },   \
+   .func = (f),\
+   __WORK_INIT_LOCKDEP_MAP(#n, &(n))   \
}
 
-#define __DELAYED_WORK_INITIALIZER(n, f) { \
-   .work = __WORK_INITIALIZER((n).work, (f)),  \
-   .timer = TIMER_INITIALIZER(delayed_work_timer_fn,   \
-   0, (unsigned long)&(n)),\
+#define __DELAYED_WORK_INITIALIZER(n, f) { \
+   .work = __WORK_INITIALIZER((n).work, (f)),  \
+   .timer = TIMER_INITIALIZER(delayed_work_timer_fn,   \
+   0, (unsigned long)&(n)),\
}
 
-#define __DEFERRED_WORK_INITIALIZER(n, f) {\
-   .work = __WORK_INITIALIZER((n).work, (f)),  \
-   .timer = TIMER_DEFERRED_INITIALIZER(delayed_work_timer_fn, \
-   0, (unsigned long)&(n)),\
+#define __DEFERRED_WORK_INITIALIZER(n, f) {\
+   .work = __WORK_INITIALIZER((n).work, (f)),  \
+   .timer = TIMER_DEFERRED_INITIALIZER(delayed_work_timer_fn,  \
+   0, (unsigned long)&(n)),\
}
 
-#define DECLARE_WORK(n, f) \
+#define DECLARE_WORK(n, f) \
struct work_struct n = __WORK_INITIALIZER(n, f)
 
-#define DECLARE_DELAYED_WORK(n, f) \
+#define DECLARE_DELAYED_WORK(n, f) \
struct delayed_work n = __DELAYED_WORK_INITIALIZER(n, f)
 
-#define DECLARE_DEFERRED_WORK(n, f)\
+#define DECLARE_DEFERRED_WORK(n, f)\
struct delayed_work n = __DEFERRED_WORK_INITIALIZER(n, f)
 
 /*
  * initialize a work item's function pointer
  */
-#define PREPARE_WORK(_work, _func) \
-   do {\
-   (_work)->func = (_func);\
+#define PREPARE_WORK(_work, _func) \
+   do {\
+   (_work)->func = (_func);\
} while (0)
 
-#define PREPARE_DELAYED_WORK(_work, _func) \
+#define PREPARE_DELAYED_WORK(_work, _func) \
PREPARE_WORK(&(_work)->work, (_func))
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -192,7 +192,7 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
\
__init_work((_work), _onstack); \
(_work)->data = (atomic_long_t) WORK_DATA_INIT();   \
-   lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0);\
+   lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \
INIT_LIST_HEAD(&(_work)->entry);\
PREPARE_WORK((_work), (_func)); \
} while (0)
@@ -206,38 +206,38 @@ static inline unsigned int work_static(struct work_struct *work) { return 0; }
} while (0)
 #endif
 
-#define INIT_WORK(_work, _func)\
-   do {\
-   __INIT_WORK((_work), (_func), 0);   \
+#define INIT_WORK(_work, _func) \
+   do {\
+   __INIT_WORK((_work), (_func), 0);   \
} while (

[PATCHSET] workqueue: use irqsafe timer in delayed_work

2012-08-08 Thread Tejun Heo

Hello,

Because IRQs can happen between delayed_work->timer being dispatched
and delayed_work_timer_fn() actually queueing delayed_work->work,
try_to_grab_pending() couldn't be used from IRQ handlers.  If it hits
the window, it will return -EAGAIN perpetually.  This makes it
impossible to steal PENDING from IRQ handlers using
try_to_grab_pending() leading to the following issues.

* mod_delayed_work() can't be used from IRQ handlers.

* __cancel_delayed_work() can't use the usual try_to_grab_pending()
  which handles all three states but instead only deals with the first
  state using a separate implementation.  There's no way to make a
  delayed_work not pending from IRQ handlers.

* The context / behavior differences among cancel_delayed_work(),
  __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and
  confusing (the first two are mostly historical tho).

This patchset makes delayed_work use the irqsafe timer added by the
pending "timer: clean up initializers and implement irqsafe timers"
patchset[1].  This enables try_to_grab_pending() to be used from any
context which in turn makes mod_delayed_work() usable from IRQ
handlers.  cancel_delayed_work() is reimplemented using
try_to_grab_pending() so that it also can be used from IRQ handlers
and its behavior is consistent with other canceling operations.
__cancel_delayed_work() is no longer necessary and deprecated.
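
As a rough userspace illustration of the difference between queueing and
modifying a pending item, the sketch below models the PENDING bit with C11
atomics.  The names model_work, model_queue() and model_mod() are
hypothetical and are not the kernel API; the real try_to_grab_pending()
also has to deal with the timer and worklist states, which this ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct delayed_work. */
struct model_work {
	atomic_bool pending;	/* stands in for the PENDING bit */
	unsigned long delay;	/* delay of the currently armed request */
};

/* Modelled on queue_delayed_work(): a no-op if the item is already pending. */
static bool model_queue(struct model_work *w, unsigned long delay)
{
	if (atomic_exchange(&w->pending, true))
		return false;	/* someone else owns PENDING, delay unchanged */
	w->delay = delay;
	return true;
}

/*
 * Modelled on mod_delayed_work(): take over PENDING whether or not it was
 * set, then apply the new delay.  Returns true if the item was pending.
 */
static bool model_mod(struct model_work *w, unsigned long delay)
{
	bool was_pending = atomic_exchange(&w->pending, true);

	w->delay = delay;
	return was_pending;
}
```

In this model a second model_queue() leaves the old delay in place, while
model_mod() always ends with the new delay armed, which is the behavior the
__cancel_delayed_work() + queue_delayed_work() sequences were emulating.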

 0001-workqueue-cosmetic-whitespace-updates-for-macro-defi.patch
 0002-workqueue-make-deferrable-delayed_work-initializer-n.patch
 0003-workqueue-clean-up-delayed_work-initializers-and-add.patch
 0004-workqueue-use-irqsafe-timer-for-delayed_work.patch
 0005-workqueue-use-mod_delayed_work-instead-of-__cancel-q.patch
 0006-workqueue-reimplement-cancel_delayed_work-using-try_.patch
 0007-workqueue-deprecate-__cancel_delayed_work.patch

0001-0003 are prep patches.

0004 makes delayed_work use irqsafe timers.  This makes
try_to_grab_pending() and mod_delayed_work() safe to use from any
context.

0005 converts all __cancel_delayed_work() + queue_delayed_work()
sequences to mod_delayed_work().  The link_watch conversion needs
David's ack.

0006 reimplements cancel_delayed_work() using try_to_grab_pending().

0007 replaces __cancel_delayed_work() calls with cancel_delayed_work()
and deprecates the underscored one.

This patchset is on top of

  [2] wq/for-3.7 (8fcd63664f "workqueue: fix CPU binding of flush_delayed...")
+ [1] timer: clean up initializers and implement irqsafe timers

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git 
review-delayed_work-irqsafe

diffstat follows.

 arch/powerpc/platforms/cell/cpufreq_spudemand.c |    2
 block/blk-core.c                                |    8 -
 block/blk-throttle.c                            |    7 -
 drivers/block/floppy.c                          |    5
 drivers/cpufreq/cpufreq_conservative.c          |    2
 drivers/cpufreq/cpufreq_ondemand.c              |    2
 drivers/devfreq/devfreq.c                       |    2
 drivers/infiniband/core/mad.c                   |   16 --
 drivers/input/keyboard/qt2160.c                 |    3
 drivers/input/mouse/synaptics_i2c.c             |    7 -
 drivers/net/ethernet/mellanox/mlx4/sense.c      |    2
 drivers/power/ab8500_btemp.c                    |    2
 drivers/power/ab8500_charger.c                  |    8 -
 drivers/power/ab8500_fg.c                       |    8 -
 drivers/power/abx500_chargalg.c                 |    4
 drivers/power/max17040_battery.c                |    2
 drivers/video/omap2/displays/panel-taal.c       |    6
 drivers/video/omap2/dss/dsi.c                   |    6
 include/linux/workqueue.h                       |  155 ++--
 kernel/workqueue.c                              |   50 ++-
 mm/slab.c                                       |    2
 mm/vmstat.c                                     |    2
 net/core/link_watch.c                           |   21 ---
 net/core/neighbour.c                            |    2
 net/ipv4/inetpeer.c                             |    2
 net/sunrpc/cache.c                              |    2
 26 files changed, 159 insertions(+), 169 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1340224
[2] git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7

Re: [PATCH 18/19] sched, numa: Per task memory placement for big processes

2012-08-08 Thread Peter Zijlstra

On Tue, 2012-07-31 at 21:12 +0200, Peter Zijlstra wrote:
> +#ifdef CONFIG_NUMA
> +   /*
> +* XXX fold this into flags for 64bit or so...
> +*/
> +   int nid_last;
> +#endif 

Something like the below? I still ought to update all the various
comments about page flag layout etc..

Also, that #warning gives a very noisy build indeed, I guess we should
either make it silent or increase the page frame size for those
configs.. 32bit NUMA is quite rare for normal people (sorry Paul) :)

---
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -611,10 +611,19 @@ static inline pte_t maybe_mkwrite(pte_t 
 #define NODES_WIDTH0
 #endif
 
+#if NODES_WIDTH && (SECTIONS_WIDTH+ZONES_WIDTH+2*NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS)
+#define LAST_NID_WIDTH NODES_SHIFT
+#else
+#warning "faking page_xchg_last_nid"
+#define LAST_NID_NOT_IN_PAGE_FLAGS
+#define LAST_NID_WIDTH 0
+#endif
+
 /* Page flags: | [SECTION] | [NODE] | ZONE | ... | FLAGS | */
 #define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH)
 #define NODES_PGOFF(SECTIONS_PGOFF - NODES_WIDTH)
 #define ZONES_PGOFF(NODES_PGOFF - ZONES_WIDTH)
+#define LAST_NID_PGOFF (ZONES_PGOFF - LAST_NID_WIDTH)
 
 /*
  * We are going to use the flags for the page to node mapping if its in
@@ -632,6 +641,7 @@ static inline pte_t maybe_mkwrite(pte_t 
 #define SECTIONS_PGSHIFT   (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0))
 #define NODES_PGSHIFT  (NODES_PGOFF * (NODES_WIDTH != 0))
 #define ZONES_PGSHIFT  (ZONES_PGOFF * (ZONES_WIDTH != 0))
+#define LAST_NID_PGSHIFT   (LAST_NID_PGOFF * (LAST_NID_WIDTH != 0))
 
 /* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
 #ifdef NODE_NOT_IN_PAGE_FLAGS
@@ -653,6 +663,7 @@ static inline pte_t maybe_mkwrite(pte_t 
 #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1)
 #define NODES_MASK ((1UL << NODES_WIDTH) - 1)
 #define SECTIONS_MASK  ((1UL << SECTIONS_WIDTH) - 1)
+#define LAST_NID_MASK  ((1UL << LAST_NID_WIDTH) - 1)
 #define ZONEID_MASK((1UL << ZONEID_SHIFT) - 1)
 
 static inline enum zone_type page_zonenum(const struct page *page)
@@ -691,6 +702,39 @@ static inline int page_to_nid(const stru
 }
 #endif
 
+#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
+static inline int page_xchg_last_nid(struct page *page, int nid)
+{
+   return nid; /* fakin' it */
+}
+
+static inline int page_last_nid(struct page *page)
+{
+   return page_to_nid(page);
+}  
+#else
+static inline int page_xchg_last_nid(struct page *page, int nid)
+{
+   unsigned long old_flags, flags;
+   int last_nid;
+
+   old_flags = flags = page->flags;
+   last_nid = (flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK;
+
+   flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT);
+   flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT;
+
+   (void)cmpxchg(&page->flags, old_flags, flags);
+
+   return last_nid;
+}
+
+static inline int page_last_nid(struct page *page)
+{
+   return (page->flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK;
+}
+#endif /* LAST_NID_NOT_IN_PAGE_FLAGS */
+
 static inline struct zone *page_zone(const struct page *page)
 {
return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -176,12 +176,6 @@ struct page {
 */
void *shadow;
 #endif
-#ifdef CONFIG_NUMA
-   /*
-* XXX fold this into flags for 64bit or so...
-*/
-   int nid_last;
-#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1366,6 +1366,6 @@ static void __split_huge_page_refcount(s
page_tail->mapping = page->mapping;
 
page_tail->index = page->index + i;
-   page_tail->nid_last = page->nid_last;
+   page_xchg_last_nid(page_tail, page_last_nid(page));
 
BUG_ON(!PageAnon(page_tail));
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2265,10 +2265,9 @@ int mpol_misplaced(struct page *page, st
 * task_tick_numa().
 */
if (multi && (pol->flags & MPOL_F_HOME)) {
-   if (page->nid_last != polnid) {
-   page->nid_last = polnid;
+   int last_nid = page_xchg_last_nid(page, polnid);
+   if (last_nid != polnid)
goto out;
-   }
}
 
if (curnid != polnid)
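
The field packing in the patch above can be tried out in userspace with a
sketch like the following.  The 6-bit width, the shift of 20 and the
model_* names are made up for illustration; the real values come from the
page flags layout.  As in the patch, the update is a single
compare-exchange attempt, so a concurrent change to the flags word simply
drops the new value.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical layout for illustration: a 6-bit field starting at bit 20. */
#define LAST_NID_WIDTH		6
#define LAST_NID_PGSHIFT	20
#define LAST_NID_MASK		((1UL << LAST_NID_WIDTH) - 1)

struct model_page {
	_Atomic unsigned long flags;
};

/* Read the field out of the flags word. */
static int model_last_nid(struct model_page *page)
{
	return (atomic_load(&page->flags) >> LAST_NID_PGSHIFT) & LAST_NID_MASK;
}

/* Store nid in the field and return the value that was there before. */
static int model_xchg_last_nid(struct model_page *page, int nid)
{
	unsigned long old_flags = atomic_load(&page->flags);
	unsigned long flags = old_flags;
	int last_nid = (flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK;

	flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT);
	flags |= ((unsigned long)nid & LAST_NID_MASK) << LAST_NID_PGSHIFT;

	/* One attempt only; a lost race leaves the old value in place. */
	(void)atomic_compare_exchange_strong(&page->flags, &old_flags, flags);

	return last_nid;
}
```

Bits outside the field are untouched, which is what lets the node id share
the word with the existing flags.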



Re: [RFC PATCH] tun: don't zeroize sock->file on detach

2012-08-08 Thread David Miller

From: Yuchung Cheng 
Date: Wed, 8 Aug 2012 10:48:32 -0700

> On Wed, Aug 8, 2012 at 5:53 AM, Stanislav Kinsbursky
>  wrote:
>> Hi, Dave.
>> What about this patch?
>>
>>
>> On Wed, Jul 11, 2012 at 03:48:20PM +0400, Stanislav Kinsbursky wrote:
>>>
>>> This is a fix for bug, introduced in 3.4 kernel by commit
>>> 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d, which, among other things,
>>> replaced
>>> simple sock_put() by sk_release_kernel(). Below is sequence, which leads
>>> to
>>> oops for non-persistent devices:
>>>
>>> tun_chr_close()
>>> tun_detach()<== tun->socket.file = NULL
>>> tun_free_netdev()
>>> sk_release_sock()
>>> sock_release(sock->file == NULL)
>>> iput(SOCK_INODE(sock))  <== dereference on NULL pointer
>>>
>>> This patch just removes zeroing of socket's file from __tun_detach().
>>> sock_release() will do this.
>>>
>>> Signed-off-by: Stanislav Kinsbursky 
> Acked-by: Yuchung Cheng 
> 
> I have tested this patch and it works (so my kernel stops crashing
> using tun devices).

This patch needs to be formally resubmitted to netdev.

Re: [PATCH v3] irq_remap: disable IRQ remapping if any IOAPIC lacks an IOMMU

2012-08-08 Thread Seth Forshee

On Wed, Aug 08, 2012 at 10:57:06AM -0700, Yinghai Lu wrote:
> On Wed, Aug 8, 2012 at 6:27 AM, Seth Forshee  
> wrote:
> > The ACPI tables in the Macbook Air 5,1 define a single IOAPIC with id 2,
> > but the only remapping unit described in the DMAR table matches id 0.
> > Interrupt remapping fails as a result, and the kernel panics with the
> > message "timer doesn't work through Interrupt-remapped IO-APIC."
> >
> > To fix this, check each IOAPIC for a corresponding IOMMU. If an IOMMU is
> > not found, do not allow IRQ remapping to be enabled.
> >
> > v2: Move check to parse_ioapics_under_ir(), raise log level to KERN_ERR,
> > and add FW_BUG to the log message
> > v3: Skip check if IOMMU doesn't support interrupt remapping and remove
> > existing check that the IOMMU count equals the IOAPIC count
> >
> 
> Acked-by: Yinghai Lu 

Thanks!

I'm not sure whose tree this goes through, but it occurred to me that it
might be good to get this fixed in the stable kernels as well. Whoever
applies the patch might consider adding a Cc for stable, otherwise I can
submit it after it hits Linus's tree.

Thanks,
Seth


[PATCH 02/10] ARM: ks8695: __arch_virt_to_dma type handling

2012-08-08 Thread Arnd Bergmann

__arch_virt_to_dma expects a virtual address pointer, but
the ks8695 implementation of this macro treats it as an
integer. Adding a type cast avoids hundreds of identical
warning messages.

Without this patch, building acs5k_defconfig results in:

arch/arm/include/asm/dma-mapping.h: In function 'virt_to_dma':
arch/arm/include/asm/dma-mapping.h:60:2: warning: passing argument 1 of 
'__virt_to_phys' makes integer from pointer without a cast [enabled by default]
arch/arm/include/asm/memory.h:172:60: note: expected 'long unsigned int' but 
argument is of type 'void *'
In file included from include/linux/dma-mapping.h:73:0,
 from include/linux/skbuff.h:33,
 from security/commoncap.c:21:

Signed-off-by: Arnd Bergmann 
---
 arch/arm/mach-ks8695/include/mach/memory.h |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-ks8695/include/mach/memory.h b/arch/arm/mach-ks8695/include/mach/memory.h
index f7e1b9b..95e731a 100644
--- a/arch/arm/mach-ks8695/include/mach/memory.h
+++ b/arch/arm/mach-ks8695/include/mach/memory.h
@@ -34,7 +34,8 @@ extern struct bus_type platform_bus_type;
 #define __arch_dma_to_virt(dev, x) ({ (void *) (is_lbus_device(dev) ? \
                    __phys_to_virt(x) : __bus_to_virt(x)); })
 #define __arch_virt_to_dma(dev, x) ({ is_lbus_device(dev) ? \
-                   (dma_addr_t)__virt_to_phys(x) : (dma_addr_t)__virt_to_bus(x); })
+                   (dma_addr_t)__virt_to_phys((unsigned long)x) \
+                   : (dma_addr_t)__virt_to_bus(x); })
 #define __arch_pfn_to_dma(dev, pfn)\
({ dma_addr_t __dma = __pfn_to_phys(pfn); \
   if (!is_lbus_device(dev)) \
-- 
1.7.10
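
The same kind of mismatch can be reproduced outside the kernel.  In the
sketch below, model_virt_to_phys() and the 0x1000 offset are hypothetical
stand-ins: the helper takes an integer, the macro is handed a pointer, and
the explicit cast in the macro is what keeps the compiler quiet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel helper; the offset is made up. */
#define MODEL_PHYS_OFFSET 0x1000UL

static unsigned long model_virt_to_phys(unsigned long vaddr)
{
	return vaddr - MODEL_PHYS_OFFSET;
}

/*
 * Callers pass a pointer, so the macro converts it to an integer
 * explicitly before calling a helper that expects unsigned long.
 */
#define model_virt_to_dma(ptr) \
	((uintptr_t)model_virt_to_phys((unsigned long)(ptr)))
```

Dropping the (unsigned long) cast from the macro should give the same
"makes integer from pointer without a cast" diagnostic quoted above.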


[PATCH 01/10] ARM: footbridge: nw_gpio_lock is raw_spin_lock

2012-08-08 Thread Arnd Bergmann

bd31b85960a "locking, ARM: Annotate low level hw locks as raw"
made nw_gpio_lock a raw spinlock, but did not change all the
users in device drivers. This fixes the remaining ones.

sound/oss/waveartist.c: In function 'vnc_mute_spkr':
sound/oss/waveartist.c:1485:2: warning: passing argument 1 of 'spinlock_check' 
from incompatible pointer type [enabled by default]
include/linux/spinlock.h:272:102: note: expected 'struct spinlock_t *' but 
argument is of type 'struct raw_spinlock_t *'
drivers/char/ds1620.c: In function 'netwinder_lock':
drivers/char/ds1620.c:77:2: warning: passing argument 1 of 'spinlock_check' 
from incompatible pointer type [enabled by default]
include/linux/spinlock.h:272:102: note: expected 'struct spinlock_t *' but 
argument is of type 'struct raw_spinlock_t *'
drivers/char/nwflash.c: In function 'kick_open':
drivers/char/nwflash.c:620:2: warning: passing argument 1 of 'spinlock_check' 
from incompatible pointer type [enabled by default]
include/linux/spinlock.h:272:102: note: expected 'struct spinlock_t *' but 
argument is of type 'struct raw_spinlock_t *'

Signed-off-by: Arnd Bergmann 
Cc: Thomas Gleixner 
Cc: Russell King 
---
 drivers/char/ds1620.c  |8 
 drivers/char/nwflash.c |4 ++--
 sound/oss/waveartist.c |4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/char/ds1620.c b/drivers/char/ds1620.c
index aab9605..24ffd8c 100644
--- a/drivers/char/ds1620.c
+++ b/drivers/char/ds1620.c
@@ -74,21 +74,21 @@ static inline void netwinder_ds1620_reset(void)
 
 static inline void netwinder_lock(unsigned long *flags)
 {
-   spin_lock_irqsave(&nw_gpio_lock, *flags);
+   raw_spin_lock_irqsave(&nw_gpio_lock, *flags);
 }
 
 static inline void netwinder_unlock(unsigned long *flags)
 {
-   spin_unlock_irqrestore(&nw_gpio_lock, *flags);
+   raw_spin_unlock_irqrestore(&nw_gpio_lock, *flags);
 }
 
 static inline void netwinder_set_fan(int i)
 {
unsigned long flags;
 
-   spin_lock_irqsave(&nw_gpio_lock, flags);
+   raw_spin_lock_irqsave(&nw_gpio_lock, flags);
nw_gpio_modify_op(GPIO_FAN, i ? GPIO_FAN : 0);
-   spin_unlock_irqrestore(&nw_gpio_lock, flags);
+   raw_spin_unlock_irqrestore(&nw_gpio_lock, flags);
 }
 
 static inline int netwinder_get_fan(void)
diff --git a/drivers/char/nwflash.c b/drivers/char/nwflash.c
index d45c334..04e2a94 100644
--- a/drivers/char/nwflash.c
+++ b/drivers/char/nwflash.c
@@ -617,9 +617,9 @@ static void kick_open(void)
 * we want to write a bit pattern XXX1 to Xilinx to enable
 * the write gate, which will be open for about the next 2ms.
 */
-   spin_lock_irqsave(&nw_gpio_lock, flags);
+   raw_spin_lock_irqsave(&nw_gpio_lock, flags);
nw_cpld_modify(CPLD_FLASH_WR_ENABLE, CPLD_FLASH_WR_ENABLE);
-   spin_unlock_irqrestore(&nw_gpio_lock, flags);
+   raw_spin_unlock_irqrestore(&nw_gpio_lock, flags);
 
/*
 * let the ISA bus to catch on...
diff --git a/sound/oss/waveartist.c b/sound/oss/waveartist.c
index 24c430f..672af8b 100644
--- a/sound/oss/waveartist.c
+++ b/sound/oss/waveartist.c
@@ -1482,9 +1482,9 @@ vnc_mute_spkr(wavnc_info *devc)
 {
unsigned long flags;
 
-   spin_lock_irqsave(&nw_gpio_lock, flags);
+   raw_spin_lock_irqsave(&nw_gpio_lock, flags);
nw_cpld_modify(CPLD_UNMUTE, devc->spkr_mute_state ? 0 : CPLD_UNMUTE);
-   spin_unlock_irqrestore(&nw_gpio_lock, flags);
+   raw_spin_unlock_irqrestore(&nw_gpio_lock, flags);
 }
 
 static void
-- 
1.7.10


[PATCH 10/10] leds: renesas: fix error handling

2012-08-08 Thread Arnd Bergmann

bfe4c041 "leds: convert Renesas TPU LED driver to devm_kzalloc() and
cleanup error exit path" introduced a possible case in which r_tpu_probe
calls iounmap on a wild pointer. This changes the one case that was
missed in the same way as the other error paths.

Without this patch, building kota2_defconfig results in:

drivers/leds/leds-renesas-tpu.c: In function 'r_tpu_probe':
drivers/leds/leds-renesas-tpu.c:246:6: warning: 'ret' may be used uninitialized 
in this function [-Wuninitialized]
drivers/leds/leds-renesas-tpu.c:308:17: warning: 'p' may be used uninitialized 
in this function [-Wuninitialized]

Signed-off-by: Arnd Bergmann 
Cc: Bryan Wu 
Cc: Magnus Damm 

--- a/drivers/leds/leds-renesas-tpu.c
+++ b/drivers/leds/leds-renesas-tpu.c
@@ -247,7 +247,7 @@ static int __devinit r_tpu_probe(struct platform_device *pdev)

if (!cfg) {
dev_err(&pdev->dev, "missing platform data\n");
-   goto err0;
+   return -ENODEV;
}

p = devm_kzalloc(&pdev->dev, sizeof(*p), GFP_KERNEL);
---
 drivers/leds/leds-renesas-tpu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/leds-renesas-tpu.c b/drivers/leds/leds-renesas-tpu.c
index 9ee12c2..771ea06 100644
--- a/drivers/leds/leds-renesas-tpu.c
+++ b/drivers/leds/leds-renesas-tpu.c
@@ -247,7 +247,7 @@ static int __devinit r_tpu_probe(struct platform_device *pdev)
 
if (!cfg) {
dev_err(&pdev->dev, "missing platform data\n");
-   goto err0;
+   return -ENODEV;
}
 
p = devm_kzalloc(&pdev->dev, sizeof(*p), GFP_KERNEL);
-- 
1.7.10
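
For readers unfamiliar with the pattern, here is a minimal userspace sketch
of the rule the fix follows: return directly while nothing has been
acquired, and only jump to a cleanup label once the resource it releases is
valid.  All model_* names are hypothetical and malloc()/free() stand in for
the mapping calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_dev {
	int *regs;
};

static int model_unmap_calls;	/* counts cleanup calls for the demonstration */

static void model_unmap(int *regs)
{
	model_unmap_calls++;
	free(regs);
}

/*
 * cfg == NULL models missing platform data; fail_late forces a failure
 * after the mapping has been set up.
 */
static int model_probe(const int *cfg, int fail_late, struct model_dev *out)
{
	int *regs;
	int ret;

	if (!cfg)
		return -ENODEV;	/* nothing acquired yet, so no cleanup label */

	regs = malloc(sizeof(*regs));
	if (!regs)
		return -ENOMEM;

	if (fail_late) {
		ret = -EIO;
		goto err_unmap;	/* regs is valid here, cleanup is safe */
	}

	out->regs = regs;
	return 0;

err_unmap:
	model_unmap(regs);
	return ret;
}
```

Jumping to err_unmap from the first check would pass an uninitialized
pointer to the cleanup function, which is the situation the two
-Wuninitialized warnings point at.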


[PATCH 05/10] asm-generic: xor: mark static functions as __maybe_unused

2012-08-08 Thread Arnd Bergmann

The asm-generic/xor.h header file is nasty and defines static functions
that are not inline. The header file is included by the ARM version of
asm/xor.h, which uses some but not all of the symbols defined there.

Marking the extraneous functions as __maybe_unused lets gcc drop them
without complaining.

Without this patch, building iop13xx_defconfig results in:

include/asm-generic/xor.h:696:34: warning: 'xor_block_8regs_p' defined but not 
used [-Wunused-variable]
include/asm-generic/xor.h:704:34: warning: 'xor_block_32regs_p' defined but not 
used [-Wunused-variable]

Signed-off-by: Arnd Bergmann 
Cc: Herbert Xu 
Cc: Dan Williams 
Cc: Neil Brown 
---
 include/asm-generic/xor.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
index 6028fb8..b4d8432 100644
--- a/include/asm-generic/xor.h
+++ b/include/asm-generic/xor.h
@@ -693,7 +693,7 @@ static struct xor_block_template xor_block_32regs = {
.do_5 = xor_32regs_5,
 };
 
-static struct xor_block_template xor_block_8regs_p = {
+static struct xor_block_template xor_block_8regs_p __maybe_unused = {
.name = "8regs_prefetch",
.do_2 = xor_8regs_p_2,
.do_3 = xor_8regs_p_3,
@@ -701,7 +701,7 @@ static struct xor_block_template xor_block_8regs_p = {
.do_5 = xor_8regs_p_5,
 };
 
-static struct xor_block_template xor_block_32regs_p = {
+static struct xor_block_template xor_block_32regs_p __maybe_unused = {
.name = "32regs_prefetch",
.do_2 = xor_32regs_p_2,
.do_3 = xor_32regs_p_3,
-- 
1.7.10
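
Outside the kernel the same effect can be seen with the plain GCC
attribute.  In this sketch model_maybe_unused is a hypothetical stand-in
for __maybe_unused, and the templates are cut down to a single callback.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's __maybe_unused annotation. */
#define model_maybe_unused __attribute__((unused))

struct model_template {
	const char *name;
	int (*do_2)(int a, int b);
};

static int model_xor_2(int a, int b)
{
	return a ^ b;
}

/* Referenced below, so no annotation is needed. */
static struct model_template model_block_plain = {
	.name = "plain",
	.do_2 = model_xor_2,
};

/*
 * Never referenced in this file.  Without the attribute, gcc -Wall
 * reports "defined but not used" for it.
 */
static struct model_template model_block_prefetch model_maybe_unused = {
	.name = "prefetch",
	.do_2 = model_xor_2,
};

int model_run(int a, int b)
{
	return model_block_plain.do_2(a, b);
}
```

The annotation only silences the warning; the compiler is still free to
discard the unreferenced object.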


[PATCH 04/10] ARM: pass -marm to gcc by default

2012-08-08 Thread Arnd Bergmann

The Linaro cross toolchain and probably others nowadays default to
building in THUMB2 mode. When building a kernel for a CPU that does
not support THUMB2, the compiler complains about incorrect flags.
We can work around this by setting -marm for all non-T2 builds.

Without this patch, building assabet_defconfig results in:

usr/initramfs_data.S:1:0: warning: target CPU does not support THUMB 
instructions [enabled by default]
arch/arm/nwfpe/entry.S:1:0: warning: target CPU does not support THUMB 
instructions [enabled by default]
firmware/cis/PCMLM28.cis.gen.S:1:0: warning: target CPU does not support THUMB 
instructions [enabled by default]
(and many more)

Signed-off-by: Arnd Bergmann 
Cc: Russell King 
Cc: Dave Martin 
---
 arch/arm/Makefile |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..b4c2296 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -111,6 +111,9 @@ AFLAGS_THUMB2   :=$(CFLAGS_THUMB2) -Wa$(comma)-mthumb
 ifeq ($(CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11),y)
 CFLAGS_MODULE  +=-fno-optimize-sibling-calls
 endif
+else
+CFLAGS_THUMB2  :=-marm
+AFLAGS_THUMB2  :=-marm
 endif
 
 # Need -Uarm for gcc < 3.x
-- 
1.7.10


[PATCH 00/10] ARM: interesting warnings from defconfig builds

2012-08-08 Thread Arnd Bergmann

Most of these have been around for quite a while, but I think we
should fix them nonetheless. In some cases, I'm not very sure about
my solution, so I'd appreciate any ACK or NAK I can get.

Arnd

Arnd Bergmann (10):
  ARM: footbridge: nw_gpio_lock is raw_spin_lock
  ARM: ks8695: __arch_virt_to_dma type handling
  ARM: mv78xx0: fix win_cfg_base prototype
  ARM: pass -marm to gcc by default
  asm-generic: xor: mark static functions as __maybe_unused
  ARM: davinci: don't use broken ntosd2_init_i2c
  ARM: rpc: check device_register return code in ecard_probe
  ARM: s3c24xx: enable CONFIG_BUG for tct_hammer
  ARM: rpc: Fix building RiscPC
  leds: renesas: fix error handling

 arch/arm/Kconfig   |2 +-
 arch/arm/Makefile  |4 +++-
 arch/arm/configs/tct_hammer_defconfig  |2 +-
 arch/arm/mach-davinci/board-neuros-osd2.c  |7 +++
 arch/arm/mach-ks8695/include/mach/memory.h |3 ++-
 arch/arm/mach-mv78xx0/addr-map.c   |2 +-
 arch/arm/mach-rpc/ecard.c  |4 +++-
 arch/arm/mm/Kconfig|   12 ++--
 drivers/char/ds1620.c  |8 
 drivers/char/nwflash.c |4 ++--
 drivers/leds/leds-renesas-tpu.c|2 +-
 include/asm-generic/xor.h  |4 ++--
 sound/oss/waveartist.c |4 ++--
 13 files changed, 31 insertions(+), 27 deletions(-)

-- 
1.7.10

Cc: Thomas Gleixner 
Cc: Russell King 
Cc: Andrew Lunn 
Cc: Michael Walle 
Cc: Nicolas Pitre 
Cc: Russell King 
Cc: Dave Martin 
Cc: Herbert Xu 
Cc: Dan Williams 
Cc: Neil Brown 
Cc: Kevin Hilman 
Cc: Sekhar Nori 
Cc: Andrey Porodko 
Cc: Russell King 
Cc: Kukjin Kim 
Cc: Ben Dooks 
Cc: Russell King 
Cc: Bryan Wu 
Cc: Magnus Damm 

[PATCH 03/10] ARM: mv78xx0: fix win_cfg_base prototype

2012-08-08 Thread Arnd Bergmann

Patch b6d1c33a31 "ARM: Orion: Consolidate the address map setup" tried
to merge the address map for the four orion platforms, but apparently
got it wrong for mv78xx0. Admittedly I don't understand what this
code actually does, but it's clear that the current version is
wrong.

Without this patch, building mv78xx0_defconfig results in:

arch/arm/mach-mv78xx0/addr-map.c:59:2: warning: initialization from 
incompatible pointer type [enabled by default]
arch/arm/mach-mv78xx0/addr-map.c:59:2: warning: (near initialization for 
'addr_map_cfg.win_cfg_base') [enabled by default]

Signed-off-by: Arnd Bergmann 
Cc: Andrew Lunn 
Cc: Michael Walle 
Cc: Nicolas Pitre 
---
 arch/arm/mach-mv78xx0/addr-map.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-mv78xx0/addr-map.c b/arch/arm/mach-mv78xx0/addr-map.c
index 62b53d7..a9bc841 100644
--- a/arch/arm/mach-mv78xx0/addr-map.c
+++ b/arch/arm/mach-mv78xx0/addr-map.c
@@ -37,7 +37,7 @@
 #define WIN0_OFF(n)(BRIDGE_VIRT_BASE + 0x + ((n) << 4))
 #define WIN8_OFF(n)(BRIDGE_VIRT_BASE + 0x0900 + (((n) - 8) << 4))
 
-static void __init __iomem *win_cfg_base(int win)
+static void __init __iomem *win_cfg_base(const struct orion_addr_map_cfg *cfg, int win)
 {
/*
 * Find the control register base address for this window.
-- 
1.7.10
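
The warning comes from assigning a function to a callback member with a
different parameter list.  A minimal sketch, with hypothetical model_*
names and a made-up base address, of a callback whose signature matches the
member it is stored in:

```c
#include <assert.h>

struct model_addr_map_cfg {
	unsigned long bridge_base;
	/* The callback receives the configuration as its first argument. */
	unsigned long (*win_cfg_base)(const struct model_addr_map_cfg *cfg,
				      int win);
};

/* Same parameter list as the member above, so the initializer is clean. */
static unsigned long model_win_cfg_base(const struct model_addr_map_cfg *cfg,
					int win)
{
	return cfg->bridge_base + ((unsigned long)win << 4);
}

static const struct model_addr_map_cfg model_cfg = {
	.bridge_base	= 0x1000UL,
	.win_cfg_base	= model_win_cfg_base,
};
```

Declaring model_win_cfg_base() with only the int parameter would bring back
the "initialization from incompatible pointer type" warning, and a call
through the pointer would then pass arguments the function does not expect.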


[PATCH 09/10] ARM: rpc: Fix building RiscPC

2012-08-08 Thread Arnd Bergmann

ARMv3 support was removed in 357c9c1f07 "ARM: Remove support for ARMv3
ARM610 and ARM710 CPUs", which explicitly left parts of the CPU32v3
support in place for building RiscPC. However, this does not actually
build in my test setup.

This is probably not the right solution, but maybe someone has a better
idea for how to deal with this.

Without this patch, building rpc_defconfig results in:

arch/arm/lib/io-readsw-armv4.S: Assembler messages:
arch/arm/lib/io-readsw-armv4.S:23: Error: selected processor does not support 
ARM mode `ldrh ip,[r0]'
arch/arm/lib/io-readsw-armv4.S:25: Error: selected processor does not support 
ARM mode `strh ip,[r1],#2'
arch/arm/lib/io-readsw-armv4.S:38: Error: selected processor does not support 
ARM mode `ldrh r3,[r0]'
make[2]: *** [arch/arm/lib/io-readsw-armv4.o] Error 1
make[1]: *** [arch/arm/lib] Error 2

Signed-off-by: Arnd Bergmann 
Cc: Russell King 
---
 arch/arm/Kconfig|2 +-
 arch/arm/Makefile   |1 -
 arch/arm/mm/Kconfig |   12 ++--
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e91c7cd..1e435185 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2259,7 +2259,7 @@ config FPE_NWFPE_XP
 
 config FPE_FASTFPE
bool "FastFPE math emulation (EXPERIMENTAL)"
-   depends on (!AEABI || OABI_COMPAT) && !CPU_32v3 && EXPERIMENTAL
+   depends on (!AEABI || OABI_COMPAT) && EXPERIMENTAL
---help---
  Say Y here to include the FAST floating point emulator in the kernel.
  This is an experimental much faster emulator which now also has full
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index b4c2296..2c53344 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -70,7 +70,6 @@ endif
 arch-$(CONFIG_CPU_32v5):=-D__LINUX_ARM_ARCH__=5 $(call cc-option,-march=armv5te,-march=armv4t)
 arch-$(CONFIG_CPU_32v4T)   :=-D__LINUX_ARM_ARCH__=4 -march=armv4t
 arch-$(CONFIG_CPU_32v4):=-D__LINUX_ARM_ARCH__=4 -march=armv4
-arch-$(CONFIG_CPU_32v3):=-D__LINUX_ARM_ARCH__=3 -march=armv3
 
 # This selects how we optimise for the processor.
 tune-$(CONFIG_CPU_ARM7TDMI):=-mtune=arm7tdmi
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 101b968..28773e6 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -265,8 +265,7 @@ config CPU_ARM1026
 # SA110
 config CPU_SA110
bool "Support StrongARM(R) SA-110 processor" if ARCH_RPC
-   select CPU_32v3 if ARCH_RPC
-   select CPU_32v4 if !ARCH_RPC
+   select CPU_32v4
select CPU_ABRT_EV4
select CPU_PABRT_LEGACY
select CPU_CACHE_V4WB
@@ -395,12 +394,6 @@ config CPU_V7
 
 # Figure out what processor architecture version we should be using.
 # This defines the compiler instruction set which depends on the machine type.
-config CPU_32v3
-   bool
-   select TLS_REG_EMUL if SMP || !MMU
-   select NEEDS_SYSCALL_FOR_CMPXCHG if SMP
-   select CPU_USE_DOMAINS if MMU
-
 config CPU_32v4
bool
select TLS_REG_EMUL if SMP || !MMU
@@ -587,8 +580,7 @@ comment "Processor Features"
 
 config ARM_LPAE
bool "Support for the Large Physical Address Extension"
-   depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
-   !CPU_32v4 && !CPU_32v3
+   depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && !CPU_32v4
help
  Say Y if you have an ARMv7 processor supporting the LPAE page
  table format and you would like to access memory beyond the
-- 
1.7.10


[PATCH 07/10] ARM: rpc: check device_register return code in ecard_probe

2012-08-08 Thread Arnd Bergmann

device_register is marked __must_check, so we better propagate the error
value by returning it from ecard_probe.

Without this patch, building rpc_defconfig results in:

arch/arm/mach-rpc/ecard.c: In function 'ecard_probe':
arch/arm/mach-rpc/ecard.c:963:17: warning: ignoring return value of 
'device_register', declared with attribute warn_unused_result [-Wunused-result]

Signed-off-by: Arnd Bergmann 
Cc: Russell King 
---
 arch/arm/mach-rpc/ecard.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-rpc/ecard.c b/arch/arm/mach-rpc/ecard.c
index b91bc87..fcb1d59 100644
--- a/arch/arm/mach-rpc/ecard.c
+++ b/arch/arm/mach-rpc/ecard.c
@@ -960,7 +960,9 @@ static int __init ecard_probe(int slot, unsigned irq, card_type_t type)
*ecp = ec;
slot_to_expcard[slot] = ec;
 
-   device_register(&ec->dev);
+   rc = device_register(&ec->dev);
+   if (rc)
+   goto nodev;
 
return 0;
 
-- 
1.7.10


[PATCH 08/10] ARM: s3c24xx: enable CONFIG_BUG for tct_hammer

2012-08-08 Thread Arnd Bergmann

Disabling CONFIG_BUG creates an insane amount of build warnings, which
makes it useless to build defconfigs in order to check whether new
warnings show up.

Without this patch, building tct_hammer_defconfig results in:

net/packet/af_packet.c: In function 'tpacket_rcv':
net/packet/af_packet.c:1889:30: warning: 'hdrlen' may be used uninitialized in 
this function [-Wuninitialized]
net/core/ethtool.c: In function 'ethtool_get_feature_mask':
net/core/ethtool.c:213:1: warning: control reaches end of non-void function 
[-Wreturn-type]
block/cfq-iosched.c: In function 'cfq_async_queue_prio':
block/cfq-iosched.c:2914:1: warning: control reaches end of non-void function 
[-Wreturn-type]
mm/bootmem.c: In function 'mark_bootmem':
mm/bootmem.c:352:1: warning: control reaches end of non-void function 
[-Wreturn-type]
net/core/dev.c: In function 'skb_warn_bad_offload':
net/core/dev.c:1904:33: warning: unused variable 'null_features' 
[-Wunused-variable]
drivers/mtd/chips/cfi_probe.c: In function 'cfi_chip_setup':
include/linux/mtd/cfi.h:489:3: warning: 'r.x[0]' may be used uninitialized in 
this function [-Wuninitialized]
include/linux/mtd/map.h:394:11: note: 'r.x[0]' was declared here
include/linux/mtd/cfi.h:489:3: warning: 'r.x[0]' may be used uninitialized in 
this function [-Wuninitialized]
(and many more)

The size of vmlinux increases by 1.78% because of this:

size obj-arm/vmlinux.nobug
   text    data     bss     dec     hex filename
2108474  116916   55352 2280742  22cd26 obj-arm/vmlinux
size obj-arm/vmlinux.bug
   text    data     bss     dec     hex filename
2150804  116916   53696 2321416  236c08 obj-arm/vmlinux

Signed-off-by: Arnd Bergmann 
Cc: Kukjin Kim 
Cc: Ben Dooks 
---
 arch/arm/configs/tct_hammer_defconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/configs/tct_hammer_defconfig 
b/arch/arm/configs/tct_hammer_defconfig
index 1d24f84..71277a1 100644
--- a/arch/arm/configs/tct_hammer_defconfig
+++ b/arch/arm/configs/tct_hammer_defconfig
@@ -7,7 +7,7 @@ CONFIG_SYSFS_DEPRECATED_V2=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
 # CONFIG_KALLSYMS is not set
-# CONFIG_BUG is not set
+# CONFIG_BUGVERBOSE is not set
 # CONFIG_ELF_CORE is not set
 # CONFIG_SHMEM is not set
 CONFIG_SLOB=y
-- 
1.7.10


[PATCH 06/10] ARM: davinci: don't use broken ntosd2_init_i2c

2012-08-08 Thread Arnd Bergmann

ntosd2_init_i2c walks the ntosd2_i2c_info array, which it expects to
be populated with at least one member. gcc correctly warns about
the out-of-bounds access here.

Without this patch, building davinci_all_defconfig results in:

arch/arm/mach-davinci/board-neuros-osd2.c: In function 'davinci_ntosd2_init':
arch/arm/mach-davinci/board-neuros-osd2.c:187:20: warning: array subscript is 
above array bounds [-Warray-bounds]

Signed-off-by: Arnd Bergmann 
Cc: Kevin Hilman 
Cc: Sekhar Nori 
Cc: Andrey Porodko 
---
 arch/arm/mach-davinci/board-neuros-osd2.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/mach-davinci/board-neuros-osd2.c 
b/arch/arm/mach-davinci/board-neuros-osd2.c
index 5de69f2..9d40df9 100644
--- a/arch/arm/mach-davinci/board-neuros-osd2.c
+++ b/arch/arm/mach-davinci/board-neuros-osd2.c
@@ -162,6 +162,7 @@ static void __init davinci_ntosd2_map_io(void)
dm644x_init();
 }
 
+#if 0
 /*
  I2C initialization
 */
@@ -193,6 +194,12 @@ static int ntosd2_init_i2c(void)
}
return status;
 }
+#else
+static  int ntosd2_init_i2c(void)
+{
+   return 0;
+}
+#endif
 
 static struct davinci_mmc_config davinci_ntosd2_mmc_config = {
.wires  = 4,
-- 
1.7.10


[PATCH V2 3/3] regulator: add MAX8907 driver

2012-08-08 Thread Stephen Warren

From: Gyungoh Yoo 

The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.

The original driver was written by:
* Gyungoh Yoo 

Various fixes and enhancements by:
* Jin Park 
* Tom Cherry 
* Prashant Gaikwad 
* Dan Willemsen 
* Laxman Dewangan 

During upstreaming, I (swarren):
* Converted to regmap.
* Allowed probing from device tree.
* Reworked the regulator driver to be represented as a single device that
  provides multiple regulators, rather than as a device per regulator.
* Replaced many regulator ops with standard functions.
* Added ability to specify supplies for each regulator.
* Removed the WLED regulator. If/when we expose this in the driver, it
  should be a backlight object not a regulator object.
* Renamed from max8907c->max8907, since the driver covers at least the
  C and B revisions.
* General cleanup.

Signed-off-by: Gyungoh Yoo 
Signed-off-by: Stephen Warren 
---
v2:
* Removed WLED regulator.
* Replaced list_voltage, set_voltage(_sel), get_voltage(_sel), enable,
  disable ops with standard regulator core functions.
* Got rid of struct max8907_regulator_info; everything we need now is
  part of the standard struct regulator_desc.
---
 drivers/regulator/Kconfig |8 +
 drivers/regulator/Makefile|1 +
 drivers/regulator/max8907-regulator.c |  396 +
 3 files changed, 405 insertions(+), 0 deletions(-)
 create mode 100644 drivers/regulator/max8907-regulator.c

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 9bc7749..129c827 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -171,6 +171,14 @@ config REGULATOR_MAX8660
  This driver controls a Maxim 8660/8661 voltage output
  regulator via I2C bus.
 
+config REGULATOR_MAX8907
+   tristate "Maxim 8907 voltage regulator"
+   depends on MFD_MAX8907
+   help
+ This driver controls a Maxim 8907 voltage output regulator
+ via I2C bus. The provided regulator is suitable for Tegra
+ chip to control Step-Down DC-DC and LDOs.
+
 config REGULATOR_MAX8925
tristate "Maxim MAX8925 Power Management IC"
depends on MFD_MAX8925
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 3342615..3a0dbc5 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_REGULATOR_LP8788) += lp8788-ldo.o
 obj-$(CONFIG_REGULATOR_MAX1586) += max1586.o
 obj-$(CONFIG_REGULATOR_MAX8649) += max8649.o
 obj-$(CONFIG_REGULATOR_MAX8660) += max8660.o
+obj-$(CONFIG_REGULATOR_MAX8907) += max8907-regulator.o
 obj-$(CONFIG_REGULATOR_MAX8925) += max8925-regulator.o
 obj-$(CONFIG_REGULATOR_MAX8952) += max8952.o
 obj-$(CONFIG_REGULATOR_MAX8997) += max8997.o
diff --git a/drivers/regulator/max8907-regulator.c 
b/drivers/regulator/max8907-regulator.c
new file mode 100644
index 000..713bc7b
--- /dev/null
+++ b/drivers/regulator/max8907-regulator.c
@@ -0,0 +1,396 @@
+/*
+ * max8907-regulator.c -- support regulators in max8907
+ *
+ * Copyright (C) 2010 Gyungoh Yoo 
+ * Copyright (C) 2010-2012, NVIDIA CORPORATION. All rights reserved.
+ *
+ * Portions based on drivers/regulator/tps65910-regulator.c,
+ * Copyright 2010 Texas Instruments Inc.
+ * Author: Graeme Gregory 
+ * Author: Jorge Eduardo Candelaria 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MAX8907_II2RR_VERSION_MASK  0xF0
+#define MAX8907_II2RR_VERSION_REV_A 0x00
+#define MAX8907_II2RR_VERSION_REV_B 0x10
+#define MAX8907_II2RR_VERSION_REV_C 0x30
+
+struct max8907_regulator {
+   struct regulator_desc desc[MAX8907_NUM_REGULATORS];
+   struct regulator_dev *rdev[MAX8907_NUM_REGULATORS];
+};
+
+#define REG_MBATT() \
+   { \
+   .name = "MBATT", \
+   .supply_name = "mbatt", \
+   .id = MAX8907_MBATT, \
+   .ops = &max8907_mbatt_ops, \
+   .type = REGULATOR_VOLTAGE, \
+   .owner = THIS_MODULE, \
+   }
+
+#define REG_LDO(ids, supply, base, min, max, step) \
+   { \
+   .name = #ids, \
+   .supply_name = supply, \
+   .id = MAX8907_##ids, \
+   .n_voltages = ((max) - (min)) / (step) + 1, \
+   .ops = &max8907_ldo_ops, \
+   .type = REGULATOR_VOLTAGE, \
+   .owner = THIS_MODULE, \
+   .min_uV = (min), \
+   .uV_step = (step), \
+   .vsel_reg = (base) + MAX8907_VOUT, \
+   .vsel_mask = 0x3f, \
+   .enable_reg = (base) + MAX8907_CTL, \
+   .enable_mas

[PATCH V2 2/3] regulator: add regulator_get_voltage_fixed helper op

2012-08-08 Thread Stephen Warren

From: Stephen Warren 

Fixed regulators always output desc->min_uV. Add a helper get_voltage
op to save duplicating this code in drivers.
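
As a rough illustration of how a driver might use the helper, here is a
minimal user-space sketch. The three structs are simplified stand-ins for
the kernel ones (only the fields touched here), and example_fixed_ops,
example_fixed_desc and example_read_voltage are hypothetical names that do
not appear in this series. In the kernel the helper is exported rather than
static.

```c
/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct regulator_desc {
	const char *name;
	int min_uV;
};

struct regulator_dev {
	const struct regulator_desc *desc;
};

struct regulator_ops {
	int (*get_voltage)(struct regulator_dev *rdev);
};

/* Same body as in the patch: a fixed regulator always reports desc->min_uV. */
static int regulator_get_voltage_fixed(struct regulator_dev *rdev)
{
	return rdev->desc->min_uV;
}

/* Hypothetical driver: no private get_voltage callback has to be written. */
static const struct regulator_ops example_fixed_ops = {
	.get_voltage = regulator_get_voltage_fixed,
};

/* Hypothetical descriptor for a 3.3 V fixed output. */
static const struct regulator_desc example_fixed_desc = {
	.name = "example-3v3",
	.min_uV = 3300000,
};

/* Query the voltage through the ops table, as the regulator core would. */
static int example_read_voltage(void)
{
	struct regulator_dev rdev = { .desc = &example_fixed_desc };

	return example_fixed_ops.get_voltage(&rdev);
}
```

The only per-regulator data the driver has to supply is min_uV in its
descriptor.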

Signed-off-by: Stephen Warren 
---
v2: New patch

 drivers/regulator/core.c |   14 ++
 include/linux/regulator/driver.h |1 +
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 457be22..c0129bf 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1973,6 +1973,20 @@ int regulator_is_supported_voltage(struct regulator 
*regulator,
 EXPORT_SYMBOL_GPL(regulator_is_supported_voltage);
 
 /**
+ * regulator_get_voltage_fixed - standard get_voltage for fixed regulators
+ *
+ * @rdev: regulator to operate on
+ *
+ * Fixed regulators can use this as their get_voltage operation, saving
+ * some code.
+ */
+int regulator_get_voltage_fixed(struct regulator_dev *rdev)
+{
+   return rdev->desc->min_uV;
+}
+EXPORT_SYMBOL_GPL(regulator_get_voltage_fixed);
+
+/**
  * regulator_get_voltage_sel_regmap - standard get_voltage_sel for regmap users
  *
  * @rdev: regulator to operate on
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index 2c40c86..aa0145a 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -306,6 +306,7 @@ int regulator_map_voltage_linear(struct regulator_dev *rdev,
  int min_uV, int max_uV);
 int regulator_map_voltage_iterate(struct regulator_dev *rdev,
  int min_uV, int max_uV);
+int regulator_get_voltage_fixed(struct regulator_dev *rdev);
 int regulator_get_voltage_sel_regmap(struct regulator_dev *rdev);
 int regulator_set_voltage_sel_regmap(struct regulator_dev *rdev, unsigned sel);
 int regulator_is_enabled_regmap(struct regulator_dev *rdev);
-- 
1.7.0.4


[PATCH V2 1/3] regulator: add always set/clear masks to regulator_enable_regmap

2012-08-08 Thread Stephen Warren

From: Stephen Warren 

Some regulators need some register bits set or cleared in order to place
them under software control. Add .en_dis_set_mask and .en_dis_clr_mask
fields to struct regulator_desc. These can't be part of the existing
.enable_mask field, whose bits are set when enabled and cleared when
disabled, since the bits in this field need to be set/cleared irrespective
of regulator state.
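
The effect on the register can be sketched in plain C. This is only an
illustration under assumptions: struct desc_masks is a hypothetical
stand-in for the three mask fields of struct regulator_desc, and
update_bits() mimics the read-modify-write rule of regmap_update_bits()
(only bits inside the mask change) without touching any hardware.

```c
/* Hypothetical stand-in for the relevant fields of struct regulator_desc. */
struct desc_masks {
	unsigned int enable_mask;     /* set when enabled, cleared when disabled */
	unsigned int en_dis_set_mask; /* always set, in both operations */
	unsigned int en_dis_clr_mask; /* always cleared, in both operations */
};

/* Same rule as regmap_update_bits(): only the bits in mask are changed. */
static unsigned int update_bits(unsigned int reg, unsigned int mask,
				unsigned int val)
{
	return (reg & ~mask) | (val & mask);
}

/* Register value after enable: enable and "always set" bits go high. */
static unsigned int reg_after_enable(unsigned int reg,
				     const struct desc_masks *d)
{
	unsigned int mask = d->enable_mask | d->en_dis_set_mask |
			    d->en_dis_clr_mask;

	return update_bits(reg, mask, d->enable_mask | d->en_dis_set_mask);
}

/* Register value after disable: only the "always set" bits stay high. */
static unsigned int reg_after_disable(unsigned int reg,
				      const struct desc_masks *d)
{
	unsigned int mask = d->enable_mask | d->en_dis_set_mask |
			    d->en_dis_clr_mask;

	return update_bits(reg, mask, d->en_dis_set_mask);
}
```

With enable_mask = 0x01, en_dis_set_mask = 0x02 and en_dis_clr_mask = 0x04,
a register holding 0xF4 becomes 0xF3 on enable and 0xF2 on disable; bits
outside the three masks are left alone.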

Signed-off-by: Stephen Warren 
---
v2: New patch

 drivers/regulator/core.c |   22 +++---
 include/linux/regulator/driver.h |4 
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index b28221a..457be22 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1762,14 +1762,18 @@ EXPORT_SYMBOL_GPL(regulator_is_enabled_regmap);
  * @rdev: regulator to operate on
  *
  * Regulators that use regmap for their register I/O can set the
- * enable_reg and enable_mask fields in their descriptor and then use
- * this as their enable() operation, saving some code.
+ * enable_reg, enable_mask, en_dis_set_mask, and en_dis_clr_mask fields in
+ * their descriptor and then use this as their enable() operation, saving
+ * some code.
  */
 int regulator_enable_regmap(struct regulator_dev *rdev)
 {
return regmap_update_bits(rdev->regmap, rdev->desc->enable_reg,
- rdev->desc->enable_mask,
- rdev->desc->enable_mask);
+ rdev->desc->enable_mask |
+ rdev->desc->en_dis_set_mask |
+ rdev->desc->en_dis_clr_mask,
+ rdev->desc->enable_mask |
+ rdev->desc->en_dis_set_mask);
 }
 EXPORT_SYMBOL_GPL(regulator_enable_regmap);
 
@@ -1779,13 +1783,17 @@ EXPORT_SYMBOL_GPL(regulator_enable_regmap);
  * @rdev: regulator to operate on
  *
  * Regulators that use regmap for their register I/O can set the
- * enable_reg and enable_mask fields in their descriptor and then use
- * this as their disable() operation, saving some code.
+ * enable_reg, enable_mask, en_dis_set_mask, and en_dis_clr_mask fields in
+ * their descriptor and then use this as their disable() operation, saving
+ * some code.
  */
 int regulator_disable_regmap(struct regulator_dev *rdev)
 {
return regmap_update_bits(rdev->regmap, rdev->desc->enable_reg,
- rdev->desc->enable_mask, 0);
+ rdev->desc->enable_mask |
+ rdev->desc->en_dis_set_mask |
+ rdev->desc->en_dis_clr_mask,
+ rdev->desc->en_dis_set_mask);
 }
 EXPORT_SYMBOL_GPL(regulator_disable_regmap);
 
diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index bac4c87..2c40c86 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -182,6 +182,8 @@ enum regulator_type {
  * @vsel_mask: Mask for register bitfield used for selector
  * @enable_reg: Register for control when using regmap enable/disable ops
  * @enable_mask: Mask for control when using regmap enable/disable ops
+ * @en_dis_set_mask: Mask to always set when using regmap enable/disable ops
+ * @en_dis_clr_mask: Mask to always clear when using regmap enable/disable ops
  *
  * @enable_time: Time taken for initial enable of regulator (in uS).
  */
@@ -205,6 +207,8 @@ struct regulator_desc {
unsigned int vsel_mask;
unsigned int enable_reg;
unsigned int enable_mask;
+   unsigned int en_dis_set_mask;
+   unsigned int en_dis_clr_mask;
 
unsigned int enable_time;
 };
-- 
1.7.0.4


[PATCH 3/3] tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP

2012-08-08 Thread Rafael J. Wysocki


According to a compiler warning, the tpm_tis_resume() function is not
used for CONFIG_PM_SLEEP unset, so add a #ifdef to prevent it from
being built in that case.
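
For readers wondering where the warning comes from, the pattern can be
reproduced outside the kernel. The sketch below assumes a simplified
stand-in for SIMPLE_DEV_PM_OPS that drops its callback arguments when
CONFIG_PM_SLEEP is not defined; example_resume, example_pm and
example_has_resume are hypothetical names used only for illustration.

```c
#include <stddef.h>

struct device {
	int resumed;
};

struct dev_pm_ops {
	int (*suspend)(struct device *dev);
	int (*resume)(struct device *dev);
};

/*
 * Simplified stand-in for the kernel macro: without CONFIG_PM_SLEEP the
 * callback arguments are discarded, so nothing references them any more.
 */
#ifdef CONFIG_PM_SLEEP
#define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
	const struct dev_pm_ops name = { .suspend = suspend_fn, .resume = resume_fn }
#else
#define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
	const struct dev_pm_ops name = { .suspend = NULL, .resume = NULL }
#endif

/* Guarded like tpm_tis_resume() in the patch; built only when it is used. */
#ifdef CONFIG_PM_SLEEP
static int example_resume(struct device *dev)
{
	dev->resumed = 1;
	return 0;
}
#endif

static SIMPLE_DEV_PM_OPS(example_pm, NULL, example_resume);

/* Returns 1 if a resume callback was compiled in, 0 otherwise. */
static int example_has_resume(void)
{
	return example_pm.resume != NULL;
}
```

Removing the #ifdef around example_resume() and building without
-DCONFIG_PM_SLEEP leaves a static function that nothing refers to, which is
the situation the compiler complains about.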

Signed-off-by: Rafael J. Wysocki 
---
 drivers/char/tpm/tpm_tis.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux/drivers/char/tpm/tpm_tis.c
===
--- linux.orig/drivers/char/tpm/tpm_tis.c
+++ linux/drivers/char/tpm/tpm_tis.c
@@ -807,6 +807,7 @@ module_param_string(hid, tpm_pnp_tbl[TIS
 MODULE_PARM_DESC(hid, "Set additional specific HID for this driver to probe");
 #endif
 
+#ifdef CONFIG_PM_SLEEP
 static int tpm_tis_resume(struct device *dev)
 {
struct tpm_chip *chip = dev_get_drvdata(dev);
@@ -816,6 +817,7 @@ static int tpm_tis_resume(struct device
 
return tpm_pm_resume(dev);
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(tpm_tis_pm, tpm_pm_suspend, tpm_tis_resume);
 


Re: [ 017/109] ASoC: dapm: Fix locking during codec shutdown

2012-08-08 Thread Herton Ronaldo Krzesinski

On Tue, Aug 07, 2012 at 03:34:36PM -0700, Greg Kroah-Hartman wrote:
> From: Greg KH 
> 
> 3.4-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Liam Girdwood 
> 
> commit 01005a729a17ab419f61a366e22f3419e7a2c3fe upstream.
> 
> Codec shutdown performs a DAPM power sequence that might cause conflicts
> and/or race conditions if another stream power event is running 
> simultaneously.
> Use card's dapm mutex to protect any potential race condition between them.
> 
> Signed-off-by: Misael Lopez Cruz 
> Signed-off-by: Liam Girdwood 
> Signed-off-by: Mark Brown 
> Signed-off-by: Greg Kroah-Hartman 
> 
> ---
>  sound/soc/soc-dapm.c |5 +
>  1 file changed, 5 insertions(+)
> 
> --- a/sound/soc/soc-dapm.c
> +++ b/sound/soc/soc-dapm.c
> @@ -3210,10 +3210,13 @@ EXPORT_SYMBOL_GPL(snd_soc_dapm_free);
>  
>  static void soc_dapm_shutdown_codec(struct snd_soc_dapm_context *dapm)
>  {
> + struct snd_soc_card *card = dapm->card;
>   struct snd_soc_dapm_widget *w;
>   LIST_HEAD(down_list);
>   int powerdown = 0;
>  
> + mutex_lock(&card->dapm_mutex);
> +

Doesn't build on 3.4:

linux-stable/sound/soc/soc-dapm.c: In function 'soc_dapm_shutdown_codec':
linux-stable/sound/soc/soc-dapm.c:3226:18: error: 'struct snd_soc_card' has no 
member named 'dapm_mutex'
linux-stable/sound/soc/soc-dapm.c:3251:20: error: 'struct snd_soc_card' has no 
member named 'dapm_mutex'

It seems this patch should be dropped, as it was when it was applied to
3.2; is that correct?

>   list_for_each_entry(w, &dapm->card->widgets, list) {
>   if (w->dapm != dapm)
>   continue;
> @@ -3236,6 +3239,8 @@ static void soc_dapm_shutdown_codec(stru
>   snd_soc_dapm_set_bias_level(dapm,
>   SND_SOC_BIAS_STANDBY);
>   }
> +
> + mutex_unlock(&card->dapm_mutex);
>  }
>  
>  /*
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
[]'s
Herton

[PATCH 2/3] platform / x86/ PM: Fix unused function warnings for CONFIG_PM_SLEEP

2012-08-08 Thread Rafael J. Wysocki


According to compiler warnings, quite a few suspend/resume functions
in platform x86 drivers are not used for CONFIG_PM_SLEEP unset, so
add #ifdefs to prevent them from being built in that case.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/platform/x86/classmate-laptop.c  |4 
 drivers/platform/x86/fujitsu-tablet.c|2 ++
 drivers/platform/x86/hdaps.c |2 ++
 drivers/platform/x86/hp_accel.c  |2 +-
 drivers/platform/x86/msi-laptop.c|4 
 drivers/platform/x86/panasonic-laptop.c  |4 
 drivers/platform/x86/sony-laptop.c   |   12 +++-
 drivers/platform/x86/thinkpad_acpi.c |2 ++
 drivers/platform/x86/toshiba_acpi.c  |2 ++
 drivers/platform/x86/toshiba_bluetooth.c |4 
 drivers/platform/x86/xo15-ebook.c|2 ++
 11 files changed, 38 insertions(+), 2 deletions(-)

Index: linux/drivers/platform/x86/classmate-laptop.c
===
--- linux.orig/drivers/platform/x86/classmate-laptop.c
+++ linux/drivers/platform/x86/classmate-laptop.c
@@ -350,6 +350,7 @@ static void cmpc_accel_idev_init_v4(stru
inputdev->close = cmpc_accel_close_v4;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int cmpc_accel_suspend_v4(struct device *dev)
 {
struct input_dev *inputdev;
@@ -384,6 +385,7 @@ static int cmpc_accel_resume_v4(struct d
 
return 0;
 }
+#endif
 
 static int cmpc_accel_add_v4(struct acpi_device *acpi)
 {
@@ -752,6 +754,7 @@ static int cmpc_tablet_remove(struct acp
return cmpc_remove_acpi_notify_device(acpi);
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int cmpc_tablet_resume(struct device *dev)
 {
struct input_dev *inputdev = dev_get_drvdata(dev);
@@ -761,6 +764,7 @@ static int cmpc_tablet_resume(struct dev
input_report_switch(inputdev, SW_TABLET_MODE, !val);
return 0;
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(cmpc_tablet_pm, NULL, cmpc_tablet_resume);
 
Index: linux/drivers/platform/x86/fujitsu-tablet.c
===
--- linux.orig/drivers/platform/x86/fujitsu-tablet.c
+++ linux/drivers/platform/x86/fujitsu-tablet.c
@@ -440,11 +440,13 @@ static int __devexit acpi_fujitsu_remove
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_fujitsu_resume(struct device *dev)
 {
fujitsu_reset();
return 0;
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(acpi_fujitsu_pm, NULL, acpi_fujitsu_resume);
 
Index: linux/drivers/platform/x86/hdaps.c
===
--- linux.orig/drivers/platform/x86/hdaps.c
+++ linux/drivers/platform/x86/hdaps.c
@@ -305,10 +305,12 @@ static int hdaps_probe(struct platform_d
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int hdaps_resume(struct device *dev)
 {
return hdaps_device_init();
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(hdaps_pm, NULL, hdaps_resume);
 
Index: linux/drivers/platform/x86/hp_accel.c
===
--- linux.orig/drivers/platform/x86/hp_accel.c
+++ linux/drivers/platform/x86/hp_accel.c
@@ -352,7 +352,7 @@ static int lis3lv02d_remove(struct acpi_
 }
 
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static int lis3lv02d_suspend(struct device *dev)
 {
/* make sure the device is off when we suspend */
Index: linux/drivers/platform/x86/msi-laptop.c
===
--- linux.orig/drivers/platform/x86/msi-laptop.c
+++ linux/drivers/platform/x86/msi-laptop.c
@@ -85,7 +85,9 @@
 #define MSI_STANDARD_EC_TOUCHPAD_ADDRESS   0xe4
 #define MSI_STANDARD_EC_TOUCHPAD_MASK  (1 << 4)
 
+#ifdef CONFIG_PM_SLEEP
 static int msi_laptop_resume(struct device *device);
+#endif
 static SIMPLE_DEV_PM_OPS(msi_laptop_pm, NULL, msi_laptop_resume);
 
 #define MSI_STANDARD_EC_DEVICES_EXISTS_ADDRESS 0x2f
@@ -753,6 +755,7 @@ err_bluetooth:
return retval;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int msi_laptop_resume(struct device *device)
 {
u8 data;
@@ -773,6 +776,7 @@ static int msi_laptop_resume(struct devi
 
return 0;
 }
+#endif
 
 static int __init msi_laptop_input_setup(void)
 {
Index: linux/drivers/platform/x86/panasonic-laptop.c
===
--- linux.orig/drivers/platform/x86/panasonic-laptop.c
+++ linux/drivers/platform/x86/panasonic-laptop.c
@@ -188,7 +188,9 @@ static const struct acpi_device_id pcc_d
 };
 MODULE_DEVICE_TABLE(acpi, pcc_device_ids);
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_pcc_hotkey_resume(struct device *dev);
+#endif
 static SIMPLE_DEV_PM_OPS(acpi_pcc_hotkey_pm, NULL, acpi_pcc_hotkey_resume);
 
 static struct acpi_driver acpi_pcc_driver = {
@@ -540,6 +542,7 @@ static void acpi_pcc_destroy_input(struc
 
 /* kernel module interface */
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_pcc_hotkey_resume(struct device *dev)
 {
struct pcc_acpi *pcc;
@@ -556,6 +559,7 @@

[PATCH 1/3] ACPI / PM: Fix unused function warnings for CONFIG_PM_SLEEP

2012-08-08 Thread Rafael J. Wysocki


According to compiler warnings, several suspend/resume functions
in ACPI drivers are not used for CONFIG_PM_SLEEP unset, so add
#ifdefs to prevent them from being built in that case.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/ac.c  |4 
 drivers/acpi/battery.c |2 ++
 drivers/acpi/button.c  |4 
 drivers/acpi/fan.c |4 
 drivers/acpi/power.c   |4 
 drivers/acpi/sbs.c |2 ++
 drivers/acpi/thermal.c |4 
 7 files changed, 24 insertions(+)

Index: linux/drivers/acpi/ac.c
===
--- linux.orig/drivers/acpi/ac.c
+++ linux/drivers/acpi/ac.c
@@ -69,7 +69,9 @@ static const struct acpi_device_id ac_de
 };
 MODULE_DEVICE_TABLE(acpi, ac_device_ids);
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_ac_resume(struct device *dev);
+#endif
 static SIMPLE_DEV_PM_OPS(acpi_ac_pm, NULL, acpi_ac_resume);
 
 static struct acpi_driver acpi_ac_driver = {
@@ -313,6 +315,7 @@ static int acpi_ac_add(struct acpi_devic
return result;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_ac_resume(struct device *dev)
 {
struct acpi_ac *ac;
@@ -332,6 +335,7 @@ static int acpi_ac_resume(struct device
kobject_uevent(&ac->charger.dev->kobj, KOBJ_CHANGE);
return 0;
 }
+#endif
 
 static int acpi_ac_remove(struct acpi_device *device, int type)
 {
Index: linux/drivers/acpi/battery.c
===
--- linux.orig/drivers/acpi/battery.c
+++ linux/drivers/acpi/battery.c
@@ -1052,6 +1052,7 @@ static int acpi_battery_remove(struct ac
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
 /* this is needed to learn about changes made in suspended state */
 static int acpi_battery_resume(struct device *dev)
 {
@@ -1068,6 +1069,7 @@ static int acpi_battery_resume(struct de
acpi_battery_update(battery);
return 0;
 }
+#endif
 
 static SIMPLE_DEV_PM_OPS(acpi_battery_pm, NULL, acpi_battery_resume);
 
Index: linux/drivers/acpi/button.c
===
--- linux.orig/drivers/acpi/button.c
+++ linux/drivers/acpi/button.c
@@ -78,7 +78,9 @@ static int acpi_button_add(struct acpi_d
 static int acpi_button_remove(struct acpi_device *device, int type);
 static void acpi_button_notify(struct acpi_device *device, u32 event);
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_button_resume(struct device *dev);
+#endif
 static SIMPLE_DEV_PM_OPS(acpi_button_pm, NULL, acpi_button_resume);
 
 static struct acpi_driver acpi_button_driver = {
@@ -310,6 +312,7 @@ static void acpi_button_notify(struct ac
}
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_button_resume(struct device *dev)
 {
struct acpi_device *device = to_acpi_device(dev);
@@ -319,6 +322,7 @@ static int acpi_button_resume(struct dev
return acpi_lid_send_state(device);
return 0;
 }
+#endif
 
 static int acpi_button_add(struct acpi_device *device)
 {
Index: linux/drivers/acpi/fan.c
===
--- linux.orig/drivers/acpi/fan.c
+++ linux/drivers/acpi/fan.c
@@ -53,8 +53,10 @@ static const struct acpi_device_id fan_d
 };
 MODULE_DEVICE_TABLE(acpi, fan_device_ids);
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_fan_suspend(struct device *dev);
 static int acpi_fan_resume(struct device *dev);
+#endif
 static SIMPLE_DEV_PM_OPS(acpi_fan_pm, acpi_fan_suspend, acpi_fan_resume);
 
 static struct acpi_driver acpi_fan_driver = {
@@ -184,6 +186,7 @@ static int acpi_fan_remove(struct acpi_d
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_fan_suspend(struct device *dev)
 {
if (!dev)
@@ -207,6 +210,7 @@ static int acpi_fan_resume(struct device
 
return result;
 }
+#endif
 
 static int __init acpi_fan_init(void)
 {
Index: linux/drivers/acpi/power.c
===
--- linux.orig/drivers/acpi/power.c
+++ linux/drivers/acpi/power.c
@@ -67,7 +67,9 @@ static const struct acpi_device_id power
 };
 MODULE_DEVICE_TABLE(acpi, power_device_ids);
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_power_resume(struct device *dev);
+#endif
 static SIMPLE_DEV_PM_OPS(acpi_power_pm, NULL, acpi_power_resume);
 
 static struct acpi_driver acpi_power_driver = {
@@ -775,6 +777,7 @@ static int acpi_power_remove(struct acpi
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_power_resume(struct device *dev)
 {
int result = 0, state;
@@ -803,6 +806,7 @@ static int acpi_power_resume(struct devi
 
return result;
 }
+#endif
 
 int __init acpi_power_init(void)
 {
Index: linux/drivers/acpi/sbs.c
===
--- linux.orig/drivers/acpi/sbs.c
+++ linux/drivers/acpi/sbs.c
@@ -988,6 +988,7 @@ static void acpi_sbs_rmdirs(void)
 #endif
 }
 
+#ifdef CONFIG_PM_SLEEP
 static int acpi_sbs_resume(struct device *dev)
 {
struct acpi_sbs *sbs;
@@ -997,6 +998,7

[PATCH 0/3] PM: Fix some unused function warnings for CONFIG_PM_SLEEP

2012-08-08 Thread Rafael J. Wysocki

Hi all,

The recent conversion to the PM handling based on struct dev_pm_ops
uncovered some code that is not used for CONFIG_PM_SLEEP unset, which
results in a number of new compiler warnings.

Admittedly, I should have spotted those places before, but anyway
patches fixing those for ACPI, platform/x86 and tpm_tis follow.

Thanks,
Rafael


Re: [PATCH 1/1] proc: add /proc/pid/shmaps

2012-08-08 Thread David Rientjes

On Wed, 8 Aug 2012, Qiaowei Ren wrote:

> Add a shmaps entry to /proc/pid: show information about shared memory in an 
> address space.
> 
> People that use shared memory and want to perform an analyzing about it. For 
> example, judge whether any memory address is shared. This file just contains 
> 'share' part of /proc/pid/maps now. There are too many contents in maps, and 
> so we have to do a lot of analysis to obtain relative information every time.
> 
> Signed-off-by: Qiaowei Ren 

Nack as unnecessary; /proc/pid/maps already explicitly emits 's' for 
VM_MAYSHARE and 'p' otherwise so this information is already available to 
userspace.
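
A minimal user-space sketch of that check, assuming the usual
"start-end perms offset dev inode path" layout of a maps line, in which the
fourth permission character is 's' or 'p'. The function names are
hypothetical and the error handling is deliberately thin.

```c
#include <stdio.h>

/*
 * Classify one line of /proc/<pid>/maps.
 * Returns 1 for a shared mapping ('s'), 0 for a private one ('p'),
 * and -1 if the line cannot be parsed.
 */
static int maps_line_is_shared(const char *line)
{
	unsigned long long start, end;
	char perms[5] = "";

	if (sscanf(line, "%llx-%llx %4s", &start, &end, perms) != 3)
		return -1;
	if (perms[3] == 's')
		return 1;
	if (perms[3] == 'p')
		return 0;
	return -1;
}

/*
 * Copy every shared mapping from an open maps stream to out.
 * Returns the number of shared mappings seen.
 */
static int print_shared_mappings(FILE *maps, FILE *out)
{
	char line[512];
	int count = 0;

	while (fgets(line, sizeof(line), maps)) {
		if (maps_line_is_shared(line) == 1) {
			fputs(line, out);
			count++;
		}
	}
	return count;
}
```

Opening /proc/self/maps with fopen() and passing the stream to
print_shared_mappings() together with stdout would list only the shared
entries, which is roughly what the proposed shmaps file was meant to show.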

Re: [RFC PATCH v2 00/16] ACPI based system device hotplug framework

2012-08-08 Thread Toshi Kani

On Wed, 2012-08-08 at 23:44 +0800, Jiang Liu wrote:
> On 08/08/2012 07:38 AM, Toshi Kani wrote:
> > On Sat, 2012-08-04 at 20:13 +0800, Jiang Liu wrote:
> >> From: Jiang Liu 
> >>
> >> The patchset is based on v3.5-rc6 and you may pull them from:
> >> git://github.com/jiangliu/linux.git acpihp
> >>
> >> Modern high-end server may support advanced hotplug features for system
> >> devices, including physical processor, memory board, IO extension board
> >> and/or computer node. The ACPI specifications have provided standard
> >> interfaces between firmware and OS to support device hotplug at runtime.
> >> This patch series provide an ACPI based hotplug framework to support system
> >> device hotplug at runtime, which will replace current existing ACPI device
> >> driver based CPU/memory/CONTAINER hotplug mechanism.
> >>
> >> The new ACPI based hotplug framework is modelled after PCI hotplug
> >> architecture and target to achieve following goals:
> > 
> > Hi Jiang,
> > 
> > It is nice to see such infrastructure work!  I have some high-level
> > questions / comments below.  So far, I have only looked at part of the
> > changes briefly, so please correct me if I missed something.
> 
> Hi Toshi,
>   Thanks for your time to review these patches!
> 
> > 
> >> 1) Provide a mechanism to detect hotplug slots by checking ACPI _EJ0 
> >> method,
> >> ACPI PRCT (platform RAS capabilities table) and other platform specific
> >> mechanisms.
> > 
> > Does this mean that hot-plug device must support both hot-add &
> > hot-delete operations?  Some platforms may choose to only support
> > hot-add operations to increase the resource on-line (since it requires
> > less effort and Windows does not support hot-remove, either).
> This is a good question. By default, the framework detects hotplug slot
> by checking _EJ0 method. If a system does support hot-add only components,
> some static ACPI tables, like PRCT, may be used to describe hotplug slots
> available in the system. 
> 
> Basically ACPI PRCT table contains tuples of (device type, uid, RAS 
> capabilities).

I am not familiar with the PRCT table.  Can you point me to the spec?  I
did not find it in the ACPI spec.

> >> 2) Unify the way to enumerate ACPI based hotplug slots. All hotplug slots
> >> will be enumerated by the enumeration driver, instead of by ACPI device
> >> drivers.
> > 
> > It is nice to see redundant ACPI namespace walks removed from the ACPI
> > drivers.  But why do you need to add a new enumerator to create the
> > acpihp_slot tree, in addition to the current acpi_device tree?  I'd
> > prefer hotplug features to be generally integrated into the current ACPI
> > core code and data structures, instead of adding a new layer on top of
> > it.
> The idea comes from the PCI hotplug framework, which has the concepts of PCI
> hotplug slot and PCI device. For system device hotplug, we could follow
> the same model as PCI by abstracting control points as slots. By introducing
> hotplug slots, we could:

Yes, I understand that.  Using the slot concept on PCI hotplug makes
sense as it has PCI slot objects in ACPI.  For non-PCI, however, you are
kind of faking slots and introducing a new slot tree on top of the
existing ACPI device tree, so I am not sure if it is a good idea...

> 1) Report all hotplug slots and slot's capabilities to user, no matter whether
> there are devices connecting to a slot. If we integrate hotplug functionality
> into current ACPI device tree, the slot (or device) is only visible when the
> connected devices are enabled.

We need to think about both physical and virtual machines.  Devices can
be virtualized and there can be many of them.  For example, let's say,
1TB of physical memory is logically sliced up with 1024 * 1GB memory
objects for guests, so that they can add / delete memory in 1GB units.  If a
guest has only 2GB of memory assigned, the other 1022 devices are
disabled.  In this case, showing 1022 empty memory slots may not be very
helpful for users.

> 2) Provide interfaces for software to control hotplug slots. With current ACPI
> definition, we could only trigger ACPI hotplug events by pressing hotplug
> button or through some OOB device management system. To support RAS features
> like memory power management, memory migration, dynamic resource management
> etc, we need to trigger hotplug events through in-band interfaces.

acpi_device objects also have sysfs entries.  Can they be used for such
in-band interfaces?

> > Also, acpihp_dev_get_type() in core.c relies on PNP IDs that are embedded
> > in the file.  This does not seem very flexible / extendable.  One should
> > be able to add a handler for a new or vendor-specific PNP ID without
> > changing the core code.  struct acpi_driver allows such extension today.
> Good catch. That's a design limitation currently. If the need arises, we could
> extend the core to support platform specific extensions. But that may be a
> little hard because all devices connecting to a slot will

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Paris

On Wed, Aug 8, 2012 at 5:03 PM, Paul Moore  wrote:
> On Wednesday, August 08, 2012 04:51:56 PM Eric Paris wrote:

>> Could we add a __init function which does the security_sk_alloc() in
>> the same file where we declared them?
>
> Is it safe to call security_sk_alloc() from inside another __init function?  I
> think in both the case of SELinux and Smack it shouldn't be a problem, but I'm
> concerned about the more general case of calling a LSM hook potentially before
> the LSM has been initialized.
>
> If that isn't an issue we could probably do something in ip_init().

The security_initcall() functions should happen way before __init
functions.  If an LSM busts, it's the LSM initializing itself too late
not the code here being wrong...

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Paul Moore

On Wednesday, August 08, 2012 04:51:56 PM Eric Paris wrote:
> On Wed, Aug 8, 2012 at 4:35 PM, Paul Moore  wrote:
> > On Wednesday, August 08, 2012 10:09:38 PM Eric Dumazet wrote:
> > 
> > Actually, the issue is that the shared socket doesn't have an init/alloc
> > function to do the LSM allocation like we do with other sockets so Eric's
> > patch does it as part of ip_send_unicast_reply().
> > 
> > If we look at the relevant part of Eric's patch:
> >  +#ifdef CONFIG_SECURITY
> >  +   if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC))
> >  +   goto out;
> >  +#endif
> > 
> > ... if we were to remove the CONFIG_SECURITY conditional we would end up
> > calling security_sk_alloc() each time through in the CONFIG_SECURITY=n
> > case as sk->sk_security would never be initialized to a non-NULL value. 
> > In the CONFIG_SECURITY=y case it should only be called once as
> > security_sk_alloc() should set sk->sk_security to a LSM blob.
> 
> Ifndef SECURITY this turns into (because security_sk_alloc is a static
> inline in that case)
> 
> if (!sk->sk_security && 0)
> goto out;
> 
> Which I'd hope the compiler would optimize.  So that only leaves us
> caring about the case where CONFIG_SECURITY is true.  In that case if
> we need code which does if !alloc'd then alloc it seems we broke the
> model of everything else in the code and added a branch needlessly.
> 
> Could we add a __init function which does the security_sk_alloc() in
> the same file where we declared them?

Is it safe to call security_sk_alloc() from inside another __init function?  I 
think in both the case of SELinux and Smack it shouldn't be a problem, but I'm 
concerned about the more general case of calling a LSM hook potentially before 
the LSM has been initialized.

If that isn't an issue we could probably do something in ip_init().

> > The issue I'm struggling with at present is how should we handle this
> > traffic from a LSM perspective.  The label based LSMs, e.g. SELinux and
> > Smack, use the LSM blob assigned to locally generated outbound traffic to
> > identify the traffic and apply the security policy, so not only do we
> > have to resolve the issue of ensuring the traffic is labeled correctly,
> > we have to do it with a shared socket (although the patch didn't change
> > the shared nature of the socket).
> > 
> > For those who are interested, I think the reasonable labeling solution
> > here is to go with SECINITSID_KERNEL/kernel_t for SELinux and likely the
> > ambient label for Smack as in both the TCP reset and timewait ACK there
> > shouldn't be any actual user data present.
> 
> I'm willing to accept that argument from an SELinux perspective.  I'd
> also accept the argument that it is private and do something similar
> to what we do with IS_PRIVATE on inodes.  Although sockets probably
> don't have a good field to use...

I'm not aware of one.  See my comments on Eric's last patch posting (the other 
Eric, not you).

-- 
paul moore
www.paul-moore.com


Re: [dm-devel] [PATCH] dm: verity support data device offset (Linux 3.4.7)

2012-08-08 Thread Milan Broz

On 08/08/2012 10:46 PM, Wesley Miaw wrote:
> On Aug 8, 2012, at 1:31 PM, Milan Broz wrote:

> I did modify veritysetup on my own so the format and verify commands will 
> work with regular files on disk instead of having to mount through loop 
> devices.

Which veritysetup? In upstream (cryptsetup repository) it allocates loop 
automatically.
(And for userspace verification it doesn't need loop at all.)

Anyway, please send a patch for userspace as well then ;-)

Milan

Re: PROBLEM: snd_hda_intel bug during boot

2012-08-08 Thread Jesper Juhl

On Wed, 8 Aug 2012, Jesper Juhl wrote:

> On Wed, 8 Aug 2012, Alexei Kornienko wrote:
> 
> > Seems like I have a bug in the audio driver on my laptop. Because of this
> > no sound card is detected.
> > Please find more details below:
> > 
> > ** Version:
> > Linux version 3.2.0-29-generic-pae (buildd@roseapple) (gcc version
> > 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #46-Ubuntu SMP Fri Jul 27
> > 17:25:43 UTC 2012
> > 
> Ok, so I openly admit that I have no clue as to what your problem might 
> be. But, one thing I do know is that, a *lot* has changed since the 3.2 
> kernel you are running and the most recent stable (3.5) kernel. So, 
> perhaps you could test the 3.5 kernel and tell us if you still see the 
> problem with that one or not?
> If you are feeling adventurous you could also try a snapshot of the latest 
> Linus (to become 3.6) kernel and tell us your experiences with that...?
> 
Also, if this used to work with an older kernel, then it would probably 
be helpful if you could do a "git bisect" between the older (working) 
kernel version and your current (broken) kernel version in order to zoom 
in on the commit that broke things for you.

-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post    http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Paris

On Wed, Aug 8, 2012 at 4:35 PM, Paul Moore  wrote:
> On Wednesday, August 08, 2012 10:09:38 PM Eric Dumazet wrote:

> Actually, the issue is that the shared socket doesn't have an init/alloc
> function to do the LSM allocation like we do with other sockets so Eric's
> patch does it as part of ip_send_unicast_reply().
>
> If we look at the relevant part of Eric's patch:
>
>  +#ifdef CONFIG_SECURITY
>  +   if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC))
>  +   goto out;
>  +#endif
>
> ... if we were to remove the CONFIG_SECURITY conditional we would end up
> calling security_sk_alloc() each time through in the CONFIG_SECURITY=n case as
> sk->sk_security would never be initialized to a non-NULL value.  In the
> CONFIG_SECURITY=y case it should only be called once as security_sk_alloc()
> should set sk->sk_security to a LSM blob.

Ifndef SECURITY this turns into (because security_sk_alloc is a static
inline in that case)

    if (!sk->sk_security && 0)
            goto out;

Which I'd hope the compiler would optimize.  So that only leaves us
caring about the case where CONFIG_SECURITY is true.  In that case if
we need code which does if !alloc'd then alloc it seems we broke the
model of everything else in the code and added a branch needlessly.
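
For illustration only, here is a small userspace sketch of the two cases
discussed above. Nothing in it is kernel code: struct mock_sock,
alloc_stub(), alloc_real() and reply_path_bails_out() are hypothetical
stand-ins. alloc_stub() plays the role of the CONFIG_SECURITY=n static
inline that returns 0, and alloc_real() plays the role of an LSM hook that
sets sk_security on first use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; only the field discussed here. */
struct mock_sock {
	void *sk_security;
};

typedef int (*alloc_fn)(struct mock_sock *sk);

/* Counts how many times the "real" allocation hook actually ran. */
static int real_alloc_calls;

/* Models the CONFIG_SECURITY=n stub: does nothing and returns 0. */
static int alloc_stub(struct mock_sock *sk)
{
	(void)sk;
	return 0;
}

/* Models a CONFIG_SECURITY=y hook: attaches a blob and returns 0. */
static int alloc_real(struct mock_sock *sk)
{
	static int blob;

	real_alloc_calls++;
	sk->sk_security = &blob;
	return 0;
}

/* Same shape as the check in the patch; true means "goto out". */
static bool reply_path_bails_out(struct mock_sock *sk, alloc_fn alloc)
{
	return !sk->sk_security && alloc(sk) != 0;
}
```

With alloc_stub() the pointer stays NULL, so the test is evaluated on every
call but can never bail out; with alloc_real() the allocation runs once and
later calls short-circuit on the non-NULL pointer.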

Could we add a __init function which does the security_sk_alloc() in
the same file where we declared them?

>> IMHO it seems wrong to even care about security for internal sockets.
>>
>> They are per cpu, shared for all users on the machine.
>
> The issue, from a security point of view, is that these sockets are sending
> network traffic; even if it is just resets and timewait ACKs, it is still
> network traffic and the LSMs need to be able to enforce security policy on
> this traffic.  After all, what would you say if your firewall let these same
> packets pass without any filtering?
>
> The issue I'm struggling with at present is how should we handle this traffic
> from a LSM perspective.  The label based LSMs, e.g. SELinux and Smack, use the
> LSM blob assigned to locally generated outbound traffic to identify the
> traffic and apply the security policy, so not only do we have to resolve the
> issue of ensuring the traffic is labeled correctly, we have to do it with a
> shared socket (although the patch didn't change the shared nature of the
> socket).
>
> For those who are interested, I think the reasonable labeling solution here is
> to go with SECINITSID_KERNEL/kernel_t for SELinux and likely the ambient label
> for Smack as in both the TCP reset and timewait ACK there shouldn't be any
> actual user data present.

I'm willing to accept that argument from an SELinux perspective.  I'd
also accept the argument that it is private and do something similar
to what we do with IS_PRIVATE on inodes.  Although sockets probably
don't have a good field to use...

Re: PROBLEM: snd_hda_intel bug during boot

2012-08-08 Thread Jesper Juhl

On Wed, 8 Aug 2012, Alexei Kornienko wrote:

> Seems like I have a bug in the audio driver on my laptop. Because of this
> no sound card is detected.
> Please find more details below:
> 
> ** Version:
> Linux version 3.2.0-29-generic-pae (buildd@roseapple) (gcc version
> 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #46-Ubuntu SMP Fri Jul 27
> 17:25:43 UTC 2012
> 
Ok, so I openly admit that I have no clue as to what your problem might 
be. But, one thing I do know is that, a *lot* has changed since the 3.2 
kernel you are running and the most recent stable (3.5) kernel. So, 
perhaps you could test the 3.5 kernel and tell us if you still see the 
problem with that one or not?
If you are feeling adventurous you could also try a snapshot of the latest 
Linus (to become 3.6) kernel and tell us your experiences with that...?

-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post    http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Paul Moore

On Wednesday, August 08, 2012 10:32:52 PM Eric Dumazet wrote:
> On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote:
> > On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote:
> > > Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
> > > code.
> > 
> > Sure but it seems include file misses an accessor for this.
> > 
> > We could add it on a future cleanup patch, as Paul mentioned.
> 
> I cooked up the following patch.
> But smack/smack_lsm.c makes a reference to
> smk_of_current()... so it seems we are in a hole...
> 
> It makes little sense to me to have any kind of security on these
> internal sockets.
> 
> Maybe selinux should not crash if sk->sk_security is NULL ?

I realize our last emails probably passed each other mid-flight, but hopefully 
it explains why we can't just pass packets when sk->sk_security is NULL.

Regardless, some quick comments below ...

> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 6c77f63..459eca6 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4289,10 +4289,13 @@ out:
>   return 0;
>  }
> 
> -static int selinux_sk_alloc_security(struct sock *sk, int family, ...
> +static int selinux_sk_alloc_security(struct sock *sk, int family, ...
>  {
>   struct sk_security_struct *sksec;
> 
> + if (check && sk->sk_security)
> + return 0;
> +
>   sksec = kzalloc(sizeof(*sksec), priority);
>   if (!sksec)
>   return -ENOMEM;

I think I might replace the "check" boolean with a "kern/kernel" boolean so 
that in addition to the allocation we can also initialize the socket to 
SECINITSID_KERNEL/kernel_t here in the case when the boolean is set.  The only 
place that would set the boolean to true would be ip_send_unicast_reply(), all 
other callers would set it to false.
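
A rough userspace sketch of that idea, for illustration only. All names
here (struct mock_sksec, MOCK_SID_KERNEL, MOCK_SID_UNLABELED,
mock_sk_alloc_security()) are hypothetical stand-ins and not the SELinux
definitions; calloc() stands in for kzalloc().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical label values, standing in for real initial SIDs. */
#define MOCK_SID_UNLABELED 0u
#define MOCK_SID_KERNEL    1u

struct mock_sksec {
	unsigned int sid;
};

struct mock_sock {
	struct mock_sksec *sk_security;
};

/*
 * When kern is true (the shared reply socket), allocate at most once and
 * label the socket as kernel traffic. All other callers pass false and
 * get the usual fresh allocation.
 */
static int mock_sk_alloc_security(struct mock_sock *sk, bool kern)
{
	struct mock_sksec *sksec;

	if (kern && sk->sk_security)
		return 0;

	sksec = calloc(1, sizeof(*sksec));
	if (!sksec)
		return -ENOMEM;

	sksec->sid = kern ? MOCK_SID_KERNEL : MOCK_SID_UNLABELED;
	sk->sk_security = sksec;
	return 0;
}
```

The point of folding the label into the allocation is that the shared
socket never exists in an allocated-but-unlabeled state.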

> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 8221514..8965cf1 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -1754,11 +1754,14 @@ static void smack_task_to_inode(struct task_struct *p, struct inode *inode)
>   *
>   * Returns 0 on success, -ENOMEM is there's no memory
>   */
> -static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags)
> +static int smack_sk_alloc_security(struct sock *sk, int family, gfp_t gfp_flags, bool check)
>  {
>   char *csp = smk_of_current();
>   struct socket_smack *ssp;
> 
> + if (check && sk->sk_security)
> + return 0;
> +
>   ssp = kzalloc(sizeof(struct socket_smack), gfp_flags);
>   if (ssp == NULL)
>   return -ENOMEM;

In the case of Smack, when the kernel boolean is true I think the right 
solution is to use smack_net_ambient.

-- 
paul moore
www.paul-moore.com


Re: [dm-devel] [PATCH] dm: verity support data device offset (Linux 3.4.7)

2012-08-08 Thread Wesley Miaw

On Aug 8, 2012, at 1:31 PM, Milan Broz wrote:

> On 08/08/2012 08:46 PM, Mikulas Patocka wrote:
> 
>> The problem with the patch is that it changes interface to the userspace 
>> tool. The userspace tool veritysetup already exists in recent cryptsetup 
>> package, so we can't change the interface - you should change the patch so 
>> that the starting data block is the last argument and the argument is 
>> optional - so that it is compatible with the existing userspace too.
> 
> yes. Please never change interface without at least increasing target version.
> 
> I have to add userspace support as well to veritysetup and we need a way
> how to detect that option is supported by running kernel.


Understood. Thank you for the feedback. I will attempt a new patch version 
which addresses these issues. I also found that I did not correct the 
last-block boundary check so I will re-submit my patch with that as well.
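
As a sketch of the "optional last argument" shape suggested above (not the
actual dm-verity constructor): MOCK_REQUIRED_ARGS, struct mock_params and
mock_parse_args() are hypothetical names, and the required-argument count
is a placeholder rather than the real table format.

```c
#include <errno.h>
#include <stdlib.h>

/* Placeholder count of mandatory arguments; not the real verity table. */
#define MOCK_REQUIRED_ARGS 3

struct mock_params {
	unsigned long long data_start; /* starting data block, default 0 */
};

/*
 * Accept either the existing argument list or the existing list plus one
 * trailing "data start" value, so that tables written for the old
 * interface keep parsing unchanged.
 */
static int mock_parse_args(int argc, char **argv, struct mock_params *p)
{
	char *end;

	if (argc < MOCK_REQUIRED_ARGS || argc > MOCK_REQUIRED_ARGS + 1)
		return -EINVAL;

	p->data_start = 0;
	if (argc == MOCK_REQUIRED_ARGS + 1) {
		const char *s = argv[MOCK_REQUIRED_ARGS];

		/* Reject signs and whitespace that strtoull() would accept. */
		if (*s < '0' || *s > '9')
			return -EINVAL;

		errno = 0;
		p->data_start = strtoull(s, &end, 10);
		if (errno || *end != '\0')
			return -EINVAL;
	}
	return 0;
}
```

An old-style argument list leaves data_start at 0, which is how the
existing userspace tool would keep working without changes.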

I did modify veritysetup on my own so the format and verify commands will work 
with regular files on disk instead of having to mount through loop devices.
--
Wesley Miaw


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Paul Moore

On Wednesday, August 08, 2012 10:09:38 PM Eric Dumazet wrote:
> On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote:
> > Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
> > code.
> 
> Sure but it seems include file misses an accessor for this.
> 
> We could add it on a future cleanup patch, as Paul mentioned.

Actually, the issue is that the shared socket doesn't have an init/alloc 
function to do the LSM allocation like we do with other sockets so Eric's 
patch does it as part of ip_send_unicast_reply().

If we look at the relevant part of Eric's patch:

 +#ifdef CONFIG_SECURITY
 +   if (!sk->sk_security && security_sk_alloc(sk, PF_INET, GFP_ATOMIC))
 +   goto out;
 +#endif

... if we were to remove the CONFIG_SECURITY conditional we would end up 
calling security_sk_alloc() each time through in the CONFIG_SECURITY=n case as 
sk->sk_security would never be initialized to a non-NULL value.  In the 
CONFIG_SECURITY=y case it should only be called once as security_sk_alloc() 
should set sk->sk_security to a LSM blob.

> > Ifndef CONF_SECURITY then security_sk_alloc() is a static inline
> > return 0;   I guess the question is "Where did the sk come
> > from"?  Why wasn't security_sk_alloc() called when it was allocated?
> > Should it have been updated at some time and that wasn't done either?
> > Seems wrong to be putting packets on the queue for a socket where the
> > security data was never allocated and was never set to its proper
> > state.
> 
> IMHO it seems wrong to even care about security for internal sockets.
>
> They are per cpu, shared for all users on the machine.

The issue, from a security point of view, is that these sockets are sending 
network traffic; even if it is just resets and timewait ACKs, it is still 
network traffic and the LSMs need to be able to enforce security policy on 
this traffic.  After all, what would you say if your firewall let these same 
packets pass without any filtering?

The issue I'm struggling with at present is how should we handle this traffic 
from a LSM perspective.  The label based LSMs, e.g. SELinux and Smack, use the 
LSM blob assigned to locally generated outbound traffic to identify the 
traffic and apply the security policy, so not only do we have to resolve the 
issue of ensuring the traffic is labeled correctly, we have to do it with a 
shared socket (although the patch didn't change the shared nature of the 
socket).

For those who are interested, I think the reasonable labeling solution here is 
to go with SECINITSID_KERNEL/kernel_t for SELinux and likely the ambient label 
for Smack as in both the TCP reset and timewait ACK there shouldn't be any 
actual user data present.

-- 
paul moore
www.paul-moore.com


Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet

On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote:
> On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote:
> 
> > Seems wrong.  We shouldn't ever need ifdef CONFIG_SECURITY in core
> > code. 
> 
> Sure but it seems include file misses an accessor for this.
> 
> We could add it on a future cleanup patch, as Paul mentioned.

I cooked up the following patch.
But smack/smack_lsm.c makes a reference to 
smk_of_current()... so it seems we are in a hole...

It makes little sense to me to have any kind of security on these
internal sockets.

Maybe selinux should not crash if sk->sk_security is NULL ?



 include/linux/security.h   |    6 +++---
 net/core/sock.c            |    2 +-
 net/ipv4/ip_output.c       |    4 +++-
 security/security.c        |    4 ++--
 security/selinux/hooks.c   |    5 ++++-
 security/smack/smack_lsm.c |    5 ++++-
 6 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index 4e5a73c..aa648b2 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1601,7 +1601,7 @@ struct security_operations {
 	int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb);
 	int (*socket_getpeersec_stream) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len);
 	int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff *skb, u32 *secid);
-	int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority);
+	int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority, bool check);
 	void (*sk_free_security) (struct sock *sk);
 	void (*sk_clone_security) (const struct sock *sk, struct sock *newsk);
 	void (*sk_getsecid) (struct sock *sk, u32 *secid);
@@ -2539,7 +2539,7 @@ int security_sock_rcv_skb(struct sock *sk, struct sk_buff *skb);
 int security_socket_getpeersec_stream(struct socket *sock, char __user *optval,
 				      int __user *optlen, unsigned len);
 int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u32 *secid);
-int security_sk_alloc(struct sock *sk, int family, gfp_t priority);
+int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool check);
 void security_sk_free(struct sock *sk);
 void security_sk_clone(const struct sock *sk, struct sock *newsk);
 void security_sk_classify_flow(struct sock *sk, struct flowi *fl);
@@ -2667,7 +2667,7 @@ static inline int security_socket_getpeersec_dgram(struct socket *sock, struct s
 	return -ENOPROTOOPT;
 }
 
-static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority)
+static inline int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool check)
 {
 	return 0;
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f67ced..e00cadf 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1186,7 +1186,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 	if (sk != NULL) {
 		kmemcheck_annotate_bitfield(sk, flags);
 
-		if (security_sk_alloc(sk, family, priority))
+		if (security_sk_alloc(sk, family, priority, false))
 			goto out_free;
 
 		if (!try_module_get(prot->owner))
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 76dde25..b233d6e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1524,6 +1524,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 	sk->sk_priority = skb->priority;
 	sk->sk_protocol = ip_hdr(skb)->protocol;
 	sk->sk_bound_dev_if = arg->bound_dev_if;
+	if (security_sk_alloc(sk, PF_INET, GFP_ATOMIC, true))
+		goto out;
 	sock_net_set(sk, net);
 	__skb_queue_head_init(&sk->sk_write_queue);
 	sk->sk_sndbuf = sysctl_wmem_default;
@@ -1539,7 +1541,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 		skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb));
 		ip_push_pending_frames(sk, &fl4);
 	}
-
+out:
 	put_cpu_var(unicast_sock);
 
 	ip_rt_put(rt);
diff --git a/security/security.c b/security/security.c
index 860aeb3..af7404e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1146,9 +1146,9 @@ int security_socket_getpeersec_dgram(struct socket *sock, struct sk_buff *skb, u
 }
 EXPORT_SYMBOL(security_socket_getpeersec_dgram);
 
-int security_sk_alloc(struct sock *sk, int family, gfp_t priority)
+int security_sk_alloc(struct sock *sk, int family, gfp_t priority, bool check)
 {
-	return security_ops->sk_alloc_security(sk, family, priority);
+	return security_ops->sk_alloc_security(sk, family, priority, check);
 }
 
 void security_sk_free(struct sock *sk)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6c77f63..459eca6 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4289,10 +4289,13 @@ out: