date:20121106

Re: [Query]: sched/fair: prio_changed_fair()

2012-11-06 Thread Viresh Kumar

On 7 November 2012 13:26, Michael Wang  wrote:
> It's the user nice value I suppose, so it should be reversed when we are
> talking about weight.

Ahh.. I knew it .. How could I miss it.

Sorry for the noise :(

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Query]: sched/fair: prio_changed_fair()

2012-11-06 Thread Michael Wang

On 11/07/2012 03:49 PM, Viresh Kumar wrote:
Hi, Viresh
> Hi Ingo/Peter,
> 
> I am trying to understand the complex scheduler code and just found
> something incorrect (maybe i am not reading it well):
> 
> File: kernel/sched/fair.c
> 
> static void
> prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
> {
>   if (!p->se.on_rq)
>   return;
> 
>   /*
>* Reschedule if we are currently running on this runqueue and
>* our priority decreased, or if we are not currently running on
>* this runqueue and our priority is higher than the current's
>*/
>   if (rq->curr == p) {
>   if (p->prio > oldprio)
>   resched_task(rq->curr);
>   } else
>   check_preempt_curr(rq, p, 0);
> }
> 
> 
> Comment says that we must mark the task to be rescheduled, if we
> are currently running and our priority has decreased. But in code we
> are checking (p->prio > oldprio). i.e. reschedule if we were currently
> running and our priority increased.

It's the user nice value I suppose, so it should be reversed when we are
talking about weight.
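
To make that concrete, here is a minimal userspace sketch (not kernel code).
It assumes the usual mapping of nice values onto p->prio for normal tasks,
prio = 120 + nice, so a numerically larger prio means a lower priority. The
helper names are made up for illustration.

```c
#include <stdbool.h>

/* Assumed mapping for normal tasks: nice -20..19 maps to prio 100..139. */
#define SKETCH_DEFAULT_PRIO 120

static int sketch_nice_to_prio(int nice)
{
	return SKETCH_DEFAULT_PRIO + nice;
}

/*
 * Mirrors the (p->prio > oldprio) test for the running task: a larger
 * number means the task became less important, so the comment's
 * "priority decreased" and the code's "prio increased" agree.
 */
static bool sketch_priority_decreased(int oldprio, int newprio)
{
	return newprio > oldprio;
}
```

Going from nice 0 to nice 5 moves prio from 120 to 125, which is a priority
decrease and would trigger the resched in the quoted function.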

Regards,
Michael Wang

> 
> Sorry if i am wrong :(
> 
> --
> viresh
> 


Re: [patch] mm: fix build warning for uninitialized value

2012-11-06 Thread Haggai Eran

On 05/11/2012 23:36, David Rientjes wrote:
> do_wp_page() sets mmun_called if mmun_start and mmun_end were initialized 
> and, if so, may call mmu_notifier_invalidate_range_end() with these 
> values.  This doesn't prevent gcc from emitting a build warning though:
> 
> mm/memory.c: In function ‘do_wp_page’:
> mm/memory.c:2530: warning: ‘mmun_start’ may be used uninitialized in this 
> function
> mm/memory.c:2531: warning: ‘mmun_end’ may be used uninitialized in this 
> function

I haven't seen these warnings. Perhaps I used a different compiler
version, or didn't use the right flags.

> 
> It's much easier to initialize the variables to impossible values and do a 
> simple comparison to determine if they were initialized to remove the bool 
> entirely.

This solution looks great to me.
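
For readers skimming the diff, the idea can be sketched in plain C as below.
This is only an illustration under my own assumptions: the struct and helper
names are hypothetical, and page_size is assumed to be a power of two.

```c
#include <stdbool.h>

struct sketch_range {
	unsigned long start;
	unsigned long end;
};

/* Both bounds start at 0, an "impossible" range since end > start is false. */
static void sketch_range_init(struct sketch_range *r)
{
	r->start = 0;
	r->end = 0;
}

/* Cover exactly the page containing addr (page_size must be a power of two). */
static void sketch_range_set_page(struct sketch_range *r, unsigned long addr,
				  unsigned long page_size)
{
	r->start = addr & ~(page_size - 1);
	r->end = r->start + page_size;
}

/* Replaces the separate bool: the range counts as set only once end > start. */
static bool sketch_range_is_set(const struct sketch_range *r)
{
	return r->end > r->start;
}
```

The comparison doubles as the "was it initialized" flag, which is why the
bool can go away.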

> 
> Signed-off-by: David Rientjes 
> ---
>  mm/memory.c |   10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2527,9 +2527,8 @@ static int do_wp_page(struct mm_struct *mm, struct 
> vm_area_struct *vma,
>   int ret = 0;
>   int page_mkwrite = 0;
>   struct page *dirty_page = NULL;
> - unsigned long mmun_start;   /* For mmu_notifiers */
> - unsigned long mmun_end; /* For mmu_notifiers */
> - bool mmun_called = false;   /* For mmu_notifiers */
> + unsigned long mmun_start = 0;   /* For mmu_notifiers */
> + unsigned long mmun_end = 0; /* For mmu_notifiers */
>  
>   old_page = vm_normal_page(vma, address, orig_pte);
>   if (!old_page) {
> @@ -2708,8 +2707,7 @@ gotten:
>   goto oom_free_new;
>  
>   mmun_start  = address & PAGE_MASK;
> - mmun_end= (address & PAGE_MASK) + PAGE_SIZE;
> - mmun_called = true;
> + mmun_end= mmun_start + PAGE_SIZE;
>   mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>  
>   /*
> @@ -2778,7 +2776,7 @@ gotten:
>   page_cache_release(new_page);
>  unlock:
>   pte_unmap_unlock(page_table, ptl);
> - if (mmun_called)
> + if (mmun_end > mmun_start)
>   mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>   if (old_page) {
>   /*
> 


Re: [RFC v4+ hot_track 14/19] vfs: add debugfs support

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 7:45 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:56PM +0800, zwu.ker...@gmail.com wrote:
>> +static int hot_range_seq_show(struct seq_file *seq, void *v)
>> +{
>> + struct hot_range_item *hr = v;
>> + struct hot_inode_item *he = hr->hot_inode;
>> + struct hot_freq_data *freq_data = &hr->hot_range.hot_freq_data;
>> +
>> + /* Always lock hot_inode_item first */
>> + spin_lock(&he->hot_inode.lock);
>> + spin_lock(&hr->hot_range.lock);
>> + seq_printf(seq, "inode #%llu, range start " \
>
> the # seems unnecessary to me
OK, removed.
>
>> + "%llu (range len %u) reads %u, writes %u, "
>> + "avg read time %llu, avg write time %llu, temp %u\n",
>
> compiler will complain if it sees a %llu format and not the expected
> type of 'unsigned long long'
When I built it, I didn't see any warning about this...
>
>> + he->i_ino,
>
> (unsigned long long)he->i_ino,
>
>> + (u64)hr->start * RANGE_SIZE,
>> + hr->len,
>> + freq_data->nr_reads,
>> + freq_data->nr_writes,
>> + freq_data->avg_delta_reads / NSEC_PER_MSEC,
>> + freq_data->avg_delta_writes / NSEC_PER_MSEC,
>> + freq_data->last_temp >> (32 - HEAT_MAP_BITS));
>> + spin_unlock(&hr->hot_range.lock);
>> + spin_unlock(&he->hot_inode.lock);
>> +
>> + return 0;
>> +}
>> +
>> +static int hot_inode_seq_show(struct seq_file *seq, void *v)
>> +{
>> + struct hot_inode_item *he = v;
>> + struct hot_freq_data *freq_data = &he->hot_inode.hot_freq_data;
>> +
>> + spin_lock(&he->hot_inode.lock);
>> + seq_printf(seq, "inode #%llu, reads %u, writes %u, " \
>> + "avg read time %llu, avg write time %llu, temp %u\n",
>
> (same here)
ditto.
>
>> + he->i_ino,
>> + freq_data->nr_reads,
>> + freq_data->nr_writes,
>> + freq_data->avg_delta_reads / NSEC_PER_MSEC,
>> + freq_data->avg_delta_writes / NSEC_PER_MSEC,
>> + freq_data->last_temp >> (32 - HEAT_MAP_BITS));
>> + spin_unlock(&he->hot_inode.lock);
>> +
>> + return 0;
>> +}
>>
>> +static void *hot_spot_range_seq_next(struct seq_file *seq, void *v, loff_t 
>> *pos)
>> +{
>> + struct hot_info *root = seq->private;
>> + struct hot_range_item *hr_next, *hr = v;
>> + struct hot_comm_item *comm_item;
>> + struct list_head *n_list;
>> + int i =
>> +  hr->hot_range.hot_freq_data.last_temp >> (32 - HEAT_MAP_BITS);
>
> now I have noticed that I've seen the ... (32 - HEAT_MAP_BITS)
> expression so many times that I tend to think it deserves a helper
> function
This helper function already exists: hot_raw_shift(). I will use it here instead.
>
>> +
>> + n_list = seq_list_next(&hr->hot_range.n_list,
>> + &root->heat_range_map[i].node_list, pos);
>> + hot_range_item_put(hr);
>> +next:
>> + if (n_list) {
>> + comm_item = container_of(n_list,
>> + struct hot_comm_item, n_list);
>> + hr_next = container_of(comm_item,
>> + struct hot_range_item, hot_range);
>> + kref_get(&hr_next->hot_range.refs);
>> + return hr_next;
>> + } else if (--i >= 0) {
>> + n_list = seq_list_next(&root->heat_range_map[i].node_list,
>> + &root->heat_range_map[i].node_list, pos);
>> + goto next;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static void hot_debugfs_exit(struct super_block *sb)
>> +{
>> + struct dentry *vol_dentry;
>> +
>> + vol_dentry = debugfs_get_dentry(sb->s_id,
>> + sb->s_hot_root->debugfs_root, 
>> strlen(sb->s_id));
>> + /* remove all debugfs entries recursively from the volume root */
>> + if (vol_dentry)
>> + debugfs_remove_recursive(vol_dentry);
>> + else
>> + BUG_ON(1);
>
> BUG()
done, thanks.
>
>> +
>> + if (list_empty(&sb->s_hot_root->debugfs_root->d_subdirs))
>> + debugfs_remove(sb->s_hot_root->debugfs_root);
>> +}
>> +
>> +/*
>
> david
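
On the %llu comment above, a small userspace sketch of the suggested cast.
It assumes the inode number is an unsigned long, as i_ino is; the function
name is hypothetical.

```c
#include <stdio.h>

/*
 * Format an inode number of type unsigned long with %llu. The explicit
 * cast makes the argument type match the conversion on every ABI.
 */
static int sketch_format_ino(char *buf, size_t len, unsigned long ino)
{
	return snprintf(buf, len, "inode %llu", (unsigned long long)ino);
}
```

Without the cast, the argument type and the conversion specifier disagree
wherever unsigned long is narrower than unsigned long long.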



-- 
Regards,

Zhi Yong Wu

[Query]: sched/fair: prio_changed_fair()

2012-11-06 Thread Viresh Kumar

Hi Ingo/Peter,

I am trying to understand the complex scheduler code and just found
something incorrect (maybe i am not reading it well):

File: kernel/sched/fair.c

static void
prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
{
if (!p->se.on_rq)
return;

/*
 * Reschedule if we are currently running on this runqueue and
 * our priority decreased, or if we are not currently running on
 * this runqueue and our priority is higher than the current's
 */
if (rq->curr == p) {
if (p->prio > oldprio)
resched_task(rq->curr);
} else
check_preempt_curr(rq, p, 0);
}


Comment says that we must mark the task to be rescheduled, if we
are currently running and our priority has decreased. But in code we
are checking (p->prio > oldprio). i.e. reschedule if we were currently
running and our priority increased.

Sorry if i am wrong :(

--
viresh

Re: [PATCH] [perf] convert_variable_type does not correctly check type of arrays

2012-11-06 Thread Namhyung Kim

Hi Hannes,

On Mon, 5 Nov 2012 23:49:16 +0100, Hannes Frederic Sowa wrote:
> While casting an array of (unsigned) chars to a string, perf does not
> check the containing type but only the opaque type and is bailing out:
>
>   $ perf probe -v -a 'neigh_destroy:22 dev->name:string'
>   probe-definition(0): neigh_destroy:22 dev->name:string
>   symbol:neigh_destroy file:(null) line:22 offset:0 return:0 lazy:(null)
>   parsing arg: dev->name:string into type:string dev, name(1)
>   1 arguments
>   Use vmlinux: /home/hannes/linux/vmlinux
>   Using /home/hannes/linux/vmlinux for symbols
>   Probe point found: neigh_destroy+115
>   Searching 'dev' variable in context.
>   Converting variable dev into trace event.
>   converting name in dev
>   name type is (null).
>   Failed to cast into string: name is not (unsigned) char *.
>   Failed to find 'dev' in this function.
>   An error occurred in debuginfo analysis (-22).
> Error: Failed to add events. (-22)
>
> After the code flow ensures that type could only be a pointer or
> array type, call die_get_real_type unconditionally again to fetch the
> containing type and have further validation been done on that Die.

Hyeoncheol posted the same fix before, but it is not merged yet. Arnaldo?

  https://lkml.org/lkml/2012/9/20/3

Thanks,
Namhyung


[PATCH] firmware loader: Fix the race FW_STATUS_DONE is followed by class_timeout

2012-11-06 Thread Chuansheng Liu


There is a race condition as below when calling request_firmware():

CPU1                            CPU2
write 0 > loading
mutex_lock(&fw_lock);
...
set_bit FW_STATUS_DONE          class_timeout is coming
                                set_bit FW_STATUS_ABORT
complete_all
...
mutex_unlock(&fw_lock)

At this point, both the FW_STATUS_DONE and FW_STATUS_ABORT bits are set,
and request_firmware() will return failure due to the condition in
_request_firmware_load():

	if (!buf->size || test_bit(FW_STATUS_ABORT, &buf->status))
		retval = -ENOENT;

But in the above scenario, the request should succeed.
So we need to check whether FW_STATUS_DONE is set before calling abort
in the timeout function firmware_class_timeout().

Signed-off-by: liu chuansheng 
---
 drivers/base/firmware_class.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 8945f4e..35fffd8 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -671,6 +671,13 @@ static void firmware_class_timeout(u_long data)
 {
 	struct firmware_priv *fw_priv = (struct firmware_priv *) data;
 
+	mutex_lock(&fw_lock);
+	if (test_bit(FW_STATUS_DONE, &(fw_priv->buf->status))) {
+		mutex_unlock(&fw_lock);
+		return;
+	}
+	mutex_unlock(&fw_lock);
+
 	fw_load_abort(fw_priv);
 }
 
-- 
1.7.0.4
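
The "check DONE before aborting" idea from the description can be sketched
in userspace C with a pthread mutex standing in for fw_lock. All names here
are hypothetical. Note one deliberate difference from the patch: the sketch
sets the abort flag inside the same critical section as the check, whereas
the patch drops the lock before calling fw_load_abort().

```c
#include <pthread.h>
#include <stdbool.h>

enum { SKETCH_DONE = 1u << 0, SKETCH_ABORT = 1u << 1 };

struct sketch_fw {
	pthread_mutex_t lock;
	unsigned int status;
};

/* Loader side: mark the request as completed. */
static void sketch_fw_done(struct sketch_fw *fw)
{
	pthread_mutex_lock(&fw->lock);
	fw->status |= SKETCH_DONE;
	pthread_mutex_unlock(&fw->lock);
}

/* Timeout side: abort only if the load has not already finished. */
static bool sketch_fw_timeout(struct sketch_fw *fw)
{
	bool aborted = false;

	pthread_mutex_lock(&fw->lock);
	if (!(fw->status & SKETCH_DONE)) {
		fw->status |= SKETCH_ABORT;
		aborted = true;
	}
	pthread_mutex_unlock(&fw->lock);
	return aborted;
}
```

With this ordering, a timeout that arrives after completion leaves the
status at DONE only, so the waiter would not see a spurious abort.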




[PATCH] Update start_pfn in zone and pg_data when spanned_pages == 0.

2012-11-06 Thread Tang Chen


On 10/23/2012 06:30 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu

..

+   /* The zone has no valid section */
+   zone->zone_start_pfn = 0;
+   zone->spanned_pages = 0;
+   zone_span_writeunlock(zone);
+}
+
+static void shrink_pgdat_span(struct pglist_data *pgdat,
+ unsigned long start_pfn, unsigned long end_pfn)
+{

..

+   /* The pgdat has no valid section */
+   pgdat->node_start_pfn = 0;
+   pgdat->node_spanned_pages = 0;
+}


Hi,

If we hot-remove only the memory and leave the cpus alive, the corresponding
node will not be removed. But node_start_pfn and node_spanned_pages
in pg_data will be reset to 0. In this case, when we hot-add the memory
back, node_start_pfn will always stay 0 because no pfn is less
than 0. After that, if we hot-remove the memory again, the kernel will
panic in find_biggest_section_pfn() when it tries to scan all the pfns.


The zone will also have the same problem.

This patch sets start_pfn to the start_pfn of the section being added when
spanned_pages of the zone or pg_data is 0.

---How to reproduce---

1. hot-add a container with some memory and cpus;
2. hot-remove the container's memory, and leave cpus there;
3. hot-add this memory again;
4. hot-remove them again;

then, the kernel will panic.

---Call trace---

[10530.646285] BUG: unable to handle kernel paging request at 0fff82a8cc38
[10530.729670] IP: [] find_biggest_section_pfn+0xe5/0x180
..
[10533.064975] Call Trace:
[10533.094162]  [] ? __remove_zone+0x2f/0x1b0
[10533.161757]  [] __remove_zone+0x184/0x1b0
[10533.228318]  [] __remove_section+0x8c/0xb0
[10533.295916]  [] __remove_pages+0xe7/0x120
[10533.362476]  [] arch_remove_memory+0x2c/0x80
[10533.432151]  [] remove_memory+0x56/0x90
[10533.496633]  [] acpi_memory_device_remove_memory+0x48/0x73
[10533.580846]  [] acpi_memory_device_notify+0x153/0x274
[10533.659865]  [] ? acpi_bus_get_device+0x2f/0x77
[10533.732653]  [] ? acpi_bus_notify+0xb5/0xec
[10533.801291]  [] acpi_ev_notify_dispatch+0x41/0x5f
[10533.876156]  [] acpi_os_execute_deferred+0x27/0x34
[10533.952062]  [] process_one_work+0x219/0x680
[10534.021736]  [] ? process_one_work+0x1b8/0x680
[10534.093488]  [] ? acpi_os_wait_events_complete+0x23/0x23
[10534.175622]  [] worker_thread+0x12e/0x320
[10534.242181]  [] ? manage_workers+0x110/0x110
[10534.311855]  [] kthread+0xc6/0xd0
[10534.370111]  [] kernel_thread_helper+0x4/0x10
[10534.440824]  [] ? retint_restore_args+0x13/0x13
[10534.513612]  [] ? __init_kthread_worker+0x70/0x70
[10534.588480]  [] ? gs_change+0x13/0x13
..
[10535.045543] ---[ end trace 96d845dbf33fee11 ]---


Signed-off-by: Tang Chen 
---
 mm/memory_hotplug.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 56b758a..4aa313c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -212,7 +212,7 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
 	zone_span_writelock(zone);
 
 	old_zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
-	if (start_pfn < zone->zone_start_pfn)
+	if (!zone->spanned_pages || start_pfn < zone->zone_start_pfn)
 		zone->zone_start_pfn = start_pfn;
 
 	zone->spanned_pages = max(old_zone_end_pfn, end_pfn) -
@@ -227,7 +227,7 @@ static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
 	unsigned long old_pgdat_end_pfn =
 		pgdat->node_start_pfn + pgdat->node_spanned_pages;
 
-	if (start_pfn < pgdat->node_start_pfn)
+	if (!pgdat->node_spanned_pages || start_pfn < pgdat->node_start_pfn)
 		pgdat->node_start_pfn = start_pfn;
 
 	pgdat->node_spanned_pages = max(old_pgdat_end_pfn, end_pfn) -
--
1.7.10.1
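
The effect of the extra !spanned_pages test can be tried out in a small
userspace sketch. The struct and function below are hypothetical stand-ins
for the zone/pgdat span bookkeeping, not the kernel code itself.

```c
struct sketch_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* Grow the span so that it covers [start_pfn, end_pfn). */
static void sketch_grow_span(struct sketch_span *s, unsigned long start_pfn,
			     unsigned long end_pfn)
{
	unsigned long old_end = s->start_pfn + s->spanned_pages;

	/* An empty span has no meaningful start, so always take the new one. */
	if (!s->spanned_pages || start_pfn < s->start_pfn)
		s->start_pfn = start_pfn;

	if (end_pfn > old_end)
		old_end = end_pfn;
	s->spanned_pages = old_end - s->start_pfn;
}
```

Starting from an empty span {0, 0} and adding [0x1000, 0x2000), the sketch
ends with start 0x1000 and 0x1000 spanned pages. Without the emptiness test
the start would stay at 0 and the span would wrongly cover 0x2000 pages.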


Re: [PATCH v2 00/16] perf: add memory access sampling support

2012-11-06 Thread Namhyung Kim

Hi Arnaldo,

On Tue, 6 Nov 2012 17:52:21 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
> [root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
> # Samples: 30  of event 'cpu/mem-loads/pp'
> # Total cost : 640
> # Sort order : symbol,symbol_daddr,cost
> #
> # Overhead  Samples  Symbol                  Data Symbol         Cost
> # ........  .......  ......................  ..................  ....
> #
>   55.00%          1  [k] lookup_fast         [k] 0x8803b7521bd4   352
>    5.47%          1  [k] cache_alloc_refill  [k] 0x880407705024    35
>    3.44%          1  [k] cache_alloc_refill  [k] 0x88041d8527d8    22
>    3.28%          1  [k] run_timer_softirq   [k] 0x88041e2c3e90    21
>    2.50%          1  [k] __list_add          [k] 0x8803b7521d68    16
>    2.19%          1  [.] __strcoll_l         [.] 0x7fffa8d44080    14
>    1.88%          1  [.] __strcoll_l         [.] 0x7fffa8d44104    12
>
> If we go to the annotation browser to see where is that lookup_fast hitting 
> we get:
>
> 100.00 │   mov    -0x34(%rbp),%eax
>
> How to map 0x8803b7521bd4 to a stack variable, struct members and all?
>
> Humm, for userspace we have PERF_SAMPLE_REGS_USER for the dwarf unwinder we
> need for userspace, but what about reverse mapping of kernel variables? Jiri?

I suspect there isn't much we can do for the stack.  One thing we can
do is add dso_daddr to the sort keys and see whether it's a [stack] (or
[stack:tid]) or not.  But I'm not sure it can be done for kernel stacks too.

Thanks,
Namhyung

Re: [RFC v4+ hot_track 12/19] vfs: add one ioctl interface

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 7:30 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:54PM +0800, zwu.ker...@gmail.com wrote:
>> +static int ioctl_heat_info(struct file *file, void __user *argp)
>> +{
>> + struct inode *inode = file->f_dentry->d_inode;
>> + struct hot_heat_info *heat_info;
>> + struct hot_inode_item *he;
>> + int ret = 0;
>> +
>> + heat_info = kmalloc(sizeof(struct hot_heat_info),
>> + GFP_KERNEL | GFP_NOFS);
>
> heat_info is small enough to fit onto the stack, so you can avoid the
> kmalloc, I don't think there are deep callstacks to be expected.
ok, done.
> Nevertheless, if you want to use kmalloc here, then please check the
> return value and use GFP_KERNEL.
Thanks for pointing that out.
>
>> +
>> + if (copy_from_user((void *) heat_info,
>> + argp,
>> + sizeof(struct hot_heat_info)) != 0) {
>> + ret = -EFAULT;
>> + goto err;
>> + }
>> +
>> + he = hot_inode_item_find(inode->i_sb->s_hot_root, inode->i_ino);
>> + if (!he) {
>> + /* we don't have any info on this file yet */
>> + ret = -ENODATA;
>> + goto err;
>> + }
>> +
>> + spin_lock(&he->hot_inode.lock);
>> + heat_info->avg_delta_reads =
>> + (__u64) he->hot_inode.hot_freq_data.avg_delta_reads;
>> + heat_info->avg_delta_writes =
>> + (__u64) he->hot_inode.hot_freq_data.avg_delta_writes;
>> + heat_info->last_read_time =
>> + (__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_read_time);
>> + heat_info->last_write_time =
>> + (__u64) timespec_to_ns(&he->hot_inode.hot_freq_data.last_write_time);
>> + heat_info->num_reads =
>> + (__u32) he->hot_inode.hot_freq_data.nr_reads;
>> + heat_info->num_writes =
>> + (__u32) he->hot_inode.hot_freq_data.nr_writes;
>> +
>> + if (heat_info->live > 0) {
>> + /*
>> +  * got a request for live temperature,
>> +  * call hot_hash_calc_temperature to recalculate
>> +  */
>> + heat_info->temp =
>> + inode->i_sb->s_hot_root->hot_func_type->ops.hot_temp_calc_fn(
>> + &he->hot_inode.hot_freq_data);
>> + } else {
>> + /* not live temperature, get it from the hashlist */
>> + heat_info->temp = he->hot_inode.hot_freq_data.last_temp;
>> + }
>> + spin_unlock(&he->hot_inode.lock);
>> +
>> + hot_inode_item_put(he);
>> +
>> + if (copy_to_user(argp, (void *) heat_info,
>> + sizeof(struct hot_heat_info))) {
>> + ret = -EFAULT;
>> + goto err;
>> + }
>> +
>> +err:
>> + kfree(heat_info);
>> + return ret;
>> +}
>
> david



-- 
Regards,

Zhi Yong Wu

Re: SPARC and OF_GPIO

2012-11-06 Thread David Miller

From: Thierry Reding 
Date: Wed, 7 Nov 2012 07:52:58 +0100

> It seems like OF_ADDRESS would be trickier. A comment around line 60 in
> drivers/of/platform.c says that SPARC doesn't need functions defined in
> the enclosing #ifdef CONFIG_OF_ADDRESS block. I'm not sure it would be
> acceptable to remove the conflict nonetheless, even if the functions
> aren't used. One benefit would be that the code could receive some extra
> compile coverage.
 ...
> Finally, OF_IRQ is again just generic code to map device tree data to
> IRQ domains. While I didn't see the IRQ_DOMAIN symbol selected anywhere
> in SPARC it should still be possible to run drivers that properly
> implement IRQ domains on SPARC, right? Or is there any reason why they
> wouldn't work?

These are the two most conflicted areas for Sparc.

For addresses, we compute the fully resolved physical
address of all registers of an OF device very early at bootup time
when we first scan the device tree.

The same goes for interrupts: we fully compute them early in the bootup
process.  Also, we support multiple interrupts for a device.

[PATCH 2/6] perf tools: Set kernel data mapping length

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

Currently only the text (function) mapping is set, so kernel
data addresses cannot be parsed correctly.  Fix it.

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/machine.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 34d8dfeaa2b2..fc471704af12 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -84,15 +84,19 @@ int machine__process_lost_event(struct machine *machine 
__maybe_unused,
 static void machine__set_kernel_mmap_len(struct machine *machine,
 union perf_event *event)
 {
-   machine->vmlinux_maps[MAP__FUNCTION]->start = event->mmap.start;
-   machine->vmlinux_maps[MAP__FUNCTION]->end   = (event->mmap.start +
-  event->mmap.len);
-   /*
-* Be a bit paranoid here, some perf.data file came with
-* a zero sized synthesized MMAP event for the kernel.
-*/
-   if (machine->vmlinux_maps[MAP__FUNCTION]->end == 0)
-   machine->vmlinux_maps[MAP__FUNCTION]->end = ~0ULL;
+   int i;
+
+   for (i = 0; i < MAP__NR_TYPES; i++) {
+   machine->vmlinux_maps[i]->start = event->mmap.start;
+   machine->vmlinux_maps[i]->end   = (event->mmap.start +
+  event->mmap.len);
+   /*
+* Be a bit paranoid here, some perf.data file came with
+* a zero sized synthesized MMAP event for the kernel.
+*/
+   if (machine->vmlinux_maps[i]->end == 0)
+   machine->vmlinux_maps[i]->end = ~0ULL;
+   }
 }
 
 static int machine__process_kernel_mmap_event(struct machine *machine,
-- 
1.7.11.7
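
A stripped-down sketch of what the loop does, for anyone reading without
the perf sources at hand. The types and enum below are hypothetical
stand-ins for struct map and MAP__NR_TYPES.

```c
#include <stdint.h>

enum sketch_map_type {
	SKETCH_MAP_FUNCTION,
	SKETCH_MAP_VARIABLE,
	SKETCH_MAP_NR_TYPES
};

struct sketch_kmap {
	uint64_t start;
	uint64_t end;
};

/* Apply one kernel mmap range to every map type, not just the text map. */
static void sketch_set_kernel_mmap_len(struct sketch_kmap maps[SKETCH_MAP_NR_TYPES],
				       uint64_t start, uint64_t len)
{
	int i;

	for (i = 0; i < SKETCH_MAP_NR_TYPES; i++) {
		maps[i].start = start;
		maps[i].end = start + len;
		/* A zero end means the event carried no usable size. */
		if (maps[i].end == 0)
			maps[i].end = ~0ULL;
	}
}
```

After the call, the variable (data) map has the same bounds as the function
map, which is what lets data addresses resolve to the kernel map.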


[PATCH 4/6] perf tools: Ignore ABS symbols when loading data maps

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

When loading symbols in a data mapping, ABS symbols (which have a value
of SHN_ABS in st_shndx) fail at elf_getscn().  That marks the whole
load as a failure, so already loaded symbols cannot be fixed up.

I'm not sure what should be done. Just ignore them for now. :)

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/symbol-elf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index db0cc92cf2ea..00cf128e26f4 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -719,6 +719,9 @@ int dso__load_sym(struct dso *dso, struct map *map,
used_opd = true;
}
 
+   if (sym.st_shndx == SHN_ABS)
+   continue;
+
sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
if (!sec)
goto out_elf_end;
-- 
1.7.11.7
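
The skip can be illustrated outside of perf with the plain <elf.h> types
(assuming a Linux/glibc system that ships that header). The function name is
hypothetical, and counting is used only to keep the sketch small.

```c
#include <elf.h>
#include <stddef.h>

/* Count the symbols that live in a real section, skipping absolute ones. */
static size_t sketch_count_section_syms(const Elf64_Sym *syms, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		/* SHN_ABS is not a section index, so there is no section to look up. */
		if (syms[i].st_shndx == SHN_ABS)
			continue;
		count++;
	}
	return count;
}
```

SHN_ABS is a reserved index rather than a real section number, which is
consistent with the section lookup failing for such symbols.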


[PATCH 6/6] perf tools: Free {branch,mem}_info when freeing hist_entry

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

That data should be freed along with the associated hist_entry,
otherwise it will be leaked.

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/hist.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 52fe4e24502b..50030133fb3b 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -470,6 +470,8 @@ hist_entry__collapse(struct hist_entry *left, struct 
hist_entry *right)
 
 void hist_entry__free(struct hist_entry *he)
 {
+   free(he->branch_info);
+   free(he->mem_info);
free(he);
 }
 
-- 
1.7.11.7


[PATCH 3/6] perf tools: Fix detection of stack area

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

Output of /proc/<pid>/maps contains helpful information about anonymous
mappings like stack, heap, ...  For the stack, it can show
multiple stack areas, one for each thread in the process:

  $ cat /proc/$(pidof gnome-shell)/maps | grep stack
  7fe019946000-7fe01a146000 rw-p  00:00 0  [stack:1624]
  7fe040e32000-7fe041632000 rw-p  00:00 0  [stack:1451]
  7fe041643000-7fe041e43000 rw-p  00:00 0  [stack:1450]
  7fe04204b000-7fe04284b000 rw-p  00:00 0  [stack:1449]
  7fe042a7e000-7fe04327e000 rw-p  00:00 0  [stack:1446]
  7fe0432ff000-7fe043aff000 rw-p  00:00 0  [stack:1445]
  7fe043b0-7fe04430 rw-p  00:00 0  [stack:1444]
  7fe044301000-7fe044b01000 rw-p  00:00 0  [stack:1443]
  7fe044b02000-7fe045302000 rw-p  00:00 0  [stack:1442]
  7fe045303000-7fe045b03000 rw-p  00:00 0  [stack:1441]
  7fe045b04000-7fe046304000 rw-p  00:00 0  [stack:1440]
  7fe046305000-7fe046b05000 rw-p  00:00 0  [stack:1439]
  7fe046b06000-7fe047306000 rw-p  00:00 0  [stack:1438]
  7fff4b16f000-7fff4b19 rw-p  00:00 0  [stack]

However, perf only knew about the main thread's stack.  Fix it.

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 9b40c444039c..579187865f08 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -24,7 +24,7 @@ static inline int is_anon_memory(const char *filename)
 
 static inline int is_no_dso_memory(const char *filename)
 {
-   return !strcmp(filename, "[stack]") ||
+   return !strncmp(filename, "[stack", 6) ||
   !strcmp(filename, "[heap]");
 }
 
-- 
1.7.11.7
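
The changed predicate is easy to try standalone. The sketch below copies
the logic into a userspace function with a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* True for "[heap]", "[stack]" and per-thread "[stack:<tid>]" names. */
static bool sketch_is_no_dso_memory(const char *filename)
{
	return !strncmp(filename, "[stack", 6) ||
	       !strcmp(filename, "[heap]");
}
```

The prefix match is intentionally loose: any name starting with "[stack"
is accepted, which covers both the plain and the per-thread form.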


[PATCH 5/6] perf tools: Fix output of symbol_daddr offset

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

The symbol addresses in a dso are relative offsets from the start of
a mapping.  So in order to output the correct offset value from @ip, one of
them should be converted.

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/sort.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e2e466d24c16..166480ca2865 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -186,7 +186,7 @@ static int _hist_entry__sym_snprintf(struct map *map, 
struct symbol *sym,
if (map->type == MAP__VARIABLE) {
ret += repsep_snprintf(bf + ret, size - ret, "%s", 
sym->name);
ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
-   ip - sym->start);
+   ip - map->unmap_ip(map, sym->start));
ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
   width - ret, "");
} else {
-- 
1.7.11.7
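
A small numeric sketch of the address conversion. It assumes that unmapping
a map-relative address means adding the map start and subtracting the page
offset; the struct and function names are hypothetical and not perf's own.

```c
#include <stdint.h>

struct sketch_dmap {
	uint64_t start;   /* absolute address where the mapping begins */
	uint64_t pgoff;   /* file offset of the mapping */
};

/* Assumed conversion from a map-relative address to an absolute one. */
static uint64_t sketch_unmap_ip(const struct sketch_dmap *map, uint64_t rel)
{
	return rel + map->start - map->pgoff;
}

/* Offset of an absolute address within a symbol whose start is map-relative. */
static uint64_t sketch_sym_offset(const struct sketch_dmap *map,
				  uint64_t ip, uint64_t sym_start_rel)
{
	return ip - sketch_unmap_ip(map, sym_start_rel);
}
```

With a mapping at 0x400000, a symbol at relative 0x1000 and an absolute
address of 0x401010, the offset is 0x10. Subtracting the relative start
directly would give 0x400010 instead.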


[PATCH 1/6] perf tools: Synthesize data mmap events for threads

2012-11-06 Thread Namhyung Kim

From: Namhyung Kim 

Current perf_event__synthesize_mmap_events() only deals with
executable mappings.  With upcoming memory access sampling,
non-executable data mappings are needed also.

While at it, convert parsing code to use sscanf which makes
the code cleaner IMHO.

Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/event.c | 78 ++---
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index ca9ca285406a..068acf606b40 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -193,55 +193,47 @@ static int perf_event__synthesize_mmap_events(struct 
perf_tool *tool,
event->header.misc = PERF_RECORD_MISC_USER;
 
while (1) {
-   char bf[BUFSIZ], *pbf = bf;
-   int n;
+   char bf[BUFSIZ];
+   char prot[5], *pprot = prot;
size_t size;
+   char exec_name[PATH_MAX], *pexec = exec_name;
+   char anonstr[] = "//anon";
+
if (fgets(bf, sizeof(bf), fp) == NULL)
break;
 
+   /* ensure null termination since stack will be reused */
+   strcpy(exec_name, "");
+
/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
-   n = hex2u64(pbf, &event->mmap.start);
-   if (n < 0)
-   continue;
-   pbf += n + 1;
-   n = hex2u64(pbf, &event->mmap.len);
-   if (n < 0)
+   sscanf(bf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n",
+  &event->mmap.start, &event->mmap.len, pprot,
+  &event->mmap.pgoff, pexec);
+
+   if (prot[2] == 'x' && !strcmp(exec_name, ""))
+   strcpy(exec_name, anonstr);
+
+   /* ignore non-executable anon mappings */
+   if (!strcmp(exec_name, ""))
continue;
-   pbf += n + 3;
-   if (*pbf == 'x') { /* vm_exec */
-   char anonstr[] = "//anon\n";
-   char *execname = strchr(bf, '/');
-
-   /* Catch VDSO */
-   if (execname == NULL)
-   execname = strstr(bf, "[vdso]");
-
-   /* Catch anonymous mmaps */
-   if ((execname == NULL) && !strstr(bf, "["))
-   execname = anonstr;
-
-   if (execname == NULL)
-   continue;
-
-   pbf += 3;
-   n = hex2u64(pbf, &event->mmap.pgoff);
-
-   size = strlen(execname);
-   execname[size - 1] = '\0'; /* Remove \n */
-   memcpy(event->mmap.filename, execname, size);
-   size = PERF_ALIGN(size, sizeof(u64));
-   event->mmap.len -= event->mmap.start;
-   event->mmap.header.size = (sizeof(event->mmap) -
-   (sizeof(event->mmap.filename) - size));
-   memset(event->mmap.filename + size, 0, machine->id_hdr_size);
-   event->mmap.header.size += machine->id_hdr_size;
-   event->mmap.pid = tgid;
-   event->mmap.tid = pid;
-
-   if (process(tool, event, &synth_sample, machine) != 0) {
-   rc = -1;
-   break;
-   }
+
+   if (prot[2] != 'x')
+   event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+
+   size = strlen(exec_name) + 1;
+   memcpy(event->mmap.filename, exec_name, size);
+   size = PERF_ALIGN(size, sizeof(u64));
+   event->mmap.len -= event->mmap.start;
+   event->mmap.header.size = (sizeof(event->mmap) -
+  (sizeof(event->mmap.filename) - size));
+   memset(event->mmap.filename + size, 0, machine->id_hdr_size);
+   event->mmap.header.size += machine->id_hdr_size;
+   event->mmap.pid = tgid;
+   event->mmap.tid = pid;
+
+   if (process(tool, event, &synth_sample, machine) != 0) {
+   rc = -1;
+   break;
}
}
 
-- 
1.7.11.7
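
The sscanf-based parsing described in this patch can be sketched in user space as follows. This is a minimal illustration, not the perf code itself: the names maps_entry, parse_maps_line and is_data_mapping are hypothetical, the sketch uses the SCNx64 scan macro and bounded %s widths, and it checks the sscanf return value, none of which is taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed /proc/<pid>/maps line. */
struct maps_entry {
	uint64_t start;   /* mapping start address */
	uint64_t end;     /* mapping end address */
	uint64_t pgoff;   /* offset into the backing file */
	char prot[5];     /* e.g. "r-xp" */
	char path[256];   /* empty string for anonymous mappings */
};

/*
 * Parse a line such as
 *   00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat
 * Returns 0 on success, -1 if the mandatory fields are missing.
 * Note: %s stops at whitespace, so paths containing spaces are truncated.
 */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	e->path[0] = '\0';	/* the path field is optional */

	n = sscanf(line,
		   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*x:%*x %*u %255s",
		   &e->start, &e->end, e->prot, &e->pgoff, e->path);

	/* 4 conversions: anonymous mapping, 5: mapping with a name */
	return n >= 4 ? 0 : -1;
}

/* A mapping without the 'x' permission bit is treated as data. */
static int is_data_mapping(const struct maps_entry *e)
{
	return e->prot[2] != 'x';
}
```

Checking the conversion count lets a caller skip malformed lines instead of reusing stale values from the previous iteration.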


[RFC/PATCH 0/6] perf tools: Additional works for memory access sampling

2012-11-06 Thread Namhyung Kim

Hi,

While playing with Stephane's memory access sampling series [1],
I needed these patches to make perf mem work properly.

It still gets a segfault when analyzing system-wide sample data, and
needs more work on dealing with the kernel's percpu symbols and rodata
symbols in user apps, but it worked well for my toy application, at
least sometimes. ;-) I'll continue chasing it down, but before doing so
I'd like to share what I have now.

Thanks,
Namhyung

[1] https://lkml.org/lkml/2012/11/5/485


Namhyung Kim (6):
  perf tools: Synthesize data mmap events for threads
  perf tools: Set kernel data mapping length
  perf tools: Fix detection of stack area
  perf tools: Ignore ABS symbols when loading data maps
  perf tools: Fix output of symbol_daddr offset
  perf tools: Free {branch,mem}_info when freeing hist_entry

 tools/perf/util/event.c  | 78 
 tools/perf/util/hist.c   |  2 ++
 tools/perf/util/machine.c| 22 -
 tools/perf/util/map.c|  2 +-
 tools/perf/util/sort.c   |  2 +-
 tools/perf/util/symbol-elf.c |  3 ++
 6 files changed, 55 insertions(+), 54 deletions(-)

-- 
1.7.11.7


Re: [Xen-devel] [PATCH 2/2] xen/arm: Fix compile errors when drivers are compiled as modules.

2012-11-06 Thread Ian Campbell

CCing Russell since I believe this is the fix for the "BUG: ARM build
failures due to Xen" failure he reported yesterday,

On Tue, 2012-11-06 at 22:13 +, Konrad Rzeszutek Wilk wrote:
> We end up with:
> 
> ERROR: "HYPERVISOR_event_channel_op" [drivers/xen/xen-gntdev.ko] undefined!
> ERROR: "privcmd_call" [drivers/xen/xen-privcmd.ko] undefined!
> ERROR: "HYPERVISOR_grant_table_op" [drivers/net/xen-netback/xen-netback.ko] undefined!
> 
> and this patch exports said function (which is implemented in hypercall.S).
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Ian Campbell 

> ---
>  arch/arm/xen/enlighten.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 59bcb96..96d969d 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -166,3 +166,8 @@ void free_xenballooned_pages(int nr_pages, struct page **pages)
>   *pages = NULL;
>  }
>  EXPORT_SYMBOL_GPL(free_xenballooned_pages);
> +
> +/* In the hypervisor.S file. */
> +EXPORT_SYMBOL_GPL(HYPERVISOR_event_channel_op);
> +EXPORT_SYMBOL_GPL(HYPERVISOR_grant_table_op);
> +EXPORT_SYMBOL_GPL(privcmd_call);



Re: [Xen-devel] [PATCH 1/2] xen/generic: Disable fallback build on ARM.

2012-11-06 Thread Ian Campbell

On Tue, 2012-11-06 at 22:13 +, Konrad Rzeszutek Wilk wrote:
> As there is no need for it (the fallback code is for older
> hypervisors and they won't run under ARM), 

I think more specifically they won't run on anything other than x86.

[...]
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 46de6cd..273d2b9 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -1,8 +1,8 @@
>  ifneq ($(CONFIG_ARM),y)
> -obj-y += manage.o balloon.o
> +obj-y += manage.o balloon.o fallback.o
>  obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
>  endif

I think :
  obj-$(CONFIG_X86) += fallback.o
would better reflect what is going on here.



Re: macbook pro 9.2 stat/ata bus error

2012-11-06 Thread Azat Khuzhin

Also, I want to note that the notebook was sometimes suspended.
But these errors don't appear every time before suspend or just after resume.


On Wed, Nov 7, 2012 at 7:41 AM, Azat Khuzhin  wrote:
>  Anybody?
>
> On Mon, Nov 5, 2012 at 7:28 PM, Azat Khuzhin  wrote:
>> After installing linux on macbook 9.2 (mid 2012), I have next errors
>> in dmesg log:
>>
>> [  389.623828] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=600
>> [  410.038465] NMI watchdog: enabled on all CPUs, permanently consumes
>> one hw-PMU counter.
>> [  410.075042] ehci_hcd :00:1a.0: setting latency timer to 64
>> [  410.483526] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=0
>> [ 1401.834509] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=1800
>> [ 1406.467268] NMI watchdog: enabled on all CPUs, permanently consumes
>> one hw-PMU counter.
>> [ 1406.506769] ehci_hcd :00:1a.0: setting latency timer to 64
>> [ 1406.590122] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=0
>> [ 1407.492260] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x5
>> action 0xe frozen
>> [ 1407.494441] ata2.00: irq_stat 0x0040, PHY RDY changed
>> [ 1407.495238] ata2: SError: { PHYRdyChg CommWake }
>> [ 1407.496035] sr 1:0:0:0: CDB:
>> [ 1407.497333] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
>> [ 1407.498285] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0
>> pio 16392 in
>> [ 1407.498285]  res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask
>> 0x10 (ATA bus error)
>> [ 1407.501987] ata2.00: status: { DRDY }
>> [ 1407.502882] ata2: hard resetting link
>> [ 1408.230302] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [ 1408.233279] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>> filtered out
>> [ 1408.237467] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>> filtered out
>> [ 1408.239084] ata2.00: configured for UDMA/100
>> [ 1408.262238] ata2: EH complete
>> [ 3565.785609] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=1800
>> [ 3576.921499] NMI watchdog: enabled on all CPUs, permanently consumes
>> one hw-PMU counter.
>> [ 3576.958624] ehci_hcd :00:1a.0: setting latency timer to 64
>> [ 3577.114612] EXT4-fs (sda4): re-mounted. Opts:
>> errors=remount-ro,data=ordered,commit=0
>> [ 3577.923688] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x5
>> action 0xe frozen
>> [ 3577.925852] ata2.00: irq_stat 0x0040, PHY RDY changed
>> [ 3577.926746] ata2: SError: { PHYRdyChg CommWake }
>> [ 3577.927544] sr 1:0:0:0: CDB:
>> [ 3577.928345] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
>> [ 3577.929642] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0
>> pio 16392 in
>> [ 3577.929642]  res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask
>> 0x10 (ATA bus error)
>> [ 3577.932954] ata2.00: status: { DRDY }
>> [ 3577.934264] ata2: hard resetting link
>> [ 3578.662228] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>> [ 3578.665211] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>> filtered out
>> [ 3578.669355] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>> filtered out
>> [ 3578.670969] ata2.00: configured for UDMA/100
>> [ 3578.694145] ata2: EH complete
>>
>> Is it linux driver, or maybe
>>
>> $ lspci # sata information only
>> 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family
>> 6-port SATA Controller [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0])
>> Subsystem: Intel Corporation Device 7270
>> Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 20
>> I/O ports at 2098 [size=8]
>> I/O ports at 20bc [size=4]
>> I/O ports at 2090 [size=8]
>> I/O ports at 20b8 [size=4]
>> I/O ports at 2060 [size=32]
>> Memory at a0816000 (32-bit, non-prefetchable) [size=2K]
>> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> Capabilities: [70] Power Management version 3
>> Capabilities: [a8] SATA HBA v1.0
>> Capabilities: [b0] PCI Advanced Features
>> Kernel driver in use: ahci
>>
>> $ uname -a
>> Linux macbook-pro 3.6.5macbook-pro-custom-v0.1 #4 SMP Sun Nov 4
>> 12:39:03 UTC 2012 x86_64 GNU/Linux
>> $ cat /etc/debian_version
>> wheezy/sid
>>
>> In OSX there is no errors with hard drive.
>>
>> What else can I do investigate this situation next?
>>
>> --
>> Azat Khuzhin
>
>
>
> --
> Azat Khuzhin



-- 
Azat Khuzhin

[Result] total stat for "* %8s *" in kernel region

2012-11-06 Thread Chen Gang

Hello :

1) Result:

   A) netdev:           3 patches (got all replies)
   B) arch/x86:         1 patch (waiting for reply)
   C) arch/blackfin:    1 patch (waiting for reply)
   D) fs/ocfs1:         1 patch (waiting for reply)
   E) drivers/gpu/drm:  1 suggestion (waiting for reply)
   F) kernel/irq:       1 suggestion (waiting for reply)

   These are all the places I could find in the kernel that need improvement.

2) Thanks:

   A) Thanks to Eric Dumazet, who gave me important support.
   B) Thanks to Shan Wei and David Miller, who provided suggestions.
   C) I referenced the original Asianux patch (which was not included in the kernel),
      but sorry, I cannot find the original author (it seems to be OGAWA Hirofumi).


3) Next:

   A) I will continue analysing the Asianux patches;
      if 'lucky', I may find other 'valuable' things.

   B) I will continue using LTP to test the kernel;
      if 'lucky', I may hit other issues (not only nfs).

   C) Any members are welcome to give me 'tasks'.
      i)   Personally, I prefer to do something for netdev,
      ii)  since they have truly given me much help on the mailing list,
      iii) but right now I truly do not know what to do for them next.

      Maybe we truly need more communication with each other.


   thanks.

-- 
Chen Gang

Asianux Corporation

Re: [RFC v4+ hot_track 10/19] vfs: introduce hot func register framework

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 7:14 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:52PM +0800, zwu.ker...@gmail.com wrote:
>> +static struct hot_func_type *hot_func_get(const char *name)
>> +{
>> + struct hot_func_type *f, *h = _func_def;
>> +
>> + spin_lock(_func_list_lock);
>> + list_for_each_entry(f, _func_list, list) {
>> + if (!strcmp(f->hot_func_name, name))
>> + h = f;
>
> You probably want to break here
Good catch, done, thanks.
>
>> + }
>> + spin_unlock(_func_list_lock);
>> +
>> + return h;
>> +}
>> +
>> +int hot_func_register(struct hot_func_type *h)
>> +{
>> + struct hot_func_type *f, *t = NULL;
>> +
>> + /* register, don't allow duplicate names */
>> + spin_lock(_func_list_lock);
>> + list_for_each_entry(f, _func_list, list) {
>> + if (!strcmp(f->hot_func_name, h->hot_func_name))
>> + t = f;
>
> if duplicate names are not allowed, then a warning may make sense to
> let us know that something is wrong
done, thanks.
>
>> + }
>> +
>> + if (t) {
>> + spin_unlock(_func_list_lock);
>> + return -EBUSY;
>> + }
>> +
>> + list_add_tail(>list, _func_list);
>> + spin_unlock(_func_list_lock);
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(hot_func_register);
>> --- a/include/linux/hot_tracking.h
>> +++ b/include/linux/hot_tracking.h
>> @@ -73,6 +75,25 @@ struct hot_range_item {
>>   u32 len; /* length in bytes */
>>  };
>>
>> +typedef u64 (hot_rw_freq_calc_fn) (struct timespec old_atime,
>> + struct timespec cur_time, u64 old_avg);
>> +typedef u32 (hot_temp_calc_fn) (struct hot_freq_data *freq_data);
>> +typedef bool (hot_is_obsolete_fn) (struct hot_freq_data *freq_data);
>
> I'm thinking, whether these typedefs are useful, similar ops structures
> do not introduce them, also when you pick a struct member names exactly
> same as the typedefs:
>
>> +struct hot_func_ops {
>> + hot_rw_freq_calc_fn *hot_rw_freq_calc_fn;
>> + hot_temp_calc_fn *hot_temp_calc_fn;
>> + hot_is_obsolete_fn *hot_is_obsolete_fn;
>> +};
>
> My suggestion is to make the types explicit in the structure.
Sorry, I don't get your point; can you elaborate on how to do this?
>
>> +/* identifies an hot func type */
>> +struct hot_func_type {
>> + char hot_func_name[HOT_NAME_MAX];
>
> 'name' would be sufficient IMHO
done, thanks.
>
>> + /* fields provided by specific FS */
>> + struct hot_func_ops ops;
>> + struct list_head list;
>> +};
>
> david



-- 
Regards,

Zhi Yong Wu
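
Two of the review points above (stop the list walk at the first match, and spell out the callback types directly in the ops structure instead of going through typedefs) can be sketched in plain user-space C. This is only an illustration under assumptions: freq_data_sketch, hot_ops_sketch, hot_type_sketch and hot_type_find are hypothetical stand-ins, an array replaces the kernel list, locking is omitted, and the lookup returns NULL rather than a default entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct hot_freq_data. */
struct freq_data_sketch {
	unsigned int last_temp;
};

/*
 * Callback types written explicitly in the structure, so no separate
 * typedefs are needed and member names cannot clash with type names.
 */
struct hot_ops_sketch {
	unsigned int (*temp_calc)(struct freq_data_sketch *fd);
	int (*is_obsolete)(struct freq_data_sketch *fd);
};

struct hot_type_sketch {
	const char *name;
	struct hot_ops_sketch ops;
};

/* Return the first entry whose name matches, or NULL if there is none. */
static const struct hot_type_sketch *
hot_type_find(const struct hot_type_sketch *tbl, size_t n, const char *name)
{
	const struct hot_type_sketch *found = NULL;
	size_t i;

	assert(tbl != NULL && name != NULL);

	for (i = 0; i < n; i++) {
		if (!strcmp(tbl[i].name, name)) {
			found = &tbl[i];
			break;	/* first match wins, no need to keep walking */
		}
	}
	return found;
}
```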

Re: [PATCH v6 25/29] memcg/sl[au]b: shrink dead caches

2012-11-06 Thread Andrew Morton

On Wed, 7 Nov 2012 08:13:08 +0100 Glauber Costa  wrote:

> On 11/06/2012 01:48 AM, Andrew Morton wrote:
> > On Thu,  1 Nov 2012 16:07:41 +0400
> > Glauber Costa  wrote:
> > 
> >> This means that when we destroy a memcg cache that happened to be empty,
> >> those caches may take a lot of time to go away: removing the memcg
> >> reference won't destroy them - because there are pending references, and
> >> the empty pages will stay there, until a shrinker is called upon for any
> >> reason.
> >>
> >> In this patch, we will call kmem_cache_shrink for all dead caches that
> >> cannot be destroyed because of remaining pages. After shrinking, it is
> >> possible that it could be freed. If this is not the case, we'll schedule
> >> a lazy worker to keep trying.
> > 
> > This patch is really quite nasty.  We poll the cache once per minute
> > trying to shrink then free it?  a) it gives rise to concerns that there
> > will be scenarios where the system could suffer unlimited memory windup
> > but mainly b) it's just lame.
> > 
> > The kernel doesn't do this sort of thing.  The kernel tries to be
> > precise: in a situation like this we keep track of the number of
> > outstanding objects and when that falls to zero, we free their
> > container synchronously.  If those objects are normally left floating
> > around in an allocated but reclaimable state then we can address that
> > by synchronously freeing them if their container has been destroyed.
> > 
> > Or something like that.  If it's something else then fine, but not this.
> > 
> > What do we need to do to fix this?
> > 
> The original patch had an unlikely() test in the free path, conditional
> on whether or not the cache is dead, that would then call this if the
> cache would now be empty.
> 
> I got several requests to remove it and change it to something like
> this, because that is a fast path (I myself think an unlikely branch is
> not that bad).
> 
> If you think such a test is acceptable, I can bring it back and argue on
> the basis of "akpm made me do it!". But meanwhile I will give this extra
> thought to see if there is any alternative way I can do it...

OK, thanks, please do take a look at it.

I'd be interested in seeing the old version of the patch which had this
test-n-branch.  Perhaps there's some trick we can pull to lessen its cost.
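
The accounting approach discussed in this thread (count outstanding objects and free the container synchronously once a dead container's count reaches zero) can be sketched in user space as follows. This is a single-threaded illustration under assumptions, not the memcg code: cache_sketch and its helpers are hypothetical, and a real implementation would need atomic counters or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container that tracks its outstanding objects. */
struct cache_sketch {
	unsigned long nr_objects;	/* objects currently allocated */
	bool dead;			/* owner is gone, destroy when empty */
};

static struct cache_sketch *cache_create(void)
{
	return calloc(1, sizeof(struct cache_sketch));
}

static void cache_alloc_object(struct cache_sketch *c)
{
	assert(c != NULL && !c->dead);
	c->nr_objects++;
}

/*
 * Free path: the extra test-and-branch runs on every free, but the
 * container is destroyed exactly when its last object goes away.
 * Returns true if the container itself was destroyed.
 */
static bool cache_free_object(struct cache_sketch *c)
{
	assert(c != NULL && c->nr_objects > 0);
	c->nr_objects--;
	if (c->dead && c->nr_objects == 0) {
		free(c);
		return true;
	}
	return false;
}

/* Mark the container dead; destroy it right away if already empty. */
static bool cache_mark_dead(struct cache_sketch *c)
{
	assert(c != NULL);
	c->dead = true;
	if (c->nr_objects == 0) {
		free(c);
		return true;
	}
	return false;
}
```

With this shape no polling worker is needed: destruction happens either in cache_mark_dead() or in the free that drops the count to zero.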


Re: [PATCH v6 19/29] memcg: infrastructure to match an allocation to the right cache

2012-11-06 Thread Andrew Morton

On Wed, 7 Nov 2012 08:04:03 +0100 Glauber Costa  wrote:

> On 11/06/2012 01:28 AM, Andrew Morton wrote:
> > On Thu,  1 Nov 2012 16:07:35 +0400
> > Glauber Costa  wrote:
> > 
> >> +static __always_inline struct kmem_cache *
> >> +memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
> > 
> > I still don't understand why this code uses __always_inline so much.
> > 
> > I don't recall seeing the compiler producing out-of-line versions of
> > "static inline" functions (and perhaps it has special treatment for
> > functions which were defined in a header file?).
> > 
> > And if the compiler *does* decide to uninline the function, perhaps it
> > knows best, and the function shouldn't have been declared inline in the
> > first place.
> > 
> > 
> > If it is indeed better to use __always_inline in this code then we have
> > a heck of a lot of other "static inline" definitions which we need to
> > convert!  So, what's going on here?
> > 
> 
> The original motivation is indeed performance related. We want to make
> sure it is inline so it will figure out quickly the "I am not a memcg
> user" case and keep it going. The slub, for instance, is full of
> __always_inline functions to make sure that the fast path contains
> absolutely no function calls. So I was just following this here.

Well.  Do we really know that inlining is best in all these cases?  And
in future, as the code evolves?  If for some reason the compiler
chooses not to inline the function, maybe it was right.  Small code
footprint has benefits.

> I can remove the marker without a problem and leave it to the compiler
> if you think it is best

It's a minor thing.  But __always_inline is rather specialised and
readers of this code will be wondering why it was done here.  Unless we
can actually demonstrate benefit from __always_inline, I'd suggest
following convention here.


Re: [RESEND PATCH V3] binfmt_elf.c: use get_random_int() to fix entropy depleting

2012-11-06 Thread Kees Cook

On Tue, Nov 6, 2012 at 11:02 PM, Jeff Liu  wrote:
> On 11/07/2012 02:21 PM, Kees Cook wrote:
>> I still want to hear at least from Ted about these changes -- we would
>> be potentially increasing the predictability of these bytes...
>
> We would not be increasing that if this routine were used for AT_RANDOM
> only (and if the array is kept aligned to 4 bytes).
> Otherwise, we would, so let's wait for further feedback.

get_random_int() comes from a different pool than get_random_bytes(),
IIUC. I'd like to hear some convincing reasoning as to why this change
doesn't compromise predictability. :)

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH v6 25/29] memcg/sl[au]b: shrink dead caches

2012-11-06 Thread Glauber Costa

On 11/06/2012 01:48 AM, Andrew Morton wrote:
> On Thu,  1 Nov 2012 16:07:41 +0400
> Glauber Costa  wrote:
> 
>> This means that when we destroy a memcg cache that happened to be empty,
>> those caches may take a lot of time to go away: removing the memcg
>> reference won't destroy them - because there are pending references, and
>> the empty pages will stay there, until a shrinker is called upon for any
>> reason.
>>
>> In this patch, we will call kmem_cache_shrink for all dead caches that
>> cannot be destroyed because of remaining pages. After shrinking, it is
>> possible that it could be freed. If this is not the case, we'll schedule
>> a lazy worker to keep trying.
> 
> This patch is really quite nasty.  We poll the cache once per minute
> trying to shrink then free it?  a) it gives rise to concerns that there
> will be scenarios where the system could suffer unlimited memory windup
> but mainly b) it's just lame.
> 
> The kernel doesn't do this sort of thing.  The kernel tries to be
> precise: in a situation like this we keep track of the number of
> outstanding objects and when that falls to zero, we free their
> container synchronously.  If those objects are normally left floating
> around in an allocated but reclaimable state then we can address that
> by synchronously freeing them if their container has been destroyed.
> 
> Or something like that.  If it's something else then fine, but not this.
> 
> What do we need to do to fix this?
> 
The original patch had an unlikely() test in the free path, conditional
on whether or not the cache is dead, that would then call this if the
cache would now be empty.

I got several requests to remove it and change it to something like
this, because that is a fast path (I myself think an unlikely branch is
not that bad).

If you think such a test is acceptable, I can bring it back and argue on
the basis of "akpm made me do it!". But meanwhile I will give this extra
thought to see if there is any alternative way I can do it...

Re: [PATCH v6 18/29] Allocate memory for memcg caches whenever a new memcg appears

2012-11-06 Thread Andrew Morton

On Wed, 7 Nov 2012 08:05:10 +0100 Glauber Costa  wrote:

> Since you have already included this in mm, would you like me to
> resubmit the series changing things according to your feedback, or
> should I send incremental patches?

I normally don't care.  I do turn replacements into incrementals so
that I and others can see what changed, but that's all scripted.

However in this case the patches have been changed somewhat (mainly
because of sched/numa getting in the way) so incrementals would be nice
if convenient, please.


[PATCH RESEND 2/7] input: ti_am335x_tsc: Order of TSC wires, made configurable

2012-11-06 Thread Patil, Rachna

The current driver expects the touchscreen input
wires (XP, XN, YP, YN) to be connected in a particular order.
Change it to accept the wire order as platform data.

Signed-off-by: Patil, Rachna 
---
 drivers/input/touchscreen/ti_am335x_tsc.c |  156 ++---
 include/linux/input/ti_am335x_tsc.h   |   12 ++
 include/linux/mfd/ti_am335x_tscadc.h  |   10 ++-
 3 files changed, 159 insertions(+), 19 deletions(-)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index 4369224..6a817a8 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -33,6 +33,17 @@
 #define SEQ_SETTLE 275
 #define MAX_12BIT  ((1 << 12) - 1)
 
+/*
+ * Refer to function regbit_map() to
+ * map the values in the matrix.
+ */
+static int config[4][4] = {
+   {1, 0,  1,  0},
+   {2, 3,  2,  3},
+   {4, 5,  4,  5},
+   {0, 6,  0,  6}
+};
+
 struct titsc {
struct input_dev *input;
struct ti_tscadc_dev *mfd_tscadc;
@@ -42,6 +53,9 @@ struct titsc {
unsigned int enable_bits;
bool pen_down;
int steps_to_configure;
+   int config_inp[20];
+   int bit_xp, bit_xn, bit_yp, bit_yn;
+   int inp_xp, inp_xn, inp_yp, inp_yn;
 };
 
 static unsigned int titsc_readl(struct titsc *ts, unsigned int reg)
@@ -55,6 +69,107 @@ static void titsc_writel(struct titsc *tsc, unsigned int reg,
writel(val, tsc->mfd_tscadc->tscadc_base + reg);
 }
 
+/*
+ * Each of the analog lines are mapped
+ * with one or two register bits,
+ * which can be either pulled high/low
+ * depending on the value to be read.
+ */
+static int regbit_map(int val)
+{
+   int map_bits = 0;
+
+   switch (val) {
+   case 1:
+   map_bits = XPP;
+   break;
+   case 2:
+   map_bits = XNP;
+   break;
+   case 3:
+   map_bits = XNN;
+   break;
+   case 4:
+   map_bits = YPP;
+   break;
+   case 5:
+   map_bits = YPN;
+   break;
+   case 6:
+   map_bits = YNN;
+   break;
+   }
+
+   return map_bits;
+}
+
+static int titsc_config_wires(struct titsc *ts_dev)
+{
+   int analog_line[10], wire_order[10];
+   int i, temp_bits, err;
+
+   for (i = 0; i < 4; i++) {
+   /*
+* Get the order in which TSC wires are attached
+* w.r.t. each of the analog input lines on the EVM.
+*/
+   analog_line[i] = ts_dev->config_inp[i] & 0xF0;
+   analog_line[i] = analog_line[i] >> 4;
+
+   wire_order[i] = ts_dev->config_inp[i] & 0x0F;
+   }
+
+   for (i = 0; i < 4; i++) {
+   switch (wire_order[i]) {
+   case 0:
+   temp_bits = config[analog_line[i]][0];
+   if (temp_bits == 0) {
+   err = -EINVAL;
+   goto ret;
+   } else {
+   ts_dev->bit_xp = regbit_map(temp_bits);
+   ts_dev->inp_xp = analog_line[i];
+   break;
+   }
+   case 1:
+   temp_bits = config[analog_line[i]][1];
+   if (temp_bits == 0) {
+   err = -EINVAL;
+   goto ret;
+   } else {
+   ts_dev->bit_xn = regbit_map(temp_bits);
+   ts_dev->inp_xn = analog_line[i];
+   break;
+   }
+   case 2:
+   temp_bits = config[analog_line[i]][2];
+   if (temp_bits == 0) {
+   err = -EINVAL;
+   goto ret;
+   } else {
+   ts_dev->bit_yp = regbit_map(temp_bits);
+   ts_dev->inp_yp = analog_line[i];
+   break;
+   }
+   case 3:
+   temp_bits = config[analog_line[i]][3];
+   if (temp_bits == 0) {
+   err = -EINVAL;
+   goto ret;
+   } else {
+   ts_dev->bit_yn = regbit_map(temp_bits);
+   ts_dev->inp_yn = analog_line[i];
+   break;
+   }
+   }
+   }
+
+   return 0;
+
+ret:
+   return err;
+}
+
 static void
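
The wire-config encoding used by titsc_config_wires() above (upper nibble selects the analog input line, lower nibble selects the wire) can be sketched in isolation as follows. The config matrix is copied from the patch; wire_cfg, decode_wire_config and wire_config_valid are hypothetical names, and the explicit range check is an addition of this sketch that the patch does not perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the patch: a zero entry means the pairing is not allowed. */
static const int config[4][4] = {
	{1, 0, 1, 0},
	{2, 3, 2, 3},
	{4, 5, 4, 5},
	{0, 6, 0, 6}
};

struct wire_cfg {
	unsigned int analog_line;	/* upper nibble */
	unsigned int wire;		/* lower nibble: 0=XP 1=XN 2=YP 3=YN */
};

/* Split one wire-config byte into its two nibbles. */
static struct wire_cfg decode_wire_config(unsigned int val)
{
	struct wire_cfg w;

	assert(val <= 0xFF);
	w.analog_line = (val & 0xF0) >> 4;
	w.wire = val & 0x0F;
	return w;
}

/* A pairing is usable only if both indices are in range and non-zero. */
static bool wire_config_valid(unsigned int val)
{
	struct wire_cfg w = decode_wire_config(val);

	if (w.analog_line >= 4 || w.wire >= 4)
		return false;
	return config[w.analog_line][w.wire] != 0;
}
```

For instance, 0x11 decodes to analog line 1 carrying wire 1 (XN), which the matrix accepts, while 0x01 (line 0 carrying XN) maps to a zero entry and is rejected.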

[PATCH RESEND 3/7] input: ti_am335x_tsc: Add variance filter

2012-11-06 Thread Patil, Rachna

Fine-tuning only the variance parameter in the tslib
utility does not remove all the ADC noise.
This filtering logic is necessary to get the
touchscreen to work well.

Signed-off-by: Patil, Rachna 
---
 drivers/input/touchscreen/ti_am335x_tsc.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c b/drivers/input/touchscreen/ti_am335x_tsc.c
index 6a817a8..7a26810 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -32,6 +32,8 @@
 #define ADCFSM_STEPID  0x10
 #define SEQ_SETTLE 275
 #define MAX_12BIT  ((1 << 12) - 1)
+#define TSCADC_DELTA_X 15
+#define TSCADC_DELTA_Y 15
 
 /*
  * Refer to function regbit_map() to
@@ -51,6 +53,8 @@ struct titsc {
unsigned int wires;
unsigned int x_plate_resistance;
unsigned int enable_bits;
+   unsigned int bckup_x;
+   unsigned int bckup_y;
bool pen_down;
int steps_to_configure;
int config_inp[20];
@@ -309,12 +313,18 @@ static irqreturn_t titsc_irq(int irq, void *dev)
unsigned int z1, z2, z;
unsigned int fsm;
unsigned int fifo1count, fifo0count;
+   unsigned int diffx = 0, diffy = 0;
int i;
 
status = titsc_readl(ts_dev, REG_IRQSTATUS);
if (status & IRQENB_FIFO0THRES) {
titsc_read_coordinates(ts_dev, &x, &y);
 
+   diffx = abs(x - (ts_dev->bckup_x));
+   diffy = abs(y - (ts_dev->bckup_y));
+   ts_dev->bckup_x = x;
+   ts_dev->bckup_y = y;
+
z1 = titsc_readl(ts_dev, REG_FIFO0) & 0xfff;
z2 = titsc_readl(ts_dev, REG_FIFO1) & 0xfff;
 
@@ -338,7 +348,8 @@ static irqreturn_t titsc_irq(int irq, void *dev)
z /= z1;
z = (z + 2047) >> 12;
 
-   if (z <= MAX_12BIT) {
+   if ((diffx < TSCADC_DELTA_X) &&
+   (diffy < TSCADC_DELTA_Y) && (z <= MAX_12BIT)) {
input_report_abs(input_dev, ABS_X, x);
input_report_abs(input_dev, ABS_Y, y);
input_report_abs(input_dev, ABS_PRESSURE, z);
@@ -361,6 +372,8 @@ static irqreturn_t titsc_irq(int irq, void *dev)
fsm = titsc_readl(ts_dev, REG_ADCFSM);
if (fsm == ADCFSM_STEPID) {
ts_dev->pen_down = false;
+   ts_dev->bckup_x = 0;
+   ts_dev->bckup_y = 0;
input_report_key(input_dev, BTN_TOUCH, 0);
input_report_abs(input_dev, ABS_PRESSURE, 0);
input_sync(input_dev);
-- 
1.7.0.4


[PATCH RESEND 4/7] MFD: ti_am335x_tscadc: add device tree binding information

2012-11-06 Thread Patil, Rachna

Signed-off-by: Patil, Rachna 
---
 .../devicetree/bindings/mfd/ti_am335x_tscadc.txt   |   35 
 1 files changed, 35 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt

diff --git a/Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt b/Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt
new file mode 100644
index 000..c13c492
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt
@@ -0,0 +1,35 @@
+Texas Instruments - TSC / ADC multi-functional device
+
+ti_tscadc is a multi-function device with touchscreen and ADC on chip.
+This document describes the binding for mfd device.
+
+Required properties:
+- compatible: "ti,ti-tscadc"
+- reg: Specifies the address of MFD block
+- interrupts: IRQ line connected to the main SoC
+- interrupt-parent: The parent interrupt controller
+
+Optional properties:
+- ti,hwmods: Hardware information related to TSC/ADC MFD device
+
+Example:
+
+   tscadc: tscadc@44e0d000 {
+   compatible = "ti,ti-tscadc";
+   reg = <0x44e0d000 0x1000>;
+
+   interrupt-parent = <>;
+   interrupts = <16>;
+   ti,hwmods = "adc_tsc";
+
+   tsc {
+   wires = <4>;
+   x-plate-resistance = <200>;
+   steps-to-configure = <5>;
+   wire-config = <0x00 0x11 0x22 0x33>;
+   };
+
+   adc {
+   adc-channels = <4>;
+   };
+   };
-- 
1.7.0.4


[PATCH RESEND 1/7] input: ti_am335x_tsc: Step enable bits made configurable

2012-11-06 Thread Patil, Rachna

The current code writes a hard-coded value to the
step enable bits. Now the bits are set based on the
number of steps to be configured, which comes from
platform data.

The user needs to take care that the count does not
exceed 16. When using both ADC and TSC, take care
to set this parameter correctly.

Signed-off-by: Patil, Rachna 
---
 drivers/input/touchscreen/ti_am335x_tsc.c |   10 --
 include/linux/mfd/ti_am335x_tscadc.h  |1 -
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
b/drivers/input/touchscreen/ti_am335x_tsc.c
index 7a18a8a..4369224 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -39,6 +39,7 @@ struct titsc {
unsigned intirq;
unsigned intwires;
unsigned intx_plate_resistance;
+   unsigned intenable_bits;
boolpen_down;
int steps_to_configure;
 };
@@ -57,6 +58,7 @@ static void titsc_writel(struct titsc *tsc, unsigned int reg,
 static void titsc_step_config(struct titsc *ts_dev)
 {
unsigned intconfig;
+   unsigned intstepenable = 0;
int i, total_steps;
 
/* Configure the Step registers */
@@ -128,7 +130,11 @@ static void titsc_step_config(struct titsc *ts_dev)
titsc_writel(ts_dev, REG_STEPDELAY(total_steps + 2),
STEPCONFIG_OPENDLY);
 
-   titsc_writel(ts_dev, REG_SE, STPENB_STEPENB_TC);
+   for (i = 0; i <= (total_steps + 2); i++)
+   stepenable |= 1 << i;
+   ts_dev->enable_bits = stepenable;
+
+   titsc_writel(ts_dev, REG_SE, ts_dev->enable_bits);
 }
 
 static void titsc_read_coordinates(struct titsc *ts_dev,
@@ -250,7 +256,7 @@ static irqreturn_t titsc_irq(int irq, void *dev)
 
titsc_writel(ts_dev, REG_IRQSTATUS, irqclr);
 
-   titsc_writel(ts_dev, REG_SE, STPENB_STEPENB_TC);
+   titsc_writel(ts_dev, REG_SE, ts_dev->enable_bits);
return IRQ_HANDLED;
 }
 
diff --git a/include/linux/mfd/ti_am335x_tscadc.h 
b/include/linux/mfd/ti_am335x_tscadc.h
index c79ad5d..23e4f33 100644
--- a/include/linux/mfd/ti_am335x_tscadc.h
+++ b/include/linux/mfd/ti_am335x_tscadc.h
@@ -47,7 +47,6 @@
 #define STEPENB_MASK   (0x1 << 0)
 #define STEPENB(val)   ((val) << 0)
 #define STPENB_STEPENB STEPENB(0x1)
-#define STPENB_STEPENB_TC  STEPENB(0x1FFF)
 
 /* IRQ enable */
 #define IRQENB_HW_PEN  BIT(0)
-- 
1.7.0.4
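
For readers following the step-enable change, the loop in titsc_step_config() can be sketched in plain C as below. This is only an illustration: step_enable_mask() is a hypothetical helper, not a function in the driver, and the i < 32 guard is an addition to keep the shift well defined.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the driver: returns a mask with
 * bits 0..last_step (inclusive) set, mirroring the loop in the patch
 * that ORs in (1 << i) for every i up to total_steps + 2.
 */
static uint32_t step_enable_mask(unsigned int last_step)
{
	uint32_t mask = 0;
	unsigned int i;

	/* The i < 32 guard is an addition so the shift stays defined. */
	for (i = 0; i <= last_step && i < 32; i++)
		mask |= UINT32_C(1) << i;

	return mask;
}
```

With last_step equal to 12 the result is 0x1FFF, which is the value carried by the removed STPENB_STEPENB_TC constant.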


[PATCH RESEND 7/7] IIO: ti_am335x_adc: Add DT support

2012-11-06 Thread Patil, Rachna

Add DT support for client ADC driver.

Signed-off-by: Patil, Rachna 
---
 drivers/iio/adc/ti_am335x_adc.c |   24 
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c
index 02a43c8..1f1ec0c 100644
--- a/drivers/iio/adc/ti_am335x_adc.c
+++ b/drivers/iio/adc/ti_am335x_adc.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -141,11 +143,18 @@ static int __devinit tiadc_probe(struct platform_device 
*pdev)
struct iio_dev  *indio_dev;
struct tiadc_device *adc_dev;
struct ti_tscadc_dev*tscadc_dev = pdev->dev.platform_data;
-   struct mfd_tscadc_board *pdata;
+   struct mfd_tscadc_board *pdata = NULL;
+   struct device_node  *node = NULL;
int err;
+   u32 val32;
 
-   pdata = tscadc_dev->dev->platform_data;
-   if (!pdata || !pdata->adc_init) {
+   if (tscadc_dev->dev->of_node) {
+   node = tscadc_dev->dev->of_node;
+   node = of_find_node_by_name(node, "adc");
+   } else
+   pdata = tscadc_dev->dev->platform_data;
+
+   if (!pdata && !node) {
dev_err(&pdev->dev, "Could not find platform data\n");
return -EINVAL;
}
@@ -159,7 +168,14 @@ static int __devinit tiadc_probe(struct platform_device 
*pdev)
adc_dev = iio_priv(indio_dev);
 
adc_dev->mfd_tscadc = tscadc_dev;
-   adc_dev->channels = pdata->adc_init->adc_channels;
+   if (node) {
+   err = of_property_read_u32(node, "adc-channels", &val32);
+   if (err < 0)
+   goto err_free_device;
+   else
+   adc_dev->channels = val32;
+   } else
+   adc_dev->channels = pdata->adc_init->adc_channels;
 
indio_dev->dev.parent = &pdev->dev;
indio_dev->name = dev_name(&pdev->dev);
-- 
1.7.0.4


[PATCH RESEND 6/7] input: ti_am335x_tsc: Add DT support

2012-11-06 Thread Patil, Rachna

Add DT support for client touchscreen driver

Signed-off-by: Patil, Rachna 
---
 drivers/input/touchscreen/ti_am335x_tsc.c |   60 -
 1 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/drivers/input/touchscreen/ti_am335x_tsc.c 
b/drivers/input/touchscreen/ti_am335x_tsc.c
index 7a26810..c063cf6 100644
--- a/drivers/input/touchscreen/ti_am335x_tsc.c
+++ b/drivers/input/touchscreen/ti_am335x_tsc.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -398,12 +400,18 @@ static int __devinit titsc_probe(struct platform_device 
*pdev)
struct titsc *ts_dev;
struct input_dev *input_dev;
struct ti_tscadc_dev *tscadc_dev = pdev->dev.platform_data;
-   struct mfd_tscadc_board *pdata;
-   int err;
-
-   pdata = tscadc_dev->dev->platform_data;
-
-   if (!pdata) {
+   int err, i;
+   struct mfd_tscadc_board *pdata = NULL;
+   struct device_node *node = NULL;
+   u32 val32, wires_conf[4];
+
+   if (tscadc_dev->dev->of_node) {
+   node = tscadc_dev->dev->of_node;
+   node = of_find_node_by_name(node, "tsc");
+   } else
+   pdata = tscadc_dev->dev->platform_data;
+
+   if (!pdata && !node) {
dev_err(&pdev->dev, "Could not find platform data\n");
return -EINVAL;
}
@@ -421,11 +429,43 @@ static int __devinit titsc_probe(struct platform_device 
*pdev)
ts_dev->mfd_tscadc = tscadc_dev;
ts_dev->input = input_dev;
ts_dev->irq = tscadc_dev->irq;
-   ts_dev->wires = pdata->tsc_init->wires;
-   ts_dev->x_plate_resistance = pdata->tsc_init->x_plate_resistance;
-   ts_dev->steps_to_configure = pdata->tsc_init->steps_to_configure;
-   memcpy(ts_dev->config_inp, pdata->tsc_init->wire_config,
+
+   if (node) {
+   err = of_property_read_u32(node, "wires", &val32);
+   if (err < 0)
+   goto err_free_mem;
+   else
+   ts_dev->wires = val32;
+
+   err = of_property_read_u32(node, "x-plate-resistance", &val32);
+   if (err < 0)
+   goto err_free_mem;
+   else
+   ts_dev->x_plate_resistance = val32;
+
+   err = of_property_read_u32(node, "steps-to-configure", &val32);
+   if (err < 0)
+   goto err_free_mem;
+   else
+   ts_dev->steps_to_configure = val32;
+
+   err = of_property_read_u32_array(node, "wire-config",
+   wires_conf, ARRAY_SIZE(wires_conf));
+   if (err < 0)
+   goto err_free_mem;
+   else {
+   for (i = 0; i < ARRAY_SIZE(wires_conf); i++)
+   ts_dev->config_inp[i] = wires_conf[i];
+   }
+   } else {
+   ts_dev->wires = pdata->tsc_init->wires;
+   ts_dev->x_plate_resistance =
+   pdata->tsc_init->x_plate_resistance;
+   ts_dev->steps_to_configure =
+   pdata->tsc_init->steps_to_configure;
+   memcpy(ts_dev->config_inp, pdata->tsc_init->wire_config,
sizeof(pdata->tsc_init->wire_config));
+   }
 
err = request_irq(ts_dev->irq, titsc_irq,
  0, pdev->dev.driver->name, ts_dev);
-- 
1.7.0.4


[PATCH RESEND 0/7] MFD: ti_am335x_tscadc: DT support and TSC features addition

2012-11-06 Thread Patil, Rachna

This patch set is a cumulative set of [1] and [2] sent earlier.

Note that there are no code changes in either patch set;
they have only been rebased on top of MFD-next to make sure
that all the patches apply without conflicts.

This patch set has been tested on AM335x EVM and is based on top of [3].

[1] http://www.spinics.net/lists/linux-input/msg23060.html
[2] http://www.spinics.net/lists/linux-input/msg23090.html
[3] https://lkml.org/lkml/2012/11/6/67

Patil, Rachna (7):
  input: ti_am335x_tsc: Step enable bits made configurable
  input: ti_am335x_tsc: Order of TSC wires, made configurable
  input: ti_am335x_tsc: Add variance filter
  MFD: ti_am335x_tscadc: add device tree binding information
  MFD: ti_am335x_tscadc: Add DT support
  input: ti_am335x_tsc: Add DT support
  IIO: ti_am335x_adc: Add DT support

 .../devicetree/bindings/mfd/ti_am335x_tscadc.txt   |   35 +++
 drivers/iio/adc/ti_am335x_adc.c|   24 ++-
 drivers/input/touchscreen/ti_am335x_tsc.c  |  239 +---
 drivers/mfd/ti_am335x_tscadc.c |   28 ++-
 include/linux/input/ti_am335x_tsc.h|   12 +
 include/linux/mfd/ti_am335x_tscadc.h   |   11 +-
 6 files changed, 308 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/ti_am335x_tscadc.txt


[PATCH RESEND 5/7] MFD: ti_am335x_tscadc: Add DT support

2012-11-06 Thread Patil, Rachna

Make changes to add DT support in the MFD core driver.

Signed-off-by: Patil, Rachna 
---
 drivers/mfd/ti_am335x_tscadc.c |   28 +++-
 1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/mfd/ti_am335x_tscadc.c b/drivers/mfd/ti_am335x_tscadc.c
index 8ca3bf0..07b7788 100644
--- a/drivers/mfd/ti_am335x_tscadc.c
+++ b/drivers/mfd/ti_am335x_tscadc.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -64,20 +66,31 @@ static  int __devinit ti_tscadc_probe(struct 
platform_device *pdev)
struct resource *res;
struct clk  *clk;
struct mfd_tscadc_board *pdata = pdev->dev.platform_data;
+   struct device_node  *node = pdev->dev.of_node;
struct mfd_cell *cell;
int err, ctrl;
int clk_value, clock_rate;
-   int tsc_wires, adc_channels = 0, total_channels;
+   int tsc_wires = 0, adc_channels = 0, total_channels;
 
-   if (!pdata) {
+   if (!pdata && !pdev->dev.of_node) {
dev_err(&pdev->dev, "Could not find platform data\n");
return -EINVAL;
}
 
-   if (pdata->adc_init)
-   adc_channels = pdata->adc_init->adc_channels;
+   if (pdev->dev.of_node) {
+   node = of_find_node_by_name(pdev->dev.of_node, "tsc");
+   of_property_read_u32(node, "wires", &tsc_wires);
+
+   node = of_find_node_by_name(pdev->dev.of_node, "adc");
+   of_property_read_u32(node, "adc-channels", &adc_channels);
+   } else {
+   if (pdata->tsc_init)
+   tsc_wires = pdata->tsc_init->wires;
+
+   if (pdata->adc_init)
+   adc_channels = pdata->adc_init->adc_channels;
+   }
 
-   tsc_wires = pdata->tsc_init->wires;
total_channels = tsc_wires + adc_channels;
 
if (total_channels > 8) {
@@ -256,11 +269,16 @@ static const struct dev_pm_ops tscadc_pm_ops = {
 #define TSCADC_PM_OPS NULL
 #endif
 
+static const struct of_device_id ti_tscadc_dt_ids[] = {
+   { .compatible = "ti,ti-tscadc", },
+};
+
 static struct platform_driver ti_tscadc_driver = {
.driver = {
.name   = "ti_tscadc",
.owner  = THIS_MODULE,
.pm = TSCADC_PM_OPS,
+   .of_match_table = of_match_ptr(ti_tscadc_dt_ids),
},
.probe  = ti_tscadc_probe,
.remove = __devexit_p(ti_tscadc_remove),
-- 
1.7.0.4
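
The source selection and channel budget in ti_tscadc_probe() can be sketched outside the kernel roughly as follows. The struct and function names are hypothetical stand-ins for values read from the device tree or from platform data; only the comparison of the total against 8 is taken from the hunk above, and what the driver does when the total is too large is not shown there.

```c
#include <stdbool.h>
#include <stddef.h>

#define TSCADC_MAX_CHANNELS_SKETCH 8	/* the limit compared against in the hunk */

/* Hypothetical stand-in for the values read from DT or platform data. */
struct tscadc_cfg {
	int tsc_wires;
	int adc_channels;
};

/*
 * Prefer the DT-derived configuration when present, otherwise fall back
 * to platform data. Returns true when the combined channel count fits.
 */
static bool tscadc_cfg_fits(const struct tscadc_cfg *dt,
			    const struct tscadc_cfg *pdata)
{
	const struct tscadc_cfg *src = dt ? dt : pdata;

	if (!src)
		return false;	/* neither source available */

	return src->tsc_wires + src->adc_channels <= TSCADC_MAX_CHANNELS_SKETCH;
}
```

A 4-wire touchscreen plus 4 ADC channels fits exactly; 5 wires plus 4 channels does not.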


Re: [RFC v4+ hot_track 05/19] vfs: add hooks to enable hot tracking

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 6:51 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:47PM +0800, zwu.ker...@gmail.com wrote:
>> --- a/mm/readahead.c
>> +++ b/mm/readahead.c
>> @@ -19,6 +19,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  /*
>>   * Initialise a struct file's readahead state.  Assumes that the caller has
>> @@ -138,6 +139,11 @@ static int read_pages(struct address_space *mapping, 
>> struct file *filp,
>>  out:
>>   blk_finish_plug();
>>
>> + /* Hot data tracking */
>> + hot_update_freqs(mapping->host, (u64)(list_entry(pages->prev,\
>> + struct page, lru)->index) << PAGE_CACHE_SHIFT,
>> + (u64)nr_pages * PAGE_CACHE_SIZE, 0);
>
> There's a stale \ at the end of the line, and I find this formatting
> hard to read. Does the following look acceptable?
yes, great, thanks.
>
> hot_update_freqs(mapping->host,
> (u64)(list_entry(pages->prev, struct page, lru)->index)
> << PAGE_CACHE_SHIFT,
> (u64)nr_pages * PAGE_CACHE_SIZE, 0);
>
>> +
>>   return ret;
>>  }
>>
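
As a side note on the arguments passed to hot_update_freqs() above: the page index is cast to u64 before the shift, so the byte offset cannot wrap where the index type is only 32 bits wide. A minimal userspace sketch, assuming 4 KiB pages and using hypothetical names throughout:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Byte offset of a page, widening to 64 bits before shifting. */
static uint64_t range_start(unsigned long first_index)
{
	return (uint64_t)first_index << SKETCH_PAGE_SHIFT;
}

/* Length in bytes of nr_pages pages, again computed in 64 bits. */
static uint64_t range_len(unsigned int nr_pages)
{
	return (uint64_t)nr_pages * SKETCH_PAGE_SIZE;
}
```

Page index 0x100000 maps to byte offset 0x100000000, which would not fit in 32 bits without the cast.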



-- 
Regards,

Zhi Yong Wu

RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.

2012-11-06 Thread R, Durgadoss

Hi Rui,


> -Original Message-
> From: Zhang, Rui
> Sent: Wednesday, November 07, 2012 12:07 PM
> To: R, Durgadoss
> Cc: Jonghwa Lee; linux...@vger.kernel.org; linux-kernel@vger.kernel.org;
> Brown, Len; Rafael J. Wysocki; Amit Dinel Kachhap; MyungJoo Ham;
> Kyungmin Park
> Subject: RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting
> exynos's emulation mode.
> 
> On Thu, 2012-11-01 at 23:13 -0600, R, Durgadoss wrote:
> > Hi Lee,
> >
> > > -Original Message-
> > > From: Jonghwa Lee [mailto:jonghwa3@samsung.com]
> > > Sent: Friday, November 02, 2012 7:55 AM
> > > To: linux...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org; Brown, Len; R, Durgadoss; Rafael J.
> > > Wysocki; Amit Dinel Kachhap; MyungJoo Ham; Kyungmin Park; Jonghwa
> Lee
> > > Subject: [PATCH v4] Thermal: exynos: Add sysfs node supporting
> exynos's
> > > emulation mode.
> > >
> > > This patch supports exynos's emulation mode with newly created sysfs
> node.
> > > Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for
> thermal
> > > management unit. Thermal emulation mode supports software debug for
> > > TMU's
> > > operation. User can set temperature manually with software code and
> TMU
> > > will read current temperature from user value not from sensor's value.
> > > This patch includes also documentary placed under
> > > Documentation/thermal/.
> > >
> >
> first of all, what would happen if overheat happens during emulation?
> 
> I just had a thought about if we can introduce this to the generic
> thermal layer.

Sure, we can.

> to do this, we only need to:
> 1) introduce tz->emulation
> 2) introduce thermal_get_temp()
>   static int thermal_get_temp(tz) {
>   if (tz->emulation)
>   return tz->emulation;
>   else
>   return tz->ops->get_temp(tz);
>   }
> 3) replace tz->ops->get_temp() with thermal_get_temp() in thermal layer
> 4) introduce /sys/class/thermal/thermal_zoneX/emulation
> 5) when setting /sys/class/thermal/thermal_zoneX/emulation,
>a) set tz->emulation
>b) invoke thermal_zone_device_update();
> this is a pure software emulation solution but it would work on all
> generic thermal layer users.
> 
> do you think this proposal would work properly?

Yes, this should work.
But I am working (on top of your -next tree) on adding multiple-sensor support to
the thermal framework (what we discussed at Plumbers this year).
This changes the APIs in the thermal framework quite a bit.
So we will add this emulation support after the above changes are
in. What do you think?

Thanks,
Durga
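
For reference, the thermal_get_temp() idea quoted above could look roughly like this as a self-contained userspace sketch. All type and function names here are hypothetical, and the out-parameter form of get_temp is an assumption rather than something stated in the proposal.

```c
#include <stddef.h>

struct tz_sketch;

/* Hypothetical ops table; the out-parameter signature is an assumption. */
struct tz_ops_sketch {
	int (*get_temp)(struct tz_sketch *tz, unsigned long *temp);
};

struct tz_sketch {
	const struct tz_ops_sketch *ops;
	unsigned long emulation;	/* 0 means emulation is off */
	unsigned long sensor_temp;	/* only used by the stub below */
};

/* Stub standing in for a real sensor read. */
static int stub_get_temp(struct tz_sketch *tz, unsigned long *temp)
{
	*temp = tz->sensor_temp;
	return 0;
}

/* Return the emulated value when one is set, else ask the sensor. */
static int thermal_get_temp_sketch(struct tz_sketch *tz, unsigned long *temp)
{
	if (tz->emulation) {
		*temp = tz->emulation;
		return 0;
	}
	return tz->ops->get_temp(tz, temp);
}
```

One consequence of using the value itself as the on/off switch is that an emulated temperature of 0 cannot be expressed; a separate flag would be needed for that.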

> if yes, I'd like to see if it is valuable for the other platform thermal
> drivers.
> 
> thanks,
> rui
> > Thanks for fixing the comments.
> > Please CC linux-acpi, when you submit thermal patches, going forward.
> > I am CCing Rui for now, for him to review/merge this patch.
> >
> > Reviewed-by: Durgadoss R 
> >
> > Thanks,
> > Durga
> >
> > > Signed-off-by: Jonghwa Lee 
> > > ---
> > > v4
> > >  - Fix Typo.
> > >  - Remove unnecessary codes.
> > >  - Add comments about feature of exynos emulation operation to the
> > > document.
> > >
> > > v3
> > >  - Remove unnecessay variables.
> > >  - Do some code clean in exynos_tmu_emulation_store().
> > >  - Make wrapping function of sysfs node creation function to use
> > >#ifdefs in minimum.
> > >
> > > v2
> > >  exynos_thermal.c
> > >  - Fix build error occured by wrong emulation control register name.
> > >  - Remove exynos5410 dependent codes.
> > >  exynos_thermal_emulation
> > >  - Align indentation.
> > >
> > >  Documentation/thermal/exynos_thermal_emulation |   56
> > > +++
> > >  drivers/thermal/Kconfig|9 +++
> > >  drivers/thermal/exynos_thermal.c   |   91
> > > 
> > >  3 files changed, 156 insertions(+), 0 deletions(-)
> > >  create mode 100644
> Documentation/thermal/exynos_thermal_emulation
> > >
> > > diff --git a/Documentation/thermal/exynos_thermal_emulation
> > > b/Documentation/thermal/exynos_thermal_emulation
> > > new file mode 100644
> > > index 000..a6ea06f
> > > --- /dev/null
> > > +++ b/Documentation/thermal/exynos_thermal_emulation
> > > @@ -0,0 +1,56 @@
> > > +EXYNOS EMULATION MODE
> > > +
> > > +
> > > +Copyright (C) 2012 Samsung Electronics
> > > +
> > > +Written by Jonghwa Lee 
> > > +
> > > +Description
> > > +---
> > > +
> > > +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for
> thermal
> > > management unit.
> > > +Thermal emulation mode supports software debug for TMU's
> operation.
> > > User can set temperature
> > > +manually with software code and TMU will read current temperature
> from
> > > user value not from
> > > +sensor's value.
> > > +
> > > +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this
> support
> > > in available.
> > > +When it's enabled, sysfs node will be created under
> > > +/sys/bus/platform/devices/'exynos device name'/

Re: [PATCH v6 18/29] Allocate memory for memcg caches whenever a new memcg appears

2012-11-06 Thread Glauber Costa

On 11/06/2012 01:23 AM, Andrew Morton wrote:
> On Thu,  1 Nov 2012 16:07:34 +0400
> Glauber Costa  wrote:
> 
>> Every cache that is considered a root cache (basically the "original" caches,
>> tied to the root memcg/no-memcg) will have an array that should be large 
>> enough
>> to store a cache pointer per each memcg in the system.
>>
>> Theoreticaly, this is as high as 1 << sizeof(css_id), which is currently in 
>> the
>> 64k pointers range. Most of the time, we won't be using that much.
>>
>> What goes in this patch, is a simple scheme to dynamically allocate such an
>> array, in order to minimize memory usage for memcg caches. Because we would
>> also like to avoid allocations all the time, at least for now, the array will
>> only grow. It will tend to be big enough to hold the maximum number of
>> kmem-limited memcgs ever achieved.
>>
>> We'll allocate it to be a minimum of 64 kmem-limited memcgs. When we have 
>> more
>> than that, we'll start doubling the size of this array every time the limit 
>> is
>> reached.
>>
>> Because we are only considering kmem limited memcgs, a natural point for this
>> to happen is when we write to the limit. At that point, we already have
>> set_limit_mutex held, so that will become our natural synchronization
>> mechanism.
>>
>> ...
>>
>> +static struct ida kmem_limited_groups;
> 
> Could use DEFINE_IDA() here
> 
>>
>> ...
>>
>>  static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys 
>> *ss)
>>  {
>> +int ret;
>> +
>>  memcg->kmemcg_id = -1;
>> -memcg_propagate_kmem(memcg);
>> +ret = memcg_propagate_kmem(memcg);
>> +if (ret)
>> +return ret;
>> +
>> +if (mem_cgroup_is_root(memcg))
>> +ida_init(&kmem_limited_groups);
> 
> and zap this?
> 

Ok.

I am starting to go over your replies now, and I have a general question:
Since you have already included this in mm, would you like me to
resubmit the series changing things according to your feedback, or
should I send incremental patches?
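
For anyone skimming the thread, the sizing policy described in the changelog (start at 64 entries, double when the limit is reached, never shrink) can be sketched as below. The function and macro names are hypothetical and this is not the code from the patch.

```c
#include <stddef.h>

#define MEMCG_CACHES_MIN_SKETCH 64	/* minimum size named in the changelog */

/*
 * Hypothetical helper: given the current array size and the number of
 * kmem-limited groups that must fit, return the size to use. The result
 * never drops below the current size or the minimum, and grows by doubling.
 */
static size_t memcg_array_size_sketch(size_t cur_size, size_t num_groups)
{
	size_t size = cur_size < MEMCG_CACHES_MIN_SKETCH ?
		      MEMCG_CACHES_MIN_SKETCH : cur_size;

	while (size < num_groups)
		size *= 2;

	return size;
}
```

So 65 groups with a 64-entry array gives 128, and an array that has already grown to 128 stays at 128 even if only 10 groups remain.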



Re: [PATCH v6 19/29] memcg: infrastructure to match an allocation to the right cache

2012-11-06 Thread Glauber Costa

On 11/06/2012 01:28 AM, Andrew Morton wrote:
> On Thu,  1 Nov 2012 16:07:35 +0400
> Glauber Costa  wrote:
> 
>> +static __always_inline struct kmem_cache *
>> +memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
> 
> I still don't understand why this code uses __always_inline so much.
> 
> I don't recall seeing the compiler producing out-of-line versions of
> "static inline" functions (and perhaps it has special treatment for
> functions which were defined in a header file?).
> 
> And if the compiler *does* decide to uninline the function, perhaps it
> knows best, and the function shouldn't have been declared inline in the
> first place.
> 
> 
> If it is indeed better to use __always_inline in this code then we have
> a heck of a lot of other "static inline" definitions whcih we need to
> convert!  So, what's going on here?
> 

The original motivation is indeed performance related. We want to make
sure it is inline so it will figure out quickly the "I am not a memcg
user" case and keep it going. The slub, for instance, is full of
__always_inline functions to make sure that the fast path contains
absolutely no function calls. So I was just following this here.

I can remove the marker without a problem and leave it to the compiler
if you think that is best.
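
To make the fast-path argument concrete, here is a small sketch of the pattern, assuming GCC or Clang attributes. Every name in it is hypothetical; it only shows an always-inline check that returns early for non-memcg users and calls out of line otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

struct cache_sketch {
	const char *name;
};

static bool memcg_kmem_on;	/* hypothetical global switch */
static struct cache_sketch memcg_copy = { "memcg-copy" };

/* Slow path, deliberately kept out of line. */
static __attribute__((noinline)) struct cache_sketch *
pick_memcg_cache_slow(struct cache_sketch *root)
{
	(void)root;
	return &memcg_copy;
}

/* Fast path: forced inline so the common case costs one test, no call. */
static inline __attribute__((always_inline)) struct cache_sketch *
get_cache_sketch(struct cache_sketch *root)
{
	if (!memcg_kmem_on)
		return root;	/* "I am not a memcg user" */

	return pick_memcg_cache_slow(root);
}
```

Whether forcing the inline actually beats the compiler's own choice is the open question in this thread; the sketch only shows the shape of the code.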


Re: [RFC v4+ hot_track 03/19] vfs: add I/O frequency update function

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 6:37 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:45PM +0800, zwu.ker...@gmail.com wrote:
>> --- a/fs/hot_tracking.c
>> +++ b/fs/hot_tracking.c
>> +struct hot_inode_item
>> +*hot_inode_item_find(struct hot_info *root, u64 ino)
>> +{
>> + struct hot_inode_item *he;
>> + int ret;
>> +
>> +again:
>> + spin_lock(&root->lock);
>> + he = radix_tree_lookup(&root->hot_inode_tree, ino);
>> + if (he) {
>> + kref_get(&he->hot_inode.refs);
>> + spin_unlock(&root->lock);
>> + return he;
>> + }
>> + spin_unlock(&root->lock);
>> +
>> + he = kmem_cache_zalloc(hot_inode_item_cachep,
>> + GFP_KERNEL | GFP_NOFS);
>> + if (!he)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + hot_inode_item_init(he, ino, &root->hot_inode_tree);
>> +
>> + ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
>> + if (ret) {
>> + kmem_cache_free(hot_inode_item_cachep, he);
>
> radix_tree_preload_end()
>
>> + return ERR_PTR(ret);
>> + }
>> +
>> + spin_lock(&root->lock);
>> + ret = radix_tree_insert(&root->hot_inode_tree, ino, he);
>> + if (ret == -EEXIST) {
>> + kmem_cache_free(hot_inode_item_cachep, he);
>> + spin_unlock(&root->lock);
>> + radix_tree_preload_end();
>> + goto again;
>> + }
>> + spin_unlock(&root->lock);
>> + radix_tree_preload_end();
>> +
>> + kref_get(&he->hot_inode.refs);
>> + return he;
>> +}
>> +EXPORT_SYMBOL_GPL(hot_inode_item_find);
>> +
>> +static struct hot_range_item
>> +*hot_range_item_find(struct hot_inode_item *he,
>> + u32 start)
>> +{
>> + struct hot_range_item *hr;
>> + int ret;
>> +
>> +again:
>> + spin_lock(&he->lock);
>> + hr = radix_tree_lookup(&he->hot_range_tree, start);
>> + if (hr) {
>> + kref_get(&hr->hot_range.refs);
>> + spin_unlock(&he->lock);
>> + return hr;
>> + }
>> + spin_unlock(&he->lock);
>> +
>> + hr = kmem_cache_zalloc(hot_range_item_cachep,
>> + GFP_KERNEL | GFP_NOFS);
>> + if (!hr)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + hot_range_item_init(hr, start, he);
>> +
>> + ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
>> + if (ret) {
>> + kmem_cache_free(hot_range_item_cachep, hr);
>
> radix_tree_preload_end()
I checked some existing uses of radix_tree_preload() in the kernel;
it seems that when radix_tree_preload() fails, the error handling
does not need to call radix_tree_preload_end().
>
>> + return ERR_PTR(ret);
>> + }
>> +
>> + spin_lock(&he->lock);
>> + ret = radix_tree_insert(&he->hot_range_tree, start, hr);
>> + if (ret == -EEXIST) {
>> + kmem_cache_free(hot_range_item_cachep, hr);
>> + spin_unlock(&he->lock);
>> + radix_tree_preload_end();
ditto.
>> + goto again;
>> + }
>> + spin_unlock(&he->lock);
>> + radix_tree_preload_end();
>> +
>> + kref_get(&hr->hot_range.refs);
>> + return hr;
>> +}
>
> david



-- 
Regards,

Zhi Yong Wu

Re: [RESEND PATCH V3] binfmt_elf.c: use get_random_int() to fix entropy depleting

2012-11-06 Thread Jeff Liu

On 11/07/2012 02:21 PM, Kees Cook wrote:
> On Tue, Nov 6, 2012 at 10:11 PM, Jeff Liu  wrote:
>> Hello,
>>
>> This is the revised patch for fix entropy depleting.
>>
>> Changes:
>> 
>> v3->v2:
>> - Tweak code comments of random_stack_user().
>> - Remove redundant bits mask and shift upon the random variable.
>>
>> v2->v1:
>> Fix random copy to check up buffer length that are not 4-byte multiples.
>>
>> v2 can be found at:
>> http://www.spinics.net/lists/linux-fsdevel/msg59418.html
>> v1 can be found at:
>> http://www.spinics.net/lists/linux-fsdevel/msg59128.html
>>
>> Many thanks to Andreas, Andrew as well as Kees for reviewing the patch of 
>> past!
>> -Jeff
>>
>>
>> Entropy is quickly depleted under normal operations like ls(1), cat(1),
>> etc...  between 2.6.30 to current mainline, for instance:
>>
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 3428
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2911
>> $cat /proc/sys/kernel/random/entropy_avail
>> 2620
>>
>> We observed this problem has been occurring since 2.6.30 with
>> fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by
>> f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
>>
>> /*
>>  * Generate 16 random bytes for userspace PRNG seeding.
>>  */
>> get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
>>
>> The patch introduces a wrapper around get_random_int() which has lower
>> overhead than calling get_random_bytes() directly.
>>
>> With this patch applied:
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2731
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2802
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2878
>>
>> Analyzed by John Sobecki.
>>
>> Signed-off-by: Jie Liu 
>> Cc: John Sobecki 
>> Cc: Al Viro 
>> Cc: Andreas Dilger 
>> Cc: Alan Cox 
>> Cc: Arnd Bergmann 
>> Cc: James Morris 
>> Cc: Ted Ts'o 
>> Cc: Greg Kroah-Hartman 
>> Cc: Kees Cook 
>> Cc: Jakub Jelinek 
>> Cc: Ulrich Drepper 
>> Signed-off-by: Andrew Morton 
>>
>> ---
>>  fs/binfmt_elf.c |   22 +-
>>  1 files changed, 21 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index fbd9f60..b6c59f6 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_binprm *bprm, 
>> struct pt_regs *regs);
>>  static int load_elf_library(struct file *);
>>  static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr 
>> *,
>> int, int, unsigned long);
>> +static void randomize_stack_user(unsigned char *buf, size_t nbytes);
> 
> I think it would be easier to just move the function ahead of its use
> to avoid the predeclaration.
Yes, it's better.
> 
>>
>>  /*
>>   * If we don't support core dumping, then supply a NULL so we
>> @@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *bprm, struct 
>> elfhdr *exec,
>> /*
>>  * Generate 16 random bytes for userspace PRNG seeding.
>>  */
>> -   get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
>> +   randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
>> u_rand_bytes = (elf_addr_t __user *)
>>STACK_ALLOC(p, sizeof(k_rand_bytes));
>> if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
>> @@ -558,6 +559,25 @@ static unsigned long randomize_stack_top(unsigned long 
>> stack_top)
>>  #endif
>>  }
>>
>> +/*
>> + * Use get_random_int() to implement AT_RANDOM while avoiding depletion
>> + * of the entropy pool.
>> + */
>> +static void randomize_stack_user(unsigned char *buf, size_t nbytes)
> 
> I think this name needs changing -- it has nothing to do with the
> stack except that that's where it ends up in userspace. Perhaps
> "get_atrandom_bytes"?
I racked my brains but cannot think of a better name than yours. :)
> 
>> +{
>> +   unsigned char *p = buf;
>> +
>> +   while (nbytes) {
>> +   unsigned int random_variable;
>> +   size_t chunk = min(nbytes, sizeof(unsigned int));
>> +
>> +   random_variable = get_random_int();
> 
> I still want to hear at least from Ted about this changes -- we would
> be potentially increasing the predictability of these bytes...
We would not be increasing that if this routine is used for AT_RANDOM
only (and if the array stays aligned to 4 bytes).
Otherwise we would be, so let's wait for further feedback.

Thanks,
-Jeff
> 
>> +   memcpy(p, &random_variable, chunk);
>> +   p += chunk;
>> +   nbytes -= chunk;
>> +   }
>> +}
>> +
>>  static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
>>  {
>> struct file *interpreter = NULL; /* to shut gcc up */
>> --
>> 1.7.4.1
> 
> -Kees
> 
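
The chunked copy in the proposed helper can be tried in userspace with the sketch below. next_word_sketch() is a hypothetical stand-in for get_random_int() and is in no way a secure source; the point is only how a buffer whose length is not a multiple of sizeof(unsigned int) gets filled without overrunning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for get_random_int(); NOT cryptographically secure. */
static unsigned int next_word_sketch(void)
{
	static unsigned int state = 0x01020304u;

	state = state * 1664525u + 1013904223u;
	return state;
}

/* Fill buf with nbytes bytes, one word-sized chunk at a time. */
static void fill_bytes_sketch(unsigned char *buf, size_t nbytes)
{
	unsigned char *p = buf;

	while (nbytes) {
		unsigned int word = next_word_sketch();
		size_t chunk = nbytes < sizeof(word) ? nbytes : sizeof(word);

		/* The last chunk may be shorter than a full word. */
		memcpy(p, &word, chunk);
		p += chunk;
		nbytes -= chunk;
	}
}
```

Filling 6 bytes of an 8-byte buffer leaves the last two bytes untouched, which is the case the v2 fix addressed.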


Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming

2012-11-06 Thread Gerd Hoffmann

On 11/05/12 19:19, Andy King wrote:
> Hi David,
> 
>> The big and only question is whether anyone can actually use any of
>> this stuff without your proprietary bits?
> 
> Do you mean the VMCI calls?  The VMCI driver is in the process of being
> upstreamed into the drivers/misc tree.  Greg (cc'd on these patches) is
> actively reviewing that code and we are addressing feedback.
> 
> Also, there was some interest from RedHat into using vSockets as a unified
> interface, routed over a hypervisor-specific transport (virtio or
> otherwise, although for now VMCI is the only one implemented).

Can you outline how this can be done?  From a quick look over the code
it seems like vsock has a hard dependency on vmci, is that correct?

When making vsock a generic, reusable kernel service it should be the
other way around:  vsock should provide the core implementation and an
interface where hypervisor-specific transports (vmci, virtio, xenbus,
...) can register themself.

cheers,
  Gerd
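
To illustrate the layering suggested above, a core that knows nothing about VMCI and lets transports register themselves could be shaped roughly like this. It is a userspace sketch only; every name in it is hypothetical and none of it is taken from the posted patches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-transport operations table. */
struct vsock_transport_sketch {
	const char *name;
	int (*connect)(unsigned int cid, unsigned int port);
};

#define MAX_TRANSPORTS_SKETCH 4

static const struct vsock_transport_sketch *transports[MAX_TRANSPORTS_SKETCH];
static size_t n_transports;

/* Called by a hypervisor-specific module (vmci, virtio, ...) at load time. */
static int vsock_register_transport_sketch(const struct vsock_transport_sketch *t)
{
	if (!t || !t->name || n_transports == MAX_TRANSPORTS_SKETCH)
		return -1;

	transports[n_transports++] = t;
	return 0;
}

/* The core looks a transport up by name instead of calling VMCI directly. */
static const struct vsock_transport_sketch *
vsock_find_transport_sketch(const char *name)
{
	size_t i;

	for (i = 0; i < n_transports; i++)
		if (strcmp(transports[i]->name, name) == 0)
			return transports[i];

	return NULL;
}
```

The dependency then points from each transport to the core, not from the core to VMCI.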

Re: [RFC v4+ hot_track 02/19] vfs: initialize and free data structures

2012-11-06 Thread Zhi Yong Wu

On Wed, Nov 7, 2012 at 6:24 AM, David Sterba  wrote:
> On Mon, Oct 29, 2012 at 12:30:44PM +0800, zwu.ker...@gmail.com wrote:
>> +/* Frees the entire hot_range_tree. */
>> +static void hot_inode_item_free(struct kref *kref)
>> +{
>> + struct hot_comm_item *comm_item = container_of(kref,
>> + struct hot_comm_item, refs);
>> + struct hot_inode_item *he = container_of(comm_item,
>> + struct hot_inode_item, hot_inode);
>> +
>> + hot_range_tree_free(he);
>> + radix_tree_delete(he->hot_inode_tree, he->i_ino);
>
> void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
>
> and he::i_ino is u64, this will not work when
> sizeof(unsigned long) != sizeof(u64) (iirc this is a known limitation of
> radix tree implementation). This will work on 64bit only, not sure if
> this is intentional.
I had actually realized this as well. Do you have a better way to handle it?
>
>> + kmem_cache_free(hot_inode_item_cachep, he);
>> +}
>> +
>> +/* Frees the entire hot_inode_tree. */
>> +static void hot_inode_tree_exit(struct hot_info *root)
>> +{
>> + struct hot_inode_item *hi_nodes[8];
>> + u64 ino = 0;
>> + int i, n;
>
> nitpick, put the declarations on separate lines
Does it cause any problem? It passes checkpatch.pl.

>
>> +
>> + while (1) {
>> + spin_lock(&root->lock);
>> + n = radix_tree_gang_lookup(&root->hot_inode_tree,
>> +(void **)hi_nodes, ino,
>> +ARRAY_SIZE(hi_nodes));
>> + if (!n) {
>> + spin_unlock(&root->lock);
>> + break;
>> + }
>> +
>> + ino = hi_nodes[n - 1]->i_ino + 1;
>> + for (i = 0; i < n; i++)
>> + hot_inode_item_put(hi_nodes[i]);
>> + spin_unlock(&root->lock);
>> + }
>> +}
>> +
>>  /*
>>   * Initialize kmem cache for hot_inode_item and hot_range_item.
>>   */
>> @@ -106,3 +197,36 @@ err:
>>   kmem_cache_destroy(hot_inode_item_cachep);
>>  }
>>  EXPORT_SYMBOL_GPL(hot_cache_init);
>> +
>> +/*
>> + * Initialize the data structures for hot data tracking.
>> + */
>> +int hot_track_init(struct super_block *sb)
>> +{
>> + struct hot_info *root;
>> + int ret = -ENOMEM;
>> +
>> + root = kzalloc(sizeof(struct hot_info), GFP_NOFS);
>> + if (!root) {
>> + printk(KERN_ERR "%s: Failed to malloc memory for "
>> + "hot_info\n", __func__);
>> + return ret;
>
> minor: you can drop the variable ret and just return ENOMEM here
This variable will also be used in the following patches.

>
>> + }
>> +
>> + sb->s_hot_root = root;
>> + hot_inode_tree_init(root);
>> +
>> + printk(KERN_INFO "VFS: Turning on hot data tracking\n");
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(hot_track_init);
>
> david



-- 
Regards,

Zhi Yong Wu

Re: [PATCH V3 3/5] Thermal: Remove the cooling_cpufreq_list.

2012-11-06 Thread Zhang Rui

On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" 
> 
> Problem of using this list is that the cpufreq_get_max_state callback will be
> called when register cooling device by thermal_cooling_device_register, but
> this list isn't ready at this moment. What's more, there is no need to 
> maintain
> such a list, we can get cpufreq_cooling_device instance by the private
> thermal_cooling_device.devdata.
> 
> Signed-off-by: hongbo.zhang 
> Reviewed-by: Francesco Lavra 
> Reviewed-by: Amit Daniel Kachhap 

applied to thermal-next.

thanks,
rui

> ---
>  drivers/thermal/cpu_cooling.c | 91 
> +--
>  1 file changed, 19 insertions(+), 72 deletions(-)
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index bfd62b7..392d57d 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -58,8 +58,9 @@ struct cpufreq_cooling_device {
>  };
>  static LIST_HEAD(cooling_cpufreq_list);
>  static DEFINE_IDR(cpufreq_idr);
> +static DEFINE_MUTEX(cooling_cpufreq_lock);
>  
> -static struct mutex cooling_cpufreq_lock;
> +static unsigned int cpufreq_dev_count;
>  
>  /* notify_table passes value to the CPUFREQ_ADJUST callback function. */
>  #define NOTIFY_INVALID NULL
> @@ -240,28 +241,18 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>unsigned long *state)
>  {
> - int ret = -EINVAL, i = 0;
> - struct cpufreq_cooling_device *cpufreq_device;
> - struct cpumask *maskPtr;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> + struct cpumask *maskPtr = &cpufreq_device->allowed_cpus;
>   unsigned int cpu;
>   struct cpufreq_frequency_table *table;
>   unsigned long count = 0;
> + int i = 0;
>  
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev)
> - break;
> - }
> - if (cpufreq_device == NULL)
> - goto return_get_max_state;
> -
> - maskPtr = &cpufreq_device->allowed_cpus;
>   cpu = cpumask_any(maskPtr);
>   table = cpufreq_frequency_get_table(cpu);
>   if (!table) {
>   *state = 0;
> - ret = 0;
> - goto return_get_max_state;
> + return 0;
>   }
>  
>   for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
> @@ -272,12 +263,10 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>  
>   if (count > 0) {
>   *state = --count;
> - ret = 0;
> + return 0;
>   }
>  
> -return_get_max_state:
> - mutex_unlock(&cooling_cpufreq_lock);
> - return ret;
> + return -EINVAL;
>  }
>  
>  /**
> @@ -288,20 +277,10 @@ return_get_max_state:
>  static int cpufreq_get_cur_state(struct thermal_cooling_device *cdev,
>unsigned long *state)
>  {
> - int ret = -EINVAL;
> - struct cpufreq_cooling_device *cpufreq_device;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
>  
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev) {
> - *state = cpufreq_device->cpufreq_state;
> - ret = 0;
> - break;
> - }
> - }
> - mutex_unlock(&cooling_cpufreq_lock);
> -
> - return ret;
> + *state = cpufreq_device->cpufreq_state;
> + return 0;
>  }
>  
>  /**
> @@ -312,22 +291,9 @@ static int cpufreq_get_cur_state(struct thermal_cooling_device *cdev,
>  static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>unsigned long state)
>  {
> - int ret = -EINVAL;
> - struct cpufreq_cooling_device *cpufreq_device;
> + struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
>  
> - mutex_lock(&cooling_cpufreq_lock);
> - list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> - if (cpufreq_device && cpufreq_device->cool_dev == cdev) {
> - ret = 0;
> - break;
> - }
> - }
> - if (!ret)
> - ret = cpufreq_apply_cooling(cpufreq_device, state);
> -
> - mutex_unlock(&cooling_cpufreq_lock);
> -
> - return ret;
> + return cpufreq_apply_cooling(cpufreq_device, state);
>  }
>  
>  /* Bind cpufreq callbacks to thermal cooling device ops */
> @@ -351,14 +317,11 @@ struct thermal_cooling_device *cpufreq_cooling_register(
>  {
>   struct thermal_cooling_device *cool_dev;
>   struct cpufreq_cooling_device *cpufreq_dev = NULL;
> - unsigned int cpufreq_dev_count = 0, min = 0, max = 0;
> + unsigned int min = 0, max = 0;
>   char dev_name[THERMAL_NAME_LENGTH];
>   int ret = 0,

Re: [PATCH V3 2/5] Thermal: fix bug of counting cpu frequencies.

2012-11-06 Thread Zhang Rui

On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" 
> 
> In the while loop for counting cpu frequencies, if table[i].frequency equals
> CPUFREQ_ENTRY_INVALID, index i won't be increased, so this leads to an endless
> loop, what's more the index i cannot be referred as cpu frequencies number if
> there is CPUFREQ_ENTRY_INVALID case.
> 
> Signed-off-by: hongbo.zhang 
> Reviewed-by: Viresh Kumar 
> Reviewed-by: Amit Daniel Kachhap 

applied to thermal-next.

thanks,
rui
> ---
>  drivers/thermal/cpu_cooling.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index b6b4c2a..bfd62b7 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -245,6 +245,7 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>   struct cpumask *maskPtr;
>   unsigned int cpu;
>   struct cpufreq_frequency_table *table;
> + unsigned long count = 0;
>  
>   mutex_lock(&cooling_cpufreq_lock);
>   list_for_each_entry(cpufreq_device, &cooling_cpufreq_list, node) {
> @@ -263,13 +264,14 @@ static int cpufreq_get_max_state(struct thermal_cooling_device *cdev,
>   goto return_get_max_state;
>   }
>  
> - while (table[i].frequency != CPUFREQ_TABLE_END) {
> + for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
>   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
>   continue;
> - i++;
> + count++;
>   }
> - if (i > 0) {
> - *state = --i;
> +
> + if (count > 0) {
> + *state = --count;
>   ret = 0;
>   }
>  
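
The counting fix in the hunk above can be tried outside the kernel with a small sketch. The sentinel macros and struct below are stand-ins defined only for this example (the real ones live in the kernel's cpufreq header), and the function names are hypothetical. The point is that the index advances in the for header, so a `continue` on an invalid entry can no longer stall the loop, and valid entries are counted separately from the index.

```c
#include <stddef.h>

/* Stand-in sentinels for this sketch only. */
#define FREQ_ENTRY_INVALID (~0u)
#define FREQ_TABLE_END     (~1u)

struct freq_entry {
	unsigned int frequency;
};

/* Count the valid entries; i always advances, even when an entry is skipped. */
unsigned long count_valid_freqs(const struct freq_entry *table)
{
	unsigned long count = 0;
	size_t i;

	for (i = 0; table[i].frequency != FREQ_TABLE_END; i++) {
		if (table[i].frequency == FREQ_ENTRY_INVALID)
			continue;
		count++;
	}
	return count;
}

/*
 * Mirrors the patched logic: the maximum cooling state is count - 1.
 * Returns 0 on success, -1 if the table has no valid entry.
 */
int max_cooling_state(const struct freq_entry *table, unsigned long *state)
{
	unsigned long count = count_valid_freqs(table);

	if (count == 0)
		return -1;
	*state = count - 1;
	return 0;
}
```

With the old while loop, the first invalid entry would be re-tested forever because `i++` sat after the `continue`.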



Re: [PATCH v4 1/2] x86, pci: Reset PCIe devices at boot time

2012-11-06 Thread Takao Indoh


(2012/10/16 13:23), Takao Indoh wrote:

(2012/10/16 3:36), Yinghai Lu wrote:

On Mon, Oct 15, 2012 at 12:00 AM, Takao Indoh
 wrote:

This patch resets PCIe devices at boot time by hot reset when
"reset_devices" is specified.


how about pci devices that domain_nr is not zero ?


This patch does not support multiple domains yet.



Signed-off-by: Takao Indoh 
---
  arch/x86/include/asm/pci-direct.h |1
  arch/x86/kernel/setup.c   |3
  arch/x86/pci/early.c  |  344 
  include/linux/pci.h   |2
  init/main.c   |4
  5 files changed, 352 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci-direct.h b/arch/x86/include/asm/pci-direct.h
index b1e7a45..de30db2 100644
--- a/arch/x86/include/asm/pci-direct.h
+++ b/arch/x86/include/asm/pci-direct.h
@@ -18,4 +18,5 @@ extern int early_pci_allowed(void);
  extern unsigned int pci_early_dump_regs;
  extern void early_dump_pci_device(u8 bus, u8 slot, u8 func);
  extern void early_dump_pci_devices(void);
+extern void early_reset_pcie_devices(void);
  #endif /* _ASM_X86_PCI_DIRECT_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a2bb18e..73d3425 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -987,6 +987,9 @@ void __init setup_arch(char **cmdline_p)
 generic_apic_probe();

 early_quirks();
+#ifdef CONFIG_PCI
+   early_reset_pcie_devices();
+#endif

 /*
  * Read APIC and some other early information from ACPI tables.
diff --git a/arch/x86/pci/early.c b/arch/x86/pci/early.c
index d1067d5..683b30f 100644
--- a/arch/x86/pci/early.c
+++ b/arch/x86/pci/early.c
@@ -1,5 +1,6 @@
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -109,3 +110,346 @@ void early_dump_pci_devices(void)
 }
 }
  }
+
+#define PCI_EXP_SAVE_REGS  7
+#define pcie_cap_has_devctl(type, flags)   1
+#define pcie_cap_has_lnkctl(type, flags)   \
+   ((flags & PCI_EXP_FLAGS_VERS) > 1 ||\
+(type == PCI_EXP_TYPE_ROOT_PORT || \
+ type == PCI_EXP_TYPE_ENDPOINT ||  \
+ type == PCI_EXP_TYPE_LEG_END))
+#define pcie_cap_has_sltctl(type, flags)   \
+   ((flags & PCI_EXP_FLAGS_VERS) > 1 ||\
+((type == PCI_EXP_TYPE_ROOT_PORT) ||   \
+ (type == PCI_EXP_TYPE_DOWNSTREAM &&   \
+  (flags & PCI_EXP_FLAGS_SLOT))))
+#define pcie_cap_has_rtctl(type, flags)\
+   ((flags & PCI_EXP_FLAGS_VERS) > 1 ||\
+(type == PCI_EXP_TYPE_ROOT_PORT || \
+ type == PCI_EXP_TYPE_RC_EC))
+
+struct save_config {
+   u32 pci[16];
+   u16 pcie[PCI_EXP_SAVE_REGS];
+};
+
+struct pcie_dev {
+   int cap;   /* position of PCI Express capability */
+   int flags; /* PCI_EXP_FLAGS */
+   struct save_config save; /* saved configuration register */
+};
+
+struct pcie_port {
+   struct list_head dev;
+   u8 secondary;
+   struct pcie_dev child[PCI_MAX_FUNCTIONS];
+};
+
+static LIST_HEAD(device_list);
+static void __init pci_udelay(int loops)
+{
+   while (loops--) {
+   /* Approximately 1 us */
+   native_io_delay();
+   }
+}
+
+/* Derived from drivers/pci/pci.c */
+#define PCI_FIND_CAP_TTL   48
+static int __init __pci_find_next_cap_ttl(u8 bus, u8 slot, u8 func,
+ u8 pos, int cap, int *ttl)
+{
+   u8 id;
+
+   while ((*ttl)--) {
+   pos = read_pci_config_byte(bus, slot, func, pos);
+   if (pos < 0x40)
+   break;
+   pos &= ~3;
+   id = read_pci_config_byte(bus, slot, func,
+   pos + PCI_CAP_LIST_ID);
+   if (id == 0xff)
+   break;
+   if (id == cap)
+   return pos;
+   pos += PCI_CAP_LIST_NEXT;
+   }
+   return 0;
+}
+
+static int __init __pci_find_next_cap(u8 bus, u8 slot, u8 func, u8 pos, int cap)
+{
+   int ttl = PCI_FIND_CAP_TTL;
+
+   return __pci_find_next_cap_ttl(bus, slot, func, pos, cap, &ttl);
+}
+
+static int __init __pci_bus_find_cap_start(u8 bus, u8 slot, u8 func,
+  u8 hdr_type)
+{
+   u16 status;
+
+   status = read_pci_config_16(bus, slot, func, PCI_STATUS);
+   if (!(status & PCI_STATUS_CAP_LIST))
+   return 0;
+
+   switch (hdr_type) {
+   case PCI_HEADER_TYPE_NORMAL:
+   case PCI_HEADER_TYPE_BRIDGE:
+   return PCI_CAPABILITY_LIST;
+   case PCI_HEADER_TYPE_CARDBUS:
+   return PCI_CB_CAPABILITY_LIST;
+   default:
+   return 0;
+   }
+
+   return 0;
+}
+
+static int __init early_pci_find_capability(u8 bus, u8 slot, u8 func, int cap)
+{
+   int

Re: [PATCH V3 1/5] Thermal: add indent for code alignment.

2012-11-06 Thread Zhang Rui

On Tue, 2012-10-30 at 17:48 +0100, hongbo.zhang wrote:
> From: "hongbo.zhang" 
> 
> The curly bracket should be aligned with corresponding if else statements.
> 
> Signed-off-by: hongbo.zhang 
> Reviewed-by: Viresh Kumar 

applied to thermal-next.

thanks,
rui

> ---
>  drivers/thermal/cpu_cooling.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index cc1c930..b6b4c2a 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -369,7 +369,7 @@ struct thermal_cooling_device *cpufreq_cooling_register(
>   if (min != policy.cpuinfo.min_freq ||
>   max != policy.cpuinfo.max_freq)
>   return ERR_PTR(-EINVAL);
> -}
> + }
>   }
>   cpufreq_dev = kzalloc(sizeof(struct cpufreq_cooling_device),
>   GFP_KERNEL);



[PATCH] staging: tidspbridge: dynload: reloc.c: checkpatch.pl cleanup

2012-11-06 Thread Kumar Amit Mehta

Fix a few errors reported by checkpatch.pl.

Signed-off-by: Kumar Amit Mehta 
---
 drivers/staging/tidspbridge/dynload/reloc.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/tidspbridge/dynload/reloc.c b/drivers/staging/tidspbridge/dynload/reloc.c
index 7b28c07..463abdb 100644
--- a/drivers/staging/tidspbridge/dynload/reloc.c
+++ b/drivers/staging/tidspbridge/dynload/reloc.c
@@ -45,7 +45,7 @@ static const char bsssymbol[] = { ".bss" };
  * Effect:
  * Extracts the specified field and returns it.
  * */
-rvalue dload_unpack(struct dload_state *dlthis, tgt_au_t * data, int fieldsz,
+rvalue dload_unpack(struct dload_state *dlthis, tgt_au_t *data, int fieldsz,
int offset, unsigned sgn)
 {
register rvalue objval;
@@ -98,7 +98,7 @@ rvalue dload_unpack(struct dload_state *dlthis, tgt_au_t * data, int fieldsz,
  * */
 static const unsigned char ovf_limit[] = { 1, 2, 2 };
 
-int dload_repack(struct dload_state *dlthis, rvalue val, tgt_au_t * data,
+int dload_repack(struct dload_state *dlthis, rvalue val, tgt_au_t *data,
 int fieldsz, int offset, unsigned sgn)
 {
register urvalue objval, mask;
@@ -161,7 +161,7 @@ static const u8 c60_scale[SCALE_MASK + 1] = {
  * Effect:
  * Performs the specified relocation operation
  * */
-void dload_relocate(struct dload_state *dlthis, tgt_au_t * data,
+void dload_relocate(struct dload_state *dlthis, tgt_au_t *data,
struct reloc_record_t *rp, bool *tramps_generated,
bool second_pass)
 {
-- 
1.7.9.5


Re: SPARC and OF_GPIO

2012-11-06 Thread Thierry Reding

On Tue, Nov 06, 2012 at 06:40:58PM -0500, David Miller wrote:
> From: Thierry Reding 
> Date: Mon, 5 Nov 2012 10:53:15 +0100
> 
> > Are you aware of any reasons why this conflict would still be necessary?
> 
> No reason that I can see, I'll push something like the patch below
> via the sparc tree.

Thanks for doing this.

> > This is not only the case for OF_GPIO but likely also for OF_SPI,
> > OF_I2C, OF_IRQ and OF_ADDRESS. Shouldn't those all work even on SPARC
> > nowadays?
> 
> Those also would need to be tested on an individual basis, but
> there are no fundamental problems that I am aware of.

It seems like OF_ADDRESS would be trickier. A comment around line 60 in
drivers/of/platform.c says that SPARC doesn't need functions defined in
the enclosing #ifdef CONFIG_OF_ADDRESS block. I'm not sure it would be
acceptable to remove the conflict nonetheless, even if the functions
aren't used. One benefit would be that the code could receive some extra
compile coverage.

Oddly I'm no longer able to find any reference to OF_SPI, so maybe I
just made that up...

The code conditionalized on OF_I2C looks very generic, so I think there
shouldn't be a problem to remove that conflict either.

Finally, OF_IRQ is again just generic code to map device tree data to
IRQ domains. While I didn't see the IRQ_DOMAIN symbol selected anywhere
in SPARC it should still be possible to run drivers that properly
implement IRQ domains on SPARC, right? Or is there any reason why they
wouldn't work?

So this seems like all conflicts except the one for OF_ADDRESS can
easily be removed. And even for OF_ADDRESS there may be some value in
removing the conflict.

Thierry

> 
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index b6b442b..f0a5391 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -14,6 +14,7 @@ config SPARC
>   default y
>   select OF
>   select OF_PROMTREE
> + select OF_GPIO
>   select HAVE_IDE
>   select HAVE_OPROFILE
>   select HAVE_ARCH_KGDB if !SMP || SPARC64
> diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
> index d055cee..f11d8e3 100644
> --- a/drivers/gpio/Kconfig
> +++ b/drivers/gpio/Kconfig
> @@ -47,7 +47,7 @@ if GPIOLIB
>  
>  config OF_GPIO
>   def_bool y
> - depends on OF && !SPARC
> + depends on OF
>  
>  config DEBUG_GPIO
>   bool "Debug GPIO calls"
> 


Re: [PATCH v2 2/2] mailbox: split internal header from API header

2012-11-06 Thread Omar Ramirez Luna

Hi Loic,

On 6 November 2012 06:53, Loic PALLARDY  wrote:
>
>
> On 11/06/2012 03:55 AM, Omar Ramirez Luna wrote:
>> Now internal structures can remain hidden to the user and just API
>> related functions and defines are made available.
>>
>> Signed-off-by: Omar Ramirez Luna
>> ---
>>   drivers/mailbox/mailbox.c |   34 
>>   drivers/mailbox/mailbox.h |   48 
>> +
>>   include/linux/mailbox.h   |   22 +++
> I agree with the file split but I think mailbox framework API should be
> more generic.
> omap_ prefix should not be used. mailbox_ prefix will be better.

Ok.

> The message type should be more open, for example:
> struct mailbox_msg {
> int size;
> unsigned char   header;
> unsigned char   *pdata;
> };

We can analyze the requirement for such a structure. Presumably you
expect variable-size messages; in the OMAP case it is a 4-byte value,
but I'm open to changes in order to accommodate other users' needs.
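
As a rough sketch of how the fixed 4-byte OMAP case might fit into the variable-size structure Loic proposes: the struct layout is copied from his mail, while the two helper functions are hypothetical and exist only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Layout as proposed in the quoted mail. */
struct mailbox_msg {
	int size;
	unsigned char header;
	unsigned char *pdata;
};

/* Wrap a fixed 32-bit value; the caller provides the 4-byte backing buffer. */
void mailbox_msg_from_u32(struct mailbox_msg *msg, uint32_t value,
			  unsigned char buf[4])
{
	memcpy(buf, &value, sizeof(value));
	msg->size = (int)sizeof(value);
	msg->header = 0;
	msg->pdata = buf;
}

/* Returns 0 on success, -1 if the payload is not exactly 4 bytes. */
int mailbox_msg_to_u32(const struct mailbox_msg *msg, uint32_t *value)
{
	if (!msg->pdata || msg->size != (int)sizeof(*value))
		return -1;
	memcpy(value, msg->pdata, sizeof(*value));
	return 0;
}
```

A user with larger messages would fill size and pdata differently; the 4-byte user only pays for one extra copy and a length check.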

Cheers,

Omar

[PATCH] staging: tidspbridge: dynload: dload_internal.h: fix for coding style issue

2012-11-06 Thread Kumar Amit Mehta

Fix a few errors reported by checkpatch.pl.

Signed-off-by: Kumar Amit Mehta 
---
 .../staging/tidspbridge/dynload/dload_internal.h   |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/tidspbridge/dynload/dload_internal.h b/drivers/staging/tidspbridge/dynload/dload_internal.h
index 7b77573..b9d079b 100644
--- a/drivers/staging/tidspbridge/dynload/dload_internal.h
+++ b/drivers/staging/tidspbridge/dynload/dload_internal.h
@@ -313,14 +313,14 @@ extern uint32_t dload_reverse_checksum16(void *data, unsigned siz);
 /*
  * exported by reloc.c
  */
-extern void dload_relocate(struct dload_state *dlthis, tgt_au_t * data,
-  struct reloc_record_t *rp, bool * tramps_generated,
+extern void dload_relocate(struct dload_state *dlthis, tgt_au_t *data,
+  struct reloc_record_t *rp, bool *tramps_generated,
   bool second_pass);
 
-extern rvalue dload_unpack(struct dload_state *dlthis, tgt_au_t * data,
+extern rvalue dload_unpack(struct dload_state *dlthis, tgt_au_t *data,
   int fieldsz, int offset, unsigned sgn);
 
-extern int dload_repack(struct dload_state *dlthis, rvalue val, tgt_au_t * data,
+extern int dload_repack(struct dload_state *dlthis, rvalue val, tgt_au_t *data,
int fieldsz, int offset, unsigned sgn);
 
 /*
-- 
1.7.9.5


RE: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.

2012-11-06 Thread Zhang Rui

On Thu, 2012-11-01 at 23:13 -0600, R, Durgadoss wrote:
> Hi Lee,
> 
> > -Original Message-
> > From: Jonghwa Lee [mailto:jonghwa3@samsung.com]
> > Sent: Friday, November 02, 2012 7:55 AM
> > To: linux...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org; Brown, Len; R, Durgadoss; Rafael J.
> > Wysocki; Amit Dinel Kachhap; MyungJoo Ham; Kyungmin Park; Jonghwa Lee
> > Subject: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's
> > emulation mode.
> > 
> > This patch supports exynos's emulation mode with newly created sysfs node.
> > Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
> > management unit. Thermal emulation mode supports software debug for TMU's
> > operation. User can set temperature manually with software code and TMU
> > will read current temperature from user value not from sensor's value.
> > This patch also includes documentation placed under Documentation/thermal/.
> > 
> 
first of all, what would happen if overheat happens during emulation?

I just had a thought about if we can introduce this to the generic
thermal layer.
to do this, we only need to:
1) introduce tz->emulation
2) introduce thermal_get_temp()
  static int thermal_get_temp(tz) {
  if (tz->emulation)
  return tz->emulation;
  else
  return tz->ops->get_temp(tz);
  }
3) replace tz->ops->get_temp() with thermal_get_temp() in thermal layer
4) introduce /sys/class/thermal/thermal_zoneX/emulation
5) when setting /sys/class/thermal/thermal_zoneX/emulation,
   a) set tz->emulation
   b) invoke thermal_zone_device_update();
this is a pure software emulation solution but it would work on all
generic thermal layer users.
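
The five steps above can be sketched in user space as follows. This is only a mock under stated assumptions: struct thermal_zone, thermal_set_emulation, thermal_zone_update and the fake sensor are hypothetical stand-ins for the thermal layer's own types and for the sysfs store path, and 0 is treated as "emulation off", as in the pseudo-code.

```c
#include <stddef.h>

struct thermal_zone;

struct thermal_zone_ops {
	int (*get_temp)(struct thermal_zone *tz);
};

struct thermal_zone {
	const struct thermal_zone_ops *ops;
	int emulation;  /* step 1: 0 means off, otherwise the emulated value */
	int last_temp;  /* what the (mock) thermal layer last acted upon */
};

/* Step 2: one accessor that prefers the emulated value when it is set. */
int thermal_get_temp(struct thermal_zone *tz)
{
	if (tz->emulation)
		return tz->emulation;
	return tz->ops->get_temp(tz);
}

/* Step 3: the layer reads through the accessor, never through ops directly. */
void thermal_zone_update(struct thermal_zone *tz)
{
	tz->last_temp = thermal_get_temp(tz);
}

/* Steps 4 and 5: what a write to the emulation node would do. */
void thermal_set_emulation(struct thermal_zone *tz, int temp)
{
	tz->emulation = temp;
	thermal_zone_update(tz);
}

/* A fake sensor that always reports 45000 (millidegrees, by assumption). */
static int fake_sensor_get_temp(struct thermal_zone *tz)
{
	(void)tz;
	return 45000;
}

const struct thermal_zone_ops fake_sensor_ops = {
	.get_temp = fake_sensor_get_temp,
};
```

One consequence of using 0 as the "off" value is that a temperature of exactly 0 cannot be emulated; a separate enable flag would avoid that, at the cost of one more field.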

do you think this proposal would work properly?
if yes, I'd like to see if it is valuable for the other platform thermal
drivers.

thanks,
rui
> Thanks for fixing the comments. 
> Please CC linux-acpi, when you submit thermal patches, going forward.
> I am CCing Rui for now, for him to review/merge this patch.
> 
> Reviewed-by: Durgadoss R 
> 
> Thanks,
> Durga
> 
> > Signed-off-by: Jonghwa Lee 
> > ---
> > v4
> >  - Fix Typo.
> >  - Remove unnecessary codes.
> >  - Add comments about feature of exynos emulation operation to the
> > document.
> > 
> > v3
> >  - Remove unnecessary variables.
> >  - Do some code clean in exynos_tmu_emulation_store().
> >  - Make wrapping function of sysfs node creation function to use
> >#ifdefs in minimum.
> > 
> > v2
> >  exynos_thermal.c
> >  - Fix build error occurred by wrong emulation control register name.
> >  - Remove exynos5410 dependent codes.
> >  exynos_thermal_emulation
> >  - Align indentation.
> > 
> >  Documentation/thermal/exynos_thermal_emulation |   56
> > +++
> >  drivers/thermal/Kconfig|9 +++
> >  drivers/thermal/exynos_thermal.c   |   91
> > 
> >  3 files changed, 156 insertions(+), 0 deletions(-)
> >  create mode 100644 Documentation/thermal/exynos_thermal_emulation
> > 
> > diff --git a/Documentation/thermal/exynos_thermal_emulation
> > b/Documentation/thermal/exynos_thermal_emulation
> > new file mode 100644
> > index 000..a6ea06f
> > --- /dev/null
> > +++ b/Documentation/thermal/exynos_thermal_emulation
> > @@ -0,0 +1,56 @@
> > +EXYNOS EMULATION MODE
> > +
> > +
> > +Copyright (C) 2012 Samsung Electronics
> > +
> > +Written by Jonghwa Lee 
> > +
> > +Description
> > +---
> > +
> > +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit.
> > +Thermal emulation mode supports software debug for TMU's operation. User can set temperature
> > +manually with software code and TMU will read current temperature from user value not from
> > +sensor's value.
> > +
> > +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support available.
> > +When it's enabled, sysfs node will be created under
> > +/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'.
> > +
> > +The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any
> > +temperature you want to update to sysfs node, it automatically enable emulation mode and
> > +current temperature will be changed into it.
> > +(Exynos also supports user changeable delay time which would be used to delay of
> > + changing temperature. However, this node only uses same delay of real sensing time, 938us.)
> > +
> > +Exynos emulation mode requires synchronous of value changing and enabling. It means when you
> > +want to update the any value of delay or next temperature, then you have to enable emulation
> > +mode at the same time. (Or you have to keep the mode enabling.) If you don't, it fails to
> > +change the value to updated one and just use last successful value repeatedly. That's why
> > +this node gives users the right to change temperature only. Just one
> >

Re: VFS hot tracking: How to calculate data temperature?

2012-11-06 Thread Zheng Liu

On Tue, Nov 06, 2012 at 05:00:19PM +0800, Zhi Yong Wu wrote:
> On Tue, Nov 6, 2012 at 4:39 PM, Zheng Liu  wrote:
> > On Mon, Nov 05, 2012 at 10:29:39AM +0800, Zhi Yong Wu wrote:
> >> On Fri, Nov 2, 2012 at 4:41 PM, Zheng Liu  wrote:
> >> > On Fri, Nov 02, 2012 at 02:38:29PM +0800, Zhi Yong Wu wrote:
> >> >> Here also has another question.
> >> >>
> >> >> How to save the file temperature among the umount to be able to
> >> >> preserve the file temperature after reboot?
> >> >>
> >> >> This above is the requirement from DB product.
> >> >> I thought that we can save file temperature in its inode struct, that
> >> >> is, add one new field in struct inode, then this info will be written
> >> >> to disk with inode.
> >> >>
> >> >> Any comments or ideas are appreciated, thanks.
> >> >
> >> > Hi Zhiyong,
> >> >
> >> > I think that we might define a callback function.  If a filesystem wants
> >> > to save these data, it can implement a function to save them.  The
> >> > filesystem can decide whether adding it or not by themselves.
> >> Great idea,  temperature saving function is maybe very specific to FS.
> >> But i am wondering if we can find one generic way to save temperature
> >> info at first.
> >
> > I don't think a generic way is better because it cannot support a
> > variety of filesystems.  So maybe you must answer this question firstly:
> > how many filesystems do you want to save this info? such as ext4, xfs,
> > btrfs, etc.  Then we can try to find a generic way.  If only these three
> > filesystems you want to support, maybe saving in xattr is an optional
> > way.
> yes, xattr is one good choice from current discussion result. Maybe we
> can provide one generic way, and one callback registering
> infrastructure, if FS register its own saving callback, this callback
> function will be used, otherwise the generic way will be applied.
> 
> By the way, as what Dave mentioned, the patchset v4+ review has
> highest priority, then the way how to calc data temperature, and the
> lowest priority is the way how to save data temperature info.

Great!  Thanks for sharing the news with me.  IMHO the highest priority
is to know how much overhead this patch set costs once it is applied.
My view is that there is no overhead when it is disabled, and that it
adds only a little overhead when it is enabled.
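
The "generic way plus optional per-filesystem callback" idea quoted above could be sketched like this. Everything here is hypothetical: hot_fs_ops, hot_register_fs_ops, hot_save_temp and the in-memory array standing in for a generic store (such as an xattr) are names invented for the example, not part of the posted patch set.

```c
#include <stddef.h>

/* Hypothetical per-filesystem hook. */
struct hot_fs_ops {
	int (*save_temp)(unsigned long ino, unsigned int temp);
};

static const struct hot_fs_ops *registered_ops;

/* In-memory stand-in for a generic persistent store. */
#define GENERIC_SLOTS 16
static unsigned int generic_store[GENERIC_SLOTS];

/* A filesystem that wants its own format registers a callback. */
void hot_register_fs_ops(const struct hot_fs_ops *ops)
{
	registered_ops = ops;
}

static int generic_save_temp(unsigned long ino, unsigned int temp)
{
	generic_store[ino % GENERIC_SLOTS] = temp;
	return 0;
}

unsigned int hot_generic_lookup(unsigned long ino)
{
	return generic_store[ino % GENERIC_SLOTS];
}

/*
 * Returns 1 if the filesystem callback handled the save, 0 if the
 * generic path was used, and -1 on error.
 */
int hot_save_temp(unsigned long ino, unsigned int temp)
{
	if (registered_ops && registered_ops->save_temp)
		return registered_ops->save_temp(ino, temp) < 0 ? -1 : 1;
	return generic_save_temp(ino, temp) < 0 ? -1 : 0;
}

/* Example filesystem callback that keeps the value in its own variable. */
static unsigned int example_fs_private_temp;

static int example_fs_save(unsigned long ino, unsigned int temp)
{
	(void)ino;
	example_fs_private_temp = temp;
	return 0;
}

const struct hot_fs_ops example_fs_ops = {
	.save_temp = example_fs_save,
};
```

When no callback is registered, the only cost on the save path is one pointer test, which is in line with the concern about overhead raised here.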

Regards,
Zheng

Re: [PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access

2012-11-06 Thread David Airlie



- Original Message -
> From: "Huang Ying"
> To: "David Airlie"
> Cc: linux-...@vger.kernel.org, linux-kernel@vger.kernel.org, "Bjorn Helgaas", "Rafael J. Wysocki"
> Sent: Wednesday, 7 November, 2012 4:26:25 PM
> Subject: Re: [PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access
> 
> On Wed, 2012-11-07 at 01:15 -0500, David Airlie wrote:
> > > > 
> > > > Cc: Huang Ying 
> > > > Cc: Bjorn Helgaas 
> > > > Cc: Rafael J. Wysocki 
> > > > Signed-off-by: Dave Airlie 
> > > > ---
> > > >  drivers/pci/pci-sysfs.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > > index 02d107b..12d3d52 100644
> > > > --- a/drivers/pci/pci-sysfs.c
> > > > +++ b/drivers/pci/pci-sysfs.c
> > > > @@ -487,7 +487,7 @@ pci_config_pm_runtime_put(struct pci_dev *pdev)
> > > > struct device *dev = &pdev->dev;
> > > > struct device *parent = dev->parent;
> > > >  
> > > > -   pm_runtime_put(dev);
> > > > +   pm_runtime_put_autosuspend(dev);
> > > > if (parent)
> > > > pm_runtime_put_sync(parent);
> > > >  }
> > > 
> > > I think you do not need that.  You can implement timeout
> > > in .runtime_idle callback of the driver.
> > 
> > If I understand what you are suggesting, I should setup some kinda
> > of timer callback to later call suspend, but that seems pointless
> > for me if we have the autosuspend mechanism in place.
> > 
> > Won't I end up racing my timer against other pm stuff? I'm not
> > really runtime pm expert so maybe I'm just missing something.
> 
> You can call pm_runtime_autosuspend or pm_runtime_schedule_suspend
> in .runtime_idle callback of the driver.

Ah that explains what I was probably missing, I'll go play with that for a 
while then!

Thanks,
Dave.

Re: tty, vt: lockdep warnings

2012-11-06 Thread Hugh Dickins

On Tue, 6 Nov 2012, Hugh Dickins wrote:
> 
> Ah, now I actually scan through it, I see references to blank screen:
> I'll try taking off your patch and seeing if it came up at screen
> blanking time, then put your patch back on and try again.
> I'll report back in an hour or two.

Yes, that was it.  When the console screen blanked on 3.7.0-rc3-mm1,
it generated that lockdep splat, visible once I unblanked.  But once I
applied your patch to the kernel, lockdep kept quiet across blank/unblank.

Thanks!
Hugh

Re: [PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access

2012-11-06 Thread Huang Ying

On Wed, 2012-11-07 at 01:15 -0500, David Airlie wrote:
> > > 
> > > Cc: Huang Ying 
> > > Cc: Bjorn Helgaas 
> > > Cc: Rafael J. Wysocki 
> > > Signed-off-by: Dave Airlie 
> > > ---
> > >  drivers/pci/pci-sysfs.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index 02d107b..12d3d52 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -487,7 +487,7 @@ pci_config_pm_runtime_put(struct pci_dev *pdev)
> > >   struct device *dev = &pdev->dev;
> > >   struct device *parent = dev->parent;
> > >  
> > > - pm_runtime_put(dev);
> > > + pm_runtime_put_autosuspend(dev);
> > >   if (parent)
> > >   pm_runtime_put_sync(parent);
> > >  }
> > 
> > I think you do not need that.  You can implement timeout
> > in .runtime_idle callback of the driver.
> 
> If I understand what you are suggesting, I should set up some kind of timer
> callback to later call suspend, but that seems pointless to me if we have
> the autosuspend mechanism in place.
> 
> Won't I end up racing my timer against other pm stuff? I'm not really a
> runtime pm expert, so maybe I'm just missing something.

You can call pm_runtime_autosuspend or pm_runtime_schedule_suspend
in .runtime_idle callback of the driver.

Best Regards,
Huang Ying



Re: [RESEND PATCH V3] binfmt_elf.c: use get_random_int() to fix entropy depleting

2012-11-06 Thread Kees Cook

On Tue, Nov 6, 2012 at 10:11 PM, Jeff Liu  wrote:
> Hello,
>
> This is the revised patch to fix entropy depletion.
>
> Changes:
> 
> v3->v2:
> - Tweak code comments of randomize_stack_user().
> - Remove redundant bit mask and shift on the random variable.
>
> v2->v1:
> Fix the random copy to handle buffer lengths that are not 4-byte multiples.
>
> v2 can be found at:
> http://www.spinics.net/lists/linux-fsdevel/msg59418.html
> v1 can be found at:
> http://www.spinics.net/lists/linux-fsdevel/msg59128.html
>
> Many thanks to Andreas, Andrew, and Kees for reviewing the earlier
> versions of the patch!
> -Jeff
>
>
> Entropy is quickly depleted under normal operations like ls(1), cat(1),
> etc. from 2.6.30 to current mainline, for instance:
>
> $ cat /proc/sys/kernel/random/entropy_avail
> 3428
> $ cat /proc/sys/kernel/random/entropy_avail
> 2911
> $ cat /proc/sys/kernel/random/entropy_avail
> 2620
>
> We observed this problem has been occurring since 2.6.30 with
> fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by
> f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
>
> /*
>  * Generate 16 random bytes for userspace PRNG seeding.
>  */
> get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
>
> The patch introduces a wrapper around get_random_int() which has lower
> overhead than calling get_random_bytes() directly.
>
> With this patch applied:
> $ cat /proc/sys/kernel/random/entropy_avail
> 2731
> $ cat /proc/sys/kernel/random/entropy_avail
> 2802
> $ cat /proc/sys/kernel/random/entropy_avail
> 2878
>
> Analyzed by John Sobecki.
>
> Signed-off-by: Jie Liu 
> Cc: John Sobecki 
> Cc: Al Viro 
> Cc: Andreas Dilger 
> Cc: Alan Cox 
> Cc: Arnd Bergmann 
> Cc: James Morris 
> Cc: Ted Ts'o 
> Cc: Greg Kroah-Hartman 
> Cc: Kees Cook 
> Cc: Jakub Jelinek 
> Cc: Ulrich Drepper 
> Signed-off-by: Andrew Morton 
>
> ---
>  fs/binfmt_elf.c |   22 +-
>  1 files changed, 21 insertions(+), 1 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index fbd9f60..b6c59f6 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_binprm *bprm, 
> struct pt_regs *regs);
>  static int load_elf_library(struct file *);
>  static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
> int, int, unsigned long);
> +static void randomize_stack_user(unsigned char *buf, size_t nbytes);

I think it would be easier to just move the function ahead of its use
to avoid the predeclaration.

>
>  /*
>   * If we don't support core dumping, then supply a NULL so we
> @@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *bprm, struct 
> elfhdr *exec,
> /*
>  * Generate 16 random bytes for userspace PRNG seeding.
>  */
> -   get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
> +   randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
> u_rand_bytes = (elf_addr_t __user *)
>STACK_ALLOC(p, sizeof(k_rand_bytes));
> if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
> @@ -558,6 +559,25 @@ static unsigned long randomize_stack_top(unsigned long 
> stack_top)
>  #endif
>  }
>
> +/*
> + * Use get_random_int() to implement AT_RANDOM while avoiding depletion
> + * of the entropy pool.
> + */
> +static void randomize_stack_user(unsigned char *buf, size_t nbytes)

I think this name needs changing -- it has nothing to do with the
stack except that that's where it ends up in userspace. Perhaps
"get_atrandom_bytes"?

> +{
> +   unsigned char *p = buf;
> +
> +   while (nbytes) {
> +   unsigned int random_variable;
> +   size_t chunk = min(nbytes, sizeof(unsigned int));
> +
> +   random_variable = get_random_int();

I still want to hear at least from Ted about this change -- we would
be potentially increasing the predictability of these bytes...

> +   memcpy(p, &random_variable, chunk);
> +   p += chunk;
> +   nbytes -= chunk;
> +   }
> +}
> +
>  static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
>  {
> struct file *interpreter = NULL; /* to shut gcc up */
> --
> 1.7.4.1

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access

2012-11-06 Thread David Airlie


> > 
> > Cc: Huang Ying 
> > Cc: Bjorn Helgaas 
> > Cc: Rafael J. Wysocki 
> > Signed-off-by: Dave Airlie 
> > ---
> >  drivers/pci/pci-sysfs.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 02d107b..12d3d52 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -487,7 +487,7 @@ pci_config_pm_runtime_put(struct pci_dev *pdev)
> > > struct device *dev = &pdev->dev;
> > struct device *parent = dev->parent;
> >  
> > -   pm_runtime_put(dev);
> > +   pm_runtime_put_autosuspend(dev);
> > if (parent)
> > pm_runtime_put_sync(parent);
> >  }
> 
> I think you do not need that.  You can implement timeout
> in .runtime_idle callback of the driver.

If I understand what you are suggesting, I should set up some kind of timer
callback to later call suspend, but that seems pointless to me if we have the
autosuspend mechanism in place.

Won't I end up racing my timer against other pm stuff? I'm not really a
runtime pm expert, so maybe I'm just missing something.

Dave.

[RESEND PATCH V3] binfmt_elf.c: use get_random_int() to fix entropy depleting

2012-11-06 Thread Jeff Liu

Hello,

This is the revised patch to fix entropy depletion.

Changes:

v3->v2:
- Tweak code comments of randomize_stack_user().
- Remove redundant bit mask and shift on the random variable.

v2->v1:
Fix the random copy to handle buffer lengths that are not 4-byte multiples.

v2 can be found at:
http://www.spinics.net/lists/linux-fsdevel/msg59418.html
v1 can be found at:
http://www.spinics.net/lists/linux-fsdevel/msg59128.html

Many thanks to Andreas, Andrew, and Kees for reviewing the earlier versions
of the patch!
-Jeff


Entropy is quickly depleted under normal operations like ls(1), cat(1),
etc. from 2.6.30 to current mainline, for instance:

$ cat /proc/sys/kernel/random/entropy_avail
3428
$ cat /proc/sys/kernel/random/entropy_avail
2911
$ cat /proc/sys/kernel/random/entropy_avail
2620

We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by
f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").

/*
 * Generate 16 random bytes for userspace PRNG seeding.
 */
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));

The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.

With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878

Analyzed by John Sobecki.

Signed-off-by: Jie Liu 
Cc: John Sobecki 
Cc: Al Viro 
Cc: Andreas Dilger 
Cc: Alan Cox 
Cc: Arnd Bergmann 
Cc: James Morris 
Cc: Ted Ts'o 
Cc: Greg Kroah-Hartman 
Cc: Kees Cook 
Cc: Jakub Jelinek 
Cc: Ulrich Drepper 
Signed-off-by: Andrew Morton 

---
 fs/binfmt_elf.c |   22 +-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fbd9f60..b6c59f6 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct 
pt_regs *regs);
 static int load_elf_library(struct file *);
 static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
int, int, unsigned long);
+static void randomize_stack_user(unsigned char *buf, size_t nbytes);
 
 /*
  * If we don't support core dumping, then supply a NULL so we
@@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
/*
 * Generate 16 random bytes for userspace PRNG seeding.
 */
-   get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
+   randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
u_rand_bytes = (elf_addr_t __user *)
   STACK_ALLOC(p, sizeof(k_rand_bytes));
if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
@@ -558,6 +559,25 @@ static unsigned long randomize_stack_top(unsigned long 
stack_top)
 #endif
 }
 
+/*
+ * Use get_random_int() to implement AT_RANDOM while avoiding depletion
+ * of the entropy pool.
+ */
+static void randomize_stack_user(unsigned char *buf, size_t nbytes)
+{
+   unsigned char *p = buf;
+
+   while (nbytes) {
+   unsigned int random_variable;
+   size_t chunk = min(nbytes, sizeof(unsigned int));
+
+   random_variable = get_random_int();
+   memcpy(p, &random_variable, chunk);
+   p += chunk;
+   nbytes -= chunk;
+   }
+}
+
 static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 {
struct file *interpreter = NULL; /* to shut gcc up */
-- 
1.7.4.1
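
For readers following the chunking logic in randomize_stack_user() above, here
is a minimal userspace sketch of the same loop. It is not the kernel code:
fake_random_int() is a hypothetical, deterministic stand-in for
get_random_int(), and the state argument and the name fill_from_ints() are
invented for illustration. It only shows how the trailing partial chunk is
handled when nbytes is not a multiple of sizeof(unsigned int).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's get_random_int().
 * It is NOT random: it returns 0x01010101, 0x02020202, ... so that
 * the result is reproducible and independent of byte order.
 */
static unsigned int fake_random_int(unsigned int *state)
{
	*state += 1;
	return 0x01010101u * *state;
}

/*
 * Fill buf with nbytes bytes, one unsigned int at a time.
 * The last chunk is shortened when nbytes is not a multiple of
 * sizeof(unsigned int), so nothing is written past the buffer.
 */
static void fill_from_ints(unsigned char *buf, size_t nbytes,
			   unsigned int *state)
{
	unsigned char *p = buf;

	assert(buf != NULL || nbytes == 0);

	while (nbytes) {
		unsigned int v = fake_random_int(state);
		size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

		memcpy(p, &v, chunk);
		p += chunk;
		nbytes -= chunk;
	}
}
```

Assuming a 4-byte unsigned int, a 7-byte request is served by two generator
calls: a full 4-byte chunk followed by a 3-byte chunk.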

RE: [PATCH v4 08/24] block: Remove some unnecessary bi_vcnt usage

2012-11-06 Thread Reddy, Sreekanth

Hi,

This patch seems fine. Please consider this patch as Acked-by: "Sreekanth
Reddy"

Regards,
Sreekanth.

-----Original Message-----
From: linux-scsi-ow...@vger.kernel.org 
[mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Kent Overstreet
Sent: Tuesday, October 16, 2012 1:39 AM
To: linux-bca...@vger.kernel.org; linux-kernel@vger.kernel.org; 
dm-de...@redhat.com
Cc: Kent Overstreet; t...@kernel.org; ax...@kernel.dk; ne...@suse.de; 
vgo...@redhat.com; Moore, Eric; James E.J. Bottomley; linux-s...@vger.kernel.org
Subject: [PATCH v4 08/24] block: Remove some unnecessary bi_vcnt usage

More prep work for immutable bvecs/efficient bio splitting - usage of bi_vcnt
has to be audited, so getting rid of all the unnecessary usage makes that
easier.

Plus, bio_segments() is really what this code wanted, as it respects the 
current value of bi_idx.

Signed-off-by: Kent Overstreet 
CC: Jens Axboe 
CC: Eric Moore 
CC: "James E.J. Bottomley" 
CC: linux-s...@vger.kernel.org
---
 drivers/message/fusion/mptsas.c  |  6 +++---
 drivers/scsi/libsas/sas_expander.c   |  6 +++---
 drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c 
index 551262e..5406a9f 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, 
struct sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u 
%u\n",
-   ioc->name, __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-   rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+   ioc->name, __func__, bio_segments(req->bio), 
blk_rq_bytes(req),
+   bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/libsas/sas_expander.c 
b/drivers/scsi/libsas/sas_expander.c
index efc6e72..ee331a7 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2151,10 +2151,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct 
sas_rphy *rphy,
}
 
/* do we need to support multiple segments? */
-   if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
printk("%s: multiple segments req %u %u, rsp %u %u\n",
-  __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
-  rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
+  __func__, bio_segments(req->bio), blk_rq_bytes(req),
+  bio_segments(rsp->bio), blk_rq_bytes(rsp));
return -EINVAL;
}
 
diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c 
b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index c6cf20f..403a57b 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1939,7 +1939,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct 
sas_rphy *rphy,
ioc->transport_cmds.status = MPT2_CMD_PENDING;
 
/* Check if the request is split across multiple segments */
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
u32 offset = 0;
 
/* Allocate memory and copy the request */
@@ -1971,7 +1971,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 
/* Check if the response needs to be populated across
 * multiple segments */
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2038,7 +2038,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct 
sas_rphy *rphy,
sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (req->bio->bi_vcnt > 1) {
+   if (bio_segments(req->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(req) - 4), pci_dma_out);
} else {
@@ -2054,7 +2054,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct 
sas_rphy *rphy,
MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
MPI2_SGE_FLAGS_END_OF_LIST);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
-   if (rsp->bio->bi_vcnt > 1) {
+   if (bio_segments(rsp->bio) > 1) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(rsp) + 4), pci_dma_in);
} else {
@@
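
The commit message's remark that bio_segments() "respects the current value of
bi_idx" can be illustrated with a toy model. This is a sketch only: it assumes
that bio_segments() amounts to bi_vcnt minus bi_idx, and struct toy_bio and
toy_bio_segments() are hypothetical names that carry just the two fields under
discussion, not the real struct bio.

```c
#include <assert.h>

/* Toy model of the two struct bio fields discussed in the patch. */
struct toy_bio {
	unsigned short vcnt;	/* total number of vector entries */
	unsigned short idx;	/* index of the first entry still pending */
};

/*
 * Count only the entries that remain to be processed, which is the
 * quantity the "multiple segments" checks actually care about.
 */
static unsigned int toy_bio_segments(const struct toy_bio *bio)
{
	assert(bio->idx <= bio->vcnt);
	return (unsigned int)(bio->vcnt - bio->idx);
}
```

For a partially completed bio with vcnt = 4 and idx = 3, a check against the
raw count (4 > 1) would report multiple segments, while the remaining-segment
count is 1.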

Re: [PATCH] pinctrl: Staticize pinconf_ops

2012-11-06 Thread Dong Aisheng

On 7 November 2012 13:37, Axel Lin  wrote:
> They are not referenced outside respective driver.
>
> Signed-off-by: Axel Lin 
> Cc: Jean-Christophe PLAGNIOL-VILLARD 
> Cc: Simon Arlott 
> Cc: John Crispin 
> Cc: Dong Aisheng 
> Cc: Shawn Guo 
> Cc: Stephen Warren 
> ---
>  drivers/pinctrl/pinctrl-at91.c|2 +-
>  drivers/pinctrl/pinctrl-bcm2835.c |2 +-
>  drivers/pinctrl/pinctrl-falcon.c  |2 +-
>  drivers/pinctrl/pinctrl-imx.c |2 +-

For imx,
Acked-by: Dong Aisheng 

Regards
Dong Aisheng

>  drivers/pinctrl/pinctrl-mxs.c |2 +-
>  drivers/pinctrl/pinctrl-tegra.c   |2 +-
>  drivers/pinctrl/pinctrl-xway.c|2 +-
>  7 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
> index b9e2cbd..8490a55 100644
> --- a/drivers/pinctrl/pinctrl-at91.c
> +++ b/drivers/pinctrl/pinctrl-at91.c
> @@ -665,7 +665,7 @@ static void at91_pinconf_group_dbg_show(struct 
> pinctrl_dev *pctldev,
>  {
>  }
>
> -struct pinconf_ops at91_pinconf_ops = {
> +static struct pinconf_ops at91_pinconf_ops = {
> .pin_config_get = at91_pinconf_get,
> .pin_config_set = at91_pinconf_set,
> .pin_config_dbg_show= at91_pinconf_dbg_show,
> diff --git a/drivers/pinctrl/pinctrl-bcm2835.c 
> b/drivers/pinctrl/pinctrl-bcm2835.c
> index 7e9be18..9a963ed 100644
> --- a/drivers/pinctrl/pinctrl-bcm2835.c
> +++ b/drivers/pinctrl/pinctrl-bcm2835.c
> @@ -916,7 +916,7 @@ static int bcm2835_pinconf_set(struct pinctrl_dev 
> *pctldev,
> return 0;
>  }
>
> -struct pinconf_ops bcm2835_pinconf_ops = {
> +static struct pinconf_ops bcm2835_pinconf_ops = {
> .pin_config_get = bcm2835_pinconf_get,
> .pin_config_set = bcm2835_pinconf_set,
>  };
> diff --git a/drivers/pinctrl/pinctrl-falcon.c 
> b/drivers/pinctrl/pinctrl-falcon.c
> index ee73059..8ed20e8 100644
> --- a/drivers/pinctrl/pinctrl-falcon.c
> +++ b/drivers/pinctrl/pinctrl-falcon.c
> @@ -322,7 +322,7 @@ static void falcon_pinconf_group_dbg_show(struct 
> pinctrl_dev *pctrldev,
>  {
>  }
>
> -struct pinconf_ops falcon_pinconf_ops = {
> +static struct pinconf_ops falcon_pinconf_ops = {
> .pin_config_get = falcon_pinconf_get,
> .pin_config_set = falcon_pinconf_set,
> .pin_config_group_get   = falcon_pinconf_group_get,
> diff --git a/drivers/pinctrl/pinctrl-imx.c b/drivers/pinctrl/pinctrl-imx.c
> index 63866d9..f3d2384 100644
> --- a/drivers/pinctrl/pinctrl-imx.c
> +++ b/drivers/pinctrl/pinctrl-imx.c
> @@ -397,7 +397,7 @@ static void imx_pinconf_group_dbg_show(struct pinctrl_dev 
> *pctldev,
> }
>  }
>
> -struct pinconf_ops imx_pinconf_ops = {
> +static struct pinconf_ops imx_pinconf_ops = {
> .pin_config_get = imx_pinconf_get,
> .pin_config_set = imx_pinconf_set,
> .pin_config_dbg_show = imx_pinconf_dbg_show,
> diff --git a/drivers/pinctrl/pinctrl-mxs.c b/drivers/pinctrl/pinctrl-mxs.c
> index 4ba4636..3e7d4d6 100644
> --- a/drivers/pinctrl/pinctrl-mxs.c
> +++ b/drivers/pinctrl/pinctrl-mxs.c
> @@ -319,7 +319,7 @@ static void mxs_pinconf_group_dbg_show(struct pinctrl_dev 
> *pctldev,
> seq_printf(s, "0x%lx", config);
>  }
>
> -struct pinconf_ops mxs_pinconf_ops = {
> +static struct pinconf_ops mxs_pinconf_ops = {
> .pin_config_get = mxs_pinconf_get,
> .pin_config_set = mxs_pinconf_set,
> .pin_config_group_get = mxs_pinconf_group_get,
> diff --git a/drivers/pinctrl/pinctrl-tegra.c b/drivers/pinctrl/pinctrl-tegra.c
> index 7da0b37..f7fe91e 100644
> --- a/drivers/pinctrl/pinctrl-tegra.c
> +++ b/drivers/pinctrl/pinctrl-tegra.c
> @@ -660,7 +660,7 @@ static void tegra_pinconf_config_dbg_show(struct 
> pinctrl_dev *pctldev,
>  }
>  #endif
>
> -struct pinconf_ops tegra_pinconf_ops = {
> +static struct pinconf_ops tegra_pinconf_ops = {
> .pin_config_get = tegra_pinconf_get,
> .pin_config_set = tegra_pinconf_set,
> .pin_config_group_get = tegra_pinconf_group_get,
> diff --git a/drivers/pinctrl/pinctrl-xway.c b/drivers/pinctrl/pinctrl-xway.c
> index b9bcaec..ad90984 100644
> --- a/drivers/pinctrl/pinctrl-xway.c
> +++ b/drivers/pinctrl/pinctrl-xway.c
> @@ -522,7 +522,7 @@ static int xway_pinconf_set(struct pinctrl_dev *pctldev,
> return 0;
>  }
>
> -struct pinconf_ops xway_pinconf_ops = {
> +static struct pinconf_ops xway_pinconf_ops = {
> .pin_config_get = xway_pinconf_get,
> .pin_config_set = xway_pinconf_set,
>  };
> --
> 1.7.9.5
>
>
>

Re: [PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access

2012-11-06 Thread Huang Ying

On Wed, 2012-11-07 at 15:30 +1000, Dave Airlie wrote:
> So I've been adding runtime pm to nouveau/radeon, and on X start it does a
> lot of pci accesses. Now because the pm on these devices is equivalent
> to D3cold, we have to resume them which involves a heavy latency due to
> POSTing the cards. The driver configures the autosuspend timeout to 5s for
> this reason, and I think the PCI layer config accesses should respect
> the autosuspend.
> 
> Cc: Huang Ying 
> Cc: Bjorn Helgaas 
> Cc: Rafael J. Wysocki 
> Signed-off-by: Dave Airlie 
> ---
>  drivers/pci/pci-sysfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 02d107b..12d3d52 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -487,7 +487,7 @@ pci_config_pm_runtime_put(struct pci_dev *pdev)
>   struct device *dev = &pdev->dev;
>   struct device *parent = dev->parent;
>  
> - pm_runtime_put(dev);
> + pm_runtime_put_autosuspend(dev);
>   if (parent)
>   pm_runtime_put_sync(parent);
>  }

I think you do not need that.  You can implement timeout
in .runtime_idle callback of the driver.

Best Regards,
Huang Ying



Re: [PATCH V3] binfmt_elf.c: Introduce a wrapper of get_random_int() to fix entropy depleting

2012-11-06 Thread Jeff Liu

Please ignore this patch: I forgot to revise the comments of the new
wrapper according to Andrew's review. I will resend it in a little while.
Sorry for the noise!

-Jeff

On 11/07/2012 01:27 PM, Jeff Liu wrote:
> Hello,
> 
> We have observed entropy quickly depleting under normal I/O from 2.6.30 to
> upstream, for instance:
> 
> $ cat /proc/sys/kernel/random/entropy_avail
> 3428
> $ cat /proc/sys/kernel/random/entropy_avail
> 2911
> $ cat /proc/sys/kernel/random/entropy_avail
> 2620
> 
> This has been occurring since fs/binfmt_elf.c:
> create_elf_tables()->get_random_bytes() was introduced in 2.6.30.
> /*
>  * Generate 16 random bytes for userspace PRNG seeding.
>  */
> get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
> 
> This proposed patch introduces a wrapper around get_random_int(), which
> has lower overhead than calling get_random_bytes() directly.
> 
> Entropy increased with this patch applied:
> $ cat /proc/sys/kernel/random/entropy_avail
> 2731
> $ cat /proc/sys/kernel/random/entropy_avail
> 2802
> $ cat /proc/sys/kernel/random/entropy_avail
> 2878
> 
> 
> v3->v2:
> ---
> Remove redundant bit mask and shift on the random variable, according to
> Kees's review.
> 
> v2->v1:
> ---
> Fix the random copy to handle buffer lengths that are not 4-byte multiples,
> according to Andreas's comments.
> 
> v2 can be found at:
> http://www.spinics.net/lists/linux-fsdevel/msg59418.html
> v1 can be found at:
> http://www.spinics.net/lists/linux-fsdevel/msg59128.html
> 
> Many thanks to Andreas, Andrew, and Kees for reviewing the earlier
> versions of the patch.
> 
> -Jeff
> 
> 
> Signed-off-by: Jie Liu 
> Cc: John Sobecki 
> CC: Andrew Morton 
> Cc: Al Viro 
> Cc: Andreas Dilger 
> Cc: Alan Cox 
> Cc: Arnd Bergmann 
> Cc: James Morris 
> Cc: Ted Ts'o 
> Cc: Greg Kroah-Hartman 
> Cc: Kees Cook 
> Cc: Jakub Jelinek 
> Cc: Ulrich Drepper 
> 
> ---
>  fs/binfmt_elf.c |   24 +++-
>  1 files changed, 23 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index fbd9f60..9c36e50 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_binprm *bprm, 
> struct pt_regs *regs);
>  static int load_elf_library(struct file *);
>  static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
>   int, int, unsigned long);
> +static void randomize_stack_user(unsigned char *buf, size_t nbytes);
>  
>  /*
>   * If we don't support core dumping, then supply a NULL so we
> @@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *bprm, struct 
> elfhdr *exec,
>   /*
>* Generate 16 random bytes for userspace PRNG seeding.
>*/
> - get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
> + randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
>   u_rand_bytes = (elf_addr_t __user *)
>  STACK_ALLOC(p, sizeof(k_rand_bytes));
>   if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
> @@ -558,6 +559,27 @@ static unsigned long randomize_stack_top(unsigned long 
> stack_top)
>  #endif
>  }
>  
> +/*
> + * A wrapper of get_random_int() to generate random bytes which has lower
> + * overhead than calling get_random_bytes() directly.
> + * create_elf_tables() call this function to generate 16 random bytes for
> + * userspace PRNG seeding.
> + */
> +static void randomize_stack_user(unsigned char *buf, size_t nbytes)
> +{
> + unsigned char *p = buf;
> +
> + while (nbytes) {
> + unsigned int random_variable;
> + size_t chunk = min(nbytes, sizeof(unsigned int));
> +
> + random_variable = get_random_int();
> + memcpy(p, &random_variable, chunk);
> + p += chunk;
> + nbytes -= chunk;
> + }
> +}
> +
>  static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
>  {
>   struct file *interpreter = NULL; /* to shut gcc up */
> 


[PATCH v4] staging: ste_rmi4: Convert to Type-B support

2012-11-06 Thread Alexandra Chin

Convert to MT-B because Synaptics touch devices are capable of tracking
identifiable fingers.

Signed-off-by: Alexandra Chin 
---
Changes from v4:
- Incorporated Henrik's review comments
  *split function synpatics_rmi4_touchscreen_report
  *split function synaptics_rmi4_i2c_query_device

Changes from v3:
- Incorporated Henrik's review comments
  *remove 'else' after an error path return
  *add input_mt_sync_frame() for pointer emulation effects
  *correct names of touchscreen
- Replace printk with dev_err

Changes from v2:
- Incorporated Henrik's review comments
  *directly report finger state with Type-B
- Against 3.7-rcX
  *call input_mt_init_slots with INPUT_MT_DIRECT flag
---
 drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c |  375 ++---
 1 files changed, 215 insertions(+), 160 deletions(-)

diff --git a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c 
b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
index 277491a..ef3fd0c 100644
--- a/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
+++ b/drivers/staging/ste_rmi4/synaptics_i2c_rmi4.c
@@ -1,10 +1,11 @@
 /**
  *
- * Synaptics Register Mapped Interface (RMI4) I2C Physical Layer Driver.
- * Copyright (c) 2007-2010, Synaptics Incorporated
+ * Synaptics Register Mapped Interface (RMI4) I2C Touchscreen Driver.
+ * Copyright (c) 2007-2012, Synaptics Incorporated
  *
  * Author: Js HA  for ST-Ericsson
  * Author: Naveen Kumar G  for ST-Ericsson
+ * Author: Alexandra Chin 
  * Copyright 2010 (c) ST-Ericsson AB
  */
 /*
@@ -31,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "synaptics_i2c_rmi4.h"
 
 /* TODO: for multiple device support will need a per-device mutex */
@@ -63,12 +65,11 @@
 #define MASK_4BIT  0x0F
 #define MASK_3BIT  0x07
 #define MASK_2BIT  0x03
-#define TOUCHPAD_CTRL_INTR 0x8
+#define TOUCHSCREEN_CTRL_INTR  0x8
 #define PDT_START_SCAN_LOCATION (0x00E9)
 #define PDT_END_SCAN_LOCATION  (0x000A)
 #define PDT_ENTRY_SIZE (0x0006)
-#define RMI4_NUMBER_OF_MAX_FINGERS (8)
-#define SYNAPTICS_RMI4_TOUCHPAD_FUNC_NUM   (0x11)
+#define SYNAPTICS_RMI4_TOUCHSCREEN_FUNC_NUM(0x11)
 #define SYNAPTICS_RMI4_DEVICE_CONTROL_FUNC_NUM (0x01)
 
 /**
@@ -164,6 +165,7 @@ struct synaptics_rmi4_device_info {
  * @regulator: pointer to the regulator structure
  * @wait: wait queue structure variable
  * @touch_stopped: flag to stop the thread function
+ * @fingers_supported: maximum supported fingers
  *
  * This structure gives the device data information.
  */
@@ -184,6 +186,8 @@ struct synaptics_rmi4_data {
struct regulator*regulator;
wait_queue_head_t   wait;
booltouch_stopped;
+   unsigned char   fingers_supported;
+   int finger_status_register_count;
 };
 
 /**
@@ -291,34 +295,100 @@ exit:
 }
 
 /**
- * synpatics_rmi4_touchpad_report() - reports for the rmi4 touchpad device
+ * synpatics_rmi4_finger_report() - finger reports
  * @pdata: pointer to synaptics_rmi4_data structure
  * @rfi: pointer to synaptics_rmi4_fn structure
+ * @finger: finger index
+ * @values: pointer to buffer of status registers
  *
- * This function calls to reports for the rmi4 touchpad device
+ * This function calls to report multi-finger data to input subsystem
+ * and returns true if finger status is non zero
  */
-static int synpatics_rmi4_touchpad_report(struct synaptics_rmi4_data *pdata,
+static bool synpatics_rmi4_finger_report(struct synaptics_rmi4_data *pdata,
+   struct synaptics_rmi4_fn *rfi,
+   int finger,
+   unsigned char *values)
+{
+   int retval;
+   int x, y;
+   int wx, wy;
+   int reg;
+   int finger_shift;
+   int finger_status;
+   int finger_registers = pdata->finger_status_register_count;
+   unsigned char   data[DATA_LEN];
+   unsigned char   data_reg_blk_size = rfi->size_of_data_register_block;
+   unsigned short  data_offset;
+   unsigned short  data_base_addr = rfi->fn_desc.data_base_addr;
+   struct  i2c_client *client = pdata->i2c_client;
+   struct  input_dev *input_dev = pdata->input_dev;
+
+   /* determine which data byte the finger status is in */
+   reg = finger / 4;
+   /* bit shift to get finger's status */
+   finger_shift= (finger % 4) * 2;
+   finger_status   = (values[reg] >> finger_shift) & MASK_2BIT;
+
+   /*
+* if finger status indicates a finger is present then
+* read the finger data and report it
+*/
+   input_mt_slot(input_dev, finger);
+   input_mt_report_slot_state(input_dev, MT_TOOL_FINGER,
+   finger_status != 0);
+   if (finger_status) {
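
The finger-status lookup in the hunk above packs four fingers into each status
byte, two bits per finger. The following userspace sketch repeats that
arithmetic outside the driver so it can be checked by hand. The helper name
finger_status_of() is hypothetical, and the sample register values in the note
below are made up for illustration, not taken from any device.

```c
#include <assert.h>

#define MASK_2BIT 0x03

/*
 * Extract the 2-bit status of one finger from packed status registers:
 * four fingers per byte, finger 0 in the two least significant bits.
 * A nonzero result means the finger is reported as present.
 */
static int finger_status_of(const unsigned char *values, int finger)
{
	int reg = finger / 4;		/* which status byte */
	int shift = (finger % 4) * 2;	/* bit offset inside that byte */

	assert(values != 0 && finger >= 0);
	return (values[reg] >> shift) & MASK_2BIT;
}
```

With the made-up register contents { 0x41, 0x08 }, fingers 0 and 3 decode to
status 1, finger 5 decodes to status 2, and the others decode to 0.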
+

[PATCH] pinctrl: Staticize pinconf_ops

2012-11-06 Thread Axel Lin

They are not referenced outside respective driver.

Signed-off-by: Axel Lin 
Cc: Jean-Christophe PLAGNIOL-VILLARD 
Cc: Simon Arlott 
Cc: John Crispin 
Cc: Dong Aisheng 
Cc: Shawn Guo 
Cc: Stephen Warren 
---
 drivers/pinctrl/pinctrl-at91.c|2 +-
 drivers/pinctrl/pinctrl-bcm2835.c |2 +-
 drivers/pinctrl/pinctrl-falcon.c  |2 +-
 drivers/pinctrl/pinctrl-imx.c |2 +-
 drivers/pinctrl/pinctrl-mxs.c |2 +-
 drivers/pinctrl/pinctrl-tegra.c   |2 +-
 drivers/pinctrl/pinctrl-xway.c|2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
index b9e2cbd..8490a55 100644
--- a/drivers/pinctrl/pinctrl-at91.c
+++ b/drivers/pinctrl/pinctrl-at91.c
@@ -665,7 +665,7 @@ static void at91_pinconf_group_dbg_show(struct pinctrl_dev 
*pctldev,
 {
 }
 
-struct pinconf_ops at91_pinconf_ops = {
+static struct pinconf_ops at91_pinconf_ops = {
.pin_config_get = at91_pinconf_get,
.pin_config_set = at91_pinconf_set,
.pin_config_dbg_show= at91_pinconf_dbg_show,
diff --git a/drivers/pinctrl/pinctrl-bcm2835.c 
b/drivers/pinctrl/pinctrl-bcm2835.c
index 7e9be18..9a963ed 100644
--- a/drivers/pinctrl/pinctrl-bcm2835.c
+++ b/drivers/pinctrl/pinctrl-bcm2835.c
@@ -916,7 +916,7 @@ static int bcm2835_pinconf_set(struct pinctrl_dev *pctldev,
return 0;
 }
 
-struct pinconf_ops bcm2835_pinconf_ops = {
+static struct pinconf_ops bcm2835_pinconf_ops = {
.pin_config_get = bcm2835_pinconf_get,
.pin_config_set = bcm2835_pinconf_set,
 };
diff --git a/drivers/pinctrl/pinctrl-falcon.c b/drivers/pinctrl/pinctrl-falcon.c
index ee73059..8ed20e8 100644
--- a/drivers/pinctrl/pinctrl-falcon.c
+++ b/drivers/pinctrl/pinctrl-falcon.c
@@ -322,7 +322,7 @@ static void falcon_pinconf_group_dbg_show(struct 
pinctrl_dev *pctrldev,
 {
 }
 
-struct pinconf_ops falcon_pinconf_ops = {
+static struct pinconf_ops falcon_pinconf_ops = {
.pin_config_get = falcon_pinconf_get,
.pin_config_set = falcon_pinconf_set,
.pin_config_group_get   = falcon_pinconf_group_get,
diff --git a/drivers/pinctrl/pinctrl-imx.c b/drivers/pinctrl/pinctrl-imx.c
index 63866d9..f3d2384 100644
--- a/drivers/pinctrl/pinctrl-imx.c
+++ b/drivers/pinctrl/pinctrl-imx.c
@@ -397,7 +397,7 @@ static void imx_pinconf_group_dbg_show(struct pinctrl_dev 
*pctldev,
}
 }
 
-struct pinconf_ops imx_pinconf_ops = {
+static struct pinconf_ops imx_pinconf_ops = {
.pin_config_get = imx_pinconf_get,
.pin_config_set = imx_pinconf_set,
.pin_config_dbg_show = imx_pinconf_dbg_show,
diff --git a/drivers/pinctrl/pinctrl-mxs.c b/drivers/pinctrl/pinctrl-mxs.c
index 4ba4636..3e7d4d6 100644
--- a/drivers/pinctrl/pinctrl-mxs.c
+++ b/drivers/pinctrl/pinctrl-mxs.c
@@ -319,7 +319,7 @@ static void mxs_pinconf_group_dbg_show(struct pinctrl_dev *pctldev,
seq_printf(s, "0x%lx", config);
 }
 
-struct pinconf_ops mxs_pinconf_ops = {
+static struct pinconf_ops mxs_pinconf_ops = {
.pin_config_get = mxs_pinconf_get,
.pin_config_set = mxs_pinconf_set,
.pin_config_group_get = mxs_pinconf_group_get,
diff --git a/drivers/pinctrl/pinctrl-tegra.c b/drivers/pinctrl/pinctrl-tegra.c
index 7da0b37..f7fe91e 100644
--- a/drivers/pinctrl/pinctrl-tegra.c
+++ b/drivers/pinctrl/pinctrl-tegra.c
@@ -660,7 +660,7 @@ static void tegra_pinconf_config_dbg_show(struct pinctrl_dev *pctldev,
 }
 #endif
 
-struct pinconf_ops tegra_pinconf_ops = {
+static struct pinconf_ops tegra_pinconf_ops = {
.pin_config_get = tegra_pinconf_get,
.pin_config_set = tegra_pinconf_set,
.pin_config_group_get = tegra_pinconf_group_get,
diff --git a/drivers/pinctrl/pinctrl-xway.c b/drivers/pinctrl/pinctrl-xway.c
index b9bcaec..ad90984 100644
--- a/drivers/pinctrl/pinctrl-xway.c
+++ b/drivers/pinctrl/pinctrl-xway.c
@@ -522,7 +522,7 @@ static int xway_pinconf_set(struct pinctrl_dev *pctldev,
return 0;
 }
 
-struct pinconf_ops xway_pinconf_ops = {
+static struct pinconf_ops xway_pinconf_ops = {
.pin_config_get = xway_pinconf_get,
.pin_config_set = xway_pinconf_set,
 };
-- 
1.7.9.5



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] pci/runtime-pm: respect devices autosuspend timeout on config access

2012-11-06 Thread Dave Airlie

So I've been adding runtime pm to nouveau/radeon, and on X start it does a
lot of pci accesses. Now because the pm on these devices is equivalent
to D3cold, we have to resume them which involves a heavy latency due to
POSTing the cards. The driver configures the autosuspend timeout to 5s for
this reason, and I think the PCI layer config accesses should respect
the autosuspend.

Cc: Huang Ying 
Cc: Bjorn Helgaas 
Cc: Rafael J. Wysocki 
Signed-off-by: Dave Airlie 
---
 drivers/pci/pci-sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 02d107b..12d3d52 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -487,7 +487,7 @@ pci_config_pm_runtime_put(struct pci_dev *pdev)
struct device *dev = &pdev->dev;
struct device *parent = dev->parent;

-   pm_runtime_put(dev);
+   pm_runtime_put_autosuspend(dev);
if (parent)
pm_runtime_put_sync(parent);
 }
-- 
1.7.12.1
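
To illustrate the latency argument in the message above, here is a toy userspace model. It is not the kernel's runtime PM implementation; every name in it (demo_dev, demo_get, demo_put_immediate, demo_put_deferred, demo_timer_check) is hypothetical. It only contrasts a "put" that suspends as soon as the device is unused with one that leaves suspension to a delay check, using the 5 s delay mentioned in the message.

```c
#include <stdbool.h>

/* Toy device state; all fields are illustrative assumptions. */
struct demo_dev {
    int usage_count;            /* active users of the device */
    bool suspended;             /* true once the device is powered down */
    int resume_count;           /* number of (costly) resumes performed */
    long last_busy_ms;          /* time of the most recent access */
    long autosuspend_delay_ms;  /* idle time required before suspending */
};

/* Take a reference, resuming the device first if it was suspended. */
static void demo_get(struct demo_dev *d, long now_ms)
{
    d->usage_count++;
    if (d->suspended) {
        d->suspended = false;
        d->resume_count++;
    }
    d->last_busy_ms = now_ms;
}

/* Drop a reference and suspend as soon as nobody uses the device. */
static void demo_put_immediate(struct demo_dev *d)
{
    if (--d->usage_count == 0)
        d->suspended = true;
}

/* Drop a reference but leave suspension to demo_timer_check(). */
static void demo_put_deferred(struct demo_dev *d)
{
    d->usage_count--;
}

/* Suspend only after the device has been unused for the whole delay. */
static void demo_timer_check(struct demo_dev *d, long now_ms)
{
    if (d->usage_count == 0 && !d->suspended &&
        now_ms - d->last_busy_ms >= d->autosuspend_delay_ms)
        d->suspended = true;
}
```

In this model, a burst of closely spaced config accesses pays for a resume on nearly every access with the immediate variant, and for none with the deferred one until the delay expires.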


[PATCH V3] binfmt_elf.c: Introduce a wrapper of get_random_int() to fix entropy depleting

2012-11-06 Thread Jeff Liu

Hello,

We have observed entropy depleting quickly under normal I/O on kernels from
2.6.30 up to current upstream, for instance:

$ cat /proc/sys/kernel/random/entropy_avail 
3428
$ cat /proc/sys/kernel/random/entropy_avail 
2911
$ cat /proc/sys/kernel/random/entropy_avail
2620

This has been happening since the get_random_bytes() call in
create_elf_tables() (fs/binfmt_elf.c) was introduced in 2.6.30:
/*
 * Generate 16 random bytes for userspace PRNG seeding.
 */
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));

This patch proposes a wrapper around get_random_int(), which has lower
overhead than calling get_random_bytes() directly.

Entropy increased with this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878


v3->v2:
---
Remove the redundant bit mask and shift on the random variable, following
Kees's review.

v2->v1:
---
Fix the random copy to handle buffer lengths that are not multiples of
4 bytes, following Andreas's comments.

v2 can be found at:
http://www.spinics.net/lists/linux-fsdevel/msg59418.html
v1 can be found at:
http://www.spinics.net/lists/linux-fsdevel/msg59128.html

Many thanks to Andreas, Andrew and Kees for reviewing the earlier versions of this patch.

-Jeff


Signed-off-by: Jie Liu 
Cc: John Sobecki 
CC: Andrew Morton 
Cc: Al Viro 
Cc: Andreas Dilger 
Cc: Alan Cox 
Cc: Arnd Bergmann 
Cc: James Morris 
Cc: Ted Ts'o 
Cc: Greg Kroah-Hartman 
Cc: Kees Cook 
Cc: Jakub Jelinek 
Cc: Ulrich Drepper 

---
 fs/binfmt_elf.c |   24 +++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fbd9f60..9c36e50 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs);
 static int load_elf_library(struct file *);
 static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr *,
int, int, unsigned long);
+static void randomize_stack_user(unsigned char *buf, size_t nbytes);
 
 /*
  * If we don't support core dumping, then supply a NULL so we
@@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
/*
 * Generate 16 random bytes for userspace PRNG seeding.
 */
-   get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
+   randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
u_rand_bytes = (elf_addr_t __user *)
   STACK_ALLOC(p, sizeof(k_rand_bytes));
if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
@@ -558,6 +559,27 @@ static unsigned long randomize_stack_top(unsigned long stack_top)
 #endif
 }
 
+/*
+ * A wrapper of get_random_int() to generate random bytes which has lower
+ * overhead than calling get_random_bytes() directly.
+ * create_elf_tables() calls this function to generate 16 random bytes for
+ * userspace PRNG seeding.
+ */
+static void randomize_stack_user(unsigned char *buf, size_t nbytes)
+{
+   unsigned char *p = buf;
+
+   while (nbytes) {
+   unsigned int random_variable;
+   size_t chunk = min(nbytes, sizeof(unsigned int));
+
+   random_variable = get_random_int();
+   memcpy(p, &random_variable, chunk);
+   p += chunk;
+   nbytes -= chunk;
+   }
+}
+
 static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 {
struct file *interpreter = NULL; /* to shut gcc up */
-- 
1.7.4.1
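
The chunked copy in the patch above can be exercised in userspace with a minimal sketch. demo_get_random_int() is a hypothetical, deterministic stand-in for the kernel's get_random_int() (it is a plain counter and not random at all), and demo_fill_bytes() mirrors the loop structure of randomize_stack_user() under that assumption. The point it shows is that the last chunk is truncated when the length is not a multiple of sizeof(unsigned int).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for get_random_int(): a fixed-step counter so the
 * result is reproducible. It provides no randomness whatsoever. */
static unsigned int demo_counter;

static unsigned int demo_get_random_int(void)
{
    demo_counter += 0x01010101u;
    return demo_counter;
}

/* Fill buf with nbytes taken from successive demo_get_random_int() calls.
 * The final chunk may be shorter than sizeof(unsigned int). */
static void demo_fill_bytes(unsigned char *buf, size_t nbytes)
{
    unsigned char *p = buf;

    while (nbytes) {
        unsigned int v = demo_get_random_int();
        size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

        memcpy(p, &v, chunk);
        p += chunk;
        nbytes -= chunk;
    }
}
```

Asking for 6 bytes with a 4-byte unsigned int consumes two generator values and copies only the first two bytes of the second, leaving the rest of the destination untouched.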

linux-next: Tree for Nov 7

2012-11-06 Thread Stephen Rothwell

Hi all,

/me resists commenting on recent political events

Changes since 20121106:

The pci tree still has its build failure for which I applied a merge fix patch.

The v4l-dvb tree lost its build failure but gained another so I used the
version from next-20121026.

The pinctrl tree gained a build failure for which I applied a patch.

The arm-soc tree gained a conflict against the l2-mtd and pinctrl trees.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 209 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (3d70f8c Linux 3.7-rc4)
Merging fixes/master (12250d8 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.)
Merging arm-current/fixes (6404f0b ARM: 7569/1: mm: uninitialized warning corrections)
Merging m68k-current/for-linus (8a745ee m68k: Wire up kcmp)
Merging powerpc-merge/merge (8c23f40 Merge git://git.kernel.org/pub/scm/virt/kvm/kvm)
Merging sparc/master (f7e8d9f qlogicpti: Fix build warning.)
Merging net/master (cacb6ba net: inet_diag -- Return error code if protocol handler is missed)
Merging sound-current/for-linus (ae24c31 ALSA: hda - Force to reset IEC958 status bits for AD codecs)
Merging pci-current/for-linus (ff8e59b PCI/portdrv: Don't create hotplug slots unless port supports hotplug)
Merging wireless/master (6fe7cc7 ath9k: Test for TID only in BlockAcks while checking tx status)
Merging driver-core.current/driver-core-linus (8f0d816 Linux 3.7-rc3)
Merging tty.current/tty-linus (8f0d816 Linux 3.7-rc3)
Merging usb.current/usb-linus (d99e65b USB: fix build with XEN and EARLY_PRINTK_DBGP enabled but USB_SUPPORT disabled)
Merging staging.current/staging-linus (8f0d816 Linux 3.7-rc3)
Merging char-misc.current/char-misc-linus (8f0d816 Linux 3.7-rc3)
Merging input-current/for-linus (32ed191 Input: tsc40 - remove wrong announcement of pressure support)
Merging md-current/for-linus (ed30be0 MD RAID10: Fix oops when creating RAID10 arrays via dm-raid.c)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args)
Merging spi-current/spi/merge (d1c185b of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.)
Merging gpio-current/gpio/merge (96b7064 gpio/tca6424: merge I2C transactions, remove cast)
Merging rr-fixes/fixes (f6a79af modules: don't break modules_install on external modules with no key.)
Merging asm-generic/master (9b04ebd asm-generic/io.h: remove asm/cacheflush.h include)
Merging arm/

Re: [PATCH 4/5] gpiolib: call pin removal in chip removal function

2012-11-06 Thread viresh kumar

On Tue, Nov 6, 2012 at 8:47 PM, Linus Walleij wrote:
> From: Linus Walleij 
>
> This makes us call gpiochip_remove_pin_ranges() in the
> gpiochip_remove() function, so we get rid of ranges when
> freeing the chip.
>
> Signed-off-by: Linus Walleij 

Reviewed-by: Viresh Kumar 

Re: [PATCH 3/5] gpiolib: remove duplicate pin range code

2012-11-06 Thread viresh kumar

On Tue, Nov 6, 2012 at 8:46 PM, Linus Walleij wrote:
> From: Linus Walleij 
>
> Commit 69e1601bca88809dc118abd1becb02c15a02ec71
> "gpiolib: provide provision to register pin ranges"
>
> Introduced both of_gpiochip_remove_pin_range() and
> gpiochip_remove_pin_ranges(). But the contents are exactly
> the same so remove the OF one and rely on the range deletion
> in the core.
>
> Signed-off-by: Linus Walleij 

I can't believe that I did this :(

Reviewed-by: Viresh Kumar 

Re: [PATCH 2/5] gpiolib-of: staticize the pin range calls

2012-11-06 Thread viresh kumar

On 6 November 2012 20:46, Linus Walleij  wrote:
diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c

 #ifdef CONFIG_PINCTRL
-void of_gpiochip_add_pin_range(struct gpio_chip *chip)
+static void of_gpiochip_add_pin_range(struct gpio_chip *chip)
 {
struct device_node *np = chip->of_node;
struct gpio_pin_range *pin_range;
@@ -254,7 +254,7 @@ void of_gpiochip_add_pin_range(struct gpio_chip *chip)
} while (index++);
 }

-void of_gpiochip_remove_pin_range(struct gpio_chip *chip)
+static void of_gpiochip_remove_pin_range(struct gpio_chip *chip)
 {
struct gpio_pin_range *pin_range, *tmp;

@@ -265,8 +265,8 @@ void of_gpiochip_remove_pin_range(struct gpio_chip *chip)
}
 }
 #else
-void of_gpiochip_add_pin_range(struct gpio_chip *chip) {}
-void of_gpiochip_remove_pin_range(struct gpio_chip *chip) {}
+static void of_gpiochip_add_pin_range(struct gpio_chip *chip) {}
+static void of_gpiochip_remove_pin_range(struct gpio_chip *chip) {}

Maybe static inline??

Reviewed-by: Viresh Kumar 

Re: Problem with DISCARD and RAID5

2012-11-06 Thread Shaohua Li

On Tue, Nov 06, 2012 at 09:06:16AM +0100, Jens Axboe wrote:
> On 2012-11-05 22:48, Dave Chinner wrote:
> > On Fri, Nov 02, 2012 at 09:40:58AM +0800, Shaohua Li wrote:
> >> On Thu, Nov 01, 2012 at 05:38:54PM +1100, NeilBrown wrote:
> >>>
> >>> Hi Shaohua,
> >>>  I've been doing some testing and discovered a problem with your discard
> >>>  support for RAID5.
> >>>
> >>>  The code in blkdev_issue_discard assumes that the 'granularity' is a power
> >>>  of 2, and for example subtracts 1 to get a mask.
> >>>
> >>>  However RAID5 sets the granularity to be the stripe size which often is not
> >>>  a power of two.  When this happens you can easily get into an infinite loop.
> >>>
> >>>  I suspect that to make this work properly, blkdev_issue_discard will need to
> >>>  be changed to allow 'granularity' to be an arbitrary value.
> >>>  When it is a power of two, the current masking can be used.
> >>>  When it is anything else, it will need to use sector_div().
> >>
> >> Yep, looks we need use sector_div. And this isn't the only problem. discard
> >> request can be merged, and the merge check only checks max_discard_sectors.
> >> That means the split requests in blkdev_issue_discard can be merged again. The
> >> split never works.
> >>
> >> I'm wondering what's purpose of discard_alignment and discard_granularity. Are
> >> there devices with discard_granularity not 1 sector?
> > 
> > Most certainly. Thin provisioned storage often has granularity in the
> > order of megabytes
> 
> Can't really do too much about that...
> 
> >> If bio isn't discard
> >> aligned, what device will do?
> > 
> > Up to the device.
> 
> We should not send those down, if they are violating the restrictions
> set by the driver.
> 
> >> Further, why driver handles alignment/granularity
> >> if device will ignore misaligned request.
> > 
> > When you send a series of sequential unaligned requests, the device
> > may ignore them all. Hence you end up with nothing being discarded,
> > even though the entire range being discarded is much, much larger
> > than the discard granularity
> 
> That's just tough luck, unfortunately. Shaohua, I'd suggest sending down
> whatever discards you can, IFF they are aligned according to the
> restrictions being set. If that ends up not discarding to devices that
> have large alignment/size constraints, nothing we can do about that.

So we have two problems here:

1. As Neil described, blkdev_issue_discard assumes alignment and granularity
are powers of 2. We can fix it with sector_div, for example.

2. Discard requests can be merged. The merge check currently ignores alignment
and granularity, so unaligned requests may be merged into an aligned one, or
one aligned request and one unaligned request may be merged into an unaligned
one. Should we just ignore unaligned requests, so that such merges cannot happen?
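
The first problem can be sketched in userspace as follows. This is not the blkdev_issue_discard() code; the demo_* helpers are hypothetical, and a plain % stands in for the kernel's sector_div(), which divides in place and returns the remainder. The sketch shows why masking with (granularity - 1) is only correct for powers of two, and a fallback that works for any non-zero granularity.

```c
#include <stdint.h>

/* Round down with a mask; only valid when granularity is a power of two. */
static uint64_t demo_round_down_mask(uint64_t sector, uint64_t granularity)
{
    return sector & ~(granularity - 1);
}

/* Round down with a remainder; valid for any non-zero granularity. */
static uint64_t demo_round_down_mod(uint64_t sector, uint64_t granularity)
{
    return sector - (sector % granularity);
}

static int demo_is_power_of_two(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Use the cheap mask when it is valid, the remainder otherwise. */
static uint64_t demo_round_down(uint64_t sector, uint64_t granularity)
{
    if (demo_is_power_of_two(granularity))
        return demo_round_down_mask(sector, granularity);
    return demo_round_down_mod(sector, granularity);
}
```

With a granularity of 384 sectors (an illustrative stripe size, not taken from the thread), the mask maps sector 1000 to 640, which is not a multiple of 384, while the remainder-based version gives 768.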

[PATCHv2] gpio-mcp23s08: Build I2C support even when CONFIG_I2C=m

2012-11-06 Thread Daniel M. Weeks

The driver has both SPI and I2C pieces. The appropriate pieces are built based
on whether SPI and/or I2C is/are enabled. However, it was only checking if I2C
was built-in, never if it was built as a module. This patch checks for either
since building both this driver and I2C as modules is possible.

Signed-off-by: Daniel M. Weeks 
---
v2: use IS_ENABLED macro

 drivers/gpio/gpio-mcp23s08.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-mcp23s08.c b/drivers/gpio/gpio-mcp23s08.c
index 0f42518..ce1c847 100644
--- a/drivers/gpio/gpio-mcp23s08.c
+++ b/drivers/gpio/gpio-mcp23s08.c
@@ -77,7 +77,7 @@ struct mcp23s08_driver_data {
 
 /*--*/
 
-#ifdef CONFIG_I2C
+#if IS_ENABLED(CONFIG_I2C)
 
 static int mcp23008_read(struct mcp23s08 *mcp, unsigned reg)
 {
@@ -399,7 +399,7 @@ static int mcp23s08_probe_one(struct mcp23s08 *mcp, struct device *dev,
break;
 #endif /* CONFIG_SPI_MASTER */
 
-#ifdef CONFIG_I2C
+#if IS_ENABLED(CONFIG_I2C)
case MCP_TYPE_008:
mcp->ops = &mcp23008_ops;
mcp->chip.ngpio = 8;
@@ -473,7 +473,7 @@ fail:
 
 /*--*/
 
-#ifdef CONFIG_I2C
+#if IS_ENABLED(CONFIG_I2C)
 
 static int __devinit mcp230xx_probe(struct i2c_client *client,
const struct i2c_device_id *id)
-- 
Daniel M. Weeks
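
The difference between the two checks can be reproduced outside the kernel with a simplified re-creation of the macro trick. The DEMO_* macros below are assumptions written for this sketch, modelled on the idea behind the kernel's IS_ENABLED() rather than copied from include/linux/kconfig.h. For an option set to "m", Kconfig defines only the _MODULE variant of the symbol, so a plain #ifdef on the base name does not see it.

```c
/* What the build system emits for CONFIG_I2C=m: only the _MODULE symbol. */
#define CONFIG_I2C_MODULE 1

/* Simplified re-creation of the "is this macro defined to 1" trick. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED_2(value) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##value)
#define DEMO_IS_DEFINED(cfg) DEMO_IS_DEFINED_2(cfg)

/* True when the option is built in (=y) or built as a module (=m). */
#define DEMO_IS_ENABLED(option) \
    (DEMO_IS_DEFINED(option) || DEMO_IS_DEFINED(option##_MODULE))

/* The old check: only sees the built-in case. */
static int demo_i2c_via_ifdef(void)
{
#ifdef CONFIG_I2C
    return 1;
#else
    return 0;
#endif
}

/* The new check: sees both the built-in and the modular case. */
static int demo_i2c_via_is_enabled(void)
{
#if DEMO_IS_ENABLED(CONFIG_I2C)
    return 1;
#else
    return 0;
#endif
}
```

With only CONFIG_I2C_MODULE defined, the #ifdef variant compiles the I2C code out while the DEMO_IS_ENABLED() variant keeps it, which is the behaviour change the patch is after.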


linux-next: build failure after merge of the final tree (pinctrl tree related)

2012-11-06 Thread Stephen Rothwell

Hi all,

After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

In file included from include/linux/gpio.h:48:0,
 from include/linux/of_gpio.h:20,
 from arch/powerpc/sysdev/ppc4xx_gpio.c:29:
include/asm-generic/gpio.h:74:10: error: 'struct gpio_chip' declared inside parameter list [-Werror]
include/asm-generic/gpio.h:74:10: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
include/asm-generic/gpio.h: In function 'gpiochip_add_pin_range':
include/asm-generic/gpio.h:76:1: error: no return statement in function returning non-void [-Werror=return-type]
include/asm-generic/gpio.h: At top level:
include/asm-generic/gpio.h:79:35: error: 'struct gpio_chip' declared inside parameter list [-Werror]

Caused by commit e8321df59155 ("gpiolib: iron out include ladder
mistakes") (and some earlier ones) from the pinctrl tree.

I added this patch for today (there may be a better return value from
gpiochip_add_pin_range):

From: Stephen Rothwell 
Date: Wed, 7 Nov 2012 15:42:44 +1100
Subject: [PATCH] gpiolib: fix non CONFIG_GPIOLIB functions

Signed-off-by: Stephen Rothwell 
---
 include/asm-generic/gpio.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h
index 54e02e6..f9acd78 100644
--- a/include/asm-generic/gpio.h
+++ b/include/asm-generic/gpio.h
@@ -69,10 +69,13 @@ void gpiochip_remove_pin_ranges(struct gpio_chip *chip);
 
 #else
 
+struct gpio_chip;
+
 static inline int
 gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name,
   unsigned int pin_base, unsigned int npins)
 {
+   return 0;
 }
 
 static inline void
-- 
1.7.10.280.gaa39

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
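
Both parts of the fix above can be seen in a standalone sketch. The demo_* names are hypothetical and only mirror the shape of the stubs in include/asm-generic/gpio.h. A forward declaration gives the struct tag file scope, so it is no longer first declared inside a parameter list, and the int-returning stub gets an explicit return value (0 here, as in the patch, which itself notes that a better value may exist).

```c
#include <stddef.h>

/* Forward declaration: the tag now has file scope, so using it in a
 * parameter list no longer declares a new, prototype-local type. */
struct demo_chip;

/* Stub used when the real implementation is configured out. */
static inline int
demo_add_pin_range(struct demo_chip *chip, const char *pinctl_name,
                   unsigned int pin_base, unsigned int npins)
{
    (void)chip;
    (void)pinctl_name;
    (void)pin_base;
    (void)npins;
    return 0;   /* a non-void function must return a value */
}

static inline void demo_remove_pin_ranges(struct demo_chip *chip)
{
    (void)chip;
}
```

Callers can pass a pointer to the incomplete type without ever seeing its definition, which is all the stubs need.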



Re: + binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch added to -mm tree

2012-11-06 Thread Jeff Liu

On 11/07/2012 12:29 PM, Kees Cook wrote:
> On Tue, Nov 6, 2012 at 8:21 PM, Jeff Liu  wrote:
>> Hi Andrew and Kees,
>>
>> Great thanks for both your comments!
>>
>> On 11/07/2012 09:11 AM, Kees Cook wrote:
>>> Hrm, I don't like this. get_random_int() specifically says: "Get a
>>> random word for internal kernel use only." The intent of AT_RANDOM is
>>> for userspace pRNG seeding (though glibc currently uses it directly
>>> for stack protector and pointer mangling), which is not "internal
>>> kernel use only". :) Though I suppose this is already being used for
>>> the randomize_stack_top(), but I think it'd still be better to use
>>> higher quality bits.
>> Btw Kees, does it sounds make sense if we just return the 16 bytes
>> uninitialized stack array if the user disable the stack randomize via
>> "/proc/sys/kernel/randomize_va_space = 0" or via the related sysctl, or
>> even specified norandmaps on boot?
> 
> No, I feel that ASLR (randomize_va_space) is distinctly separate from
> how glibc uses AT_RANDOM (stack protector and pointer mangling).
> AT_RANDOM should remain active even if randomize_va_space is 0.
OK, I was confused about the semantics of ASLR. Thanks for the
clarification; I will post another patch soon based on your feedback.

-Jeff


Re: [RFC PATCH 2/3] sched: power aware load balance,

2012-11-06 Thread Preeti Murthy

Hi Alex,

What I am concerned about in this patchset as Peter also
mentioned in the previous discussion of your approach
(https://lkml.org/lkml/2012/8/13/139)
is that:

1. Using nr_running of two different sched groups to decide which one
can be group_leader or group_min might not be the right approach,
as this might mislead us into thinking that a group running one task is less
loaded than a group running three tasks, even though the former task is
a CPU hog.

2. Comparing the number of CPUs with the number of tasks running in a sched
group to decide whether the group is underloaded or overloaded faces
the same issue. The tasks might be short-running and not use much CPU.

I also feel that before we introduce another side to the scheduler called
'power aware', why not try and see if the current scheduler itself can
perform better? We have an opportunity in PJT's patches, which
can help the scheduler make more realistic load-balancing decisions. Also,
since PJT's metric is a statistical one, I believe we could vary it to
allow the scheduler to do more rigorous or less rigorous power savings.

It is true, however, that this approach will not try to evacuate nearly idle
CPUs over to nearly full CPUs. That is definitely one of the benefits of your
patch in terms of power savings, but I believe your patch is not using
the right metric to decide that.

IMHO, the approach towards a power-aware scheduler should take the following steps:

1. Make use of PJT's per-entity-load tracking metric to allow the scheduler to make
more intelligent load-balancing decisions. Test the performance and power-saving
numbers.

2. If the above shows some characteristic change in behaviour over the earlier
scheduler, it should be either towards power saving or towards performance. If it
leans towards one of them, try varying the calculation of per-entity load to see
if it can lean towards the other behaviour. If it can, then there you go: you have a
knob to change between policies right there!

3. If you don't get enough power savings with the above approach, then add your
patchset to evacuate nearly idle groups towards nearly busy groups, but use PJT's
metric to make the decision.

What do you think?

Regards
Preeti U Murthy
On Tue, Nov 6, 2012 at 6:39 PM, Alex Shi  wrote:
> This patch enabled the power aware consideration in load balance.
>
> As mentioned in the power aware scheduler proposal, Power aware
> scheduling has 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, shrink tasks on less sched_groups will reduce power consumption
>
> The first assumption make performance policy take over scheduling when
> system busy.
> The second assumption make power aware scheduling try to move
> disperse tasks into fewer groups until that groups are full of tasks.
>
> This patch reuse lots of Suresh's power saving load balance code.
> Now the general enabling logical is:
> 1, Collect power aware scheduler statistics with performance load
> balance statistics collection.
> 2, if domain is eligible for power load balance do it and forget
> performance load balance, else do performance load balance.
>
> Has tried on my 2 sockets * 4 cores * HT NHM EP machine.
> and 2 sockets * 8 cores * HT SNB EP machine.
> In the following checking, when I is 2/4/8/16, all tasks are
> shrank to run on single core or single socket.
>
> $for ((i=0; i < I; i++)) ; do while true; do : ; done & done
>
> Checking the power consuming with a powermeter on the NHM EP.
>           powersaving     performance
> I = 2     148w            160w
> I = 4     175w            181w
> I = 8     207w            224w
> I = 16    324w            324w
>
> On a SNB laptop(4 cores *HT)
>           powersaving     performance
> I = 2     28w             35w
> I = 4     38w             52w
> I = 6     44w             54w
> I = 8     56w             56w
>
> On the SNB EP machine, when I = 16, power saved more than 100 Watts.
>
> Also tested the specjbb2005 with jrockit, kbuild, their peak performance
> has no clear change with powersaving policy on all machines. Just
> specjbb2005 with openjdk has about 2% drop on NHM EP machine with
> powersaving policy.
>
> This patch seems a bit long, but seems hard to split smaller.
>
> Signed-off-by: Alex Shi 
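
The first concern raised in the reply above (task count versus actual load) can be made concrete with a small sketch. It is purely illustrative: struct demo_group and both comparison helpers are hypothetical, and util_pct is a made-up utilisation figure standing in for a tracked load metric, not the per-entity load calculation from PJT's patches.

```c
/* Hypothetical summary of a sched group for this illustration only. */
struct demo_group {
    unsigned int nr_running;  /* number of runnable tasks */
    unsigned int util_pct;    /* assumed summed utilisation, % of one CPU */
};

/* Returns 1 if a looks less loaded than b when only tasks are counted. */
static int demo_less_loaded_by_count(const struct demo_group *a,
                                     const struct demo_group *b)
{
    return a->nr_running < b->nr_running;
}

/* Returns 1 if a looks less loaded than b by tracked utilisation. */
static int demo_less_loaded_by_util(const struct demo_group *a,
                                    const struct demo_group *b)
{
    return a->util_pct < b->util_pct;
}
```

For a group running one CPU hog (1 task, 100%) and a group running three short tasks (3 tasks, 15% in total), the two helpers disagree about which group is less loaded, which is the situation the reply warns about.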

Re: [PATCH v4 1/6] mm: teach mm by current context info to not do I/O during memory allocation

2012-11-06 Thread Ming Lei

On Wed, Nov 7, 2012 at 11:48 AM, Andrew Morton wrote:
>>
>> Firstly, the patch follows the policy in the system suspend/resume situation,
>> in which the __GFP_FS is cleared, and basically the problem is very similar
>> with that in system PM path.
>
> I suspect that code is wrong.  Or at least, suboptimal.
>
>> Secondly, inside shrink_page_list(), pageout() may be triggered on dirty anon
>> page if __GFP_FS is set.
>
> pageout() should be called if GFP_FS is set or if GFP_IO is set and the
> IO is against swap.
>
> And that's what we want to happen: we want to enter the fs to try to
> turn dirty pagecache into clean pagecache without doing IO.  If we in
> fact enter the device drivers when GFP_IO was not set then that's a bug
> which we should fix.

OK, I got it, and I'll not clear GFP_FS in -v5.

>
>> IMO, if performing I/O can be completely avoided when __GFP_FS is set, the
>> flag can be kept, otherwise it is better to clear it in the situation.
>
> yup.
>
>> >
>> > Also, you can probably put the unlikely() inside memalloc_noio() and
>> > avoid repeating it at all the callsites.
>> >
>> > And it might be neater to do:
>> >
>> > /*
>> >  * Nice comment goes here
>> >  */
>> > static inline gfp_t memalloc_noio_flags(gfp_t flags)
>> > {
>> > if (unlikely(current->flags & PF_MEMALLOC_NOIO))
>> > flags &= ~GFP_IOFS;
>> > return flags;
>> > }
>>
>> But without the check in callsites, some local variables will be written
>> twice, so it is better to not do it.
>
> I don't see why - we just modify the incoming gfp_t at the start of the
> function, then use it.
>
> It gets a bit tricky with those struct initialisations.  Things like
>
> struct foo bar {
> .a = a1,
> .b = b1,
> };
>
> should not be turned into
>
> struct foo bar {
> .a = a1,
> };
>
> bar.b = b1;
>
> and we don't want to do
>
> struct foo bar { };
>
> bar.a = a1;
> bar.b = b1;
>
> either, because these are indeed a double-write.  But we can do
>
> struct foo bar {
> .flags = (flags = memalloc_noio_flags(flags)),
> .b = b1,
> };
>
> which is a bit arcane but not too bad.  Have a think about it...

Got it; memalloc_noio_flags() looks neater, and I will take it in v5.

Thanks,
--
Ming Lei
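
The helper and the struct-initialiser idiom discussed in the quoted text can be tried in userspace with the sketch below. Everything prefixed demo_ or DEMO_ is an assumption: the flag values are arbitrary, demo_current_flags stands in for current->flags, and the unlikely() hint is omitted. The sketch clears both the IO and FS bits, as in the quoted snippet; which bits should actually be cleared was still being discussed in this thread.

```c
typedef unsigned int demo_gfp_t;

/* Arbitrary illustrative bit values, not the kernel's definitions. */
#define DEMO_GFP_WAIT          0x10u
#define DEMO_GFP_IO            0x40u
#define DEMO_GFP_FS            0x80u
#define DEMO_GFP_IOFS          (DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_PF_MEMALLOC_NOIO  0x1u

/* Stand-in for current->flags of the allocating task. */
static unsigned int demo_current_flags;

/* Strip the I/O-related bits when the task has asked for no I/O. */
static inline demo_gfp_t demo_memalloc_noio_flags(demo_gfp_t flags)
{
    if (demo_current_flags & DEMO_PF_MEMALLOC_NOIO)
        flags &= ~DEMO_GFP_IOFS;
    return flags;
}

struct demo_scan_control {
    demo_gfp_t gfp_mask;
    int order;
};

/* Filters the mask once, inside the initialiser, so neither the local
 * variable nor the struct member is written a second time afterwards. */
static demo_gfp_t demo_reclaim(demo_gfp_t gfp_mask, int order)
{
    struct demo_scan_control sc = {
        .gfp_mask = (gfp_mask = demo_memalloc_noio_flags(gfp_mask)),
        .order = order,
    };

    return sc.gfp_mask;
}
```

The assignment inside the initialiser keeps the local gfp_mask and sc.gfp_mask in sync without a separate statement after the declaration.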

Re: tty, vt: lockdep warnings

2012-11-06 Thread Hugh Dickins

On Tue, 6 Nov 2012, Alan Cox wrote:
> On Mon, 5 Nov 2012 12:34:44 -0800 (PST)
> Hugh Dickins  wrote:
> > On Mon, 5 Nov 2012, Alan Cox wrote:
> > > > The fbdev potential for deadlock may be years old, but the warning
> > > > (and consequent disabling of lockdep from that point on - making it
> > > > useless to everybody else in need of it) is new, and comes from the
> > > > commit below in linux-next.
> > > > 
> > > > I revert it in my own testing: if there is no quick fix to the
> > > > fbdev issue on the way, Daniel, please revert it from your tree.
> > > 
> > > If you revert it you swap it for a different deadlock - and one that
> > > happens more often I would expect. Not very useful.
> > 
> > But a deadlock we have lived with for years.  Without reverting,
> > we're prevented from discovering all the new deadlocks we're adding.
> 
> We lived with it locking boxes up on users but not knowing why. The root
> cause is loading two different framebuffers with one taking over from
> another - that should be an obscure corner case and one the fuzz testing
> can avoid.

I'm bemused, but at least I now understand why we disagreed on this.

You thought it was a lockdep splat I got in the course of fuzz testing,
or doing some other obscure test: no, I thought I got it in booting up
the laptop, so it was in the way of doing useful testing thereafter.

I'd swear that I saw it two or three times, on each boot of 3.7.0-rc3-mm1;
then lost patience and deleted all the console_lock_dep_map lines from
kernel/printk.c, after which no problem.

But /var/log/messages calls me a liar, shows only one instance, and that
10 minutes after booting: that splat appended below in case it tells you
anything new; but I've no idea what triggered it.  (The "W" taint comes
from my using a "numa=fake=2" boot option, which surprised smpboot.c to
find smt-siblings on different nodes: not related to the console, I hope).

>  
> > That would be ideal - thanks.
> 
> 
> I had a semi-informed poke at this and came up with a possible patch (not 
> very tested)

Many thanks for your effort.

> 
> commit f4fa6c739ecc367dbb98f5be1ff626d9b2750878
> Author: Alan Cox 
> Date:   Tue Nov 6 15:33:18 2012 +
> 
> fb: Rework locking to fix lock ordering on takeover
> 
> > Adjust the console layer to allow a take over call where the caller
> > already holds the locks. Make the fb layer lock in order.
> > 
> > This is partly a band aid, the fb layer is terminally confused about the
> > locking rules it uses for its notifiers it seems.
> 
> Signed-off-by: Alan Cox 

So I went to test this, but first tried to reproduce the original lockdep
splat that had irritated me so, and was utterly unsuccessful.  So although
I am now running happily with your patch applied, no ill effects observed,
this gives no confidence because I cannot reproduce the condition anyway.

Sorry to be so unhelpful, original splat without your patch below.

Ah, now I actually scan through it, I see references to blank screen:
I'll try taking off your patch and seeing if it came up at screen
blanking time, then put your patch back on and try again.
I'll report back in an hour or two.

Hugh

==
[ INFO: possible circular locking dependency detected ]
3.7.0-rc3-mm1 #2 Tainted: GW   
---
kworker/0:1/30 is trying to acquire lock:
ACPI: Invalid Power Resource to register!
 ((fb_notifier_list).rwsem){.+.+.+}, at: [] 
__blocking_notifier_call_chain+0x6b/0xa2

but task is already holding lock:
 (console_lock){+.+.+.}, at: [] console_callback+0xc/0xf7

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (console_lock){+.+.+.}:
   [] __lock_acquire+0x7fc/0x8bb
   [] lock_acquire+0x57/0x6d
   [] console_lock+0x67/0x69
   [] register_con_driver+0x36/0x128
   [] take_over_console+0x21/0x2b7
   [] fbcon_takeover+0x56/0x98
   [] fbcon_event_notify+0x3bb/0x6ee
   [] notifier_call_chain+0xa7/0xd4
   [] __blocking_notifier_call_chain+0x81/0xa2
   [] blocking_notifier_call_chain+0xf/0x11
   [] fb_notifier_call_chain+0x16/0x18
   [] register_framebuffer+0x20c/0x270
   [] drm_fb_helper_single_fb_probe+0x1ce/0x270
   [] drm_fb_helper_initial_config+0x1ca/0x1e1
   [] intel_fbdev_init+0x76/0x89
   [] i915_driver_load+0xb20/0xcf7
   [] drm_get_pci_dev+0x162/0x25b
   [] i915_pci_probe+0x60/0x69
   [] local_pci_probe+0x12/0x16
   [] pci_device_probe+0xbe/0xeb
   [] driver_probe_device+0x91/0x19e
   [] __driver_attach+0x5d/0x80
   [] bus_for_each_dev+0x52/0x84
   [] driver_attach+0x19/0x1b
   [] bus_add_driver+0xe7/0x20c
   [] driver_register+0x8e/0x114
   [] __pci_register_driver+0x5a/0x5f
   [] drm_pci_init+0x80/0xe5
   [] i915_init+0x66/0x68
   [] do_one_initcall+0x7a/0x131
   []
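
The inversion reported in the splat above can be modelled in userspace. The
sketch below is not kernel code and is not how lockdep is implemented; it only
records "lock B taken while holding lock A" edges between two hypothetical lock
classes (LOCK_FB_NOTIFIER and LOCK_CONSOLE are names made up for illustration)
and refuses an edge that would close a cycle, which is the condition the
warning describes.

```c
#include <stdbool.h>

#define MAX_LOCKS 8

/* Hypothetical lock classes, named after the two locks in the splat. */
enum { LOCK_FB_NOTIFIER, LOCK_CONSOLE };

/* dep[a][b] is true once "b was acquired while a was held" has been seen. */
static bool dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquiring `want` while holding `held`".
 * Returns false, and records nothing, if `want` already leads back to
 * `held`, i.e. if the new edge would create a circular dependency.
 */
static bool record_acquire(int held, int want)
{
	bool seen[MAX_LOCKS] = { false };

	if (reaches(want, held, seen))
		return false;
	dep[held][want] = true;
	return true;
}
```

Recording the register_framebuffer() order first (notifier rwsem held, then
console_lock) succeeds; the console_callback() order (console_lock held, then
notifier rwsem) is then rejected, matching the two halves of the report.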

Re: + binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch added to -mm tree

2012-11-06 Thread Kees Cook

On Tue, Nov 6, 2012 at 8:21 PM, Jeff Liu  wrote:
> Hi Andrew and Kees,
>
> Great thanks for both your comments!
>
> On 11/07/2012 09:11 AM, Kees Cook wrote:
>> Hrm, I don't like this. get_random_int() specifically says: "Get a
>> random word for internal kernel use only." The intent of AT_RANDOM is
>> for userspace pRNG seeding (though glibc currently uses it directly
>> for stack protector and pointer mangling), which is not "internal
>> kernel use only". :) Though I suppose this is already being used for
>> the randomize_stack_top(), but I think it'd still be better to use
>> higher quality bits.
> Btw Kees, would it make sense to just return the 16-byte uninitialized
> stack array if the user disables stack randomization via
> "/proc/sys/kernel/randomize_va_space = 0" or via the related sysctl, or
> even specifies norandmaps on boot?

No, I feel that ASLR (randomize_va_space) is distinctly separate from
how glibc uses AT_RANDOM (stack protector and pointer mangling).
AT_RANDOM should remain active even if randomize_va_space is 0.
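
For reference, userspace reaches these bytes through the auxiliary vector, and
that path does not consult randomize_va_space. A minimal sketch, assuming Linux
with glibc 2.16 or later (which provides getauxval()); read_at_random() is a
hypothetical helper name, not something from the patch:

```c
#include <stddef.h>
#include <string.h>
#include <elf.h>        /* AT_RANDOM */
#include <sys/auxv.h>   /* getauxval(), glibc >= 2.16 */

/*
 * Copy the 16 bytes the kernel placed behind AT_RANDOM into out.
 * Returns 0 on success, -1 if the auxiliary vector has no AT_RANDOM entry.
 */
static int read_at_random(unsigned char out[16])
{
	const unsigned char *p = (const unsigned char *)getauxval(AT_RANDOM);

	if (!p)
		return -1;
	memcpy(out, p, 16);
	return 0;
}
```

The pointer stays the same for the lifetime of the process, so two reads in one
process return identical bytes; a new value only appears on the next exec.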

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH v2] mmc: core: Add support for idle time BKOPS

2012-11-06 Thread merez

Hi Jaehoon,

Any update on this patch review and testing?

Thanks,
Maya
On Mon, October 15, 2012 11:53 pm, Jaehoon Chung wrote:
> Hi Maya,
>
> I'm testing with your patch, but I need more time for testing.
> For now, it looks good to me. Thank you for working on the idle bkops.
>
> Best Regards,
> Jaehoon Chung
>
>> ---
>> diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
>> index 172a768..ed040d5 100644
>> --- a/drivers/mmc/card/block.c
>> +++ b/drivers/mmc/card/block.c
>> @@ -827,6 +827,9 @@ static int mmc_blk_issue_discard_rq(struct mmc_queue
>> *mq, struct request *req)
>>  from = blk_rq_pos(req);
>>  nr = blk_rq_sectors(req);
>>
>> +if (card->ext_csd.bkops_en)
>> +card->bkops_info.sectors_changed += blk_rq_sectors(req);
> using nr?
>> +
>>  if (mmc_can_discard(card))
>>  arg = MMC_DISCARD_ARG;
>>  else if (mmc_can_trim(card))
>> @@ -1268,6 +1271,9 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue
>> *mq, struct request *rqc)
>>  if (!rqc && !mq->mqrq_prev->req)
>>  return 0;
>>
>> +if (rqc && (card->ext_csd.bkops_en) && (rq_data_dir(rqc) == WRITE))
>> +card->bkops_info.sectors_changed += blk_rq_sectors(rqc);
> Fix the indent.
>> +
>>  do {
>>  if (rqc) {
>>  /*
>> diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
>> index e360a97..e96f5cf 100644
>> --- a/drivers/mmc/card/queue.c
>> +++ b/drivers/mmc/card/queue.c
>> @@ -51,6 +51,7 @@ static int mmc_queue_thread(void *d)
>>  {
>>  struct mmc_queue *mq = d;
>>  struct request_queue *q = mq->queue;
>> +struct mmc_card *card = mq->card;
>>
>>  current->flags |= PF_MEMALLOC;
>>
>> @@ -66,6 +67,17 @@ static int mmc_queue_thread(void *d)
>>  spin_unlock_irq(q->queue_lock);
>>
>>  if (req || mq->mqrq_prev->req) {
>> +/*
>> + * If this is the first request, BKOPs might be in
>> + * progress and needs to be stopped before issuing the
>> + * request
>> + */
>> +if (card->ext_csd.bkops_en &&
>> +card->bkops_info.started_delayed_bkops) {
>> +card->bkops_info.started_delayed_bkops = false;
>> +mmc_stop_bkops(card);
> if mmc_stop_bkops is failed..?
>> +}
>> +
>>  set_current_state(TASK_RUNNING);
>>  mq->issue_fn(mq, req);
>>  } else {
>> @@ -73,6 +85,7 @@ static int mmc_queue_thread(void *d)
>>  set_current_state(TASK_RUNNING);
>>  break;
>>  }
>> +mmc_start_delayed_bkops(card);
>>  up(&mq->thread_sem);
>>  schedule();
>>  down(&mq->thread_sem);
>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>> index 6612163..fd8783d 100644
>> --- a/drivers/mmc/core/core.c
>> +++ b/drivers/mmc/core/core.c
>> @@ -253,9 +253,42 @@ mmc_start_request(struct mmc_host *host, struct
>> mmc_request *mrq)
>>  }
>>
>>  /**
>> + * mmc_start_delayed_bkops() - Start a delayed work to check for
>> + *  the need of non urgent BKOPS
>> + *
>> + * @card: MMC card to start BKOPS on
>> + */
>> +void mmc_start_delayed_bkops(struct mmc_card *card)
>> +{
>> +if (!card || !card->ext_csd.bkops_en || mmc_card_doing_bkops(card))
>> +return;
>> +
>> +if (card->bkops_info.sectors_changed <
>> +BKOPS_MIN_SECTORS_TO_QUEUE_DELAYED_WORK)
>> +return;
>> +
>> +pr_debug("%s: %s: queueing delayed_bkops_work\n",
>> + mmc_hostname(card->host), __func__);
>> +
>> +card->bkops_info.sectors_changed = 0;
>> +
>> +/*
>> + * cancel_delayed_bkops_work will prevent a race condition between
>> + * fetching a request by the mmcqd and the delayed work, in case
>> + * it was removed from the queue work but not started yet
>> + */
>> +card->bkops_info.cancel_delayed_work = false;
>> +card->bkops_info.started_delayed_bkops = true;
>> +queue_delayed_work(system_nrt_wq, &card->bkops_info.dw,
>> +   msecs_to_jiffies(
>> +   card->bkops_info.delay_ms));
>> +}
>> +EXPORT_SYMBOL(mmc_start_delayed_bkops);
>> +
>> +/**
>>   *  mmc_start_bkops - start BKOPS for supported cards
>>   *  @card: MMC card to start BKOPS
>> - *  @form_exception: A flag to indicate if this function was
>> + *  @from_exception: A flag to indicate if this function was
>>   *   called due to an exception raised by the card
>>   *
>>   *  Start background operations whenever requested.
>> @@ -269,25 +302,47 @@ void mmc_start_bkops(struct mmc_card *card, bool
>> from_exception)
>>  bool use_busy_signal;
>>
>>  BUG_ON(!card);
>> -
>> -if (!card->ext_csd.bkops_en ||

Re: [PATCH Resend V2] dt: add helper function to read u8 & u16 variables & arrays

2012-11-06 Thread viresh kumar

On Tue, Nov 6, 2012 at 7:48 PM, Rob Herring  wrote:
>> +#define of_property_read_array(_np, _pname, _out, _sz)  \

>> + while (_sz--)   \
>> + *_out++ = (typeof(*_out))be32_to_cpup(_val++);  \

> This will not work. You are incrementing _out by 1, 2, or 4 bytes, but
> _val is always incremented by 4 bytes.
>
> According to the dtc commit adding this feature, the values are packed:
>
> With this patch the following property assignment:
>
> property = /bits/ 16 <0x1234 0x5678 0x0 0x>;
>
> is equivalent to:
>
> property = <0x12345678 0x>;

I thought of it a bit more and wasn't actually aligned with your explanation :(

If that is the case, how will the current implementation of the u32 array
work if we pass something like: 0x88 0x8400 0x5890 from DT?

So, i did a dummy test of my current implementation, with following changes
in one of my drivers:

dts changes:

cluster0: cluster@0 {
+   data1 = <0x50 0x60 0x70>;
+   data2 = <0x5000 0x6000 0x7000>;
+   data3 = <0x5000 0x6000 0x7000>;
   }

driver changes:

+void test(struct device_node *cluster)
+{
+   u8 data1[3];
+   u16 data2[3];
+   u32 data3[3], i;
+
+   of_property_read_u8_array(cluster, "data1", data1, 3);
+   of_property_read_u16_array(cluster, "data2", data2, 3);
+   of_property_read_u32_array(cluster, "data3", data3, 3);
+
+   for (i = 0; i < 3; i++) {
+   printk(KERN_INFO "u8 %d: %x\n", i, data1[i]);
+   printk(KERN_INFO "u16 %d: %x\n", i, data2[i]);
+   printk(KERN_INFO "u32 %d: %x\n", i, data3[i]);
+   }
+}

And the following is the output:

[    4.087205] u8 0: 50
[    4.093746] u16 0: 5000
[    4.101067] u32 0: 5000
[    4.109512] u8 1: 60
[    4.116036] u16 1: 6000
[    4.123357] u32 1: 6000
[    4.131718] u8 2: 70
[    4.138241] u16 2: 7000
[    4.145573] u32 2: 7000

which looks to be what we were looking for, isn't it?
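
Both observations can hold at once, depending on how the property was written.
The userspace sketch below assumes the property bytes are laid out as described
in the quoted dtc text: plain <...> values are big-endian 32-bit cells, while
/bits/ 16 values are packed two bytes apiece. The helper names are hypothetical
and this is not the drivers/of code.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 32-bit cell from raw property bytes. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one big-endian 16-bit value from raw property bytes. */
static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/*
 * One value per 32-bit cell, truncated to 16 bits: the source pointer
 * advances 4 bytes per element. Returns -1 if the property is too short.
 */
static int read_u16_from_cells(const uint8_t *prop, size_t len,
			       uint16_t *out, size_t n)
{
	if (len < n * 4)
		return -1;
	for (size_t i = 0; i < n; i++)
		out[i] = (uint16_t)be32_at(prop + 4 * i);
	return 0;
}

/*
 * Packed 16-bit values: the source pointer advances 2 bytes per element.
 * Returns -1 if the property is too short.
 */
static int read_u16_packed(const uint8_t *prop, size_t len,
			   uint16_t *out, size_t n)
{
	if (len < n * 2)
		return -1;
	for (size_t i = 0; i < n; i++)
		out[i] = be16_at(prop + 2 * i);
	return 0;
}
```

With data2 = <0x5000 0x6000 0x7000> the property is 12 bytes and the cell-wise
reader returns the expected values, which matches the test output above. The
same three values written with /bits/ 16 occupy only 6 bytes, so the cell-wise
reader runs out of data and only the packed reader works.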

Following is the fixup for the missing doc comment:

commit 00803aed0781de451048df0d15a3e8c814a343c8
Author: Viresh Kumar 
Date:   Wed Nov 7 09:48:46 2012 +0530

fixup! dt: add helper function to read u8 & u16 variables & arrays
---
 drivers/of/base.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index fbb634b..4a6632e 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -669,6 +669,7 @@ EXPORT_SYMBOL(of_find_node_by_phandle);
  * @np:device node from which the property value is to be read.
  * @propname:  name of the property to be searched.
  * @out_value: pointer to return value, modified only if return value is 0.
+ * @sz:number of array elements to read
  *
  * Search for a property in a device node and read 8-bit value(s) from
  * it. Returns 0 on success, -EINVAL if the property does not exist,
@@ -690,6 +691,7 @@ EXPORT_SYMBOL_GPL(of_property_read_u8_array);
  * @np:device node from which the property value is to be read.
  * @propname:  name of the property to be searched.
  * @out_value: pointer to return value, modified only if return value is 0.
+ * @sz:number of array elements to read
  *
  * Search for a property in a device node and read 16-bit value(s) from
  * it. Returns 0 on success, -EINVAL if the property does not exist,
@@ -712,6 +714,7 @@ EXPORT_SYMBOL_GPL(of_property_read_u16_array);
  * @np:device node from which the property value is to be read.
  * @propname:  name of the property to be searched.
  * @out_value: pointer to return value, modified only if return value is 0.
+ * @sz:number of array elements to read
  *
  * Search for a property in a device node and read 32-bit value(s) from
  * it. Returns 0 on success, -EINVAL if the property does not exist,

Re: + binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch added to -mm tree

2012-11-06 Thread Jeff Liu

Hi Andrew and Kees,

Great thanks for both your comments!

On 11/07/2012 09:11 AM, Kees Cook wrote:
> Hrm, I don't like this. get_random_int() specifically says: "Get a
> random word for internal kernel use only." The intent of AT_RANDOM is
> for userspace pRNG seeding (though glibc currently uses it directly
> for stack protector and pointer mangling), which is not "internal
> kernel use only". :) Though I suppose this is already being used for
> the randomize_stack_top(), but I think it'd still be better to use
> higher quality bits.
Btw Kees, would it make sense to just return the 16-byte uninitialized
stack array if the user disables stack randomization via
"/proc/sys/kernel/randomize_va_space = 0" or via the related sysctl, or
even specifies norandmaps on boot?

I guess this sounds more stupid since some script kiddies would like it
for writing exploits. :-P
> 
> Notes below...
> 
> On Tue, Nov 6, 2012 at 4:16 PM,   wrote:
>>
>> The patch titled
>>  Subject: binfmt_elf.c: use get_random_int() to fix entropy depleting
>> has been added to the -mm tree.  Its filename is
>>  binfmt_elfc-use-get_random_int-to-fix-entropy-depleting.patch
>>
>> Before you just go and hit "reply", please:
>>a) Consider who else should be cc'ed
>>b) Prefer to cc a suitable mailing list as well
>>c) Ideally: find the original patch on the mailing list and do a
>>   reply-to-all to that, adding suitable additional cc's
>>
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>>
>> The -mm tree is included into linux-next and is updated
>> there every 3-4 working days
>>
>> --
>> From: Jeff Liu 
>> Subject: binfmt_elf.c: use get_random_int() to fix entropy depleting
>>
>> Entropy is quickly depleted under normal operations like ls(1), cat(1),
>> etc...  between 2.6.30 to current mainline, for instance:
>>
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 3428
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2911
>> $cat /proc/sys/kernel/random/entropy_avail
>> 2620
>>
>> We observed this problem has been occurring since 2.6.30 with
>> fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by
>> f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
>>
>> /*
>>  * Generate 16 random bytes for userspace PRNG seeding.
>>  */
>> get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
>>
>> The patch introduces a wrapper around get_random_int() which has lower
>> overhead than calling get_random_bytes() directly.
>>
>> With this patch applied:
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2731
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2802
>> $ cat /proc/sys/kernel/random/entropy_avail
>> 2878
>>
>> Analyzed by John Sobecki.
>>
>> Signed-off-by: Jie Liu 
>> Cc: John Sobecki 
>> Cc: Al Viro 
>> Cc: Andreas Dilger 
>> Cc: Alan Cox 
>> Cc: Arnd Bergmann 
>> Cc: James Morris 
>> Cc: Ted Ts'o 
>> Cc: Greg Kroah-Hartman 
>> Cc: Kees Cook 
>> Cc: Jakub Jelinek 
>> Cc: Ulrich Drepper 
>> Signed-off-by: Andrew Morton 
>> ---
>>
>>  fs/binfmt_elf.c |   26 +-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff -puN 
>> fs/binfmt_elf.c~binfmt_elfc-use-get_random_int-to-fix-entropy-depleting 
>> fs/binfmt_elf.c
>> --- a/fs/binfmt_elf.c~binfmt_elfc-use-get_random_int-to-fix-entropy-depleting
>> +++ a/fs/binfmt_elf.c
>> @@ -48,6 +48,7 @@ static int load_elf_binary(struct linux_
>>  static int load_elf_library(struct file *);
>>  static unsigned long elf_map(struct file *, unsigned long, struct elf_phdr 
>> *,
>> int, int, unsigned long);
>> +static void randomize_stack_user(unsigned char *buf, size_t nbytes);
>>
>>  /*
>>   * If we don't support core dumping, then supply a NULL so we
>> @@ -200,7 +201,7 @@ create_elf_tables(struct linux_binprm *b
>> /*
>>  * Generate 16 random bytes for userspace PRNG seeding.
>>  */
>> -   get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
>> +   randomize_stack_user(k_rand_bytes, sizeof(k_rand_bytes));
>> u_rand_bytes = (elf_addr_t __user *)
>>STACK_ALLOC(p, sizeof(k_rand_bytes));
>> if (__copy_to_user(u_rand_bytes, k_rand_bytes, sizeof(k_rand_bytes)))
>> @@ -558,6 +559,29 @@ static unsigned long randomize_stack_top
>>  #endif
>>  }
>>
>> +/*
>> + * A wrapper of get_random_int() to generate random bytes which has lower
>> + * overhead than call get_random_bytes() directly.
>> + * create_elf_tables() call this function to generate 16 random bytes for
>> + * userspace PRNG seeding.
>> + */
>> +static void randomize_stack_user(unsigned char *buf, size_t nbytes)
>> +{
>> +   unsigned char *p = buf;
>> +
>> +   while (nbytes) {
>> +   unsigned int random_variable;
>> +   size_t chunk = min(nbytes, sizeof(unsigned int));
>> +
>> +   random_variable = get_random_int() & STACK_RND_MASK;
>> +

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-06 Thread Sasha Levin

On 11/06/2012 10:54 PM, Michel Lespinasse wrote:
> On Tue, Nov 6, 2012 at 12:24 AM, Michel Lespinasse  wrote:
>> On Mon, Nov 5, 2012 at 5:41 AM, Michel Lespinasse  wrote:
>>> On Sun, Nov 4, 2012 at 8:44 PM, Michel Lespinasse  wrote:
>>>> On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu  wrote:
>>>>> Hmm, I attached a simple fix patch.
>>>>
>>>> Reviewed-by: Michel Lespinasse 
>>>> (also ran some tests with it, but I could never reproduce the original
>>>> issue anyway).
>>>
>>> Wait a minute, this is actually wrong. You need to call
>>> vma_lock_anon_vma() / vma_unlock_anon_vma() to avoid the issue with
>>> vma->anon_vma == NULL.
>>>
>>> I'll fix it and integrate it into my next patch series, which I intend
>>> to send later today. (I am adding new code into validate_mm(), so that
>>> it's easier to have it in the same patch series to avoid merge
>>> conflicts)
>>
>> Hmmm, now I'm getting confused about anon_vma locking again :/
>>
>> As Hugh privately remarked to me, the same_vma linked list is supposed
>> to be protected by exclusive mmap_sem ownership, not by anon_vma lock.
>> So now looking at it a bit more, I'm not sure what race we're
>> preventing by taking the anon_vma lock in validate_mm() ???
> 
> Looking at it a bit more:
> 
> the same_vma linked list is *generally* protected by *exclusive*
> mmap_sem ownership. However, in expand_stack() we only have *shared*
> mmap_sem ownership, so that two concurrent expand_stack() calls
> (possibly on different vmas that have a different anon_vma lock) could
> race with each other. For this reason we do need the validate_mm()
> taking each vma's anon_vma lock (if any) before calling
> anon_vma_interval_tree_verify().
> 
> While this justifies Bob's patch, this does not explain Sasha's
> reports - in both of them the backtrace did not involve
> expand_stack(), and there should be exclusive mmap_sem ownership, so
> I'm still unclear as to what could be causing Sasha's issue.
> 
> Sasha, how reproducible is this?

This is pretty hard to reproduce, I've seen this only twice so far.

> 
> Also, would the following change print something when the issue triggers ?

I'll run it with your patch, but as I've mentioned above - it's a PITA
to reproduce.


Thanks,
Sasha

Re: mm: NULL ptr deref in anon_vma_interval_tree_verify

2012-11-06 Thread Michel Lespinasse

On Tue, Nov 6, 2012 at 12:24 AM, Michel Lespinasse  wrote:
> On Mon, Nov 5, 2012 at 5:41 AM, Michel Lespinasse  wrote:
>> On Sun, Nov 4, 2012 at 8:44 PM, Michel Lespinasse  wrote:
>>> On Sun, Nov 4, 2012 at 8:14 PM, Bob Liu  wrote:
>>>> Hmm, I attached a simple fix patch.
>>>
>>> Reviewed-by: Michel Lespinasse 
>>> (also ran some tests with it, but I could never reproduce the original
>>> issue anyway).
>>
>> Wait a minute, this is actually wrong. You need to call
>> vma_lock_anon_vma() / vma_unlock_anon_vma() to avoid the issue with
>> vma->anon_vma == NULL.
>>
>> I'll fix it and integrate it into my next patch series, which I intend
>> to send later today. (I am adding new code into validate_mm(), so that
>> it's easier to have it in the same patch series to avoid merge
>> conflicts)
>
> Hmmm, now I'm getting confused about anon_vma locking again :/
>
> As Hugh privately remarked to me, the same_vma linked list is supposed
> to be protected by exclusive mmap_sem ownership, not by anon_vma lock.
> So now looking at it a bit more, I'm not sure what race we're
> preventing by taking the anon_vma lock in validate_mm() ???

Looking at it a bit more:

the same_vma linked list is *generally* protected by *exclusive*
mmap_sem ownership. However, in expand_stack() we only have *shared*
mmap_sem ownership, so that two concurrent expand_stack() calls
(possibly on different vmas that have a different anon_vma lock) could
race with each other. For this reason we do need the validate_mm()
taking each vma's anon_vma lock (if any) before calling
anon_vma_interval_tree_verify().

While this justifies Bob's patch, this does not explain Sasha's
reports - in both of them the backtrace did not involve
expand_stack(), and there should be exclusive mmap_sem ownership, so
I'm still unclear as to what could be causing Sasha's issue.

Sasha, how reproducible is this?

Also, would the following change print something when the issue triggers ?

diff --git a/mm/mmap.c b/mm/mmap.c
index 619b280505fe..4c09e7ebcfa7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -404,8 +404,13 @@ void validate_mm(struct mm_struct *mm)
while (vma) {
struct anon_vma_chain *avc;
vma_lock_anon_vma(vma);
-   list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
+   list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) {
+   if (avc->vma != vma) {
+   printk("avc->vma %p vma %p\n", avc->vma, vma);
+   bug = 1;
+   }
anon_vma_interval_tree_verify(avc);
+   }
vma_unlock_anon_vma(vma);
highest_address = vma->vm_end;
vma = vma->vm_next;

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

Re: [PATCH v4 1/6] mm: teach mm by current context info to not do I/O during memory allocation

2012-11-06 Thread Andrew Morton

On Wed, 7 Nov 2012 11:11:24 +0800 Ming Lei  wrote:

> On Wed, Nov 7, 2012 at 7:23 AM, Andrew Morton  wrote:
> >
> > It's unclear from the description why we're also clearing __GFP_FS in
> > this situation.
> >
> > If we can avoid doing this then there will be a very small gain: there
> > are some situations in which a filesystem can clean pagecache without
> > performing I/O.
> 
> Firstly,  the patch follows the policy in the system suspend/resume situation,
> in which the __GFP_FS is cleared, and basically the problem is very similar
> with that in system PM path.

I suspect that code is wrong.  Or at least, suboptimal.

> Secondly, inside shrink_page_list(), pageout() may be triggered on dirty anon
> page if __GFP_FS is set.

pageout() should be called if GFP_FS is set or if GFP_IO is set and the
IO is against swap.

And that's what we want to happen: we want to enter the fs to try to
turn dirty pagecache into clean pagecache without doing IO.  If we in
fact enter the device drivers when GFP_IO was not set then that's a bug
which we should fix.

> IMO, if performing I/O can be completely avoided when __GFP_FS is set, the
> flag can be kept, otherwise it is better to clear it in the situation.

yup.

> >
> > Also, you can probably put the unlikely() inside memalloc_noio() and
> > avoid repeating it at all the callsites.
> >
> > And it might be neater to do:
> >
> > /*
> >  * Nice comment goes here
> >  */
> > static inline gfp_t memalloc_noio_flags(gfp_t flags)
> > {
> > if (unlikely(current->flags & PF_MEMALLOC_NOIO))
> > flags &= ~GFP_IOFS;
> > return flags;
> > }
> 
> But without the check in callsites, some local variables will be written
> twice, so it is better not to do it.

I don't see why - we just modify the incoming gfp_t at the start of the
function, then use it.

It gets a bit tricky with those struct initialisations.  Things like

	struct foo bar = {
		.a = a1,
		.b = b1,
	};

should not be turned into

	struct foo bar = {
		.a = a1,
	};

	bar.b = b1;

and we don't want to do

	struct foo bar = { };

	bar.a = a1;
	bar.b = b1;

either, because these are indeed a double-write.  But we can do

	struct foo bar = {
		.flags = (flags = memalloc_noio_flags(flags)),
		.b = b1,
	};

which is a bit arcane but not too bad.  Have a think about it...
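
A compilable userspace sketch of that pattern, assuming stand-in names and flag
values: demo_task_flags replaces current->flags, the DEMO_* constants are
invented for illustration and do not match the kernel's, and the unlikely()
hint is left out because it is a kernel macro.

```c
typedef unsigned int gfp_t;

/* Hypothetical stand-in values; the real kernel constants differ. */
#define DEMO_GFP_WAIT		0x10u
#define DEMO_GFP_IO		0x40u
#define DEMO_GFP_FS		0x80u
#define DEMO_GFP_IOFS		(DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_PF_MEMALLOC_NOIO	0x1u

/* Stands in for current->flags. */
static unsigned int demo_task_flags;

/* Strip the I/O and FS bits when the task is marked as no-I/O. */
static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
	if (demo_task_flags & DEMO_PF_MEMALLOC_NOIO)
		flags &= ~DEMO_GFP_IOFS;
	return flags;
}

struct demo_scan_control {
	gfp_t gfp_mask;
	int order;
};

static gfp_t demo_try_to_free(gfp_t gfp_mask, int order)
{
	/*
	 * The assignment inside the initializer updates the local gfp_mask
	 * and initializes sc.gfp_mask in one step, so neither is written
	 * a second time afterwards.
	 */
	struct demo_scan_control sc = {
		.gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)),
		.order = order,
	};

	(void)sc.order;
	/* The two copies are equal by construction. */
	return sc.gfp_mask & gfp_mask;
}
```

With demo_task_flags clear the mask passes through unchanged; with
DEMO_PF_MEMALLOC_NOIO set, only the non-I/O bits survive in both the local
variable and the struct member.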



Re: [PATCH v3] Do not change worker's running cpu in cmci_rediscover().

2012-11-06 Thread Tang Chen


On 11/06/2012 10:44 PM, Borislav Petkov wrote:
> On Tue, Nov 06, 2012 at 07:17:26PM +0800, Tang Chen wrote:
>> Hi Tony, Borislav,
>>
>> Would you please help to review this patch ?
>
> I'm guessing mingo or hpa haven't pulled yet:
>
> http://marc.info/?l=linux-kernel=135163452500984

Hum, thank you very much for the info. :)

Thanks.

Re: macbook pro 9.2 stat/ata bus error

2012-11-06 Thread Azat Khuzhin

 Anybody?

On Mon, Nov 5, 2012 at 7:28 PM, Azat Khuzhin  wrote:
> After installing linux on macbook 9.2 (mid 2012), I have next errors
> in dmesg log:
>
> [  389.623828] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=600
> [  410.038465] NMI watchdog: enabled on all CPUs, permanently consumes
> one hw-PMU counter.
> [  410.075042] ehci_hcd :00:1a.0: setting latency timer to 64
> [  410.483526] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=0
> [ 1401.834509] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=1800
> [ 1406.467268] NMI watchdog: enabled on all CPUs, permanently consumes
> one hw-PMU counter.
> [ 1406.506769] ehci_hcd :00:1a.0: setting latency timer to 64
> [ 1406.590122] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=0
> [ 1407.492260] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x5
> action 0xe frozen
> [ 1407.494441] ata2.00: irq_stat 0x0040, PHY RDY changed
> [ 1407.495238] ata2: SError: { PHYRdyChg CommWake }
> [ 1407.496035] sr 1:0:0:0: CDB:
> [ 1407.497333] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
> [ 1407.498285] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0
> pio 16392 in
> [ 1407.498285]  res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask
> 0x10 (ATA bus error)
> [ 1407.501987] ata2.00: status: { DRDY }
> [ 1407.502882] ata2: hard resetting link
> [ 1408.230302] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 1408.233279] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [ 1408.237467] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [ 1408.239084] ata2.00: configured for UDMA/100
> [ 1408.262238] ata2: EH complete
> [ 3565.785609] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=1800
> [ 3576.921499] NMI watchdog: enabled on all CPUs, permanently consumes
> one hw-PMU counter.
> [ 3576.958624] ehci_hcd :00:1a.0: setting latency timer to 64
> [ 3577.114612] EXT4-fs (sda4): re-mounted. Opts:
> errors=remount-ro,data=ordered,commit=0
> [ 3577.923688] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x5
> action 0xe frozen
> [ 3577.925852] ata2.00: irq_stat 0x0040, PHY RDY changed
> [ 3577.926746] ata2: SError: { PHYRdyChg CommWake }
> [ 3577.927544] sr 1:0:0:0: CDB:
> [ 3577.928345] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
> [ 3577.929642] ata2.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0
> pio 16392 in
> [ 3577.929642]  res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask
> 0x10 (ATA bus error)
> [ 3577.932954] ata2.00: status: { DRDY }
> [ 3577.934264] ata2: hard resetting link
> [ 3578.662228] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 3578.665211] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [ 3578.669355] ata2.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [ 3578.670969] ata2.00: configured for UDMA/100
> [ 3578.694145] ata2: EH complete
>
> Is it linux driver, or maybe
>
> $ lspci # sata information only
> 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family
> 6-port SATA Controller [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0])
> Subsystem: Intel Corporation Device 7270
> Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 20
> I/O ports at 2098 [size=8]
> I/O ports at 20bc [size=4]
> I/O ports at 2090 [size=8]
> I/O ports at 20b8 [size=4]
> I/O ports at 2060 [size=32]
> Memory at a0816000 (32-bit, non-prefetchable) [size=2K]
> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [70] Power Management version 3
> Capabilities: [a8] SATA HBA v1.0
> Capabilities: [b0] PCI Advanced Features
> Kernel driver in use: ahci
>
> $ uname -a
> Linux macbook-pro 3.6.5macbook-pro-custom-v0.1 #4 SMP Sun Nov 4
> 12:39:03 UTC 2012 x86_64 GNU/Linux
> $ cat /etc/debian_version
> wheezy/sid
>
> In OSX there is no errors with hard drive.
>
> What else can I do investigate this situation next?
>
> --
> Azat Khuzhin



-- 
Azat Khuzhin

Re: [PATCH v4 0/6] solve deadlock caused by memory allocation with I/O

2012-11-06 Thread Ming Lei

On Wed, Nov 7, 2012 at 7:23 AM, Andrew Morton  wrote:
>
> It generally looks OK to me.  I have a few comments and I expect to grab
> v5.

Andrew, thanks for your review, and I will prepare -v5 later.

Thanks,
--
Ming Lei

Re: [PATCH] thermal: rcar: fixup compilation errors

2012-11-06 Thread Zhang Rui

On Tue, 2012-10-09 at 01:14 -0700, Kuninori Morimoto wrote:
> This patch fixup following error
> 
> ${LINUX}/drivers/thermal/rcar_thermal.c: In function 'rcar_thermal_probe':
> ${LINUX}/drivers/thermal/rcar_thermal.c:214:9: warning: passing argument 3 \
>   of 'thermal_zone_device_register' makes integer from pointer without\
>   a cast [enabled by default]
> ${LINUX}/include/linux/thermal.h:215:29: note: expected 'int' but argument \
>   is of type 'struct rcar_thermal_priv *'
> ${LINUX}/drivers/thermal/rcar_thermal.c:214:9:\
>   error: too few arguments to function 'thermal_zone_device_register'
> 
> Signed-off-by: Devendra Naga 
> Signed-off-by: Kuninori Morimoto 

shipped in 3.7-rc4.

thanks,
rui
> ---
> for linus/master branch
> 
>  drivers/thermal/rcar_thermal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> index b13fe5d..762f637 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -210,7 +210,7 @@ static int rcar_thermal_probe(struct platform_device *pdev)
>   goto error_free_priv;
>   }
>  
> - zone = thermal_zone_device_register("rcar_thermal", 0, priv,
> + zone = thermal_zone_device_register("rcar_thermal", 0, 0, priv,
>   &rcar_thermal_zone_ops, NULL, 0, 0);
>   if (IS_ERR(zone)) {
>   dev_err(&pdev->dev, "thermal zone device is NULL\n");



Re: [PATCH] thermal: rcar_thermal: remove explicitly used devm_kfree/iounmap()

2012-11-06 Thread Zhang Rui



On Tue, 2012-10-02 at 23:51 -0700, Kuninori Morimoto wrote:
> devm_kfree and devm_iounmap should not have to be explicitly used
> 
> Signed-off-by: Kuninori Morimoto 

applied to thermal-next.

thanks,
rui
> ---
> This patch is based on Devendra's
> [PATCH] thermal: solve compilation errors in rcar_thermal
> 
>  drivers/thermal/rcar_thermal.c |   18 ++
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> index 762f637..81dce23 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -185,7 +185,6 @@ static int rcar_thermal_probe(struct platform_device *pdev)
>   struct thermal_zone_device *zone;
>   struct rcar_thermal_priv *priv;
>   struct resource *res;
> - int ret;
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>   if (!res) {
> @@ -206,16 +205,14 @@ static int rcar_thermal_probe(struct platform_device *pdev)
> res->start, resource_size(res));
>   if (!priv->base) {
>   dev_err(&pdev->dev, "Unable to ioremap thermal register\n");
> - ret = -ENOMEM;
> - goto error_free_priv;
> + return -ENOMEM;
>   }
>  
>   zone = thermal_zone_device_register("rcar_thermal", 0, 0, priv,
>   &rcar_thermal_zone_ops, NULL, 0, 0);
>   if (IS_ERR(zone)) {
>   dev_err(&pdev->dev, "thermal zone device is NULL\n");
> - ret = PTR_ERR(zone);
> - goto error_iounmap;
> + return PTR_ERR(zone);
>   }
>  
>   platform_set_drvdata(pdev, zone);
> @@ -223,26 +220,15 @@ static int rcar_thermal_probe(struct platform_device *pdev)
>   dev_info(&pdev->dev, "proved\n");
>  
>   return 0;
> -
> -error_iounmap:
> - devm_iounmap(&pdev->dev, priv->base);
> -error_free_priv:
> - devm_kfree(&pdev->dev, priv);
> -
> - return ret;
>  }
>  
>  static int rcar_thermal_remove(struct platform_device *pdev)
>  {
>   struct thermal_zone_device *zone = platform_get_drvdata(pdev);
> - struct rcar_thermal_priv *priv = zone->devdata;
>  
>   thermal_zone_device_unregister(zone);
>   platform_set_drvdata(pdev, NULL);
>  
> - devm_iounmap(&pdev->dev, priv->base);
> - devm_kfree(&pdev->dev, priv);
> -
>   return 0;
>  }
>  



Re: [PATCH v4 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-06 Thread Ming Lei

On Wed, Nov 7, 2012 at 7:24 AM, Andrew Morton  wrote:
>
> checkpatch finds a number of problems with this patch, all of which
> should be fixed.  Please always use checkpatch.

Sorry for missing the check.

>> + /* only clear the flag for one device if all
>> +  * children of the device don't set the flag.
>> +  */
>
> Such a comment is usually laid out as
>
> /*
>  * Only ...

Will do it in -v5.

> More significantly, the comment describes what the code is doing but
> not why the code is doing it.  The former is (usually) obvious from
> reading the C, and the latter is what good code comments address.
>
> And it's needed in this case.  Why does the code do this?

Suppose two USB SCSI disks that share the same USB configuration (device)
both set the memalloc_noio flag. Their ancestors' memalloc_noio flag should
be cleared only after the flags of both disks have been cleared.

OK, we'll add a comment on clearing the flag.

>
> Also, can a device have more than one child?  If so, the code doesn't
> do what the comment says it does.

It does, because device_for_each_child() stops and returns true as soon as
dev_memalloc_noio() returns true for any one child.
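
To make the point above concrete, here is a minimal userspace sketch of the
walk-up logic. It is not the kernel code: struct node, add_child(),
any_child_noio() and set_memalloc_noio() are hypothetical stand-ins for
struct device, device_for_each_child() with dev_memalloc_noio(), and
pm_runtime_set_memalloc_noio(), and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for struct device; not the kernel type. */
struct node {
	struct node *parent;
	struct node *children[MAX_CHILDREN];
	size_t nr_children;
	bool memalloc_noio;
};

/* Link child under parent (sketch helper, no kernel equivalent implied). */
static void add_child(struct node *parent, struct node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* Short-circuits: true as soon as one child still has the flag set. */
static bool any_child_noio(const struct node *n)
{
	for (size_t i = 0; i < n->nr_children; i++)
		if (n->children[i]->memalloc_noio)
			return true;
	return false;
}

/*
 * Set or clear the flag on n, then walk up through its ancestors.
 * When clearing, stop at the first ancestor that still has a flagged
 * child, so a shared ancestor keeps its flag until the last user is gone.
 */
static void set_memalloc_noio(struct node *n, bool enable)
{
	for (;;) {
		n->memalloc_noio = enable;
		n = n->parent;
		if (!n || (!enable && any_child_noio(n)))
			break;
	}
}
```

With two disks under one shared parent, clearing the flag on the first disk
leaves the parent flagged; only clearing the second disk clears the parent.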

>
>> + if (!dev || (!enable &&
>> +  device_for_each_child(dev, NULL,
>> +dev_memalloc_noio)))
>> + break;
>> + }
>> + mutex_unlock(&dev_hotplug_mutex);
>> +}
>> +EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);


Thanks,
--
Ming Lei

linux-next: manual merge of the arm-soc tree with the l2-mtd and pinctrl trees

2012-11-06 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
arch/arm/mach-nomadik/board-nhk8815.c between commit 1cd2fc449091 ("ARM:
nomadik: fixup some FSMC merge problems") from the l2-mtd tree, commit
bb16bd9b9da4 ("pinctrl/nomadik: move the platform data header") from the
pinctrl tree and commit 44e47ccf8ab6 ("Merge branch 'next/multiplatform' into
for-next") from the arm-soc tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/arm/mach-nomadik/board-nhk8815.c
index ab7104f,5ccdf53..000
--- a/arch/arm/mach-nomadik/board-nhk8815.c
+++ b/arch/arm/mach-nomadik/board-nhk8815.c
@@@ -30,11 -31,10 +32,9 @@@
  #include 
  #include 
  #include 
- #include 
  #include 
  #include 
- 
- #include 
 -#include 
+ #include 
  
  #include "cpu-8815.h"
  



[PATCH] bonding: rlb mode of bond should not alter ARP replies originating via bridge

2012-11-06 Thread Zheng Li

ARP traffic passing through a bridge and out via the bond (when the bond is a 
port of the bridge) should not have its source MAC address adjusted by the 
receive load balance code in rlb_arp_xmit.

Signed-off-by: Zheng Li 
Cc: Jay Vosburgh 
Cc: Andy Gospodarek 
Cc: "David S. Miller" 

---
 drivers/net/bonding/bond_alb.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index e15cc11..641b3f1 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -700,7 +700,17 @@ static struct slave *rlb_arp_xmit(struct sk_buff *skb, struct bonding *bond)
*/
tx_slave = rlb_choose_channel(skb, bond);
if (tx_slave) {
-   memcpy(arp->mac_src,tx_slave->dev->dev_addr, ETH_ALEN);
+   struct slave *tmp_slave = NULL;
+   int i = 0;
+   bond_for_each_slave(bond, tmp_slave, i) {
+   if (ether_addr_equal_64bits(arp->mac_src,
+   tmp_slave->dev->dev_addr)) {
+   memcpy(arp->mac_src,
+   tx_slave->dev->dev_addr,
+   ETH_ALEN);
+   break;
+   }
+   }
}
pr_debug("Server sent ARP Reply packet\n");
} else if (arp->op_code == htons(ARPOP_REQUEST)) {
-- 
1.7.6.5
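
The check this patch adds can be sketched in plain userspace C as follows.
This is only an illustration under simplifying assumptions: struct
slave_sketch, mac_is_slave_addr() and rlb_fix_arp_src() are hypothetical
names, the slaves live in a flat array instead of the bond's slave list, and
memcmp() stands in for ether_addr_equal_64bits().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified stand-in for a bond slave. */
struct slave_sketch {
	unsigned char dev_addr[ETH_ALEN];
};

/* True if mac is the hardware address of one of the bond's slaves. */
static bool mac_is_slave_addr(const unsigned char *mac,
			      const struct slave_sketch *slaves, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(mac, slaves[i].dev_addr, ETH_ALEN) == 0)
			return true;
	return false;
}

/*
 * Rewrite the ARP source MAC to the transmit slave's address, but only
 * when the reply carries one of the bond's own addresses. A reply that
 * merely passes through from a bridge is left untouched.
 * Returns true if the address was rewritten.
 */
static bool rlb_fix_arp_src(unsigned char *arp_src,
			    const struct slave_sketch *slaves, size_t n,
			    const struct slave_sketch *tx_slave)
{
	assert(tx_slave != NULL);
	if (!mac_is_slave_addr(arp_src, slaves, n))
		return false;
	memcpy(arp_src, tx_slave->dev_addr, ETH_ALEN);
	return true;
}
```

The cost of the extra check is one pass over the slaves per ARP reply, which
should be small for typical bond sizes.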


Re: [PATCH v4 1/6] mm: teach mm by current context info to not do I/O during memory allocation

2012-11-06 Thread Ming Lei

On Wed, Nov 7, 2012 at 7:23 AM, Andrew Morton  wrote:
>
> It's unclear from the description why we're also clearing __GFP_FS in
> this situation.
>
> If we can avoid doing this then there will be a very small gain: there
> are some situations in which a filesystem can clean pagecache without
> performing I/O.

Firstly, the patch follows the policy used in the system suspend/resume path,
in which __GFP_FS is cleared; the problem here is basically very similar
to the one in the system PM path.

Secondly, inside shrink_page_list(), pageout() may be triggered on a dirty anon
page if __GFP_FS is set.

IMO, if performing I/O can be completely avoided when __GFP_FS is set, the
flag can be kept; otherwise it is better to clear it in this situation.

>
> It doesn't appear that the patch will add overhead to the alloc/free
> hotpaths, which is good.

Thanks to Minchan's previous comment.

>
>>
>> ...
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1805,6 +1805,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>>  #define PF_FROZEN 0x0001  /* frozen for system suspend */
>>  #define PF_FSTRANS   0x0002  /* inside a filesystem transaction */
>>  #define PF_KSWAPD 0x0004  /* I am kswapd */
>> +#define PF_MEMALLOC_NOIO 0x0008  /* Allocating memory without IO involved */
>>  #define PF_LESS_THROTTLE 0x0010  /* Throttle me less: I clean memory */
>>  #define PF_KTHREAD   0x0020  /* I am a kernel thread */
>>  #define PF_RANDOMIZE 0x0040  /* randomize virtual address space */
>> @@ -1842,6 +1843,15 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
>>  #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
>>  #define used_math() tsk_used_math(current)
>>
>> +#define memalloc_noio() (current->flags & PF_MEMALLOC_NOIO)
>> +#define memalloc_noio_save(flag) do { \
>> + (flag) = current->flags & PF_MEMALLOC_NOIO; \
>> + current->flags |= PF_MEMALLOC_NOIO; \
>> +} while (0)
>> +#define memalloc_noio_restore(flag) do { \
>> + current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flag; \
>> +} while (0)
>> +
>
> Again with the ghastly macros.  Please, do this properly in regular old
> C, as previously discussed.  It really doesn't matter what daft things
> local_irq_save() did 20 years ago.  Just do it right!

OK, I will use inline functions in -v5.

>
> Also, you can probably put the unlikely() inside memalloc_noio() and
> avoid repeating it at all the callsites.
>
> And it might be neater to do:
>
> /*
>  * Nice comment goes here
>  */
> static inline gfp_t memalloc_noio_flags(gfp_t flags)
> {
> if (unlikely(current->flags & PF_MEMALLOC_NOIO))
> flags &= ~GFP_IOFS;
> return flags;
> }

But without the check at the callsites, some local variables will be written
twice, so it is better not to do it.

>
>>   * task->jobctl flags
>>   */
>>
>> ...
>>
>> @@ -2304,6 +2304,12 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>>   .gfp_mask = sc.gfp_mask,
>>   };
>>
>> + if (unlikely(memalloc_noio())) {
>> + gfp_mask &= ~GFP_IOFS;
>> + sc.gfp_mask = gfp_mask;
>> + shrink.gfp_mask = sc.gfp_mask;
>> + }
>
> We can avoid writing to shrink.gfp_mask twice.  And maybe sc.gfp_mask
> as well.  Unclear, I didn't think about it too hard ;)

Yes, we can do it by initializing the 'shrink' local variable just after the branch,
so one write is enough. Will do it in -v5.
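
For reference, the inline-function form discussed above might look roughly
like the userspace sketch below. It is an assumption about shape, not the
-v5 code: task_flags stands in for current->flags, the bit values and the
SKETCH_GFP_* names are made up for illustration, and having
memalloc_noio_save() return the old state (rather than take an lvalue) is
just one possible design.

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define PF_MEMALLOC_NOIO 0x1u
#define SKETCH_GFP_IO    0x1u
#define SKETCH_GFP_FS    0x2u
#define SKETCH_GFP_IOFS  (SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Hypothetical stand-in for current->flags. */
static unsigned int task_flags;

/* Enter a no-I/O section and return the previous state of the flag. */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the section, putting the flag back to what save() returned. */
static inline void memalloc_noio_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOIO) == 0);
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Strip the I/O and FS bits from an allocation mask inside such a section. */
static inline unsigned int memalloc_noio_flags(unsigned int gfp)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~SKETCH_GFP_IOFS;
	return gfp;
}
```

Because save() returns only the old bit, sections nest: an inner
save/restore pair leaves the flag set until the outermost restore runs.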

Thanks,
--
Ming Lei

[PATCH v11 0/7] make balloon pages movable by compaction

2012-11-06 Thread Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at the 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

The following numbers show how this patch-set lets compaction be more
effective on memory ballooned guests.

Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
running on a 4GB RAM KVM guest which was ballooning 512MB RAM in 64MB chunks,
at every minute (inflating/deflating), while the test was running:

===BEGIN stress-highalloc

STRESS-HIGHALLOC
                   highalloc-3.7     highalloc-3.7
                       rc4-clean         rc4-patch
Pass 1            55.00 ( 0.00%)    62.00 ( 7.00%)
Pass 2            54.00 ( 0.00%)    62.00 ( 8.00%)
while Rested      75.00 ( 0.00%)    80.00 ( 5.00%)

MMTests Statistics: duration
                                3.7         3.7
                          rc4-clean   rc4-patch
User                        1207.59     1207.46
System                      1300.55     1299.61
Elapsed                     2273.72     2157.06

MMTests Statistics: vmstat
                                3.7         3.7
                          rc4-clean   rc4-patch
Page Ins                    3581516     2374368
Page Outs                  11148692    10410332
Swap Ins                         80          47
Swap Outs                      3641         476
Direct pages scanned          37978       33826
Kswapd pages scanned        1828245     1342869
Kswapd pages reclaimed      1710236     1304099
Direct pages reclaimed        32207       31005
Kswapd efficiency               93%         97%
Kswapd velocity             804.077     622.546
Direct efficiency               84%         91%
Direct velocity              16.703      15.682
Percentage direct scans          2%          2%
Page writes by reclaim        79252        9704
Page writes file              75611        9228
Page writes anon               3641         476
Page reclaim immediate        16764       11014
Page rescued immediate            0           0
Slabs scanned               2171904     2152448
Direct inode steals             385        2261
Kswapd inode steals          659137      609670
Kswapd skipped wait               1          69
THP fault alloc                 546         631
THP collapse alloc              361         339
THP splits                      259         263
THP fault fallback               98          50
THP collapse fail                20          17
Compaction stalls               747         499
Compaction success              244         145
Compaction failures             503         354
Compaction pages moved       370888      474837
Compaction move failure       77378       65259

===END stress-highalloc
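
As a reading aid for the vmstat figures: the "efficiency" and "velocity"
rows appear to be derived from the raw counters. The sketch below assumes
efficiency is pages reclaimed over pages scanned, truncated to a whole
percent, and velocity is pages scanned per second of elapsed time. That
reading reproduces the reported kswapd and direct-reclaim values, but it is
an inference from the numbers, not taken from the mmtests sources, and the
function names are made up.

```c
#include <assert.h>

/* Reclaimed pages as a percentage of scanned pages, truncated to an integer. */
static unsigned long long efficiency_pct(unsigned long long reclaimed,
					 unsigned long long scanned)
{
	if (scanned == 0)
		return 0;	/* arbitrary choice for this sketch */
	return reclaimed * 100 / scanned;
}

/* Pages scanned per second of elapsed wall-clock time. */
static double scan_velocity(unsigned long long scanned, double elapsed_sec)
{
	assert(elapsed_sec > 0.0);
	return (double)scanned / elapsed_sec;
}
```

For example, efficiency_pct(1710236, 1828245) corresponds to the rc4-clean
"Kswapd efficiency" row, and scan_velocity(1828245, 2273.72) to the
rc4-clean "Kswapd velocity" row.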

Rafael Aquini (7):
  mm: adjust address_space_operations.migratepage() return code
  mm: redefine address_space.assoc_mapping
  mm: introduce a common interface for balloon pages mobility
  mm: introduce compaction and migration for ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c| 136 +--
 fs/buffer.c|  12 +-
 fs/gfs2/glock.c|   2 +-
 fs/hugetlbfs/inode.c   |   4 +-
 fs/inode.c |   2 +-
 fs/nilfs2/page.c   |   2 +-
 include/linux/balloon_compaction.h | 220 ++
 include/linux/fs.h |   2 +-
 include/linux/migrate.h|  19 +++
 include/linux/pagemap.h|  16 +++
 include/linux/vm_event_item.h  |   8 +-
 mm/Kconfig |  15 ++
 mm/Makefile|   1 +
 mm/balloon_compaction.c| 271 +
 mm/compaction.c|  27 +++-
 mm/migrate.c   |  77 +--
 mm/page_alloc.c|   2 +-
 mm/vmstat.c|  10 +-
 18 files changed, 782 insertions(+), 44 deletions(-)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

Change log:
v11:
 * Address AKPM's last review suggestions;
 * Extend the balloon compaction common API and simplify its usage at driver;
 * Minor nitpicks on code commentary;
v10:
 * Adjust leak_balloon() wait_event logic to make a clear locking scheme (MST);
 * Drop the RCU

[PATCH v11 2/7] mm: redefine address_space.assoc_mapping

2012-11-06 Thread Rafael Aquini

This patch overhauls struct address_space.assoc_mapping, renaming it to
address_space.private_data and redefining its type as void*.
This way the .private_* elements of struct address_space are named
consistently, and address_space can be associated with other data
structures through ->private_data.

Also, all users of the old ->assoc_mapping element are converted to reflect
its new name and type.

Signed-off-by: Rafael Aquini 
---
 fs/buffer.c| 12 ++--
 fs/gfs2/glock.c|  2 +-
 fs/inode.c |  2 +-
 fs/nilfs2/page.c   |  2 +-
 include/linux/fs.h |  2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b5f0442..e0bad95 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -555,7 +555,7 @@ void emergency_thaw_all(void)
  */
 int sync_mapping_buffers(struct address_space *mapping)
 {
-   struct address_space *buffer_mapping = mapping->assoc_mapping;
+   struct address_space *buffer_mapping = mapping->private_data;
 
if (buffer_mapping == NULL || list_empty(&mapping->private_list))
return 0;
@@ -588,10 +588,10 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
struct address_space *buffer_mapping = bh->b_page->mapping;
 
mark_buffer_dirty(bh);
-   if (!mapping->assoc_mapping) {
-   mapping->assoc_mapping = buffer_mapping;
+   if (!mapping->private_data) {
+   mapping->private_data = buffer_mapping;
} else {
-   BUG_ON(mapping->assoc_mapping != buffer_mapping);
+   BUG_ON(mapping->private_data != buffer_mapping);
}
if (!bh->b_assoc_map) {
spin_lock(&buffer_mapping->private_lock);
@@ -788,7 +788,7 @@ void invalidate_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->private_list;
-   struct address_space *buffer_mapping = mapping->assoc_mapping;
+   struct address_space *buffer_mapping = mapping->private_data;
 
spin_lock(&buffer_mapping->private_lock);
while (!list_empty(list))
@@ -811,7 +811,7 @@ int remove_inode_buffers(struct inode *inode)
if (inode_has_buffers(inode)) {
struct address_space *mapping = &inode->i_data;
struct list_head *list = &mapping->private_list;
-   struct address_space *buffer_mapping = mapping->assoc_mapping;
+   struct address_space *buffer_mapping = mapping->private_data;
 
spin_lock(&buffer_mapping->private_lock);
while (!list_empty(list)) {
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e6c2fd5..0f22d09 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -768,7 +768,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
mapping->host = s->s_bdev->bd_inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
-   mapping->assoc_mapping = NULL;
+   mapping->private_data = NULL;
mapping->backing_dev_info = s->s_bdi;
mapping->writeback_index = 0;
}
diff --git a/fs/inode.c b/fs/inode.c
index b03c719..4cac8e1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -165,7 +165,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
mapping->host = inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
-   mapping->assoc_mapping = NULL;
+   mapping->private_data = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
mapping->writeback_index = 0;
 
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 3e7b2a0..07f76db 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -431,7 +431,7 @@ void nilfs_mapping_init(struct address_space *mapping, struct inode *inode,
mapping->host = inode;
mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_NOFS);
-   mapping->assoc_mapping = NULL;
+   mapping->private_data = NULL;
mapping->backing_dev_info = bdi;
mapping->a_ops = _aops;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..0982565 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -418,7 +418,7 @@ struct address_space {
struct backing_dev_info *backing_dev_info; /* device readahead, etc */
spinlock_t  private_lock;   /* for use by the address_space */
struct list_head private_list;   /* ditto */
-   struct address_space *assoc_mapping; /* ditto */
+   void *private_data;  /* ditto */
 } __attribute__((aligned(sizeof(long))));
/*
 * On most architectures that alignment is already the case; but
-- 
1.7.11.7


[PATCH v11 6/7] mm: introduce putback_movable_pages()

2012-11-06 Thread Rafael Aquini

The patch "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted on the balloon page list as if a ballooned page were an LRU page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains ballooned pages and LRU pages, thus fixing the mentioned
inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini 
---
 include/linux/migrate.h |  2 ++
 mm/compaction.c |  6 +++---
 mm/migrate.c| 20 
 mm/page_alloc.c |  2 +-
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index e570c3c..ff074a4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -27,6 +27,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -50,6 +51,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index 76abd84..f268bd8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -995,7 +995,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
@@ -1018,9 +1018,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);
 
-   /* Release LRU pages not migrated */
+   /* Release isolated pages not migrated */
if (err) {
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
if (err == -ENOMEM) {
ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 87ffe54..adb3d44 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -80,6 +80,26 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+   putback_lru_page(page);
+   }
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+   struct page *page;
+   struct page *page2;
+
+   list_for_each_entry_safe(page, page2, l, lru) {
+   list_del(&page->lru);
+   dec_zone_page_state(page, NR_ISOLATED_ANON +
+   page_is_file_cache(page));
if (unlikely(balloon_page_movable(page)))
balloon_page_putback(page);
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..1cb0f93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5710,7 +5710,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
0, false, MIGRATE_SYNC);
}
 
-   putback_lru_pages(&cc->migratepages);
+   putback_movable_pages(&cc->migratepages);
return ret > 0 ? 0 : ret;
 }
 
-- 
1.7.11.7


[PATCH v11 3/7] mm: introduce a common interface for balloon pages mobility

2012-11-06 Thread Rafael Aquini

Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces a common interface to help a balloon driver
make its page set movable by compaction, thus allowing the system
to better leverage compaction efforts for memory defragmentation.

Signed-off-by: Rafael Aquini 
---
 include/linux/balloon_compaction.h | 220 ++
 include/linux/migrate.h|  10 ++
 include/linux/pagemap.h|  16 +++
 mm/Kconfig |  15 +++
 mm/Makefile|   1 +
 mm/balloon_compaction.c| 269 +
 6 files changed, 531 insertions(+)
 create mode 100644 include/linux/balloon_compaction.h
 create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
new file mode 100644
index 000..1865bd5
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,220 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc.  Rafael Aquini 
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Balloon device information descriptor.
+ * This struct is used to allow the common balloon compaction interface
+ * procedures to find the proper balloon device holding memory pages they'll
+ * have to cope for page compaction / migration, as well as it serves the
+ * balloon driver as a page book-keeper for its registered balloon devices.
+ */
+struct balloon_dev_info {
+   void *balloon_device;   /* balloon device descriptor */
+   struct address_space *mapping;  /* balloon special page->mapping */
+   unsigned long isolated_pages;   /* # of isolated pages for migration */
+   spinlock_t pages_lock;  /* Protection to pages list */
+   struct list_head pages; /* Pages enqueued & handled to Host */
+};
+
+extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
+extern struct balloon_dev_info *balloon_devinfo_alloc(
+   void *balloon_dev_descriptor);
+
+static inline void balloon_devinfo_free(struct balloon_dev_info *b_dev_info)
+{
+   kfree(b_dev_info);
+}
+
+#ifdef CONFIG_BALLOON_COMPACTION
+extern bool balloon_page_isolate(struct page *page);
+extern void balloon_page_putback(struct page *page);
+extern int balloon_page_migrate(struct page *newpage,
+   struct page *page, enum migrate_mode mode);
+extern struct address_space
+*balloon_mapping_alloc(struct balloon_dev_info *b_dev_info,
+   const struct address_space_operations *a_ops);
+
+static inline void balloon_mapping_free(struct address_space *balloon_mapping)
+{
+   kfree(balloon_mapping);
+}
+
+/*
+ * balloon_page_insert - insert a page into the balloon's page list and make
+ *  the page->mapping assignment accordingly.
+ * @page: page to be assigned as a 'balloon page'
+ * @mapping : allocated special 'balloon_mapping'
+ * @head: balloon's device page list head
+ */
+static inline void balloon_page_insert(struct page *page,
+  struct address_space *mapping,
+  struct list_head *head)
+{
+   list_add(&page->lru, head);
+   /*
+* Make sure the page is already inserted on balloon's page list
+* before assigning its ->mapping.
+*/
+   smp_wmb();
+   page->mapping = mapping;
+}
+
+/*
+ * balloon_page_delete - clear the page->mapping and delete the page from
+ *  balloon's page list accordingly.
+ * @page: page to be released from balloon's page list
+ */
+static inline void balloon_page_delete(struct page *page)
+{
+   page->mapping = NULL;
+   /*
+* Make sure page->mapping is cleared before we proceed with
+* balloon's page list deletion.
+*/
+   smp_wmb();
+   list_del(&page->lru);
+}
+
+/*
+ * __is_movable_balloon_page - helper to perform @page mapping->flags tests
+ */
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+   /*
+* we might attempt to read ->mapping concurrently to other
+* threads trying to write to it.
+*/
+   struct address_space *mapping = ACCESS_ONCE(page->mapping);
+   smp_read_barrier_depends();
+   return mapping_balloon(mapping);
+}
+
+/*
+ * balloon_page_movable - test page->mapping->flags to identify balloon pages
+ *

[PATCH v11 7/7] mm: add vm event counters for balloon pages compaction

2012-11-06 Thread Rafael Aquini

This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c |  1 +
 include/linux/vm_event_item.h   |  8 +++-
 mm/balloon_compaction.c |  2 ++
 mm/migrate.c|  1 +
 mm/vmstat.c | 10 +-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 69eede7..3756fc1 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -411,6 +411,7 @@ int virtballoon_migratepage(struct address_space *mapping,
tell_host(vb, vb->deflate_vq);
 
mutex_unlock(&vb->balloon_lock);
+   balloon_event_count(COMPACTBALLOONMIGRATED);
 
return MIGRATEPAGE_BALLOON_SUCCESS;
 }
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3d31145..cbd72fc 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#ifdef CONFIG_BALLOON_COMPACTION
+   COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+   COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+   COMPACTBALLOONRELEASED, /* old-page released after migration */
+   COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+#endif /* CONFIG_BALLOON_COMPACTION */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 90935aa..32927eb 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -215,6 +215,7 @@ bool balloon_page_isolate(struct page *page)
if (__is_movable_balloon_page(page) &&
page_count(page) == 2) {
__isolate_balloon_page(page);
+   balloon_event_count(COMPACTBALLOONISOLATED);
unlock_page(page);
return true;
}
@@ -237,6 +238,7 @@ void balloon_page_putback(struct page *page)
if (__is_movable_balloon_page(page)) {
__putback_balloon_page(page);
put_page(page);
+   balloon_event_count(COMPACTBALLOONRETURNED);
} else {
__WARN();
dump_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index adb3d44..ee3037d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -896,6 +896,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+   balloon_event_count(COMPACTBALLOONRELEASED);
return 0;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1363edc 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -781,7 +781,15 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
-#endif
+
+#ifdef CONFIG_BALLOON_COMPACTION
+   "compact_balloon_isolated",
+   "compact_balloon_migrated",
+   "compact_balloon_released",
+   "compact_balloon_returned",
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
-- 
1.7.11.7
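
The patch touches both the event enum and the vmstat_text[] name table
because the two are index-aligned: each enumerator needs exactly one name in
the same position. A minimal userspace sketch of that pairing is shown
below. The shortened event list, event_text[], event_counts[], count_event()
and event_index() are hypothetical and only illustrate the idea; they are
not the kernel's per-CPU counter implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened event list; not the kernel's enum vm_event_item. */
enum event_item {
	COMPACTSTALL,
	COMPACTFAIL,
	COMPACTSUCCESS,
	COMPACTBALLOONISOLATED,
	COMPACTBALLOONMIGRATED,
	COMPACTBALLOONRELEASED,
	COMPACTBALLOONRETURNED,
	NR_EVENT_ITEMS
};

/* One exported name per enumerator, in the same order as the enum. */
static const char * const event_text[] = {
	"compact_stall",
	"compact_fail",
	"compact_success",
	"compact_balloon_isolated",
	"compact_balloon_migrated",
	"compact_balloon_released",
	"compact_balloon_returned",
};

/* Catch at build time an enumerator added without a matching name. */
_Static_assert(sizeof(event_text) / sizeof(event_text[0]) == NR_EVENT_ITEMS,
	       "event_text[] and enum event_item are out of sync");

static unsigned long event_counts[NR_EVENT_ITEMS];

/* Bump the counter for one event. */
static void count_event(enum event_item item)
{
	assert((unsigned int)item < NR_EVENT_ITEMS);
	event_counts[item]++;
}

/* Look up an event by its exported name; returns -1 if unknown. */
static int event_index(const char *name)
{
	for (int i = 0; i < NR_EVENT_ITEMS; i++)
		if (strcmp(event_text[i], name) == 0)
			return i;
	return -1;
}
```

The static assertion only checks the counts; keeping the order identical is
still up to whoever edits the two lists.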

