date:20130701

Re: [PATCH V2 12/15] perf tools: allow non-matching sample types

2013-07-01 Thread Adrian Hunter

On 01/07/13 22:10, Stephane Eranian wrote:
> On Mon, Jul 1, 2013 at 8:53 PM, David Ahern  wrote:
>>
>> On 7/1/13 3:32 AM, Adrian Hunter wrote:
>>>
>>> Snip
>>>

>>>> While this works for a combined S/W and tracepoint events session, I do not
>>>> like promoting sample types to the minimum compatible level for all events
>>>> in the session. perf needs to allow each event to have its own sample_type
>>>> and not force a minimal compatibility.
>>>
>>>
>>> Why?  The impact is small. The kernel API is completely unchanged.
>>
>>
>> I'd like to see libperf become a stable, usable library - usable by more 
>> than the perf binary and its builtin commands. I have already done this once 
>> for a daemon, and it was a PITA to get the specific use functional without 
>> memory leaks/growth in the libperf part.
>>
>> With respect to this specific patch it means appropriate flexibility in the 
>> data collected for events. ie., each event can have its own sample_type. For 
>> example if the tracepoint already contains task information TID is not 
>> needed - and IP may not be wanted either. The code processing the samples 
>> should not require all events to have some minimum data format - that just 
>> wastes buffer space.
>>
> I agree. The kernel needs to allow for any bit combination on
> sample_type and yet provide enough info
> to parse the buffer in the case of multi-event sampling. This is
> a kernel bug. Tools should not have to handle
> this, because it would have to be repeated for each tool.
> 
> Later this week, I'll post a patch that addresses the kernel limitation.

But isn't it trivial?  Just add a new sample type that puts the ID first.
Anyone using the new PERF_SAMPLE_IDENTIFIER gets the new ABI.



diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0b1df41..6bb217e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -134,8 +134,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
 	PERF_SAMPLE_WEIGHT			= 1U << 14,
 	PERF_SAMPLE_DATA_SRC			= 1U << 15,
+	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 
-	PERF_SAMPLE_MAX = 1U << 16,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 17,		/* non-ABI */
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1db3af9..a3707af 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1203,6 +1203,9 @@ static void perf_event__id_header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_TIME)
 		size += sizeof(data->time);
 
+	if (sample_type & PERF_SAMPLE_IDENTIFIER)
+		size += sizeof(data->id);
+
 	if (sample_type & PERF_SAMPLE_ID)
 		size += sizeof(data->id);
 
@@ -4229,7 +4232,7 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_TIME)
 		data->time = perf_clock();
 
-	if (sample_type & PERF_SAMPLE_ID)
+	if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER))
 		data->id = primary_event_id(event);
 
 	if (sample_type & PERF_SAMPLE_STREAM_ID)
@@ -4268,6 +4271,9 @@ static void __perf_event__output_id_sample(struct perf_output_handle *handle,
 
 	if (sample_type & PERF_SAMPLE_CPU)
 		perf_output_put(handle, data->cpu_entry);
+
+	if (sample_type & PERF_SAMPLE_IDENTIFIER)
+		perf_output_put(handle, data->id);
 }
 
 void perf_event__output_id_sample(struct perf_event *event,
@@ -4380,6 +4386,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 
 	perf_output_put(handle, *header);
 
+	if (sample_type & PERF_SAMPLE_IDENTIFIER)
+		perf_output_put(handle, data->id);
+
 	if (sample_type & PERF_SAMPLE_IP)
 		perf_output_put(handle, data->ip);
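
To illustrate why putting the ID first helps a tool, here is a minimal user-space sketch, assuming the sample layout implied by the diff above. The bit values for the existing flags follow include/uapi/linux/perf_event.h and PERF_SAMPLE_IDENTIFIER is the bit proposed in the diff; sample_id_offset() is a hypothetical helper name, not part of perf. Without the new bit, the position of the ID depends on which earlier fields are enabled; with it, the position is fixed.

```c
#include <stdint.h>

/* Existing sample_type bits, plus the bit proposed in the diff above. */
#define PERF_SAMPLE_IP          (1U << 0)
#define PERF_SAMPLE_TID         (1U << 1)
#define PERF_SAMPLE_TIME        (1U << 2)
#define PERF_SAMPLE_ADDR        (1U << 3)
#define PERF_SAMPLE_ID          (1U << 6)
#define PERF_SAMPLE_IDENTIFIER  (1U << 16)	/* proposal, not yet ABI */

/*
 * Hypothetical helper: byte offset of the event ID inside the body of a
 * sample record (just past struct perf_event_header), or -1 if the
 * sample carries no ID at all.
 */
static long sample_id_offset(uint64_t sample_type)
{
	long off = 0;

	/* With the proposed bit the ID is always the first field. */
	if (sample_type & PERF_SAMPLE_IDENTIFIER)
		return 0;

	if (!(sample_type & PERF_SAMPLE_ID))
		return -1;

	/* Otherwise every enabled field that precedes the ID shifts it. */
	if (sample_type & PERF_SAMPLE_IP)
		off += 8;	/* u64 ip */
	if (sample_type & PERF_SAMPLE_TID)
		off += 8;	/* u32 pid, tid */
	if (sample_type & PERF_SAMPLE_TIME)
		off += 8;	/* u64 time */
	if (sample_type & PERF_SAMPLE_ADDR)
		off += 8;	/* u64 addr */

	return off;
}
```

A parser that has to demultiplex events with different sample_type values needs the ID before it knows which sample_type applies, which is only possible when the offset does not depend on sample_type.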
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v4] tracing/uprobes: Support ftrace_event_file base multibuffer

2013-07-01 Thread zhangwei(Jovi)

On 2013/7/2 4:27, Oleg Nesterov wrote:
> On 06/29, zhangwei(Jovi) wrote:
>>
>> [v3->v4]:
> 
> I am wondering how much you will hate me if I suggest to make v5 ;)
> 
Feel free to do that :)

> But look, imho probe_event_enable() looks a bit more confusing than
> it needs to be.
> 
>> -probe_event_enable(struct trace_uprobe *tu, int flag, filter_func_t filter)
>> +probe_event_enable(struct trace_uprobe *tu, struct ftrace_event_file *file,
>> +		   filter_func_t filter)
>>  {
>> +	bool enabled = is_trace_uprobe_enabled(tu);
>> +	struct event_file_link *link;
>>  	int ret = 0;
> 
> Unnecessary initialization.
> 
>> -	if (is_trace_uprobe_enabled(tu))
>> -		return -EINTR;
>> +	if (file) {
>> +		if (tu->flags & TP_FLAG_PROFILE)
>> +			return -EINTR;
>> +
>> +		link = kmalloc(sizeof(*link), GFP_KERNEL);
>> +		if (!link)
>> +			return -ENOMEM;
>> +
>> +		link->file = file;
>> +		list_add_tail_rcu(&link->list, &tu->files);
>> +
>> +		tu->flags |= TP_FLAG_TRACE;
>> +	} else {
>> +		if (tu->flags & TP_FLAG_TRACE)
>> +			return -EINTR;
>> +
>> +		tu->flags |= TP_FLAG_PROFILE;
>> +	}
>>
>>  	WARN_ON(!uprobe_filter_is_empty(&tu->filter));
>>
>> -	tu->flags |= flag;
>> -	tu->consumer.filter = filter;
>> -	ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>> -	if (ret)
>> -		tu->flags &= ~flag;
>> +	/* we cannot call uprobe_register twice for same tu */
> 
> The comment is confusing, I'd suggest to simply remove it.
> 
> Yes, we can't do uprobe_register() twice as we already discussed.
> But it is not that we "can't", we simply do not need this if uprobe
> was already created.
> 
>> +	if (!enabled) {
>> +		tu->consumer.filter = filter;
>> +		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
>> +	}
>> +
>> +	if (ret) {
>> +		if (file) {
>> +			list_del_rcu(&link->list);
> 
> I won't insist, but _rcu is not needed in this case. Again, this looks
> a bit confusing, as if we expect that some rcu reader can ever see this
> entry. But this is not true and we are going to just kfree it without
> synchronize_rcu().
> 
Yes, _rcu is not needed in there.

>> +			kfree(link);
>> +			tu->flags &= ~TP_FLAG_TRACE;
>> +		} else
>> +			tu->flags &= ~TP_FLAG_PROFILE;
>> +	}
> 
> This is correct, but again, this is not immediately obvious.
> 
> Why is it correct to clear TP_FLAG_TRACE? Because we know
> that "enabled" was false and thus we remove the single list entry.
> 
> So, perhaps,
> 
> 	if (enabled)
> 		return 0;
> 
> 	ret = uprobe_register();
> 	if (ret) {
> 		...;
> 	}
> 
> 	return ret;
> 
> will be a bit more clean.
> 
I will change it in v5 patch.
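
For reference, the early-return shape suggested above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct probe, fake_register() and probe_enable() are hypothetical stand-ins for struct trace_uprobe, uprobe_register() and probe_event_enable(), and a counter replaces the tu->files list. It is not the v5 patch.

```c
#include <errno.h>
#include <stdbool.h>

#define FLAG_TRACE	0x1
#define FLAG_PROFILE	0x2

/* Hypothetical stand-in for struct trace_uprobe. */
struct probe {
	unsigned int flags;
	int nr_links;		/* stands in for the tu->files list */
	int register_result;	/* value the fake registration returns */
	int register_calls;	/* how often registration was attempted */
};

/* Hypothetical stand-in for uprobe_register(). */
static int fake_register(struct probe *p)
{
	p->register_calls++;
	return p->register_result;
}

static int probe_enable(struct probe *p, bool trace)
{
	bool enabled = (p->flags & (FLAG_TRACE | FLAG_PROFILE)) != 0;
	int ret;

	if (trace) {
		if (p->flags & FLAG_PROFILE)
			return -EINTR;
		p->nr_links++;
		p->flags |= FLAG_TRACE;
	} else {
		if (p->flags & FLAG_TRACE)
			return -EINTR;
		p->flags |= FLAG_PROFILE;
	}

	/* Already registered earlier: nothing more to do, nothing to undo. */
	if (enabled)
		return 0;

	ret = fake_register(p);
	if (ret) {
		/*
		 * Only reached when this call set the flag, so the link
		 * added above is the only one and the flag can be cleared.
		 */
		if (trace) {
			p->nr_links--;
			p->flags &= ~FLAG_TRACE;
		} else {
			p->flags &= ~FLAG_PROFILE;
		}
	}

	return ret;
}
```

With the early return, the error path is reachable only on the first enable, which makes it easier to see why clearing the flag there is safe.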

> Oleg.



Re: [ 0/8] 3.0.85-stable review

2013-07-01 Thread Guenter Roeck

On Mon, Jul 01, 2013 at 01:10:32PM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.0.85 release.
> There are 8 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Jul  3 19:59:07 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.85-rc1.gz
> and the diffstat can be found below.
> 

Build results are as follows. Same results as with 3.0.84.

Guenter

---
Build x86_64:defconfig passed
Build x86_64:allyesconfig passed
Build x86_64:allmodconfig passed
Build x86_64:allnoconfig passed
Build x86_64:alldefconfig passed
Build i386:defconfig passed
Build i386:allyesconfig passed
Build i386:allmodconfig passed
Build i386:allnoconfig passed
Build i386:alldefconfig passed
Build mips:defconfig passed
Build mips:bcm47xx_defconfig passed
Build mips:bcm63xx_defconfig passed
Build mips:ar7_defconfig passed
Build mips:fuloong2e_defconfig passed
Build mips:e55_defconfig passed
Build mips:powertv_defconfig passed
Build mips:malta_defconfig passed
Build powerpc:defconfig failed
Build powerpc:allyesconfig failed
Build powerpc:allmodconfig failed
Build powerpc:maple_defconfig failed
Build powerpc:ppc6xx_defconfig passed
Build powerpc:mpc83xx_defconfig passed
Build powerpc:mpc85xx_defconfig passed
Build powerpc:mpc85xx_smp_defconfig passed
Build powerpc:tqm8xx_defconfig passed
Build powerpc:85xx/sbc8548_defconfig passed
Build powerpc:83xx/mpc834x_mds_defconfig passed
Build powerpc:86xx/sbc8641d_defconfig passed
Build arm:defconfig passed
Build arm:allyesconfig failed
Build arm:allmodconfig failed
Build arm:exynos4_defconfig passed
Build arm:kirkwood_defconfig passed
Build arm:omap2plus_defconfig passed
Build arm:tegra_defconfig passed
Build arm:u8500_defconfig failed
Build arm:ap4evb_defconfig passed
Build arm:pxa910_defconfig passed
Build m68k:defconfig passed
Build m68k:m5272c3_defconfig failed
Build m68k:m5307c3_defconfig failed
Build m68k:m5249evb_defconfig failed
Build m68k:m5407c3_defconfig failed
Build m68k:sun3_defconfig passed
Build sparc:defconfig passed
Build sparc:sparc64_defconfig passed
Build xtensa:defconfig failed
Build xtensa:iss_defconfig failed
Build microblaze:mmu_defconfig failed
Build microblaze:nommu_defconfig failed
Build blackfin:defconfig failed
Build parisc:defconfig failed

---
Total builds: 54 Total build errors: 17

Re: [PATCH] spi: s3c64xx: add missing check for polling mode

2013-07-01 Thread Girish KS

On Thu, Jun 27, 2013 at 4:45 PM, Mark Brown  wrote:
> On Thu, Jun 27, 2013 at 12:26:53PM +0530, Girish K S wrote:
>> After the patch "spi/s3c64xx: Fix non-dmaengine usage"
>> with commit id 563b444e33810f3120838620c990480304e24e63
>> submitted by Mark Brown, the spi device detection in polling
>> mode breaks. This revealed the missing check for polling during
>> dma prepare. This patch adds the missing check.
>
> Applied with a fixed commit message - since the dmaengine stuff was
> already in mainline at the time that polling mode was added the issue was
> that the patch hadn't been tested with current mainline code.

Hello Mark, this patch is missing from your pull request for 3.11. Is it
possible to add it?

Re: [PATCH RFC 2/3] dt:net:stmmac: Add support to dwmac version 3.610 and 3.710

2013-07-01 Thread Srinivas KANDAGATLA

Thanks Peppe for the comments,
On 01/07/13 18:20, Giuseppe CAVALLARO wrote:
> On 7/1/2013 1:43 PM, Srinivas KANDAGATLA wrote:
>> From: Srinivas Kandagatla 
>>
>> +
>> +plat->bus_id = of_alias_get_id(np, "ethernet");
>> +if (plat->bus_id < 0)
>> +plat->bus_id = 0;
>> +
>> +of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr);
>> +
>>   plat->mdio_bus_data = devm_kzalloc(&pdev->dev,
>>  sizeof(struct stmmac_mdio_bus_data),
>>  GFP_KERNEL);
>> @@ -51,11 +60,25 @@ static int stmmac_probe_config_dt(struct
>> platform_device *pdev,
>>*/
>>   if (of_device_is_compatible(np, "st,spear600-gmac") ||
>>   of_device_is_compatible(np, "snps,dwmac-3.70a") ||
>> +of_device_is_compatible(np, "snps,dwmac-3.610") ||
I forgot to add "snps,dwmac-3.710" to this list, I will do it in V2 patch.

>>   of_device_is_compatible(np, "snps,dwmac")) {
>>   plat->has_gmac = 1;
>>   plat->pmt = 1;
>>   }
>>
>> +if (of_device_is_compatible(np, "snps,dwmac-3.610") ||
>> +of_device_is_compatible(np, "snps,dwmac-3.710")) {
>> +plat->enh_desc = 1;
>> +plat->bugged_jumbo = 1;
>> +plat->force_sf_dma_mode = 1;
>> +}
> 
> I think some of these shouldn't be forced here. Maybe plat->enh_desc could
> be set because it is for new syn mac cores.
> 
> Also pmt should not be forced because it is an extra module, so it could
> happen that a new chip has no PMT block.
I agree with you, but the new chips should/will have different version
numbers, so having the version number in the compatible string should
make it possible for cores without a PMT module to not set pmt or any
other properties.

Are you happy with setting pmt based on the compatible string, or do you
think passing pmt as another property in the device tree makes more sense?

Thanks,
srini
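
To make the compatible-string option concrete, here is a rough user-space sketch of a per-version feature table. It is only an assumption of how such a mapping could look: struct dwmac_features, dwmac_table and dwmac_lookup() are hypothetical names, and the flag values mirror the quoted patch plus the 3.710 entry promised for V2. A core without a PMT block would get its own row with pmt set to 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; field names follow the quoted patch. */
struct dwmac_features {
	const char *compatible;
	int has_gmac;
	int pmt;
	int enh_desc;
};

/* Values mirror the quoted patch; 3.710 is listed as planned for V2. */
static const struct dwmac_features dwmac_table[] = {
	{ "snps,dwmac-3.70a", 1, 1, 0 },
	{ "snps,dwmac-3.610", 1, 1, 1 },
	{ "snps,dwmac-3.710", 1, 1, 1 },
};

/* Return the feature row for a compatible string, or NULL if unknown. */
static const struct dwmac_features *dwmac_lookup(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(dwmac_table) / sizeof(dwmac_table[0]); i++)
		if (strcmp(dwmac_table[i].compatible, compatible) == 0)
			return &dwmac_table[i];

	return NULL;
}
```

The alternative raised above, a separate device tree property for pmt, would move that one column out of the table and into the board description.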


Re: Performance regression from switching lock to rw-sem for anon-vma tree

2013-07-01 Thread Ingo Molnar


* Tim Chen  wrote:

> On Sat, 2013-06-29 at 09:12 +0200, Ingo Molnar wrote:
> > * Tim Chen  wrote:
> > 
> > > > If my analysis is correct so far then it might be useful to add two 
> > > > more stats: did rwsem_spin_on_owner() fail because lock->owner == NULL 
> > > > [owner released the rwsem], or because owner_running() failed [owner 
> > > > went to sleep]?
> > > 
> > > Ingo,
> > > 
> > > I tabulated the cases where rwsem_spin_on_owner returns false and causes 
> > > us to stop spinning.
> > > 
> > > 97.12%  was due to lock's owner switching to another writer
> > >  0.01% was due to the owner of the lock sleeping
> > >  2.87%  was due to need_resched() 
> > > 
> > > > I made a change to allow us to continue to spin even when lock's owner 
> > > > switches to another writer.  I did get the lock to be acquired now mostly 
> > > (98%) via optimistic spin and lock stealing, but my benchmark's 
> > > throughput actually got reduced by 30% (too many cycles spent on useless 
> > > spinning?).
> > 
> > Hm, I'm running out of quick ideas :-/ The writer-ends-spinning sequence 
> > is pretty similar in the rwsem and in the mutex case. I'd have a look at 
> > one more detail: is the wakeup of another writer in the rwsem case 
> > singular, is only a single writer woken? I suspect the answer is yes ...
> 
> Ingo, we can only wake one writer, right? In __rwsem_do_wake, that is 
> indeed the case.  Or are you talking about something else?

Yeah, I was talking about that, and my understanding and reading of the 
code says that too - I just wanted to make sure :-)

> >
> > A quick glance suggests that the ordering of wakeups of waiters is the 
> > same for mutexes and rwsems: FIFO, single waiter woken on 
> > slowpath-unlock. So that shouldn't make a big difference.
> 
> > If all last-ditch efforts to analyze it via counters fail then the way 
> > I'd approach it next is brute-force instrumentation:
> > 
> >  - First I'd create a workload 'steady state' that can be traced and 
> >    examined without worrying that it ends or switches to some other 
> >    workload.
> > 
> >  - Then I'd create a relatively lightweight trace (maybe trace_printk() is
> >    lightweight enough), and capture key mutex and rwsem events.
> > 
> >  - I'd capture a 1-10 seconds trace in steady state, both with rwsems and 
> >    mutexes. I'd have a good look at which tasks take locks and schedule
> >    how and why. I'd try to eliminate any asymmetries in behavior, i.e. 
> >    make rwsems behave like mutexes.
> 
> You mean adding trace points to record the events?  If you can be more 
> specific on what data to capture, that will be helpful.  It will be 
> holidays here in US so I may get around to this the following week.

Yeah, adding the relevant tracepoints (or trace_printk(), which is much 
simpler albeit a bit more expensive) - and capturing a 1-second 
steady-state trace via ftrace.

Then I'd compare the two traces and look at the biggest difference, and 
try to zoom in on it to figure out why the difference occurs. More 
trace_printk()s can be added as you suspect specific areas or want to 
confirm various theories.

[ Assuming the phenomenon does not go away under tracing :-/ ]

[
  Another brute-force approach is to add a dynamic debug knob to switch 
  between a spinlock and an rwsem implementation for that lock: I'd do it 
  by adding both types of locks to the data structure, initializing both 
  but using a /proc/sys/kernel/ knob to decide whether to use spin_*() or
  rwsem facilities to utilize it. This way you could switch between the 
  two implementations without rebooting the system. In theory.
]

Thanks,

Ingo

Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Michael Wang

On 07/02/2013 02:29 PM, Mike Galbraith wrote:
> On Tue, 2013-07-02 at 14:17 +0800, Michael Wang wrote:
> 
>> As Peter mentioned before, we currently need some solution like the
>> buddy-idea, and when folks report regression (I suppose they won't...),
>> we will have more data then.
>>
>> So we could first try to regain the lost performance of pgbench; if it
>> strips the benefit of other benchmarks, let's fix it, and at last we will
>> have a real smart wake-affine and no one will complain ;-)
> 
> The idea is plenty simple (and the fastpath has a deep and abiding love
> of simple) so the idea itself flies in my book.  It doesn't add as much
> knowledge as may be nice to have, but if it adds enough to help pgbench
> and ilk without harming others, cool.

Nice to know you like it ;-)

There is some thinking behind the idea: since the knob is unacceptable,
I tried to make the filter stricter. We actually
could get all the lost 50% performance back, but that would run the risk of
stripping the benefit from others (like hackbench); if we just get 40%
performance back, then we may be able to reduce the risk to nearly 0.

So the principle of this idea is to filter out only the extremely bad cases,
those where we are sure the chance of messing things up is very
high, so that wake_affine() becomes a little smarter and knows to stop
in front of the cliff...

Regards,
Michael Wang


> 
> -Mike
> 

Re: [PATCH v4] cpufreq: stats: Add 'load_table' debugfs file to show accumulated data of CPUs

2013-07-01 Thread Chanwoo Choi

On 06/28/2013 07:13 PM, Viresh Kumar wrote:
> On 28 June 2013 14:52, Chanwoo Choi  wrote:
>> On 06/28/2013 05:18 PM, Viresh Kumar wrote:
>>> On 28 June 2013 13:18, Chanwoo Choi  wrote:
> 
>>> Can you describe a bit about the layout this will create in debugfs?
>>> I thought you will have a load_table file per policy->cpu ??
>>>
>>
>> The debugfs_cpufreq is debugfs root directory (/sys/kernel/debug/cpufreq)
> 
> Which you are creating anyway in your patch.
> 
>> and debugfs_cpufreq has many child directory for Per-CPU debugfs according 
>> to NR_CPUS number (/sys/kernel/debug/cpufreq/cpuX).
> 
> Even you are creating this only for policy->cpu
> 
>> Finally, Per-CPU debugfs create load_table debugfs file 
>> (/sys/kernel/debug/cpufreq/cpuX/load_table).
>>
>> For example, only CPU0 create sysfs directory and file 
>> (/sys/devices/system/cpu/cpu0/cpufreq)
>> and then other CPUx create link of created sysfs directory by CPU0 in 
>> cpufreq_add_dev_symlink().
> 
> This isn't how its happening now. You aren't creating any links.

You're right. This patch didn't create links for CPU1/2/3.

> 
>> So, I'm considering whether to create link of CPUx's debugfs file except for 
>> CPU0 as sysfs file.
>> - /sys/kernel/debug/cpufreq/cpu1/
>> - /sys/kernel/debug/cpufreq/cpu2/
>> - /sys/kernel/debug/cpufreq/cpu3/
> 
> Yes please.

OK. I'll create links for CPU1/2/3 if all CPUs are included in one cluster.

Let me explain the sequence for creating the sysfs files of CPU0/1/2/3.
There is a difference between them: only CPU0 creates the sysfs file,
and then CPU1/2/3 create a link to CPU0's sysfs file. If we want to create
debugfs links for CPU1/2/3, I would have to create the debugfs file for CPU0
and the debugfs links for CPU1/2/3 when cpufreq_register_driver() is called.
With this proposal, the debugfs file for cpufreq is not always removed when
the user changes the cpufreq governor from ondemand/conservative to
performance/powersave.

So, I suggest that the cpufreq core executes dbs_check_cpu() to calculate
the CPUx load when the cpufreq governor is performance/powersave. While the
performance/powersave governor maintains the same cpu frequency, power
consumption still differs according to the CPUx load. I think we need to check
the CPUs' load on the performance/powersave governor.

[Flow sequence for CPU0]
cpufreq_register_driver()
-> subsys_interface_register()
--> sif->add_dev()
---> cpufreq_add_dev()
----> cpufreq_add_policy_cpu()
-----> sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); : Create sysfs file (/sys/devices/system/cpu/cpu0/cpufreq)

[Flow sequence for CPU1/2/3]
cpufreq_register_driver()
-> subsys_interface_register()
--> sif->add_dev()
---> cpufreq_add_dev()
----> cpufreq_add_policy_cpu()
-----> cpufreq_add_dev_interface(cpu, ...)
------> cpufreq_add_dev_symlink(cpu, ...) : Create sysfs link to CPU0 sysfs file (/sys/devices/system/cpu/cpu0/cpufreq)

> 
>> - A number of online CPU is 4
>> Time(ms)   Old Freq(Hz) New Freq(Hz) CPU0 CPU1 CPU2 CPU3
>> 23165      20           20           2    0    0    0
>> 23370      20           20           2    0    0    0
>> 23575      20           20           2    0    1    0
>> 23640      20           20           5    1    1    0
>> 23780      20           20           3    0    1    0
>> 23830      20           20           7    1    0    0
>> 23985      20           20           1    0    0    0
>> 24190      20           20           2    0    1    1
>> 24385      20           20           2    0    0    0
>> 24485      20           20           6    0    1    0
>>
>> - A number of online CPU is 2
>> Time(ms)   Old Freq(Hz) New Freq(Hz) CPU0 CPU3
>> 37615      20           20           0    0
>> 37792      20           20           0    5
>> 38015      20           20           21   8
>> 38215      20           20           0    0
>> 38275      20           20           5    0
>> 38415      20           20           15   3
>> 38615      20           20           0    0
>> 38730      20           20           1    0
>> 38945      20           20           0    0
>> 39155      20           20           1    1
> 
> If you do the loop over for_each_cpu(cpu, policy->cpus),
> this problem will be resolved. You will see only online cpus.
> 
>> I'm considering whether to check the kind of cpufreq governor for creating
>> load_table in cpufreq_stats or execute dbs_check_cpu() on the
>> performance/powersave governor to check CPUx load. If you have an opinion
>> about this, I'd like to hear it.
> 
> Maybe create these directories and do this stuff only when
> the first CPUFREQ_LOADCHECK notification is received inside
> cpufreq_stats.c
> 
> Also don't create debug/cpufreq directory unless you have any
> stuff to be created within this directory. Like, don't create it
> if we are using performance governor for all cpus.
> 

If the core creates the debugfs/cpufreq directory when the first
CPUFREQ_LOADCHECK notification is received inside cpufreq_stats.c, CPU1/2/3
don't send a CPUFREQ_LOADCHECK notification. As a result, cpufreq_stats.c
couldn't create link for /sy

Re: [PATCH] x86: Use asm-goto to implement mutex fast path on x86-64

2013-07-01 Thread Ingo Molnar


* Borislav Petkov  wrote:

> On Mon, Jul 01, 2013 at 03:35:47PM -0700, Wedson Almeida Filho wrote:
> > On Mon, Jul 1, 2013 at 3:28 PM, Borislav Petkov  wrote:
> > >
> > > perf stat --repeat 10 -a --sync --pre 'make -s clean; echo 1 > 
> > > /proc/sys/vm/drop_caches' make -s -j64 bzImage
> > 
> > How many CPUs do you have in your system? Maybe -j64 vs -jNUM_CPUs
> > affects your measurements as well.
> 
> 8. But that shouldn't matter since I made the non-differing measurements 
> two mails back with -j64.
> 
> Also -j9, i.e. -j$(($NUM_CPUS+1)) gives "121.613217871 seconds time 
> elapsed" because with -j9 the probability of some core not executing a 
> make thread for whatever reason is higher than with -j64. But it is only 
> as high as an additional 1s with this workload.
> 
> I think with -j64 Ingo meant to saturate the scheduler to make sure 
> there always are runnable threads more than cores available so that we 
> can maximize the core utilization with threads running our workload.

Yeah - I didn't know your CPU count, -j64 is what I use.

Also, just in case it wasn't clear: thanks for the measurements - and I'd 
be in favor of merging this patch if it shows any improvement or if 
measurements lie within noise, because per asm review the change should be 
a win.

Thanks,

Ingo

Re: [GIT PULL] Bcache changes for 3.11

2013-07-01 Thread Jens Axboe

On Mon, Jul 01 2013, Kent Overstreet wrote:
> Hi Jens, here's the bcache changes for 3.11.
> 
> The following changes since commit 9e895ace5d82df8929b16f58e9f515f6d54ab82d:
> 
>   Linux 3.10-rc7 (2013-06-22 09:47:31 -1000)

Ugh, that is 6 rcs ahead of where my for-3.11/drivers branch is sitting.
I guess there's no real way to avoid fast forwarding a bit, since there
are bcache fixes in that range as well.

-- 
Jens Axboe


Re: frequent softlockups with 3.10rc6.

2013-07-01 Thread Dave Chinner

On Mon, Jul 01, 2013 at 02:00:37PM +0200, Jan Kara wrote:
> On Sat 29-06-13 13:39:24, Dave Chinner wrote:
> > On Fri, Jun 28, 2013 at 12:28:19PM +0200, Jan Kara wrote:
> > > On Fri 28-06-13 13:58:25, Dave Chinner wrote:
> > > > writeback: store inodes under writeback on a separate list
> > > > 
> > > > From: Dave Chinner 
> > > > 
> > > > When there are lots of cached inodes, a sync(2) operation walks all
> > > > of them to try to find which ones are under writeback and wait for
> > > > IO completion on them. Run enough load, and this caused catastrophic
> > > > lock contention on the inode_sb_list_lock.
.
> > >   Ugh, the locking looks ugly.
> > 
> > Yes, it is, and I don't really like it.
> > 
> > >   Plus the list handling is buggy because the
> > > first wait_sb_inodes() invocation will move all inodes to its private
> > > sync_list so if there's another wait_sb_inodes() invocation racing with 
> > > it,
> > > it won't wait properly for all the inodes it should.
> > 
> > Hmmmm - yeah, we only have implicit ordering of concurrent sync()
> > calls based on the serialisation of bdi-flusher work queuing and
> > dispatch. The waiting for IO completion is not serialised at all.
> > Seems like it's easy to fix with a per-sb sync mutex around the
> > dispatch and wait in sync_inodes_sb()

So I have a patchset that does this, then moves to per-sb inode list
locks, then does

> > > Won't it be easier to remove inodes from b_wb list (btw, I'd slightly
> > > prefer name b_writeback)
> > 
> > Yeah, b_writeback would be nicer. It's messy, though - the writeback
> > structure uses b_io/b_more_io for stuff that is queued for writeback
> > (not actually under IO), while the inode calls that the i_wb_list.
> > Now we add a writeback list to the writeback structure for inodes
> > under IO, and call the inode list i_io_list. I think this needs to
> > be cleaned up as well...
>   Good point. The naming is somewhat inconsistent and would use a cleanup.

... this, and then does
> 
> > > lazily instead of from
> > > test_clear_page_writeback()? I mean we would remove inodes from b_wb list
> > > only in wait_sb_inodes() or when inodes get reclaimed from memory. That 
> > > way
> > > we save some work in test_clear_page_writeback() which is a fast path and
> > > defer it to sync which isn't that performance critical.

... this.

> > 
> > We could, but we just end up in the same place with sync as we are
> > now - with a long list of clean inodes with a few inodes hidden in
> > it that are under IO. i.e. we still have to walk lots of clean
> > inodes to find the dirty ones that we need to wait on
>   If the syncs are rare then yes. If they are relatively frequent, you
> would win because the first sync will cleanup the list and subsequent ones
> will be fast.

I haven't done this yet, because I've found an interesting
performance problem with our sync implementation. Basically, sync(2)
on a filesystem that is being constantly dirtied blocks the flusher
thread waiting for IO completion like so:

# echo w > /proc/sysrq-trigger 
[ 1968.031001] SysRq : Show Blocked State
[ 1968.032748]   taskPC stack   pid father
[ 1968.034534] kworker/u19:2   D 8800bed13140  3448  4830  2 0x
[ 1968.034534] Workqueue: writeback bdi_writeback_workfn (flush-253:32)
[ 1968.034534]  8800bdca3998 0046 8800bd1cae20 
8800bdca3fd8
[ 1968.034534]  8800bdca3fd8 8800bdca3fd8 88003ea1 
8800bd1cae20
[ 1968.034534]  8800bdca3968 8800bd1cae20 8800bed139a0 
0002
[ 1968.034534] Call Trace:
[ 1968.034534]  [] schedule+0x29/0x70
[ 1968.034534]  [] io_schedule+0x8f/0xd0
[ 1968.034534]  [] sleep_on_page+0xe/0x20
[ 1968.034534]  [] __wait_on_bit+0x60/0x90
[ 1968.034534]  [] wait_on_page_bit+0x80/0x90
[ 1968.034534]  [] filemap_fdatawait_range+0x101/0x190
[ 1968.034534]  [] filemap_fdatawait+0x27/0x30
[ 1968.034534]  [] __writeback_single_inode+0x1b8/0x220
[ 1968.034534]  [] writeback_sb_inodes+0x27b/0x410
[ 1968.034534]  [] wb_writeback+0xf0/0x2c0
[ 1968.034534]  [] wb_do_writeback+0xb8/0x210
[ 1968.034534]  [] bdi_writeback_workfn+0x72/0x160
[ 1968.034534]  [] process_one_work+0x177/0x400
[ 1968.034534]  [] worker_thread+0x122/0x380
[ 1968.034534]  [] kthread+0xd8/0xe0
[ 1968.034534]  [] ret_from_fork+0x7c/0xb0

i.e. this code:

static int
__writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
{
	struct address_space *mapping = inode->i_mapping;
	long nr_to_write = wbc->nr_to_write;
	unsigned dirty;
	int ret;

	WARN_ON(!(inode->i_state & I_SYNC));

	trace_writeback_single_inode_start(inode, wbc, nr_to_write);

	ret = do_writepages(mapping, wbc);

	/*
	 * Make sure to wait on the data before writing out the metadata.
	 * This is important for filesystems that modify metadata on data
	 * I/O completion.
	 */
	if (wbc->sync_mode == WB_S

Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Mike Galbraith

On Tue, 2013-07-02 at 14:17 +0800, Michael Wang wrote:

> As Peter mentioned before, we currently need some solution like the
> buddy-idea, and when folks report regression (I suppose they won't...),
> we will have more data then.
> 
> So we could first try to regain the lost performance of pgbench; if it
> strips the benefit of other benchmarks, let's fix it, and at last we will
> have a real smart wake-affine and no one will complain ;-)

The idea is plenty simple (and the fastpath has a deep and abiding love
of simple) so the idea itself flies in my book.  It doesn't add as much
knowledge as may be nice to have, but if it adds enough to help pgbench
and ilk without harming others, cool.

-Mike


Re: [PATCH] vfs: remove the unnecessrary code of fs/inode.c

2013-07-01 Thread Dong Fang


On 07/02/2013 02:11 AM, Dong Fang wrote:

On 07/02/2013 12:41 AM, Al Viro wrote:

On Mon, Jul 01, 2013 at 08:19:03AM -0400, Dong Fang wrote:

These functions, such as find_inode_fast() and find_inode(), iget_locked() and
iget5_locked(), insert_inode_locked() and insert_inode_locked4(), have almost
the same code.


NAK.  These functions exist exactly because the variant with callbacks
costs more.  We walk the hash chain and for each inode on it your
variant would result in
* call
* fetching ino from memory
* comparison (and storing result in general-purpose register)
* return
* checking that register and branch on the result of that check
What's more, the whole thing's not fun for branch predictor.

It is a hot enough path to warrant a special-cased variant; if we can't
get away with that, we use the variants with callbacks, but on filesystems
where ->i_ino is sufficient as search key we really want to avoid the
overhead.



That's right, I didn't think of that. But I think maybe we can remove
the duplicated code in iget_locked() and iget5_locked(), right?

if ok, i will send a new patch later. :)

thx Viro.


Viro, regard this :)

RE: linux-next: manual merge of the arm-soc tree with the l2-mtd tree

2013-07-01 Thread Gupta, Pekon

> 
> On Mon, Jul 1, 2013 at 10:44 PM, Gupta, Pekon  wrote:
> >>
> >> Hi all,
> >>
> >> Today's linux-next merge of the arm-soc tree got a conflict in
> >> Documentation/devicetree/bindings/mtd/gpmc-nand.txt between
> commits
> >> 6c88058ef927 ("ARM: OMAP2+: cleaned-up DT support of various ECC
> >> schemes") and 212012138deb ("mtd: nand: omap2: updated support for
> >> BCH4
> >> ECC scheme") from the l2-mtd tree and commit 496c8a0bbb72 ("ARM:
> >> OMAP2+:
> >> Allow NAND transfer mode to be specified in DT") from the arm-soc tree.
> >>
> >> I fixed it up (maybe - see below) and can carry the fix as necessary (no
> >> action is required).
> >>
> >> --
> >> Cheers,
> >> Stephen Rothwell                    s...@canb.auug.org.au
> >>
> > Yes following merge is correct. Apologies, as there were multiple OMAP2
> NAND and GPMC updates and clean-up going into different trees, so these
> conflict came. Going forward you shouldn't find such issues, as code is more
> stable now. Thanks for help.
> >
> > with regards, pekon
> 
> Sigh. The new bindings seem to never have been reviewed by any device
> tree maintainers, and from the look of it, it might need some
> discussion. It wasn't even cc:d to devicetree-discuss.
> 
> It's completely inappropriate to merge a patch like this at this time
> without any kind of acks from the people reviewing bindings. Can it
> please be dropped ASAP from the MTD tree? Thanks!
> 
> Or, if you want it in different wording: The mtd-tree patch is a
> strong NAK until this has been sorted out.
> 
> It was also applied today, after the merge window opened. Don't merge
> it for 3.11. Artem?
> 
> 
Hi Olof,

You may drop this patch if you wish, but it's not correct to say that it
was not reviewed. Here are the comments from Arnd Bergmann:
http://lists.infradead.org/pipermail/linux-mtd/2013-May/047030.html
And the follow-up reasoning:
http://lists.infradead.org/pipermail/linux-mtd/2013-May/047032.html


And based on Arnd's feedback, the patches that changed the binding
string were even dropped from the series; please see the
cover letter of the V3 set below.
http://lists.infradead.org/pipermail/linux-mtd/2013-June/047319.html
- PATCH-4  update DT attribute for ti,nand-ecc-opt 
- received feedback to keep DT mapping independent of linuxism
- PATCH-4: : ARM: dts: AM33xx: updated default ECC scheme in nand-ecc-opt
- independent patch for AM335x-evm.dts update based on PATCH-2

So, if your objection is to the earlier patch in which I introduced
'linux-based' nomenclature for the bindings, that particular patch
has already been dropped in V3 of this series.


with regards, pekon

RE: [PATCH 4/4] ARM: dts: AM33XX: update rtc node compatibility

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 11:42:49, Nori, Sekhar wrote:
> Changing to Benoit's gmail id since he apparently wont access TI mail
> anymore.
> 
> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> > Since AM33xx  RTC IP has RTC_IRQWAKEEN to support Alarm Wake-up.
> > 
> > Update the rtc compatible property to "ti,am3352-rtc" to enable handling
> > of this feature inside rtc-omap driver.
> > 
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Tony Lindgren 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: b-cous...@ti.com
> > ---
> > :100644 100644 77aa1b0... dde180a... M  arch/arm/boot/dts/am33xx.dtsi
> >  arch/arm/boot/dts/am33xx.dtsi |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
> > index 77aa1b0..dde180a 100644
> > --- a/arch/arm/boot/dts/am33xx.dtsi
> > +++ b/arch/arm/boot/dts/am33xx.dtsi
> > @@ -297,7 +297,7 @@
> > };
> >  
> > rtc@44e3e000 {
> > -   compatible = "ti,da830-rtc";
> > +   compatible = "ti,am3352-rtc";
> 
> compatible is a list so you can instead do:
>   
>   compatible = "ti,am3352-rtc", "ti,da830-rtc";

I believe the order is not important here. I mean, below is also fine. Right?

compatible = "ti,da830-rtc", "ti,am3352-rtc";


> 
> That way the dts works irrespective of driver updates. When driver
> supports enhanced features of hardware, they are available to the user
> else the basic functionality still works.
> 
> Thanks,
> Sekhar
> 


Regards, 
Gururaja

Re: [PATCH V3 06/15] perf tools: fix parse_events_terms() freeing local variable on error path

2013-07-01 Thread Adrian Hunter

On 01/07/13 21:46, David Ahern wrote:
> On 7/1/13 2:01 AM, Adrian Hunter wrote:
>> On 28/06/13 20:19, David Ahern wrote:
>>> On 6/28/13 2:43 AM, Adrian Hunter wrote:
>>>> The list_head is on the stack, so just free the rest of the list.
>>>>
>>>> Signed-off-by: Adrian Hunter
>>>> ---
>>>>   tools/perf/util/parse-events.c | 7 ++-
>>>>   tools/perf/util/parse-events.h | 1 +
>>>>   tools/perf/util/pmu.c          | 2 +-
>>>>   3 files changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/perf/util/parse-events.c
>>>> b/tools/perf/util/parse-events.c
>>>> index 995fc25..d9cb055 100644
>>>> --- a/tools/perf/util/parse-events.c
>>>> +++ b/tools/perf/util/parse-events.c
>>>> @@ -1231,12 +1231,17 @@ int parse_events_term__clone(struct
>>>> parse_events_term **new,
>>>>                     term->val.str, term->val.num);
>>>>   }
>>>>
>>>> -void parse_events__free_terms(struct list_head *terms)
>>>> +void parse_events__free_terms_only(struct list_head *terms)
>>>>   {
>>>>       struct parse_events_term *term, *h;
>>>>
>>>>       list_for_each_entry_safe(term, h, terms, list)
>>>>           free(term);
>>>> +}
>>>> +
>>>> +void parse_events__free_terms(struct list_head *terms)
>>>> +{
>>>> +    parse_events__free_terms_only(terms);
>>>>
>>>>       free(terms);
>>>>   }
>>>
>>> I still don't understand the reasoning for an _only function. There is only
>>> 1 place that mallocs the list_head and that 1 user should free its own
>>> memory. All of the other users pass a stack variable.
>>
>> No.  See parse-events.y
> 
> Fine. Fix both then. My point is that parse-events.c code should not be
> freeing memory it does not allocate.

No.  Read the code.  The 'head' member is shared with other lists.  It does
not make sense to turn a tiny bug-fix into such a lot of re-work.

>>
>> The list head is defined as a pointer in the YYTYPE stack element:
>>
>> %union
>> {
>> char *str;
>> u64 num;
>> struct list_head *head;
>> struct parse_events_term *term;
>> }
>>
>> It is malloc'ed when terms are created:
>>
>> event_term
>> {
>> struct list_head *head = malloc(sizeof(*head));
>> struct parse_events_term *term = $1;
>>
>> ABORT_ON(!head);
>> INIT_LIST_HEAD(head);
>> list_add_tail(&term->list, head);
>> $$ = head;
>> }
>>
> 
> 
> 
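The distinction argued over above (freeing only the entries versus freeing the entries and the list head) can be shown with a minimal user-space sketch. This is not the perf code: the struct and function names below are hypothetical, and a plain singly linked list stands in for struct list_head.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal list; the real code uses struct list_head. */
struct term {
	struct term *next;
};

struct term_list {
	struct term *first;
};

/* Demo helper: push one heap-allocated term onto the list. */
int add_term(struct term_list *head)
{
	struct term *t = malloc(sizeof(*t));

	if (!t)
		return -1;
	t->next = head->first;
	head->first = t;
	return 0;
}

/*
 * Free the entries only. Safe for any head, including one that lives
 * on the caller's stack. Returns the number of entries freed.
 */
size_t free_terms_only(struct term_list *head)
{
	size_t n = 0;

	while (head->first) {
		struct term *t = head->first;

		head->first = t->next;
		free(t);
		n++;
	}
	return n;
}

/* Free the entries and the head. Only valid for a malloc'ed head. */
size_t free_terms(struct term_list *head)
{
	size_t n = free_terms_only(head);

	free(head);
	return n;
}

/* Exercise both cases: a head on the stack and a head from malloc(). */
void demo_free_terms(void)
{
	struct term_list on_stack = { NULL };
	struct term_list *on_heap = malloc(sizeof(*on_heap));

	assert(on_heap);
	on_heap->first = NULL;

	assert(add_term(&on_stack) == 0 && add_term(&on_stack) == 0);
	assert(add_term(on_heap) == 0);

	assert(free_terms_only(&on_stack) == 2);	/* head left alone */
	assert(free_terms(on_heap) == 1);		/* head freed as well */
}
```

Calling free_terms() on the stack-allocated head would pass a non-heap pointer to free(), which is the kind of bug the patch description refers to.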


Re: [PATCH 5/5] metag: cpu hotplug: route_irq: preserve irq mask

2013-07-01 Thread Srivatsa S. Bhat

On 07/01/2013 09:34 PM, James Hogan wrote:
> The route_irq() function needs to preserve the irq mask by using the
> _irqsave/irqrestore variants of raw spin lock functions instead of the
> _irq variants. This is because it is called from __cpu_disable() (via
> migrate_irqs()), which is called with IRQs disabled, so using the _irq
> variants re-enables IRQs.
> 
> This appears to have been causing occasional hits of the
> BUG_ON(!irqs_disabled()) in __irq_work_run() during CPU hotplug soak
> testing:
>   BUG: failure at kernel/irq_work.c:122/__irq_work_run()!
> 
> Signed-off-by: James Hogan 
> ---

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat

>  arch/metag/kernel/irq.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/metag/kernel/irq.c b/arch/metag/kernel/irq.c
> index d91b1e9..2a2c9d5 100644
> --- a/arch/metag/kernel/irq.c
> +++ b/arch/metag/kernel/irq.c
> @@ -279,11 +279,12 @@ static void route_irq(struct irq_data *data, unsigned 
> int irq, unsigned int cpu)
>  {
>   struct irq_desc *desc = irq_to_desc(irq);
>   struct irq_chip *chip = irq_data_get_irq_chip(data);
> + unsigned long flags;
> 
> - raw_spin_lock_irq(&desc->lock);
> + raw_spin_lock_irqsave(&desc->lock, flags);
>   if (chip->irq_set_affinity)
>   chip->irq_set_affinity(data, cpumask_of(cpu), false);
> - raw_spin_unlock_irq(&desc->lock);
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
>  }
> 
>  /*
> 
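The difference between the _irq and the _irqsave/_irqrestore variants described in the patch can be modelled in user space. The sketch below is only a model under stated assumptions: a single boolean stands in for the CPU's IRQ-enable state, the actual locking is left out, and every name is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag stands in for the CPU's IRQ-enable state. */
bool model_irqs_enabled = true;

/* Model of the _irq variants: disable on lock, always enable on unlock. */
void model_lock_irq(void)
{
	model_irqs_enabled = false;
}

void model_unlock_irq(void)
{
	model_irqs_enabled = true;
}

/* Model of _irqsave/_irqrestore: remember the old state, put it back. */
void model_lock_irqsave(bool *flags)
{
	*flags = model_irqs_enabled;
	model_irqs_enabled = false;
}

void model_unlock_irqrestore(bool flags)
{
	model_irqs_enabled = flags;
}

/* IRQ state seen by a caller that entered with IRQs already disabled. */
bool state_after_irq_variant(void)
{
	model_irqs_enabled = false;	/* caller runs with IRQs off */
	model_lock_irq();
	model_unlock_irq();
	return model_irqs_enabled;	/* true: IRQs were re-enabled */
}

bool state_after_irqsave_variant(void)
{
	bool flags;

	model_irqs_enabled = false;	/* caller runs with IRQs off */
	model_lock_irqsave(&flags);
	model_unlock_irqrestore(flags);
	return model_irqs_enabled;	/* false: caller's state preserved */
}

void demo_irq_variants(void)
{
	assert(state_after_irq_variant() == true);
	assert(state_after_irqsave_variant() == false);
}
```

In this model the first helper leaves IRQs enabled behind the caller's back, which corresponds to the situation that trips BUG_ON(!irqs_disabled()) in the report above.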


RE: [PATCH v3 1/3] acpi: Call acpi_os_prepare_sleep hook in reduced hardware sleep path

2013-07-01 Thread Zheng, Lv

Thanks for your efforts!

I wonder if it is possible to remove the "u8 extended" argument and convert
"pm1a_control, pm1b_control" in the legacy sleep path into u8 values that are
equivalent to "acpi_gbl_sleep_type_a, acpi_gbl_sleep_type_b".
That would also simplify the Xen code.

According to the ACPI specification, the bit definitions of the legacy sleep
registers and the extended sleep registers are equivalent.

The legacy sleep register definition:
Table 4-16 PM1 Status Registers Fixed Hardware Feature Status Bits -
WAK_STS (bit 15)
Table 4-18 PM1 Control Registers Fixed Hardware Feature Control Bits -
SLP_TYPx (bits 10-12), SLP_EN (bit 13)

The extended sleep register definition:
Table 4-24 Sleep Control Register - SLP_TYPx (3 bits from offset 2),
SLP_EN (1 bit at offset 5). Since 10-8 = 2 and 13-8 = 5, this definition
is equivalent to Table 4-18.
Table 4-25 Sleep Status Register - WAK_STS (1 bit at offset 7). Since
15-8 = 7, this definition is equivalent to Table 4-16.
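The arithmetic above can be checked with a small sketch. The macro and function names are made up for illustration and are not ACPICA identifiers; only the bit positions are taken from the tables quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy PM1 control register (16 bits), bit positions as quoted above. */
#define PM1_SLP_TYP_SHIFT	10
#define PM1_SLP_TYP_MASK	(0x7u << PM1_SLP_TYP_SHIFT)	/* bits 10-12 */
#define PM1_SLP_EN		(1u << 13)

/* Extended sleep control register (8 bits), bit positions as quoted above. */
#define EXT_SLP_TYP_SHIFT	2
#define EXT_SLP_TYP_MASK	(0x7u << EXT_SLP_TYP_SHIFT)	/* bits 2-4 */
#define EXT_SLP_EN		(1u << 5)

/*
 * Hypothetical helper: keep only the sleep bits of a PM1 control value
 * and shift them down by 8, which lands them on the extended layout.
 */
uint8_t pm1_to_ext_sleep(uint16_t pm1_control)
{
	uint16_t sleep_bits = pm1_control & (PM1_SLP_TYP_MASK | PM1_SLP_EN);

	return (uint8_t)(sleep_bits >> 8);
}

/* Extract the 3-bit sleep type from each layout. */
uint8_t pm1_sleep_type(uint16_t pm1_control)
{
	return (uint8_t)((pm1_control & PM1_SLP_TYP_MASK) >> PM1_SLP_TYP_SHIFT);
}

uint8_t ext_sleep_type(uint8_t ext_control)
{
	return (uint8_t)((ext_control & EXT_SLP_TYP_MASK) >> EXT_SLP_TYP_SHIFT);
}

void demo_sleep_bits(void)
{
	/* Sleep type 5 with SLP_EN set, in the legacy layout. */
	uint16_t pm1 = (uint16_t)(PM1_SLP_EN | (5u << PM1_SLP_TYP_SHIFT));
	uint8_t ext = pm1_to_ext_sleep(pm1);

	assert(ext == (EXT_SLP_EN | (5u << EXT_SLP_TYP_SHIFT)));
	assert(pm1_sleep_type(pm1) == ext_sleep_type(ext));
}
```

The same shift by 8 maps WAK_STS from bit 15 of the PM1 status register to bit 7 of the sleep status register.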

Thanks and best regards
-Lv

> -Original Message-
> From: linux-acpi-ow...@vger.kernel.org
> [mailto:linux-acpi-ow...@vger.kernel.org] On Behalf Of Ben Guthro
> Sent: Wednesday, June 26, 2013 10:06 PM
> To: Konrad Rzeszutek Wilk; Jan Beulich; Rafael J. Wysocki;
> linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org;
> xen-de...@lists.xen.org
> Cc: Ben Guthro; Moore, Robert
> Subject: [PATCH v3 1/3] acpi: Call acpi_os_prepare_sleep hook in reduced
> hardware sleep path
> 
> In version 3.4 acpi_os_prepare_sleep() got introduced in parallel with
> reduced hardware sleep support, and the two changes didn't get
> synchronized: The new code doesn't call the hook function (if so
> requested). Fix this, requiring a parameter to be added to the
> hook function to distinguish "extended" from "legacy" sleep.
> 
> Signed-off-by: Ben Guthro 
> Signed-off-by: Jan Beulich 
> Cc: Bob Moore 
> Cc: Rafael J. Wysocki
> Cc: linux-a...@vger.kernel.org
> ---
>  drivers/acpi/acpica/hwesleep.c |8 
>  drivers/acpi/acpica/hwsleep.c  |2 +-
>  drivers/acpi/osl.c |   16 
>  include/linux/acpi.h   |   10 +-
>  4 files changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/acpi/acpica/hwesleep.c b/drivers/acpi/acpica/hwesleep.c
> index 5e5f762..6834dd7 100644
> --- a/drivers/acpi/acpica/hwesleep.c
> +++ b/drivers/acpi/acpica/hwesleep.c
> @@ -43,6 +43,7 @@
>   */
> 
>  #include 
> +#include 
>  #include "accommon.h"
> 
>  #define _COMPONENT  ACPI_HARDWARE
> @@ -128,6 +129,13 @@ acpi_status acpi_hw_extended_sleep(u8
> sleep_state)
> 
>   ACPI_FLUSH_CPU_CACHE();
> 
> + status = acpi_os_prepare_sleep(sleep_state, acpi_gbl_sleep_type_a,
> +acpi_gbl_sleep_type_b, true);
> + if (ACPI_SKIP(status))
> + return_ACPI_STATUS(AE_OK);
> + if (ACPI_FAILURE(status))
> + return_ACPI_STATUS(status);
> +
>   /*
>* Set the SLP_TYP and SLP_EN bits.
>*
> diff --git a/drivers/acpi/acpica/hwsleep.c b/drivers/acpi/acpica/hwsleep.c
> index e3828cc..a93c299 100644
> --- a/drivers/acpi/acpica/hwsleep.c
> +++ b/drivers/acpi/acpica/hwsleep.c
> @@ -153,7 +153,7 @@ acpi_status acpi_hw_legacy_sleep(u8 sleep_state)
>   ACPI_FLUSH_CPU_CACHE();
> 
>   status = acpi_os_prepare_sleep(sleep_state, pm1a_control,
> -pm1b_control);
> +pm1b_control, false);
>   if (ACPI_SKIP(status))
>   return_ACPI_STATUS(AE_OK);
>   if (ACPI_FAILURE(status))
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index e721863..3fc2801 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -77,8 +77,8 @@ EXPORT_SYMBOL(acpi_in_debugger);
>  extern char line_buf[80];
>  #endif   /*ENABLE_DEBUGGER */
> 
> -static int (*__acpi_os_prepare_sleep)(u8 sleep_state, u32 pm1a_ctrl,
> -   u32 pm1b_ctrl);
> +static int (*__acpi_os_prepare_sleep)(u8 sleep_state, u32 val_a, u32 val_b,
> +   u8 extended);
> 
>  static acpi_osd_handler acpi_irq_handler;
>  static void *acpi_irq_context;
> @@ -1757,13 +1757,13 @@ acpi_status acpi_os_terminate(void)
>   return AE_OK;
>  }
> 
> -acpi_status acpi_os_prepare_sleep(u8 sleep_state, u32 pm1a_control,
> -   u32 pm1b_control)
> +acpi_status acpi_os_prepare_sleep(u8 sleep_state, u32 val_a, u32 val_b,
> +   u8 extended)
>  {
>   int rc = 0;
>   if (__acpi_os_prepare_sleep)
> - rc = __acpi_os_prepare_sleep(sleep_state,
> -  pm1a_control, pm1b_control);
> + rc = __acpi_os_prepare_sleep(sleep_state, val_a, val_b,
> +  extended);
>   if (rc < 0)
>   return AE_ERROR;
>   else if (rc > 0)
> @@ -1772,8 +1772,8 @@ acpi_status acpi_os_prepare_s

Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Michael Wang

On 07/02/2013 01:54 PM, Mike Galbraith wrote:
> On Tue, 2013-07-02 at 12:43 +0800, Michael Wang wrote: 
>> Since RFC:
>>  Tested again with the latest tip 3.10.0-rc7.
>>
>> wake-affine stuff is always trying to pull wakee close to waker, by theory,
>> this will bring benefit if waker's cpu cached hot data for wakee, or the
>> extreme ping-pong case.
>>
>> And testing show it could benefit hackbench 15% at most.
> 
> How much does this still help with Alex's patches integrated?

I remember Alex already tested hackbench. For wake_affine(), his
patch set is a kind of load filter and mine is an nr_wakee filter, so they
are separate, but I will do more testing on this point when it becomes the
last concern.

> 
> aside: were I a maintainer, I'd be a little concerned that what this
> helps with collides somewhat with the ongoing numa work.

As Peter mentioned before, we currently need some solution like the
buddy-idea, and when folks report regression (I suppose they won't...),
we will have more data then.

So we could first try to regain the lost performance of pgbench; if that
strips the benefit from other benchmarks, let's fix it, and in the end we will
have a truly smart wake-affine and no one will complain ;-)

Regards,
Michael Wang

> 
> -Mike
> 
> 


Re: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Sekhar Nori



On 7/2/2013 11:41 AM, Hebbar, Gururaja wrote:
> On Tue, Jul 02, 2013 at 11:39:28, Nori, Sekhar wrote:
>> On 7/2/2013 11:34 AM, Hebbar, Gururaja wrote:
>>> On Tue, Jul 02, 2013 at 11:32:34, Nori, Sekhar wrote:
>>>> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
>>>>> On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
>>>>> is available to enable Alarm Wakeup feature. This register needs to be
>>>>> properly handled for the rtcwake to work properly.
>>>>>
>>>>> Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
>>>>> compatibility node.
>>>>>
>>>>> Signed-off-by: Hebbar Gururaja
>>>>> Cc: Grant Likely
>>>>> Cc: Rob Herring
>>>>> Cc: Rob Landley
>>>>> Cc: Sekhar Nori
>>>>> Cc: Kevin Hilman
>>>>> Cc: Alessandro Zummo
>>>>> Cc: rtc-li...@googlegroups.com
>>>>> Cc: devicetree-disc...@lists.ozlabs.org
>>>>> Cc: linux-...@vger.kernel.org
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>> -#define  OMAP_RTC_DATA_DA830_IDX 1
>>>>> +#define  OMAP_RTC_DATA_DA830_IDX 1
>>>>> +#define  OMAP_RTC_DATA_AM335X_IDX 2
>>>>>
>>>>>  static struct platform_device_id omap_rtc_devtype[] = {
>>>>>   {
>>>>> @@ -309,6 +321,9 @@ static struct platform_device_id omap_rtc_devtype[] =
>>>>> {
>>>>>   }, {
>>>>>   .name   = "da830-rtc",
>>>>>   .driver_data = OMAP_RTC_HAS_KICKER,
>>>>> + }, {
>>>>> + .name   = "am335x-rtc",
>>>>
>>>> may be use am3352-rtc here just to keep the platform device name and of
>>>> compatible in sync.
>>>
>>> Correct. I will update the same in v2.
>>>
>>>>
>>>>> + .driver_data = OMAP_RTC_HAS_KICKER | OMAP_RTC_HAS_IRQWAKEEN,
>>>>>   },
>>>>>   {},
>>>>
>>>> It is better to use the index defined above in the static initialization
>>>> so they remain in sync.
>>>
>>> Sorry. I didn’t get this.
>>>
>>
>> See example below I provided. If its still not clear, let me know what
>> is not clear.
>>
>>>> ...
>>>> [OMAP_RTC_DATA_DA830_IDX] = {
>>>> .name   = "da830-rtc",
>>>> .driver_data = OMAP_RTC_HAS_KICKER,
>>>> },
> 
> Thanks for the clarification. In this case will it ok if I update the previous
> member also.

You don't really reference [0] in omap_rtc_of_match[] so even if you
leave it as-is, that's fine with me. I am mostly concerned with the
index definitions and initialization order being out of sync and that's
really not an issue with [0].
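The concern about index definitions and initialization order drifting apart can be illustrated with C99 designated initializers. This is a stand-alone sketch, not the rtc-omap driver: the struct, the flag values and the "generic-rtc" entry are hypothetical stand-ins for what the patch quotes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags and struct, loosely modelled on the quoted patch. */
#define HAS_KICKER	0x1u
#define HAS_IRQWAKEEN	0x2u

struct dev_id {
	const char *name;
	unsigned long driver_data;
};

enum {
	DATA_DEFAULT_IDX = 0,
	DATA_DA830_IDX   = 1,
	DATA_AM3352_IDX  = 2,
};

/*
 * Each entry names its own slot, so the order of the initializers in the
 * source no longer decides which index an entry ends up at. The entries
 * are deliberately written out of order here to show that.
 */
const struct dev_id devtype[] = {
	[DATA_DEFAULT_IDX] = {
		.name = "generic-rtc",
		.driver_data = 0,
	},
	[DATA_AM3352_IDX] = {
		.name = "am3352-rtc",
		.driver_data = HAS_KICKER | HAS_IRQWAKEEN,
	},
	[DATA_DA830_IDX] = {
		.name = "da830-rtc",
		.driver_data = HAS_KICKER,
	},
};

void demo_devtype(void)
{
	assert(strcmp(devtype[DATA_DA830_IDX].name, "da830-rtc") == 0);
	assert(strcmp(devtype[DATA_AM3352_IDX].name, "am3352-rtc") == 0);
	assert(devtype[DATA_AM3352_IDX].driver_data ==
	       (HAS_KICKER | HAS_IRQWAKEEN));
}
```

With purely positional initializers, inserting or reordering an entry silently changes what &devtype[DATA_DA830_IDX] points at; with the designators it cannot.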

Thanks,
Sekhar

Re: [PATCH 3/5] metag: smp: don't spin waiting for CPU to start

2013-07-01 Thread Srivatsa S. Bhat

On 07/01/2013 09:34 PM, James Hogan wrote:
> Use a completion to block until a secondary CPU has started up, like ARM
> do, instead of a loop of udelays.
> 
> On Meta, SMP is really SMT, with each "CPU" being a different hardware
> thread on the same Meta processor core, so as well as being more
> efficient and latency friendly, using a completion prevents the bogomips
> of the secondary CPU from being drastically skewed every time by the
> execution of the tight in-cache udelay loop on the other CPU.
> 
> Signed-off-by: James Hogan 
> Cc: "Srivatsa S. Bhat" 
> Cc: Thomas Gleixner 
> ---

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat


>  arch/metag/kernel/smp.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> index 09979f2..e413875 100644
> --- a/arch/metag/kernel/smp.c
> +++ b/arch/metag/kernel/smp.c
> @@ -8,6 +8,7 @@
>   * published by the Free Software Foundation.
>   */
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -62,6 +63,8 @@ static DEFINE_PER_CPU(struct ipi_data, ipi_data) = {
> 
>  static DEFINE_SPINLOCK(boot_lock);
> 
> +static DECLARE_COMPLETION(cpu_running);
> +
>  /*
>   * "thread" is assumed to be a valid Meta hardware thread ID.
>   */
> @@ -235,20 +238,12 @@ int __cpuinit __cpu_up(unsigned int cpu, struct 
> task_struct *idle)
>*/
>   ret = boot_secondary(thread, idle);
>   if (ret == 0) {
> - unsigned long timeout;
> -
>   /*
>* CPU was successfully started, wait for it
>* to come online or time out.
>*/
> - timeout = jiffies + HZ;
> - while (time_before(jiffies, timeout)) {
> - if (cpu_online(cpu))
> - break;
> -
> - udelay(10);
> - barrier();
> - }
> + wait_for_completion_timeout(&cpu_running,
> + msecs_to_jiffies(1000));
> 
>   if (!cpu_online(cpu))
>   ret = -EIO;
> @@ -391,6 +386,7 @@ asmlinkage void secondary_start_kernel(void)
>* OK, now it's safe to let the boot CPU continue
>*/
>   set_cpu_online(cpu, true);
> + complete(&cpu_running);
> 
>   /*
>* Enable local interrupts.
> 


RE: [PATCH 4/4] ARM: dts: AM33XX: update rtc node compatibility

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 11:42:49, Nori, Sekhar wrote:
> Changing to Benoit's gmail id since he apparently wont access TI mail
> anymore.
> 
> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> > Since AM33xx  RTC IP has RTC_IRQWAKEEN to support Alarm Wake-up.
> > 
> > Update the rtc compatible property to "ti,am3352-rtc" to enable handling
> > of this feature inside rtc-omap driver.
> > 
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Tony Lindgren 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: b-cous...@ti.com
> > ---
> > :100644 100644 77aa1b0... dde180a... M  arch/arm/boot/dts/am33xx.dtsi
> >  arch/arm/boot/dts/am33xx.dtsi |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
> > index 77aa1b0..dde180a 100644
> > --- a/arch/arm/boot/dts/am33xx.dtsi
> > +++ b/arch/arm/boot/dts/am33xx.dtsi
> > @@ -297,7 +297,7 @@
> > };
> >  
> > rtc@44e3e000 {
> > -   compatible = "ti,da830-rtc";
> > +   compatible = "ti,am3352-rtc";
> 
> compatible is a list so you can instead do:
>   
>   compatible = "ti,am3352-rtc", "ti,da830-rtc";
> 
> That way the dts works irrespective of driver updates. When driver
> supports enhanced features of hardware, they are available to the user
> else the basic functionality still works.

Ok. I will update the same now in v2.

> 
> Thanks,
> Sekhar
> 


Regards, 
Gururaja

Re: [PATCH 4/4] ARM: dts: AM33XX: update rtc node compatibility

2013-07-01 Thread Sekhar Nori

Changing to Benoit's gmail id since he apparently won't access TI mail
anymore.

On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> Since AM33xx  RTC IP has RTC_IRQWAKEEN to support Alarm Wake-up.
> 
> Update the rtc compatible property to "ti,am3352-rtc" to enable handling
> of this feature inside rtc-omap driver.
> 
> Signed-off-by: Hebbar Gururaja 
> Cc: Tony Lindgren 
> Cc: Sekhar Nori 
> Cc: Kevin Hilman 
> Cc: b-cous...@ti.com
> ---
> :100644 100644 77aa1b0... dde180a... M  arch/arm/boot/dts/am33xx.dtsi
>  arch/arm/boot/dts/am33xx.dtsi |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
> index 77aa1b0..dde180a 100644
> --- a/arch/arm/boot/dts/am33xx.dtsi
> +++ b/arch/arm/boot/dts/am33xx.dtsi
> @@ -297,7 +297,7 @@
>   };
>  
>   rtc@44e3e000 {
> - compatible = "ti,da830-rtc";
> + compatible = "ti,am3352-rtc";

compatible is a list so you can instead do:

compatible = "ti,am3352-rtc", "ti,da830-rtc";

That way the dts works irrespective of driver updates. When the driver
supports the enhanced features of the hardware, they are available to the
user; otherwise the basic functionality still works.
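The fallback behaviour described above can be sketched with a simplified matching model. This is not the kernel's matching code: the struct, the feature flags and match_node() are hypothetical. The one assumption carried over from device tree practice is that a compatible list runs from the most specific string to the most generic one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a driver's match table entry. */
struct match_entry {
	const char *compatible;
	unsigned int features;
};

#define FEAT_BASIC	0x1u
#define FEAT_IRQWAKEEN	0x2u

/*
 * Walk the node's compatible strings in the order they are listed and
 * return the first driver entry that matches any of them.
 */
const struct match_entry *match_node(const char *const *node_compat,
				     size_t n_compat,
				     const struct match_entry *table,
				     size_t n_table)
{
	for (size_t i = 0; i < n_compat; i++)
		for (size_t j = 0; j < n_table; j++)
			if (strcmp(node_compat[i], table[j].compatible) == 0)
				return &table[j];
	return NULL;
}

/* Node as suggested above: most specific string first. */
const char *const rtc_compat[] = { "ti,am3352-rtc", "ti,da830-rtc" };

/* A driver that predates the new string, and one that knows it. */
const struct match_entry old_driver[] = {
	{ "ti,da830-rtc", FEAT_BASIC },
};

const struct match_entry new_driver[] = {
	{ "ti,da830-rtc", FEAT_BASIC },
	{ "ti,am3352-rtc", FEAT_BASIC | FEAT_IRQWAKEEN },
};

void demo_compatible(void)
{
	const struct match_entry *m;

	/* Old driver: falls back to the generic string. */
	m = match_node(rtc_compat, 2, old_driver, 1);
	assert(m && m->features == FEAT_BASIC);

	/* New driver: picks the specific entry with the extra feature. */
	m = match_node(rtc_compat, 2, new_driver, 2);
	assert(m && m->features == (FEAT_BASIC | FEAT_IRQWAKEEN));
}
```

In this model, listing the generic string first would make the newer driver bind through the basic entry, so the order of the list is not neutral.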

Thanks,
Sekhar

RE: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 11:39:28, Nori, Sekhar wrote:
> On 7/2/2013 11:34 AM, Hebbar, Gururaja wrote:
> > On Tue, Jul 02, 2013 at 11:32:34, Nori, Sekhar wrote:
> >> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> >>> On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
> >>> is available to enable Alarm Wakeup feature. This register needs to be
> >>> properly handled for the rtcwake to work properly.
> >>>
> >>> Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
> >>> compatibility node.
> >>>
> >>> Signed-off-by: Hebbar Gururaja 
> >>> Cc: Grant Likely 
> >>> Cc: Rob Herring 
> >>> Cc: Rob Landley 
> >>> Cc: Sekhar Nori 
> >>> Cc: Kevin Hilman 
> >>> Cc: Alessandro Zummo 
> >>> Cc: rtc-li...@googlegroups.com
> >>> Cc: devicetree-disc...@lists.ozlabs.org
> >>> Cc: linux-...@vger.kernel.org
> >>> ---
> >>
> >> [...]
> >>
> >>> -#define  OMAP_RTC_DATA_DA830_IDX 1
> >>> +#define  OMAP_RTC_DATA_DA830_IDX 1
> >>> +#define  OMAP_RTC_DATA_AM335X_IDX 2
> >>>  
> >>>  static struct platform_device_id omap_rtc_devtype[] = {
> >>>   {
> >>> @@ -309,6 +321,9 @@ static struct platform_device_id omap_rtc_devtype[] = 
> >>> {
> >>>   }, {
> >>>   .name   = "da830-rtc",
> >>>   .driver_data = OMAP_RTC_HAS_KICKER,
> >>> + }, {
> >>> + .name   = "am335x-rtc",
> >>
> >> may be use am3352-rtc here just to keep the platform device name and of
> >> compatible in sync.
> > 
> > Correct. I will update the same in v2.
> > 
> >>
> >>> + .driver_data = OMAP_RTC_HAS_KICKER | OMAP_RTC_HAS_IRQWAKEEN,
> >>>   },
> >>>   {},
> >>
> >> It is better to use the index defined above in the static initialization
> >> so they remain in sync.
> > 
> > Sorry. I didn’t get this.
> > 
> 
> See example below I provided. If its still not clear, let me know what
> is not clear.
> 
> >>...
> >>[OMAP_RTC_DATA_DA830_IDX] = {
> >>.name   = "da830-rtc",
> >>.driver_data = OMAP_RTC_HAS_KICKER,
> >>},

Thanks for the clarification. In this case, will it be OK if I update the
previous member also?

> 
> Thanks,
> Sekhar
> 


Regards, 
Gururaja

Re: [PATCH] vfs: remove the unnecessrary code of fs/inode.c

2013-07-01 Thread Dong Fang


On 07/02/2013 12:41 AM, Al Viro wrote:

On Mon, Jul 01, 2013 at 08:19:03AM -0400, Dong Fang wrote:

These functions, such as find_inode_fast() and find_inode(), iget_locked() and
iget5_locked(), insert_inode_locked() and insert_inode_locked4(), have almost
the same code.


NAK.  These functions exist exactly because the variant with callbacks
costs more.  We walk the hash chain and for each inode on it your
variant would result in
* call
* fetching ino from memory
* comparison (and storing result in general-purpose register)
* return
* checking that register and branch on the result of that check
What's more, the whole thing's not fun for branch predictor.

It is a hot enough path to warrant a special-cased variant; if we can't
get away with that, we use the variants with callbacks, but on filesystems
where ->i_ino is sufficient as search key we really want to avoid the
overhead.
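The cost difference listed above can be made concrete with a user-space sketch of the two lookup styles. The struct and function names are hypothetical and much simpler than the real fs/inode.c code; the point is only where the comparison happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down inode: only what the lookup needs. */
struct inode {
	unsigned long i_ino;
	struct inode *i_next;	/* next inode on the same hash chain */
};

/* Special-cased variant: the i_ino comparison is inline in the loop. */
struct inode *lookup_fast(struct inode *chain, unsigned long ino)
{
	for (struct inode *p = chain; p; p = p->i_next)
		if (p->i_ino == ino)
			return p;
	return NULL;
}

/* Generic variant: one indirect call per inode on the chain. */
struct inode *lookup_cb(struct inode *chain,
			int (*test)(struct inode *, void *), void *data)
{
	for (struct inode *p = chain; p; p = p->i_next)
		if (test(p, data))
			return p;
	return NULL;
}

/* Callback that reproduces the i_ino comparison for the generic variant. */
int test_ino(struct inode *inode, void *data)
{
	return inode->i_ino == *(unsigned long *)data;
}

void demo_lookup(void)
{
	struct inode c = { 30, NULL }, b = { 20, &c }, a = { 10, &b };
	unsigned long key = 20;

	/* Both variants find the same inode ... */
	assert(lookup_fast(&a, 20) == &b);
	assert(lookup_cb(&a, test_ino, &key) == &b);
	/* ... and both report a miss the same way. */
	assert(lookup_fast(&a, 99) == NULL);
}
```

Both functions return the same result; the generic one pays for a call, a load, a compare, a return and a test on every element of the chain, which is the sequence spelled out in the message above.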



That's right, I didn't think of that. But I think maybe we can remove
the duplicated code in iget_locked() and iget5_locked(), right?

if ok, i will send a new patch later. :)

thx Viro.

Re: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Sekhar Nori

On 7/2/2013 11:34 AM, Hebbar, Gururaja wrote:
> On Tue, Jul 02, 2013 at 11:32:34, Nori, Sekhar wrote:
>> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
>>> On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
>>> is available to enable Alarm Wakeup feature. This register needs to be
>>> properly handled for the rtcwake to work properly.
>>>
>>> Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
>>> compatibility node.
>>>
>>> Signed-off-by: Hebbar Gururaja 
>>> Cc: Grant Likely 
>>> Cc: Rob Herring 
>>> Cc: Rob Landley 
>>> Cc: Sekhar Nori 
>>> Cc: Kevin Hilman 
>>> Cc: Alessandro Zummo 
>>> Cc: rtc-li...@googlegroups.com
>>> Cc: devicetree-disc...@lists.ozlabs.org
>>> Cc: linux-...@vger.kernel.org
>>> ---
>>
>> [...]
>>
>>> -#define  OMAP_RTC_DATA_DA830_IDX 1
>>> +#define  OMAP_RTC_DATA_DA830_IDX 1
>>> +#define  OMAP_RTC_DATA_AM335X_IDX 2
>>>  
>>>  static struct platform_device_id omap_rtc_devtype[] = {
>>> {
>>> @@ -309,6 +321,9 @@ static struct platform_device_id omap_rtc_devtype[] = {
>>> }, {
>>> .name   = "da830-rtc",
>>> .driver_data = OMAP_RTC_HAS_KICKER,
>>> +   }, {
>>> +   .name   = "am335x-rtc",
>>
>> may be use am3352-rtc here just to keep the platform device name and of
>> compatible in sync.
> 
> Correct. I will update the same in v2.
> 
>>
>>> +   .driver_data = OMAP_RTC_HAS_KICKER | OMAP_RTC_HAS_IRQWAKEEN,
>>> },
>>> {},
>>
>> It is better to use the index defined above in the static initialization
>> so they remain in sync.
> 
> Sorry. I didn’t get this.
> 

See the example I provided below. If it's still not clear, let me know what
is not clear.

>>  ...
>>  [OMAP_RTC_DATA_DA830_IDX] = {
>>  .name   = "da830-rtc",
>>  .driver_data = OMAP_RTC_HAS_KICKER,
>>  },

Thanks,
Sekhar

RE: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 11:32:34, Nori, Sekhar wrote:
> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> > On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
> > is available to enable Alarm Wakeup feature. This register needs to be
> > properly handled for the rtcwake to work properly.
> > 
> > Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
> > compatibility node.
> > 
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Grant Likely 
> > Cc: Rob Herring 
> > Cc: Rob Landley 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: Alessandro Zummo 
> > Cc: rtc-li...@googlegroups.com
> > Cc: devicetree-disc...@lists.ozlabs.org
> > Cc: linux-...@vger.kernel.org
> > ---
> 
> [...]
> 
> > -#define  OMAP_RTC_DATA_DA830_IDX 1
> > +#define  OMAP_RTC_DATA_DA830_IDX 1
> > +#define  OMAP_RTC_DATA_AM335X_IDX 2
> >  
> >  static struct platform_device_id omap_rtc_devtype[] = {
> > {
> > @@ -309,6 +321,9 @@ static struct platform_device_id omap_rtc_devtype[] = {
> > }, {
> > .name   = "da830-rtc",
> > .driver_data = OMAP_RTC_HAS_KICKER,
> > +   }, {
> > +   .name   = "am335x-rtc",
> 
> may be use am3352-rtc here just to keep the platform device name and of
> compatible in sync.

Correct. I will update the same in v2.

> 
> > +   .driver_data = OMAP_RTC_HAS_KICKER | OMAP_RTC_HAS_IRQWAKEEN,
> > },
> > {},
> 
> It is better to use the index defined above in the static initialization
> so they remain in sync.

Sorry. I didn’t get this.

> 
>   ...
>   [OMAP_RTC_DATA_DA830_IDX] = {
>   .name   = "da830-rtc",
>   .driver_data = OMAP_RTC_HAS_KICKER,
>   },
>   ...
> 
> >  };
> > @@ -318,6 +333,9 @@ static const struct of_device_id omap_rtc_of_match[] = {
> > {   .compatible = "ti,da830-rtc",
> > .data   = &omap_rtc_devtype[OMAP_RTC_DATA_DA830_IDX],
> > },
> > +   {   .compatible = "ti,am3352-rtc",
> > +   .data   = &omap_rtc_devtype[OMAP_RTC_DATA_AM335X_IDX],
> > +   },
> > {},
> >  };
> >  MODULE_DEVICE_TABLE(of, omap_rtc_of_match);
> 
> Apart from these minor issues, the patch looks good to me.
> 
> Acked-by: Sekhar Nori 
> 
> Thanks,
> Sekhar
> 


Regards, 
Gururaja

Re: memmap exclude boot command: how to check if it was indeed applied

2013-07-01 Thread Eric Valette


On 01/07/2013 20:40, Eric Valette wrote:

Hi,

After hunting an unreproducible bug, I decided to run memtest86+ and
found that only 8 bytes of memory, on the last 4GB memory stick, fail to
store the last two digits written to them.

The memory is at 0x31db357558

So I decided to add a memmap=4K@0x00031db35000 boot option to
GRUB_CMDLINE_LINUX. But checking dmesg, I see no mention of this page
being reserved.

PS: CC me I'm not subscribed.


Analysing /sys/firmware/memmap files, I see no trace of buggy address 
exclusion. Any help?




--eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Sekhar Nori

On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
> is available to enable Alarm Wakeup feature. This register needs to be
> properly handled for the rtcwake to work properly.
> 
> Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
> compatibility node.
> 
> Signed-off-by: Hebbar Gururaja 
> Cc: Grant Likely 
> Cc: Rob Herring 
> Cc: Rob Landley 
> Cc: Sekhar Nori 
> Cc: Kevin Hilman 
> Cc: Alessandro Zummo 
> Cc: rtc-li...@googlegroups.com
> Cc: devicetree-disc...@lists.ozlabs.org
> Cc: linux-...@vger.kernel.org
> ---

[...]

> -#define  OMAP_RTC_DATA_DA830_IDX 1
> +#define  OMAP_RTC_DATA_DA830_IDX 1
> +#define  OMAP_RTC_DATA_AM335X_IDX 2
>  
>  static struct platform_device_id omap_rtc_devtype[] = {
>   {
> @@ -309,6 +321,9 @@ static struct platform_device_id omap_rtc_devtype[] = {
>   }, {
>   .name   = "da830-rtc",
>   .driver_data = OMAP_RTC_HAS_KICKER,
> + }, {
> + .name   = "am335x-rtc",

Maybe use am3352-rtc here, to keep the platform device name and the OF
compatible string in sync.

> + .driver_data = OMAP_RTC_HAS_KICKER | OMAP_RTC_HAS_IRQWAKEEN,
>   },
>   {},

It is better to use the index defined above in the static initialization
so they remain in sync.

...
[OMAP_RTC_DATA_DA830_IDX] = {
.name   = "da830-rtc",
.driver_data = OMAP_RTC_HAS_KICKER,
},
...
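
The suggestion above can be sketched in user space as follows. This is
only an illustration, not the driver's code: the struct, the flag
macros, the enum names and the "omap_rtc" entry are hypothetical
stand-ins, and the terminating empty entry of the real table is left
out for brevity.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's flag bits. */
#define HAS_KICKER	(1u << 0)
#define HAS_IRQWAKEEN	(1u << 1)

/* One index per table entry; IDX_COUNT is the number of entries. */
enum { IDX_OMAP, IDX_DA830, IDX_AM3352, IDX_COUNT };

struct devtype {
	const char *name;
	unsigned int driver_data;
};

/*
 * Designated initializers pin each entry to its index, so the entries
 * can be reordered without changing what devtypes[IDX_...] refers to.
 */
static const struct devtype devtypes[] = {
	[IDX_OMAP]   = { .name = "omap_rtc",   .driver_data = 0 },
	[IDX_DA830]  = { .name = "da830-rtc",  .driver_data = HAS_KICKER },
	[IDX_AM3352] = { .name = "am3352-rtc",
			 .driver_data = HAS_KICKER | HAS_IRQWAKEEN },
};

/* Fails to compile if the table length and the enum disagree. */
static_assert(sizeof(devtypes) / sizeof(devtypes[0]) == IDX_COUNT,
	      "devtypes[] and the index enum are out of sync");

/* Return the index of the entry called 'name', or -1 if there is none. */
static int devtype_index(const char *name)
{
	for (size_t i = 0; i < (size_t)IDX_COUNT; i++)
		if (strcmp(devtypes[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

With this layout, &devtypes[IDX_AM3352] keeps pointing at the
"am3352-rtc" entry even if someone later reorders the initializers.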

>  };
> @@ -318,6 +333,9 @@ static const struct of_device_id omap_rtc_of_match[] = {
>   {   .compatible = "ti,da830-rtc",
>   .data   = &omap_rtc_devtype[OMAP_RTC_DATA_DA830_IDX],
>   },
> + {   .compatible = "ti,am3352-rtc",
> + .data   = &omap_rtc_devtype[OMAP_RTC_DATA_AM335X_IDX],
> + },
>   {},
>  };
>  MODULE_DEVICE_TABLE(of, omap_rtc_of_match);

Apart from these minor issues, the patch looks good to me.

Acked-by: Sekhar Nori 

Thanks,
Sekhar

Re: block layer softlockup

2013-07-01 Thread Dave Jones

On Tue, Jul 02, 2013 at 12:07:41PM +1000, Dave Chinner wrote:
 > On Mon, Jul 01, 2013 at 01:57:34PM -0400, Dave Jones wrote:
 > > On Fri, Jun 28, 2013 at 01:54:37PM +1000, Dave Chinner wrote:
 > >  > On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
 > >  > > On Thu, Jun 27, 2013 at 3:18 PM, Dave Chinner  
 > > wrote:
 > >  > > >
 > >  > > > Right, that will be what is happening - the entire system will go
 > >  > > > unresponsive when a sync call happens, so it's entirely possible
 > >  > > > to see the soft lockups on inode_sb_list_add()/inode_sb_list_del()
 > >  > > > trying to get the lock because of the way ticket spinlocks work...
 > >  > > 
 > >  > > So what made it all start happening now? I don't recall us having had
 > >  > > these kinds of issues before..
 > >  > 
 > >  > Not sure - it's a sudden surprise for me, too. Then again, I haven't
 > >  > been looking at sync from a performance or lock contention point of
 > >  > view any time recently.  The algorithm that wait_sb_inodes() is
 > >  > effectively unchanged since at least 2009, so it's probably a case
 > >  > of it having been protected from contention by some external factor
 > >  > we've fixed/removed recently.  Perhaps the bdi-flusher thread
 > >  > replacement in -rc1 has changed the timing sufficiently that it no
 > >  > longer serialises concurrent sync calls as much
 > > 
 > > This morning's new trace reminded me of this last sentence. Related ?
 > 
 > Was this running the last patch I posted, or a vanilla kernel?

yeah, this had v2 of your patch (the one post lockdep warnings)

 > That's doing IO completion processing in softirq time, and the lock
 > it just dropped was the q->queue_lock. But that lock is held over
 > end IO processing, so it is possible that the way the page writeback
 > transition handling of my POC patch caused this.
 > 
 > FWIW, I've attached a simple patch you might like to try to see if
 > it *minimises* the inode_sb_list_lock contention problems. All it
 > does is try to prevent concurrent entry in wait_sb_inodes() for a
 > given superblock and hence only have one walker on the contending
 > filesystem at a time. Replace the previous one I sent with it. If
 > that doesn't work, I have another simple patch that makes the
 > inode_sb_list_lock per-sb to take this isolation even further
 
I can try it, though as always, proving a negative

Dave

Re: linux-next: manual merge of the arm-soc tree with the l2-mtd tree

2013-07-01 Thread Olof Johansson

On Mon, Jul 1, 2013 at 10:44 PM, Gupta, Pekon  wrote:
>>
>> Hi all,
>>
>> Today's linux-next merge of the arm-soc tree got a conflict in
>> Documentation/devicetree/bindings/mtd/gpmc-nand.txt between commits
>> 6c88058ef927 ("ARM: OMAP2+: cleaned-up DT support of various ECC
>> schemes") and 212012138deb ("mtd: nand: omap2: updated support for
>> BCH4
>> ECC scheme") from the l2-mtd tree and commit 496c8a0bbb72 ("ARM:
>> OMAP2+:
>> Allow NAND transfer mode to be specified in DT") from the arm-soc tree.
>>
>> I fixed it up (maybe - see below) and can carry the fix as necessary (no
>> action is required).
>>
>> --
>> Cheers,
>> Stephen Rothwell                    s...@canb.auug.org.au
>>
> Yes, the following merge is correct. Apologies: multiple OMAP2 NAND
> and GPMC updates and clean-ups were going into different trees, which
> caused these conflicts. Going forward you shouldn't find such issues, as
> the code is more stable now. Thanks for the help.
>
> with regards, pekon

Sigh. The new bindings seem to never have been reviewed by any device
tree maintainers, and from the look of it, it might need some
discussion. It wasn't even cc:d to devicetree-discuss.

It's completely inappropriate to merge a patch like this at this time
without any kind of acks from the people reviewing bindings. Can it
please be dropped ASAP from the MTD tree? Thanks!

Or, if you want it in different wording: The mtd-tree patch is a
strong NAK until this has been sorted out.

It was also applied today, after the merge window opened. Don't merge
it for 3.11. Artem?

-Olof

Re: [PATCH 2/5] metag: smp: enable irqs after set_cpu_online

2013-07-01 Thread Srivatsa S. Bhat

On 07/01/2013 09:34 PM, James Hogan wrote:
> In secondary_start_kernel() interrupts should be enabled with
> local_irq_enable() after the cpu is marked as online with
> set_cpu_online(). Otherwise it's possible for a timer interrupt to
> trigger a softirq, which, if the cpu is marked as offline, may have its
> affinity altered.
> 
> Reported-by: Kirill Tkhai 
> Signed-off-by: James Hogan 
> Cc: Kirill Tkhai 
> Cc: "Srivatsa S. Bhat" 
> Cc: Thomas Gleixner 
> ---

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat

>  arch/metag/kernel/smp.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
> index b813515..09979f2 100644
> --- a/arch/metag/kernel/smp.c
> +++ b/arch/metag/kernel/smp.c
> @@ -379,12 +379,7 @@ asmlinkage void secondary_start_kernel(void)
> 
>   setup_priv();
> 
> - /*
> -  * Enable local interrupts.
> -  */
> - tbi_startup_interrupt(TBID_SIGNUM_TRT);
>   notify_cpu_starting(cpu);
> - local_irq_enable();
> 
>   pr_info("CPU%u (thread %u): Booted secondary processor\n",
>   cpu, cpu_2_hwthread_id[cpu]);
> @@ -398,6 +393,12 @@ asmlinkage void secondary_start_kernel(void)
>   set_cpu_online(cpu, true);
> 
>   /*
> +  * Enable local interrupts.
> +  */
> + tbi_startup_interrupt(TBID_SIGNUM_TRT);
> + local_irq_enable();
> +
> + /*
>* OK, it's off to the idle thread for us
>*/
>   cpu_startup_entry(CPUHP_ONLINE);
> 


Re: [PATCH] vfs: remove the unnecessrary code of fs/inode.c

2013-07-01 Thread Dong Fang


On 07/02/2013 12:15 AM, Gu Zheng wrote:

On 07/01/2013 08:19 PM, Dong Fang wrote:


These functions, such as find_inode_fast() and find_inode(), iget_locked() and
iget5_locked(), insert_inode_locked() and insert_inode_locked4(), have almost
the same code.


Maybe the title "[PATCH] vfs: remove the reduplicate code of fs/inode.c" is more
suitable.



that's right, thanks for your advice



Signed-off-by: Dong Fang 



Reviewed-by: Gu Zheng 

Thanks,
Gu


---
  fs/inode.c |  134 
  1 files changed, 26 insertions(+), 108 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..847eee9 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -790,6 +790,22 @@ void prune_icache_sb(struct super_block *sb, int 
nr_to_scan)
  }

  static void __wait_on_freeing_inode(struct inode *inode);
+
+
+static int test_ino(struct inode *inode, void *data)
+{
+   unsigned long ino = *(unsigned long *) data;
+   return inode->i_ino == ino;


This can be more concise:
return inode->i_ino == *(unsigned long *) data;
The same goes for the new insert_inode_locked():



+}
+
+static int set_ino(struct inode *inode, void *data)
+{
+   inode->i_ino = *(unsigned long *) data;
+   return 0;
+}
+
+
+
  /*
   * Called with the inode lock held.
   */
@@ -829,28 +845,7 @@ repeat:
  static struct inode *find_inode_fast(struct super_block *sb,
struct hlist_head *head, unsigned long ino)
  {
-   struct inode *inode = NULL;
-
-repeat:
-   hlist_for_each_entry(inode, head, i_hash) {
-   spin_lock(&inode->i_lock);
-   if (inode->i_ino != ino) {
-   spin_unlock(&inode->i_lock);
-   continue;
-   }
-   if (inode->i_sb != sb) {
-   spin_unlock(&inode->i_lock);
-   continue;
-   }
-   if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
-   __wait_on_freeing_inode(inode);
-   goto repeat;
-   }
-   __iget(inode);
-   spin_unlock(&inode->i_lock);
-   return inode;
-   }
-   return NULL;
+   return find_inode(sb, head, test_ino, (void *)&ino);
  }

  /*
@@ -1073,50 +1068,7 @@ EXPORT_SYMBOL(iget5_locked);
   */
  struct inode *iget_locked(struct super_block *sb, unsigned long ino)
  {
-   struct hlist_head *head = inode_hashtable + hash(sb, ino);
-   struct inode *inode;
-
-   spin_lock(&inode_hash_lock);
-   inode = find_inode_fast(sb, head, ino);
-   spin_unlock(&inode_hash_lock);
-   if (inode) {
-   wait_on_inode(inode);
-   return inode;
-   }
-
-   inode = alloc_inode(sb);
-   if (inode) {
-   struct inode *old;
-
-   spin_lock(&inode_hash_lock);
-   /* We released the lock, so.. */
-   old = find_inode_fast(sb, head, ino);
-   if (!old) {
-   inode->i_ino = ino;
-   spin_lock(&inode->i_lock);
-   inode->i_state = I_NEW;
-   hlist_add_head(&inode->i_hash, head);
-   spin_unlock(&inode->i_lock);
-   inode_sb_list_add(inode);
-   spin_unlock(&inode_hash_lock);
-
-   /* Return the locked inode with I_NEW set, the
-* caller is responsible for filling in the contents
-*/
-   return inode;
-   }
-
-   /*
-* Uhhuh, somebody else created the same inode under
-* us. Use the old inode instead of the one we just
-* allocated.
-*/
-   spin_unlock(&inode_hash_lock);
-   destroy_inode(inode);
-   inode = old;
-   wait_on_inode(inode);
-   }
-   return inode;
+   return iget5_locked(sb, ino, test_ino, set_ino, (void *)&ino);
  }
  EXPORT_SYMBOL(iget_locked);

@@ -1281,48 +1233,6 @@ struct inode *ilookup(struct super_block *sb, unsigned 
long ino)
  }
  EXPORT_SYMBOL(ilookup);

-int insert_inode_locked(struct inode *inode)
-{
-   struct super_block *sb = inode->i_sb;
-   ino_t ino = inode->i_ino;
-   struct hlist_head *head = inode_hashtable + hash(sb, ino);
-
-   while (1) {
-   struct inode *old = NULL;
-   spin_lock(&inode_hash_lock);
-   hlist_for_each_entry(old, head, i_hash) {
-   if (old->i_ino != ino)
-   continue;
-   if (old->i_sb != sb)
-   continue;
-   spin_lock(&old->i_lock);
-   if (old->i_state & (I_FREEING|I_WILL_FREE)) {
-   spin_unlock(&old->i_lock);
-   continue;
-

Re: [PATCH RFC nohz_full v2 2/7] nohz_full: Add rcu_dyntick data for scalable detection of all-idle state

2013-07-01 Thread Paul E. McKenney

On Tue, Jul 02, 2013 at 07:10:52AM +0200, Mike Galbraith wrote:
> On Mon, 2013-07-01 at 12:16 -0700, Paul E. McKenney wrote: 
> > On Mon, Jul 01, 2013 at 11:34:13AM -0700, Josh Triplett wrote:
> 
> > > > > This also naturally raises the question "How can we let userspace get
> > > > > accurate time without forcing a timer tick?".
> > > > 
> > > > We don't.  ;-)
> > > 
> > > We don't currently, hence my question about how we can. :)
> > 
> > Per-CPU atomic clocks?
> 
> Great idea, who needs timekeeping code. 
> 
> http://www.euronews.com/2013/04/02/swiss-sets-sights-on-miniscule-atomic-clock/

"in theory you’ll only need to set it once every 3,000 years, providing
of course your battery lasts that long" ;-) ;-) ;-)

Thanx, Paul


Re: [PATCH RFC ticketlock] v3 Auto-queued ticketlock

2013-07-01 Thread Paul E. McKenney

On Mon, Jul 01, 2013 at 02:49:34PM +0530, Raghavendra KT wrote:
> On Sun, Jun 23, 2013 at 11:23 PM, Raghavendra KT
>  wrote:
> >
> >
> > On Wed, Jun 12, 2013 at 9:10 PM, Paul E. McKenney
> >  wrote:
> >>
> >> Breaking up locks is better than implementing high-contention locks, but
> >> if we must have high-contention locks, why not make them automatically
> >> switch between light-weight ticket locks at low contention and queued
> >> locks at high contention?  After all, this would remove the need for
> >> the developer to predict which locks will be highly contended.
> >>
> >> This commit allows ticket locks to automatically switch between pure
> >> ticketlock and queued-lock operation as needed.  If too many CPUs are
> >> spinning on a given ticket lock, a queue structure will be allocated
> >> and the lock will switch to queued-lock operation.  When the lock becomes
> >> free, it will switch back into ticketlock operation.  The low-order bit
> >> of the head counter is used to indicate that the lock is in queued mode,
> >> which forces an unconditional mismatch between the head and tail counters.
> >> This approach means that the common-case code path under conditions of
> >> low contention is very nearly that of a plain ticket lock.
> >>
> >> A fixed number of queueing structures is statically allocated in an
> >> array.  The ticket-lock address is used to hash into an initial element,
> >> but if that element is already in use, it moves to the next element.  If
> >> the entire array is already in use, continue to spin in ticket mode.
> >>
> >> Signed-off-by: Paul E. McKenney 
> >> [ paulmck: Eliminate duplicate code and update comments (Steven Rostedt).
> >> ]
> >> [ paulmck: Address Eric Dumazet review feedback. ]
> >> [ paulmck: Use Lai Jiangshan idea to eliminate smp_mb(). ]
> >> [ paulmck: Expand ->head_tkt from s32 to s64 (Waiman Long). ]
> >> [ paulmck: Move cpu_relax() to main spin loop (Steven Rostedt). ]
> >> [ paulmck: Reduce queue-switch contention (Waiman Long). ]
> >> [ paulmck: __TKT_SPIN_INC for __ticket_spin_trylock() (Steffen Persvold).
> >> ]
> >> [ paulmck: Type safety fixes (Steven Rostedt). ]
> >> [ paulmck: Pre-check cmpxchg() value (Waiman Long). ]
> >> [ paulmck: smp_mb() downgrade to smp_wmb() (Lai Jiangshan). ]
> >>
> > [...]
> >
> > I did test this on 32 core machine with 32 vcpu guests.
> >
> > This version gave me around 20% improvement for sysbench and 36% improvement
> > for ebizzy, for 1x commit, though other overcommitted results showed
> > degradation. I have not tested Lai Jiangshan's patches on top of this yet.
> > Will report any findings.
> 
> Sorry for late report.

Not a problem, thank you for running these numbers!

> With Lai's patch I see a few percent of improvement in ebizzy 1x and
> a reduction in the degradation in dbench 1x.

OK, good!  But my guess is that even pushing the lock-acquisition
slowpath out of line, we still would not reach parity for the less-good
results.  Still seems like I should add Lai Jiangshan's patches
and post them somewhere in case they are helpful in some other context.

Thanx, Paul

> But the over-commit degradation still seems to persist. Seeing this, I
> feel it is more the qmode overhead itself for large guests:
> 
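> +------+-------------+------------+-------------+------------+--------------+
>                    ebizzy (rec/sec, higher is better)
> +------+-------------+------------+-------------+------------+--------------+
>          base          stdev        patched       stdev        %improvement
> +------+-------------+------------+-------------+------------+--------------+
>   1x      5574.9000     237.4997     7851.9000     148.6737      40.84378
>   2x      2741.5000     561.3090     1620.9000     410.8299     -40.87543
>   3x      2146.2500     216.7718     1751.8333      96.5023     -18.37702
> +------+-------------+------------+-------------+------------+--------------+
> +------+-------------+------------+-------------+------------+--------------+
>                    dbench (throughput, higher is better)
> +------+-------------+------------+-------------+------------+--------------+
>          base          stdev        patched       stdev        %improvement
> +------+-------------+------------+-------------+------------+--------------+
>   1x     14111.5600     754.4525    13826.5700    1458.0744      -2.01955
>   2x      2481.6270      71.2665     1549.3740     245.3777     -37.56620
>   3x      1510.2483      31.8634     1116.0158      26.4882     -26.10382
> +------+-------------+------------+-------------+------------+--------------+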
> 
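
For reading the tables above: the %improvement column is consistent
with (patched - base) / base * 100, which reproduces every row as
posted. A minimal sketch of that arithmetic, where the function name is
made up for illustration and the formula is inferred from the numbers
rather than taken from the benchmark scripts:

```c
#include <assert.h>

/*
 * Relative change of 'patched' against 'base', in percent.
 * Positive means the patched kernel scored higher.
 */
static double percent_improvement(double base, double patched)
{
	assert(base != 0.0);	/* a zero baseline has no relative change */
	return (patched - base) / base * 100.0;
}
```

For the ebizzy 1x row, percent_improvement(5574.9, 7851.9) gives about
40.84, and the 2x row gives about -40.88.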


Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Mike Galbraith

On Tue, 2013-07-02 at 12:43 +0800, Michael Wang wrote: 
> Since RFC:
>   Tested again with the latest tip 3.10.0-rc7.
> 
> The wake-affine code always tries to pull the wakee close to the waker. In
> theory, this helps if the waker's cpu has cached data that is hot for the
> wakee, or in the extreme ping-pong case.
> 
> And testing shows it can benefit hackbench by up to 15%.

How much does this still help with Alex's patches integrated?

aside: were I a maintainer, I'd be a little concerned that what this
helps with collides somewhat with the ongoing numa work.

-Mike


Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Michael Wang

On 07/02/2013 01:38 PM, Mike Galbraith wrote:
> On Tue, 2013-07-02 at 12:43 +0800, Michael Wang wrote:
> 
>> +static int nasty_pull(struct task_struct *p)
>> +{
>> +int factor = cpumask_weight(cpu_online_mask);
>> +
>> +/*
>> + * Yeah, it's the switching-frequency, could means many wakee or
>> + * rapidly switch, use factor here will just help to automatically
>> + * adjust the loose-degree, so more cpu will lead to more pull.
>> + */
>> +if (p->nr_wakee_switch > factor) {
>> +/*
>> + * wakee is somewhat hot, it needs certain amount of cpu
>> + * resource, so if waker is far more hot, prefer to leave
>> + * it alone.
>> + */
>> +if (current->nr_wakee_switch > (factor * p->nr_wakee_switch))
>> +return 1;
>> +}
>> +
>> +return 0;
>> +}
> 
> Ew.  I haven't gotten around to test-driving this patchlet, and I see
> you haven't gotten around to finding a better name either.  Any other
> name will likely have a better chance of flying.

Trust me, I've tried to get a good name...and some cells in my brain were
bravely sacrificed for it ;-)

> 
> tasks_related()
> ...
> well, nearly any..
> tasks_think_wake_affine_sucks_rocks()
> ..that won't fly either :)

Hmm...better than those in my mind (like dragon_wake_affine(), well...at
least dragon could fly).

Anyway, if the idea itself becomes acceptable, then any name is ok for
me, let's figure out a good one at that time :)
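
For readers following the naming discussion, the check in the quoted
nasty_pull() can be sketched in user space as below. This is only an
illustration under assumptions: the function and parameter names are
made up, nr_cpus stands in for cpumask_weight(cpu_online_mask), and the
two counters stand in for current->nr_wakee_switch and
p->nr_wakee_switch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true when pulling the wakee towards the waker looks harmful
 * under the rule in the quoted patch.
 */
static bool pull_looks_harmful(unsigned int nr_cpus,
			       unsigned long waker_switches,
			       unsigned long wakee_switches)
{
	assert(nr_cpus > 0);

	/* The wakee must switch wakees more often than there are CPUs. */
	if (wakee_switches <= nr_cpus)
		return false;

	/* The waker must exceed the wakee's count by a factor of nr_cpus. */
	return waker_switches > (unsigned long)nr_cpus * wakee_switches;
}
```

With 4 CPUs and a wakee count of 5, a waker count of 21 trips the check
and a waker count of 20 does not.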

Regards,
Michael Wang


> 
> -Mike
> 


Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting

2013-07-01 Thread Zheng Liu

On Mon, Jul 01, 2013 at 09:43:29PM -0700, Dave Hansen wrote:
> On 07/01/2013 07:37 PM, Zheng Liu wrote:
> > FWIW, it would be great if the MAP_POPULATE flag supported shared
> > mappings, because in our production system there are a lot of applications
> > that use mmap(2) and then pre-fault the mapping.  Currently these
> > applications need to pre-fault the mapping manually.
> 
> Are you sure it doesn't?  From a cursory look at the code, it looked to
> me like it would populate anonymous and file-backed, but I didn't
> double-check experimentally.

Thanks for pointing it out. I wrote a program to test this, and it
seems to me that MAP_POPULATE can populate a shared mapping.  But the man
page describes it as follows:

MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping.  For a file mapping,
this causes read-ahead on the file.  Later accesses to the mapping
will not be blocked by page faults.  MAP_POPULATE is only supported
for private mappings since Linux 2.6.23.

This page is part of release 3.24 of the Linux man-pages project.  I am
not sure whether it has been updated or not.
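
A test along these lines can be sketched as below. This is not the
program mentioned above, only a minimal illustration under assumptions:
it uses a shared anonymous mapping to stay self-contained (a file-backed
shared mapping would need a real file), the function name is made up,
and mincore(2) reports residency in RAM, which is only a rough proxy
for page tables having been populated.

```c
#define _GNU_SOURCE	/* MAP_ANONYMOUS, MAP_POPULATE, mincore() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map 'npages' of shared anonymous memory with MAP_POPULATE and ask
 * mincore() how many of those pages are resident right after mmap()
 * returns. Returns the resident count, or -1 on error.
 */
static long resident_after_populate(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	long resident = 0;
	size_t len;
	void *map;

	assert(npages > 0);
	if (page <= 0)
		return -1;
	len = npages * (size_t)page;

	vec = malloc(npages);	/* one status byte per page */
	if (!vec)
		return -1;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
	if (map == MAP_FAILED) {
		free(vec);
		return -1;
	}

	if (mincore(map, len, vec) == 0) {
		/* Bit 0 of each byte is set when the page is resident. */
		for (size_t i = 0; i < npages; i++)
			resident += vec[i] & 1;
	} else {
		resident = -1;
	}

	munmap(map, len);
	free(vec);
	return resident;
}
```

If the returned count equals npages without the program having touched
the mapping, the shared mapping was pre-faulted at mmap() time.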

Regards,
- Zheng

RE: linux-next: manual merge of the arm-soc tree with the l2-mtd tree

2013-07-01 Thread Gupta, Pekon

> 
> Hi all,
> 
> Today's linux-next merge of the arm-soc tree got a conflict in
> Documentation/devicetree/bindings/mtd/gpmc-nand.txt between commits
> 6c88058ef927 ("ARM: OMAP2+: cleaned-up DT support of various ECC
> schemes") and 212012138deb ("mtd: nand: omap2: updated support for
> BCH4
> ECC scheme") from the l2-mtd tree and commit 496c8a0bbb72 ("ARM:
> OMAP2+:
> Allow NAND transfer mode to be specified in DT") from the arm-soc tree.
> 
> I fixed it up (maybe - see below) and can carry the fix as necessary (no
> action is required).
> 
> --
> Cheers,
> Stephen Rothwell                    s...@canb.auug.org.au
> 
Yes, the following merge is correct. Apologies: multiple OMAP2 NAND
and GPMC updates and clean-ups were going into different trees, which
caused these conflicts. Going forward you shouldn't find such issues, as
the code is more stable now. Thanks for the help.

with regards, pekon

> diff --cc Documentation/devicetree/bindings/mtd/gpmc-nand.txt
> index b3f23df,df338cb..000
> --- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
> +++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
> @@@ -17,59 -17,27 +17,66 @@@ Required properties
> 
>   Optional properties:
> 
>  - - nand-bus-width:  Set this numeric value to 16 if the hardware
>  -                    is wired that way. If not specified, a bus
>  -                    width of 8 is assumed.
>  + - nand-bus-width:  Determines data-width of the connected device
>  +                    x16 = "16"
>  +                    x8  = "8" (default)
> 
>  - - ti,nand-ecc-opt: A string setting the ECC layout to use. One of:
> 
>  -    "sw"            Software method (default)
>  -    "hw"            Hardware method
>  -    "hw-romcode"    gpmc hamming mode method & romcode layout
>  -    "bch4"          4-bit BCH ecc code
>  -    "bch8"          8-bit BCH ecc code
>  + - ti,nand-ecc-opt: Determines the ECC scheme used by driver.
>  +                    It can be any of the following strings:
>  +
>  +    "hamming_code_sw"               1-bit Hamming ECC
>  +        - ECC calculation in software
>  +        - Error detection in software
>  +        - ECC layout compatible with S/W scheme
>  +
>  +    "hamming_code_hw"               1-bit Hamming ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in software
>  +        - ECC layout compatible with S/W scheme
>  +
>  +    "hamming_code_hw_romcode"       1-bit Hamming ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in software
>  +        - ECC layout compatible with ROM code
>  +
>  +    "bch4_code_hw_detection_sw"     4-bit BCH ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in software
>  +        - ECC layout compatible with S/W scheme
>  +        * depends on CONFIG_MTD_NAND_ECC_BCH
>  +
>  +    "bch4_code_hw"                  4-bit BCH ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in hardware
>  +        - ECC layout compatible with ROM code
>  +        * depends on CONFIG_MTD_NAND_OMAP_BCH
>  +        * requires  to be specified
>  +
>  +    "bch8_code_hw_detection_sw"     8-bit BCH ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in software
>  +        - ECC layout compatible with S/W scheme
>  +        * depends on CONFIG_MTD_NAND_ECC_BCH
>  +
>  +    "bch8_code_hw"                  8-bit BCH ECC
>  +        - ECC calculation in hardware
>  +        - Error detection in hardware
>  +        - ECC layout compatible with ROM code
>  +        * depends on CONFIG_MTD_NAND_OMAP_BCH
>  +        * requires  to be specified
> 
> +  - ti,nand-xfer-type:   A string setting the data transfer type. One of:
> +
> +     "prefetch-polled"   Prefetch polled mode (default)
> +     "polled"            Polled mode, without prefetch
> +     "prefetch-dma"      Prefetch enabled sDMA mode
> +     "prefetch-irq"      Prefetch enabled irq mode
> +
>  - - elm_id:  Specifies elm device node. This is required to support BCH
>  -e

[PATCH 1/3] mds: update atime if client can read.

2013-07-01 Thread majianpeng

Currently the mds updates atime only when CEPH_CAP_FILE_EXCL is dirty.
Change this to update atime when CEPH_CAP_FILE_RD is dirty.

Signed-off-by: Jianpeng Ma 
---
 src/mds/Locker.cc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/mds/Locker.cc b/src/mds/Locker.cc
index 30e014a..58f953f 100644
--- a/src/mds/Locker.cc
+++ b/src/mds/Locker.cc
@@ -2676,8 +2676,9 @@ void Locker::_do_snap_update(CInode *in, snapid_t snap, 
int dirty, snapid_t foll
 void Locker::_update_cap_fields(CInode *in, int dirty, MClientCaps *m, inode_t 
*pi)
 {
   // file
+  utime_t atime = m->get_atime();
+
   if (dirty & (CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_WR)) {
-utime_t atime = m->get_atime();
 utime_t mtime = m->get_mtime();
 utime_t ctime = m->get_ctime();
 uint64_t size = m->get_size();
@@ -2700,11 +2701,7 @@ void Locker::_update_cap_fields(CInode *in, int dirty, 
MClientCaps *m, inode_t *
   pi->size = size;
   pi->rstat.rbytes = size;
 }
-if ((dirty & CEPH_CAP_FILE_EXCL) && atime != pi->atime) {
-  dout(7) << "  atime " << pi->atime << " -> " << atime
- << " for " << *in << dendl;
-  pi->atime = atime;
-}
+
 if ((dirty & CEPH_CAP_FILE_EXCL) &&
ceph_seq_cmp(pi->time_warp_seq, m->get_time_warp_seq()) < 0) {
   dout(7) << "  time_warp_seq " << pi->time_warp_seq << " -> " << 
m->get_time_warp_seq()
@@ -2712,6 +2709,12 @@ void Locker::_update_cap_fields(CInode *in, int dirty, 
MClientCaps *m, inode_t *
   pi->time_warp_seq = m->get_time_warp_seq();
 }
   }
+  
+  if ((dirty & CEPH_CAP_FILE_RD) && atime > pi->atime) {
+  dout(7) << "  atime " << pi->atime << " -> " << atime
+ << " for " << *in << dendl;
+  pi->atime = atime;
+  }
   // auth
   if (dirty & CEPH_CAP_AUTH_EXCL) {
 if (m->head.uid != pi->uid) {
-- 
1.8.3.rc1.44.gb387c77

[PATCH 2/3] ceph: update atime after read-operation.

2013-07-01 Thread majianpeng

Currently ceph does not update atime after a read operation if the open
mode is CEPH_CAP_FILE_RD. There are two reasons:
1: the fs client does not set the dirty cap for CEPH_CAP_FILE_RD.
2: the mds only updates atime if the condition
"dirty & (CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_WR)" is true.
But if we can read, we can update atime. This patch only modifies the
client to support this.

Signed-off-by: Jianpeng Ma 
---
 fs/ceph/file.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 87df15a..9daea70 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -672,6 +672,15 @@ again:
 out:
dout("aio_read %p %llx.%llx dropping cap refs on %s = %d\n",
 inode, ceph_vinop(inode), ceph_cap_string(got), (int)ret);
+
+   if (ret >= 0) {
+   int dirty;
+   spin_lock(&ci->i_ceph_lock);
+   dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_RD);
+   spin_unlock(&ci->i_ceph_lock);
+   if (dirty)
+   __mark_inode_dirty(inode, dirty);
+   }
ceph_put_cap_refs(ci, got);
 
if (checkeof && ret >= 0) {
-- 
1.8.1.2

linux-next: manual merge of the renesas tree with the arm tree

2013-07-01 Thread Stephen Rothwell

Hi Simon,

Today's linux-next merge of the renesas tree got a conflict in
arch/arm/mach-shmobile/Kconfig between commit fb521a0da155 ("arm: fix up
ARM_ARCH_TIMER selects") from the arm tree and commits 462972da5f18
("ARM: shmobile: Make r8a7790 Arch timer optional") and 39d97587d6cb
("ARM: shmobile: Make r8a73a4 Arch timer optional") from the renesas tree.

I fixed it up (the latter patches removed the "select"s that the former
modified) and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



[PATCH 3/3] ceph: For ceph_sync_read, update the atime of file.

2013-07-01 Thread majianpeng

For buffered reads, generic_file_aio_read() updates the atime of the
file, but ceph_sync_read() does not. Add the update.

Signed-off-by: Jianpeng Ma 
---
 fs/ceph/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 656e169..87df15a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -442,6 +442,9 @@ done:
ceph_put_page_vector(pages, num_pages, true);
else
ceph_release_page_vector(pages, num_pages);
+
+   file_accessed(file);
+
dout("sync_read result %d\n", ret);
return ret;
 }
-- 
1.8.1.2

[PATCH 0/3] implement of updating atime for client has CEPH_CAP_FILE_RD

2013-07-01 Thread majianpeng

Currently the atime of a file is only updated for a client that has
CEPH_CAP_FILE_EXCL. But anyone who can read a file can update this attribute.
This feature needs changes in both the client and the mds.
PATCH1 modifies the mds to support it.
PATCH2 modifies the client to support it.
PATCH3 supports this feature for sync_read mode.

Jianpeng Ma (3):
 mds: update atime if client can read
  ceph: update atime after read-operation.
  ceph: For ceph_sync_read, update the atime of file.

 fs/ceph/file.c | 12 
 1 file changed, 12 insertions(+)

-- 
1.8.1.2

RE: [PATCH 2/4] davinci: da8xx/omap-l1: Remove hard coding of rtc device wakeup

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 11:10:14, Nori, Sekhar wrote:
> 
> On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> > Since now rtc-omap driver itself calls device_init_wakeup(dev, true),
> > duplicate call from the rtc device registration can be removed.
> > 
> > This is basically a partial revert of the prev commit
> > 
> > commit 75c99bb0006ee065b4e2995078d779418b0fab54
> > Author: Sekhar Nori 
> > 
> > davinci: da8xx/omap-l1: mark RTC as a wakeup source
> > 
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: Russell King 
> 
> Subject line should be prefixed with ARM: keeping with arch/arm
> convention. Otherwise looks good.

Will fix it in v2.

> 
> Acked-by: Sekhar Nori 

Thanks for the review.

> 
> Thanks,
> Sekhar
> 


Regards, 
Gururaja

Re: [PATCH 2/4] davinci: da8xx/omap-l1: Remove hard coding of rtc device wakeup

2013-07-01 Thread Sekhar Nori


On 6/28/2013 3:05 PM, Hebbar Gururaja wrote:
> Since now rtc-omap driver itself calls device_init_wakeup(dev, true),
> duplicate call from the rtc device registration can be removed.
> 
> This is basically a partial revert of the prev commit
> 
> commit 75c99bb0006ee065b4e2995078d779418b0fab54
> Author: Sekhar Nori 
> 
> davinci: da8xx/omap-l1: mark RTC as a wakeup source
> 
> Signed-off-by: Hebbar Gururaja 
> Cc: Sekhar Nori 
> Cc: Kevin Hilman 
> Cc: Russell King 

Subject line should be prefixed with ARM: keeping with arch/arm
convention. Otherwise looks good.

Acked-by: Sekhar Nori 

Thanks,
Sekhar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sched: smart wake-affine

2013-07-01 Thread Mike Galbraith

On Tue, 2013-07-02 at 12:43 +0800, Michael Wang wrote:

> +static int nasty_pull(struct task_struct *p)
> +{
> +	int factor = cpumask_weight(cpu_online_mask);
> +
> +	/*
> +	 * Yeah, it's the switching-frequency, could means many wakee or
> +	 * rapidly switch, use factor here will just help to automatically
> +	 * adjust the loose-degree, so more cpu will lead to more pull.
> +	 */
> +	if (p->nr_wakee_switch > factor) {
> +		/*
> +		 * wakee is somewhat hot, it needs certain amount of cpu
> +		 * resource, so if waker is far more hot, prefer to leave
> +		 * it alone.
> +		 */
> +		if (current->nr_wakee_switch > (factor * p->nr_wakee_switch))
> +			return 1;
> +	}
> +
> +	return 0;
> +}

Ew.  I haven't gotten around to test-driving this patchlet, and I see
you haven't gotten around to finding a better name either.  Any other
name will likely have a better chance of flying.

tasks_related()
...
well, nearly any..
tasks_think_wake_affine_sucks_rocks()
..that won't fly either :)

-Mike
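
The check in the quoted hunk can be modelled in userspace as a minimal
sketch. It assumes plain unsigned counters in place of the task_struct
fields, and the function and parameter names (skip_affine_pull,
waker_switches, wakee_switches, nr_online_cpus) are hypothetical; they
are not taken from the patch. A true result corresponds to the quoted
function returning 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the quoted check. All names here are hypothetical:
 * in the patch the counters are p->nr_wakee_switch and
 * current->nr_wakee_switch, and the factor comes from
 * cpumask_weight(cpu_online_mask).
 */
static bool skip_affine_pull(unsigned int waker_switches,
			     unsigned int wakee_switches,
			     unsigned int nr_online_cpus)
{
	assert(nr_online_cpus > 0);

	/* The wakee rarely switches partners: nothing argues against a pull. */
	if (wakee_switches <= nr_online_cpus)
		return false;

	/* Widen to 64 bits so the product cannot overflow. */
	return (unsigned long long)waker_switches >
	       (unsigned long long)nr_online_cpus * wakee_switches;
}
```

With 4 online CPUs and a wakee count of 5, the model returns true only
once the waker's count exceeds 20, so a larger machine raises the bar.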


RE: [PATCH V2 1/1] mwifiex: add tx info to skb when forming mgmt frame

2013-07-01 Thread Bing Zhao

Hi Harvey,

> From: Huawei Yang 
> 
> In function 'mwifiex_write_data_complete' it need tx info to find the
> mwifiex_private to updates statistics and wake up tx queues.
> Or we may trigger tx queues timeout when transmitting lots of mgmt frames.
> 
> Signed-off-by: Huawei Yang 
> ---
>  drivers/net/wireless/mwifiex/cfg80211.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/wireless/mwifiex/cfg80211.c
> b/drivers/net/wireless/mwifiex/cfg80211.c
> index e42b266..b4e2538 100644
> --- a/drivers/net/wireless/mwifiex/cfg80211.c
> +++ b/drivers/net/wireless/mwifiex/cfg80211.c
> @@ -186,6 +186,7 @@ mwifiex_cfg80211_mgmt_tx(struct wiphy *wiphy,
> struct wireless_dev *wdev,
>   struct sk_buff *skb;
>   u16 pkt_len;
>   const struct ieee80211_mgmt *mgmt;
> + struct mwifiex_txinfo *tx_info;
>   struct mwifiex_private *priv = mwifiex_netdev_get_priv(wdev-
> >netdev);
> 
>   if (!buf || !len) {
> @@ -212,6 +213,10 @@ mwifiex_cfg80211_mgmt_tx(struct wiphy *wiphy,
> struct wireless_dev *wdev,
>   wiphy_err(wiphy, "allocate skb failed for management
> frame\n");
>   return -ENOMEM;
>   }
> +

The checkpatch.pl script reports a whitespace-damage error here.
I can fix it in my local tree and resend v3 to John after the 3.11 merge window.

Thanks,
Bing

> + tx_info = MWIFIEX_SKB_TXCB(skb);
> + tx_info->bss_num = priv->bss_num;
> + tx_info->bss_type = priv->bss_type;
> 
>   mwifiex_form_mgmt_frame(skb, buf, len);
>   mwifiex_queue_tx_pkt(priv, skb);
> --
> 1.7.10.4


RE: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag

2013-07-01 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
> Chen
> Sent: Thursday, June 20, 2013 1:23 PM
> To: b...@kernel.crashing.org
> Cc: linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag
> 
> We need to introduce a flag to indicate we're already running
> a kexec kernel then we can go proper path. For example, We
> shouldn't access spin_table from the bootloader to up any secondary
> cpu for kexec kernel, and kexec kernel already know how to jump to
> generic_secondary_smp_init.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  arch/powerpc/include/asm/smp.h|3 +++
>  arch/powerpc/kernel/head_64.S |   12 
>  arch/powerpc/kernel/misc_64.S |6 ++
>  arch/powerpc/platforms/85xx/smp.c |   14 ++
>  4 files changed, 35 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index ffbaabe..fbc3d9b 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -200,6 +200,9 @@ extern void generic_secondary_thread_init(void);
>  extern unsigned long __secondary_hold_spinloop;
>  extern unsigned long __secondary_hold_acknowledge;
>  extern char __secondary_hold;
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> +extern unsigned long __run_at_kexec;
> +#endif
> 
>  extern void __early_start(void);
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 3e19ba2..ffa4b18 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -89,6 +89,12 @@ __secondary_hold_spinloop:
>  __secondary_hold_acknowledge:
>   .llong  0x0
> 
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> + .globl  __run_at_kexec
> +__run_at_kexec:
> + .llong  0x0 /* Flag for the secondary kernel from kexec. */
> +#endif
> +
>  #ifdef CONFIG_RELOCATABLE
>   /* This flag is set to 1 by a loader if the kernel should run
>* at the loaded address instead of the linked address.  This
> @@ -417,6 +423,12 @@ _STATIC(__after_prom_start)
>  #if defined(CONFIG_PPC_BOOK3E)
>   tovirt(r26,r26) /* on booke, we already run at
> PAGE_OFFSET */
>  #endif
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> + /* If relocated we need to restore this flag on that relocated address. 
> */
> + ld  r7,__run_at_kexec-_stext(r26)
> + std r7,__run_at_kexec-_stext(r26)
> +#endif
> +
>   lwz r7,__run_at_load-_stext(r26)
>  #if defined(CONFIG_PPC_BOOK3E)
>   tophys(r26,r26) /* Restore for the remains. */
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 20cbb98..c89aead 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -619,6 +619,12 @@ _GLOBAL(kexec_sequence)
>   bl  .copy_and_flush /* (dest, src, copy limit, start offset) */
>  1:   /* assume normal blr return */
> 
> + /* notify we're going into kexec kernel for SMP. */
> + LOAD_REG_ADDR(r3,__run_at_kexec)
> + li  r4,1
> + std r4,0(r3)
> + sync
> +
>   /* release other cpus to the new kernel secondary start at 0x60 */
>   mflrr5
>   li  r6,1
> diff --git a/arch/powerpc/platforms/85xx/smp.c
> b/arch/powerpc/platforms/85xx/smp.c
> index 6a17599..b308373 100644
> --- a/arch/powerpc/platforms/85xx/smp.c
> +++ b/arch/powerpc/platforms/85xx/smp.c
> @@ -150,6 +150,9 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
>   int hw_cpu = get_hard_smp_processor_id(nr);
>   int ioremappable;
>   int ret = 0;
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> + unsigned long *ptr;
> +#endif

What if we remove the #ifdef around *ptr ...

> 
>   WARN_ON(nr < 0 || nr >= NR_CPUS);
>   WARN_ON(hw_cpu < 0 || hw_cpu >= NR_CPUS);
> @@ -238,11 +241,22 @@ out:
>  #else
>   smp_generic_kick_cpu(nr);
> 
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> + ptr  = (unsigned long *)((unsigned long)&__run_at_kexec);

... #endif here ...

> + /* We shouldn't access spin_table from the bootloader to up any
> +  * secondary cpu for kexec kernel, and kexec kernel already
> +  * know how to jump to generic_secondary_smp_init.
> +  */
> + if (!*ptr) {
> +#endif

... remove #endif ...

>   flush_spin_table(spin_table);
>   out_be32(&spin_table->pir, hw_cpu);
>   out_be64((u64 *)(&spin_table->addr_h),
> __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
>   flush_spin_table(spin_table);
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> + }
> +#endif

--- remove above 3 lines

-Bharat

>  #endif
> 
>   local_irq_restore(flags);
> --
> 1.7.9.5
> 
> ___
> Linuxppc-dev
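
One way to read the review comments above is to keep the config test in
a single helper, so that the call site needs no #ifdef at all. The
sketch below is a standalone model under that assumption, not the
reviewer's exact proposal. DEMO_CONFIG_KEXEC, demo_run_at_kexec,
demo_booted_via_kexec() and demo_kick_cpu() are hypothetical names that
stand in for CONFIG_KEXEC/CONFIG_CRASH_DUMP, __run_at_kexec and the
spin-table path of smp_85xx_kick_cpu().

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_KEXEC || CONFIG_CRASH_DUMP. */
#ifndef DEMO_CONFIG_KEXEC
#define DEMO_CONFIG_KEXEC 1
#endif

#if DEMO_CONFIG_KEXEC
/* Models the __run_at_kexec flag set by the previous kernel. */
static unsigned long demo_run_at_kexec;

static inline bool demo_booted_via_kexec(void)
{
	return demo_run_at_kexec != 0;
}
#else
/* Without kexec support the answer is a compile-time constant. */
static inline bool demo_booted_via_kexec(void)
{
	return false;
}
#endif

/* Counts how often the modelled spin-table path runs. */
static int demo_spin_table_writes;

static void demo_kick_cpu(void)
{
	/* A kexec'ed kernel must not touch the bootloader's spin table. */
	if (!demo_booted_via_kexec())
		demo_spin_table_writes++;
}
```

When DEMO_CONFIG_KEXEC is 0 the helper folds to false, so the compiler
can drop the test and the caller stays free of conditionals.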

[PATCH] Add hsize argument in write_buf call of pstore_ftrace_call

2013-07-01 Thread Aruna Balakrishnaiah

Incorporate the addition of hsize argument in write_buf callback
of pstore.

Signed-off-by: Aruna Balakrishnaiah 
---

 fs/pstore/ftrace.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c
index 43b1280..76a4eeb 100644
--- a/fs/pstore/ftrace.c
+++ b/fs/pstore/ftrace.c
@@ -44,7 +44,7 @@ static void notrace pstore_ftrace_call(unsigned long ip,
rec.parent_ip = parent_ip;
pstore_ftrace_encode_cpu(&rec, raw_smp_processor_id());
psinfo->write_buf(PSTORE_TYPE_FTRACE, 0, NULL, 0, (void *)&rec,
- sizeof(rec), psinfo);
+ 0, sizeof(rec), psinfo);
 
local_irq_restore(flags);
 }


linux-next: manual merge of the arm-soc tree with the l2-mtd tree

2013-07-01 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
drivers/mtd/nand/Kconfig between commit 212012138deb ("mtd: nand: omap2:
updated support for BCH4 ECC scheme") from the l2-mtd tree and commit
930d800bded7 ("mtd: omap2: allow bulding as a module") from the arm-soc
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/mtd/nand/Kconfig
index a6e247c,50543f1..000
--- a/drivers/mtd/nand/Kconfig
+++ b/drivers/mtd/nand/Kconfig
@@@ -95,13 -95,35 +95,13 @@@ config MTD_NAND_OMAP
  
  config MTD_NAND_OMAP_BCH
depends on MTD_NAND && MTD_NAND_OMAP2 && ARCH_OMAP3
-   bool "Support hardware based BCH error correction"
 -  tristate "Enable support for hardware BCH error correction"
++  tristate "Support hardware based BCH error correction"
default n
select BCH
 -  select BCH_CONST_PARAMS
help
 -   Support for hardware BCH error correction.
 -
 -choice
 -  prompt "BCH error correction capability"
 -  depends on MTD_NAND_OMAP_BCH
 -
 -config MTD_NAND_OMAP_BCH8
 -  bool "8 bits / 512 bytes (recommended)"
 -  help
 -   Support correcting up to 8 bitflips per 512-byte block.
 -   This will use 13 bytes of spare area per 512 bytes of page data.
 -   This is the recommended mode, as 4-bit mode does not work
 -   on some OMAP3 revisions, due to a hardware bug.
 -
 -config MTD_NAND_OMAP_BCH4
 -  bool "4 bits / 512 bytes"
 -  help
 -   Support correcting up to 4 bitflips per 512-byte block.
 -   This will use 7 bytes of spare area per 512 bytes of page data.
 -   Note that this mode does not work on some OMAP3 revisions, due to a
 -   hardware bug. Please check your OMAP datasheet before selecting this
 -   mode.
 -
 -endchoice
 +Some devices have built-in ELM hardware engine, which can be used to
 +locate and correct errors when using BCH ECC scheme. This enables the
 +driver support for same.
  
  if MTD_NAND_OMAP_BCH
  config BCH_CONST_M



Re: [PATCH v3 10/45] smp: Use get/put_online_cpus_atomic() to prevent CPU offline

2013-07-01 Thread Michael Wang

Hi, Srivatsa

On 06/28/2013 03:54 AM, Srivatsa S. Bhat wrote:
[snip]
> @@ -625,8 +632,9 @@ EXPORT_SYMBOL(on_each_cpu_mask);
>   * The function might sleep if the GFP flags indicates a non
>   * atomic allocation is allowed.
>   *
> - * Preemption is disabled to protect against CPUs going offline but not 
> online.
> - * CPUs going online during the call will not be seen or sent an IPI.
> + * We use get/put_online_cpus_atomic() to protect against CPUs going
> + * offline but not online. CPUs going online during the call will
> + * not be seen or sent an IPI.

I am a little confused by this comment: if an offline CPU can still
come online during the call, then for_each_online_cpu() may pick it
up, right? Am I missing something?

Regards,
Michael Wang

>   *
>   * You must not call this function with disabled interrupts or
>   * from a hardware interrupt handler or from a bottom half handler.
> @@ -641,26 +649,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void 
> *info),
>   might_sleep_if(gfp_flags & __GFP_WAIT);
> 
>   if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
> - preempt_disable();
> + get_online_cpus_atomic();
>   for_each_online_cpu(cpu)
>   if (cond_func(cpu, info))
>   cpumask_set_cpu(cpu, cpus);
>   on_each_cpu_mask(cpus, func, info, wait);
> - preempt_enable();
> + put_online_cpus_atomic();
>   free_cpumask_var(cpus);
>   } else {
>   /*
>* No free cpumask, bother. No matter, we'll
>* just have to IPI them one by one.
>*/
> - preempt_disable();
> + get_online_cpus_atomic();
>   for_each_online_cpu(cpu)
>   if (cond_func(cpu, info)) {
>   ret = smp_call_function_single(cpu, func,
>   info, wait);
>   WARN_ON_ONCE(!ret);
>   }
> - preempt_enable();
> + put_online_cpus_atomic();
>   }
>  }
>  EXPORT_SYMBOL(on_each_cpu_cond);
> 


Re: linux-next: Tree for Jul 1 [ drm-intel-next: Several call-traces ]

2013-07-01 Thread Sedat Dilek

On Mon, Jul 1, 2013 at 11:03 AM, Sedat Dilek  wrote:
> On Mon, Jul 1, 2013 at 10:52 AM, Daniel Vetter  wrote:
>> On Mon, Jul 1, 2013 at 10:49 AM, Sedat Dilek  wrote:
>>> On Mon, Jul 1, 2013 at 9:59 AM, Stephen Rothwell  
>>> wrote:
>>>> Hi all,
>>>>
>>>> Changes since 20130628:
>>>>
>>>> The regulator tree gained a build failure so I used the version from
>>>> next-20130628.
>>>>
>>>> The trivial tree gained a conflict against the fbdev tree.
>>>>
>>>> The arm-soc tree gained a conflict against the net-next tree.
>>>>
>>>> The akpm tree lost a few patches that turned up elsewhere and I removed 2
>>>> that were causing run time problems.
>>>>
>>>
>>> [ CC drm and drm-intel folks ]
>>>
>>> [ Did not check any relevant MLs ]
>>>
>>> Please, see attached dmesg output.
>>
>> Clock mismatch, one for Jesse to figure out. Note that this patch is
>> for 3.12, I simply haven't yet gotten around to properly split my
>> patch queue so a few spilled into -next. I'll do that now.
>
> I like lightspeed-fast replies :-).
>
> Guess "drm/i915: get mode clock when reading the pipe config v9" [1]
> is the cause.
>

Problem solved by applying these patches to next-20130701 from
intel-gfx patchwork-service [0]:

   [1/2] drm/i915: fixup messages in pipe_config_compare
   [2/2] drm/i915: get clock config when checking CRTC state too

AFAICS 2/2 was folded into updated "drm/i915: get mode clock when
reading the pipe config v9" [3].

It would be kind to be CCed on the patches and get also some credits.
Also a CC to the report in linux-next should IMHO be done.

- Sedat -

[0] https://patchwork.kernel.org/project/intel-gfx/list/
[1] https://patchwork.kernel.org/patch/2809031/
[2] https://patchwork.kernel.org/patch/2809021/
[3] http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-nightly&id=f1f644dc66cbaf5a4c7dcde683361536b41885b9

> - Sedat -
>
> [1] http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-next-queued&id=d325d8b4f351f9d45e7c8baabf581fd21f343133
>
>> -Daniel
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

linux-next: manual merge of the arm-soc tree with the l2-mtd tree

2013-07-01 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the arm-soc tree got a conflict in
Documentation/devicetree/bindings/mtd/gpmc-nand.txt between commits
6c88058ef927 ("ARM: OMAP2+: cleaned-up DT support of various ECC
schemes") and 212012138deb ("mtd: nand: omap2: updated support for BCH4
ECC scheme") from the l2-mtd tree and commit 496c8a0bbb72 ("ARM: OMAP2+:
Allow NAND transfer mode to be specified in DT") from the arm-soc tree.

I fixed it up (maybe - see below) and can carry the fix as necessary (no
action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc Documentation/devicetree/bindings/mtd/gpmc-nand.txt
index b3f23df,df338cb..000
--- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
+++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
@@@ -17,59 -17,27 +17,66 @@@ Required properties
  
  Optional properties:
  
 - - nand-bus-width:Set this numeric value to 16 if the hardware
 -  is wired that way. If not specified, a bus
 -  width of 8 is assumed.
 + - nand-bus-width:Determines data-width of the connected device
 +  x16 = "16"
 +  x8  = "8" (default)
  
 - - ti,nand-ecc-opt:   A string setting the ECC layout to use. One of:
  
 -  "sw"Software method (default)
 -  "hw"Hardware method
 -  "hw-romcode"gpmc hamming mode method & romcode layout
 -  "bch4"  4-bit BCH ecc code
 -  "bch8"  8-bit BCH ecc code
 + - ti,nand-ecc-opt:   Determines the ECC scheme used by driver.
 +  It can be any of the following strings:
 +
 +  "hamming_code_sw"   1-bit Hamming ECC
 +  - ECC calculation in software
 +  - Error detection in software
 +  - ECC layout compatible with S/W scheme
 +
 +  "hamming_code_hw"   1-bit Hamming ECC
 +  - ECC calculation in hardware
 +  - Error detection in software
 +  - ECC layout compatible with S/W scheme
 +
 +  "hamming_code_hw_romcode"   1-bit Hamming ECC
 +  - ECC calculation in hardware
 +  - Error detection in software
 +  - ECC layout compatible with ROM code
 +
 +  "bch4_code_hw_detection_sw" 4-bit BCH ECC
 +  - ECC calculation in hardware
 +  - Error detection in software
 +  - ECC layout compatible with S/W scheme
 +  * depends on CONFIG_MTD_NAND_ECC_BCH
 +
 +  "bch4_code_hw"  4-bit BCH ECC
 +  - ECC calculation in hardware
 +  - Error detection in hardware
 +  - ECC layout compatible with ROM code
 +  * depends on CONFIG_MTD_NAND_OMAP_BCH
 +  * requires  to be specified
 +
 +  "bch8_code_hw_detection_sw" 8-bit BCH ECC
 +  - ECC calculation in hardware
 +  - Error detection in software
 +  - ECC layout compatible with S/W scheme
 +  * depends on CONFIG_MTD_NAND_ECC_BCH
 +
 +  "bch8_code_hw"  8-bit BCH ECC
 +  - ECC calculation in hardware
 +  - Error detection in hardware
 +  - ECC layout compatible with ROM code
 +  * depends on CONFIG_MTD_NAND_OMAP_BCH
 +  * requires  to be specified
  
+  - ti,nand-xfer-type: A string setting the data transfer type. One of:
+ 
+   "prefetch-polled"   Prefetch polled mode (default)
+   "polled"Polled mode, without prefetch
+   "prefetch-dma"  Prefetch enabled sDMA mode
+   "prefetch-irq"  Prefetch enabled irq mode
+ 
 - - elm_id:Specifies elm device node. This is required to support BCH
 -  error correction using ELM module.
 +
 + - elm_id:Specifies elm device node. This is required to
 +  support some BCH ECC schemes mentioned above.
 +
  
  For inline partiton table parsing (optional):
  



Re: [PATCH 0/6] Basic scheduler support for automatic NUMA balancing

2013-07-01 Thread Srikar Dronamraju

* Mel Gorman  [2013-07-01 09:43:21]:

> 
> Thanks. Each of the two runs had 5 iterations and there is a
> difference in the reported average. Do you know what the standard
> deviation is of the results?

Yes, the results were from 2 different runs.
I hadn't calculated the standard deviation for those runs.
> 
> I'm less concerned about the numa01 results as it is an adverse
> workload on machines with more than two sockets but the numa02 results
> are certainly of concern. My own testing for numa02 showed little or no
> change. Would you mind testing with "Increase NUMA PTE scanning when a
> new preferred node is selected" reverted please?
> 

Here are the results with the last patch reverted as requested by you.

KernelVersion: 3.9.0-mainline_v39+ your patches - last patch
           Testcase:      Min      Max      Avg   StdDev  %Change
             numa01:  1704.50  1841.82  1757.55    49.27    2.42%
numa01_THREAD_ALLOC:   433.25   517.07   464.17    28.15  -32.99%
             numa02:    55.64    61.75    57.70     2.19  -43.52%
         numa02_SMT:    44.78    53.45    48.72     2.91  -18.53%



Detailed run output here 

numa01 1704.50 248.67 71999.86 207091 1093
numa01_THREAD_ALLOC 461.62 416.89 23064.79 90283 961
numa02 61.75 93.86 2444.21 10652 6
numa02_SMT 46.79 23.13 977.94 1925 8
numa01 1769.09 262.00 74607.77 226677 1313
numa01_THREAD_ALLOC 433.25 365.12 21994.25 88597 773
numa02 55.64 89.52 2250.01 8848 210
numa02_SMT 49.39 19.81 938.86 1376 33
numa01 1841.82 407.73 78683.69 227428 1834
numa01_THREAD_ALLOC 517.07 465.71 26152.60 111689 978
numa02 55.95 103.26 2223.36 8471 158
numa02_SMT 53.45 19.73 962.08 1349 26
numa01 1760.41 474.74 76094.03 231278 2802
numa01_THREAD_ALLOC 456.80 395.35 23170.23 88049 835
numa02 57.18 87.31 2390.11 10804 3
numa02_SMT 44.78 26.48 944.28 1314 7
numa01 1711.91 421.49 77728.30 224185 2103
numa01_THREAD_ALLOC 452.09 430.88 22271.38 83418 2035
numa02 57.97 126.86 2354.34 8991 135
numa02_SMT 49.19 34.99 914.35 1308 22
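
The summary columns can be recomputed from the first number on each
line of the detailed output. The sketch below assumes that this first
number is the elapsed time per iteration and that StdDev is the
population form (divide by n), which matches the reported figures. The
names run_stats, summarize() and demo_sqrt() are hypothetical, and the
small Newton iteration is used only so that the example builds without
linking libm.

```c
#include <assert.h>
#include <stddef.h>

struct run_stats {
	double min, max, avg, stddev;
};

/* Newton iteration for sqrt(x), x >= 0; stands in for libm's sqrt(). */
static double demo_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;
	int i;

	for (i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/* Min, max, mean and population standard deviation of n samples. */
static struct run_stats summarize(const double *v, size_t n)
{
	struct run_stats s;
	double sum = 0.0, sq = 0.0;
	size_t i;

	assert(v != NULL && n > 0);

	s.min = s.max = v[0];
	for (i = 0; i < n; i++) {
		if (v[i] < s.min)
			s.min = v[i];
		if (v[i] > s.max)
			s.max = v[i];
		sum += v[i];
	}
	s.avg = sum / (double)n;

	/* Sum of squared deviations from the mean, divided by n. */
	for (i = 0; i < n; i++)
		sq += (v[i] - s.avg) * (v[i] - s.avg);
	s.stddev = demo_sqrt(sq / (double)n);

	return s;
}
```

Feeding it the five numa02 values (61.75, 55.64, 55.95, 57.18, 57.97)
gives a mean of about 57.70 and a deviation of about 2.19, in line with
the numa02 row of the table.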


> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
Thanks and Regards
Srikar Dronamraju


[GIT PULL] arch/arc updates for 3.11

2013-07-01 Thread Vineet Gupta

Hi Linus,

First batch of ARC changes for 3.11. Please pull.

There's a second bunch to follow next week - which depends on commits on other
trees (irq/net). I'd have preferred the accompanying ARC change via respective
trees, but it didn't workout somehow.

Thx,
-Vineet

--->
The following changes since commit 7d132055814ef17a6c7b69f342244c410a5e000f:

  Linux 3.10-rc6 (2013-06-15 11:51:07 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git/ tags/arc-v3.11-rc1-part1

for you to fetch changes up to baadb8fd0c62540f2ffb2d0f12b8a47c7975562b:

  ARC: warn on improper stack unwind FDE entries (2013-06-27 14:37:59 +0530)


ARC changes for 3.11

Highlights of changes:

-Continuation of ARC MM changes from 3.10 including
   zero page optimization;
   Setting pagecache pages dirty by default;
   Non executable stack by default;
   Reducing dcache flushes for aliasing VIPT config

-Long overdue rework of pt_regs machinery - removing the unused word gutters
 and adding ECR register to baseline (helps cleanup lot of low level code)

-Support for ARC gcc 4.8

-Few other preventive fixes, cosmetics, usage of Kconfig helper..

The diffstat is larger than normal primarily because of arcregs.h header split
as well as beautification of macros in entry.h


Alexey Brodkin (1):
  ARC: make dcache VIPT aliasing support dependant on dcache

Mischa Jonker (1):
  ARC: [plat-arcfpga] Fix build breakage when !CONFIG_ARC_SERIAL

Paul Gortmaker (1):
  arc: delete __cpuinit usage from all arc files

Vineet Gupta (29):
  ARC: Use kconfig helper IS_ENABLED() to get rid of defines.h
  ARC: More code beautification with IS_ENABLED()
  ARC: Disintegrate arcregs.h
  ARC: Reduce Code for ECR printing
  ARC: cache detection code bitrot
  ARC: No-op full icache flush if !CONFIG_ARC_HAS_ICACHE
  ARC: [mm] Zero page optimization
  ARC: [mm] optimise VIPT dcache aliasing 1/x
  ARC: [mm] optimise VIPT dcache aliasing 2/x
  ARC: [mm] Assume pagecache page dirty by default
  ARC: [mm] Make stack/heap Non-executable by default
  ARC: [mm] Remove @write argument to do_page_fault()
  ARC: pt_regs update #0: remove kernel stack canary
  ARC: pt_regs update #1: Align pt_regs end with end of kernel stack page
  ARC: pt_regs update #2: Remove unused gutter at start of pt_regs
  ARC: pt_regs update #3: Remove unused gutter at start of callee_regs
  ARC: Increase readability of entry handlers
  ARC: Entry Handler tweaks: Avoid hardcoded LIMMS for ECR values
  ARC: Entry Handler tweaks: Simplify branch for in-kernel preemption
  ARC: K/U SP saved from one location in stack switching macro
  ARC: pt_regs update #4: r25 saved/restored unconditionally
  ARC: stop using pt_regs->orig_r8
  ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values
  ARC: Remove explicit passing around of ECR
  ARC: Setup Vector Table Base in early boot
  ARC: Adjustments for gcc 4.8
  ARC: [tlb-miss] Extraneous PTE bit testing/setting
  ARC: [tlb-miss] Fix bug with CONFIG_ARC_DBG_TLB_MISS_COUNT
  ARC: warn on improper stack unwind FDE entries

 arch/arc/Kconfig|   8 +-
 arch/arc/Makefile   |  28 +-
 arch/arc/configs/fpga_defconfig |   2 +-
 arch/arc/configs/nsimosci_defconfig |   2 +-
 arch/arc/configs/tb10x_defconfig|   2 +-
 arch/arc/include/asm/arcregs.h  | 127 +
 arch/arc/include/asm/bug.h  |   5 +-
 arch/arc/include/asm/cache.h|  26 +-
 arch/arc/include/asm/cacheflush.h   |  13 +-
 arch/arc/include/asm/defines.h  |  56 
 arch/arc/include/asm/entry.h| 521 +++-
 arch/arc/include/asm/irq.h  |   2 +-
 arch/arc/include/asm/irqflags.h |  20 ++
 arch/arc/include/asm/kgdb.h |   4 +-
 arch/arc/include/asm/kprobes.h  |   6 +-
 arch/arc/include/asm/mmu.h  |  44 +++
 arch/arc/include/asm/page.h |   7 +-
 arch/arc/include/asm/pgtable.h  |   6 +
 arch/arc/include/asm/processor.h|  17 +-
 arch/arc/include/asm/ptrace.h   |  47 ++--
 arch/arc/include/asm/syscall.h  |   5 +-
 arch/arc/include/asm/tlb-mmu1.h |   4 +-
 arch/arc/include/asm/tlb.h  |  26 --
 arch/arc/include/asm/unaligned.h|   4 +-
 arch/arc/include/uapi/asm/ptrace.h  |  15 +-
 arch/arc/kernel/asm-offsets.c   |   7 +-
 arch/arc/kernel/ctx_sw.c|  14 +-
 arch/arc/kernel/entry.S | 103 +++
 arch/arc/kernel/head.S  |   2 +
 arch/arc/kernel/irq.c   |  16 +-
 arch/arc/kernel/kgdb.c  |   4 +-
 arch/arc/kernel/kprobes.c   |   5 +-
 arch/arc/kernel/process.c   |   9 +-
 arch/arc/kernel/ptrace.c|  14 +-
 arch/arc/kernel/set

RE: [PATCH 3/4] rtc: omap: add rtc wakeup support to alarm events

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 05:45:01, Kevin Hilman wrote:
> Hebbar Gururaja  writes:
> 
> > On some platforms (like AM33xx), a special register (RTC_IRQWAKEEN)
> > is available to enable Alarm Wakeup feature. This register needs to be
> > properly handled for the rtcwake to work properly.
> >
> > Platforms using such IP should set "ti,am3352-rtc" in rtc device dt
> > compatibility node.
> >
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Grant Likely 
> > Cc: Rob Herring 
> > Cc: Rob Landley 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: Alessandro Zummo 
> > Cc: rtc-li...@googlegroups.com
> > Cc: devicetree-disc...@lists.ozlabs.org
> > Cc: linux-...@vger.kernel.org
> 
> Acked-by: Kevin Hilman 
> 
> ...with a minor nit below...
> 
> > ---
> > :100644 100644 b47aa41... 5a0f02d... M  
> > Documentation/devicetree/bindings/rtc/rtc-omap.txt
> > :100644 100644 761919d... 666b0c2... M  drivers/rtc/rtc-omap.c
> >  Documentation/devicetree/bindings/rtc/rtc-omap.txt |6 ++-
> >  drivers/rtc/rtc-omap.c |   56 
> > +---
> >  2 files changed, 54 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/rtc/rtc-omap.txt 
> > b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
> > index b47aa41..5a0f02d 100644
> > --- a/Documentation/devicetree/bindings/rtc/rtc-omap.txt
> > +++ b/Documentation/devicetree/bindings/rtc/rtc-omap.txt
> > @@ -1,7 +1,11 @@
> >  TI Real Time Clock
> >  
> >  Required properties:
> > -- compatible: "ti,da830-rtc"
> > +- compatible:
> > +   - "ti,da830-rtc"  - for RTC IP used similar to that on DA8xx SoC family.
> > +   - "ti,am3352-rtc" - for RTC IP used similar to that on AM335x SoC 
> > family.
> > +   This RTC IP has special WAKE-EN Register to enable
> > +   Wakeup generation for event Alarm.
> >  - reg: Address range of rtc register set
> >  - interrupts: rtc timer, alarm interrupts in order
> >  - interrupt-parent: phandle for the interrupt controller
> > diff --git a/drivers/rtc/rtc-omap.c b/drivers/rtc/rtc-omap.c
> > index 761919d..666b0c2 100644
> > --- a/drivers/rtc/rtc-omap.c
> > +++ b/drivers/rtc/rtc-omap.c
> > @@ -72,6 +72,8 @@
> >  #define OMAP_RTC_KICK0_REG 0x6c
> >  #define OMAP_RTC_KICK1_REG 0x70
> >  
> > +#define OMAP_RTC_IRQWAKEEN 0x7C
> > +
> 
> nit: letters in hex numbers are usually lower-case.

Thanks for the review. V2 will soon follow.

> 
> 
> Kevin
> 


Regards, 
Gururaja

RE: [PATCH 2/4] davinci: da8xx/omap-l1: Remove hard coding of rtc device wakeup

2013-07-01 Thread Hebbar, Gururaja

On Tue, Jul 02, 2013 at 05:37:43, Kevin Hilman wrote:
> Hebbar Gururaja  writes:
> 
> > Since now rtc-omap driver itself calls device_init_wakeup(dev, true),
> > duplicate call from the rtc device registration can be removed.
> >
> > This is basically a partial revert of the prev commit
> >
> > commit 75c99bb0006ee065b4e2995078d779418b0fab54
> > Author: Sekhar Nori 
> >
> > davinci: da8xx/omap-l1: mark RTC as a wakeup source
> >
> > Signed-off-by: Hebbar Gururaja 
> > Cc: Sekhar Nori 
> > Cc: Kevin Hilman 
> > Cc: Russell King 
> >
> > ---
> > :100644 100644 bf57252... 85a900c... M  
> > arch/arm/mach-davinci/devices-da8xx.c
> >  arch/arm/mach-davinci/devices-da8xx.c |9 +
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/arch/arm/mach-davinci/devices-da8xx.c 
> > b/arch/arm/mach-davinci/devices-da8xx.c
> > index bf57252..85a900c 100644
> > --- a/arch/arm/mach-davinci/devices-da8xx.c
> > +++ b/arch/arm/mach-davinci/devices-da8xx.c
> > @@ -827,14 +827,7 @@ static struct platform_device da8xx_rtc_device = {
> >  
> >  int da8xx_register_rtc(void)
> >  {
> > -   int ret;
> > -
> > -   ret = platform_device_register(&da8xx_rtc_device);
> > -   if (!ret)
> > -   /* Atleast on DA850, RTC is a wakeup source */
> > -   device_init_wakeup(&da8xx_rtc_device.dev, true);
> > -
> > -   return ret;
> > +   return  platform_device_register(&da8xx_rtc_device);
> 
> nit: extra space between 'return' and 'platform_'

Thanks for the review. V2 will soon follow.

> 
> >  }
> >  
> >  static void __iomem *da8xx_ddr2_ctlr_base;
> 
> Otherwise,
> 
> Acked-by: Kevin Hilman 
> 
> 


Regards, 
Gururaja

RE: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel

2013-07-01 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
> Chen
> Sent: Thursday, June 20, 2013 1:23 PM
> To: b...@kernel.crashing.org
> Cc: linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel
> 
> We need to activate KEXEC for book3e and bypass or convert non-book3e stuff
> in kexec coverage.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  arch/powerpc/Kconfig                   |    2 +-
>  arch/powerpc/kernel/machine_kexec_64.c |    6 ++++++
>  arch/powerpc/kernel/misc_64.S          |    6 ++++++
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c33e3ad..6ecf3c9 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -364,7 +364,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
> 
>  config KEXEC
>   bool "kexec system call"
> - depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP))
> + depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E
>   help
> kexec is a system call that implements the ability to shutdown your
> current kernel, and to start another kernel.  It is like a reboot
> diff --git a/arch/powerpc/kernel/machine_kexec_64.c
> b/arch/powerpc/kernel/machine_kexec_64.c
> index 611acdf..ef39271 100644
> --- a/arch/powerpc/kernel/machine_kexec_64.c
> +++ b/arch/powerpc/kernel/machine_kexec_64.c
> @@ -33,6 +33,7 @@
>  int default_machine_kexec_prepare(struct kimage *image)
>  {
>   int i;
> +#ifndef CONFIG_PPC_BOOK3E
>   unsigned long begin, end;   /* limits of segment */
>   unsigned long low, high;/* limits of blocked memory range */
>   struct device_node *node;
> @@ -41,6 +42,7 @@ int default_machine_kexec_prepare(struct kimage *image)
> 
>   if (!ppc_md.hpte_clear_all)
>   return -ENOENT;
> +#endif

Do we really need this function for book3e? Can we have a separate function
rather than multiple confusing ifdefs?

-Bharat

> 
>   /*
>* Since we use the kernel fault handlers and paging code to
> @@ -51,6 +53,7 @@ int default_machine_kexec_prepare(struct kimage *image)
>   if (image->segment[i].mem < __pa(_end))
>   return -ETXTBSY;
> 
> +#ifndef CONFIG_PPC_BOOK3E
>   /*
>* For non-LPAR, we absolutely can not overwrite the mmu hash
>* table, since we are still using the bolted entries in it to
> @@ -92,6 +95,7 @@ int default_machine_kexec_prepare(struct kimage *image)
>   return -ETXTBSY;
>   }
>   }
> +#endif
> 
>   return 0;
>  }
> @@ -367,6 +371,7 @@ void default_machine_kexec(struct kimage *image)
>   /* NOTREACHED */
>  }
> 
> +#ifndef CONFIG_PPC_BOOK3E
>  /* Values we need to export to the second kernel via the device tree. */
>  static unsigned long htab_base;
> 
> @@ -411,3 +416,4 @@ static int __init export_htab_values(void)
>   return 0;
>  }
>  late_initcall(export_htab_values);
> +#endif
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 6820e45..f1a7ce7 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -543,9 +543,13 @@ _GLOBAL(kexec_sequence)
>   lhz r25,PACAHWCPUID(r13)/* get our phys cpu from paca */
> 
>   /* disable interrupts, we are overwriting kernel data next */
> +#ifndef CONFIG_PPC_BOOK3E
>   mfmsr   r3
>   rlwinm  r3,r3,0,17,15
>   mtmsrd  r3,1
> +#else
> + wrteei  0
> +#endif
> 
>   /* copy dest pages, flush whole dest image */
>   mr  r3,r29
> @@ -567,10 +571,12 @@ _GLOBAL(kexec_sequence)
>   li  r6,1
>   stw r6,kexec_flag-1b(5)
> 
> +#ifndef CONFIG_PPC_BOOK3E
>   /* clear out hardware hash page table and tlb */
>   ld  r5,0(r27)   /* deref function descriptor */
>   mtctr   r5
>   bctrl   /* ppc_md.hpte_clear_all(void); */
> +#endif
> 
>  /*
>   *   kexec image calling is:
> --
> 1.7.9.5
> 
> ___
> Linuxppc-dev mailing list
> linuxppc-...@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev



RE: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE

2013-07-01 Thread Bhushan Bharat-R65777



> -Original Message-
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
> Chen
> Sent: Thursday, June 20, 2013 1:23 PM
> To: b...@kernel.crashing.org
> Cc: linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
> Subject: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE
> 
> book3e differs from book3s, since 3s includes the exception
> vectors code in head_64.S as it relies on absolute addressing
> which is only possible within this compilation unit. So we have
> to get that label address via the GOT.
> 
> And when booting a relocated kernel, we should reset IVPR properly again
> after .relocate.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  arch/powerpc/include/asm/exception-64e.h |    8 ++++++++
>  arch/powerpc/kernel/exceptions-64e.S     |   15 ++++++++++++++-
>  arch/powerpc/kernel/head_64.S            |   22 ++++++++++++++++++++++
>  arch/powerpc/lib/feature-fixups.c        |    7 +++++++
>  4 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64e.h
> b/arch/powerpc/include/asm/exception-64e.h
> index 51fa43e..89e940d 100644
> --- a/arch/powerpc/include/asm/exception-64e.h
> +++ b/arch/powerpc/include/asm/exception-64e.h
> @@ -214,10 +214,18 @@ exc_##label##_book3e:
>  #define TLB_MISS_STATS_SAVE_INFO_BOLTED
>  #endif
> 
> +#ifndef CONFIG_RELOCATABLE
>  #define SET_IVOR(vector_number, vector_offset)   \
>   li  r3,vector_offset@l; \
>   ori r3,r3,interrupt_base_book3e@l;  \
>   mtspr   SPRN_IVOR##vector_number,r3;
> +#else
> +#define SET_IVOR(vector_number, vector_offset)   \
> + LOAD_REG_ADDR(r3,interrupt_base_book3e);\
> + rlwinm  r3,r3,0,15,0;   \
> + ori r3,r3,vector_offset@l;  \
> + mtspr   SPRN_IVOR##vector_number,r3;
> +#endif
> 
>  #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
> 
> diff --git a/arch/powerpc/kernel/exceptions-64e.S
> b/arch/powerpc/kernel/exceptions-64e.S
> index 645170a..4b23119 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1097,7 +1097,15 @@ skpinv:  addi    r6,r6,1     /* Increment */
>   * r4 = MAS0 w/TLBSEL & ESEL for the temp mapping
>   */
>   /* Now we branch the new virtual address mapped by this entry */
> +#ifdef CONFIG_RELOCATABLE
> + /* We have to find out address from lr. */
> + bl  1f  /* Find our address */
> +1:   mflr    r6
> + addi    r6,r6,(2f - 1b)
> + tovirt(r6,r6)
> +#else
>   LOAD_REG_IMMEDIATE(r6,2f)
> +#endif
>   lis r7,MSR_KERNEL@h
>   ori r7,r7,MSR_KERNEL@l
>   mtspr   SPRN_SRR0,r6
> @@ -1348,9 +1356,14 @@ _GLOBAL(book3e_secondary_thread_init)
>   mflr    r28
>   b   3b
> 
> -_STATIC(init_core_book3e)
> +_GLOBAL(init_core_book3e)
>   /* Establish the interrupt vector base */
> +#ifdef CONFIG_RELOCATABLE
> + tovirt(r2,r2)
> + LOAD_REG_ADDR(r3, interrupt_base_book3e)
> +#else
>   LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e)
> +#endif
>   mtspr   SPRN_IVPR,r3
>   sync
>   blr
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index b61363d..0942f3a 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -414,12 +414,22 @@ _STATIC(__after_prom_start)
>   /* process relocations for the final address of the kernel */
>   lis r25,PAGE_OFFSET@highest /* compute virtual base of kernel */
>   sldi    r25,r25,32
> +#if defined(CONFIG_PPC_BOOK3E)
> + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
> +#endif
>   lwz r7,__run_at_load-_stext(r26)
> +#if defined(CONFIG_PPC_BOOK3E)
> + tophys(r26,r26) /* Restore for the remains. */
> +#endif
>   cmplwi  cr0,r7,1/* flagged to stay where we are ? */
>   bne 1f
>   add r25,r25,r26
>  1:   mr  r3,r25
>   bl  .relocate
> +#if defined(CONFIG_PPC_BOOK3E)
> + /* We should set ivpr again after .relocate. */
> + bl  .init_core_book3e
> +#endif
>  #endif
> 
>  /*
> @@ -447,12 +457,24 @@ _STATIC(__after_prom_start)
>   * variable __run_at_load, if it is set the kernel is treated as relocatable
>   * kernel, otherwise it will be moved to PHYSICAL_START
>   */
> +#if defined(CONFIG_PPC_BOOK3E)
> + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
> +#endif
>   lwz r7,__run_at_load-_stext(r26)
> +#if defined(CONFIG_PPC_BOOK3E)
> + tophys(r26,r26) /* Restore for the remains. */
> +#endif
>   cmplwi  cr0,r7,1
>   bne 3f
> 
> +#ifdef CONFIG_PPC_BOOK3E
> + LOAD_REG_ADDR(r5, interrupt_end_book3e)
> + LOAD_REG_ADDR(r11, _stext)
> + sub r5,r5,r11
> +#else
>   /* just copy interrupts */
>   LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext)
> +#endif
>

Re: [PATCH RFC nohz_full v2 2/7] nohz_full: Add rcu_dyntick data for scalable detection of all-idle state

2013-07-01 Thread Mike Galbraith

On Mon, 2013-07-01 at 12:16 -0700, Paul E. McKenney wrote: 
> On Mon, Jul 01, 2013 at 11:34:13AM -0700, Josh Triplett wrote:

> > > > This also naturally raises the question "How can we let userspace get
> > > > accurate time without forcing a timer tick?".
> > > 
> > > We don't.  ;-)
> > 
> > We don't currently, hence my question about how we can. :)
> 
> Per-CPU atomic clocks?

Great idea, who needs timekeeping code. 

http://www.euronews.com/2013/04/02/swiss-sets-sights-on-miniscule-atomic-clock/

-Mike


Re: udevd cannot modprobe snd-hda-intel with 3.9.8

2013-07-01 Thread Damien Wyart

* Marc Haber  [130701 17:50]:
> The issue does not appear when one uses a Debian kernel, or uses the
> configuration that Debian uses for its kernels to build a vanilla
> kernel.org kernel. This has, after a gazillion reboots and
> experimenting with many different blacklist entries and kernel
> configuration, led me to the fact that this bug does not show if the
> kernel is compiled with CONFIG_SND_SUPPORT_OLD_API=y.

> So it was my error to dump backwards compatibility and to remove the
> old api support. I have now set CONFIG_SND_SUPPORT_OLD_API=y again and
> everything is fine.

Even with this option, I get the problem, so this is not enough...

-- 
Damien

Re: [PATCH v2 00/11] tracing: trace event triggers

2013-07-01 Thread zhangwei(Jovi)

On 2013/7/1 20:32, Masami Hiramatsu wrote:
> (2013/06/29 18:30), zhangwei(Jovi) wrote:
>>> This patchset implements 'trace event triggers', which are similar to
>>> the function triggers implemented for 'ftrace filter commands' (see
>>> 'Filter commands' in Documentation/trace/ftrace.txt), but instead of
>>> being invoked from function calls are invoked by trace events.
>>> Basically the patchset allows 'commands' to be triggered whenever a
>>> given trace event is hit.  The set of commands implemented by this
>>> patchset are:
>>>
>>>  - enable/disable_event - enable or disable another event whenever
>>>the trigger event is hit
>>>
>>>  - stacktrace - dump a stacktrace to the trace buffer whenever the
>>>trigger event is hit
>>>
>>>  - snapshot - create a snapshot of the current trace buffer whenever
>>>the trigger event is hit
>>>
>>>  - traceon/traceoff - turn tracing on or off whenever the trigger
>>>event is hit
>>>
>>> Triggers can also be conditionally invoked by associating a standard
>>> trace event filter with them - if the given event passes the filter,
>>> the trigger is invoked, otherwise it's not. (see 'Event filtering' in
>>> Documentation/trace/events.txt for info on event filters).
>>>
>>
>> I just became aware that we are implementing more and more scripting
>> functionality in the tracing subsystem, like filter and trigger mode. Of
>> course we don't call it scripting, but basically the pattern is the same:
>> it is all "do something when an event is hit".
> 
> Agreed, that's a good direction to handle event by script in kernel :)
> That may be simply done with an extension of "event trigger". Of course
> your ktap work will be the next step for ftrace. But I think, the basic
> implementation can be done by just passing recorded event entry to
> each action. (other works are for debugfs management)
> And that could be a generic trace-event interface for other users too.
> 
Fully agree with "passing recorded event entry to each action".
Actually, this interface already exists: it's perf.

struct perf_event *
perf_event_create_kernel_counter(struct perf_event_attr *attr,
                                 int cpu,
                                 struct task_struct *task,
                                 perf_overflow_handler_t callback,
                                 void *context);

As we know, each event has an id. Register this id in perf_event_attr and
supply a callback function; the callback is then called whenever the event
is hit.

void overflow_callback(struct perf_event *event,
   struct perf_sample_data *data,
   struct pt_regs *regs)

The recorded event entry is passed as data->raw->data.

This perf interface is already a generic action trigger interface: it supports
tracepoints, k(ret)probes, u(ret)probes, PMU events and hw_breakpoints (perhaps
we could implement PMU and hw_breakpoint triggers in the future).

So why do we need to reinvent another trigger interface, as this patchset does?
(This patchset changes lots of places, including macros in ftrace.h, and will
change more if kprobe/uprobe triggers are supported in the future.)

jovi






Re: [v3.9] [v3.10] [Regression] serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller

2013-07-01 Thread Wang YanQing

On Mon, Jul 01, 2013 at 12:14:45PM -0400, Joseph Salisbury wrote:
> Hi Wang,
> 
> A bug was opened against the Ubuntu kernel[0].  After a kernel bisect,
> it was found that reverting the following commit resolved this bug:
> 
> commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366
> Author: Wang YanQing 
> Date:   Fri Mar 1 11:47:20 2013 +0800
> 
> serial: 8250_pci: add support for another kind of NetMos Technology
> PCI 9835 Multi-I/O Controller
> 
> 
> The regression was introduced as of v3.9-rc3 and still exists in the
> current Mainline tree.  It was also propagated to the stable trees.
> 
> The patch causes the device to use the serial module instead of
> parport_serial.  Maybe quirk_netmos() in drivers/pci/quirks.c
> needs to be modified?
> 
> I see that you are the author of this patch, so I wanted to run this by
> you.  I was thinking of requesting a revert, but I wanted to get your
> feedback first.
> 
> 
> Thanks,
> 
> Joe
Hi all,
I am sorry for this, and for the late reply.

But I am sure I had included parport_serial in the
kernel for my consumers at the time they reported that their
PCI 9835 Multi-I/O Controller didn't work, which is what led to
this "culprit" patch.

I don't have the card at hand right now, so I can't
dig into it. After staring at parport_serial.c, yes,
it seems like it will handle this PCI serial card.

Maybe I forgot or missed something, I hope.

Thanks.

Re: is it desirable to improve the build system?

2013-07-01 Thread Greg KH

On Mon, Jul 01, 2013 at 05:12:01PM -0700, Mark Galeck wrote:
> Dear Linux-Kernel Community,
> 
> I am a consultant specializing in builds, and I recently worked for a
> large client company, a world-wide leader in its field, where I
> overhauled their build system: sped it up by more than an order of
> magnitude, and improved maintainability, for example making
> comment-to-code ratio approach 1:1. 
> 
> A part of their build is a modified Linux kernel. I rebuilt it
> countless times in various configurations, but made only a few further
> changes, because those improvements would have a small effect on the
> whole system, and because they want to stay close to your current
> release for ease of porting.
> 
> From that limited experience, it nevertheless seemed to me, that the
> Linux kernel build, while correct, is somewhat slow, and the sources
> could be more readable.

How is it "slow"?  And it has to be correct; fast and incorrect
doesn't work well, does it :)

What "sources" are you referring to as being not readable?

> Does the Linux-Kernel Community perceive that is the case?
> 
> If so, do you think it is possible to improve?
> 
> If so, would such an attempt be welcome, including and especially by,
> the current maintainer(s) of the build?  Of course it would have to be
> completely backwards-compatible, including to the text output
> interface and requirements for modules makefiles.

Patches are always gladly accepted, if they work well, please feel free
to submit them.

> I do apologize if my impressions are simply the result of
> unfamiliarity and naivete, and that I don't understand the deep
> reasons why "it has to be this way", and that I am unaware that such
> attempts were already made by some very skilled people.  

What do you not understand that you think could be changed?

Have you looked at the history of the build code to help understand why
things were changed to be the way they are?  git should help you out
here.

thanks,

greg k-h

[PATCH] sched: smart wake-affine

2013-07-01 Thread Michael Wang

Since RFC:
Tested again with the latest tip 3.10.0-rc7.

The wake-affine code always tries to pull the wakee close to the waker. In
theory, this is beneficial if the waker's cpu has cached hot data for the
wakee, or in the extreme ping-pong case.

Testing shows it can benefit hackbench by up to 15%.

However, the whole mechanism is somewhat blind and time-consuming, so some
workloads suffer.

Testing shows it can degrade pgbench by up to 50%.

Thus, wake-affine should be smarter, and realise when to stop its
thankless effort.

This patch introduces 'nr_wakee_switch', which is increased each
time the task switches its wakee.

So a high 'nr_wakee_switch' means the task has more than one wakee, and
the bigger the number, the higher the wakeup frequency.

Now when deciding whether to pull or not, pay attention to a wakee with
a high 'nr_wakee_switch': pulling such a task may benefit the wakee, but it
also implies that the waker will face cruel competition later. It could be
very cruel or very fast, depending on the story behind 'nr_wakee_switch';
either way, the waker suffers.

Furthermore, if the waker also has a high 'nr_wakee_switch', which implies
that multiple tasks rely on it, then the waker's higher latency will hurt all
of them, so pulling the wakee seems to be a bad deal.

Thus, as 'waker->nr_wakee_switch / wakee->nr_wakee_switch' becomes higher
and higher, the deal seems to be worse and worse.

The patch therefore helps wake-affine stop its work when:

	wakee->nr_wakee_switch > factor &&
	waker->nr_wakee_switch > (factor * wakee->nr_wakee_switch)

The factor here is the number of online cpus, so more cpus lead to more pulls,
since the trial becomes more severe.
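
The stop condition above can be modelled outside the kernel. What follows is
a userspace sketch under stated assumptions, not the patch itself:
`struct task`, `record_wakee_model()` and `should_skip_pull()` are
hypothetical stand-ins, the one-second decay is left out, and `factor` is
passed in by the caller rather than read from the online cpu mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the fields added to task_struct. */
struct task {
	const struct task *last_wakee;	/* task woken most recently */
	unsigned long nr_wakee_switch;	/* how often the wakee changed */
};

/* Count a switch only when the waker wakes a different task than last time. */
static void record_wakee_model(struct task *waker, const struct task *wakee)
{
	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->nr_wakee_switch++;
	}
}

/*
 * Stop condition from the description: skip the pull when the wakee
 * switches wakees often and the waker does so more than `factor` times
 * as often as the wakee.
 */
static bool should_skip_pull(const struct task *waker,
			     const struct task *wakee,
			     unsigned long factor)
{
	return wakee->nr_wakee_switch > factor &&
	       waker->nr_wakee_switch > factor * wakee->nr_wakee_switch;
}
```

In this model a larger `factor` makes both comparisons harder to satisfy, so
the pull is skipped less often, which matches "more cpus lead to more pulls".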

After applying the patch, pgbench shows up to 40% improvement.

Test:
Tested on a 12-cpu x86 server with tip 3.10.0-rc7.

                      base       smart

| db_size | clients |  tps  |   |  tps  |
+---------+---------+-------+   +-------+
| 22 MB   |       1 | 10598 |   | 10693 |
| 22 MB   |       2 | 21257 |   | 21409 |
| 22 MB   |       4 | 41386 |   | 41517 |
| 22 MB   |       8 | 51253 |   | 58173 |
| 22 MB   |      12 | 48570 |   | 53817 |
| 22 MB   |      16 | 46748 |   | 55992 | +19.77%
| 22 MB   |      24 | 44346 |   | 56087 | +26.48%
| 22 MB   |      32 | 43460 |   | 54781 | +26.05%
| 7484 MB |       1 |  8951 |   |  9336 |
| 7484 MB |       2 | 19233 |   | 19348 |
| 7484 MB |       4 | 37239 |   | 37316 |
| 7484 MB |       8 | 46087 |   | 49329 |
| 7484 MB |      12 | 42054 |   | 49231 |
| 7484 MB |      16 | 40765 |   | 51082 | +25.31%
| 7484 MB |      24 | 37651 |   | 52740 | +40.08%
| 7484 MB |      32 | 37056 |   | 50866 | +37.27%
| 15 GB   |       1 |  8845 |   |  9124 |
| 15 GB   |       2 | 19094 |   | 19187 |
| 15 GB   |       4 | 36979 |   | 37178 |
| 15 GB   |       8 | 46087 |   | 50075 |
| 15 GB   |      12 | 41901 |   | 48098 |
| 15 GB   |      16 | 40147 |   | 51463 | +28.19%
| 15 GB   |      24 | 37250 |   | 51750 | +38.93%
| 15 GB   |      32 | 36470 |   | 50807 | +39.31%

CC: Ingo Molnar 
CC: Peter Zijlstra 
CC: Mike Galbraith 
Signed-off-by: Michael Wang 
---
 include/linux/sched.h |    3 +++
 kernel/sched/fair.c   |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..1c996c7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1041,6 +1041,9 @@ struct task_struct {
 #ifdef CONFIG_SMP
struct llist_node wake_entry;
int on_cpu;
+   struct task_struct *last_wakee;
+   unsigned long nr_wakee_switch;
+   unsigned long last_switch_decay;
 #endif
int on_rq;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61a614..591c113 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3109,6 +3109,45 @@ static inline unsigned long effective_load(struct task_group *tg, int cpu,
 
 #endif
 
+static void record_wakee(struct task_struct *p)
+{
+   /*
+    * Rough decay, don't worry about the boundary, really active
+    * task won't care the loose.
+    */
+   if (jiffies > current->last_switch_decay + HZ) {
+           current->nr_wakee_switch = 0;
+           current->last_switch_decay = jiffies;
+   }
+
+   if (current->last_wakee != p) {
+           current->last_wakee = p;
+           current->nr_wakee_switch++;
+   }
+}
+
+static int nasty_pull(struct task_struct *p)
+{
+   int factor = cpumask_weight(cpu_online_mask);
+
+   /*
+    * Yeah, it's the switching-frequency, could means many wakee or
+    * rapidly switch, use factor here will just help to automatically
+    * adjust the loose-degree, so more cpu will lead to more pull.
+

Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting

2013-07-01 Thread Dave Hansen

On 07/01/2013 07:37 PM, Zheng Liu wrote:
> FWIW, it would be great if we could let the MAP_POPULATE flag support shared
> mappings, because in our production system there are a lot of applications
> that use mmap(2) and then pre-fault the mapping.  Currently these
> applications need to pre-fault the mapping manually.

Are you sure it doesn't?  From a cursory look at the code, it looked to
me like it would populate anonymous and file-backed, but I didn't
double-check experimentally.
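
One way to check this experimentally is to count the minor faults taken while
touching a shared file mapping, once with and once without MAP_POPULATE. The
following is only a sketch: `touch_faults()`, the temporary file under /tmp
and the page count are assumptions made for illustration, and the numbers it
reports depend on the kernel and filesystem in use.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#define NPAGES 256

/*
 * Map a fresh temporary file MAP_SHARED with the given extra flags, read
 * one byte per page, and return the number of minor faults taken while
 * reading (or -1 on error).
 */
static long touch_faults(int extra_flags)
{
	char path[] = "/tmp/populate-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)page * NPAGES;
	struct rusage before, after;
	volatile unsigned char sink = 0;
	unsigned char *p;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file disappears once unmapped */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | extra_flags, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;

	getrusage(RUSAGE_SELF, &before);
	for (size_t i = 0; i < len; i += (size_t)page)
		sink += p[i];		/* touch every page once */
	getrusage(RUSAGE_SELF, &after);

	munmap(p, len);
	return after.ru_minflt - before.ru_minflt;
}
```

Comparing `touch_faults(0)` with `touch_faults(MAP_POPULATE)` should show
whether the shared mapping was pre-faulted: if it was, the second count is
expected to be clearly lower than the first.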

Re: [PATCH] vfs: remove the unnecessrary code of fs/inode.c

2013-07-01 Thread Al Viro

On Mon, Jul 01, 2013 at 08:19:03AM -0400, Dong Fang wrote:
> These functions, such as find_inode_fast() and find_inode(), iget_locked() and
> iget5_locked(), insert_inode_locked() and insert_inode_locked4(), almost have
> the same code.

NAK.  These functions exist exactly because the variant with callbacks
costs more.  We walk the hash chain and for each inode on it your
variant would result in
* call
* fetching ino from memory
* comparison (and storing result in general-purpose register)
* return
* checking that register and branch on the result of that check
What's more, the whole thing's not fun for branch predictor.

It is a hot enough path to warrant a special-cased variant; if we can't
get away with that, we use the variants with callbacks, but on filesystems
where ->i_ino is sufficient as search key we really want to avoid the
overhead.
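
The cost difference can be illustrated with a much simplified userspace
model. This is a sketch, not the fs/inode.c code: `struct node`,
`find_fast()`, `find_cb()` and the `test_ino()` shown here are hypothetical,
and locking, the superblock check and the I_FREEING handling are all left
out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an inode on a hash chain. */
struct node {
	unsigned long ino;
	struct node *next;
};

/* Special-cased walk: the key comparison is part of the loop itself. */
static struct node *find_fast(struct node *head, unsigned long ino)
{
	for (struct node *n = head; n; n = n->next)
		if (n->ino == ino)
			return n;
	return NULL;
}

/*
 * Generic walk: each node costs an indirect call, a load of the key
 * through the data pointer, a return, and a test of the returned value.
 */
static struct node *find_cb(struct node *head,
			    int (*test)(struct node *, void *), void *data)
{
	for (struct node *n = head; n; n = n->next)
		if (test(n, data))
			return n;
	return NULL;
}

/* Callback equivalent of the comparison inlined in find_fast(). */
static int test_ino(struct node *n, void *data)
{
	return n->ino == *(unsigned long *)data;
}
```

Both walks return the same node; the difference is only in the work done per
chain entry. In a small program like this one a compiler may inline the
callback, which is generally not possible when the callback is supplied by a
filesystem through a function pointer.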

Re: [PATCH v2] vmpressure: implement strict mode

2013-07-01 Thread Anton Vorontsov

On Mon, Jul 01, 2013 at 05:22:36PM +0900, Hyunhee Kim wrote:
> >> > > for each event in memory.pressure_level; do
> >> > >   /* register eventfd to be notified on "event" */
> >> > > done
> >> >
> >> > This scheme registers "all" events.
> >>
> >> Yes, because I thought that's the use-case that matters for activity
> >> manager :)
> >
> > Some activity managers use only low levels (Android), some might use only
> > medium levels (simple load-balancing).
> 
> When a platform like Android uses only the "low" level, is this the
> process you intended when designing vmpressure?
> 
> 1. activity manager receives "low" level events
> 2. it reads and checks the current memory (e.g. available memory) using vmstat
> 3. if the available memory is not under the threshold (defined e.g. by
> activity manager), activity manager does nothing
> 4. if the available memory is under the threshold, activity manager
> handles it by e.g. reclaiming or killing processes?

Yup, exactly.
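
For illustration, the registration step and the threshold check could look
roughly as follows in an activity manager. This is a sketch under stated
assumptions: it follows the "<event_fd> <fd of memory.pressure_level>
<level>" format written to cgroup.event_control, while
`struct pressure_listener`, `pressure_listen()`, `should_act()` and the
cgroup directory passed in are hypothetical names chosen for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical holder for the two descriptors a listener keeps open. */
struct pressure_listener {
	int efd;	/* eventfd signalled on each pressure event */
	int pfd;	/* open memory.pressure_level */
};

/* Step 1: register for `level` events on the memcg at `cgroup_dir`. */
static int pressure_listen(struct pressure_listener *l,
			   const char *cgroup_dir, const char *level)
{
	char path[512], line[64];
	int cfd, len;

	l->efd = l->pfd = -1;

	snprintf(path, sizeof(path), "%s/memory.pressure_level", cgroup_dir);
	l->pfd = open(path, O_RDONLY);
	if (l->pfd < 0)
		return -1;

	l->efd = eventfd(0, 0);
	snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	cfd = open(path, O_WRONLY);
	if (l->efd < 0 || cfd < 0)
		goto fail;

	/* "<event_fd> <fd of memory.pressure_level> <level>" */
	len = snprintf(line, sizeof(line), "%d %d %s", l->efd, l->pfd, level);
	if (len < 0 || len >= (int)sizeof(line) ||
	    write(cfd, line, (size_t)len) != len)
		goto fail;

	close(cfd);
	return 0;

fail:
	if (cfd >= 0)
		close(cfd);
	if (l->efd >= 0)
		close(l->efd);
	close(l->pfd);
	l->efd = l->pfd = -1;
	return -1;
}

/* Steps 3 and 4: act only when available memory is under the threshold. */
static int should_act(unsigned long available_kb, unsigned long threshold_kb)
{
	return available_kb < threshold_kb;
}
```

A manager would then block in read(2) on `efd`, fetch the current memory
statistics when it returns (step 2), and call `should_act()` with its own
threshold to decide whether to reclaim or kill anything.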

> When I first saw this vmpressure, I thought that I should
> register all events ("low", "medium", and "critical") and use a
> different handler for each event. However, without a mode like
> strict mode, I would see too many events. So, now, I think that
> it is better to use only one level and run each handler after checking
> available memory as you mentioned.

Yup, this should work ideally.

> But,
> 
> 1. Isn't it overhead to read event and check memory state every time
> we receive events?

Even if it is an overhead, is it measurable? Plus, vmstat/memcg stats are
the only source of information that Activity Manager can use to make a
decision, so there is no point in duplicating the information in the
notifications.

> - sometimes, even when there is lots of available memory, a low
> level event could occur if most of it is reclaimable memory, not free
> pages.

The point of low level is to signal [any] reclaiming activity. So, yes, 

> - Don't most platforms use available memory to judge their
> current memory state?

No, because you hardly want to monitor available memory only. You want to
take into account the level of the page caches, etc.

> Is there any reason vmpressure uses the reclaim rate?

Yes, you can refer to this email:

  http://lkml.org/lkml/2012/10/4/145

And here is about the levels thing:

  http://lkml.org/lkml/2012/10/22/177

> IMO, activity manager doesn't have to check available memory if it
> could receive signal based on the available memory.

But userspace can define its own policy of managing the tasks/resources
based on different factors, other than just available memory. And that is
exactly why we don't filter the events in the kernel anymore. The only
filtering that we make is the levels, which, as it appears, can work for
many use-cases. 

> 2. If we use only "medium" to avoid the overheads occurred when using
> "low" level, isn't it possible to miss sending events when there is a
> little available memory but reclaim ratio is high?

If your app doesn't "trust" the reclaim ratio indicator, then the application can
use its own heuristics, using low level just to monitor reclaiming
activity. More than that, you can change vmpressure itself to use
different heuristics for low/med/crit levels: the point of introducing
levels was also to hide the implementation and memory management details,
so if you can come up with a better approach for vmpressure "internals"
you are more than welcome to do so. :)

> IMHO, we cannot consider and cover all the use cases, but considering
> some use cases and giving some guides and directions to use this
> vmpressure will be helpful to make many platforms accept this for their
> low memory managers.

Can't argue with that. :) I guess I will need to better document current
behavior of the levels and when exactly the events trigger.

Thanks!

Anton

Re: [PATCH] vfs: remove the unnecessrary code of fs/inode.c

2013-07-01 Thread Gu Zheng

On 07/01/2013 08:19 PM, Dong Fang wrote:

> These functions, such as find_inode_fast() and find_inode(), iget_locked() and
> iget5_locked(), insert_inode_locked() and insert_inode_locked4(), almost have
> the same code.

Maybe the title "[PATCH] vfs: remove the reduplicate code of fs/inode.c" is more
suitable.

> 
> Signed-off-by: Dong Fang 


Reviewed-by: Gu Zheng 

Thanks,
Gu

> ---
>  fs/inode.c |  134 
> 
>  1 files changed, 26 insertions(+), 108 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 00d5fc3..847eee9 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -790,6 +790,22 @@ void prune_icache_sb(struct super_block *sb, int nr_to_scan)
>  }
>  
>  static void __wait_on_freeing_inode(struct inode *inode);
> +
> +
> +static int test_ino(struct inode *inode, void *data)
> +{
> + unsigned long ino = *(unsigned long *) data;
> + return inode->i_ino == ino;

This can be more concise:
	return inode->i_ino == *(unsigned long *) data;
The same goes for the new insert_inode_locked():


> +}
> +
> +static int set_ino(struct inode *inode, void *data)
> +{
> + inode->i_ino = *(unsigned long *) data;
> + return 0;
> +}
> +
> +
> +
>  /*
>   * Called with the inode lock held.
>   */
> @@ -829,28 +845,7 @@ repeat:
>  static struct inode *find_inode_fast(struct super_block *sb,
>   struct hlist_head *head, unsigned long ino)
>  {
> - struct inode *inode = NULL;
> -
> -repeat:
> - hlist_for_each_entry(inode, head, i_hash) {
> -         spin_lock(&inode->i_lock);
> -         if (inode->i_ino != ino) {
> -                 spin_unlock(&inode->i_lock);
> -                 continue;
> -         }
> -         if (inode->i_sb != sb) {
> -                 spin_unlock(&inode->i_lock);
> -                 continue;
> -         }
> -         if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
> -                 __wait_on_freeing_inode(inode);
> -                 goto repeat;
> -         }
> -         __iget(inode);
> -         spin_unlock(&inode->i_lock);
> -         return inode;
> - }
> - return NULL;
> + return find_inode(sb, head, test_ino, (void *)&ino);
>  }
>  
>  /*
> @@ -1073,50 +1068,7 @@ EXPORT_SYMBOL(iget5_locked);
>   */
>  struct inode *iget_locked(struct super_block *sb, unsigned long ino)
>  {
> - struct hlist_head *head = inode_hashtable + hash(sb, ino);
> - struct inode *inode;
> -
> - spin_lock(&inode_hash_lock);
> - inode = find_inode_fast(sb, head, ino);
> - spin_unlock(&inode_hash_lock);
> - if (inode) {
> -         wait_on_inode(inode);
> -         return inode;
> - }
> -
> - inode = alloc_inode(sb);
> - if (inode) {
> -         struct inode *old;
> -
> -         spin_lock(&inode_hash_lock);
> -         /* We released the lock, so.. */
> -         old = find_inode_fast(sb, head, ino);
> -         if (!old) {
> -                 inode->i_ino = ino;
> -                 spin_lock(&inode->i_lock);
> -                 inode->i_state = I_NEW;
> -                 hlist_add_head(&inode->i_hash, head);
> -                 spin_unlock(&inode->i_lock);
> -                 inode_sb_list_add(inode);
> -                 spin_unlock(&inode_hash_lock);
> -
> -                 /* Return the locked inode with I_NEW set, the
> -                  * caller is responsible for filling in the contents
> -                  */
> -                 return inode;
> -         }
> -
> -         /*
> -          * Uhhuh, somebody else created the same inode under
> -          * us. Use the old inode instead of the one we just
> -          * allocated.
> -          */
> -         spin_unlock(&inode_hash_lock);
> -         destroy_inode(inode);
> -         inode = old;
> -         wait_on_inode(inode);
> - }
> - return inode;
> + return iget5_locked(sb, ino, test_ino, set_ino, (void *)&ino);
>  }
>  EXPORT_SYMBOL(iget_locked);
>  
> @@ -1281,48 +1233,6 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
>  }
>  EXPORT_SYMBOL(ilookup);
>  
> -int insert_inode_locked(struct inode *inode)
> -{
> - struct super_block *sb = inode->i_sb;
> - ino_t ino = inode->i_ino;
> - struct hlist_head *head = inode_hashtable + hash(sb, ino);
> -
> - while (1) {
> - struct inode *old = NULL;
> - spin_lock(&inode_hash_lock);
> - hlist_for_each_entry(old, head, i_hash) {
> - if (old->i_ino != ino)
> - continue;
> - if (old->i_sb != sb)
> - continue;
> - spin_lock(&old->i_lock);
> - if (old->i_state & (I_FREEING|I_WILL_FREE)) {
> - spin_unlock(&old->i_lock);
> -
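
The refactoring quoted above routes the inode-number lookup through the generic test/set callback path. Below is a minimal userspace sketch of that callback pattern, assuming a toy singly linked chain instead of the kernel hash table; the toy_* names are hypothetical and only test_ino()/set_ino() mirror the quoted patch. There is no locking here, so this shows the shape of the idea, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct inode: only the fields this sketch needs. */
struct toy_inode {
	unsigned long i_ino;
	struct toy_inode *next;		/* singly linked "hash chain" */
};

typedef int (*toy_test_fn)(struct toy_inode *inode, void *data);
typedef int (*toy_set_fn)(struct toy_inode *inode, void *data);

/* Match on inode number; data points at an unsigned long. */
static int test_ino(struct toy_inode *inode, void *data)
{
	return inode->i_ino == *(unsigned long *)data;
}

/* Initialise a fresh inode from the same opaque data pointer. */
static int set_ino(struct toy_inode *inode, void *data)
{
	inode->i_ino = *(unsigned long *)data;
	return 0;
}

/* Generic chain walk: the match rule is supplied by the caller. */
static struct toy_inode *toy_find(struct toy_inode *head,
				  toy_test_fn test, void *data)
{
	for (struct toy_inode *i = head; i; i = i->next)
		if (test(i, data))
			return i;
	return NULL;
}

/* The "fast" variant becomes a thin wrapper around the generic one. */
static struct toy_inode *toy_find_fast(struct toy_inode *head,
				       unsigned long ino)
{
	return toy_find(head, test_ino, &ino);
}

/* Find-or-insert: return an existing match, else set up and link 'fresh'. */
static struct toy_inode *toy_get(struct toy_inode **head,
				 struct toy_inode *fresh,
				 toy_test_fn test, toy_set_fn set, void *data)
{
	struct toy_inode *found = toy_find(*head, test, data);

	assert(fresh != NULL);
	if (found)
		return found;
	if (set(fresh, data))
		return NULL;
	fresh->next = *head;
	*head = fresh;
	return fresh;
}
```

The cost of the generic path is an indirect call per chain entry, which is presumably why a dedicated fast variant existed in the first place.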

Re: [PATCH] sched: fix cpu utilization account error

2013-07-01 Thread Xie XiuQi

On 2013/7/2 11:20, Michael Wang wrote:
> Hi, Xie
> 
> On 07/01/2013 07:26 PM, Xie XiuQi wrote:
> [snip]
>> Here is the kthread main logic. Although it's not a good idea, but it does
>> exist:
>> while (!kthread_should_stop()) {
>>  /* call schedule every 1 sec */
>>  if (HZ <= jiffies - last) {
>>  last = jiffies;
>>  schedule();
>>  }
>>
>>  /* get data and sent it */
>>  get_msg();
>>  send_msg();
> 
> What about use cond_resched() here? Isn't that more gentle?
> 

That's a good idea for driver implementation.
Thank you Michael.
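
For reference, the "HZ <= jiffies - last" test in the quoted loop relies on unsigned subtraction, which stays correct when the tick counter wraps. A minimal userspace sketch of that check follows; interval_elapsed() and should_yield() are hypothetical names for illustration, and in a real driver the yield point would be cond_resched() as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True once at least 'interval' ticks have passed since 'last'.
 * Unsigned subtraction wraps modulo 2^N, so the result is still
 * correct if 'now' has wrapped past zero once since 'last'.
 */
static bool interval_elapsed(unsigned long now, unsigned long last,
			     unsigned long interval)
{
	return now - last >= interval;
}

/*
 * One iteration of the loop's bookkeeping: returns true when the
 * caller should give up the CPU, and records the time of the yield.
 */
static bool should_yield(unsigned long now, unsigned long *last,
			 unsigned long interval)
{
	assert(last != NULL);
	if (!interval_elapsed(now, *last, interval))
		return false;
	*last = now;
	return true;
}
```

With an interval of 100 ticks and last == 0, should_yield() stays false at tick 99 and becomes true at tick 100.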

> Regards,
> Michael Wang
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sched: fix cpu utilization account error

2013-07-01 Thread Xie XiuQi

On 2013/7/2 11:07, Mike Galbraith wrote:
> On Mon, 2013-07-01 at 19:26 +0800, Xie XiuQi wrote:
> 
>> Here is the kthread main logic. Although it's not a good idea, but it does
>> exist:
> 
> Why not fix this instead?
> 
>> while (!kthread_should_stop()) {
>>  /* call schedule every 1 sec */
>>  if (HZ <= jiffies - last) {
>>  last = jiffies;
>>  schedule();
>>  }
> 
> Hanging out in the kernel for ages is not cool.  That doesn't mean
> something else might not pop up that forces the issue, but to date it
> has not, and sacrificing precious fastpath cycles is not attractive.
> 

That is to say, the driver's code needs improvement.
Thank you Mike.

> -Mike
> 



Re: [PATCH] proc: Add workaround for idle/iowait decreasing problem.

2013-07-01 Thread Fernando Luis Vazquez Cao


Hi Frederic,

I'm sorry it's taken me so long to respond; I got sidetracked for
a while. Comments follow below.

On 2013/04/28 09:49, Frederic Weisbecker wrote:

On Tue, Apr 23, 2013 at 09:45:23PM +0900, Tetsuo Handa wrote:

CONFIG_NO_HZ=y can cause idle/iowait values to decrease.

[...]

It's not clear in the changelog why you see non-monotonic idle/iowait values.

Looking at the previous patch from Fernando, it seems that's because we can
race with concurrent updates from the CPU target when it wakes up from idle?
(could be updated by drivers/cpufreq/cpufreq_governor.c as well).

If so the bug has another symptom: we may also report a wrong iowait/idle time
by accounting the last idle time twice.

In this case we should fix the bug from the source, for example we can force
the given ordering:

= Write side =                              = Read side =

// tick_nohz_start_idle()
write_seqcount_begin(ts->seq)
ts->idle_entrytime = now
ts->idle_active = 1
write_seqcount_end(ts->seq)

// tick_nohz_stop_idle()
write_seqcount_begin(ts->seq)
ts->iowait_sleeptime += now - ts->idle_entrytime
ts->idle_active = 0
write_seqcount_end(ts->seq)

                                            // get_cpu_iowait_time_us()
                                            do {
                                                seq = read_seqcount_begin(ts->seq)
                                                if (ts->idle_active) {
                                                    time = now - ts->idle_entrytime
                                                    time += ts->iowait_sleeptime
                                                } else {
                                                    time = ts->iowait_sleeptime
                                                }
                                            } while (read_seqcount_retry(ts->seq, seq));

Right? seqcount should be enough to make sure we are getting a consistent result.
I doubt we need harder locking.
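
The ordering proposed above can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions: a single writer, a hand-rolled sequence counter instead of the kernel's seqcount API, and hypothetical toy_* names throughout. It follows the proposal as written, so idle exit always credits the delta to iowait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy per-CPU state; an odd seq means a write is in progress. */
struct toy_ts {
	atomic_uint seq;
	_Atomic uint64_t idle_entrytime;
	_Atomic uint64_t iowait_sleeptime;
	atomic_int idle_active;
};

/* Single writer assumed: bump seq to odd before touching the data. */
static void toy_write_begin(struct toy_ts *ts)
{
	unsigned int s = atomic_load_explicit(&ts->seq, memory_order_relaxed);

	atomic_store_explicit(&ts->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Bump seq back to even; the release store publishes the data writes. */
static void toy_write_end(struct toy_ts *ts)
{
	unsigned int s = atomic_load_explicit(&ts->seq, memory_order_relaxed);

	atomic_store_explicit(&ts->seq, s + 1, memory_order_release);
}

static void toy_start_idle(struct toy_ts *ts, uint64_t now)
{
	toy_write_begin(ts);
	atomic_store_explicit(&ts->idle_entrytime, now, memory_order_relaxed);
	atomic_store_explicit(&ts->idle_active, 1, memory_order_relaxed);
	toy_write_end(ts);
}

static void toy_stop_idle(struct toy_ts *ts, uint64_t now)
{
	uint64_t entry, sleep;

	toy_write_begin(ts);
	entry = atomic_load_explicit(&ts->idle_entrytime, memory_order_relaxed);
	sleep = atomic_load_explicit(&ts->iowait_sleeptime, memory_order_relaxed);
	atomic_store_explicit(&ts->iowait_sleeptime, sleep + (now - entry),
			      memory_order_relaxed);
	atomic_store_explicit(&ts->idle_active, 0, memory_order_relaxed);
	toy_write_end(ts);
}

/* Reader: retry until a snapshot is taken with an even, unchanged seq. */
static uint64_t toy_get_iowait(struct toy_ts *ts, uint64_t now)
{
	unsigned int s0, s1;
	uint64_t time;

	do {
		s0 = atomic_load_explicit(&ts->seq, memory_order_acquire);
		time = atomic_load_explicit(&ts->iowait_sleeptime,
					    memory_order_relaxed);
		if (atomic_load_explicit(&ts->idle_active, memory_order_relaxed))
			time += now - atomic_load_explicit(&ts->idle_entrytime,
							   memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s1 = atomic_load_explicit(&ts->seq, memory_order_relaxed);
	} while ((s0 & 1u) || s0 != s1);

	return time;
}
```

A reader that overlaps a writer sees either an odd or a changed sequence number and retries, so it never combines a new idle_active with an old iowait_sleeptime.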


I tried that and it doesn't suffice. The problem that causes the most
serious skews is related to the CPU scheduler: the per-run queue
counter nr_iowait can be updated not only from the CPU it belongs
to but also from any other CPU if tasks are migrated out while
waiting on I/O.

The race looks like this:

CPU0                                    CPU1
                                        [ CPU1_rq->nr_iowait == 0 ]
                                        Task foo: io_schedule()
                                                  schedule()
                                        [ CPU1_rq->nr_iowait == 1 ]
                                        Task foo migrated to CPU0
                                        Goes to sleep

// get_cpu_iowait_time_us(1, NULL)
[ CPU1_ts->idle_active == 1, CPU1_rq->nr_iowait == 1 ]
[ CPU1_ts->iowait_sleeptime = 4, CPU1_ts->idle_entrytime = 3 ]
now = 5
delta = 5 - 3 = 2
iowait = 4 + 2 = 6

Task foo wakes up
[ CPU1_rq->nr_iowait == 0 ]

                                        CPU1 comes out of sleep state
                                        tick_nohz_stop_idle()
                                          update_ts_time_stats()
                                            [ CPU1_ts->idle_active == 1, CPU1_rq->nr_iowait == 0 ]
                                            [ CPU1_ts->iowait_sleeptime = 4, CPU1_ts->idle_entrytime = 3 ]
                                            now = 6
                                            delta = 6 - 3 = 3
                                            (CPU1_ts->iowait_sleeptime is not updated)
                                            CPU1_ts->idle_entrytime = now = 6
                                          CPU1_ts->idle_active = 0

// get_cpu_iowait_time_us(1, NULL)
[ CPU1_ts->idle_active == 0, CPU1_rq->nr_iowait == 0 ]
[ CPU1_ts->iowait_sleeptime = 4, CPU1_ts->idle_entrytime = 6 ]
iowait = CPU1_ts->iowait_sleeptime = 4
(iowait decreased from 6 to 4)
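
The timeline above can be replayed as plain arithmetic. The following single-threaded sketch uses hypothetical race_* names and toy structures, and simply applies the numbers from the timeline in order; the only assumption beyond the timeline is that the second read happens at now = 7, which does not affect the result.

```c
#include <assert.h>
#include <stdint.h>

/* Toy copy of the per-CPU fields that appear in the timeline. */
struct race_cpu {
	int idle_active;
	uint64_t idle_entrytime;
	uint64_t iowait_sleeptime;
	unsigned int nr_iowait;		/* can be changed from another CPU */
};

/* Reader: adds the pending delta only while idle with I/O waiters. */
static uint64_t race_read_iowait(const struct race_cpu *c, uint64_t now)
{
	if (c->idle_active && c->nr_iowait > 0)
		return c->iowait_sleeptime + (now - c->idle_entrytime);
	return c->iowait_sleeptime;
}

/* Idle exit: the delta is credited to iowait only if waiters remain. */
static void race_stop_idle(struct race_cpu *c, uint64_t now)
{
	assert(c->idle_active);
	if (c->nr_iowait > 0)
		c->iowait_sleeptime += now - c->idle_entrytime;
	c->idle_entrytime = now;
	c->idle_active = 0;
}

/* Replays the timeline; returns (second reading - first reading). */
static int64_t race_replay(void)
{
	struct race_cpu cpu1 = {
		.idle_active = 1,
		.idle_entrytime = 3,
		.iowait_sleeptime = 4,
		.nr_iowait = 1,
	};
	uint64_t first, second;

	first = race_read_iowait(&cpu1, 5);	/* 4 + (5 - 3) = 6 */
	cpu1.nr_iowait = 0;			/* foo wakes up on CPU0 */
	race_stop_idle(&cpu1, 6);		/* delta of 3 is not credited */
	second = race_read_iowait(&cpu1, 7);	/* 4 */

	return (int64_t)second - (int64_t)first;
}
```

race_replay() returns -2: the first reader already reported the pending delta as iowait, but that delta is never added to iowait_sleeptime because nr_iowait dropped to zero before idle exit.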



Another thing while at it. It seems that an update done from
drivers/cpufreq/cpufreq_governor.c (calling get_cpu_iowait_time_us() ->
update_ts_time_stats()) can randomly race with a CPU entering/exiting idle.
I have no idea why drivers/cpufreq/cpufreq_governor.c does the update itself.
It can just compute the delta like any reader. May be we could remove that and
only ever call update_ts_time_stats() from the CPU that exit idle.

What do you think?


I am all for it. We just need to make sure that CPU governors
can cope with non-monotonic idle and iowait times. I'll take
a closer look at the code but I wouldn't mind if Arjan (CCed)
beat me at that.

Thanks,
Fernando

Re: sched: context tracking demolishes pipe-test

2013-07-01 Thread Mike Galbraith

On Mon, 2013-07-01 at 11:20 +0200, Mike Galbraith wrote: 
> On Mon, 2013-07-01 at 11:12 +0200, Mike Galbraith wrote: 
> > On Mon, 2013-07-01 at 10:06 +0200, Peter Zijlstra wrote:
> > 
> > > So aside from the context tracking stuff, there's still a regression
> > > we might want to look at. That's still a ~10% drop against 2.6.32 for
> > > TCP_RR and few percents for tbench.
> > 
> > Yeah, known, and some of it's ours.
> 
> (btw tbench has a ~5% phase-of-moon jitter, you can pretty much
> disregard that one)

Hm.  Seems we don't own much of TCP_RR regression after all, somewhere
along the line while my silly-tester hat was moldering, we got some
cycles back.. in the light config case anyway.

With wakeup granularity set to zero, per pipe-test, scheduler is within
variance of .32, sometimes appearing a tad lighter, though usually a wee
bit heavier.  TCP_RR throughput delta does not correlate.

echo 0 > sched_wakeup_granularity_ns

pipe-test
2.6.32-regress       689.8 Khz            1.000
3.10.0-regress       682.5 Khz             .989

netperf TCP_RR
2.6.32-regress    117910.11 Trans/sec     1.000
3.10.0-regress     96955.12 Trans/sec      .822

It should be closer than this. 

3.10.0-regress
 3.85%  [kernel]        [k] tcp_ack
 3.34%  [kernel]        [k] __schedule
 2.93%  [kernel]        [k] tcp_sendmsg
 2.54%  [kernel]        [k] tcp_rcv_established
 2.26%  [kernel]        [k] tcp_transmit_skb
 1.90%  [kernel]        [k] __netif_receive_skb_core
 1.87%  [kernel]        [k] tcp_v4_rcv
 1.84%  [kernel]        [k] tcp_write_xmit
 1.70%  [kernel]        [k] __switch_to
 1.57%  [kernel]        [k] tcp_recvmsg
 1.54%  [kernel]        [k] _raw_spin_lock_bh
 1.52%  libc-2.14.1.so  [.] __libc_recv
 1.43%  [kernel]        [k] ip_rcv
 1.35%  [kernel]        [k] local_bh_enable
 1.33%  [kernel]        [k] _raw_spin_lock_irqsave
 1.26%  [kernel]        [k] ip_queue_xmit
 1.16%  [kernel]        [k] __inet_lookup_established
 1.14%  [kernel]        [k] mod_timer
 1.13%  [kernel]        [k] process_backlog
 1.13%  [kernel]        [k] read_tsc
 1.13%  libc-2.14.1.so  [.] __libc_send
 1.12%  [kernel]        [k] system_call
 1.07%  [kernel]        [k] tcp_event_data_recv
 1.04%  [kernel]        [k] ip_finish_output

2.6.32-regress
 4.04%  [kernel]        [k] tcp_sendmsg
 3.63%  [kernel]        [k] schedule
 2.86%  [kernel]        [k] tcp_recvmsg
 2.83%  [kernel]        [k] tcp_ack
 2.19%  [kernel]        [k] system_call
 2.16%  [kernel]        [k] tcp_transmit_skb
 2.07%  libc-2.14.1.so  [.] __libc_recv
 1.95%  [kernel]        [k] _spin_lock_bh
 1.89%  libc-2.14.1.so  [.] __libc_send
 1.77%  [kernel]        [k] tcp_rcv_established
 1.70%  [kernel]        [k] netif_receive_skb
 1.61%  [kernel]        [k] tcp_v4_rcv
 1.49%  [kernel]        [k] native_sched_clock
 1.49%  [kernel]        [k] tcp_write_xmit
 1.46%  [kernel]        [k] __switch_to
 1.35%  [kernel]        [k] dev_queue_xmit
 1.29%  [kernel]        [k] __alloc_skb
 1.27%  [kernel]        [k] skb_release_data
 1.26%  netserver       [.] recv_tcp_rr
 1.22%  [kernel]        [k] local_bh_enable
 1.18%  netperf         [.] send_tcp_rr
 1.18%  [kernel]        [k] sched_clock_local
 1.11%  [kernel]        [k] copy_user_generic_string
 1.07%  [kernel]        [k] _spin_lock_irqsave

-Mike


Re: [PATCH v2 00/11] tracing: trace event triggers

2013-07-01 Thread zhangwei(Jovi)

On 2013/7/1 23:49, Tom Zanussi wrote:
> Hi jovi,
> 
> On Sat, 2013-06-29 at 17:30 +0800, zhangwei(Jovi) wrote:
>> On 2013/6/29 13:08, Tom Zanussi wrote:
>>> Hi,
>>>
>>> This is v2 of the trace event triggers patchset, addressing comments
>>> from Masami Hiramatsu, zhangwei(Jovi), and Steve Rostedt (thanks for
>>> reviewing v1).
>>>
>>> v2:
>>>  - removed all changes to __ftrace_event_enable_disable() (except
>>>for patch 04/11 which clears the soft_disabled bit as discussed)
>>>and created a separate trace_event_trigger_enable_disable() that
>>>calls it after setting/clearing the TRIGGER_MODE_BIT.
>>>  - added a trigger_mode enum for future patches that break up the
>>>trigger calls for filtering, but that's also now used as a command
>>>id for registering/unregistering commands.
>>>  - removed the enter_file/exit_file members that were added to
>>>syscall_metadata after realizing that they were unnecessary if
>>>ftrace_syscall_enter/exit() were modified to receive a pointer
>>>to the ftrace_file instead of the pointer to the trace_array in
>>>the ftrace_file.
>>>  - broke up the trigger invocation into two parts so that triggers
>>>like 'stacktrace' that themselves log into the trace buffer can
>>>defer the actual trigger invocation until after the current
>>>record is closed, which is needed for the filter check that
>>>in turn determines whether the trigger gets invoked.
>>>  - other minor cleanup
>>>
>>>
>>> This patchset implements 'trace event triggers', which are similar to
>>> the function triggers implemented for 'ftrace filter commands' (see
>>> 'Filter commands' in Documentation/trace/ftrace.txt), but instead of
>>> being invoked from function calls are invoked by trace events.
>>> Basically the patchset allows 'commands' to be triggered whenever a
>>> given trace event is hit.  The set of commands implemented by this
>>> patchset are:
>>>
>>>  - enable/disable_event - enable or disable another event whenever
>>>the trigger event is hit
>>>
>>>  - stacktrace - dump a stacktrace to the trace buffer whenever the
>>>trigger event is hit
>>>
>>>  - snapshot - create a snapshot of the current trace buffer whenever
>>>the trigger event is hit
>>>
>>>  - traceon/traceoff - turn tracing on or off whenever the trigger
>>>event is hit
>>>
>>> Triggers can also be conditionally invoked by associating a standard
>>> trace event filter with them - if the given event passes the filter,
>>> the trigger is invoked, otherwise it's not. (see 'Event filtering' in
>>> Documentation/trace/events.txt for info on event filters).
>>>
>>
>> I just realized that we are implementing more and more scripting functionality
>> in the tracing subsystem, like filter and trigger mode. Of course we don't call
>> it scripting, but basically the pattern is the same: all of it is "do something
>> when an event is hit".
>>
> 
> Not really - this patchset is just reusing the existing filter code
> that's been there for years, and yes, it does follow the pattern of
> 'doing something when an event is hit', but the things it does are
> really dirt simple - toggling of other events or the global tracing
> on/off switch, snapshotting trace buffers, etc.  All things that don't
> require any kind of scripting - this is more on the level of wiring
> things together on a breadboard.  And it's all available by simply using
> 'cat' and 'echo' - no separate command and scripts to keep track of.  
> 
>> FYI, a pretty simple scripting module of tracing is there:
>>  https://github.com/ktap/ktap.git
>>
> 
> It looks pretty nice, but I wonder if Linux is ready for a full-fledged
> language interpreter in the kernel.  It's been tried before - see
> DProbes (in fact that effort is where kprobes came from - after it was
> obvious DProbes wouldn't make it into the kernel, it was broken up into
> multiple pieces - kprobes and uprobes eventually got in, but the RPN
> interpreter, which also had an ANSI C compiler targeting the bytecode
> (dpcc) never did...).  Well, following that, DTrace came along and
> showed how useful it could be, so maybe there wouldn't be as much
> resistance these days...

Actually ktap is very lightweight compared with DTrace and other tools.
It doesn't reinvent the tracing interface; it makes use of
kprobe/uprobe/tracepoint/perf. It doesn't deal with debugging info; it is
just a simple script interface for tracing. As demonstrated in the examples,
the tracing interface is the same as perf's.
> 
> Also, assuming an in-kernel language interpreter would fly, did you
> consider starting with something already-baked rather than starting from
> scratch?  How about taking something like the parrot VM and carving out
> a minimal core subset of that suitable for embedding in the kernel?  It
> probably wouldn't be easy, but you'd be building on a relatively mature
> and tested VM that's been designed for targeting many languages
> (including lua).
> 
ktap is building on mature interpreter(lua) which proven

Re: [PATCH 0/6] Introducing Device Tree Overlays

2013-07-01 Thread Guenter Roeck

On Mon, Jul 01, 2013 at 12:46:24PM +0300, Pantelis Antoniou wrote:
> Hi Guenter,
> 
> Yes there is an updated patchset against 3.10 as of this morning.
> 
> I will post details how to get it later today.
> 
Hi Pantelis,

looking forward to it. I see you have a large number of new branches in your
repository. It would help a lot to know which patches from which branch I need
to get DT overlay support to work. For example, the mainline-pdev-fixes branch
seems relevant, but that is not immediately obvious (and may be a wrong
assumption).

Are you trying to get the patches in the various branches into the upstream
kernel ?

Thanks a lot!

Guenter

> Regards
> 
> -- Pantelis
> 
> On Jun 29, 2013, at 5:38 AM, Guenter Roeck wrote:
> 
> > On Fri, Jan 04, 2013 at 09:31:04PM +0200, Pantelis Antoniou wrote:
> >> The following patchset introduces Device Tree overlays, a method
> >> of dynamically altering the kernel's live Device Tree.
> >> 
> >> This patchset is against mainline as of Friday Jan 4 2013.
> >> (4956964 Merge tag 'driver-core-3.8-rc2' of \
> >>git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core)
> >> 
> >> Note that a separate patch for the DTC compiler has been posted and
> >> is required to compile the DTS files according to the documentation.
> >> The patch is "dtc: Dynamic symbols & fixup support"
> >> 
> >> An implementation patchset for a beaglebone cape loader will follow,
> >> but if you want to check out a working kernel for the beaglebone please
> >> pull from:
> >> 
> >> git://github.com/pantoniou/linux-bbxm.git branch not-capebus-v8
> >> 
> >> Pantelis Antoniou (6):
> >>  OF: Introduce device tree node flag helpers.
> >>  OF: export of_property_notify
> >>  OF: Export all DT proc update functions
> >>  OF: Introduce utility helper functions
> >>  OF: Introduce Device Tree resolve support.
> >>  OF: Introduce DT overlay support.
> >> 
> > Hi Pantelis,
> > 
> > do you have an update of this patchset ? I want to seriously start testing 
> > it.
> > Digging through your tree on github is a bit cumbersome, and I am not sure
> > if I got all patches. It would also be nice to get an update with all the
> > comments addressed.
> > 
> > Thanks,
> > Guenter
> 
> 

Re: [PATCH] block: Fix possible sleep in invalid context

2013-07-01 Thread Sujit Reddy Thumma


On 7/2/2013 8:34 AM, Aaron Lu wrote:
>> Fix this by releasing spin_lock_irq() before calling
>> pm_runtime_autosuspend() in blk_post_runtime_resume().
>
> Hi Sujit,
>
> Thanks for testing out block layer runtime PM!
>
> As for the problem here, it is already fixed by:
>
> commit c60855cdb976c632b3bf8922eeab8a0e78edfc04
> Author: Aaron Lu
> Date:   Fri May 17 15:47:20 2013 +0800
>
>     blkpm: avoid sleep when holding queue lock


Thanks Aaron. I see that is merged in 3.10-rc6.
Please ignore this patch.

--
Regards,
Sujit

Re: [PATCH v2] usb: host: xhci: Enable XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0

2013-07-01 Thread Sarah Sharp

Thanks George, this looks fine.  I will munge the description a bit when
I commit it, and mark it for stable as well.

Unfortunately, due to the timing of the merge window, this patch will
have to wait for 2-3 weeks until 3.11-rc1 is out.

Sarah Sharp

On Mon, Jul 01, 2013 at 10:59:12AM +0530, George Cherian wrote:
> Xhci controllers with hci_version > 0.96 give spurious success
> events on short packet completion. During webcam capture the
> "ERROR Transfer event TRB DMA ptr not part of current TD" was observed.
> The same application works fine with Synopsys controllers with hci_version 0.96.
> The same issue is seen with the Intel Pantherpoint xhci controller. So enable
> this quirk in xhci_gen_setup if the controller version is greater than 0.96.
> For xhci-pci, move the quirk to the more generic place, xhci_gen_setup.
> 
> Signed-off-by: George Cherian 
> ---
>  drivers/usb/host/xhci-pci.c | 1 -
>  drivers/usb/host/xhci.c | 7 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index cc24e39..f00cb20 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -93,7 +93,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>   }
>   if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
>   pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI) {
> - xhci->quirks |= XHCI_SPURIOUS_SUCCESS;
>   xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
>   xhci->limit_active_eps = 64;
>   xhci->quirks |= XHCI_SW_BW_CHECKING;
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index d8f640b..0f7be59 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -4697,6 +4697,13 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
>  
>   get_quirks(dev, xhci);
>  
> + /* In xhci controllers which follow xhci 1.0 spec gives a spurious
> +  * success event after a short transfer. This quirk will ignore such
> +  * spurious event.
> +  */
> + if (xhci->hci_version > 0x96)
> + xhci->quirks |= XHCI_SPURIOUS_SUCCESS;
> +
>   /* Make sure the HC is halted. */
>   retval = xhci_halt(xhci);
>   if (retval)
> -- 
> 1.8.1.4
> 
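
The version check in the quoted patch works because the xHCI interface version is BCD-coded: 0.96 reads as 0x96 and 1.0 reads as 0x100. A minimal userspace sketch of the same test follows; TOY_SPURIOUS_SUCCESS and toy_generic_quirks() are hypothetical names, and the bit value is chosen for this sketch only, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk bit; the value is chosen for this sketch only. */
#define TOY_SPURIOUS_SUCCESS	(1u << 0)

/*
 * Return 'quirks' with the spurious-success bit added for any
 * controller whose BCD-coded version is newer than 0.96 (0x96).
 * Bits already set by a bus-specific quirk hook are preserved.
 */
static uint32_t toy_generic_quirks(uint16_t hci_version, uint32_t quirks)
{
	if (hci_version > 0x96)
		quirks |= TOY_SPURIOUS_SUCCESS;
	return quirks;
}
```

For instance, toy_generic_quirks(0x96, 0) leaves the mask at 0, while toy_generic_quirks(0x100, 0) sets the bit.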

Re: [RFC PATCH 08/13] sound: sam9x5_wm8731: machine driver for at91sam9x5 wm8731 boards

2013-07-01 Thread Bo Shen


Hi Richard,

  Will you move this patch before patches 5, 6 and 7?

On 7/1/2013 16:39, Richard Genoud wrote:

From: Nicolas Ferre 

Description of the Asoc machine driver for an at91sam9x5 based board
with a wm8731 audio DAC. Wm8731 is clocked by a crystal and used as a
master on the SSC/I2S interface. Its connections are a headphone jack
and an Line input jack.

[Richard: this is based on an old patch from Nicolas that I forward
ported and reworked to use only device tree]

Signed-off-by: Nicolas Ferre 
Signed-off-by: Uwe Kleine-König 
Signed-off-by: Richard Genoud 
---
  sound/soc/atmel/Kconfig |   12 ++
  sound/soc/atmel/Makefile|2 +
  sound/soc/atmel/sam9x5_wm8731.c |  232 +++
  3 files changed, 246 insertions(+)
  create mode 100644 sound/soc/atmel/sam9x5_wm8731.c

diff --git a/sound/soc/atmel/Kconfig b/sound/soc/atmel/Kconfig
index 3fdd87f..f24d601 100644
--- a/sound/soc/atmel/Kconfig
+++ b/sound/soc/atmel/Kconfig
@@ -13,6 +13,7 @@ config SND_ATMEL_SOC_PDC
  config SND_ATMEL_SOC_DMA
tristate
depends on SND_ATMEL_SOC
+   select SND_SOC_DMAENGINE_PCM

  config SND_ATMEL_SOC_SSC
tristate
@@ -32,6 +33,17 @@ config SND_AT91_SOC_SAM9G20_WM8731
  Say Y if you want to add support for SoC audio on WM8731-based
  AT91sam9g20 evaluation board.

+config SND_AT91_SOC_SAM9X5_WM8731
+   tristate "SoC Audio support for WM8731-based at91sam9x5 board"
+   depends on ATMEL_SSC && SND_ATMEL_SOC && SOC_AT91SAM9X5
+   select SND_ATMEL_SOC_SSC
+   select SND_ATMEL_SOC_DMA
+   select SND_ATMEL_SOC_PDC


No need to select SND_ATMEL_SOC_PDC.


+   select SND_SOC_WM8731
+   help
+ Say Y if you want to add support for audio SoC on an
+ at91sam9x5 based board that is using WM8731 codec.
+
  config SND_AT91_SOC_AFEB9260
tristate "SoC Audio support for AFEB9260 board"
	depends on ARCH_AT91 && ATMEL_SSC && ARCH_AT91 && MACH_AFEB9260 && SND_ATMEL_SOC
diff --git a/sound/soc/atmel/Makefile b/sound/soc/atmel/Makefile
index 41967cc..7784c09 100644
--- a/sound/soc/atmel/Makefile
+++ b/sound/soc/atmel/Makefile
@@ -11,6 +11,8 @@ obj-$(CONFIG_SND_ATMEL_SOC_SSC) += snd-soc-atmel_ssc_dai.o

  # AT91 Machine Support
  snd-soc-sam9g20-wm8731-objs := sam9g20_wm8731.o
+snd-soc-sam9x5-wm8731-objs := sam9x5_wm8731.o

  obj-$(CONFIG_SND_AT91_SOC_SAM9G20_WM8731) += snd-soc-sam9g20-wm8731.o
+obj-$(CONFIG_SND_AT91_SOC_SAM9X5_WM8731) += snd-soc-sam9x5-wm8731.o
  obj-$(CONFIG_SND_AT91_SOC_AFEB9260) += snd-soc-afeb9260.o
diff --git a/sound/soc/atmel/sam9x5_wm8731.c b/sound/soc/atmel/sam9x5_wm8731.c
new file mode 100644
index 000..83ca457
--- /dev/null
+++ b/sound/soc/atmel/sam9x5_wm8731.c
@@ -0,0 +1,232 @@
+/*
+ * sam9x5_wm8731   --  SoC audio for AT91SAM9X5-based boards
+ * that are using WM8731 as codec.
+ *
+ *  Copyright (C) 2011 Atmel,
+ *   Nicolas Ferre 
+ *
+ * Based on sam9g20_wm8731.c by:
+ * Sedji Gaouaou 
+ *
+ * GPL
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "../codecs/wm8731.h"
+#include "atmel-pcm.h"
+#include "atmel_ssc_dai.h"
+
+#define MCLK_RATE 12288000
+
+#define DRV_NAME "sam9x5-snd-wm8731"
+
+/*
+ * Audio paths on at91sam9x5ek board:
+ *
+ *  |A| ------------> |      | ---R----> Headphone Jack
+ *  |T| <----\        |  WM  | ---L--/
+ *  |9| ---> CLK <--> | 8751 | <--R----- Line In Jack
+ *  |1| <------------ |      | <--L--/
+ */
+static const struct snd_soc_dapm_widget at91sam9x5ek_dapm_widgets[] = {
+   SND_SOC_DAPM_HP("Headphone Jack", NULL),
+   SND_SOC_DAPM_LINE("Line In Jack", NULL),
+};
+
+/*
+ * Logic for a wm8731 as connected on a at91sam9x5 based board.
+ */
+static int at91sam9x5ek_wm8731_init(struct snd_soc_pcm_runtime *rtd)
+{
+   struct snd_soc_codec *codec = rtd->codec;
+   struct snd_soc_dai *codec_dai = rtd->codec_dai;
+   struct snd_soc_dapm_context *dapm = &codec->dapm;
+   struct device *dev = rtd->dev;
+   int ret;
+
+   dev_dbg(dev, "ASoC: at91sam9x5ek_wm8731_init() called\n");
+
+   /*
+* remove some not supported rates in relation with clock
+* provided to the wm8731 codec
+*/
+   switch (MCLK_RATE) {
+   case 12288000:
+   codec_dai->driver->playback.rates &= SNDRV_PCM_RATE_8000 |
+SNDRV_PCM_RATE_32000 |
+SNDRV_PCM_RATE_48000 |
+SNDRV_PCM_RATE_96000;
+   codec_dai->driver->capture.rates &= SNDRV_PCM_RATE_8000 |
+   SNDRV_PCM_RATE_32000 |
+   SNDRV_PCM_RATE_48000 |
+   SNDRV_PCM_RATE_96000;
+

Re: [PATCH] sched: fix cpu utilization account error

2013-07-01 Thread Michael Wang

Hi, Xie

On 07/01/2013 07:26 PM, Xie XiuQi wrote:
[snip]
> Here is the kthread main logic. Although it's not a good idea, but it does
> exist:
> while (!kthread_should_stop()) {
>   /* call schedule every 1 sec */
>   if (HZ <= jiffies - last) {
>   last = jiffies;
>   schedule();
>   }
> 
>   /* get data and sent it */
>   get_msg();
>   send_msg();

What about use cond_resched() here? Isn't that more gentle?

Regards,
Michael Wang

> 
>   if (kthread_should_stop())
>   break;
> }
> 
>> That said, accounting funnies induced by skipped update are possible,
>> which could trump the cycle savings I suppose, so maybe savings (sniff)
>> should just go away?
> 
> Indeed, removing the skip_clock_update could resolve the issue, but I found
> there is no this issue in preempt mode. However, if remove skip_clock_update
> we'll get more precise time account.
> 
> So, what's your opinion, Mike.
> 
> 
> 


Re: [PATCH] sched: fix cpu utilization account error

2013-07-01 Thread Mike Galbraith

On Mon, 2013-07-01 at 19:26 +0800, Xie XiuQi wrote:

> Here is the kthread main logic. Although it's not a good idea, but it does
> exist:

Why not fix this instead?

> while (!kthread_should_stop()) {
>   /* call schedule every 1 sec */
>   if (HZ <= jiffies - last) {
>   last = jiffies;
>   schedule();
>   }

Hanging out in the kernel for ages is not cool.  That doesn't mean
something else might not pop up that forces the issue, but to date it
has not, and sacrificing precious fastpath cycles is not attractive.

-Mike


Re: [RFC PATCH 03/13] ARM: at91: DTS: sam9x5: add clock for SSC DT entry

2013-07-01 Thread Bo Shen


Hi Richard,

On 7/1/2013 16:39, Richard Genoud wrote:

Signed-off-by: Richard Genoud 
---
  arch/arm/mach-at91/at91sam9x5.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-at91/at91sam9x5.c b/arch/arm/mach-at91/at91sam9x5.c
index 2abee66..191eb4b 100644
--- a/arch/arm/mach-at91/at91sam9x5.c
+++ b/arch/arm/mach-at91/at91sam9x5.c
@@ -233,6 +233,7 @@ static struct clk_lookup periph_clocks_lookups[] = {
CLKDEV_CON_DEV_ID("mci_clk", "f000c000.mmc", &mmc1_clk),
CLKDEV_CON_DEV_ID("dma_clk", "ec00.dma-controller", &dma0_clk),
CLKDEV_CON_DEV_ID("dma_clk", "ee00.dma-controller", &dma1_clk),
+   CLKDEV_CON_DEV_ID("pclk", "at91sam9g45_ssc.0", &ssc_clk),


Actually, we don't use this anymore. Am I right?


CLKDEV_CON_DEV_ID("pclk", "f001.ssc", &ssc_clk),
CLKDEV_CON_DEV_ID(NULL, "f801.i2c", &twi0_clk),
CLKDEV_CON_DEV_ID(NULL, "f8014000.i2c", &twi1_clk),



Best Regards,
Bo Shen

Re: [RFC PATCH 02/13] misc: atmel_ssc: keep the count of pdev->id

2013-07-01 Thread Bo Shen


Hi Richard,

On 7/1/2013 16:39, Richard Genoud wrote:

With device tree, pdev->id is always -1, so we introduce a local
counter.

Signed-off-by: Richard Genoud 
---
  drivers/misc/atmel-ssc.c |7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/misc/atmel-ssc.c b/drivers/misc/atmel-ssc.c
index 3afbd82..d1ec5ab 100644
--- a/drivers/misc/atmel-ssc.c
+++ b/drivers/misc/atmel-ssc.c
@@ -173,6 +173,12 @@ out:
return err;
  }

+/* counter of ssc devive instances.
+ * With device tree pdev->id is always -1, so we have to keep the
+ * count ourselves
+ */
+static int ssc_device_id;


Do we really need this? If yes, would it be better to get the id from the
device node through of_alias_get_id()?



+
  static int ssc_probe(struct platform_device *pdev)
  {
struct resource *regs;
@@ -235,6 +241,7 @@ static int ssc_probe(struct platform_device *pdev)
}

spin_lock(&user_lock);
+   pdev->id = ssc_device_id++;
list_add_tail(&ssc->list, &ssc_list);
spin_unlock(&user_lock);




Best Regards,
Bo Shen


Re: [PATCH] block: Fix possible sleep in invalid context

2013-07-01 Thread Aaron Lu

On 07/01/2013 11:28 PM, Sujit Reddy Thumma wrote:
> When block runtime PM is enabled following warning is seen
> while resuming the device.
> 
> BUG: sleeping function called from invalid context at
> .../drivers/base/power/runtime.c:923
> in_atomic(): 1, irqs_disabled(): 128, pid: 12, name: kworker/0:1
> [] (unwind_backtrace+0x0/0x120) from
> [] (__pm_runtime_suspend+0x34/0xa0) from
> [] (blk_post_runtime_resume+0x4c/0x5c) from
> [] (scsi_runtime_resume+0x90/0xb4) from
> [] (__rpm_callback+0x30/0x58) from
> [] (rpm_callback+0x18/0x28) from
> [] (rpm_resume+0x3dc/0x540) from
> [] (pm_runtime_work+0x8c/0x98) from
> [] (process_one_work+0x238/0x3e4) from
> [] (worker_thread+0x1ac/0x2ac) from
> [] (kthread+0x88/0x94) from
> [] (kernel_thread_exit+0x0/0x8)
> 
> Fix this by releasing spin_lock_irq() before calling
> pm_runtime_autosuspend() in blk_post_runtime_resume().

Hi Sujit,

Thanks for testing out block layer runtime PM!

As for the problem here, it is already fixed by:

commit c60855cdb976c632b3bf8922eeab8a0e78edfc04
Author: Aaron Lu 
Date:   Fri May 17 15:47:20 2013 +0800

blkpm: avoid sleep when holding queue lock

-Aaron

> 
> Signed-off-by: Sujit Reddy Thumma 
> Cc: sta...@vger.kernel.org
> ---
>  block/blk-core.c |6 --
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 33c33bc..2456116 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3159,16 +3159,18 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
>   */
>  void blk_post_runtime_resume(struct request_queue *q, int err)
>  {
> - spin_lock_irq(q->queue_lock);
>   if (!err) {
> + spin_lock_irq(q->queue_lock);
>   q->rpm_status = RPM_ACTIVE;
>   __blk_run_queue(q);
>   pm_runtime_mark_last_busy(q->dev);
> + spin_unlock_irq(q->queue_lock);
>   pm_runtime_autosuspend(q->dev);
>   } else {
> + spin_lock_irq(q->queue_lock);
>   q->rpm_status = RPM_SUSPENDED;
> + spin_unlock_irq(q->queue_lock);
>   }
> - spin_unlock_irq(q->queue_lock);
>  }
>  EXPORT_SYMBOL(blk_post_runtime_resume);
>  #endif
> 
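
The lock-drop pattern in the quoted patch can be modelled in userspace. The following is a minimal sketch, not the kernel code: every `toy_` name is hypothetical, a counter stands in for "spinlock held", and `toy_autosuspend()` merely reports whether it was reached in atomic context instead of actually sleeping.

```c
#include <assert.h>

enum toy_rpm { TOY_SUSPENDED, TOY_ACTIVE };

static int toy_lock_depth;                    /* > 0 models "spinlock held" */
static enum toy_rpm toy_status = TOY_SUSPENDED;

static void toy_spin_lock(void)
{
    toy_lock_depth++;
}

static void toy_spin_unlock(void)
{
    assert(toy_lock_depth > 0);               /* unlock without lock is a bug */
    toy_lock_depth--;
}

/* Stand-in for a call that may sleep: -1 if reached in atomic context. */
static int toy_autosuspend(void)
{
    return toy_lock_depth > 0 ? -1 : 0;
}

/* Shape before the fix: the possibly sleeping call sits inside the lock. */
static int toy_post_resume_old(int err)
{
    int ret = 0;

    toy_spin_lock();
    if (!err) {
        toy_status = TOY_ACTIVE;
        ret = toy_autosuspend();
    } else {
        toy_status = TOY_SUSPENDED;
    }
    toy_spin_unlock();
    return ret;
}

/* Shape after the fix: update state under the lock, drop it, then call. */
static int toy_post_resume_new(int err)
{
    int ret = 0;

    toy_spin_lock();
    toy_status = err ? TOY_SUSPENDED : TOY_ACTIVE;
    toy_spin_unlock();

    if (!err)
        ret = toy_autosuspend();
    return ret;
}
```

With `err == 0`, `toy_post_resume_old()` reports the violation while `toy_post_resume_new()` does not; the state update itself is identical in both.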


Re: [RFC PATCH 01/13] misc: atmel_ssc: add device tree DMA support

2013-07-01 Thread Bo Shen


Hi Richard,

On 7/1/2013 16:39, Richard Genoud wrote:

> The ssc device has to fill the at_dma_slave structure with the
> device tree information.
> Doing a of_dma_request_slave_channel()+dma_release_channel() for that
> seems wrong (or at least not very clean).

Please hold off on this, as the ASoC dmaengine will deal with it,
so we do not need to do it manually.

I am working on it now and will send out the patch soon, once it has
tested OK.

> Signed-off-by: Richard Genoud
> ---
>   drivers/misc/atmel-ssc.c|   56 +++
>   include/linux/atmel-ssc.h   |2 ++
>   include/linux/platform_data/dma-atmel.h |2 ++
>   3 files changed, 60 insertions(+)


Best Regards,
Bo Shen

Re: Fwd: [draft] Tracing multibuffer support concurrency issues

2013-07-01 Thread Alexander Lam

On Mon, Jul 1, 2013 at 6:35 PM, Steven Rostedt  wrote:
> On Mon, 2013-07-01 at 15:33 -0700, Alexander Lam wrote:
>
>> To fix this we could go through the ftrace_trace_arrays list and use
>> addresses to check if a particular pointer to a trace_array is still
>> valid, but this is vulnerable to the ABA problem if a trace_array is
>> freed and another is reallocated at the same address. This method is
>> used by subsystem_open() in trace_events.c
>
> And what's so bad about that? If it is freed and a new one is allocated
> at the same address, let it return crap. I'm much more interested in not
> letting it crash than caring about inconsistent data from someone that's
> doing crazy things to the system.

I figured you might say something like that. I agree with you on this.

>>
>> An ugly way to get around the ABA issue is to use a monotonically
>> increasing ID # for each trace_array instance. Those IDs could be used
>> instead of pointers when creating debugfs files.
>
> Not worth it.
>
>>
>> Is there a better way to fix this problem?
>>
>> Also unaddressed are all of the other files which use a trace_array,
>> trace_cpu, or ftrace_event_file in their operation - these would need
>> the same fix.
>
> Hmm, really? Just the initiator. That is, when an event is enabled or
> anything gets opened, we block deletion from happening. That way we
> don't need to care about the rest. Only open and enabling events.

Yes, I did mean the initiator, but I meant the "search for this
pointer in a list" would have to be applied for each of those struct
types because they were being passed through inode->i_private. I see
how this isn't a problem anymore after looking at your patch. It
didn't occur to me that checking the tr field of those structs before
using other pointers in the struct would also work.

Thanks,
Alex

>
> -- Steve
>
>
>

Re: [V2 2/2] sched: update cfs_rq weight earlier in enqueue_entity

2013-07-01 Thread Lei Wen

Paul,

On Mon, Jul 1, 2013 at 10:07 PM, Paul Turner  wrote:
> Could you please restate the below?
>
> On Mon, Jul 1, 2013 at 5:33 AM, Lei Wen  wrote:
>> Since we are going to calculate cfs_rq's average ratio by
>> runnable_load_avg/load.weight
>
> I don't understand what you mean by this.

Previously I took the runnable_load_avg/load.weight calculation as the
cfs_rq's average ratio. But as Alex pointed out,
runnable_avg_sum/runnable_avg_period may better serve this need.

>
>>, if not increase the load.weight prior to
>> enqueue_entity_load_avg, it may lead to one cfs_rq's avg ratio higher
>> than 100%.
>>
>
> Or this.

In my mind, runnable_load_avg in one cfs_rq should always be less than
load.weight.
I am not sure whether this assumption holds here, but
runnable_load_avg/load.weight does show the cfs_rq execution trend in
some respects.

The previous problem was that enqueue_entity_load_avg was called before
account_entity_enqueue, so runnable_load_avg was updated first and
load.weight only afterwards.
With the trace info logged inside enqueue_entity_load_avg, we may
therefore get runnable_load_avg/load.weight > 1.
Such a result is awkward when the final data is parsed.
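
The ordering effect described above can be reproduced with plain integers. This is a minimal sketch under simplifying assumptions: `struct toy_cfs_rq` and both enqueue helpers are hypothetical stand-ins, the load contribution is added in one step, and the ratio is sampled at the point where a trace statement inside enqueue_entity_load_avg would run.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the two cfs_rq fields discussed. */
struct toy_cfs_rq {
    unsigned long runnable_load_avg;  /* summed load contributions */
    unsigned long load_weight;        /* summed entity weights */
};

/* Ratio in percent, as a trace statement would compute it at this instant. */
static unsigned long toy_ratio_pct(const struct toy_cfs_rq *rq)
{
    if (rq->load_weight == 0)
        return 0;                     /* empty queue: report 0, do not divide */
    return rq->runnable_load_avg * 100 / rq->load_weight;
}

/* Old order: the load average is updated and sampled before the weight. */
static unsigned long toy_enqueue_avg_first(struct toy_cfs_rq *rq,
                                           unsigned long contrib,
                                           unsigned long weight)
{
    unsigned long seen;

    assert(weight > 0);               /* a runnable entity has nonzero weight */
    rq->runnable_load_avg += contrib;
    seen = toy_ratio_pct(rq);         /* what the trace point observes */
    rq->load_weight += weight;
    return seen;
}

/* Proposed order: the weight is accounted before the average is sampled. */
static unsigned long toy_enqueue_weight_first(struct toy_cfs_rq *rq,
                                              unsigned long contrib,
                                              unsigned long weight)
{
    assert(weight > 0);
    rq->load_weight += weight;
    rq->runnable_load_avg += contrib;
    return toy_ratio_pct(rq);
}
```

Starting from runnable_load_avg = 512 and load.weight = 1024, enqueueing an entity with contribution 1024 and weight 1024 makes the old order observe 1536/1024 (150%), while the proposed order observes 1536/2048 (75%). The final state of the queue is the same either way; only the sampled value differs.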

>
>> Adjust the sequence, so that all ratio is kept below 100%.
>>
>> Signed-off-by: Lei Wen 
>> ---
>>  kernel/sched/fair.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 07bd74c..d1eee84 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1788,8 +1788,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>  * Update run-time statistics of the 'current'.
>>  */
>> update_curr(cfs_rq);
>> -   enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
>> account_entity_enqueue(cfs_rq, se);
>> +   enqueue_entity_load_avg(cfs_rq, se, flags & ENQUEUE_WAKEUP);
>
> account_entity_enqueue is independent of enqueue_entity_load_avg;
> their order should not matter.

Yes, agreed, the order should not matter, but to keep the trace info
consistent we may need a particular order here.

>
> Further, should we restore the reverted amortization commit (improves
> context switch times)

I do not understand this part.
What does "should we restore the reverted amortization commit (improves
context switch times)" mean here?

> enqueue_entity_load_avg needs to precede
> account_entity_enqueue as it may update se->load.weight.

account_entity_enqueue needs to precede enqueue_entity_load_avg?

Thanks,
Lei

>
>> update_cfs_shares(cfs_rq);
>>
>> if (flags & ENQUEUE_WAKEUP) {
>> --
>> 1.7.10.4
>>

Re: [PATCHv3 2/3] ARM: mxs: cfa10049: Switch bus i2c1 to bitbanging

2013-07-01 Thread Fabio Estevam

Hi Alexandre,

On Mon, Jun 24, 2013 at 2:24 PM, Alexandre Belloni wrote:
> From: Maxime Ripard 
>
> The ADCs connected to this bus have been experiencing some timeout
> issues when using the iMX28 i2c controller. Switching back to bitbanging
> solves this.

Are you able to use the mxs i2c controller instead of bitbang if you
use this patch?
http://www.spinics.net/lists/stable/msg13202.html

Regards,

Fabio Estevam

[PATCH v2] tracing: Protect ftrace_trace_arrays list in trace_events.c

2013-07-01 Thread Alexander Z Lam

There are multiple places where the ftrace_trace_arrays list is accessed in
trace_events.c without the trace_types_lock held.

Cc: David Sharp 
Cc: Alexander Z Lam 
Signed-off-by: Alexander Z Lam 
---
 kernel/trace/trace.c|  2 +-
 kernel/trace/trace.h|  2 ++
 kernel/trace/trace_events.c | 11 ++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2f7307e..35e5e55 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -245,7 +245,7 @@ static struct tracer *trace_types __read_mostly;
 /*
  * trace_types_lock is used to protect the trace_types list.
  */
-static DEFINE_MUTEX(trace_types_lock);
+DEFINE_MUTEX(trace_types_lock);
 
 /*
  * serialize the access of the ring buffer
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3de07e0..334dc85 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -225,6 +225,8 @@ enum {
 
 extern struct list_head ftrace_trace_arrays;
 
+extern struct mutex trace_types_lock;
+
 /*
  * The global tracer (top) should be the first trace array added,
  * but we check the flag anyway.
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 6db3290..1b14751 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -987,6 +987,7 @@ static int subsystem_open(struct inode *inode, struct file *filp)
int ret;
 
/* Make sure the system still exists */
+   mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
list_for_each_entry(dir, &tr->systems, list) {
@@ -1002,6 +1003,7 @@ static int subsystem_open(struct inode *inode, struct file *filp)
}
  exit_loop:
mutex_unlock(&event_mutex);
+   mutex_unlock(&trace_types_lock);
 
if (!system)
return -ENODEV;
@@ -1586,6 +1588,7 @@ static void __add_event_to_tracers(struct ftrace_event_call *call,
 int trace_add_event_call(struct ftrace_event_call *call)
 {
int ret;
+   mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
 
ret = __register_event(call, NULL);
@@ -1593,11 +1596,13 @@ int trace_add_event_call(struct ftrace_event_call *call)
__add_event_to_tracers(call, NULL);
 
mutex_unlock(&event_mutex);
+   mutex_unlock(&trace_types_lock);
return ret;
 }
 
 /*
- * Must be called under locking both of event_mutex and trace_event_sem.
+ * Must be called under locking of trace_types_lock, event_mutex and
+ * trace_event_sem.
  */
 static void __trace_remove_event_call(struct ftrace_event_call *call)
 {
@@ -1609,11 +1614,13 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
 /* Remove an event_call */
 void trace_remove_event_call(struct ftrace_event_call *call)
 {
+   mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
down_write(&trace_event_sem);
__trace_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&event_mutex);
+   mutex_unlock(&trace_types_lock);
 }
 
 #define for_each_event(event, start, end)  \
@@ -1757,6 +1764,7 @@ static int trace_module_notify(struct notifier_block *self,
 {
struct module *mod = data;
 
+   mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
switch (val) {
case MODULE_STATE_COMING:
@@ -1767,6 +1775,7 @@ static int trace_module_notify(struct notifier_block *self,
break;
}
mutex_unlock(&event_mutex);
+   mutex_unlock(&trace_types_lock);
 
return 0;
 }
-- 
1.8.3


Re: Fwd: [draft] Tracing multibuffer support concurrency issues

2013-07-01 Thread Steven Rostedt

On Mon, 2013-07-01 at 21:35 -0400, Steven Rostedt wrote:

> > 
> > Is there a better way to fix this problem?
> > 
> > Also unaddressed are all of the other files which use a trace_array,
> > trace_cpu, or ftrace_event_file in their operation - these would need
> > the same fix.
> 
> Hmm, really? Just the initiator. That is, when an event is enabled or
> anything gets opened, we block deletion from happening. That way we
> don't need to care about the rest. Only open and enabling events.

I added two helper functions to handle this. trace_array_get() and
trace_array_put(). This patch prevents your example from crashing.

This probably needs to be added to opening of other files if not already
done.
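
The idea behind the two helpers can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the kernel implementation: all `toy_` names are hypothetical, a pthread mutex stands in for trace_types_lock, and the removal function that refuses to delete a referenced instance is an assumed counterpart that does not appear in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct toy_array {
    struct toy_array *next;
    int ref;
};

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_array *toy_arrays;          /* list of live instances */

static void toy_array_add(struct toy_array *tr)
{
    pthread_mutex_lock(&toy_lock);
    tr->ref = 0;
    tr->next = toy_arrays;
    toy_arrays = tr;
    pthread_mutex_unlock(&toy_lock);
}

/* Take a reference only if the pointer is still on the list. */
static int toy_array_get(struct toy_array *this_tr)
{
    struct toy_array *tr;
    int ret = -ENODEV;

    pthread_mutex_lock(&toy_lock);
    for (tr = toy_arrays; tr; tr = tr->next) {
        if (tr == this_tr) {      /* compare addresses; this_tr is not dereferenced */
            tr->ref++;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&toy_lock);
    return ret;
}

static void toy_array_put(struct toy_array *this_tr)
{
    pthread_mutex_lock(&toy_lock);
    assert(this_tr->ref > 0);     /* put without a matching get is a bug */
    this_tr->ref--;
    pthread_mutex_unlock(&toy_lock);
}

/* Assumed counterpart: deletion is refused while references are held. */
static int toy_array_remove(struct toy_array *this_tr)
{
    struct toy_array **pp;
    int ret = -ENODEV;

    pthread_mutex_lock(&toy_lock);
    for (pp = &toy_arrays; *pp; pp = &(*pp)->next) {
        if (*pp != this_tr)
            continue;
        if (this_tr->ref) {
            ret = -EBUSY;
        } else {
            *pp = this_tr->next;
            ret = 0;
        }
        break;
    }
    pthread_mutex_unlock(&toy_lock);
    return ret;
}
```

An open path would call `toy_array_get()` first and bail out on -ENODEV; the matching release path calls `toy_array_put()`. Because the lookup compares addresses only, it has the same ABA exposure discussed earlier in the thread: a freed and reallocated instance at the same address would be accepted.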

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e36da7f..89a3930 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -204,6 +204,37 @@ static struct trace_array  global_trace;
 
 LIST_HEAD(ftrace_trace_arrays);
 
+int trace_array_get(struct trace_array *this_tr)
+{
+   struct trace_array *tr;
+   int ret = -ENODEV;
+
+   mutex_lock(&trace_types_lock);
+   list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+   if (tr == this_tr) {
+   tr->ref++;
+   ret = 0;
+   break;
+   }
+   }
+   mutex_unlock(&trace_types_lock);
+
+   return ret;
+}
+
+static void __trace_array_put(struct trace_array *this_tr)
+{
+   WARN_ON(!this_tr->ref);
+   this_tr->ref--;
+}
+
+void trace_array_put(struct trace_array *this_tr)
+{
+   mutex_lock(&trace_types_lock);
+   __trace_array_put(this_tr);
+   mutex_unlock(&trace_types_lock);
+}
+
 int filter_current_check_discard(struct ring_buffer *buffer,
 struct ftrace_event_call *call, void *rec,
 struct ring_buffer_event *event)
@@ -2831,10 +2862,9 @@ static const struct seq_operations tracer_seq_ops = {
 };
 
 static struct trace_iterator *
-__tracing_open(struct inode *inode, struct file *file, bool snapshot)
+__tracing_open(struct trace_array *tr, struct trace_cpu *tc,
+  struct inode *inode, struct file *file, bool snapshot)
 {
-   struct trace_cpu *tc = inode->i_private;
-   struct trace_array *tr = tc->tr;
struct trace_iterator *iter;
int cpu;
 
@@ -2913,8 +2943,6 @@ __tracing_open(struct inode *inode, struct file *file, bool snapshot)
tracing_iter_reset(iter, cpu);
}
 
-   tr->ref++;
-
mutex_unlock(&trace_types_lock);
 
return iter;
@@ -2944,17 +2972,20 @@ static int tracing_release(struct inode *inode, struct file *file)
struct trace_array *tr;
int cpu;
 
-   if (!(file->f_mode & FMODE_READ))
+   /* Writes do not use seq_file, need to grab tr from inode */
+   if (!(file->f_mode & FMODE_READ)) {
+   struct trace_cpu *tc = inode->i_private;
+
+   trace_array_put(tc->tr);
return 0;
+   }
 
iter = m->private;
tr = iter->tr;
+   trace_array_put(tr);
 
mutex_lock(&trace_types_lock);
 
-   WARN_ON(!tr->ref);
-   tr->ref--;
-
for_each_tracing_cpu(cpu) {
if (iter->buffer_iter[cpu])
ring_buffer_read_finish(iter->buffer_iter[cpu]);
@@ -2973,20 +3004,23 @@ static int tracing_release(struct inode *inode, struct file *file)
kfree(iter->trace);
kfree(iter->buffer_iter);
seq_release_private(inode, file);
+
return 0;
 }
 
 static int tracing_open(struct inode *inode, struct file *file)
 {
+   struct trace_cpu *tc = inode->i_private;
+   struct trace_array *tr = tc->tr;
struct trace_iterator *iter;
int ret = 0;
 
+   if (trace_array_get(tr) < 0)
+   return -ENODEV;
+
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) &&
(file->f_flags & O_TRUNC)) {
-   struct trace_cpu *tc = inode->i_private;
-   struct trace_array *tr = tc->tr;
-
if (tc->cpu == RING_BUFFER_ALL_CPUS)
tracing_reset_online_cpus(&tr->trace_buffer);
else
@@ -2994,12 +3028,16 @@ static int tracing_open(struct inode *inode, struct file *file)
}
 
if (file->f_mode & FMODE_READ) {
-   iter = __tracing_open(inode, file, false);
+   iter = __tracing_open(tr, tc, inode, file, false);
if (IS_ERR(iter))
ret = PTR_ERR(iter);
else if (trace_flags & TRACE_ITER_LATENCY_FMT)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
}
+
+   if (ret < 0)
+   trace_array_put(tr);
+
return ret;
 }
 
@@ -4575,12 +4613,16 @@ struct ftrace_buffer_info {
 static int tracing_snapshot_open(struct inode *inode, struct file *file)
 {
struc

Re: [GIT PULL for v3.11] media patches for v3.11

2013-07-01 Thread Stephen Rothwell

Hi Mauro,

On Mon, 1 Jul 2013 07:58:56 -0300 Mauro Carvalho Chehab wrote:
>
> Please pull from:
>   git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media v4l_for_linus
> 
> For the media patches for Kernel v3.11.

I am not sure why you added a back merge of v3.10 before sending this to
Linus?

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



[PATCH] shdma: fixup sh_dmae_get_partial() calculation error

2013-07-01 Thread Kuninori Morimoto

sh_desc->hw.tcr holds the data size,
while the TCR register holds the data transfer count,
which is the xmit_shift'ed value of hw.tcr.
The current sh_dmae_get_partial() mixes these two units in its calculation.
This patch fixes it.
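
A small worked example shows the unit mismatch. It is a sketch under the assumptions stated in the description above (hw.tcr is a size in bytes, the TCR register counts remaining transfer units of 1 << xmit_shift bytes each); `struct toy_xfer` and both helpers are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the values the calculation works with. */
struct toy_xfer {
    size_t   desc_tcr;    /* like sh_desc->hw.tcr: total size in bytes */
    uint32_t reg_tcr;     /* like the TCR register: remaining transfer units */
    unsigned xmit_shift;  /* log2 of the bytes moved per transfer unit */
};

/* Old formula: subtracts a unit count from a byte size, then shifts. */
static size_t toy_partial_old(const struct toy_xfer *x)
{
    return (x->desc_tcr - x->reg_tcr) << x->xmit_shift;
}

/* New formula: convert the register value to bytes before subtracting. */
static size_t toy_partial_new(const struct toy_xfer *x)
{
    size_t remaining = (size_t)x->reg_tcr << x->xmit_shift;

    assert(remaining <= x->desc_tcr);  /* cannot have more left than the total */
    return x->desc_tcr - remaining;
}
```

For a 4096-byte descriptor moved in 4-byte units (xmit_shift = 2) with 256 units still pending, the new formula gives 4096 - 1024 = 3072 bytes, whereas the old one gives (4096 - 256) << 2 = 15360 bytes, which is more than the whole transfer.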

Cc: Guennadi Liakhovetski 
Signed-off-by: Kuninori Morimoto 
---
>> Guennadi

Can you please review this patch ?

 drivers/dma/sh/shdma.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/sh/shdma.c b/drivers/dma/sh/shdma.c
index b70709b..d670b8b 100644
--- a/drivers/dma/sh/shdma.c
+++ b/drivers/dma/sh/shdma.c
@@ -388,8 +388,8 @@ static size_t sh_dmae_get_partial(struct shdma_chan *schan,
shdma_chan);
struct sh_dmae_desc *sh_desc = container_of(sdesc,
struct sh_dmae_desc, shdma_desc);
-   return (sh_desc->hw.tcr - sh_dmae_readl(sh_chan, TCR)) <<
-   sh_chan->xmit_shift;
+   return sh_desc->hw.tcr -
+   (sh_dmae_readl(sh_chan, TCR) << sh_chan->xmit_shift);
 }
 
 /* Called from error IRQ or NMI */
-- 
1.7.9.5


Re: [PATCH 3/3] tracing: Protect ftrace_trace_arrays list in trace_events.c

2013-07-01 Thread Alexander Lam

Oh, sorry, that is an incomplete patch; some bits are in a patch I
dropped. I'll send you a new one in about 20 minutes.

- Alex

On Mon, Jul 1, 2013 at 6:25 PM, Steven Rostedt  wrote:
> On Mon, 2013-07-01 at 15:31 -0700, Alexander Z Lam wrote:
>> There are multiple places where the ftrace_trace_arrays list is accessed in
>> trace_events.c without the trace_types_lock held.
>
> Hmm, doesn't compile. Not a complete patch? trace_types_lock is local to
> trace.c, and needs to be in trace.h and non static.
>
> -- Steve
>
>>
>> Cc: David Sharp 
>> Cc: Alexander Z Lam 
>> Signed-off-by: Alexander Z Lam 
>> ---
>>  kernel/trace/trace_events.c | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>> index 6db3290..26351cc 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -1586,6 +1586,7 @@ static void __add_event_to_tracers(struct ftrace_event_call *call,
>>  int trace_add_event_call(struct ftrace_event_call *call)
>>  {
>>   int ret;
>> + mutex_lock(&trace_types_lock);
>>   mutex_lock(&event_mutex);
>>
>>   ret = __register_event(call, NULL);
>> @@ -1593,11 +1594,13 @@ int trace_add_event_call(struct ftrace_event_call *call)
>>   __add_event_to_tracers(call, NULL);
>>
>>   mutex_unlock(&event_mutex);
>> + mutex_unlock(&trace_types_lock);
>>   return ret;
>>  }
>>
>>  /*
>> - * Must be called under locking both of event_mutex and trace_event_sem.
>> + * Must be called under locking of trace_types_lock, event_mutex and
>> + * trace_event_sem.
>>   */
>>  static void __trace_remove_event_call(struct ftrace_event_call *call)
>>  {
>> @@ -1609,11 +1612,13 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
>>  /* Remove an event_call */
>>  void trace_remove_event_call(struct ftrace_event_call *call)
>>  {
>> + mutex_lock(&trace_types_lock);
>>   mutex_lock(&event_mutex);
>>   down_write(&trace_event_sem);
>>   __trace_remove_event_call(call);
>>   up_write(&trace_event_sem);
>>   mutex_unlock(&event_mutex);
>> + mutex_unlock(&trace_types_lock);
>>  }
>>
>>  #define for_each_event(event, start, end)\
>> @@ -1757,6 +1762,7 @@ static int trace_module_notify(struct notifier_block *self,
>>  {
>>   struct module *mod = data;
>>
>> + mutex_lock(&trace_types_lock);
>>   mutex_lock(&event_mutex);
>>   switch (val) {
>>   case MODULE_STATE_COMING:
>> @@ -1767,6 +1773,7 @@ static int trace_module_notify(struct notifier_block *self,
>>   break;
>>   }
>>   mutex_unlock(&event_mutex);
>> + mutex_unlock(&trace_types_lock);
>>
>>   return 0;
>>  }
>
>

Re: [PATCH v6 01/12] iommu/exynos: add missing cache flush for removed pagetable entries

2013-07-01 Thread Grant Grundler

-linux-arm (wrong email address - sorry)

On Mon, Jul 1, 2013 at 6:49 PM, Grant Grundler  wrote:
> On Tuesday, December 25, 2012 6:00:01 PM UTC-8, Cho KyongHo wrote:
>> This commit adds cache flush for removed small page and large page
>> entries in exynos_iommu_unmap(). Missing cache flush of removed
>> page table entries can cause missing page fault interrupt when a
>> master IP accesses an unmapped area.
>
> KyongHo,
> It appears this patch was never applied and got caught up in the
> device tree binding discussion. AFAICT, this patch is still necessary.
> Can you resubmit this patch separately. Or ok if I do?
>
> Original patch is here:
> https://patchwork.kernel.org/patch/1910261/
>
> thanks,
> grant

[PATCH V2 1/1] mwifiex: add tx info to skb when forming mgmt frame

2013-07-01 Thread Harvey Yang

From: Huawei Yang 

The function 'mwifiex_write_data_complete' needs the tx info to find
the mwifiex_private, so that it can update statistics and wake up the
tx queues. Without it we may trigger a tx queue timeout when
transmitting lots of mgmt frames.

Signed-off-by: Huawei Yang 
---
 drivers/net/wireless/mwifiex/cfg80211.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c
index e42b266..b4e2538 100644
--- a/drivers/net/wireless/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/mwifiex/cfg80211.c
@@ -186,6 +186,7 @@ mwifiex_cfg80211_mgmt_tx(struct wiphy *wiphy, struct wireless_dev *wdev,
struct sk_buff *skb;
u16 pkt_len;
const struct ieee80211_mgmt *mgmt;
+   struct mwifiex_txinfo *tx_info;
struct mwifiex_private *priv = mwifiex_netdev_get_priv(wdev->netdev);
 
if (!buf || !len) {
@@ -212,6 +213,10 @@ mwifiex_cfg80211_mgmt_tx(struct wiphy *wiphy, struct wireless_dev *wdev,
wiphy_err(wiphy, "allocate skb failed for management frame\n");
return -ENOMEM;
}
+   
+   tx_info = MWIFIEX_SKB_TXCB(skb);
+   tx_info->bss_num = priv->bss_num;
+   tx_info->bss_type = priv->bss_type;
 
mwifiex_form_mgmt_frame(skb, buf, len);
mwifiex_queue_tx_pkt(priv, skb);
-- 
1.7.10.4


Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting

2013-07-01 Thread Zheng Liu

On Mon, Jul 01, 2013 at 09:16:46AM -0700, Dave Hansen wrote:
> On 06/28/2013 07:20 PM, Zheng Liu wrote:
> >> > IOW, a process needing to do a bunch of MAP_POPULATEs isn't
> >> > parallelizable, but one using this mechanism would be.
> > I look at the code, and it seems that we will handle MAP_POPULATE flag
> > after we release mmap_sem locking in vm_mmap_pgoff():
> > 
> > down_write(&mm->mmap_sem);
> > ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
> > &populate);
> > up_write(&mm->mmap_sem);
> > if (populate)
> > mm_populate(ret, populate);
> > 
> > Am I missing something?
> 
> I went and did my same test using mmap(MAP_POPULATE)/munmap() pair
> versus using MADV_POPULATE in 160 threads in parallel.
> 
> MADV_POPULATE was about 10x faster in the threaded configuration.
> 
> With MADV_POPULATE, the biggest cost is shipping the mmap_sem cacheline
> around so that we can write the reader count update in to it.  With
> mmap(), there is a lot of _contention_ on that lock which is much, much
> more expensive than simply bouncing a cacheline around.

Thanks for your explanation.

FWIW, it would be great if the MAP_POPULATE flag could support shared
mappings, because our production system runs a lot of applications
that call mmap(2) and then pre-fault the mapping.  Currently these
applications need to pre-fault the mapping manually.
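
For readers unfamiliar with manual pre-faulting, a minimal sketch of what such an application might do is shown below. The helper names are hypothetical and the approach (reading one byte per page of a shared anonymous mapping) is only one possible way to do it; it is not taken from any of the applications mentioned.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Read one byte from every page of [addr, addr + len) so that pages not
 * yet mapped take their page fault now. Returns the number of pages touched.
 */
static size_t toy_prefault_by_touch(const void *addr, size_t len)
{
    const volatile unsigned char *p = addr;
    long page = sysconf(_SC_PAGESIZE);
    size_t off, touched = 0;

    assert(page > 0);
    for (off = 0; off < len; off += (size_t)page) {
        (void)p[off];              /* volatile read: not optimised away */
        touched++;
    }
    return touched;
}

/* Map len bytes of shared anonymous memory and pre-fault it by hand. */
static void *toy_map_shared_prefaulted(size_t len, size_t *touched)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    *touched = toy_prefault_by_touch(p, len);
    return p;
}
```

The loop costs one fault per page on the calling thread, which is the work the application would like the kernel to do on its behalf.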

Regards,
- Zheng

[PATCH] include/asm-generic/io.h: add dummy fuctions to support 'COMPILE_TEST' in 'asm-generic'.

2013-07-01 Thread Chen Gang

'asm-generic' needs to provide the necessary configuration checking; if
a configuration cannot pass the check, 'asm-generic' should not
implement it.

For 'COMPILE_TEST', according to its help contents, 'asm-generic' needs
to let it pass configuration checking and provide the related dummy
contents for it.

Part of the 'COMPILE_TEST' help contents in "init/Kconfig":

  "...Despite they cannot be loaded there (or even when they load they cannot be used due to missing HW support)..."


Signed-off-by: Chen Gang 
---
 include/asm-generic/io.h |   22 ++
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index d5afe96..301ce80 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -303,13 +303,18 @@ static inline void *phys_to_virt(unsigned long address)
 /*
  * Change "struct page" to physical address.
  *
- * This implementation is for the no-MMU case only... if you have an MMU
- * you'll need to provide your own definitions.
+ * This for the no-MMU, or no-IOMEM but still try to COMPILE_TEST cases.
+ * if you have an MMU and IOMEM, you'll need to provide your own definitions.
  */
-#ifndef CONFIG_MMU
+#if !defined(CONFIG_MMU) || \
+   (!defined(CONFIG_HAS_IOMEM) && defined(CONFIG_COMPILE_TEST))
 static inline void __iomem *ioremap(phys_addr_t offset, unsigned long size)
 {
+#if !defined(CONFIG_MMU)
return (void __iomem*) (unsigned long)offset;
+#else
+   return NULL;
+#endif
 }
 
 #define __ioremap(offset, size, flags) ioremap(offset, size)
@@ -325,7 +330,7 @@ static inline void __iomem *ioremap(phys_addr_t offset, unsigned long size)
 static inline void iounmap(void __iomem *addr)
 {
 }
-#endif /* CONFIG_MMU */
+#endif /* !CONFIG_MMU || (!CONFIG_HAS_IOMEM && CONFIG_COMPILE_TEST) */
 
 #ifdef CONFIG_HAS_IOPORT
 #ifndef CONFIG_GENERIC_IOMAP
@@ -341,6 +346,15 @@ static inline void ioport_unmap(void __iomem *p)
 extern void __iomem *ioport_map(unsigned long port, unsigned int nr);
 extern void ioport_unmap(void __iomem *p);
 #endif /* CONFIG_GENERIC_IOMAP */
+#elif defined(CONFIG_COMPILE_TEST) /* CONFIG_HAS_IOPORT */
+static inline void __iomem *ioport_map(unsigned long port, unsigned int nr)
+{
+   return NULL;
+}
+
+static inline void ioport_unmap(void __iomem *p)
+{
+}
 #endif /* CONFIG_HAS_IOPORT */
 
 #ifndef xlate_dev_kmem_ptr
-- 
1.7.7.6

Re: block layer softlockup

2013-07-01 Thread Dave Chinner

On Mon, Jul 01, 2013 at 01:57:34PM -0400, Dave Jones wrote:
> On Fri, Jun 28, 2013 at 01:54:37PM +1000, Dave Chinner wrote:
>  > On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
>  > > On Thu, Jun 27, 2013 at 3:18 PM, Dave Chinner wrote:
>  > > >
>  > > > Right, that will be what is happening - the entire system will go
>  > > > unresponsive when a sync call happens, so it's entirely possible
>  > > > to see the soft lockups on inode_sb_list_add()/inode_sb_list_del()
>  > > > trying to get the lock because of the way ticket spinlocks work...
>  > > 
>  > > So what made it all start happening now? I don't recall us having had
>  > > these kinds of issues before..
>  > 
>  > Not sure - it's a sudden surprise for me, too. Then again, I haven't
>  > been looking at sync from a performance or lock contention point of
>  > view any time recently.  The algorithm that wait_sb_inodes() is
>  > effectively unchanged since at least 2009, so it's probably a case
>  > of it having been protected from contention by some external factor
>  > we've fixed/removed recently.  Perhaps the bdi-flusher thread
>  > replacement in -rc1 has changed the timing sufficiently that it no
>  > longer serialises concurrent sync calls as much
> 
> This mornings new trace reminded me of this last sentence. Related ?

Was this running the last patch I posted, or a vanilla kernel?

> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child1:7219]

> CPU: 0 PID: 7219 Comm: trinity-child1 Not tainted 3.10.0+ #38
.
> RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0x67/0x80
.
>   
> 
>  [] blk_end_bidi_request+0x51/0x60
>  [] blk_end_request+0x10/0x20
>  [] scsi_io_completion+0xf3/0x6e0
>  [] scsi_finish_command+0xb0/0x110
>  [] scsi_softirq_done+0x12f/0x160
>  [] blk_done_softirq+0x88/0xa0
>  [] __do_softirq+0xff/0x440
>  [] irq_exit+0xcd/0xe0
>  [] smp_apic_timer_interrupt+0x6b/0x9b
>  [] apic_timer_interrupt+0x6f/0x80
>   

That's doing IO completion processing in softirq time, and the lock
it just dropped was the q->queue_lock. But that lock is held over
end IO processing, so it is possible that the way the page writeback
transition handling of my POC patch caused this.

FWIW, I've attached a simple patch you might like to try to see if
it *minimises* the inode_sb_list_lock contention problems. All it
does is try to prevent concurrent entry in wait_sb_inodes() for a
given superblock and hence only have one walker on the contending
filesystem at a time. Replace the previous one I sent with it. If
that doesn't work, I have another simple patch that makes the
inode_sb_list_lock per-sb to take this isolation even further

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

sync: serialise per-superblock sync operations

From: Dave Chinner 

When competing sync(2) calls walk the same filesystem, they need to
walk the list of inodes on the superblock to find all the inodes
that we need to wait for IO completion on. However, when multiple
wait_sb_inodes() calls do this at the same time, they contend on the
inode_sb_list_lock and the contention causes system wide
slowdowns. In effect, concurrent sync(2) calls then take longer and
burn more CPU than if they were serialised.

Stop the worst of the contention by adding a per-sb mutex to wrap
around sync_inodes_sb() so that we only execute one sync(2)
operation at a time per superblock and hence mostly avoid
contention.

Signed-off-by: Dave Chinner 
---
 fs/fs-writeback.c  |9 -
 fs/super.c |1 +
 include/linux/fs.h |2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 996f91a..4d7a90c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1353,7 +1353,12 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
  * @sb: the superblock
  *
  * This function writes and waits on any dirty inode belonging to this
- * super_block.
+ * super_block. The @s_sync_lock is used to serialise concurrent sync operations
+ * to avoid lock contention problems with concurrent wait_sb_inodes() calls.
+ * This also allows us to optimise wait_sb_inodes() to use private dirty lists
+ * as subsequent sync calls will block waiting for @s_sync_lock and hence always
+ * wait for the inodes in the private sync lists to be completed before they do
+ * their own private wait.
  */
 void sync_inodes_sb(struct super_block *sb)
 {
@@ -1372,10 +1377,12 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	mutex_lock(&sb->s_sync_lock);
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
 
 	wait_sb_inodes(sb);
+	mutex_unlock(&sb->s_sync_lock);
 }
 EXPORT_SYMBOL(sync_inodes_sb);
 
diff --git a/fs/super.c b/fs/super.c
index 7465d43..887bfbe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -181,6 +181,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
INIT_HL

Re: linux-next: build failure after merge of the powerpc tree

2013-07-01 Thread Benjamin Herrenschmidt

On Tue, 2013-07-02 at 10:54 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the powerpc tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> fs/pstore/ftrace.c: In function 'pstore_ftrace_call':
> fs/pstore/ftrace.c:47:6: warning: passing argument 7 of 'psinfo->write_buf' makes integer from pointer without a cast [enabled by default]
>   sizeof(rec), psinfo);
>   ^
> fs/pstore/ftrace.c:47:6: note: expected 'size_t' but argument is of type 'struct pstore_info *'
> fs/pstore/ftrace.c:47:6: error: too few arguments to function 'psinfo->write_buf'
> 
> Caused by commit 6bbbca735936 ("pstore: Pass header size in the pstore
> write callback").
> 
> I have used the version from next-20130701 for today.

Interestingly enough, I didn't see that when testing an x86_64 build. I
might have failed to test with ftrace enabled.

Aruna, please send a fix ASAP.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/9] cciss: rework pci pm related code for simplification

2013-07-01 Thread Yijing Wang

Hi Jens,
   Sorry to disturb you. Do you have any comments on this patch?

Thanks!
Yijing.

On 2013/6/18 16:19, Yijing Wang wrote:
> Use pci core pm interface to simplify code.
> 
> Signed-off-by: Yijing Wang 
> Cc: Mike Miller 
> Cc: iss_storage...@hp.com
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/block/cciss.c |   16 +++-------------
>  1 files changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
> index 62b6c2c..18da685 100644
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -4528,9 +4528,6 @@ static int cciss_message(struct pci_dev *pdev, unsigned char opcode,
>  static int cciss_controller_hard_reset(struct pci_dev *pdev,
>   void * __iomem vaddr, u32 use_doorbell)
>  {
> - u16 pmcsr;
> - int pos;
> -
>   if (use_doorbell) {
>   /* For everything after the P600, the PCI power state method
>* of resetting the controller doesn't work, so we have this
> @@ -4548,8 +4545,7 @@ static int cciss_controller_hard_reset(struct pci_dev *pdev,
>* this causes a secondary PCI reset which will reset the
>* controller." */
>  
> - pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
> - if (pos == 0) {
> + if (!pdev->pm_cap) {
>   dev_err(&pdev->dev,
>   "cciss_controller_hard_reset: "
>   "PCI PM not supported\n");
> @@ -4557,18 +4553,12 @@ static int cciss_controller_hard_reset(struct pci_dev *pdev,
>   }
>   dev_info(&pdev->dev, "using PCI PM to reset controller\n");
>   /* enter the D3hot power management state */
> - pci_read_config_word(pdev, pos + PCI_PM_CTRL, &pmcsr);
> - pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> - pmcsr |= PCI_D3hot;
> - pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
> + pci_set_power_state(pdev, PCI_D3hot);
>  
>   msleep(500);
>  
>   /* enter the D0 power management state */
> - pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
> - pmcsr |= PCI_D0;
> - pci_write_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
> -
> + pci_set_power_state(pdev, PCI_D0);
>   /*
>* The P600 requires a small delay when changing states.
>* Otherwise we may think the board did not reset and we bail.
> 
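
For reference, the open-coded sequence that the patch removes is a read-modify-write of the power state field in the PM control/status register. A minimal standalone sketch of just that bit manipulation follows. The helper name pmcsr_with_state() is hypothetical, and the constants are local copies of the values in the kernel's PCI headers. pci_set_power_state() does more than this (it also handles the transition delays and tracks the device's current state), so this only illustrates what the deleted lines computed.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the kernel's values, for illustration only. */
#define PCI_PM_CTRL_STATE_MASK	0x0003	/* power state field, bits 1:0 */
#define PCI_D0			0
#define PCI_D3hot		3

/*
 * Hypothetical helper: return pmcsr with its power state field replaced
 * by 'state', leaving every other bit untouched. This mirrors the removed
 * "pmcsr &= ~PCI_PM_CTRL_STATE_MASK; pmcsr |= PCI_D3hot;" sequence.
 */
static uint16_t pmcsr_with_state(uint16_t pmcsr, unsigned int state)
{
	assert(state <= PCI_PM_CTRL_STATE_MASK);
	pmcsr &= (uint16_t)~PCI_PM_CTRL_STATE_MASK;
	pmcsr |= (uint16_t)state;
	return pmcsr;
}
```

Going to D3hot and back to D0 with this helper changes only the low two bits, which is why the old code could reuse the same pmcsr value for both config space writes.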


-- 
Thanks!
Yijing

