date:20130403

Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-03 Thread Jacob Pan

On Wed, 3 Apr 2013 09:30:52 -0700
Greg KH  wrote:

> On Tue, Apr 02, 2013 at 05:17:14PM -0700, Jacob Pan wrote:
> > On Tue, 2 Apr 2013 16:48:05 -0700
> > Greg KH  wrote:
> > 
> > > On Tue, Apr 02, 2013 at 04:33:57PM -0700, Jacob Pan wrote:
> > > > On Tue, 2 Apr 2013 16:00:42 -0700
> > > > Greg KH  wrote:
> > > > 
> > > > > > +#include "intel_rapl.h"
> > > > > > +#include "../../../fs/sysfs/sysfs.h"  
> > > > > 
> > > > > WTF?
> > > > > 
> > > > > Oh, that's a sure sign you are not doing something properly,
> > > > > if you think it's ok to muck around with the internals of
> > > > > sysfs.
> > > > > 
> > > > > There's a reason that file is "private", why do you think
> > > > > it's ok to use it directly?  Did you just think that I
> > > > > somehow "forgot" to put it in the proper include directory?
> > > > I did feel unsure about this but I saw some precedent in the
> > > > kernel.
> > > 
> > > Someone else is doing this with the sysfs api?  I don't see any
> > > other code in Linus's tree doing this at the moment, where did
> > > you see this? Let me know and I'll fix it up right away.
> > > 
> > no, i did not mean sysfs api. I mean include internal header files
> > via #include ../../ 
> > e.g. in drivers/usb/image/microtek.c
> > 
> > #include "../../scsi/scsi.h"
> > #include 
> 
> That is because this is a scsi host driver.  Your code is not part of
> sysfs itself.
> 
> > > > Anyway, I needed a way to validate a userspace file passed to
> > > > rapl driver belong to the same sysfs directory. I will look for
> > > > alternative ways.
> > > 
> > > What do you mean by this?  What exactly are you trying to do?  No
> > > normal driver code should _ever_ call sysfs functions directly,
> > > nor should they ever care about sysfs internals.
> > > 
> > i did not call sysfs internal calls, just need to use 
> > struct sysfs_dirent {}
> > 
> > to do the following sanity check against user passed event control
> > file, it is still not a 100% strong check. 
> > /* check if the cfile belongs to the same rapl domain */
> > if (strcmp(rd->kobj.sd->s_name,
> > cfile->f_dentry->d_parent->d_name.name)) {
> > pr_debug("cfile does not belong to domain %s\n",
> > rd->kobj.sd->s_name);
> > ret = -EINVAL;
> > goto exit_cleanup_fds;
> > }
> 
> This made it through a code review at Intel?  Seriously?  Come on,
> there's just so much wrong here, I don't know where to begin.
> 
> Hint, if you find yourself caring about the internals of sysfs in a
> device driver, you are doing something so wrong it's not funny.  Do
> you see _any_ other driver doing anything like this?  What makes this
> driver so special that it can do unexpected, and totally different
> things with sysfs?
> 
I admit that my knowledge in this area is limited. I appreciate your
help and straightforward comments.

Perhaps the reason is that not many drivers use eventfd and its way of
arming event thresholds. Userspace passes an eventfd, a file
descriptor, and a threshold value to the RAPL driver. In order to
verify that the control file descriptor is valid, I need to check that
1. the control file is capable of generating events, e.g. it can be an
energy counter but not a constant power_limit1
2. the control file belongs to the same RAPL domain, since the names are
reused in all RAPL domains, e.g. all domains have an energy counter and
events are per domain.

I used the sysfs directory check for #2. It is just an extra check, but
I think I can drop it and let the user pick events based on the name
string. The fact that the user writes to the per-domain control file
implies that the event is specific to that domain.

In the similar eventfd usage in cgroup, cgroup has its own fs, so it does
the check based on the file ops of the control files.

> > > And, odds are, you didn't test your code as a module, right, as
> > > any internal sysfs function that you could get from this .h file,
> > > wouldn't be exported for a module to use, unless I missed one
> > > somewhere?
> > > 
> > I did run the driver as module since i didn't use sysfs internal
> > functions, just the struct. I may be hitting a corner case here, but
> > for drivers who need to discover sysfs hierarchy would it be useful
> > to expose some info in struct sysfs_dirent{}?
> 
> No, not at all, why would a driver ever care about that?  Somehow we
> have gotten by for the past 10+ years without needing it, why is your
> driver so different than the thousands of other Linux drivers?
> 
> greg k-h

ditto, I will drop that extra check.

-- 
Thanks,

Jacob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-03 Thread Alex Williamson

On Tue, 2013-04-02 at 18:18 +0200, Joerg Roedel wrote:
> Cc'ing Alex Williamson
> 
> Alex, can you please review the iommu-group part of this patch?

Sure, it looks pretty reasonable.  AIUI, all PCI devices are below some
kind of host bridge that is either new and supports partitioning or old
and doesn't.  I don't know if that's a visibility or isolation
requirement, perhaps PCI ACS-ish.  In the new host bridge case, each
device gets a group.  This seems not to have any quirks for
multifunction devices though.  On AMD and Intel IOMMUs we test
multifunction device ACS support to determine whether all the functions
should be in the same group.  Is there any reason to trust multifunction
devices on PAMU?

I also find it curious what happens to the iommu group of the host
bridge.  In the partitionable case the host bridge group is removed, in
the non-partitionable case the host bridge group becomes the group for
the children, removing the host bridge.  It's unique to PAMU so far that
these host bridges are even in an iommu group (x86 only adds pci
devices), but I don't see it as necessarily wrong leaving it in either
scenario.  Does it solve some problem to remove them from the groups?
Thanks,

Alex


[PATCH] arm64: Fix task tracing

2013-04-03 Thread Christopher Covington

For accurate accounting call contextidr_thread_switch before a
task is scheduled, rather than after.

Signed-off-by: Christopher Covington 
---
 arch/arm64/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 0337cdb..c2cc249 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -311,11 +311,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	fpsimd_thread_switch(next);
 	tls_thread_switch(next);
 	hw_breakpoint_thread_switch(next);
+	contextidr_thread_switch(next);
 
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
-	contextidr_thread_switch(next);
 	return last;
 }
 
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by the Linux Foundation.


Re: [PATCH v9 00/14] Tegra114 clockframework

2013-04-03 Thread Stephen Warren

On 04/03/2013 08:40 AM, Peter De Schrijver wrote:
> This is the ninth version of the Tegra114 clockframework. It is based on the
> next-20130320-fixed branch of
> git://nv-tegra.nvidia.com/user/swarren/linux-2.6.git,

That's not a particularly useful base; these patches won't be applied to
linux-next, but rather to Tegra's for-3.10/clk branch, with the clock
for-next tree merged in to pick up "clk: add table lookup to mux".

> http://patchwork.ozlabs.org/patch/229972/ and

That is "clk: tegra: Don't enable PLLs during early boot". That was
rejected. I assume this isn't really needed given your changelog entry
"Fixed clock initialization for audio", but rather my "clk: tegra: defer
application of init table" should be applied instead?

> http://patchwork.ozlabs.org/patch/229978/ and

For reference, that is "clk: add table lookup to mux", which is in the
clock for-next tree.

> http://patchwork.ozlabs.org/patch/233415/

That is Joseph's "clocksource: tegra: enable arch_timer". Is this
actually a dependency, or an optimization, or ...?

Rob Herring objected to that patch since it isn't necessary if his
series "ARM arch, sp804 and integrator timer CLKSRC_OF support" is applied.

If I apply just your (Peter's) CCF series without Joseph's or Rob's
patches, will everything still work, or is this a hard dependency to
create a working system?

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-03 Thread John Stultz


On 04/03/2013 07:22 AM, Stephane Eranian wrote:

On Wed, Apr 3, 2013 at 4:14 PM, David Ahern  wrote:

On 4/3/13 8:00 AM, Stephane Eranian wrote:

Why not have perf convert its
perf_clock timestamps into monotonic or realtime when dumping events?


So this is exactly what I've been wondering through all this.

Perf can keep track of events using its own time domain (which is 
understandably required due to performance and locking issues), but when 
exporting those timestamps to userland, could it not do the same (likely 
imperfect) conversion to existing userland time domains (like 
CLOCK_MONOTONIC)?




Can monotonic timestamps be obtained from NMI context in the kernel?


I don't understand the context of the question.

I am not suggesting perf_clock be changed. I am working on correlating
existing perf_clock timestamps to clocks typically used by apps (REALTIME
and time-of-day but also applies to MONOTONIC).


But for that, you'd need to expose to users the correlation between
the two clocks.
And now you'd fixed two clock sources definitions not just one.


I'm not sure I follow this. If perf exported data came with 
CLOCK_MONOTONIC timestamps, no correlation would need to be exposed.  
perf would just have to do the extra overhead of doing the conversion on 
export.
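
The conversion on export discussed above could be sketched as a simple offset translation. This is only an illustration under stated assumptions: the struct and function names are hypothetical, the reference pair is assumed to be sampled from both clocks at (nearly) the same instant, and both clocks are assumed to tick at the same rate, which is why the result is "likely imperfect".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One paired reading of both clocks, taken as close together as possible. */
struct clock_pair {
	int64_t perf_ns;      /* hypothetical perf_clock reading, in ns */
	int64_t monotonic_ns; /* CLOCK_MONOTONIC reading, in ns */
};

/*
 * Translate a perf-domain timestamp into the monotonic domain by applying
 * the offset observed at the reference point. Any drift between the two
 * clocks after the reference was taken shows up as conversion error.
 */
int64_t perf_to_monotonic(const struct clock_pair *ref, int64_t perf_ts_ns)
{
	assert(ref != NULL);
	return perf_ts_ns - ref->perf_ns + ref->monotonic_ns;
}

/* Illustrative reference pair; the numbers are made up. */
const struct clock_pair example_ref = { 1000, 5000 };
```

With this shape the perf clock itself stays untouched; only the exported timestamps are rewritten, at the cost of one subtraction and one addition per sample.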




You are wanting the reverse -- have apps emit perf_clock timestamps. I was
just wondering what is the advantage of this approach?


Well, that's how I interpreted your question ;-<

If you could have perf_clock use monotonic then we would not have this
discussion.
The correlation would be trivial.


I think the suggestion is not to have the perf_clock use 
CLOCK_MONOTONIC,  but the perf interfaces export CLOCK_MONOTONIC.


thanks
-john


[PATCH] pnp: extend char array field in pnp_fixup structure

2013-04-03 Thread Masatake YAMATO

The length of id field of pnp_fixup structure is 7:

 struct pnp_fixup {
 	char id[7];
 	void (*quirk_function) (struct pnp_dev * dev);  /* fixup function */
 };

On the other hand, the field is initialized with a constant C string
consisting of 7 characters in pnp_fixups, defined in drivers/pnp/quirks.c:

static struct pnp_fixup pnp_fixups[] = {
/* Soundblaster awe io port quirk */
{"CTL0021", quirk_awe32_resources},
{"CTL0022", quirk_awe32_resources},

The constant C string is too large to store; there is no space for the NUL char.

If the id field is just used as a byte array, there is no problem.
However, it is used as a C string in the pnp_fixup_device function:

pnp_dbg(&dev->dev, "%s: calling %pF\n", f->id,
f->quirk_function);
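
The missing terminator can be demonstrated outside the kernel with a small user-space sketch. The struct and function names below are hypothetical stand-ins, not the real pnp definitions; the point is only that C permits initializing a 7-byte array from a 7-character literal and silently drops the NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the id field before and after the patch. */
struct fixup_narrow { char id[7]; };
struct fixup_wide   { char id[8]; };

/* Legal C: the literal fills all 7 bytes, so no NUL terminator is stored. */
const struct fixup_narrow narrow_fixup = { "CTL0021" };
/* With 8 bytes there is room for the 7 characters plus the terminator. */
const struct fixup_wide wide_fixup = { "CTL0021" };

/* Returns 1 if a NUL byte exists within the first size bytes of id. */
int id_is_terminated(const char *id, size_t size)
{
	assert(id != NULL);
	return memchr(id, '\0', size) != NULL;
}
```

Printing the narrow id with "%s" would read past the end of the array, which is the situation the pnp_dbg() call runs into.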

Signed-off-by: Masatake YAMATO 
---
 include/linux/pnp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/pnp.h b/include/linux/pnp.h
index 195aafc..d734ee2 100644
--- a/include/linux/pnp.h
+++ b/include/linux/pnp.h
@@ -295,7 +295,7 @@ static inline void pnp_set_drvdata(struct pnp_dev *pdev, void *data)
 }
 
 struct pnp_fixup {
-   char id[7];
+   char id[8];
 	void (*quirk_function) (struct pnp_dev * dev);  /* fixup function */
 };
 
-- 
1.7.11.7


Re: [PATCH v9 10/14] ARM: tegra: Define Tegra114 CAR binding

2013-04-03 Thread Stephen Warren

On 04/03/2013 08:40 AM, Peter De Schrijver wrote:
> The device tree binding models Tegra114 CAR (Clock And Reset) as a single
> monolithic clock provider.

> diff --git a/Documentation/devicetree/bindings/clock/nvidia,tegra114-car.txt 
> b/Documentation/devicetree/bindings/clock/nvidia,tegra114-car.txt

> +Mux clocks
> +
> +  300audio0_mux
> +  301audio1_mux
> +  302audio2_mux
> +  303audio3_mux
> +  304audio4_mux
> +  305spdif_mux
> +  306clk_out_1_mux
> +  307clk_out_2_mux
> +  308clk_out_3_mux
> +  309dsia_mux
> +  310dsib_mux

I think Prashant was expecting those to be removed, and for this series
to rely on "clk: Add composite clock type" instead?

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-03 Thread John Stultz


On 04/03/2013 10:35 AM, Pawel Moll wrote:

On Wed, 2013-04-03 at 18:29 +0100, John Stultz wrote:

On 04/03/2013 10:19 AM, Pawel Moll wrote:

On Tue, 2013-04-02 at 17:19 +0100, John Stultz wrote:

But if we're going to have to do
this via a clockid, I'm going to want it to be done via a dynamic posix
clockid, so its clear its tightly tied with perf and not considered a
generic interface (and I can clearly point folks having problems to the
perf maintainers ;).

Ok, so how about the code below?

There are two distinct parts of the "solution":

1. The dynamic posix clock, as you suggested. Then one can get the perf
timestamp by doing:

clock_fd = open("/dev/perf-clock", O_RDONLY);
clock_gettime(FD_TO_CLOCKID(clock_fd), &ts)

2. A sort-of-hack in the get_posix_clock() function making it possible
to do the same using the perf event file descriptor, eg.:

fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
clock_gettime(FD_TO_CLOCKID(fd), &ts)
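
For readers unfamiliar with FD_TO_CLOCKID used in both snippets above: it packs an open file descriptor into a negative clockid_t whose low three bits are the CLOCKFD tag (3). A minimal sketch of that encoding follows. It is written with unsigned arithmetic to keep the C well defined, so it is an equivalent of the kernel macro on the usual two's complement targets rather than a copy of it, and the lower-case function names are chosen here for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Tag stored in the low three bits of a dynamic (fd-based) clock id. */
#define CLOCKFD 3u

/* Encode an open file descriptor as a dynamic POSIX clock id. */
clockid_t fd_to_clockid(int fd)
{
	assert(fd >= 0);
	return (clockid_t)((~(unsigned int)fd << 3) | CLOCKFD);
}

/* Recover the file descriptor from a dynamic clock id. */
int clockid_to_fd(clockid_t clk)
{
	return (int)(~((unsigned int)clk >> 3) & 0x1FFFFFFFu);
}
```

Because the encoding only carries the descriptor number, the kernel still has to look up the file and decide whether it provides a clock, which is where the question of supporting the perf event fd as well as the chardev comes in.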

#2 makes my nose wrinkle.

To make myself clear: I consider the code as it is a hack.


Forgive me for being somewhat ignorant on the
perf interfaces, but why is the second portion necessary or beneficial?

My thinking: the perf syscall returns a file descriptor already, so it
would make sense to re-use it in the clock_gettime() call instead of
jumping through loops to open a character device file, which may not
exist at all (eg. no udev) or may be placed or named in a random way
(eg. some local udev rule).

I'm open for different opinions :-)


Cc'ing Richard for his thoughts here.


I get the reasoning around reusing the fd we already have, but is the 
possibility of a dynamic chardev pathname really a big concern?


I'm guessing the private_data on the perf file is already used?

Maybe we can extend the dynamic posix clock code to work on more than 
just the chardev? Although I worry about multiplexing too much 
functionality on the file.


thanks
-john


Re: [PATCH v1 0/9] uretprobes: Return uprobes implementation

2013-04-03 Thread Oleg Nesterov

On 04/03, Anton Arapov wrote:
>
> Anton Arapov (9):
>   uretprobes: Introduce uprobe_consumer->ret_handler()
>   uretprobes: Reserve the first slot in xol_vma for trampoline
>   uretprobes/x86: Hijack return address
>   uretprobes/ppc: Hijack return address
>   uretprobes: Return probe entry, prepare_uretprobe()
>   uretprobes: Return probe exit, invoke handlers
>   uretprobes: Limit the depth of return probe nestedness
>   uretprobes: Remove -ENOSYS as return probes implemented
>   uretprobes: Documentation update
>
>  Documentation/trace/uprobetracer.txt | 126 +-
>  arch/powerpc/include/asm/uprobes.h   |   1 +
>  arch/powerpc/kernel/uprobes.c|  13 +++
>  arch/x86/include/asm/uprobes.h   |   1 +
>  arch/x86/kernel/uprobes.c|  29 +
>  include/linux/uprobes.h  |   7 ++
>  kernel/events/uprobes.c  | 202 +--
>  7 files changed, 320 insertions(+), 59 deletions(-)

Looks fine to me. I am going to add this to
git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc uprobes/core

Ananth. "4/9 uretprobes/ppc" looks "obviously correct", but could you
please review and ack/nack ?

To remind, this series depends on other changes in uprobes core
and kernel/trace/trace_uprobe.c, please see the full stat below.

I'll wait for the comments a bit, after that I'll ask Ingo to
pull if nobody objects.

Anton Arapov (9):
  uretprobes: Introduce uprobe_consumer->ret_handler()
  uretprobes: Reserve the first slot in xol_vma for trampoline
  uretprobes/x86: Hijack return address
  uretprobes/ppc: Hijack return address
  uretprobes: Return probe entry, prepare_uretprobe()
  uretprobes: Return probe exit, invoke handlers
  uretprobes: Limit the depth of return probe nestedness
  uretprobes: Remove -ENOSYS as return probes implemented
  uretprobes: Documentation update

Oleg Nesterov (15):
  uprobes: Turn copy_opcode() into copy_from_page()
  uprobes: Change __copy_insn() to use copy_from_page()
  uprobes: Kill the unnecesary filp != NULL check in __copy_insn()
  uprobes: Introduce copy_to_page()
  uprobes: Change write_opcode() to use copy_*page()
  uprobes/tracing: Kill the pointless task_pt_regs() calls
  uprobes/tracing: Kill the pointless seq_print_ip_sym() call
  uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
  uprobes/tracing: Generalize struct uprobe_trace_entry_head
  uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
  uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
  uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
  uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
  uprobes/tracing: Make seq_printf() code uretprobe-friendly
  uprobes/tracing: Change create_trace_uprobe() to support uretprobes

 Documentation/trace/uprobetracer.txt |  114 +---
 arch/powerpc/include/asm/uprobes.h   |1 +
 arch/powerpc/kernel/uprobes.c|   13 ++
 arch/x86/include/asm/uprobes.h   |1 +
 arch/x86/kernel/uprobes.c|   29 
 include/linux/uprobes.h  |7 +
 kernel/events/uprobes.c  |  251 --
 kernel/trace/trace.h |5 -
 kernel/trace/trace_uprobe.c  |  205 
 9 files changed, 480 insertions(+), 146 deletions(-)


Re: [PATCH/Resend 2/2] arm: mach-omap2: prevent UART console idle on suspend while using "no_console_suspend"

2013-04-03 Thread Kevin Hilman

Sourav Poddar  writes:

> Hi Kevin,
> On Wednesday 20 March 2013 05:36 PM, Sourav Poddar wrote:
>> Realised the list  to whom the patch was sent got dropped. Ccing
>> them all..
>> On Wednesday 20 March 2013 05:18 PM, Sourav Poddar wrote:
>>> Hi Kevin,
>>> On Tuesday 19 March 2013 12:24 AM, Kevin Hilman wrote:
>>>> Sourav Poddar  writes:

> With dt boot, uart wakeup after suspend is non functional on
> omap4/5 while using
> "no_console_suspend" in the bootargs. With "no_console_suspend"
> used, od->flags
> should be ORed with "OMAP_DEVICE_NO_IDLE_ON_SUSPEND", thereby not
> allowing the console
> to idle in the suspend path. For non-dt case, this was taken care
> by platform data.
>
> Tested on omap5430evm, omap4430sdp.
>
> Cc: Santosh Shilimkar
> Cc: Felipe Balbi
> Cc: Rajendra nayak
> Signed-off-by: Sourav Poddar
>>>> This patch creates a dependency between omap_device (generic,
>>>> device-independent code) and a specific driver (UART.)
>>>>
>>>> If you need to do something like this that's DT boot specific, then
>>>> we probably need some late initcall in serial.c to handle this.
>>>> It does
>>>> not belong in omap_device.

>>> The following function "omap_device_disable_idle_on_suspend(pdev)"
>>> should only
>>> be called once the omap device has been build, which in the case of
>>> device tree is
>>> done in omap_device.c file. Moreover, the above call should be
>>> executed conditionally
>>> and should depend on the following two parameter.
>>>
>>> [1]  a. Whether "no_console_suspend" is set and
>>>  b.  the device build is a console uart.
>>>
>>> When I look closely into the serial.c file, I realised that
>>> "core_initcall(omap_serial_early_init)" gets called irrespective
>>> of dt/non dt boot and will take care of most of the stuff(checking
>>> whether
>>> "no_console_suspend" is used and which uart is used as a console
>>> uart) which the
>>> $subject patch is proposing.
>>>
>>> But the problem is that we need to exchange the parsed information
>>> from serial.c to the omap_device file for the conditional execution of
>>> "omap_device_disable_idle_on_suspend"
>>>
>>> In this case,
>>> from "serial.c" we need
>>> 1. no_console_suspend = true
>>> 2. strcpy(console_name, oh_name), where oh_name corresponds to
>>> the console uart.
>>>
>>> then in "omap_device.c" do
>>> if (no_console_suspend && !strcmp(oh->name, console_name))
>>> omap_device_disable_idle_on_suspend(pdev);
>>>
>>> Please correct if I am understanding it incorrectly.
>>>
>>> If the above understanding looks good to you, is there a way we can
>>> make this
>>> exchange of information happen between serial.c and omap_device.c file?
> Any input on this?
> As I explained earlier, that there is a need to parse information in
> serial.c and use that in
> omap_device.c only after the device is built.

As I explained earlier, any device specific hacks inside omap_device
should be a red flag that something has gone wrong.

How about fixing the UART driver/core to not runtime suspend if
no_console_suspend is given?  

Then we can get rid of this no_idle_on_suspend hack altogether since
UART is the only remaining user.

Kevin

Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low

2013-04-03 Thread Vivek Goyal

On Wed, Apr 03, 2013 at 10:32:23AM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu  wrote:
> > On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal  wrote:
> >
> >> - implement crashkernel_no_auto_low option to opt out of auto reserved
> >>   low memory
> >
> > No, that is ugly.
> ...
> >
> > It's *you* want me to change "Crash kernel low" to "Crash kernel".
> >
> > Do we need to drop second patch? So will still keep
> > "Crash kernel low" in /proc/iomem?
> 
> also we can drop the last patch and keep "crashkernel_high=" and
> "crashkernel_low="

As hpa mentioned, we should express memory reservation and its
dependencies in crashkernel= options. So introducing crashkernel_high or
crashkernel_low just because we don't want to support multiple
ranges is a kludge.

> 
> as you even like to introduce "crashkernel_no_auto_low".

This is a kludge too for ease of use. At least it does not spoil 
crashkernel= space and also works with existing crashkernel=X
parameters.

You know what, I think multiple ranges have another problem: all of
the kexec/kdump code is written assuming there is one contiguous
reserved range.

/* Verify we have a valid entry point */
if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
result = -EADDRNOTAVAIL;
goto out;
}

Also look at crash_shrink_memory().

So what I am saying is that all our code is written assuming there is one
single reserved range. Now if we need to reserve two ranges, then let
us make it generic to support multiple ranges instead of hardcoding
things and assuming there can be 2 ranges. That will be a more generic
solution.
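
As a rough user-space sketch of what a generic version of the entry point check might look like, the single crashk_res comparison becomes a loop over an array of ranges. The struct, function and example addresses below are hypothetical and are not taken from any kexec patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one reserved region; both ends inclusive. */
struct reserved_range {
	uint64_t start;
	uint64_t end;
};

/* Returns 1 if entry lies inside any of the count reserved ranges. */
int entry_in_reserved(uint64_t entry, const struct reserved_range *ranges,
		      size_t count)
{
	size_t i;

	assert(ranges != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (entry >= ranges[i].start && entry <= ranges[i].end)
			return 1;
	}
	return 0;
}

/* Made-up layout: one range below 4G and one above it. */
const struct reserved_range example_ranges[] = {
	{ 0x01000000ULL,  0x04ffffffULL },
	{ 0x100000000ULL, 0x10fffffffULL },
};
```

A caller would keep returning -EADDRNOTAVAIL when the function reports 0; with count == 1 the behaviour matches the existing single-range check.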

So how about this.

- In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
  memory. Support reservation of single range only. It could be either
  high or low.

- Those who are using iommu, they can use crashkernel=X;high. Old code
  can continue to use crashkernel=X and get memory reserved in low
  memory areas.

- In 3.10 add a feature to support multiple crash reserved ranges.

Thanks
Vivek

Re: [PATCH 2/2] regulator: lp8788: Implement list_voltage for lp8788_ldo_voltage_fixed_ops

2013-04-03 Thread Mark Brown

On Mon, Apr 01, 2013 at 11:28:48PM +0800, Axel Lin wrote:
> For fixed voltage, we can just set min_uV and use 
> regulator_list_voltage_linear
> for list_voltage callback. Regulator core will call list_voltage(rdev, 0) if
> both get_voltage get_voltage_sel are not implemented. Thus we can also remove
> lp8788_ldo_fixed_get_voltage() function.

Applied both, thanks.
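
For context, the arithmetic behind a linear list_voltage callback is min_uV plus uV_step times the selector, so a fixed regulator only needs min_uV and a single selector. A user-space sketch of that arithmetic is below; the descriptor struct is a simplified, hypothetical stand-in for the regulator description, the 1.8 V value is illustrative only, and the in-kernel helper reports an out-of-range selector with a negative errno rather than -1.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the relevant descriptor fields. */
struct linear_desc {
	int min_uV;              /* voltage at selector 0, in microvolts */
	int uV_step;             /* increment per selector; 0 when fixed */
	unsigned int n_voltages; /* number of valid selectors */
};

/* Returns the voltage for selector, or -1 if the selector is out of range. */
int list_voltage_linear(const struct linear_desc *desc, unsigned int selector)
{
	assert(desc != NULL);
	if (selector >= desc->n_voltages)
		return -1;
	return desc->min_uV + desc->uV_step * (int)selector;
}

/* Illustrative fixed 1.8 V output: one selector, no step. */
const struct linear_desc fixed_1v8 = { 1800000, 0, 1 };
```

Calling it with selector 0 yields min_uV, which is the value the core falls back to when neither get_voltage nor get_voltage_sel is implemented.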



Re: [PATCH 1/4] regulator: ab8500: Remove set_voltage_time_sel and delay to fix build error

2013-04-03 Thread Mark Brown

On Wed, Apr 03, 2013 at 10:30:43AM +0800, Axel Lin wrote:
> The delay field is removed from struct ab8500_regulator_info in commit 
> 29234928
> "regulator: ab8500: Remove set_voltage_time_sel and delay setting".

I've now finished applying all Lee's outstanding patches (which weren't
sent to the list...) - could you please recheck if the first two of
these patches are still relevant and that the last two are addressed
properly as per Lee's mails?  Thanks.


[PATCH] gpio: pca953x: fix irq_domain_add_simple usage

2013-04-03 Thread Alexandre Belloni

We actually have to pass chip as the host_data parameter of
irq_domain_add_simple() as later on, it is used to initialize chip_data
in pca953x_gpio_irq_map(). Failing to do so is leading to a NULL pointer
dereference after calling irq_data_get_irq_chip_data() in
pca953x_irq_mask(), pca953x_irq_unmask(), pca953x_irq_bus_lock(),
pca953x_irq_bus_sync_unlock() and pca953x_irq_set_type().
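
The failure mode can be modelled in a few lines of user-space C. Everything below is a hypothetical mock, not the irqdomain API: a "domain" carries an opaque host_data pointer, the map step copies it into per-IRQ chip_data, and the mask step expects to find the chip there. The mock returns -1 where the driver would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types; none of these are the kernel's structures. */
struct chip     { unsigned int irq_mask; };
struct domain   { void *host_data; };
struct irq_slot { void *chip_data; };

/* Plays the role of the map callback: hand host_data to the IRQ. */
void map_irq(const struct domain *d, struct irq_slot *slot)
{
	assert(d != NULL && slot != NULL);
	slot->chip_data = d->host_data;
}

/* Plays the role of a mask callback; the driver has no NULL check here. */
int mask_irq(struct irq_slot *slot, unsigned int bit)
{
	struct chip *chip = slot->chip_data;

	if (chip == NULL)
		return -1; /* the real code would oops at this point */
	chip->irq_mask &= ~(1u << bit);
	return 0;
}

/* Create a domain with the given host_data, map one IRQ, mask bit 0. */
int demo_mask(void *host_data)
{
	struct domain d = { host_data };
	struct irq_slot slot = { NULL };

	map_irq(&d, &slot);
	return mask_irq(&slot, 0);
}

struct chip example_chip = { 0xffu };
```

demo_mask(NULL) corresponds to the regression and demo_mask(&example_chip) to the fixed call.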

Fixes regression introduced by commit
0e8f2fdacf1d44651aa7e57063c76142d1f4988b (gpio: pca953x: use simple
irqdomain)

Signed-off-by: Alexandre Belloni 
---

It would probably be good to get this patch into the next 3.9 rc, as the
regression leads to a lot of breakage (and cursing).

 drivers/gpio/gpio-pca953x.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index e2dcac5..ea26e59 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -575,7 +575,7 @@ static int pca953x_irq_setup(struct pca953x_chip *chip,
chip->gpio_chip.ngpio,
irq_base,
&pca953x_irq_simple_ops,
-   NULL);
+   chip);
if (!chip->domain)
return -ENODEV;
 
-- 
1.7.10.4


Re: [PATCH -next] mxs/spi: fix error return code in mxs_spi_probe()

2013-04-03 Thread Mark Brown

On Wed, Apr 03, 2013 at 09:06:40PM +0800, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fix to return a negative error code from the error handling
> case instead of 0, as returned elsewhere in this function.

Applied, thanks.



Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low

2013-04-03 Thread Vivek Goyal

On Wed, Apr 03, 2013 at 10:12:46AM -0700, Yinghai Lu wrote:

[..]
> >> Can we just keep it separated?
> >
> > Kernel does not know about old kexec-tools or new kexec-tools. Neither
> > kernel can enforce what command line options are passed by user. So
> > kernel needs to define a clean interface here which is easily understood
> > and is extensible also in future.
> 
> Looks you are chasing wrong direction.

I don't think so. Once we are defining a bunch of crashkernel= parameters,
we need to make sure it is well understood how they work together.

> 
> Those four patches fixes the regression that Wang and you reported,
> User don't need to change their kexec-tools and boot command lines
> kdump still works.

I am not objecting to that. I am objecting to introducing unexpected
behavior in the crashkernel= command line space and ignoring how
it co-exists with the existing syntax.

> 
> We can never stop users from doing crazy things with their systems.

Yes, but we need to make sure that if a bunch of crashkernel= options are
passed on the kernel command line, the behavior is well defined and
extensible for future usage.

[..]
> > But why are we tying ;low to ;high. One should be easily extend
> > crashkernel=X to be able to reserve memory above 4G if specified amount
> > is not available below 4G. In that case also one might want to reserve
> > some low memory?
> 
> I want to keep crashkernel=X to the old behavior.
> 
> If you want to have crashkernel=X to allocate high above 4G, old kexec-tools
> will not work with new kernel.

It does not work anyway. Because current crashkernel=X will fail to
reserve memory if sufficient memory is not available below 896MB. So no
surprises there.

It will help new kexec-tools continue to work with crashkernel=X even
in high memory ranges.

> >
> > For that matter crashkernel=range1:size,range2:size syntax should be
> > extendible too to reserve memory above 4G if desired size of memory
> > is not available in low memory.
> >
> > Now in those cases too, one would like to have 72M of low memory
> > reserved. So ;low should not be tied to ;high necessarily.
> >
> > In fact current code does not care whether ;high was specified or not.
> > If memory is reserved above 4G, ;low code will kick in.
> 
> No, that is not right.
> 
> only when ;high is specified, kernel will try to allocate high above 4G.

As of today. But one can always extend crashkernel=X to allocate from
high addresses if memory is not available in low addresses without
breaking old tools.

Current ;low logic does not care whether high reservation was done using
crashkernel=X or crashkernel=X;high. And it should not. Tying ;low with
;high is the problem. Each crashkernel= directive should be able to 
specify its own range to reserve with constraints. There is no dependency
on other crashkernel= options passed. And that keeps the behavior well
defined.

If we start introducing dependencies between various crashkernel= options,
our behavior matrix will explode and it will be very difficult to explain
how it works.

[..]
> > We really need to stick to the notion of only one crashkernel= option
> > is accepted and that is last one on command line. And if need be,
> > we need to work on multi range reservation feature where we process
> > and reserve ranges as specified by all crashkernel= parameters on
> > command line.
> 
> That is kept.
> 
> and only last high is honored

What about the rest of the crashkernel= options? I am not sure why you
are not seeing that you are stepping on the already defined crashkernel=
command line option and breaking its semantics.

If you were defining crashkernel_foo, I couldn't care less.

Given the fact you are using crashkernel=, you need to take already
defined parameters into consideration and stick to those semantics.

- Either support single range reservation and always use rightmost
  crashkernel= option.

- Or support multiple ranges and process all the crashkernel= options
  as specified on command line.

Please don't define more modes here.
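
To make the first option concrete, here is a minimal userspace sketch of
"always use the rightmost crashkernel= option". It is not the kernel's
parser; last_crashkernel() is a hypothetical helper written only for
illustration, and it assumes options are separated by single spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the rightmost "crashkernel=" option
 * in cmdline, or NULL if no such option is present.
 * Hypothetical helper for illustration; not taken from the kernel.
 */
static const char *last_crashkernel(const char *cmdline)
{
	static const char key[] = "crashkernel=";
	const char *p = cmdline;
	const char *last = NULL;

	assert(cmdline != NULL);

	while ((p = strstr(p, key)) != NULL) {
		/* Accept a match only at the start of an option. */
		if (p == cmdline || p[-1] == ' ')
			last = p + strlen(key);
		p += strlen(key);
	}
	return last;
}
```

With "crashkernel=128M ro crashkernel=512M;high quiet" the returned
pointer refers to "512M;high quiet"; every earlier crashkernel= is
ignored, whatever suffix it carries.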

> 
> >
> > Creating new combinations where some crashkernel= are preferred over
> > others and some crashkernel= options work with only selected crashkernel=
> > options, is asking for trouble, especially keeping in mind future
> > extensions.
> 
> I don't think so.
> 
> old conf that works before still use crashkernel= with high and low.
> old conf that does not work, could switch to crashkernel=;high/low
> with new kexec-tools

Please come out of this old conf and new conf mode. That is specific
usage of interface you are providing. But interface semantics should
still be well defined. And currently they are not.

> 
> >
> > I prefer following for 3.9.
> >
> > - process only right most crashkernel= option.
> 
> what is "right most" ?
> only last crashkernel=X is honored?
> I restored that already with those four patches.

only last crashkernel= is honored. It could be either crashkernel=X
or crashkernel=X;high or crashkernel=X;;;

Point being crashkernel= space

Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-03 Thread Jacob Pan

On Wed, 3 Apr 2013 09:35:09 -0700
Greg KH  wrote:

> On Tue, Apr 02, 2013 at 09:48:18PM -0700, Jacob Pan wrote:
> > > Let's step back and start over, what exactly are you trying to
> > > tell userspace?  What data do you have that you need to express
> > > to it?  How do you want userspace to see/use it?
> > 
> > It is a good idea to step back and let me explain what I wanted to
> > do here for userspace.
> > 
> > I have two kinds of applications that might use this driver.
> > 1. simple use case where user sets a power limit for a RAPL domain.
> > e.g. set graphics unit power limit to 7w
> > 2. advanced use case where user can do fine tuning on top of simple
> > power limit, e.g. the dynamic response parameters of power control
> > logic, event notifications, etc.
> > 
> > For #1, this driver registers with the abstract generic thermal layer
> > (/sys/class/thermal) and presents itself as a set of cooling devices
> > with a single knob per domain for power limits.
> > root@chromoly:/sys/class/thermal/cooling_device15# echo 7000 >
> > cur_state 
> 
> Great, how about submitting that functionality as patch 1 of your
> series?  That seems like a very "normal" thermal driver, right?
> 
yes, that would be a normal thermal cooling device driver. I will do
that first. Thanks for the suggestion.
> > For #2, to give userspace complete control of the RAPL interface,
> > which is not generic, I put them under the device private sysfs
> > area. root@chromoly:/sys/class/thermal/cooling_device15/device#
> > echo 1000 > time_window1 
> 
> I totally fail to understand the difference.  What do you want to show
> to userspace that can't be expressed through the thermal interface
> today?
The difference is one single knob (long term power limit) in the thermal
interface vs multiple fine grained control and data in the complete RAPL
interface.

Here is what a complete RAPL interface for package domain looks like.
root@chromoly:/sys/class/thermal/cooling_device15/device# grep . *
domain_name:package
energy:22396031
lock:0
max_power:0
max_window:0
min_power:0
pl1_clamp:0
pl1_enable:1
pl2_clamp:0
pl2_enable:1
power:7841
power_limit1:25000
power_limit2:31250
thermal_spec_power:17000
throttle_time: 
time_window1:28000
time_window2:0


>  Perhaps the thermal interface could be expanded to provide
> more functionality that you need?
yes, some of them such as limits. But not all the data in the list
above are suitable for thermal interface. That is why I am trying to
balance between abstracted generic data and RAPL specific data while
still allowing linking between the two.

The way I envisioned a thermal/power management app would use it is:
1. go through generic thermal layer sysfs and find available RAPL
domains
2. if the app wants to do more fine grained control, it follows the
device symlink to locate the RAPL domain specific sysfs area.
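
The two steps above can be sketched from the application side as
follows. This is only an illustration under the layout shown in this
mail (cooling_deviceN/device/<attribute>); rapl_attr_path() and
rapl_write_attr() are hypothetical names, not part of any library or of
the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define THERMAL_SYSFS "/sys/class/thermal"

/*
 * Build "<thermal>/cooling_device<idx>/device/<attr>" in buf.
 * Returns 0 on success, -1 if attr looks like a path or buf is too small.
 */
static int rapl_attr_path(char *buf, size_t len, int idx, const char *attr)
{
	int n;

	assert(buf != NULL && attr != NULL);

	/* An attribute name must stay inside the device directory. */
	if (strchr(attr, '/') != NULL)
		return -1;

	n = snprintf(buf, len, "%s/cooling_device%d/device/%s",
		     THERMAL_SYSFS, idx, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer to a RAPL specific attribute; 0 on success, -1 on error. */
static int rapl_write_attr(int idx, const char *attr, long value)
{
	char path[256];
	FILE *f;
	int ok;

	if (rapl_attr_path(path, sizeof(path), idx, attr) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%ld\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

rapl_write_attr(15, "time_window1", 1000) would then correspond to the
"echo 1000 > time_window1" example, provided cooling_device15 is a RAPL
domain on the running system.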

>  Why create a one-off API that will
> never be used again and require userspace programs to be written just
> to handle this one type of device?
> 
why is that a one-off API? RAPL interface is maintained identical across
Intel CPUs after Sandy Bridge. I agree with you that it is still one
type of device with some of its data unique. Should I create a RAPL
class device?

> > As you mentioned about using device tree vs. fs, and how kobject are
> > used for fs. I do have the need to go between a generic thermal
> > sysfs and the true device tree. This is the reason why I used
> > kobjects and link them between device tree and its thermal sysfs
> > representation.
> 
> I don't understand your leap to using kobjects.
> 
I use kobjects mainly for their symlinks, to allow userspace to locate
the 'true' device behind the generic thermal layer's cooling device.


> > e.g. a RAPL package cooling device linked with its platform device
> > kobj. (device is linked with rapl_domains/package, the line is too
> > long)
> > 
> > root@chromoly:/sys/class/thermal# ls -l cooling_device15/
> > total 0
> > -rw-r--r-- 1 root root 4096 Apr  2 15:03 cur_state
> > lrwxrwxrwx 1 root root    0 Apr  2 21:28 device
> > -> ../../../platform/intel_rapl/rapl_domains/package
> > -r--r--r-- 1 root root 4096 Apr  2 15:03 max_state
> > drwxr-xr-x 2 root root    0 Apr  2 21:28 power
> > lrwxrwxrwx 1 root root    0 Apr  2 15:03 subsystem
> > -> ../../../../class/thermal
> > -r--r--r-- 1 root root 4096 Apr  2 15:03 type
> > -rw-r--r-- 1 root root 4096 Apr  2 15:03 uevent
> 
> I still don't understand.  What are you adding here, the device
> symlink? Or something else?
> 
> > For userspace which is not satisfied with the simple use case of a
> > single knob for setting power limit, it can follow the link to find
> > the device tree entry. Then get access to the complete knobs,
> > including event notifications.
> 
> And what is in that device directory? 
 
the device directory contains the complete RAPL interface
representation. paste the example of package domain again.

root@chromoly:/sys/class/thermal/cooling_device15/device# grep . *
domain_n

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-03 Thread Pawel Moll

On Wed, 2013-04-03 at 18:29 +0100, John Stultz wrote:
> On 04/03/2013 10:19 AM, Pawel Moll wrote:
> > On Tue, 2013-04-02 at 17:19 +0100, John Stultz wrote:
> >> But if we're going to have to do
> >> this via a clockid, I'm going to want it to be done via a dynamic posix
> >> clockid, so its clear its tightly tied with perf and not considered a
> >> generic interface (and I can clearly point folks having problems to the
> >> perf maintainers ;).
> > Ok, so how about the code below?
> >
> > There are two distinct parts of the "solution":
> >
> > 1. The dynamic posix clock, as you suggested. Then one can get the perf
> > timestamp by doing:
> >
> > clock_fd = open("/dev/perf-clock", O_RDONLY);
> > clock_gettime(FD_TO_CLOCKID(clock_fd), &ts)
> >
> > 2. A sort-of-hack in the get_posix_clock() function making it possible
> > to do the same using the perf event file descriptor, eg.:
> >
> > fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
> > clock_gettime(FD_TO_CLOCKID(fd), &ts)
> 
> #2 makes my nose wrinkle. 

To make myself clear: I consider the code as it is a hack.

> Forgive me for being somewhat ignorant on the 
> perf interfaces, but why is the second portion necessary or beneficial?

My thinking: the perf syscall returns a file descriptor already, so it
would make sense to re-use it in the clock_gettime() call instead of
jumping through hoops to open a character device file, which may not
exist at all (eg. no udev) or may be placed or named in a random way
(eg. some local udev rule).

I'm open for different opinions :-)

Pawel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/9] ia64: cpufreq: move cpufreq driver to drivers/cpufreq

2013-04-03 Thread Tony Luck

On Mon, Apr 1, 2013 at 5:49 PM, Viresh Kumar  wrote:
> For now, your Ack will work :)

Acked-by: Tony Luck 

Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low

2013-04-03 Thread Yinghai Lu

On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu  wrote:
> On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal  wrote:
>
>> - implement crashkernel_no_auto_low option to opt out of auto reserved
>>   low memory
>
> No, that is ugly.
...
>
> It's *you* who wants me to change "Crash kernel low" to "Crash kernel".
>
> Do we need to drop second patch? So will still keep
> "Crash kernel low" in /proc/iomem?

Also, we can drop the last patch and keep "crashkernel_high=" and
"crashkernel_low=",

since you would even like to introduce "crashkernel_no_auto_low".

Thanks

Yinghai

Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

2013-04-03 Thread Borislav Petkov

On Wed, Apr 03, 2013 at 12:17:38PM -0500, Jacob Shin wrote:
> Writing 0 to powersave_bias or unloading the AMD driver could do that.
>
> When the AMD driver loads, it will give a sane default value to
> powersave_bias to enable it, when it unloads, it will put it back to
> 0.

... and on systems without hw feedback, it will keep powersave_bias to 0
by default, retaining the old behavior.

Yeah, sounds like a plan.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-03 Thread John Stultz


On 04/03/2013 10:19 AM, Pawel Moll wrote:
> On Tue, 2013-04-02 at 17:19 +0100, John Stultz wrote:
>> But if we're going to have to do
>> this via a clockid, I'm going to want it to be done via a dynamic posix
>> clockid, so its clear its tightly tied with perf and not considered a
>> generic interface (and I can clearly point folks having problems to the
>> perf maintainers ;).
> Ok, so how about the code below?
>
> There are two distinct parts of the "solution":
>
> 1. The dynamic posix clock, as you suggested. Then one can get the perf
> timestamp by doing:
>
> clock_fd = open("/dev/perf-clock", O_RDONLY);
> clock_gettime(FD_TO_CLOCKID(clock_fd), &ts)
>
> 2. A sort-of-hack in the get_posix_clock() function making it possible
> to do the same using the perf event file descriptor, eg.:
>
> fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
> clock_gettime(FD_TO_CLOCKID(fd), &ts)


#2 makes my nose wrinkle. Forgive me for being somewhat ignorant on the 
perf interfaces, but why is the second portion necessary or beneficial?


thanks
-john


Re: [PATCH v3 00/11] usb: dwc3/xhci/phy: Enable runtime power management

2013-04-03 Thread Sarah Sharp

Question: Do you still need this patch for 3.10?

http://marc.info/?l=linux-usb&m=136057666911621&w=2

Does this patchset build on top of that?

I'm really behind on my patches for 3.10, sorry.

Sarah Sharp

On Mon, Apr 01, 2013 at 07:23:59PM +0530, Vivek Gautam wrote:
> This patch-series enables runtime power management on xhci-plat,
> dwc3-core, dwc3-exynos as well as on Samsung's USB 2.0 type and
> USB 3.0 type PHYs.
> 
> Based on 'next' branch of Felipe Balbi's USB tree.
> 
> Changes from v2:
>  - Using separate functions for USB PHY runtime power management, instead of
>    using macros.
>  - Adding 'pm_runtime_set_suspended()' api call in dwc3 core layer before
>    enabling runtime pm. (Ideally, we should explicitly make the device
>    'suspended' or 'active' before enabling runtime pm on it).
>  - Checking return code for 'put_sync' and 'get_sync' of USB-PHYs when
>    waking up or suspending them from dwc3 core's runtime_pm callbacks.
>  - Removed buggy pm_runtime_put() calls from driver's (xhci, dwc3 and PHY)
>    remove functions.
>  - Adding a patch to enable runtime power management of Samsung's USB 2.0 PHY
>    (usb: phy: samsung: Enable runtime power management on usb2phy)
> 
> Changes from v1:
>  - Adding required PHY APIs to handle runtime power management
>    instead of directly twiddling with phy->dev.
>  - Handling runtime power management of usb PHYs in dwc3 core
>    driver instead of in any glue layer.
>  - Splitting the patch:
>    [PATCH 4/4] usb: phy: samsung: Enable runtime power management on samsung-usb
>    into required number to bifurcate functionality.
> 
> Vivek Gautam (11):
>   usb: phy: Add APIs for runtime power management
>   USB: dwc3: Adjust runtime pm to allow autosuspend
>   usb: dwc3: Enable runtime pm only after PHYs are initialized
>   usb: dwc3: Add runtime power management callbacks
>   usb: dwc3: exynos: Enable runtime power management
>   usb: xhci: Enable runtime pm in xhci-plat
>   usb: phy: samsung: Enable runtime power management on usb2phy
>   usb: phy: samsung: Enable runtime power management on usb3phy
>   usb: phy: samsung: Add support for external reference clock
>   usb: phy: samsung: Add support for PHY ref_clk gpio
>   usb: phy: samsung: Add support for PHY refclk switching
> 
>  drivers/usb/dwc3/core.c            |   59 ++--
>  drivers/usb/dwc3/dwc3-exynos.c     |   12 +++
>  drivers/usb/host/xhci-plat.c       |    6 ++
>  drivers/usb/phy/phy-samsung-usb.c  |   26 +++
>  drivers/usb/phy/phy-samsung-usb.h  |    1 +
>  drivers/usb/phy/phy-samsung-usb2.c |    5 ++
>  drivers/usb/phy/phy-samsung-usb3.c |  119 +--
>  include/linux/usb/phy.h            |  141 
>  8 files changed, 358 insertions(+), 11 deletions(-)
> 
> -- 
> 1.7.6.5
> 

Re: [RFC PATCH 2/3] perf: Add persistent event facilities

2013-04-03 Thread Borislav Petkov

On Thu, Mar 28, 2013 at 07:15:16PM +0100, Robert Richter wrote:
> The mmap'ed region is already allocated by the kernel. How does a user
> know the buffer size of the mmap'ed region?

Right, so normal perf events get a buffer allocated at mmap time. The
size of that buffer is determined from the size of the vma which, in
turn, gets determined by perf (there's a default of 512K there).

Now, currently the size of the percpu buffers of a persistent event is
determined by the caller and this needs to be somehow better controlled
like limit them to a max size even when allocated from within the
kernel. Also maybe use perf's default when allocating the event from
perf. We can make them of a default size and keep them that way, which
would mean, perf would need to know about this. And there's the question
whether some persistent buffers would actually generate more samples and
need bigger buffers. Hmm...

We probably need to discuss this more though...

> Also, I wouldn't make too much use of -EINVAL, this should only be
> used if the syscall contains *wrong* data.

Ok.

> event must be set to an error code here.
>
> Better swap order of rb_alloc() and
> perf_event_create_kernel_counter(). Makes things easier.

Hehe, already done. I told you not to review that version of the
patches, remember? :-)

> > +static void rm_persistent_event(int cpu, struct perf_event_attr *attr)
>
> Would rather prefer del_... as this is actually used for deleting
> events in perf.

Done.

> > +   list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) {
> > +
> > +   if (desc->attr->config != attr->config)
> > +   continue;
> 
> Umm, the attr->config is not sufficient as a selector since it must be
> unique, which is not guaranteed (of course it works for one event only).

Right, so the tracepoints are enumerated by tracing code at boot
time. But not the hw events, for example. How can we select events
unambiguously?

> > +unwind:
> > +   while (--i >= 0)
> > +   rm_persistent_event(i, attr);
> > +
> > +   return -EINVAL;
> 
> Should return the actual error.

Done.
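
For reference, the unwind pattern with the actual error propagated can
be sketched in plain userspace C as below. add_event(), del_event() and
the fail_on knob are hypothetical stand-ins for the per-cpu persistent
event helpers; only the control flow is the point.

```c
#include <assert.h>
#include <errno.h>

#define NR_CPUS_DEMO 4

static int active[NR_CPUS_DEMO];	/* 1 if the demo event exists on that cpu */
static int fail_on = -1;		/* cpu on which add_event() should fail */

/* Stand-in for adding a per-cpu event; fails with -ENOMEM on demand. */
static int add_event(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	if (cpu == fail_on)
		return -ENOMEM;
	active[cpu] = 1;
	return 0;
}

/* Stand-in for deleting a per-cpu event. */
static void del_event(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
	active[cpu] = 0;
}

/* Add the event on every cpu; on failure undo earlier cpus and return the real error. */
static int add_events_all(void)
{
	int i, err = 0;

	for (i = 0; i < NR_CPUS_DEMO; i++) {
		err = add_event(i);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* i is the cpu that failed; only cpus below it need cleanup. */
	while (--i >= 0)
		del_event(i);

	return err;
}
```

The caller then sees -ENOMEM (or whatever the failing step returned)
instead of a blanket -EINVAL.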

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH] pinctrl/nomadik: add device tree bindings for db8540

2013-04-03 Thread Linus Walleij

From: Gabriel Fernandez 

This adds a simple device tree binding for db8540 boards.

Signed-off-by: Gabriel Fernandez 
Reviewed-by: Philippe Langlais 
Signed-off-by: Linus Walleij 
---
 drivers/pinctrl/pinctrl-nomadik.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pinctrl/pinctrl-nomadik.c b/drivers/pinctrl/pinctrl-nomadik.c
index 267d81c..435bf30 100644
--- a/drivers/pinctrl/pinctrl-nomadik.c
+++ b/drivers/pinctrl/pinctrl-nomadik.c
@@ -2111,6 +2111,10 @@ static const struct of_device_id nmk_pinctrl_match[] = {
 		.compatible = "stericsson,nmk-pinctrl",
 		.data = (void *)PINCTRL_NMK_DB8500,
 	},
+	{
+		.compatible = "stericsson,nmk-pinctrl-db8540",
+		.data = (void *)PINCTRL_NMK_DB8540,
+	},
 	{},
 };
 
-- 
1.7.11.3


Re: [PATCH] mm, x86: Do not zero hugetlbfs pages at boot. -v2

2013-04-03 Thread Robin Holt

On Wed, Apr 03, 2013 at 04:00:49PM +0200, Michal Hocko wrote:
> On Tue 02-04-13 21:43:44, Robin Holt wrote:
> [...]
> > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > index 2b0bcb0..b2e4027 100644
> > --- a/mm/bootmem.c
> > +++ b/mm/bootmem.c
> > @@ -705,12 +705,16 @@ void * __init __alloc_bootmem(unsigned long size, unsigned long align,
> >  
> >  void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
> >  				unsigned long size, unsigned long align,
> > -				unsigned long goal, unsigned long limit)
> > +				unsigned long goal, unsigned long limit,
> > +				int zeroed)
> >  {
> >  	void *ptr;
> >  
> >  	if (WARN_ON_ONCE(slab_is_available()))
> > -		return kzalloc(size, GFP_NOWAIT);
> > +		if (zeroed)
> > +			return kzalloc(size, GFP_NOWAIT);
> > +		else
> > +			return kmalloc(size, GFP_NOWAIT);
> >  again:
> >  
> >  	/* do not panic in alloc_bootmem_bdata() */
> 
> You need to update alloc_bootmem_bdata and alloc_bootmem_core as well.
> Otherwise this is a no-op for early allocations when slab is not
> available which is the case unless something is broken.

Michal,

Does this do what you would expect?  I compiled this for ia64, but I
have not tested it at all.

Robin

---
 mm/bootmem.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index b2e4027..350e0ab 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -497,7 +497,8 @@ static unsigned long __init align_off(struct bootmem_data *bdata,
 
 static void * __init alloc_bootmem_bdata(struct bootmem_data *bdata,
 					unsigned long size, unsigned long align,
-					unsigned long goal, unsigned long limit)
+					unsigned long goal, unsigned long limit,
+					int zeroed)
 {
 	unsigned long fallback = 0;
 	unsigned long min, max, start, sidx, midx, step;
@@ -584,7 +585,8 @@ find_block:
 
 		region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) +
 				start_off);
-		memset(region, 0, size);
+		if (zeroed)
+			memset(region, 0, size);
 		/*
 		 * The min_count is set to 0 so that bootmem allocated blocks
 		 * are never reported as leaks.
@@ -605,13 +607,18 @@ find_block:
 static void * __init alloc_bootmem_core(unsigned long size,
 					unsigned long align,
 					unsigned long goal,
-					unsigned long limit)
+					unsigned long limit,
+					int zeroed)
 {
 	bootmem_data_t *bdata;
 	void *region;
 
-	if (WARN_ON_ONCE(slab_is_available()))
-		return kzalloc(size, GFP_NOWAIT);
+	if (WARN_ON_ONCE(slab_is_available())) {
+		if (zeroed)
+			return kzalloc(size, GFP_NOWAIT);
+		else
+			return kmalloc(size, GFP_NOWAIT);
+	}
 
 	list_for_each_entry(bdata, &bdata_list, list) {
 		if (goal && bdata->node_low_pfn <= PFN_DOWN(goal))
@@ -619,7 +626,7 @@ static void * __init alloc_bootmem_core(unsigned long size,
 		if (limit && bdata->node_min_pfn >= PFN_DOWN(limit))
 			break;
 
-		region = alloc_bootmem_bdata(bdata, size, align, goal, limit);
+		region = alloc_bootmem_bdata(bdata, size, align, goal, limit, zeroed);
 		if (region)
 			return region;
 	}
@@ -635,7 +642,7 @@ static void * __init ___alloc_bootmem_nopanic(unsigned long size,
 	void *ptr;
 
 restart:
-	ptr = alloc_bootmem_core(size, align, goal, limit);
+	ptr = alloc_bootmem_core(size, align, goal, limit, 1);
 	if (ptr)
 		return ptr;
 	if (goal) {
@@ -710,22 +717,23 @@ void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 {
 	void *ptr;
 
-	if (WARN_ON_ONCE(slab_is_available()))
+	if (WARN_ON_ONCE(slab_is_available())) {
 		if (zeroed)
 			return kzalloc(size, GFP_NOWAIT);
 		else
 			return kmalloc(size, GFP_NOWAIT);
+	}
 again:
 
 	/* do not panic in alloc_bootmem_bdata() */
 	if (limit && goal + size > limit)
 		limit = 0;
 
-	ptr = alloc_bootmem_bdata(pgdat->bdata, size, align, goal, limit);
+	ptr = alloc_bootmem_bdata(pgdat->bdata, size, align, goal, limit, zeroed);
 	if (ptr)
 		return ptr;
 
-	ptr = alloc_bootmem_core(size, align, goal, limit);
+   ptr = alloc_bootmem_core(size, align, goal, limit, z

Re: [PATCH 0/2] These two patches to s3c_pm_arch_prepare_irqs() were part of the work

2013-04-03 Thread Doug Anderson

Kukjin,

On Tue, Apr 2, 2013 at 7:16 PM, Kukjin Kim  wrote:
>> Applied with 1st one, BTW, do you want to send this for stable tree?

I don't have any need for it to be in stable tree.  The ARM Chromebook
hasn't reached critical functionality on any released/upstream Linux
versions so it doesn't make much sense to backport fixes.  If someone
else wants it in stable (and can confirm that it helps them) then I
certainly wouldn't object!

> One more note: just now I discussed the ENABLE_WAKEUP_SW bit with
> Jaecheol Lee. As the patch fixed, it should be cleared, but it used to
> be set in s3c_irqwake_intmask. Let me check again; if there are any
> updates I'll let you know.

OK, thanks.  If there is a reason that ENABLE_WAKEUP_SW needs to be
set then it would be good to understand that case.  :)

-Doug

Re: [PATCH 2/2] efi: Distinguish between "remaining space" and actually used space

2013-04-03 Thread Matthew Garrett

On Wed, 2013-04-03 at 18:12 +0100, Matt Fleming wrote:

> The solution you're proposing has the same downsides as 3) - we risk
> having to tweak things either way. The difference is that in the case of
> 3) the tweaking is adding entries to the whitelist, whereas tweaking
> your solution has more chance of introducing further unwanted
> regressions because you'd be tweaking an algorithm, an algorithm that
> relies on the internal implementation of the variable storage code.

We *risk* having to tweak things, and we fail on the side of safety. 

> > Dealing with firmware is hard. This fixes (1) without leaving us with
> > (2), which seems like a net win.
> 
> I'm not convinced that implementing 3) would inevitably lead to 2),
> provided that we apply a bit of common sense when adding entries. I'm
> not advocating some kind of flag day where we add umpteen machines to
> the whitelist.
> 
> For reference, I just pushed two patches to the 'whitelist' branch at,
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
> 
> which should hopefully illustrate the kind of thing that I'm talking about.

I don't think that works. People are complaining that we broke some
Thinkpads as well, but we also have reports that Thinkpads can be
bricked if we use too much space.

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2013-04-03 Thread Pawel Moll

On Tue, 2013-04-02 at 17:19 +0100, John Stultz wrote:
> But if we're going to have to do 
> this via a clockid, I'm going to want it to be done via a dynamic posix 
> clockid, so its clear its tightly tied with perf and not considered a 
> generic interface (and I can clearly point folks having problems to the 
> perf maintainers ;).

Ok, so how about the code below?

There are two distinct parts of the "solution":

1. The dynamic posix clock, as you suggested. Then one can get the perf
timestamp by doing:

clock_fd = open("/dev/perf-clock", O_RDONLY);
clock_gettime(FD_TO_CLOCKID(clock_fd), &ts) 

2. A sort-of-hack in the get_posix_clock() function making it possible
to do the same using the perf event file descriptor, eg.:

fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
clock_gettime(FD_TO_CLOCKID(fd), &ts) 
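
For anyone trying this from userspace, the fd <-> clockid conversion
used in both snippets can be sketched as below. The helper names are
illustrative only, the encoding is assumed to match the kernel's
FD_TO_CLOCKID()/CLOCKID_TO_FD() macros (inverted descriptor shifted left
by three, low bits set to CLOCKFD), and /dev/perf-clock exists only with
the patch below applied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define CLOCKFD 3	/* low three bits mark a descriptor-based clock id */

/* Encode a file descriptor as a dynamic posix clock id. */
static clockid_t fd_to_clockid(int fd)
{
	assert(fd >= 0);
	/* Unsigned arithmetic avoids left-shifting a negative value. */
	return (clockid_t)((~(unsigned int)fd << 3) | CLOCKFD);
}

/* Recover the file descriptor from a dynamic posix clock id. */
static int clockid_to_fd(clockid_t clk)
{
	/* Relies on arithmetic right shift of negative values (GCC, Clang). */
	return (int)~(clk >> 3);
}

/* Read the clock behind fd; returns 0 on success, -1 with errno set otherwise. */
static int read_fd_clock(int fd, struct timespec *ts)
{
	return clock_gettime(fd_to_clockid(fd), ts);
}
```

With part 1, fd would come from open("/dev/perf-clock", O_RDONLY); with
part 2, from the perf_event_open syscall. On a kernel without this
patch read_fd_clock() is expected to fail for both.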

Any (either strong or not) opinions?

Pawel

8<--
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e47ee46..b2127e3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -52,6 +52,7 @@ struct perf_guest_info_callbacks {
 #include 
 #include 
 #include 
+#include 
 #include 
 
 struct perf_callchain_entry {
@@ -845,4 +846,6 @@ _name##_show(struct device *dev,
\
\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+struct posix_clock *perf_get_posix_clock(struct file *fp);
+
 #endif /* _LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b0cd865..534cb43 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7446,6 +7446,49 @@ unlock:
 }
 device_initcall(perf_event_sysfs_init);
 
+static int perf_posix_clock_getres(struct posix_clock *pc, struct timespec *tp)
+{
+	*tp = ns_to_timespec(TICK_NSEC);
+	return 0;
+}
+
+static int perf_posix_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+{
+	*tp = ns_to_timespec(perf_clock());
+	return 0;
+}
+
+static const struct posix_clock_operations perf_posix_clock_ops = {
+	.clock_getres = perf_posix_clock_getres,
+	.clock_gettime = perf_posix_clock_gettime,
+};
+
+static struct posix_clock perf_posix_clock;
+
+struct posix_clock *perf_get_posix_clock(struct file *fp)
+{
+	if (!fp || fp->f_op != &perf_fops)
+		return NULL;
+
+	down_read(&perf_posix_clock.rwsem);
+
+	return &perf_posix_clock;
+}
+
+static int __init perf_posix_clock_init(void)
+{
+	dev_t devt;
+	int ret;
+
+	ret = alloc_chrdev_region(&devt, 0, 1, "perf-clock");
+	if (ret)
+		return ret;
+
+	perf_posix_clock.ops = perf_posix_clock_ops;
+	return posix_clock_register(&perf_posix_clock, devt);
+}
+device_initcall(perf_posix_clock_init);
+
 #ifdef CONFIG_CGROUP_PERF
 static struct cgroup_subsys_state *perf_cgroup_css_alloc(struct cgroup *cont)
 {
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e2a40a5 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -249,16 +250,21 @@ struct posix_clock_desc {
 static int get_clock_desc(const clockid_t id, struct posix_clock_desc *cd)
 {
 	struct file *fp = fget(CLOCKID_TO_FD(id));
+	struct posix_clock *perf_clk = NULL;
 	int err = -EINVAL;
 
 	if (!fp)
 		return err;
 
-	if (fp->f_op->open != posix_clock_open || !fp->private_data)
+#if defined(CONFIG_PERF_EVENTS)
+	perf_clk = perf_get_posix_clock(fp);
+#endif
+	if ((fp->f_op->open != posix_clock_open || !fp->private_data) &&
+	    !perf_clk)
 		goto out;
 
 	cd->fp = fp;
-	cd->clk = get_posix_clock(fp);
+	cd->clk = perf_clk ? perf_clk : get_posix_clock(fp);
 
 	err = cd->clk ? 0 : -ENODEV;
 out:




Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

2013-04-03 Thread Jacob Shin

On Wed, Apr 03, 2013 at 07:04:56PM +0200, Borislav Petkov wrote:
> On Wed, Apr 03, 2013 at 11:53:24AM -0500, Jacob Shin wrote:
> > Then Thomas, Boris, would it be acceptable if we enable the frequency
> > feedback feature by default with a sane powersave_bias tunable value?
> > And also add proper documentation for both vanilla powersave_bias
> > and powersave_bias with AMD frequency sensitivity loaded to
> > Documentation/cpu-freq/ondemand ?
> 
> Yeah, this was what I was proposing, basically. The only question here
> is, would anyone want to disable freq decisions on systems with hw
> feedback? If yes, then you'd need to be able to disable the feedback
> thing, maybe have a magic value for powersave_bias...

Writing 0 to powersave_bias or unloading the AMD driver could do that.

When the AMD driver loads, it will give a sane default value to
powersave_bias to enable it; when it unloads, it will put it back to 0.

> 
> Thanks.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.


Re: [PATCH] pinctrl: tegra: add suspend-resume support

2013-04-03 Thread Stephen Warren

On 04/03/2013 08:09 AM, Linus Walleij wrote:
> On Thu, Mar 28, 2013 at 6:48 PM, Stephen Warren  wrote:
> 
>> Why can't we just use the device suspend/resume functions rather than
>> global (syscore) suspend/resume functions? Presumably this is to ensure
>> that all other drivers suspend first, then the pinctrl driver does, and
>> the reverse for resume. Can't we rely on deferred probe to ensure that
>> instead?
>>
>> To make that work, we might need every affected driver to define a dummy
>> pinmux state in DT that references the pinctrl driver, to make sure they
>> all get probed after the pinctrl driver.
> 
> Hm that reminds me of that policy change I suggested a while back to
> do this instead of using hogs where possible.
> 
> It has the nice side-effect that when we inspect the debugfs info
> all pins will be properly owned by respective consuming device.

True, in theory that would also work.

However, in practice with Tegra's pinmux, it has to all be set up at
once to avoid any conflicts, so hogging is really the only practical way
to use it in most cases.

This is because in many cases, a single controller could have its
signals routed out to many different pins (or sets of pins), rather than
just having one possible location where each controller could be routed
to. In other words, the pinmux is m:n rather than m:1.

It's possible to program the registers so that the same signal is connected
to (or from depending on signal direction) multiple pins at once. If
this is done, the behaviour is unspecified; who knows which pin will
actually receive (or provide) that signal?

This can easily happen if the whole pinmux is not initialized fully in
one pass, i.e. through hogs. For example, the HW default may be for
UART1 to get routed to pins A, B, whereas a particular board may
assume that UART1 is routed to pins X, Y. So, SW must program pins X, Y
to mux in UART1. However, if pins A, B aren't also re-programmed away
from the UART1 option, then UART1 on X, Y may not actually work. In
this case, we can't rely on some other driver having
acquired/re-programmed pins A, B, unless it's the hog of the pin
controller itself. Hence, the only sensible solution is for the pin
controller to hog absolutely everything.
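
To make the A/B versus X/Y conflict concrete, here is a toy model in plain
C. It is only a sketch, not Tegra code: the pin names, the function names,
and every helper below are made up for illustration. It treats the mux as a
per-pin function selection and flags any function selected on more than one
pin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mux functions; real Tegra function names differ. */
enum mux_func { FUNC_RSVD, FUNC_UART1_TX, FUNC_UART1_RX };

/* Hypothetical pins, named after the A/B/X/Y example in the text. */
enum pin { PIN_A, PIN_B, PIN_X, PIN_Y, PIN_COUNT };

/* Count how many pins currently select function f. */
size_t pins_selecting(const enum mux_func *mux, enum mux_func f)
{
	size_t n = 0;

	assert(mux != NULL);
	for (size_t i = 0; i < PIN_COUNT; i++)
		if (mux[i] == f)
			n++;
	return n;
}

/* A signal selected on more than one pin is the "unspecified" case. */
int mux_has_conflict(const enum mux_func *mux)
{
	return pins_selecting(mux, FUNC_UART1_TX) > 1 ||
	       pins_selecting(mux, FUNC_UART1_RX) > 1;
}

/* Assumed HW reset state: UART1 routed to pins A, B. */
void mux_hw_default(enum mux_func *mux)
{
	mux[PIN_A] = FUNC_UART1_TX;
	mux[PIN_B] = FUNC_UART1_RX;
	mux[PIN_X] = FUNC_RSVD;
	mux[PIN_Y] = FUNC_RSVD;
}

/* A consumer that only programs its own pins X, Y and leaves A, B alone. */
void mux_claim_xy_only(enum mux_func *mux)
{
	mux[PIN_X] = FUNC_UART1_TX;
	mux[PIN_Y] = FUNC_UART1_RX;
}

/* A hog that programs every pin in one pass, moving A, B off UART1. */
void mux_hog_all(enum mux_func *mux)
{
	mux[PIN_A] = FUNC_RSVD;
	mux[PIN_B] = FUNC_RSVD;
	mux[PIN_X] = FUNC_UART1_TX;
	mux[PIN_Y] = FUNC_UART1_RX;
}
```

Starting from mux_hw_default(), calling mux_claim_xy_only() leaves UART1
selected on both A/B and X/Y, so mux_has_conflict() reports a conflict;
mux_hog_all() clears it because every pin is written in the same pass.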

The only exception would be for dynamic pin-muxing (e.g. pinctrl-based
I2C muxing), where hopefully everything is chosen carefully to avoid
this kind of issue.

Re: [PATCH 3/4] x86, kdump: Retore crashkernel= to allocate low

2013-04-03 Thread Yinghai Lu

On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal  wrote:
> On Tue, Apr 02, 2013 at 01:36:02PM -0700, Yinghai Lu wrote:
>
> [..]
>> > You are just describing what your code does. There is no theme or
>> > justification behind this behavior. There is no uniformity. A user can
>> > question that so far you used to honor last crashkernel= parameter and
>> > suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
>> > overriding crashkernel=X and it is not obivious why.
>>
>> Let me repeat again:
>> we keep crashkernel=X old behavior with old kexec-tools.
>> crashkernel=X;high is for new kexec-tools that support loading high.
>>
>> If the user want to use ,high but still with old kexec-tools, that is
>> not going to work.
>>
>> Can we just keep it separated?
>
> Kernel does not know about old kexec-tools or new kexec-tools. Neither
> kernel can enforce what command line options are passed by user. So
> kernel needs to define a clean interface here which is easily understood
> and is extensible also in future.

Looks like you are chasing the wrong direction.

Those four patches fix the regression that Wang and you reported.
Users don't need to change their kexec-tools or boot command lines;
kdump still works.

We can never stop users from doing crazy things with their systems.

>
> [..]
>> >
>> > If user wants 128M in low memory, they will just specify
>> > crashkernel=128M;low
>>
>> in the kernel-parameter.txt, already says ;low is need to used with ;high.
>
> But why are we tying ;low to ;high. One should be easily extend
> crashkernel=X to be able to reserve memory above 4G if specified amount
> is not available below 4G. In that case also one might want to reserve
> some low memory?

I want to keep crashkernel=X at the old behavior.

If you want crashkernel=X to allocate high, above 4G, old kexec-tools
will not work with the new kernel.

>
> For that matter crashkernel=range1:size,range2:size syntax should be
> extendible too to reserve memory above 4G if desired size of memory
> is not available in low memory.
>
> Now in those cases too, one would like to have 72M of low memory
> reserved. So ;low shoud not be tied to ;high necessarily.
>
> In fact current code does not care whetehr ;high was specified or not.
> If memory is reserved above 4G, ;low code will kick in.

No, that is not right.

Only when ;high is specified will the kernel try to allocate high, above 4G.


>
>>
>> >
>> > If they want to control multiple ranges of memory, then that's the feature
>> > we currently don't support. Currently we support only reserving one range
>> > of memory.
>> >
>> > If you want to support multiple ranges of memory,then do it properly.
>> > Parse all crashkernel= options, prepare a list of memory to be reserved
>> > and unreserved, resolve all the conflicts between various options and
>> > then reserve the memory. But that does not seem to be a requirement at
>> > this point of time.
>>
>> No we does not support multiple ranges, as it will need more changes
>> in kexec-tools.
>>
>> Can we stop here with those four patches?
>>
>> Later, we can extend it if it is really needed.
>
> crashkernel= options are already confusing. I think we with this patchset
> we will just make them even more confusing and future extensions
> difficult.

So keep crashkernel= without high and low at the old behavior.

>
> We really need to stick to the notion of only one crashkernel= option
> is accepted and that is last one on command line. And if need be,
> we need to work on multi range reservation feature where we process
> and reserve ranges as specified by all crashkernel= parameters on
> command line.

That is kept,

and only the last high is honored.

>
> Creating new combinations where some crashkernel= are preferred over
> others and some crashkernel= options work with only selected crashkernel=
> options, is asking for trouble, especially keeping in mind future
> extensions.

I don't think so.

Old configs that worked before still use crashkernel= without high and low.
Old configs that did not work can switch to crashkernel=;high/low
with new kexec-tools.

>
> I prefer following for 3.9.
>
> - process only right most crashkernel= option.

What is "right most"?
That only the last crashkernel=X is honored?
I restored that already with those four patches.
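
For readers following along, "only the last crashkernel= is honored" can be
sketched in plain C roughly as below. This is an illustration only, not the
kernel's parser: last_crashkernel_value() is a hypothetical helper, and it
assumes options are separated by single spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the last "crashkernel=" option in
 * cmdline, or NULL if there is none. The returned pointer runs to the end
 * of the command line; the caller stops at the next space.
 */
const char *last_crashkernel_value(const char *cmdline)
{
	static const char key[] = "crashkernel=";
	const size_t key_len = sizeof(key) - 1;
	const char *last = NULL;
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		/* Accept only matches that start a space-separated token. */
		if (p == cmdline || p[-1] == ' ')
			last = p + key_len;
		p += key_len;
	}
	return last;
}
```

Every earlier occurrence is simply overwritten by a later one, which is the
"right most wins" rule discussed above.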

> - implement crashkernel_no_auto_low option to opt out of auto reserved
>   low memory

No, that is ugly.

> - implement crashkernel=X;high to support high memory reservations.
>
> And now old kexec-tools user can use crashkernel=X while users needing
> high memory reservation can use crashkernel=X;high.

Don't the four patches already do that?

>
> If you really want to support user defined crashkernel=X;low along with
> crashkernel=Y;high, that is really a multi range reservation feature and
> need to be implemented properly instead of coming up with short cuts.

No it is not.

It's *you* who wants me to change "Crash kernel low" to "Crash kernel".

Do we need to drop second patch? So will still keep
"Crash

Re: [PATCH 2/2] efi: Distinguish between "remaining space" and actually used space

2013-04-03 Thread Matt Fleming

On 03/04/13 14:48, Matthew Garrett wrote:
> On Wed, 2013-04-03 at 14:11 +0100, Matt Fleming wrote:
> 
>> This looks like something that will differ between implementations, and the
>> fact that it's appearing in our code is a sure sign that this isn't the way
>> to go.
> 
> Our choices right now are:
> 
> 1) Break machines that don't garbage collect on every reboot
> 2) Leave Samsungs (and some Lenovos?) vulnerable to bricking
> 3) Maintain a whitelist or blacklist that will inevitably be inadequate,
> either breaking otherwise working machines or risking bricking of broken
> ones
> 4) Attempt to implement something that'll work in all cases

The solution you're proposing has the same downsides as 3) - we risk
having to tweak things either way. The difference is that in the case of
3) the tweaking is adding entries to the whitelist, whereas tweaking
your solution has more chance of introducing further unwanted
regressions because you'd be tweaking an algorithm, an algorithm that
relies on the internal implementation of the variable storage code.

> Dealing with firmware is hard. This fixes (1) without leaving us with
> (2), which seems like a net win.

I'm not convinced that implementing 3) would inevitably lead to 2),
provided that we apply a bit of common sense when adding entries. I'm
not advocating some kind of flag day where we add umpteen machines to
the whitelist.

For reference, I just pushed two patches to the 'whitelist' branch at,

  git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git

which should hopefully illustrate the kind of thing that I'm talking about.

-- 
Matt Fleming, Intel Open Source Technology Center

Re: [PATCH 10/10] perf sort: Consolidate sort_entry__setup_elide()

2013-04-03 Thread Arnaldo Carvalho de Melo

Em Wed, Apr 03, 2013 at 09:26:19PM +0900, Namhyung Kim escreveu:
> From: Namhyung Kim 
> 
> The same code was duplicate to places, factor them out to common
> sort__setup_elide().

Looks ok, applying after fixing up fuzzes due to this being at the end
of the patch series. Clear cleanups like this are best positioned at the
start of the patch series.

- Arnaldo
 
> Signed-off-by: Namhyung Kim 
> ---
>  tools/perf/builtin-diff.c   |  4 +---
>  tools/perf/builtin-report.c | 20 +---
>  tools/perf/builtin-top.c|  4 +---
>  tools/perf/util/sort.c  | 45 +++--
>  tools/perf/util/sort.h  |  3 +--
>  5 files changed, 47 insertions(+), 29 deletions(-)
> 
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 03b56c542bb6..316bf13e59c7 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -611,9 +611,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
>  
>   setup_pager();
>  
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", NULL);
> - sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", NULL);
> - sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", NULL);
> + sort__setup_elide(NULL);
>  
>   return __cmd_diff();
>  }
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index c95fd92fbd44..bff244fa4b5d 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -937,25 +937,7 @@ repeat:
>   report.symbol_filter_str = argv[0];
>   }
>  
> - sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
> -
> - if (sort__mode == SORT_MODE__BRANCH) {
> - sort_entry__setup_elide(&sort_dso_from, symbol_conf.dso_from_list, "dso_from", stdout);
> - sort_entry__setup_elide(&sort_dso_to, symbol_conf.dso_to_list, "dso_to", stdout);
> - sort_entry__setup_elide(&sort_sym_from, symbol_conf.sym_from_list, "sym_from", stdout);
> - sort_entry__setup_elide(&sort_sym_to, symbol_conf.sym_to_list, "sym_to", stdout);
> - } else {
> - if (report.mem_mode) {
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "symbol_daddr", stdout);
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso_daddr", stdout);
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "mem", stdout);
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "local_weight", stdout);
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "tlb", stdout);
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "snoop", stdout);
> - }
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
> - sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
> - }
> + sort__setup_elide(stdout);
>  
>   ret = __cmd_report(&report);
>   if (ret == K_SWITCH_INPUT_DATA) {
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 4aa504baaf0b..fe4acf568483 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1201,9 +1201,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
>   if (symbol__init() < 0)
>   return -1;
>  
> - sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
> - sort_entry__setup_elide(&sort_comm, symbol_conf.comm_list, "comm", stdout);
> - sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
> + sort__setup_elide(stdout);
>  
>   get_term_dimensions(&top.winsize);
>   if (top.print_entries == 0) {
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 213831133e08..86ae94d8782e 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -1,5 +1,6 @@
>  #include "sort.h"
>  #include "hist.h"
> +#include "symbol.h"
>  
>  regex_t  parent_regex;
>  const char   default_parent_pattern[] = "^sys_|^do_page_fault";
> @@ -1085,8 +1086,9 @@ int setup_sorting(void)
>   return ret;
>  }
>  
> -void sort_entry__setup_elide(struct sort_entry *self, struct strlist *list,
> -  const char *list_name, FILE *fp)
> +static void sort_entry__setup_elide(struct sort_entry *self,
> + struct strlist *list,
> + const char *list_name, FILE *fp)
>  {
>   if (list && strlist__nr_entries(list) == 1) {
>   if (fp != NULL)
> @@ -1095,3 +1097,42 @@ void sort_entry__setup_elide(struct sort_entry *self, struct strlist *list,
>   self->elide = true;
>   }
>  }
> +
> +void sort__setup_elide(FILE *output)
>

Re: [PATCH 8/9] spark: cpufreq: move cpufreq driver to drivers/cpufreq

2013-04-03 Thread Viresh Kumar

On 3 April 2013 22:08, David Miller  wrote:
> From: Viresh Kumar 
> Date: Wed, 3 Apr 2013 14:59:44 +0530
>
>> On 1 April 2013 10:11, Viresh Kumar  wrote:
>>> On 31 March 2013 22:10, David Miller  wrote:
>>>>> On 26 March 2013 09:55, Viresh Kumar  wrote:
>>>>>> From: Viresh Kumar 
>>>>>> Date: Mon, 25 Mar 2013 11:20:23 +0530
>>>>>> Subject: [PATCH] sparc: cpufreq:  move cpufreq driver to drivers/cpufreq
>>>
>>>> Subject line still has the "spark" typo.
>>>
>>> Your mail was scary, really... HOW can i do it??
>>>
>>> And then i saw how you got it wrong. I haven't sent a new mail, so mails
>>> subject remains the same... I copied V2 in the same mail.. Check above,
>>> subject looks fine :)
>>
>> Hi David,
>>
>> I think all pending issues are fixed now... Can i have your Ack please?
>> Or maybe more comments :)
>
> Acked-by: David S. Miller 

Adding everybody else in cc.

Re: [ 34/77] xen/blkback: Dont trust the handle from the frontend.

2013-04-03 Thread Greg Kroah-Hartman

On Wed, Apr 03, 2013 at 12:38:28PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 03, 2013 at 09:01:06AM -0700, Greg Kroah-Hartman wrote:
> > On Wed, Apr 03, 2013 at 04:01:54PM +0200, William Dauchy wrote:
> > > > On Tue, Mar 12, 2013 at 11:10 PM, Greg Kroah-Hartman wrote:
> > > >> > >> IOW I don't see why this got proposed for stable at all.
> > > >> > >
> > > >> > > So, you suggest to just drop this patch for v3.8.3, don't you?
> > > >> >
> > > >> > I do, yes. But I'd suggest to get Konrad to agree.
> > > >>
> > > >> Yes. Lets drop it.
> > > >
> > > > Now reverted, thanks.
> > > 
> > > Seems like still present in 3.4.x branch. Is that a mistake?
> > 
> > It showed up in 3.4.35, if that's a mistake, and I should revert it,
> > please, someone let me know.
> 
> Yes. It is a mistake. Please revert it.

Now reverted, thanks.

greg k-h

Re: [PATCH 09/10] perf hist browser: Use sort__has_sym

2013-04-03 Thread Arnaldo Carvalho de Melo

Em Wed, Apr 03, 2013 at 09:26:18PM +0900, Namhyung Kim escreveu:
> From: Namhyung Kim 
> 
> The TUI hist browser had a similar variable has_symbols for the same
> purpose.  Let's get rid of the duplication.

I'm ok with that, if it involves removing sort__has_sym, which is a
global variable that makes it impossible to use different sort orders in
the same session, if we ever want to do that :-)

- Arnaldo
 
> Signed-off-by: Namhyung Kim 
> ---
>  tools/perf/ui/browsers/hists.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index cad8e37f05d9..a4268cab1921 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -25,7 +25,6 @@ struct hist_browser {
>   struct map_symbol   *selection;
>   int  print_seq;
>   bool show_dso;
> - bool has_symbols;
>  };
>  
>  extern void hist_browser__init_hpp(void);
> @@ -1155,10 +1154,6 @@ static struct hist_browser *hist_browser__new(struct hists *hists)
>   browser->b.refresh = hist_browser__refresh;
>   browser->b.seek = ui_browser__hists_seek;
>   browser->b.use_navkeypressed = true;
> - if (sort__mode == SORT_MODE__BRANCH)
> - browser->has_symbols = sort_sym_from.list.next != NULL;
> - else
> - browser->has_symbols = sort_sym.list.next != NULL;
>   }
>  
>   return browser;
> @@ -1386,7 +1381,7 @@ static int perf_evsel__hists_browse(struct perf_evsel *evsel, int nr_events,
>*/
>   goto out_free_stack;
>   case 'a':
> - if (!browser->has_symbols) {
> + if (!sort__has_sym) {
>   ui_browser__warning(&browser->b, delay_secs * 2,
>   "Annotation is only available for symbolic views, "
>   "include \"sym*\" in --sort to use it.");
> @@ -1485,7 +1480,7 @@ static int perf_evsel__hists_browse(struct perf_evsel *evsel, int nr_events,
>   continue;
>   }
>  
> - if (!browser->has_symbols)
> + if (!sort__has_sym)
>   goto add_exit_option;
>  
>   if (sort__mode == SORT_MODE__BRANCH) {
> -- 
> 1.7.11.7

Re: [PATCH 03/10] perf sort: Add 'addr' sort key

2013-04-03 Thread Arnaldo Carvalho de Melo

What I expected was that the result was this:

perf report --sort addr | grep -v ^# | sort -k2 -n | less

And in hexadecimal, can you fix this?
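
To show what hexadecimal output would look like, here is a small userspace
sketch. It is an assumption about one possible fix, not the change that was
applied: format_addr_hex() is a hypothetical stand-in that uses plain
snprintf() instead of perf's repsep_snprintf(), and it only swaps the
conversion in the quoted "%#*llu" format from 'u' to 'x' (the '#' flag is
defined for 'x', where it adds the "0x" prefix).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format addr as "0x"-prefixed hexadecimal, right-aligned in a field of at
 * least `width` characters. Returns whatever snprintf() returns.
 */
int format_addr_hex(char *bf, size_t size, int width, unsigned long long addr)
{
	assert(bf != NULL);
	return snprintf(bf, size, "%#*llx", width, addr);
}
```

With a width of 18, a 64-bit kernel text address fills the column exactly
("0x" plus 16 hex digits); shorter values are padded with leading spaces.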

- Arnaldo

Em Wed, Apr 03, 2013 at 09:26:12PM +0900, Namhyung Kim escreveu:
>  static void hists__set_unres_dso_col_len(struct hists *hists, int dso)
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index 14c2fe20aa62..9599f805828f 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -43,12 +43,13 @@ enum hist_column {
>   HISTC_COMM,
>   HISTC_PARENT,
>   HISTC_CPU,
> + HISTC_SRCLINE,

Why move SRCLINE?

> + HISTC_ADDR,
>   HISTC_MISPREDICT,
>   HISTC_SYMBOL_FROM,
>   HISTC_SYMBOL_TO,
>   HISTC_DSO_FROM,
>   HISTC_DSO_TO,
> - HISTC_SRCLINE,
>   HISTC_LOCAL_WEIGHT,
>   HISTC_GLOBAL_WEIGHT,
>   HISTC_MEM_DADDR_SYMBOL,
> diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> index 1dbf16949250..5640a95b3575 100644
> --- a/tools/perf/util/sort.c
> +++ b/tools/perf/util/sort.c
> @@ -342,6 +342,34 @@ struct sort_entry sort_cpu = {
>   .se_width_idx   = HISTC_CPU,
>  };
>  
> +/* --sort addr */
> +
> +static int64_t
> +sort__addr_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> + return right->ip - left->ip;
> +}
> +
> +static int hist_entry__addr_snprintf(struct hist_entry *self, char *bf,
> +  size_t size, unsigned int width)
> +{
> + struct map *map = self->ms.map;
> + u64 addr = self->ip;
> +
> + if (map)
> + addr = map->unmap_ip(map, self->ip);
> +
> + return repsep_snprintf(bf, size, "%#*llu", width, addr);
> +}
> +
> +struct sort_entry sort_addr = {
> + .se_header  = "Address",
> + .se_cmp = sort__addr_cmp,
> + .se_snprintf= hist_entry__addr_snprintf,
> + .se_width_idx   = HISTC_ADDR,
> +};
> +
> +
>  /* sort keys for branch stacks */
>  
>  static int64_t
> @@ -871,6 +899,7 @@ static struct sort_dimension common_sort_dimensions[] = {
>   DIM(SORT_PARENT, "parent", sort_parent),
>   DIM(SORT_CPU, "cpu", sort_cpu),
>   DIM(SORT_SRCLINE, "srcline", sort_srcline),
> + DIM(SORT_ADDR, "addr", sort_addr),
>  };
>  
>  #undef DIM
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index 0232d476da87..0815e344f38c 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -138,6 +138,7 @@ enum sort_type {
>   SORT_PARENT,
>   SORT_CPU,
>   SORT_SRCLINE,
> + SORT_ADDR,
>  
>   /* branch stack specific sort keys */
>   __SORT_BRANCH_STACK,
> -- 
> 1.7.11.7

Re: [PATCH 3/3] watchdog: Add Congatec CGEB watchdog driver

2013-04-03 Thread Sascha Hauer

On Wed, Apr 03, 2013 at 08:30:00AM -0700, Guenter Roeck wrote:
> On Wed, Apr 03, 2013 at 05:09:52PM +0200, Sascha Hauer wrote:
> [ ... ]
> 
> > > 
> > > On a side note, if the driver supports devicetree, it might make sense to
> > > call watchdog_init_timeout, since it initializes the timeout from
> > > devicetree data.
> > 
> > The driver does not support devicetree.
> > 
> I should have said "the system". Calling watchdog_init_timeout is the only
> thing a watchdog driver has to do to support devicetree.

The system does not use devicetrees. I'm very familiar with devicetrees
on ARM, but I have never seen an x86 system with devicetree support
(although I know they exist).

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-     |

Re: [RFC: PATCH 2/2] iio: adc: exynos_adc: Handle timeout and race conditions

2013-04-03 Thread Doug Anderson

Lars,

On Sat, Mar 16, 2013 at 7:41 AM, Lars-Peter Clausen  wrote:
> I think you still need the mutex for serialization, otherwise the requests
> would just cancel each other out. Btw. what happens if you start a conversion
> while another is still in progress? Is it possible to abort a conversion?

I was thinking that the spinlock would just replace the mutex for the
purposes of serialization.

I stepped back a bit, though, and I'm wondering if we're over-thinking
things.  The timeout case should certainly be handled properly (thanks
for pointing it out), but getting a timeout is really not expected and
adding a lot of extra overhead to handle it elegantly seems a bit
much?

Specifically, the mutex means that we have one user of the ADC at a
time, and ADC conversion has nothing variable about it.  The user
manual that I have access to talks about 12-bit conversion happening
in 1 microsecond with a 5MHz input clock or 5 microseconds with a 1MHz
input clock.  Even if someone has clocks configured very differently,
it would be hard to imagine a conversion actually taking a full
second.

...so that means that if the timeout actually fires then something
else fairly drastic has gone wrong.  It's _very_ unlikely that the IRQ
will still go off for this conversion sometime in the future.

To me, total modifications to what's landed already ought to be:

* Change timeout to long (from unsigned long)

* Make sure we return errors (negative results) from
wait_for_completion_interruptible_timeout() properly.

* If we get back a value of 0 from
wait_for_completion_interruptible_timeout() then we should print a
warning and attempt machinations to reset the ADC.  Without ever
seeing real-world situations that would cause a real timeout these
machinations would be a bit of a guess (is resetting the ADC useful
when it's more likely that someone accidentally messed with the clock
tree or power gated the ADC?)...  ...or perhaps a warning and a TODO
in the code would be enough?
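
Roughly, the return-value handling in the list above could look like the
sketch below. It is plain userspace C for illustration, not the driver
change itself: adc_wait_result() is a hypothetical helper, and the value
passed in stands for what wait_for_completion_interruptible_timeout()
returned (negative if interrupted, 0 on timeout, positive remaining jiffies
on completion).

```c
#include <assert.h>
#include <errno.h>

/*
 * Map a wait_for_completion_interruptible_timeout()-style result to a
 * driver return code. The parameter must be a signed long: stored in an
 * unsigned long, a negative error could never be detected by "ret < 0".
 */
int adc_wait_result(long ret)
{
	if (ret < 0)
		return (int)ret;	/* interrupted: pass the error through */
	if (ret == 0)
		return -ETIMEDOUT;	/* timed out: warn and maybe reset here */
	return 0;			/* conversion completed in time */
}
```

The timeout branch is where the warning (and any reset attempt or TODO)
would go.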

Thoughts?

-Doug

Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

2013-04-03 Thread Borislav Petkov

On Wed, Apr 03, 2013 at 11:53:24AM -0500, Jacob Shin wrote:
> Then Thomas, Boris, would it be acceptable if we enable the frequency
> feedback feature by default with a sane powersave_bias tunable value?
> And also add proper documentation for both vanilla powersave_bias
> and powersave_bias with AMD frequency sensitivity loaded to
> Documentation/cpu-freq/ondemand?

Yeah, this was what I was proposing, basically. The only question here
is, would anyone want to disable freq decisions on systems with hw
feedback? If yes, then you'd need to be able to disable the feedback
thing, maybe have a magic value for powersave_bias...

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

NVIDIA RAID don't work with kernel 3.8

2013-04-03 Thread Илья

Hi!

I have an ASUS M2N32-SLI DELUXE motherboard with the NVIDIA nForce® 590
SLI™ MCP chipset (detected as MCP55). Its built-in hardware RAID is
configured as a RAID 0 of 4 HDDs.

With kernel 3.8 from the openSUSE repository
http://download.opensuse.org/repositories/Kernel:/stable/standard/x86_64/kernel-vanilla-3.8.5-1.1.x86_64.rpm
the RAID set is no longer detected, and these messages appear at boot:

ERROR: nvidia: wrong # of devices in RAID set "nvidia_ieigdchd" [1/4]
on /dev/sda
ERROR: removing inconsistent RAID set "nvidia_ieigdchd"
ERROR: no RAID set found

By contrast, the base kernel 3.7.10 works fine.

Is there perhaps some boot option, or a change to the boot scripts, that
would solve the problem?

Re: [PATCH] mm, x86: Do not zero hugetlbfs pages at boot. -v2

2013-04-03 Thread Robin Holt

On Wed, Apr 03, 2013 at 04:02:47PM +0200, Michal Hocko wrote:
> On Tue 02-04-13 21:43:44, Robin Holt wrote:
> [...]
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ca9a7c6..7683f6a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1185,7 +1185,7 @@ int __weak alloc_bootmem_huge_page(struct hstate *h)
> > while (nr_nodes) {
> > void *addr;
> >  
> > -   addr = __alloc_bootmem_node_nopanic(
> > +   addr = __alloc_bootmem_node_nopanic_notzeroed(
> > NODE_DATA(hstate_next_node_to_alloc(h,
> > &node_states[N_MEMORY])),
> > huge_page_size(h), huge_page_size(h), 0);
> 
> Ohh, and powerpc seems to have its own opinion how to allocate huge
> pages. See arch/powerpc/mm/hugetlbpage.c

Do I need to address their allocations?  Can I leave that part of the
changes as something powerpc can address if they are affected by this?

Robin

Re: [PATCH V3 2/2] cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor

2013-04-03 Thread Jacob Shin

On Tue, Apr 02, 2013 at 11:01:24PM +0200, Borislav Petkov wrote:
> On Tue, Apr 02, 2013 at 10:51:51PM +0200, Thomas Renninger wrote:
> > powersave_bias is undocumented in Documentation/cpu-freq/...
> > I guess its use-case is for people who want to get some percent more
> > power savings out of their laptop and do not care of the one or other
> > percent performance.
> > In fact I would like to get rid of this extra code and I expect nobody
> > would miss it.
> > I might miss a configuration tool where someone went through the code,
> > documented things and allows users to set powersave_bias values through
> > some /etc/* config files.
> > Yep, if you want anyone to make use of this, it should better get
> > embedded in more general, at least general ondemand code.
> 
> Yeah, it all sounds like we want to enable this by default on systems
> which support it. Maybe with an off-switch for people who want plain
> ondemand decisions.
> 
> The remaining systems with ripped out powersave_bias would get plain
> ondemand governor decisions. Provided, of course, nobody uses
> powersave_bias and the functionality doesn't make any sense anyway.

Rafael, any thoughts on removing powersave_bias altogether ?

If we remove it, then is it acceptable to add an alternate callback/
handler registration to ondemand governor to account for hardware
feedback ?

Or, if we don't want to remove powersave_bias,

Then Thomas, Boris, would it be acceptable if we enable the frequency
feedback feature by default with a sane powersave_bias tunable value?
And also add proper documentation for both vanilla powersave_bias and
powersave_bias with AMD frequency sensitivity loaded to
Documentation/cpu-freq/ondemand?

> 
> Thanks.
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Sent from a fat crate under my desk. Formatting is fine.
> --
> 


Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

2013-04-03 Thread Zheng Liu

On 04/04/2013 12:46 AM, Christian Kujau wrote:
> On Wed, 3 Apr 2013 at 15:02, Dmitry Monakhov wrote:
>> Good news big endian cpu owners
>> Please try following patches(second is most important):
>> http://patchwork.ozlabs.org/patch/233396/
>> http://patchwork.ozlabs.org/patch/233397/
>> I hope this should fix all known issues
> 
> Zheng Liu also sent a patch:
> 
>   [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> 
> When I try to apply all three of those to 3.9-rc4, the 2nd one from Dmitry
> fails:
> 
> $ cat ~/dev/002-ext4_fix-cpu_vs_disk-conversions.diff | patch --dry-run -p1
> patching file fs/ext4/extents.c
> Hunk #2 FAILED at 2999.
> Hunk #3 FAILED at 3272.
> Hunk #4 FAILED at 4639.
> 3 out of 4 hunks FAILED -- saving rejects to file fs/ext4/extents.c.rej
> patching file fs/ext4/indirect.c
> Hunk #1 succeeded at 1539 (offset 215 lines).
> patching file fs/ext4/inode.c
> patching file fs/ext4/mmp.c
> patching file fs/ext4/namei.c
> patching file fs/ext4/super.c
> Hunk #1 succeeded at 1951 (offset -3 lines).
> patching file fs/ext4/xattr.c
> patching file include/trace/events/ext4.h
> Hunk #1 succeeded at 1956 (offset 8 lines).
> Hunk #2 succeeded at 2060 (offset 8 lines).
> Hunk #3 succeeded at 2079 (offset 8 lines).
> 
> With only Dmitry's patches, this happens against -rc4:
> 
> $ cat ~/dev/001-ext4_fix-usless-declarations.diff | patch -p1
> patching file fs/ext4/ialloc.c
> patching file fs/ext4/ioctl.c
> Hunk #1 succeeded at 359 (offset 4 lines).
> patching file fs/ext4/mballoc.c
> patching file fs/ext4/move_extent.c
> 
> $ cat ~/dev/002-ext4_fix-cpu_vs_disk-conversions.diff | patch --dry-run -p1
> patching file fs/ext4/extents.c
> Hunk #4 FAILED at 4639.
> 1 out of 4 hunks FAILED -- saving rejects to file fs/ext4/extents.c.rej
> patching file fs/ext4/indirect.c
> Hunk #1 succeeded at 1539 (offset 215 lines).
> patching file fs/ext4/inode.c
> patching file fs/ext4/mmp.c
> patching file fs/ext4/namei.c
> patching file fs/ext4/super.c
> Hunk #1 succeeded at 1951 (offset -3 lines).
> patching file fs/ext4/xattr.c
> patching file include/trace/events/ext4.h
> Hunk #1 succeeded at 1956 (offset 8 lines).
> Hunk #2 succeeded at 2060 (offset 8 lines).
> Hunk #3 succeeded at 2079 (offset 8 lines).

I guess that is because Dmitry's patch is against the dev branch of the
ext4 tree.  Please apply my patch instead.  I think it could fix the bug.
It would be great if you could give this patch a try [1].

1. http://patchwork.ozlabs.org/patch/233555/

Thanks,
- Zheng

Re: s390x: kernel BUG at fs/ext4/inode.c:1591! (powerpc too!)

2013-04-03 Thread Dmitry Monakhov

On Wed, 3 Apr 2013 09:46:56 -0700 (PDT), Christian Kujau 
 wrote:
> On Wed, 3 Apr 2013 at 15:02, Dmitry Monakhov wrote:
> > Good news big endian cpu owners
> > Please try following patches(second is most important):
> > http://patchwork.ozlabs.org/patch/233396/
> > http://patchwork.ozlabs.org/patch/233397/
> > I hope this should fix all known issues
> 
> Zheng Liu also sent a patch:
> 
>   [PATCH] ext4: fix a big-endian bug when an extent is zeroed out
> 
> When I try to apply all three of those to 3.9-rc4, the 2nd one from Dmitry 
> fails:
Yes, because my patch was against ext4.git/dev, so just ignore it.
Theodore has sent a patch: http://patchwork.ozlabs.org/patch/233555/
This is the most probable candidate for the final fix.
> 
> $ cat ~/dev/002-ext4_fix-cpu_vs_disk-conversions.diff | patch --dry-run -p1
> patching file fs/ext4/extents.c
> Hunk #2 FAILED at 2999.
> Hunk #3 FAILED at 3272.
> Hunk #4 FAILED at 4639.
> 3 out of 4 hunks FAILED -- saving rejects to file fs/ext4/extents.c.rej
> patching file fs/ext4/indirect.c
> Hunk #1 succeeded at 1539 (offset 215 lines).
> patching file fs/ext4/inode.c
> patching file fs/ext4/mmp.c
> patching file fs/ext4/namei.c
> patching file fs/ext4/super.c
> Hunk #1 succeeded at 1951 (offset -3 lines).
> patching file fs/ext4/xattr.c
> patching file include/trace/events/ext4.h
> Hunk #1 succeeded at 1956 (offset 8 lines).
> Hunk #2 succeeded at 2060 (offset 8 lines).
> Hunk #3 succeeded at 2079 (offset 8 lines).
> 
> With only Dmitry's patches, this happens on -rc4:
> 
> $ cat ~/dev/001-ext4_fix-usless-declarations.diff | patch -p1
> patching file fs/ext4/ialloc.c
> patching file fs/ext4/ioctl.c
> Hunk #1 succeeded at 359 (offset 4 lines).
> patching file fs/ext4/mballoc.c
> patching file fs/ext4/move_extent.c
> 
> $ cat ~/dev/002-ext4_fix-cpu_vs_disk-conversions.diff | patch --dry-run -p1
> patching file fs/ext4/extents.c
> Hunk #4 FAILED at 4639.
> 1 out of 4 hunks FAILED -- saving rejects to file fs/ext4/extents.c.rej
> patching file fs/ext4/indirect.c
> Hunk #1 succeeded at 1539 (offset 215 lines).
> patching file fs/ext4/inode.c
> patching file fs/ext4/mmp.c
> patching file fs/ext4/namei.c
> patching file fs/ext4/super.c
> Hunk #1 succeeded at 1951 (offset -3 lines).
> patching file fs/ext4/xattr.c
> patching file include/trace/events/ext4.h
> Hunk #1 succeeded at 1956 (offset 8 lines).
> Hunk #2 succeeded at 2060 (offset 8 lines).
> Hunk #3 succeeded at 2079 (offset 8 lines).
> 
> 
> Christian.
> -- 
> BOFH excuse #451:
> 
> astropneumatic oscillations in the water-cooling

Re: [PATCH] ext4: fix big-endian bugs which could cause fs corruptions

2013-04-03 Thread Zheng Liu

On 04/04/2013 12:36 AM, Theodore Ts'o wrote:
> From: Zheng Liu 
> 
> When an extent was zeroed out, we forgot to convert from cpu to le16.
> It could make us hit a BUG_ON when we try to write dirty pages out.  So
> fix it.
> 
> [ Also fix a bug found by Dmitry Monakhov where we were missing
>   le32_to_cpu() calls in the new indirect punch hole code.
> 
>   There are a number of other big endian warnings found by static code
>   analyzers, but we'll wait for the next merge window to fix them all
>   up.  These fixes are designed to be Obviously Correct by code
>   inspection, and easy to demonstrate that it won't make any
>   difference (and hence, won't introduce any bugs) on little endian
>   architectures such as x86.  --tytso ]
> 
> Signed-off-by: Zheng Liu 
> Signed-off-by: "Theodore Ts'o" 
> Reported-by: CAI Qian 
> Reported-by: Christian Kujau 
> Cc: Dmitry Monakhov 

Looks good to me.

Thanks,
- Zheng

> ---
> 
> This is what I plan to be sending to Linus very shortly.  If anyone
> could  test / review this patch ASAP, I'd really appreciate it, thanks!!
> 
>  fs/ext4/extents.c  | 11 +++++++----
>  fs/ext4/indirect.c |  4 ++--
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 56efcaa..9c6d06d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2999,20 +2999,23 @@ static int ext4_split_extent_at(handle_t *handle,
>   if (split_flag & EXT4_EXT_DATA_VALID1) {
>   err = ext4_ext_zeroout(inode, ex2);
>   zero_ex.ee_block = ex2->ee_block;
> - zero_ex.ee_len = ext4_ext_get_actual_len(ex2);
> + zero_ex.ee_len = cpu_to_le16(
> + ext4_ext_get_actual_len(ex2));
>   ext4_ext_store_pblock(&zero_ex,
> ext4_ext_pblock(ex2));
>   } else {
>   err = ext4_ext_zeroout(inode, ex);
>   zero_ex.ee_block = ex->ee_block;
> - zero_ex.ee_len = ext4_ext_get_actual_len(ex);
> + zero_ex.ee_len = cpu_to_le16(
> + ext4_ext_get_actual_len(ex));
>   ext4_ext_store_pblock(&zero_ex,
> ext4_ext_pblock(ex));
>   }
>   } else {
>   err = ext4_ext_zeroout(inode, &orig_ex);
>   zero_ex.ee_block = orig_ex.ee_block;
> - zero_ex.ee_len = ext4_ext_get_actual_len(&orig_ex);
> + zero_ex.ee_len = cpu_to_le16(
> + ext4_ext_get_actual_len(&orig_ex));
>   ext4_ext_store_pblock(&zero_ex,
> ext4_ext_pblock(&orig_ex));
>   }
> @@ -3272,7 +3275,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>   if (err)
>   goto out;
>   zero_ex.ee_block = ex->ee_block;
> - zero_ex.ee_len = ext4_ext_get_actual_len(ex);
> + zero_ex.ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex));
>   ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
>  
>   err = ext4_ext_get_access(handle, inode, path + depth);
> diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
> index b505a14..a041831 100644
> --- a/fs/ext4/indirect.c
> +++ b/fs/ext4/indirect.c
> @@ -1539,9 +1539,9 @@ static int free_hole_blocks(handle_t *handle, struct inode *inode,
>   blk = *i_data;
>   if (level > 0) {
>   ext4_lblk_t first2;
> - bh = sb_bread(inode->i_sb, blk);
> + bh = sb_bread(inode->i_sb, le32_to_cpu(blk));
>   if (!bh) {
> - EXT4_ERROR_INODE_BLOCK(inode, blk,
> + EXT4_ERROR_INODE_BLOCK(inode, le32_to_cpu(blk),
>  "Read failure");
>   return -EIO;
>   }
> 


Re: [PATCH] sgy-cts1000: Remove __dev* attributes

2013-04-03 Thread Kumar Gala


On Mar 18, 2013, at 6:19 PM, Ben Collins wrote:

> Somehow the driver snuck in with these still in it.
> 
> Signed-off-by: Ben Collins 
> ---
> arch/powerpc/platforms/85xx/sgy_cts1000.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

applied to next

- k


Re: [ 34/77] xen/blkback: Dont trust the handle from the frontend.

2013-04-03 Thread Konrad Rzeszutek Wilk

On Wed, Apr 03, 2013 at 09:01:06AM -0700, Greg Kroah-Hartman wrote:
> On Wed, Apr 03, 2013 at 04:01:54PM +0200, William Dauchy wrote:
> > On Tue, Mar 12, 2013 at 11:10 PM, Greg Kroah-Hartman
> >  wrote:
> > >> > >> IOW I don't see why this got proposed for stable at all.
> > >> > >
> > >> > > So, you suggest to just drop this patch for v3.8.3, don't you?
> > >> >
> > >> > I do, yes. But I'd suggest to get Konrad to agree.
> > >>
> > >> Yes. Lets drop it.
> > >
> > > Now reverted, thanks.
> > 
> > Seems like still present in 3.4.x branch. Is that a mistake?
> 
> It showed up in 3.4.35, if that's a mistake, and I should revert it,
> please, someone let me know.

Yes. It is a mistake. Please revert it.
> 
> thanks,
> 
> greg k-h

Re: [PATCH v2] drivers/block/xen-blkback: preq.dev is used without initialized

2013-04-03 Thread Konrad Rzeszutek Wilk

On Wed, Apr 03, 2013 at 03:34:16PM +0100, Jan Beulich wrote:
> >>> On 03.04.13 at 15:56, William Dauchy  wrote:
> > On Wed, Apr 3, 2013 at 3:42 PM, Jan Beulich  wrote:
> >> ChangeLog-3.8.3 for example has
> > 
> > oh sorry, you are right. I wasn't looking in the 3.8.x branch.
> > The thing is, the revert seems present only in 3.8.x branch. For
> > example in 3.4.x the last patch is still 01c681d
> > Should we consider this normal or is it a mistake?
> 
> I think it is a mistake, but ultimately it's Konrad's call.

It is a mistake. Please revert it if possible.
> 
> Jan
> 

Re: [PATCH] cpufreq: convert the cpufreq_driver to use the rcu

2013-04-03 Thread Nathan Zimmer


On 04/03/2013 10:32 AM, Viresh Kumar wrote:
> Please always mention Version number and history. Not everybody
> remembers what changed after last version.

You're right.  I was rushing and forgot.
I need to develop the habit of adding some history to my git commits
when I amend them.

> On 3 April 2013 20:33, Nathan Zimmer  wrote:
> > We eventually would like to remove the rwlock cpufreq_driver_lock or convert
> > it back to a spinlock and protect the read sections with RCU.  The first step in
>
> Why do we want to convert it back to spinlock?

Documentation/spinlocks.txt:84
I am not sure why, but that is the directive I am following.

> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >   bool have_governor_per_policy(void)
> >   {
> > -   return cpufreq_driver->have_governor_per_policy;
> > +   bool have_governor;
>
> Name it have_governor_per_policy, it looks wrong otherwise.
>
> > +   rcu_read_lock();
> > +   have_governor = rcu_dereference(cpufreq_driver)->have_governor_per_policy;
> > +   rcu_read_unlock();
> > +   return have_governor;
> >   }

Will do.

> >   static ssize_t show_scaling_driver(struct cpufreq_policy *policy, char *buf)
> >   {
> > -   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n", cpufreq_driver->name);
> > +   char *name;
> > +   rcu_read_lock();
> > +   name = rcu_dereference(cpufreq_driver)->name;
> > +   rcu_read_unlock();
> > +   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n", name);
> >   }
>
> This is the definition of struct cpufreq_driver:
>
> struct cpufreq_driver {
>         struct module   *owner;
>         char            name[CPUFREQ_NAME_LEN];
>
>         ...
> };
>
> The purpose of rcu_read_lock()/rcu_read_unlock() is to define the RCU
> critical section, after which the RCU layer is free to free the memory
> allocated to the earlier instance of cpufreq_driver.
>
> So, after the unlock() call you _should_not_ use the memory allocated to
> the cpufreq_driver instance. And here, you are using memory allocated to
> name[] after the unlock() call.

Ok, I'll fix this spot.

> Which looks to be wrong... I left the other parts of the driver up to you
> to fix for this "rule of thumb".

In places like show_bios_limit and cpufreq_add_dev_interface we know
that the memory will still be there since the cpufreq_driver->owner is
held.

> Sorry for not pointing this out earlier, but RCU is as new to me as it
> is to you. I know you must be frustrated with so many versions of this
> patch, and every time we bring a new problem to you... Don't get
> disheartened by it. Keep the good work going :)

Making a learner's mistake isn't really discouraging to me, even when I
do it twice.



--
viresh
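
The rule of thumb discussed above (only touch RCU-protected memory between
the lock and unlock calls) can be sketched in plain userspace C. This is an
illustration under stated assumptions, not the cpufreq code: the
rcu_read_lock_sketch()/rcu_read_unlock_sketch() helpers are hypothetical
stand-ins that only count nesting, and driver_sketch, current_driver and
show_driver_name() are made-up names. The point it shows is that the name is
copied into the caller's buffer before the read-side section ends, so nothing
dereferences the driver pointer afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN_SKETCH 16

/* Hypothetical, cut-down driver structure with an embedded name array. */
struct driver_sketch {
    char name[NAME_LEN_SKETCH];
};

/* Stand-ins for the real RCU primitives, which only exist in the kernel.
 * They merely track nesting depth for this sketch. */
static int read_depth;
static void rcu_read_lock_sketch(void)   { read_depth++; }
static void rcu_read_unlock_sketch(void) { read_depth--; }

/* Would be an RCU-protected pointer in kernel code. */
static struct driver_sketch *current_driver;

/* Formats the driver name into buf while still inside the read-side
 * section. Returns snprintf()'s return value, or -1 if no driver is set. */
static int show_driver_name(char *buf, size_t len)
{
    int n = -1;

    rcu_read_lock_sketch();
    struct driver_sketch *drv = current_driver; /* rcu_dereference() in the kernel */
    if (drv)
        n = snprintf(buf, len, "%s\n", drv->name); /* use happens before unlock */
    rcu_read_unlock_sketch();

    /* From here on only buf (the caller's own copy) may be used, not drv. */
    return n;
}
```

Compared with the hunk quoted above, the only change in shape is that the
formatting call moves inside the locked region instead of saving a pointer
to name[] and using it after the unlock.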



[PATCH] ext4: fix big-endian bugs which could cause fs corruptions

2013-04-03 Thread Theodore Ts'o

From: Zheng Liu 

When an extent was zeroed out, we forgot to convert from cpu to le16.
It could make us hit a BUG_ON when we try to write dirty pages out.  So
fix it.

[ Also fix a bug found by Dmitry Monakhov where we were missing
  le32_to_cpu() calls in the new indirect punch hole code.

  There are a number of other big endian warnings found by static code
  analyzers, but we'll wait for the next merge window to fix them all
  up.  These fixes are designed to be Obviously Correct by code
  inspection, and easy to demonstrate that it won't make any
  difference (and hence, won't introduce any bugs) on little endian
  architectures such as x86.  --tytso ]

Signed-off-by: Zheng Liu 
Signed-off-by: "Theodore Ts'o" 
Reported-by: CAI Qian 
Reported-by: Christian Kujau 
Cc: Dmitry Monakhov 
---

This is what I plan to be sending to Linus very shortly.  If anyone
could  test / review this patch ASAP, I'd really appreciate it, thanks!!

 fs/ext4/extents.c  | 11 +++++++----
 fs/ext4/indirect.c |  4 ++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 56efcaa..9c6d06d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2999,20 +2999,23 @@ static int ext4_split_extent_at(handle_t *handle,
if (split_flag & EXT4_EXT_DATA_VALID1) {
err = ext4_ext_zeroout(inode, ex2);
zero_ex.ee_block = ex2->ee_block;
-   zero_ex.ee_len = ext4_ext_get_actual_len(ex2);
+   zero_ex.ee_len = cpu_to_le16(
+   ext4_ext_get_actual_len(ex2));
ext4_ext_store_pblock(&zero_ex,
  ext4_ext_pblock(ex2));
} else {
err = ext4_ext_zeroout(inode, ex);
zero_ex.ee_block = ex->ee_block;
-   zero_ex.ee_len = ext4_ext_get_actual_len(ex);
+   zero_ex.ee_len = cpu_to_le16(
+   ext4_ext_get_actual_len(ex));
ext4_ext_store_pblock(&zero_ex,
  ext4_ext_pblock(ex));
}
} else {
err = ext4_ext_zeroout(inode, &orig_ex);
zero_ex.ee_block = orig_ex.ee_block;
-   zero_ex.ee_len = ext4_ext_get_actual_len(&orig_ex);
+   zero_ex.ee_len = cpu_to_le16(
+   ext4_ext_get_actual_len(&orig_ex));
ext4_ext_store_pblock(&zero_ex,
  ext4_ext_pblock(&orig_ex));
}
@@ -3272,7 +3275,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
if (err)
goto out;
zero_ex.ee_block = ex->ee_block;
-   zero_ex.ee_len = ext4_ext_get_actual_len(ex);
+   zero_ex.ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex));
ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
 
err = ext4_ext_get_access(handle, inode, path + depth);
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index b505a14..a041831 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1539,9 +1539,9 @@ static int free_hole_blocks(handle_t *handle, struct inode *inode,
blk = *i_data;
if (level > 0) {
ext4_lblk_t first2;
-   bh = sb_bread(inode->i_sb, blk);
+   bh = sb_bread(inode->i_sb, le32_to_cpu(blk));
if (!bh) {
-   EXT4_ERROR_INODE_BLOCK(inode, blk,
+   EXT4_ERROR_INODE_BLOCK(inode, le32_to_cpu(blk),
   "Read failure");
return -EIO;
}
-- 
1.7.12.rc0.22.gcdd159b
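
For readers less familiar with the class of bug fixed above, here is a
minimal userspace sketch of why a missing cpu-to-little-endian conversion is
invisible on x86 but corrupts data on big-endian machines. It is not ext4
code: le16_sketch, cpu_to_le16_sketch(), le16_to_cpu_sketch() and
extent_sketch are hypothetical names, and the helpers are written with
explicit byte stores so they behave the same on any host. Wrapping the
on-disk value in its own struct type is an assumption of this sketch that
makes a forgotten conversion a compile error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk 16-bit field: raw little-endian bytes. */
typedef struct { uint8_t b[2]; } le16_sketch;

/* Convert a CPU-order value to the on-disk (little-endian) layout. */
static le16_sketch cpu_to_le16_sketch(uint16_t v)
{
    le16_sketch out;
    out.b[0] = (uint8_t)(v & 0xff);  /* least significant byte first */
    out.b[1] = (uint8_t)(v >> 8);
    return out;
}

/* Convert the on-disk layout back to a CPU-order value. */
static uint16_t le16_to_cpu_sketch(le16_sketch v)
{
    return (uint16_t)(v.b[0] | ((uint16_t)v.b[1] << 8));
}

/* Simplified extent record: only the length field matters here. */
struct extent_sketch {
    le16_sketch ee_len;  /* always stored little-endian */
};

static void set_len(struct extent_sketch *ex, uint16_t len)
{
    ex->ee_len = cpu_to_le16_sketch(len);  /* the step the fix adds */
}

static uint16_t get_len(const struct extent_sketch *ex)
{
    return le16_to_cpu_sketch(ex->ee_len);
}

/* What a reader would see if the two sides disagreed about byte order:
 * the two bytes of the value swapped. */
static uint16_t byteswap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

On a little-endian host the conversion is a no-op, which matches the
commit message's argument that the change cannot affect x86. On a
big-endian host a length of 0x0123 stored without conversion would be read
back as 0x2301.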


Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-03 Thread Greg KH

On Tue, Apr 02, 2013 at 09:48:18PM -0700, Jacob Pan wrote:
> > Let's step back and start over, what exactly are you trying to tell
> > userspace?  What data do you have that you need to express to it?  How
> > do you want userspace to see/use it?
> 
> It is a good idea to step back and let me explain what I wanted to
> do here for userspace.
> 
> I have two kinds of applications that might use this driver.
> 1. simple use case where user sets a power limit for a RAPL domain.
> e.g. set graphics unit power limit to 7w
> 2. advanced use case where the user can do fine tuning on top of simple
> power limit,e.g. the dynamic response parameters of power control
> logic, event notifications, etc.
> 
> For #1, this driver registers with the abstract generic thermal layer
> (/sys/class/thermal) and presents itself as a set of cooling devices
> with a single knob per domain for power limits.
> root@chromoly:/sys/class/thermal/cooling_device15# echo 7000 > cur_state 

Great, how about submitting that functionality as patch 1 of your
series?  That seems like a very "normal" thermal driver, right?

> For #2, to give userspace complete control of the RAPL interface, which
> is not generic, I put them under the device private sysfs area.
> root@chromoly:/sys/class/thermal/cooling_device15/device# echo 1000 > 
> time_window1 

I totally fail to understand the difference.  What do you want to show
to userspace that can't be expressed through the thermal interface
today?  Perhaps the thermal interface could be expanded to provide more
functionality that you need?  Why create a one-off API that will never
be used again and require userspace programs to be written just to
handle this one type of device?

> As you mentioned about using device tree vs. fs, and how kobject are
> used for fs. I do have the need to go between a generic thermal sysfs
> and the true device tree. This is the reason why I used kobjects and
> link them between device tree and its thermal sysfs representation.

I don't understand your leap to using kobjects.

> e.g. a RAPL package cooling device linked with its platform device
> kobj. (device is linked with rapl_domains/package, the line is too long)
> 
> root@chromoly:/sys/class/thermal# ls -l cooling_device15/
> total 0
> -rw-r--r-- 1 root root 4096 Apr  2 15:03 cur_state
> lrwxrwxrwx 1 root root    0 Apr  2 21:28 device
> -> ../../../platform/intel_rapl/rapl_domains/package
> -r--r--r-- 1 root root 4096 Apr  2 15:03 max_state
> drwxr-xr-x 2 root root    0 Apr  2 21:28 power
> lrwxrwxrwx 1 root root    0 Apr  2 15:03 subsystem
> -> ../../../../class/thermal
> -r--r--r-- 1 root root 4096 Apr  2 15:03 type
> -rw-r--r-- 1 root root 4096 Apr  2 15:03 uevent

I still don't understand.  What are you adding here, the device symlink?
Or something else?

> For userspace which is not satisfied with the simple use case of a
> single knob for setting power limit, it can follow the link to find the
> device tree entry. Then get access to the complete knobs, including
> event notifications.

And what is in that device directory?  What is rapl_domains?  Why isn't
that a normal 'struct device'?

Still confused.

greg k-h

Re: [PATCH v4 0/6] Generic PHY Framework

2013-04-03 Thread David Miller

From: Kishon Vijay Abraham I 
Date: Wed, 3 Apr 2013 12:05:30 +0530

> This patch series is about drivers/phy which will be used for now by
> usb, sata and maybe some video PHY's. Network itself has a
> comprehensive PHY in drivers/net/phy which we'd like to merge it with
> drivers/phy so that we have all the phy stuff in drivers/phy but we
> are not planning too far ahead here.

Ok, that makes sense, thanks.

Re: [PATCH v8 5/5] hwmon: add ST-Ericsson ABX500 hwmon driver

2013-04-03 Thread Guenter Roeck

On Wed, Apr 03, 2013 at 08:18:12PM +0800, Hongbo Zhang wrote:
> Each of ST-Ericsson X500 chip set series consists of both ABX500 and DBX500
> chips. This is ABX500 hwmon driver, where the abx500.c is a common layer for
> all ABX500s, and the ab8500.c is specific for AB8500 chip. Under this designed
> structure, other chip specific files can be added simply using the same common
> layer abx500.c.
> 
> Signed-off-by: Hongbo Zhang 

Acked-by: Guenter Roeck 

Re: [PATCH v8 4/5] ab8500: power: export abx500_res_to_temp tables for hwmon

2013-04-03 Thread Guenter Roeck

On Wed, Apr 03, 2013 at 08:18:11PM +0800, Hongbo Zhang wrote:
> This patch exports the thermistor resistance-to-temperature tables, so that the
> hwmon driver can access them, and also adds the corresponding table size
> variables.
> 
> Signed-off-by: Hongbo Zhang 

Acked-by: Guenter Roeck 

Re: [PATCH] clk: divider: Introduce CLK_DIVIDER_ALLOW_ZERO flag

2013-04-03 Thread Sören Brinkmann

On Wed, Apr 03, 2013 at 08:40:27AM -0700, Mike Turquette wrote:
> Quoting Soren Brinkmann (2013-04-02 15:36:56)
> > Dividers which have CLK_DIVIDER_ONE_BASED set have a redundant state,
> > being a divider value of zero. Some hardware implementations allow a
> > zero divider which simply doesn't alter the frequency. I.e. it acts like
> > a divide by one or bypassing the divider.
> > This flag is used to handle such HW in the clk-divider model.
> > 
> > Signed-off-by: Soren Brinkmann 
> 
> This version looks good.  Taken into clk-next.
Thanks.

Sören
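
The behaviour described in the quoted commit message can be sketched as a
small mapping from register field to divisor. This is an illustration only,
not the clk-divider implementation: the flag names and bit values below
(DIV_ONE_BASED_SKETCH, DIV_ALLOW_ZERO_SKETCH) and both functions are
hypothetical and local to the sketch. It shows that with a one-based
encoding the field value 0 is otherwise unused, and that the allow-zero
flag maps it to a divide-by-one (bypass).

```c
#include <assert.h>

/* Illustrative flag bits; the values are local to this sketch. */
#define DIV_ONE_BASED_SKETCH   (1u << 0)
#define DIV_ALLOW_ZERO_SKETCH  (1u << 1)

/* Translate a raw register field into the effective divisor.
 * A return value of 0 means the field is not a valid setting. */
static unsigned int field_to_divisor(unsigned int field, unsigned int flags)
{
    if (flags & DIV_ONE_BASED_SKETCH) {
        /* Field value N means divide by N; 0 is the redundant state. */
        if (field == 0 && (flags & DIV_ALLOW_ZERO_SKETCH))
            return 1;   /* hardware bypasses the divider */
        return field;   /* 0 stays 0: caller must treat it as invalid */
    }
    return field + 1;   /* zero-based encoding: field 0 means divide by 1 */
}

/* Resulting rate for a given parent rate, or 0 for an invalid setting. */
static unsigned long divided_rate(unsigned long parent_rate,
                                  unsigned int field, unsigned int flags)
{
    unsigned int div = field_to_divisor(field, flags);

    return div ? parent_rate / div : 0;
}
```

With both flags set, a field of 0 and a field of 1 give the same output
rate, which is the redundancy the commit message refers to.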



Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-03 Thread Greg KH

On Tue, Apr 02, 2013 at 05:17:14PM -0700, Jacob Pan wrote:
> On Tue, 2 Apr 2013 16:48:05 -0700
> Greg KH  wrote:
> 
> > On Tue, Apr 02, 2013 at 04:33:57PM -0700, Jacob Pan wrote:
> > > On Tue, 2 Apr 2013 16:00:42 -0700
> > > Greg KH  wrote:
> > > 
> > > > > +#include "intel_rapl.h"
> > > > > +#include "../../../fs/sysfs/sysfs.h"  
> > > > 
> > > > WTF?
> > > > 
> > > > Oh, that's a sure sign you are not doing something properly, if
> > > > you think it's ok to muck around with the internals of sysfs.
> > > > 
> > > > There's a reason that file is "private", why do you think it's ok
> > > > to use it directly?  Did you just think that I somehow "forgot"
> > > > to put it in the proper include directory?
> > > > I did feel unsure about this but I saw some precedent in the
> > > kernel.
> > 
> > Someone else is doing this with the sysfs api?  I don't see any other
> > code in Linus's tree doing this at the moment, where did you see this?
> > Let me know and I'll fix it up right away.
> > 
> no, i did not mean sysfs api. I mean include internal header files via
> #include ../../ 
> e.g. in drivers/usb/image/microtek.c
> 
> #include "../../scsi/scsi.h"
> #include 

That is because this is a scsi host driver.  Your code is not part of
sysfs itself.

> > > Anyway, I needed a way to validate a userspace file passed to rapl
> > > driver belongs to the same sysfs directory. I will look for
> > > alternative ways.
> > 
> > What do you mean by this?  What exactly are you trying to do?  No
> > normal driver code should _ever_ call sysfs functions directly, nor
> > should they ever care about sysfs internals.
> > 
> i did not call sysfs internal calls, just need to use 
> struct sysfs_dirent {}
> 
> to do the following sanity check against user passed event control file,
> it is still not a 100% strong check. 
>   /* check if the cfile belongs to the same rapl domain */
>   if (strcmp(rd->kobj.sd->s_name,
>   cfile->f_dentry->d_parent->d_name.name)) {
>   pr_debug("cfile does not belong to domain %s\n",
>   rd->kobj.sd->s_name);
>   ret = -EINVAL;
>   goto exit_cleanup_fds;
>   }

This made it through a code review at Intel?  Seriously?  Come on,
there's just so much wrong here, I don't know where to begin.

Hint, if you find yourself caring about the internals of sysfs in a
device driver, you are doing something so wrong it's not funny.  Do you
see _any_ other driver doing anything like this?  What makes this driver
so special that it can do unexpected, and totally different things with
sysfs?

> > And, odds are, you didn't test your code as a module, right, as any
> > internal sysfs function that you could get from this .h file, wouldn't
> > be exported for a module to use, unless I missed one somewhere?
> > 
> I did run the driver as module since i didn't use sysfs internal
> functions, just the struct. I may be hitting a corner case here, but
> for drivers who need to discover sysfs hierarchy would it be useful to
> expose some info in struct sysfs_dirent{}?

No, not at all, why would a driver ever care about that?  Somehow we
have gotten by for the past 10+ years without needing it, why is your
driver so different than the thousands of other Linux drivers?

greg k-h

[PATCH] nfsd4: Fix NULL dereference in legacy_recdir_name_error()

2013-04-03 Thread Takashi Iwai

The recent rewrite of NFSv4 recovery client tracking options per net
(commit 9a9c6478) introduced an Oops that triggers when recdir name
generation fails.

  NFSD: unable to generate recoverydir name (-2).
  NFSD: disabling legacy clientid tracking. Reboot recovery will not function 
correctly!
  BUG: unable to handle kernel NULL pointer dereference at 07a8
  IP: [] nfsd4_client_tracking_exit+0x17/0x70 [nfsd]
  PGD 0
  Oops:  [#1] PREEMPT SMP
  Modules linked in: nfsd fuse nfsv3 nfs_acl nfsv4 auth_rpcgss nfs lockd sunrpc 
cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi 
snd_hda_codec_realtek intel_powerclamp acpi_cpufreq mperf coretemp 
ghash_clmulni_intel aesni_intel kvm_intel snd_hda_intel ablk_helper 
snd_hda_codec snd_hwdep kvm snd_pcm cryptd lrw aes_x86_64 snd_timer xts 
gf128mul e1000e snd sr_mod iTCO_wdt microcode cdrom usb_storage dcdbas 
iTCO_vendor_support i2c_i801 cdc_acm sg ptp lpc_ich mei soundcore pps_core 
mfd_core snd_page_alloc pciehp pci_hotplug autofs4 btrfs raid6_pq zlib_deflate 
xor libcrc32c i915 crc32c_intel drm_kms_helper drm xhci_hcd i2c_algo_bit 
thermal button video processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw 
scsi_dh_emc scsi_dh_alua scsi_dh
  CPU 1
  Pid: 19567, comm: nfsd Not tainted 3.9.0-rc5-test+ #3 Dell Inc. OptiPlex 
9010/0M9KCM
  RIP: 0010:[]  [] 
nfsd4_client_tracking_exit+0x17/0x70 [nfsd]
  RSP: 0018:880181099c28  EFLAGS: 00010202
  RAX: 8801810900c0 RBX: 0004 RCX: 0006
  RDX: 0007 RSI: 0046 RDI: 
  RBP: 880181099c38 R08: 000a R09: 039f
  R10:  R11: 039e R12: 
  R13: 81a87280 R14: 88014c819220 R15: 88020b75d200
  FS:  () GS:88021e24() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 07a8 CR3: 01a0d000 CR4: 001407e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: 0ff0 DR7: 0400
  Process nfsd (pid: 19567, threadinfo 880181098000, task 8801810900c0)
  Stack:
   fffe 88020b75d200 880181099c58 a060c75c
   81a87280 880002ba7000 880181099cc8 a060cb37
   880181099d20 88014c819220 0001 88020b75d200
  Call Trace:
   [] legacy_recdir_name_error+0x3c/0x40 [nfsd]
   [] nfsd4_create_clid_dir+0xe7/0x200 [nfsd]
   [] ? nfs4_preprocess_seqid_op+0x63/0x160 [nfsd]
   [] nfsd4_client_record_create+0x5f/0x80 [nfsd]
   [] nfsd4_open_confirm+0x12f/0x1b0 [nfsd]
   [] nfsd4_proc_compound+0x55f/0x770 [nfsd]
   [] nfsd_dispatch+0xdd/0x220 [nfsd]
   [] svc_process_common+0x328/0x6d0 [sunrpc]
   [] svc_process+0x10c/0x160 [sunrpc]
   [] nfsd+0xbf/0x130 [nfsd]
   [] ? nfsd_destroy+0x90/0x90 [nfsd]
   [] kthread+0xbb/0xc0
   [] ? kthread_create_on_node+0x130/0x130
   [] ret_from_fork+0x7c/0xb0
   [] ? kthread_create_on_node+0x130/0x130
  Code: e0 49 8b 84 24 48 01 00 00 e9 25 ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 
41 54 49 89 fc 53 8b 1d 44 b4 00 00 e8 bb a9 a5 e0 85 db <49> 8b 84 24 a8 07 00 
00 74 43 3b 18 77 3f 83 eb 01 48 63 db 48
  RIP  [] nfsd4_client_tracking_exit+0x17/0x70 [nfsd]
   RSP 
  CR2: 07a8
  ---[ end trace 5dd4307598e98cef ]---

This patch fixes it by passing the proper net instance instead of
NULL.

Signed-off-by: Takashi Iwai 
Cc:  [v3.8+]
---
 fs/nfsd/nfs4recover.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 899ca26..ae0d5c9 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -146,7 +146,7 @@ out_no_tfm:
  * then disable recovery tracking.
  */
 static void
-legacy_recdir_name_error(int error)
+legacy_recdir_name_error(struct net *net, int error)
 {
printk(KERN_ERR "NFSD: unable to generate recoverydir "
"name (%d).\n", error);
@@ -160,8 +160,7 @@ legacy_recdir_name_error(int error)
printk(KERN_ERR "NFSD: disabling legacy clientid tracking. "
"Reboot recovery will not function correctly!\n");
 
-   /* the argument is ignored by the legacy exit function */
-   nfsd4_client_tracking_exit(NULL);
+   nfsd4_client_tracking_exit(net);
}
 }
 
@@ -184,7 +183,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
 
status = nfs4_make_rec_clidname(dname, &clp->cl_name);
if (status)
-   return legacy_recdir_name_error(status);
+   return legacy_recdir_name_error(clp->net, status);
 
status = nfs4_save_creds(&original_cred);
if (status < 0)
@@ -341,7 +340,7 @@ nfsd4_remove_clid_dir(struct nfs4_client *clp)
 
status = nfs4_make_rec_clidname(dname, &clp->cl_name);
if (status)
-   return legacy_recdir_name_error(status);
+   retur

Re: [PATCH 3.8-stable] thermal: return an error on failure to register thermal

2013-04-03 Thread Greg KH

On Wed, Apr 03, 2013 at 09:38:55AM +0900, Jonghwan Choi wrote:
> 3.8-stable review patch.  If anyone has any objections, please let me know.

Ok, I object, given that I told you to change the format of the message
you send out.  Please look at how others on the stable@ mailing list
send "should this patch be applied" messages, and copy how that is done.

> --
> 
> From: "Richard Guy Briggs "
> 
> commit da28d966f6aa942ae836d09729f76a1647932309 upstream.
> 
> The return code from the registration of the thermal class is used to
> unallocate resources, but this failure isn't passed back to the caller of
> thermal_init.  Return this failure back to the caller.
> 
> This bug was introduced in changeset 4cb18728 which overwrote the return
> code
> when the variable was re-used to catch the return code of the registration
> of
> the genetlink thermal socket family.

You linewrapped the patch message, careful, you might mess up the patch
itself as well :(

> Signed-off-by: Richard Guy Briggs 
> Signed-off-by: Zhang Rui 

You aren't signing off on it as well?  Why not?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v1] ARM: keep __my_cpu_offset consistent with generic one

2013-04-03 Thread Russell King - ARM Linux

On Tue, Mar 12, 2013 at 10:56:38AM +, Russell King - ARM Linux wrote:
> On Tue, Mar 12, 2013 at 10:32:21AM +0800, Ming Lei wrote:
> > On Thu, Mar 7, 2013 at 9:35 PM, Ming Lei  wrote:
> > > Commit 14318efb(ARM: 7587/1: implement optimized percpu variable access)
> > > introduces arm's __my_cpu_offset to optimize percpu variable access,
> > > which really works well on hackbench, but will cause __my_cpu_offset
> > > to return garbage value before it is initialized in cpu_init() called
> > > by setup_arch, so accessing percpu variable before setup_arch may cause
> > > kernel hang. But generic __my_cpu_offset always returns zero before
> > > percpu area is brought up, and won't hang kernel.
> > >
> > > So the patch tries to clear __my_cpu_offset on boot CPU early
> > > to avoid boot hang.
> > >
> > > At least now percpu variable is accessed by lockdep before
> > > setup_arch(), and enabling CONFIG_LOCK_STAT or CONFIG_DEBUG_LOCKDEP
> > > can trigger kernel hang.
> > >
> > > Cc: Peter Zijlstra 
> > > Cc: Ingo Molnar 
> > > Cc: Rob Herring 
> > > Cc: Will Deacon 
> > > Cc: Nicolas Pitre 
> > > Cc: Russell King 
> > > Signed-off-by: Ming Lei 
> > > ---
> > > V1:
> > > - documents lockdep uses percpu variable early
> > 
> > Looks like no one objects to the patch, so I have submitted it to Russell's
> > patch system, and hope it can be pushed to Linus' tree soon and
> > make LOCK_STAT/DEBUG_LOCKDEP usable on ARMv7.
> 
> I'm not convinced it is correct.  Is the percpu data as stored in the
> kernel image (in other words, at offset zero) supposed to be read only?
> If so, the above means that we'll be accessing that rather than the
> copy of the percpu data we should be accessing.
> 
> The percpu data areas are allocated by setup_per_cpu_areas() - that's
> where we should be initializing this, just like it's done on PowerPC.

Still not convinced this is a proper fix.  Look, the problem is this:

- Initially, set the CPU percpu offset to zero.  This means the boot
  CPU reads and writes to the percpu data section in the kernel image.

- The percpu areas are initialized, and the percpu data copied to each
  percpu data - this will have any writes from the boot CPU included as
  changes to the percpu data.

- The boot CPU continues to read/write to the percpu data section.

- If the boot CPU suspends/resumes, cpu_init() gets called, which will
  call set_my_cpu_offset(per_cpu_offset(cpu)); for the boot CPU.

- The boot CPU now uses the allocated percpu data section and any
  updates it did in the percpu data section in the kernel image are
  lost to it.

Your patch may be right on its own to solve the initial problem, but
it leaves a _big_ hole.
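
To make the sequence above concrete, here is a minimal user-space sketch (not
kernel code). It assumes a single percpu variable; image_section,
boot_cpu_area, percpu_inc() and replay_sequence() are hypothetical stand-ins
and not kernel APIs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the percpu data section: one counter. */
struct percpu_section {
	int counter;
};

static struct percpu_section image_section;  /* static copy in the kernel image */
static struct percpu_section boot_cpu_area;  /* allocated copy for the boot CPU */
static struct percpu_section *active;        /* what the "offset" resolves to */

/* A percpu write by the boot CPU through the current offset. */
static void percpu_inc(void)
{
	active->counter++;
}

/* Replays the steps and returns the value the boot CPU sees at the end. */
static int replay_sequence(void)
{
	memset(&image_section, 0, sizeof(image_section));
	memset(&boot_cpu_area, 0, sizeof(boot_cpu_area));
	active = &image_section;        /* offset initially zero */

	percpu_inc();                   /* early write hits the image section */

	/* percpu areas are set up: the image section is copied */
	memcpy(&boot_cpu_area, &image_section, sizeof(image_section));
	assert(boot_cpu_area.counter == 1);

	percpu_inc();                   /* boot CPU keeps using the image section */
	percpu_inc();

	active = &boot_cpu_area;        /* offset finally switched, e.g. on resume */
	return active->counter;         /* the two later writes are not visible */
}
```

In this model three increments are made, but after the switch the boot CPU
only sees the one that happened before the copy.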

Now, the big question here: is it right that the boot CPU should ever
write to the static percpu data section in the kernel image?  What if
there's a pointer in there, initially NULL, which then gets checked
by each CPU and initialized if NULL - we'll end up sharing the same
allocation amongst all CPUs, which probably isn't what was intended.
If there's a list_head which gets added to, that too will be very bad.

Although you have uncovered a problem, I still think by setting the
offset to zero initially, you're just papering over a much bigger
can of worms.

So, should percpu data be used this early in boot before the percpu
stuff is properly initialized?  That feels _extremely_ unsafe.

This, I think, needs to be addressed properly.  And part of that is
knowing where things went wrong.  Will Deacon asked you for a backtrace
showing where this problem occurred.  Your response seems to be to
resend the patch with a "v1" tag and no new information.

Sorry, not applying this until the above issue has been discussed
and the location of these percpu accesses has been properly analysed.

Re: [PATCH] pinctrl: tegra: add suspend-resume support

2013-04-03 Thread Linus Walleij

On Thu, Mar 28, 2013 at 6:11 PM, Bibek Basu  wrote:

Hm I recognize this name :-)

> This patch adds suspend and resume callbacks to the pinctrl-tegra driver.

Please be more verbose. How is this achieved? I have to
guess what the code is doing..

> +#ifdef CONFIG_PM_SLEEP
> +
> +static int pinctrl_suspend(void)
> +{
> +   int i, j;
> +   u32 *pg_data = pmx->pg_data;
> +   u32 *regs;
> +
> +   for (i = 0; i < pmx->nbanks; i++) {
> +   regs = pmx->regs[i];
> +   for (j = 0; j < pmx->regs_size[i] / 4; j++)
> +   *pg_data++ = readl(regs++);
> +   }
> +   return 0;
> +}
> +
> +static void pinctrl_resume(void)
> +{
> +   int i, j;
> +   u32 *pg_data = pmx->pg_data;
> +   u32 *regs;
> +
> +   for (i = 0; i < pmx->nbanks; i++) {
> +   regs = pmx->regs[i];
> +   for (j = 0; j < pmx->regs_size[i] / 4; j++)
> +   writel(*pg_data++, regs++);
> +   }
> +}
> +
> +static struct syscore_ops pinctrl_syscore_ops = {
> +   .suspend = pinctrl_suspend,
> +   .resume = pinctrl_resume,
> +};
> +
> +#endif
(...)
> +#ifdef CONFIG_PM_SLEEP
> +   register_syscore_ops(&pinctrl_syscore_ops);
> +#endif

So Stephen already commented that syscore ops is maybe too big
a sledgehammer for a fine-granular problem.

I mainly want to know what is happening above, it looks like
a state save/restore for all registers or something like this?
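
If that reading is right, the pattern would be roughly the following
user-space sketch. Plain arrays stand in for the MMIO banks, and the bank
sizes and all names (bank0, bank1, saved, save_banks(), restore_banks()) are
assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBANKS 2

/* Hypothetical register banks; plain arrays stand in for MMIO. */
static uint32_t bank0[3];
static uint32_t bank1[2];

static uint32_t *const banks[NBANKS] = { bank0, bank1 };
static const size_t bank_bytes[NBANKS] = { sizeof(bank0), sizeof(bank1) };

/* Flat backup buffer, one word per register across all banks. */
static uint32_t saved[5];

/* Copy every register of every bank into the flat buffer, in order. */
static void save_banks(void)
{
	uint32_t *out = saved;

	for (size_t i = 0; i < NBANKS; i++)
		for (size_t j = 0; j < bank_bytes[i] / 4; j++)
			*out++ = banks[i][j];

	/* the buffer must be filled exactly once over */
	assert((size_t)(out - saved) == sizeof(saved) / sizeof(saved[0]));
}

/* Write the saved words back in the same order. */
static void restore_banks(void)
{
	const uint32_t *in = saved;

	for (size_t i = 0; i < NBANKS; i++)
		for (size_t j = 0; j < bank_bytes[i] / 4; j++)
			banks[i][j] = *in++;
}
```

The sketch saves and restores every word unconditionally, which is the part
that looks coarse: it does not distinguish registers that actually lose state
from those that do not.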

Yours,
Linus Walleij

Re: [PATCH] Kbuild: Avoid DTB rebuilds if source files are untouched

2013-04-03 Thread Stephen Warren

On 04/03/2013 01:14 AM, Vineet Gupta wrote:
> forgot to CC linux-arch
> 
> On 04/03/2013 12:42 PM, Vineet Gupta wrote:
>> Currently, for every ARC kernel build I see the following:
>>
>> --->8-
>>   DTBarch/arc/boot/dts/angel4.dtb.S
>>   AS  arch/arc/boot/dts/angel4.dtb.o
>>   LD  arch/arc/boot/dts/built-in.o
>> rm arch/arc/boot/dts/angel4.dtb.S<-- forces rebuild next iter
>>   CHK kernel/config_data.h
>> --->8-

I assume that's because the file is an intermediate file, and only built
due to a chain of build rules, and hence make cleans it up itself after
the build?

>> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib

>> +.PRECIOUS: $(obj)/%.dtb.S
>> +
>>  $(obj)/%.dtb.S: $(obj)/%.dtb
>>  $(call cmd,dt_S_dtb)

I'm not sure if .PRECIOUS is correct here. That prevents make from
deleting the file if make is CTRL-C'd in the middle of generating it.
Couldn't that leave a stale/corrupt file around that'd break the build?
Judging by:

http://www.gnu.org/software/make/manual/html_node/Special-Targets.html

I think .SECONDARY might be a better choice? Does that solve the problem
you're seeing?

Re: [PATCH bugfix 3.9] PCI: Don't try to disable Bus Master on disconnected PCI devices

2013-04-03 Thread Bjorn Helgaas

[+cc linux-pci]

On Mon, Apr 1, 2013 at 2:00 AM, Konstantin Khlebnikov
 wrote:
> BUMP. This is degradation from 3.8, so this patch must be in 3.9.
>
> I still don't like this forced clearing bus-master bit. But this hack
> definitely fixes problems in kexec, so there is reason to keep it here.

Applied to for-linus for v3.9, thanks!

> Konstantin Khlebnikov wrote:
>>
>> This is fix for commit 7897e6022761ace7377f0f784fca059da55f5d71 from
>> v3.9-rc1
>> ("PCI: Disable Bus Master unconditionally in pci_device_shutdown()")
>> in turn that was fix for b566a22c23327f18ce941ffad0ca907e50a53d41 from
>> v3.5-rc1
>> ("PCI: disable Bus Master on PCI device shutdown")
>>
>> Unfortunately fixing one bug uncovers another:
>> ->shutdown() callback might switch device to deep sleep state.
>> PCI config space no longer available after that.
>>
>> Link: https://lkml.org/lkml/2013/3/12/529
>> Signed-off-by: Konstantin Khlebnikov
>> Reported-and-Tested-by: Vivek Goyal
>> Cc: Bjorn Helgaas
>> Cc: Rafael J. Wysocki
>> ---
>>   drivers/pci/pci-driver.c |5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 1fa1e48..79277fb 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -390,9 +390,10 @@ static void pci_device_shutdown(struct device *dev)
>>
>> /*
>>  * Turn off Bus Master bit on the device to tell it to not
>> -* continue to do DMA
>> +* continue to do DMA. Don't touch devices in D3cold or unknown
>> states.
>>  */
>> -   pci_clear_master(pci_dev);
>> +   if (pci_dev->current_state <= PCI_D3hot)
>> +   pci_clear_master(pci_dev);
>>   }
>>
>>   #ifdef CONFIG_PM
>>
>

[PATCH v1 9/9] uretprobes: Documentation update

2013-04-03 Thread Anton Arapov

add the uretprobe syntax and update an example

Signed-off-by: Anton Arapov 
---
 Documentation/trace/uprobetracer.txt | 114 ---
 1 file changed, 67 insertions(+), 47 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt 
b/Documentation/trace/uprobetracer.txt
index 24ce682..d9c3e68 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -1,6 +1,8 @@
-   Uprobe-tracer: Uprobe-based Event Tracing
-   =
- Documentation written by Srikar Dronamraju
+Uprobe-tracer: Uprobe-based Event Tracing
+=
+
+   Documentation written by Srikar Dronamraju
+
 
 Overview
 
@@ -13,78 +15,94 @@ current_tracer. Instead of that, add probe points via
 /sys/kernel/debug/tracing/events/uprobes//enabled.
 
 However unlike kprobe-event tracer, the uprobe event interface expects the
-user to calculate the offset of the probepoint in the object
+user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
 -
-  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS]   : Set a probe
+  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe
+  r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe 
(uretprobe)
+  -:[GRP/]EVENT  : Clear uprobe or uretprobe 
event
 
- GRP   : Group name. If omitted, use "uprobes" for it.
- EVENT : Event name. If omitted, the event name is generated
- based on SYMBOL+offs.
- PATH  : path to an executable or a library.
- SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+  GRP   : Group name. If omitted, "uprobes" is the default value.
+  EVENT : Event name. If omitted, the event name is generated based
+  on SYMBOL+offs.
+  PATH  : Path to an executable or a library.
+  SYMBOL[+offs] : Symbol+offset where the probe is inserted.
 
- FETCHARGS : Arguments. Each probe can have up to 128 args.
-  %REG : Fetch register REG
+  FETCHARGS : Arguments. Each probe can have up to 128 args.
+   %REG : Fetch register REG
 
 Event Profiling
 ---
- You can check the total number of probe hits and probe miss-hits via
+You can check the total number of probe hits and probe miss-hits via
 /sys/kernel/debug/tracing/uprobe_profile.
- The first column is event name, the second is the number of probe hits,
+The first column is event name, the second is the number of probe hits,
 the third is the number of probe miss-hits.
 
 Usage examples
 --
-To add a probe as a new event, write a new definition to uprobe_events
-as below.
+ * Add a probe as a new uprobe event, write a new definition to uprobe_events
+as below: (sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash)
+
+echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Add a probe as a new uretprobe event:
+
+echo 'r: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+
+ * Unset registered event:
 
-  echo 'p: /bin/bash:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
+echo '-:bash_0x4245c0' >> /sys/kernel/debug/tracing/uprobe_events
 
- This sets a uprobe at an offset of 0x4245c0 in the executable /bin/bash
+ * Print out the events that are registered:
 
-  echo > /sys/kernel/debug/tracing/uprobe_events
+cat /sys/kernel/debug/tracing/uprobe_events
 
- This clears all probe points.
+ * Clear all events:
 
-The following example shows how to dump the instruction pointer and %ax
-a register at the probed text address.  Here we are trying to probe
-function zfree in /bin/zsh
+echo > /sys/kernel/debug/tracing/uprobe_events
+
+Following example shows how to dump the instruction pointer and %ax register
+at the probed text address. Probe zfree function in /bin/zsh:
 
 # cd /sys/kernel/debug/tracing/
-# cat /proc/`pgrep  zsh`/maps | grep /bin/zsh | grep r-xp
+# cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp
 0040-0048a000 r-xp  08:03 130904 /bin/zsh
 # objdump -T /bin/zsh | grep -w zfree
 00446420 gDF .text  0012  Basezfree
 
-0x46420 is the offset of zfree in object /bin/zsh that is loaded at
-0x0040. Hence the command to probe would be :
+  0x46420 is the offset of zfree in object /bin/zsh that is loaded at
+  0x0040. Hence the command to uprobe would be:
+
+# echo 'p:zfree_entry /bin/zsh:0x46420 %ip %ax' > uprobe_events
+
+  And the same for the uretprobe would be:
 
-# echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
+# echo 'r:zfree_exit /bin/zsh:0x46420 %ip %ax' >> uprobe_events
 
-Please note: User has to explicitly calculate the offset of the probepoint
+Please note: User has to explicitly calculate the offset of the probe-point
 in the

Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-03 Thread Michael S. Tsirkin

On Tue, Apr 02, 2013 at 08:05:21PM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 02, 2013 at 09:57:38AM -0700, Roland Dreier wrote:
> > On Tue, Apr 2, 2013 at 8:51 AM, Michael S. Tsirkin  wrote:
> > >> At the moment registering an MR breaks COW.  This breaks memory
> > >> overcommit for users such as KVM: we have a lot of COW pages, e.g.
> > >> instances of the zero page or pages shared using KSM.
> > >>
> > >> If the application does not care that adapter sees stale data (for
> > >> example, it tracks writes reregisters and resends), it can use a new
> > >> IBV_ACCESS_GIFT flag to prevent registration from breaking COW.
> > >>
> > >> The semantics are similar to that of SPLICE_F_GIFT thus the name.
> > >>
> > >> Signed-off-by: Michael S. Tsirkin 
> > >
> > > Roland, Michael is yet to test this but could you please
> > > confirm whether this looks acceptable to you?
> > 
> > The patch itself is reasonable I guess, given the needs of this particular 
> > app.
> > 
> > I'm not particularly happy with the name of the flag.  The analogy
> > with SPLICE_F_GIFT doesn't seem particularly strong and I'm not
> > convinced even the splice flag name is very understandable.  But in
> > the RDMA case there's not really any sense in which we're "gifting"
> > memory to the adapter -- we're just telling the library "please don't
> > trigger copy-on-write" and it doesn't seem particularly easy for users
> > to understand that from the flag name.
> > 
> >  - R.
> 
> The point really is that any writes by application
> won't be seen until re-registration, right?
> OK, what's a better name?  IBV_ACCESS_NON_COHERENT?
> Please tell me what is preferable and we'll go ahead with it.

Um. ping? We are at -rc5 and things need to fall into place
if we are to have it in 3.10 ...

> -- 
> MST

Re: [PATCH] sysfs: check if one entry has been removed before freeing

2013-04-03 Thread Greg Kroah-Hartman

On Wed, Apr 03, 2013 at 03:05:37PM +0800, Ming Lei wrote:
> On Wed, Apr 3, 2013 at 1:35 PM, Greg Kroah-Hartman
>  wrote:
> > On Wed, Apr 03, 2013 at 11:52:39AM +0800, Ming Lei wrote:
> >> On Wed, Apr 3, 2013 at 11:04 AM, Dave Jones  wrote:
> >> > On Wed, Apr 03, 2013 at 10:58:23AM +0800, Ming Lei wrote:
> >> >
> >> >  > diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> >> >  > index 1bf016b..328ef9b 100644
> >> >  > --- a/fs/sysfs/dir.c
> >> >  > +++ b/fs/sysfs/dir.c
> >> >  > @@ -268,6 +268,13 @@ void release_sysfs_dirent(struct sysfs_dirent * 
> >> > sd)
> >> >  >   */
> >> >  >  parent_sd = sd->s_parent;
> >> >  >
> >> >  > +if (unlikely(!(sd->s_flags & SYSFS_FLAG_REMOVED))) {
> >> >  > +printk(KERN_ERR "sysfs: free using entry: %s/%s\n",
> >> >  > +parent_sd ? parent_sd->s_name : "",
> >> >  > +sd->s_name);
> >> >  > +BUG();
> >> >  > +}
> >> >
> >> > Please use WARN instead of BUG.  For an in-ram filesystem like
> >> > sysfs, there's no real reason to lock-up the machine in this way
> >> > making it harder to debug.
> >>
> >> If WARN is used, the freed memory will be allocated to other
> >> kernel components, then sysfs may change the memory and cause
> >> destruction, so maybe it is better to use BUG to stop kernel.
> >
> > No, it's never ok to call BUG(), sorry, please fix this.
> 
> Sorry, could you explain it in a bit detail? IMO, it is really a bug
> when code runs here, and there are much similar BUG_ON()
> uses in current sysfs code too.

Then make it a WARN() call, like David said, to give us a chance to get
the report from a user so we can fix it.  If the machine crashes after
that, fine, but hopefully we will get an oops report out of it.

greg k-h

Re: [PATCH v3 2/2] watchdog: fix w83627hf_wdt clear timeout expired

2013-04-03 Thread Guenter Roeck

On Wed, Apr 03, 2013 at 08:50:26AM -0700, Guenter Roeck wrote:
> On Wed, Apr 03, 2013 at 08:06:59AM -0700, Tony Chung wrote:
> > On Tue, Apr 2, 2013 at 9:21 PM, Guenter Roeck  wrote:
> > 
> > >
> > >
> > > What is the exact chip type in your system ? I want to have a look into 
> > > the
> > > datasheet; maybe I can find out how it can trigger without causing a 
> > > reset.
> > 
> > Winbond 83627HF chip
> > 

Follow-up: which chip revision?

Revisions G and later have a new configuration bit, bit 3 of CR E7 on logical
device A.

SELWDTORST. Watch Dog Timer Reset Control.
= 0 is reset by LPC_RST.
= 1 is reset by PWR_OK.

I could imagine that the WDT logic is never correctly initialized in your
system, which might explain the behavior. If so, your code is indeed
correct (or the best I could come up with too), as we would have to ensure
that the wdt subsystem is initialized correctly by writing into all its 
registers.

Given that, I would suggest to re-submit the patch with a different explanation
(we don't know if the wdt really started running, all we know is that the
expired bit is set), and I'll give it an Acked-by. Something along the lines of

"Observed that the Watchdog Timer Status bit can be set when the driver is
 loaded. Reset it during initialization. The time-out value must be set to 0
 explicitly in this case to prevent an immediate reset".

Thanks,
Guenter

> > I believe BIOS has watchdog disabled otherwise it would have reboot the box.
> > However, the timer just start counting.
> > 
> > Comparing to ipmi_watchdog, you can do this:
> > modprobe ipmi_watchdog ... start_now=0 ...action=<>  nowayout=1
> > 
> > So it is possible to load the driver without start counting.
> > 
> That is a different driver, though. you don't have the start_now option here.
> 
> > Notice it is an else, so t is actually 0 already (i.e. expired or
> > never start running):
> 
> Still no idea why that would cause the system to reboot when you reset
> the trigger without setting t to 0 again (or why the system doesn't reset
> in the first place if the watchdog already triggered).
> 
> I am not really sure what the best approach is here, so let's leave it
> up to the maintainer to decide which way to go.
> 
> Thanks,
> Guenter

[PATCH v1 1/9] uretprobes: Introduce uprobe_consumer->ret_handler()

2013-04-03 Thread Anton Arapov

Enclose return probes implementation, introduce ->ret_handler() and update
existing code to rely on ->handler() *and* ->ret_handler() for uprobe and
uretprobe respectively.

v1 changes:
* add bp_vaddr argument to ->ret_handler()

RFCv5 changes:
* don't remove uprobe in case there are no uprobe consumer(handler),
  see handler_chain() changes.

RFCv3 changes: (the patch is introduced in v3)
* check whether at least one of the consumer's handlers was set.
* a 'TODO' cap that will be removed once return probes are implemented.
* introduce ->ret_handler().
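
For readers skimming the changelog, the registration check and the reworked
handler loop can be modelled in user space roughly as below. This is a
simplified sketch: struct consumer, check_consumer(), run_handler() and
HANDLER_MASK are hypothetical names, the callbacks take no arguments here,
and assert() merely stands in for the WARN().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HANDLER_MASK 1  /* hypothetical stand-in for UPROBE_HANDLER_MASK */

/* Trimmed-down, hypothetical consumer: only the two callbacks matter here. */
struct consumer {
	int (*handler)(void);
	int (*ret_handler)(void);
};

/* Mirrors the two checks done at registration; everything else is omitted. */
static int check_consumer(const struct consumer *uc)
{
	if (!uc->handler && !uc->ret_handler)
		return -EINVAL;  /* nothing to call */
	if (uc->ret_handler)
		return -ENOSYS;  /* placeholder until return probes exist */
	return 0;
}

/* Calls the entry handler only if one is set, as the reworked loop does. */
static int run_handler(const struct consumer *uc)
{
	int rc = 0;

	if (uc->handler) {
		rc = uc->handler();
		assert((rc & ~HANDLER_MASK) == 0); /* stand-in for the WARN() */
	}
	return rc;
}

/* Trivial callback used to populate a consumer. */
static int noop(void)
{
	return 0;
}
```

A consumer with only ret_handler set yields rc == 0 from run_handler(), so it
never votes for removal of the uprobe.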

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  3 +++
 kernel/events/uprobes.c | 17 ++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 02b83db..4042cad 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -46,6 +46,9 @@ enum uprobe_filter_ctx {
 
 struct uprobe_consumer {
int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+   int (*ret_handler)(struct uprobe_consumer *self,
+   unsigned long func,
+   struct pt_regs *regs);
bool (*filter)(struct uprobe_consumer *self,
enum uprobe_filter_ctx ctx,
struct mm_struct *mm);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 21d8a65..27c964b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -815,6 +815,14 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
struct uprobe *uprobe;
int ret;
 
+   /* Uprobe must have at least one set consumer */
+   if (!uc->handler && !uc->ret_handler)
+   return -EINVAL;
+
+   /* TODO: Implement return probes */
+   if (uc->ret_handler)
+   return -ENOSYS;
+
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
@@ -1473,10 +1481,13 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
 
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
-   int rc = uc->handler(uc, regs);
+   int rc = 0;
 
-   WARN(rc & ~UPROBE_HANDLER_MASK,
-   "bad rc=0x%x from %pf()\n", rc, uc->handler);
+   if (uc->handler) {
+   rc = uc->handler(uc, regs);
+   WARN(rc & ~UPROBE_HANDLER_MASK,
+   "bad rc=0x%x from %pf()\n", rc, uc->handler);
+   }
remove &= rc;
}
 
-- 
1.8.1.4


[PATCH v1 2/9] uretprobes: Reserve the first slot in xol_vma for trampoline

2013-04-03 Thread Anton Arapov

Allocate trampoline page, as the very first one in uprobed
task xol area, and fill it with breakpoint opcode.

Also introduce the get_trampoline_vaddr() helper, to wrap the
trampoline address extraction from area->vaddr. That removes
confusion and eases debugging in case the ->vaddr notion
changes.
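
As an illustration of the idea only, a user-space model of a slot bitmap with
slot 0 set aside could look like this. NSLOTS, SLOT_SIZE, struct area and the
function names are hypothetical and much simpler than the real xol_area.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS    8     /* hypothetical; small for illustration */
#define SLOT_SIZE 16    /* hypothetical bytes per slot */

struct area {
	uintptr_t vaddr;      /* base address of the area */
	uint32_t  bitmap;     /* bit n set => slot n in use */
	unsigned  slot_count;
};

/* Set up the area with slot 0 already taken by the trampoline. */
static void area_init(struct area *a, uintptr_t vaddr)
{
	a->vaddr = vaddr;
	a->bitmap = 1u;
	a->slot_count = 1;
}

/* The trampoline lives in slot 0, so its address is the area base. */
static uintptr_t trampoline_vaddr(const struct area *a)
{
	assert(a->bitmap & 1u);   /* slot 0 must stay reserved */
	return a->vaddr;
}

/* Returns the address of a free slot, or 0 if the area is full. */
static uintptr_t take_slot(struct area *a)
{
	for (unsigned n = 0; n < NSLOTS; n++) {
		if (!(a->bitmap & (1u << n))) {
			a->bitmap |= 1u << n;
			a->slot_count++;
			return a->vaddr + n * SLOT_SIZE;
		}
	}
	return 0;
}
```

Because bit 0 is set before any slot is handed out, ordinary allocations start
at slot 1 and can never overwrite the trampoline.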

v1 changes:
* rework get_trampoline_vaddr() helper.
* init xol_area->slot_count.

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 27c964b..d3c8201 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1109,6 +1109,7 @@ static struct xol_area *get_xol_area(void)
 {
struct mm_struct *mm = current->mm;
struct xol_area *area;
+   uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 
area = mm->uprobes_state.xol_area;
if (area)
@@ -1126,7 +1127,12 @@ static struct xol_area *get_xol_area(void)
if (!area->page)
goto free_bitmap;
 
+   /* allocate first slot of task's xol_area for the return probes */
+   set_bit(0, area->bitmap);
+   copy_to_page(area->page, 0, &insn, UPROBE_SWBP_INSN_SIZE);
+   atomic_set(&area->slot_count, 1);
init_waitqueue_head(&area->wq);
+
if (!xol_add_vma(area))
return area;
 
@@ -1323,6 +1329,25 @@ static struct uprobe_task *get_utask(void)
return current->utask;
 }
 
+/*
+ * Current area->vaddr notion assume the trampoline address is always
+ * equal area->vaddr.
+ *
+ * Returns -1 in case the xol_area is not allocated.
+ */
+static unsigned long get_trampoline_vaddr(void)
+{
+   struct xol_area *area;
+   unsigned long trampoline_vaddr = -1;
+
+   area = current->mm->uprobes_state.xol_area;
+   smp_read_barrier_depends();
+   if (area)
+   trampoline_vaddr = area->vaddr;
+
+   return trampoline_vaddr;
+}
+
 /* Prepare to single-step probed instruction out of line. */
 static int
 pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
-- 
1.8.1.4


[PATCH v1 0/9] uretprobes: Return uprobes implementation

2013-04-03 Thread Anton Arapov

Hello All!

Uretprobes' core implementation. It enables function return probes in
uprobe-based event tracing.

The patchset introduces an additional handler (ret_handler) in the uprobe
consumer; setting it defines a uretprobe.

There is a regular uprobe with a return probe handler behind every uretprobe.
Once a uprobe that has ret_handler set is hit, we hijack the return address of
the probed function and replace it with the address of the trampoline. The
trampoline is a preallocated page in the probed task's xol area that is filled
with the breakpoint opcode. In turn, when the return breakpoint is hit, we
invoke the ret_handler.
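
A rough user-space model of that flow, for orientation only: struct task,
struct ret_instance, TRAMPOLINE and the two functions are hypothetical, plain
fields stand in for the stack slot and the instruction pointer, and chained
(already hijacked) frames are deliberately not modelled.

```c
#include <assert.h>
#include <stdlib.h>

#define TRAMPOLINE 0xdead0000UL   /* hypothetical trampoline address */

struct ret_instance {
	unsigned long orig_ret;       /* return address that was replaced */
	struct ret_instance *next;
};

struct task {
	unsigned long ret_addr;       /* stands in for the on-stack return address */
	unsigned long ip;             /* stands in for the instruction pointer */
	struct ret_instance *instances;
	int ret_hits;                 /* how many times the "ret_handler" ran */
};

/* Entry probe hit: remember the real return address, install the trampoline. */
static int on_entry(struct task *t)
{
	struct ret_instance *ri;

	assert(t->ret_addr != TRAMPOLINE);  /* chaining is not modelled here */
	ri = malloc(sizeof(*ri));
	if (!ri)
		return -1;
	ri->orig_ret = t->ret_addr;
	ri->next = t->instances;
	t->instances = ri;
	t->ret_addr = TRAMPOLINE;
	return 0;
}

/* Breakpoint hit: if it is the trampoline, run the handler and resume. */
static int on_breakpoint(struct task *t, unsigned long bp_addr)
{
	struct ret_instance *ri = t->instances;

	if (bp_addr != TRAMPOLINE || !ri)
		return 0;               /* not ours */
	t->ret_hits++;              /* the ret_handler would be invoked here */
	t->ip = ri->orig_ret;       /* continue at the real caller */
	t->instances = ri->next;
	free(ri);
	return 1;
}
```

The per-task list behaves like a stack: the most recently hijacked frame is
the first one to return through the trampoline.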

The patchset shouldn't be difficult to read and hopefully the comments to
commits will help. Please, review.

patchset in git:
  http://github.com/arapov/linux-aa/commits/uretprobes_v1

previous versions:
  v0: https://lkml.org/lkml/2013/3/22/218

RFC reviews:
  RFCv4: https://lkml.org/lkml/2013/3/4/246
  RFCv3: https://lkml.org/lkml/2013/2/28/148
  RFCv2: https://lkml.org/lkml/2013/1/9/157
  RFCv1: https://lkml.org/lkml/2012/12/21/133

thanks,
Anton.

Anton Arapov (9):
  uretprobes: Introduce uprobe_consumer->ret_handler()
  uretprobes: Reserve the first slot in xol_vma for trampoline
  uretprobes/x86: Hijack return address
  uretprobes/ppc: Hijack return address
  uretprobes: Return probe entry, prepare_uretprobe()
  uretprobes: Return probe exit, invoke handlers
  uretprobes: Limit the depth of return probe nestedness
  uretprobes: Remove -ENOSYS as return probes implemented
  uretprobes: Documentation update
 
 Documentation/trace/uprobetracer.txt | 126 +-
 arch/powerpc/include/asm/uprobes.h   |   1 +
 arch/powerpc/kernel/uprobes.c|  13 +++
 arch/x86/include/asm/uprobes.h   |   1 +
 arch/x86/kernel/uprobes.c|  29 +
 include/linux/uprobes.h  |   7 ++
 kernel/events/uprobes.c  | 202 +--
 7 files changed, 320 insertions(+), 59 deletions(-)

-- 
1.8.1.4

[PATCH v1 6/9] uretprobes: Return probe exit, invoke handlers

2013-04-03 Thread Anton Arapov

Uretprobe handlers are invoked when the trampoline is hit. On completion, the
trampoline address is replaced with the saved return address and the uretprobe
instance is deleted.

v1 changes:
* pass bp_vaddr to ret_handler()
* simplify handle_uretprobe()

RFCv6 changes:
* rework handle_uretprobe()

RFCv5 changes:
* switch to singly linked list ->return_uprobes
* rework handle_uretprobe()

RFCv4 changes:
* check, whether utask is not NULL in handle_uretprobe()
* get rid of area->rp_trampoline_vaddr
* minor handle_uretprobe() fixups

RFCv3 changes:
* protected uprobe with refcounter. See put_uprobe() in handle_uretprobe()
  that reflects increment in prepare_uretprobe()

RFCv2 changes:
* get rid of ->return_consumers member from struct uprobe, introduce
  ret_handler() in consumer instead

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 60 -
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 08ecfff..d129c1d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1609,6 +1609,57 @@ static void handler_chain(struct uprobe *uprobe, struct 
pt_regs *regs)
up_read(&uprobe->register_rwsem);
 }
 
+static void
+handler_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
+{
+   struct uprobe *uprobe = ri->uprobe;
+   struct uprobe_consumer *uc;
+
+   down_read(&uprobe->register_rwsem);
+   for (uc = uprobe->consumers; uc; uc = uc->next) {
+   if (uc->ret_handler)
+   uc->ret_handler(uc, ri->func, regs);
+   }
+   up_read(&uprobe->register_rwsem);
+}
+
+static bool handler_uretprobe(struct pt_regs *regs)
+{
+   struct uprobe_task *utask;
+   struct return_instance *ri, *tmp;
+   bool chained;
+
+   utask = current->utask;
+   if (!utask)
+   return false;
+
+   ri = utask->return_instances;
+   if (!ri)
+   return false;
+
+   instruction_pointer_set(regs, ri->orig_ret_vaddr);
+
+   for (;;) {
+   handler_uretprobe_chain(ri, regs);
+
+   chained = ri->chained;
+   put_uprobe(ri->uprobe);
+
+   tmp = ri;
+   ri = ri->next;
+   kfree(tmp);
+
+   if (!chained)
+   break;
+
+   BUG_ON(!ri);
+   }
+
+   utask->return_instances = ri;
+
+   return true;
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1620,8 +1671,15 @@ static void handle_swbp(struct pt_regs *regs)
int uninitialized_var(is_swbp);
 
bp_vaddr = uprobe_get_swbp_addr(regs);
-   uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
+   if (bp_vaddr == get_trampoline_vaddr()) {
+   if (handler_uretprobe(regs))
+   return;
 
+   pr_warn("uprobe: unable to handle uretprobe pid/tgid=%d/%d\n",
+   current->pid, current->tgid);
+   }
+
+   uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {
if (is_swbp > 0) {
/* No matching uprobe; signal SIGTRAP. */
-- 
1.8.1.4


[PATCH v1 7/9] uretprobes: Limit the depth of return probe nestedness

2013-04-03 Thread Anton Arapov

Unlike the kretprobes we can't trust userspace, thus must have
protection from user space attacks. User-space have  "unlimited"
stack, and this patch limits the return probes nestedness as a
simple remedy for it.

Note that this implementation leaks return_instance on siglongjmp
until exit()/exec().

The intention is to have a KISS, bare-minimum solution for the
initial implementation in order not to complicate the uretprobes
code.

In the future we may come up with a more sophisticated solution that
removes this depth limitation. It is not an easy task and lies beyond
the scope of this patchset.
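
As an illustration only, the depth limit can be modelled in plain
userspace C. This is a sketch, not the kernel code: struct task_model,
struct instance, model_push() and model_pop() are hypothetical stand-ins,
and dropping the counter once per freed instance is an assumption of the
sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DEPTH 64	/* same value as MAX_URETPROBE_DEPTH in the patch */

/* Hypothetical userspace stand-ins for return_instance / uprobe_task. */
struct instance {
	struct instance *next;
};

struct task_model {
	struct instance *instances;	/* top of the per-task stack */
	unsigned int depth;		/* number of live instances */
};

/* Push one instance; refuse once the depth limit is reached. */
bool model_push(struct task_model *t)
{
	struct instance *ri;

	if (t->depth >= MAX_DEPTH)
		return false;	/* omit the return probe, the task keeps running */

	ri = calloc(1, sizeof(*ri));
	if (!ri)
		return false;

	ri->next = t->instances;
	t->instances = ri;
	t->depth++;
	return true;
}

/* Pop one instance; the counter drops once per freed instance. */
bool model_pop(struct task_model *t)
{
	struct instance *ri = t->instances;

	if (!ri)
		return false;

	t->instances = ri->next;
	free(ri);
	t->depth--;
	return true;
}
```

With this model, the 65th nested push is refused while the first 64
succeed, and the counter returns to zero once everything is popped.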

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  3 +++
 kernel/events/uprobes.c | 11 +++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 5f8960e..d7bcf10 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -38,6 +38,8 @@ struct inode;
 #define UPROBE_HANDLER_REMOVE  1
 #define UPROBE_HANDLER_MASK    1
 
+#define MAX_URETPROBE_DEPTH    64
+
 enum uprobe_filter_ctx {
UPROBE_FILTER_REGISTER,
UPROBE_FILTER_UNREGISTER,
@@ -72,6 +74,7 @@ struct uprobe_task {
struct arch_uprobe_task autask;
 
struct return_instance  *return_instances;
+   unsigned int            depth;
struct uprobe   *active_uprobe;
 
unsigned long   xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d129c1d..489f5e3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1381,6 +1381,13 @@ static void prepare_uretprobe(struct uprobe *uprobe, 
struct pt_regs *regs)
if (!utask)
return;
 
+   if (utask->depth >= MAX_URETPROBE_DEPTH) {
+   printk_ratelimited(KERN_INFO "uprobe: omit uretprobe due to"
+   " nestedness limit pid/tgid=%d/%d\n",
+   current->pid, current->tgid);
+   return;
+   }
+
ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
if (!ri)
goto fail;
@@ -1416,6 +1423,8 @@ static void prepare_uretprobe(struct uprobe *uprobe, 
struct pt_regs *regs)
ri->orig_ret_vaddr = orig_ret_vaddr;
ri->chained = chained;
 
+   utask->depth++;
+
/* add instance to the stack */
ri->next = utask->return_instances;
utask->return_instances = ri;
@@ -1652,6 +1661,8 @@ static bool handler_uretprobe(struct pt_regs *regs)
if (!chained)
break;
 
+   utask->depth--;
+
BUG_ON(!ri);
}
 
-- 
1.8.1.4


[PATCH v1 8/9] uretprobes: Remove -ENOSYS as return probes implemented

2013-04-03 Thread Anton Arapov

Enable the return probes implementation: uprobe_register() no longer
rejects consumers that set ret_handler with -ENOSYS.

Signed-off-by: Anton Arapov 
---
 kernel/events/uprobes.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 489f5e3..9af52f7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -828,10 +828,6 @@ int uprobe_register(struct inode *inode, loff_t offset, 
struct uprobe_consumer *
if (!uc->handler && !uc->ret_handler)
return -EINVAL;
 
-   /* TODO: Implement return probes */
-   if (uc->ret_handler)
-   return -ENOSYS;
-
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
-- 
1.8.1.4


[PATCH v1 4/9] uretprobes/ppc: Hijack return address

2013-04-03 Thread Anton Arapov

Hijack the return address and replace it with a trampoline address.
PowerPC implementation.

Signed-off-by: Anton Arapov 
---
 arch/powerpc/include/asm/uprobes.h |  1 +
 arch/powerpc/kernel/uprobes.c  | 13 +
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index b532060..2301602 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -51,4 +51,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index bc77834..567b975 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -188,3 +188,16 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 
return false;
 }
+
+unsigned long
+arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct 
pt_regs *regs)
+{
+   unsigned long orig_ret_vaddr;
+
+   orig_ret_vaddr = regs->link;
+
+   /* Replace the return addr with trampoline addr */
+   regs->link = trampoline_vaddr;
+
+   return orig_ret_vaddr;
+}
-- 
1.8.1.4


[PATCH v1 5/9] uretprobes: Return probe entry, prepare_uretprobe()

2013-04-03 Thread Anton Arapov

When a uprobe with a return probe consumer is hit, the
prepare_uretprobe() function is invoked. It creates a return_instance,
hijacks the return address and replaces it with the trampoline.

* Return instances are kept as a stack per uprobed task.
* A return instance is chained when the original return address is
  the trampoline's page vaddr (e.g. recursive call of the probed function).
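
The two points above can be sketched as a small userspace model. This is
a hedged illustration, not the patch itself: struct ret_inst, ri_push()
and ri_pop() are hypothetical names, and inheriting the previous
instance's original return address for a chained instance is an
assumption based on the comment in prepare_uretprobe().

```c
#include <stdbool.h>
#include <stdlib.h>

struct ret_inst {
	unsigned long orig_ret_vaddr;	/* where the first caller returns to */
	bool chained;			/* return address was already the trampoline */
	struct ret_inst *next;		/* keep as stack */
};

/*
 * Record one probed call. ret_vaddr is the return address found at
 * function entry. Returns 0 on success, -1 on failure.
 */
int ri_push(struct ret_inst **stack, unsigned long ret_vaddr,
	    unsigned long trampoline)
{
	struct ret_inst *ri;
	bool chained = (ret_vaddr == trampoline);

	if (chained && !*stack)
		return -1;	/* trampoline address but nothing recorded */

	ri = calloc(1, sizeof(*ri));
	if (!ri)
		return -1;

	/* Never store the trampoline address; reuse the first caller's. */
	ri->orig_ret_vaddr = chained ? (*stack)->orig_ret_vaddr : ret_vaddr;
	ri->chained = chained;
	ri->next = *stack;
	*stack = ri;
	return 0;
}

/*
 * Trampoline hit: pop the top instance and every instance it is chained
 * to, count them in *handled, and return the real return address.
 */
unsigned long ri_pop(struct ret_inst **stack, int *handled)
{
	struct ret_inst *ri = *stack, *tmp;
	unsigned long ret;
	bool chained;

	*handled = 0;
	if (!ri)
		return 0;

	ret = ri->orig_ret_vaddr;
	do {
		chained = ri->chained;
		tmp = ri;
		ri = ri->next;
		free(tmp);
		(*handled)++;
	} while (chained && ri);

	*stack = ri;
	return ret;
}
```

In this model a single trampoline hit unwinds a whole chained run, so the
task resumes at the address the outermost caller expected.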

v1 changes:
* preserve address of the breakpoint in return_instance.
* don't forget to NULLify return_instances on free_utask.
* simplify prepare_uretprobe().

RFCv6 changes:
* rework prepare_uretprobe() logic in order to make further unwinding
  in handler_uretprobe() simpler.
* introduce the 'dirty' field.

RFCv5 changes:
* switch from hlist to a simple linked list for tracking ->*return_uprobes.
* preallocate first slot xol_area for return probes, see xol_get_area()
  changes.
* add get_trampoline_vaddr() helper, to emphasize area->vaddr overload.

RFCv4 changes:
* get rid of area->rp_trampoline_vaddr as it is always the same as ->vaddr.
* cleanup ->return_uprobes list in uprobe_free_utask(), because the
  task can exit from inside the ret-probe'd function(s).
* in find_active_uprobe(): Once we inserted "int3" we must ensure that
  handle_swbp() will be called even if this uprobe goes away. We have
  the reference but it only protects uprobe itself, it can't protect
  against delete_uprobe().
  IOW, we must ensure that uprobe_pre_sstep_notifier() can't return 0.

RFCv3 changes:
* protected uprobe with refcounter. See atomic_inc in prepare_uretprobe()
  and put_uprobe() in a following patch in handle_uretprobe().

RFCv2 changes:
* get rid of ->return_consumers member from struct uprobe, introduce
  ret_handler() in consumer.

Signed-off-by: Anton Arapov 
---
 include/linux/uprobes.h |  1 +
 kernel/events/uprobes.c | 92 -
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 4042cad..5f8960e 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -71,6 +71,7 @@ struct uprobe_task {
enum uprobe_task_state  state;
struct arch_uprobe_task autask;
 
+   struct return_instance  *return_instances;
struct uprobe   *active_uprobe;
 
unsigned long   xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d3c8201..08ecfff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -75,6 +75,15 @@ struct uprobe {
struct arch_uprobe  arch;
 };
 
+struct return_instance {
+   struct uprobe   *uprobe;
+   unsigned long   func;
+   unsigned long   orig_ret_vaddr; /* original return address */
+   bool                    chained;        /* true, if instance is nested */
+
+   struct return_instance  *next;  /* keep as stack */
+};
+
 /*
  * valid_vma: Verify if the specified vma is an executable vma
  * Relax restrictions while unregistering: vm_flags might have
@@ -1294,6 +1303,7 @@ unsigned long __weak uprobe_get_swbp_addr(struct pt_regs 
*regs)
 void uprobe_free_utask(struct task_struct *t)
 {
struct uprobe_task *utask = t->utask;
+   struct return_instance *ri, *tmp;
 
if (!utask)
return;
@@ -1301,6 +1311,15 @@ void uprobe_free_utask(struct task_struct *t)
if (utask->active_uprobe)
put_uprobe(utask->active_uprobe);
 
+   ri = utask->return_instances;
+   while (ri) {
+   tmp = ri;
+   ri = ri->next;
+
+   put_uprobe(tmp->uprobe);
+   kfree(tmp);
+   }
+
xol_free_insn_slot(t);
kfree(utask);
t->utask = NULL;
@@ -1348,6 +1367,65 @@ static unsigned long get_trampoline_vaddr(void)
return trampoline_vaddr;
 }
 
+static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+{
+   struct return_instance *ri;
+   struct uprobe_task *utask;
+   unsigned long orig_ret_vaddr, trampoline_vaddr;
+   bool chained = false;
+
+   if (!get_xol_area())
+   return;
+
+   utask = get_utask();
+   if (!utask)
+   return;
+
+   ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
+   if (!ri)
+   goto fail;
+
+   trampoline_vaddr = get_trampoline_vaddr();
+   orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, 
regs);
+   if (orig_ret_vaddr == -1)
+   goto fail;
+
+   /*
+* We don't want to keep trampoline address in stack, rather keep the
+* original return address of first caller thru all the consequent
+* instances. This also makes breakpoint unwrapping easier.
+*/
+   if (orig_ret_vaddr == trampoline_vaddr) {
+   if (!utask->return_instances) {
+   /*
+* This situation is not possible. Likely

Re: [PATCH] xen: drop tracking of IRQ vector

2013-04-03 Thread Stefano Stabellini

On Wed, 3 Apr 2013, Jan Beulich wrote:
> For quite a few Xen versions, this wasn't the IRQ vector anymore
> anyway, and it is not being used by the kernel for anything. Hence
> drop the field from struct irq_info, and respective function
> parameters.
> 
> Signed-off-by: Jan Beulich 
> Cc: Stefano Stabellini 


Acked-by: Stefano Stabellini 

> ---
>  arch/x86/pci/xen.c   |6 +++---
>  drivers/xen/events.c |   13 -
>  include/xen/events.h |3 +--
>  3 files changed, 8 insertions(+), 14 deletions(-)
> 
> --- 3.9-rc5/arch/x86/pci/xen.c
> +++ 3.9-rc5-xen-irq-no-vector/arch/x86/pci/xen.c
> @@ -177,7 +177,7 @@ static int xen_setup_msi_irqs(struct pci
>   goto error;
>   i = 0;
>   list_for_each_entry(msidesc, &dev->msi_list, list) {
> - irq = xen_bind_pirq_msi_to_irq(dev, msidesc, v[i], 0,
> + irq = xen_bind_pirq_msi_to_irq(dev, msidesc, v[i],
>  (type == PCI_CAP_ID_MSIX) ?
>  "pcifront-msi-x" :
>  "pcifront-msi",
> @@ -244,7 +244,7 @@ static int xen_hvm_setup_msi_irqs(struct
>   dev_dbg(&dev->dev,
>   "xen: msi already bound to pirq=%d\n", pirq);
>   }
> - irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq, 0,
> + irq = xen_bind_pirq_msi_to_irq(dev, msidesc, pirq,
>  (type == PCI_CAP_ID_MSIX) ?
>  "msi-x" : "msi",
>  DOMID_SELF);
> @@ -326,7 +326,7 @@ static int xen_initdom_setup_msi_irqs(st
>   }
>  
>   ret = xen_bind_pirq_msi_to_irq(dev, msidesc,
> -map_irq.pirq, map_irq.index,
> +map_irq.pirq,
>  (type == PCI_CAP_ID_MSIX) ?
>  "msi-x" : "msi",
>   domid);
> --- 3.9-rc5/drivers/xen/events.c
> +++ 3.9-rc5-xen-irq-no-vector/drivers/xen/events.c
> @@ -85,8 +85,7 @@ enum xen_irq_type {
>   * event channel - irq->event channel mapping
>   * cpu - cpu this event channel is bound to
>   * index - type-specific information:
> - *PIRQ - vector, with MSB being "needs EIO", or physical IRQ of the HVM
> - *   guest, or GSI (real passthrough IRQ) of the device.
> + *PIRQ - physical IRQ, GSI, flags, and owner domain
>   *VIRQ - virq number
>   *IPI - IPI vector
>   *EVTCHN -
> @@ -105,7 +104,6 @@ struct irq_info {
>   struct {
>   unsigned short pirq;
>   unsigned short gsi;
> - unsigned char vector;
>   unsigned char flags;
>   uint16_t domid;
>   } pirq;
> @@ -211,7 +209,6 @@ static void xen_irq_info_pirq_init(unsig
>  unsigned short evtchn,
>  unsigned short pirq,
>  unsigned short gsi,
> -unsigned short vector,
>  uint16_t domid,
>  unsigned char flags)
>  {
> @@ -221,7 +218,6 @@ static void xen_irq_info_pirq_init(unsig
>  
>   info->u.pirq.pirq = pirq;
>   info->u.pirq.gsi = gsi;
> - info->u.pirq.vector = vector;
>   info->u.pirq.domid = domid;
>   info->u.pirq.flags = flags;
>  }
> @@ -714,7 +710,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gs
>   goto out;
>   }
>  
> - xen_irq_info_pirq_init(irq, 0, pirq, gsi, irq_op.vector, DOMID_SELF,
> + xen_irq_info_pirq_init(irq, 0, pirq, gsi, DOMID_SELF,
>  shareable ? PIRQ_SHAREABLE : 0);
>  
>   pirq_query_unmask(irq);
> @@ -762,8 +758,7 @@ int xen_allocate_pirq_msi(struct pci_dev
>  }
>  
>  int xen_bind_pirq_msi_to_irq(struct pci_dev *dev, struct msi_desc *msidesc,
> -  int pirq, int vector, const char *name,
> -  domid_t domid)
> +  int pirq, const char *name, domid_t domid)
>  {
>   int irq, ret;
>  
> @@ -776,7 +771,7 @@ int xen_bind_pirq_msi_to_irq(struct pci_
>   irq_set_chip_and_handler_name(irq, &xen_pirq_chip, handle_edge_irq,
>   name);
>  
> - xen_irq_info_pirq_init(irq, 0, pirq, 0, vector, domid, 0);
> + xen_irq_info_pirq_init(irq, 0, pirq, 0, domid, 0);
>   ret = irq_set_msi_desc(irq, msidesc);
>   if (ret < 0)
>   goto error_irq;
> --- 3.9-rc5/include/xen/events.h
> +++ 3.9-rc5-xen-irq-no-vector/include/xen/events.h
> @@ -90,8 +90,7 @@ int xen_bind_pirq_gsi_to_irq(unsigned gs
>  int xen_allocate_pirq_msi(struct pci_dev *dev, struct msi_desc *msidesc);
>  /* Bind an PSI pirq to an ir

[PATCH v1 3/9] uretprobes/x86: Hijack return address

2013-04-03 Thread Anton Arapov

Hijack the return address and replace it with a trampoline address.

v1 changes:
* use force_sig_info()
* rework and simplify logic

RFCv5 changes:
* change the fail return code, because orig_ret_vaddr=0 is possible
* style fixup

RFCv2 changes:
* remove ->doomed flag, kill task immediately
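
For readers who want to experiment outside the kernel, the hijack step
can be modelled on a plain byte buffer. This is a sketch under stated
assumptions: hijack_on_stack() and its parameters are hypothetical,
memcpy() stands in for copy_from_user()/copy_to_user(), and the byte
copy relies on a little-endian machine such as x86.

```c
#include <string.h>

/*
 * Hypothetical model: the return address sits at sp, the start of a
 * byte buffer standing in for the user stack. width is 4 for a 32-bit
 * task and 8 for a 64-bit one. Returns the original return address, or
 * (unsigned long)-1 on failure.
 */
unsigned long hijack_on_stack(unsigned char *sp, size_t avail, size_t width,
			      unsigned long trampoline_vaddr)
{
	unsigned long orig_ret_vaddr = 0;	/* high bytes stay clear for width 4 */

	/* 0 is a possible return address, so -1 is used to signal failure. */
	if (width > sizeof(orig_ret_vaddr) || avail < width)
		return (unsigned long)-1;

	memcpy(&orig_ret_vaddr, sp, width);

	/* Already hijacked: report it and leave the stack untouched. */
	if (orig_ret_vaddr == trampoline_vaddr)
		return orig_ret_vaddr;

	memcpy(sp, &trampoline_vaddr, width);
	return orig_ret_vaddr;
}
```

A second call on the same slot returns the trampoline address itself,
which is how a caller can tell that the slot was hijacked earlier.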

Signed-off-by: Anton Arapov 
---
 arch/x86/include/asm/uprobes.h |  1 +
 arch/x86/kernel/uprobes.c  | 29 +
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 8ff8be7..6e51979 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -55,4 +55,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
struct pt_regs *regs);
 extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
 extern int  arch_uprobe_exception_notify(struct notifier_block *self, unsigned 
long val, void *data);
 extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
*regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
trampoline_vaddr, struct pt_regs *regs);
 #endif /* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 0ba4cfb..2ed8459 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -697,3 +697,32 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
send_sig(SIGTRAP, current, 0);
return ret;
 }
+
+unsigned long
+arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct 
pt_regs *regs)
+{
+   int rasize, ncopied;
+   unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
+
+   rasize = is_ia32_task() ? 4 : 8;
+   ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, 
rasize);
+   if (unlikely(ncopied))
+   return -1;
+
+   /* check whether address has been already hijacked */
+   if (orig_ret_vaddr == trampoline_vaddr)
+   return orig_ret_vaddr;
+
+   ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, 
rasize);
+   if (likely(!ncopied))
+   return orig_ret_vaddr;
+
+   if (ncopied != rasize) {
+   pr_err("uprobe: return address clobbered: pid=%d, %%sp=%#lx, "
+   "%%ip=%#lx\n", current->pid, regs->sp, regs->ip);
+
+   force_sig_info(SIGSEGV, SEND_SIG_FORCED, current);
+   }
+
+   return -1;
+}
-- 
1.8.1.4


Re: [ 34/77] xen/blkback: Dont trust the handle from the frontend.

2013-04-03 Thread Greg Kroah-Hartman

On Wed, Apr 03, 2013 at 04:01:54PM +0200, William Dauchy wrote:
> On Tue, Mar 12, 2013 at 11:10 PM, Greg Kroah-Hartman
>  wrote:
> >> > >> IOW I don't see why this got proposed for stable at all.
> >> > >
> >> > > So, you suggest to just drop this patch for v3.8.3, don't you?
> >> >
> >> > I do, yes. But I'd suggest to get Konrad to agree.
> >>
> >> Yes. Lets drop it.
> >
> > Now reverted, thanks.
> 
> Seems like still present in 3.4.x branch. Is that a mistake?

It showed up in 3.4.35. If that's a mistake and I should revert it,
please, someone let me know.

thanks,

greg k-h

RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu implementation.

2013-04-03 Thread Yoder Stuart-B08248



> -Original Message-
> From: Sethi Varun-B16395
> Sent: Wednesday, April 03, 2013 12:12 AM
> To: Wood Scott-B07421; Timur Tabi
> Cc: Joerg Roedel; lkml; Kumar Gala; Yoder Stuart-B08248; 
> io...@lists.linux-foundation.org; Benjamin
> Herrenschmidt; linuxppc-...@lists.ozlabs.org
> Subject: RE: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu 
> implementation.
> 
> 
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Wednesday, April 03, 2013 7:23 AM
> > To: Timur Tabi
> > Cc: Joerg Roedel; Sethi Varun-B16395; lkml; Kumar Gala; Yoder Stuart-
> > B08248; io...@lists.linux-foundation.org; Benjamin Herrenschmidt;
> > linuxppc-...@lists.ozlabs.org
> > Subject: Re: [PATCH 5/5 v11] iommu/fsl: Freescale PAMU driver and iommu
> > implementation.
> >
> > On 04/02/2013 08:35:54 PM, Timur Tabi wrote:
> > > On Tue, Apr 2, 2013 at 11:18 AM, Joerg Roedel  wrote:
> > >
> > > > > + panic("\n");
> > > >
> > > > A kernel panic seems like an over-reaction to an access violation.
> > >
> > > > We have no way of determining what code caused the violation, so we
> > > can't just kill the process.  I agree it seems like overkill, but what
> > > else should we do?  Does the IOMMU layer have a way for the IOMMU
> > > driver to stop the device that caused the problem?
> >
> > At a minimum, log a message and continue.  Probably turn off the LIODN,
> > at least if it continues to be noisy (otherwise we could get stuck in an
> > interrupt storm as you note).  Possibly let the user know somehow,
> > especially if it's a VFIO domain.
> [Sethi Varun-B16395] Can definitely log the message and disable the LIODN
> (to avoid an interrupt storm), but we definitely need a mechanism to inform
> vfio subsystem about the error. Also, disabling LIODN may not be a viable
> option with the new LIODN allocation scheme (where LIODN would be
> associated with a domain).

I think for phase 1 of this, just log the error and shut down DMA as you
described. We can implement more full-featured error management, like
notifying vfio or the VM somehow, in the future.

Stuart



Re: [PATCH v3 2/2] watchdog: fix w83627hf_wdt clear timeout expired

2013-04-03 Thread Guenter Roeck

On Wed, Apr 03, 2013 at 08:06:59AM -0700, Tony Chung wrote:
> On Tue, Apr 2, 2013 at 9:21 PM, Guenter Roeck  wrote:
> 
> >
> >
> > What is the exact chip type in your system ? I want to have a look into the
> > datasheet; maybe I can find out how it can trigger without causing a reset.
> 
> Winbond 83627HF chip
> 
> I believe the BIOS has the watchdog disabled, otherwise it would have
> rebooted the box. However, the timer just starts counting.
> 
> Comparing to ipmi_watchdog, you can do this:
> modprobe ipmi_watchdog ... start_now=0 ...action=<>  nowayout=1
> 
> So it is possible to load the driver without it starting to count.
> 
That is a different driver, though. You don't have the start_now option here.

> Notice it is an else, so t is actually 0 already (i.e. expired or
> never started running):

Still no idea why that would cause the system to reboot when you reset
the trigger without setting t to 0 again (or why the system doesn't reset
in the first place if the watchdog already triggered).

I am not really sure what the best approach is here, so let's leave it
up to the maintainer to decide which way to go.

Thanks,
Guenter

Re: [PATCH v5 2/6] usb: phy: omap-usb2: use the new generic PHY framework

2013-04-03 Thread Felipe Balbi

Hi,

On Wed, Apr 03, 2013 at 02:55:47PM +, Arnd Bergmann wrote:
> On Wednesday 03 April 2013, Felipe Balbi wrote:
> > const ? Maybe provide a:
> > 
> > #define DEFINE_PHY_OPS(name) \
> > const struct phy_ops name##_phy_ops = {
> > 
> > macro ? This will force people to add the const keyword :-)
> 
> Forcing people to use const structures is good, but I think it would be
> better without the macro, just by marking the argument in 
> devm_phy_create() as const.

that won't force the definition of the struct to be const, however. But
I get your point.
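
To make the two options concrete, here is a small standalone sketch. It
is illustrative only: the members of struct phy_ops, phy_register_ops()
and demo_init() are hypothetical, and the macro is a variant that leaves
the initializer to the user.

```c
#include <stddef.h>

struct phy;	/* opaque in this sketch */

/* Hypothetical ops table; the real member list is not shown in this thread. */
struct phy_ops {
	int (*init)(struct phy *phy);
	int (*power_on)(struct phy *phy);
};

/* Macro variant: the definition itself is forced to be const. */
#define DEFINE_PHY_OPS(name) \
	const struct phy_ops name##_phy_ops

/*
 * Argument variant: the callee promises not to modify the table, but a
 * caller may still define its table without const.
 */
int phy_register_ops(const struct phy_ops *ops)
{
	return (ops && ops->init) ? 0 : -1;
}

int demo_init(struct phy *phy)
{
	(void)phy;
	return 0;
}

/* Expands to: const struct phy_ops demo_phy_ops = { ... }; */
DEFINE_PHY_OPS(demo) = {
	.init = demo_init,
};
```

Either form accepts &demo_phy_ops; only the macro guarantees that the
table ends up in read-only data.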

-- 
balbi



Re: [PATCH v5 1/6] drivers: phy: add generic PHY framework

2013-04-03 Thread Felipe Balbi

Hi,

On Wed, Apr 03, 2013 at 08:02:52PM +0530, Kishon Vijay Abraham I wrote:
> + ret = -EINVAL;
> + goto err0;
> + }
> +
> + if (!phy_class)
> + phy_core_init();
> >>>
> >>>why don't you setup the class on module_init ? Then this would be a
> >>>terrible error condition here :-)
> >>
> >>This is for the case where the PHY driver gets loaded before the PHY
> >>framework. I could have returned EPROBE_DEFER here, but instead I
> >>thought I would have it this way.
> >
> >looks a bit weird IMO. Is it really possible for PHY to load before ?
> 
> yeah. it actually happened when I tried with beagle and had all the
> modules as built-in. Because twl4030 has subsys_initcall(), it loads
> before PHY framework.

that's a bug in twl4030.

-- 
balbi



Re: [PATCH 0/3] check regulator_enable() return value

2013-04-03 Thread Felipe Balbi

Hi,

On Wed, Apr 03, 2013 at 07:22:38AM -0700, Greg KH wrote:
> On Wed, Apr 03, 2013 at 05:06:22PM +0300, Felipe Balbi wrote:
> > Hi,
> > 
> > On Wed, Apr 03, 2013 at 04:02:24PM +0200, Fabio Baltieri wrote:
> > > While testing your 'next' branch merged with today's next I got some new
> > > warnings, caused by a recently introduced __must_check in:
> > > 
> > > c8801a8 regulator: core: Mark all get and enable calls as __must_check
> > > 
> > > These patches introduce a check for the regulator_enable() return value in
> > > all three affected USB phy drivers, and issue a dev_err() in case it
> > > fails.
> > > 
> > > The TWL4030 and TWL6030 patches have been build-tested only.
> > 
> > Sorry but I can't change my tree anymore, we can send these during
> > v3.10-rc.
> 
> Really?  You are going to send me a tree that adds build warnings?
> 
> Please don't.

alright, I'll merge these in.
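
For context, the warnings come from the warn_unused_result attribute
that __must_check maps to in GCC builds. The userspace sketch below
shows the caller pattern the patches add; struct regulator_model,
model_regulator_enable() and model_phy_power_on() are hypothetical, and
whether the real drivers also propagate the error is not stated here.

```c
#include <stdio.h>

/* GCC/Clang attribute: warn when a caller ignores the return value. */
#define must_check __attribute__((warn_unused_result))

struct regulator_model {
	int fail;	/* hypothetical knob to simulate an enable failure */
	int enabled;
};

must_check int model_regulator_enable(struct regulator_model *r)
{
	if (r->fail)
		return -1;
	r->enabled = 1;
	return 0;
}

/* Caller pattern: check the result and report it instead of dropping it. */
int model_phy_power_on(struct regulator_model *r)
{
	int ret = model_regulator_enable(r);

	if (ret)
		fprintf(stderr, "failed to enable regulator: %d\n", ret);
	return ret;
}
```

Calling model_regulator_enable() as a bare statement would trigger the
compiler warning, which is the situation the patches resolve.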

-- 
balbi



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-03 Thread Phillip Susi


On 4/3/2013 7:41 AM, Jens Axboe wrote:
>> Thanks for testing! I don't particularly like this stuff in
>> loop, though. It's quite nasty and depends on other behaviour. It
>> would be prettier if we just had rescan_partitions() do the right
>> thing, and only drop partitions and not rescan if NO_PART_SCAN is
>> set.
>> 
>> Ala the below, dropping the loop change and implementing that
>> change in the core code. Phillip, can you check whether this does
>> the right thing for your bug too?
> 
> Phillip? I'm going to revert the loop change asap, so if you want
> this fixed for 3.10, it's about that time to test it out.

I have not tested it yet, but I am pretty sure it won't work.  It
looks like the patch changes the BLKRRPART path to go ahead and remove
existing partitions when GENHD_FL_NO_PARTSCAN is set.  loop doesn't
issue the BLKRRPART ioctl when !LO_FLAGS_PARTSCAN so this won't help.
 I think loop needs to set GENHD_FL_NO_PARTSCAN and then issue the
ioctl regardless of the LO_FLAGS_PARTSCAN flag to get the partitions
to be removed.  I will try to test tonight.



Re: linux-3.9.0-rc1+: Output from "make kernelrelease" contains incorrect data

2013-04-03 Thread Chris Clayton




On 04/03/13 15:32, Michal Marek wrote:
> On 1.4.2013 11:28, Chris Clayton wrote:
>> Ping!
>>
>> This is still happening with 3.9-rc5.
>>
>> [chris:~/kernel/linux]$ make bzImage
>> ...
>> Kernel: arch/x86/boot/bzImage is ready  (#14)
>> [chris:~/kernel/linux]$ make kernelrelease
>> scripts/kconfig/conf --silentoldconfig Kconfig
>> 3.9.0-rc5
>> [chris:~/kernel/linux]$ make kernelrelease
>> 3.9.0-rc5
>
> You need to run make -s kernelrelease.

Ah, right. I didn't see that announcement. The -s argument was not
necessary with earlier releases.

Sorry for the noise.

Chris

> Michal


Re: [ 105/124] af_unix: dont send SCM_CREDENTIAL when dest socket is NULL

2013-04-03 Thread Eric Dumazet

On Wed, 2013-04-03 at 17:10 +0200, Sven Joachim wrote:
> On 2013-04-03 16:00 +0200, Eric Dumazet wrote:

> 
> > It might be a wrong sender (application bug or bad identity), and udevd
> > correctly discards the incoming message.
> 
> How would I find out the culprit?

Change udevd to display the pid as well, and hopefully track the sender.

udevd receives uid and pid in the credentials.




Re: [PATCH] cpufreq: convert the cpufreq_driver to use the rcu

2013-04-03 Thread Viresh Kumar

Please always mention Version number and history. Not everybody
remembers what changed after last version.

On 3 April 2013 20:33, Nathan Zimmer  wrote:
> We eventually would like to remove the rwlock cpufreq_driver_lock or convert
> it back to a spinlock and protect the read sections with RCU.  The first step 
> in

Why do we want to convert it back to spinlock?

> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

>  bool have_governor_per_policy(void)
>  {
> -   return cpufreq_driver->have_governor_per_policy;
> +   bool have_governor;

Name it have_governor_per_policy, it looks wrong otherwise.

> +   rcu_read_lock();
> +   have_governor = 
> rcu_dereference(cpufreq_driver)->have_governor_per_policy;
> +   rcu_read_unlock();
> +   return have_governor;
>  }

>  static ssize_t show_scaling_driver(struct cpufreq_policy *policy, char *buf)
>  {
> -   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n", 
> cpufreq_driver->name);
> +   char *name;
> +   rcu_read_lock();
> +   name = rcu_dereference(cpufreq_driver)->name;
> +   rcu_read_unlock();
> +   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n", name);
>  }

This is the definition of struct cpufreq_driver:

struct cpufreq_driver {
        struct module   *owner;
        char            name[CPUFREQ_NAME_LEN];

        ...
};

The purpose of rcu_read_lock()/rcu_read_unlock() is to define the RCU
read-side critical section, after which the RCU layer is free to release
the memory allocated to the earlier instance of cpufreq_driver.

So, after the unlock() call you _should_not_ use the memory allocated to
the cpufreq_driver instance. And here, you are using the memory allocated
to name[] after the unlock() call.

That looks wrong... I have left the other parts of the driver up to you
to fix according to this "rule of thumb".

Sorry for not pointing this out earlier, but RCU is as new to me as it is
to you. I know you must be frustrated with so many versions of this patch,
and every time we bring a new problem to you... Don't get disheartened by
it. Keep up the good work :)
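
One way to follow that rule is to finish every access to the protected
object before leaving the critical section. The sketch below is a
userspace model only: the rcu_* macros are no-op stand-ins so that it
builds outside the kernel, and struct driver_model, current_driver and
show_driver_name() are hypothetical names.

```c
#include <stdio.h>

#define NAME_LEN 16

/* No-op stand-ins for this sketch; the kernel primitives do real work. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)
#define rcu_dereference(p)	(p)

struct driver_model {
	char name[NAME_LEN];
};

struct driver_model *current_driver;	/* hypothetical RCU-protected pointer */

/* Format the name while still inside the read-side critical section. */
int show_driver_name(char *buf, size_t len)
{
	struct driver_model *drv;
	int ret = -1;

	rcu_read_lock();
	drv = rcu_dereference(current_driver);
	if (drv)
		ret = snprintf(buf, len, "%s\n", drv->name);
	rcu_read_unlock();

	/* From here on only buf is used, never drv or drv->name. */
	return ret;
}
```

The copy into buf is what makes the result safe to use after the unlock;
keeping only a pointer to name[] would not be.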

--
viresh

Re: [RFC][PATCH 2/7] memcg: don't use mem_cgroup_get() when creating a kmemcg cache

2013-04-03 Thread Michal Hocko

On Wed 03-04-13 17:12:21, Li Zefan wrote:
> Use css_get()/css_put() instead of mem_cgroup_get()/mem_cgroup_put().
> 
> Signed-off-by: Li Zefan 
> ---
>  mm/memcontrol.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 43ca91d..dafacb8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3191,7 +3191,7 @@ void memcg_release_cache(struct kmem_cache *s)
>   list_del(&s->memcg_params->list);
>   mutex_unlock(&memcg->slab_caches_mutex);
>  
> - mem_cgroup_put(memcg);
> + css_put(&memcg->css);
>  out:
>   kfree(s->memcg_params);
>  }
> @@ -3350,16 +3350,18 @@ static struct kmem_cache 
> *memcg_create_kmem_cache(struct mem_cgroup *memcg,
>  
>   mutex_lock(&memcg_cache_mutex);
>   new_cachep = cachep->memcg_params->memcg_caches[idx];
> - if (new_cachep)
> + if (new_cachep) {
> + css_put(&memcg->css);
>   goto out;
> + }
>  
>   new_cachep = kmem_cache_dup(memcg, cachep);
>   if (new_cachep == NULL) {
>   new_cachep = cachep;
> + css_put(&memcg->css);
>   goto out;
>   }
>  
> - mem_cgroup_get(memcg);
>   atomic_set(&new_cachep->memcg_params->nr_pages , 0);
>  
>   cachep->memcg_params->memcg_caches[idx] = new_cachep;
> @@ -3449,8 +3451,6 @@ static void memcg_create_cache_work_func(struct 
> work_struct *w)
>  
>   cw = container_of(w, struct create_work, work);
>   memcg_create_kmem_cache(cw->memcg, cw->cachep);
> - /* Drop the reference gotten when we enqueued. */
> - css_put(&cw->memcg->css);
>   kfree(cw);
>  }

You are putting references but I do not see any single css_{try}get
here. /me puzzled.

-- 
Michal Hocko
SUSE Labs

[PATCH] block: avoid using uninitialized value from queue_var_store

2013-04-03 Thread Arnd Bergmann

As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
that use a value generated by queue_var_store independent of whether
that value was set or not.

block/blk-sysfs.c: In function 'queue_store_nonrot':
block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this 
function [-Wmaybe-uninitialized]

Unlike most other such warnings, this one is not a false positive:
writing any non-number string into the sysfs files indeed has
an undefined result, rather than returning an error.

Signed-off-by: Arnd Bergmann 
Cc: Jens Axboe 
---
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6206a93..5efc5a6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -229,6 +229,8 @@ queue_store_##name(struct request_queue *q, const char *page, size_t count) \
unsigned long val;  \
ssize_t ret;\
ret = queue_var_store(&val, page, count);   \
+   if (ret < 0)\
+return ret;\
if (neg)\
val = !val; \
\

Re: [PATCH 3/3] watchdog: Add Congatec CGEB watchdog driver

2013-04-03 Thread Guenter Roeck

On Wed, Apr 03, 2013 at 05:09:52PM +0200, Sascha Hauer wrote:
[ ... ]

> > 
> > On a side note, if the driver supports devicetree, it might make sense
> > to call watchdog_init_timeout, since it initializes the timeout from
> > devicetree data.
> 
> The driver does not support devicetree.
> 
I should have said "the system". Calling watchdog_init_timeout is the only thing
a watchdog driver has to do to support devicetree.

Thanks,
Guenter

Re: [RFC][PATCH 1/7] memcg: use css_get in sock_update_memcg()

2013-04-03 Thread Michal Hocko

On Wed 03-04-13 16:58:48, Glauber Costa wrote:
> On 04/03/2013 01:11 PM, Li Zefan wrote:
> > Use css_get/css_put instead of mem_cgroup_get/put.
> > 
> > Note, if at the same time someone is moving @current to a different
> > cgroup and removing the old cgroup, css_tryget() may return false,
> > and sock->sk_cgrp won't be initialized.
> > 
> > Signed-off-by: Li Zefan 
> > ---
> >  mm/memcontrol.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 23d0f6e..43ca91d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -536,15 +536,15 @@ void sock_update_memcg(struct sock *sk)
> >  */
> > if (sk->sk_cgrp) {
> > BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
> > -   mem_cgroup_get(sk->sk_cgrp->memcg);
> > +   css_get(&sk->sk_cgrp->memcg->css);

I am not sure I understand this one. So we have a group here (which means
that somebody already took a reference on it, right?) and we are taking
another reference. If this is released by sock_release_memcg then who
releases the previous one? It is not directly related to the patch
because this has been done previously already. Could you clarify
Glauber, please?

> > return;
> > }
> >  
> > rcu_read_lock();
> > memcg = mem_cgroup_from_task(current);
> > cg_proto = sk->sk_prot->proto_cgroup(memcg);
> > -   if (!mem_cgroup_is_root(memcg) && memcg_proto_active(cg_proto)) {
> > -   mem_cgroup_get(memcg);
> > +   if (!mem_cgroup_is_root(memcg) &&
> > +   memcg_proto_active(cg_proto) && css_tryget(&memcg->css)) {
> > sk->sk_cgrp = cg_proto;
> > }
> 
> What happens if this tryget fails ? Won't we leak a reference here? We
> will put regardless when the socket is released, and this may go
> negative. No?
 
AFAICS sock_release_memcg releases the reference only if sk->sk_cgrp is
set, and that one wouldn't be set if css_tryget fails.

-- 
Michal Hocko
SUSE Labs

Re: [ 000/124] 3.8.6-stable review

2013-04-03 Thread Greg Kroah-Hartman

On Wed, Apr 03, 2013 at 09:17:33AM -0600, Shuah Khan wrote:
> On Tue, Apr 2, 2013 at 4:09 PM, Greg Kroah-Hartman
>  wrote:
> >
> > This is the start of the stable review cycle for the 3.8.6 release.
> > There are 124 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Apr  4 22:09:55 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.6-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Patches applied cleanly to 3.0.71, 3.4.38, and 3.8.5
> 
> Reviewing patches - will let you know anything odd.
> 
> Compiled and booted on the following systems:
> 
> HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
> 
> dmesgs for all releases look good. No regressions compared to the
> previous dmesgs for each of these releases.
> 
> mips compile problem fixed
> 
> Cross-compile tests results:
> 
> alpha: defconfig passed on all
> arm: defconfig passed on all
> arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.8.y
> c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.8.y.
> powerpc: wii_defconfig passed on all
> sh: defconfig passed on all
> sparc: defconfig passed on all
> tile: tilegx_defconfig passed on all
> mips: defconfig passed on all
> mipsel: defconfig passed on all

Thanks for testing, and confirming that I got the build problem fixed.

greg k-h

Re: [v3.8, v3.9] [Regression] brcmsmac: move PHY functions

2013-04-03 Thread Joseph Salisbury


On 04/02/2013 05:02 AM, Piotr Haber wrote:

On 04/01/13 17:18, Joseph Salisbury wrote:

On 04/01/2013 10:42 AM, John W. Linville wrote:

On Fri, Mar 29, 2013 at 03:52:39PM -0400, Joseph Salisbury wrote:

Hi Piotr,

A bug was opened against the Ubuntu kernel[0].  After a kernel
bisect, it was found that reverting the following commit resolved
this bug:

commit b83576341664957978e125f5f5db2f15496980b1
Author: Piotr Haber 
Date:   Wed Nov 28 21:44:09 2012 +0100

  brcmsmac: move PHY functions

The regression was introduced as of v3.8-rc1.  The regression still
exists in v3.9-rc4.

I see that you are the author of this patch, so I wanted to run this
by you.  I was thinking of requesting a revert for v3.9, but I
wanted to get your feedback first.


Thanks,

Joe

[0] http://pad.lv/1131914

I recently reverted b6fc28a1, which is the follow-on to that patch.
The revert is _not_ in 3.9-rc5.

Could you try reverting that patch instead?  Does that fix the issue
for you?

John

Hi John,

Thanks for the response.

Yes, reverting commit b6fc28a1 does resolve this bug.  That is the appropriate 
fix for this issue.
Thanks for the assistance.

Thanks,

Joe





Hi Joe,
could you elaborate a little bit on your failure scenario?
In the bug report you say it happens after suspend/resume; are there any
other scenarios where you see this behaviour (like
disassociation/association without suspend)?
Also you mention it comes back after some time - what is the time needed?
We had reports of problems on 4313 with this patch (that's why the revert
was done) but so far I assumed it was a total breakdown; in your case this
seems like a transient issue.

This only happens after a suspend/resume cycle.  I haven't seen the
issue happen with disassociation/association without suspend.  After
suspend, the connection will re-establish after about 15 minutes or so.




One more thing, could you provide info about your hardware by sending me 
contents of:
/brcmsmac/bcma0:0/hardware


board vendor: 144f
board type: 7179
board revision: 1408
board flags: 8402a01
board flags2: 880
firmware revision: 262032c






Kind regards
Piotr





Re: [ 105/124] af_unix: dont send SCM_CREDENTIAL when dest socket is NULL

2013-04-03 Thread Greg Kroah-Hartman

On Wed, Apr 03, 2013 at 05:10:40PM +0200, Sven Joachim wrote:
> On 2013-04-03 16:00 +0200, Eric Dumazet wrote:
> 
> > On Wed, 2013-04-03 at 13:41 +0200, Sven Joachim wrote:
> >> On 2013-04-03 00:11 +0200, Greg Kroah-Hartman wrote:
> >> 
> >> > 3.8-stable review patch.  If anyone has any objections, please let
> >> > me know.
> >> 
> >> I'm seeing several complaints from udevd at boot in both 3.8.6-rc1 and
> >> 3.9-rc5: "udevd[56]: sender uid=65534, message ignored".  Reverting the
> >> patch below on top of 3.8.6-rc1 fixes that.  I'm using udev version 175
> >> here, and 65534 is the uid of user "nobody".
> >
> > And if you use a 3.1 kernel (before commit
> > 16e5726269611b71c930054ffe9b858c1cea88eb) are you seeing this message ?
> 
> No (tested with 3.1.10).
> 
> > It might be a wrong sender (application bug or bad identity), and udevd
> > correctly discards the incoming message.
> 
> How would I find out the culprit?

Try running 'udevadm monitor' as root and see if something shows up
there.

I can't reproduce this here, running a newer version of udev (195),
sorry, I don't have any systems with older udev releases.

Note, someone else posted this same error earlier today on the linux-usb
mailing list, saying that USB storage devices would not automount
anymore.  Does that work properly for you?

thanks,

greg k-h

Re: [PATCH] x86: fix rebuild with EFI_STUB enabled

2013-04-03 Thread H. Peter Anvin

On 04/03/2013 07:58 AM, Jan Beulich wrote:
 On 03.04.13 at 16:48, "H. Peter Anvin"  wrote:
>> This looks awesome for 3.10, but getting a minimal fix for 3.9/stable
>> would be good, too.
> 
> Do you really view this as relevant for stable? Considering that this
> had been this way for a while with apparently no-one having noticed,
> I wouldn't think so. Nor would I see a strong need for this to go into
> 3.9.
> 
> Jan
> 

Since it is a build fix, I think I do.  You never quite know whom a build
problem is going to bite, and I'd rather just fix them as long as the
fix is suitably small and obvious.  It doesn't exactly help that kbuild
is hideously fragile.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


[GIT PULL] s390 patches for the 3.9-rc6

2013-04-03 Thread Martin Schwidefsky

Hi Linus,

please pull from the 'for-linus' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git for-linus

to receive the following updates: Just a bunch of bugfixes.

Heiko Carstens (4):
  drivers/Kconfig: add several missing GENERIC_HARDIRQS dependencies
  s390/uaccess: fix clear_user_pt()
  s390/uaccess: fix page table walk
  s390/mm: provide emtpy check_pgt_cache() function

Martin Schwidefsky (1):
  s390/3270: fix minor_start issue

Sebastian Ott (1):
  s390/scm_block: fix printk format string

Wei Yongjun (1):
  s390/scm_blk: fix error return code in scm_blk_init()

 arch/s390/include/asm/pgtable.h |4 +-
 arch/s390/lib/uaccess_pt.c  |   83 ++-
 drivers/dma/Kconfig |1 +
 drivers/media/platform/Kconfig  |2 +-
 drivers/s390/block/scm_blk.c|   11 --
 drivers/s390/block/scm_drv.c|2 +-
 drivers/s390/char/tty3270.c |   16 
 drivers/spi/Kconfig |3 +-
 8 files changed, 79 insertions(+), 43 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 4a29308..4a54431 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -344,6 +344,7 @@ extern unsigned long MODULES_END;
 #define _REGION3_ENTRY_CO  0x100   /* change-recording override*/
 
 /* Bits in the segment table entry */
+#define _SEGMENT_ENTRY_ORIGIN_LARGE ~0xfUL /* large page address   */
 #define _SEGMENT_ENTRY_ORIGIN  ~0x7ffUL/* segment table origin */
 #define _SEGMENT_ENTRY_RO  0x200   /* page protection bit  */
 #define _SEGMENT_ENTRY_INV 0x20/* invalid segment table entry  */
@@ -1531,7 +1532,8 @@ extern int s390_enable_sie(void);
 /*
  * No page table caches to initialise
  */
-#define pgtable_cache_init()   do { } while (0)
+static inline void pgtable_cache_init(void) { }
+static inline void check_pgt_cache(void) { }
 
 #include 
 
diff --git a/arch/s390/lib/uaccess_pt.c b/arch/s390/lib/uaccess_pt.c
index dff631d..466fb33 100644
--- a/arch/s390/lib/uaccess_pt.c
+++ b/arch/s390/lib/uaccess_pt.c
@@ -77,42 +77,69 @@ static size_t copy_in_kernel(size_t count, void __user *to,
  * >= -4095 (IS_ERR_VALUE(x) returns true), a fault has occured and the address
  * contains the (negative) exception code.
  */
-static __always_inline unsigned long follow_table(struct mm_struct *mm,
- unsigned long addr, int write)
+#ifdef CONFIG_64BIT
+static unsigned long follow_table(struct mm_struct *mm,
+ unsigned long address, int write)
 {
-   pgd_t *pgd;
-   pud_t *pud;
-   pmd_t *pmd;
-   pte_t *ptep;
+   unsigned long *table = (unsigned long *)__pa(mm->pgd);
+
+   switch (mm->context.asce_bits & _ASCE_TYPE_MASK) {
+   case _ASCE_TYPE_REGION1:
+   table = table + ((address >> 53) & 0x7ff);
+   if (unlikely(*table & _REGION_ENTRY_INV))
+   return -0x39UL;
+   table = (unsigned long *)(*table & _REGION_ENTRY_ORIGIN);
+   case _ASCE_TYPE_REGION2:
+   table = table + ((address >> 42) & 0x7ff);
+   if (unlikely(*table & _REGION_ENTRY_INV))
+   return -0x3aUL;
+   table = (unsigned long *)(*table & _REGION_ENTRY_ORIGIN);
+   case _ASCE_TYPE_REGION3:
+   table = table + ((address >> 31) & 0x7ff);
+   if (unlikely(*table & _REGION_ENTRY_INV))
+   return -0x3bUL;
+   table = (unsigned long *)(*table & _REGION_ENTRY_ORIGIN);
+   case _ASCE_TYPE_SEGMENT:
+   table = table + ((address >> 20) & 0x7ff);
+   if (unlikely(*table & _SEGMENT_ENTRY_INV))
+   return -0x10UL;
+   if (unlikely(*table & _SEGMENT_ENTRY_LARGE)) {
+   if (write && (*table & _SEGMENT_ENTRY_RO))
+   return -0x04UL;
+   return (*table & _SEGMENT_ENTRY_ORIGIN_LARGE) +
+   (address & ~_SEGMENT_ENTRY_ORIGIN_LARGE);
+   }
+   table = (unsigned long *)(*table & _SEGMENT_ENTRY_ORIGIN);
+   }
+   table = table + ((address >> 12) & 0xff);
+   if (unlikely(*table & _PAGE_INVALID))
+   return -0x11UL;
+   if (write && (*table & _PAGE_RO))
+   return -0x04UL;
+   return (*table & PAGE_MASK) + (address & ~PAGE_MASK);
+}
 
-   pgd = pgd_offset(mm, addr);
-   if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
-   return -0x3aUL;
+#else /* CONFIG_64BIT */
 
-   pud = pud_offset(pgd, addr);
-   if (pud_none(*pud) || unlikely(pud_bad(*pud)))
-   return -0x3bUL;
+static unsigned long follow_table(struct mm_struct *mm,
+ unsigned long address, int write)
+{
+   unsigned l

Re: [PATCH 3/3] watchdog: Add Congatec CGEB watchdog driver

2013-04-03 Thread Sascha Hauer

On Tue, Mar 26, 2013 at 07:33:31AM -0700, Guenter Roeck wrote:
> On Tue, Mar 26, 2013 at 11:16:35AM +0100, Christian Gmeiner wrote:
> > 2013/2/12 Sascha Hauer :
> > > This driver provides support for the CGEB watchdog found on some
> > > Congatec x86 modules.
> > >
> > > Signed-off-by: Sascha Hauer 
> > > ---
> > >  drivers/watchdog/Kconfig  |   10 ++
> > >  drivers/watchdog/Makefile |1 +
> > >  drivers/watchdog/congatec_cgeb_watchdog.c |  161 +
> > >  3 files changed, 172 insertions(+)
> > >  create mode 100644 drivers/watchdog/congatec_cgeb_watchdog.c
> > >
> > > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> > > index 7f809fd..47138fb 100644
> > > --- a/drivers/watchdog/Kconfig
> > > +++ b/drivers/watchdog/Kconfig
> > > @@ -934,6 +934,16 @@ config SBC_EPX_C3_WATCHDOG
> > >   To compile this driver as a module, choose M here: the
> > >   module will be called sbc_epx_c3.
> > >
> > > +config CONGATEC_CGEB_WATCHDOG
> > > +   depends on CONGATEC_CGEB
> > > +   tristate "Congatec CGEB watchdog"
> > > +   ---help---
> > > + This driver provides support for the watchdogs found on Congatec
> > > + modules with the CGEB BIOS interface.
> > > +
> > > + To compile this driver as a module, choose M here: the
> > > + module will be called congatec_cgeb_wdt.
> > > +
> > >  # M32R Architecture
> > >
> > >  # M68K Architecture
> > > diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
> > > index 97bbdb3a..e67eee5 100644
> > > --- a/drivers/watchdog/Makefile
> > > +++ b/drivers/watchdog/Makefile
> > > @@ -108,6 +108,7 @@ obj-$(CONFIG_W83977F_WDT) += w83977f_wdt.o
> > >  obj-$(CONFIG_MACHZ_WDT) += machzwd.o
> > >  obj-$(CONFIG_SBC_EPX_C3_WATCHDOG) += sbc_epx_c3.o
> > >  obj-$(CONFIG_INTEL_SCU_WATCHDOG) += intel_scu_watchdog.o
> > > +obj-$(CONFIG_CONGATEC_CGEB_WATCHDOG) += congatec_cgeb_watchdog.o
> > >
> > >  # M32R Architecture
> > >
> > > diff --git a/drivers/watchdog/congatec_cgeb_watchdog.c b/drivers/watchdog/congatec_cgeb_watchdog.c
> > > new file mode 100644
> > > index 000..b7b6cf5
> > > --- /dev/null
> > > +++ b/drivers/watchdog/congatec_cgeb_watchdog.c
> > > @@ -0,0 +1,161 @@
> > > +/*
> > > + * CGEB watchdog driver
> > > + *
> > > + * (c) 2011 Sascha Hauer, Pengutronix
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License as published by
> > > + * the Free Software Foundation; version 2 of the License.
> > > + *
> > > + * This program is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + */
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#define CGOS_WDOG_MODE_REBOOT_PC    0
> > > +#define CGOS_WDOG_MODE_RESTART_OS   1
> > > +#define CGOS_WDOG_MODE_STAGED       0x80
> > > +
> > > +#define CGOS_WDOG_OPMODE_DISABLED  0
> > > +#define CGOS_WDOG_OPMODE_ONETIME_TRIG  1
> > > +#define CGOS_WDOG_OPMODE_SINGLE_EVENT  2
> > > +#define CGOS_WDOG_OPMODE_EVENT_REPEAT  3
> > > +
> > > +#define CGOS_WDOG_EVENT_INT 0  /* NMI/IRQ */
> > > +#define CGOS_WDOG_EVENT_SCI 1  /* SMI/SCI */
> > > +#define CGOS_WDOG_EVENT_RST 2  /* system reset */
> > > +#define CGOS_WDOG_EVENT_BTN 3  /* power button */
> > > +
> > > +#define CGOS_WDOG_EVENT_MAX_STAGES 3
> > > +
> > > +struct cgeb_watchdog_stage {
> > > +   unsigned long timeout;
> > > +   unsigned long event;
> > > +};
> > > +
> > > +struct cgeb_watchdog_config {
> > > +   unsigned long size;
> > > +   unsigned long timeout; /* not used in staged mode */
> > > +   unsigned long delay;
> > > +   unsigned long mode;
> > > +   /* optional parameters for staged watchdog */
> > > +   unsigned long op_mode;
> > > +   unsigned long stage_count;
> > > +   struct cgeb_watchdog_stage stages[CGOS_WDOG_EVENT_MAX_STAGES];
> > > +};
> 
> Presumably that is a data structure sent to the board. Just wondering -
> can the driver ever be built as a 64-bit driver? If so, you might want
> to use u32 instead of unsigned long.

I don't think that this is used on any 64bit capable hardware. Anyway,
explicitly using u32 here sounds very good. I'll change it.

> > > +static int cgeb_watchdog_start(struct watchdog_device *wdd)
> > > +{
> > > +   struct cgeb_watchdog_priv *priv = watchdog_get_drvdata(wdd);
> > > +
> > > +   return watchdog_set_config(priv, wdd->timeout);
> > > +}
> > > +
> > > +static int cgeb_watchdog_stop(struct watchdog_device *wdd)
> > > +{
> > > +   struct cgeb_watchdog_priv *priv = watchdog_get_drvdata(wdd);
> > > +
> > > +   return watchdog_set_config(priv, 0);
> > > +}
> > > +
> > > +static int cgeb_watchdog_set_timeout(

Re: [PATCH 3/3] watchdog: Add Congatec CGEB watchdog driver

2013-04-03 Thread Sascha Hauer

On Tue, Mar 26, 2013 at 11:16:35AM +0100, Christian Gmeiner wrote:
> 2013/2/12 Sascha Hauer :
> > --
> 
> There seems to be a problem:
> 
> Mar 26 16:11:25 OT kernel: [   80.207514] cgeb-watchdog
> cgeb-watchdog.0: registered
> Mar 26 16:11:38 OT watchdog[2519]: stopping daemon (5.9)
> Mar 26 16:11:43 OT watchdog[2750]: starting daemon (5.9):
> Mar 26 16:11:43 OT watchdog[2750]: int=1s realtime=yes sync=no soft=no
> mla=0 mem=0
> Mar 26 16:11:43 OT watchdog[2750]: ping: no machine to check
> Mar 26 16:11:43 OT watchdog[2750]: file: no file to check
> Mar 26 16:11:43 OT watchdog[2750]: pidfile: no server process to check
> Mar 26 16:11:43 OT watchdog[2750]: interface: no interface to check
> Mar 26 16:11:43 OT watchdog[2750]: test=none(0) repair=none(0)
> alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
> Mar 26 16:11:43 OT watchdog[2750]: cannot set timeout 60 (errno = 95 =
> 'Operation not supported')

As mentioned in my other mail the watchdog core no longer sets
wdd->timeout which results in the above message. Should be fixed
with the next version.

Sascha


-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

Re: [PATCH] ext4: fix a big-endian bug when an extent is zeroed out

2013-04-03 Thread Florian Fainelli


Hello,

Le 04/03/13 16:41, Theodore Ts'o a écrit :

On Wed, Apr 03, 2013 at 10:34:06AM -0400, Eric Whitney wrote:


The TI OMAP4 processor on my Pandaboard test system is little endian.


Ah... so basically, we need to find a test platform which allows us to
boot arbitrary kernels and allows us to have root access (which means
it's unlikely we'll be able to do this via remote access) and which
doesn't have exotic power requirements (which as far as I know rules
out pSeries and zSeries systems)

It would also be nice if we could run tests in finite time, which
probably rules out the Hercules emulator (it runs at one-tenth zSeries
processor speeds, which doesn't win speed competitions by default, and
I suspect their storage speeds are even worse).

Anyone else have any suggestions?  Or anyone willing to help us run
ext4 regression tests on the ext4 dev tree, so we can find these
problems before we merge into mainline?


Qemu emulates various mainline PowerPC, MIPS and SPARC big-endian 
systems pretty efficiently and it should not be too hard neither to 
script nor to get a recent kernel up and running on these platforms.


My 2 cents.
--
Florian
