Re: Memory allocator semantics

2014-01-02 Thread Paul E. McKenney
On Thu, Jan 02, 2014 at 09:47:00PM -0800, Josh Triplett wrote:
> On Thu, Jan 02, 2014 at 09:14:17PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 02, 2014 at 07:39:07PM -0800, Josh Triplett wrote:
> > > On Thu, Jan 02, 2014 at 12:33:20PM -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > > 
> > > > From what I can see, the Linux-kernel's SLAB, SLOB, and SLUB memory
> > > > allocators would deal with the following sort of race:
> > > > 
> > > > A.  CPU 0: r1 = kmalloc(...); ACCESS_ONCE(gp) = r1;
> > > > 
> > > > CPU 1: r2 = ACCESS_ONCE(gp); if (r2) kfree(r2);
> > > > 
> > > > However, my guess is that this should be considered an accident of the
> > > > current implementation rather than a feature.  The reason for this is
> > > > that I cannot see how you would usefully do (A) above without also 
> > > > allowing
> > > > (B) and (C) below, both of which look to me to be quite destructive:
> > > 
> > > (A) only seems OK if "gp" is guaranteed to be NULL beforehand, *and* if
> > > no other CPUs can possibly do what CPU 1 is doing in parallel.  Even
> > > then, it seems questionable how this could ever be used successfully in
> > > practice.
> > > 
> > > This seems similar to the TCP simultaneous-SYN case: theoretically
> > > possible, absurd in practice.
> > 
> > Heh!
> > 
> > Agreed on the absurdity, but my quick look at slab/slob/slub leads
> > me to believe that the current Linux kernel would actually do something
> > sensible in this case.  But only because they don't touch the actual
> > memory.  DYNIX/ptx would have choked on it, IIRC.
> 
> Based on this and the discussion at the bottom of your mail, I think I'm
> starting to understand what you're getting at; this seems like less of a
> question of "could this usefully happen?" and more "does the allocator
> know how to protect *itself*?".

Or perhaps "What are the rules when a concurrent program interacts with
a memory allocator?"  Like the set you provided below.  ;-)

> > > > But I thought I should ask the experts.
> > > > 
> > > > So, am I correct that kernel hackers are required to avoid "drive-by"
> > > > kfree()s of kmalloc()ed memory?
> > > 
> > > Don't kfree things that are in use, and synchronize to make sure all
> > > CPUs agree about "in use", yes.
> > 
> > For example, ensure that each kmalloc() happens unambiguously before the
> > corresponding kfree().  ;-)
> 
> That too, yes. :)
> 
> > > > PS.  To the question "Why would anyone care about (A)?", the answer
> > > >  is "Inquiring programming-language memory-model designers want
> > > >  to know."
> > > 
> > > I find myself wondering about the original form of the question, since
> > > I'd hope that programming-language memory-model designers would
> > > understand the need for synchronization around reclaiming memory.
> > 
> > I think that they do now.  The original form of the question was as
> > follows:
> > 
> > But my intuition at the moment is that allowing racing
> > accesses and providing pointer atomicity leads to a much more
> > complicated and harder to explain model.  You have to deal
> > with initialization issues and OOTA problems without atomics.
> > And the implementation has to deal with cross-thread visibility
> > of malloc meta-information, which I suspect will be expensive.
> > You now essentially have to be able to malloc() in one thread,
> > transfer the pointer via a race to another thread, and free()
> > in the second thread.  That’s hard unless malloc() and free()
> > always lock (as I presume they do in the Linux kernel).
> 
> As mentioned above, this makes much more sense now.  This seems like a
> question of how the allocator protects its *own* internal data
> structures, rather than whether the allocator can usefully be used for
> the cases you mentioned above.  And that's a reasonable question to ask
> if you're building a language memory model for a language with malloc
> and free as part of its standard library.
> 
> To roughly sketch out some general rules that might work as a set of
> scalable design constraints for malloc/free:
> 
> - malloc may always return any unallocated memory; it has no obligation
>   to avoid returning memory that was just recently freed.  In fact, an
>   implementation may even be particularly *likely* to return memory that
>   was just recently freed, for performance reasons.  Any program which
>   assumes a delay or a memory barrier before memory reuse is broken.

Agreed.

> - Multiple calls to free on the same memory will produce undefined
>   behavior, and in particular may result in a well-known form of
>   security hole.  free has no obligation to protect itself against
>   multiple calls to free on the same memory, unless otherwise specified
>   as part of some debugging mode.  This holds whether the calls to free
>   occur in series or in parallel (e.g. two or more calls racing with
>   each other).  It is the job of the calling program to avoid calling
>   free more than once on the same memory.

Re: [PATCH] powerpc: Fix alignment of secondary cpu spin vars

2014-01-02 Thread Olof Johansson
On Sat, Dec 28, 2013 at 1:05 PM, Olof Johansson  wrote:

> Sigh, it's not this after all. I did a clean build with this applied
> and still see failures. Something else is (also?) going on here.

Ok, so after some more digging I actually think that this isn't about
the new code added as much as it is about having more code in low
memory.

Before, there were only two instructions in __start:

b   .__start_initialization_multiplatform
trap

Now, there's a whole bunch:

c000 <.__start>:
c000:   08 00 00 48 tdi 0,r0,72
c004:   48 00 00 24 b   c028 <.__start+0x28>
c008:   05 00 9f 42 .long 0x5009f42
c00c:   a6 02 48 7d lhzu    r16,18557(r2)
c010:   1c 00 4a 39 mulli   r0,r0,19001
c014:   a6 00 60 7d lhzu    r16,24701(0)
c018:   01 00 6b 69 .long 0x1006b69
c01c:   a6 03 5a 7d lhzu    r16,23165(r3)
c020:   a6 03 7b 7d lhzu    r16,31613(r3)
c024:   24 00 00 4c dozi    r0,r0,76
c028:   48 00 95 84 b   c00095ac
<.__start_initialization_multiplatform>
c02c:   7f e0 00 08 trap

And indeed, by replacing some of the LE hand-converted code with 0x0,
it seems that what's really making things blow up here is that 0x8-0xc
contain something other than 0x0.

Where/why this comes from I'm less certain of -- and since I seem to
no longer have a usable JTAG setup, I can't break in and see where the
code gets stuck and call paths, etc. So it's pure speculation, but I'm
guessing it's a null pointer dereference somewhere with a chained
pointer as the second member in a struct, i.e. with NULL the stray
null ptr deref does no harm.

Since it doesn't seem to impact pSeries, there's a chance that the bug
is in firmware, not in the kernel, since this seems to happen during
fairly early boot, i.e. possibly while grabbing the DT contents out.

This makes things interesting though. The BE/LE trampoline code
assumes at least 3 consecutive instructions. What was the reasoning
behind entering the kernel LE instead of keeping the old boot protocol
and just switching to LE once the kernel is loaded? Is it actually used on
some platforms or is this just a theoretical thing?


-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] watchdog: Adding Merrifield watchdog driver support

2014-01-02 Thread Dmitry Torokhov
Hi Gabriel,

On Thu, Jan 02, 2014 at 04:56:42PM -0800, eric.er...@linux.intel.com wrote:
> +
> +/* Statics */
> +static struct intel_scu_watchdog_dev watchdog_device;
> +
> +/* Module params */
> +static bool disable_kernel_watchdog;
> +module_param(disable_kernel_watchdog, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_kernel_watchdog,
> + "Disable kernel watchdog"
> + "Set to 0, watchdog started at boot"
> + "and left running; Set to 1; watchdog"
> + "is not started until user space"
> + "watchdog daemon is started; also if the"
> + "timer is started by the iafw firmware, it"
> + "will be disabled upon initialization of this"
> + "driver if disable_kernel_watchdog is set");

You need to add spaces at the end of your strings, otherwise
concatenation will produce a mess.

> +
> +static int pre_timeout = DEFAULT_PRETIMEOUT;
> +module_param(pre_timeout, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(pre_timeout,
> + "Watchdog pre timeout"
> + "Time between interrupt and resetting the system"
> + "The range is from 1 to 160");
> +
> +static int timeout = DEFAULT_TIMEOUT;
> +module_param(timeout, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(timeout,
> + "Default Watchdog timer setting"
> + "Complete cycle time"
> + "The range is from 1 to 170"
> + "This is the time for all keep alives to arrive");
> +
> +/* Setting reset_on_release will cause an immediate reset when the watchdog
> + * is released. If false, the watchdog timer is refreshed for one more
> + * interval. At the end of that interval, the watchdog timer will reset the
> + * system.
> + */

Multi-line comments in the majority of kernel code are in the form of

/*
 * Multi-line comment should be
 * formatted like this.
 */

> +static bool reset_on_release = true;
> +
> +/* Check current timeouts */
> +static int check_timeouts(int pre_timeout_time, int timeout_time)
> +{
> + if (pre_timeout_time < timeout_time)
> + return 0;
> +
> + return -EINVAL;
> +}
> +
> +/* Set the different timeouts needed by the SCU FW and start the
> + * kernel watchdog */
> +static int watchdog_set_timeouts_and_start(int pretimeout,
> +int timeout)
> +{
> + int ret, input_size;
> + struct ipc_wd_start {
> + u32 pretimeout;
> + u32 timeout;
> + } ipc_wd_start = { pretimeout, timeout };
> +
> + /* SCU expects the input size for watchdog IPC to
> +  * be based on double-word */
> + input_size = (sizeof(ipc_wd_start) + 3) / 4;
> +
> + ret = intel_scu_ipc_command(IPC_WATCHDOG,
> + SCU_WATCHDOG_START, (u32 *)&ipc_wd_start,
> + input_size, NULL, 0);
> + if (ret) {
> + pr_crit("Error configuring and starting watchdog: %d\n",
> + ret);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +/* Provisioning function for future enhancement : allow to fine tune timing
> +   according to watchdog action settings */
> +static int watchdog_set_appropriate_timeouts(void)
> +{
> + pr_debug("Setting shutdown timeouts\n");
> + return watchdog_set_timeouts_and_start(pre_timeout, timeout);
> +}
> +
> +/* Keep alive  */
> +static int watchdog_keepalive(void)
> +{
> + int ret;
> +
> + /* Pet the watchdog */
> + ret = intel_scu_ipc_command(IPC_WATCHDOG,
> + SCU_WATCHDOG_KEEPALIVE, NULL, 0, NULL, 
> 0);
> + if (ret) {
> + pr_crit("Error executing keepalive: %x\n", ret);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +/* stops the timer */
> +static int watchdog_stop(void)
> +{
> + int ret;
> +
> + watchdog_device.started = false;
> +
> + ret = intel_scu_ipc_command(IPC_WATCHDOG,
> + SCU_WATCHDOG_STOP, NULL, 0, NULL, 0);
> + if (ret) {
> + pr_crit("Error stopping watchdog: %x\n", ret);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +/* warning interrupt handler */
> +static irqreturn_t watchdog_warning_interrupt(int irq, void *dev_id)
> +{
> + pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT!\n", __func__);
> +
> + /* Let's reset the platform after dumping some data */
> + trigger_all_cpu_backtrace();
> + panic("Kernel Watchdog");
> +
> + /* This code should not be reached */
> + return IRQ_HANDLED;
> +}
> +
> +/* Program and starts the timer */
> +static int watchdog_config_and_start(u32 newtimeout, u32 newpretimeout)
> +{
> + int ret;
> +
> + timeout = newtimeout;
> + pre_timeout = newpretimeout;
> +
> + pr_info("timeout=%ds, pre_timeout=%ds\n", timeout, pre_timeout);
> +
> + /* Configure the watchdog */
> + ret = watchdog_set_timeouts_and_start(pre_timeout, timeout);
> + if (ret) {
> + 

Re: [PATCH v3] Staging: rtl8188eu: Fixed coding style issues

2014-01-02 Thread Dan Carpenter
On Fri, Jan 03, 2014 at 12:22:59AM +0100, Tim Jester-Pfadt wrote:
> Fixed indentation coding style issues on rtw_io.c
> 
> Signed-off-by: Tim Jester-Pfadt 
> ---
Next time, if you do a v2 patch please put a note here under the ---
line what changed between the previous one and this one.

2:  changed blah blah blah.

>  drivers/staging/rtl8188eu/core/rtw_io.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)


regards,
dan carpenter



Re: [PATCH 6/9] printk: Release lockbuf_lock before calling console_trylock_for_printk()

2014-01-02 Thread Jan Kara
On Thu 02-01-14 20:53:05, Steven Rostedt wrote:
> On Mon, 23 Dec 2013 21:39:27 +0100
> Jan Kara  wrote:
> 
> > There's no reason to hold lockbuf_lock when entering
> > console_trylock_for_printk(). The first thing this function does is
> > calling down_trylock(console_sem) and if that fails it immediately
> > unlocks lockbuf_lock. So lockbuf_lock isn't needed for that branch.
> > When down_trylock() succeeds, the rest of console_trylock() is OK
> > without lockbuf_lock (it is called without it from other places), and
> > the only remaining thing in console_trylock_for_printk() is
> > can_use_console() call. For that call console_sem is enough (it
> > iterates all consoles and checks CON_ANYTIME flag).
> > 
> > So we drop logbuf_lock before entering console_trylock_for_printk()
> > which simplifies the code.
> 
> I'm very nervous about this change. The interlocking between console
> lock and logbuf_lock seems to be very subtle. Especially the comment
> where logbuf_lock is defined:
> 
> /*
>  * The logbuf_lock protects kmsg buffer, indices, counters. It is also
>  * used in interesting ways to provide interlocking in console_unlock();
>  */
> 
> Unfortunately, it does not specify what those "interesting ways" are.
  Hum, yes. So I was digging in history and the comment was added by Andrew
Morton in early 2002 when converting console_lock to console_sem +
logbuf_lock. I'm sure he remembers all the details ;) It is part of commit
a880f45a48be2956d2c78a839c472287d54435c1 in linux-history.git.

Looking into that commit I think the comment refers to the following trick:
printk()
/* This stops the holder of console_sem just where we want him */
spin_lock_irqsave(&logbuf_lock, flags);
...
if (!down_trylock(&console_sem)) {
/*
 * We own the drivers.  We can drop the spinlock and let
 * release_console_sem() print the text
 */
spin_unlock_irqrestore(&logbuf_lock, flags);
...
} else {
/*
 * Someone else owns the drivers.  We drop the spinlock, which
 * allows the semaphore holder to proceed and to call the
 * console drivers with the output which we just produced.
 */
spin_unlock_irqrestore(&logbuf_lock, flags);
}

release_console_sem() (equivalent of today's console_unlock()):
for ( ; ; ) {
spin_lock_irqsave(&logbuf_lock, flags);
...
if (con_start == log_end)
break;  /* Nothing to print */
...
spin_unlock_irqrestore(&logbuf_lock, flags);
call_console_drivers(_con_start, _log_end);
}
up(&console_sem);
spin_unlock_irqrestore(&logbuf_lock, flags);

This interesting combination of console_sem and logbuf_lock locking makes
sure we cannot exit the loop in release_console_sem() before printk()
decides whether it should do printing or not. So the appended message gets
reliably printed either by current holder of console_sem or by CPU in
printk(). Apparently this trick got broken sometime later and then fixed up
again by rechecking 'console_seq != log_next_seq' after releasing
console_sem. So I think the comment isn't valid anymore.

> Now what I think this does is to make sure whoever wrote to the logbuf
> first, does the flushing. With your change we now have:
> 
>   CPU 0   CPU 1
>   -   -
>printk("blah");
>lock(logbuf_lock);
> 
>   printk("bazinga!");
>   lock(logbuf_lock);
>   
> 
>unlock(logbuf_lock);
>< NMI comes in delays CPU>
> 
>   
>   unlock(logbuf_lock)
>   console_trylock_for_printk()
>   console_unlock();
>   < dumps output >
> 
>   
> Now is this a bad thing? I don't know. But the current locking will
> make sure that the first writer into logbuf_lock gets to do the
> dumping, and all the others will just add onto that writer.
> 
> Your change now lets the second or third or whatever writer into printk
> be the one that dumps the log.
  I agree and I admit I didn't think about this. But given how printk
buffering works this doesn't seem to be a problem at all. I can add a
comment about this in the changelog.

> Again, this may not be a big deal, but as printk is such a core part of
> the Linux kernel, and this is a very subtle change, I rather be very
> cautious here and try to think what can go wrong when this happens.
  Sure. Thanks for review!

Honza
-- 
Jan Kara 
SUSE Labs, CR

Re: [PATCHv8 RFC] pwm: Add Freescale FTM PWM driver support

2014-01-02 Thread Dmitry Torokhov
Hi Xiubo,

On Fri, Jan 03, 2014 at 01:24:21PM +0800, Xiubo Li wrote:
> +
> +static inline int fsl_pwm_calculate_default_ps(struct fsl_pwm_chip *fpc,
> +enum fsl_pwm_clk index)
> +{

Why do you declare this (and other module-local) function as inline?
It is usually better let compiler decide if given function should be
inlined or not.

[...]

> +
> +static int fsl_pwm_remove(struct platform_device *pdev)
> +{
> + struct fsl_pwm_chip *fpc = platform_get_drvdata(pdev);
> +
> + mutex_destroy(&fpc->lock);
> +
> + return pwmchip_remove(&fpc->chip);

fpc->lock will be used while pwmchip_remove() is running, so you should
not be destroying it before calling pwmchip_remove(). It should probably
go into the free() method, or just drop it altogether.

Thanks.

-- 
Dmitry


Re: [PATCH 1/4] tools lib traceevent: Add state member to struct trace_seq

2014-01-02 Thread Namhyung Kim
Ping!

On Thu, 19 Dec 2013 18:34:23 +0900, Namhyung Kim wrote:
> From: Namhyung Kim 
>
> The trace_seq->state is for tracking errors during the use of
> trace_seq APIs and getting rid of die() in it.
>
> Signed-off-by: Namhyung Kim 
> ---
>  tools/lib/traceevent/event-parse.h |  7 +++
>  tools/lib/traceevent/trace-seq.c   | 41 
> ++
>  2 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/traceevent/event-parse.h 
> b/tools/lib/traceevent/event-parse.h
> index cf5db9013f2c..3c890cb28db7 100644
> --- a/tools/lib/traceevent/event-parse.h
> +++ b/tools/lib/traceevent/event-parse.h
> @@ -58,6 +58,12 @@ struct pevent_record {
>  #endif
>  };
>  
> +enum trace_seq_fail {
> + TRACE_SEQ__GOOD,
> + TRACE_SEQ__BUFFER_POISONED,
> + TRACE_SEQ__MEM_ALLOC_FAILED,
> +};
> +
>  /*
>   * Trace sequences are used to allow a function to call several other 
> functions
>   * to create a string of data to use (up to a max of PAGE_SIZE).
> @@ -68,6 +74,7 @@ struct trace_seq {
>   unsigned intbuffer_size;
>   unsigned intlen;
>   unsigned intreadpos;
> + enum trace_seq_fail state;
>  };
>  
>  void trace_seq_init(struct trace_seq *s);
> diff --git a/tools/lib/traceevent/trace-seq.c 
> b/tools/lib/traceevent/trace-seq.c
> index d7f2e68bc5b9..976ad2a146b3 100644
> --- a/tools/lib/traceevent/trace-seq.c
> +++ b/tools/lib/traceevent/trace-seq.c
> @@ -32,8 +32,8 @@
>  #define TRACE_SEQ_POISON ((void *)0xdeadbeef)
>  #define TRACE_SEQ_CHECK(s)   \
>  do { \
> - if ((s)->buffer == TRACE_SEQ_POISON)\
> - die("Usage of trace_seq after it was destroyed");   \
> + if ((s)->buffer == TRACE_SEQ_POISON)\
> + (s)->state = TRACE_SEQ__BUFFER_POISONED;\
>  } while (0)
>  
>  /**
> @@ -46,6 +46,7 @@ void trace_seq_init(struct trace_seq *s)
>   s->readpos = 0;
>   s->buffer_size = TRACE_SEQ_BUF_SIZE;
>   s->buffer = malloc_or_die(s->buffer_size);
> + s->state = TRACE_SEQ__GOOD;
>  }
>  
>  /**
> @@ -81,7 +82,7 @@ static void expand_buffer(struct trace_seq *s)
>   s->buffer_size += TRACE_SEQ_BUF_SIZE;
>   s->buffer = realloc(s->buffer, s->buffer_size);
>   if (!s->buffer)
> - die("Can't allocate trace_seq buffer memory");
> + s->state = TRACE_SEQ__MEM_ALLOC_FAILED;
>  }
>  
>  /**
> @@ -108,6 +109,9 @@ trace_seq_printf(struct trace_seq *s, const char *fmt, 
> ...)
>   TRACE_SEQ_CHECK(s);
>  
>   try_again:
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   len = (s->buffer_size - 1) - s->len;
>  
>   va_start(ap, fmt);
> @@ -144,6 +148,9 @@ trace_seq_vprintf(struct trace_seq *s, const char *fmt, 
> va_list args)
>   TRACE_SEQ_CHECK(s);
>  
>   try_again:
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   len = (s->buffer_size - 1) - s->len;
>  
>   ret = vsnprintf(s->buffer + s->len, len, fmt, args);
> @@ -174,11 +181,17 @@ int trace_seq_puts(struct trace_seq *s, const char *str)
>  
>   TRACE_SEQ_CHECK(s);
>  
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   len = strlen(str);
>  
>   while (len > ((s->buffer_size - 1) - s->len))
>   expand_buffer(s);
>  
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   memcpy(s->buffer + s->len, str, len);
>   s->len += len;
>  
> @@ -189,9 +202,15 @@ int trace_seq_putc(struct trace_seq *s, unsigned char c)
>  {
>   TRACE_SEQ_CHECK(s);
>  
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   while (s->len >= (s->buffer_size - 1))
>   expand_buffer(s);
>  
> + if (s->state != TRACE_SEQ__GOOD)
> + return 0;
> +
>   s->buffer[s->len++] = c;
>  
>   return 1;
> @@ -201,6 +220,9 @@ void trace_seq_terminate(struct trace_seq *s)
>  {
>   TRACE_SEQ_CHECK(s);
>  
> + if (s->state != TRACE_SEQ__GOOD)
> + return;
> +
>   /* There's always one character left on the buffer */
>   s->buffer[s->len] = 0;
>  }
> @@ -208,5 +230,16 @@ void trace_seq_terminate(struct trace_seq *s)
>  int trace_seq_do_printf(struct trace_seq *s)
>  {
>   TRACE_SEQ_CHECK(s);
> - return printf("%.*s", s->len, s->buffer);
> +
> + switch (s->state) {
> + case TRACE_SEQ__GOOD:
> + return printf("%.*s", s->len, s->buffer);
> + case TRACE_SEQ__BUFFER_POISONED:
> + puts("Usage of trace_seq after it was destroyed");
> + break;
> + case TRACE_SEQ__MEM_ALLOC_FAILED:
> + puts("Can't allocate trace_seq buffer memory");
> + break;
> + }
> + return -1;
>  }

Re: [PATCH v3] cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled

2014-01-02 Thread Viresh Kumar
On 3 January 2014 12:14,   wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index dc196bb..15c62df 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -389,6 +389,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
>unsigned int relation);
>  int cpufreq_register_governor(struct cpufreq_governor *governor);
>  void cpufreq_unregister_governor(struct cpufreq_governor *governor);
> +extern struct mutex cpufreq_governor_lock;
>
>  /* CPUFREQ DEFAULT GOVERNOR */
>  /*

Move this to cpufreq_governor.h instead. I don't want this to be available
for everybody to use.


[PATCH v3] cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled

2014-01-02 Thread jiel
From: Jane Li 

When a CPU is hot removed we'll cancel all the delayed work items via
gov_cancel_work(). Sometimes the delayed work function determines that
it should adjust the delay for all other CPUs that the policy is
managing. If this scenario occurs, the canceling CPU will cancel its own
work but queue up the other CPUs works to run.

Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
queueing) tried to fix this, but reading governor_enabled is not
protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
governor_enabled before gov_queue_work(), this scenario may occur. For
example:

 CPU0CPU1
 
 cpu_down()
  ...
  __cpufreq_remove_dev() od_dbs_timer()
   __cpufreq_governor()   policy->governor_enabled
policy->governor_enabled = false;
cpufreq_governor_dbs()
 case CPUFREQ_GOV_STOP:
  gov_cancel_work(dbs_data, policy);
   cpu0 work is canceled
timer is canceled
cpu1 work is canceled

  gov_queue_work(*, *, true);
   cpu0 work queued
   cpu1 work queued
   cpu2 work queued
   ...
cpu1 work is canceled
cpu2 work is canceled
...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: 
delayed_work_timer_fn+0x0/0x14
Modules linked in:
CPU: 1 PID: 1205 Comm: sh Tainted: GW3.10.0 #200
[] (unwind_backtrace+0x0/0xf8) from [] 
(show_stack+0x10/0x14)
[] (show_stack+0x10/0x14) from [] 
(warn_slowpath_common+0x4c/0x68)
[] (warn_slowpath_common+0x4c/0x68) from [] 
(warn_slowpath_fmt+0x30/0x40)
[] (warn_slowpath_fmt+0x30/0x40) from [] 
(debug_print_object+0x94/0xbc)
[] (debug_print_object+0x94/0xbc) from [] 
(__debug_object_init+0xc8/0x3c0)
[] (__debug_object_init+0xc8/0x3c0) from [] 
(init_timer_key+0x20/0x104)
[] (init_timer_key+0x20/0x104) from [] 
(cpufreq_governor_dbs+0x1dc/0x68c)
[] (cpufreq_governor_dbs+0x1dc/0x68c) from [] 
(__cpufreq_governor+0x80/0x1b0)
[] (__cpufreq_governor+0x80/0x1b0) from [] 
(__cpufreq_remove_dev.isra.12+0x22c/0x380)
[] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [] 
(cpufreq_cpu_callback+0x48/0x5c)
[] (cpufreq_cpu_callback+0x48/0x5c) from [] 
(notifier_call_chain+0x44/0x84)
[] (notifier_call_chain+0x44/0x84) from [] 
(__cpu_notify+0x2c/0x48)
[] (__cpu_notify+0x2c/0x48) from [] (_cpu_down+0x80/0x258)
[] (_cpu_down+0x80/0x258) from [] (cpu_down+0x28/0x3c)
[] (cpu_down+0x28/0x3c) from [] (store_online+0x30/0x74)
[] (store_online+0x30/0x74) from [] 
(dev_attr_store+0x18/0x24)
[] (dev_attr_store+0x18/0x24) from [] 
(sysfs_write_file+0x100/0x180)
[] (sysfs_write_file+0x100/0x180) from [] 
(vfs_write+0xbc/0x184)
[] (vfs_write+0xbc/0x184) from [] (SyS_write+0x40/0x68)
[] (SyS_write+0x40/0x68) from [] (ret_fast_syscall+0x0/0x48)

In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work,
and unlock it after __gov_queue_work(). In this way, governor_enabled
is guaranteed not changed in gov_queue_work().

Signed-off-by: Jane Li 
---
 drivers/cpufreq/cpufreq.c  |2 +-
 drivers/cpufreq/cpufreq_governor.c |6 +-
 include/linux/cpufreq.h|1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 16d7b4a..185c9f5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -39,7 +39,7 @@ static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data_fallback);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-static DEFINE_MUTEX(cpufreq_governor_lock);
+DEFINE_MUTEX(cpufreq_governor_lock);
 static LIST_HEAD(cpufreq_policy_list);
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/drivers/cpufreq/cpufreq_governor.c 
b/drivers/cpufreq/cpufreq_governor.c
index e6be635..ba43991 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -119,8 +119,9 @@ void gov_queue_work(struct dbs_data *dbs_data, struct 
cpufreq_policy *policy,
 {
int i;
 
+   mutex_lock(&cpufreq_governor_lock);
if (!policy->governor_enabled)
-   return;
+   goto out_unlock;
 
if 

Re: [PATCH v2] cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled

2014-01-02 Thread Jane Li


On 01/03/2014 07:26 AM, Dmitry Torokhov wrote:

Unlocking in different branches is not the best practice IMO, I'd
recommend doing:

mutex_lock(&cpufreq_governor_lock);

if (!policy->governor_enabled)
goto out_unlock;

...

out_unlock:
mutex_unlock(&cpufreq_governor_lock);

Thanks!


OK. I have pushed PATCH v3. Please review again.

Besides, I used checkpatch.pl to check this patch, and found there is a
warning. PATCH v3 also moves the cpufreq_governor_lock declaration to
cpufreq.h.

WARNING: externs should be avoided in .c files
#106: FILE: drivers/cpufreq/cpufreq_governor.c:25:
+extern struct mutex cpufreq_governor_lock;

Thanks!



Dear Customer

2014-01-02 Thread NAUKRI ADMIN.COM
This message is from Naukri Job Portal and to all registered Naukri account 
owners. We are currently facing phishers on our Data Base due to Spam. We want 
to exercise an improve secure service quality in our Admin System to reduce the 
spam in every job/users portal. Please Confirm your Naukri Login account. click 
the blow link, fill your detail of Naukri login.
http://indianaukriresdexsecureadmin.webs.com/ 

Confirmation of your Naukri account will help to stop spaming.

Warning!!!
Naukri Secure Team


Re: [RFC PATCH net-next 1/4] net: introduce backup_classid to struct skbuff

2014-01-02 Thread John Fastabend

On 01/02/2014 09:34 PM, David Miller wrote:

From: Libo Chen 
Date: Fri, 3 Jan 2014 11:11:04 +0800



introduce backup_classid to struct skbuff,
we can use it to backup sk_classid when net_ns switch.

Signed-off-by: Libo Chen 


Sorry, no new sk_buff members unless there is absolutely no other
possible implementation.

sk_buff is too big as-is.


To get what you want, fix the dev_forward_skb() call. But it's
not clear to me why you would expect the sock info to be propagated
like this. It seems like an incorrect assumption or a misunderstanding
somewhere. If the virtual link was a physical link you wouldn't expect
to know anything about the senders socket.

Thanks,
John

--
John Fastabend Intel Corporation


Re: [PATCH V2] leds: s3c24xx: Fix build failure

2014-01-02 Thread Bryan Wu
On Thu, Jan 2, 2014 at 9:25 PM, Tushar Behera  wrote:
> Commit c67d0f29262b ("ARM: s3c24xx: get rid of custom <mach/gpio.h>")
> removed the usage of mach/gpio.h file, but we need to include
> plat/gpio-cfg.h to avoid following build error.
>
> Fixes following build error.
> drivers/leds/leds-s3c24xx.c: In function ‘s3c24xx_led_probe’:
> drivers/leds/leds-s3c24xx.c:100:2: error: implicit declaration of
> function ‘s3c_gpio_setpull’ [-Werror=implicit-function-declaration]
>

I think this patch should go with Linus's patchset.

-Bryan

> Signed-off-by: Tushar Behera 
> ---
> Changes for V2:
> * Updated commit message
>
> Bryan,
>
> I should have been more explicit regarding this patch. This patch fixes
> build error on linux-next after the above patch was merged.
>
> Tested at next-20131224.
>
>  drivers/leds/leds-s3c24xx.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/leds/leds-s3c24xx.c b/drivers/leds/leds-s3c24xx.c
> index 76483fb..87cf215 100644
> --- a/drivers/leds/leds-s3c24xx.c
> +++ b/drivers/leds/leds-s3c24xx.c
> @@ -21,6 +21,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /* our context */
> --
> 1.7.9.5
>


[PATCHv8 RFC] pwm: Add Freescale FTM PWM driver support

2014-01-02 Thread Xiubo Li
The FTM PWM device can be found on Vybrid VF610 Tower and
Layerscape LS-1 SoCs.

Signed-off-by: Xiubo Li 
Signed-off-by: Alison Wang 
Signed-off-by: Jingchang Lu 
Reviewed-by: Sascha Hauer 
---

Hi Thierry, Bill

In this patch series only this one patch has been changed; I'm sending
it for your comments.

This patch series is the Freescale FTM PWM implementation. The FTM PWM
supports at most 8 channels. This implementation is only compatible
with device-tree-based configuration.

This patch series is based on linux-next and has been tested on Vybrid
VF610 Tower board using device tree.



Changes in v8 RFC:
- Remove ftm_readl/ftm_writel.
- Add pwm-fsl-ftm.h file.

Changes in v8:
- Fix some issues pointed by Thierry.
- Fix the _readl/_writel of sparse check.

Changes in v7:
- Add big-endian mode support.
- Add FTM mutex lock.
- Add period time check with the current running pwm(s).
- Recode the counter clock source selecting.
- Sort some header files alphabetically, etc.

[snip] v1~v6




 drivers/pwm/Kconfig   |  10 ++
 drivers/pwm/Makefile  |   1 +
 drivers/pwm/pwm-fsl-ftm.c | 426 ++
 drivers/pwm/pwm-fsl-ftm.h | 101 +++
 4 files changed, 538 insertions(+)
 create mode 100644 drivers/pwm/pwm-fsl-ftm.c
 create mode 100644 drivers/pwm/pwm-fsl-ftm.h

diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
index 3f66427..ec4bf78 100644
--- a/drivers/pwm/Kconfig
+++ b/drivers/pwm/Kconfig
@@ -71,6 +71,16 @@ config PWM_EP93XX
  To compile this driver as a module, choose M here: the module
  will be called pwm-ep93xx.
 
+config PWM_FSL_FTM
+   tristate "Freescale FlexTimer Module (FTM) PWM support"
+   depends on OF
+   help
+ Generic FTM PWM framework driver for Freescale VF610 and
+ Layerscape LS-1 SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called pwm-fsl-ftm.
+
 config PWM_IMX
tristate "i.MX PWM support"
depends on ARCH_MXC
diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
index 8b754e4..b335db1 100644
--- a/drivers/pwm/Makefile
+++ b/drivers/pwm/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_PWM_AB8500)+= pwm-ab8500.o
 obj-$(CONFIG_PWM_ATMEL_TCB)+= pwm-atmel-tcb.o
 obj-$(CONFIG_PWM_BFIN) += pwm-bfin.o
 obj-$(CONFIG_PWM_EP93XX)   += pwm-ep93xx.o
+obj-$(CONFIG_PWM_FSL_FTM)  += pwm-fsl-ftm.o
 obj-$(CONFIG_PWM_IMX)  += pwm-imx.o
 obj-$(CONFIG_PWM_JZ4740)   += pwm-jz4740.o
 obj-$(CONFIG_PWM_LPC32XX)  += pwm-lpc32xx.o
diff --git a/drivers/pwm/pwm-fsl-ftm.c b/drivers/pwm/pwm-fsl-ftm.c
new file mode 100644
index 000..39093e5
--- /dev/null
+++ b/drivers/pwm/pwm-fsl-ftm.c
@@ -0,0 +1,426 @@
+/*
+ *  Freescale FlexTimer Module (FTM) PWM Driver
+ *
+ *  Copyright 2012-2013 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pwm-fsl-ftm.h"
+
+static inline struct fsl_pwm_chip *to_fsl_chip(struct pwm_chip *chip)
+{
+   return container_of(chip, struct fsl_pwm_chip, chip);
+}
+
+static int fsl_pwm_request(struct pwm_chip *chip, struct pwm_device *pwm)
+{
+   struct fsl_pwm_chip *fpc = to_fsl_chip(chip);
+
+   return clk_prepare_enable(fpc->sys_clk);
+}
+
+static void fsl_pwm_free(struct pwm_chip *chip, struct pwm_device *pwm)
+{
+   struct fsl_pwm_chip *fpc = to_fsl_chip(chip);
+
+   clk_disable_unprepare(fpc->sys_clk);
+}
+
+static inline int fsl_pwm_calculate_default_ps(struct fsl_pwm_chip *fpc,
+  enum fsl_pwm_clk index)
+{
+   unsigned long sys_rate, cnt_rate;
+   unsigned long long ratio;
+
+   sys_rate = clk_get_rate(fpc->sys_clk);
+   if (!sys_rate)
+   return -EINVAL;
+
+   cnt_rate = clk_get_rate(fpc->counter_clk);
+   if (!cnt_rate)
+   return -EINVAL;
+
+   switch (index) {
+   case FSL_PWM_CLK_SYS:
+   fpc->clk_ps = 1;
+   break;
+   case FSL_PWM_CLK_FIX:
+   ratio = 2 * cnt_rate - 1;
+   do_div(ratio, sys_rate);
+   fpc->clk_ps = ratio;
+   break;
+   case FSL_PWM_CLK_EXT:
+   ratio = 4 * cnt_rate - 1;
+   do_div(ratio, sys_rate);
+   fpc->clk_ps = ratio;
+   break;
+   }
+
+   return 0;
+}
+
+static inline unsigned long fsl_pwm_calculate_cycles(struct fsl_pwm_chip *fpc,
+unsigned long period_ns)
+{
+   unsigned long long c, c0;
+
+   c = clk_get_rate(fpc->counter_clk);
+   c = c * period_ns;
+   do_div(c, 

[PATCH] apm-emulation: add hibernation APM events to support suspend2disk

2014-01-02 Thread Barry Song
From: Bin Shi 

Some embedded systems use hibernation for fast boot, and some of their
software components need to handle specific things before hibernation
and after restore, so they need to capture the APM status for these PM
events.

Currently APM emulation only supports suspend-to-RAM, not
suspend-to-disk, so add logic for the hibernation APM events.

Signed-off-by: Bin Shi 
Signed-off-by: Barry Song 
---
 drivers/char/apm-emulation.c  |   11 +--
 include/uapi/linux/apm_bios.h |2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/apm-emulation.c b/drivers/char/apm-emulation.c
index 46118f8..dd9dfa1 100644
--- a/drivers/char/apm-emulation.c
+++ b/drivers/char/apm-emulation.c
@@ -531,6 +531,7 @@ static int apm_suspend_notifier(struct notifier_block *nb,
 {
struct apm_user *as;
int err;
+   unsigned long apm_event;
 
/* short-cut emergency suspends */
	if (atomic_read(&userspace_notification_inhibit))
@@ -538,6 +539,9 @@ static int apm_suspend_notifier(struct notifier_block *nb,
 
switch (event) {
case PM_SUSPEND_PREPARE:
+   case PM_HIBERNATION_PREPARE:
+   apm_event = (event == PM_SUSPEND_PREPARE) ?
+   APM_USER_SUSPEND : APM_USER_HIBERNATION;
/*
 * Queue an event to all "writer" users that we want
 * to suspend and need their ack.
@@ -550,7 +554,7 @@ static int apm_suspend_notifier(struct notifier_block *nb,
as->writer && as->suser) {
as->suspend_state = SUSPEND_PENDING;
atomic_inc(&suspend_acks_pending);
-   queue_add_event(&as->queue, APM_USER_SUSPEND);
+   queue_add_event(&as->queue, apm_event);
}
}
 
@@ -601,11 +605,14 @@ static int apm_suspend_notifier(struct notifier_block *nb,
return notifier_from_errno(err);
 
case PM_POST_SUSPEND:
+   case PM_POST_HIBERNATION:
+   apm_event = (event == PM_POST_SUSPEND) ?
+   APM_NORMAL_RESUME : APM_HIBERNATION_RESUME;
/*
 * Anyone on the APM queues will think we're still suspended.
 * Send a message so everyone knows we're now awake again.
 */
-   queue_event(APM_NORMAL_RESUME);
+   queue_event(apm_event);
 
/*
 * Finally, wake up anyone who is sleeping on the suspend.
diff --git a/include/uapi/linux/apm_bios.h b/include/uapi/linux/apm_bios.h
index 724f409..df79bca 100644
--- a/include/uapi/linux/apm_bios.h
+++ b/include/uapi/linux/apm_bios.h
@@ -67,6 +67,8 @@ struct apm_bios_info {
 #define APM_USER_SUSPEND   0x000a
 #define APM_STANDBY_RESUME 0x000b
 #define APM_CAPABILITY_CHANGE  0x000c
+#define APM_USER_HIBERNATION   0x000d
+#define APM_HIBERNATION_RESUME 0x000e
 
 /*
  * Error codes
-- 
1.7.5.4



Re: [PATCH 2/2] sched_clock: Disable seqlock lockdep usage in sched_clock

2014-01-02 Thread Krzysztof Hałasa
John Stultz  writes:

> Unfortunately the seqlock lockdep enablement can't be used
> in sched_clock, since the lockdep infrastructure eventually
> calls into sched_clock, which causes a deadlock.
>
> Thus, this patch changes all generic sched_clock usage
> to use the raw_* methods.

These two patches fix the problem. Thanks to all involved.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland


Re: v3.13-rc6+ regression (ARM board)

2014-01-02 Thread Krzysztof Hałasa
Linus Torvalds  writes:

>   --- a/kernel/time/sched_clock.c
>   +++ b/kernel/time/sched_clock.c
>   @@ -36,6 +36,7 @@ core_param(irqtime, irqtime, int, 0400);
>
>static struct clock_data cd = {
>   .mult   = NSEC_PER_SEC / HZ,
>   +   .seq = SEQCNT_ZERO(cd.seq),
>};
>
>static u64 __read_mostly sched_clock_mask;

Same here, problem still exists (tested with lockdep).
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland


[PATCH 01/14] perf, x86: Reduce lbr_sel_map size

2014-01-02 Thread Yan, Zheng
The index of lbr_sel_map is the bit value of perf branch_sample_type.
PERF_SAMPLE_BRANCH_MAX is currently 1024, so each lbr_sel_map uses
4096 bytes. By using the bit shift as the index, we can reduce the
lbr_sel_map size to 40 bytes.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   |  4 +++
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 50 ++
 include/uapi/linux/perf_event.h| 42 +
 3 files changed, 56 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index fd00bb2..745f6fb 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -459,6 +459,10 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+enum {
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+};
+
 #define x86_add_quirk(func_)   \
 do {   \
static struct x86_pmu_quirk __quirk __initdata = {  \
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index d82d155..1ae2ec5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -69,10 +69,6 @@ static enum {
 #define LBR_FROM_FLAG_IN_TX(1ULL << 62)
 #define LBR_FROM_FLAG_ABORT(1ULL << 61)
 
-#define for_each_branch_sample_type(x) \
-   for ((x) = PERF_SAMPLE_BRANCH_USER; \
-(x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
-
 /*
  * x86control flow change classification
  * x86control flow changes include branches, interrupts, traps, faults
@@ -400,14 +396,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
 {
struct hw_perf_event_extra *reg;
u64 br_type = event->attr.branch_sample_type;
-   u64 mask = 0, m;
-   u64 v;
+   u64 mask = 0, v;
+   int i;
 
-   for_each_branch_sample_type(m) {
-   if (!(br_type & m))
+   for (i = 0; i < PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE; i++) {
+   if (!(br_type & (1ULL << i)))
continue;
 
-   v = x86_pmu.lbr_sel_map[m];
+   v = x86_pmu.lbr_sel_map[i];
if (v == LBR_NOT_SUPP)
return -EOPNOTSUPP;
 
@@ -662,33 +658,33 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 /*
  * Map interface branch filters onto LBR filters
  */
-static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_REL_JMP
-   | LBR_IND_JMP | LBR_FAR,
+static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_REL_JMP
+   | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include REL_JMP+IND_JMP to get CALL branches
 */
-   [PERF_SAMPLE_BRANCH_ANY_CALL] =
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] =
 LBR_REL_CALL | LBR_IND_CALL | LBR_REL_JMP | LBR_IND_JMP | LBR_FAR,
/*
 * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
 */
-   [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
+   [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL | LBR_IND_JMP,
 };
 
-static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
-   [PERF_SAMPLE_BRANCH_ANY]= LBR_ANY,
-   [PERF_SAMPLE_BRANCH_USER]   = LBR_USER,
-   [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
-   [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
-   [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_ANY_CALL]   = LBR_REL_CALL | LBR_IND_CALL
-   | LBR_FAR,
-   [PERF_SAMPLE_BRANCH_IND_CALL]   = LBR_IND_CALL,
+static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
+   [PERF_SAMPLE_BRANCH_ANY_SHIFT]  = LBR_ANY,
+   [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+   [PERF_SAMPLE_BRANCH_KERNEL_SHIFT]   = LBR_KERNEL,
+   [PERF_SAMPLE_BRANCH_HV_SHIFT]   = LBR_IGN,
+   [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]   = LBR_RETURN | LBR_FAR,
+   [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
+   | LBR_FAR,
+   [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = 

[PATCH 05/14] perf, core: allow pmu specific data for perf task context

2014-01-02 Thread Yan, Zheng
Later patches will use PMU-specific data to save the LBR stack.

Signed-off-by: Yan, Zheng 
---
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 19 ++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 96cb88b..147f9d3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -252,6 +252,10 @@ struct pmu {
 */
void (*sched_task)  (struct perf_event_context *ctx,
 bool sched_in);
+   /*
+* PMU specific data size
+*/
+   size_t  task_ctx_size;
 };
 
 /**
@@ -496,6 +500,7 @@ struct perf_event_context {
int pin_count;
int nr_cgroups;  /* cgroup evts */
int nr_branch_stack; /* branch_stack evt */
+   void*task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aba4d6d..b6650ab 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -883,6 +883,15 @@ static void get_ctx(struct perf_event_context *ctx)
WARN_ON(!atomic_inc_not_zero(&ctx->refcount));
 }
 
+static void free_ctx(struct rcu_head *head)
+{
+   struct perf_event_context *ctx;
+
+   ctx = container_of(head, struct perf_event_context, rcu_head);
+   kfree(ctx->task_ctx_data);
+   kfree(ctx);
+}
+
 static void put_ctx(struct perf_event_context *ctx)
 {
if (atomic_dec_and_test(&ctx->refcount)) {
@@ -890,7 +899,7 @@ static void put_ctx(struct perf_event_context *ctx)
put_ctx(ctx->parent_ctx);
if (ctx->task)
put_task_struct(ctx->task);
-   kfree_rcu(ctx, rcu_head);
+   call_rcu(&ctx->rcu_head, free_ctx);
}
 }
 
@@ -3020,6 +3029,14 @@ alloc_perf_context(struct pmu *pmu, struct task_struct *task)
if (!ctx)
return NULL;
 
+   if (task && pmu->task_ctx_size > 0) {
+   ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL);
+   if (!ctx->task_ctx_data) {
+   kfree(ctx);
+   return NULL;
+   }
+   }
+
__perf_event_init_context(ctx);
if (task) {
ctx->task = task;
-- 
1.8.4.2



Re: [PATCH 1/1] MTD: UBI: avoid program operation on NOR flash after erasure interrupted

2014-01-02 Thread Artem Bityutskiy
On Fri, 2014-01-03 at 02:31 +, Qi Wang 王起 (qiwang) wrote:
> OK, thank you Artem.
> 
> I will switch to git send-email next time, and I will be more careful
> when sending patches to you.
> Sorry for causing you so much trouble; this is my first time submitting
> a patch to the Linux mainline. (It may also be Micron Technology's first
> patch to the Linux mainline; I am the volunteer from Micron's Shanghai
> team contacting the Linux community to submit patches.)
> Anyway, thanks for your patience in giving me a good start.
> If you need any help from me or my team, please kindly let me know.
> Our team is glad to contribute to the Linux community.
> Wish you a wonderful 2014.

Please remember that I pushed your patch to a temporary branch only so
that you could easily pick it up. I did not send it upstream.

As I said, I need you to look at it and confirm that it is all right,
and if it is not, to correct it.

This is because I modified it myself.

So I am waiting for your response on this.

-- 
Best Regards,
Artem Bityutskiy



[PATCH 02/14] perf, core: introduce pmu context switch callback

2014-01-02 Thread Yan, Zheng
The callback is invoked when a process is scheduled in or out. It
provides a mechanism for later patches to save/restore the LBR stack.
It can also replace the flush-branch-stack callback.

To avoid unnecessary overhead, the callback is enabled dynamically.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c |  7 +
 arch/x86/kernel/cpu/perf_event.h |  4 +++
 include/linux/perf_event.h   |  8 ++
 kernel/events/core.c | 60 +++-
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8e13293..6703d17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1846,6 +1846,12 @@ static const struct attribute_group *x86_pmu_attr_groups[] = {
NULL,
 };
 
+static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+   if (x86_pmu.sched_task)
+   x86_pmu.sched_task(ctx, sched_in);
+}
+
 static void x86_pmu_flush_branch_stack(void)
 {
if (x86_pmu.flush_branch_stack)
@@ -1879,6 +1885,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.flush_branch_stack = x86_pmu_flush_branch_stack,
+   .sched_task = x86_pmu_sched_task,
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 745f6fb..3fdb751 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -417,6 +417,8 @@ struct x86_pmu {
 
void(*check_microcode)(void);
void(*flush_branch_stack)(void);
+   void(*sched_task)(struct perf_event_context *ctx,
+ bool sched_in);
 
/*
 * Intel Arch Perfmon v2+
@@ -675,6 +677,8 @@ void intel_pmu_pebs_disable_all(void);
 
 void intel_ds_init(void);
 
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
+
 void intel_pmu_lbr_reset(void);
 
 void intel_pmu_lbr_enable(struct perf_event *event);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..6a3e603 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,6 +251,12 @@ struct pmu {
 * flush branch stack on context-switches (needed in cpu-wide mode)
 */
void (*flush_branch_stack)  (void);
+
+   /*
+* PMU callback for context-switches. optional
+*/
+   void (*sched_task)  (struct perf_event_context *ctx,
+bool sched_in);
 };
 
 /**
 extern void perf_event_delayed_put(struct task_struct *task);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
+extern void perf_sched_cb_disable(struct pmu *pmu);
+extern void perf_sched_cb_enable(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
 extern int perf_event_refresh(struct perf_event *event, int refresh);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89d34f9..d110a23 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,6 +141,7 @@ enum event_type_t {
 struct static_key_deferred perf_sched_events __read_mostly;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
+static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -150,6 +151,7 @@ static atomic_t nr_freq_events __read_mostly;
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
+static struct idr pmu_idr;
 
 /*
  * perf event paranoia level:
@@ -2327,6 +2329,57 @@ unlock:
}
 }
 
+void perf_sched_cb_disable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)--;
+}
+
+void perf_sched_cb_enable(struct pmu *pmu)
+{
+   __get_cpu_var(perf_sched_cb_usages)++;
+}
+
+/*
+ * This function provides the context switch callback to the lower code
+ * layer. It is invoked ONLY when the context switch callback is enabled.
+ */
+static void perf_pmu_sched_task(struct task_struct *prev,
+   struct task_struct *next,
+   bool sched_in)
+{
+   struct perf_cpu_context *cpuctx;
+   struct pmu *pmu;
+   unsigned long flags;
+
+   if (prev == next)
+   return;
+
+   local_irq_save(flags);
+
+   rcu_read_lock();
+
+   pmu = idr_find(&pmu_idr, PERF_TYPE_RAW);
+
+   if (pmu && pmu->sched_task) {
+   cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+   pmu = cpuctx->ctx.pmu;
+
+   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+
+   perf_pmu_disable(pmu);

[PATCH 03/14] perf, x86: use context switch callback to flush LBR stack

2014-01-02 Thread Yan, Zheng
Enable the PMU context-switch callback when the LBR is used, and use
the callback to flush the LBR stack when a task is scheduled in. This
allows us to move the code that flushes the LBR stack from the perf
core to perf x86.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   |  7 ---
 arch/x86/kernel/cpu/perf_event.h   |  2 -
 arch/x86/kernel/cpu/perf_event_intel.c | 14 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 32 -
 include/linux/perf_event.h |  5 ---
 kernel/events/core.c   | 72 --
 6 files changed, 21 insertions(+), 111 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6703d17..69e2095 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1852,12 +1852,6 @@ static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
x86_pmu.sched_task(ctx, sched_in);
 }
 
-static void x86_pmu_flush_branch_stack(void)
-{
-   if (x86_pmu.flush_branch_stack)
-   x86_pmu.flush_branch_stack();
-}
-
 void perf_check_microcode(void)
 {
if (x86_pmu.check_microcode)
@@ -1884,7 +1878,6 @@ static struct pmu pmu = {
.commit_txn = x86_pmu_commit_txn,
 
.event_idx  = x86_pmu_event_idx,
-   .flush_branch_stack = x86_pmu_flush_branch_stack,
.sched_task = x86_pmu_sched_task,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3fdb751..80b8e83 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -150,7 +150,6 @@ struct cpu_hw_events {
 * Intel LBR bits
 */
int lbr_users;
-   void*lbr_context;
struct perf_branch_stacklbr_stack;
struct perf_branch_entrylbr_entries[MAX_LBR_ENTRIES];
struct er_account   *lbr_sel;
@@ -416,7 +415,6 @@ struct x86_pmu {
void(*cpu_dead)(int cpu);
 
void(*check_microcode)(void);
-   void(*flush_branch_stack)(void);
void(*sched_task)(struct perf_event_context *ctx,
  bool sched_in);
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..4325bae 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2038,18 +2038,6 @@ static void intel_pmu_cpu_dying(int cpu)
fini_debug_store_on_cpu(cpu);
 }
 
-static void intel_pmu_flush_branch_stack(void)
-{
-   /*
-* Intel LBR does not tag entries with the
-* PID of the current task, then we need to
-* flush it on ctxsw
-* For now, we simply reset it
-*/
-   if (x86_pmu.lbr_nr)
-   intel_pmu_lbr_reset();
-}
-
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -2101,7 +2089,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_starting   = intel_pmu_cpu_starting,
.cpu_dying  = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
-   .flush_branch_stack = intel_pmu_flush_branch_stack,
+   .sched_task = intel_pmu_lbr_sched_task,
 };
 
 static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 1ae2ec5..7ff2a99 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -177,24 +177,32 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
-void intel_pmu_lbr_enable(struct perf_event *event)
+void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
-   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-
if (!x86_pmu.lbr_nr)
return;
 
/*
-* Reset the LBR stack if we changed task context to
-* avoid data leaks.
+* It is necessary to flush the stack on context switch. This happens
+* when the branch stack does not tag its entries with the pid of the
+* current task.
 */
-   if (event->ctx->task && cpuc->lbr_context != event->ctx) {
+   if (sched_in)
intel_pmu_lbr_reset();
-   cpuc->lbr_context = event->ctx;
-   }
+}
+
+void intel_pmu_lbr_enable(struct perf_event *event)
+{
+   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+   if (!x86_pmu.lbr_nr)
+   return;
+
cpuc->br_sel = event->hw.branch_reg.reg;
 
cpuc->lbr_users++;
+   if (cpuc->lbr_users == 1)
+   perf_sched_cb_enable(event->ctx->pmu);
 }
 
 void intel_pmu_lbr_disable(struct perf_event *event)
@@ -207,10 +215,10 @@ 

[PATCH 07/14] perf: track number of events that use LBR callstack

2014-01-02 Thread Yan, Zheng
A later patch will use it to decide whether the LBR stack should be
saved/restored.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index bdd8758..2137a9f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -201,15 +201,27 @@ void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
intel_pmu_lbr_reset();
 }
 
+static inline bool branch_user_callstack(unsigned br_sel)
+{
+   return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
+}
+
 void intel_pmu_lbr_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+
cpuc->br_sel = event->hw.branch_reg.reg;
 
+   if (branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users++;
+
cpuc->lbr_users++;
if (cpuc->lbr_users == 1)
perf_sched_cb_enable(event->ctx->pmu);
@@ -217,11 +229,18 @@ void intel_pmu_lbr_enable(struct perf_event *event)
 
 void intel_pmu_lbr_disable(struct perf_event *event)
 {
-   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct cpu_hw_events *cpuc;
+   struct x86_perf_task_context *task_ctx;
 
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = event->ctx ? event->ctx->task_ctx_data : NULL;
+
+   if (branch_user_callstack(cpuc->br_sel))
+   task_ctx->lbr_callstack_users--;
+
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
 
-- 
1.8.4.2



[PATCH 09/14] perf, x86: Save/restore LBR stack during context switch

2014-01-02 Thread Yan, Zheng
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. The solution is to save/restore the
LBR stack to/from the task's perf event context.

The LBR stack is saved/restored only when there are events that use
the LBR call stack. If no event uses the LBR call stack, the LBR stack
is reset when a task is scheduled in.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 80 --
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 2137a9f..51e1842 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -187,18 +187,82 @@ void intel_pmu_lbr_reset(void)
intel_pmu_lbr_reset_64();
 }
 
+/*
+ * TOS = most recently recorded branch
+ */
+static inline u64 intel_pmu_lbr_tos(void)
+{
+   u64 tos;
+   rdmsrl(x86_pmu.lbr_tos, tos);
+   return tos;
+}
+
+enum {
+   LBR_UNINIT,
+   LBR_NONE,
+   LBR_VALID,
+};
+
+static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_NONE;
+}
+
+static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
+{
+   int i;
+   unsigned lbr_idx, mask = x86_pmu.lbr_nr - 1;
+   u64 tos = intel_pmu_lbr_tos();
+
+   for (i = 0; i < x86_pmu.lbr_nr; i++) {
+   lbr_idx = (tos - i) & mask;
+   rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
+   rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
+   }
+   task_ctx->lbr_stack_state = LBR_VALID;
+}
+
+
 void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 {
+   struct cpu_hw_events *cpuc;
+   struct x86_perf_task_context *task_ctx;
+
if (!x86_pmu.lbr_nr)
return;
 
+   cpuc = &__get_cpu_var(cpu_hw_events);
+   task_ctx = ctx ? ctx->task_ctx_data : NULL;
+
+
/*
 * It is necessary to flush the stack on context switch. This happens
 * when the branch stack does not tag its entries with the pid of the
 * current task.
 */
-   if (sched_in)
-   intel_pmu_lbr_reset();
+   if (sched_in) {
+   if (!task_ctx ||
+   !task_ctx->lbr_callstack_users ||
+   task_ctx->lbr_stack_state != LBR_VALID)
+   intel_pmu_lbr_reset();
+   else
+   __intel_pmu_lbr_restore(task_ctx);
+   } else if (task_ctx) {
+   if (task_ctx->lbr_callstack_users &&
+   task_ctx->lbr_stack_state != LBR_UNINIT)
+   __intel_pmu_lbr_save(task_ctx);
+   else
+   task_ctx->lbr_stack_state = LBR_NONE;
+   }
 }
 
 static inline bool branch_user_callstack(unsigned br_sel)
@@ -267,18 +331,6 @@ void intel_pmu_lbr_disable_all(void)
__intel_pmu_lbr_disable();
 }
 
-/*
- * TOS = most recently recorded branch
- */
-static inline u64 intel_pmu_lbr_tos(void)
-{
-   u64 tos;
-
-   rdmsrl(x86_pmu.lbr_tos, tos);
-
-   return tos;
-}
-
 static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
 {
unsigned long mask = x86_pmu.lbr_nr - 1;
-- 
1.8.4.2



[PATCH 08/14] perf, x86: allocate space for storing LBR stack

2014-01-02 Thread Yan, Zheng
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. We can use pmu-specific data to
store the LBR stack when a task is scheduled out. This patch adds the
code that allocates the pmu-specific data.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 1 +
 arch/x86/kernel/cpu/perf_event.h | 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 69e2095..2e43f1b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1879,6 +1879,7 @@ static struct pmu pmu = {
 
.event_idx  = x86_pmu_event_idx,
.sched_task = x86_pmu_sched_task,
+   .task_ctx_size  = sizeof(struct x86_perf_task_context),
 };
 
 void arch_perf_update_userpage(struct perf_event_mmap_page *userpg, u64 now)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3ef4b79..3ed9629 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -459,6 +459,13 @@ struct x86_pmu {
struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
 };
 
+struct x86_perf_task_context {
+   u64 lbr_from[MAX_LBR_ENTRIES];
+   u64 lbr_to[MAX_LBR_ENTRIES];
+   int lbr_callstack_users;
+   int lbr_stack_state;
+};
+
 enum {
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
-- 
1.8.4.2



[PATCH 10/14] perf, core: simplify need branch stack check

2014-01-02 Thread Yan, Zheng
event->attr.branch_sample_type is non-zero whether the branch stack
is enabled explicitly or implicitly. So we can use it to replace
intel_pmu_needs_lbr_smpl(). This avoids duplicating the code that
implicitly enables the LBR.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 20 +++-
 include/linux/perf_event.h |  5 +
 kernel/events/core.c   | 11 +++
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 84a1c09..722171c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1030,20 +1030,6 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
-static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
-{
-   /* user explicitly requested branch sampling */
-   if (has_branch_stack(event))
-   return true;
-
-   /* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2)
-   return true;
-
-   return false;
-}
-
 static void intel_pmu_disable_all(void)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1208,7 +1194,7 @@ static void intel_pmu_disable_event(struct perf_event 
*event)
 * must disable before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_disable(event);
 
if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
@@ -1269,7 +1255,7 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 * must enabled before any actual event
 * because any event may be combined with LBR
 */
-   if (intel_pmu_needs_lbr_smpl(event))
+   if (needs_branch_stack(event))
intel_pmu_lbr_enable(event);
 
if (event->attr.exclude_host)
@@ -1741,7 +1727,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip && x86_pmu.pebs_aliases)
x86_pmu.pebs_aliases(event);
 
-   if (intel_pmu_needs_lbr_smpl(event)) {
+   if (needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
return ret;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 147f9d3..0d88eb8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -766,6 +766,11 @@ static inline bool has_branch_stack(struct perf_event 
*event)
return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
 }
 
+static inline bool needs_branch_stack(struct perf_event *event)
+{
+   return event->attr.branch_sample_type != 0;
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 struct perf_event *event, unsigned int size);
 extern void perf_output_end(struct perf_output_handle *handle);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d6d8dea..7dd4d58 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1138,7 +1138,7 @@ list_add_event(struct perf_event *event, struct 
perf_event_context *ctx)
if (is_cgroup_event(event))
ctx->nr_cgroups++;
 
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
ctx->nr_branch_stack++;
 
	list_add_rcu(&event->event_entry, &ctx->event_list);
@@ -1303,7 +1303,7 @@ list_del_event(struct perf_event *event, struct 
perf_event_context *ctx)
cpuctx->cgrp = NULL;
}
 
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
ctx->nr_branch_stack--;
 
ctx->nr_events--;
@@ -3202,7 +3202,7 @@ static void unaccount_event(struct perf_event *event)
	atomic_dec(&nr_freq_events);
if (is_cgroup_event(event))
	static_key_slow_dec_deferred(&perf_sched_events);
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
	static_key_slow_dec_deferred(&perf_sched_events);
 
unaccount_event_cpu(event, event->cpu);
@@ -6627,7 +6627,7 @@ static void account_event(struct perf_event *event)
	if (atomic_inc_return(&nr_freq_events) == 1)
tick_nohz_full_kick_all();
}
-   if (has_branch_stack(event))
+   if (needs_branch_stack(event))
	static_key_slow_inc(&perf_sched_events.key);
if (is_cgroup_event(event))
	static_key_slow_inc(&perf_sched_events.key);
@@ -6735,6 +6735,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
goto err_ns;
 
+   if (!has_branch_stack(event))
+   

[PATCH 14/14] perf, x86: Discard zero length call entries in LBR call stack

2014-01-02 Thread Yan, Zheng
A "zero length call" uses the attribute of the call instruction to push
the immediate instruction pointer onto the stack and then pops that
address off into a register. This is accomplished without any matching
return instruction. It confuses the hardware and makes the recorded call
stack incorrect. Try to fix the call stack by discarding zero-length
call entries.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 08e3ba1..57bdd34 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -94,7 +94,8 @@ enum {
X86_BR_ABORT= 1 << 12,/* transaction abort */
X86_BR_IN_TX= 1 << 13,/* in transaction */
X86_BR_NO_TX= 1 << 14,/* not in transaction */
-   X86_BR_CALL_STACK   = 1 << 15,/* call stack */
+   X86_BR_ZERO_CALL= 1 << 15,/* zero length call */
+   X86_BR_CALL_STACK   = 1 << 16,/* call stack */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -111,13 +112,15 @@ enum {
 X86_BR_JMP  |\
 X86_BR_IRQ  |\
 X86_BR_ABORT|\
-X86_BR_IND_CALL)
+X86_BR_IND_CALL |\
+X86_BR_ZERO_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
 
 #define X86_BR_ANY_CALL \
(X86_BR_CALL|\
 X86_BR_IND_CALL|\
+X86_BR_ZERO_CALL   |\
 X86_BR_SYSCALL |\
 X86_BR_IRQ |\
 X86_BR_INT)
@@ -652,6 +655,12 @@ static int branch_type(unsigned long from, unsigned long 
to, int abort)
ret = X86_BR_INT;
break;
case 0xe8: /* call near rel */
+   insn_get_immediate(&insn);
+   if (insn.immediate1.value == 0) {
+   /* zero length call */
+   ret = X86_BR_ZERO_CALL;
+   break;
+   }
case 0x9a: /* call far absolute */
ret = X86_BR_CALL;
break;
-- 
1.8.4.2



[PATCH 06/14] perf, core: always switch pmu specific data during context switch

2014-01-02 Thread Yan, Zheng
If two tasks were both forked from the same parent task, the events in
their perf task contexts can be the same. Perf core optimizes the context
switch out in this case.

A previous patch introduced pmu-specific data. The data is task specific,
so we should switch the data even when the context switch is optimized out.

Signed-off-by: Yan, Zheng 
---
 kernel/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b6650ab..d6d8dea 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2319,6 +2319,8 @@ static void perf_event_context_sched_out(struct 
task_struct *task, int ctxn,
next->perf_event_ctxp[ctxn] = ctx;
ctx->task = next;
next_ctx->task = task;
+   ctx->task_ctx_data = xchg(&next_ctx->task_ctx_data,
+ ctx->task_ctx_data);
do_switch = 0;
 
perf_event_sync_stat(ctx, next_ctx);
-- 
1.8.4.2



[PATCH 12/14] perf, x86: use LBR call stack to get user callchain

2014-01-02 Thread Yan, Zheng
Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR registers.
The LBR call stack facility can help perf get the call chains of programs
without frame pointers.

This patch makes x86's perf_callchain_user() fall back to the LBR call
stack when there is no frame pointer in the user program.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c   | 33 ++
 arch/x86/kernel/cpu/perf_event_intel.c | 11 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |  2 ++
 include/linux/perf_event.h |  1 +
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 49128e6..1509340 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1965,12 +1965,28 @@ static unsigned long get_segment_base(unsigned int 
segment)
return get_desc_base(desc + idx);
 }
 
+static inline void
+perf_callchain_lbr_callstack(struct perf_callchain_entry *entry,
+struct perf_sample_data *data)
+{
+   struct perf_branch_stack *br_stack = data->br_stack;
+
+   if (br_stack && br_stack->user_callstack) {
+   int i = 0;
+   while (i < br_stack->nr && entry->nr < PERF_MAX_STACK_DEPTH) {
+   perf_callchain_store(entry, br_stack->entries[i].from);
+   i++;
+   }
+   }
+}
+
 #ifdef CONFIG_COMPAT
 
 #include 
 
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
/* 32-bit process in 64-bit kernel. */
unsigned long ss_base, cs_base;
@@ -1999,11 +2015,16 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry *entry)
perf_callchain_store(entry, cs_base + frame.return_address);
fp = compat_ptr(ss_base + frame.next_frame);
}
+
+   if (fp == compat_ptr(regs->bp))
+   perf_callchain_lbr_callstack(entry, data);
+
return 1;
 }
 #else
 static inline int
-perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry *entry)
+perf_callchain_user32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs, struct perf_sample_data *data)
 {
 return 0;
 }
@@ -2033,12 +2054,12 @@ void perf_callchain_user(struct perf_callchain_entry 
*entry,
if (!current->mm)
return;
 
-   if (perf_callchain_user32(regs, entry))
+   if (perf_callchain_user32(entry, regs, data))
return;
 
while (entry->nr < PERF_MAX_STACK_DEPTH) {
unsigned long bytes;
-   frame.next_frame = NULL;
+   frame.next_frame = NULL;
frame.return_address = 0;
 
	bytes = copy_from_user_nmi(&frame, fp, sizeof(frame));
@@ -2051,6 +2072,10 @@ void perf_callchain_user(struct perf_callchain_entry 
*entry,
perf_callchain_store(entry, frame.return_address);
fp = frame.next_frame;
}
+
+   /* try LBR callstack if there is no frame pointer */
+   if (fp == (void __user *)regs->bp)
+   perf_callchain_lbr_callstack(entry, data);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 722171c..8b7465c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1030,6 +1030,14 @@ static __initconst const u64 slm_hw_cache_event_ids
  },
 };
 
+static inline bool intel_pmu_needs_lbr_callstack(struct perf_event *event)
+{
+   if ((event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
+   (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK))
+   return true;
+   return false;
+}
+
 static void intel_pmu_disable_all(void)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -1398,7 +1406,8 @@ again:
 
	perf_sample_data_init(&data, 0, event->hw.last_period);
 
-   if (has_branch_stack(event))
+   if (has_branch_stack(event) ||
+   (event->ctx->task && intel_pmu_needs_lbr_callstack(event)))
	data.br_stack = &cpuc->lbr_stack;
 
if (perf_event_overflow(event, , regs))
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 51e1842..08e3ba1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -718,6 +718,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
int i, j, type;

[PATCH 00/14] perf, x86: Haswell LBR call stack support

2014-01-02 Thread Yan, Zheng
For many profiling tasks we need the callgraph. For example, we often
need to see the caller of a lock or the caller of a memcpy or other
library function to actually tune the program. Frame pointer unwinding
is efficient and works well. But frame pointers are off by default in
64-bit code (and on modern 32-bit gccs), so there are many binaries around
that do not use frame pointers. Profiling unchanged production code is
very useful in practice. On some CPUs the frame pointer also has a high
cost. Dwarf2 unwinding also does not always work and is extremely slow
(up to 20% overhead).

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
calls are collected as normal, but as return instructions are
executed the last captured branch record is popped from the on-chip LBR
registers. The LBR call stack facility provides an alternative way to get
the callgraph. It has some limitations too, but should work in most cases
and is significantly faster than dwarf. Frame pointer unwinding is still
the best default, but the LBR call stack is a good alternative when nothing
else works.

This patch series adds LBR call stack support. Users can enable/disable
this through a sysfs attribute file in the CPU PMU directory:
 echo 1 > /sys/bus/event_source/devices/cpu/lbr_callstack

When profiling bc(1) on Fedora 19:
 echo 'scale=2000; 4*a(1)' > cmd; perf record -g fp bc -l < cmd

If this feature is enabled, perf report output looks like:
50.36%   bc  bc [.] bc_divide
 |
 --- bc_divide
 execute
 run_code
 yyparse
 main
 __libc_start_main
 _start

33.66%   bc  bc [.] _one_mult
 |
 --- _one_mult
 bc_divide
 execute
 run_code
 yyparse
 main
 __libc_start_main
 _start

 7.62%   bc  bc [.] _bc_do_add
 |
 --- _bc_do_add
|
|--99.89%-- 0x2000186a8
 --0.11%-- [...]

 6.83%   bc  bc [.] _bc_do_sub
 |
 --- _bc_do_sub
|
|--99.94%-- bc_add
|  execute
|  run_code
|  yyparse
|  main
|  __libc_start_main
|  _start
 --0.06%-- [...]

 0.46%   bc  libc-2.17.so   [.] __memset_sse2
 |
 --- __memset_sse2
|
|--54.13%-- bc_new_num
|  |
|  |--51.00%-- bc_divide
|  |  execute
|  |  run_code
|  |  yyparse
|  |  main
|  |  __libc_start_main
|  |  _start
|  |
|  |--30.46%-- _bc_do_sub
|  |  bc_add
|  |  execute
|  |  run_code
|  |  yyparse
|  |  main
|  |  __libc_start_main
|  |  _start
|  |
|   --18.55%-- _bc_do_add
| bc_add
| execute
| run_code
| yyparse
| main
| __libc_start_main
| _start
|
 --45.87%-- bc_divide
   execute
   run_code
   yyparse
   main
   __libc_start_main
   _start

If this feature is disabled, perf report output looks like:
50.49%   bc  bc [.] bc_divide
 |
 --- bc_divide

33.57%   bc  bc [.] _one_mult
 |
 --- _one_mult

 7.61%   bc  bc [.] _bc_do_add
 |
 --- _bc_do_add
 0x2000186a8

 6.88%   bc  bc [.] 

[PATCH 11/14] perf, core: Pass perf_sample_data to perf_callchain()

2014-01-02 Thread Yan, Zheng
Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. When the feature is enabled, function
call will be collected as normal, but as return instructions are executed
the last captured branch record is popped from the on-chip LBR registers.
The LBR call stack facility can help perf to get call chains of progam
without frame pointer.

This patch modifies various architectures' perf_callchain() to accept
perf sample data. Later patch will add code that use the sample data to
get call chains.

Signed-off-by: Yan, Zheng 
---
 arch/arm/kernel/perf_event.c | 4 ++--
 arch/powerpc/perf/callchain.c| 4 ++--
 arch/sparc/kernel/perf_event.c   | 4 ++--
 arch/x86/kernel/cpu/perf_event.c | 4 ++--
 include/linux/perf_event.h   | 3 ++-
 kernel/events/callchain.c| 8 +---
 kernel/events/core.c | 2 +-
 kernel/events/internal.h | 3 ++-
 8 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 789d846..276b13b 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -562,8 +562,8 @@ user_backtrace(struct frame_tail __user *tail,
return buftail.fp - 1;
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct frame_tail __user *tail;
 
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 74d1e78..b379ebc 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -482,8 +482,8 @@ static void perf_callchain_user_32(struct 
perf_callchain_entry *entry,
}
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
if (current_is_64bit())
perf_callchain_user_64(entry, regs);
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index b5c38fa..cba0306 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1785,8 +1785,8 @@ static void perf_callchain_user_32(struct 
perf_callchain_entry *entry,
} while (entry->nr < PERF_MAX_STACK_DEPTH);
 }
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
perf_callchain_store(entry, regs->tpc);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2e43f1b..49128e6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2009,8 +2009,8 @@ perf_callchain_user32(struct pt_regs *regs, struct 
perf_callchain_entry *entry)
 }
 #endif
 
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+void perf_callchain_user(struct perf_callchain_entry *entry,
+struct pt_regs *regs, struct perf_sample_data *data)
 {
struct stack_frame frame;
const void __user *fp;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0d88eb8..c442276 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -709,7 +709,8 @@ extern void perf_event_fork(struct task_struct *tsk);
 /* Callchains */
 DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
 
-extern void perf_callchain_user(struct perf_callchain_entry *entry, struct 
pt_regs *regs);
+extern void perf_callchain_user(struct perf_callchain_entry *entry, struct 
pt_regs *regs,
+   struct perf_sample_data *data);
 extern void perf_callchain_kernel(struct perf_callchain_entry *entry, struct 
pt_regs *regs);
 
 static inline void perf_callchain_store(struct perf_callchain_entry *entry, 
u64 ip)
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 97b67df..19d497c 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -30,7 +30,8 @@ __weak void perf_callchain_kernel(struct perf_callchain_entry 
*entry,
 }
 
 __weak void perf_callchain_user(struct perf_callchain_entry *entry,
-   struct pt_regs *regs)
+   struct pt_regs *regs,
+   struct perf_sample_data *data)
 {
 }
 
@@ -157,7 +158,8 @@ put_callchain_entry(int rctx)
 }
 
 struct perf_callchain_entry *
-perf_callchain(struct perf_event *event, struct pt_regs *regs)
+perf_callchain(struct perf_event *event, struct pt_regs *regs,
+  struct perf_sample_data *data)
 {
int rctx;
struct perf_callchain_entry *entry;
@@ -198,7 +200,7 @@ perf_callchain(struct perf_event *event, struct pt_regs 
*regs)
   

[PATCH 04/14] perf, x86: Basic Haswell LBR call stack support

2014-01-02 Thread Yan, Zheng
When the call stack feature is enabled, the LBR stack will capture
unfiltered call data normally, but as return instructions are executed,
the last captured branch record is flushed from the on-chip registers
in a last-in first-out (LIFO) manner. Thus, branch information relative
to leaf functions will not be captured, while preserving the call stack
information of the main line execution path.

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.h   |  7 ++-
 arch/x86/kernel/cpu/perf_event_intel.c |  2 +-
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 98 +++---
 3 files changed, 82 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 80b8e83..3ef4b79 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -460,7 +460,10 @@ struct x86_pmu {
 };
 
 enum {
-   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT = PERF_SAMPLE_BRANCH_MAX_SHIFT,
+   PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE,
+
+   PERF_SAMPLE_BRANCH_CALL_STACK = 1U << 
PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT,
 };
 
 #define x86_add_quirk(func_)   \
@@ -697,6 +700,8 @@ void intel_pmu_lbr_init_atom(void);
 
 void intel_pmu_lbr_init_snb(void);
 
+void intel_pmu_lbr_init_hsw(void);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 int p4_pmu_init(void);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 4325bae..84a1c09 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2494,7 +2494,7 @@ __init int intel_pmu_init(void)
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, 
sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, 
sizeof(hw_cache_extra_regs));
 
-   intel_pmu_lbr_init_snb();
+   intel_pmu_lbr_init_hsw();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 7ff2a99..bdd8758 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -39,6 +39,7 @@ static enum {
 #define LBR_IND_JMP_BIT6 /* do not capture indirect jumps */
 #define LBR_REL_JMP_BIT7 /* do not capture relative jumps */
 #define LBR_FAR_BIT8 /* do not capture far branches */
+#define LBR_CALL_STACK_BIT 9 /* enable call stack */
 
 #define LBR_KERNEL (1 << LBR_KERNEL_BIT)
 #define LBR_USER   (1 << LBR_USER_BIT)
@@ -49,6 +50,7 @@ static enum {
 #define LBR_REL_JMP(1 << LBR_REL_JMP_BIT)
 #define LBR_IND_JMP(1 << LBR_IND_JMP_BIT)
 #define LBR_FAR(1 << LBR_FAR_BIT)
+#define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT)
 
 #define LBR_PLM (LBR_KERNEL | LBR_USER)
 
@@ -74,24 +76,25 @@ static enum {
  * x86control flow changes include branches, interrupts, traps, faults
  */
 enum {
-   X86_BR_NONE = 0,  /* unknown */
-
-   X86_BR_USER = 1 << 0, /* branch target is user */
-   X86_BR_KERNEL   = 1 << 1, /* branch target is kernel */
-
-   X86_BR_CALL = 1 << 2, /* call */
-   X86_BR_RET  = 1 << 3, /* return */
-   X86_BR_SYSCALL  = 1 << 4, /* syscall */
-   X86_BR_SYSRET   = 1 << 5, /* syscall return */
-   X86_BR_INT  = 1 << 6, /* sw interrupt */
-   X86_BR_IRET = 1 << 7, /* return from interrupt */
-   X86_BR_JCC  = 1 << 8, /* conditional */
-   X86_BR_JMP  = 1 << 9, /* jump */
-   X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
-   X86_BR_IND_CALL = 1 << 11,/* indirect calls */
-   X86_BR_ABORT= 1 << 12,/* transaction abort */
-   X86_BR_IN_TX= 1 << 13,/* in transaction */
-   X86_BR_NO_TX= 1 << 14,/* not in transaction */
+   X86_BR_NONE = 0,  /* unknown */
+
+   X86_BR_USER = 1 << 0, /* branch target is user */
+   X86_BR_KERNEL   = 1 << 1, /* branch target is kernel */
+
+   X86_BR_CALL = 1 << 2, /* call */
+   X86_BR_RET  = 1 << 3, /* return */
+   X86_BR_SYSCALL  = 1 << 4, /* syscall */
+   X86_BR_SYSRET   = 1 << 5, /* syscall return */
+   X86_BR_INT  = 1 << 6, /* sw interrupt */
+   X86_BR_IRET = 1 << 7, /* return from interrupt */
+   X86_BR_JCC  = 1 << 8, /* conditional */
+   X86_BR_JMP  = 1 << 9, /* jump */
+   X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
+   X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+   X86_BR_ABORT= 1 << 12,/* transaction abort */
+   

[PATCH 13/14] perf, x86: enable LBR callstack when recording callchain

2014-01-02 Thread Yan, Zheng
Try enabling the LBR callstack facility if user requests recording
user space callchain. Also adds a cpu pmu attribute to enable/disable
this feature. This feature is disabled by default because it may
contend for the LBR with other events that explicitly require branch
stack

Signed-off-by: Yan, Zheng 
---
 arch/x86/kernel/cpu/perf_event.c | 99 
 arch/x86/kernel/cpu/perf_event.h |  7 +++
 2 files changed, 77 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1509340..3ea184a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -399,37 +399,49 @@ int x86_pmu_hw_config(struct perf_event *event)
 
if (event->attr.precise_ip > precise)
return -EOPNOTSUPP;
+   }
+   /*
+* check that PEBS LBR correction does not conflict with
+* whatever the user is asking with attr->branch_sample_type
+*/
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
+   u64 *br_type = &event->attr.branch_sample_type;
+
+   if (has_branch_stack(event)) {
+   if (!precise_br_compat(event))
+   return -EOPNOTSUPP;
+
+   /* branch_sample_type is compatible */
+
+   } else {
+   /*
+* user did not specify  branch_sample_type
+*
+* For PEBS fixups, we capture all
+* the branches at the priv level of the
+* event.
+*/
+   *br_type = PERF_SAMPLE_BRANCH_ANY;
+
+   if (!event->attr.exclude_user)
+   *br_type |= PERF_SAMPLE_BRANCH_USER;
+
+   if (!event->attr.exclude_kernel)
+   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
+   }
+   } else if ((event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
+  !has_branch_stack(event) &&
+  x86_pmu.attr_lbr_callstack &&
+  !event->attr.exclude_user &&
+  (event->attach_state & PERF_ATTACH_TASK)) {
/*
-* check that PEBS LBR correction does not conflict with
-* whatever the user is asking with attr->branch_sample_type
+* user did not specify branch_sample_type,
+* try using the LBR call stack facility to
+* record call chains of user program.
 */
-   if (event->attr.precise_ip > 1 &&
-   x86_pmu.intel_cap.pebs_format < 2) {
-   u64 *br_type = &event->attr.branch_sample_type;
-
-   if (has_branch_stack(event)) {
-   if (!precise_br_compat(event))
-   return -EOPNOTSUPP;
-
-   /* branch_sample_type is compatible */
-
-   } else {
-   /*
-* user did not specify  branch_sample_type
-*
-* For PEBS fixups, we capture all
-* the branches at the priv level of the
-* event.
-*/
-   *br_type = PERF_SAMPLE_BRANCH_ANY;
-
-   if (!event->attr.exclude_user)
-   *br_type |= PERF_SAMPLE_BRANCH_USER;
-
-   if (!event->attr.exclude_kernel)
-   *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
-   }
-   }
+   event->attr.branch_sample_type =
+   PERF_SAMPLE_BRANCH_USER |
+   PERF_SAMPLE_BRANCH_CALL_STACK;
}
 
/*
@@ -1828,10 +1840,39 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
return count;
 }
 
+static ssize_t get_attr_lbr_callstack(struct device *cdev,
+ struct device_attribute *attr, char *buf)
+{
+   return snprintf(buf, 40, "%d\n", x86_pmu.attr_lbr_callstack);
+}
+
+static ssize_t set_attr_lbr_callstack(struct device *cdev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   unsigned long val;
+   ssize_t ret;
+
+   ret = kstrtoul(buf, 0, &val);
+   if (ret)
+   return ret;
+
+   if (!!val != !!x86_pmu.attr_lbr_callstack) {
+   if (val && !x86_pmu_has_lbr_callstack())
+   return -EOPNOTSUPP;
+   x86_pmu.attr_lbr_callstack = !!val;
+   }
+   return count;
+}
+
 static 

Re: [PATCH v5 1/1] cpufreq: tegra: Re-model Tegra20 cpufreq driver

2014-01-02 Thread Viresh Kumar
On 2 January 2014 16:38, bilhuang  wrote:
> Actually, I don't have plan or resource on doing this, would it be better
> that you help to do that instead? Thanks.

Point taken. I am there to help if required. So, initially you can just make
Tegra work according to the new file we were talking about. I will fix
others later.

>> I am not sure about the location of such file. Should this be placed in DT
>> code somewhere or kept in cpufreq? Rob/Grant ??
>>
> Do we have consensus on where to create such file?

Not yet; probably people were on leave.

@Grant/Rob: Any inputs here?


Re: Memory allocator semantics

2014-01-02 Thread Josh Triplett
On Thu, Jan 02, 2014 at 09:14:17PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 02, 2014 at 07:39:07PM -0800, Josh Triplett wrote:
> > On Thu, Jan 02, 2014 at 12:33:20PM -0800, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > From what I can see, the Linux-kernel's SLAB, SLOB, and SLUB memory
> > > allocators would deal with the following sort of race:
> > > 
> > > A.  CPU 0: r1 = kmalloc(...); ACCESS_ONCE(gp) = r1;
> > > 
> > >   CPU 1: r2 = ACCESS_ONCE(gp); if (r2) kfree(r2);
> > > 
> > > However, my guess is that this should be considered an accident of the
> > > current implementation rather than a feature.  The reason for this is
> > > that I cannot see how you would usefully do (A) above without also 
> > > allowing
> > > (B) and (C) below, both of which look to me to be quite destructive:
> > 
> > (A) only seems OK if "gp" is guaranteed to be NULL beforehand, *and* if
> > no other CPUs can possibly do what CPU 1 is doing in parallel.  Even
> > then, it seems questionable how this could ever be used successfully in
> > practice.
> > 
> > This seems similar to the TCP simultaneous-SYN case: theoretically
> > possible, absurd in practice.
> 
> Heh!
> 
> Agreed on the absurdity, but my quick look and slab/slob/slub leads
> me to believe that current Linux kernel would actually do something
> sensible in this case.  But only because they don't touch the actual
> memory.  DYNIX/ptx would have choked on it, IIRC.

Based on this and the discussion at the bottom of your mail, I think I'm
starting to understand what you're getting at; this seems like less of a
question of "could this usefully happen?" and more "does the allocator
know how to protect *itself*?".

> > > But I thought I should ask the experts.
> > > 
> > > So, am I correct that kernel hackers are required to avoid "drive-by"
> > > kfree()s of kmalloc()ed memory?
> > 
> > Don't kfree things that are in use, and synchronize to make sure all
> > CPUs agree about "in use", yes.
> 
> For example, ensure that each kmalloc() happens unambiguously before the
> corresponding kfree().  ;-)

That too, yes. :)

> > > PS.  To the question "Why would anyone care about (A)?", the answer
> > >  is "Inquiring programming-language memory-model designers want
> > >  to know."
> > 
> > I find myself wondering about the original form of the question, since
> > I'd hope that programming-language memory-model designers would
> > understand the need for synchronization around reclaiming memory.
> 
> I think that they do now.  The original form of the question was as
> follows:
> 
>   But my intuition at the moment is that allowing racing
>   accesses and providing pointer atomicity leads to a much more
>   complicated and harder to explain model.  You have to deal
>   with initialization issues and OOTA problems without atomics.
>   And the implementation has to deal with cross-thread visibility
>   of malloc meta-information, which I suspect will be expensive.
>   You now essentially have to be able to malloc() in one thread,
>   transfer the pointer via a race to another thread, and free()
>   in the second thread.  That’s hard unless malloc() and free()
>   always lock (as I presume they do in the Linux kernel).

As mentioned above, this makes much more sense now.  This seems like a
question of how the allocator protects its *own* internal data
structures, rather than whether the allocator can usefully be used for
the cases you mentioned above.  And that's a reasonable question to ask
if you're building a language memory model for a language with malloc
and free as part of its standard library.

To roughly sketch out some general rules that might work as a set of
scalable design constraints for malloc/free:

- malloc may always return any unallocated memory; it has no obligation
  to avoid returning memory that was just recently freed.  In fact, an
  implementation may even be particularly *likely* to return memory that
  was just recently freed, for performance reasons.  Any program which
  assumes a delay or a memory barrier before memory reuse is broken.

- Multiple calls to free on the same memory will produce undefined
  behavior, and in particular may result in a well-known form of
  security hole.  free has no obligation to protect itself against
  multiple calls to free on the same memory, unless otherwise specified
  as part of some debugging mode.  This holds whether the calls to free
  occur in series or in parallel (e.g. two or more calls racing with
  each other).  It is the job of the calling program to avoid calling
  free multiple times on the same memory, such as via reference
  counting, RCU, or some other mechanism.

- It is the job of the calling program to avoid calling free on memory
  that is currently in use, such as via reference counting, RCU, or some
  other mechanism.  Accessing memory after reclaiming it will produce
  undefined behavior.  This includes calling free on memory 
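The rules above repeatedly lean on reference counting to serialize calls to free. A minimal userspace sketch of that discipline, using C11 atomics (the names and structure here are illustrative, not from the thread):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object: free() runs exactly once, when the
 * last reference is dropped, no matter how many threads race on put(). */
struct obj {
    atomic_int refcount;
    int payload;
};

struct obj *obj_alloc(void)
{
    struct obj *p = malloc(sizeof(*p));
    if (p)
        atomic_init(&p->refcount, 1);   /* caller holds the only reference */
    return p;
}

void obj_get(struct obj *p)
{
    atomic_fetch_add_explicit(&p->refcount, 1, memory_order_relaxed);
}

/* Returns 1 if this call dropped the last reference and freed the object. */
int obj_put(struct obj *p)
{
    /* acq_rel ordering makes every prior access to *p happen-before
     * the free() in whichever thread drops the last reference */
    if (atomic_fetch_sub_explicit(&p->refcount, 1,
                                  memory_order_acq_rel) == 1) {
        free(p);
        return 1;
    }
    return 0;
}
```

Two threads racing on obj_put() then agree on who frees: exactly one fetch_sub observes the count at 1, so free is never called twice.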

Re: [PATCH 1/3] powernow-k6: disable cache when changing frequency

2014-01-02 Thread Viresh Kumar
On 2 January 2014 23:08, Mikulas Patocka  wrote:
> Flushing the cache and changing frequency takes approximatelly 500us. The
> patch increases policy->cpuinfo.transition_latency to that value.

It's not about how fast the caches get flushed, but how much time is
wasted refilling them, since data that was just flushed out may be
required again. That would impact performance more than the flush
itself.


Re: [PATCH 3/3] powernow-k6: reorder frequencies

2014-01-02 Thread Viresh Kumar
On 12 December 2013 06:09, Mikulas Patocka  wrote:
> This patch reorders reported frequencies from the highest to the lowest,
> just like in other frequency drivers.
>
> Signed-off-by: Mikulas Patocka 
> Cc: sta...@kernel.org
>
> ---
>  drivers/cpufreq/powernow-k6.c |   17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> Index: linux-3.12.3/drivers/cpufreq/powernow-k6.c
> ===
> --- linux-3.12.3.orig/drivers/cpufreq/powernow-k6.c 2013-12-06 
> 22:08:27.0 +0100
> +++ linux-3.12.3/drivers/cpufreq/powernow-k6.c  2013-12-06 22:17:44.0 
> +0100
> @@ -37,17 +37,20 @@ MODULE_PARM_DESC(bus_frequency, "Bus fre
>
>  /* Clock ratio multiplied by 10 - see table 27 in AMD#23446 */
>  static struct cpufreq_frequency_table clock_ratio[] = {
> -   {45,  /* 000 -> 4.5x */ 0},
> +   {60,  /* 110 -> 6.0x */ 0},
> +   {55,  /* 011 -> 5.5x */ 0},
> {50,  /* 001 -> 5.0x */ 0},
> +   {45,  /* 000 -> 4.5x */ 0},
> {40,  /* 010 -> 4.0x */ 0},
> -   {55,  /* 011 -> 5.5x */ 0},
> -   {20,  /* 100 -> 2.0x */ 0},
> -   {30,  /* 101 -> 3.0x */ 0},
> -   {60,  /* 110 -> 6.0x */ 0},
> {35,  /* 111 -> 3.5x */ 0},
> +   {30,  /* 101 -> 3.0x */ 0},
> +   {20,  /* 100 -> 2.0x */ 0},
> {0, CPUFREQ_TABLE_END}
>  };
>
> +static const u8 index_to_register[8] = { 6, 3, 1, 0, 2, 7, 5, 4 };
> +static const u8 register_to_index[8] = { 3, 2, 4, 1, 7, 6, 0, 5 };
> +
>  static const struct {
> unsigned freq;
> unsigned mult;
> @@ -91,7 +94,7 @@ static int powernow_k6_get_cpu_multiplie
>
> local_irq_enable();
>
> -   return clock_ratio[(invalue >> 5)&7].driver_data;
> +   return clock_ratio[register_to_index[(invalue >> 5)&7]].driver_data;
>  }
>
>  static void powernow_k6_set_cpu_multiplier(unsigned int best_i)
> @@ -111,7 +114,7 @@ static void powernow_k6_set_cpu_multipli
> write_cr0(cr0 | X86_CR0_CD);
> wbinvd();
>
> -   outvalue = (1<<12) | (1<<10) | (1<<9) | (best_i<<5);
> +   outvalue = (1<<12) | (1<<10) | (1<<9) | 
> (index_to_register[best_i]<<5);
>
> msrval = POWERNOW_IOPORT + 0x1;
> wrmsr(MSR_K6_EPMR, msrval, 0); /* enable the PowerNow port */

Acked-by: Viresh Kumar 
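The two lookup tables in the patch are meant to be inverse permutations (mapping a sorted-table index to its register encoding and back). A quick standalone check of that property, with the values copied from the patch:

```c
#include <assert.h>

/* Values copied from the patch above; the check verifies that the two
 * tables are inverse permutations of each other. */
static const unsigned char index_to_register[8] = { 6, 3, 1, 0, 2, 7, 5, 4 };
static const unsigned char register_to_index[8] = { 3, 2, 4, 1, 7, 6, 0, 5 };

int tables_are_inverse(void)
{
    for (int i = 0; i < 8; i++) {
        if (register_to_index[index_to_register[i]] != i)
            return 0;
        if (index_to_register[register_to_index[i]] != i)
            return 0;
    }
    return 1;
}
```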


Re: [RFC PATCH net-next 1/4] net: introduce backup_classid to struct skbuff

2014-01-02 Thread David Miller
From: Libo Chen 
Date: Fri, 3 Jan 2014 11:11:04 +0800

> 
> introduce backup_classid to struct sk_buff; we can use it to
> back up sk_classid when switching net_ns.
> 
> Signed-off-by: Libo Chen 

Sorry, no new sk_buff members unless there is absolutely no other
possible implementation.

sk_buff is too big as-is.


Re: [PATCH net-next] r8152: fix the wrong return value

2014-01-02 Thread David Cohen
On Fri, Jan 03, 2014 at 11:21:56AM +0800, Hayes Wang wrote:
> The return value should be the boolean value, not the error code.
> 
> Signed-off-by: Hayes Wang 
> Spotted-by: Dan Carpenter 
> ---
>  drivers/net/usb/r8152.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index e3d878c..13fabbb 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -2708,7 +2708,7 @@ static bool rtl_ops_init(struct r8152 *tp, const struct 
> usb_device_id *id)
>   ops->unload = rtl8153_unload;
>   break;
>   default:
> - ret = -EFAULT;
> + ret = false;

How about fixing the function's return type instead?
Returning bool for success/error is not natural in the Linux kernel: you
have to check rtl_ops_init() for success and !rtl_ops_init() for error,
and you are unable to return the actual error value.

Br, David

>   break;
>   }
>   break;
> -- 
> 1.8.4.2
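A sketch of the alternative David suggests: return 0 on success and a negative errno on failure, so the caller can propagate a real error. The names below only mirror the driver for illustration; this is not the actual r8152 code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the driver's chip-version switch. */
enum chip_version { RTL_VER_01, RTL_VER_02, RTL_VER_UNKNOWN };

int rtl_ops_init_sketch(enum chip_version ver)
{
    switch (ver) {
    case RTL_VER_01:
    case RTL_VER_02:
        /* ops->init / ops->unload would be assigned here */
        return 0;
    default:
        return -ENODEV;   /* a real error code, not just "false" */
    }
}
```

Callers can then use the conventional `if (ret) goto out;` style instead of boolean tests.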
> 


[PATCH V2] leds: s3c24xx: Fix build failure

2014-01-02 Thread Tushar Behera
Commit c67d0f29262b ("ARM: s3c24xx: get rid of custom ")
removed the usage of the mach/gpio.h file, but we need to include
plat/gpio-cfg.h to avoid the following build error:
drivers/leds/leds-s3c24xx.c: In function ‘s3c24xx_led_probe’:
drivers/leds/leds-s3c24xx.c:100:2: error: implicit declaration of
function ‘s3c_gpio_setpull’ [-Werror=implicit-function-declaration]

Signed-off-by: Tushar Behera 
---
Changes for V2:
* Updated commit message

Bryan,

I should have been more explicit regarding this patch. It fixes a
build error on linux-next after the above patch was merged.

Tested at next-20131224.

 drivers/leds/leds-s3c24xx.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/leds/leds-s3c24xx.c b/drivers/leds/leds-s3c24xx.c
index 76483fb..87cf215 100644
--- a/drivers/leds/leds-s3c24xx.c
+++ b/drivers/leds/leds-s3c24xx.c
@@ -21,6 +21,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /* our context */
-- 
1.7.9.5



Re: [RFC PATCH net-next 0/4] net_cls for sys container

2014-01-02 Thread Cong Wang
On Thu, Jan 2, 2014 at 7:11 PM, Libo Chen  wrote:
> Hi guys,
>
> Now, lxc created with veth can not be under control by
> cls_cgroup.
>
> the former discussion:
> http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/00214.html
>
> In short, because cls_cgroup relies on the classid attached to the
> sock to filter skbs, but the sock will be cleared inside
> dev_forward_skb() in veth_xmit().


So what are you trying to achieve here?


Re: Memory allocator semantics

2014-01-02 Thread Paul E. McKenney
On Thu, Jan 02, 2014 at 07:39:07PM -0800, Josh Triplett wrote:
> On Thu, Jan 02, 2014 at 12:33:20PM -0800, Paul E. McKenney wrote:
> > Hello!
> > 
> > From what I can see, the Linux-kernel's SLAB, SLOB, and SLUB memory
> > allocators would deal with the following sort of race:
> > 
> > A.  CPU 0: r1 = kmalloc(...); ACCESS_ONCE(gp) = r1;
> > 
> > CPU 1: r2 = ACCESS_ONCE(gp); if (r2) kfree(r2);
> > 
> > However, my guess is that this should be considered an accident of the
> > current implementation rather than a feature.  The reason for this is
> > that I cannot see how you would usefully do (A) above without also allowing
> > (B) and (C) below, both of which look to me to be quite destructive:
> 
> (A) only seems OK if "gp" is guaranteed to be NULL beforehand, *and* if
> no other CPUs can possibly do what CPU 1 is doing in parallel.  Even
> then, it seems questionable how this could ever be used successfully in
> practice.
> 
> This seems similar to the TCP simultaneous-SYN case: theoretically
> possible, absurd in practice.

Heh!

Agreed on the absurdity, but my quick look at slab/slob/slub leads
me to believe that current Linux kernel would actually do something
sensible in this case.  But only because they don't touch the actual
memory.  DYNIX/ptx would have choked on it, IIRC.

And the fact that slab/slob/slub seem to handle (A) seemed bizarre
enough to be worth asking the question.

> > B.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;
> > 
> > CPU 1: r2 = ACCESS_ONCE(shared_x); if (r2) kfree(r2);
> > 
> > CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);
> > 
> > This results in the memory being on two different freelists.
> 
> That's a straightforward double-free bug.  You need some kind of
> synchronization there to ensure that only one call to kfree occurs.

Yep!

> > C.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;
> > 
> > CPU 1: r2 = ACCESS_ONCE(shared_x); r2->a = 1; r2->b = 2;
> > 
> > CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);
> > 
> > CPU 3: r4 = kmalloc(...);  r4->s = 3; r4->t = 4;
> > 
> > This results in the memory being used by two different CPUs,
> > each of which believe that they have sole access.
> 
> This is not OK either: CPU 2 has called kfree on a pointer that CPU 1
> still considers alive, and again, the CPUs haven't used any form of
> synchronization to prevent that.

Agreed.

> > But I thought I should ask the experts.
> > 
> > So, am I correct that kernel hackers are required to avoid "drive-by"
> > kfree()s of kmalloc()ed memory?
> 
> Don't kfree things that are in use, and synchronize to make sure all
> CPUs agree about "in use", yes.

For example, ensure that each kmalloc() happens unambiguously before the
corresponding kfree().  ;-)
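A userspace C11 analogue of "each kmalloc() happens unambiguously before the corresponding kfree()": a release store publishes the allocation, and an atomic exchange guarantees at most one thread ever frees it. This is a sketch under the assumption of a single publisher; it is illustrative, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic(int *) gp;   /* the shared pointer, initially NULL */

void publisher(void)
{
    int *r1 = malloc(sizeof(*r1));
    if (!r1)
        return;
    *r1 = 42;
    /* release: the allocation and initialization happen-before any
     * acquire load that observes this store */
    atomic_store_explicit(&gp, r1, memory_order_release);
}

/* Returns 1 if this call claimed and freed the object. */
int consumer(void)
{
    /* exchange, not a plain load: at most one racing consumer gets a
     * non-NULL pointer, so free() can never be called twice */
    int *r2 = atomic_exchange_explicit(&gp, NULL, memory_order_acquire);
    if (r2) {
        free(r2);
        return 1;
    }
    return 0;
}
```

The exchange is what turns litmus test (A) into a safe pattern: a second consumer simply sees NULL.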

> > > PS.  To the question "Why would anyone care about (A)?", the answer
> >  is "Inquiring programming-language memory-model designers want
> >  to know."
> 
> I find myself wondering about the original form of the question, since
> I'd hope that programming-language memory-model designers would
> understand the need for synchronization around reclaiming memory.

I think that they do now.  The original form of the question was as
follows:

But my intuition at the moment is that allowing racing
accesses and providing pointer atomicity leads to a much more
complicated and harder to explain model.  You have to deal
with initialization issues and OOTA problems without atomics.
And the implementation has to deal with cross-thread visibility
of malloc meta-information, which I suspect will be expensive.
You now essentially have to be able to malloc() in one thread,
transfer the pointer via a race to another thread, and free()
in the second thread.  That’s hard unless malloc() and free()
always lock (as I presume they do in the Linux kernel).

But the first I heard of it was something like litmus test (A) above.

(And yes, I already disabused them of their notion that Linux kernel
kmalloc() and kfree() always lock.)

Thanx, Paul



Re: [PATCH] leds: s3c24xx: Fix build failure

2014-01-02 Thread Tushar Behera
On 3 January 2014 04:12, Bryan Wu  wrote:
> On Mon, Dec 30, 2013 at 1:09 AM, Tushar Behera  
> wrote:
>> Fixes following build error.
>> drivers/leds/leds-s3c24xx.c: In function ‘s3c24xx_led_probe’:
>> drivers/leds/leds-s3c24xx.c:100:2: error: implicit declaration of
>> function ‘s3c_gpio_setpull’ [-Werror=implicit-function-declaration]
>>
>
> I don't see any building error with s3c2410_defconfig. Actually this
>  is included in
> arch/arm/mach-s3c24xx/include/mach/gpio.h which is in
> arch/arm/include/asm/gpio.h, then 
>

This requires that CONFIG_NEED_MACH_GPIO_H is defined. Following
commit has removed this option for s3c24xx platform.

c67d0f29262b ("ARM: s3c24xx: get rid of custom ").

-- 
Tushar Behera


Re: Possible regression from "fs/exec.c: call arch_pick_mmap_layout() only once"

2014-01-02 Thread Pat Erley

On 01/02/2014 04:39 PM, Pat Erley wrote:

On 01/02/2014 04:24 PM, Richard Weinberger wrote:

Am Donnerstag, 2. Januar 2014, 15:41:27 schrieb Pat Erley:

On my 64bit kernel, commit 283fe963095b38a6ab75dda1436ee66b9e45c7c2
seems to have broken 32bit compatibility.  I've run the bisection twice,
and verified that reverting this on HEAD fixes the problem.  I've
uploaded my .config to pastebin at http://pastebin.com/kVcr9H65

Even this simple program:

main(){puts("HELLO");}

compiled with:

gcc -m32 test.c

Will crash with a segfault.  Stracing shows that it's failing to
allocate memory.


Good catch!

flush_old_exec() is called before setup_new_exec(), and I removed
arch_pick_mmap_layout() from the second call site.
That turned out to be wrong.

It is wrong because current->personality is changed between the two
call sites. So we have to remove the first call to
arch_pick_mmap_layout() and keep the latter, because only then is the
correct personality set up.

Can you please test your config with the following patch applied and
having 283fe96 reverted?

If it works out for you I'd send an updated patch to Andrew.
In the meanwhile I'll double check all call sites...

Thanks,
//richard

---
diff --git a/fs/exec.c b/fs/exec.c
index 7ea097f..a733599 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -843,7 +843,6 @@ static int exec_mmap(struct mm_struct *mm)
  tsk->active_mm = mm;
  activate_mm(active_mm, mm);
  task_unlock(tsk);
-arch_pick_mmap_layout(mm);
  if (old_mm) {
>   up_read(&old_mm->mmap_sem);
  BUG_ON(active_mm != old_mm);



Compiling right now.  Will test later tonight and let you know.

Pat


I can confirm that this works with 283fe96 reverted.

Pat



Re: [PATCH 3.2 056/185] mm: ensure get_unmapped_area() returns higher address than mmap_min_addr

2014-01-02 Thread Ben Hutchings
On Sun, 2013-12-29 at 03:08 +0100, Ben Hutchings wrote:
> 3.2.54-rc1 review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Akira Takeuchi 
> 
> commit 2afc745f3e3079ab16c826be4860da2529054dd2 upstream.
[...]
> [bwh: Backported to 3.2:
>  As we do not have vm_unmapped_area(), make arch_get_unmapped_area_topdown()
>  calculate the lower limit for the new area's end address and then compare
>  addresses with this instead of with len.  In the process, fix an off-by-one
>  error which could result in returning 0 if mm->mmap_base == len.]

I'm dropping this as I have no good way to test the backport (it's not
used on x86) and I didn't get any confirmation that it's right.

Ben.

> Signed-off-by: Ben Hutchings 
> ---
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1368,7 +1368,7 @@ arch_get_unmapped_area(struct file *filp
>   struct vm_area_struct *vma;
>   unsigned long start_addr;
>  
> - if (len > TASK_SIZE)
> + if (len > TASK_SIZE - mmap_min_addr)
>   return -ENOMEM;
>  
>   if (flags & MAP_FIXED)
> @@ -1377,7 +1377,7 @@ arch_get_unmapped_area(struct file *filp
>   if (addr) {
>   addr = PAGE_ALIGN(addr);
>   vma = find_vma(mm, addr);
> - if (TASK_SIZE - len >= addr &&
> + if (TASK_SIZE - len >= addr && addr >= mmap_min_addr &&
>   (!vma || addr + len <= vma->vm_start))
>   return addr;
>   }
> @@ -1442,9 +1442,10 @@ arch_get_unmapped_area_topdown(struct fi
>   struct vm_area_struct *vma;
>   struct mm_struct *mm = current->mm;
>   unsigned long addr = addr0;
> + unsigned long low_limit = max(PAGE_SIZE, mmap_min_addr);
>  
>   /* requested length too big for entire address space */
> - if (len > TASK_SIZE)
> + if (len > TASK_SIZE - mmap_min_addr)
>   return -ENOMEM;
>  
>   if (flags & MAP_FIXED)
> @@ -1454,7 +1455,7 @@ arch_get_unmapped_area_topdown(struct fi
>   if (addr) {
>   addr = PAGE_ALIGN(addr);
>   vma = find_vma(mm, addr);
> - if (TASK_SIZE - len >= addr &&
> + if (TASK_SIZE - len >= addr && addr >= mmap_min_addr &&
>   (!vma || addr + len <= vma->vm_start))
>   return addr;
>   }
> @@ -1469,14 +1470,14 @@ arch_get_unmapped_area_topdown(struct fi
>   addr = mm->free_area_cache;
>  
>   /* make sure it can fit in the remaining address space */
> - if (addr > len) {
> + if (addr >= low_limit + len) {
>   vma = find_vma(mm, addr-len);
>   if (!vma || addr <= vma->vm_start)
>   /* remember the address as a hint for next time */
>   return (mm->free_area_cache = addr-len);
>   }
>  
> - if (mm->mmap_base < len)
> + if (mm->mmap_base < low_limit + len)
>   goto bottomup;
>  
>   addr = mm->mmap_base-len;
> @@ -1498,7 +1499,7 @@ arch_get_unmapped_area_topdown(struct fi
>  
>   /* try just below the current vma->vm_start */
>   addr = vma->vm_start-len;
> - } while (len < vma->vm_start);
> + } while (vma->vm_start >= low_limit + len);
>  
>  bottomup:
>   /*

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.




Re: [PATCH v7 00/12] kexec kernel efi runtime support

2014-01-02 Thread Dave Young
> Please send these as separate patches and include the compiler errors in
> the commit message. I'll pick them up and send them to Peter.

Sent.

>  
> > build fix: move parse_efi_setup to efi*.c, call it in efi_init instead in 
> > setup.c
> 
> Why have you moved the call site for parse_efi_setup()? What's the
> rationale? Parsing SETUP_* entries outside of parse_setup_data() seems
> to me to be a step backwards in terms of clarity.

SETUP_PCI also duplicates its parsing logic outside of setup.c.
I added a static inline in the #ifdef else branch, but yesterday I got
some warnings about an "unused function". Double-checking today, there
are no such warnings anymore; they might have been caused by some
mistake on my end.

Changed to static inline {} in the patches I sent a moment ago.

Thanks
Dave


[PATCH] clocksource: mxs_timer: Get rid of mxs_clockevent_mode variable

2014-01-02 Thread Axel Lin
The current mode setting is stored in the mode field of struct
clock_event_device, so we can just remove the mxs_clockevent_mode
variable.

Signed-off-by: Axel Lin 
---
 drivers/clocksource/mxs_timer.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/clocksource/mxs_timer.c b/drivers/clocksource/mxs_timer.c
index 0f5e65f..d7d5f11 100644
--- a/drivers/clocksource/mxs_timer.c
+++ b/drivers/clocksource/mxs_timer.c
@@ -77,7 +77,6 @@
 #define BV_TIMROTv2_TIMCTRLn_SELECT__TICK_ALWAYS   0xf
 
 static struct clock_event_device mxs_clockevent_device;
-static enum clock_event_mode mxs_clockevent_mode = CLOCK_EVT_MODE_UNUSED;
 
 static void __iomem *mxs_timrot_base;
 static u32 timrot_major_version;
@@ -156,7 +155,7 @@ static void mxs_set_mode(enum clock_event_mode mode,
/* Disable interrupt in timer module */
timrot_irq_disable();
 
-   if (mode != mxs_clockevent_mode) {
+   if (mode != evt->mode) {
/* Set event time into the furthest future */
if (timrot_is_v1())
__raw_writel(0x,
@@ -171,13 +170,10 @@ static void mxs_set_mode(enum clock_event_mode mode,
 
 #ifdef DEBUG
pr_info("%s: changing mode from %s to %s\n", __func__,
-   clock_event_mode_label[mxs_clockevent_mode],
+   clock_event_mode_label[evt->mode],
clock_event_mode_label[mode]);
 #endif /* DEBUG */
 
-   /* Remember timer mode */
-   mxs_clockevent_mode = mode;
-
switch (mode) {
case CLOCK_EVT_MODE_PERIODIC:
pr_err("%s: Periodic mode is not implemented\n", __func__);
-- 
1.8.1.2






Re: [PATCH v3] Staging: rtl8188eu: Fixed coding style issues

2014-01-02 Thread Joe Perches

On Fri, 2014-01-03 at 00:22 +0100, Tim Jester-Pfadt wrote:
> Fixed indentation coding style issues on rtw_io.c
[]
> diff --git a/drivers/staging/rtl8188eu/core/rtw_io.c 
> b/drivers/staging/rtl8188eu/core/rtw_io.c
[]
> @@ -205,9 +205,9 @@ void _rtw_read_mem(struct adapter *adapter, u32 addr, u32 
> cnt, u8 *pmem)
>  
>   _func_enter_;
>   if (adapter->bDriverStopped || adapter->bSurpriseRemoved) {
> -  RT_TRACE(_module_rtl871x_io_c_, _drv_info_,
> -   ("rtw_read_mem:bDriverStopped(%d) OR 
> bSurpriseRemoved(%d)",
> -   adapter->bDriverStopped, adapter->bSurpriseRemoved));
> + RT_TRACE(_module_rtl871x_io_c_, _drv_info_,
> +  ("rtw_read_mem:bDriverStopped(%d) OR 
> bSurpriseRemoved(%d)",
> +  adapter->bDriverStopped, adapter->bSurpriseRemoved));

Be nice to remove the unnecessary parentheses on all of these too

RT_TRACE(_module_rtl871x_io_c_, _drv_info_,
 "rtw_read_mem:bDriverStopped(%d) OR 
bSurpriseRemoved(%d)",
 adapter->bDriverStopped, adapter->bSurpriseRemoved);

etc...

A few other things too:

o The _func_enter_ uses are unnecessary and could/should
  be removed.  There's a standard function tracing capability.
o The RT_TRACE uses that embed a function name could/should use
  %s:, __func__
o The RT_TRACE macro doesn't add a terminating newline and these
  uses should have them.

RT_TRACE(_module_rtl871x_io_c_, _drv_info_,
 "%s:bDriverStopped(%d) OR bSurpriseRemoved(%d)\n",
 __func__, adapter->bDriverStopped, 
adapter->bSurpriseRemoved);





[PATCH tip/efi-kexec] x86: setup.c build fix

2014-01-02 Thread Dave Young
Without CONFIG_EFI, the build fails with the error below:
   arch/x86/built-in.o: In function `setup_arch':
>> (.init.text+0x9dc): undefined reference to `parse_efi_setup'

Fix it by adding a blank inline function in asm/efi.h.
Also remove an unused declaration for the variable efi_data_len.

Signed-off-by: Dave Young 
---
 arch/x86/include/asm/efi.h |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: tip/arch/x86/include/asm/efi.h
===
--- tip.orig/arch/x86/include/asm/efi.h
+++ tip/arch/x86/include/asm/efi.h
@@ -142,8 +142,6 @@ struct efi_setup_data {
 };
 
 extern u64 efi_setup;
-extern u32 efi_data_len;
-extern void parse_efi_setup(u64 phys_addr, u32 data_len);
 
 #ifdef CONFIG_EFI
 
@@ -153,7 +151,7 @@ static inline bool efi_is_native(void)
 }
 
 extern struct console early_efi_console;
-
+extern void parse_efi_setup(u64 phys_addr, u32 data_len);
 #else
 /*
  * IF EFI is not configured, have the EFI calls return -ENOSYS.
@@ -165,6 +163,7 @@ extern struct console early_efi_console;
 #define efi_call4(_f, _a1, _a2, _a3, _a4)  (-ENOSYS)
 #define efi_call5(_f, _a1, _a2, _a3, _a4, _a5) (-ENOSYS)
 #define efi_call6(_f, _a1, _a2, _a3, _a4, _a5, _a6)(-ENOSYS)
+static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
 #endif /* CONFIG_EFI */
 
 #endif /* _ASM_X86_EFI_H */
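The patch applies a common kernel idiom: declare the real prototype when the config option is enabled, and provide an empty static inline otherwise, so call sites compile and link either way without their own #ifdef. A general, self-contained illustration (the names here are made up, not from the patch):

```c
#include <assert.h>

#define CONFIG_FEATURE 0        /* pretend the feature is compiled out */

#if CONFIG_FEATURE
/* the real implementation would live in some .c file built only when
 * the option is on */
extern void parse_feature_setup(unsigned long phys_addr, unsigned int len);
#else
/* empty stub: the call below compiles to nothing */
static inline void parse_feature_setup(unsigned long phys_addr,
                                       unsigned int len)
{
    (void)phys_addr;
    (void)len;
}
#endif

int call_site(void)
{
    parse_feature_setup(0x1000, 16);   /* no #ifdef needed here */
    return 0;
}
```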


[PATCH tip/efi-kexec] x86: ksysfs.c build fix

2014-01-02 Thread Dave Young
The kbuild test robot reports the error below for a randconfig build:

   arch/x86/kernel/ksysfs.c: In function 'get_setup_data_paddr':
>> arch/x86/kernel/ksysfs.c:81:3: error: implicit declaration of function 
>> 'ioremap_cache' [-Werror=implicit-function-declaration]
  data = ioremap_cache(pa_data, sizeof(*data));
  ^
>> arch/x86/kernel/ksysfs.c:86:3: error: implicit declaration of function 
>> 'iounmap' [-Werror=implicit-function-declaration]
  iounmap(data);
  ^
Fix it by including  in ksysfs.c

Signed-off-by: Dave Young 
---
 arch/x86/kernel/ksysfs.c |1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86/kernel/ksysfs.c
===
--- linux.orig/arch/x86/kernel/ksysfs.c
+++ linux/arch/x86/kernel/ksysfs.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 


Re: [PATCH v1] xhci: Switch Intel Lynx Point ports to EHCI on shutdown

2014-01-02 Thread littlebat
On Thu, 2 Jan 2014 16:03:34 -0800
Sarah Sharp  wrote:

> On Sun, Dec 22, 2013 at 09:47:49AM +0200, Denis Turischev wrote:
> > On 12/21/2013 01:45 AM, Sarah Sharp wrote:
> > > On Fri, Dec 20, 2013 at 12:41:11PM +0200, Denis Turischev wrote:
> > >>> Also, which kernel are you experiencing this issue on?  In
> > >>> 3.12, I queued a separate patch to deal with spurious reboot
> > >>> issues on Lynx Point:
> > >>>
> > >>> commit 638298dc66ea36623dbc2757a24fc2c4ab41b016
> > >> Sorry, I indeed tested not on the latest kernel version, Ubuntu
> > >> 3.13-rc3 has this patch and it works for me.
> > > 
> > > What does "Ubuntu 3.13-rc3" mean?  Where did you get your kernel
> > > from?
> > Latest Ubuntu development kernel based on mainline 3.13-rc3.
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc3-trusty/
> > > 
> > > Also, do you have an HP system, or is this a different vendor?
> > No, it's not HP system, it's Compulab's IntensePC-2 with Phoenix
> > BIOS.
> 
> Ok, that's a bit of an issue then.  Your system needs the quirk
> introduced by commit 638298dc66ea36623dbc2757a24fc2c4ab41b016 "xhci:
> Fix spurious wakeups after S5 on Haswell".  That went into 3.12-rc3.
> However, in 3.13-rc6, commit 6962d914f317b119e0db7189199b21ec77a4b3e0
> "xhci: Limit the spurious wakeup fix only to HP machines" limited the
> quirk to only HP systems.
> 
> That means your system worked fine in 3.13-rc3 (when the quirk was
> applied broadly), but won't work for 3.13-rc6 (when the quirk was
> narrowed to HP machines).  So we need the quirk to apply to your
> systems as well.
> 
> ISTR that the other folks on Cc (Meng, Niklas, Giorgos, and Art) all
> had systems that broke when commit
> 638298dc66ea36623dbc2757a24fc2c4ab41b016 was introduced.  For those
> systems, what vendor was the system, and what BIOS was it running?

I'm Meng(littlebat), 
Motherboard vendor:
Manufacturer: ASRock
Product Name: Z87 Pro3

BIOS:

Boot into BIOS setup interface, only show:
ASRock UEFI Setup Utility
UIFI Version: Z87 Pro3 P2.20
Chipset Version: C2

Information below gotten from command: sudo dmidecode
# dmidecode 2.12
SMBIOS 2.7 present.
26 structures occupying 1467 bytes.
Table at 0x000EE880.

Handle 0x, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: P2.20
Release Date: 07/03/2013
Address: 0xF
Runtime Size: 64 kB
ROM Size: 8192 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 4.6

The original bug report of mine is:
https://bugzilla.kernel.org/show_bug.cgi?id=66551

and the bug disappeared once commit 6962d914f317b119e0db7189199b21ec77a4b3e0
"xhci: Limit the spurious wakeup fix only to HP machines" was applied.

> 
> Takashi, did the HP systems that needed the quirk have a Phoenix BIOS?
> 
> Denis, do all of Compulab's Haswell systems reboot on shutdown?  Are
> they all running a Phoenix BIOS?  Can you send me the output of `sudo
> lspci -vvv -s` for the xHCI host?
> 
> Basically, I'm trying to find a common variable to key off.  I suspect
> BIOS vendor is probably the right thing, instead of system vendor.
> 
> Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Memory allocator semantics

2014-01-02 Thread Josh Triplett
On Thu, Jan 02, 2014 at 12:33:20PM -0800, Paul E. McKenney wrote:
> Hello!
> 
> From what I can see, the Linux-kernel's SLAB, SLOB, and SLUB memory
> allocators would deal with the following sort of race:
> 
> A.  CPU 0: r1 = kmalloc(...); ACCESS_ONCE(gp) = r1;
> 
>   CPU 1: r2 = ACCESS_ONCE(gp); if (r2) kfree(r2);
> 
> However, my guess is that this should be considered an accident of the
> current implementation rather than a feature.  The reason for this is
> that I cannot see how you would usefully do (A) above without also allowing
> (B) and (C) below, both of which look to me to be quite destructive:

(A) only seems OK if "gp" is guaranteed to be NULL beforehand, *and* if
no other CPUs can possibly do what CPU 1 is doing in parallel.  Even
then, it seems questionable how this could ever be used successfully in
practice.

This seems similar to the TCP simultaneous-SYN case: theoretically
possible, absurd in practice.

> B.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;
> 
>   CPU 1: r2 = ACCESS_ONCE(shared_x); if (r2) kfree(r2);
> 
>   CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);
> 
>   This results in the memory being on two different freelists.

That's a straightforward double-free bug.  You need some kind of
synchronization there to ensure that only one call to kfree occurs.

> C.  CPU 0: r1 = kmalloc(...);  ACCESS_ONCE(shared_x) = r1;
> 
>   CPU 1: r2 = ACCESS_ONCE(shared_x); r2->a = 1; r2->b = 2;
> 
>   CPU 2: r3 = ACCESS_ONCE(shared_x); if (r3) kfree(r3);
> 
>   CPU 3: r4 = kmalloc(...);  r4->s = 3; r4->t = 4;
> 
>   This results in the memory being used by two different CPUs,
>   each of which believe that they have sole access.

This is not OK either: CPU 2 has called kfree on a pointer that CPU 1
still considers alive, and again, the CPUs haven't used any form of
synchronization to prevent that.

> But I thought I should ask the experts.
> 
> So, am I correct that kernel hackers are required to avoid "drive-by"
> kfree()s of kmalloc()ed memory?

Don't kfree things that are in use, and synchronize to make sure all
CPUs agree about "in use", yes.

> PS.  To the question "Why would anyone care about (A)?", the answer
>  is "Inquiring programming-language memory-model designers want
>  to know."

I find myself wondering about the original form of the question, since
I'd hope that programming-language memory-model designers would
understand the need for synchronization around reclaiming memory.

- Josh Triplett


Re: [RFC] mm: show message when updating min_free_kbytes in thp

2014-01-02 Thread Han Pingtian
On Thu, Jan 02, 2014 at 10:05:21AM -0800, Dave Hansen wrote:
> On 12/31/2013 04:29 PM, Han Pingtian wrote:
> > min_free_kbytes may be updated during thp's initialization. Sometimes,
> > this will change the value being set by user. Showing message will
> > clarify this confusion.
> ...
> > -   if (recommended_min > min_free_kbytes)
> > +   if (recommended_min > min_free_kbytes) {
> > min_free_kbytes = recommended_min;
> > +   pr_info("min_free_kbytes is updated to %d by enabling transparent hugepage.\n",
> > +   min_free_kbytes);
> > +   }
> 
> "updated" doesn't tell us much.  It's also kinda nasty that if we enable
> then disable THP, we end up with an elevated min_free_kbytes.  Maybe we
> should at least put something in that tells the user how to get back
> where they were if they care:
> 
> "raising min_free_kbytes from %d to %d to help transparent hugepage
> allocations"
> 
Thanks. I have updated it according to your suggestion.


>From f9902b16ff0c326349e72eca9facef2c98f8595d Mon Sep 17 00:00:00 2001
From: Han Pingtian 
Date: Fri, 3 Jan 2014 11:10:49 +0800
Subject: [PATCH] mm: show message when raising min_free_kbytes in THP

min_free_kbytes may be raised during THP initialization, and this can
silently override the value set by the user.  Printing a message makes
the change visible.

Per Dave Hansen's suggestion, also show the old value of
min_free_kbytes, so the user has the information needed to restore it.

Signed-off-by: Han Pingtian 
---
 mm/huge_memory.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7de1bf8..1f0356d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -130,8 +130,11 @@ static int set_recommended_min_free_kbytes(void)
  (unsigned long) nr_free_buffer_pages() / 20);
recommended_min <<= (PAGE_SHIFT-10);
 
-   if (recommended_min > min_free_kbytes)
+   if (recommended_min > min_free_kbytes) {
+   pr_info("raising min_free_kbytes from %d to %d to help transparent hugepage allocations\n",
+   min_free_kbytes, recommended_min);
min_free_kbytes = recommended_min;
+   }
setup_per_zone_wmarks();
return 0;
 }
-- 
1.7.7.6



[PATCH net-next] r8152: fix the wrong return value

2014-01-02 Thread Hayes Wang
The return value should be the boolean value, not the error code.

Signed-off-by: Hayes Wang 
Spotted-by: Dan Carpenter 
---
 drivers/net/usb/r8152.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index e3d878c..13fabbb 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -2708,7 +2708,7 @@ static bool rtl_ops_init(struct r8152 *tp, const struct usb_device_id *id)
ops->unload = rtl8153_unload;
break;
default:
-   ret = -EFAULT;
+   ret = false;
break;
}
break;
-- 
1.8.4.2



Re: [PATCH] mm: page_alloc: use enum instead of number for migratetype

2014-01-02 Thread SeongJae Park
On Fri, Jan 3, 2014 at 12:57 AM, Mel Gorman  wrote:
> On Thu, Jan 02, 2014 at 08:25:22PM +0900, SeongJae Park wrote:
>> Using enum instead of number for migratetype everywhere would be better
>> for reading and understanding.
>>
>> Signed-off-by: SeongJae Park 
>
> This implicitly makes assumptions about the value of MIGRATE_UNMOVABLE
> and does not appear to actually fix or improve anything.
>
> --
> Mel Gorman
> SUSE Labs

I thought spelling out the enum values might help readability for some
readers, but anyway, I understand and respect your opinion.

Thanks and Regards.
SeongJae Park


[RFC PATCH net-next 0/4] net_cls for sys container

2014-01-02 Thread Libo Chen
Hi guys,

Currently, an lxc container created with a veth device cannot be
controlled by cls_cgroup.

The former discussion:
http://lkml.indiana.edu/hypermail/linux/kernel/1312.1/00214.html

In short, cls_cgroup relies on the classid attached to the sock to
filter skbs, but the sock will be cleared inside dev_forward_skb()
in veth_xmit().

So I add backup_classid to struct sk_buff to save the classid
before dev_forward_skb(). In cls_cgroup_classify(), if skb->sk
is NULL, we can try to restore the classid from skb->backup_classid.


Libo Chen (4):
  net: introduce bk_classid to struct sk_buff
  cls_cgroup: introduce a helper: bk_cls_classid()
  veth: backup classid before switching net_ns
  cls_cgroup: restore classid from skb->sk_classid

 drivers/net/veth.c   |  5 +
 include/linux/skbuff.h   |  3 +++
 include/net/cls_cgroup.h | 11 +++
 net/sched/cls_cgroup.c   |  8 
 4 files changed, 23 insertions(+), 4 deletions(-)



[RFC PATCH net-next 3/4] veth: backup classid before switching net_ns

2014-01-02 Thread Libo Chen

dev_forward_skb() will clear skb->sk, so we need to save the classid
before that; otherwise the skb cannot be controlled by net_cls.

Signed-off-by: Libo Chen 
---
 drivers/net/veth.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2ec2041..ce43a2d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 

 #define DRV_NAME   "veth"
 #define DRV_VERSION"1.0"
@@ -123,6 +124,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
rcv->features & NETIF_F_RXCSUM)
skb->ip_summed = CHECKSUM_UNNECESSARY;

+   /*
+* dev_forward_skb will clear skb->sk, so save
+* skb->sk->sk_classid for QoS
+*/
+   bk_cls_classid(skb);
+
if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);

-- 
1.8.2.2




[RFC PATCH net-next 1/4] net: introduce backup_classid to struct skbuff

2014-01-02 Thread Libo Chen

Introduce backup_classid to struct sk_buff; we can use it to back up
sk_classid when the skb switches net_ns.

Signed-off-by: Libo Chen 
---
 include/linux/skbuff.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c5cd016..b76e871 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -492,6 +492,9 @@ struct sk_buff {
 #ifdef CONFIG_NET_CLS_ACT
__u16   tc_verd;/* traffic control verdict */
 #endif
+#ifdef CONFIG_NET_CLS_CGROUP
+   __u32   backup_classid;
+#endif
 #endif

__u16   queue_mapping;
-- 
1.8.2.2




[RFC PATCH net-next 4/4] cls_cgroup: restore classid from skb->sk_classid

2014-01-02 Thread Libo Chen

If skb->sk is NULL, we can try to restore the classid from
skb->backup_classid, because we may have saved it there.

Signed-off-by: Libo Chen 
---
 net/sched/cls_cgroup.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index d228a5d..6ab0e69 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -141,9 +141,10 @@ static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 */
if (in_serving_softirq()) {
/* If there is an sk_classid we'll use that. */
-   if (!skb->sk)
-   return -1;
-   classid = skb->sk->sk_classid;
+   if (skb->sk)
+   classid = skb->sk->sk_classid;
+   else
+   classid = skb->backup_classid;
}

if (!classid)
-- 
1.8.2.2




[RFC PATCH net-next 2/4] cls_cgroup: introduce a helper: bk_cls_classid()

2014-01-02 Thread Libo Chen

It saves the classid from skb->sk->sk_classid
to skb->backup_classid.

Signed-off-by: Libo Chen 
---
 include/net/cls_cgroup.h | 11 +++
 net/sched/cls_cgroup.c   |  1 -
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index 33d03b6..4249ea3 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 

 #if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
 struct cgroup_cls_state
@@ -26,6 +27,12 @@ struct cgroup_cls_state

 void sock_update_classid(struct sock *sk);

+static inline void bk_cls_classid(struct sk_buff *skb)
+{
+   if (skb->sk && skb->sk->sk_classid)
+   skb->backup_classid = skb->sk->sk_classid;
+}
+
 #if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
 static inline u32 task_cls_classid(struct task_struct *p)
 {
@@ -61,6 +68,10 @@ static inline u32 task_cls_classid(struct task_struct *p)
 }
 #endif
 #else /* !CGROUP_NET_CLS_CGROUP */
+static inline void bk_cls_classid(struct sk_buff *skb)
+{
+}
+
 static inline void sock_update_classid(struct sock *sk)
 {
 }
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index f9d21258..d228a5d 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 

static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state *css)
-- 
1.8.2.2




Re: [RFC PATCHv3 02/11] iommu/omap: Use get_vm_area directly

2014-01-02 Thread Chen, Gong
On Thu, Jan 02, 2014 at 01:53:20PM -0800, Laura Abbott wrote:
> diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
> index d147259..6280d50 100644
> --- a/drivers/iommu/omap-iovmm.c
> +++ b/drivers/iommu/omap-iovmm.c
> @@ -214,7 +214,7 @@ static void *vmap_sg(const struct sg_table *sgt)
>   if (!total)
>   return ERR_PTR(-EINVAL);
>  
> - new = __get_vm_area(total, VM_IOREMAP, VMALLOC_START, VMALLOC_END);
> + new = get_vm_area(total, VM_IOREMAP);
This driver can be built as a module, but get_vm_area is not exported. You
need to add an extra EXPORT_SYMBOL_GPL(get_vm_area).





RE: [PATCH net-next v2 6/6] r8152: support RTL8153

2014-01-02 Thread hayeswang
 Bjørn Mork [mailto:bj...@mork.no] 
> Sent: Thursday, January 02, 2014 10:25 PM
> To: Hayeswang
> Cc: oli...@neukum.org; net...@vger.kernel.org; nic_swsd; 
> linux-kernel@vger.kernel.org; linux-...@vger.kernel.org
> Subject: Re: [PATCH net-next v2 6/6] r8152: support RTL8153
> 
[...]
> > +#if defined(CONFIG_USB_RTL8152) || defined(CONFIG_USB_RTL8152_MODULE)
> > +/* Samsung USB Ethernet Adapters */
> > +{
> > +   USB_DEVICE_AND_INTERFACE_INFO(SAMSUNG_VENDOR_ID, 0xa101, USB_CLASS_COMM,
> > +   USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
> > +   .driver_info = 0,
> > +},
> > +#endif
> 
> 
> We don't play the if-other-driver-is-enabled for any other of the
> blacklisted devices, including other devices supported by the RTL8152
> driver.  Why do we need it here?

The USB NIC has two configurations. Config 2 is ECM mode and uses the
cdc_ether driver; config 1 uses the r8152 driver. When the dongle is
plugged in, I find that configuration 2 is selected, so the ECM driver
would be loaded. This way, you can select which mode you want to run:
some people would select ECM mode for performance, and others would
select r8152 for power saving.
 
Best Regards,
Hayes



Re: [RFC PATCHv3 01/11] mce: acpi/apei: Use get_vm_area directly

2014-01-02 Thread Chen, Gong
On Thu, Jan 02, 2014 at 01:53:19PM -0800, Laura Abbott wrote:
> There's no need to use VMALLOC_START and VMALLOC_END with
> __get_vm_area when get_vm_area does the exact same thing.
> Convert over.
> 
> Signed-off-by: Laura Abbott 

Acked-by: Chen, Gong 




Re: [PATCH] x86: Add check for number of available vectors before CPU down [v4]

2014-01-02 Thread Chen, Gong
Add some nitpicks below.

Reviewed-by: Chen, Gong 

On Thu, Jan 02, 2014 at 07:47:24PM -0500, Prarit Bhargava wrote:
> +int check_irq_vectors_for_cpu_disable(void)
> +{
> + int irq, cpu;
> + unsigned int vector, this_count, count;
> + struct irq_desc *desc;
> + struct irq_data *data;
> + struct cpumask affinity_new, online_new;
> +
> + cpumask_copy(&online_new, cpu_online_mask);
> + cpu_clear(smp_processor_id(), online_new);
I notice that you call smp_processor_id() several times. Maybe we can
save its result first for speed.

> +
> + this_count = 0;
> + for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
> + irq = __this_cpu_read(vector_irq[vector]);
> + if (irq >= 0) {
> + desc = irq_to_desc(irq);
> + data = irq_desc_get_irq_data(desc);
> + cpumask_copy(&affinity_new, data->affinity);
> + cpu_clear(smp_processor_id(), affinity_new);
> + /*
> +  * Only count active non-percpu irqs, and those
> +  * irqs that are not linked to on an online cpu; in
> +  * fixup_irqs(), chip->irq_set_affinity() will be
> +  * called which will set vector_irq[irq] for each
> +  * cpu.
> +  */
> + if (irq_has_action(irq) && !irqd_is_per_cpu(data) &&
> + (cpumask_empty(&affinity_new) ||
> +  !cpumask_subset(&affinity_new, &online_new)))
> + this_count++;
Would you please add some extra comments describing these two different
conditions, i.e. when the cpumask is empty and when it is non-empty? It
feels a little bit subtle, and I don't want to find myself confused by
it one day :-).





[PATCH] Staging: comedi: fix spacing/style problem in das1800.c

2014-01-02 Thread Chase Southwood
This is a patch to the das1800.c file that fixes a style issue found by
checkpatch.pl.

Signed-off-by: Chase Southwood 
---
 drivers/staging/comedi/drivers/das1800.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/das1800.c b/drivers/staging/comedi/drivers/das1800.c
index 165bdfd..320d95a 100644
--- a/drivers/staging/comedi/drivers/das1800.c
+++ b/drivers/staging/comedi/drivers/das1800.c
@@ -459,7 +459,7 @@ static inline uint16_t munge_bipolar_sample(const struct comedi_device *dev,
return sample;
 }
 
-static void munge_data(struct comedi_device *dev, uint16_t * array,
+static void munge_data(struct comedi_device *dev, uint16_t *array,
   unsigned int num_elements)
 {
unsigned int i;
-- 
1.8.4.2



RE: [PATCH 1/1] MTD: UBI: avoid program operation on NOR flash after erasure interrupted

2014-01-02 Thread qiwang
OK, thank you Artem.

I will switch to git send-email next time, and I will be more careful when
sending patches to you.
Sorry for causing so much trouble; this is my first time pushing a patch to
the Linux mainline. (It may also be Micron Technology's first time pushing a
patch to mainline; I am the Micron China Shanghai team's volunteer for
working with the Linux community on upstreaming.)
Anyway, thanks for your patience in giving me a good start.
If you need any help from me or my team, please kindly let me know.
Our team is glad to contribute to the Linux community.
Wish you a wonderful 2014.
Thank you


-Original Message-
From: Artem Bityutskiy [mailto:dedeki...@gmail.com] 
Sent: Thursday, January 02, 2014 11:23 PM
To: Qi Wang 王起 (qiwang)
Cc: Adrian Hunter; linux-kernel@vger.kernel.org; linux-...@lists.infradead.org
Subject: Re: [PATCH 1/1] MTD: UBI: avoid program operation on NOR flash after 
erasure interrupted

On Thu, 2014-01-02 at 17:11 +0200, Artem Bityutskiy wrote:
> Anyway, I did not respond for long time, so I decided to do all these
> modifications myself. Could you please review and test the below patch,
> which is also attached. I did not compile it even. If there are issues,
> just fix-it up and send the the fix or the fixed version. If it is OK,
> give me your "Tested-by:" please. Thanks!

Actually, for convenience, I've pushed this patch to the 'qi' branch of
the 'linux-ubifs' git tree:

git://git.infradead.org/ubifs-2.6.git

-- 
Best Regards,
Artem Bityutskiy



RE: [PATCH] mm/zswap: add writethrough option

2014-01-02 Thread Weijie Yang
On Thu, Jan 2, 2014 at 11:38 PM, Dan Streetman  wrote:
> Happy new year!
>
> Seth, just checking if you have had a chance yet to think about this one.
>
>
> On Thu, Dec 19, 2013 at 8:23 AM, Dan Streetman  wrote:
> > Currently, zswap is writeback cache; stored pages are not sent
> > to swap disk, and when zswap wants to evict old pages it must
> > first write them back to swap cache/disk manually.  This avoids
> > swap out disk I/O up front, but only moves that disk I/O to
> > the writeback case (for pages that are evicted), and adds the
> > overhead of having to uncompress the evicted pages and the
> > need for an additional free page (to store the uncompressed page).
> >
> > This optionally changes zswap to writethrough cache by enabling
> > frontswap_writethrough() before registering, so that any
> > successful page store will also be written to swap disk.  The
> > default remains writeback.  To enable writethrough, the param
> > zswap.writethrough=1 must be used at boot.
> >
> > Whether writeback or writethrough will provide better performance
> > depends on many factors including disk I/O speed/throughput,
> > CPU speed(s), system load, etc.  In most cases it is likely
> > that writeback has better performance than writethrough before
> > zswap is full, but after zswap fills up writethrough has
> > better performance than writeback.
> >
> > Signed-off-by: Dan Streetman 

Thanks for your work.
Although I won't try writethrough mode on an embedded device, I hope it
will be helpful to others.
I reviewed this patch, and it looks good to me.

Reviewed-by: Weijie Yang 

> > ---
> >
> > Based on specjbb testing on my laptop, the results for both writeback
> > and writethrough are better than not using zswap at all, but writeback
> > does seem to be better than writethrough while zswap isn't full.  Once
> > it fills up, performance for writethrough is essentially close to not
> > using zswap, while writeback seems to be worse than not using zswap.
> > However, I think more testing on a wider span of systems and conditions
> > is needed.  Additionally, I'm not sure that specjbb is measuring true
> > performance under fully loaded cpu conditions, so additional cpu load
> > might need to be added or specjbb parameters modified (I took the
> > values from the 4 "warehouses" test run).
> >
> > In any case though, I think having writethrough as an option is still
> > useful.  More changes could be made, such as changing from writeback
> > to writethrough based on the zswap % full.  And the patch doesn't
> > change default behavior - writethrough must be specifically enabled.
> >
> > The %-ized numbers I got from specjbb on average, using the default
> > 20% max_pool_percent and varying the amount of heap used as shown:
> >
> > ram | no zswap | writeback | writethrough
> >  75 |    93.08 |    100.00 |        96.90
> >  87 |    96.58 |     95.58 |        96.72
> > 100 |    92.29 |     89.73 |        86.75
> > 112 |    63.80 |     38.66 |        19.66
> > 125 |     4.79 |     29.90 |        15.75
> > 137 |     4.99 |      4.50 |         4.75
> > 150 |     4.28 |      4.62 |         5.01
> > 162 |     5.20 |      2.94 |         4.66
> > 175 |     5.71 |      2.11 |         4.84
> >
> >
> >
> >  mm/zswap.c | 68 
> > ++
> >  1 file changed, 64 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index e55bab9..2f919db 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -61,6 +61,8 @@ static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
> >  static u64 zswap_pool_limit_hit;
> >  /* Pages written back when pool limit was reached */
> >  static u64 zswap_written_back_pages;
> > +/* Pages evicted when pool limit was reached */
> > +static u64 zswap_evicted_pages;
> >  /* Store failed due to a reclaim failure after pool limit was reached */
> >  static u64 zswap_reject_reclaim_fail;
> >  /* Compressed page was too big for the allocator to (optimally) store */
> > @@ -89,6 +91,10 @@ static unsigned int zswap_max_pool_percent = 20;
> >  module_param_named(max_pool_percent,
> > zswap_max_pool_percent, uint, 0644);
> >
> > +/* Writeback/writethrough mode (fixed at boot for now) */
> > +static bool zswap_writethrough;
> > +module_param_named(writethrough, zswap_writethrough, bool, 0444);
> > +
> >  /*
> >  * compression functions
> >  **/
> > @@ -629,6 +635,48 @@ end:
> >  }
> >
> >  /*
> > +* evict code
> > +**/
> > +
> > +/*
> > + * This evicts pages that have already been written through to swap.
> > + */
> > +static int zswap_evict_entry(struct zbud_pool *pool, unsigned long handle)
> > +{
> > +   struct zswap_header *zhdr;
> > +   swp_entry_t swpentry;
> > +   struct zswap_tree *tree;
> > +   pgoff_t offset;
> > +   struct zswap_entry *entry;
> > +
> > +   /* extract swpentry from data */
> > +   zhdr = zbud_map(pool, handle);
> > +  

[for-next][PATCH 02/17] tracing/probes: Fix basic print type functions

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

The print format of the s32 type was "%ld", and the value was cast to
"long".  So it turned out to print 4294967295 for "-1" on 64-bit
systems.  It is not clear whether it worked correctly on 32-bit systems.

Anyway, it doesn't need a cast argument at all, since the value is
already cast via the type pointer - just get rid of it.  Thanks to Oleg
for pointing that out.

Also print a 0x prefix for unsigned types, as they are shown as hex
numbers.

Suggested-by: Oleg Nesterov 
Acked-by: Oleg Nesterov 
Acked-by: Masami Hiramatsu 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_probe.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 412e959..430505b 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -40,23 +40,23 @@ const char *reserved_field_names[] = {
 #define PRINT_TYPE_FMT_NAME(type)  print_type_format_##type
 
 /* Printing  in basic type function template */
-#define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt, cast)  \
+#define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt)  \
 static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,   \
const char *name,   \
-   void *data, void *ent)\
+   void *data, void *ent)  \
 {  \
-   return trace_seq_printf(s, " %s=" fmt, name, (cast)*(type *)data);\
+   return trace_seq_printf(s, " %s=" fmt, name, *(type *)data);\
 }  \
 static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
 
-DEFINE_BASIC_PRINT_TYPE_FUNC(u8, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "%x", unsigned int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "%lx", unsigned long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "%llx", unsigned long long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s8, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d", int)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%ld", long)
-DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%lld", long long)
+DEFINE_BASIC_PRINT_TYPE_FUNC(u8 , "0x%x")
+DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "0x%x")
+DEFINE_BASIC_PRINT_TYPE_FUNC(u32, "0x%x")
+DEFINE_BASIC_PRINT_TYPE_FUNC(u64, "0x%Lx")
+DEFINE_BASIC_PRINT_TYPE_FUNC(s8,  "%d")
+DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d")
+DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%d")
+DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%Ld")
 
 static inline void *get_rloc_data(u32 *dl)
 {
-- 
1.8.4.3




[for-next][PATCH 09/17] tracing/probes: Implement stack fetch method for uprobes

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Use a separate method to fetch from the stack.  Move the existing
functions to trace_kprobe.c and make them static.  Also add a new
stack fetch implementation for uprobes.

Acked-by: Oleg Nesterov 
Cc: Masami Hiramatsu 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 15 +++
 kernel/trace/trace_probe.c  | 22 --
 kernel/trace/trace_probe.h  | 14 ++
 kernel/trace/trace_uprobe.c | 41 +
 4 files changed, 66 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fe3f00c..389f9e4 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
+/*
+ * Kprobes-specific fetch functions
+ */
+#define DEFINE_FETCH_stack(type)   \
+static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
+ void *offset, void *dest) \
+{  \
+   *(type *)dest = (type)regs_get_kernel_stack_nth(regs,   \
+   (unsigned int)((unsigned long)offset)); \
+}
+DEFINE_BASIC_FETCH_FUNCS(stack)
+/* No string on the stack entry */
+#define fetch_stack_string NULL
+#define fetch_stack_string_sizeNULL
+
 /* Fetch type information table */
 const struct fetch_type kprobes_fetch_type_table[] = {
/* Special types */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 541036e..77aa7d1 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -70,16 +70,6 @@ __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
-/*
- * Define macro for basic types - we don't need to define s* types, because
- * we have to care only about bitwidth at recording time.
- */
-#define DEFINE_BASIC_FETCH_FUNCS(method) \
-DEFINE_FETCH_##method(u8)  \
-DEFINE_FETCH_##method(u16) \
-DEFINE_FETCH_##method(u32) \
-DEFINE_FETCH_##method(u64)
-
 #define CHECK_FETCH_FUNCS(method, fn)  \
(((FETCH_FUNC_NAME(method, u8) == fn) ||\
  (FETCH_FUNC_NAME(method, u16) == fn) ||   \
@@ -102,18 +92,6 @@ DEFINE_BASIC_FETCH_FUNCS(reg)
 #define fetch_reg_string   NULL
 #define fetch_reg_string_size  NULL
 
-#define DEFINE_FETCH_stack(type)   \
-__kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,  \
- void *offset, void *dest) \
-{  \
-   *(type *)dest = (type)regs_get_kernel_stack_nth(regs,   \
-   (unsigned int)((unsigned long)offset)); \
-}
-DEFINE_BASIC_FETCH_FUNCS(stack)
-/* No string on the stack entry */
-#define fetch_stack_string NULL
-#define fetch_stack_string_sizeNULL
-
 #define DEFINE_FETCH_retval(type)  \
 __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs, \
  void *dummy, void *dest)  \
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 5b77798..8211dd6 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -167,10 +167,6 @@ DECLARE_BASIC_FETCH_FUNCS(reg);
 #define fetch_reg_string   NULL
 #define fetch_reg_string_size  NULL
 
-DECLARE_BASIC_FETCH_FUNCS(stack);
-#define fetch_stack_string NULL
-#define fetch_stack_string_sizeNULL
-
 DECLARE_BASIC_FETCH_FUNCS(retval);
 #define fetch_retval_stringNULL
 #define fetch_retval_string_size   NULL
@@ -191,6 +187,16 @@ DECLARE_BASIC_FETCH_FUNCS(bitfield);
 #define fetch_bitfield_string  NULL
 #define fetch_bitfield_string_size NULL
 
+/*
+ * Define macro for basic types - we don't need to define s* types, because
+ * we have to care only about bitwidth at recording time.
+ */
+#define DEFINE_BASIC_FETCH_FUNCS(method) \
+DEFINE_FETCH_##method(u8)  \
+DEFINE_FETCH_##method(u16) \
+DEFINE_FETCH_##method(u32) \
+DEFINE_FETCH_##method(u64)
+
 /* Default (unsigned long) fetch type */
 #define __DEFAULT_FETCH_TYPE(t) u##t
 #define _DEFAULT_FETCH_TYPE(t) __DEFAULT_FETCH_TYPE(t)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2c60925..5395d37 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -74,6 +74,47 

RE: [PATCH 1/1] MTD: UBI: avoid program operation on NOR flash after erasure interrupted

2014-01-02 Thread qiwang
Hi Richard:
Artem has already pushed this patch to the 'linux-ubifs' git tree.  
Thanks for the reminder; I will switch to git send-email next time. 
Thanks again for your reply. I will be more careful when sending patches to you 
in the future. 

Thanks a lot

-Original Message-
From: Richard Weinberger [mailto:rich...@nod.at] 
Sent: Thursday, January 02, 2014 10:58 PM
To: Qi Wang 王起 (qiwang)
Cc: Richard Weinberger; dedeki...@gmail.com; linux-kernel@vger.kernel.org; 
linux-...@lists.infradead.org; Adrian Hunter
Subject: Re: [PATCH 1/1] MTD: UBI: avoid program operation on NOR flash after 
erasure interrupted

Am Mittwoch, 1. Januar 2014, 13:11:21 schrieb Qi Wang 王起:
> If have any questions, please let me know. 

Did you send the patch again using Outlook?
Or are you behind a MS Exchange server?

The patch is still malformed. :-(
e.g. it is base64 encoded.

Can you please try using git send-email?

Thanks,
//richard



[for-next][PATCH 04/17] tracing/uprobes: Convert to struct trace_probe

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Convert struct trace_uprobe to make use of the common trace_probe
structure.

Reviewed-by: Masami Hiramatsu 
Acked-by: Srikar Dronamraju 
Acked-by: Oleg Nesterov 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_uprobe.c | 159 ++--
 1 file changed, 79 insertions(+), 80 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c77b92d..afda372 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -51,22 +51,17 @@ struct trace_uprobe_filter {
  */
 struct trace_uprobe {
struct list_headlist;
-   struct ftrace_event_class   class;
-   struct ftrace_event_callcall;
struct trace_uprobe_filter  filter;
struct uprobe_consumer  consumer;
struct inode*inode;
char*filename;
unsigned long   offset;
unsigned long   nhit;
-   unsigned intflags;  /* For TP_FLAG_* */
-   ssize_t size;   /* trace entry size */
-   unsigned intnr_args;
-   struct probe_argargs[];
+   struct trace_probe  tp;
 };
 
-#define SIZEOF_TRACE_UPROBE(n) \
-   (offsetof(struct trace_uprobe, args) +  \
+#define SIZEOF_TRACE_UPROBE(n) \
+   (offsetof(struct trace_uprobe, tp.args) +   \
(sizeof(struct probe_arg) * (n)))
 
 static int register_uprobe_event(struct trace_uprobe *tu);
@@ -114,13 +109,13 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
if (!tu)
return ERR_PTR(-ENOMEM);
 
-   tu->call.class = &tu->class;
-   tu->call.name = kstrdup(event, GFP_KERNEL);
-   if (!tu->call.name)
+   tu->tp.call.class = &tu->tp.class;
+   tu->tp.call.name = kstrdup(event, GFP_KERNEL);
+   if (!tu->tp.call.name)
goto error;
 
-   tu->class.system = kstrdup(group, GFP_KERNEL);
-   if (!tu->class.system)
+   tu->tp.class.system = kstrdup(group, GFP_KERNEL);
+   if (!tu->tp.class.system)
goto error;
 
	INIT_LIST_HEAD(&tu->list);
@@ -128,11 +123,11 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
if (is_ret)
tu->consumer.ret_handler = uretprobe_dispatcher;
	init_trace_uprobe_filter(&tu->filter);
-   tu->call.flags |= TRACE_EVENT_FL_USE_CALL_FILTER;
+   tu->tp.call.flags |= TRACE_EVENT_FL_USE_CALL_FILTER;
return tu;
 
 error:
-   kfree(tu->call.name);
+   kfree(tu->tp.call.name);
kfree(tu);
 
return ERR_PTR(-ENOMEM);
@@ -142,12 +137,12 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
 {
int i;
 
-   for (i = 0; i < tu->nr_args; i++)
-   traceprobe_free_probe_arg(&tu->args[i]);
+   for (i = 0; i < tu->tp.nr_args; i++)
+   traceprobe_free_probe_arg(&tu->tp.args[i]);
 
iput(tu->inode);
-   kfree(tu->call.class->system);
-   kfree(tu->call.name);
+   kfree(tu->tp.call.class->system);
+   kfree(tu->tp.call.name);
kfree(tu->filename);
kfree(tu);
 }
@@ -157,8 +152,8 @@ static struct trace_uprobe *find_probe_event(const char *event, const char *group)
struct trace_uprobe *tu;
 
	list_for_each_entry(tu, &uprobe_list, list)
-   if (strcmp(tu->call.name, event) == 0 &&
-   strcmp(tu->call.class->system, group) == 0)
+   if (strcmp(tu->tp.call.name, event) == 0 &&
+   strcmp(tu->tp.call.class->system, group) == 0)
return tu;
 
return NULL;
@@ -181,16 +176,16 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu)
 /* Register a trace_uprobe and probe_event */
 static int register_trace_uprobe(struct trace_uprobe *tu)
 {
-   struct trace_uprobe *old_tp;
+   struct trace_uprobe *old_tu;
int ret;
 
mutex_lock(_lock);
 
/* register as an event */
-   old_tp = find_probe_event(tu->call.name, tu->call.class->system);
-   if (old_tp) {
+   old_tu = find_probe_event(tu->tp.call.name, tu->tp.call.class->system);
+   if (old_tu) {
/* delete old event */
-   ret = unregister_trace_uprobe(old_tp);
+   ret = unregister_trace_uprobe(old_tu);
if (ret)
goto end;
}
@@ -360,34 +355,36 @@ static int create_trace_uprobe(int argc, char **argv)
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+   struct probe_arg *parg = &tu->tp.args[i];
+
/* Increment count for freeing args in error case */
-   tu->nr_args++;
+   tu->tp.nr_args++;
 

[for-next][PATCH 00/17] tracing/uprobes: Add support for more fetch methods

2014-01-02 Thread Steven Rostedt
  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: b7e0bf341f6cfa92ae0a0e3d0c3496729595e1e9


Hyeoncheol Lee (1):
  tracing/probes: Add fetch{,_size} member into deref fetch method

Namhyung Kim (15):
  tracing/uprobes: Fix documentation of uprobe registration syntax
  tracing/probes: Fix basic print type functions
  tracing/kprobes: Factor out struct trace_probe
  tracing/uprobes: Convert to struct trace_probe
  tracing/kprobes: Move common functions to trace_probe.h
  tracing/probes: Integrate duplicate set_print_fmt()
  tracing/probes: Move fetch function helpers to trace_probe.h
  tracing/probes: Split [ku]probes_fetch_type_table
  tracing/probes: Implement 'stack' fetch method for uprobes
  tracing/probes: Move 'symbol' fetch method to kprobes
  tracing/probes: Implement 'memory' fetch method for uprobes
  tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg()
  tracing/uprobes: Fetch args before reserving a ring buffer
  tracing/uprobes: Add support for full argument access methods
  tracing/uprobes: Add @+file_offset fetch method

Oleg Nesterov (1):
  uprobes: Allocate ->utask before handler_chain() for tracing handlers


 Documentation/trace/uprobetracer.txt |  36 +-
 kernel/events/uprobes.c  |   4 +
 kernel/trace/trace_kprobe.c  | 824 +++
 kernel/trace/trace_probe.c   | 440 +++
 kernel/trace/trace_probe.h   | 216 +
 kernel/trace/trace_uprobe.c  | 495 +++--
 6 files changed, 1214 insertions(+), 801 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[for-next][PATCH 10/17] tracing/probes: Move symbol fetch method to kprobes

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Move existing functions to trace_kprobe.c and add NULL entries to the
uprobes fetch type table.  I don't make them static since some generic
routines like update/free_XXX_fetch_param() require pointers to the
functions.

Acked-by: Oleg Nesterov 
Cc: Masami Hiramatsu 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 59 +
 kernel/trace/trace_probe.c  | 59 -
 kernel/trace/trace_probe.h  | 24 ++
 kernel/trace/trace_uprobe.c |  8 ++
 4 files changed, 91 insertions(+), 59 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 389f9e4..d2a4fd2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -88,6 +88,51 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
+/* Memory fetching by symbol */
+struct symbol_cache {
+   char*symbol;
+   longoffset;
+   unsigned long   addr;
+};
+
+unsigned long update_symbol_cache(struct symbol_cache *sc)
+{
+   sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
+
+   if (sc->addr)
+   sc->addr += sc->offset;
+
+   return sc->addr;
+}
+
+void free_symbol_cache(struct symbol_cache *sc)
+{
+   kfree(sc->symbol);
+   kfree(sc);
+}
+
+struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
+{
+   struct symbol_cache *sc;
+
+   if (!sym || strlen(sym) == 0)
+   return NULL;
+
+   sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
+   if (!sc)
+   return NULL;
+
+   sc->symbol = kstrdup(sym, GFP_KERNEL);
+   if (!sc->symbol) {
+   kfree(sc);
+   return NULL;
+   }
+   sc->offset = offset;
+   update_symbol_cache(sc);
+
+   return sc;
+}
+
 /*
  * Kprobes-specific fetch functions
  */
@@ -103,6 +148,20 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 #define fetch_stack_string NULL
 #define fetch_stack_string_size	NULL
 
+#define DEFINE_FETCH_symbol(type)  \
+__kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs, \
+ void *data, void *dest)   \
+{  \
+   struct symbol_cache *sc = data; \
+   if (sc->addr)   \
+   fetch_memory_##type(regs, (void *)sc->addr, dest);  \
+   else\
+   *(type *)dest = 0;  \
+}
+DEFINE_BASIC_FETCH_FUNCS(symbol)
+DEFINE_FETCH_symbol(string)
+DEFINE_FETCH_symbol(string_size)
+
 /* Fetch type information table */
 const struct fetch_type kprobes_fetch_type_table[] = {
/* Special types */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 77aa7d1..a31ad47 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -180,65 +180,6 @@ __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
*(u32 *)dest = len;
 }
 
-/* Memory fetching by symbol */
-struct symbol_cache {
-   char*symbol;
-   longoffset;
-   unsigned long   addr;
-};
-
-static unsigned long update_symbol_cache(struct symbol_cache *sc)
-{
-   sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-   if (sc->addr)
-   sc->addr += sc->offset;
-
-   return sc->addr;
-}
-
-static void free_symbol_cache(struct symbol_cache *sc)
-{
-   kfree(sc->symbol);
-   kfree(sc);
-}
-
-static struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
-{
-   struct symbol_cache *sc;
-
-   if (!sym || strlen(sym) == 0)
-   return NULL;
-
-   sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
-   if (!sc)
-   return NULL;
-
-   sc->symbol = kstrdup(sym, GFP_KERNEL);
-   if (!sc->symbol) {
-   kfree(sc);
-   return NULL;
-   }
-   sc->offset = offset;
-   update_symbol_cache(sc);
-
-   return sc;
-}
-
-#define DEFINE_FETCH_symbol(type)  \
-__kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs, \
- void *data, void *dest)   \
-{  \
-   struct symbol_cache *sc = data; \
-   if (sc->addr)   \
-   fetch_memory_##type(regs, (void *)sc->addr, dest);

[for-next][PATCH 07/17] tracing/probes: Move fetch function helpers to trace_probe.h

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Move the fetch function helper macros/functions to the header file and
make them external.  This is in preparation for supporting a uprobe
fetch table in the next patch.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_probe.c | 74 --
 kernel/trace/trace_probe.h | 65 
 2 files changed, 78 insertions(+), 61 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index d8347b0..c26bc9e 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -35,19 +35,15 @@ const char *reserved_field_names[] = {
FIELD_STRING_FUNC,
 };
 
-/* Printing function type */
-#define PRINT_TYPE_FUNC_NAME(type) print_type_##type
-#define PRINT_TYPE_FMT_NAME(type)  print_type_format_##type
-
 /* Printing  in basic type function template */
 #define DEFINE_BASIC_PRINT_TYPE_FUNC(type, fmt)			\
-static __kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,   \
+__kprobes int PRINT_TYPE_FUNC_NAME(type)(struct trace_seq *s,  \
const char *name,   \
void *data, void *ent)  \
 {  \
return trace_seq_printf(s, " %s=" fmt, name, *(type *)data);\
 }  \
-static const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
+const char PRINT_TYPE_FMT_NAME(type)[] = fmt;
 
 DEFINE_BASIC_PRINT_TYPE_FUNC(u8 , "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(u16, "0x%x")
@@ -58,23 +54,12 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d")
 DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%d")
 DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%Ld")
 
-static inline void *get_rloc_data(u32 *dl)
-{
-   return (u8 *)dl + get_rloc_offs(*dl);
-}
-
-/* For data_loc conversion */
-static inline void *get_loc_data(u32 *dl, void *ent)
-{
-   return (u8 *)ent + get_rloc_offs(*dl);
-}
-
 /* For defining macros, define string/string_size types */
 typedef u32 string;
 typedef u32 string_size;
 
 /* Print type function for string type */
-static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
+__kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
  const char *name,
  void *data, void *ent)
 {
@@ -87,7 +72,7 @@ static __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
(const char *)get_loc_data(data, ent));
 }
 
-static const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
+const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
 #define FETCH_FUNC_NAME(method, type)  fetch_##method##_##type
 /*
@@ -111,7 +96,7 @@ DEFINE_FETCH_##method(u64)
 
 /* Data fetch function templates */
 #define DEFINE_FETCH_reg(type) \
-static __kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs, \
+__kprobes void FETCH_FUNC_NAME(reg, type)(struct pt_regs *regs,	\
void *offset, void *dest)   \
 {  \
*(type *)dest = (type)regs_get_register(regs,   \
@@ -123,7 +108,7 @@ DEFINE_BASIC_FETCH_FUNCS(reg)
 #define fetch_reg_string_size  NULL
 
 #define DEFINE_FETCH_stack(type)   \
-static __kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs,  \
  void *offset, void *dest) \
 {  \
*(type *)dest = (type)regs_get_kernel_stack_nth(regs,   \
@@ -135,7 +120,7 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 #define fetch_stack_string_size	NULL
 
 #define DEFINE_FETCH_retval(type)  \
-static __kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(retval, type)(struct pt_regs *regs, \
  void *dummy, void *dest)  \
 {  \
*(type *)dest = (type)regs_return_value(regs);  \
@@ -146,7 +131,7 @@ DEFINE_BASIC_FETCH_FUNCS(retval)
 #define fetch_retval_string_size   NULL
 
 #define DEFINE_FETCH_memory(type)  \
-static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
+__kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs, \
  void *addr, void *dest)   \
 {   

[for-next][PATCH 13/17] tracing/uprobes: Pass is_return to traceprobe_parse_probe_arg()

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Currently uprobes do not pass is_return to the argument parser, so it
cannot make use of the "$retval" fetch method, which only works for
return probes.

Reviewed-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_uprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index bebd2f5..8bfd29a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -514,7 +514,7 @@ static int create_trace_uprobe(int argc, char **argv)
 
/* Parse fetch argument */
ret = traceprobe_parse_probe_arg(arg, >tp.size, parg,
-false, false);
+is_return, false);
if (ret) {
pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
-- 
1.8.4.3




[for-next][PATCH 12/17] tracing/probes: Implement memory fetch method for uprobes

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Use a separate method to fetch from memory.  Move the existing
functions to trace_kprobe.c and make them static.  Also add a new
memory fetch implementation for uprobes.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 77 +
 kernel/trace/trace_probe.c  | 77 -
 kernel/trace/trace_probe.h  |  4 ---
 kernel/trace/trace_uprobe.c | 52 ++
 4 files changed, 129 insertions(+), 81 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d2a4fd2..f94a569 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -148,6 +148,83 @@ DEFINE_BASIC_FETCH_FUNCS(stack)
 #define fetch_stack_string NULL
 #define fetch_stack_string_size	NULL
 
+#define DEFINE_FETCH_memory(type)  \
+static __kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,\
+ void *addr, void *dest)   \
+{  \
+   type retval;\
+   if (probe_kernel_address(addr, retval)) \
+   *(type *)dest = 0;  \
+   else\
+   *(type *)dest = retval; \
+}
+DEFINE_BASIC_FETCH_FUNCS(memory)
+/*
+ * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
+ * length and relative data location.
+ */
+static __kprobes void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
+ void *addr, void *dest)
+{
+   long ret;
+   int maxlen = get_rloc_len(*(u32 *)dest);
+   u8 *dst = get_rloc_data(dest);
+   u8 *src = addr;
+   mm_segment_t old_fs = get_fs();
+
+   if (!maxlen)
+   return;
+
+   /*
+* Try to get string again, since the string can be changed while
+* probing.
+*/
+   set_fs(KERNEL_DS);
+   pagefault_disable();
+
+   do
+   ret = __copy_from_user_inatomic(dst++, src++, 1);
+   while (dst[-1] && ret == 0 && src - (u8 *)addr < maxlen);
+
+   dst[-1] = '\0';
+   pagefault_enable();
+   set_fs(old_fs);
+
+   if (ret < 0) {  /* Failed to fetch string */
+   ((u8 *)get_rloc_data(dest))[0] = '\0';
+   *(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
+   } else {
+   *(u32 *)dest = make_data_rloc(src - (u8 *)addr,
+ get_rloc_offs(*(u32 *)dest));
+   }
+}
+
+/* Return the length of string -- including null terminal byte */
+static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
+   void *addr, void *dest)
+{
+   mm_segment_t old_fs;
+   int ret, len = 0;
+   u8 c;
+
+   old_fs = get_fs();
+   set_fs(KERNEL_DS);
+   pagefault_disable();
+
+   do {
+   ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
+   len++;
+   } while (c && ret == 0 && len < MAX_STRING_SIZE);
+
+   pagefault_enable();
+   set_fs(old_fs);
+
+   if (ret < 0)/* Failed to check the length */
+   *(u32 *)dest = 0;
+   else
+   *(u32 *)dest = len;
+}
+
 #define DEFINE_FETCH_symbol(type)  \
 __kprobes void FETCH_FUNC_NAME(symbol, type)(struct pt_regs *regs, \
  void *data, void *dest)   \
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8d7231d..8f7a2b6d 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -103,83 +103,6 @@ DEFINE_BASIC_FETCH_FUNCS(retval)
 #define fetch_retval_string	NULL
 #define fetch_retval_string_size   NULL
 
-#define DEFINE_FETCH_memory(type)  \
-__kprobes void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs, \
- void *addr, void *dest)   \
-{  \
-   type retval;\
-   if (probe_kernel_address(addr, retval)) \
-   *(type *)dest = 0;  \
-   else\
-   *(type *)dest = retval; \
-}
-DEFINE_BASIC_FETCH_FUNCS(memory)
-/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max

[for-next][PATCH 14/17] tracing/uprobes: Fetch args before reserving a ring buffer

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Fetching from user space should be done in a non-atomic context.  So
use a per-cpu buffer and copy its content to the ring buffer
atomically.  Note that the task can migrate while accessing user
memory, so a per-cpu mutex is used to protect against concurrent
access.

This is needed since we will soon be able to fetch args from user
memory, which can be swapped out.  Before this, uprobes could fetch
args only from registers, which are saved in kernel space.

While at it, use __get_data_size() and store_trace_args() to reduce
code duplication.  And add struct uprobe_cpu_buffer and its helpers as
suggested by Oleg.

Reviewed-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_uprobe.c | 146 +++-
 1 file changed, 132 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8bfd29a..794e8bc 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -652,21 +652,117 @@ static const struct file_operations uprobe_profile_ops = {
.release= seq_release,
 };
 
+struct uprobe_cpu_buffer {
+   struct mutex mutex;
+   void *buf;
+};
+static struct uprobe_cpu_buffer __percpu *uprobe_cpu_buffer;
+static int uprobe_buffer_refcnt;
+
+static int uprobe_buffer_init(void)
+{
+   int cpu, err_cpu;
+
+   uprobe_cpu_buffer = alloc_percpu(struct uprobe_cpu_buffer);
+   if (uprobe_cpu_buffer == NULL)
+   return -ENOMEM;
+
+   for_each_possible_cpu(cpu) {
+   struct page *p = alloc_pages_node(cpu_to_node(cpu),
+ GFP_KERNEL, 0);
+   if (p == NULL) {
+   err_cpu = cpu;
+   goto err;
+   }
+   per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf = page_address(p);
+   mutex_init(&per_cpu_ptr(uprobe_cpu_buffer, cpu)->mutex);
+   }
+
+   return 0;
+
+err:
+   for_each_possible_cpu(cpu) {
+   if (cpu == err_cpu)
+   break;
+   free_page((unsigned long)per_cpu_ptr(uprobe_cpu_buffer, cpu)->buf);
+   }
+
+   free_percpu(uprobe_cpu_buffer);
+   return -ENOMEM;
+}
+
+static int uprobe_buffer_enable(void)
+{
+   int ret = 0;
+
+   BUG_ON(!mutex_is_locked(&event_mutex));
+
+   if (uprobe_buffer_refcnt++ == 0) {
+   ret = uprobe_buffer_init();
+   if (ret < 0)
+   uprobe_buffer_refcnt--;
+   }
+
+   return ret;
+}
+
+static void uprobe_buffer_disable(void)
+{
+   BUG_ON(!mutex_is_locked(&event_mutex));
+
+   if (--uprobe_buffer_refcnt == 0) {
+   free_percpu(uprobe_cpu_buffer);
+   uprobe_cpu_buffer = NULL;
+   }
+}
+
+static struct uprobe_cpu_buffer *uprobe_buffer_get(void)
+{
+   struct uprobe_cpu_buffer *ucb;
+   int cpu;
+
+   cpu = raw_smp_processor_id();
+   ucb = per_cpu_ptr(uprobe_cpu_buffer, cpu);
+
+   /*
+* Use per-cpu buffers for fastest access, but we might migrate
+* so the mutex makes sure we have sole access to it.
+*/
+   mutex_lock(&ucb->mutex);
+
+   return ucb;
+}
+
+static void uprobe_buffer_put(struct uprobe_cpu_buffer *ucb)
+{
+   mutex_unlock(&ucb->mutex);
+}
+
 static void uprobe_trace_print(struct trace_uprobe *tu,
unsigned long func, struct pt_regs *regs)
 {
struct uprobe_trace_entry_head *entry;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
+   struct uprobe_cpu_buffer *ucb;
void *data;
-   int size, i;
+   int size, dsize, esize;
struct ftrace_event_call *call = >tp.call;
 
-   size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+   dsize = __get_data_size(&tu->tp, regs);
+   esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
+
+   if (WARN_ON_ONCE(!uprobe_cpu_buffer || tu->tp.size + dsize > PAGE_SIZE))
+   return;
+
+   ucb = uprobe_buffer_get();
+   store_trace_args(esize, &tu->tp, regs, ucb->buf, dsize);
+
+   size = esize + tu->tp.size + dsize;
	event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
- size + tu->tp.size, 0, 0);
+ size, 0, 0);
if (!event)
-   return;
+   goto out;
 
entry = ring_buffer_event_data(event);
if (is_ret_probe(tu)) {
@@ -678,13 +774,13 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
data = DATAOF_TRACE_ENTRY(entry, false);
}
 
-   for (i = 0; i < tu->tp.nr_args; i++) {
-   call_fetch(&tu->tp.args[i].fetch, regs,
-  data + tu->tp.args[i].offset);
-   }
+   memcpy(data, ucb->buf, tu->tp.size + dsize);
 
if 

[for-next][PATCH 11/17] tracing/probes: Add fetch{,_size} member into deref fetch method

2014-01-02 Thread Steven Rostedt
From: Hyeoncheol Lee 

The deref fetch methods access a memory region, but they assume it is
kernel memory since uprobes does not support them yet.

Add ->fetch and ->fetch_size members in order to provide proper
access methods for supporting uprobes.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Hyeoncheol Lee 
[namhy...@kernel.org: Split original patch into pieces as requested]
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_probe.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index a31ad47..8d7231d 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -184,6 +184,8 @@ __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
 struct deref_fetch_param {
struct fetch_param  orig;
longoffset;
+   fetch_func_tfetch;
+   fetch_func_tfetch_size;
 };
 
 #define DEFINE_FETCH_deref(type)   \
@@ -195,13 +197,26 @@ __kprobes void FETCH_FUNC_NAME(deref, type)(struct pt_regs *regs, \
	call_fetch(&dprm->orig, regs, &addr);	\
if (addr) { \
addr += dprm->offset;   \
-   fetch_memory_##type(regs, (void *)addr, dest);  \
+   dprm->fetch(regs, (void *)addr, dest);  \
} else  \
*(type *)dest = 0;  \
 }
 DEFINE_BASIC_FETCH_FUNCS(deref)
 DEFINE_FETCH_deref(string)
-DEFINE_FETCH_deref(string_size)
+
+__kprobes void FETCH_FUNC_NAME(deref, string_size)(struct pt_regs *regs,
+  void *data, void *dest)
+{
+   struct deref_fetch_param *dprm = data;
+   unsigned long addr;
+
+   call_fetch(&dprm->orig, regs, &addr);
+   if (addr && dprm->fetch_size) {
+   addr += dprm->offset;
+   dprm->fetch_size(regs, (void *)addr, dest);
+   } else
+   *(string_size *)dest = 0;
+}
 
 static __kprobes void update_deref_fetch_param(struct deref_fetch_param *data)
 {
@@ -477,6 +492,9 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
return -ENOMEM;
 
dprm->offset = offset;
+   dprm->fetch = t->fetch[FETCH_MTD_memory];
+   dprm->fetch_size = get_fetch_size_function(t,
+   dprm->fetch, ftbl);
	ret = parse_probe_arg(arg, t2, &dprm->orig, is_return,
is_kprobe);
if (ret)
-- 
1.8.4.3




[for-next][PATCH 08/17] tracing/probes: Split [ku]probes_fetch_type_table

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Use separate fetch_type_table for kprobes and uprobes.  It currently
shares all fetch methods but some of them will be implemented
differently later.

This also avoids breaking the build when only one of [ku]probes is
configured (e.g. !CONFIG_KPROBE_EVENT with CONFIG_UPROBE_EVENT).  The
'__weak' attribute is added to the table declarations so that a table
can be safely omitted when its subsystem is configured out.

Acked-by: Oleg Nesterov 
Acked-by: Masami Hiramatsu 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 20 ++
 kernel/trace/trace_probe.c  | 65 ++---
 kernel/trace/trace_probe.h  | 53 
 kernel/trace/trace_uprobe.c | 20 ++
 4 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c9ffdaf..fe3f00c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -88,6 +88,26 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
+/* Fetch type information table */
+const struct fetch_type kprobes_fetch_type_table[] = {
+   /* Special types */
+   [FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
+   sizeof(u32), 1, "__data_loc char[]"),
+   [FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
+   string_size, sizeof(u32), 0, "u32"),
+   /* Basic types */
+   ASSIGN_FETCH_TYPE(u8,  u8,  0),
+   ASSIGN_FETCH_TYPE(u16, u16, 0),
+   ASSIGN_FETCH_TYPE(u32, u32, 0),
+   ASSIGN_FETCH_TYPE(u64, u64, 0),
+   ASSIGN_FETCH_TYPE(s8,  u8,  1),
+   ASSIGN_FETCH_TYPE(s16, u16, 1),
+   ASSIGN_FETCH_TYPE(s32, u32, 1),
+   ASSIGN_FETCH_TYPE(s64, u64, 1),
+
+   ASSIGN_FETCH_TYPE_END
+};
+
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index c26bc9e..541036e 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -54,10 +54,6 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(s16, "%d")
 DEFINE_BASIC_PRINT_TYPE_FUNC(s32, "%d")
 DEFINE_BASIC_PRINT_TYPE_FUNC(s64, "%Ld")
 
-/* For defining macros, define string/string_size types */
-typedef u32 string;
-typedef u32 string_size;
-
 /* Print type function for string type */
 __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
  const char *name,
@@ -74,7 +70,6 @@ __kprobes int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s,
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
-#define FETCH_FUNC_NAME(method, type)  fetch_##method##_##type
 /*
  * Define macro for basic types - we don't need to define s* types, because
  * we have to care only about bitwidth at recording time.
@@ -359,25 +354,8 @@ free_bitfield_fetch_param(struct bitfield_fetch_param *data)
kfree(data);
 }
 
-/* Fetch type information table */
-static const struct fetch_type fetch_type_table[] = {
-   /* Special types */
-   [FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
-   sizeof(u32), 1, "__data_loc char[]"),
-   [FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
-   string_size, sizeof(u32), 0, "u32"),
-   /* Basic types */
-   ASSIGN_FETCH_TYPE(u8,  u8,  0),
-   ASSIGN_FETCH_TYPE(u16, u16, 0),
-   ASSIGN_FETCH_TYPE(u32, u32, 0),
-   ASSIGN_FETCH_TYPE(u64, u64, 0),
-   ASSIGN_FETCH_TYPE(s8,  u8,  1),
-   ASSIGN_FETCH_TYPE(s16, u16, 1),
-   ASSIGN_FETCH_TYPE(s32, u32, 1),
-   ASSIGN_FETCH_TYPE(s64, u64, 1),
-};
-
-static const struct fetch_type *find_fetch_type(const char *type)
+static const struct fetch_type *find_fetch_type(const char *type,
+   const struct fetch_type *ftbl)
 {
int i;
 
@@ -398,21 +376,22 @@ static const struct fetch_type *find_fetch_type(const char *type)
 
switch (bs) {
case 8:
-   return find_fetch_type("u8");
+   return find_fetch_type("u8", ftbl);
case 16:
-   return find_fetch_type("u16");
+   return find_fetch_type("u16", ftbl);
case 32:
-   return find_fetch_type("u32");
+   return find_fetch_type("u32", ftbl);
case 64:
-   return find_fetch_type("u64");
+   return find_fetch_type("u64", ftbl);
default:
goto fail;
}
}
 
-   for (i = 0; i < 

[for-next][PATCH 01/17] tracing/uprobes: Fix documentation of uprobe registration syntax

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

The uprobe syntax requires an offset after a file path, not a symbol.

Reviewed-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Acked-by: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 Documentation/trace/uprobetracer.txt | 10 +-
 kernel/trace/trace_uprobe.c  |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index d9c3e68..8f1a8b89 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -19,15 +19,15 @@ user to calculate the offset of the probepoint in the object.
 
 Synopsis of uprobe_tracer
-------------------------
-  p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a uprobe
-  r[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS] : Set a return uprobe (uretprobe)
-  -:[GRP/]EVENT  : Clear uprobe or uretprobe event
+  p[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a uprobe
+  r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS] : Set a return uprobe (uretprobe)
+  -:[GRP/]EVENT   : Clear uprobe or uretprobe event
 
   GRP   : Group name. If omitted, "uprobes" is the default value.
   EVENT : Event name. If omitted, the event name is generated based
-  on SYMBOL+offs.
+  on PATH+OFFSET.
   PATH  : Path to an executable or a library.
-  SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+  OFFSET: Offset where the probe is inserted.
 
   FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index b6dcc42..c77b92d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -211,7 +211,7 @@ end:
 
 /*
  * Argument syntax:
- *  - Add uprobe: p|r[:[GRP/]EVENT] PATH:SYMBOL [FETCHARGS]
+ *  - Add uprobe: p|r[:[GRP/]EVENT] PATH:OFFSET [FETCHARGS]
  *
  *  - Remove uprobe: -:[GRP/]EVENT
  */
-- 
1.8.4.3


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[for-next][PATCH 16/17] uprobes: Allocate ->utask before handler_chain() for tracing handlers

2014-01-02 Thread Steven Rostedt
From: Oleg Nesterov 

uprobe_trace_print() and uprobe_perf_print() need to pass the additional
info to call_fetch() methods, currently there is no simple way to do this.

current->utask looks like a natural place to hold this info, but we need
to allocate it before handler_chain().

This is a bit unfortunate, perhaps we will find a better solution later,
but this is simple and should work right now.

Signed-off-by: Oleg Nesterov 
Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Signed-off-by: Namhyung Kim 
---
 kernel/events/uprobes.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 24b7d6c..3cc8e0b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1828,6 +1828,10 @@ static void handle_swbp(struct pt_regs *regs)
	if (unlikely(!test_bit(UPROBE_COPY_INSN, &uprobe->flags)))
goto out;
 
+   /* Tracing handlers use ->utask to communicate with fetch methods */
+   if (!get_utask())
+   goto out;
+
handler_chain(uprobe, regs);
if (can_skip_sstep(uprobe, regs))
goto out;
-- 
1.8.4.3




[for-next][PATCH 17/17] tracing/uprobes: Add @+file_offset fetch method

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Enable fetching data from a file offset.  Currently it only supports
fetching from the same binary the uprobe is set on.  It translates the
file offset to the proper virtual address in the process.

The syntax is "@+OFFSET", similar to normal memory fetching (@ADDR),
which does no address translation.

Suggested-by: Oleg Nesterov 
Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 Documentation/trace/uprobetracer.txt |  1 +
 kernel/trace/trace_kprobe.c  |  8 
 kernel/trace/trace_probe.c   | 13 +++-
 kernel/trace/trace_probe.h   |  2 ++
 kernel/trace/trace_uprobe.c  | 40 
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index 6e5cff2..f1cf9a3 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -32,6 +32,7 @@ Synopsis of uprobe_tracer
   FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
@ADDR   : Fetch memory at ADDR (ADDR should be in userspace)
+   @+OFFSET: Fetch memory at OFFSET (OFFSET from same file as PATH)
$stackN : Fetch Nth entry of stack (N >= 0)
$stack  : Fetch stack address.
$retval : Fetch return value.(*)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index f94a569..ce0ed8a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -239,6 +239,14 @@ DEFINE_BASIC_FETCH_FUNCS(symbol)
 DEFINE_FETCH_symbol(string)
 DEFINE_FETCH_symbol(string_size)
 
+/* kprobes don't support file_offset fetch methods */
+#define fetch_file_offset_u8   NULL
+#define fetch_file_offset_u16  NULL
+#define fetch_file_offset_u32  NULL
+#define fetch_file_offset_u64  NULL
+#define fetch_file_offset_string   NULL
+#define fetch_file_offset_string_size  NULL
+
 /* Fetch type information table */
 const struct fetch_type kprobes_fetch_type_table[] = {
/* Special types */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index a130d61..8364a42 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -374,7 +374,7 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
}
break;
 
-   case '@':   /* memory or symbol */
+   case '@':   /* memory, file-offset or symbol */
if (isdigit(arg[1])) {
			ret = kstrtoul(arg + 1, 0, &param);
if (ret)
@@ -382,6 +382,17 @@ static int parse_probe_arg(char *arg, const struct fetch_type *t,
 
f->fn = t->fetch[FETCH_MTD_memory];
f->data = (void *)param;
+   } else if (arg[1] == '+') {
+   /* kprobes don't support file offsets */
+   if (is_kprobe)
+   return -EINVAL;
+
			ret = kstrtol(arg + 2, 0, &offset);
+   if (ret)
+   break;
+
+   f->fn = t->fetch[FETCH_MTD_file_offset];
+   f->data = (void *)offset;
} else {
/* uprobes don't support symbols */
if (!is_kprobe)
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 2d5b8f5..e29d743 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -106,6 +106,7 @@ enum {
FETCH_MTD_symbol,
FETCH_MTD_deref,
FETCH_MTD_bitfield,
+   FETCH_MTD_file_offset,
FETCH_MTD_END,
 };
 
@@ -217,6 +218,7 @@ ASSIGN_FETCH_FUNC(memory, ftype),   \
 ASSIGN_FETCH_FUNC(symbol, ftype),  \
 ASSIGN_FETCH_FUNC(deref, ftype),   \
 ASSIGN_FETCH_FUNC(bitfield, ftype),\
+ASSIGN_FETCH_FUNC(file_offset, ftype), \
  } \
}
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 794e8bc..1fdea6d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -70,6 +70,11 @@ static int unregister_uprobe_event(struct trace_uprobe *tu);
 static DEFINE_MUTEX(uprobe_lock);
 static LIST_HEAD(uprobe_list);
 
+struct uprobe_dispatch_data {
+   struct trace_uprobe *tu;
+   unsigned long   bp_addr;
+};
+
 static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
 static int uretprobe_dispatcher(struct uprobe_consumer *con,
unsigned long func, struct pt_regs *regs);
@@ -175,6 +180,29 @@ static __kprobes void FETCH_FUNC_NAME(memory, string_size)(struct pt_regs *regs,
 #define fetch_symbol_string
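The essence of the @+OFFSET translation is simple arithmetic: subtracting the probe's own file offset from the breakpoint address recovers the binary's load base, and adding the requested file offset to that base gives the virtual address to read. A hedged userspace restatement of that arithmetic (field and function names here are illustrative, not the kernel's):

```c
#include <assert.h>

/* Mirrors the idea of struct uprobe_dispatch_data above: the probed
 * uprobe's own file offset plus the address where the breakpoint hit. */
struct dispatch_data {
    unsigned long probe_file_offset; /* the probe's offset in the file */
    unsigned long bp_addr;           /* virtual address of the breakpoint */
};

/* Translate a file offset into a virtual address in the traced process,
 * assuming the target lies in the same binary the uprobe is set on. */
static unsigned long translate_file_offset(const struct dispatch_data *udd,
                                           unsigned long file_offset)
{
    unsigned long base_addr = udd->bp_addr - udd->probe_file_offset;

    return base_addr + file_offset;
}
```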

[for-next][PATCH 15/17] tracing/uprobes: Add support for full argument access methods

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

Enable fetching other types of arguments for uprobes.  IOW, we can now
access stack, memory, deref, bitfield and retval from uprobes.

The format for the argument types is the same as for kprobes (but the
@SYMBOL type is not supported for uprobes), i.e:

  @ADDR   : Fetch memory at ADDR
  $stackN : Fetch Nth entry of stack (N >= 0)
  $stack  : Fetch stack address
  $retval : Fetch return value
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address

Note that retval can only be used with uretprobes.

Original-patch-by: Hyeoncheol Lee 
Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: Oleg Nesterov 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Hyeoncheol Lee 
Signed-off-by: Namhyung Kim 
---
 Documentation/trace/uprobetracer.txt | 25 +
 kernel/trace/trace_probe.c   | 34 ++
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/Documentation/trace/uprobetracer.txt b/Documentation/trace/uprobetracer.txt
index 8f1a8b89..6e5cff2 100644
--- a/Documentation/trace/uprobetracer.txt
+++ b/Documentation/trace/uprobetracer.txt
@@ -31,6 +31,31 @@ Synopsis of uprobe_tracer
 
   FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
+   @ADDR   : Fetch memory at ADDR (ADDR should be in userspace)
+   $stackN : Fetch Nth entry of stack (N >= 0)
+   $stack  : Fetch stack address.
+   $retval : Fetch return value.(*)
+   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
+   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
+  (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
+  are supported.
+
+  (*) only for return probe.
+  (**) this is useful for fetching a field of data structures.
+
+Types
+-
+Several types are supported for fetch-args. Uprobe tracer will access memory
+by given type. Prefix 's' and 'u' means those types are signed and unsigned
+respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
+String type is a special type, which fetches a "null-terminated" string from
+user space.
+Bitfield is another special type, which takes 3 parameters, bit-width, bit-
+offset, and container-size (usually 32). The syntax is;
+
+ b<bit-width>@<bit-offset>/<container-size>
+
 
 Event Profiling
 ---
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8f7a2b6d..a130d61 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -255,12 +255,18 @@ fail:
 }
 
 /* Special function : only accept unsigned long */
-static __kprobes void fetch_stack_address(struct pt_regs *regs,
-   void *dummy, void *dest)
+static __kprobes void fetch_kernel_stack_address(struct pt_regs *regs,
+void *dummy, void *dest)
 {
*(unsigned long *)dest = kernel_stack_pointer(regs);
 }
 
+static __kprobes void fetch_user_stack_address(struct pt_regs *regs,
+  void *dummy, void *dest)
+{
+   *(unsigned long *)dest = user_stack_pointer(regs);
+}
+
 static fetch_func_t get_fetch_size_function(const struct fetch_type *type,
fetch_func_t orig_fn,
const struct fetch_type *ftbl)
@@ -305,7 +311,8 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
 static int parse_probe_vars(char *arg, const struct fetch_type *t,
-   struct fetch_param *f, bool is_return)
+   struct fetch_param *f, bool is_return,
+   bool is_kprobe)
 {
int ret = 0;
unsigned long param;
@@ -317,13 +324,16 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
ret = -EINVAL;
} else if (strncmp(arg, "stack", 5) == 0) {
if (arg[5] == '\0') {
-   if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR) == 0)
-   f->fn = fetch_stack_address;
+   if (strcmp(t->name, DEFAULT_FETCH_TYPE_STR))
+   return -EINVAL;
+
+   if (is_kprobe)
+   f->fn = fetch_kernel_stack_address;
else
-   ret = -EINVAL;
+   f->fn = fetch_user_stack_address;
} else if (isdigit(arg[5])) {
			ret = kstrtoul(arg + 5, 10, &param);
-   if (ret || param > PARAM_MAX_STACK)
+   if (ret || (is_kprobe && param > PARAM_MAX_STACK))
ret = -EINVAL;
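The bitfield type documented earlier in this patch (b<bit-width>@<bit-offset>/<container-size>) is extracted with two shifts: shift left to discard the bits above the field, then shift right to discard the bits below it and bring the field down to bit 0. A simplified userspace model of that extraction (bit-offset counted from the least-significant bit; a 64-bit container is assumed for simplicity):

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting `offset` bits above the LSB, using the
 * shift-up-then-shift-down trick the tracer's bitfield fetch relies on. */
static uint64_t extract_bitfield(uint64_t container,
                                 unsigned int width, unsigned int offset)
{
    unsigned int hi_shift = 64 - (offset + width); /* drop high bits */
    unsigned int low_shift = hi_shift + offset;    /* drop low bits  */

    return (container << hi_shift) >> low_shift;
}
```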

[for-next][PATCH 05/17] tracing/kprobes: Move common functions to trace_probe.h

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

The __get_data_size() and store_trace_args() will be used by uprobes
too.  Move them to a common location.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 48 -
 kernel/trace/trace_probe.h  | 48 +
 2 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7271906..fb1a027 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -740,54 +740,6 @@ static const struct file_operations kprobe_profile_ops = {
.release= seq_release,
 };
 
-/* Sum up total data length for dynamic arraies (strings) */
-static __kprobes int __get_data_size(struct trace_probe *tp,
-struct pt_regs *regs)
-{
-   int i, ret = 0;
-   u32 len;
-
-   for (i = 0; i < tp->nr_args; i++)
-   if (unlikely(tp->args[i].fetch_size.fn)) {
-   call_fetch(&tp->args[i].fetch_size, regs, &len);
-   ret += len;
-   }
-
-   return ret;
-}
-
-/* Store the value of each argument */
-static __kprobes void store_trace_args(int ent_size, struct trace_probe *tp,
-  struct pt_regs *regs,
-  u8 *data, int maxlen)
-{
-   int i;
-   u32 end = tp->size;
-   u32 *dl;/* Data (relative) location */
-
-   for (i = 0; i < tp->nr_args; i++) {
-   if (unlikely(tp->args[i].fetch_size.fn)) {
-   /*
-* First, we set the relative location and
-* maximum data length to *dl
-*/
-   dl = (u32 *)(data + tp->args[i].offset);
-   *dl = make_data_rloc(maxlen, end - tp->args[i].offset);
-   /* Then try to fetch string or dynamic array data */
-   call_fetch(&tp->args[i].fetch, regs, dl);
-   /* Reduce maximum length */
-   end += get_rloc_len(*dl);
-   maxlen -= get_rloc_len(*dl);
-   /* Trick here, convert data_rloc to data_loc */
-   *dl = convert_rloc_to_loc(*dl,
-ent_size + tp->args[i].offset);
-   } else
-   /* Just fetching data normally */
-   call_fetch(&tp->args[i].fetch, regs,
-  data + tp->args[i].offset);
-   }
-}
-
 /* Kprobe handler */
 static __kprobes void
 __kprobe_trace_func(struct trace_kprobe *tk, struct pt_regs *regs,
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 984e91e..d384fbd 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -178,3 +178,51 @@ extern ssize_t traceprobe_probes_write(struct file *file,
int (*createfn)(int, char**));
 
 extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
+
+/* Sum up total data length for dynamic arraies (strings) */
+static inline __kprobes int
+__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
+{
+   int i, ret = 0;
+   u32 len;
+
+   for (i = 0; i < tp->nr_args; i++)
+   if (unlikely(tp->args[i].fetch_size.fn)) {
+   call_fetch(&tp->args[i].fetch_size, regs, &len);
+   ret += len;
+   }
+
+   return ret;
+}
+
+/* Store the value of each argument */
+static inline __kprobes void
+store_trace_args(int ent_size, struct trace_probe *tp, struct pt_regs *regs,
+u8 *data, int maxlen)
+{
+   int i;
+   u32 end = tp->size;
+   u32 *dl;/* Data (relative) location */
+
+   for (i = 0; i < tp->nr_args; i++) {
+   if (unlikely(tp->args[i].fetch_size.fn)) {
+   /*
+* First, we set the relative location and
+* maximum data length to *dl
+*/
+   dl = (u32 *)(data + tp->args[i].offset);
+   *dl = make_data_rloc(maxlen, end - tp->args[i].offset);
+   /* Then try to fetch string or dynamic array data */
+   call_fetch(&tp->args[i].fetch, regs, dl);
+   /* Reduce maximum length */
+   end += get_rloc_len(*dl);
+   maxlen -= get_rloc_len(*dl);
+   /* Trick here, convert data_rloc to data_loc */
+   *dl = convert_rloc_to_loc(*dl,
+ent_size + tp->args[i].offset);
+   } else
+   /* Just fetching data normally */
+   
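store_trace_args() above depends on the "data relative location" encoding used throughout the probe code: a u32 whose high 16 bits hold the length of a dynamic item (e.g. a string) and whose low 16 bits hold its offset. A quick userspace restatement of that packing, in the spirit of the make_data_rloc()/get_rloc_len() helpers in trace_probe.h:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a dynamic-array descriptor: length in the high 16 bits,
 * (relative) offset in the low 16 bits. */
static inline uint32_t make_data_rloc(uint32_t len, uint32_t offs)
{
    return (len << 16) | (offs & 0xffff);
}

/* Unpack the two halves again. */
static inline uint32_t get_rloc_len(uint32_t dl)  { return dl >> 16; }
static inline uint32_t get_rloc_offs(uint32_t dl) { return dl & 0xffff; }
```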

[for-next][PATCH 06/17] tracing/probes: Integrate duplicate set_print_fmt()

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

The set_print_fmt() functions are implemented almost identically for
[ku]probes.  Move the code to a common place and get rid of the
duplication.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 63 +
 kernel/trace/trace_probe.c  | 62 
 kernel/trace/trace_probe.h  |  2 ++
 kernel/trace/trace_uprobe.c | 55 +--
 4 files changed, 66 insertions(+), 116 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fb1a027..c9ffdaf 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -964,67 +964,6 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
return 0;
 }
 
-static int __set_print_fmt(struct trace_kprobe *tk, char *buf, int len)
-{
-   int i;
-   int pos = 0;
-
-   const char *fmt, *arg;
-
-   if (!trace_kprobe_is_return(tk)) {
-   fmt = "(%lx)";
-   arg = "REC->" FIELD_STRING_IP;
-   } else {
-   fmt = "(%lx <- %lx)";
-   arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
-   }
-
-   /* When len=0, we just calculate the needed length */
-#define LEN_OR_ZERO (len ? len - pos : 0)
-
-   pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
-
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
-   tk->tp.args[i].name, tk->tp.args[i].type->fmt);
-   }
-
-   pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
-
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   if (strcmp(tk->tp.args[i].type->name, "string") == 0)
-   pos += snprintf(buf + pos, LEN_OR_ZERO,
-   ", __get_str(%s)",
-   tk->tp.args[i].name);
-   else
-   pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
-   tk->tp.args[i].name);
-   }
-
-#undef LEN_OR_ZERO
-
-   /* return the length of print_fmt */
-   return pos;
-}
-
-static int set_print_fmt(struct trace_kprobe *tk)
-{
-   int len;
-   char *print_fmt;
-
-   /* First: called with 0 length to calculate the needed length */
-   len = __set_print_fmt(tk, NULL, 0);
-   print_fmt = kmalloc(len + 1, GFP_KERNEL);
-   if (!print_fmt)
-   return -ENOMEM;
-
-   /* Second: actually write the @print_fmt */
-   __set_print_fmt(tk, print_fmt, len + 1);
-   tk->tp.call.print_fmt = print_fmt;
-
-   return 0;
-}
-
 #ifdef CONFIG_PERF_EVENTS
 
 /* Kprobe profile handler */
@@ -1175,7 +1114,7 @@ static int register_kprobe_event(struct trace_kprobe *tk)
		call->event.funcs = &kprobe_funcs;
call->class->define_fields = kprobe_event_define_fields;
}
-   if (set_print_fmt(tk) < 0)
+   if (set_print_fmt(&tk->tp, trace_kprobe_is_return(tk)) < 0)
return -ENOMEM;
	ret = register_ftrace_event(&call->event);
if (!ret) {
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 430505b..d8347b0 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -837,3 +837,65 @@ out:
 
return ret;
 }
+
+static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
+  bool is_return)
+{
+   int i;
+   int pos = 0;
+
+   const char *fmt, *arg;
+
+   if (!is_return) {
+   fmt = "(%lx)";
+   arg = "REC->" FIELD_STRING_IP;
+   } else {
+   fmt = "(%lx <- %lx)";
+   arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
+   }
+
+   /* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+   pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", fmt);
+
+   for (i = 0; i < tp->nr_args; i++) {
+   pos += snprintf(buf + pos, LEN_OR_ZERO, " %s=%s",
+   tp->args[i].name, tp->args[i].type->fmt);
+   }
+
+   pos += snprintf(buf + pos, LEN_OR_ZERO, "\", %s", arg);
+
+   for (i = 0; i < tp->nr_args; i++) {
+   if (strcmp(tp->args[i].type->name, "string") == 0)
+   pos += snprintf(buf + pos, LEN_OR_ZERO,
+   ", __get_str(%s)",
+   tp->args[i].name);
+   else
+   pos += snprintf(buf + pos, LEN_OR_ZERO, ", REC->%s",
+   tp->args[i].name);
+   }
+
+#undef LEN_OR_ZERO
+
+   /* return the length of print_fmt */
+   return pos;
+}
+
+int set_print_fmt(struct trace_probe *tp, bool 
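The consolidated set_print_fmt() keeps the classic two-pass snprintf sizing idiom: run the formatter once with a zero-length buffer to measure the needed length, allocate, then run it again to actually write. A standalone userspace sketch of the idiom (the argument names are made up; the real function walks the probe's parsed args):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* When len == 0 this only measures: snprintf() returns the length it
 * would have written, so pos accumulates the required size. */
static int build_fmt(char *buf, int len, const char *name, int nr_args)
{
    int pos = 0;

#define LEN_OR_ZERO (len ? len - pos : 0)
    pos += snprintf(buf + pos, LEN_OR_ZERO, "\"%s", name);
    for (int i = 0; i < nr_args; i++)
        pos += snprintf(buf + pos, LEN_OR_ZERO, " arg%d=%%lx", i);
    pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
#undef LEN_OR_ZERO

    return pos;
}

/* First pass sizes, second pass writes, with the allocation in between,
 * which is exactly the shape of set_print_fmt(). */
static char *alloc_fmt(const char *name, int nr_args)
{
    int len = build_fmt(NULL, 0, name, nr_args);
    char *fmt = malloc(len + 1);

    if (fmt)
        build_fmt(fmt, len + 1, name, nr_args);
    return fmt;
}
```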

[for-next][PATCH 03/17] tracing/kprobes: Factor out struct trace_probe

2014-01-02 Thread Steven Rostedt
From: Namhyung Kim 

There are functions that can be shared between kprobes and uprobes.
Separate the common data structure out into struct trace_probe and use
it from the shared functions.

Acked-by: Masami Hiramatsu 
Acked-by: Oleg Nesterov 
Cc: Srikar Dronamraju 
Cc: zhangwei(Jovi) 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace_kprobe.c | 560 ++--
 kernel/trace/trace_probe.h  |  20 ++
 2 files changed, 295 insertions(+), 285 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index dae9541..7271906 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -27,18 +27,12 @@
 /**
  * Kprobe event core functions
  */
-struct trace_probe {
+struct trace_kprobe {
struct list_headlist;
struct kretproberp; /* Use rp.kp for kprobe use */
unsigned long   nhit;
-   unsigned intflags;  /* For TP_FLAG_* */
const char  *symbol;/* symbol name */
-   struct ftrace_event_class   class;
-   struct ftrace_event_callcall;
-   struct list_headfiles;
-   ssize_t size;   /* trace entry size */
-   unsigned intnr_args;
-   struct probe_argargs[];
+   struct trace_probe  tp;
 };
 
 struct event_file_link {
@@ -46,56 +40,46 @@ struct event_file_link {
struct list_headlist;
 };
 
-#define SIZEOF_TRACE_PROBE(n)  \
-   (offsetof(struct trace_probe, args) +   \
+#define SIZEOF_TRACE_KPROBE(n) \
+   (offsetof(struct trace_kprobe, tp.args) +   \
(sizeof(struct probe_arg) * (n)))
 
 
-static __kprobes bool trace_probe_is_return(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_is_return(struct trace_kprobe *tk)
 {
-   return tp->rp.handler != NULL;
+   return tk->rp.handler != NULL;
 }
 
-static __kprobes const char *trace_probe_symbol(struct trace_probe *tp)
+static __kprobes const char *trace_kprobe_symbol(struct trace_kprobe *tk)
 {
-   return tp->symbol ? tp->symbol : "unknown";
+   return tk->symbol ? tk->symbol : "unknown";
 }
 
-static __kprobes unsigned long trace_probe_offset(struct trace_probe *tp)
+static __kprobes unsigned long trace_kprobe_offset(struct trace_kprobe *tk)
 {
-   return tp->rp.kp.offset;
+   return tk->rp.kp.offset;
 }
 
-static __kprobes bool trace_probe_is_enabled(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_has_gone(struct trace_kprobe *tk)
 {
-   return !!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE));
+   return !!(kprobe_gone(&tk->rp.kp));
 }
 
-static __kprobes bool trace_probe_is_registered(struct trace_probe *tp)
-{
-   return !!(tp->flags & TP_FLAG_REGISTERED);
-}
-
-static __kprobes bool trace_probe_has_gone(struct trace_probe *tp)
-{
-   return !!(kprobe_gone(&tp->rp.kp));
-}
-
-static __kprobes bool trace_probe_within_module(struct trace_probe *tp,
-   struct module *mod)
+static __kprobes bool trace_kprobe_within_module(struct trace_kprobe *tk,
+struct module *mod)
 {
int len = strlen(mod->name);
-   const char *name = trace_probe_symbol(tp);
+   const char *name = trace_kprobe_symbol(tk);
return strncmp(mod->name, name, len) == 0 && name[len] == ':';
 }
 
-static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
+static __kprobes bool trace_kprobe_is_on_module(struct trace_kprobe *tk)
 {
-   return !!strchr(trace_probe_symbol(tp), ':');
+   return !!strchr(trace_kprobe_symbol(tk), ':');
 }
 
-static int register_probe_event(struct trace_probe *tp);
-static int unregister_probe_event(struct trace_probe *tp);
+static int register_kprobe_event(struct trace_kprobe *tk);
+static int unregister_kprobe_event(struct trace_kprobe *tk);
 
 static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
@@ -107,42 +91,42 @@ static int kretprobe_dispatcher(struct kretprobe_instance *ri,
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
-static struct trace_probe *alloc_trace_probe(const char *group,
+static struct trace_kprobe *alloc_trace_kprobe(const char *group,
 const char *event,
 void *addr,
 const char *symbol,
 unsigned long offs,
 int nargs, bool is_return)
 {
-   struct trace_probe *tp;
+   struct trace_kprobe *tk;
int ret = -ENOMEM;
 
-   tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
-   if (!tp)
+   tk = kzalloc(SIZEOF_TRACE_KPROBE(nargs), GFP_KERNEL);
+   if (!tk)
return ERR_PTR(ret);
 
if (symbol) {
-  

Re: [PATCH 15/17] tracing/uprobes: Add support for full argument access methods

2014-01-02 Thread Steven Rostedt
On Fri, 3 Jan 2014 10:17:23 +0900
"Hyeoncheol Lee"  wrote:

> Patches look good to me.
> 
> Signed-off-by: Hyeoncheol Lee 
> 

Thanks! I'll add this to my queue.

-- Steve


Re: [PATCH 6/9] printk: Release lockbuf_lock before calling console_trylock_for_printk()

2014-01-02 Thread Steven Rostedt
On Mon, 23 Dec 2013 21:39:27 +0100
Jan Kara  wrote:

> There's no reason to hold logbuf_lock when entering
> console_trylock_for_printk(). The first thing this function does is
> calling down_trylock(console_sem) and if that fails it immediately
> unlocks logbuf_lock. So logbuf_lock isn't needed for that branch.
> When down_trylock() succeeds, the rest of console_trylock() is OK
> without logbuf_lock (it is called without it from other places), and
> the only remaining thing in console_trylock_for_printk() is
> can_use_console() call. For that call console_sem is enough (it
> iterates all consoles and checks CON_ANYTIME flag).
> 
> So we drop logbuf_lock before entering console_trylock_for_printk()
> which simplifies the code.

I'm very nervous about this change. The interlocking between console
lock and logbuf_lock seems to be very subtle. Especially the comment
where logbuf_lock is defined:

/*
 * The logbuf_lock protects kmsg buffer, indices, counters. It is also
 * used in interesting ways to provide interlocking in console_unlock();
 */

Unfortunately, it does not specify what those "interesting ways" are.


Now, what I think this does is make sure that whoever wrote to the
logbuf first does the flushing. With your change we now have:

CPU 0   CPU 1
-   -
   printk("blah");
   lock(logbuf_lock);

printk("bazinga!");
lock(logbuf_lock);


   unlock(logbuf_lock);
   < NMI comes in delays CPU>


unlock(logbuf_lock)
console_trylock_for_printk()
console_unlock();
< dumps output >

  
Now is this a bad thing? I don't know. But the current locking will
make sure that the first writer into logbuf_lock gets to do the
dumping, and all the others will just add onto that writer.

Your change now lets the second or third or whatever writer into printk
be the one that dumps the log.

Again, this may not be a big deal, but as printk is such a core part of
the Linux kernel, and this is a very subtle change, I'd rather be very
cautious here and try to think through what can go wrong when this
happens.
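The property under discussion (whoever ends up holding the console lock flushes everyone's pending messages) can be modeled in miniature: writers append to a shared buffer, and only the caller that wins the trylock drains the whole thing. A toy single-threaded model, not the printk code, with no bounds checking:

```c
#include <stdbool.h>
#include <string.h>

static char logbuf[256];
static size_t log_len;
static bool console_locked;

/* Append under the (implied) logbuf lock; toy model, no bounds check. */
static void log_store(const char *msg)
{
    size_t n = strlen(msg);

    memcpy(logbuf + log_len, msg, n);
    log_len += n;
}

/* Whichever caller wins the console trylock drains everything anyone
 * stored, mirroring console_trylock_for_printk()/console_unlock():
 * losers just return, trusting the holder to flush their message too. */
static size_t try_flush(char *out, size_t outsz)
{
    if (console_locked)
        return 0;              /* someone else will flush our message */
    console_locked = true;
    size_t n = log_len < outsz ? log_len : outsz;
    memcpy(out, logbuf, n);
    log_len = 0;
    console_locked = false;
    return n;
}
```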

-- Steve


Re: [PATCH] printk: flush conflicting continuation line

2014-01-02 Thread Kay Sievers
On Fri, Jan 3, 2014 at 1:57 AM, Joe Perches  wrote:
> (Adding Kay to cc's)
>
> Kay?  any opinion on correctness?

Sounds fine by looking at it. Did not test anything though.

>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -1604,7 +1604,10 @@ asmlinkage int vprintk_emit(int facility, int level,
>> >   if (!(lflags & LOG_PREFIX))
>> >   stored = cont_add(facility, level, text, text_len);
>> >   cont_flush(LOG_NEWLINE);
>> > - }
>> > + /* Flush conflicting buffer. An earlier newline was missing
>> > + * and current print is from different task */
>> > + } else if (cont.len && cont.owner != current)
>> > + cont_flush(LOG_NEWLINE);

Unless I'm missing something, this whole section can all go inside a:
  if (cont.len) {
...
cont_flush(LOG_NEWLINE);
  }

and would look a bit less confusing than the two conditions with just
the negated "current" check and the duplicated flush call?
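That merged shape would look roughly like this (a hypothetical sketch with stubbed helpers, not a tested patch against printk.c):

```c
#include <stdbool.h>

/* Stand-ins for printk's continuation-buffer state; illustrative
 * stubs only, not the kernel's definitions. */
struct cont_state { int len; const void *owner; };
static struct cont_state cont;
static int flushes, merges;

static void cont_flush(void) { cont.len = 0; flushes++; }
static void cont_add(void)   { merges++; }

/* One cont.len test guarding one flush site; the buffer is merged
 * into only when the owner matches and no prefix forces a break. */
static void finish_line(const void *curr, bool log_prefix)
{
    if (cont.len) {
        if (cont.owner == curr && !log_prefix)
            cont_add();
        cont_flush();
    }
}
```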

Kay


RE: [PATCH] usb/core: fix NULL pointer dereference in recursively_mark_NOTATTACHED

2014-01-02 Thread Du, ChangbinX
> On Thu, 26 Dec 2013, Du, ChangbinX wrote:
> 
> > I can reproduce the issue by adding a delay just after
> > usb_set_intfdata(intf, NULL) (echo -1 > bConfigurationValue to trigger
> > hub_disconnect()) without your patch.
> >
> > After applying the patch, I cannot reproduce it and didn't find any
> > other issues. The patch works well.
> >
> > Alan, need I update patch to v2 or you will do it?
> 
> Changbin, after looking more closely I realized there was a second aspect to
> this race: recursively_mark_NOTATTACHED uses hub->ports[i] while
> hub_disconnect removes the port devices.  You ought to be able to cause
> an oops by inserting a delay just after the loop where
> usb_hub_remove_port_device is called.
> 
> The updated patch below should fix both problems.  Can you test it?
> 
> Alan Stern
> 

Ok, I'll test it today or tomorrow. Please wait my response.


Re: [PATCH v12 06/18] xen/pvh: MMU changes for PVH (v2)

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 11:24:50 +
David Vrabel  wrote:

> On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > From: Mukesh Rathor 
> > 
> > .. which are surprisingly small compared to the amount for PV code.
> > 
> > PVH uses mostly native mmu ops, we leave the generic (native_*) for
> > the majority and just overwrite the baremetal with the ones we need.
> > 
> > We also optimize one - the TLB flush. The native operation would
> > needlessly IPI offline VCPUs causing extra wakeups. Using the
> > Xen one avoids that and lets the hypervisor determine which
> > VCPU needs the TLB flush.
> 
> This TLB flush optimization should be a separate patch.

It's not really an "optimization"; we are using the PV mechanism instead
of the native one because the PV one performs better. So I think it's
fine for it to stay in this patch.

Mukesh



Re: [PATCH v12 03/18] xen/pvh: Early bootup changes in PV code (v2).

2014-01-02 Thread Mukesh Rathor
On Thu, 2 Jan 2014 13:32:21 -0500
Konrad Rzeszutek Wilk  wrote:

> On Thu, Jan 02, 2014 at 03:32:33PM +, David Vrabel wrote:
> > On 01/01/14 04:35, Konrad Rzeszutek Wilk wrote:
> > > From: Mukesh Rathor 
> > > 
> > > In the bootup code for PVH we can trap cpuid via vmexit, so don't
> > > need to use emulated prefix call. We also check for vector
> > > callback early on, as it is a required feature. PVH also runs at
> > > default kernel IOPL.
> > > 
> > > Finally, pure PV settings are moved to a separate function that
> > > are only called for pure PV, ie, pv with pvmmu. They are also
> > > #ifdef with CONFIG_XEN_PVMMU.
> > [...]
> > > @@ -331,12 +333,15 @@ static void xen_cpuid(unsigned int *ax,
> > > unsigned int *bx, break;
> > >   }
> > >  
> > > - asm(XEN_EMULATE_PREFIX "cpuid"
> > > - : "=a" (*ax),
> > > -   "=b" (*bx),
> > > -   "=c" (*cx),
> > > -   "=d" (*dx)
> > > - : "0" (*ax), "2" (*cx));
> > > + if (xen_pvh_domain())
> > > + native_cpuid(ax, bx, cx, dx);
> > > + else
> > > + asm(XEN_EMULATE_PREFIX "cpuid"
> > > + : "=a" (*ax),
> > > + "=b" (*bx),
> > > + "=c" (*cx),
> > > + "=d" (*dx)
> > > + : "0" (*ax), "2" (*cx));
> > 
> > For this one-off cpuid call it seems preferable to me to use the
> > emulate prefix rather than diverge from PV.
> 
> This was before the PV cpuid was deemed OK to be used on PVH.
> Will rip this out to use the same version.

What's wrong with using native cpuid? One of the benefits of PVH is that
cpuid can be trapped via vmexit, and there is also talk of making the PV
cpuid trap obsolete in the future. I suggest leaving it native.

Mukesh



Re: [PATCH v4] Bluetooth: Add hci_h4p driver

2014-01-02 Thread Sebastian Reichel
Hi Pavel,

Here are some cleanup suggestions for probe, removal & module
initialization functions.

On Fri, Jan 03, 2014 at 01:17:54AM +0100, Pavel Machek wrote:
> +static int hci_h4p_probe(struct platform_device *pdev)
> +{
> + struct hci_h4p_platform_data *bt_plat_data;
> + struct hci_h4p_info *info;
> + int err;
> +
> + dev_info(&pdev->dev, "Registering HCI H4P device\n");
> + info = kzalloc(sizeof(struct hci_h4p_info), GFP_KERNEL);

info = devm_kzalloc(&pdev->dev, sizeof(struct hci_h4p_info), GFP_KERNEL);

> + if (!info)
> + return -ENOMEM;
> +
> + info->dev = &pdev->dev;
> + info->tx_enabled = 1;
> + info->rx_enabled = 1;
> + spin_lock_init(&info->lock);
> + spin_lock_init(&info->clocks_lock);
> + skb_queue_head_init(&info->txq);
> +
> + if (pdev->dev.platform_data == NULL) {
> + dev_err(&pdev->dev, "Could not get Bluetooth config data\n");
> + kfree(info);
> + return -ENODATA;
> + }
> +
> + bt_plat_data = pdev->dev.platform_data;
> + info->chip_type = bt_plat_data->chip_type;
> + info->bt_wakeup_gpio = bt_plat_data->bt_wakeup_gpio;
> + info->host_wakeup_gpio = bt_plat_data->host_wakeup_gpio;
> + info->reset_gpio = bt_plat_data->reset_gpio;
> + info->reset_gpio_shared = bt_plat_data->reset_gpio_shared;
> + info->bt_sysclk = bt_plat_data->bt_sysclk;
> +
> + BT_DBG("RESET gpio: %d\n", info->reset_gpio);
> + BT_DBG("BTWU gpio: %d\n", info->bt_wakeup_gpio);
> + BT_DBG("HOSTWU gpio: %d\n", info->host_wakeup_gpio);
> + BT_DBG("sysclk: %d\n", info->bt_sysclk);
> +
> + init_completion(&info->test_completion);
> + complete_all(&info->test_completion);
> +
> + if (!info->reset_gpio_shared) {
> + err = gpio_request(info->reset_gpio, "bt_reset");

err = devm_gpio_request_one(&pdev->dev, info->reset_gpio, GPIOF_OUT_INIT_LOW, 
"bt_reset");

> + if (err < 0) {
> + dev_err(&pdev->dev, "Cannot get GPIO line %d\n",
> + info->reset_gpio);
> + goto cleanup_setup;
> + }
> + }
> +
> + err = gpio_request(info->bt_wakeup_gpio, "bt_wakeup");

err = devm_gpio_request_one(&pdev->dev, info->bt_wakeup_gpio, 
GPIOF_OUT_INIT_LOW, "bt_wakeup");

> + if (err < 0) {
> + dev_err(info->dev, "Cannot get GPIO line 0x%d",
> + info->bt_wakeup_gpio);
> + if (!info->reset_gpio_shared)
> + gpio_free(info->reset_gpio);
> + goto cleanup_setup;
> + }
> +
> + err = gpio_request(info->host_wakeup_gpio, "host_wakeup");

err = devm_gpio_request_one(&pdev->dev, info->host_wakeup_gpio, GPIOF_DIR_IN, 
"host_wakeup");

> + if (err < 0) {
> + dev_err(info->dev, "Cannot get GPIO line %d",
> +info->host_wakeup_gpio);
> + if (!info->reset_gpio_shared)
> + gpio_free(info->reset_gpio);
> + gpio_free(info->bt_wakeup_gpio);
> + goto cleanup_setup;
> + }
> +
> + gpio_direction_output(info->reset_gpio, 0);
> + gpio_direction_output(info->bt_wakeup_gpio, 0);
> + gpio_direction_input(info->host_wakeup_gpio);

You can remove these when you use the _request_one gpio_request
methods.

> + info->irq = bt_plat_data->uart_irq;
> + info->uart_base = ioremap(bt_plat_data->uart_base, SZ_2K);

info->uart_base = devm_ioremap(&pdev->dev, bt_plat_data->uart_base, SZ_2K);

> + info->uart_iclk = clk_get(NULL, bt_plat_data->uart_iclk);
> + info->uart_fclk = clk_get(NULL, bt_plat_data->uart_fclk);

devm_clk_get(...)

> + err = request_irq(info->irq, hci_h4p_interrupt, IRQF_DISABLED, 
> "hci_h4p",
> +   info);

devm_request_irq(...)

> + if (err < 0) {
> + dev_err(info->dev, "hci_h4p: unable to get IRQ %d\n", 
> info->irq);
> + goto cleanup;
> + }
> +
> + err = request_irq(gpio_to_irq(info->host_wakeup_gpio),
> +   hci_h4p_wakeup_interrupt,  IRQF_TRIGGER_FALLING |
> +   IRQF_TRIGGER_RISING | IRQF_DISABLED,
> +   "hci_h4p_wkup", info);

devm_request_irq(...)

> + if (err < 0) {
> + dev_err(info->dev, "hci_h4p: unable to get wakeup IRQ %d\n",
> +   gpio_to_irq(info->host_wakeup_gpio));
> + free_irq(info->irq, info);
> + goto cleanup;
> + }
> +
> + err = irq_set_irq_wake(gpio_to_irq(info->host_wakeup_gpio), 1);
> + if (err < 0) {
> + dev_err(info->dev, "hci_h4p: unable to set wakeup for IRQ %d\n",
> + gpio_to_irq(info->host_wakeup_gpio));
> + free_irq(info->irq, info);
> + free_irq(gpio_to_irq(info->host_wakeup_gpio), info);
> + goto cleanup;
> + }
> +
> + init_timer_deferrable(&info->lazy_release);
> + info->lazy_release.function = hci_h4p_lazy_clock_release;
> + info->lazy_release.data = (unsigned long)info;
> + 

Re: [PATCH v2 06/10] Input: pm8xxx-vibrator - Add DT match table

2014-01-02 Thread Stephen Boyd
On 01/02/14 17:17, Dmitry Torokhov wrote:
> Hi Stephen,
>
> On Thu, Jan 02, 2014 at 04:37:36PM -0800, Stephen Boyd wrote:
>> The driver is only supported on DT enabled platforms. Convert the
>> driver to DT so that it can probe properly.
> I do not see MFD_PM8XXX depending on OF; should it be added if it is only
> supported on DT?

No, that would unnecessarily limit the compile coverage of this driver.
This one is so simple that it doesn't even use any OF APIs, because it is
all hidden behind the platform bus.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation



RE: [PATCH 15/17] tracing/uprobes: Add support for full argument access methods

2014-01-02 Thread Hyeoncheol Lee
Patches look good to me.

Signed-off-by: Hyeoncheol Lee 

-Original Message-
From: Steven Rostedt [mailto:rost...@goodmis.org] 
Sent: Friday, January 03, 2014 6:02 AM
To: Steven Rostedt
Cc: Namhyung Kim; Oleg Nesterov; Masami Hiramatsu; Srikar Dronamraju;
Hyeoncheol Lee; zhangwei(Jovi); Arnaldo Carvalho de Melo; Hemant Kumar;
LKML; Namhyung Kim
Subject: Re: [PATCH 15/17] tracing/uprobes: Add support for full argument
access methods

On Thu, 2 Jan 2014 15:58:27 -0500
Steven Rostedt  wrote:

> On Mon, 16 Dec 2013 13:32:14 +0900
> Namhyung Kim  wrote:
> 
> > From: Namhyung Kim 
> > 
> > Enable fetching other types of arguments for uprobes.  IOW, we
> > can access stack, memory, deref, bitfield and retval from uprobes now.
> > 
> > The format for the argument types is the same as for kprobes (but the
> > @SYMBOL type is not supported for uprobes), i.e:
> > 
> >   @ADDR   : Fetch memory at ADDR
> >   $stackN : Fetch Nth entry of stack (N >= 0)
> >   $stack  : Fetch stack address
> >   $retval : Fetch return value
> >   +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address
> > 
> > Note that $retval can only be used with uretprobes.
> > 
> > Original-patch-by: Hyeoncheol Lee 
> 
> Where was the original patch posted? And can we get Hyeoncheol's SOB 
> for this?
> 

Was this the other part of patch 11? Still should have Hyeoncheol's
signed-off-by. If you already had it on the original (before the split) then
we can add it here too, as he already signed off on the code that this was
based on.

-- Steve



Re: [PATCH v2 06/10] Input: pm8xxx-vibrator - Add DT match table

2014-01-02 Thread Dmitry Torokhov
Hi Stephen,

On Thu, Jan 02, 2014 at 04:37:36PM -0800, Stephen Boyd wrote:
> The driver is only supported on DT enabled platforms. Convert the
> driver to DT so that it can probe properly.

I do not see MFD_PM8XXX depending on OF; should it be added if it is only
supported on DT?

Thanks.

> 
> Signed-off-by: Stephen Boyd 
> ---
>  drivers/input/misc/pm8xxx-vibrator.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/input/misc/pm8xxx-vibrator.c 
> b/drivers/input/misc/pm8xxx-vibrator.c
> index 28251560249d..458d51b88be5 100644
> --- a/drivers/input/misc/pm8xxx-vibrator.c
> +++ b/drivers/input/misc/pm8xxx-vibrator.c
> @@ -142,6 +142,13 @@ static int pm8xxx_vib_play_effect(struct input_dev *dev, 
> void *data,
>   return 0;
>  }
>  
> +static const struct of_device_id pm8xxx_vib_id_table[] = {
> + { .compatible = "qcom,pm8058-vib" },
> + { .compatible = "qcom,pm8921-vib" },
> + { }
> +};
> +MODULE_DEVICE_TABLE(of, pm8xxx_vib_id_table);
> +
>  static int pm8xxx_vib_probe(struct platform_device *pdev)
>  
>  {
> @@ -221,6 +228,7 @@ static struct platform_driver pm8xxx_vib_driver = {
>   .name   = "pm8xxx-vib",
>   .owner  = THIS_MODULE,
> + .pm = &pm8xxx_vib_pm_ops,
> + .of_match_table = pm8xxx_vib_id_table,
>   },
>  };
>  module_platform_driver(pm8xxx_vib_driver);
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
> 

-- 
Dmitry


Re: [PATCH v4] Bluetooth: Add hci_h4p driver

2014-01-02 Thread Sebastian Reichel
Hi,

On Fri, Jan 03, 2014 at 01:17:54AM +0100, Pavel Machek wrote:
> Changes from v3: Moved platform data into
> include/linux/platform_data/, something I missed before.

As I wrote before, Tony plans to remove the board code for all
OMAP boards, including the Nokia N900, for 3.14, so you cannot
boot without DT from 3.14 onwards.

The drivers can still be initialized the old way using pdata quirks
until all drivers are converted, but I think this driver can simply
be prepared for DT directly:

> [...]
>
> +struct hci_h4p_platform_data {
> + int chip_type;

This can be "extracted" from the compatible string.

> + int bt_sysclk;

This can be converted into a vendor property.

> + unsigned int bt_wakeup_gpio;
> + unsigned int host_wakeup_gpio;
> + unsigned int reset_gpio;

These can easily be acquired via DT.

> + int reset_gpio_shared;

This looks like a simple property in the DT structure.

You should use a boolean type for this btw.

> + unsigned int uart_irq;

This one can also simply be acquired via DT.

> + phys_addr_t uart_base;

I see multiple ways for this one:

 1. Just put the memory address into the dts file.
 2. Make this a phandle to the UART node and get
the memory address from the referenced node.
 3. Make the bluetooth node a subnode of the UART
node and get the address from the parent node.

IMHO solution 3 is the best one, since the bluetooth
chip is basically connected to the system via the UART.
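Under solution 3, the binding could look roughly like this. All node names, property names, GPIO numbers, and the &uart2 label are illustrative placeholders, not a reviewed binding:

```dts
&uart2 {
	bluetooth {
		/* chip_type is derived from the compatible string */
		compatible = "nokia,h4p-bluetooth";
		reset-gpios = <&gpio3 27 0>;
		bluetooth-wakeup-gpios = <&gpio2 5 0>;
		host-wakeup-gpios = <&gpio3 5 0>;
		nokia,reset-gpio-shared;	/* boolean: present means true */
		nokia,bt-sysclk = <2>;		/* vendor property */
		/* uart_irq, uart_base and the clocks come from the parent UART node */
	};
};
```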

> + const char *uart_iclk;
> + const char *uart_fclk;

There is currently work going on to move OMAP's clock
data into DT. When that work is done the clocks can
be acquired via phandles. I think it's expected to
be merged into 3.14.

> + void (*set_pm_limits)(struct device *dev, bool set);

If I'm not mistaken set_pm_limits is only referenced by
hci_h4p_set_pm_limits(). The hci_h4p_set_pm_limits()
function is not referenced anywhere, thus both can be
removed.

> +};

-- Sebastian


signature.asc
Description: Digital signature


Re: [PATCH] printk: flush conflicting continuation line

2014-01-02 Thread Joe Perches
(Adding Kay to cc's)

Kay?  any opinion on correctness?

On Thu, 2014-01-02 at 14:55 -0800, Andrew Morton wrote:
> On Wed, 1 Jan 2014 17:44:06 +0530 Arun KS  wrote:
> 
> > From d751f9a0cb6329ae3171f6e1cb85e4a3aa792d73 Mon Sep 17 00:00:00 2001
> > From: Arun KS 
> > Date: Wed, 1 Jan 2014 17:24:46 +0530
> > Subject: printk: flush conflicting continuation line
> > 
> > An earlier newline was missing and the current print is from a different
> > task. In this scenario, flush the continuation line and store this line
> > separately.
> > 
> > This patch fixes the below scenario of timestamp interleaving,
> > <6>[   28.154370 ] read_word_reg : reg[0x 3], reg[0x 4]  data [0x 642]
> > <6>[   28.155428 ] uart disconnect
> > <6>[   31.947341 ] dvfs[cpufreq.c<275>]:plug-in cpu<1> done
> > <4>[   28.155445 ] UART detached : send switch state 201
> > <6>[   32.014112 ] read_reg : reg[0x 3] data[0x21]
> > 
> > ...
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1604,7 +1604,10 @@ asmlinkage int vprintk_emit(int facility, int level,
> >   if (!(lflags & LOG_PREFIX))
> >   stored = cont_add(facility, level, text, text_len);
> >   cont_flush(LOG_NEWLINE);
> > - }
> > + /* Flush conflicting buffer. An earlier newline was missing
> > + * and current print is from different task */
> > + } else if (cont.len && cont.owner != current)
> > + cont_flush(LOG_NEWLINE);
> > 
> >   if (!stored)
> >   log_store(facility, level, lflags, 0,
> 
> Your email client makes a horrid mess of the patches :(
> 
> I *think* it's right.  But the code can be significantly simplified and
> optimised.  Please review:
> 
>   } else {
>   bool stored = false;
> 
>   /*
>* If an earlier newline was missing and it was the same task,
>* either merge it with the current buffer and flush, or if
>* there was a race with interrupts (prefix == true) then just
>* flush it out and store this line separately.
>* If the preceding printk was from a different task and missed
>* a newline, flush and append the newline.
>*/
>   if (cont.len) {
>   if (cont.owner == current && !(lflags & LOG_PREFIX))
>   stored = cont_add(facility, level, text,
> text_len);
>   cont_flush(LOG_NEWLINE);
>   }
> 
>   if (!stored)
>   log_store(facility, level, lflags, 0,
> dict, dictlen, text, text_len);
>   }
> 
> 
> 
> --- a/kernel/printk/printk.c~printk-flush-conflicting-continuation-line-fix
> +++ a/kernel/printk/printk.c
> @@ -1595,15 +1595,15 @@ asmlinkage int vprintk_emit(int facility
>* either merge it with the current buffer and flush, or if
>* there was a race with interrupts (prefix == true) then just
>* flush it out and store this line separately.
> +  * If the preceding printk was from a different task and missed
> +  * a newline, flush and append the newline.
>*/
> - if (cont.len && cont.owner == current) {
> - if (!(lflags & LOG_PREFIX))
> - stored = cont_add(facility, level, text, 
> text_len);
> - cont_flush(LOG_NEWLINE);
> - /* Flush conflicting buffer. An earlier newline was missing
> - * and current print is from different task */
> - } else if (cont.len && cont.owner != current)
> + if (cont.len) {
> + if (cont.owner == current && !(lflags & LOG_PREFIX))
> + stored = cont_add(facility, level, text,
> +   text_len);
>   cont_flush(LOG_NEWLINE);
> + }
>  
>   if (!stored)
>   log_store(facility, level, lflags, 0,
> _




[PATCH 1/1] watchdog: Adding Merrifield watchdog driver support

2014-01-02 Thread eric . ernst
From: Gabriel Touzeau 

Added Merrifield watchdog driver support.

Based on initial implementation from prior Intel SCU-based platforms, this
driver has several changes specific to the Tangier SoC / Merrifield platform.

Signed-off-by: Eric Ernst 
Signed-off-by: Yann Puech 
Signed-off-by: Jeremy Compostella 
Signed-off-by: Gabriel Touzeau 
Cc: David Cohen 
---
 drivers/watchdog/Kconfig  |   12 +
 drivers/watchdog/Makefile |1 +
 drivers/watchdog/intel_scu_watchdog_evo.c |  587 +
 drivers/watchdog/intel_scu_watchdog_evo.h |   54 +++
 4 files changed, 654 insertions(+)
 create mode 100644 drivers/watchdog/intel_scu_watchdog_evo.c
 create mode 100644 drivers/watchdog/intel_scu_watchdog_evo.h

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index d1d53f301de7..bb3ef92d2788 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -616,6 +616,18 @@ config INTEL_SCU_WATCHDOG
 
  To compile this driver as a module, choose M here.
 
+config INTEL_SCU_WATCHDOG_EVO
+   bool "Intel SCU Watchdog Evolution for Mobile Platforms"
+   depends on X86_INTEL_MID
+   ---help---
+ Hardware driver evolution for the watchdog timer built into the Intel
+ SCU for Intel Mobile Platforms.
+
+ This driver supports the watchdog evolution implementation in SCU,
+ available for Merrifield generation.
+
+ To compile this driver as a module, choose M here.
+
 config ITCO_WDT
tristate "Intel TCO Timer/Watchdog"
depends on (X86 || IA64) && PCI
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 6c5bb274d3cd..e4b150efa938 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -112,6 +112,7 @@ obj-$(CONFIG_W83977F_WDT) += w83977f_wdt.o
 obj-$(CONFIG_MACHZ_WDT) += machzwd.o
 obj-$(CONFIG_SBC_EPX_C3_WATCHDOG) += sbc_epx_c3.o
 obj-$(CONFIG_INTEL_SCU_WATCHDOG) += intel_scu_watchdog.o
+obj-$(CONFIG_INTEL_SCU_WATCHDOG_EVO) += intel_scu_watchdog_evo.o
 
 # M32R Architecture
 
diff --git a/drivers/watchdog/intel_scu_watchdog_evo.c 
b/drivers/watchdog/intel_scu_watchdog_evo.c
new file mode 100644
index ..fc9a37a33ddd
--- /dev/null
+++ b/drivers/watchdog/intel_scu_watchdog_evo.c
@@ -0,0 +1,587 @@
+/*
+ *  intel_scu_watchdog_evo:  An Intel SCU IOH Based Watchdog Device
+ * for Tangier SoC (Merrifield platform)
+ *
+ *  Based on previous intel_scu based watchdog driver, intel_scu_watchdog.
+ *
+ *  Copyright (C) 2009-2013 Intel Corporation. All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of version 2 of the GNU General
+ *  Public License as published by the Free Software Foundation.
+ *
+ *  This program is distributed in the hope that it will be
+ *  useful, but WITHOUT ANY WARRANTY; without even the implied
+ *  warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+ *  PURPOSE.  See the GNU General Public License for more details.
+ *  You should have received a copy of the GNU General Public
+ *  License along with this program; if not, write to the Free
+ *  Software Foundation, Inc., 59 Temple Place - Suite 330,
+ *  Boston, MA  02111-1307, USA.
+ *  The full GNU General Public License is included in this
+ *  distribution in the file called COPYING.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "intel_scu_watchdog_evo.h"
+
+/* Defines */
+#define STRING_RESET_TYPE_MAX_LEN   11
+#define STRING_COLD_OFF "COLD_OFF"
+#define STRING_COLD_RESET   "COLD_RESET"
+#define STRING_COLD_BOOT"COLD_BOOT"
+
+#define EXT_TIMER0_MSI 15
+
+#define IPC_WATCHDOG 0xf8
+
+/* watchdog message options */
+enum {
+   SCU_WATCHDOG_START = 0,
+   SCU_WATCHDOG_STOP,
+   SCU_WATCHDOG_KEEPALIVE,
+   SCU_WATCHDOG_SET_ACTION_ON_TIMEOUT
+};
+
+/* watchdog reset options */
+enum {
+   SCU_COLD_OFF_ON_TIMEOUT = 0,
+   SCU_COLD_RESET_ON_TIMEOUT,
+   SCU_COLD_BOOT_ON_TIMEOUT,
+   SCU_DO_NOTHING_ON_TIMEOUT
+};
+
+/* Statics */
+static struct intel_scu_watchdog_dev watchdog_device;
+
+/* Module params */
+static bool disable_kernel_watchdog;
+module_param(disable_kernel_watchdog, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_kernel_watchdog,
+   "Disable kernel watchdog. "
+   "Set to 0, watchdog is started at boot "
+   "and left running; set to 1, watchdog "
+   "is not started until the user space "
+   "watchdog daemon is started; also, if the "
+   "timer is started by the iafw firmware, it "
+   "will be disabled upon initialization of this "
+   "driver if disable_kernel_watchdog is set");
+
+static int pre_timeout = 

Re: [RFC PATCH 2/3] arm64: dts: APM X-Gene PCIe device tree nodes

2014-01-02 Thread Jason Gunthorpe
On Thu, Jan 02, 2014 at 01:56:51PM -0800, Tanmay Inamdar wrote:
> On Mon, Dec 23, 2013 at 9:46 AM, Jason Gunthorpe
>  wrote:
> > On Mon, Dec 23, 2013 at 01:32:03PM +0530, Tanmay Inamdar wrote:
> >> This patch adds the device tree nodes for APM X-Gene PCIe controller and
> >> PCIe clock interface. Since X-Gene SOC supports maximum 5 ports, 5 dts 
> >> nodes
> >> are added.
> >
> > Can you include an lspci dump for PCI DT bindings please? It is
> > impossible to review otherwise..
> >
> 
> On the X-Gene evaluation platform, there is only one PCIe port
> enabled. Here is the 'lspci' dump

This is a bit hard to read without more context, but:

> # lspci -vvv
> 00:00.0 Class 0604: Device 19aa:e008 (rev 04)
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-

This is an on-chip device? (19aa does not seem to be a VID I can find)

Ideally this is the on-chip PCI-PCI bridge which represents the port.

The problem I see is that your DT binding has a top level stanza per
port.

We *really* prefer to see a single stanza for all ports - but this
requires the HW to be able to fit into the Linux resource assignment
model - a single resource pool for all ports and standard PCI-PCI
bridge config access to assign the resource to a port.

If your HW can't do this (e.g. because the port aperture 0xe0 is
hard wired) then the fallback is to place every port in a distinct
domain, with a distinct DT node, and have overlapping bus numbers
and fixed windows. I don't see PCI domain support in your driver...

There is some kind of an addressing problem because you've done this:

+static void xgene_pcie_fixup_bridge(struct pci_dev *dev)
+{
+   int i;
+
+   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+   dev->resource[i].start = dev->resource[i].end = 0;
+   dev->resource[i].flags = 0;
+   }
+}
+DECLARE_PCI_FIXUP_HEADER(XGENE_PCIE_VENDORID, XGENE_PCIE_BRIDGE_DEVICEID,
+xgene_pcie_fixup_bridge);

Which is usually a sign that something is wonky with how the HW is
being fit into the PCI core.

> ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR-  Latency: 0, Cache Line Size: 64 bytes
> Region 0: Memory at  (64-bit, prefetchable)
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: f000-0fff
> Memory behind bridge: 00c0-00cf

[..]
 
> 01:00.0 Class 0200: Device 15b3:1003
> Region 0: Memory at e000c0 (64-bit, non-prefetchable) [size=1M]
> Region 2: Memory at e0 (64-bit, prefetchable)
> [size=8M]

Something funky is going on here too, the 64 bit address e0
should be reflected in the 'memory behind bridge' above, not
truncated.

ranges = <0x0200 0x0 0x 0x90 0x 0x0 0x1000 /* mem*/
+ 0x0100 0x0 0x8000 0x90 0x8000 0x0 
0x0001 /* io */
+ 0x 0x0 0xd000 0x90 0xd000 0x0 
0x0020 /* cfg */
+ 0x 0x0 0x7900 0x00 0x7900 0x0 
0x0080>; /* msi */

Ranges has a defined meaning: MSI shouldn't be in ranges, and 'cfg' is
only OK if the address encoding exactly matches the funky PCI-E extended
configuration address format. You can move these to 'reg' or other
properties.

(MSI is tricky, I'm not aware of DT binding work for MSI :()
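As a sketch, a conventional layout would keep only the mem and io translations in ranges and move config space into reg. The cfg address below is copied from the posted node; the node name and csr entry are placeholders:

```dts
pcie0: pcie@1f2b0000 {
	/* csr entry is a placeholder; cfg is copied from the posted node */
	reg = <0x00 0x1f2b0000 0x0 0x00010000>,		/* csr */
	      <0x90 0xd0000000 0x0 0x00200000>;		/* cfg */
	ranges = <0x02000000 0x0 0x00000000 0x90 0x00000000 0x0 0x10000000   /* mem */
		  0x01000000 0x0 0x80000000 0x90 0x80000000 0x0 0x00010000>; /* io */
};
```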

Also, unrelated, can you please double check that your HW cannot
generate 8 and 16 bit configuration write TLPs natively? The
xgene_pcie_cfg_out8/16 hack is not desirable if it can be avoided.

Regards,
Jason


  1   2   3   4   5   6   7   8   9   10   >