Re: [RFC/PATCHSET 00/15] perf report: Add support to accumulate hist periods

2012-09-27 Thread Namhyung Kim
Hi Frederic,

On Fri, 28 Sep 2012 01:01:48 +0200, Frederic Weisbecker wrote:
> When Arun was working on this, I asked him to explore if it could make sense
> to reuse the "-b, --branch-stack" perf report option. Because after all, this
> feature is doing about the same as "-b" except it's using callchains instead
> of full branch tracing. But callchains are branches. Just a limited subset of
> all branches taken on execution.
> So you can probably reuse some interface and even ground code there.
>
> What do you think?

Umm.. first of all, I'm not familiar with the branch stack thing.  It's
Intel-specific, right?

Also I don't understand what exactly you want here.  What kind of
interface do you mean?  Can you elaborate a bit more?

And AFAIK branch stack can collect much more branch information than
just callstacks.  Can we differentiate which is which easily?  Is there
any limitation on using it?  What if callstacks are not sync'ed with
branch stacks - is it possible though?

But I think it'd be good if the branch stack can be changed to call
stack in general.  Did you mean this?

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the signal tree with the vfs tree

2012-09-27 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in fs/exec.c
between commit 5b8a94d461a7 ("coredump: move core dump functionality into
its own file") from the vfs tree and commits 70446600fa12 ("arm:
introduce ret_from_kernel_execve(), switch to generic kernel_execve()")
and 5e41814a7d8b ("arm: get rid of execve wrapper, switch to generic
execve() implementation") from the signal tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

BTW, Al, you have that vfs tree commit (and others) authored by you ...
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc fs/exec.c
index 48fb26e,df8b282..000
--- a/fs/exec.c
+++ b/fs/exec.c
@@@ -1645,3 -2031,345 +1644,58 @@@ int get_dumpable(struct mm_struct *mm
  {
return __get_dumpable(mm->flags);
  }
+ 
 -static void wait_for_dump_helpers(struct file *file)
 -{
 -  struct pipe_inode_info *pipe;
 -
 -  pipe = file->f_path.dentry->d_inode->i_pipe;
 -
 -  pipe_lock(pipe);
 -  pipe->readers++;
 -  pipe->writers--;
 -
 -  while ((pipe->readers > 1) && (!signal_pending(current))) {
 -  wake_up_interruptible_sync(&pipe->wait);
 -  kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
 -  pipe_wait(pipe);
 -  }
 -
 -  pipe->readers--;
 -  pipe->writers++;
 -  pipe_unlock(pipe);
 -
 -}
 -
 -
 -/*
 - * umh_pipe_setup
 - * helper function to customize the process used
 - * to collect the core in userspace.  Specifically
 - * it sets up a pipe and installs it as fd 0 (stdin)
 - * for the process.  Returns 0 on success, or
 - * PTR_ERR on failure.
 - * Note that it also sets the core limit to 1.  This
 - * is a special value that we use to trap recursive
 - * core dumps
 - */
 -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 -{
 -  struct file *files[2];
 -  struct fdtable *fdt;
 -  struct coredump_params *cp = (struct coredump_params *)info->data;
 -  struct files_struct *cf = current->files;
 -  int err = create_pipe_files(files, 0);
 -  if (err)
 -  return err;
 -
 -  cp->file = files[1];
 -
 -  sys_close(0);
 -  fd_install(0, files[0]);
 -  spin_lock(&cf->file_lock);
 -  fdt = files_fdtable(cf);
 -  __set_open_fd(0, fdt);
 -  __clear_close_on_exec(0, fdt);
 -  spin_unlock(&cf->file_lock);
 -
 -  /* and disallow core files too */
 -  current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
 -
 -  return 0;
 -}
 -
 -void do_coredump(long signr, int exit_code, struct pt_regs *regs)
 -{
 -  struct core_state core_state;
 -  struct core_name cn;
 -  struct mm_struct *mm = current->mm;
 -  struct linux_binfmt * binfmt;
 -  const struct cred *old_cred;
 -  struct cred *cred;
 -  int retval = 0;
 -  int flag = 0;
 -  int ispipe;
 -  bool need_nonrelative = false;
 -  static atomic_t core_dump_count = ATOMIC_INIT(0);
 -  struct coredump_params cprm = {
 -  .signr = signr,
 -  .regs = regs,
 -  .limit = rlimit(RLIMIT_CORE),
 -  /*
 -   * We must use the same mm->flags while dumping core to avoid
 -   * inconsistency of bit flags, since this flag is not protected
 -   * by any locks.
 -   */
 -  .mm_flags = mm->flags,
 -  };
 -
 -  audit_core_dumps(signr);
 -
 -  binfmt = mm->binfmt;
 -  if (!binfmt || !binfmt->core_dump)
 -  goto fail;
 -  if (!__get_dumpable(cprm.mm_flags))
 -  goto fail;
 -
 -  cred = prepare_creds();
 -  if (!cred)
 -  goto fail;
 -  /*
 -   * We cannot trust fsuid as being the "true" uid of the process
 -   * nor do we know its entire history. We only know it was tainted
 -   * so we dump it as root in mode 2, and only into a controlled
 -   * environment (pipe handler or fully qualified path).
 -   */
 -  if (__get_dumpable(cprm.mm_flags) == SUID_DUMPABLE_SAFE) {
 -  /* Setuid core dump mode */
 -  flag = O_EXCL;  /* Stop rewrite attacks */
 -  cred->fsuid = GLOBAL_ROOT_UID;  /* Dump root private */
 -  need_nonrelative = true;
 -  }
 -
 -  retval = coredump_wait(exit_code, &core_state);
 -  if (retval < 0)
 -  goto fail_creds;
 -
 -  old_cred = override_creds(cred);
 -
 -  /*
 -   * Clear any false indication of pending signals that might
 -   * be seen by the filesystem code called to write the core file.
 -   */
 -  clear_thread_flag(TIF_SIGPENDING);
 -
 -  ispipe = format_corename(&cn, signr);
 -
 -  if (ispipe) {
 -  int dump_count;
 -  char **helper_argv;
 -
 -  if (ispipe < 0) {
 -  printk(KERN_WARNING "format_corename failed\n");
 -  printk(KERN_WARNING "Aborting core\n");
 -   

Re: [PATCH] x86: Distinguish TLB shootdown interrupts from other functions call interrupts

2012-09-27 Thread H. Peter Anvin

On 09/27/2012 12:02 AM, Alex Shi wrote:


> Peter:
>
> Maybe the patch doesn't look perfect for this issue.
> So I am wondering if the following patch is better, if we don't care that
> irq_tlb is counted again in irq_call?



Tomoki-san's patch looked sane to me, I should just apply it.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-27 Thread H. Peter Anvin

On 09/27/2012 10:38 PM, Raghavendra K T wrote:

> +
> +bool kvm_overcommitted()
> +{


This better not be C...

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v3] pwm_backlight: Add device tree support for Low Threshold Brightness

2012-09-27 Thread Thierry Reding
On Thu, Sep 27, 2012 at 02:33:09PM -0700, Andrew Morton wrote:
> On Wed, 26 Sep 2012 20:17:07 +0530
> "Philip, Avinash"  wrote:
> 
> > Some back lights perform poorly when driven by a PWM with a short
> > duty-cycle. For such devices, the low threshold can be used to specify a
> > lower bound for the duty-cycle and should be chosen to exclude the
> > problematic range.
> > 
> > Add device tree probing support for lth_brightness putting
> > low-threshold-brightness as optional property.
> > 
> > ...
> >
> > --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> > +++ b/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> > @@ -14,6 +14,15 @@ Required properties:
> >  Optional properties:
> >- pwm-names: a list of names for the PWM devices specified in the
> > "pwms" property (see PWM binding[0])
> > +  - low-threshold-brightness: brightness threshold low level. Low threshold
> > +brightness set to value so that backlight present on low end of
> > +brightness.
> > +Some panels, backlight would absent if duty percentage of PWM wave is less
> > +than certain level (say 20%). By setting low-threshold-brightness to a
> > +value above (percentage of brightness-levels max) 50 (20% of 255, if 255
> > +is max). On setting low-threshold-brightness, range of brightness-levels
> > +is calculated in a region of low-threshold-brightness to brightness-levels
> > +max.
> 
> hoo boy, that's hard to follow.  How does this look?
> 
> --- a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt~pwm_backlight-add-device-tree-support-for-low-threshold-brightness-fix
> +++ a/Documentation/devicetree/bindings/video/backlight/pwm-backlight.txt
> @@ -14,15 +14,15 @@ Required properties:
>  Optional properties:
>- pwm-names: a list of names for the PWM devices specified in the
> "pwms" property (see PWM binding[0])
> -  - low-threshold-brightness: brightness threshold low level. Low threshold
> -brightness set to value so that backlight present on low end of
> -brightness.
> -Some panels, backlight would absent if duty percentage of PWM wave is less
> -than certain level (say 20%). By setting low-threshold-brightness to a
> -value above (percentage of brightness-levels max) 50 (20% of 255, if 255
> -is max). On setting low-threshold-brightness, range of brightness-levels
> -is calculated in a region of low-threshold-brightness to brightness-levels
> -max.
> +  - low-threshold-brightness: brightness threshold low level. Sets the lowest
> +brightness value.
> +On some panels the backlight misbehaves if the duty cycle percentage of the
> +PWM wave is less than a certain level (say 20%).  In this example the user
> +can set low-threshold-brightness to a value above 50 (ie, 20% of 255), thus
> +preventing the PWM duty cycle from going too low.
> +On setting low-threshold-brightness the range of brightness levels is
> +calculated in the range low-threshold-brightness to the maximum value in
> +brightness-levels, described above.
>  
>  [0]: Documentation/devicetree/bindings/pwm/pwm.txt
>  
> 
> 
> Also, I'm wondering if we really needed a new property - couldn't one
> do this simply by setting brightness-levels to 50..255?

Yes. This was discussed in the thread that followed the posting of this
patch's v2. We've decided to drop it and go with brightness-levels only
for device tree data. Eventually all existing users should convert to
that as well so we can remove some of the cruft from the platform data
up.

Thierry
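
To make the mapping discussed above concrete, here is a small sketch of how a low threshold could rescale brightness onto the PWM duty cycle. It assumes a plain linear mapping; `scale_duty`, `period`, `lth` and `max_level` are illustrative names, not taken from the driver or the binding.

```c
#include <assert.h>

/*
 * Map a brightness level in [0, max_level] linearly onto a duty cycle in
 * [lth, period], so the duty cycle never drops below the threshold lth.
 * All names here are hypothetical.
 */
static unsigned int scale_duty(unsigned int level, unsigned int max_level,
                               unsigned int period, unsigned int lth)
{
    assert(max_level > 0 && level <= max_level && lth <= period);

    /* Widen before multiplying so large periods do not overflow. */
    return lth + (unsigned int)(((unsigned long long)level * (period - lth))
                                / max_level);
}
```

With period = 1000, lth = 200 (20%) and max_level = 255, level 0 maps to a duty cycle of 200 and level 255 maps to 1000. Restricting brightness-levels to 50..255, as suggested in the thread, gives a similar floor without a separate property.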




Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-27 Thread Raghavendra K T

On 09/27/2012 05:33 PM, Avi Kivity wrote:

On 09/27/2012 01:23 PM, Raghavendra K T wrote:


This gives us a good case for tracking preemption on a per-vm basis.  As
long as we aren't preempted, we can keep the PLE window high, and also
return immediately from the handler without looking for candidates.


1) So do you think, deferring preemption patch ( Vatsa was mentioning
long back)  is also another thing worth trying, so we reduce the chance
of LHP.


Yes, we have to keep it in mind.  It will be useful for fine grained
locks, not so much so coarse locks or IPIs.



Agree.


I would still of course prefer a PLE solution, but if we can't get it to
work we can consider preemption deferral.



Okay.



IIRC, with defer preemption :
we will have hook in spinlock/unlock path to measure depth of lock held,
and shared with host scheduler (may be via MSRs now).
Host scheduler 'prefers' not to preempt lock holding vcpu. (or rather
give say one chance.


A downside is that we have to do that even when undercommitted.

Also there may be a lot of false positives (deferred preemptions even
when there is no contention).


Yes. That is a worry.





2) Looking at the result (comparing A & C), I do feel we have
significant overhead in iterating over vcpus (when compared to even vmexit),
so we would still need the undercommit fix suggested by PeterZ (improving by
140%)?


Looking only at the current runqueue?  My worry is that it misses a lot
of cases.  Maybe try the current runqueue first and then others.

Or were you referring to something else?


No. I was referring to the same thing.

However, I had also tried the following (which works well to check the
undercommitted scenario). But I am thinking of using it only for yielding in
case of overcommit (yield in overcommit suggested by Rik) and keeping the
undercommit patch as suggested by PeterZ.


[ patch is not in proper diff I suppose ].

Will test them.

Peter, Can I post your patch with your from/sob.. in V2?
Please let me know..

---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 28f00bc..9ed3759 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1620,6 +1620,21 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)

return eligible;
 }
 #endif
+
+bool kvm_overcommitted()
+{
+   unsigned long load;
+
+   load = avenrun[0] + FIXED_1/200;
+   load = load >> FSHIFT;
+   load = (load << 7) / num_online_cpus();
+
+   if (load > 128)
+   return true;
+
+   return false;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
struct kvm *kvm = me->kvm;
@@ -1629,6 +1644,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
int pass;
int i;

+   if (!kvm_overcommitted())
+   return;
+
kvm_vcpu_set_in_spin_loop(me, true);
/*
 * We boost the priority of a VCPU that is runnable but not
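
The load check in kvm_overcommitted() above is fixed-point arithmetic on the 1-minute load average. As a rough user-space sketch of the same steps (the function name `load_exceeds_cpus` and its parameters are made up for illustration, and FSHIFT and FIXED_1 are assumed to have the usual kernel values of 11 and 1 << 11):

```c
#include <assert.h>
#include <stdbool.h>

#define FSHIFT  11              /* assumed: bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT) /* assumed: 1.0 in fixed point */

/*
 * Hypothetical user-space stand-in for the check in the patch.
 * avenrun0: 1-minute load average in fixed point (load * FIXED_1).
 * ncpus:    number of online CPUs, must be non-zero.
 */
static bool load_exceeds_cpus(unsigned long avenrun0, unsigned int ncpus)
{
    unsigned long load;

    assert(ncpus > 0);

    load = avenrun0 + FIXED_1 / 200;  /* add 0.005 before truncating */
    load >>= FSHIFT;                  /* keep only the integer part */
    load = (load << 7) / ncpus;       /* load per CPU, scaled by 128 */

    return load > 128;                /* more than 1.0 per CPU */
}
```

With 8 CPUs, a load of 9.0 gives (9 << 7) / 8 = 144 and counts as overcommitted. A load of 8.9 truncates to 8 and gives exactly 128, which does not, because the shift by FSHIFT drops the fraction before the scaling.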



Re: CMA broken in next-20120926

2012-09-27 Thread Minchan Kim
On Thu, Sep 27, 2012 at 03:11:59PM -0700, Andrew Morton wrote:
> On Thu, 27 Sep 2012 13:29:11 +0200
> Thierry Reding  wrote:
> 
> > Hi Marek,
> > 
> > any idea why CMA might be broken in next-20120926. I see that there
> > haven't been any major changes to CMA itself, but there's been quite a
> > bit of restructuring of various memory allocation bits lately. I wasn't
> > able to track the problem down, though.
> > 
> > What I see is this during boot (with CMA_DEBUG enabled):
> > 
> > [0.266904] cma: dma_alloc_from_contiguous(cma db474f80, count 64, align 6)
> > [0.284469] cma: dma_alloc_from_contiguous(): memory range at c09d7000 is busy, retrying
> > [0.293648] cma: dma_alloc_from_contiguous(): memory range at c09d7800 is busy, retrying
> > ...
> > [2.648619] DMA: failed to allocate 256 KiB pool for atomic coherent allocation
> > ...
> > [4.196193] WARNING: at /home/thierry.reding/src/kernel/linux-ipmp.git/arch/arm/mm/dma-mapping.c:485 __alloc_from_pool+0xdc/0x110()
> > [4.207988] coherent pool not initialised!
> > 
> > So the pool isn't getting initialized properly because CMA can't get at
> > the memory. Do you have any hints as to what might be going on? If it's
> > any help, I started seeing this with next-20120926 and it is in today's
> > next as well.
> > 
> 
> Bart and Minchan have made recent changes to CMA.  Let us cc them.

Hi all,

I have little time right now, so I only looked over the problem briefly
and I might be wrong. I also have to leave the office soon, and Korea has
a long holiday starting now, so I will be off until next week and hard to
reach.

I hope this patch fixes the bug. If it fixes the problem but the
description is lacking or someone has a better idea, please feel free
to modify it and resend it to akpm.

Thierry, could you test the patch below?

From 24a547855fa2bd4212a779cc73997837148310b3 Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Fri, 28 Sep 2012 14:28:32 +0900
Subject: [PATCH] revert mm: compaction: iron out isolate_freepages_block()
 and isolate_freepages_range()

[1] introduced a bug in CMA.
nr_scanned will never be equal to total_isolated for a successful CMA
allocation. This patch reverts part of that patch.

[1] mm: compaction: iron out isolate_freepages_block() and isolate_freepages_range()

Cc: Mel Gorman 
Signed-off-by: Minchan Kim 
---
 mm/compaction.c |   29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 5037399..7721197 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -269,13 +269,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
int isolated, i;
struct page *page = cursor;
 
-   nr_scanned++;
if (!pfn_valid_within(blockpfn))
-   continue;
+   goto strict_check;
+   nr_scanned++;
+
if (!valid_page)
valid_page = page;
if (!PageBuddy(page))
-   continue;
+   goto strict_check;
 
/*
 * The zone lock must be held to isolate freepages.
@@ -296,12 +297,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
/* Recheck this is a buddy page under lock */
if (!PageBuddy(page))
-   continue;
+   goto strict_check;
 
/* Found a free page, break it into order-0 pages */
isolated = split_free_page(page);
if (!isolated && strict)
-   break;
+   goto strict_check;
total_isolated += isolated;
for (i = 0; i < isolated; i++) {
list_add(&page->lru, freelist);
@@ -313,18 +314,20 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
blockpfn += isolated - 1;
cursor += isolated - 1;
}
+
+   continue;
+
+strict_check:
+   /* Abort isolation if the caller requested strict isolation */
+   if (strict) {
+   total_isolated = 0;
+   goto out;
+   }
}
 
trace_mm_compaction_isolate_freepages(nr_scanned, total_isolated);
 
-   /*
-* If strict isolation is requested by CMA then check that all the
-* pages scanned were isolated. If there were any failures, 0 is
-* returned and CMA will fail.
-*/
-   if (strict && nr_scanned != total_isolated)
-   total_isolated = 0;
-
+out:
if (locked)
spin_unlock_irqrestore(&cc->zone->lock, flags);
 
-- 
1.7.9.5




RE: [PATCH v4 0/2] ACPI: DBGP/DBG2 early console support for LPIA.

2012-09-27 Thread Zheng, Lv
Forgot to Cc x86 maintainers, will send again. Sorry for the noise.

> -----Original Message-----
> From: Zheng, Lv
> Sent: Friday, September 28, 2012 10:40 AM
> To: Brown, Len
> Cc: linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; Zheng, Lv
> Subject: [PATCH v4 0/2] ACPI: DBGP/DBG2 early console support for LPIA.
> 
> Microsoft Debug Port Table (DBGP or DBG2) is used by the Windows SoC
> platforms to describe their debugging facilities.
> Recent Low Power Intel Architecture (LPIA) platforms have utilized
> this for the SPI UART debug ports that are resident on their debug
> boards.
> 
> This patch set enables the DBGP/DBG2 debug ports as a Linux early
> console launcher.
> The SPI UART debug ports support is also refined to co-exist with this
> new usage model.
> 
> To use this facility on LPIA platforms, you need to enable the following
> kernel configurations:
>   CONFIG_EARLY_PRINTK_ACPI=y
>   CONFIG_EARLY_PRINTK_INTEL_MID_SPI=y
> Then you need to append the following kernel parameter to the kernel
> command line in your boot loader configuration file:
>   earlyprintk=acpi
> 
> There is a dilemma in designing this patch set.  There should be three
> steps to enable an early console for an operating system:
> 1. Probe: In this stage, the Linux kernel can detect the early consoles
>   and the base address of their register block can be determined.
>   This can be done by parsing the descriptors in the ACPI DBGP/DBG2
>   tables.  Note that acpi_table_init() must be called before
>   parsing.
> 2. Setup: In this stage, the Linux kernel can apply user specified
>   configuration options (ex. baudrate of serial ports) for the
>   early consoles.  This is done by parsing the early parameters
>   passed to the kernel from the boot loaders.  Note that
>   parse_early_params() is called very early to allow parameters to
>   be passed to other kernel subsystems.
> 3. Start: In this stage, the Linux kernel can make the console available
>   to output messages.  Since early consoles are always used for
>   kernel boot up debugging, this must be done as early as possible
>   so that as many kernel subsystems as possible can be debugged.
>   Note that this stage happens when register_console() is
>   called.
> The preferred sequence for the above steps is:
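>    +-----------------+    +-------------------+    +--------------------+
>    | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
>    +-----------------+    +-------------------+    +--------------------+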
> But unfortunately, in the current x86 implementation, early parameters and
> early printk initialization are called before acpi_table_init() which
> requires early memory mapping facility.
> There are some choices for me to design this patch set:
> 1. Invoking acpi_table_init() before parse_early_param() to maintain the
>sequence:
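>    +-----------------+    +-------------------+    +--------------------+
>    | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
>    +-----------------+    +-------------------+    +--------------------+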
>This requires other subsystem maintainers' review to ensure no
>regressions will be introduced.  As far as I know, one kind of issue
>might be found in the EFI subsystem:
>The EFI boot services and runtime services are mixed up in the x86
>specific initialization process before the ACPI table initialization.
>Worse, you cannot even disable the runtime services at kernel
>compilation time while still allowing the boot services code to be
>executed.  Enabling the early consoles after the ACPI table
>initialization will make it difficult to debug the runtime BIOS bugs.
>If any progress is made to the kernel boot sequences, please let me
>know.  I'll be willing to redesign the ACPI DBGP/DBG2 console probing
>facility.  You can reach me at .
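> 2. Modifying the above sequence to make it look like:
>    +-------------------+    +-----------------+    +--------------------+
>    | EARLY_PARAM SETUP | -> | ACPI DBGP PROBE | -> | EARLY_PRINTK START |
>    +-------------------+    +-----------------+    +--------------------+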
>Early consoles started in this style will lose some debuggability in
>the kernel boot up.  If the system does not crash very early,
>developers still can see the buffered kernel outputs when the
>register_console() is called.
>Current early console implementations need to be modified to split their
>initialization code into two parts:
>1. Detecting hardware.  This can be called in the PROBE stage.
>2. Applying user parameters.  This can be called in the SETUP stage.
>Individual early console driver maintainers need to be involved to avoid
>regressions that might occur on this modification as the maintainers
>might offer the real 

Re: [PATCH 1/1] hid:Fix problem on GeneralTouch multi-touchscreen

2012-09-27 Thread Benjamin Tissoires
On Fri, Sep 28, 2012 at 4:18 AM, GeneralTouch  wrote:
> From: Xianhan Yu 
>
> Fix the touch-up no response problem on GeneralTouch twofingers touchscreen
> and modify the driver for new GeneralTouch PWT touchscreen.
>
> Signed-off-by: Xianhan Yu 

Hi,

Thank you for re-submitting the patch. It's cleaner now.

I have a few questions, but generally speaking, the patch is good in
its current form.

Jiri: I know that I have not been as reactive as I used to, but I'm
ending my contract in my current lab now, and I've been pretty busy.
Please don't put me in the grave, I'm still the maintainer of
hid-multitouch ;-)


> ---
>  drivers/hid/hid-ids.h|1 +
>  drivers/hid/hid-multitouch.c |   20 ++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
> index 1dcb76f..a6d5890 100644
> --- a/drivers/hid/hid-ids.h
> +++ b/drivers/hid/hid-ids.h
> @@ -305,6 +305,7 @@
>
>  #define USB_VENDOR_ID_GENERAL_TOUCH    0x0dfc
>  #define USB_DEVICE_ID_GENERAL_TOUCH_WIN7_TWOFINGERS 0x0003
> +#define USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PWT_TENFINGERS 0x0100
>
>  #define USB_VENDOR_ID_GLAB 0x06c2
>  #define USB_DEVICE_ID_4_PHIDGETSERVO_30        0x0038
> diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
> index 59c8b5c..7aece16 100644
> --- a/drivers/hid/hid-multitouch.c
> +++ b/drivers/hid/hid-multitouch.c
> @@ -115,6 +115,8 @@ struct mt_device {
>  #define MT_CLS_EGALAX_SERIAL   0x0104
>  #define MT_CLS_TOPSEED 0x0105
>  #define MT_CLS_PANASONIC   0x0106
> +#define MT_CLS_GENERALTOUCH_TWOFINGERS 0x0107
> +#define MT_CLS_GENERALTOUCH_PWT_TENFINGERS 0x0108
>
>  #define MT_DEFAULT_MAXCONTACT  10
>
> @@ -215,7 +217,18 @@ static struct mt_class mt_classes[] = {
> { .name = MT_CLS_PANASONIC,
> .quirks = MT_QUIRK_NOT_SEEN_MEANS_UP,
> .maxcontacts = 4 },
> -
> +   { .name = MT_CLS_GENERALTOUCH_TWOFINGERS,
> +   .quirks = MT_QUIRK_NOT_SEEN_MEANS_UP |
> +   MT_QUIRK_VALID_IS_INRANGE |
> +   MT_QUIRK_SLOT_IS_CONTACTNUMBER,
> +   .maxcontacts = 2
> +   },

At first, I was a little bit surprised because
MT_QUIRK_NOT_SEEN_MEANS_UP and MT_QUIRK_VALID_IS_INRANGE were not
supposed to be used together. Anyway, if it's smoothly working with
your device, then I'm not against: the code shows that they won't
interfere.

> +   { .name = MT_CLS_GENERALTOUCH_PWT_TENFINGERS,

This worries me more. Apparently the 0x0100 device is win8
compliant. Doesn't it work out of the box with MT_CLS_DEFAULT?

> +   .quirks = MT_QUIRK_NOT_SEEN_MEANS_UP |
> +   MT_QUIRK_SLOT_IS_CONTACTNUMBER,
> +   .maxcontacts = 10

Do you really need to set the contact number of your device? Doing so
will force you to create a new class if you have a device with a
different maximum contact count.

I'm asking because I'd rather not have this field set on most of the MT_CLS_*.
However, if it's needed (because you need to set it in the
associated feature), then I will be fine with it. But I would
appreciate getting the report descriptor of this particular device.

So, if you judge that those two device-specific classes are absolutely
needed (after all, you have the device in your hands), you have my
reviewed-by:
Reviewed-by: Benjamin Tissoires

Thanks,
Benjamin


> +   },
> +
> { }
>  };
>
> @@ -893,9 +906,12 @@ static const struct hid_device_id mt_devices[] = {
> USB_DEVICE_ID_ELO_TS2515) },
>
> /* GeneralTouch panel */
> -   { .driver_data = MT_CLS_DUAL_INRANGE_CONTACTNUMBER,
> +   { .driver_data = MT_CLS_GENERALTOUCH_TWOFINGERS,
> MT_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH,
> USB_DEVICE_ID_GENERAL_TOUCH_WIN7_TWOFINGERS) },
> +   { .driver_data = MT_CLS_GENERALTOUCH_PWT_TENFINGERS,
> +   MT_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH,
> +   USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PWT_TENFINGERS) },
>
> /* Gametel game controller */
> { .driver_data = MT_CLS_DEFAULT,
> --
> 1.7.9.5
>


[PATCH] hpfs: convert to use leXX_add_cpu()

2012-09-27 Thread Wei Yongjun
From: Wei Yongjun 

Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
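
For readers unfamiliar with the helper: le32_add_cpu() adds a CPU-order value to a little-endian field in place. A user-space sketch of that behaviour, assuming a byte-array stand-in for __le32 (the type `le32_field` and the functions `le32_load`, `le32_store` and `le32_add` are hypothetical names for illustration, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk __le32 field: 4 bytes, LSB first. */
typedef struct { uint8_t b[4]; } le32_field;

/* Read the field as a CPU-order value, independent of host endianness. */
static uint32_t le32_load(const le32_field *f)
{
    return (uint32_t)f->b[0] |
           ((uint32_t)f->b[1] << 8) |
           ((uint32_t)f->b[2] << 16) |
           ((uint32_t)f->b[3] << 24);
}

/* Write a CPU-order value back in little-endian byte order. */
static void le32_store(le32_field *f, uint32_t v)
{
    f->b[0] = (uint8_t)(v & 0xff);
    f->b[1] = (uint8_t)((v >> 8) & 0xff);
    f->b[2] = (uint8_t)((v >> 16) & 0xff);
    f->b[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Same effect as "f = cpu_to_le32(le32_to_cpu(f) + delta)" in one call. */
static void le32_add(le32_field *f, uint32_t delta)
{
    assert(f != NULL);
    le32_store(f, le32_load(f) + delta);
}
```

A negative adjustment, like the -4 used in the patch, relies on unsigned wrap-around: in this sketch, adding (uint32_t)-4 subtracts 4.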

Signed-off-by: Wei Yongjun 
---
 fs/hpfs/dnode.c | 30 +++---
 fs/hpfs/anode.c | 6 +++---
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/hpfs/dnode.c b/fs/hpfs/dnode.c
index 3228c52..153f442 100644
--- a/fs/hpfs/dnode.c
+++ b/fs/hpfs/dnode.c
@@ -145,10 +145,10 @@ static void set_last_pointer(struct super_block *s, struct dnode *d, dnode_secno
}
}
if (ptr) {
-   d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) + 4);
+   le32_add_cpu(&d->first_free, 4);
if (le32_to_cpu(d->first_free) > 2048) {
hpfs_error(s, "set_last_pointer: too long dnode %08x", le32_to_cpu(d->self));
-   d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) - 4);
+   le32_add_cpu(&d->first_free, -4);
return;
}
de->length = cpu_to_le16(36);
@@ -184,7 +184,7 @@ struct hpfs_dirent *hpfs_add_de(struct super_block *s, struct dnode *d,
de->not_8x3 = hpfs_is_name_long(name, namelen);
de->namelen = namelen;
memcpy(de->name, name, namelen);
-   d->first_free = cpu_to_le32(le32_to_cpu(d->first_free) + d_size);
+   le32_add_cpu(&d->first_free, d_size);
return de;
 }
 
@@ -314,7 +314,7 @@ static int hpfs_add_to_dnode(struct inode *i, dnode_secno dno,
set_last_pointer(i->i_sb, ad, de->down ? de_down_pointer(de) : 0);
de = de_next_de(de);
memmove((char *)nd + 20, de, le32_to_cpu(nd->first_free) + (char *)nd - (char *)de);
-   nd->first_free = cpu_to_le32(le32_to_cpu(nd->first_free) - ((char *)de - (char *)nd - 20));
+   le32_add_cpu(&nd->first_free, -((char *)de - (char *)nd - 20));
memcpy(d, nd, le32_to_cpu(nd->first_free));
for_all_poss(i, hpfs_pos_del, (loff_t)dno << 4, pos);
fix_up_ptrs(i->i_sb, ad);
@@ -474,8 +474,8 @@ static secno move_to_top(struct inode *i, dnode_secno from, dnode_secno to)
hpfs_brelse4();
return 0;
}
-   dnode->first_free = cpu_to_le32(le32_to_cpu(dnode->first_free) - 4);
-   de->length = cpu_to_le16(le16_to_cpu(de->length) - 4);
+   le32_add_cpu(&dnode->first_free, -4);
+   le16_add_cpu(&de->length, -4);
de->down = 0;
hpfs_mark_4buffers_dirty();
dno = up;
@@ -570,8 +570,8 @@ static void delete_empty_dnode(struct inode *i, dnode_secno dno)
for_all_poss(i, hpfs_pos_subst, ((loff_t)dno << 4) | 1, ((loff_t)up << 4) | p);
if (!down) {
de->down = 0;
-   de->length = cpu_to_le16(le16_to_cpu(de->length) - 4);
-   dnode->first_free = cpu_to_le32(le32_to_cpu(dnode->first_free) - 4);
+   le16_add_cpu(&de->length, -4);
+   le32_add_cpu(&dnode->first_free, -4);
memmove(de_next_de(de), (char *)de_next_de(de) + 4,
(char *)dnode + le32_to_cpu(dnode->first_free) - (char *)de_next_de(de));
} else {
@@ -647,14 +647,14 @@ static void delete_empty_dnode(struct inode *i, dnode_secno dno)
printk("HPFS: warning: unbalanced dnode tree, see hpfs.txt 4 more info\n");
printk("HPFS: warning: goin'on\n");
}
-   del->length = cpu_to_le16(le16_to_cpu(del->length) + 4);
+   le16_add_cpu(&del->length, 4);
del->down = 1;
-   d1->first_free = cpu_to_le32(le32_to_cpu(d1->first_free) + 4);
+   le32_add_cpu(&d1->first_free, 4);
}
if (dlp && !down) {
-   del->length = cpu_to_le16(le16_to_cpu(del->length) - 4);
+   le16_add_cpu(&del->length, -4);
del->down = 0;
-   d1->first_free = cpu_to_le32(le32_to_cpu(d1->first_free) - 4);
+   le32_add_cpu(&d1->first_free, -4);
} else if (down)
*(__le32 *) ((void *) del + le16_to_cpu(del->length) - 4) = cpu_to_le32(down);
} else goto endm;
@@ -668,9 +668,9 @@ static void delete_empty_dnode(struct inode *i, dnode_secno dno)
memcpy(de_cp, de_prev, le16_to_cpu(de_prev->length));
hpfs_delete_de(i->i_sb, dnode, de_prev);
if (!de_prev->down) {
-   de_prev->length = cpu_to_le16(le16_to_cpu(de_prev->length) + 4);
+ 

Re: [PATCH 05/16] perf tools: Keep group information

2012-09-27 Thread Namhyung Kim
Hi Jiri,

On Thu, 27 Sep 2012 19:03:52 +0200, Jiri Olsa wrote:
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index bf5d033ee1b4..3c52d0ab9270 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -830,6 +830,7 @@ int parse_events(struct perf_evlist *evlist, const char *str,
>>  if (!ret) {
>>  int entries = data.idx - evlist->nr_entries;
>>  perf_evlist__splice_list_tail(evlist, &data.list, entries);
>> +evlist->nr_groups += data.nr_groups;
>>  return 0;
>>  }
>>  
>> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
>> index c356e443448d..f6b0254afe17 100644
>> --- a/tools/perf/util/parse-events.h
>> +++ b/tools/perf/util/parse-events.h
>> @@ -65,6 +65,7 @@ struct parse_events__term {
>>  struct parse_events_data__events {
>>  struct list_head list;
>>  int idx;
>> +int nr_groups;
>>  };
>>  
>>  struct parse_events_data__terms {
>> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
>> index cd88209e3c58..d14bb507594b 100644
>> --- a/tools/perf/util/parse-events.y
>> +++ b/tools/perf/util/parse-events.y
>> @@ -122,7 +122,9 @@ group_def:
>>  PE_NAME '{' events '}'
>>  {
>>  struct list_head *list = $3;
>> +struct parse_events_data__events *data = _data;
>>  
>> +data->nr_groups++;
>
> perhaps if you inc nr_groups only if there's more than 1 event,
> you would not need your next patch:
>   perf evlist: Add perf_evlist__recalc_nr_groups
>
> something like:
>
> if (!list_is_last(list))
>   data->nr_groups++;
>

Right!  Will use it in next version.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] omfs: convert to use beXX_add_cpu()

2012-09-27 Thread Wei Yongjun
From: Wei Yongjun 

Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
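
The conversion can be pictured in user space with the sketch below. It is only an illustration: `my_be32`, `my_be32_to_cpu()`, `my_cpu_to_be32()` and `my_be32_add_cpu()` are hypothetical stand-ins for the kernel types and helpers, written with explicit byte handling so they behave the same on any host.

```c
#include <stdint.h>

/* Hypothetical stand-in for a __be32 field: 4 bytes stored big-endian. */
typedef struct { uint8_t b[4]; } my_be32;

static uint32_t my_be32_to_cpu(my_be32 v)
{
	return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
	       ((uint32_t)v.b[2] << 8)  |  (uint32_t)v.b[3];
}

static my_be32 my_cpu_to_be32(uint32_t x)
{
	my_be32 v = { { (uint8_t)(x >> 24), (uint8_t)(x >> 16),
			(uint8_t)(x >> 8),  (uint8_t)x } };
	return v;
}

/* Open-coded form, as on the lines the patch removes. */
static void add_open_coded(my_be32 *var, uint32_t val)
{
	*var = my_cpu_to_be32(my_be32_to_cpu(*var) + val);
}

/* Helper form: the same read-modify-write, written once behind a name. */
static void my_be32_add_cpu(my_be32 *var, uint32_t val)
{
	*var = my_cpu_to_be32(my_be32_to_cpu(*var) + val);
}
```

Both forms leave the field in big-endian byte order; the helper only removes the repetition of the field name.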

Signed-off-by: Wei Yongjun 
---
 fs/omfs/file.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/omfs/file.c b/fs/omfs/file.c
index 2c6d952..77e3cb2 100644
--- a/fs/omfs/file.c
+++ b/fs/omfs/file.c
@@ -146,8 +146,7 @@ static int omfs_grow_extent(struct inode *inode, struct 
omfs_extent *oe,
be64_to_cpu(entry->e_blocks);
 
if (omfs_allocate_block(inode->i_sb, new_block)) {
-   entry->e_blocks =
-   cpu_to_be64(be64_to_cpu(entry->e_blocks) + 1);
+   be64_add_cpu(&entry->e_blocks, 1);
terminator->e_blocks = ~(cpu_to_be64(
be64_to_cpu(~terminator->e_blocks) + 1));
goto out;
@@ -177,7 +176,7 @@ static int omfs_grow_extent(struct inode *inode, struct 
omfs_extent *oe,
be64_to_cpu(~terminator->e_blocks) + (u64) new_count));
 
/* write in new entry */
-   oe->e_extent_count = cpu_to_be32(1 + be32_to_cpu(oe->e_extent_count));
+   be32_add_cpu(&oe->e_extent_count, 1);
 
 out:
*ret_block = new_block;




Re: [RFC v9 PATCH 04/21] memory-hotplug: offline and remove memory when removing the memory device

2012-09-27 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu 

We should offline and remove memory when removing the memory device.
The memory device can be removed in 2 ways:
1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called. In the 2nd
case, acpi_memory_device_remove() will be called. acpi_memory_device_remove()
will also be called when we unbind the memory device from the driver
acpi_memhotplug. If the type is ACPI_BUS_REMOVAL_EJECT, it means
that the user wants to eject the memory device, and we should offline
and remove memory in acpi_memory_device_remove().

The function remove_memory() is not implemented yet. For now it only checks
whether all memory has been offlined.

CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
  drivers/acpi/acpi_memhotplug.c |   45 +--
  drivers/base/memory.c  |   39 ++
  include/linux/memory.h |5 
  include/linux/memory_hotplug.h |5 
  mm/memory_hotplug.c|   22 +++
  5 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 7873832..9d47458 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -310,25 +311,44 @@ static int acpi_memory_powerdown_device(struct 
acpi_memory_device *mem_device)
return 0;
  }
  
-static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)

+static int
+acpi_memory_device_remove_memory(struct acpi_memory_device *mem_device)
  {
int result;
struct acpi_memory_info *info, *n;
+   int node = mem_device->nid;
  
-

-   /*
-* Ask the VM to offline this memory range.
-* Note: Assume that this function returns zero on success
-*/
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
result = offline_memory(info->start_addr, info->length);
if (result)
return result;
+
+   result = remove_memory(node, info->start_addr,
+  info->length);
+   if (result)
+   return result;
}
+
+   list_del(&info->list);
kfree(info);
}
  
+	return 0;

+}
+
+static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+{
+   int result;
+
+   /*
+* Ask the VM to offline this memory range.
+* Note: Assume that this function returns zero on success
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+
/* Power-off and eject the device */
result = acpi_memory_powerdown_device(mem_device);
if (result) {
@@ -477,12 +497,23 @@ static int acpi_memory_device_add(struct acpi_device 
*device)
  static int acpi_memory_device_remove(struct acpi_device *device, int type)
  {
struct acpi_memory_device *mem_device = NULL;
-
+   int result;
  
  	if (!device || !acpi_driver_data(device))

return -EINVAL;
  
  	mem_device = acpi_driver_data(device);

+
+   if (type == ACPI_BUS_REMOVAL_EJECT) {
+   /*
+* offline and remove memory only when the memory device is
+* ejected.
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+   }
+
kfree(mem_device);
  
  	return 0;

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..038be73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(struct 
notifier_block *nb)
  }
  EXPORT_SYMBOL(unregister_memory_isolate_notifier);
  
+bool is_memblk_offline(unsigned long start, unsigned long size)

+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(start + size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   

[PATCH 11/31] perf, x86: Support Haswell v4 LBR format

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Haswell has two additional LBR from flags for TSX: intx and abort, implemented
as a new v4 version of the LBR format.

Handle those and adjust the sign extension code so it still extends correctly.
The flags are exported in the LBR record, similar to the existing misprediction
flag.
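
As an illustration of the decoding described above, here is a user-space sketch. The flag positions are the ones defined in the diff; `struct lbr_from_decoded` and `decode_lbr_from_v4()` are hypothetical names, and the sign extension is written without signed shifts, which is an assumption-free equivalent of shifting left by 3 and arithmetically right by 3.

```c
#include <stdint.h>

#define LBR_FROM_FLAG_MISPRED	(1ULL << 63)
#define LBR_FROM_FLAG_INTX	(1ULL << 62)
#define LBR_FROM_FLAG_ABORT	(1ULL << 61)

/* Hypothetical container for the decoded fields (not a kernel type). */
struct lbr_from_decoded {
	uint64_t addr;
	int mispred, intx, abort;
};

static struct lbr_from_decoded decode_lbr_from_v4(uint64_t from)
{
	const uint64_t flags = LBR_FROM_FLAG_MISPRED | LBR_FROM_FLAG_INTX |
			       LBR_FROM_FLAG_ABORT;
	struct lbr_from_decoded d;

	d.mispred = !!(from & LBR_FROM_FLAG_MISPRED);
	d.intx    = !!(from & LBR_FROM_FLAG_INTX);
	d.abort   = !!(from & LBR_FROM_FLAG_ABORT);

	/*
	 * Sign-extend from bit 60 by copying it into bits 61..63, which
	 * replaces the three flag bits with address bits again.
	 */
	if (from & (1ULL << 60))
		d.addr = from | flags;
	else
		d.addr = from & ~flags;

	return d;
}
```

With the v3 format only bit 63 is a flag, so the same idea applies with a one-bit shift instead of three.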

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   18 +++---
 include/linux/perf_event.h |7 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
LBR_FORMAT_LIP  = 0x01,
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
+   LBR_FORMAT_EIP_FLAGS2   = 0x04,
 };
 
 /*
@@ -56,6 +57,8 @@ enum {
 LBR_FAR)
 
 #define LBR_FROM_FLAG_MISPRED  (1ULL << 63)
+#define LBR_FROM_FLAG_INTX (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT (1ULL << 61)
 
 #define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events 
*cpuc)
 
for (i = 0; i < x86_pmu.lbr_nr; i++) {
unsigned long lbr_idx = (tos - i) & mask;
-   u64 from, to, mis = 0, pred = 0;
+   u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;
 
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to   + lbr_idx, to);
 
-   if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+   lbr_format == LBR_FORMAT_EIP_FLAGS2) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
-   from = (u64)((((s64)from) << 1) >> 1);
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+   from = (u64)((((s64)from) << 1) >> 1);
+   else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+   intx = !!(from & LBR_FROM_FLAG_INTX);
+   abort = !!(from & LBR_FROM_FLAG_ABORT);
+   from = (u64)((((s64)from) << 3) >> 3);
+   }
}
 
cpuc->lbr_entries[i].from   = from;
cpuc->lbr_entries[i].to = to;
cpuc->lbr_entries[i].mispred= mis;
cpuc->lbr_entries[i].predicted  = pred;
+   cpuc->lbr_entries[i].intx   = intx;
+   cpuc->lbr_entries[i].abort  = abort;
cpuc->lbr_entries[i].reserved   = 0;
}
cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4c2adfa..fadd14b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -634,13 +634,18 @@ struct perf_raw_record {
  *
  * support for mispred, predicted is optional. In case it
  * is not supported mispred = predicted = 0.
+ *
+ * intx: running in a hardware transaction
+ * abort: aborting a hardware transaction
  */
 struct perf_branch_entry {
__u64   from;
__u64   to;
__u64   mispred:1,  /* target mispredicted */
predicted:1,/* target predicted */
-   reserved:62;
+   intx:1, /* in transaction */
+   abort:1,/* transaction abort */
+   reserved:60;
 };
 
 /*
-- 
1.7.7.6



[PATCH 03/31] perf, x86: Basic Haswell PEBS support

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add basic PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new events.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |2 ++
 arch/x86/kernel/cpu/perf_event_intel.c|2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   29 +
 3 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a135a5a..8200c69 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -593,6 +593,8 @@ extern struct event_constraint 
intel_westmere_pebs_event_constraints[];
 
 extern struct event_constraint intel_snb_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_pebs_event_constraints[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 82bae24..695abd1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2094,7 +2094,7 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_nhm();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
-
+   x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snb_extra_regs;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index c8ab670..994156f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -413,6 +413,35 @@ struct event_constraint intel_snb_pebs_event_constraints[] 
= {
EVENT_CONSTRAINT_END
 };
 
+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+   INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+   INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+   INTEL_EVENT_CONSTRAINT(0xc4, 0xf),/* BR_INST_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0x01c5, 0xf),  /* BR_MISP_RETIRED.CONDITIONAL */
+   INTEL_EVENT_CONSTRAINT(0x04c5, 0xf),  /* BR_MISP_RETIRED.ALL_BRANCHES */
+   INTEL_EVENT_CONSTRAINT(0x20c5, 0xf),  /* BR_MISP_RETIRED.NEAR_TAKEN */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8),/* MEM_TRANS_RETIRED.* */
+   INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES 
*/
+   INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB 
*/
+   INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+   INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d3, 0xf), /* 
MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
+   INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.Abort */
+   INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
+
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
-- 
1.7.7.6



[PATCH 04/31] perf, core: Add generic intx/intx_checkpointed counter modifiers

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Expose INTX (count in transaction only, :t) and INTX_CHECKPOINTED
(on transaction abort restore counter, :c) attributes as generic perf event
attributes. These are important for measuring basic hardware transactional
behaviour.

They also need to be handled in a special way in the Haswell port, so it's
useful to have them as generic attributes.

Typically they would be used as a group with:

{cycles,cycles:t,cycles:ct}

Then:

Total cycles = cycles
Percent cycles in transaction = (cycles:t/cycles)*100
Percent cycles in transaction lost due to aborts = ((cycles:t-cycles:ct)/cycles)*100

This gives a quick overview of the transactional execution.
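
The arithmetic above can be sketched in C as follows. `struct tsx_cycles` and both function names are hypothetical, and the guards against a zero total and against `cycles:ct` exceeding `cycles:t` are assumptions added for robustness, not part of the formulas.

```c
#include <stdint.h>

/* Hypothetical holder for the three counts read from the group. */
struct tsx_cycles {
	uint64_t cycles;	/* cycles    */
	uint64_t cycles_t;	/* cycles:t  */
	uint64_t cycles_ct;	/* cycles:ct */
};

/* (cycles:t / cycles) * 100 */
static double pct_in_transaction(const struct tsx_cycles *c)
{
	if (c->cycles == 0)
		return 0.0;
	return 100.0 * (double)c->cycles_t / (double)c->cycles;
}

/* ((cycles:t - cycles:ct) / cycles) * 100 */
static double pct_lost_to_aborts(const struct tsx_cycles *c)
{
	/* Assumed guard: skewed reads could make cycles:ct exceed cycles:t. */
	if (c->cycles == 0 || c->cycles_ct > c->cycles_t)
		return 0.0;
	return 100.0 * (double)(c->cycles_t - c->cycles_ct) / (double)c->cycles;
}
```

For example, 1000 cycles with 250 in transactions and 200 checkpointed would give 25% in transaction and 5% lost to aborts.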

Used in followon patches.

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 33ed9d6..4c2adfa 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -251,11 +251,14 @@ struct perf_event_attr {
precise_ip :  2, /* skid constraint   */
mmap_data  :  1, /* non-exec mmap data*/
sample_id_all  :  1, /* sample_type all events 
*/
-
exclude_host   :  1, /* don't count in host   */
exclude_guest  :  1, /* don't count in guest  */
+   intx   :  1, /* count inside 
transaction */
+   intx_checkpointed :  1, /* checkpointed in 
transaction */
+
+
 
-   __reserved_1   : 43;
+   __reserved_1   : 41;
 
union {
__u32   wakeup_events;/* wakeup every n events */
-- 
1.7.7.6



[PATCH 06/31] perf, tools: Add intx/intx_checkpoint to perf script and header printing

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Straightforward use of the new flags.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/header.c |6 --
 tools/perf/util/python.c |8 +++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 74ea3c2..558b3b3 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1217,9 +1217,11 @@ static void print_event_desc(struct perf_header *ph, int 
fd, FILE *fp)
(u64)attr.config1,
(u64)attr.config2);
 
-   fprintf(fp, ", excl_usr = %d, excl_kern = %d",
+   fprintf(fp, ", excl_usr = %d, excl_kern = %d, intx = %d, 
intx_cp = %d",
attr.exclude_user,
-   attr.exclude_kernel);
+   attr.exclude_kernel,
+   attr.intx,
+   attr.intx_checkpointed);
 
fprintf(fp, ", excl_host = %d, excl_guest = %d",
attr.exclude_host,
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 0688bfb..70c234d 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -528,6 +528,8 @@ static int pyrf_evsel__init(struct pyrf_evsel *pevsel,
"bp_type",
"bp_addr",
"bp_len",
+   "intx",
+   "intx_checkpointed",
 NULL
};
u64 sample_period = 0;
@@ -548,6 +550,8 @@ static int pyrf_evsel__init(struct pyrf_evsel *pevsel,
watermark = 0,
precise_ip = 0,
mmap_data = 0,
+   intx = 0,
+   intx_cp = 0,
sample_id_all = 1;
int idx = 0;
 
@@ -562,7 +566,7 @@ static int pyrf_evsel__init(struct pyrf_evsel *pevsel,
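 &enable_on_exec, &task, &watermark,
 &precise_ip, &mmap_data, &sample_id_all,
 &attr.wakeup_events, &attr.bp_type,
-&attr.bp_addr, &attr.bp_len, &idx))
+&attr.bp_addr, &attr.bp_len, &intx, &intx_cp, &idx))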
return -1;
 
/* union... */
@@ -591,6 +595,8 @@ static int pyrf_evsel__init(struct pyrf_evsel *pevsel,
attr.precise_ip = precise_ip;
attr.mmap_data  = mmap_data;
attr.sample_id_all  = sample_id_all;
+   attr.intx   = intx;
+   attr.intx_checkpointed = intx_cp;
 
perf_evsel__init(&pevsel->evsel, &attr, idx);
return 0;
-- 
1.7.7.6



[PATCH 13/31] perf, x86: Support LBR filtering by INTX/NOTX/ABORT

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add LBR filtering for branch in transaction, branch not in transaction
or transaction abort. This is exposed as new sample types.
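
A reduced sketch of the filter decision may help when reading the diff. The bit values are taken from the patch (`X86_BR_NONE` is assumed to be 0); `lbr_entry_kept()` is a hypothetical helper that folds the two steps from intel_pmu_lbr_filter() into one function and ignores the privilege-level bits.

```c
#define X86_BR_NONE	0		/* assumed value */
#define X86_BR_JMP	(1 << 9)	/* jump */
#define X86_BR_ABORT	(1 << 12)	/* transaction abort */
#define X86_BR_INTX	(1 << 13)	/* in transaction */
#define X86_BR_NOTX	(1 << 14)	/* not in transaction */
#define X86_BR_ANYTX	(X86_BR_NOTX | X86_BR_INTX)

/*
 * Returns 1 if the entry survives the software filter, 0 if it is
 * discarded. 'type' is the decoded branch type and 'intx' is the
 * in-transaction flag recorded for the entry.
 */
static int lbr_entry_kept(int br_sel, int type, int intx)
{
	/* Only tag the entry when the user asked for a TX-based filter. */
	if (type != X86_BR_NONE && (br_sel & X86_BR_ANYTX))
		type |= intx ? X86_BR_INTX : X86_BR_NOTX;

	/* Every bit in the entry's type must have been requested. */
	return !(type == X86_BR_NONE || (br_sel & type) != type);
}
```

Because the INTX/NOTX bit is only added when one of them was requested, existing filters that do not mention transactions behave as before.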

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   31 +--
 include/linux/perf_event.h |5 +++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index ad5af13..63451b1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -85,9 +85,13 @@ enum {
X86_BR_JMP  = 1 << 9, /* jump */
X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+   X86_BR_ABORT= 1 << 12,/* transaction abort */
+   X86_BR_INTX = 1 << 13,/* in transaction */
+   X86_BR_NOTX = 1 << 14,/* not in transaction */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
+#define X86_BR_ANYTX (X86_BR_NOTX | X86_BR_INTX)
 
 #define X86_BR_ANY   \
(X86_BR_CALL|\
@@ -99,6 +103,7 @@ enum {
 X86_BR_JCC |\
 X86_BR_JMP  |\
 X86_BR_IRQ  |\
+X86_BR_ABORT|\
 X86_BR_IND_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
@@ -347,6 +352,16 @@ static void intel_pmu_setup_sw_lbr_filter(struct 
perf_event *event)
 
if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
mask |= X86_BR_IND_CALL;
+
+   if (br_type & PERF_SAMPLE_BRANCH_ABORT)
+   mask |= X86_BR_ABORT;
+
+   if (br_type & PERF_SAMPLE_BRANCH_INTX)
+   mask |= X86_BR_INTX;
+
+   if (br_type & PERF_SAMPLE_BRANCH_NOTX)
+   mask |= X86_BR_NOTX;
+
/*
 * stash actual user request into reg, it may
 * be used by fixup code for some CPU
@@ -393,7 +408,8 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
/*
 * no LBR on this PMU
 */
-   if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > 
LBR_FORMAT_MAX_KNOWN)
+   if (!x86_pmu.lbr_nr || 
+   x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
return -EOPNOTSUPP;
 
/*
@@ -421,7 +437,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
  * decoded (e.g., text page not present), then X86_BR_NONE is
  * returned.
  */
-static int branch_type(unsigned long from, unsigned long to)
+static int branch_type(unsigned long from, unsigned long to, int abort)
 {
struct insn insn;
void *addr;
@@ -441,6 +457,9 @@ static int branch_type(unsigned long from, unsigned long to)
if (from == 0 || to == 0)
return X86_BR_NONE;
 
+   if (abort)
+   return X86_BR_ABORT | to_plm;
+
if (from_plm == X86_BR_USER) {
/*
 * can happen if measuring at the user level only
@@ -577,7 +596,13 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
from = cpuc->lbr_entries[i].from;
to = cpuc->lbr_entries[i].to;
 
-   type = branch_type(from, to);
+   type = branch_type(from, to, cpuc->lbr_entries[i].abort);
+   if (type != X86_BR_NONE && (br_sel & X86_BR_ANYTX)) {
+   if (cpuc->lbr_entries[i].intx)
+   type |= X86_BR_INTX;
+   else
+   type |= X86_BR_NOTX;
+   }
 
/* if type does not correspond, then discard */
if (type == X86_BR_NONE || (br_sel & type) != type) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fadd14b..5bc0e8b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -153,8 +153,11 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, /* any call branch */
PERF_SAMPLE_BRANCH_ANY_RETURN   = 1U << 5, /* any return branch */
PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, /* indirect calls */
+   PERF_SAMPLE_BRANCH_ABORT= 1U << 7, /* transaction aborts */
+   PERF_SAMPLE_BRANCH_INTX = 1U << 8, /* in transaction (flag) */
+   PERF_SAMPLE_BRANCH_NOTX = 1U << 9, /* not in transaction (flag) 
*/
 
-   PERF_SAMPLE_BRANCH_MAX  = 1U << 7, /* non-ABI */
+   PERF_SAMPLE_BRANCH_MAX  = 1U << 10, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.7.6



[PATCH 26/31] perf, x86: Support for printing PMU state on spurious PMIs

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

I had some problems with spurious PMIs, so print the PMU state
on a spurious one. This will not interact well with other NMI users.
Disabled by default, has to be explicitly enabled through sysfs.

Optional, but useful for debugging.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index be0d3c8..baf78e0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -19,6 +20,9 @@
 
 #include "perf_event.h"
 
+static bool print_spurious_pmi __read_mostly;
+module_param(print_spurious_pmi, bool, 0644);
+
 /*
  * Intel PerfMon, used on Core and later.
  */
@@ -1237,6 +1241,10 @@ again:
goto again;
 
 done:
+   if (!handled && print_spurious_pmi) {
+   pr_debug("Spurious PMI\n");
+   perf_event_print_debug();
+   }
intel_pmu_enable_all(0);
return handled;
 }
-- 
1.7.7.6



[PATCH 09/31] perf, kvm: Support :t and :c perf modifiers in KVM arch perfmon emulation

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest
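
The mapping from the guest's event select value to the two modifiers can be sketched as below. The bit positions are the `HSW_INTX` and `HSW_INTX_CHECKPOINTED` definitions from the Haswell PMU patch in this series; `struct tsx_modifiers` and `tsx_modifiers_from_eventsel()` are hypothetical names used only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HSW_INTX		(1ULL << 32)
#define HSW_INTX_CHECKPOINTED	(1ULL << 33)

/* Hypothetical pair of flags mirroring attr.intx / attr.intx_checkpointed. */
struct tsx_modifiers {
	bool intx;
	bool intx_checkpointed;
};

static struct tsx_modifiers tsx_modifiers_from_eventsel(uint64_t eventsel)
{
	/*
	 * The bits live above bit 31, so normalise to 0/1 with !! before
	 * the value is narrowed to a smaller type.
	 */
	struct tsx_modifiers m = {
		.intx = !!(eventsel & HSW_INTX),
		.intx_checkpointed = !!(eventsel & HSW_INTX_CHECKPOINTED),
	};
	return m;
}
```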

Cc: a...@redhat.com
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |3 ++-
 arch/x86/kvm/pmu.c |   10 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 7dab353..cd48669 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1631,7 +1631,8 @@ static int hsw_hw_config(struct perf_event *event)
return 0;
 }
 
-static struct event_constraint counter2_constraint = EVENT_CONSTRAINT(0, 0x4, 
0);
+static struct event_constraint counter2_constraint = 
+   EVENT_CONSTRAINT(0, 0x4, 0);
 
 static struct event_constraint *
 hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9b7ec11..03ba87b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
unsigned config, bool exclude_user, bool exclude_kernel,
-   bool intr)
+   bool intr, bool intx, bool intx_cp)
 {
struct perf_event *event;
struct perf_event_attr attr = {
@@ -172,6 +172,8 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
.exclude_user = exclude_user,
.exclude_kernel = exclude_kernel,
.config = config,
+   .intx = intx,
+   .intx_checkpointed = intx_cp
};
 
attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
@@ -239,7 +241,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 
eventsel)
reprogram_counter(pmc, type, config,
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-   eventsel & ARCH_PERFMON_EVENTSEL_INT);
+   eventsel & ARCH_PERFMON_EVENTSEL_INT,
+   !!(eventsel & HSW_INTX),
+   !!(eventsel & HSW_INTX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +260,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 
en_pmi, int idx)
arch_events[fixed_pmc_events[idx]].event_type,
!(en & 0x2), /* exclude user */
!(en & 0x1), /* exclude kernel */
-   pmi);
+   pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
-- 
1.7.7.6



[PATCH 02/31] perf, x86: Basic Haswell PMU support

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add basic Haswell PMU support.

Similar to SandyBridge, but has a few new events. Further
differences are handled in followon patches.

There are some new counter flags that need to be prevented
from being set on fixed counters.
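
A minimal sketch of why widening the compare mask keeps TSX-flagged events off the fixed counters. It assumes a constraint is selected when `(config & cmask) == code`; `struct sketch_constraint`, `constraint_matches()` and `SKETCH_RAW_EVENT_MASK` (a simplified stand-in for `X86_RAW_EVENT_MASK`) are hypothetical.

```c
#include <stdint.h>

#define HSW_INTX		(1ULL << 32)
#define HSW_INTX_CHECKPOINTED	(1ULL << 33)

/* Simplified stand-in for X86_RAW_EVENT_MASK: event code + umask only. */
#define SKETCH_RAW_EVENT_MASK	0xffffULL

/* Hypothetical reduced constraint: just the code and the compare mask. */
struct sketch_constraint {
	uint64_t code;
	uint64_t cmask;
};

/* Assumed matching rule: masked config must equal the constraint code. */
static int constraint_matches(const struct sketch_constraint *c, uint64_t config)
{
	return (config & c->cmask) == c->code;
}
```

With the TSX bits included in the mask, a config such as `0x00c0 | HSW_INTX` no longer compares equal to `0x00c0`, so the fixed-counter constraint is skipped and the event has to be scheduled on a general-purpose counter.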

Contains fixes from Stephane Eranian

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/perf_event.h  |3 +++
 arch/x86/kernel/cpu/perf_event.h   |7 +++
 arch/x86/kernel/cpu/perf_event_intel.c |   29 +
 3 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_event.h
index cb4e43b..c1fe6e9 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
 #define ARCH_PERFMON_EVENTSEL_INV  (1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK  0xFF000000ULL
 
+#define HSW_INTX   (1ULL << 32)
+#define HSW_INTX_CHECKPOINTED  (1ULL << 33)
+
 #define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY  (1ULL << 41)
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 6605a81..a135a5a 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -226,6 +226,13 @@ struct cpu_hw_events {
EVENT_CONSTRAINT(c, (1ULL << (32+n)), X86_RAW_EVENT_MASK)
 
 /*
+ * Also filter out TSX bits.
+ */
+#define TSX_FIXED_EVENT_CONSTRAINT(c, n)   \
+   EVENT_CONSTRAINT(c, (1ULL << (32+n)),   \
+X86_RAW_EVENT_MASK|HSW_INTX|HSW_INTX_CHECKPOINTED)
+
+/*
  * Constraint on the Event code + UMask
  */
 #define INTEL_UEVENT_CONSTRAINT(c, n)  \
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 0d3d63a..82bae24 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -133,6 +133,17 @@ static struct extra_reg intel_snb_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_hsw_event_constraints[] =
+{
+   TSX_FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+   TSX_FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+   TSX_FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
+   INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+   EVENT_CONSTRAINT_END
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
return intel_perfmon_event_map[hw_event];
@@ -2074,6 +2085,24 @@ __init int intel_pmu_init(void)
pr_cont("SandyBridge events, ");
break;
 
+   case 60: /* Haswell Client */
+   case 70:
+   case 71:
+   memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+  sizeof(hw_cache_event_ids));
+
+   intel_pmu_lbr_init_nhm();
+
+   x86_pmu.event_constraints = intel_hsw_event_constraints;
+
+   x86_pmu.extra_regs = intel_snb_extra_regs;
+   /* all extra regs are per-cpu when HT is on */
+   x86_pmu.er_flags |= ERF_HAS_RSP_1;
+   x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+   pr_cont("Haswell events, ");
+   break;
+
default:
switch (x86_pmu.version) {
case 1:
-- 
1.7.7.6



[PATCH 23/31] perf, tools: Add support for generic transaction events to perf userspace

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add the generic transaction events with aliases to the parser, lexer
and the reverse map code.
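
The reverse mapping from a config value to an event name can be sketched in user space as follows. The enum numbering is an assumption (the real `PERF_COUNT_HW_TRANSACTION_*` and `PERF_COUNT_HW_ABORT_*` values are defined elsewhere in the series), the names are shortened hypothetical ones, and `snprintf()` stands in for the `scnprintf()` helper used in the diff.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed numbering; the real enum values are defined elsewhere. */
enum { TXN_START, TXN_COMMIT, TXN_ABORT,
       ELISION_START, ELISION_COMMIT, ELISION_ABORT, TXN_MAX };
enum { ABORT_ALL, ABORT_CONFLICT, ABORT_CAPACITY, ABORT_MAX };

static const char *txn_name[TXN_MAX] = {
	[TXN_START]      = "transaction-start",
	[TXN_COMMIT]     = "transaction-commit",
	[TXN_ABORT]      = "transaction-abort",
	[ELISION_START]  = "elision-start",
	[ELISION_COMMIT] = "elision-commit",
	[ELISION_ABORT]  = "elision-abort",
};

static const char *txn_reason[ABORT_MAX] = {
	[ABORT_ALL]      = "all",
	[ABORT_CONFLICT] = "conflict",
	[ABORT_CAPACITY] = "capacity",
};

/* Low byte of config selects the event, the next byte the abort reason. */
static int txn_event_name(uint64_t config, char *bf, size_t size)
{
	unsigned int name = config & 0xff, reason = (config >> 8) & 0xff;

	if (name >= TXN_MAX || reason >= ABORT_MAX)
		return snprintf(bf, size, "invalid-transaction");

	/* Only abort events carry a "-<reason>" suffix. */
	if (name == TXN_ABORT || name == ELISION_ABORT)
		return snprintf(bf, size, "%s-%s", txn_name[name],
				txn_reason[reason]);

	return snprintf(bf, size, "%s", txn_name[name]);
}
```

The generated strings line up with the symbols the parser accepts, for example "transaction-abort-capacity" for an abort event with the capacity reason.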

Signed-off-by: Andi Kleen 
---
 tools/perf/util/evsel.c|   40 
 tools/perf/util/parse-events.c |   24 
 tools/perf/util/parse-events.l |   19 ++-
 tools/perf/util/parse-events.y |4 ++--
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ff084b0..8790069 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -288,6 +288,42 @@ static int perf_evsel__hw_cache_name(struct perf_evsel 
*evsel, char *bf, size_t
return ret + perf_evsel__add_modifiers(evsel, bf + ret, size - ret);
 }
 
+static const char *transaction_name[] = {
+ [PERF_COUNT_HW_TRANSACTION_START]  = "transaction-start",
+ [PERF_COUNT_HW_TRANSACTION_COMMIT] = "transaction-commit",
+ [PERF_COUNT_HW_TRANSACTION_ABORT]  = "transaction-abort",
+ [PERF_COUNT_HW_ELISION_START]  = "elision-start",
+ [PERF_COUNT_HW_ELISION_COMMIT] = "elision-commit",
+ [PERF_COUNT_HW_ELISION_ABORT]  = "elision-abort",
+};
+
+static const char *transaction_reason[] = {
+ [PERF_COUNT_HW_ABORT_ALL]  = "all",
+ [PERF_COUNT_HW_ABORT_CONFLICT] = "conflict",
+ [PERF_COUNT_HW_ABORT_CAPACITY] = "capacity",
+};
+
+static int perf_evsel__transaction_name(struct perf_evsel *evsel, char *bf,
+   size_t size)
+{
+   u64 config = evsel->attr.config;
+   u8 name = config & 0xff, reason = (config >> 8) & 0xff;
+
+   if (name < PERF_COUNT_HW_TRANSACTION_MAX &&
+   reason < PERF_COUNT_HW_ABORT_MAX) {
+   const char *sep = "", *rtxt = "";
+   if (name == PERF_COUNT_HW_TRANSACTION_ABORT ||
+   name == PERF_COUNT_HW_ELISION_ABORT) {
+   sep = "-";
+   rtxt = transaction_reason[reason];
+   }
+   return scnprintf(bf, size, "%s%s%s", transaction_name[name],
+sep, rtxt);
+   }
+
+   return scnprintf(bf, size, "invalid-transaction");
+}
+
 static int perf_evsel__raw_name(struct perf_evsel *evsel, char *bf, size_t 
size)
 {
int ret = scnprintf(bf, size, "raw 0x%" PRIx64, evsel->attr.config);
@@ -326,6 +362,10 @@ const char *perf_evsel__name(struct perf_evsel *evsel)
perf_evsel__bp_name(evsel, bf, sizeof(bf));
break;
 
+   case PERF_TYPE_HW_TRANSACTION:
+   perf_evsel__transaction_name(evsel, bf, sizeof(bf));
+   break;
+
default:
scnprintf(bf, sizeof(bf), "%s", "unknown attr type");
break;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5668ca6..e24a490 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -110,6 +110,20 @@ static struct event_symbol 
event_symbols_sw[PERF_COUNT_SW_MAX] = {
},
 };
 
+static struct event_symbol event_symbols_txn[] = {
+   { .symbol = "transaction-start",  .alias = "tx-start"},
+   { .symbol = "transaction-commit", .alias = "tx-commit"   },
+   { .symbol = "transaction-abort-all",  .alias = "tx-abort"},
+   { .symbol = "transaction-abort-capacity", .alias = "tx-capacity" },
+   { .symbol = "transaction-abort-conflict", .alias = "tx-conflict" },
+   { .symbol = "elision-start",  .alias = "le-start"},
+   { .symbol = "elision-commit", .alias = "le-commit"   },
+   { .symbol = "elision-abort-all",  .alias = "le-abort"},
+   { .symbol = "elision-abort-capacity", .alias = "le-capacity" },
+   { .symbol = "elision-abort-conflict", .alias = "le-conflict" },
+
+};
+
 #define __PERF_EVENT_FIELD(config, name) \
((config & PERF_EVENT_##name##_MASK) >> PERF_EVENT_##name##_SHIFT)
 
@@ -232,6 +246,9 @@ const char *event_type(int type)
case PERF_TYPE_HW_CACHE:
return "hardware-cache";
 
+   case PERF_TYPE_HW_TRANSACTION:
+   return "hardware-transaction";
+
default:
break;
}
@@ -800,6 +817,7 @@ static const char * const event_type_descriptors[] = {
"Hardware cache event",
"Raw hardware event descriptor",
"Hardware breakpoint",
+   "Hardware transaction event",
 };
 
 /*
@@ -909,6 +927,9 @@ void print_events_type(u8 type)
 {
if (type == PERF_TYPE_SOFTWARE)
__print_events_type(type, event_symbols_sw, PERF_COUNT_SW_MAX);
+   else if (type == PERF_TYPE_HW_TRANSACTION)
+   __print_events_type(type, event_symbols_txn,
+   ARRAY_SIZE(event_symbols_txn));
else
__print_events_type(type, event_symbols_hw, PERF_COUNT_HW_MAX);
 }
@@ -984,6 +1005,9 @@ void 
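
The config layout decoded by perf_evsel__transaction_name() in the patch above (event in bits 0..7, abort reason in bits 8..15) can be illustrated with a small standalone sketch. The enum values and all txn_* names are assumptions made for illustration only: the real PERF_COUNT_HW_* definitions come from another patch in the series that is not shown here, and snprintf() stands in for the perf-internal scnprintf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the PERF_COUNT_HW_* enums (values assumed). */
enum txn_event {
	TXN_START, TXN_COMMIT, TXN_ABORT,
	ELISION_START, ELISION_COMMIT, ELISION_ABORT,
	TXN_EVENT_MAX
};
enum txn_reason { ABORT_ALL, ABORT_CONFLICT, ABORT_CAPACITY, ABORT_MAX };

static const char *const txn_event_name[TXN_EVENT_MAX] = {
	[TXN_START]      = "transaction-start",
	[TXN_COMMIT]     = "transaction-commit",
	[TXN_ABORT]      = "transaction-abort",
	[ELISION_START]  = "elision-start",
	[ELISION_COMMIT] = "elision-commit",
	[ELISION_ABORT]  = "elision-abort",
};

static const char *const txn_reason_name[ABORT_MAX] = {
	[ABORT_ALL]      = "all",
	[ABORT_CONFLICT] = "conflict",
	[ABORT_CAPACITY] = "capacity",
};

/* Pack an event and an abort reason into a config value. */
static uint64_t txn_config(enum txn_event ev, enum txn_reason reason)
{
	return (uint64_t)ev | ((uint64_t)reason << 8);
}

/* Format the name for a config value; returns what snprintf() reports. */
static int txn_config_name(uint64_t config, char *buf, size_t size)
{
	uint8_t ev = config & 0xff;
	uint8_t reason = (config >> 8) & 0xff;

	assert(buf != NULL && size > 0);
	if (ev >= TXN_EVENT_MAX || reason >= ABORT_MAX)
		return snprintf(buf, size, "invalid-transaction");
	/* Only abort events carry a reason suffix. */
	if (ev == TXN_ABORT || ev == ELISION_ABORT)
		return snprintf(buf, size, "%s-%s", txn_event_name[ev],
				txn_reason_name[reason]);
	return snprintf(buf, size, "%s", txn_event_name[ev]);
}

/* Reverse map: base event name to enum value, or -1 if unknown. */
static int txn_event_lookup(const char *name)
{
	for (int i = 0; i < TXN_EVENT_MAX; i++)
		if (strcmp(name, txn_event_name[i]) == 0)
			return i;
	return -1;
}
```

With this layout, txn_config(TXN_ABORT, ABORT_CONFLICT) formats as "transaction-abort-conflict", which matches the symbol spelling used in event_symbols_txn.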

[PATCH 14/31] perf, tools: Add abort,notx,intx branch filter options to perf report -j

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Make perf report -j aware of the new intx,notx,abort branch qualifiers.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-record.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4db6e1b..e851bf2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -666,6 +666,9 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+   BRANCH_OPT("abort", PERF_SAMPLE_BRANCH_ABORT),
+   BRANCH_OPT("intx", PERF_SAMPLE_BRANCH_INTX),
+   BRANCH_OPT("notx", PERF_SAMPLE_BRANCH_NOTX),
BRANCH_END
 };
 
-- 
1.7.7.6



[PATCH 27/31] perf, core: Add generic transaction flags

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add a generic qualifier for transaction events, as a new sample
type that returns a flag word. This is particularly useful
for qualifying aborts: to distinguish aborts which happen
due to asynchronous events (like conflicts caused by another
CPU) versus instructions that lead to an abort.

The tuning strategies are very different for those cases,
so it's important to distinguish them easily and early.

Since it's inconvenient and inflexible to filter for this
in the kernel, we report all the events out and allow
some post-processing in user space.

The flags are based on the Intel TSX events, but should be fairly
generic and mostly applicable to other architectures too. In addition
to various flag words there's also reserved space to report a
program-supplied abort code. For TSX this is used to distinguish specific
classes of aborts, like a lock busy abort when doing lock elision.

This adds the perf core glue needed for reporting the new flag word out.

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h |   25 -
 kernel/events/core.c   |6 ++
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1867bed..beafd7f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -156,8 +156,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_RAW = 1U << 10,
PERF_SAMPLE_BRANCH_STACK= 1U << 11,
PERF_SAMPLE_WEIGHT  = 1U << 12,
+   PERF_SAMPLE_TRANSACTION = 1U << 13,
 
-   PERF_SAMPLE_MAX = 1U << 13, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 14, /* non-ABI */
 };
 
 /*
@@ -192,6 +193,26 @@ enum perf_branch_sample_type {
 PERF_SAMPLE_BRANCH_HV)
 
 /*
+ * Values for the transaction event qualifier, mostly for abort events.
+ */
+enum {
+   PERF_SAMPLE_TXN_ELISION = (1 << 0), /* From elision */
+   PERF_SAMPLE_TXN_TRANSACTION = (1 << 1), /* From transaction */
+   PERF_SAMPLE_TXN_SYNC= (1 << 2), /* Instruction is related */
+   PERF_SAMPLE_TXN_ASYNC   = (1 << 3), /* Instruction not related */
+   PERF_SAMPLE_TXN_RETRY   = (1 << 4), /* Retry possible */
+   PERF_SAMPLE_TXN_CONFLICT= (1 << 5), /* Conflict abort */
+   PERF_SAMPLE_TXN_CAPACITY= (1 << 6), /* Capacity abort */
+
+   PERF_SAMPLE_TXN_MAX = (1 << 7),  /* non-ABI */
+
+   /* bits 24..31 are reserved for the abort code */
+
+   PERF_SAMPLE_TXN_ABORT_MASK  = 0xff000000,
+   PERF_SAMPLE_TXN_ABORT_SHIFT = 24,
+};
+
+/*
  * The format of the data returned by read() on a perf event fd,
  * as specified by attr.read_format:
  *
@@ -1173,6 +1194,7 @@ struct perf_sample_data {
struct perf_raw_record  *raw;
struct perf_branch_stack*br_stack;
u64 weight;
+   u64 transaction;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -1184,6 +1206,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->br_stack = NULL;
data->period= period;
data->weight = 0;
+   data->transaction = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 74e4ff4..6af2e76 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_WEIGHT)
size += sizeof(data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   size += sizeof(data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -3963,6 +3966,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_WEIGHT)
perf_output_put(handle, data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   perf_output_put(handle, data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6
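
For the post-processing in user space that the changelog mentions, a consumer of the flag word might decode it roughly as follows. This is only a sketch under stated assumptions: the flag bits are copied from the patch above, the mask is taken here as 0xff000000 to agree with the comment that bits 24..31 hold the abort code, and txn_abort_code() and txn_flags_describe() are hypothetical helper names, not part of perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag bits as listed in the patch above. */
enum {
	PERF_SAMPLE_TXN_ELISION     = 1 << 0,
	PERF_SAMPLE_TXN_TRANSACTION = 1 << 1,
	PERF_SAMPLE_TXN_SYNC        = 1 << 2,
	PERF_SAMPLE_TXN_ASYNC       = 1 << 3,
	PERF_SAMPLE_TXN_RETRY       = 1 << 4,
	PERF_SAMPLE_TXN_CONFLICT    = 1 << 5,
	PERF_SAMPLE_TXN_CAPACITY    = 1 << 6,
	PERF_SAMPLE_TXN_ABORT_SHIFT = 24,
};

/* Assumed mask for bits 24..31, following the comment in the patch. */
#define TXN_ABORT_MASK 0xff000000ULL

static const struct {
	uint64_t bit;
	const char *name;
} txn_flag_names[] = {
	{ PERF_SAMPLE_TXN_ELISION,     "elision" },
	{ PERF_SAMPLE_TXN_TRANSACTION, "transaction" },
	{ PERF_SAMPLE_TXN_SYNC,        "sync" },
	{ PERF_SAMPLE_TXN_ASYNC,       "async" },
	{ PERF_SAMPLE_TXN_RETRY,       "retry" },
	{ PERF_SAMPLE_TXN_CONFLICT,    "conflict" },
	{ PERF_SAMPLE_TXN_CAPACITY,    "capacity" },
};

/* Extract the program-supplied abort code from the flag word. */
static unsigned int txn_abort_code(uint64_t flags)
{
	return (unsigned int)((flags & TXN_ABORT_MASK) >>
			      PERF_SAMPLE_TXN_ABORT_SHIFT);
}

/*
 * Write a comma-separated list of the set flags into buf and return its
 * length. Names that no longer fit are dropped instead of being cut off.
 */
static size_t txn_flags_describe(uint64_t flags, char *buf, size_t size)
{
	size_t len = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (size_t i = 0;
	     i < sizeof(txn_flag_names) / sizeof(txn_flag_names[0]); i++) {
		const char *name = txn_flag_names[i].name;
		size_t need = strlen(name) + (len ? 1 : 0);

		if (!(flags & txn_flag_names[i].bit))
			continue;
		if (len + need >= size)
			break;	/* no room left for this name plus the NUL */
		if (len)
			buf[len++] = ',';
		memcpy(buf + len, name, strlen(name) + 1);
		len += strlen(name);
	}
	return len;
}
```

A sample with the TRANSACTION, ASYNC and CONFLICT bits set would be described as "transaction,async,conflict", which is the kind of abort the changelog attributes to another CPU rather than to the sampled instruction.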



[PATCH 21/31] perf, tools: Handle XBEGIN like a jump

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Treat XBEGIN as a jump in the annotate instruction table, so that the
browser still shows the abort label.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/annotate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 3a282c0..bf549cd 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -399,6 +399,8 @@ static struct ins instructions[] = {
{ .name = "testb", .ops  = &mov_ops, },
{ .name = "testl", .ops  = &mov_ops, },
{ .name = "xadd",  .ops  = &mov_ops, },
+   { .name = "xbeginl", .ops  = &jump_ops, },
+   { .name = "xbeginq", .ops  = &jump_ops, },
 };
 
 static int ins__cmp(const void *name, const void *insp)
-- 
1.7.7.6



[PATCH 07/31] perf, x86: Implement the :t and :c qualifiers for Haswell

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Implement the TSX transaction and checkpointed transaction qualifiers for
Haswell. This allows, for example, profiling the number of cycles spent in
transactions. The checkpointed qualifier requires forcing the event onto
counter 2; implement this with a custom constraint for Haswell.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   38 +++-
 1 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 695abd1..7dab353 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -826,7 +827,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct 
perf_event *event)
return true;
 
/* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1)
+   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
+   x86_pmu.intel_cap.pebs_format < 2)
return true;
 
return false;
@@ -1614,6 +1616,38 @@ static struct attribute *intel_arch_formats_attr[] = {
NULL,
 };
 
+static int hsw_hw_config(struct perf_event *event)
+{
+   int ret = intel_pmu_hw_config(event);
+
+   if (ret)
+   return ret;
+   if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
+   return 0;
+   if (event->attr.intx)
+   event->hw.config |= HSW_INTX;
+   if (event->attr.intx_checkpointed)
+   event->hw.config |= HSW_INTX_CHECKPOINTED;
+   return 0;
+}
+
+static struct event_constraint counter2_constraint = EVENT_CONSTRAINT(0, 0x4, 
0);
+
+static struct event_constraint *
+hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+   struct event_constraint *c = intel_get_event_constraints(cpuc, event);
+
+   /* Handle special quirk on intx_checkpointed only in counter 2 */
+   if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+   if (c->idxmsk64 & (1U << 2))
+   return &counter2_constraint;
+   return &emptyconstraint;
+   }
+
+   return c;
+}
+
 static __initconst const struct x86_pmu core_pmu = {
.name   = "core",
.handle_irq = x86_pmu_handle_irq,
@@ -2100,6 +2134,8 @@ __init int intel_pmu_init(void)
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+   x86_pmu.hw_config = hsw_hw_config;
+   x86_pmu.get_event_constraints = hsw_get_event_constraints;
pr_cont("Haswell events, ");
break;
 
-- 
1.7.7.6
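
The counter 2 quirk handled by hsw_get_event_constraints() above comes down to intersecting the event's allowed-counter mask with a single bit. A minimal sketch, assuming counters are represented as a plain 64-bit bitmask; counter_bit() and constrain_counters() are hypothetical names and not the kernel's constraint API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHECKPOINT_COUNTER 2	/* counter index required by the quirk */

/* Bitmask with only counter idx set. */
static uint64_t counter_bit(unsigned int idx)
{
	assert(idx < 64);
	return 1ULL << idx;
}

/*
 * Narrow the set of counters an event may use. A checkpointed event is
 * restricted to counter 2; a result of 0 means it cannot be scheduled.
 */
static uint64_t constrain_counters(uint64_t allowed, bool checkpointed)
{
	if (!checkpointed)
		return allowed;
	return allowed & counter_bit(CHECKPOINT_COUNTER);
}
```

An event allowed on counters 0..3 (mask 0xf) is narrowed to 0x4 when checkpointed, while one that was already limited to counters 0 and 1 ends up with an empty mask, corresponding to the empty constraint in the patch.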



[PATCH 05/31] perf, tools: Add :c,:t event modifiers in perf tools

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Haswell supports new per-event qualifiers for TSX: a transaction qualifier
and a checkpointed transaction qualifier. Together they can be used to
compute the events discarded due to aborts.

Implement them in the user tool as :t and :c.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-list.txt |6 ++
 tools/perf/util/evsel.c|   14 --
 tools/perf/util/parse-events.c |7 +++
 tools/perf/util/parse-events.l |2 +-
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt 
b/tools/perf/Documentation/perf-list.txt
index ddc2252..52ea166 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -34,6 +34,12 @@ Intel PEBS and can be specified multiple times:
 
 The PEBS implementation now supports up to 2.
 
+On Intel Haswell CPUs t and c can be specified to request that the event
+is only counted inside transactions ('t') or that the counter is rolled
+back to the beginning of the transaction on an abort ('c'). This can
+be used to account for transaction residency and cycles lost to transaction
+aborts.
+
 RAW HARDWARE EVENT DESCRIPTOR
 -----------------------------
 Even when an event is not available in a symbolic form within perf right now,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2eaae14..cda3805 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -93,11 +93,12 @@ static int perf_evsel__add_modifiers(struct perf_evsel 
*evsel, char *bf, size_t
struct perf_event_attr *attr = &evsel->attr;
bool exclude_guest_default = false;
 
-#define MOD_PRINT(context, mod)do {
\
-   if (!attr->exclude_##context) { \
+#define __MOD_PRINT(test, mod) do {\
+   if (test) { \
if (!colon) colon = ++r;\
r += scnprintf(bf + r, size - r, "%c", mod);\
} } while(0)
+#define MOD_PRINT(context, mod) __MOD_PRINT(!attr->exclude_##context, mod)
 
if (attr->exclude_kernel || attr->exclude_user || attr->exclude_hv) {
MOD_PRINT(kernel, 'k');
@@ -113,11 +114,20 @@ static int perf_evsel__add_modifiers(struct perf_evsel 
*evsel, char *bf, size_t
exclude_guest_default = true;
}
 
+   if (attr->intx || attr->intx_checkpointed) {
+   __MOD_PRINT(attr->intx_checkpointed, 'c');
+   __MOD_PRINT(attr->intx, 't');
+   /* Set the bizarro flag: */
+   exclude_guest_default = true;
+   }
+
if (attr->exclude_host || attr->exclude_guest == exclude_guest_default) 
{
MOD_PRINT(host, 'H');
MOD_PRINT(guest, 'G');
}
+
 #undef MOD_PRINT
+#undef __MOD_PRINT
if (colon)
bf[colon - 1] = ':';
return r;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 74a5af4..5668ca6 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -628,6 +628,7 @@ int parse_events_modifier(struct list_head *list, char *str)
struct perf_evsel *evsel;
int exclude = 0, exclude_GH = 0;
int eu = 0, ek = 0, eh = 0, eH = 0, eG = 0, precise = 0;
+   int intx = 0, intx_cp = 0;
 
if (str == NULL)
return 0;
@@ -655,6 +656,10 @@ int parse_events_modifier(struct list_head *list, char 
*str)
eH = 0;
} else if (*str == 'p') {
precise++;
+   } else if (*str == 't') {
+   intx = 1;
+   } else if (*str == 'c') {
+   intx_cp = 1;
} else
break;
 
@@ -681,6 +686,8 @@ int parse_events_modifier(struct list_head *list, char *str)
evsel->attr.precise_ip = precise;
evsel->attr.exclude_host   = eH;
evsel->attr.exclude_guest  = eG;
+   evsel->attr.intx_checkpointed = intx_cp;
+   evsel->attr.intx = intx;
}
 
return 0;
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 384ca74..96ab100 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -75,7 +75,7 @@ num_dec   [0-9]+
 num_hex0x[a-fA-F0-9]+
 num_raw_hex[a-fA-F0-9]+
 name   [a-zA-Z_*?][a-zA-Z0-9_*?]*
-modifier_event [ukhpGH]{1,8}
+modifier_event [ukhpGHct]+
 modifier_bp[rwx]{1,3}
 
 %%
-- 
1.7.7.6
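
The modifier handling added to parse_events_modifier() above can be pictured with a reduced standalone parser. This is a deliberately simplified sketch: struct event_mods and parse_event_mods() are hypothetical, the u/k/h/G/H letters are only recorded as "seen" without perf's exclude_* logic, and unknown letters are rejected here whereas the real parser simply stops at them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which modifier letters were present after the ':' (hypothetical). */
struct event_mods {
	bool user, kernel, hv, guest, host;
	int precise;			/* number of 'p' characters */
	bool intx;			/* 't': count only inside transactions */
	bool intx_checkpointed;		/* 'c': roll back the count on abort */
};

/* Parse a modifier string such as "tc" or "ppk"; returns 0 or -1. */
static int parse_event_mods(const char *str, struct event_mods *m)
{
	assert(str != NULL && m != NULL);
	memset(m, 0, sizeof(*m));

	for (; *str; str++) {
		switch (*str) {
		case 'u': m->user = true; break;
		case 'k': m->kernel = true; break;
		case 'h': m->hv = true; break;
		case 'G': m->guest = true; break;
		case 'H': m->host = true; break;
		case 'p': m->precise++; break;
		case 't': m->intx = true; break;
		case 'c': m->intx_checkpointed = true; break;
		default:
			return -1;	/* unknown modifier letter */
		}
	}
	return 0;
}
```

With this sketch, an event written as cycles:tc would end up with both intx and intx_checkpointed set, which is the combination the lexer change to [ukhpGHct]+ makes expressible.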



[PATCH 08/31] perf, x86: Report PEBS event in a raw format

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add support for reporting PEBS records in a raw format that can
then be parsed by perf script.

We exposed most of the Haswell PEBS fields in a generic way
in this patchkit:
- Aborted cycles is in weight
- Memory latency is in weight
- DataLA is in address
- EventingRIP is used for precise ip
- tsx_tuning and some bits of the abort code in RAX are
mapped to transaction flags

Left over are the general registers. We need them for some analysis
too: for example for loop trip count and string instruction trip
count sampling.

There isn't really any good way to generalize general registers;
they are obviously different for every architecture.

So this patch exports the raw PEBS record when requested.

With the perf script infrastructure that was recently added,
the record is reasonably easy and clean to process from a script.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 994156f..5b60dcf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -586,6 +586,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
struct pebs_record_core *pebs = __pebs;
struct perf_sample_data data;
struct pt_regs regs;
+   struct perf_raw_record raw;
 
if (!intel_pmu_save_and_restart(event))
return;
@@ -616,6 +617,12 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
else
regs.flags &= ~PERF_EFLAGS_EXACT;
 
+   if (event->attr.sample_type & PERF_SAMPLE_RAW) {
+   raw.size = x86_pmu.pebs_record_size;
+   raw.data = __pebs;
+   data.raw = &raw;
+   }
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



[PATCH 31/31] perf, tools: Add browser support for transaction flags

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add histogram support for the transaction flags. Each flags instance becomes
a separate histogram. Support sorting and displaying the flags in report
and top.

The patch is fairly large, but it's really mostly just plumbing to pass the
flags around.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-annotate.c |2 +-
 tools/perf/builtin-diff.c |8 --
 tools/perf/builtin-report.c   |4 +-
 tools/perf/builtin-top.c  |4 +-
 tools/perf/util/hist.c|3 +-
 tools/perf/util/hist.h|3 +-
 tools/perf/util/sort.c|   50 +
 tools/perf/util/sort.h|2 +
 8 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index c522367..a6f0ffc 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
 
-   he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1);
+   he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
if (he == NULL)
return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 04c1d21..cef8f5c 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -31,9 +31,10 @@ struct perf_diff {
 
 static int hists__add_entry(struct hists *self,
struct addr_location *al, u64 period,
-   u64 weight)
+   u64 weight, u64 transaction)
 {
-   if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
+   if (__hists__add_entry(self, al, NULL, period, weight, transaction)
+   != NULL)
return 0;
return -ENOMEM;
 }
@@ -57,7 +58,8 @@ static int diff__process_sample_event(struct perf_tool *tool,
if (al.filtered || al.sym == NULL)
return 0;
 
-   if (hists__add_entry(>hists, , sample->period, 
sample->weight)) {
+   if (hists__add_entry(>hists, , sample->period, 
sample->weight,
+sample->transaction)) {
pr_warning("problem incrementing symbol period, skipping 
event\n");
return -1;
}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2c73578..b22a905 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -147,7 +147,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel 
*evsel,
}
 
he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
-   sample->weight);
+   sample->weight, sample->transaction);
if (he == NULL)
return -ENOMEM;
 
@@ -596,7 +596,7 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __used)
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-  " abort, intx,  weight, global_weight"),
+  " abort, intx,  weight, global_weight, transaction"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ee37ddc..5e44f7c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -270,7 +270,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct 
perf_evsel *evsel,
struct hist_entry *he;
 
he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
-   sample->weight);
+   sample->weight, sample->transaction);
if (he == NULL)
return NULL;
 
@@ -1230,7 +1230,7 @@ int cmd_top(int argc, const char **argv, const char 
*prefix __used)
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-  " abort, intx, weight, global_weight"),
+  " abort, intx, weight, global_weight, transaction"),
OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_DEFAULT('G', "call-graph", , "output_type,min_percent, 
call_order",
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 566badc..072b418 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -323,7 +323,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists 
*self,
 struct hist_entry *__hists__add_entry(struct hists *self,
  struct addr_location *al,
 

[PATCH 18/31] perf, core: Add a concept of a weighted sample

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

For some events it's useful to weight a sample with a hardware-provided
number. This expresses how expensive the action the sample represents
was, which allows the profiler to scale the samples to be more
informative to the programmer.

There is already the period, which is used similarly, but it means
something different, so I chose not to overload it. Instead
a new sample type for WEIGHT is added.

It can be used for multiple things. Initially it is used for TSX abort costs
and profiling by memory latencies (to make expensive loads appear higher
up in the histograms). The concept is quite generic and can be extended
to many other kinds of events or architectures, as long as the hardware
provides suitable auxiliary values. In principle it could also be
used for software tracepoints.

This adds the generic glue: a new optional sample format for a 64-bit
weight value.

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h |9 +++--
 kernel/events/core.c   |6 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5bc0e8b..c488ae2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -130,8 +130,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_STREAM_ID   = 1U << 9,
PERF_SAMPLE_RAW = 1U << 10,
PERF_SAMPLE_BRANCH_STACK= 1U << 11,
+   PERF_SAMPLE_WEIGHT  = 1U << 12,
 
-   PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 13, /* non-ABI */
 };
 
 /*
@@ -190,8 +191,9 @@ enum perf_event_read_format {
PERF_FORMAT_TOTAL_TIME_RUNNING  = 1U << 1,
PERF_FORMAT_ID  = 1U << 2,
PERF_FORMAT_GROUP   = 1U << 3,
+   PERF_FORMAT_WEIGHT  = 1U << 4,
 
-   PERF_FORMAT_MAX = 1U << 4,  /* non-ABI */
+   PERF_FORMAT_MAX = 1U << 5,  /* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER0 64  /* sizeof first published struct */
@@ -533,6 +535,7 @@ enum perf_event_type {
 *  { u64   stream_id;} && PERF_SAMPLE_STREAM_ID
 *  { u32   cpu, res; } && PERF_SAMPLE_CPU
 *  { u64   period;   } && PERF_SAMPLE_PERIOD
+*  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *
 *  { struct read_formatvalues;   } && PERF_SAMPLE_READ
 *
@@ -1144,6 +1147,7 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record  *raw;
struct perf_branch_stack*br_stack;
+   u64 weight;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -1154,6 +1158,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->raw  = NULL;
data->br_stack = NULL;
data->period= period;
+   data->weight = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7fee567..74e4ff4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -949,6 +949,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_PERIOD)
size += sizeof(data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   size += sizeof(data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -3957,6 +3960,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PERIOD)
perf_output_put(handle, data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   perf_output_put(handle, data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6
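
To see why a weight can reorder a profile, consider accumulating samples per symbol and sorting by the summed weight instead of by the sample count. A rough sketch only: struct hist_bucket and the helpers are hypothetical and far simpler than perf's struct hist_entry handling, and the weights in the usage note are made-up numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One histogram entry (hypothetical layout, not perf's hist_entry). */
struct hist_bucket {
	const char *symbol;
	uint64_t nr_samples;	/* how often the symbol was sampled */
	uint64_t total_weight;	/* sum of the per-sample weights */
};

/* Account one sample with its hardware-provided weight. */
static void bucket_add(struct hist_bucket *b, uint64_t weight)
{
	assert(b != NULL);
	b->nr_samples++;
	b->total_weight += weight;
}

/* qsort() comparator: largest total weight first. */
static int cmp_weight_desc(const void *pa, const void *pb)
{
	const struct hist_bucket *a = pa, *b = pb;

	if (a->total_weight == b->total_weight)
		return 0;
	return a->total_weight < b->total_weight ? 1 : -1;
}

static void sort_by_weight(struct hist_bucket *buckets, size_t n)
{
	qsort(buckets, n, sizeof(*buckets), cmp_weight_desc);
}
```

If a cheap load is sampled five times with weight 4 and an expensive load twice with weight 300, sorting by count puts the cheap load first, while sort_by_weight() puts the expensive load (total 600 versus 20) at the top, which is the effect the changelog describes for memory latencies.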



[PATCH 24/31] perf, x86: Add the Haswell implementation of the generic transaction events

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Straightforward table mapping from the generic transactional memory events
to the Haswell TSX events.

One special case is that the abort-all events force PEBS with precise level
two. Without EventingRIP, abort IPs are generally useless (you get
something after the abort), so we really want PEBS here for any
sampling. Since it was very unintuitive for users to do this manually,
I just made this the default.

To do this, the mapping table sets a magic flag that is later checked in
the event setup and forces the precise event. For counting events
PEBS is not forced.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   47 
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 2c4cbf3..be0d3c8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -820,6 +820,38 @@ static __initconst const u64 atom_hw_cache_event_ids
  },
 };
 
+#define FORCE_PEBS (1ULL << 63)
+
+static u64 __initdata hsw_transaction_event_ids
+   [PERF_COUNT_HW_TRANSACTION_MAX]
+   [PERF_COUNT_HW_ABORT_MAX] =
+{
+   /* RTM_RETIRED.START */
+   [ PERF_COUNT_HW_TRANSACTION_START ]  = { 0x1c9 },
+   /* RTM_RETIRED.COMMIT */
+   [ PERF_COUNT_HW_TRANSACTION_COMMIT ] = { 0x2c9 },
+   [ PERF_COUNT_HW_TRANSACTION_ABORT ]  = {
+   /* RTM_RETIRED.ABORTED with pebs */
+   [ PERF_COUNT_HW_ABORT_ALL ]  = 0x4c9|FORCE_PEBS,
+   /* TX_MEM.ABORT_CONFLICT */
+   [ PERF_COUNT_HW_ABORT_CONFLICT ] = 0x154,
+   /* TX_MEM.ABORT_CAPACITY */
+   [ PERF_COUNT_HW_ABORT_CAPACITY ] = 0x254,
+   },
+   /* HLE_RETIRED.START */
+   [ PERF_COUNT_HW_ELISION_START ]  = { 0x1c8 },
+   /* HLE_RETIRED.COMMIT */
+   [ PERF_COUNT_HW_ELISION_COMMIT ] = { 0x2c8 },
+   [ PERF_COUNT_HW_ELISION_ABORT ]  = {
+   /* HLE_RETIRED.ABORTED with pebs */
+   [ PERF_COUNT_HW_ABORT_ALL ]  = 0x4c8|FORCE_PEBS,
+   /* TX_MEM.ABORT_CONFLICT */
+   [ PERF_COUNT_HW_ABORT_CONFLICT ] = 0x154,
+   /* TX_MEM.ABORT_CAPACITY */
+   [ PERF_COUNT_HW_ABORT_CAPACITY ] = 0x254,
+   }
+};
+
 static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
 {
/* user explicitly requested branch sampling */
@@ -1655,6 +1687,18 @@ static int hsw_hw_config(struct perf_event *event)
return -EIO;
event->hw.config |= HSW_INTX_CHECKPOINTED;
}
+
+   /*
+* Sampling transaction abort events work very poorly without
+* PEBS. So force it.
+*/
+   if (event->attr.type == PERF_TYPE_HW_TRANSACTION &&
+   (event->hw.config & FORCE_PEBS)) {
+   event->hw.config &= ~FORCE_PEBS;
+   if (is_sampling_event(event))
+   event->attr.precise_ip = 2;
+   }
+
return 0;
 }
 
@@ -2165,6 +2209,9 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
+   memcpy(hw_transaction_event_ids, hsw_transaction_event_ids,
+  sizeof(hsw_transaction_event_ids));
+
pr_cont("Haswell events, ");
break;
 
-- 
1.7.7.6
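
The magic flag mechanism described in the changelog can be reduced to a few lines: a software-only bit rides along in the table entry, is stripped before the value is used as hardware config, and raises the precise level only for sampling events. The sketch below assumes a simplified struct event_setup and a setup_event() helper, both hypothetical; only FORCE_PEBS and the event codes are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FORCE_PEBS (1ULL << 63)	/* software-only marker, as in the patch */

/* Result of setting up one event (hypothetical, much reduced). */
struct event_setup {
	uint64_t hw_config;	/* value that would be programmed */
	int precise_ip;		/* requested precise level, 0..3 */
};

static struct event_setup setup_event(uint64_t table_entry, bool sampling,
				      int precise_ip)
{
	struct event_setup s;

	assert(precise_ip >= 0 && precise_ip <= 3);

	/* The marker must never reach the hardware config. */
	s.hw_config = table_entry & ~FORCE_PEBS;
	s.precise_ip = precise_ip;

	/* Force PEBS only when sampling; counting is left untouched. */
	if ((table_entry & FORCE_PEBS) && sampling)
		s.precise_ip = 2;
	return s;
}
```

For the abort-all entry 0x4c9|FORCE_PEBS this yields hw_config 0x4c9 with precise level 2 when sampling, and hw_config 0x4c9 with the caller's precise level when only counting.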



[PATCH 25/31] perf, tools: Add perf stat --transaction

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add support to perf stat to print the basic transactional execution statistics:
total cycles, cycles in transactions, and cycles in aborted transactions
(using the intx and intx_checkpointed qualifiers), plus transaction starts
and elision starts, to compute the average transaction length.

This gives a reasonable overview of the success of the transactions.

Enable with a new --transaction / -T option.

This requires measuring these events in a group, since they depend on each
other.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |3 +
 tools/perf/builtin-stat.c  |  104 +---
 2 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index 2fa173b..6e55bd9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -108,7 +108,10 @@ with it.  --append may be used here.  Examples:
  3>results  perf stat --log-fd 3  -- $cmd
  3>>results perf stat --log-fd 3 --append -- $cmd
 
+-T::
+--transaction::
 
+Print statistics of transactional execution.  Implies --group.
 
 EXAMPLES
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 861f0ae..2364605 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -64,6 +64,9 @@
 #define CNTR_NOT_SUPPORTED "<not supported>"
 #define CNTR_NOT_COUNTED   "<not counted>"
 
+#define is_intx(e) ((e)->attr.intx && !(e)->attr.intx_checkpointed)
+#define is_intx_cp(e)  ((e)->attr.intx && (e)->attr.intx_checkpointed)
+
 static struct perf_event_attr default_attrs[] = {
 
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK 
},
@@ -171,7 +174,21 @@ static struct perf_event_attr very_very_detailed_attrs[] = 
{
(PERF_COUNT_HW_CACHE_RESULT_MISS<< 16)  
},
 };
 
+/*
+ * Transactional memory stats (-T)
+ * Must run as a group.
+ */
+static struct perf_event_attr transaction_attrs[] = {
+  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK 
},
 
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS   
},
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES 
},
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, .intx = 1  
},
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES,
+.intx = 1, .intx_checkpointed = 1 },
+  { .type = PERF_TYPE_HW_TRANSACTION, .config = 
PERF_COUNT_HW_TRANSACTION_START},
+  { .type = PERF_TYPE_HW_TRANSACTION, .config = PERF_COUNT_HW_ELISION_START
},
+};
 
 static struct perf_evlist  *evsel_list;
 
@@ -187,6 +204,7 @@ static bool no_aggr 
= false;
 static pid_t   child_pid   = -1;
 static boolnull_run=  false;
 static int detailed_run=  0;
+static booltransaction_run =  false;
 static boolsync_run=  false;
 static boolbig_num =  true;
 static int big_num_opt =  -1;
@@ -275,7 +293,11 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intx_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intxcp_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static int create_perf_stat_counter(struct perf_evsel *evsel,
struct perf_evsel *first)
@@ -350,10 +372,18 @@ static void update_shadow_stats(struct perf_evsel 
*counter, u64 *count)
 {
if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK))
update_stats(&runtime_nsecs_stats[0], count[0]);
-   else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
-   update_stats(&runtime_cycles_stats[0], count[0]);
-   else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
-   update_stats(&runtime_stalled_cycles_front_stats[0], count[0]);
+   else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES)) {
+   if (is_intx(counter))
+   update_stats(&runtime_cycles_intx_stats[0], count[0]);
+   else if (is_intx_cp(counter))
+   update_stats(&runtime_cycles_intxcp_stats[0], count[0]);
+   else
+   update_stats(&runtime_cycles_stats[0], count[0]);
+   } else if (perf_evsel__match(counter, HW_TRANSACTION,
+

[PATCH 28/31] perf, x86: Add Haswell specific transaction flag reporting

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.
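
As a rough illustration of the flag derivation described above, here is a
stand-alone sketch. The SKETCH_* names and the bit chosen for the
"in transaction" flag are assumptions made for this example only; the real
PERF_SAMPLE_TXN_TRANSACTION value is defined elsewhere in the series. The
abort-code mask is copied from the hunk below.

```c
#include <stdint.h>

/* Hypothetical bit position; stands in for PERF_SAMPLE_TXN_TRANSACTION. */
#define SKETCH_TXN_TRANSACTION  (1ULL << 0)
/* Abort-code mask as it appears in the hunk of this patch. */
#define SKETCH_ABORT_CODE_MASK  0xff00ULL

/*
 * The upper 32 bits of tsx_tuning carry the transaction flags. The abort
 * code is merged in from RAX only for a transactional sample whose RAX
 * has bit 0 set.
 */
static uint64_t sketch_txn_flags(uint64_t tsx_tuning, uint64_t ax)
{
	uint64_t txn = tsx_tuning >> 32;

	if ((txn & SKETCH_TXN_TRANSACTION) && (ax & 1))
		txn |= ax & SKETCH_ABORT_CODE_MASK;
	return txn;
}
```

The low 32 bits of tsx_tuning are simply discarded in this sketch, which
matches the shift in the hunk.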

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 930bc65..6df29c7 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -670,6 +670,15 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
data.weight = ((struct pebs_record_v2 *)pebs)->nhm.lat;
}
 
+   if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
+   x86_pmu.intel_cap.pebs_format >= 2) {
+   data.transaction =
+((struct pebs_record_v2 *)pebs)->tsx_tuning >> 32;
+   if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
+   (pebs->ax & 1))
+   data.transaction |= pebs->ax & 0xff00;
+   }
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



[PATCH 17/31] perf, x86: Avoid checkpointed counters causing excessive TSX aborts

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

With checkpointed counters there can be a situation where the counter
overflows, aborts the transaction, is set back to a non-overflowing
checkpoint, and then causes an interrupt. The interrupt handler doesn't see
the overflow because it has been checkpointed. This is then a spurious PMI,
typically with an ugly NMI message. It can also lead to excessive aborts.

Avoid this problem by:
- Using the full counter width for counting counters (previous patch).
- Forbidding sampling for checkpointed counters. It's not very useful anyway;
checkpointing is mainly for counting.
- On a PMI, always setting checkpointed counters back to zero.
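
The last two points can be sketched in isolation as follows. This is only a
model of the logic, not the kernel code: the struct, the SKETCH_* bit values
and the function names are made up for the example, and "sampling event" is
modelled as a non-zero sample period.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical config bits standing in for HSW_INTX and HSW_INTX_CHECKPOINTED. */
#define SKETCH_INTX               (1ULL << 32)
#define SKETCH_INTX_CHECKPOINTED  (1ULL << 33)

/* Reduced stand-in for the attribute fields used by the patch. */
struct sketch_attr {
	bool intx;
	bool intx_checkpointed;
	uint64_t sample_period;	/* non-zero models a sampling event */
};

/* Refuse checkpointed sampling events, otherwise set the config bits. */
static int sketch_hw_config(const struct sketch_attr *attr, uint64_t *config)
{
	if (attr->intx)
		*config |= SKETCH_INTX;
	if (attr->intx_checkpointed) {
		if (attr->sample_period != 0)
			return -EIO;
		*config |= SKETCH_INTX_CHECKPOINTED;
	}
	return 0;
}

/*
 * On a PMI a checkpointed counter restarts from zero, so it cannot end up
 * just below the overflow point again; other counters keep their value.
 */
static uint64_t sketch_restart_value(const struct sketch_attr *attr,
				     uint64_t saved)
{
	return attr->intx_checkpointed ? 0 : saved;
}
```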

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   26 +-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index e302186..83ced1a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
x86_perf_event_update(event);
+   /*
+* For a checkpointed counter always reset back to 0.  This
+* avoids a situation where the counter overflows, aborts the
+* transaction and is then set back to shortly before the
+* overflow, and overflows and aborts again.
+*/
+   if (event->attr.intx_checkpointed) {
+   /* No race with NMIs because the counter should not be armed */
+   wrmsrl(event->hw.event_base, 0);
+   local64_set(&event->hw.prev_count, 0);
+   }
return x86_perf_event_set_period(event);
 }
 
@@ -1162,6 +1173,10 @@ again:
x86_pmu.drain_pebs(regs);
}
 
+   /* XXX move somewhere else. */
+   if (cpuc->events[2] && cpuc->events[2]->attr.intx_checkpointed)
+   status |= (1ULL << 2);
+
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
 
@@ -1626,8 +1641,17 @@ static int hsw_hw_config(struct perf_event *event)
return 0;
if (event->attr.intx)
event->hw.config |= HSW_INTX;
-   if (event->attr.intx_checkpointed)
+   if (event->attr.intx_checkpointed) {
+   /*
+* Sampling of checkpointed events can cause situations where
+* the CPU constantly aborts because of an overflow, which is
+* then checkpointed back and ignored. Forbid checkpointing
+* for sampling.
+*/
+   if (is_sampling_event(event))
+   return -EIO;
event->hw.config |= HSW_INTX_CHECKPOINTED;
+   }
return 0;
 }
 
-- 
1.7.7.6



[PATCH 19/31] perf, x86: Support weight samples for PEBS

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

When a weighted sample is requested, first try to report the TSX abort cost
on Haswell. If that is not available, report the memory latency. This
allows profiling both by abort cost and by memory latency.

Memory latency requires enabling a different PEBS mode (LL).
When both address and weight are requested, address wins.

The LL mode only works for memory related PEBS events, so add a
separate event constraint table for those.

I only did this for Haswell for now, but it could be added
for several other Intel CPUs too by just adding the right
table for them.
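
To make the selection rules above concrete, here is a small user-space
sketch. The event codes are the ones listed in the Haswell table in this
patch; the SKETCH_* names, the enum and the helper functions are invented
for the illustration and are not part of the kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Event-select field of the raw config (low 8 bits). */
#define SKETCH_EVENTSEL_EVENT 0xffULL

/* Event codes from the Haswell memory latency table in this patch. */
static const uint8_t sketch_memory_lat_events[] = {
	0xcd, 0xd0, 0xd1, 0xd2, 0xd3,
};

static bool sketch_is_memory_lat_event(uint64_t config)
{
	size_t n = sizeof(sketch_memory_lat_events) /
		   sizeof(sketch_memory_lat_events[0]);

	for (size_t i = 0; i < n; i++) {
		if ((config & SKETCH_EVENTSEL_EVENT) ==
		    sketch_memory_lat_events[i])
			return true;
	}
	return false;
}

enum sketch_pebs_mode { SKETCH_PEBS_NORMAL, SKETCH_PEBS_LL };

/* Address wins over weight; LL is only used for memory latency events. */
static enum sketch_pebs_mode
sketch_pick_mode(uint64_t config, bool want_addr, bool want_weight)
{
	if (want_addr)
		want_weight = false;
	if (want_weight && sketch_is_memory_lat_event(config))
		return SKETCH_PEBS_LL;
	return SKETCH_PEBS_NORMAL;
}
```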

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |4 ++
 arch/x86/kernel/cpu/perf_event_intel.c|4 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   47 +++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8550601..724a141 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -168,6 +168,7 @@ struct cpu_hw_events {
u64 perf_ctr_virt_mask;
 
void*kfree_on_online;
+   u8  *memory_latency_events;
 };
 
 #define __EVENT_CONSTRAINT(c, n, m, w, o) {\
@@ -392,6 +393,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void(*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
+   struct event_constraint *memory_lat_events;
 
/*
 * Intel LBR
@@ -596,6 +598,8 @@ extern struct event_constraint 
intel_snb_pebs_event_constraints[];
 
 extern struct event_constraint intel_hsw_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_memory_latency_events[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 83ced1a..2c4cbf3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1637,6 +1637,9 @@ static int hsw_hw_config(struct perf_event *event)
 
if (ret)
return ret;
+   /* PEBS cannot capture both */
+   if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+   event->attr.sample_type &= ~PERF_SAMPLE_WEIGHT;
if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
return 0;
if (event->attr.intx)
@@ -2161,6 +2164,7 @@ __init int intel_pmu_init(void)
 
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
+   x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
pr_cont("Haswell events, ");
break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 81fc14a..930bc65 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -442,6 +442,17 @@ struct event_constraint intel_hsw_pebs_event_constraints[] 
= {
EVENT_CONSTRAINT_END
 };
 
+/* Subset of PEBS events supporting memory latency. Not used for scheduling */
+
+struct event_constraint intel_hsw_memory_latency_events[] = {
+   INTEL_EVENT_CONSTRAINT(0xcd, 0), /* MEM_TRANS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd0, 0), /* MEM_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd1, 0), /* MEM_LOAD_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd2, 0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd3, 0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
@@ -459,6 +470,21 @@ struct event_constraint *intel_pebs_constraints(struct 
perf_event *event)
return 
 }
 
+static bool is_memory_lat_event(struct perf_event *event)
+{
+   struct event_constraint *c;
+
+   if (x86_pmu.intel_cap.pebs_format < 1)
+   return false;
+   if (!x86_pmu.memory_lat_events)
+   return false;
+   for_each_event_constraint(c, x86_pmu.memory_lat_events) {
+   if ((event->hw.config & c->cmask) == c->code)
+   return true;
+   }
+   return false;
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -466,7 +492,12 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
-   cpuc->pebs_enabled |= 1ULL << hwc->idx;
+   /* When weight is requested enable LL instead of normal PEBS */
+   if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) &&
+   is_memory_lat_event(event))
+   

[PATCH 22/31] perf, core: Define generic hardware transaction events

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

For tuning and debugging hardware transactional memory it is very
important to have hardware counter support.

This patch adds a simple and hopefully generic set of hardware events
for transactional memory and lock elision.

It is based on the TSX PMU support because I don't have any
information on other CPUs' HTM support.

There are start, commit and abort events for transactions and
for lock elision.

The abort events are qualified by a generic abort reason that should
be roughly applicable to a wide range of memory transaction systems:

capacity for the buffering capacity
conflict for a dynamic conflict between CPUs
all  for all aborts. On TSX this can be precisely sampled.

We need to split the events into general transaction events and lock
elision events. Architectures with HTM but no lock elision would only
use the first set.

Implementation for Haswell in a follow-on patch.
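
For illustration, the { op, reason } config layout (8 bits each) can be
encoded and validated as in the sketch below. The two SKETCH_*_MAX limits
are placeholder values chosen for the example; the real limits come from the
enums added by this patch.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder limits; the real values come from the perf_event.h enums. */
#define SKETCH_TRANSACTION_MAX 6
#define SKETCH_ABORT_MAX       3

/* Pack op into bits 0-7 and the abort reason into bits 8-15. */
static uint64_t sketch_encode(unsigned int op, unsigned int reason)
{
	return (uint64_t)op | ((uint64_t)reason << 8);
}

/*
 * Split a config value back into op and reason, rejecting out-of-range
 * fields and any bits above bit 15. The outputs are only written on success.
 */
static int sketch_decode(uint64_t config, unsigned int *op,
			 unsigned int *reason)
{
	unsigned int o = config & 0xff;
	unsigned int r = (config >> 8) & 0xff;

	if (o >= SKETCH_TRANSACTION_MAX)
		return -EINVAL;
	if (r >= SKETCH_ABORT_MAX)
		return -EINVAL;
	if (config >> 16)
		return -EINVAL;
	*op = o;
	*reason = r;
	return 0;
}
```

The decoded op and reason are the two indices into a table shaped like
hw_transaction_event_ids[op][reason].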

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c |   36 
 arch/x86/kernel/cpu/perf_event.h |4 
 include/linux/perf_event.h   |   25 +
 3 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 87c2ab0..cee8f80 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -53,6 +53,13 @@ u64 __read_mostly hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
 /*
+ * Generalized transactional memory event table.
+ */
+u64 __read_mostly hw_transaction_event_ids
+   [PERF_COUNT_HW_TRANSACTION_MAX]
+   [PERF_COUNT_HW_ABORT_MAX];
+
+/*
  * Propagate event elapsed time into the generic event.
  * Can only be executed on the CPU where the event is active.
  * Returns the delta events processed.
@@ -285,6 +292,31 @@ set_ext_hw_attr(struct hw_perf_event *hwc, struct 
perf_event *event)
return x86_pmu_extra_regs(val, event);
 }
 
+static int
+set_hw_transaction_attr(struct hw_perf_event *hwc, struct perf_event *event)
+{
+   struct perf_event_attr *attr = &event->attr;
+   u64 config, val;
+   unsigned int op, reason;
+
+   config = attr->config;
+   op = config & 0xff;
+   if (op >= PERF_COUNT_HW_TRANSACTION_MAX)
+   return -EINVAL;
+   reason = (config >> 8) & 0xff;
+   if (reason >= PERF_COUNT_HW_ABORT_MAX)
+   return -EINVAL;
+   if (config >> 16)
+   return -EINVAL;
+   val = hw_transaction_event_ids[op][reason];
+   if (val == 0)
+   return -ENOENT;
+   if (val == -1)
+   return -EINVAL;
+   hwc->config |= val;
+   return 0;
+}
+
 int x86_setup_perfctr(struct perf_event *event)
 {
struct perf_event_attr *attr = &event->attr;
@@ -312,6 +344,9 @@ int x86_setup_perfctr(struct perf_event *event)
if (attr->type == PERF_TYPE_HW_CACHE)
return set_ext_hw_attr(hwc, event);
 
+   if (attr->type == PERF_TYPE_HW_TRANSACTION)
+   return set_hw_transaction_attr(hwc, event);
+
if (attr->config >= x86_pmu.max_events)
return -EINVAL;
 
@@ -1547,6 +1582,7 @@ static int x86_pmu_event_init(struct perf_event *event)
case PERF_TYPE_RAW:
case PERF_TYPE_HARDWARE:
case PERF_TYPE_HW_CACHE:
+   case PERF_TYPE_HW_TRANSACTION:
break;
 
default:
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 724a141..6a8730e 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -452,6 +452,10 @@ extern u64 __read_mostly hw_cache_extra_regs
[PERF_COUNT_HW_CACHE_OP_MAX]
[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
+extern u64 __read_mostly hw_transaction_event_ids
+   [PERF_COUNT_HW_TRANSACTION_MAX]
+   [PERF_COUNT_HW_ABORT_MAX];
+
 u64 x86_perf_event_update(struct perf_event *event);
 
 static inline int x86_pmu_addr_offset(int index)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c488ae2..1867bed 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -32,6 +32,7 @@ enum perf_type_id {
PERF_TYPE_HW_CACHE  = 3,
PERF_TYPE_RAW   = 4,
PERF_TYPE_BREAKPOINT= 5,
+   PERF_TYPE_HW_TRANSACTION= 6,
 
PERF_TYPE_MAX,  /* non-ABI */
 };
@@ -94,6 +95,30 @@ enum perf_hw_cache_op_result_id {
 };
 
 /*
+ * Transactional memory related events:
+ * { op, reason } (8 bits each)
+ * Only aborts have a reason.
+ */
+enum perf_hw_transaction_op_id {
+   PERF_COUNT_HW_TRANSACTION_START = 0,
+   PERF_COUNT_HW_TRANSACTION_COMMIT= 1,
+   PERF_COUNT_HW_TRANSACTION_ABORT = 

Re: [PATCH 3/3] DMA: PL330: Balance module remove function with probe

2012-09-27 Thread Inderpal Singh
On 27 September 2012 21:36, Jassi Brar  wrote:
> On Thu, Sep 27, 2012 at 9:11 PM, Inderpal Singh
>  wrote:
>> On 27 September 2012 15:18, Vinod Koul  wrote:
>>> On Wed, 2012-09-26 at 12:11 +0530, Inderpal Singh wrote:
>>>> If we fail pl330_remove while some client is queued, the force unload
>>>> will fail and the
>>>> force unload will lose its purpose.
>>>> How about conditionally DMA_TERMINATE_ALL and free resources like
>>>> below ?
>>> Why would you want to remove the driver when it is doing something
>>> useful? You have to ensure driver is not doing anything.
>>>
>>> What is point here?
>>>
>> As mentioned by jassi,  if the pl330 module is forced unloaded while
>> some client is queued, we have to manually do DMA_TERMINATE_ALL.
>>
> I meant that in the current situation. Not ideally.
>
>> If failing remove is a better option in case some client is queued, we
>> can do away with DMA_TERMINATE_ALL and free_chan_resources and simply
>> return a suitable error code from remove.
>>
> That was exactly what I suggested as an alternative.

Yes, but our discussion then moved on to continuing with DMA_TERMINATE_ALL
and freeing.

Now, if we have to check whether any client is using a channel and then
decide, we will have to traverse the channel list twice: once to check
the usage, and a second time to delete the nodes from the list if we go
ahead with the remove.
The remove will look like below:

@@ -3008,18 +3008,19 @@ static int __devexit pl330_remove(struct
amba_device *adev)
if (!pdmac)
return 0;

+   /* check if any client is using any channel */
+   list_for_each_entry_safe(pch, _p, &pdmac->ddma.channels,
+   chan.device_node) {
+
+   if (pch->chan.client_count)
+   return -EBUSY;
+   }
+
amba_set_drvdata(adev, NULL);

-   /* Idle the DMAC */
list_for_each_entry_safe(pch, _p, &pdmac->ddma.channels,
chan.device_node) {

/* Remove the channel */
list_del(&pch->chan.device_node);
-
-   /* Flush the channel */
-   pl330_control(&pch->chan, DMA_TERMINATE_ALL, 0);
-   pl330_free_chan_resources(&pch->chan);
}

Please suggest if there is any better way to do it.

Thanks,
Inder


[PATCH 15/31] perf, tools: Support sorting by intx, abort branch flags

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Extend the perf branch sorting code to support sorting by intx
or abort qualifiers. Also print out those qualifiers.
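
As a stand-alone illustration of sorting and printing by such a flag, the
sketch below uses a reduced flags struct and a three-way comparator that can
be passed to qsort(). Both are assumptions made for the example: the patch
itself compares the flags with != inside perf's own sort_entry callbacks,
and the names here are not perf's.

```c
#include <stdlib.h>

/* Reduced stand-in for the branch flags; only the new bits matter here. */
struct sketch_branch_flags {
	unsigned int mispred:1;
	unsigned int predicted:1;
	unsigned int intx:1;
	unsigned int abort:1;
};

/* Three-way compare on the abort flag, usable with qsort(). */
static int sketch_abort_cmp(const void *l, const void *r)
{
	const struct sketch_branch_flags *a = l;
	const struct sketch_branch_flags *b = r;

	return (int)a->abort - (int)b->abort;
}

/* One-character column value: "A" for an abort, "." otherwise. */
static const char *sketch_abort_str(const struct sketch_branch_flags *f)
{
	return f->abort ? "A" : ".";
}
```

The column string is computed from the entry on every call, so a "." entry
printed after an "A" entry still shows ".".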

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-report.c |3 +-
 tools/perf/builtin-top.c|4 ++-
 tools/perf/perf.h   |4 ++-
 tools/perf/util/hist.h  |2 +
 tools/perf/util/sort.c  |   55 +++
 tools/perf/util/sort.h  |2 +
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7c88a24..8231cb1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -594,7 +594,8 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __used)
"Use the stdio interface"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-  " dso_from, symbol_to, symbol_from, mispredict"),
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 68cd61e..5ab2188 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1227,7 +1227,9 @@ int cmd_top(int argc, const char **argv, const char 
*prefix __used)
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show counter open errors, etc)"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-  "sort by key(s): pid, comm, dso, symbol, parent"),
+  "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_DEFAULT('G', "call-graph", , "output_type,min_percent, 
call_order",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index f960ccb..9147ffc 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -188,7 +188,9 @@ struct ip_callchain {
 struct branch_flags {
u64 mispred:1;
u64 predicted:1;
-   u64 reserved:62;
+   u64 intx:1;
+   u64 abort:1;
+   u64 reserved:60;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 0b096c2..71837aa 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -43,6 +43,8 @@ enum hist_column {
HISTC_PARENT,
HISTC_CPU,
HISTC_MISPREDICT,
+   HISTC_INTX,
+   HISTC_ABORT,
HISTC_SYMBOL_FROM,
HISTC_SYMBOL_TO,
HISTC_DSO_FROM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 0f5a0a4..596b82c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -467,6 +467,55 @@ struct sort_entry sort_mispredict = {
.se_width_idx   = HISTC_MISPREDICT,
 };
 
+static int64_t
+sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.abort !=
+   right->branch_info->flags.abort;
+}
+
+static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   const char *out = ".";
+
+   if (self->branch_info->flags.abort)
+   out = "A";
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_abort = {
+   .se_header  = "Transaction abort",
+   .se_cmp = sort__abort_cmp,
+   .se_snprintf= hist_entry__abort_snprintf,
+   .se_width_idx   = HISTC_ABORT,
+};
+
+static int64_t
+sort__intx_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.intx !=
+   right->branch_info->flags.intx;
+}
+
+static int hist_entry__intx_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   const char *out = ".";
+
+   if (self->branch_info->flags.intx)
+   out = "T";
+
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_intx = {
+   .se_header  = "Branch in transaction",
+   .se_cmp = sort__intx_cmp,
+   .se_snprintf= hist_entry__intx_snprintf,
+   .se_width_idx   = HISTC_INTX,
+};
+
 struct sort_dimension {
const char  *name;
struct sort_entry   *entry;
@@ -488,6 +537,8 @@ static struct sort_dimension sort_dimensions[] = {
DIM(SORT_CPU, "cpu", sort_cpu),
DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
DIM(SORT_SRCLINE, "srcline", 

[PATCH 10/31] perf, x86: Support PERF_SAMPLE_ADDR on Haswell

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Haswell supplies the address for every PEBS event, so always fill it in
when the user requested it. It will be 0 when not useful (no memory access).

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 5b60dcf..81fc14a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -623,6 +623,10 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
data.raw = 
}
 
+   if ((event->attr.sample_type & PERF_SAMPLE_ADDR) &&
+   x86_pmu.intel_cap.pebs_format >= 2)
+   data.addr = ((struct pebs_record_v2 *)pebs)->nhm.dla;
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



[PATCH 01/31] perf, x86: Add PEBSv2 record support

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add support for the v2 PEBS format. It has a superset of the v1 PEBS
fields, but has a longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" field, which directly gives
the address of the instruction that triggered the event instead of an
off-by-one address. So with precise == 2 we use it directly and don't
need to use LBRs and walk basic blocks. This lowers the overhead
significantly.
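
The longer record means the drain loop has to step through the buffer by a
run-time record size rather than by a fixed struct size. A minimal
user-space sketch of that idea follows; the two record structs are cut down
to a couple of fields and every sketch_* name is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down record layouts: v2 extends v1, as in the patch. */
struct sketch_rec_v1 {
	uint64_t ip;
	uint64_t status;
};

struct sketch_rec_v2 {
	struct sketch_rec_v1 v1;
	uint64_t eventingrip;
	uint64_t tsx_tuning;
};

/*
 * Walk [base, top) in steps of record_size bytes and collect one IP per
 * record: eventingrip for format >= 2, the plain ip field otherwise.
 * Returns the number of IPs stored (at most max).
 */
static size_t sketch_walk(const void *base, const void *top,
			  size_t record_size, int format,
			  uint64_t *ips, size_t max)
{
	const unsigned char *at = base;
	const unsigned char *end = top;
	size_t n = 0;

	for (; at < end && n < max; at += record_size) {
		struct sketch_rec_v2 rec;
		size_t len = record_size < sizeof(rec) ? record_size
						       : sizeof(rec);

		/* Copy out so shorter v1 records leave the v2 fields zero. */
		memset(&rec, 0, sizeof(rec));
		memcpy(&rec, at, len);
		ips[n++] = format >= 2 ? rec.eventingrip : rec.v1.ip;
	}
	return n;
}
```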

Some other features are added in later patches.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c  |2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  101 ++---
 2 files changed, 79 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 915b876..87c2ab0 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -395,7 +395,7 @@ int x86_pmu_hw_config(struct perf_event *event)
 * check that PEBS LBR correction does not conflict with
 * whatever the user is asking with attr->branch_sample_type
 */
-   if (event->attr.precise_ip > 1) {
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
u64 *br_type = &event->attr.branch_sample_type;
 
if (has_branch_stack(event)) {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index e38d97b..c8ab670 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -41,6 +41,12 @@ struct pebs_record_nhm {
u64 status, dla, dse, lat;
 };
 
+struct pebs_record_v2 {
+   struct pebs_record_nhm nhm;
+   u64 eventingrip;
+   u64 tsx_tuning;
+};
+
 void init_debug_store_on_cpu(int cpu)
 {
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -545,8 +551,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 {
/*
 * We cast to pebs_record_core since that is a subset of
-* both formats and we don't use the other fields in this
-* routine.
+* both formats.
 */
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct pebs_record_core *pebs = __pebs;
@@ -574,7 +579,10 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
regs.bp = pebs->bp;
regs.sp = pebs->sp;
 
-   if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
+   regs.ip = ((struct pebs_record_v2 *)pebs)->eventingrip;
+   regs.flags |= PERF_EFLAGS_EXACT;
+   } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
regs.flags |= PERF_EFLAGS_EXACT;
else
regs.flags &= ~PERF_EFLAGS_EXACT;
@@ -627,35 +635,21 @@ static void intel_pmu_drain_pebs_core(struct pt_regs 
*iregs)
__intel_pmu_pebs_event(event, iregs, at);
 }
 
-static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+static void intel_pmu_drain_pebs_common(struct pt_regs *iregs, void *at, 
+   void *top)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
-   struct pebs_record_nhm *at, *top;
struct perf_event *event = NULL;
u64 status = 0;
-   int bit, n;
-
-   if (!x86_pmu.pebs_active)
-   return;
-
-   at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
-   top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+   int bit;
 
ds->pebs_index = ds->pebs_buffer_base;
 
-   n = top - at;
-   if (n <= 0)
-   return;
+   for ( ; at < top; at += x86_pmu.pebs_record_size) {
+   struct pebs_record_nhm *p = at;
 
-   /*
-* Should not happen, we program the threshold at 1 and do not
-* set a reset value.
-*/
-   WARN_ONCE(n > x86_pmu.max_pebs_events, "Unexpected number of pebs records %d\n", n);
-
-   for ( ; at < top; at++) {
-   for_each_set_bit(bit, (unsigned long *)&at->status, x86_pmu.max_pebs_events) {
+   for_each_set_bit(bit, (unsigned long *)&p->status, x86_pmu.max_pebs_events) {
event = cpuc->events[bit];
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -678,6 +672,61 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
}
 }
 
+static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+{
+   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct debug_store *ds = cpuc->ds;
+   struct pebs_record_nhm *at, *top;
+   int n;
+
+   if (!x86_pmu.pebs_active)
+   return;
+
+   at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+   top = 

[PATCH 20/31] perf, tools: Add support for weight

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

perf record has a new option -W that enables weighted sampling.

Add sorting support in top/report for the average weight per sample and the
total weight sum. This allows comparing both the relative cost per event
and the total cost over the measurement period.

Add the necessary glue to perf report, record and the library.
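
The two sort keys boil down to a running sum and a per-sample average. A
tiny sketch of that bookkeeping, with a made-up struct and function names
rather than the actual hist_entry fields:

```c
#include <stdint.h>

/* Hypothetical per-entry accumulator; not the real hist_entry layout. */
struct sketch_hist {
	uint64_t nr_samples;
	uint64_t total_weight;
};

/* Account one sample and its weight. */
static void sketch_hist_add(struct sketch_hist *h, uint64_t weight)
{
	h->nr_samples++;
	h->total_weight += weight;
}

/* Average weight per sample; 0 for an entry without samples. */
static uint64_t sketch_hist_avg(const struct sketch_hist *h)
{
	return h->nr_samples ? h->total_weight / h->nr_samples : 0;
}
```

Sorting by the average highlights events that are expensive each time they
occur, while sorting by the sum highlights where the overall cost went.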

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |6 +++
 tools/perf/builtin-annotate.c|2 +-
 tools/perf/builtin-diff.c|7 ++--
 tools/perf/builtin-record.c  |2 +
 tools/perf/builtin-report.c  |7 ++--
 tools/perf/builtin-top.c |5 ++-
 tools/perf/perf.h|1 +
 tools/perf/util/event.h  |1 +
 tools/perf/util/evsel.c  |   10 ++
 tools/perf/util/hist.c   |   20 ---
 tools/perf/util/hist.h   |8 +++-
 tools/perf/util/session.c|3 ++
 tools/perf/util/sort.c   |   51 +-
 tools/perf/util/sort.h   |3 ++
 14 files changed, 108 insertions(+), 18 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index b38a1f9..4930654 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -182,6 +182,12 @@ is enabled for all the sampling events. The sampled branch 
type is the same for
 The various filters must be specified as a comma separated list: 
--branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
+--weight::
+Enable weightened sampling. When the event supports an additional weight per 
sample scale
+the histogram by this weight. This currently works for TSX abort events and 
some memory events
+in precise mode on modern Intel CPUs.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 67522cf..c522367 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
 
-   he = __hists__add_entry(&evsel->hists, al, NULL, 1);
+   he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1);
if (he == NULL)
return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index d29d350..04c1d21 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -30,9 +30,10 @@ struct perf_diff {
 };
 
 static int hists__add_entry(struct hists *self,
-   struct addr_location *al, u64 period)
+   struct addr_location *al, u64 period,
+   u64 weight)
 {
-   if (__hists__add_entry(self, al, NULL, period) != NULL)
+   if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
return 0;
return -ENOMEM;
 }
@@ -56,7 +57,7 @@ static int diff__process_sample_event(struct perf_tool *tool,
if (al.filtered || al.sym == NULL)
return 0;
 
-   if (hists__add_entry(&evsel->hists, &al, sample->period)) {
+   if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight)) {
pr_warning("problem incrementing symbol period, skipping event\n");
return -1;
}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e851bf2..4dbdc4e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -836,6 +836,8 @@ const struct option record_options[] = {
OPT_CALLBACK('j', "branch-filter", _stack,
 "branch filter mask", "branch stack filter modes",
 parse_branch_stack),
+   OPT_BOOLEAN('W', "weight", _weight,
+   "sample by weight (on special events only)"),
OPT_END()
 };
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8231cb1..2c73578 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -88,7 +88,7 @@ static int perf_report__add_branch_hist_entry(struct 
perf_tool *tool,
 * and not events sampled. Thus we use a pseudo period of 1.
 */
he = __hists__add_branch_entry(>hists, al, parent,
-   [i], 1);
+   [i], 1, 1);
if (he) {
struct annotation *notes;
err = -ENOMEM;
@@ -146,7 +146,8 @@ static int perf_evsel__add_hist_entry(struct perf_evsel 
*evsel,
return err;
}
 
-   he = __hists__add_entry(&evsel->hists, al, parent, sample->period);
+   he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
+   sample->weight);
if (he == NULL)
   

[PATCH 29/31] perf, tools: Add support for record transaction flags

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Add the glue in the user tools to record transaction flags with
--transaction (-T was already taken) and dump them.

Follow-on patches will use them.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |5 -
 tools/perf/builtin-record.c  |2 ++
 tools/perf/perf.h|1 +
 tools/perf/util/event.h  |1 +
 tools/perf/util/evsel.c  |9 +
 tools/perf/util/session.c|3 +++
 6 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index 4930654..2ede9e6 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -182,12 +182,15 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
--W::
 --weight::
 Enable weightened sampling. When the event supports an additional weight per sample scale
 the histogram by this weight. This currently works for TSX abort events and some memory events
 in precise mode on modern Intel CPUs.
 
+-T::
+--transaction::
+Record transaction flags for transaction related events.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4dbdc4e..5987f07 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -838,6 +838,8 @@ const struct option record_options[] = {
 parse_branch_stack),
OPT_BOOLEAN('W', "weight", _weight,
"sample by weight (on special events only)"),
+   OPT_BOOLEAN(0, "transaction", _transaction,
+   "sample transaction flags (special events only)"),
OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index a98dcf2..d1b1e82 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -233,6 +233,7 @@ struct perf_record_opts {
u64  branch_stack;
u64  default_interval;
u64  user_interval;
+   bool sample_transaction;
 };
 
 #endif
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 5ac79f3..b902d16 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -78,6 +78,7 @@ struct perf_sample {
u64 stream_id;
u64 period;
u64 weight;
+   u64 transaction;
u32 cpu;
u32 raw_size;
void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8790069..b0921a6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -420,6 +420,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
if (opts->sample_weight)
attr->sample_type   |= PERF_SAMPLE_WEIGHT;
 
+   if (opts->sample_transaction)
+   attr->sample_type   |= PERF_SAMPLE_TRANSACTION;
+
if (opts->call_graph)
attr->sample_type   |= PERF_SAMPLE_CALLCHAIN;
 
@@ -876,6 +879,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
array++;
}
 
+   data->transaction = 0;
+   if (type & PERF_SAMPLE_TRANSACTION) {
+   data->transaction = *array;
+   array++;
+   }
+
if (type & PERF_SAMPLE_READ) {
fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
return -1;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 84f000c..ae79f22 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -919,6 +919,9 @@ static void dump_sample(struct perf_session *session, union perf_event *event,
 
if (sample_type & PERF_SAMPLE_WEIGHT)
printf("... weight: %" PRIu64 "\n", sample->weight);
+
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   printf("... transaction: %" PRIx64 "\n", sample->transaction);
 }
 
 static struct machine *
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/31] perf, x86: Disable LBR recording for unknown LBR_FMT

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

When the LBR format is unknown disable LBR recording. This prevents
crashes when the LBR address is misdecoded and mis-sign extended.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 2af6695b..ad5af13 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -13,6 +13,7 @@ enum {
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
LBR_FORMAT_EIP_FLAGS2   = 0x04,
+   LBR_FORMAT_MAX_KNOWN= LBR_FORMAT_EIP_FLAGS2,
 };
 
 /*
@@ -392,7 +393,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
/*
 * no LBR on this PMU
 */
-   if (!x86_pmu.lbr_nr)
+   if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
return -EOPNOTSUPP;
 
/*
-- 
1.7.7.6



[PATCH 30/31] perf, tools: Point --sort documentation to --help

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

The --sort documentation for top and report was hopelessly out-of-date.
Instead of having two more places that would need to be updated,
just point to --help.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |2 +-
 tools/perf/Documentation/perf-top.txt|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 495210a..b4e747a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -57,7 +57,7 @@ OPTIONS
 
 -s::
 --sort=::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): See --help for a full list.
 
 -p::
 --parent=::
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 5b80d84..0f0fa3e 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -112,7 +112,7 @@ Default is to monitor all CPUS.
 
 -s::
 --sort::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): see --help for a full list.
 
 -n::
 --show-nr-samples::
-- 
1.7.7.6



[PATCH 16/31] perf, x86: Support full width counting on Haswell

2012-09-27 Thread Andi Kleen
From: Andi Kleen 

Haswell has a new alternative MSR range for perfctrs that allows writing the
full counter width. Enable this range if the hardware reports it using a new
capability bit. This lowers the overhead of perf stat slightly because it
needs fewer interrupts to accumulate the counter value. It also avoids some
problems with TSX aborting when the end of the counter range is reached.
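
To get a feel for the interrupt saving, here is a small standalone sketch
that compares how many counter overflows are needed to accumulate a given
number of events with a narrow and a wide maximum period. The widths used
in the usage note (31 bits for the legacy range, 48 bits for the full
counter) are assumptions for illustration and are not taken from this
patch; the sketch_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering a counter of the given bit width (0 < bits < 64). */
static uint64_t sketch_cntval_mask(unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return (1ULL << bits) - 1;
}

/*
 * Number of overflow interrupts taken while accumulating 'events' counts,
 * when the counter can be armed with at most 'max_period' per overflow.
 */
static uint64_t sketch_overflows(uint64_t events, uint64_t max_period)
{
	assert(max_period > 0);
	return events / max_period;
}
```

With these assumed widths, counting 2^40 events takes 512 overflows at a
31-bit maximum period and none at 48 bits, which is the effect the commit
message describes for perf stat.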

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/msr-index.h   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |1 +
 arch/x86/kernel/cpu/perf_event_intel.c |6 ++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 957ec87..cbf344f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -121,6 +121,9 @@
 #define MSR_P6_EVNTSEL0 0x00000186
 #define MSR_P6_EVNTSEL1 0x00000187
 
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0  0x04c1
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
complete list. */
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8200c69..8550601 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -282,6 +282,7 @@ union perf_capabilities {
u64 pebs_arch_reg:1;
u64 pebs_format:4;
u64 smm_freeze:1;
+   u64 fw_write:1;
};
u64 capabilities;
 };
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index cd48669..e302186 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2188,5 +2188,11 @@ __init int intel_pmu_init(void)
}
}
 
+   /* Support full width counters using alternative MSR range */
+   if (x86_pmu.intel_cap.fw_write) {
+   x86_pmu.max_period = x86_pmu.cntval_mask;
+   x86_pmu.perfctr = MSR_IA32_PMC0;
+   }
+
return 0;
 }
-- 
1.7.7.6



perf PMU support for Haswell

2012-09-27 Thread Andi Kleen
This adds perf PMU support for the upcoming Haswell core. The patchkit
is fairly large, mainly due to various enhancements for TSX. TSX tuning
relies heavily on the PMU, so I tried hard to make all facilities
easily available. In addition it also has some other enhancements.

This includes changes to the core perf code, to the x86 specific part,
to the perf user land tools and to KVM.

High level overview:

- Basic Haswell PMU support
- Easy high level TSX measurement in perf stat -T
- Generic transactional memory events, plus Haswell implementations.
- Generic weighted profiling for memory latency and transaction abort costs.
- Support for address profiling
- Support for filtering events inside/outside transactions
- KVM support to do this from guests
- Support for filtering/sorting/bucketing transaction abort types based on
  PEBS information
- LBR support for transactions

For more details on the Haswell PMU please see the SDM. For more details on TSX
please see http://halobates.de/adding-lock-elision-to-linux.pdf

Review appreciated.

-Andi



Re: [PATCH v3 1/2] iommu/shmobile: Add iommu driver for Renesas IPMMU modules

2012-09-27 Thread Hideki EIRAKU
Hi,

From: Nobuhiro Iwamatsu 
Subject: Re: [PATCH v3 1/2] iommu/shmobile: Add iommu driver for Renesas IPMMU modules
Date: Wed, 12 Sep 2012 17:07:00 +0900

>> +static inline void ipmmu_add_device(struct device *dev)
>> +{
>> +}
> 
> Please use 'do { } while (0)'.

Do you mean using #define macro is better than this inline function?
I chose the inline function because:

- The function's argument type is checked by a compiler.
- Its output code should be exactly the same as a macro in this case.
- The Linux kernel coding style says "Generally, inline functions are
  preferable to macros resembling functions."
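
The first point in the list above can be seen in a small standalone sketch.
The stub mirrors the one in the patch; the macro variant and the
sketch_probe() caller are hypothetical and only serve the comparison.

```c
struct device;	/* opaque here, as a driver header would see it */

/*
 * Empty stub as in the patch: it compiles to nothing, but the compiler
 * still checks that callers pass a struct device pointer.
 */
static inline void ipmmu_add_device(struct device *dev)
{
	(void)dev;	/* silence unused-parameter warnings */
}

/*
 * Macro alternative: also compiles to nothing, but the argument is
 * discarded before type checking, so any expression is accepted.
 */
#define ipmmu_add_device_macro(dev) do { } while (0)

/* Hypothetical caller showing the difference. */
static int sketch_probe(struct device *dev)
{
	ipmmu_add_device(dev);			/* type-checked */
	ipmmu_add_device_macro("not a device");	/* silently accepted */
	return 0;
}
```

Passing the string literal to ipmmu_add_device() instead would draw an
incompatible-pointer diagnostic, which is the checking the macro form
gives up.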

>> +   switch (size) {
>> +   default:
>> +   priv->tlb_enabled = 0;
>> +   break;
>> +   case 0x2000:
>> +   ipmmu_reg_write(priv, IMTTBCR, 1);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x1000:
>> +   ipmmu_reg_write(priv, IMTTBCR, 2);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x800:
>> +   ipmmu_reg_write(priv, IMTTBCR, 3);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x400:
>> +   ipmmu_reg_write(priv, IMTTBCR, 4);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x200:
>> +   ipmmu_reg_write(priv, IMTTBCR, 5);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x100:
>> +   ipmmu_reg_write(priv, IMTTBCR, 6);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   case 0x80:
>> +   ipmmu_reg_write(priv, IMTTBCR, 7);
>> +   priv->tlb_enabled = 1;
>> +   break;
>> +   }
> 
> I thought that you could describe more briefly if ffs() is used.

This is simply converted from a hardware manual.  ffs() can be used
like below:

bit = ffs(size);
if (bit >= 7 && bit <= 13 && (1 << bit) == size) {
ipmmu_reg_write(priv, IMTTBCR, 14 - bit);
priv->tlb_enabled = 1;
} else {
priv->tlb_enabled = 0;
}

Checking size is still needed because only 7 sizes are allowed here.
I think using switch() is easier to understand.
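
For comparison, here is a standalone sketch of the ffs() variant that can
be compiled and checked outside the driver. It uses the POSIX ffs() from
<strings.h>, which, like the kernel's ffs(), numbers bits starting at 1,
so the bounds are 8..14 and the register value is 15 - bit; the bounds in
the snippet above fit a 0-based helper such as __ffs(). The function name
sketch_l1size_to_imttbcr() and the "return 0 for unsupported" convention
are assumptions made for this example.

```c
#include <strings.h>	/* ffs() */

/*
 * Map an L1 table size to the IMTTBCR field value, or return 0 if the
 * size is not one of the seven supported ones (0x80 .. 0x2000).
 * ffs() counts bit positions from 1, so 0x80 -> 8 and 0x2000 -> 14.
 */
static int sketch_l1size_to_imttbcr(unsigned int size)
{
	int bit = ffs((int)size);

	/* Reject out-of-range sizes and sizes that are not a power of two. */
	if (bit < 8 || bit > 14 || (1u << (bit - 1)) != size)
		return 0;	/* unsupported: caller leaves the TLB disabled */

	return 15 - bit;	/* 0x2000 -> 1, 0x1000 -> 2, ..., 0x80 -> 7 */
}
```

The power-of-two check is what keeps a value such as 0x3000 from being
treated as 0x1000, which is the "checking size is still needed" point.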

>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_8192
>> +#define L1_SIZE 8192
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_4096
>> +#define L1_SIZE 4096
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_2048
>> +#define L1_SIZE 2048
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_1024
>> +#define L1_SIZE 1024
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_512
>> +#define L1_SIZE 512
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_256
>> +#define L1_SIZE 256
>> +#endif
>> +#ifdef CONFIG_SHMOBILE_IOMMU_L1SIZE_128
>> +#define L1_SIZE 128
>> +#endif
> 
> I think that it was better to define by kconfig.
> For example, following codes.
> 
> +config SHMOBILE_IOMMU_L1SIZE
> +   hex
> +   default "0x2000" if SHMOBILE_IOMMU_L1SIZE_8192
> +   default "0x1000" if SHMOBILE_IOMMU_L1SIZE_4096
> +   default "0x0800" if SHMOBILE_IOMMU_L1SIZE_2048
> +   default "0x0400" if SHMOBILE_IOMMU_L1SIZE_1024
> +   default "0x0200" if SHMOBILE_IOMMU_L1SIZE_512
> +   default "0x0100" if SHMOBILE_IOMMU_L1SIZE_256
> +   default "0x0080" if SHMOBILE_IOMMU_L1SIZE_128

I did not know that way.  It looks good to me too.
Thank you.

-- 
Hideki EIRAKU 


Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Mike Galbraith
On Thu, 2012-09-27 at 12:40 -0700, Linus Torvalds wrote: 
> On Thu, Sep 27, 2012 at 11:29 AM, Peter Zijlstra  
> wrote:
> >
> > Don't forget to run the desktop interactivity benchmarks after you're
> > done wriggling with this knob... wakeup preemption is important for most
> > those.
> 
> So I don't think we want to *just* wiggle that knob per se. We
> definitely don't want to hurt latency on actual interactive tasks. But
> it's interesting that it helps psql so much, and that there seems to
> be some interaction with the select_idle_sibling().
> 
> So I do have a few things I react to when looking at that wakeup granularity..
> 
> I wonder about this comment, for example:
> 
>  * By using 'se' instead of 'curr' we penalize light tasks, so
>  * they get preempted easier. That is, if 'se' < 'curr' then
>  * the resulting gran will be larger, therefore penalizing the
>  * lighter, if otoh 'se' > 'curr' then the resulting gran will
>  * be smaller, again penalizing the lighter task.
> 
> why would we want to preempt light tasks easier? It sounds backwards
> to me. If they are light, we have *less* reason to preempt them, since
> they are more likely to just go to sleep on their own, no?

Ah, that particular 'light' refers to se->load.weight.
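
A simplified model of the weight scaling that the quoted comment describes
is sketched below: the base granularity is scaled by the nice-0 weight
divided by the waking entity's weight, so a heavier waker sees a smaller
threshold. This is an illustration, not the scheduler code; the sketch_*
names and the value 1024 for the nice-0 weight are assumptions of this
example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NICE_0_LOAD 1024ULL	/* assumed weight of a nice-0 task */

/* Scale the base granularity by NICE_0_LOAD / weight of the waker. */
static uint64_t sketch_wakeup_gran(uint64_t gran_ns, uint64_t se_weight)
{
	assert(se_weight > 0);
	return gran_ns * SKETCH_NICE_0_LOAD / se_weight;
}

/* Returns 1 if the waking entity should preempt the current one. */
static int sketch_should_preempt(int64_t curr_vruntime, int64_t se_vruntime,
				 uint64_t gran_ns, uint64_t se_weight)
{
	int64_t vdiff = curr_vruntime - se_vruntime;

	if (vdiff <= 0)
		return 0;	/* waker is not behind the current task */

	return vdiff > (int64_t)sketch_wakeup_gran(gran_ns, se_weight);
}
```

With a 1 ms base and a 1.5 ms vruntime lead, a waker of weight 2048
preempts (threshold 0.5 ms) while one of weight 512 does not (threshold
2 ms), so "light" here is about load weight, not about how long the task
tends to run.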

> Another question is whether the fact that this same load interacts
> with select_idle_sibling() is perhaps a sign that maybe the preemption
> logic is all fine, but it interacts badly with the "pick new cpu"
> code. In particular, after having changed rq's, is the vruntime really
> comparable? IOW, maybe this is an interaction between "place_entity()"
> and then the immediately following (?) call to check wakeup
> preemption?

I think vruntime should be fine.  We take the delta between the
task's vruntime when it went to sleep and its previous rq's min_vruntime
to capture progress made while it slept, and apply the relative offset
in the task's new home so a task can migrate and still have a chance to
preempt on wakeup.
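
As a toy model of that re-basing (hypothetical sketch_* types, not the
scheduler's own structures): the task keeps only its offset from the old
runqueue's min_vruntime when it leaves, and that offset is added to the
new runqueue's min_vruntime when it arrives.

```c
#include <stdint.h>

struct sketch_rq {
	uint64_t min_vruntime;
};

struct sketch_task {
	uint64_t vruntime;
};

/* Leaving a runqueue: keep only the offset relative to its min_vruntime. */
static void sketch_dequeue_migrate(struct sketch_task *t,
				   const struct sketch_rq *old_rq)
{
	t->vruntime -= old_rq->min_vruntime;
}

/* Joining a runqueue: re-base the relative offset on the new min_vruntime. */
static void sketch_enqueue_migrate(struct sketch_task *t,
				   const struct sketch_rq *new_rq)
{
	t->vruntime += new_rq->min_vruntime;
}
```

Because only the offset travels, the absolute vruntime values of the two
runqueues never need to be comparable; unsigned wraparound also keeps the
arithmetic consistent when the task is behind min_vruntime.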

> The fact that *either* changing select_idle_sibling() *or* changing
> the wakeup preemption granularity seems to have such a huge impact
> does seem to tie them together somehow for this particular load. No?

The way I read it, Boris had wakeup preemption disabled.

-Mike



Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-27 Thread Yasuaki Ishimatsu

Hi Chen,

2012/09/28 11:22, Ni zhan Chen wrote:

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu 

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline >/sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change memory block's state, and notify
the userspace that the memory block's state is changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). And in
the 1st case, offline_memory() will be used. The function offline_memory()
is not implemented. In the 2nd case, offline_pages() will be used.


But this time there is not a function associated with add_memory.


To associate with add_memory() later, we renamed it.

Thanks,
Yasuaki Ishimatsu





CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
  drivers/acpi/acpi_memhotplug.c |2 +-
  drivers/base/memory.c  |9 +++--
  include/linux/memory_hotplug.h |3 ++-
  mm/memory_hotplug.c|   22 ++
  4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
   */
  list_for_each_entry_safe(info, n, _device->res_list, list) {
  if (info->enabled) {
-result = remove_memory(info->start_addr, info->length);
+result = offline_memory(info->start_addr, info->length);
  if (result)
  return result;
  }
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long 
start_pfn,
  static int
  memory_block_action(unsigned long phys_index, unsigned long action)
  {
-unsigned long start_pfn, start_paddr;
+unsigned long start_pfn;
  unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
  struct page *first_page;
  int ret;
  first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+start_pfn = page_to_pfn(first_page);
  switch (action) {
  case MEM_ONLINE:
-start_pfn = page_to_pfn(first_page);
-
  if (!pages_correctly_reserved(start_pfn, nr_pages))
  return -EBUSY;
  ret = online_pages(start_pfn, nr_pages);
  break;
  case MEM_OFFLINE:
-start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-ret = remove_memory(start_paddr,
-nr_pages << PAGE_SHIFT);
+ret = offline_pages(start_pfn, nr_pages);
  break;
  default:
  WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long 
pfn,
  extern int mem_online_node(int nid);
  extern int add_memory(int nid, u64 start, u64 size);
  extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
  int nr_pages);
  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..bb42316 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long 
end_pfn)
  return offlined;
  }
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
unsigned long end_pfn, unsigned long timeout)
  {
  unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
  return ret;
  }
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
  {
-unsigned long start_pfn, end_pfn;
+return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
-start_pfn = PFN_DOWN(start);
-end_pfn = start_pfn + PFN_DOWN(size);
-return offline_pages(start_pfn, end_pfn, 120 * HZ);
+int offline_memory(u64 start, u64 size)
+{
+return -EINVAL;
  }
  #else
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start, 

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-27 Thread Mike Galbraith
On Thu, 2012-09-27 at 21:24 +0200, Borislav Petkov wrote: 
> On Thu, Sep 27, 2012 at 08:29:44PM +0200, Peter Zijlstra wrote:
> > > >> Or could we just improve the heuristics. What happens if the
> > > >> scheduling granularity is increased, for example? It's set to 1ms
> > > >> right now, with a logarithmic scaling by number of cpus.
> > > >
> > > > /proc/sys/kernel/sched_wakeup_granularity_ns=10000000 (10ms)
> > > > --
> > > > tps = 4994.730809 (including connections establishing)
> > > > tps = 5000.260764 (excluding connections establishing)
> > > >
> > > > A bit better over the default NO_WAKEUP_PREEMPTION setting.
> > > 
> > > Ok, so this gives us something possible to actually play with.
> > > 
> > > For example, maybe SCHED_TUNABLESCALING_LINEAR is more appropriate
> > > than SCHED_TUNABLESCALING_LOG. At least for WAKEUP_PREEMPTION. Hmm?
> > 
> > Don't forget to run the desktop interactivity benchmarks after you're
> > done wriggling with this knob... wakeup preemption is important for most
> > those.
> 
> Setting sched_tunable_scaling to SCHED_TUNABLESCALING_LINEAR made
> wakeup_granularity go to 4ms:
> 
> sched_autogroup_enabled:1
> sched_child_runs_first:0
> sched_latency_ns:24000000
> sched_migration_cost_ns:500000
> sched_min_granularity_ns:3000000
> sched_nr_migrate:32
> sched_rt_period_us:1000000
> sched_rt_runtime_us:950000
> sched_shares_window_ns:10000000
> sched_time_avg_ms:1000
> sched_tunable_scaling:2
> sched_wakeup_granularity_ns:4000000
> 
> pgbench results look good:
> 
> tps = 4997.675331 (including connections establishing)
> tps = 5003.256870 (excluding connections establishing)
> 
> This is still with Ingo's NO_WAKEUP_PREEMPTION patch.

And wakeup preemption is still disabled as well, correct?
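
For reference, the scaling modes discussed above can be modelled as a
multiplier applied to the base tunables. The sketch below is a standalone
approximation: the enum numbering follows the sched_tunable_scaling sysctl
values, while the cap of 8 CPUs and the sketch_* names are assumptions of
this example rather than something stated in the thread.

```c
#include <assert.h>

/* Same numbering as the sched_tunable_scaling sysctl values. */
enum sketch_scaling {
	SKETCH_SCALING_NONE = 0,
	SKETCH_SCALING_LOG = 1,
	SKETCH_SCALING_LINEAR = 2,
};

/* Integer log2, rounding down; v must be nonzero. */
static unsigned int sketch_ilog2(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Multiplier applied to the base granularity for a given CPU count. */
static unsigned int sketch_scaling_factor(enum sketch_scaling mode,
					  unsigned int online_cpus)
{
	unsigned int cpus;

	assert(online_cpus > 0);
	cpus = online_cpus < 8 ? online_cpus : 8;	/* assumed cap */

	switch (mode) {
	case SKETCH_SCALING_NONE:
		return 1;
	case SKETCH_SCALING_LINEAR:
		return cpus;
	case SKETCH_SCALING_LOG:
	default:
		return 1 + sketch_ilog2(cpus);
	}
}
```

Under this model, a 1 ms base granularity going to 4 ms with linear
scaling would correspond to four online CPUs; that is an inference from
the numbers quoted above, not something the thread states.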

-Mike



Re: [PATCH 1/4] memory-hotplug: add memory_block_release

2012-09-27 Thread Yasuaki Ishimatsu

Hi Kosaki-san,

2012/09/28 10:35, KOSAKI Motohiro wrote:

On Thu, Sep 27, 2012 at 8:24 PM, Yasuaki Ishimatsu
 wrote:

Hi Chen,


2012/09/27 19:20, Ni zhan Chen wrote:


Hi Congyang,

2012/9/27 


From: Yasuaki Ishimatsu 

When calling remove_memory_block(), the function shows following message
at
device_release().

Device 'memory528' does not have a release() function, it is broken and
must
be fixed.



What's the difference between the patch and the original implementation?



The implementation is for removing a memory_block, so the purpose is the
same as the original one. But the original code is bad practice:
kobject_cleanup() is called by remove_memory_block() at the end, but no
release function for releasing the memory_block is registered. As a result,
the kernel message is shown. IMHO, the memory_block should be released by
the release function.


But your patch introduces a use-after-free bug, if I understand correctly.
See the unregister_memory() function. After your patch, kobject_put() calls
release_memory_block() and kfree(), and then device_unregister() will
touch freed memory.


That is not correct. The kobject_put() pairs with find_memory_block()
in remove_memory_block(), since kobject->kref is incremented there.
So release_memory_block() is called by device_unregister() correctly, as follows:

[ 1014.589008] Pid: 126, comm: kworker/0:2 Not tainted 3.6.0-rc3-enable-memory-hotremove-and-root-bridge #3
[ 1014.702437] Call Trace:
[ 1014.731684]  [] release_memory_block+0x16/0x30
[ 1014.803581]  [] device_release+0x27/0xa0
[ 1014.869312]  [] kobject_cleanup+0x82/0x1b0
[ 1014.937062]  [] kobject_release+0xd/0x10
[ 1015.002718]  [] kobject_put+0x2c/0x60
[ 1015.065271]  [] put_device+0x17/0x20
[ 1015.126794]  [] device_unregister+0x2a/0x60
[ 1015.195578]  [] remove_memory_block+0xbb/0xf0
[ 1015.266434]  [] unregister_memory_section+0x1f/0x30
[ 1015.343532]  [] __remove_section+0x68/0x110
[ 1015.412318]  [] __remove_pages+0xe7/0x120
[ 1015.479021]  [] arch_remove_memory+0x2c/0x80
[ 1015.548845]  [] remove_memory+0x6b/0xd0
[ 1015.613474]  [] acpi_memory_device_remove_memory+0x48/0x73
[ 1015.697834]  [] acpi_memory_device_remove+0x2b/0x44
[ 1015.774922]  [] acpi_device_remove+0x90/0xb2
[ 1015.844796]  [] __device_release_driver+0x7c/0xf0
[ 1015.919814]  [] device_release_driver+0x2f/0x50
[ 1015.992753]  [] acpi_bus_remove+0x32/0x6d
[ 1016.059462]  [] acpi_bus_trim+0x91/0x102
[ 1016.125128]  [] acpi_bus_hot_remove_device+0x88/0x16b
[ 1016.204295]  [] acpi_os_execute_deferred+0x27/0x34
[ 1016.280350]  [] process_one_work+0x219/0x680
[ 1016.350173]  [] ? process_one_work+0x1b8/0x680
[ 1016.422072]  [] ? acpi_os_wait_events_complete+0x23/0x23
[ 1016.504357]  [] worker_thread+0x12e/0x320
[ 1016.571064]  [] ? manage_workers+0x110/0x110
[ 1016.640886]  [] kthread+0xc6/0xd0
[ 1016.699290]  [] kernel_thread_helper+0x4/0x10
[ 1016.770149]  [] ? retint_restore_args+0x13/0x13
[ 1016.843165]  [] ? __init_kthread_worker+0x70/0x70
[ 1016.918200]  [] ? gs_change+0x13/0x13
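
The reference counting argument can be modelled in a few lines. This is a
toy counter with hypothetical sketch_* names, not the kobject
implementation: the object starts with one reference from registration,
the lookup takes a second one, and release only runs when the last
reference is dropped.

```c
#include <assert.h>

struct sketch_kobj {
	int refcount;
	int released;	/* set when the release callback would run */
};

/* Take an extra reference, as the lookup does. */
static void sketch_get(struct sketch_kobj *k)
{
	assert(k->refcount > 0);
	k->refcount++;
}

/* Drop a reference; the object is released when the count hits zero. */
static void sketch_put(struct sketch_kobj *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0)
		k->released = 1;
}

/*
 * Same order as unregister_memory(): first drop the lookup reference,
 * then unregister, which drops the reference held since registration.
 */
static void sketch_unregister(struct sketch_kobj *k)
{
	sketch_put(k);		/* reference taken by the lookup */
	assert(!k->released);	/* object is still alive at this point */
	sketch_put(k);		/* last reference: release runs here */
}
```

In this model the first put cannot free the object as long as the lookup
really took its own reference, which matches the call trace where release
runs underneath device_unregister().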

Thanks,
Yasuaki Ishimatsu



static void
unregister_memory(struct memory_block *memory)
{
BUG_ON(memory->dev.bus != _subsys);

/* drop the ref. we got in remove_memory_block() */
kobject_put(>dev.kobj);
device_unregister(>dev);
}






Re: [PATCH 02/57] power: ab8500_bm: Don't clear the CCMuxOffset bit

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:11:59AM -0600, mathieu.poir...@linaro.org wrote:
> From: Kalle Komierowski 
> 
> The CCMuxOffset bit is not kept set, this will force the coulomb counter
> of the AB8500 to use the measure offset calibration.
> This should increase the accuracy of the fuel gauge.
> 
> Signed-off-by: Kalle Komierowski 
> Signed-off-by: Marcus Cooper 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Jonas ABERG 
> ---
>  drivers/power/ab8500_fg.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/power/ab8500_fg.c b/drivers/power/ab8500_fg.c
> index bf02225..af792a8 100644
> --- a/drivers/power/ab8500_fg.c
> +++ b/drivers/power/ab8500_fg.c
> @@ -485,8 +485,9 @@ static int ab8500_fg_coulomb_counter(struct ab8500_fg 
> *di, bool enable)
>   di->flags.fg_enabled = true;
>   } else {
>   /* Clear any pending read requests */
> - ret = abx500_set_register_interruptible(di->dev,
> - AB8500_GAS_GAUGE, AB8500_GASG_CC_CTRL_REG, 0);
> + ret = abx500_mask_and_set_register_interruptible(di->dev,
> + AB8500_GAS_GAUGE, AB8500_GASG_CC_CTRL_REG,
> + (RESET_ACCU | READ_REQ), 0);
>   if (ret)
>   goto cc_err;
>  
> @@ -1404,8 +1405,7 @@ static void ab8500_fg_algorithm_discharging(struct 
> ab8500_fg *di)
>   sleep_time = di->bat->fg_params->init_timer;
>  
>   /* Discard the first [x] seconds */
> - if (di->init_cnt >
> - di->bat->fg_params->init_discard_time) {
> + if (di->init_cnt > di->bat->fg_params->init_discard_time) {

This change is OK, but it's cosmetic, and deserves its own patch (you can
combine all cosmetic changes, which do not change the logic, into one
patch).

>   ab8500_fg_calc_cap_discharge_voltage(di, true);
>  
>   ab8500_fg_check_capacity_limits(di, true);
> -- 
> 1.7.5.4


Re: [PATCH 01/57] power: ab8500_bm: Charger current step-up/down

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:11:58AM -0600, mathieu.poir...@linaro.org wrote:
> From: Johan Bjornstedt 
> 
> There is no state machine in the AB to step up/down
> the charger current to avoid dips and spikes on VBUS
> and VBAT when charging is started.
> Instead this is implemented in SW

Some general comment: the commit messages use random line wrapping length.

It's usually a good idea to make lines no longer than 74 columns (since
'git log' adds some spaces before the message), but too short or random
is also not pretty.

> Signed-off-by: Johan Bjornstedt 
> Signed-off-by: Mattias Wallin 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Karl KOMIEROWSKI 
> ---
>  drivers/power/ab8500_charger.c |  172 
> +++-
>  1 files changed, 133 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
> index d4f0c98..3ceb788 100644
> --- a/drivers/power/ab8500_charger.c
> +++ b/drivers/power/ab8500_charger.c
> @@ -77,6 +77,9 @@
>  /* Lowest charger voltage is 3.39V -> 0x4E */
>  #define LOW_VOLT_REG 0x4E
>  
> +/* Step up/down delay in us */
> +#define STEP_UDELAY  1000
> +
>  /* UsbLineStatus register - usb types */
>  enum ab8500_charger_link_status {
>   USB_STAT_NOT_CONFIGURED,
> @@ -934,6 +937,88 @@ static int ab8500_charger_get_usb_cur(struct 
> ab8500_charger *di)
>  }
>  
>  /**
> + * ab8500_charger_set_current() - set charger current
> + * @di:  pointer to the ab8500_charger structure
> + * @ich: charger current, in mA
> + * @reg: select what charger register to set
> + *
> + * Set charger current.
> + * There is no state machine in the AB to step up/down the charger
> + * current to avoid dips and spikes on MAIN, VBUS and VBAT when
> + * charging is started. Instead we need to implement
> + * this charger current step-up/down here.
> + * Returns error code in case of failure else 0(on success)

Random line wrapping...

Sophisticated editors (like vim :-) can format text for you, e.g. with the
'gqip' command. I'm sure emacs can do this too.

> + */
> +static int ab8500_charger_set_current(struct ab8500_charger *di,
> + int ich, int reg)
> +{
> + int ret, i;
> + int curr_index, prev_curr_index, shift_value;

One variable per line, please.

> + u8 reg_value;
> +
> + switch (reg) {
> + case AB8500_MCH_IPT_CURLVL_REG:
> + shift_value = MAIN_CH_INPUT_CURR_SHIFT;
> + curr_index = ab8500_current_to_regval(ich);
> + break;
> + case AB8500_USBCH_IPT_CRNTLVL_REG:
> + shift_value = VBUS_IN_CURR_LIM_SHIFT;
> + curr_index = ab8500_vbus_in_curr_to_regval(ich);
> + break;
> + case AB8500_CH_OPT_CRNTLVL_REG:
> + shift_value = 0;
> + curr_index = ab8500_current_to_regval(ich);
> + break;
> + default:
> + dev_err(di->dev, "%s current register not valid\n", __func__);
> + return -ENXIO;
> + }
> +
> + if (curr_index < 0) {
> + dev_err(di->dev, "requested current limit out-of-range\n");
> + return -ENXIO;
> + }
> +
> + ret = abx500_get_register_interruptible(di->dev, AB8500_CHARGER,
> + reg, _value);
> + if (ret < 0) {
> + dev_err(di->dev, "%s read failed\n", __func__);
> + return ret;
> + }
> + prev_curr_index = (reg_value >> shift_value);

No need for parentheses.

> + /* only update current if it's been changed */
> + if (prev_curr_index == curr_index)
> + return 0;
> +
> + dev_dbg(di->dev, "%s set charger current: %d mA for reg: 0x%02x\n",
> + __func__, ich, reg);
> +
> + if (prev_curr_index > curr_index) {
> + for (i = prev_curr_index - 1; i >= curr_index; i--) {
> + ret = abx500_set_register_interruptible(di->dev,
> + AB8500_CHARGER, reg, (u8) i << shift_value);
> + if (ret) {
> + dev_err(di->dev, "%s write failed\n", __func__);
> + return ret;
> + }
> + usleep_range(STEP_UDELAY, STEP_UDELAY * 2);
> + }
> + } else {
> + for (i = prev_curr_index + 1; i <= curr_index; i++) {
> + ret = abx500_set_register_interruptible(di->dev,
> + AB8500_CHARGER, reg, (u8) i << shift_value);
> + if (ret) {
> + dev_err(di->dev, "%s write failed\n", __func__);
> + return ret;
> + }
> + usleep_range(STEP_UDELAY, STEP_UDELAY * 2);
> + }
> + }

Too much duplication.

Assuming that you need to preserve the order of the writes, i.e. if it
matters to the hw, I guess this (or something alike) will work:

write_current()
{
uint start = 
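
The reply is cut off at this point, so the helper the reviewer had in mind is not known. As a minimal sketch of one way to drop the duplication while keeping the one-step-at-a-time write order, the two loops can be folded into a single loop with a signed step. Everything below is hypothetical: `step_current()`, `write_fn` and the `write_log` recorder stand in for the driver function and for `abx500_set_register_interruptible()`, and the delay between steps is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a register write; returns 0 on success. */
typedef int (*write_fn)(void *ctx, unsigned char value);

/* Test double that records every value written, in order. */
struct write_log {
	unsigned char values[16];
	size_t count;
};

static int log_write(void *ctx, unsigned char value)
{
	struct write_log *log = ctx;

	if (log->count >= sizeof(log->values))
		return -1;
	log->values[log->count++] = value;
	return 0;
}

/*
 * Walk from prev_index to curr_index one step at a time, in whichever
 * direction is needed, writing each intermediate index to the register.
 */
static int step_current(void *ctx, write_fn write_reg,
			int prev_index, int curr_index, int shift)
{
	int step = curr_index > prev_index ? 1 : -1;
	int i = prev_index;
	int ret;

	assert(shift >= 0 && shift < 8);

	while (i != curr_index) {
		i += step;
		ret = write_reg(ctx, (unsigned char)(i << shift));
		if (ret)
			return ret;
		/* The driver would usleep_range() here between steps. */
	}
	return 0;
}
```

Stepping down from index 5 to 2 with a shift of 4 writes 0x40, 0x30, 0x20 in that order, which matches what the first loop in the patch does.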

Re: [PATCH -v2] mm: frontswap: fix a wrong if condition in frontswap_shrink

2012-09-27 Thread Zhenzhong Duan



On 2012-09-27 19:35, Paul Bolle wrote:

On Fri, 2012-09-21 at 16:40 +0800, Zhenzhong Duan wrote:


@@ -275,7 +280,7 @@ static int __frontswap_shrink(unsigned long target_pages,
if (total_pages <= target_pages) {
/* Nothing to do */
*pages_to_unuse = 0;


I think setting pages_to_unuse to zero here is not needed. It is
initialized to zero in frontswap_shrink() and hasn't been touched since.
See my patch at https://lkml.org/lkml/2012/9/27/250.

Yes, it's unneeded. But I didn't see the warning you mention at the above
link when running 'make V=1 mm/frontswap.o'.

-   return 0;
+   return 1;
}
total_pages_to_unuse = total_pages - target_pages;
return __frontswap_unuse_pages(total_pages_to_unuse, pages_to_unuse, 
type);
@@ -302,7 +307,7 @@ void frontswap_shrink(unsigned long target_pages)
spin_lock(&swap_lock);
ret = __frontswap_shrink(target_pages, &pages_to_unuse, &type);
spin_unlock(&swap_lock);
-   if (ret == 0 && pages_to_unuse)
+   if (ret == 0)
try_to_unuse(type, true, pages_to_unuse);
return;
  }


Are you sure pages_to_unuse won't be zero here? I've stared quite a bit
at __frontswap_unuse_pages() and it's not obvious pages_to_unuse (there
also called unused) will never be zero when that function returns zero.

pages_to_unuse==0 means all pages need to be unused.
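
To make the return-value convention discussed here concrete, below is a minimal userspace model. It assumes, as stated above, that a zero pages_to_unuse handed to try_to_unuse() means "unuse everything", which is why "nothing to do" has to be signalled through the return value instead. The `model_*` names are hypothetical, and the real __frontswap_unuse_pages() is reduced to simply reporting the surplus.

```c
#include <assert.h>

/* Hypothetical model of __frontswap_shrink() after the patch. */
static int model_shrink(unsigned long total_pages, unsigned long target_pages,
			unsigned long *pages_to_unuse)
{
	assert(pages_to_unuse);

	if (total_pages <= target_pages) {
		/* Nothing to do: report it via a non-zero return value. */
		*pages_to_unuse = 0;
		return 1;
	}
	/* Simplification: the real code picks a swap type here and may fail. */
	*pages_to_unuse = total_pages - target_pages;
	return 0;
}

/* Returns 1 if the caller would go on to call try_to_unuse(). */
static int model_should_unuse(unsigned long total_pages,
			      unsigned long target_pages)
{
	unsigned long pages_to_unuse = 0;
	int ret = model_shrink(total_pages, target_pages, &pages_to_unuse);

	/* Only the return value decides; the count is not tested. */
	return ret == 0;
}
```

With this convention the caller no longer needs the `&& pages_to_unuse` test, because the "already at or below target" case never returns 0.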

zduan



[git pull][vfs.git] a couple of fixes

2012-09-27 Thread Al Viro
A couple of fixes; one for automount/lazy umount race, another
a classic "we don't protect the refcount transition to zero with the
lock that protects looking for object in hash" kind of crap in lockd.
Please, pull.  The usual place -
git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (2):
  do_add_mount()/umount -l races
  close the race in nlmsvc_free_block()

Diffstat:
 fs/lockd/svclock.c |3 +--
 fs/namespace.c |   10 --
 2 files changed, 9 insertions(+), 4 deletions(-)


linux-next: manual merge of the tip tree with the rr tree

2012-09-27 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/x86/Kconfig between commit 9a9d5786a5e7 ("Make most arch
asm/module.h files use asm-generic/module.h") from the rr tree and
commits fdf9c356502a ("cputime: Make finegrained irqtime accounting
generally available") and edf55fda35c7 ("x86: Exit RCU extended QS on
notify resume") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc arch/x86/Kconfig
index ede3e92,56e7a25..000
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@@ -97,11 -102,12 +102,14 @@@ config X8
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
 +  select MODULES_USE_ELF_REL if X86_32
 +  select MODULES_USE_ELF_RELA if X86_64
+   select HAVE_RCU_USER_QS if X86_64
+   select HAVE_IRQ_TIME_ACCOUNTING
  
  config INSTRUCTION_DECODER
-   def_bool (KPROBES || PERF_EVENTS || UPROBES)
+   def_bool y
+   depends on KPROBES || PERF_EVENTS || UPROBES
  
  config OUTPUT_FORMAT
string




Re: [RFC v9 PATCH 05/21] memory-hotplug: check whether memory is present or not

2012-09-27 Thread Ni zhan Chen

On 09/11/2012 10:24 AM, Yasuaki Ishimatsu wrote:

Hi Wen,

2012/09/11 11:15, Wen Congyang wrote:

Hi, ishimatsu

At 09/05/2012 05:25 PM, we...@cn.fujitsu.com Wrote:

From: Yasuaki Ishimatsu 

If the system supports memory hot-remove, online_pages() may online removed
pages. So online_pages() needs to check whether the pages being onlined are
present or not.

Because we use memory_block_change_state() to hot-remove memory, I think
this patch can be removed. What do you think?

Please explain the details a little more. If we use
memory_block_change_state(), does the conflict never occur? Why?

Since memory hot-add or hot-remove is based on memblock, can a check in
memory_block_change_state() guarantee that the conflict never occurs?



Thanks,
Yasuaki Ishimatsu


Thanks
Wen Congyang



CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Wen Congyang 
Signed-off-by: Yasuaki Ishimatsu 
---
  include/linux/mmzone.h |   19 +++
  mm/memory_hotplug.c|   13 +
  2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ac3ae30 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1180,6 +1180,25 @@ void sparse_init(void);
  #define sparse_index_init(_sec, _nid)  do {} while (0)
  #endif /* CONFIG_SPARSEMEM */

+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+int i;
+for (i = 0; i < nr_pages; i++) {
+if (pfn_present(pfn + i))
+continue;
+else
+return -EINVAL;
+}
+return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
  #ifdef CONFIG_NODES_SPAN_OTHER_NODES
  bool early_pfn_in_nid(unsigned long pfn, int nid);
  #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7747..299747d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
  struct memory_notify arg;

  lock_memory_hotplug();
+/*
+ * If system supports memory hot-remove, the memory may have been
+ * removed. So we check whether the memory has been removed or not.
+ *
+ * Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
+ *   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+ *   always returns 0.
+ */
+ret = pfns_present(pfn, nr_pages);
+if (ret) {
+unlock_memory_hotplug();
+return ret;
+}
  arg.start_pfn = pfn;
  arg.nr_pages = nr_pages;
  arg.status_change_nid = -1;
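
For readers who want to try the range check outside the kernel, here is a minimal userspace sketch of what pfns_present() in the patch above does. The presence table and all `model_*` names are hypothetical stand-ins for pfn_present() and the sparsemem section state. One deliberate difference: the loop index uses the same type as nr_pages, whereas the patch uses a plain int.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_PFNS 64UL

/* Hypothetical presence table standing in for pfn_present(). */
static bool model_present[MODEL_NR_PFNS];

static bool model_pfn_present(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS && model_present[pfn];
}

/* Returns 0 if every pfn in [pfn, pfn + nr_pages) is present, else -EINVAL. */
static int model_pfns_present(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long i;	/* same width as nr_pages */

	for (i = 0; i < nr_pages; i++) {
		if (!model_pfn_present(pfn + i))
			return -EINVAL;
	}
	return 0;
}

/* Mark a range present or absent, as hot-add or hot-remove might. */
static void model_set_range(unsigned long pfn, unsigned long nr_pages,
			    bool present)
{
	unsigned long i;

	assert(pfn + nr_pages <= MODEL_NR_PFNS);
	for (i = 0; i < nr_pages; i++)
		model_present[pfn + i] = present;
}
```

A single missing page anywhere in the range is enough to make the whole request fail, which is the behaviour online_pages() relies on in the patch.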





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


linux-next: manual merge of the tip tree with the rr tree

2012-09-27 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in arch/Kconfig
between commit 9a9d5786a5e7 ("Make most arch asm/module.h files use
asm-generic/module.h") from the rr tree and commits fdf9c356502a
("cputime: Make finegrained irqtime accounting generally available") and
2b1d5024e17b ("rcu: Settle config for userspace extended quiescent
state") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc arch/Kconfig
index 3450115,a62965d..000
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@@ -281,23 -294,23 +294,42 @@@ config SECCOMP_FILTE
  
  See Documentation/prctl/seccomp_filter.txt for details.
  
 +config HAVE_MOD_ARCH_SPECIFIC
 +  bool
 +  help
 +The arch uses struct mod_arch_specific to store data.  Many arches
 +just need a simple module loader without arch specific data - those
 +should not enable this.
 +
 +config MODULES_USE_ELF_RELA
 +  bool
 +  help
 +Modules only use ELF RELA relocations.  Modules with ELF REL
 +relocations will give an error.
 +
 +config MODULES_USE_ELF_REL
 +  bool
 +  help
 +Modules only use ELF REL relocations.  Modules with ELF RELA
 +relocations will give an error.
 +
+ config HAVE_RCU_USER_QS
+   bool
+   help
+ Provide kernel entry/exit hooks necessary for userspace
+ RCU extended quiescent state. Syscalls need to be wrapped inside
+ rcu_user_exit()-rcu_user_enter() through the slow path using
+ TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
+ are already protected inside rcu_irq_enter/rcu_irq_exit() but
+ preemption or signal handling on irq exit still need to be protected.
+ 
+ config HAVE_VIRT_CPU_ACCOUNTING
+   bool
+ 
+ config HAVE_IRQ_TIME_ACCOUNTING
+   bool
+   help
+ Archs need to ensure they use a high enough resolution clock to
+ support irq time accounting and then call enable_sched_clock_irqtime().
+ 
  source "kernel/gcov/Kconfig"




[PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch()

2012-09-27 Thread Theodore Ts'o
I've found a much simpler way of fixing this, by using
down_read_trylock().  In the very unlikely case where s_umount is
contended, we can just skip kicking the writeback thread.
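
The trylock-and-skip pattern described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `model_rwsem` is a hypothetical single-threaded stand-in for the s_umount rw-semaphore, and incrementing a counter stands in for writeback_inodes_sb(). The one kernel convention relied on is that down_read_trylock() returns 1 on success and 0 when the lock is contended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for the s_umount rw-semaphore. */
struct model_rwsem {
	bool writer_held;
	int readers;
};

/* Like down_read_trylock(): returns 1 on success, 0 if it would block. */
static int model_down_read_trylock(struct model_rwsem *sem)
{
	if (sem->writer_held)
		return 0;
	sem->readers++;
	return 1;
}

static void model_up_read(struct model_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/*
 * Kick writeback only if the lock can be taken without blocking.
 * Returns true if writeback was started, false if it was skipped.
 */
static bool model_try_kick_writeback(struct model_rwsem *s_umount,
				     int *writeback_started)
{
	if (!model_down_read_trylock(s_umount))
		return false;	/* contended: skip rather than risk a deadlock */

	(*writeback_started)++;	/* stands in for writeback_inodes_sb() */
	model_up_read(s_umount);
	return true;
}
```

Because the caller never sleeps waiting for the lock, it cannot take part in a lock-ordering cycle with i_mutex; the cost is that the writeback kick is occasionally skipped.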

- Ted

>From 51ad3407a91ab090d1772b63329bd3b7f2210eb0 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o 
Date: Thu, 27 Sep 2012 23:12:48 -0400
Subject: [PATCH v2] ext4: fix potential deadlock in ext4_nonda_switch()

In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the criteria for kicking the writeback thread
before we decide whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   ---
   dd/8298 is trying to acquire lock:
(>s_umount_key#18){..}, at: [] 
writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
(>s_type->i_mutex_key#8){+.+...}, at: [] 
generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
#0:  (sb_writers#2){.+.+.+}, at: [] 
generic_file_aio_write+0x56/0xd3
#1:  (>s_type->i_mutex_key#8){+.+...}, at: [] 
generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
[] ? console_unlock+0x345/0x372
[] print_circular_bug+0x190/0x19d
[] __lock_acquire+0x86d/0xb6c
[] ? mark_held_locks+0x5c/0x7b
[] lock_acquire+0x66/0xb9
[] ? writeback_inodes_sb_if_idle+0x28/0x46
[] down_read+0x28/0x58
[] ? writeback_inodes_sb_if_idle+0x28/0x46
[] writeback_inodes_sb_if_idle+0x28/0x46
[] ext4_nonda_switch+0xe1/0xf4
[] ext4_da_write_begin+0x27/0x193
[] generic_file_buffered_write+0xc8/0x1bb
[] __generic_file_aio_write+0x1dd/0x205
[] generic_file_aio_write+0x78/0xd3
[] ext4_file_write+0x480/0x4a6
[] ? __lock_acquire+0x41e/0xb6c
[] ? sched_clock_cpu+0x11a/0x13e
[] ? trace_hardirqs_off+0xb/0xd
[] ? local_clock+0x37/0x4e
[] do_sync_write+0x67/0x9d
[] ? wait_on_retry_sync_kiocb+0x44/0x44
[] vfs_write+0x7b/0xe6
[] sys_write+0x3b/0x64
[] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" 
Cc: sta...@vger.kernel.org
---
 fs/ext4/inode.c   | 17 ++---
 fs/fs-writeback.c |  1 +
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0a9a89e..4ea396f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2437,6 +2437,16 @@ static int ext4_nonda_switch(struct super_block *sb)
free_blocks  = EXT4_C2B(sbi,
percpu_counter_read_positive(&sbi->s_freeclusters_counter));
dirty_blocks = percpu_counter_read_positive(&sbi->s_dirtyclusters_counter);
+   /*
+* Start pushing delalloc when 1/2 of free blocks are dirty.
+*/
+   if (dirty_blocks && (free_blocks < 2 * dirty_blocks) &&
+   !writeback_in_progress(sb->s_bdi) &&
+   down_read_trylock(&sb->s_umount)) {
+   writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
+   up_read(&sb->s_umount);
+   }
+
if (2 * free_blocks < 3 * dirty_blocks ||
free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) {
/*
@@ -2445,13 +2455,6 @@ static int ext4_nonda_switch(struct super_block *sb)
 */
return 1;
}
-   /*
-* Even if we don't switch but are nearing capacity,
-* start pushing delalloc when 1/2 of free blocks are dirty.
-*/
-   if (free_blocks < 2 * dirty_blocks)
-   writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
-
return 0;
 }
 
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index be3efc4..5602d73 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -63,6 +63,7 @@ int writeback_in_progress(struct backing_dev_info *bdi)
 {
return test_bit(BDI_writeback_running, &bdi->state);
 }
+EXPORT_SYMBOL(writeback_in_progress);
 
 static inline struct backing_dev_info 

Re: [RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device

2012-09-27 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang 

The memory device has only one node id. Store the node id when
enable the memory device, and we can reuse it when removing the
memory device.


one question:
If NUMA emulation is used, will the memory device be associated with one node or ...?



CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
CC: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
Reviewed-by: Yasuaki Ishimatsu 
---
  drivers/acpi/acpi_memhotplug.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 2a7beac..7873832 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
  struct acpi_memory_device {
struct acpi_device * device;
unsigned int state; /* State of the memory device */
+   int nid;
struct list_head res_list;
  };
  
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)

info->enabled = 1;
num_enabled++;
}
+
+   mem_device->nid = node;
+
if (!num_enabled) {
printk(KERN_ERR PREFIX "add_memory failed\n");
mem_device->state = MEMORY_INVALID_STATE;




Re: [PATCH V2 08/10] efi: Enable secure boot lockdown automatically when enabled in firmware

2012-09-27 Thread Serge Hallyn
Quoting Matthew Garrett (m...@redhat.com):
> The firmware has a set of flags that indicate whether secure boot is enabled
> and enforcing. Use them to indicate whether the kernel should lock itself
> down.
> 
> Signed-off-by: Matthew Garrett 

(purely for the non-firmware bits) seems good, thanks.

Acked-by: Serge E. Hallyn 
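
For reference, the decision made by get_secure_boot() in the quoted patch below can be sketched as a small pure function. This is a userspace illustration only: `model_efi_var` is a hypothetical type bundling "did the variable read succeed" with the byte that was read, in place of the real get_variable runtime call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of reading a one-byte EFI variable. */
struct model_efi_var {
	bool read_ok;		/* true if the read returned EFI_SUCCESS */
	unsigned char value;
};

/* Returns 1 if the kernel should lock itself down, 0 otherwise. */
static int model_get_secure_boot(struct model_efi_var secure_boot,
				 struct model_efi_var setup_mode)
{
	if (!secure_boot.read_ok || secure_boot.value == 0)
		return 0;
	if (!setup_mode.read_ok || setup_mode.value == 1)
		return 0;
	return 1;
}

/* Usage: only SecureBoot != 0 together with SetupMode != 1 enables lockdown. */
static void model_secure_boot_examples(void)
{
	struct model_efi_var on = { true, 1 };
	struct model_efi_var off = { true, 0 };
	struct model_efi_var failed = { false, 0 };

	assert(model_get_secure_boot(on, off) == 1);
	assert(model_get_secure_boot(off, off) == 0);
	assert(model_get_secure_boot(on, on) == 0);
	assert(model_get_secure_boot(on, failed) == 0);
	assert(model_get_secure_boot(failed, off) == 0);
}
```

Note that any failed variable read is treated the same as "not enforcing", so the function errs on the side of not locking down.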

> ---
>  Documentation/x86/zero-page.txt  |  2 ++
>  arch/x86/boot/compressed/eboot.c | 32 
>  arch/x86/include/asm/bootparam.h |  3 ++-
>  arch/x86/kernel/setup.c  |  3 +++
>  include/linux/cred.h |  2 ++
>  5 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
> index cf5437d..7f9ed48 100644
> --- a/Documentation/x86/zero-page.txt
> +++ b/Documentation/x86/zero-page.txt
> @@ -27,6 +27,8 @@ Offset  Proto   NameMeaning
>  1E9/001  ALL eddbuf_entries  Number of entries in eddbuf (below)
>  1EA/001  ALL edd_mbr_sig_buf_entries Number of entries in 
> edd_mbr_sig_buffer
>   (below)
> +1EB/001  ALL kbd_status  Numlock is enabled
> +1EC/001  ALL secure_boot Kernel should enable secure boot 
> lockdowns
>  290/040  ALL edd_mbr_sig_buffer EDD MBR signatures
>  2D0/A00  ALL e820_mapE820 memory map table
>   (array of struct e820entry)
> diff --git a/arch/x86/boot/compressed/eboot.c 
> b/arch/x86/boot/compressed/eboot.c
> index b3e0227..3789356 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -724,6 +724,36 @@ fail:
>   return status;
>  }
>  
> +static int get_secure_boot(efi_system_table_t *_table)
> +{
> + u8 sb, setup;
> + unsigned long datasize = sizeof(sb);
> + efi_guid_t var_guid = EFI_GLOBAL_VARIABLE_GUID;
> + efi_status_t status;
> +
> + status = efi_call_phys5(sys_table->runtime->get_variable,
> + L"SecureBoot", &var_guid, NULL, &datasize, &sb);
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + if (sb == 0)
> + return 0;
> +
> +
> + status = efi_call_phys5(sys_table->runtime->get_variable,
> + L"SetupMode", &var_guid, NULL, &datasize,
> + &setup);
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + if (setup == 1)
> + return 0;
> +
> + return 1;
> +}
> +
>  /*
>   * Because the x86 boot code expects to be passed a boot_params we
>   * need to create one ourselves (usually the bootloader would create
> @@ -1018,6 +1048,8 @@ struct boot_params *efi_main(void *handle, 
> efi_system_table_t *_table,
>   if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
>   goto fail;
>  
> + boot_params->secure_boot = get_secure_boot(sys_table);
> +
>   setup_graphics(boot_params);
>  
>   status = efi_call_phys3(sys_table->boottime->allocate_pool,
> diff --git a/arch/x86/include/asm/bootparam.h 
> b/arch/x86/include/asm/bootparam.h
> index 2ad874c..c7338e0 100644
> --- a/arch/x86/include/asm/bootparam.h
> +++ b/arch/x86/include/asm/bootparam.h
> @@ -114,7 +114,8 @@ struct boot_params {
>   __u8  eddbuf_entries;   /* 0x1e9 */
>   __u8  edd_mbr_sig_buf_entries;  /* 0x1ea */
>   __u8  kbd_status;   /* 0x1eb */
> - __u8  _pad6[5]; /* 0x1ec */
> + __u8  secure_boot;  /* 0x1ec */
> + __u8  _pad6[4]; /* 0x1ed */
>   struct setup_header hdr;/* setup header */  /* 0x1f1 */
>   __u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
>   __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];  /* 0x290 */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f4b9b80..239bf2a 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -947,6 +947,9 @@ void __init setup_arch(char **cmdline_p)
>  
>   io_delay_init();
>  
> + if (boot_params.secure_boot)
> + secureboot_enable();
> +
>   /*
>* Parse the ACPI tables for possible boot-time SMP configuration.
>*/
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index ebbed2c..a24faf1 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -170,6 +170,8 @@ extern int set_security_override_from_ctx(struct cred *, 
> const char *);
>  extern int set_create_files_as(struct cred *, struct inode *);
>  extern void __init cred_init(void);
>  
> +extern void secureboot_enable(void);
> +
>  /*
>   * check for validity of credentials
>   */
> -- 
> 1.7.11.4
> 

Re: [PATCH V2 07/10] Secure boot: Add a dummy kernel parameter that will switch on Secure Boot mode

2012-09-27 Thread Serge Hallyn
Quoting Matthew Garrett (m...@redhat.com):
> From: Josh Boyer 
> 
> This forcibly drops CAP_COMPROMISE_KERNEL from both cap_permitted and cap_bset
> in the init_cred struct, which everything else inherits from.  This works on
> any machine and can be used to develop even if the box doesn't have UEFI.
> 
> Signed-off-by: Josh Boyer 

Acked-by: Serge E. Hallyn 
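
As an illustration of what dropping the capability from both sets amounts to, here is a userspace sketch. It assumes capability sets are stored as two 32-bit words indexed by `cap >> 5` with bit `cap & 31`; the `model_*` types and helpers are hypothetical re-implementations for this example, not the kernel's own definitions. The value 37 is the one proposed for CAP_COMPROMISE_KERNEL in patch 01/10 of this series.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CAP_COMPROMISE_KERNEL 37	/* value proposed in patch 01/10 */

/* Two 32-bit words, because capability numbers exceed 31. */
struct model_cap_set {
	uint32_t cap[2];
};

struct model_cred {
	struct model_cap_set cap_permitted;
	struct model_cap_set cap_bset;
};

static void model_cap_lower(struct model_cap_set *set, int cap)
{
	assert(cap >= 0 && cap < 64);
	set->cap[cap >> 5] &= ~(UINT32_C(1) << (cap & 31));
}

static int model_cap_raised(const struct model_cap_set *set, int cap)
{
	return (set->cap[cap >> 5] >> (cap & 31)) & 1;
}

/* Mirrors what secureboot_enable() does to init_cred in the patch. */
static void model_secureboot_enable(struct model_cred *cred)
{
	model_cap_lower(&cred->cap_bset, MODEL_CAP_COMPROMISE_KERNEL);
	model_cap_lower(&cred->cap_permitted, MODEL_CAP_COMPROMISE_KERNEL);
}
```

Clearing the bit in the bounding set as well as the permitted set is what keeps descendants of init from regaining the capability later.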

> ---
>  kernel/cred.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/cred.c b/kernel/cred.c
> index de728ac..7e6e83f 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -623,6 +623,23 @@ void __init cred_init(void)
>0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
>  }
>  
> +void __init secureboot_enable()
> +{
> + pr_info("Secure boot enabled\n");
> + cap_lower((&init_cred)->cap_bset, CAP_COMPROMISE_KERNEL);
> + cap_lower((&init_cred)->cap_permitted, CAP_COMPROMISE_KERNEL);
> +}
> +
> +/* Dummy Secure Boot enable option to fake out UEFI SB=1 */
> +static int __init secureboot_enable_opt(char *str)
> +{
> + int sb_enable = !!simple_strtol(str, NULL, 0);
> + if (sb_enable)
> + secureboot_enable();
> + return 1;
> +}
> +__setup("secureboot_enable=", secureboot_enable_opt);
> +
>  /**
>   * prepare_kernel_cred - Prepare a set of credentials for a kernel service
>   * @daemon: A userspace daemon to be used as a reference
> -- 
> 1.7.11.4
> 


Re: [PATCH V2 01/10] Secure boot: Add new capability

2012-09-27 Thread Serge Hallyn
Quoting Matthew Garrett (m...@redhat.com):
> Secure boot adds certain policy requirements, including that root must not
> be able to do anything that could cause the kernel to execute arbitrary code.
> The simplest way to handle this would seem to be to add a new capability
> and gate various functionality on that. We'll then strip it from the initial
> capability set if required.
> 
> Signed-off-by: Matthew Garrett 

Acked-by: Serge E. Hallyn 

> ---
>  include/linux/capability.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index d10b7ed..4345bc8 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -364,7 +364,11 @@ struct cpu_vfs_cap_data {
>  
>  #define CAP_BLOCK_SUSPEND36
>  
> -#define CAP_LAST_CAP CAP_BLOCK_SUSPEND
> +/* Allow things that trivially permit root to modify the running kernel */
> +
> +#define CAP_COMPROMISE_KERNEL  37
> +
> +#define CAP_LAST_CAP CAP_COMPROMISE_KERNEL
>  
>  #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>  
> -- 
> 1.7.11.4
> 


Re: [PATCH RFC 1/2] gpio: Add a block GPIO API to gpiolib

2012-09-27 Thread Jean-Christophe PLAGNIOL-VILLARD
On 23:22 Thu 27 Sep , Roland Stigge wrote:
> The recurring task of providing simultaneous access to GPIO lines (especially
> for bit banging protocols) needs an appropriate API.
> 
> This patch adds a kernel internal "Block GPIO" API that enables simultaneous
> access to several GPIOs in the same gpio_chip (bit mapped). Further, it adds a
> sysfs interface (/sys/class/gpio/gpiochipXX/block).
> 
> Signed-off-by: Roland Stigge 
> 
> ---
> NOTE: This is only useful if individual drivers implement the .get_block() and
> .set_block() functions. I'm providing an example implementation for max730x
> (see next patch), and can provide further driver patches after API review.
> 
> Thanks in advance!
> 
>  Documentation/gpio.txt |   52 +++
>  drivers/gpio/gpiolib.c |  121 
> +
>  include/asm-generic/gpio.h |7 ++
>  include/linux/gpio.h   |   24 
>  4 files changed, 204 insertions(+)
> 
> --- linux-2.6.orig/Documentation/gpio.txt
> +++ linux-2.6/Documentation/gpio.txt
> @@ -439,6 +439,51 @@ slower clock delays the rising edge of S
>  signaling rate accordingly.
>  
>  
> +Block GPIO (optional)
> +-
> +
> +The above described interface concentrates on handling single GPIOs.  However,
> +in applications where it is critical to set several GPIOs at once, this
> +interface doesn't work well, e.g. bit-banging protocols via GPIO lines.
> +Consider a GPIO controller that is connected via a slow I2C line. When
> +switching two or more GPIOs one after another, there can be considerable time
> +between those events. This is solved by an interface called Block GPIO:
> +
> +void gpio_get_block(unsigned int gpio, u8* values, size_t size);
> +void gpio_set_block(unsigned int gpio, u8* set, u8* clr, size_t size);
> +
> +The function gpio_get_block() detects the current state of several GPIOs at
> +once, practically by doing only one query at the hardware level (e.g. memory
> +mapped or via bus transfers like I2C). There are some limits to this interface:
> +A certain gpio_chip (see below) must be specified via the gpio parameter as the
> +first GPIO in the gpio_chip group. The Block GPIO interface only supports
> +simultaneous handling of GPIOs in the same gpio_chip group since different
> +gpio_chips typically map to different GPIO hardware blocks.
So basically you use a GPIO number that you did not request, and that may
already be requested. This is broken: if you want to get or set a block, you
need to pass the list of GPIOs you want to control, not some fancy magic.

Otherwise this will end up as broken code.

And how can you hope to describe this via DT?
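
To illustrate the interface shape being asked for (an explicit list of GPIOs in, packed bits out), here is a minimal sketch. It is purely hypothetical: `model_gpio_get_list()` and `model_gpio_get_fn` are invented for this example and are not part of gpiolib. It also reads the lines one at a time, so it shows only the calling convention, not the single hardware access a real block implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-line accessor standing in for gpio_get_value(). */
typedef int (*model_gpio_get_fn)(unsigned int gpio);

/*
 * Read an explicit list of GPIOs and pack them into a bit mask:
 * bit n of the result is the level of gpios[n].
 */
static uint32_t model_gpio_get_list(const unsigned int *gpios, size_t count,
				    model_gpio_get_fn get)
{
	uint32_t values = 0;
	size_t n;

	assert(count <= 32);
	for (n = 0; n < count; n++) {
		if (get(gpios[n]))
			values |= UINT32_C(1) << n;
	}
	return values;
}

/* Test double: odd-numbered lines read high, even-numbered read low. */
static int model_odd_lines_high(unsigned int gpio)
{
	return gpio & 1;
}
```

With an explicit list, every line the caller touches is one it named (and could have requested) itself, rather than being implied by the first GPIO of a chip.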

Best Regards,
J.


Re: [PATCH 56/57] power: abx500_chargalg: Fix quick re-attach charger issue.

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:53AM -0600, mathieu.poir...@linaro.org wrote:
> From: Marcus Cooper 
> 
> The patch for 426250 added a change to check for the quick

What is 426250? I guess it's some internal bug#... but since we don't have
access to that info, it's better to describe which upstream commit caused
this.

> re-attachment of the charger connection as an error in the
> AB8500 HW meant that a quick detach/attach wouldn't be
> detected.
> This patch isolates the original change so that newer AB's
> are not affected.
> 
> Signed-off-by: Marcus Cooper 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Martin SJOBLOM 
> Reviewed-by: Hakan BERG 
> Reviewed-by: Jonas ABERG 
> ---
>  drivers/power/abx500_chargalg.c |   11 ++-
>  1 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
> index c8849af..7a81e4e 100644
> --- a/drivers/power/abx500_chargalg.c
> +++ b/drivers/power/abx500_chargalg.c
> @@ -1299,11 +1299,12 @@ static void abx500_chargalg_algorithm(struct 
> abx500_chargalg *di)
>   abx500_chargalg_check_charger_voltage(di);
>   charger_status = abx500_chargalg_check_charger_connection(di);
>  
> - ret = abx500_chargalg_check_charger_enable(di);
> - if (ret < 0)
> - dev_err(di->dev, "Checking charger if enabled error: %d line: 
> %d\n",
> - ret, __LINE__);
> -
> + if (is_ab8500(di->parent)) {
> + ret = abx500_chargalg_check_charger_enable(di);
> + if (ret < 0)
> + dev_err(di->dev, "Checking charger is enabled error");
> + dev_err(di->dev, ": Returned Value %d\n", ret);

Ouch. Missing braces. No need for two dev_err().
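
To spell out why the missing braces matter, here is a small userspace demonstration. A counter stands in for dev_err(), and all `model_*` names are hypothetical. In the first function the second call is not guarded by the `if`, so it runs even when ret is not an error.

```c
#include <assert.h>

/* Counts log calls; stands in for dev_err() in this illustration. */
static int model_err_count;

static void model_dev_err(void)
{
	model_err_count++;
}

/* As posted: without braces only the first call belongs to the if. */
static void model_check_without_braces(int ret)
{
	if (ret < 0)
		model_dev_err();
	model_dev_err();	/* indented under the if in the patch, but always runs */
}

/* A single message under the if; no braces needed. */
static void model_check_fixed(int ret)
{
	if (ret < 0)
		model_dev_err();
}

/* Run one of the checks and report how many messages it logged. */
static int model_errors_for(void (*check)(int), int ret)
{
	assert(check);
	model_err_count = 0;
	check(ret);
	return model_err_count;
}
```

With the posted version a successful call (ret == 0) still logs one error line, which is the symptom the braces comment points at.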

> + }
>   /*
>* First check if we have a charger connected.
>* Also we don't allow charging of unknown batteries if configured
> -- 
> 1.7.5.4


Re: [PATCH 55/57] power: ab8500_charger: Add UsbLineCtrl2 reference

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:52AM -0600, mathieu.poir...@linaro.org wrote:
> From: Marcus Cooper 
> 
> When the state of USB Charge detection is changed then the calls
> use a define for another register in other bank. This change
> creates a new define for the correct register and removes the
> magic numbers that are present.
> 
> Signed-off-by: Marcus Cooper 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Hakan BERG 
> Reviewed-by: Jonas ABERG 
> 
> Conflicts:
> 
>   drivers/power/ab8500_charger.c

Stray comment.

> ---
>  drivers/power/ab8500_charger.c   |   11 +--
>  include/linux/mfd/abx500/ab8500-bm.h |1 +
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
> index 7f8f362..afb4fda 100644
> --- a/drivers/power/ab8500_charger.c
> +++ b/drivers/power/ab8500_charger.c
> @@ -51,6 +51,7 @@
>  #define VBUS_DET_DBNC1   0x01
>  #define OTP_ENABLE_WD0x01
>  #define DROP_COUNT_RESET 0x01
> +#define USB_CH_DET   0x01
>  
>  #define MAIN_CH_INPUT_CURR_SHIFT 4
>  #define VBUS_IN_CURR_LIM_SHIFT   4
> @@ -2287,9 +2288,8 @@ static void ab8500_charger_usb_link_status_work(struct 
> work_struct *work)
>   USB_CH_ENA, USB_CH_ENA);
>   /*Enable charger detection*/
>   abx500_mask_and_set_register_interruptible(di->dev,
> - AB8500_USB,
> - AB8500_MCH_IPT_CURLVL_REG,
> - 0x01, 0x01);
> + AB8500_USB, AB8500_USB_LINE_CTRL2_REG,
> + USB_CH_DET, USB_CH_DET);
>   di->invalid_charger_detect_state = 1;
>   /*exit and wait for new link status interrupt.*/
>   return;
> @@ -2300,9 +2300,8 @@ static void ab8500_charger_usb_link_status_work(struct 
> work_struct *work)
>   "Invalid charger detected, state= 1\n");
>   /*Stop charger detection*/
>   abx500_mask_and_set_register_interruptible(di->dev,
> - AB8500_USB,
> - AB8500_MCH_IPT_CURLVL_REG,
> - 0x01, 0x00);
> + AB8500_USB, AB8500_USB_LINE_CTRL2_REG,
> + USB_CH_DET, 0x00);
>   /*Check link status*/
>   if (is_ab8500(di->parent))
>   ret = abx500_get_register_interruptible(di->dev,
> diff --git a/include/linux/mfd/abx500/ab8500-bm.h 
> b/include/linux/mfd/abx500/ab8500-bm.h
> index 721bd6d..6b69ad5 100644
> --- a/include/linux/mfd/abx500/ab8500-bm.h
> +++ b/include/linux/mfd/abx500/ab8500-bm.h
> @@ -23,6 +23,7 @@
>   * Bank : 0x5
>   */
>  #define AB8500_USB_LINE_STAT_REG 0x80
> +#define AB8500_USB_LINE_CTRL2_REG0x82
>  #define AB8500_USB_LINK1_STAT_REG0x94
>  
>  /*
> -- 
> 1.7.5.4


Re: [PATCH 54/57] power: ab8500_charger: Use USBLink1Status Register

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:51AM -0600, mathieu.poir...@linaro.org wrote:
> From: Marcus Cooper 
> 
> The newer AB's such as the AB8505, AB9540 etc include a
> USBLink1 Status register which detects a larger range of
> external devices. This should be used instead of the
> USBLine Status register.
> 
> Signed-off-by: Marcus Cooper 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Hakan BERG 
> Reviewed-by: Yang QU 
> Reviewed-by: Jonas ABERG 
> ---
>  drivers/power/ab8500_charger.c |   22 --
>  1 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
> index 3a97012..7f8f362 100644
> --- a/drivers/power/ab8500_charger.c
> +++ b/drivers/power/ab8500_charger.c
> @@ -2258,8 +2258,13 @@ static void ab8500_charger_usb_link_status_work(struct 
> work_struct *work)
>* to start the charging process. but by jumping
>* thru a few hoops it can be forced to start.
>*/
> - ret = abx500_get_register_interruptible(di->dev, AB8500_USB,
> - AB8500_USB_LINE_STAT_REG, &val);
> + if (is_ab8500(di->parent))
> + ret = abx500_get_register_interruptible(di->dev, AB8500_USB,
> + AB8500_USB_LINE_STAT_REG, &val);
> + else
> + ret = abx500_get_register_interruptible(di->dev, AB8500_USB,
> + AB8500_USB_LINK1_STAT_REG, &val);

How about

int reg = is_ab8500(di->parent) ? AB8500_USB_LINE_STAT_REG :
  AB8500_USB_LINK1_STAT_REG;

ret = abx500_get_register_interruptible(di->dev, AB8500_USB, reg, &val);

Shorter, clearer, and precisely fits into 80 columns -- must be good. :-)

> +
>   if (ret >= 0)
>   dev_dbg(di->dev, "UsbLineStatus register = 0x%02x\n", val);
>   else
> @@ -2299,10 +2304,15 @@ static void 
> ab8500_charger_usb_link_status_work(struct work_struct *work)
>   AB8500_MCH_IPT_CURLVL_REG,
>   0x01, 0x00);
>   /*Check link status*/
> - ret = abx500_get_register_interruptible(di->dev,
> - AB8500_USB,
> - AB8500_USB_LINE_STAT_REG,
> - &val);
> + if (is_ab8500(di->parent))
> + ret = abx500_get_register_interruptible(di->dev,
> + AB8500_USB, AB8500_USB_LINE_STAT_REG,
> + &val);
> + else
> + ret = abx500_get_register_interruptible(di->dev,
> + AB8500_USB, AB8500_USB_LINK1_STAT_REG,
> + &val);
> +

Same here. Actually, isn't it exactly the same as above? If so, then just
factor it into its own function.
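
For illustration, a minimal user-space sketch of such a helper. The two
register offsets are the ones from ab8500-bm.h quoted in this series;
struct charger_stub, stub_get_register() and the helper name
charger_read_usb_status() are made up for the example and only stand in
for the driver's di, abx500_get_register_interruptible() and whatever
name the author prefers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets as quoted from ab8500-bm.h in this series. */
#define AB8500_USB_LINE_STAT_REG	0x80
#define AB8500_USB_LINK1_STAT_REG	0x94

/* Hypothetical stand-in for the driver's device info structure. */
struct charger_stub {
	bool is_ab8500;		/* stands in for is_ab8500(di->parent) */
	uint8_t regs[256];	/* fake register bank for the sketch */
};

/* Stand-in for abx500_get_register_interruptible(); always succeeds. */
static int stub_get_register(struct charger_stub *di, uint8_t reg,
			     uint8_t *val)
{
	*val = di->regs[reg];
	return 0;
}

/*
 * Hypothetical helper: pick the status register once, so that both
 * call sites share the same selection logic.
 */
static int charger_read_usb_status(struct charger_stub *di, uint8_t *val)
{
	uint8_t reg = di->is_ab8500 ? AB8500_USB_LINE_STAT_REG :
				      AB8500_USB_LINK1_STAT_REG;

	return stub_get_register(di, reg, val);
}
```

With something along these lines, each of the two places in
ab8500_charger_usb_link_status_work() would shrink to a single call.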

>   dev_dbg(di->dev, "USB link status= 0x%02x\n",
>   (val & link_status) >> USB_LINK_STATUS_SHIFT);
>   di->invalid_charger_detect_state = 2;
> -- 
> 1.7.5.4


Re: [PATCH 53/57] power: ab8500_fg: Moving structure definitions to header file

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:50AM -0600, mathieu.poir...@linaro.org wrote:
> From: "Mathieu J. Poirier" 
> 
> Signed-off-by: Mathieu Poirier 
> ---
[...]
> diff --git a/drivers/power/ab8500_fg.h b/drivers/power/ab8500_fg.h
> new file mode 100644
> index 000..5cfadc2
> --- /dev/null
> +++ b/drivers/power/ab8500_fg.h
> @@ -0,0 +1,201 @@
> +/*
> + * Copyright (C) ST-Ericsson AB 2012
> + *
> + * Main and Back-up battery management driver.
> + *
> + * Note: Backup battery management is required in case of Li-Ion battery and 
> not
> + * for capacitive battery. HREF boards have capacitive battery and hence 
> backup
> + * battery management is not used and the supported code is available in this
> + * driver.
> + *
> + * License Terms: GNU General Public License v2
> + * Author: Johan Palsson 
> + * Author: Karl Komierowski 
> + */
> +

The change is dubious, since you don't seem to use the header anywhere
outside of _fg, so technically there's no need for it.

But if you want to logically separate structs and definitions, it's fine.

OK, but at least you have to check for multiple inclusions, I guess.
I.e. #ifndef __AB8500_FG_H...
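
A sketch of the guard, assuming the __AB8500_FG_H name suggested above;
the three defines are copied from the quoted patch and the rest of the
header is only indicated by a comment:

```c
#ifndef __AB8500_FG_H
#define __AB8500_FG_H

#define MILLI_TO_MICRO			1000
#define FG_LSB_IN_MA			1627
#define QLSB_NANO_AMP_HOURS_X10		1129

/* ... remaining definitions and structs moved from ab8500_fg.c ... */

#endif /* __AB8500_FG_H */
```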

> +#define MILLI_TO_MICRO   1000
> +#define FG_LSB_IN_MA 1627
> +#define QLSB_NANO_AMP_HOURS_X10  1129
> +#define INS_CURR_TIMEOUT (3 * HZ)


Re: [PATCH 0/3 v2] Optimize CRC32C calculation using PCLMULQDQ in crc32c-intel module

2012-09-27 Thread H. Peter Anvin

On 09/27/2012 03:44 PM, Tim Chen wrote:

> Version 2
> This version of the patch series fixes compilation errors for
> 32 bit x86 targets.
>
> Version 1
> This patch series optimized CRC32C calculations with PCLMULQDQ
> instruction for crc32c-intel module.  It speeds up the original
> implementation by 1.6x for 1K buffer and by 3x for buffer 4k or
> more.  The tcrypt module was enhanced for doing speed test
> on crc32c calculations.



Herbert - you are handling this one, right?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH 52/57] power: abx500_chargalg: Use hrtimer

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:49AM -0600, mathieu.poir...@linaro.org wrote:
> From: Hakan Berg 
> 
> Timers used for charging safety and maintenance must work even when
> CPU is power collapsed. By using hrtimers with realtime clock, system
> is able to trigger an alarm that wakes the CPU up and make it possible
> to handle the event.
> 
> Allow a little slack of 5 minutes to the hrtimers to allow CPU to be
> waked up in a more optimal power saving way. A 5 minute delay to
> time out timers on hours does not impact on safety.
> 
> Signed-off-by: Hakan Berg 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Mian Yousaf KAUKAB 
> ---
>  drivers/power/abx500_chargalg.c |   94 
> ++-
>  1 files changed, 53 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
> index 636d970..c8849af 100644
> --- a/drivers/power/abx500_chargalg.c
> +++ b/drivers/power/abx500_chargalg.c
> @@ -1,5 +1,6 @@
>  /*
>   * Copyright (C) ST-Ericsson SA 2012
> + * Copyright (c) 2012 Sony Mobile Communications AB
>   *
>   * Charging algorithm driver for abx500 variants
>   *
> @@ -8,11 +9,13 @@
>   *   Johan Palsson 
>   *   Karl Komierowski 
>   *   Arun R Murthy 
> + *   Imre Sunyi 
>   */
>  
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -32,6 +35,12 @@
>  /* End-of-charge criteria counter */
>  #define EOC_COND_CNT 10
>  
> +/* One hour expressed in seconds */
> +#define ONE_HOUR_IN_SECONDS  3600
> +
> +/* Five minutes expressed in seconds */
> +#define FIVE_MINUTES_IN_SECONDS  300
> +
>  #define to_abx500_chargalg_device_info(x) container_of((x), \
>   struct abx500_chargalg, chargalg_psy);
>  
> @@ -245,8 +254,8 @@ struct abx500_chargalg {
>   struct delayed_work chargalg_periodic_work;
>   struct delayed_work chargalg_wd_work;
>   struct work_struct chargalg_work;
> - struct timer_list safety_timer;
> - struct timer_list maintenance_timer;
> + struct hrtimer safety_timer;
> + struct hrtimer maintenance_timer;
>   struct kobject chargalg_kobject;
>  };
>  
> @@ -261,38 +270,47 @@ BLOCKING_NOTIFIER_HEAD(charger_notifier_list);
>  
>  /**
>   * abx500_chargalg_safety_timer_expired() - Expiration of the safety timer
> - * @data:pointer to the abx500_chargalg structure
> + * @timer:   pointer to the hrtimer structure
>   *
>   * This function gets called when the safety timer for the charger
>   * expires
>   */
> -static void abx500_chargalg_safety_timer_expired(unsigned long data)
> +static enum hrtimer_restart
> +abx500_chargalg_safety_timer_expired(struct hrtimer *timer)
>  {
> - struct abx500_chargalg *di = (struct abx500_chargalg *) data;
> + struct abx500_chargalg *di = container_of(timer, struct abx500_chargalg,
> + safety_timer);

Empty line here.

>   dev_err(di->dev, "Safety timer expired\n");
>   di->events.safety_timer_expired = true;
>  
>   /* Trigger execution of the algorithm instantly */
>   queue_work(di->chargalg_wq, &di->chargalg_work);
> +
> + return HRTIMER_NORESTART;
>  }
>  
>  /**
>   * abx500_chargalg_maintenance_timer_expired() - Expiration of
>   * the maintenance timer
> - * @i:   pointer to the abx500_chargalg structure
> + * @timer:   pointer to the timer structure
>   *
>   * This function gets called when the maintenence timer
>   * expires
>   */
> -static void abx500_chargalg_maintenance_timer_expired(unsigned long data)
> +static enum hrtimer_restart
> +abx500_chargalg_maintenance_timer_expired(struct hrtimer *timer)
> +
>  {
>  
> - struct abx500_chargalg *di = (struct abx500_chargalg *) data;
> + struct abx500_chargalg *di = container_of(timer, struct abx500_chargalg,
> + maintenance_timer);
>   dev_dbg(di->dev, "Maintenance timer expired\n");
>   di->events.maintenance_timer_expired = true;
>  
>   /* Trigger execution of the algorithm instantly */
>   queue_work(di->chargalg_wq, &di->chargalg_work);
> +
> + return HRTIMER_NORESTART;
>  }
>  
>  /**
> @@ -392,19 +410,16 @@ static int 
> abx500_chargalg_check_charger_connection(struct abx500_chargalg *di)
>   */
>  static void abx500_chargalg_start_safety_timer(struct abx500_chargalg *di)
>  {
> - unsigned long timer_expiration = 0;
> + /* Charger-dependent expiration time in hours*/
> + int timer_expiration = 0;
>  
>   switch (di->chg_info.charger_type) {
>   case AC_CHG:
> - timer_expiration =
> - round_jiffies(jiffies +
> - (di->bat->main_safety_tmr_h * 3600 * HZ));
> + timer_expiration = di->bat->main_safety_tmr_h;
>   break;
>  
>   case USB_CHG:
> - timer_expiration =
> - round_jiffies(jiffies +
> - (di->bat->usb_safety_tmr_h 

Re: [PATCH 51/57] power: ab8500: Re-alignment with internal developement.

2012-09-27 Thread Anton Vorontsov
On Thu, Sep 27, 2012 at 07:35:10PM -0700, Anton Vorontsov wrote:
[...]
> abx500_chargalg_check_safety_timer()
> {
>   if (di->batt_data.percent < 100) {
>   dev_dbg(di->dev, "stopping safety timer\n");
>   abx500_chargalg_stop_safety_timer(di);
>   return;
>   }
> 
>   dev_dbg(di->dev, "starting safety timer\n");
>   abx500_chargalg_start_safety_timer(di);
> }
> 
> The thing is, restarting an already pending timer is no-op, unless you
> program the timer before the previously programmed value.

Oh, actually, no. It's no-op if old expires == new expires. :-/

So, yes, you need to check for pending before (re)starting. So, it'll
become

if (pending)
return;
abx500_chargalg_start_safety_timer(di);

(Or better, start_safety_timer() should do that, and it seems that it
already does.)
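
To make the intent concrete, here is a small user-space model of it.
Nothing below is kernel API: struct timer_stub and the three functions
are invented for the sketch, and the pending flag plays the role of
timer_pending().

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel timer. */
struct timer_stub {
	bool pending;			/* models timer_pending() */
	unsigned long expires;		/* programmed deadline */
	unsigned int arm_count;		/* how often the timer was armed */
};

/* Start the timer only if it is not armed yet. */
static void start_safety_timer(struct timer_stub *t, unsigned long now,
			       unsigned long timeout)
{
	if (t->pending)
		return;		/* keep the original deadline */

	t->expires = now + timeout;
	t->pending = true;
	t->arm_count++;
}

/* Stopping a timer that is not armed is harmless. */
static void stop_safety_timer(struct timer_stub *t)
{
	t->pending = false;
}

/* The caller no longer needs to look at the pending state itself. */
static void check_safety_timer(struct timer_stub *t, int percent,
			       unsigned long now, unsigned long timeout)
{
	if (percent < 100) {
		stop_safety_timer(t);
		return;
	}

	start_safety_timer(t, now, timeout);
}
```

The point of the model is only that repeated calls at 100% leave the
first deadline untouched, which is the behaviour the safety timer needs.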

Thanks,
Anton.


[PATCH v4 1/2] ACPI: Add early console framework for DBGP/DBG2.

2012-09-27 Thread Lv Zheng
Microsoft Debug Port Table (DBGP or DBG2) is required for Windows SoC
platforms.  This patch is introduced to fix the gap between Windows
and Linux.

Signed-off-by: Lv Zheng 
---
 Documentation/kernel-parameters.txt |1 +
 arch/x86/Kconfig.debug  |   15 +++
 arch/x86/kernel/acpi/boot.c |1 +
 arch/x86/kernel/early_printk.c  |   13 +++
 drivers/acpi/Makefile   |2 +
 drivers/acpi/early_printk.c |  201 +++
 include/linux/acpi.h|   24 +
 7 files changed, 257 insertions(+)
 create mode 100644 drivers/acpi/early_printk.c

Index: linux-acpi/Documentation/kernel-parameters.txt
===
--- linux-acpi.orig/Documentation/kernel-parameters.txt 2012-09-27 
22:35:07.0 +0800
+++ linux-acpi/Documentation/kernel-parameters.txt  2012-09-27 
22:35:44.0 +0800
@@ -763,6 +763,7 @@
earlyprintk=serial[,ttySn[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
+   earlyprintk=acpi[debugController#]
 
Append ",keep" to not disable it when the real console
takes over.
Index: linux-acpi/arch/x86/Kconfig.debug
===
--- linux-acpi.orig/arch/x86/Kconfig.debug  2012-09-27 22:35:07.0 
+0800
+++ linux-acpi/arch/x86/Kconfig.debug   2012-09-27 22:35:44.0 +0800
@@ -59,6 +59,21 @@
  with klogd/syslogd or the X server. You should normally N here,
  unless you want to debug such a crash. You need usb debug device.
 
+config EARLY_PRINTK_ACPI
+   bool "Early printk launcher via ACPI debug port tables"
+   depends on EARLY_PRINTK && ACPI
+   ---help---
+ Write kernel log output directly into the debug ports described
+ in the ACPI tables known as DBGP and DBG2.
+
+ To enable such debugging facilities, you need to enable this
+ configuration option and append the "earlyprintk=acpi" kernel
+ parameter through the boot loaders.  Please refer to
+ "Documentation/kernel-parameters.txt" for details.  Since this
+ is an early console launcher, you still need to enable actual
+ early console drivers that are suitable for your platform.
+ If in doubt, say "N".
+
 config DEBUG_STACKOVERFLOW
bool "Check for stack overflows"
depends on DEBUG_KERNEL
Index: linux-acpi/arch/x86/kernel/acpi/boot.c
===
--- linux-acpi.orig/arch/x86/kernel/acpi/boot.c 2012-09-27 22:35:07.0 
+0800
+++ linux-acpi/arch/x86/kernel/acpi/boot.c  2012-09-27 22:35:13.0 
+0800
@@ -1518,6 +1518,7 @@
return;
}
 
+   acpi_early_console_parse();
acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
/*
Index: linux-acpi/arch/x86/kernel/early_printk.c
===
--- linux-acpi.orig/arch/x86/kernel/early_printk.c  2012-09-27 
22:35:07.0 +0800
+++ linux-acpi/arch/x86/kernel/early_printk.c   2012-09-27 22:35:44.0 
+0800
@@ -200,6 +200,15 @@
register_console(early_console);
 }
 
+#ifdef CONFIG_EARLY_PRINTK_ACPI
+#include 
+
+int __init acpi_early_console_setup(struct acpi_debug_port *info)
+{
+   return 0;
+}
+#endif
+
 static int __init setup_early_printk(char *buf)
 {
int keep;
@@ -236,6 +245,10 @@
if (!strncmp(buf, "dbgp", 4) && !early_dbgp_init(buf + 4))
early_console_register(&early_dbgp_console, keep);
 #endif
+#ifdef CONFIG_EARLY_PRINTK_ACPI
+   if (!strncmp(buf, "acpi", 4))
+   acpi_early_console_init(buf + 4, keep);
+#endif
 #ifdef CONFIG_HVC_XEN
if (!strncmp(buf, "xen", 3))
early_console_register(&xenboot_console, keep);
Index: linux-acpi/drivers/acpi/Makefile
===
--- linux-acpi.orig/drivers/acpi/Makefile   2012-09-27 22:35:07.0 
+0800
+++ linux-acpi/drivers/acpi/Makefile2012-09-27 22:35:36.0 +0800
@@ -46,6 +46,8 @@
 acpi-y += video_detect.o
 endif
 
+obj-$(CONFIG_EARLY_PRINTK_ACPI)+= early_printk.o
+
 # These are (potentially) separate modules
 obj-$(CONFIG_ACPI_AC)  += ac.o
 obj-$(CONFIG_ACPI_BUTTON)  += button.o
Index: linux-acpi/drivers/acpi/early_printk.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-acpi/drivers/acpi/early_printk.c  2012-09-27 22:35:13.0 
+0800
@@ -0,0 +1,201 @@
+/*
+ *  acpi/early_printk.c - ACPI Boot-Time Debug Ports
+ *
+ *  Copyright (C) 2012 Lv Zheng 
+ *
+ * 

[PATCH v4 2/2] ACPI: Add Intel MID SPI early console support.

2012-09-27 Thread Lv Zheng
DesignWare SPI UART is used as one of the debug ports on Low Power Intel
Architecture (LPIA) platforms.  This patch is introduced to support this
debugging console reported by ACPI DBGP/DBG2.  The original MID SPI
early console stuff is also refined to co-exist with the new ACPI usage
model.

Signed-off-by: Lv Zheng 
---
 Documentation/kernel-parameters.txt|1 +
 arch/x86/Kconfig.debug |   23 +++
 arch/x86/include/asm/mrst.h|2 +-
 arch/x86/kernel/early_printk.c |   12 +-
 arch/x86/platform/mrst/early_printk_mrst.c |  186 +--
 drivers/platform/x86/Makefile  |2 +
 drivers/platform/x86/early/Makefile|5 +
 drivers/platform/x86/early/intel_mid_spi.c |  220 
 include/acpi/actbl2.h  |1 +
 include/linux/intel_mid_early.h|   12 ++
 10 files changed, 283 insertions(+), 181 deletions(-)
 create mode 100644 drivers/platform/x86/early/Makefile
 create mode 100644 drivers/platform/x86/early/intel_mid_spi.c
 create mode 100644 include/linux/intel_mid_early.h

Index: linux-acpi/arch/x86/Kconfig.debug
===
--- linux-acpi.orig/arch/x86/Kconfig.debug  2012-09-27 22:35:13.0 
+0800
+++ linux-acpi/arch/x86/Kconfig.debug   2012-09-27 22:35:14.0 +0800
@@ -43,9 +43,32 @@
  with klogd/syslogd or the X server. You should normally N here,
  unless you want to debug such a crash.
 
+config EARLY_PRINTK_INTEL_MID_SPI
+   bool "Early printk for Intel MID SPI UART port"
+   depends on EARLY_PRINTK
+   ---help---
+ Write kernel log output directly into the MID SPI UART debug port.
+
+ Intel MID platforms are using DesignWare SPI UART as its debug
+ console.  This option does not introduce actual early console into
+ the kernel binary, but is required by a real early console
+ implementation (EARLY_PRINTK_INTEL_MID or EARLY_PRINTK_ACPI).
+ You should normally N here unless you need to do kernel booting
+ development.
+
 config EARLY_PRINTK_INTEL_MID
bool "Early printk for Intel MID platform support"
depends on EARLY_PRINTK && X86_INTEL_MID
+   select EARLY_PRINTK_INTEL_MID_SPI
+   ---help---
+ Write kernel log output directly into the MID SPI UART debug port.
+
+ Intel MID platforms are always equipped with SPI debug ports and
+ USB OTG debug ports. To enable these debugging facilities, you
+ need to pass "earlyprintk=mrst" parameter to the kernel through
+ boot loaders.  Please see "Documentation/kernel-parameters.txt" for
+ details.  You should normally N here unless you need to do kernel
+ booting development.
 
 config EARLY_PRINTK_DBGP
bool "Early printk via EHCI debug port"
Index: linux-acpi/arch/x86/include/asm/mrst.h
===
--- linux-acpi.orig/arch/x86/include/asm/mrst.h 2012-09-27 22:35:05.0 
+0800
+++ linux-acpi/arch/x86/include/asm/mrst.h  2012-09-27 22:35:41.0 
+0800
@@ -12,6 +12,7 @@
 #define _ASM_X86_MRST_H
 
 #include 
+#include 
 
 extern int pci_mrst_init(void);
 extern int __init sfi_parse_mrtc(struct sfi_table_header *table);
@@ -63,7 +64,6 @@
 #define SFI_MTMR_MAX_NUM 8
 #define SFI_MRTC_MAX   8
 
-extern struct console early_mrst_console;
 extern void mrst_early_console_init(void);
 
 extern struct console early_hsu_console;
Index: linux-acpi/arch/x86/kernel/early_printk.c
===
--- linux-acpi.orig/arch/x86/kernel/early_printk.c  2012-09-27 
22:35:13.0 +0800
+++ linux-acpi/arch/x86/kernel/early_printk.c   2012-09-27 22:35:42.0 
+0800
@@ -205,6 +205,16 @@
 
 int __init acpi_early_console_setup(struct acpi_debug_port *info)
 {
+#ifdef CONFIG_EARLY_PRINTK_INTEL_MID_SPI
+   if (info->port_type == ACPI_DBG2_SERIAL_PORT
+   && info->port_subtype == ACPI_DBG2_INTEL_MID_SPI
+   && info->register_count > 0) {
+   mid_spi_early_console_init((u32)(info->registers[0].address));
+   early_console_register(_spi_early_console,
+  acpi_early_console_keep());
+   }
+#endif
+
return 0;
 }
 #endif
@@ -256,7 +266,7 @@
 #ifdef CONFIG_EARLY_PRINTK_INTEL_MID
if (!strncmp(buf, "mrst", 4)) {
mrst_early_console_init();
-   early_console_register(&early_mrst_console, keep);
+   early_console_register(_spi_early_console, keep);
}
 
if (!strncmp(buf, "hsu", 3)) {
Index: linux-acpi/arch/x86/platform/mrst/early_printk_mrst.c
===
--- linux-acpi.orig/arch/x86/platform/mrst/early_printk_mrst.c  2012-09-27 

[PATCH v4 0/2] ACPI: DBGP/DBG2 early console support for LPIA.

2012-09-27 Thread Lv Zheng
Microsoft Debug Port Table (DBGP or DBG2) is used by the Windows SoC
platforms to describe their debugging facilities.
Recent Low Power Intel Architecture (LPIA) platforms have utilized
this for the SPI UART debug ports that are resident on their debug
boards.

This patch set enables the DBGP/DBG2 debug ports as a Linux early
console launcher.
The SPI UART debug port support is also refined to co-exist with this
new usage model.

To use this facility on LPIA platforms, you need to enable the following
kernel configurations:
  CONFIG_EARLY_PRINTK_ACPI=y
  CONFIG_EARLY_PRINTK_INTEL_MID_SPI=y
Then you need to append the following kernel parameter to the kernel
command line in your boot loader configuration file:
  earlyprintk=acpi

There is a dilemma in designing this patch set.  There should be three
steps to enable an early console for an operating system:
1. Probe: In this stage, the Linux kernel can detect the early consoles
  and the base address of their register block can be determined.
  This can be done by parsing the descriptors in the ACPI DBGP/DBG2
  tables.  Note that acpi_table_init() must be called before
  parsing.
2. Setup: In this stage, the Linux kernel can apply user specified
  configuration options (e.g. the baud rate of serial ports) for the
  early consoles.  This is done by parsing the early parameters
  passed to the kernel from the boot loaders.  Note that
  parse_early_param() is called very early to allow parameters to
  be passed to other kernel subsystems.
3. Start: In this stage, the Linux kernel can make the console available
  to output messages.  Since early consoles are always used for
  kernel boot up debugging, this must be done as early as possible
  so that as many kernel subsystems as possible can be debugged.
  Note that this stage happens when register_console() is
  called.
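
The three stages above can be modelled in a few lines of user-space C.
This is only a sketch of the ordering constraint and not code from the
patch set: struct early_con_stub, con_probe(), con_setup() and
con_start() are hypothetical names, the table lookup is reduced to an
address passed in by the caller, and the ",<baudrate>" option format
and the 115200 default are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state of one early console; not a kernel structure. */
struct early_con_stub {
	uint64_t base;		/* register block address from the probe */
	unsigned int baud;	/* user option applied by the setup */
	bool probed;
	bool started;
};

/* 1. Probe: take the base address from a (fake) table descriptor. */
static int con_probe(struct early_con_stub *c, uint64_t table_addr)
{
	if (!table_addr)
		return -1;	/* no debug port described */

	c->base = table_addr;
	c->probed = true;
	return 0;
}

/* 2. Setup: apply user options such as ",9600" (assumed format). */
static void con_setup(struct early_con_stub *c, const char *opts)
{
	c->baud = 115200;	/* assumed default for the sketch */
	if (opts && opts[0] == ',')
		c->baud = (unsigned int)strtoul(opts + 1, NULL, 10);
}

/* 3. Start: only possible once the hardware has been probed. */
static int con_start(struct early_con_stub *c)
{
	if (!c->probed)
		return -1;

	c->started = true;
	return 0;
}
```

The model shows why the order matters: con_start() has nothing to talk
to until con_probe() has run, whatever the user passed to con_setup().
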
The preferred sequence for the above steps is:
   +-----------------+    +-------------------+    +--------------------+
   | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
   +-----------------+    +-------------------+    +--------------------+
But unfortunately, in the current x86 implementation, early parameters and
early printk initialization are called before acpi_table_init(), which
requires the early memory mapping facility.
There are some choices for me to design this patch set:
1. Invoking acpi_table_init() before parse_early_param() to maintain the
   sequence:
   +-----------------+    +-------------------+    +--------------------+
   | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
   +-----------------+    +-------------------+    +--------------------+
   This requires other subsystem maintainers' review to ensure no
   regressions will be introduced.  As far as I know, one kind of issue
   might be found in the EFI subsystem:
   The EFI boot services and runtime services are mixed up in the x86
   specific initialization process before the ACPI table initialization.
   Worse, you cannot even disable the runtime services at kernel
   compilation time while still allowing the boot services code to be
   executed.  Enabling the early consoles after the ACPI table
   initialization will make it difficult to debug the runtime BIOS bugs.
   If any progress is made to the kernel boot sequences, please let me
   know.  I'll be willing to redesign the ACPI DBGP/DBG2 console probing
   facility.  You can reach me at .
2. Modifying the above sequence to make it look like:
   +-------------------+    +-----------------+    +--------------------+
   | EARLY_PARAM SETUP | -> | ACPI DBGP PROBE | -> | EARLY_PRINTK START |
   +-------------------+    +-----------------+    +--------------------+
   Early consoles started in this style will lose some debuggability
   during kernel boot up.  If the system does not crash very early,
   developers can still see the buffered kernel output when
   register_console() is called.
   The current early console implementations need to be modified to split
   their initialization code into two parts:
   1. Detecting hardware.  This can be called in the PROBE stage.
   2. Applying user parameters.  This can be called in the SETUP stage.
   Individual early console driver maintainers need to be involved to avoid
   regressions that might occur with this modification, as the maintainers
   can run real tests that I cannot.
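3. Introducing a brand-new debugging facility that does not relate to the
   current early console implementation to allow automatic detection of
   the early consoles.
   +-------------------+    +--------------------+
   | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
   +-------------------+    +--------------------+
   +-----------------+    +--------------------+
   | ACPI DBGP PROBE | -> | EARLY_PRINTK START |
   +-----------------+    +--------------------+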

Re: [PATCH 51/57] power: ab8500: Re-alignment with internal developement.

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:48AM -0600, mathieu.poir...@linaro.org wrote:
> From: "Mathieu J. Poirier" 
> 
> A lot of development happened internally since the first
> mainlining of the battery management driver.  Most of the
> new code can be historically accounted for but some of it
> can't.
> 
> This patch is a gathering of the code for which history was
> lost but still relevant to the well being of the driver.

Nope, sorry. The patch has to be logically separated and the description
should precisely describe the change(s), not 'let's gather some missing
bits.'

> Signed-off-by: Mathieu Poirier 
> ---
>  drivers/power/ab8500_charger.c  |2 +-
>  drivers/power/abx500_chargalg.c |   66 
> +++
>  2 files changed, 47 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
> index 1290470..3a97012 100644
> --- a/drivers/power/ab8500_charger.c
> +++ b/drivers/power/ab8500_charger.c
> @@ -720,7 +720,7 @@ static int ab8500_charger_max_usb_curr(struct 
> ab8500_charger *di,
>   di->is_aca_rid = 0;
>   break;
>   case USB_STAT_ACA_RID_C_HS_CHIRP:
> - case USB_STAT_ACA_RID_C_NM:
> + case USB_STAT_ACA_RID_C_NM:
>   di->max_usb_in_curr.usb_type_max = USB_CH_IP_CUR_LVL_1P5;
>   di->is_aca_rid = 1;
>   break;
> diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
> index 1df238f..636d970 100644
> --- a/drivers/power/abx500_chargalg.c
> +++ b/drivers/power/abx500_chargalg.c
> @@ -27,7 +27,7 @@
>  #include 
>  
>  /* Watchdog kick interval */
> -#define CHG_WD_INTERVAL  (6 * HZ)
> +#define CHG_WD_INTERVAL  (60 * HZ)
>  
>  /* End-of-charge criteria counter */
>  #define EOC_COND_CNT 10
> @@ -513,7 +513,7 @@ static int abx500_chargalg_kick_watchdog(struct 
> abx500_chargalg *di)
>  static int abx500_chargalg_ac_en(struct abx500_chargalg *di, int enable,
>   int vset, int iset)
>  {
> - static int ab8500_chargalg_ex_ac_enable_toggle;
> + static int abx500_chargalg_ex_ac_enable_toggle;
>  
>   if (!di->ac_chg || !di->ac_chg->ops.enable)
>   return -ENXIO;
> @@ -529,10 +529,10 @@ static int abx500_chargalg_ac_en(struct abx500_chargalg 
> *di, int enable,
>  
>   /*enable external charger*/
>   if (enable && di->ac_chg->external &&
> - !ab8500_chargalg_ex_ac_enable_toggle) {
> + !abx500_chargalg_ex_ac_enable_toggle) {
>   blocking_notifier_call_chain(&charger_notifier_list,
>   0, di->dev);
> - ab8500_chargalg_ex_ac_enable_toggle++;
> + abx500_chargalg_ex_ac_enable_toggle++;
>   }
>  
>   return di->ac_chg->ops.enable(di->ac_chg, enable, vset, iset);
> @@ -899,6 +899,27 @@ static void handle_maxim_chg_curr(struct abx500_chargalg 
> *di)
>   }
>  }
>  
> +static void abx500_chargalg_check_safety_timer(struct abx500_chargalg *di)
> +{
> + /*
> +  * The safety timer will not be started until the capacity reported
> +  * from the FG algorithm is 100%. Then we know that the amount of
> +  * charge that's gone into the battery is enough for the battery
> +  * to be full. If it has not reached end-of-charge before the safety
> +  * timer has expired then we know that the battery is overcharged
> +  * and charging will be stopped to protect the battery.
> +  */
> + if (di->batt_data.percent == 100 &&
> + !timer_pending(&di->safety_timer)) {

Wrong indentation.

> + abx500_chargalg_start_safety_timer(di);
> + dev_dbg(di->dev, "start safety timer\n");
> + } else if (di->batt_data.percent != 100 &&
> + timer_pending(&di->safety_timer)) {

Ditto.

Plus, I think this can be simplified, you can just do.

abx500_chargalg_check_safety_timer()
{
if (di->batt_data.percent < 100) {
dev_dbg(di->dev, "stopping safety timer\n");
abx500_chargalg_stop_safety_timer(di);
return;
}

dev_dbg(di->dev, "starting safety timer\n");
abx500_chargalg_start_safety_timer(di);
}

The thing is, restarting an already pending timer is a no-op, unless you
program the timer before the previously programmed value.

And stopping an unarmed timer should also be a no-op.

(btw, these dev_dbg() should probably be placed into the start/stop
functions, to catch all the users/invocations in the debug log).

> + abx500_chargalg_stop_safety_timer(di);
> + dev_dbg(di->dev, "stop safety timer\n");
> + }
> +}
> +
>  static int abx500_chargalg_get_ext_psy_data(struct device *dev, void *data)
>  {
>   struct power_supply *psy;
> @@ -1125,6 +1146,10 @@ static int abx500_chargalg_get_ext_psy_data(struct 
> device *dev, void *data)
>   switch (ext->type) {
>   case 

Re: [PATCH] printk: drop ambiguous LOG_CONT flag

2012-09-27 Thread Kay Sievers
On Thu, Sep 27, 2012 at 5:46 PM, "Jan H. Schönherr"
 wrote:
> Am 27.09.2012 15:39, schrieb Kay Sievers:

>> It is a flag that we have not been able to merge a continuation line
>> in the buffer, because we had a race with another thread, or the
>> console lock was taken for a long time and we couldn't use the merge
>> buffer.
>
> But it is also set, if we don't know yet, whether there is going to
> be a continuation (printk.c, line 1583):
>
> log_store(facility, level, lflags | LOG_CONT, 0,
>   dict, dictlen, text, text_len);
>
> This confuses devkmsg_read() and msg_print_text() later on.

Yeah, I can see that here too. I tested your patch and it seems to behave
well with a bunch of other tests I still have available. Looks good
and worth a try:
  Tested-By: Kay Sievers 

Thanks,
Kay


Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints

2012-09-27 Thread H. Peter Anvin
On 09/27/2012 03:33 PM, Seiji Aguchi wrote:
> Hi,
> 
>> ... except the cost can be reduced to zero *AND* be made into a more general 
>> mechanism by simply hooking the IDT.
> 
> Thank you for giving me the comment.
> In my understanding,  we can introduce a more general mechanism by 
> sandwiching an existing handler between tracepoints.
> The pseudo code is like this:
> 
> @@ -17,7 +18,7 @@ static void default_threshold_interrupt(void)
>  
>  void (*mce_threshold_vector)(void) = default_threshold_interrupt;
>  
> -asmlinkage void smp_threshold_interrupt(void)
> +static void do_smp_threshold_interrupt(void)
>  {
>   irq_enter();
>   exit_idle();
> @@ -27,3 +28,10 @@ asmlinkage void smp_threshold_interrupt(void)
>   /* Ack only at the end to avoid potential reentry */
>   ack_APIC_irq();
>  }
> +
> +asmlinkage void smp_threshold_interrupt(void) {
> + trace_arch_irq_vector_entry(THRESHOLD_APIC_VECTOR);
> + do_smp_threshold_interrupt();
> + trace_arch_irq_vector_exit(THRESHOLD_APIC_VECTOR);
> +}
> 
> If I misunderstand something, please let me know.
> 

Quite.

These functions are being invoked from the IDT, which is an indirect
pointer structure.  When not being traced, there is absolutely no reason
why it should go through a thunk with tracepoints.
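
As a rough user-space analogy of what "hooking the IDT" buys (this is
one reading of the idea, not real IDT handling): dispatch goes through
a pointer table, and enabling tracing swaps which table is live, so the
untraced path never touches a tracepoint. All names here are invented
for the sketch, and the counters merely stand in for tracepoints.

```c
#include <stddef.h>

#define NR_VECTORS_STUB		4
#define THRESHOLD_VECTOR_STUB	2

typedef void (*handler_fn)(void);

/* Counters standing in for real work and real tracepoints. */
static int handled, trace_entries, trace_exits;

/* The plain handler: no tracing overhead at all. */
static void threshold_handler(void)
{
	handled++;
}

/* Only reachable while tracing is enabled. */
static void traced_threshold_handler(void)
{
	trace_entries++;
	threshold_handler();
	trace_exits++;
}

/* Two dispatch tables; the live one models the installed IDT. */
static handler_fn plain_table[NR_VECTORS_STUB] = {
	[THRESHOLD_VECTOR_STUB] = threshold_handler,
};
static handler_fn traced_table[NR_VECTORS_STUB] = {
	[THRESHOLD_VECTOR_STUB] = traced_threshold_handler,
};
static handler_fn *active_table = plain_table;

/* Switching tracing on or off is a single pointer update. */
static void set_tracing(int on)
{
	active_table = on ? traced_table : plain_table;
}

/* Models the hardware indirecting through the table. */
static void dispatch(int vector)
{
	if (active_table[vector])
		active_table[vector]();
}
```

With tracing off, dispatch() lands directly in threshold_handler(); the
wrapper only exists on the path taken while tracing is on.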

-hpa





Re: [PATCH 50/57] power: ab8500-chargalg: update battery health on safety timer exp

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:47AM -0600, mathieu.poir...@linaro.org wrote:
> From: Hakan Berg 
> 
> When the charging safety timer has elapsed, the battery health is shown as
> "Good".
> This is misleading and makes it hard to distinguish problems reported as "phone
> discharges although charger is attached".
> 
> When safety timer elapses that is an indication of a fault in the battery of
> some kind. Hence report as POWER_SUPPLY_HEALTH_UNSPEC_FAILURE.
> 
> Signed-off-by: Hakan Berg 
> Signed-off-by: Mathieu Poirier 
> Reviewed-by: Arun MURTHY 
> Reviewed-by: Karl KOMIEROWSKI 
> ---
>  drivers/power/abx500_chargalg.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
> index 4db0ef0..1df238f 100644
> --- a/drivers/power/abx500_chargalg.c
> +++ b/drivers/power/abx500_chargalg.c
> @@ -1711,6 +1711,10 @@ static int abx500_chargalg_get_property(struct 
> power_supply *psy,
>   val->intval = POWER_SUPPLY_HEALTH_COLD;
>   else
>   val->intval = POWER_SUPPLY_HEALTH_OVERHEAT;
> + } else if (di->charge_state == STATE_SAFETY_TIMER_EXPIRED ||
> + di->charge_state ==
> + STATE_SAFETY_TIMER_EXPIRED_INIT) {

Wrong indentation, and there is no need to wrap the lines. (You could align
to di->charge_state; it would look prettier and there still would be no
need for line wrapping.)

> + val->intval = POWER_SUPPLY_HEALTH_UNSPEC_FAILURE;
>   } else {
>   val->intval = POWER_SUPPLY_HEALTH_GOOD;
>   }
> -- 
> 1.7.5.4
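
For reference, the mapping the patch adds, with the condition aligned as
suggested above, could look roughly like the stand-alone sketch below. The
demo_* enums are hypothetical stand-ins for the driver's charge states and
the POWER_SUPPLY_HEALTH_* values; only the two safety-timer states and the
"unspecified failure" result are taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's charge states. */
enum demo_charge_state {
	DEMO_STATE_NORMAL,
	DEMO_STATE_SAFETY_TIMER_EXPIRED_INIT,
	DEMO_STATE_SAFETY_TIMER_EXPIRED,
};

/* Hypothetical stand-ins for the POWER_SUPPLY_HEALTH_* values. */
enum demo_health {
	DEMO_HEALTH_GOOD,
	DEMO_HEALTH_UNSPEC_FAILURE,
};

/* An expired safety timer is reported as a failure, not as "Good". */
enum demo_health demo_get_health(enum demo_charge_state state)
{
	if (state == DEMO_STATE_SAFETY_TIMER_EXPIRED ||
	    state == DEMO_STATE_SAFETY_TIMER_EXPIRED_INIT)
		return DEMO_HEALTH_UNSPEC_FAILURE;

	return DEMO_HEALTH_GOOD;
}

/* Small self-check of the mapping. */
void demo_health_self_check(void)
{
	assert(demo_get_health(DEMO_STATE_NORMAL) == DEMO_HEALTH_GOOD);
	assert(demo_get_health(DEMO_STATE_SAFETY_TIMER_EXPIRED) ==
	       DEMO_HEALTH_UNSPEC_FAILURE);
}
```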


Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-27 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu 

remove_memory() only tries to offline pages. It is called in two cases:
1. hot remove a memory device
2. echo offline >/sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change memory block's state, and notify
the userspace that the memory block's state is changed after offlining
pages.

So rename remove_memory() to offline_memory()/offline_pages(). And in
the 1st case, offline_memory() will be used. The function offline_memory()
is not implemented. In the 2nd case, offline_pages() will be used.


But this time there is not a function associated with add_memory.



CC: David Rientjes 
CC: Jiang Liu 
CC: Len Brown 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Christoph Lameter 
Cc: Minchan Kim 
CC: Andrew Morton 
CC: KOSAKI Motohiro 
Signed-off-by: Yasuaki Ishimatsu 
Signed-off-by: Wen Congyang 
---
  drivers/acpi/acpi_memhotplug.c |2 +-
  drivers/base/memory.c  |9 +++--
  include/linux/memory_hotplug.h |3 ++-
  mm/memory_hotplug.c|   22 ++
  4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long 
start_pfn,
  static int
  memory_block_action(unsigned long phys_index, unsigned long action)
  {
-   unsigned long start_pfn, start_paddr;
+   unsigned long start_pfn;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
struct page *first_page;
int ret;
  
  	first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);

+   start_pfn = page_to_pfn(first_page);
  
  	switch (action) {

case MEM_ONLINE:
-   start_pfn = page_to_pfn(first_page);
-
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
  
  			ret = online_pages(start_pfn, nr_pages);

break;
case MEM_OFFLINE:
-   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_pages(start_pfn, nr_pages);
break;
default:
WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long 
pfn,
  extern int mem_online_node(int nid);
  extern int add_memory(int nid, u64 start, u64 size);
  extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
  extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..bb42316 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long 
end_pfn)
return offlined;
  }
  
-static int __ref offline_pages(unsigned long start_pfn,

+static int __ref __offline_pages(unsigned long start_pfn,
  unsigned long end_pfn, unsigned long timeout)
  {
unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
return ret;
  }
  
-int remove_memory(u64 start, u64 size)

+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
  {
-   unsigned long start_pfn, end_pfn;
+   return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
  
-	start_pfn = PFN_DOWN(start);

-   end_pfn = start_pfn + PFN_DOWN(size);
-   return offline_pages(start_pfn, end_pfn, 120 * HZ);
+int offline_memory(u64 

[PATCH 1/1] hid:Fix problem on GeneralTouch multi-touchscreen

2012-09-27 Thread GeneralTouch
From: Xianhan Yu 

Fix the missing touch-up response on the GeneralTouch two-finger touchscreen
and add support for the new GeneralTouch PWT touchscreen.

Signed-off-by: Xianhan Yu 
---
 drivers/hid/hid-ids.h|1 +
 drivers/hid/hid-multitouch.c |   20 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 1dcb76f..a6d5890 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -305,6 +305,7 @@
 
 #define USB_VENDOR_ID_GENERAL_TOUCH 0x0dfc
 #define USB_DEVICE_ID_GENERAL_TOUCH_WIN7_TWOFINGERS 0x0003
+#define USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PWT_TENFINGERS 0x0100
 
 #define USB_VENDOR_ID_GLAB 0x06c2
 #define USB_DEVICE_ID_4_PHIDGETSERVO_30 0x0038
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
index 59c8b5c..7aece16 100644
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -115,6 +115,8 @@ struct mt_device {
 #define MT_CLS_EGALAX_SERIAL   0x0104
 #define MT_CLS_TOPSEED 0x0105
 #define MT_CLS_PANASONIC   0x0106
+#define MT_CLS_GENERALTOUCH_TWOFINGERS 0x0107
+#define MT_CLS_GENERALTOUCH_PWT_TENFINGERS 0x0108
 
 #define MT_DEFAULT_MAXCONTACT  10
 
@@ -215,7 +217,18 @@ static struct mt_class mt_classes[] = {
{ .name = MT_CLS_PANASONIC,
.quirks = MT_QUIRK_NOT_SEEN_MEANS_UP,
.maxcontacts = 4 },
-
+   { .name = MT_CLS_GENERALTOUCH_TWOFINGERS,
+   .quirks = MT_QUIRK_NOT_SEEN_MEANS_UP |
+   MT_QUIRK_VALID_IS_INRANGE |
+   MT_QUIRK_SLOT_IS_CONTACTNUMBER,
+   .maxcontacts = 2
+   },
+   { .name = MT_CLS_GENERALTOUCH_PWT_TENFINGERS,
+   .quirks = MT_QUIRK_NOT_SEEN_MEANS_UP |
+   MT_QUIRK_SLOT_IS_CONTACTNUMBER,
+   .maxcontacts = 10
+   },
+
{ }
 };
 
@@ -893,9 +906,12 @@ static const struct hid_device_id mt_devices[] = {
USB_DEVICE_ID_ELO_TS2515) },
 
/* GeneralTouch panel */
-   { .driver_data = MT_CLS_DUAL_INRANGE_CONTACTNUMBER,
+   { .driver_data = MT_CLS_GENERALTOUCH_TWOFINGERS,
MT_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH,
USB_DEVICE_ID_GENERAL_TOUCH_WIN7_TWOFINGERS) },
+   { .driver_data = MT_CLS_GENERALTOUCH_PWT_TENFINGERS,
+   MT_USB_DEVICE(USB_VENDOR_ID_GENERAL_TOUCH,
+   USB_DEVICE_ID_GENERAL_TOUCH_WIN8_PWT_TENFINGERS) },
 
/* Gametel game controller */
{ .driver_data = MT_CLS_DEFAULT,
-- 
1.7.9.5



Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-09-27 Thread Alexander Graf

On 28.09.2012, at 04:04, Linus Torvalds wrote:

> On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf  wrote:
>> 
>> Below are OOPS excerpts from different rc's I tried. All of them crashed - 
>> all the way up to current Linus' master branch. I haven't cross-checked, but 
>> I don't remember any such behavior from pre-3.6 releases.
> 
> Since you seem to be able to reproduce it easily (and apparently
> reliably), any chance you could just bisect it?
> 
> Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
> 
>   git bisect start
>   git bisect good v3.5
>   git bisect bad v3.6-rc1
> 
> will get you started on your adventure..

Heh, will give it a try :). The thing really does look quite bisectable.


It might take a few hours though - the machine isn't exactly fast by today's 
standards and it's getting late here. But I'll keep you updated.

Alex



Re: linux-next: build failure after merge of the net-next tree

2012-09-27 Thread David Miller
From: Stephen Rothwell 
Date: Fri, 28 Sep 2012 11:43:35 +1000

> Hi all,
> 
> After merging the net-next tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_find_vfs':
> drivers/net/ethernet/emulex/benet/be_main.c:1090:28: error: 'struct pci_dev' 
> has no member named 'physfn'
> 
> Caused by commit 51af6d7c1f31 ("be2net: fix vfs enumeration").  physfn is
> only defined if CONFIG_PCI_ATS is set.
> 
> I have reverted that commit for today.

I'm reverting it too, thanks for reporting Stephen.


Re: [PATCH 49/57] power: Cancelling status charging notification.

2012-09-27 Thread Anton Vorontsov
On Tue, Sep 25, 2012 at 10:12:46AM -0600, mathieu.poir...@linaro.org wrote:
> From: "Mathieu J. Poirier" 

No reasoning behind the change?

> Signed-off-by: Mathieu Poirier 
> ---
>  drivers/power/abx500_chargalg.c |2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/power/abx500_chargalg.c b/drivers/power/abx500_chargalg.c
> index ce58f20..4db0ef0 100644
> --- a/drivers/power/abx500_chargalg.c
> +++ b/drivers/power/abx500_chargalg.c
> @@ -641,10 +641,8 @@ static void abx500_chargalg_hold_charging(struct 
> abx500_chargalg *di)
>   abx500_chargalg_usb_en(di, false, 0, 0);
>   abx500_chargalg_stop_safety_timer(di);
>   abx500_chargalg_stop_maintenance_timer(di);
> - di->charge_status = POWER_SUPPLY_STATUS_CHARGING;
>   di->maintenance_chg = false;
>   cancel_delayed_work(&di->chargalg_wd_work);
> - power_supply_changed(&di->chargalg_psy);
>  }
>  
>  /**
> -- 
> 1.7.5.4
> 
> 


Re: [PATCH] pwm-backlight: Take over maintenance

2012-09-27 Thread Alex Courbot
On Thursday 20 September 2012 18:50:35 Thierry Reding wrote:
> Since the pwm-backlight driver is lacking a proper maintainer and is the
> heaviest user of the PWM framework I'm taking over maintenance.
> 
> Signed-off-by: Thierry Reding 
> Cc: Arun Murthy 
> Cc: Matthew Garrett 
> Cc: Robert Morell 
> Cc: Dilan Lee 
> Cc: Axel Lin 
> Cc: Mark Brown 
> Cc: Alexandre Courbot 
> Cc: Sachin Kamat 
> Cc: Andrew Morton 
> ---
> Andrew: As previously discussed this patch makes me the new maintainer
> for the pwm-backlight driver. I contacted Richard Purdie and requested
> his Acked-by for this because he's the backlight subsystem maintainer.
> Unfortunately he hasn't answered yet. Looking at the commit log, every
> commit in the last 2+ years has gone through your tree, which seems to
> be reason enough for me to take over this driver but I still wanted to
> get an Acked-by at least from you so that this doesn't look like
> hijacking. I've also Cc'ed a couple of people that have done work on
> the driver in the past to give them a chance to object or send their
> Acked-by.

Probably too late, but:

Acked-by: Alexandre Courbot 

> 
> Obviously I will take this patch through my tree so this is not a
> request to include it in your tree.
> 
>  MAINTAINERS | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 61ad79e..a6f023e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5543,6 +5543,8 @@ F:  Documentation/devicetree/bindings/pwm/
>  F:   include/linux/pwm.h
>  F:   include/linux/of_pwm.h
>  F:   drivers/pwm/
> +F:   drivers/video/backlight/pwm_bl.c
> +F:   include/linux/pwm_backlight.h
> 
>  PXA2xx/PXA3xx SUPPORT
>  M:   Eric Miao 


Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-09-27 Thread Linus Torvalds
On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf  wrote:
>
> Below are OOPS excerpts from different rc's I tried. All of them crashed - 
> all the way up to current Linus' master branch. I haven't cross-checked, but 
> I don't remember any such behavior from pre-3.6 releases.

Since you seem to be able to reproduce it easily (and apparently
reliably), any chance you could just bisect it?

Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple

   git bisect start
   git bisect good v3.5
   git bisect bad v3.6-rc1

will get you started on your adventure..

  Linus


email still screwed

2012-09-27 Thread Daniel Santos
well, I guess my new email provider is worse than the old one, as only
the summary email appears to have made it through.  I'll try something
else tomorrow :(

Daniel


Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-27 Thread Wen Congyang
At 09/27/2012 06:35 PM, Vasilis Liaskovitis Wrote:
> On Thu, Sep 27, 2012 at 02:37:14PM +0800, Wen Congyang wrote:
>> Hi Vasilis Liaskovitis
>>
>> At 09/27/2012 12:46 AM, Vasilis Liaskovitis Wrote:
>>> Hi,
>>>
>>> I am testing 3.6.0-rc7 with this v9 patchset plus more recent fixes 
>>> [1],[2],[3]
>>> Running in a guest (qemu+seabios from [4]). 
>>> CONFIG_SLAB=y
>>> CONFIG_DEBUG_SLAB=y
>>>
>>> After successful hot-add and online, I am doing a hot-remove with "echo 1 > 
>>> /sys/bus/acpi/devices/PNP/eject"
>>> When I do the OSPM-eject, I often get slab corruption in "acpi-state" 
>>> cache, or in other caches
>>
>> I can't reproduce this problem. Can you provide the following information:
>> 1. config file
>> 2. qemu's command line
>>
>> You said you did OSPM-eject. Do you mean write 1 to 
>> /sys/bus/acpi/devices/PNP0C80:XX/eject?
> yes.
> 
> example qemu command line with one dimm:
> 
> "/opt/qemu-kvm-memhp/bin/qemu-system-x86_64 -bios
> /opt/extra/vliaskov/devel/seabios-upstream/out/bios.bin -enable-kvm -M pc -smp
> 4,maxcpus=8 -cpu host -m 2048 -drive 
> file=/opt/extra/debian-template.raw,if=none,id=drive-virtio-disk0,format=raw
> -device 
> virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -vga cirrus -netdev type=tap,id=guest0,vhost=on -device 
> virtio-net-pci,netdev=guest0
> -monitor unix:/tmp/qemu.monitor11,server,nowait -chardev stdio,id=seabios  
> -device
> isa-debugcon,iobase=0x402,chardev=seabios
> -dimm id=n0,size=512M,node=0"
> 
> or last line with 2 numa nodes:
> "-dimm id=n0,size=512M,node=0 -dimm id=n1,size=512M,node=1 -numa 
> node,nodeid=0 -numa node,nodeid=1"

I have reproduced this problem. It can only be reproduced when the dimm's
memory is on node 0.
I am investigating it now.

Thanks
Wen Congyang

> 
> attached config. Tree is at:
> https://github.com/vliaskov/linux/commits/memhp-fujitsu
> 
> thanks,
> - Vasilis



Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

2012-09-27 Thread Namjae Jeon
2012/9/27, Jan Kara :
> On Thu 27-09-12 15:00:18, Namjae Jeon wrote:
>> 2012/9/27, Jan Kara :
>> > On Thu 27-09-12 00:56:02, Wu Fengguang wrote:
>> >> On Tue, Sep 25, 2012 at 12:23:06AM +0200, Jan Kara wrote:
>> >> > On Thu 20-09-12 16:44:22, Wu Fengguang wrote:
>> >> > > On Sun, Sep 16, 2012 at 08:25:42AM -0400, Namjae Jeon wrote:
>> >> > > > From: Namjae Jeon 
>> >> > > >
>> >> > > > This patch is based on suggestion by Wu Fengguang:
>> >> > > > https://lkml.org/lkml/2011/8/19/19
>> >> > > >
>> >> > > > kernel has mechanism to do writeback as per dirty_ratio and
>> >> > > > dirty_background ratio. It also maintains per task dirty rate
>> >> > > > limit to keep balance of dirty pages at any given instance by
>> >> > > > doing bdi bandwidth estimation.
>> >> > > >
>> >> > > > Kernel also has max_ratio/min_ratio tunables to specify
>> >> > > > percentage of writecache to control per bdi dirty limits and
>> >> > > > task throttling.
>> >> > > >
>> >> > > > However, there might be a usecase where user wants a per bdi
>> >> > > > writeback tuning parameter to flush dirty data once per bdi
>> >> > > > dirty data reach a threshold especially at NFS server.
>> >> > > >
>> >> > > > dirty_background_centisecs provides an interface where user can
>> >> > > > tune background writeback start threshold using
>> >> > > > /sys/block/sda/bdi/dirty_background_centisecs
>> >> > > >
>> >> > > > dirty_background_centisecs is used along with average bdi write
>> >> > > > bandwidth estimation to start background writeback.
>> >> >   The functionality you describe, i.e. start flushing bdi when
>> >> > there's reasonable amount of dirty data on it, looks sensible and
>> >> > useful. However I'm not so sure whether the interface you propose is
>> >> > the right one. Traditionally, we allow user to set amount of dirty
>> >> > data (either in bytes or percentage of memory) when background
>> >> > writeback should start. You propose setting the amount of data in
>> >> > centisecs-to-write. Why that difference? Also this interface ties our
>> >> > throughput estimation code (which is an implementation detail of
>> >> > current dirty throttling) with the userspace API. So we'd have to
>> >> > maintain the estimation code forever, possibly also face problems
>> >> > when we change the estimation code (and thus estimates in some cases)
>> >> > and users will complain that the values they set originally no longer
>> >> > work as they used to.
>> >>
>> >> Yes, that bandwidth estimation is not all that (and in theory cannot
>> >> be made) reliable which may be a surprise to the user. Which makes the
>> >> interface flaky.
>> >>
>> >> > Also, as with each knob, there's a problem how to properly set its
>> >> > value? Most admins won't know about the knob and so won't touch it.
>> >> > Others might know about the knob but will have hard time figuring out
>> >> > what value should they set. So if there's a new knob, it should have
>> >> > a sensible initial value. And since this feature looks like a useful
>> >> > one, it shouldn't be zero.
>> >>
>> >> Agreed in principle. There seem to be no reasonable defaults for the
>> >> centisecs-to-write interface, mainly due to its inaccurate nature,
>> >> especially the initial value may be wildly wrong on fresh system
>> >> bootup. This is also true for your proposed interfaces, see below.
>> >>
>> >> > So my personal preference would be to have
>> >> > bdi->dirty_background_ratio and bdi->dirty_background_bytes and start
>> >> > background writeback whenever one of global background limit and
>> >> > per-bdi background limit is exceeded. I think this interface will do
>> >> > the job as well and it's easier to maintain in future.
>> >>
>> >> bdi->dirty_background_ratio, if I understand its semantics right, is
>> >> unfortunately flaky on the same principle as centisecs-to-write,
>> >> because it relies on the (implicit estimation of) writeout
>> >> proportions. The writeout proportions for each bdi start with 0,
>> >> which is even worse than the 100MB/s initial value for
>> >> bdi->write_bandwidth and will trigger background writeback on the
>> >> first write.
>> >   Well, I meant bdi->dirty_background_ratio wouldn't use writeout
>> > proportion estimates at all. Limit would be
>> >   dirtiable_memory * bdi->dirty_background_ratio.
>> >
>> > After all we want to start writeout to bdi when we have enough pages to
>> > reasonably load the device for a while which has nothing to do with how
>> > much is written to this device as compared to other devices.
>> >
>> > OTOH I'm not particularly attached to this interface. Especially since
>> > on a lot of today's machines, 1% is rather big so people might often end
>> > up using dirty_background_bytes 

[PATCH v2] mfd: da9052-core: Use regmap_irq_get_virq() and fix the probe

2012-09-27 Thread Fabio Estevam
From: Fabio Estevam 

On a mx53qsb dt-kernel the da9052-core driver fails to probe as follows:

da9052 1-0048: DA9052 ADC IRQ failed ret=-22

The error was caused by passing only the interrupt offset as the interrupt
number to request_threaded_irq().

The recommended approach is to use regmap_irq_get_virq() to acquire the
interrupt number.

Fix it and allow the driver to probe successfully.

Also provide a few more error logs and change the irq string to "adc-irq", so
that it appears as a single word in 'cat /proc/interrupts'.

Signed-off-by: Fabio Estevam 
---
Changes since v1:
- Use regmap_irq_get_virq() instead of relying on irq_base

Arnd/Mark,

I also plan to convert the other da9052 drivers to use regmap_irq_get_virq().

 drivers/mfd/da9052-core.c |   31 ++-
 include/linux/mfd/da9052/da9052.h |1 +
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/mfd/da9052-core.c b/drivers/mfd/da9052-core.c
index a0a62b2..79f9674 100644
--- a/drivers/mfd/da9052-core.c
+++ b/drivers/mfd/da9052-core.c
@@ -782,35 +782,40 @@ int __devinit da9052_device_init(struct da9052 *da9052, 
u8 chip_id)
 
da9052->chip_id = chip_id;
 
-   if (!pdata || !pdata->irq_base)
-   da9052->irq_base = -1;
-   else
-   da9052->irq_base = pdata->irq_base;
-
ret = regmap_add_irq_chip(da9052->regmap, da9052->chip_irq,
  IRQF_TRIGGER_LOW | IRQF_ONESHOT,
- da9052->irq_base, &da9052_regmap_irq_chip,
+ -1, &da9052_regmap_irq_chip,
  &da9052->irq_data);
-   if (ret < 0)
+   if (ret < 0) {
+   dev_err(da9052->dev, "regmap_add_irq_chip failed: %d\n", ret);
goto regmap_err;
+   }
 
-   da9052->irq_base = regmap_irq_chip_get_base(da9052->irq_data);
+   da9052->irq = regmap_irq_get_virq(da9052->irq_data, DA9052_IRQ_ADC_EOM);
 
-   ret = request_threaded_irq(DA9052_IRQ_ADC_EOM, NULL, da9052_auxadc_irq,
+   if (da9052->irq < 0) {
+   ret = da9052->irq;
+   dev_err(da9052->dev, "regmap_irq_get_virq failed: %d\n", ret);
+   goto regmap_err;
+   }
+
+   ret = request_threaded_irq(da9052->irq, NULL, da9052_auxadc_irq,
   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
-  "adc irq", da9052);
+  "adc-irq", da9052);
if (ret != 0)
dev_err(da9052->dev, "DA9052 ADC IRQ failed ret=%d\n", ret);
 
ret = mfd_add_devices(da9052->dev, -1, da9052_subdev_info,
  ARRAY_SIZE(da9052_subdev_info), NULL, 0, NULL);
-   if (ret)
+   if (ret) {
+   dev_err(da9052->dev, "mfd_add_devices failed: %d\n", ret);
goto err;
+   }
 
return 0;
 
 err:
-   free_irq(DA9052_IRQ_ADC_EOM, da9052);
+   free_irq(da9052->irq, da9052);
mfd_remove_devices(da9052->dev);
 regmap_err:
return ret;
@@ -818,7 +823,7 @@ regmap_err:
 
 void da9052_device_exit(struct da9052 *da9052)
 {
-   free_irq(DA9052_IRQ_ADC_EOM, da9052);
+   free_irq(da9052->irq, da9052);
regmap_del_irq_chip(da9052->chip_irq, da9052->irq_data);
mfd_remove_devices(da9052->dev);
 }
diff --git a/include/linux/mfd/da9052/da9052.h 
b/include/linux/mfd/da9052/da9052.h
index 0507c4c..f0b259d 100644
--- a/include/linux/mfd/da9052/da9052.h
+++ b/include/linux/mfd/da9052/da9052.h
@@ -99,6 +99,7 @@ struct da9052 {
u8 chip_id;
 
int chip_irq;
+   int irq;
 };
 
 /* ADC API */
-- 
1.7.9.5



[PATCH v6 3/25] compiler-gcc.h: Add gcc-recommended GCC_VERSION macro

2012-09-27 Thread Daniel Santos
Throughout compiler*.h, many version checks are made.  These can be
simplified by using the macro that gcc's documentation recommends.
However, my primary reason for adding this is that I need bug-check
macros that are enabled at certain gcc versions and it's cleaner to use
this macro than the traditional method:

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2)

If you add patch level, it gets this ugly:

#if __GNUC__ > 4 || (__GNUC__ == 4 && (__GNUC_MINOR__ > 2 || \
   __GNUC_MINOR__ == 2 && __GNUC_PATCHLEVEL__ >= 1))

As opposed to:

#if GCC_VERSION >= 40201

While having separate headers for gcc 3 & 4 eliminates some of this
verbosity, they can still be cleaned up by this.

See also:
http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 6a6d7ae..24545cd 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -5,6 +5,9 @@
 /*
  * Common definitions for all gcc versions go here.
  */
+#define GCC_VERSION (__GNUC__ * 10000 \
+  + __GNUC_MINOR__ * 100 \
+  + __GNUC_PATCHLEVEL__)
 
 
 /* Optimization barrier */
-- 
1.7.3.4
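
As a quick illustration of the encoding described above (major * 10000 +
minor * 100 + patchlevel), here is a small stand-alone sketch. The DEMO_*
and demo_* names are hypothetical and chosen so they do not clash with the
kernel's GCC_VERSION; the non-gcc fallback of 0 is an assumption made only
for this example.

```c
#include <assert.h>

/* Same encoding as the patch: 4.2.1 becomes 40201. */
#define DEMO_VERSION(major, minor, patch) \
	((major) * 10000 + (minor) * 100 + (patch))

#ifdef __GNUC__
# define DEMO_GCC_VERSION \
	DEMO_VERSION(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
#else
# define DEMO_GCC_VERSION 0	/* assumption: treat non-gcc as "too old" */
#endif

/* Returns 1 when built with gcc 4.2.1 or later, 0 otherwise. */
int demo_have_gcc_4_2_1(void)
{
#if DEMO_GCC_VERSION >= 40201
	return 1;
#else
	return 0;
#endif
}

/* The single integer orders versions the way one would expect. */
void demo_version_self_check(void)
{
	assert(DEMO_VERSION(4, 2, 1) == 40201);
	assert(DEMO_VERSION(3, 4, 6) < DEMO_VERSION(4, 1, 2));
	assert(DEMO_VERSION(4, 1, 2) < DEMO_VERSION(4, 6, 0));
}
```

A single integer comparison replaces the nested major/minor/patchlevel
tests, which is the readability gain argued for in the commit message.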



[PATCH v6 5/25] compiler{,-gcc4}.h: Remove duplicate macros

2012-09-27 Thread Daniel Santos
__linktime_error() does the same thing as __compiletime_error() and is
only used in bug.h.  Since the macro defines a function attribute that
will cause a failure at compile-time (not link-time), it makes more
sense to keep __compiletime_error(), which is also neatly mated with
__compiletime_warning().

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |2 --
 include/linux/compiler.h  |3 ---
 2 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index b44307d..ad610f2 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -33,8 +33,6 @@
the kernel context */
 #define __cold __attribute__((__cold__))
 
-#define __linktime_error(message) __attribute__((__error__(message)))
-
 #ifndef __CHECKER__
 # define __compiletime_warning(message) __attribute__((warning(message)))
 # define __compiletime_error(message) __attribute__((error(message)))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index f430e41..fd455aa 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -297,9 +297,6 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int 
val, int expect);
 #ifndef __compiletime_error
 # define __compiletime_error(message)
 #endif
-#ifndef __linktime_error
-# define __linktime_error(message)
-#endif
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),
-- 
1.7.3.4



[PATCH v6 7/25] compiler{,-gcc4}.h: Introduce __flatten function attribute

2012-09-27 Thread Daniel Santos
For gcc 4.1 & later, expands to __attribute__((flatten)) which forces
the compiler to inline everything it can into the function.  This is
useful in combination with noinline when you want to control the depth
of inlining, or create a single function where inline expansions will
occur. (see
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bflatten_007d-function-attribute-2512)

Normally, it's best to leave this type of thing up to the compiler.
However, the generic rbtree code uses inline functions just to be able
to inject compile-time constant data that specifies how the caller wants
the function to behave (via struct rb_relationship).  This data can be
thought of as the template parameters of a C++ templatized function.
Since some of these functions, once expanded, become quite large, gcc
sometimes decides not to perform some important inlining, in one case,
even generating a few bytes more code by not doing so. (Note: I have not
eliminated the possibility that this was an optimization bug, but the
flatten attribute fixes it in either case.)

Combining __flatten and noinline ensures that important optimizations
occur in these cases and that the inline expansion occurs in exactly one
place, thus not leading to unnecessary bloat. However, it can also
eliminate some opportunities for optimization should gcc otherwise
decide the function itself is a good candidate for inlining.

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |7 ++-
 include/linux/compiler.h  |4 
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index ad610f2..5a0897e 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -15,7 +15,12 @@
 
 #if GCC_VERSION >= 40102
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
+
+/* flatten introduced in 4.1, but broken in 4.6.0 (gcc bug #48731)*/
+# if GCC_VERSION != 40600
+#  define __flatten __attribute__((flatten))
+# endif
+#endif /* GCC_VERSION >= 40102 */
 
 #if GCC_VERSION >= 40300
 /* Mark functions as cold. gcc will assume any path leading to a call
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index fd455aa..268aeb6 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -244,6 +244,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, 
int val, int expect);
 #define __always_inline inline
 #endif
 
+#ifndef __flatten
+#define __flatten
+#endif
+
 #endif /* __KERNEL__ */
 
 /*
-- 
1.7.3.4
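
A minimal stand-alone sketch of the __flatten + noinline combination
described above follows. The demo_* names and the binary search are
hypothetical and only stand in for the generic rbtree helpers; whether the
compiler really inlines everything still depends on the compiler and its
options.

```c
#include <assert.h>

#if defined(__GNUC__)
# define demo_flatten	__attribute__((flatten))
# define demo_noinline	__attribute__((noinline))
#else
# define demo_flatten
# define demo_noinline
#endif

/* Small inline helper, playing the role of a "template parameter". */
static inline int demo_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/* Generic inline search; returns the index of key or -1. */
static inline int demo_find(const int *sorted, int n, int key)
{
	int lo = 0, hi = n - 1;

	assert(n >= 0);
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		int c = demo_cmp(key, sorted[mid]);

		if (c == 0)
			return mid;
		if (c < 0)
			hi = mid - 1;
		else
			lo = mid + 1;
	}
	return -1;
}

/*
 * The one out-of-line copy: flatten asks the compiler to inline the
 * helpers into this function, noinline keeps this function itself from
 * being expanded into every caller.
 */
demo_noinline demo_flatten
int demo_lookup(const int *sorted, int n, int key)
{
	return demo_find(sorted, n, key);
}
```

Callers then use demo_lookup() and share a single expanded body instead of
each getting their own copy of the inlined search.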



[REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-09-27 Thread Alexander Graf
Howdy,

While running 3.6.0-rcX I am having a few issues with nfsd on my PPC970 based 
system. For some reason every time I actually end up accessing an NFS share on 
it, it crashes away at random points. It looks a lot like corrupted pointers in 
all logs. I also can't reproduce the oopses without nfsd in the game. Doing the 
same workload that crashes over NFS locally on the box (git clone -ls) works 
just fine.

The mount was done simply without parameters:

  lychee:/home/agraf/release on /abuild/agraf/autotest_e500/lychee type nfs (rw,addr=10.10.1.189)

My exports on the host are also quite simple:

  /home/agraf/release *(async,rw)

Below are OOPS excerpts from different rc's I tried. All of them crashed - all 
the way up to current Linus' master branch. I haven't cross-checked, but I 
don't remember any such behavior from pre-3.6 releases.


Alex



#  3.6.0-rc1   #



Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart 
hwmon
NIP: c016a330 LR: c0608794 CTR: c063fe78
REGS: c00071b832f0 TRAP: 0300   Not tainted  (3.6.0-rc1-00220-gb645f8b)
MSR: 90009032   CR: 22002044  XER: 2000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 4000
TASK = c0007b3a[6061] 'nfsd' THREAD: c00071b8 CPU: 3
GPR00:  c00071b83570 c0b8e2a8 0280 
GPR04: 000102d0 c0609844 c065418c c0ec55e0 
GPR08: 0036ac63 00477000 c0a91758 c0a4e5e0 
GPR12: 2248 cfff2680 0001 0010 
GPR16:  0001 05a8 0d10 
GPR20: 8000 c00071d39608 0d10 02f0 
GPR24: c0609844 000102d0 0280 0280 
GPR28: c0609844 4e375f30f9fae38f c0adb9c8 c0007b002b00 
NIP [c016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c0608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c00071b83570] [c016a370] .__kmalloc_track_caller+0x160/0x2ac 
(unreliable)
[c00071b83620] [c0608794] .__kmalloc_reserve+0x44/0xbc
[c00071b836c0] [c0609844] .__alloc_skb+0xb8/0x1d0
[c00071b83780] [c065418c] .sk_stream_alloc_skb+0x48/0x154
[c00071b83810] [c0655308] .tcp_sendpage+0x1d0/0x7a0
[c00071b83920] [c067b864] .inet_sendpage+0x100/0x158
[c00071b839d0] [c05fb294] .kernel_sendpage+0x7c/0xc8
[c00071b83a70] [c06b72b4] .svc_send_common+0xc8/0x1a8
[c00071b83b40] [c06b74c8] .svc_sendto+0x134/0x15c
[c00071b83c40] [c06b7590] .svc_tcp_sendto+0x3c/0xc0
[c00071b83cd0] [c06c46dc] .svc_send+0xb0/0x118
[c00071b83d70] [c06b41a0] .svc_process+0x784/0x7c0
[c00071b83e40] [d3032e34] .nfsd+0x138/0x1ec [nfsd]
[c00071b83ed0] [c009d050] .kthread+0xb0/0xbc
[c00071b83f90] [c001e9b0] .kernel_thread+0x54/0x70
Instruction dump:
2bbf0010 41fd000c 7fe3fb78 48000180 e92d0040 e97f 7ceb4a14 e9070008 
7fab482a 2fbd 419e0034 e81f0022 <7f7d002a> 3800 886d01f2 980d01f2 
---[ end trace 9af22fc4dfe9499b ]---

Unable to handle kernel paging request for data at address 0x4e375f30f9fae38f
Faulting instruction address: 0xc016a330
Oops: Kernel access of bad area, sig: 11 [#2]
PREEMPT SMP NR_CPUS=32 Maple
Modules linked in: nfsd autofs4 exportfs binfmt_misc tg3 uninorth_agp agpgart 
hwmon
NIP: c016a330 LR: c0608794 CTR: c06a1b7c
REGS: c00079e1b300 TRAP: 0300   Tainted: G  D   
(3.6.0-rc1-00220-gb645f8b)
MSR: 90009032   CR: 24082444  XER: 2000
SOFTE: 1
DAR: 4e375f30f9fae38f, DSISR: 4000
TASK = c00073d8[10885] 'dhcpcd' THREAD: c00079e18000 CPU: 3
GPR00:  c00079e1b580 c0b8e2a8 0300 
GPR04: 000106d0 c0609844 c0601c60 c0ec55e0 
GPR08: 0036ac63 00477000 c0a91758 c0a4e5e0 
GPR12:  cfff2680 201e6ad8 c00079e1b930 
GPR16: 0158  0001 0004 
GPR20: c00079e1b818 c000711350f8 c000711350c0  
GPR24: c0609844 000106d0 0300 0300 
GPR28: c0609844 4e375f30f9fae38f c0adb9c8 c0007b002b00 
NIP [c016a330] .__kmalloc_track_caller+0x120/0x2ac
LR [c0608794] .__kmalloc_reserve+0x44/0xbc
Call Trace:
[c00079e1b630] [c0608794] .__kmalloc_reserve+0x44/0xbc
[c00079e1b6d0] [c0609844] .__alloc_skb+0xb8/0x1d0
[c00079e1b790] [c0601c60] 

[PATCH v6 6/25] bug.h: Replace __linktime_error with __compiletime_error

2012-09-27 Thread Daniel Santos
Signed-off-by: Daniel Santos 
---
 include/linux/bug.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/bug.h b/include/linux/bug.h
index aaac4bb..298a916 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -73,7 +73,7 @@ extern int __build_bug_on_failed;
 #define BUILD_BUG()\
do {\
extern void __build_bug_failed(void)\
-   __linktime_error("BUILD_BUG failed");   \
+   __compiletime_error("BUILD_BUG failed");\
__build_bug_failed();   \
} while (0)
 
-- 
1.7.3.4



[PATCH v6 4/25] compiler-gcc{3,4}.h: Use GCC_VERSION macro

2012-09-27 Thread Daniel Santos
Using GCC_VERSION reduces complexity, is easier to read and is GCC's
recommended mechanism for doing version checks. (Just don't ask me why
they didn't define it in the first place.)  This also makes it easy to
merge compiler-gcc{3,4}.h should somebody want to.
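
For readers without the rest of the series at hand: GCC_VERSION itself is
not defined in this patch. The sketch below follows the form given in GCC's
preprocessor manual; GCC_VERSION_OF() and HAVE_ERROR_ATTRIBUTE are
hypothetical helper names added only to make the arithmetic visible.

```c
/* Fold major/minor/patchlevel into one comparable integer. */
#define GCC_VERSION_OF(major, minor, patch) \
	((major) * 10000 + (minor) * 100 + (patch))

#ifdef __GNUC__
# define GCC_VERSION \
	GCC_VERSION_OF(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
#else
# define GCC_VERSION 0	/* assumption: treat non-GNU compilers as "too old" */
#endif

/* A single comparison replaces separate major/minor/patchlevel tests. */
#if GCC_VERSION >= 40300
# define HAVE_ERROR_ATTRIBUTE 1
#else
# define HAVE_ERROR_ATTRIBUTE 0
#endif
```

With this encoding gcc 4.1.2 becomes 40102 and gcc 4.6.0 becomes 40600, so
a range such as "4.1.0 through 4.1.1" is just two integer comparisons.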

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc3.h |8 
 include/linux/compiler-gcc4.h |   14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/compiler-gcc3.h b/include/linux/compiler-gcc3.h
index 37d4124..7d89feb 100644
--- a/include/linux/compiler-gcc3.h
+++ b/include/linux/compiler-gcc3.h
@@ -2,22 +2,22 @@
 #error "Please don't include <linux/compiler-gcc3.h> directly, include <linux/compiler.h> instead."
 #endif
 
-#if __GNUC_MINOR__ < 2
+#if GCC_VERSION < 30200
 # error Sorry, your compiler is too old - please upgrade it.
 #endif
 
-#if __GNUC_MINOR__ >= 3
+#if GCC_VERSION >= 30300
 # define __used		__attribute__((__used__))
 #else
 # define __used		__attribute__((__unused__))
 #endif
 
-#if __GNUC_MINOR__ >= 4
+#if GCC_VERSION >= 30400
 #define __must_check   __attribute__((warn_unused_result))
 #endif
 
 #ifdef CONFIG_GCOV_KERNEL
-# if __GNUC_MINOR__ < 4
+# if GCC_VERSION < 30400
 #   error "GCOV profiling support for gcc versions below 3.4 not included"
 # endif /* __GNUC_MINOR__ */
 #endif /* CONFIG_GCOV_KERNEL */
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 4506d65..b44307d 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -4,7 +4,7 @@
 
 /* GCC 4.1.[01] miscompiles __weak */
 #ifdef __KERNEL__
-# if __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ <= 1
+# if GCC_VERSION >= 40100 &&  GCC_VERSION <= 40101
 //#  error Your version of gcc miscompiles the __weak directive
 # endif
 #endif
@@ -13,11 +13,11 @@
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 
-#if __GNUC_MINOR__ > 0
+#if GCC_VERSION >= 40102
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
 
-#if __GNUC_MINOR__ >= 3
+#if GCC_VERSION >= 40300
 /* Mark functions as cold. gcc will assume any path leading to a call
to them will be unlikely.  This means a lot of manual unlikely()s
are unnecessary now for any paths leading to the usual suspects
@@ -39,9 +39,9 @@
 # define __compiletime_warning(message) __attribute__((warning(message)))
 # define __compiletime_error(message) __attribute__((error(message)))
 #endif /* __CHECKER__ */
-#endif /* __GNUC_MINOR__ >= 3 */
+#endif /* GCC_VERSION >= 40300 */
 
-#if __GNUC_MINOR__ >= 5
+#if GCC_VERSION >= 40500
 /*
  * Mark a position in code as unreachable.  This can be used to
  * suppress control flow warnings after asm blocks that transfer
@@ -56,9 +56,9 @@
 /* Mark a function definition as prohibited from being cloned. */
 #define __noclone  __attribute__((__noclone__))
 
-#endif /* __GNUC_MINOR__ >= 5 */
+#endif /* GCC_VERSION >= 40500 */
 
-#if __GNUC_MINOR__ >= 6
+#if GCC_VERSION >= 40600
 /*
  * Tell the optimizer that something else uses this function or variable.
  */
-- 
1.7.3.4



[PATCH v6 2/25] compiler-gcc4.h: Reorder macros based upon gcc ver

2012-09-27 Thread Daniel Santos
This helps to keep the file from getting confusing, removes one
duplicate version check and should encourage future editors to put new
macros where they belong.

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |   20 +++-
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 8721704..4506d65 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -13,6 +13,10 @@
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 
+#if __GNUC_MINOR__ > 0
+# define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
+#endif
+
 #if __GNUC_MINOR__ >= 3
 /* Mark functions as cold. gcc will assume any path leading to a call
to them will be unlikely.  This means a lot of manual unlikely()s
@@ -31,6 +35,12 @@
 
 #define __linktime_error(message) __attribute__((__error__(message)))
 
+#ifndef __CHECKER__
+# define __compiletime_warning(message) __attribute__((warning(message)))
+# define __compiletime_error(message) __attribute__((error(message)))
+#endif /* __CHECKER__ */
+#endif /* __GNUC_MINOR__ >= 3 */
+
 #if __GNUC_MINOR__ >= 5
 /*
  * Mark a position in code as unreachable.  This can be used to
@@ -46,8 +56,7 @@
 /* Mark a function definition as prohibited from being cloned. */
 #define __noclone  __attribute__((__noclone__))
 
-#endif
-#endif
+#endif /* __GNUC_MINOR__ >= 5 */
 
 #if __GNUC_MINOR__ >= 6
 /*
@@ -56,10 +65,3 @@
 #define __visible __attribute__((externally_visible))
 #endif
 
-#if __GNUC_MINOR__ > 0
-#define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
-#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
-#define __compiletime_warning(message) __attribute__((warning(message)))
-#define __compiletime_error(message) __attribute__((error(message)))
-#endif
-- 
1.7.3.4



[PATCH v6 1/25] compiler-gcc4.h: Correct version check for __compiletime_error

2012-09-27 Thread Daniel Santos
__attribute__((error(msg))) was introduced in gcc 4.3 (not 4.4) and as I
was unable to find any gcc bugs pertaining to it, I'm presuming that it
has functioned as advertised since 4.3.0.

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 997fd8a..8721704 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -59,7 +59,7 @@
 #if __GNUC_MINOR__ > 0
 #define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
-#if __GNUC_MINOR__ >= 4 && !defined(__CHECKER__)
+#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
 #define __compiletime_warning(message) __attribute__((warning(message)))
 #define __compiletime_error(message) __attribute__((error(message)))
 #endif
-- 
1.7.3.4



[PATCH v6 0/25] Generic Red-Black Trees

2012-09-27 Thread Daniel Santos


This revised patch set is rebased onto linux-mmotm.

I have a new mail provider now, but they seem just as bad as the last, so I'm
going to try to split the recipients in half and pray that this goes through.
I'm sorry to those of you who have gotten partial patch sets and I hope to get
this fixed soon. (If anybody knows of a decent email service for developers
who send patches, please let me know.)

Summary
=======
This patch set improves on Andrea Arcangeli's original Red-Black Tree
implementation by adding generic search and insert functions with
complete support for:

o leftmost - keeps a pointer to the leftmost (lowest value) node cached
  in your container struct
o rightmost - ditto for rightmost (greatest value)
o count - optionally update a count variable when you perform inserts
  or deletes
o unique or non-unique keys
o find and insert "near" functions - when you already have a node that
  is likely near another one you want to search for
o type-safe wrapper interface available via pre-processor macro

Outstanding Issues
==================
General
-------
o Need something in Documentation/ to explain generic rbtrees.
o Due to a bug in gcc's optimizer, extra instructions are generated in various
  places.  Pavel Pisa has provided me a possible work-around that should be
  examined more closely to see if it can be worked in (discussed in the
  Performance section).
o Doc-comments are missing or out of date in some places for the new
  ins_compare field of struct rb_relationship (including at least one code
  example).

Selftests
---------
o In-kernel test module not completed.
o Userspace selftest's Makefile should run modules_prepare in KERNELDIR.
o Validation in self-tests doesn't yet cover tests for
  - insert_near
  - find_{first,last,next,prev}
o Selftest scripts need better portability (maybe solved? we'll see)
o It would be nice to have some fault-injection in test code to verify that
  CONFIG_DEBUG_GRBTREE and CONFIG_DEBUG_GRBTREE_VALIDATE (and its
  RB_VERIFY_INTEGRITY counterpart flag) catch the errors they are supposed to.

Undecided (Opinions Requested!)
-------------------------------
o With the exception of the rb_node & rb_root structs, "Layer 2" of the code
  (see below) completely abstracts away the underlying red-black tree
  mechanism.  The structs rb_node and rb_root can also be abstracted away via
  a typedef or some other mechanism. Thus, should the "Layer 2" code be
  separated from "Layer 1" and renamed "Generic Tree (gtree)" or some such,
  paving the way for an alternate tree implementation in the future?
o Do we need RB_INSERT_DUPE_RIGHT? (see the last patch)


Theory of Operation
===================
Historically, genericity in C meant function pointers, the overhead of a
function call and the inability of the compiler to optimize code across
the function call boundary.  GCC has been getting better and better at
optimization and determining when a value is a compile-time constant and
compiling it out.  As of gcc 4.6, it has finally reached a point where
it's possible to have generic search & insert cores that optimize
exactly as well as if they were hand-coded. (see also gcc man page:
-findirect-inlining)
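
To make the idea concrete, here is a much-reduced sketch of the technique
(not the code from this patch set): the search core takes a const
description of the container, and once it is inlined into a wrapper that
passes a compile-time constant description, the compiler is free to fold
the offsets and inline the comparator. All names below (struct node,
struct relationship, generic_find, item_find) are hypothetical, and the
tree is a plain unbalanced binary tree rather than an rbtree.

```c
#include <stddef.h>

typedef long (*compare_f)(const void *a, const void *b);

struct node {
	struct node *left, *right;
};

/* Hypothetical, much-reduced analogue of struct rb_relationship. */
struct relationship {
	ptrdiff_t node_offset;	/* offset of struct node in the container */
	ptrdiff_t key_offset;	/* offset of the key in the container */
	compare_f compare;
};

/* Generic core: knows nothing about the container type. */
static inline __attribute__((always_inline))
struct node *generic_find(struct node *n, const void *key,
			  const struct relationship *rel)
{
	while (n) {
		const char *obj = (const char *)n - rel->node_offset;
		long diff = rel->compare(key, obj + rel->key_offset);

		if (diff == 0)
			return n;
		n = diff < 0 ? n->left : n->right;
	}
	return NULL;
}

/* A user-defined container and its compile-time constant description. */
struct item {
	int id;
	struct node node;
};

static inline long compare_int(const void *a, const void *b)
{
	return (long)*(const int *)a - (long)*(const int *)b;
}

static const struct relationship item_rel = {
	.node_offset = offsetof(struct item, node),
	.key_offset  = offsetof(struct item, id),
	.compare     = compare_int,
};

/* Thin wrapper: after inlining, no indirect call needs to remain. */
static struct item *item_find(struct node *root, int id)
{
	struct node *n = generic_find(root, &id, &item_rel);

	return n ? (struct item *)((char *)n - item_rel.node_offset) : NULL;
}
```

Whether the indirect call and the offset arithmetic actually disappear
depends on the compiler version and optimization level, which is why the
text above singles out gcc 4.6.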

This implementation actually consists of two layers written on top of the
existing rbtree implementation.

Layer 1: Type-Specific (But Not Type-Safe)
------------------------------------------
The first layer consists of enum rb_flags, struct rb_relationship and
some generic inline functions (see patch for doc comments).

enum rb_flags {
	RB_HAS_LEFTMOST     = 0x0001,
	RB_HAS_RIGHTMOST    = 0x0002,
	RB_HAS_COUNT        = 0x0004,
	RB_UNIQUE_KEYS      = 0x0008,
	RB_INSERT_REPLACES  = 0x0010,
	RB_IS_AUGMENTED     = 0x0040,
	RB_VERIFY_USAGE     = 0x0080,
	RB_VERIFY_INTEGRITY = 0x0100
};

struct rb_relationship {
	ssize_t root_offset;
	ssize_t left_offset;
	ssize_t right_offset;
	ssize_t count_offset;
	ssize_t node_offset;
	ssize_t key_offset;
	int flags;
	const rb_compare_f compare;     /* comparator for lookups */
	const rb_compare_f ins_compare; /* comparator for inserts */
	unsigned key_size;
};

/* these functions are for use on all trees */
struct rb_node *rb_find(
	struct rb_root *root,
	const void *key,
	const struct rb_relationship *rel);
struct rb_node *rb_find_near(
	struct rb_node *from,
	const void *key,
	const struct rb_relationship *rel);
struct rb_node *rb_insert(
	struct rb_root *root,
	struct rb_node *node,
	const struct rb_relationship *rel);
struct rb_node *rb_insert_near(
	struct rb_root *root,
	struct rb_node *start,
	struct rb_node *node,
	const struct rb_relationship *rel);
void 

Re: [PATCH 3/4] memory-hotplug: clear hwpoisoned flag when onlining pages

2012-09-27 Thread Wen Congyang
At 09/28/2012 04:17 AM, KOSAKI Motohiro Wrote:
> On Thu, Sep 27, 2012 at 1:45 AM,   wrote:
>> From: Wen Congyang 
>>
>> hwpoisoned may set when we offline a page by the sysfs interface
>> /sys/devices/system/memory/soft_offline_page or
>> /sys/devices/system/memory/hard_offline_page. If we don't clear
>> this flag when onlining pages, this page can't be freed, and will
>> not in free list. So we can't offline these pages again. So we
>> should clear this flag when onlining pages.
> 
> This seems wrong fix to me.  After offline, memory may or may not
> change with new one. Thus we can't assume any memory status. Thus,
> we should just forget hwpoison status at _offline_ event.
> 

Yes, agree with you. I will update this patch.

Thanks for reviewing.

Wen Congyang



[tip:x86/acpi] ACPI: Document ACPI table overriding via initrd

2012-09-27 Thread tip-bot for Thomas Renninger
Commit-ID:  4901b402c957d4ebaff123ffc34fafe3cef99542
Gitweb: http://git.kernel.org/tip/4901b402c957d4ebaff123ffc34fafe3cef99542
Author: Thomas Renninger 
AuthorDate: Wed, 26 Sep 2012 14:19:01 +0200
Committer:  H. Peter Anvin 
CommitDate: Thu, 27 Sep 2012 15:01:43 -0700

ACPI: Document ACPI table overriding via initrd

Signed-off-by: Thomas Renninger 
Link: http://lkml.kernel.org/r/1348661941-71287-7-git-send-email-tr...@suse.de
Cc: Len Brown 
Cc: Robert Moore 
Cc: Yinghai Lu 
Cc: Eric Piel 
Signed-off-by: H. Peter Anvin 
---
 Documentation/acpi/initrd_table_override.txt |   94 ++
 1 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/Documentation/acpi/initrd_table_override.txt b/Documentation/acpi/initrd_table_override.txt
new file mode 100644
index 000..35c3f54
--- /dev/null
+++ b/Documentation/acpi/initrd_table_override.txt
@@ -0,0 +1,94 @@
+Overriding ACPI tables via initrd
+=================================
+
+1) Introduction (What is this about)
+2) What is this for
+3) How does it work
+4) References (Where to retrieve userspace tools)
+
+1) What is this about
+---------------------
+
+If the ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible to
+override nearly any ACPI table provided by the BIOS with an instrumented,
+modified one.
+
+For a full list of ACPI tables that can be overridden, take a look at
+the char *table_sigs[MAX_ACPI_SIGNATURE]; definition in drivers/acpi/osl.c
+All ACPI tables iasl (Intel's ACPI compiler and disassembler) knows should
+be overridable, except:
+   - ACPI_SIG_RSDP (has a signature of 6 bytes)
+   - ACPI_SIG_FACS (does not have an ordinary ACPI table header)
+Both could get implemented as well.
+
+
+2) What is this for
+-------------------
+
+Please keep in mind that this is a debug option.
+ACPI tables should not get overridden for productive use.
+If BIOS ACPI tables are overridden the kernel will get tainted with the
+TAINT_OVERRIDDEN_ACPI_TABLE flag.
+Complain to your platform/BIOS vendor if you find a bug which is so severe
+that a workaround is not accepted in the Linux kernel.
+
+Still, it can and should be enabled in any kernel, because:
+  - There is no functional change with non-instrumented initrds
+  - It provides a powerful feature to easily debug and test ACPI BIOS table
+compatibility with the Linux kernel.
+
+
+3) How does it work
+-------------------
+
+# Extract the machine's ACPI tables:
+cd /tmp
+acpidump >acpidump
+acpixtract -a acpidump
+# Disassemble, modify and recompile them:
+iasl -d *.dat
+# For example add this statement into a _PRT (PCI Routing Table) function
+# of the DSDT:
+Store("HELLO WORLD", debug)
+iasl -sa dsdt.dsl
+# Add the raw ACPI tables to an uncompressed cpio archive.
+# They must be put into a /kernel/firmware/acpi directory inside the
+# cpio archive.
+# The uncompressed cpio archive must be the first.
+# Other, typically compressed cpio archives, must be
+# concatenated on top of the uncompressed one.
+mkdir -p kernel/firmware/acpi
+cp dsdt.aml kernel/firmware/acpi
+# A maximum of: #define ACPI_OVERRIDE_TABLES 10
+# tables are  currently allowed (see osl.c):
+iasl -sa facp.dsl
+iasl -sa ssdt1.dsl
+cp facp.aml kernel/firmware/acpi
+cp ssdt1.aml kernel/firmware/acpi
+# Create the uncompressed cpio archive and concatenate the original initrd
+# on top:
+find kernel | cpio -H newc --create > /boot/instrumented_initrd
+cat /boot/initrd >>/boot/instrumented_initrd
+# reboot with increased acpi debug level, e.g. boot params:
+acpi.debug_level=0x2 acpi.debug_layer=0x
+# and check your syslog:
+[1.268089] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
+[1.272091] [ACPI Debug]  String [0x0B] "HELLO WORLD"
+
+iasl is able to disassemble and recompile quite a lot of different tables,
+including static ACPI tables.
+
+
+4) Where to retrieve userspace tools
+------------------------------------
+
+iasl and acpixtract are part of Intel's ACPICA project:
+http://acpica.org/
+and should be packaged by distributions (for example in the acpica package
+on SUSE).
+
+acpidump can be found in Len Brown's pmtools:
+ftp://kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools/acpidump
+This tool is also part of the acpica package on SUSE.
+Alternatively, used ACPI tables can be retrieved via sysfs in latest kernels:
+/sys/firmware/acpi/tables

