[PATCH] target: Update copyright ownership to 2012

2012-11-09 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hello everyone,

This patch updates the copyright year to the current one for principal
target core ownership, and is now being pushed into target-pending/for-next.

Signed-off-by: Nicholas Bellinger 
---
 drivers/target/target_core_alua.c            |    5 +++--
 drivers/target/target_core_configfs.c        |    5 +++--
 drivers/target/target_core_device.c          |    7 +++
 drivers/target/target_core_fabric_configfs.c |    7 ---
 drivers/target/target_core_fabric_lib.c      |    5 +++--
 drivers/target/target_core_file.c            |    7 +++
 drivers/target/target_core_hba.c             |    7 +++
 drivers/target/target_core_iblock.c          |    7 +++
 drivers/target/target_core_pr.c              |    5 +++--
 drivers/target/target_core_pscsi.c           |    7 +++
 drivers/target/target_core_rd.c              |    7 +++
 drivers/target/target_core_sbc.c             |    7 +++
 drivers/target/target_core_spc.c             |    7 +++
 drivers/target/target_core_stat.c            |    7 +++
 drivers/target/target_core_tmr.c             |    5 +++--
 drivers/target/target_core_tpg.c             |    7 +++
 drivers/target/target_core_transport.c       |    7 +++
 drivers/target/target_core_ua.c              |    5 +++--
 18 files changed, 55 insertions(+), 59 deletions(-)

diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c
index 4c8eea2..035c606 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -3,8 +3,9 @@
  *
  * This file contains SPC-3 compliant asymmetric logical unit assigntment (ALUA)
  *
- * Copyright (c) 2009-2010 Rising Tide Systems
- * Copyright (c) 2009-2010 Linux-iSCSI.org
+ * (c) Copyright 2009-2012 RisingTide Systems LLC.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
  *
  * Nicholas A. Bellinger 
  *
diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index 2b14164..72881cc 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -3,8 +3,9 @@
  *
  * This file contains ConfigFS logic for the Generic Target Engine project.
  *
- * Copyright (c) 2008-2011 Rising Tide Systems
- * Copyright (c) 2008-2011 Linux-iSCSI.org
+ * (c) Copyright 2008-2012 RisingTide Systems LLC.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
  *
  * Nicholas A. Bellinger 
  *
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 54439bc..a9634aa 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -4,10 +4,9 @@
  * This file contains the TCM Virtual Device and Disk Transport
  * agnostic related functions.
  *
- * Copyright (c) 2003, 2004, 2005 PyX Technologies, Inc.
- * Copyright (c) 2005-2006 SBE, Inc.  All Rights Reserved.
- * Copyright (c) 2007-2010 Rising Tide Systems
- * Copyright (c) 2008-2010 Linux-iSCSI.org
+ * (c) Copyright 2003-2012 RisingTide Systems LLC.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
  *
  * Nicholas A. Bellinger 
  *
diff --git a/drivers/target/target_core_fabric_configfs.c b/drivers/target/target_core_fabric_configfs.c
index aa67313..4baac62 100644
--- a/drivers/target/target_core_fabric_configfs.c
+++ b/drivers/target/target_core_fabric_configfs.c
@@ -4,10 +4,11 @@
  * This file contains generic fabric module configfs infrastructure for
  * TCM v4.x code
  *
- * Copyright (c) 2010,2011 Rising Tide Systems
- * Copyright (c) 2010,2011 Linux-iSCSI.org
+ * (c) Copyright 2010-2012 RisingTide Systems LLC.
  *
- * Copyright (c) Nicholas A. Bellinger 
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
+ *
+ * Nicholas A. Bellinger 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/drivers/target/target_core_fabric_lib.c b/drivers/target/target_core_fabric_lib.c
index e460d62..6ec5814 100644
--- a/drivers/target/target_core_fabric_lib.c
+++ b/drivers/target/target_core_fabric_lib.c
@@ -4,8 +4,9 @@
  * This file contains generic high level protocol identifier and PR
  * handlers for TCM fabric modules
  *
- * Copyright (c) 2010 Rising Tide Systems, Inc.
- * Copyright (c) 2010 Linux-iSCSI.org
+ * (c) Copyright 2010-2012 RisingTide Systems LLC.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
  *
  * Nicholas A. Bellinger 
  *
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 3ccef83..421da5c 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -3,10 +3,9 @@
  *
  * This file contains the Storage Engine <-> FILEIO transport specific functions
  *
- * Copyright (c) 2005 PyX Technologies, Inc.
- * 

Re: [PATCH 02/11] time: convert arch_gettimeoffset to a pointer

2012-11-09 Thread John Stultz

On 11/08/2012 01:01 PM, Stephen Warren wrote:

From: Stephen Warren 

Currently, whenever CONFIG_ARCH_USES_GETTIMEOFFSET is enabled, each
arch core provides a single implementation of arch_gettimeoffset(). In
many cases, different sub-architectures, different machines, or
different timer providers exist, and so the arch ends up implementing
arch_gettimeoffset() as a call-through-pointer anyway. Examples are
ARM, Cris, M68K, and it's arguable that the remaining architectures,
M32R and Blackfin, should be doing this anyway.

Modify arch_gettimeoffset so that it itself is a function pointer, which
the arch initializes. This will allow later changes to move the
initialization of this function into individual machine support or timer
drivers. This is particularly useful for code in drivers/clocksource
which should rely on an arch-independent mechanism to register their
implementation of arch_gettimeoffset().

This patch also converts the Cris architecture to set arch_gettimeoffset
directly to the final implementation in time_init(), because Cris already
had separate time_init() functions per sub-architecture. M68K and ARM
are converted to set arch_gettimeoffset to the final implementation in later
patches, because they already have function pointers in place for this
purpose.

[snip]

diff --git a/include/linux/time.h b/include/linux/time.h
index 4d358e9..05e32a7 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -142,9 +142,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta);
   * finer then tick granular time.
   */
  #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
-extern u32 arch_gettimeoffset(void);
-#else
-static inline u32 arch_gettimeoffset(void) { return 0; }
+extern u32 (*arch_gettimeoffset)(void);
  #endif

  extern void do_gettimeofday(struct timeval *tv);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e424970..9d00ace 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -140,6 +140,20 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
  }

  /* Timekeeper helper functions. */
+
+#ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
+u32 (*arch_gettimeoffset)(void);
+
+u32 gettimeoffset(void)
+{
+	if (likely(arch_gettimeoffset))
+		return arch_gettimeoffset();
+	return 0;
+}
+#else
+static inline u32 gettimeoffset(void) { return 0; }
+#endif


Minor nit-pick here, but get_arch_timeoffset() or get_arch_tickoffset() 
might be clearer, as gettimeoffset() sounds a little generic, and could 
be confused with the higher-level timekeeping_inject_offset() call.
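
A minimal userspace sketch of the pattern under review, an optional arch hook behind a wrapper that falls back to 0, using the get_arch_timeoffset() name suggested above. This is only an illustration under assumptions: uint32_t stands in for the kernel's u32, likely() is dropped, and fake_timer_offset() and demo_timeoffset() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hook that machine or timer code may set during init; NULL until then. */
uint32_t (*arch_gettimeoffset)(void);

/* Wrapper, using the name suggested above: call the hook if one is set. */
uint32_t get_arch_timeoffset(void)
{
	if (arch_gettimeoffset)
		return arch_gettimeoffset();
	return 0;
}

/* Hypothetical per-machine implementation, for illustration only. */
uint32_t fake_timer_offset(void)
{
	return 1234;
}

/* Walk through the two states a caller can observe. */
void demo_timeoffset(void)
{
	/* Nothing registered yet: the wrapper falls back to 0. */
	assert(get_arch_timeoffset() == 0);

	/* A time_init()-style function would install the hook like this. */
	arch_gettimeoffset = fake_timer_offset;
	assert(get_arch_timeoffset() == 1234);
}
```

Keeping the NULL check in the wrapper means callers never dereference the pointer directly, so machines that register no hook behave as before.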


Otherwise this looks ok to me (disclaimer: I'm back from a 4 week leave, 
so I may not have my brain plugged in all the way yet).


thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Stephen Warren
On 11/08/2012 07:26 PM, David Gibson wrote:
...
> So, let me take a stab at this from a more bottom-up approach, and see
> if we meet in the middle somewhere.  As I discussed in the other
> thread with Daniel Mack, I can see two different operations on the
> fdt that might be useful in this context.  I think of them as "graft"
> - which takes one fdt and adds it as a new subtree to an existing fdt
> - and "overlay" where a new fdt adds or overrides arbitrary properties
> in an existing tree.  Overlay is more or less what we do at the source
> level in dtc already.

One more thought on the differences between overlay and grafting:

With overlay, it's possible to make your overlay a complete DT tree
rooted at /. In some cases, you might find a lower-level node which all
overlaid elements share, and root the overlay process there.

With grafting, you're basically guaranteed to want the child/graft file
to have different parts of itself grafted into different points in the
parent/underlying tree, e.g. to add nodes to an I2C bus, an SPI bus, and
perhaps some bus-less devices at the root (e.g. a new gpio-leds device).
This will require a child/graft file to consist of multiple chunks
of DT all to be grafted as separate operations, whereas with overlay you
may be able to get away with a single chunk to be overlaid (although
with some of the use-cases discussed, even the overlay approach might
require multiple chunks to be applied).
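
To make the difference concrete, here is a rough sketch in plain C, not dtc or libfdt code. It models a tree as a flat list of (node path, property, value) entries; struct tree, overlay_prop(), graft_chunk() and the node names are all made up for illustration. Overlay can touch any node from a single chunk rooted at /, while graft needs one chunk and one operation per attachment point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS 32

/* One property of one node; the node is named by its full path. */
struct prop {
	char path[64];
	char name[32];
	char value[32];
};

struct tree {
	struct prop props[MAX_PROPS];
	int count;
};

/* Return the matching property, or NULL if the tree does not have it. */
struct prop *find_prop(struct tree *t, const char *path, const char *name)
{
	int i;

	for (i = 0; i < t->count; i++) {
		if (!strcmp(t->props[i].path, path) &&
		    !strcmp(t->props[i].name, name))
			return &t->props[i];
	}
	return NULL;
}

/* Overlay: add a property, or override it if the node already has it. */
void overlay_prop(struct tree *t, const char *path, const char *name,
		  const char *value)
{
	struct prop *p = find_prop(t, path, name);

	if (!p) {
		assert(t->count < MAX_PROPS);
		p = &t->props[t->count++];
		snprintf(p->path, sizeof(p->path), "%s", path);
		snprintf(p->name, sizeof(p->name), "%s", name);
	}
	snprintf(p->value, sizeof(p->value), "%s", value);
}

/*
 * Graft: attach one chunk as a new subtree below a single parent node.
 * It reuses overlay_prop() only for storage; a real graft would
 * presumably reject a clash with an existing node.
 */
void graft_chunk(struct tree *t, const char *parent, struct tree *chunk)
{
	char path[64];
	int i;

	if (!strcmp(parent, "/"))
		parent = "";	/* avoid "//name" when grafting at the root */

	for (i = 0; i < chunk->count; i++) {
		snprintf(path, sizeof(path), "%s%s", parent,
			 chunk->props[i].path);
		overlay_prop(t, path, chunk->props[i].name,
			     chunk->props[i].value);
	}
}

void demo_graft_vs_overlay(void)
{
	struct tree base = {0}, i2c_chunk = {0}, root_chunk = {0};

	overlay_prop(&base, "/i2c@0", "status", "disabled");

	/* Overlay: a chunk rooted at / can override any existing node. */
	overlay_prop(&base, "/i2c@0", "status", "okay");
	assert(base.count == 1);
	assert(!strcmp(find_prop(&base, "/i2c@0", "status")->value, "okay"));

	/* Graft: one chunk, and one operation, per attachment point. */
	overlay_prop(&i2c_chunk, "/codec@1a", "compatible", "vendor,codec");
	overlay_prop(&root_chunk, "/leds", "compatible", "gpio-leds");
	graft_chunk(&base, "/i2c@0", &i2c_chunk);
	graft_chunk(&base, "/", &root_chunk);
	assert(find_prop(&base, "/i2c@0/codec@1a", "compatible"));
	assert(find_prop(&base, "/leds", "compatible"));
}
```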


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Stephen Warren
On 11/09/2012 09:28 AM, Grant Likely wrote:
> On Tue, Nov 6, 2012 at 10:37 PM, Stephen Warren  wrote:
...
>> I do rather suspect this use-case is quite common. NVIDIA certainly has
>> a bunch of development boards with pluggable
>> PMIC/audio/WiFi/display/..., and I believe there's some ability to
>> re-use the pluggable components with a variety of base-boards.
>>
>> Given people within NVIDIA started talking about this recently, I asked
>> them to enumerate all the boards we have that support pluggable
>> components, and how common it is that some boards support being plugged
>> into different main boards. I don't know when that enumeration will
>> complete (or even start) but hopefully I can provide some feedback on
>> how common the use-case is for us once it's done.
> 
> From your perspective, is it important to use the exact same .dtb
> overlays for those add-on boards, or is it okay to have a separate
> build of the overlay for each base tree?

I certainly think it'd be extremely beneficial to use the exact same
child board .dtb with arbitrary base boards.

Consider something like the Arduino shield connector format, which I
/believe/ has been re-used across a wide variety of Arduino boards and
other compatible or imitation boards. Now consider a vendor of an
Arduino shield. The shield vendor probably wants to publish a single
.dtb file that works for users irrespective of which board they're using
it with.

(Well, I'm not sure that Arduino can run Linux; perhaps that's why you
picked BeagleBone capes for your document!)

I suppose it would be acceptable for the shield vendor to ship the .dts
file rather than the .dtb, and hence need to build the shield .dtb for a
specific base board.

However, I think the process for an end-user needs to be as simple as
"drop this .dts/.dtb file into some standard directory", and I imagine
it'll be much easier for distros/... to make that process work if
they're dealing with a .dtb that they can just blast into the kernel's
firmware loader interface, rather than having to also locate the
base-board .dts/.dtb file, and run dtc and/or other tools on both .dts
files together.


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Grant Likely
On Fri, Nov 9, 2012 at 10:57 PM, Stephen Warren  wrote:
> On 11/08/2012 07:26 PM, David Gibson wrote:
> ...
>> I also think graft will handle most of your use cases, although as I
>> said I don't fully understand the implications of some of them, so I
>> could be wrong.  So, the actual insertion of the subtree is pretty
>> trivial to implement.  phandles are the obvious other matter to be
>> dealt with.  I haven't found the right thread where what you've
>> envisaged so far is discussed, so here are things I can see:
>>
>> 1) Avoiding phandle collisions between main tree and subtree (or
>>    between subtrees).
>>
>> I'm hopeful that this can be resolved just by establishing some
>> conventions about the ranges of phandles to be used for various
>> components.  I'd certainly be happy to add a directive to dtc which
>> enforces allocation of phandles within a specified range.
>
> One case I wonder about is:
>
> Base board A that's split into two .dts files e.g. due to there being
> two versions of the base board which are 90% identical, yet there being
> a few differences between them.
>
> Base board B that's just a single .dts file.
>
> We might allocate phandle range 0..999 for the base board.
>
> Child board X plugs into either, so the two base boards need to "export"
> the same set of phandle IDs to the child board to allow it to reference
> them.

It's more than just that. The boards really need to be equivalent
since the irq specifiers and gpio specifiers need to match the gpio
and irq controllers provided by the board. So a simple overlay design
won't cover the case of a single file that will work generically on
any board.

> If base board A version 2 comes along after the phandle IDs that the
> child board uses are defined, then how do we handle assigning a specific
> phandle ID range to each of base board A's two version-specific
> overlays, when the version-specific changes might affect arbitrary
> phandle IDs within the range, and require some to be moved into the base
> board version-specific overlays and some to stay in the common base
> board .dts?

With the design I'm currently considering, at the dts level the
overlay could be compiled against any base boards if the specifiers
are equivalent and the labels match up as indicated above. The
compiled dtb could also be somewhat portable, but only if care is
taken to make sure the phandles match too; possibly by explicitly
specifying them instead of letting them be generated by a hash.

g.


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Grant Likely
On Fri, Nov 9, 2012 at 11:06 PM, Stephen Warren  wrote:
> On 11/05/2012 01:40 PM, Grant Likely wrote:
>> As promised, here is my early draft to try and capture what device
>> tree overlays need to do and how to get there. Comments and
>> suggestions greatly appreciated.
>
> Here's one other requirement I'd like that I don't think I saw
> explicitly mentioned in your document:
>
> Assuming a base file board.dts and a child board file child.dts, the
> compiled file child.dtb should be usable with a modified board.dtb
> assuming it exports the same set of attachment-points (hashed phandles,
> socket objects, whatever). This allows bug-fixes etc. to board.dts
> without forcing every child.dts to be recompiled.

No, I hadn't explicitly captured that one, but yes it is the intent.

> If the attachment points are hashed node names or node content from
> board.dts, I'm not sure how this is possible?

Ummm, I think there is a misunderstanding about the hashed phandles. My
thought is merely that using a hash to generate a phandle is 'better'
than starting at 1 and counting upwards when dtc compiles the tree.
That way, if the path to the node has not changed, then the phandle
will not have changed.

However, phandles can still be explicitly specified if slightly
different trees need to have the same target point.
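
As a rough illustration of that idea, and not a description of anything dtc does today, the sketch below derives a phandle from the node path with a 32-bit FNV-1a hash and lets an explicit value win. The hash choice, the function names and the node paths are assumptions made for the example, as is treating 0 and 0xffffffff as "no phandle".

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string. */
uint32_t fnv1a32(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (uint8_t)*s++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Derive a phandle for a node. An explicitly specified value wins;
 * otherwise hash the full path, so an unchanged path gives an unchanged
 * phandle. 0 and 0xffffffff are avoided on the assumption that they are
 * reserved to mean "no phandle".
 */
uint32_t node_phandle(const char *path, uint32_t explicit_phandle)
{
	uint32_t h;

	if (explicit_phandle != 0)
		return explicit_phandle;

	h = fnv1a32(path);
	if (h == 0 || h == 0xffffffffu)
		h = 1;
	return h;
}

void demo_phandle(void)
{
	/* Same path in two builds of the base tree: same phandle. */
	assert(node_phandle("/ocp/i2c@4802a000", 0) ==
	       node_phandle("/ocp/i2c@4802a000", 0));

	/* Slightly different trees can still agree by pinning the value. */
	assert(node_phandle("/soc/i2c@7000c000", 0x100) ==
	       node_phandle("/ocp/i2c@4802a000", 0x100));
}
```

A real implementation would also have to detect two paths hashing to the same value, which this sketch ignores.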

That said, if portability of dtb files is a strong requirement, then
it will be difficult to do with simple overlays. Even if the phandles
line up, the irq/gpio specifiers probably need to be different. That
makes things harder.

g.


Re: scsi target, likely GPL violation

2012-11-09 Thread Andy Grover
On 11/08/2012 06:08 PM, Nicholas A. Bellinger wrote:
> Support for certified VAAI is part of our commercial target core. The
> target core constitutes a stand-alone kernel subsystem of which we are
> the sole copyright owners. In addition, our target contains a number of
> backend drivers, of which we are also the sole copyright owners, and a
> number of fabric modules, of which some we are the sole copyright
> owners, and of which others we are not, as you pointed out. A
> substantial fraction of the code of which we own the sole copyright was
> certified by BlackDuck Software as early as in 2007. 

See this:

http://www.gnu.org/licenses/gpl-2.0.html from section 2

"These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program, and
can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based on
the Program, the distribution of the whole must be on the terms of this
License, whose permissions for other licensees extend to the entire
whole, and thus to each and every part regardless of who wrote it."

Is your code an independent and separate work from the Linux kernel?
Some tests might be: can it be used without the Linux kernel? Can it be
used with alternative kernels? Even if the answer to these questions is
YES (which it isn't) then that second quoted sentence would still put
your code under the terms of the GPL, since RTS OS distributes its
changes along with the Linux kernel.

I've spent enough time arguing the factual basis of this issue. It seems
crystal clear to me, and I have a hard time seeing how anyone could
disagree.

But let's forget licenses and talk community. Looking back, can anyone
say that your push to get LIO accepted into mainline as the kernel
target was in good faith? Back before LIO was merged, James chose LIO
over SCST saying to the SCST devs:

"Look, let me try to make it simple: It's not about the community you
bring to the table, it's about the community you have to join when you
become part of the linux kernel."

RTS behaved long enough to get LIO merged, and then forget community.
James is right, community is more important than code, and licenses
enforce community expectations. RTS has appeared just community-focused
enough to prevent someone else's code from being adopted, so it can
extract the benefits and still maintain its proprietary advantage.

-- Andy


Re: [PATCH] virtio-scsi: Fix incorrect lock release order in virtscsi_kick_cmd

2012-11-09 Thread Paolo Bonzini
On 09/11/2012 20:31, Nicholas A. Bellinger wrote:
>> That's done on purpose.  After you do virtqueue_add_buf, you don't need
>> the sg list anymore, nor the lock that protects it.  The cover letter is
>> at https://lkml.org/lkml/2012/6/13/295 and had this text:
>>
>>   This series reorganizes the locking in virtio-scsi, introducing
>>   separate scatterlists for each target and "pipelining" the locks so
>>   that one command can be queued while the other is prepared.  This
>>   improves performance when there are multiple in-flight operations.
>>
>> In fact, the patch _introduces_ wrong locking because
>> virtqueue_kick_prepare needs the vq_lock.
>>
>> Perhaps what you want is separate local_irq_save/local_irq_restore?
> 
> Ahh, that makes more sense now.
> 
> Just noticed while reviewing the code that using one spinlock's flags to
> release the other looks suspicious, minus the ordering bit..
> 
> Using local_irq_* would probably be cleaner than swapping flags between
> different locks, and a short comment here would be helpful to explain
> the locking order context.

Well, my plan is to improve the virtio API so I can reuse the higher
layer's scatterlist, and get rid of the lock (not just of the funny
order) altogether. :)  Queuing requests is really performance-sensitive,
and it can use any optimization.

But if I can't get to it quick, I'll queue a cleanup using local_irq_*.
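
For readers following along, the lock "pipelining" described in the cover letter can be sketched in userspace C with pthread mutexes. This is an analogy only, under stated assumptions: tgt_lock and vq_lock stand in for the per-target and per-virtqueue spinlocks, the counters stand in for the scatterlist and ring work, and there is no interrupt state here, so the flags question discussed above does not arise.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the per-target lock and the per-virtqueue lock. */
pthread_mutex_t tgt_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t vq_lock = PTHREAD_MUTEX_INITIALIZER;

int sg_prepared;	/* protected by tgt_lock: scatterlists built */
int queued;		/* protected by vq_lock: buffers added to the ring */
int kicks;		/* protected by vq_lock: kick decisions taken */

/* Queue one command with the two locks overlapped ("pipelined"). */
void kick_cmd(void)
{
	pthread_mutex_lock(&tgt_lock);
	sg_prepared++;			/* map the command into the sg list */

	pthread_mutex_lock(&vq_lock);
	queued++;			/* add the buffer to the queue */

	/* The sg list is no longer needed, so its lock is dropped first. */
	pthread_mutex_unlock(&tgt_lock);

	kicks++;			/* preparing the kick still needs vq_lock */
	pthread_mutex_unlock(&vq_lock);
}

void demo_kick(void)
{
	kick_cmd();
	kick_cmd();
	assert(sg_prepared == 2 && queued == 2 && kicks == 2);
}
```

Because tgt_lock is released before vq_lock, another thread can start preparing its scatterlist while this one is still finishing on the queue.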

Paolo


[PATCHv2] Bluetooth: Add support for BCM20702A0 [0b05, 17b5]

2012-11-09 Thread Jeff Cook
Vendor-specific ID for BCM20702A0.
Support for Bluetooth over Asus Wi-Fi GO!, included with Asus P8Z77-V
Deluxe.

T:  Bus=07 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=12  MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0b05 ProdID=17b5 Rev=01.12
S:  Manufacturer=Broadcom Corp
S:  Product=BCM20702A0
S:  SerialNumber=94DBC98AC113
C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=0mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
I:  If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)

Signed-off-by: Jeff Cook 
---
 drivers/bluetooth/btusb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index ee82f2f..a1d4ede 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -96,6 +96,7 @@ static struct usb_device_id btusb_table[] = {
 	{ USB_DEVICE(0x0c10, 0x) },
 
 	/* Broadcom BCM20702A0 */
+	{ USB_DEVICE(0x0b05, 0x17b5) },
 	{ USB_DEVICE(0x04ca, 0x2003) },
 	{ USB_DEVICE(0x0489, 0xe042) },
 	{ USB_DEVICE(0x413c, 0x8197) },
-- 

Hi Gustavo,

Here is the rebased patch. Sorry for the delay getting this out.


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Grant Likely
On Fri, Nov 9, 2012 at 11:23 PM, Stephen Warren  wrote:
> On 11/09/2012 09:28 AM, Grant Likely wrote:
>> On Tue, Nov 6, 2012 at 10:37 PM, Stephen Warren wrote:
> ...
>>> I do rather suspect this use-case is quite common. NVIDIA certainly has
>>> a bunch of development boards with pluggable
>>> PMIC/audio/WiFi/display/..., and I believe there's some ability to
>>> re-use the pluggable components with a variety of base-boards.
>>>
>>> Given people within NVIDIA started talking about this recently, I asked
>>> them to enumerate all the boards we have that support pluggable
>>> components, and how common it is that some boards support being plugged
>>> into different main boards. I don't know when that enumeration will
>>> complete (or even start) but hopefully I can provide some feedback on
>>> how common the use-case is for us once it's done.
>>
>> From your perspective, is it important to use the exact same .dtb
>> overlays for those add-on boards, or is it okay to have a separate
>> build of the overlay for each base tree?
>
> I certainly think it'd be extremely beneficial to use the exact same
> child board .dtb with arbitrary base boards.
>
> Consider something like the Arduino shield connector format, which I
> /believe/ has been re-used across a wide variety of Arduino boards and
> other compatible or imitation boards. Now consider a vendor of an
> Arduino shield. The shield vendor probably wants to publish a single
> .dtb file that works for users irrespective of which board they're using
> it with.
>
> (Well, I'm not sure that Arduino can run Linux; perhaps that's why you
> picked BeagleBone capes for your document!)

Correct, the Arduino is only an AVR with a tiny amount of RAM. No Linux there.

However, Arduino shields are a good example of a use case. I think
there are even some Arduino shield compatible Linux boards out there.

> I suppose it would be acceptable for the shield vendor to ship the .dts
> file rather than the .dtb, and hence need to build the shield .dtb for a
> specific base board.

That would be better I think than relying on a binary. However, some
thought needs to go into how to handle base boards that /aren't/ mostly
equivalent. Such as if they have a different GPIO controller. It may
be that for gpios and irqs, the solution really is to use
interrupt-map and create a gpio-map. i2c, spi and others still would
need to become children of the correct bus.

> However, I think the process for an end-user needs to be as simple as
> "drop this .dts/.dtb file into some standard directory", and I imagine
> it'll be much easier for distros/... to make that process work if
> they're dealing with a .dtb that they can just blast into the kernel's
> firmware loader interface, rather than having to also locate the
> base-board .dts/.dtb file, and run dtc and/or other tools on both .dts
> files together.

The base-board .dts is unnecessary. dtc is fully capable of using
/proc/device-tree as the source material.  :-)

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.


[RFC] Ever wonder what 21+ years of kernel development look like?

2012-11-09 Thread Darrick J. Wong
Hi all,

With the upcoming U.S. holiday, are you worried that there won't be much work
going on for a few days?  Or perhaps you have kids who need a babysitter while
mom and dad hack code for a few hours?  Maybe you've gotten tired of
fast-forwarding^W^W^Wwatching commercials on TV?

This is not the solution for you!

However, following the recent trend of gource animations popping up on YouTube
(due at least in part to the Phoronix editor), I have posted a 2.75h video of
gource animating the last 21 years of Linux kernel development.  Watch in
amazement as 3D Linux hacker figures shoot multi-colored laser beams of code
goodness (or not) at the source tree!  From Linux 0.01* all the way to 3.7-rc3,
you can watch the tree glow brighter with code churn, and shrink during the
less active times.  A pure delight for the whole family**!

Enjoy! http://www.youtube.com/watch?v=pOSqctHH9vY

Happy (belated) 21st birthday to Linux!  A penguin walks into a U.S. bar...

--D

* The pre-2002 releases, sadly, all look like they're coming from Linus, and
  have 2007-ish timestamps on them, probably due to the method used to shove
  the early history into git.  Sorry.  I'll fix it if I ever figure out how.

** Well, not /my/ family.  They complained about the lack of sound, so I
   synthesized for them some bytebeats music until they fled the room.


Re: [PATCH 1/2] drm: Add NVIDIA Tegra20 support

2012-11-09 Thread Stephen Warren
On 11/09/2012 06:59 AM, Thierry Reding wrote:
> This commit adds a KMS driver for the Tegra20 SoC. This includes basic
> support for host1x and the two display controllers found on the Tegra20
> SoC. Each display controller can drive a separate RGB/LVDS output.

I applied these two patches and the two arch/arm/mach-tegra patches you
posted, directly on top of next-20121109, and I see the following build
failure:

> drivers/gpu/drm/tegra/output.c: In function 'tegra_output_init':
> drivers/gpu/drm/tegra/output.c:166:9: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c:166:3: error: implicit declaration of function 'of_get_display'
> drivers/gpu/drm/tegra/output.c:167:20: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c:168:25: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c:179:13: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c:180:3: error: implicit declaration of function 'display_put'
> drivers/gpu/drm/tegra/output.c:180:21: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c:257:20: error: 'struct tegra_output' has no member named 'display'
> drivers/gpu/drm/tegra/output.c: In function 'tegra_output_exit':
> drivers/gpu/drm/tegra/output.c:272:20: error: 'struct tegra_output' has no member named 'display'

Does this depend on something not in linux-next?


Re: WARNING: at kernel/rcutree.c:1562 rcu_do_batch()

2012-11-09 Thread Paul E. McKenney
On Fri, Nov 09, 2012 at 12:23:30PM +0800, Fengguang Wu wrote:
> Paul,
> 
> I got the below warning in stable kernel 3.6.3. linux-next does
> not have this issue. Bisect shows that the first bad commit is
> 
> commit b1420f1c8bfc30ecf6380a31d0f686884834b599
> Author: Paul E. McKenney 
> Date:   Thu Mar 1 13:18:08 2012 -0800
> 
> rcu: Make rcu_barrier() less disruptive
> 
> 
> [   92.252733] do_IRQ: 1.59 No irq handler for vector (irq -1)
> [   92.253257] ------------[ cut here ]------------
> [   92.253675] WARNING: at /c/kernel-tests/src/stable/kernel/rcutree.c:1562 rcu_do_batch+0x17e/0x63b()
> [   92.254474] Hardware name: Bochs
> [   92.254766] Modules linked in:
> [   92.256689] Pid: 9, comm: migration/1 Not tainted 3.6.3 #1306
> [   92.256689] Call Trace:
> [   92.256689][] warn_slowpath_common+0x83/0x9c
> [   92.256689]  [] warn_slowpath_null+0x1a/0x1c
> [   92.256689]  [] rcu_do_batch+0x17e/0x63b
> [   92.256689]  [] ? rcu_report_qs_rnp+0x28b/0x2d5
> [   92.256689]  [] ? rcu_process_callbacks+0xe3/0x236
> [   92.256689]  [] rcu_process_callbacks+0x172/0x236
> [   92.256689]  [] __do_softirq+0xf6/0x231
> [   92.256689]  [] ? tick_program_event+0x24/0x26
> [   92.256689]  [] call_softirq+0x1c/0x30
> [   92.256689]  [] do_softirq+0x4a/0xa6
> [   92.256689]  [] irq_exit+0x51/0xbc
> [   92.256689]  [] smp_apic_timer_interrupt+0x8b/0x99
> [   92.256689]  [] apic_timer_interrupt+0x6f/0x80
> [   92.256689][] ? local_clock+0x1d/0x5a
> [   92.256689]  [] ? stop_machine_cpu_stop+0x104/0x119
> [   92.256689]  [] cpu_stopper_thread+0xdd/0x17d
> [   92.256689]  [] ? queue_stop_cpus_work+0x130/0x130
> [   92.256689]  [] ? _raw_spin_unlock_irqrestore+0x47/0x65
> [   92.256689]  [] ? trace_hardirqs_on_caller+0x125/0x181
> [   92.256689]  [] ? trace_hardirqs_on+0xd/0xf
> [   92.256689]  [] ? cpu_stop_signal_done+0x2c/0x2c
> [   92.256689]  [] kthread+0x9a/0xa2
> [   92.256689]  [] kernel_thread_helper+0x4/0x10
> [   92.256689]  [] ? retint_restore_args+0x13/0x13
> [   92.256689]  [] ? __init_kthread_worker+0x5a/0x5a
> [   92.317029]  [] ? gs_change+0x13/0x13
> 
> Thanks,
> Fengguang

Hello, Fengguang,

You need commit #bfa00b4c, which prevents offline CPUs from getting
into rcu_do_batch.

Thanx, Paul



Re: [PATCH RFT RESEND linux-next] sparc: dma-mapping: support debug_dma_mapping_error

2012-11-09 Thread David Miller
From: Shuah Khan 
Date: Fri, 26 Oct 2012 10:13:09 -0600

> Add support for the debug_dma_mapping_error() call to avoid a warning from
> the debug_dma_unmap() interface when it checks whether the mapping error
> was checked. Without this patch, a "device driver failed to check map
> error" warning is generated.
> 
> Signed-off-by: Shuah Khan 

This doesn't even compile:

/home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h: In function 'dma_mapping_error':
/home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h:62:2: error: implicit declaration of function 'debug_dma_mapping_error' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
In file included from include/linux/dma-mapping.h:76:0,
 from include/linux/skbuff.h:33,
 from include/linux/icmpv6.h:4,
 from include/linux/ipv6.h:58,
 from include/net/ipv6.h:16,
 from include/linux/sunrpc/clnt.h:26,
 from include/linux/nfs_fs.h:30,
 from init/do_mounts.c:30:


[PATCH 01/12] perf tests: Move test__vmlinux_matches_kallsyms into separate object

2012-11-09 Thread Jiri Olsa
Move the test__vmlinux_matches_kallsyms test out of
builtin-test into a separate vmlinux-kallsyms object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c | 223 +-
 tools/perf/tests/tests.h|   6 +
 tools/perf/tests/vmlinux-kallsyms.c | 230 
 4 files changed, 239 insertions(+), 221 deletions(-)
 create mode 100644 tools/perf/tests/tests.h
 create mode 100644 tools/perf/tests/vmlinux-kallsyms.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index cca5bb8..7c7ba4d 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -431,6 +431,7 @@ LIB_OBJS += $(OUTPUT)arch/common.o
 LIB_OBJS += $(OUTPUT)tests/parse-events.o
 LIB_OBJS += $(OUTPUT)tests/dso-data.o
 LIB_OBJS += $(OUTPUT)tests/attr.o
+LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 5d4354e..5bc9063 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -21,231 +21,12 @@
 
 #include 
 
-static int vmlinux_matches_kallsyms_filter(struct map *map __maybe_unused,
-  struct symbol *sym)
-{
-   bool *visited = symbol__priv(sym);
-   *visited = true;
-   return 0;
-}
-
-static int test__vmlinux_matches_kallsyms(void)
-{
-   int err = -1;
-   struct rb_node *nd;
-   struct symbol *sym;
-   struct map *kallsyms_map, *vmlinux_map;
-   struct machine kallsyms, vmlinux;
-   enum map_type type = MAP__FUNCTION;
-   struct ref_reloc_sym ref_reloc_sym = { .name = "_stext", };
-
-   /*
-* Step 1:
-*
-* Init the machines that will hold kernel, modules obtained from
-* both vmlinux + .ko files and from /proc/kallsyms split by modules.
-*/
-   machine__init(&kallsyms, "", HOST_KERNEL_ID);
-   machine__init(&vmlinux, "", HOST_KERNEL_ID);
-
-   /*
-* Step 2:
-*
-* Create the kernel maps for kallsyms and the DSO where we will then
-* load /proc/kallsyms. Also create the modules maps from /proc/modules
-* and find the .ko files that match them in /lib/modules/`uname -r`/.
-*/
-   if (machine__create_kernel_maps(&kallsyms) < 0) {
-   pr_debug("machine__create_kernel_maps ");
-   return -1;
-   }
-
-   /*
-* Step 3:
-*
-* Load and split /proc/kallsyms into multiple maps, one per module.
-*/
-   if (machine__load_kallsyms(&kallsyms, "/proc/kallsyms", type, NULL) <= 0) {
-   pr_debug("dso__load_kallsyms ");
-   goto out;
-   }
-
-   /*
-* Step 4:
-*
-* kallsyms will be internally on demand sorted by name so that we can
-* find the reference relocation * symbol, i.e. the symbol we will use
-* to see if the running kernel was relocated by checking if it has the
-* same value in the vmlinux file we load.
-*/
-   kallsyms_map = machine__kernel_map(&kallsyms, type);
-
-   sym = map__find_symbol_by_name(kallsyms_map, ref_reloc_sym.name, NULL);
-   if (sym == NULL) {
-   pr_debug("dso__find_symbol_by_name ");
-   goto out;
-   }
-
-   ref_reloc_sym.addr = sym->start;
-
-   /*
-* Step 5:
-*
-* Now repeat step 2, this time for the vmlinux file we'll auto-locate.
-*/
-   if (machine__create_kernel_maps(&vmlinux) < 0) {
-   pr_debug("machine__create_kernel_maps ");
-   goto out;
-   }
-
-   vmlinux_map = machine__kernel_map(&vmlinux, type);
-   map__kmap(vmlinux_map)->ref_reloc_sym = &ref_reloc_sym;
-
-   /*
-* Step 6:
-*
-* Locate a vmlinux file in the vmlinux path that has a buildid that
-* matches the one of the running kernel.
-*
-* While doing that look if we find the ref reloc symbol, if we find it
-* we'll have its ref_reloc_symbol.unrelocated_addr and then
-* maps__reloc_vmlinux will notice and set proper ->[un]map_ip routines
-* to fixup the symbols.
-*/
-   if (machine__load_vmlinux_path(&vmlinux, type,
-  vmlinux_matches_kallsyms_filter) <= 0) {
-   pr_debug("machine__load_vmlinux_path ");
-   goto out;
-   }
-
-   err = 0;
-   /*
-* Step 7:
-*
-* Now look at the symbols in the vmlinux DSO and check if we find all of them
-* in the kallsyms dso. For the ones that are in both, check its names and
-* end addresses too.
-*/
-   for (nd = rb_first(&vmlinux_map->dso->symbols[type]); nd; nd = rb_next(nd)) {
-   
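
The steps in the comments above (load /proc/kallsyms, look up the reference relocation symbol, compare it with the vmlinux value) can be illustrated with a small standalone sketch. It is not perf's code: `struct ksym`, `parse_kallsyms_line()` and `reloc_delta()` are hypothetical names, and the only assumption about the input is the usual "address type name" layout of a /proc/kallsyms line.

```c
#include <stdio.h>

/* One parsed /proc/kallsyms entry (illustrative struct, not perf's). */
struct ksym {
	unsigned long long addr;
	char type;
	char name[128];
};

/*
 * Parse a line such as "ffffffff81000000 T _stext".
 * Returns 0 on success, -1 if the three fields are not all present.
 */
int parse_kallsyms_line(const char *line, struct ksym *out)
{
	if (sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->name) != 3)
		return -1;
	return 0;
}

/*
 * Offset between the address of a reference symbol in the running
 * kernel and its unrelocated address in the vmlinux file. A nonzero
 * result would mean the running kernel was relocated.
 */
long long reloc_delta(unsigned long long running, unsigned long long unrelocated)
{
	return (long long)(running - unrelocated);
}
```

In the test itself the reference symbol is "_stext"; the sketch only shows the kind of comparison that Step 4 describes.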

[PATCH 06/12] perf tests: Move test__rdpmc into separate object

2012-11-09 Thread Jiri Olsa
Move the test__rdpmc test out of builtin-test
into a separate rdpmc object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c | 168 --
 tools/perf/tests/rdpmc.c| 175 
 tools/perf/tests/tests.h|   1 +
 4 files changed, 177 insertions(+), 168 deletions(-)
 create mode 100644 tools/perf/tests/rdpmc.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index a2d6153..2e5197a 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -436,6 +436,7 @@ LIB_OBJS += $(OUTPUT)tests/open-syscall.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
 LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/perf-record.o
+LIB_OBJS += $(OUTPUT)tests/rdpmc.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 7cb3928..1e9a0ea 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -30,174 +30,6 @@
 #include 
 
 
-
-#if defined(__x86_64__) || defined(__i386__)
-
-#define barrier() asm volatile("" ::: "memory")
-
-static u64 rdpmc(unsigned int counter)
-{
-   unsigned int low, high;
-
-   asm volatile("rdpmc" : "=a" (low), "=d" (high) : "c" (counter));
-
-   return low | ((u64)high) << 32;
-}
-
-static u64 rdtsc(void)
-{
-   unsigned int low, high;
-
-   asm volatile("rdtsc" : "=a" (low), "=d" (high));
-
-   return low | ((u64)high) << 32;
-}
-
-static u64 mmap_read_self(void *addr)
-{
-   struct perf_event_mmap_page *pc = addr;
-   u32 seq, idx, time_mult = 0, time_shift = 0;
-   u64 count, cyc = 0, time_offset = 0, enabled, running, delta;
-
-   do {
-   seq = pc->lock;
-   barrier();
-
-   enabled = pc->time_enabled;
-   running = pc->time_running;
-
-   if (enabled != running) {
-   cyc = rdtsc();
-   time_mult = pc->time_mult;
-   time_shift = pc->time_shift;
-   time_offset = pc->time_offset;
-   }
-
-   idx = pc->index;
-   count = pc->offset;
-   if (idx)
-   count += rdpmc(idx - 1);
-
-   barrier();
-   } while (pc->lock != seq);
-
-   if (enabled != running) {
-   u64 quot, rem;
-
-   quot = (cyc >> time_shift);
-   rem = cyc & ((1 << time_shift) - 1);
-   delta = time_offset + quot * time_mult +
-   ((rem * time_mult) >> time_shift);
-
-   enabled += delta;
-   if (idx)
-   running += delta;
-
-   quot = count / running;
-   rem = count % running;
-   count = quot * enabled + (rem * enabled) / running;
-   }
-
-   return count;
-}
-
-/*
- * If the RDPMC instruction faults then signal this back to the test parent task:
- */
-static void segfault_handler(int sig __maybe_unused,
-siginfo_t *info __maybe_unused,
-void *uc __maybe_unused)
-{
-   exit(-1);
-}
-
-static int __test__rdpmc(void)
-{
-   volatile int tmp = 0;
-   u64 i, loops = 1000;
-   int n;
-   int fd;
-   void *addr;
-   struct perf_event_attr attr = {
-   .type = PERF_TYPE_HARDWARE,
-   .config = PERF_COUNT_HW_INSTRUCTIONS,
-   .exclude_kernel = 1,
-   };
-   u64 delta_sum = 0;
-struct sigaction sa;
-
-   sigfillset(&sa.sa_mask);
-   sa.sa_sigaction = segfault_handler;
-   sigaction(SIGSEGV, &sa, NULL);
-
-   fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
-   if (fd < 0) {
-   pr_err("Error: sys_perf_event_open() syscall returned "
-  "with %d (%s)\n", fd, strerror(errno));
-   return -1;
-   }
-
-   addr = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
-   if (addr == (void *)(-1)) {
-   pr_err("Error: mmap() syscall returned with (%s)\n",
-  strerror(errno));
-   goto out_close;
-   }
-
-   for (n = 0; n < 6; n++) {
-   u64 stamp, now, delta;
-
-   stamp = mmap_read_self(addr);
-
-   for (i = 0; i < loops; i++)
-   tmp++;
-
-   now = mmap_read_self(addr);
-   loops *= 10;
-
-   delta = now - stamp;
-   pr_debug("%14d: %14Lu\n", n, (long long)delta);
-
-   delta_sum += delta;
-   }
-
-   munmap(addr, page_size);
-   pr_debug("   ");
-out_close:
-   close(fd);
-
-   if (!delta_sum)
-   return -1;
-
-   
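
The arithmetic at the end of mmap_read_self() above is easier to follow in isolation. The following is a minimal sketch of the two steps, under the assumption that the shift is below 64; `cyc_to_delta()` and `scale_count()` are hypothetical names, and the `running == 0` guard is an addition that the moved code does not have.

```c
#include <stdint.h>

/*
 * Convert a cycle count into a time delta, i.e.
 * offset + ((cyc * mult) >> shift), computed in two parts so that
 * the intermediate product does not overflow 64 bits.
 */
uint64_t cyc_to_delta(uint64_t cyc, uint32_t mult, uint32_t shift, uint64_t offset)
{
	uint64_t quot = cyc >> shift;
	uint64_t rem = cyc & (((uint64_t)1 << shift) - 1);

	return offset + quot * mult + ((rem * mult) >> shift);
}

/*
 * Scale a raw count by enabled/running, again split into quotient
 * and remainder to avoid overflowing count * enabled.
 */
uint64_t scale_count(uint64_t count, uint64_t enabled, uint64_t running)
{
	uint64_t quot, rem;

	if (running == 0)	/* assumption: guard against division by zero */
		return 0;

	quot = count / running;
	rem = count % running;
	return quot * enabled + (rem * enabled) / running;
}
```

For example, `scale_count(7, 10, 3)` gives 23, the same as the truncated value of 7 * 10 / 3.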

[PATCH 10/12] perf tests: Move pmu tests into separate object

2012-11-09 Thread Jiri Olsa
Move the pmu object tests into a separate pmu object
under the tests directory.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c |   7 +-
 tools/perf/tests/pmu.c  | 178 ++
 tools/perf/tests/tests.h|   1 +
 tools/perf/util/pmu.c   | 185 ++--
 tools/perf/util/pmu.h   |   4 +
 6 files changed, 191 insertions(+), 185 deletions(-)
 create mode 100644 tools/perf/tests/pmu.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 1e50559..9af012f 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -440,6 +440,7 @@ LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/rdpmc.o
 LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o
 LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o
+LIB_OBJS += $(OUTPUT)tests/pmu.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index bab8490..d3b95e0 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -30,11 +30,6 @@
 #include 
 
 
-static int test__perf_pmu(void)
-{
-   return perf_pmu__test();
-}
-
 static struct test {
const char *desc;
int (*func)(void);
@@ -71,7 +66,7 @@ static struct test {
},
{
.desc = "Test perf pmu format parsing",
-   .func = test__perf_pmu,
+   .func = test__pmu,
},
{
.desc = "Test dso data interface",
diff --git a/tools/perf/tests/pmu.c b/tools/perf/tests/pmu.c
new file mode 100644
index 000..a5f3798
--- /dev/null
+++ b/tools/perf/tests/pmu.c
@@ -0,0 +1,178 @@
+#include "parse-events.h"
+#include "pmu.h"
+#include "util.h"
+#include "tests.h"
+
+/* Simulated format definitions. */
+static struct test_format {
+   const char *name;
+   const char *value;
+} test_formats[] = {
+   { "krava01", "config:0-1,62-63\n", },
+   { "krava02", "config:10-17\n", },
+   { "krava03", "config:5\n", },
+   { "krava11", "config1:0,2,4,6,8,20-28\n", },
+   { "krava12", "config1:63\n", },
+   { "krava13", "config1:45-47\n", },
+   { "krava21", "config2:0-3,10-13,20-23,30-33,40-43,50-53,60-63\n", },
+   { "krava22", "config2:8,18,48,58\n", },
+   { "krava23", "config2:28-29,38\n", },
+};
+
+#define TEST_FORMATS_CNT (sizeof(test_formats) / sizeof(struct test_format))
+
+/* Simulated users input. */
+static struct parse_events__term test_terms[] = {
+   {
+   .config= (char *) "krava01",
+   .val.num   = 15,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava02",
+   .val.num   = 170,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava03",
+   .val.num   = 1,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava11",
+   .val.num   = 27,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava12",
+   .val.num   = 1,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava13",
+   .val.num   = 2,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava21",
+   .val.num   = 119,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava22",
+   .val.num   = 11,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+   {
+   .config= (char *) "krava23",
+   .val.num   = 2,
+   .type_val  = PARSE_EVENTS__TERM_TYPE_NUM,
+   .type_term = PARSE_EVENTS__TERM_TYPE_USER,
+   },
+};
+#define TERMS_CNT (sizeof(test_terms) / sizeof(struct parse_events__term))
+
+/*
+ * Prepare format directory data, exported by kernel
+ * at /sys/bus/event_source/devices//format.
+ */
+static char *test_format_dir_get(void)
+{
+   static char dir[PATH_MAX];
+   unsigned int i;
+
+   
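
The test_formats and test_terms tables above pair a bit layout such as "config:0-1,62-63" with a user value such as 15. As a rough sketch of what such a pairing means, assuming the value is packed into the listed bits lowest bit first (an assumption, since the code that consumes the tables is cut off in this excerpt), the mapping can be written as follows. `bit_range()` and `scatter_value()` are hypothetical helpers, not perf functions.

```c
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set; assumes lo <= hi <= 63. */
uint64_t bit_range(unsigned lo, unsigned hi)
{
	uint64_t upper = (hi == 63) ? ~(uint64_t)0
				    : (((uint64_t)1 << (hi + 1)) - 1);
	uint64_t lower = ((uint64_t)1 << lo) - 1;

	return upper & ~lower;
}

/*
 * Place the low bits of value into the set bits of mask, lowest
 * mask bit first. Bit n of value lands on the n-th set bit of mask.
 */
uint64_t scatter_value(uint64_t mask, uint64_t value)
{
	uint64_t out = 0;
	unsigned vbit = 0;

	for (unsigned fbit = 0; fbit < 64; fbit++) {
		if (!(mask & ((uint64_t)1 << fbit)))
			continue;
		if (value & ((uint64_t)1 << vbit))
			out |= (uint64_t)1 << fbit;
		vbit++;
	}
	return out;
}
```

Under that assumption, "krava02" (config:10-17) with value 170 becomes `scatter_value(bit_range(10, 17), 170)`, which is 0x2a800.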

[PATCH 08/12] perf tests: Move perf_evsel__tp_sched_test into separate object

2012-11-09 Thread Jiri Olsa
Move the perf_evsel__tp_sched_test test out of
builtin-test into a separate evsel-tp-sched object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile   |  1 +
 tools/perf/tests/builtin-test.c   | 83 +-
 tools/perf/tests/evsel-tp-sched.c | 84 +++
 tools/perf/tests/tests.h  |  1 +
 4 files changed, 87 insertions(+), 82 deletions(-)
 create mode 100644 tools/perf/tests/evsel-tp-sched.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index ad6fcb5..e510b53 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -438,6 +438,7 @@ LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/rdpmc.o
 LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o
+LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 93f5e91..c66caa7 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -35,87 +35,6 @@ static int test__perf_pmu(void)
return perf_pmu__test();
 }
 
-static int perf_evsel__test_field(struct perf_evsel *evsel, const char *name,
- int size, bool should_be_signed)
-{
-   struct format_field *field = perf_evsel__field(evsel, name);
-   int is_signed;
-   int ret = 0;
-
-   if (field == NULL) {
-   pr_debug("%s: \"%s\" field not found!\n", evsel->name, name);
-   return -1;
-   }
-
-   is_signed = !!(field->flags | FIELD_IS_SIGNED);
-   if (should_be_signed && !is_signed) {
-   pr_debug("%s: \"%s\" signedness(%d) is wrong, should be %d\n",
-evsel->name, name, is_signed, should_be_signed);
-   ret = -1;
-   }
-
-   if (field->size != size) {
-   pr_debug("%s: \"%s\" size (%d) should be %d!\n",
-evsel->name, name, field->size, size);
-   ret = -1;
-   }
-
-   return ret;
-}
-
-static int perf_evsel__tp_sched_test(void)
-{
-   struct perf_evsel *evsel = perf_evsel__newtp("sched", "sched_switch", 0);
-   int ret = 0;
-
-   if (evsel == NULL) {
-   pr_debug("perf_evsel__new\n");
-   return -1;
-   }
-
-   if (perf_evsel__test_field(evsel, "prev_comm", 16, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "prev_pid", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "prev_prio", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "prev_state", 8, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "next_comm", 16, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "next_pid", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "next_prio", 4, true))
-   ret = -1;
-
-   perf_evsel__delete(evsel);
-
-   evsel = perf_evsel__newtp("sched", "sched_wakeup", 0);
-
-   if (perf_evsel__test_field(evsel, "comm", 16, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "pid", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "prio", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "success", 4, true))
-   ret = -1;
-
-   if (perf_evsel__test_field(evsel, "target_cpu", 4, true))
-   ret = -1;
-
-   return ret;
-}
-
 static int test__syscall_open_tp_fields(void)
 {
struct perf_record_opts opts = {
@@ -276,7 +195,7 @@ static struct test {
},
{
.desc = "Check parsing of sched tracepoints fields",
-   .func = perf_evsel__tp_sched_test,
+   .func = test__perf_evsel__tp_sched_test,
},
{
.desc = "Generate and check syscalls:sys_enter_open event fields",
diff --git a/tools/perf/tests/evsel-tp-sched.c b/tools/perf/tests/evsel-tp-sched.c
new file mode 100644
index 000..a5d2fcc
--- /dev/null
+++ b/tools/perf/tests/evsel-tp-sched.c
@@ -0,0 +1,84 @@
+#include "evsel.h"
+#include "tests.h"
+#include "event-parse.h"
+
+static int perf_evsel__test_field(struct perf_evsel *evsel, const char *name,
+ int size, bool should_be_signed)
+{
+   struct format_field *field = perf_evsel__field(evsel, name);
+   int is_signed;
+   int ret = 0;
+
+   if (field == NULL) {
+   pr_debug("%s: \"%s\" field not found!\n", evsel->name, name);
+   return -1;
+   }
+
+   is_signed = !!(field->flags | FIELD_IS_SIGNED);
+   if (should_be_signed && !is_signed) {
+   
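
A note on the signedness check carried over by this move: `!!(field->flags | FIELD_IS_SIGNED)` is nonzero for every value of flags, provided FIELD_IS_SIGNED itself is nonzero, so the "should be signed" branch would never fire. The sketch below contrasts a mask test using `&` with the `|` form. The flag values and function names are hypothetical and chosen only for illustration; they are not taken from event-parse.h.

```c
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
enum {
	DEMO_FIELD_IS_ARRAY  = 1 << 0,
	DEMO_FIELD_IS_SIGNED = 1 << 4,
};

/* Mask test: true only when the given flag bit is set in flags. */
bool flag_is_set(unsigned long flags, unsigned long flag)
{
	return (flags & flag) != 0;
}

/* OR with a nonzero flag is never zero, so this is always true. */
bool flag_or_check(unsigned long flags, unsigned long flag)
{
	return !!(flags | flag);
}
```

With these definitions, `flag_or_check(0, DEMO_FIELD_IS_SIGNED)` is true even though no flag is set, while `flag_is_set(0, DEMO_FIELD_IS_SIGNED)` is false.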

[PATCH 09/12] perf tests: Move test__syscall_open_tp_fields into separate object

2012-11-09 Thread Jiri Olsa
Move the test__syscall_open_tp_fields test out of
builtin-test into a separate open-syscall-tp-fields object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile   |   1 +
 tools/perf/tests/builtin-test.c   | 112 
 tools/perf/tests/open-syscall-tp-fields.c | 117 ++
 tools/perf/tests/tests.h  |   1 +
 4 files changed, 119 insertions(+), 112 deletions(-)
 create mode 100644 tools/perf/tests/open-syscall-tp-fields.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index e510b53..1e50559 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -434,6 +434,7 @@ LIB_OBJS += $(OUTPUT)tests/attr.o
 LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
+LIB_OBJS += $(OUTPUT)tests/open-syscall-tp-fields.o
 LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/rdpmc.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index c66caa7..bab8490 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -35,118 +35,6 @@ static int test__perf_pmu(void)
return perf_pmu__test();
 }
 
-static int test__syscall_open_tp_fields(void)
-{
-   struct perf_record_opts opts = {
-   .target = {
-   .uid = UINT_MAX,
-   .uses_mmap = true,
-   },
-   .no_delay   = true,
-   .freq   = 1,
-   .mmap_pages = 256,
-   .raw_samples = true,
-   };
-   const char *filename = "/etc/passwd";
-   int flags = O_RDONLY | O_DIRECTORY;
-   struct perf_evlist *evlist = perf_evlist__new(NULL, NULL);
-   struct perf_evsel *evsel;
-   int err = -1, i, nr_events = 0, nr_polls = 0;
-
-   if (evlist == NULL) {
-   pr_debug("%s: perf_evlist__new\n", __func__);
-   goto out;
-   }
-
-   evsel = perf_evsel__newtp("syscalls", "sys_enter_open", 0);
-   if (evsel == NULL) {
-   pr_debug("%s: perf_evsel__newtp\n", __func__);
-   goto out_delete_evlist;
-   }
-
-   perf_evlist__add(evlist, evsel);
-
-   err = perf_evlist__create_maps(evlist, &opts.target);
-   if (err < 0) {
-   pr_debug("%s: perf_evlist__create_maps\n", __func__);
-   goto out_delete_evlist;
-   }
-
-   perf_evsel__config(evsel, &opts, evsel);
-
-   evlist->threads->map[0] = getpid();
-
-   err = perf_evlist__open(evlist);
-   if (err < 0) {
-   pr_debug("perf_evlist__open: %s\n", strerror(errno));
-   goto out_delete_evlist;
-   }
-
-   err = perf_evlist__mmap(evlist, UINT_MAX, false);
-   if (err < 0) {
-   pr_debug("perf_evlist__mmap: %s\n", strerror(errno));
-   goto out_delete_evlist;
-   }
-
-   perf_evlist__enable(evlist);
-
-   /*
-* Generate the event:
-*/
-   open(filename, flags);
-
-   while (1) {
-   int before = nr_events;
-
-   for (i = 0; i < evlist->nr_mmaps; i++) {
-   union perf_event *event;
-
-   while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
-   const u32 type = event->header.type;
-   int tp_flags;
-   struct perf_sample sample;
-
-   ++nr_events;
-
-   if (type != PERF_RECORD_SAMPLE)
-   continue;
-
-   err = perf_evsel__parse_sample(evsel, event, &sample);
-   if (err) {
-   pr_err("Can't parse sample, err = %d\n", err);
-   goto out_munmap;
-   }
-
-   tp_flags = perf_evsel__intval(evsel, &sample, "flags");
-
-   if (flags != tp_flags) {
-   pr_debug("%s: Expected flags=%#x, got %#x\n",
-__func__, flags, tp_flags);
-   goto out_munmap;
-   }
-
-   goto out_ok;
-   }
-   }
-
-   if (nr_events == before)
-   poll(evlist->pollfd, evlist->nr_fds, 10);
-
-   if (++nr_polls > 5) {
-   pr_debug("%s: no events!\n", __func__);
-   goto out_munmap;
-   }
-   }
-out_ok:
-   err = 0;
-out_munmap:
-   perf_evlist__munmap(evlist);
-out_delete_evlist:
-  

[PATCH 02/12] perf tests: Move test__open_syscall_event into separate object

2012-11-09 Thread Jiri Olsa
Move the test__open_syscall_event test out of
builtin-test into a separate open-syscall object.

Add a util object under the tests directory to gather
helper functions shared by several tests.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |  2 +
 tools/perf/tests/builtin-test.c | 84 -
 tools/perf/tests/open-syscall.c | 66 
 tools/perf/tests/tests.h|  5 +++
 tools/perf/tests/util.c | 30 +++
 5 files changed, 103 insertions(+), 84 deletions(-)
 create mode 100644 tools/perf/tests/open-syscall.c
 create mode 100644 tools/perf/tests/util.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7c7ba4d..69f582c 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -432,6 +432,8 @@ LIB_OBJS += $(OUTPUT)tests/parse-events.o
 LIB_OBJS += $(OUTPUT)tests/dso-data.o
 LIB_OBJS += $(OUTPUT)tests/attr.o
 LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
+LIB_OBJS += $(OUTPUT)tests/open-syscall.o
+LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 5bc9063..b6b1e46 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -27,90 +27,6 @@
 
 #include "tests.h"
 
-static int trace_event__id(const char *evname)
-{
-   char *filename;
-   int err = -1, fd;
-
-   if (asprintf(&filename,
-"%s/syscalls/%s/id",
-tracing_events_path, evname) < 0)
-   return -1;
-
-   fd = open(filename, O_RDONLY);
-   if (fd >= 0) {
-   char id[16];
-   if (read(fd, id, sizeof(id)) > 0)
-   err = atoi(id);
-   close(fd);
-   }
-
-   free(filename);
-   return err;
-}
-
-static int test__open_syscall_event(void)
-{
-   int err = -1, fd;
-   struct thread_map *threads;
-   struct perf_evsel *evsel;
-   struct perf_event_attr attr;
-   unsigned int nr_open_calls = 111, i;
-   int id = trace_event__id("sys_enter_open");
-
-   if (id < 0) {
-   pr_debug("is debugfs mounted on /sys/kernel/debug?\n");
-   return -1;
-   }
-
-   threads = thread_map__new(-1, getpid(), UINT_MAX);
-   if (threads == NULL) {
-   pr_debug("thread_map__new\n");
-   return -1;
-   }
-
-   memset(&attr, 0, sizeof(attr));
-   attr.type = PERF_TYPE_TRACEPOINT;
-   attr.config = id;
-   evsel = perf_evsel__new(&attr, 0);
-   if (evsel == NULL) {
-   pr_debug("perf_evsel__new\n");
-   goto out_thread_map_delete;
-   }
-
-   if (perf_evsel__open_per_thread(evsel, threads) < 0) {
-   pr_debug("failed to open counter: %s, "
-"tweak /proc/sys/kernel/perf_event_paranoid?\n",
-strerror(errno));
-   goto out_evsel_delete;
-   }
-
-   for (i = 0; i < nr_open_calls; ++i) {
-   fd = open("/etc/passwd", O_RDONLY);
-   close(fd);
-   }
-
-   if (perf_evsel__read_on_cpu(evsel, 0, 0) < 0) {
-   pr_debug("perf_evsel__read_on_cpu\n");
-   goto out_close_fd;
-   }
-
-   if (evsel->counts->cpu[0].val != nr_open_calls) {
-   pr_debug("perf_evsel__read_on_cpu: expected to intercept %d calls, got %" PRIu64 "\n",
-nr_open_calls, evsel->counts->cpu[0].val);
-   goto out_close_fd;
-   }
-
-   err = 0;
-out_close_fd:
-   perf_evsel__close_fd(evsel, 1, threads->nr);
-out_evsel_delete:
-   perf_evsel__delete(evsel);
-out_thread_map_delete:
-   thread_map__delete(threads);
-   return err;
-}
-
 #include 
 
 static int test__open_syscall_event_on_all_cpus(void)
diff --git a/tools/perf/tests/open-syscall.c b/tools/perf/tests/open-syscall.c
new file mode 100644
index 000..98be8b5
--- /dev/null
+++ b/tools/perf/tests/open-syscall.c
@@ -0,0 +1,66 @@
+#include "thread_map.h"
+#include "evsel.h"
+#include "debug.h"
+#include "tests.h"
+
+int test__open_syscall_event(void)
+{
+   int err = -1, fd;
+   struct thread_map *threads;
+   struct perf_evsel *evsel;
+   struct perf_event_attr attr;
+   unsigned int nr_open_calls = 111, i;
+   int id = trace_event__id("sys_enter_open");
+
+   if (id < 0) {
+   pr_debug("is debugfs mounted on /sys/kernel/debug?\n");
+   return -1;
+   }
+
+   threads = thread_map__new(-1, getpid(), UINT_MAX);
+   if (threads == NULL) {
+   pr_debug("thread_map__new\n");
+   return -1;
+   }
+
+   memset(&attr, 0, sizeof(attr));
+   attr.type = PERF_TYPE_TRACEPOINT;
+   attr.config = id;
+   
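
The trace_event__id() helper removed from builtin-test.c above reads a tracepoint id from a small file and converts it with atoi(). The following is a minimal standalone sketch of the same pattern, with one deliberate difference: the buffer is NUL-terminated before conversion. The helper in the patch relies on atoi() stopping at the trailing newline instead. `read_int_file()` is a hypothetical name, and the caller is assumed to supply the full path.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a small decimal integer from the file at path.
 * Returns the parsed value, or -1 if the file cannot be opened
 * or read.
 */
int read_int_file(const char *path)
{
	char buf[16];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/* Leave room for the terminator added below. */
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;

	buf[n] = '\0';	/* make the buffer a valid string for atoi() */
	return atoi(buf);
}
```

A negative return is what lets the tests print the "is debugfs mounted on /sys/kernel/debug?" hint when the id file is missing.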

[PATCH 07/12] perf tests: Move perf_evsel__roundtrip_name_test into separate object

2012-11-09 Thread Jiri Olsa
Move the perf_evsel__roundtrip_name_test test out of
builtin-test into a separate evsel-roundtrip-name object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c | 112 +--
 tools/perf/tests/evsel-roundtrip-name.c | 114 
 tools/perf/tests/tests.h|   1 +
 4 files changed, 117 insertions(+), 111 deletions(-)
 create mode 100644 tools/perf/tests/evsel-roundtrip-name.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 2e5197a..ad6fcb5 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -437,6 +437,7 @@ LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
 LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/rdpmc.o
+LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 1e9a0ea..93f5e91 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -35,116 +35,6 @@ static int test__perf_pmu(void)
return perf_pmu__test();
 }
 
-static int perf_evsel__roundtrip_cache_name_test(void)
-{
-   char name[128];
-   int type, op, err = 0, ret = 0, i, idx;
-   struct perf_evsel *evsel;
-struct perf_evlist *evlist = perf_evlist__new(NULL, NULL);
-
-if (evlist == NULL)
-return -ENOMEM;
-
-   for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
-   for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
-   /* skip invalid cache type */
-   if (!perf_evsel__is_cache_op_valid(type, op))
-   continue;
-
-   for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
-   __perf_evsel__hw_cache_type_op_res_name(type, op, i,
-   name, sizeof(name));
-   err = parse_events(evlist, name, 0);
-   if (err)
-   ret = err;
-   }
-   }
-   }
-
-   idx = 0;
-   evsel = perf_evlist__first(evlist);
-
-   for (type = 0; type < PERF_COUNT_HW_CACHE_MAX; type++) {
-   for (op = 0; op < PERF_COUNT_HW_CACHE_OP_MAX; op++) {
-   /* skip invalid cache type */
-   if (!perf_evsel__is_cache_op_valid(type, op))
-   continue;
-
-   for (i = 0; i < PERF_COUNT_HW_CACHE_RESULT_MAX; i++) {
-   __perf_evsel__hw_cache_type_op_res_name(type, op, i,
-   name, sizeof(name));
-   if (evsel->idx != idx)
-   continue;
-
-   ++idx;
-
-   if (strcmp(perf_evsel__name(evsel), name)) {
-   pr_debug("%s != %s\n", perf_evsel__name(evsel), name);
-   ret = -1;
-   }
-
-   evsel = perf_evsel__next(evsel);
-   }
-   }
-   }
-
-   perf_evlist__delete(evlist);
-   return ret;
-}
-
-static int __perf_evsel__name_array_test(const char *names[], int nr_names)
-{
-   int i, err;
-   struct perf_evsel *evsel;
-struct perf_evlist *evlist = perf_evlist__new(NULL, NULL);
-
-if (evlist == NULL)
-return -ENOMEM;
-
-   for (i = 0; i < nr_names; ++i) {
-   err = parse_events(evlist, names[i], 0);
-   if (err) {
-   pr_debug("failed to parse event '%s', err %d\n",
-names[i], err);
-   goto out_delete_evlist;
-   }
-   }
-
-   err = 0;
-   list_for_each_entry(evsel, &evlist->entries, node) {
-   if (strcmp(perf_evsel__name(evsel), names[evsel->idx])) {
-   --err;
-   pr_debug("%s != %s\n", perf_evsel__name(evsel), names[evsel->idx]);
-   }
-   }
-
-out_delete_evlist:
-   perf_evlist__delete(evlist);
-   return err;
-}
-
-#define perf_evsel__name_array_test(names) \
-   __perf_evsel__name_array_test(names, ARRAY_SIZE(names))
-
-static int perf_evsel__roundtrip_name_test(void)
-{
-   int err = 0, ret = 0;
-
-   err = perf_evsel__name_array_test(perf_evsel__hw_names);
-   if (err)
-   ret = err;
-
-   err = 

[PATCH 04/12] perf tests: Move test__basic_mmap into separate object

2012-11-09 Thread Jiri Olsa
Move the test__basic_mmap test out of builtin-test
into a separate mmap-basic object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c | 157 --
 tools/perf/tests/mmap-basic.c   | 162 
 tools/perf/tests/tests.h|   1 +
 4 files changed, 164 insertions(+), 157 deletions(-)
 create mode 100644 tools/perf/tests/mmap-basic.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index d413e89..337489e 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -434,6 +434,7 @@ LIB_OBJS += $(OUTPUT)tests/attr.o
 LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
+LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 98e883b..609f592 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -30,163 +30,6 @@
 #include 
 
 
-/*
- * This test will generate random numbers of calls to some getpid syscalls,
- * then establish an mmap for a group of events that are created to monitor
- * the syscalls.
- *
- * It will receive the events, using mmap, use its PERF_SAMPLE_ID generated
- * sample.id field to map back to its respective perf_evsel instance.
- *
- * Then it checks if the number of syscalls reported as perf events by
- * the kernel corresponds to the number of syscalls made.
- */
-static int test__basic_mmap(void)
-{
-   int err = -1;
-   union perf_event *event;
-   struct thread_map *threads;
-   struct cpu_map *cpus;
-   struct perf_evlist *evlist;
-   struct perf_event_attr attr = {
-   .type   = PERF_TYPE_TRACEPOINT,
-   .read_format= PERF_FORMAT_ID,
-   .sample_type= PERF_SAMPLE_ID,
-   .watermark  = 0,
-   };
-   cpu_set_t cpu_set;
-   const char *syscall_names[] = { "getsid", "getppid", "getpgrp",
-   "getpgid", };
-   pid_t (*syscalls[])(void) = { (void *)getsid, getppid, getpgrp,
- (void*)getpgid };
-#define nsyscalls ARRAY_SIZE(syscall_names)
-   int ids[nsyscalls];
-   unsigned int nr_events[nsyscalls],
-expected_nr_events[nsyscalls], i, j;
-   struct perf_evsel *evsels[nsyscalls], *evsel;
-
-   for (i = 0; i < nsyscalls; ++i) {
-   char name[64];
-
-   snprintf(name, sizeof(name), "sys_enter_%s", syscall_names[i]);
-   ids[i] = trace_event__id(name);
-   if (ids[i] < 0) {
-   pr_debug("Is debugfs mounted on /sys/kernel/debug?\n");
-   return -1;
-   }
-   nr_events[i] = 0;
-   expected_nr_events[i] = random() % 257;
-   }
-
-   threads = thread_map__new(-1, getpid(), UINT_MAX);
-   if (threads == NULL) {
-   pr_debug("thread_map__new\n");
-   return -1;
-   }
-
-   cpus = cpu_map__new(NULL);
-   if (cpus == NULL) {
-   pr_debug("cpu_map__new\n");
-   goto out_free_threads;
-   }
-
-   CPU_ZERO(&cpu_set);
-   CPU_SET(cpus->map[0], &cpu_set);
-   sched_setaffinity(0, sizeof(cpu_set), &cpu_set);
-   if (sched_setaffinity(0, sizeof(cpu_set), &cpu_set) < 0) {
-   pr_debug("sched_setaffinity() failed on CPU %d: %s ",
-cpus->map[0], strerror(errno));
-   goto out_free_cpus;
-   }
-
-   evlist = perf_evlist__new(cpus, threads);
-   if (evlist == NULL) {
-   pr_debug("perf_evlist__new\n");
-   goto out_free_cpus;
-   }
-
-   /* anonymous union fields, can't be initialized above */
-   attr.wakeup_events = 1;
-   attr.sample_period = 1;
-
-   for (i = 0; i < nsyscalls; ++i) {
-   attr.config = ids[i];
-   evsels[i] = perf_evsel__new(&attr, i);
-   if (evsels[i] == NULL) {
-   pr_debug("perf_evsel__new\n");
-   goto out_free_evlist;
-   }
-
-   perf_evlist__add(evlist, evsels[i]);
-
-   if (perf_evsel__open(evsels[i], cpus, threads) < 0) {
-   pr_debug("failed to open counter: %s, "
-"tweak /proc/sys/kernel/perf_event_paranoid?\n",
-strerror(errno));
-   goto out_close_fd;
-   }
-   }
-
-   if (perf_evlist__mmap(evlist, 128, true) < 0) {
-   pr_debug("failed to mmap events: %d (%s)\n", errno,
-

[PATCH 03/12] perf tests: Move test__open_syscall_event_on_all_cpus into separate object

2012-11-09 Thread Jiri Olsa
Separating test__open_syscall_event_on_all_cpus test from
the builtin-test into open-syscall-all-cpus object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile  |   1 +
 tools/perf/tests/builtin-test.c  | 114 -
 tools/perf/tests/open-syscall-all-cpus.c | 120 +++
 tools/perf/tests/tests.h |   1 +
 4 files changed, 122 insertions(+), 114 deletions(-)
 create mode 100644 tools/perf/tests/open-syscall-all-cpus.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 69f582c..d413e89 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -433,6 +433,7 @@ LIB_OBJS += $(OUTPUT)tests/dso-data.o
 LIB_OBJS += $(OUTPUT)tests/attr.o
 LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall.o
+LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index b6b1e46..98e883b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -29,120 +29,6 @@
 
 #include 
 
-static int test__open_syscall_event_on_all_cpus(void)
-{
-   int err = -1, fd, cpu;
-   struct thread_map *threads;
-   struct cpu_map *cpus;
-   struct perf_evsel *evsel;
-   struct perf_event_attr attr;
-   unsigned int nr_open_calls = 111, i;
-   cpu_set_t cpu_set;
-   int id = trace_event__id("sys_enter_open");
-
-   if (id < 0) {
-   pr_debug("is debugfs mounted on /sys/kernel/debug?\n");
-   return -1;
-   }
-
-   threads = thread_map__new(-1, getpid(), UINT_MAX);
-   if (threads == NULL) {
-   pr_debug("thread_map__new\n");
-   return -1;
-   }
-
-   cpus = cpu_map__new(NULL);
-   if (cpus == NULL) {
-   pr_debug("cpu_map__new\n");
-   goto out_thread_map_delete;
-   }
-
-
-   CPU_ZERO(&cpu_set);
-
-   memset(&attr, 0, sizeof(attr));
-   attr.type = PERF_TYPE_TRACEPOINT;
-   attr.config = id;
-   evsel = perf_evsel__new(&attr, 0);
-   if (evsel == NULL) {
-   pr_debug("perf_evsel__new\n");
-   goto out_thread_map_delete;
-   }
-
-   if (perf_evsel__open(evsel, cpus, threads) < 0) {
-   pr_debug("failed to open counter: %s, "
-"tweak /proc/sys/kernel/perf_event_paranoid?\n",
-strerror(errno));
-   goto out_evsel_delete;
-   }
-
-   for (cpu = 0; cpu < cpus->nr; ++cpu) {
-   unsigned int ncalls = nr_open_calls + cpu;
-   /*
-* XXX eventually lift this restriction in a way that
-* keeps perf building on older glibc installations
-* without CPU_ALLOC. 1024 cpus in 2010 still seems
-* a reasonable upper limit tho :-)
-*/
-   if (cpus->map[cpu] >= CPU_SETSIZE) {
-   pr_debug("Ignoring CPU %d\n", cpus->map[cpu]);
-   continue;
-   }
-
-   CPU_SET(cpus->map[cpu], &cpu_set);
-   if (sched_setaffinity(0, sizeof(cpu_set), &cpu_set) < 0) {
-   pr_debug("sched_setaffinity() failed on CPU %d: %s ",
-cpus->map[cpu],
-strerror(errno));
-   goto out_close_fd;
-   }
-   for (i = 0; i < ncalls; ++i) {
-   fd = open("/etc/passwd", O_RDONLY);
-   close(fd);
-   }
-   CPU_CLR(cpus->map[cpu], &cpu_set);
-   }
-
-   /*
-* Here we need to explicitely preallocate the counts, as if
-* we use the auto allocation it will allocate just for 1 cpu,
-* as we start by cpu 0.
-*/
-   if (perf_evsel__alloc_counts(evsel, cpus->nr) < 0) {
-   pr_debug("perf_evsel__alloc_counts(ncpus=%d)\n", cpus->nr);
-   goto out_close_fd;
-   }
-
-   err = 0;
-
-   for (cpu = 0; cpu < cpus->nr; ++cpu) {
-   unsigned int expected;
-
-   if (cpus->map[cpu] >= CPU_SETSIZE)
-   continue;
-
-   if (perf_evsel__read_on_cpu(evsel, cpu, 0) < 0) {
-   pr_debug("perf_evsel__read_on_cpu\n");
-   err = -1;
-   break;
-   }
-
-   expected = nr_open_calls + cpu;
-   if (evsel->counts->cpu[cpu].val != expected) {
-   pr_debug("perf_evsel__read_on_cpu: expected to intercept %d calls on cpu %d, got %" PRIu64 "\n",
-expected, cpus->map[cpu], evsel->counts->cpu[cpu].val);
-   

[RFC] dt/platform: Use cell-index for device naming if available

2012-11-09 Thread Stepan Moskovchenko
Use the cell-index property to construct names for platform
devices, falling back on the existing scheme of using the
device register address if cell-index is not specified.

The cell-index property is a more useful device identifier,
especially in systems containing several numbered instances
of a particular hardware block, since it more easily
illustrates how devices relate to each other.

Additionally, userspace software may rely on the classic
<name>.<id> naming scheme to access device attributes in
sysfs, without having to know the physical addresses of
that device on every platform the userspace software may
support. Using cell-index for device naming allows the
device addresses to be hidden from userspace and to be
exposed by logical device number without having to rely on
auxdata to perform name overrides. This allows userspace to
make assumptions about which sysfs nodes map to which
logical instance of a specific hardware block.

Signed-off-by: Stepan Moskovchenko 
---
I had also considered using something like the linux,label property to allow
custom names for platform devices without resorting to auxdata, but the
cell-index approach seems more in line with what cell-index was intended for
and with what the pre-DT platform device naming scheme used to be. Please let
me know if you think there is a better way to accomplish this.

This is just being sent out as an RFC for now. If there are no objections, I
will send this out as an official patch, along with (or combined with) a patch
to fix up the device names in things like clock tables of any affected
platforms.
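
For illustration only, the naming rule described above can be sketched in
userspace C. This is a minimal sketch, not the kernel code: struct fake_node
and make_bus_id() are hypothetical stand-ins for struct device_node and
of_device_make_bus_id(), and the fallback format (hex register address,
then node name) is an assumption about the existing MMIO scheme.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a device tree node (not struct device_node). */
struct fake_node {
	const char *name;
	bool has_cell_index;	/* true if a cell-index property is present */
	uint32_t cell_index;
	uint64_t reg_addr;	/* first address from the "reg" property */
};

/*
 * Build a device name: prefer "<name>.<cell-index>", otherwise fall back
 * to an address-based name (assumed here to be "<addr>.<name>").
 */
static void make_bus_id(const struct fake_node *node, char *buf, size_t len)
{
	if (node->has_cell_index)
		snprintf(buf, len, "%s.%u", node->name,
			 (unsigned int)node->cell_index);
	else
		snprintf(buf, len, "%llx.%s",
			 (unsigned long long)node->reg_addr, node->name);
}
```

With this sketch, a node named "uart" with cell-index 2 is named "uart.2" on
every platform, while the same node without cell-index gets a name that
depends on where the block happens to be mapped.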

 drivers/of/platform.c |   13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 343ad29..472e374 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -77,8 +77,9 @@ void of_device_make_bus_id(struct device *dev)
static atomic_t bus_no_reg_magic;
struct device_node *node = dev->of_node;
const u32 *reg;
+   u32 cell_index;
u64 addr;
-   int magic;
+   int magic, ret;

 #ifdef CONFIG_PPC_DCR
/*
@@ -101,6 +102,16 @@ void of_device_make_bus_id(struct device *dev)
 #endif /* CONFIG_PPC_DCR */

/*
+* For devices with a specified cell-index, use the traditional
+* naming scheme of <name>.<id>
+*/
+   ret = of_property_read_u32(node, "cell-index", &cell_index);
+   if (ret == 0) {
+   dev_set_name(dev, "%s.%d", node->name, cell_index);
+   return;
+   }
+
+   /*
 * For MMIO, get the physical address
 */
reg = of_get_property(node, "reg", NULL);
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/12] perf tests: Check for mkstemp return value in dso-data test

2012-11-09 Thread Jiri Olsa
Adding check for mkstemp return error value in dso-data test.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/dso-data.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index b5198f5..5eaffa2 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -26,6 +26,10 @@ static char *test_file(int size)
unsigned char *buf;
 
fd = mkstemp(templ);
+   if (fd < 0) {
+   perror("mkstemp failed");
+   return NULL;
+   }
 
buf = malloc(size);
if (!buf) {
-- 
1.7.11.7
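
For illustration only, the same check can be shown as a standalone helper.
This is a minimal sketch assuming a POSIX system; create_temp_file() is a
hypothetical name and is not part of the dso-data test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/*
 * Create a unique temporary file from a template that must end in
 * "XXXXXX". Returns an open file descriptor, or -1 if mkstemp() fails,
 * for example because the template is malformed or the directory does
 * not exist.
 */
static int create_temp_file(char *path_template)
{
	int fd = mkstemp(path_template);

	if (fd < 0) {
		perror("mkstemp failed");
		return -1;
	}

	return fd;
}
```

Without the check, a failed mkstemp() would hand -1 to later write() and
close() calls, and the caller would go on to use a file that was never
created.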



Re: [PATCH RFT RESEND linux-next] sparc: dma-mapping: support debug_dma_mapping_error

2012-11-09 Thread Shuah Khan
On Fri, 2012-11-09 at 19:38 -0500, David Miller wrote:
> From: Shuah Khan 
> Date: Fri, 26 Oct 2012 10:13:09 -0600
> 
> > Add support for debug_dma_mapping_error() call to avoid warning from
> > debug_dma_unmap() interface when it checks for mapping error checked
> > status. Without this patch, device driver failed to check map error
> > warning is generated.
> > 
> > Signed-off-by: Shuah Khan 
> 
> This doesn't even compile:
> 
> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h: In function 
> 'dma_mapping_error':
> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h:62:2: error: 
> implicit declaration of function 'debug_dma
> _mapping_error' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> In file included from include/linux/dma-mapping.h:76:0,
>  from include/linux/skbuff.h:33,
>  from include/linux/icmpv6.h:4,
>  from include/linux/ipv6.h:58,
>  from include/net/ipv6.h:16,
>  from include/linux/sunrpc/clnt.h:26,
>  from include/linux/nfs_fs.h:30,
>  from init/do_mounts.c:30:

The problem is that the dependent patch is in linux-next and not in the arch
trees yet. That is why I am trying to get this through linux-next and/or the
dma-debug tree. I am open to suggestions on what is the best route.

linux-next commit-id is:

commit 6c9c6d6301287e369a754d628230fa6e50cdb74b

I have another one that I sent to linux-next that fixes the dependency
on get_dma_ops():

https://lkml.org/lkml/2012/11/3/219

Thanks,
-- Shuah


Re: [PATCH RFT RESEND linux-next] sparc: dma-mapping: support debug_dma_mapping_error

2012-11-09 Thread David Miller
From: Shuah Khan 
Date: Fri, 09 Nov 2012 17:52:23 -0700

> On Fri, 2012-11-09 at 19:38 -0500, David Miller wrote:
>> From: Shuah Khan 
>> Date: Fri, 26 Oct 2012 10:13:09 -0600
>> 
>> > Add support for debug_dma_mapping_error() call to avoid warning from
>> > debug_dma_unmap() interface when it checks for mapping error checked
>> > status. Without this patch, device driver failed to check map error
>> > warning is generated.
>> > 
>> > Signed-off-by: Shuah Khan 
>> 
>> This doesn't even compile:
>> 
>> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h: In function 
>> 'dma_mapping_error':
>> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h:62:2: error: 
>> implicit declaration of function 'debug_dma
>> _mapping_error' [-Werror=implicit-function-declaration]
>> cc1: some warnings being treated as errors
>> In file included from include/linux/dma-mapping.h:76:0,
>>  from include/linux/skbuff.h:33,
>>  from include/linux/icmpv6.h:4,
>>  from include/linux/ipv6.h:58,
>>  from include/net/ipv6.h:16,
>>  from include/linux/sunrpc/clnt.h:26,
>>  from include/linux/nfs_fs.h:30,
>>  from init/do_mounts.c:30:
> 
> The problem is that the dependent patch is in linux-next and not in the arch
> trees yet. That is why I am trying to get this through linux-next and/or the
> dma-debug tree. I am open to suggestions on what is the best route.

Then please put this in via the dma-debug tree:

Acked-by: David S. Miller 


Re: [PATCH RESEND v2 1/1] percpu_rw_semaphore: reimplement to not block the readers unnecessarily

2012-11-09 Thread Paul E. McKenney
On Fri, Nov 09, 2012 at 07:10:48PM +0100, Oleg Nesterov wrote:
> On 11/09, Paul E. McKenney wrote:
> >
> > On Fri, Nov 09, 2012 at 04:46:56PM +0100, Oleg Nesterov wrote:
> > > Contrary, I am going to try to add some complications later, so that
> > > it can have more users. In particular, I think it can replace
> > > get_online_cpus/cpu_hotplug_begin, just we need
> > > percpu_down_write_but_dont_deadlock_with_recursive_readers().
> >
> > I must confess that I am a bit concerned about possible scalability
> > bottlenecks in the current get_online_cpus(), so +1 from me on this one.
> 
> OK, thanks...
> 
> And btw percpu_down_write_but_dont_deadlock_with_recursive_readers() is
> trivial, just it needs down_write(rw_sem) "inside" wait_event(), not
> before. But I'm afraid I will never manage to write the comments ;)
> 
>   static bool xxx(brw)
>   {
>   down_write(&brw->rw_sem);

down_write_trylock()

As you noted in your later email.  Presumably you return false if
the attempt to acquire it fails.

>   if (!atomic_read(&brw->slow_read_ctr))
>   return true;
> 
>   up_write(&brw->rw_sem);
>   return false;
>   }
> 
>   static void __percpu_down_write(struct percpu_rw_semaphore *brw, bool recursive_readers)
>   {
>   mutex_lock(&brw->writer_mutex);
> 
>   synchronize_sched();
> 
>   atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
> 
>   if (recursive_readers)  {
>   wait_event(brw->write_waitq, xxx(brw));

I see what you mean about acquiring brw->rw_sem inside of wait_event().

Cute trick!

The "recursive_readers" is a global initialization-time thing, right?

>   } else {
>   down_write(&brw->rw_sem);
> 
>   wait_event(brw->write_waitq, !atomic_read(&brw->slow_read_ctr));
>   }
>   }

Looks like it should work, and would perform and scale nicely even
if we end up having to greatly increase the number of calls to
get_online_cpus().

> Of course, cpu.c still needs .active_writer to allow get_online_cpus()
> under cpu_hotplug_begin(), but this is simple.

Yep, same check as now.

> But first we should do other changes, I think. IMHO we should not do
> synchronize_sched() under mutex_lock() and this will add (a bit) more
> complications. We will see.

Indeed, that does put considerable delay on the writers.  There is always
synchronize_sched_expedited(), I suppose.

Thanx, Paul
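
For illustration only, the "trylock inside the wait condition" idea discussed
above can be sketched in userspace C11. This is a minimal sketch under stated
assumptions, not the kernel implementation: struct brw_sketch,
try_down_write(), up_write_sketch() and writer_may_proceed() are hypothetical
names, and an atomic flag stands in for rw_sem held for writing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of percpu_rw_semaphore. */
struct brw_sketch {
	atomic_bool write_locked;	/* stand-in for rw_sem held for write */
	atomic_int slow_read_ctr;	/* readers that took the slow path */
};

/* Non-blocking acquire; returns true only if the lock was free. */
static bool try_down_write(struct brw_sketch *brw)
{
	return !atomic_exchange(&brw->write_locked, true);
}

static void up_write_sketch(struct brw_sketch *brw)
{
	atomic_store(&brw->write_locked, false);
}

/*
 * Condition a waiting writer would re-evaluate on every wakeup.
 * Returns true with the write lock held only when no slow-path readers
 * remain. Otherwise the lock is dropped again, so recursive readers are
 * never blocked behind a writer that is itself still waiting for them.
 */
static bool writer_may_proceed(struct brw_sketch *brw)
{
	if (!try_down_write(brw))
		return false;

	if (atomic_load(&brw->slow_read_ctr) == 0)
		return true;

	up_write_sketch(brw);
	return false;
}
```

The point of the sketch is the ordering: the lock is only kept when the
reader count is already zero, which is what lets the condition be evaluated
repeatedly without ever sleeping while holding the lock.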



[PATCH 1/4] perf, amd: Simplify northbridge event constraints handler

2012-11-09 Thread Jacob Shin
From: Robert Richter 

Code simplification, there is no functional change.

Signed-off-by: Robert Richter 
Signed-off-by: Jacob Shin 
---
 arch/x86/kernel/cpu/perf_event_amd.c |   68 +-
 1 file changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c 
b/arch/x86/kernel/cpu/perf_event_amd.c
index 4528ae7..d60c5c7 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -256,9 +256,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, 
struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
-   struct perf_event *old = NULL;
-   int max = x86_pmu.num_counters;
-   int i, j, k = -1;
+   struct perf_event *old;
+   int idx, new = -1;
 
/*
 * if not NB event or no NB, then no constraints
@@ -276,48 +275,33 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, 
struct perf_event *event)
 * because of successive calls to x86_schedule_events() from
 * hw_perf_group_sched_in() without hw_perf_enable()
 */
-   for (i = 0; i < max; i++) {
-   /*
-* keep track of first free slot
-*/
-   if (k == -1 && !nb->owners[i])
-   k = i;
+   for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+   if (new == -1 || hwc->idx == idx)
+   /* assign free slot, prefer hwc->idx */
+   old = cmpxchg(nb->owners + idx, NULL, event);
+   else if (nb->owners[idx] == event)
+   /* event already present */
+   old = event;
+   else
+   continue;
+
+   if (old && old != event)
+   continue;
+
+   /* reassign to this slot */
+   if (new != -1)
+   cmpxchg(nb->owners + new, event, NULL);
+   new = idx;
 
/* already present, reuse */
-   if (nb->owners[i] == event)
-   goto done;
-   }
-   /*
-* not present, so grab a new slot
-* starting either at:
-*/
-   if (hwc->idx != -1) {
-   /* previous assignment */
-   i = hwc->idx;
-   } else if (k != -1) {
-   /* start from free slot found */
-   i = k;
-   } else {
-   /*
-* event not found, no slot found in
-* first pass, try again from the
-* beginning
-*/
-   i = 0;
-   }
-   j = i;
-   do {
-   old = cmpxchg(nb->owners+i, NULL, event);
-   if (!old)
+   if (old == event)
break;
-   if (++i == max)
-   i = 0;
-   } while (i != j);
-done:
-   if (!old)
-   return &nb->event_constraints[i];
-
-   return &emptyconstraint;
+   }
+
+   if (new == -1)
+   return &emptyconstraint;
+
+   return &nb->event_constraints[new];
 }
 
 static struct amd_nb *amd_alloc_nb(int cpu)
-- 
1.7.9.5
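
For illustration only, the simplified single-pass slot assignment can be
sketched in userspace C11. This is a minimal sketch under stated assumptions:
struct nb_event, owners[], NUM_SLOTS, cmpxchg_slot() and claim_slot() are
hypothetical stand-ins, and atomic_compare_exchange_strong() is used to mimic
the kernel's cmpxchg() return convention.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NUM_SLOTS 4	/* assumed number of shared counters */

/* Hypothetical event: idx is the previously assigned slot, or -1. */
struct nb_event {
	int idx;
};

static _Atomic(struct nb_event *) owners[NUM_SLOTS];

/* Mimic cmpxchg(): store desired if *slot == expected, return old value. */
static struct nb_event *cmpxchg_slot(_Atomic(struct nb_event *) *slot,
				     struct nb_event *expected,
				     struct nb_event *desired)
{
	atomic_compare_exchange_strong(slot, &expected, desired);
	return expected;
}

/* Returns the slot owned by ev after the call, or -1 if none is free. */
static int claim_slot(struct nb_event *ev)
{
	struct nb_event *old;
	int idx, new = -1;

	for (idx = 0; idx < NUM_SLOTS; idx++) {
		if (new == -1 || ev->idx == idx)
			/* try to take a free slot, preferring ev->idx */
			old = cmpxchg_slot(&owners[idx], NULL, ev);
		else if (atomic_load(&owners[idx]) == ev)
			/* event already present in this slot */
			old = ev;
		else
			continue;

		if (old && old != ev)
			continue;	/* slot belongs to another event */

		/* move to this slot, releasing any slot taken earlier */
		if (new != -1)
			cmpxchg_slot(&owners[new], ev, NULL);
		new = idx;

		if (old == ev)
			break;		/* already present, reuse */
	}

	return new;
}
```

The loop takes the first free slot it sees, but keeps scanning so that a
later slot matching the preferred index (or one the event already owns) can
replace it, with the provisional slot handed back.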




[PATCH 0/4] perf, amd: Enable AMD family 15h northbridge counters

2012-11-09 Thread Jacob Shin
The following patchset enables 4 additional performance counters in
AMD family 15h processors that count northbridge events -- such as
DRAM accesses.

This patchset is based on previous work done by Robert Richter:

https://lkml.org/lkml/2012/6/19/324

The main differences are:

- The northbridge counters are indexed contiguously right above the
  core performance counters.

- MSR address offset calculations are moved to architecture specific
  files.

- Interrupts are set up to be delivered only to a single core.
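
For illustration only, the contiguous indexing from the first point can be
sketched as a mapping from a logical counter index to a control MSR address.
This is a minimal sketch under stated assumptions: the base addresses,
counter counts and stride of 2 are the ones given in patches 3/4 and 4/4,
while ctl_msr_for_index() and the macro names are hypothetical.

```c
#include <stdint.h>

#define CORE_CTL_BASE	0xc0010200u	/* first core control MSR */
#define NB_CTL_BASE	0xc0010240u	/* first northbridge control MSR */
#define NUM_CORE	6		/* core counters on family 15h */
#define NUM_NB		4		/* northbridge counters */

/*
 * Logical indexes 0..5 are core counters and 6..9 are northbridge
 * counters. Control and counter MSRs alternate, hence the stride of 2.
 * Returns 0 for an index outside the valid range.
 */
static uint32_t ctl_msr_for_index(int index)
{
	if (index < 0 || index >= NUM_CORE + NUM_NB)
		return 0;

	if (index < NUM_CORE)
		return CORE_CTL_BASE + ((uint32_t)index << 1);

	return NB_CTL_BASE + ((uint32_t)(index - NUM_CORE) << 1);
}
```

In this sketch index 5 is the last core counter and index 6 is the first
northbridge counter, even though their MSR addresses are not adjacent.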

Jacob Shin (3):
  perf, amd: Refactor northbridge event constraints handler for code
sharing
  perf, x86: Move MSR address offset calculation to architecture
specific files
  perf, amd: Enable northbridge performance counters on AMD family 15h

Robert Richter (1):
  perf, amd: Simplify northbridge event constraints handler

 arch/x86/include/asm/cpufeature.h|2 +
 arch/x86/include/asm/msr-index.h |2 +
 arch/x86/include/asm/perf_event.h|6 +
 arch/x86/kernel/cpu/perf_event.h |   21 +--
 arch/x86/kernel/cpu/perf_event_amd.c |  279 +++---
 5 files changed, 207 insertions(+), 103 deletions(-)

-- 
1.7.9.5




[PATCH 3/4] perf, x86: Move MSR address offset calculation to architecture specific files

2012-11-09 Thread Jacob Shin
Move counter index to MSR address offset calculation to architecture
specific files. This prepares the way for perf_event_amd to enable
counter addresses that are not contiguous -- for example AMD Family
15h processors have 6 core performance counters starting at 0xc0010200
and 4 northbridge performance counters starting at 0xc0010240.

Signed-off-by: Jacob Shin 
---
 arch/x86/kernel/cpu/perf_event.h |   21 +---
 arch/x86/kernel/cpu/perf_event_amd.c |   36 ++
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 271d257..aacf025 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -325,6 +325,7 @@ struct x86_pmu {
int (*schedule_events)(struct cpu_hw_events *cpuc, int n, 
int *assign);
unsignedeventsel;
unsignedperfctr;
+   int (*addr_offset)(int index);
u64 (*event_map)(int);
int max_events;
int num_counters;
@@ -444,28 +445,16 @@ extern u64 __read_mostly hw_cache_extra_regs
 
 u64 x86_perf_event_update(struct perf_event *event);
 
-static inline int x86_pmu_addr_offset(int index)
-{
-   int offset;
-
-   /* offset = X86_FEATURE_PERFCTR_CORE ? index << 1 : index */
-   alternative_io(ASM_NOP2,
-  "shll $1, %%eax",
-  X86_FEATURE_PERFCTR_CORE,
-  "=a" (offset),
-  "a"  (index));
-
-   return offset;
-}
-
 static inline unsigned int x86_pmu_config_addr(int index)
 {
-   return x86_pmu.eventsel + x86_pmu_addr_offset(index);
+   return x86_pmu.eventsel +
+   (x86_pmu.addr_offset ? x86_pmu.addr_offset(index) : index);
 }
 
 static inline unsigned int x86_pmu_event_addr(int index)
 {
-   return x86_pmu.perfctr + x86_pmu_addr_offset(index);
+   return x86_pmu.perfctr +
+   (x86_pmu.addr_offset ? x86_pmu.addr_offset(index) : index);
 }
 
 int x86_setup_perfctr(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c 
b/arch/x86/kernel/cpu/perf_event_amd.c
index d17debd..078beb5 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,41 @@ static u64 amd_pmu_event_map(int hw_event)
return amd_perfmon_event_map[hw_event];
 }
 
+/*
+ * Previously calculated offsets
+ */
+static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
+
+/*
+ * Legacy CPUs:
+ *   4 counters starting at 0xc001 each offset by 1
+ *
+ * CPUs with core performance counter extensions:
+ *   6 counters starting at 0xc0010200 each offset by 2
+ */
+static inline int amd_pmu_addr_offset(int index)
+{
+   int offset;
+
+   if (!index)
+   return index;
+
+   offset = addr_offsets[index];
+
+   if (offset)
+   return offset;
+
+   if (!cpu_has_perfctr_core) {
+   offset = index;
+   } else {
+   offset = index << 1;
+   }
+
+   addr_offsets[index] = offset;
+
+   return offset;
+}
+
 static int amd_pmu_hw_config(struct perf_event *event)
 {
int ret;
@@ -570,6 +605,7 @@ static __initconst const struct x86_pmu amd_pmu = {
.schedule_events= x86_schedule_events,
.eventsel   = MSR_K7_EVNTSEL0,
.perfctr= MSR_K7_PERFCTR0,
+   .addr_offset= amd_pmu_addr_offset,
.event_map  = amd_pmu_event_map,
.max_events = ARRAY_SIZE(amd_perfmon_event_map),
.num_counters   = AMD64_NUM_COUNTERS,
-- 
1.7.9.5
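
For illustration only, the optional-callback pattern this patch introduces
can be sketched in userspace C. This is a minimal sketch under stated
assumptions: struct pmu_sketch, has_perfctr_core and the helper names are
hypothetical stand-ins for struct x86_pmu, cpu_has_perfctr_core and the
kernel helpers, and the caching of computed offsets is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct x86_pmu. */
struct pmu_sketch {
	uint32_t eventsel;		/* base of the control MSRs */
	uint32_t perfctr;		/* base of the counter MSRs */
	int (*addr_offset)(int index);	/* optional, may be NULL */
};

/* Stand-in for cpu_has_perfctr_core; set by the caller in this sketch. */
static bool has_perfctr_core;

/* Architecture-specific hook: stride 2 with core extensions, else 1. */
static int amd_addr_offset_sketch(int index)
{
	return has_perfctr_core ? index << 1 : index;
}

/* Generic code falls back to a stride of 1 when no hook is installed. */
static uint32_t config_addr(const struct pmu_sketch *pmu, int index)
{
	return pmu->eventsel +
		(pmu->addr_offset ? pmu->addr_offset(index) : index);
}

static uint32_t event_addr(const struct pmu_sketch *pmu, int index)
{
	return pmu->perfctr +
		(pmu->addr_offset ? pmu->addr_offset(index) : index);
}
```

Keeping the fallback in the generic helpers means a PMU description that
never sets the hook keeps the old contiguous layout without any changes.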




[PATCH 2/4] perf, amd: Refactor northbridge event constraints handler for code sharing

2012-11-09 Thread Jacob Shin
Break out and generalize the family 10h northbridge event constraints code
so that later we can reuse the same code path with other AMD processor
families that have the same northbridge event constraints.

Based on previous patch by Robert Richter 

Signed-off-by: Jacob Shin 
Signed-off-by: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event_amd.c |   43 --
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c 
b/arch/x86/kernel/cpu/perf_event_amd.c
index d60c5c7..d17debd 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -188,20 +188,13 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
return nb && nb->nb_id != -1;
 }
 
-static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
- struct perf_event *event)
+static void __amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
+  struct perf_event *event)
 {
-   struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
int i;
 
/*
-* only care about NB events
-*/
-   if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-   return;
-
-   /*
 * need to scan whole list because event may not have
 * been assigned during scheduling
 *
@@ -247,12 +240,13 @@ static void amd_put_event_constraints(struct 
cpu_hw_events *cpuc,
   *
   * Given that resources are allocated (cmpxchg), they must be
   * eventually freed for others to use. This is accomplished by
-  * calling amd_put_event_constraints().
+  * calling __amd_put_nb_event_constraints()
   *
   * Non NB events are not impacted by this restriction.
   */
 static struct event_constraint *
-amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event 
*event,
+  struct event_constraint *c)
 {
struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
@@ -260,12 +254,6 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, 
struct perf_event *event)
int idx, new = -1;
 
/*
-* if not NB event or no NB, then no constraints
-*/
-   if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
-   return &unconstrained;
-
-   /*
 * detect if already present, if so reuse
 *
 * cannot merge with actual allocation
@@ -275,7 +263,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, 
struct perf_event *event)
 * because of successive calls to x86_schedule_events() from
 * hw_perf_group_sched_in() without hw_perf_enable()
 */
-   for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+   for_each_set_bit(idx, c->idxmsk, X86_PMC_IDX_MAX) {
if (new == -1 || hwc->idx == idx)
/* assign free slot, prefer hwc->idx */
old = cmpxchg(nb->owners + idx, NULL, event);
@@ -391,6 +379,25 @@ static void amd_pmu_cpu_dead(int cpu)
}
 }
 
+static struct event_constraint *
+amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+   /*
+* if not NB event or no NB, then no constraints
+*/
+   if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+   return &unconstrained;
+
+   return __amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+}
+
+static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
+ struct perf_event *event)
+{
+   if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
+   __amd_put_nb_event_constraints(cpuc, event);
+}
+
 PMU_FORMAT_ATTR(event, "config:0-7,32-35");
 PMU_FORMAT_ATTR(umask, "config:8-15"   );
 PMU_FORMAT_ATTR(edge,  "config:18" );
-- 
1.7.9.5




[PATCH 4/4] perf, amd: Enable northbridge performance counters on AMD family 15h

2012-11-09 Thread Jacob Shin
On AMD family 15h processors, there are 4 new performance counters
(in addition to 6 core performance counters) that can be used for
counting northbridge events (e.g. DRAM accesses). Their bit fields are
almost identical to the core performance counters. However, the same
set of MSRs is shared between multiple cores (that share the same
northbridge). We will reuse the same code path as the existing family 10h
northbridge event constraints handler logic to enforce sharing.

Based on previous patch by Robert Richter 

Signed-off-by: Jacob Shin 
Signed-off-by: Robert Richter 
---
 arch/x86/include/asm/cpufeature.h|2 +
 arch/x86/include/asm/msr-index.h |2 +
 arch/x86/include/asm/perf_event.h|6 ++
 arch/x86/kernel/cpu/perf_event_amd.c |  142 ++
 4 files changed, 120 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 8c297aa..17f75b8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -167,6 +167,7 @@
 #define X86_FEATURE_TBM(6*32+21) /* trailing bit manipulations 
*/
 #define X86_FEATURE_TOPOEXT(6*32+22) /* topology extensions CPUID leafs */
 #define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter 
extensions */
+#define X86_FEATURE_PERFCTR_NB (6*32+24) /* nb performance counter extensions 
*/
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
@@ -308,6 +309,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq  boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core   boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb boot_cpu_has(X86_FEATURE_PERFCTR_NB)
 #define cpu_has_cx8boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16   boot_cpu_has(X86_FEATURE_CX16)
 #define cpu_has_eager_fpu  boot_cpu_has(X86_FEATURE_EAGER_FPU)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..e67ff1e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,8 @@
 /* Fam 15h MSRs */
 #define MSR_F15H_PERF_CTL  0xc0010200
 #define MSR_F15H_PERF_CTR  0xc0010201
+#define MSR_F15H_NB_PERF_CTL   0xc0010240
+#define MSR_F15H_NB_PERF_CTR   0xc0010241
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE  0xc0010058
diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_event.h
index 4fabcdf..75e039c 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,8 @@
 #define ARCH_PERFMON_EVENTSEL_INV  (1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK            0xFF000000ULL
 
+#define AMD_PERFMON_EVENTSEL_INT_CORE_ENABLE   (1ULL << 36)
+#define AMD_PERFMON_EVENTSEL_INT_CORE_SEL_MASK (0x0FULL << 37)
 #define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY  (1ULL << 41)
 
@@ -46,8 +48,12 @@
 #define AMD64_RAW_EVENT_MASK           \
        (X86_RAW_EVENT_MASK          |  \
         AMD64_EVENTSEL_EVENT)
+#define AMD64_NB_EVENT_MASK            \
+       (AMD64_EVENTSEL_EVENT        |  \
+        ARCH_PERFMON_EVENTSEL_UMASK)
 #define AMD64_NUM_COUNTERS             4
 #define AMD64_NUM_COUNTERS_CORE        6
+#define AMD64_NUM_COUNTERS_NB          4
 
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL  0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK        (0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 078beb5..adf4026 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -143,10 +143,15 @@ static unsigned int addr_offsets[X86_PMC_IDX_MAX] __read_mostly;
  *
  * CPUs with core performance counter extensions:
  *   6 counters starting at 0xc0010200 each offset by 2
+ *
+ * CPUs with north bridge performance counter extensions:
+ *   4 additional counters starting at 0xc0010240 each offset by 2
+ *   (indexed right above either one of the above core counters)
  */
 static inline int amd_pmu_addr_offset(int index)
 {
int offset;
+   int ncore;
 
if (!index)
return index;
@@ -158,8 +163,17 @@ static inline int amd_pmu_addr_offset(int index)
 
if (!cpu_has_perfctr_core) {
offset = index;
+   ncore = AMD64_NUM_COUNTERS;
} else {
offset = index << 1;
+   ncore = AMD64_NUM_COUNTERS_CORE;
+   }
+
+   /* find offset of NB counters with respect to x86_pmu.eventsel */
+   if (cpu_has_perfctr_nb) {
+   if (index >= ncore && index < (ncore + AMD64_NUM_COUNTERS_NB))
+   

[PATCH 05/12] perf tests: Move test__PERF_RECORD into separate object

2012-11-09 Thread Jiri Olsa
Separating test__PERF_RECORD test from the builtin-test
into perf-record object.

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile |   1 +
 tools/perf/tests/builtin-test.c | 307 ---
 tools/perf/tests/perf-record.c  | 314 
 tools/perf/tests/tests.h|   1 +
 4 files changed, 316 insertions(+), 307 deletions(-)
 create mode 100644 tools/perf/tests/perf-record.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 337489e..a2d6153 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -435,6 +435,7 @@ LIB_OBJS += $(OUTPUT)tests/vmlinux-kallsyms.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall.o
 LIB_OBJS += $(OUTPUT)tests/open-syscall-all-cpus.o
 LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
+LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/util.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 609f592..7cb3928 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -31,313 +31,6 @@
 
 
 
-static int sched__get_first_possible_cpu(pid_t pid, cpu_set_t *maskp)
-{
-   int i, cpu = -1, nrcpus = 1024;
-realloc:
-   CPU_ZERO(maskp);
-
-   if (sched_getaffinity(pid, sizeof(*maskp), maskp) == -1) {
-   if (errno == EINVAL && nrcpus < (1024 << 8)) {
-   nrcpus = nrcpus << 2;
-   goto realloc;
-   }
-   perror("sched_getaffinity");
-   return -1;
-   }
-
-   for (i = 0; i < nrcpus; i++) {
-   if (CPU_ISSET(i, maskp)) {
-   if (cpu == -1)
-   cpu = i;
-   else
-   CPU_CLR(i, maskp);
-   }
-   }
-
-   return cpu;
-}
-
-static int test__PERF_RECORD(void)
-{
-   struct perf_record_opts opts = {
-   .target = {
-   .uid = UINT_MAX,
-   .uses_mmap = true,
-   },
-   .no_delay   = true,
-   .freq   = 10,
-   .mmap_pages = 256,
-   };
-   cpu_set_t cpu_mask;
-   size_t cpu_mask_size = sizeof(cpu_mask);
-   struct perf_evlist *evlist = perf_evlist__new(NULL, NULL);
-   struct perf_evsel *evsel;
-   struct perf_sample sample;
-   const char *cmd = "sleep";
-   const char *argv[] = { cmd, "1", NULL, };
-   char *bname;
-   u64 prev_time = 0;
-   bool found_cmd_mmap = false,
-found_libc_mmap = false,
-found_vdso_mmap = false,
-found_ld_mmap = false;
-   int err = -1, errs = 0, i, wakeups = 0;
-   u32 cpu;
-   int total_events = 0, nr_events[PERF_RECORD_MAX] = { 0, };
-
-   if (evlist == NULL || argv == NULL) {
-   pr_debug("Not enough memory to create evlist\n");
-   goto out;
-   }
-
-   /*
-* We need at least one evsel in the evlist, use the default
-* one: "cycles".
-*/
-   err = perf_evlist__add_default(evlist);
-   if (err < 0) {
-   pr_debug("Not enough memory to create evsel\n");
-   goto out_delete_evlist;
-   }
-
-   /*
-* Create maps of threads and cpus to monitor. In this case
-* we start with all threads and cpus (-1, -1) but then in
-* perf_evlist__prepare_workload we'll fill in the only thread
-* we're monitoring, the one forked there.
-*/
-   err = perf_evlist__create_maps(evlist, &opts.target);
-   if (err < 0) {
-   pr_debug("Not enough memory to create thread/cpu maps\n");
-   goto out_delete_evlist;
-   }
-
-   /*
-* Prepare the workload in argv[] to run, it'll fork it, and then wait
-* for perf_evlist__start_workload() to exec it. This is done this way
-* so that we have time to open the evlist (calling sys_perf_event_open
-* on all the fds) and then mmap them.
-*/
-   err = perf_evlist__prepare_workload(evlist, &opts, argv);
-   if (err < 0) {
-   pr_debug("Couldn't run the workload!\n");
-   goto out_delete_evlist;
-   }
-
-   /*
-* Config the evsels, setting attr->comm on the first one, etc.
-*/
-   evsel = perf_evlist__first(evlist);
-   evsel->attr.sample_type |= PERF_SAMPLE_CPU;
-   evsel->attr.sample_type |= PERF_SAMPLE_TID;
-   evsel->attr.sample_type |= PERF_SAMPLE_TIME;
-   perf_evlist__config_attrs(evlist, &opts);
-
-   err = sched__get_first_possible_cpu(evlist->workload.pid, &cpu_mask);
-   if (err < 0) {
-   pr_debug("sched__get_first_possible_cpu: %s\n", strerror(errno));
-   goto out_delete_evlist;
- 

Re: [PATCH RFT RESEND linux-next] sparc: dma-mapping: support debug_dma_mapping_error

2012-11-09 Thread Shuah Khan
On Fri, 2012-11-09 at 19:54 -0500, David Miller wrote:
> From: Shuah Khan 
> Date: Fri, 09 Nov 2012 17:52:23 -0700
> 
> > On Fri, 2012-11-09 at 19:38 -0500, David Miller wrote:
> >> From: Shuah Khan 
> >> Date: Fri, 26 Oct 2012 10:13:09 -0600
> >> 
> >> > Add support for the debug_dma_mapping_error() call to avoid a warning from
> >> > the debug_dma_unmap() interface when it checks the mapping-error-checked
> >> > status. Without this patch, a "device driver failed to check map error"
> >> > warning is generated.
> >> > 
> >> > Signed-off-by: Shuah Khan 
> >> 
> >> This doesn't even compile:
> >> 
> >> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h: In function 'dma_mapping_error':
> >> /home/davem/src/GIT/sparc/arch/sparc/include/asm/dma-mapping.h:62:2: error: implicit declaration of function 'debug_dma_mapping_error' [-Werror=implicit-function-declaration]
> >> cc1: some warnings being treated as errors
> >> In file included from include/linux/dma-mapping.h:76:0,
> >>  from include/linux/skbuff.h:33,
> >>  from include/linux/icmpv6.h:4,
> >>  from include/linux/ipv6.h:58,
> >>  from include/net/ipv6.h:16,
> >>  from include/linux/sunrpc/clnt.h:26,
> >>  from include/linux/nfs_fs.h:30,
> >>  from init/do_mounts.c:30:
> > 
> > The problem is that the dependent patch is in linux-next and not in the
> > arch trees yet. That is why I am trying to get this through linux-next
> > and/or the dma-debug tree. I am open to suggestions on the best route.
> 
> Then please put this in via the dma-debug tree:
> 
> Acked-by: David S. Miller 

Thanks
-- Shuah


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] pstore/ram: no timekeeping calls when unavailable

2012-11-09 Thread Kees Cook
On Fri, Nov 9, 2012 at 4:56 PM, John Stultz  wrote:
> On 11/05/2012 02:00 PM, Kees Cook wrote:
>>
>> We must not call timekeeping functions unless they are available. If we
>> dump before they have resumed, avoid a WARN_ON by setting the timestamp
>> to 0.
>>
>> Since the "ram" pstore driver can be a module, we must have
>> timekeeping_suspended exported.
>>
>> Reported-by: Doug Anderson 
>> Cc: Anton Vorontsov 
>> Cc: John Stultz 
>> Signed-off-by: Kees Cook 
>> ---
>> v2:
>>   - export needed for timekeeping_suspended (thanks to Fengguang Wu).
>> ---
>>   fs/pstore/ram.c   |8 +++-
>>   kernel/time/timekeeping.c |1 +
>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> index 1a4f6da..6d014e0 100644
>> --- a/fs/pstore/ram.c
>> +++ b/fs/pstore/ram.c
>> @@ -171,7 +171,13 @@ static size_t ramoops_write_kmsg_hdr(struct persistent_ram_zone *prz)
>> struct timeval timestamp;
>> size_t len;
>>
>> -   do_gettimeofday(&timestamp);
>> +   /* Handle dumping before timekeeping has resumed. */
>> +   if (unlikely(timekeeping_suspended)) {
>> +   timestamp.tv_sec = 0;
>> +   timestamp.tv_usec = 0;
>> +   } else
>> +   do_gettimeofday(&timestamp);
>> +
>
> Would nulling out the timestamp be better done in do_gettimeofday()?  That
> way we don't have to export timekeeping internals and users would get
> something more sane for this corner case.

Well... I'm not sure. If we don't want to expose the
timekeeping_suspended variable, maybe we need a function to check
this? I think it's probably better to find the users of timekeeping
that could call it when suspended. That's why I figured the BUG was
there. Very few things should be attempting to call gettimeofday
in a place where it might be suspended. As such, it seems like those
things should be able to determine how to handle it. Getting back 0s
might not be sensible for every caller.

In this particular case, I'm fine with removing the BUG and returning
0 instead, since that's fine for ramoops. :)
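
A minimal userspace sketch of the "function to check this" idea, so that a caller gets a zero timestamp without touching timekeeping internals directly. Everything named here (`struct ts_pair`, `fake_timekeeping_suspended`, `fake_gettimeofday()`, `safe_timestamp()`) is a made-up stand-in for illustration and not a kernel symbol:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct timeval; not the kernel type. */
struct ts_pair {
	long tv_sec;
	long tv_usec;
};

/* Hypothetical stand-in for the kernel's timekeeping_suspended flag. */
static bool fake_timekeeping_suspended;

/* Hypothetical clock source; returns a fixed value so the sketch is deterministic. */
static void fake_gettimeofday(struct ts_pair *tv)
{
	tv->tv_sec = 1352505600;
	tv->tv_usec = 250000;
}

/* Report a zero timestamp while timekeeping is suspended, otherwise read the clock. */
static void safe_timestamp(struct ts_pair *tv)
{
	if (fake_timekeeping_suspended) {
		tv->tv_sec = 0;
		tv->tv_usec = 0;
		return;
	}
	fake_gettimeofday(tv);
}
```

Whether such a wrapper belongs next to do_gettimeofday() or in each caller is exactly the open question in this thread; the sketch only shows the caller-visible behaviour.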

-Kees

-- 
Kees Cook
Chrome OS Security


[PATCH 07/32] perf, x86: Support Haswell v4 LBR format

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Haswell has two additional LBR "from" flags for TSX: intx and abort, implemented
as a new v4 version of the LBR format.

Handle those and adjust the sign extension code so it still extends correctly.
The flags are exported in the LBR record similarly to the existing misprediction
flag.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   18 +++---
 include/linux/perf_event.h |7 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
LBR_FORMAT_LIP  = 0x01,
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
+   LBR_FORMAT_EIP_FLAGS2   = 0x04,
 };
 
 /*
@@ -56,6 +57,8 @@ enum {
 LBR_FAR)
 
 #define LBR_FROM_FLAG_MISPRED  (1ULL << 63)
+#define LBR_FROM_FLAG_INTX (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT(1ULL << 61)
 
 #define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 
for (i = 0; i < x86_pmu.lbr_nr; i++) {
unsigned long lbr_idx = (tos - i) & mask;
-   u64 from, to, mis = 0, pred = 0;
+   u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;
 
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to   + lbr_idx, to);
 
-   if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+   lbr_format == LBR_FORMAT_EIP_FLAGS2) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
-   from = (u64)((((s64)from) << 1) >> 1);
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+   from = (u64)((((s64)from) << 1) >> 1);
+   else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+   intx = !!(from & LBR_FROM_FLAG_INTX);
+   abort = !!(from & LBR_FROM_FLAG_ABORT);
+   from = (u64)((((s64)from) << 3) >> 3);
+   }
}
 
cpuc->lbr_entries[i].from   = from;
cpuc->lbr_entries[i].to = to;
cpuc->lbr_entries[i].mispred= mis;
cpuc->lbr_entries[i].predicted  = pred;
+   cpuc->lbr_entries[i].intx   = intx;
+   cpuc->lbr_entries[i].abort  = abort;
cpuc->lbr_entries[i].reserved   = 0;
}
cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..91052e1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -74,13 +74,18 @@ struct perf_raw_record {
  *
  * support for mispred, predicted is optional. In case it
  * is not supported mispred = predicted = 0.
+ *
+ * intx: running in a hardware transaction
+ * abort: aborting a hardware transaction
  */
 struct perf_branch_entry {
__u64   from;
__u64   to;
__u64   mispred:1,  /* target mispredicted */
predicted:1,/* target predicted */
-   reserved:62;
+   intx:1, /* in transaction */
+   abort:1,/* transaction abort */
+   reserved:60;
 };
 
 /*
-- 
1.7.7.6
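
For readers following the sign-extension change in intel_pmu_lbr_read_64(), here is a small standalone sketch of decoding a v4 "from" value. The three flag macros are copied from the patch; `struct lbr_from_v4`, `lbr_sign_extend()` and `lbr_decode_v4()` are hypothetical names used only for this illustration. The sketch shifts left on the unsigned value and assumes that right-shifting a negative `int64_t` is arithmetic, which holds for GCC and Clang:

```c
#include <assert.h>
#include <stdint.h>

#define LBR_FROM_FLAG_MISPRED	(1ULL << 63)
#define LBR_FROM_FLAG_INTX	(1ULL << 62)
#define LBR_FROM_FLAG_ABORT	(1ULL << 61)

/* Hypothetical container for a decoded v4 "from" MSR value. */
struct lbr_from_v4 {
	uint64_t addr;
	unsigned mispred, intx, abort;
};

/*
 * Drop the top `skip` flag bits and sign-extend from the highest
 * remaining bit, so kernel addresses get their upper bits back.
 */
static uint64_t lbr_sign_extend(uint64_t from, unsigned skip)
{
	return (uint64_t)((int64_t)(from << skip) >> skip);
}

/* Split the raw value into the three flags and the branch source address. */
static struct lbr_from_v4 lbr_decode_v4(uint64_t from)
{
	struct lbr_from_v4 d;

	d.mispred = !!(from & LBR_FROM_FLAG_MISPRED);
	d.intx    = !!(from & LBR_FROM_FLAG_INTX);
	d.abort   = !!(from & LBR_FROM_FLAG_ABORT);
	d.addr    = lbr_sign_extend(from, 3);	/* v3 format would use 1 */
	return d;
}
```

With three flag bits instead of one, shifting by 1 would leave intx and abort inside the address, which is why the v4 path shifts by 3.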



[PATCH 15/32] perf, x86: Support weight samples for PEBS

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

When a weighted sample is requested, first try to report the TSX abort cost
on Haswell. If that is not available report the memory latency. This
allows profiling both by abort cost and by memory latencies.

Memory latencies require enabling a different PEBS mode (LL).
When both address and weight are requested, address wins.

The LL mode only works for memory related PEBS events, so add a
separate event constraint table for those.

I only did this for Haswell for now, but it could be added
for several other Intel CPUs too by just adding the right
table for them.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |4 ++
 arch/x86/kernel/cpu/perf_event_intel.c|4 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   47 +++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index ce2a863..d55e502 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -168,6 +168,7 @@ struct cpu_hw_events {
u64 perf_ctr_virt_mask;
 
void*kfree_on_online;
+   u8  *memory_latency_events;
 };
 
 #define __EVENT_CONSTRAINT(c, n, m, w, o) {\
@@ -390,6 +391,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void(*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
+   struct event_constraint *memory_lat_events;
 
/*
 * Intel LBR
@@ -599,6 +601,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];
 
 extern struct event_constraint intel_hsw_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_memory_latency_events[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9b4dda5..20caf0a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1624,6 +1624,9 @@ static int hsw_hw_config(struct perf_event *event)
 
if (ret)
return ret;
+   /* PEBS cannot capture both */
+   if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+   event->attr.sample_type &= ~PERF_SAMPLE_WEIGHT;
if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
return 0;
event->hw.config |= event->attr.config & (HSW_INTX|HSW_INTX_CHECKPOINTED);
@@ -2230,6 +2233,7 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.format_attrs = intel_hsw_formats_attr;
+   x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
pr_cont("Haswell events, ");
break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index aa0f5fa..3094caa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -456,6 +456,17 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
 };
 
+/* Subset of PEBS events supporting memory latency. Not used for scheduling */
+
+struct event_constraint intel_hsw_memory_latency_events[] = {
+   INTEL_EVENT_CONSTRAINT(0xcd, 0), /* MEM_TRANS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd0, 0), /* MEM_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd1, 0), /* MEM_LOAD_UOPS_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd2, 0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+   INTEL_EVENT_CONSTRAINT(0xd3, 0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
@@ -473,6 +484,21 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
return &emptyconstraint;
 }
 
+static bool is_memory_lat_event(struct perf_event *event)
+{
+   struct event_constraint *c;
+
+   if (x86_pmu.intel_cap.pebs_format < 1)
+   return false;
+   if (!x86_pmu.memory_lat_events)
+   return false;
+   for_each_event_constraint(c, x86_pmu.memory_lat_events) {
+   if ((event->hw.config & c->cmask) == c->code)
+   return true;
+   }
+   return false;
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -480,7 +506,12 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
-   cpuc->pebs_enabled |= 1ULL << hwc->idx;
+   /* When weight is requested enable LL instead of normal PEBS */
+   if 

[PATCH 11/32] perf, tools: Support sorting by intx, abort branch flags

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Extend the perf branch sorting code to support sorting by intx
or abort qualifiers. Also print out those qualifiers.

Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-report.c |3 +-
 tools/perf/builtin-top.c|4 ++-
 tools/perf/perf.h   |4 ++-
 tools/perf/util/hist.h  |2 +
 tools/perf/util/sort.c  |   55 +++
 tools/perf/util/sort.h  |2 +
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f07eae7..836aa32 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -595,7 +595,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"Use the stdio interface"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-  " dso_from, symbol_to, symbol_from, mispredict"),
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f2ecd49..213bfeb 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1226,7 +1226,9 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show counter open errors, etc)"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-  "sort by key(s): pid, comm, dso, symbol, parent"),
+  "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
+  " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+  " abort, intx"),
OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_DEFAULT('G', "call-graph", , "output_type,min_percent, 
call_order",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 469fbf2..d106d5a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -194,7 +194,9 @@ struct ip_callchain {
 struct branch_flags {
u64 mispred:1;
u64 predicted:1;
-   u64 reserved:62;
+   u64 intx:1;
+   u64 abort:1;
+   u64 reserved:60;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index b874609..d874bf5 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -43,6 +43,8 @@ enum hist_column {
HISTC_PARENT,
HISTC_CPU,
HISTC_MISPREDICT,
+   HISTC_INTX,
+   HISTC_ABORT,
HISTC_SYMBOL_FROM,
HISTC_SYMBOL_TO,
HISTC_DSO_FROM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..a8d1f1a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -476,6 +476,55 @@ struct sort_entry sort_mispredict = {
.se_width_idx   = HISTC_MISPREDICT,
 };
 
+static int64_t
+sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.abort !=
+   right->branch_info->flags.abort;
+}
+
+static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   const char *out = ".";
+
+   if (self->branch_info->flags.abort)
+   out = "A";
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_abort = {
+   .se_header  = "Transaction abort",
+   .se_cmp = sort__abort_cmp,
+   .se_snprintf= hist_entry__abort_snprintf,
+   .se_width_idx   = HISTC_ABORT,
+};
+
+static int64_t
+sort__intx_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+   return left->branch_info->flags.intx !=
+   right->branch_info->flags.intx;
+}
+
+static int hist_entry__intx_snprintf(struct hist_entry *self, char *bf,
+   size_t size, unsigned int width)
+{
+   const char *out = ".";
+
+   if (self->branch_info->flags.intx)
+   out = "T";
+
+   return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_intx = {
+   .se_header  = "Branch in transaction",
+   .se_cmp = sort__intx_cmp,
+   .se_snprintf= hist_entry__intx_snprintf,
+   .se_width_idx   = HISTC_INTX,
+};
+
 struct sort_dimension {
const char  *name;
struct sort_entry   *entry;
@@ -497,6 +546,8 @@ static struct sort_dimension sort_dimensions[] = {
DIM(SORT_CPU, "cpu", sort_cpu),
DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
DIM(SORT_SRCLINE, 

[PATCH 04/32] perf, x86: Support the TSX intx/intx_cp qualifiers v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Implement the TSX transaction and checkpointed transaction qualifiers for
Haswell. This allows, for example, profiling the number of cycles in transactions.

The checkpointed qualifier requires forcing the event to
counter 2; implement this with a custom constraint for Haswell.

Also add sysfs format attributes for intx/intx_cp.

[Updated from earlier version that used generic attributes, now does
raw + sysfs formats]
v2: Moved bad hunk. Forbid some bad combinations.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   61 
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 634f639..44e18c02 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -1597,6 +1598,44 @@ static void core_pmu_enable_all(int added)
}
 }
 
+static int hsw_hw_config(struct perf_event *event)
+{
+   int ret = intel_pmu_hw_config(event);
+
+   if (ret)
+   return ret;
+   if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
+   return 0;
+   event->hw.config |= event->attr.config & (HSW_INTX|HSW_INTX_CHECKPOINTED);
+
+   /* 
+* INTX/INTX-CP do not play well with PEBS or ANY thread mode.
+*/
+   if ((event->hw.config & (HSW_INTX|HSW_INTX_CHECKPOINTED)) &&
+((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
+ event->attr.precise_ip > 0))
+   return -EIO;
+   return 0;
+}
+
+static struct event_constraint counter2_constraint = 
+   EVENT_CONSTRAINT(0, 0x4, 0);
+
+static struct event_constraint *
+hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+   struct event_constraint *c = intel_get_event_constraints(cpuc, event);
+
+   /* Handle special quirk on intx_checkpointed only in counter 2 */
+   if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+   if (c->idxmsk64 & (1U << 2))
+   return &counter2_constraint;
+   return &emptyconstraint;
+   }
+
+   return c;
+}
+
 PMU_FORMAT_ATTR(event, "config:0-7");
 PMU_FORMAT_ATTR(umask, "config:8-15"   );
 PMU_FORMAT_ATTR(edge,  "config:18" );
@@ -1604,6 +1643,8 @@ PMU_FORMAT_ATTR(pc,   "config:19" );
 PMU_FORMAT_ATTR(any,   "config:21" ); /* v3 + */
 PMU_FORMAT_ATTR(inv,   "config:23" );
 PMU_FORMAT_ATTR(cmask, "config:24-31"  );
+PMU_FORMAT_ATTR(intx,  "config:32" );
+PMU_FORMAT_ATTR(intx_cp,"config:33");
 
 static struct attribute *intel_arch_formats_attr[] = {
&format_attr_event.attr,
@@ -1761,6 +1802,23 @@ static struct attribute *intel_arch3_formats_attr[] = {
NULL,
 };
 
+/* Arch3 + TSX support */
+static struct attribute *intel_hsw_formats_attr[] __read_mostly = {
+   &format_attr_event.attr,
+   &format_attr_umask.attr,
+   &format_attr_edge.attr,
+   &format_attr_pc.attr,
+   &format_attr_any.attr,
+   &format_attr_inv.attr,
+   &format_attr_cmask.attr,
+   &format_attr_intx.attr,
+   &format_attr_intx_cp.attr,
+
+   &format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
+   NULL,
+};
+
+
 static __initconst const struct x86_pmu intel_pmu = {
.name   = "Intel",
.handle_irq = intel_pmu_handle_irq,
@@ -2135,6 +2193,9 @@ __init int intel_pmu_init(void)
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+   x86_pmu.hw_config = hsw_hw_config;
+   x86_pmu.get_event_constraints = hsw_get_event_constraints;
+   x86_pmu.format_attrs = intel_hsw_formats_attr;
pr_cont("Haswell events, ");
break;
 
-- 
1.7.7.6
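
The two rules this patch enforces (intx/intx_cp may not be combined with ANY-thread mode or PEBS, and intx_cp events may only run on counter 2) can be sketched in isolation as follows. The bit positions are taken from the format strings above ("config:21", "config:32", "config:33"); `hsw_check_config()` and `hsw_allowed_counters()` are hypothetical helper names for this illustration, not functions from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define ARCH_PERFMON_EVENTSEL_ANY	(1ULL << 21)	/* "any",     config:21 */
#define HSW_INTX			(1ULL << 32)	/* "intx",    config:32 */
#define HSW_INTX_CHECKPOINTED		(1ULL << 33)	/* "intx_cp", config:33 */

/* Mirrors the combination check in hsw_hw_config(): reject intx/intx_cp
 * together with ANY-thread mode or a precise (PEBS) request. */
static int hsw_check_config(uint64_t config, unsigned precise_ip)
{
	if ((config & (HSW_INTX | HSW_INTX_CHECKPOINTED)) &&
	    ((config & ARCH_PERFMON_EVENTSEL_ANY) || precise_ip > 0))
		return -EIO;
	return 0;
}

/* Mirrors hsw_get_event_constraints(): given the counter bitmask the event
 * would normally get (bit n = counter n), an intx_cp event is narrowed to
 * counter 2, or to nothing if counter 2 was not allowed in the first place. */
static uint64_t hsw_allowed_counters(uint64_t config, uint64_t idxmsk)
{
	if (config & HSW_INTX_CHECKPOINTED)
		return idxmsk & (1ULL << 2);
	return idxmsk;
}
```

As a usage example, a raw cycles event (0x3c) with the intx bit set passes the check, while the same event with the "any" bit or with precise_ip set is rejected with -EIO.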



[PATCH 10/32] perf, tools: Add abort_tx,no_tx,in_tx branch filter options to perf record -j v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Make perf record -j aware of the new in_tx,no_tx,abort_tx branch qualifiers.

v2: ABORT -> ABORTTX
v3: Add more _
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |3 +++
 tools/perf/builtin-record.c  |3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b38a1f9..159680e 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -172,6 +172,9 @@ following filters are defined:
 - u:  only when the branch target is at the user level
 - k: only when the branch target is in the kernel
 - hv: only when the target is at the hypervisor level
+   - in_tx: only when the target is in a hardware transaction
+   - no_tx: only when the target is not in a hardware transaction
+   - abort_tx: only when the target is a hardware transaction abort
 
 +
 The option requires at least one branch type among any, any_call, any_ret, ind_call.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5783c32..067d8ee 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -763,6 +763,9 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+   BRANCH_OPT("abort_tx", PERF_SAMPLE_BRANCH_ABORTTX),
+   BRANCH_OPT("in_tx", PERF_SAMPLE_BRANCH_INTX),
+   BRANCH_OPT("no_tx", PERF_SAMPLE_BRANCH_NOTX),
BRANCH_END
 };
 
-- 
1.7.7.6
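
To show how a comma-separated `-j` argument could be turned into a filter mask with the new names, here is a self-contained sketch. The bit values are placeholders (the real PERF_SAMPLE_BRANCH_* constants come from the perf_event header and are not shown in this patch), and `sketch_parse_branch_filter()` is a hypothetical helper, not the parser used by builtin-record.c:

```c
#include <assert.h>
#include <string.h>

/* Placeholder bit values, chosen only for this sketch. */
enum {
	SKETCH_BRANCH_ABORTTX = 1U << 0,
	SKETCH_BRANCH_INTX    = 1U << 1,
	SKETCH_BRANCH_NOTX    = 1U << 2,
};

struct sketch_branch_mode {
	const char *name;
	unsigned mode;
};

/* Name table modelled on branch_modes[]; terminated by a NULL name. */
static const struct sketch_branch_mode sketch_modes[] = {
	{ "abort_tx", SKETCH_BRANCH_ABORTTX },
	{ "in_tx",    SKETCH_BRANCH_INTX },
	{ "no_tx",    SKETCH_BRANCH_NOTX },
	{ NULL, 0 },
};

/* Parse "a,b,c" into a bitmask. Returns 0 on success, -1 on an unknown name. */
static int sketch_parse_branch_filter(const char *str, unsigned *mask)
{
	*mask = 0;
	while (*str) {
		const char *end = strchr(str, ',');
		size_t len = end ? (size_t)(end - str) : strlen(str);
		const struct sketch_branch_mode *m;

		/* Find the table entry whose name matches this token exactly. */
		for (m = sketch_modes; m->name; m++) {
			if (strlen(m->name) == len && !strncmp(m->name, str, len))
				break;
		}
		if (!m->name)
			return -1;
		*mask |= m->mode;
		str += len;
		if (*str == ',')
			str++;
	}
	return 0;
}
```

Note that the documentation hunk above still requires at least one branch type (any, any_call, any_ret, ind_call) alongside these qualifiers; the sketch does not model that check.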



[PATCH 05/32] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v4

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

This is not arch perfmon, but older CPUs will just ignore it. This makes
it possible to do at least some TSX measurements from a KVM guest.

Cc: a...@redhat.com
Cc: g...@redhat.com
v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
v4: Use reserved bits for #GP
Cc: g...@redhat.com
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/pmu.c  |   32 
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..63d4be4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_pmu {
u64 global_ovf_ctrl;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
+   u64 reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cfc258a..89405d0 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
unsigned config, bool exclude_user, bool exclude_kernel,
-   bool intr)
+   bool intr, bool intx, bool intx_cp)
 {
struct perf_event *event;
struct perf_event_attr attr = {
@@ -173,6 +173,10 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
.exclude_kernel = exclude_kernel,
.config = config,
};
+   if (intx)
+   attr.config |= HSW_INTX;
+   if (intx_cp)
+   attr.config |= HSW_INTX_CHECKPOINTED;
 
attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
 
@@ -206,7 +210,8 @@ static unsigned find_arch_event(struct kvm_pmu *pmu, u8 event_select,
return arch_events[i].event_type;
 }
 
-static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+static void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, 
+u64 eventsel)
 {
unsigned config, type = PERF_TYPE_RAW;
u8 event_select, unit_mask;
@@ -226,7 +231,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
ARCH_PERFMON_EVENTSEL_INV |
-   ARCH_PERFMON_EVENTSEL_CMASK))) {
+   ARCH_PERFMON_EVENTSEL_CMASK |
+   HSW_INTX |
+   HSW_INTX_CHECKPOINTED))) {
config = find_arch_event(&pmc->vcpu->arch.pmu, event_select,
unit_mask);
if (config != PERF_COUNT_HW_MAX)
@@ -239,7 +246,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
reprogram_counter(pmc, type, config,
!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-   eventsel & ARCH_PERFMON_EVENTSEL_INT);
+   eventsel & ARCH_PERFMON_EVENTSEL_INT,
+   (eventsel & HSW_INTX),
+   (eventsel & HSW_INTX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +265,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
arch_events[fixed_pmc_events[idx]].event_type,
!(en & 0x2), /* exclude user */
!(en & 0x1), /* exclude kernel */
-   pmi);
+   pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
@@ -289,7 +298,7 @@ static void reprogram_idx(struct kvm_pmu *pmu, int idx)
return;
 
if (pmc_is_gp(pmc))
-   reprogram_gp_counter(pmc, pmc->eventsel);
+   reprogram_gp_counter(pmu, pmc, pmc->eventsel);
else {
int fidx = idx - INTEL_PMC_IDX_FIXED;
reprogram_fixed_counter(pmc,
@@ -400,8 +409,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
if (data == pmc->eventsel)
return 0;
-   if (!(data & 0x0020ull)) {
-   reprogram_gp_counter(pmc, data);
+   if (!(data & pmu->reserved_bits)) {
+   reprogram_gp_counter(pmu, pmc, data);
return 0;
}
}
@@ -442,6 +451,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)

[PATCH 12/32] perf, x86: Support full width counting

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Recent Intel CPUs have a new alternative MSR range for perfctrs that allows
writing the full counter width. Enable this range if the hardware reports it
using a new capability bit. This slightly lowers the overhead of perf stat,
because fewer interrupts are needed to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of the counter
range is reached.
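
A rough illustration of the interrupt saving described above. This is only a
sketch, not kernel code: it assumes the legacy MSR range limits the programmable
period to 2^31 - 1 and that the full-width range exposes a 48-bit counter. The
helper name overflow_interrupts and both constants are made up for this example.

```c
#include <stdint.h>

/* Assumed limits, for illustration only. */
#define DEMO_LEGACY_MAX_PERIOD ((1ULL << 31) - 1)  /* legacy perfctr MSR range */
#define DEMO_FULL_MAX_PERIOD   ((1ULL << 48) - 1)  /* full-width range, 48-bit counter */

/*
 * Number of overflow interrupts taken while counting `events` events when
 * the counter can be armed with at most `max_period` events per run.
 */
static uint64_t overflow_interrupts(uint64_t events, uint64_t max_period)
{
	return events / max_period;
}
```

Under these assumptions, counting 2^40 events takes 512 overflow interrupts
with the legacy limit and none with the full-width limit.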

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/msr-index.h   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |1 +
 arch/x86/kernel/cpu/perf_event_intel.c |6 ++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..2070f46 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -126,6 +126,9 @@
 #define MSR_KNC_EVNTSEL0   0x0028
 #define MSR_KNC_EVNTSEL1   0x0029
 
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0  0x04c1
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
complete list. */
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 1567b0d..ce2a863 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -278,6 +278,7 @@ union perf_capabilities {
u64 pebs_arch_reg:1;
u64 pebs_format:4;
u64 smm_freeze:1;
+   u64 fw_write:1;
};
u64 capabilities;
 };
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 44e18c02..bc21bce 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2247,5 +2247,11 @@ __init int intel_pmu_init(void)
}
}
 
+   /* Support full width counters using alternative MSR range */
+   if (x86_pmu.intel_cap.fw_write) {
+   x86_pmu.max_period = x86_pmu.cntval_mask;
+   x86_pmu.perfctr = MSR_IA32_PMC0;
+   }
+
return 0;
 }
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 23/32] perf, tools: Add browser support for transaction flags v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add histogram support for the transaction flags. Each flags instance becomes
a separate histogram. Support sorting and displaying the flags in report
and top.

The patch is fairly large, but it's really mostly just plumbing to pass the
flags around.

v2: Increase column. Fix flags decoding. Use longer strings for flags
to be more user friendly.
v3: Fix WERROR=1 build. Tidy display
Signed-off-by: Andi Kleen 
---
 tools/perf/builtin-annotate.c |2 +-
 tools/perf/builtin-diff.c |8 +++--
 tools/perf/builtin-report.c   |4 +-
 tools/perf/builtin-top.c  |4 +-
 tools/perf/util/hist.c|3 +-
 tools/perf/util/hist.h|3 +-
 tools/perf/util/sort.c|   61 +
 tools/perf/util/sort.h|2 +
 8 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 0b9cd0f..5a27016 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
 
-   he = __hists__add_entry(>hists, al, NULL, 1, 1);
+   he = __hists__add_entry(>hists, al, NULL, 1, 1, 0);
if (he == NULL)
return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 84cdbd1..aebff56 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -249,9 +249,10 @@ int perf_diff__formula(char *buf, size_t size, struct 
hist_entry *he)
 
 static int hists__add_entry(struct hists *self,
struct addr_location *al, u64 period,
-   u64 weight)
+   u64 weight, u64 transaction)
 {
-   if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
+   if (__hists__add_entry(self, al, NULL, period, weight, transaction)
+   != NULL)
return 0;
return -ENOMEM;
 }
@@ -273,7 +274,8 @@ static int diff__process_sample_event(struct perf_tool 
*tool __maybe_unused,
if (al.filtered)
return 0;
 
-   if (hists__add_entry(>hists, , sample->period, 
sample->weight)) {
+   if (hists__add_entry(>hists, , sample->period, sample->weight,
+sample->transaction)) {
pr_warning("problem incrementing symbol period, skipping 
event\n");
return -1;
}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index bd7bb66..c3d3fb7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -147,7 +147,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel 
*evsel,
}
 
he = __hists__add_entry(>hists, al, parent, sample->period,
-   sample->weight);
+   sample->weight, sample->transaction);
if (he == NULL)
return -ENOMEM;
 
@@ -597,7 +597,7 @@ int cmd_report(int argc, const char **argv, const char 
*prefix __maybe_unused)
OPT_STRING('s', "sort", _order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-  " abort, intx,  weight, local_weight"),
+  " abort, intx,  weight, local_weight, transaction"),
OPT_BOOLEAN(0, "showcpuutilization", _conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", _pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index eb4ba1d5..0d2a33d 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -272,7 +272,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct 
perf_evsel *evsel,
struct hist_entry *he;
 
he = __hists__add_entry(>hists, al, NULL, sample->period,
-   sample->weight);
+   sample->weight, sample->transaction);
if (he == NULL)
return NULL;
 
@@ -1229,7 +1229,7 @@ int cmd_top(int argc, const char **argv, const char 
*prefix __maybe_unused)
OPT_STRING('s', "sort", _order, "key[,key2...]",
   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-  " abort, intx, weight, local_weight"),
+  " abort, intx, weight, local_weight, transaction"),
OPT_BOOLEAN('n', "show-nr-samples", _conf.show_nr_samples,
"Show a column with the number of samples"),
OPT_CALLBACK_DEFAULT('G', "call-graph", , "output_type,min_percent, 
call_order",
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cbedba7..df79b9f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -359,7 

[PATCH 32/32] perf, tools: List kernel supplied event aliases in perf list v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

List the kernel-supplied PMU event aliases in perf list.

It is better when users can actually see them.

v2: Fix pattern matching
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-list.txt |4 +-
 tools/perf/builtin-list.c  |3 +
 tools/perf/util/parse-events.c |5 ++-
 tools/perf/util/pmu.c  |   72 
 tools/perf/util/pmu.h  |3 +
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt 
b/tools/perf/Documentation/perf-list.txt
index d1e39dc..826f3d6 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
 SYNOPSIS
 
 [verse]
-'perf list' [hw|sw|cache|tracepoint|event_glob]
+'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 ---
@@ -104,6 +104,8 @@ To limit the list use:
   'subsys_glob:event_glob' to filter by tracepoint subsystems such as sched,
   block, etc.
 
+. 'pmu' to print the kernel supplied PMU events.
+
 . If none of the above is matched, it will apply the supplied glob to all
   events, printing the ones that match.
 
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 1948ece..e79f423 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -13,6 +13,7 @@
 
 #include "util/parse-events.h"
 #include "util/cache.h"
+#include "util/pmu.h"
 
 int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 {
@@ -37,6 +38,8 @@ int cmd_list(int argc, const char **argv, const char *prefix 
__maybe_unused)
else if (strcmp(argv[i], "cache") == 0 ||
 strcmp(argv[i], "hwcache") == 0)
print_hwcache_events(NULL, false);
+   else if (strcmp(argv[i], "pmu") == 0)
+   print_pmu_events(NULL, false);
else if (strcmp(argv[i], "--raw-dump") == 0)
print_events(NULL, true);
else {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 70cbd1c..2834f12 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1069,6 +1069,8 @@ int print_hwcache_events(const char *event_glob, bool 
name_only)
}
}
 
+   if (printed)
+   printf("\n");
return printed;
 }
 
@@ -1123,11 +1125,12 @@ void print_events(const char *event_glob, bool 
name_only)
 
print_hwcache_events(event_glob, name_only);
 
+   print_pmu_events(event_glob, name_only);
+
if (event_glob != NULL)
return;
 
if (!name_only) {
-   printf("\n");
printf("  %-50s [%s]\n",
   "rNNN",
   event_type_descriptors[PERF_TYPE_RAW]);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 18e8480..fbd5ca0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -552,6 +552,78 @@ void perf_pmu__set_format(unsigned long *bits, long from, 
long to)
set_bit(b, bits);
 }
 
+static char *format_alias(char *buf, int len, struct perf_pmu *pmu,
+ struct perf_pmu__alias *alias)
+{
+   snprintf(buf, len, "%s/%s/", pmu->name, alias->name);
+   return buf;
+}
+
+static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu,
+struct perf_pmu__alias *alias)
+{
+   snprintf(buf, len, "%s OR %s/%s/", alias->name, pmu->name, alias->name);
+   return buf;
+}
+
+static int cmp_string(const void *a, const void *b)
+{
+   const char * const *as = a;
+   const char * const *bs = b;
+   return strcmp(*as, *bs);
+}
+
+void print_pmu_events(const char *event_glob, bool name_only)
+{
+   struct perf_pmu *pmu;
+   struct perf_pmu__alias *alias;
+   char buf[1024];
+   int printed = 0;
+   int len, j;
+   char **aliases;
+
+   pmu = NULL;
+   len = 0;
+   while ((pmu = perf_pmu__scan(pmu)) != NULL)
+   list_for_each_entry (alias, >aliases, list)
+   len++;
+   aliases = malloc(sizeof(char *) * len);
+   if (!aliases)
+   return;
+   pmu = NULL;
+   j = 0;
+   while ((pmu = perf_pmu__scan(pmu)) != NULL)
+   list_for_each_entry (alias, >aliases, list) {
+   char *name = format_alias(buf, sizeof buf, pmu, alias);
+   bool is_cpu = !strcmp(pmu->name, "cpu");
+
+   if (event_glob != NULL &&
+   !(strglobmatch(name, event_glob) ||
+ (!is_cpu && strglobmatch(alias->name, 
event_glob
+   continue;
+   aliases[j] = name;
+   

[PATCH 29/32] perf, tools: Add perf stat --transaction v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add support to perf stat to print the basic transactional execution statistics:
total cycles, cycles in transactions, and cycles in aborted transactions
(using the intx and intx_checkpoint qualifiers), plus transaction starts and
elision starts, to compute the average transaction length.

This gives a reasonable overview of the success of the transactions.
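
The derived numbers can be sketched roughly as follows. This is an assumption
about how the counters combine, not a copy of the builtin-stat.c code: cycles-t
is taken as cycles spent inside transactions and cycles-ct as the subset that
was not rolled back by an abort. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Raw counts for one measurement; field names are illustrative. */
struct tsx_counts {
	uint64_t cycles;     /* total cycles */
	uint64_t cycles_t;   /* cycles inside transactions (intx) */
	uint64_t cycles_ct;  /* transactional cycles that committed (intx + intx_cp) */
	uint64_t tx_start;   /* RTM transaction starts */
	uint64_t el_start;   /* HLE elision starts */
};

struct tsx_summary {
	double transactional_pct;   /* share of cycles spent in transactions */
	double aborted_pct;         /* share of cycles wasted in aborted transactions */
	double cycles_per_tx;       /* average transaction length */
	double cycles_per_elision;  /* average elision length */
};

/* Divide, but report 0 instead of dividing by zero. */
static double demo_ratio(double num, double den)
{
	return den > 0.0 ? num / den : 0.0;
}

static struct tsx_summary tsx_summarize(const struct tsx_counts *c)
{
	struct tsx_summary s;
	/* Clamp so counters read in separate groups cannot produce a negative value. */
	uint64_t aborted = c->cycles_t > c->cycles_ct ?
			   c->cycles_t - c->cycles_ct : 0;

	s.transactional_pct  = demo_ratio(100.0 * (double)c->cycles_t, (double)c->cycles);
	s.aborted_pct        = demo_ratio(100.0 * (double)aborted, (double)c->cycles);
	s.cycles_per_tx      = demo_ratio((double)c->cycles_t, (double)c->tx_start);
	s.cycles_per_elision = demo_ratio((double)c->cycles_t, (double)c->el_start);
	return s;
}
```

The clamp reflects the concern noted in the v2 changelog: if the events are not
measured together, the difference between the two counters is not meaningful.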

Enable with a new --transaction / -T option.

This requires measuring these events in a group, since they depend on each
other.

This is implemented using the TM sysfs events exported by the kernel.

v2: Only print the extended statistics when the option is enabled.
This avoids negative output when the user specifies the -T events
in separate groups.
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |4 +
 tools/perf/builtin-stat.c  |  101 +++-
 tools/perf/util/evsel.h|6 ++
 3 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt 
b/tools/perf/Documentation/perf-stat.txt
index cf0c310..0d5b8cb 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -114,6 +114,10 @@ with it.  --append may be used here.  Examples:
 
 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- 
make -s -j64 O=defconfig-build/ bzImage
 
+-T::
+--transaction::
+
+Print statistics of transactional execution if supported.
 
 EXAMPLES
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6888960..6dfc8f8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -65,6 +65,29 @@
 #define CNTR_NOT_SUPPORTED ""
 #define CNTR_NOT_COUNTED   ""
 
+static const char *transaction_attrs[] = {
+   "task-clock",
+   "{"
+   "instructions,"
+   "cycles,"
+   "cpu/cycles-t/,"
+   "cpu/cycles-ct/,"
+   "cpu/tx-start/,"
+   "cpu/el-start/"
+   "}"
+};
+
+/* must match the transaction_attrs above */
+enum {
+   T_TASK_CLOCK,
+   T_INSTRUCTIONS,
+   T_CYCLES,
+   T_CYCLES_INTX,
+   T_CYCLES_INTX_CP,
+   T_TRANSACTION_START,
+   T_ELISION_START
+};
+
 static struct perf_evlist  *evsel_list;
 
 static struct perf_target  target = {
@@ -78,6 +101,7 @@ static bool  no_aggr 
= false;
 static pid_t   child_pid   = -1;
 static boolnull_run=  false;
 static int detailed_run=  0;
+static booltransaction_run =  false;
 static boolbig_num =  true;
 static int big_num_opt =  -1;
 static const char  *csv_sep= NULL;
@@ -127,7 +151,11 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intx_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intxcp_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static int create_perf_stat_counter(struct perf_evsel *evsel,
struct perf_evsel *first)
@@ -187,6 +215,18 @@ static inline int nsec_counter(struct perf_evsel *evsel)
return 0;
 }
 
+static struct perf_evsel *nth_evsel(int n)
+{
+   struct perf_evsel *ev;
+   int j;
+
+   j = 0;
+   list_for_each_entry (ev, _list->entries, node)
+   if (j++ == n)
+   return ev;
+   return NULL;
+}
+
 /*
  * Update various tracking values we maintain to print
  * more semantic information such as miss/hit ratios,
@@ -198,8 +238,14 @@ static void update_shadow_stats(struct perf_evsel 
*counter, u64 *count)
update_stats(_nsecs_stats[0], count[0]);
else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
update_stats(_cycles_stats[0], count[0]);
-   else if (perf_evsel__match(counter, HARDWARE, 
HW_STALLED_CYCLES_FRONTEND))
-   update_stats(_stalled_cycles_front_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX)))
+   update_stats(_cycles_intx_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX_CP)))
+   update_stats(_cycles_intxcp_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_TRANSACTION_START)))
+   update_stats(_transaction_stats[0], count[0]);
+   else if (perf_evsel__cmp(counter, nth_evsel(T_ELISION_START)))
+   update_stats(_elision_stats[0], 

[PATCH 28/32] perf, x86: Add Haswell TSX event aliases v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add infrastructure to generate event aliases in /sys/devices/cpu/events/

And use this to set up user friendly aliases for the common TSX events.
TSX tuning relies heavily on the PMU, so it's important to be user friendly.

This replaces the generic transaction events in an earlier version
of this patchkit.

tx-start/commit/abort  to count RTM transactions
el-start/commit/abort  to count HLE ("elision") transactions
tx-conflict/overflow   to count conflict/overflow for both combined.

The general abort events exist in precise and non-precise variants.
Since the common case is sampling, plain "tx-aborts" is the precise variant.

This is very important because abort sampling only really works
with PEBS enabled, otherwise it would report the IP after the abort,
not the abort point. But counting with PEBS has more overhead,
so also have tx/el-abort-count aliases that do not enable PEBS
for perf stat.

It would be nice to switch automatically between those two, like in the
previous version, but that would need more new infrastructure for sysfs
first.

There is a tx-abort<->tx-aborts alias too, because I found myself
using both variants.

Also added friendly aliases for cpu/cycles,intx=1/ and
cpu/cycles,intx=1,intx_cp=1/ and the same for instructions.
These will be used by perf stat -T, and are also useful for users directly.

For example, to count transactional cycles one can use "perf stat -e cycles-t".
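
To make the alias strings concrete, here is a minimal sketch of turning one
into a raw config value. It assumes the usual x86 layout with the event code
in config bits 0-7 and the umask in bits 8-15, and it ignores every other term
(precise, intx, intx_cp). The function name alias_to_raw_config is
hypothetical; perf itself resolves the terms through the PMU format
definitions in sysfs.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Fold the "event=" and "umask=" terms of an alias string such as
 * "event=0xc9,umask=0x1" into a raw config value. Other terms are skipped.
 */
static uint64_t alias_to_raw_config(const char *alias)
{
	uint64_t config = 0;
	const char *p = alias;

	while (*p) {
		size_t len = strcspn(p, ",");   /* length of the current term */

		if (!strncmp(p, "event=", 6))
			config |= strtoull(p + 6, NULL, 0) & 0xff;
		else if (!strncmp(p, "umask=", 6))
			config |= (strtoull(p + 6, NULL, 0) & 0xff) << 8;

		p += len;
		if (*p == ',')
			p++;                    /* step over the separator */
	}
	return config;
}
```

With that layout, the tx-start string "event=0xc9,umask=0x1" maps to 0x1c9.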

v2: Move to new sysfs infrastructure
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   47 
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index dc2f2a1..e8fb4e2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2010,6 +2010,52 @@ static __init void intel_nehalem_quirk(void)
}
 }
 
+/* Haswell special events */
+EVENT_ATTR_STR(tx-start,   tx_start,   "event=0xc9,umask=0x1");
+EVENT_ATTR_STR(tx-commit,  tx_commit,  "event=0xc9,umask=0x2");
+EVENT_ATTR_STR(tx-abort,   tx_abort,   
"event=0xc9,umask=0x4,precise=2");
+EVENT_ATTR_STR(tx-abort-count, tx_abort_count, "event=0xc9,umask=0x4");
+/* alias */
+EVENT_ATTR_STR(tx-aborts,  tx_aborts,  
"event=0xc9,umask=0x4,precise=2");
+EVENT_ATTR_STR(tx-capacity,tx_capacity,"event=0x54,umask=0x2");
+EVENT_ATTR_STR(tx-conflict,tx_conflict,"event=0x54,umask=0x1");
+EVENT_ATTR_STR(el-start,   el_start,   "event=0xc8,umask=0x1");
+EVENT_ATTR_STR(el-commit,  el_commit,  "event=0xc8,umask=0x2");
+EVENT_ATTR_STR(el-abort,   el_abort,   
"event=0xc8,umask=0x4,precise=2");
+EVENT_ATTR_STR(el-abort-count, el_abort_count, "event=0xc8,umask=0x4");
+/* alias */
+EVENT_ATTR_STR(el-aborts,  el_aborts,  
"event=0xc8,umask=0x4,precise=2");
+/* shared with tx-* */
+EVENT_ATTR_STR(el-capacity,el_capacity,"event=0x54,umask=0x2");
+/* shared with tx-* */
+EVENT_ATTR_STR(el-conflict,el_conflict,"event=0x54,umask=0x1");
+EVENT_ATTR_STR(cycles-t,   cycles_t,   "event=0x3c,intx=1");
+EVENT_ATTR_STR(cycles-ct,  cycles_ct,  "event=0x3c,intx=1,intx_cp=1");
+EVENT_ATTR_STR(instructions-t, instructions_t, "event=0xc0,intx=1");
+EVENT_ATTR_STR(instructions-ct,instructions_ct,"event=0xc0,intx=1,intx_cp=1");
+
+static struct attribute *hsw_events_attrs[] = {
+   EVENT_PTR(tx_start),
+   EVENT_PTR(tx_commit),
+   EVENT_PTR(tx_abort),
+   EVENT_PTR(tx_aborts),
+   EVENT_PTR(tx_abort_count),
+   EVENT_PTR(tx_capacity),
+   EVENT_PTR(tx_conflict),
+   EVENT_PTR(el_start),
+   EVENT_PTR(el_commit),
+   EVENT_PTR(el_abort),
+   EVENT_PTR(el_aborts),
+   EVENT_PTR(el_abort_count),
+   EVENT_PTR(el_capacity),
+   EVENT_PTR(el_conflict),
+   EVENT_PTR(cycles_t),
+   EVENT_PTR(cycles_ct),
+   EVENT_PTR(instructions_t),
+   EVENT_PTR(instructions_ct),
+   NULL
+};
+
 __init int intel_pmu_init(void)
 {
union cpuid10_edx edx;
@@ -2235,6 +2281,7 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.format_attrs = intel_hsw_formats_attr;
x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
+   x86_pmu.cpu_events = hsw_events_attrs;
pr_cont("Haswell events, ");
break;
 
-- 
1.7.7.6



[PATCH 26/32] perf, x86: improve sysfs event mapping with event string

2012-11-09 Thread Andi Kleen
From: Stephane Eranian 

This patch extends Jiri's changes to make generic
events mapping visible via sysfs. The patch extends
the mechanism to non-generic events by allowing
the mappings to be hardcoded in strings.

This mechanism will be used by the PEBS-LL patch
later on.
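
The "string trumps id" rule can be modelled in user space as below. This is a
simplified sketch, not the kernel code: struct demo_events_attr stands in for
perf_pmu_events_attr, demo_event_map() is a made-up replacement for
x86_pmu.event_map(), and the two event codes in it are only illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for struct perf_pmu_events_attr. */
struct demo_events_attr {
	uint64_t id;            /* generic event id, used when event_str is NULL */
	const char *event_str;  /* hardcoded mapping; overrides id when set */
};

/* Hypothetical id -> event code table standing in for x86_pmu.event_map(). */
static uint64_t demo_event_map(uint64_t id)
{
	static const uint64_t map[] = { 0x3c, 0xc0 };

	return id < sizeof(map) / sizeof(map[0]) ? map[id] : 0;
}

/* Format the sysfs text for one attribute; the string wins over the id. */
static int demo_events_show(char *page, size_t size,
			    const struct demo_events_attr *attr)
{
	if (attr->event_str)
		return snprintf(page, size, "%s", attr->event_str);

	return snprintf(page, size, "event=0x%02llx",
			(unsigned long long)demo_event_map(attr->id));
}
```

An attribute with a string is printed verbatim, while one without falls back
to the id lookup, which mirrors the order of the checks in the patch.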

[AK: Make events_sysfs_show unstatic again to fix compilation]
Signed-off-by: Stephane Eranian 
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c |   28 +---
 arch/x86/kernel/cpu/perf_event.h |   26 ++
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 57bb4ce..af38635 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1317,20 +1317,22 @@ static struct attribute_group x86_pmu_format_group = {
.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-   struct device_attribute attr;
-   u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
  */
 static void __init filter_events(struct attribute **attrs)
 {
+   struct device_attribute *d;
+   struct perf_pmu_events_attr *pmu_attr;
int i, j;
 
for (i = 0; attrs[i]; i++) {
+   d = (struct device_attribute *)attrs[i];
+   pmu_attr = container_of(d, struct perf_pmu_events_attr, attr);
+   /* str trumps id */
+   if (pmu_attr->event_str)
+   continue;
if (x86_pmu.event_map(i))
continue;
 
@@ -1342,24 +1344,20 @@ static void __init filter_events(struct attribute 
**attrs)
}
 }
 
-static ssize_t events_sysfs_show(struct device *dev, struct device_attribute 
*attr,
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
  char *page)
 {
struct perf_pmu_events_attr *pmu_attr = \
container_of(attr, struct perf_pmu_events_attr, attr);
 
u64 config = x86_pmu.event_map(pmu_attr->id);
-   return x86_pmu.events_sysfs_show(page, config);
-}
 
-#define EVENT_VAR(_id)  event_attr_##_id
-#define EVENT_PTR(_id) _attr_##_id.attr.attr
+   /* string trumps id */
+   if (pmu_attr->event_str)
+   return sprintf(page, "%s", pmu_attr->event_str);
 
-#define EVENT_ATTR(_name, _id) \
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {  \
-   .attr = __ATTR(_name, 0444, events_sysfs_show, NULL),   \
-   .id   =  PERF_COUNT_HW_##_id,   \
-};
+   return x86_pmu.events_sysfs_show(page, config);
+}
 
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 2ef5b20..12ae625 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -425,6 +425,29 @@ do {   
\
 #define ERF_NO_HT_SHARING  1
 #define ERF_HAS_RSP_1  2
 
+#define EVENT_VAR(_id)  event_attr_##_id
+#define EVENT_PTR(_id) _attr_##_id.attr.attr
+
+#define EVENT_ATTR(_name, _id) \
+static struct perf_pmu_events_attr EVENT_VAR(_id) = {  \
+   .attr = __ATTR(_name, 0444, events_sysfs_show, NULL),   \
+   .id   =  PERF_COUNT_HW_##_id,   \
+   .event_str = NULL,  \
+};
+
+#define EVENT_ATTR_STR(_name, v, str)\
+static struct perf_pmu_events_attr event_attr_##v = {\
+   .attr  = __ATTR(_name, 0444, events_sysfs_show, NULL),\
+   .id=  0,  \
+   .event_str =  str,\
+};
+
+struct perf_pmu_events_attr {
+   struct device_attribute attr;
+   u64 id;
+   const char *event_str;
+};
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -643,6 +666,9 @@ int p6_pmu_init(void);
 
 int knc_pmu_init(void);
 
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page);
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
-- 
1.7.7.6



[PATCH 21/32] perf, tools: Add support for record transaction flags v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add the glue in the user tools to record transaction flags with
--transaction (-T was already taken) and dump them.

Follow-on patches will use them.
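
The parsing side follows the usual pattern for optional sample fields: each
field is present in the sample only if its bit is set in sample_type, and the
fields appear in a fixed order. A minimal sketch, with hypothetical names and
bit values (the real PERF_SAMPLE_* constants come from the perf_event ABI
header):

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_SAMPLE_WEIGHT      (1u << 0)
#define DEMO_SAMPLE_TRANSACTION (1u << 1)

struct demo_sample {
	uint64_t weight;
	uint64_t transaction;
};

/*
 * Consume the optional fields from `array` in order. Fields whose bit is not
 * set in `type` are absent from the array and default to 0.
 * Returns a pointer to the first unconsumed word.
 */
static const uint64_t *demo_parse_sample(const uint64_t *array, uint32_t type,
					 struct demo_sample *out)
{
	out->weight = 0;
	out->transaction = 0;

	if (type & DEMO_SAMPLE_WEIGHT)
		out->weight = *array++;
	if (type & DEMO_SAMPLE_TRANSACTION)
		out->transaction = *array++;

	return array;
}
```

Resetting the field before the check means consumers see 0 for samples
recorded without --transaction.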

v2: Fix manpage
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |4 +++-
 tools/perf/builtin-record.c  |2 ++
 tools/perf/perf.h|1 +
 tools/perf/util/event.h  |1 +
 tools/perf/util/evsel.c  |9 +
 tools/perf/util/session.c|3 +++
 6 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index fd4ac81..becccf7 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -185,12 +185,14 @@ is enabled for all the sampling events. The sampled 
branch type is the same for
 The various filters must be specified as a comma separated list: 
--branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
--W::
 --weight::
 Enable weightened sampling. An additional weight is recorded per sample and 
can be
 displayed with the weight and local_weight sort keys.  This currently works 
for TSX
 abort events and some memory events in precise mode on modern Intel CPUs.
 
+--transaction::
+Record transaction flags for transaction related events.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5ed43f2..5b8e185 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1046,6 +1046,8 @@ const struct option record_options[] = {
 parse_branch_stack),
OPT_BOOLEAN('W', "weight", _weight,
"sample by weight (on special events only)"),
+   OPT_BOOLEAN(0, "transaction", _transaction,
+   "sample transaction flags (special events only)"),
OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 7d3173f..4ae529c 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -247,6 +247,7 @@ struct perf_record_opts {
u64  default_interval;
u64  user_interval;
u16  stack_dump_size;
+   bool sample_transaction;
 };
 
 #endif
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index a97fbbe..84b070d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -89,6 +89,7 @@ struct perf_sample {
u64 stream_id;
u64 period;
u64 weight;
+   u64 transaction;
u32 cpu;
u32 raw_size;
void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3800fb5..5c9790d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -448,6 +448,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct 
perf_record_opts *opts,
if (opts->sample_weight)
attr->sample_type   |= PERF_SAMPLE_WEIGHT;
 
+   if (opts->sample_transaction)
+   attr->sample_type   |= PERF_SAMPLE_TRANSACTION;
+
if (opts->call_graph) {
attr->sample_type   |= PERF_SAMPLE_CALLCHAIN;
 
@@ -951,6 +954,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, 
union perf_event *event,
array++;
}
 
+   data->transaction = 0;
+   if (type & PERF_SAMPLE_TRANSACTION) {
+   data->transaction = *array;
+   array++;
+   }
+
if (type & PERF_SAMPLE_READ) {
fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
return -1;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 175dde3..6757345 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1009,6 +1009,9 @@ static void dump_sample(struct perf_evsel *evsel, union 
perf_event *event,
 
if (sample_type & PERF_SAMPLE_WEIGHT)
printf("... weight: %" PRIu64 "\n", sample->weight);
+
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   printf("... transaction: %" PRIx64 "\n", sample->transaction);
 }
 
 static struct machine *
-- 
1.7.7.6



[PATCH 31/32] perf, tools: Default to cpu// for events v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

When an event fails to parse and it's not in the new-style format,
try to parse it again as a cpu event.

This allows sysfs-exported events to be used directly without the cpu//
wrapper, so I can use

perf record -e tx-aborts ...

instead of

perf record -e cpu/tx-aborts/
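
The rewrite applied on the retry can be sketched in isolation as below. This is
not the code from the patch, which builds the string with strsep() and a
growing buffer; it is a standalone variant that computes the size up front.
The function name wrap_events_as_cpu is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Wrap every comma-separated event in "cpu/.../", so that
 * "tx-aborts,cycles" becomes "cpu/tx-aborts/,cpu/cycles/".
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
static char *wrap_events_as_cpu(const char *str)
{
	size_t nr_events = 1;
	const char *p;
	char *out, *w;

	for (p = str; *p; p++)
		if (*p == ',')
			nr_events++;

	/* Each event gains "cpu/" and "/" (5 bytes); +1 for the terminator. */
	out = malloc(strlen(str) + 5 * nr_events + 1);
	if (!out)
		return NULL;

	w = out;
	p = str;
	for (;;) {
		size_t len = strcspn(p, ",");   /* length of this event name */

		memcpy(w, "cpu/", 4);
		w += 4;
		memcpy(w, p, len);
		w += len;
		*w++ = '/';

		p += len;
		if (*p != ',')
			break;
		*w++ = ',';
		p++;
	}
	*w = '\0';
	return out;
}
```

Returning NULL on allocation failure leaves the decision to the caller, which
can then simply report the original parse error.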

v2: Handle multiple events
v3: Move to separate function
Signed-off-by: Andi Kleen 
---
 tools/perf/util/parse-events.c |   45 +++-
 1 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5b97a2b..70cbd1c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -768,6 +768,47 @@ int parse_events_name(struct list_head *list, char *name)
return 0;
 }
 
+static void str_append(char **s, int *len, const char *a)
+{
+   int olen = *s ? strlen(*s) : 0;
+   int nlen = olen + strlen(a) + 1;
+   if (*len < nlen) {
+   *len = *len * 2;
+   if (*len < nlen)
+   *len = nlen;
+   *s = realloc(*s, *len);
+   if (!*s)
+   exit(ENOMEM);
+   if (olen == 0)
+   **s = 0;
+   }
+   strcat(*s, a);
+}
+
+static int parse_events__scanner(const char *str, void *data, int start_token);
+
+static int parse_events_fixup(int ret, const char *str, void *data,
+ int start_token)
+{
+   char *o = strdup(str);
+   char *s = NULL;
+   char *t = o;
+   char *p;
+   int len = 0;
+
+   if (!o)
+   return ret;
+   while ((p = strsep(, ",")) != NULL) {
+   if (s)
+   str_append(, , ",");
+   str_append(, , "cpu/");
+   str_append(, , p);
+   str_append(, , "/");
+   }
+   free(o);
+   return parse_events__scanner(s, data, start_token);
+}
+
 static int parse_events__scanner(const char *str, void *data, int start_token)
 {
YY_BUFFER_STATE buffer;
@@ -788,7 +829,9 @@ static int parse_events__scanner(const char *str, void 
*data, int start_token)
parse_events__flush_buffer(buffer, scanner);
parse_events__delete_buffer(buffer, scanner);
parse_events_lex_destroy(scanner);
-   return ret;
+   if (ret && !strchr(str, '/'))
+   ret = parse_events_fixup(ret, str, data, start_token);
+   return ret;
 }
 
 /*
-- 
1.7.7.6



[PATCH 16/32] perf, tools: Add support for weight v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

perf record has a new option -W that enables weighted sampling.

Add sorting support in top/report for the average weight per sample and the
total weight sum. This allows comparing both the relative cost per event
and the total cost over the measurement period.
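
The two sort keys can be pictured with a small accumulator. This is a sketch
under the assumption that "weight" is the sum over all samples of an entry and
"local_weight" is that sum divided by the number of samples; the struct and
function names are made up for the example.

```c
#include <stdint.h>

/* Per-entry accumulator; a simplified stand-in for the hist entry stats. */
struct demo_hist_stat {
	uint64_t nr_events;  /* number of samples folded into this entry */
	uint64_t weight;     /* sum of the sample weights */
};

static void demo_add_sample(struct demo_hist_stat *s, uint64_t weight)
{
	s->nr_events++;
	s->weight += weight;
}

/* "weight" sort key: total cost over the measurement period. */
static uint64_t demo_total_weight(const struct demo_hist_stat *s)
{
	return s->weight;
}

/* "local_weight" sort key: average cost per sample. */
static uint64_t demo_local_weight(const struct demo_hist_stat *s)
{
	return s->nr_events ? s->weight / s->nr_events : 0;
}
```

An entry with a few very expensive samples ranks high on local_weight, while
one with many cheap samples can still dominate the total.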

Add the necessary glue to perf report, record and the library.

v2: Merge with new hist refactoring.
v3: Fix manpage. Remove value check.
Rename global_weight to weight and weight to local_weight.
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-record.txt |6 +++
 tools/perf/builtin-annotate.c|2 +-
 tools/perf/builtin-diff.c|7 ++--
 tools/perf/builtin-record.c  |2 +
 tools/perf/builtin-report.c  |7 ++--
 tools/perf/builtin-top.c |5 ++-
 tools/perf/perf.h|1 +
 tools/perf/util/event.h  |1 +
 tools/perf/util/evsel.c  |   10 ++
 tools/perf/util/hist.c   |   22 +
 tools/perf/util/hist.h   |8 +++-
 tools/perf/util/session.c|3 ++
 tools/perf/util/sort.c   |   51 +-
 tools/perf/util/sort.h   |3 ++
 14 files changed, 109 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index 159680e..fd4ac81 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -185,6 +185,12 @@ is enabled for all the sampling events. The sampled branch 
type is the same for
 The various filters must be specified as a comma separated list: 
--branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
+--weight::
+Enable weightened sampling. An additional weight is recorded per sample and 
can be
+displayed with the weight and local_weight sort keys.  This currently works 
for TSX
+abort events and some memory events in precise mode on modern Intel CPUs.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cb23476..0b9cd0f 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
return 0;
}
 
-   he = __hists__add_entry(>hists, al, NULL, 1);
+   he = __hists__add_entry(>hists, al, NULL, 1, 1);
if (he == NULL)
return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 380683d..84cdbd1 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -248,9 +248,10 @@ int perf_diff__formula(char *buf, size_t size, struct 
hist_entry *he)
 }
 
 static int hists__add_entry(struct hists *self,
-   struct addr_location *al, u64 period)
+   struct addr_location *al, u64 period,
+   u64 weight)
 {
-   if (__hists__add_entry(self, al, NULL, period) != NULL)
+   if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
return 0;
return -ENOMEM;
 }
@@ -272,7 +273,7 @@ static int diff__process_sample_event(struct perf_tool 
*tool __maybe_unused,
if (al.filtered)
return 0;
 
-   if (hists__add_entry(>hists, , sample->period)) {
+   if (hists__add_entry(>hists, , sample->period, 
sample->weight)) {
pr_warning("problem incrementing symbol period, skipping 
event\n");
return -1;
}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 067d8ee..5ed43f2 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1044,6 +1044,8 @@ const struct option record_options[] = {
OPT_CALLBACK('j', "branch-filter", _stack,
 "branch filter mask", "branch stack filter modes",
 parse_branch_stack),
+   OPT_BOOLEAN('W', "weight", _weight,
+   "sample by weight (on special events only)"),
OPT_END()
 };
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 836aa32..bd7bb66 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -88,7 +88,7 @@ static int perf_report__add_branch_hist_entry(struct 
perf_tool *tool,
 * and not events sampled. Thus we use a pseudo period of 1.
 */
he = __hists__add_branch_entry(>hists, al, parent,
-   [i], 1);
+   [i], 1, 1);
if (he) {
struct annotation *notes;
err = -ENOMEM;
@@ -146,7 +146,8 @@ static int perf_evsel__add_hist_entry(struct perf_evsel 
*evsel,
return err;
}
 
-   he = 

[PATCH 30/32] perf, x86: Add a Haswell precise instructions event v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add an instructions-p event alias that uses the PDIR randomized instruction
retirement event. This is useful to avoid some systematic sampling shadow
problems. Normally PEBS sampling has a systematic shadow. With PDIR
enabled the hardware adds some randomization that statistically avoids
this problem. In this sense it is more precise over a whole sampling
interval, although an individual sample can be less precise. Since we
sample over the whole interval, it is the more precise event overall.

This was already possible with the explicit event code syntax, but it's easier
and more user friendly to use an "instructions-p" alias. I expect
this will eventually become a common use case.
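
For comparison, the explicit form spells out the event select and unit mask by hand. Here is a minimal sketch of how event=0xc0,umask=0x01 packs into a raw config value, assuming the usual x86 layout with the event select in bits 0-7 and the unit mask in bits 8-15. The helper name raw_config is made up for illustration.

```c
#include <stdint.h>

/*
 * Pack an event select and unit mask into a raw config value.
 * Assumption: event select in bits 0-7, unit mask in bits 8-15.
 */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
	return (uint64_t)event | ((uint64_t)umask << 8);
}
```

For 0xc0 and 0x01 this gives 0x01c0, which is the kind of number the alias saves the user from remembering.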

Right now this is for Haswell only; Ivy Bridge will be added later.

v2: Use new sysfs infrastructure
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index e8fb4e2..d177d88 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2033,6 +2033,7 @@ EVENT_ATTR_STR(cycles-t,   cycles_t,   
"event=0x3c,intx=1");
 EVENT_ATTR_STR(cycles-ct,  cycles_ct,  "event=0x3c,intx=1,intx_cp=1");
 EVENT_ATTR_STR(instructions-t, instructions_t, "event=0xc0,intx=1");
 EVENT_ATTR_STR(instructions-ct,instructions_ct,"event=0xc0,intx=1,intx_cp=1");
+EVENT_ATTR_STR(instructions-p, instructions_p, 
"event=0xc0,umask=0x01,precise=2");
 
 static struct attribute *hsw_events_attrs[] = {
EVENT_PTR(tx_start),
@@ -2053,6 +2054,7 @@ static struct attribute *hsw_events_attrs[] = {
EVENT_PTR(cycles_ct),
EVENT_PTR(instructions_t),
EVENT_PTR(instructions_ct),
+   EVENT_PTR(instructions_p),
NULL
 };
 
-- 
1.7.7.6



[PATCH 19/32] perf, core: Add generic transaction flags v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add a generic qualifier for transaction events, as a new sample
type that returns a flag word. This is particularly useful
for qualifying aborts: to distinguish aborts which happen
due to asynchronous events (like conflicts caused by another
CPU) versus instructions that lead to an abort.

The tuning strategies are very different for those cases,
so it's important to distinguish them easily and early.

Since it's inconvenient and inflexible to filter for this
in the kernel we report all the events out and allow
some post processing in user space.

The flags are based on the Intel TSX events, but should be fairly
generic and mostly applicable to other architectures too. In addition
to various flag words there's also reserved space to report a
program-supplied abort code. For TSX this is used to distinguish specific
classes of aborts, like a lock busy abort when doing lock elision.
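
To make the post-processing step concrete, here is a small user-space sketch that splits such a flag word into its parts. The bit positions for SYNC, ASYNC and CONFLICT follow the enum in the patch below. The abort-code mask is an assumption derived from the "bits 24..31" comment there, and the TXN_* names and helper functions are invented for this example.

```c
#include <stdint.h>

/* Flag bits, positions as listed in the patch's enum. */
#define TXN_SYNC	(1ULL << 2)	/* instruction is related */
#define TXN_ASYNC	(1ULL << 3)	/* instruction not related */
#define TXN_CONFLICT	(1ULL << 5)	/* conflict abort */

/* Assumption: the abort code occupies bits 24..31 of the flag word. */
#define TXN_ABORT_SHIFT	24
#define TXN_ABORT_MASK	(0xffULL << TXN_ABORT_SHIFT)

/* Extract the program-supplied abort code (0..255). */
static unsigned int txn_abort_code(uint64_t txn)
{
	return (unsigned int)((txn & TXN_ABORT_MASK) >> TXN_ABORT_SHIFT);
}

/* Non-zero if the abort was caused by an asynchronous event. */
static int txn_is_async(uint64_t txn)
{
	return (txn & TXN_ASYNC) != 0;
}
```

A post-processing tool could first split samples on txn_is_async() and only then look at the abort code, since the tuning strategy differs between the two classes.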

This adds the perf core glue needed for reporting the new flag word out.

v2: Add MEM/MISC
Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h  |2 ++
 include/uapi/linux/perf_event.h |   26 --
 kernel/events/core.c|6 ++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c9686c8..c32fba3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -589,6 +589,7 @@ struct perf_sample_data {
struct perf_regs_user   regs_user;
u64 stack_user_size;
u64 weight;
+   u64 transaction;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -603,6 +604,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->regs_user.regs = NULL;
data->stack_user_size = 0;
data->weight = 0;
+   data->transaction = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 809a5fd..e7b1a48 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -133,9 +133,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_REGS_USER   = 1U << 12,
PERF_SAMPLE_STACK_USER  = 1U << 13,
PERF_SAMPLE_WEIGHT  = 1U << 14,
+   PERF_SAMPLE_TRANSACTION = 1U << 15,
 
-   PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
-
+   PERF_SAMPLE_MAX = 1U << 16, /* non-ABI */
 };
 
 /*
@@ -179,6 +179,28 @@ enum perf_sample_regs_abi {
 };
 
 /*
+ * Values for the transaction event qualifier, mostly for abort events.
+ */
+enum {
+   PERF_SAMPLE_TXN_ELISION = (1 << 0), /* From elision */
+   PERF_SAMPLE_TXN_TRANSACTION = (1 << 1), /* From transaction */
+   PERF_SAMPLE_TXN_SYNC= (1 << 2), /* Instruction is related */
+   PERF_SAMPLE_TXN_ASYNC   = (1 << 3), /* Instruction not related */
+   PERF_SAMPLE_TXN_RETRY   = (1 << 4), /* Retry possible */
+   PERF_SAMPLE_TXN_CONFLICT= (1 << 5), /* Conflict abort */
+   PERF_SAMPLE_TXN_CAPACITY= (1 << 6), /* Capacity abort */
+   PERF_SAMPLE_TXN_MEMORY  = (1 << 7), /* Memory related abort */
+   PERF_SAMPLE_TXN_MISC= (1 << 8), /* Misc aborts */
+
+   PERF_SAMPLE_TXN_MAX = (1 << 9),  /* non-ABI */
+
+   /* bits 24..31 are reserved for the abort code */
+
+   PERF_SAMPLE_TXN_ABORT_MASK  = 0xff000000,
+   PERF_SAMPLE_TXN_ABORT_SHIFT = 24,
+};
+
+/*
  * The format of the data returned by read() on a perf event fd,
  * as specified by attr.read_format:
  *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d633581..534810d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -955,6 +955,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_WEIGHT)
size += sizeof(data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   size += sizeof(data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -4086,6 +4089,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_WEIGHT)
perf_output_put(handle, data->weight);
 
+   if (sample_type & PERF_SAMPLE_TRANSACTION)
+   perf_output_put(handle, data->transaction);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6



[PATCH 17/32] perf, tools: Handle XBEGIN like a jump

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

So that the browser still shows the abort label

Signed-off-by: Andi Kleen 
---
 tools/perf/util/annotate.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 7a34dd1..b14d4df 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -401,6 +401,8 @@ static struct ins instructions[] = {
{ .name = "testb", .ops  = _ops, },
{ .name = "testl", .ops  = _ops, },
{ .name = "xadd",  .ops  = _ops, },
+   { .name = "xbeginl", .ops  = _ops, },
+   { .name = "xbeginq", .ops  = _ops, },
 };
 
 static int ins__cmp(const void *name, const void *insp)
-- 
1.7.7.6



[PATCH 27/32] perf, x86: Support CPU specific sysfs events

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add a way for the CPU initialization code to register additional events,
and merge them into the events attribute directory. Used in the next
patch.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c |   29 +
 arch/x86/kernel/cpu/perf_event.h |1 +
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index af38635..e247423 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1344,6 +1344,30 @@ static void __init filter_events(struct attribute 
**attrs)
}
 }
 
+/* Merge two pointer arrays */
+static __init struct attribute **merge_attr(struct attribute **a,
+   struct attribute **b)
+{
+   struct attribute **new;
+   int j, i;
+
+   for (j = 0; a[j]; j++)
+   ;
+   for (i = 0; b[i]; i++)
+   j++;
+   j++;
+   new = kmalloc(sizeof(struct attribute *) * j, GFP_KERNEL);
+   if (!new)
+   return a;
+   j = 0;
+   for (i = 0; a[i]; i++)
+   new[j++] = a[i];
+   for (i = 0; b[i]; i++)
+   new[j++] = b[i];
+   new[j] = NULL;
+   return new;
+}
+
 ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
  char *page)
 {
@@ -1481,6 +1505,11 @@ static int __init init_hw_perf_events(void)
else
filter_events(x86_pmu_events_group.attrs);
 
+   if (x86_pmu.cpu_events)
+   x86_pmu_events_group.attrs =
+   merge_attr(x86_pmu_events_group.attrs,
+  x86_pmu.cpu_events);
+
pr_info("... version:%d\n", x86_pmu.version);
pr_info("... bit width:  %d\n", x86_pmu.cntval_bits);
pr_info("... generic registers:  %d\n", x86_pmu.num_counters);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 12ae625..22f8003 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -360,6 +360,7 @@ struct x86_pmu {
struct attribute **format_attrs;
 
ssize_t (*events_sysfs_show)(char *page, u64 config);
+   struct attribute **cpu_events;
 
/*
 * CPU Hotplug hooks
-- 
1.7.7.6



[PATCH 24/32] perf, tools: Add arbitrary aliases and support names with -

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

- Add missing scanner symbol for arbitrary aliases inside the config
region.
- A '-' looks nicer than '_', so allow '-' in event names. This is used for
several of the arch perfmon and Haswell events.
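
As a sketch of what the new name_minus pattern accepts, the function below hand-codes the same character classes as the lexer rule in the diff: a first character from [a-zA-Z_*?] followed by any number of [a-zA-Z0-9\-_*?]. It checks a whole string, whereas the flex rule matches the longest prefix of the input. The function names are hypothetical.

```c
#include <ctype.h>

/* First character: letter, '_', '*' or '?'. */
static int is_name_start(int c)
{
	return isalpha(c) || c == '_' || c == '*' || c == '?';
}

/* Following characters: additionally digits and '-'. */
static int is_name_rest(int c)
{
	return isalnum(c) || c == '-' || c == '_' || c == '*' || c == '?';
}

/* Return 1 if the whole string s has the shape of name_minus, else 0. */
static int matches_name_minus(const char *s)
{
	if (!is_name_start((unsigned char)*s))
		return 0;
	for (s++; *s; s++)
		if (!is_name_rest((unsigned char)*s))
			return 0;
	return 1;
}
```

A name such as instructions-p passes, while a name starting with '-' does not, because the minus is only allowed after the first character.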

Signed-off-by: Andi Kleen 
---
 tools/perf/util/parse-events.l |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..66959fa 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,6 +81,7 @@ num_dec   [0-9]+
 num_hex0x[a-fA-F0-9]+
 num_raw_hex[a-fA-F0-9]+
 name   [a-zA-Z_*?][a-zA-Z0-9_*?]*
+name_minus [a-zA-Z_*?][a-zA-Z0-9\-_*?]*
 modifier_event [ukhpGH]{1,8}
 modifier_bp[rwx]{1,3}
 
@@ -168,6 +169,7 @@ period  { return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,  { return ','; }
 "/"{ BEGIN(INITIAL); return '/'; }
+{name_minus}   { return str(yyscanner, PE_NAME); }
 }
 
 mem:   { BEGIN(mem); return PE_PREFIX_MEM; }
-- 
1.7.7.6



[PATCH 09/32] perf, x86: Support LBR filtering by INTX/NOTX/ABORT v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add LBR filtering for branch in transaction, branch not in transaction
or transaction abort. This is exposed as new sample types.
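
The filtering rule can be sketched in isolation as follows. This is a simplified model of the logic in intel_pmu_lbr_filter() from the diff below: privilege-level bits are left out, and the BR_* names and lbr_keep() are invented for the example.

```c
/* Branch type bits, positions as in the patch's enum. */
enum {
	BR_NONE  = 0,
	BR_JMP   = 1 << 9,	/* jump */
	BR_ABORT = 1 << 12,	/* transaction abort */
	BR_INTX  = 1 << 13,	/* in transaction */
	BR_NOTX  = 1 << 14,	/* not in transaction */
};

#define BR_ANYTX (BR_NOTX | BR_INTX)

/*
 * Return 1 if an LBR entry of the given type should be kept.
 * br_sel is the requested filter mask; intx says whether the
 * branch was recorded inside a transaction.
 */
static int lbr_keep(int br_sel, int type, int intx)
{
	if (type == BR_NONE)
		return 0;
	/* Only qualify by transaction state if the user asked for it. */
	if (br_sel & BR_ANYTX)
		type |= intx ? BR_INTX : BR_NOTX;
	/* Every bit of the entry's type must be selected. */
	return (br_sel & type) == type;
}
```

The INTX/NOTX bits act as qualifiers: they only narrow the selection when at least one of them is requested, so existing filters keep their behavior.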

v2: Rename ABORT to ABORTTX
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   31 +--
 include/uapi/linux/perf_event.h|5 +++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index ad5af13..5455a00 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -85,9 +85,13 @@ enum {
X86_BR_JMP  = 1 << 9, /* jump */
X86_BR_IRQ  = 1 << 10,/* hw interrupt or trap or fault */
X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+   X86_BR_ABORT= 1 << 12,/* transaction abort */
+   X86_BR_INTX = 1 << 13,/* in transaction */
+   X86_BR_NOTX = 1 << 14,/* not in transaction */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
+#define X86_BR_ANYTX (X86_BR_NOTX | X86_BR_INTX)
 
 #define X86_BR_ANY   \
(X86_BR_CALL|\
@@ -99,6 +103,7 @@ enum {
 X86_BR_JCC |\
 X86_BR_JMP  |\
 X86_BR_IRQ  |\
+X86_BR_ABORT|\
 X86_BR_IND_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
@@ -347,6 +352,16 @@ static void intel_pmu_setup_sw_lbr_filter(struct 
perf_event *event)
 
if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
mask |= X86_BR_IND_CALL;
+
+   if (br_type & PERF_SAMPLE_BRANCH_ABORTTX)
+   mask |= X86_BR_ABORT;
+
+   if (br_type & PERF_SAMPLE_BRANCH_INTX)
+   mask |= X86_BR_INTX;
+
+   if (br_type & PERF_SAMPLE_BRANCH_NOTX)
+   mask |= X86_BR_NOTX;
+
/*
 * stash actual user request into reg, it may
 * be used by fixup code for some CPU
@@ -393,7 +408,8 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
/*
 * no LBR on this PMU
 */
-   if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > 
LBR_FORMAT_MAX_KNOWN)
+   if (!x86_pmu.lbr_nr ||
+   x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
return -EOPNOTSUPP;
 
/*
@@ -421,7 +437,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
  * decoded (e.g., text page not present), then X86_BR_NONE is
  * returned.
  */
-static int branch_type(unsigned long from, unsigned long to)
+static int branch_type(unsigned long from, unsigned long to, int abort)
 {
struct insn insn;
void *addr;
@@ -441,6 +457,9 @@ static int branch_type(unsigned long from, unsigned long to)
if (from == 0 || to == 0)
return X86_BR_NONE;
 
+   if (abort)
+   return X86_BR_ABORT | to_plm;
+
if (from_plm == X86_BR_USER) {
/*
 * can happen if measuring at the user level only
@@ -577,7 +596,13 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
from = cpuc->lbr_entries[i].from;
to = cpuc->lbr_entries[i].to;
 
-   type = branch_type(from, to);
+   type = branch_type(from, to, cpuc->lbr_entries[i].abort);
+   if (type != X86_BR_NONE && (br_sel & X86_BR_ANYTX)) {
+   if (cpuc->lbr_entries[i].intx)
+   type |= X86_BR_INTX;
+   else
+   type |= X86_BR_NOTX;
+   }
 
/* if type does not correspond, then discard */
if (type == X86_BR_NONE || (br_sel & type) != type) {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..8e38823 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -155,8 +155,11 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, /* any call branch */
PERF_SAMPLE_BRANCH_ANY_RETURN   = 1U << 5, /* any return branch */
PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, /* indirect calls */
+   PERF_SAMPLE_BRANCH_ABORTTX  = 1U << 7, /* transaction aborts */
+   PERF_SAMPLE_BRANCH_INTX = 1U << 8, /* in transaction (flag) */
+   PERF_SAMPLE_BRANCH_NOTX = 1U << 9, /* not in transaction (flag) 
*/
 
-   PERF_SAMPLE_BRANCH_MAX  = 1U << 7, /* non-ABI */
+   PERF_SAMPLE_BRANCH_MAX  = 1U << 10, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.7.6



[PATCH 22/32] perf, tools: Point --sort documentation to --help

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

The --sort documentation for top and report was hopelessly out of date.
Instead of having two more places that would need to be updated,
just point to --help.

Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-report.txt |2 +-
 tools/perf/Documentation/perf-top.txt|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index f4d91be..7cd5d0a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -57,7 +57,7 @@ OPTIONS
 
 -s::
 --sort=::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): See --help for a full list.
 
 -p::
 --parent=::
diff --git a/tools/perf/Documentation/perf-top.txt 
b/tools/perf/Documentation/perf-top.txt
index 5b80d84..0f0fa3e 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -112,7 +112,7 @@ Default is to monitor all CPUS.
 
 -s::
 --sort::
-   Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+   Sort by key(s): see --help for a full list.
 
 -n::
 --show-nr-samples::
-- 
1.7.7.6



[PATCH 13/32] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

With checkpointed counters there can be a situation where the counter
overflows, aborts the transaction, is set back to a non-overflowing
checkpoint, and then causes an interrupt. The interrupt handler doesn't see
the overflow because it has been checkpointed. This is then a spurious PMI,
typically with an ugly NMI message. It can also lead to excessive aborts.

Avoid this problem by:
- Using the full counter width for counting counters (previous patch)
- Forbidding sampling for checkpointed counters. It's not too useful anyway;
checkpointing is mainly for counting.
- Always setting checkpointed counters back to zero on a PMI.
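
The second point boils down to a check at event setup time. Here is a stand-alone sketch of that check, modeled on the hsw_hw_config() hunk below. The bit position of the checkpointed flag is taken from the series (1ULL << 33); the min_period parameter stands in for the threshold constant used in the patch, and the function name is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define INTX_CHECKPOINTED (1ULL << 33)	/* HSW_INTX_CHECKPOINTED in the series */

/*
 * Reject sampling on a checkpointed counter unless the period is so
 * long that the event is effectively only counting.
 * min_period is a placeholder for the threshold chosen by the patch.
 */
static int check_checkpointed(uint64_t config, uint64_t sample_period,
			      uint64_t min_period)
{
	if ((config & INTX_CHECKPOINTED) &&
	    sample_period > 0 && sample_period < min_period)
		return -EIO;
	return 0;
}
```

A period of zero (pure counting) and very long periods are still accepted, which is what keeps perf stat from a KVM guest working.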

v2: Add unlikely. Add comment
v3: Allow large sampling periods with CP for KVM
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel.c |   34 
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index bc21bce..9b4dda5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event 
*event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
x86_perf_event_update(event);
+   /*
+* For a checkpointed counter always reset back to 0.  This
+* avoids a situation where the counter overflows, aborts the
+* transaction and is then set back to shortly before the
+* overflow, and overflows and aborts again.
+*/
+   if (unlikely(event->hw.config & HSW_INTX_CHECKPOINTED)) {
+   /* No race with NMIs because the counter should not be armed */
+   wrmsrl(event->hw.event_base, 0);
+   local64_set(>hw.prev_count, 0);
+   }
return x86_perf_event_set_period(event);
 }
 
@@ -1162,6 +1173,15 @@ again:
x86_pmu.drain_pebs(regs);
}
 
+   /*
+* To avoid spurious interrupts with perf stat always reset checkpointed
+* counters.
+*
+* XXX move somewhere else.
+*/
+   if (cpuc->events[2] && (cpuc->events[2]->hw.config & 
HSW_INTX_CHECKPOINTED))
+   status |= (1ULL << 2);
+
for_each_set_bit(bit, (unsigned long *), X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
 
@@ -1615,6 +1635,20 @@ static int hsw_hw_config(struct perf_event *event)
 ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
  event->attr.precise_ip > 0))
return -EIO;
+   if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+   /*
+* Sampling of checkpointed events can cause situations where
+* the CPU constantly aborts because of a overflow, which is
+* then checkpointed back and ignored. Forbid checkpointing
+* for sampling.
+*
+* But still allow a long sampling period, so that perf stat
+* from KVM works.
+*/
+   if (event->attr.sample_period > 0 &&
+   event->attr.sample_period < 0x7fff)
+   return -EIO;
+   }
return 0;
 }
 
-- 
1.7.7.6



[PATCH 02/32] perf, x86: Basic Haswell PMU support v2

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add basic Haswell PMU support.

Similar to SandyBridge, but has a few new events. Further
differences are handled in follow-on patches.

There are some new counter flags that need to be prevented
from being set on fixed counters.
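
A rough model of why widening the constraint mask achieves this, assuming a constraint matches when the config, masked with the constraint's mask, equals its code. The struct below is a simplified stand-in for the kernel's event_constraint, and EVENT_UMASK_MASK is a reduced placeholder for X86_RAW_EVENT_MASK covering only the event select and unit mask.

```c
#include <stdint.h>

#define HSW_INTX		(1ULL << 32)
#define HSW_INTX_CHECKPOINTED	(1ULL << 33)

/* Placeholder for X86_RAW_EVENT_MASK: event select + unit mask only. */
#define EVENT_UMASK_MASK	0xffffULL

/* Simplified stand-in for struct event_constraint. */
struct constraint {
	uint64_t code;	/* event code the constraint applies to */
	uint64_t cmask;	/* config bits that take part in the comparison */
};

/* Assumed matching rule: masked config must equal the code. */
static int constraint_matches(const struct constraint *c, uint64_t config)
{
	return (config & c->cmask) == c->code;
}
```

With the narrow mask an event that has the intx bit set still matches the fixed-counter constraint. Once the new flag bits are part of the mask, the comparison fails and the event has to go to a generic counter.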

Contains fixes from Stephane Eranian

v2: Folded TSX bits into standard FIXED_EVENT_CONSTRAINTS
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/perf_event.h  |3 +++
 arch/x86/kernel/cpu/perf_event.h   |5 -
 arch/x86/kernel/cpu/perf_event_intel.c |   29 +
 3 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h 
b/arch/x86/include/asm/perf_event.h
index 4fabcdf..4003bb6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
 #define ARCH_PERFMON_EVENTSEL_INV  (1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK  0xFF000000ULL
 
+#define HSW_INTX   (1ULL << 32)
+#define HSW_INTX_CHECKPOINTED  (1ULL << 33)
+
 #define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY  (1ULL << 41)
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 115c1ea..8941899 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -219,11 +219,14 @@ struct cpu_hw_events {
  *  - inv
  *  - edge
  *  - cnt-mask
+ *  - intx
+ *  - intx_cp
  *  The other filters are supported by fixed counters.
  *  The any-thread option is supported starting with v3.
  */
+#define FIXED_EVENT_FLAGS (X86_RAW_EVENT_MASK|HSW_INTX|HSW_INTX_CHECKPOINTED)
 #define FIXED_EVENT_CONSTRAINT(c, n)   \
-   EVENT_CONSTRAINT(c, (1ULL << (32+n)), X86_RAW_EVENT_MASK)
+   EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
 /*
  * Constraint on the Event code + UMask
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 93b9e11..3a08534 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -133,6 +133,17 @@ static struct extra_reg intel_snb_extra_regs[] 
__read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_hsw_event_constraints[] =
+{
+   FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+   FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+   FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
+   INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+   EVENT_CONSTRAINT_END
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
return intel_perfmon_event_map[hw_event];
@@ -2107,6 +2118,24 @@ __init int intel_pmu_init(void)
break;
 
 
+   case 60: /* Haswell Client */
+   case 70:
+   case 71:
+   memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+  sizeof(hw_cache_event_ids));
+
+   intel_pmu_lbr_init_nhm();
+
+   x86_pmu.event_constraints = intel_hsw_event_constraints;
+
+   x86_pmu.extra_regs = intel_snb_extra_regs;
+   /* all extra regs are per-cpu when HT is on */
+   x86_pmu.er_flags |= ERF_HAS_RSP_1;
+   x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+   pr_cont("Haswell events, ");
+   break;
+
default:
switch (x86_pmu.version) {
case 1:
-- 
1.7.7.6



[PATCH 18/32] perf, x86: Support for printing PMU state on spurious PMIs v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

I had some problems with spurious PMIs, so print the PMU state
on a spurious one. This will not interact well with other NMI users.
Disabled by default; has to be explicitly enabled through sysfs.

Optional, but useful for debugging.

v2: Move to /sys/devices/cpu
v3: Print in more cases
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |2 ++
 arch/x86/kernel/cpu/perf_event_intel.c |   11 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ec3c549..57bb4ce 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -38,6 +38,7 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+int   print_spurious_pmi __read_mostly;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
@@ -1758,9 +1759,11 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 }
 
 static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+static DEVICE_INT_ATTR(print_spurious_pmi, 0644, print_spurious_pmi);
 
 static struct attribute *x86_pmu_attrs[] = {
_attr_rdpmc.attr,
+   _attr_print_spurious_pmi.attr.attr,
NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index d55e502..2ef5b20 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -664,3 +664,5 @@ static inline struct intel_shared_regs 
*allocate_shared_regs(int cpu)
 }
 
 #endif /* CONFIG_CPU_SUP_INTEL */
+
+extern int print_spurious_pmi;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 20caf0a..dc2f2a1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1146,11 +1146,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
-   if (!status) {
-   intel_pmu_enable_all(0);
-   return handled;
-   }
-
+   if (!status)
+   goto done;
loops = 0;
 again:
intel_pmu_ack_status(status);
@@ -1210,6 +1207,10 @@ again:
goto again;
 
 done:
+   if (!handled && print_spurious_pmi) {
+   pr_debug("Spurious PMI\n");
+   perf_event_print_debug();
+   }
intel_pmu_enable_all(0);
return handled;
 }
-- 
1.7.7.6



[PATCH 14/32] perf, core: Add a concept of a weighted sample

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

For some events it's useful to weight a sample with a hardware
provided number. This expresses how expensive the action the
sample represents was. This allows the profiler to scale
the samples to be more informative to the programmer.

There is already the period, which is used similarly, but it means
something different, so I chose not to overload it. Instead
a new sample type for WEIGHT is added.

This can be used for multiple things. Initially it is used for TSX abort costs
and profiling by memory latencies (to make expensive loads appear higher
up in the histograms). The concept is quite generic and can be extended
to many other kinds of events or architectures, as long as the hardware
provides suitable auxiliary values. In principle it could also be
used for software tracepoints.

This adds the generic glue: a new optional sample format for a 64-bit
weight value.
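
To show what "optional" means for the record layout, here is a small sketch in the style of perf_event__header_size() from the diff below: each requested field adds one u64 to the sample. Only two fields are modeled. The WEIGHT bit position is the one added by this patch, the PERIOD bit is assumed to be 1U << 8 as in the existing ABI header, and the function name is invented.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_PERIOD	(1U << 8)	/* assumed value of PERF_SAMPLE_PERIOD */
#define SAMPLE_WEIGHT	(1U << 14)	/* PERF_SAMPLE_WEIGHT as added here */

/* Bytes contributed to a sample by the period and weight fields. */
static size_t sample_payload_size(uint32_t sample_type)
{
	size_t size = 0;

	if (sample_type & SAMPLE_PERIOD)
		size += sizeof(uint64_t);	/* u64 period */
	if (sample_type & SAMPLE_WEIGHT)
		size += sizeof(uint64_t);	/* u64 weight */
	return size;
}
```

Consumers that never set the WEIGHT bit see exactly the same record layout as before, which is why the new field does not break existing users.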

Signed-off-by: Andi Kleen 
---
 include/linux/perf_event.h  |2 ++
 include/uapi/linux/perf_event.h |8 ++--
 kernel/events/core.c|6 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 91052e1..c9686c8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -588,6 +588,7 @@ struct perf_sample_data {
struct perf_branch_stack*br_stack;
struct perf_regs_user   regs_user;
u64 stack_user_size;
+   u64 weight;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -601,6 +602,7 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
data->stack_user_size = 0;
+   data->weight = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 8e38823..809a5fd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
PERF_SAMPLE_BRANCH_STACK= 1U << 11,
PERF_SAMPLE_REGS_USER   = 1U << 12,
PERF_SAMPLE_STACK_USER  = 1U << 13,
+   PERF_SAMPLE_WEIGHT  = 1U << 14,
+
+   PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
 
-   PERF_SAMPLE_MAX = 1U << 14, /* non-ABI */
 };
 
 /*
@@ -201,8 +203,9 @@ enum perf_event_read_format {
PERF_FORMAT_TOTAL_TIME_RUNNING  = 1U << 1,
PERF_FORMAT_ID  = 1U << 2,
PERF_FORMAT_GROUP   = 1U << 3,
+   PERF_FORMAT_WEIGHT  = 1U << 4,
 
-   PERF_FORMAT_MAX = 1U << 4,  /* non-ABI */
+   PERF_FORMAT_MAX = 1U << 5,  /* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER064  /* sizeof first published struct */
@@ -562,6 +565,7 @@ enum perf_event_type {
 *  { u64   stream_id;} && PERF_SAMPLE_STREAM_ID
 *  { u32   cpu, res; } && PERF_SAMPLE_CPU
 *  { u64   period;   } && PERF_SAMPLE_PERIOD
+*  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *
 *  { struct read_formatvalues;   } && PERF_SAMPLE_READ
 *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..d633581 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event 
*event)
if (sample_type & PERF_SAMPLE_PERIOD)
size += sizeof(data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   size += sizeof(data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;
 
@@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PERIOD)
perf_output_put(handle, data->period);
 
+   if (sample_type & PERF_SAMPLE_WEIGHT)
+   perf_output_put(handle, data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);
 
-- 
1.7.7.6
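
The two core.c hunks have to stay in step: the size accounting and the
output path must test the same sample_type bits in the same order. A
minimal user-space sketch of that pattern follows. The names
(SAMPLE_PERIOD, sample_size, sample_write) are hypothetical stand-ins,
the weight bit is copied from the enum hunk above, and the period bit
value is an assumption since it is not shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_SAMPLE_* bits. */
#define SAMPLE_PERIOD (1U << 8)   /* assumed value, not shown in the hunk */
#define SAMPLE_WEIGHT (1U << 14)  /* value from the enum hunk above */

struct sample_data {
	uint64_t period;
	uint64_t weight;
};

/* Bytes needed for the optional fields selected by sample_type. */
static size_t sample_size(uint64_t sample_type)
{
	size_t size = 0;

	if (sample_type & SAMPLE_PERIOD)
		size += sizeof(uint64_t);
	if (sample_type & SAMPLE_WEIGHT)
		size += sizeof(uint64_t);
	return size;
}

/* Emit the selected fields in the same order as the size accounting.
 * Returns the number of 64-bit words written to out. */
static size_t sample_write(uint64_t sample_type,
			   const struct sample_data *d, uint64_t *out)
{
	size_t n = 0;

	if (sample_type & SAMPLE_PERIOD)
		out[n++] = d->period;
	if (sample_type & SAMPLE_WEIGHT)
		out[n++] = d->weight;
	return n;
}
```

A consumer that did not ask for the weight bit sees an unchanged record
layout, which is why the field can be added as an optional format.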

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 20/32] perf, x86: Add Haswell specific transaction flag reporting

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 3094caa..4b657c2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -677,6 +677,15 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
data.weight = ((struct pebs_record_v2 *)pebs)->nhm.lat;
}
 
+   if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
+   x86_pmu.intel_cap.pebs_format >= 2) {
+   data.transaction =
+((struct pebs_record_v2 *)pebs)->tsx_tuning >> 32;
+   if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
+   (pebs->ax & 1))
+   data.transaction |= pebs->ax & 0xff00;
+   }
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



[PATCH 25/32] tools, perf: Add a precise event qualifier

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add a precise qualifier, like cpu/event=0x3c,precise=1/

This is needed so that the kernel can request enabling PEBS
for TSX events. The parser bails out on any sysfs parse errors,
so this is needed in any case to handle any event on the TSX
perf kernel.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/parse-events.c |6 ++
 tools/perf/util/parse-events.h |1 +
 tools/perf/util/parse-events.l |1 +
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index c0b785b..5b97a2b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -526,6 +526,12 @@ do {   
\
case PARSE_EVENTS__TERM_TYPE_NAME:
CHECK_TYPE_VAL(STR);
break;
+   case PARSE_EVENTS__TERM_TYPE_PRECISE:
+   CHECK_TYPE_VAL(NUM);
+   if ((unsigned)term->val.num > 2)
+   return -EINVAL;
+   attr->precise_ip = term->val.num;
+   break;
default:
return -EINVAL;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index ac9a6aa..c1166dd 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -49,6 +49,7 @@ enum {
PARSE_EVENTS__TERM_TYPE_NAME,
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
+   PARSE_EVENTS__TERM_TYPE_PRECISE,
 };
 
 struct parse_events__term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 66959fa..892297f 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -169,6 +169,7 @@ period  { return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,  { return ','; }
 "/"{ BEGIN(INITIAL); return '/'; }
+precise{ return term(yyscanner, 
PARSE_EVENTS__TERM_TYPE_PRECISE); }
 {name_minus}   { return str(yyscanner, PE_NAME); }
 }
 
-- 
1.7.7.6
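
The range check in the parse-events.c hunk can be sketched on its own.
This is a simplified illustration, not the perf tool's code: struct attr
and set_precise() are hypothetical names, and the term value is modelled
as a plain int. The cast to unsigned lets a single comparison reject
both negative values and values above 2.

```c
#include <errno.h>

/* Hypothetical stand-in for the one perf_event_attr field used here. */
struct attr {
	unsigned int precise_ip;
};

/* Accept only precise levels 0..2, as the hunk above does. */
static int set_precise(struct attr *attr, int num)
{
	/* A negative num becomes a large unsigned value and is rejected. */
	if ((unsigned int)num > 2)
		return -EINVAL;
	attr->precise_ip = (unsigned int)num;
	return 0;
}
```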



[PATCH 08/32] perf, x86: Disable LBR recording for unknown LBR_FMT

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

When the LBR format is unknown, disable LBR recording. This prevents
crashes when the LBR address is misdecoded and incorrectly sign-extended.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c 
b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 2af6695b..ad5af13 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -13,6 +13,7 @@ enum {
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
LBR_FORMAT_EIP_FLAGS2   = 0x04,
+   LBR_FORMAT_MAX_KNOWN= LBR_FORMAT_EIP_FLAGS2,
 };
 
 /*
@@ -392,7 +393,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
/*
 * no LBR on this PMU
 */
-   if (!x86_pmu.lbr_nr)
+   if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > 
LBR_FORMAT_MAX_KNOWN)
return -EOPNOTSUPP;
 
/*
-- 
1.7.7.6
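
The guard added above follows a common pattern: keep a MAX_KNOWN alias
for the newest format the decoder understands and refuse anything
beyond it. A minimal sketch, assuming hypothetical names (FMT_*,
lbr_filter_setup) and only the three format values visible in the hunk:

```c
#include <errno.h>

enum {
	FMT_EIP        = 0x02,
	FMT_EIP_FLAGS  = 0x03,
	FMT_EIP_FLAGS2 = 0x04,
	/* Alias for the newest format this code can decode. */
	FMT_MAX_KNOWN  = FMT_EIP_FLAGS2,
};

/* Refuse LBR setup when there are no LBRs or the format is too new. */
static int lbr_filter_setup(int lbr_nr, unsigned int lbr_format)
{
	if (!lbr_nr || lbr_format > (unsigned int)FMT_MAX_KNOWN)
		return -EOPNOTSUPP;
	return 0;
}
```

When a new format is added, only the alias needs to move; the check
itself stays the same.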



[PATCH 01/32] perf, x86: Add PEBSv2 record support

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add support for the v2 PEBS format. It has a superset of the v1 PEBS
fields, but has a longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" field, which directly gives
the address of the instruction that caused the event instead of the
off-by-one (following) instruction. So with precise == 2 we use that
directly and no longer need the LBRs and basic-block walking to fix up
the IP. This lowers the overhead significantly.

Some other features are added in later patches.
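
As a rough illustration of the "longer record" point: if the newer
record embeds the older one as its first member, one drain loop can
serve both formats by stepping through the buffer in bytes. This is a
sketch under stated assumptions, not the kernel code. struct rec_v1,
struct rec_v2 and drain() are hypothetical names with simplified
fields, and memcpy is used so the sketch makes no alignment
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the older record layout. */
struct rec_v1 {
	uint64_t status;
	uint64_t lat;
};

/* Newer layout: the old record first, then the added fields. */
struct rec_v2 {
	struct rec_v1 base;
	uint64_t eventingrip;
	uint64_t tsx_tuning;
};

/* Walk [at, top) in steps of record_size bytes and OR together the
 * status word of every record. Only the common prefix is read, so the
 * same loop works for both layouts. */
static uint64_t drain(const unsigned char *at, const unsigned char *top,
		      size_t record_size)
{
	uint64_t seen = 0;

	assert(record_size >= sizeof(struct rec_v1));
	for (; at < top; at += record_size) {
		struct rec_v1 p;

		memcpy(&p, at, sizeof(p));
		seen |= p.status;
	}
	return seen;
}
```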

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.c  |2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  101 ++---
 2 files changed, 79 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..ec3c549 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -403,7 +403,7 @@ int x86_pmu_hw_config(struct perf_event *event)
 * check that PEBS LBR correction does not conflict with
 * whatever the user is asking with attr->branch_sample_type
 */
-   if (event->attr.precise_ip > 1) {
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format 
< 2) {
u64 *br_type = &event->attr.branch_sample_type;
 
if (has_branch_stack(event)) {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..9d0dae0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -41,6 +41,12 @@ struct pebs_record_nhm {
u64 status, dla, dse, lat;
 };
 
+struct pebs_record_v2 {
+   struct pebs_record_nhm nhm;
+   u64 eventingrip;
+   u64 tsx_tuning;
+};
+
 void init_debug_store_on_cpu(int cpu)
 {
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -559,8 +565,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 {
/*
 * We cast to pebs_record_core since that is a subset of
-* both formats and we don't use the other fields in this
-* routine.
+* both formats.
 */
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct pebs_record_core *pebs = __pebs;
@@ -588,7 +593,10 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
regs.bp = pebs->bp;
regs.sp = pebs->sp;
 
-   if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip())
+   if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
+   regs.ip = ((struct pebs_record_v2 *)pebs)->eventingrip;
+   regs.flags |= PERF_EFLAGS_EXACT;
+   } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip())
regs.flags |= PERF_EFLAGS_EXACT;
else
regs.flags &= ~PERF_EFLAGS_EXACT;
@@ -641,35 +649,21 @@ static void intel_pmu_drain_pebs_core(struct pt_regs 
*iregs)
__intel_pmu_pebs_event(event, iregs, at);
 }
 
-static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+static void intel_pmu_drain_pebs_common(struct pt_regs *iregs, void *at,
+   void *top)
 {
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
-   struct pebs_record_nhm *at, *top;
struct perf_event *event = NULL;
u64 status = 0;
-   int bit, n;
-
-   if (!x86_pmu.pebs_active)
-   return;
-
-   at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
-   top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+   int bit;
 
ds->pebs_index = ds->pebs_buffer_base;
 
-   n = top - at;
-   if (n <= 0)
-   return;
+   for ( ; at < top; at += x86_pmu.pebs_record_size) {
+   struct pebs_record_nhm *p = at;
 
-   /*
-* Should not happen, we program the threshold at 1 and do not
-* set a reset value.
-*/
-   WARN_ONCE(n > x86_pmu.max_pebs_events, "Unexpected number of pebs 
records %d\n", n);
-
-   for ( ; at < top; at++) {
-   for_each_set_bit(bit, (unsigned long *)&at->status, x86_pmu.max_pebs_events) {
+   for_each_set_bit(bit, (unsigned long *)&p->status, x86_pmu.max_pebs_events) {
event = cpuc->events[bit];
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -692,6 +686,61 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
}
 }
 
+static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+{
+   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+   struct debug_store *ds = cpuc->ds;
+   struct pebs_record_nhm *at, *top;
+   int n;
+
+   if (!x86_pmu.pebs_active)
+   return;
+
+   at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+   top = (struct 

perf PMU support for Haswell v6

2012-11-09 Thread Andi Kleen
[Updated version for the latest master tree and various fixes,
addressing review feedback.  See end for details.

This should be ready for merging now, just waiting for Peter]

This adds perf PMU support for the upcoming Haswell core. The patchkit
is fairly large, mainly due to various enhancements for TSX. TSX tuning
relies heavily on the PMU, so I tried hard to make all facilities
easily available. In addition it also has some other enhancements.

This includes changes to the core perf code, to the x86 specific part,
to the perf user land tools and to KVM.

Available at 
git://git.kernel.org/pub/scm/linux/kernel/ak/linux-misc.git hsw/pmu3

High level overview:

- Basic Haswell PMU support
- Easy high level TSX measurement in perf stat -T
- Transaction events and attributes implemented with sysfs enumeration
- Export arch perfmon events in sysfs 
- Generic weighted profiling for memory latency and transaction abort costs.
- Support for address profiling
- Support for filtering events inside/outside transactions
- KVM support to do this from guests
- Support for filtering/sorting/bucketing transaction abort types based on 
PEBS information
- LBR support for transactions

For more details on the Haswell PMU please see the SDM. For more details on TSX
please see http://halobates.de/adding-lock-elision-to-linux.pdf

Some of the added features could be added to older CPUs too. I plan
to do this, but in separate patches.

Review appreciated.

v2: Removed generic transaction events and qualifiers and use sysfs
enumeration. Also export arch perfmon, so that the qualifiers work.
Fixed various issues this exposed. Don't use a special macro for the
TSX constraints anymore. Address other review feedback.
Added pdir event in sysfs.

v3: Fix various bugs and address review comments.
tx-aborts instead of cpu/tx-aborts/ works now (with some limitations)
cpu/instructions,intx=1/ works now

v4:
Addressed all review feedback (I hope). See changelog in individual patches.
KVM support now works again with more changes.
Forbid some more flag combinations that don't work well.

v5:
Rebased on latest perf/core. New method for sysfs events.
Obsolete patches dropped. Added one patch from Stephane.
Fixed generic aliases inside cpu//
Improved transaction flags decoding
Addressed all review feedback (except for two minor items in
perf tools from Namhyung)

v6:
Fix WERROR=1 build with latest fixes.
Address KVM feedback. 
Improve transaction flags display.

-Andi


[PATCH 06/32] perf, x86: Support PERF_SAMPLE_ADDR on Haswell

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Haswell supplies the address for every PEBS memory event, so always fill it in
when the user requested it. It will be 0 when not useful (no memory access).

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 16d7c58..aa0f5fa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -630,6 +630,10 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
else
regs.flags &= ~PERF_EFLAGS_EXACT;
 
+   if ((event->attr.sample_type & PERF_SAMPLE_ADDR) &&
+   x86_pmu.intel_cap.pebs_format >= 2)
+   data.addr = ((struct pebs_record_v2 *)pebs)->nhm.dla;
+
if (has_branch_stack(event))
data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



[PATCH 03/32] perf, x86: Basic Haswell PEBS support v3

2012-11-09 Thread Andi Kleen
From: Andi Kleen 

Add basic PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new events.

v2: Re-add missing pebs_aliases
v3: Re-add missing hunk. Fix some constraints.
Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event.h  |2 ++
 arch/x86/kernel/cpu/perf_event_intel.c|6 --
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   29 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8941899..1567b0d 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -596,6 +596,8 @@ extern struct event_constraint 
intel_snb_pebs_event_constraints[];
 
 extern struct event_constraint intel_ivb_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_pebs_event_constraints[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
b/arch/x86/kernel/cpu/perf_event_intel.c
index 3a08534..634f639 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -826,7 +826,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct 
perf_event *event)
return true;
 
/* implicit branch sampling to correct PEBS skid */
-   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1)
+   if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
+   x86_pmu.intel_cap.pebs_format < 2)
return true;
 
return false;
@@ -2127,8 +2128,9 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_nhm();
 
x86_pmu.event_constraints = intel_hsw_event_constraints;
-
+   x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snb_extra_regs;
+   x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 9d0dae0..16d7c58 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -427,6 +427,35 @@ struct event_constraint intel_ivb_pebs_event_constraints[] 
= {
 EVENT_CONSTRAINT_END
 };
 
+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+   INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+   INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+   INTEL_EVENT_CONSTRAINT(0xc4, 0xf),/* BR_INST_RETIRED.* */
+   INTEL_UEVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
+   INTEL_UEVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
+   INTEL_UEVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.NEAR_TAKEN */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8),/* MEM_TRANS_RETIRED.* */
+   INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf), /* 
MEM_UOPS_RETIRED.STLB_MISS_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES 
*/
+   INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+   INTEL_UEVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+   INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB 
*/
+   INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+   INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf), /* 
MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+   INTEL_UEVENT_CONSTRAINT(0x02d3, 0xf), /* 
MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
+   INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.Abort */
+   INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
+
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
struct event_constraint *c;
-- 
1.7.7.6
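
For readers unfamiliar with the constraint tables, here is a minimal
sketch of how such a table can be searched. It assumes the usual
reading of the entries above: the first argument is the event code
(optionally with a unit mask in the next byte) and the second is a
bitmask of the counters the event may use. The names (struct
constraint, EVENT_ONLY, EVENT_UMASK, find_constraint) and the
all-zero sentinel are hypothetical simplifications, not the kernel
definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct constraint {
	uint64_t code;    /* value to match */
	uint64_t cmask;   /* which config bits take part in the match */
	uint64_t idxmsk;  /* bitmask of usable counters */
};

/* Match on the event code only, or on event code plus unit mask. */
#define EVENT_ONLY(c, n)   { (c), 0x00ffULL, (n) }
#define EVENT_UMASK(c, n)  { (c), 0xffffULL, (n) }
#define CONSTRAINT_END     { 0, 0, 0 }

static const struct constraint table[] = {
	EVENT_UMASK(0x01c0, 0x2),  /* INST_RETIRED.PRECDIST: counter 1 only */
	EVENT_ONLY(0xc4, 0xf),     /* BR_INST_RETIRED.*: counters 0-3 */
	EVENT_ONLY(0xcd, 0x8),     /* MEM_TRANS_RETIRED.*: counter 3 only */
	CONSTRAINT_END
};

/* Return the first entry whose masked config matches, or NULL. */
static const struct constraint *find_constraint(uint64_t config)
{
	const struct constraint *c;

	for (c = table; c->cmask; c++)
		if ((config & c->cmask) == c->code)
			return c;
	return NULL;
}
```

An event with no matching entry is simply not PEBS-capable in this
model, which is why the table has to list every supported event.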



Re: [PATCH 1/9 v3] cgroup: add cgroup_subsys->post_create()

2012-11-09 Thread Glauber Costa
On 11/09/2012 06:22 PM, Tejun Heo wrote:
> Hey, Daniel.
> 
> On Fri, Nov 09, 2012 at 12:09:38PM +0100, Daniel Wagner wrote:
>> On 08.11.2012 20:07, Tejun Heo wrote:> Subject: cgroup: add
>> cgroup_subsys->post_create()
>>>
>>> Currently, there's no way for a controller to find out whether a new
>>> cgroup finished all ->create() allocations successfully and is
>>> considered "live" by cgroup.
>>
>> I'd like to add hierarchy support to net_prio and the first thing to
>> do is to get rid of get_prioidx(). It looks like it would be nice to
> 
> Ooh, I'm already working on it.  I *think* I should be able to post
> the patches later today or early next week.
> 
>> be able to use use_id and post_create() for this but as I read the
>> code this might not work because the netdev might access resources
>> allocated between create() and post_create(). So my question is
>> would it make sense to move
>>
>> cgroup_create():
>>
>>  if (ss->use_id) {
>>  err = alloc_css_id(ss, parent, cgrp);
>>  if (err)
>>  goto err_destroy;
>>  }
>>
>> part before create() or add some protection between create() and
>> post_create() callback in net_prio. I have a patch but I see
>> I could drop it completely if post_create() is there.
> 
> Glauber had a similar question about css_id and I need to think
> more about it but currently I think I want to phase out css IDs.  It's
> an id of the wrong thing (CSSes don't need IDs, cgroups do) and
> unnecessarily duplicates its own hierarchy when the hierarchy of
> cgroups already exists.  Once memcontrol moves away from walking using
> css_ids, I *think* I'll try to kill it.

May I suggest doing something similar to what the scheduler does? I
had some code in the past that reused that code - but basically
duplicated it. If you want, I can try getting a version of that in
kernel/cgroup.c that would serve as a general walker.

I like that walker a lot, because it happens in a sane order. memcg
basically walks in a random weird order, which makes hierarchical
computation of anything quite hard.
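
To make the ordering point concrete, here is a minimal sketch of a
parent-before-children (pre-order) walk over a tree linked by parent,
first-child and next-sibling pointers. It is only an illustration of
what a "sane order" can look like, not the scheduler's or memcg's
actual code; struct node and next_pre_order() are hypothetical names.

```c
#include <stddef.h>

struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/* Return the node that follows pos in a pre-order walk of the subtree
 * rooted at root, or NULL once the walk is complete. Every parent is
 * visited before any of its descendants. */
static struct node *next_pre_order(struct node *pos, struct node *root)
{
	/* Descend first. */
	if (pos->first_child)
		return pos->first_child;

	/* Otherwise climb until a sibling is found or root is reached. */
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}
```

Because a parent is always seen before its children, values can be
propagated down the hierarchy in a single pass.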

> 
> I'll add cgroup ID (no hierarchy funnies, just a single ida allocated
> number) so that it can be used for cgroup indexing.  Glauber, that
> should solve your problem too, right?
> 

Actually I went with a totally orthogonal solution. I am now using per
kmem-limited ids. Because they are not tied to the cgroup creation
workflow, I can allocate whenever it is more convenient.

I ended up liking this solution because it will do better in scenarios
where most of the memcgs are not kmem limited. So it had an edge here,
and also got rid of the create/post_create problem by breaking the
dependency.

But of course, if cgroups would gain some kind of sane indexing, it
could shift the balance towards reusing it.




[PATCH 00/12] perf tests: Move tests from builtin-test

2012-11-09 Thread Jiri Olsa
hi,
I discussed moving the tests out of builtin-test into separate
objects with Arnaldo, and here it is. I swear not to touch the
test code for a while after this one ;-)

Attached patches:
  01/12 perf tests: Move test__vmlinux_matches_kallsyms into separate object
  02/12 perf tests: Move test__open_syscall_event into separate object
  03/12 perf tests: Move test__open_syscall_event_on_all_cpus into separate 
object
  04/12 perf tests: Move test__basic_mmap into separate object
  05/12 perf tests: Move test__PERF_RECORD into separate object
  06/12 perf tests: Move test__rdpmc into separate object
  07/12 perf tests: Move perf_evsel__roundtrip_name_test into separate object
  08/12 perf tests: Move perf_evsel__tp_sched_test into separate object
  09/12 perf tests: Move test__syscall_open_tp_fields into separate object
  10/12 perf tests: Move pmu tests into separate object
  11/12 perf tests: Final cleanup for builtin-test move
  12/12 perf tests: Check for mkstemp return value in dso-data test

Also available here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/linux.git
  perf/tests

thanks,
jirka

Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile   |   11 +
 tools/perf/perf.h |1 -
 tools/perf/tests/attr.c   |3 +-
 tools/perf/tests/builtin-test.c   | 1399 
+--
 tools/perf/tests/dso-data.c   |7 +-
 tools/perf/tests/evsel-roundtrip-name.c   |  114 +
 tools/perf/tests/evsel-tp-sched.c |   84 
 tools/perf/tests/mmap-basic.c |  162 +++
 tools/perf/tests/open-syscall-all-cpus.c  |  120 +
 tools/perf/tests/open-syscall-tp-fields.c |  117 +
 tools/perf/tests/open-syscall.c   |   66 +++
 tools/perf/tests/parse-events.c   |3 +-
 tools/perf/tests/perf-record.c|  314 ++
 tools/perf/tests/pmu.c|  178 
 tools/perf/tests/rdpmc.c  |  175 
 tools/perf/tests/tests.h  |   22 +
 tools/perf/tests/util.c   |   30 ++
 tools/perf/tests/vmlinux-kallsyms.c   |  230 ++
 tools/perf/util/parse-events.h|1 -
 tools/perf/util/pmu.c |  185 +---
 tools/perf/util/pmu.h |4 +
 tools/perf/util/symbol.h  |1 -
 22 files changed, 1654 insertions(+), 1573 deletions(-)


[PATCH 11/12] perf tests: Final cleanup for builtin-test move

2012-11-09 Thread Jiri Olsa
Final function renames to match test__* style and
include cleanup.
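
For context, the test__* functions are entries in a table of
{ desc, func } pairs that ends with a NULL func, as the builtin-test.c
hunks below show. A minimal sketch of that pattern follows; the two
test functions and run_tests() are hypothetical and only illustrate the
table-driven loop, not the real perf test runner.

```c
#include <stddef.h>
#include <stdio.h>

struct test {
	const char *desc;
	int (*func)(void);
};

/* Hypothetical tests following the test__* naming style. */
static int test__always_ok(void)    { return 0; }
static int test__always_fails(void) { return -1; }

static struct test tests[] = {
	{ .desc = "hypothetical passing test", .func = test__always_ok },
	{ .desc = "hypothetical failing test", .func = test__always_fails },
	{ .func = NULL },  /* sentinel: ends the table */
};

/* Run every entry up to the sentinel; return the number of failures. */
static int run_tests(void)
{
	int failed = 0;
	int i;

	for (i = 0; tests[i].func; i++) {
		int err = tests[i].func();

		printf("%2d: %s: %s\n", i + 1, tests[i].desc,
		       err ? "FAILED!" : "Ok");
		if (err)
			failed++;
	}
	return failed;
}
```

With a uniform int (*)(void) signature and naming scheme, adding a test
is one prototype in a shared header plus one table entry.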

Signed-off-by: Jiri Olsa 
Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
---
 tools/perf/perf.h   |  1 -
 tools/perf/tests/attr.c |  3 ++-
 tools/perf/tests/builtin-test.c | 34 +++---
 tools/perf/tests/dso-data.c |  3 ++-
 tools/perf/tests/parse-events.c |  3 ++-
 tools/perf/tests/tests.h|  3 +++
 tools/perf/util/parse-events.h  |  1 -
 tools/perf/util/symbol.h|  1 -
 8 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 054182e..0047264 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -178,7 +178,6 @@ extern bool test_attr__enabled;
 void test_attr__init(void);
 void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu,
 int fd, int group_fd, unsigned long flags);
-int  test_attr__run(void);
 
 static inline int
 sys_perf_event_open(struct perf_event_attr *attr,
diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index 6e2feee..25638a9 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -27,6 +27,7 @@
 #include "../perf.h"
 #include "util.h"
 #include "exec_cmd.h"
+#include "tests.h"
 
 #define ENV "PERF_TEST_ATTR"
 
@@ -151,7 +152,7 @@ static int run_dir(const char *d, const char *perf)
return system(cmd);
 }
 
-int test_attr__run(void)
+int test__attr(void)
 {
struct stat st;
char path_perf[PATH_MAX];
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d3b95e0..186f675 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -4,31 +4,11 @@
  * Builtin regression testing command: ever growing number of sanity tests
  */
 #include "builtin.h"
-
-#include "util/cache.h"
-#include "util/color.h"
-#include "util/debug.h"
-#include "util/debugfs.h"
-#include "util/evlist.h"
-#include "util/machine.h"
-#include "util/parse-options.h"
-#include "util/parse-events.h"
-#include "util/symbol.h"
-#include "util/thread_map.h"
-#include "util/pmu.h"
-#include "event-parse.h"
-#include "../../include/linux/hw_breakpoint.h"
-
-#include 
-
-#include "util/cpumap.h"
-#include "util/evsel.h"
-#include 
-
 #include "tests.h"
-
-#include 
-
+#include "debug.h"
+#include "color.h"
+#include "parse-options.h"
+#include "symbol.h"
 
 static struct test {
const char *desc;
@@ -52,7 +32,7 @@ static struct test {
},
{
.desc = "parse events tests",
-   .func = parse_events__test,
+   .func = test__parse_events,
},
 #if defined(__x86_64__) || defined(__i386__)
{
@@ -70,7 +50,7 @@ static struct test {
},
{
.desc = "Test dso data interface",
-   .func = dso__test_data,
+   .func = test__dso_data,
},
{
.desc = "roundtrip evsel->name check",
@@ -86,7 +66,7 @@ static struct test {
},
{
.desc = "struct perf_event_attr setup",
-   .func = test_attr__run,
+   .func = test__attr,
},
{
.func = NULL,
diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index 0cd42fc..b5198f5 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -8,6 +8,7 @@
 
 #include "machine.h"
 #include "symbol.h"
+#include "tests.h"
 
 #define TEST_ASSERT_VAL(text, cond) \
 do { \
@@ -95,7 +96,7 @@ struct test_data_offset offsets[] = {
},
 };
 
-int dso__test_data(void)
+int test__dso_data(void)
 {
struct machine machine;
struct dso *dso;
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index b49c2ee..f2a82d0 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -4,6 +4,7 @@
 #include "evlist.h"
 #include "sysfs.h"
 #include "../../../include/linux/hw_breakpoint.h"
+#include "tests.h"
 
 #define TEST_ASSERT_VAL(text, cond) \
 do { \
@@ -1086,7 +1087,7 @@ static int test_pmu_events(void)
return ret;
 }
 
-int parse_events__test(void)
+int test__parse_events(void)
 {
int ret1, ret2 = 0;
 
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 88a55df..fc121ed 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -12,6 +12,9 @@ int test__perf_evsel__roundtrip_name_test(void);
 int test__perf_evsel__tp_sched_test(void);
 int test__syscall_open_tp_fields(void);
 int test__pmu(void);
+int test__attr(void);
+int test__dso_data(void);
+int test__parse_events(void);
 
 /* Util */
 int trace_event__id(const char *evname);
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index ac9a6aa..f639937 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -99,7 +99,6 @@ void 

Re: [PATCH] mtd: gpmi: remove unneccessary header

2012-11-09 Thread Huang Shijie
On Fri, Nov 9, 2012 at 10:10 AM, Artem Bityutskiy  wrote:
> On Sat, 2012-10-27 at 10:23 +0800, Huang Shijie wrote:
>> On 2012-10-26 21:41, Artem Bityutskiy wrote:
>> > On Sat, 2012-10-13 at 13:03 -0400, Huang Shijie wrote:
>> > > The gpmi-nand driver is now supported purely through the device tree,
>> > > so linux/mtd/gpmi-nand.h is no longer necessary. Just remove it,
>> > > and move some macros to the gpmi-nand driver itself.
>> > >
>> > > Signed-off-by: Huang Shijie 
>> > I cannot compile-test it because the MXS configuration file I always
>> I tested this patch on mxs and mx6q with linux-next-20121026
>> and did not see such compiler errors.
>>
>> I think the l2-mtd tree lacks some patches that are in other people's
>> trees. In the current l2-mtd tree, the kernel still hits the compiler
>> errors even if I DISABLE the gpmi driver.
>
> It is based on v3.7-rc4. Once the patches you refer are in Linus's tree,
> let me know. Thanks!
>
I re-created the .config based on the latest l2-mtd tree and compiled
the kernel with this patch again; there are no compiler errors anymore.

Could you re-create your mxs config with "make ARCH=arm
mxs_defconfig" and test this patch again?

thanks
Huang Shijie

> --
> Best Regards,
> Artem Bityutskiy


Re: [PATCH V4 0/3] Add clock framework for armada 370/XP

2012-11-09 Thread Mike Turquette
Quoting Gregory CLEMENT (2012-10-30 11:23:30)
> Hello Mike,
> 
> I hope this 4th version will meet your expectations. Beside the
> corrections you have asked I also changed the way I get resources for
> the clocks. Instead of referring to a node name, now I refer to a
> compatible name which should be a better use of the device tree.
> 
> Rather than taking this series in your git tree, would it possible to
> give your ack to the patches and let Jason Cooper take the patch set
> in his git tree. We have other series for mvebu which depend on this
> one (SMP, HWIOCC and SATA for now and more to come), and it will be
> easier for us to have everything in the same place.
> 

Hi Gregory,

After much delay:

Acked-by: Mike Turquette 

Regards,
Mike

> The purpose of this patch set is to add support for clock framework
> for Armada 370 and Armada XP SoCs. All the support is done under the
> directory drivers/clk/mvebu/ as the support for other mvebu SoCs was
> in mind during the writing of the code.
> 
> Two kinds of clocks are added:
> 
> - The CPU clocks are only for Armada XP (which is multi-core)
> 
> - The core clocks are clocks which have their rate fixed during
>   reset.
> 
> Many thanks to Thomas Petazzoni and Sebastian Hesselbarth for their
> review and feedback. The device tree bindings were really improved
> with the advices of Sebastian.
> 
> Changelog:
> V3 -> V4
> - Rebased on top of 3.7-rc3
> - Replaced the whitespace by tab in the Documentation files
> - Fixed the comment style according to the CodingStyle documentation
> - Fixed incorrect indentation
> - Removed redundant header in clk-cpu.c
> - Replaced improper whitespace by tabs in armada-xp.dtsi
> - In the device tree, the resources for the clocks do not rely anymore
>   on the node name mvebu-sar but now only depend on the compatible
>   name. (Issue reported by Sebastian Hesselbarth)
> 
> V2 -> V3:
> - Rebased on top of v3.7-rc1
> - Fixed a typo in device trees
> - Fixed warning from checkpatch
> 
> V1 -> V2:
> - Improved the spelling and the wording of the documentation and the
>   1st commit log
> - Removed the "end_of_list" name which are unused here.
> - Fix the cpu clock by using of_clk_src_onecell_get in the same way it
>   was used for the core clocks
> 
> Regards,
> 
> 
> Gregory CLEMENT (3):
>   clk: mvebu: add armada-370-xp specific clocks
>   clk: armada-370-xp: add support for clock framework
>   clocksource: time-armada-370-xp converted to clk framework
> 
>  .../devicetree/bindings/clock/mvebu-core-clock.txt |   40 +++
>  .../devicetree/bindings/clock/mvebu-cpu-clock.txt  |   21 ++
>  arch/arm/boot/dts/armada-370-db.dts|4 -
>  arch/arm/boot/dts/armada-370-xp.dtsi   |1 +
>  arch/arm/boot/dts/armada-370.dtsi  |   12 +
>  arch/arm/boot/dts/armada-xp.dtsi   |   48 +++
>  arch/arm/mach-mvebu/Kconfig|5 +
>  arch/arm/mach-mvebu/armada-370-xp.c|8 +-
>  arch/arm/mach-mvebu/common.h   |1 +
>  drivers/clk/Makefile   |1 +
>  drivers/clk/mvebu/Makefile |2 +
>  drivers/clk/mvebu/clk-core.c   |  318 
> 
>  drivers/clk/mvebu/clk-core.h   |   19 ++
>  drivers/clk/mvebu/clk-cpu.c|  154 ++
>  drivers/clk/mvebu/clk-cpu.h|   19 ++
>  drivers/clk/mvebu/clk.c|   36 +++
>  drivers/clocksource/time-armada-370-xp.c   |   11 +-
>  17 files changed, 690 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/clock/mvebu-core-clock.txt
>  create mode 100644 Documentation/devicetree/bindings/clock/mvebu-cpu-clock.txt
>  create mode 100644 drivers/clk/mvebu/Makefile
>  create mode 100644 drivers/clk/mvebu/clk-core.c
>  create mode 100644 drivers/clk/mvebu/clk-core.h
>  create mode 100644 drivers/clk/mvebu/clk-cpu.c
>  create mode 100644 drivers/clk/mvebu/clk-cpu.h
>  create mode 100644 drivers/clk/mvebu/clk.c
> 
> -- 
> 1.7.9.5


Re: [PATCH] leds: leds-pwm: Convert to use devm_get_pwm

2012-11-09 Thread Bryan Wu
On Wed, Nov 7, 2012 at 3:42 AM, Peter Ujfalusi  wrote:
> Update the driver to use the new API for requesting pwm so we can take
> advantage of the pwm_lookup table to find the correct pwm to be used for the
> LED functionality.
> If devm_pwm_get() fails we fall back to legacy mode to try to get the pwm.
>
> Signed-off-by: Peter Ujfalusi 
> ---
>  drivers/leds/leds-pwm.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/leds/leds-pwm.c b/drivers/leds/leds-pwm.c
> index f2e44c7..6bf9445 100644
> --- a/drivers/leds/leds-pwm.c
> +++ b/drivers/leds/leds-pwm.c
> @@ -47,6 +47,19 @@ static void led_pwm_set(struct led_classdev *led_cdev,
> }
>  }
>
> +static struct pwm_device *led_pwm_request_pwm(struct platform_device *pdev,
> + struct led_pwm *cur_led)
> +{
> +   struct pwm_device *pwm;
> +
> +   pwm = devm_pwm_get(&pdev->dev, cur_led->name);
> +   if (IS_ERR(pwm)) {
> +   dev_err(&pdev->dev, "unable to request PWM, trying legacy API\n");
> +   pwm = pwm_request(cur_led->pwm_id, cur_led->name);

Looks good, but why do we still need to keep the old API pwm_request()
if devm_pwm_get() is the right replacement? AFAIK, the devm_*() APIs
clean up automatically if device probing fails.

So if it is good enough, we can just replace pwm_request() with devm_pwm_get().

Thanks,
-Bryan

> +   }
> +   return pwm;
> +}
> +
>  static int led_pwm_probe(struct platform_device *pdev)
>  {
> struct led_pwm_platform_data *pdata = pdev->dev.platform_data;
> @@ -67,8 +80,7 @@ static int led_pwm_probe(struct platform_device *pdev)
> cur_led = &pdata->leds[i];
> led_dat = &leds_data[i];
>
> -   led_dat->pwm = pwm_request(cur_led->pwm_id,
> -   cur_led->name);
> +   led_dat->pwm = led_pwm_request_pwm(pdev, cur_led);
> if (IS_ERR(led_dat->pwm)) {
> ret = PTR_ERR(led_dat->pwm);
> dev_err(&pdev->dev, "unable to request PWM %d\n",
> --
> 1.8.0
>


Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

2012-11-09 Thread John Stultz

On 10/16/2012 10:23 AM, Peter Zijlstra wrote:
> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
>> Hi,
>>
>> There are many situations where we want to correlate events happening at
>> the user level with samples recorded in the perf_event kernel sampling buffer.
>> For instance, we might want to correlate the call to a function or creation of
>> a file with samples. Similarly, when we want to monitor a JVM with jitted code,
>> we need to be able to correlate jitted code mappings with perf event samples
>> for symbolization.
>>
>> Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
>> That causes each PERF_RECORD_SAMPLE to include a timestamp
>> generated by calling the local_clock() -> sched_clock_cpu() function.
>>
>> To make correlating user vs. kernel samples easy, we would need to
>> access that sched_clock() functionality. However, none of the existing
>> clock calls permit this at this point. They all return timestamps which are
>> not using the same source and/or offset as sched_clock.
>>
>> I believe a similar issue exists with the ftrace subsystem.
>>
>> The problem needs to be addressed in a portable manner. Solutions
>> based on reading TSC for the user level to reconstruct sched_clock()
>> don't seem appropriate to me.
>>
>> One possibility to address this limitation would be to extend clock_gettime()
>> with a new clock time, e.g., CLOCK_PERF.
>>
>> However, I understand that sched_clock_cpu() provides ordering guarantees only
>> when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
>> But we already have to deal with this problem when merging samples obtained
>> from different CPU sampling buffer in per-thread mode. So this is not
>> necessarily a showstopper.
>>
>> Alternatives could be to use uprobes but that's less practical to setup.
>>
>> Anyone with better ideas?
>
> You forgot to CC the time people ;-)
>
> I've no problem with adding CLOCK_PERF (or another/better name).

Hrm. I'm not excited about exporting that sort of internal kernel
details to userland.


The behavior and expectations from sched_clock() have changed over the
years, so I'm not sure it's wise to export it, since we'd have to
preserve its behavior from then on.

Also I worry that it will be abused in the same way that direct TSC
access is, where the seemingly better performance compared with the more
careful/correct CLOCK_MONOTONIC would cause developers to write fragile
userland code that will break when moved from one machine to the next.

I'd probably rather perf output timestamps to userland using sane clocks
(CLOCK_MONOTONIC), rather than trying to introduce a new time domain to
userland.   But I probably could be convinced I'm wrong.
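
For illustration only, a minimal userland sketch of what reading the
existing CLOCK_MONOTONIC looks like. It assumes perf were changed to
stamp samples in that same time domain, which is only the suggestion
above and not current behavior. The helper name monotonic_ns() is made
up for this example; clock_gettime() and CLOCK_MONOTONIC are the
standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read CLOCK_MONOTONIC and flatten it to
 * nanoseconds. Returns 0 if clock_gettime() fails.
 */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

A user-level tool could record monotonic_ns() next to each event of
interest (a JIT mapping, a file creation) and later merge those records
with kernel samples, provided both sides really use the same clock.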


thanks
-john



net, batman: NULL ptr deref in batadv_iv_ogm_queue_add

2012-11-09 Thread Sasha Levin
Hi all,

While fuzzing with trinity in a KVM tools (lkvm) guest running latest -next
kernel, I've stumbled on the following:

[  469.854708] batman_adv: �,�]+: Removing interface: bond0
[  469.890909] BUG: unable to handle kernel NULL pointer dereference at 0003
[  469.906428] IP: [] batadv_iv_ogm_queue_add+0x20/0x700
[  469.906428] PGD 907c067 PUD 907b067 PMD 0
[  469.906428] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  469.906428] Dumping ftrace buffer:
[  469.921756](ftrace buffer empty)
[  469.921756] CPU 1
[  469.921756] Pid: 43, comm: kworker/u:1 Tainted: GW 3.7.0-rc4-next-20121109-sasha-00013-g9407f3c #125
[  469.921756] RIP: 0010:[]  [] batadv_iv_ogm_queue_add+0x20/0x700
[  469.921756] RSP: :880013361c08  EFLAGS: 00010292
[  469.921756] RAX: 0062 RBX:  RCX: 8800622d2c00
[  469.921756] RDX: 001a RSI:  RDI: 0064
[  469.921756] RBP: 880013361c88 R08: 0001 R09: 000142bf
[  469.921756] R10:  R11:  R12: 000142bf
[  469.921756] R13: 8800088e8b00 R14: 8800088e8b00 R15: 0001
[  469.921756] FS:  () GS:88002780() 
knlGS:
[  469.921756] CS:  0010 DS:  ES:  CR0: 80050033
[  469.921756] CR2: 0003 CR3: 0788e000 CR4: 000406e0
[  469.921756] DR0:  DR1:  DR2: 
[  469.921756] DR3:  DR6: 0ff0 DR7: 0400
[  469.921756] Process kworker/u:1 (pid: 43, threadinfo 88001336, task 
880013358000)
[  469.921756] Stack:
[  469.921756]  880013361c28 81135a74 8800088e8b00 

[  469.921756]  880013361c88 839f32ed 839f3160 

[  469.921756]  880021f0cae0  8800622d2c00 

[  469.921756] Call Trace:
[  469.921756]  [] ? __rcu_read_unlock+0x44/0xb0
[  469.921756]  [] ? batadv_slide_own_bcast_window+0x1cd/0x1f0
[  469.921756]  [] ? batadv_slide_own_bcast_window+0x40/0x1f0
[  469.921756]  [] batadv_iv_ogm_schedule+0x2a6/0x300
[  469.921756]  [] ? batadv_iv_ogm_queue_add+0x700/0x700
[  469.921756]  [] ? local_bh_enable_ip+0xef/0x150
[  469.921756]  [] 
batadv_send_outstanding_bat_ogm_packet+0xd0/0xf0
[  469.921756]  [] process_one_work+0x3b9/0x770
[  469.921756]  [] ? process_one_work+0x268/0x770
[  469.921756]  [] ? get_lock_stats+0x22/0x70
[  469.921756]  [] ? 
batadv_add_bcast_packet_to_list+0x320/0x320
[  469.921756]  [] worker_thread+0x2ba/0x3f0
[  469.921756]  [] ? rescuer_thread+0x2d0/0x2d0
[  469.921756]  [] kthread+0xe3/0xf0
[  469.921756]  [] ? put_lock_stats.isra.16+0xe/0x40
[  469.921756]  [] ? insert_kthread_work+0x90/0x90
[  469.921756]  [] ret_from_fork+0x7c/0xb0
[  469.921756]  [] ? insert_kthread_work+0x90/0x90
[  469.921756] Code: 16 7a fd e8 43 22 75 fd 5d c3 90 55 48 89 e5 41 57 41 56 
41 55 49 89 fd bf 64 00 00 00 41 54 4d 89 cc 53 48
83 ec 58 48 89 75 b8 <0f> b6 5e 03 89 55 c0 48 89 4d b0 44 89 45 c4 e8 9c ec 72 
fd 49
[  469.921756] RIP  [] batadv_iv_ogm_queue_add+0x20/0x700
[  469.921756]  RSP 
[  469.921756] CR2: 0003
[  470.016647] ---[ end trace 42fb97717ce977ba ]---


Thanks,
Sasha


Re: [PATCH v2] pstore/ram: no timekeeping calls when unavailable

2012-11-09 Thread John Stultz

On 11/05/2012 02:00 PM, Kees Cook wrote:
> We must not call timekeeping functions unless they are available. If we dump
> before they have resumed, avoid a WARN_ON by setting the timestamp to 0.
>
> Since the "ram" pstore driver can be a module, we must have
> timekeeping_suspended exported.
>
> Reported-by: Doug Anderson
> Cc: Anton Vorontsov
> Cc: John Stultz
> Signed-off-by: Kees Cook
> ---
> v2:
>   - export needed for timekeeping_suspended (thanks to Fengguang Wu).
> ---
>   fs/pstore/ram.c   |8 +++-
>   kernel/time/timekeeping.c |1 +
>   2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index 1a4f6da..6d014e0 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -171,7 +171,13 @@ static size_t ramoops_write_kmsg_hdr(struct persistent_ram_zone *prz)
> struct timeval timestamp;
> size_t len;
>
> -   do_gettimeofday(&timestamp);
> +   /* Handle dumping before timekeeping has resumed. */
> +   if (unlikely(timekeeping_suspended)) {
> +   timestamp.tv_sec = 0;
> +   timestamp.tv_usec = 0;
> +   } else
> +   do_gettimeofday(&timestamp);
> +

Would nulling out the timestamp be better done in do_gettimeofday()?
That way we don't have to export timekeeping internals and users would
get something more sane for this corner case.
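
To make the shape of that alternative concrete, here is a rough
user-space sketch, not kernel code. Every name ending in _sketch is a
made-up stand-in (the real symbols are timekeeping_suspended and
do_gettimeofday()), and the fixed clock value only exists so the
example has something to return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct timeval. */
struct timeval_sketch {
	long tv_sec;
	long tv_usec;
};

/* Hypothetical stand-in for the kernel's timekeeping_suspended flag. */
static bool timekeeping_suspended_sketch;

/* Stand-in for the real clock read; returns a fixed time for the demo. */
static void read_clock_sketch(struct timeval_sketch *tv)
{
	tv->tv_sec = 1352419200;
	tv->tv_usec = 250000;
}

/*
 * The suspended check lives inside the accessor, so a caller such as a
 * pstore backend needs no knowledge of timekeeping internals.
 */
static void gettimeofday_sketch(struct timeval_sketch *tv)
{
	if (timekeeping_suspended_sketch) {
		tv->tv_sec = 0;
		tv->tv_usec = 0;
		return;
	}
	read_clock_sketch(tv);
}
```

With this arrangement the caller keeps its original single call and no
symbol has to be exported to modules.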


thanks
-john




[PATCH] rapidio: fix kernel-doc warnings

2012-11-09 Thread Randy Dunlap
From: Randy Dunlap 

Fix rapidio kernel-doc warnings:

Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local'
Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region'
Warning(include/linux/rio.h:290): No description found for parameter 'switches'
Warning(include/linux/rio.h:290): No description found for parameter 'destid_table'

Signed-off-by: Randy Dunlap 
Cc: Matt Porter 
Cc: Alexandre Bounine 
---
 drivers/rapidio/rio.c |2 +-
 include/linux/rio.h   |2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

--- lnx-37-rc4.orig/include/linux/rio.h
+++ lnx-37-rc4/include/linux/rio.h
@@ -275,9 +275,11 @@ struct rio_id_table {
  * struct rio_net - RIO network info
  * @node: Node in global list of RIO networks
  * @devices: List of devices in this network
+ * @switches: List of switches in this network
  * @mports: List of master ports accessing this network
  * @hport: Default port for accessing this network
  * @id: RIO network ID
+ * @destid_table: destID allocation table
  */
 struct rio_net {
struct list_head node;  /* node in list of networks */
--- lnx-37-rc4.orig/drivers/rapidio/rio.c
+++ lnx-37-rc4/drivers/rapidio/rio.c
@@ -401,7 +401,7 @@ EXPORT_SYMBOL_GPL(rio_release_inb_pwrite
 /**
  * rio_map_inb_region -- Map inbound memory region.
  * @mport: Master port.
- * @lstart: physical address of memory region to be mapped
+ * @local: physical address of memory region to be mapped
  * @rbase: RIO base address assigned to this window
  * @size: Size of the memory region
  * @rflags: Flags for mapping.


Re: [PATCH v3 02/11] clk: davinci - add PSC clock driver

2012-11-09 Thread Mike Turquette
Quoting Murali Karicheri (2012-11-05 07:10:52)
> On 11/03/2012 08:07 AM, Sekhar Nori wrote:
> > On 10/25/2012 9:41 PM, Murali Karicheri wrote:
> >> This is the driver for the Power Sleep Controller (PSC) hardware
> >> found on DM SoCs as well as Keystone SoCs (c6x). This driver borrowed
> >> code from arch/arm/mach-davinci/psc.c and implemented the driver
> >> as per common clock provider API. The PSC module is responsible for
> >> enabling/disabling the Power Domain and Clock domain for different IPs
> >> present in the SoC. The driver is configured through the clock data
> >> passed to the driver through struct clk_psc_data.
> >>
> >> Signed-off-by: Murali Karicheri 
> >> ---
> >> +/**
> >> + * struct clk_psc - DaVinci PSC clock driver data
> >> + *
> >> + * @hw: clk_hw for the psc
> >> + * @psc_data: Driver specific data
> >> + */
> >> +struct clk_psc {
> >> +struct clk_hw hw;
> >> +struct clk_psc_data *psc_data;
> >> +spinlock_t *lock;
> > Unused member? I don't see this being used.
> 
> OK. Will remove.

Those locks are only used for the case where a register might contain
bits for several clocks.  Thus RMW operations are protected.  On OMAP
this isn't necessary due to a very generous register layout (typically
one 32-bit reg per module) controlling clocks.  Seems that maybe this is
not needed for the PSC module either?
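
A small user-space sketch of the shared-register case, for anyone
following along. It is not the clk framework code: shared_gate_reg,
gate_lock and gate_set() are hypothetical names, and a C11 atomic_flag
stands in for the kernel spinlock. It only shows why the
read-modify-write needs a lock when several clocks live in one
register.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared register: one enable bit per clock gate. */
static uint32_t shared_gate_reg;

/* Stand-in for the spinlock a clock driver would share between gates. */
static atomic_flag gate_lock = ATOMIC_FLAG_INIT;

static void gate_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&gate_lock,
						 memory_order_acquire))
		; /* spin until the previous holder releases the lock */
}

static void gate_lock_release(void)
{
	atomic_flag_clear_explicit(&gate_lock, memory_order_release);
}

/*
 * Read-modify-write of one clock's bit. Without the lock, two callers
 * could both read the old value and one update would be lost.
 */
static void gate_set(unsigned int bit, int enable)
{
	uint32_t val;

	gate_lock_acquire();
	val = shared_gate_reg;
	if (enable)
		val |= UINT32_C(1) << bit;
	else
		val &= ~(UINT32_C(1) << bit);
	shared_gate_reg = val;
	gate_lock_release();
}
```

If each module has its own register, every write replaces the whole
word and there is nothing for neighbouring clocks to lose, which is the
OMAP situation described above.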

Regards,
Mike

> >
> > Thanks,
> > Sekhar
> >


Re: [PATCH 00/31] numa/core patches

2012-11-09 Thread Alex Shi
On Sat, Nov 3, 2012 at 8:21 PM, Mel Gorman  wrote:
> On Sat, Nov 03, 2012 at 07:04:04PM +0800, Alex Shi wrote:
>> >
>> > In reality, this report is larger but I chopped it down a bit for
>> > brevity. autonuma beats schednuma *heavily* on this benchmark both in
>> > terms of average operations per numa node and overall throughput.
>> >
>> > SPECJBB PEAKS
>> >                                   3.7.0                 3.7.0                 3.7.0
>> >                          rc2-stats-v2r1    rc2-autonuma-v27r8    rc2-schednuma-v1r3
>> >  Expctd Warehouse       12.00 (  0.00%)       12.00 (  0.00%)       12.00 (  0.00%)
>> >  Expctd Peak Bops   442225.00 (  0.00%)   596039.00 ( 34.78%)   555342.00 ( 25.58%)
>> >  Actual Warehouse        7.00 (  0.00%)        9.00 ( 28.57%)        8.00 ( 14.29%)
>> >  Actual Peak Bops   550747.00 (  0.00%)   646124.00 ( 17.32%)   560635.00 (  1.80%)
>>
>> It is an impressive report!
>>
>> Could you share which JVM and options you are using in the
>> testing, and on what kind of platform?
>>
>
> Oracle JVM version "1.7.0_07"
> Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
>
> 4 JVMs were run, one for each node.
>
> JVM switch specified was -Xmx12901m so it would consume roughly 80% of
> memory overall.
>
> Machine is x86-64 4-node, 64G of RAM, CPUs are E7-4807, 48 cores in
> total with HT enabled.
>

Thanks for sharing the configuration!

I used JRockit and OpenJDK with hugepages, with each JVM pinned to a CPU
socket. With the previous sched numa version, I found a 20% drop with
JRockit in our configuration. With this version there is no clear
regression, but no benefit either.

Seems we need to expand the testing configurations. :)
-- 
Thanks
Alex


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Joel A Fernandes
On Fri, Nov 9, 2012 at 8:29 AM, David Gibson
 wrote:
> On Fri, Nov 09, 2012 at 12:32:09AM -0500, Joel A Fernandes wrote:
>> Hi Pantelis,
>>
>> I hope I'm not too late to reply as I'm traveling.
>>
>> On Nov 6, 2012, at 5:30 AM, Pantelis Antoniou
>>  wrote:
>>
>> >> Joanne has purchased one of Jane's capes and packaged it into a rugged
>> >> case for data logging. As far as Joanne is concerned, the BeagleBone and
>> >> cape together are a single unit and she'd prefer a single monolithic FDT
>> >> instead of using an FDT overlay.
>> >> Option A: Using dtc, she uses the BeagleBone and cape .dts source files
>> >>to generate a single .dtb for the entire system which is
>> >>loaded by U-Boot. -or-
>> >
>> > Unlikely.
>> >> Option B: Joanne uses a tool to merge the BeagleBone and cape .dtb files
>> >>(instead of .dts files), -or-
>> > Possible but low probability.
>> >
>> >> Option C: U-Boot loads both the base and overlay FDT files, merges them,
>> >>and passes the resolved tree to the kernel.
>> >>
>> >
>> > Could be made to work. Only really required if Joanne wants the
>> > cape interface to work for u-boot too. For example if the cape has some
>> > kind of network interface that u-boot will use to boot from.
>> >
>>
>> I love Grant's hashing idea a lot keeping the phandle problem for
>> compile time and not requiring fixups.
>
> Well, using a hash only moves the problem of fixed phandles to a
> problem of fixed node paths.  The details of node paths are, if
> anything, more mutable than phandles.

So what you're saying is there's no way to make a phandle a signature
of a (property of a node) since no one property is unique. If I
follow, even the node path can't be used because the hash value changes
if the node is moved around in the tree. This shoots down the hashing
idea then, which I guess is looked down upon anyway due to dynamic
changes to the base DT as discussed in the usecases.

> [snip]
>> As an alternative to hashing: reading David Gibson's paper, I gathered
>> that a phandle is supposed to 'uniquely' identify a node. I wonder why
>> the node name itself is not sufficient to uniquely identify it.
>
> Node names are not unique, not even close.  If you have two similar
> NICs in slot 0 of two different PCI domains, they'll almost certainly
> both be called 'ethernet@0,0'.  Similar examples abound on other
> buses.  Node paths are unique, but they are long.
>
> The other big reason for phandles in OF history is that they would be
> more stable than paths.  The device tree could be manipulated during
> OF runtime, but phandles would generally be internal pointers in OF
> and so remain a consistent handle even if the node moved in the tree.
> That's not really relevant for flat trees, but we need to work with
> the same structures.

Thanks.
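
To convince myself, a toy sketch of the point about names versus
paths. The table below is a made-up four-node tree (two PCI domains,
each with a NIC in slot 0), not output from any real board, and
struct node_sketch and the two helpers are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a tiny device tree: one entry per node. */
struct node_sketch {
	const char *path; /* full path, unique within the tree */
	const char *name; /* last path component, may repeat */
};

static const struct node_sketch nodes[] = {
	{ "/pci@0", "pci@0" },
	{ "/pci@0/ethernet@0,0", "ethernet@0,0" },
	{ "/pci@1", "pci@1" },
	{ "/pci@1/ethernet@0,0", "ethernet@0,0" },
};

#define NODE_COUNT (sizeof(nodes) / sizeof(nodes[0]))

/* Count nodes with this name; more than one hit means it is ambiguous. */
static size_t count_by_name(const char *name)
{
	size_t i, hits = 0;

	for (i = 0; i < NODE_COUNT; i++)
		if (strcmp(nodes[i].name, name) == 0)
			hits++;
	return hits;
}

/* Count nodes with this full path; at most one in a well-formed tree. */
static size_t count_by_path(const char *path)
{
	size_t i, hits = 0;

	for (i = 0; i < NODE_COUNT; i++)
		if (strcmp(nodes[i].path, path) == 0)
			hits++;
	return hits;
}
```

A strcmp on the node name alone finds both NICs, so it cannot stand in
for a phandle; only the full path narrows it to one node.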

>> The code that does
>> the tree walking can then just strcmp the node name while it walks the
>> tree instead of having to find a node with a phandle number. I guess
>> the reason is phandles are small to store as data values. Another
>> approach can be to arrange the string block in alphabetical order
>> (unless it already is),
>
> They're not, and doing so would be a painful change to maintain
> compatibility across.  And in any case only property names use the
> strings block, not node names.

Understood, thanks.

Regards,
Joel


Re: [PATCH] dmatest: Fix NULL pointer dereference on ioat

2012-11-09 Thread viresh kumar
On Sat, Nov 10, 2012 at 2:27 AM, Jon Mason  wrote:
> device_control is optional and is not implemented in all DMA drivers.
> Any call to it on such a driver will result in a NULL pointer dereference.
> dmatest makes two of these calls when completing the kernel thread and
> removing the module.  These are corrected by calling the
> dmaengine_device_control wrapper and checking for a non-existent
> device_control function pointer there.
>
> Signed-off-by: Jon Mason 
> CC: Vinod Koul 
> CC: Dan Williams 
> ---
>  drivers/dma/dmatest.c |4 ++--
>  include/linux/dmaengine.h |5 -
>  2 files changed, 6 insertions(+), 3 deletions(-)
>

> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index d3201e4..e0004fb 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -608,7 +608,10 @@ static inline int dmaengine_device_control(struct 
> dma_chan *chan,
>enum dma_ctrl_cmd cmd,
>unsigned long arg)
>  {
> -   return chan->device->device_control(chan, cmd, arg);
> +   if (chan->device->device_control)
> +   return chan->device->device_control(chan, cmd, arg);
> +   else
> +   return -EINVAL;

-ENOTSUPP or -ENOSYS ??

@Dan: I believe i don't have to send another version now. Correct??
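
For reference, a self-contained user-space sketch of the guarded
wrapper pattern under discussion. The structs are simplified stand-ins
with a _sketch suffix, not the real dmaengine definitions, and -ENOSYS
is used here only because it is available in userspace errno.h; the
patch itself returns -EINVAL and which code is right is the open
question above.

```c
#include <errno.h>
#include <stddef.h>

struct dma_chan_sketch;

/* Hypothetical, simplified stand-in for struct dma_device. */
struct dma_device_sketch {
	/* Optional callback: a driver may leave this NULL. */
	int (*device_control)(struct dma_chan_sketch *chan, int cmd,
			      unsigned long arg);
};

/* Hypothetical, simplified stand-in for struct dma_chan. */
struct dma_chan_sketch {
	struct dma_device_sketch *device;
};

/* Wrapper that reports an error instead of calling through NULL. */
static int device_control_sketch(struct dma_chan_sketch *chan, int cmd,
				 unsigned long arg)
{
	if (!chan->device->device_control)
		return -ENOSYS;
	return chan->device->device_control(chan, cmd, arg);
}

/* Example driver callback that accepts every command. */
static int accept_all_sketch(struct dma_chan_sketch *chan, int cmd,
			     unsigned long arg)
{
	(void)chan;
	(void)cmd;
	(void)arg;
	return 0;
}
```

Callers such as dmatest then only need to go through the wrapper and
can ignore or log the error for drivers without the callback.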

--
viresh


Re: [RFC] Device Tree Overlays Proposal (Was Re: capebus moving omap_devices to mach-omap2)

2012-11-09 Thread Joel A Fernandes
Hi Pantelis,

On Fri, Nov 9, 2012 at 2:13 AM, Pantelis Antoniou
 wrote:

>>>> Option C: U-Boot loads both the base and overlay FDT files, merges them,
>>>>           and passes the resolved tree to the kernel.
>>>>
>>>
>>> Could be made to work. Only really required if Joanne wants the
>>> cape interface to work for u-boot too. For example if the cape has some
>>> kind of network interface that u-boot will use to boot from.
>>>
>>
>> I love Grant's hashing idea a lot keeping the phandle problem for
>> compile time and not requiring fixups.
>>
>> IMO it is still a cleaner approach if u-boot does the tree merging for
>> all cases, and not the kernel.
>>
>> That way from a development standpoint, very little or nothing will
>> have to be changed in kernel (except for scripts/dtc) considering we
>> are moving forward with hashing.
>>
>> Also, this was discussed a while back, but at some point it is going to
>> be brought up again: loading of a dt fragment directly from EEPROM and
>> merging at run time. If we were to implement this in the kernel, we would
>> have to add cape-specific EEPROM reading code, and merge the tree before
>> it is unflattened and parsed. I think doing tree merging in the kernel is
>> messy and we should do it in u-boot: ideally reading the fragment from
>> the EEPROM for all capes and merging without worrying about version
>> detection, then passing the merged blob to the kernel, which works the
>> same way it does today.
>
> Not going to work, for a lot of cases. Doing it in the kernel seems to be
> the cleaner option. There are valid use cases for doing in u-boot too.

True, if dynamic runtime stuff from userspace is what we're talking
about, then yeah I see the important need for kernel to do the merge.

>> As an alternative to hashing: reading David Gibson's paper, I gathered
>> that a phandle is supposed to 'uniquely' identify a node. I wonder why
>> the node name itself is not sufficient to uniquely identify it. The code
>> that does the tree walking can then just strcmp the node name while it
>> walks the tree instead of having to find a node with a phandle number. I
>> guess the reason is phandles are small to store as data values. Another
>> approach can be to arrange the string block in alphabetical order
>> (unless it already is), and store the phandle as the index of the node
>> name referenced relative to the start of the string block. This will not
>> affect nodes in the dtb being moved around since they will still have the
>> same index value. The problem is that adding or removing nodes changes
>> the offsets of all other nodes in the string block as well. Hmm.
>>
>
> This is pretty radical change to the DT format, no?

Yes, true, and the only way hypothetically to replace the phandle
tree-walking mechanism is to store node paths instead of phandles, which
David pointed out are too long to store, so I guess this won't work after
all. Anyway this was an interesting exercise, thanks.

Regards,
Joel


Re: [PATCH 04/11] ARM: set arch_gettimeoffset directly

2012-11-09 Thread Ryan Mallon
On 10/11/12 08:07, Stephen Warren wrote:

> On 11/08/2012 04:06 PM, Ryan Mallon wrote:
>> On 09/11/12 08:01, Stephen Warren wrote:
>>> remove ARM's struct sys_timer .offset function pointer, and instead
>>> directly set the arch_gettimeoffset function pointer when the timer
>>> driver is initialized. This requires multiplying all function results
>>> by 1000, since the removed arm_gettimeoffset() did this. Also,
>>> s/unsigned long/u32/ just to make the function prototypes exactly
>>> match that of arch_gettimeoffset.
> 
>>> +static u32 ep93xx_gettimeoffset(void)
>>> +{
>>> +   int offset;
>>> +
>>> +   offset = __raw_readl(EP93XX_TIMER4_VALUE_LOW) - last_jiffy_time;
>>> +
>>> +   /* Calculate (1000000 / 983040) * offset.  */
>>
>> This comment is now incorrect, it should say:
>>
>>  /* Calculate (1000000000 / 983040) * offset */
>>
>> or perhaps to better explain what is being done:
>>
>>  /*
>>   * Timer 4 is based on a 983.04 kHz reference clock,
>>   * so dividing by 983040 gives a milli-second value.
>>   * Refactor the calculation to avoid overflow.
>>   */
>>
>>> +   return (offset + (53 * offset / 3072)) * 1000;
> 
> Thanks. I expanded on that slightly and went for:
> 
>   /*
>* Timer 4 is based on a 983.04 kHz reference clock,
>* so dividing by 983040 gives the fraction of a second,
>* so dividing by 0.983040 converts to uS.
>* Refactor the calculation to avoid overflow.
>* Finally, multiply by 1000 to give nS.
>*/
> 


Looks good, thanks.
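
For anyone checking the arithmetic, a quick user-space sketch (not part
of the patch). Since 1000000 / 983040 = 3125 / 3072 = 1 + 53 / 3072,
the expression offset + 53 * offset / 3072 gives microseconds and the
final * 1000 gives nanoseconds. The function names are made up, and
uint32_t is used instead of the int in the patch to keep the example
simple.

```c
#include <stdint.h>

/* Refactored form from the patch: ticks of a 983.04 kHz clock to ns. */
static uint32_t ticks_to_ns_refactored(uint32_t offset)
{
	return (offset + (53 * offset / 3072)) * 1000;
}

/* Direct form, using 64-bit arithmetic so the product cannot overflow. */
static uint64_t ticks_to_ns_direct(uint32_t offset)
{
	return (uint64_t)offset * 1000000000ull / 983040;
}
```

The two agree exactly when offset is a multiple of 3072, and otherwise
the refactored form is low by less than 1000 ns because it rounds down
to a whole microsecond before scaling.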

~Ryan



Re: [PATCH v2] of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again

2012-11-09 Thread David Miller
From: Andreas Larsson 
Date: Mon,  5 Nov 2012 11:53:18 +0100

> This bug-fix makes sure that of_address_to_resource is defined extern for
> sparc so that the sparc-specific implementation of of_address_to_resource()
> is once again used when including include/linux/of_address.h in a sparc
> context. A number of drivers in mainline rely on this function working for
> sparc.
>
> The bug was introduced in a850a7554442f08d3e910c6eeb4ee216868dda1e,
> "of/address: add empty static inlines for !CONFIG_OF". Contrary to that
> commit title, the static inlines are added for !CONFIG_OF_ADDRESS, and
> CONFIG_OF_ADDRESS is never defined for sparc. This is good behavior for the
> other functions in include/linux/of_address.h, as the extern functions
> defined in drivers/of/address.c only get linked when OF_ADDRESS is
> configured. However, for of_address_to_resource there exists a
> sparc-specific implementation in arch/sparc/kernel/of_device_common.c.
> 
> Signed-off-by: Andreas Larsson 

Applied.


RE: [PATCH] extcon : callback function to read cable property

2012-11-09 Thread anish kumar
On Fri, 2012-11-09 at 14:05 +0000, Tc, Jenny wrote:
> > I think that the role of extcon subsystem notify changed
> > state(attached/detached) of cable to notifiee, but if you want to add
> > property feature of cable, you should solve ambiguous issues.
> > 
> > First,
> > This patch only support the properties of charger cable but, never support
> > property of other cable. The following structure depend on only charger
> > cable. We can check it the following structure:
> > struct extcon_chrg_cbl_props {
> > enum extcon_chrgr_cbl_stat cable_state;
> > unsigned long mA;
> > };
> > 
> > I think that the patch have to support all of cables for property feature.
> 
> My suggestion is to have a structure like this 
> 
> struct  extcon_cablel_props {
>   unsigned int cable_state;
>   unsigned int data; 
Why can't it be float/long/double??
> }
> Not all cables will have more than two states. If any cable has more than two 
> states,
> the above structure makes it flexible to represent additional state and the 
> data associated
> 
> > 
> > Second,
> > Certainly, you should define common properties of all cables and specific
> > properties of each cable. The properties of charger cable should never be
> > defined only.
IMHO extcon doesn't know anything about the cable except the state it is
currently in, which is also set by the provider. I am unable to
understand why extcon should provide more than what it knows. It should
just limit itself to broadcasting the cable state; exploiting it for any
other information would only lead to more unnecessary code.
It should be the same as the IIO subsystem, where the consumer and
provider know beforehand what information they are going to share and
with what precision, and the IIO core is just a way to do that. It
doesn't know anything beyond what is given by the provider.
The same is the case with the driver core, where an individual driver
sets its own private data and the driver core just gives that private
data back to the driver as and when it needs it, but parsing the private
data in the right way is up to the individual driver.

I fail to understand why that is not the case here.
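
The pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: toy_core, charger_info and the two helpers are hypothetical names, not part of the extcon, IIO or driver core APIs. The "core" stores a state word plus an opaque pointer and never interprets the pointer; only the provider and its consumer agree on the payload layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "core": keeps the state and an opaque pointer, nothing more. */
struct toy_core {
	unsigned int state;   /* attached/detached bits, set by the provider */
	void *priv;           /* provider-owned payload, opaque to the core */
};

/* Payload layout agreed between one provider and its consumers only. */
struct charger_info {
	unsigned long mA;
};

/* Provider side: publish the new state together with the payload. */
static void toy_core_update(struct toy_core *core, unsigned int state,
			    void *priv)
{
	core->state = state;
	core->priv = priv;
}

/* Consumer side: only the consumer knows how to interpret priv. */
static unsigned long charger_consumer_current(const struct toy_core *core)
{
	const struct charger_info *info = core->priv;

	assert(info != NULL);   /* the sketch assumes a payload was published */
	return core->state ? info->mA : 0;
}
```

With this split, adding a new kind of cable data would only touch the provider and the consumers that care about it, which is the argument made above for keeping the core limited to state.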
> 
> Hope above structure would be enough to represent the common cable state and
> its data. If a cable has more than two states, then extcon_update_state can be used to
> notify the consumer
> 1) Provider can just toggle the "state" argument to notify the consumer that cable state is changed
> OR
> 2) Add one more argument like extcon_update_state(struct extcon_dev *edev, u32 mask, u32 state1, u32 state2)
This will require changes to other drivers, such as arizona.
> If the state2 is set, then consumers can use get_cable_properties() to query the cable properties.
> State2 need to be used only if the cable need to represent more than two state
> 
> > 
> > Third,
> > If we finish to decide the properties for all cables, I'd like to see an
> > example code.
Why do we think that state and property are the only things the
consumer wants to know? I am sure there would be some more properties
which would be of interest to consumers, and there is quite a
possibility that in future we might get a patch for that also. So instead
of that, limiting it to just state is a good idea.
> 
> Agreed. If we agree on the above structure, I can modify the charging subsystem patch
> to use the same structure. (https://lkml.org/lkml/2012/10/18/219). This would give a reference
> for the new feature.
> 
> > 
> > You explained following changed state of USB according to Host state on
> > other patch.
> > ---
> > For USB2.0
> > 1) CONNECT (100mA)->UPDATE(500mA)->DISCONNECT(0mA)
> > 2) CONNECT (100mA)->UPDATE(500mA)->HOST SUSPEND(2.5mA/500mA)-
> > >DISCONNECT(0mA)
> > 3) CONNECT (100mA)->UPDATE(500mA)->HOST SUSPEND(2.5mA/500mA)-
> > >HOST RESUME(500mA)->DISCONNECT(0mA)
> > 
> > For USB 3.0
> > 4) CONNECT (150mA)->UPDATE(900mA)->DISCONNECT(0mA)
> > 5) CONNECT (150mA)->UPDATE(900mA)-> HOST SUSPEND(2.5mA/900mA)-
> > >DISCONNECT(0mA)
> > 6) CONNECT (100mA)->UPDATE(900mA)->HOST SUSPEND(2.5mA/900mA)-
> > >HOST RESUME(900mA)->DISCONNECT(0mA)
> > ---
> > 
> > I have a question. Can the provider device driver(e.g., extcon-max77693.c/
> > extcon-max8997.c) detect the changed state of host? I'd like to see the
> > example device driver that the provider device driver detect changed state
> > of host.
> > Could you provide example device driver?
> 
> Good question. The OTG drivers are capable of identifying the SUSPEND event.
> System cannot setup SDP (USB host) charging with maximum charging current
> (500mA for USB 2.0, 900mA for USB 3.0) without enumerating the USB. The USB enumeration can be
> done only with a USB/OTG driver. IMHO the above extcon drivers
> (extcon-max77693.c/ extcon-max8997.c) are not capable of doing the USB enumeration
> and identifying the charge current. They can just identify the charger type - 
> 

[RESEND] [PATCH] Documentation/java.txt: add Java 7 support

2012-11-09 Thread Jonathan Callen
The sample wrapper currently fails on some Java 7 .class files.  This
updates the wrapper to properly handle those files.

Signed-off-by: Jonathan Callen 
---
 Documentation/java.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/java.txt b/Documentation/java.txt
index e6a7232..4180205 100644
--- a/Documentation/java.txt
+++ b/Documentation/java.txt
@@ -188,6 +188,9 @@ shift
 #define CP_METHODREF 10
 #define CP_INTERFACEMETHODREF 11
 #define CP_NAMEANDTYPE 12
+#define CP_METHODHANDLE 15
+#define CP_METHODTYPE 16
+#define CP_INVOKEDYNAMIC 18
 
 /* Define some commonly used error messages */
 
@@ -242,14 +245,19 @@ void skip_constant(FILE *classfile, u_int16_t *cur)
break;
case CP_CLASS:
case CP_STRING:
+   case CP_METHODTYPE:
seekerr = fseek(classfile, 2, SEEK_CUR);
break;
+   case CP_METHODHANDLE:
+   seekerr = fseek(classfile, 3, SEEK_CUR);
+   break;
case CP_INTEGER:
case CP_FLOAT:
case CP_FIELDREF:
case CP_METHODREF:
case CP_INTERFACEMETHODREF:
case CP_NAMEANDTYPE:
+   case CP_INVOKEDYNAMIC:
seekerr = fseek(classfile, 4, SEEK_CUR);
break;
case CP_LONG:
-- 
1.7.12.4
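
For reference, the skip sizes used in the hunk above follow from the constant pool entry layouts in the class file format. The sketch below collects them in one place; cp_payload_size and cp_selfcheck are hypothetical helper names and are not part of the wrapper in Documentation/java.txt. The size is the number of bytes that follow the one-byte tag.

```c
#include <assert.h>

/* Constant pool tags as defined by the class file format. */
enum cp_tag {
	CP_UTF8 = 1,
	CP_INTEGER = 3,
	CP_FLOAT = 4,
	CP_LONG = 5,
	CP_DOUBLE = 6,
	CP_CLASS = 7,
	CP_STRING = 8,
	CP_FIELDREF = 9,
	CP_METHODREF = 10,
	CP_INTERFACEMETHODREF = 11,
	CP_NAMEANDTYPE = 12,
	CP_METHODHANDLE = 15,   /* added in Java 7 */
	CP_METHODTYPE = 16,     /* added in Java 7 */
	CP_INVOKEDYNAMIC = 18   /* added in Java 7 */
};

/*
 * Hypothetical helper: payload bytes following the tag byte.
 * Returns -1 for CP_UTF8 (variable length) and for unknown tags.
 */
static int cp_payload_size(int tag)
{
	switch (tag) {
	case CP_CLASS:
	case CP_STRING:
	case CP_METHODTYPE:
		return 2;
	case CP_METHODHANDLE:
		return 3;
	case CP_INTEGER:
	case CP_FLOAT:
	case CP_FIELDREF:
	case CP_METHODREF:
	case CP_INTERFACEMETHODREF:
	case CP_NAMEANDTYPE:
	case CP_INVOKEDYNAMIC:
		return 4;
	case CP_LONG:
	case CP_DOUBLE:
		return 8;   /* these entries also occupy two constant pool slots */
	default:
		return -1;
	}
}

/* Quick self-check of the three sizes the patch adds. */
static void cp_selfcheck(void)
{
	assert(cp_payload_size(CP_METHODHANDLE) == 3);
	assert(cp_payload_size(CP_METHODTYPE) == 2);
	assert(cp_payload_size(CP_INVOKEDYNAMIC) == 4);
}
```

A table-style helper like this makes it easier to see that each new tag lands in the same fseek() group as existing tags of the same size.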



NULL pointer dereference at task_numa_fault+0x36/0x140

2012-11-09 Thread Shuah Khan
I ran into a NULL pointer dereference at task_numa_fault+0x36/0x140 while
installing a guest OS in a VM in a KVM virtualization environment.

My test system doesn't have NUMA config and runs with Fake NUMA node:

[0.00] ACPI: Local APIC address 0xfee0
[0.00] No NUMA configuration found
[0.00] Faking a node at [mem
0x-0x7fdf]

Sharing my analysis of the problem and offer to help with re-test of any
fixes.

Further debugging narrowed the NULL pointer dereference to line 844 of
kernel/sched/fair.c: int seq = ACCESS_ONCE(p->mm->numa_scan_seq);

(gdb) x/10i task_numa_fault+0x36
   0x81093f36 : mov    0x358(%rax),%r8d
   0x81093f3d : cmp    0x768(%rbx),%r8d
   0x81093f44 : je     0x81093fc0
   0x81093f46 : mov    0xc48ba0(%rip),%esi        # 0x81cdcaec
   0x81093f4c : mov    %r8d,0x768(%rbx)
   0x81093f53 : test   %esi,%esi
   0x81093f55 : jle    0x81093fc0
   0x81093f57 : mov    $0x,%esi
   0x81093f5c : xor    %edi,%edi
   0x81093f5e : xor    %edx,%edx
(gdb) info line *0x81093f36
Line 844 of "kernel/sched/fair.c"
   starts at address 0x81093f2f 
   and ends at 0x81093f3d .

The following two commits change the way this code is structured, and the
second commit looks like the one that introduced the NULL pointer
access, possibly by removing struct task_struct *p = current;

+static void task_numa_placement(struct task_struct *p)
 {
unsigned long faults, max_faults = 0;
-   struct task_struct *p = current;
int node, max_node = -1;
int seq = ACCESS_ONCE(p->mm->numa_scan_seq);
 

commit f3bd8842a897685269b3fa48ad6f9d5590be67ab
Author: Peter Zijlstra 
Date:   Wed Oct 10 14:13:15 2012 +0200

sched/numa: Simplify task_numa_fault()


commit 617fe041711635713ec52ed5f36d6f46f38d83f2
Author: Peter Zijlstra 
Date:   Sun Oct 14 21:30:07 2012 +0200

sched/numa/mm: Fix and further simplify fault accounting

The THP alloc failure path did double accounting .. fix this.

While we're at it, merge task_numa_placement() into
task_numa_fault()
so that there's only a single call from the fault path.

Signed-off-by: Peter Zijlstra 
Link:
http://lkml.kernel.org/n/tip-hz6rnixgr665fv0offesj...@git.kernel.org
Signed-off-by: Ingo Molnar 

Also fix numa_scan_seq off by one.

Signed-off-by: Peter Zijlstra 
Link:
http://lkml.kernel.org/n/tip-dvswxo34oaiibm06zyvrv...@git.kernel.org
Signed-off-by: Ingo Molnar 



Panic log:

[30155.084514] BUG: unable to handle kernel NULL pointer dereference at
0358
[30155.084568] IP: [] task_numa_fault+0x36/0x140
[30155.084597] PGD 0
[30155.084611] Oops:  [#1] SMP
[30155.084635] Modules linked in: ip6table_filter ip6_tables ebtable_nat
ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables
x_tables bridge stp llc bnep rfcomm bluetooth arc4 iwldvm
snd_hda_codec_analog mac80211 snd_hda_intel snd_hda_codec radeon
snd_hwdep coretemp snd_pcm kvm_intel snd_seq_midi iwlwifi kvm
snd_rawmidi snd_seq_midi_event cfg80211 snd_seq ttm pata_pcmcia
drm_kms_helper drm snd_timer pcmcia snd_seq_device binfmt_misc snd
psmouse tpm_infineon yenta_socket ppdev joydev soundcore hp_wmi
snd_page_alloc dm_multipath hp_accel lpc_ich parport_pc pcmcia_rsrc
pcmcia_core video sparse_keymap serio_raw i2c_algo_bit wmi mac_hid
tpm_tis lis3lv02d input_polldev microcode lp parport firewire_ohci
firewire_core crc_itu_t sdhci_pci sdhci e1000e
[30155.085191] CPU 1
[30155.085204] Pid: 33, comm: ksmd Not tainted 3.7.0-rc2-next-20121026+
#5 Hewlett-Packard HP EliteBook 6930p/30DC
[30155.085241] RIP: 0010:[]  []
task_numa_fault+0x36/0x140
[30155.085274] RSP: 0018:88003076fc68  EFLAGS: 00010286
[30155.085297] RAX:  RBX: 880030761730 RCX:

[30155.085323] RDX:  RSI:  RDI:
88002fa89fe0
[30155.085349] RBP: 88003076fc88 R08: 88007fa96b80 R09:

[30155.086737] R10: 88002fa89fd8 R11:  R12:
0001
[30155.088006] R13:  R14: 880064a2a868 R15:
7fd1e1a47000
[30155.088006] FS:  () GS:88007fa8()
knlGS:
[30155.088006] CS:  0010 DS:  ES:  CR0: 8005003b
[30155.088006] CR2: 0358 CR3: 01c0b000 CR4:
000427e0
[30155.088006] DR0:  DR1:  DR2:

[30155.088006] DR3:  DR6: 0ff0 DR7:
0400
[30155.088006] Process ksmd (pid: 33, threadinfo 88003076e000, task
880030761730)
[30155.088006] Stack:
[30155.088006]  880078276f20 051fb120 880078276f20

[30155.088006]  88003076fd48 81154c59 851fb025
880064a4
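
The faulting process in the log is ksmd, a kernel thread, and kernel threads normally run without an mm of their own (p->mm is NULL). That would be consistent with the fault address 0358 in CR2 matching the 0x358 offset in the first instruction of the disassembly. A minimal user-space sketch of one possible guard follows. The *_mock types and function are hypothetical stand-ins for the kernel structures, and this is an illustration of the idea, not a tested kernel patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins; field names follow the report. */
struct mm_mock {
	int numa_scan_seq;
};

struct task_mock {
	struct mm_mock *mm;     /* NULL for kernel threads such as ksmd */
	int numa_scan_seq;
};

/*
 * Mirrors the start of the placement code: read the mm's scan sequence
 * and bail out if this task has already seen it.
 * Returns 1 if placement would run, 0 if it is skipped.
 */
static int task_numa_placement_mock(struct task_mock *p)
{
	int seq;

	assert(p != NULL);

	if (!p->mm)             /* hypothetical guard for tasks without an mm */
		return 0;

	seq = p->mm->numa_scan_seq;
	if (p->numa_scan_seq == seq)
		return 0;       /* nothing new since the last scan */

	p->numa_scan_seq = seq;
	return 1;
}
```

Whether the check belongs here or in the caller on the fault path is a design choice the sketch does not try to settle.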

Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
Hi,

I'm having an issue booting current linux-next kernels on my test
machines. Userspace crashes when it's supposed to pivot to the rootfs.
With the loglevel=8 kernel parameter, the last messages I see are:

Checking root filesystem in pivot_root init.
[6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
[6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
[6.292302] traps: hotplug[1633] general protection ip:f767c06b
sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]

I ran a bisection and it turns out that
e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
fork/vfork/clone) is the commit breaking my setup. When reverting
that, I am able to boot linux-next (or mmotm, which is what I was
trying to do in the first place) without issues.

Sorry for not having a more complete root cause at the moment - I'm
lacking some context as to what the change is trying to do.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


tty_ldisc_hangup: waiting (init) for ttyS0 took too long, but we keep waiting...

2012-11-09 Thread Sasha Levin
Hi all,

I'm seeing lots of cases when my fuzzing session hangs with a message that
starts with:

[  104.670841] tty_ldisc_hangup: waiting (init) for ttyS0 took too long, but we keep waiting...

And continues with a hung task spew, such as:

[  242.990329] INFO: task init:1 blocked for more than 120 seconds.
[  242.990955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.991527] initD 8800132c  3408 1  0 0x0002
[  242.992156]  88001327dc18 0002 88001327dbd8 81152f95
[  242.992784]  88001328 88001327dfd8 88001327dfd8 88001327dfd8
[  242.994200]  8800132c 88001328 880013280910 7fff
[  242.995780] Call Trace:
[  242.996429]  [] ? sched_clock_local+0x25/0xa0
[  242.997704]  [] schedule+0x55/0x60
[  242.999864]  [] schedule_timeout+0x45/0x360
[  243.008415]  [] ? _raw_spin_unlock_irqrestore+0x5d/0xb0
[  243.008980]  [] ? trace_hardirqs_on+0xd/0x10
[  243.009756]  [] ? _raw_spin_unlock_irqrestore+0x84/0xb0
[  243.010662]  [] ? prepare_to_wait+0x77/0x90
[  243.011452]  [] tty_ldisc_wait_idle.isra.6+0x76/0xb0
[  243.012314]  [] ? abort_exclusive_wait+0xb0/0xb0
[  243.013157]  [] tty_ldisc_hangup+0x1cb/0x320
[  243.013927]  [] ? __tty_hangup+0x122/0x430
[  243.014687]  [] __tty_hangup+0x12a/0x430
[  243.015410]  [] ? _raw_spin_unlock_irqrestore+0x84/0xb0
[  243.016321]  [] disassociate_ctty+0x6a/0x230
[  243.017112]  [] do_exit+0x4ea/0xbd0
[  243.017793]  [] ? rcu_user_exit+0xc5/0xf0
[  243.018549]  [] ? trace_hardirqs_on+0xd/0x10
[  243.019339]  [] do_group_exit+0x84/0xd0
[  243.020109]  [] sys_exit_group+0x12/0x20
[  243.020815]  [] tracesys+0xe1/0xe6
[  243.021607] 1 lock held by init/1:
[  243.022079]  #0:  (>ldisc_mutex){+.+...}, at: [] tty_ldisc_hangup+0x122/0x320

All of this on latest -next, inside a KVM tools guest.

Help appreciated.


Thanks,
Sasha


Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Al Viro
On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote:
> Hi,
> 
> I'm having an issue booting current linux-next kernels on my test
> machines. Userspace crashes when it's supposed to pivot to the rootfs.
> With the loglevel=8 kernel parameter, the last messages I see are:
> 
> Checking root filesystem in pivot_root init.
> [6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
> [6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
> [6.292302] traps: hotplug[1633] general protection ip:f767c06b
> sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]
> 
> I ran a bisection and it turns out that
> e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
> fork/vfork/clone) is the commit breaking my setup. When reverting
> that, I am able to boot linux-next (or mmotm, which is what I was
> trying to do in the first place) without issues.
> 
> Sorry for not having a more complete root cause at the moment - I'm
> lacking some context as to what the change is trying to do.

Hmm...  32bit native, presumably?


Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 8:51 PM, Al Viro  wrote:
> On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote:
>> Hi,
>>
>> I'm having an issue booting current linux-next kernels on my test
>> machines. Userspace crashes when it's supposed to pivot to the rootfs.
>> With the loglevel=8 kernel parameter, the last messages I see are:
>>
>> Checking root filesystem in pivot_root init.
>> [6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
>> [6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
>> [6.292302] traps: hotplug[1633] general protection ip:f767c06b
>> sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]
>>
>> I ran a bisection and it turns out that
>> e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
>> fork/vfork/clone) is the commit breaking my setup. When reverting
>> that, I am able to boot linux-next (or mmotm, which is what I was
>> trying to do in the first place) without issues.
>>
>> Sorry for not having a more complete root cause at the moment - I'm
>> lacking some context as to what the change is trying to do.
>
> Hmm...  32bit native, presumably?

This is running on a x86_64 system; I believe the userspace binaries
should be 64-bit as well.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH] perf: call perf_event_comm under task_lock to fix suspicious rcu usage

2012-11-09 Thread Hannes Frederic Sowa
The following RCU warning showed up while executing a shebang script under
perf-record (it could even be an empty script) on a 3.7-rc4 stable kernel:

  [   32.185108] 
  [   32.185332] ===
  [   32.185602] [ INFO: suspicious RCU usage. ]
  [   32.185903] 3.7.0-rc4 #1 Not tainted
  [   32.186021] ---
  [   32.186021] include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!
  [   32.186021] 
  [   32.186021] other info that might help us debug this:
  [   32.186021] 
  [   32.186021] 
  [   32.186021] rcu_scheduler_active = 1, debug_locks = 0
  [   32.186021] 1 lock held by empty.sh/556:
  [   32.186021]  #0:  (>cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x36/0x80
  [   32.186021] 
  [   32.186021] stack backtrace:
  [   32.186021] Pid: 556, comm: empty.sh Not tainted 3.7.0-rc4 #1
  [   32.186021] Call Trace:
  [   32.186021]  [] lockdep_rcu_suspicious+0xfd/0x130
  [   32.186021]  [] perf_event_comm+0x436/0x610
  [   32.186021]  [] ? trace_hardirqs_off+0xd/0x10
  [   32.186021]  [] ? local_clock+0x6f/0x80
  [   32.186021]  [] ? lock_release_holdtime.part.26+0xf/0x180
  [   32.186021]  [] set_task_comm+0x73/0x180
  [   32.186021]  [] setup_new_exec+0x9a/0x210
  [   32.186021]  [] load_elf_binary+0x3e3/0x1ab0
  [   32.186021]  [] ? sched_clock_local+0x25/0xa0
  [   32.186021]  [] ? sched_clock_cpu+0xa8/0x120
  [   32.186021]  [] ? trace_hardirqs_off+0xd/0x10
  [   32.186021]  [] ? local_clock+0x6f/0x80
  [   32.186021]  [] ? load_elf_library+0x240/0x240
  [   32.186021]  [] ? load_elf_library+0x240/0x240
  [   32.186021]  [] search_binary_handler+0x194/0x4f0
  [   32.186021]  [] ? search_binary_handler+0x5f/0x4f0
  [   32.186021]  [] ? compat_sys_ioctl+0x1510/0x1510
  [   32.186021]  [] load_script+0x294/0x2c0
  [   32.186021]  [] ? lock_release_holdtime.part.26+0xf/0x180
  [   32.186021]  [] ? compat_sys_ioctl+0x1510/0x1510
  [   32.186021]  [] search_binary_handler+0x194/0x4f0
  [   32.186021]  [] ? search_binary_handler+0x5f/0x4f0
  [   32.186021]  [] do_execve_common.isra.25+0x50b/0x5b0
  [   32.186021]  [] ? do_execve_common.isra.25+0x12a/0x5b0
  [   32.186021]  [] do_execve+0x1b/0x20
  [   32.186021]  [] sys_execve+0x54/0x80
  [   32.186021]  [] stub_execve+0x69/0xc0

I think this dereference qualifies for the task_lock exception (as noted
in kernel/cgroup.c), so this patch makes sure perf_event_comm is called
before the task_lock is released.

Changelog -v2 (since <20121103235758.gd18...@order.stressinduktion.org>):
  1) rebased to 3.7-rc4
  2) slightly improved/updated commit msg and added more people to Cc

Cc: Peter Zijlstra 
Cc: Paul Mackerras 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Signed-off-by: Hannes Frederic Sowa 
---
 fs/exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 0039055..a961b9d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1038,8 +1038,8 @@ void set_task_comm(struct task_struct *tsk, char *buf)
memset(tsk->comm, 0, TASK_COMM_LEN);
wmb();
strlcpy(tsk->comm, buf, sizeof(tsk->comm));
-   task_unlock(tsk);
perf_event_comm(tsk);
+   task_unlock(tsk);
 }
 
 static void filename_to_taskname(char *tcomm, const char *fn, unsigned int len)
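
The effect of the reordering can be shown with a small user-space mock. Everything suffixed with _mock is a hypothetical stand-in, not kernel code: the lock is modelled as a flag and the perf hook simply records whether the flag was set when it ran.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

struct task_mock {
	int alloc_lock_held;        /* stands in for the task lock */
	int comm_event_saw_lock;    /* recorded by the mock perf hook */
	char comm[TASK_COMM_LEN];
};

static void task_lock_mock(struct task_mock *tsk)
{
	assert(!tsk->alloc_lock_held);
	tsk->alloc_lock_held = 1;
}

static void task_unlock_mock(struct task_mock *tsk)
{
	assert(tsk->alloc_lock_held);
	tsk->alloc_lock_held = 0;
}

/* Records whether the caller still held the lock when the hook ran. */
static void perf_event_comm_mock(struct task_mock *tsk)
{
	tsk->comm_event_saw_lock = tsk->alloc_lock_held;
}

/* Same ordering as the patched set_task_comm(): hook first, then unlock. */
static void set_task_comm_mock(struct task_mock *tsk, const char *buf)
{
	task_lock_mock(tsk);
	memset(tsk->comm, 0, TASK_COMM_LEN);
	/* strlcpy() is not portable in user space; the memset keeps the NUL. */
	strncpy(tsk->comm, buf, TASK_COMM_LEN - 1);
	perf_event_comm_mock(tsk);
	task_unlock_mock(tsk);
}
```

With the original ordering the hook would observe alloc_lock_held == 0, which corresponds to the situation the lockdep warning complains about.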



Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Al Viro
On Fri, Nov 09, 2012 at 08:57:58PM -0800, Michel Lespinasse wrote:
> On Fri, Nov 9, 2012 at 8:51 PM, Al Viro  wrote:
> > On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote:
> >> Hi,
> >>
> >> I'm having an issue booting current linux-next kernels on my test
> >> machines. Userspace crashes when it's supposed to pivot to the rootfs.
> >> With the loglevel=8 kernel parameter, the last messages I see are:
> >>
> >> Checking root filesystem in pivot_root init.
> >> [6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
> >> [6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
> >> [6.292302] traps: hotplug[1633] general protection ip:f767c06b
> >> sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]
> >>
> >> I ran a bisection and it turns out that
> >> e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
> >> fork/vfork/clone) is the commit breaking my setup. When reverting
> >> that, I am able to boot linux-next (or mmotm, which is what I was
> >> trying to do in the first place) without issues.
> >>
> >> Sorry for not having a more complete root cause at the moment - I'm
> >> lacking some context as to what the change is trying to do.
> >
> > Hmm...  32bit native, presumably?
> 
> This is running on a x86_64 system; I believe the userspace binaries
> should be 64-bit as well.

Curious...  After the second look at that sucker, it seems that you have
32bit hotplug(8) in there, and yes, it's clearly a 64bit kernel...  Could
you check which binary it is and whether it's really 32bit or not?


Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Michel Lespinasse
On Fri, Nov 9, 2012 at 9:33 PM, Al Viro  wrote:
> On Fri, Nov 09, 2012 at 08:57:58PM -0800, Michel Lespinasse wrote:
>> On Fri, Nov 9, 2012 at 8:51 PM, Al Viro  wrote:
>> > On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote:
>> >> Hi,
>> >>
>> >> I'm having an issue booting current linux-next kernels on my test
>> >> machines. Userspace crashes when it's supposed to pivot to the rootfs.
>> >> With the loglevel=8 kernel parameter, the last messages I see are:
>> >>
>> >> Checking root filesystem in pivot_root init.
>> >> [6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
>> >> [6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
>> >> [6.292302] traps: hotplug[1633] general protection ip:f767c06b
>> >> sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]
>> >>
>> >> I ran a bisection and it turns out that
>> >> e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
>> >> fork/vfork/clone) is the commit breaking my setup. When reverting
>> >> that, I am able to boot linux-next (or mmotm, which is what I was
>> >> trying to do in the first place) without issues.
>> >>
>> >> Sorry for not having a more complete root cause at the moment - I'm
>> >> lacking some context as to what the change is trying to do.
>> >
>> > Hmm...  32bit native, presumably?
>>
>> This is running on a x86_64 system; I believe the userspace binaries
>> should be 64-bit as well.
>
> Curious...  After the second look at that sucker, it seems that you have
> 32bit hotplug(8) in there, and yes, it's clearly a 64bit kernel...  Could
> you check which binary it is and whether it's really 32bit or not?

Looks like /sbin/hotplug is a script on this system, using /bin/bash
as the interpreter, and /bin/bash is ELF 32-bit LSB executable.
(wow, I had no idea, I thought more of that system was 64-bits :)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH] Staging: comedi: drivers: replaced printk with dev_dbg

2012-11-09 Thread Arpith Easow Alexander
This is a patch to the vmk80xx.c file that replaces the printk with dev_dbg.
This fixes the warnings found by the checkpatch.pl tool.

Signed-off-by: Arpith Easow Alexander 
---
 drivers/staging/comedi/drivers/vmk80xx.c |   30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/comedi/drivers/vmk80xx.c b/drivers/staging/comedi/drivers/vmk80xx.c
index df277aa..6eb5361 100644
--- a/drivers/staging/comedi/drivers/vmk80xx.c
+++ b/drivers/staging/comedi/drivers/vmk80xx.c
@@ -131,10 +131,10 @@ static int dbgcm = 1;
 static int dbgcm;
 #endif
 
-#define dbgcm(fmt, arg...) \
+#define dbgcm(dev, fmt, arg...) \
 do {   \
if (dbgcm) \
-   printk(KERN_DEBUG fmt, ##arg); \
+   dev_dbg(dev, fmt, ##arg); \
 } while (0)
 
 enum vmk80xx_model {
@@ -213,8 +213,9 @@ static void vmk80xx_tx_callback(struct urb *urb)
 
if (stat && !(stat == -ENOENT
  || stat == -ECONNRESET || stat == -ESHUTDOWN))
-   dbgcm("comedi#: vmk80xx: %s - nonzero urb status (%d)\n",
- __func__, stat);
+   dbgcm(&(dev->udev->dev),
+   "comedi#: vmk80xx: %s - nonzero urb status (%d)\n",
+   __func__, stat);
 
if (!test_bit(TRANS_OUT_BUSY, >flags))
return;
@@ -237,7 +238,8 @@ static void vmk80xx_rx_callback(struct urb *urb)
case -ESHUTDOWN:
break;
default:
-   dbgcm("comedi#: vmk80xx: %s - nonzero urb status (%d)\n",
+   dbgcm(&(dev->udev->dev),
+ "comedi#: vmk80xx: %s - nonzero urb status (%d)\n",
  __func__, stat);
goto resubmit;
}
@@ -1371,14 +1373,16 @@ static int vmk80xx_usb_probe(struct usb_interface *intf,
 
if (dev->board.model == VMK8061_MODEL) {
vmk80xx_read_eeprom(dev, IC3_VERSION);
-   printk(KERN_INFO "comedi#: vmk80xx: %s\n", dev->fw.ic3_vers);
+   dev_dbg(&(dev->udev->dev),
+   "comedi#: vmk80xx: %s\n", dev->fw.ic3_vers);
 
if (vmk80xx_check_data_link(dev)) {
vmk80xx_read_eeprom(dev, IC6_VERSION);
-   printk(KERN_INFO "comedi#: vmk80xx: %s\n",
+   dev_dbg(&(dev->udev->dev), "comedi#: vmk80xx: %s\n",
   dev->fw.ic6_vers);
} else {
-   dbgcm("comedi#: vmk80xx: no conn. to CPU\n");
+   dbgcm(&(dev->udev->dev),
+   "comedi#: vmk80xx: no conn. to CPU\n");
}
}
 
@@ -1387,8 +1391,9 @@ static int vmk80xx_usb_probe(struct usb_interface *intf,
 
dev->probed = 1;
 
-   printk(KERN_INFO "comedi#: vmk80xx: board #%d [%s] now attached\n",
-  dev->count, dev->board.name);
+   dev_dbg(&(dev->udev->dev),
+   "comedi#: vmk80xx: board #%d [%s] now attached\n",
+   dev->count, dev->board.name);
 
mutex_unlock(_mutex);
 
@@ -1422,8 +1427,9 @@ static void vmk80xx_usb_disconnect(struct usb_interface *intf)
kfree(dev->usb_rx_buf);
kfree(dev->usb_tx_buf);
 
-   printk(KERN_INFO "comedi#: vmk80xx: board #%d [%s] now detached\n",
-  dev->count, dev->board.name);
+   dev_dbg(&(dev->udev->dev),
+   "comedi#: vmk80xx: board #%d [%s] now detached\n",
+   dev->count, dev->board.name);
 
up(>limit_sem);
mutex_unlock(_mutex);
-- 
1.7.9.5
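
The macro change above threads a device pointer through the debug wrapper so that messages are attributed to a device. A user-space sketch of that shape is below; device_mock, dev_dbg_mock, last_msg and dbgcm_enabled are hypothetical stand-ins, and snprintf into a buffer replaces the kernel log. Note that dev_dbg() logs at debug level, so the messages that were printk(KERN_INFO) before the patch would no longer appear at info level; dev_info() would be the helper that keeps the old level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct device_mock {
	const char *name;
};

static char last_msg[128];      /* captures the most recent message */
static int dbgcm_enabled = 1;   /* plays the role of the dbgcm flag */

/* Stand-in for dev_dbg(): prefixes the message with the device name. */
#define dev_dbg_mock(dev, fmt, ...) \
	snprintf(last_msg, sizeof(last_msg), "%s: " fmt, (dev)->name, ##__VA_ARGS__)

/* Same shape as the patched macro; ##__VA_ARGS__ is a GNU extension. */
#define dbgcm(dev, fmt, ...) \
do { \
	if (dbgcm_enabled) \
		dev_dbg_mock(dev, fmt, ##__VA_ARGS__); \
} while (0)

/* Example caller modelled on the urb status messages in the diff. */
static void report_urb_status(struct device_mock *dev, int stat)
{
	assert(dev != NULL);
	dbgcm(dev, "%s - nonzero urb status (%d)\n", __func__, stat);
}

/* Helper for checking what was captured. */
static int last_msg_equals(const char *expected)
{
	return strcmp(last_msg, expected) == 0;
}
```

Because the device name is now supplied by the helper, the literal "comedi#: vmk80xx:" prefix in the format strings becomes largely redundant and could be dropped in a follow-up.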



Re: Issues with "x86, um: switch to generic fork/vfork/clone" commit

2012-11-09 Thread Al Viro
On Fri, Nov 09, 2012 at 09:47:54PM -0800, Michel Lespinasse wrote:
> On Fri, Nov 9, 2012 at 9:33 PM, Al Viro  wrote:
> > On Fri, Nov 09, 2012 at 08:57:58PM -0800, Michel Lespinasse wrote:
> >> On Fri, Nov 9, 2012 at 8:51 PM, Al Viro  wrote:
> >> > On Fri, Nov 09, 2012 at 08:36:53PM -0800, Michel Lespinasse wrote:
> >> >> Hi,
> >> >>
> >> >> I'm having an issue booting current linux-next kernels on my test
> >> >> machines. Userspace crashes when it's supposed to pivot to the rootfs.
> >> >> With the loglevel=8 kernel parameter, the last messages I see are:
> >> >>
> >> >> Checking root filesystem in pivot_root init.
> >> >> [6.252717] usb 2-1: link qh256-0001/880853ec9ab8 start 1 [1/0 us]
> >> >> [6.259419] hub 2-1:1.0: state 7 ports 8 chg  evt 
> >> >> [6.292302] traps: hotplug[1633] general protection ip:f767c06b
> >> >> sp:ffbb2d1c error:0 in libc-2.3.6.so[f7652000+126000]
> >> >>
> >> >> I ran a bisection and it turns out that
> >> >> e52d03a3775841cc68d0ea9d86f2f09b603c41e6 (x86, um: switch to generic
> >> >> fork/vfork/clone) is the commit breaking my setup. When reverting
> >> >> that, I am able to boot linux-next (or mmotm, which is what I was
> >> >> trying to do in the first place) without issues.
> >> >>
> >> >> Sorry for not having a more complete root cause at the moment - I'm
> >> >> lacking some context as to what the change is trying to do.
> >> >
> >> > Hmm...  32bit native, presumably?
> >>
> >> This is running on a x86_64 system; I believe the userspace binaries
> >> should be 64-bit as well.
> >
> > Curious...  After the second look at that sucker, it seems that you have
> > 32bit hotplug(8) in there, and yes, it's clearly a 64bit kernel...  Could
> > you check which binary it is and whether it's really 32bit or not?
> 
> Looks like /sbin/hotplug is a script on this system, using /bin/bash
> as the interpreter, and /bin/bash is ELF 32-bit LSB executable.
> (wow, I had no idea, I thought more of that system was 64-bits :)

I think I see what's going on there.  It's PTREGSCALL blindly used for
clone wrapper in ia32entry.S.  FWIW, it's wrong for all of those
suckers, anyway:
* fork/clone/vfork need to save extra registers, but don't need
to restore them; after unification we don't need pt_regs argument for any
of those - for fork/vfork it's useless, for clone it breaks things.
* execve doesn't need pt_regs argument; harmless, but useless.
* for sigaltstack() we simply need to get rid of stupid pt_regs
argument, along with the wrapper; current_pt_regs()->sp is all it needs.
* for sigreturn/rt_sigreturn we need to restore extra registers,
but we do *not* need to save them; just leave the space on stack.  And
no need to pass pt_regs either - it'll be current_pt_regs() anyway.
* iopl() doesn't need to save/restore extras and it doesn't need
pt_regs argument - it's going to be current_pt_regs().

On top of all that, there's an extra piece of crap - different order of
arguments for native and compat clone.

Could you verify that this on top of for-next gets the things working again?
It's a very lazy way to deal with that (we don't want to bother with
restoring extras, at the very least), but the rest can go separately (and
is shared with mainline, unlike that one).  It seems to be working here,
but I'd like to see your ACK as well.  If everything works, it'll get
folded into the offending commit...

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 633649e..32e6f05 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -467,11 +467,16 @@ GLOBAL(\label)
PTREGSCALL stub32_sigaltstack, sys32_sigaltstack, %rdx
PTREGSCALL stub32_execve, compat_sys_execve, %rcx
PTREGSCALL stub32_fork, sys_fork, %rdi
-   PTREGSCALL stub32_clone, sys_clone, %rdx
PTREGSCALL stub32_vfork, sys_vfork, %rdi
PTREGSCALL stub32_iopl, sys_iopl, %rsi
 
ALIGN
+GLOBAL(stub32_clone)
+   leaq sys_clone(%rip),%rax
+   mov %r8, %rcx
+   jmp  ia32_ptregs_common 
+
+   ALIGN
 ia32_ptregs_common:
popq %r11
CFI_ENDPROC

Anyway, below is the minimal fix on top of for-next; I'll fold it
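
As I read the stub above, the "mov %r8, %rcx" copies the fifth compat argument into the fourth argument register, because 32-bit x86 clone passes tls fourth and child_tid fifth, while the generic entry point expects child_tid fourth. A C rendering of that reordering is sketched below. Both functions are hypothetical mocks with simplified argument types, and how the kernel itself obtains tls is left out of the sketch.

```c
#include <assert.h>

/* Mock of the generic entry point: flags, newsp, parent_tid, child_tid. */
static long clone_generic_mock(unsigned long flags, unsigned long newsp,
			       unsigned long parent_tid,
			       unsigned long child_tid)
{
	(void)flags;
	(void)newsp;
	(void)parent_tid;
	return (long)child_tid;   /* echo the 4th argument for inspection */
}

/*
 * Mock of the compat stub: the 32-bit caller passes tls 4th and
 * child_tid 5th, so child_tid has to be moved into the 4th slot.
 */
static long clone_compat_shim_mock(unsigned long flags, unsigned long newsp,
				   unsigned long parent_tid,
				   unsigned long tls,
				   unsigned long child_tid)
{
	(void)tls;                /* not forwarded in this sketch */
	return clone_generic_mock(flags, newsp, parent_tid, child_tid);
}

/* Self-check: the generic side must see child_tid, not tls. */
static void clone_shim_selfcheck(void)
{
	assert(clone_compat_shim_mock(0, 0, 0, 0x1111, 0x2222) == 0x2222);
}
```

Without the reordering, the generic code would treat the tls value as the child_tid pointer, which would be one way to end up with the userspace crash reported earlier in the thread.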


Re: [PATCH v3 8/7] pppoatm: fix missing wakeup in pppoatm_send()

2012-11-09 Thread David Woodhouse
On Fri, 2012-11-09 at 16:30 -0500, David Miller wrote:
> I don't know what to do with this patch because I don't have any
> context whatsoever.

I sent two replies to Krzysztof's series starting with [PATCH v3 0/7]  
in Message-Id: <1352240222-363-1-git-send-email-krzys...@podlesie.net>

The first was pointing out a problem; the second was a [PATCH v3 8/7]
which fixed the problem I'd pointed out. It wasn't clear to me that more
context would be needed. In particular, [PATCH v3 8/7] was a reply to
0/7, just as the other patches ##1-7 had been.

> So I'm tossing it, please resubmit it when it's meant to be
> applied, and with some context.

That's OK. I was hoping for an ack from Chas and/or Krzysztof,
especially as I hadn't tested my patch. So hopefully there'll be a v4
series of 8 patches, including this one... and all from the same person,
which makes it slightly easier to follow :)

-- 
dwmw2





[ PATCH RESEND ] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-09 Thread Pandarathil, Vijaymohan R
When an error is detected on a PCIe device which does not have an
AER-aware driver, prevent AER infrastructure from reporting
successful error recovery.

This is because the report_error_detected() function that gets
called in the first phase of the recovery process allows forward
progress even when the driver for the device does not have AER
capabilities. It seems that none of the callbacks (in the
pci_error_handlers structure) that drivers register for error
recovery is mandatory. So the intention of the infrastructure
design seems to be to allow forward progress even when a specific
callback has not been registered by a driver. However, if the error
handler structure itself has not been registered, it doesn't make
sense to allow forward progress.

As a result of the current design, in the case of a single device
having an AER-unaware driver or in the case of any function in a
multi-function card having an AER-unaware driver, a successful
recovery is reported.

A typical scenario in which this happens is when a PCI device is
detached from a KVM host and the pci-stub driver on the host claims
the device. The pci-stub driver does not have error handling
capabilities, but the AER infrastructure still reports that the
device recovered successfully.

The changes proposed here leave the device in an unrecovered state
if the driver for the device, or for any function in a multi-function
card, does not have an error handler structure registered. This
reflects the true state of the device and prevents a partial recovery
(or no recovery at all) from being reported as successful.
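
The rule described above can be sketched as a small user-space model.
This is only an illustration, not kernel code: the names model_dev,
model_result, model_is_aer_unaware() and model_card_result() are
hypothetical, and the sketch assumes that one AER-unaware function is
enough to end the walk over the card, which is how the early
"return 1" in the patch below appears to be intended.

```c
/*
 * User-space model of the proposed check; NOT kernel code.
 * Every name here (model_dev, model_result, ...) is hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_result {
	MODEL_RESULT_NONE,        /* no recovery can be claimed */
	MODEL_RESULT_CAN_RECOVER, /* every function can take part */
};

struct model_dev {
	bool has_driver;      /* a driver is bound to the function */
	bool has_err_handler; /* the driver registered an error handler structure */
	bool is_bridge;       /* bridges are exempt from the check */
};

/* True when the function cannot take part in recovery at all. */
static bool model_is_aer_unaware(const struct model_dev *dev)
{
	return (!dev->has_driver || !dev->has_err_handler) && !dev->is_bridge;
}

/*
 * Walk every function of a card. A single AER-unaware function leaves
 * the whole card unrecovered and stops the walk early.
 */
static enum model_result model_card_result(const struct model_dev *devs,
					   size_t n)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (model_is_aer_unaware(&devs[i]))
			return MODEL_RESULT_NONE;
	}
	return MODEL_RESULT_CAN_RECOVER;
}
```

In this model, a two-function card where one function is bound to a
driver without an error handler structure (as with pci-stub) yields
MODEL_RESULT_NONE, while a bridge without a driver does not block
recovery on its own.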

Please also see comments from Linas Vepstas at the following link
http://www.spinics.net/lists/linux-pci/msg18572.html

Reviewed-by: Linas Vepstas  gmail.com>
Reviewed-by: Myron Stowe  redhat.com>
Signed-off-by: Vijay Mohan Pandarathil  hp.com>

---

drivers/pci/pcie/aer/aerdrv_core.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 06bad96..030b229 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -215,6 +215,12 @@ static int report_error_detected(struct pci_dev *dev, void *data)
 
dev->error_state = result_data->state;
 
+   if ((!dev->driver || !dev->driver->err_handler) &&
+   !(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)) {
+   dev_info(&dev->dev, "AER: Error detected but no driver has claimed this device or the driver is AER-unaware\n");
+   result_data->result = PCI_ERS_RESULT_NONE;
+   return 1;
+   }
if (!dev->driver ||
!dev->driver->err_handler ||
!dev->driver->err_handler->error_detected) {

