Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
On Mon, May 21, 2018 at 4:10 PM, Mark Rutland  wrote:
> On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
>> Hi Ganapat,
>>
>>
>> Sorry for the delay in replying; I was away most of last week.
>>
>> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
>> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
>> > wrote:
>> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  
>> > > wrote:
>> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>
>> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel 
>> > >>> *pmu_uncore)
>> > >>> +{
>> > >>> + int counter;
>> > >>> +
>> > >>> + raw_spin_lock(_uncore->lock);
>> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> > >>> + pmu_uncore->uncore_dev->max_counters);
>> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> + return -ENOSPC;
>> > >>> + }
>> > >>> + set_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> + return counter;
>> > >>> +}
>> > >>> +
>> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel 
>> > >>> *pmu_uncore,
>> > >>> + int counter)
>> > >>> +{
>> > >>> + raw_spin_lock(_uncore->lock);
>> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> +}
>> > >>
>> > >> I don't believe that locking is required in either of these, as the perf
>> > >> core serializes pmu::add() and pmu::del(), where these get called.
>> >
>> > without this locking, i am seeing "BUG: scheduling while atomic" when
>> > i run perf with more events together than the maximum counters
>> > supported
>>
>> Did you manage to get to the bottom of this?
>>
>> Do you have a backtrace?
>>
>> It looks like in your latest posting you reserve counters through the
>> userspace ABI, which doesn't seem right to me, and I'd like to
>> understand the problem.
>
> Looks like I misunderstood -- those are still allocated kernel-side.
>
> I'll follow that up in the v5 posting.

please review v5.
>
> Thanks,
> Mark.

thanks
Ganapat


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
On Mon, May 21, 2018 at 4:10 PM, Mark Rutland  wrote:
> On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
>> Hi Ganapat,
>>
>>
>> Sorry for the delay in replying; I was away most of last week.
>>
>> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
>> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
>> > wrote:
>> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  
>> > > wrote:
>> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>
>> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel 
>> > >>> *pmu_uncore)
>> > >>> +{
>> > >>> + int counter;
>> > >>> +
>> > >>> + raw_spin_lock(_uncore->lock);
>> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> > >>> + pmu_uncore->uncore_dev->max_counters);
>> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> + return -ENOSPC;
>> > >>> + }
>> > >>> + set_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> + return counter;
>> > >>> +}
>> > >>> +
>> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel 
>> > >>> *pmu_uncore,
>> > >>> + int counter)
>> > >>> +{
>> > >>> + raw_spin_lock(_uncore->lock);
>> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
>> > >>> + raw_spin_unlock(_uncore->lock);
>> > >>> +}
>> > >>
>> > >> I don't believe that locking is required in either of these, as the perf
>> > >> core serializes pmu::add() and pmu::del(), where these get called.
>> >
>> > without this locking, i am seeing "BUG: scheduling while atomic" when
>> > i run perf with more events together than the maximum counters
>> > supported
>>
>> Did you manage to get to the bottom of this?
>>
>> Do you have a backtrace?
>>
>> It looks like in your latest posting you reserve counters through the
>> userspace ABI, which doesn't seem right to me, and I'd like to
>> understand the problem.
>
> Looks like I misunderstood -- those are still allocated kernel-side.
>
> I'll follow that up in the v5 posting.

please review v5.
>
> Thanks,
> Mark.

thanks
Ganapat


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
Hi Mark,

On Mon, May 21, 2018 at 4:25 PM, Mark Rutland  wrote:
> On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
>> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
>> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>
>> >> + *
>> >> + *  L3 Tile and DMC channel selection is through SMC call
>> >> + *  SMC call arguments,
>> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
>> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
>> >> + *   x2 = Node id
>> >
>> > How do we map Linux node IDs to the firmware's view of node IDs?
>> >
>> > I don't believe the two are necessarily the same -- Linux's node IDs are
>> > a Linux-specific construct.
>>
>> both are same, it is numa node id from ACPI/firmware.
>
> I am very wary about assuming that the Linux nid will always be the same
> as the ACPI node id.
>
> For that to *potentially* be true, this driver should depend on
> CONFIG_NUMA, NUMA must not be disabled on the command line, etc, or the
> node id will always be NUMA_NO_NODE.

ok, i can check the node id which we get from ACPI helpers in probe.
if it is NUMA_NO_NODE, I will init first socket uncore only and nid
param to fw is always zero?

>
> I would be *much* happier if we had an explicit mapping somewhere to the
> ID the FW expects.
>
>> > It would be much nicer if we could pass something based on the MPIDR,
>> > which is a known HW construct, or if this implicitly affected the
>> > current node.
>>
>> IMO,  node id is sufficient.
>
> I agree that *a* node ID is sufficient, I just don't think that we're
> guaranteed to have the specific node ID the FW wants.

for thunderx2 which is 2 socket only platform, pxm and nid should be
same(either 0 or 1)
however, i can send PXM id(node_to_pxm) to firmware to make it more sane.

>
>> > It would be vastly more sane for this to not be muxed at all. :/
>>
>> i am helpless due to crappy hw design!
>
> I'm certainly not blaming you for this! :)
>
> I hope the HW designers don't make the same mistake in future, though...
>
> Thanks,
> Mark.

thanks
Ganapat


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Ganapatrao Kulkarni
Hi Mark,

On Mon, May 21, 2018 at 4:25 PM, Mark Rutland  wrote:
> On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
>> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
>> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>
>> >> + *
>> >> + *  L3 Tile and DMC channel selection is through SMC call
>> >> + *  SMC call arguments,
>> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
>> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
>> >> + *   x2 = Node id
>> >
>> > How do we map Linux node IDs to the firmware's view of node IDs?
>> >
>> > I don't believe the two are necessarily the same -- Linux's node IDs are
>> > a Linux-specific construct.
>>
>> both are same, it is numa node id from ACPI/firmware.
>
> I am very wary about assuming that the Linux nid will always be the same
> as the ACPI node id.
>
> For that to *potentially* be true, this driver should depend on
> CONFIG_NUMA, NUMA must not be disabled on the command line, etc, or the
> node id will always be NUMA_NO_NODE.

ok, i can check the node id which we get from ACPI helpers in probe.
if it is NUMA_NO_NODE, I will init first socket uncore only and nid
param to fw is always zero?

>
> I would be *much* happier if we had an explicit mapping somewhere to the
> ID the FW expects.
>
>> > It would be much nicer if we could pass something based on the MPIDR,
>> > which is a known HW construct, or if this implicitly affected the
>> > current node.
>>
>> IMO,  node id is sufficient.
>
> I agree that *a* node ID is sufficient, I just don't think that we're
> guaranteed to have the specific node ID the FW wants.

for thunderx2 which is 2 socket only platform, pxm and nid should be
same(either 0 or 1)
however, i can send PXM id(node_to_pxm) to firmware to make it more sane.

>
>> > It would be vastly more sane for this to not be muxed at all. :/
>>
>> i am helpless due to crappy hw design!
>
> I'm certainly not blaming you for this! :)
>
> I hope the HW designers don't make the same mistake in future, though...
>
> Thanks,
> Mark.

thanks
Ganapat


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >> + *
> >> + *  L3 Tile and DMC channel selection is through SMC call
> >> + *  SMC call arguments,
> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
> >> + *   x2 = Node id
> >
> > How do we map Linux node IDs to the firmware's view of node IDs?
> >
> > I don't believe the two are necessarily the same -- Linux's node IDs are
> > a Linux-specific construct.
> 
> both are same, it is numa node id from ACPI/firmware.

I am very wary about assuming that the Linux nid will always be the same
as the ACPI node id.

For that to *potentially* be true, this driver should depend on
CONFIG_NUMA, NUMA must not be disabled on the command line, etc, or the
node id will always be NUMA_NO_NODE.

I would be *much* happier if we had an explicit mapping somewhere to the
ID the FW expects.

> > It would be much nicer if we could pass something based on the MPIDR,
> > which is a known HW construct, or if this implicitly affected the
> > current node.
> 
> IMO,  node id is sufficient.

I agree that *a* node ID is sufficient, I just don't think that we're
guaranteed to have the specific node ID the FW wants.

> > It would be vastly more sane for this to not be muxed at all. :/
> 
> i am helpless due to crappy hw design!

I'm certainly not blaming you for this! :)

I hope the HW designers don't make the same mistake in future, though...

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Sat, May 05, 2018 at 12:16:13AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> > On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >> + *
> >> + *  L3 Tile and DMC channel selection is through SMC call
> >> + *  SMC call arguments,
> >> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
> >> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
> >> + *   x2 = Node id
> >
> > How do we map Linux node IDs to the firmware's view of node IDs?
> >
> > I don't believe the two are necessarily the same -- Linux's node IDs are
> > a Linux-specific construct.
> 
> both are same, it is numa node id from ACPI/firmware.

I am very wary about assuming that the Linux nid will always be the same
as the ACPI node id.

For that to *potentially* be true, this driver should depend on
CONFIG_NUMA, NUMA must not be disabled on the command line, etc, or the
node id will always be NUMA_NO_NODE.

I would be *much* happier if we had an explicit mapping somewhere to the
ID the FW expects.

> > It would be much nicer if we could pass something based on the MPIDR,
> > which is a known HW construct, or if this implicitly affected the
> > current node.
> 
> IMO,  node id is sufficient.

I agree that *a* node ID is sufficient, I just don't think that we're
guaranteed to have the specific node ID the FW wants.

> > It would be vastly more sane for this to not be muxed at all. :/
> 
> i am helpless due to crappy hw design!

I'm certainly not blaming you for this! :)

I hope the HW designers don't make the same mistake in future, though...

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
> Hi Ganapat,
> 
> 
> Sorry for the delay in replying; I was away most of last week.
> 
> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
> > wrote:
> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  
> > > wrote:
> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
> 
> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel 
> > >>> *pmu_uncore)
> > >>> +{
> > >>> + int counter;
> > >>> +
> > >>> + raw_spin_lock(_uncore->lock);
> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> > >>> + pmu_uncore->uncore_dev->max_counters);
> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> + return -ENOSPC;
> > >>> + }
> > >>> + set_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> + return counter;
> > >>> +}
> > >>> +
> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel 
> > >>> *pmu_uncore,
> > >>> + int counter)
> > >>> +{
> > >>> + raw_spin_lock(_uncore->lock);
> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> +}
> > >>
> > >> I don't believe that locking is required in either of these, as the perf
> > >> core serializes pmu::add() and pmu::del(), where these get called.
> > 
> > without this locking, i am seeing "BUG: scheduling while atomic" when
> > i run perf with more events together than the maximum counters
> > supported
> 
> Did you manage to get to the bottom of this?
> 
> Do you have a backtrace?
> 
> It looks like in your latest posting you reserve counters through the
> userspace ABI, which doesn't seem right to me, and I'd like to
> understand the problem.

Looks like I misunderstood -- those are still allocated kernel-side.

I'll follow that up in the v5 posting.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
> Hi Ganapat,
> 
> 
> Sorry for the delay in replying; I was away most of last week.
> 
> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
> > wrote:
> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  
> > > wrote:
> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
> 
> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel 
> > >>> *pmu_uncore)
> > >>> +{
> > >>> + int counter;
> > >>> +
> > >>> + raw_spin_lock(_uncore->lock);
> > >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> > >>> + pmu_uncore->uncore_dev->max_counters);
> > >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> + return -ENOSPC;
> > >>> + }
> > >>> + set_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> + return counter;
> > >>> +}
> > >>> +
> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel 
> > >>> *pmu_uncore,
> > >>> + int counter)
> > >>> +{
> > >>> + raw_spin_lock(_uncore->lock);
> > >>> + clear_bit(counter, pmu_uncore->counter_mask);
> > >>> + raw_spin_unlock(_uncore->lock);
> > >>> +}
> > >>
> > >> I don't believe that locking is required in either of these, as the perf
> > >> core serializes pmu::add() and pmu::del(), where these get called.
> > 
> > without this locking, i am seeing "BUG: scheduling while atomic" when
> > i run perf with more events together than the maximum counters
> > supported
> 
> Did you manage to get to the bottom of this?
> 
> Do you have a backtrace?
> 
> It looks like in your latest posting you reserve counters through the
> userspace ABI, which doesn't seem right to me, and I'd like to
> understand the problem.

Looks like I misunderstood -- those are still allocated kernel-side.

I'll follow that up in the v5 posting.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
Hi Ganapat,


Sorry for the delay in replying; I was away most of last week.

On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
> wrote:
> > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> >>> +{
> >>> + int counter;
> >>> +
> >>> + raw_spin_lock(_uncore->lock);
> >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> >>> + pmu_uncore->uncore_dev->max_counters);
> >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> >>> + raw_spin_unlock(_uncore->lock);
> >>> + return -ENOSPC;
> >>> + }
> >>> + set_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(_uncore->lock);
> >>> + return counter;
> >>> +}
> >>> +
> >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> >>> + int counter)
> >>> +{
> >>> + raw_spin_lock(_uncore->lock);
> >>> + clear_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(_uncore->lock);
> >>> +}
> >>
> >> I don't believe that locking is required in either of these, as the perf
> >> core serializes pmu::add() and pmu::del(), where these get called.
> 
> without this locking, i am seeing "BUG: scheduling while atomic" when
> i run perf with more events together than the maximum counters
> supported

Did you manage to get to the bottom of this?

Do you have a backtrace?

It looks like in your latest posting you reserve counters through the
userspace ABI, which doesn't seem right to me, and I'd like to
understand the problem.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-21 Thread Mark Rutland
Hi Ganapat,


Sorry for the delay in replying; I was away most of last week.

On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
> On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  
> wrote:
> > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:

> >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> >>> +{
> >>> + int counter;
> >>> +
> >>> + raw_spin_lock(_uncore->lock);
> >>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> >>> + pmu_uncore->uncore_dev->max_counters);
> >>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> >>> + raw_spin_unlock(_uncore->lock);
> >>> + return -ENOSPC;
> >>> + }
> >>> + set_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(_uncore->lock);
> >>> + return counter;
> >>> +}
> >>> +
> >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> >>> + int counter)
> >>> +{
> >>> + raw_spin_lock(_uncore->lock);
> >>> + clear_bit(counter, pmu_uncore->counter_mask);
> >>> + raw_spin_unlock(_uncore->lock);
> >>> +}
> >>
> >> I don't believe that locking is required in either of these, as the perf
> >> core serializes pmu::add() and pmu::del(), where these get called.
> 
> without this locking, i am seeing "BUG: scheduling while atomic" when
> i run perf with more events together than the maximum counters
> supported

Did you manage to get to the bottom of this?

Do you have a backtrace?

It looks like in your latest posting you reserve counters through the
userspace ABI, which doesn't seem right to me, and I'd like to
understand the problem.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-15 Thread Ganapatrao Kulkarni
Hi Mark,


On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  wrote:
> Hi Mark,
>
> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
>> Hi,
>>
>> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>> +
>>> +/* L3c and DMC has 16 and 8 channels per socket respectively.
>>> + * Each Channel supports UNCORE PMU device and consists of
>>> + * 4 independent programmable counters. Counters are 32 bit
>>> + * and does not support overflow interrupt, they needs to be
>>> + * sampled before overflow(i.e, at every 2 seconds).
>>> + */
>>> +
>>> +#define UNCORE_MAX_COUNTERS  4
>>> +#define UNCORE_L3_MAX_TILES  16
>>> +#define UNCORE_DMC_MAX_CHANNELS  8
>>> +
>>> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)
>>
>> How was a period of two seconds chosen?
>
> It has been suggested from hw team  to sample before  3 to 4 seconds.
>
>>
>> What's the maximum clock speed for the L3C and DMC?
>
> L3C at 1.5GHz and DMC at 1.2GHz
>>
>> Given that all channels compete for access to the muxed register
>> interface, I suspect we need to try more often than once every 2
>> seconds...
>
> 2 seconds seems to be sufficient. So far testing looks good.
>
>>
>> [...]
>>
>>> +struct active_timer {
>>> + struct perf_event *event;
>>> + struct hrtimer hrtimer;
>>> +};
>>> +
>>> +/*
>>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>>> + * each uncore device has up to 16 channels, each channel can sample
>>> + * events independently with counters up to 4.
>>> + *
>>> + * struct thunderx2_pmu_uncore_channel created per channel.
>>> + * struct thunderx2_pmu_uncore_dev per uncore device.
>>> + */
>>> +struct thunderx2_pmu_uncore_channel {
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> + struct pmu pmu;
>>
>> Can we put the pmu first in the struct, please?
>
> ok
>>
>>> + int counter;
>>
>> AFAICT, this counter field is never used.
>
> hmm ok, will remove.
>>
>>> + int channel;
>>> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
>>> + struct active_timer *active_timers;
>>
>> You should only need a single timer per channel, rather than one per
>> event.
>>
>> I think you can get rid of the active_timer structure, and have:
>>
>> struct perf_event *events[UNCORE_MAX_COUNTERS];
>> struct hrtimer timer;
>>
>
> thanks, will change as suggested.
>
>>> + /* to sync counter alloc/release */
>>> + raw_spinlock_t lock;
>>> +};
>>> +
>>> +struct thunderx2_pmu_uncore_dev {
>>> + char *name;
>>> + struct device *dev;
>>> + enum thunderx2_uncore_type type;
>>> + unsigned long base;
>>
>> This should be:
>>
>> void __iomem *base;
>
> ok
>>
>> [...]
>>
>>> +static ssize_t cpumask_show(struct device *dev,
>>> + struct device_attribute *attr, char *buf)
>>> +{
>>> + struct cpumask cpu_mask;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>>> +
>>> + /* Pick first online cpu from the node */
>>> + cpumask_clear(_mask);
>>> + cpumask_set_cpu(cpumask_first(
>>> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
>>> + _mask);
>>> +
>>> + return cpumap_print_to_pagebuf(true, buf, _mask);
>>> +}
>>
>> AFAICT cpumask_of_node() returns a mask that can include offline CPUs.
>>
>> Regardless, I don't think that you can keep track of the management CPU
>> this way. Please keep track of the CPU the PMU should be managed by,
>> either with a cpumask or int embedded within the PMU structure.
>
> thanks, will add hotplug callbacks.
>>
>> At hotplug time, you'll need to update the management CPU, calling
>> perf_pmu_migrate_context() when it is offlined.
>
> ok
>>
>> [...]
>>
>>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>>> +{
>>> + int counter;
>>> +
>>> + raw_spin_lock(_uncore->lock);
>>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>>> + pmu_uncore->uncore_dev->max_counters);
>>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>>> + raw_spin_unlock(_uncore->lock);
>>> + return -ENOSPC;
>>> + }
>>> + set_bit(counter, pmu_uncore->counter_mask);
>>> + raw_spin_unlock(_uncore->lock);
>>> + return counter;
>>> +}
>>> +
>>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>>> + int counter)
>>> +{
>>> + raw_spin_lock(_uncore->lock);
>>> + clear_bit(counter, pmu_uncore->counter_mask);
>>> + raw_spin_unlock(_uncore->lock);
>>> +}
>>
>> I don't believe that locking is required in either of these, as the perf
>> core serializes pmu::add() and pmu::del(), where these get called.

without this locking, i am seeing "BUG: scheduling while atomic" when
i run perf with more events together than the maximum 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-15 Thread Ganapatrao Kulkarni
Hi Mark,


On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni  wrote:
> Hi Mark,
>
> On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
>> Hi,
>>
>> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>> +
>>> +/* L3c and DMC has 16 and 8 channels per socket respectively.
>>> + * Each Channel supports UNCORE PMU device and consists of
>>> + * 4 independent programmable counters. Counters are 32 bit
>>> + * and does not support overflow interrupt, they needs to be
>>> + * sampled before overflow(i.e, at every 2 seconds).
>>> + */
>>> +
>>> +#define UNCORE_MAX_COUNTERS  4
>>> +#define UNCORE_L3_MAX_TILES  16
>>> +#define UNCORE_DMC_MAX_CHANNELS  8
>>> +
>>> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)
>>
>> How was a period of two seconds chosen?
>
> It has been suggested from hw team  to sample before  3 to 4 seconds.
>
>>
>> What's the maximum clock speed for the L3C and DMC?
>
> L3C at 1.5GHz and DMC at 1.2GHz
>>
>> Given that all channels compete for access to the muxed register
>> interface, I suspect we need to try more often than once every 2
>> seconds...
>
> 2 seconds seems to be sufficient. So far testing looks good.
>
>>
>> [...]
>>
>>> +struct active_timer {
>>> + struct perf_event *event;
>>> + struct hrtimer hrtimer;
>>> +};
>>> +
>>> +/*
>>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>>> + * each uncore device has up to 16 channels, each channel can sample
>>> + * events independently with counters up to 4.
>>> + *
>>> + * struct thunderx2_pmu_uncore_channel created per channel.
>>> + * struct thunderx2_pmu_uncore_dev per uncore device.
>>> + */
>>> +struct thunderx2_pmu_uncore_channel {
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> + struct pmu pmu;
>>
>> Can we put the pmu first in the struct, please?
>
> ok
>>
>>> + int counter;
>>
>> AFAICT, this counter field is never used.
>
> hmm ok, will remove.
>>
>>> + int channel;
>>> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
>>> + struct active_timer *active_timers;
>>
>> You should only need a single timer per channel, rather than one per
>> event.
>>
>> I think you can get rid of the active_timer structure, and have:
>>
>> struct perf_event *events[UNCORE_MAX_COUNTERS];
>> struct hrtimer timer;
>>
>
> thanks, will change as suggested.
>
>>> + /* to sync counter alloc/release */
>>> + raw_spinlock_t lock;
>>> +};
>>> +
>>> +struct thunderx2_pmu_uncore_dev {
>>> + char *name;
>>> + struct device *dev;
>>> + enum thunderx2_uncore_type type;
>>> + unsigned long base;
>>
>> This should be:
>>
>> void __iomem *base;
>
> ok
>>
>> [...]
>>
>>> +static ssize_t cpumask_show(struct device *dev,
>>> + struct device_attribute *attr, char *buf)
>>> +{
>>> + struct cpumask cpu_mask;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>>> +
>>> + /* Pick first online cpu from the node */
>>> + cpumask_clear(_mask);
>>> + cpumask_set_cpu(cpumask_first(
>>> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
>>> + _mask);
>>> +
>>> + return cpumap_print_to_pagebuf(true, buf, _mask);
>>> +}
>>
>> AFAICT cpumask_of_node() returns a mask that can include offline CPUs.
>>
>> Regardless, I don't think that you can keep track of the management CPU
>> this way. Please keep track of the CPU the PMU should be managed by,
>> either with a cpumask or int embedded within the PMU structure.
>
> thanks, will add hotplug callbacks.
>>
>> At hotplug time, you'll need to update the management CPU, calling
>> perf_pmu_migrate_context() when it is offlined.
>
> ok
>>
>> [...]
>>
>>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>>> +{
>>> + int counter;
>>> +
>>> + raw_spin_lock(_uncore->lock);
>>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>>> + pmu_uncore->uncore_dev->max_counters);
>>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>>> + raw_spin_unlock(_uncore->lock);
>>> + return -ENOSPC;
>>> + }
>>> + set_bit(counter, pmu_uncore->counter_mask);
>>> + raw_spin_unlock(_uncore->lock);
>>> + return counter;
>>> +}
>>> +
>>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>>> + int counter)
>>> +{
>>> + raw_spin_lock(_uncore->lock);
>>> + clear_bit(counter, pmu_uncore->counter_mask);
>>> + raw_spin_unlock(_uncore->lock);
>>> +}
>>
>> I don't believe that locking is required in either of these, as the perf
>> core serializes pmu::add() and pmu::del(), where these get called.

without this locking, i am seeing "BUG: scheduling while atomic" when
i run perf with more events together than the maximum counters
supported

>
> thanks, it seems 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-09 Thread Kim Phillips
On Fri, 4 May 2018 18:10:44 +0100
Robin Murphy  wrote:

> Hi Kim,

Hi Robin,

> On 04/05/18 01:30, Kim Phillips wrote:
> > On Tue, 1 May 2018 12:54:05 +0100
> > Will Deacon  wrote:
> >> On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> >>> On Fri, 27 Apr 2018 17:09:14 +0100
> >>> Will Deacon  wrote:
>  On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 15:37:20 +0100
> > Will Deacon  wrote:
> >> For anything under drivers/perf/, I'd prefer not to have these prints
> >> and instead see efforts to improve error reporting via the perf system
> >> call interface.
> >
> > We'd all prefer that, and for all PMU drivers, why should ones under
> > drivers/perf be treated differently?
> 
>  Because they're the ones I maintain...
> >>>
> >>> You represent a minority on your opinion on this matter though.
> >>
> >> I'm not sure that's true
> > 
> > I haven't seen another arch/PMU driver maintainer actively oppose it
> > other than you (and Mark).
> > 
> >> -- you tend to be the only one raising this issue;
> > 
> > If you're basing that on list activity, I wouldn't, since most Arm perf
> > users don't subscribe to LKML.  I'm trying to represent current and
> > future Arm perf users that shouldn't be expected to hang out here on
> > LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
> > themselves don't chime in, but will note that some drivers in
> > drivers/perf do print things from their event_init functions.
> 
> As a PMU driver author who *has* already invested not-inconsiderable 
> time and effort in trying to explain to you the technical issues from a 
> non-maintainer perspective (and is about to do so again), I can't help 
> but feel rather slighted by that comment.

Very sorry!  You're quite right.  I believe I had considered you as a
kernel developer first, and was trying to keep LKML activity affined to
LKML, vs. other activity on other mailing lists.

> > Also, you have been cc'd on off-list email threads where SPE users had
> > to debug the SPE driver's event_init() manually.  There are other SPE
> > users I know that have had to go through the same painstaking process of
> > debugging the SPE driver.  Last but not least, there's me, and I'd sure
> > appreciate more verbose PMU drivers, based on my real-life performance
> > monitoring experience(s).
> > 
> >> I think most people don't actually care one way or the other. If you know 
> >> of
> > 
> > Like I said, I think you're limiting your audience to LKML subscribers,
> > who are more amenable to being convinced they need to debug their
> > kernel.
> > 
> >> others who care about it too then I suggest you work together as a group to
> >> solve the problem in a way which is amenable to the upstream maintainers.
> > 
> > I know a few off-list people that would like a more user-friendly Arm
> > perf experience, but we're not qualified to solve the larger problem
> > that perf has had since its inception, and I think it's a little unfair
> > for you to suggest that, esp. given you're one of the maintainers, and
> > there's Linus on top of you.
> > 
> >> Then you won't have to worry about fixing it personally. You could even
> >> propose a topic for plumbers if there is enough interest in a general
> >> framework for propagating error messages to userspace.
> > 
> > I'm not worried about fixing it personally or not.  I'm worried about
> > our perf users' experience given your objection to using the vehicle
> > already in use on other architectures and PMU drivers.
> > 
> > If you - or anyone else for that matter - know of a way that'll get us
> > past this, by all means, say something.
> > 
> > If it helps, I recently asked acme if we could borrow bits from struct
> > perf_event and got no response.
> > 
> > As you are already aware, I've personally tried to fix this problem -
> > that has existed since before the introduction of the perf tool (I
> > consider it a syscall-independent enhanced error interface), multiple
> > times, and failed.
> 
>  Why is that my problem? Try harder?
> >>>
> >>> It's your problem because we're here reviewing a patch that happens to
> >>> fall under your maintainership.  I'll be the first person to tell you
> >>> I'm obviously incompetent and haven't been able to come up with a
> >>> solution that is acceptable for everyone up to and including Linus
> >>> Torvalds.  I'm just noticing a chronic usability problem that can be
> >>> easily alleviated in the context of this patch review.
> >>
> >> I don't see how incompetence has anything to do with it and, like most
> > 
> > Well you told me to try harder, and I'm trying to let you know that I
> > can try harder - ad infinitum - and still not get it, so telling me to
> > try harder isn't going to necessarily resolve the issue.
> > 
> >> people (lucky gits...), I 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-09 Thread Kim Phillips
On Fri, 4 May 2018 18:10:44 +0100
Robin Murphy  wrote:

> Hi Kim,

Hi Robin,

> On 04/05/18 01:30, Kim Phillips wrote:
> > On Tue, 1 May 2018 12:54:05 +0100
> > Will Deacon  wrote:
> >> On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> >>> On Fri, 27 Apr 2018 17:09:14 +0100
> >>> Will Deacon  wrote:
>  On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 15:37:20 +0100
> > Will Deacon  wrote:
> >> For anything under drivers/perf/, I'd prefer not to have these prints
> >> and instead see efforts to improve error reporting via the perf system
> >> call interface.
> >
> > We'd all prefer that, and for all PMU drivers, why should ones under
> > drivers/perf be treated differently?
> 
>  Because they're the ones I maintain...
> >>>
> >>> You represent a minority on your opinion on this matter though.
> >>
> >> I'm not sure that's true
> > 
> > I haven't seen another arch/PMU driver maintainer actively oppose it
> > other than you (and Mark).
> > 
> >> -- you tend to be the only one raising this issue;
> > 
> > If you're basing that on list activity, I wouldn't, since most Arm perf
> > users don't subscribe to LKML.  I'm trying to represent current and
> > future Arm perf users that shouldn't be expected to hang out here on
> > LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
> > themselves don't chime in, but will note that some drivers in
> > drivers/perf do print things from their event_init functions.
> 
> As a PMU driver author who *has* already invested not-inconsiderable 
> time and effort in trying to explain to you the technical issues from a 
> non-maintainer perspective (and is about to do so again), I can't help 
> but feel rather slighted by that comment.

Very sorry!  You're quite right.  I believe I had considered you as a
kernel developer first, and was trying to keep LKML activity affined to
LKML, vs. other activity on other mailing lists.

> > Also, you have been cc'd on off-list email threads where SPE users had
> > to debug the SPE driver's event_init() manually.  There are other SPE
> > users I know that have had to go through the same painstaking process of
> > debugging the SPE driver.  Last but not least, there's me, and I'd sure
> > appreciate more verbose PMU drivers, based on my real-life performance
> > monitoring experience(s).
> > 
> >> I think most people don't actually care one way or the other. If you know 
> >> of
> > 
> > Like I said, I think you're limiting your audience to LKML subscribers,
> > who are more amenable to being convinced they need to debug their
> > kernel.
> > 
> >> others who care about it too then I suggest you work together as a group to
> >> solve the problem in a way which is amenable to the upstream maintainers.
> > 
> > I know a few off-list people that would like a more user-friendly Arm
> > perf experience, but we're not qualified to solve the larger problem
> > that perf has had since its inception, and I think it's a little unfair
> > for you to suggest that, esp. given you're one of the maintainers, and
> > there's Linus on top of you.
> > 
> >> Then you won't have to worry about fixing it personally. You could even
> >> propose a topic for plumbers if there is enough interest in a general
> >> framework for propagating error messages to userspace.
> > 
> > I'm not worried about fixing it personally or not.  I'm worried about
> > our perf users' experience given your objection to using the vehicle
> > already in use on other architectures and PMU drivers.
> > 
> > If you - or anyone else for that matter - know of a way that'll get us
> > past this, by all means, say something.
> > 
> > If it helps, I recently asked acme if we could borrow bits from struct
> > perf_event and got no response.
> > 
> > As you are already aware, I've personally tried to fix this problem -
> > that has existed since before the introduction of the perf tool (I
> > consider it a syscall-independent enhanced error interface), multiple
> > times, and failed.
> 
>  Why is that my problem? Try harder?
> >>>
> >>> It's your problem because we're here reviewing a patch that happens to
> >>> fall under your maintainership.  I'll be the first person to tell you
> >>> I'm obviously incompetent and haven't been able to come up with a
> >>> solution that is acceptable for everyone up to and including Linus
> >>> Torvalds.  I'm just noticing a chronic usability problem that can be
> >>> easily alleviated in the context of this patch review.
> >>
> >> I don't see how incompetence has anything to do with it and, like most
> > 
> > Well you told me to try harder, and I'm trying to let you know that I
> > can try harder - ad infinitum - and still not get it, so telling me to
> > try harder isn't going to necessarily resolve the issue.
> > 
> >> people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
> >> internals. No arguments on the 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-04 Thread Ganapatrao Kulkarni
Hi Mark,

On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> Hi,
>
> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>> +
>> +/* L3c and DMC has 16 and 8 channels per socket respectively.
>> + * Each Channel supports UNCORE PMU device and consists of
>> + * 4 independent programmable counters. Counters are 32 bit
>> + * and does not support overflow interrupt, they needs to be
>> + * sampled before overflow(i.e, at every 2 seconds).
>> + */
>> +
>> +#define UNCORE_MAX_COUNTERS  4
>> +#define UNCORE_L3_MAX_TILES  16
>> +#define UNCORE_DMC_MAX_CHANNELS  8
>> +
>> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)
>
> How was a period of two seconds chosen?

It has been suggested from hw team  to sample before  3 to 4 seconds.

>
> What's the maximum clock speed for the L3C and DMC?

L3C at 1.5GHz and DMC at 1.2GHz
>
> Given that all channels compete for access to the muxed register
> interface, I suspect we need to try more often than once every 2
> seconds...

2 seconds seems to be sufficient. So far testing looks good.

>
> [...]
>
>> +struct active_timer {
>> + struct perf_event *event;
>> + struct hrtimer hrtimer;
>> +};
>> +
>> +/*
>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>> + * each uncore device has up to 16 channels, each channel can sample
>> + * events independently with counters up to 4.
>> + *
>> + * struct thunderx2_pmu_uncore_channel created per channel.
>> + * struct thunderx2_pmu_uncore_dev per uncore device.
>> + */
>> +struct thunderx2_pmu_uncore_channel {
>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>> + struct pmu pmu;
>
> Can we put the pmu first in the struct, please?

ok
>
>> + int counter;
>
> AFAICT, this counter field is never used.

hmm ok, will remove.
>
>> + int channel;
>> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
>> + struct active_timer *active_timers;
>
> You should only need a single timer per channel, rather than one per
> event.
>
> I think you can get rid of the active_timer structure, and have:
>
> struct perf_event *events[UNCORE_MAX_COUNTERS];
> struct hrtimer timer;
>

thanks, will change as suggested.

>> + /* to sync counter alloc/release */
>> + raw_spinlock_t lock;
>> +};
>> +
>> +struct thunderx2_pmu_uncore_dev {
>> + char *name;
>> + struct device *dev;
>> + enum thunderx2_uncore_type type;
>> + unsigned long base;
>
> This should be:
>
> void __iomem *base;

ok
>
> [...]
>
>> +static ssize_t cpumask_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct cpumask cpu_mask;
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>> +
>> + /* Pick first online cpu from the node */
>> + cpumask_clear(_mask);
>> + cpumask_set_cpu(cpumask_first(
>> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
>> + _mask);
>> +
>> + return cpumap_print_to_pagebuf(true, buf, _mask);
>> +}
>
> AFAICT cpumask_of_node() returns a mask that can include offline CPUs.
>
> Regardless, I don't think that you can keep track of the management CPU
> this way. Please keep track of the CPU the PMU should be managed by,
> either with a cpumask or int embedded within the PMU structure.

thanks, will add hotplug callbacks.
>
> At hotplug time, you'll need to update the management CPU, calling
> perf_pmu_migrate_context() when it is offlined.

ok
>
> [...]
>
>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>> +{
>> + int counter;
>> +
>> + raw_spin_lock(_uncore->lock);
>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> + pmu_uncore->uncore_dev->max_counters);
>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> + raw_spin_unlock(_uncore->lock);
>> + return -ENOSPC;
>> + }
>> + set_bit(counter, pmu_uncore->counter_mask);
>> + raw_spin_unlock(_uncore->lock);
>> + return counter;
>> +}
>> +
>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>> + int counter)
>> +{
>> + raw_spin_lock(_uncore->lock);
>> + clear_bit(counter, pmu_uncore->counter_mask);
>> + raw_spin_unlock(_uncore->lock);
>> +}
>
> I don't believe that locking is required in either of these, as the perf
> core serializes pmu::add() and pmu::del(), where these get called.

thanks, it seems to be not required.
>
>> +
>> +/*
>> + * DMC and L3 counter interface is muxed across all channels.
>> + * hence we need to select the channel before accessing counter
>> + * data/control registers.
>
> Are there separate interfaces for all-dmc-channels and all-l3c-channels?

separate interface for DMC and L3c.
channels within DMC/L3C are muxed.

>
> ... or is a 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-04 Thread Ganapatrao Kulkarni
Hi Mark,

On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland  wrote:
> Hi,
>
> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>> +
>> +/* L3c and DMC has 16 and 8 channels per socket respectively.
>> + * Each Channel supports UNCORE PMU device and consists of
>> + * 4 independent programmable counters. Counters are 32 bit
>> + * and does not support overflow interrupt, they needs to be
>> + * sampled before overflow(i.e, at every 2 seconds).
>> + */
>> +
>> +#define UNCORE_MAX_COUNTERS  4
>> +#define UNCORE_L3_MAX_TILES  16
>> +#define UNCORE_DMC_MAX_CHANNELS  8
>> +
>> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)
>
> How was a period of two seconds chosen?

It has been suggested from hw team  to sample before  3 to 4 seconds.

>
> What's the maximum clock speed for the L3C and DMC?

L3C at 1.5GHz and DMC at 1.2GHz
>
> Given that all channels compete for access to the muxed register
> interface, I suspect we need to try more often than once every 2
> seconds...

2 seconds seems to be sufficient. So far testing looks good.

>
> [...]
>
>> +struct active_timer {
>> + struct perf_event *event;
>> + struct hrtimer hrtimer;
>> +};
>> +
>> +/*
>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>> + * each uncore device has up to 16 channels, each channel can sample
>> + * events independently with counters up to 4.
>> + *
>> + * struct thunderx2_pmu_uncore_channel created per channel.
>> + * struct thunderx2_pmu_uncore_dev per uncore device.
>> + */
>> +struct thunderx2_pmu_uncore_channel {
>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>> + struct pmu pmu;
>
> Can we put the pmu first in the struct, please?

ok
>
>> + int counter;
>
> AFAICT, this counter field is never used.

hmm ok, will remove.
>
>> + int channel;
>> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
>> + struct active_timer *active_timers;
>
> You should only need a single timer per channel, rather than one per
> event.
>
> I think you can get rid of the active_timer structure, and have:
>
> struct perf_event *events[UNCORE_MAX_COUNTERS];
> struct hrtimer timer;
>

thanks, will change as suggested.

>> + /* to sync counter alloc/release */
>> + raw_spinlock_t lock;
>> +};
>> +
>> +struct thunderx2_pmu_uncore_dev {
>> + char *name;
>> + struct device *dev;
>> + enum thunderx2_uncore_type type;
>> + unsigned long base;
>
> This should be:
>
> void __iomem *base;

ok
>
> [...]
>
>> +static ssize_t cpumask_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct cpumask cpu_mask;
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>> +
>> + /* Pick first online cpu from the node */
>> + cpumask_clear(_mask);
>> + cpumask_set_cpu(cpumask_first(
>> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
>> + _mask);
>> +
>> + return cpumap_print_to_pagebuf(true, buf, _mask);
>> +}
>
> AFAICT cpumask_of_node() returns a mask that can include offline CPUs.
>
> Regardless, I don't think that you can keep track of the management CPU
> this way. Please keep track of the CPU the PMU should be managed by,
> either with a cpumask or int embedded within the PMU structure.

thanks, will add hotplug callbacks.
>
> At hotplug time, you'll need to update the management CPU, calling
> perf_pmu_migrate_context() when it is offlined.

ok
>
> [...]
>
>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>> +{
>> + int counter;
>> +
>> + raw_spin_lock(_uncore->lock);
>> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> + pmu_uncore->uncore_dev->max_counters);
>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> + raw_spin_unlock(_uncore->lock);
>> + return -ENOSPC;
>> + }
>> + set_bit(counter, pmu_uncore->counter_mask);
>> + raw_spin_unlock(_uncore->lock);
>> + return counter;
>> +}
>> +
>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>> + int counter)
>> +{
>> + raw_spin_lock(_uncore->lock);
>> + clear_bit(counter, pmu_uncore->counter_mask);
>> + raw_spin_unlock(_uncore->lock);
>> +}
>
> I don't believe that locking is required in either of these, as the perf
> core serializes pmu::add() and pmu::del(), where these get called.

thanks, it seems to be not required.
>
>> +
>> +/*
>> + * DMC and L3 counter interface is muxed across all channels.
>> + * hence we need to select the channel before accessing counter
>> + * data/control registers.
>
> Are there separate interfaces for all-dmc-channels and all-l3c-channels?

separate interface for DMC and L3c.
channels within DMC/L3C are muxed.

>
> ... or is a single interface used 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-04 Thread Robin Murphy

Hi Kim,

On 04/05/18 01:30, Kim Phillips wrote:

On Tue, 1 May 2018 12:54:05 +0100
Will Deacon  wrote:


Hi Kim,


Hi Will, thanks for responding.


On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:

On Fri, 27 Apr 2018 17:09:14 +0100
Will Deacon  wrote:

On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:

On Fri, 27 Apr 2018 15:37:20 +0100
Will Deacon  wrote:

For anything under drivers/perf/, I'd prefer not to have these prints
and instead see efforts to improve error reporting via the perf system
call interface.


We'd all prefer that, and for all PMU drivers, why should ones under
drivers/perf be treated differently?


Because they're the ones I maintain...


You represent a minority on your opinion on this matter though.


I'm not sure that's true


I haven't seen another arch/PMU driver maintainer actively oppose it
other than you (and Mark).


-- you tend to be the only one raising this issue;


If you're basing that on list activity, I wouldn't, since most Arm perf
users don't subscribe to LKML.  I'm trying to represent current and
future Arm perf users that shouldn't be expected to hang out here on
LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
themselves don't chime in, but will note that some drivers in
drivers/perf do print things from their event_init functions.


As a PMU driver author who *has* already invested not-inconsiderable 
time and effort in trying to explain to you the technical issues from a 
non-maintainer perspective (and is about to do so again), I can't help 
but feel rather slighted by that comment.



Also, you have been cc'd on off-list email threads where SPE users had
to debug the SPE driver's event_init() manually.  There are other SPE
users I know that have had to go through the same painstaking process of
debugging the SPE driver.  Last but not least, there's me, and I'd sure
appreciate more verbose PMU drivers, based on my real-life performance
monitoring experience(s).


I think most people don't actually care one way or the other. If you know of


Like I said, I think you're limiting your audience to LKML subscribers,
who are more amenable to being convinced they need to debug their
kernel.


others who care about it too then I suggest you work together as a group to
solve the problem in a way which is amenable to the upstream maintainers.


I know a few off-list people that would like a more user-friendly Arm
perf experience, but we're not qualified to solve the larger problem
that perf has had since its inception, and I think it's a little unfair
for you to suggest that, esp. given you're one of the maintainers, and
there's Linus on top of you.


Then you won't have to worry about fixing it personally. You could even
propose a topic for plumbers if there is enough interest in a general
framework for propagating error messages to userspace.


I'm not worried about fixing it personally or not.  I'm worried about
our perf users' experience given your objection to using the vehicle
already in use on other architectures and PMU drivers.

If you - or anyone else for that matter - know of a way that'll get us
past this, by all means, say something.

If it helps, I recently asked acme if we could borrow bits from struct
perf_event and got no response.


As you are already aware, I've personally tried to fix this problem -
that has existed since before the introduction of the perf tool (I
consider it a syscall-independent enhanced error interface), multiple
times, and failed.


Why is that my problem? Try harder?


It's your problem because we're here reviewing a patch that happens to
fall under your maintainership.  I'll be the first person to tell you
I'm obviously incompetent and haven't been able to come up with a
solution that is acceptable for everyone up to and including Linus
Torvalds.  I'm just noticing a chronic usability problem that can be
easily alleviated in the context of this patch review.


I don't see how incompetence has anything to do with it and, like most


Well you told me to try harder, and I'm trying to let you know that I
can try harder - ad infinitum - and still not get it, so telling me to
try harder isn't going to necessarily resolve the issue.


people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
internals. No arguments on the usability problem, but this ain't the fix


Like acme, I doubt Linus runs perf on real Arm h/w, so yes, I don't
think they care that much, but if they did try to do it one day, I'd
like to that they'd be able to have dmesg contain that extra error
information for them.

As you know, David Howells wrote a supplemental error syscall facility
[1], that Linus nacked (off list no less).  Does Linus care about what
David Howells was developing it for?  I don't know, but he does have
enough experience to know whether a certain approach to a problem is
sustainable in his kernel or not.


you're 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-04 Thread Robin Murphy

Hi Kim,

On 04/05/18 01:30, Kim Phillips wrote:

On Tue, 1 May 2018 12:54:05 +0100
Will Deacon  wrote:


Hi Kim,


Hi Will, thanks for responding.


On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:

On Fri, 27 Apr 2018 17:09:14 +0100
Will Deacon  wrote:

On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:

On Fri, 27 Apr 2018 15:37:20 +0100
Will Deacon  wrote:

For anything under drivers/perf/, I'd prefer not to have these prints
and instead see efforts to improve error reporting via the perf system
call interface.


We'd all prefer that, and for all PMU drivers, why should ones under
drivers/perf be treated differently?


Because they're the ones I maintain...


You represent a minority on your opinion on this matter though.


I'm not sure that's true


I haven't seen another arch/PMU driver maintainer actively oppose it
other than you (and Mark).


-- you tend to be the only one raising this issue;


If you're basing that on list activity, I wouldn't, since most Arm perf
users don't subscribe to LKML.  I'm trying to represent current and
future Arm perf users that shouldn't be expected to hang out here on
LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
themselves don't chime in, but will note that some drivers in
drivers/perf do print things from their event_init functions.


As a PMU driver author who *has* already invested not-inconsiderable 
time and effort in trying to explain to you the technical issues from a 
non-maintainer perspective (and is about to do so again), I can't help 
but feel rather slighted by that comment.



Also, you have been cc'd on off-list email threads where SPE users had
to debug the SPE driver's event_init() manually.  There are other SPE
users I know that have had to go through the same painstaking process of
debugging the SPE driver.  Last but not least, there's me, and I'd sure
appreciate more verbose PMU drivers, based on my real-life performance
monitoring experience(s).


I think most people don't actually care one way or the other. If you know of


Like I said, I think you're limiting your audience to LKML subscribers,
who are more amenable to being convinced they need to debug their
kernel.


others who care about it too then I suggest you work together as a group to
solve the problem in a way which is amenable to the upstream maintainers.


I know a few off-list people that would like a more user-friendly Arm
perf experience, but we're not qualified to solve the larger problem
that perf has had since its inception, and I think it's a little unfair
for you to suggest that, esp. given you're one of the maintainers, and
there's Linus on top of you.


Then you won't have to worry about fixing it personally. You could even
propose a topic for plumbers if there is enough interest in a general
framework for propagating error messages to userspace.


I'm not worried about fixing it personally or not.  I'm worried about
our perf users' experience given your objection to using the vehicle
already in use on other architectures and PMU drivers.

If you - or anyone else for that matter - know of a way that'll get us
past this, by all means, say something.

If it helps, I recently asked acme if we could borrow bits from struct
perf_event and got no response.


As you are already aware, I've personally tried to fix this problem -
that has existed since before the introduction of the perf tool (I
consider it a syscall-independent enhanced error interface), multiple
times, and failed.


Why is that my problem? Try harder?


It's your problem because we're here reviewing a patch that happens to
fall under your maintainership.  I'll be the first person to tell you
I'm obviously incompetent and haven't been able to come up with a
solution that is acceptable for everyone up to and including Linus
Torvalds.  I'm just noticing a chronic usability problem that can be
easily alleviated in the context of this patch review.


I don't see how incompetence has anything to do with it and, like most


Well you told me to try harder, and I'm trying to let you know that I
can try harder - ad infinitum - and still not get it, so telling me to
try harder isn't going to necessarily resolve the issue.


people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
internals. No arguments on the usability problem, but this ain't the fix


Like acme, I doubt Linus runs perf on real Arm h/w, so yes, I don't
think they care that much, but if they did try to do it one day, I'd
like to that they'd be able to have dmesg contain that extra error
information for them.

As you know, David Howells wrote a supplemental error syscall facility
[1], that Linus nacked (off list no less).  Does Linus care about what
David Howells was developing it for?  I don't know, but he does have
enough experience to know whether a certain approach to a problem is
sustainable in his kernel or not.


you're looking for: it's a bodge. We've been over this many times 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-03 Thread Kim Phillips
On Tue, 1 May 2018 12:54:05 +0100
Will Deacon  wrote:

> Hi Kim,

Hi Will, thanks for responding.

> On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 17:09:14 +0100
> > Will Deacon  wrote:
> > > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > > > On Fri, 27 Apr 2018 15:37:20 +0100
> > > > Will Deacon  wrote:
> > > > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > > > and instead see efforts to improve error reporting via the perf system
> > > > > call interface.
> > > > 
> > > > We'd all prefer that, and for all PMU drivers, why should ones under
> > > > drivers/perf be treated differently?
> > > 
> > > Because they're the ones I maintain...
> > 
> > You represent a minority on your opinion on this matter though.
> 
> I'm not sure that's true

I haven't seen another arch/PMU driver maintainer actively oppose it
other than you (and Mark).

> -- you tend to be the only one raising this issue;

If you're basing that on list activity, I wouldn't, since most Arm perf
users don't subscribe to LKML.  I'm trying to represent current and
future Arm perf users that shouldn't be expected to hang out here on
LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
themselves don't chime in, but will note that some drivers in
drivers/perf do print things from their event_init functions.

Also, you have been cc'd on off-list email threads where SPE users had
to debug the SPE driver's event_init() manually.  There are other SPE
users I know that have had to go through the same painstaking process of
debugging the SPE driver.  Last but not least, there's me, and I'd sure
appreciate more verbose PMU drivers, based on my real-life performance
monitoring experience(s).

> I think most people don't actually care one way or the other. If you know of

Like I said, I think you're limiting your audience to LKML subscribers,
who are more amenable to being convinced they need to debug their
kernel.

> others who care about it too then I suggest you work together as a group to
> solve the problem in a way which is amenable to the upstream maintainers.

I know a few off-list people that would like a more user-friendly Arm
perf experience, but we're not qualified to solve the larger problem
that perf has had since its inception, and I think it's a little unfair
for you to suggest that, esp. given you're one of the maintainers, and
there's Linus on top of you.

> Then you won't have to worry about fixing it personally. You could even
> propose a topic for plumbers if there is enough interest in a general
> framework for propagating error messages to userspace.

I'm not worried about fixing it personally or not.  I'm worried about
our perf users' experience given your objection to using the vehicle
already in use on other architectures and PMU drivers.

If you - or anyone else for that matter - know of a way that'll get us
past this, by all means, say something.

If it helps, I recently asked acme if we could borrow bits from struct
perf_event and got no response.

> > > > As you are already aware, I've personally tried to fix this problem -
> > > > that has existed since before the introduction of the perf tool (I
> > > > consider it a syscall-independent enhanced error interface), multiple
> > > > times, and failed.
> > > 
> > > Why is that my problem? Try harder?
> > 
> > It's your problem because we're here reviewing a patch that happens to
> > fall under your maintainership.  I'll be the first person to tell you
> > I'm obviously incompetent and haven't been able to come up with a
> > solution that is acceptable for everyone up to and including Linus
> > Torvalds.  I'm just noticing a chronic usability problem that can be
> > easily alleviated in the context of this patch review.
> 
> I don't see how incompetence has anything to do with it and, like most

Well you told me to try harder, and I'm trying to let you know that I
can try harder - ad infinitum - and still not get it, so telling me to
try harder isn't going to necessarily resolve the issue.

> people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
> internals. No arguments on the usability problem, but this ain't the fix

Like acme, I doubt Linus runs perf on real Arm h/w, so yes, I don't
think they care that much, but if they did try to do it one day, I'd
like to that they'd be able to have dmesg contain that extra error
information for them.

As you know, David Howells wrote a supplemental error syscall facility
[1], that Linus nacked (off list no less).  Does Linus care about what
David Howells was developing it for?  I don't know, but he does have
enough experience to know whether a certain approach to a problem is
sustainable in his kernel or not.

> you're looking for: it's a bodge. We've been over this many times before.

I think it's OK - necessary even - until this error 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-03 Thread Kim Phillips
On Tue, 1 May 2018 12:54:05 +0100
Will Deacon  wrote:

> Hi Kim,

Hi Will, thanks for responding.

> On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 17:09:14 +0100
> > Will Deacon  wrote:
> > > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > > > On Fri, 27 Apr 2018 15:37:20 +0100
> > > > Will Deacon  wrote:
> > > > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > > > and instead see efforts to improve error reporting via the perf system
> > > > > call interface.
> > > > 
> > > > We'd all prefer that, and for all PMU drivers, why should ones under
> > > > drivers/perf be treated differently?
> > > 
> > > Because they're the ones I maintain...
> > 
> > You represent a minority on your opinion on this matter though.
> 
> I'm not sure that's true

I haven't seen another arch/PMU driver maintainer actively oppose it
other than you (and Mark).

> -- you tend to be the only one raising this issue;

If you're basing that on list activity, I wouldn't, since most Arm perf
users don't subscribe to LKML.  I'm trying to represent current and
future Arm perf users that shouldn't be expected to hang out here on
LKML and review PMU drivers.  Couldn't tell you why PMU driver authors
themselves don't chime in, but will note that some drivers in
drivers/perf do print things from their event_init functions.

Also, you have been cc'd on off-list email threads where SPE users had
to debug the SPE driver's event_init() manually.  There are other SPE
users I know that have had to go through the same painstaking process of
debugging the SPE driver.  Last but not least, there's me, and I'd sure
appreciate more verbose PMU drivers, based on my real-life performance
monitoring experience(s).

> I think most people don't actually care one way or the other. If you know of

Like I said, I think you're limiting your audience to LKML subscribers,
who are more amenable to being convinced they need to debug their
kernel.

> others who care about it too then I suggest you work together as a group to
> solve the problem in a way which is amenable to the upstream maintainers.

I know a few off-list people that would like a more user-friendly Arm
perf experience, but we're not qualified to solve the larger problem
that perf has had since its inception, and I think it's a little unfair
for you to suggest that, esp. given you're one of the maintainers, and
there's Linus on top of you.

> Then you won't have to worry about fixing it personally. You could even
> propose a topic for plumbers if there is enough interest in a general
> framework for propagating error messages to userspace.

I'm not worried about fixing it personally or not.  I'm worried about
our perf users' experience given your objection to using the vehicle
already in use on other architectures and PMU drivers.

If you - or anyone else for that matter - know of a way that'll get us
past this, by all means, say something.

If it helps, I recently asked acme if we could borrow bits from struct
perf_event and got no response.

> > > > As you are already aware, I've personally tried to fix this problem -
> > > > that has existed since before the introduction of the perf tool (I
> > > > consider it a syscall-independent enhanced error interface), multiple
> > > > times, and failed.
> > > 
> > > Why is that my problem? Try harder?
> > 
> > It's your problem because we're here reviewing a patch that happens to
> > fall under your maintainership.  I'll be the first person to tell you
> > I'm obviously incompetent and haven't been able to come up with a
> > solution that is acceptable for everyone up to and including Linus
> > Torvalds.  I'm just noticing a chronic usability problem that can be
> > easily alleviated in the context of this patch review.
> 
> I don't see how incompetence has anything to do with it and, like most

Well you told me to try harder, and I'm trying to let you know that I
can try harder - ad infinitum - and still not get it, so telling me to
try harder isn't going to necessarily resolve the issue.

> people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
> internals. No arguments on the usability problem, but this ain't the fix

Like acme, I doubt Linus runs perf on real Arm h/w, so yes, I don't
think they care that much, but if they did try to do it one day, I'd
like to that they'd be able to have dmesg contain that extra error
information for them.

As you know, David Howells wrote a supplemental error syscall facility
[1], that Linus nacked (off list no less).  Does Linus care about what
David Howells was developing it for?  I don't know, but he does have
enough experience to know whether a certain approach to a problem is
sustainable in his kernel or not.

> you're looking for: it's a bodge. We've been over this many times before.

I think it's OK - necessary even - until this error infrastructure
problem perf has is fixed, and I think you're being 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-01 Thread Will Deacon
Hi Kim,

On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 17:09:14 +0100
> Will Deacon  wrote:
> > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 15:37:20 +0100
> > > Will Deacon  wrote:
> > > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > > and instead see efforts to improve error reporting via the perf system
> > > > call interface.
> > > 
> > > We'd all prefer that, and for all PMU drivers, why should ones under
> > > drivers/perf be treated differently?
> > 
> > Because they're the ones I maintain...
> 
> You represent a minority on your opinion on this matter though.

I'm not sure that's true -- you tend to be the only one raising this issue;
I think most people don't actually care one way or the other. If you know of
others who care about it too then I suggest you work together as a group to
solve the problem in a way which is amenable to the upstream maintainers.
Then you won't have to worry about fixing it personally. You could even
propose a topic for plumbers if there is enough interest in a general
framework for propagating error messages to userspace.

> > > As you are already aware, I've personally tried to fix this problem -
> > > that has existed since before the introduction of the perf tool (I
> > > consider it a syscall-independent enhanced error interface), multiple
> > > times, and failed.
> > 
> > Why is that my problem? Try harder?
> 
> It's your problem because we're here reviewing a patch that happens to
> fall under your maintainership.  I'll be the first person to tell you
> I'm obviously incompetent and haven't been able to come up with a
> solution that is acceptable for everyone up to and including Linus
> Torvalds.  I'm just noticing a chronic usability problem that can be
> easily alleviated in the context of this patch review.

I don't see how incompetence has anything to do with it and, like most
people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
internals. No arguments on the usability problem, but this ain't the fix
you're looking for: it's a bodge. We've been over this many times before.

> > > So until someone comes up with a solution that works for everyone
> > > up to and including Linus Torvalds (who hasn't put up a problem
> > > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > > keep PMU drivers' errors silent preference of yours is unnecessarily
> > > impeding people trying to measure system performance on Arm based
> > > machines - all other archs' maintainers are fine with PMU drivers using
> > > dmesg.
> > 
> > Good for them, although I'm pretty sure that at least the x86 folks are
> > against this crap too.
> 
> Unfortunately, it doesn't affect them nearly as much as it does our
> more diverse platforms, which is why I don't think they care to do
> much about it.

x86 has its fair share of PMUs too, so I don't think we're special in this
regard. The main difference seems to be that they have better support in
the perf tool for diagnosing problems.

> > > > Anyway, I think this driver has bigger problems that need addressing.
> > > 
> > > To me it represents yet another PMU driver submission - as the years go
> > > by - that is lacking in the user messaging area.  Which reminds me, can
> > > you take another look at applying this?:
> > 
> > As I said before, I'm not going to take anything that logs above pr_debug
> > for things that are directly triggerable from userspace. Spin a version
> 
> Why?  There are plenty of things that emit stuff into dmesg that are
> directly triggerable from userspace.  Is it because it upsets fuzzing
> tests?  How about those be run with a patched kernel that somehow
> mitigates the printing?

[we've been over this before]

There are two camps that might find this information useful:

  1. People writing userspace support for a new PMU using the
 perf_event_open syscall

  2. People trying to use a tool which abstracts the PMU details to some
 degree (e.g. perf tool)

Those in (1) can get by with pr_debug. Sure, they have to recompile, but
it's not like there are many people in this camp and they'll probably be
working with the PMU driver writer to some degree anyway and taking new
kernel drops.

Those in (2) may not have access to dmesg, absolutely should not be able
to spam it (since this could hide other important messages), will probably
struggle to match an internal message from the PMU driver to their
invocation of the high-level tool and may well be in a multi-user
environment so will struggle to identify the messages that they are
responsible for. What they actually want is for the perf tool to give
them more information, and dmesg is not the right channel for that, for
similar reasons.

> > using pr_debug and I'll queue it.
> 
> How about using a ratelimited dev_err variant?

Nah, I don't want dmesg being parsed by 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-05-01 Thread Will Deacon
Hi Kim,

On Fri, Apr 27, 2018 at 11:56:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 17:09:14 +0100
> Will Deacon  wrote:
> > On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 15:37:20 +0100
> > > Will Deacon  wrote:
> > > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > > and instead see efforts to improve error reporting via the perf system
> > > > call interface.
> > > 
> > > We'd all prefer that, and for all PMU drivers, why should ones under
> > > drivers/perf be treated differently?
> > 
> > Because they're the ones I maintain...
> 
> You represent a minority on your opinion on this matter though.

I'm not sure that's true -- you tend to be the only one raising this issue;
I think most people don't actually care one way or the other. If you know of
others who care about it too then I suggest you work together as a group to
solve the problem in a way which is amenable to the upstream maintainers.
Then you won't have to worry about fixing it personally. You could even
propose a topic for plumbers if there is enough interest in a general
framework for propagating error messages to userspace.

> > > As you are already aware, I've personally tried to fix this problem -
> > > that has existed since before the introduction of the perf tool (I
> > > consider it a syscall-independent enhanced error interface), multiple
> > > times, and failed.
> > 
> > Why is that my problem? Try harder?
> 
> It's your problem because we're here reviewing a patch that happens to
> fall under your maintainership.  I'll be the first person to tell you
> I'm obviously incompetent and haven't been able to come up with a
> solution that is acceptable for everyone up to and including Linus
> Torvalds.  I'm just noticing a chronic usability problem that can be
> easily alleviated in the context of this patch review.

I don't see how incompetence has anything to do with it and, like most
people (lucky gits...), I doubt Torvalds cares too deeply about PMU driver
internals. No arguments on the usability problem, but this ain't the fix
you're looking for: it's a bodge. We've been over this many times before.

> > > So until someone comes up with a solution that works for everyone
> > > up to and including Linus Torvalds (who hasn't put up a problem
> > > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > > keep PMU drivers' errors silent preference of yours is unnecessarily
> > > impeding people trying to measure system performance on Arm based
> > > machines - all other archs' maintainers are fine with PMU drivers using
> > > dmesg.
> > 
> > Good for them, although I'm pretty sure that at least the x86 folks are
> > against this crap too.
> 
> Unfortunately, it doesn't affect them nearly as much as it does our
> more diverse platforms, which is why I don't think they care to do
> much about it.

x86 has its fair share of PMUs too, so I don't think we're special in this
regard. The main difference seems to be that they have better support in
the perf tool for diagnosing problems.

> > > > Anyway, I think this driver has bigger problems that need addressing.
> > > 
> > > To me it represents yet another PMU driver submission - as the years go
> > > by - that is lacking in the user messaging area.  Which reminds me, can
> > > you take another look at applying this?:
> > 
> > As I said before, I'm not going to take anything that logs above pr_debug
> > for things that are directly triggerable from userspace. Spin a version
> 
> Why?  There are plenty of things that emit stuff into dmesg that are
> directly triggerable from userspace.  Is it because it upsets fuzzing
> tests?  How about those be run with a patched kernel that somehow
> mitigates the printing?

[we've been over this before]

There are two camps that might find this information useful:

  1. People writing userspace support for a new PMU using the
 perf_event_open syscall

  2. People trying to use a tool which abstracts the PMU details to some
 degree (e.g. perf tool)

Those in (1) can get by with pr_debug. Sure, they have to recompile, but
it's not like there are many people in this camp and they'll probably be
working with the PMU driver writer to some degree anyway and taking new
kernel drops.

Those in (2) may not have access to dmesg, absolutely should not be able
to spam it (since this could hide other important messages), will probably
struggle to match an internal message from the PMU driver to their
invocation of the high-level tool and may well be in a multi-user
environment so will struggle to identify the messages that they are
responsible for. What they actually want is for the perf tool to give
them more information, and dmesg is not the right channel for that, for
similar reasons.

> > using pr_debug and I'll queue it.
> 
> How about using a ratelimited dev_err variant?

Nah, I don't want dmesg being parsed by users of perf tool. pr_debug should
be 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 17:09:14 +0100
Will Deacon  wrote:

> Kim,
> 
> [Ganapat: please don't let this discussion disrupt your PMU driver
>  development. You can safely ignore it for now :)]
> 
> On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 15:37:20 +0100
> > Will Deacon  wrote:
> > 
> > > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > > Mark Rutland  wrote:
> > > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > > Ganapatrao Kulkarni  wrote:
> > > > > > 
> > > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > > > 
> > > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > > message describing the specific error via dmesg.
> > > > > 
> > > > > As has previously been discussed on several occasions, patches which 
> > > > > log
> > > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > > of driver-specific constraints.
> > > > 
> > > > I disagree - drivers do it all the time, using dev_err(), dev_warn(), 
> > > > etc.
> > > > 
> > > > > I would appreciate if in future you could qualify your suggestion with
> > > > > the requirement that pr_debug() is used.
> > > > 
> > > > It shouldn't - the driver isn't being debugged, it's in regular use.
> > > 
> > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > and instead see efforts to improve error reporting via the perf system
> > > call interface.
> > 
> > We'd all prefer that, and for all PMU drivers, why should ones under
> > drivers/perf be treated differently?
> 
> Because they're the ones I maintain...

You represent a minority on your opinion on this matter though.

> > As you are already aware, I've personally tried to fix this problem -
> > that has existed since before the introduction of the perf tool (I
> > consider it a syscall-independent enhanced error interface), multiple
> > times, and failed.
> 
> Why is that my problem? Try harder?

It's your problem because we're here reviewing a patch that happens to
fall under your maintainership.  I'll be the first person to tell you
I'm obviously incompetent and haven't been able to come up with a
solution that is acceptable for everyone up to and including Linus
Torvalds.  I'm just noticing a chronic usability problem that can be
easily alleviated in the context of this patch review.

> > So until someone comes up with a solution that works for everyone
> > up to and including Linus Torvalds (who hasn't put up a problem
> > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > keep PMU drivers' errors silent preference of yours is unnecessarily
> > impeding people trying to measure system performance on Arm based
> > machines - all other archs' maintainers are fine with PMU drivers using
> > dmesg.
> 
> Good for them, although I'm pretty sure that at least the x86 folks are
> against this crap too.

Unfortunately, it doesn't affect them nearly as much as it does our
more diverse platforms, which is why I don't think they care to do
much about it.

> > > Anyway, I think this driver has bigger problems that need addressing.
> > 
> > To me it represents yet another PMU driver submission - as the years go
> > by - that is lacking in the user messaging area.  Which reminds me, can
> > you take another look at applying this?:
> 
> As I said before, I'm not going to take anything that logs above pr_debug
> for things that are directly triggerable from userspace. Spin a version

Why?  There are plenty of things that emit stuff into dmesg that are
directly triggerable from userspace.  Is it because it upsets fuzzing
tests?  How about those be run with a patched kernel that somehow
mitigates the printing?

> using pr_debug and I'll queue it.

How about using a ratelimited dev_err variant?

> Have a good weekend,

You too.

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 17:09:14 +0100
Will Deacon  wrote:

> Kim,
> 
> [Ganapat: please don't let this discussion disrupt your PMU driver
>  development. You can safely ignore it for now :)]
> 
> On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 15:37:20 +0100
> > Will Deacon  wrote:
> > 
> > > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > > Mark Rutland  wrote:
> > > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > > Ganapatrao Kulkarni  wrote:
> > > > > > 
> > > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > > > 
> > > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > > message describing the specific error via dmesg.
> > > > > 
> > > > > As has previously been discussed on several occasions, patches which 
> > > > > log
> > > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > > of driver-specific constraints.
> > > > 
> > > > I disagree - drivers do it all the time, using dev_err(), dev_warn(), 
> > > > etc.
> > > > 
> > > > > I would appreciate if in future you could qualify your suggestion with
> > > > > the requirement that pr_debug() is used.
> > > > 
> > > > It shouldn't - the driver isn't being debugged, it's in regular use.
> > > 
> > > For anything under drivers/perf/, I'd prefer not to have these prints
> > > and instead see efforts to improve error reporting via the perf system
> > > call interface.
> > 
> > We'd all prefer that, and for all PMU drivers, why should ones under
> > drivers/perf be treated differently?
> 
> Because they're the ones I maintain...

You represent a minority on your opinion on this matter though.

> > As you are already aware, I've personally tried to fix this problem -
> > that has existed since before the introduction of the perf tool (I
> > consider it a syscall-independent enhanced error interface), multiple
> > times, and failed.
> 
> Why is that my problem? Try harder?

It's your problem because we're here reviewing a patch that happens to
fall under your maintainership.  I'll be the first person to tell you
I'm obviously incompetent and haven't been able to come up with a
solution that is acceptable for everyone up to and including Linus
Torvalds.  I'm just noticing a chronic usability problem that can be
easily alleviated in the context of this patch review.

> > So until someone comes up with a solution that works for everyone
> > up to and including Linus Torvalds (who hasn't put up a problem
> > pulling PMU drivers emitting things to dmesg so far, by the way), this
> > keep PMU drivers' errors silent preference of yours is unnecessarily
> > impeding people trying to measure system performance on Arm based
> > machines - all other archs' maintainers are fine with PMU drivers using
> > dmesg.
> 
> Good for them, although I'm pretty sure that at least the x86 folks are
> against this crap too.

Unfortunately, it doesn't affect them nearly as much as it does our
more diverse platforms, which is why I don't think they care to do
much about it.

> > > Anyway, I think this driver has bigger problems that need addressing.
> > 
> > To me it represents yet another PMU driver submission - as the years go
> > by - that is lacking in the user messaging area.  Which reminds me, can
> > you take another look at applying this?:
> 
> As I said before, I'm not going to take anything that logs above pr_debug
> for things that are directly triggerable from userspace. Spin a version

Why?  There are plenty of things that emit stuff into dmesg that are
directly triggerable from userspace.  Is it because it upsets fuzzing
tests?  How about those be run with a patched kernel that somehow
mitigates the printing?

> using pr_debug and I'll queue it.

How about using a ratelimited dev_err variant?

> Have a good weekend,

You too.

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Will Deacon
Kim,

[Ganapat: please don't let this discussion disrupt your PMU driver
 development. You can safely ignore it for now :)]

On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 15:37:20 +0100
> Will Deacon  wrote:
> 
> > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > Mark Rutland  wrote:
> > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > Ganapatrao Kulkarni  wrote:
> > > > > 
> > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > > 
> > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > message describing the specific error via dmesg.
> > > > 
> > > > As has previously been discussed on several occasions, patches which log
> > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > of driver-specific constraints.
> > > 
> > > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> > > 
> > > > I would appreciate if in future you could qualify your suggestion with
> > > > the requirement that pr_debug() is used.
> > > 
> > > It shouldn't - the driver isn't being debugged, it's in regular use.
> > 
> > For anything under drivers/perf/, I'd prefer not to have these prints
> > and instead see efforts to improve error reporting via the perf system
> > call interface.
> 
> We'd all prefer that, and for all PMU drivers, why should ones under
> drivers/perf be treated differently?

Because they're the ones I maintain...

> As you are already aware, I've personally tried to fix this problem -
> that has existed since before the introduction of the perf tool (I
> consider it a syscall-independent enhanced error interface), multiple
> times, and failed.

Why is that my problem? Try harder?

> So until someone comes up with a solution that works for everyone
> up to and including Linus Torvalds (who hasn't put up a problem
> pulling PMU drivers emitting things to dmesg so far, by the way), this
> keep PMU drivers' errors silent preference of yours is unnecessarily
> impeding people trying to measure system performance on Arm based
> machines - all other archs' maintainers are fine with PMU drivers using
> dmesg.

Good for them, although I'm pretty sure that at least the x86 folks are
against this crap too.

> > Anyway, I think this driver has bigger problems that need addressing.
> 
> To me it represents yet another PMU driver submission - as the years go
> by - that is lacking in the user messaging area.  Which reminds me, can
> you take another look at applying this?:

As I said before, I'm not going to take anything that logs above pr_debug
for things that are directly triggerable from userspace. Spin a version
using pr_debug and I'll queue it.

Have a good weekend,

Will


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Will Deacon
Kim,

[Ganapat: please don't let this discussion disrupt your PMU driver
 development. You can safely ignore it for now :)]

On Fri, Apr 27, 2018 at 10:46:29AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 15:37:20 +0100
> Will Deacon  wrote:
> 
> > On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > > On Fri, 27 Apr 2018 10:30:27 +0100
> > > Mark Rutland  wrote:
> > > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > > Ganapatrao Kulkarni  wrote:
> > > > > 
> > > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > > 
> > > > > This PMU driver can be made more user-friendly by not just silently
> > > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > > message describing the specific error via dmesg.
> > > > 
> > > > As has previously been discussed on several occasions, patches which log
> > > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > > of driver-specific constraints.
> > > 
> > > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> > > 
> > > > I would appreciate if in future you could qualify your suggestion with
> > > > the requirement that pr_debug() is used.
> > > 
> > > It shouldn't - the driver isn't being debugged, it's in regular use.
> > 
> > For anything under drivers/perf/, I'd prefer not to have these prints
> > and instead see efforts to improve error reporting via the perf system
> > call interface.
> 
> We'd all prefer that, and for all PMU drivers, why should ones under
> drivers/perf be treated differently?

Because they're the ones I maintain...

> As you are already aware, I've personally tried to fix this problem -
> that has existed since before the introduction of the perf tool (I
> consider it a syscall-independent enhanced error interface), multiple
> times, and failed.

Why is that my problem? Try harder?

> So until someone comes up with a solution that works for everyone
> up to and including Linus Torvalds (who hasn't put up a problem
> pulling PMU drivers emitting things to dmesg so far, by the way), this
> keep PMU drivers' errors silent preference of yours is unnecessarily
> impeding people trying to measure system performance on Arm based
> machines - all other archs' maintainers are fine with PMU drivers using
> dmesg.

Good for them, although I'm pretty sure that at least the x86 folks are
against this crap too.

> > Anyway, I think this driver has bigger problems that need addressing.
> 
> To me it represents yet another PMU driver submission - as the years go
> by - that is lacking in the user messaging area.  Which reminds me, can
> you take another look at applying this?:

As I said before, I'm not going to take anything that logs above pr_debug
for things that are directly triggerable from userspace. Spin a version
using pr_debug and I'll queue it.

Have a good weekend,

Will


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 15:37:20 +0100
Will Deacon  wrote:

> On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 10:30:27 +0100
> > Mark Rutland  wrote:
> > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > Ganapatrao Kulkarni  wrote:
> > > > 
> > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > 
> > > > This PMU driver can be made more user-friendly by not just silently
> > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > message describing the specific error via dmesg.
> > > 
> > > As has previously been discussed on several occasions, patches which log
> > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > of driver-specific constraints.
> > 
> > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> > 
> > > I would appreciate if in future you could qualify your suggestion with
> > > the requirement that pr_debug() is used.
> > 
> > It shouldn't - the driver isn't being debugged, it's in regular use.
> 
> For anything under drivers/perf/, I'd prefer not to have these prints
> and instead see efforts to improve error reporting via the perf system
> call interface.

We'd all prefer that, and for all PMU drivers, why should ones under
drivers/perf be treated differently?

As you are already aware, I've personally tried to fix this problem -
that has existed since before the introduction of the perf tool (I
consider it a syscall-independent enhanced error interface), multiple
times, and failed.

So until someone comes up with a solution that works for everyone
up to and including Linus Torvalds (who hasn't put up a problem
pulling PMU drivers emitting things to dmesg so far, by the way), this
keep PMU drivers' errors silent preference of yours is unnecessarily
impeding people trying to measure system performance on Arm based
machines - all other archs' maintainers are fine with PMU drivers using
dmesg.

> Anyway, I think this driver has bigger problems that need addressing.

To me it represents yet another PMU driver submission - as the years go
by - that is lacking in the user messaging area.  Which reminds me, can
you take another look at applying this?:

https://patchwork.kernel.org/patch/10068535/

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 15:37:20 +0100
Will Deacon  wrote:

> On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> > On Fri, 27 Apr 2018 10:30:27 +0100
> > Mark Rutland  wrote:
> > > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > > Ganapatrao Kulkarni  wrote:
> > > > 
> > > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > > 
> > > > This PMU driver can be made more user-friendly by not just silently
> > > > returning an error code such as -EINVAL, but by emitting a useful
> > > > message describing the specific error via dmesg.
> > > 
> > > As has previously been discussed on several occasions, patches which log
> > > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > > not acceptable -- dmesg is not intended as a mechanism to inform users
> > > of driver-specific constraints.
> > 
> > I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> > 
> > > I would appreciate if in future you could qualify your suggestion with
> > > the requirement that pr_debug() is used.
> > 
> > It shouldn't - the driver isn't being debugged, it's in regular use.
> 
> For anything under drivers/perf/, I'd prefer not to have these prints
> and instead see efforts to improve error reporting via the perf system
> call interface.

We'd all prefer that, and for all PMU drivers, why should ones under
drivers/perf be treated differently?

As you are already aware, I've personally tried to fix this problem -
that has existed since before the introduction of the perf tool (I
consider it a syscall-independent enhanced error interface), multiple
times, and failed.

So until someone comes up with a solution that works for everyone
up to and including Linus Torvalds (who hasn't put up a problem
pulling PMU drivers emitting things to dmesg so far, by the way), this
keep PMU drivers' errors silent preference of yours is unnecessarily
impeding people trying to measure system performance on Arm based
machines - all other archs' maintainers are fine with PMU drivers using
dmesg.

> Anyway, I think this driver has bigger problems that need addressing.

To me it represents yet another PMU driver submission - as the years go
by - that is lacking in the user messaging area.  Which reminds me, can
you take another look at applying this?:

https://patchwork.kernel.org/patch/10068535/

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Will Deacon
On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 10:30:27 +0100
> Mark Rutland  wrote:
> > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > Ganapatrao Kulkarni  wrote:
> > > 
> > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > 
> > > This PMU driver can be made more user-friendly by not just silently
> > > returning an error code such as -EINVAL, but by emitting a useful
> > > message describing the specific error via dmesg.
> > 
> > As has previously been discussed on several occasions, patches which log
> > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > not acceptable -- dmesg is not intended as a mechanism to inform users
> > of driver-specific constraints.
> 
> I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> 
> > I would appreciate if in future you could qualify your suggestion with
> > the requirement that pr_debug() is used.
> 
> It shouldn't - the driver isn't being debugged, it's in regular use.

For anything under drivers/perf/, I'd prefer not to have these prints
and instead see efforts to improve error reporting via the perf system
call interface.

Anyway, I think this driver has bigger problems that need addressing.

Will


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Will Deacon
On Fri, Apr 27, 2018 at 08:15:25AM -0500, Kim Phillips wrote:
> On Fri, 27 Apr 2018 10:30:27 +0100
> Mark Rutland  wrote:
> > On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > > On Wed, 25 Apr 2018 14:30:47 +0530
> > > Ganapatrao Kulkarni  wrote:
> > > 
> > > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> > 
> > > This PMU driver can be made more user-friendly by not just silently
> > > returning an error code such as -EINVAL, but by emitting a useful
> > > message describing the specific error via dmesg.
> > 
> > As has previously been discussed on several occasions, patches which log
> > to dmesg in a pmu::event_init() path at any level above pr_debug() are
> > not acceptable -- dmesg is not intended as a mechanism to inform users
> > of driver-specific constraints.
> 
> I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.
> 
> > I would appreciate if in future you could qualify your suggestion with
> > the requirement that pr_debug() is used.
> 
> It shouldn't - the driver isn't being debugged, it's in regular use.

For anything under drivers/perf/, I'd prefer not to have these prints
and instead see efforts to improve error reporting via the perf system
call interface.

Anyway, I think this driver has bigger problems that need addressing.

Will


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 10:30:27 +0100
Mark Rutland  wrote:

> Hi Kim,
> 
> On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > On Wed, 25 Apr 2018 14:30:47 +0530
> > Ganapatrao Kulkarni  wrote:
> > 
> > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> 
> > This PMU driver can be made more user-friendly by not just silently
> > returning an error code such as -EINVAL, but by emitting a useful
> > message describing the specific error via dmesg.
> 
> As has previously been discussed on several occasions, patches which log
> to dmesg in a pmu::event_init() path at any level above pr_debug() are
> not acceptable -- dmesg is not intended as a mechanism to inform users
> of driver-specific constraints.

I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.

> I would appreciate if in future you could qualify your suggestion with
> the requirement that pr_debug() is used.

It shouldn't - the driver isn't being debugged, it's in regular use.

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Kim Phillips
On Fri, 27 Apr 2018 10:30:27 +0100
Mark Rutland  wrote:

> Hi Kim,
> 
> On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> > On Wed, 25 Apr 2018 14:30:47 +0530
> > Ganapatrao Kulkarni  wrote:
> > 
> > > +static int thunderx2_uncore_event_init(struct perf_event *event)
> 
> > This PMU driver can be made more user-friendly by not just silently
> > returning an error code such as -EINVAL, but by emitting a useful
> > message describing the specific error via dmesg.
> 
> As has previously been discussed on several occasions, patches which log
> to dmesg in a pmu::event_init() path at any level above pr_debug() are
> not acceptable -- dmesg is not intended as a mechanism to inform users
> of driver-specific constraints.

I disagree - drivers do it all the time, using dev_err(), dev_warn(), etc.

> I would appreciate if in future you could qualify your suggestion with
> the requirement that pr_debug() is used.

It shouldn't - the driver isn't being debugged, it's in regular use.

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Mark Rutland
Hi Kim,

On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> On Wed, 25 Apr 2018 14:30:47 +0530
> Ganapatrao Kulkarni  wrote:
> 
> > +static int thunderx2_uncore_event_init(struct perf_event *event)

> This PMU driver can be made more user-friendly by not just silently
> returning an error code such as -EINVAL, but by emitting a useful
> message describing the specific error via dmesg.

As has previously been discussed on several occasions, patches which log
to dmesg in a pmu::event_init() path at any level above pr_debug() are
not acceptable -- dmesg is not intended as a mechanism to inform users
of driver-specific constraints.

I would appreciate if in future you could qualify your suggestion with
the requirement that pr_debug() is used.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-27 Thread Mark Rutland
Hi Kim,

On Thu, Apr 26, 2018 at 05:06:24PM -0500, Kim Phillips wrote:
> On Wed, 25 Apr 2018 14:30:47 +0530
> Ganapatrao Kulkarni  wrote:
> 
> > +static int thunderx2_uncore_event_init(struct perf_event *event)

> This PMU driver can be made more user-friendly by not just silently
> returning an error code such as -EINVAL, but by emitting a useful
> message describing the specific error via dmesg.

As has previously been discussed on several occasions, patches which log
to dmesg in a pmu::event_init() path at any level above pr_debug() are
not acceptable -- dmesg is not intended as a mechanism to inform users
of driver-specific constraints.

I would appreciate if in future you could qualify your suggestion with
the requirement that pr_debug() is used.

Thanks,
Mark.


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-26 Thread Kim Phillips
On Wed, 25 Apr 2018 14:30:47 +0530
Ganapatrao Kulkarni  wrote:

> +static int thunderx2_uncore_event_init(struct perf_event *event)
...
> + /*
> +  * SOC PMU counters are shared across all cores.
> +  * Therefore, it does not support per-process mode.
> +  * Also, it does not support event sampling mode.
> +  */
> + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> + return -EINVAL;
> +
> + /* SOC counters do not have usr/os/guest/host bits */
> + if (event->attr.exclude_user || event->attr.exclude_kernel ||
> + event->attr.exclude_host || event->attr.exclude_guest)
> + return -EINVAL;
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> +
> + if (!pmu_uncore)
> + return -ENODEV;
> +
> + /* Pick first online cpu from the node */
> + event->cpu = cpumask_first(
> + cpumask_of_node(pmu_uncore->uncore_dev->node));
> +
> + if (event->cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + if (event->attr.config >= pmu_uncore->uncore_dev->max_events)
> + return -EINVAL;
> +
> + /* store event id */
> + hwc->config = event->attr.config;
> +
> + /* Validate the group */
> + if (!thunderx2_uncore_validate_event_group(event))
> + return -EINVAL;

This PMU driver can be made more user-friendly by not just silently
returning an error code such as -EINVAL, but by emitting a useful
message describing the specific error via dmesg.

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-26 Thread Kim Phillips
On Wed, 25 Apr 2018 14:30:47 +0530
Ganapatrao Kulkarni  wrote:

> +static int thunderx2_uncore_event_init(struct perf_event *event)
...
> + /*
> +  * SOC PMU counters are shared across all cores.
> +  * Therefore, it does not support per-process mode.
> +  * Also, it does not support event sampling mode.
> +  */
> + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> + return -EINVAL;
> +
> + /* SOC counters do not have usr/os/guest/host bits */
> + if (event->attr.exclude_user || event->attr.exclude_kernel ||
> + event->attr.exclude_host || event->attr.exclude_guest)
> + return -EINVAL;
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> +
> + if (!pmu_uncore)
> + return -ENODEV;
> +
> + /* Pick first online cpu from the node */
> + event->cpu = cpumask_first(
> + cpumask_of_node(pmu_uncore->uncore_dev->node));
> +
> + if (event->cpu >= nr_cpu_ids)
> + return -EINVAL;
> +
> + if (event->attr.config >= pmu_uncore->uncore_dev->max_events)
> + return -EINVAL;
> +
> + /* store event id */
> + hwc->config = event->attr.config;
> +
> + /* Validate the group */
> + if (!thunderx2_uncore_validate_event_group(event))
> + return -EINVAL;

This PMU driver can be made more user-friendly by not just silently
returning an error code such as -EINVAL, but by emitting a useful
message describing the specific error via dmesg.

Thanks,

Kim


Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-26 Thread Mark Rutland
Hi,

On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
> +
> +/* L3c and DMC has 16 and 8 channels per socket respectively.
> + * Each Channel supports UNCORE PMU device and consists of
> + * 4 independent programmable counters. Counters are 32 bit
> + * and does not support overflow interrupt, they needs to be
> + * sampled before overflow(i.e, at every 2 seconds).
> + */
> +
> +#define UNCORE_MAX_COUNTERS  4
> +#define UNCORE_L3_MAX_TILES  16
> +#define UNCORE_DMC_MAX_CHANNELS  8
> +
> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)

How was a period of two seconds chosen?

What's the maximum clock speed for the L3C and DMC?

Given that all channels compete for access to the muxed register
interface, I suspect we need to try more often than once every 2
seconds...

[...]

> +struct active_timer {
> + struct perf_event *event;
> + struct hrtimer hrtimer;
> +};
> +
> +/*
> + * pmu on each socket has 2 uncore devices(dmc and l3),
> + * each uncore device has up to 16 channels, each channel can sample
> + * events independently with counters up to 4.
> + *
> + * struct thunderx2_pmu_uncore_channel created per channel.
> + * struct thunderx2_pmu_uncore_dev per uncore device.
> + */
> +struct thunderx2_pmu_uncore_channel {
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + struct pmu pmu;

Can we put the pmu first in the struct, please?

> + int counter;

AFAICT, this counter field is never used.

> + int channel;
> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
> + struct active_timer *active_timers;

You should only need a single timer per channel, rather than one per
event.

I think you can get rid of the active_timer structure, and have:

struct perf_event *events[UNCORE_MAX_COUNTERS];
struct hrtimer timer;

> + /* to sync counter alloc/release */
> + raw_spinlock_t lock;
> +};
> +
> +struct thunderx2_pmu_uncore_dev {
> + char *name;
> + struct device *dev;
> + enum thunderx2_uncore_type type;
> + unsigned long base;

This should be:

void __iomem *base;

[...]

> +static ssize_t cpumask_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpumask cpu_mask;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
> +
> + /* Pick first online cpu from the node */
> + cpumask_clear(_mask);
> + cpumask_set_cpu(cpumask_first(
> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
> + _mask);
> +
> + return cpumap_print_to_pagebuf(true, buf, _mask);
> +}

AFAICT cpumask_of_node() returns a mask that can include offline CPUs.

Regardless, I don't think that you can keep track of the management CPU
this way. Please keep track of the CPU the PMU should be managed by,
either with a cpumask or int embedded within the PMU structure.

At hotplug time, you'll need to update the management CPU, calling
perf_pmu_migrate_context() when it is offlined.

[...]

> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> +{
> + int counter;
> +
> + raw_spin_lock(_uncore->lock);
> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> + pmu_uncore->uncore_dev->max_counters);
> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> + raw_spin_unlock(_uncore->lock);
> + return -ENOSPC;
> + }
> + set_bit(counter, pmu_uncore->counter_mask);
> + raw_spin_unlock(_uncore->lock);
> + return counter;
> +}
> +
> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> + int counter)
> +{
> + raw_spin_lock(_uncore->lock);
> + clear_bit(counter, pmu_uncore->counter_mask);
> + raw_spin_unlock(_uncore->lock);
> +}

I don't believe that locking is required in either of these, as the perf
core serializes pmu::add() and pmu::del(), where these get called.

> +
> +/*
> + * DMC and L3 counter interface is muxed across all channels.
> + * hence we need to select the channel before accessing counter
> + * data/control registers.

Are there separate interfaces for all-dmc-channels and all-l3c-channels?

... or is a single interface used by all-dmc-and-l3c-channels?

> + *
> + *  L3 Tile and DMC channel selection is through SMC call
> + *  SMC call arguments,
> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
> + *   x2 = Node id

How do we map Linux node IDs to the firmware's view of node IDs?

I don't believe the two are necessarily the same -- Linux's node IDs are
a Linux-specific construct.

It would be much nicer if we could pass something based on the MPIDR,
which is a known HW construct, or if this implicitly affected the
current node.

It would be vastly more 

Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-26 Thread Mark Rutland
Hi,

On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
> +
> +/* L3c and DMC has 16 and 8 channels per socket respectively.
> + * Each Channel supports UNCORE PMU device and consists of
> + * 4 independent programmable counters. Counters are 32 bit
> + * and does not support overflow interrupt, they needs to be
> + * sampled before overflow(i.e, at every 2 seconds).
> + */
> +
> +#define UNCORE_MAX_COUNTERS  4
> +#define UNCORE_L3_MAX_TILES  16
> +#define UNCORE_DMC_MAX_CHANNELS  8
> +
> +#define UNCORE_HRTIMER_INTERVAL  (2 * NSEC_PER_SEC)

How was a period of two seconds chosen?

What's the maximum clock speed for the L3C and DMC?

Given that all channels compete for access to the muxed register
interface, I suspect we need to try more often than once every 2
seconds...

[...]

> +struct active_timer {
> + struct perf_event *event;
> + struct hrtimer hrtimer;
> +};
> +
> +/*
> + * pmu on each socket has 2 uncore devices(dmc and l3),
> + * each uncore device has up to 16 channels, each channel can sample
> + * events independently with counters up to 4.
> + *
> + * struct thunderx2_pmu_uncore_channel created per channel.
> + * struct thunderx2_pmu_uncore_dev per uncore device.
> + */
> +struct thunderx2_pmu_uncore_channel {
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + struct pmu pmu;

Can we put the pmu first in the struct, please?

> + int counter;

AFAICT, this counter field is never used.

> + int channel;
> + DECLARE_BITMAP(counter_mask, UNCORE_MAX_COUNTERS);
> + struct active_timer *active_timers;

You should only need a single timer per channel, rather than one per
event.

I think you can get rid of the active_timer structure, and have:

struct perf_event *events[UNCORE_MAX_COUNTERS];
struct hrtimer timer;

> + /* to sync counter alloc/release */
> + raw_spinlock_t lock;
> +};
> +
> +struct thunderx2_pmu_uncore_dev {
> + char *name;
> + struct device *dev;
> + enum thunderx2_uncore_type type;
> + unsigned long base;

This should be:

void __iomem *base;

[...]

> +static ssize_t cpumask_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpumask cpu_mask;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
> +
> + /* Pick first online cpu from the node */
> + cpumask_clear(_mask);
> + cpumask_set_cpu(cpumask_first(
> + cpumask_of_node(pmu_uncore->uncore_dev->node)),
> + _mask);
> +
> + return cpumap_print_to_pagebuf(true, buf, _mask);
> +}

AFAICT cpumask_of_node() returns a mask that can include offline CPUs.

Regardless, I don't think that you can keep track of the management CPU
this way. Please keep track of the CPU the PMU should be managed by,
either with a cpumask or int embedded within the PMU structure.

At hotplug time, you'll need to update the management CPU, calling
perf_pmu_migrate_context() when it is offlined.

[...]

> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> +{
> + int counter;
> +
> + raw_spin_lock(_uncore->lock);
> + counter = find_first_zero_bit(pmu_uncore->counter_mask,
> + pmu_uncore->uncore_dev->max_counters);
> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> + raw_spin_unlock(_uncore->lock);
> + return -ENOSPC;
> + }
> + set_bit(counter, pmu_uncore->counter_mask);
> + raw_spin_unlock(_uncore->lock);
> + return counter;
> +}
> +
> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
> + int counter)
> +{
> + raw_spin_lock(_uncore->lock);
> + clear_bit(counter, pmu_uncore->counter_mask);
> + raw_spin_unlock(_uncore->lock);
> +}

I don't believe that locking is required in either of these, as the perf
core serializes pmu::add() and pmu::del(), where these get called.

> +
> +/*
> + * DMC and L3 counter interface is muxed across all channels.
> + * hence we need to select the channel before accessing counter
> + * data/control registers.

Are there separate interfaces for all-dmc-channels and all-l3c-channels?

... or is a single interface used by all-dmc-and-l3c-channels?

> + *
> + *  L3 Tile and DMC channel selection is through SMC call
> + *  SMC call arguments,
> + *   x0 = THUNDERX2_SMC_CALL_ID  (Vendor SMC call Id)
> + *   x1 = THUNDERX2_SMC_SET_CHANNEL  (Id to set DMC/L3C channel)
> + *   x2 = Node id

How do we map Linux node IDs to the firmware's view of node IDs?

I don't believe the two are necessarily the same -- Linux's node IDs are
a Linux-specific construct.

It would be much nicer if we could pass something based on the MPIDR,
which is a known HW construct, or if this implicitly affected the
current node.

It would be vastly more 

[PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-25 Thread Ganapatrao Kulkarni
This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
Controller(DMC) and Level 3 Cache(L3C).

ThunderX2 has 8 independent DMC PMUs to capture performance events
corresponding to 8 channels of DDR4 Memory Controller and 16 independent
L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
Each PMU supports up to 4 counters. All counters lack overflow interrupt
and are sampled periodically.

Signed-off-by: Ganapatrao Kulkarni 
---
 drivers/perf/Kconfig |   8 +
 drivers/perf/Makefile|   1 +
 drivers/perf/thunderx2_pmu.c | 958 +++
 3 files changed, 967 insertions(+)
 create mode 100644 drivers/perf/thunderx2_pmu.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 28bb5a0..eafd0fc 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -85,6 +85,14 @@ config QCOM_L3_PMU
   Adds the L3 cache PMU into the perf events subsystem for
   monitoring L3 cache events.
 
+config THUNDERX2_PMU
+bool "Cavium ThunderX2 SoC PMU UNCORE"
+depends on ARCH_THUNDER2 && PERF_EVENTS && ACPI
+   help
+ Provides support for ThunderX2 UNCORE events.
+ The SoC has PMU support in its L3 cache controller (L3C) and
+ in the DDR4 Memory Controller (DMC).
+
 config XGENE_PMU
 depends on ARCH_XGENE
 bool "APM X-Gene SoC PMU"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index b3902bd..909f27f 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
 obj-$(CONFIG_HISI_PMU) += hisilicon/
 obj-$(CONFIG_QCOM_L2_PMU)  += qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
+obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
 obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
new file mode 100644
index 000..92d19b5
--- /dev/null
+++ b/drivers/perf/thunderx2_pmu.c
@@ -0,0 +1,958 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CAVIUM THUNDERX2 SoC PMU UNCORE
+ *
+ * Copyright (C) 2018 Cavium Inc.
+ * Author: Ganapatrao Kulkarni 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* L3c and DMC has 16 and 8 channels per socket respectively.
+ * Each Channel supports UNCORE PMU device and consists of
+ * 4 independent programmable counters. Counters are 32 bit
+ * and does not support overflow interrupt, they needs to be
+ * sampled before overflow(i.e, at every 2 seconds).
+ */
+
+#define UNCORE_MAX_COUNTERS4
+#define UNCORE_L3_MAX_TILES16
+#define UNCORE_DMC_MAX_CHANNELS8
+
+#define UNCORE_HRTIMER_INTERVAL(2 * NSEC_PER_SEC)
+#define GET_EVENTID(ev)((ev->hw.config) & 0x1ff)
+#define GET_COUNTERID(ev)  ((ev->hw.idx) & 0xf)
+#define GET_CHANNELID(pmu_uncore)  (pmu_uncore->channel)
+
+#define DMC_COUNTER_CTL0x234
+#define DMC_COUNTER_DATA   0x240
+#define L3C_COUNTER_CTL0xA8
+#define L3C_COUNTER_DATA   0xAC
+
+#define THUNDERX2_SMC_CALL_ID  0xC200FF00
+#define THUNDERX2_SMC_SET_CHANNEL  0xB010
+
+
+enum thunderx2_uncore_l3_events {
+   L3_EVENT_NONE,
+   L3_EVENT_NBU_CANCEL,
+   L3_EVENT_DIB_RETRY,
+   L3_EVENT_DOB_RETRY,
+   L3_EVENT_DIB_CREDIT_RETRY,
+   L3_EVENT_DOB_CREDIT_RETRY,
+   L3_EVENT_FORCE_RETRY,
+   L3_EVENT_IDX_CONFLICT_RETRY,
+   L3_EVENT_EVICT_CONFLICT_RETRY,
+   L3_EVENT_BANK_CONFLICT_RETRY,
+   L3_EVENT_FILL_ENTRY_RETRY,
+   L3_EVENT_EVICT_NOT_READY_RETRY,
+   L3_EVENT_L3_RETRY,
+   L3_EVENT_READ_REQ,
+   L3_EVENT_WRITE_BACK_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_REQ,
+   L3_EVENT_INV_REQ,
+   L3_EVENT_SELF_REQ,
+   L3_EVENT_REQ,
+   L3_EVENT_EVICT_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_HIT,
+   L3_EVENT_INVALIDATE_HIT,
+   L3_EVENT_SELF_HIT,
+   L3_EVENT_READ_HIT,
+   L3_EVENT_MAX,
+};
+
+enum thunderx2_uncore_dmc_events {
+   DMC_EVENT_NONE,
+   DMC_EVENT_COUNT_CYCLES,
+   DMC_EVENT_RES2,
+   DMC_EVENT_RES3,
+   DMC_EVENT_RES4,
+   DMC_EVENT_RES5,
+   DMC_EVENT_RES6,
+   DMC_EVENT_RES7,
+   DMC_EVENT_RES8,
+   

[PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver

2018-04-25 Thread Ganapatrao Kulkarni
This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
Controller(DMC) and Level 3 Cache(L3C).

ThunderX2 has 8 independent DMC PMUs to capture performance events
corresponding to 8 channels of DDR4 Memory Controller and 16 independent
L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
Each PMU supports up to 4 counters. All counters lack overflow interrupt
and are sampled periodically.

Signed-off-by: Ganapatrao Kulkarni 
---
 drivers/perf/Kconfig |   8 +
 drivers/perf/Makefile|   1 +
 drivers/perf/thunderx2_pmu.c | 958 +++
 3 files changed, 967 insertions(+)
 create mode 100644 drivers/perf/thunderx2_pmu.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 28bb5a0..eafd0fc 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -85,6 +85,14 @@ config QCOM_L3_PMU
   Adds the L3 cache PMU into the perf events subsystem for
   monitoring L3 cache events.
 
+config THUNDERX2_PMU
+bool "Cavium ThunderX2 SoC PMU UNCORE"
+depends on ARCH_THUNDER2 && PERF_EVENTS && ACPI
+   help
+ Provides support for ThunderX2 UNCORE events.
+ The SoC has PMU support in its L3 cache controller (L3C) and
+ in the DDR4 Memory Controller (DMC).
+
 config XGENE_PMU
 depends on ARCH_XGENE
 bool "APM X-Gene SoC PMU"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index b3902bd..909f27f 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
 obj-$(CONFIG_HISI_PMU) += hisilicon/
 obj-$(CONFIG_QCOM_L2_PMU)  += qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
+obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
 obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
new file mode 100644
index 000..92d19b5
--- /dev/null
+++ b/drivers/perf/thunderx2_pmu.c
@@ -0,0 +1,958 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CAVIUM THUNDERX2 SoC PMU UNCORE
+ *
+ * Copyright (C) 2018 Cavium Inc.
+ * Author: Ganapatrao Kulkarni 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/* L3c and DMC has 16 and 8 channels per socket respectively.
+ * Each Channel supports UNCORE PMU device and consists of
+ * 4 independent programmable counters. Counters are 32 bit
+ * and does not support overflow interrupt, they needs to be
+ * sampled before overflow(i.e, at every 2 seconds).
+ */
+
+#define UNCORE_MAX_COUNTERS4
+#define UNCORE_L3_MAX_TILES16
+#define UNCORE_DMC_MAX_CHANNELS8
+
+#define UNCORE_HRTIMER_INTERVAL(2 * NSEC_PER_SEC)
+#define GET_EVENTID(ev)((ev->hw.config) & 0x1ff)
+#define GET_COUNTERID(ev)  ((ev->hw.idx) & 0xf)
+#define GET_CHANNELID(pmu_uncore)  (pmu_uncore->channel)
+
+#define DMC_COUNTER_CTL0x234
+#define DMC_COUNTER_DATA   0x240
+#define L3C_COUNTER_CTL0xA8
+#define L3C_COUNTER_DATA   0xAC
+
+#define THUNDERX2_SMC_CALL_ID  0xC200FF00
+#define THUNDERX2_SMC_SET_CHANNEL  0xB010
+
+
+enum thunderx2_uncore_l3_events {
+   L3_EVENT_NONE,
+   L3_EVENT_NBU_CANCEL,
+   L3_EVENT_DIB_RETRY,
+   L3_EVENT_DOB_RETRY,
+   L3_EVENT_DIB_CREDIT_RETRY,
+   L3_EVENT_DOB_CREDIT_RETRY,
+   L3_EVENT_FORCE_RETRY,
+   L3_EVENT_IDX_CONFLICT_RETRY,
+   L3_EVENT_EVICT_CONFLICT_RETRY,
+   L3_EVENT_BANK_CONFLICT_RETRY,
+   L3_EVENT_FILL_ENTRY_RETRY,
+   L3_EVENT_EVICT_NOT_READY_RETRY,
+   L3_EVENT_L3_RETRY,
+   L3_EVENT_READ_REQ,
+   L3_EVENT_WRITE_BACK_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_REQ,
+   L3_EVENT_INV_REQ,
+   L3_EVENT_SELF_REQ,
+   L3_EVENT_REQ,
+   L3_EVENT_EVICT_REQ,
+   L3_EVENT_INVALIDATE_NWRITE_HIT,
+   L3_EVENT_INVALIDATE_HIT,
+   L3_EVENT_SELF_HIT,
+   L3_EVENT_READ_HIT,
+   L3_EVENT_MAX,
+};
+
+enum thunderx2_uncore_dmc_events {
+   DMC_EVENT_NONE,
+   DMC_EVENT_COUNT_CYCLES,
+   DMC_EVENT_RES2,
+   DMC_EVENT_RES3,
+   DMC_EVENT_RES4,
+   DMC_EVENT_RES5,
+   DMC_EVENT_RES6,
+   DMC_EVENT_RES7,
+   DMC_EVENT_RES8,
+   DMC_EVENT_READ_64B_TXNS,
+