Re: [PATCH 1/2] devicetree: sound: add binding for WM8974 codec

2015-12-16 Thread Måns Rullgård
Mark Brown <broo...@kernel.org> writes:

> On Wed, Dec 16, 2015 at 01:31:30PM +0000, Måns Rullgård wrote:
>
>> This is the 1/1 you were missing.
>
> I need the patch in a form I can apply.

I assumed your email client had some way of displaying the message I
replied to.  Guess I was wrong.

>> Am I the only one who is annoyed by scripts/get_maintainer.pl not
>> returning all the addresses it should in cases like this?  Is there some
>> trick I'm missing?
>
> You can't blindly rely on get_maintainers, it's prone to both false
> positives (CCing too many people, especially if you enable git matching
> when it often starts spamming people who are just doing global cleanups)
> and false negatives (if you don't enable git matching and it misses
> people who care about a specific driver).

So in short, I'm supposed to magically divine who wants to see what.

-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] ata: sata_dwc_460ex: add phy support

2015-12-16 Thread Måns Rullgård
Sergei Shtylyov <sergei.shtyl...@cogentembedded.com> writes:

> Hello.
>
> On 12/16/2015 2:25 AM, Mans Rullgard wrote:
>
>> This adds support for powering on an optional PHY when activating the
>> device.
>>
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
> [...]
>> diff --git a/drivers/ata/sata_dwc_460ex.c b/drivers/ata/sata_dwc_460ex.c
>> index 9985749..d07aae1 100644
>> --- a/drivers/ata/sata_dwc_460ex.c
>> +++ b/drivers/ata/sata_dwc_460ex.c
> [...]
>> @@ -1318,6 +1320,21 @@ static int sata_dwc_probe(struct platform_device 
>> *ofdev)
>>  }
>>   #endif
>>
>> +hsdev->phy = devm_phy_optional_get(hsdev->dev, "sata-phy");
>> +if (IS_ERR(hsdev->phy)) {
>> +err = PTR_ERR(hsdev->phy);
>> +hsdev->phy = NULL;
>> +goto error_out;
>> +}
>> +
>> +err = phy_init(hsdev->phy);
>> +if (err)
>> +goto error_out;
>
>If phy_init() fails, do we really need to call phy_exit()?

No, but it doesn't hurt either, and it makes the code slightly simpler.
I can change it though.

>> +
>> +err = phy_power_on(hsdev->phy);
>> +if (err)
>> +goto error_out;
>> +
>>  /*
>>   * Now, register with libATA core, this will also initiate the
>>   * device discovery process, invoking our port_start() handler &
>> @@ -1331,6 +1348,7 @@ static int sata_dwc_probe(struct platform_device 
>> *ofdev)
>>  return 0;
>>
>>   error_out:
>> +phy_exit(hsdev->phy);
>>  iounmap(base);
>>  return err;
>>   }
> [...]
>
> MBR, Sergei
>

-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-15 Thread Måns Rullgård
Mans Rullgard  writes:

> Currently this driver only works with a DesignWare DMA engine which it
> registers manually using the second "reg" address range and interrupt
> number from the DT node.
>
> This patch makes the driver instead use the "dmas" property if present,
> otherwise optionally falling back on the old way so existing device
> trees can continue to work.
>
> With this change, there is no longer any reason to depend on the 460EX
> machine type so drop that from Kconfig.
>
> Signed-off-by: Mans Rullgard 
> ---
>  drivers/ata/Kconfig  |  10 ++-
>  drivers/ata/sata_dwc_460ex.c | 192 
> +++
>  2 files changed, 131 insertions(+), 71 deletions(-)

The corresponding patch for the canyonlands devicetree looks something
like this.  I don't have any such hardware or even a manual, so I don't
know what values to use for the various required DT properties of the
DMA controller node, nor can I test it.  The SATA driver works with a
different DMA controller on a Sigma Designs chip.

diff --git a/arch/powerpc/boot/dts/canyonlands.dts 
b/arch/powerpc/boot/dts/canyonlands.dts
index 3dc75de..959f36e 100644
--- a/arch/powerpc/boot/dts/canyonlands.dts
+++ b/arch/powerpc/boot/dts/canyonlands.dts
@@ -190,12 +190,22 @@
 /* DMA */ 0x2  0xc 0x4>;
};
 
+   DMA0: dma@bffd0800 {
+   compatible = "snps,dma-spear1340";
+   reg = <4 0xbffd0800 0x400>;
+   interrupt-parent = <>;
+   interrupts = <0x5 0x4>;
+   #dma-cells = <3>;
+   /* required properties here */
+   };
+
SATA0: sata@bffd1000 {
compatible = "amcc,sata-460ex";
-   reg = <4 0xbffd1000 0x800 4 0xbffd0800 0x400>;
+   reg = <4 0xbffd1000 0x800>;
interrupt-parent = <>;
-   interrupts = <0x0 0x4   /* SATA */
- 0x5 0x4>; /* AHBDMA */
+   interrupts = <0x0 0x4>;
+   dmas = < 0 0 1>;
+   dma-names = "sata-dma";
};
 
POB0: opb {


-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] i2c: xlr: fix extra read/write at end of rx transfer

2015-12-15 Thread Måns Rullgård
Wolfram Sang  writes:

> On Mon, Nov 02, 2015 at 02:03:37AM +, Mans Rullgard wrote:
>> The BYTECNT register holds the transfer size minus one.  Setting it
>> to the correct value requires a dummy read/write only for zero-length
>> transfers as it is impossible to request one from the hardware.  If a
>> zero-length transfer is requested, changing the length to 1 and setting
>> "buf" to a dummy location allows making the main loops less convoluted.
>> 
>> In other words, this patch makes the driver transfer the number of bytes
>> requested unless this is zero, which is not supported by the hardware,
>> in which case one byte is transferred instead.
>
> Uh, this is wrong, zero byte should really not transfer anything. We
> need to fix that and bail out, so probably something like
>
>   if (!len)
>   return -EOPNOTSUPP;

So the existing driver is wrong to allow it.  Makes sense to drop that.

> Also, the xlr_func() should mask out I2C_FUNC_SMBUS_QUICK.

OK.

> Other than that, the patch looks good to me.
>
> Out of curiosity, your first driver had the registers 32bit apart. Now
> you can deal with 8bit. Is this configurable on this SoC?

It's all 32 bits.  The XLR driver uses a u32 * to access the registers.

-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] ata: sata_dwc_460ex: use "dmas" DT property to find dma channel

2015-12-15 Thread Måns Rullgård
Mans Rullgard <m...@mansr.com> writes:

> Currently this driver only works with a DesignWare DMA engine which it
> registers manually using the second "reg" address range and interrupt
> number from the DT node.
>
> This patch makes the driver instead use the "dmas" property if present,
> otherwise optionally falling back on the old way so existing device
> trees can continue to work.
>
> With this change, there is no longer any reason to depend on the 460EX
> machine type so drop that from Kconfig.
>
> Signed-off-by: Mans Rullgard <m...@mansr.com>
> ---
>  drivers/ata/Kconfig  |  10 ++-
>  drivers/ata/sata_dwc_460ex.c | 192 
> +++
>  2 files changed, 131 insertions(+), 71 deletions(-)

The corresponding patch for the canyonlands devicetree looks something
like this.  I don't have any such hardware or even a manual, so I don't
know what values to use for the various required DT properties of the
DMA controller node, nor can I test it.  The SATA driver works with a
different DMA controller on a Sigma Designs chip.

diff --git a/arch/powerpc/boot/dts/canyonlands.dts 
b/arch/powerpc/boot/dts/canyonlands.dts
index 3dc75de..959f36e 100644
--- a/arch/powerpc/boot/dts/canyonlands.dts
+++ b/arch/powerpc/boot/dts/canyonlands.dts
@@ -190,12 +190,22 @@
 /* DMA */ 0x2  0xc 0x4>;
};
 
+   DMA0: dma@bffd0800 {
+   compatible = "snps,dma-spear1340";
+   reg = <4 0xbffd0800 0x400>;
+   interrupt-parent = <>;
+   interrupts = <0x5 0x4>;
+   #dma-cells = <3>;
+   /* required properties here */
+   };
+
SATA0: sata@bffd1000 {
compatible = "amcc,sata-460ex";
-   reg = <4 0xbffd1000 0x800 4 0xbffd0800 0x400>;
+   reg = <4 0xbffd1000 0x800>;
interrupt-parent = <>;
-   interrupts = <0x0 0x4   /* SATA */
- 0x5 0x4>; /* AHBDMA */
+   interrupts = <0x0 0x4>;
+   dmas = < 0 0 1>;
+   dma-names = "sata-dma";
};
 
POB0: opb {


-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] i2c: xlr: fix extra read/write at end of rx transfer

2015-12-15 Thread Måns Rullgård
Wolfram Sang <w...@the-dreams.de> writes:

> On Mon, Nov 02, 2015 at 02:03:37AM +, Mans Rullgard wrote:
>> The BYTECNT register holds the transfer size minus one.  Setting it
>> to the correct value requires a dummy read/write only for zero-length
>> transfers as it is impossible to request one from the hardware.  If a
>> zero-length transfer is requested, changing the length to 1 and setting
>> "buf" to a dummy location allows making the main loops less convoluted.
>> 
>> In other words, this patch makes the driver transfer the number of bytes
>> requested unless this is zero, which is not supported by the hardware,
>> in which case one byte is transferred instead.
>
> Uh, this is wrong, zero byte should really not transfer anything. We
> need to fix that and bail out, so probably something like
>
>   if (!len)
>   return -EOPNOTSUPP;

So the existing driver is wrong to allow it.  Makes sense to drop that.

> Also, the xlr_func() should mask out I2C_FUNC_SMBUS_QUICK.

OK.

> Other than that, the patch looks good to me.
>
> Out of curiosity, your first driver had the registers 32bit apart. Now
> you can deal with 8bit. Is this configurable on this SoC?

It's all 32 bits.  The XLR driver uses a u32 * to access the registers.

-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ethernet: aurora: AURORA_NB8800 should depend on HAS_DMA

2015-12-07 Thread Måns Rullgård
Geert Uytterhoeven  writes:

> If NO_DMA=y:
>
> ERROR: "dma_map_single" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_unmap_page" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_sync_single_for_cpu" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_unmap_single" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_alloc_coherent" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_mapping_error" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_map_page" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_free_coherent" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
>
> Signed-off-by: Geert Uytterhoeven 

Acked-by: Mans Rullgard 

>  drivers/net/ethernet/aurora/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/aurora/Kconfig 
> b/drivers/net/ethernet/aurora/Kconfig
> index a3c7106fdf85e78f..8ba7f8ff3434000f 100644
> --- a/drivers/net/ethernet/aurora/Kconfig
> +++ b/drivers/net/ethernet/aurora/Kconfig
> @@ -13,6 +13,7 @@ if NET_VENDOR_AURORA
>
>  config AURORA_NB8800
>   tristate "Aurora AU-NB8800 support"
> + depends on HAS_DMA
>   select PHYLIB
>   help
>Support for the AU-NB8800 gigabit Ethernet controller.
> -- 
> 1.9.1
>

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ethernet: aurora: AURORA_NB8800 should depend on HAS_DMA

2015-12-07 Thread Måns Rullgård
Geert Uytterhoeven <ge...@linux-m68k.org> writes:

> If NO_DMA=y:
>
> ERROR: "dma_map_single" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_unmap_page" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_sync_single_for_cpu" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_unmap_single" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_alloc_coherent" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_mapping_error" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
> ERROR: "dma_map_page" [drivers/net/ethernet/aurora/nb8800.ko] undefined!
> ERROR: "dma_free_coherent" [drivers/net/ethernet/aurora/nb8800.ko] 
> undefined!
>
> Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>

Acked-by: Mans Rullgard <m...@mansr.com>

>  drivers/net/ethernet/aurora/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/aurora/Kconfig 
> b/drivers/net/ethernet/aurora/Kconfig
> index a3c7106fdf85e78f..8ba7f8ff3434000f 100644
> --- a/drivers/net/ethernet/aurora/Kconfig
> +++ b/drivers/net/ethernet/aurora/Kconfig
> @@ -13,6 +13,7 @@ if NET_VENDOR_AURORA
>
>  config AURORA_NB8800
>   tristate "Aurora AU-NB8800 support"
> + depends on HAS_DMA
>   select PHYLIB
>   help
>Support for the AU-NB8800 gigabit Ethernet controller.
> -- 
> 1.9.1
>

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-26 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Thu, 26 Nov 2015, Måns Rullgård wrote:
>
>> Russell King - ARM Linux  writes:
>> 
>> > On Thu, Nov 26, 2015 at 12:50:08AM +, Måns Rullgård wrote:
>> >> If not calling the function saves an I-cache miss, the benefit can be
>> >> substantial.  No, I have no proof of this being a problem, but it's
>> >> something that could happen.
>> >
>> > That's a simplistic view of modern CPUs.
>> >
>> > As I've already said, modern CPUs which have branch prediction, but
>> > they also have speculative instruction fetching and speculative data
>> > prefetching - which the CPUs which have idiv support will have.
>> >
>> > With such features, the branch predictor is able to learn that the
>> > branch will be taken, and because of the speculative instruction
>> > fetching, it can bring the cache line in so that it has the
>> > instructions it needs with minimal or, if working correctly,
>> > without stalling the CPU pipeline.
>> 
>> It doesn't matter how many fancy features the CPU has.  Executing more
>> branches and using more cache lines puts additional pressure on those
>> resources, reducing overall performance.  Besides, the performance
>> counters readily show that the prediction is nothing near as perfect as
>> you seem to believe.
>
> OK... Let's try to come up with actual numbers.
>
> We know that letting gcc emit idiv by itself is the ultimate solution. 
> And it is free of maintenance on our side besides passing the 
> appropriate argument to gcc of course. So this is worth doing.
>
> For the case where you have a set of target machines in your kernel that 
> may or may not have idiv, then the first step should be to patch 
> __aeabi_uidiv and __aeabi_idiv.  This is a pretty small and simple 
> change that might turn out to be more than good enough. It is necessary 
> anyway as the full patching solution does not cover all cases.
>
> Then, IMHO, it would be a good idea to get performance numbers to 
> compare that first step and the full patching solution. Of course the 
> full patching will yield better performance. It has to. But if the 
> difference is not significant enough, then it might not be worth 
> introducing the implied complexity into mainline.  And it is not because 
> the approach is bad. In fact I think this is a very cool hack. But it 
> comes with a cost in maintenance and that cost has to be justified.
>
> Just to have an idea, I produced the attached micro benchmark. I tested 
> on a TC2 forced to a single Cortex-A15 core and I got those results:
>
> Testing INLINE_DIV ...
>
> real0m7.182s
> user0m7.170s
> sys 0m0.000s
>
> Testing PATCHED_DIV ...
>
> real0m7.181s
> user0m7.170s
> sys 0m0.000s
>
> Testing OUTOFLINE_DIV ...
>
> real0m7.181s
> user0m7.170s
> sys 0m0.005s
>
> Testing LIBGCC_DIV ...
>
> real0m18.659s
> user0m18.635s
> sys 0m0.000s
>
> As you can see, whether the div is inline or out-of-line, whether 
> arguments are moved into r0-r1 or not, makes no difference at all on a 
> Cortex-A15.
>
> Now forcing it onto a Cortex-A7 core:
>
> Testing INLINE_DIV ...
>
> real0m8.917s
> user0m8.895s
> sys 0m0.005s
>
> Testing PATCHED_DIV ...
>
> real0m11.666s
> user0m11.645s
> sys 0m0.000s
>
> Testing OUTOFLINE_DIV ...
>
> real0m13.065s
> user0m13.025s
> sys 0m0.000s
>
> Testing LIBGCC_DIV ...
>
> real    0m51.815s
> user0m51.750s
> sys 0m0.005s
>
> So on A cortex-A7 the various overheads become visible. How significant 
> is it in practice with normal kernel usage? I don't know.

Bear in mind that in a trivial test like this, everything fits in L1
caches and branch prediction works perfectly.  It would be more
informative to measure the effect on a load that already has some cache
and branch prediction misses.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-26 Thread Måns Rullgård
Mason  writes:

> On 25/11/2015 13:12, Måns Rullgård wrote:
>
>> Mason writes:
>> 
>>>> +  status_lo = intc_readl(chip, chip->ctl + IRQ_STATUS);
>>>> +  status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>>>
>>> In my local branch, I wrote:
>>>
>>> #define IRQ_CTL_LO  0
>>>
>>> status_lo = intc_readl(chip, chip->ctl + IRQ_CTL_LO + IRQ_STATUS);
>>> status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>>>
>>> (I'm a sucker for symmetry)
>> 
>> Nothing wrong with a little symmetry, though in this case I think the
>> extra macro only confuses matters.
>
> It's your call :-)
>
> In my mind, the fact that the status_lo register sits at offset 0 is
> just an accident. It's just that something has to sit at offset 0.
> (Maybe I should tell the HW guys to put nothing at offset 0, and start
> the actual register block at offset 4. /That/ would be unexpected.)
>
> Another way to look at it is:
>
> There are two 4-register blocks (LO and HI) each containing registers
> {status,rawstat,enableset,enableclr}.
>
> Block LO starts at offset 0x0
> Block HI starts at offset 0x18
>
> and then there are the intra offsets for the 4 registers in the block.

When I wrote it, I was thinking of IRQ_CTL_HI as the offset to add to a
low register to get the corresponding high one.  I think that's what you
said there.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-26 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> On 25/11/2015 13:12, Måns Rullgård wrote:
>
>> Mason writes:
>> 
>>>> +  status_lo = intc_readl(chip, chip->ctl + IRQ_STATUS);
>>>> +  status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>>>
>>> In my local branch, I wrote:
>>>
>>> #define IRQ_CTL_LO  0
>>>
>>> status_lo = intc_readl(chip, chip->ctl + IRQ_CTL_LO + IRQ_STATUS);
>>> status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>>>
>>> (I'm a sucker for symmetry)
>> 
>> Nothing wrong with a little symmetry, though in this case I think the
>> extra macro only confuses matters.
>
> It's your call :-)
>
> In my mind, the fact that the status_lo register sits at offset 0 is
> just an accident. It's just that something has to sit at offset 0.
> (Maybe I should tell the HW guys to put nothing at offset 0, and start
> the actual register block at offset 4. /That/ would be unexpected.)
>
> Another way to look at it is:
>
> There are two 4-register blocks (LO and HI) each containing registers
> {status,rawstat,enableset,enableclr}.
>
> Block LO starts at offset 0x0
> Block HI starts at offset 0x18
>
> and then there are the intra offsets for the 4 registers in the block.

When I wrote it, I was thinking of IRQ_CTL_HI as the offset to add to a
low register to get the corresponding high one.  I think that's what you
said there.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-26 Thread Måns Rullgård
Nicolas Pitre <n...@fluxnic.net> writes:

> On Thu, 26 Nov 2015, Måns Rullgård wrote:
>
>> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>> 
>> > On Thu, Nov 26, 2015 at 12:50:08AM +, Måns Rullgård wrote:
>> >> If not calling the function saves an I-cache miss, the benefit can be
>> >> substantial.  No, I have no proof of this being a problem, but it's
>> >> something that could happen.
>> >
>> > That's a simplistic view of modern CPUs.
>> >
>> > As I've already said, modern CPUs which have branch prediction, but
>> > they also have speculative instruction fetching and speculative data
>> > prefetching - which the CPUs which have idiv support will have.
>> >
>> > With such features, the branch predictor is able to learn that the
>> > branch will be taken, and because of the speculative instruction
>> > fetching, it can bring the cache line in so that it has the
>> > instructions it needs with minimal or, if working correctly,
>> > without stalling the CPU pipeline.
>> 
>> It doesn't matter how many fancy features the CPU has.  Executing more
>> branches and using more cache lines puts additional pressure on those
>> resources, reducing overall performance.  Besides, the performance
>> counters readily show that the prediction is nothing near as perfect as
>> you seem to believe.
>
> OK... Let's try to come up with actual numbers.
>
> We know that letting gcc emit idiv by itself is the ultimate solution. 
> And it is free of maintenance on our side besides passing the 
> appropriate argument to gcc of course. So this is worth doing.
>
> For the case where you have a set of target machines in your kernel that 
> may or may not have idiv, then the first step should be to patch 
> __aeabi_uidiv and __aeabi_idiv.  This is a pretty small and simple 
> change that might turn out to be more than good enough. It is necessary 
> anyway as the full patching solution does not cover all cases.
>
> Then, IMHO, it would be a good idea to get performance numbers to 
> compare that first step and the full patching solution. Of course the 
> full patching will yield better performance. It has to. But if the 
> difference is not significant enough, then it might not be worth 
> introducing the implied complexity into mainline.  And it is not because 
> the approach is bad. In fact I think this is a very cool hack. But it 
> comes with a cost in maintenance and that cost has to be justified.
>
> Just to have an idea, I produced the attached micro benchmark. I tested 
> on a TC2 forced to a single Cortex-A15 core and I got those results:
>
> Testing INLINE_DIV ...
>
> real0m7.182s
> user0m7.170s
> sys 0m0.000s
>
> Testing PATCHED_DIV ...
>
> real0m7.181s
> user0m7.170s
> sys 0m0.000s
>
> Testing OUTOFLINE_DIV ...
>
> real0m7.181s
> user0m7.170s
> sys 0m0.005s
>
> Testing LIBGCC_DIV ...
>
> real0m18.659s
> user0m18.635s
> sys 0m0.000s
>
> As you can see, whether the div is inline or out-of-line, whether 
> arguments are moved into r0-r1 or not, makes no difference at all on a 
> Cortex-A15.
>
> Now forcing it onto a Cortex-A7 core:
>
> Testing INLINE_DIV ...
>
> real0m8.917s
> user0m8.895s
> sys 0m0.005s
>
> Testing PATCHED_DIV ...
>
> real0m11.666s
> user0m11.645s
> sys 0m0.000s
>
> Testing OUTOFLINE_DIV ...
>
> real0m13.065s
> user0m13.025s
> sys 0m0.000s
>
> Testing LIBGCC_DIV ...
>
> real0m51.815s
> user0m51.750s
> sys 0m0.005s
>
> So on A cortex-A7 the various overheads become visible. How significant 
> is it in practice with normal kernel usage? I don't know.

Bear in mind that in a trivial test like this, everything fits in L1
caches and branch prediction works perfectly.  It would be more
informative to measure the effect on a load that already has some cache
and branch prediction misses.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Thu, Nov 26, 2015 at 12:50:08AM +0000, Måns Rullgård wrote:
>> If not calling the function saves an I-cache miss, the benefit can be
>> substantial.  No, I have no proof of this being a problem, but it's
>> something that could happen.
>
> That's a simplistic view of modern CPUs.
>
> As I've already said, modern CPUs which have branch prediction, but
> they also have speculative instruction fetching and speculative data
> prefetching - which the CPUs which have idiv support will have.
>
> With such features, the branch predictor is able to learn that the
> branch will be taken, and because of the speculative instruction
> fetching, it can bring the cache line in so that it has the
> instructions it needs with minimal or, if working correctly,
> without stalling the CPU pipeline.

It doesn't matter how many fancy features the CPU has.  Executing more
branches and using more cache lines puts additional pressure on those
resources, reducing overall performance.  Besides, the performance
counters readily show that the prediction is nothing near as perfect as
you seem to believe.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Thu, 26 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre  writes:
>> 
>> > 3) In fact I was wondering if the overhead of the branch and back is 
>> >really significant compared to the non trivial cost of a idiv 
>> >instruction and all the complex infrastructure required to patch 
>> >those branches directly, and consequently if the performance 
>> >difference is actually worth it versus simply doing (2) alone.
>> 
>> Depending on the operands, the div instruction can take as few as 3
>> cycles on a Cortex-A7.
>
> Even the current software based implementation can produce a result with 
> about 5 simple ALU instructions depending on the operands.
>
> The average cycle count is more important than the easy-way-out case. 
> And then how significant the two branches around it are compared to idiv 
> alone from direct patching of every call to it.

If not calling the function saves an I-cache miss, the benefit can be
substantial.  No, I have no proof of this being a problem, but it's
something that could happen.

Of course, none of this is going to be as good as letting the compiler
generate div instructions directly.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Wed, 25 Nov 2015, Stephen Boyd wrote:
>
>> The ARM compiler inserts calls to __aeabi_uidiv() and
>> __aeabi_idiv() when it needs to perform division on signed and
>> unsigned integers. If a processor has support for the udiv and
>> sdiv division instructions the calls to these support routines
>> can be replaced with those instructions. Now that recordmcount
>> records the locations of calls to these library functions in
>> two sections (one for udiv and one for sdiv), iterate over these
>> sections early at boot and patch the call sites with the
>> appropriate division instruction when we determine that the
>> processor supports the division instructions. Using the division
>> instructions should be faster and less power intensive than
>> running the support code.
>
> A few remarks:
>
> 1) The code assumes unconditional branches to __aeabi_idiv and 
>__aeabi_uidiv. What if there are conditional branches? Also, tail 
>call optimizations will generate a straight b opcode rather than a bl 
>and patching those will obviously have catastrophic results.  I think 
>you should validate the presence of a bl before patching over it.

I did a quick check on a compiled kernel I had nearby, and there are no
conditional or tail calls to those functions, so although they should
obviously be checked for correctness, performance is unlikely to matter
for those.

However, there are almost half as many calls to __aeabi_{u}idivmod as to
the plain div functions, 129 vs 228 for signed and unsigned combined.
For best results, these functions should also be patched with the
hardware instructions.  Obviously the call sites for these can't be
patched.

> 2) For those cases where a call to __aeabi_uidiv and __aeabi_idiv is not 
>patched due to (1), you could patch a uidiv/idiv plus "bx lr" 
>at those function entry points too.
>
> 3) In fact I was wondering if the overhead of the branch and back is 
>really significant compared to the non trivial cost of a idiv 
>instruction and all the complex infrastructure required to patch 
>those branches directly, and consequently if the performance 
>difference is actually worth it versus simply doing (2) alone.

Depending on the operands, the div instruction can take as few as 3
cycles on a Cortex-A7.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-25 Thread Måns Rullgård
Mason  writes:

>> +status_lo = intc_readl(chip, chip->ctl + IRQ_STATUS);
>> +status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>
> In my local branch, I wrote:
>
> #define IRQ_CTL_LO0
>
>   status_lo = intc_readl(chip, chip->ctl + IRQ_CTL_LO + IRQ_STATUS);
>   status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>
> (I'm a sucker for symmetry)

Nothing wrong with a little symmetry, though in this case I think the
extra macro only confuses matters.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-25 Thread Måns Rullgård
Mason  writes:

> [ Trimming CC list ]
>
> On 19/11/2015 19:33, Mans Rullgard wrote:
>
>> +config TANGOX_IRQ
>
> Could you drop the X?

Sure, if that's what people prefer.

> (And perhaps change IRQ to IRQCHIP? What's the current trend?)

Most of the existing entries say just IRQ or something completely ad
hoc, only 4 IRQCHIP.

>> +bool
>> +select IRQ_DOMAIN
>> +select GENERIC_IRQ_CHIP
>
> Could you sort alphabetically, like the mach Kconfig?

Tell that to all the other ones.  Oh well, whatever.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Nicolas Pitre <n...@fluxnic.net> writes:

> On Thu, 26 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre <n...@fluxnic.net> writes:
>> 
>> > 3) In fact I was wondering if the overhead of the branch and back is 
>> >really significant compared to the non trivial cost of a idiv 
>> >instruction and all the complex infrastructure required to patch 
>> >those branches directly, and consequently if the performance 
>> >difference is actually worth it versus simply doing (2) alone.
>> 
>> Depending on the operands, the div instruction can take as few as 3
>> cycles on a Cortex-A7.
>
> Even the current software based implementation can produce a result with 
> about 5 simple ALU instructions depending on the operands.
>
> The average cycle count is more important than the easy-way-out case. 
> And then how significant the two branches around it are compared to idiv 
> alone from direct patching of every call to it.

If not calling the function saves an I-cache miss, the benefit can be
substantial.  No, I have no proof of this being a problem, but it's
something that could happen.

Of course, none of this is going to be as good as letting the compiler
generate div instructions directly.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Nicolas Pitre <n...@fluxnic.net> writes:

> On Wed, 25 Nov 2015, Stephen Boyd wrote:
>
>> The ARM compiler inserts calls to __aeabi_uidiv() and
>> __aeabi_idiv() when it needs to perform division on signed and
>> unsigned integers. If a processor has support for the udiv and
>> sdiv division instructions the calls to these support routines
>> can be replaced with those instructions. Now that recordmcount
>> records the locations of calls to these library functions in
>> two sections (one for udiv and one for sdiv), iterate over these
>> sections early at boot and patch the call sites with the
>> appropriate division instruction when we determine that the
>> processor supports the division instructions. Using the division
>> instructions should be faster and less power intensive than
>> running the support code.
>
> A few remarks:
>
> 1) The code assumes unconditional branches to __aeabi_idiv and 
>__aeabi_uidiv. What if there are conditional branches? Also, tail 
>call optimizations will generate a straight b opcode rather than a bl 
>and patching those will obviously have catastrophic results.  I think 
>you should validate the presence of a bl before patching over it.

I did a quick check on a compiled kernel I had nearby, and there are no
conditional or tail calls to those functions, so although they should
obviously be checked for correctness, performance is unlikely to matter
for those.

However, there are almost half as many calls to __aeabi_{u}idivmod as to
the plain div functions, 129 vs 228 for signed and unsigned combined.
For best results, these functions should also be patched with the
hardware instructions.  Obviously the call sites for these can't be
patched.

> 2) For those cases where a call to __aeabi_uidiv and __aeabi_idiv is not 
>patched due to (1), you could patch a uidiv/idiv plus "bx lr" 
>at those function entry points too.
>
> 3) In fact I was wondering if the overhead of the branch and back is 
>really significant compared to the non trivial cost of a idiv 
>instruction and all the complex infrastructure required to patch 
>those branches directly, and consequently if the performance 
>difference is actually worth it versus simply doing (2) alone.

Depending on the operands, the div instruction can take as few as 3
cycles on a Cortex-A7.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-25 Thread Måns Rullgård
Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Thu, Nov 26, 2015 at 12:50:08AM +0000, Måns Rullgård wrote:
>> If not calling the function saves an I-cache miss, the benefit can be
>> substantial.  No, I have no proof of this being a problem, but it's
>> something that could happen.
>
> That's a simplistic view of modern CPUs.
>
> As I've already said, modern CPUs which have branch prediction, but
> they also have speculative instruction fetching and speculative data
> prefetching - which the CPUs which have idiv support will have.
>
> With such features, the branch predictor is able to learn that the
> branch will be taken, and because of the speculative instruction
> fetching, it can bring the cache line in so that it has the
> instructions it needs with minimal or, if working correctly,
> without stalling the CPU pipeline.

It doesn't matter how many fancy features the CPU has.  Executing more
branches and using more cache lines puts additional pressure on those
resources, reducing overall performance.  Besides, the performance
counters readily show that the prediction is nothing near as perfect as
you seem to believe.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-25 Thread Måns Rullgård
Mason <slash@free.fr> writes:

>> +status_lo = intc_readl(chip, chip->ctl + IRQ_STATUS);
>> +status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>
> In my local branch, I wrote:
>
> #define IRQ_CTL_LO0
>
>   status_lo = intc_readl(chip, chip->ctl + IRQ_CTL_LO + IRQ_STATUS);
>   status_hi = intc_readl(chip, chip->ctl + IRQ_CTL_HI + IRQ_STATUS);
>
> (I'm a sucker for symmetry)

Nothing wrong with a little symmetry, though in this case I think the
extra macro only confuses matters.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-25 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> [ Trimming CC list ]
>
> On 19/11/2015 19:33, Mans Rullgard wrote:
>
>> +config TANGOX_IRQ
>
> Could you drop the X?

Sure, if that's what people prefer.

> (And perhaps change IRQ to IRQCHIP? What's the current trend?)

Most of the existing entries say just IRQ or something completely ad
hoc, only 4 IRQCHIP.

>> +bool
>> +select IRQ_DOMAIN
>> +select GENERIC_IRQ_CHIP
>
> Could you sort alphabetically, like the mach Kconfig?

Tell that to all the other ones.  Oh well, whatever.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Tue, Nov 24, 2015 at 12:29:06PM +0000, Måns Rullgård wrote:
>> Russell King - ARM Linux  writes:
>> 
>> > On Tue, Nov 24, 2015 at 12:10:02PM +, Måns Rullgård wrote:
>> >> Russell King - ARM Linux  writes:
>> >> 
>> >> > On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> >> >> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> >> >> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> >> >> that do understand -mcpu=cortex-a15.
>> >> >
>> >> > That's not all.  The bigger problem is that there are toolchains out
>> >> > there which accept these options, but do _not_ support IDIV in either
>> >> > ARM or Thumb mode.  I'm afraid that makes it impossible to add this
>> >> > feature to the mainline kernel in this form: we need to run a test
>> >> > build to check that -march=armv7ve or what-not really does work
>> >> > through both GCC and GAS.
>> >> 
>> >> If the compiler accepts the option but doesn't actually emit any div
>> >> instructions, is there any real harm?
>> >
>> > That's not what I've found.  I've found that asking the assembler
>> > to accept idiv instructions appears to be ignored, which is something
>> > completely different.
>> >
>> > Further to this, what it comes down to is the stupid idea that the
>> > compiler should embed .arch / .cpu in the assembly output, which has
>> > the effect of overriding the command line arguments given to it via
>> > -Wa.  So, giving -Wa,-mcpu=cortex-a15 is a total no-op, because the
>> > first thing in the assembly output is:
>> >
>> >.arch armv7-a
>> >
>> > which kills off any attempt to set the assembly level ISA from the
>> > command line.
>> 
>> Oh, you mean the compiler knows about the instructions but the assembler
>> doesn't or isn't passed the right options.  It's infuriating when that
>> happens.
>
> No, that isn't what I meant.

Then what do you mean?  The compiler either emits the instructions or it
doesn't.  If it doesn't, what the assembler accepts is irrelevant.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Tue, Nov 24, 2015 at 12:10:02PM +0000, Måns Rullgård wrote:
>> Russell King - ARM Linux  writes:
>> 
>> > On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> >> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> >> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> >> that do understand -mcpu=cortex-a15.
>> >
>> > That's not all.  The bigger problem is that there are toolchains out
>> > there which accept these options, but do _not_ support IDIV in either
>> > ARM or Thumb mode.  I'm afraid that makes it impossible to add this
>> > feature to the mainline kernel in this form: we need to run a test
>> > build to check that -march=armv7ve or what-not really does work
>> > through both GCC and GAS.
>> 
>> If the compiler accepts the option but doesn't actually emit any div
>> instructions, is there any real harm?
>
> That's not what I've found.  I've found that asking the assembler
> to accept idiv instructions appears to be ignored, which is something
> completely different.
>
> Further to this, what it comes down to is the stupid idea that the
> compiler should embed .arch / .cpu in the assembly output, which has
> the effect of overriding the command line arguments given to it via
> -Wa.  So, giving -Wa,-mcpu=cortex-a15 is a total no-op, because the
> first thing in the assembly output is:
>
>   .arch armv7-a
>
> which kills off any attempt to set the assembly level ISA from the
> command line.

Oh, you mean the compiler knows about the instructions but the assembler
doesn't or isn't passed the right options.  It's infuriating when that
happens.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Arnd Bergmann  writes:

> On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
>> On 11/23, Arnd Bergmann wrote:
>> > On Monday 23 November 2015 13:32:06 Stephen Boyd wrote:
>> > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> > index b251013eef0a..bad6343c34d5 100644
>> > --- a/drivers/clocksource/Kconfig
>> > +++ b/drivers/clocksource/Kconfig
>> > @@ -324,8 +324,9 @@ config EM_TIMER_STI
>> >  such as EMEV2 from former NEC Electronics.
>> >  
>> >  config CLKSRC_QCOM
>> > -  bool "Qualcomm MSM timer" if COMPILE_TEST
>> > +  bool "Qualcomm MSM timer" if ARCH_QCOM || COMPILE_TEST
>> >depends on ARM
>> > +  default ARCH_QCOM
>> >select CLKSRC_OF
>> >help
>> >  This enables the clocksource and the per CPU clockevent driver for the
>> > 
>> > would make both of them equally configurable and not clutter up
>> > the Kconfig file when ARCH_QCOM is not selected. I've added
>> > Daniel Lezcano to Cc, he probably has an opinion on this too.
>> 
>> Yeah I think that architected timers are an outlier. I recall
>> some words from John Stultz that platforms should select the
>> clocksources they use, but maybe things have changed. For this
>> kind of thing I wouldn't mind putting it in the defconfig though.
>> I'll put the patches on the list to get the discussion started.
>
> Ok, thanks!
>
>> > This works perfectly for Cortex-A5, -A8, -A9, -A12, -A15, -A17, Brahma-B15,
>> > PJ4B-MP, Scorpion and Krait-450, which all clearly fall into one of
>> > the two other categories.
>> > 
>> > The two exceptions that don't quite fit are still "good enough":
>> > 
>> > - PJ4/PJ4B (not PJ4B-MP) has a different custom opcode for udiv and sdiv
>> >   in ARM mode. We don't support that with true multiplatform kernels
>> >   because those opcodes work nowhere else, though with your proposed
>> >   series we could easily do that for dynamic patching.
>> 
>> Do you have the information on these custom opcodes? I can work
>> that into the patches assuming the MIDR is different.
>
> Thomas Petazzoni said this in a private mail:
>
> | According to the datasheet, the PJ4B has integer signed and unsigned
> | divide, similar to the sdiv and udiv ARM instructions. But the way to
> | access it is by doing a MRC instruction.
> |
> |MRC p6, 1, Rd , CRn , CRm, 4
> |
> |for PJ4B is the same as:
> |
> |SDIV Rd , Rn, Rm
> |
> | on ARM cores.
> |
> |And:
> |
> |MRC p6, 1, Rd , CRn , CRm, 0
> |
> |for PJ4B is the same as:
> |
> |UDIV Rd , Rn, Rm
> |
> |on ARM cores.
> |
> |This is documented in the "Extended instructions" section of the
> |PJ4B datasheet.
>
> I assume what he meant was that this is true for both PJ4 and PJ4B
> but not for PJ4B-MP, which has the normal udiv/sdiv instructions.
>
> IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> while part 0x584 can use the sdiv/udiv that it reports correctly.

Or we could simply ignore those and they'd be no worse off than they are
now.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux  writes:

> On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> that do understand -mcpu=cortex-a15.
>
> That's not all.  The bigger problem is that there are toolchains out
> there which accept these options, but do _not_ support IDIV in either
> ARM or Thumb mode.  I'm afraid that makes it impossible to add this
> feature to the mainline kernel in this form: we need to run a test
> build to check that -march=armv7ve or what-not really does work
> through both GCC and GAS.

If the compiler accepts the option but doesn't actually emit any div
instructions, is there any real harm?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> that do understand -mcpu=cortex-a15.
>
> That's not all.  The bigger problem is that there are toolchains out
> there which accept these options, but do _not_ support IDIV in either
> ARM or Thumb mode.  I'm afraid that makes it impossible to add this
> feature to the mainline kernel in this form: we need to run a test
> build to check that -march=armv7ve or what-not really does work
> through both GCC and GAS.

If the compiler accepts the option but doesn't actually emit any div
instructions, is there any real harm?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Tue, Nov 24, 2015 at 12:10:02PM +0000, Måns Rullgård wrote:
>> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>> 
>> > On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> >> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> >> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> >> that do understand -mcpu=cortex-a15.
>> >
>> > That's not all.  The bigger problem is that there are toolchains out
>> > there which accept these options, but do _not_ support IDIV in either
>> > ARM or Thumb mode.  I'm afraid that makes it impossible to add this
>> > feature to the mainline kernel in this form: we need to run a test
>> > build to check that -march=armv7ve or what-not really does work
>> > through both GCC and GAS.
>> 
>> If the compiler accepts the option but doesn't actually emit any div
>> instructions, is there any real harm?
>
> That's not what I've found.  I've found that asking the assembler
> to accept idiv instructions appears to be ignored, which is something
> completely different.
>
> Further to this, what it comes down to is the stupid idea that the
> compiler should embed .arch / .cpu in the assembly output, which has
> the effect of overriding the command line arguments given to it via
> -Wa.  So, giving -Wa,-mcpu=cortex-a15 is a total no-op, because the
> first thing in the assembly output is:
>
>   .arch armv7-a
>
> which kills off any attempt to set the assembly level ISA from the
> command line.

Oh, you mean the compiler knows about the instructions but the assembler
doesn't or isn't passed the right options.  It's infuriating when that
happens.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Arnd Bergmann <a...@arndb.de> writes:

> On Monday 23 November 2015 15:13:52 Stephen Boyd wrote:
>> On 11/23, Arnd Bergmann wrote:
>> > On Monday 23 November 2015 13:32:06 Stephen Boyd wrote:
>> > diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> > index b251013eef0a..bad6343c34d5 100644
>> > --- a/drivers/clocksource/Kconfig
>> > +++ b/drivers/clocksource/Kconfig
>> > @@ -324,8 +324,9 @@ config EM_TIMER_STI
>> >  such as EMEV2 from former NEC Electronics.
>> >  
>> >  config CLKSRC_QCOM
>> > -  bool "Qualcomm MSM timer" if COMPILE_TEST
>> > +  bool "Qualcomm MSM timer" if ARCH_QCOM || COMPILE_TEST
>> >depends on ARM
>> > +  default ARCH_QCOM
>> >select CLKSRC_OF
>> >help
>> >  This enables the clocksource and the per CPU clockevent driver for the
>> > 
>> > would make both of them equally configurable and not clutter up
>> > the Kconfig file when ARCH_QCOM is not selected. I've added
>> > Daniel Lezcano to Cc, he probably has an opinion on this too.
>> 
>> Yeah I think that architected timers are an outlier. I recall
>> some words from John Stultz that platforms should select the
>> clocksources they use, but maybe things have changed. For this
>> kind of thing I wouldn't mind putting it in the defconfig though.
>> I'll put the patches on the list to get the discussion started.
>
> Ok, thanks!
>
>> > This works perfectly for Cortex-A5, -A8, -A9, -A12, -A15, -A17, Brahma-B15,
>> > PJ4B-MP, Scorpion and Krait-450, which all clearly fall into one of
>> > the two other categories.
>> > 
>> > The two exceptions that don't quite fit are still "good enough":
>> > 
>> > - PJ4/PJ4B (not PJ4B-MP) has a different custom opcode for udiv and sdiv
>> >   in ARM mode. We don't support that with true multiplatform kernels
>> >   because those opcodes work nowhere else, though with your proposed
>> >   series we could easily do that for dynamic patching.
>> 
>> Do you have the information on these custom opcodes? I can work
>> that into the patches assuming the MIDR is different.
>
> Thomas Petazzoni said this in a private mail:
>
> | According to the datasheet, the PJ4B has integer signed and unsigned
> | divide, similar to the sdiv and udiv ARM instructions. But the way to
> | access it is by doing a MRC instruction.
> |
> |MRC p6, 1, Rd , CRn , CRm, 4
> |
> |for PJ4B is the same as:
> |
> |SDIV Rd , Rn, Rm
> |
> | on ARM cores.
> |
> |And:
> |
> |MRC p6, 1, Rd , CRn , CRm, 0
> |
> |for PJ4B is the same as:
> |
> |UDIV Rd , Rn, Rm
> |
> |on ARM cores.
> |
> |This is documented in the "Extended instructions" section of the
> |PJ4B datasheet.
>
> I assume what he meant was that this is true for both PJ4 and PJ4B
> but not for PJ4B-MP, which has the normal udiv/sdiv instructions.
>
> IOW, anything with CPU implementer 0x56 part 0x581 should use those,
> while part 0x584 can use the sdiv/udiv that it reports correctly.

Or we could simply ignore those and they'd be no worse off than they are
now.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-24 Thread Måns Rullgård
Russell King - ARM Linux <li...@arm.linux.org.uk> writes:

> On Tue, Nov 24, 2015 at 12:29:06PM +0000, Måns Rullgård wrote:
>> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>> 
>> > On Tue, Nov 24, 2015 at 12:10:02PM +, Måns Rullgård wrote:
>> >> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
>> >> 
>> >> > On Tue, Nov 24, 2015 at 11:38:53AM +0100, Arnd Bergmann wrote:
>> >> >> I suggested using -mcpu=cortex-a15 because there are old gcc versions
>> >> >> that don't know about -march=armv7ve or -march=armv7-a+idiv yet, but
>> >> >> that do understand -mcpu=cortex-a15.
>> >> >
>> >> > That's not all.  The bigger problem is that there are toolchains out
>> >> > there which accept these options, but do _not_ support IDIV in either
>> >> > ARM or Thumb mode.  I'm afraid that makes it impossible to add this
>> >> > feature to the mainline kernel in this form: we need to run a test
>> >> > build to check that -march=armv7ve or what-not really does work
>> >> > through both GCC and GAS.
>> >> 
>> >> If the compiler accepts the option but doesn't actually emit any div
>> >> instructions, is there any real harm?
>> >
>> > That's not what I've found.  I've found that asking the assembler
>> > to accept idiv instructions appears to be ignored, which is something
>> > completely different.
>> >
>> > Further to this, what it comes down to is the stupid idea that the
>> > compiler should embed .arch / .cpu in the assembly output, which has
>> > the effect of overriding the command line arguments given to it via
>> > -Wa.  So, giving -Wa,-mcpu=cortex-a15 is a total no-op, because the
>> > first thing in the assembly output is:
>> >
>> >.arch armv7-a
>> >
>> > which kills off any attempt to set the assembly level ISA from the
>> > command line.
>> 
>> Oh, you mean the compiler knows about the instructions but the assembler
>> doesn't or isn't passed the right options.  It's infuriating when that
>> happens.
>
> No, that isn't what I meant.

Then what do you mean?  The compiler either emits the instructions or it
doesn't.  If it doesn't, what the assembler accepts is irrelevant.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 3/3] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-23 Thread Måns Rullgård
Stephen Boyd  writes:

> On 11/21, Måns Rullgård wrote:
>> Stephen Boyd  writes:
>> 
>> > +static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym 
>> > *sym)
>> > +{
>> > +  extern char __aeabi_uidiv[], __aeabi_idiv[];
>> > +  unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
>> > +  unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
>> > +  unsigned int udiv_insn, sdiv_insn, mask;
>> > +
>> > +  if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
>> > +  mask = HWCAP_IDIVT;
>> > +  udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
>> > +  sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
>> > +  } else {
>> > +  mask = HWCAP_IDIVA;
>> > +  udiv_insn = __opcode_to_mem_arm(0xe730f110);
>> > +  sdiv_insn = __opcode_to_mem_arm(0xe710f110);
>> > +  }
>> > +
>> > +  if (elf_hwcap & mask) {
>> > +  if (sym->st_value == udiv_addr) {
>> > +  *(u32 *)loc = udiv_insn;
>> > +  return 1;
>> > +  } else if (sym->st_value == sdiv_addr) {
>> > +  *(u32 *)loc = sdiv_insn;
>> > +  return 1;
>> > +  }
>> > +  }
>> > +
>> > +  return 0;
>> > +}
>> 
>> [...]
>> 
>> > +static void __init patch_aeabi_uidiv(void)
>> > +{
>> > +  extern unsigned long *__start_udiv_loc[], *__stop_udiv_loc[];
>> > +  extern unsigned long *__start_idiv_loc[], *__stop_idiv_loc[];
>> > +  unsigned long **p;
>> > +  unsigned int udiv_insn, sdiv_insn, mask;
>> > +
>> > +  if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
>> > +  mask = HWCAP_IDIVT;
>> > +  udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
>> > +  sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
>> > +  } else {
>> > +  mask = HWCAP_IDIVA;
>> > +  udiv_insn = __opcode_to_mem_arm(0xe730f110);
>> > +  sdiv_insn = __opcode_to_mem_arm(0xe710f110);
>> > +  }
>> > +
>> > +  if (elf_hwcap & mask) {
>> > +  for (p = __start_udiv_loc; p < __stop_udiv_loc; p++) {
>> > +  unsigned long *inst = *p;
>> > +  *inst = udiv_insn;
>> > +  }
>> > +  for (p = __start_idiv_loc; p < __stop_idiv_loc; p++) {
>> > +  unsigned long *inst = *p;
>> > +  *inst = sdiv_insn;
>> > +  }
>> > +  }
>> > +}
>> 
>> These functions are rather similar.  Perhaps they could be combined
>> somehow.
>> 
>
> Yes. I have this patch on top, just haven't folded it in because
> it doesn't reduce the lines of code.

I don't see any reason to split it anyhow.  The end result isn't any
harder to understand than the intermediate.

> 8<
> From: Stephen Boyd 
> Subject: [PATCH] consolidate with module code
>
> Signed-off-by: Stephen Boyd 
> ---
>  arch/arm/include/asm/setup.h |  3 +++
>  arch/arm/kernel/module.c | 16 +
>  arch/arm/kernel/setup.c  | 54 
> +++-
>  3 files changed, 42 insertions(+), 31 deletions(-)
>
> diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
> index e0adb9f1bf94..3f251cdb94ef 100644
> --- a/arch/arm/include/asm/setup.h
> +++ b/arch/arm/include/asm/setup.h
> @@ -25,4 +25,7 @@ extern int arm_add_memory(u64 start, u64 size);
>  extern void early_print(const char *str, ...);
>  extern void dump_machine_table(void);
>
> +extern void patch_uidiv(void *addr, size_t size);
> +extern void patch_idiv(void *addr, size_t size);

Why not call things sdiv and udiv like the actual instructions?

>  #endif
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 064e6ae60e08..684a68f1085b 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -22,6 +22,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,24 +59,19 @@ static int module_patch_aeabi_uidiv(unsigned long loc, 
> const Elf32_Sym *sym)
>   extern char __aeabi_uidiv[], __aeabi_idiv[];
>   unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
>   unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
> - unsigned int udiv_insn, sdiv_insn, mask;
> + unsigned int mask;
>
> - if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> + if (IS_ENABLED(CONFIG_THUMB2

Re: [RFC/PATCH 3/3] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-23 Thread Måns Rullgård
Stephen Boyd <sb...@codeaurora.org> writes:

> On 11/21, Måns Rullgård wrote:
>> Stephen Boyd <sb...@codeaurora.org> writes:
>> 
>> > +static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym 
>> > *sym)
>> > +{
>> > +  extern char __aeabi_uidiv[], __aeabi_idiv[];
>> > +  unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
>> > +  unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
>> > +  unsigned int udiv_insn, sdiv_insn, mask;
>> > +
>> > +  if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
>> > +  mask = HWCAP_IDIVT;
>> > +  udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
>> > +  sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
>> > +  } else {
>> > +  mask = HWCAP_IDIVA;
>> > +  udiv_insn = __opcode_to_mem_arm(0xe730f110);
>> > +  sdiv_insn = __opcode_to_mem_arm(0xe710f110);
>> > +  }
>> > +
>> > +  if (elf_hwcap & mask) {
>> > +  if (sym->st_value == udiv_addr) {
>> > +  *(u32 *)loc = udiv_insn;
>> > +  return 1;
>> > +  } else if (sym->st_value == sdiv_addr) {
>> > +  *(u32 *)loc = sdiv_insn;
>> > +  return 1;
>> > +  }
>> > +  }
>> > +
>> > +  return 0;
>> > +}
>> 
>> [...]
>> 
>> > +static void __init patch_aeabi_uidiv(void)
>> > +{
>> > +  extern unsigned long *__start_udiv_loc[], *__stop_udiv_loc[];
>> > +  extern unsigned long *__start_idiv_loc[], *__stop_idiv_loc[];
>> > +  unsigned long **p;
>> > +  unsigned int udiv_insn, sdiv_insn, mask;
>> > +
>> > +  if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
>> > +  mask = HWCAP_IDIVT;
>> > +  udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
>> > +  sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
>> > +  } else {
>> > +  mask = HWCAP_IDIVA;
>> > +  udiv_insn = __opcode_to_mem_arm(0xe730f110);
>> > +  sdiv_insn = __opcode_to_mem_arm(0xe710f110);
>> > +  }
>> > +
>> > +  if (elf_hwcap & mask) {
>> > +  for (p = __start_udiv_loc; p < __stop_udiv_loc; p++) {
>> > +  unsigned long *inst = *p;
>> > +  *inst = udiv_insn;
>> > +  }
>> > +  for (p = __start_idiv_loc; p < __stop_idiv_loc; p++) {
>> > +  unsigned long *inst = *p;
>> > +  *inst = sdiv_insn;
>> > +  }
>> > +  }
>> > +}
>> 
>> These functions are rather similar.  Perhaps they could be combined
>> somehow.
>> 
>
> Yes. I have this patch on top, just haven't folded it in because
> it doesn't reduce the lines of code.

I don't see any reason to split it anyhow.  The end result isn't any
harder to understand than the intermediate.

> 8<
> From: Stephen Boyd <sb...@codeaurora.org>
> Subject: [PATCH] consolidate with module code
>
> Signed-off-by: Stephen Boyd <sb...@codeaurora.org>
> ---
>  arch/arm/include/asm/setup.h |  3 +++
>  arch/arm/kernel/module.c | 16 +
>  arch/arm/kernel/setup.c  | 54 
> +++-
>  3 files changed, 42 insertions(+), 31 deletions(-)
>
> diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
> index e0adb9f1bf94..3f251cdb94ef 100644
> --- a/arch/arm/include/asm/setup.h
> +++ b/arch/arm/include/asm/setup.h
> @@ -25,4 +25,7 @@ extern int arm_add_memory(u64 start, u64 size);
>  extern void early_print(const char *str, ...);
>  extern void dump_machine_table(void);
>
> +extern void patch_uidiv(void *addr, size_t size);
> +extern void patch_idiv(void *addr, size_t size);

Why not call things sdiv and udiv like the actual instructions?

>  #endif
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index 064e6ae60e08..684a68f1085b 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -22,6 +22,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -58,24 +59,19 @@ static int module_patch_aeabi_uidiv(unsigned long loc, 
> const Elf32_Sym *sym)
>   extern char __aeabi_uidiv[], __aeabi_idiv[];
>   unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
>   unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
> - unsigned int udiv_insn, sdiv_insn, mask;
> + unsigned int mask;
>

Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-22 Thread Måns Rullgård
Arnd Bergmann  writes:

> arnd@wuerfel:/tmp$ arm-linux-gnueabihf-gcc -Wall -O2 -mcpu=cortex-a15 idiv.c 
> -c -o idiv-arm.o
> arnd@wuerfel:/tmp$ objdump -dr idiv-arm.o   
>
> idiv-arm.o: file format elf32-littlearm
>
> Disassembly of section .text:
>
>  :
>0:   fbb0 f0f1   udivr0, r0, r1
>4:   4770bx  lr
>6:   bf00nop
>
> 0008 :
>8:   fb90 f0f1   sdivr0, r0, r1
>c:   4770bx  lr
>e:   bf00nop

Your compiler seems to default to thumb so you should add -marm.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-22 Thread Måns Rullgård
Arnd Bergmann  writes:

> On Sunday 22 November 2015 13:29:29 Peter Maydell wrote:
>> On 21 November 2015 at 23:21, Arnd Bergmann  wrote:
>> > Regarding PJ4, it's still unclear whether that has the same
>> > problem and it only reports idivt when it actually supports idiva,
>> > or whether the lack of idiva support on PJ4 is instead the reason
>> > why the ARM ARM was updated to have separate flags.
>> 
>> SDIV/IDIV were originally introduced for R and M profile only
>> and there the Thumb encodings of SDIV/IDIV are mandatory
>> whereas the ARM ones are optional (and weren't initially
>> defined at all). So if you're looking for CPUs with only the
>> Thumb encodings I would try checking older R profile cores
>> like the Cortex-R4.
>
> The question is really about Marvell Dove, MMP and Armada 370,
> which are all based on PJ4 or PJ4B (CPU part : 0x581), so ARMv7-A
> and report idivt support but idiva.
>
> There are a couple of explanations here:
>
> a) Marvell really implemented only idivt but not idiva
>and reports it correctly, and the people from
>https://groups.google.com/a/dartlang.org/forum/#!topic/reviews/9wvsJvq0YYY
>just misinterpreted the flags
>
> b) the dartlag.org folks are correct, and it supports neither
>idivt nor idiva, and the /proc/cpuinfo flag is just wrong
>and requires a fixup
>
> c) like Krait, it actually implements both idiva and idivt but
>gets the reporting wrong.

It's trivial to test for someone who has one.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-22 Thread Måns Rullgård
Arnd Bergmann <a...@arndb.de> writes:

> On Sunday 22 November 2015 13:29:29 Peter Maydell wrote:
>> On 21 November 2015 at 23:21, Arnd Bergmann <a...@arndb.de> wrote:
>> > Regarding PJ4, it's still unclear whether that has the same
>> > problem and it only reports idivt when it actually supports idiva,
>> > or whether the lack of idiva support on PJ4 is instead the reason
>> > why the ARM ARM was updated to have separate flags.
>> 
>> SDIV/IDIV were originally introduced for R and M profile only
>> and there the Thumb encodings of SDIV/IDIV are mandatory
>> whereas the ARM ones are optional (and weren't initially
>> defined at all). So if you're looking for CPUs with only the
>> Thumb encodings I would try checking older R profile cores
>> like the Cortex-R4.
>
> The question is really about Marvell Dove, MMP and Armada 370,
> which are all based on PJ4 or PJ4B (CPU part : 0x581), so ARMv7-A
> and report idivt support but idiva.
>
> There are a couple of explanations here:
>
> a) Marvell really implemented only idivt but not idiva
>and reports it correctly, and the people from
>https://groups.google.com/a/dartlang.org/forum/#!topic/reviews/9wvsJvq0YYY
>just misinterpreted the flags
>
> b) the dartlag.org folks are correct, and it supports neither
>idivt nor idiva, and the /proc/cpuinfo flag is just wrong
>and requires a fixup
>
> c) like Krait, it actually implements both idiva and idivt but
>gets the reporting wrong.

It's trivial to test for someone who has one.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-22 Thread Måns Rullgård
Arnd Bergmann <a...@arndb.de> writes:

> arnd@wuerfel:/tmp$ arm-linux-gnueabihf-gcc -Wall -O2 -mcpu=cortex-a15 idiv.c 
> -c -o idiv-arm.o
> arnd@wuerfel:/tmp$ objdump -dr idiv-arm.o   
>
> idiv-arm.o: file format elf32-littlearm
>
> Disassembly of section .text:
>
>  :
>0:   fbb0 f0f1   udivr0, r0, r1
>4:   4770bx  lr
>6:   bf00nop
>
> 0008 :
>8:   fb90 f0f1   sdivr0, r0, r1
>c:   4770bx  lr
>e:   bf00nop

Your compiler seems to default to thumb so you should add -marm.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-21 Thread Måns Rullgård
Arnd Bergmann  writes:

> On Saturday 21 November 2015 20:45:38 Måns Rullgård wrote:
>> On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann  wrote:
>> >On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
>> >> This is a respin of a patch series from about a year ago[1]. I
>> >realized
>> >> that we already had most of the code in recordmcount to figure out
>> >> where we make calls to particular functions, so recording where
>> >> we make calls to the integer division functions should be easy enough
>> >> to add support for in the same codepaths. Looking back on the thread
>> >> it seems like Mans was thinking along the same lines, although it
>> >wasn't
>> >> obvious to me back then or even over the last few days when I wrote
>> >this.
>> >
>> >Shouldn't we start by allowing to build the kernel for -march=armv7ve
>> >on platforms that allow it? That would seem like a simpler change
>> >and likely generate better code for most people, except when you
>> >actually
>> >care about running the same binary kernel on older platforms.
>> >
>> >I tried to get a complete list of CPU cores with idiv, lpae and
>> >virtualization support at some point, but I don't remember the
>> >details for all Qualcomm and Marvell cores any more, to create the
>> >complete configuration matrix. IIRC, all CPUs that support
>> >virtualization also do lpae (they have to) and all CPUs that
>> >do lpae also do idiv, but the opposite is not true.
>> >
>> 
>> The ARM ARM says anything with virt has idiv, lpae doesn't matter.
>
> Ok, and anything with virt also has lpae by definition. The question is
> whether we care about using idiv on cores that do not have lpae, or that
> have neither lpae nor virt.

The question is, are there any such cores?  GCC doesn't know of any, but
then it's missing most non-ARM designs.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-21 Thread Måns Rullgård
On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann  wrote:
>On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
>> This is a respin of a patch series from about a year ago[1]. I
>realized
>> that we already had most of the code in recordmcount to figure out
>> where we make calls to particular functions, so recording where
>> we make calls to the integer division functions should be easy enough
>> to add support for in the same codepaths. Looking back on the thread
>> it seems like Mans was thinking along the same lines, although it
>wasn't
>> obvious to me back then or even over the last few days when I wrote
>this.
>
>Shouldn't we start by allowing to build the kernel for -march=armv7ve
>on platforms that allow it? That would seem like a simpler change
>and likely generate better code for most people, except when you
>actually
>care about running the same binary kernel on older platforms.
>
>I tried to get a complete list of CPU cores with idiv, lpae and
>virtualization support at some point, but I don't remember the
>details for all Qualcomm and Marvell cores any more, to create the
>complete configuration matrix. IIRC, all CPUs that support
>virtualization also do lpae (they have to) and all CPUs that
>do lpae also do idiv, but the opposite is not true.
>
>   Arnd

The ARM ARM says anything with virt has idiv, lpae doesn't matter. ARMv7-R also 
has idiv. I've no idea if anyone runs Linux on those though.
-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] i2c: xlr: add support for Sigma Designs controller variant

2015-11-21 Thread Måns Rullgård
Mans Rullgard  writes:

> Sigma Designs chips use a variant of this controller with the following
> differences:
>
> - The BUSY bit in the STATUS register is inverted
> - Bit 8 of the CONFIG register must be set
> - The controller can generate interrupts
>
> This patch adds support for the first two of these.  It also calculates
> and sets the correct clock divisor if a clk is provided.  The bus
> frequency is optionally speficied in the device tree node.
>
> Signed-off-by: Mans Rullgard 
> ---
>  drivers/i2c/busses/Kconfig   |  6 ++--
>  drivers/i2c/busses/i2c-xlr.c | 81 
> +---
>  2 files changed, 80 insertions(+), 7 deletions(-)

Any comments on these patches?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 3/3] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-21 Thread Måns Rullgård
Stephen Boyd  writes:

> +static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym *sym)
> +{
> + extern char __aeabi_uidiv[], __aeabi_idiv[];
> + unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
> + unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
> + unsigned int udiv_insn, sdiv_insn, mask;
> +
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> + mask = HWCAP_IDIVT;
> + udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
> + sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
> + } else {
> + mask = HWCAP_IDIVA;
> + udiv_insn = __opcode_to_mem_arm(0xe730f110);
> + sdiv_insn = __opcode_to_mem_arm(0xe710f110);
> + }
> +
> + if (elf_hwcap & mask) {
> + if (sym->st_value == udiv_addr) {
> + *(u32 *)loc = udiv_insn;
> + return 1;
> + } else if (sym->st_value == sdiv_addr) {
> + *(u32 *)loc = sdiv_insn;
> + return 1;
> + }
> + }
> +
> + return 0;
> +}

[...]

> +static void __init patch_aeabi_uidiv(void)
> +{
> + extern unsigned long *__start_udiv_loc[], *__stop_udiv_loc[];
> + extern unsigned long *__start_idiv_loc[], *__stop_idiv_loc[];
> + unsigned long **p;
> + unsigned int udiv_insn, sdiv_insn, mask;
> +
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> + mask = HWCAP_IDIVT;
> + udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
> + sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
> + } else {
> + mask = HWCAP_IDIVA;
> + udiv_insn = __opcode_to_mem_arm(0xe730f110);
> + sdiv_insn = __opcode_to_mem_arm(0xe710f110);
> + }
> +
> + if (elf_hwcap & mask) {
> + for (p = __start_udiv_loc; p < __stop_udiv_loc; p++) {
> + unsigned long *inst = *p;
> + *inst = udiv_insn;
> + }
> + for (p = __start_idiv_loc; p < __stop_idiv_loc; p++) {
> + unsigned long *inst = *p;
> + *inst = sdiv_insn;
> + }
> + }
> +}

These functions are rather similar.  Perhaps they could be combined
somehow.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-21 Thread Måns Rullgård
Arnd Bergmann <a...@arndb.de> writes:

> On Saturday 21 November 2015 20:45:38 Måns Rullgård wrote:
>> On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann <a...@arndb.de> wrote:
>> >On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
>> >> This is a respin of a patch series from about a year ago[1]. I
>> >realized
>> >> that we already had most of the code in recordmcount to figure out
>> >> where we make calls to particular functions, so recording where
>> >> we make calls to the integer division functions should be easy enough
>> >> to add support for in the same codepaths. Looking back on the thread
>> >> it seems like Mans was thinking along the same lines, although it
>> >wasn't
>> >> obvious to me back then or even over the last few days when I wrote
>> >this.
>> >
>> >Shouldn't we start by allowing to build the kernel for -march=armv7ve
>> >on platforms that allow it? That would seem like a simpler change
>> >and likely generate better code for most people, except when you
>> >actually
>> >care about running the same binary kernel on older platforms.
>> >
>> >I tried to get a complete list of CPU cores with idiv, lpae and
>> >virtualization support at some point, but I don't remember the
>> >details for all Qualcomm and Marvell cores any more, to create the
>> >complete configuration matrix. IIRC, all CPUs that support
>> >virtualization also do lpae (they have to) and all CPUs that
>> >do lpae also do idiv, but the opposite is not true.
>> >
>> 
>> The ARM ARM says anything with virt has idiv, lpae doesn't matter.
>
> Ok, and anything with virt also has lpae by definition. The question is
> whether we care about using idiv on cores that do not have lpae, or that
> have neither lpae nor virt.

The question is, are there any such cores?  GCC doesn't know of any, but
then it's missing most non-ARM designs.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 0/3] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2015-11-21 Thread Måns Rullgård
On 21 November 2015 20:39:58 GMT+00:00, Arnd Bergmann <a...@arndb.de> wrote:
>On Friday 20 November 2015 17:23:14 Stephen Boyd wrote:
>> This is a respin of a patch series from about a year ago[1]. I
>realized
>> that we already had most of the code in recordmcount to figure out
>> where we make calls to particular functions, so recording where
>> we make calls to the integer division functions should be easy enough
>> to add support for in the same codepaths. Looking back on the thread
>> it seems like Mans was thinking along the same lines, although it
>wasn't
>> obvious to me back then or even over the last few days when I wrote
>this.
>
>Shouldn't we start by allowing to build the kernel for -march=armv7ve
>on platforms that allow it? That would seem like a simpler change
>and likely generate better code for most people, except when you
>actually
>care about running the same binary kernel on older platforms.
>
>I tried to get a complete list of CPU cores with idiv, lpae and
>virtualization support at some point, but I don't remember the
>details for all Qualcomm and Marvell cores any more, to create the
>complete configuration matrix. IIRC, all CPUs that support
>virtualization also do lpae (they have to) and all CPUs that
>do lpae also do idiv, but the opposite is not true.
>
>   Arnd

The ARM ARM says anything with virt has idiv, lpae doesn't matter. ARMv7-R also 
has idiv. I've no idea if anyone runs Linux on those though.
-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] i2c: xlr: add support for Sigma Designs controller variant

2015-11-21 Thread Måns Rullgård
Mans Rullgard <m...@mansr.com> writes:

> Sigma Designs chips use a variant of this controller with the following
> differences:
>
> - The BUSY bit in the STATUS register is inverted
> - Bit 8 of the CONFIG register must be set
> - The controller can generate interrupts
>
> This patch adds support for the first two of these.  It also calculates
> and sets the correct clock divisor if a clk is provided.  The bus
> frequency is optionally speficied in the device tree node.
>
> Signed-off-by: Mans Rullgard <m...@mansr.com>
> ---
>  drivers/i2c/busses/Kconfig   |  6 ++--
>  drivers/i2c/busses/i2c-xlr.c | 81 
> +---
>  2 files changed, 80 insertions(+), 7 deletions(-)

Any comments on these patches?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC/PATCH 3/3] ARM: Replace calls to __aeabi_{u}idiv with udiv/sdiv instructions

2015-11-21 Thread Måns Rullgård
Stephen Boyd <sb...@codeaurora.org> writes:

> +static int module_patch_aeabi_uidiv(unsigned long loc, const Elf32_Sym *sym)
> +{
> + extern char __aeabi_uidiv[], __aeabi_idiv[];
> + unsigned long udiv_addr = (unsigned long)__aeabi_uidiv;
> + unsigned long sdiv_addr = (unsigned long)__aeabi_idiv;
> + unsigned int udiv_insn, sdiv_insn, mask;
> +
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> + mask = HWCAP_IDIVT;
> + udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
> + sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
> + } else {
> + mask = HWCAP_IDIVA;
> + udiv_insn = __opcode_to_mem_arm(0xe730f110);
> + sdiv_insn = __opcode_to_mem_arm(0xe710f110);
> + }
> +
> + if (elf_hwcap & mask) {
> + if (sym->st_value == udiv_addr) {
> + *(u32 *)loc = udiv_insn;
> + return 1;
> + } else if (sym->st_value == sdiv_addr) {
> + *(u32 *)loc = sdiv_insn;
> + return 1;
> + }
> + }
> +
> + return 0;
> +}

[...]

> +static void __init patch_aeabi_uidiv(void)
> +{
> + extern unsigned long *__start_udiv_loc[], *__stop_udiv_loc[];
> + extern unsigned long *__start_idiv_loc[], *__stop_idiv_loc[];
> + unsigned long **p;
> + unsigned int udiv_insn, sdiv_insn, mask;
> +
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
> + mask = HWCAP_IDIVT;
> + udiv_insn = __opcode_to_mem_thumb32(0xfbb0f0f1);
> + sdiv_insn = __opcode_to_mem_thumb32(0xfb90f0f1);
> + } else {
> + mask = HWCAP_IDIVA;
> + udiv_insn = __opcode_to_mem_arm(0xe730f110);
> + sdiv_insn = __opcode_to_mem_arm(0xe710f110);
> + }
> +
> + if (elf_hwcap & mask) {
> + for (p = __start_udiv_loc; p < __stop_udiv_loc; p++) {
> + unsigned long *inst = *p;
> + *inst = udiv_insn;
> + }
> + for (p = __start_idiv_loc; p < __stop_idiv_loc; p++) {
> + unsigned long *inst = *p;
> + *inst = sdiv_insn;
> + }
> + }
> +}

These functions are rather similar.  Perhaps they could be combined
somehow.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] devicetree: add binding for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Rob Herring  writes:

> On Thu, Nov 19, 2015 at 06:33:45PM +, Mans Rullgard wrote:
>> This adds a binding for the secondary interrupt controller in
>> Sigma Designs SMP86xx and SMP87xx chips.
>> 
>> Signed-off-by: Mans Rullgard 
>> ---
>>  .../interrupt-controller/sigma,smp8642-intc.txt| 47 
>> ++
>>  1 file changed, 47 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> 
>> diff --git 
>> a/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>>  
>> b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> new file mode 100644
>> index 000..f82cddf
>> --- /dev/null
>> +++ 
>> b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> @@ -0,0 +1,47 @@
>> +Sigma Designs SMP86xx/SMP87xx secondary interrupt controller
>> +
>> +Required properties:
>> +- compatible: should be "sigma,smp8642-intc"
>> +- reg: physical address of MMIO region
>> +- interrupt-parent: phandle of parent interrupt controller
>> +- interrupt-controller: boolean
>> +
>> +One child node per control block with properties:
>> +- sigma,reg-offset: offset of registers from main base address
>
> Your driver defines these offsets too.
>
> Do you expect to have many different variations here? If not, I would 
> get rid of all the child nodes and just hard code it in the driver.

Then how will other DT nodes reference the correct one?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: clk: tango4: undefined CONFIG_ARCH_TANGOX

2015-11-20 Thread Måns Rullgård
Marc Gonzalez  writes:

> On 20/11/2015 09:50, Valentin Rothberg wrote:
>
>> your commit ed12dfc92f01 ("clk: tango4: clkgen driver for Tango4
>> platforms") has shown up in today's linux-next tree (i.e.,
>> next-20151120) adding the following build condition to the tango4 clk
>> driver:
>> 
>> drivers/clk/Makefile:45:obj-$(CONFIG_ARCH_TANGOX) += clk-tango4.o
>> 
>> However, ARCH_TANGOX is nowhere defined in Kconfig so that the driver
>> cannot be compiled at the current state.  I checked the LKML, and found
>> a bunch of patches referencing ARCH_TANGOX as well, but I could not find
>> any patch adding this option.
>> 
>> Is there a patch queued somewhere that adds ARCH_TANGOX?
>
> Hello Valentin,
>
> Platform support has not been accepted yet.
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/456280
>
> In fact, Kevin Hilman has pointed out that the arch should not
> be called TANGOX, because X is a wildcard.
>
> (However, several unrelated drivers have been submitted with
> TANGOX in the name. Is that a problem?)
>
> tango3 was a MIPS-based design
> tango4 is an ARM-based design (with one MIPS-based outlier).
> tango5 is an ARM-based design
>
> Although Mans is against the idea, I believe there should be one
> different clk driver for each arch.

It's essentially the same clock generator.  It should be a single
driver.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Mason  writes:

> On 19/11/2015 19:33, Mans Rullgard wrote:
>
>> This adds support for the secondary interrupt controller used in Sigma
>> Designs SMP86xx and SMP87xx chips.
>> 
>> Signed-off-by: Mans Rullgard 
>> ---
>>  drivers/irqchip/Kconfig  |   5 +
>>  drivers/irqchip/Makefile |   1 +
>>  drivers/irqchip/irq-tangox.c | 232 
>> +++
>>  3 files changed, 238 insertions(+)
>>  create mode 100644 drivers/irqchip/irq-tangox.c
>> 
>> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
>> index 4d7294e..baf3345 100644
>> --- a/drivers/irqchip/Kconfig
>> +++ b/drivers/irqchip/Kconfig
>> @@ -193,3 +193,8 @@ config IRQ_MXS
>>  def_bool y if MACH_ASM9260 || ARCH_MXS
>>  select IRQ_DOMAIN
>>  select STMP_DEVICE
>> +
>> +config TANGOX_IRQ
>
> Question: Kevin Hilman said I should use tango instead of tangox
> for the arch directory. Does that advice extend to Kconfig names?

I suppose the same logic applies to all names.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Marc Zyngier  writes:

>> +static void tangox_dispatch_irqs(struct irq_domain *dom, unsigned int 
>> status,
>> + int base)
>> +{
>> +unsigned int hwirq;
>> +unsigned int virq;
>> +
>> +while (status) {
>> +hwirq = __ffs(status);
>> +virq = irq_find_mapping(dom, base + hwirq);
>
> You may want to check virq in case you get interrupts from unexpected
> sources (unlikely, but still).

Sure, never hurts to be safe.

>> +generic_handle_irq(virq);
>> +status &= ~BIT(hwirq);
>> +}
>> +}

[...]

>> +static int __init tangox_irq_init(void __iomem *base, struct device_node 
>> *node)
>> +{
>> +struct tangox_irq_chip *chip;
>> +struct irq_domain *dom;
>> +const char *name;
>> +u32 ctl;
>> +int irq;
>> +int err;
>> +int i;
>> +
>> +irq = irq_of_parse_and_map(node, 0);
>> +if (!irq)
>> +panic("%s: failed to get IRQ", node->name);
>> +
>> +if (of_property_read_u32(node, "sigma,reg-offset", ))
>> +panic("%s: failed to get reg base", node->name);
>
> My DT foo is a bit crap, but I'm sure there is ways to express ranges
> inside a region that do not require to have vendor-specific properties.
> Mark?

I wasn't happy about that either.  The usual way is to use a "ranges"
property in the parent node and "reg" in the child node.  That makes it
easy to obtain a mapping of the child range using of_iomap() or
whatever.  The problem is that that's not what I need here.  The type
and ack registers are common while the enable/disable registers are per
sub-block, and the generic irqchip structs use a single base address and
offsets for the various registers, so I need the offset from the common
base to the start of the per-block registers, not the actual full
address.  I could use of_address_to_resource() and subtract one from the
other, I suppose.

>> +
>> +if (of_property_read_string(node, "label", ))
>> +name = node->name;
>
> Do you really need this cosmetic thing? node->name should be enough for
> everybody, and the "label" has nothing to do with the HW description.

No, it's not needed.  I'll get rid of it.

>> +
>> +chip = kzalloc(sizeof(*chip), GFP_KERNEL);
>> +chip->ctl = ctl;
>> +chip->base = base;
>> +
>> +dom = irq_domain_add_linear(node, 64, _generic_chip_ops, chip);
>> +if (!dom)
>> +panic("%s: failed to create irqdomain", node->name);
>> +
>> +err = irq_alloc_domain_generic_chips(dom, 32, 2, name, handle_level_irq,
>> + 0, 0, 0);
>> +if (err)
>> +panic("%s: failed to allocate irqchip", node->name);
>> +
>> +tangox_irq_domain_init(dom);
>> +
>> +for (i = 0; i < 64; i++)
>> +irq_create_mapping(dom, i);
>
> /me puzzled. What's that for? You really should never need something
> like this.

I had some reason for doing when I first wrote this code (MIPS, no DT),
but it's not needed now.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Marc Zyngier <marc.zyng...@arm.com> writes:

>> +static void tangox_dispatch_irqs(struct irq_domain *dom, unsigned int 
>> status,
>> + int base)
>> +{
>> +unsigned int hwirq;
>> +unsigned int virq;
>> +
>> +while (status) {
>> +hwirq = __ffs(status);
>> +virq = irq_find_mapping(dom, base + hwirq);
>
> You may want to check virq in case you get interrupts from unexpected
> sources (unlikely, but still).

Sure, never hurts to be safe.

>> +generic_handle_irq(virq);
>> +status &= ~BIT(hwirq);
>> +}
>> +}

[...]

>> +static int __init tangox_irq_init(void __iomem *base, struct device_node 
>> *node)
>> +{
>> +struct tangox_irq_chip *chip;
>> +struct irq_domain *dom;
>> +const char *name;
>> +u32 ctl;
>> +int irq;
>> +int err;
>> +int i;
>> +
>> +irq = irq_of_parse_and_map(node, 0);
>> +if (!irq)
>> +panic("%s: failed to get IRQ", node->name);
>> +
>> +if (of_property_read_u32(node, "sigma,reg-offset", ))
>> +panic("%s: failed to get reg base", node->name);
>
> My DT foo is a bit crap, but I'm sure there is ways to express ranges
> inside a region that do not require to have vendor-specific properties.
> Mark?

I wasn't happy about that either.  The usual way is to use a "ranges"
property in the parent node and "reg" in the child node.  That makes it
easy to obtain a mapping of the child range using of_iomap() or
whatever.  The problem is that that's not what I need here.  The type
and ack registers are common while the enable/disable registers are per
sub-block, and the generic irqchip structs use a single base address and
offsets for the various registers, so I need the offset from the common
base to the start of the per-block registers, not the actual full
address.  I could use of_address_to_resource() and subtract one from the
other, I suppose.

>> +
>> +if (of_property_read_string(node, "label", ))
>> +name = node->name;
>
> Do you really need this cosmetic thing? node->name should be enough for
> everybody, and the "label" has nothing to do with the HW description.

No, it's not needed.  I'll get rid of it.

>> +
>> +chip = kzalloc(sizeof(*chip), GFP_KERNEL);
>> +chip->ctl = ctl;
>> +chip->base = base;
>> +
>> +dom = irq_domain_add_linear(node, 64, _generic_chip_ops, chip);
>> +if (!dom)
>> +panic("%s: failed to create irqdomain", node->name);
>> +
>> +err = irq_alloc_domain_generic_chips(dom, 32, 2, name, handle_level_irq,
>> +     0, 0, 0);
>> +if (err)
>> +panic("%s: failed to allocate irqchip", node->name);
>> +
>> +tangox_irq_domain_init(dom);
>> +
>> +for (i = 0; i < 64; i++)
>> +irq_create_mapping(dom, i);
>
> /me puzzled. What's that for? You really should never need something
> like this.

I had some reason for doing when I first wrote this code (MIPS, no DT),
but it's not needed now.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] irqchip: add support for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> On 19/11/2015 19:33, Mans Rullgard wrote:
>
>> This adds support for the secondary interrupt controller used in Sigma
>> Designs SMP86xx and SMP87xx chips.
>> 
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>> ---
>>  drivers/irqchip/Kconfig  |   5 +
>>  drivers/irqchip/Makefile |   1 +
>>  drivers/irqchip/irq-tangox.c | 232 
>> +++
>>  3 files changed, 238 insertions(+)
>>  create mode 100644 drivers/irqchip/irq-tangox.c
>> 
>> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
>> index 4d7294e..baf3345 100644
>> --- a/drivers/irqchip/Kconfig
>> +++ b/drivers/irqchip/Kconfig
>> @@ -193,3 +193,8 @@ config IRQ_MXS
>>  def_bool y if MACH_ASM9260 || ARCH_MXS
>>  select IRQ_DOMAIN
>>  select STMP_DEVICE
>> +
>> +config TANGOX_IRQ
>
> Question: Kevin Hilman said I should use tango instead of tangox
> for the arch directory. Does that advice extend to Kconfig names?

I suppose the same logic applies to all names.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: clk: tango4: undefined CONFIG_ARCH_TANGOX

2015-11-20 Thread Måns Rullgård
Marc Gonzalez <marc_gonza...@sigmadesigns.com> writes:

> On 20/11/2015 09:50, Valentin Rothberg wrote:
>
>> your commit ed12dfc92f01 ("clk: tango4: clkgen driver for Tango4
>> platforms") has shown up in today's linux-next tree (i.e.,
>> next-20151120) adding the following build condition to the tango4 clk
>> driver:
>> 
>> drivers/clk/Makefile:45:obj-$(CONFIG_ARCH_TANGOX) += clk-tango4.o
>> 
>> However, ARCH_TANGOX is nowhere defined in Kconfig so that the driver
>> cannot be compiled at the current state.  I checked the LKML, and found
>> a bunch of patches referencing ARCH_TANGOX as well, but I could not find
>> any patch adding this option.
>> 
>> Is there a patch queued somewhere that adds ARCH_TANGOX?
>
> Hello Valentin,
>
> Platform support has not been accepted yet.
>
> http://thread.gmane.org/gmane.linux.ports.arm.kernel/456280
>
> In fact, Kevin Hilman has pointed out that the arch should not
> be called TANGOX, because X is a wildcard.
>
> (However, several unrelated drivers have been submitted with
> TANGOX in the name. Is that a problem?)
>
> tango3 was a MIPS-based design
> tango4 is an ARM-based design (with one MIPS-based outlier).
> tango5 is an ARM-based design
>
> Although Mans is against the idea, I believe there should be one
> different clk driver for each arch.

It's essentially the same clock generator.  It should be a single
driver.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] devicetree: add binding for Sigma Designs SMP86xx interrupt controller

2015-11-20 Thread Måns Rullgård
Rob Herring <r...@kernel.org> writes:

> On Thu, Nov 19, 2015 at 06:33:45PM +, Mans Rullgard wrote:
>> This adds a binding for the secondary interrupt controller in
>> Sigma Designs SMP86xx and SMP87xx chips.
>> 
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>> ---
>>  .../interrupt-controller/sigma,smp8642-intc.txt| 47 
>> ++
>>  1 file changed, 47 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> 
>> diff --git 
>> a/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>>  
>> b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> new file mode 100644
>> index 000..f82cddf
>> --- /dev/null
>> +++ 
>> b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp8642-intc.txt
>> @@ -0,0 +1,47 @@
>> +Sigma Designs SMP86xx/SMP87xx secondary interrupt controller
>> +
>> +Required properties:
>> +- compatible: should be "sigma,smp8642-intc"
>> +- reg: physical address of MMIO region
>> +- interrupt-parent: phandle of parent interrupt controller
>> +- interrupt-controller: boolean
>> +
>> +One child node per control block with properties:
>> +- sigma,reg-offset: offset of registers from main base address
>
> Your driver defines these offsets too.
>
> Do you expect to have many different variations here? If not, I would 
> get rid of all the child nodes and just hard code it in the driver.

Then how will other DT nodes reference the correct one?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4] watchdog: add support for Sigma Designs SMP86xx/SMP87xx

2015-11-19 Thread Måns Rullgård
Guenter Roeck  writes:

> On 11/19/2015 02:09 PM, Mans Rullgard wrote:
>> This adds support for the Sigma Designs SMP86xx/SMP87xx family built-in
>> watchdog.
>>
>> Signed-off-by: Mans Rullgard 
>
> Reviewed-by: Guenter Roeck 
>
> Did you send the bindings document to some other mailing list (only) ?
> I see "PATCH 2/2" in some of your submissions, but I seem to be missing
> patch 1/2.

Blame scripts/get_maintaintainer.pl for that.  Anyway, here it is:
http://article.gmane.org/gmane.linux.drivers.devicetree/144423

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre  writes:
>> 
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +  unsigned long long res;
>> > +  unsigned int tmp = 0;
>> > +
>> > +  if (!bias) {
>> > +  asm (   "umull  %Q0, %R0, %Q1, %Q2\n\t"
>> > +  "mov%Q0, #0"
>> > +  : "=" (res)
>> > +  : "r" (m), "r" (n)
>> > +  : "cc");
>> > +  } else if (!(m & ((1ULL << 63) | (1ULL << 31 {
>> > +  res = m;
>> > +  asm (   "umlal  %Q0, %R0, %Q1, %Q2\n\t"
>> > +  "mov%Q0, #0"
>> > +  : "+" (res)
>> > +  : "r" (m), "r" (n)
>> > +  : "cc");
>> > +  } else {
>> > +  asm (   "umull  %Q0, %R0, %Q2, %Q3\n\t"
>> > +  "cmn%Q0, %Q2\n\t"
>> > +  "adcs   %R0, %R0, %R2\n\t"
>> > +  "adc%Q0, %1, #0"
>> > +  : "=" (res), "+" (tmp)
>> > +  : "r" (m), "r" (n)
>> 
>> Why is tmp using a +r constraint here?  The register is not written, so
>> using an input-only operand could/should result in better code.  That is
>> also what the old code did.
>
> No, it is worse. gcc allocates two registers because, somehow, it 
> doesn't think that the first one still holds zero after the first usage.  
> This way usage of only one temporary register is forced throughout, 
> producing better code.

Makes sense.  Thanks for explaining.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre  writes:

> +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> +{
> + unsigned long long res;
> + unsigned int tmp = 0;
> +
> + if (!bias) {
> + asm (   "umull  %Q0, %R0, %Q1, %Q2\n\t"
> + "mov%Q0, #0"
> + : "=" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else if (!(m & ((1ULL << 63) | (1ULL << 31 {
> + res = m;
> + asm (   "umlal  %Q0, %R0, %Q1, %Q2\n\t"
> + "mov%Q0, #0"
> + : "+" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else {
> + asm (   "umull  %Q0, %R0, %Q2, %Q3\n\t"
> + "cmn%Q0, %Q2\n\t"
> + "adcs   %R0, %R0, %R2\n\t"
> + "adc%Q0, %1, #0"
> + : "=" (res), "+" (tmp)
> + : "r" (m), "r" (n)

Why is tmp using a +r constraint here?  The register is not written, so
using an input-only operand could/should result in better code.  That is
also what the old code did.

> + : "cc");
> + }
> +
> + if (!(m & ((1ULL << 63) | (1ULL << 31 {
> + asm (   "umlal  %R0, %Q0, %R1, %Q2\n\t"
> + "umlal  %R0, %Q0, %Q1, %R2\n\t"
> + "mov%R0, #0\n\t"
> + "umlal  %Q0, %R0, %R1, %R2"
> + : "+" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else {
> + asm (   "umlal  %R0, %Q0, %R2, %Q3\n\t"
> + "umlal  %R0, %1, %Q2, %R3\n\t"
> + "mov%R0, #0\n\t"
> + "adds   %Q0, %1, %Q0\n\t"
> + "adc%R0, %R0, #0\n\t"
> + "umlal  %Q0, %R0, %R2, %R3"
> + : "+" (res), "+" (tmp)
> + : "r" (m), "r" (n)
> + : "cc");
> + }
> +
> + return res;
> +}

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] watchdog: add support for Sigma Designs SMP86xx/SMP87xx

2015-11-19 Thread Måns Rullgård
Guenter Roeck  writes:

>> +writel(WD_CONFIG_XTAL_IN, dev->base + WD_CONFIG);
>
> What happens if the DISABLE bit was previously set (assuming
> that DISABLE means that the watchdog is disabled) ?
>
> Concern I guess would be the combination of DISABLE being set
> and WD_COUNTER at a value != 0 (say, 1). That might effectively
> auto-enable the watchdog or even reset the system.

There's no sane reason why it would be in that state, but then again
firmware authors are not sane.

How about checking the DISABLE bit, and if it's set turn off the
counter, otherwise leave it running?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4] watchdog: add support for Sigma Designs SMP86xx/SMP87xx

2015-11-19 Thread Måns Rullgård
Guenter Roeck <li...@roeck-us.net> writes:

> On 11/19/2015 02:09 PM, Mans Rullgard wrote:
>> This adds support for the Sigma Designs SMP86xx/SMP87xx family built-in
>> watchdog.
>>
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>
> Reviewed-by: Guenter Roeck <li...@roeck-us.net>
>
> Did you send the bindings document to some other mailing list (only) ?
> I see "PATCH 2/2" in some of your submissions, but I seem to be missing
> patch 1/2.

Blame scripts/get_maintaintainer.pl for that.  Anyway, here it is:
http://article.gmane.org/gmane.linux.drivers.devicetree/144423

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] watchdog: add support for Sigma Designs SMP86xx/SMP87xx

2015-11-19 Thread Måns Rullgård
Guenter Roeck <li...@roeck-us.net> writes:

>> +writel(WD_CONFIG_XTAL_IN, dev->base + WD_CONFIG);
>
> What happens if the DISABLE bit was previously set (assuming
> that DISABLE means that the watchdog is disabled) ?
>
> Concern I guess would be the combination of DISABLE being set
> and WD_COUNTER at a value != 0 (say, 1). That might effectively
> auto-enable the watchdog or even reset the system.

There's no sane reason why it would be in that state, but then again
firmware authors are not sane.

How about checking the DISABLE bit, and if it's set turn off the
counter, otherwise leave it running?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre <nicolas.pi...@linaro.org> writes:

> +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
> +{
> + unsigned long long res;
> + unsigned int tmp = 0;
> +
> + if (!bias) {
> + asm (   "umull  %Q0, %R0, %Q1, %Q2\n\t"
> + "mov%Q0, #0"
> + : "=" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else if (!(m & ((1ULL << 63) | (1ULL << 31 {
> + res = m;
> + asm (   "umlal  %Q0, %R0, %Q1, %Q2\n\t"
> + "mov%Q0, #0"
> + : "+" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else {
> + asm (   "umull  %Q0, %R0, %Q2, %Q3\n\t"
> + "cmn%Q0, %Q2\n\t"
> + "adcs   %R0, %R0, %R2\n\t"
> + "adc%Q0, %1, #0"
> + : "=" (res), "+" (tmp)
> + : "r" (m), "r" (n)

Why is tmp using a +r constraint here?  The register is not written, so
using an input-only operand could/should result in better code.  That is
also what the old code did.

> + : "cc");
> + }
> +
> + if (!(m & ((1ULL << 63) | (1ULL << 31 {
> + asm (   "umlal  %R0, %Q0, %R1, %Q2\n\t"
> + "umlal  %R0, %Q0, %Q1, %R2\n\t"
> + "mov%R0, #0\n\t"
> + "umlal  %Q0, %R0, %R1, %R2"
> + : "+" (res)
> + : "r" (m), "r" (n)
> + : "cc");
> + } else {
> + asm (   "umlal  %R0, %Q0, %R2, %Q3\n\t"
> + "umlal  %R0, %1, %Q2, %R3\n\t"
> + "mov%R0, #0\n\t"
> + "adds   %Q0, %1, %Q0\n\t"
> + "adc%R0, %R0, #0\n\t"
> + "umlal  %Q0, %R0, %R2, %R3"
> + : "+" (res), "+" (tmp)
> + : "r" (m), "r" (n)
> + : "cc");
> + }
> +
> + return res;
> +}

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] ARM: asm/div64.h: adjust to generic codde

2015-11-19 Thread Måns Rullgård
Nicolas Pitre <nicolas.pi...@linaro.org> writes:

> On Thu, 19 Nov 2015, Måns Rullgård wrote:
>
>> Nicolas Pitre <nicolas.pi...@linaro.org> writes:
>> 
>> > +static inline uint64_t __arch_xprod_64(uint64_t m, uint64_t n, bool bias)
>> > +{
>> > +  unsigned long long res;
>> > +  unsigned int tmp = 0;
>> > +
>> > +  if (!bias) {
>> > +  asm (   "umull  %Q0, %R0, %Q1, %Q2\n\t"
>> > +  "mov%Q0, #0"
>> > +  : "=" (res)
>> > +  : "r" (m), "r" (n)
>> > +  : "cc");
>> > +  } else if (!(m & ((1ULL << 63) | (1ULL << 31 {
>> > +  res = m;
>> > +  asm (   "umlal  %Q0, %R0, %Q1, %Q2\n\t"
>> > +  "mov%Q0, #0"
>> > +  : "+" (res)
>> > +  : "r" (m), "r" (n)
>> > +  : "cc");
>> > +  } else {
>> > +  asm (   "umull  %Q0, %R0, %Q2, %Q3\n\t"
>> > +  "cmn%Q0, %Q2\n\t"
>> > +  "adcs   %R0, %R0, %R2\n\t"
>> > +  "adc%Q0, %1, #0"
>> > +  : "=" (res), "+" (tmp)
>> > +  : "r" (m), "r" (n)
>> 
>> Why is tmp using a +r constraint here?  The register is not written, so
>> using an input-only operand could/should result in better code.  That is
>> also what the old code did.
>
> No, it is worse. gcc allocates two registers because, somehow, it 
> doesn't think that the first one still holds zero after the first usage.  
> This way usage of only one temporary register is forced throughout, 
> producing better code.

Makes sense.  Thanks for explaining.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] clocksource: Store reg field within struct clocksource

2015-11-18 Thread Måns Rullgård
Marc Gonzalez  writes:

> Since 'struct clocksource' is cacheline_aligned, gcc must insert
> a lot of padding between reg and clksrc in 'struct clocksource_mmio'
> (for example, L1_CACHE_BYTES = 64 on ARMv7).
>
> Storing reg within 'struct clocksource' removes unnecessary padding,
> and reg can then be grouped with other hot data.

Can you demonstrate a difference with this change?  Not saying it's bad,
but it's always good to have numbers.

> A nice side-effect of this patch is making container_of() unnecessary,
> which makes the code a bit simpler.

You really need to get used to that construct.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] clocksource: Store reg field within struct clocksource

2015-11-18 Thread Måns Rullgård
Marc Gonzalez <marc_gonza...@sigmadesigns.com> writes:

> Since 'struct clocksource' is cacheline_aligned, gcc must insert
> a lot of padding between reg and clksrc in 'struct clocksource_mmio'
> (for example, L1_CACHE_BYTES = 64 on ARMv7).
>
> Storing reg within 'struct clocksource' removes unnecessary padding,
> and reg can then be grouped with other hot data.

Can you demonstrate a difference with this change?  Not saying it's bad,
but it's always good to have numbers.

> A nice side-effect of this patch is making container_of() unnecessary,
> which makes the code a bit simpler.

You really need to get used to that construct.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] clocksource/drivers/tango-xtal: Replace code by clocksource_mmio_init

2015-11-17 Thread Måns Rullgård
Marc Gonzalez  writes:

> On 17/11/2015 13:22, Daniel Lezcano wrote:
>> On 11/13/2015 01:20 PM, Marc Gonzalez wrote:
>>> On 13/11/2015 11:58, Daniel Lezcano wrote:
>>>
>>>> The current code to initialize, register and read the clocksource is
>>>> already factored out in mmio.c via the clocksource_mmio_init function.
>>>>
>>>> Factor out the code with the clocksource_mmio_init function.
>>>
>>> The reason I didn't like clocksource_mmio_init() is because it exports
>>> 4 generic accessors.
>>>
>>> I guess this function makes more sense when all platforms are using it,
>>> in an ARCH_MULTIPLATFORM kernel. (Also the accessors are probably quite
>>> small, so the waste is probably minimal.)
>> 
>> Hi Marc,
>> 
>> it is not clear for me if you agree with this patch or not. Can you 
>> clarify ?
>
> [ Adding rmk as the original mmio.c author ]
>
> Hello Daniel,
>
> It's hard to give a straight answer. From my limited perspective (building
> a kernel for Sigma boards only) I feel that mmio.c brings nothing. (As you
> mentioned, it would be interesting to measure whether having reg_base in
> the ctx, rather than as a global is better or worse for perf, though the
> actual difference may be lost in the noise.)
>
> On the other hand, I can see how a different perspective, such as yours,
> may see benefits in having all drivers use the same APIs; and there may
> be other savings for ARCH_MULTIPLATFORM builds with lots of platforms.

I think the patch is good.  If there are issues with clocksource_mmio
those should addressed separately.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] usb: chipidea: removing of_find_property

2015-11-17 Thread Måns Rullgård
Saurabh Sengar  writes:

> call to of_find_property() before of_property_read_u32() is unnecessary.
> of_property_read_u32() anyway calls to of_find_property() only.
>
> Signed-off-by: Saurabh Sengar 
> ---
> v2: removed pval variable
>  drivers/usb/chipidea/core.c | 61 
> +++--
>  1 file changed, 26 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
> index 965d0e2..916a20d 100644
> --- a/drivers/usb/chipidea/core.c
> +++ b/drivers/usb/chipidea/core.c
> @@ -688,52 +688,43 @@ static int ci_get_platdata(struct device *dev,
>   if (usb_get_maximum_speed(dev) == USB_SPEED_FULL)
>   platdata->flags |= CI_HDRC_FORCE_FULLSPEED;
>
> - if (of_find_property(dev->of_node, "phy-clkgate-delay-us", NULL))
> - of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> -  >phy_clkgate_delay_us);
> + if (!of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> +  >phy_clkgate_delay_us))
>
>   platdata->itc_setting = 1;

Drop that if().  Since we're ignoring of_property_read_u32() failing,
there is no need to test its return value, and code above incorrectly
makes the next statement conditional.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usb: chipidea: removing of_find_property

2015-11-17 Thread Måns Rullgård
Saurabh Sengar  writes:

> call to of_find_property() before of_property_read_u32() is unnecessary.
> of_property_read_u32() anyway calls to of_find_property() only.
>
> Signed-off-by: Saurabh Sengar 
> ---
>  drivers/usb/chipidea/core.c | 67 
> ++---
>  1 file changed, 32 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
> index 965d0e2..8a4c22c 100644
> --- a/drivers/usb/chipidea/core.c
> +++ b/drivers/usb/chipidea/core.c
> @@ -643,6 +643,7 @@ static int ci_get_platdata(struct device *dev,
>   struct extcon_dev *ext_vbus, *ext_id;
>   struct ci_hdrc_cable *cable;
>   int ret;
> + u32 pval;
>
>   if (!platdata->phy_mode)
>   platdata->phy_mode = of_usb_get_phy_mode(dev->of_node);
> @@ -688,52 +689,48 @@ static int ci_get_platdata(struct device *dev,
>   if (usb_get_maximum_speed(dev) == USB_SPEED_FULL)
>   platdata->flags |= CI_HDRC_FORCE_FULLSPEED;
>
> - if (of_find_property(dev->of_node, "phy-clkgate-delay-us", NULL))
> - of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> -  >phy_clkgate_delay_us);
> + if (!of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> +  ))
> + platdata->phy_clkgate_delay_us = pval;

You don't need to use the pval temporary as of_property_read_u32 only
modifies the destination on success.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] clocksource/drivers/tango-xtal: Replace code by clocksource_mmio_init

2015-11-17 Thread Måns Rullgård
Marc Gonzalez <marc_gonza...@sigmadesigns.com> writes:

> On 17/11/2015 13:22, Daniel Lezcano wrote:
>> On 11/13/2015 01:20 PM, Marc Gonzalez wrote:
>>> On 13/11/2015 11:58, Daniel Lezcano wrote:
>>>
>>>> The current code to initialize, register and read the clocksource is
>>>> already factored out in mmio.c via the clocksource_mmio_init function.
>>>>
>>>> Factor out the code with the clocksource_mmio_init function.
>>>
>>> The reason I didn't like clocksource_mmio_init() is because it exports
>>> 4 generic accessors.
>>>
>>> I guess this function makes more sense when all platforms are using it,
>>> in an ARCH_MULTIPLATFORM kernel. (Also the accessors are probably quite
>>> small, so the waste is probably minimal.)
>> 
>> Hi Marc,
>> 
>> it is not clear for me if you agree with this patch or not. Can you 
>> clarify ?
>
> [ Adding rmk as the original mmio.c author ]
>
> Hello Daniel,
>
> It's hard to give a straight answer. From my limited perspective (building
> a kernel for Sigma boards only) I feel that mmio.c brings nothing. (As you
> mentioned, it would be interesting to measure whether having reg_base in
> the ctx, rather than as a global is better or worse for perf, though the
> actual difference may be lost in the noise.)
>
> On the other hand, I can see how a different perspective, such as yours,
> may see benefits in having all drivers use the same APIs; and there may
> be other savings for ARCH_MULTIPLATFORM builds with lots of platforms.

I think the patch is good.  If there are issues with clocksource_mmio
those should addressed separately.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] usb: chipidea: removing of_find_property

2015-11-17 Thread Måns Rullgård
Saurabh Sengar <saurabh.tr...@gmail.com> writes:

> call to of_find_property() before of_property_read_u32() is unnecessary.
> of_property_read_u32() anyway calls to of_find_property() only.
>
> Signed-off-by: Saurabh Sengar <saurabh.tr...@gmail.com>
> ---
>  drivers/usb/chipidea/core.c | 67 
> ++---
>  1 file changed, 32 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
> index 965d0e2..8a4c22c 100644
> --- a/drivers/usb/chipidea/core.c
> +++ b/drivers/usb/chipidea/core.c
> @@ -643,6 +643,7 @@ static int ci_get_platdata(struct device *dev,
>   struct extcon_dev *ext_vbus, *ext_id;
>   struct ci_hdrc_cable *cable;
>   int ret;
> + u32 pval;
>
>   if (!platdata->phy_mode)
>   platdata->phy_mode = of_usb_get_phy_mode(dev->of_node);
> @@ -688,52 +689,48 @@ static int ci_get_platdata(struct device *dev,
>   if (usb_get_maximum_speed(dev) == USB_SPEED_FULL)
>   platdata->flags |= CI_HDRC_FORCE_FULLSPEED;
>
> - if (of_find_property(dev->of_node, "phy-clkgate-delay-us", NULL))
> - of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> -  >phy_clkgate_delay_us);
> + if (!of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> +  ))
> +     platdata->phy_clkgate_delay_us = pval;

You don't need to use the pval temporary as of_property_read_u32 only
modifies the destination on success.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] usb: chipidea: removing of_find_property

2015-11-17 Thread Måns Rullgård
Saurabh Sengar <saurabh.tr...@gmail.com> writes:

> call to of_find_property() before of_property_read_u32() is unnecessary.
> of_property_read_u32() anyway calls to of_find_property() only.
>
> Signed-off-by: Saurabh Sengar <saurabh.tr...@gmail.com>
> ---
> v2: removed pval variable
>  drivers/usb/chipidea/core.c | 61 
> +++--
>  1 file changed, 26 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
> index 965d0e2..916a20d 100644
> --- a/drivers/usb/chipidea/core.c
> +++ b/drivers/usb/chipidea/core.c
> @@ -688,52 +688,43 @@ static int ci_get_platdata(struct device *dev,
>   if (usb_get_maximum_speed(dev) == USB_SPEED_FULL)
>   platdata->flags |= CI_HDRC_FORCE_FULLSPEED;
>
> - if (of_find_property(dev->of_node, "phy-clkgate-delay-us", NULL))
> - of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> -  >phy_clkgate_delay_us);
> + if (!of_property_read_u32(dev->of_node, "phy-clkgate-delay-us",
> +  >phy_clkgate_delay_us))
>
>   platdata->itc_setting = 1;

Drop that if().  Since we're ignoring of_property_read_u32() failing,
there is no need to test its return value, and code above incorrectly
makes the next statement conditional.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-16 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Mon, 16 Nov 2015 20:59:18 +
>
>> Anything else?
>
> Sorry, when I find one problem I give you the feedback for that
> and move on to the 100s of other patches I have in my queue.

Some people prefer to comment on more than one issue, if present, per
patch iteration.  Apparently you're not one of them.

> Or would you like me to devote all of my time to just your driver
> instead of taking everyone's submissions into consideration?

Of course not.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-16 Thread Måns Rullgård
David Miller  writes:

> From: Mans Rullgard 
> Date: Mon, 16 Nov 2015 18:23:35 +
>
>> +static int nb8800_alloc_rx(struct net_device *dev, int i, bool napi)
>
> "i" is passed in as a signed int here, but:
>
>> +static void nb8800_receive(struct net_device *dev, unsigned i, unsigned len)
>  ...
>> +err = nb8800_alloc_rx(dev, i, true);
>
> It comes from an 'unsigned' value.
>
> Please pick one type and use it consistently.

Darn, missed one.

> Also, always fully spell out "unsigned int" rather than use "unsigned"
> as a shorthand.

OK

Anything else?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-16 Thread Måns Rullgård
David Miller <da...@davemloft.net> writes:

> From: Måns Rullgård <m...@mansr.com>
> Date: Mon, 16 Nov 2015 20:59:18 +
>
>> Anything else?
>
> Sorry, when I find one problem I give you the feedback for that
> and move on to the 100s of other patches I have in my queue.

Some people prefer to comment on more than one issue, if present, per
patch iteration.  Apparently you're not one of them.

> Or would you like me to devote all of my time to just your driver
> instead of taking everyone's submissions into consideration?

Of course not.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-16 Thread Måns Rullgård
David Miller <da...@davemloft.net> writes:

> From: Mans Rullgard <m...@mansr.com>
> Date: Mon, 16 Nov 2015 18:23:35 +
>
>> +static int nb8800_alloc_rx(struct net_device *dev, int i, bool napi)
>
> "i" is passed in as a signed int here, but:
>
>> +static void nb8800_receive(struct net_device *dev, unsigned i, unsigned len)
>  ...
>> +err = nb8800_alloc_rx(dev, i, true);
>
> It comes from an 'unsigned' value.
>
> Please pick one type and use it consistently.

Darn, missed one.

> Also, always fully spell out "unsigned int" rather than use "unsigned"
> as a shorthand.

OK

Anything else?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESEND][PATCH] watchdog: add support for Sigma Designs SMP86xx

2015-11-13 Thread Måns Rullgård
Guenter Roeck  writes:

>>>> +config TANGOX_WDT
>>>> +  tristate "SMP86xx watchdog"
>>>> +  select WATCHDOG_CORE
>>>> +  depends on ARCH_TANGOX
>>>> +  help
>>>> +Watchdog driver for Sigma Designs SMP86xx.
>>>
>>> Not really; it is for SMP8642, and we don't know if other (later ?) chips
>>> will be supported by the same driver. You should be explicit here. More 
>>> chips
>>> can be added later (that would be needed for the devicetree bindings anyway)
>>> as they are tested.
>>
>> I have tested it on SMP8642 and SMP8759.  The documentation for SMP8654
>> agrees.
>>
>
> We should have that information somewhere - maybe in the driver header.
> It is very useful to know which hardware this was tested with and which
> hardware is supposed to work.

OK, I'll add that info somewhere.

>>>> +static int tangox_wdt_set_timeout(struct watchdog_device *wdt,
>>>> +unsigned int new_timeout)
>>>> +{
>>>> +  struct tangox_wdt_device *dev = watchdog_get_drvdata(wdt);
>>>> +
>>>> +  wdt->timeout = new_timeout;
>>>> +  dev->timeout = 1 + new_timeout * clk_get_rate(dev->clk);
>>>
>>> Why "1 +" ?
>>
>> The counter counts down from the loaded value and asserts the reset pin
>> when it reaches 1.  Setting it to zero disables the watchdog.
>
> You might want to explain that somewhere. Maybe use a define, explain
> it there, and use the define here and below.

Will do.

>>>> +static const struct of_device_id tangox_wdt_dt_ids[] = {
>>>> +  { .compatible = "sigma,smp8642-wdt" },
>>>
>>> So this is really for smp8642 only, not for any other chips in the series ?
>>
>> It's for about a dozen SMP86xx, SMP87xx, and SMP89xx chips.  Should I
>> list them all?  I don't even know where to find a comprehensive list of
>> device numbers.
>>
> I thought so, but I am not a devicetree expert, and I see some "xx" in
> existing devicetree bindings. Something to ask when you submit the
> bindings to the devicetree mailing list. Either case, I think it would
> be either something like "sigma,smp86xx-wdt" or a list of all of them,
> but not "sigma,smp8642-wdt" to be used for all chips.

The general advice is to not use wildcards in DT bindings since the next
chip matching the pattern might not be compatible at all.  New chips
known to be compatible with an old one can specify both the exact chip
and the older one such that existing drivers will work automatically.
If at some point they are found not to be compatible after all (hardware
bugs, perhaps) a fixed driver will work with existing device trees.

Thanks for the review.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESEND][PATCH] watchdog: add support for Sigma Designs SMP86xx

2015-11-13 Thread Måns Rullgård
Guenter Roeck  writes:

> On 11/13/2015 05:14 AM, Mans Rullgard wrote:
>> This adds support for the Sigma Designs SMP86xx family built-in
>> watchdog.
>>
>> Signed-off-by: Mans Rullgard 
>> ---
>>   drivers/watchdog/Kconfig  |   7 ++
>>   drivers/watchdog/Makefile |   1 +
>>   drivers/watchdog/tangox_wdt.c | 185 
>> ++
>
> Why tangox_wdt instead of smp86xx_wdt.c ?
>
> tangox also implies that this would (should) work for SMP87xx as well,
> about which no statement is made. So why not tango3_wdt ?
>
> [ ok, I see all drivers are named tangox, so if the other maintainers
>   are ok with that, so am I. ]
>
> Is it known if the driver will work for any of the other chips of the
> series (SMP86XX/SMP87XX) ?

It does work on SMP87xx (tango4) as well.  I wrote the driver before I
had any such hardware, then forgot to update the help text and commit
message.

> I think it would be helpful to describe in more detail which chips
> are supported, or at least which chips should work but are untested.
>
>>   3 files changed, 193 insertions(+)
>>   create mode 100644 drivers/watchdog/tangox_wdt.c
>>
>> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
>> index 79e1aa1..0ed5ee8 100644
>> --- a/drivers/watchdog/Kconfig
>> +++ b/drivers/watchdog/Kconfig
>> @@ -1337,6 +1337,13 @@ config RALINK_WDT
>>  help
>>Hardware driver for the Ralink SoC Watchdog Timer.
>>
>> +config TANGOX_WDT
>> +tristate "SMP86xx watchdog"
>> +select WATCHDOG_CORE
>> +depends on ARCH_TANGOX
>> +help
>> +  Watchdog driver for Sigma Designs SMP86xx.
>
> Not really; it is for SMP8642, and we don't know if other (later ?) chips
> will be supported by the same driver. You should be explicit here. More chips
> can be added later (that would be needed for the devicetree bindings anyway)
> as they are tested.

I have tested it on SMP8642 and SMP8759.  The documentation for SMP8654
agrees.

>> +static int tangox_wdt_set_timeout(struct watchdog_device *wdt,
>> +  unsigned int new_timeout)
>> +{
>> +struct tangox_wdt_device *dev = watchdog_get_drvdata(wdt);
>> +
>> +wdt->timeout = new_timeout;
>> +dev->timeout = 1 + new_timeout * clk_get_rate(dev->clk);
>
> Why "1 +" ?

The counter counts down from the loaded value and asserts the reset pin
when it reaches 1.  Setting it to zero disables the watchdog.

>> +static int tangox_wdt_restart(struct notifier_block *nb, unsigned long 
>> action,
>> +  void *data)
>> +{
>> +struct tangox_wdt_device *dev =
>> +container_of(nb, struct tangox_wdt_device, restart);
>> +
>> +writel(1, dev->base + WD_COUNTER);
>> +
>
> A comment might be useful here, explaining what this does (reset after 
> minimum timeout ?).
> Also, the code should wait a bit to ensure that the reset 'catches'
> before the function returns.

Writing 1 to the counter asserts the reset immediately.

>> +static const struct of_device_id tangox_wdt_dt_ids[] = {
>> +{ .compatible = "sigma,smp8642-wdt" },
>
> So this is really for smp8642 only, not for any other chips in the series ?

It's for about a dozen SMP86xx, SMP87xx, and SMP89xx chips.  Should I
list them all?  I don't even know where to find a comprehensive list of
device numbers.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: vitesse: add support for VSC8601

2015-11-13 Thread Måns Rullgård
Mason  writes:

> On 12/11/2015 19:41, Mans Rullgard wrote:
>
>> +.phy_id = PHY_ID_VSC8601,
>> +.name   = "Vitesse VSC8601",
>> +.phy_id_mask= 0x0000,
>> +.features   = PHY_GBIT_FEATURES,
>> +.flags  = PHY_HAS_INTERRUPT,
>> +.config_init= _config_init,
>> +.config_aneg= _config_aneg,
>> +.read_status= _read_status,
>> +.ack_interrupt  = _ack_interrupt,
>> +.config_intr= _config_intr,
>
> I expected Documentation/CodingStyle to forbid taking the address
> of functions.

I can't find anything to that effect.  That said, it's not something I
would normally do, but all the other phy_driver entries in that file
look like that.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: vitesse: add support for VSC8601

2015-11-13 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> On 12/11/2015 19:41, Mans Rullgard wrote:
>
>> +.phy_id = PHY_ID_VSC8601,
>> +.name   = "Vitesse VSC8601",
>> +.phy_id_mask= 0x0000,
>> +.features   = PHY_GBIT_FEATURES,
>> +.flags  = PHY_HAS_INTERRUPT,
>> +.config_init= _config_init,
>> +.config_aneg= _config_aneg,
>> +.read_status= _read_status,
>> +.ack_interrupt  = _ack_interrupt,
>> +.config_intr= _config_intr,
>
> I expected Documentation/CodingStyle to forbid taking the address
> of functions.

I can't find anything to that effect.  That said, it's not something I
would normally do, but all the other phy_driver entries in that file
look like that.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESEND][PATCH] watchdog: add support for Sigma Designs SMP86xx

2015-11-13 Thread Måns Rullgård
Guenter Roeck <li...@roeck-us.net> writes:

> On 11/13/2015 05:14 AM, Mans Rullgard wrote:
>> This adds support for the Sigma Designs SMP86xx family built-in
>> watchdog.
>>
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>> ---
>>   drivers/watchdog/Kconfig  |   7 ++
>>   drivers/watchdog/Makefile |   1 +
>>   drivers/watchdog/tangox_wdt.c | 185 
>> ++
>
> Why tangox_wdt instead of smp86xx_wdt.c ?
>
> tangox also implies that this would (should) work for SMP87xx as well,
> about which no statement is made. So why not tango3_wdt ?
>
> [ ok, I see all drivers are named tangox, so if the other maintainers
>   are ok with that, so am I. ]
>
> Is it known if the driver will work for any of the other chips of the
> series (SMP86XX/SMP87XX) ?

It does work on SMP87xx (tango4) as well.  I wrote the driver before I
had any such hardware, then forgot to update the help text and commit
message.

> I think it would be helpful to describe in more detail which chips
> are supported, or at least which chips should work but are untested.
>
>>   3 files changed, 193 insertions(+)
>>   create mode 100644 drivers/watchdog/tangox_wdt.c
>>
>> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
>> index 79e1aa1..0ed5ee8 100644
>> --- a/drivers/watchdog/Kconfig
>> +++ b/drivers/watchdog/Kconfig
>> @@ -1337,6 +1337,13 @@ config RALINK_WDT
>>  help
>>Hardware driver for the Ralink SoC Watchdog Timer.
>>
>> +config TANGOX_WDT
>> +tristate "SMP86xx watchdog"
>> +select WATCHDOG_CORE
>> +depends on ARCH_TANGOX
>> +help
>> +  Watchdog driver for Sigma Designs SMP86xx.
>
> Not really; it is for SMP8642, and we don't know if other (later ?) chips
> will be supported by the same driver. You should be explicit here. More chips
> can be added later (that would be needed for the devicetree bindings anyway)
> as they are tested.

I have tested it on SMP8642 and SMP8759.  The documentation for SMP8654
agrees.

>> +static int tangox_wdt_set_timeout(struct watchdog_device *wdt,
>> +  unsigned int new_timeout)
>> +{
>> +struct tangox_wdt_device *dev = watchdog_get_drvdata(wdt);
>> +
>> +wdt->timeout = new_timeout;
>> +dev->timeout = 1 + new_timeout * clk_get_rate(dev->clk);
>
> Why "1 +" ?

The counter counts down from the loaded value and asserts the reset pin
when it reaches 1.  Setting it to zero disables the watchdog.

>> +static int tangox_wdt_restart(struct notifier_block *nb, unsigned long 
>> action,
>> +  void *data)
>> +{
>> +struct tangox_wdt_device *dev =
>> +container_of(nb, struct tangox_wdt_device, restart);
>> +
>> +writel(1, dev->base + WD_COUNTER);
>> +
>
> A comment might be useful here, explaining what this does (reset after 
> minimum timeout ?).
> Also, the code should wait a bit to ensure that the reset 'catches'
> before the function returns.

Writing 1 to the counter asserts the reset immediately.

>> +static const struct of_device_id tangox_wdt_dt_ids[] = {
>> +{ .compatible = "sigma,smp8642-wdt" },
>
> So this is really for smp8642 only, not for any other chips in the series ?

It's for about a dozen SMP86xx, SMP87xx, and SMP89xx chips.  Should I
list them all?  I don't even know where to find a comprehensive list of
device numbers.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RESEND][PATCH] watchdog: add support for Sigma Designs SMP86xx

2015-11-13 Thread Måns Rullgård
Guenter Roeck <li...@roeck-us.net> writes:

>>>> +config TANGOX_WDT
>>>> +  tristate "SMP86xx watchdog"
>>>> +  select WATCHDOG_CORE
>>>> +  depends on ARCH_TANGOX
>>>> +  help
>>>> +Watchdog driver for Sigma Designs SMP86xx.
>>>
>>> Not really; it is for SMP8642, and we don't know if other (later ?) chips
>>> will be supported by the same driver. You should be explicit here. More 
>>> chips
>>> can be added later (that would be needed for the devicetree bindings anyway)
>>> as they are tested.
>>
>> I have tested it on SMP8642 and SMP8759.  The documentation for SMP8654
>> agrees.
>>
>
> We should have that information somewhere - maybe in the driver header.
> It is very useful to know which hardware this was tested with and which
> hardware is supposed to work.

OK, I'll add that info somewhere.

>>>> +static int tangox_wdt_set_timeout(struct watchdog_device *wdt,
>>>> +unsigned int new_timeout)
>>>> +{
>>>> +  struct tangox_wdt_device *dev = watchdog_get_drvdata(wdt);
>>>> +
>>>> +  wdt->timeout = new_timeout;
>>>> +  dev->timeout = 1 + new_timeout * clk_get_rate(dev->clk);
>>>
>>> Why "1 +" ?
>>
>> The counter counts down from the loaded value and asserts the reset pin
>> when it reaches 1.  Setting it to zero disables the watchdog.
>
> You might want to explain that somewhere. Maybe use a define, explain
> it there, and use the define here and below.

Will do.

>>>> +static const struct of_device_id tangox_wdt_dt_ids[] = {
>>>> +  { .compatible = "sigma,smp8642-wdt" },
>>>
>>> So this is really for smp8642 only, not for any other chips in the series ?
>>
>> It's for about a dozen SMP86xx, SMP87xx, and SMP89xx chips.  Should I
>> list them all?  I don't even know where to find a comprehensive list of
>> device numbers.
>>
> I thought so, but I am not a devicetree expert, and I see some "xx" in
> existing devicetree bindings. Something to ask when you submit the
> bindings to the devicetree mailing list. Either case, I think it would
> be either something like "sigma,smp86xx-wdt" or a list of all of them,
> but not "sigma,smp8642-wdt" to be used for all chips.

The general advice is to not use wildcards in DT bindings since the next
chip matching the pattern might not be compatible at all.  New chips
known to be compatible with an old one can specify both the exact chip
and the older one such that existing drivers will work automatically.
If at some point they are found not to be compatible after all (hardware
bugs, perhaps) a fixed driver will work with existing device trees.

Thanks for the review.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: at803x: support interrupt on 8030 and 8035

2015-11-12 Thread Måns Rullgård
Mason  writes:

> On 12/11/2015 20:14, Florian Fainelli wrote:
>> On 12/11/15 11:09, Måns Rullgård wrote:
>>> On 12 November 2015 19:06:23 GMT+00:00, Mason wrote:
>>>> On 12/11/2015 18:40, Mans Rullgard wrote:
>>>>> Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode"
>>>>> added interrupt support for the 8031 PHY but left out the other two
>>>>> chips supported by this driver.
>>>>>
>>>>> This patch sets the .ack_interrupt and .config_intr functions for the
>>>>> 8030 and 8035 drivers as well.
>>>>>
>>>>> Signed-off-by: Mans Rullgard 
>>>>> ---
>>>>> I have only tested this with an 8035.  I can't find a datasheet for
>>>>> the 8030, but since 8031, 8032, and 8035 all have the same register
>>>>> layout, there's a good chance 8030 does as well.
>>>>> ---
>>>>>  drivers/net/phy/at803x.c | 4 
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
>>>>> index fabf11d..2d020a3 100644
>>>>> --- a/drivers/net/phy/at803x.c
>>>>> +++ b/drivers/net/phy/at803x.c
>>>>> @@ -308,6 +308,8 @@ static struct phy_driver at803x_driver[] = {
>>>>>   .flags  = PHY_HAS_INTERRUPT,
>>>>>   .config_aneg= genphy_config_aneg,
>>>>>   .read_status= genphy_read_status,
>>>>> + .ack_interrupt  = at803x_ack_interrupt,
>>>>> + .config_intr= at803x_config_intr,
>>>>>   .driver = {
>>>>>   .owner = THIS_MODULE,
>>>>>   },
>>>>> @@ -327,6 +329,8 @@ static struct phy_driver at803x_driver[] = {
>>>>>   .flags  = PHY_HAS_INTERRUPT,
>>>>>   .config_aneg= genphy_config_aneg,
>>>>>   .read_status= genphy_read_status,
>>>>> + .ack_interrupt  = at803x_ack_interrupt,
>>>>> + .config_intr= at803x_config_intr,
>>>>>   .driver     = {
>>>>>   .owner = THIS_MODULE,
>>>>>   },
>>>>
>>>> Shouldn't we take the opportunity to clean up the duplicated register
>>>> definitions? (I'll send an informal patch to spur discussion.)
>>>>
>>>> Regards.
>>>
>>> That can be done independently. Feel free to send a patch.
>> 
>> Agreed, that deserve a separate patch.
>
> Isn't there a problem when at803x_set_wol() sets the AT803X_WOL_ENABLE
> bit, but a DISABLE/ENABLE cycle through at803x_config_intr() will
> discard that bit?

Possibly, but fixing that should be yet another patch.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: at803x: support interrupt on 8030 and 8035

2015-11-12 Thread Måns Rullgård
On 12 November 2015 19:06:23 GMT+00:00, Mason  wrote:
>On 12/11/2015 18:40, Mans Rullgard wrote:
>> Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt
>mode"
>> added interrupt support for the 8031 PHY but left out the other two
>> chips supported by this driver.
>> 
>> This patch sets the .ack_interrupt and .config_intr functions for the
>> 8030 and 8035 drivers as well.
>> 
>> Signed-off-by: Mans Rullgard 
>> ---
>> I have only tested this with an 8035.  I can't find a datasheet for
>> the 8030, but since 8031, 8032, and 8035 all have the same register
>> layout, there's a good chance 8030 does as well.
>> ---
>>  drivers/net/phy/at803x.c | 4 
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
>> index fabf11d..2d020a3 100644
>> --- a/drivers/net/phy/at803x.c
>> +++ b/drivers/net/phy/at803x.c
>> @@ -308,6 +308,8 @@ static struct phy_driver at803x_driver[] = {
>>  .flags  = PHY_HAS_INTERRUPT,
>>  .config_aneg= genphy_config_aneg,
>>  .read_status= genphy_read_status,
>> +.ack_interrupt  = at803x_ack_interrupt,
>> +.config_intr= at803x_config_intr,
>>  .driver = {
>>  .owner = THIS_MODULE,
>>  },
>> @@ -327,6 +329,8 @@ static struct phy_driver at803x_driver[] = {
>>  .flags  = PHY_HAS_INTERRUPT,
>>  .config_aneg= genphy_config_aneg,
>>  .read_status= genphy_read_status,
>> +.ack_interrupt  = at803x_ack_interrupt,
>> +.config_intr= at803x_config_intr,
>>  .driver = {
>>  .owner = THIS_MODULE,
>>  },
>
>Shouldn't we take the opportunity to clean up the duplicated register
>definitions? (I'll send an informal patch to spur discussion.)
>
>Regards.

That can be done independently. Feel free to send a patch.
-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
David Daney  writes:

> On 11/12/2015 04:31 AM, Peter Zijlstra wrote:
>> Hi
>>
>> I think the MIPS arch_spin_unlock() is borken.
>>
>> spin_unlock() must have RELEASE semantics, these require that no LOADs
>> nor STOREs leak out from the critical section.
>>
>>  From what I know MIPS has a relaxed memory model which allows reads to
>> pass stores, and as implemented arch_spin_unlock() only issues a wmb
>> which doesn't order prior reads vs later stores.
>>
>> Therefore upgrade the wmb() to smp_mb().
>>
>> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)
>
> asm/spinlock.h is only used for !CONFIG_SMP.  So, smp_wmb() would
> imply that special handling for non-SMP is needed, when this is
> already only used for the SMP build case.
>
>>
>> Maybe-Signed-off-by: Peter Zijlstra (Intel) 
>> ---
>> diff --git a/arch/mips/include/asm/spinlock.h 
>> b/arch/mips/include/asm/spinlock.h
>> index 40196bebe849..b2ca13f06152 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>>   static inline void arch_spin_unlock(arch_spinlock_t *lock)
>>   {
>>  unsigned int serving_now = lock->h.serving_now + 1;
>> -wmb();
>> +smp_mb();
>
> That is too heavy.
>
> It implies a full MIPS "SYNC" operation which stalls execution until
> all previous writes are committed and globally visible.
>
> We really want just release semantics, and there is no standard named
> primitive that gives us that.
>
> For CONFIG_CPU_CAVIUM_OCTEON the proper thing would be:
>
> smp_wmb();
> smp_rmb();
>
> Which expands to exactly the same thing as wmb() because smp_rmb()
> expands to nothing.
>
> For CPUs that have out-of-order loads, smp_rmb() should expand to
> something lighter weight than "SYNC"
>
> Certainly we can load up the code with "SYNC" all over the place, but
> it will kill performance on SMP systems.  So, my vote would be to make
> it as light weight as possible, but no lighter.  That will mean
> inventing the proper barrier primitives.

It seems to me that the proper barrier here is a "SYNC 18" aka
SYNC_RELEASE instruction, at least on CPUs that implement that variant.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Måns Rullgård  writes:

> Mason  writes:
>
>> [ CCing a few knowledgeable people ]
>>
>> Despite the subject, this is about an Atheros 8035 PHY :-)
>>
>> On 12/11/2015 15:04, Måns Rullgård wrote:
>>
>>> Mason wrote:
>>> 
>>>> BTW, you're not using the PHY IRQ, right? I think I remember you saying
>>>> it didn't work reliably?
>>> 
>>> It doesn't seem to be wired up on any of my boards, or there's some
>>> magic required to activate it that I'm unaware of.
>>
>> Weird. The board schematics for the 1172 show Tango ETH0_MDINT# pin
>> properly connected to AR8035 INT pin (pin 20).
>
> I have a different board.
>
>> 
>>
>> http://www.redeszone.net/app/uploads/2014/04/AR8035.pdf
>>
>> INT pin 20
>> I/O, D, PD
>> Interrupt Signal to System; default OD-gate, needs an external
>> 10Kohm pull-up, active low; can be configured to I/O by register,
>> active high.
>>
>> 4.1.17 Interrupt Enable
>> Offset: 0x12
>> Mode: Read/Write
>> Hardware Reset: 0
>>
>> Strange... it looks like AT803X_INER and AT803X_INTR_ENABLE refer to
>> the same "Interrupt Enable" register?
>
> Seems like someone missed that it was already defined.
>
>> In fact, AT803X_INER_INIT == 0xec00 makes sense for register 0x12:
>> link success/fail, speed/duplex changed, autoneg error
>>
>> Looks like at803x_config_intr() is used for 8031, but not for 8035...
>>
>> Relevant commit:
>> 77a9939426f7a "phy/at8031: enable at8031 to work on interrupt mode"
>>
>> If I add .config_intr and .ack_interrupt to the 8035 struct, then I get
>> (also added some traces)
>
> I tried that just now, and I get nothing.  What interrupt did you
> specify in your device tree?

It works with the interrupt set to trigger on rising edge.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Mason  writes:

> [ CCing a few knowledgeable people ]
>
> Despite the subject, this is about an Atheros 8035 PHY :-)
>
> On 12/11/2015 15:04, Måns Rullgård wrote:
>
>> Mason wrote:
>> 
>>> BTW, you're not using the PHY IRQ, right? I think I remember you saying
>>> it didn't work reliably?
>> 
>> It doesn't seem to be wired up on any of my boards, or there's some
>> magic required to activate it that I'm unaware of.
>
> Weird. The board schematics for the 1172 show Tango ETH0_MDINT# pin
> properly connected to AR8035 INT pin (pin 20).

I have a different board.

> 
>
> http://www.redeszone.net/app/uploads/2014/04/AR8035.pdf
>
> INT pin 20
> I/O, D, PD
> Interrupt Signal to System; default OD-gate, needs an external
> 10Kohm pull-up, active low; can be configured to I/O by register,
> active high.
>
> 4.1.17 Interrupt Enable
> Offset: 0x12
> Mode: Read/Write
> Hardware Reset: 0
>
> Strange... it looks like AT803X_INER and AT803X_INTR_ENABLE refer to
> the same "Interrupt Enable" register?

Seems like someone missed that it was already defined.

> In fact, AT803X_INER_INIT == 0xec00 makes sense for register 0x12:
> link success/fail, speed/duplex changed, autoneg error
>
> Looks like at803x_config_intr() is used for 8031, but not for 8035...
>
> Relevant commit:
> 77a9939426f7a "phy/at8031: enable at8031 to work on interrupt mode"
>
> If I add .config_intr and .ack_interrupt to the 8035 struct, then I get
> (also added some traces)

I tried that just now, and I get nothing.  What interrupt did you
specify in your device tree?

> Questions:
>
> Can't at803x_ack_interrupt() just return phy_read(phydev, AT803X_INSR);

No, that would return the actual value of the register.  The caller
doesn't care about the value, but should be notified if there was an
error.

> Can at803x_config_intr() be used with the 8035

Probably.  The person who sent the patch for 8031 probably happened to
have that model.

> What about AT803X_INER/AT803X_INTR_ENABLE and AT803X_INSR/AT803X_INTR_STATUS

Accidental duplicates.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
"Paul E. McKenney"  writes:

> On Thu, Nov 12, 2015 at 01:31:23PM +0100, Peter Zijlstra wrote:
>> Hi
>> 
>> I think the MIPS arch_spin_unlock() is borken.
>> 
>> spin_unlock() must have RELEASE semantics, these require that no LOADs
>> nor STOREs leak out from the critical section.
>> 
>> >From what I know MIPS has a relaxed memory model which allows reads to
>> pass stores, and as implemented arch_spin_unlock() only issues a wmb
>> which doesn't order prior reads vs later stores.
>> 
>> Therefore upgrade the wmb() to smp_mb().
>> 
>> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)
>
> One guess is that they want to order I/O accesses within the critical
> section?

Isn't that what mmiowb() is for?

>> Maybe-Signed-off-by: Peter Zijlstra (Intel) 
>> ---
>> diff --git a/arch/mips/include/asm/spinlock.h 
>> b/arch/mips/include/asm/spinlock.h
>> index 40196bebe849..b2ca13f06152 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>>  {
>>  unsigned int serving_now = lock->h.serving_now + 1;
>> -wmb();
>> +smp_mb();
>>  lock->h.serving_now = (u16)serving_now;
>>  nudge_writes();
>>  }
>> 
>

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Mason  writes:

> On 10/11/2015 20:25, Måns Rullgård wrote:
>
>> Mason writes:
>> 
>>> On 10/11/2015 17:14, Mans Rullgard wrote:
>>>
>>>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>>>> It is an almost complete rewrite of a driver originally found in
>>>> a Sigma Designs 2.6.22 tree.
>>>>
>>>> Signed-off-by: Mans Rullgard 
>>>> ---
>>>> Changes:
>>>> - Refactored mdio access functions
>>>> - Refactored register access helpers
>>>> - Improved error handling in rx buffer allocation
>>>> - Optimised some fifo parameters
>>>> - Overhauled tx dma. Multiple packets are now chained in a single dma
>>>>   operation if xmit_more is set, improving performance.
>>>> - Improved rx irq handling. It's not possible to disable interrupts
>>>>   entirely for napi poll, but they can be slowed down a little.
>>>> - Use readx_poll_timeout in various places
>>>> - Improved error detection
>>>> - Improved statistics
>>>> - Report hardware statistics counters through ethtool
>>>> - Improved tangox-specific setup
>>>> - Support for flow control using pause frames
>>>> - Explanatory comments added
>>>> - Various minor stylistic changes
>>>> ---
>>>>  drivers/net/ethernet/Kconfig |1 +
>>>>  drivers/net/ethernet/Makefile|1 +
>>>>  drivers/net/ethernet/aurora/Kconfig  |   20 +
>>>>  drivers/net/ethernet/aurora/Makefile |1 +
>>>>  drivers/net/ethernet/aurora/nb8800.c | 1530 
>>>> ++
>>>>  drivers/net/ethernet/aurora/nb8800.h |  314 +++
>>>>  6 files changed, 1867 insertions(+)
>>>
>>> The code has grown much since the previous patch, despite some
>>> refactoring. Is this mostly due to ethtool_ops support?
>>>
>>>  drivers/net/ethernet/aurora/nb8800.c | 1146 
>>> ++
>>>  drivers/net/ethernet/aurora/nb8800.h |  230 +++
>> 
>> Some of the increase is from new features, some from improvements, and
>> then there are a bunch of new comments.
>
> Sweet.
>
> With this version, my kernel boots faster than before
> (I had been using a 5 month-old version.)
>
> Before:
>
> [0.613623] tangox-enet 26000.ethernet: SMP86xx internal Ethernet at 
> 0x26000
> [0.623638] libphy: tangox-mii: probed
> [0.686527] tangox-enet 26000.ethernet: PHY: found Atheros 8035 ethernet 
> at 0x4
> [0.697169] tangox-enet 26000.ethernet eth0: MAC address 00:16:e8:02:08:42
> ...
> [1.306360] Sending DHCP requests ..
> [4.699969] tangox-enet 26000.ethernet eth0: Link is Up - 1Gbps/Full - 
> flow control rx/tx
> [8.899671] ., OK
> [8.926343] IP-Config: Got DHCP answer from 172.27.200.1, my address is 
> 172.27.64.49
> ...
> [8.987327] Freeing unused kernel memory: 168K (c039e000 - c03c8000)
>
> After:
>
> [0.623526] libphy: nb8800-mii: probed
> [0.628092] nb8800 26000.ethernet eth0: MAC address 00:16:e8:02:08:42
> ...
> [4.732948] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
> control rx/tx
> [4.752655] Sending DHCP requests ., OK
> [4.782644] IP-Config: Got DHCP answer from 172.27.200.1, my address is 
> 172.27.64.49
> ...
> [4.849298] Freeing unused kernel memory: 164K (c039f000 - c03c8000)
>
> The DHCP request is sent later, but the kernel doesn't twiddle its thumbs
> for 4 seconds after the link comes up. Does this come from not probing the
> PHY anymore?

No, that's from properly setting the link state initially down.

> BTW, you're not using the PHY IRQ, right? I think I remember you saying
> it didn't work reliably?

It doesn't seem to be wired up on any of my boards, or there's some
magic required to activate it that I'm unaware of.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
Peter Zijlstra  writes:

> Hi
>
> I think the MIPS arch_spin_unlock() is borken.
>
> spin_unlock() must have RELEASE semantics, these require that no LOADs
> nor STOREs leak out from the critical section.
>
> From what I know MIPS has a relaxed memory model which allows reads to
> pass stores, and as implemented arch_spin_unlock() only issues a wmb
> which doesn't order prior reads vs later stores.

This is correct.

> Therefore upgrade the wmb() to smp_mb().
>
> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)

Good question.

The current MIPS asm/barrier.h uses a plain SYNC instruction for all
kinds of barriers (except on Cavium Octeon), which is a bit wasteful.
A MIPS implementation can optionally support partial barriers (load,
store, acquire, release) which all behave like a full barrier if not
implemented, so those really ought to be used.

> Maybe-Signed-off-by: Peter Zijlstra (Intel) 
> ---
> diff --git a/arch/mips/include/asm/spinlock.h 
> b/arch/mips/include/asm/spinlock.h
> index 40196bebe849..b2ca13f06152 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>   unsigned int serving_now = lock->h.serving_now + 1;
> - wmb();
> + smp_mb();
>   lock->h.serving_now = (u16)serving_now;
>   nudge_writes();
>  }

All this weirdness was added in commit 500c2e1f:

MIPS: Optimize spinlocks.

The current locking mechanism uses a ll/sc sequence to release a
spinlock.  This is slower than a wmb() followed by a store to unlock.

The branching forward to .subsection 2 on sc failure slows down the
contended case.  So we get rid of that part too.

Since we are now working on naturally aligned u16 values, we can get
rid of a masking operation as the LHU already does the right thing.
The ANDI are reversed for better scheduling on multi-issue CPUs

On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

Signed-off-by: David Daney 
To: linux-m...@linux-mips.org
    Patchwork: http://patchwork.linux-mips.org/patch/937/
Signed-off-by: Ralf Baechle 

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: at803x: support interrupt on 8030 and 8035

2015-11-12 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> On 12/11/2015 20:14, Florian Fainelli wrote:
>> On 12/11/15 11:09, Måns Rullgård wrote:
>>> On 12 November 2015 19:06:23 GMT+00:00, Mason wrote:
>>>> On 12/11/2015 18:40, Mans Rullgard wrote:
>>>>> Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode"
>>>>> added interrupt support for the 8031 PHY but left out the other two
>>>>> chips supported by this driver.
>>>>>
>>>>> This patch sets the .ack_interrupt and .config_intr functions for the
>>>>> 8030 and 8035 drivers as well.
>>>>>
>>>>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>>>>> ---
>>>>> I have only tested this with an 8035.  I can't find a datasheet for
>>>>> the 8030, but since 8031, 8032, and 8035 all have the same register
>>>>> layout, there's a good chance 8030 does as well.
>>>>> ---
>>>>>  drivers/net/phy/at803x.c | 4 
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
>>>>> index fabf11d..2d020a3 100644
>>>>> --- a/drivers/net/phy/at803x.c
>>>>> +++ b/drivers/net/phy/at803x.c
>>>>> @@ -308,6 +308,8 @@ static struct phy_driver at803x_driver[] = {
>>>>>   .flags  = PHY_HAS_INTERRUPT,
>>>>>   .config_aneg= genphy_config_aneg,
>>>>>   .read_status= genphy_read_status,
>>>>> + .ack_interrupt  = at803x_ack_interrupt,
>>>>> + .config_intr= at803x_config_intr,
>>>>>   .driver = {
>>>>>   .owner = THIS_MODULE,
>>>>>   },
>>>>> @@ -327,6 +329,8 @@ static struct phy_driver at803x_driver[] = {
>>>>>   .flags  = PHY_HAS_INTERRUPT,
>>>>>   .config_aneg= genphy_config_aneg,
>>>>>   .read_status= genphy_read_status,
>>>>> + .ack_interrupt  = at803x_ack_interrupt,
>>>>> + .config_intr= at803x_config_intr,
>>>>>   .driver = {
>>>>>   .owner = THIS_MODULE,
>>>>>   },
>>>>
>>>> Shouldn't we take the opportunity to clean up the duplicated register
>>>> definitions? (I'll send an informal patch to spur discussion.)
>>>>
>>>> Regards.
>>>
>>> That can be done independently. Feel free to send a patch.
>> 
>> Agreed, that deserve a separate patch.
>
> Isn't there a problem when at803x_set_wol() sets the AT803X_WOL_ENABLE
> bit, but a DISABLE/ENABLE cycle through at803x_config_intr() will
> discard that bit?

Possibly, but fixing that should be yet another patch.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> On 10/11/2015 20:25, Måns Rullgård wrote:
>
>> Mason writes:
>> 
>>> On 10/11/2015 17:14, Mans Rullgard wrote:
>>>
>>>> This adds a driver for the Aurora VLSI NB8800 Ethernet controller.
>>>> It is an almost complete rewrite of a driver originally found in
>>>> a Sigma Designs 2.6.22 tree.
>>>>
>>>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>>>> ---
>>>> Changes:
>>>> - Refactored mdio access functions
>>>> - Refactored register access helpers
>>>> - Improved error handling in rx buffer allocation
>>>> - Optimised some fifo parameters
>>>> - Overhauled tx dma. Multiple packets are now chained in a single dma
>>>>   operation if xmit_more is set, improving performance.
>>>> - Improved rx irq handling. It's not possible to disable interrupts
>>>>   entirely for napi poll, but they can be slowed down a little.
>>>> - Use readx_poll_timeout in various places
>>>> - Improved error detection
>>>> - Improved statistics
>>>> - Report hardware statistics counters through ethtool
>>>> - Improved tangox-specific setup
>>>> - Support for flow control using pause frames
>>>> - Explanatory comments added
>>>> - Various minor stylistic changes
>>>> ---
>>>>  drivers/net/ethernet/Kconfig |1 +
>>>>  drivers/net/ethernet/Makefile|1 +
>>>>  drivers/net/ethernet/aurora/Kconfig  |   20 +
>>>>  drivers/net/ethernet/aurora/Makefile |1 +
>>>>  drivers/net/ethernet/aurora/nb8800.c | 1530 
>>>> ++
>>>>  drivers/net/ethernet/aurora/nb8800.h |  314 +++
>>>>  6 files changed, 1867 insertions(+)
>>>
>>> The code has grown much since the previous patch, despite some
>>> refactoring. Is this mostly due to ethtool_ops support?
>>>
>>>  drivers/net/ethernet/aurora/nb8800.c | 1146 
>>> ++
>>>  drivers/net/ethernet/aurora/nb8800.h |  230 +++
>> 
>> Some of the increase is from new features, some from improvements, and
>> then there are a bunch of new comments.
>
> Sweet.
>
> With this version, my kernel boots faster than before
> (I had been using a 5 month-old version.)
>
> Before:
>
> [0.613623] tangox-enet 26000.ethernet: SMP86xx internal Ethernet at 
> 0x26000
> [0.623638] libphy: tangox-mii: probed
> [0.686527] tangox-enet 26000.ethernet: PHY: found Atheros 8035 ethernet 
> at 0x4
> [0.697169] tangox-enet 26000.ethernet eth0: MAC address 00:16:e8:02:08:42
> ...
> [1.306360] Sending DHCP requests ..
> [4.699969] tangox-enet 26000.ethernet eth0: Link is Up - 1Gbps/Full - 
> flow control rx/tx
> [8.899671] ., OK
> [8.926343] IP-Config: Got DHCP answer from 172.27.200.1, my address is 
> 172.27.64.49
> ...
> [8.987327] Freeing unused kernel memory: 168K (c039e000 - c03c8000)
>
> After:
>
> [0.623526] libphy: nb8800-mii: probed
> [0.628092] nb8800 26000.ethernet eth0: MAC address 00:16:e8:02:08:42
> ...
> [4.732948] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
> control rx/tx
> [4.752655] Sending DHCP requests ., OK
> [4.782644] IP-Config: Got DHCP answer from 172.27.200.1, my address is 
> 172.27.64.49
> ...
> [4.849298] Freeing unused kernel memory: 164K (c039f000 - c03c8000)
>
> The DHCP request is sent later, but the kernel doesn't twiddle its thumbs
> for 4 seconds after the link comes up. Does this come from not probing the
> PHY anymore?

No, that's from properly setting the link state initially down.

> BTW, you're not using the PHY IRQ, right? I think I remember you saying
> it didn't work reliably?

It doesn't seem to be wired up on any of my boards, or there's some
magic required to activate it that I'm unaware of.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
"Paul E. McKenney" <paul...@linux.vnet.ibm.com> writes:

> On Thu, Nov 12, 2015 at 01:31:23PM +0100, Peter Zijlstra wrote:
>> Hi
>> 
>> I think the MIPS arch_spin_unlock() is borken.
>> 
>> spin_unlock() must have RELEASE semantics, these require that no LOADs
>> nor STOREs leak out from the critical section.
>> 
>> >From what I know MIPS has a relaxed memory model which allows reads to
>> pass stores, and as implemented arch_spin_unlock() only issues a wmb
>> which doesn't order prior reads vs later stores.
>> 
>> Therefore upgrade the wmb() to smp_mb().
>> 
>> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)
>
> One guess is that they want to order I/O accesses within the critical
> section?

Isn't that what mmiowb() is for?

>> Maybe-Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>> ---
>> diff --git a/arch/mips/include/asm/spinlock.h 
>> b/arch/mips/include/asm/spinlock.h
>> index 40196bebe849..b2ca13f06152 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>>  {
>>  unsigned int serving_now = lock->h.serving_now + 1;
>> -wmb();
>> +smp_mb();
>>  lock->h.serving_now = (u16)serving_now;
>>  nudge_writes();
>>  }
>> 
>

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
Peter Zijlstra <pet...@infradead.org> writes:

> Hi
>
> I think the MIPS arch_spin_unlock() is borken.
>
> spin_unlock() must have RELEASE semantics, these require that no LOADs
> nor STOREs leak out from the critical section.
>
> From what I know MIPS has a relaxed memory model which allows reads to
> pass stores, and as implemented arch_spin_unlock() only issues a wmb
> which doesn't order prior reads vs later stores.

This is correct.

> Therefore upgrade the wmb() to smp_mb().
>
> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)

Good question.

The current MIPS asm/barrier.h uses a plain SYNC instruction for all
kinds of barriers (except on Cavium Octeon), which is a bit wasteful.
A MIPS implementation can optionally support partial barriers (load,
store, acquire, release) which all behave like a full barrier if not
implemented, so those really ought to be used.

> Maybe-Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
> diff --git a/arch/mips/include/asm/spinlock.h 
> b/arch/mips/include/asm/spinlock.h
> index 40196bebe849..b2ca13f06152 100644
> --- a/arch/mips/include/asm/spinlock.h
> +++ b/arch/mips/include/asm/spinlock.h
> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>   unsigned int serving_now = lock->h.serving_now + 1;
> - wmb();
> + smp_mb();
>   lock->h.serving_now = (u16)serving_now;
>   nudge_writes();
>  }

All this weirdness was added in commit 500c2e1f:

MIPS: Optimize spinlocks.

The current locking mechanism uses a ll/sc sequence to release a
spinlock.  This is slower than a wmb() followed by a store to unlock.

The branching forward to .subsection 2 on sc failure slows down the
contended case.  So we get rid of that part too.

Since we are now working on naturally aligned u16 values, we can get
rid of a masking operation as the LHU already does the right thing.
The ANDI are reversed for better scheduling on multi-issue CPUs

On a 12 CPU 750MHz Octeon cn5750 this patch improves ipv4 UDP packet
forwarding rates from 3.58*10^6 PPS to 3.99*10^6 PPS, or about 11%.

Signed-off-by: David Daney <dda...@caviumnetworks.com>
To: linux-m...@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/937/
Signed-off-by: Ralf Baechle <r...@linux-mips.org>

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

2015-11-12 Thread Måns Rullgård
David Daney <dda...@caviumnetworks.com> writes:

> On 11/12/2015 04:31 AM, Peter Zijlstra wrote:
>> Hi
>>
>> I think the MIPS arch_spin_unlock() is borken.
>>
>> spin_unlock() must have RELEASE semantics, these require that no LOADs
>> nor STOREs leak out from the critical section.
>>
>>  From what I know MIPS has a relaxed memory model which allows reads to
>> pass stores, and as implemented arch_spin_unlock() only issues a wmb
>> which doesn't order prior reads vs later stores.
>>
>> Therefore upgrade the wmb() to smp_mb().
>>
>> (Also, why the unconditional wmb, as opposed to smp_wmb() ?)
>
> asm/spinlock.h is only used for !CONFIG_SMP.  So, smp_wmb() would
> imply that special handling for non-SMP is needed, when this is
> already only used for the SMP build case.
>
>>
>> Maybe-Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>> ---
>> diff --git a/arch/mips/include/asm/spinlock.h 
>> b/arch/mips/include/asm/spinlock.h
>> index 40196bebe849..b2ca13f06152 100644
>> --- a/arch/mips/include/asm/spinlock.h
>> +++ b/arch/mips/include/asm/spinlock.h
>> @@ -140,7 +140,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>>   static inline void arch_spin_unlock(arch_spinlock_t *lock)
>>   {
>>  unsigned int serving_now = lock->h.serving_now + 1;
>> -wmb();
>> +smp_mb();
>
> That is too heavy.
>
> It implies a full MIPS "SYNC" operation which stalls execution until
> all previous writes are committed and globally visible.
>
> We really want just release semantics, and there is no standard named
> primitive that gives us that.
>
> For CONFIG_CPU_CAVIUM_OCTEON the proper thing would be:
>
> smp_wmb();
> smp_rmb();
>
> Which expands to exactly the same thing as wmb() because smp_rmb()
> expands to nothing.
>
> For CPUs that have out-of-order loads, smp_rmb() should expand to
> something lighter weight than "SYNC"
>
> Certainly we can load up the code with "SYNC" all over the place, but
> it will kill performance on SMP systems.  So, my vote would be to make
> it as light weight as possible, but no lighter.  That will mean
> inventing the proper barrier primitives.

It seems to me that the proper barrier here is a "SYNC 18" aka
SYNC_RELEASE instruction, at least on CPUs that implement that variant.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Mason <slash@free.fr> writes:

> [ CCing a few knowledgeable people ]
>
> Despite the subject, this is about an Atheros 8035 PHY :-)
>
> On 12/11/2015 15:04, Måns Rullgård wrote:
>
>> Mason wrote:
>> 
>>> BTW, you're not using the PHY IRQ, right? I think I remember you saying
>>> it didn't work reliably?
>> 
>> It doesn't seem to be wired up on any of my boards, or there's some
>> magic required to activate it that I'm unaware of.
>
> Weird. The board schematics for the 1172 show Tango ETH0_MDINT# pin
> properly connected to AR8035 INT pin (pin 20).

I have a different board.

> 
>
> http://www.redeszone.net/app/uploads/2014/04/AR8035.pdf
>
> INT pin 20
> I/O, D, PD
> Interrupt Signal to System; default OD-gate, needs an external
> 10Kohm pull-up, active low; can be configured to I/O by register,
> active high.
>
> 4.1.17 Interrupt Enable
> Offset: 0x12
> Mode: Read/Write
> Hardware Reset: 0
>
> Strange... it looks like AT803X_INER and AT803X_INTR_ENABLE refer to
> the same "Interrupt Enable" register?

Seems like someone missed that it was already defined.

> In fact, AT803X_INER_INIT == 0xec00 makes sense for register 0x12:
> link success/fail, speed/duplex changed, autoneg error
>
> Looks like at803x_config_intr() is used for 8031, but not for 8035...
>
> Relevant commit:
> 77a9939426f7a "phy/at8031: enable at8031 to work on interrupt mode"
>
> If I add .config_intr and .ack_interrupt to the 8035 struct, then I get
> (also added some traces)

I tried that just now, and I get nothing.  What interrupt did you
specify in your device tree?

> Questions:
>
> Can't at803x_ack_interrupt() just return phy_read(phydev, AT803X_INSR);

No, that would return the actual value of the register.  The caller
doesn't care about the value, but should be notified if there was an
error.

> Can at803x_config_intr() be used with the 8035

Probably.  The person who sent the patch for 8031 probably happened to
have that model.

> What about AT803X_INER/AT803X_INTR_ENABLE and AT803X_INSR/AT803X_INTR_STATUS

Accidental duplicates.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-12 Thread Måns Rullgård
Måns Rullgård <m...@mansr.com> writes:

> Mason <slash@free.fr> writes:
>
>> [ CCing a few knowledgeable people ]
>>
>> Despite the subject, this is about an Atheros 8035 PHY :-)
>>
>> On 12/11/2015 15:04, Måns Rullgård wrote:
>>
>>> Mason wrote:
>>> 
>>>> BTW, you're not using the PHY IRQ, right? I think I remember you saying
>>>> it didn't work reliably?
>>> 
>>> It doesn't seem to be wired up on any of my boards, or there's some
>>> magic required to activate it that I'm unaware of.
>>
>> Weird. The board schematics for the 1172 show Tango ETH0_MDINT# pin
>> properly connected to AR8035 INT pin (pin 20).
>
> I have a different board.
>
>> 
>>
>> http://www.redeszone.net/app/uploads/2014/04/AR8035.pdf
>>
>> INT pin 20
>> I/O, D, PD
>> Interrupt Signal to System; default OD-gate, needs an external
>> 10Kohm pull-up, active low; can be configured to I/O by register,
>> active high.
>>
>> 4.1.17 Interrupt Enable
>> Offset: 0x12
>> Mode: Read/Write
>> Hardware Reset: 0
>>
>> Strange... it looks like AT803X_INER and AT803X_INTR_ENABLE refer to
>> the same "Interrupt Enable" register?
>
> Seems like someone missed that it was already defined.
>
>> In fact, AT803X_INER_INIT == 0xec00 makes sense for register 0x12:
>> link success/fail, speed/duplex changed, autoneg error
>>
>> Looks like at803x_config_intr() is used for 8031, but not for 8035...
>>
>> Relevant commit:
>> 77a9939426f7a "phy/at8031: enable at8031 to work on interrupt mode"
>>
>> If I add .config_intr and .ack_interrupt to the 8035 struct, then I get
>> (also added some traces)
>
> I tried that just now, and I get nothing.  What interrupt did you
> specify in your device tree?

It works with the interrupt set to trigger on rising edge.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: phy: at803x: support interrupt on 8030 and 8035

2015-11-12 Thread Måns Rullgård
On 12 November 2015 19:06:23 GMT+00:00, Mason <slash@free.fr> wrote:
>On 12/11/2015 18:40, Mans Rullgard wrote:
>> Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt
>mode"
>> added interrupt support for the 8031 PHY but left out the other two
>> chips supported by this driver.
>> 
>> This patch sets the .ack_interrupt and .config_intr functions for the
>> 8030 and 8035 drivers as well.
>> 
>> Signed-off-by: Mans Rullgard <m...@mansr.com>
>> ---
>> I have only tested this with an 8035.  I can't find a datasheet for
>> the 8030, but since 8031, 8032, and 8035 all have the same register
>> layout, there's a good chance 8030 does as well.
>> ---
>>  drivers/net/phy/at803x.c | 4 
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
>> index fabf11d..2d020a3 100644
>> --- a/drivers/net/phy/at803x.c
>> +++ b/drivers/net/phy/at803x.c
>> @@ -308,6 +308,8 @@ static struct phy_driver at803x_driver[] = {
>>  .flags  = PHY_HAS_INTERRUPT,
>>  .config_aneg= genphy_config_aneg,
>>  .read_status= genphy_read_status,
>> +.ack_interrupt  = at803x_ack_interrupt,
>> +.config_intr= at803x_config_intr,
>>  .driver = {
>>  .owner = THIS_MODULE,
>>  },
>> @@ -327,6 +329,8 @@ static struct phy_driver at803x_driver[] = {
>>  .flags  = PHY_HAS_INTERRUPT,
>>  .config_aneg= genphy_config_aneg,
>>  .read_status= genphy_read_status,
>> +.ack_interrupt  = at803x_ack_interrupt,
>> +.config_intr= at803x_config_intr,
>>  .driver = {
>>  .owner = THIS_MODULE,
>>  },
>
>Shouldn't we take the opportunity to clean up the duplicated register
>definitions? (I'll send an informal patch to spur discussion.)
>
>Regards.

That can be done independently. Feel free to send a patch.
-- 
Måns Rullgård
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 19:35:05 +
>
>>> I don't think it's silly at all.
>> 
>> I'm sure I read somewhere that the time spent spinning on a lock should
>> be kept as small as possible.
>> 
>>> And unless you can measure it making a difference, don't knock the idea.
>> 
>> I tried using netif_tx_lock() in the IRQ handler instead, and it locked
>> up solid.  Clearly that was the wrong thing to do.
>
> Oh that's right, it's a BH lock not an IRQ one.
>
> Yet another argument for doing everything in ->poll(), thus making all
> operations outside of NAPI scheduling run in software interrupt
> context, and therefore being able to make use of the TXQ lock for
> this.

Well, I tried calling the DMA restart function from NAPI poll under
netif_tx_lock().  Now it works only as long as there is incoming
traffic.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 19:25:46 +
>
>> David Miller  writes:
>> 
>>> From: Måns Rullgård 
>>> Date: Wed, 11 Nov 2015 19:17:07 +
>>>
>>>> David Miller  writes:
>>>> 
>>>>> From: Måns Rullgård 
>>>>> Date: Wed, 11 Nov 2015 19:09:19 +
>>>>>
>>>>>> David Miller  writes:
>>>>>> 
>>>>>>> From: Måns Rullgård 
>>>>>>> Date: Wed, 11 Nov 2015 18:25:05 +
>>>>>>>
>>>>>>>> If the TX DMA channel is idle when start_xmit is called, it can be
>>>>>>>> started immediately.  Checking the DMA status and starting it if
>>>>>>>> idle has to be done atomically somehow.
>>>>>>>
>>>>>>> ->ndo_start_xmit() is guaranteed to be invoked atomically, protected
>>>>>>> by the TX queue spinlock.
>>>>>> 
>>>>>> Yes, but the DMA needs to be restarted from some other context if it was
>>>>>> busy when start_xmit checked.
>>>>>
>>>>> Then you can probably use the TXQ lock in the interrupt handler just for
>>>>> that.
>>>> 
>>>> That seems a bit heavy-handed when the critical section for this is only
>>>> a tiny part of the start_xmit function.
>>>
>>> Then what synchornization primitive other than spin locks are you going
>>> to use for this?
>>>
>>> My point is that there is a spinlock the core code is _already_ taking,
>>> unconditionally, when ->ndo_start_xmit() executes.  And you can therefore
>>> take advantage of that rather than using another lock of your own.
>> 
>> I get that.  But that remains locked for the duration of ndo_start_xmit()
>> whereas the part that needs to be synchronised with the DMA completion
>> IRQ handler is tiny.  Having the IRQ handler spin for the duration of
>> ndo_start_xmit() seemed silly to me.
>
> I don't think it's silly at all.

I'm sure I read somewhere that the time spent spinning on a lock should
be kept as small as possible.

> And unless you can measure it making a difference, don't knock the idea.

I tried using netif_tx_lock() in the IRQ handler instead, and it locked
up solid.  Clearly that was the wrong thing to do.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 19:17:07 +
>
>> David Miller  writes:
>> 
>>> From: Måns Rullgård 
>>> Date: Wed, 11 Nov 2015 19:09:19 +
>>>
>>>> David Miller  writes:
>>>> 
>>>>> From: Måns Rullgård 
>>>>> Date: Wed, 11 Nov 2015 18:25:05 +
>>>>>
>>>>>> If the TX DMA channel is idle when start_xmit is called, it can be
>>>>>> started immediately.  Checking the DMA status and starting it if
>>>>>> idle has to be done atomically somehow.
>>>>>
>>>>> ->ndo_start_xmit() is guaranteed to be invoked atomically, protected
>>>>> by the TX queue spinlock.
>>>> 
>>>> Yes, but the DMA needs to be restarted from some other context if it was
>>>> busy when start_xmit checked.
>>>
>>> Then you can probably use the TXQ lock in the interrupt handler just for
>>> that.
>> 
>> That seems a bit heavy-handed when the critical section for this is only
>> a tiny part of the start_xmit function.
>
> Then what synchornization primitive other than spin locks are you going
> to use for this?
>
> My point is that there is a spinlock the core code is _already_ taking,
> unconditionally, when ->ndo_start_xmit() executes.  And you can therefore
> take advantage of that rather than using another lock of your own.

I get that.  But that remains locked for the duration of ndo_start_xmit()
whereas the part that needs to be synchronised with the DMA completion
IRQ handler is tiny.  Having the IRQ handler spin for the duration of
ndo_start_xmit() seemed silly to me.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 19:09:19 +
>
>> David Miller  writes:
>> 
>>> From: Måns Rullgård 
>>> Date: Wed, 11 Nov 2015 18:25:05 +
>>>
>>>> If the TX DMA channel is idle when start_xmit is called, it can be
>>>> started immediately.  Checking the DMA status and starting it if
>>>> idle has to be done atomically somehow.
>>>
>>> ->ndo_start_xmit() is guaranteed to be invoked atomically, protected
>>> by the TX queue spinlock.
>> 
>> Yes, but the DMA needs to be restarted from some other context if it was
>> busy when start_xmit checked.
>
> Then you can probably use the TXQ lock in the interrupt handler just for
> that.

That seems a bit heavy-handed when the critical section for this is only
a tiny part of the start_xmit function.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 18:25:05 +
>
>> If the TX DMA channel is idle when start_xmit is called, it can be
>> started immediately.  Checking the DMA status and starting it if
>> idle has to be done atomically somehow.
>
> ->ndo_start_xmit() is guaranteed to be invoked atomically, protected
> by the TX queue spinlock.

Yes, but the DMA needs to be restarted from some other context if it was
busy when start_xmit checked.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-11-11 Thread Måns Rullgård
David Miller  writes:

> From: Måns Rullgård 
> Date: Wed, 11 Nov 2015 13:04:07 +
>
>> Måns Rullgård  writes:
>> 
>>> David Miller  writes:
>>>
>>>> From: Måns Rullgård 
>>>> Date: Wed, 11 Nov 2015 00:40:09 +
>>>>
>>>>> When the DMA complete interrupt arrives, the next chain should be
>>>>> kicked off as quickly as possible, and I don't see why that would
>>>>> benefit from being done in napi context.
>>>>
>>>> NAPI isn't about low latency, it's about fairness and interrupt
>>>> mitigation.
>>>>
>>>> You probably don't even realize that all of the TX SKB freeing you do
>>>> in the hardware interrupt handler end up being actually processed by a
>>>> scheduled software interrupt anyways.
>>>>
>>>> So you are gaining almost nothing by not doing TX completion in NAPI
>>>> context, whereas by doing so you would be gaining a lot including
>>>> more simplified locking or even the ability to do no locking at all.
>>>
>>> TX completion is separate from restarting the DMA, and moving that to
>>> NAPI may well be a good idea.  Should I simply napi_schedule() if the
>>> hardware indicates TX is complete and do the cleanup in the NAPI poll
>>> function?
>> 
>> I tried that, and throughput (as measured by iperf3) dropped by 2%.
>> Maybe I did something wrong.
>
> Did you fix all the locking in that change?
>
> Since all of your TX handling runs in software interrupt context, you
> can stop using IRQ locking and use BH locking driver-wide instead.
>
> And actually, no locking is really needed for TX processing.  With
> proper memory barriers and properly crafter queue state tests, you
> can run completely lockless.
>
> Again, look at example drivers.  I know, for example, that
> drivers/net/ethernet/broadcom/tg3.c runs TX lockless.  You'll
> see that tg3_tx() takes no locks at all.

The way the hardware works, once a DMA operation has been started,
adding more packets to the active chain can't be done reliably.  For
that reason, if start_xmit is called (with xmit_more zero) while a DMA
operation is in progress, the new packet(s) must be queued until the
hardware raises the DMA complete interrupt.  At that time, the next
pending DMA chain, if any, can be kicked off.  If the TX DMA channel is
idle when start_xmit is called, it can be started immediately.  Checking
the DMA status and starting it if idle has to be done atomically
somehow.

There is a separate indication for actual TX completion, and the
interrupt for that can be set to only fire every 7 frames or when a
timeout expires.  When this happens, the TX cleanup needs to run, and
that can obviously be done from NAPI without using any locks.

Bear in mind that this hardware is quite primitive compared to modern
high-performance Ethernet controllers from the likes of Intel and
Broadcom.  The documentation I have is dated 2003.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


<    1   2   3   4   5   6   7   8   9   >