Re: Tearing down DMA transfer setup after DMA client has finished

2016-12-09 Thread Sebastian Frias
On 09/12/16 07:59, Vinod Koul wrote:
> On Thu, Dec 08, 2016 at 04:48:18PM +, Måns Rullgård wrote:
>> Vinod Koul  writes:
>>
>>> To make it efficient, disregarding your Sbox HW issue, the solution is
>>> virtual channels. You can delink physical channels and virtual channels. If
>>> one has SW controlled MUX then a channel can service any client. For few
>>> controllers request lines are hard wired so they cant use any channel. But
>>> if you dont have this restriction then driver can queue up many transactions
>>> from different controllers.
>>
>> Have you been paying attention at all?  This exactly what the driver
>> ALREADY DOES.
> 
> And have you read what the question was?
> 

I think many people appreciate the quick turn around time and responsiveness of
knowledgeable people making constructive remarks in this thread, but it looks we
are slowly drifting away from the main problem.

If we had to sum up the discussion, the current DMA API/framework in Linux seems
to lack a way to properly handle this HW (or if it has a way, the information 
got
lost somewhere).

What concrete solution do you propose?

Alternatively, one can think of the current issue (i.e.: the fact that the IRQ
arrives "too soon") in a different way.
Instead of thinking the IRQ indicates "transfer complete", it is indicating 
"ready
to accept another command", which in practice (and with proper API support) can
translate into efficient queuing of DMA operations.


Re: [PATCH v3] add equivalent of BIT(x) for bitfields

2016-12-07 Thread Sebastian Frias
On 07/12/16 15:02, Jakub Kicinski wrote:
> On Wed, 7 Dec 2016 14:51:36 +0100, Sebastian Frias wrote:
>> On 07/12/16 13:38, Jakub Kicinski wrote:
>>> On Wed, 7 Dec 2016 12:23:56 +0100, Sebastian Frias wrote:  
>>>> On 07/12/16 12:05, Jakub Kicinski wrote:  
>>>>> On Wed, 7 Dec 2016 11:00:57 +0100, Sebastian Frias wrote:
>>>>>> On 07/12/16 09:42, Kalle Valo wrote:
>>>>>>> Sebastian Frias  writes:
>>>>>>>   
>>>>>>>> Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
>>>>>>>> continuous bitfields, just as BIT(x) does for single bits.
>>>>>>>>
>>>>>>>> GENVALUE_ULL(msb, lsb, value) macro is also added.
>>>>>>>>
>>>>>>>> This is useful mostly for creating values to be packed together
>>>>>>>> via OR operations, ex:
>>>>>>>>
>>>>>>>>u32 val = 0x;
>>>>>>>>val |= GENVALUE(19, 12, 0x5a);
>>>>>>>>
>>>>>>>> now 'val = 0x1115a000'
>>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Sebastian Frias 
>>>>>>>> Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Change in v2:
>>>>>>>> - rename the macro to GENVALUE as proposed by Linus
>>>>>>>> - longer comment attempts to show use case for the macro as
>>>>>>>> proposed by Borislav
>>>>>>>>
>>>>>>>> Change in v3:
>>>>>>>> - use BUILD_BUG_ON_ZERO() to break if some input parameters
>>>>>>>> (essentially 'lsb' but also 'msb') are not constants as
>>>>>>>> proposed by Linus.
>>>>>>>> Indeed, 'lsb' is used twice so it cannot have side-effects;
>>>>>>>> 'msb' is subjected to same constraints for consistency.  
>>>>>>>
>>>>>>> (I missed there was v3 already, but I'll repeat what I said in v1.)
>>>>>>>
>>>>>>> Please check FIELD_PREP() from include/linux/bitfield.h, doesn't it
>>>>>>> already do the same?  
>>>>>>
>>>>>> Indeed, it appears to do the same :-)
>>>>>> Any reason why "include/linux/bitfield.h" is not included by default in
>>>>>> bitops.h?
>>>>>
>>>>> Hi!
>>>>>
>>>>> The code is in a separate header because of circular dependencies
>>>>> (coming from bug.h including bitops.h, IIRC).  You could possibly add an
>>>>> include of bitfield.h in bitops.h if you're very careful, I haven't
>>>>> tried TBH :)
>>>>> 
>>>>
>>>> Well, the following seems to work just fine:
>>>>
>>>> diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
>>>> index f6505d8..24c7480 100644
>>>> --- a/include/linux/bitfield.h
>>>> +++ b/include/linux/bitfield.h
>>>> @@ -15,8 +15,6 @@
>>>>  #ifndef _LINUX_BITFIELD_H
>>>>  #define _LINUX_BITFIELD_H
>>>>  
>>>> -#include 
>>>> -  
>>>
>>> That would break users who include bitfield.h directly and don't
>>> include bug.h, no?  
>>
>> That's why I was asking if there's a way to verify that I did not break
>> anything, as it may be simpler to just modify the users.
>>
>> Right now bitops.h does not include bug.h yet my patch on this thread
>> used BUILD_BUG_ON_ZERO() inside bitops.h without issues.
>>
>> So it seems like the "proper" include order is already in place.
>> (even though I agree with you that ideally each header file should include
>> headers required for it to work properly, regardless of include order)
>>
>> Would the following files be the only two users we would need to worry
>> about?
>>
>> $ git grep "linux/bitfield\.h"
>> drivers/net/ethernet/netronome/nfp/nfp_bpf.h:#include 
>> drivers/net/wireless/mediatek/mt7601u/mt7601u.h:#include 
> 
> If you're proposing to require users to have to remember to include
> bug.h before bitfield.h then I don't think that's acceptable.  
> 

I was proposing a quick fix, and noted that "I agree with you that ideally 
each header file should include headers required for it to work properly,
regardless of include order"

The reason is probably the same that you had when you submitted bitfield.h
without including it in bitops.h: it seems that the "circular dependency"
issue is very deep.

However, we have to keep in mind that after fixing the current users of
bitfield.h, the rest won't have to do anything, because bitfield.h would
come with bitops.h and work properly.

> There are also out-of-tree users like LEDE/OpenWRT who such changes may
> disturb.
> 

I thought, maybe incorrectly, that out-of-tree users are not to be worried
about.

> BTW I dug out the old conversation: https://lkml.org/lkml/2016/8/19/96
> 
>> I can take a look at the underlying include order issue later.
> 
> I think that would be the only way forward, but is not easy.
> 

Right :-)

> Is your concern that bitfield.h is hard to discover?  Or that users need
> an extra #include beyond just bitops.h?
> 

The thing is that there are so many files that it is indeed not easy to know
that some facilities are already there.

I wrote the original macros in 4.7 which does not has bitfield.h, so it is
not really the case here, but I think it would be a good idea to include
these macros by default.


Re: [PATCH v3] add equivalent of BIT(x) for bitfields

2016-12-07 Thread Sebastian Frias
On 07/12/16 13:38, Jakub Kicinski wrote:
> On Wed, 7 Dec 2016 12:23:56 +0100, Sebastian Frias wrote:
>> On 07/12/16 12:05, Jakub Kicinski wrote:
>>> On Wed, 7 Dec 2016 11:00:57 +0100, Sebastian Frias wrote:  
>>>> On 07/12/16 09:42, Kalle Valo wrote:  
>>>>> Sebastian Frias  writes:
>>>>> 
>>>>>> Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
>>>>>> continuous bitfields, just as BIT(x) does for single bits.
>>>>>>
>>>>>> GENVALUE_ULL(msb, lsb, value) macro is also added.
>>>>>>
>>>>>> This is useful mostly for creating values to be packed together
>>>>>> via OR operations, ex:
>>>>>>
>>>>>>u32 val = 0x;
>>>>>>val |= GENVALUE(19, 12, 0x5a);
>>>>>>
>>>>>> now 'val = 0x1115a000'
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Sebastian Frias 
>>>>>> Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
>>>>>> ---
>>>>>>
>>>>>> Change in v2:
>>>>>> - rename the macro to GENVALUE as proposed by Linus
>>>>>> - longer comment attempts to show use case for the macro as
>>>>>> proposed by Borislav
>>>>>>
>>>>>> Change in v3:
>>>>>> - use BUILD_BUG_ON_ZERO() to break if some input parameters
>>>>>> (essentially 'lsb' but also 'msb') are not constants as
>>>>>> proposed by Linus.
>>>>>> Indeed, 'lsb' is used twice so it cannot have side-effects;
>>>>>> 'msb' is subjected to same constraints for consistency.
>>>>>
>>>>> (I missed there was v3 already, but I'll repeat what I said in v1.)
>>>>>
>>>>> Please check FIELD_PREP() from include/linux/bitfield.h, doesn't it
>>>>> already do the same?
>>>>
>>>> Indeed, it appears to do the same :-)
>>>> Any reason why "include/linux/bitfield.h" is not included by default in
>>>> bitops.h?  
>>>
>>> Hi!
>>>
>>> The code is in a separate header because of circular dependencies
>>> (coming from bug.h including bitops.h, IIRC).  You could possibly add an
>>> include of bitfield.h in bitops.h if you're very careful, I haven't
>>> tried TBH :)
>>>   
>>
>> Well, the following seems to work just fine:
>>
>> diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
>> index f6505d8..24c7480 100644
>> --- a/include/linux/bitfield.h
>> +++ b/include/linux/bitfield.h
>> @@ -15,8 +15,6 @@
>>  #ifndef _LINUX_BITFIELD_H
>>  #define _LINUX_BITFIELD_H
>>  
>> -#include 
>> -
> 
> That would break users who include bitfield.h directly and don't
> include bug.h, no?

That's why I was asking if there's a way to verify that I did not break
anything, as it may be simpler to just modify the users.

Right now bitops.h does not include bug.h yet my patch on this thread
used BUILD_BUG_ON_ZERO() inside bitops.h without issues.

So it seems like the "proper" include order is already in place.
(even though I agree with you that ideally each header file should include
headers required for it to work properly, regardless of include order)

Would the following files be the only two users we would need to worry
about?

$ git grep "linux/bitfield\.h"
drivers/net/ethernet/netronome/nfp/nfp_bpf.h:#include 
drivers/net/wireless/mediatek/mt7601u/mt7601u.h:#include 

I can take a look at the underlying include order issue later.

> 
>>  /*
>>   * Bitfield access macros
>>   *
>> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> index a83c822..7e5fab8 100644
>> --- a/include/linux/bitops.h
>> +++ b/include/linux/bitops.h
>> @@ -24,6 +24,8 @@
>>  #define GENMASK_ULL(h, l) \
>> (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
>>  
>> +#include "bitfield.h"
>> +
>>  extern unsigned int __sw_hweight8(unsigned int w);
>>  extern unsigned int __sw_hweight16(unsigned int w);
>>  extern unsigned int __sw_hweight32(unsigned int w);
>>
>>
>> Is there a way to be sure it works in all cases? Otherwise
>> I could just submit that, right?
> 



Re: [PATCH v3] add equivalent of BIT(x) for bitfields

2016-12-07 Thread Sebastian Frias
On 07/12/16 12:05, Jakub Kicinski wrote:
> On Wed, 7 Dec 2016 11:00:57 +0100, Sebastian Frias wrote:
>> On 07/12/16 09:42, Kalle Valo wrote:
>>> Sebastian Frias  writes:
>>>   
>>>> Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
>>>> continuous bitfields, just as BIT(x) does for single bits.
>>>>
>>>> GENVALUE_ULL(msb, lsb, value) macro is also added.
>>>>
>>>> This is useful mostly for creating values to be packed together
>>>> via OR operations, ex:
>>>>
>>>>    u32 val = 0x;
>>>>val |= GENVALUE(19, 12, 0x5a);
>>>>
>>>> now 'val = 0x1115a000'
>>>>
>>>>
>>>> Signed-off-by: Sebastian Frias 
>>>> Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
>>>> ---
>>>>
>>>> Change in v2:
>>>> - rename the macro to GENVALUE as proposed by Linus
>>>> - longer comment attempts to show use case for the macro as
>>>> proposed by Borislav
>>>>
>>>> Change in v3:
>>>> - use BUILD_BUG_ON_ZERO() to break if some input parameters
>>>> (essentially 'lsb' but also 'msb') are not constants as
>>>> proposed by Linus.
>>>> Indeed, 'lsb' is used twice so it cannot have side-effects;
>>>> 'msb' is subjected to same constraints for consistency.  
>>>
>>> (I missed there was v3 already, but I'll repeat what I said in v1.)
>>>
>>> Please check FIELD_PREP() from include/linux/bitfield.h, doesn't it
>>> already do the same?  
>>
>> Indeed, it appears to do the same :-)
>> Any reason why "include/linux/bitfield.h" is not included by default in
>> bitops.h?
> 
> Hi!
> 
> The code is in a separate header because of circular dependencies
> (coming from bug.h including bitops.h, IIRC).  You could possibly add an
> include of bitfield.h in bitops.h if you're very careful, I haven't
> tried TBH :)
> 

Well, the following seems to work just fine:

diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index f6505d8..24c7480 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -15,8 +15,6 @@
 #ifndef _LINUX_BITFIELD_H
 #define _LINUX_BITFIELD_H
 
-#include 
-
 /*
  * Bitfield access macros
  *
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..7e5fab8 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,8 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#include "bitfield.h"
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);


Is there a way to be sure it works in all cases? Otherwise
I could just submit that, right?



Re: [PATCH v3] add equivalent of BIT(x) for bitfields

2016-12-07 Thread Sebastian Frias
On 07/12/16 09:42, Kalle Valo wrote:
> Sebastian Frias  writes:
> 
>> Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
>> continuous bitfields, just as BIT(x) does for single bits.
>>
>> GENVALUE_ULL(msb, lsb, value) macro is also added.
>>
>> This is useful mostly for creating values to be packed together
>> via OR operations, ex:
>>
>>u32 val = 0x;
>>val |= GENVALUE(19, 12, 0x5a);
>>
>> now 'val = 0x1115a000'
>>
>>
>> Signed-off-by: Sebastian Frias 
>> Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
>> ---
>>
>> Change in v2:
>> - rename the macro to GENVALUE as proposed by Linus
>> - longer comment attempts to show use case for the macro as
>> proposed by Borislav
>>
>> Change in v3:
>> - use BUILD_BUG_ON_ZERO() to break if some input parameters
>> (essentially 'lsb' but also 'msb') are not constants as
>> proposed by Linus.
>> Indeed, 'lsb' is used twice so it cannot have side-effects;
>> 'msb' is subjected to same constraints for consistency.
> 
> (I missed there was v3 already, but I'll repeat what I said in v1.)
> 
> Please check FIELD_PREP() from include/linux/bitfield.h, doesn't it
> already do the same?

Indeed, it appears to do the same :-)
Any reason why "include/linux/bitfield.h" is not included by default in
bitops.h?



[PATCH v4] add macros for bitfield manipulation ala GENMASK

2016-12-06 Thread Sebastian Frias
Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
continuous bitfields, just as BIT(x) does for single bits.

GENVALUE_ULL(msb, lsb, value) macro is also added.

These are useful mostly for creating values to be packed together
via OR operations, ex:

   u32 val = 0x;
   val |= GENVALUE(19, 12, 0x5a);

now 'val = 0x1115a000'

Introduce BITFIELD_INSERT(target, msb, lsb, val) and its counter-part
BITFIELD_EXTRACT(source, msb, lsb)

BITFIELD_INSERT_ULL/BITFIELD_EXTRACT_ULL are also added.

These are useful for something like:

   u32 a = 0x111a5000;
   u32 b = BITFIELD_EXTRACT(a, 19, 12);

=> b = 0xa5;

   BITFIELD_INSERT(a, 19, 12, 0xff5a);
=> a = 0x1115a000;


Signed-off-by: Sebastian Frias 
Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
---

Change in v2:
- rename the macro to GENVALUE as proposed by Linus
- longer comment attempts to show use case for the macro as
proposed by Borislav

Change in v3:
- use BUILD_BUG_ON_ZERO() to break if some input parameters
(essentially 'lsb' but also 'msb') are not constants as
proposed by Linus.
Indeed, 'lsb' is used twice so it cannot have side-effects;
'msb' is subjected to same constraints for consistency.

Change in v4:
- add BITFIELD_INSERT/BITFIELD_EXTRACT as suggested by Geert
- modify commit message to account for that

---
 include/linux/bitops.h | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..491f966 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,72 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#ifdef __KERNEL__
+/*
+ * Equivalent of BIT(x) but for contiguous bitfields
+ *
+ * NOTE: 'val' is truncated to the size of the bitfield [msb:lsb]
+ *
+ * GENVALUE(1, 0,0x) = 0x0003
+ * GENVALUE(3, 0,0x) = 0x000f
+ * GENVALUE(15,8,0x) = 0xff00
+ * GENVALUE(6, 6, 1) = 0x0040 == BIT(6)
+ */
+#define GENVALUE(msb, lsb, val)
\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK((msb), (lsb\
+   )
+
+#define GENVALUE_ULL(msb, lsb, val)\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK_ULL((msb), (lsb\
+   )
+
+/*
+ * Takes 'val' and inserts it as bitfield [msb:lsb] into 'target'
+ *
+ * NOTE: 'val' is truncated to the size of the bitfield [msb:lsb]
+ *
+ *u32 a = 0x111a5000;
+ *BITFIELD_INSERT(a, 19, 12, 0xff5a);
+ * => a = 0x1115a000;
+ */
+#define BITFIELD_INSERT(target, msb, lsb, val) \
+   (target = ((target & ~GENMASK(msb, lsb)) |  \
+  GENVALUE(msb, lsb, val)))
+
+#define BITFIELD_INSERT_ULL(target, msb, lsb, val) \
+   (target = ((target & ~GENMASK_ULL(msb, lsb)) |  \
+  GENVALUE_ULL(msb, lsb, val)))
+
+/*
+ * Extracts bitfield [msb:lsb] from 'source'
+ *
+ *u32 a = 0x111a5000;
+ *u32 b = BITFIELD_EXTRACT(a, 19, 12);
+ * => b = 0xa5;
+ */
+#define BITFIELD_EXTRACT(source, msb, lsb) \
+   (   \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((source) & GENMASK(msb, lsb)) >> (lsb))   \
+   )
+
+#define BITFIELD_EXTRACT_ULL(source, msb, lsb) \
+   (   \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((source) & GENMASK_ULL(msb, lsb)) >> (lsb))   \
+   )
+#endif
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
-- 
1.8.3.1



[PATCH v5] add macros for bitfield manipulation "à la" GENMASK

2016-12-06 Thread Sebastian Frias
Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
continuous bitfields, just as BIT(x) does for single bits.

GENVALUE_ULL(msb, lsb, value) macro is also added.

These are useful mostly for creating values to be packed together
via OR operations, ex:

   u32 val = 0x;
   val |= GENVALUE(19, 12, 0x5a);

=> 'val = 0x1115a000'

Introduce BITFIELD_INSERT(target, msb, lsb, val) and its counter-part
BITFIELD_EXTRACT(source, msb, lsb)

BITFIELD_INSERT_ULL/BITFIELD_EXTRACT_ULL are also added.

These are useful for:

   a = 0x111a5000;
   b = BITFIELD_EXTRACT(a, 19, 12);

=> 'b = 0xa5'

   BITFIELD_INSERT(a, 19, 12, 0xff5a);

=> 'a = 0x1115a000'


Signed-off-by: Sebastian Frias 
Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
---

Change in v2:
- rename the macro to GENVALUE as proposed by Linus
- longer comment attempts to show use case for the macro as
proposed by Borislav

Change in v3:
- use BUILD_BUG_ON_ZERO() to break if some input parameters
(essentially 'lsb' but also 'msb') are not constants as
proposed by Linus.
Indeed, 'lsb' is used twice so it cannot have side-effects;
'msb' is subjected to same constraints for consistency.

Change in v4:
- add BITFIELD_INSERT/BITFIELD_EXTRACT as suggested by Geert
- modify commit message to account for that

Change in v5:
- fix minor formatting in commit message

---
 include/linux/bitops.h | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..491f966 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,72 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#ifdef __KERNEL__
+/*
+ * Equivalent of BIT(x) but for contiguous bitfields
+ *
+ * NOTE: 'val' is truncated to the size of the bitfield [msb:lsb]
+ *
+ * GENVALUE(1, 0,0x) = 0x0003
+ * GENVALUE(3, 0,0x) = 0x000f
+ * GENVALUE(15,8,0x) = 0xff00
+ * GENVALUE(6, 6, 1) = 0x0040 == BIT(6)
+ */
+#define GENVALUE(msb, lsb, val)
\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK((msb), (lsb\
+   )
+
+#define GENVALUE_ULL(msb, lsb, val)\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK_ULL((msb), (lsb\
+   )
+
+/*
+ * Takes 'val' and inserts it as bitfield [msb:lsb] into 'target'
+ *
+ * NOTE: 'val' is truncated to the size of the bitfield [msb:lsb]
+ *
+ *a = 0x111a5000;
+ *BITFIELD_INSERT(a, 19, 12, 0xff5a);
+ * => 'a = 0x1115a000'
+ */
+#define BITFIELD_INSERT(target, msb, lsb, val) \
+   (target = ((target & ~GENMASK(msb, lsb)) |  \
+  GENVALUE(msb, lsb, val)))
+
+#define BITFIELD_INSERT_ULL(target, msb, lsb, val) \
+   (target = ((target & ~GENMASK_ULL(msb, lsb)) |  \
+  GENVALUE_ULL(msb, lsb, val)))
+
+/*
+ * Extracts bitfield [msb:lsb] from 'source'
+ *
+ *a = 0x111a5000;
+ *b = BITFIELD_EXTRACT(a, 19, 12);
+ * => 'b = 0xa5'
+ */
+#define BITFIELD_EXTRACT(source, msb, lsb) \
+   (   \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((source) & GENMASK(msb, lsb)) >> (lsb))   \
+   )
+
+#define BITFIELD_EXTRACT_ULL(source, msb, lsb) \
+   (   \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((source) & GENMASK_ULL(msb, lsb)) >> (lsb))   \
+   )
+#endif
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
-- 
1.8.3.1




Re: [PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-06 Thread Sebastian Frias
Hi Geert,

On 06/12/16 12:12, Geert Uytterhoeven wrote:
>>> ... which means "generate a value"??
>>>
>>
>> Yes.
>> Although I'm not sure if I understood the essence of your point.
>> Are you suggesting that the name should be GENERATE_A_VALUE?
> 
> No. I mean that "value" is a way too generic name.
> Hence "GENVALUE" may be suitable for a macro local to a driver, but is way
> too generic and fuzzy for a global function.

IMHO GENMASK presents the same situation, but actually I don't really mind
about the name.

> 
>> There's already GENMASK, which "generates a mask".
> 
> Yes. And it generates a (bit)mask, which is clear from its name.
> But a "value" is just too generic for a global function, and make me think of
> a pseudo-random number generator ;-)

:-)

>>> "val |= 0x5a << 12;" looks much more readable to me...
>>>
>>
>> Well, the idea behind this is that one can use it like:
>>
>> (see https://marc.info/?l=linux-kernel&m=148095872915717&w=2)
>>
>> ...
>> #define TIMEOUT_CLK_UNIT_MHZ   BIT(6)
>> #define BUS_CLK_FREQ_FOR_SD_CLK(x) GENVALUE(14,7,x)
>> ...
>> val = 0;
>> val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
>> val |= BUS_CLK_FREQ_FOR_SD_CLK(200); /* SDIO clock: 200MHz */
>> ...
>>
>> which makes it very practical for writing macros for associated HW
>> documentation.
> 
> Actually I more like the SETBITFIELD name...
> 

Well, Linus did not like it for example.
Since the start I was open to suggestions because I felt the name
was not great either.

https://marc.info/?l=linux-kernel&m=148095805515487&w=2


>>> OK. So it inserts a value into a bitfield.
>>>
>>> Yes, that can be useful. Now let's find a sensible name for this.
>>> Perhaps inspired by a PowerPC mnemonic? At least that would be more
>>> obvious than "GENVALUE", IMHO...
>>
>> I'm open to suggestions.
> 
> BITFIELD_INSERT()?

The problem is that right now it does not inserts into anything,
it just generates a value. The user can then OR that with something else
essentially "inserting" the value.

(see my reply to Borislav here:
https://marc.info/?l=linux-kernel&m=148095872915717&w=2 )

This allows a use case where BIT() and GENVALUE() can be used in a
similar way:

#define TIMEOUT_CLK_UNIT_MHZ   BIT(6)
#define BUS_CLK_FREQ_FOR_SD_CLK(x) GENVALUE(14,7,x)
...
val = 0;
val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
val |= BUS_CLK_FREQ_FOR_SD_CLK(200); /* SDIO clock: 200MHz */
...

I can write BITFIELD_INSERT as well, but I would not want to have
to choose between BITFIELD_INSERT and GENVALUE, because that would
mean that the snippet above would become:

#define BITFIELD_INSERT(target, msb, lsb, val)  \
(target = ((target & ~GENMASK(msb, lsb)) | GENVALUE(msb, lsb, val)))
...
#define TIMEOUT_CLK_UNIT_MHZ  BIT(6)
#define BUS_CLK_FREQ_FOR_SD_CLK(y, x) BITFIELD_INSERT(y,14,7,x)
...
val = 0;
val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
BUS_CLK_FREQ_FOR_SD_CLK(val, 200);   /* SDIO clock: 200MHz */
...


NOTES:
- Going for BITFIELD_INSERT() means that we probably
need to decide its behaviour w.r.t existing bits, does it OR
(thus preserving bits set) or does it overwrite? (most likely
the later option)
- Going for BITFIELD_INSERT() calls for its counter-part
BITFIELD_EXTRACT(), so that we can do:

...
   val = 0x1115a000;
   if (BITFIELD_EXTRACT(val, 19, 12) == 0x5a)
...


Best regards,

Sebastian


Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-12-06 Thread Sebastian Frias
Hi,

On 05/12/16 17:30, Doug Anderson wrote:

> 
> AKA: you are setting up various "corecfg" stuff that's documented in
> the generic Arasan docs.  Others SDHCI-Arasan implementations might
> want to set the same things, but those values may be stored elsewhere
> for them.

Exactly, that is what I'm trying to find out.
To me, one good place to store this would be DT.

> 
> So if _all_ Arasan implementations need these same values or need the
> same logic to figure out these values, then you should do something
> that's not chip-specific but something generic.
> 

It depends on where this needs to be set, and why am I the first to need
to set this up.

> If you've got a specific weird quirk that's specific to your platform,
> then you could do that in a chip-specific init.

Yes, or in the set_ios you talked earlier.

> 
> Presumably many of the above could just be hardcoded on some
> implementations, so they might not be available in a memory-mapped
> implementation...
> 

Exactly.

> 
>> which seems much easier to handle (and portable).
>>
>> At any rate, one thing to note from this is that many of these
>> bits should probably be initialised based on DT, right?
> 
> Probably, or by proving the voltage value of regulations.  Note that I
> think DT already gets parsed and sets up caps.  I'm not really an
> expert here and I'd let someone who actually knows / maintains SDMMC
> comment.  I know for sure that dw_mmc (which I'm way more familiar
> with) does things very differently than sdhci (which I'm barely
> familiar with).

Thanks.
Could somebody else comment please?

Best regards,

Sebastian

> 
> 
>> For example, the DT has a "non-removable" property, which I think
>> should be used to setup SLOT_TYPE_EMBEDDED (if the property is
>> present) or SLOT_TYPE_REMOVABLE (if the property is not present)
>>
>> Looking at Documentation/devicetree/bindings/mmc/mmc.txt we can see
>> more related properties:
>>
>> Optional properties:
>> - bus-width: Number of data lines, can be <1>, <4>, or <8>.  The default
>>   will be <1> if the property is absent.
>> - wp-gpios: Specify GPIOs for write protection, see gpio binding
>> - cd-inverted: when present, polarity on the CD line is inverted. See the 
>> note
>>   below for the case, when a GPIO is used for the CD line
>> - wp-inverted: when present, polarity on the WP line is inverted. See the 
>> note
>>   below for the case, when a GPIO is used for the WP line
>> - disable-wp: When set no physical WP line is present. This property should
>>   only be specified when the controller has a dedicated write-protect
>>   detection logic. If a GPIO is always used for the write-protect detection
>>   logic it is sufficient to not specify wp-gpios property in the absence of 
>> a WP
>>   line.
>> - max-frequency: maximum operating clock frequency
>> - no-1-8-v: when present, denotes that 1.8v card voltage is not supported on
>>   this system, even if the controller claims it is.
>> - cap-sd-highspeed: SD high-speed timing is supported
>> - cap-mmc-highspeed: MMC high-speed timing is supported
>> - sd-uhs-sdr12: SD UHS SDR12 speed is supported
>> - sd-uhs-sdr25: SD UHS SDR25 speed is supported
>> - sd-uhs-sdr50: SD UHS SDR50 speed is supported
>> - sd-uhs-sdr104: SD UHS SDR104 speed is supported
>> - sd-uhs-ddr50: SD UHS DDR50 speed is supported
>> ...
>>
>> which makes me wonder, what is the recommended way of dealing with this?
>> - Should I use properties on the DT? If so, then I need to add code to set
>> up the register properly.
>> - Should I hardcode these values use a minimal DT? If so, then I need an
>> init function to put all this.
>> - Should I hardcode stuff at u-Boot level? If so, nothing is required and
>> should work without any modifications to the Arasan Linux driver.
>>
>> It appears that the Linux driver is expecting most of these fields to be
>> hardcoded and "pre-set" before (maybe by the bootloader) it starts, hence
>> the lack of any "init" function so far.
>>
>>>
>>> In your platform-specific init you're proposing you could set
>>> tango_pad_mode to 0.  That seems tango-specific.
>>>
>>> You'd want to hook into "set_ios" for setting sel_sdio or not.  That's
>>> important if anyone ever wants to plug in an external SDIO card to
>>> your slot.  This one good argument for putting this in
>>> sdhci_arasan_soc_ctl_map, since you wouldn't want to do a
>>> compatibility matching every time set_ios is called.
>>
>> Thanks for the advice, I will look into that.
>>
>>>
>>> I'd have to look more into the whole SD/WP polarity issue.  There are
>>> already so many levels of inversion for these signals in Linux that
>>> it's confusing.  It seems like you might just want to hardcode them to
>>> "0" and let users use all the existing ways to invert things...  You
>>> could either put that hardcoding into your platform init code or (if
>>> you're using sdhci_arasan_soc_ctl_map) put it in the main init code so
>>> that if anyone else needs to init similar signals then they

Re: [PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-06 Thread Sebastian Frias
On 06/12/16 11:42, Geert Uytterhoeven wrote:
> On Tue, Dec 6, 2016 at 11:31 AM, Sebastian Frias  wrote:
>> On 05/12/16 18:48, Geert Uytterhoeven wrote:
>>> On Mon, Dec 5, 2016 at 2:36 PM, Sebastian Frias  wrote:
>>>> Introduce SETBITFIELD(msb, lsb, value) macro to ease dealing with
>>>> continuous bitfields, just as BIT(x) does for single bits.
>>>
>>> If it's a bitfield, why not calling it that way?
>>
>> I don't know if you saw v2 (or v3 for that matter), but the name was changed
>> to GENVALUE.
> 
> ... which means "generate a value"??
> 

Yes.
Although I'm not sure if I understood the essence of your point.
Are you suggesting that the name should be GENERATE_A_VALUE?

There's already GENMASK, which "generates a mask".

>> Also a small use case was added to the commit message:
>>
>> "Introduce GENVALUE(msb, lsb, value) macro..."
>> "...This is useful mostly for creating values to be packed together
>> via OR operations, ex:
>>
>>u32 val = 0x;
>>val |= GENVALUE(19, 12, 0x5a);
> 
> "val |= 0x5a << 12;" looks much more readable to me...
> 

Well, the idea behind this is that one can use it like:

(see https://marc.info/?l=linux-kernel&m=148095872915717&w=2)

...
#define TIMEOUT_CLK_UNIT_MHZ   BIT(6)
#define BUS_CLK_FREQ_FOR_SD_CLK(x) GENVALUE(14,7,x)
...
val = 0;
val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
val |= BUS_CLK_FREQ_FOR_SD_CLK(200); /* SDIO clock: 200MHz */
...

which makes it very practical for writing macros for associated HW
documentation.

>> now 'val = 0x1115a000'"
>>
>>> So what about BITFIELD(start ,size), like arch/tile/kernel/tile-desc_32.c 
>>> has?
>>>
>>>> SETBITFIELD_ULL(msb, lsb, value) macro is also added.
>>>
>>> Confused by the need for a "value" parameter...
>>
>> "value" is the value to be massaged (shifted, masked) into a [msb:lsb] 
>> bitfield.
> 
> OK. So it inserts a value into a bitfield.
> 
> Yes, that can be useful. Now let's find a sensible name for this.
> Perhaps inspired by a PowerPC mnemonic? At least that would be more
> obvious than "GENVALUE", IMHO...

I'm open to suggestions.

> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds
> 



Re: [PATCH v2] add equivalent of BIT(x) for bitfields

2016-12-06 Thread Sebastian Frias
On 05/12/16 19:23, Linus Torvalds wrote:
> On Mon, Dec 5, 2016 at 9:49 AM, Sebastian Frias  wrote:
>> Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
>> continuous bitfields, just as BIT(x) does for single bits.
> 
> Oh, and looking at the implementation, this is wrong. You use "lsb"
> twice, so it mustn't have side effects.
> 
> That's fine for all expected users (since you'd expect that the only
> real use of this is with constant values for msb/lsb), 

Yes, that's what I thought.

>but please add
> actual checking.

Sure! Thanks for the advice.

> 
> You can use BUILD_BUG_ON_ZERO(!__builtin_constant_p(x)) or something.
> That returns zero, so it's easy to use in expressions.

Done. v3 of the patch is submitted.

> 
>  Linus
> 



Re: [PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-06 Thread Sebastian Frias
On 05/12/16 18:48, Geert Uytterhoeven wrote:
> On Mon, Dec 5, 2016 at 2:36 PM, Sebastian Frias  wrote:
>> Introduce SETBITFIELD(msb, lsb, value) macro to ease dealing with
>> continuous bitfields, just as BIT(x) does for single bits.
> 
> If it's a bitfield, why not calling it that way?
> 

I don't know if you saw v2 (or v3 for that matter), but the name was changed
to GENVALUE.
Also a small use case was added to the commit message:

"Introduce GENVALUE(msb, lsb, value) macro..."
"...This is useful mostly for creating values to be packed together
via OR operations, ex:

   u32 val = 0x;
   val |= GENVALUE(19, 12, 0x5a);

now 'val = 0x1115a000'"

> So what about BITFIELD(start ,size), like arch/tile/kernel/tile-desc_32.c has?
> 
>> SETBITFIELD_ULL(msb, lsb, value) macro is also added.
> 
> Confused by the need for a "value" parameter...

"value" is the value to be massaged (shifted, masked) into a [msb:lsb] bitfield.

> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds
> 



[PATCH v3] add equivalent of BIT(x) for bitfields

2016-12-06 Thread Sebastian Frias
Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
continuous bitfields, just as BIT(x) does for single bits.

GENVALUE_ULL(msb, lsb, value) macro is also added.

This is useful mostly for creating values to be packed together
via OR operations, ex:

   u32 val = 0x;
   val |= GENVALUE(19, 12, 0x5a);

now 'val = 0x1115a000'


Signed-off-by: Sebastian Frias 
Link: https://marc.info/?l=linux-kernel&m=148094498711000&w=2
---

Change in v2:
- rename the macro to GENVALUE as proposed by Linus
- longer comment attempts to show use case for the macro as
proposed by Borislav

Change in v3:
- use BUILD_BUG_ON_ZERO() to break if some input parameters
(essentially 'lsb' but also 'msb') are not constants as
proposed by Linus.
Indeed, 'lsb' is used twice so it cannot have side-effects;
'msb' is subjected to same constraints for consistency.

---
 include/linux/bitops.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..419529a0 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,31 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#ifdef __KERNEL__
+/*
+ * Equivalent of BIT(x) but for contiguous bitfields
+ * GENVALUE(1, 0,0xff) = 0x0003
+ * GENVALUE(3, 0,0xff) = 0x000f
+ * GENVALUE(15,8,0xff) = 0xff00
+ * GENVALUE(6, 6,   1) = 0x0040 == BIT(6)
+ */
+#define GENVALUE(msb, lsb, val)\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK((msb), (lsb\
+   )
+
+#define GENVALUE_ULL(msb, lsb, val)\
+   (   \
+   /* BUILD_BUG_ON_ZERO() evaluates to 0 */\
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(msb)) | \
+   BUILD_BUG_ON_ZERO(!__builtin_constant_p(lsb)) | \
+   (((val) << (lsb)) & (GENMASK_ULL((msb), (lsb\
+   )
+#endif
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
-- 
1.8.3.1


[PATCH v2] add equivalent of BIT(x) for bitfields

2016-12-05 Thread Sebastian Frias
Introduce GENVALUE(msb, lsb, value) macro to ease dealing with
continuous bitfields, just as BIT(x) does for single bits.

GENVALUE_ULL(msb, lsb, value) macro is also added.

This is useful mostly for creating values to be packed together
via OR operations, ex:

   u32 val = 0x;
   val |= GENVALUE(19, 12, 0x5a);

now 'val = 0x1115a000'


Signed-off-by: Sebastian Frias 
---

Change in v2:
- rename the macro to GENVALUE as proposed by Linus
- longer comment attempts to show use case for the macro as
proposed by Borislav

---
 include/linux/bitops.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..641675d 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,20 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#ifdef __KERNEL__
+/*
+ * Equivalent of BIT(x) but for contiguous bitfields
+ * GENVALUE(1, 0,0xff) = 0x0003
+ * GENVALUE(3, 0,0xff) = 0x000f
+ * GENVALUE(15,8,0xff) = 0xff00
+ * GENVALUE(6, 6,   1) = 0x0040 == BIT(6)
+ */
+#define GENVALUE(msb, lsb, val) \
+   (((val) << (lsb)) & (GENMASK((msb), (lsb
+#define GENVALUE_ULL(msb, lsb, val) \
+   (((val) << (lsb)) & (GENMASK_ULL((msb), (lsb
+#endif
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
-- 
1.8.3.1


Re: [PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-05 Thread Sebastian Frias
On 05/12/16 18:13, Linus Torvalds wrote:
> On Mon, Dec 5, 2016 at 5:36 AM, Sebastian Frias  wrote:
>> Introduce SETBITFIELD(msb, lsb, value) macro to ease dealing with
>> continuous bitfields, just as BIT(x) does for single bits.
>>
>> SETBITFIELD_ULL(msb, lsb, value) macro is also added.
> 
> No. No, no, no.
> 
> Didn't we have this discussion already? Or was that for one of the
> other silly naming things?
> 
> That thing doesn't "SET" anything at all. It generates a value, nothing more.
> 
> So the name is completely unacceptable. It follows the convention of
> GENMASK, so maybe GENVALUE?
> 

Thanks for your input.
I was looking for suggestions on the name, thanks for yours, I will submit
a v2 with the name changed as you proposed.

> I also absolutely hate the stupid "big bit first" idiocy, but we did
> that for GENMASK too, so I guess we're stuck with that retarded model.
> 

Yes, I followed the same convention.

> Yes, I understand why it happened - people look at register definition
> graphics, and the high bits are to the left.
> 
> But when you then read the documentation, it will still say things
> like "bits 9 through 12 contain the value XYZ", because while
> individual numbers are written MSB first, we actuall _read_ left to
> right. You'd never give a range as "12 to 5", you'd say "5 to 12".
> 
>Linus
> 



Re: [PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-05 Thread Sebastian Frias
On 05/12/16 16:34, Borislav Petkov wrote:
> On Mon, Dec 05, 2016 at 02:36:07PM +0100, Sebastian Frias wrote:
>> + * Equivalent of BIT(x) but for contiguous bitfields
>> + * SETBITFIELD(1, 0,0xff) = 0x0003
>> + * SETBITFIELD(3, 0,0xff) = 0x000f
>> + * SETBITFIELD(15,8,0xff) = 0xff00
>> + * SETBITFIELD(6, 6,   1) = 0x0040 == BIT(6)
>> + */
>> +#define SETBITFIELD(msb, lsb, val) \
>> +(((val) << (lsb)) & (GENMASK((msb), (lsb
>> +#define SETBITFIELD_ULL(msb, lsb, val) \
>> +(((val) << (lsb)) & (GENMASK_ULL((msb), (lsb
> 
> What is the use case for these macros?
> 

Well, right now I'm using them for stuff like:

#define TIMEOUT_CLK_UNIT_MHZ   BIT(6)
#define BUS_CLK_FREQ_FOR_SD_CLK(x) SETBITFIELD(14,7,x)
...
val = 0;
val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
val |= BUS_CLK_FREQ_FOR_SD_CLK(200); /* SDIO clock: 200MHz */
...

which makes it very practical for writing macros for associated HW
documentation.

It is true though that the name SETBITFIELD may not be great, since the
macro is more about taking a value and massaging it to fit the given
bitfield so that it can be ORed to place said bits.
I wouldn't call it a mask, because IMHO it does not really mask (i.e.:
hides something).

> I'm thinking it would be better if they were really generic so that I
> can be able to hand in a value and to say, set bits from [lsb, msb] to
> this bit pattern. Because then you'd have to punch a hole in the value
> with a mask and then OR-in the actual mask you want to have.
> 
> I.e., something like this:
> 
> #define SET_BITMASK(val, msb, lsb, mask)\
> (val = (val & ~(GENMASK(msb, lsb))) | (mask << lsb))
> 
> So that you can be able to do:
> 
> unsigned int val = 0x;
> 
> SET_BITMASK(val, 19, 12, 0x5a)
> val = 0x1115a000;
> 
> I'm not sure, though, whether this won't make the actual use too cryptic
> and people having to go look it up each time they use it instead of
> actually doing the ORing in by hand which everyone can understand from
> staring at it instead of seeing a funny macro SET_BITMASK().
> 

Well, we could have both, with your proposed SET_BITMASK() building on top
of the proposed SETBITFIELD:

#define SET_BITMASK(target, msb, lsb, val) \
   (target = (target | SETBITFIELD(msb, lsb, val)))

> Actually, you could simply add another macro which is called
> 
> GENMASK_MASK(msb, lsb, mask)
> 
> or so (yeah, yucky name) which gives you that arbitrary mask which you
> then can use anywhere.
> 
> Because then it becomes more readable:
> 
> val &= ~GENMASK(19, 12);
> val |= GENMASK_MASK(19, 12, 0xa5);
> 
> Now you know exactly what happens: first line punches the hole, second
> ORs in the new mask.
> 
> Hmmm...
> 

Ok, so you are basically proposing to keep the macro, but with a different
name (SETBITFIELD to GENMASK_MASK).

I think GENMASK_MASK may not be much better than SETBITFIELD.


[PATCH] bitops: add equivalent of BIT(x) for bitfields

2016-12-05 Thread Sebastian Frias
Introduce SETBITFIELD(msb, lsb, value) macro to ease dealing with
continuous bitfields, just as BIT(x) does for single bits.

SETBITFIELD_ULL(msb, lsb, value) macro is also added.

Signed-off-by: Sebastian Frias 
---

Code protected with "#ifdef __KERNEL__" just as the BIT(x)
macros.

I would have preferred another name, like BITS(x) but it is
already used.

Suggestions for other names welcome.
---
 include/linux/bitops.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index a83c822..4659237 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -24,6 +24,20 @@
 #define GENMASK_ULL(h, l) \
(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h
 
+#ifdef __KERNEL__
+/*
+ * Equivalent of BIT(x) but for contiguous bitfields
+ * SETBITFIELD(1, 0,0xff) = 0x0003
+ * SETBITFIELD(3, 0,0xff) = 0x000f
+ * SETBITFIELD(15,8,0xff) = 0xff00
+ * SETBITFIELD(6, 6,   1) = 0x0040 == BIT(6)
+ */
+#define SETBITFIELD(msb, lsb, val) \
+   (((val) << (lsb)) & (GENMASK((msb), (lsb
+#define SETBITFIELD_ULL(msb, lsb, val) \
+   (((val) << (lsb)) & (GENMASK_ULL((msb), (lsb
+#endif
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
-- 
1.8.3.1



Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-12-05 Thread Sebastian Frias
Hi Doug,

On 28/11/16 19:02, Doug Anderson wrote:
> Hi,
> 
> On Mon, Nov 28, 2016 at 5:28 AM, Sebastian Frias  wrote:
>> +static void sdhci_tango4_platform_init(struct sdhci_host *host)
>> +{
>> +   printk("%s\n", __func__);
>> +
>> +   /*
>> + pad_mode[2:0]=0must be 0
>> + sel_sdio[3]=1  must be 1 for SDIO
>> + inv_sdwp_pol[4]=0  if set inverts the SD write protect polarity
>> + inv_sdcd_pol[5]=0  if set inverts the SD card present polarity
>> +   */
>> +   sdhci_writel(host, 0x0008, 0x100 + 0x0);
> 
> If I were doing this, I'd probably actually add these fields to the
> "sdhci_arasan_soc_ctl_map", then add a 3rd option to
> sdhci_arasan_syscon_write().  Right now it has 2 modes: "hiword
> update" and "non-hiword update".  You could add a 3rd indicating that
> you set config registers by just writing at some offset of the main
> SDHCI registers space.
> 
> So you'd add 4 fields:
> 
> .tango_pad_mode = { .reg = 0x, .width = 3, .shift = 0 },
> .sel_sdio = { .reg = 0x, .width = 1, .shift = 3},
> .inv_sdwp_pol = { .reg = 0x, .width = 1, .shift = 4},
> .inv_sdcd_pol = { .reg = 0x, .width = 1, .shift = 5},

Right now I'm using something like:

+   val = 0;
+   val |= PAD_MODE(0); /* must be 0 */
+   val |= SEL_SDIO;/* enable SDIO */
+   sdhci_writel(host, val, 0x100 + 0x0);
+
+   val = 0;
+   val |= TIMEOUT_CLK_UNIT_MHZ; /* unit: MHz */
+   val |= TIMEOUT_CLK_FREQ(52); /* timeout clock: 52MHz */
+   val |= BUS_CLK_FREQ_FOR_SD_CLK(200); /* SDIO clock: 200MHz (TODO: get 
from DT) */
+   val |= MAX_BLOCK_LENGTH(3);  /* max block size: 4096 bytes */
+   val |= EXTENDED_MEDIA_BUS_SUPPORTED; /* DT? */
+   val |= ADMA2_SUPPORTED;  /* DT? */
+   val |= HIGH_SPEED_SUPPORTED; /* DT? */
+   val |= SDMA_SUPPORTED;   /* DT? */
+   val |= SUSPEND_RESUME_SUPPORTED; /* DT? */
+   val |= VOLTAGE_3_3_V_SUPPORTED;  /* DT? */
+#if 0
+   val |= VOLTAGE_1_8_V_SUPPORTED;  /* DT? */
+#endif
+   val |= ASYNCHRONOUS_IRQ_SUPPORTED;   /* DT? */
+   val |= SLOT_TYPE_REMOVABLE;  /* DT? */
+   val |= SDR50_SUPPORTED;  /* DT? */
+   sdhci_writel(host, val, 0x100 + 0x40);
+
+   val = 0;
+   val |= TIMER_COUNT_FOR_RETUNING(1);  /* DT? */
+   val |= CLOCK_MULTIPLIER(0);  /* DT? */
+   val |= SPI_MODE_SUPPORTED;   /* DT? */
+   val |= SPI_BLOCK_MODE_SUPPORTED; /* DT? */
+   sdhci_writel(host, val, 0x100 + 0x44);
+
+   sdhci_writel(host, 0x0004022c, 0x100 + 0x28);
+   sdhci_writel(host, 0x0002, 0x100 + 0x2c);
+
+   sdhci_writel(host, 0x0060, 0x100 + 0x50);

which seems much easier to handle (and portable).

At any rate, one thing to note from this is that many of these
bits should probably be initialised based on DT, right?

For example, the DT has a "non-removable" property, which I think
should be used to setup SLOT_TYPE_EMBEDDED (if the property is
present) or SLOT_TYPE_REMOVABLE (if the property is not present)

Looking at Documentation/devicetree/bindings/mmc/mmc.txt we can see
more related properties:

Optional properties:
- bus-width: Number of data lines, can be <1>, <4>, or <8>.  The default
  will be <1> if the property is absent.
- wp-gpios: Specify GPIOs for write protection, see gpio binding
- cd-inverted: when present, polarity on the CD line is inverted. See the note
  below for the case, when a GPIO is used for the CD line
- wp-inverted: when present, polarity on the WP line is inverted. See the note
  below for the case, when a GPIO is used for the WP line
- disable-wp: When set no physical WP line is present. This property should
  only be specified when the controller has a dedicated write-protect
  detection logic. If a GPIO is always used for the write-protect detection
  logic it is sufficient to not specify wp-gpios property in the absence of a WP
  line.
- max-frequency: maximum operating clock frequency
- no-1-8-v: when present, denotes that 1.8v card voltage is not supported on
  this system, even if the controller claims it is.
- cap-sd-highspeed: SD high-speed timing is supported
- cap-mmc-highspeed: MMC high-speed timing is supported
- sd-uhs-sdr12: SD UHS SDR12 speed is supported
- sd-uhs-sdr25: SD UHS SDR25 speed is supported
- sd-uhs-sdr50: SD UHS SDR50 speed is supported
- sd-uhs-sdr104: SD UHS SDR104 speed is supported
- sd-uhs-ddr50: SD UHS DDR50 speed is supported
...

which makes me wonder, what is the recommended way of dealing with this?
- Should I use properties on the DT? If so, then I need to add code to set
up the register properly.
- Should I hardcode these values use a minimal DT? If so, then I

Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-12-05 Thread Sebastian Frias
Hi Doug,

On 28/11/16 18:46, Doug Anderson wrote:
> Hi,
> 
> On Mon, Nov 28, 2016 at 6:39 AM, Sebastian Frias  wrote:
>>> I will try to send another patch with what a different approach
>>>
>>
>> Here's a different approach (I just tested that it built, because I don't 
>> have the
>> rk3399 platform):
>>
>> diff --git a/drivers/mmc/host/sdhci-of-arasan.c 
>> b/drivers/mmc/host/sdhci-of-arasan.c
>> index 410a55b..5be6e67 100644
>> --- a/drivers/mmc/host/sdhci-of-arasan.c
>> +++ b/drivers/mmc/host/sdhci-of-arasan.c
>> @@ -382,22 +382,6 @@ static int sdhci_arasan_resume(struct device *dev)
>>  static SIMPLE_DEV_PM_OPS(sdhci_arasan_dev_pm_ops, sdhci_arasan_suspend,
>>  sdhci_arasan_resume);
>>
>> -static const struct of_device_id sdhci_arasan_of_match[] = {
>> -   /* SoC-specific compatible strings w/ soc_ctl_map */
>> -   {
>> -   .compatible = "rockchip,rk3399-sdhci-5.1",
>> -   .data = &rk3399_soc_ctl_map,
>> -   },
>> -
>> -   /* Generic compatible below here */
>> -   { .compatible = "arasan,sdhci-8.9a" },
>> -   { .compatible = "arasan,sdhci-5.1" },
>> -   { .compatible = "arasan,sdhci-4.9a" },
>> -
>> -   { /* sentinel */ }
>> -};
>> -MODULE_DEVICE_TABLE(of, sdhci_arasan_of_match);
>> -
>>  /**
>>   * sdhci_arasan_sdcardclk_recalc_rate - Return the card clock rate
>>   *
>> @@ -578,28 +562,18 @@ static void sdhci_arasan_unregister_sdclk(struct 
>> device *dev)
>> of_clk_del_provider(dev->of_node);
>>  }
>>
>> -static int sdhci_arasan_probe(struct platform_device *pdev)
>> +static int sdhci_rockchip_platform_init(struct sdhci_host *host,
>> +   struct platform_device *pdev)
>>  {
>> int ret;
>> -   const struct of_device_id *match;
>> struct device_node *node;
>> -   struct clk *clk_xin;
>> -   struct sdhci_host *host;
>> struct sdhci_pltfm_host *pltfm_host;
>> struct sdhci_arasan_data *sdhci_arasan;
>> -   struct device_node *np = pdev->dev.of_node;
>> -
>> -   host = sdhci_pltfm_init(pdev, &sdhci_arasan_pdata,
>> -   sizeof(*sdhci_arasan));
>> -   if (IS_ERR(host))
>> -   return PTR_ERR(host);
>>
>> pltfm_host = sdhci_priv(host);
>> sdhci_arasan = sdhci_pltfm_priv(pltfm_host);
>> -   sdhci_arasan->host = host;
>>
>> -   match = of_match_node(sdhci_arasan_of_match, pdev->dev.of_node);
>> -   sdhci_arasan->soc_ctl_map = match->data;
>> +   sdhci_arasan->soc_ctl_map = &rk3399_soc_ctl_map;
>>
>> node = of_parse_phandle(pdev->dev.of_node, "arasan,soc-ctl-syscon", 
>> 0);
>> if (node) {
>> @@ -611,10 +585,107 @@ static int sdhci_arasan_probe(struct platform_device 
>> *pdev)
>> if (ret != -EPROBE_DEFER)
>> dev_err(&pdev->dev, "Can't get syscon: %d\n",
>> ret);
>> -   goto err_pltfm_free;
>> +   return -1;
>> }
>> }
>>
>> +   if (of_property_read_bool(pdev->dev.of_node, 
>> "xlnx,fails-without-test-cd"))
>> +   sdhci_arasan->quirks |= SDHCI_ARASAN_QUIRK_FORCE_CDTEST;
>> +
>> +   return 0;
>> +}
>> +
>> +static int sdhci_rockchip_clock_init(struct sdhci_host *host,
>> +   struct platform_device *pdev)
>> +{
>> +   struct sdhci_pltfm_host *pltfm_host;
>> +   struct sdhci_arasan_data *sdhci_arasan;
>> +
>> +   pltfm_host = sdhci_priv(host);
>> +   sdhci_arasan = sdhci_pltfm_priv(pltfm_host);
>> +
>> +   if (of_device_is_compatible(pdev->dev.of_node,
>> +   "rockchip,rk3399-sdhci-5.1"))
>> +   sdhci_arasan_update_clockmultiplier(host, 0x0);
> 
> This _does_ belong in a Rockchip-specific init function, for now.

I'm not sure I understood. Are you saying that you agree to put this
into a specific function? Essentially agreeing with the concept the
patch is putting forward?

Note, I'm more interested in the concept (i.e.: init functions) and less
in knowing if my patch (which was a quick and dirty thing) properly moved
the 

Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-11-28 Thread Sebastian Frias
On 28/11/16 14:28, Sebastian Frias wrote:
> On 28/11/16 12:44, Adrian Hunter wrote:
>> On 28/11/16 13:20, Sebastian Frias wrote:
>>> Hi Adrian,
>>>
>>> On 28/11/16 11:30, Adrian Hunter wrote:
>>>> On 28/11/16 09:32, Michal Simek wrote:
>>>>> +Sai for Xilinx perspective.
>>>>>
>>>>> On 25.11.2016 16:24, Sebastian Frias wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When using the Arasan SDHCI HW IP, there is a set of parameters called
>>>>>> "Hardware initialized registers"
>>>>>>
>>>>>> (Table 7, Section "Pin Signals", page 56 of Arasan "SD3.0/SDIO3.0/eMMC4.4
>>>>>> AHB Host Controller", revision 6.0 document)
>>>>>>
>>>>>> In some platforms those signals are connected to registers that need to
>>>>>> be programmed at some point for proper driver/HW initialisation.
>>>>>>
>>>>>> I found that the 'struct sdhci_ops' contains a '.platform_init' callback
>>>>>> that is called from within 'sdhci_pltfm_init', and that seems a good
>>>>>> candidate for a place to program those registers (*).
>>>>>>
>>>>>> Do you agree?
>>>>
>>>> We already killed .platform_init
>>>
>>> I just saw that, yet it was the perfect place for the HW initialisation I'm
>>> talking about.
>>> Any way we can restore it?
>>
>> It doesn't serve any purpose I am aware of.
> 
> It would serve (for me) if it was there :-)
> 
>>
>>>
>>>>
>>>> What is wrong with sdhci_arasan_probe()?
>>>
>>> Well, in 4.7 sdhci_arasan_probe() did not call of_match_device(), so I had
>>> put a call to it just before sdhci_pltfm_init(), something like:
>>>
>>> +static const struct of_device_id sdhci_arasan_of_match[] = {
>>> +   {
>>> +   .compatible = "arasan,sdhci-8.9a",
>>> +   .data = &sdhci_arasan_ops,
>>> +   },
>>> +   {
>>> +   .compatible = "arasan,sdhci-5.1",
>>> +   .data = &sdhci_arasan_ops,
>>> +   },
>>> +   {
>>> +   .compatible = "arasan,sdhci-4.9a",
>>> +   .data = &sdhci_arasan_ops,
>>> +   },
>>> +   {
>>> +   .compatible = "sigma,smp8734-sdio",
>>> +   .data = &sdhci_arasan_tango4_ops,
>>> +   },
>>> +   { }
>>> +};
>>> +MODULE_DEVICE_TABLE(of, sdhci_arasan_of_match);
>>>
>>> ...
>>>
>>> +   const struct of_device_id *match;
>>> +
>>> +   match = of_match_device(sdhci_arasan_of_match, &pdev->dev);
>>> +   if (match)
>>> +   sdhci_arasan_pdata.ops = match->data;
>>>
>>> where 'sdhci_arasan_tango4_ops' contained a pointer to a .platform_init
>>> callback.
>>>
>>> However, as I stated earlier, an upstream commit:
>>>
>>> commit 3ea4666e8d429223fbb39c1dccee7599ef7657d5
>>> Author: Douglas Anderson 
>>> Date:   Mon Jun 20 10:56:47 2016 -0700
>>>
>>> mmc: sdhci-of-arasan: Properly set corecfg_baseclkfreq on rk3399
>>>
>>> changed struct 'sdhci_arasan_of_match' to convey different data, which
>>> means that instead of having a generic way of accessing such data (such
>>> as 'of_match_device()' and ".data" field), one must also check for
>>> specific "compatible" strings to make sense of the ".data" field, such as
>>> "rockchip,rk3399-sdhci-5.1"
>>>
>>> With the current code:
>>> - there's no 'of_match_device()' before 'sdhci_pltfm_init()'
>>> - the sdhci_pltfm_init() call is made with a static 'sdhci_arasan_pdata'
>>> struct (so it cannot be made dependent on the "compatible" string).
>>> - since 'sdhci_arasan_pdata' is the same for all compatible devices, even
>>> for those that require special handling, more "compatible" matching code is
>>> required
>>> - leading to spread "compatible" matching code; IMHO it would be cleaner if
>>> the 'sdhci_arasan_probe()' code was generic, with just a generic 
>>

Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-11-28 Thread Sebastian Frias
On 28/11/16 12:44, Adrian Hunter wrote:
> On 28/11/16 13:20, Sebastian Frias wrote:
>> Hi Adrian,
>>
>> On 28/11/16 11:30, Adrian Hunter wrote:
>>> On 28/11/16 09:32, Michal Simek wrote:
>>>> +Sai for Xilinx perspective.
>>>>
>>>> On 25.11.2016 16:24, Sebastian Frias wrote:
>>>>> Hi,
>>>>>
>>>>> When using the Arasan SDHCI HW IP, there is a set of parameters called
>>>>> "Hardware initialized registers"
>>>>>
>>>>> (Table 7, Section "Pin Signals", page 56 of Arasan "SD3.0/SDIO3.0/eMMC4.4
>>>>> AHB Host Controller", revision 6.0 document)
>>>>>
>>>>> In some platforms those signals are connected to registers that need to
>>>>> be programmed at some point for proper driver/HW initialisation.
>>>>>
>>>>> I found that the 'struct sdhci_ops' contains a '.platform_init' callback
>>>>> that is called from within 'sdhci_pltfm_init', and that seems a good
>>>>> candidate for a place to program those registers (*).
>>>>>
>>>>> Do you agree?
>>>
>>> We already killed .platform_init
>>
>> I just saw that, yet it was the perfect place for the HW initialisation I'm
>> talking about.
>> Any way we can restore it?
> 
> It doesn't serve any purpose I am aware of.

It would serve (for me) if it was there :-)

> 
>>
>>>
>>> What is wrong with sdhci_arasan_probe()?
>>
>> Well, in 4.7 sdhci_arasan_probe() did not call of_match_device(), so I had
>> put a call to it just before sdhci_pltfm_init(), something like:
>>
>> +static const struct of_device_id sdhci_arasan_of_match[] = {
>> +   {
>> +   .compatible = "arasan,sdhci-8.9a",
>> +   .data = &sdhci_arasan_ops,
>> +   },
>> +   {
>> +   .compatible = "arasan,sdhci-5.1",
>> +   .data = &sdhci_arasan_ops,
>> +   },
>> +   {
>> +   .compatible = "arasan,sdhci-4.9a",
>> +   .data = &sdhci_arasan_ops,
>> +   },
>> +   {
>> +   .compatible = "sigma,smp8734-sdio",
>> +   .data = &sdhci_arasan_tango4_ops,
>> +   },
>> +   { }
>> +};
>> +MODULE_DEVICE_TABLE(of, sdhci_arasan_of_match);
>>
>> ...
>>
>> +   const struct of_device_id *match;
>> +
>> +   match = of_match_device(sdhci_arasan_of_match, &pdev->dev);
>> +   if (match)
>> +   sdhci_arasan_pdata.ops = match->data;
>>
>> where 'sdhci_arasan_tango4_ops' contained a pointer to a .platform_init
>> callback.
>>
>> However, as I stated earlier, an upstream commit:
>>
>> commit 3ea4666e8d429223fbb39c1dccee7599ef7657d5
>> Author: Douglas Anderson 
>> Date:   Mon Jun 20 10:56:47 2016 -0700
>>
>> mmc: sdhci-of-arasan: Properly set corecfg_baseclkfreq on rk3399
>>
>> changed struct 'sdhci_arasan_of_match' to convey different data, which
>> means that instead of having a generic way of accessing such data (such
>> as 'of_match_device()' and ".data" field), one must also check for
>> specific "compatible" strings to make sense of the ".data" field, such as
>> "rockchip,rk3399-sdhci-5.1"
>>
>> With the current code:
>> - there's no 'of_match_device()' before 'sdhci_pltfm_init()'
>> - the sdhci_pltfm_init() call is made with a static 'sdhci_arasan_pdata'
>> struct (so it cannot be made dependent on the "compatible" string).
>> - since 'sdhci_arasan_pdata' is the same for all compatible devices, even
>> for those that require special handling, more "compatible" matching code is
>> required
>> - leading to spread "compatible" matching code; IMHO it would be cleaner if
>> the 'sdhci_arasan_probe()' code was generic, with just a generic "compatible"
>> matching, which then proceeded with specific initialisation and generic
>> initialisation.
>>
>> In a nutshell, IMHO it would be better if adding support for more SoCs only
>> involved changing just 'sdhci_arasan_of_match' without the need to change
>> 'sdhci_arasan_probe()'.
>> That would clearly separate the generic and "SoC"-specific code, thus 
>&

Re: Adding a .platform_init callback to sdhci_arasan_ops

2016-11-28 Thread Sebastian Frias
Hi Adrian,

On 28/11/16 11:30, Adrian Hunter wrote:
> On 28/11/16 09:32, Michal Simek wrote:
>> +Sai for Xilinx perspective.
>>
>> On 25.11.2016 16:24, Sebastian Frias wrote:
>>> Hi,
>>>
>>> When using the Arasan SDHCI HW IP, there is a set of parameters called
>>> "Hardware initialized registers"
>>>
>>> (Table 7, Section "Pin Signals", page 56 of Arasan "SD3.0/SDIO3.0/eMMC4.4
>>> AHB Host Controller", revision 6.0 document)
>>>
>>> In some platforms those signals are connected to registers that need to
>>> be programmed at some point for proper driver/HW initialisation.
>>>
>>> I found that the 'struct sdhci_ops' contains a '.platform_init' callback
>>> that is called from within 'sdhci_pltfm_init', and that seems a good
>>> candidate for a place to program those registers (*).
>>>
>>> Do you agree?
> 
> We already killed .platform_init

I just saw that, yet it was the perfect place for the HW initialisation I'm
talking about.
Any way we can restore it?

> 
> What is wrong with sdhci_arasan_probe()?

Well, in 4.7 sdhci_arasan_probe() did not call of_match_device(), so I had
put a call to it just before sdhci_pltfm_init(), something like:

+static const struct of_device_id sdhci_arasan_of_match[] = {
+   {
+   .compatible = "arasan,sdhci-8.9a",
+   .data = &sdhci_arasan_ops,
+   },
+   {
+   .compatible = "arasan,sdhci-5.1",
+   .data = &sdhci_arasan_ops,
+   },
+   {
+   .compatible = "arasan,sdhci-4.9a",
+   .data = &sdhci_arasan_ops,
+   },
+   {
+   .compatible = "sigma,smp8734-sdio",
+   .data = &sdhci_arasan_tango4_ops,
+   },
+   { }
+};
+MODULE_DEVICE_TABLE(of, sdhci_arasan_of_match);

...

+   const struct of_device_id *match;
+
+   match = of_match_device(sdhci_arasan_of_match, &pdev->dev);
+   if (match)
+   sdhci_arasan_pdata.ops = match->data;

where 'sdhci_arasan_tango4_ops' contained a pointer to a .platform_init
callback.

However, as I stated earlier, an upstream commit:

commit 3ea4666e8d429223fbb39c1dccee7599ef7657d5
Author: Douglas Anderson 
Date:   Mon Jun 20 10:56:47 2016 -0700

mmc: sdhci-of-arasan: Properly set corecfg_baseclkfreq on rk3399

changed struct 'sdhci_arasan_of_match' to convey different data, which
means that instead of having a generic way of accessing such data (such
as 'of_match_device()' and ".data" field), one must also check for
specific "compatible" strings to make sense of the ".data" field, such as
"rockchip,rk3399-sdhci-5.1"

With the current code:
- there's no 'of_match_device()' before 'sdhci_pltfm_init()'
- the sdhci_pltfm_init() call is made with a static 'sdhci_arasan_pdata'
struct (so it cannot be made dependent on the "compatible" string).
- since 'sdhci_arasan_pdata' is the same for all compatible devices, even
for those that require special handling, more "compatible" matching code is
required
- leading to spread "compatible" matching code; IMHO it would be cleaner if
the 'sdhci_arasan_probe()' code was generic, with just a generic "compatible"
matching, which then proceeded with specific initialisation and generic
initialisation.

In a nutshell, IMHO it would be better if adding support for more SoCs only
involved changing just 'sdhci_arasan_of_match' without the need to change
'sdhci_arasan_probe()'.
That would clearly separate the generic and "SoC"-specific code, thus allowing
better maintenance.

Does that makes sense to you guys?

Best regards,

Sebastian

> 
>>>
>>> Best regards,
>>>
>>> Sebastian
>>>
>>>
>>> (*): This has been prototyped on 4.7 as working properly.
>>> However, upstream commit:
>>>
>>> commit 3ea4666e8d429223fbb39c1dccee7599ef7657d5
>>> Author: Douglas Anderson 
>>> Date:   Mon Jun 20 10:56:47 2016 -0700
>>>
>>> mmc: sdhci-of-arasan: Properly set corecfg_baseclkfreq on rk3399
>>> ...
>>>
>>> could affect this solution because of the way the 'sdhci_arasan_of_match'
>>> struct is used after that commit.
>>>
>>
>>
> 



Adding a .platform_init callback to sdhci_arasan_ops

2016-11-25 Thread Sebastian Frias
Hi,

When using the Arasan SDHCI HW IP, there is a set of parameters called
"Hardware initialized registers"

(Table 7, Section "Pin Signals", page 56 of Arasan "SD3.0/SDIO3.0/eMMC4.4
AHB Host Controller", revision 6.0 document)

In some platforms those signals are connected to registers that need to
be programmed at some point for proper driver/HW initialisation.

I found that the 'struct sdhci_ops' contains a '.platform_init' callback
that is called from within 'sdhci_pltfm_init', and that seems a good
candidate for a place to program those registers (*).

Do you agree?

Best regards,

Sebastian


(*): This has been prototyped on 4.7 as working properly.
However, upstream commit:

commit 3ea4666e8d429223fbb39c1dccee7599ef7657d5
Author: Douglas Anderson 
Date:   Mon Jun 20 10:56:47 2016 -0700

mmc: sdhci-of-arasan: Properly set corecfg_baseclkfreq on rk3399
...

could affect this solution because of the way the 'sdhci_arasan_of_match'
struct is used after that commit.


Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-14 Thread Sebastian Frias
On 11/09/2016 06:03 PM, Florian Fainelli wrote:
> On 11/09/2016 05:02 AM, Sebastian Frias wrote:
>> On 11/04/2016 05:49 PM, Måns Rullgård wrote:
>>>>> But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
>>>>> will apply the delay.
>>>>>
>>>>> I think a better way of dealing with this is that both, PHY and MAC
>>>>> drivers exchange information so that the delay is applied only once.
>>>>
>>>> Exchange what information? The PHY device interface (phydev->interface)
>>>> conveys the needed information for both entities.
>>>
>>> There doesn't seem to be any consensus among the drivers regarding where
>>> the delay should be applied.  Since only a few drivers, MAC or PHY, act
>>> on this property, most combinations still work by chance.  It is common
>>> for boards to set the delay at the PHY using external config pins so no
>>> software setup is required (although I have one Sigma based board that
>>> gets this wrong).  I suspect if drivers/net/ethernet/broadcom/genet were
>>> used with one of the four PHY drivers that also set the delay based on
>>> this DT property, things would go wrong.
>>>
>>
>> Exactly, what about a patch like (I can make a formal submission, even
>> merge it with the patch discussed in this thread, consider this a RFC):
> 
> I really don't see a point in doing this when we can just clarify what
> phydev->interface does and already have the knowledge that we need
> without introducing additional flags in the phy driver.
> 

Ok, so who can clarify what "phydev->interface" does, especially in the
context of this discussion?

What happens when a TX delay must be applied and:
  - both the PHY and the MAC support the delay
  - only the PHY supports the delay
  - only the MAC supports the delay

Best regards,

Sebastian


Re: Is Documentation/networking/phy.txt still up-to-date?

2016-11-14 Thread Sebastian Frias
On 11/09/2016 06:07 PM, Florian Fainelli wrote:
> On 11/09/2016 05:24 AM, Sebastian Frias wrote:
>> Hi,
>>
>> Documentation/networking/phy.txt discusses phy_connect and states that:
>>
>>  "...
>>
>>  interface is a u32 which specifies the connection type used
>>  between the controller and the PHY.  Examples are GMII, MII,
>>  RGMII, and SGMII.  For a full list, see include/linux/phy.h
>>
>>  Now just make sure that phydev->supported and phydev->advertising have any
>>  values pruned from them which don't make sense for your controller (a 10/100
>>  controller may be connected to a gigabit capable PHY, so you would need to
>>  mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
>>  for these bitfields. Note that you should not SET any bits, or the PHY may
>>  get put into an unsupported state.
>>
>>  ..."
>>
>> However, 'drivers/net/ethernet/aurora/nb8800.c' for example, does SETs some
>> bits (in function 'nb8800_pause_adv').
> 
> All pause/flow control related bits should be set by the Ethernet MAC
> driver because this is an Ethernet MAC, not PHY, thing. See this
> discussion for some details:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg135347.html
> 
> So the nb8800 drivers does the correct thing here, but the documentation
> should be updated to reflect that this applies to all bits, except the
> Pause capabilities because these need to come from the Ethernet MAC.
> 

Ok, thanks.

>>
>> I checked 'drivers/net/ethernet/broadcom/genet/bcmmii.c' and that one CLEARs
>> bits (as per the documentation).
>>
>> Does anybody knows what is the correct/recommended approach?
> 
> Both drivers do correct things, they just don't set the same things here.
> 

Thanks!


Is Documentation/networking/phy.txt still up-to-date?

2016-11-09 Thread Sebastian Frias
Hi,

Documentation/networking/phy.txt discusses phy_connect and states that:

 "...

 interface is a u32 which specifies the connection type used
 between the controller and the PHY.  Examples are GMII, MII,
 RGMII, and SGMII.  For a full list, see include/linux/phy.h

 Now just make sure that phydev->supported and phydev->advertising have any
 values pruned from them which don't make sense for your controller (a 10/100
 controller may be connected to a gigabit capable PHY, so you would need to
 mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
 for these bitfields. Note that you should not SET any bits, or the PHY may
 get put into an unsupported state.

 ..."

However, 'drivers/net/ethernet/aurora/nb8800.c' for example, does SETs some
bits (in function 'nb8800_pause_adv').

I checked 'drivers/net/ethernet/broadcom/genet/bcmmii.c' and that one CLEARs
bits (as per the documentation).

Does anybody knows what is the correct/recommended approach?

Best regards,

Sebastian


Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-09 Thread Sebastian Frias
On 11/04/2016 05:49 PM, Måns Rullgård wrote:
>>> But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
>>> will apply the delay.
>>>
>>> I think a better way of dealing with this is that both, PHY and MAC
>>> drivers exchange information so that the delay is applied only once.
>>
>> Exchange what information? The PHY device interface (phydev->interface)
>> conveys the needed information for both entities.
> 
> There doesn't seem to be any consensus among the drivers regarding where
> the delay should be applied.  Since only a few drivers, MAC or PHY, act
> on this property, most combinations still work by chance.  It is common
> for boards to set the delay at the PHY using external config pins so no
> software setup is required (although I have one Sigma based board that
> gets this wrong).  I suspect if drivers/net/ethernet/broadcom/genet were
> used with one of the four PHY drivers that also set the delay based on
> this DT property, things would go wrong.
> 

Exactly, what about a patch like (I can make a formal submission, even
merge it with the patch discussed in this thread, consider this a RFC):

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index fba2699..4217ff4 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1283,6 +1283,10 @@ static int nb8800_tangox_init(struct net_device *dev)
case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
+
+   if ((dev->phydev->flags & PHY_SUPPORTS_TXID) == 0)
+   pad_mode |= PAD_MODE_GTX_CLK_DELAY;
+
break;
 
default:
diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 2e0c759..5eddb04 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -426,7 +426,9 @@ static int at803x_aneg_done(struct phy_device *phydev)
.suspend= at803x_suspend,
.resume = at803x_resume,
.features   = PHY_GBIT_FEATURES,
-   .flags  = PHY_HAS_INTERRUPT,
+   .flags  = PHY_HAS_INTERRUPT |
+ PHY_SUPPORTS_RXID |
+ PHY_SUPPORTS_TXID,
.config_aneg= genphy_config_aneg,
.read_status= genphy_read_status,
.ack_interrupt  = at803x_ack_interrupt,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index e7e1fd3..0f0b17e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -61,6 +61,8 @@
 #define PHY_HAS_INTERRUPT  0x0001
 #define PHY_HAS_MAGICANEG  0x0002
 #define PHY_IS_INTERNAL0x0004
+#define PHY_SUPPORTS_RXID   0x0008
+#define PHY_SUPPORTS_TXID   0x0010
 #define MDIO_DEVICE_IS_PHY 0x8000
 
 /* Interface Mode definitions */



Re: [PATCH v3 2/2] net: ethernet: nb8800: handle all RGMII definitions

2016-11-07 Thread Sebastian Frias
Hi Måns,

On 11/05/2016 01:58 PM, Måns Rullgård wrote:
>>  if (gigabit) {
>> -if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
>> +if (phy_interface_is_rgmii(phydev))
>>  mac_mode |= RGMII_MODE;
>>
>>  mac_mode |= GMAC_MODE;
> 
> As I said before, this part can/should be applied separately, although
> personally I probably wouldn't have bothered adding a single-use variable.

It was for consistency with other functions that use 'phydev', but I don't
mind making the changes.

Just to be clear, when you say "can/should be applied separately", do you
mean that this patch should not be part of a series, and that I should split
the series into separate patches?

Best regards,

Sebastian


Re: [PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias
Hi David,

On 11/04/2016 06:54 PM, David Miller wrote:
> From: Sebastian Frias 
> Date: Fri, 4 Nov 2016 18:02:15 +0100
> 
>> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
>> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
>> delay for tx and rx only") added several RGMII definitions:
>> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
>> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
>>
>> Those are all RGMII modes (1Gbit) and must be considered that way when
>> setting the MAC mode or the pad mode for the HW to work properly.
>>
>> Signed-off-by: Sebastian Frias 
> 
> You cannot just repost one part of a patch series when you make changes.
> 
> You must always repost the entire series as a new fresh version, with
> changelog entries added to your "[PATCH v2 0/2] ..." header posting.
> 

Thanks for the information.

I sent v2, and then v3, because v2 had formatting issues, hopefully it is
ok now.

Best regards,

Sebastian


[PATCH v3 2/2] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias
Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
 static void nb8800_mac_config(struct net_device *dev)
 {
struct nb8800_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;
 
if (gigabit) {
-   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;
 
mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII:
-   pad_mode = PAD_MODE_RGMII;
-   break;
-
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2




[PATCH v3 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
The delay can be applied at PHY or MAC level, but since
PHY drivers will apply the delay at PHY level when using
one of the "internal delay" declinations of RGMII mode
(like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
at MAC level causes issues.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index b59aa35..d2855c9 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII_TXID:
-   pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+   pad_mode = PAD_MODE_RGMII;
break;
 
default:
-- 
1.7.11.2




[PATCH v3 0/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
This is v3 of the series, it fixes formatting issues of v2.

In v2 of the series, only the second patch:
"net: ethernet: nb8800: handle all RGMII definitions" is modified
to account for Florian's suggestion.

Sebastian Frias (2):
  net: ethernet: nb8800: Do not apply TX delay at MAC level
  net: ethernet: nb8800: handle all RGMII definitions

 drivers/net/ethernet/aurora/nb8800.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.11.2


[PATCH v2 2/2] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias
Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
drivers/net/ethernet/aurora/nb8800.c | 8 
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
static void nb8800_mac_config(struct net_device *dev)
{
struct nb8800_priv *priv = netdev_priv(dev);
+ struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;

if (gigabit) {
- if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+ if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;

mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;

case PHY_INTERFACE_MODE_RGMII:
- pad_mode = PAD_MODE_RGMII;
- break;
-
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2




[PATCH v2 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
The delay can be applied at PHY or MAC level, but since
PHY drivers will apply the delay at PHY level when using
one of the "internal delay" declinations of RGMII mode
(like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
at MAC level causes issues.

Signed-off-by: Sebastian Frias 
---
drivers/net/ethernet/aurora/nb8800.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index b59aa35..d2855c9 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
break;

case PHY_INTERFACE_MODE_RGMII_TXID:
- pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+ pad_mode = PAD_MODE_RGMII;
break;

default:
-- 
1.7.11.2



[PATCH v2 0/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
This is v2 of the series, only the second patch:
"net: ethernet: nb8800: handle all RGMII definitions" is modified
to account for Florian's suggestion.

Sebastian Frias (2):
  net: ethernet: nb8800: Do not apply TX delay at MAC level
  net: ethernet: nb8800: handle all RGMII definitions

 drivers/net/ethernet/aurora/nb8800.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.11.2


[PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias
Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
 static void nb8800_mac_config(struct net_device *dev)
 {
struct nb8800_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;
 
if (gigabit) {
-   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;
 
mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII:
-   pad_mode = PAD_MODE_RGMII;
-   break;
-
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2




Re: [PATCH 2/2] net: ethernet: nb8800: handle all RGMII declinations

2016-11-04 Thread Sebastian Frias
On 11/04/2016 05:23 PM, Florian Fainelli wrote:
> 
> 
> On 11/04/2016 08:05 AM, Sebastian Frias wrote:
>> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
>> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
>> delay for tx and rx only") added several RGMII declinations:
>> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
>> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
>>
>> Those are all RGMII modes (1Gbit) and must be considered that way when
>> setting the MAC Mode or the Pads Mode for the HW to work properly.
>>
>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index d2855c9..6230ace 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -609,7 +609,10 @@ static void nb8800_mac_config(struct net_device *dev)
>>  mac_mode |= HALF_DUPLEX;
>>  
>>  if (gigabit) {
>> -if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
>> +if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_ID ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_RXID ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_TXID)
> 
> phy_interface_is_rgmii(phydev)?

Thanks! I'll post an update.

> 
>>  mac_mode |= RGMII_MODE;
>>  
>>  mac_mode |= GMAC_MODE;
>> @@ -1278,9 +1281,8 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>  
>>  case PHY_INTERFACE_MODE_RGMII:
>> -pad_mode = PAD_MODE_RGMII;
>> -break;
>> -
>> +case PHY_INTERFACE_MODE_RGMII_ID:
>> +case PHY_INTERFACE_MODE_RGMII_RXID:
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>>  pad_mode = PAD_MODE_RGMII;
>>  break;
>>
> 



Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
Hi Måns,

On 11/04/2016 04:18 PM, Måns Rullgård wrote:
> Sebastian Frias  writes:
> 
>> The delay can be applied at PHY or MAC level, but since
>> PHY drivers will apply the delay at PHY level when using
>> one of the "internal delay" declinations of RGMII mode
>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>> at MAC level causes issues.
> 
> The Broadcom GENET driver does the same thing.
> 

Well, I don't know who uses that driver, or why they did it that way.

However, with the current code and DT bindings, if one requires
the delay, phy-connection-type="rgmii-txid" must be set.

But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
will apply the delay.

I think a better way of dealing with this is that both, PHY and MAC
drivers exchange information so that the delay is applied only once.

I can see how to do that in another patch set.

>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index b59aa35..d2855c9 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>> +pad_mode = PAD_MODE_RGMII;
>>  break;
>>
>>  default:
>> -- 
>> 1.7.11.2
> 
> If this change is correct (and I'm not convinced it is), that case
> should be merged with the one above it and PHY_INTERFACE_MODE_RGMII_RXID
> added as well.
> 

I can do a single patch.

The reason I made two patches was that it was clear what this patch
does, i.e.: do not apply the delay at MAC level, and what the subsequent
patch does, i.e.: handle all RGMII declinations.

Best regards,

Sebastian


Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
Hi Andrew,

On 11/04/2016 04:11 PM, Andrew Lunn wrote:
> On Fri, Nov 04, 2016 at 04:02:24PM +0100, Sebastian Frias wrote:
>> The delay can be applied at PHY or MAC level, but since
>> PHY drivers will apply the delay at PHY level when using
>> one of the "internal delay" declinations of RGMII mode
>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>> at MAC level causes issues.
>>
>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index b59aa35..d2855c9 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>  
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>> +pad_mode = PAD_MODE_RGMII;
>>  break;
> 
> How many boards use this Ethernet driver? How many boards are your
> potentially breaking, because they need this delay?
> 

This part is specific to the Tango architecture, as noted by the
function name "nb8800_tangox_init".

Also the register used here is Sigma-specific (i.e.: not related to the
Aurora VLSI MAC, "au-nb8800")

The thing is that without this patch if we set
phy-connection-type="rgmii-txid" on the DT, then both, the PHY and the
MAC, will apply the delay.

Best regards,

Sebastian

> I guess it is a small number, because doesn't it require the PHY is
> also broken, not adding a delay when it should?
> 
>  Andrew
> 



[PATCH 2/2] net: ethernet: nb8800: handle all RGMII declinations

2016-11-04 Thread Sebastian Frias
Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII declinations:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC Mode or the Pads Mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..6230ace 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -609,7 +609,10 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;
 
if (gigabit) {
-   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII ||
+   priv->phy_mode == PHY_INTERFACE_MODE_RGMII_ID ||
+   priv->phy_mode == PHY_INTERFACE_MODE_RGMII_RXID ||
+   priv->phy_mode == PHY_INTERFACE_MODE_RGMII_TXID)
mac_mode |= RGMII_MODE;
 
mac_mode |= GMAC_MODE;
@@ -1278,9 +1281,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII:
-   pad_mode = PAD_MODE_RGMII;
-   break;
-
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2




[PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias
The delay can be applied at PHY or MAC level, but since
PHY drivers will apply the delay at PHY level when using
one of the "internal delay" declinations of RGMII mode
(like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
at MAC level causes issues.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index b59aa35..d2855c9 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII_TXID:
-   pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+   pad_mode = PAD_MODE_RGMII;
break;
 
default:
-- 
1.7.11.2



Re: Disabling an interrupt in the handler locks the system up

2016-10-25 Thread Sebastian Frias
Hi Thomas,

On 10/24/2016 06:55 PM, Thomas Gleixner wrote:
> On Mon, 24 Oct 2016, Mason wrote:
>>
>> For the record, setting the IRQ_DISABLE_UNLAZY flag for this device
>> makes the system lock-up disappear.
> 
> The way how lazy irq disabling works is:
> 
> 1) Interrupt is marked disabled in software, but the hardware is not masked
> 
> 2) If the interrupt fires befor the interrupt is reenabled, then it's
>masked at the hardware level in the low level interrupt flow handler.
> 

Would you mind explaining what is the intention behind?
Because it does not seem obvious why there isn't a direct map between
"disable_irq*()" and "mask_irq()"

Thanks in advance.
Best regards,

Sebastian



Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-14 Thread Sebastian Frias
Hi Mark,

On 09/13/2016 04:51 PM, Mark Rutland wrote:
> Hi,
> 
> On Tue, Sep 13, 2016 at 04:22:09PM +0200, Sebastian Frias wrote:
>> On 09/13/2016 03:12 PM, Mark Rutland wrote:
> 
> [context was deleted, TL;DR: binding review is necessary, and takes
> effort, regardless of presence/absence of a driver]

Ok, I see you deleted the part where I was asking about the 'staging' folders.

$ find . -name 'stag*'
./Documentation/devicetree/bindings/staging
./drivers/staging
...

I don't know if that was intended (since you said you did not read it) but
I think it would be interesting for the discussion to give a little overview
on that.
My understanding is that there's a 'staging' area for bindings, couldn't the
nodes/properties from my example be put in there?

>>> Go and take a look at all the effort that went into sorting out generic
>>> IOMMU bindings, when driver support was written after a large amount of
>>> review to sort out fundamental concepts. We had to sort out fundamentals
>>> before prototype driver code could be written, and while we knew drivers
>>> were coming, an awful lot of review effort came first.
>>
>> Again, you are talking about drivers, but it is not the case at hand.
> 
> No, I am not. Please do not presume to put words in my mouth.

I'm sorry, that was not my intention.
I just assumed that because your paragraph mentioned the word "driver" and
appears to have been Linux-only effort geared towards writing drivers:
   "...We had to sort out fundamentals before prototype driver code could be
written, and while we knew drivers were coming..."

Maybe I miss some more context on that example.

> 
> I explicitly described a case where binding review took effort, and the
> presence or absence of drivers was irrelevant. We later had drivers,
> yes, but we had to understand the hardware to get the binding right
> first.

I see.
So if I understood correctly, it was a cleanup, or a way to unify some
bindings, so that drivers could be written easier?

I can see that taking time, but the underlying idea of this discussion is
that the DT could be divided in different sections.

I think this would allow and encourage SoC manufacturers to publish more
details, since they could use them internally as well.

Obviously, this is not strictly necessary, and I'm sure SoC manufacturers
could choose to just avoid having to deal with the open-source community
entirely by using forks.

However, that does not benefit the community and allows a "one-way" street,
with companies "profiting" from open-source software (in this example,
they would profit unwillingly, yet out of the necessity to avoid dealing
with outside 'restrictions' perceived as unnecessary).

It seems it is a similar case to that of allowing or not binary
drivers/blobs.
If they were not allowed, less companies would choose Linux.
I don't want to go into that discussion, but it seems similar enough to
this one for it to be worth mentioning.

> If there's data which has no consumer, it has no value being in the DT.
> Placing data with no consumer in the DT comes with a number of issues,
> e.g.
> 
> a) Some DTS authors will ignore it, and not place data according to it
>in DTs. Hence there's no gain in consistency.

Ok, but in this case the DT author (SoC manufacturer) would have interest
in using it.

> 
> b) Though some accident (perhaps a typo, perhaps a misunderstanding of
>the binding), a DT will come to have erroneous data, yet this will go
>unnoticed, as there is no consumer. When later a consumer appears, it
>can't trust existing DTs, and has to either ignore the binding
>entirely, or bodge around each and every broken DT.
> 

I think it is likely this has already happened, even with the current
constraints on review.

> c) When a consumer eventually appears, it turned out we didn't capture
>details of the hardware sufficiently, and the binding turns out to be
>useless. At worst, this boils down to (b), at best, we require
>additional properties. In this case absolutely nothing is gained.

Case c) could happen even when a driver is provided, for example if the
driver writer got incomplete documentation.

At any rate, since there was no user, even if the bindings need amending,
it should be easy to do, since there's no backwards compatibility to keep.

> In all cases, all we end up doing is enlarging DTBs, and risk causing
> even more work.
> 
> If there *is* going to be a consumer, and if information regarding that
> will be provided, then matters are different, and we can consider a
> binding on its own merit. We need a specific example for that.
> 
>> If the "user of the binding" is not 

Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-14 Thread Sebastian Frias
Hi Mark,

On 09/13/2016 05:47 PM, Mark Rutland wrote:
>>> If you believe that the bindings don't matter, then there is absolutely
>>> no reason for them to exist in the first place.
>>>
>>> If those binding matter to *anyone*, then those collating the bindings
>>> have some responsibility of stewardship, and that includes
>>> review/maintenance/etc.
>>
>> The thing is that right now it seems the "responsibility of stewardship"
>> lies only within "Linux", whereas DT is proposed as open for everybody,
>> Bootloaders, FreeBSD, etc.
>>
>> In that case, shouldn't the "responsibility" be shared?
> 
> Ideally, yes.
> 
> Which is one of the reasons devicetree.org was set up as a common forum
> for projects to collaborate on devicetree.

I see, what about using different 'sections' on a DT to allow different
parties be responsible for their 'section'?

- 'generic' sections (i.e.: those using bindings used by Linux drivers)
would be under stewardship of Linux.

- 'specific' sections (i.e.: my example, bindings *not used by Linux*, but
they could be bindings for other OSs as you said) would be under a
different stewardship.

DT seems essentially free-form, like XML.
One could imagine that some tool could then be used to guarantee that
some parts of DT conform to a given XML schema, including backwards
compatibility, while at the same time ignoring 'staging'/'specific' stuff.

NOTE: this appears to be possible using 'overlays' as Warner suggested, but
in that case not all parts are public, which limits public information.

Best regards,

Sebastian


Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-13 Thread Sebastian Frias
Hi Mark,

On 09/13/2016 03:12 PM, Mark Rutland wrote:
>> Exactly, that is why I was thinking it would take less "review" time.
>> Indeed, if there is no driver, why would it matter what those bindings
>> are?
> 
> If you believe that the bindings don't matter, then there is absolutely
> no reason for them to exist in the first place.
> 
> If those binding matter to *anyone*, then those collating the bindings
> have some responsibility of stewardship, and that includes
> review/maintenance/etc.

The thing is that right now it seems the "responsibility of stewardship"
lies only within "Linux", whereas DT is proposed as open for everybody,
Bootloaders, FreeBSD, etc.

In that case, shouldn't the "responsibility" be shared?
Alternatively, maybe 'borders' could be created, in order to enable the
allocation of responsibility of different sections to different parties,
right?
Obviously, moving properties/nodes from one 'section' to another crossing
responsibility 'borders' would require agreements.

Shouldn't that be something good to think about?

Best regards,

Sebastian


Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-13 Thread Sebastian Frias
Hi Mark,

On 09/13/2016 03:12 PM, Mark Rutland wrote:
>> Exactly, that is why I was thinking it would take less "review" time.
>> Indeed, if there is no driver, why would it matter what those bindings
>> are?
> 
> If you believe that the bindings don't matter, then there is absolutely
> no reason for them to exist in the first place.

Again, they would be there serving as HW description. Like documentation.
3rd parties and the open-source community could then use them to write
drivers.

Also, it would avoid having multiple DTs (bootloader, Linux, etc.) or
using 'binary DT overlays'.

In one case the nodes/properties would be made public, most likely with
documentation (even if there's no driver for them), in the other case
undocumented 'binary DT overlays' are used.

> 
> If those binding matter to *anyone*, then those collating the bindings
> have some responsibility of stewardship, and that includes
> review/maintenance/etc.
> 
> Hence, the Linux community cares as stewards of those bindings, and
> don't accept bindings they don't understand, for which there is no
> obvious user, nor for which the authors claim stability does not matter.
> Those go against the aims of DT, and against out responsiblities as
> stewards.
> 
> I cannot put that any clearer. 

I understand that.
However, it does not looks crazy to add some sort of 'staging', actually,
I found this:

$ find . -name 'stag*'
./Documentation/devicetree/bindings/staging
./drivers/staging
...
$ ls -1R ./Documentation/devicetree/bindings/staging
./Documentation/devicetree/bindings/staging:
iio
ion

./Documentation/devicetree/bindings/staging/iio:
adc

./Documentation/devicetree/bindings/staging/iio/adc:
lpc32xx-adc.txt
spear-adc.txt

./Documentation/devicetree/bindings/staging/ion:
hi6220-ion.txt

$ ls -1R ./drivers/staging | wc -l
2134

Isn't that a similar use to the one discussed in this thread?

>> Only for bindings for which there is a driver.
> 
> This is not true for all but the most trivial of hardware, as I stated
> previously.
> 
> Go and take a look at all the effort that went into sorting out generic
> IOMMU bindings, when driver support was written after a large amount of
> review to sort out fundamental concepts. We had to sort out fundamentals
> before prototype driver code could be written, and while we knew drivers
> were coming, an awful lot of review effort came first.

Again, you are talking about drivers, but it is not the case at hand.

DT seems essentially free-form, like XML.
One could imagine that some tool could then be used to guarantee that
some parts of DT conform to a given XML schema, including backwards
compatibility, while at the same time ignoring 'staging' stuff.

>> Think about this:
>> - a binding with no driver is submitted and ends up in the tree
>> (it could be on a staging area if necessary)
>>
>> - if at a later point somebody attempts to upstream a driver using those
>> 'staging' bindings, the reviewers could say "you are using 'staging' 
>> bindings,
>> please add compatibility with 'staging' and 'standard' bindings", even if 
>> that
>> includes the discussion and review of newly created 'standard' bindings
>> corresponding to the 'staging' bindings.
>>
>> - the submitter may even say "there's no need for compatibility for 'staging'
>> bindings, because they were never used (or other valid reasons)".
>>
>> What would you think of something like that?
> 
> As above, if they were never used, and potentially wrong, why did they
> exist?

The last step of my description, where the 'staging' bindings may not have
been used, was listed only for the sake of completeness since it is just a
possibility.
The hope is that they would be eventually used (most likely as is, but
there is also the possibility of having to rethink the bindings when
moving them from 'staging' to 'stable').

> Trying to upstream a binding with no user comees with no immediate
> benefit, and potentially creates longer-term pain, whereas you can defer
> upstreaming a binding until the driver is ready.

The immediate benefit is complete HW description, which, if generalised and
done properly would amount to HW documentation, isn't that good?

> Note that I've repeatedly pointed out that the user of the binding
> doesn't necessarily have to be linux. However, there does need to be
> some demonstration, and commitment to maintaining the principles DT aims
> towards (e.g. stability).

The commitment to stability can be enforced later, since changes would be
reviewed.

If the "user of the binding" is not Linux, under what circumstances
could "Linux" have the legitimacy of guaranteeing or enforcing any Linux-
specific commitment?

>> Let's make an abstraction of the word 'binding', 'create a binding', etc. and
>> just focus on this:
>> - Somebody submits a DT file that contains properties and nodes that are
>> *not used* by any Linux driver.
>> - Said properties and nodes serve as HW description for HW blocks for which
>> *there is no* Linux driver

Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-13 Thread Sebastian Frias
Hi Mark,

On 09/12/2016 06:56 PM, Mark Rutland wrote:
>> Exactly, that's why to I'm having trouble to understand why there is so much
>> insistence on "getting the DT 100% right", since a HW change could imply
>> that what made 100% sense yesterday, does not today.
>> Since that is a possibility we have to live with, then the "100% right" goal
>> is most likely unachievable.
>> That's different from "backwards compatibility" for which some technique,
>> like alternate descriptions, can be put in place.
> 
> I'm not on about "100% right", but more "a reasonable chance of being
> good enough". 

Ok.

> The latter is extremely difficult to judge when you just
> get a binding document with little or no additional context.

Exactly, that is why I was thinking it would take less "review" time.
Indeed, if there is no driver, why would it matter what those bindings
are?

> Backwards compatibility is with regards to new software supporting old
> bindings. If new HW comes out, we can create new bindings. Old bindings
> should remain supported regardless.

Indeed. So you agree that without a driver, there is no backward compatibility
to maintain (because nothing could break).
Could we then say that in that case, it should not matter that much (if at all)
if the bindings have "a reasonable chance of being good enough"?

Because I'm trying to understand why/how would the "open-source community" be
affected by some DT properties/nodes that are not used by upstream Linux 
drivers.

Warner proposed 'overlays' to deal with the "open-source community" reticence to
accept such bindings.
Whereas it may be simpler to just find a way to accommodate the integration of
such bindings in a way that everybody wins.
That could take the form of a "staging" DT area, so that the 'overlay' is 
public.
Just for the sake and benefit of everybody, 3rd parties and open-source 
community.

>> Could you be more precise on those two issues? Namely:
>> "the effort" and the "lack of benefit for the community"?
> 
> As above, reviewing is tricky. One has to spend the time gaining an
> understanding of a particular piece of hardware, the class of hardware
> it falls in, and also the bigger picture that it fits in. Once you have
> that, you have to review the binding in that context, and that takes
> time and effort.

Yes, but this is based on a binding for which there's a driver.
If there's no driver it should not take that much time, right?

> As things evolve, perhaps mistakes or inconsistencies are found, or new
> ways to generalise things. As that occurs, there is a maintenance burden
> for existing bindings.
> 
> All of that takes time and effort.

Only for bindings for which there is a driver.

Think about this:
- a binding with no driver is submitted and ends up in the tree
(it could be on a staging area if necessary)

- if at a later point somebody attempts to upstream a driver using those
'staging' bindings, the reviewers could say "you are using 'staging' bindings,
please add compatibility with 'staging' and 'standard' bindings", even if that
includes the discussion and review of newly created 'standard' bindings
corresponding to the 'staging' bindings.

- the submitter may even say "there's no need for compatibility for 'staging'
bindings, because they were never used (or other valid reasons)".

What would you think of something like that?

> If, at the end of that, a proprietary vendor driver is using a different
> version of the binding anyway, because "there's no backward
> compatibility to guarantee", then there is no benefit to the document.

No, it would not use a "different version of the binding", because the
binding would be public, i.e.: everything would be on the public DT.

> Even if that proprietary driver does remain stable, the Linux community
> will have done work that only benefits the authors of that driver. We
> would prefer that the results of our efforts are open, and benefit all.

Do you have an example of a binding *not used by Linux* that created enormous
amount of work for the Linux community?

>> I can understand the effort it takes to review a binding and some
>> driver, but if there's no driver, why would it matter if the DT binding is
>> 100% right? Hence, why would it take more effort?
>> Furthermore, if there's no driver, there's no backward compatibility to
>> guarantee. Shouldn't it require less effort?
> 
> I don't follow. If there's no compatibility to guarantee, why do you
> want a binding?

Let's make an abstraction of the word 'binding', 'create a binding', etc. and
just focus on this:
- Somebody submits a DT file that contains properties and nodes that are
*not used* by any Linux driver.
- Said properties and nodes serve as HW description for HW blocks for which
*there is no* Linux driver.

The goal of the above is to use the DT as the authoritative (and single)
source of HW definition.

An alternate solution, as proposed by Warner, is to use 'binary overlays' for
the DT.

> A binding is a form of c

Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-12 Thread Sebastian Frias
Hi Warner,

On 09/12/2016 04:26 PM, Warner Losh wrote:
> On Mon, Sep 12, 2016 at 8:01 AM, Mark Rutland  wrote:
>>> Since the question seems understood, do you have an example of other SoC's
>>> doing something similar?
>>
>> I do not have an example. I know that others are using DT for data
>> beyond what Linux or another OS requires, but it's my understanding that
>> that is typically in a separate DTB.
> 
> Just to clarify: FreeBSD uses, for the most part, the DTB's that the
> 'vendor' ships, which is quite often the same ones included in Linux.
> There's some exceptions where the bindings weren't really hardware
> independent, or where the abstraction model was really Linux specific
> (for things like the HDMI stack).
> 
> However, with the advent of overlays, one would think that a vendor
> could easily include an overlay with the DTB data for the devices they
> don't wish to, or cannot for other reasons release. It seems like the
> perfect mechanism to comply with the rules about inclusion of nodes in
> the DTS. Vendors are free to document these nodes and don't require
> the Linux kernel include them in the Documents directory to do so.
> There have been recent efforts to move this documentation to a third
> party to maintain.

This is very interesting, do you have a more concrete example of such
usage?

The overlay technique could be a solution, but so is forking and
distributing a non-documented DT. That's why I'd put this solution a
little bit lower than just exposing the entire HW description through
the DT.

Best regards,

Sebastian


Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-12 Thread Sebastian Frias
On 09/12/2016 06:07 PM, Sebastian Frias wrote:
> Hi Mark,
> 
> On 09/12/2016 04:01 PM, Mark Rutland wrote:
>>> 3rd parties could choose to write a driver (as opposed to use say, a 
>>> user-mode
>>> library) if it fits their programming model better, if they think they would
>>> have better performance, or other reasons.
>>
>> A vendor can always choose to "add value" in this manner. The general
>> expectation of *some* driver being upstreamed remains.
> 
> Yes, that's the idea.

Just to clarify, what I meant is that, using the DT as the authoritative
source of HW description is a way to "add value" to everybody, because both,
3rd-parties and the open-source community get the same information.
This creates the conditions for drivers to exist, with the expectation that
eventually said drivers would be upstreamed.


Re: ARM, SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-12 Thread Sebastian Frias
Hi Mark,

On 09/12/2016 04:01 PM, Mark Rutland wrote:
>> 3rd parties could choose to write a driver (as opposed to use say, a 
>> user-mode
>> library) if it fits their programming model better, if they think they would
>> have better performance, or other reasons.
> 
> A vendor can always choose to "add value" in this manner. The general
> expectation of *some* driver being upstreamed remains.

Yes, that's the idea.

>>
>> That may be true, but so far we are not discussing changing DT's API so it
>> should not have big ramifications.
> 
> You're not changing the code, but you are creating a binding. Bindings
> are intended to be stable (i.e. a working DTB from today should continue
> to work in future), and thus there are ramifications.

Ok, but who is responsible for such guarantee?
How is it enforced and verified?

> 
> Few devices these says are entirely independent, and most devices can be
> instantiated multiple times (even if there happens to only be a single
> instance in practice). For the example of a userspace driver there are
> very real ABI concerns, such as how the device(s) are discovered, how
> any related components like regulators and clocks are controlled, etc.
> 
> There are ramifications here, and it's a dangerous over simplification
> to say that this doesn't matter because we're not changing kernel code.
> 
>> Besides, what "makes sense now" may "not make sense tomorrow" depending on
>> how the HW is modified.
> 
> That's always the case when a new generation of hardware comes out, so I
> don't think that's relevant to the topic at hand.

Exactly, that's why to I'm having trouble to understand why there is so much
insistence on "getting the DT 100% right", since a HW change could imply
that what made 100% sense yesterday, does not today.
Since that is a possibility we have to live with, then the "100% right" goal
is most likely unachievable.
That's different from "backwards compatibility" for which some technique,
like alternate descriptions, can be put in place.

Maybe I will get a better understanding of this once my previous question
about who and how guarantees the stability of a given DT blob.

>>
>> Actually, I think it would encourage more SoC manufacturers to use DT as a 
>> way
>> to document their HW, which is a good thing.
> 
> Writing and reviewing bindings is a very tricky topic, as it can require
> fairly intimate knowledge of a piece of hardware. I've repeatedly found
> that binding descriptions did not match the realities of the hardware,
> and I've only managed to do so by looking at accompanying driver code.
> 
> Given that manuals and other information on devices are often not freely
> available (if they exist at all), the proposal effectively limits myself
> and others to spot common (anti)patterns, which is far less than ideal,
> and will result in more mistakes.
> 
> As it stands, the proposal asks for effort for the community (in terms
> of review and maintenance of bindings), with no benefit to the kernel
> community, and a number of pitfalls that we would rather avoid.

Could you be more precise on those two issues? Namely:
"the effort" and the "lack of benefit for the community"?

I can understand the effort it takes to review a binding and some
driver, but if there's no driver, why would it matter if the DT binding is
100% right? Hence, why would it take more effort?
Furthermore, if there's no driver, there's no backward compatibility to
guarantee. Shouldn't it require less effort?

Also, what sort of "benefits" does the community expects or requires?
Because the idea behind the proposal is to put the HW description in DT, so
basically the HW would be documented. Maybe there wouldn't be a driver right
away, but the HW description would allow for drivers to exist.

> 
> In an ideal world, writing and reviewing bindings would be a simple
> affair, and this could happen separately from work on any particular OS.
> In practice, things are sufficiently complicated that you need *some*
> demonstration that a binding is suitable, which is what I'm personally
> after when I ask for a Linux driver.
> 
 However, after discussing over IRC, it looks like there was no guidance on
 this. Some people think submitting DT properties/nodes without a 
 corresponding
 Linux driver is frowned upon, while others thought it was an odd limitation
 and suggested asking here.
>>>
>>> Unfortunately, I think that the area is sufficiently vague that there
>>> simply is no clear and general answer.
>>>
>>> For the sake of discussion, an example of a particular block, along with
>>> what you expect/need to describe would be helpful.
>>
>> I don't have a more concrete example now.
> 
> For this discussion to go somewhere, we need an example. Otherwise we're
> all coming at this with differing implicit assumptions and no clear
> evidence for any assertions.

Ok, I'll try to come up with some example.

>> But if I understood correctly your comment, you are basically sayi

Re: ARM,SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-12 Thread Sebastian Frias
Hi Mark,

On 09/12/2016 02:38 PM, Mark Rutland wrote:
>>
>> 3rd party users of said SoC could then write kernel modules for such HW
>> blocks using the DT description. The DT would thus become the authoritative
>> source of information regarding register programming for the SoC.
> 
> I don't follow this part entirely. Why are you expecting thrid parties
> to write a driver for those blocks rather than upstreaming a driver for
> them?

3rd parties could choose to write a driver (as opposed to use say, a user-mode
library) if it fits their programming model better, if they think they would
have better performance, or other reasons.

The main idea is to make DT the authoritative source of HW description.

> 
>> Currently, HW blocks for which there is no public driver (that it is
>> accessed through user-mode libraries or firmware) require a separate
>> HW description (be it Documentation, headers, etc.)
>>
>> Since the DT describes the HW, it would make sense to expose the HW through
>> DT, that would centralise the HW description.
> 
> I would generally agree that the hardware should be described in DT.
> The difficulty is that without a 'real' user it's not always possible to
> tell if we're describing the thing correctly.
> 

That may be true, but so far we are not discussing changing DT's API so it
should not have big ramifications.
Besides, what "makes sense now" may "not make sense tomorrow" depending on
how the HW is modified.
We have somehow learned the hard-way that "le mieux est l'ennemi du bien"
(the better is the enemy of the good) and we are trying to take a more
practical (and flexible) approach.

> Putting smoething together that's only sufficient to support some
> out-of-tree driver with implicit assumptions that we are not aware of is
> far from fantastic.

That does not seem very positive and it is not the case anyway, otherwise we
would not be consulting here :-)
Agreed, right now this whole thing seems like a really hypothetical question,
but the intention is good.

Actually, I think it would encourage more SoC manufacturers to use DT as a way
to document their HW, which is a good thing.

> 
>> However, after discussing over IRC, it looks like there was no guidance on
>> this. Some people think submitting DT properties/nodes without a 
>> corresponding
>> Linux driver is frowned upon, while others thought it was an odd limitation
>> and suggested asking here.
> 
> Unfortunately, I think that the area is sufficiently vague that there
> simply is no clear and general answer.
> 
> For the sake of discussion, an example of a particular block, along with
> what you expect/need to describe would be helpful.

I don't have a more concrete example now.

As I stated, right now HW description is not centralised, and thus different
bits of information are cherry-picked by hand from HW description into DT for
bootloader, DT for Linux, Documentation/headers for 3rd-parties, etc.

But if I understood correctly your comment, you are basically saying that
without an example is hard to say.
Since the question seems understood, do you have an example of other SoC's
doing something similar?

I've seen some big DT descriptions, but it is difficult to know if we are the
first ones trying to use the DT in this way.

Best regards,

Sebastian



Re: ARM,SoC: About the use DT-defined properties by 3rd-party drivers

2016-09-12 Thread Sebastian Frias
Hi Timur,

On 08/28/2016 10:36 PM, Timur Tabi wrote:
> On Wed, Aug 24, 2016 at 9:29 AM, Sebastian Frias  wrote:
>>
>> If this is really not possible, it forces the SoC manufacturer to expose
>> those properties in a different way, thus wasting a (seemingly) perfectly
>> fine way of doing so: the DT and its documentation.
> 
> When you submit a new driver upstream, that patch also includes the
> new device tree nodes and documentation for those nodes.  Everything
> is peer-reviewed together.  I don't understand what you think the
> problem is.
> 

Thanks for your comment and sorry for the late reply.

My question is about submitting DT properties/nodes (describing some HW) for
which there is no Linux driver. Like register addresses for HW blocks,
including HW capabilities of said HW blocks, which may or may not be setup
by Linux directly.

The idea being that since DT describes the HW and is usually shared with the
bootloader (yet stored in the Linux kernel tree), all layers of the stack
could use the same DT and each layer would use relevant properties. So the
DT would describe the whole SoC even if not all HW blocks have a Linux
driver.

3rd party users of said SoC could then write kernel modules for such HW
blocks using the DT description. The DT would thus become the authoritative
source of information regarding register programming for the SoC.

Currently, HW blocks for which there is no public driver (that it is
accessed through user-mode libraries or firmware) require a separate
HW description (be it Documentation, headers, etc.)

Since the DT describes the HW, it would make sense to expose the HW through
DT, that would centralise the HW description.

However, after discussing over IRC, it looks like there was no guidance on
this. Some people think submitting DT properties/nodes without a corresponding
Linux driver is frowned upon, while others thought it was an odd limitation
and suggested asking here.

Does that clarifies the scope of the question?

Best regards,

Sebastian


[tip:irq/core] genirq/generic_chip: Verify irqs_per_chip <= 32

2016-09-02 Thread tip-bot for Sebastian Frias
Commit-ID:  f88eecfe2f22b2790e7527c0aaec14ea175919de
Gitweb: http://git.kernel.org/tip/f88eecfe2f22b2790e7527c0aaec14ea175919de
Author: Sebastian Frias 
AuthorDate: Tue, 16 Aug 2016 16:05:08 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Sep 2016 20:20:59 +0200

genirq/generic_chip: Verify irqs_per_chip <= 32

Most (if not all) code here implicitly assumes that the maximum number of
IRQs per chip will be 32, and thus uses 'u32' or 'unsigned long' for many
tasks (for example "struct irq_data" declares its 'mask' field as 'u32',
and "struct irq_chip_generic" declares its 'installed' field as 'unsigned
long')

However, there is no check to verify that irqs_per_chip is <= 32.  Hence,
calling irq_alloc_domain_generic_chips() with a bigger value will result in
unexpected results.

Provide a wrapper with a MAYBE_BUILD_BUG_ON(nrirqs >= 32) to catch such
cases.

[ tglx: Reduced changelog to the essential information ]

Signed-off-by: Sebastian Frias 
Cc: Marc Zyngier 
Cc: Mason 
Cc: Jason Cooper 
Link: http://lkml.kernel.org/r/57b31d94.5040...@laposte.net
Signed-off-by: Thomas Gleixner 
---
 include/linux/irq.h   | 18 +-
 kernel/irq/generic-chip.c | 16 
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b52424e..6039867 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -916,12 +916,20 @@ void irq_remove_generic_chip(struct irq_chip_generic *gc, 
u32 msk,
 unsigned int clr, unsigned int set);
 
 struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, 
unsigned int hw_irq);
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags flags);
 
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags flags);
+
+#define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \
+  handler, clr, set, flags)\
+({ \
+   MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \
+   __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name,\
+handler, clr, set, flags); \
+})
 
 static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
 {
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index a3a3920..ee32870 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -260,9 +260,9 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
 }
 
 /**
- * irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
+ * __irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
  * @d: irq domain for which to allocate chips
- * @irqs_per_chip: Number of interrupts each chip handles
+ * @irqs_per_chip: Number of interrupts each chip handles (max 32)
  * @num_ct:Number of irq_chip_type instances associated with this
  * @name:  Name of the irq chip
  * @handler:   Default flow handler associated with these chips
@@ -270,11 +270,11 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
  * @set:   IRQ_* bits to set in the mapping function
  * @gcflags:   Generic chip specific setup flags
  */
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags gcflags)
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags gcflags)
 {
struct irq_domain_chip_generic *dgc;
struct irq_chip_generic *gc;
@@ -326,7 +326,7 @@ int irq_alloc_domain_generic_chips(struct irq_domain *d, 
int irqs_per_chip,
d->name = name;
return 0;
 }
-EXPORT_SYMBOL_GPL(irq_alloc_domain_generic_

[tip:irq/core] irqdomain: Mask irq type in irq_domain_xlate_onetwocell()

2016-09-02 Thread tip-bot for Sebastian Frias
Commit-ID:  0c228919e04ddec195402296e7ebf2472ed6caef
Gitweb: http://git.kernel.org/tip/0c228919e04ddec195402296e7ebf2472ed6caef
Author: Sebastian Frias 
AuthorDate: Tue, 2 Aug 2016 10:52:45 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Sep 2016 18:06:50 +0200

irqdomain: Mask irq type in irq_domain_xlate_onetwocell()

According to the xlate() callback definition, the 'out_type' parameter
needs to be the "linux irq type".

A mask for such bits exists, IRQ_TYPE_SENSE_MASK, which is correctly
applied in irq_domain_xlate_twocell()

So use it for irq_domain_xlate_onetwocell() as well.

Signed-off-by: Sebastian Frias 
Cc: Grant Likely 
Cc: Marc Zyngier 
Cc: Mason 
Cc: Jason Cooper 
Link: http://lkml.kernel.org/r/57a05f5d@laposte.net
Signed-off-by: Thomas Gleixner 

---
 kernel/irq/irqdomain.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 4752b43..f10cffe 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -868,7 +868,10 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
if (WARN_ON(intsize < 1))
return -EINVAL;
*out_hwirq = intspec[0];
-   *out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
+   if (intsize > 1)
+   *out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
+   else
+   *out_type = IRQ_TYPE_NONE;
return 0;
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);


[tip:irq/core] genirq/generic_chip: Verify irqs_per_chip <= 32

2016-09-02 Thread tip-bot for Sebastian Frias
Commit-ID:  895d3b95ed05f72a94f69ab52cb313915a6b889f
Gitweb: http://git.kernel.org/tip/895d3b95ed05f72a94f69ab52cb313915a6b889f
Author: Sebastian Frias 
AuthorDate: Tue, 16 Aug 2016 16:05:08 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Sep 2016 18:06:50 +0200

genirq/generic_chip: Verify irqs_per_chip <= 32

Most (if not all) code here implicitly assumes that the maximum number of
IRQs per chip will be 32, and thus uses 'u32' or 'unsigned long' for many
tasks (for example "struct irq_data" declares its 'mask' field as 'u32',
and "struct irq_chip_generic" declares its 'installed' field as 'unsigned
long')

However, there is no check to verify that irqs_per_chip is <= 32.  Hence,
calling irq_alloc_domain_generic_chips() with a bigger value will result in
unexpected results.

Provide a wrapper with a MAYBE_BUILD_BUG_ON(nrirqs >= 32) to catch such
cases.

[ tglx: Reduced changelog to the essential information ]

Signed-off-by: Sebastian Frias 
Cc: Marc Zyngier 
Cc: Mason 
Cc: Jason Cooper 
Link: http://lkml.kernel.org/r/57b31d94.5040...@laposte.net
Signed-off-by: Thomas Gleixner 

---
 include/linux/irq.h   | 18 +-
 kernel/irq/generic-chip.c | 16 
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b52424e..9a96860 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -916,12 +916,20 @@ void irq_remove_generic_chip(struct irq_chip_generic *gc, 
u32 msk,
 unsigned int clr, unsigned int set);
 
 struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, 
unsigned int hw_irq);
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags flags);
 
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags flags);
+
+#define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \
+  handler, clr, set, flags)\
+({ \
+   MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \
+   __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct,  \
+handler, clr, set, flags); \
+})
 
 static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
 {
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index a3a3920..ee32870 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -260,9 +260,9 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
 }
 
 /**
- * irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
+ * __irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
  * @d: irq domain for which to allocate chips
- * @irqs_per_chip: Number of interrupts each chip handles
+ * @irqs_per_chip: Number of interrupts each chip handles (max 32)
  * @num_ct:Number of irq_chip_type instances associated with this
  * @name:  Name of the irq chip
  * @handler:   Default flow handler associated with these chips
@@ -270,11 +270,11 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
  * @set:   IRQ_* bits to set in the mapping function
  * @gcflags:   Generic chip specific setup flags
  */
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags gcflags)
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags gcflags)
 {
struct irq_domain_chip_generic *dgc;
struct irq_chip_generic *gc;
@@ -326,7 +326,7 @@ int irq_alloc_domain_generic_chips(struct irq_domain *d, 
int irqs_per_chip,
d->name = name;
return 0;
 }
-EXPORT_SYMBOL_GPL(irq_alloc_domain_generic_

[tip:irq/core] genirq/generic_chip: Add irq_unmap callback

2016-09-02 Thread tip-bot for Sebastian Frias
Commit-ID:  ee26c013cdee0b947e29d6cadfb9ff3341c69ff9
Gitweb: http://git.kernel.org/tip/ee26c013cdee0b947e29d6cadfb9ff3341c69ff9
Author: Sebastian Frias 
AuthorDate: Mon, 1 Aug 2016 16:27:38 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Sep 2016 18:06:49 +0200

genirq/generic_chip: Add irq_unmap callback

Without this patch irq_domain_disassociate() cannot properly release the
interrupt. In fact, irq_map_generic_chip() checks a bit on 'gc->installed'
but said bit is never cleared, only set.

Commit 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
added irq_map_generic_chip() function and also stated "This lacks a removal
function for now".

This commit provides an implementation of an unmap function that can be
called by irq_domain_disassociate().

[ tglx: Made the function static and removed the export as we have neither
a prototype nor a modular user. ]

Fixes: 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
Signed-off-by: Sebastian Frias 
Cc: Marc Zyngier 
Cc: Mason 
Cc: Jason Cooper 
Link: http://lkml.kernel.org/r/579f5c5a.2070...@laposte.net
Signed-off-by: Thomas Gleixner 

---
 kernel/irq/generic-chip.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index 11ad73b..a3a3920 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -414,8 +414,29 @@ int irq_map_generic_chip(struct irq_domain *d, unsigned 
int virq,
return 0;
 }
 
+static void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq)
+{
+   struct irq_data *data = irq_domain_get_irq_data(d, virq);
+   struct irq_domain_chip_generic *dgc = d->gc;
+   unsigned int hw_irq = data->hwirq;
+   struct irq_chip_generic *gc;
+   int irq_idx;
+
+   gc = irq_get_domain_generic_chip(d, hw_irq);
+   if (!gc)
+   return;
+
+   irq_idx = hw_irq % dgc->irqs_per_chip;
+
+   clear_bit(irq_idx, &gc->installed);
+   irq_domain_set_info(d, virq, hw_irq, &no_irq_chip, NULL, NULL, NULL,
+   NULL);
+
+}
+
 struct irq_domain_ops irq_generic_chip_ops = {
.map= irq_map_generic_chip,
+   .unmap  = irq_unmap_generic_chip,
.xlate  = irq_domain_xlate_onetwocell,
 };
 EXPORT_SYMBOL_GPL(irq_generic_chip_ops);


[tip:irq/core] genirq/generic_chip: Get rid of code duplication

2016-09-02 Thread tip-bot for Sebastian Frias
Commit-ID:  f0c450eaa364cb77c778f2a46ee2aa3ff464b332
Gitweb: http://git.kernel.org/tip/f0c450eaa364cb77c778f2a46ee2aa3ff464b332
Author: Sebastian Frias 
AuthorDate: Mon, 1 Aug 2016 16:27:53 +0200
Committer:  Thomas Gleixner 
CommitDate: Fri, 2 Sep 2016 18:06:49 +0200

genirq/generic_chip: Get rid of code duplication

irq_map_generic_chip() contains about the same code as
irq_get_domain_generic_chip() except for the return values.

Split out the irq_get_domain_generic_chip() implementation so it can be
reused.

[ tglx: Removed the extra churn in irq_get_domain_generic_chip() callers
and massaged changelog ]

Signed-off-by: Sebastian Frias 
Cc: Marc Zyngier 
Cc: Mason 
Cc: Jason Cooper 
Link: http://lkml.kernel.org/r/579f5c69.8070...@laposte.net
Signed-off-by: Thomas Gleixner 

---
 kernel/irq/generic-chip.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index 5fbb94b..11ad73b 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -328,6 +328,20 @@ int irq_alloc_domain_generic_chips(struct irq_domain *d, 
int irqs_per_chip,
 }
 EXPORT_SYMBOL_GPL(irq_alloc_domain_generic_chips);
 
+static struct irq_chip_generic *
+__irq_get_domain_generic_chip(struct irq_domain *d, unsigned int hw_irq)
+{
+   struct irq_domain_chip_generic *dgc = d->gc;
+   int idx;
+
+   if (!dgc)
+   return ERR_PTR(-ENODEV);
+   idx = hw_irq / dgc->irqs_per_chip;
+   if (idx >= dgc->num_chips)
+   return ERR_PTR(-EINVAL);
+   return dgc->gc[idx];
+}
+
 /**
  * irq_get_domain_generic_chip - Get a pointer to the generic chip of a hw_irq
  * @d: irq domain pointer
@@ -336,15 +350,9 @@ EXPORT_SYMBOL_GPL(irq_alloc_domain_generic_chips);
 struct irq_chip_generic *
 irq_get_domain_generic_chip(struct irq_domain *d, unsigned int hw_irq)
 {
-   struct irq_domain_chip_generic *dgc = d->gc;
-   int idx;
+   struct irq_chip_generic *gc = __irq_get_domain_generic_chip(d, hw_irq);
 
-   if (!dgc)
-   return NULL;
-   idx = hw_irq / dgc->irqs_per_chip;
-   if (idx >= dgc->num_chips)
-   return NULL;
-   return dgc->gc[idx];
+   return !IS_ERR(gc) ? gc : NULL;
 }
 EXPORT_SYMBOL_GPL(irq_get_domain_generic_chip);
 
@@ -368,13 +376,9 @@ int irq_map_generic_chip(struct irq_domain *d, unsigned 
int virq,
unsigned long flags;
int idx;
 
-   if (!d->gc)
-   return -ENODEV;
-
-   idx = hw_irq / dgc->irqs_per_chip;
-   if (idx >= dgc->num_chips)
-   return -EINVAL;
-   gc = dgc->gc[idx];
+   gc = __irq_get_domain_generic_chip(d, hw_irq);
+   if (IS_ERR(gc))
+   return PTR_ERR(gc);
 
idx = hw_irq % dgc->irqs_per_chip;
 


Re: [PATCH 1/2] genirq: Generic chip: add irq_unmap_generic_chip

2016-09-02 Thread Sebastian Frias
Hi Thomas,

On 09/02/2016 05:12 PM, Thomas Gleixner wrote:
> On Mon, 1 Aug 2016, Sebastian Frias wrote:
>> NOTE: While the proposed unmap() function attempts to undo as much things
>> as done by the map() function, I did not find a way to undo the following:
>>
>> a) irq_gc_init_mask_cache(gc, dgc->gc_flags)
> 
> You can't undo that. Because that represents the mask cache of the irq chip
> and that is required to be consistent over the life time of the irq
> chip.
> 
> Unmapping does not make the generic chip and the underlying irqchip go
> away.
> 
>> b) irq_set_lockdep_class(virq, &irq_nested_lock_class)
> 
> No point in undoing that. The irq descriptor is released on unmap.
> 
>> c) irq_modify_status(virq, dgc->irq_flags_to_clear, dgc->irq_flags_to_set)
> 
> See b)
> 

Thanks for your time, I gather from this that the patch is ok then?

I submitted a few more patches, one of them a follow up of this one, and some
more, here are pointers to them just in case:
   https://lkml.org/lkml/2016/8/1/302
   https://lkml.org/lkml/2016/8/2/118
   https://lkml.org/lkml/2016/8/19/598

Best regards,

Sebastian



ARM,SoC: About the use DT-defined properties by 3rd-party drivers

2016-08-24 Thread Sebastian Frias
Hi,

Given a SoC and its SDK, 3rd party users of said SoC (customers of the SoC's
manufacturer) may need/want to write kernel modules.
Since the DT describes the HW, it would make sense to expose some HW properties
through the DT, and have 3rd party users rely on them to write their drivers in
a generic way.
However, it seems that one is not allowed to upstream DT bindings and
properties without a driver, is that right?

If true, that hampers DT usage by out-of-tree drivers (or in this case,
drivers that are not written yet) and limits the usage of DT properties as an
API.

We can understand that DT maintainers want to know what goes upstream,
make sure that the bindings are documented, that there are no conflicts with
other bindings or generic strings, etc.
However, it is hard to see why there is not a "relaxed" upstreaming path
for SoC-specific properties (i.e.: that do not affect DT API nor other
SoCs).

As an example, we would like to add some properties to /soc/ like:

   soc {
   compatible = "simple-bus";
   ...
   sigma-register-layout-version = <2>;
   sigma-has-XYZ-capability;

and so on.

If this is really not possible, it forces the SoC manufacturer to expose
those properties in a different way, thus wasting a (seemingly) perfectly
fine way of doing so: the DT and its documentation.

Thoughts?

By the way, what about properties generated on-the-fly by the bootloader?

Thanks in advance.
Best regards,

Sebastian


[PATCH] genirq: Generic chip: verify irqs_per_chip <= 32

2016-08-19 Thread Sebastian Frias
Most (if not all) code here implicitly assumes that the maximum number of
IRQs per chip will be 32, and thus uses 'u32' or 'unsigned long' for many
tasks (for example "struct irq_data" declares its 'mask' field as 'u32',
and "struct irq_chip_generic" declares its 'installed' field as 'unsigned
long')

However, there is no check to verify that irqs_per_chip is <= 32.
Hence, calling irq_alloc_domain_generic_chips() with a bigger value would
result in unexpected results depending on how the 'mask' is accessed later
on.
For example, if set_bit() is used on the 'installed' field of "struct
irq_chip_generic", it would result in the next field (in this case
the 'unused' field) being overwritten, because set_bit() is designed to
treat its parameter as a field of bits of arbitrary size organised as
"unsigned long" words.

This patch renames irq_alloc_domain_generic_chips() to
__irq_alloc_domain_generic_chips() and creates a macro to replace it.
The macro uses MAYBE_BUILD_BUG_ON to check the irqs_per_chip parameter to
stop compilation (if the compiler can resolve the parameter to a constant
at compile time) or to warn during run-time, if the parameter given is
bigger than 32.

Signed-off-by: Sebastian Frias 
---
 include/linux/irq.h   | 20 +++-
 kernel/irq/generic-chip.c | 16 
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b52424e..2a527e6 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -916,11 +916,21 @@ void irq_remove_generic_chip(struct irq_chip_generic *gc, 
u32 msk,
 unsigned int clr, unsigned int set);
 
 struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, 
unsigned int hw_irq);
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags flags);
+
+#define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \
+  handler, clr, set, flags)\
+   ({  \
+   MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \
+   __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, \
+name, handler, clr, set, \
+flags);\
+   })
+
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags flags);
 
 
 static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index abd286a..302ab659 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -260,9 +260,9 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
 }
 
 /**
- * irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
+ * __irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
  * @d: irq domain for which to allocate chips
- * @irqs_per_chip: Number of interrupts each chip handles
+ * @irqs_per_chip: Number of interrupts each chip handles (max 32)
  * @num_ct:Number of irq_chip_type instances associated with this
  * @name:  Name of the irq chip
  * @handler:   Default flow handler associated with these chips
@@ -270,11 +270,11 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
  * @set:   IRQ_* bits to set in the mapping function
  * @gcflags:   Generic chip specific setup flags
  */
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags gcflags)
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags gcflags)
 {
struct irq_domain_chip_generic *d

[PATCH] genirq: Generic chip: verify irqs_per_chip <= 32

2016-08-16 Thread Sebastian Frias
Most (if not all) code here implicitly assumes that the maximum number of
IRQs per chip will be 32, and thus uses 'u32' or 'unsigned long' for many
tasks (for example "struct irq_data" declares its 'mask' field as 'u32',
and "struct irq_chip_generic" declares its 'installed' field as 'unsigned
long')

However, there is no check to verify that irqs_per_chip is <= 32.
Hence, calling irq_alloc_domain_generic_chips() with a bigger value would
result in unexpected results depending on how the 'mask' is accessed later
on.
For example, if set_bit() is used on the 'installed' field of "struct
irq_chip_generic", it would result in the next field (in this case
the 'unused' field) being overwritten, because set_bit() is designed to
treat its parameter as a field of bits of arbitrary size organised as
"unsigned long" words.

This patch renames irq_alloc_domain_generic_chips() to
__irq_alloc_domain_generic_chips() and creates a macro to replace it.
The macro uses MAYBE_BUILD_BUG_ON to check the irqs_per_chip parameter to
stop compilation (if the compiler can resolve the parameter to a constant
at compile time) or to warn during run-time, if the parameter given is
bigger than 32.

Signed-off-by: Sebastian Frias 
---
 include/linux/irq.h   | 20 +++-
 kernel/irq/generic-chip.c | 16 
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index b52424e..2a527e6 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -916,11 +916,21 @@ void irq_remove_generic_chip(struct irq_chip_generic *gc, 
u32 msk,
 unsigned int clr, unsigned int set);
 
 struct irq_chip_generic *irq_get_domain_generic_chip(struct irq_domain *d, 
unsigned int hw_irq);
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags flags);
+
+#define irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, name, \
+  handler, clr, set, flags)\
+   ({  \
+   MAYBE_BUILD_BUG_ON(irqs_per_chip > 32); \
+   __irq_alloc_domain_generic_chips(d, irqs_per_chip, num_ct, \
+name, handler, clr, set, \
+flags);\
+   })
+
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags flags);
 
 
 static inline struct irq_chip_type *irq_data_get_chip_type(struct irq_data *d)
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index abd286a..302ab659 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -260,9 +260,9 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
 }
 
 /**
- * irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
+ * __irq_alloc_domain_generic_chip - Allocate generic chips for an irq domain
  * @d: irq domain for which to allocate chips
- * @irqs_per_chip: Number of interrupts each chip handles
+ * @irqs_per_chip: Number of interrupts each chip handles (max 32)
  * @num_ct:Number of irq_chip_type instances associated with this
  * @name:  Name of the irq chip
  * @handler:   Default flow handler associated with these chips
@@ -270,11 +270,11 @@ irq_gc_init_mask_cache(struct irq_chip_generic *gc, enum 
irq_gc_flags flags)
  * @set:   IRQ_* bits to set in the mapping function
  * @gcflags:   Generic chip specific setup flags
  */
-int irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
-  int num_ct, const char *name,
-  irq_flow_handler_t handler,
-  unsigned int clr, unsigned int set,
-  enum irq_gc_flags gcflags)
+int __irq_alloc_domain_generic_chips(struct irq_domain *d, int irqs_per_chip,
+int num_ct, const char *name,
+irq_flow_handler_t handler,
+unsigned int clr, unsigned int set,
+enum irq_gc_flags gcflags)
 {
struct irq_domain_chip_generic *d

Re: [PATCH 1/2] genirq: Generic chip: add irq_unmap_generic_chip

2016-08-12 Thread Sebastian Frias
Hi,

I don't know if somebody has had the time to look at these patches, but if you
have comments (even if related to removing the "fixes" tag if you don't
consider this a fix) please let me know.

Best regards,

Sebastian

On 08/01/2016 04:27 PM, Sebastian Frias wrote:
> Without this patch irq_domain_disassociate() cannot properly release the
> interrupt.
> Indeed, irq_map_generic_chip() checks a bit on 'gc->installed' but said bit
> is never cleared, only set.
> 
> Commit 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
> added irq_map_generic_chip() function and also stated "This lacks a removal
> function for now".
> 
> This commit provides with an implementation of an unmap function that can
> be called by irq_domain_disassociate().
> 
> Fixes: 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
> 
> Signed-off-by: Sebastian Frias 
> ---
> 
> This is required by loadable modules requesting IRQs.
> In our case rmmod will perform free_irq() + irq_dispose_mapping().
> Without the unmap call the module cannot request the IRQ after "rmmod"
> because it is marked as "installed" by the first successful "insmod".
> 
> NOTE: While the proposed unmap() function attempts to undo as much things
> as done by the map() function, I did not find a way to undo the following:
> 
> a) irq_gc_init_mask_cache(gc, dgc->gc_flags)
> b) irq_set_lockdep_class(virq, &irq_nested_lock_class)
> c) irq_modify_status(virq, dgc->irq_flags_to_clear, dgc->irq_flags_to_set)
> 
> Feel free to comment on that matter.
> 
> ---
>  kernel/irq/generic-chip.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
> index abd286a..7b464cd 100644
> --- a/kernel/irq/generic-chip.c
> +++ b/kernel/irq/generic-chip.c
> @@ -411,8 +411,34 @@ int irq_map_generic_chip(struct irq_domain *d, unsigned 
> int virq,
>  }
>  EXPORT_SYMBOL_GPL(irq_map_generic_chip);
>  
> +void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq)
> +{
> + struct irq_data *data = irq_domain_get_irq_data(d, virq);
> + struct irq_domain_chip_generic *dgc = d->gc;
> + struct irq_chip_generic *gc;
> + unsigned int hw_irq = data->hwirq;
> + int chip_idx, irq_idx;
> +
> + if (!d->gc)
> + return;
> +
> + chip_idx = hw_irq / dgc->irqs_per_chip;
> + if (chip_idx >= dgc->num_chips)
> + return;
> + gc = dgc->gc[chip_idx];
> +
> + irq_idx = hw_irq % dgc->irqs_per_chip;
> +
> + clear_bit(irq_idx, &gc->installed);
> + irq_domain_set_info(d, virq, hw_irq,
> + &no_irq_chip, NULL, NULL, NULL, NULL);
> +
> +}
> +EXPORT_SYMBOL_GPL(irq_unmap_generic_chip);
> +
>  struct irq_domain_ops irq_generic_chip_ops = {
>   .map= irq_map_generic_chip,
> + .unmap  = irq_unmap_generic_chip,
>   .xlate  = irq_domain_xlate_onetwocell,
>  };
>  EXPORT_SYMBOL_GPL(irq_generic_chip_ops);
> 



Re: [PATCH 1/2] irqdomain: fix irq_domain_xlate_onetwocell()

2016-08-04 Thread Sebastian Frias
Hi Thomas,

On 08/04/2016 12:05 PM, Thomas Gleixner wrote:
> On Tue, 2 Aug 2016, Sebastian Frias wrote:
> 
>> Use IRQ_TYPE_SENSE_MASK on irq_domain_xlate_onetwocell()
>>
>> According to the xlate() callback definition, the 'out_type' parameter
>> needs to be the "linux irq type". A mask for such bits exists,
>> IRQ_TYPE_SENSE_MASK, and as a matter of fact it is used in
>> irq_domain_xlate_twocell(), use it for irq_domain_xlate_onetwocell() as
>> well.
>>
>> Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
>> drivers can use")
> 
> You seem to be obsessed by adding "Fixes" tags and 'Fix' to the $subject, but
> your changelog just babbles about making the code the same as in
> irq_domain_xlate_twocell() and avoids carefully to explain what is broken,
> what consequences that has and what the fix is.

I'm sorry but I think your comment above is unfair, so you could have omitted
it entirely.

> 
> Now if you look at the call site which uses the xlate functions then you'll
> notice that we have a sanity check there already:
> 
> if (irq_domain_translate(domain, fwspec, &hwirq, &type))
> return 0;
> /*
>  * WARN if the irqchip returns a type with bits
>  * outside the sense mask set and clear these bits.
>  */
> if (WARN_ON(type & ~IRQ_TYPE_SENSE_MASK))
> type &= IRQ_TYPE_SENSE_MASK;
> 

(for what is worth, my patch was based on a recent 4.7 that does not has the
code you mention above)

> So we better remove the masking in the xlate functions completely and let all
> the offending device trees trip over the warning so they get eventually fixed.

That's fine by me.
If you want I can submit a patch that does what you want (if so, should I use
"fixes" or not?), let me know.

Best regards,

Sebastian


Re: [PATCH v2] irqdomain: factorise irq_domain_xlate_onetwocell()

2016-08-02 Thread Sebastian Frias
Hi Thomas,

On 08/02/2016 10:31 AM, Sebastian Frias wrote:
>> So the proper way to do that is to split this into two patches:
>>
>>  #1 Change the existing code to do the masking and explain why it is correct.
>>
>>  #2 Refactor the code and get rid of the duplicated implementation.
> 
> Ok, I can do two patches.

I splitted it in two patches, one to fix it, and another one to refactor.
However, I sent the first one (the one for the fix) as a separate one, instead 
of as
part of a set of two patches.

I'm resending both as a set of two patches (since the second one requires the 
first
one), sorry for the inconvenience.

Best regards,

Sebastian


[PATCH] irqdomain: fix irq_domain_xlate_onetwocell()

2016-08-02 Thread Sebastian Frias
Use IRQ_TYPE_SENSE_MASK on irq_domain_xlate_onetwocell()

According to the xlate() callback definition, the 'out_type' parameter
needs to be the "linux irq type". A mask for such bits exists,
IRQ_TYPE_SENSE_MASK, and as a matter of fact it is used in
irq_domain_xlate_twocell(), use it for irq_domain_xlate_onetwocell() as
well.

Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use")

Signed-off-by: Sebastian Frias 
---
 kernel/irq/irqdomain.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8798b6c..1bdd3fe 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -824,7 +824,10 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
if (WARN_ON(intsize < 1))
return -EINVAL;
*out_hwirq = intspec[0];
-   *out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
+   if (intsize > 1)
+   *out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
+   else
+   *out_type = IRQ_TYPE_NONE;
return 0;
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);
-- 
1.7.11.2



[PATCH 2/2] irqdomain: factorise irq_domain_xlate_onetwocell()

2016-08-02 Thread Sebastian Frias
Commit 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use") introduced three similar functions:

irq_domain_xlate_onecell()
irq_domain_xlate_twocell()
irq_domain_xlate_onetwocell()

yet the last one, irq_domain_xlate_onetwocell(), can be factored to use the
two previous ones to avoid code duplication.

Signed-off-by: Sebastian Frias 
---
 kernel/irq/irqdomain.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 1bdd3fe..28c09ab 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -821,14 +821,12 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
const u32 *intspec, unsigned int intsize,
unsigned long *out_hwirq, unsigned int 
*out_type)
 {
-   if (WARN_ON(intsize < 1))
-   return -EINVAL;
-   *out_hwirq = intspec[0];
if (intsize > 1)
-   *out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
+   return irq_domain_xlate_twocell(d, ctrlr, intspec, intsize,
+   out_hwirq, out_type);
else
-   *out_type = IRQ_TYPE_NONE;
-   return 0;
+   return irq_domain_xlate_onecell(d, ctrlr, intspec, intsize,
+   out_hwirq, out_type);
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);
 
-- 
1.7.11.2



[PATCH 1/2] irqdomain: fix irq_domain_xlate_onetwocell()

2016-08-02 Thread Sebastian Frias
Use IRQ_TYPE_SENSE_MASK on irq_domain_xlate_onetwocell()

According to the xlate() callback definition, the 'out_type' parameter
needs to be the "linux irq type". A mask for such bits exists,
IRQ_TYPE_SENSE_MASK, and as a matter of fact it is used in
irq_domain_xlate_twocell(), use it for irq_domain_xlate_onetwocell() as
well.

Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use")

Signed-off-by: Sebastian Frias 
---
 kernel/irq/irqdomain.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8798b6c..1bdd3fe 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -824,7 +824,10 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
if (WARN_ON(intsize < 1))
return -EINVAL;
*out_hwirq = intspec[0];
-   *out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
+   if (intsize > 1)
+   *out_type = intspec[1] & IRQ_TYPE_SENSE_MASK;
+   else
+   *out_type = IRQ_TYPE_NONE;
return 0;
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);
-- 
1.7.11.2



Re: [PATCH v2] irqdomain: factorise irq_domain_xlate_onetwocell()

2016-08-02 Thread Sebastian Frias
Hi Thomas,

On 08/01/2016 07:07 PM, Thomas Gleixner wrote:
> On Mon, 1 Aug 2016, Sebastian Frias wrote:
>> Commit 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
>> drivers can use") introduced three similar functions:
>>
>> irq_domain_xlate_onecell()
>> irq_domain_xlate_twocell()
>> irq_domain_xlate_onetwocell()
>>
>> yet the last one, irq_domain_xlate_onetwocell(), can be factored to use the
>> two previous ones to avoid code duplication.
>>
>> Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
>> drivers can use")
> 
> That does not fix anything. It optimizes code. We use the "Fixes" tag only
> when the existing code is buggy.

Ok, I will remove that.

> 
>> Signed-off-by: Sebastian Frias 
>> ---
>>
>> NOTE: the factored code is not strictly the same in the sense that
>> 16b2e6e2f31d returns "intspec[1]" as 'out_type', while this patch would
>> make it return "intspec[1] & IRQ_TYPE_SENSE_MASK".
> 
> So the proper way to do that is to split this into two patches:
> 
>  #1 Change the existing code to do the masking and explain why it is correct.
> 
>  #2 Refactor the code and get rid of the duplicated implementation.

Ok, I can do two patches.

> 
>  
>> Feel free to comment on that matter.
>>
>> ---
>>  kernel/irq/irqdomain.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index bee8b02..125a28c 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -839,9 +839,12 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
>>  {
>>  if (WARN_ON(intsize < 1))
>>  return -EINVAL;
>> -*out_hwirq = intspec[0];
>> -*out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
>> -return 0;
>> +if (intsize == 1)
>> +return irq_domain_xlate_onecell(d, ctrlr, intspec, intsize,
>> +out_hwirq, out_type);
>> +else
>> +return irq_domain_xlate_twocell(d, ctrlr, intspec, intsize,
>> +out_hwirq, out_type);
> 
> So I really wonder how much of a saving that change is. I wouldn't be
> surprised if it would create worse code on some architectures.
> 

Maybe it does, although I looked at this from the point of view of reducing
duplicated code because of the well known issues duplicated code entails.
This case is a good example, since the code was duplicated we ended up with
slightly different versions of it.

Best regards,

Sebastian


[PATCH v2] irqdomain: factorise irq_domain_xlate_onetwocell()

2016-08-01 Thread Sebastian Frias
Commit 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use") introduced three similar functions:

irq_domain_xlate_onecell()
irq_domain_xlate_twocell()
irq_domain_xlate_onetwocell()

yet the last one, irq_domain_xlate_onetwocell(), can be factored to use the
two previous ones to avoid code duplication.

Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use")

Signed-off-by: Sebastian Frias 
---

NOTE: the factored code is not strictly the same in the sense that
16b2e6e2f31d returns "intspec[1]" as 'out_type', while this patch would
make it return "intspec[1] & IRQ_TYPE_SENSE_MASK".

Feel free to comment on that matter.

---
 kernel/irq/irqdomain.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index bee8b02..125a28c 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -839,9 +839,12 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
 {
if (WARN_ON(intsize < 1))
return -EINVAL;
-   *out_hwirq = intspec[0];
-   *out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
-   return 0;
+   if (intsize == 1)
+   return irq_domain_xlate_onecell(d, ctrlr, intspec, intsize,
+   out_hwirq, out_type);
+   else
+   return irq_domain_xlate_twocell(d, ctrlr, intspec, intsize,
+   out_hwirq, out_type);
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);
 
-- 
1.7.11.2




[PATCH 1/2] genirq: Generic chip: add irq_unmap_generic_chip

2016-08-01 Thread Sebastian Frias
Without this patch irq_domain_disassociate() cannot properly release the
interrupt.
Indeed, irq_map_generic_chip() checks a bit on 'gc->installed' but said bit
is never cleared, only set.

Commit 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")
added irq_map_generic_chip() function and also stated "This lacks a removal
function for now".

This commit provides with an implementation of an unmap function that can
be called by irq_domain_disassociate().

Fixes: 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")

Signed-off-by: Sebastian Frias 
---

This is required by loadable modules requesting IRQs.
In our case rmmod will perform free_irq() + irq_dispose_mapping().
Without the unmap call the module cannot request the IRQ after "rmmod"
because it is marked as "installed" by the first successful "insmod".

NOTE: While the proposed unmap() function attempts to undo as much things
as done by the map() function, I did not find a way to undo the following:

a) irq_gc_init_mask_cache(gc, dgc->gc_flags)
b) irq_set_lockdep_class(virq, &irq_nested_lock_class)
c) irq_modify_status(virq, dgc->irq_flags_to_clear, dgc->irq_flags_to_set)

Feel free to comment on that matter.

---
 kernel/irq/generic-chip.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index abd286a..7b464cd 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -411,8 +411,34 @@ int irq_map_generic_chip(struct irq_domain *d, unsigned 
int virq,
 }
 EXPORT_SYMBOL_GPL(irq_map_generic_chip);
 
+void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq)
+{
+   struct irq_data *data = irq_domain_get_irq_data(d, virq);
+   struct irq_domain_chip_generic *dgc = d->gc;
+   struct irq_chip_generic *gc;
+   unsigned int hw_irq = data->hwirq;
+   int chip_idx, irq_idx;
+
+   if (!d->gc)
+   return;
+
+   chip_idx = hw_irq / dgc->irqs_per_chip;
+   if (chip_idx >= dgc->num_chips)
+   return;
+   gc = dgc->gc[chip_idx];
+
+   irq_idx = hw_irq % dgc->irqs_per_chip;
+
+   clear_bit(irq_idx, &gc->installed);
+   irq_domain_set_info(d, virq, hw_irq,
+   &no_irq_chip, NULL, NULL, NULL, NULL);
+
+}
+EXPORT_SYMBOL_GPL(irq_unmap_generic_chip);
+
 struct irq_domain_ops irq_generic_chip_ops = {
.map= irq_map_generic_chip,
+   .unmap  = irq_unmap_generic_chip,
.xlate  = irq_domain_xlate_onetwocell,
 };
 EXPORT_SYMBOL_GPL(irq_generic_chip_ops);
-- 
1.7.11.2




[PATCH 2/2] genirq: Generic chip: factorise code using irq_get_domain_generic_chip()

2016-08-01 Thread Sebastian Frias
irq_map_generic_chip() contains about the same code as
irq_get_domain_generic_chip() except for the return values. Using ERR_PTR()
along with IS_ERR() and PTR_ERR() it is possible to make the map (and
unmap) functions call irq_get_domain_generic_chip instead of duplicating
the code.

irq_get_domain_generic_chip() is modified to return different error codes
using ERR_PTR() to preserve error information.

Most callers of irq_get_domain_generic_chip() do not check for its return
value and assume it is always a valid pointer. The two drivers that do
check for its return value are updated to use IS_ERR()

Fixes: 088f40b7b027 ("genirq: Generic chip: Add linear irq domain support")

Signed-off-by: Sebastian Frias 
---
 drivers/gpio/gpio-dwapb.c  |  2 +-
 drivers/irqchip/irq-atmel-aic-common.c |  2 +-
 kernel/irq/generic-chip.c  | 24 
 3 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/gpio/gpio-dwapb.c b/drivers/gpio/gpio-dwapb.c
index 34779bb..157d684 100644
--- a/drivers/gpio/gpio-dwapb.c
+++ b/drivers/gpio/gpio-dwapb.c
@@ -317,7 +317,7 @@ static void dwapb_configure_irqs(struct dwapb_gpio *gpio,
}
 
irq_gc = irq_get_domain_generic_chip(gpio->domain, 0);
-   if (!irq_gc) {
+   if (IS_ERR(irq_gc)) {
irq_domain_remove(gpio->domain);
gpio->domain = NULL;
return;
diff --git a/drivers/irqchip/irq-atmel-aic-common.c 
b/drivers/irqchip/irq-atmel-aic-common.c
index 28b26c8..33205211 100644
--- a/drivers/irqchip/irq-atmel-aic-common.c
+++ b/drivers/irqchip/irq-atmel-aic-common.c
@@ -122,7 +122,7 @@ static void __init aic_common_ext_irq_of_init(struct 
irq_domain *domain)
 
of_property_for_each_u32(node, "atmel,external-irqs", prop, p, hwirq) {
gc = irq_get_domain_generic_chip(domain, hwirq);
-   if (!gc) {
+   if (IS_ERR(gc)) {
pr_warn("AIC: external irq %d >= %d skip it\n",
hwirq, domain->revmap_size);
continue;
diff --git a/kernel/irq/generic-chip.c b/kernel/irq/generic-chip.c
index f7d6654..3ccac06 100644
--- a/kernel/irq/generic-chip.c
+++ b/kernel/irq/generic-chip.c
@@ -350,10 +350,10 @@ irq_get_domain_generic_chip(struct irq_domain *d, 
unsigned int hw_irq)
int idx;
 
if (!dgc)
-   return NULL;
+   return ERR_PTR(-ENODEV);
idx = hw_irq / dgc->irqs_per_chip;
if (idx >= dgc->num_chips)
-   return NULL;
+   return ERR_PTR(-EINVAL);
return dgc->gc[idx];
 }
 EXPORT_SYMBOL_GPL(irq_get_domain_generic_chip);
@@ -378,13 +378,9 @@ int irq_map_generic_chip(struct irq_domain *d, unsigned 
int virq,
unsigned long flags;
int idx;
 
-   if (!d->gc)
-   return -ENODEV;
-
-   idx = hw_irq / dgc->irqs_per_chip;
-   if (idx >= dgc->num_chips)
-   return -EINVAL;
-   gc = dgc->gc[idx];
+   gc = irq_get_domain_generic_chip(d, hw_irq);
+   if (IS_ERR(gc))
+   return PTR_ERR(gc);
 
idx = hw_irq % dgc->irqs_per_chip;
 
@@ -433,15 +429,11 @@ void irq_unmap_generic_chip(struct irq_domain *d, 
unsigned int virq)
struct irq_domain_chip_generic *dgc = d->gc;
struct irq_chip_generic *gc;
unsigned int hw_irq = data->hwirq;
-   int chip_idx, irq_idx;
-
-   if (!d->gc)
-   return;
+   int irq_idx;
 
-   chip_idx = hw_irq / dgc->irqs_per_chip;
-   if (chip_idx >= dgc->num_chips)
+   gc = irq_get_domain_generic_chip(d, hw_irq);
+   if (IS_ERR(gc))
return;
-   gc = dgc->gc[chip_idx];
 
irq_idx = hw_irq % dgc->irqs_per_chip;
 
-- 
1.7.11.2



[PATCH] irqdomain: factorise irq_domain_xlate_onetwocell()

2016-08-01 Thread Sebastian Frias
Commit 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use") introduced three similar functions:

irq_domain_xlate_onecell()
irq_domain_xlate_twocell()
irq_domain_xlate_onetwocell()

yet the last one, irq_domain_xlate_onetwocell(), can be factored to use the
two previous ones to avoid code duplication.

Fixes: 16b2e6e2f31d ("irq_domain: Create common xlate functions that device
drivers can use")

Signed-off-by: Sebastian Frias 
---

NOTE: the factored code is not strictly the same in the sense that
16b2e6e2f31d returns "intspec[1]" as 'out_type', while this patch would
make it return "intspec[1] & IRQ_TYPE_SENSE_MASK".

Feel free to comment on that matter.

---
 kernel/irq/irqdomain.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index bee8b02..ea2df0e 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -839,9 +839,20 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d,
 {
if (WARN_ON(intsize < 1))
return -EINVAL;
-   *out_hwirq = intspec[0];
-   *out_type = (intsize > 1) ? intspec[1] : IRQ_TYPE_NONE;
-   return 0;
+   if (intsize == 1)
+   return irq_domain_xlate_onecell(d,
+   ctrlr,
+   intspec,
+   intsize,
+   out_hwirq,
+   out_type);
+   else
+   return irq_domain_xlate_twocell(d,
+   ctrlr,
+   intspec,
+   intsize,
+   out_hwirq,
+   out_type);
 }
 EXPORT_SYMBOL_GPL(irq_domain_xlate_onetwocell);
 
-- 
1.7.11.2



Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-20 Thread Sebastian Frias
Hi Jason,

On 07/06/2016 06:28 PM, Jason Cooper wrote:
> Hi Sebastian,
> 
> On Wed, Jul 06, 2016 at 01:37:21PM +0200, Sebastian Frias wrote:
>> On 07/05/2016 06:16 PM, Jason Cooper wrote:
>>>> Come to think of it, I'm not sure the *name* of the file documenting
>>>> a binding is as important to DT maintainers as the compatible string.
>>>
>>> Correct.  devicetee compatible strings need to be as specific as
>>> possible.  
>>
>> Specific with respect to what thing? To the HW module they are describing
>> (USB, IRQ controller, etc.) or to the chip the HW module it is embedded
>> into?
> 
> The compatible string uniquely identifies an interface between an IP
> block and the software (the devicetree binding).  We use the most
> specific model number or name we can for that IP block when we create
> the binding and the compatible string.

Ok, so we agree that the string identifies the HW block itself, and not the
chip in which it is embedded into.

> 
> If future SoCs come out and the IP block contained within, regardless of
> identifier, is compatible with the existing binding, then we can reuse
> it.  This is what I was trying to show with my little chart quoted
> below.

I understand, is just that the chart was using chip IDs so I was a bit
confused.

> 
> For ethernet and other major blocks it's easy because they have their
> own model numbers and such.  For the smaller blocks, like irqchips, we
> have to use the model number or unique name of the SoC we first found it
> in.

Ok, but I'm not sure I understand the logic, I thought (and it is my
understanding of your previous paragraphs) that the string is supposed to
identify the IP block (or HW module) itself.

That seems more generic and flexible than attaching it some "external"
property, like the name of the chip the HW module happens to be embedded into.

Normally such "small HW blocks" you talk about, because they are not generic
enough, probably require documentation/engineering effort from the SoC
manufacturer, so I would guess that the naming can be decided with the SoC
them.

Why is there a difference on the naming convention for "smaller blocks, 
like irqchips" and other blocks (ethernet, USB, UART, etc.) ?

>> A SoC is composed of several HW modules, some are shared among different
>> manufacturers (i.e.: "generic-xhci"), and some are shared among different
>> product lines of the same manufacturer (i.e.: "sigma,smp,irqrouter").
> 
> Yes, that's why it's critical to be specific.  We want to reuse drivers
> where it makes sense.  We can if the interface is the same.

Ok, so is "sigma,smp,irqrouter" good or not?

> 
>> And the DT for a given chip should describe the collection of HW modules
>> that make up the SoC, regardless of what chip introduced them or what
>> other chip uses it. Why would that be relevant anyway?
> 
> Because the first time we see a new IP block, we can't predict the
> future.  We don't know where it's going to be reused.  We don't know
> when it's going to be changed, or how.  So we mark it as specifically as
> we can based on the information we have when we first encounter it.
> 
> When a new version comes out, we see if the interface is still
> compatible.  If it is, there's nothing to do.  If it changed, we see if
> we can address it by adding a property to the existing binding.  Failing
> that, we create a new compatible string to indicate a new interface.

I see. I think this process is ok for when there's no information, like when
somebody reverse engineer a HW block of a chip of a given line of chips.
In that case I can understand that there's no way to know if previous,
similar or future chips use the same block, nor its name.

Yet, I still fail to see the reason to attach the chip ID to what clearly
are submodules. Especially when one can assume that such submodules can
be shared on different chip IDs.

> 
>> By that reasoning, I also think that drivers/net/ethernet/aurora/nb8800.c
>> should have had a string like "aurora,nb8800,sigma" or something, to
>> specify that it is a NB8800 from Aurora integrated in a Sigma platform.
> 
> The fact that it is in the Sigma dts file tells this already.  Based on
> the above, did the interface change when adding it to the sigma
> platform?  Can the binding be reused?

I think it did not change.
Actually, I think only Sigma uses it, as the driver only has Sigma-specific
initialisations for two Sigma-specific strings.

> 
>> That way it can behave as vanilla NB8800 when the string is "aurora,nb8800"
>> and it can adapt its behaviour to Sigma's platform when a different

Re: [RFC PATCH v2] irqchip: add support for SMP irq router

2016-07-20 Thread Sebastian Frias
Hi Thomas,

Thanks for your comments.
I appreciate the time you dedicated to the review.
I will take your comments into account, in the meanwhile, and since this was
a RFC, would it be possible to also get some feedback/comments from a
conceptual point of view?

I'm copy/pasting some of the questions I attached to the RFC email here:


2) In order to get a virq mapping for the domains associated with the outputs
(the outputs connected to the GIC) I request a sort of "Fake HW IRQ"
See irq-tango_v4.c:1400

/* To request a virq we need a HW IRQ, use a "Fake HW IRQ" */
hwirq  = index + irqrouter->input_count + irqrouter->swirq_count;

This feels a bit like a hack, so suggestions are welcome.

3) The file is called 'irq-tango_v4.c' but I think it should match the
compatible string, so I was thinking of renaming:
irq-tango_v4.c => irq-sigma_smp_irqrouter.c
irq-tango_v4.h => irq-sigma_smp_irqrouter.h

What do you think?

4) Do I have to do something more to handle the affinity stuff?

6) More of a theoretical question:
I have to #include  from
code that is not in the Linux tree. That's fine for now, but if in the future
there are other irq controllers and another header is required, how can the
code know which header to #include ?
Are we supposed to #include all of them, and then somehow detect which module
is actually active and act accordingly? If so, can we detect which module
matched and probed correctly to know which controller version we need to
talk to?


What the driver does is:
- creates a domain hierarchy attached to its parent domain (the GIC)
  - the callbacks for this domain hierarchy will be used to deal with IRQs
  "directly" routed to the GIC (i.e.: exclusive IRQ lines routed to the GIC
  and not shared).
  In this case the GIC dispatches the interrupts.
- create linear domains attached to this driver's node.
  - the callbacks for these linear domains will be used to deal with shared
  IRQs (they share the routing to one of the GIC inputs).
  In this case this driver dispatches the interrupts after reading the flags.

It is such concept, and then its implementation, that I would need some more
advise with, because, as stated in the questions above, it still does not
feels right, especially question 2.

I will reply to your comments below and will post a v3 later, hopefully after
dealing also with the answers to the doubts outlined above.


On 07/19/2016 06:49 PM, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Sebastian Frias wrote:
>> +
>> +#define DBGERR(__format, ...)   panic("[%s:%d] %s(): " __format,
>> \
>> +  __FILE__, __LINE__,   \
>> +  __func__, ##__VA_ARGS__)
> 
> Please get rid of this macro mess. Use the functions (panic, ...)
> directly. There is no value for this file, line, func crappola. Think about
> proper texts which can be grepped for.

Ok.

> 
>> +
>> +#define DBGWARN(__format, ...)  pr_err("[%s:%d] %s(): " __format,   
>> \
> 
> Very consistent. WARN == err 
> 
>> +   __FILE__, __LINE__,  \
>> +   __func__, ##__VA_ARGS__)
>> +
>> +
>> +#if 0
> 
> Don't even think about posting code with "#if 0" or such in it. 
> 

The idea behind these macros was:
- to easily enable/disable them.
- to be more consistent with other code written at Sigma: this is a
Sigma-only driver.

>> +#define DBGLOG(__format, ...)   pr_info("[%s:%d] %s(): " __format,  \
>> +__FILE__, __LINE__, \
>> +__func__, ##__VA_ARGS__)
>> +#else
>> +#define DBGLOG(__format, ...) do {} while (0)
>> +#endif
> 
> pr_debug() is there for a reason.

Yes, but doesn't pr_debug() requires changes to the Makefile?

> 
>> +#if 0
>> +#define SHORT_OR_FULL_NAME full_name
>> +#else
>> +#define SHORT_OR_FULL_NAME name
>> +#endif
> 
> Use one and be done with it. Really.

By making the choice explicit, the person reading/debugging is aware of the 
possibilities.
Picking "full_name" or "name" would likely prevent the person from knowing it 
has other debug options.

But I can change this if upstreaming depends on it.

> 
>> +
>> +#define NODE_NAME(__node__) (__node__ ? __node__->SHORT_OR_FULL_NAME :  
>> \
>> + "")
> 
> Sigh. pr_xxx("%s", NULL) prints . That's really sufficient for your
> debug/error handling.

The code is checking for __node__ to be non-NULL before dereferencing it.

> 
>> +
>> +#define BITMASK_VE

[RFC PATCH v2] irqchip: add support for SMP irq router

2016-07-19 Thread Sebastian Frias

This adds support for a second-gen irq router/controller present
on some Sigma Designs chips.

Signed-off-by: Sebastian Frias 
---

This is RFC v2 attempts to address the comments given on the previous
RFC:
- domains used to be created by the DT, now they are created internally
by the driver:
https://marc.info/?l=linux-kernel&m=146773883324507&w=2
- IRQ sharing (which does not seems to be handled by current DT spec,
https://marc.info/?l=linux-kernel&m=146779759405950&w=2) is implemented
as:
  - explicit grouping: https://marc.info/?l=linux-kernel&m=146781302410442&w=2
  - implicit grouping: https://marc.info/?l=linux-kernel&m=146780217707395&w=2

However, I still have a few doubts:
1) I have implemented two ways of declaring the IRQ sharing, implicit and
explicit IRQ grouping, see:
Documentation/devicetree/bindings/interrupt-controller/sigma,smp,irqrouter.txt
for more information.

Since it is still not clear how this is going to be used, I prefer to keep
both available for the moment.

2) In order to get a virq mapping for the domains associated with the outputs
(the outputs connected to the GIC) I request a sort of "Fake HW IRQ"
See irq-tango_v4.c:1400

/* To request a virq we need a HW IRQ, use a "Fake HW IRQ" */
hwirq  = index + irqrouter->input_count + irqrouter->swirq_count;

This feels a bit like a hack, so suggestions are welcome.

3) The file is called 'irq-tango_v4.c' but I think it should match the
compatible string, so I was thinking of renaming:
irq-tango_v4.c => irq-sigma_smp_irqrouter.c
irq-tango_v4.h => irq-sigma_smp_irqrouter.h

What do you think?

4) Do I have to do something more to handle the affinity stuff?

5) checkpatch.pl reports a few warnings:

- WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
I have not changed the file because I would like to be settled on the naming
first (see point 2 above)
- WARNING: quoted string split across lines
I think it is not an issue
- WARNING: line over 80 characters
This is for two big macros, I think it is not an issue

6) More of a theoretical question:
I have to #include  from
code that is not in the Linux tree. That's fine for now, but if in the future
there are other irq controllers and another header is required, how can the
code know which header to #include ?
Are we supposed to #include all of them, and then somehow detect which module
is actually active and act accordingly? If so, can we detect which module
matched and probed correctly to know which controller version we need to
talk to?

Please feel free to comment and suggest improvements.

---
 .../interrupt-controller/sigma,smp,irqrouter.txt   |  135 ++
 drivers/irqchip/Makefile   |1 +
 drivers/irqchip/irq-tango_v4.c | 1729 
 .../interrupt-controller/irq-tango_v4.h|   39 +
 4 files changed, 1904 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/sigma,smp,irqrouter.txt
 create mode 100644 drivers/irqchip/irq-tango_v4.c
 create mode 100644 include/dt-bindings/interrupt-controller/irq-tango_v4.h

diff --git 
a/Documentation/devicetree/bindings/interrupt-controller/sigma,smp,irqrouter.txt
 
b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp,irqrouter.txt
new file mode 100644
index 000..ff7c4d5
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp,irqrouter.txt
@@ -0,0 +1,135 @@
+* Sigma Designs Interrupt Router
+
+This module can route X IRQ inputs and Y SW IRQ inputs into Z IRQ outputs,
+with X+Y>Z.
+For instance X=128, Y=16, Z=24.
+
+Note however that the HW does not latch the IRQ lines, so devices
+connecting to the router are expected to latch their IRQ line by themselves.
+
+SW IRQs can be used by firmware running on different parts of the SoC to
+communicate with the CPU.
+They can be registered by specific drivers, and any entity capable of writing
+to the Interrupt Router registers can trigger and clear these interrupts.
+
+A single node in the device tree is used to describe the Interrupt Router.
+Since X+Y>Z, some IRQ inputs may need to be routed to the same IRQ output,
+thus "sharing the IRQ line".
+
+There are two ways of defining such sharing, either by specifying the number
+of IRQ groups, either by listing the IRQ groups and their contents.
+In the former case the groups are created and named "implicitly", in the
+later case the groups are created "explicitly" by grouping IRQs.
+An IRQ group will share an IRQ output, thus all input IRQs falling into that
+group will share a given IRQ line.
+
+If SW IRQs are enabled ("swirq-count" != 0), they will *ALL* be grouped
+together.
+
+Required properties:
+- compatible: Should be "sigma,smp,irqrouter".
+- interrupt-controller: Identifies the node as an interrupt controller.
+- inpu

Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-07 Thread Sebastian Frias
Hi Marc,

On 07/06/2016 03:50 PM, Marc Zyngier wrote:
>> I think that's where part the misunderstanding comes from.
>> IMHO the output line is not a direct function of the input line.
>> Any of the 64 IRQ lines entering the "old controller" (irq-tango.c) can be
>> routed to any of its 3 outputs.
> 
> Then the current DT binding isn't properly describing the HW.

Ok, thanks, so it is not a good example then.

>> In a nutshell:
>> - "old controller": routes [0...N] => GIC inputs [2...4]
>> - "new controller": routes [0...M] => GIC inputs [0...23]
>>
>> So, when we think about it, if the "new DT" specified 24 domains, it would
>> be equivalent of the "old DT" with 3 domains, right?
> 
> Indeed, but I consider the "old" binding to be rather misleading. It
> should have been described as a router too, rather than hardcoding
> things in DT. Granted, it doesn't matter much when you only have 3
> possible output lines. But with 24 outputs, that becomes much more relevant.

I see.

>> So, putting aside routing considerations and the discussion above, I think
>> a simpler question is: if the domains should not be described in the DT,
>> how can we define the IRQ sharing in the DT?
> 
> You could have a set of sub-nodes saying something like this:
> 
>   mux-hint0 {
>   inputs = <1 45 127>;
>   }
> 
>   mux-hint1 {
>   inputs = <2 33>;
>   }
> 
> (or maybe you can have that as direct properties, but you get the idea).
> Here, you have two output pins dedicated to muxed interrupts (assuming
> they are all level interrupts), and the last 22 can be freely allocated
> as direct routes.
> 

Ok, I'll try to do that.
So, aside from the DT issues (that is, that it is describing domains),
would it be ok to create a domain for each of the outputs?

Because I was looking at:
- 
Documentation/devicetree/bindings/interrupt-controller/samsung,exynos4210-combiner.txt
- drivers/irqchip/exynos-combiner.c
- arch/arm/boot/dts/exynos4210.dtsi

and what I see is that the DT basically list all outputs [0...15] connected
to the parent interrupt controller, although the driver does not creates
separate domains, just one. Then it attaches a chained handler for each of
the outputs. On the .map callback it attaches a irqchip to the domain.

There is also:
- Documentation/devicetree/bindings/arm/omap/crossbar.txt
- drivers/irqchip/irq-crossbar.c
- arch/arm/boot/dts/dra7.dtsi

This one creates a domain hierarchy linked to the parent domain and uses
irq_domain_alloc_irqs_parent() and irq_domain_set_hwirq_and_chip() to attach
a irqchip to the domain on the .alloc callback.

Both use a single domain, as opposed to irq-tango.c which creates 3 domains.
Right now irq-tango_v2.c is supposed to create one domain per output (if
so the DT says)
Are there guidelines regarding that?

Thanks in advance.
Best regards,

Sebastian







Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-06 Thread Sebastian Frias
Hi Jason,

On 07/05/2016 06:16 PM, Jason Cooper wrote:
>> Come to think of it, I'm not sure the *name* of the file documenting
>> a binding is as important to DT maintainers as the compatible string.
> 
> Correct.  devicetee compatible strings need to be as specific as
> possible.  

Specific with respect to what thing? To the HW module they are describing
(USB, IRQ controller, etc.) or to the chip the HW module it is embedded
into?

>In a series of compatible IP blocks, the string should refer
> to the first version in the series, e.g. sigma,smp8710 for a series of
> compatible IP blocks like 8710, 8712, 8715, 8724.  If an 8751 came along
> with a different register layout or some other incompatibility, then a
> new string would be sigma,smp8751.  So,
> 
>   8710 uses "sigma,smp8710"
>   8712 uses "sigma,smp8710"
>   8715 uses "sigma,smp8710"
>   8724 uses "sigma,smp8710"
>   8751 uses "sigma,smp8751"
>   8754 uses "sigma,smp8751"
> 

But this is not consistent with the strings for generic drivers, like
"generic-xhci", etc.

A SoC is composed of several HW modules, some are shared among different
manufacturers (i.e.: "generic-xhci"), and some are shared among different
product lines of the same manufacturer (i.e.: "sigma,smp,irqrouter").

And the DT for a given chip should describe the collection of HW modules
that make up the SoC, regardless of what chip introduced them or what
other chip uses it. Why would that be relevant anyway?

By that reasoning, I also think that drivers/net/ethernet/aurora/nb8800.c
should have had a string like "aurora,nb8800,sigma" or something, to
specify that it is a NB8800 from Aurora integrated in a Sigma platform.
That way it can behave as vanilla NB8800 when the string is "aurora,nb8800"
and it can adapt its behaviour to Sigma's platform when a different string
(like "aurora,nb8800,sigma") is used in the DT.

Later if Sigma modified the nb8800 HW, it could be "aurora,nb8800,sigma-v2".
The same way, if later Sigma used a different Aurora HW module, the DT
would say "aurora,nb2000,sigma" (to signal that another driver is required)

Instead, drivers/net/ethernet/aurora/nb8800.c has "sigma,smp8642-ethernet"
string, which ties it to a particular chip, and I don't see how does that
convey version about the module or other relevant information; Plus it is
confusing if the same module is then embedded on a SMP8756 chip.

Would you mind explaining how does that contrasts with the logic used
by DT naming conventions?

Best regards,

Sebastian



Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-06 Thread Sebastian Frias
Hi,

On 07/06/2016 11:30 AM, Thomas Gleixner wrote:
> On Wed, 6 Jul 2016, Marc Zyngier wrote:
>> On 05/07/16 20:24, Thomas Gleixner wrote:
>>> On Tue, 5 Jul 2016, Marc Zyngier wrote:
 Hardcoded? No way. You simply implement a route allocator in your
 driver, assigning them as needed. And yes, if you have more than 24
 interrupts, they get muxed.
>>>
>>> There is one caveat though. Under some circumstances (think RT) you want to
>>> configure which interrupts get muxed and which not. We really should have 
>>> that
>>> option, but yes for anything which has less than 24 autorouting is the way 
>>> to
>>> go.
>>
>> Good point. I can see two possibilities for that:
>>
>> - either we describe this DT with some form of hint, indicating what are
>> the inputs that can be muxed to a single output. Easy, but the DT guys
>> are going to throw rocks at me for being Linux-specific.
> 
> That's not necessarily Linux specific. The problem arises with any other OS as
> well.
> 
>> - or we have a way to express QoS in the irq subsystem, and a driver can
>> request an interrupt with a "make it fast" flag. Of course, everybody
>> and his dog are going to ask for it, and we're back to square one.
> 
> That and the driver does not know about the particular application
> scenario/system configuration.
>  
>> Do we have a way to detect which interrupt is more likely to be
>> sensitive to muxing? My hunch is that if it is requested with
>> IRQF_SHARED, then it is effectively muxable. Thoughts?
> 
> That's too late. request_irq happens _after_ the interrupt is set up and the
> routing established.
> 

What about using 3 values for the interrupt description like the GIC does?
When connecting to the GIC we say "interrupts = ;"
If devices using this driver (the one from the RFC) requested the interrupt 
like:
"interrupts = <0 38 IRQ_TYPE_LEVEL_HIGH>;"
"interrupts = <2 27 IRQ_TYPE_LEVEL_HIGH>;"
etc.
with the first field being the "group", then the driver could create a domain
for the device's IRQ (or associate it to an existing one if it has already been
created). It would thus serve as a hint on how to create domains and how to
share IRQs into the same line (domain).

I guess I can get such information from the .translate and .alloc callbacks
from a newly created domain hierarchy attached to the GIC, right?

What do you think?

Best regards,

Sebastian



Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-06 Thread Sebastian Frias
Hi Marc,

On 07/05/2016 07:13 PM, Marc Zyngier wrote:
>>> You really don't need to describe this. The configuration that is
>>> applied to your router in entirely under software control,
>>
>> With "entirely under software control" do you mean this driver's code?
> 
> Yes.

Ok.

> 
>>
>>> and none of
>>> that should appear in the DT. You could decide to mux all the interrupts
>>> to a single one, or decide that the 23 first interrupts you discover get
>>> their own private line to the GIC and that everything else is muxed.
>>>
>>> So given that this is completely defined by software, it has no place in
>>> DT. 
>>
>> I think I'm missing something, what is the difference between the domains
>> described by nodes in the DT for irq-tango.c 
>> (arch/arm/boot/dts/tango4-common.dtsi)
>> and the DT from my RFC?
> 
> The fundamental difference is that with your new fancy controller, you
> can decide what is going where, while the previous one is completely set
> in stone (the output line is a direct function of the input line).

I think that's where part the misunderstanding comes from.
IMHO the output line is not a direct function of the input line.
Any of the 64 IRQ lines entering the "old controller" (irq-tango.c) can be
routed to any of its 3 outputs.
The only thing fixed is which GIC input is connected to those 3 outputs, ie:
GIC inputs 2, 3 and 4.

In the the "new controller" (irq-tango_v2.c, this RFC), any of 128 IRQ lines
can be routed to any of 24 outputs, connected to GIC inputs 0...23.

In a nutshell:
- "old controller": routes [0...N] => GIC inputs [2...4]
- "new controller": routes [0...M] => GIC inputs [0...23]

So, when we think about it, if the "new DT" specified 24 domains, it would
be equivalent of the "old DT" with 3 domains, right?

That's why it seemed more or less natural to keep describing the domains in
the DT, the main reason for that being that it allowed the user to specify
the IRQ sharing in the DT, and this is precisely the key point of this.

So, putting aside routing considerations and the discussion above, I think
a simpler question is: if the domains should not be described in the DT,
how can we define the IRQ sharing in the DT?

Best regards,

Sebastian


Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-05 Thread Sebastian Frias
Hi Marc,

On 07/05/2016 06:48 PM, Marc Zyngier wrote:
>> I already did something like that, you can see it here:
>>
>> https://marc.info/?l=linux-kernel&m=146592235919308&w=2
>>
>> the problem with that code is that it cannot handle more than 24 IRQs (the
>> number of outputs of the router), because they are not being shared.
>>
>> Maybe I need a sort of hybrid approach by reintroducing part of
>> "irq-crossbar.c" code to replace the irq domain layout that is currently
>> being done using DT properties ?
>>
>> However, I have not seen any examples of how to describe, using the DT,
>> an association between a device HW irq, and the GIC hwirq where it goes to,
>> nor how to express in the DT that multiple devices should share a given GIC
>> hwirq.
>> Basically, when a device requests the IRQ specified in its DT, I need:
>> - to know which GIC hwirq line should I route it to (or the GIC to tell
>> me which one it expects)
> 
> You really don't need to describe this. The configuration that is
> applied to your router in entirely under software control,

With "entirely under software control" do you mean this driver's code?

> and none of
> that should appear in the DT. You could decide to mux all the interrupts
> to a single one, or decide that the 23 first interrupts you discover get
> their own private line to the GIC and that everything else is muxed.
> 
> So given that this is completely defined by software, it has no place in
> DT. 

I think I'm missing something, what is the difference between the domains
described by nodes in the DT for irq-tango.c 
(arch/arm/boot/dts/tango4-common.dtsi)
and the DT from my RFC?

Alternatively, the previous DT (arch/arm/boot/dts/tango4-common.dtsi) allowed
IRQ sharing to be specified in the DT, is that wrong?

>There may not be an example of such an interrupt controller in the
> tree, but this doesn't look too hard to implement.

Well, if you the domains should not be described in the DT and that they should
be somehow hardcoded into the drivers' code, it should not be hard indeed.

Best regards,

Sebastian


Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-05 Thread Sebastian Frias
Hi Jason,

On 07/05/2016 05:53 PM, Jason Cooper wrote:
>>
>> Thanks for your comments.
>> So, aside from some naming issues, do you think the driver is ok?
> 
> Well, it's going to be few days before I can really dig in to this.
> Until then, what I can say I see is that it looks like you're using
> devicetree to tell Linux how to lay out the irq domains.  That's not
> right :(

Ok, so that replies my questions 1 and 2, thanks.

> 
> The devicetree should *only* describe the hardware. Would *BSD be able
> to use the description in the dtb effectively?
> 
> iiuc, I think irq-crossbar.c may be a similar enough in task to give you
> an idea or two.

I already did something like that, you can see it here:

https://marc.info/?l=linux-kernel&m=146592235919308&w=2

the problem with that code is that it cannot handle more than 24 IRQs (the
number of outputs of the router), because they are not being shared.

Maybe I need a sort of hybrid approach by reintroducing part of
"irq-crossbar.c" code to replace the irq domain layout that is currently
being done using DT properties ?

However, I have not seen any examples of how to describe, using the DT,
an association between a device HW irq, and the GIC hwirq where it goes to,
nor how to express in the DT that multiple devices should share a given GIC
hwirq.
Basically, when a device requests the IRQ specified in its DT, I need:
- to know which GIC hwirq line should I route it to (or the GIC to tell
me which one it expects)
- two devices should be able to request to share the same GIC hwirq

If you take a look at the DT for drivers/irqchip/irq-tango.c
(arch/arm/boot/dts/tango4-common.dtsi) you will see that 3 domains are
created using DT nodes. The difference being that in the irq-tango.c case
the routing is fixed w.r.t the GIC.

Best regards,

Sebastian


Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-05 Thread Sebastian Frias
Hi Jason,

On 07/05/2016 04:41 PM, Jason Cooper wrote:
> Hey Sebastian, Mason,
> 
> * Please fix mailer to wrap text at a sane length.  I've re-wrapped and
> trimmed.
> 
> On Tue, Jul 05, 2016 at 02:30:12PM +0200, Sebastian Frias wrote:
>> On 07/04/2016 02:11 PM, Mason wrote:
> ...
>>>>  .../sigma,smp87xx-irqrouter.txt|  69 +++
>>>
>>> In the *actual* submission, we can't use a wildcard like smp87xx
>>> we'll have to use an actual part number.
>>
>> Are you sure?
>> That would hinder genericity.
>> Actually I wanted to call it "sigma,smp-irqrouter.txt" (or 
>> "sigma,smp,irqrouter.txt").
> 
> sigma,smp-irqrouter.txt should be fine.  The devicetree maintainers
> should yelp if they want something different.
> 
>> To me there's no need to link the compatible string of a given HW
>> module with that of the chip name the module it is embedded into.  For
>> example, the generic USB3 driver is "generic-xhci".  While this module
>> is not generic to be embedded in chips from different manufacturers,
>> it is supposed to be generic within Sigma, and multiple Sigma chips
>> (with potentially different denominations) can use it.
>>
>>>
>>>>  drivers/irqchip/Makefile   |   1 +
>>>>  drivers/irqchip/irq-tango_v2.c | 594 
>>>> +
>>>
>>> Likewise, I don't like the "_v2" suffix, it's too generic.
>>> Actual submission should use something more specific.
>>
>> Well, the other driver is irq-tango.c that is generic as well.
>> I prefer versioning, as it is unrelated with the actual chip name.
> 
> Is there a name, similar to 'tango', for this version of the IP?
> Something that would spark recognition for someone looking for "the damn
> driver for this XYZ irqchip I have".  If not, irq-tango_v2.c is fine.
> 

Thanks for your comments.
So, aside from some naming issues, do you think the driver is ok?
Indeed, I'm not sure about the following:

1) I had to unroll irq_of_parse_and_map() in order to get the HW
IRQ declared in the device tree so that I can associate it with
a given domain. Is there another way?

2) I'm not sure about the DT specification, in particular, the use
of child nodes to declare different domains, but it works.
Advise and/or hints regarding this are welcome.

3) I'm calling this an irq router to somehow highlight the fact
that it is not a simple interrupt controller. Indeed it does
not latch the IRQ lines by itself, and does not supports edge
detection.
Does anybody else has a better idea for the name?

4) Do I have to do something more to handle the affinity stuff?

5) checkpatch.pl reports warnings, etc. but I guess it's ok for
now since it is a RFC 

6) Do we need a lock in the cascade_irq callback set with
irq_set_chained_handler_and_data() ?

Thanks in advance.
Best regards,

Sebastian


Re: [RFC PATCH v1] irqchip: add support for SMP irq router

2016-07-05 Thread Sebastian Frias
On 07/04/2016 02:11 PM, Mason wrote:
> 
> In the patch subject, do you mean SMP as in Symmetric Multi Processor?

As in Sigma Multimedia Processor :-)
The prefix for Sigma's chips is SMP.

I can change that to "Tango" if it is confusing.

> 
> Is that the address you intend to submit with?

Yes.

> 
> The "core" topic is over my head, so I'll just discuss the color
> of the bike shed.

:-)
Thanks for your comments, hopefully some knowledgeable people will discuss the 
core topic as well.

> 
>>  .../sigma,smp87xx-irqrouter.txt|  69 +++
> 
> In the *actual* submission, we can't use a wildcard like smp87xx
> we'll have to use an actual part number.

Are you sure?
That would hinder genericity.
Actually I wanted to call it "sigma,smp-irqrouter.txt" (or 
"sigma,smp,irqrouter.txt").

To me there's no need to link the compatible string of a given HW module with 
that of the chip name the module it is embedded into.
For example, the generic USB3 driver is "generic-xhci".
While this module is not generic to be embedded in chips from different 
manufacturers, it is supposed to be generic within Sigma, and multiple Sigma 
chips (with potentially different denominations) can use it.

> 
>>  drivers/irqchip/Makefile   |   1 +
>>  drivers/irqchip/irq-tango_v2.c | 594 
>> +
> 
> Likewise, I don't like the "_v2" suffix, it's too generic.
> Actual submission should use something more specific.

Well, the other driver is irq-tango.c that is generic as well.
I prefer versioning, as it is unrelated with the actual chip name.

Following the reasoning from my previous reply, I think the file name should be 
something like "smp-irqrouter.c" to be close to the "compatible" string.

However, IMHO there is a mix of various naming conventions on the kernel tree 
so I don't know which one is recommended.

> 
>> +++ 
>> b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp87xx-irqrouter.txt
>> @@ -0,0 +1,69 @@
>> +* Sigma Designs Interrupt Router
>> +
>> +This module can route N IRQ inputs into M IRQ outputs, with N>M.
>> +For instance N=128, M=24.
>> +
>> +Note however that the HW does not latches the IRQ lines, so devices
>  ^^^
> "does not latch"

Ok.

> 
> Also, you hard-code the fact that N/32 = 4 with the status_i variables.
> 
> Would it make sense to use for loops?
> (instead of unrolling the loops)
> 
>   e.g. for (i = 0; i < N/4; ++i) { ... }
> 

Actually my current version uses this (also on 
"tango_irqdomain_handle_cascade_irq()"), but I'm not sure I'm happy with it 
because the person reading may think the code is generic while it is not.
I mean, there is no guarantee on how the HW guys will extend this in the 
future, and trying to make the code generic could end up as wasted premature 
optimization.

> 
>> +Optional properties:
>> +- interrupt-parent: pHandle of the parent interrupt controller, if not
>> +  inherited from the parent node.
> 
> I'm not sure this is what "optional" means.
> 

>From what I've seen in the documentation for other DT bindings, this is the 
>meaning of "optional".
But I don't mind changing it if I get some hints about the meaning.

> 
>> +The following example declares a irqrouter with 128 inputs and 24 outputs,
>> +with registers @ 0x6F800 and connected to the GIC.
>> +The two child nodes define two irqdomains, one connected to GIC input 2
>> +(hwirq=2, level=high), and ther other connected to GIC input 3 (hwirq=3,
>   
> 
>> +#define ROUTER_INPUTS  (128)
>> +#define ROUTER_OUTPUTS (24)
> 
> Parentheses are unnecessary around constants.

I'm used to use parenthesis in macros to avoid operator priority side effects.

> 
>> +#define IRQ_ROUTER_ENABLE_MASK (BIT(31))
>> +#define IRQ_ROUTER_INVERT_MASK (BIT(16))
> 
> Parentheses already provided in BIT macro.

Same as above.

> 
>> +#define READ_SYS_IRQ_GROUP0 (0x420)
>> +#define READ_SYS_IRQ_GROUP1 (0x424)
>> +#define READ_SYS_IRQ_GROUP2 (0x428)
>> +#define READ_SYS_IRQ_GROUP3 (0x42C)
> 
> If a for loop were used, we'd only need to
> #define SYSTEM_IRQ0x420
> 
>   for (i = 0; i < N/4; ++i) {
> status_i = readl(base + SYSTEM_IRQ + i*4);
> 

I suppose you mean "status[i]".
There's a trade-off between explicit and clear code, and compact code.
I believe in this case the unrolled code not only is more clear, it may end up 
being faster.

As I was saying earlier, my current version uses a mixed approach with the loop 
unrolled for reading the registers and a for loop for the dispatch. 

If in the future we have to read much more registers, we should revisit this.

>> +#if 0
>> +#define SHORT_OR_FULL_NAME full_name
>> +#else
>> +#define SHORT_OR_FULL_NAME name
>> +#endif
> 
> Just pick one?

By making the choice explicit, the person reading/debugging is aware of the 
possibilities.
Picking "full_name" or "name" would likely prevent the person from knowing it 
has other debug opt

[RFC PATCH v1] irqchip: add support for SMP irq router

2016-06-30 Thread Sebastian Frias
This adds support for a second-gen irq router/controller present
on some Sigma Designs chips.

Signed-off-by: Sebastian Frias 
---

This is a RFC because I have a few doubts:
1) I had to unroll irq_of_parse_and_map() in order to get the HW
IRQ declared in the device tree so that I can associate it with
a given domain.

2) I'm not sure about the DT specification, in particular, the use
of child nodes to declare different domains, but it works

3) I'm calling this an irq router to somehow highlight the fact
that it is not a simple interrupt controller. Indeed it does
not latch the IRQ lines by itself, and does not supports edge
detection.

4) Do I have to do something more to handle the affinity stuff?

5) checkpatch.pl reports warnings, etc. but I guess it's ok for
now since it is a RFC :-)

Please feel free to comment and suggest improvements.
---
 .../sigma,smp87xx-irqrouter.txt|  69 +++
 drivers/irqchip/Makefile   |   1 +
 drivers/irqchip/irq-tango_v2.c | 594 +
 3 files changed, 664 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/sigma,smp87xx-irqrouter.txt
 create mode 100644 drivers/irqchip/irq-tango_v2.c

diff --git 
a/Documentation/devicetree/bindings/interrupt-controller/sigma,smp87xx-irqrouter.txt
 
b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp87xx-irqrouter.txt
new file mode 100644
index 000..0e404f0
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/interrupt-controller/sigma,smp87xx-irqrouter.txt
@@ -0,0 +1,69 @@
+* Sigma Designs Interrupt Router
+
+This module can route N IRQ inputs into M IRQ outputs, with N>M.
+For instance N=128, M=24.
+
+Note however that the HW does not latches the IRQ lines, so devices
+connecting to the router are expected to latch their IRQ line by themselves.
+
+A single node in the device tree is used to describe the interrupt router.
+Child nodes (up to a maximum of 'outputs') describe irqdomains for the outputs
+of the interrupt router.
+These child nodes specify, via their 'interrupts' property, how the
+interrupt router is connected to its parent interrupt controller (usually the
+GIC), and define irqdomains that can be used in other nodes' 'interrupts'
+property.
+
+Required properties:
+- compatible: Should be "sigma,smp87xx-irqrouter".
+- interrupt-controller: Identifies the node as an interrupt controller.
+- inputs: The number of IRQ lines entering the router
+- outputs: The number of IRQ lines exiting the router
+- reg: Base address and size of interrupt router registers.
+- #interrupt-cells: Should be <2>. Defines how other nodes will be able to
+interact with this node. The meaning of the cells are
+   * First Cell: HW IRQ number.
+   * Second Cell: IRQ polarity (level high or low).
+
+Required properties of child nodes:
+- interrupt-controller: Identifies the node as an interrupt controller.
+- interrupts: Defines the hwirq associated with a domain and connected to
+the parent interrupt controller. The format of the interrupt specifier
+depends on the interrupt parent controller.
+
+Optional properties:
+- interrupt-parent: pHandle of the parent interrupt controller, if not
+  inherited from the parent node.
+
+
+Example:
+
+See Documentation/devicetree/bindings/interrupt-controller/interrupts.txt and
+Documentation/devicetree/bindings/arm/gic.txt for further details.
+
+The following example declares a irqrouter with 128 inputs and 24 outputs,
+with registers @ 0x6F800 and connected to the GIC.
+The two child nodes define two irqdomains, one connected to GIC input 2
+(hwirq=2, level=high), and ther other connected to GIC input 3 (hwirq=3,
+level=low)
+
+   irq_router: irq_router@6f800 {
+compatible = "sigma,smp87xx-irqrouter";
+reg = <0x6f800 0x800>;
+interrupt-controller;
+interrupt-parent = <&gic>;
+inputs = <128>;
+outputs = <24>;
+
+irq0: irqdomain0@parentirq2 {
+  interrupt-controller;
+  #interrupt-cells = <2>;
+  interrupts = ;
+};
+
+irq1: irqdomain1@parentirq3 {
+  interrupt-controller;
+  #interrupt-cells = <2>;
+  interrupts = ;
+};
+   };
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index 7451245..703a2b5 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_ARCH_NSPIRE) += irq-zevio.o
 obj-$(CONFIG_ARCH_VT8500)  += irq-vt8500.o
 obj-$(CONFIG_ST_IR

Re: Using irq-crossbar.c

2016-06-21 Thread Sebastian Frias
Hi Marc,

On 06/21/2016 02:41 PM, Marc Zyngier wrote:
>> Ok, so after discussing with some HW engineers, they said that even
>> if this is a pure router and cannot latch by itself, since the
>> devices themselves latch their IRQ output, reading the 4x32bit RAW
>> status registers could work as well, that means that if there are
>> more than 24 devices, some could share IRQs, right?
> 
> As mentioned earlier, this only works for level interrupts. If you
> enforce this, this is OK. I assume that you also have a way to mask
> these interrupts, right?

Yes, that's what the bit 31 does (see tangox_setup_irq_route() function in the 
code I sent).

For reference, here's the documentation of the mux:


CPU block interrupt interface is now 32bits.
The 24 first interrupt bits are generated from the system interrupts and the 8 
msb interrupts are cpu local interrupts :

IRQs [23:0] tango system irqs.
IRQs [27:24] CPU core cross trigger interface interrupt (1 per core).
IRQs [31:28] CPU core PMU (performance unit) interrupt (1 per core). 

The 24 lsb interrupts are generated through a new interrupt map module that 
maps the tango 128 interrupts to those 24 interrupts.
For each of the 128 input system interrupt, one register is dedicated to 
program the destination interrupt among the 24 available.
The mapper is configured as follows, starting at address (0x6f800) :

offset namedescription
0x000  irq_in_0_cfg"en"=bit[31]; "inv"=bit[16]; "dest"=bits[4:0]
0x004  irq_in_1_cfg"en"=bit[31]; "inv"=bit[16]; "dest"=bits[4:0]
.
.
.
0x1FC  irq_in_127_cfg  "en"=bit[31]; "inv"=bit[16]; "dest"=bits[4:0]
0x200  soft_irq_cfg"enable"=bits[15:0]
0x204  soft_irq_map0   "map3"=bits[28:24]; "map2"=bits[20:16]; 
"map1"=bits[12:8]; "map0"=bits[4:0]
0x208  soft_irq_map1   "map3"=bits[28:24]; "map2"=bits[20:16]; 
"map1"=bits[12:8]; "map0"=bits[4:0]
0x20C  soft_irq_map2   "map3"=bits[28:24]; "map2"=bits[20:16]; 
"map1"=bits[12:8]; "map0"=bits[4:0]
0x210  soft_irq_map3   "map3"=bits[28:24]; "map2"=bits[20:16]; 
"map1"=bits[12:8]; "map0"=bits[4:0]
0x214  soft_irq_set"set"=bits[15:0]
0x218  soft_irq_clear  "clear"=bits[15:0]
0x21C  read_cpu_irq"cpu_block_irq"=bits[23:0]
0x220  read_sys_irq0   "system_irq"=bits[31:0]; (irqs: 0->31)
0x224  read_sys_irq1   "system_irq"=bits[31:0]; (irqs: 32->63)
0x228  read_sys_irq2   "system_irq"=bits[31:0]; (irqs: 64->95)
0x22C  read_sys_irq3   "system_irq"=bits[31:0]; (irqs: 96->127)

irq_in_N_cfg   : input N mapping :
- dest bits[4:0]=> set destination interrupt among the 24 output 
interrupts. (if multiple inputs are mapped to the same output, result is an OR 
of the inputs).
- inv bit[16]   => if set, inverts input interrupt polarity (active at 0).
- en bit[31]=> enable interrupt. Acts like a mask on the input 
interrupt. 
soft_irq   : this module supports up to 16 software interrupts.
- enable bits[15:0] => enable usage of software IRQs (SIRQ), 1 bit per SIRQ. 
soft_irq_mapN  : For each of the 16 soft IRQ (SIRQ), map them in out IRQ[23:0] 
vector.
- mapN  => 5 bits to select where to connect the SIRQ among the 23 
bits output IRQ. (if multiple SIRQ are mapped to the same output IRQ, result is 
an OR of those signals). 
soft_irq_set   : 16bits, write 1 bit at one set the corresponding SIRQ. Read 
returns the software SIRQ vector value.
soft_irq_clear : 16bits, write 1 bit at one clear the corresponding software 
SIRQ. Read returns the software SIRQ vector value.
read_cpu_irq   : 24bits, returns output IRQ value (IRQs connected to the ARM 
cluster).
read_sys_irqN  : 32bits, returns input system IRQ value before mapping. 


> 
>> Two questions then:
>> a) let's say we need to share some of the IRQs, which APIs should be used?
> 
> The usual
> irq_set_chained_handler_and_data()/chained_irq_enter()/chained_irq_exit().

Ok, thanks.

At first I thought that I could modify tangox_allocate_gic_irq() so that when 
running out of IRQ lines going to the GIC, it:
- reserved the last line of the GIC with:

irq = fwspec.param[1];
err = irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);

- then, it could add an irqchip to the domain with:

err = irq_alloc_domain_generic_chips(domain, 128, 1, node->name, 
handle_level_irq, 0, 0, 0);
for (i = 0; i < 4; i++) {
   gc = irq_get_domain_generic_chip(domain, i * 32);
   tangox_irq_init_chip(gc, i * IRQ_CTL_HI, i * EDGE_CTL_HI);
}

irq_set_chained_handler(irq, tangox_irq_handler);
irq_set_handler_data(irq, domain);

Not sure if that makes sense (it's hard to get a clear understanding of all 
these APIs and their possible interactions)

> 
>> b) some people have been asking about IRQ affinity, they have not
>> been clear about it, but I suppose maybe they want to redistribute
>> the IRQs. In this case, when using IRQ sharing a device may go from
>> sharing an IRQ line to an exclusive line or viceversa, right? Does
>> Linux handles that on its own or there's some API to call as well?
> 
> You need to i

Re: Using irq-crossbar.c

2016-06-21 Thread Sebastian Frias
Hi Marc,

On 06/21/2016 12:18 PM, Marc Zyngier wrote:
>> Since irq-tango_v2.c is similar to irq-crossbar.c from TI (since it
>> is based on it), I was wondering what is the policy or recommendation
>> in such cases?
>> Should I attempt to merge the code (mainly the way to set up the
>> registers) on irq-crossbar.c or should we consider irq-tango_v2.c to
>> live its own life?
> 
> If the HW is significantly different, I'd rather have a separate driver.
> We can always share some things later on by having a small library of
> "stuff".

I'd say it is very similar. Most of the changes I did were done to understand 
how it worked.
However, it may end up being different if we use cascaded interrupts.

> 
>> NOTE: IMHO, irq-crossbar.c could benefit from clearer names for some
>> DT properties, because "max_irqs" and "max-crossbar-sources" are not
>> straight forward names for "mux_outputs" and "mux_inputs"
>> (respectively)
> 
> Maybe, but this ship has sailed a long time ago. This is an ABI now, and
> it is not going to change unless proven to be broken. On the other hand,
> you can name your own properties as you see fit.

Ok.

> 
>> NOTE2: current irq-tango_v2.c code still has a 24 IRQ limitation
>> since it is not using any API that would allow it to setup IRQ
>> sharing.
> 
> Unless you limit your mux level interrupts only, I cannot see how you
> could deal with cascaded interrupts. By the time you receive an edge,
> the line will have dropped, and you won't be able to identify the source
> interrupt.

Yes, cascaded interrupts would be limited to level only.

By the way, did you see my other questions? (copy/pasted here for convenience):


Ok, so after discussing with some HW engineers, they said that even if this is 
a pure router and cannot latch by itself, since the devices themselves latch 
their IRQ output, reading the 4x32bit RAW status registers could work as well, 
that means that if there are more than 24 devices, some could share IRQs, right?

Two questions then:
a) let's say we need to share some of the IRQs, which APIs should be used?

b) some people have been asking about IRQ affinity, they have not been clear 
about it, but I suppose maybe they want to redistribute the IRQs.
In this case, when using IRQ sharing a device may go from sharing an IRQ line 
to an exclusive line or viceversa, right?
Does Linux handles that on its own or there's some API to call as well?


About a) I did not find any driver that uses irq_domain_add_linear() and 
irq_domain_add_hierarchy() but maybe I'm not approaching the problem from the 
right angle.

Best regards,

Sebastian



Re: Using irq-crossbar.c

2016-06-16 Thread Sebastian Frias
Hi Marc,

On 06/14/2016 06:39 PM, Sebastian Frias wrote:
> On 06/14/2016 06:37 PM, Sebastian Frias wrote:
>>>>> Also, without seeing the code,
>>>>> it is pretty difficult to make any meaningful comment...
>>>>
>>>> Base code is either 4.7rc1 or 4.4.
>>>> The irq-crossbar code is not much different from TI, but you can find it 
>>>> attached.
>>>
>>> Please post it separately (and inline), the email client I have here
>>> makes it hard to review attached patches.
>>
>> Ok, I'll post it in a separate email and inline.
>>
> 
> Here it goes:
> 
> 

 

> IRQCHIP_DECLARE(tangox_intc, "sigma,smp-irq-mux", tangox_of_irq_mux_init);
> 
> 

I have tested the code, and aside from the missing #interrupt-cells in the DT 
that you pointed out, it seems it is working (devices using IRQ appear 
functional), here's some log:

tangox_irq_mux_domain_translate(): domain 0xcf805000
tangox_irq_mux_domain_translate(): hwirq 1 (0x1) type 4 (0x4)
tangox_irq_mux_domain_alloc(): domain 0xcf805000, virq 18 (0x12) nr_irqs 1
tangox_allocate_gic_irq(): domain 0xcf805000, virq 18 (0x12) hwirq 1 (0x1)
tangox_setup_irq_route(): route irq  1 (@ 0xf006f804) => irq 23
tangox_irq_mux_domain_translate(): domain 0xcf805000
tangox_irq_mux_domain_translate(): hwirq 38 (0x26) type 4 (0x4)
tangox_irq_mux_domain_alloc(): domain 0xcf805000, virq 19 (0x13) nr_irqs 1
tangox_allocate_gic_irq(): domain 0xcf805000, virq 19 (0x13) hwirq 38 (0x26)
tangox_setup_irq_route(): route irq 38 (@ 0xf006f898) => irq 22
tangox_irq_mux_domain_translate(): domain 0xcf805000
tangox_irq_mux_domain_translate(): hwirq 67 (0x43) type 4 (0x4)
tangox_irq_mux_domain_alloc(): domain 0xcf805000, virq 20 (0x14) nr_irqs 1
tangox_allocate_gic_irq(): domain 0xcf805000, virq 20 (0x14) hwirq 67 (0x43)
tangox_setup_irq_route(): route irq 67 (@ 0xf006f90c) => irq 21

Since irq-tango_v2.c is similar to irq-crossbar.c from TI (since it is based on 
it), I was wondering what is the policy or recommendation in such cases?
Should I attempt to merge the code (mainly the way to set up the registers) on 
irq-crossbar.c or should we consider irq-tango_v2.c to live its own life?

NOTE: IMHO, irq-crossbar.c could benefit from clearer names for some DT 
properties, because "max_irqs" and "max-crossbar-sources" are not straight 
forward names for "mux_outputs" and "mux_inputs" (respectively)

NOTE2: current irq-tango_v2.c code still has a 24 IRQ limitation since it is 
not using any API that would allow it to setup IRQ sharing.

Thanks in advance.
Best regards,

Sebastian


Re: Using irq-crossbar.c

2016-06-14 Thread Sebastian Frias
On 06/14/2016 06:37 PM, Sebastian Frias wrote:
>>>> Also, without seeing the code,
>>>> it is pretty difficult to make any meaningful comment...
>>>
>>> Base code is either 4.7rc1 or 4.4.
>>> The irq-crossbar code is not much different from TI, but you can find it 
>>> attached.
>>
>> Please post it separately (and inline), the email client I have here
>> makes it hard to review attached patches.
> 
> Ok, I'll post it in a separate email and inline.
> 

Here it goes:


#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define IRQ_MUX_INPUT_LINES  (128)
#define IRQ_MUX_OUTPUT_LINES (24)
#define IRQ_FREE (-1)
#define IRQ_RESERVED (-2)


/**
 * struct tangox_irq_mux : irq mux private driver data
 * @lock: spinlock serializing access to @irq_map
 * @mux_inputs:  inputs (irq lines entering) the mux
 * @mux_outputs: outputs (irq lines exiting) the mux (connected to the GIC)
 * @irq_map: irq input->output map
 * @reg_base: mux base address
 */
struct tangox_irq_mux {
raw_spinlock_t lock;
uint mux_inputs;
uint mux_outputs;
uint *irq_map;
void __iomem *reg_base;
};


#define DBGLOG(__format, ...)   \
do {\
pr_info("[%s:%d] %s(): " __format, __FILE__, __LINE__, 
__FUNCTION__ , ##__VA_ARGS__); \
} while (0)


static inline u32 intc_readl(int address)
{
u32 value = readl_relaxed((void __iomem *)address);
//DBGLOG("read 0x%x @ 0x%x\n", value, address);
return value;
}

static inline void intc_writel(int value, int address)
{
//DBGLOG("write 0x%x @ 0x%x\n", value, address);
writel_relaxed(value, (void __iomem *)address);
}

static inline void tangox_setup_irq_route(struct tangox_irq_mux 
*irq_mux_context, int irq_in, int irq_out)
{
u32 value = irq_out;
u32 offset = (irq_in * 4);
u32 address = irq_mux_context->reg_base + offset;

DBGLOG("route irq %2d (@ 0x%08x) => irq %2d\n", irq_in, address, value);

if (value)
value |= 0x8000;

intc_writel(value, address);
}


static int tangox_allocate_gic_irq(struct irq_domain *domain,
   unsigned virq,
   irq_hw_number_t hwirq)
{
struct tangox_irq_mux *irq_mux_context = domain->host_data;
struct irq_fwspec fwspec;
int i;
int err;

DBGLOG("domain 0x%p, virq %d (0x%x) hwirq %d (0x%x)\n", domain, virq, 
virq, hwirq, hwirq);

if (!irq_domain_get_of_node(domain->parent))
return -EINVAL;

raw_spin_lock(&(irq_mux_context->lock));
for (i = irq_mux_context->mux_outputs - 1; i >= 0; i--) {
if (irq_mux_context->irq_map[i] == IRQ_FREE) {
irq_mux_context->irq_map[i] = hwirq;
break;
}
}
raw_spin_unlock(&(irq_mux_context->lock));

if (i < 0)
return -ENODEV;

fwspec.fwnode = domain->parent->fwnode;
fwspec.param_count = 3;
fwspec.param[0] = 0;/* SPI */
fwspec.param[1] = i;
fwspec.param[2] = IRQ_TYPE_LEVEL_HIGH;

err = irq_domain_alloc_irqs_parent(domain, virq, 1, &fwspec);
if (err)
irq_mux_context->irq_map[i] = IRQ_FREE;
else
tangox_setup_irq_route(irq_mux_context, hwirq, i);

return err;
}

static struct irq_chip mux_chip = {
.name   = "CBAR",
.irq_eoi= irq_chip_eoi_parent,
.irq_mask   = irq_chip_mask_parent,
.irq_unmask = irq_chip_unmask_parent,
.irq_retrigger  = irq_chip_retrigger_hierarchy,
.irq_set_type   = irq_chip_set_type_parent,
.flags  = IRQCHIP_MASK_ON_SUSPEND |
  IRQCHIP_SKIP_SET_WAKE,
#ifdef CONFIG_SMP
.irq_set_affinity   = irq_chip_set_affinity_parent,
#endif
};

/**
 * tangox_irq_mux_domain_alloc - map/reserve a mux<->irq connection
 * @domain: domain of irq to map
 * @virq: virq number
 * @nr_irqs: number of irqs to reserve
 *
 */
static int tangox_irq_mux_domain_alloc(struct irq_domain *domain,
   unsigned int virq,
   unsigned int nr_irqs,
   void *data)
{
struct tangox_irq_mux *irq_mux_context = domain->host_data;
struct irq_fwspec *fwspec = data;
irq_hw_number_t hwirq;
int i;

DBGLOG("domain 0x%p, virq %d (0x%x) nr_irqs %d\n", do

Re: Using irq-crossbar.c

2016-06-14 Thread Sebastian Frias
Hi Marc,

On 06/13/2016 06:24 PM, Marc Zyngier wrote:
>> My understanding of "hierarchical irq domains" is that it is useful
>> when there are multiple stacked interrupt controllers. Also, the
>> documentation says "the majority of drivers should use the linear
>> map" (as opposed to the hierarchical one).
> 
> The "linear map" to be opposed to the "tree", not to the hierarchy.
> Hierarchies themselves can be built out most domain type (only the
> underlying data structure changes).

Thanks for the clarification.

> 
>> Maybe the definition of "interrupt controller" could benefit from
>> some clarification, but my understanding is that, in our case, the
>> GIC is the only interrupt controller (that's where the interrupt type
>> must be set active_high/active_low/edge, etc.), in front of it, it
>> happens to be a crossbar, that happens to be programmable, and that
>> can be used to map any 128 line into any of 24 lines of the GIC
>> (actually it is a many-to-many router, without any latching nor edge
>> detection functionality)
> 
> An interrupt controller is absolutely *anything* that is on the
> interrupt path, unless it is absolutely transparent, invisible to
> software, and doesn't require any form of configuration. 

Well, one could imagine that this many-to-many router could be pre-configured, 
and thus not require any form of configuration from Linux, yet, somehow the 
resulting mapping needs to be communicated to Linux, right?
What APIs calls should be used?

>Your own
> definition of an interrupt controller is way too restrictive.

I see, thanks for the clarification.

> 
>> Obviously, when the DT says that ethernet driver uses IRQ=120 (for
>> example), the crossbar must be setup to route line 120 to one of the
>> 24 lines going to the GIC. So a minimum of interaction between the
>> GIC driver (and/or the Linux IRQ framework) and the driver
>> programming the crossbar is required, and that's what we are trying
>> to achieve.
>>
>> Does that makes sense?
> 
> Maybe you and Mason should get together and decide what you want to
> support. Because you seem to have diverging requirements (Mason
> suggesting the exact opposite over the weekend).

IIUC what you refer to, when Mason said that we could route all 128 lines to a 
single GIC line he was talking about a first order approximation.

>>
>> That's the last log (it's stuck there) and I was asking how/where to enable 
>> more logs to be able to debug this.
>>
>> Or there are no standardised logs and every person has to come up
>> with its own debug logs?
> 
> We have basic tracing in the irqdomain layer, 

I followed this https://www.kernel.org/doc/local/pr_debug.txt for 
kernel/irq/irqdomain.c

Do you want the whole boot log? Or just the snippets from irqdomain.c and those 
from our driver?

>and you can then
> instrument your own driver.
> 
>>
>> In the meanwhile, and in case irq-crossbar is not the good example for our 
>> case, would it be possible to get some guidance, examples, tips, on how to 
>> write a driver like the one described? ie:
>>
>> [0..127] IRQ inputs => BIG_MUX => [0..23] outputs => [0..23] GIC inputs
>>
>> BIG_MUX is a many-to-many router:
>> - 128x32bit registers to setup a route between any of the 128 input
>> (IRQ_dev) and any of the 24 outputs (IRQ_gic)
>> -   4x32bit registers that read the RAW status of each of the 128
>> lines (no latching nor edge detection) not sure how useful is to read
>> such RAW status, because (naive hypothesis follows) Linux's IRQ
>> framework could remember which IRQ_dev lines are routed to which
>> IRQ_gic, and thus when handling IRQ_gic(x), Linux could just ask the
>> drivers tied to it to check for interruptions.
>>
> 
> OK, so this is definitely a pure router, and the lack of latch makes it
> completely unsuitable for a a cascaded interrupt controller. At least,
> we've managed to establish that this thing will never be able to handle
> more than 24 devices in a sane way. So let's forget about Mason's idea
> of cascading everything to a single output line, and let's focus on your
> initial idea of having something similar to TI's crossbar, which is a
> much saner approach.

Ok, so after discussing with some HW engineers, they said that even if this is 
a pure router and cannot latch by itself, since the devices themselves latch 
their IRQ output, reading the 4x32bit RAW status registers could work as well, 
that means that if there are more than 24 devices, some could share IRQs, right?

Two questions then:
a) let's say we need to share some of the IRQs, which APIs should be used?

b) some people have been asking about IRQ affinity, they have not been clear 
about it, but I suppose maybe they want to redistribute the IRQs.
In this case, when using IRQ sharing a device may go from sharing an IRQ line 
to an exclusive line or viceversa, right?
Does Linux handles that on its own or there's some API to call as well?

> 
>>
>>> Also, without seeing the code,
>>> it is pretty difficult to make any 

Re: Using irq-crossbar.c

2016-06-13 Thread Sebastian Frias
Hi Marc,

Thanks for your reply, please find my comments below.

On 06/10/2016 06:05 PM, Marc Zyngier wrote:
> On 10/06/16 16:37, Sebastian Frias wrote:
>> Hi,
>>
>> We are trying to write a driver for an interrupt controller (actually
>> more of a crossbar) for an ARM-based SoC. This IRQ crossbar has 128
>> inputs and 24 outputs, the outputs are connected directly to the
>> GIC. The idea is that the GIC handles everything, and just request a
>> mapping from an IRQ number (0...127, from a device's DT entry) into
>> one of its 24 input lines.
> 
> "Just request a mapping...".
> 
>> By looking at current code (4.7-rc1) there seems to be a driver
>> (drivers/irqchip/irq-crossbar.c) that provides similar
>> functionality. The driver uses hierarchical irq domains (since commit
>> 783d31863fb8 "irqchip: crossbar: Convert dra7 crossbar to stacked
>> domains") which we believe we don't need because the only controller
>> is the GIC.
> 
> So you need it, but you don't need it? The GIC may be the only interrupt
> controller with which software interacts when the interrupt occurs, but
> the crossbar does play a major role in *routing* the interrupt to the
> right GIC pin.

My understanding of "hierarchical irq domains" is that it is useful when there 
are multiple stacked interrupt controllers.
Also, the documentation says "the majority of drivers should use the linear 
map" (as opposed to the hierarchical one).

Maybe the definition of "interrupt controller" could benefit from some 
clarification, but my understanding is that, in our case, the GIC is the only 
interrupt controller (that's where the interrupt type must be set 
active_high/active_low/edge, etc.), in front of it, it happens to be a 
crossbar, that happens to be programmable, and that can be used to map any 128 
line into any of 24 lines of the GIC (actually it is a many-to-many router, 
without any latching nor edge detection functionality)

Obviously, when the DT says that ethernet driver uses IRQ=120 (for example), 
the crossbar must be setup to route line 120 to one of the 24 lines going to 
the GIC.
So a minimum of interaction between the GIC driver (and/or the Linux IRQ 
framework) and the driver programming the crossbar is required, and that's what 
we are trying to achieve.

Does that makes sense?

> 
>> However the API used previously, register_routable_domain_ops(), was
>> removed with commit a5561c3e845c "irqchip: gic: Get rid of routable
>> domain".
> 
> And every day, I thank $DEITY for having delivered us from this evil.
> Really. And it wasn't much of an API. It was the son of a hack, bolted
> on the side of another hack. Unmaintainable, getting in the way. I had
> much fun slaughtering it! ;-)
> 
>> Trying to use the driver with hierarchical domains (after
>> modifications for our SoC), results on the kernel being blocked at
>> some point:
>>
>> [0.041524] ThumbEE CPU extension supported.
>> [0.041589] Registering SWP/SWPB emulation handler
>> [0.052022] Freeing unused kernel memory: 12364K (c029b000 - c0eae000)
>> [0.074084] random: dbus-uuidgen urandom read with 0 bits of entropy 
>> available
> 
> Sorry, but that's not much of a log. Anything related to interrupts, maybe?

That's the last log (it's stuck there) and I was asking how/where to enable 
more logs to be able to debug this.

Or there are no standardised logs and every person has to come up with its own 
debug logs?

> 
>> We've put logs on the different domain_ops calls (alloc, free,
>> translate) but they are not called, even if the DT is supposed to
>> tell devices to take interrupts from this controller (*).
> 
> At all? Nobody is talking to the GIC?
> 
>> Do you have suggestions on what APIs should be used, further
>> reading/examples and/or pointers on how debug this (logs to enable,
>> things to look for, etc.)?
> 
> You could start with posting actual logs of an interrupt being
> requested, as well as perform some basic tracing of the various
> callbacks into the irqdomain and irqchip layers, all the way down to the
> interrupt controllers (note the plural). 

Ok, thanks.

In the meanwhile, and in case irq-crossbar is not the good example for our 
case, would it be possible to get some guidance, examples, tips, on how to 
write a driver like the one described? ie:

[0..127] IRQ inputs => BIG_MUX => [0..23] outputs => [0..23] GIC inputs

BIG_MUX is a many-to-many router:
- 128x32bit registers to setup a route between any of the 128 input (IRQ_dev) 
and any of the 24 outputs (IRQ_gic)
-   4x32bit registers that read the RAW status of each of the 128 lines (no 
latch

Re: Using irq-crossbar.c

2016-06-13 Thread Sebastian Frias
Hi Marc,

Thanks for your input, please find my comments below.

On 06/11/2016 11:58 AM, Marc Zyngier wrote:
>> I think Sebastian is even more baffled by the DT mess
>> (sorry, intricacies) than I am.
> 
> This mess is what has saved us from the apocalypse 5 years ago, and
> describing a complex system is not easy (what a surprise...). If you
> just want to apply recipes without understanding the underlying
> constraints, you're in for a lot of pain.

Nobody said we wanted a recipe, but it would help to have examples of each of 
the configurations supported.
Indeed, there is effort in writing the DT bindings and even 
Documentation/IRQ-domain.txt yet I did not see example drivers for the 
different configurations.
Also, it is difficult to know what is the recommended way, since old APIs are 
kept for a while.

APIs are usually extended/designed around specific cases, so there must be 
examples, but they are not easy to find.

> 
>> The base file he was referring to is:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/tango4-common.dtsi
> 
> I know which file that is, it is mentioned in the diff. I was merely
> trying to point out the glaring mistakes that could be enough for a
> interrupt controller hierarchy to be completely non-functional:
> 
> - Your crossbar doesn't have a #interrupt-cells property. How do you
>   expect the interrupt specifiers to be interpreted?

Thanks for the pointer, actually, after adding "#interrupt-cells = <3>;" I can 
see the driver getting requests for domain maps and xlates :-)

> - You've changed the default interrupt controller to be your crossbar.
>   Which means that all the sub-nodes are inheriting it. Have you
>   checked that this was valid for all of these nodes?

What do you mean with "valid for all these nodes"? All HW irq lines go to the 
crossbar.

> 
> And as it has been pointed out before, you seem to be reusing an
> existing driver. Do you know for sure that the usage constraints are
> similar?

It is not being reused as is, I just thought (incorrectly apparently) that it 
was a good example for our case.

Best regards,

Sebastian


Re: Using irq-crossbar.c

2016-06-13 Thread Sebastian Frias
Hi Lennart,

Thanks for your input, please find my comments below.

On 06/13/2016 04:04 PM, Lennart Sorensen wrote:
> On Sat, Jun 11, 2016 at 05:37:51PM +0200, Mason wrote:
>> Only the name of the file was provided, not the path. I was not aware
>> that you already knew where to find it. I made no claim whatsoever on
>> the implementation. In fact, I agree with everything Lennart wrote.
> 
> The way the TI irq crossbar works is:
> 
> You have say 200 interrupt sources in the system.  You have some
> coprocessor (say PRU0) that you want to handle UART7, so you tell the
> crossbar: Please map the IRQ from UART7 (which might be input 146 on the
> crossbar) to IRQ 10 on the PRU0 processor.  Now whenever the interrupt
> from UART7 fires, PRU0 will see it as IRQ10.  There are no interrupt
> registers to check in the crossbar, it is purely virtual wires for
> routing interrupts.  It just happens to be an enourmouse and very very
> flexible set of wires.
> 

Ok, thanks for the information.

> If on the other hand you have 128 inputs and 24 outputs and you want
> all 128 inputs to work, and you have 24 128bit registers in the device
> to tell you which of the inputs fired to cause a given output to fire,
> then you in fact have an interrupt controller, not a crossbar. 

Actually we have 128 inputs and 24 outputs, the 24 outputs go straight to the 
GIC.
The HW block is a many-to-many router.
There are 128 32bit registers which specify, for each of the corresponding 128 
inputs, to which of the 24 outputs it would be routed to.

There are 4 32bit registers that can show the RAW status of the 128 inputs, but 
they do not latch on the inputs.
That's why our understanding is that on Linux terms it is not an interrupt 
controller, but just a many-to-many mux, the only real interrupt-controller 
(where one can set if the line is active high or low for example) is the GIC.

> This would
> be much more similar to how the original x86 design has an i8259 connected
> with 8 inputs and 1 output to an input on a second i8259 (the output of
> the secondary chip connects to IRQ2 on the primary).  You actually need
> an irq controller driver to go read the registers to determine which
> external interrupt fired and to assign a virtual IRQ number so that
> drivers can register for a given externel pin.  Cascaded interrupt
> controllers like this is perfectly common on lots of systems.

Thanks for the background on the i8259 and the cascaded interrupts.
However, our understanding is that it would only be required if more than 24 
devices request IRQ lines, in which case, some of them would have to share a 
single GIC IRQ line, right?
Shall we worry about that now?

> 
> Now of course you could cheat and simply declare that all the inputs
> that are mixed to a single output are in fact sharing a single interrupt
> (fine if they are level triggered, could be problematic if edge triggered,
> depending on how it works exactly).  Of course then when an interrupt
> happens, every driver that handles some device on a given shared interrupt
> will have the handler called in turn until everyone has had a change
> to handle the interrupt, at which point everyone should have stopped
> triggering it and it will clear itself, unless of course another event
> happened on one of the devices in which case another loop through everyone
> happens to handle the new events.  The loop will continue until the
> interrupt clears, as long as some of the handlers return that they in fact
> did do some work to handle the interrupt event.  If no one claims to have
> done something and the interrupt stays, you get the kernel killing the
> interrupt with a message about "Interrupt foo happened and nobody cared".

This is interesting.
We have one interrupt controller already upstream, drivers/irqchip/irq-tango.c, 
and our understanding is that it dispatches one IRQ at the time, see 
tangox_dispatch_irqs() function, is that what you are discussing?

Best regards,

Sebastian


Using irq-crossbar.c

2016-06-10 Thread Sebastian Frias
Hi,

We are trying to write a driver for an interrupt controller (actually more of a 
crossbar) for an ARM-based SoC.
This IRQ crossbar has 128 inputs and 24 outputs, the outputs are connected 
directly to the GIC.
The idea is that the GIC handles everything, and just request a mapping from an 
IRQ number (0...127, from a device's DT entry) into one of its 24 input lines.

By looking at current code (4.7-rc1) there seems to be a driver 
(drivers/irqchip/irq-crossbar.c) that provides similar functionality.
The driver uses hierarchical irq domains (since commit 783d31863fb8 "irqchip: 
crossbar: Convert dra7 crossbar to stacked domains") which we believe we don't 
need because the only controller is the GIC.
However the API used previously, register_routable_domain_ops(), was removed 
with commit a5561c3e845c "irqchip: gic: Get rid of routable domain".

Trying to use the driver with hierarchical domains (after modifications for our 
SoC), results on the kernel being blocked at some point:

[0.041524] ThumbEE CPU extension supported.
[0.041589] Registering SWP/SWPB emulation handler
[0.052022] Freeing unused kernel memory: 12364K (c029b000 - c0eae000)
[0.074084] random: dbus-uuidgen urandom read with 0 bits of entropy 
available

We've put logs on the different domain_ops calls (alloc, free, translate) but 
they are not called, even if the DT is supposed to tell devices to take 
interrupts from this controller (*).

Do you have suggestions on what APIs should be used, further reading/examples 
and/or pointers on how debug this (logs to enable, things to look for, etc.)?

Thanks in advance.
Best regards,

Sebastian


(*):
here's the diff on our DT:

--- tango4-common.dtsi  2016-06-10 16:23:08.244246017 +0200
+++ tangox_irqv2-common.dtsi2016-06-10 16:24:01.212588737 +0200
@@ -47,7 +47,7 @@
 
soc {
compatible = "simple-bus";
-   interrupt-parent = <&irq0>;
+   interrupt-parent = <&irq_mux>;
#address-cells = <1>;
#size-cells = <1>;
ranges;
@@ -75,7 +75,7 @@
uart: serial@10700 {
compatible = "ralink,rt2880-uart";
reg = <0x10700 0x30>;
-   interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
+   interrupts = ;
clock-frequency = <7372800>;
reg-shift = <2>;
};
@@ -83,10 +83,11 @@
eth0: ethernet@26000 {
compatible = "sigma,smp8734-ethernet";
reg = <0x26000 0x800>;
-   interrupts = <38 IRQ_TYPE_LEVEL_HIGH>;
+   interrupts = ;
clocks = <&clkgen 1>;
};
 
+#if 0
intc: interrupt-controller@6e000 {
compatible = "sigma,smp8642-intc";
reg = <0x6e000 0x400>;
@@ -117,5 +118,16 @@
interrupts = ;
};
};
+#else
+   irq_mux: irq_mux@6f800 {
+compatible = "sigma,smp-irq-mux";
+reg = <0x6f800 0x400>;
+interrupt-controller;
+interrupt-parent = <&gic>;
+irqs-reserved = <2 3 4 125 126 127>;
+   };
+
+#endif
+
};
 };


Re: [U-Boot] Sharing code between Linux and bootloader (U-boot) ?

2016-05-23 Thread Sebastian Frias
Hi Tom,

On 05/21/2016 03:41 AM, Tom Rini wrote:
> On Fri, May 20, 2016 at 04:28:23PM +0200, Sebastian Frias wrote:
> 
>> Hi,
>>
>> Some bootloaders (like U-boot) support several HW devices: serial,
>> network, NAND, USB, etc. most of which are also supported by Linux.
>>
>> So the question is: is code shared? I mean, I understand that the
>> drivers need to talk to the appropriate API, and such API could be
>> different between Linux and U-boot.
>> But putting that aside, would it be naive to imagine that some "core"
>> functionality could be shared? Or would that part be so small it is
>> not worth the effort?
>>
>> Since many companies use both, U-boot and Linux, I would figure they
>> try their best to optimize engineering resources and share code,
>> right?
>> In that case, I also wonder how do they share DT descriptions that
>> right now are being stored in the Linux kernel tree.
>>
>> We'd like to share code/DT for obvious reasons, what would you guys
>> suggest?
> 
> So, in all cases, Linux is always the primary.  

For drivers and DT?

>In some cases in U-Boot
> we port drivers over (NAND is a good example here). 

>From your message, I get that first you write the driver for Linux and then 
>port to U-boot, is that right?
I would have thought that the opposite way could be easier, since the U-boot 
driver could be simpler (maybe no DMA) w.r.t the one in Linux.

> In other cases,
> things are similar enough that it's having done it in one place it's
> easy enough to do it again in the other.
> 

So, basically people just do it again, duplicating code?

Best regards,

Sebastian


Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-23 Thread Sebastian Frias
Hi Alan,

On 05/13/2016 05:41 PM, One Thousand Gnomes wrote:
>> My understanding is that there was a time when there was no overcommit at 
>> all.
>> If that's the case, understanding why overcommit was introduced would be 
>> helpful.
> 
> Linux always had overcommit.
> 
> The origin of overcommit is virtual memory for the most part. In a
> classic swapping system without VM the meaning of brk() and thus malloc()
> is that it allocates memory (or swap). Likewise this is true of fork()
> and stack extension.
> 
> In a virtual memory system these allocate _address space_. It does not
> become populated except by page faulting, copy on write and the like. It
> turns out that for most use cases on a virtual memory system we get huge
> amounts of page sharing or untouched space.
> 
> Historically Linux did guess based overcommit and I added no overcommit
> support way back when, along with 'anything is allowed' support for
> certain HPC use cases.
> 
> The beancounter patches combined with this made the entire setup
> completely robust but the beancounters never hit upstream although years
> later they became part of the basis of the cgroups.
> 
> You can sort of set a current Linux up for definitely no overcommit using
> cgroups and no overcommit settings. It works for most stuff although last
> I checked most graphics drivers were terminally broken (and not just to
> no overcommit but to the point you can remote DoS Linux boxes with a
> suitably constructed web page and chrome browser)
> 
> Alan
> 

Thanks for your comment, it certainly provides more clues and provided some 
history about the "overcommit" setting.
I will see if we can do what we want with cgroups.

Best regards,

Sebastian


Sharing code between Linux and bootloader (U-boot) ?

2016-05-20 Thread Sebastian Frias
Hi,

Some bootloaders (like U-boot) support several HW devices: serial, network, 
NAND, USB, etc. most of which are also supported by Linux.

So the question is: is code shared? I mean, I understand that the drivers need 
to talk to the appropriate API, and such API could be different between Linux 
and U-boot.
But putting that aside, would it be naive to imagine that some "core" 
functionality could be shared? Or would that part be so small it is not worth 
the effort?

Since many companies use both, U-boot and Linux, I would figure they try their 
best to optimize engineering resources and share code, right?
In that case, I also wonder how do they share DT descriptions that right now 
are being stored in the Linux kernel tree.

We'd like to share code/DT for obvious reasons, what would you guys suggest?

Best regards,

Sebastian


Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-18 Thread Sebastian Frias
Hi Michal,

On 05/17/2016 10:16 PM, Michal Hocko wrote:
> On Tue 17-05-16 18:16:58, Sebastian Frias wrote:
> [...]
>> From reading Documentation/cgroup-v1/memory.txt (and from a few
>> replies here talking about cgroups), it looks like the OOM-killer is
>> still being actively discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes
>> in a given cgroup until the OOM situation is solved for that cgroup,
>> right?
> 
> It will be blocked waiting either for some external action which would
> result in OOM codition going away or any other charge release. You have
> to configure memcg for that though. The default behavior is to invoke
> the same OOM killer algorithm which is just reduced to tasks from the
> memcg (hierarchy).

Ok, I see, thanks!

> 
>> If that is right, it means that there is indeed a way to deal
>> with an OOM situation (stack expansion, COW failure, 'memory hog',
>> etc.) in a better way than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole
>> system behave as if it was inside a cgroup? (*)
> 
> No it is not. You have to realize that the system wide and the memcg OOM
> situations are quite different. There is usually quite some memory free
> when you hit the memcg OOM so the administrator can actually do
> something. 

Ok, so it works like the 5% reserved for 'root' on filesystems?

>The global OOM means there is _no_ memory at all. Many kernel
> operations will need some memory to do something useful. Let's say you
> would want to do an educated guess about who to kill - most proc APIs
> will need to allocate. And this is just a beginning. Things are getting
> really nasty when you get deeper and deeper. E.g. the OOM killer has to
> give the oom victim access to memory reserves so that the task can exit
> because that path needs to allocate as well. 

Really? I would have thought that once that SIGKILL is sent, the victim process 
is not expected to do anything else and thus its memory could be claimed 
immediately.
Or the OOM-killer is more of a OOM-terminator? (i.e.: sends SIGTERM)

>So even if you wanted to
> give userspace some chance to resolve the OOM situation you would either
> need some special API to tell "this process is really special and it can
> access memory reserves and it has an absolute priority etc." or have a
> in kernel fallback to do something or your system could lockup really
> easily.
> 

I see, so basically at least two cgroups would be needed, one reserved for 
handling the OOM situation through some API and another for the "rest of the 
system".
Basically just like the 5% reserved for 'root' on filesystems.
Do you think that would work?

Best regards,

Sebastian


Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-18 Thread Sebastian Frias
Hi Austin,

On 05/17/2016 07:29 PM, Austin S. Hemmelgarn wrote:
>> I see the difference, your answer seems a bit like the one from Austin, 
>> basically:
>> - killing a process is a sort of kernel protection attempting to deal 
>> "automatically" with some situation, like deciding what is a 'memory hog', 
>> or what is 'in infinite loop', "usually" in a correct way.
>> It seems there's people who think its better to avoid having to take such 
>> decisions and/or they should be decided by the user, because "usually" != 
>> "always".
> FWIW, it's really easy to see what's using a lot of memory, it's impossible 
> to tell if something is stuck in an infinite loop without looking deep into 
> the process state and possibly even at the source code (and even then it can 
> be almost impossible to be certain).  This is why we have a OOM-Killer, and 
> not a infinite-loop-killer.
> 
> Again I reiterate, if a system is properly provisioned (that is, if you have 
> put in enough RAM and possibly swap space to do what you want to use it for), 
> the only reason the OOM-killer should be invoked is due to a bug. 

Are you sure that's the only possible reason?
I mean, what if somebody keeps opening tabs on Firefox?
If malloc() returned NULL maybe Firefox could say "hey, you have too many tabs 
open, please close some to free memory".

> The non-default overcommit options still have the same issues they just 
> change how and when they happen (overcommit=never will fire sooner, 
> overcommit=always will fire later), and also can impact memory allocation 
> performance (I have numbers somewhere that I can't find right now that 
> demonstrated that overcommit=never gave more deterministic and (on average) 
> marginally better malloc() performance, and simple logic would suggest that 
> overcommit=always would make malloc() perform better too).
>> And people who see that as a nice thing but complex thing to do.
>> In this thread we've tried to explain why this heuristic (and/or OOM-killer) 
>> is/was needed and/or its history, which has been very enlightening by the 
>> way.
>>
>> From reading Documentation/cgroup-v1/memory.txt (and from a few replies here 
>> talking about cgroups), it looks like the OOM-killer is still being actively 
>> discussed, well, there's also "cgroup-v2".
>> My understanding is that cgroup's memory control will pause processes in a 
>> given cgroup until the OOM situation is solved for that cgroup, right?
>> If that is right, it means that there is indeed a way to deal with an OOM 
>> situation (stack expansion, COW failure, 'memory hog', etc.) in a better way 
>> than the OOM-killer, right?
>> In which case, do you guys know if there is a way to make the whole system 
>> behave as if it was inside a cgroup? (*)
> No, not with the process freeze behavior, because getting the group running 
> again requires input from an external part of the system, which by definition 
> doesn't exist if the group is the entire system; 

Do you mean that it pauses all processes in the cgroup?
I thought it would pause on a case-by-case basis, like first process to reach 
the limit gets paused, and so on.

Honestly I thought it would work a bit like the filesystems, where 'root' 
usually has 5% reserved, so that a process (or processes) filling the disk does 
not disrupt the system to the point of preventing 'root' from performing 
administrative actions.

That makes me think, why is disk space handled differently than memory in this 
case? I mean, why is disk space exhaustion handled differently than memory 
exhaustion?
We could imagine that both resources are required for proper system and process 
operation, so if OOM-killer is there to attempt to keep the system working at 
all costs (even if that means sacrificing processes), why isn't there an 
OOFS-killer (out-of-free-space killer)?

>and, because our GUI isn't built into the kernel, we can't pause things and 
>pop up a little dialog asking the user what to do to resolve the issue.

:-) Yeah, I was thinking that could be handled with the cgroups' notification 
system + the reserved space (like on filesystems)
Maybe I was too optimistic (naive or just plain ignorant) about this.

Best regards,

Sebastian




Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-17 Thread Sebastian Frias
Hi Michal,

On 05/17/2016 10:57 AM, Michal Hocko wrote:
> On Tue 17-05-16 10:24:20, Sebastian Frias wrote:
> [...]
>>>> Also, under what conditions would copy-on-write fail?
>>>
>>> When you have no memory or swap pages free and you touch a COW page that
>>> is currently shared. At that point there is no resource to back to the
>>> copy so something must die - either the process doing the copy or
>>> something else.
>>
>> Exactly, and why does "killing something else" makes more sense (or
>> was chosen over) "killing the process doing the copy"?
> 
> Because that "something else" is usually a memory hog and so chances are
> that the out of memory situation will get resolved. If you kill "process
> doing the copy" then you might end up just not getting any memory back
> because that might be a little forked process which doesn't own all that
> much memory on its own. That would leave you in the oom situation for a
> long time until somebody actually sitting on some memory happens to ask
> for CoW... See the difference?
> 

I see the difference, your answer seems a bit like the one from Austin, 
basically:
- killing a process is a sort of kernel protection attempting to deal 
"automatically" with some situation, like deciding what is a 'memory hog', or 
what is 'in infinite loop', "usually" in a correct way.
It seems there's people who think its better to avoid having to take such 
decisions and/or they should be decided by the user, because "usually" != 
"always".
And people who see that as a nice thing but complex thing to do.
In this thread we've tried to explain why this heuristic (and/or OOM-killer) 
is/was needed and/or its history, which has been very enlightening by the way.

>From reading Documentation/cgroup-v1/memory.txt (and from a few replies here 
>talking about cgroups), it looks like the OOM-killer is still being actively 
>discussed, well, there's also "cgroup-v2".
My understanding is that cgroup's memory control will pause processes in a 
given cgroup until the OOM situation is solved for that cgroup, right?
If that is right, it means that there is indeed a way to deal with an OOM 
situation (stack expansion, COW failure, 'memory hog', etc.) in a better way 
than the OOM-killer, right?
In which case, do you guys know if there is a way to make the whole system 
behave as if it was inside a cgroup? (*)

Best regards,

Sebastian


(*): I tried setting up a simple test but failed, so I think I need more 
reading :-)


Re: [PATCH] mm: add config option to select the initial overcommit mode

2016-05-17 Thread Sebastian Frias
Hi Alan,

On 05/13/2016 05:43 PM, One Thousand Gnomes wrote:
>> But wouldn't those affect a given process at at time?
>> Does that means that the OOM-killer is woken up to kill process X when those 
>> situations arise on process Y?
> 
> Not sure I understand the question.

I'm sorry for the "at at time" typo.
What I meant was that situations you described "Stakc expansion failure is not 
reportable. Copy on write failure is not reportable and so on.", should affect 
one process at the time, in that case:
1) either process X with the COW failure happens could die
2) either random process Y dies so that COW failure on process X can be handled.

Do you know why was 2) chosen over 1)?

> 
>> Also, under what conditions would copy-on-write fail?
> 
> When you have no memory or swap pages free and you touch a COW page that
> is currently shared. At that point there is no resource to back to the
> copy so something must die - either the process doing the copy or
> something else.

Exactly, and why does "killing something else" makes more sense (or was chosen 
over) "killing the process doing the copy"?

Best regards,

Sebastian


  1   2   >