Re: [PATCH] clk: fixed-factor: add optional dt-binding clock-flags

2016-06-23 Thread kbuild test robot
Hi,

[auto build test WARNING on robh/for-next]
[also build test WARNING on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jongsung-Kim/clk-fixed-factor-add-optional-dt-binding-clock-flags/20160624-115201
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux for-next
config: microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=microblaze 

All warnings (new ones prefixed by >>):

   drivers/clk/clk-fixed-factor.c: In function 'of_fixed_factor_clk_setup':
>> drivers/clk/clk-fixed-factor.c:170:2: warning: passing argument 3 of 'of_property_read_u32' from incompatible pointer type
     of_property_read_u32(node, "clock-flags", &flags);
     ^
   In file included from include/linux/clk-provider.h:15:0,
                    from drivers/clk/clk-fixed-factor.c:11:
   include/linux/of.h:916:19: note: expected 'u32 *' but argument is of type 'long unsigned int *'
    static inline int of_property_read_u32(const struct device_node *np,
                      ^

vim +/of_property_read_u32 +170 drivers/clk/clk-fixed-factor.c

   154      u32 div, mult;
   155  
   156      if (of_property_read_u32(node, "clock-div", &div)) {
   157          pr_err("%s Fixed factor clock <%s> must have a clock-div property\n",
   158              __func__, node->name);
   159          return;
   160      }
   161  
   162      if (of_property_read_u32(node, "clock-mult", &mult)) {
   163          pr_err("%s Fixed factor clock <%s> must have a clock-mult property\n",
   164              __func__, node->name);
   165          return;
   166      }
   167  
   168      of_property_read_string(node, "clock-output-names", &clk_name);
   169      parent_name = of_clk_get_parent_name(node, 0);
 > 170      of_property_read_u32(node, "clock-flags", &flags);
   171  
   172      clk = clk_register_fixed_factor(NULL, clk_name, parent_name, flags,
   173                      mult, div);
   174      if (!IS_ERR(clk))
   175          of_clk_add_provider(node, of_clk_src_simple_get, clk);
   176  }
   177  EXPORT_SYMBOL_GPL(of_fixed_factor_clk_setup);
   178  CLK_OF_DECLARE(fixed_factor_clk, "fixed-factor-clock",
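
A minimal sketch of one common way to avoid this class of warning, assuming
the flags variable has to stay 'unsigned long' for the registration call:
read the property into a u32 temporary and widen it afterwards.
read_u32_property() below is a hypothetical stand-in for
of_property_read_u32(), not the kernel function, and the value it stores is
arbitrary.

```c
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical stand-in for of_property_read_u32(): it writes exactly
 * 32 bits through the pointer and returns 0 on success.
 */
static int read_u32_property(const char *name, u32 *out)
{
	(void)name;
	*out = 0x20;	/* arbitrary example value */
	return 0;
}

/*
 * Read into a correctly typed temporary, then widen. Passing an
 * 'unsigned long *' directly would hand the callee a pointer to an
 * object of a different type, which is what the compiler warns about.
 */
static unsigned long read_clock_flags(void)
{
	unsigned long flags = 0;	/* default when the property is absent */
	u32 val;

	if (!read_u32_property("clock-flags", &val))
		flags = val;

	return flags;
}
```

With the temporary in place, the callee only ever writes through a 'u32 *',
so the result does not depend on the width of 'unsigned long' on the target.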

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v5 2/2] [media] atmel-isc: DT binding for Image Sensor Controller driver

2016-06-23 Thread Wu, Songjun

Hi Rob,

Thank you for your comments.

On 6/20/2016 21:25, Rob Herring wrote:

On Fri, Jun 17, 2016 at 04:57:14PM +0800, Songjun Wu wrote:

DT binding documentation for ISC driver.

Signed-off-by: Songjun Wu 
---

Changes in v5:
- Add clock names.

Changes in v4:
- Remove the isc clock nodes.

Changes in v3:
- Remove the 'atmel,sensor-preferred'.
- Modify the isc clock node according to the Rob's remarks.

Changes in v2:
- Remove the unit address of the endpoint.
- Add the unit address to the clock node.
- Avoid using underscores in node names.
- Drop the "0x" in the unit address of the i2c node.
- Modify the description of 'atmel,sensor-preferred'.
- Add the description for the ISC internal clock.

 .../devicetree/bindings/media/atmel-isc.txt| 64 ++
 1 file changed, 64 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/atmel-isc.txt

diff --git a/Documentation/devicetree/bindings/media/atmel-isc.txt 
b/Documentation/devicetree/bindings/media/atmel-isc.txt
new file mode 100644
index 000..9558a77
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/atmel-isc.txt
@@ -0,0 +1,64 @@
+Atmel Image Sensor Controller (ISC)
+--
+
+Required properties for ISC:
+- compatible
+   Must be "atmel,sama5d2-isc".
+- reg
+   Physical base address and length of the registers set for the device.
+- interrupts
+   Should contain IRQ line for the ISC.
+- clocks
+   List of clock specifiers, corresponding to entries in
+   the clock-names property;
+   Please refer to clock-bindings.txt.
+- clock-names
+   Required elements: "hclock".


What about the 2 other clocks in the example?


The other clocks are optional, not required.
Do you have any suggestions?


+- #clock-cells
+   Should be 0.
+- clock-output-names
+   Should contain the name of the clock driving the sensor master clock.


State what the name is.


"isc-mck" will be added.


+- pinctrl-names, pinctrl-0
+   Please refer to pinctrl-bindings.txt.
+
+ISC supports a single port node with parallel bus. It should contain one
+'port' child node with child 'endpoint' node. Please refer to the bindings
+defined in Documentation/devicetree/bindings/media/video-interfaces.txt.
+
+Example:
+isc: isc@f0008000 {
+   compatible = "atmel,sama5d2-isc";
+   reg = <0xf0008000 0x4000>;
+   interrupts = <46 IRQ_TYPE_LEVEL_HIGH 5>;
+   clocks = <_clk>, <>, <_gclk>;
+   clock-names = "hclock", "iscck", "gck";
+   #clock-cells = <0>;
+   clock-output-names = "isc-mck";
+   pinctrl-names = "default";
+   pinctrl-0 = <_isc_base _isc_data_8bit _isc_data_9_10 
_isc_data_11_12>;
+
+   port {
+   isc_0: endpoint {
+   remote-endpoint = <_0>;
+   hsync-active = <1>;
+   vsync-active = <0>;
+   pclk-sample = <1>;
+   };
+   };
+};
+
+i2c1: i2c@fc028000 {
+   ov7740: camera@21 {
+   compatible = "ovti,ov7740";


Indentation is still wrong here...


Sorry, my mistake.
It will be fixed.


+   reg = <0x21>;
+   clocks = <>;
+   clock-names = "xvclk";
+   assigned-clocks = <>;
+   assigned-clock-rates = <2400>;
+
+   port {
+   ov7740_0: endpoint {
+   remote-endpoint = <_0>;
+   };
+   };
+};
--
2.7.4



Re: [PATCH v3] Doc/memory-barriers: Add Korean translation

2016-06-23 Thread SeongJae Park
Hello, Byungchul,


I guess the review is still ongoing and may need a few more days.  Could you
let me know your estimated time for the review, if you don't mind?


Thanks,
SeongJae Park

On Fri, Jun 17, 2016 at 3:24 PM, Minchan Kim  wrote:
> On Wed, Jun 15, 2016 at 03:47:34PM +0900, SeongJae Park wrote:
>> 2016-06-09 5:45 GMT+09:00 SeongJae Park :
>> > 2016-06-09 2:24 GMT+09:00 Paul E. McKenney :
>> >> On Wed, Jun 08, 2016 at 05:58:41PM +0900, SeongJae Park wrote:
>> >>> This commit adds Korean version of memory-barriers.txt document.  The
>> >>> header is refered to HOWTO Korean version.
>> >>>
>> >>> The translator, SeongJae Park, is interested in parallel programming and
>> >>> translating[1] a book[2] about the topic.
>> >>>
>> >>> The translation has started from Feb, 2016 and using a github public
>> >>> repository[3] to maintain the work.  The work is following[4] updates to
>> >>> the original document as well.
>> >>>
>> >>> Because the translator has knowledge about the topic and already
>> >>> following up the upstream changes, one would sure that this translation
>> >>> will keep reasonable quality and freshness.
>> >>>
>> >>> [1] 
>> >>> https://git.kernel.org/cgit/linux/kernel/git/paulmck/perfbook.git/commit/FAQ.txt?id=edbfcdee0460
>> >>> [2] 
>> >>> https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
>> >>> [3] https://github.com/sjp38/linux.doc_trans_membarrier
>> >>> [4] 
>> >>> https://github.com/sjp38/linux.doc_trans_membarrier/commit/06bd12d390f164fd253b861fc3aa8006d6d19ed9
>> >>>
>> >>> Signed-off-by: SeongJae Park 
>> >>> Acked-by: David Howells 
>> >>> Signed-off-by: Parl E. McKenney 
>> >>> Acked-by: Minchan Kim 
>> >>
>> >> I cannot judge this, so I must defer to Minchan.  That said, the diffstat
>> >> since your version of April 18 is as follows:
>> >>
>> >>  memory-barriers.txt |  303 
>> >> ++--
>> >>  1 file changed, 175 insertions(+), 128 deletions(-)
>> >>
>> >> I cannot tell which mainline version these patches correspond to, but the
>> >> English version has this diffstat since v4.5:
>> >>
>> >>  memory-barriers.txt |  293 
>> >> +---
>> >>  1 file changed, 234 insertions(+), 59 deletions(-)
>> >>
>> >> So your level of change thus far seems plausible.
>> >
>> > I agree that Minchan deserves to judge that since he is the Korean 
>> > translations
>> > maintainer.
>>
>> Minchan, may I ask your opinion about this patch?
>
> Sorry for late response.
>
> I think it's really worth. That's why I support it. However, until now,
> I didn't review it in detail. Sorry about that and stuck with urgent
> works so I asked colleague byungchul who scheduler guy, he has an
> interest so He will review this patch soon.
>
> Hope it helps you.
>
> Thanks.
>


Re: linux-next: manual merge of the audit tree with the security tree

2016-06-23 Thread Heiko Carstens
On Thu, Jun 23, 2016 at 12:14:11PM -0400, Paul Moore wrote:
> On Thu, Jun 23, 2016 at 2:01 AM, Heiko Carstens
>  wrote:
> > On Thu, Jun 23, 2016 at 02:18:14PM +1000, Stephen Rothwell wrote:
> >> Hi Paul,
> >>
> >> Today's linux-next merge of the audit tree got a conflict in:
> >>
> >>   arch/s390/kernel/ptrace.c
> >>
> >> between commit:
> >>
> >>   0208b9445bc0 ("s390/ptrace: run seccomp after ptrace")
> >>
> >> from the security tree and commit:
> >>
> >>   bba696c2c083 ("s390: ensure that syscall arguments are properly masked 
> >> on s390")
> >>
> >> from the audit tree.
> >
> > Hmm, I haven't seen that commit, therefore I'm just commenting on the
> > result ;)
> 
> It was sent to the linux-audit and linux-s390 mailing lists yesterday
> with a follow up comment that I was going to add it to the audit#next
> branch and if anyone had any objections to let me know.
> 
> * https://www.redhat.com/archives/linux-audit/2016-June/msg00051.html

Yes, I missed that, sorry!

> >> + audit_syscall_entry(regs->gprs[2], regs->orig_gpr2 & mask,
> >> + regs->gprs[3] & mask, regs->gprs[4] & mask,
> >> + regs->gprs[5] & mask);
> >
> > With these masks it is more correct, however these are still not the values
> > used by the system call itself. This would be still incorrect for
> > e.g. compat pointers (31 bit on s390).
> >
> > So it seems like audit_syscall_entry should be called after all sign, zero
> > and masking has been done?
> 
> For someone not familiar with s390, compat or not, where would you
> suggest we place the audit_syscall_entry() call?

I was thinking of a more generic solution for all architectures: for
example, setting a new TIF flag within do_syscall_trace_enter which
indicates that audit_syscall_entry needs to be called, and then adding a
conditional call to the SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE macros.

That way audit_syscall_entry would always receive system call parameters
that are already properly sign and zero extended. On the downside, this
would increase the kernel text size by probably ~370 conditional branches
and add two more instructions on the system call hot path.

But that's something that could be done independently from your patch,
which already improves the current situation.
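
The deferred-audit idea above can be sketched in plain C as follows. This is
only an illustration under stated assumptions: struct task_state,
trace_enter() and compat_wrapper() are hypothetical names that model a TIF
flag, the syscall entry hook and a generated compat wrapper; none of them is
kernel code. The 31-bit mask reflects the s390 compat pointer width mentioned
earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state; the real kernel would use a TIF flag. */
struct task_state {
	bool audit_pending;	/* set at syscall entry when auditing is on */
	uint64_t audited_arg;	/* value that ended up in the audit record */
};

/* Mask a raw 64-bit register down to a 32-bit compat integer. */
static uint64_t compat_int_arg(uint64_t reg)
{
	return reg & 0xffffffffULL;
}

/* Mask a raw register down to a 31-bit compat pointer. */
static uint64_t compat_ptr_arg(uint64_t reg)
{
	return reg & 0x7fffffffULL;
}

/* Stand-in for the entry hook: only note that auditing is wanted. */
static void trace_enter(struct task_state *t, bool audit_enabled)
{
	t->audit_pending = audit_enabled;
}

/* Stand-in for a generated compat wrapper that takes one pointer. */
static uint64_t compat_wrapper(struct task_state *t, uint64_t raw_reg)
{
	uint64_t ptr = compat_ptr_arg(raw_reg);	/* conversion comes first */

	if (t->audit_pending) {			/* the one extra branch */
		t->audited_arg = ptr;		/* audit sees the converted value */
		t->audit_pending = false;
	}

	return ptr;	/* the value the syscall body would operate on */
}
```

Because the audit record is filled in after the conversion, it holds the same
value the system call body sees, whereas masking with a fixed 32-bit mask at
entry (compat_int_arg() here) would keep bit 31 of a compat pointer.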



Re: [PATCH 2/6] virtio-balloon: speed up inflate/deflate process

2016-06-23 Thread Michael S. Tsirkin
On Mon, Jun 13, 2016 at 05:47:09PM +0800, Liang Li wrote:
> The implementation of the current virtio-balloon is not very efficient,
> Bellow is test result of time spends on inflating the balloon to 3GB of
> a 4GB idle guest:
> 
> a. allocating pages (6.5%, 103ms)
> b. sending PFNs to host (68.3%, 787ms)
> c. address translation (6.1%, 96ms)
> d. madvise (19%, 300ms)
> 
> It takes about 1577ms for the whole inflating process to complete. The
> test shows that the bottle neck is the stage b and stage d.
> 
> If using a bitmap to send the page info instead of the PFNs, we can
> reduce the overhead in stage b quite a lot. Furthermore, it's possible
> to do the address translation and the madvise with a bulk of pages,
> instead of the current page per page way, so the overhead of stage c
> and stage d can also be reduced a lot.
> 
> This patch is the kernel side implementation which is intended to speed
> up the inflating & deflating process by adding a new feature to the
> virtio-balloon device. And now, inflating the balloon to 3GB of a 4GB
> idle guest only takes 200ms, it's about 8 times as fast as before.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages instead
> of a single page at a time.
> 
> Signed-off-by: Liang Li 
> Suggested-by: Michael S. Tsirkin 
> Cc: Michael S. Tsirkin 
> Cc: Paolo Bonzini 
> Cc: Cornelia Huck 
> Cc: Amit Shah 

Causes kbuild warnings

> ---
>  drivers/virtio/virtio_balloon.c | 164 
> +++-
>  include/uapi/linux/virtio_balloon.h |   1 +
>  2 files changed, 144 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8d649a2..1fa601b 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -40,11 +40,19 @@
>  #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
> +#define VIRTIO_BALLOON_PFNS_LIMIT ((2 * (1ULL << 30)) >> PAGE_SHIFT) /* 2GB */

2 << 30 is 2G, but that is not a useful comment.
Please explain the reason for this selection.

>  
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
> +struct balloon_bmap_hdr {
> + __virtio32 id;
> + __virtio32 page_shift;
> + __virtio64 start_pfn;
> + __virtio64 bmap_len;
> +};
> +

Put this in an uapi header please.

>  struct virtio_balloon {
>   struct virtio_device *vdev;
>   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -62,6 +70,11 @@ struct virtio_balloon {
>  
>   /* Number of balloon pages we've told the Host we're not using. */
>   unsigned int num_pages;
> + /* Bitmap and length used to tell the host the pages */
> + unsigned long *page_bitmap;
> + unsigned long bmap_len;
> + /* Used to record the processed pfn range */
> + unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>   /*
>* The pages we've told the Host we're not using are enqueued
>* at vb_dev_info->pages list.
> @@ -105,15 +118,51 @@ static void balloon_ack(struct virtqueue *vq)
>   wake_up(&vb->acked);
>  }
>  
> +static inline void init_pfn_range(struct virtio_balloon *vb)
> +{
> + vb->min_pfn = (1UL << 48);

Where does this value come from? Do you want ULONG_MAX?
This does not fit in long on 32 bit systems.
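
A minimal user-space sketch of the ULONG_MAX variant suggested above. struct
pfn_range is a hypothetical stand-in for the min_pfn/max_pfn fields of struct
virtio_balloon; starting the minimum at ULONG_MAX works for any width of
'unsigned long', while shifting 1UL by 48 is not valid when long is 32 bits.

```c
#include <limits.h>

/* Hypothetical stand-in for the range fields in struct virtio_balloon. */
struct pfn_range {
	unsigned long min_pfn;
	unsigned long max_pfn;
};

/* Start with an empty range: any real pfn will lower min and raise max. */
static void init_pfn_range(struct pfn_range *r)
{
	r->min_pfn = ULONG_MAX;
	r->max_pfn = 0;
}

/* Widen the range so that it covers pfn. */
static void update_pfn_range(struct pfn_range *r, unsigned long pfn)
{
	if (pfn < r->min_pfn)
		r->min_pfn = pfn;
	if (pfn > r->max_pfn)
		r->max_pfn = pfn;
}
```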


> + vb->max_pfn = 0;
> +}
> +
> +static inline void update_pfn_range(struct virtio_balloon *vb,
> +  struct page *page)
> +{
> + unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> + if (balloon_pfn < vb->min_pfn)
> + vb->min_pfn = balloon_pfn;
> + if (balloon_pfn > vb->max_pfn)
> + vb->max_pfn = balloon_pfn;
> +}
> +
>  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  {
> - struct scatterlist sg;
>   unsigned int len;
>  
> - sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> + struct balloon_bmap_hdr hdr;

why not init fields here?

> + unsigned long bmap_len;

and here

> + struct scatterlist sg[2];
> +
> + hdr.id = cpu_to_virtio32(vb->vdev, 0);
> + hdr.page_shift = cpu_to_virtio32(vb->vdev, PAGE_SHIFT);
> + hdr.start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
> + bmap_len = min(vb->bmap_len,
> + (vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
> + hdr.bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> + sg_init_table(sg, 2);
> + sg_set_buf(&sg[0], &hdr, sizeof(hdr));
> + sg_set_buf(&sg[1], vb->page_bitmap, bmap_len);
> + virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL);

This might fail if the queue size is < 2. Validate the queue size and clear
VIRTIO_BALLOON_F_PAGE_BITMAP?
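
For context, the bitmap scheme described in the commit message can be
sketched in plain C as below. This is an illustration, not the patch's code:
the helper names are hypothetical, the bitmap is indexed relative to a
start_pfn as in the quoted header, and bitmap_bytes() rounds up, whereas the
quoted patch divides by BITS_PER_BYTE without rounding.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Mark pfn in a bitmap whose bit 0 corresponds to start_pfn.
 * Returns 0 on success, -1 if pfn lies outside the bitmap.
 */
static int bitmap_mark_pfn(unsigned long *bitmap, size_t nbits,
			   unsigned long start_pfn, unsigned long pfn)
{
	unsigned long bit;

	if (pfn < start_pfn)
		return -1;
	bit = pfn - start_pfn;
	if (bit >= nbits)
		return -1;

	bitmap[bit / WORD_BITS] |= 1UL << (bit % WORD_BITS);
	return 0;
}

/* Return 1 if pfn is marked, 0 if it is not or lies outside the bitmap. */
static int bitmap_test_pfn(const unsigned long *bitmap, size_t nbits,
			   unsigned long start_pfn, unsigned long pfn)
{
	unsigned long bit;

	if (pfn < start_pfn || pfn - start_pfn >= nbits)
		return 0;
	bit = pfn - start_pfn;

	return (bitmap[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1UL;
}

/* Bytes needed to describe the pfn range [start_pfn, end_pfn). */
static size_t bitmap_bytes(unsigned long start_pfn, unsigned long end_pfn)
{
	return (end_pfn - start_pfn + CHAR_BIT - 1) / CHAR_BIT;
}
```

A range of 1024 pages needs 128 bytes in this form. Assuming 32-bit PFN
entries, the same pages sent as an array would take 4096 bytes, which is
where the reduction in stage b would come from.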


RE: [PATCH] Maxim/driver: Add driver for maxim ds26522

2016-06-23 Thread Qiang Zhao
On Thu, 2016-06-23 at 10:59PM, David Miller wrote:
> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, June 23, 2016 10:59 PM
> To: Qiang Zhao 
> Cc: o...@buserror.net; linux-kernel@vger.kernel.org; net...@vger.kernel.org;
> Xiaobo Xie 
> Subject: Re: [PATCH] Maxim/driver: Add driver for maxim ds26522
> 
> From: Zhao Qiang 
> Date: Thu, 23 Jun 2016 09:09:45 +0800
> 
> > +MODULE_DESCRIPTION(DRV_DESC);
> 
> There is no definition of DRV_DESC, so this makes it look like you didn't even
> compile this driver.

I really, really compiled this driver.
Thank you for your review and comments. I will fix it in the next version.

[zhaoqiang@titan:~/upstream/linux]$ll drivers/net/wan/slic_ds26522.o
-rw-r--r-- 1 zhaoqiang klocwork 153288 Jun 22 15:48 drivers/net/wan/slic_ds26522.o
[zhaoqiang@titan:~/upstream/linux]$date
Fri Jun 24 09:42:16 CST 2016

-Zhao Qiang
BR


[PATCH] libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment

2016-06-23 Thread Dan Williams
The updated ndctl unit tests discovered that if a pfn configuration with
a 4K alignment is read from the namespace, that alignment will be
ignored in favor of the default 2M alignment.  The result is that the
configuration will fail initialization with a message like:

dax6.1: bad offset: 0x22000 dax disabled align: 0x20

Fix this by allowing the alignment read from the info block to override
the default (which is 2M, not 0) in the autodetect path.  This also fixes
a similar problem where the mode and alignment settings were silently
overwritten by the kernel after userspace had changed them.  We now
either overwrite the info block if userspace changes the uuid, or fail
and warn if a live setting disagrees with the info block.

Cc: 
Cc: Micah Parrish 
Cc: Toshi Kani 
Signed-off-by: Dan Williams 
---

There was a similar, but incomplete, patch like this for the BTT back in
December of last year: "BTT: Change nd_btt_arena_is_valid() to verify
UUID".  I did not apply it due to the fact that it didn't address
setting changes and I did not fully understand the scope of the problem.

Now with the realization that the kernel silently overriding settings is
problematic, we should consider taking this deterministic behavior over
to the btt.  However, it's not a bug there like it is in the pfn case
because the settings default to an invalid uninitialized value, whereas
pfn devices have a default valid alignment.

 drivers/nvdimm/pfn_devs.c |   51 +++--
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index f7718ec685fa..cea8350fbc7e 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -344,6 +344,8 @@ struct device *nd_pfn_create(struct nd_region *nd_region)
 int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 {
u64 checksum, offset;
+   unsigned long align;
+   enum nd_pfn_mode mode;
struct nd_namespace_io *nsio;
struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
struct nd_namespace_common *ndns = nd_pfn->ndns;
@@ -386,22 +388,50 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
return -ENXIO;
}
 
+   align = le32_to_cpu(pfn_sb->align);
+   offset = le64_to_cpu(pfn_sb->dataoff);
+   if (align == 0)
+   align = 1UL << ilog2(offset);
+   mode = le32_to_cpu(pfn_sb->mode);
+
if (!nd_pfn->uuid) {
-   /* from probe we allocate */
+   /*
+* When probing a namepace via nd_pfn_probe() the uuid
+* is NULL (see: nd_pfn_devinit()) we init settings from
+* pfn_sb
+*/
nd_pfn->uuid = kmemdup(pfn_sb->uuid, 16, GFP_KERNEL);
if (!nd_pfn->uuid)
return -ENOMEM;
+   nd_pfn->align = align;
+   nd_pfn->mode = mode;
} else {
-   /* from init we validate */
+   /*
+* When probing a pfn / dax instance we validate the
+* live settings against the pfn_sb
+*/
if (memcmp(nd_pfn->uuid, pfn_sb->uuid, 16) != 0)
return -ENODEV;
+
+   /*
+* If the uuid validates, but other settings mismatch
+* return EINVAL because userspace has managed to change
+* the configuration without specifying new
+* identification.
+*/
+   if (nd_pfn->align != align || nd_pfn->mode != mode) {
+   dev_err(&nd_pfn->dev,
+   "init failed, settings mismatch\n");
+   dev_dbg(&nd_pfn->dev, "align: %lx:%lx mode: %d:%d\n",
+   nd_pfn->align, align, nd_pfn->mode,
+   mode);
+   return -EINVAL;
+   }
}
 
-   if (nd_pfn->align == 0)
-   nd_pfn->align = le32_to_cpu(pfn_sb->align);
-   if (nd_pfn->align > nvdimm_namespace_capacity(ndns)) {
+   if (align > nvdimm_namespace_capacity(ndns)) {
dev_err(&nd_pfn->dev, "alignment: %lx exceeds capacity %llx\n",
-   nd_pfn->align, nvdimm_namespace_capacity(ndns));
+   align, nvdimm_namespace_capacity(ndns));
return -EINVAL;
}
 
@@ -411,7 +441,6 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
 * namespace has changed since the pfn superblock was
 * established.
 */
-   offset = le64_to_cpu(pfn_sb->dataoff);
nsio = to_nd_namespace_io(&ndns->dev);
if (offset >= resource_size(&nsio->res)) {
dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n",
@@ -419,10 +448,11 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
*sig)
return 
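
The probe-versus-validate decision described in the changelog can be illustrated with a small userspace model. This is a sketch under stated assumptions rather than the kernel implementation: struct pfn_sb_model, struct pfn_dev_model, pow2_floor() and pfn_validate_model() are hypothetical names, the info block fields are assumed to be already converted to CPU byte order, and the capacity checks that follow in the real function are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the on-media info block. */
struct pfn_sb_model {
	uint8_t uuid[16];
	uint32_t align;    /* 0 means "not recorded" in this model */
	uint32_t mode;
	uint64_t dataoff;
};

/* Hypothetical stand-in for the live device settings. */
struct pfn_dev_model {
	uint8_t *uuid;     /* NULL while autodetecting (probe path) */
	unsigned long align;
	uint32_t mode;
};

/* Largest power of two <= v, mirroring 1UL << ilog2(v); v must be non-zero. */
static unsigned long pow2_floor(uint64_t v)
{
	unsigned long p = 1;

	while (v >>= 1)
		p <<= 1;
	return p;
}

static int pfn_validate_model(struct pfn_dev_model *dev,
			      const struct pfn_sb_model *sb)
{
	unsigned long align = sb->align;
	uint32_t mode = sb->mode;

	/* No recorded alignment: derive one from the data offset. */
	if (align == 0)
		align = pow2_floor(sb->dataoff);

	if (!dev->uuid) {
		/* Probe path: adopt uuid, alignment and mode from the info block. */
		dev->uuid = malloc(16);
		if (!dev->uuid)
			return -ENOMEM;
		memcpy(dev->uuid, sb->uuid, 16);
		dev->align = align;
		dev->mode = mode;
		return 0;
	}

	/* Init path: a different uuid means this info block is not ours. */
	if (memcmp(dev->uuid, sb->uuid, 16) != 0)
		return -ENODEV;

	/* Same uuid but different settings: refuse instead of overriding. */
	if (dev->align != align || dev->mode != mode)
		return -EINVAL;

	return 0;
}
```

In this model a device probed with no uuid simply inherits the 4K alignment from the info block, while a device that already carries the same uuid but a 2M alignment gets -EINVAL, which matches the two behaviours the changelog describes.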

Re: [v2,1/2] refactor code parsing size based on memory range

2016-06-23 Thread Michael Ellerman
On Wed, 2016-22-06 at 19:25:26 UTC, Hari Bathini wrote:
> Currently, crashkernel parameter supports the below syntax to parse size
> based on memory range:
> 
>   crashkernel=<range1>:<size1>[,<range2>:<size2>,...]
> 
> While such parsing is implemented for crashkernel parameter, it applies to
> other parameters with similar syntax. So, move this code to a more generic
> place for code reuse.
> 
> Cc: Eric Biederman 
> Cc: Vivek Goyal 
> Cc: Rusty Russell 
> Cc: ke...@lists.infradead.org
> Signed-off-by: Hari Bathini 

Hari, it's not immediately clear that this makes no change to the logic in the
kexec code. Can you reply with a longer change log explaining why the old and
new logic are the same for kexec?

cheers


> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 94aa10f..72f55e5 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -436,6 +436,11 @@ extern char *get_options(const char *str, int nints, int 
> *ints);
>  extern unsigned long long memparse(const char *ptr, char **retptr);
>  extern bool parse_option_str(const char *str, const char *option);
>  
> +extern bool __init is_param_range_based(const char *cmdline);
> +extern unsigned long long __init parse_mem_range_size(const char *param,
> +   char **str,
> +   unsigned long long 
> system_ram);
> +
>  extern int core_kernel_text(unsigned long addr);
>  extern int core_kernel_data(unsigned long addr);
>  extern int __kernel_text_address(unsigned long addr);
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 56b3ed0..d43f5cc 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1083,59 +1083,9 @@ static int __init parse_crashkernel_mem(char *cmdline,
>   char *cur = cmdline, *tmp;
>  
>   /* for each entry of the comma-separated list */
> - do {
> - unsigned long long start, end = ULLONG_MAX, size;
> -
> - /* get the start of the range */
> - start = memparse(cur, &tmp);
> - if (cur == tmp) {
> - pr_warn("crashkernel: Memory value expected\n");
> - return -EINVAL;
> - }
> - cur = tmp;
> - if (*cur != '-') {
> - pr_warn("crashkernel: '-' expected\n");
> - return -EINVAL;
> - }
> - cur++;
> -
> - /* if no ':' is here, than we read the end */
> - if (*cur != ':') {
> - end = memparse(cur, &tmp);
> - if (cur == tmp) {
> - pr_warn("crashkernel: Memory value expected\n");
> - return -EINVAL;
> - }
> - cur = tmp;
> - if (end <= start) {
> - pr_warn("crashkernel: end <= start\n");
> - return -EINVAL;
> - }
> - }
> -
> - if (*cur != ':') {
> - pr_warn("crashkernel: ':' expected\n");
> - return -EINVAL;
> - }
> - cur++;
> -
> - size = memparse(cur, &tmp);
> - if (cur == tmp) {
> - pr_warn("Memory value expected\n");
> - return -EINVAL;
> - }
> - cur = tmp;
> - if (size >= system_ram) {
> - pr_warn("crashkernel: invalid size\n");
> - return -EINVAL;
> - }
> -
> - /* match ? */
> - if (system_ram >= start && system_ram < end) {
> - *crash_size = size;
> - break;
> - }
> - } while (*cur++ == ',');
> + *crash_size = parse_mem_range_size("crashkernel", &cur, system_ram);
> + if (cur == cmdline)
> + return -EINVAL;
>  
>   if (*crash_size > 0) {
>   while (*cur && *cur != ' ' && *cur != '@')
> @@ -1272,7 +1222,6 @@ static int __init __parse_crashkernel(char *cmdline,
>const char *name,
>const char *suffix)
>  {
> - char*first_colon, *first_space;
>   char*ck_cmdline;
>  
>   BUG_ON(!crash_size || !crash_base);
> @@ -1290,12 +1239,10 @@ static int __init __parse_crashkernel(char *cmdline,
>   return parse_crashkernel_suffix(ck_cmdline, crash_size,
>   suffix);
>   /*
> -  * if the commandline contains a ':', then that's the extended
> +  * if the parameter is range based, then that's the extended
>* syntax -- if not, it must be the classic syntax
>*/
> - first_colon = strchr(ck_cmdline, ':');
> - first_space = strchr(ck_cmdline, ' ');
> - if (first_colon && (!first_space || first_colon < first_space))
> + if (is_param_range_based(ck_cmdline))
>   return 
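
To make the comparison requested above easier to reason about, the removed loop can be restated as a standalone userspace model. This is a sketch, not the helper from the patch: memparse_model(), parse_range_size_model() and is_range_based_model() are hypothetical names, memparse_model() handles only K/M/G suffixes, and leaving *str untouched when nothing matches is an assumption of this model, since the body of the new parse_mem_range_size() is not shown in the quoted diff.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for memparse(): a number with an optional K/M/G suffix. */
static unsigned long long memparse_model(const char *ptr, char **retptr)
{
	unsigned long long v = strtoull(ptr, retptr, 0);

	switch (**retptr) {
	case 'G':
	case 'g':
		v <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		v <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		v <<= 10;
		(*retptr)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Walk "start-[end]:size[,start-[end]:size,...]" and return the size of the
 * first range that contains system_ram.  Returns 0 on a syntax error or when
 * no range matches; *str is advanced past the matching size only on a match.
 */
static unsigned long long parse_range_size_model(char **str,
						 unsigned long long system_ram)
{
	char *cur = *str, *tmp;

	do {
		unsigned long long start, end = ULLONG_MAX, size;

		/* start of the range, followed by '-' */
		start = memparse_model(cur, &tmp);
		if (cur == tmp || *tmp != '-')
			return 0;
		cur = tmp + 1;

		/* optional end of the range; an open range runs to ULLONG_MAX */
		if (*cur != ':') {
			end = memparse_model(cur, &tmp);
			if (cur == tmp || end <= start)
				return 0;
			cur = tmp;
		}
		if (*cur != ':')
			return 0;
		cur++;

		/* size reserved for this range; must be below system_ram */
		size = memparse_model(cur, &tmp);
		if (cur == tmp || size >= system_ram)
			return 0;
		cur = tmp;

		if (system_ram >= start && system_ram < end) {
			*str = cur;
			return size;
		}
	} while (*cur++ == ',');

	return 0;
}

/* Extended syntax is assumed when a ':' appears before the first space. */
static bool is_range_based_model(const char *cmdline)
{
	const char *colon = strchr(cmdline, ':');
	const char *space = strchr(cmdline, ' ');

	return colon && (!space || colon < space);
}
```

One difference worth spelling out in a longer changelog: the old loop fell through to the "zero bytes" case when no range matched, whereas a caller that tests `cur == cmdline` returns -EINVAL if the helper leaves the cursor untouched in that situation.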

[PATCH v2 5/5] dmaengine: dma: Use different channel names for each dma

2016-06-23 Thread Kedareswara rao Appana
The current driver assumes that the child node channel name is either
"xlnx,axi-vdma-mm2s-channel" or "xlnx,axi-vdma-s2mm-channel",
which is confusing for users of AXI DMA and CDMA.
This patch fixes the issue by using different channel
names for the AXI DMA and AXI CDMA child nodes.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> New patch.

 .../devicetree/bindings/dma/xilinx/xilinx_dma.txt  |6 +-
 drivers/dma/xilinx/xilinx_dma.c|8 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt 
b/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
index 0faa189..a2b8bfa 100644
--- a/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
+++ b/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
@@ -50,8 +50,12 @@ Optional properties for VDMA:
{3}, flush s2mm channel
 
 Required child node properties:
-- compatible: It should be either "xlnx,axi-vdma-mm2s-channel" or
+- compatible:
+   For VDMA: It should be either "xlnx,axi-vdma-mm2s-channel" or
"xlnx,axi-vdma-s2mm-channel".
+   For CDMA: It should be "xlnx,axi-cdma-channel".
+   For AXIDMA: It should be either "xlnx,axi-dma-mm2s-channel" or
+   "xlnx,axi-dma-s2mm-channel".
 - interrupts: Should contain per channel VDMA interrupts.
 - xlnx,datawidth: Should contain the stream data width, take values
{32,64...1024}.
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index 0768d9f..cf47347 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -2353,7 +2353,9 @@ static int xilinx_dma_chan_probe(struct xilinx_dma_device 
*xdev,
if (!has_dre)
xdev->common.copy_align = fls(width - 1);
 
-   if (of_device_is_compatible(node, "xlnx,axi-vdma-mm2s-channel")) {
+   if (of_device_is_compatible(node, "xlnx,axi-vdma-mm2s-channel") ||
+   of_device_is_compatible(node, "xlnx,axi-dma-mm2s-channel") ||
+   of_device_is_compatible(node, "xlnx,axi-cdma-channel")) {
chan->direction = DMA_MEM_TO_DEV;
chan->id = chan_id;
chan->tdest = chan_id;
@@ -2367,7 +2369,9 @@ static int xilinx_dma_chan_probe(struct xilinx_dma_device 
*xdev,
chan->flush_on_fsync = true;
}
} else if (of_device_is_compatible(node,
-   "xlnx,axi-vdma-s2mm-channel")) {
+  "xlnx,axi-vdma-s2mm-channel") ||
+  of_device_is_compatible(node,
+  "xlnx,axi-dma-s2mm-channel")) {
chan->direction = DMA_DEV_TO_MEM;
chan->id = chan_id;
chan->tdest = chan_id - xdev->nr_channels;
-- 
1.7.1
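
The compatible-string dispatch added by this patch can be summarized as a lookup table. The following is a userspace sketch only: enum chan_dir and chan_dir_for_compatible() are hypothetical names, and a plain strcmp() on a single string stands in for of_device_is_compatible(), which checks the node's whole compatible list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for DMA_MEM_TO_DEV / DMA_DEV_TO_MEM. */
enum chan_dir { DIR_UNKNOWN, DIR_MEM_TO_DEV, DIR_DEV_TO_MEM };

/* Map a child node compatible string to the channel direction. */
static enum chan_dir chan_dir_for_compatible(const char *compat)
{
	static const struct {
		const char *compat;
		enum chan_dir dir;
	} map[] = {
		{ "xlnx,axi-vdma-mm2s-channel", DIR_MEM_TO_DEV },
		{ "xlnx,axi-dma-mm2s-channel",  DIR_MEM_TO_DEV },
		{ "xlnx,axi-cdma-channel",      DIR_MEM_TO_DEV },
		{ "xlnx,axi-vdma-s2mm-channel", DIR_DEV_TO_MEM },
		{ "xlnx,axi-dma-s2mm-channel",  DIR_DEV_TO_MEM },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(compat, map[i].compat) == 0)
			return map[i].dir;

	/* Anything else is treated as an invalid channel in this model. */
	return DIR_UNKNOWN;
}
```

As in the patch, the CDMA channel shares the memory-to-device branch with the MM2S channels; a table keeps the list of accepted names in one place if more IP variants are added later.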



[PATCH v2 0/5] dmaengine: vdma: AXI DMAS Enhancments

2016-06-23 Thread Kedareswara rao Appana
This patch series does the following:
---> Add support for AXI DMA Multi-channel DMA mode.
---> Delete AXI DMA binding doc.
---> Rename the driver and update config options.

Kedareswara rao Appana (5):
  Documentation: DT: vdma: Update binding doc for multi-channel dma
mode
  dmaengine: vdma: Add support for mulit-channel dma mode
  Documentation: DT: dma: Delete binding doc for AXI DMA
  dmaengine: dma: Rename driver and config
  dmaengine: dma: Use different channel names for each dma

 .../devicetree/bindings/dma/xilinx/xilinx_dma.txt  |   94 +++--
 .../devicetree/bindings/dma/xilinx/xilinx_vdma.txt |  107 --
 drivers/dma/Kconfig|   11 +-
 drivers/dma/xilinx/Makefile|2 +-
 drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} |  221 +---
 5 files changed, 277 insertions(+), 158 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
 rename drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} (92%)



[PATCH v2 1/5] Documentation: DT: vdma: Update binding doc for multi-channel dma mode

2016-06-23 Thread Kedareswara rao Appana
This patch updates the device-tree binding doc for
AXI DMA multi-channel DMA mode.

Acked-by: Rob Herring 
Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> Added Rob Acked-by.

 .../devicetree/bindings/dma/xilinx/xilinx_vdma.txt |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt 
b/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
index a1f2683..0faa189 100644
--- a/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
+++ b/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
@@ -40,6 +40,8 @@ Required properties for VDMA:
 Optional properties:
 - xlnx,include-sg: Tells configured for Scatter-mode in
the hardware.
+Optional properties for AXI DMA:
+- xlnx,mcdma: Tells whether configured for multi-channel mode in the hardware.
 Optional properties for VDMA:
 - xlnx,flush-fsync: Tells which channel to Flush on Frame sync.
It takes following values:
@@ -60,6 +62,8 @@ Optional child node properties:
 Optional child node properties for VDMA:
 - xlnx,genlock-mode: Tells Genlock synchronization is
enabled/disabled in hardware.
+Optional child node properties for AXI DMA:
+-dma-channels: Number of dma channels in child node.
 
 Example:
 
-- 
1.7.1



[PATCH v2 2/5] dmaengine: vdma: Add support for mulit-channel dma mode

2016-06-23 Thread Kedareswara rao Appana
This patch adds support for AXI DMA multi-channel DMA mode.
Multi-channel mode enables the DMA to connect to multiple masters
and slaves on the streaming side.

In multi-channel mode, AXI DMA supports 2D transfers.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> Removed mcdma_config as suggested by Vinod.

 drivers/dma/xilinx/xilinx_vdma.c |  213 +
 1 files changed, 190 insertions(+), 23 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_vdma.c b/drivers/dma/xilinx/xilinx_vdma.c
index 40f754b..0768d9f 100644
--- a/drivers/dma/xilinx/xilinx_vdma.c
+++ b/drivers/dma/xilinx/xilinx_vdma.c
@@ -114,7 +114,7 @@
 #define XILINX_VDMA_REG_START_ADDRESS_64(n)(0x000c + 8 * (n))
 
 /* HW specific definitions */
-#define XILINX_DMA_MAX_CHANS_PER_DEVICE0x2
+#define XILINX_DMA_MAX_CHANS_PER_DEVICE0x20
 
 #define XILINX_DMA_DMAXR_ALL_IRQ_MASK  \
(XILINX_DMA_DMASR_FRM_CNT_IRQ | \
@@ -165,6 +165,18 @@
 #define XILINX_DMA_COALESCE_MAX255
 #define XILINX_DMA_NUM_APP_WORDS   5
 
+/* Multi-Channel DMA Descriptor offsets*/
+#define XILINX_DMA_MCRX_CDESC(x)   (0x40 + (x-1) * 0x20)
+#define XILINX_DMA_MCRX_TDESC(x)   (0x48 + (x-1) * 0x20)
+
+/* Multi-Channel DMA Masks/Shifts */
+#define XILINX_DMA_BD_HSIZE_MASK   GENMASK(15, 0)
+#define XILINX_DMA_BD_STRIDE_MASK  GENMASK(15, 0)
+#define XILINX_DMA_BD_VSIZE_MASK   GENMASK(31, 19)
+#define XILINX_DMA_BD_TDEST_MASK   GENMASK(4, 0)
+#define XILINX_DMA_BD_STRIDE_SHIFT 0
+#define XILINX_DMA_BD_VSIZE_SHIFT  19
+
 /* AXI CDMA Specific Registers/Offsets */
 #define XILINX_CDMA_REG_SRCADDR0x18
 #define XILINX_CDMA_REG_DSTADDR0x20
@@ -210,8 +222,8 @@ struct xilinx_axidma_desc_hw {
u32 next_desc_msb;
u32 buf_addr;
u32 buf_addr_msb;
-   u32 pad1;
-   u32 pad2;
+   u32 mcdma_control;
+   u32 vsize_stride;
u32 control;
u32 status;
u32 app[XILINX_DMA_NUM_APP_WORDS];
@@ -349,6 +361,7 @@ struct xilinx_dma_chan {
struct xilinx_axidma_tx_segment *seg_v;
struct xilinx_axidma_tx_segment *cyclic_seg_v;
void (*start_transfer)(struct xilinx_dma_chan *chan);
+   u16 tdest;
 };
 
 struct xilinx_dma_config {
@@ -365,6 +378,7 @@ struct xilinx_dma_config {
  * @common: DMA device structure
  * @chan: Driver specific DMA channel
  * @has_sg: Specifies whether Scatter-Gather is present or not
+ * @mcdma: Specifies whether Multi-Channel is present or not
  * @flush_on_fsync: Flush on frame sync
  * @ext_addr: Indicates 64 bit addressing is supported by dma device
  * @pdev: Platform device structure pointer
@@ -374,6 +388,8 @@ struct xilinx_dma_config {
  * @txs_clk: DMA mm2s stream clock
  * @rx_clk: DMA s2mm clock
  * @rxs_clk: DMA s2mm stream clock
+ * @nr_channels: Number of channels DMA device supports
+ * @chan_id: DMA channel identifier
  */
 struct xilinx_dma_device {
void __iomem *regs;
@@ -381,6 +397,7 @@ struct xilinx_dma_device {
struct dma_device common;
struct xilinx_dma_chan *chan[XILINX_DMA_MAX_CHANS_PER_DEVICE];
bool has_sg;
+   bool mcdma;
u32 flush_on_fsync;
bool ext_addr;
struct platform_device  *pdev;
@@ -390,6 +407,8 @@ struct xilinx_dma_device {
struct clk *txs_clk;
struct clk *rx_clk;
struct clk *rxs_clk;
+   u32 nr_channels;
+   u32 chan_id;
 };
 
 /* Macros */
@@ -1196,18 +1215,20 @@ static void xilinx_dma_start_transfer(struct 
xilinx_dma_chan *chan)
tail_segment = list_last_entry(&tail_desc->segments,
   struct xilinx_axidma_tx_segment, node);
 
-   old_head = list_first_entry(&head_desc->segments,
-   struct xilinx_axidma_tx_segment, node);
-   new_head = chan->seg_v;
-   /* Copy Buffer Descriptor fields. */
-   new_head->hw = old_head->hw;
+   if (chan->has_sg && !chan->xdev->mcdma) {
+   old_head = list_first_entry(&head_desc->segments,
+   struct xilinx_axidma_tx_segment, node);
+   new_head = chan->seg_v;
+   /* Copy Buffer Descriptor fields. */
+   new_head->hw = old_head->hw;
 
-   /* Swap and save new reserve */
-   list_replace_init(&old_head->node, &new_head->node);
-   chan->seg_v = old_head;
+   /* Swap and save new reserve */
+   list_replace_init(&old_head->node, &new_head->node);
+   chan->seg_v = old_head;
 
-   tail_segment->hw.next_desc = chan->seg_v->phys;
-   head_desc->async_tx.phys = new_head->phys;
+   tail_segment->hw.next_desc = chan->seg_v->phys;
+   head_desc->async_tx.phys = new_head->phys;
+   }
 
reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
 
@@ -1218,23 +1239,53 @@ static void xilinx_dma_start_transfer(struct 
xilinx_dma_chan *chan)

[PATCH v2 4/5] dmaengine: dma: Rename driver and config

2016-06-23 Thread Kedareswara rao Appana
Support for AXI DMA and CDMA has been added to the existing VDMA
driver, so the driver is no longer VDMA specific.

This patch renames the driver and DT binding doc to xilinx_dma
and updates the Kconfig description for all the DMAs.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> None.

 .../dma/xilinx/{xilinx_vdma.txt => xilinx_dma.txt} |0
 drivers/dma/Kconfig|   11 ---
 drivers/dma/xilinx/Makefile|2 +-
 drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} |0
 4 files changed, 9 insertions(+), 4 deletions(-)
 rename Documentation/devicetree/bindings/dma/xilinx/{xilinx_vdma.txt => 
xilinx_dma.txt} (100%)
 rename drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} (100%)

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt 
b/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
similarity index 100%
rename from Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
rename to Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 8c98779..1f39f3e 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -519,19 +519,24 @@ config XGENE_DMA
help
  Enable support for the APM X-Gene SoC DMA engine.
 
-config XILINX_VDMA
-   tristate "Xilinx AXI VDMA Engine"
+config XILINX_DMA
+   tristate "Xilinx AXI DMAS Engine"
depends on (ARCH_ZYNQ || MICROBLAZE || ARM64)
select DMA_ENGINE
help
  Enable support for Xilinx AXI VDMA Soft IP.
 
- This engine provides high-bandwidth direct memory access
+ AXI VDMA engine provides high-bandwidth direct memory access
  between memory and AXI4-Stream video type target
  peripherals including peripherals which support AXI4-
  Stream Video Protocol.  It has two stream interfaces/
  channels, Memory Mapped to Stream (MM2S) and Stream to
  Memory Mapped (S2MM) for the data transfers.
+ AXI CDMA engine provides high-bandwidth direct memory access
+ between a memory-mapped source address and a memory-mapped
+ destination address.
+ AXI DMA engine provides high-bandwidth one dimensional direct
+ memory access between memory and AXI4-Stream target peripherals.
 
 config ZX_DMA
tristate "ZTE ZX296702 DMA support"
diff --git a/drivers/dma/xilinx/Makefile b/drivers/dma/xilinx/Makefile
index 3c4e9f2..af9e69a 100644
--- a/drivers/dma/xilinx/Makefile
+++ b/drivers/dma/xilinx/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_XILINX_VDMA) += xilinx_vdma.o
+obj-$(CONFIG_XILINX_DMA) += xilinx_dma.o
diff --git a/drivers/dma/xilinx/xilinx_vdma.c b/drivers/dma/xilinx/xilinx_dma.c
similarity index 100%
rename from drivers/dma/xilinx/xilinx_vdma.c
rename to drivers/dma/xilinx/xilinx_dma.c
-- 
1.7.1



[PATCH v2 0/5] dmaengine: vdma: AXI DMAS Enhancments

2016-06-23 Thread Kedareswara rao Appana
This patch series does the following:
---> Add support for AXI DMA Multi-channel DMA mode.
---> Delete AXI DMA binding doc.
---> Rename the driver and update config options.

Kedareswara rao Appana (5):
  Documentation: DT: vdma: Update binding doc for multi-channel dma
mode
  dmaengine: vdma: Add support for mulit-channel dma mode
  Documentation: DT: dma: Delete binding doc for AXI DMA
  dmaengine: dma: Rename driver and config
  dmaengine: dma: Use different channel names for each dma

 .../devicetree/bindings/dma/xilinx/xilinx_dma.txt  |   94 +++--
 .../devicetree/bindings/dma/xilinx/xilinx_vdma.txt |  107 --
 drivers/dma/Kconfig|   11 +-
 drivers/dma/xilinx/Makefile|2 +-
 drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} |  221 +---
 5 files changed, 277 insertions(+), 158 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
 rename drivers/dma/xilinx/{xilinx_vdma.c => xilinx_dma.c} (92%)



[PATCH v2 1/5] Documentation: DT: vdma: Update binding doc for multi-channel dma mode

2016-06-23 Thread Kedareswara rao Appana
This patch updates the device-tree binding doc for
AXI DMA multi channel dma mode.

Acked-by: Rob Herring 
Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> Added Rob Acked-by.

 .../devicetree/bindings/dma/xilinx/xilinx_vdma.txt |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt 
b/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
index a1f2683..0faa189 100644
--- a/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
+++ b/Documentation/devicetree/bindings/dma/xilinx/xilinx_vdma.txt
@@ -40,6 +40,8 @@ Required properties for VDMA:
 Optional properties:
 - xlnx,include-sg: Tells configured for Scatter-mode in
the hardware.
+Optional properties for AXI DMA:
+- xlnx,mcdma: Tells whether configured for multi-channel mode in the hardware.
 Optional properties for VDMA:
 - xlnx,flush-fsync: Tells which channel to Flush on Frame sync.
It takes following values:
@@ -60,6 +62,8 @@ Optional child node properties:
 Optional child node properties for VDMA:
 - xlnx,genlock-mode: Tells Genlock synchronization is
enabled/disabled in hardware.
+Optional child node properties for AXI DMA:
+-dma-channels: Number of dma channels in child node.
 
 Example:
 
-- 
1.7.1



[PATCH v2 2/5] dmaengine: vdma: Add support for mulit-channel dma mode

2016-06-23 Thread Kedareswara rao Appana
This patch adds support for AXI DMA multi-channel DMA mode.
Multi-channel mode enables the DMA to connect to multiple masters
and slaves on the streaming side.

In multi-channel mode AXI DMA supports 2D transfers.

Signed-off-by: Kedareswara rao Appana 
---
Changes for v2:
---> Removed mcdma_config as suggested by vinod

 drivers/dma/xilinx/xilinx_vdma.c |  213 +
 1 files changed, 190 insertions(+), 23 deletions(-)

diff --git a/drivers/dma/xilinx/xilinx_vdma.c b/drivers/dma/xilinx/xilinx_vdma.c
index 40f754b..0768d9f 100644
--- a/drivers/dma/xilinx/xilinx_vdma.c
+++ b/drivers/dma/xilinx/xilinx_vdma.c
@@ -114,7 +114,7 @@
 #define XILINX_VDMA_REG_START_ADDRESS_64(n)(0x000c + 8 * (n))
 
 /* HW specific definitions */
-#define XILINX_DMA_MAX_CHANS_PER_DEVICE0x2
+#define XILINX_DMA_MAX_CHANS_PER_DEVICE0x20
 
 #define XILINX_DMA_DMAXR_ALL_IRQ_MASK  \
(XILINX_DMA_DMASR_FRM_CNT_IRQ | \
@@ -165,6 +165,18 @@
 #define XILINX_DMA_COALESCE_MAX255
 #define XILINX_DMA_NUM_APP_WORDS   5
 
+/* Multi-Channel DMA Descriptor offsets*/
+#define XILINX_DMA_MCRX_CDESC(x)   (0x40 + (x-1) * 0x20)
+#define XILINX_DMA_MCRX_TDESC(x)   (0x48 + (x-1) * 0x20)
+
+/* Multi-Channel DMA Masks/Shifts */
+#define XILINX_DMA_BD_HSIZE_MASK   GENMASK(15, 0)
+#define XILINX_DMA_BD_STRIDE_MASK  GENMASK(15, 0)
+#define XILINX_DMA_BD_VSIZE_MASK   GENMASK(31, 19)
+#define XILINX_DMA_BD_TDEST_MASK   GENMASK(4, 0)
+#define XILINX_DMA_BD_STRIDE_SHIFT 0
+#define XILINX_DMA_BD_VSIZE_SHIFT  19
+
 /* AXI CDMA Specific Registers/Offsets */
 #define XILINX_CDMA_REG_SRCADDR0x18
 #define XILINX_CDMA_REG_DSTADDR0x20
@@ -210,8 +222,8 @@ struct xilinx_axidma_desc_hw {
u32 next_desc_msb;
u32 buf_addr;
u32 buf_addr_msb;
-   u32 pad1;
-   u32 pad2;
+   u32 mcdma_control;
+   u32 vsize_stride;
u32 control;
u32 status;
u32 app[XILINX_DMA_NUM_APP_WORDS];
@@ -349,6 +361,7 @@ struct xilinx_dma_chan {
struct xilinx_axidma_tx_segment *seg_v;
struct xilinx_axidma_tx_segment *cyclic_seg_v;
void (*start_transfer)(struct xilinx_dma_chan *chan);
+   u16 tdest;
 };
 
 struct xilinx_dma_config {
@@ -365,6 +378,7 @@ struct xilinx_dma_config {
  * @common: DMA device structure
  * @chan: Driver specific DMA channel
  * @has_sg: Specifies whether Scatter-Gather is present or not
+ * @mcdma: Specifies whether Multi-Channel is present or not
  * @flush_on_fsync: Flush on frame sync
  * @ext_addr: Indicates 64 bit addressing is supported by dma device
  * @pdev: Platform device structure pointer
@@ -374,6 +388,8 @@ struct xilinx_dma_config {
  * @txs_clk: DMA mm2s stream clock
  * @rx_clk: DMA s2mm clock
  * @rxs_clk: DMA s2mm stream clock
+ * @nr_channels: Number of channels DMA device supports
+ * @chan_id: DMA channel identifier
  */
 struct xilinx_dma_device {
void __iomem *regs;
@@ -381,6 +397,7 @@ struct xilinx_dma_device {
struct dma_device common;
struct xilinx_dma_chan *chan[XILINX_DMA_MAX_CHANS_PER_DEVICE];
bool has_sg;
+   bool mcdma;
u32 flush_on_fsync;
bool ext_addr;
struct platform_device  *pdev;
@@ -390,6 +407,8 @@ struct xilinx_dma_device {
struct clk *txs_clk;
struct clk *rx_clk;
struct clk *rxs_clk;
+   u32 nr_channels;
+   u32 chan_id;
 };
 
 /* Macros */
@@ -1196,18 +1215,20 @@ static void xilinx_dma_start_transfer(struct 
xilinx_dma_chan *chan)
tail_segment = list_last_entry(_desc->segments,
   struct xilinx_axidma_tx_segment, node);
 
-   old_head = list_first_entry(_desc->segments,
-   struct xilinx_axidma_tx_segment, node);
-   new_head = chan->seg_v;
-   /* Copy Buffer Descriptor fields. */
-   new_head->hw = old_head->hw;
+   if (chan->has_sg && !chan->xdev->mcdma) {
+   old_head = list_first_entry(_desc->segments,
+   struct xilinx_axidma_tx_segment, node);
+   new_head = chan->seg_v;
+   /* Copy Buffer Descriptor fields. */
+   new_head->hw = old_head->hw;
 
-   /* Swap and save new reserve */
-   list_replace_init(_head->node, _head->node);
-   chan->seg_v = old_head;
+   /* Swap and save new reserve */
+   list_replace_init(_head->node, _head->node);
+   chan->seg_v = old_head;
 
-   tail_segment->hw.next_desc = chan->seg_v->phys;
-   head_desc->async_tx.phys = new_head->phys;
+   tail_segment->hw.next_desc = chan->seg_v->phys;
+   head_desc->async_tx.phys = new_head->phys;
+   }
 
reg = dma_ctrl_read(chan, XILINX_DMA_REG_DMACR);
 
@@ -1218,23 +1239,53 @@ static void xilinx_dma_start_transfer(struct 
xilinx_dma_chan *chan)
dma_ctrl_write(chan, 
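
Note on the register macros added by this patch: the sketch below is a
user-space approximation, not the driver code. It reproduces the descriptor
offset arithmetic and the mask values from the hunk above. GENMASK32 and
mcdma_pack_vsize_stride() are hypothetical names introduced here, and the
packing layout is an assumption inferred from the mask and shift names. The
offset macros parenthesize their argument, which the patch version does not,
so an argument containing a lower-precedence operator still expands as
intended.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(), limited to 32-bit values. */
#define GENMASK32(h, l) \
	(((~UINT32_C(0)) << (l)) & ((~UINT32_C(0)) >> (31 - (h))))

/* Mask and shift values copied from the patch. */
#define XILINX_DMA_BD_STRIDE_MASK  GENMASK32(15, 0)
#define XILINX_DMA_BD_VSIZE_MASK   GENMASK32(31, 19)
#define XILINX_DMA_BD_TDEST_MASK   GENMASK32(4, 0)
#define XILINX_DMA_BD_STRIDE_SHIFT 0
#define XILINX_DMA_BD_VSIZE_SHIFT  19

/* Same arithmetic as the patch, with the macro argument parenthesized. */
#define XILINX_DMA_MCRX_CDESC(x)   (0x40 + ((x) - 1) * 0x20)
#define XILINX_DMA_MCRX_TDESC(x)   (0x48 + ((x) - 1) * 0x20)

/*
 * Hypothetical helper (assumption): pack a vertical size and a stride
 * into one 32-bit descriptor word using the masks and shifts above.
 */
static uint32_t mcdma_pack_vsize_stride(uint32_t vsize, uint32_t stride)
{
	return ((vsize << XILINX_DMA_BD_VSIZE_SHIFT) & XILINX_DMA_BD_VSIZE_MASK) |
	       ((stride << XILINX_DMA_BD_STRIDE_SHIFT) & XILINX_DMA_BD_STRIDE_MASK);
}
```

With these definitions, channel 1 maps to current-descriptor offset 0x40 and
each further channel adds 0x20.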

[PATCH v2 3/5] Documentation: DT: dma: Delete binding doc for AXI DMA

2016-06-23 Thread Kedareswara rao Appana
AXI DMA support has been added to the existing AXI VDMA
driver, and the device tree binding information has been
updated in the VDMA binding doc.

Acked-by: Rob Herring 
Signed-off-by: Kedareswara rao Appana 
---
--> Added Rob Acked-by.

 .../devicetree/bindings/dma/xilinx/xilinx_dma.txt  |   65 
 1 files changed, 0 insertions(+), 65 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt

diff --git a/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt 
b/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
deleted file mode 100644
index 3cf0072..000
--- a/Documentation/devicetree/bindings/dma/xilinx/xilinx_dma.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-Xilinx AXI DMA engine, it does transfers between memory and AXI4 stream
-target devices. It can be configured to have one channel or two channels.
-If configured as two channels, one is to transmit to the device and another
-is to receive from the device.
-
-Required properties:
-- compatible: Should be "xlnx,axi-dma-1.00.a"
-- #dma-cells: Should be <1>, see "dmas" property below
-- reg: Should contain DMA registers location and length.
-- dma-channel child node: Should have at least one channel and can have up to
-   two channels per device. This node specifies the properties of each
-   DMA channel (see child node properties below).
-
-Optional properties:
-- xlnx,include-sg: Tells whether configured for Scatter-mode in
-   the hardware.
-
-Required child node properties:
-- compatible: It should be either "xlnx,axi-dma-mm2s-channel" or
-   "xlnx,axi-dma-s2mm-channel".
-- interrupts: Should contain per channel DMA interrupts.
-- xlnx,datawidth: Should contain the stream data width, take values
-   {32,64...1024}.
-
-Option child node properties:
-- xlnx,include-dre: Tells whether hardware is configured for Data
-   Realignment Engine.
-
-Example:
-
-
-axi_dma_0: axidma@4040 {
-   compatible = "xlnx,axi-dma-1.00.a";
-   #dma_cells = <1>;
-   reg = < 0x4040 0x1 >;
-   dma-channel@4040 {
-   compatible = "xlnx,axi-dma-mm2s-channel";
-   interrupts = < 0 59 4 >;
-   xlnx,datawidth = <0x40>;
-   } ;
-   dma-channel@40400030 {
-   compatible = "xlnx,axi-dma-s2mm-channel";
-   interrupts = < 0 58 4 >;
-   xlnx,datawidth = <0x40>;
-   } ;
-} ;
-
-
-* DMA client
-
-Required properties:
-- dmas: a list of <[DMA device phandle] [Channel ID]> pairs,
-   where Channel ID is '0' for write/tx and '1' for read/rx
-   channel.
-- dma-names: a list of DMA channel names, one per "dmas" entry
-
-Example:
-
-
-dmatest_0: dmatest@0 {
-   compatible ="xlnx,axi-dma-test-1.00.a";
-   dmas = <_dma_0 0
-   _dma_0 1>;
-   dma-names = "dma0", "dma1";
-} ;
-- 
1.7.1



Re: [PATCH] clk: fixed-factor: add optional dt-binding clock-flags

2016-06-23 Thread kbuild test robot
Hi,

[auto build test ERROR on robh/for-next]
[also build test ERROR on v4.7-rc4 next-20160623]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jongsung-Kim/clk-fixed-factor-add-optional-dt-binding-clock-flags/20160624-115201
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux for-next
config: x86_64-randconfig-s4-06241247 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/clk/clk-fixed-factor.c: In function 'of_fixed_factor_clk_setup':
>> drivers/clk/clk-fixed-factor.c:170:44: error: passing argument 3 of 
>> 'of_property_read_u32' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 of_property_read_u32(node, "clock-flags", );
   ^
   In file included from include/linux/clk-provider.h:15:0,
from drivers/clk/clk-fixed-factor.c:11:
   include/linux/of.h:916:19: note: expected 'u32 * {aka unsigned int *}' but 
argument is of type 'long unsigned int *'
static inline int of_property_read_u32(const struct device_node *np,
  ^~~~
   cc1: some warnings being treated as errors

vim +/of_property_read_u32 +170 drivers/clk/clk-fixed-factor.c

   164  __func__, node->name);
   165  return;
   166  }
   167  
   168  of_property_read_string(node, "clock-output-names", _name);
   169  parent_name = of_clk_get_parent_name(node, 0);
 > 170  of_property_read_u32(node, "clock-flags", );
   171  
   172  clk = clk_register_fixed_factor(NULL, clk_name, parent_name, 
flags,
   173  mult, div);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data
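
Note on the error above: of_property_read_u32() expects a 'u32 *', while the
variable passed at line 170 is an 'unsigned long'. One common way to resolve
this kind of mismatch is to read into a u32 temporary and then assign it to
the wider variable. The sketch below is a user-space approximation under
stated assumptions: mock_read_u32() is a hypothetical stand-in for
of_property_read_u32(), read_optional_flags() is a hypothetical helper, and
the default of 0 for a missing property is an assumption, not something the
report confirms.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/*
 * Hypothetical stand-in for of_property_read_u32(): returns 0 on success
 * and a negative value if the property is absent.
 */
static int mock_read_u32(const char *propname, u32 *out_value)
{
	if (strcmp(propname, "clock-flags") != 0)
		return -22;	/* property not found */
	*out_value = 0x4;	/* arbitrary example value */
	return 0;
}

/* Read an optional u32 property into an unsigned long without a pointer cast. */
static unsigned long read_optional_flags(const char *propname)
{
	unsigned long flags = 0;	/* assumed default when the property is absent */
	u32 val;

	if (mock_read_u32(propname, &val) == 0)
		flags = val;		/* widening assignment, no type punning */

	return flags;
}
```

Casting the 'unsigned long *' to 'u32 *' instead would silence the compiler
but would write only part of the variable on 64-bit targets, which is why the
temporary is used here.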


RE: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending

2016-06-23 Thread Yang, Wenyou
Hi Alan,

Sorry for the late answer.

> -Original Message-
> From: Alan Stern [mailto:st...@rowland.harvard.edu]
> Sent: May 13, 2016 2:11
> To: Yang, Wenyou 
> Cc: Greg Kroah-Hartman ; Ferre, Nicolas
> ; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending
> 
> On Thu, 12 May 2016, Wenyou Yang wrote:
> 
> > In order to get lower consumption, as a workaround, suspend the USB
> > PORTA/B/C via set the SUSPEND_A/B/C bits of OHCI Interrupt
> > Configuration Register while OHCI USB suspending.
> 
> What does this mean?  What does suspending a port do?  Is it the same as a
> normal USB port suspend?

The USB controller from Synopsys does not correctly manage suspend mode 
for EHCI. 
There is no way to suspend the VDDUTMII (USB device and host UTMI interface) 
when no device is connected to it. 

That is why we added this specific control: it works around the issue from 
outside the USB controller by setting some bits in one of the special 
function registers. 

And the suspend mode works in OHCI mode.

It is not the same as a normal USB port suspend.

> 
> If it is the same, why doesn't the USB_PORT_FEAT_SUSPEND subcase of the
> SetPortFeature case in ohci_hub_control() already take care of this?
> 
> > This suspend operation must be done before stopping the USB clock,
> > resume after the USB clock enabled.
> >
> > Signed-off-by: Wenyou Yang 
> > ---
> 
> > @@ -132,6 +135,17 @@ static void at91_stop_hc(struct platform_device
> > *pdev)
> >
> >
> > /*
> > -*/
> >
> > +struct regmap *at91_dt_syscon_sfr(void) {
> > +   struct regmap *regmap;
> > +
> > +   regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
> > +   if (IS_ERR(regmap))
> > +   return NULL;
> 
> If you get an error, the regmap pointer is set to NULL...
> 
> > @@ -197,6 +211,8 @@ static int usb_hcd_at91_probe(const struct hc_driver
> *driver,
> > goto err;
> > }
> >
> > +   ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
> 
> With no other error checking...
> 
> > +
> > board = hcd->self.controller->platform_data;
> > ohci = hcd_to_ohci(hcd);
> > ohci->num_ports = board->ports;
> 
> > +static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable) {
> > +   u32 regval;
> > +   int ret;
> > +
> > +   if (IS_ERR(regmap))
> > +   return PTR_ERR(regmap);
> > +
> > +   ret = regmap_read(regmap, SFR_OHCIICR, );
> 
> And now what happens if regmap is NULL?  Hint: It won't be pretty...
> 
> Alan Stern


Best Regards,
Wenyou Yang
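
Note on the review comments quoted above: IS_ERR() is false for a NULL
pointer, so a helper that returns NULL on failure is not caught by a later
IS_ERR() check. The sketch below is a user-space approximation of the
kernel's ERR_PTR helpers to make that visible. port_ctrl_guard() is a
hypothetical function, and treating a missing regmap as "nothing to do" is an
assumption, not the fix the thread settled on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the top MAX_ERRNO addresses count as errors; NULL does not. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Hypothetical guard for a helper that receives an optional regmap pointer.
 * Returns 0 if there is no regmap, a negative errno for an error pointer,
 * and 1 if the pointer is safe to dereference.
 */
static int port_ctrl_guard(const void *regmap)
{
	if (!regmap)
		return 0;
	if (IS_ERR(regmap))
		return (int)PTR_ERR(regmap);
	return 1;
}
```

Either the lookup helper propagates the error pointer and callers check
IS_ERR(), or it returns NULL and callers check for NULL; mixing the two
conventions is what leads to the dereference Alan points out.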



linux-next: manual merge of the userns tree with Linus' tree

2016-06-23 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the userns tree got a conflict in:

  fs/proc/root.c

between commit:

  e54ad7f1ee26 ("proc: prevent stacking filesystems on top")

from Linus' tree and commit:

  e94591d0d90c ("proc: Convert proc_mount to use mount_ns")

from the userns tree.

I fixed it up (I used the userns version of this file and added the
following patch) and can carry the fix as necessary. This is now fixed
as far as linux-next is concerned, but any non trivial conflicts should
be mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

From: Stephen Rothwell 
Date: Fri, 24 Jun 2016 14:27:47 +1000
Subject: [PATCH] proc: fixup for "prevent stacking filesystems on top"

Signed-off-by: Stephen Rothwell 
---
 fs/proc/inode.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a5b2c33745b7..6b1843e78bd7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -463,6 +463,13 @@ int proc_fill_super(struct super_block *s, void *data, int 
silent)
struct inode *root_inode;
int ret;
 
+   /*
+* procfs isn't actually a stacking filesystem; however, there is
+* too much magic going on inside it to permit stacking things on
+* top of it
+*/
+   s->s_stack_depth = FILESYSTEM_MAX_STACK_DEPTH;
+
if (!proc_parse_options(data, ns))
return -EINVAL;
 
-- 
2.8.1

-- 
Cheers,
Stephen Rothwell
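
Note on the fixup above: setting s_stack_depth to FILESYSTEM_MAX_STACK_DEPTH
marks procfs as already being at the maximum depth, so any stacking
filesystem mounted on top of it would exceed the limit and be refused. The
sketch below is a user-space approximation of that depth check.
can_stack_on() is a hypothetical name, and the value 2 for
FILESYSTEM_MAX_STACK_DEPTH is an assumption about include/linux/fs.h at the
time.

```c
/* Assumed value of the kernel constant; the check does not depend on it. */
#define FILESYSTEM_MAX_STACK_DEPTH 2

/*
 * Hypothetical mount-time check of a stacking filesystem: the upper layer
 * sits one level above the lower one and is refused if that is too deep.
 * Returns 1 if stacking is allowed, 0 otherwise.
 */
static int can_stack_on(int lower_stack_depth)
{
	int upper_stack_depth = lower_stack_depth + 1;

	return upper_stack_depth <= FILESYSTEM_MAX_STACK_DEPTH;
}
```

A lower filesystem at depth 0 or 1 can be stacked on, while one that reports
FILESYSTEM_MAX_STACK_DEPTH, as procfs does after this fixup, cannot.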


Re: [PATCH 6/7] of_graph: add of_graph_get_top_port()

2016-06-23 Thread kbuild test robot
Hi,

[auto build test WARNING on robh/for-next]
[also build test WARNING on v4.7-rc4 next-20160623]
[cannot apply to glikely/devicetree/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kuninori-Morimoto/of_graph-prepare-for-ALSA-graph-support/20160624-105421
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux for-next
config: arm-at91_dt_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c:22:0:
>> include/linux/of_graph.h:54:50: warning: 'struct device' declared inside 
>> parameter list
struct device_node *of_graph_get_top_port(struct device *dev);
 ^
>> include/linux/of_graph.h:54:50: warning: its scope is only this definition 
>> or declaration, which is probably not what you want

vim +54 include/linux/of_graph.h

38   */
39  #define for_each_endpoint_of_node(parent, child) \
40  for (child = of_graph_get_next_endpoint(parent, NULL); child != 
NULL; \
41   child = of_graph_get_next_endpoint(parent, child))
42  
43  #define of_graph_port_type_is_sound(n)  
of_graph_port_type_is(n, "sound")
44  #define of_graph_endpoint_type_is_sound(n)  
of_graph_endpoint_type_is(n, "sound")
45  #define of_graph_get_sound_endpoint_count(n)
of_graph_get_endpoint_count(n, "sound")
46  
47  #ifdef CONFIG_OF
48  int of_graph_parse_endpoint(const struct device_node *node,
49  struct of_endpoint *endpoint);
50  bool of_graph_port_type_is(struct device_node *port, char *type);
51  bool of_graph_endpoint_type_is(struct device_node *ep, char *type);
52  int of_graph_get_endpoint_count(const struct device_node *np, char 
*type);
53  struct device_node *of_graph_get_port_by_id(struct device_node *node, 
u32 id);
  > 54  struct device_node *of_graph_get_top_port(struct device *dev);
55  struct device_node *of_graph_get_next_endpoint(const struct device_node 
*parent,
56  struct device_node *previous);
57  struct device_node *of_graph_get_endpoint_by_regs(
58  const struct device_node *parent, int port_reg, int 
reg);
59  struct device_node *of_graph_get_remote_endpoint(
60  const struct device_node *node);
61  struct device_node *of_graph_get_port_parent(struct device_node *node);
62  struct device_node *of_graph_get_remote_port_parent(

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data
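
Note on the warning above: 'struct device' is first mentioned inside the
parameter list of the prototype, so the compiler treats it as a type whose
scope ends with that declaration. A forward declaration ahead of the
prototype is one common way to avoid this. The sketch below is a
self-contained approximation: the struct definitions and the function body
are hypothetical placeholders so that it compiles, and they are not taken
from the patch.

```c
#include <stddef.h>

/* Forward declarations give both struct tags file scope before use. */
struct device;
struct device_node;

/* The prototype from the report now refers to the file-scope types. */
struct device_node *of_graph_get_top_port(struct device *dev);

/* Hypothetical minimal definitions so that the sketch can be exercised. */
struct device_node {
	const char *name;
};

struct device {
	struct device_node *of_node;
};

/* Illustrative body only; the real implementation is not shown in the report. */
struct device_node *of_graph_get_top_port(struct device *dev)
{
	return dev ? dev->of_node : NULL;
}
```

Including the header that defines 'struct device' would also work; the
forward declaration keeps the header lighter.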


Re: [PATCH 6/7] of_graph: add of_graph_get_top_port()

2016-06-23 Thread kbuild test robot
Hi,

[auto build test WARNING on robh/for-next]
[also build test WARNING on v4.7-rc4 next-20160623]
[cannot apply to glikely/devicetree/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kuninori-Morimoto/of_graph-prepare-for-ALSA-graph-support/20160624-105421
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux for-next
config: arm-at91_dt_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_output.c:22:0:
>> include/linux/of_graph.h:54:50: warning: 'struct device' declared inside 
>> parameter list
struct device_node *of_graph_get_top_port(struct device *dev);
 ^
>> include/linux/of_graph.h:54:50: warning: its scope is only this definition 
>> or declaration, which is probably not what you want
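
This warning usually means a header names `struct device` in a prototype before any declaration of that type is visible, so the compiler treats it as a new type scoped to that single prototype. A forward declaration ahead of the prototype is the usual remedy. The following is only a minimal user-space sketch of that idea: the struct layouts and `get_top_port()` are hypothetical stand-ins, not the kernel definitions, and this is not necessarily the fix the patch author chose.

```c
#include <stddef.h>

/* Forward declarations: make both struct tags visible at file scope
 * before any prototype mentions them. Without the "struct device;"
 * line, GCC warns that 'struct device' is declared inside the
 * parameter list. */
struct device;
struct device_node;

/* Hypothetical stand-in for of_graph_get_top_port(). */
struct device_node *get_top_port(struct device *dev);

/* The full definitions can come later, or live in another header. */
struct device_node { int id; };
struct device { struct device_node *of_node; };

struct device_node *get_top_port(struct device *dev)
{
	return dev ? dev->of_node : NULL;
}
```

Because the prototype only uses a pointer to the type, the incomplete (forward-declared) type is enough and the header does not need to pull in the full definition.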

vim +54 include/linux/of_graph.h

38   */
39  #define for_each_endpoint_of_node(parent, child) \
40  for (child = of_graph_get_next_endpoint(parent, NULL); child != 
NULL; \
41   child = of_graph_get_next_endpoint(parent, child))
42  
43  #define of_graph_port_type_is_sound(n)  
of_graph_port_type_is(n, "sound")
44  #define of_graph_endpoint_type_is_sound(n)  
of_graph_endpoint_type_is(n, "sound")
45  #define of_graph_get_sound_endpoint_count(n)
of_graph_get_endpoint_count(n, "sound")
46  
47  #ifdef CONFIG_OF
48  int of_graph_parse_endpoint(const struct device_node *node,
49  struct of_endpoint *endpoint);
50  bool of_graph_port_type_is(struct device_node *port, char *type);
51  bool of_graph_endpoint_type_is(struct device_node *ep, char *type);
52  int of_graph_get_endpoint_count(const struct device_node *np, char 
*type);
53  struct device_node *of_graph_get_port_by_id(struct device_node *node, 
u32 id);
  > 54  struct device_node *of_graph_get_top_port(struct device *dev);
55  struct device_node *of_graph_get_next_endpoint(const struct device_node 
*parent,
56  struct device_node *previous);
57  struct device_node *of_graph_get_endpoint_by_regs(
58  const struct device_node *parent, int port_reg, int 
reg);
59  struct device_node *of_graph_get_remote_endpoint(
60  const struct device_node *node);
61  struct device_node *of_graph_get_port_parent(struct device_node *node);
62  struct device_node *of_graph_get_remote_port_parent(

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


Re: [RESEND][PATCH 0/2] Add pl031 RTC support for Hi6220/HiKey

2016-06-23 Thread Rob Herring
On Thu, Jun 23, 2016 at 3:39 PM, John Stultz  wrote:
> This patchset enables the pl031 RTC on the Hi6220 SoC.
>
> I'd like to submit it for review and consideration to be merged.
> (But I've not gotten much feedback on it. Do I have the right
> people cc'ed?)

Yes. One issue is the DT header causes dependency problems as either
clk or arm-soc maintainers have to take everything. I think it is
desired that you don't use defines in the dts file, so arm-soc can
take it and Michael/Stephen can take the clock changes.

Send the dts file change to a...@kernel.org if you can't get any
response from the sub-arch maintainer.

Rob

>
> Michael/Wei: If you don't object to this, can I get an ack from
> one of you so the other can take the change through their tree?
>
> thanks
> -john
>
> Cc: Michael Turquette 
> Cc: Stephen Boyd 
> Cc: Rob Herring 
> Cc: Pawel Moll 
> Cc: Wei Xu 
> Cc: Guodong Xu 
> Cc: Zhangfei Gao 
>
> Zhangfei Gao (2):
>   clk: hi6220: Add RTC clock for pl031
>   arm64: dts: hi6220: Add pl031 RTC support
>
>  arch/arm64/boot/dts/hisilicon/hi6220.dtsi | 16 
>  drivers/clk/hisilicon/clk-hi6220.c|  2 ++
>  include/dt-bindings/clock/hi6220-clock.h  |  5 +++--
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> --
> 1.9.1
>


Re: mmc: dw_mmc: warning with CONFIG_DMA_API_DEBUG

2016-06-23 Thread Jaehoon Chung
On 06/24/2016 10:25 AM, Shawn Lin wrote:
> Hi Jaehoon,
> 
> On 2016/6/23 19:39, Jaehoon Chung wrote:
>> Hi Shawn,
>>
>> On 06/21/2016 04:39 PM, Shawn Lin wrote:
>>> On 2016/6/21 13:32, Jaehoon Chung wrote:
 Hi guys,

 On 06/21/2016 11:31 AM, Shawn Lin wrote:
> On 2016/6/21 10:24, Seung-Woo Kim wrote:
>> Hello Shawn,
>>
>>> -Original Message-
>>> From: Shawn Lin [mailto:shawn@rock-chips.com]
>>> Sent: Tuesday, June 21, 2016 10:52 AM
>>> To: Seung-Woo Kim; jh80.ch...@samsung.com; ulf.hans...@linaro.org; 
>>> linux-...@vger.kernel.org; linux-
>>> ker...@vger.kernel.org
>>> Cc: shawn@rock-chips.com
>>> Subject: Re: mmc: dw_mmc: warning with CONFIG_DMA_API_DEBUG
>>>
>>> On 2016/6/20 16:34, Seung-Woo Kim wrote:
 Hi folks,

 During booting test on my Exynos5422 based Odroid-XU3, kernel compiled
 with CONFIG_DMA_API_DEBUG reported following warning:

 [ cut here ]
 WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1096 check_unmap+0x7bc/0xb38
 dwmmc_exynos 1220.mmc: DMA-API: device driver tries to free DMA 
 memory it has not allocated [device
>>> address=0x6d9d2200]
>>>
>>> Thanks for this report and fix.
>>>
>>> DTO (the same as IDMAC-RI/TI) interrupts may or may not arrive together
>>> with DATA_ERR. If DATA_ERR occurs without getting DTO, we should issue
>>> CMD12 manually to generate DTO. It's an ugly design for dwmmc, but it is
>>> what the vendor asked for.
>>>
>>> So you should never think we complete the xfer without
>>> checking DATA_ERR. This way you got the warning.

 Well, EVENT_DATA_ERR is already checked in tasklet_func, which also clears
 that flag.
>>>
>>> From my view, the reality is that when we get DATA_ERROR interrupts,
>>> we set EVENT_DATA_ERR in pending_events and schedule the tasklet,
>>> but we may still fall back to the IDMAC interrupt case because the tasklet
>>> may run a little late, namely right after the IDMAC interrupt check.
>>>
>>> I added some logging there, and it confirms my guess.
>>
>> You're right. This appears because of "Data Over".
>> If the Data Over interrupt occurs, SW needs to read the remaining data in
>> the FIFO.
>> At that time, it was set to DATA_COMPLETE, because SW might read the
>> remaining data, but it was already set to ERROR_DATA.
>>
>> In this case, your suggestion may prevent freeing twice.
>>
>> There is another case, during the tuning sequence. :(
>> I found that it also appeared during the tuning sequence.
>> - Really a stupid design.
>>
>> My suggestions are
>> First, apply the below solution.
> 
> Which solution? This $SUBJECT or the one I sent?

Yours. :)

> 
> 
> 
>> And then consider the HS200 tuning block with the below patch.
>>
>> https://patchwork.kernel.org/patch/8935791/
> 
> I saw this patch long ago, but I still have not seen
> this issue or received reports of it. From the code itself, it should
> be OK to land, as I checked the databook. :)
> 
>>
>> How about?
>> Do you have any other opinion?
>>
>> Best Regards,
>> Jaehoon Chung
>>
>>>

>>>
>>> So could you try this one:
>>
>> With your patch, the DMA API warning no longer appears in my environment.
>
> Nice to hear that.  Thanks for testing, Seung-Woo.

 Really? It's not a solution. When a tuning command is sent, a CRC error
 should be returned.
 Then it calls dw_mci_stop_dma() and also dma_ops->complete().
>>>
>>> Hrmm.. I can't see the reason it will also call dma_ops->complete.
>>> Could you explain a bit more here? :)
>>>
>>> From V2.70a Table 3-2
>>> For MMC CMD19, there may be no CRC status returned by the
>>> card but EBE is generated. Hence, EBE is set for CMD19. The application
>>> should not treat this as an error.
>>>
>>>

 When I applied your suggestion, the warning was still produced. :)

 [2.469916] [] (unwind_backtrace) from [] 
 (show_stack+0x10/0x14)
 [2.469934] [] (show_stack) from [] 
 (dump_stack+0x74/0x94)
 [2.469949] [] (dump_stack) from [] 
 (__warn+0xd4/0x100)
 [2.469961] [] (__warn) from [] 
 (warn_slowpath_fmt+0x38/0x48)
 [2.469975] [] (warn_slowpath_fmt) from [] 
 (check_unmap+0x828/0x8a8)
 [2.469991] [] (check_unmap) from [] 
 (debug_dma_unmap_sg+0x5c/0x13c)
 [2.470012] [] (debug_dma_unmap_sg) from [] 
 (dw_mci_dma_cleanup+0x68/0xa4)
 [2.470029] [] (dw_mci_dma_cleanup) from [] 
 (dw_mci_stop_dma+0x30/0x40)
 [2.470045] [] (dw_mci_stop_dma) from [] 
 (dw_mci_tasklet_func+0x340/0x3b4)
 [2.470063] [] (dw_mci_tasklet_func) from [] 
 (tasklet_action+0x84/0x12c)
 [2.470076] [] (tasklet_action) from [] 
 (__do_softirq+0xec/0x244)
 [2.470089] [] (__do_softirq) from [] 
 (irq_exit+0xb4/0xf8)
 [2.470109] [] (irq_exit) from [] 
 (__handle_domain_irq+0x70/0xe4)
 [
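
The thread above is about a cleanup path being reached twice for the same transfer (once from the error handling, once from the DMA completion handling), which is what DMA-API debugging reports as freeing memory that was not allocated. One common way to make such a path tolerate a second call is to track whether the buffer is still mapped. The following is only a user-space sketch of that idea: `struct xfer`, `xfer_map()` and `xfer_cleanup()` are hypothetical names, not the dw_mmc driver's structures, and this is not the fix proposed in the thread.

```c
#include <stdbool.h>

/* Hypothetical transfer state; not the dw_mmc driver's structures. */
struct xfer {
	bool mapped;       /* true while the buffer is DMA-mapped */
	int  unmap_calls;  /* counts real unmap operations, for illustration */
};

void xfer_map(struct xfer *x)
{
	x->mapped = true;
}

/* Safe to call from both the error path and the completion path:
 * only the first caller performs the unmap. */
void xfer_cleanup(struct xfer *x)
{
	if (!x->mapped)
		return;
	x->mapped = false;
	x->unmap_calls++;  /* stands in for the real unmap call */
}
```

In a real driver the two paths may run in interrupt and tasklet context, so the check and the clear would need to happen under a lock or as one atomic operation; the sketch leaves that out.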

[PATCH v4 04/16] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated

2016-06-23 Thread Andy Lutomirski
This avoids pointless races in which another CPU or task might see a
partially populated global pgd entry.  These races should normally
be harmless, but, if another CPU propagates the entry via
vmalloc_fault and then populate_pgd fails (due to memory allocation
failure, for example), this prevents a use-after-free of the pgd
entry.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/mm/pageattr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 7a1f7bbf4105..6a8026918bf6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1113,7 +1113,9 @@ static int populate_pgd(struct cpa_data *cpa, unsigned 
long addr)
 
ret = populate_pud(cpa, addr, pgd_entry, pgprot);
if (ret < 0) {
-   unmap_pgd_range(cpa->pgd, addr,
+   if (pud)
+   free_page((unsigned long)pud);
+   unmap_pud_range(pgd_entry, addr,
addr + (cpa->numpages << PAGE_SHIFT));
return ret;
}
-- 
2.5.5
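
The ordering described in this commit message (build the lower-level table privately, install it in the shared top-level entry only after population succeeded, and free it on failure) can be illustrated outside the kernel. The following is a user-space sketch under stated assumptions: `struct top_entry`, `fill_table()` and `populate_entry()` are hypothetical names, `calloc()`/`free()` stand in for the page allocator, and locking and memory ordering are ignored.

```c
#include <stdlib.h>

#define TABLE_ENTRIES 512

/* Stand-in for a shared top-level entry: NULL means "not present". */
struct top_entry { unsigned long *table; };

/* Hypothetical fill step; fails on request to simulate an allocation
 * failure deeper in the hierarchy. */
static int fill_table(unsigned long *table, int should_fail)
{
	if (should_fail)
		return -1;
	for (int i = 0; i < TABLE_ENTRIES; i++)
		table[i] = (unsigned long)i;
	return 0;
}

/* Populate privately, then publish. On failure the shared entry is
 * never written, so no other observer can hold a stale pointer. */
int populate_entry(struct top_entry *entry, int should_fail)
{
	unsigned long *table;
	int allocated = 0;

	if (!entry->table) {
		table = calloc(TABLE_ENTRIES, sizeof(*table));
		if (!table)
			return -1;
		allocated = 1;
	} else {
		table = entry->table;
	}

	if (fill_table(table, should_fail) < 0) {
		if (allocated)
			free(table);   /* free only what was never published */
		return -1;
	}

	if (allocated)
		entry->table = table;  /* publish only after success */
	return 0;
}
```

The point of the sketch is the order of the last two steps: the freshly allocated table becomes visible through the shared entry only once nothing can fail any more.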



[PATCH v4 03/16] x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()

2016-06-23 Thread Andy Lutomirski
From: Ingo Molnar 

So when memory hotplug removes a piece of physical memory from pagetable
mappings, it also frees the underlying PGD entry.

This complicates PGD management, so don't do this. We can keep the
PGD mapped and the PUD table all clear - it's only a single 4K page
per 512 GB of memory hotplugged.

Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Waiman Long 
Cc: linux...@kvack.org
Signed-off-by: Ingo Molnar 
Message-Id: <1442903021-3893-4-git-send-email-mi...@kernel.org>
---
 arch/x86/mm/init_64.c | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bce2e5d9edd4..c7465453d64e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -702,27 +702,6 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, 
pud_t *pud)
spin_unlock(&init_mm.page_table_lock);
 }
 
-/* Return true if pgd is changed, otherwise return false. */
-static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
-{
-   pud_t *pud;
-   int i;
-
-   for (i = 0; i < PTRS_PER_PUD; i++) {
-   pud = pud_start + i;
-   if (pud_val(*pud))
-   return false;
-   }
-
-   /* free a pud table */
-   free_pagetable(pgd_page(*pgd), 0);
-   spin_lock(&init_mm.page_table_lock);
-   pgd_clear(pgd);
-   spin_unlock(&init_mm.page_table_lock);
-
-   return true;
-}
-
 static void __meminit
 remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 bool direct)
@@ -913,7 +892,6 @@ remove_pagetable(unsigned long start, unsigned long end, 
bool direct)
unsigned long addr;
pgd_t *pgd;
pud_t *pud;
-   bool pgd_changed = false;
 
for (addr = start; addr < end; addr = next) {
next = pgd_addr_end(addr, end);
@@ -924,13 +902,8 @@ remove_pagetable(unsigned long start, unsigned long end, 
bool direct)
 
pud = (pud_t *)pgd_page_vaddr(*pgd);
remove_pud_table(pud, addr, next, direct);
-   if (free_pud_table(pud, pgd))
-   pgd_changed = true;
}
 
-   if (pgd_changed)
-   sync_global_pgds(start, end - 1, 1);
-
flush_tlb_all();
 }
 
-- 
2.5.5
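
The "single 4K page per 512 GB" figure in the commit message follows from the page-table geometry. The worked arithmetic below assumes the standard x86-64 4-level layout (4 KiB pages, 512 eight-byte entries per table); the constant and function names are illustrative, not kernel macros.

```c
#include <stdint.h>

/* Standard x86-64 4-level paging parameters (assumed here). */
#define PAGE_SHIFT     12   /* 4 KiB pages */
#define BITS_PER_LEVEL 9    /* 512 entries per table */
#define ENTRY_SIZE     8    /* bytes per page-table entry */

/* Bytes of address space covered by one PGD entry:
 * page size times 512 (PTE) times 512 (PMD) times 512 (PUD). */
uint64_t pgd_entry_span(void)
{
	return (uint64_t)1 << (PAGE_SHIFT + 3 * BITS_PER_LEVEL);
}

/* Size in bytes of the single PUD table that a PGD entry points to. */
uint64_t pud_table_size(void)
{
	return ((uint64_t)1 << BITS_PER_LEVEL) * ENTRY_SIZE;
}
```

With these values one PGD entry spans 2^39 bytes (512 GiB) and its PUD table occupies 512 * 8 = 4096 bytes, so keeping an empty PUD table around costs one 4 KiB page per 512 GiB of hot-removed address range.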



[PATCH v4 05/16] x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()

2016-06-23 Thread Andy Lutomirski
kernel_unmap_pages_in_pgd() is dangerous: if a pgd entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.

Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions.  This leaves a couple of
other helpers unused, so delete them, too.

Cc: Matt Fleming 
Cc: linux-...@vger.kernel.org
Reviewed-by: Matt Fleming 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/efi.h   |  1 -
 arch/x86/include/asm/pgtable_types.h |  2 --
 arch/x86/mm/pageattr.c   | 28 
 arch/x86/platform/efi/efi.c  |  2 --
 arch/x86/platform/efi/efi_32.c   |  3 ---
 arch/x86/platform/efi/efi_64.c   |  5 -
 6 files changed, 41 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 78d1e7467eae..45ea38df86d4 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -125,7 +125,6 @@ extern void __init efi_map_region_fixed(efi_memory_desc_t 
*md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
-extern void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
 extern void __init old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 7b5efe264eff..0b9f58ad10c8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -475,8 +475,6 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
   unsigned numpages, unsigned long page_flags);
-void kernel_unmap_pages_in_pgd(pgd_t *root, unsigned long address,
-  unsigned numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a8026918bf6..762162af3662 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -746,18 +746,6 @@ static bool try_to_free_pmd_page(pmd_t *pmd)
return true;
 }
 
-static bool try_to_free_pud_page(pud_t *pud)
-{
-   int i;
-
-   for (i = 0; i < PTRS_PER_PUD; i++)
-   if (!pud_none(pud[i]))
-   return false;
-
-   free_page((unsigned long)pud);
-   return true;
-}
-
 static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
 {
pte_t *pte = pte_offset_kernel(pmd, start);
@@ -871,16 +859,6 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long 
start, unsigned long end)
 */
 }
 
-static void unmap_pgd_range(pgd_t *root, unsigned long addr, unsigned long end)
-{
-   pgd_t *pgd_entry = root + pgd_index(addr);
-
-   unmap_pud_range(pgd_entry, addr, end);
-
-   if (try_to_free_pud_page((pud_t *)pgd_page_vaddr(*pgd_entry)))
-   pgd_clear(pgd_entry);
-}
-
 static int alloc_pte_page(pmd_t *pmd)
 {
pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
@@ -1993,12 +1971,6 @@ out:
return retval;
 }
 
-void kernel_unmap_pages_in_pgd(pgd_t *root, unsigned long address,
-  unsigned numpages)
-{
-   unmap_pgd_range(root, address, address + (numpages << PAGE_SHIFT));
-}
-
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f93545e7dc54..62986e5fbdba 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -978,8 +978,6 @@ static void __init __efi_enter_virtual_mode(void)
 * EFI mixed mode we need all of memory to be accessible when
 * we pass parameters to the EFI runtime services in the
 * thunking code.
-*
-* efi_cleanup_page_tables(__pa(new_memmap), 1 << pg_shift);
 */
free_pages((unsigned long)new_memmap, pg_shift);
 
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 338402b91d2e..cef39b097649 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -49,9 +49,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 {
return 0;
 }
-void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned 
num_pages)
-{
-}
 
 void __init efi_map_region(efi_memory_desc_t *md)
 {
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6e7242be1c87..5ab219c2ba43 

[PATCH v4 02/16] rxrpc: Avoid using stack memory in SG lists in rxkad

2016-06-23 Thread Andy Lutomirski
From: Herbert Xu 

rxkad uses stack memory in SG lists which would not work if stacks
were allocated from vmalloc memory.  In fact, in most cases this
isn't even necessary as the stack memory ends up getting copied
over to kmalloc memory.

This patch eliminates all the unnecessary stack memory uses by
supplying the final destination directly to the crypto API.  In
two instances where a temporary buffer is actually needed we also
switch to using the skb->cb area instead of the stack.

Finally there is no need to split a split-page buffer into two SG
entries so code dealing with that has been removed.

Message-Id: <20160623064137.ga8...@gondor.apana.org.au>
Signed-off-by: Herbert Xu 
Signed-off-by: Andy Lutomirski 
---
 net/rxrpc/ar-internal.h |   1 +
 net/rxrpc/rxkad.c   | 103 
 2 files changed, 44 insertions(+), 60 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f0b807a163fa..8ee5933982f3 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -277,6 +277,7 @@ struct rxrpc_connection {
struct key  *key;   /* security for this connection 
(client) */
struct key  *server_key;/* security for this service */
struct crypto_skcipher  *cipher;/* encryption handle */
+   struct rxrpc_crypt  csum_iv_head;   /* leading block for csum_iv */
struct rxrpc_crypt  csum_iv;/* packet checksum base */
unsigned long   events;
 #define RXRPC_CONN_CHALLENGE   0   /* send challenge packet */
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index bab56ed649ba..a28a3c6fdf1d 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -105,11 +105,9 @@ static void rxkad_prime_packet_security(struct 
rxrpc_connection *conn)
 {
struct rxrpc_key_token *token;
SKCIPHER_REQUEST_ON_STACK(req, conn->cipher);
-   struct scatterlist sg[2];
+   struct rxrpc_crypt *csum_iv;
+   struct scatterlist sg;
struct rxrpc_crypt iv;
-   struct {
-   __be32 x[4];
-   } tmpbuf __attribute__((aligned(16))); /* must all be in same page */
 
_enter("");
 
@@ -119,24 +117,21 @@ static void rxkad_prime_packet_security(struct 
rxrpc_connection *conn)
token = conn->key->payload.data[0];
memcpy(&iv, token->kad->session_key, sizeof(iv));
 
-   tmpbuf.x[0] = htonl(conn->epoch);
-   tmpbuf.x[1] = htonl(conn->cid);
-   tmpbuf.x[2] = 0;
-   tmpbuf.x[3] = htonl(conn->security_ix);
+   csum_iv = &conn->csum_iv_head;
+   csum_iv[0].x[0] = htonl(conn->epoch);
+   csum_iv[0].x[1] = htonl(conn->cid);
+   csum_iv[1].x[0] = 0;
+   csum_iv[1].x[1] = htonl(conn->security_ix);
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, csum_iv, 16);
 
skcipher_request_set_tfm(req, conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 16, iv.x);
 
crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
 
-   memcpy(&conn->csum_iv, &tmpbuf.x[2], sizeof(conn->csum_iv));
-   ASSERTCMP((u32 __force)conn->csum_iv.n[0], ==, (u32 
__force)tmpbuf.x[2]);
-
_leave("");
 }
 
@@ -150,12 +145,9 @@ static int rxkad_secure_packet_auth(const struct 
rxrpc_call *call,
 {
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
+   struct rxkad_level1_hdr hdr;
struct rxrpc_crypt iv;
-   struct scatterlist sg[2];
-   struct {
-   struct rxkad_level1_hdr hdr;
-   __be32  first;  /* first four bytes of data and padding */
-   } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+   struct scatterlist sg;
u16 check;
 
sp = rxrpc_skb(skb);
@@ -165,24 +157,21 @@ static int rxkad_secure_packet_auth(const struct 
rxrpc_call *call,
check = sp->hdr.seq ^ sp->hdr.callNumber;
data_size |= (u32)check << 16;
 
-   tmpbuf.hdr.data_size = htonl(data_size);
-   memcpy(&tmpbuf.first, sechdr + 4, sizeof(tmpbuf.first));
+   hdr.data_size = htonl(data_size);
+   memcpy(sechdr, &hdr, sizeof(hdr));
 
/* start the encryption afresh */
memset(&iv, 0, sizeof(iv));
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, sechdr, 8);
 
skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
 
crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
 
-   memcpy(sechdr, &tmpbuf, sizeof(tmpbuf));
-
_leave(" = 

[PATCH v4 05/16] x86/mm: Remove kernel_unmap_pages_in_pgd() and efi_cleanup_page_tables()

2016-06-23 Thread Andy Lutomirski
kernel_unmap_pages_in_pgd() is dangerous: if a pgd entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.

Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions.  This leaves a couple of
other helpers unused, so delete them, too.

Cc: Matt Fleming 
Cc: linux-...@vger.kernel.org
Reviewed-by: Matt Fleming 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/efi.h   |  1 -
 arch/x86/include/asm/pgtable_types.h |  2 --
 arch/x86/mm/pageattr.c   | 28 
 arch/x86/platform/efi/efi.c  |  2 --
 arch/x86/platform/efi/efi_32.c   |  3 ---
 arch/x86/platform/efi/efi_64.c   |  5 -
 6 files changed, 41 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 78d1e7467eae..45ea38df86d4 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -125,7 +125,6 @@ extern void __init efi_map_region_fixed(efi_memory_desc_t 
*md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
-extern void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
 extern void __init old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 7b5efe264eff..0b9f58ad10c8 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -475,8 +475,6 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
   unsigned numpages, unsigned long page_flags);
-void kernel_unmap_pages_in_pgd(pgd_t *root, unsigned long address,
-  unsigned numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a8026918bf6..762162af3662 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -746,18 +746,6 @@ static bool try_to_free_pmd_page(pmd_t *pmd)
return true;
 }
 
-static bool try_to_free_pud_page(pud_t *pud)
-{
-   int i;
-
-   for (i = 0; i < PTRS_PER_PUD; i++)
-   if (!pud_none(pud[i]))
-   return false;
-
-   free_page((unsigned long)pud);
-   return true;
-}
-
 static bool unmap_pte_range(pmd_t *pmd, unsigned long start, unsigned long end)
 {
pte_t *pte = pte_offset_kernel(pmd, start);
@@ -871,16 +859,6 @@ static void unmap_pud_range(pgd_t *pgd, unsigned long start, unsigned long end)
 */
 }
 
-static void unmap_pgd_range(pgd_t *root, unsigned long addr, unsigned long end)
-{
-   pgd_t *pgd_entry = root + pgd_index(addr);
-
-   unmap_pud_range(pgd_entry, addr, end);
-
-   if (try_to_free_pud_page((pud_t *)pgd_page_vaddr(*pgd_entry)))
-   pgd_clear(pgd_entry);
-}
-
 static int alloc_pte_page(pmd_t *pmd)
 {
pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL | __GFP_NOTRACK);
@@ -1993,12 +1971,6 @@ out:
return retval;
 }
 
-void kernel_unmap_pages_in_pgd(pgd_t *root, unsigned long address,
-  unsigned numpages)
-{
-   unmap_pgd_range(root, address, address + (numpages << PAGE_SHIFT));
-}
-
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index f93545e7dc54..62986e5fbdba 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -978,8 +978,6 @@ static void __init __efi_enter_virtual_mode(void)
 * EFI mixed mode we need all of memory to be accessible when
 * we pass parameters to the EFI runtime services in the
 * thunking code.
-*
-* efi_cleanup_page_tables(__pa(new_memmap), 1 << pg_shift);
 */
free_pages((unsigned long)new_memmap, pg_shift);
 
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 338402b91d2e..cef39b097649 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -49,9 +49,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
return 0;
 }
-void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages)
-{
-}
 
 void __init efi_map_region(efi_memory_desc_t *md)
 {
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6e7242be1c87..5ab219c2ba43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ 

[PATCH v4 02/16] rxrpc: Avoid using stack memory in SG lists in rxkad

2016-06-23 Thread Andy Lutomirski
From: Herbert Xu 

rxkad uses stack memory in SG lists which would not work if stacks
were allocated from vmalloc memory.  In fact, in most cases this
isn't even necessary as the stack memory ends up getting copied
over to kmalloc memory.

This patch eliminates all the unnecessary stack memory uses by
supplying the final destination directly to the crypto API.  In
two instances where a temporary buffer is actually needed we also
switch to using the skb->cb area instead of the stack.

Finally there is no need to split a split-page buffer into two SG
entries so code dealing with that has been removed.

Message-Id: <20160623064137.ga8...@gondor.apana.org.au>
Signed-off-by: Herbert Xu 
Signed-off-by: Andy Lutomirski 
---
 net/rxrpc/ar-internal.h |   1 +
 net/rxrpc/rxkad.c   | 103 
 2 files changed, 44 insertions(+), 60 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index f0b807a163fa..8ee5933982f3 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -277,6 +277,7 @@ struct rxrpc_connection {
struct key  *key;   /* security for this connection (client) */
struct key  *server_key;/* security for this service */
struct crypto_skcipher  *cipher;/* encryption handle */
+   struct rxrpc_crypt  csum_iv_head;   /* leading block for csum_iv */
struct rxrpc_crypt  csum_iv;/* packet checksum base */
unsigned long   events;
 #define RXRPC_CONN_CHALLENGE   0   /* send challenge packet */
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index bab56ed649ba..a28a3c6fdf1d 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -105,11 +105,9 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn)
 {
struct rxrpc_key_token *token;
SKCIPHER_REQUEST_ON_STACK(req, conn->cipher);
-   struct scatterlist sg[2];
+   struct rxrpc_crypt *csum_iv;
+   struct scatterlist sg;
struct rxrpc_crypt iv;
-   struct {
-   __be32 x[4];
-   } tmpbuf __attribute__((aligned(16))); /* must all be in same page */
 
_enter("");
 
@@ -119,24 +117,21 @@ static void rxkad_prime_packet_security(struct rxrpc_connection *conn)
token = conn->key->payload.data[0];
memcpy(&iv, token->kad->session_key, sizeof(iv));
 
-   tmpbuf.x[0] = htonl(conn->epoch);
-   tmpbuf.x[1] = htonl(conn->cid);
-   tmpbuf.x[2] = 0;
-   tmpbuf.x[3] = htonl(conn->security_ix);
+   csum_iv = &conn->csum_iv_head;
+   csum_iv[0].x[0] = htonl(conn->epoch);
+   csum_iv[0].x[1] = htonl(conn->cid);
+   csum_iv[1].x[0] = 0;
+   csum_iv[1].x[1] = htonl(conn->security_ix);
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, csum_iv, 16);
 
skcipher_request_set_tfm(req, conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 16, iv.x);
 
crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
 
-   memcpy(&conn->csum_iv, &tmpbuf.x[2], sizeof(conn->csum_iv));
-   ASSERTCMP((u32 __force)conn->csum_iv.n[0], ==, (u32 __force)tmpbuf.x[2]);
-
_leave("");
 }
 
@@ -150,12 +145,9 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call,
 {
struct rxrpc_skb_priv *sp;
SKCIPHER_REQUEST_ON_STACK(req, call->conn->cipher);
+   struct rxkad_level1_hdr hdr;
struct rxrpc_crypt iv;
-   struct scatterlist sg[2];
-   struct {
-   struct rxkad_level1_hdr hdr;
-   __be32  first;  /* first four bytes of data and padding */
-   } tmpbuf __attribute__((aligned(8))); /* must all be in same page */
+   struct scatterlist sg;
u16 check;
 
sp = rxrpc_skb(skb);
@@ -165,24 +157,21 @@ static int rxkad_secure_packet_auth(const struct rxrpc_call *call,
check = sp->hdr.seq ^ sp->hdr.callNumber;
data_size |= (u32)check << 16;
 
-   tmpbuf.hdr.data_size = htonl(data_size);
-   memcpy(&tmpbuf.first, sechdr + 4, sizeof(tmpbuf.first));
+   hdr.data_size = htonl(data_size);
+   memcpy(sechdr, &hdr, sizeof(hdr));
 
/* start the encryption afresh */
memset(&iv, 0, sizeof(iv));
 
-   sg_init_one(&sg[0], &tmpbuf, sizeof(tmpbuf));
-   sg_init_one(&sg[1], &tmpbuf, sizeof(tmpbuf));
+   sg_init_one(&sg, sechdr, 8);
 
skcipher_request_set_tfm(req, call->conn->cipher);
skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg[1], &sg[0], sizeof(tmpbuf), iv.x);
+   skcipher_request_set_crypt(req, &sg, &sg, 8, iv.x);
 
crypto_skcipher_encrypt(req);
skcipher_request_zero(req);
 
-   memcpy(sechdr, &tmpbuf, sizeof(tmpbuf));
-
_leave(" = 0");
return 0;
 }
@@ -196,8 +185,7 @@ static int 

[PATCH v4 00/16] Virtually mapped stacks with guard pages (x86, core)

2016-06-23 Thread Andy Lutomirski
Since the dawn of time, a kernel stack overflow has been a real PITA
to debug, has caused nondeterministic crashes some time after the
actual overflow, and has generally been easy to exploit for root.

With this series, arches can enable HAVE_ARCH_VMAP_STACK.  Arches
that enable it (just x86 for now) get virtually mapped stacks with
guard pages.  This causes reliable faults when the stack overflows.
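
For anyone who wants to see the guard-page idea in isolation, here is a
minimal user-space sketch.  It is not the kernel implementation: it uses
mmap() and mprotect() as stand-ins for vmalloc, and the helper names
(alloc_guarded_stack, write_faults, guard_page_demo) are made up for
illustration.  It maps a region with one inaccessible page below it and
shows that a write just past the low end of the "stack" traps instead of
silently corrupting a neighbour.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Signal handler: jump back into write_faults() and report the trap. */
static void on_fault(int sig)
{
	(void)sig;
	siglongjmp(fault_jmp, 1);
}

/*
 * Map `size` usable bytes with one PROT_NONE guard page directly below.
 * Returns the lowest usable address, or NULL on failure.  The mapping is
 * never freed in this sketch.
 */
static unsigned char *alloc_guarded_stack(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *base = mmap(NULL, page + size, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base, page, PROT_NONE) != 0) {
		munmap(base, page + size);
		return NULL;
	}
	return base + page;
}

/* Return 1 if a one-byte write to `addr` faults, 0 if it succeeds. */
static int write_faults(volatile unsigned char *addr)
{
	struct sigaction sa, old_segv, old_bus;
	int faulted;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fault;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(fault_jmp, 1) == 0) {
		*addr = 0x5a;	/* traps if addr lies in the guard page */
		faulted = 0;
	} else {
		faulted = 1;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return faulted;
}

/* Usage: the lowest stack byte is writable, the byte below it traps. */
static void guard_page_demo(void)
{
	unsigned char *stack = alloc_guarded_stack(16384);

	assert(stack != NULL);
	assert(write_faults(stack) == 0);
	assert(write_faults(stack - 1) == 1);
}
```

Calling guard_page_demo() from a test program exercises both cases.  The
guard sits below the usable region because the stack grows down on x86.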

If the arch implements it well, we get a nice OOPS on stack overflow
(as opposed to panicking directly or otherwise exploding badly).  On
x86, the OOPS is nice, has a usable call trace, and the overflowing
task is killed cleanly.

On my laptop, this adds about 1.5µs of overhead to task creation,
which seems to be mainly caused by vmalloc inefficiently allocating
individual pages even when a higher-order page is available on the
freelist.

This does not address interrupt stacks.  It also does not address
the possibility of privilege escalation by a controlled stack
overflow that overwrites thread_info without hitting the guard page.
I'll send patches to address the latter issue once this series
lands.

It's worth noting that s390 has an arch-specific gcc feature that
detects stack overflows by adjusting function prologues.  Arches
with features like that may wish to avoid using vmapped stacks to
minimize the performance hit.

Ingo, would it make sense to throw it into a separate branch in
-tip?  I wouldn't mind seeing some -next testing to give people a
chance to shake out problems.  I'm particularly interested in
whether there are any drivers that expect virt_to_phys to work on
stack addresses.  (I know that virtio-net used to, but I fixed that
a while back.)

Once this lands in -tip, I'm planning on attacking thread_info.
Once thread_info is under control, we can start caching a couple of
stacks per cpu, and that should get us most of the performance back.

Changes from v3:
 - Fix rxrpc and bluetooth, which used scatterlists pointed at the stack
 - Add some acks and cc's

Changes from v2:
 - Delete kernel_unmap_pages_in_pgd rather than hardening it (Borislav)
 - Fix sub-page stack accounting better (Josh)

Changes from v1:
 - Fix rewind_stack_and_do_exit (Josh)
 - Fix deadlock under load
 - Clean up generic stack vmalloc code
 - Many other minor fixes
 
Andy Lutomirski (14):
  bluetooth: Switch SMP to crypto_cipher_encrypt_one()
  x86/cpa: In populate_pgd, don't set the pgd entry until it's populated
  x86/mm: Remove kernel_unmap_pages_in_pgd() and
efi_cleanup_page_tables()
  mm: Track NR_KERNEL_STACK in KiB instead of number of stacks
  mm: Fix memcg stack accounting for sub-page stacks
  dma-api: Teach the "DMA-from-stack" check about vmapped stacks
  fork: Add generic vmalloced stack support
  x86/die: Don't try to recover from an OOPS on a non-default stack
  x86/dumpstack: When OOPSing, rewind the stack before do_exit
  x86/dumpstack: When dumping stack bytes due to OOPS, start with
regs->sp
  x86/dumpstack: Try harder to get a call trace on stack overflow
  x86/dumpstack/64: Handle faults when printing the "Stack:" part of an
OOPS
  x86/mm/64: Enable vmapped stacks
  x86/mm: Improve stack-overflow #PF handling

Herbert Xu (1):
  rxrpc: Avoid using stack memory in SG lists in rxkad

Ingo Molnar (1):
  x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()

 arch/Kconfig |  29 ++
 arch/ia64/include/asm/thread_info.h  |   2 +-
 arch/x86/Kconfig |   1 +
 arch/x86/entry/entry_32.S|  11 
 arch/x86/entry/entry_64.S|  11 
 arch/x86/include/asm/efi.h   |   1 -
 arch/x86/include/asm/pgtable_types.h |   2 -
 arch/x86/include/asm/switch_to.h |  28 +-
 arch/x86/include/asm/traps.h |   6 ++
 arch/x86/kernel/dumpstack.c  |  19 ++-
 arch/x86/kernel/dumpstack_32.c   |   4 +-
 arch/x86/kernel/dumpstack_64.c   |  16 +-
 arch/x86/kernel/traps.c  |  32 +++
 arch/x86/mm/fault.c  |  39 +
 arch/x86/mm/init_64.c|  27 -
 arch/x86/mm/pageattr.c   |  32 +--
 arch/x86/mm/tlb.c|  15 +
 arch/x86/platform/efi/efi.c  |   2 -
 arch/x86/platform/efi/efi_32.c   |   3 -
 arch/x86/platform/efi/efi_64.c   |   5 --
 drivers/base/node.c  |   3 +-
 fs/proc/meminfo.c|   2 +-
 include/linux/memcontrol.h   |   2 +-
 include/linux/mmzone.h   |   2 +-
 include/linux/sched.h|  15 +
 kernel/fork.c|  86 ++---
 lib/dma-debug.c  |  39 +++--
 mm/memcontrol.c  |   2 +-
 mm/page_alloc.c  |   3 +-
 net/bluetooth/smp.c  |  67 ++-
 net/rxrpc/ar-internal.h  |   1 +
 net/rxrpc/rxkad.c| 103 

[PATCH v4 07/16] mm: Fix memcg stack accounting for sub-page stacks

2016-06-23 Thread Andy Lutomirski
We should account for stacks regardless of stack size, and we need
to account in sub-page units if THREAD_SIZE < PAGE_SIZE.  Change the
units to kilobytes and move the accounting into account_kernel_stack().

Fixes: 12580e4b54ba8 ("mm: memcontrol: report kernel stack usage in cgroup2 memory.stat")
Cc: Vladimir Davydov 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: linux...@kvack.org
Reviewed-by: Vladimir Davydov 
Acked-by: Michal Hocko 
Signed-off-by: Andy Lutomirski 
---
 include/linux/memcontrol.h |  2 +-
 kernel/fork.c  | 15 ++-
 mm/memcontrol.c|  2 +-
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a805474df4ab..3b653b86bb8f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -52,7 +52,7 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_SWAP,   /* # of pages, swapped out */
MEM_CGROUP_STAT_NSTATS,
/* default hierarchy stats */
-   MEMCG_KERNEL_STACK = MEM_CGROUP_STAT_NSTATS,
+   MEMCG_KERNEL_STACK_KB = MEM_CGROUP_STAT_NSTATS,
MEMCG_SLAB_RECLAIMABLE,
MEMCG_SLAB_UNRECLAIMABLE,
MEMCG_SOCK,
diff --git a/kernel/fork.c b/kernel/fork.c
index be7f006af727..ff3c41c2ba96 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -165,20 +165,12 @@ static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP,
  THREAD_SIZE_ORDER);
 
-   if (page)
-   memcg_kmem_update_page_stat(page, MEMCG_KERNEL_STACK,
-   1 << THREAD_SIZE_ORDER);
-
return page ? page_address(page) : NULL;
 }
 
 static inline void free_thread_info(struct thread_info *ti)
 {
-   struct page *page = virt_to_page(ti);
-
-   memcg_kmem_update_page_stat(page, MEMCG_KERNEL_STACK,
-   -(1 << THREAD_SIZE_ORDER));
-   __free_kmem_pages(page, THREAD_SIZE_ORDER);
+   free_kmem_pages((unsigned long)ti, THREAD_SIZE_ORDER);
 }
 # else
 static struct kmem_cache *thread_info_cache;
@@ -227,6 +219,11 @@ static void account_kernel_stack(struct thread_info *ti, int account)
 
mod_zone_page_state(zone, NR_KERNEL_STACK_KB,
THREAD_SIZE / 1024 * account);
+
+   /* All stack pages belong to the same memcg. */
+   memcg_kmem_update_page_stat(
+   virt_to_page(ti), MEMCG_KERNEL_STACK_KB,
+   account * (THREAD_SIZE / 1024));
 }
 
 void free_task(struct task_struct *tsk)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 75e74408cc8f..8e13a2419dad 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5133,7 +5133,7 @@ static int memory_stat_show(struct seq_file *m, void *v)
seq_printf(m, "file %llu\n",
   (u64)stat[MEM_CGROUP_STAT_CACHE] * PAGE_SIZE);
seq_printf(m, "kernel_stack %llu\n",
-  (u64)stat[MEMCG_KERNEL_STACK] * PAGE_SIZE);
+  (u64)stat[MEMCG_KERNEL_STACK_KB] * 1024);
seq_printf(m, "slab %llu\n",
   (u64)(stat[MEMCG_SLAB_RECLAIMABLE] +
 stat[MEMCG_SLAB_UNRECLAIMABLE]) * PAGE_SIZE);
-- 
2.5.5
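
To illustrate why the unit change matters, here is a small user-space
sketch of the arithmetic.  It is only a model: struct stack_stat and the
helper names are hypothetical stand-ins for the memcg counter, and the
64 KiB page size is an assumed example of a configuration where
THREAD_SIZE < PAGE_SIZE.  Counting in whole pages rounds a 16 KiB stack
down to zero, while counting in KiB keeps it.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-memcg counter, kept in KiB. */
struct stack_stat {
	long kernel_stack_kb;
};

/* Mirror of the patch's arithmetic: account is +1 on alloc, -1 on free. */
static void account_stack(struct stack_stat *st, unsigned long thread_size,
			  int account)
{
	st->kernel_stack_kb += account * (long)(thread_size / 1024);
}

/* What memory.stat would print: the KiB counter scaled to bytes. */
static unsigned long long stack_bytes(const struct stack_stat *st)
{
	return (unsigned long long)st->kernel_stack_kb * 1024;
}

/* Page-unit accounting for comparison: sub-page stacks round to zero. */
static unsigned long whole_pages(unsigned long thread_size,
				 unsigned long page_size)
{
	return thread_size / page_size;
}

/* Usage: two 16 KiB stacks on an assumed 64 KiB-page system. */
static void stack_accounting_demo(void)
{
	struct stack_stat st = { 0 };

	account_stack(&st, 16384, 1);
	account_stack(&st, 16384, 1);
	assert(st.kernel_stack_kb == 32);
	assert(stack_bytes(&st) == 32768);

	account_stack(&st, 16384, -1);
	assert(st.kernel_stack_kb == 16);

	/* In page units the same stack would not be counted at all. */
	assert(whole_pages(16384, 65536) == 0);
}
```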



[PATCH v4 01/16] bluetooth: Switch SMP to crypto_cipher_encrypt_one()

2016-06-23 Thread Andy Lutomirski
SMP does ECB crypto on stack buffers.  This is complicated and
fragile, and it will not work if the stack is virtually allocated.

Switch to the crypto_cipher interface, which is simpler and safer.

Cc: Marcel Holtmann 
Cc: Gustavo Padovan 
Cc: Johan Hedberg 
Cc: "David S. Miller" 
Cc: linux-blueto...@vger.kernel.org
Cc: Herbert Xu 
Cc: net...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 net/bluetooth/smp.c | 67 ++---
 1 file changed, 28 insertions(+), 39 deletions(-)

diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
index 50976a6481f3..4c1a16a96ae5 100644
--- a/net/bluetooth/smp.c
+++ b/net/bluetooth/smp.c
@@ -22,9 +22,9 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -88,7 +88,7 @@ struct smp_dev {
u8  min_key_size;
u8  max_key_size;
 
-   struct crypto_skcipher  *tfm_aes;
+   struct crypto_cipher*tfm_aes;
struct crypto_shash *tfm_cmac;
 };
 
@@ -127,7 +127,7 @@ struct smp_chan {
u8  dhkey[32];
u8  mackey[16];
 
-   struct crypto_skcipher  *tfm_aes;
+   struct crypto_cipher*tfm_aes;
struct crypto_shash *tfm_cmac;
 };
 
@@ -361,10 +361,8 @@ static int smp_h6(struct crypto_shash *tfm_cmac, const u8 w[16],
  * s1 and ah.
  */
 
-static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
+static int smp_e(struct crypto_cipher *tfm, const u8 *k, u8 *r)
 {
-   SKCIPHER_REQUEST_ON_STACK(req, tfm);
-   struct scatterlist sg;
uint8_t tmp[16], data[16];
int err;
 
@@ -378,7 +376,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
/* The most significant octet of key corresponds to k[0] */
swap_buf(k, tmp, 16);
 
-   err = crypto_skcipher_setkey(tfm, tmp, 16);
+   err = crypto_cipher_setkey(tfm, tmp, 16);
if (err) {
BT_ERR("cipher setkey failed: %d", err);
return err;
@@ -387,16 +385,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
/* Most significant octet of plaintextData corresponds to data[0] */
swap_buf(r, data, 16);
 
-   sg_init_one(&sg, data, 16);
-
-   skcipher_request_set_tfm(req, tfm);
-   skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg, &sg, 16, NULL);
-
-   err = crypto_skcipher_encrypt(req);
-   skcipher_request_zero(req);
-   if (err)
-   BT_ERR("Encrypt data error %d", err);
+   crypto_cipher_encrypt_one(tfm, data, data);
 
/* Most significant octet of encryptedData corresponds to data[0] */
swap_buf(data, r, 16);
@@ -406,7 +395,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
return err;
 }
 
-static int smp_c1(struct crypto_skcipher *tfm_aes, const u8 k[16],
+static int smp_c1(struct crypto_cipher *tfm_aes, const u8 k[16],
  const u8 r[16], const u8 preq[7], const u8 pres[7], u8 _iat,
  const bdaddr_t *ia, u8 _rat, const bdaddr_t *ra, u8 res[16])
 {
@@ -455,7 +444,7 @@ static int smp_c1(struct crypto_skcipher *tfm_aes, const u8 k[16],
return err;
 }
 
-static int smp_s1(struct crypto_skcipher *tfm_aes, const u8 k[16],
+static int smp_s1(struct crypto_cipher *tfm_aes, const u8 k[16],
  const u8 r1[16], const u8 r2[16], u8 _r[16])
 {
int err;
@@ -471,7 +460,7 @@ static int smp_s1(struct crypto_skcipher *tfm_aes, const u8 k[16],
return err;
 }
 
-static int smp_ah(struct crypto_skcipher *tfm, const u8 irk[16],
+static int smp_ah(struct crypto_cipher *tfm, const u8 irk[16],
  const u8 r[3], u8 res[3])
 {
u8 _res[16];
@@ -759,7 +748,7 @@ static void smp_chan_destroy(struct l2cap_conn *conn)
kzfree(smp->slave_csrk);
kzfree(smp->link_key);
 
-   crypto_free_skcipher(smp->tfm_aes);
+   crypto_free_cipher(smp->tfm_aes);
crypto_free_shash(smp->tfm_cmac);
 
/* Ensure that we don't leave any debug key around if debug key
@@ -1359,9 +1348,9 @@ static struct smp_chan *smp_chan_create(struct l2cap_conn *conn)
if (!smp)
return NULL;
 
-   smp->tfm_aes = crypto_alloc_skcipher("ecb(aes)", 0, CRYPTO_ALG_ASYNC);
+   smp->tfm_aes = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(smp->tfm_aes)) {
-   BT_ERR("Unable to create ECB crypto context");
+   BT_ERR("Unable to create AES crypto context");
kzfree(smp);
return NULL;
}
@@ -1369,7 +1358,7 @@ static struct smp_chan *smp_chan_create(struct l2cap_conn *conn)
smp->tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
if (IS_ERR(smp->tfm_cmac)) {
BT_ERR("Unable to create CMAC crypto context");
-   crypto_free_skcipher(smp->tfm_aes);
+

[PATCH v4 10/16] x86/die: Don't try to recover from an OOPS on a non-default stack

2016-06-23 Thread Andy Lutomirski
It's not going to work, because the scheduler will explode if we try
to schedule when running on an IST stack or similar.

This will matter when we let kernel stack overflows (which are #DF)
call die().

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d6209f3a69cb..70d5aae8b8f7 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -245,6 +245,9 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
return;
if (in_interrupt())
panic("Fatal exception in interrupt");
+   if (((current_stack_pointer() ^ (current_top_of_stack() - 1))
+& ~(THREAD_SIZE - 1)) != 0)
+   panic("Fatal exception on special stack");
if (panic_on_oops)
panic("Fatal exception");
do_exit(signr);
-- 
2.5.5
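
The test added to oops_end() above can be tried out in user space.  The
sketch below is illustrative only: DEMO_THREAD_SIZE is an assumed 16 KiB
value, the addresses are invented, and on_other_stack() is a hypothetical
name.  It relies on stacks being aligned to their size, so two addresses
in the same stack agree in every bit above log2(THREAD_SIZE); XORing them
and masking off the low bits leaves zero exactly when they share a stack.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 16 KiB stacks aligned to their size. */
#define DEMO_THREAD_SIZE ((uintptr_t)16384)

/*
 * Nonzero if sp is outside the aligned stack whose last byte is
 * top_of_stack - 1.  Same expression as the patch, with plain integers.
 */
static int on_other_stack(uintptr_t sp, uintptr_t top_of_stack)
{
	return ((sp ^ (top_of_stack - 1)) & ~(DEMO_THREAD_SIZE - 1)) != 0;
}

/* Usage: a made-up stack occupying 0x70004000..0x70007fff. */
static void stack_check_demo(void)
{
	uintptr_t top = 0x70008000;

	assert(!on_other_stack(0x70007ff0, top));	/* near the top */
	assert(!on_other_stack(0x70004000, top));	/* lowest byte */
	assert(on_other_stack(0x70003ff8, top));	/* just below the stack */
	assert(on_other_stack(0x7f000010, top));	/* unrelated stack */
}
```

Comparing against top - 1 rather than top matters: top itself is the
first byte of the next aligned block, so using it directly would make
every in-stack pointer look foreign.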



[PATCH v4 09/16] fork: Add generic vmalloced stack support

2016-06-23 Thread Andy Lutomirski
If CONFIG_VMAP_STACK is selected, kernel stacks are allocated with
vmalloc_node.

Signed-off-by: Andy Lutomirski 
---
 arch/Kconfig| 29 +
 arch/ia64/include/asm/thread_info.h |  2 +-
 include/linux/sched.h   | 15 +++
 kernel/fork.c   | 82 +
 4 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index e9734796531f..835eeef0f14d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -661,4 +661,33 @@ config ARCH_NO_COHERENT_DMA_MMAP
 config CPU_NO_EFFICIENT_FFS
def_bool n
 
+config HAVE_ARCH_VMAP_STACK
+   def_bool n
+   help
+ An arch should select this symbol if it can support kernel stacks
+ in vmalloc space.  This means:
+
+ - vmalloc space must be large enough to hold many kernel stacks.
+   This may rule out many 32-bit architectures.
+
+ - Stacks in vmalloc space need to work reliably.  For example, if
+   vmap page tables are created on demand, either this mechanism
+   needs to work while the stack points to a virtual address with
+   unpopulated page tables or arch code (switch_to and switch_mm,
+   most likely) needs to ensure that the stack's page table entries
+   are populated before running on a possibly unpopulated stack.
+
+ - If the stack overflows into a guard page, something reasonable
+   should happen.  The definition of "reasonable" is flexible, but
+   instantly rebooting without logging anything would be unfriendly.
+
+config VMAP_STACK
+   bool "Use a virtually-mapped stack"
+   depends on HAVE_ARCH_VMAP_STACK
+   ---help---
+ Enable this if you want to use virtually-mapped kernel stacks
+ with guard pages.  This causes kernel stack overflows to be
+ caught immediately rather than causing difficult-to-diagnose
+ corruption.
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/ia64/include/asm/thread_info.h 
b/arch/ia64/include/asm/thread_info.h
index aa995b67c3f5..d13edda6e09c 100644
--- a/arch/ia64/include/asm/thread_info.h
+++ b/arch/ia64/include/asm/thread_info.h
@@ -56,7 +56,7 @@ struct thread_info {
 #define alloc_thread_info_node(tsk, node)  ((struct thread_info *) 0)
 #define task_thread_info(tsk)  ((struct thread_info *) 0)
 #endif
-#define free_thread_info(ti)   /* nothing */
+#define free_thread_info(tsk)  /* nothing */
 #define task_stack_page(tsk)   ((void *)(tsk))
 
 #define __HAVE_THREAD_FUNCTIONS
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e42ada26345..a37c3b790309 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1918,6 +1918,9 @@ struct task_struct {
 #ifdef CONFIG_MMU
struct task_struct *oom_reaper_list;
 #endif
+#ifdef CONFIG_VMAP_STACK
+   struct vm_struct *stack_vm_area;
+#endif
 /* CPU-specific state of this task */
struct thread_struct thread;
 /*
@@ -1934,6 +1937,18 @@ extern int arch_task_struct_size __read_mostly;
 # define arch_task_struct_size (sizeof(struct task_struct))
 #endif
 
+#ifdef CONFIG_VMAP_STACK
+static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
+{
+   return t->stack_vm_area;
+}
+#else
+static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
+{
+   return NULL;
+}
+#endif
+
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
 #define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
 
diff --git a/kernel/fork.c b/kernel/fork.c
index ff3c41c2ba96..fe1c785e5f8c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -158,19 +158,38 @@ void __weak arch_release_thread_info(struct thread_info *ti)
  * Allocate pages if THREAD_SIZE is >= PAGE_SIZE, otherwise use a
  * kmemcache based allocator.
  */
-# if THREAD_SIZE >= PAGE_SIZE
+# if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
 static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
  int node)
 {
+#ifdef CONFIG_VMAP_STACK
+   struct thread_info *ti = __vmalloc_node_range(
+   THREAD_SIZE, THREAD_SIZE, VMALLOC_START, VMALLOC_END,
+   THREADINFO_GFP | __GFP_HIGHMEM, PAGE_KERNEL,
+   0, node, __builtin_return_address(0));
+
+   /*
+* We can't call find_vm_area() in interrupt context, and
+* free_thread_info can be called in interrupt context, so cache
+* the vm_struct.
+*/
+   if (ti)
+   tsk->stack_vm_area = find_vm_area(ti);
+   return ti;
+#else
struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP,
  THREAD_SIZE_ORDER);
 
return page ? page_address(page) : NULL;
+#endif
 }
 
-static inline void free_thread_info(struct thread_info *ti)
+static inline void free_thread_info(struct task_struct *tsk)
 {
-   

[PATCH v4 14/16] x86/dumpstack/64: Handle faults when printing the "Stack:" part of an OOPS

2016-06-23 Thread Andy Lutomirski
If we overflow the stack into a guard page, we'll recursively fault
when trying to dump the contents of the guard page.  Use
probe_kernel_address so we can recover if this happens.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack_64.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index a81e1ef73bf2..6dede08dd98b 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -274,6 +274,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
stack = sp;
for (i = 0; i < kstack_depth_to_print; i++) {
+   unsigned long word;
+
if (stack >= irq_stack && stack <= irq_stack_end) {
if (stack == irq_stack_end) {
stack = (unsigned long *) (irq_stack_end[-1]);
@@ -283,12 +285,18 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
if (kstack_end(stack))
break;
}
+
+   if (probe_kernel_address(stack, word))
+   break;
+
if ((i % STACKSLOTS_PER_LINE) == 0) {
if (i != 0)
pr_cont("\n");
-   printk("%s %016lx", log_lvl, *stack++);
+   printk("%s %016lx", log_lvl, word);
} else
-   pr_cont(" %016lx", *stack++);
+   pr_cont(" %016lx", word);
+
+   stack++;
touch_nmi_watchdog();
}
preempt_enable();
-- 
2.5.5
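
The loop change above reads each stack slot through a fault-tolerant probe before printing it, and advances the pointer only after a successful read. Below is a hedged user-space sketch of that control flow. demo_probe() is a hypothetical stand-in that compares against explicit bounds, whereas probe_kernel_address() recovers from the actual fault.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for probe_kernel_address(): copy *addr into *out
 * and return 0, or return nonzero when addr falls outside [lo, hi).
 * The real helper catches the fault instead of comparing bounds.
 */
static int demo_probe(const unsigned long *addr, const unsigned long *lo,
		      const unsigned long *hi, unsigned long *out)
{
	assert(out != NULL);

	if (addr < lo || addr >= hi)
		return -1;
	*out = *addr;
	return 0;
}

/*
 * Copy up to max words starting at sp into dst and stop at the first
 * unreadable slot.  Returns the number of words copied.
 */
static size_t demo_dump_words(const unsigned long *sp,
			      const unsigned long *lo,
			      const unsigned long *hi,
			      unsigned long *dst, size_t max)
{
	size_t n = 0;

	while (n < max) {
		unsigned long word;

		/* Probe first; only a successful read is consumed. */
		if (demo_probe(sp, lo, hi, &word))
			break;

		dst[n++] = word;
		sp++;	/* advance after the read, as in the patch */
	}
	return n;
}
```

Separating the read from the pointer increment is what lets the loop bail out cleanly: the old `*stack++` form dereferenced and advanced in one expression, leaving no place to check for failure.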



[PATCH v4 13/16] x86/dumpstack: Try harder to get a call trace on stack overflow

2016-06-23 Thread Andy Lutomirski
If we overflow the stack, print_context_stack will abort.  Detect
this case and rewind back into the valid part of the stack so that
we can trace it.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 4592bc4ed3e1..4538f7ca9072 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -87,7 +87,7 @@ static inline int valid_stack_ptr(struct task_struct *task,
else
return 0;
}
-   return p > t && p < t + THREAD_SIZE - size;
+   return p >= t && p < t + THREAD_SIZE - size;
 }
 
 unsigned long
@@ -98,6 +98,13 @@ print_context_stack(struct task_struct *task,
 {
struct stack_frame *frame = (struct stack_frame *)bp;
 
+   /*
+* If we overflowed the stack into a guard page, jump back to the
+* bottom of the usable stack.
+*/
+   if ((unsigned long)task->stack - (unsigned long)stack < PAGE_SIZE)
+   stack = (unsigned long *)task->stack;
+
while (valid_stack_ptr(task, stack, sizeof(*stack), end)) {
unsigned long addr;
 
-- 
2.5.5
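
The rewind test in print_context_stack relies on unsigned wraparound: subtracting a pointer that is at or above the stack base from the base itself either gives zero or wraps to a huge value, so one comparison covers "fewer than PAGE_SIZE bytes below the base". A small sketch of that arithmetic, with DEMO_PAGE_SIZE and the function name assumed purely for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration. */
#define DEMO_PAGE_SIZE ((uintptr_t)4096)

/*
 * If sp is fewer than DEMO_PAGE_SIZE bytes below stack_base (that is,
 * inside the guard page), return stack_base; otherwise return sp
 * unchanged.  When sp is above stack_base the unsigned subtraction wraps
 * to a very large value, so the comparison is false without a second test.
 */
static uintptr_t rewind_from_guard_page(uintptr_t stack_base, uintptr_t sp)
{
	/* Sketch assumption: the stack base is page-aligned. */
	assert(stack_base % DEMO_PAGE_SIZE == 0);

	if (stack_base - sp < DEMO_PAGE_SIZE)
		return stack_base;
	return sp;
}
```

This also shows why valid_stack_ptr had to change from `p > t` to `p >= t`: after the rewind the pointer equals the stack base exactly, and the old strict comparison would have rejected it.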



[PATCH v4 15/16] x86/mm/64: Enable vmapped stacks

2016-06-23 Thread Andy Lutomirski
This allows x86_64 kernels to enable vmapped stacks.  There are a
couple of interesting bits.

First, x86 lazily faults in top-level paging entries for the vmalloc
area.  This won't work if we get a page fault while trying to access
the stack: the CPU will promote it to a double-fault and we'll die.
To avoid this problem, probe the new stack when switching stacks and
forcibly populate the pgd entry for the stack when switching mms.

Second, once we have guard pages around the stack, we'll want to
detect and handle stack overflow.

I didn't enable it on x86_32.  We'd need to rework the double-fault
code a bit and I'm concerned about running out of vmalloc virtual
addresses under some workloads.

This patch, by itself, will behave somewhat erratically when the
stack overflows while RSP is still more than a few tens of bytes
above the bottom of the stack.  Specifically, we'll get #PF and make
it to no_context and an oops without triggering a double-fault, and
no_context doesn't know about stack overflows.  The next patch will
improve that case.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/switch_to.h | 28 +++-
 arch/x86/kernel/traps.c          | 32
 arch/x86/mm/tlb.c                | 15 +++
 4 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9a94da0c29f..afdcf96ef109 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -92,6 +92,7 @@ config X86
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_EBPF_JIT if X86_64
+   select HAVE_ARCH_VMAP_STACK if X86_64
select HAVE_CC_STACKPROTECTOR
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8f321a1b03a1..14e4b20f0aaf 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,28 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
  struct tss_struct *tss);
 
+/* This runs on the previous thread's stack. */
+static inline void prepare_switch_to(struct task_struct *prev,
+struct task_struct *next)
+{
+#ifdef CONFIG_VMAP_STACK
+   /*
+* If we switch to a stack that has a top-level paging entry
+* that is not present in the current mm, the resulting #PF will
+* be promoted to a double-fault and we'll panic.  Probe
+* the new stack now so that vmalloc_fault can fix up the page
+* tables if needed.  This can only happen if we use a stack
+* in vmap space.
+*
+* We assume that the stack is aligned so that it never spans
+* more than one top-level paging entry.
+*
+* To minimize cache pollution, just follow the stack pointer.
+*/
+   READ_ONCE(*(unsigned char *)next->thread.sp);
+#endif
+}
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
@@ -39,6 +61,8 @@ do { \
 */ \
unsigned long ebx, ecx, edx, esi, edi;  \
\
+   prepare_switch_to(prev, next);  \
+   \
asm volatile("pushl %%ebp\n\t"  /* save EBP */ \
 "movl %%esp,%[prev_sp]\n\t" /* save ESP */ \
 "movl %[next_sp],%%esp\n\t"/* restore ESP   */ \
@@ -103,7 +127,9 @@ do { \
  * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
  * has no effect.
  */
-#define switch_to(prev, next, last) \
+#define switch_to(prev, next, last)  \
+   prepare_switch_to(prev, next);\
+ \
asm volatile(SAVE_CONTEXT \
 "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */   \
 "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */\
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 00f03d82e69a..9cb7ea781176 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -292,12 +292,30 @@ DO_ERROR(X86_TRAP_NP, SIGBUS,  "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS, SIGBUS,  "stack segment",stack_segment)
 DO_ERROR(X86_TRAP_AC, SIGBUS,  "alignment check",  alignment_check)
 
+#ifdef CONFIG_VMAP_STACK

[PATCH v4 12/16] x86/dumpstack: When dumping stack bytes due to OOPS, start with regs->sp

2016-06-23 Thread Andy Lutomirski
The comment suggests that show_stack(NULL, NULL) should backtrace
the current context, but the code doesn't match the comment.  If
regs are given, start the "Stack:" hexdump at regs->sp.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack_32.c | 4 +++-
 arch/x86/kernel/dumpstack_64.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index fef917e79b9d..948d77da3881 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -96,7 +96,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
int i;
 
if (sp == NULL) {
-   if (task)
+   if (regs)
+   sp = (unsigned long *)regs->sp;
+   else if (task)
sp = (unsigned long *)task->thread.sp;
else
sp = (unsigned long *)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index d558a8a49016..a81e1ef73bf2 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -264,7 +264,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 * back trace for this cpu:
 */
if (sp == NULL) {
-   if (task)
+   if (regs)
+   sp = (unsigned long *)regs->sp;
+   else if (task)
sp = (unsigned long *)task->thread.sp;
else
sp = (unsigned long *)
-- 
2.5.5



[PATCH v4 16/16] x86/mm: Improve stack-overflow #PF handling

2016-06-23 Thread Andy Lutomirski
If we get a page fault indicating kernel stack overflow, invoke
handle_stack_overflow().  To prevent us from overflowing the stack
again while handling the overflow (because we are likely to have
very little stack space left), call handle_stack_overflow() on the
double-fault stack.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/traps.h |  6 ++
 arch/x86/kernel/traps.c      |  6 +++---
 arch/x86/mm/fault.c          | 39 +++
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index c3496619740a..01fd0a7f48cd 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -117,6 +117,12 @@ extern void ist_exit(struct pt_regs *regs);
 extern void ist_begin_non_atomic(struct pt_regs *regs);
 extern void ist_end_non_atomic(void);
 
+#ifdef CONFIG_VMAP_STACK
+void __noreturn handle_stack_overflow(const char *message,
+ struct pt_regs *regs,
+ unsigned long fault_address);
+#endif
+
 /* Interrupts/Exceptions */
 enum {
X86_TRAP_DE = 0,/*  0, Divide-by-zero */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9cb7ea781176..b389c0539eb9 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -293,9 +293,9 @@ DO_ERROR(X86_TRAP_SS, SIGBUS,  "stack segment", stack_segment)
 DO_ERROR(X86_TRAP_AC, SIGBUS,  "alignment check",  alignment_check)
 
 #ifdef CONFIG_VMAP_STACK
-static void __noreturn handle_stack_overflow(const char *message,
-struct pt_regs *regs,
-unsigned long fault_address)
+__visible void __noreturn handle_stack_overflow(const char *message,
+   struct pt_regs *regs,
+   unsigned long fault_address)
 {
printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
 (void *)fault_address, current->stack,
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7d1fa7cd2374..c68b81f5659f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -753,6 +753,45 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
}
 
+#ifdef CONFIG_VMAP_STACK
+   /*
+* Stack overflow?  During boot, we can fault near the initial
+* stack in the direct map, but that's not an overflow -- check
+* that we're in vmalloc space to avoid this.
+*
+* Check this after trying fixup_exception, since there are a handful
+* of kernel code paths that wander off the top of the stack but
+* handle any faults that occur.  Once those are fixed, we can
+* move this above fixup_exception.
+*/
+   if (is_vmalloc_addr((void *)address) &&
+   (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
+address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
+   register void *__sp asm("rsp");
+   unsigned long stack =
+   this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) -
+   sizeof(void *);
+   /*
+* We're likely to be running with very little stack space
+* left.  It's plausible that we'd hit this condition but
+* double-fault even before we get this far, in which case
+* we're fine: the double-fault handler will deal with it.
+*
+* We don't want to make it all the way into the oops code
+* and then double-fault, though, because we're likely to
+* break the console driver and lose most of the stack dump.
+*/
+   asm volatile ("movq %[stack], %%rsp\n\t"
+ "call handle_stack_overflow\n\t"
+ "1: jmp 1b"
+ : "+r" (__sp)
+ : "D" ("kernel stack overflow (page fault)"),
+   "S" (regs), "d" (address),
+   [stack] "rm" (stack));
+   unreachable();
+   }
+#endif
+
/*
 * 32-bit:
 *
-- 
2.5.5
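
The condition added to no_context treats a fault as a stack overflow when the address falls in the page immediately below the stack or in the page immediately above it. The sketch below reproduces only that address arithmetic in user space. It omits the is_vmalloc_addr() filter from the patch, and the DEMO_* constants and function name are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes for illustration; the kernel values are arch-specific. */
#define DEMO_PAGE_SIZE   ((uintptr_t)4096)
#define DEMO_THREAD_SIZE ((uintptr_t)16384)

/*
 * Return true when address lies in the guard page just below stack_base
 * or in the page just above stack_base + DEMO_THREAD_SIZE.  Both tests
 * use unsigned wraparound so that addresses on the wrong side of each
 * boundary produce a huge difference and fail the comparison.
 */
static bool hit_guard_page(uintptr_t stack_base, uintptr_t address)
{
	/* Sketch assumption: the stack base is page-aligned. */
	assert(stack_base % DEMO_PAGE_SIZE == 0);

	/* Below: address in [stack_base - PAGE_SIZE, stack_base - 1]. */
	bool below = stack_base - 1 - address < DEMO_PAGE_SIZE;

	/* Above: address in [stack end, stack end + PAGE_SIZE - 1]. */
	bool above = address - (stack_base + DEMO_THREAD_SIZE) < DEMO_PAGE_SIZE;

	return below || above;
}
```

Note the `- 1` in the lower test: without it an address equal to the stack base, which is a valid stack byte, would count as a guard-page hit.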



[PATCH v4 11/16] x86/dumpstack: When OOPSing, rewind the stack before do_exit

2016-06-23 Thread Andy Lutomirski
If we call do_exit with a clean stack, we greatly reduce the risk of
recursive oopses due to stack overflow in do_exit, and we allow
do_exit to work even if we OOPS from an IST stack.  The latter gives
us a much better chance of surviving long enough after we detect a
stack overflow to write out our logs.

I intentionally separated this from the preceding patch that
disables do_exit-on-OOPS on IST stacks.  This way, if we need to
revert this patch, we still end up in an acceptable state wrt stack
overflow handling.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S   | 11 +++
 arch/x86/entry/entry_64.S   | 11 +++
 arch/x86/kernel/dumpstack.c | 13 +
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 983e5d3a0d27..0b5e6039 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1153,3 +1153,14 @@ ENTRY(async_page_fault)
jmp error_code
 END(async_page_fault)
 #endif
+
+ENTRY(rewind_stack_do_exit)
+   /* Prevent any naive code from trying to unwind to our caller. */
+   xorl %ebp, %ebp
+
+   movl PER_CPU_VAR(cpu_current_top_of_stack), %esi
+   leal -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp
+
+   call do_exit
+1: jmp 1b
+END(rewind_stack_do_exit)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9ee0da1807ed..b846875aeea6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1423,3 +1423,14 @@ ENTRY(ignore_sysret)
mov $-ENOSYS, %eax
sysret
 END(ignore_sysret)
+
+ENTRY(rewind_stack_do_exit)
+   /* Prevent any naive code from trying to unwind to our caller. */
+   xorl %ebp, %ebp
+
+   movq PER_CPU_VAR(cpu_current_top_of_stack), %rax
+   leaq -TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%rax), %rsp
+
+   call do_exit
+1: jmp 1b
+END(rewind_stack_do_exit)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 70d5aae8b8f7..4592bc4ed3e1 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -226,6 +226,8 @@ unsigned long oops_begin(void)
 EXPORT_SYMBOL_GPL(oops_begin);
 NOKPROBE_SYMBOL(oops_begin);
 
+extern void __noreturn rewind_stack_do_exit(int signr);
+
 void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 {
if (regs && kexec_should_crash(current))
@@ -245,12 +247,15 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
return;
if (in_interrupt())
panic("Fatal exception in interrupt");
-   if (((current_stack_pointer() ^ (current_top_of_stack() - 1))
-& ~(THREAD_SIZE - 1)) != 0)
-   panic("Fatal exception on special stack");
if (panic_on_oops)
panic("Fatal exception");
-   do_exit(signr);
+
+   /*
+* We're not going to return, but we might be on an IST stack or
+* have very little stack space left.  Rewind the stack and kill
+* the task.
+*/
+   rewind_stack_do_exit(signr);
 }
 NOKPROBE_SYMBOL(oops_end);
 
-- 
2.5.5



[PATCH v4 15/16] x86/mm/64: Enable vmapped stacks

2016-06-23 Thread Andy Lutomirski
This allows x86_64 kernels to enable vmapped stacks.  There are a
couple of interesting bits.

First, x86 lazily faults in top-level paging entries for the vmalloc
area.  This won't work if we get a page fault while trying to access
the stack: the CPU will promote it to a double-fault and we'll die.
To avoid this problem, probe the new stack when switching stacks and
forcibly populate the pgd entry for the stack when switching mms.

Second, once we have guard pages around the stack, we'll want to
detect and handle stack overflow.

I didn't enable it on x86_32.  We'd need to rework the double-fault
code a bit and I'm concerned about running out of vmalloc virtual
addresses under some workloads.

This patch, by itself, will behave somewhat erratically when the
stack overflows while RSP is still more than a few tens of bytes
above the bottom of the stack.  Specifically, we'll get #PF and make
it to no_context and an oops without triggering a double-fault, and
no_context doesn't know about stack overflows.  The next patch will
improve that case.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/switch_to.h | 28 +++-
 arch/x86/kernel/traps.c  | 32 
 arch/x86/mm/tlb.c| 15 +++
 4 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9a94da0c29f..afdcf96ef109 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -92,6 +92,7 @@ config X86
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_EBPF_JITif X86_64
+   select HAVE_ARCH_VMAP_STACK if X86_64
select HAVE_CC_STACKPROTECTOR
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8f321a1b03a1..14e4b20f0aaf 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,28 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
  struct tss_struct *tss);
 
+/* This runs runs on the previous thread's stack. */
+static inline void prepare_switch_to(struct task_struct *prev,
+struct task_struct *next)
+{
+#ifdef CONFIG_VMAP_STACK
+   /*
+* If we switch to a stack that has a top-level paging entry
+* that is not present in the current mm, the resulting #PF will
+* will be promoted to a double-fault and we'll panic.  Probe
+* the new stack now so that vmalloc_fault can fix up the page
+* tables if needed.  This can only happen if we use a stack
+* in vmap space.
+*
+* We assume that the stack is aligned so that it never spans
+* more than one top-level paging entry.
+*
+* To minimize cache pollution, just follow the stack pointer.
+*/
+   READ_ONCE(*(unsigned char *)next->thread.sp);
+#endif
+}
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
@@ -39,6 +61,8 @@ do {  
\
 */ \
unsigned long ebx, ecx, edx, esi, edi;  \
\
+   prepare_switch_to(prev, next);  \
+   \
asm volatile("pushl %%ebp\n\t"  /* saveEBP   */ \
 "movl %%esp,%[prev_sp]\n\t"/* saveESP   */ \
 "movl %[next_sp],%%esp\n\t"/* restore ESP   */ \
@@ -103,7 +127,9 @@ do {
\
  * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
  * has no effect.
  */
-#define switch_to(prev, next, last) \
+#define switch_to(prev, next, last)  \
+   prepare_switch_to(prev, next);\
+ \
asm volatile(SAVE_CONTEXT \
 "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */   \
 "movq %P[threadrsp](%[next]),%%rsp\n\t" /* restore RSP */\
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 00f03d82e69a..9cb7ea781176 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -292,12 +292,30 @@ DO_ERROR(X86_TRAP_NP, SIGBUS,  "segment not present", 
segment_not_present)
 DO_ERROR(X86_TRAP_SS, SIGBUS,  "stack segment",stack_segment)
 DO_ERROR(X86_TRAP_AC, SIGBUS,  "alignment check",  alignment_check)
 
+#ifdef CONFIG_VMAP_STACK

[PATCH v4 12/16] x86/dumpstack: When dumping stack bytes due to OOPS, start with regs->sp

2016-06-23 Thread Andy Lutomirski
The comment suggests that show_stack(NULL, NULL) should backtrace
the current context, but the code doesn't match the comment.  If
regs are given, start the "Stack:" hexdump at regs->sp.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/dumpstack_32.c | 4 +++-
 arch/x86/kernel/dumpstack_64.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index fef917e79b9d..948d77da3881 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -96,7 +96,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
int i;
 
if (sp == NULL) {
-   if (task)
+   if (regs)
+   sp = (unsigned long *)regs->sp;
+   else if (task)
sp = (unsigned long *)task->thread.sp;
else
sp = (unsigned long *)
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index d558a8a49016..a81e1ef73bf2 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -264,7 +264,9 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs 
*regs,
 * back trace for this cpu:
 */
if (sp == NULL) {
-   if (task)
+   if (regs)
+   sp = (unsigned long *)regs->sp;
+   else if (task)
sp = (unsigned long *)task->thread.sp;
else
sp = (unsigned long *)
-- 
2.5.5
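
For readers following the hunk above, the lookup order it establishes can be sketched in plain C. This is only an illustration: the struct layouts and the name pick_dump_sp() are hypothetical stand-ins, not kernel definitions, and the final fallback is passed in by the caller rather than taken from the current stack pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct pt_regs_sketch { unsigned long sp; };
struct task_sketch { unsigned long thread_sp; };

/*
 * Pick the address where the "Stack:" hexdump should start when the
 * caller did not pass an explicit sp: prefer the faulting context
 * (regs), then the saved sp of the given task, then a caller-supplied
 * fallback.
 */
static unsigned long *pick_dump_sp(const struct task_sketch *task,
                                   const struct pt_regs_sketch *regs,
                                   unsigned long *fallback)
{
	if (regs)
		return (unsigned long *)regs->sp;
	if (task)
		return (unsigned long *)task->thread_sp;
	return fallback;
}
```

With both regs and task supplied, the regs value wins, which is the behavior the commit message asks for.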



[PATCH v4 16/16] x86/mm: Improve stack-overflow #PF handling

2016-06-23 Thread Andy Lutomirski
If we get a page fault indicating kernel stack overflow, invoke
handle_stack_overflow().  To prevent us from overflowing the stack
again while handling the overflow (because we are likely to have
very little stack space left), call handle_stack_overflow() on the
double-fault stack.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/traps.h |  6 ++
 arch/x86/kernel/traps.c  |  6 +++---
 arch/x86/mm/fault.c  | 39 +++
 3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index c3496619740a..01fd0a7f48cd 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -117,6 +117,12 @@ extern void ist_exit(struct pt_regs *regs);
 extern void ist_begin_non_atomic(struct pt_regs *regs);
 extern void ist_end_non_atomic(void);
 
+#ifdef CONFIG_VMAP_STACK
+void __noreturn handle_stack_overflow(const char *message,
+ struct pt_regs *regs,
+ unsigned long fault_address);
+#endif
+
 /* Interrupts/Exceptions */
 enum {
X86_TRAP_DE = 0,/*  0, Divide-by-zero */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9cb7ea781176..b389c0539eb9 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -293,9 +293,9 @@ DO_ERROR(X86_TRAP_SS, SIGBUS,  "stack segment", 
stack_segment)
 DO_ERROR(X86_TRAP_AC, SIGBUS,  "alignment check",  alignment_check)
 
 #ifdef CONFIG_VMAP_STACK
-static void __noreturn handle_stack_overflow(const char *message,
-struct pt_regs *regs,
-unsigned long fault_address)
+__visible void __noreturn handle_stack_overflow(const char *message,
+   struct pt_regs *regs,
+   unsigned long fault_address)
 {
printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is 
%p..%p)\n",
 (void *)fault_address, current->stack,
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7d1fa7cd2374..c68b81f5659f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -753,6 +753,45 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
}
 
+#ifdef CONFIG_VMAP_STACK
+   /*
+* Stack overflow?  During boot, we can fault near the initial
+* stack in the direct map, but that's not an overflow -- check
+* that we're in vmalloc space to avoid this.
+*
+* Check this after trying fixup_exception, since there are a handful
+* of kernel code paths that wander off the top of the stack but
+* handle any faults that occur.  Once those are fixed, we can
+* move this above fixup_exception.
+*/
+   if (is_vmalloc_addr((void *)address) &&
+   (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
+address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
+   register void *__sp asm("rsp");
+   unsigned long stack =
+   this_cpu_read(orig_ist.ist[DOUBLEFAULT_STACK]) -
+   sizeof(void *);
+   /*
+* We're likely to be running with very little stack space
+* left.  It's plausible that we'd hit this condition but
+* double-fault even before we get this far, in which case
+* we're fine: the double-fault handler will deal with it.
+*
+* We don't want to make it all the way into the oops code
+* and then double-fault, though, because we're likely to
+* break the console driver and lose most of the stack dump.
+*/
+   asm volatile ("movq %[stack], %%rsp\n\t"
+ "call handle_stack_overflow\n\t"
+ "1: jmp 1b"
+ : "+r" (__sp)
+ : "D" ("kernel stack overflow (page fault)"),
+   "S" (regs), "d" (address),
+   [stack] "rm" (stack));
+   unreachable();
+   }
+#endif
+
/*
 * 32-bit:
 *
-- 
2.5.5
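
The guard-page test in the fault.c hunk relies on unsigned wraparound so that each range check needs only one comparison. A minimal userspace sketch of that arithmetic follows. The page and stack sizes are assumed values chosen for illustration, hits_guard_page() is a hypothetical name, and the is_vmalloc_addr() precondition from the patch is left out.

```c
#include <stdbool.h>

/* Assumed sizes for illustration only; the real values are per-arch. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_THREAD_SIZE (4 * SKETCH_PAGE_SIZE)

/*
 * Return true if address falls in the page just below the stack or in
 * the page just above it.  Each subtraction wraps to a huge unsigned
 * value when address is on the wrong side of the boundary, so a single
 * "< PAGE_SIZE" comparison covers both ends of each range.
 */
static bool hits_guard_page(unsigned long stack_base, unsigned long address)
{
	/* Below: address in [stack_base - PAGE_SIZE, stack_base - 1]. */
	bool below = stack_base - 1 - address < SKETCH_PAGE_SIZE;
	/* Above: address in [stack_base + THREAD_SIZE, + PAGE_SIZE). */
	bool above = address - (stack_base + SKETCH_THREAD_SIZE) <
		     SKETCH_PAGE_SIZE;

	return below || above;
}
```

An address inside the stack itself makes both subtractions wrap, so neither comparison succeeds.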



[PATCH v4 11/16] x86/dumpstack: When OOPSing, rewind the stack before do_exit

2016-06-23 Thread Andy Lutomirski
If we call do_exit with a clean stack, we greatly reduce the risk of
recursive oopses due to stack overflow in do_exit, and we allow
do_exit to work even if we OOPS from an IST stack.  The latter gives
us a much better chance of surviving long enough after we detect a
stack overflow to write out our logs.

I intentionally separated this from the preceding patch that
disables do_exit-on-OOPS on IST stacks.  This way, if we need to
revert this patch, we still end up in an acceptable state wrt stack
overflow handling.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_32.S   | 11 +++
 arch/x86/entry/entry_64.S   | 11 +++
 arch/x86/kernel/dumpstack.c | 13 +
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 983e5d3a0d27..0b5e6039 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1153,3 +1153,14 @@ ENTRY(async_page_fault)
jmp error_code
 END(async_page_fault)
 #endif
+
+ENTRY(rewind_stack_do_exit)
+   /* Prevent any naive code from trying to unwind to our caller. */
+   xorl%ebp, %ebp
+
+   movlPER_CPU_VAR(cpu_current_top_of_stack), %esi
+   leal-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%esi), %esp
+
+   calldo_exit
+1: jmp 1b
+END(rewind_stack_do_exit)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9ee0da1807ed..b846875aeea6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1423,3 +1423,14 @@ ENTRY(ignore_sysret)
mov $-ENOSYS, %eax
sysret
 END(ignore_sysret)
+
+ENTRY(rewind_stack_do_exit)
+   /* Prevent any naive code from trying to unwind to our caller. */
+   xorl%ebp, %ebp
+
+   movqPER_CPU_VAR(cpu_current_top_of_stack), %rax
+   leaq-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%rax), %rsp
+
+   calldo_exit
+1: jmp 1b
+END(rewind_stack_do_exit)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 70d5aae8b8f7..4592bc4ed3e1 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -226,6 +226,8 @@ unsigned long oops_begin(void)
 EXPORT_SYMBOL_GPL(oops_begin);
 NOKPROBE_SYMBOL(oops_begin);
 
+extern void __noreturn rewind_stack_do_exit(int signr);
+
 void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 {
if (regs && kexec_should_crash(current))
@@ -245,12 +247,15 @@ void oops_end(unsigned long flags, struct pt_regs *regs, 
int signr)
return;
if (in_interrupt())
panic("Fatal exception in interrupt");
-   if (((current_stack_pointer() ^ (current_top_of_stack() - 1))
-& ~(THREAD_SIZE - 1)) != 0)
-   panic("Fatal exception on special stack");
if (panic_on_oops)
panic("Fatal exception");
-   do_exit(signr);
+
+   /*
+* We're not going to return, but we might be on an IST stack or
+* have very little stack space left.  Rewind the stack and kill
+* the task.
+*/
+   rewind_stack_do_exit(signr);
 }
 NOKPROBE_SYMBOL(oops_end);
 
-- 
2.5.5



[PATCH v4 06/16] mm: Track NR_KERNEL_STACK in KiB instead of number of stacks

2016-06-23 Thread Andy Lutomirski
Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a
zone.  This only makes sense if each kernel stack exists entirely in
one zone, and allowing vmapped stacks could break this assumption.

Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on
all architectures.  Keep it simple and use KiB.

Cc: Vladimir Davydov 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: linux...@kvack.org
Reviewed-by: Vladimir Davydov 
Acked-by: Michal Hocko 
Signed-off-by: Andy Lutomirski 
---
 drivers/base/node.c| 3 +--
 fs/proc/meminfo.c  | 2 +-
 include/linux/mmzone.h | 2 +-
 kernel/fork.c  | 3 ++-
 mm/page_alloc.c| 3 +--
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 560751bad294..27dc68a0ed2d 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -121,8 +121,7 @@ static ssize_t node_read_meminfo(struct device *dev,
   nid, K(node_page_state(nid, NR_FILE_MAPPED)),
   nid, K(node_page_state(nid, NR_ANON_PAGES)),
   nid, K(i.sharedram),
-  nid, node_page_state(nid, NR_KERNEL_STACK) *
-   THREAD_SIZE / 1024,
+  nid, node_page_state(nid, NR_KERNEL_STACK_KB),
   nid, K(node_page_state(nid, NR_PAGETABLE)),
   nid, K(node_page_state(nid, NR_UNSTABLE_NFS)),
   nid, K(node_page_state(nid, NR_BOUNCE)),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 83720460c5bc..239b5a06cee0 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -145,7 +145,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
global_page_state(NR_SLAB_UNRECLAIMABLE)),
K(global_page_state(NR_SLAB_RECLAIMABLE)),
K(global_page_state(NR_SLAB_UNRECLAIMABLE)),
-   global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024,
+   global_page_state(NR_KERNEL_STACK_KB),
K(global_page_state(NR_PAGETABLE)),
 #ifdef CONFIG_QUICKLIST
K(quicklist_total_size()),
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02069c23486d..63f05a7efb54 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -127,7 +127,7 @@ enum zone_stat_item {
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE,   /* used for pagetables */
-   NR_KERNEL_STACK,
+   NR_KERNEL_STACK_KB, /* measured in KiB */
/* Second 128 byte cacheline */
NR_UNSTABLE_NFS,/* NFS unstable pages */
NR_BOUNCE,
diff --git a/kernel/fork.c b/kernel/fork.c
index 5c2c355aa97f..be7f006af727 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -225,7 +225,8 @@ static void account_kernel_stack(struct thread_info *ti, 
int account)
 {
struct zone *zone = page_zone(virt_to_page(ti));
 
-   mod_zone_page_state(zone, NR_KERNEL_STACK, account);
+   mod_zone_page_state(zone, NR_KERNEL_STACK_KB,
+   THREAD_SIZE / 1024 * account);
 }
 
 void free_task(struct task_struct *tsk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6903b695ebae..a277dea926c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4457,8 +4457,7 @@ void show_free_areas(unsigned int filter)
K(zone_page_state(zone, NR_SHMEM)),
K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
-   zone_page_state(zone, NR_KERNEL_STACK) *
-   THREAD_SIZE / 1024,
+   zone_page_state(zone, NR_KERNEL_STACK_KB),
K(zone_page_state(zone, NR_PAGETABLE)),
K(zone_page_state(zone, NR_UNSTABLE_NFS)),
K(zone_page_state(zone, NR_BOUNCE)),
-- 
2.5.5
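
The unit change in kernel/fork.c can be checked with a small userspace sketch. The function name and the stack sizes used below are hypothetical and chosen only to show the arithmetic, including a stack smaller than a page, where counting in pages would not work.

```c
#include <assert.h>

/*
 * Adjust a KiB counter by one stack of thread_size bytes.  account is
 * +1 when a stack is allocated and -1 when it is freed, mirroring the
 * "THREAD_SIZE / 1024 * account" expression in the patch.
 */
static void account_kernel_stack_sketch(long *counter_kb, long thread_size,
                                        int account)
{
	/* KiB only works as a unit if it divides the stack size evenly. */
	assert(thread_size % 1024 == 0);
	*counter_kb += thread_size / 1024 * account;
}
```

Allocating a 16384-byte stack and a 2048-byte stack leaves the counter at 18, and freeing the first brings it back to 2; the reporting code can then print the counter directly without multiplying by THREAD_SIZE.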



[PATCH v4 08/16] dma-api: Teach the "DMA-from-stack" check about vmapped stacks

2016-06-23 Thread Andy Lutomirski
If we're using CONFIG_VMAP_STACK and we manage to point an sg entry
at the stack, then either the sg page will be in highmem or sg_virt
will return the direct-map alias.  In neither case will the existing
check_for_stack() implementation realize that it's a stack page.

Fix it by explicitly checking for stack pages.

This has no effect by itself.  It's broken out for ease of review.

Cc: Andrew Morton 
Cc: Arnd Bergmann 
Signed-off-by: Andy Lutomirski 
---
 lib/dma-debug.c | 39 +--
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 51a76af25c66..5b2e63cba90e 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1162,11 +1163,35 @@ static void check_unmap(struct dma_debug_entry *ref)
put_hash_bucket(bucket, );
 }
 
-static void check_for_stack(struct device *dev, void *addr)
+static void check_for_stack(struct device *dev,
+   struct page *page, size_t offset)
 {
-   if (object_is_on_stack(addr))
-   err_printk(dev, NULL, "DMA-API: device driver maps memory from "
-   "stack [addr=%p]\n", addr);
+   void *addr;
+   struct vm_struct *stack_vm_area = task_stack_vm_area(current);
+
+   if (!stack_vm_area) {
+   /* Stack is direct-mapped. */
+   if (PageHighMem(page))
+   return;
+   addr = page_address(page) + offset;
+   if (object_is_on_stack(addr))
+   err_printk(dev, NULL, "DMA-API: device driver maps 
memory from stack [addr=%p]\n",
+  addr);
+   } else {
+   /* Stack is vmalloced. */
+   int i;
+
+   for (i = 0; i < stack_vm_area->nr_pages; i++) {
+   if (page != stack_vm_area->pages[i])
+   continue;
+
+   addr = (u8 *)current->stack + i * PAGE_SIZE +
+   offset;
+   err_printk(dev, NULL, "DMA-API: device driver maps 
memory from stack [probable addr=%p]\n",
+  addr);
+   break;
+   }
+   }
 }
 
 static inline bool overlap(void *addr, unsigned long len, void *start, void 
*end)
@@ -1289,10 +1314,11 @@ void debug_dma_map_page(struct device *dev, struct page 
*page, size_t offset,
if (map_single)
entry->type = dma_debug_single;
 
+   check_for_stack(dev, page, offset);
+
if (!PageHighMem(page)) {
void *addr = page_address(page) + offset;
 
-   check_for_stack(dev, addr);
check_for_illegal_area(dev, addr, size);
}
 
@@ -1384,8 +1410,9 @@ void debug_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
entry->sg_call_ents   = nents;
entry->sg_mapped_ents = mapped_ents;
 
+   check_for_stack(dev, sg_page(s), s->offset);
+
if (!PageHighMem(sg_page(s))) {
-   check_for_stack(dev, sg_virt(s));
check_for_illegal_area(dev, sg_virt(s), sg_dma_len(s));
}
 
-- 
2.5.5
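
The vmalloced-stack branch of check_for_stack() boils down to a linear search over the pages backing the stack, followed by an address reconstruction. A hedged userspace sketch follows; the struct definitions, the page size, and the name probable_stack_addr() are hypothetical stand-ins for struct vm_struct, PAGE_SIZE, and the inline loop in the patch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for illustration */

/* Hypothetical stand-ins for struct page and struct vm_struct. */
struct page_sketch { int unused; };
struct vm_area_sketch {
	unsigned int nr_pages;
	struct page_sketch **pages;
};

/*
 * If page is one of the pages backing the stack, return the address the
 * DMA buffer probably has inside the stack mapping; otherwise NULL.
 * The index of the matching page gives the page-sized step from the
 * stack base, and offset is the position inside that page.
 */
static void *probable_stack_addr(const struct vm_area_sketch *area,
                                 void *stack_base,
                                 const struct page_sketch *page,
                                 size_t offset)
{
	unsigned int i;

	for (i = 0; i < area->nr_pages; i++) {
		if (area->pages[i] == page)
			return (unsigned char *)stack_base +
			       i * SKETCH_PAGE_SIZE + offset;
	}
	return NULL;
}
```

The result is only "probable" because the same physical page may be reachable through other mappings; the sketch, like the patch, reports the address within the stack mapping.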



Re: [PATCH v6 2/2] mtd: nand: sunxi: add reset line support

2016-06-23 Thread Boris Brezillon
On Fri, 24 Jun 2016 07:20:38 +0800
Icenowy Zheng  wrote:

> In my opinion, returning PTR_ERR(nfc->reset) directly is OK here.
> If devm_reset_control_get_optional() returns -EPROBE_DEFER, the code here will 
> also return it. However, if we get another error, why should it return 
> -EPROBE_DEFER instead?

Sorry, I just had a brainfart :-). Your implementation is correct.
BTW, can you avoid top-posting and reply inline?

> 
> 24.06.2016, 00:01, "Boris Brezillon" :
> > On Mon, 20 Jun 2016 12:48:38 +0800
> > Icenowy Zheng  wrote:
> >  
> >>  The NAND controller on some sun8i chips needs its reset line to be
> >>  deasserted before it can enter a working state.
> >>
> >>  Signed-off-by: Icenowy Zheng 
> >>  ---
> >>    Changes in v2:
> >>  - Corrected the error checking code of reset line.
> >>
> >>    Changes in v3:
> >>  - Corrected a more serious error brought in the "fix" of v2.
> >>
> >>    Changes in v4:
> >>  - Removed unneeded code block after "else".
> >>
> >>    Changes in v5:
> >>  - Added reassertion code in case of initialization error and device
> >>    remove.
> >>
> >>    Changes in v6:
> >>  - Fixed a resource leak by not using goto to exit in case of error.
> >>
> >>   drivers/mtd/nand/sunxi_nand.c | 28 +---
> >>   1 file changed, 25 insertions(+), 3 deletions(-)
> >>
> >>  diff --git a/drivers/mtd/nand/sunxi_nand.c b/drivers/mtd/nand/sunxi_nand.c
> >>  index a83a690..08d5e88 100644
> >>  --- a/drivers/mtd/nand/sunxi_nand.c
> >>  +++ b/drivers/mtd/nand/sunxi_nand.c
> >>  @@ -39,6 +39,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >>  +#include 
> >>
> >>   #define NFC_REG_CTL 0x
> >>   #define NFC_REG_ST 0x0004
> >>  @@ -269,6 +270,7 @@ struct sunxi_nfc {
> >>   void __iomem *regs;
> >>   struct clk *ahb_clk;
> >>   struct clk *mod_clk;
> >>  + struct reset_control *reset;
> >>   unsigned long assigned_cs;
> >>   unsigned long clk_rate;
> >>   struct list_head chips;
> >>  @@ -1871,26 +1873,42 @@ static int sunxi_nfc_probe(struct platform_device 
> >> *pdev)
> >>   if (ret)
> >>   goto out_ahb_clk_unprepare;
> >>
> >>  + nfc->reset = devm_reset_control_get_optional(dev, "ahb");
> >>  +
> >>  + if (!IS_ERR(nfc->reset)) {
> >>  + ret = reset_control_deassert(nfc->reset);
> >>  + if (ret) {
> >>  + dev_err(dev, "reset err %d\n", ret);
> >>  + goto out_mod_clk_unprepare;
> >>  + }
> >>  + } else if (PTR_ERR(nfc->reset) != -ENOENT) {
> >>  + ret = PTR_ERR(nfc->reset);  
> >
> > You should return -EPROBE_DEFER here.
> >
> > And can you please rebase this series on top of nand/next [1]?
> >
> > [1]https://github.com/linux-nand/linux/tree/nand/next  
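
The error handling agreed on in this thread can be summarized as a small decision function. This is a userspace sketch, not the driver code: optional_reset_result() is a hypothetical name, the lookup and deassert results are passed in as plain integers, and EPROBE_DEFER is defined locally because it is a kernel-internal errno that userspace headers do not provide.

```c
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, absent from userspace */
#endif

/*
 * get_err:      0 if the reset control was obtained, otherwise the
 *               negative errno from the lookup.
 * deassert_err: result of deasserting the line (only used on success).
 *
 * Returns what probe should return for the reset step.
 */
static int optional_reset_result(int get_err, int deassert_err)
{
	if (get_err == 0)
		return deassert_err;	/* line present: report deassert result */
	if (get_err == -ENOENT)
		return 0;		/* no line described: it is optional */
	return get_err;			/* pass through, including -EPROBE_DEFER */
}
```

Passing the lookup error through unchanged already covers deferred probing, so no special case for -EPROBE_DEFER is needed.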



Re: [PATCH] capabilities: add capability cgroup controller

2016-06-23 Thread Andy Lutomirski
On Thu, Jun 23, 2016 at 6:14 PM, Topi Miettinen  wrote:
> On 06/23/16 23:46, Andrew Morton wrote:
>> On Thu, 23 Jun 2016 18:07:10 +0300 Topi Miettinen  wrote:
>>
>>> There are many basic ways to control processes, including capabilities,
>>> cgroups and resource limits. However, there are far fewer ways to find
>>> out useful values for the limits, except blind trial and error.
>>>
>>> Currently, there is no way to know which capabilities are actually used.
>>> Even the source code is only implicit, in-depth knowledge of each
>>> capability must be used when analyzing a program to judge which
>>> capabilities the program will exercise.
>>>
>>> Add a new cgroup controller for monitoring of capabilities
>>> in the cgroup.
>>
>> I'm having trouble understanding how valuable this feature is to our
>> users, and that's a rather important thing!
>>
>> Perhaps it would help if you were to explain your motivation:
>> particular use cases which benefited from this, for example.
>>
>
> It's easy to control with for example systemd or many other tools, which
> capabilities a service should have at the start. But how should a system
> administrator, application developer or distro maintaner ever determine
> a suitable value for this? Currently the only way seems to be to become
> an expert on capabilities, make an educated guess how the set of
> programs in question happen to work in this context and especially how
> they could exercise the capabilites in all possible use cases. Even
> then, the outcome is to just try something to see if that happens to
> work. Reading the source code (if available) does not help very much,
> because the use of capabilities is anything but explicit there.
>
> This is way too difficult, there must be some easier way. The
> information which capabilities actually were used in a trial run gives a
> much better starting point. The users can just use the list of used
> capabilities with configuring the service or when developing or
> maintaining the application. Of course, even that could still fail
> eventually, but then you simply copy the new value of used capabilities
> to the configuration, whereas currently you have to reconsider your
> understanding of the capabilities and the programs in light of the
> failure, which by itself might give no new useful information.
>
> One way to solve this for good would be to make the use of capabilities
> explicit in the ABI. For example, there could be a system call
> dac_override() which would be the only possible way ever to use the
> capability CAP_DAC_OVERRIDE and so forth. Then reading source code,
> tracing and many other approaches would be useful. But the OS with that
> kind of ABI (not Linux) would not be Unix-like at all for any
> (potentially) capability using programs, like find(1) or cat(1).

The problem is that most of the capabilities are so powerful on their
own that limiting services to just a few may be all but useless.

--Andy


[PATCH v2] clk: fixed-factor: add optional dt-binding clock-flags

2016-06-23 Thread Jongsung Kim
There is no way to set additional flags for a DT-initialized
fixed-factor-clock, which can be problematic, e.g., when the clock rate
needs to be changed. [1][2]

This patch introduces an optional dt-binding named "clock-flags" to
be used for passing any needed flags from dts.

[1] http://www.spinics.net/lists/linux-clk/msg09040.html
[2] https://lkml.org/lkml/2016/6/20/1025

Changes since v1:
 - fix possible build failure when using gcc-5 or gcc-6

Signed-off-by: Jongsung Kim 
Cc: Maxime Ripard 
Cc: Mike Turquette 
Cc: Stephen Boyd 
---
 .../bindings/clock/fixed-factor-clock.txt  |  4 
 drivers/clk/clk-fixed-factor.c |  4 +++-
 include/dt-bindings/clk/clk.h  | 22 ++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 include/dt-bindings/clk/clk.h

diff --git a/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt 
b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
index 1bae8527..3e1b79e 100644
--- a/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
+++ b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
@@ -13,12 +13,16 @@ Required properties:
 
 Optional properties:
 - clock-output-names : From common clock binding.
+- clock-flags : Additional flags to be used.
 
 Example:
+   #include 
+
clock {
compatible = "fixed-factor-clock";
clocks = <>;
#clock-cells = <0>;
clock-div = <2>;
clock-mult = <1>;
+   clock-flags = ;
};
diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
index 75cd6c7..e626cad 100644
--- a/drivers/clk/clk-fixed-factor.c
+++ b/drivers/clk/clk-fixed-factor.c
@@ -150,6 +150,7 @@ void __init of_fixed_factor_clk_setup(struct device_node *node)
struct clk *clk;
const char *clk_name = node->name;
const char *parent_name;
+   u32 flags = 0;
u32 div, mult;
 
if (of_property_read_u32(node, "clock-div", )) {
@@ -166,8 +167,9 @@ void __init of_fixed_factor_clk_setup(struct device_node *node)
 
of_property_read_string(node, "clock-output-names", _name);
parent_name = of_clk_get_parent_name(node, 0);
+   of_property_read_u32(node, "clock-flags", );
 
-   clk = clk_register_fixed_factor(NULL, clk_name, parent_name, 0,
+   clk = clk_register_fixed_factor(NULL, clk_name, parent_name, flags,
mult, div);
if (!IS_ERR(clk))
of_clk_add_provider(node, of_clk_src_simple_get, clk);
diff --git a/include/dt-bindings/clk/clk.h b/include/dt-bindings/clk/clk.h
new file mode 100644
index 000..1834933
--- /dev/null
+++ b/include/dt-bindings/clk/clk.h
@@ -0,0 +1,22 @@
+/*
+ * See include/linux/clk-provider.h for more information.
+ */
+
+#ifndef __DT_BINDINGS_CLK_CLK_H
+#define __DT_BINDINGS_CLK_CLK_H
+
+#define BIT(nr)                      (1UL << (nr))
+
+#define CLK_SET_RATE_GATE            BIT(0)
+#define CLK_SET_PARENT_GATE          BIT(1)
+#define CLK_SET_RATE_PARENT          BIT(2)
+#define CLK_IGNORE_UNUSED            BIT(3)
+#define CLK_IS_BASIC                 BIT(5)
+#define CLK_GET_RATE_NOCACHE         BIT(6)
+#define CLK_SET_RATE_NO_REPARENT     BIT(7)
+#define CLK_GET_ACCURACY_NOCACHE     BIT(8)
+#define CLK_RECALC_NEW_RATES         BIT(9)
+#define CLK_SET_RATE_UNGATE          BIT(10)
+#define CLK_IS_CRITICAL              BIT(11)
+
+#endif
-- 
2.7.4
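
A note on the v1-to-v2 change in this patch: of_property_read_u32() expects a u32 *, so v2 reads "clock-flags" into a u32 and lets the value widen when it is handed on as the flags argument. The block below is only a userspace sketch of that pattern. mock_read_u32() and read_clock_flags() are hypothetical stand-ins made up for illustration; they are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for of_property_read_u32(): it stores exactly
 * 32 bits through the pointer it is given and returns 0 on success.
 */
static int mock_read_u32(uint32_t cell, uint32_t *out)
{
	*out = cell;
	return 0;
}

/*
 * Mirrors the v2 approach: read into a 32-bit variable whose type
 * matches the reader's pointer argument, then let the value widen
 * implicitly when it is returned as an unsigned long.
 */
static unsigned long read_clock_flags(uint32_t cell)
{
	uint32_t flags = 0;	/* default when the property is absent */

	mock_read_u32(cell, &flags);
	return flags;		/* well-defined integer widening */
}
```

Under these assumptions, read_clock_flags(0x4) yields 0x4 regardless of how wide unsigned long is on the target, because no pointer of a mismatched type is ever involved.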



RE: [PATCH v4] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-06-23 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, June 24, 2016 11:37 AM
> 
> On Fri, 24 Jun 2016 10:52:58 +0800
> Yongji Xie  wrote:
> > On 2016/6/24 0:12, Alex Williamson wrote:
> > > On Mon, 30 May 2016 21:06:37 +0800
> > > Yongji Xie  wrote:
> > >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
> > >> +{
> > >> +struct resource *res;
> > >> +int bar;
> > >> +struct vfio_pci_dummy_resource *dummy_res;
> > >> +
> > >> +INIT_LIST_HEAD(>dummy_resources_list);
> > >> +
> > >> +for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; 
> > >> bar++) {
> > >> +res = vdev->pdev->resource + bar;
> > >> +
> > >> +if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> > >> +goto no_mmap;
> > >> +
> > >> +if (!(res->flags & IORESOURCE_MEM))
> > >> +goto no_mmap;
> > >> +
> > >> +/*
> > >> + * The PCI core shouldn't set up a resource with a
> > >> + * type but zero size. But there may be bugs that
> > >> + * cause us to do that.
> > >> + */
> > >> +if (!resource_size(res))
> > >> +goto no_mmap;
> > >> +
> > >> +if (resource_size(res) >= PAGE_SIZE) {
> > >> +vdev->bar_mmap_supported[bar] = true;
> > >> +continue;
> > >> +}
> > >> +
> > >> +if (!(res->start & ~PAGE_MASK)) {
> > >> +/*
> > >> + * Add a dummy resource to reserve the remainder
> > >> + * of the exclusive page in case that hot-add
> > >> + * device's bar is assigned into it.
> > >> + */
> > >> +dummy_res = kzalloc(sizeof(*dummy_res), 
> > >> GFP_KERNEL);
> > >> +if (dummy_res == NULL)
> > >> +goto no_mmap;
> > >> +
> > >> +dummy_res->resource.start = res->end + 1;
> > >> +dummy_res->resource.end = res->start + 
> > >> PAGE_SIZE - 1;
> > >> +dummy_res->resource.flags = res->flags;
> > >> +if (request_resource(res->parent,
> > >> +_res->resource)) {
> > >> +kfree(dummy_res);
> > >> +goto no_mmap;
> > >> +}
> > > Isn't it true that request_resource() only tells us that at a given
> > > point in time, no other drivers have reserved that resource?  It seems
> > > like it does not guarantee that the resource isn't routed to another
> > > device or that another driver won't at some point attempt to request
> > > that same resource.  So for example if a user constructs their initrd
> > > to bind vfio-pci to devices before other modules load, this
> > > request_resource() may succeed, at the expense of drivers loaded later
> > > now failing.  The behavior will depend on driver load order and we're
> > > not actually ensuring that the overflow resource is unused, just that
> > > we got it first.  Can we do better?  Am I missing something that
> > > prevents this?  Thanks,
> > >
> > > Alex
> >
> > Couldn't the PCI resource allocator prevent this? It first finds an
> > empty slot in the resource tree, then tries to request that
> > resource in allocate_resource() when a PCI device is probed.
> > And I'd like to know why a PCI device driver would attempt to
> > call request_resource()? Should this be done in PCI enumeration?
> 
> Hi Yongji,
> 
> Looks like most pci drivers call pci_request_regions().  From there the
> call path is:
> 
> pci_request_selected_regions
>   __pci_request_selected_regions
> __pci_request_region
>   __request_mem_region
> __request_region
>   __request_resource
> 
> We see this driver ordering issue sometimes with users attempting to
> blacklist native pci drivers, trying to leave a device free for use by
> vfio-pci.  If the device is a graphics card, the generic vesa or uefi
> driver can request device resources causing a failure when vfio-pci
> tries to request those same resources.  I expect that unless it's a
> boot device, like vga in my example, the resources are not enabled
> until the driver opens the device, therefore the request_resource() call
> doesn't occur until that point.
> 
> For another trivial example, look at /proc/iomem as you load and unload
> a driver, on my laptop with e1000e unloaded I see:
> 
>   e120-e121 : :00:19.0
>   e123e000-e123efff : :00:19.0
> 
> When e1000e is loaded, each of these becomes claimed by the e1000e
> driver:
> 
>   e120-e121 : :00:19.0
> e120-e121 : e1000e
>   e123e000-e123efff : :00:19.0
> e123e000-e123efff : e1000e
> 
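
The two points discussed in the quoted thread (how the dummy resource range is derived, and why a successful request only means "nobody claimed it yet") can be sketched in userspace as follows. This is a simplified model under stated assumptions: struct range, struct claims, page_remainder() and claim() are hypothetical names, a 4 KiB page is assumed, and the flat array is not how the kernel's resource tree is actually organized.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

struct range {
	uint64_t start;
	uint64_t end;			/* inclusive, like struct resource */
};

/*
 * For a page-aligned BAR smaller than a page, compute the rest of the
 * page, following the arithmetic in the quoted patch:
 *   start = res->end + 1, end = res->start + PAGE_SIZE - 1
 */
static struct range page_remainder(struct range bar)
{
	struct range rest = { bar.end + 1, bar.start + SKETCH_PAGE_SIZE - 1 };

	return rest;
}

#define MAX_CLAIMS 8

struct claims {
	struct range r[MAX_CLAIMS];
	size_t n;
};

/*
 * First come, first served: succeed (0) only if the wanted range does
 * not overlap anything claimed so far, otherwise fail (-1).
 */
static int claim(struct claims *c, struct range want)
{
	for (size_t i = 0; i < c->n; i++)
		if (want.start <= c->r[i].end && c->r[i].start <= want.end)
			return -1;
	if (c->n == MAX_CLAIMS)
		return -1;
	c->r[c->n++] = want;
	return 0;
}
```

In this model, whoever calls claim() first wins, and a later caller asking for an overlapping range fails. That is the load-order dependence raised in the thread: success says nothing about whether the range is routed to, or later wanted by, another device.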

Re: [PATCH V3] clocksource/drivers/arc: Convert init function to return error

2016-06-23 Thread Vineet Gupta
On Friday 17 June 2016 03:39 PM, Daniel Lezcano wrote:
> The init functions do not return any error. They behave as the following:
> 
>   - panic, thus leading to a kernel crash while another timer may work and
>make the system boot up correctly
> 
>   or
> 
>   - print an error and leave the caller unaware of the state of the system
> 
> Change that by converting the init functions to return an error conforming
> to the CLOCKSOURCE_OF_RET prototype.
> 
> Proper error handling (rollback, errno value) will be added later, case
> by case; this change just returns an error or success from the init
> function.
> 
> Signed-off-by: Daniel Lezcano 
> ---
>  arch/arc/kernel/time.c | 69 
> ++

[...]

>   evt->cpumask = cpumask_of(smp_processor_id());
> @@ -347,24 +355,31 @@ static void __init arc_clockevent_setup(struct device_node *node)
>   /* Needs apriori irq_set_percpu_devid() done in intc map function */
>   ret = request_percpu_irq(arc_timer_irq, timer_irq_handler,
>"Timer0 (per-cpu-tick)", evt);
> - if (ret)
> - panic("clockevent: unable to request irq\n");
> + if (ret) {
> + pr_err("clockevent: unable to request irq\n");
> + returnr ret;

oops I missed the typo here !
Daniel can u squash this to ur patch !

-Vineet
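
The conversion described in the quoted commit message (report the failure to the caller instead of calling panic()) can be sketched in userspace like this. Everything here is hypothetical scaffolding for illustration: mock_request_irq(), timer_setup() and init_first_working() are not kernel APIs, and the fallback loop is only one possible way a caller might use the returned error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for an IRQ request that can fail. */
static int mock_request_irq(int irq)
{
	return irq < 0 ? -EINVAL : 0;
}

/*
 * Init function in the converted style: log the problem and hand the
 * error back, rather than bringing the whole system down.
 */
static int timer_setup(int irq)
{
	int ret = mock_request_irq(irq);

	if (ret) {
		fprintf(stderr, "clockevent: unable to request irq\n");
		return ret;
	}
	return 0;
}

/*
 * Because timer_setup() returns an error, a caller can try the next
 * candidate. Returns the index of the first timer that initialized,
 * or -1 if none did.
 */
static int init_first_working(const int *irqs, int n)
{
	for (int i = 0; i < n; i++)
		if (timer_setup(irqs[i]) == 0)
			return i;
	return -1;
}
```

With a void init function that panics, the loop in init_first_working() would never get the chance to try a second timer.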



[PATCH] clk: fixed-factor: add optional dt-binding clock-flags

2016-06-23 Thread Jongsung Kim
There is no way to set additional flags for a DT-initialized
fixed-factor-clock, which can be problematic, e.g., when the clock
rate needs to be changed. [1][2]

This patch introduces an optional dt-binding named "clock-flags" to
be used for passing any needed flags from dts.

[1] http://www.spinics.net/lists/linux-clk/msg09040.html
[2] https://lkml.org/lkml/2016/6/20/1025

Signed-off-by: Jongsung Kim 
Cc: Maxime Ripard 
Cc: Mike Turquette 
Cc: Stephen Boyd 
---
 .../bindings/clock/fixed-factor-clock.txt  |  4 
 drivers/clk/clk-fixed-factor.c |  4 +++-
 include/dt-bindings/clk/clk.h  | 22 ++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 include/dt-bindings/clk/clk.h

diff --git a/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
index 1bae8527..3e1b79e 100644
--- a/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
+++ b/Documentation/devicetree/bindings/clock/fixed-factor-clock.txt
@@ -13,12 +13,16 @@ Required properties:
 
 Optional properties:
 - clock-output-names : From common clock binding.
+- clock-flags : Additional flags to be used.
 
 Example:
+   #include 
+
clock {
compatible = "fixed-factor-clock";
clocks = <>;
#clock-cells = <0>;
clock-div = <2>;
clock-mult = <1>;
+   clock-flags = ;
};
diff --git a/drivers/clk/clk-fixed-factor.c b/drivers/clk/clk-fixed-factor.c
index 75cd6c7..da3cd9c 100644
--- a/drivers/clk/clk-fixed-factor.c
+++ b/drivers/clk/clk-fixed-factor.c
@@ -150,6 +150,7 @@ void __init of_fixed_factor_clk_setup(struct device_node *node)
struct clk *clk;
const char *clk_name = node->name;
const char *parent_name;
+   unsigned long flags = 0;
u32 div, mult;
 
if (of_property_read_u32(node, "clock-div", )) {
@@ -166,8 +167,9 @@ void __init of_fixed_factor_clk_setup(struct device_node *node)
 
of_property_read_string(node, "clock-output-names", _name);
parent_name = of_clk_get_parent_name(node, 0);
+   of_property_read_u32(node, "clock-flags", );
 
-   clk = clk_register_fixed_factor(NULL, clk_name, parent_name, 0,
+   clk = clk_register_fixed_factor(NULL, clk_name, parent_name, flags,
mult, div);
if (!IS_ERR(clk))
of_clk_add_provider(node, of_clk_src_simple_get, clk);
diff --git a/include/dt-bindings/clk/clk.h b/include/dt-bindings/clk/clk.h
new file mode 100644
index 000..1834933
--- /dev/null
+++ b/include/dt-bindings/clk/clk.h
@@ -0,0 +1,22 @@
+/*
+ * See include/linux/clk-provider.h for more information.
+ */
+
+#ifndef __DT_BINDINGS_CLK_CLK_H
+#define __DT_BINDINGS_CLK_CLK_H
+
+#define BIT(nr)                      (1UL << (nr))
+
+#define CLK_SET_RATE_GATE            BIT(0)
+#define CLK_SET_PARENT_GATE          BIT(1)
+#define CLK_SET_RATE_PARENT          BIT(2)
+#define CLK_IGNORE_UNUSED            BIT(3)
+#define CLK_IS_BASIC                 BIT(5)
+#define CLK_GET_RATE_NOCACHE         BIT(6)
+#define CLK_SET_RATE_NO_REPARENT     BIT(7)
+#define CLK_GET_ACCURACY_NOCACHE     BIT(8)
+#define CLK_RECALC_NEW_RATES         BIT(9)
+#define CLK_SET_RATE_UNGATE          BIT(10)
+#define CLK_IS_CRITICAL              BIT(11)
+
+#endif
-- 
2.7.4



Re: [PATCH 06/14] ARM: dts: sun8i: Add cpu0 label to sun8i-h3.dtsi

2016-06-23 Thread Chen-Yu Tsai
On Fri, Jun 24, 2016 at 3:20 AM,   wrote:
> From: Ondrej Jirman 
>
> Add label to the first cpu so that it can be referenced
> from derived dts files.
>
> Signed-off-by: Ondrej Jirman 
> ---
>  arch/arm/boot/dts/sun8i-h3.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
> index 9938972..82faefc 100644
> --- a/arch/arm/boot/dts/sun8i-h3.dtsi
> +++ b/arch/arm/boot/dts/sun8i-h3.dtsi
> @@ -52,7 +52,7 @@
> #address-cells = <1>;
> #size-cells = <0>;
>
> -   cpu@0 {
> +   cpu0: cpu@0 {
> compatible = "arm,cortex-a7";
> device_type = "cpu";
> reg = <0>;

Can you also set the cpu clock here? It is part of the SoC
and does not belong in the board DTS files.

Otherwise this one looks good.

ChenYu

> --
> 2.9.0
>


[PATCH V3] printk: Create pr_ functions

2016-06-23 Thread Joe Perches
Using functions instead of macros can reduce overall code size
by eliminating unnecessary "KERN_SOH" prefixes from
format strings.

defconfig x86-64:

$ size vmlinux*
    text    data     bss      dec    hex filename
10193570 4331464 1105920 15630954 ee826a vmlinux.new
10192623 4335560 1105920 15634103 ee8eb7 vmlinux.old

As the return values are unimportant and unused in the kernel tree,
these new functions return void.

Miscellanea:

o change pr_ macros to call new __pr_ functions
o change vprintk_nmi and vprintk_default to add LOGLEVEL_ argument

Signed-off-by: Joe Perches 
---
change in v3:

In case anyone didn't notice, Joe can't cut'n'paste.
Fix __pr_info function definition at LOGLEVEL_NOTICE level.

changes in V2:

Fix "CONFIG_PRINTK is not set" builds by adding CONFIG_PRINTK blocks
Fix x86-32 builds by setting __pr_ functions __asmlinkage and visible

Compile tested cross-compiled sparc, tinyconfig, x86-32 & -64 w/o printk

 include/linux/printk.h   | 48 +---
 kernel/printk/internal.h | 16 ++--
 kernel/printk/nmi.c  | 13 +++--
 kernel/printk/printk.c   | 27 ---
 4 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index f4da695..e6ff22e 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -254,21 +254,39 @@ extern asmlinkage void dump_stack(void) __cold;
  * and other debug macros are compiled out unless either DEBUG is defined
  * or CONFIG_DYNAMIC_DEBUG is set.
  */
-#define pr_emerg(fmt, ...) \
-   printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_alert(fmt, ...) \
-   printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_crit(fmt, ...) \
-   printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_err(fmt, ...) \
-   printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_warning(fmt, ...) \
-   printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_warn pr_warning
-#define pr_notice(fmt, ...) \
-   printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
-#define pr_info(fmt, ...) \
-   printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+#ifdef CONFIG_PRINTK
+
+asmlinkage __printf(1, 2) __cold void __pr_emerg(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_alert(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_crit(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_err(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_warn(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_notice(const char *fmt, ...);
+asmlinkage __printf(1, 2) __cold void __pr_info(const char *fmt, ...);
+
+#define pr_emerg(fmt, ...) __pr_emerg(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert(fmt, ...) __pr_alert(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)  __pr_crit(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)   __pr_err(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn(fmt, ...)  __pr_warn(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice(fmt, ...) __pr_notice(pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...)  __pr_info(pr_fmt(fmt), ##__VA_ARGS__)
+
+#else
+
+#define pr_emerg(fmt, ...) printk(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert(fmt, ...) printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit(fmt, ...)  printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err(fmt, ...)   printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn(fmt, ...)  printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice(fmt, ...) printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info(fmt, ...)  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+
+#endif
+
+#define pr_warning pr_warn
+
 /*
  * Like KERN_CONT, pr_cont() should only be used when continuing
  * a line with no newline ('\n') enclosed. Otherwise it defaults
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 7fd2838..5d4505f 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -16,9 +16,11 @@
  */
 #include 
 
-typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args);
+typedef __printf(2, 0) int (*printk_func_t)(int level, const char *fmt,
+   va_list args);
 
-int __printf(1, 0) vprintk_default(const char *fmt, va_list args);
+__printf(2, 0)
+int vprintk_default(int level, const char *fmt, va_list args);
 
 #ifdef CONFIG_PRINTK_NMI
 
@@ -31,9 +33,10 @@ extern raw_spinlock_t logbuf_lock;
  * via per-CPU variable.
  */
 DECLARE_PER_CPU(printk_func_t, printk_func);
-static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+__printf(2, 0)
+static inline int vprintk_func(int level, const char *fmt, va_list args)
 {
-   return this_cpu_read(printk_func)(fmt, args);
+   return this_cpu_read(printk_func)(level, fmt, args);
 }
 
 extern atomic_t nmi_message_lost;
@@ -44,9 

Re: [PATCH 07/14] regulator: SY8106A regulator driver

2016-06-23 Thread Chen-Yu Tsai
On Fri, Jun 24, 2016 at 3:20 AM,   wrote:
> From: Ondrej Jirman 
>
> SY8106A is I2C attached single output voltage regulator
> made by Silergy.
>
> Signed-off-by: Ondrej Jirman 
> ---
>  drivers/regulator/Kconfig             |   8 +-
>  drivers/regulator/Makefile            |   2 +-
>  drivers/regulator/sy8106a-regulator.c | 153 ++
>  3 files changed, 161 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/regulator/sy8106a-regulator.c
>
> diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
> index 144cbf5..fc3fae2 100644
> --- a/drivers/regulator/Kconfig
> +++ b/drivers/regulator/Kconfig
> @@ -860,5 +860,11 @@ config REGULATOR_WM8994
>   This driver provides support for the voltage regulators on the
>   WM8994 CODEC.
>
> -endif
> +config REGULATOR_SY8106A
> +   tristate "Silergy SY8106A"
> +   depends on I2C

Maybe you should also depend on OF since the driver is going to be crippled
without any constraints set, or (OF || COMPILE_TEST) if you want some
compile test coverage.

> +   select REGMAP_I2C
> +   help
> + This driver provides support for the voltage regulator SY8106A.
>
> +endif
> diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
> index 85a1d44..f382095 100644
> --- a/drivers/regulator/Makefile
> +++ b/drivers/regulator/Makefile
> @@ -110,6 +110,6 @@ obj-$(CONFIG_REGULATOR_WM831X) += wm831x-ldo.o
>  obj-$(CONFIG_REGULATOR_WM8350) += wm8350-regulator.o
>  obj-$(CONFIG_REGULATOR_WM8400) += wm8400-regulator.o
>  obj-$(CONFIG_REGULATOR_WM8994) += wm8994-regulator.o
> -
> +obj-$(CONFIG_REGULATOR_SY8106A) += sy8106a-regulator.o

Follow the existing ordering in the Makefile.

>
>  ccflags-$(CONFIG_REGULATOR_DEBUG) += -DDEBUG
> diff --git a/drivers/regulator/sy8106a-regulator.c b/drivers/regulator/sy8106a-regulator.c
> new file mode 100644
> index 000..34bd69c
> --- /dev/null
> +++ b/drivers/regulator/sy8106a-regulator.c
> @@ -0,0 +1,153 @@
> +/*
> + * sy8106a-regulator.c - Regulator device driver for SY8106A
> + *
> + * Copyright (C) 2016  Ondřej Jirman 
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Library General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Library General Public License for more details.
> + *
> + * You should have received a copy of the GNU Library General Public
> + * License along with this library; if not, write to the
> + * Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
> + * Boston, MA  02110-1301, USA.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 

Do you need this one?

> +#include 
> +#include 

And this one?

> +#include 
> +#include 

Sort alphabetically please.

> +
> +#define SY8106A_REG_VOUT1_SEL          0x01
> +#define SY8106A_REG_VOUT_COM           0x02
> +#define SY8106A_REG_VOUT1_SEL_MASK     0x7f
> +#define SY8106A_DISABLE_REG            0x01

BIT(0) would be clearer.

> +
> +struct sy8106a {
> +   struct regulator_dev *rdev;
> +   struct regmap *regmap;
> +};
> +
> +static const struct regmap_config sy8106a_regmap_config = {
> +   .reg_bits = 8,
> +   .val_bits = 8,
> +};
> +
> +static int sy8106a_set_voltage_sel(struct regulator_dev *rdev, unsigned sel)
> +{
> +   return regmap_update_bits(rdev->regmap, rdev->desc->vsel_reg,
> + 0xff, sel | 0x80);

Can you use .apply_bit / .apply_reg with regulator_set_voltage_sel_regmap?
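
As a rough illustration of that suggestion, here is a userspace model of the
two approaches. It assumes the generic helper writes the selector under
vsel_mask and then sets apply_bit in apply_reg, and that bit 7 (the 0x80 the
patch ORs in) is what would be described as the apply bit. The regs[] array
and the set_sel_*() helpers are hypothetical names for this sketch only, not
kernel API.

```c
#include <stdint.h>

/* Userspace stand-in for a small 8-bit register file (hypothetical). */
static uint8_t regs[4];

/*
 * Same read-modify-write rule as regmap_update_bits(): only the bits
 * set in mask are replaced by the matching bits of val.
 */
static void update_bits(unsigned int reg, uint8_t mask, uint8_t val)
{
	regs[reg] = (uint8_t)((regs[reg] & ~mask) | (val & mask));
}

#define VOUT1_SEL	0x01	/* SY8106A_REG_VOUT1_SEL in the patch */
#define VSEL_MASK	0x7f	/* SY8106A_REG_VOUT1_SEL_MASK in the patch */
#define APPLY_BIT	0x80	/* assumption: the bit the patch ORs in */

/* What the patch does: one write covering all eight bits. */
static void set_sel_custom(unsigned int sel)
{
	update_bits(VOUT1_SEL, 0xff, (uint8_t)(sel | APPLY_BIT));
}

/* Model of the generic path: write the selector, then set the apply bit. */
static void set_sel_generic(unsigned int sel)
{
	update_bits(VOUT1_SEL, VSEL_MASK, (uint8_t)sel);
	update_bits(VOUT1_SEL, APPLY_BIT, APPLY_BIT);
}
```

Both paths leave the same value in the register. The generic path does it in
two updates instead of one, so whether the chip tolerates that would still
need checking against the datasheet.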

> +}
> +
> +static const struct regulator_ops sy8106a_ops = {
> +   .is_enabled = regulator_is_enabled_regmap,
> +   .set_voltage_sel = sy8106a_set_voltage_sel,
> +   .set_voltage_time_sel = regulator_set_voltage_time_sel,
> +   .get_voltage_sel = regulator_get_voltage_sel_regmap,
> +   .list_voltage = regulator_list_voltage_linear,
> +};
> +
> +/* Default limits measured in millivolts and milliamps */
> +#define SY8106A_MIN_MV         680
> +#define SY8106A_MAX_MV         1950
> +#define SY8106A_STEP_MV        10
> +
> +static const struct regulator_desc sy8106a_reg = {
> +   .name = "SY8106A",
> +   .id = 0,
> +   .ops = &sy8106a_ops,
> +   .type = REGULATOR_VOLTAGE,
> +   .n_voltages = ((SY8106A_MAX_MV - SY8106A_MIN_MV) / SY8106A_STEP_MV) + 1,
> +   .min_uV = (SY8106A_MIN_MV * 1000),
> +   .uV_step = (SY8106A_STEP_MV * 1000),
> +   .vsel_reg = SY8106A_REG_VOUT1_SEL,
> +   .vsel_mask = SY8106A_REG_VOUT1_SEL_MASK,
> +   .enable_reg = SY8106A_REG_VOUT_COM,
> +   .enable_mask = SY8106A_DISABLE_REG,
> +   .disable_val = SY8106A_DISABLE_REG,
> +   .enable_is_inverted = 1,
> +   

[PATCH] powercap/intel_rapl: Add support for Ivy Bridge server

2016-06-23 Thread Xiaolong Wang
It's confirmed that RAPL works as expected on Ivy Bridge servers.
Tested against processor: Intel(R) Xeon(R) CPU E5-2697 v2 @2.70GHz

Signed-off-by: Xiaolong Wang 
---
 drivers/powercap/intel_rapl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 06d21e6..fbab29d 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -1134,6 +1134,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
    RAPL_CPU(INTEL_FAM6_SANDYBRIDGE_X,  rapl_defaults_core),
 
    RAPL_CPU(INTEL_FAM6_IVYBRIDGE,      rapl_defaults_core),
+   RAPL_CPU(INTEL_FAM6_IVYBRIDGE_X,    rapl_defaults_core),
 
    RAPL_CPU(INTEL_FAM6_HASWELL_CORE,   rapl_defaults_core),
    RAPL_CPU(INTEL_FAM6_HASWELL_ULT,    rapl_defaults_core),
-- 
1.8.3.1



Re: [PATCH v4] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-06-23 Thread Alex Williamson
On Fri, 24 Jun 2016 10:52:58 +0800
Yongji Xie  wrote:
> On 2016/6/24 0:12, Alex Williamson wrote:
> > On Mon, 30 May 2016 21:06:37 +0800
> > Yongji Xie  wrote:
> >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
> >> +{
> >> +  struct resource *res;
> >> +  int bar;
> >> +  struct vfio_pci_dummy_resource *dummy_res;
> >> +
> >> +  INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >> +
> >> +  for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
> >> +  res = vdev->pdev->resource + bar;
> >> +
> >> +  if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> >> +  goto no_mmap;
> >> +
> >> +  if (!(res->flags & IORESOURCE_MEM))
> >> +  goto no_mmap;
> >> +
> >> +  /*
> >> +   * The PCI core shouldn't set up a resource with a
> >> +   * type but zero size. But there may be bugs that
> >> +   * cause us to do that.
> >> +   */
> >> +  if (!resource_size(res))
> >> +  goto no_mmap;
> >> +
> >> +  if (resource_size(res) >= PAGE_SIZE) {
> >> +  vdev->bar_mmap_supported[bar] = true;
> >> +  continue;
> >> +  }
> >> +
> >> +  if (!(res->start & ~PAGE_MASK)) {
> >> +  /*
> >> +   * Add a dummy resource to reserve the remainder
> >> +   * of the exclusive page in case that hot-add
> >> +   * device's bar is assigned into it.
> >> +   */
> >> +  dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
> >> +  if (dummy_res == NULL)
> >> +  goto no_mmap;
> >> +
> >> +  dummy_res->resource.start = res->end + 1;
> >> +  dummy_res->resource.end = res->start + PAGE_SIZE - 1;
> >> +  dummy_res->resource.flags = res->flags;
> >> +  if (request_resource(res->parent,
> >> +  &dummy_res->resource)) {
> >> +  kfree(dummy_res);
> >> +  goto no_mmap;
> >> +  }  
> > Isn't it true that request_resource() only tells us that at a given
> > point in time, no other drivers have reserved that resource?  It seems
> > like it does not guarantee that the resource isn't routed to another
> > device or that another driver won't at some point attempt to request
> > that same resource.  So for example if a user constructs their initrd
> > to bind vfio-pci to devices before other modules load, this
> > request_resource() may succeed, at the expense of drivers loaded later
> > now failing.  The behavior will depend on driver load order and we're
> > not actually ensuring that the overflow resource is unused, just that
> > we got it first.  Can we do better?  Am I missing something that
> > prevents this?  Thanks,
> >
> > Alex  
> 
> Couldn't the PCI resource allocator prevent this? It first finds an
> empty slot in the resource tree, then tries to request that
> resource in allocate_resource() when a PCI device is probed.
> And I'd like to know why a PCI device driver would attempt to
> call request_resource()? Should this be done in PCI enumeration?

Hi Yongji,

Looks like most pci drivers call pci_request_regions().  From there the
call path is:

pci_request_selected_regions
  __pci_request_selected_regions
    __pci_request_region
      __request_mem_region
        __request_region
          __request_resource

We see this driver ordering issue sometimes with users attempting to
blacklist native pci drivers, trying to leave a device free for use by
vfio-pci.  If the device is a graphics card, the generic vesa or uefi
driver can request device resources causing a failure when vfio-pci
tries to request those same resources.  I expect that unless it's a
boot device, like vga in my example, the resources are not enabled
until the driver opens the device, therefore the request_resource() call
doesn't occur until that point.

For another trivial example, look at /proc/iomem as you load and unload
a driver, on my laptop with e1000e unloaded I see:

  e120-e121 : :00:19.0
  e123e000-e123efff : :00:19.0

When e1000e is loaded, each of these becomes claimed by the e1000e
driver:

  e120-e121 : :00:19.0
e120-e121 : e1000e
  e123e000-e123efff : :00:19.0
e123e000-e123efff : e1000e

Clearly pci core knows the resource is associated with the device, but
I don't think we're tapping into that with request_resource(), we're
just potentially stealing resources that another driver might have
claimed otherwise as I described above.  That's my suspicion at
least, feel free to show otherwise if it's incorrect.  Thanks,

Alex
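
To make the concern concrete, here is a small userspace sketch of the
arithmetic in the quoted patch: the page-alignment test, the range the dummy
resource would cover, and an overlap check showing how a later request for an
address inside that range would conflict. The 4 KiB page size, struct range
and all helper names are assumptions made for this illustration; none of them
are kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL		/* assumed 4 KiB pages */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

struct range {			/* inclusive, like struct resource start/end */
	uint64_t start;
	uint64_t end;
};

/* Mirrors the patch's test: a sub-page BAR must start on a page boundary. */
static bool bar_is_page_aligned(struct range bar)
{
	return !(bar.start & ~SKETCH_PAGE_MASK);
}

/* Remainder of the page after the BAR, as in the patch's dummy resource. */
static struct range dummy_for(struct range bar)
{
	struct range d = { bar.end + 1, bar.start + SKETCH_PAGE_SIZE - 1 };

	return d;
}

/* Two inclusive ranges conflict if they share at least one address. */
static bool ranges_conflict(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

With a 256-byte BAR at a hypothetical address 0x10000, the dummy range is
0x10100-0x10fff. Any later request that falls inside it conflicts, and which
side loses depends only on who asked first.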


Re: [PATCH] powerpc/mm: update arch_{add,remove}_memory() for radix

2016-06-23 Thread Balbir Singh


On 24/06/16 03:17, Aneesh Kumar K.V wrote:
> Reza Arbab  writes:
> 
>> These functions are making direct calls to the hash table APIs,
>> leading to a BUG() on systems using radix.
>>
>> Switch them to the vmemmap_{create,remove}_mapping() wrappers, and
>> move to the __meminit section.
> 
> 
> They are really not the same. They can possibly end up using different
> base page size. Also vmemmap is available only with SPARSEMEM_VMEMMAP
> enabled. Does hotplug depend on sparsemem vmemmap ?

# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG

We depend on sparsemem for sure. vmemmap is just a way of getting the memory
virtually mapped. From the patch perspective, I think we need the equivalent of
just mapping the pages in the kernel. The address may differ based on whether
vmemmap is used or not and, of course, on page_size.

Balbir Singh


Re: [PATCH 03/14] thermal: Add support for sun8i THS on Allwinner H3

2016-06-23 Thread Chen-Yu Tsai
Hi,

On Fri, Jun 24, 2016 at 3:20 AM,   wrote:
> From: Ondrej Jirman 
>

The subject could read:

  thermal: sun8i_ths: Add support for the thermal sensor on Allwinner H3

> This patch adds support for the sun8i thermal sensor on
> Allwinner H3 SoC.
>
> Signed-off-by: Ondřej Jirman 
> ---
>  drivers/thermal/Kconfig     |   7 ++
>  drivers/thermal/Makefile    |   1 +
>  drivers/thermal/sun8i_ths.c | 295
>  3 files changed, 303 insertions(+)
>  create mode 100644 drivers/thermal/sun8i_ths.c
>
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 2d702ca..3de0f8d 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -351,6 +351,13 @@ config MTK_THERMAL
>   Enable this option if you want to have support for thermal management
>   controller present in Mediatek SoCs
>
> +config SUN8I_THS
> +   tristate "sun8i THS driver"

Explain THS.

> +   depends on MACH_SUN8I
> +   depends on OF
> +   help
> + Enable this to support thermal reporting on some newer Allwinner SoCs.
> +
>  menu "Texas Instruments thermal drivers"
>  depends on ARCH_HAS_BANDGAP || COMPILE_TEST
>  depends on HAS_IOMEM
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 10b07c1..7261ee8 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -51,3 +51,4 @@ obj-$(CONFIG_TEGRA_SOCTHERM)  += tegra/
>  obj-$(CONFIG_HISI_THERMAL) += hisi_thermal.o
>  obj-$(CONFIG_MTK_THERMAL)  += mtk_thermal.o
>  obj-$(CONFIG_GENERIC_ADC_THERMAL)  += thermal-generic-adc.o
> +obj-$(CONFIG_SUN8I_THS)            += sun8i_ths.o
> diff --git a/drivers/thermal/sun8i_ths.c b/drivers/thermal/sun8i_ths.c
> new file mode 100644
> index 000..618ccc3
> --- /dev/null
> +++ b/drivers/thermal/sun8i_ths.c
> @@ -0,0 +1,295 @@
> +/*
> + * sun8i THS driver

Explain THS.

> + *
> + * Copyright (C) 2016 Ondřej Jirman
> + * Based on the work of Josef Gajdusek 
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define THS_H3_CTRL0       0x00
> +#define THS_H3_CTRL2       0x40
> +#define THS_H3_INT_CTRL    0x44
> +#define THS_H3_STAT        0x48
> +#define THS_H3_FILTER      0x70
> +#define THS_H3_CDATA       0x74
> +#define THS_H3_DATA        0x80
> +
> +#define THS_H3_CTRL0_SENSOR_ACQ0_OFFS   0
> +#define THS_H3_CTRL0_SENSOR_ACQ0(x) \
> +((x) << THS_H3_CTRL0_SENSOR_ACQ0_OFFS)
> +#define THS_H3_CTRL2_SENSE_EN_OFFS  0
> +#define THS_H3_CTRL2_SENSE_EN \
> +BIT(THS_H3_CTRL2_SENSE_EN_OFFS)
> +#define THS_H3_CTRL2_SENSOR_ACQ1_OFFS   16
> +#define THS_H3_CTRL2_SENSOR_ACQ1(x) \
> +((x) << THS_H3_CTRL2_SENSOR_ACQ1_OFFS)
> +
> +#define THS_H3_INT_CTRL_DATA_IRQ_EN_OFFS    8
> +#define THS_H3_INT_CTRL_DATA_IRQ_EN \
> +   BIT(THS_H3_INT_CTRL_DATA_IRQ_EN_OFFS)
> +#define THS_H3_INT_CTRL_THERMAL_PER_OFFS    12
> +#define THS_H3_INT_CTRL_THERMAL_PER(x) \
> +   ((x) << THS_H3_INT_CTRL_THERMAL_PER_OFFS)
> +
> +#define THS_H3_STAT_DATA_IRQ_STS_OFFS   8
> +#define THS_H3_STAT_DATA_IRQ_STS \
> +BIT(THS_H3_STAT_DATA_IRQ_STS_OFFS)
> +
> +#define THS_H3_FILTER_TYPE_OFFS 0
> +#define THS_H3_FILTER_TYPE(x) \
> +((x) << THS_H3_FILTER_TYPE_OFFS)
> +#define THS_H3_FILTER_EN_OFFS   2
> +#define THS_H3_FILTER_EN \
> +BIT(THS_H3_FILTER_EN_OFFS)

Is it really necessary to split the lines of all the macros?
It makes it harder to find and read stuff.

You're also not using any of the *_OFFS macros in the actual code,
so just drop them.
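
For instance, with the *_OFFS indirection folded in, the definitions could
shrink to one line each. This is only a sketch: the shift and bit positions
are taken from the quoted patch, BIT() is redefined locally so the fragment
builds outside the kernel, and ths_filter_value() is a hypothetical helper
added for illustration.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() so this builds in userspace. */
#define BIT(n) (1UL << (n))

#define THS_H3_CTRL0_SENSOR_ACQ0(x)	((x) << 0)
#define THS_H3_CTRL2_SENSE_EN		BIT(0)
#define THS_H3_CTRL2_SENSOR_ACQ1(x)	((x) << 16)
#define THS_H3_INT_CTRL_DATA_IRQ_EN	BIT(8)
#define THS_H3_INT_CTRL_THERMAL_PER(x)	((x) << 12)
#define THS_H3_STAT_DATA_IRQ_STS	BIT(8)
#define THS_H3_FILTER_TYPE(x)		((x) << 0)
#define THS_H3_FILTER_EN		BIT(2)

/* Example: filter register value with averaging type 2 and filter enabled. */
static uint32_t ths_filter_value(void)
{
	return (uint32_t)(THS_H3_FILTER_EN | THS_H3_FILTER_TYPE(2));
}
```

Each field is then visible on a single line, which is easier to grep and to
check against the datasheet.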

> +
> +#define THS_H3_CLK_IN 4000  /* Hz */
> +#define THS_H3_DATA_PERIOD 330  /* ms */
> +
> +#define THS_H3_FILTER_TYPE_VALUE   2  /* average over 2^(n+1) samples */
> +#define THS_H3_FILTER_DIV  (1 << (THS_H3_FILTER_TYPE_VALUE + 1))
> +#define THS_H3_INT_CTRL_THERMAL_PER_VALUE \
> +   (THS_H3_DATA_PERIOD * (THS_H3_CLK_IN / 1000) / THS_H3_FILTER_DIV / 4096 - 1)
> +#define THS_H3_CTRL0_SENSOR_ACQ0_VALUE 0x3f /* 16us */
> +#define THS_H3_CTRL2_SENSOR_ACQ1_VALUE 0x3f
> +
> +struct sun8i_ths_data {
> +   struct reset_control *reset;
> +   struct clk *clk;
> +   struct clk *busclk;
> +   void __iomem *regs;
> +   struct nvmem_cell *calcell;
> +   struct platform_device *pdev;
> +   struct 
