
[PATCH v2 0/3] ntb: Asynchronous NTB devices support

2016-07-28 Thread Serge Semin

Please find the general patchset description in the cover letter of the first
patchset (see the very first message in the thread).

Changes in v2:
 - Fix sparc64 compilation warning in drivers/ntb/hw/idt/ntb_hw_idt.c :
   warning: right shift count >= width of type
 - Fix sparc64 compilation warnings in drivers/ntb/test/ntb_mw_test.c :
   warning: right shift count >= width of type
   warning: cast to pointer from integer of different size

Thanks,

=
Serge V. Semin
Leading Programmer
Embedded SW development group
T-platforms
=

Signed-off-by: Serge Semin 

fancer (3):
  ntb: Add asynchronous devices support to NTB-bus interface
  ntb: IDT 89HPES*NT* PCIe-switches NTB device driver
  ntb: Test client drivers for asynchronous NTB devices

 drivers/ntb/Kconfig                    |    4 +-
 drivers/ntb/hw/Kconfig                 |    1 +
 drivers/ntb/hw/Makefile                |    6 +-
 drivers/ntb/hw/amd/ntb_hw_amd.c        |   49 +-
 drivers/ntb/hw/idt/Kconfig             |   21 +
 drivers/ntb/hw/idt/Makefile            |    5 +
 drivers/ntb/hw/idt/ntb_hw_idt.c        | 4050
 drivers/ntb/hw/idt/ntb_hw_idt.h        |  390 +++
 drivers/ntb/hw/idt/ntb_hw_idt_quirks.c |  163 ++
 drivers/ntb/hw/idt/ntb_hw_idt_quirks.h |  114 +
 drivers/ntb/hw/idt/ntb_hw_idt_regmap.h |  877 +++
 drivers/ntb/hw/intel/ntb_hw_intel.c    |   59 +-
 drivers/ntb/ntb.c                      |   86 +-
 drivers/ntb/ntb_transport.c            |   19 +-
 drivers/ntb/test/Kconfig               |   32 +
 drivers/ntb/test/Makefile              |    9 +-
 drivers/ntb/test/ntb_db_test.c         |  677 ++
 drivers/ntb/test/ntb_msg_test.c        |  736 ++
 drivers/ntb/test/ntb_mw_test.c         | 1539
 drivers/ntb/test/ntb_perf.c            |   16 +-
 drivers/ntb/test/ntb_pingpong.c        |    5 +
 drivers/ntb/test/ntb_tool.c            |   25 +-
 include/linux/ntb.h                    |  600 -
 23 files changed, 9317 insertions(+), 166 deletions(-)
 create mode 100644 drivers/ntb/hw/idt/Kconfig
 create mode 100644 drivers/ntb/hw/idt/Makefile
 create mode 100644 drivers/ntb/hw/idt/ntb_hw_idt.c
 create mode 100644 drivers/ntb/hw/idt/ntb_hw_idt.h
 create mode 100644 drivers/ntb/hw/idt/ntb_hw_idt_quirks.c
 create mode 100644 drivers/ntb/hw/idt/ntb_hw_idt_quirks.h
 create mode 100644 drivers/ntb/hw/idt/ntb_hw_idt_regmap.h
 create mode 100644 drivers/ntb/test/ntb_db_test.c
 create mode 100644 drivers/ntb/test/ntb_msg_test.c
 create mode 100644 drivers/ntb/test/ntb_mw_test.c

-- 
2.6.6

[PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-07-28 Thread Serge Semin

Currently supported AMD and Intel Non-transparent PCIe-bridges are synchronous
devices, so the translated base address of memory windows can be directly
written to peer registers. But there are some IDT PCIe-switches which implement
complex interfaces using Lookup Tables of translation addresses. Due to
the way the table is accessed, it cannot be done synchronously from different
RCs, which is why an asynchronous interface has to be developed.

For this purpose the Memory Window related interface is split in the same way
as it is for Doorbell and Scratchpad registers. A Memory Window is defined as
follows: "It is a virtual memory region, which locally reflects a physical
memory of peer device." In other words, the "ntb_peer_mw_"-prefixed methods
control the peer's memory windows, while the "ntb_mw_"-prefixed functions work
with the local memory windows.
Here is the description of the Memory Window related NTB-bus callback
functions:
 - ntb_mw_count() - number of local memory windows.
 - ntb_mw_get_maprsc() - get the physical address and size of the local memory
   window to map.
 - ntb_mw_set_trans() - set translation address of local memory window (this
   address should be somehow retrieved from a peer).
 - ntb_mw_get_trans() - get translation address of local memory window.
 - ntb_mw_get_align() - get alignment of translated base address and size of
   local memory window. Additionally one can get the upper size limit of the
   memory window.
 - ntb_peer_mw_count() - number of peer memory windows (it can differ from the
   local number).
 - ntb_peer_mw_set_trans() - set translation address of peer memory window.
 - ntb_peer_mw_get_trans() - get translation address of peer memory window.
 - ntb_peer_mw_get_align() - get alignment of translated base address and size
   of peer memory window. Additionally one can get the upper size limit of the
   memory window.

As one can see, the current AMD and Intel NTB drivers mostly implement the
"ntb_peer_mw_"-prefixed methods, so this patch renames the driver functions
correspondingly. The IDT NTB driver mostly exposes "ntb_mw_"-prefixed methods,
since it doesn't have convenient access to the peer Lookup Table.

To pass information from one RC to another, the NTB functions of IDT
PCIe-switches implement a Messaging subsystem. They currently support four
message registers to transfer DWORD-sized data to a specified peer. So two
new callback methods are introduced:
 - ntb_msg_size() - get the number of DWORDs supported by the NTB function to
   send and receive messages.
 - ntb_msg_post() - send a message of the size retrieved from ntb_msg_size()
   to a peer.
Additionally there is a new event function:
 - ntb_msg_event() - it is invoked when either a new message was received
   (NTB_MSG_NEW), the last message was successfully sent (NTB_MSG_SENT), or
   the last message failed to be sent (NTB_MSG_FAIL).
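Since the messaging interface moves data in DWORD units, a client has to pack its payload into 32-bit words. Below is a minimal user-space sketch, assuming four message registers (the count given above for IDT switches) and a least-significant-byte-first order chosen purely for illustration. MSG_DWORDS, msg_pack() and msg_unpack() are hypothetical names, not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_DWORDS 4 /* assumed value of ntb_msg_size() for IDT switches */

/*
 * Pack up to MSG_DWORDS * 4 bytes of text into DWORDs, least significant
 * byte first, zero-filling unused bytes. Longer text is truncated.
 * Returns the number of DWORDs that carry data.
 */
static size_t msg_pack(const char *text, uint32_t dw[MSG_DWORDS])
{
	size_t len, i;

	assert(text != NULL);
	len = strlen(text);
	if (len > MSG_DWORDS * 4)
		len = MSG_DWORDS * 4;
	memset(dw, 0, MSG_DWORDS * sizeof(dw[0]));
	for (i = 0; i < len; i++)
		dw[i / 4] |= (uint32_t)(unsigned char)text[i] << (8 * (i % 4));
	return (len + 3) / 4;
}

/* Inverse of msg_pack(); out must have room for MSG_DWORDS * 4 + 1 bytes. */
static void msg_unpack(const uint32_t dw[MSG_DWORDS], char *out)
{
	size_t i;

	for (i = 0; i < MSG_DWORDS * 4; i++)
		out[i] = (char)((dw[i / 4] >> (8 * (i % 4))) & 0xff);
	out[MSG_DWORDS * 4] = '\0';
}
```

Packing "ping" fills only the first DWORD (0x676e6970), which the receiving side turns back into the same string after the new-message event.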

The last change concerns the IDs (practically names) of NTB-devices on the
NTB-bus. It is not good to have devices with the same names in the system,
and it prevents my IDT NTB driver from being loaded =) So I developed a simple
NTB device naming algorithm. In particular, it generates names "ntbS{N}" for
synchronous devices, "ntbA{N}" for asynchronous devices, and "ntbAS{N}" for
devices supporting both interfaces.
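The naming rule can be sketched as a small formatting helper. This is an illustration only: ntb_dev_name() is a hypothetical function, and the index N is supplied by the caller because the description above does not say how the patch allocates it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an NTB device name from its interface
 * capabilities. The caller supplies the index N.
 */
static int ntb_dev_name(char *buf, size_t len, bool sync, bool async,
			unsigned int idx)
{
	const char *type;

	assert(sync || async); /* a device supports at least one interface */
	if (sync && async)
		type = "AS";
	else if (async)
		type = "A";
	else
		type = "S";
	return snprintf(buf, len, "ntb%s%u", type, idx);
}
```

Keeping the interface type in the name means a synchronous and an asynchronous device with the same index no longer collide.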

Signed-off-by: Serge Semin 

---
 drivers/ntb/Kconfig |   4 +-
 drivers/ntb/hw/amd/ntb_hw_amd.c |  49 ++-
 drivers/ntb/hw/intel/ntb_hw_intel.c |  59 +++-
 drivers/ntb/ntb.c   |  86 +-
 drivers/ntb/ntb_transport.c |  19 +-
 drivers/ntb/test/ntb_perf.c |  16 +-
 drivers/ntb/test/ntb_pingpong.c |   5 +
 drivers/ntb/test/ntb_tool.c |  25 +-
 include/linux/ntb.h | 600 +---
 9 files changed, 701 insertions(+), 162 deletions(-)

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index 95944e5..67d80c4 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -14,8 +14,6 @@ if NTB
 
 source "drivers/ntb/hw/Kconfig"
 
-source "drivers/ntb/test/Kconfig"
-
 config NTB_TRANSPORT
tristate "NTB Transport Client"
help
@@ -25,4 +23,6 @@ config NTB_TRANSPORT
 
 If unsure, say N.
 
+source "drivers/ntb/test/Kconfig"
+
 endif # NTB
diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 6ccba0d..ab6f353 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ntb_hw_amd.h"
@@ -84,11 +85,8 @@ static int amd_

[PATCH v2 3/3] ntb: Test client drivers for asynchronous NTB devices

2016-07-28 Thread Serge Semin

There are three drivers to independently test all interfaces implemented by
the IDT 89HPES*NT* NTB driver.

Doorbells are tested by the new NTB Doorbell Pingpong client driver. It
implements the algorithm of the same name. The driver starts by setting the peer
doorbell corresponding to the last locally set doorbell bit. If no local
doorbell has been set, it sets the very first bit. After that the driver
unmasks the events of the just set doorbell bit and waits until the peer sets
the same doorbell. When the peer does it, the local driver iterates to the next
doorbell bit and starts a delayed work thread, which will set the corresponding
bit and perform the doorbell bit unmasking on waking up.
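The bit selection the Pingpong client performs can be sketched with a few pure helpers. This is a user-space illustration, not the driver code: the helper names are hypothetical, "last locally set doorbell bit" is assumed to mean the highest-numbered set bit, and the waiting and delayed-work parts are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Starting bit: the highest set bit of the local doorbell register, or
 * bit 0 if no doorbell has been set yet (assumption: "last" means the
 * highest-numbered bit).
 */
static unsigned int db_start_bit(uint64_t db_bits)
{
	unsigned int bit = 0;

	while (db_bits >>= 1)
		bit++;
	return bit;
}

/* Mask every valid doorbell bit except @bit (a set bit masks the event). */
static uint64_t db_mask_all_but(unsigned int bit, unsigned int db_count)
{
	uint64_t valid;

	assert(db_count > 0 && db_count <= 64 && bit < db_count);
	valid = db_count == 64 ? ~0ULL : (1ULL << db_count) - 1;
	return valid & ~(1ULL << bit);
}

/* Move to the next doorbell bit, wrapping around after db_count bits. */
static unsigned int db_next_bit(unsigned int bit, unsigned int db_count)
{
	assert(db_count > 0 && bit < db_count);
	return (bit + 1) % db_count;
}
```

With four doorbells, a round that starts at bit 3 unmasks only that bit (mask 0x7), and after the peer answers it wraps around to bit 0.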

The Messaging subsystem can be tested by the client driver implementing a simple
transmission/reception algorithm. A message can be sent to a peer by writing
data to the /sys/kernel/debug/ntb_msg_test/ntbA{N}/data file. The peer can then
read it from the same file.

The Memory Windows test driver implements a simple write/read algorithm. The
driver allocates a predefined number of local buffers (inbound memory windows,
inwndw{N}). In order to get a translated base address, the driver sends a
corresponding command to the peer. Then the driver initializes the outbound
memory windows (outwndw{N}). The read/write operations can be performed using
the following debug nodes:
/sys/kernel/debug/ntb_mw_test/ntbA{N}/inwndw{N}
/sys/kernel/debug/ntb_mw_test/ntbA{N}/outwndw{N}

Signed-off-by: Serge Semin 

---
 drivers/ntb/test/Kconfig|   32 +
 drivers/ntb/test/Makefile   |9 +-
 drivers/ntb/test/ntb_db_test.c  |  677 +
 drivers/ntb/test/ntb_msg_test.c |  736 +++
 drivers/ntb/test/ntb_mw_test.c  | 1539 +++
 5 files changed, 2991 insertions(+), 2 deletions(-)
 create mode 100644 drivers/ntb/test/ntb_db_test.c
 create mode 100644 drivers/ntb/test/ntb_msg_test.c
 create mode 100644 drivers/ntb/test/ntb_mw_test.c

diff --git a/drivers/ntb/test/Kconfig b/drivers/ntb/test/Kconfig
index a5d0eda..80f5058 100644
--- a/drivers/ntb/test/Kconfig
+++ b/drivers/ntb/test/Kconfig
@@ -25,3 +25,35 @@ config NTB_PERF
 to and from the window without additional software interaction.
 
 If unsure, say N.
+
+config NTB_DB_TEST
+   tristate "NTB Doorbell Test Client"
+   help
+This is a driver to test the doorbell subsystem of NTB bus devices.
+The design is similar to the ping pong although it exchanges the
+doorbell bits one-by-one, waiting for the peer response before moving
+to the next doorbell.
+
+If unsure, say N.
+
+config NTB_MSG_TEST
+   tristate "NTB Messaging Test Client"
+   help
+This is a driver to test the messaging subsystem of NTB. It just creates
+one file in the DebugFS for each NTB device of asynchronous
+architecture. In order to send a message one can just write text to
+the file. It will be immediately sent to the peer, so the user can get it
+by reading from the corresponding file.
+
+If unsure, say N.
+
+config NTB_MW_TEST
+   tristate "NTB Memory Windows Test Client"
+   help
+This is a driver to test memory sharing amongst devices. It creates a
+set of files in the DebugFS, some of which are used to write text to
+outbound memory windows, while others can be used to read data written
+by the peer to our inbound memory window.
+
+If unsure, say N.
+
diff --git a/drivers/ntb/test/Makefile b/drivers/ntb/test/Makefile
index 9e77e0b..6ea6db4 100644
--- a/drivers/ntb/test/Makefile
+++ b/drivers/ntb/test/Makefile
@@ -1,3 +1,8 @@
+# Synchronous hardware clients (Intel/AMD)
 obj-$(CONFIG_NTB_PINGPONG) += ntb_pingpong.o
-obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
-obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
+obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+# Asynchronous hardware clients (IDT)
+obj-$(CONFIG_NTB_DB_TEST)  += ntb_db_test.o
+obj-$(CONFIG_NTB_MSG_TEST) += ntb_msg_test.o
+obj-$(CONFIG_NTB_MW_TEST)  += ntb_mw_test.o
diff --git a/drivers/ntb/test/ntb_db_test.c b/drivers/ntb/test/ntb_db_test.c
new file mode 100644
index 000..e93c0c6
--- /dev/null
+++ b/drivers/ntb/test/ntb_db_test.c
@@ -0,0 +1,677 @@
+/*
+ *   This file is provided under a GPLv2 license.  When using or
+ *   redistributing this file, you may do so under that license.
+ *
+ *   GPL LICENSE SUMMARY
+ *
+ *   Copyright (C) 2016 T-Platforms All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or modify it
+ *   under the terms and conditions of the GNU General Public License,
+ *   version 2, as published by the Free Software Foundation.
+ *
+ *   This program is distributed in the hope that it will be useful, but WITHOUT
+ *   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *   FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ *

Re: [PATCH v2 0/3] ntb: Asynchronous NTB devices support

2016-07-28 Thread Serge Semin

Hello, Allen.
Thanks for the message. I see your point. Yes, I've seen a lot of harsh
threads on lkml.org, so it's not my intention to argue about basic things
like Coding Style. That's why I left most of the warnings open to discussion.
While you are digging into Patch 1/3, I'll do my best to fix the checkpatch
warnings in the rest of the code. Regarding the last checkpatch error message,
I need to spend some more time to find a way to make the code free of it. I
hope I'll come up with something good; at least I'll give it a try. Otherwise
I'll have to redesign the driver regmap subsystem.(

Regards,
-Sergey

On Thu, Jul 28, 2016 at 10:42:30AM -0400, Allen Hubbe wrote:
> From: Serge Semin
> > Please, find the general patchset description in the cover letter of the first
> > patchset (see the very first message in thread).
> > 
> > Changes in v2:
> >  - Fix sparc64 compilation warning in drivers/ntb/hw/idt/ntb_hw_idt.c :
> >warning: right shift count >= width of type
> >  - Fix sparc64 compilation warnings in drivers/ntb/test/ntb_mw_test.c :
> >warning: right shift count >= width of type
> >warning: cast to pointer from integer of different size
> 
> Thanks for reacting to the test robot so quickly.  Since nobody else has 
> responded yet, I would like to assure you that the patches are not being 
> ignored.  Please be patient.  The IDT driver will be a valuable contribution 
> to the ntb subsystem.  I am working carefully through patch 1/3 first, since 
> it affects existing drivers and interface.
> 
> A word of caution regarding your statement, "There are a some types of 
> checkpatch warnings I left unfixed."  Coding style can be a touchy subject, 
> leading to some recent rants^H^H^H^H^Hdiscussion on some of the same topics 
> that are included in that list of unfixed warnings.  Be prepared to adhere to 
> the style guide, even if it is inconvenient and against your own logic, 
> because that is almost always the easier and more practical approach than 
> asking for changes or exceptions, and better for your mental health not to be 
> on the To: list of something like https://lkml.org/lkml/2016/7/8/625.
> 
> "Of course all of these warnings are discussable, except the last one."  Be 
> prepared, even if it will require significant changes to the code.  For 
> really inconvenient changes, we can talk about other more readily acceptable 
> approaches to keep the code short and elegant, as is obviously your intent.  
> Please be patient with the review.
>

Re: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-07 Thread Serge Semin

allocate memory and send the
address back using some hardware mechanism. It can be anything: Scratchpad
registers, Message registers or even "crazy" doorbell bit-banging. For
instance, the IDT switches of the first group support:
1) Shared Memory windows. In particular, the local root complex can set a
translated base address to BARs of the local and peer NT-functions using the
cross-coupled PCIe/NTB configuration space, the same way as it can be done for
AMD/Intel NTBs.
2) One Doorbell register.
3) Two Scratchpads.
4) Four message registers.
As you can see, the switches of the first group can be considered both
synchronous and asynchronous. All of the NTB bus kernel API can be implemented
for them, including the changes introduced by this patch (I would do it if I had
the corresponding hardware). AMD and Intel NTBs can be considered both
synchronous and asynchronous as well, although they don't support messaging, so
Scratchpads can be used to send data to a peer. Finally, the switches of the
second group lack the ability to initialize the translated base address of peer
BARs due to the race condition I described before.

To sum up, I've spent a lot of time designing the IDT NTB driver. I've done my
best to make the IDT driver as compatible with the current design as possible;
nevertheless, the NTB bus kernel API had to be slightly changed. You can find
answers to the comments below.

On Fri, Aug 05, 2016 at 11:31:58AM -0400, Allen Hubbe wrote:
> From: Serge Semin
> > Currently supported AMD and Intel Non-transparent PCIe-bridges are synchronous
> > devices, so translated base address of memory windows can be direcly written
> > to peer registers. But there are some IDT PCIe-switches which implement
> > complex interfaces using Lookup Tables of translation addresses. Due to
> > the way the table is accessed, it can not be done synchronously from different
> > RCs, that's why the asynchronous interface should be developed.
> > 
> > For these purpose the Memory Window related interface is correspondingly split
> > as it is for Doorbell and Scratchpad registers. The definition of Memory Window
> > is following: "It is a virtual memory region, which locally reflects a physical
> > memory of peer device." So to speak the "ntb_peer_mw_"-prefixed methods control
> > the peers memory windows, "ntb_mw_"-prefixed functions work with the local
> > memory windows.
> > Here is the description of the Memory Window related NTB-bus callback
> > functions:
> >  - ntb_mw_count() - number of local memory windows.
> >  - ntb_mw_get_maprsc() - get the physical address and size of the local memory
> >  window to map.
> >  - ntb_mw_set_trans() - set translation address of local memory window (this
> > address should be somehow retrieved from a peer).
> >  - ntb_mw_get_trans() - get translation address of local memory window.
> >  - ntb_mw_get_align() - get alignment of translated base address and size of
> > local memory window. Additionally one can get the
> > upper size limit of the memory window.
> >  - ntb_peer_mw_count() - number of peer memory windows (it can differ from the
> >  local number).
> >  - ntb_peer_mw_set_trans() - set translation address of peer memory window
> >  - ntb_peer_mw_get_trans() - get translation address of peer memory window
> >  - ntb_peer_mw_get_align() - get alignment of translated base address and size
> >  of peer memory window.Additionally one can get the
> >  upper size limit of the memory window.
> > 
> > As one can see current AMD and Intel NTB drivers mostly implement the
> > "ntb_peer_mw_"-prefixed methods. So this patch correspondingly renames the
> > driver functions. IDT NTB driver mostly expose "ntb_nw_"-prefixed methods,
> > since it doesn't have convenient access to the peer Lookup Table.
> > 
> > In order to pass information from one RC to another NTB functions of IDT
> > PCIe-switch implement Messaging subsystem. They currently support four message
> > registers to transfer DWORD sized data to a specified peer. So there are two
> > new callback methods are introduced:
> >  - ntb_msg_size() - get the number of DWORDs supported by NTB function to send
> > and receive messages
> >  - ntb_msg_post() - send message of size retrieved from ntb_msg_size()
> > to a peer
> > Additio

[PATCH] mips: mm: Discard ioremap_cacheable_cow() method

2018-07-20 Thread Serge Semin

This macro substitution is a shortcut to map cacheable IO memory
with coherent and write-back attributes. Since it is entirely unused
by the kernel, let's just remove it.

Signed-off-by: Serge Semin 
Suggested-by: Christoph Hellwig 
CC: Paul Burton 
Cc: James Hogan 
Cc: Ralf Baechle 
Cc: Sinan Kaya 
Cc: Huacai Chen 
Cc: sergey.se...@t-platforms.ru
Cc: linux-m...@linux-mips.org
Cc: linux-kernel@vger.kernel.org
---
 arch/mips/include/asm/io.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index f613d1df66c0..cd170d920d55 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -300,13 +300,6 @@ static inline void __iomem * __ioremap_mode(phys_addr_t offset, unsigned long si
 #define ioremap_wc(offset, size)   \
__ioremap_mode((offset), (size), boot_cpu_data.writecombine)
 
-/*
- * This is a MIPS specific ioremap variant. ioremap_cacheable_cow
- * requests a cachable mapping with CWB attribute enabled.
- */
-#define ioremap_cacheable_cow(offset, size)\
-   __ioremap_mode((offset), (size), _CACHE_CACHABLE_COW)
-
 static inline void iounmap(const volatile void __iomem *addr)
 {
if (plat_iounmap(addr))
-- 
2.12.0

Re: [PATCH 2/2] mips: mm: Discard ioremap_uncached_accelerated() method

2018-07-11 Thread Serge Semin

Hello Christoph,

On Tue, Jul 10, 2018 at 11:56:31PM -0700, Christoph Hellwig wrote:
> > + * This is a MIPS specific ioremap variant. ioremap_cacheable_cow
> > + * requests a cachable mapping with CWB attribute enabled.
> >   */
> >  #define ioremap_cacheable_cow(offset, size)	\
> > __ioremap_mode((offset), (size), _CACHE_CACHABLE_COW)
> 
> This isn't actually used anywhere in the kernel tree.  Please remove it
> as well.

I don't really know whether it is necessary at this point. We discarded the
ioremap_uncached_accelerated() method since an obvious alternative is now
available: ioremap_wc(). ioremap_cacheable_cow(), however, has no such
alternative. So if it were up to me, I'd leave it here. Anyway, if the subsystem
maintainers think otherwise, I won't refuse to submit a patch removing this
method.

Regards,
-Sergey

[PATCH] ntb: idt: Set PCIe bus address to BARLIMITx

2018-07-11 Thread Serge Semin

The IDT NTB driver sets the upper limit of the actual translation address
being set to the corresponding memory window. This is achieved by
BARLIMITx register initialization. Note that the register works
within the PCIe bus address space.

In general, CPU and PCIe address spaces are different. This means
that the addresses used for Memory TLP routing can differ from
CPU addresses. While in most cases they are the same, there are
exceptions when the proper mapping must be performed to have
portable driver code. There used to be a virt_to_bus()/bus_to_virt()
interface for this purpose, but it's deprecated now. It was also a
mistake to use pci_resource_start(), since the address it returns
belongs to the CPU address space. To achieve the desired purpose
we need to use pcibios_resource_to_bus(). This method returns
the PCIe bus address region of the corresponding BAR resource.

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index dbe72f116017..0f4f5e7e4ff8 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -1311,6 +1311,7 @@ static int idt_ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
/* DIR and LUT based translations are initialized differently */
if (mw_cfg->type == IDT_MW_DIR) {
const struct idt_ntb_bar *bar = &ntdata_tbl.bars[mw_cfg->bar];
+   struct pci_bus_region region;
u64 limit;
/* Set destination partition of translation */
data = idt_nt_read(ndev, bar->setup);
@@ -1320,7 +1321,9 @@ static int idt_ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
idt_nt_write(ndev, bar->ltbase, (u32)addr);
idt_nt_write(ndev, bar->utbase, (u32)(addr >> 32));
/* Set the custom BAR aperture limit */
-   limit = pci_resource_start(ntb->pdev, mw_cfg->bar) + size;
+   pcibios_resource_to_bus(ntb->pdev->bus, &region,
+   &ntb->pdev->resource[mw_cfg->bar]);
+   limit = region.start + size;
idt_nt_write(ndev, bar->limit, (u32)limit);
if (IS_FLD_SET(BARSETUP_TYPE, data, 64))
idt_nt_write(ndev, (bar + 1)->limit, (limit >> 32));
-- 
2.12.0

[PATCH v2] ntb: idt: Set PCIe bus address to BARLIMITx

2018-07-11 Thread Serge Semin

The IDT NTB driver sets the upper limit of the actual translation address
being written to the corresponding memory window setup. This is achieved
by BARLIMITx register initialization. Note that the register
works within the PCIe bus address space.

In general, CPU and PCIe address spaces are different. This means
that the addresses used for Memory TLP routing can differ from
CPU addresses. While in most cases they are the same, there are
exceptions when the proper mapping must be performed to have
portable driver code. There used to be a virt_to_bus()/bus_to_virt()
interface for this purpose, but it's deprecated now. It was also a
mistake to use pci_resource_start(), since the address it returns
belongs to the CPU address space. To achieve the desired purpose
we need to use the pci_bus_address() helper. This method returns
the PCIe bus base address of the corresponding BAR resource.
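The arithmetic behind the fix can be illustrated in user space. In this minimal sketch, struct bar_res and its bus_offset field are hypothetical stand-ins for a host bridge that translates CPU addresses to bus addresses by a fixed offset (bus = CPU - offset). On platforms where the offset is zero both calls agree, which is why the difference is easy to miss. The limit is split into the lower and upper 32-bit values the same way the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical BAR description: CPU-side start and host bridge offset. */
struct bar_res {
	uint64_t cpu_start;  /* models what pci_resource_start() reports */
	uint64_t bus_offset; /* CPU address minus PCIe bus address */
};

/* Bus-side start of the BAR, i.e. what pci_bus_address() would report. */
static uint64_t bar_bus_start(const struct bar_res *r)
{
	assert(r->cpu_start >= r->bus_offset);
	return r->cpu_start - r->bus_offset;
}

/* Compute the aperture limit and split it into BARLIMIT register values. */
static void bar_limit_regs(const struct bar_res *r, uint64_t size,
			   uint32_t *lo, uint32_t *hi)
{
	uint64_t limit = bar_bus_start(r) + size;

	*lo = (uint32_t)limit;         /* this BAR's limit register */
	*hi = (uint32_t)(limit >> 32); /* next BAR's limit for a 64-bit BAR */
}
```

With a CPU start of 0x2080000000 and an offset of 0x1000000000, a 1 MiB window yields a bus-space limit of 0x1080100000, whereas using the CPU address would program 0x2080100000 and place the aperture outside the BAR.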

Signed-off-by: Serge Semin 

---

Changelog v2:
- Replace pcibios_resource_to_bus() with pci_bus_address() helper.

 drivers/ntb/hw/idt/ntb_hw_idt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index dbe72f116017..fb2c44ac9c69 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -1320,7 +1320,7 @@ static int idt_ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
idt_nt_write(ndev, bar->ltbase, (u32)addr);
idt_nt_write(ndev, bar->utbase, (u32)(addr >> 32));
/* Set the custom BAR aperture limit */
-   limit = pci_resource_start(ntb->pdev, mw_cfg->bar) + size;
+   limit = pci_bus_address(ntb->pdev, mw_cfg->bar) + size;
idt_nt_write(ndev, bar->limit, (u32)limit);
if (IS_FLD_SET(BARSETUP_TYPE, data, 64))
idt_nt_write(ndev, (bar + 1)->limit, (limit >> 32));
-- 
2.12.0

[PATCH 4/4] ntb: idt: Alter the driver info comments

2018-07-14 Thread Serge Semin

Since the IDT PCIe-switch temperature sensor is now always available
regardless of the EEPROM/BIOS settings, the Kconfig and in-code
descriptions should be altered accordingly. In addition, let's update
the driver copyright lines.

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/Kconfig  |  4 +---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 11 ++-
 drivers/ntb/hw/idt/ntb_hw_idt.h |  2 +-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/hw/idt/Kconfig b/drivers/ntb/hw/idt/Kconfig
index b360e5613b9f..bacffd369494 100644
--- a/drivers/ntb/hw/idt/Kconfig
+++ b/drivers/ntb/hw/idt/Kconfig
@@ -23,9 +23,7 @@ config NTB_IDT
 BAR settings of peer NT-functions, the BAR setups can't be done over
 kernel PCI fixups. That's why the alternative pre-initialization
 techniques like BIOS using SMBus interface or EEPROM should be
-utilized. Additionally if one needs to have temperature sensor
-information printed to system log, the corresponding registers must
-be initialized within BIOS/EEPROM as well.
+utilized.
 
 If unsure, say N.
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 3d48267ae315..d7a4984ed423 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -4,7 +4,7 @@
  *
  *   GPL LICENSE SUMMARY
  *
- *   Copyright (C) 2016 T-Platforms All Rights Reserved.
+ *   Copyright (C) 2016-2018 T-Platforms JSC All Rights Reserved.
  *
  *   This program is free software; you can redistribute it and/or modify it
  *   under the terms and conditions of the GNU General Public License,
@@ -1825,10 +1825,11 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
  *  7. Temperature sensor operations
  *
  *IDT PCIe-switch has an embedded temperature sensor, which can be used to
- * warn a user-space of possible chip overheating. Since workload temperature
- * can be different on different platforms, temperature thresholds as well as
- * general sensor settings must be setup in the framework of BIOS/EEPROM
- * initializations. It includes the actual sensor enabling as well.
+ * check the current chip core temperature. Since the workload environment can
+ * differ between platforms, an offset and ADC/filter settings can be
+ * specified. However, only the offset configuration is exposed to the sysfs
+ * hwmon interface at the moment. The rest of the settings can be adjusted,
+ * for instance, by the BIOS/EEPROM firmware.
  *=
  */
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.h b/drivers/ntb/hw/idt/ntb_hw_idt.h
index 3517cd2e2baa..2f1aa121b0cf 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.h
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.h
@@ -4,7 +4,7 @@
  *
  *   GPL LICENSE SUMMARY
  *
- *   Copyright (C) 2016 T-Platforms All Rights Reserved.
+ *   Copyright (C) 2016-2018 T-Platforms JSC All Rights Reserved.
  *
  *   This program is free software; you can redistribute it and/or modify it
  *   under the terms and conditions of the GNU General Public License,
-- 
2.12.0

[PATCH 1/4] ntb: idt: Alter temperature read method

2018-07-14 Thread Serge Semin

In order to create a hwmon interface for the IDT PCIe-switch temperature
sensor, the already available reader method should be improved. In particular,
we need to redesign it so that one is able to read temperature/offset
values from registers of the passed types. Since the IDT sensor interface
provides temperature in the unsigned 0:7:1 format (7 bits for the integer value
and one for the fraction), we also need helpers to convert the typical sysfs
temperature data type to and from this format. Even though the temperature
offset provided by the IDT PCIe-switch has the same format but is signed,
it can be translated by these methods too.
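In the 0:7:1 format a sample is simply a count of 0.5C steps, so converting to millidegrees is a multiplication by 500. A minimal user-space sketch with hypothetical function names (not the patch's helpers): the signed variant multiplies the sign-extended sample by 500 rather than dividing by 2 first, since C integer division truncates toward zero and would, for example, turn a -0.5C sample (0xFF) into 0 before the fraction bit is added. Clamping to [0; 127.5]C assumes the sensor range quoted in the cover letter.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 0:7:1 sample (bits 7..1 = degrees, bit 0 = 0.5C) to millidegrees. */
static long temp_uval_to_mdeg(uint32_t data)
{
	return (long)(data & 0xff) * 500;
}

/* Signed 0:7:1 sample (two's complement) to millidegrees. */
static long temp_sval_to_mdeg(uint32_t data)
{
	return (long)(int8_t)(data & 0xff) * 500;
}

/*
 * Millidegrees to unsigned 0:7:1, rounding down to a 0.5C step and
 * clamping to the sensor range [0; 127.5]C.
 */
static uint8_t temp_mdeg_to_ufmt(long mdeg)
{
	if (mdeg < 0)
		mdeg = 0;
	else if (mdeg > 127500)
		mdeg = 127500;
	return (uint8_t)(mdeg / 500);
}
```

Round-tripping any 8-bit sample through temp_uval_to_mdeg() and temp_mdeg_to_ufmt() returns the original value, e.g. 0x55 maps to 42500 (42.5C) and back to 0x55.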

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 113 ++--
 drivers/ntb/hw/idt/ntb_hw_idt.h |  56 
 2 files changed, 152 insertions(+), 17 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index c1d03f951b0d..928f37877790 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -1830,22 +1830,99 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
  */
 
 /*
+ * idt_get_deg() - convert millidegree Celsius value to just degree
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: Degree corresponding to the passed millidegree value
+ */
+static inline s8 idt_get_deg(long mdegC)
+{
+   return mdegC / 1000;
+}
+
+/*
+ * idt_get_deg_frac() - retrieve 0/0.5 fraction of the millidegree Celsius value
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: 0/0.5 degree fraction of the passed millidegree value
+ */
+static inline u8 idt_get_deg_frac(long mdegC)
+{
+   return (mdegC % 1000) >= 500 ? 5 : 0;
+}
+
+/*
+ * idt_temp_get_fmt() - convert millidegree Celsius value to 0:7:1 format
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: 0:7:1 format acceptable by the IDT temperature sensor
+ */
+static inline u8 idt_temp_get_fmt(long mdegC)
+{
+   return (idt_get_deg(mdegC) << 1) | (idt_get_deg_frac(mdegC) ? 1 : 0);
+}
+
+/*
+ * idt_get_temp_sval() - convert temp sample to signed millidegree Celsius
+ * @data:  IN - shifted to LSB 8-bits temperature sample
+ *
+ * Return: signed millidegree Celsius
+ */
+static inline long idt_get_temp_sval(u32 data)
+{
+   return ((s8)data / 2) * 1000 + (data & 0x1 ? 500 : 0);
+}
+
+/*
+ * idt_get_temp_uval() - convert temp sample to unsigned millidegree Celsius
+ * @data:  IN - shifted to LSB 8-bits temperature sample
+ *
+ * Return: unsigned millidegree Celsius
+ */
+static inline long idt_get_temp_uval(u32 data)
+{
+   return (data / 2) * 1000 + (data & 0x1 ? 500 : 0);
+}
+
+/*
  * idt_read_temp() - read temperature from chip sensor
  * @ntb:   NTB device context.
- * @val:   OUT - integer value of temperature
- * @frac:  OUT - fraction
+ * @type:  IN - type of the temperature value to read
+ * @val:   OUT - integer value of temperature in millidegree Celsius
  */
-static void idt_read_temp(struct idt_ntb_dev *ndev, unsigned char *val,
- unsigned char *frac)
+static void idt_read_temp(struct idt_ntb_dev *ndev,
+ const enum idt_temp_val type, long *val)
 {
u32 data;
 
-   /* Read the data from TEMP field of the TMPSTS register */
-   data = idt_sw_read(ndev, IDT_SW_TMPSTS);
-   data = GET_FIELD(TMPSTS_TEMP, data);
-   /* TEMP field has one fractional bit and seven integer bits */
-   *val = data >> 1;
-   *frac = ((data & 0x1) ? 5 : 0);
+   /* Alter the temperature field in accordance with the passed type */
+   switch (type) {
+   case IDT_TEMP_CUR:
+   data = GET_FIELD(TMPSTS_TEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_LOW:
+   data = GET_FIELD(TMPSTS_LTEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_HIGH:
+   data = GET_FIELD(TMPSTS_HTEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_OFFSET:
+   /* This is the only field with signed 0:7:1 format */
+   data = GET_FIELD(TMPADJ_OFFSET,
+idt_sw_read(ndev, IDT_SW_TMPADJ));
+   *val = idt_get_temp_sval(data);
+   return;
+   default:
+   data = GET_FIELD(TMPSTS_TEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   }
+
+   /* The rest of the fields accept unsigned 0:7:1 format */
+   *val = idt_get_temp_uval(data);
 }
 
 /*
@@ -1861,10 +1938,10 @@ static void idt_read_temp(struct idt_ntb_dev *ndev, unsigned char *val,
  */
 static void idt_temp_isr(struct idt_ntb_dev *ndev, u32 ntint_sts)
 {
-   unsigned char val, frac;
+   unsigned long mdeg;
 
/* R

[PATCH 0/4] ntb: idt: Add hwmon temperature sensor interface

2018-07-14 Thread Serge Semin

IDT PCIe-switches are equipped with an embedded temperature sensor. It
works within the range [0; 127.5]C with a resolution of 0.5C. It can
be used to monitor the chip core temperature to prevent it from
overheating. This is especially relevant for these chips, since they
get very hot, particularly if ASPM isn't enabled.
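A 0.5C resolution over [0; 127.5]C means each raw reading is a count of half-degrees packed into the 8-bit 0:7:1 field (seven integer bits, one fractional bit). The standalone C sketch below shows the conversion to and from millidegrees Celsius, the unit hwmon uses. The function names are hypothetical, and the rounding on encode (fractions below 0.5C are dropped) is an assumption of this sketch, not necessarily what the driver's helpers do.

```c
#include <stdint.h>

/* Sensor range in millidegrees Celsius: [0; 127.5]C */
#define TEMP_MIN_MDEG 0L
#define TEMP_MAX_MDEG 127500L

/* Decode an unsigned 0:7:1 sample: bits 7..1 hold whole degrees,
 * bit 0 adds 0.5C. */
static long temp_fmt_to_mdeg(uint8_t raw)
{
	return (long)(raw >> 1) * 1000 + ((raw & 0x1) ? 500 : 0);
}

/* Clamp to the sensor range, then encode into unsigned 0:7:1.
 * Fractions below 0.5C are truncated (assumption of this sketch). */
static uint8_t temp_mdeg_to_fmt(long mdeg)
{
	if (mdeg < TEMP_MIN_MDEG)
		mdeg = TEMP_MIN_MDEG;
	if (mdeg > TEMP_MAX_MDEG)
		mdeg = TEMP_MAX_MDEG;
	return (uint8_t)(((mdeg / 1000) << 1) | ((mdeg % 1000) >= 500 ? 1 : 0));
}
```

For example, a raw value of 0x55 (85 half-degrees) decodes to 42500, and writing 200000 clamps to 127500 and encodes as 0xff.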

Besides the currently sampled temperature, the sensor interface exposes
history registers with the lowest and highest measured temperatures, thresholds,
alarm IRQ enable/disable bits, and ADC/filter settings. The device manual
states that the switch is able to generate an MSI interrupt on PCIe upstream ports
if the temperature crosses one of three configurable thresholds. But in
practice we discovered that the threshold IRQ enable/disable bits interface
is badly broken (see the third patch commit message), so it can't be used
to create the hwmon alarm interface. As a result we had to remove the
already available temperature sensor IRQ handler and disable the corresponding
interrupt.

The current version of the driver provides the following standard hwmon sysfs
files: temperature input, lowest and highest measured temperature
with the possibility to reset the history, and temperature offset. The rest of
the nodes can't be safely implemented for the chip due to the described issues.

Signed-off-by: Serge Semin 

Serge Semin (4):
  ntb: idt: Alter temperature read method
  ntb: idt: Add basic hwmon sysfs interface
  ntb: idt: Discard temperature sensor IRQ handler
  ntb: idt: Alter the driver info comments

 drivers/ntb/hw/idt/Kconfig  |   4 +-
 drivers/ntb/hw/idt/ntb_hw_idt.c | 317 ++--
 drivers/ntb/hw/idt/ntb_hw_idt.h |  87 ++-
 3 files changed, 353 insertions(+), 55 deletions(-)

-- 
2.12.0

[PATCH 2/4] ntb: idt: Add basic hwmon sysfs interface

2018-07-14 Thread Serge Semin

IDT PCIe switches provide an embedded temperature sensor working
within [0; 127.5]C with a resolution of 0.5C. They can also generate
a PCIe upstream interrupt if the temperature crosses the
specified thresholds. Since the thresholds interface is badly broken,
the created hwmon sysfs interface exposes only the following set of hwmon
nodes: current input temperature, lowest and highest values measured,
history reset, and value offset. The hwmon alarm interface isn't provided.

IDT PCIe switches also have ADC/filter settings for the sensor.
This driver doesn't expose them to the hwmon sysfs interface at the
moment, except for the offset node.
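The offset is the only field this patch treats as signed 0:7:1. A minimal sketch, assuming the field holds a two's-complement count of half-degrees (an assumption, not taken from the IDT manual), decodes it by sign-extending the byte and scaling by 500. Under that assumption the result differs from `((s8)data / 2) * 1000 + (data & 0x1 ? 500 : 0)` for negative odd values: for 0xff that expression yields +500, because C integer division truncates toward zero, while the two's-complement reading gives -500. The names and the clamp range here are hypothetical.

```c
#include <stdint.h>

/* Representable range of a signed 0:7:1 byte in millidegrees Celsius
 * (-128..127 half-degrees), under this sketch's two's-complement assumption. */
#define OFFSET_MIN_MDEG (-64000L)
#define OFFSET_MAX_MDEG 63500L

/* Decode: sign-extend the byte, then scale half-degrees to millidegrees. */
static long offset_fmt_to_mdeg(uint8_t raw)
{
	return (long)(int8_t)raw * 500;
}

/* Encode: clamp, then convert to a half-degree count (truncating toward zero). */
static uint8_t offset_mdeg_to_fmt(long mdeg)
{
	if (mdeg < OFFSET_MIN_MDEG)
		mdeg = OFFSET_MIN_MDEG;
	if (mdeg > OFFSET_MAX_MDEG)
		mdeg = OFFSET_MAX_MDEG;
	return (uint8_t)(int8_t)(mdeg / 500);
}
```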

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 182 
 drivers/ntb/hw/idt/ntb_hw_idt.h |  24 +-
 2 files changed, 205 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 928f37877790..af767a13556a 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -49,11 +49,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "ntb_hw_idt.h"
@@ -1926,6 +1929,153 @@ static void idt_read_temp(struct idt_ntb_dev *ndev,
 }
 
 /*
+ * idt_write_temp() - write temperature to the chip sensor register
+ * @ntb:   NTB device context.
+ * @type:  IN - type of the temperature value to change
+ * @val:   IN - integer value of temperature in millidegree Celsius
+ */
+static void idt_write_temp(struct idt_ntb_dev *ndev,
+  const enum idt_temp_val type, const long val)
+{
+   unsigned int reg;
+   u32 data;
+   u8 fmt;
+
+   /* Retrieve the properly formatted temperature value */
+   fmt = idt_temp_get_fmt(val);
+
+   mutex_lock(&ndev->hwmon_mtx);
+   switch (type) {
+   case IDT_TEMP_LOW:
+   reg = IDT_SW_TMPALARM;
+   data = SET_FIELD(TMPALARM_LTEMP, idt_sw_read(ndev, reg), fmt) &
+   ~IDT_TMPALARM_IRQ_MASK;
+   break;
+   case IDT_TEMP_HIGH:
+   reg = IDT_SW_TMPALARM;
+   data = SET_FIELD(TMPALARM_HTEMP, idt_sw_read(ndev, reg), fmt) &
+   ~IDT_TMPALARM_IRQ_MASK;
+   break;
+   case IDT_TEMP_OFFSET:
+   reg = IDT_SW_TMPADJ;
+   data = SET_FIELD(TMPADJ_OFFSET, idt_sw_read(ndev, reg), fmt);
+   break;
+   default:
+   goto inval_spin_unlock;
+   }
+
+   idt_sw_write(ndev, reg, data);
+
+inval_spin_unlock:
+   mutex_unlock(&ndev->hwmon_mtx);
+}
+
+/*
+ * idt_sysfs_show_temp() - printout corresponding temperature value
+ * @dev:   Pointer to the NTB device structure
+ * @da:    Sensor device attribute structure
+ * @buf:   Buffer to print temperature out
+ *
+ * Return: Number of written symbols or negative error
+ */
+static ssize_t idt_sysfs_show_temp(struct device *dev,
+  struct device_attribute *da, char *buf)
+{
+   struct sensor_device_attribute *attr = to_sensor_dev_attr(da);
+   struct idt_ntb_dev *ndev = dev_get_drvdata(dev);
+   enum idt_temp_val type = attr->index;
+   long mdeg;
+
+   idt_read_temp(ndev, type, &mdeg);
+   return sprintf(buf, "%ld\n", mdeg);
+}
+
+/*
+ * idt_sysfs_set_temp() - set corresponding temperature value
+ * @dev:   Pointer to the NTB device structure
+ * @da:    Sensor device attribute structure
+ * @buf:   Buffer with the temperature value to set
+ * @count: Size of the passed buffer
+ *
+ * Return: Number of written symbols or negative error
+ */
+static ssize_t idt_sysfs_set_temp(struct device *dev,
+ struct device_attribute *da, const char *buf,
+ size_t count)
+{
+   struct sensor_device_attribute *attr = to_sensor_dev_attr(da);
+   struct idt_ntb_dev *ndev = dev_get_drvdata(dev);
+   enum idt_temp_val type = attr->index;
+   long mdeg;
+   int ret;
+
+   ret = kstrtol(buf, 10, &mdeg);
+   if (ret)
+   return ret;
+
+   /* Clamp the passed value in accordance with the type */
+   if (type == IDT_TEMP_OFFSET)
+   mdeg = clamp_val(mdeg, IDT_TEMP_MIN_OFFSET,
+IDT_TEMP_MAX_OFFSET);
+   else
+   mdeg = clamp_val(mdeg, IDT_TEMP_MIN_MDEG, IDT_TEMP_MAX_MDEG);
+
+   idt_write_temp(ndev, type, mdeg);
+
+   return count;
+}
+
+/*
+ * idt_sysfs_reset_hist() - reset temperature history
+ * @dev:   Pointer to the NTB device structure
+ * @da:    Sensor device attribute structure
+ * @buf:   Buffer with the value written by the user
+ * @count: Size of the passed buffer
+ *
+ * Return: Number of written symbols or nega

[PATCH 3/4] ntb: idt: Discard temperature sensor IRQ handler

2018-07-14 Thread Serge Semin

The IDT PCIe-switch temperature sensor interface is badly broken. First
of all, only a few combinations of the TMPCTL threshold enable bits
actually get the interrupts unmasked. Even if an individual bit
indicates the event is unmasked, the corresponding IRQ just isn't generated.
Most of the threshold enable bit combinations are in fact useless, and
none of them can help to create a fully functional alarm interface.
In other words, we can't create well-defined hwmon alarms based on
the IDT PCIe-switch threshold IRQs.

Secondly, a single threshold IRQ (not a combination of thresholds) can
be successfully enabled without the issue described above. But in this
case we experienced an enormous number of interrupts generated by
the chip when the temperature got near the enabled threshold value. Filter
adjustment didn't help much, and the chip doesn't provide any hysteresis
settings. Due to temperature sample fluctuations near the threshold, the
flood of interrupts makes the system nearly unusable until the temperature
finally settles clearly above or below the threshold.

All of these issues make the temperature sensor alarm interface useless,
and at some point even dangerous to use in the driver. So it is
safer to completely discard it and disable the temperature alarm
interrupts.
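To illustrate why the missing hysteresis matters, the following C sketch counts alarm state changes (one change modelling one IRQ) for a plain threshold comparator and for a comparator with a hysteresis band, on the same fluctuating sample sequence. The function names, samples and 1C band are made up for illustration; as noted above, the IDT sensor offers no such band.

```c
#include <stddef.h>

/* Plain comparator: every crossing of thr toggles the alarm state. */
static int count_plain(const long *mdeg, size_t n, long thr)
{
	int changes = 0, above = 0;

	for (size_t i = 0; i < n; i++) {
		int now = mdeg[i] >= thr;

		if (now != above) {
			changes++;
			above = now;
		}
	}
	return changes;
}

/* With hysteresis: the alarm is raised at thr but only cleared once the
 * sample drops below thr - hyst, so small fluctuations are ignored. */
static int count_hyst(const long *mdeg, size_t n, long thr, long hyst)
{
	int changes = 0, above = 0;

	for (size_t i = 0; i < n; i++) {
		if (!above && mdeg[i] >= thr) {
			above = 1;
			changes++;
		} else if (above && mdeg[i] < thr - hyst) {
			above = 0;
			changes++;
		}
	}
	return changes;
}
```

With samples oscillating between 69.5C and 70.0C around a 70C threshold, the plain comparator toggles on every sample, while the hysteresis variant fires once.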

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 41 +
 drivers/ntb/hw/idt/ntb_hw_idt.h |  5 ++---
 2 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index af767a13556a..3d48267ae315 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2076,38 +2076,6 @@ static struct attribute *idt_temp_attrs[] = {
 ATTRIBUTE_GROUPS(idt_temp);
 
 /*
- * idt_temp_isr() - temperature sensor alarm events ISR
- * @ndev:  IDT NTB hardware driver descriptor
- * @ntint_sts: NT-function interrupt status
- *
- * It handles events of temperature crossing alarm thresholds. Since reading
- * of TMPALARM register clears it up, the function doesn't analyze the
- * read value, instead the current temperature value just warningly printed to
- * log.
- * The method is called from PCIe ISR bottom-half routine.
- */
-static void idt_temp_isr(struct idt_ntb_dev *ndev, u32 ntint_sts)
-{
-   unsigned long mdeg;
-
-   /* Read the current temperature value */
-   idt_read_temp(ndev, IDT_TEMP_CUR, &mdeg);
-
-   /* Read the temperature alarm to clean the alarm status out */
-   /*(void)idt_sw_read(ndev, IDT_SW_TMPALARM);*/
-
-   /* Clean the corresponding interrupt bit */
-   idt_nt_write(ndev, IDT_NT_NTINTSTS, IDT_NTINTSTS_TMPSENSOR);
-
-   dev_dbg(&ndev->ntb.pdev->dev,
-   "Temp sensor IRQ detected %#08x", ntint_sts);
-
-   /* Print temperature value to log */
-   dev_warn(&ndev->ntb.pdev->dev, "Temperature %hhd.%hhuC",
-   idt_get_deg(mdeg), idt_get_deg_frac(mdeg));
-}
-
-/*
  * idt_init_temp() - initialize temperature sensor interface
  * @ndev:  IDT NTB hardware driver descriptor
  *
@@ -2189,7 +2157,7 @@ static int idt_init_isr(struct idt_ntb_dev *ndev)
goto err_free_vectors;
}
 
-   /* Unmask Message/Doorbell/SE/Temperature interrupts */
+   /* Unmask Message/Doorbell/SE interrupts */
ntint_mask = idt_nt_read(ndev, IDT_NT_NTINTMSK) & ~IDT_NTINTMSK_ALL;
idt_nt_write(ndev, IDT_NT_NTINTMSK, ntint_mask);
 
@@ -2204,7 +2172,6 @@ err_free_vectors:
return ret;
 }
 
-
 /*
  * idt_deinit_ist() - deinitialize PCIe interrupt handler
  * @ndev:  IDT NTB hardware driver descriptor
@@ -2265,12 +2232,6 @@ static irqreturn_t idt_thread_isr(int irq, void *devid)
handled = true;
}
 
-   /* Handle temperature sensor interrupt */
-   if (ntint_sts & IDT_NTINTSTS_TMPSENSOR) {
-   idt_temp_isr(ndev, ntint_sts);
-   handled = true;
-   }
-
dev_dbg(&ndev->ntb.pdev->dev, "IDT IRQs 0x%08x handled", ntint_sts);
 
return handled ? IRQ_HANDLED : IRQ_NONE;
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.h b/drivers/ntb/hw/idt/ntb_hw_idt.h
index 032f81cb4d44..3517cd2e2baa 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.h
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.h
@@ -688,15 +688,14 @@
  * @IDT_NTINTMSK_DBELL:Doorbell interrupt mask bit
  * @IDT_NTINTMSK_SEVENT:   Switch Event interrupt mask bit
  * @IDT_NTINTMSK_TMPSENSOR:Temperature sensor interrupt mask bit
- * @IDT_NTINTMSK_ALL:  All the useful interrupts mask
+ * @IDT_NTINTMSK_ALL:  NTB-related interrupts mask
  */
 #define IDT_NTINTMSK_MSG   0x0001U
 #define IDT_NTINTMSK_DBELL 0x0002U
 #define IDT_NTINTMSK_SEVENT0x0008U
 #define IDT_NTINTMSK_TMPSENSOR 0x0080U
 #define I

Re: [PATCH 11/14] MIPS: memblock: Print out kernel virtual mem layout

2018-01-23 Thread Serge Semin

Hello Matt,

On Tue, Jan 23, 2018 at 03:35:14PM +, Matt Redfearn wrote:
> Hi Serge,
> 
> On 19/01/18 14:27, Serge Semin wrote:
> >On Fri, Jan 19, 2018 at 07:59:43AM +, Matt Redfearn wrote:
> >
> >Hello Matt,
> >
> >>Hi Serge,
> >>
> >>
> >>
> >>On 18/01/18 20:18, Serge Semin wrote:
> >>>On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli wrote:
> >>>>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>>>It is useful to have the kernel virtual memory layout printed
> >>>>>at boot time so to have the full information about the booted
> >>>>>kernel. In some cases it might be unsafe to have virtual
> >>>>>addresses freely visible in logs, so the %pK format is used if
> >>>>>one wants to hide them.
> >>>>>
> >>>>>Signed-off-by: Serge Semin 
> >>>>
> >>>>I personally like having that information because that helps debug and
> >>>>have a quick reference, but there appears to be a trend to remove this
> >>>>in the name of security:
> >>>>
> >>>>https://patchwork.kernel.org/patch/10124007/
> >>>>
> >>>>maybe hide this behind a configuration option?
> >>>
> >>>Yeah, arm code was the place I picked the function up.) But in my case
> >>>I've used %pK so the pointers would disappear from logging when
> >>>kptr_restrict sysctl is 1 or 2.
> >>>I agree, that we might need to make the printouts optional. If there is
> >>>any kernel config, which for instance increases the kernel security we
> >>>could also use it or anything else to discard the printouts at compile
> >>>time.
> >>
> >>
> >>Certainly, when KASLR is active it would be preferable to hide this
> >>information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> >>additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> >>that only people actively debugging the kernel see it:
> >>
> >>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
> >
> >Ok. I'll hide the printouts behind both of those config macros in the next
> >patchset version.
> 
> 
> Another thing to note - since ad67b74d2469d ("printk: hash addresses printed
> with %p") %pK at this time in the boot process is useless since the RNG is
> not sufficiently initialised and all prints end up being "(ptrval)". Hence
> after v4.15-rc2 we end up with output like:
> 
> [0.00] Kernel virtual memory layout:
> [0.00] lowmem  : 0x(ptrval) - 0x(ptrval)  ( 256 MB)
> [0.00]   .text : 0x(ptrval) - 0x(ptrval)  (7374 kB)
> [0.00]   .data : 0x(ptrval) - 0x(ptrval)  (1901 kB)
> [0.00]   .init : 0x(ptrval) - 0x(ptrval)  (1600 kB)
> [0.00]   .bss  : 0x(ptrval) - 0x(ptrval)  ( 415 kB)
> [0.00] vmalloc : 0x(ptrval) - 0x(ptrval)  (1023 MB)
> [0.00] fixmap  : 0x(ptrval) - 0x(ptrval)  (  68 kB)
> 

It must be a bug in the algorithm. What is the point of %pK then? According to
the documentation, the only way to see the pointers is when (kptr_restrict == 0).
But if it is, we don't get into the restricted_pointer() method at all:
http://elixir.free-electrons.com/linux/v4.15-rc9/source/lib/vsprintf.c#L1934
In this case vsprintf() executes the method ptr_to_id(), which of course
does _not_ leak addresses and hashes the pointer before printing.

Really, %pK isn't supposed to depend on the RNG at all, since kptr_restrict
doesn't do any value randomization.

> 
> The %px format specifier was added for cases such as this, where we really
> want to print the unmodified address. And as long as this function is
> suitably guarded to only do this when KASLR is deactivated /
> CONFIG_DEBUG_KERNEL is activated, etc, then we are not unwittingly leaking
> information - we are deliberately making it available.
> 

If %pK worked as stated in the kernel documentation:
https://www.kernel.org/doc/Documentation/printk-formats.txt
then the only change I'd suggest here is to wrap the kernel memory
layout printout method in a CONFIG_DEBUG_KERNEL ifdef. kptr_restrict
should default to 1 or 2 if KASLR is activated:
https://lwn.net/Articles/444556/

Regards,
-Sergey

> Thanks,
> Matt
> 
> >
> >Regards,
> >-Sergey
> >
> >>
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>>-- 
> >>>>Florian

Re: [PATCH 02/14] MIPS: memblock: Surely map BSS kernel memory section

2018-01-23 Thread Serge Semin

Hello Matt,

On Tue, Jan 23, 2018 at 11:03:27AM +, Matt Redfearn wrote:
> Hi Serge,
> 
> On 22/01/18 21:47, Serge Semin wrote:
> >Hello Matt,
> >
> >On Mon, Jan 22, 2018 at 04:35:26PM +, Matt Redfearn wrote:
> >>Hi Serge,
> >>
> >>On 17/01/18 22:23, Serge Semin wrote:
> >>>The current MIPS code makes sure the kernel code/data/init
> >>>sections are in the maps, but BSS should also be there.
> >>
> >>Quite right - it should. But this was protected against by reserving all
> >>bootmem up to the _end symbol here:
> >>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> >>Which you remove in the next patch in this series. I'm not sure it is worth
> >
> >Right. Missed that part. The old code just doesn't set the kernel memory free
> >calling the free_bootmem() method for non-reserved parts below reserved_end.
> >
> >>disentangling the reserved_end stuff from the next patch to make this into a
> >>single logical change of reserving just .bss rather than everything below
> >>_end.
> >
> >Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
> >Add reserved memory regions to memblock". It logically belongs to that place.
> >Since basically by the arch_mem_addpart() calls we reserve all the kernel
> 
> 
> Actually I was wrong - it's not this sequence of arch_mem_addpart's that
> reserves the kernels memory. At least on DT based systems, it's pretty
> likely that these regions will overlap with the system memory already added.
> of_scan_flat_dt will look for the memory node and add it via
> early_init_dt_add_memory_arch.
> These calls to add the kernel text, init and bss detect that they overlap
> with the already present system memory, so don't get added, here:
> http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
> 
> As such, when we print out the content of boot_mem_map, we only have a
> single entry:
> 
> [0.00] Determined physical RAM map:
> [0.00]  memory: 1000 @  (usable)
> 
> 
> >memory now I'd also merge them into a single call for the range [_text, _end].
> >What do you think?
> 
> 
> I think that this patch makes sense in case the .bss is for some reason not
> covered by an existing entry, but I would leave it as a separate patch.
> 
> Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
> is actually self-contained since it replaces reserving all memory up to _end
> with the single reservation of the kernel's whole size
> 
> + size = __pa_symbol(&_end) - __pa_symbol(&_text);
> + memblock_reserve(__pa_symbol(&_text), size);
> 
> 
> Which I think is definitely an improvement since it is much clearer.
> 

Alright, let's sum it up. First of all, yeah, you are right, arch_mem_addpart()
was created to make sure the kernel memory is added to the memblock/bootmem pool.
The previous arch code left that memory range non-freed, since reserved_end was
higher than it, so that early memory allocations wouldn't be made from the pages
where the kernel actually resides.

In my code I still wanted to make sure the kernel memory is in the memblock pool.
But I also noticed that the .bss memory range wouldn't be added to the pool if
neither the DTS nor platform-specific code added any memory to the boot_mem_map
pool. So I decided to fix it. The actual kernel memory reservation is performed
after all the memory regions are declared by the code you cited. It's essential
to do the [_text, _end] memory range reservation there, otherwise memblock may
allocate from the memory range occupied by the kernel code/data.

Since you agree with leaving it in a separate patch, I'd only suggest
calling the arch_mem_addpart() method once for the single range [_text, _end]
instead of doing it three times for the separate _text, _data and bss sections.
What do you think?
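Merging the three calls into one [_text, _end] range only works if the range is widened outward to page boundaries, as the quoted patch does for .bss with PFN_DOWN() on the start and PFN_UP() on the end. The standalone sketch below shows that arithmetic; the 4 KiB PAGE_SHIFT, the `mem_range` struct and `kernel_range()` helper are assumptions for illustration, while the two macros mirror the kernel's PFN_DOWN()/PFN_UP() definitions.

```c
#define PAGE_SHIFT 12                     /* assumed 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
/* Same arithmetic as the kernel's PFN_DOWN()/PFN_UP() helpers. */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical physical address range [start, end). */
struct mem_range {
	unsigned long start;
	unsigned long end;
};

/* Widen [text, end) outward to page boundaries so that no byte of the
 * kernel image (including .bss) is left outside the range. */
static struct mem_range kernel_range(unsigned long text, unsigned long end)
{
	struct mem_range r = {
		.start = PFN_DOWN(text) << PAGE_SHIFT,
		.end   = PFN_UP(end) << PAGE_SHIFT,
	};
	return r;
}
```

For a hypothetical image spanning 0x100400..0x5a3210, the merged range becomes 0x100000..0x5a4000.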

Regards,
-Sergey

> Thanks,
> Matt
> 
> >
> >Regards,
> >-Sergey
> >
> >>
> >>Reviewed-by: Matt Redfearn 
> >>
> >>Thanks,
> >>Matt
> >>
> >>>
> >>>Signed-off-by: Serge Semin 
> >>>---
> >>>  arch/mips/kernel/setup.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>>diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >>>index 76e9e2075..0d21c9e04 100644
> >>>--- a/arch/mips/kernel/setup.c
> >>>+++ b/arch/mips/kernel/setup.c
> >>>@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> >>>   arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> >>>PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> >>>BOOT_MEM_INIT_RAM);
> >>>+  arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> >>>+   PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> >>>+   BOOT_MEM_RAM);
> >>>   pr_info("Determined physical RAM map:\n");
> >>>   print_memory_map();
> >>>

Re: [PATCH 12/14] MIPS: memblock: Discard bootmem from Loongson3 code

2018-01-23 Thread Serge Semin

Hello Jiaxun,

On Tue, Jan 23, 2018 at 10:28:44PM +, Jiaxun Yang wrote:
> On Thu, 2018-01-18 at 01:23 +0300, Serge Semin wrote:
> Hi Serge
> 
> > Loongson64/3 runs its own code to initialize memory allocator in
> > case of NUMA configuration is selected. So in order to move to the
> > pure memblock utilization we discard the bootmem allocator usage
> > and insert the memblock reservation method for
> > kernel/addrspace_offset
> > memory regions.
> 
> Thanks for your patch. However, in my test, the system didn't boot
> anymore with your patch. Since I don't have any low-level debug
> instruments (EJTAG or something), I can't provide any detail about the
> problem. Just letting you know that we have a problem here.
> 

Thanks for testing the patchset. I really appreciate this.
Regarding the problem you got: you must be doing something wrong, since
Matt Redfearn has already tested it on Loongson3:
https://lkml.org/lkml/2018/1/22/610
and the kernel boots without trouble.

So could you make sure that you did everything right? In particular, you
said the patch (singular) isn't working. But this patch's functionality depends
on the whole patchset. Did you apply all the patches I sent and then fully
rebuild the kernel?

Regards,
-Sergey

> 
> --
> Jiaxun Yang

Re: [PATCH 07/14] MIPS: memblock: Mark present sparsemem sections

2018-01-23 Thread Serge Semin

Hello Marcin,

On Wed, Jan 24, 2018 at 07:13:03AM +0100, Marcin Nowakowski wrote:
> Hi Serge,
> 
> On 17.01.2018 23:23, Serge Semin wrote:
> >If sparsemem is activated all sections with present pages must
> >be accordingly marked after memblock is fully initialized.
> >
> >Signed-off-by: Serge Semin 
> >---
> >  arch/mips/kernel/setup.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> >diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >index b121fa702..6df1eaf38 100644
> >--- a/arch/mips/kernel/setup.c
> >+++ b/arch/mips/kernel/setup.c
> >@@ -778,7 +778,7 @@ static void __init request_crashkernel(struct resource *res)
> >  static void __init arch_mem_init(char **cmdline_p)
> >  {
> >-struct memblock_region *reg;
> >+struct memblock_region *reg __maybe_unused;
> 
> nit: reg is used here. It becomes __maybe_unused in patch 8.
> 

Right. I'll move the __maybe_unused change to patch 8.

-Sergey

> 
> Marcin

Re: [PATCH] NTB: ntb_perf: fix cast to restricted __le32

2018-01-23 Thread Serge Semin

On Tue, Jan 23, 2018 at 11:18:06PM -0500, Jon Mason  wrote:
> On Fri, Jan 19, 2018 at 10:26:37PM +0100, Arnd Bergmann wrote:
> > On Fri, Jan 19, 2018 at 10:03 PM, Serge Semin wrote:
> > >
> > > Actually the provided patch is the best solution I could come up with.
> > > The thing is that the methods can't be changed. Those functions are
> > > part of the NTB API used by many drivers. So basically they
> > > are like the pci_{read,write}_config_{byte,word,dword}() methods. We can't
> > > change their prototypes only because it suits some driver. The methods
> > > give access to the NTB device's dummy u32-sized registers, nothing
> > > else. So endianness is a property of the transmitted data in this case.
> > >
> > > NTB is a technology to interconnect two systems with possibly
> > > different endianness (unlike PCI, which interconnects a CPU with LE devices).
> > > In this case I need to set up an agreement between the two systems about
> > > the endianness of the exchanged data, like host and network types in
> > > Linux networking. I've chosen the network data to be little-endian,
> > > which is why I first needed to convert them from CPU to le32, then on the
> > > remote side convert them back from le32 to CPU.
> > >
> > > If you have any better suggestion how the warning can be fixed, I'd
> > > be glad to stick to it.
> > 
> > I don't think your description matches what you actually do: The
> > underlying ntb hardware drivers (amd, idt, intel, mscc) all treat the
> > incoming data as CPU-endian and convert it to little-endian on
> > the register side, so the framework already assumes that whatever
> > you do here uses a little-endian wire-level protocol.
> > 
> > On a little-endian kernel/CPU, nothing is ever swapped here, neither
> > in the ntb_perf front-end nor in the back-ends. On a big-endian
> > kernel/CPU, they both swap, so you end up with CPU-endian
> > data on the wire, so it should be impossible for a big-endian
> > system to talk to a little-endian one. Have you actually tried that
> > combination with the current code?
> 
> I do not believe anyone has ever tried NTB on a big-endian system,
> let alone tried it with one side LE and the other BE.  To my
> knowledge, this has only ever been used on x86 to x86.
> 
> > If my interpretation is correct, then the best solution would be to
> > completely remove the cpu_to_le32/le32_to_cpu conversions
> > from ntb_perf, and just define that it works like any other PCI
> > device, exchanging little-endian data.
> 
> Yes, this would be the best solution.  Thank you for the insight.
> 
> Serge, when you get a chance, please make this change and resubmit.
> 

Ok. I'll do it in an hour.

Regards,
-Sergey

> Thanks,
> Jon
> 
> 
> > 
> > There are two interesting cases to consider though:
> > 
> > - if someone wants to implement an NTB based protocol
> >   using big-endian data on the wire, you probably want to add
> >   a ntb_peer_spad_read_be()/ntb_peer_msg_write_be()
> >   set of interfaces, to go along with ioread32_be()/iowrite32_be()
> >   the same way that ntb_peer_spad_read()/ntb_peer_msg_write()
> >   ends up doing ioread32()/iowrite32() with the implied little-endian
> >   behavior.
> > 
> > - memcpy_toio()/memcpy_fromio() and ioread32_rep()/iowrite32_rep
> >   importantly do not do any byteswap, they are meant to
> >   transfer byte streams.
> > 
> > Arnd
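Arnd's double-swap argument can be checked with a small userspace model. In this sketch, `model_cpu_to_le32()` is a hypothetical stand-in for cpu_to_le32() on a host whose endianness is passed explicitly, and `model_iowrite32()` stands in for a hardware driver's iowrite32(), which stores a CPU-endian value into little-endian register bytes. On a big-endian host the client-side and driver-side conversions cancel, so the register ends up holding the value in big-endian byte order, i.e. CPU-endian data on the wire, while a little-endian host produces little-endian bytes.

```c
#include <stdint.h>

static uint32_t model_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00U) |
	       ((v << 8) & 0x00ff0000U) | (v << 24);
}

/* cpu_to_le32() modelled for a host of the given endianness. */
static uint32_t model_cpu_to_le32(uint32_t v, int host_is_be)
{
	return host_is_be ? model_swab32(v) : v;
}

/* iowrite32() modelled as storing the numeric value in LE byte order,
 * independent of the host it runs on. */
static void model_iowrite32(uint32_t v, uint8_t reg[4])
{
	reg[0] = (uint8_t)(v & 0xff);
	reg[1] = (uint8_t)((v >> 8) & 0xff);
	reg[2] = (uint8_t)((v >> 16) & 0xff);
	reg[3] = (uint8_t)(v >> 24);
}
```

Writing 0x11223344 this way yields register bytes 11 22 33 44 for the big-endian model and 44 33 22 11 for the little-endian one, so the two sides would disagree.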

[PATCH v2] NTB: ntb_perf: fix cast to restricted __le32

2018-01-23 Thread Serge Semin

Sparse is whining about the mixed u32 and __le32 usage in the driver:

drivers/ntb/test/ntb_perf.c:288:21: warning: cast to restricted __le32
drivers/ntb/test/ntb_perf.c:295:37: warning: incorrect type in argument 4 (different base types)
drivers/ntb/test/ntb_perf.c:295:37:    expected unsigned int [unsigned] [usertype] val
drivers/ntb/test/ntb_perf.c:295:37:    got restricted __le32 [usertype] 

...

NTB hardware drivers shall accept CPU-endian data and translate it to
the portable format internally, so explicit conversions are no longer
necessary before using the Scratchpad/Messages API.
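With the conversions gone, ntb_perf passes plain CPU-endian values and moves its 64-bit data through two 32-bit registers (LDATA/HDATA). The sketch below models just that split and recombination in userspace; `struct spad_pair`, `split_u64()` and `join_u64()` are hypothetical names. Note the widening cast before the shift when recombining, as in `*data |= (u64)val << 32`: shifting a 32-bit value by 32 would be undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit scratchpad/message registers. */
struct spad_pair {
	uint32_t ldata;
	uint32_t hdata;
};

static struct spad_pair split_u64(uint64_t data)
{
	struct spad_pair p = {
		.ldata = (uint32_t)data,         /* like lower_32_bits() */
		.hdata = (uint32_t)(data >> 32), /* like upper_32_bits() */
	};
	return p;
}

static uint64_t join_u64(struct spad_pair p)
{
	uint64_t data = p.ldata;

	data |= (uint64_t)p.hdata << 32;     /* widen before shifting */
	return data;
}
```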

Fixes: b83003b3fdc1 ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Serge Semin 
---
 drivers/ntb/test/ntb_perf.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 1829a17dd461..3fded6aeda08 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -273,21 +273,21 @@ static int perf_spad_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 
sts = ntb_peer_spad_read(perf->ntb, peer->pidx,
 PERF_SPAD_CMD(perf->gidx));
-   if (le32_to_cpu(sts) != PERF_CMD_INVAL) {
+   if (sts != PERF_CMD_INVAL) {
usleep_range(MSG_UDELAY_LOW, MSG_UDELAY_HIGH);
continue;
}
 
ntb_peer_spad_write(perf->ntb, peer->pidx,
PERF_SPAD_LDATA(perf->gidx),
-   cpu_to_le32(lower_32_bits(data)));
+   lower_32_bits(data));
ntb_peer_spad_write(perf->ntb, peer->pidx,
PERF_SPAD_HDATA(perf->gidx),
-   cpu_to_le32(upper_32_bits(data)));
+   upper_32_bits(data));
mmiowb();
ntb_peer_spad_write(perf->ntb, peer->pidx,
PERF_SPAD_CMD(perf->gidx),
-   cpu_to_le32(cmd));
+   cmd);
mmiowb();
ntb_peer_db_set(perf->ntb, PERF_SPAD_NOTIFY(peer->gidx));
 
@@ -321,21 +321,20 @@ static int perf_spad_cmd_recv(struct perf_ctx *perf, int *pidx,
continue;
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_CMD(peer->gidx));
-   val = le32_to_cpu(val);
if (val == PERF_CMD_INVAL)
continue;
 
*cmd = val;
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_LDATA(peer->gidx));
-   *data = le32_to_cpu(val);
+   *data = val;
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_HDATA(peer->gidx));
-   *data |= (u64)le32_to_cpu(val) << 32;
+   *data |= (u64)val << 32;
 
/* Next command can be retrieved from now */
ntb_spad_write(perf->ntb, PERF_SPAD_CMD(peer->gidx),
-  cpu_to_le32(PERF_CMD_INVAL));
+  PERF_CMD_INVAL);
 
dev_dbg(&perf->ntb->dev, "CMD recv: %d 0x%llx\n", *cmd, *data);
 
@@ -371,7 +370,7 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
return ret;
 
ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_LDATA,
- cpu_to_le32(lower_32_bits(data)));
+  lower_32_bits(data));
 
if (ntb_msg_read_sts(perf->ntb) & outbits) {
usleep_range(MSG_UDELAY_LOW, MSG_UDELAY_HIGH);
@@ -379,12 +378,11 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
}
 
ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_HDATA,
- cpu_to_le32(upper_32_bits(data)));
+  upper_32_bits(data));
mmiowb();
 
/* This call shall trigger peer message event */
-   ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_CMD,
- cpu_to_le32(cmd));
+   ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_CMD, cmd);
 
break;
}
@@ -404,13 +402,13 @@ static int perf_msg_cmd_recv(struct perf_ctx *perf, int *pidx,
return -ENODATA;
 
val = ntb_msg_read(perf->ntb, pidx, PERF_MSG_CMD);
-   *cmd = le32_to_cpu(val);
+   *cmd = val;
 
val = ntb_msg_read(perf->ntb, pidx, PERF_MSG_LDATA);
-   *data = le32_to_cpu(val);
+   *data = val;
 
val = ntb_msg_read(perf->ntb, pidx, PERF_MSG_HDATA);
-   *data |= (u64

Re: [PATCH 14/14] MIPS: memblock: Deactivate bootmem allocator

2018-01-24 Thread Serge Semin

On Tue, Jan 23, 2018 at 11:59:35PM +, James Hogan  wrote:
> On Thu, Jan 18, 2018 at 01:23:12AM +0300, Serge Semin wrote:
> > Memblock allocator can be successfully used from now for early
> > memory management
> > 
> > Signed-off-by: Serge Semin 
> 
> Am I correct that intermediate commits in this patchset (i.e. bisection)
> may not work correctly, since bootmem will have been stripped out but
> NO_BOOTMEM=n and memblock may not be properly operational yet?
> 

Yes. You're absolutely right. The kernel will be buildable, but most
likely won't be operable until PATCH 14 deactivates the bootmem allocator.

> If so, is there a way to switch without breaking bisection that doesn't
> involve squashing most of the series into a single atomic commit?
> 

I don't think so. There is no way to switch without squashing,
at least since the change involves arch and platform code, all of
which relied on the bootmem allocator. Here is the list of patches which
need to be combined to keep bisection unbroken:
[PATCH 03/14] MIPS: memblock: Reserve initrd memory in memblock
[PATCH 04/14] MIPS: memblock: Discard bootmem initialization
[PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
[PATCH 06/14] MIPS: memblock: Reserve kdump/crash regions in memblock
[PATCH 07/14] MIPS: memblock: Mark present sparsemem sections
[PATCH 08/14] MIPS: memblock: Simplify DMA contiguous reservation
[PATCH 09/14] MIPS: memblock: Allow memblock regions resize
[PATCH 12/14] MIPS: memblock: Discard bootmem from Loongson3 code
[PATCH 13/14] MIPS: memblock: Discard bootmem from SGI IP27 code
[PATCH 14/14] MIPS: memblock: Deactivate bootmem allocator

So patches 03-09 make the functional changes needed for the arch code
to work correctly with memblock, and patches 12-13 alter the platform
code of specific NUMA systems like Loongson3 and SGI IP27. Once that
is done, bootmem can finally be deactivated by patch 14.

Regards,
-Sergey

> Cheers
> James
> 
> > ---
> >  arch/mips/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> > index 725b5ece7..a6c4fb6b6 100644
> > --- a/arch/mips/Kconfig
> > +++ b/arch/mips/Kconfig
> > @@ -4,7 +4,6 @@ config MIPS
> > default y
> > select ARCH_BINFMT_ELF_STATE
> > select ARCH_CLOCKSOURCE_DATA
> > -   select ARCH_DISCARD_MEMBLOCK
> > select ARCH_HAS_ELF_RANDOMIZE
> > select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> > select ARCH_MIGHT_HAVE_PC_PARPORT
> > @@ -57,6 +57,7 @@ config MIPS
> > select HAVE_IRQ_TIME_ACCOUNTING
> > select HAVE_KPROBES
> > select HAVE_KRETPROBES
> > +   select NO_BOOTMEM
> > select HAVE_MEMBLOCK
> > select HAVE_MEMBLOCK_NODE_MAP
> > select HAVE_MOD_ARCH_SPECIFIC
> > -- 
> > 2.12.0
> >

Re: [PATCH 02/14] MIPS: memblock: Surely map BSS kernel memory section

2018-01-24 Thread Serge Semin

Hello Matt,

On Wed, Jan 24, 2018 at 09:49:31AM +, Matt Redfearn wrote:
> Hi Serge,
> 
> On 23/01/18 19:27, Serge Semin wrote:
> >Hello Matt,
> >
> >On Tue, Jan 23, 2018 at 11:03:27AM +, Matt Redfearn wrote:
> >>Hi Serge,
> >>
> >>On 22/01/18 21:47, Serge Semin wrote:
> >>>Hello Matt,
> >>>
> >>>On Mon, Jan 22, 2018 at 04:35:26PM +, Matt Redfearn wrote:
> >>>>Hi Serge,
> >>>>
> >>>>On 17/01/18 22:23, Serge Semin wrote:
> >>>>>The current MIPS code makes sure the kernel code/data/init
> >>>>>sections are in the maps, but BSS should also be there.
> >>>>
> >>>>Quite right - it should. But this was protected against by reserving all
> >>>>bootmem up to the _end symbol here:
> >>>>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> >>>>Which you remove in the next patch in this series. I'm not sure it is worth
> >>>
> >>>Right. Missed that part. The old code just doesn't set the kernel memory free
> >>>calling the free_bootmem() method for non-reserved parts below reserved_end.
> >>>
> >>>>disentangling the reserved_end stuff from the next patch to make this into a
> >>>>single logical change of reserving just .bss rather than everything below
> >>>>_end.
> >>>
> >>>Good point. I'll move this change into the "[PATCH 05/14] MIPS: memblock:
> >>>Add reserved memory regions to memblock". It logically belongs to that place.
> >>>Since basically by the arch_mem_addpart() calls we reserve all the kernel
> >>
> >>
> >>Actually I was wrong - it's not this sequence of arch_mem_addpart's that
> >>reserves the kernels memory. At least on DT based systems, it's pretty
> >>likely that these regions will overlap with the system memory already added.
> >>of_scan_flat_dt will look for the memory node and add it via
> >>early_init_dt_add_memory_arch.
> >>These calls to add the kernel text, init and bss detect that they overlap
> >>with the already present system memory, so don't get added, here:
> >>http://elixir.free-electrons.com/linux/v4.15-rc9/source/arch/mips/kernel/setup.c#L759
> >>
> >>As such, when we print out the content of boot_mem_map, we only have a
> >>single entry:
> >>
> >>[0.00] Determined physical RAM map:
> >>[0.00]  memory: 1000 @  (usable)
> >>
> >>
> >>memory now I'd also merged them into a single call for the range [_text, _end].
> >>>What do you think?
> >>
> >>
> >>I think that this patch makes sense in case the .bss is for some reason not
> >>covered by an existing entry, but I would leave it as a separate patch.
> >>
> >>Your [PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock
> >>is actually self-contained since it replaces reserving all memory up to _end
> >>with the single reservation of the kernel's whole size
> >>
> >>+   size = __pa_symbol(&_end) - __pa_symbol(&_text);
> >>+   memblock_reserve(__pa_symbol(&_text), size);
> >>
> >>
> >>Which I think is definitely an improvement since it is much clearer.
> >>
> >
> >Alright lets sum it up. First of all, yeah, you are right, arch_mem_addpart()
> >is created to make sure the kernel memory is added to the memblock/bootmem pool.
> >The previous arch code was leaving such the memory range non-freed since it was
> >higher the reserved_end, so to make sure the early memory allocations wouldn't
> >be made from the pages, where kernel actually resides.
> >
> >In my code I still wanted to make sure the kernel memory is in the memblock pool.
> >But I also noticed, that .bss memory range wouldn't be added to the pool if neither
> >dts nor platform-specific code added any memory to the boot_mem_map pool. So I
> >decided to fix it. The actual kernel memory reservation is performed after all
> >the memory regions are declared by the code you cited. It's essential to do
> >the [_text, _end] memory range reservation there, otherwise memblock may
> >allocate from the memory range occupied by the k

Re: [PATCH 11/14] MIPS: memblock: Print out kernel virtual mem layout

2018-01-24 Thread Serge Semin

Hello Matt,

On Wed, Jan 24, 2018 at 09:46:07AM +, Matt Redfearn wrote:
> Hi Serge,
> 
> On 23/01/18 19:10, Serge Semin wrote:
> >Hello Matt,
> >
> >On Tue, Jan 23, 2018 at 03:35:14PM +, Matt Redfearn wrote:
> >>Hi Serge,
> >>
> >>On 19/01/18 14:27, Serge Semin wrote:
> >>>On Fri, Jan 19, 2018 at 07:59:43AM +, Matt Redfearn wrote:
> >>>
> >>>Hello Matt,
> >>>
> >>>>Hi Serge,
> >>>>
> >>>>
> >>>>
> >>>>On 18/01/18 20:18, Serge Semin wrote:
> >>>>>On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli wrote:
> >>>>>>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>>>>>It is useful to have the kernel virtual memory layout printed
> >>>>>>>at boot time, so as to have full information about the booted
> >>>>>>>kernel. In some cases it might be unsafe to have virtual
> >>>>>>>addresses freely visible in logs, so the %pK format is used in
> >>>>>>>case one wants to hide them.
> >>>>>>>
> >>>>>>>Signed-off-by: Serge Semin 
> >>>>>>
> >>>>>>I personally like having that information because that helps debug and
> >>>>>>have a quick reference, but there appears to be a trend to remove this
> >>>>>>in the name of security:
> >>>>>>
> >>>>>>https://patchwork.kernel.org/patch/10124007/
> >>>>>>
> >>>>>>maybe hide this behind a configuration option?
> >>>>>
> >>>>>Yeah, the arm code is where I picked the function up.) But in my case
> >>>>>I've used %pK so the pointers would disappear from the log when the
> >>>>>kptr_restrict sysctl is 1 or 2.
> >>>>>I agree that we might need to make the printouts optional. If there is
> >>>>>any kernel config option which, for instance, increases kernel security, we
> >>>>>could use it (or anything else) to discard the printouts at compile
> >>>>>time.
> >>>>
> >>>>
> >>>>Certainly, when KASLR is active it would be preferable to hide this
> >>>>information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> >>>>additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> >>>>that only people actively debugging the kernel see it:
> >>>>
> >>>>http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604
> >>>
> >>>Ok. I'll hide the printouts behind both of those config macros in the next
> >>>patchset version.
> >>
> >>
> >>Another thing to note - since ad67b74d2469d ("printk: hash addresses printed
> >>with %p") %pK at this time in the boot process is useless since the RNG is
> >>not sufficiently initialised and all prints end up being "(ptrval)". Hence
> >>after v4.15-rc2 we end up with output like:
> >>
> >>[0.00] Kernel virtual memory layout:
> >>[0.00] lowmem  : 0x(ptrval) - 0x(ptrval)  ( 256 MB)
> >>[0.00]   .text : 0x(ptrval) - 0x(ptrval)  (7374 kB)
> >>[0.00]   .data : 0x(ptrval) - 0x(ptrval)  (1901 kB)
> >>[0.00]   .init : 0x(ptrval) - 0x(ptrval)  (1600 kB)
> >>[0.00]   .bss  : 0x(ptrval) - 0x(ptrval)  ( 415 kB)
> >>[0.00] vmalloc : 0x(ptrval) - 0x(ptrval)  (1023 MB)
> >>[0.00] fixmap  : 0x(ptrval) - 0x(ptrval)  (  68 kB)
> >>
> >
> >It must be some bug in the algo. What point in the %pK then? According to
> >the documentation the only way to see the pointers is when (kptr_restrict == 0).
> >But if it is we don't get into the restricted_pointer() method at all:
> >http://elixir.free-electrons.com/linux/v4.15-rc9/source/lib/vsprintf.c#L1934
> >In this case the vsprintf() executes the method ptr_to_id(), which of course
> >default to _not_ leak addresses, and hash it before printing.
> >
> >Really %pK isn't supposed to be dependent from RNG at all since kptr_restrict
> >doesn't do any value randomization.
> 
> 
> That was true until v4.15-rc2. The behavior of %pK was changed without that
> being reflected in the documentation. A patch
> (https://pat

Re: [PATCH 00/14] MIPS: memblock: Switch arch code to NO_BOOTMEM

2018-01-25 Thread Serge Semin

Hello Alexander,

On Thu, Jan 25, 2018 at 06:58:07PM +0100, Alexander Sverdlin wrote:
> Hello Serge,
> 
> On 17/01/18 23:22, Serge Semin wrote:
> > The patchset is based on kernel 4.15-rc8 and can also be found
> > in my repo:
> > https://github.com/fancer/Linux-kernel-MIPS-memblock-project
> 
> I've tested the Linux from your repo on Octeon2 and it looks good to me.
> I've only tested startup though. Therefore,
> 
> Tested-by: Alexander Sverdlin 
> 

Great! Thank you very much for doing this. I'll include the info about all the
tested platforms in the cover letter of the next patchset.

> I've noticed one positive effect I cannot explain -- with almost the same
> physical memory map I observe almost 2 megabytes more available memory
> after startup:
> 
> without patches:
> 
> root@(none):~ >free
>               total        used        free      shared  buff/cache   available
> Mem:         955040       16264      839948       80068       98828      810068
> Swap:             0           0           0
> 
> memory map:
> 
> memory: 01090dc0 @ 0900 (usable after init)
> memory: 0540 @ 02b0 (usable)
> memory: 00c0 @ 0820 (usable)
> memory: 0480 @ 0a10 (usable)
> memory: 1fc0 @ 2000 (usable)
> memory: 1000 @ 4000 (usable)
> memory: 0190a9d0 @ 0110 (usable)
> 
> 
> 
> with patches:
> 
> root@(none):~ >free
>               total        used        free      shared  buff/cache   available
> Mem:         955028       14292      841884       80068       98852      811996
> Swap:             0           0           0
> 
> memory map:
> 
> memory: 01090e00 @ 0900 (usable after init)
> memory: 0540 @ 02b0 (usable)
> memory: 00c0 @ 0820 (usable)
> memory: 0480 @ 0a10 (usable)
> memory: 1fc0 @ 2000 (usable)
> memory: 1000 @ 4000 (usable)
> memory: 0190c9d0 @ 0110 (usable)
> 

That's interesting. My guess is that the old code used to reserve all
the memory below the kernel _end symbol. So if the kernel isn't loaded
right at the start of the lowest memory range, the memory between the
range start and the _text kernel symbol is wasted:
[PATCH 04/14] MIPS: memblock: Discard bootmem initialization
My code reserves only the memory occupied by the kernel within [_text, _end]:
[PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock

There might be some other reason for the lower memory consumption, though.
Hopefully I didn't forget to reserve some necessary memory ranges.)

Regards,
-Sergey

> 
> > Signed-off-by: Serge Semin 
> > 
> > Serge Semin (14):
> >   MIPS: memblock: Add RESERVED_NOMAP memory flag
> >   MIPS: memblock: Surely map BSS kernel memory section
> >   MIPS: memblock: Reserve initrd memory in memblock
> >   MIPS: memblock: Discard bootmem initialization
> >   MIPS: memblock: Add reserved memory regions to memblock
> >   MIPS: memblock: Reserve kdump/crash regions in memblock
> >   MIPS: memblock: Mark present sparsemem sections
> >   MIPS: memblock: Simplify DMA contiguous reservation
> >   MIPS: memblock: Allow memblock regions resize
> >   MIPS: memblock: Perform early low memory test
> >   MIPS: memblock: Print out kernel virtual mem layout
> >   MIPS: memblock: Discard bootmem from Loongson3 code
> >   MIPS: memblock: Discard bootmem from SGI IP27 code
> >   MIPS: memblock: Deactivate bootmem allocator
> 
> -- 
> Best regards,
> Alexander Sverdlin.

Re: [PATCH 00/14] MIPS: memblock: Switch arch code to NO_BOOTMEM

2018-01-30 Thread Serge Semin

So, since there haven't been any new comments for over a week, I'll start
putting together patchset v2 tomorrow.

Regards,
-Sergey

On Thu, Jan 18, 2018 at 01:22:58AM +0300, Serge Semin wrote:
> Even though it's common to see the architecture code using both
> bootmem and memblock early memory allocators, it's not good for
> multiple reasons. First of all, it's redundant to have two
> early memory allocators while one would be more than enough from
> the functionality and stability points of view. Secondly, some new
> features introduced in the kernel utilize the methods of the most
> modern allocator, ignoring the older one. It means the architecture
> code must keep both subsystems synchronized with information
> about memory regions and reservations, which increases the code
> complexity and, obviously, the probability of bugs.
> Finally, it's better to keep the code of all architectures unified for
> better readability and code simplification. All these reasons lead
> to one conclusion: arch code should use just one memory allocator,
> which is supposed to be memblock as the most modern one, already
> utilized by most of the kernel platforms. This patchset is
> mostly about it.
> 
> One more reason why the MIPS arch code should finally move to
> memblock is a BUG somewhere in the initialization process, when
> CMA is activated:
> 
> [0.248762] BUG: Bad page state in process swapper/0  pfn:01f93
> [0.255415] page:8205b0ac count:0 mapcount:-127 mapping:  (null) index:0x1
> [0.263172] flags: 0x4000()
> [0.266723] page dumped because: nonzero mapcount
> [0.272049] Modules linked in:
> [0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
> [0.282900] Stack :   80b6dd6a 003a   
> 8093 8092bff4
>   86073a14 80ac88c7 809f21ac  0001 80b6998c 0400 
> 
>   80a0 801822e8 80b6dd68  0002  809f8024 
> 86077ccc
>   80b8 801e9328 809fcbc0  0400 0001 86077ccc 
> 86073a14
>          
> 
>   ...
> [0.323148] Call Trace:
> [0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
> [0.330859] [<80404814>] dump_stack+0xd4/0x110
> [0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
> [0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
> [0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
> [0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
> [0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
> [0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
> [0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
> [0.376828] [<808faca4>] kernel_init+0x14/0x104
> [0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
> 
> The bogus pfn seems to be one allocated for the bootmem allocator
> pages that hasn't been freed before letting CMA work with its
> areas. In any case, the bug is solved by this patchset.
> 
> Another reason why this patchset is useful is that it fixes the fdt
> reserved-memory nodes functionality for MIPS. It's a bug to scan
> the fdt reserved nodes before memblock is
> fully initialized (calling early_init_fdt_scan_reserved_mem before
> bootmem_init is called). Additionally, the no-map flag of the
> reserved-memory node hasn't been taken into account. This patchset
> fixes all of these.
> 
> As you probably remember, I already made another attempt to merge
> similar functionality into the kernel. This time the patchset is
> less complex (14 patches vs 21 last time) and fixes the
> platform code like SGI IP27 and Loongson3, which, being
> NUMA, introduce their own memory initialization process. However,
> I have serious doubts about SGI IP27 code operability in the first place,
> since it has a prom_meminit() method of early memory initialization,
> which isn't called anywhere else in the kernel. It must
> have been left there unrenamed after the arch/mips/mips-boards/generic
> code had been discarded.
> 
> Here is the list of folks who agreed to perform some tests of
> the patchset:
> Alexander Sverdlin  - Octeon2
> Matt Redfearn  - Loongson3, etc
> Joshua Kinard  - IP27
> Marcin Nowakowski 
> Thanks to all of you, and to everybody who will be involved
> in reviewing and testing.
> 
> The patchset is based on kernel 4.15-rc8 and can also be found
> in my repo:
> https://github.com/fancer/Linux-kernel-MIPS-memblock-project
> 
> Signed-off-by: Serge Semin 
> 
> Serge Semin (14)

Re: [PATCH 11/14] MIPS: memblock: Print out kernel virtual mem layout

2018-01-18 Thread Serge Semin

On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli wrote:
> On 01/17/2018 02:23 PM, Serge Semin wrote:
> > It is useful to have the kernel virtual memory layout printed
> > at boot time, so as to have full information about the booted
> > kernel. In some cases it might be unsafe to have virtual
> > addresses freely visible in logs, so the %pK format is used in
> > case one wants to hide them.
> > 
> > Signed-off-by: Serge Semin 
> 
> I personally like having that information because that helps debug and
> have a quick reference, but there appears to be a trend to remove this
> in the name of security:
> 
> https://patchwork.kernel.org/patch/10124007/
> 
> maybe hide this behind a configuration option?

Yeah, the arm code is where I picked the function up.) But in my case
I've used %pK so the pointers would disappear from the log when the
kptr_restrict sysctl is 1 or 2.
I agree that we might need to make the printouts optional. If there is
any kernel config option which, for instance, increases kernel security, we
could use it (or anything else) to discard the printouts at compile
time.

> -- 
> Florian

Re: [PATCH 11/14] MIPS: memblock: Print out kernel virtual mem layout

2018-01-19 Thread Serge Semin

On Fri, Jan 19, 2018 at 07:59:43AM +, Matt Redfearn wrote:

Hello Matt,

> Hi Serge,
> 
> 
> 
> On 18/01/18 20:18, Serge Semin wrote:
> >On Thu, Jan 18, 2018 at 12:03:03PM -0800, Florian Fainelli wrote:
> >>On 01/17/2018 02:23 PM, Serge Semin wrote:
> >>>It is useful to have the kernel virtual memory layout printed
> >>>at boot time, so as to have full information about the booted
> >>>kernel. In some cases it might be unsafe to have virtual
> >>>addresses freely visible in logs, so the %pK format is used in
> >>>case one wants to hide them.
> >>>
> >>>Signed-off-by: Serge Semin 
> >>
> >>I personally like having that information because that helps debug and
> >>have a quick reference, but there appears to be a trend to remove this
> >>in the name of security:
> >>
> >>https://patchwork.kernel.org/patch/10124007/
> >>
> >>maybe hide this behind a configuration option?
> >
> >Yeah, the arm code is where I picked the function up.) But in my case
> >I've used %pK so the pointers would disappear from the log when the
> >kptr_restrict sysctl is 1 or 2.
> >I agree that we might need to make the printouts optional. If there is
> >any kernel config option which, for instance, increases kernel security, we
> >could use it (or anything else) to discard the printouts at compile
> >time.
> 
> 
> Certainly, when KASLR is active it would be preferable to hide this
> information, so you could use CONFIG_RELOCATABLE. The existing KASLR stuff
> additionally hides this kind of information behind CONFIG_DEBUG_KERNEL, so
> that only people actively debugging the kernel see it:
> 
> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L604

Ok. I'll hide the printouts behind both of those config macros in the next
patchset version.

Regards,
-Sergey

> 
> Thanks,
> Matt
> 
> >
> >>-- 
> >>Florian

[PATCH] NTB: ntb_perf: fix cast to restricted __le32

2018-01-19 Thread Serge Semin

Sparse is whining about the u32 and __le32 mixed usage in the
driver.

drivers/ntb/test/ntb_perf.c:288:21: warning: cast to restricted __le32
drivers/ntb/test/ntb_perf.c:295:37: warning: incorrect type in argument 4 (different base types)
drivers/ntb/test/ntb_perf.c:295:37:    expected unsigned int [unsigned] [usertype] val
drivers/ntb/test/ntb_perf.c:295:37:got restricted __le32 [usertype] 

...

The NTB API can't be changed so that the ntb_spad_*() methods
would return either pure __le32 or __be32, since the scratchpad
data can have arbitrary endianness in general. In this case we
need to forcibly cast u32 to __le32 and vice versa wherever
the driver logic requires it.

Fixes: b83003b3fdc1 ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Serge Semin 
---
 drivers/ntb/test/ntb_perf.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index eaa252b71f82..5590aaf8e8a5 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -283,21 +283,21 @@ static int perf_spad_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
 
sts = ntb_peer_spad_read(perf->ntb, peer->pidx,
 PERF_SPAD_CMD(perf->gidx));
-   if (le32_to_cpu(sts) != PERF_CMD_INVAL) {
+   if (le32_to_cpu((__force __le32)sts) != PERF_CMD_INVAL) {
usleep_range(MSG_UDELAY_LOW, MSG_UDELAY_HIGH);
continue;
}
 
ntb_peer_spad_write(perf->ntb, peer->pidx,
-   PERF_SPAD_LDATA(perf->gidx),
-   cpu_to_le32(lower_32_bits(data)));
+   PERF_SPAD_LDATA(perf->gidx),
+   (__force u32)cpu_to_le32(lower_32_bits(data)));
ntb_peer_spad_write(perf->ntb, peer->pidx,
-   PERF_SPAD_HDATA(perf->gidx),
-   cpu_to_le32(upper_32_bits(data)));
+   PERF_SPAD_HDATA(perf->gidx),
+   (__force u32)cpu_to_le32(upper_32_bits(data)));
mmiowb();
ntb_peer_spad_write(perf->ntb, peer->pidx,
PERF_SPAD_CMD(perf->gidx),
-   cpu_to_le32(cmd));
+   (__force u32)cpu_to_le32(cmd));
mmiowb();
ntb_peer_db_set(perf->ntb, PERF_SPAD_NOTIFY(peer->gidx));
 
@@ -331,21 +331,21 @@ static int perf_spad_cmd_recv(struct perf_ctx *perf, int *pidx,
continue;
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_CMD(peer->gidx));
-   val = le32_to_cpu(val);
+   val = le32_to_cpu((__force __le32)val);
if (val == PERF_CMD_INVAL)
continue;
 
*cmd = val;
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_LDATA(peer->gidx));
-   *data = le32_to_cpu(val);
+   *data = le32_to_cpu((__force __le32)val);
 
val = ntb_spad_read(perf->ntb, PERF_SPAD_HDATA(peer->gidx));
-   *data |= (u64)le32_to_cpu(val) << 32;
+   *data |= (u64)le32_to_cpu((__force __le32)val) << 32;
 
/* Next command can be retrieved from now */
ntb_spad_write(perf->ntb, PERF_SPAD_CMD(peer->gidx),
-  cpu_to_le32(PERF_CMD_INVAL));
+  (__force u32)cpu_to_le32(PERF_CMD_INVAL));
 
dev_dbg(&perf->ntb->dev, "CMD recv: %d 0x%llx\n", *cmd, *data);
 
@@ -381,7 +381,7 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
return ret;
 
ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_LDATA,
- cpu_to_le32(lower_32_bits(data)));
+   (__force u32)cpu_to_le32(lower_32_bits(data)));
 
if (ntb_msg_read_sts(perf->ntb) & outbits) {
usleep_range(MSG_UDELAY_LOW, MSG_UDELAY_HIGH);
@@ -389,12 +389,12 @@ static int perf_msg_cmd_send(struct perf_peer *peer, enum perf_cmd cmd,
}
 
ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_HDATA,
- cpu_to_le32(upper_32_bits(data)));
+   (__force u32)cpu_to_le32(upper_32_bits(data)));
mmiowb();
 
/* This call shall trigger peer message event */
ntb_peer_msg_write(perf->ntb, peer->pidx, PERF_MSG_CMD,
- cpu_to_le32(cmd));
+

Re: [PATCH] NTB: ntb_perf: fix cast to restricted __le32

2018-01-19 Thread Serge Semin

On Fri, Jan 19, 2018 at 09:42:17PM +0100, Arnd Bergmann  wrote:
> On Fri, Jan 19, 2018 at 6:30 PM, Serge Semin  wrote:
> > Sparse is whining about the u32 and __le32 mixed usage in the
> > driver.
> >
> > drivers/ntb/test/ntb_perf.c:288:21: warning: cast to restricted __le32
> > drivers/ntb/test/ntb_perf.c:295:37: warning: incorrect type in argument 4 (different base types)
> > drivers/ntb/test/ntb_perf.c:295:37:    expected unsigned int [unsigned] [usertype] val
> > drivers/ntb/test/ntb_perf.c:295:37:got restricted __le32 [usertype] 
> > 
> > ...
> >
> > The NTB API can't be changed so that the ntb_spad_*() methods
> > would return either pure __le32 or __be32, since the scratchpad
> > data can have arbitrary endianness in general. In this case we
> > need to forcibly cast u32 to __le32 and vice versa wherever
> > the driver logic requires it.
> >
> 
> There's got to be a better way to do this than sprinkling lots of __force
> typecasts throughout the code.
> 
> It looks like all those casts are about
> ntb_peer_spad_read()/ntb_peer_msg_write() calls, so why not change
> those function prototypes to work on __le32 types?
> 
> There should also be some form of documentation regarding why you
> need to swap the data twice, since all the ntb drivers later end up
> doing another cpu_to_le32() on the little-endian data.
> 
>Arnd

Actually, the provided patch is the best solution I could come up with.
The thing is that the methods can't be changed. Those functions are
part of the NTB API used by many drivers, so basically they are
like the pci_{read,write}_config_{byte,word,dword}() methods. We can't
change their prototypes just because it suits some driver. The methods
give access to the NTB device's plain u32-sized registers, nothing
else. So endianness is a property of the transmitted data in this case.

NTB is a technology for interconnecting two systems with possibly
different endianness (unlike PCI, which interconnects a CPU with LE devices).
So I need to set up an agreement between the two systems about
the endianness of the exchanged data, like host and network byte order in
Linux networking. I've chosen the exchanged data to be little-endian;
that's why I first convert it from CPU to le32 and then, on the
remote side, convert it back from le32 to CPU.

If you have a better suggestion for how the warning can be fixed, I'd
be glad to stick to it.

-Sergey

Re: [PATCH] NTB: ntb_perf: fix cast to restricted __le32

2018-01-19 Thread Serge Semin

On Sat, Jan 20, 2018 at 12:03:10AM +0300, Serge Semin wrote:
> On Fri, Jan 19, 2018 at 09:42:17PM +0100, Arnd Bergmann  wrote:
> > On Fri, Jan 19, 2018 at 6:30 PM, Serge Semin wrote:
> > > Sparse is whining about the u32 and __le32 mixed usage in the
> > > driver.
> > >
> > > drivers/ntb/test/ntb_perf.c:288:21: warning: cast to restricted __le32
> > > drivers/ntb/test/ntb_perf.c:295:37: warning: incorrect type in argument 4 (different base types)
> > > drivers/ntb/test/ntb_perf.c:295:37:    expected unsigned int [unsigned] [usertype] val
> > > drivers/ntb/test/ntb_perf.c:295:37:got restricted __le32 [usertype] 
> > > 
> > > ...
> > >
> > > The NTB API can't be changed so that the ntb_spad_*() methods
> > > would return either pure __le32 or __be32, since the scratchpad
> > > data can have arbitrary endianness in general. In this case we
> > > need to forcibly cast u32 to __le32 and vice versa wherever
> > > the driver logic requires it.
> > >
> > 
> > There's got to be a better way to do this than sprinkling lots of __force
> > typecasts throughout the code.
> > 
> > It looks like all those casts are about
> > ntb_peer_spad_read()/ntb_peer_msg_write() calls, so why not change
> > those function prototypes to work on __le32 types?
> > 
> > There should also be some form of documentation regarding why you
> > need to swap the data twice, since all the ntb drivers later end up
> > doing another cpu_to_le32() on the little-endian data.
> > 
> >Arnd
> 
> Actually, the provided patch is the best solution I could come up with.
> The thing is that the methods can't be changed. Those functions are
> part of the NTB API used by many drivers, so basically they are
> like the pci_{read,write}_config_{byte,word,dword}() methods. We can't
> change their prototypes just because it suits some driver. The methods
> give access to the NTB device's plain u32-sized registers, nothing
> else. So endianness is a property of the transmitted data in this case.
> 
> NTB is a technology for interconnecting two systems with possibly
> different endianness (unlike PCI, which interconnects a CPU with LE devices).
> So I need to set up an agreement between the two systems about
> the endianness of the exchanged data, like host and network byte order in
> Linux networking. I've chosen the exchanged data to be little-endian;
> that's why I first convert it from CPU to le32 and then, on the
> remote side, convert it back from le32 to CPU.
> 
> If you have a better suggestion for how the warning can be fixed, I'd
> be glad to stick to it.
> 
> -Sergey
> 

What I meant is that everything depends on the NTB hardware driver hidden
behind the API. If it does back-and-forth conversions when writing/reading
data to/from scratchpad registers (using IO methods like write32/read32),
then I don't need to worry about data endianness at all and should have
dropped the le32_to_cpu()/cpu_to_le32() usage. But if it doesn't, then
the ntb_perf driver will be in trouble. So I stuck with the safest
solution. The final decision is up to the subsystem maintainer, though.

-Sergey

Re: [PATCH][next] NTB: ntb_tool: fix memory leak on 'buf' on error exit path

2018-01-22 Thread Serge Semin

On Mon, Jan 22, 2018 at 09:38:57AM +, Colin King wrote:
> From: Colin Ian King 
> 
> Currently there is a memory leak on buf when the call to ntb_mw_get_align
> fails.  Add an exit err label and jump to this so that kfree on buf frees
> the memory.
> 
> Detected by CoverityScan, CID#1464286 ("Resource leak")
> 
> Fixes: d637628ce00c ("NTB: ntb_tool: Add full multi-port NTB API support")
> Signed-off-by: Colin Ian King 

Good catch, thanks!

Acked-by: Serge Semin 

> ---
>  drivers/ntb/test/ntb_tool.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
> index 920fc9b161b0..d592c0ffbd19 100644
> --- a/drivers/ntb/test/ntb_tool.c
> +++ b/drivers/ntb/test/ntb_tool.c
> @@ -659,7 +659,7 @@ static ssize_t tool_mw_trans_read(struct file *filep, char __user *ubuf,
>   ret = ntb_mw_get_align(inmw->tc->ntb, inmw->pidx, inmw->widx,
>  &addr_align, &size_align, &size_max);
>   if (ret)
> - return ret;
> + goto err;
>  
>   off += scnprintf(buf + off, buf_size - off,
>"Inbound MW \t%d\n",
> @@ -694,6 +694,8 @@ static ssize_t tool_mw_trans_read(struct file *filep, char __user *ubuf,
>&size_max);
>  
>   ret = simple_read_from_buffer(ubuf, size, offp, buf, off);
> +
> +err:
>   kfree(buf);
>  
>   return ret;
> -- 
> 2.15.1
>

Re: [PATCH 00/14] MIPS: memblock: Switch arch code to NO_BOOTMEM

2018-01-22 Thread Serge Semin

On Mon, Jan 22, 2018 at 04:36:40PM +, Matt Redfearn wrote:

Hello Matt,

> Hi Serge,
> 
> On 17/01/18 22:22, Serge Semin wrote:
> >Even though it's common to see the architecture code using both
> >bootmem and memblock early memory allocators, it's not good for
> >multiple reasons. First of all, it's redundant to have two
> >early memory allocators while one would be more than enough from
> >the functionality and stability points of view. Secondly, some new
> >features introduced in the kernel utilize the methods of the most
> >modern allocator, ignoring the older one. It means the architecture
> >code must keep both subsystems synchronized with information
> >about memory regions and reservations, which increases the code
> >complexity and, obviously, the probability of bugs.
> >Finally, it's better to keep the code of all architectures unified for
> >better readability and code simplification. All these reasons lead
> >to one conclusion: arch code should use just one memory allocator,
> >which is supposed to be memblock as the most modern one, already
> >utilized by most of the kernel platforms. This patchset is
> >mostly about it.
> >
> >One more reason why the MIPS arch code should finally move to
> >memblock is a BUG somewhere in the initialization process, when
> >CMA is activated:
> >
> >[0.248762] BUG: Bad page state in process swapper/0  pfn:01f93
> >[0.255415] page:8205b0ac count:0 mapcount:-127 mapping:  (null) index:0x1
> >[0.263172] flags: 0x4000()
> >[0.266723] page dumped because: nonzero mapcount
> >[0.272049] Modules linked in:
> >[0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
> >[0.282900] Stack :   80b6dd6a 003a   
> >8093 8092bff4
> >   86073a14 80ac88c7 809f21ac  0001 80b6998c 0400 
> > 
> >   80a0 801822e8 80b6dd68  0002  809f8024 
> > 86077ccc
> >   80b8 801e9328 809fcbc0  0400 0001 86077ccc 
> > 86073a14
> >          
> > 
> >   ...
> >[0.323148] Call Trace:
> >[0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
> >[0.330859] [<80404814>] dump_stack+0xd4/0x110
> >[0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
> >[0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
> >[0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
> >[0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
> >[0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
> >[0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
> >[0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
> >[0.376828] [<808faca4>] kernel_init+0x14/0x104
> >[0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c
> >
> >The bugus pfn seems to be the one allocated for bootmem allocator
> >pages and hasn't been freed before letting the CMA working with its
> >areas. Anyway the bug is solved by this patchset.
> >
> >Another reason why this patchset is useful is that it fixes the fdt
> >reserved-memory nodes functionality for MIPS. Really it's bug to have
> >the fdt reserved nodes scanning before the memblock is
> >fully initialized (calling early_init_fdt_scan_reserved_mem before
> >bootmem_init is called). Additionally no-map flag of the
> >reserved-memory node hasn't been taking into account. This patchset
> >fixes all of these.
> >
> >As you probably remember I already did another attempt to merge a
> >similar functionality into the kernel. This time the patchset got
> >to be less complex (14 patches vs 21 last time) and fixes the
> >platform code like SGI IP27 and Loongson3, which due to being
> >NUMA introduce its own memory initialization process. Although
> >I have much doubt in SGI IP27 code operability in the first place,
> >since it got prom_meminit() method of early memory initialization,
> >which hasn't been called at any other place in the kernel. It must
> >have been left there unrenamed after arch/mips/mips-boards/generic
> >code had been discarded.
> >
> >Here are the list of folks, who agreed to perform some tests of
> >the patchset:
> >Alexander Sverdlin  - Octeon2
> >Matt Redfearn  - Loongson3, etc
> 
> 
> I have applied and tested these patches on various platforms that we have
> available here, and the kern

Re: [PATCH 02/14] MIPS: memblock: Surely map BSS kernel memory section

2018-01-22 Thread Serge Semin

Hello Matt,

On Mon, Jan 22, 2018 at 04:35:26PM +0000, Matt Redfearn wrote:
> Hi Serge,
> 
> On 17/01/18 22:23, Serge Semin wrote:
> >The current MIPS code makes sure the kernel code/data/init
> >sections are in the maps, but BSS should also be there.
> 
> Quite right - it should. But this was protected against by reserving all
> bootmem up to the _end symbol here:
> http://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/mips/kernel/setup.c#L388
> Which you remove in the next patch in this series. I'm not sure it is worth

Right, I missed that part. The old code simply never frees the kernel memory:
it doesn't call free_bootmem() for the non-reserved parts below reserved_end.

> disentangling the reserved_end stuff from the next patch to make this into a
> single logical change of reserving just .bss rather than everything below
> _end.

Good point. I'll move this change into "[PATCH 05/14] MIPS: memblock:
Add reserved memory regions to memblock", where it logically belongs.
Since the arch_mem_addpart() calls now basically reserve all the kernel
memory, I'd also merge them into a single call for the range [_text, _end].
What do you think?

Regards,
-Sergey

> 
> Reviewed-by: Matt Redfearn 
> 
> Thanks,
> Matt
> 
> >
> >Signed-off-by: Serge Semin 
> >---
> >  arch/mips/kernel/setup.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> >diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> >index 76e9e2075..0d21c9e04 100644
> >--- a/arch/mips/kernel/setup.c
> >+++ b/arch/mips/kernel/setup.c
> >@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
> > arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
> >  PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
> >  BOOT_MEM_INIT_RAM);
> >+arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
> >+ PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
> >+ BOOT_MEM_RAM);
> > pr_info("Determined physical RAM map:\n");
> > print_memory_map();
> >

[PATCH 00/14] MIPS: memblock: Switch arch code to NO_BOOTMEM

2018-01-17 Thread Serge Semin

Even though it's common to see architecture code using both the
bootmem and memblock early memory allocators, it's not good for
multiple reasons. First of all, it's redundant to have two
early memory allocators when one would be more than enough from
the functionality and stability points of view. Secondly, some new
features introduced in the kernel use the methods of the most
modern allocator and ignore the older one. This means the architecture
code must keep both subsystems synchronized with information
about memory regions and reservations, which increases code
complexity and, with it, the probability of bugs.
Finally, it's better to keep the code of all architectures unified for
better readability and simplicity. All these reasons lead
to one conclusion: arch code should use just one memory allocator,
which is supposed to be memblock, as the most modern one and the one
already used by most of the kernel platforms. This patchset is
mostly about that.

One more reason why the MIPS arch code should finally move to
memblock is a BUG somewhere in the initialization process that
shows up when CMA is activated:

[0.248762] BUG: Bad page state in process swapper/0  pfn:01f93
[0.255415] page:8205b0ac count:0 mapcount:-127 mapping:  (null) index:0x1
[0.263172] flags: 0x4000()
[0.266723] page dumped because: nonzero mapcount
[0.272049] Modules linked in:
[0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
[0.282900] Stack :   80b6dd6a 003a   
8093 8092bff4
  86073a14 80ac88c7 809f21ac  0001 80b6998c 0400 

  80a0 801822e8 80b6dd68  0002  809f8024 
86077ccc
  80b8 801e9328 809fcbc0  0400 0001 86077ccc 
86073a14
         

  ...
[0.323148] Call Trace:
[0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
[0.330859] [<80404814>] dump_stack+0xd4/0x110
[0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
[0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
[0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
[0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
[0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
[0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
[0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
[0.376828] [<808faca4>] kernel_init+0x14/0x104
[0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c

The bogus pfn seems to belong to the pages allocated for the bootmem
allocator, which haven't been freed before CMA starts working with its
areas. In any case, the bug is fixed by this patchset.

Another reason why this patchset is useful is that it fixes the fdt
reserved-memory nodes functionality for MIPS. It's really a bug to scan
the fdt reserved-memory nodes before memblock is fully initialized
(early_init_fdt_scan_reserved_mem is called before bootmem_init).
Additionally, the no-map flag of the reserved-memory node hasn't been
taken into account. This patchset fixes all of these issues.

As you probably remember, I already made another attempt to merge
similar functionality into the kernel. This time the patchset is
less complex (14 patches vs 21 last time) and also fixes the
platform code of SGI IP27 and Loongson3, which, being NUMA
systems, introduce their own memory initialization process. That said,
I seriously doubt the SGI IP27 code works in the first place,
since it has a prom_meminit() early memory initialization method
which isn't called anywhere in the kernel. It must
have been left there unrenamed after the arch/mips/mips-boards/generic
code had been discarded.

Here is the list of folks who agreed to perform some tests of
the patchset:
Alexander Sverdlin  - Octeon2
Matt Redfearn  - Loongson3, etc
Joshua Kinard  - IP27
Marcin Nowakowski 
Thanks to all of you, and to everybody who will be involved
in reviewing and testing.

The patchset applies on top of kernel 4.15-rc8 and can also be found
in my repo:
https://github.com/fancer/Linux-kernel-MIPS-memblock-project

Signed-off-by: Serge Semin 

Serge Semin (14):
  MIPS: memblock: Add RESERVED_NOMAP memory flag
  MIPS: memblock: Surely map BSS kernel memory section
  MIPS: memblock: Reserve initrd memory in memblock
  MIPS: memblock: Discard bootmem initialization
  MIPS: memblock: Add reserved memory regions to memblock
  MIPS: memblock: Reserve kdump/crash regions in memblock
  MIPS: memblock: Mark present sparsemem sections
  MIPS: memblock: Simplify DMA contiguous reservation
  MIPS: memblock: Allow memblock regions resize
  MIPS: memblock: Perform early low memory test
  MIPS: memblock: Print out kernel virtual mem layout
  MIPS: memblock: Discard bootmem from Loongson3 code
  MIPS: membl

[PATCH 14/14] MIPS: memblock: Deactivate bootmem allocator

2018-01-17 Thread Serge Semin

From now on, the memblock allocator can be used for early
memory management.

Signed-off-by: Serge Semin 
---
 arch/mips/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 725b5ece7..a6c4fb6b6 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,7 +4,6 @@ config MIPS
default y
select ARCH_BINFMT_ELF_STATE
select ARCH_CLOCKSOURCE_DATA
-   select ARCH_DISCARD_MEMBLOCK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_MIGHT_HAVE_PC_PARPORT
@@ -57,6 +56,7 @@ config MIPS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KPROBES
select HAVE_KRETPROBES
+   select NO_BOOTMEM
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MOD_ARCH_SPECIFIC
-- 
2.12.0

[PATCH 06/14] MIPS: memblock: Reserve kdump/crash regions in memblock

2018-01-17 Thread Serge Semin

Kdump/crashkernel memory regions should be reserved in the
memblock allocator so that they aren't occupied by any subsequent
allocations.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 9e14d9833..b121fa702 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -849,17 +849,15 @@ static void __init arch_mem_init(char **cmdline_p)
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
   setup_elfcorehdr, setup_elfcorehdr_size);
-   reserve_bootmem(setup_elfcorehdr, setup_elfcorehdr_size,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(setup_elfcorehdr, setup_elfcorehdr_size);
}
 #endif
 
mips_parse_crashkernel();
 #ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
-   reserve_bootmem(crashk_res.start,
-   crashk_res.end - crashk_res.start + 1,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(crashk_res.start,
+crashk_res.end - crashk_res.start + 1);
 #endif
device_tree_init();
sparse_init();
-- 
2.12.0
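
Because struct resource stores an inclusive end address, the crashkernel length handed to memblock_reserve() is computed as crashk_res.end - crashk_res.start + 1, whereas setup_elfcorehdr_size is already a length. A minimal user-space sketch of that conversion and of the start != end guard; struct res_range and both helpers are hypothetical illustrations, not kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end pair of struct resource.
 * As in the kernel, 'end' is the address of the last byte (inclusive). */
struct res_range {
	uint64_t start;
	uint64_t end;
};

/* Length in bytes of the inclusive range [start, end]. */
static uint64_t res_range_size(const struct res_range *r)
{
	return r->end - r->start + 1;
}

/* Same guard as the patch: start == end is treated as "nothing requested",
 * which is the state of crashk_res when no crashkernel= was parsed. */
static bool res_range_requested(const struct res_range *r)
{
	return r->start != r->end;
}
```

For example, a region from 0x02000000 to 0x03ffffff (inclusive) is exactly 32 MiB long.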

[PATCH 01/14] MIPS: memblock: Add RESERVED_NOMAP memory flag

2018-01-17 Thread Serge Semin

Even if the no-map flag is specified, the reserved memory declared in dts
isn't really discarded from the buddy allocator in the current code.
We'll fix this by adding a no-map MIPS memory flag. Additionally,
let's add RESERVED_NOMAP memory region handling to the methods
which aren't going to be changed by the further patches.

Signed-off-by: Serge Semin 
---
 arch/mips/include/asm/bootinfo.h | 1 +
 arch/mips/kernel/prom.c  | 8 ++--
 arch/mips/kernel/setup.c | 8 
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index b603804ca..f7be3148a 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -90,6 +90,7 @@ extern unsigned long mips_machtype;
 #define BOOT_MEM_ROM_DATA  2
 #define BOOT_MEM_RESERVED  3
 #define BOOT_MEM_INIT_RAM  4
+#define BOOT_MEM_RESERVED_NOMAP 5
 
 /*
  * A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 0dbcd152a..b123eb827 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,7 @@ char *mips_get_machine_name(void)
 #ifdef CONFIG_USE_OF
 void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-   return add_memory_region(base, size, BOOT_MEM_RAM);
+   add_memory_region(base, size, BOOT_MEM_RAM);
 }
 
 void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
@@ -52,7 +52,11 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
 int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
-   add_memory_region(base, size, BOOT_MEM_RESERVED);
+   if (!nomap)
+   add_memory_region(base, size, BOOT_MEM_RESERVED);
+   else
+   add_memory_region(base, size, BOOT_MEM_RESERVED_NOMAP);
+
return 0;
 }
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 4020d8f98..76e9e2075 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -172,6 +172,7 @@ bool __init memory_region_available(phys_addr_t start, phys_addr_t size)
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+   case BOOT_MEM_RESERVED_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -207,6 +208,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   printk(KERN_CONT "(reserved nomap)\n");
+   break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -955,9 +959,13 @@ static void __init resource_init(void)
res->name = "System RAM";
res->flags |= IORESOURCE_SYSRAM;
break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   res->name = "reserved nomap";
+   break;
case BOOT_MEM_RESERVED:
default:
res->name = "reserved";
+   break;
}
 
request_resource(&iomem_resource, res);
-- 
2.12.0
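
The BOOT_MEM_RESERVED/BOOT_MEM_RESERVED_NOMAP case in memory_region_available() compares a candidate range against each reserved map entry. A user-space sketch of that condition, assuming the entry spans [start_, end_) with end_ = addr + size (the loop header is not part of this diff), together with the nomap-to-type selection from early_init_dt_reserve_memory_arch(). The helper names hits_reserved() and dt_reserved_type() are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Type values as defined in asm/bootinfo.h by this patch. */
enum { BOOT_MEM_RESERVED = 3, BOOT_MEM_RESERVED_NOMAP = 5 };

/* Does the candidate [start, start + size) conflict with the reserved
 * entry [start_, end_)? The second clause uses '>=', so a candidate that
 * ends exactly where the entry begins is also reported as a conflict,
 * which errs on the safe side. */
static bool hits_reserved(uint64_t start, uint64_t size,
			  uint64_t start_, uint64_t end_)
{
	return (start >= start_ && start < end_) ||
	       (start < start_ && start + size >= start_);
}

/* Map the dts no-map property onto a MIPS boot memory type. */
static int dt_reserved_type(bool nomap)
{
	return nomap ? BOOT_MEM_RESERVED_NOMAP : BOOT_MEM_RESERVED;
}
```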

[PATCH 12/14] MIPS: memblock: Discard bootmem from Loongson3 code

2018-01-17 Thread Serge Semin

Loongson64/3 runs its own code to initialize the memory allocator when
the NUMA configuration is selected. So in order to move to pure
memblock usage, we discard the bootmem allocator calls and use the
memblock reservation method for the kernel/addrspace_offset
memory regions.

Signed-off-by: Serge Semin 
---
 arch/mips/loongson64/loongson-3/numa.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/mips/loongson64/loongson-3/numa.c b/arch/mips/loongson64/loongson-3/numa.c
index 282c5a8c2..902843516 100644
--- a/arch/mips/loongson64/loongson-3/numa.c
+++ b/arch/mips/loongson64/loongson-3/numa.c
@@ -180,7 +180,6 @@ static void __init szmem(unsigned int node)
 
 static void __init node_mem_init(unsigned int node)
 {
-   unsigned long bootmap_size;
unsigned long node_addrspace_offset;
unsigned long start_pfn, end_pfn, freepfn;
 
@@ -197,26 +196,21 @@ static void __init node_mem_init(unsigned int node)
 
__node_data[node] = prealloc__node_data + node;
 
-   NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
 
-   bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
-   start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
if (node == 0) /* used by finalize_initrd() */
max_low_pfn = end_pfn;
 
-   /* This is reserved for the kernel and bdata->node_bootmem_map */
-   reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
-   ((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
-   BOOTMEM_DEFAULT);
+   /* This is reserved for the kernel only */
+   if (node == 0)
+   memblock_reserve(start_pfn << PAGE_SHIFT,
+   ((freepfn - start_pfn) << PAGE_SHIFT));
 
if (node == 0 && node_end_pfn(0) >= (0xffffffff >> PAGE_SHIFT)) {
/* Reserve 0xfe000000~0xffffffff for RS780E integrated GPU */
-   reserve_bootmem_node(NODE_DATA(node),
-   (node_addrspace_offset | 0xfe000000),
-   32 << 20, BOOTMEM_DEFAULT);
+   memblock_reserve(node_addrspace_offset | 0xfe000000, 32 << 20);
}
 
sparse_memory_present_with_active_regions(node);
-- 
2.12.0
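
Both memblock_reserve() calls in node_mem_init() boil down to plain address arithmetic. A sketch under stated assumptions: 16 KiB pages (PAGE_SHIFT = 14, chosen only for illustration), the node offset taken as a plain input, and the RS780E GPU window assumed to start at 0xfe000000. kernel_reservation() and gpu_reservation() are hypothetical helpers:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 14	/* assumption: 16 KiB pages */

/* Kernel image on node 0: from the node's first PFN up to, but
 * excluding, the first free PFN after the kernel. */
static void kernel_reservation(uint64_t start_pfn, uint64_t freepfn,
			       uint64_t *base, uint64_t *size)
{
	*base = start_pfn << SKETCH_PAGE_SHIFT;
	*size = (freepfn - start_pfn) << SKETCH_PAGE_SHIFT;
}

/* GPU hole: 32 << 20 bytes (32 MiB) at 0xfe000000, OR-ed with the
 * node's address-space offset. */
static void gpu_reservation(uint64_t node_addrspace_offset,
			    uint64_t *base, uint64_t *size)
{
	*base = node_addrspace_offset | 0xfe000000ULL;
	*size = 32ULL << 20;
}
```

With a zero node offset the hole ends exactly at 0x100000000, so the reservation covers the last 32 MiB below 4 GiB.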

[PATCH 09/14] MIPS: memblock: Allow memblock regions resize

2018-01-17 Thread Serge Semin

Once all the main reservations are done, the memblock regions
can be resized dynamically. Additionally, it is useful to have the
memblock regions dumped for debugging at this point.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index e0ca0d2bc..82c6b77f6 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -869,6 +869,10 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
+
+   memblock_allow_resize();
+
+   memblock_dump_all();
 }
 
 static void __init resource_init(void)
-- 
2.12.0

[PATCH 13/14] MIPS: memblock: Discard bootmem from SGI IP27 code

2018-01-17 Thread Serge Semin

SGI IP27 has its own code to set up the early memory allocator, since it's
a NUMA-based system. So in order to be compatible with the NO_BOOTMEM config,
we need to discard the bootmem allocator initialization and use the
memblock reservation method instead. In my opinion, though, the code isn't
working anyway, since I couldn't find any place where prom_meminit() is
called, and the kernel memory isn't reserved. It must have been untested
since the time the arch/mips/mips-boards/generic code was in the kernel.

Signed-off-by: Serge Semin 
---
 arch/mips/sgi-ip27/ip27-memory.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
index 8d0eb2643..d25758e25 100644
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -389,7 +389,6 @@ static void __init node_mem_init(cnodeid_t node)
 {
unsigned long slot_firstpfn = slot_getbasepfn(node, 0);
unsigned long slot_freepfn = node_getfirstfree(node);
-   unsigned long bootmap_size;
unsigned long start_pfn, end_pfn;
 
get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
@@ -400,7 +399,6 @@ static void __init node_mem_init(cnodeid_t node)
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE);
 
-   NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
 
@@ -409,12 +407,9 @@ static void __init node_mem_init(cnodeid_t node)
slot_freepfn += PFN_UP(sizeof(struct pglist_data) +
   sizeof(struct hub_data));
 
-   bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
-   start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
-   reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
-   ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(slot_firstpfn << PAGE_SHIFT,
+   ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT));
sparse_memory_present_with_active_regions(node);
 }
 
-- 
2.12.0

[PATCH 11/14] MIPS: memblock: Print out kernel virtual mem layout

2018-01-17 Thread Serge Semin

It is useful to have the kernel virtual memory layout printed
at boot time, so as to have full information about the booted
kernel. In some cases it might be unsafe to have virtual
addresses freely visible in logs, so the %pK format is used,
which allows them to be hidden if needed.

Signed-off-by: Serge Semin 
---
 arch/mips/mm/init.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 15040266b..d3e6bb531 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,6 +61,51 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
 EXPORT_SYMBOL(zero_page_mask);
 
 /*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init __maybe_unused mem_print_kmap_info(void)
+{
+   pr_notice("Kernel virtual memory layout:\n"
+ "lowmem  : 0x%pK - 0x%pK  (%4ld MB)\n"
+ "  .text : 0x%pK - 0x%pK  (%4td kB)\n"
+ "  .data : 0x%pK - 0x%pK  (%4td kB)\n"
+ "  .init : 0x%pK - 0x%pK  (%4td kB)\n"
+ "  .bss  : 0x%pK - 0x%pK  (%4td kB)\n"
+ "vmalloc : 0x%pK - 0x%pK  (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ "pkmap   : 0x%pK - 0x%pK  (%4ld MB)\n"
+#endif
+ "fixmap  : 0x%pK - 0x%pK  (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+   /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+   BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+   BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+   BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+   BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+   (unsigned long)high_memory);
+#endif
+   BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+   BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
+/*
  * Not static inline because used by IP27 special magic initialization code
  */
 void setup_zero_pages(void)
@@ -468,6 +514,7 @@ void __init mem_init(void)
free_all_bootmem();
setup_zero_pages(); /* Setup zeroed pages.  */
mem_init_free_highmem();
+   mem_print_kmap_info();
mem_init_print_info(NULL);
 
 #ifdef CONFIG_64BIT
-- 
2.12.0
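
The MLK() and MLM() macros compute region sizes with a truncating shift, while MLK_ROUNDUP() uses DIV_ROUND_UP(t - b, SZ_1K), presumably so that a small section (for instance a short .bss) is not reported as 0 kB. A user-space sketch of the three size computations as plain functions; the function names are hypothetical:

```c
#include <stdint.h>

/* MLK(): size in kB, truncated. */
static uint64_t size_kb_trunc(uintptr_t b, uintptr_t t)
{
	return (t - b) >> 10;
}

/* MLM(): size in MB, truncated. */
static uint64_t size_mb_trunc(uintptr_t b, uintptr_t t)
{
	return (t - b) >> 20;
}

/* MLK_ROUNDUP(): size in kB rounded up, i.e. DIV_ROUND_UP(t - b, 1024). */
static uint64_t size_kb_roundup(uintptr_t b, uintptr_t t)
{
	return ((t - b) + 1024 - 1) / 1024;
}
```

A 6145-byte section is reported as 6 kB by the truncating form but 7 kB by the rounding one; page-aligned sizes give the same answer either way.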

[PATCH 08/14] MIPS: memblock: Simplify DMA contiguous reservation

2018-01-17 Thread Serge Semin

CMA reserves its areas in the memblock allocator. Since we aren't
using bootmem anymore, the code copying these reservations to bootmem
should be discarded.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 6df1eaf38..e0ca0d2bc 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -869,10 +869,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
-   /* Tell bootmem about cma reserved memblock section */
-   for_each_memblock(reserved, reg)
-   if (reg->size != 0)
-   reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
 }
 
 static void __init resource_init(void)
-- 
2.12.0

[PATCH 10/14] MIPS: memblock: Perform early low memory test

2018-01-17 Thread Serge Semin

Low memory can be tested at this point, since all the
reservations have just been made and hardly any additional
allocations have happened yet.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 82c6b77f6..b65047d85 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -873,6 +873,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_allow_resize();
 
memblock_dump_all();
+
+   early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
 }
 
 static void __init resource_init(void)
-- 
2.12.0
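
The idea behind a pattern memory test can be sketched in user space: write each pattern to every word, read it back and count the mismatches. This is a hypothetical, simplified analogue only; the kernel's early_memtest() walks the free memblock ranges between the given bounds and reserves any bad ranges it finds, none of which is modelled here, and pattern_test() is an illustrative name:

```c
#include <stddef.h>
#include <stdint.h>

/* Write each pattern to every word of buf, read it back and return the
 * number of words that did not hold the value written. 'volatile' keeps
 * the compiler from optimising the read-back away. */
static size_t pattern_test(volatile uint64_t *buf, size_t words)
{
	static const uint64_t patterns[] = {
		0x0000000000000000ULL,
		0xffffffffffffffffULL,
		0x5555555555555555ULL,
		0xaaaaaaaaaaaaaaaaULL,
	};
	size_t bad = 0;

	for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
		for (size_t i = 0; i < words; i++)
			buf[i] = patterns[p];
		for (size_t i = 0; i < words; i++)
			if (buf[i] != patterns[p])
				bad++;
	}
	return bad;
}
```

Running it right after the reservations, as the patch does, means almost all low memory is still free and can be exercised.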

[PATCH 05/14] MIPS: memblock: Add reserved memory regions to memblock

2018-01-17 Thread Serge Semin

Memory reservation has to be performed for all the crucial
objects, like the kernel itself, its data and the fdt blob. FDT reserved-memory
nodes should also be scanned to declare or discard reserved memory
regions, but this has to be done after memblock is fully initialized
with low/high RAM (see the function description/code).

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 96 +-
 1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0ffbc3bb5..9e14d9833 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -362,6 +362,10 @@ static unsigned long __init init_initrd(void)
 static void __init bootmem_init(void)
 {
init_initrd();
+}
+
+static void __init reservation_init(void)
+{
finalize_initrd();
 }
 
@@ -478,54 +482,58 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+}
+
+static void __init reservation_init(void)
+{
+   phys_addr_t size;
+   int i;
 
/*
-* Register fully available low RAM pages with the bootmem allocator.
+* Reserve memory occupied by the kernel and its data
 */
-   for (i = 0; i < boot_mem_map.nr_map; i++) {
-   unsigned long start, end, size;
+   size = __pa_symbol(&_end) - __pa_symbol(&_text);
+   memblock_reserve(__pa_symbol(&_text), size);
 
-   start = PFN_UP(boot_mem_map.map[i].addr);
-   end   = PFN_DOWN(boot_mem_map.map[i].addr
-   + boot_mem_map.map[i].size);
+   /*
+* Handle FDT and its reserved-memory nodes now
+*/
+   early_init_fdt_reserve_self();
+   early_init_fdt_scan_reserved_mem();
 
-   /*
-* Reserve usable memory.
-*/
-   switch (boot_mem_map.map[i].type) {
-   case BOOT_MEM_RAM:
-   break;
-   case BOOT_MEM_INIT_RAM:
-   memory_present(0, start, end);
-   continue;
-   default:
-   /* Not usable memory */
-   if (start > min_low_pfn && end < max_low_pfn)
-   reserve_bootmem(boot_mem_map.map[i].addr,
-   boot_mem_map.map[i].size,
-   BOOTMEM_DEFAULT);
-   continue;
-   }
+   /*
+* Reserve requested memory ranges with the memblock allocator.
+*/
+   for (i = 0; i < boot_mem_map.nr_map; i++) {
+   phys_addr_t start, end;
 
-   /*
-* We are rounding up the start address of usable memory
-* and at the end of the usable range downwards.
-*/
-   if (start >= max_low_pfn)
+   if (boot_mem_map.map[i].type == BOOT_MEM_RAM)
continue;
-   if (end > max_low_pfn)
-   end = max_low_pfn;
+
+   start = boot_mem_map.map[i].addr;
+   end   = boot_mem_map.map[i].addr + boot_mem_map.map[i].size;
+   size  = boot_mem_map.map[i].size;
 
/*
-* ... finally, is the area going away?
+* Make sure the region isn't already reserved
 */
-   if (end <= start)
+   if (memblock_is_region_reserved(start, size)) {
+   pr_warn("Reserved region %08zx @ %pa already in-use\n",
+   (size_t)size, &start);
continue;
-   size = end - start;
+   }
 
-   /* Register lowmem ranges */
-   free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
-   memory_present(0, start, end);
+   switch (boot_mem_map.map[i].type) {
+   case BOOT_MEM_ROM_DATA:
+   case BOOT_MEM_RESERVED:
+   case BOOT_MEM_INIT_RAM:
+   memblock_reserve(start, size);
+   break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   default:
+   memblock_remove(start, size);
+   break;
+   }
}
 
 #ifdef CONFIG_RELOCATABLE
@@ -555,6 +563,12 @@ static void __init bootmem_init(void)
 * Reserve initrd memory if needed.
 */
finalize_initrd();
+
+   /*
+* Reserve for hibernation
+*/
+   size = __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin);
+   memblock_reserve(__pa_symbol(&__nosave_begin), size);
 }
 
 #endif /* CONFIG_SGI_IP27 */
@@ -569,6 +583,7 @@ static void __init bootmem_init(void)

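The reservation loop in reservation_init() skips plain RAM, skips entries that memblock already has reserved, reserves ROM data, reserved and init RAM, and removes no-map and unknown entries. A user-space sketch of that decision table; the type values are from asm/bootinfo.h (BOOT_MEM_RAM is assumed to be 1, as it is not shown in these diffs), while enum region_action and classify() are hypothetical:

```c
#include <stdbool.h>

enum {
	BOOT_MEM_RAM = 1,		/* assumed value */
	BOOT_MEM_ROM_DATA = 2,
	BOOT_MEM_RESERVED = 3,
	BOOT_MEM_INIT_RAM = 4,
	BOOT_MEM_RESERVED_NOMAP = 5,
};

/* Hypothetical outcome of one loop iteration. */
enum region_action { ACT_SKIP, ACT_RESERVE, ACT_REMOVE };

static enum region_action classify(int type, bool already_reserved)
{
	if (type == BOOT_MEM_RAM)
		return ACT_SKIP;	/* plain RAM stays available */
	if (already_reserved)
		return ACT_SKIP;	/* the patch warns and moves on */

	switch (type) {
	case BOOT_MEM_ROM_DATA:
	case BOOT_MEM_RESERVED:
	case BOOT_MEM_INIT_RAM:
		return ACT_RESERVE;	/* memblock_reserve() */
	case BOOT_MEM_RESERVED_NOMAP:
	default:
		return ACT_REMOVE;	/* memblock_remove() */
	}
}
```

Removing no-map regions instead of reserving them keeps them out of the memory map entirely, which is what the dts no-map property asks for.
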
[PATCH 07/14] MIPS: memblock: Mark present sparsemem sections

2018-01-17 Thread Serge Semin

If sparsemem is activated, all sections with present pages must
be marked accordingly once memblock is fully initialized.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b121fa702..6df1eaf38 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -778,7 +778,7 @@ static void __init request_crashkernel(struct resource *res)
 
 static void __init arch_mem_init(char **cmdline_p)
 {
-   struct memblock_region *reg;
+   struct memblock_region *reg __maybe_unused;
extern void plat_mem_setup(void);
 
/* call board setup routine */
@@ -860,6 +860,11 @@ static void __init arch_mem_init(char **cmdline_p)
 crashk_res.end - crashk_res.start + 1);
 #endif
device_tree_init();
+#ifdef CONFIG_SPARSEMEM
+   for_each_memblock(memory, reg)
+   memory_present(0, memblock_region_memory_base_pfn(reg),
+   memblock_region_memory_end_pfn(reg));
+#endif /* CONFIG_SPARSEMEM */
sparse_init();
plat_swiotlb_setup();
 
-- 
2.12.0
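
For each memblock memory region, the loop passes a whole-page PFN range to memory_present(): the base rounded up and the end rounded down (this is how memblock_region_memory_base_pfn() and memblock_region_memory_end_pfn() behave, to my understanding). memory_present() then marks every sparsemem section touched by that range. A sketch assuming 4 KiB pages and 256 MiB sections (SECTION_SIZE_BITS = 28); the real values depend on the MIPS configuration, and all names here are hypothetical:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT        12	/* assumption: 4 KiB pages */
#define SKETCH_SECTION_SIZE_BITS 28	/* assumption: 256 MiB sections */
#define SKETCH_PFN_SECTION_SHIFT (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT)

/* First whole page of a region: base rounded up. */
static uint64_t region_base_pfn(uint64_t base)
{
	return (base + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
}

/* End of the last whole page of a region: end rounded down. */
static uint64_t region_end_pfn(uint64_t base, uint64_t size)
{
	return (base + size) >> SKETCH_PAGE_SHIFT;
}

/* Number of sections touched by the PFN range [start_pfn, end_pfn). */
static uint64_t sections_spanned(uint64_t start_pfn, uint64_t end_pfn)
{
	if (end_pfn <= start_pfn)
		return 0;
	return ((end_pfn - 1) >> SKETCH_PFN_SECTION_SHIFT) -
	       (start_pfn >> SKETCH_PFN_SECTION_SHIFT) + 1;
}
```

Under these assumptions a 512 MiB region starting at 0 spans two sections, and a range straddling a section boundary marks both sections.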

[PATCH 04/14] MIPS: memblock: Discard bootmem initialization

2018-01-17 Thread Serge Semin

Since memblock is going to be used for early memory allocation,
let's discard the bootmem node setup and all the related free-space
search code. The low/high PFN limits should still be calculated,
since they are needed at the paging_init stage. Since the current
code already initializes the memblock regions, the only thing
left is to set the upper allocation limit to the max low
memory PFN, so the memblock API can be fully used from now on.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 86 --
 1 file changed, 11 insertions(+), 75 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 1b8246e6c..0ffbc3bb5 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -367,29 +367,15 @@ static void __init bootmem_init(void)
 
 #else  /* !CONFIG_SGI_IP27 */
 
-static unsigned long __init bootmap_bytes(unsigned long pages)
-{
-   unsigned long bytes = DIV_ROUND_UP(pages, 8);
-
-   return ALIGN(bytes, sizeof(long));
-}
-
 static void __init bootmem_init(void)
 {
-   unsigned long reserved_end;
-   unsigned long mapstart = ~0UL;
-   unsigned long bootmap_size;
-   bool bootmap_valid = false;
int i;
 
/*
-* Sanity check any INITRD first. We don't take it into account
-* for bootmem setup initially, rely on the end-of-kernel-code
-* as our memory range starting point. Once bootmem is inited we
+* Sanity check any INITRD first. Once memblock is inited we
 * will reserve the area used for the initrd.
 */
init_initrd();
-   reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
 
/*
 * max_low_pfn is not a number of pages. The number of pages
@@ -428,16 +414,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
-   if (end <= reserved_end)
-   continue;
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Skip zones before initrd and initrd itself */
-   if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
-   continue;
-#endif
-   if (start >= mapstart)
-   continue;
-   mapstart = max(reserved_end, start);
}
 
if (min_low_pfn >= max_low_pfn)
@@ -463,53 +439,19 @@ static void __init bootmem_init(void)
 #endif
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
-
-#ifdef CONFIG_BLK_DEV_INITRD
-   /*
-* mapstart should be after initrd_end
-*/
-   if (initrd_end)
-   mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
+#ifdef CONFIG_HIGHMEM
+   pr_info("PFNs: low min %lu, low max %lu, high start %lu, high end %lu, "
+   "max %lu\n",
+   min_low_pfn, max_low_pfn, highstart_pfn, highend_pfn, max_pfn);
+#else
+   pr_info("PFNs: low min %lu, low max %lu, max %lu\n",
+   min_low_pfn, max_low_pfn, max_pfn);
 #endif
 
/*
-* check that mapstart doesn't overlap with any of
-* memory regions that have been reserved through eg. DTB
-*/
-   bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
-
-   bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
-   bootmap_size);
-   for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
-   unsigned long mapstart_addr;
-
-   switch (boot_mem_map.map[i].type) {
-   case BOOT_MEM_RESERVED:
-   mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
-   boot_mem_map.map[i].size);
-   if (PHYS_PFN(mapstart_addr) < mapstart)
-   break;
-
-   bootmap_valid = memory_region_available(mapstart_addr,
-   bootmap_size);
-   if (bootmap_valid)
-   mapstart = PHYS_PFN(mapstart_addr);
-   break;
-   default:
-   break;
-   }
-   }
-
-   if (!bootmap_valid)
-   panic("No memory area to place a bootmap bitmap");
-
-   /*
-* Initialize the boot-time allocator with low memory only.
+* Initialize the boot-time allocator with low/high memory, but
+* set the allocation limit to low memory only
 */
-   if (bootmap_size != init_bootmem_node(NODE_DATA(0), mapstart,
-min_low_pfn, max_low_pfn))
-   panic("Unexpected memory size required for bootmap");
-
for (i = 0; i < boot_mem_map.nr_map; i++

[PATCH 02/14] MIPS: memblock: Surely map BSS kernel memory section

2018-01-17 Thread Serge Semin

The current MIPS code makes sure the kernel code/data/init
sections are in the maps, but BSS should also be there.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 76e9e2075..0d21c9e04 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
 PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
 BOOT_MEM_INIT_RAM);
+   arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+BOOT_MEM_RAM);
 
pr_info("Determined physical RAM map:\n");
print_memory_map();
-- 
2.12.0
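The hunk above rounds the BSS range outward (PFN_DOWN() on __bss_start, PFN_UP() on __bss_stop), whereas the init section is rounded inward (PFN_UP() on __init_begin, PFN_DOWN() on __init_end). A minimal userspace sketch of the difference, assuming 4 KiB pages; PFN_DOWN()/PFN_UP() here are local stand-ins for the kernel helpers, and round_out(), round_in() and the sample addresses are hypothetical.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages (MIPS kernels may use 16/64 KiB). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Local stand-ins mirroring the kernel's PFN_DOWN()/PFN_UP() helpers. */
#define PFN_DOWN(x) ((unsigned long)(x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((unsigned long)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

struct region {
	unsigned long start;
	unsigned long end; /* exclusive */
};

/* Round outward, as done for BSS: every byte of [start, end) is covered. */
static struct region round_out(unsigned long start, unsigned long end)
{
	struct region r = { PFN_DOWN(start) << PAGE_SHIFT,
			    PFN_UP(end) << PAGE_SHIFT };
	return r;
}

/* Round inward, as done for the init section: only whole pages inside. */
static struct region round_in(unsigned long start, unsigned long end)
{
	struct region r = { PFN_UP(start) << PAGE_SHIFT,
			    PFN_DOWN(end) << PAGE_SHIFT };
	return r;
}
```

With unaligned boundaries 0x80501234..0x80523456, outward rounding yields 0x80501000..0x80524000, while inward rounding yields 0x80502000..0x80523000. Only the outward form guarantees that the first and last partial pages of BSS end up in the map.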

[PATCH 03/14] MIPS: memblock: Reserve initrd memory in memblock

2018-01-17 Thread Serge Semin

There is no reserve_bootmem() method in the nobootmem interface,
so we need to replace it with the memblock-specific one.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0d21c9e04..1b8246e6c 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -330,7 +330,7 @@ static void __init finalize_initrd(void)
 
maybe_bswap_initrd();
 
-   reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
+   memblock_reserve(__pa(initrd_start), size);
initrd_below_start_ok = 1;
 
pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
-- 
2.12.0

[PATCH 2/2] mips: mm: Discard ioremap_uncached_accelerated() method

2018-07-09 Thread Serge Semin

The adaptive ioremap_wc() method is now available (see the "mips: mm:
Create UCA-based ioremap_wc() method" commit). We can use it for
UCA-featured MMIO transactions in the kernel, so we no longer need
its platform-specific clone ioremap_uncached_accelerated() to be
declared. Since it is also unused anywhere in the kernel code, let's
remove it from the arch-specific io.h header.

Signed-off-by: Serge Semin 
Singed-off-by: Paul Burton 
Cc: James Hogan 
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: sta...@vger.kernel.org
---
 arch/mips/include/asm/io.h | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index babe5155a..360b7ddeb 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -301,15 +301,11 @@ static inline void __iomem * __ioremap_mode(phys_addr_t offset, unsigned long si
__ioremap_mode((offset), (size), boot_cpu_data.writecombine)
 
 /*
- * These two are MIPS specific ioremap variant. ioremap_cacheable_cow
- * requests a cachable mapping, ioremap_uncached_accelerated requests a
- * mapping using the uncached accelerated mode which isn't supported on
- * all processors.
+ * This is a MIPS specific ioremap variant. ioremap_cacheable_cow
+ * requests a cachable mapping with CWB attribute enabled.
  */
 #define ioremap_cacheable_cow(offset, size)\
__ioremap_mode((offset), (size), _CACHE_CACHABLE_COW)
-#define ioremap_uncached_accelerated(offset, size) \
-   __ioremap_mode((offset), (size), _CACHE_UNCACHED_ACCELERATED)
 
 static inline void iounmap(const volatile void __iomem *addr)
 {
-- 
2.12.0

[PATCH 1/2] mips: mm: Create UCA-based ioremap_wc() method

2018-07-09 Thread Serge Semin

Modern MIPS cores (like P5600/6600, M5150/6520, and so on) which
have an on-chip L2 cache can also enable a special Cache-Coherency
Attribute (CCA) named UnCached Accelerated (UCA). Uncached
accelerated accesses are treated the same way as non-accelerated
uncached accesses, except that uncached stores are gathered
together for more efficient bus utilization. In other words, this
CCA lets uncached transactions make better use of the bus bandwidth
via burst transactions.

This is exactly why the ioremap_wc() method was introduced in Linux.
Alas, the MIPS platform code hasn't implemented it so far; instead the
default one has been used, which is an alias to ioremap_nocache. To
fix this we add a MIPS-specific ioremap_wc() macro that expands to a
generic __ioremap_mode() call with the writecombine CPU-info field
passed. It creates a real write-combining mapping if the CPU cache
supports the UCA feature and falls back to the _CACHE_UNCACHED
attribute if it doesn't. Additionally, the platform-specific io.h
declares the ARCH_HAS_IOREMAP_WC macro to indicate the architectural
definition of ioremap_wc() (similar to x86/powerpc).

Signed-off-by: Serge Semin 
Singed-off-by: Paul Burton 
Cc: James Hogan 
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: sta...@vger.kernel.org
---
 arch/mips/include/asm/io.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 4d709b61d..d4f8cdc58 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -12,6 +12,8 @@
 #ifndef _ASM_IO_H
 #define _ASM_IO_H
 
+#define ARCH_HAS_IOREMAP_WC
+
 #include 
 #include 
 #include 
@@ -278,6 +280,27 @@ static inline void __iomem * __ioremap_mode(phys_addr_t offset, unsigned long si
 #define ioremap_cache ioremap_cachable
 
 /*
+ * ioremap_wc -   map bus memory into CPU space
+ * @offset:bus address of the memory
+ * @size:  size of the resource to map
+ *
+ * ioremap_wc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked uncachable
+ * but accelerated by means of write-combining feature. It is specifically
+ * useful for PCIe prefetchable windows, which may vastly improve a
+ * communications performance. If it was determined on boot stage, what
+ * CPU CCA doesn't support UCA, the method shall fall-back to the
+ * _CACHE_UNCACHED option (see cpu_probe() method).
+ */
+#define ioremap_wc(offset, size)   \
+   __ioremap_mode((offset), (size), boot_cpu_data.writecombine)
+
+/*
  * These two are MIPS specific ioremap variant. ioremap_cacheable_cow
  * requests a cachable mapping, ioremap_uncached_accelerated requests a
  * mapping using the uncached accelerated mode which isn't supported on
-- 
2.12.0
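The boot-time selection described in this patch (use UCA when the CPU supports it, otherwise fall back to _CACHE_UNCACHED) can be modelled in userspace roughly as follows. This is a sketch, not the kernel code: struct cpu_info_sketch, probe_writecombine() and ioremap_wc_attr() are hypothetical, and the CCA encodings 2 (uncached) and 7 (uncached accelerated) are illustrative assumptions rather than the kernel's shifted _CACHE_* macro values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative CCA encodings (assumed); the kernel shifts its own
 * _CACHE_* values into the PTE layout. */
enum cca {
	CCA_UNCACHED             = 2,
	CCA_UNCACHED_ACCELERATED = 7,
};

/* Hypothetical stand-in for the writecombine field of the CPU info. */
struct cpu_info_sketch {
	bool has_uca;          /* would be detected while probing the CPU */
	enum cca writecombine; /* attribute ioremap_wc() ends up using */
};

/* Decide once, at "boot", which attribute write-combining maps to. */
static void probe_writecombine(struct cpu_info_sketch *c)
{
	c->writecombine = c->has_uca ? CCA_UNCACHED_ACCELERATED : CCA_UNCACHED;
}

/* Like the ioremap_wc() macro above: just forward the chosen attribute. */
static enum cca ioremap_wc_attr(const struct cpu_info_sketch *c)
{
	return c->writecombine;
}
```

The design keeps ioremap_wc() itself trivial: the capability check happens once during CPU probing, and every later mapping simply reuses boot_cpu_data.writecombine.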

Re: [PATCH 2/2] mips: mm: Discard ioremap_uncached_accelerated() method

2018-07-10 Thread Serge Semin

On Tue, Jul 10, 2018 at 09:15:17AM +0200, Mathieu Malaterre wrote:
> '
> On Mon, Jul 9, 2018 at 3:57 PM Serge Semin  wrote:
> >
> > Adaptive ioremap_wc() method is now available (see "mips: mm:
> > Create UCA-based ioremap_wc() method" commit). We can use it for
> > UCA-featured MMIO transactions in the kernel, so we don't need
> > it platform clone ioremap_uncached_accelerated() being declard.
> > Seeing it is also unused anywhere in the kernel code, lets remove
> > it from io.h arch-specific header then.
> >
> > Signed-off-by: Serge Semin 
> > Singed-off-by: Paul Burton 
> 
> nit: 'Signed' (on both patches)
> 

Good catch! Thanks. Didn't notice the typo. Should have copy-pasted
both the signature and the e-mail from another letter.

I'll fix it if there will be a second version of the patchset. Otherwise
I suppose it would be easier for the integrator to do this.

Regards,
-Sergey

> > Cc: James Hogan 
> > Cc: Ralf Baechle 
> > Cc: linux-m...@linux-mips.org
> > Cc: sta...@vger.kernel.org
> > ---
> >  arch/mips/include/asm/io.h | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
> > index babe5155a..360b7ddeb 100644
> > --- a/arch/mips/include/asm/io.h
> > +++ b/arch/mips/include/asm/io.h
> > @@ -301,15 +301,11 @@ static inline void __iomem * __ioremap_mode(phys_addr_t offset, unsigned long si
> > __ioremap_mode((offset), (size), boot_cpu_data.writecombine)
> >
> >  /*
> > - * These two are MIPS specific ioremap variant. ioremap_cacheable_cow
> > - * requests a cachable mapping, ioremap_uncached_accelerated requests a
> > - * mapping using the uncached accelerated mode which isn't supported on
> > - * all processors.
> > + * This is a MIPS specific ioremap variant. ioremap_cacheable_cow
> > + * requests a cachable mapping with CWB attribute enabled.
> >   */
> >  #define ioremap_cacheable_cow(offset, size)\
> > __ioremap_mode((offset), (size), _CACHE_CACHABLE_COW)
> > -#define ioremap_uncached_accelerated(offset, size) \
> > -   __ioremap_mode((offset), (size), _CACHE_UNCACHED_ACCELERATED)
> >
> >  static inline void iounmap(const volatile void __iomem *addr)
> >  {
> > --
> > 2.12.0
> >
> >

Re: [PATCH 2/2] mips: mm: Discard ioremap_uncached_accelerated() method

2018-07-10 Thread Serge Semin

On Tue, Jul 10, 2018 at 10:59:40AM -0700, Paul Burton wrote:
Hello Paul,

> Hi Sergey,
> 
> On Tue, Jul 10, 2018 at 10:48:15AM +0300, Serge Semin wrote:
> > On Tue, Jul 10, 2018 at 09:15:17AM +0200, Mathieu Malaterre wrote:
> > > On Mon, Jul 9, 2018 at 3:57 PM Serge Semin wrote:
> > > > Adaptive ioremap_wc() method is now available (see "mips: mm:
> > > > Create UCA-based ioremap_wc() method" commit). We can use it for
> > > > UCA-featured MMIO transactions in the kernel, so we don't need
> > > > it platform clone ioremap_uncached_accelerated() being declard.
> > > > Seeing it is also unused anywhere in the kernel code, lets remove
> > > > it from io.h arch-specific header then.
> > > >
> > > > Signed-off-by: Serge Semin 
> > > > Singed-off-by: Paul Burton 
> > > 
> > > nit: 'Signed' (on both patches)
> > 
> > Good catch! Thanks. Didn't notice the typo. Should have copy-pasted
> > both the signature and the e-mail from another letter.
> > 
> > I'll fix it if there will be a second version of the patchset. Otherwise
> > I suppose it would be easier for the integrator to do this.
> 
> I've fixed this up & applied these 2 patches with minor tweaks to
> mips-next for 4.19.
> 

Great! Thanks.

> However FYI for next time - you shouldn't really add someone else's
> Signed-off-by tag anyway. The tag effectively states that a person can
> agree to the Developer's Certificate of Origin for this patch (see
> Documentation/process/submitting-patches.rst), and you can't agree that
> on behalf of someone else. Generally a maintainer should add this tag
> for themselves when they apply a patch.
> 

I'm sorry if it seemed like I added Signed-off on your behalf. I thought
the Signed-off also concerns the ones, who participated in the patch
preparation. Since you suggested the design of the change, I've decided
to put your name in the Signed-off tag. What shall I use in this way
then?

> Anyway, I think we should reserve the Singed-off-by tag for patches that
> quell fires. ;)
> 
> Thanks,
> Paul

Regards,
-Sergey

Re: [PATCH 2/2] mips: mm: Discard ioremap_uncached_accelerated() method

2018-07-11 Thread Serge Semin

Paul,

On Tue, Jul 10, 2018 at 02:04:15PM -0700, Paul Burton wrote:
> Hi Serge,
> 
> On Tue, Jul 10, 2018 at 10:13:54PM +0300, Serge Semin wrote:
> > On Tue, Jul 10, 2018 at 10:59:40AM -0700, Paul Burton wrote:
> > > However FYI for next time - you shouldn't really add someone else's
> > > Signed-off-by tag anyway. The tag effectively states that a person can
> > > agree to the Developer's Certificate of Origin for this patch (see
> > > Documentation/process/submitting-patches.rst), and you can't agree that
> > > on behalf of someone else. Generally a maintainer should add this tag
> > > for themselves when they apply a patch.
> > 
> > I'm sorry if it seemed like I added Signed-off on your behalf.
> 
> That's OK, I didn't think you did it maliciously :)
> 
> > I thought the Signed-off also concerns the ones, who participated in
> > the patch preparation. Since you suggested the design of the change,
> > I've decided to put your name in the Signed-off tag. What shall I use
> > in this way then?
> 
> In this case Suggested-by might have been a good choice. Reported-by is
> also commonly used if someone reported a problem which you created a fix
> for.
> 
> Section 13 of Documentation/process/submitting-patches.rst describes
> these tags along with a couple others.

I always thought of these tags as more of a formality. In fact, this
isn't my first patchset sent to the kernel mailing list, although none
of the previous ones involved anyone else participating in the
development of the changes, except the reviewers of course. So I am
aware of all the tags mentioned in the doc, but as it turns out I didn't
fully understand their meaning. Main rule: most of the tags should not be
added without permission, except the more or less formal Cc and Fixes ones.
Anyway, thanks for the advice. Next time I'll be more careful with it.

Regards,
-Sergey

> 
> Thanks,
> Paul

Re: [PATCH 6/9] NTB: ntb_hw_idt: fix typo 'can by' to 'can be'

2018-05-11 Thread Serge Semin

On Sun, May 06, 2018 at 01:23:50PM +0200, Wolfram Sang wrote:
> Signed-off-by: Wolfram Sang 
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 8d98872d0983b7..dbe72f116017ab 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -1401,7 +1401,7 @@ static int idt_ntb_peer_mw_clear_trans(struct ntb_dev *ntb, int pidx,
>   *  5. Doorbell operations
>   *
>   *Doorbell functionality of IDT PCIe-switches is pretty unusual. First of
> - * all there is global doorbell register which state can by changed by any
> + * all there is global doorbell register which state can be changed by any
>   * NT-function of the IDT device in accordance with global permissions. These
>   * permissions configs are not supported by NTB API, so it must be done by
>   * either BIOS or EEPROM settings. In the same way the state of the global

Acked-by: Serge Semin 

> -- 
> 2.11.0
>

Re: [PATCH 0/8] Fix breakage caused by the NTB multi-port patchset

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:10PM -0600, Logan Gunthorpe wrote:

Good day, Logan.
Thanks for the patchset you submitted. My hopefully useful comments are
under the corresponding patches.

Regards,
-Sergey

> Hey,
> 
> Here are all the fixes required to get ntb_test on switchtec working
> again after the multi-port test patches were merged.
> 
> I'd appreciate it if future changes can be a) more careful about
> not breaking things, b) communicated more clearly so that better
> review can be done, and c) not merged until sufficient review actually
> is done.
> 
> Note, I sent the first patch in this series earlier; please disregard
> the earlier one.
> 
> Thanks,
> 
> Logan
> 
> Logan Gunthorpe (8):
>   NTB: ntb_tool: reading the link file should not end in a NULL byte
>   NTB: Setup the DMA mask globally for all drivers
>   NTB: Fix the default port and peer numbers for legacy drivers
>   NTB: ntb_pingpong: Choose doorbells based on port number
>   NTB: perf: Don't require one more memory window than number of peers
>   NTB: perf: Fix support for hardware that doesn't have port numbers
>   NTB: perf: Fix race condition when run with ntb_test
>   NTB: ntb_test: Fix bug when counting remote files
> 
>  drivers/ntb/hw/amd/ntb_hw_amd.c |  4 
>  drivers/ntb/hw/idt/ntb_hw_idt.c |  6 --
>  drivers/ntb/hw/intel/ntb_hw_intel.c |  4 
>  drivers/ntb/ntb.c   | 22 ++
>  drivers/ntb/test/ntb_perf.c | 22 +++---
>  drivers/ntb/test/ntb_pingpong.c | 14 ++
>  drivers/ntb/test/ntb_tool.c |  3 +--
>  tools/testing/selftests/ntb/ntb_test.sh |  2 +-
>  8 files changed, 41 insertions(+), 36 deletions(-)
> 
> --
> 2.11.0
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to linux-ntb+unsubscr...@googlegroups.com.
> To post to this group, send email to linux-...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-1-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

Re: [PATCH 1/8] NTB: ntb_tool: reading the link file should not end in a NULL byte

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:11PM -0600, Logan Gunthorpe wrote:
> When running ntb_test this warning is issued:
> 
> ./ntb_test.sh: line 200: warning: command substitution: ignored null
> byte in input
> 

This is weird. Neither I nor the folks who tested the script saw this warning.
I tried it on my laptop with bash and on a target device with busybox-shell. The
warning never occurred. I even tried a simple command like:
[[ $(echo -ne "\x4e\x0a\00") == "N" ]] && echo "True"

It might be that your bash is more modern than mine. Anyway, if this patch solves
the problem you see, that's great. Thanks for it.

-Sergey

> This is caused by the kernel returning one more byte than is necessary
> when reading the link file.
> 
> Reduce the number of bytes read back to 2 as it was before the
> commit that regressed this.
> 
> Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/ntb/test/ntb_tool.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
> index d592c0ffbd19..ec5cf095cdb9 100644
> --- a/drivers/ntb/test/ntb_tool.c
> +++ b/drivers/ntb/test/ntb_tool.c
> @@ -504,7 +504,7 @@ static ssize_t tool_peer_link_read(struct file *filep, char __user *ubuf,
>   buf[1] = '\n';
>   buf[2] = '\0';
>  
> - return simple_read_from_buffer(ubuf, size, offp, buf, 3);
> + return simple_read_from_buffer(ubuf, size, offp, buf, 2);
>  }
>  
>  static TOOL_FOPS_RDWR(tool_peer_link_fops,
> @@ -1690,4 +1690,3 @@ static void __exit tool_exit(void)
>   debugfs_remove_recursive(tool_dbgfs_topdir);
>  }
>  module_exit(tool_exit);
> -
> -- 
> 2.11.0
>
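The extra byte is easiest to see by modelling simple_read_from_buffer() in userspace: with an available length of 3, the trailing '\0' of the buffer is copied out to the reader, and that is what bash then warns about. The sketch mirrors the kernel helper's behaviour (copy at most `available - *ppos` bytes and advance the offset) with memcpy() in place of copy_to_user(). read_from_buffer() and fill_link_buf() are hypothetical names, and the 'Y'/'N' first byte is an assumption about how the link state is encoded.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Userspace sketch of simple_read_from_buffer(): memcpy() replaces
 * copy_to_user(), so a short copy cannot happen here. */
static ssize_t read_from_buffer(char *to, size_t count, long *ppos,
				const char *from, size_t available)
{
	long pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || !count)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long)count;
	return (ssize_t)count;
}

/* Hypothetical helper building the link file contents: state, '\n', '\0'. */
static void fill_link_buf(char buf[3], int up)
{
	buf[0] = up ? 'Y' : 'N';
	buf[1] = '\n';
	buf[2] = '\0';
}
```

Passing 3 as the available length returns "Y\n\0" to the reader; passing 2 returns just "Y\n", and the next read hits EOF, which is what a shell command substitution expects.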

Re: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:13PM -0600, Logan Gunthorpe wrote:
> When the commit adding ntb_default_port_number() and
> ntb_default_peer_port_number() entered the kernel there were no
> users of them, so it was impossible to tell what the API needed.
> 
> When a user finally landed a year later (ntb_pingpong), more NTB
> topologies had been created and no consideration was given
> to how other drivers had changed.
> 
> Now that there is a user it can be fixed to provide a sensible default
> for the legacy drivers that do not implement ntb_{peer_}port_number().
> Since ntb_pingpong doesn't check error codes, returning -EINVAL was also
> not sensible.
> 
> Patches for ntb_pingpong and ntb_perf follow (which are broken
> otherwise) to support hardware that doesn't have port numbers. This is
> important not only to not break support with existing drivers but for
> the cross link topology which, due to its perfect symmetry, cannot
> assign unique port numbers to each side.
> 
> Fixes: 1e5301196a88 ("NTB: Add indexed ports NTB API")
> Signed-off-by: Logan Gunthorpe 

The port-index interface was newly introduced as part of the multi-port NTB
API. The main idea was to address local/peer domains within one NTB device,
since there can now be more than one peer domain to send messages to or to set
up MWs with. For this we invented a two-space interface which maps the (in
general non-linear) port-number space to the locally linear port-index space,
and vice versa. That mapping is implemented by the new callbacks
ntb_port*()/ntb_peer_port*().

Even though it perfectly fitted the IDT NTB functions, the Intel/AMD devices
didn't have explicit port numbering. Instead we decided to assign the numbers
based on the topology type: the Primary and B2B US sides get port
NTB_PORT_PRI_USD, while the Secondary and B2B DS sides get port
NTB_PORT_SEC_DSD. To make this the default for all pure two-port devices like
Intel/AMD, the new methods ntb_default_port_number() and
ntb_default_peer_port_number() were developed and used in the
ntb_port*()/ntb_peer_port*() API functions (see the ntb.h header file).

So, in the current implementation, the main purpose of the default methods is
to assign a unique port number to the NTB devices based on the topology.
Please note that it is essential for the NTB API to have each port uniquely
enumerated within one device. This is how the multi-port NTB API was designed
in the first place, and it was the reason we altered the Intel/AMD and IDT
drivers about two years ago.

Based on this I redeveloped the ntb_tool/ntb_perf/ntb_pingpong drivers.
Needless to say, I was sure all the NTB devices followed the API convention
regarding port numbers. Since the Switchtec driver doesn't provide explicit
port-index API callbacks, the NTB API internals use the default methods,
which, as you can see, know nothing about the SWITCH and CROSSLINK topologies.
That's why the methods return -EINVAL, and the test drivers don't work
properly.
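The topology-based defaults described above can be sketched in plain C as follows. The enumerator names follow the quoted ntb.c hunk and ntb.h, but the enums are redeclared locally for illustration, so their numeric values are assumptions. default_port_number() and default_peer_port_number() are simplified, hypothetical stand-ins that take a topology instead of a struct ntb_dev. Topologies without a two-port default, such as SWITCH and CROSSLINK, fall through to -EINVAL, which is the behaviour the patch under review changes to 0.

```c
#include <assert.h>
#include <errno.h>

/* Locally redeclared topology subset; numeric values are illustrative. */
enum ntb_topo {
	NTB_TOPO_NONE = -1,
	NTB_TOPO_PRI,
	NTB_TOPO_SEC,
	NTB_TOPO_B2B_USD,
	NTB_TOPO_B2B_DSD,
	NTB_TOPO_SWITCH,
	NTB_TOPO_CROSSLINK,
};

enum ntb_default_port {
	NTB_PORT_PRI_USD,
	NTB_PORT_SEC_DSD,
};

/* Local port: Primary/B2B-USD side vs Secondary/B2B-DSD side. */
static int default_port_number(enum ntb_topo topo)
{
	switch (topo) {
	case NTB_TOPO_PRI:
	case NTB_TOPO_B2B_USD:
		return NTB_PORT_PRI_USD;
	case NTB_TOPO_SEC:
	case NTB_TOPO_B2B_DSD:
		return NTB_PORT_SEC_DSD;
	default:
		return -EINVAL; /* e.g. SWITCH, CROSSLINK: no two-port default */
	}
}

/* The single peer always sits on the opposite side. */
static int default_peer_port_number(enum ntb_topo topo)
{
	switch (topo) {
	case NTB_TOPO_PRI:
	case NTB_TOPO_B2B_USD:
		return NTB_PORT_SEC_DSD;
	case NTB_TOPO_SEC:
	case NTB_TOPO_B2B_DSD:
		return NTB_PORT_PRI_USD;
	default:
		return -EINVAL;
	}
}
```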

Concerning the discovered issues and the fixes introduced by this patchset,
I'd suggest adding the port-index callbacks to the Switchtec driver to identify
the local and peer ports. After this, the current versions of all the test
drivers should work properly.

As far as I can see, the PFX family switch documentation operates with
definitions like Ports/Partitions (similar to the IDT switches), as does the
switchtec management driver. That might hint at switch functionality which
could be used to derive something similar to port numbering.

Regards,
-Sergey

> ---
>  drivers/ntb/ntb.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
> index 93f24440d11d..d955a92a095a 100644
> --- a/drivers/ntb/ntb.c
> +++ b/drivers/ntb/ntb.c
> @@ -225,10 +225,8 @@ int ntb_default_port_number(struct ntb_dev *ntb)
>   case NTB_TOPO_B2B_DSD:
>   return NTB_PORT_SEC_DSD;
>   default:
> - break;
> + return 0;
>   }
> -
> - return -EINVAL;
>  }
>  EXPORT_SYMBOL(ntb_default_port_number);
>  
> @@ -251,10 +249,8 @@ int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
>   case NTB_TOPO_B2B_DSD:
>   return NTB_PORT_PRI_USD;
>   default:
> - break;
> + return 0;
>   }
> -
> - return -EINVAL;
>  }
>  EXPORT_SYMBOL(ntb_default_peer_port_number);
>  
> @@ -326,4 +322,3 @@ static void __exit ntb_driver_exit(void)
>   bus_unregister(&ntb_bus);
>  }
>  module_exit(ntb_driver_exit);
> -
> -- 
> 2.11.0
> 

Re: [PATCH 4/8] NTB: ntb_pingpong: Choose doorbells based on port number

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:14PM -0600, Logan Gunthorpe wrote:
> This commit fixes pingpong support for existing drivers that do not
> implement ntb_default_port_number() and ntb_default_peer_port_number().
> This is required for hardware (like the crosslink topology of
> switchtec) which cannot assign reasonable port numbers to each port due
> to its perfect symmetry.
> 
> Instead of picking the doorbell to use based on the index of the
> peer, we use the peer's port number. This is a bit clearer and easier
> to understand.
> 

Thanks for the patch. This was the original version of the ping-pong driver
I was going to submit, but I decided to develop it a bit differently. Here
is why.

My goal was to create a multi-port version of the ping-pong test.
The idea of the new driver was to implement a cyclic port-to-port
ping-pong algorithm. Simply speaking, each port selects two partner ports:
one is used as the source of pings and the other as the target of pongs,
which are sent after the defined delay.

Since IDT has a global Doorbell register, which is shared between all
the ports, I had to assign a unique doorbell bit to each port. I created a
simple algorithm which linearises the (in general non-linear) port numbers.
Then I used the globally unique port index to select the corresponding
doorbell bit. The pp_init_flds() method implements this algorithm, while
pp_find_next_peer() selects the next port to convey the pong to.

Regarding the patch: the idea of using the port number instead of the
linearised unique index should also work for the Intel/AMD/IDT drivers. But
the port-space linearisation algorithm was created for the case where the
real port numbers exceed the available Doorbell bits. I thought this might be
the case for a multi-port version of the switchtec driver.
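The linearisation described above can be sketched as follows, assuming peer indices are ordered by ascending port number; the helper names are hypothetical and this is not the ntb_pingpong code itself. The local port's global index is the number of peers with a lower port number, and peers above the local port shift up by one, which matches the BIT_ULL(pidx) / BIT_ULL(pidx + 1) choice that the quoted patch removes.

```c
#include <assert.h>

#define BIT_ULL(n) (1ULL << (n))

/*
 * Global index of the local port = number of peer ports below it.
 * Assumes peer_ports[] is sorted in ascending order.
 */
static int local_global_index(int lport, const int *peer_ports, int pcnt)
{
	int pidx;

	for (pidx = 0; pidx < pcnt; pidx++)
		if (lport < peer_ports[pidx])
			break;
	return pidx;
}

/*
 * Peers below the local port keep their index; peers above it are
 * shifted by one to leave room for the local port's own doorbell bit.
 */
static int peer_global_index(int pidx, int local_gidx)
{
	return pidx < local_gidx ? pidx : pidx + 1;
}
```

For example, with hypothetical IDT-like port numbers 0, 2, 8 and 12 and local port 8, the local port gets bit 2 and the peers get bits 0, 1 and 3, so only four doorbell bits are used, whereas BIT_ULL(port) would need bit 12.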

Needless to say, if the Switchtec driver implemented the port-index API,
this patch wouldn't be needed.

Regards,
-Sergey

> Fixes: c7aeb0afdcc2 ("NTB: ntb_pp: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/ntb/test/ntb_pingpong.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/ntb/test/ntb_pingpong.c b/drivers/ntb/test/ntb_pingpong.c
> index 65865e460ab8..18d00eec7b02 100644
> --- a/drivers/ntb/test/ntb_pingpong.c
> +++ b/drivers/ntb/test/ntb_pingpong.c
> @@ -121,15 +121,14 @@ static int pp_find_next_peer(struct pp_ctx *pp)
>   link = ntb_link_is_up(pp->ntb, NULL, NULL);
>  
>   /* Find next available peer */
> - if (link & pp->nmask) {
> + if (link & pp->nmask)
>   pidx = __ffs64(link & pp->nmask);
> - out_db = BIT_ULL(pidx + 1);
> - } else if (link & pp->pmask) {
> + else if (link & pp->pmask)
>   pidx = __ffs64(link & pp->pmask);
> - out_db = BIT_ULL(pidx);
> - } else {
> + else
>   return -ENODEV;
> - }
> +
> + out_db = BIT_ULL(ntb_peer_port_number(pp->ntb, pidx));
>  
>   spin_lock(&pp->lock);
>   pp->out_pidx = pidx;
> @@ -303,7 +302,7 @@ static void pp_init_flds(struct pp_ctx *pp)
>   break;
>   }
>  
> - pp->in_db = BIT_ULL(pidx);
> + pp->in_db = BIT_ULL(lport);
>   pp->pmask = GENMASK_ULL(pidx, 0) >> 1;
>   pp->nmask = GENMASK_ULL(pcnt - 1, pidx);
>  
> @@ -435,4 +434,3 @@ static void __exit pp_exit(void)
>   debugfs_remove_recursive(pp_dbgfs_topdir);
>  }
>  module_exit(pp_exit);
> -
> -- 
> 2.11.0
> 

Re: [PATCH 5/8] NTB: perf: Don't require one more memory window than number of peers

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:15PM -0600, Logan Gunthorpe wrote:
> ntb_perf should not require more than one memory window per peer. This
> was probably an off-by-one error.
> 

Good catch. Thanks. IDT has a lot of MWs, especially if LookUp Tables are
enabled. That's why I didn't notice the effect of this error.

Regards,
-Sergey

> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/ntb/test/ntb_perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 2a9d6b0d1f19..fe27412ffe91 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -655,7 +655,7 @@ static int perf_init_service(struct perf_ctx *perf)
>  {
>   u64 mask;
>  
> - if (ntb_peer_mw_count(perf->ntb) < perf->pcnt + 1) {
> + if (ntb_peer_mw_count(perf->ntb) < perf->pcnt) {
>   dev_err(&perf->ntb->dev, "Not enough memory windows\n");
>   return -EINVAL;
>   }
> -- 
> 2.11.0
> 

Re: [PATCH 6/8] NTB: perf: Fix support for hardware that doesn't have port numbers

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:17PM -0600, Logan Gunthorpe wrote:
> Legacy drivers do not have port numbers (but reliably have only two ports)
> and were broken by the recent commit that added multi-port support to
> ntb_perf. This is especially important to support the cross link
> topology which is perfectly symmetric and cannot assign unique port
> numbers easily.
> 

Please see my comment on patch 3/8. I explained everything there,
including the fact that the Intel/AMD drivers do have unique port numbers
assigned.

Regards,
-Sergey

> Hardware that returns zero for both the local port and the peer should
> just always use gidx=0 for the only peer.
> 
> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/ntb/test/ntb_perf.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index fe27412ffe91..6285cb8515ac 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -1417,6 +1417,16 @@ static int perf_init_peers(struct perf_ctx *perf)
>   if (perf->gidx == -1)
>   perf->gidx = pidx;
>  
> + /*
> +  * Hardware with only two ports may not have unique port
> +  * numbers. In this case, the gidxs should all be zero.
> +  */
> + if (perf->pcnt == 1 &&  ntb_port_number(perf->ntb) == 0 &&
> + ntb_peer_port_number(perf->ntb, 0) == 0) {
> + perf->gidx = 0;
> + perf->peers[0].gidx = 0;
> + }
> +
>   for (pidx = 0; pidx < perf->pcnt; pidx++) {
>   ret = perf_setup_peer_mw(&perf->peers[pidx]);
>   if (ret)
> -- 
> 2.11.0
> 

Re: [PATCH 7/8] NTB: perf: Fix race condition when run with ntb_test

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:18PM -0600, Logan Gunthorpe wrote:
> When running ntb_test, the script tries to run the ntb_perf test
> immediately after probing the modules. Since adding multi-port support,
> this fails because the new initialization procedure in ntb_perf
> cannot complete instantly.
> 
> To fix this we add a completion which is waited on when a test is
> started. In this way, run can be written any time after the module is
> loaded and it will wait for the initialization to complete instead of
> sending an error.
> 

Hmm, this behavior is a feature of the driver, not a bug or a race to be
fixed. The ntb_perf driver returns -ENOLINK until the link is actually
established and the memory windows are properly initialized, so that the
test can be performed. What do you think of leaving the algorithm as is and
instead implementing a polling scheme in the ntb_test.sh script that aborts
the script if the link isn't established after some time? At least we won't
wait forever in case the peer hangs up or crashes while the NTB link
negotiation is in progress.

Regards,
-Sergey

> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe 
> ---
>  drivers/ntb/test/ntb_perf.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 6285cb8515ac..8e2b7630ecc9 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -158,6 +158,8 @@ struct perf_peer {
>   /* NTB connection setup service */
>   struct work_struct  service;
>   unsigned long   sts;
> +
> + struct completion init_comp;
>  };
>  #define to_peer_service(__work) \
>   container_of(__work, struct perf_peer, service)
> @@ -549,6 +551,7 @@ static int perf_setup_outbuf(struct perf_peer *peer)
>  
>   /* Initialization is finally done */
>   set_bit(PERF_STS_DONE, &peer->sts);
> + complete_all(&peer->init_comp);
>  
>   return 0;
>  }
> @@ -639,6 +642,7 @@ static void perf_service_work(struct work_struct *work)
>   perf_setup_outbuf(peer);
>  
>   if (test_and_clear_bit(PERF_CMD_CLEAR, &peer->sts)) {
> + init_completion(&peer->init_comp);
>   clear_bit(PERF_STS_DONE, &peer->sts);
>   if (test_bit(0, &peer->perf->busy_flag) &&
>   peer == peer->perf->test_peer) {
> @@ -1046,8 +1050,9 @@ static int perf_submit_test(struct perf_peer *peer)
>   struct perf_thread *pthr;
>   int tidx, ret;
>  
> - if (!test_bit(PERF_STS_DONE, &peer->sts))
> - return -ENOLINK;
> + ret = wait_for_completion_interruptible(&peer->init_comp);
> + if (ret < 0)
> + return ret;
>  
>   if (test_and_set_bit_lock(0, &perf->busy_flag))
>   return -EBUSY;
> @@ -1413,6 +1418,7 @@ static int perf_init_peers(struct perf_ctx *perf)
>   peer->gidx = pidx;
>   }
>   INIT_WORK(&peer->service, perf_service_work);
> + init_completion(&peer->init_comp);
>   }
>   if (perf->gidx == -1)
>   perf->gidx = pidx;
> -- 
> 2.11.0
> 

Re: [PATCH 8/8] NTB: ntb_test: Fix bug when counting remote files

2018-06-15 Thread Serge Semin

On Fri, Jun 08, 2018 at 06:08:19PM -0600, Logan Gunthorpe wrote:
> When remote files are counted in get_files_count, without using SSH,
> the code returns 0 because there is a colon prepended to $LOC. $VPATH
> should have been used instead of $LOC.
> 

Good catch, thanks for the patch. I discovered this problem myself a few days
before you sent this patchset and was going to submit a fix, but you were
faster.

I also tested this script in a looped-back setup, i.e. when two NTB device
ports are available under the same Root Complex, so the NTB can be configured
from a single execution context. In this case REMOTE_HOST is left empty, so a
colon is still prepended to the corresponding paths, which causes multiple
errors, including the one fixed by this patch. To fix it, we need to drop the
colon in the remote-less case, for instance with the following patch:

@@ -482,7 +495,11 @@ function perf_test()
 function ntb_tool_tests()
 {
LOCAL_TOOL="$DEBUGFS/ntb_tool/$LOCAL_DEV"
-   REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
+   if [[ "${REMOTE_HOST}" != "" ]]; then
+   REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
+   else
+   REMOTE_TOOL="$DEBUGFS/ntb_tool/$REMOTE_DEV"
+   fi

echo "Starting ntb_tool tests..."

The same applies to REMOTE_PP and REMOTE_PERF. This is necessary for NTB devices
whose ports are looped back to the same Root Port. Would you be willing to resend
this patch together with the fix I suggested?

Regards,
-Sergey

> Fixes: 06bd0407d06c ("NTB: ntb_test: Update ntb_tool Scratchpad tests")
> Signed-off-by: Logan Gunthorpe 
> ---
>  tools/testing/selftests/ntb/ntb_test.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/ntb/ntb_test.sh b/tools/testing/selftests/ntb/ntb_test.sh
> index 08cbfbbc7029..17ca36403d04 100755
> --- a/tools/testing/selftests/ntb/ntb_test.sh
> +++ b/tools/testing/selftests/ntb/ntb_test.sh
> @@ -250,7 +250,7 @@ function get_files_count()
>   split_remote $LOC
>  
>   if [[ "$REMOTE" == "" ]]; then
> - echo $(ls -1 "$LOC"/${NAME}* 2>/dev/null | wc -l)
> + echo $(ls -1 "$VPATH"/${NAME}* 2>/dev/null | wc -l)
>   else
>   echo $(ssh "$REMOTE" "ls -1 \"$VPATH\"/${NAME}* | \
>  wc -l" 2> /dev/null)
> -- 
> 2.11.0
>

[PATCH v2 0/4] ntb: idt: Add hwmon temperature sensor interface

2018-07-17 Thread Serge Semin

IDT PCIe switches are equipped with an embedded temperature sensor. It
works within the range [0; 127.5]C with a resolution of 0.5C. It can
be used to monitor the chip core temperature to help prevent possible
overheating. This matters for the chip, since it runs very hot,
especially if ASPM isn't enabled.

Besides the currently sampled temperature, the sensor interface exposes
history registers with the lowest and highest measured temperatures, thresholds,
alarm IRQ enable/disable bits and ADC/filter settings. The device manual
states that the switch is able to generate an MSI interrupt on the PCIe upstream
port if the temperature crosses one of three configurable thresholds. In
practice, however, we discovered that the threshold IRQ enable/disable bits
interface is badly broken (see the third patch commit message), so it can't be
used to create the hwmon alarm interface. As a result we had to remove the
already available temperature sensor IRQ handler and disable the corresponding
interrupt.

The current version of the driver provides the following standard hwmon sysfs
files: input temperature, lowest and highest measured temperatures (with the
possibility to reset the history) and temperature offset. The rest of the
nodes can't be safely implemented for the chip due to the described issues.

Changelog v2:
- Add "select HWMON" to the NTB_IDT kconfig

Signed-off-by: Serge Semin 

Serge Semin (4):
  ntb: idt: Alter temperature read method
  ntb: idt: Add basic hwmon sysfs interface
  ntb: idt: Discard temperature sensor IRQ handler
  ntb: idt: Alter the driver info comments

 drivers/ntb/hw/idt/Kconfig  |   4 +-
 drivers/ntb/hw/idt/ntb_hw_idt.c | 317 ++--
 drivers/ntb/hw/idt/ntb_hw_idt.h |  87 ++-
 3 files changed, 353 insertions(+), 55 deletions(-)

-- 
2.12.0

[PATCH v2 2/4] ntb: idt: Add basic hwmon sysfs interface

2018-07-17 Thread Serge Semin

IDT PCIe switches provide an embedded temperature sensor working
within [0; 127.5]C with a resolution of 0.5C. They can also generate
a PCIe upstream interrupt if the temperature crosses the
specified thresholds. Since this threshold interface is badly broken,
the created hwmon sysfs interface exposes only the following set of hwmon
nodes: current input temperature, lowest and highest values measured,
history reset and value offset. The hwmon alarm interface isn't provided.

IDT PCIe switches also have ADC/filter settings for the sensor.
This driver doesn't expose them to the hwmon sysfs interface at the
moment, except for the offset node.

Signed-off-by: Serge Semin 
---

Changelog v2:
- Add "select HWMON" to the NTB_IDT kconfig

 drivers/ntb/hw/idt/Kconfig  |   1 +
 drivers/ntb/hw/idt/ntb_hw_idt.c | 182 
 drivers/ntb/hw/idt/ntb_hw_idt.h |  24 +-
 3 files changed, 206 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/hw/idt/Kconfig b/drivers/ntb/hw/idt/Kconfig
index b360e5613b9f..2ed147368fa8 100644
--- a/drivers/ntb/hw/idt/Kconfig
+++ b/drivers/ntb/hw/idt/Kconfig
@@ -1,6 +1,7 @@
 config NTB_IDT
tristate "IDT PCIe-switch Non-Transparent Bridge support"
depends on PCI
+   select HWMON
help
 This driver supports NTB of cappable IDT PCIe-switches.
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index c086ae5c601c..f12088c6a92d 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -49,11 +49,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "ntb_hw_idt.h"
@@ -1925,6 +1928,153 @@ static void idt_read_temp(struct idt_ntb_dev *ndev,
 }
 
 /*
+ * idt_write_temp() - write temperature to the chip sensor register
+ * @ntb:   NTB device context.
+ * @type:  IN - type of the temperature value to change
+ * @val:   IN - integer value of temperature in millidegree Celsius
+ */
+static void idt_write_temp(struct idt_ntb_dev *ndev,
+  const enum idt_temp_val type, const long val)
+{
+   unsigned int reg;
+   u32 data;
+   u8 fmt;
+
+   /* Retrieve the properly formatted temperature value */
+   fmt = idt_temp_get_fmt(val);
+
+   mutex_lock(&ndev->hwmon_mtx);
+   switch (type) {
+   case IDT_TEMP_LOW:
+   reg = IDT_SW_TMPALARM;
+   data = SET_FIELD(TMPALARM_LTEMP, idt_sw_read(ndev, reg), fmt) &
+   ~IDT_TMPALARM_IRQ_MASK;
+   break;
+   case IDT_TEMP_HIGH:
+   reg = IDT_SW_TMPALARM;
+   data = SET_FIELD(TMPALARM_HTEMP, idt_sw_read(ndev, reg), fmt) &
+   ~IDT_TMPALARM_IRQ_MASK;
+   break;
+   case IDT_TEMP_OFFSET:
+   reg = IDT_SW_TMPADJ;
+   data = SET_FIELD(TMPADJ_OFFSET, idt_sw_read(ndev, reg), fmt);
+   break;
+   default:
+   goto inval_spin_unlock;
+   }
+
+   idt_sw_write(ndev, reg, data);
+
+inval_spin_unlock:
+   mutex_unlock(&ndev->hwmon_mtx);
+}
+
+/*
+ * idt_sysfs_show_temp() - printout corresponding temperature value
+ * @dev:   Pointer to the NTB device structure
+ * @da:Sensor device attribute structure
+ * @buf:   Buffer to print temperature out
+ *
+ * Return: Number of written symbols or negative error
+ */
+static ssize_t idt_sysfs_show_temp(struct device *dev,
+  struct device_attribute *da, char *buf)
+{
+   struct sensor_device_attribute *attr = to_sensor_dev_attr(da);
+   struct idt_ntb_dev *ndev = dev_get_drvdata(dev);
+   enum idt_temp_val type = attr->index;
+   long mdeg;
+
+   idt_read_temp(ndev, type, &mdeg);
+   return sprintf(buf, "%ld\n", mdeg);
+}
+
+/*
+ * idt_sysfs_set_temp() - set corresponding temperature value
+ * @dev:   Pointer to the NTB device structure
+ * @da:Sensor device attribute structure
+ * @buf:   Buffer to print temperature out
+ * @count: Size of the passed buffer
+ *
+ * Return: Number of written symbols or negative error
+ */
+static ssize_t idt_sysfs_set_temp(struct device *dev,
+ struct device_attribute *da, const char *buf,
+ size_t count)
+{
+   struct sensor_device_attribute *attr = to_sensor_dev_attr(da);
+   struct idt_ntb_dev *ndev = dev_get_drvdata(dev);
+   enum idt_temp_val type = attr->index;
+   long mdeg;
+   int ret;
+
+   ret = kstrtol(buf, 10, &mdeg);
+   if (ret)
+   return ret;
+
+   /* Clamp the passed value in accordance with the type */
+   if (type == IDT_TEMP_OFFSET)
+   mdeg = clamp_val(mdeg, IDT_TEMP_MIN_OFFSET,
+

[PATCH v2 4/4] ntb: idt: Alter the driver info comments

2018-07-17 Thread Serge Semin

Since the IDT PCIe-switch temperature sensor is now always available
regardless of the EEPROM/BIOS settings, the Kconfig and in-code
descriptions should be altered accordingly. In addition, let's update
the driver copyright lines.

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/Kconfig  |  4 +---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 11 ++-
 drivers/ntb/hw/idt/ntb_hw_idt.h |  2 +-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/ntb/hw/idt/Kconfig b/drivers/ntb/hw/idt/Kconfig
index 2ed147368fa8..f8948cf515ce 100644
--- a/drivers/ntb/hw/idt/Kconfig
+++ b/drivers/ntb/hw/idt/Kconfig
@@ -24,9 +24,7 @@ config NTB_IDT
 BAR settings of peer NT-functions, the BAR setups can't be done over
 kernel PCI fixups. That's why the alternative pre-initialization
 techniques like BIOS using SMBus interface or EEPROM should be
-utilized. Additionally if one needs to have temperature sensor
-information printed to system log, the corresponding registers must
-be initialized within BIOS/EEPROM as well.
+utilized.
 
 If unsure, say N.
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 55321086d59a..8706b8e0864a 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -4,7 +4,7 @@
  *
  *   GPL LICENSE SUMMARY
  *
- *   Copyright (C) 2016 T-Platforms All Rights Reserved.
+ *   Copyright (C) 2016-2018 T-Platforms JSC All Rights Reserved.
  *
  *   This program is free software; you can redistribute it and/or modify it
  *   under the terms and conditions of the GNU General Public License,
@@ -1824,10 +1824,11 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
  *  7. Temperature sensor operations
  *
  *IDT PCIe-switch has an embedded temperature sensor, which can be used to
- * warn a user-space of possible chip overheating. Since workload temperature
- * can be different on different platforms, temperature thresholds as well as
- * general sensor settings must be setup in the framework of BIOS/EEPROM
- * initializations. It includes the actual sensor enabling as well.
+ * check the current chip core temperature. Since the workload environment
+ * differs between platforms, an offset and ADC/filter settings can be
+ * specified, although only the offset is exposed to the sysfs hwmon
+ * interface at the moment. The rest of the settings can be adjusted,
+ * for instance, by the BIOS/EEPROM firmware.
  *=
  */
 
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.h b/drivers/ntb/hw/idt/ntb_hw_idt.h
index 3517cd2e2baa..2f1aa121b0cf 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.h
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.h
@@ -4,7 +4,7 @@
  *
  *   GPL LICENSE SUMMARY
  *
- *   Copyright (C) 2016 T-Platforms All Rights Reserved.
+ *   Copyright (C) 2016-2018 T-Platforms JSC All Rights Reserved.
  *
  *   This program is free software; you can redistribute it and/or modify it
  *   under the terms and conditions of the GNU General Public License,
-- 
2.12.0

[PATCH v2 1/4] ntb: idt: Alter temperature read method

2018-07-17 Thread Serge Semin

In order to create a hwmon interface for the IDT PCIe-switch temperature
sensor, the existing reader method needs to be improved. In particular,
we need to redesign it so that temperature/offset values can be read from
the registers of the passed types. Since the IDT sensor interface
provides temperature in the unsigned 0:7:1 format (7 bits for the integer
part and one for the fraction), we also need helpers to convert the typical
sysfs temperature data type to and from this format. Even though the
temperature offset provided by the IDT PCIe switch has the same format but
signed, it can be translated by these methods too.

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 113 ++--
 drivers/ntb/hw/idt/ntb_hw_idt.h |  56 
 2 files changed, 152 insertions(+), 17 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index dbe72f116017..c086ae5c601c 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -1829,22 +1829,99 @@ static int idt_ntb_peer_msg_write(struct ntb_dev *ntb, int pidx, int midx,
  */
 
 /*
+ * idt_get_deg() - convert millidegree Celsius value to just degree
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: Degree corresponding to the passed millidegree value
+ */
+static inline s8 idt_get_deg(long mdegC)
+{
+   return mdegC / 1000;
+}
+
+/*
+ * idt_get_deg_frac() - retrieve 0/0.5 fraction of the millidegree Celsius value
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: 0/0.5 degree fraction of the passed millidegree value
+ */
+static inline u8 idt_get_deg_frac(long mdegC)
+{
+   return (mdegC % 1000) >= 500 ? 5 : 0;
+}
+
+/*
+ * idt_temp_get_fmt() - convert millidegree Celsius value to 0:7:1 format
+ * @mdegC: IN - millidegree Celsius value
+ *
+ * Return: 0:7:1 format acceptable by the IDT temperature sensor
+ */
+static inline u8 idt_temp_get_fmt(long mdegC)
+{
+   return (idt_get_deg(mdegC) << 1) | (idt_get_deg_frac(mdegC) ? 1 : 0);
+}
+
+/*
+ * idt_get_temp_sval() - convert temp sample to signed millidegree Celsius
+ * @data:  IN - shifted to LSB 8-bits temperature sample
+ *
+ * Return: signed millidegree Celsius
+ */
+static inline long idt_get_temp_sval(u32 data)
+{
+   return ((s8)data / 2) * 1000 + (data & 0x1 ? 500 : 0);
+}
+
+/*
+ * idt_get_temp_uval() - convert temp sample to unsigned millidegree Celsius
+ * @data:  IN - shifted to LSB 8-bits temperature sample
+ *
+ * Return: unsigned millidegree Celsius
+ */
+static inline long idt_get_temp_uval(u32 data)
+{
+   return (data / 2) * 1000 + (data & 0x1 ? 500 : 0);
+}
+
+/*
  * idt_read_temp() - read temperature from chip sensor
  * @ntb:   NTB device context.
- * @val:   OUT - integer value of temperature
- * @frac:  OUT - fraction
+ * @type:  IN - type of the temperature value to read
+ * @val:   OUT - integer value of temperature in millidegree Celsius
  */
-static void idt_read_temp(struct idt_ntb_dev *ndev, unsigned char *val,
- unsigned char *frac)
+static void idt_read_temp(struct idt_ntb_dev *ndev,
+ const enum idt_temp_val type, long *val)
 {
u32 data;
 
-   /* Read the data from TEMP field of the TMPSTS register */
-   data = idt_sw_read(ndev, IDT_SW_TMPSTS);
-   data = GET_FIELD(TMPSTS_TEMP, data);
-   /* TEMP field has one fractional bit and seven integer bits */
-   *val = data >> 1;
-   *frac = ((data & 0x1) ? 5 : 0);
+   /* Alter the temperature field in accordance with the passed type */
+   switch (type) {
+   case IDT_TEMP_CUR:
+   data = GET_FIELD(TMPSTS_TEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_LOW:
+   data = GET_FIELD(TMPSTS_LTEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_HIGH:
+   data = GET_FIELD(TMPSTS_HTEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   case IDT_TEMP_OFFSET:
+   /* This is the only field with signed 0:7:1 format */
+   data = GET_FIELD(TMPADJ_OFFSET,
+idt_sw_read(ndev, IDT_SW_TMPADJ));
+   *val = idt_get_temp_sval(data);
+   return;
+   default:
+   data = GET_FIELD(TMPSTS_TEMP,
+idt_sw_read(ndev, IDT_SW_TMPSTS));
+   break;
+   }
+
+   /* The rest of the fields accept unsigned 0:7:1 format */
+   *val = idt_get_temp_uval(data);
 }
 
 /*
@@ -1860,10 +1937,10 @@ static void idt_read_temp(struct idt_ntb_dev *ndev, unsigned char *val,
  */
 static void idt_temp_isr(struct idt_ntb_dev *ndev, u32 ntint_sts)
 {
-   unsigned char val, frac;
+   unsigned long mdeg;
 
/* R

[PATCH v2 3/4] ntb: idt: Discard temperature sensor IRQ handler

2018-07-17 Thread Serge Semin

The IDT PCIe-switch temperature sensor interface is badly broken. First
of all, only a few combinations of the TMPCTL threshold enable bits
actually cause the interrupts to be unmasked. Even if an individual bit
indicates the event is unmasked, the corresponding IRQ just isn't generated.
Most of the threshold enable bit combinations are in fact useless and
none of them can help to create a fully functional alarm interface.
So to speak, we can't create well-defined hwmon alarms based on
the IDT PCIe-switch threshold IRQs.

Secondly, a single threshold IRQ (not a combination of thresholds) can
be successfully enabled without the issue described above. But in this
case we experienced an enormous number of interrupts generated by
the chip when the temperature got near the enabled threshold value. Filter
adjustment didn't help much, and the chip doesn't provide any hysteresis
settings. Due to the temperature sample fluctuations near the threshold, the
interrupt storm makes the system nearly unusable until the temperature
value finally settles either fully above or fully below
the threshold.

All of these issues make the temperature sensor alarm interface useless
and at some point even dangerous to use in the driver. So
it is safer to completely discard it and disable the temperature alarm
interrupts.

Signed-off-by: Serge Semin 
---
 drivers/ntb/hw/idt/ntb_hw_idt.c | 41 +
 drivers/ntb/hw/idt/ntb_hw_idt.h |  5 ++---
 2 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index f12088c6a92d..55321086d59a 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2075,38 +2075,6 @@ static struct attribute *idt_temp_attrs[] = {
 ATTRIBUTE_GROUPS(idt_temp);
 
 /*
- * idt_temp_isr() - temperature sensor alarm events ISR
- * @ndev:  IDT NTB hardware driver descriptor
- * @ntint_sts: NT-function interrupt status
- *
- * It handles events of temperature crossing alarm thresholds. Since reading
- * of TMPALARM register clears it up, the function doesn't analyze the
- * read value, instead the current temperature value just warningly printed to
- * log.
- * The method is called from PCIe ISR bottom-half routine.
- */
-static void idt_temp_isr(struct idt_ntb_dev *ndev, u32 ntint_sts)
-{
-   unsigned long mdeg;
-
-   /* Read the current temperature value */
-   idt_read_temp(ndev, IDT_TEMP_CUR, &mdeg);
-
-   /* Read the temperature alarm to clean the alarm status out */
-   /*(void)idt_sw_read(ndev, IDT_SW_TMPALARM);*/
-
-   /* Clean the corresponding interrupt bit */
-   idt_nt_write(ndev, IDT_NT_NTINTSTS, IDT_NTINTSTS_TMPSENSOR);
-
-   dev_dbg(&ndev->ntb.pdev->dev,
-   "Temp sensor IRQ detected %#08x", ntint_sts);
-
-   /* Print temperature value to log */
-   dev_warn(&ndev->ntb.pdev->dev, "Temperature %hhd.%hhuC",
-   idt_get_deg(mdeg), idt_get_deg_frac(mdeg));
-}
-
-/*
  * idt_init_temp() - initialize temperature sensor interface
  * @ndev:  IDT NTB hardware driver descriptor
  *
@@ -2188,7 +2156,7 @@ static int idt_init_isr(struct idt_ntb_dev *ndev)
goto err_free_vectors;
}
 
-   /* Unmask Message/Doorbell/SE/Temperature interrupts */
+   /* Unmask Message/Doorbell/SE interrupts */
ntint_mask = idt_nt_read(ndev, IDT_NT_NTINTMSK) & ~IDT_NTINTMSK_ALL;
idt_nt_write(ndev, IDT_NT_NTINTMSK, ntint_mask);
 
@@ -2203,7 +2171,6 @@ static int idt_init_isr(struct idt_ntb_dev *ndev)
return ret;
 }
 
-
 /*
  * idt_deinit_ist() - deinitialize PCIe interrupt handler
  * @ndev:  IDT NTB hardware driver descriptor
@@ -2264,12 +2231,6 @@ static irqreturn_t idt_thread_isr(int irq, void *devid)
handled = true;
}
 
-   /* Handle temperature sensor interrupt */
-   if (ntint_sts & IDT_NTINTSTS_TMPSENSOR) {
-   idt_temp_isr(ndev, ntint_sts);
-   handled = true;
-   }
-
dev_dbg(&ndev->ntb.pdev->dev, "IDT IRQs 0x%08x handled", ntint_sts);
 
return handled ? IRQ_HANDLED : IRQ_NONE;
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.h b/drivers/ntb/hw/idt/ntb_hw_idt.h
index 032f81cb4d44..3517cd2e2baa 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.h
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.h
@@ -688,15 +688,14 @@
  * @IDT_NTINTMSK_DBELL:Doorbell interrupt mask bit
  * @IDT_NTINTMSK_SEVENT:   Switch Event interrupt mask bit
  * @IDT_NTINTMSK_TMPSENSOR:Temperature sensor interrupt mask bit
- * @IDT_NTINTMSK_ALL:  All the useful interrupts mask
+ * @IDT_NTINTMSK_ALL:  NTB-related interrupts mask
  */
 #define IDT_NTINTMSK_MSG   0x0001U
 #define IDT_NTINTMSK_DBELL 0x0002U
 #define IDT_NTINTMSK_SEVENT0x0008U
 #define IDT_NTINTMSK_TMPSENS

Re: [PATCH] dt-bindings: correct white-spaces in examples

2023-11-24 Thread Serge Semin

On Fri, Nov 24, 2023 at 10:21:21AM +0100, Krzysztof Kozlowski wrote:
> Use only one and exactly one space around '=' in DTS example.
> 
> Signed-off-by: Krzysztof Kozlowski 
> 
> ---
> 
> Merging idea: Rob's DT.
> Should apply cleanly on Rob's for-next.
> ---
>  .../devicetree/bindings/auxdisplay/hit,hd44780.yaml   | 2 +-
>  .../devicetree/bindings/clock/baikal,bt1-ccu-pll.yaml | 2 +-
>  Documentation/devicetree/bindings/iio/adc/adi,ad7780.yaml | 6 +++---
>  .../devicetree/bindings/iio/adc/qcom,spmi-iadc.yaml   | 2 +-
>  .../devicetree/bindings/iio/adc/qcom,spmi-rradc.yaml  | 2 +-
>  .../interrupt-controller/st,stih407-irq-syscfg.yaml   | 4 ++--
>  Documentation/devicetree/bindings/mmc/arm,pl18x.yaml  | 2 +-
>  Documentation/devicetree/bindings/net/sff,sfp.yaml| 2 +-
>  .../devicetree/bindings/pci/toshiba,visconti-pcie.yaml| 2 +-
>  .../bindings/pinctrl/renesas,rzg2l-pinctrl.yaml   | 6 +++---
>  .../devicetree/bindings/power/supply/richtek,rt9455.yaml  | 8 
>  .../devicetree/bindings/regulator/mps,mp5416.yaml | 4 ++--
>  .../devicetree/bindings/regulator/mps,mpq7920.yaml| 4 ++--
>  .../devicetree/bindings/remoteproc/fsl,imx-rproc.yaml | 8 
>  14 files changed, 27 insertions(+), 27 deletions(-)
> 

[nip]

> diff --git a/Documentation/devicetree/bindings/clock/baikal,bt1-ccu-pll.yaml b/Documentation/devicetree/bindings/clock/baikal,bt1-ccu-pll.yaml
> index 624984d51c10..7f8d98226437 100644
> --- a/Documentation/devicetree/bindings/clock/baikal,bt1-ccu-pll.yaml
> +++ b/Documentation/devicetree/bindings/clock/baikal,bt1-ccu-pll.yaml
> @@ -125,7 +125,7 @@ examples:
>  clk25m: clock-oscillator-25m {
>compatible = "fixed-clock";
>#clock-cells = <0>;
> -  clock-frequency  = <2500>;
> +  clock-frequency = <2500>;
>clock-output-names = "clk25m";
>  };
>  ...

For Baikal-T1 CCU PLL DT-schema
Acked-by: Serge Semin 

-Serge(y)

[PATCH 00/12] mips: Post-bootmem-memblock transition fixes

2019-04-23 Thread Serge Semin

My first attempt at making the MIPS subsystem use the memblock early memory
allocator was made a few years ago. I created a patchset with
21 patches [1]. It turned out to be too complicated, so I decided to resend a
reworked patchset with a smaller number of changes [2]. I did this, and after a
short review process a v2 patchset was also posted. Then my spare
time ran out and I couldn't continue supporting and resubmitting
the patchset.

A year later Mike Rapoport took charge of this task and posted a small
patch which essentially removed the bootmem allocator from the MIPS
subsystem [3]. A single small patch did, in general, everything my huge
patchsets were intended for in the first place (though it lacked a few fixes).
Mike went even further and completely removed the bootmem allocator from the
kernel code, so all subsystems now use a single early
memory allocator. This significantly simplified the platform code and
removed a deprecated subsystem with duplicated functionality. A million
thanks to Mike for this.

Getting back to the MIPS subsystem and its memblock allocator usage. Even
though the patch suggested by Mike [3] fixed most of the problems caused
by enabling the memblock allocator, some of them were left
unaddressed. First of all, the PFN calculation algorithm hasn't been
fully refactored. A reserved memory declaration loop has been left
untouched even though it is clearly over-complicated for the new initialization
procedure. Secondly, the MIPS platform code reserves the whole space below the
kernel start address, which doesn't seem right since the kernel can be
located much higher than where the memory space really starts. Thirdly, CMA, if it
is enabled, reserves memory regions by means of memblock in the first place,
so the bootmem-init code doesn't need to do it again. Fourthly, at the early
platform initialization stage none of the bootmem-left methods can be called
since there is no memory page mapping at that moment, so the __nosave* region
must be reserved by means of the memblock allocator. Finally, aside from the
memblock allocator introduction, my early patchsets included a series of useful
alterations like the "nomap" property implementation for "reserved-memory"
dts-nodes, a memblock_dump_all() call after the early memory allocator
initialization for debugging, a low-memory test procedure, a kernel memory
mapping printout at boot time, and so on. All of these fixes and
alterations are introduced in this new patchset. Please review. I hope
this time I'll be more responsive and see this series through until it
is merged.

[1] https://lkml.org/lkml/2016/12/18/195
[2] https://lkml.org/lkml/2018/1/17/1201
[3] https://lkml.org/lkml/2018/9/10/302

NOTE: I added a few "Reviewed-by: Matt Redfearn" tags
since some patches of this series have been picked up from my earlier
patchsets, which Matt has already reviewed. I didn't add the tag to patches
which were either new or partially ported.

Serge Semin (12):
  mips: Make sure kernel .bss exists in boot mem pool
  mips: Discard rudiments from bootmem_init
  mips: Combine memblock init and memory reservation loops
  mips: Reserve memory for the kernel image resources
  mips: Discard post-CMA-init foreach loop
  mips: Use memblock to reserve the __nosave memory range
  mips: Add reserve-nomap memory type support
  mips: Dump memblock regions for debugging
  mips: Perform early low memory test
  mips: Print the kernel virtual mem layout on debugging
  mips: Make sure dt memory regions are valid
  mips: Enable OF_RESERVED_MEM config

 arch/mips/Kconfig|   1 +
 arch/mips/include/asm/bootinfo.h |   1 +
 arch/mips/kernel/prom.c  |  18 -
 arch/mips/kernel/setup.c | 129 +--
 arch/mips/mm/init.c  |  49 
 5 files changed, 102 insertions(+), 96 deletions(-)

-- 
2.21.0

[PATCH 01/12] mips: Make sure kernel .bss exists in boot mem pool

2019-04-23 Thread Serge Semin

The current MIPS platform code makes sure the kernel text, data and init
sections are added to the boot memory map pool right after the
arch-specific memory setup method has been executed. But for some reason
the MIPS platform code skips the kernel .bss section, which definitely
should be in the boot mem pool as well. Let's fix this by just
adding the space between __bss_start and __bss_stop.

Reviewed-by: Matt Redfearn 
Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 8d1dc6c71173..0ee033c44116 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -809,6 +809,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
 PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
 BOOT_MEM_INIT_RAM);
+   arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+BOOT_MEM_RAM);
 
pr_info("Determined physical RAM map:\n");
print_memory_map();
-- 
2.21.0

[PATCH 02/12] mips: Discard rudiments from bootmem_init

2019-04-23 Thread Serge Semin

There is some pointless code left in the bootmem_init() method since
the bootmem allocator removal. The first part resides in the PFN ranges
calculation loop: the conditional expressions and the continue operator
are useless there, since nothing is done after them. The second part is
in the RAM ranges installation loop: we can simplify the conditions cascade
a bit without much of a logic change, so as to reduce the code
length. In particular, the end boundary value can be checked after
its possible reduction to below max_low_pfn.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0ee033c44116..53d93a727d1a 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -394,10 +394,7 @@ static void __init bootmem_init(void)
min_low_pfn = ~0UL;
max_low_pfn = 0;
 
-   /*
-* Find the highest page frame number we have available
-* and the lowest used RAM address
-*/
+   /* Find the highest and lowest page frame numbers we have available. */
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;
 
@@ -427,13 +424,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
-   if (end <= reserved_end)
-   continue;
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Skip zones before initrd and initrd itself */
-   if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
-   continue;
-#endif
}
 
if (min_low_pfn >= max_low_pfn)
@@ -474,6 +464,7 @@ static void __init bootmem_init(void)
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
 
+   /* Install all valid RAM ranges to the memblock memory region */
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;
 
@@ -481,21 +472,15 @@ static void __init bootmem_init(void)
end = PFN_DOWN(boot_mem_map.map[i].addr
+ boot_mem_map.map[i].size);
 
-   if (start <= min_low_pfn)
+   if (start < min_low_pfn)
start = min_low_pfn;
-   if (start >= end)
-   continue;
-
 #ifndef CONFIG_HIGHMEM
+   /* Ignore highmem regions if highmem is unsupported */
if (end > max_low_pfn)
end = max_low_pfn;
-
-   /*
-* ... finally, is the area going away?
-*/
+#endif
if (end <= start)
continue;
-#endif
 
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
-- 
2.21.0
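The simplified condition cascade in PATCH 02/12 boils down to two adjustments followed by a single emptiness check. The sketch below models one iteration of the installation loop as a standalone function. The function name is hypothetical, and the highmem flag stands in for CONFIG_HIGHMEM; a false return corresponds to the loop's "continue".

```c
#include <stdbool.h>

/* Clamp one PFN range the way the simplified installation loop does.
 * 'highmem' models CONFIG_HIGHMEM; returns false when the range
 * becomes empty and should be skipped. */
static bool sketch_clamp_pfn_range(unsigned long *start, unsigned long *end,
				   unsigned long min_low_pfn,
				   unsigned long max_low_pfn, bool highmem)
{
	if (*start < min_low_pfn)
		*start = min_low_pfn;
	/* Without highmem support, drop everything above max_low_pfn. */
	if (!highmem && *end > max_low_pfn)
		*end = max_low_pfn;
	/* A single emptiness check, done after both adjustments. */
	return *end > *start;
}
```

Because the emptiness check now sits outside the #ifndef CONFIG_HIGHMEM block, it applies to highmem-enabled builds as well.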

[PATCH 07/12] mips: Add reserve-nomap memory type support

2019-04-23 Thread Serge Semin

It might be necessary to prevent the creation of a virtual mapping for
a requested memory region. For instance, the "no-map" property of
reserved-memory nodes indicates exactly this. In this case it is not
enough to just reserve the specified region: the range must be
completely removed from the system by removing it from memblock, so
that it is treated as if it doesn't exist in the memory space. This is
the same way it's done in the default early_init_dt_reserve_memory_arch()
method.

Signed-off-by: Serge Semin 
---
 arch/mips/include/asm/bootinfo.h | 1 +
 arch/mips/kernel/prom.c  | 4 +++-
 arch/mips/kernel/setup.c | 8 
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index a301a8f4bc66..235bc2f52113 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -92,6 +92,7 @@ extern unsigned long mips_machtype;
 #define BOOT_MEM_ROM_DATA  2
 #define BOOT_MEM_RESERVED  3
 #define BOOT_MEM_INIT_RAM  4
+#define BOOT_MEM_NOMAP 5
 
 /*
  * A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 93b8e0b4332f..437a174e3ef9 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -47,7 +47,9 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
-   add_memory_region(base, size, BOOT_MEM_RESERVED);
+   add_memory_region(base, size,
+ nomap ? BOOT_MEM_NOMAP : BOOT_MEM_RESERVED);
+
return 0;
 }
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 3a5140943f54..2a1b2e7a1bc9 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -178,6 +178,7 @@ static bool __init __maybe_unused memory_region_available(phys_addr_t start,
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+   case BOOT_MEM_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -213,6 +214,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+   case BOOT_MEM_NOMAP:
+   printk(KERN_CONT "(nomap)\n");
+   break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -487,6 +491,9 @@ static void __init bootmem_init(void)
switch (boot_mem_map.map[i].type) {
case BOOT_MEM_RAM:
break;
+   case BOOT_MEM_NOMAP: /* Discard the range from the system. */
+   memblock_remove(PFN_PHYS(start), PFN_PHYS(end - start));
+   continue;
default: /* Reserve the rest of the memory types at boot time */
memblock_reserve(PFN_PHYS(start), PFN_PHYS(end - start));
break;
@@ -861,6 +868,7 @@ static void __init resource_init(void)
res->flags |= IORESOURCE_SYSRAM;
break;
case BOOT_MEM_RESERVED:
+   case BOOT_MEM_NOMAP:
default:
res->name = "reserved";
}
-- 
2.21.0
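Assuming the bootmem_init() loop layout from PATCH 03/12 (memblock_add_node(), then the type switch, then memory_present()), PATCH 07/12 gives each memory type one of three net outcomes. The sketch below tabulates them. The enum and struct are hypothetical mirrors of the BOOT_MEM_* constants, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical mirror of the BOOT_MEM_* types from asm/bootinfo.h. */
enum sketch_boot_mem_type {
	SKETCH_MEM_RAM,
	SKETCH_MEM_ROM_DATA,
	SKETCH_MEM_RESERVED,
	SKETCH_MEM_INIT_RAM,
	SKETCH_MEM_NOMAP,
};

/* Net effect of one boot_mem_map entry on memblock and sparsemem. */
struct sketch_effect {
	bool in_memory;	/* left in memblock.memory */
	bool reserved;	/* memblock_reserve() called */
	bool present;	/* memory_present() called */
};

static struct sketch_effect sketch_classify(enum sketch_boot_mem_type type)
{
	switch (type) {
	case SKETCH_MEM_RAM:
		return (struct sketch_effect){ true, false, true };
	case SKETCH_MEM_NOMAP:
		/* Added, then memblock_remove()d; "continue" skips memory_present(). */
		return (struct sketch_effect){ false, false, false };
	default:
		/* Every other type stays in memblock but is reserved. */
		return (struct sketch_effect){ true, true, true };
	}
}
```

The key difference is that a "no-map" range ends up outside memblock.memory entirely, whereas a plain reserved range remains known to memblock and sparsemem.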

[PATCH 05/12] mips: Discard post-CMA-init foreach loop

2019-04-23 Thread Serge Semin

The loop is pointless, since it walks over the memblock-reserved memory
regions and marks them as reserved in memblock again. Before bootmem
was removed from the kernel, this loop was used to map the memory
reserved by CMA into the legacy bootmem allocator. But now the early
memory allocator is memblock, which CMA itself uses for its
reservations, so no such mapping is needed anymore.

Reviewed-by: Matt Redfearn 
Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index f71a7d32a687..2ae6b02b948f 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -708,7 +708,6 @@ static void __init request_crashkernel(struct resource *res)
  */
 static void __init arch_mem_init(char **cmdline_p)
 {
-   struct memblock_region *reg;
extern void plat_mem_setup(void);
 
/*
@@ -814,10 +813,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
-   /* Tell bootmem about cma reserved memblock section */
-   for_each_memblock(reserved, reg)
-   if (reg->size != 0)
-   memblock_reserve(reg->base, reg->size);
 
reserve_bootmem_region(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end)); /* Reserve for hibernation */
-- 
2.21.0

[PATCH 09/12] mips: Perform early low memory test

2019-04-23 Thread Serge Semin

The memblock subsystem provides a method to optionally test the passed
memory region if this was requested via a special kernel boot argument.
Let's call the function at the bottom of the arch_mem_init() method.
Testing at this point in the boot sequence should be safe, since all
critical areas are reserved by then and only a minimal number of
allocations have been done.

Reviewed-by: Matt Redfearn 
Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index ca493fdf69b0..fbd216b4e929 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -826,6 +826,8 @@ static void __init arch_mem_init(char **cmdline_p)
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
 
memblock_dump_all();
+
+   early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
 }
 
 static void __init resource_init(void)
-- 
2.21.0

[PATCH 11/12] mips: Make sure dt memory regions are valid

2019-04-23 Thread Serge Semin

There are situations when memory regions coming from the dts may be
too big for the platform physical address space. This especially
concerns XPA-capable systems: the bootloader may detect more than 4GB
of memory and pass it to the kernel via the dts memory node, while the
kernel is built without XPA support. In this case the region may be
truncated either by the add_memory_region() method or by the
u64->phys_addr_t type cast. In the worst case the method can even drop
the memory region if it exceeds PHYS_ADDR_MAX. So let's make sure the
memory regions retrieved from the dts are valid, and if some of them
aren't, truncate them manually and print a warning.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/prom.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 437a174e3ef9..28bf01961bb2 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,19 @@ char *mips_get_machine_name(void)
 #ifdef CONFIG_USE_OF
 void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-   return add_memory_region(base, size, BOOT_MEM_RAM);
+   if (base >= PHYS_ADDR_MAX) {
+   pr_warn("Trying to add an invalid memory region, skipped\n");
+   return;
+   }
+
+   /* Truncate the passed memory region instead of type casting */
+   if (base + size - 1 >= PHYS_ADDR_MAX || base + size < base) {
+   pr_warn("Truncate memory region %llx @ %llx to size %llx\n",
+   size, base, PHYS_ADDR_MAX - base);
+   size = PHYS_ADDR_MAX - base;
+   }
+
+   add_memory_region(base, size, BOOT_MEM_RAM);
 }
 
 int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
-- 
2.21.0
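The validation added to early_init_dt_add_memory_arch() in PATCH 11/12 can be exercised outside the kernel. This sketch assumes a 32-bit phys_addr_t build (so PHYS_ADDR_MAX is 0xffffffff) and uses a hypothetical SKETCH_PHYS_ADDR_MAX constant and fprintf() in place of pr_warn(). It also adds one check that the patch does not contain: it rejects zero-sized regions so that base + size - 1 cannot underflow.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: a 32-bit phys_addr_t build, i.e. PHYS_ADDR_MAX == 0xffffffff. */
#define SKETCH_PHYS_ADDR_MAX UINT64_C(0xffffffff)

/* Returns false if the region must be skipped; may shrink *size. */
static bool sketch_check_dt_region(uint64_t base, uint64_t *size)
{
	if (base >= SKETCH_PHYS_ADDR_MAX) {
		fprintf(stderr, "Trying to add an invalid memory region, skipped\n");
		return false;
	}
	/* Extra check in this sketch only: keeps base + size - 1 from underflowing. */
	if (*size == 0)
		return false;

	/* End beyond the addressable range, or base + size wrapped around. */
	if (base + *size - 1 >= SKETCH_PHYS_ADDR_MAX || base + *size < base) {
		fprintf(stderr, "Truncate memory region %llx @ %llx to size %llx\n",
			(unsigned long long)*size, (unsigned long long)base,
			(unsigned long long)(SKETCH_PHYS_ADDR_MAX - base));
		*size = SKETCH_PHYS_ADDR_MAX - base;
	}
	return true;
}
```

For example, a 4GB region at 0x80000000 is truncated to 0x7fffffff bytes, and the same happens when base + size wraps around 64 bits.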

[PATCH 08/12] mips: Dump memblock regions for debugging

2019-04-23 Thread Serge Semin

It is useful to have the whole memblock memory space printed to the
console once the basic memblock initializations are done. This can be
done by the ready-to-use method memblock_dump_all(), which prints the
available and reserved memory spaces when memblock debugging is
enabled. Let's call it at the very end of the arch_mem_init() function,
when all memblock memory and reserved regions are defined, but before
any serious allocation is performed.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2a1b2e7a1bc9..ca493fdf69b0 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -824,6 +824,8 @@ static void __init arch_mem_init(char **cmdline_p)
/* Reserve for hibernation. */
memblock_reserve(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
+
+   memblock_dump_all();
 }
 
 static void __init resource_init(void)
-- 
2.21.0

[PATCH 06/12] mips: Use memblock to reserve the __nosave memory range

2019-04-23 Thread Serge Semin

Before the legacy bootmem allocator was removed, the memory for the range
was correctly reserved by reserve_bootmem_region(). But since memblock has
been selected for early memory allocation, the function can be used only
after paging is fully initialized (as is done by the memblock_free_all()
function). So calling it from the arch_mem_init() method is error-prone;
at this stage the memory must be reserved in the memblock allocator instead.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2ae6b02b948f..3a5140943f54 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -814,8 +814,9 @@ static void __init arch_mem_init(char **cmdline_p)
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
 
-   reserve_bootmem_region(__pa_symbol(&__nosave_begin),
-   __pa_symbol(&__nosave_end)); /* Reserve for hibernation */
+   /* Reserve for hibernation. */
+   memblock_reserve(__pa_symbol(&__nosave_begin),
+   __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
 }
 
 static void __init resource_init(void)
-- 
2.21.0

[PATCH 12/12] mips: Enable OF_RESERVED_MEM config

2019-04-23 Thread Serge Semin

Since the memblock patchset was introduced, reserved-memory nodes can be
declared in dt-files. These nodes are actually parsed during the arch
setup procedure, when the early_init_fdt_scan_reserved_mem() method is
called. But some features, like private reserved memory pools, aren't
available at the moment, since OF_RESERVED_MEM isn't enabled for the
MIPS platform. Let's fix this by enabling the config.

However, since MIPS uses the arch-specific boot mem_map container, we
need to manually call the fdt_init_reserved_mem() method after all the
available and reserved memory has been moved to memblock. Calling the
function before bootmem_init() fails, since at that stage there are no
memblock memory regions to allocate from.

Signed-off-by: Serge Semin 
---
 arch/mips/Kconfig| 1 +
 arch/mips/kernel/setup.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 4a5f5b0ee9a9..0bf9e89e4023 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2988,6 +2988,7 @@ config USE_OF
bool
select OF
select OF_EARLY_FLATTREE
+   select OF_RESERVED_MEM
select IRQ_DOMAIN
 
 config UHI_BOOT
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index fbd216b4e929..ab349d2381c3 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -825,6 +826,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_reserve(__pa_symbol(&__nosave_begin),
__pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
 
+   fdt_init_reserved_mem();
+
memblock_dump_all();
 
early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
-- 
2.21.0

[PATCH 10/12] mips: Print the kernel virtual mem layout on debugging

2019-04-23 Thread Serge Semin

It is useful, at least for debugging, to have the kernel virtual
memory layout printed at boot time, so as to have full information
about the booted kernel. Make the printing optional and available
only when the DEBUG_KERNEL config is enabled, so as not to leak
sensitive kernel information.

Signed-off-by: Serge Semin 
---
 arch/mips/mm/init.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index bbb196ad5f26..c338bbd03b2a 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -56,6 +57,53 @@ unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL_GPL(empty_zero_page);
 EXPORT_SYMBOL(zero_page_mask);
 
+/*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init mem_print_kmap_info(void)
+{
+#ifdef CONFIG_DEBUG_KERNEL
+   pr_notice("Kernel virtual memory layout:\n"
+ "lowmem  : 0x%px - 0x%px  (%4ld MB)\n"
+ "  .text : 0x%px - 0x%px  (%4td kB)\n"
+ "  .data : 0x%px - 0x%px  (%4td kB)\n"
+ "  .init : 0x%px - 0x%px  (%4td kB)\n"
+ "  .bss  : 0x%px - 0x%px  (%4td kB)\n"
+ "vmalloc : 0x%px - 0x%px  (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ "pkmap   : 0x%px - 0x%px  (%4ld MB)\n"
+#endif
+ "fixmap  : 0x%px - 0x%px  (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+   /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+   BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+   BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+   BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+   BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+   (unsigned long)high_memory);
+#endif
+   BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+   BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+#endif /* CONFIG_DEBUG_KERNEL */
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
 /*
  * Not static inline because used by IP27 special magic initialization code
  */
@@ -479,6 +527,7 @@ void __init mem_init(void)
setup_zero_pages(); /* Setup zeroed pages.  */
mem_init_free_highmem();
mem_init_print_info(NULL);
+   mem_print_kmap_info();
 
 #ifdef CONFIG_64BIT
if ((unsigned long) &_text > (unsigned long) CKSEG0)
-- 
2.21.0
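The MLK(), MLM() and MLK_ROUNDUP() helpers in PATCH 10/12 differ only in how the size column is computed. MLK() and MLM() truncate with a right shift, while MLK_ROUNDUP() counts a partial kilobyte as a whole one. The difference matters for sections that are not a whole number of kilobytes. The sketch below reproduces the three size expressions as plain functions. The function names are hypothetical, and SKETCH_DIV_ROUND_UP copies the kernel's DIV_ROUND_UP() definition.

```c
/* Same definition as the kernel's DIV_ROUND_UP() macro. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Size in kB as MLK() computes it: a right shift, truncating. */
static unsigned long sketch_size_kb(unsigned long b, unsigned long t)
{
	return (t - b) >> 10;
}

/* Size in kB as MLK_ROUNDUP() computes it: a partial kilobyte counts. */
static unsigned long sketch_size_kb_roundup(unsigned long b, unsigned long t)
{
	return SKETCH_DIV_ROUND_UP(t - b, 1024UL);
}

/* Size in MB as MLM() computes it. */
static unsigned long sketch_size_mb(unsigned long b, unsigned long t)
{
	return (t - b) >> 20;
}
```

For example, a 1500-byte section is shown as 1 kB by MLK() but as 2 kB by MLK_ROUNDUP().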

[PATCH 03/12] mips: Combine memblock init and memory reservation loops

2019-04-23 Thread Serge Semin

Before bootmem was completely removed from the kernel, the last loop
in bootmem_init() had been used to reserve the correspondingly marked
regions, to initialize sparsemem sections and to free the low memory
pages, which would then be used for early memory allocations. After the
bootmem removal patchset was merged, the loop was left to do only the
first two things. But it didn't do them quite well.

First of all, it leaves the BOOT_MEM_INIT_RAM memory types unreserved,
which is definitely a bug (although it isn't noticeable, since this type
is used only for the kernel region, which is fully marked as reserved
anyway). Secondly, the reservation is supposed to be done for any memory,
including highmem. (I couldn't figure out why highmem was ignored in the
first place, since platforms and dts' may declare any memory region for
reservation.) Thirdly, the reserved_end variable had been used here so
as not to accidentally free memory occupied by the kernel. Since the
corresponding region is already reserved earlier in this method, there
is no need to use the variable here anymore. Fourthly, sparsemem should
be aware of all the memory types in the system, including ROM_DATA, even
if that memory is going to stay reserved for the whole system uptime.
Finally, once all these issues are fixed, the memory reservation loop
can be freely merged into the memory installation loop, as is done in
this patch.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 48 ++--
 1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 53d93a727d1a..185e0e42e009 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -483,55 +483,21 @@ static void __init bootmem_init(void)
continue;
 
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
-   }
-
-   /*
-* Register fully available low RAM pages with the bootmem allocator.
-*/
-   for (i = 0; i < boot_mem_map.nr_map; i++) {
-   unsigned long start, end, size;
-
-   start = PFN_UP(boot_mem_map.map[i].addr);
-   end   = PFN_DOWN(boot_mem_map.map[i].addr
-   + boot_mem_map.map[i].size);
 
-   /*
-* Reserve usable memory.
-*/
+   /* Reserve any memory except the ordinary RAM ranges. */
switch (boot_mem_map.map[i].type) {
case BOOT_MEM_RAM:
break;
-   case BOOT_MEM_INIT_RAM:
-   memory_present(0, start, end);
-   continue;
-   default:
-   /* Not usable memory */
-   if (start > min_low_pfn && end < max_low_pfn)
-   memblock_reserve(boot_mem_map.map[i].addr,
-   boot_mem_map.map[i].size);
-
-   continue;
+   default: /* Reserve the rest of the memory types at boot time */
+   memblock_reserve(PFN_PHYS(start), PFN_PHYS(end - start));
+   break;
}
 
/*
-* We are rounding up the start address of usable memory
-* and at the end of the usable range downwards.
-*/
-   if (start >= max_low_pfn)
-   continue;
-   if (start < reserved_end)
-   start = reserved_end;
-   if (end > max_low_pfn)
-   end = max_low_pfn;
-
-   /*
-* ... finally, is the area going away?
+* In any case the added to the memblock memory regions
+* (highmem/lowmem, available/reserved, etc) are considered
+* as present, so inform sparsemem about them.
 */
-   if (end <= start)
-   continue;
-   size = end - start;
-
-   /* Register lowmem ranges */
memory_present(0, start, end);
}
 
-- 
2.21.0
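After PATCH 03/12, a single pass over boot_mem_map adds every range to memblock, reserves everything that is not plain RAM (now including INIT_RAM), and reports every range to sparsemem. The following toy model counts the pages each call would see. It is a sketch with hypothetical types, and it ignores the min_low_pfn/max_low_pfn clamping that happens earlier in the loop.

```c
#include <stddef.h>

/* Hypothetical mirror of the BOOT_MEM_* types that exist at this point. */
enum sketch_mem_type { SK_RAM, SK_ROM_DATA, SK_RESERVED, SK_INIT_RAM };

struct sketch_region {
	unsigned long start, end;	/* PFN range [start, end) */
	enum sketch_mem_type type;
};

/* Page counts handed to each of the three calls in the merged loop. */
struct sketch_totals {
	unsigned long added;	/* memblock_add_node() */
	unsigned long reserved;	/* memblock_reserve() */
	unsigned long present;	/* memory_present() */
};

static struct sketch_totals sketch_single_pass(const struct sketch_region *map,
					       size_t n)
{
	struct sketch_totals t = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		unsigned long pages;

		if (map[i].end <= map[i].start)
			continue;
		pages = map[i].end - map[i].start;

		t.added += pages;
		/* Any type other than plain RAM, INIT_RAM included, is reserved. */
		if (map[i].type != SK_RAM)
			t.reserved += pages;
		/* Every installed range is reported to sparsemem. */
		t.present += pages;
	}
	return t;
}
```

For a map of 100 RAM pages followed by 10 reserved, 10 INIT_RAM and 5 ROM_DATA pages, all 125 pages are added and present, and 25 are reserved.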

[PATCH 04/12] mips: Reserve memory for the kernel image resources

2019-04-23 Thread Serge Semin

The reserved_end variable had been used by the bootmem_init() code
to find the lowest limit of memory available for the memmap blob. The
original code just tried to find free memory space above the place
where the kernel was loaded. This limitation seems justified for the
memmap region search process, but I can't see any obvious reason to
reserve the unused space below the kernel, seeing that some platforms
place it much higher than the standard 1MB. Moreover, the RELOCATABLE
config enables the kernel to be loaded at any memory address. So let's
reserve only the memory occupied by the kernel, leaving the region
below it free for allocations. After doing this we can also discard the
code freeing the space between the kernel _text and VMLINUX_LOAD_ADDRESS
symbols, since it's going to be free anyway (unless marked as reserved
by platforms).

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 30 +++---
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 185e0e42e009..f71a7d32a687 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -371,7 +371,6 @@ static void __init bootmem_init(void)
 
 static void __init bootmem_init(void)
 {
-   unsigned long reserved_end;
phys_addr_t ramstart = PHYS_ADDR_MAX;
int i;
 
@@ -382,10 +381,10 @@ static void __init bootmem_init(void)
 * will reserve the area used for the initrd.
 */
init_initrd();
-   reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
 
-   memblock_reserve(PHYS_OFFSET,
-(reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
+   /* Reserve memory occupied by kernel. */
+   memblock_reserve(__pa_symbol(&_text),
+   __pa_symbol(&_end) - __pa_symbol(&_text));
 
/*
 * max_low_pfn is not a number of pages. The number of pages
@@ -501,29 +500,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}
 
-#ifdef CONFIG_RELOCATABLE
-   /*
-* The kernel reserves all memory below its _end symbol as bootmem,
-* but the kernel may now be at a much higher address. The memory
-* between the original and new locations may be returned to the system.
-*/
-   if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
-   unsigned long offset;
-   extern void show_kernel_relocation(const char *level);
-
-   offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
-   memblock_free(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
-
-#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
-   /*
-* This information is necessary when debugging the kernel
-* But is a security vulnerability otherwise!
-*/
-   show_kernel_relocation(KERN_INFO);
-#endif
-   }
-#endif
-
/*
 * Reserve initrd memory if needed.
 */
-- 
2.21.0
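The effect of PATCH 04/12 on the amount of reserved memory is simple arithmetic. The sketch below compares the old reservation, [PHYS_OFFSET, PFN_UP(_end)), with the new one, [_text, _end). It assumes 4 KiB pages, and the addresses used afterwards are made-up examples, not values from any real platform.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Old scheme: everything from PHYS_OFFSET up to the page-rounded _end. */
static uint64_t sketch_old_reservation(uint64_t phys_offset, uint64_t end)
{
	uint64_t reserved_end = (end + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	return (reserved_end << SKETCH_PAGE_SHIFT) - phys_offset;
}

/* New scheme: only the kernel image itself, [_text, _end). */
static uint64_t sketch_new_reservation(uint64_t text, uint64_t end)
{
	return end - text;
}
```

With PHYS_OFFSET at 0 and a 10MB kernel relocated to 64MB (0x04000000..0x04a00000), the old scheme reserves 0x04a00000 bytes while the new one reserves 0x00a00000, leaving the 64MB below the kernel available for allocations.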

Re: [PATCH 12/12] mips: Enable OF_RESERVED_MEM config

2019-04-24 Thread Serge Semin

On Tue, Apr 23, 2019 at 11:17:50PM -0700, Christoph Hellwig wrote:

Hello Christoph

> On Wed, Apr 24, 2019 at 01:47:48AM +0300, Serge Semin wrote:
> > Since memblock-patchset was introduced the reserved-memory nodes are
> > supported being declared in dt-files. So these nodes are actually parsed
> > during the arch setup procedure when the early_init_fdt_scan_reserved_mem()
> > method is called. But some of the features like private reserved memory
> > pools aren't available at the moment, since OF_RESERVED_MEM isn't enabled
> > for the MIPS platform. Lets fix it by enabling the config.
> > 
> > But due to the arch-specific boot mem_map container utilization we need
> > to manually call the fdt_init_reserved_mem() method after all the available
> > and reserved memory has been moved to memblock. The function call performed
> > before bootmem_init() fails due to the lack of any memblock memory regions
> > to allocate from at that stage.
> 
> Architectures should not select this symbol directly, it will be
> automatically enabled if either DMA_DECLARE_COHERENT or DMA_CMA
> are enabled, which are required for the actual underlying memory
> allocators.

Thanks for the comment. I should have checked this before porting the patch from
kernel 4.9, where this symbol was selected by the platforms. I'll remove the
forcible selection of the config in the next patchset revision.

-Sergey

[PATCH 0/5] i2c-mux-gpio: Split plat- and dt-specific code up

2019-04-24 Thread Serge Semin

The main idea of this patchset was to add dt-based GPIOs support to
the i2c-mux-gpio driver. In particular, we needed the full GPIO
specifier to be handled, including dt-flags like GPIO_ACTIVE_HIGH,
GPIO_ACTIVE_LOW, etc. Since it used the legacy GPIO interface, the
former driver implementation didn't provide this ability.

While adding the full dt-GPIO flags support, a small set of
refactorings has been done in order to keep the platform_data-based
systems supported, make the code more readable and the alterations
clearer. In general, the whole set of changes might be considered as
splitting up the plat- and dt-specific code. In the first patch we unpin
the platform-specific method of GPIO-chip probing. The second patch
makes the driver return an error if the of-based (last resort) path
fails to retrieve the driver private data. The next three patches are
the sequence of initial channel info retrieval, platform_data-specific
code isolation and, finally, the introduction of the full dt-based GPIOs
request method. The last patch does what we intended this patchset for
in the first place: it adds the full dt-GPIO specifiers support.


Serge Semin (5):
  i2c-mux-gpio: Unpin a platform-based device initialization
  i2c-mux-gpio: Return an error if no config data found
  i2c-mux-gpio: Save initial channel number to the idle data field
  i2c-mux-gpio: Unpin the platform-specific GPIOs request code
  i2c-mux-gpio: Create of-based GPIOs request method

 drivers/i2c/muxes/i2c-mux-gpio.c | 224 ---
 1 file changed, 146 insertions(+), 78 deletions(-)

-- 
2.21.0

[PATCH 1/5] i2c-mux-gpio: Unpin a platform-based device initialization

2019-04-24 Thread Serge Semin

We can unpin the code specific to an i2c-mux-gpio device declared
as a platform device. In this case the platform data just needs to be
copied to the private storage and, if the GPIO chip pointer refers to
a valid GPIO chip descriptor, its base number must be saved for further
GPIOs request and initialization. The rest of the code is common for
both platform and OF-based setups.

Signed-off-by: Serge Semin 
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 67 ++--
 1 file changed, 37 insertions(+), 30 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 13882a2a4f60..24cf6ec02e75 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -136,44 +136,51 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
 }
 #endif
 
+static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_gpio_platform_data *data = dev_get_platdata(&pdev->dev);
+   struct gpio_chip *gpio;
+
+   /*
+* If a GPIO chip name is provided, the GPIO pin numbers provided are
+* relative to its base GPIO number. Otherwise they are absolute.
+*/
+   if (data->gpio_chip) {
+   gpio = gpiochip_find(data->gpio_chip,
+match_gpio_chip_by_label);
+   if (!gpio)
+   return -EPROBE_DEFER;
+
+   mux->gpio_base = gpio->base;
+   } else {
+   mux->gpio_base = 0;
+   }
+
+   memcpy(&mux->data, data, sizeof(mux->data));
+
+   return 0;
+}
+
 static int i2c_mux_gpio_probe(struct platform_device *pdev)
 {
struct i2c_mux_core *muxc;
struct gpiomux *mux;
struct i2c_adapter *parent;
struct i2c_adapter *root;
-   unsigned initial_state, gpio_base;
+   unsigned initial_state;
int i, ret;
 
mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
if (!mux)
return -ENOMEM;
 
-   if (!dev_get_platdata(&pdev->dev)) {
+   if (!dev_get_platdata(&pdev->dev))
ret = i2c_mux_gpio_probe_dt(mux, pdev);
-   if (ret < 0)
-   return ret;
-   } else {
-   memcpy(&mux->data, dev_get_platdata(&pdev->dev),
-   sizeof(mux->data));
-   }
-
-   /*
-* If a GPIO chip name is provided, the GPIO pin numbers provided are
-* relative to its base GPIO number. Otherwise they are absolute.
-*/
-   if (mux->data.gpio_chip) {
-   struct gpio_chip *gpio;
-
-   gpio = gpiochip_find(mux->data.gpio_chip,
-match_gpio_chip_by_label);
-   if (!gpio)
-   return -EPROBE_DEFER;
-
-   gpio_base = gpio->base;
-   } else {
-   gpio_base = 0;
-   }
+   else
+   ret = i2c_mux_gpio_probe_plat(mux, pdev);
+   if (ret < 0)
+   return ret;
 
parent = i2c_get_adapter(mux->data.parent);
if (!parent)
@@ -194,7 +201,6 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
root = i2c_root_adapter(&parent->dev);
 
muxc->mux_locked = true;
-   mux->gpio_base = gpio_base;
 
if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
initial_state = mux->data.idle;
@@ -207,14 +213,15 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
struct device *gpio_dev;
struct gpio_desc *gpio_desc;
 
-   ret = gpio_request(gpio_base + mux->data.gpios[i], "i2c-mux-gpio");
+   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
+  "i2c-mux-gpio");
if (ret) {
dev_err(&pdev->dev, "Failed to request GPIO %d\n",
mux->data.gpios[i]);
goto err_request_gpio;
}
 
-   ret = gpio_direction_output(gpio_base + mux->data.gpios[i],
+   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
initial_state & (1 << i));
if (ret) {
dev_err(&pdev->dev,
@@ -224,7 +231,7 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
goto err_request_gpio;
}
 
-   gpio_desc = gpio_to_desc(gpio_base + mux->data.gpios[i]);
+   gpio_desc = gpio_to_desc(mux->gpio_base + mux->data.gpios[i]);
mux->gpios[i] = gpio_desc;
 
if (!muxc->mux_locked)
@@ -256,7 +263,7 @@ static int i2c_mux_gpio_probe(struct platfor

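In i2c_mux_gpio_probe_plat() from PATCH 1/5, pin numbers are relative to a named chip's base when a chip label is given, and are absolute otherwise. A missing chip defers the probe. The sketch below models that resolution with a plain array instead of gpiochip_find(). The struct and function names are hypothetical, and SKETCH_EPROBE_DEFER mirrors the kernel's EPROBE_DEFER value (517).

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_EPROBE_DEFER 517 /* mirrors EPROBE_DEFER in the kernel */

/* Hypothetical stand-in for a registered struct gpio_chip. */
struct sketch_chip {
	const char *label;
	int base;
};

/* Resolve the base added to relative pin numbers: 0 when no chip name is
 * given, the matching chip's base otherwise, or defer if it isn't there yet. */
static int sketch_resolve_gpio_base(const struct sketch_chip *chips, size_t n,
				    const char *name, int *base)
{
	if (!name) {
		*base = 0;	/* pin numbers are absolute */
		return 0;
	}
	for (size_t i = 0; i < n; i++) {
		/* Same comparison as match_gpio_chip_by_label(). */
		if (strcmp(chips[i].label, name) == 0) {
			*base = chips[i].base;
			return 0;
		}
	}
	return -SKETCH_EPROBE_DEFER;	/* chip not registered yet */
}
```

The absolute GPIO number passed to gpio_request() is then the resolved base plus mux->data.gpios[i].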
[PATCH 3/5] i2c-mux-gpio: Save initial channel number to the idle data field

2019-04-24 Thread Serge Semin

If the idle state has been specified in the data structure, the idle
variable is left untouched as before, so as to keep the default channel
enabled in the mux idle state. But if a platform doesn't specify which
channel is going to be enabled by default, we still don't set up the
deselect callback, as before, but the initial state is now saved in the
idle variable for further initialization. We can safely do this here,
since in that case the variable is used only for setting the initial
state.

The reason for this change is to prepare the code for the future split
of the GPIOs request path into of- and plat-based methods. The idle
variable is used here as a container for the initial state in both
paths when the idle channel isn't specified.

Signed-off-by: Serge Semin 
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index a14fe132b0c3..535c83c43371 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -171,7 +171,6 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
struct gpiomux *mux;
struct i2c_adapter *parent;
struct i2c_adapter *root;
-   unsigned initial_state;
int i, ret;
 
mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
@@ -204,12 +203,14 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
 
muxc->mux_locked = true;
 
-   if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
-   initial_state = mux->data.idle;
+   /*
+* Set deselect callback if idle state has been setup otherwise just
+* use the idle variable to store the initial muxer value.
+*/
+   if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE)
muxc->deselect = i2c_mux_gpio_deselect;
-   } else {
-   initial_state = mux->data.values[0];
-   }
+   else
+   mux->data.idle = mux->data.values[0];
 
for (i = 0; i < mux->data.n_gpios; i++) {
struct device *gpio_dev;
@@ -224,7 +225,7 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
}
 
ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
-   initial_state & (1 << i));
+   mux->data.idle & (1 << i));
if (ret) {
dev_err(&pdev->dev,
"Failed to set direction of GPIO %d to output\n",
-- 
2.21.0
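After PATCH 3/5, mux->data.idle holds either the explicit idle channel (in which case the deselect callback is installed) or values[0], and bit i of it sets the initial level of GPIO i. The following is a minimal sketch of both steps. The function names are hypothetical, and SKETCH_NO_IDLE is a stand-in for I2C_MUX_GPIO_NO_IDLE, assumed here to be ((unsigned)-1).

```c
#include <stdbool.h>

/* Stand-in for I2C_MUX_GPIO_NO_IDLE (assumed to be ((unsigned)-1)). */
#define SKETCH_NO_IDLE ((unsigned)-1)

/* Keep an explicit idle channel (and install the deselect callback),
 * otherwise reuse the idle field for values[0]. */
static unsigned sketch_pick_idle(unsigned idle, const unsigned *values,
				 bool *has_deselect)
{
	*has_deselect = (idle != SKETCH_NO_IDLE);
	return *has_deselect ? idle : values[0];
}

/* Bit i of the stored value drives GPIO i, as in
 * gpio_direction_output(..., mux->data.idle & (1 << i)). */
static void sketch_initial_levels(unsigned idle, unsigned n_gpios, int *levels)
{
	for (unsigned i = 0; i < n_gpios; i++)
		levels[i] = (idle & (1u << i)) != 0;
}
```

For channel values {2, 5} and no idle state, the mux starts on value 2, so GPIO 0 is driven low and GPIO 1 high.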

[PATCH 4/5] i2c-mux-gpio: Unpin the platform-specific GPIOs request code

2019-04-24 Thread Serge Semin

The GPIOs request loop can be safely moved to a separate function.
First of all, this improves code readability. Secondly, at this point
the initialization loop is used for both the of- and
platform_data-based initialization paths, but it will be changed in
the next patch, so isolating the code now will simplify that work.

Signed-off-by: Serge Semin 
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 105 +++
 1 file changed, 64 insertions(+), 41 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 535c83c43371..317c019e1415 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -165,12 +165,68 @@ static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
return 0;
 }
 
+static int i2c_mux_gpio_request_plat(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_core *muxc = platform_get_drvdata(pdev);
+   struct gpio_desc *gpio_desc;
+   struct i2c_adapter *root;
+   struct device *gpio_dev;
+   int i, ret;
+
+   root = i2c_root_adapter(&muxc->parent->dev);
+
+   for (i = 0; i < mux->data.n_gpios; i++) {
+   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
+  "i2c-mux-gpio");
+   if (ret) {
+   dev_err(&pdev->dev, "Failed to request GPIO %d\n",
+   mux->data.gpios[i]);
+   goto err_request_gpio;
+   }
+
+   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
+   mux->data.idle & (1 << i));
+   if (ret) {
+   dev_err(&pdev->dev,
+   "Failed to set direction of GPIO %d to output\n",
+   mux->data.gpios[i]);
+   i++;/* gpio_request above succeeded, so must free */
+   goto err_request_gpio;
+   }
+
+   gpio_desc = gpio_to_desc(mux->gpio_base + mux->data.gpios[i]);
+   mux->gpios[i] = gpio_desc;
+
+   if (!muxc->mux_locked)
+   continue;
+
+   gpio_dev = &gpio_desc->gdev->dev;
+   muxc->mux_locked = i2c_root_adapter(gpio_dev) == root;
+   }
+
+   return 0;
+
+err_request_gpio:
+   for (; i > 0; i--)
+   gpio_free(mux->gpio_base + mux->data.gpios[i - 1]);
+
+   return ret;
+}
+
+static void i2c_mux_gpio_free(struct gpiomux *mux)
+{
+   int i;
+
+   for (i = 0; i < mux->data.n_gpios; i++)
+   gpiod_free(mux->gpios[i]);
+}
+
 static int i2c_mux_gpio_probe(struct platform_device *pdev)
 {
struct i2c_mux_core *muxc;
struct gpiomux *mux;
struct i2c_adapter *parent;
-   struct i2c_adapter *root;
int i, ret;
 
mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
@@ -199,8 +255,6 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, muxc);
 
-   root = i2c_root_adapter(&parent->dev);
-
muxc->mux_locked = true;
 
/*
@@ -212,37 +266,9 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
else
mux->data.idle = mux->data.values[0];
 
-   for (i = 0; i < mux->data.n_gpios; i++) {
-   struct device *gpio_dev;
-   struct gpio_desc *gpio_desc;
-
-   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
-  "i2c-mux-gpio");
-   if (ret) {
-   dev_err(&pdev->dev, "Failed to request GPIO %d\n",
-   mux->data.gpios[i]);
-   goto err_request_gpio;
-   }
-
-   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
-   mux->data.idle & (1 << i));
-   if (ret) {
-   dev_err(&pdev->dev,
-   "Failed to set direction of GPIO %d to output\n",
-   mux->data.gpios[i]);
-   i++;/* gpio_request above succeeded, so must free */
-   goto err_request_gpio;
-   }
-
-   gpio_desc = gpio_to_desc(mux->gpio_base + mux->data.gpios[i]);
-   mux->gpios[i] = gpio_desc;
-
-   if (!muxc->mux_locked)
-   continue;
-
-   gpio_dev = &gpio_desc->gdev->dev;
-   muxc->mux_locked = i2c_root_adapter(gpio_dev) == root;
-   }
+   r

[PATCH 5/5] i2c-mux-gpio: Create of-based GPIOs request method

2019-04-24 Thread Serge Semin

Most modern platforms provide a dts with a description of the devices
available in the system, which may also include i2c-gpio-muxes.
Up until now the i2c-mux-gpio driver supported its dts nodes, but
performed the GPIOs request by means of the legacy GPIO API, which by
design, being legacy, doesn't know anything about of/dtb/fdt/dts stuff.
This means that even though the i2c-gpio-mux dts nodes are successfully
mapped to the kernel i2c-mux devices, the GPIOs used for initialization
are requested without the OF_GPIO_* flags set up. This causes problems
on platforms which fully rely on dts and have, for instance,
i2c-gpio-muxes with active-low or open-drain GPIOs connected.

This is fixed by implementing a dedicated method for fully dts-based
GPIO requests. It is mostly similar to the platform one, but utilizes
the gpiod_get_from_of_node() method to request the GPIOs.

Devices declared by platform code are still supported: the fallback to
the dtb is performed only if the platform GPIOs array isn't detected.

Signed-off-by: Serge Semin 
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 65 
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 317c019e1415..e5e10ba35ad9 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -65,8 +65,8 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
struct device_node *np = pdev->dev.of_node;
struct device_node *adapter_np, *child;
struct i2c_adapter *adapter;
-   unsigned *values, *gpios;
-   int i = 0, ret;
+   unsigned int *values;
+   int i = 0;
 
if (!np)
return -ENODEV;
@@ -109,24 +109,48 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
return -EINVAL;
}
 
-   gpios = devm_kcalloc(&pdev->dev,
-mux->data.n_gpios, sizeof(*mux->data.gpios),
-GFP_KERNEL);
-   if (!gpios) {
-   dev_err(&pdev->dev, "Cannot allocate gpios array");
-   return -ENOMEM;
-   }
+   return 0;
+}
+
+static int i2c_mux_gpio_request_dt(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_core *muxc = platform_get_drvdata(pdev);
+   struct device_node *np = pdev->dev.of_node;
+   struct i2c_adapter *root;
+   struct device *gpio_dev;
+   enum gpiod_flags dflags;
+   int i, ret;
+
+   root = i2c_root_adapter(&muxc->parent->dev);
 
for (i = 0; i < mux->data.n_gpios; i++) {
-   ret = of_get_named_gpio(np, "mux-gpios", i);
-   if (ret < 0)
-   return ret;
-   gpios[i] = ret;
-   }
+   if (mux->data.idle & (1 << i))
+   dflags = GPIOD_OUT_HIGH;
+   else
+   dflags = GPIOD_OUT_LOW;
+
+   mux->gpios[i] = gpiod_get_from_of_node(np, "mux-gpios", i,
+  dflags, "i2c-mux-gpio");
+   if (IS_ERR(mux->gpios[i])) {
+   ret = PTR_ERR(mux->gpios[i]);
+   goto err_request_gpio;
+   }
 
-   mux->data.gpios = gpios;
+   if (!muxc->mux_locked)
+   continue;
+
+   gpio_dev = &mux->gpios[i]->gdev->dev;
+   muxc->mux_locked = i2c_root_adapter(gpio_dev) == root;
+   }
 
return 0;
+
+err_request_gpio:
+   for (i--; i >= 0; i--)
+   gpiod_free(mux->gpios[i]);
+
+   return ret;
 }
 #else
 static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
@@ -134,6 +158,12 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
 {
return -EINVAL;
 }
+
+static int i2c_mux_gpio_request_dt(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   return -EINVAL;
+}
 #endif
 
 static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
@@ -174,6 +204,9 @@ static int i2c_mux_gpio_request_plat(struct gpiomux *mux,
struct device *gpio_dev;
int i, ret;
 
+   if (!mux->data.gpios)
+   return -EINVAL;
+
root = i2c_root_adapter(&muxc->parent->dev);
 
for (i = 0; i < mux->data.n_gpios; i++) {
@@ -267,6 +300,8 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
mux->data.idle = mux->data.values[0];
 
ret = i2c_mux_gpio_request_plat(mux, pdev);
+   if (ret)
+   ret = i2c_mux_gpio_request_dt(mux, pdev);
if (ret)
goto alloc_failed;
 
-- 
2.21.0
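Both request helpers unwind partially acquired GPIOs on failure, but with different loop shapes: the platform variant counts i down to 1 and frees index i - 1, while the dt variant first steps back past the failed index and then frees down to 0. The userspace model below is only a sketch: struct pins, fake_request() and fake_free() are hypothetical stand-ins for the GPIO calls, and it covers a failing request only (the platform variant's extra i++ on a direction error handles a line that was already acquired). It checks that both shapes release exactly the lines that were acquired.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PINS 8

/* Hypothetical bookkeeping standing in for real GPIO ownership. */
struct pins {
	bool held[MAX_PINS];
	int fail_at;		/* index whose request fails, or -1 for none */
};

/* Stand-in for gpio_request()/gpiod_get_*(): nothing is held on failure. */
static int fake_request(struct pins *p, int i)
{
	if (i == p->fail_at)
		return -1;
	p->held[i] = true;
	return 0;
}

/* Stand-in for gpio_free()/gpiod_free(): must only see held lines. */
static void fake_free(struct pins *p, int i)
{
	assert(p->held[i]);
	p->held[i] = false;
}

/* Unwind shape used by i2c_mux_gpio_request_plat(). */
static int request_all_plat_style(struct pins *p, int n)
{
	int i, ret = 0;

	assert(n <= MAX_PINS);
	for (i = 0; i < n; i++) {
		ret = fake_request(p, i);
		if (ret)
			goto err_request_gpio;
	}
	return 0;

err_request_gpio:
	for (; i > 0; i--)	/* frees indices i - 1 down to 0 */
		fake_free(p, i - 1);
	return ret;
}

/* Unwind shape used by i2c_mux_gpio_request_dt(). */
static int request_all_dt_style(struct pins *p, int n)
{
	int i, ret = 0;

	assert(n <= MAX_PINS);
	for (i = 0; i < n; i++) {
		ret = fake_request(p, i);
		if (ret)
			goto err_request_gpio;
	}
	return 0;

err_request_gpio:
	for (i--; i >= 0; i--)	/* skip the failed index, then free down to 0 */
		fake_free(p, i);
	return ret;
}

/* Number of lines still held; expected to be 0 after a failed request. */
static int held_count(const struct pins *p)
{
	int i, count = 0;

	for (i = 0; i < MAX_PINS; i++)
		count += p->held[i];
	return count;
}
```

Both shapes are equivalent; the dt variant simply starts from the failed index instead of one past it.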

[PATCH 2/5] i2c-mux-gpio: Return an error if no config data found

2019-04-24 Thread Serge Semin

It's pointless and might even be error-prone to proceed with further
initialization if neither of- nor platform-based settings were discovered.
Just return an error in this case.

Signed-off-by: Serge Semin 
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 24cf6ec02e75..a14fe132b0c3 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -132,7 +132,7 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
 static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
struct platform_device *pdev)
 {
-   return 0;
+   return -EINVAL;
 }
 #endif
 
@@ -142,6 +142,9 @@ static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
struct i2c_mux_gpio_platform_data *data = dev_get_platdata(&pdev->dev);
struct gpio_chip *gpio;
 
+   if (!data)
+   return -EINVAL;
+
/*
 * If a GPIO chip name is provided, the GPIO pin numbers provided are
 * relative to its base GPIO number. Otherwise they are absolute.
@@ -175,11 +178,10 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
if (!mux)
return -ENOMEM;
 
-   if (!dev_get_platdata(&pdev->dev))
+   ret = i2c_mux_gpio_probe_plat(mux, pdev);
+   if (ret)
ret = i2c_mux_gpio_probe_dt(mux, pdev);
-   else
-   ret = i2c_mux_gpio_probe_plat(mux, pdev);
-   if (ret < 0)
+   if (ret)
return ret;
 
parent = i2c_get_adapter(mux->data.parent);
-- 
2.21.0

Re: [PATCH 08/12] mips: Dump memblock regions for debugging

2019-04-24 Thread Serge Semin

On Wed, Apr 24, 2019 at 04:45:47PM +0300, Mike Rapoport wrote:
> On Wed, Apr 24, 2019 at 01:47:44AM +0300, Serge Semin wrote:
> > It is useful to have the whole memblock memory space printed to console
> > when basic memblock initializations are done. It can be performed by
> > the ready-to-use method memblock_dump_all(), which prints the available
> > and reserved memory spaces if MEMBLOCK_DEBUG config is enabled.
> 
> Nit: there's no MEMBLOCK_DEBUG config option but rather memblock=debug
> command line parameter ;-)
> 

Right. Thanks. I'll reword the message in the next patchset revision.

-Sergey

> > Let's call it at the very end of the arch_mem_init() function, when
> > all memblock memory and reserved regions are defined, but before
> > any serious allocation is performed.
> > 
> > Signed-off-by: Serge Semin 
> > ---
> >  arch/mips/kernel/setup.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> > index 2a1b2e7a1bc9..ca493fdf69b0 100644
> > --- a/arch/mips/kernel/setup.c
> > +++ b/arch/mips/kernel/setup.c
> > @@ -824,6 +824,8 @@ static void __init arch_mem_init(char **cmdline_p)
> > /* Reserve for hibernation. */
> > memblock_reserve(__pa_symbol(&__nosave_begin),
> > __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin));
> > +
> > +   memblock_dump_all();
> >  }
> >  
> >  static void __init resource_init(void)
> > -- 
> > 2.21.0
> > 
> 
> -- 
> Sincerely yours,
> Mike.
>

Re: [PATCH 10/12] mips: Print the kernel virtual mem layout on debugging

2019-04-24 Thread Serge Semin

On Wed, Apr 24, 2019 at 04:47:11PM +0300, Mike Rapoport wrote:
> On Wed, Apr 24, 2019 at 01:47:46AM +0300, Serge Semin wrote:
> > It is useful, at least for debugging, to have the kernel virtual
> > memory layout printed at boot time so as to have full information
> > about the booted kernel. Make the printing optional and available
> > only when the DEBUG_KERNEL config is enabled so as not to leak
> > sensitive kernel information.
> > 
> > Signed-off-by: Serge Semin 
> > ---
> >  arch/mips/mm/init.c | 49 +
> >  1 file changed, 49 insertions(+)
> > 
> > diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> > index bbb196ad5f26..c338bbd03b2a 100644
> > --- a/arch/mips/mm/init.c
> > +++ b/arch/mips/mm/init.c
> > @@ -31,6 +31,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -56,6 +57,53 @@ unsigned long empty_zero_page, zero_page_mask;
> >  EXPORT_SYMBOL_GPL(empty_zero_page);
> >  EXPORT_SYMBOL(zero_page_mask);
> >  
> > +/*
> > + * Print out the kernel virtual memory layout
> > + */
> > +#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
> > +#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
> > +#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
> > +static void __init mem_print_kmap_info(void)
> > +{
> > +#ifdef CONFIG_DEBUG_KERNEL
> 
> Maybe CONFIG_DEBUG_VM?
> 

The last time I posted this patch, Matt suggested using CONFIG_DEBUG_KERNEL [1].
On the other hand, the arm platform prints this table unconditionally, but uses
the %lx format for low-memory ranges and %p for kernel segments. I'm even more
inclined towards the arm solution. But if choosing between DEBUG_KERNEL and
DEBUG_VM, I'd stick with DEBUG_KERNEL, since the DEBUG_VM help text states it is
intended for extra checks: "Enable this to turn on extended checks in the
virtual-memory system that may impact performance," not for printing the
memory layout.

-Sergey

[1] https://lkml.org/lkml/2018/2/13/494

> > +   pr_notice("Kernel virtual memory layout:\n"
> > + "lowmem  : 0x%px - 0x%px  (%4ld MB)\n"
> > + "  .text : 0x%px - 0x%px  (%4td kB)\n"
> > + "  .data : 0x%px - 0x%px  (%4td kB)\n"
> > + "  .init : 0x%px - 0x%px  (%4td kB)\n"
> > + "  .bss  : 0x%px - 0x%px  (%4td kB)\n"
> > + "vmalloc : 0x%px - 0x%px  (%4ld MB)\n"
> > +#ifdef CONFIG_HIGHMEM
> > + "pkmap   : 0x%px - 0x%px  (%4ld MB)\n"
> > +#endif
> > + "fixmap  : 0x%px - 0x%px  (%4ld kB)\n",
> > + MLM(PAGE_OFFSET, (unsigned long)high_memory),
> > + MLK_ROUNDUP(_text, _etext),
> > + MLK_ROUNDUP(_sdata, _edata),
> > + MLK_ROUNDUP(__init_begin, __init_end),
> > + MLK_ROUNDUP(__bss_start, __bss_stop),
> > + MLM(VMALLOC_START, VMALLOC_END),
> > +#ifdef CONFIG_HIGHMEM
> > + MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
> > +#endif
> > + MLK(FIXADDR_START, FIXADDR_TOP));
> > +
> > +   /* Check some fundamental inconsistencies. May add something else? */
> > +#ifdef CONFIG_HIGHMEM
> > +   BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
> > +   BUG_ON(VMALLOC_END < (unsigned long)high_memory);
> > +   BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
> > +   BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
> > +   (unsigned long)high_memory);
> > +#endif
> > +   BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
> > +   BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
> > +#endif /* CONFIG_DEBUG_KERNEL */
> > +}
> > +#undef MLK
> > +#undef MLM
> > +#undef MLK_ROUNDUP
> > +
> >  /*
> >   * Not static inline because used by IP27 special magic initialization code
> >   */
> > @@ -479,6 +527,7 @@ void __init mem_init(void)
> > setup_zero_pages(); /* Setup zeroed pages.  */
> > mem_init_free_highmem();
> > mem_init_print_info(NULL);
> > +   mem_print_kmap_info();
> >  
> >  #ifdef CONFIG_64BIT
> > if ((unsigned long) &_text > (unsigned long) CKSEG0)
> > -- 
> > 2.21.0
> > 
> 
> -- 
> Sincerely yours,
> Mike.
>
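In the quoted patch, MLK() and MLM() truncate the byte count with a shift, while MLK_ROUNDUP() rounds a partial kilobyte up, so a 1025-byte section prints as 1 kB with the former and 2 kB with the latter. A userspace sketch of the same arithmetic follows; SZ_1K and DIV_ROUND_UP are re-defined locally to mirror the kernel helpers, and uintptr_t stands in for the pointer differences the kernel macros compute.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the kernel helpers used by MLK_ROUNDUP(). */
#define SZ_1K 0x400
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Same arithmetic as MLK(): the byte count is truncated to whole KiB. */
static uintptr_t span_kb_trunc(uintptr_t begin, uintptr_t end)
{
	assert(end >= begin);
	return (end - begin) >> 10;
}

/* Same arithmetic as MLK_ROUNDUP(): a partial KiB counts as a full one. */
static uintptr_t span_kb_roundup(uintptr_t begin, uintptr_t end)
{
	assert(end >= begin);
	return DIV_ROUND_UP(end - begin, SZ_1K);
}
```

Rounding up seems the safer choice for small sections such as .init or .bss, where truncation could under-report the size.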

Re: [PATCH 0/5] i2c-mux-gpio: Split plat- and dt-specific code up

2019-04-25 Thread Serge Semin

On Wed, Apr 24, 2019 at 09:25:24PM +, Peter Rosin wrote:

Hello Peter,

> On 2019-04-24 14:34, Serge Semin wrote:
> > The main idea of this patchset was to add the dt-based GPIOs support
> > in i2c-mux-gpio driver. In particular we needed to have the full GPIOs
> > specifier being handled including the dt-flags like GPIO_ACTIVE_HIGH,
> > GPIO_ACTIVE_LOW, etc. Due to using a legacy GPIO interface the former
> > driver implementation didn't provide this ability.
> 
> I'm curious why active low/high is of any importance? That will only affect
> the state numbering, but I fail to see any relevance in that numbering. It's
> just numbers, no?
> 
> If all the pins are inverted (anything else seems very strange), just
> reverse the order. I.e. for a 4-way mux, use 3, 2, 1, 0 instead of
> 0, 1, 2, 3.
> 
> Why not?

I may have misunderstood you, but the active low/high flag has nothing to do
with pin ordering. It relates to an individual pin's setting, most likely
dictated by the hardware design.

Here is a simple example:
i2cmux {
	compatible = "i2c-mux-gpio";
	mux-gpios = <&gpioa 0 GPIO_ACTIVE_LOW
		     &control 2 GPIO_ACTIVE_HIGH
		     &last 5 GPIO_ACTIVE_LOW>;
};

In this setup we've got an i2c-mux with a GPIO-driven channel selector.
The first channel is selected by GPIO#0 of controller &gpioa, the second
one by GPIO#2 of controller &control and the third by GPIO#5 of controller
&last. In accordance with the i2c_mux_gpio_set() method of the i2c-mux-gpio
driver, a GPIO from this set will be driven high when the corresponding
mux channel is enabled. But as you can see from the "mux-gpios" property,
these GPIOs aren't identical. First of all, they belong to different
controllers and, most importantly, they've got different active attributes.
This attribute actually defines a straight or an inverted activation policy.
With the ACTIVE_HIGH flag you get a straight policy: if you set the GPIO
value, the hardware pin will be driven high, and if you clear the GPIO
value, the hardware pin will be pulled to ground. If the ACTIVE_LOW flag
is specified, the GPIO and actual hardware pin values are inverted: if you
set the GPIO to one, the hardware pin will be driven to zero, and
vice versa. All this logic is implemented in the gpiod subsystem of the
kernel and can be defined in dts files, while the legacy GPIO subsystem
relied on the drivers to handle it.

Yeah, it might be confusing, but some hardware is designed this way, so
the ordinary GPIO outputs are inverted on their way to the i2c-mux channel
activators. For instance, a level shifter may be used as a single-channel
i2c-mux when we don't want the i2c bus to be permanently connected to the
bus behind it. Such level shifters are usually enabled by active-low signals.

In addition, there are flags other than ACTIVE_LOW/ACTIVE_HIGH available for
GPIOs in dts, like GPIO_PUSH_PULL, GPIO_OPEN_DRAIN and GPIO_OPEN_SOURCE, which
are specific not only to the GPIO functionality but also to the target port
and the hardware design in general. So support for full dt GPIO specifiers is
important to properly describe the hardware setup.
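To make the ACTIVE_LOW semantics concrete, here is a small userspace sketch (not kernel code) of the inversion gpiolib applies between the logical value a consumer sets and the level on the pin. The helper names physical_level() and physical_word() are hypothetical; a mask of 0x5 corresponds to the dts example above, where the first and third lines are active-low.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Physical level of line i when the mux driver requests the logical value
 * (state >> i) & 1. For a GPIO_ACTIVE_LOW line the value is inverted before
 * it reaches the pin; for GPIO_ACTIVE_HIGH it passes straight through.
 */
static bool physical_level(unsigned int state, unsigned int active_low_mask,
			   unsigned int i)
{
	assert(i < 32);
	bool logical = (state >> i) & 1u;
	bool inverted = (active_low_mask >> i) & 1u;

	return logical != inverted;
}

/* Collect the physical levels of the first n lines into one bitmask. */
static unsigned int physical_word(unsigned int state,
				  unsigned int active_low_mask, unsigned int n)
{
	unsigned int i, word = 0;

	assert(n <= 32);
	for (i = 0; i < n; i++)
		if (physical_level(state, active_low_mask, i))
			word |= 1u << i;
	return word;
}
```

With mask 0x5, logical state 0 drives lines 0 and 2 high on the wire, and logical state 7 leaves only line 1 high.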

-Sergey

> 
> Cheers,
> Peter
> 
> > While adding the full dt-GPIO flags support, a small set of
> > refactorings has been done in order to keep the platform_data-based
> > systems supported and to make the code and the alterations more readable.
> > In general, the changes can be considered as splitting up the plat- and
> > dt-specific code. In the first patch we unpinned the platform-specific
> > method of GPIO-chip probing. The second patch makes the driver return
> > an error if the of-based path (the last resort) fails to retrieve the
> > driver private data. The next three patches are the sequence of initial
> > channel info retrieval, platform_data-specific code isolation and finally
> > the introduction of the full dt-based GPIOs request method. The last patch
> > does what we intended this patchset for in the first place: it adds full
> > dt-GPIO specifier support.
> > 
> > 
> > Serge Semin (5):
> >   i2c-mux-gpio: Unpin a platform-based device initialization
> >   i2c-mux-gpio: Return an error if no config data found
> >   i2c-mux-gpio: Save initial channel number to the idle data field
> >   i2c-mux-gpio: Unpin the platform-specific GPIOs request code
> >   i2c-mux-gpio: Create of-based GPIOs request method
> > 
> >  drivers/i2c/muxes/i2c-mux-gpio.c | 224 ---
> >  1 file changed, 146 insertions(+), 78 deletions(-)
> > 
>

Re: [PATCH 2/5] i2c-mux-gpio: Return an error if no config data found

2019-04-25 Thread Serge Semin

On Wed, Apr 24, 2019 at 09:25:50PM +, Peter Rosin wrote:
> On 2019-04-24 14:34, Serge Semin wrote:
> > It's pointless and might even be error-prone to proceed with further
> > initialization if neither of- nor platform-based settings were discovered.
> > Just return an error in this case.
> > 
> > Signed-off-by: Serge Semin 
> > ---
> >  drivers/i2c/muxes/i2c-mux-gpio.c | 12 +++-
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
> > index 24cf6ec02e75..a14fe132b0c3 100644
> > --- a/drivers/i2c/muxes/i2c-mux-gpio.c
> > +++ b/drivers/i2c/muxes/i2c-mux-gpio.c
> > @@ -132,7 +132,7 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
> >  static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
> > struct platform_device *pdev)
> >  {
> > -   return 0;
> > +   return -EINVAL;
> >  }
> >  #endif
> >  
> > @@ -142,6 +142,9 @@ static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
> > struct i2c_mux_gpio_platform_data *data = dev_get_platdata(&pdev->dev);
> > struct gpio_chip *gpio;
> >  
> > +   if (!data)
> > +   return -EINVAL;
> > +
> > /*
> >  * If a GPIO chip name is provided, the GPIO pin numbers provided are
> >  * relative to its base GPIO number. Otherwise they are absolute.
> > @@ -175,11 +178,10 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
> > if (!mux)
> > return -ENOMEM;
> >  
> > -   if (!dev_get_platdata(&pdev->dev))
> > +   ret = i2c_mux_gpio_probe_plat(mux, pdev);
> > +   if (ret)
> > ret = i2c_mux_gpio_probe_dt(mux, pdev);
> > -   else
> > -   ret = i2c_mux_gpio_probe_plat(mux, pdev);
> > -   if (ret < 0)
> > +   if (ret)
> > return ret;
> 
> I notice that after this patch, all probe failures from non-dt configs
> will return -EINVAL from the dummy i2c_mux_gpio_probe_dt that gets
> called on i2c_mux_gpio_probe_plat failure.
> 
> So, any -EPROBE_DEFER is now lost. That probably doesn't fly.
> 

So what do you suggest then? We can return to something like:

	if (dev_get_platdata(&pdev->dev))
		ret = i2c_mux_gpio_probe_plat(mux, pdev);
	else
		ret = i2c_mux_gpio_probe_dt(mux, pdev);

In this case there is no fallback to dt, just either plat- or of-based
initialization. The same can be done for the i2c_mux_gpio_request_*() methods.
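The -EPROBE_DEFER problem can be reproduced with a userspace model of the two dispatch strategies. Everything here is hypothetical: struct fake_dev and the probe_* helpers only mimic the control flow of i2c_mux_gpio_probe(), and EPROBE_DEFER is defined locally with the kernel's value (517) because it is not part of the userspace <errno.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace */
#endif

/* Hypothetical device description driving the model. */
struct fake_dev {
	bool has_platdata;
	int plat_ret;	/* result of the platform path when data is present */
	int dt_ret;	/* result of the DT path (-EINVAL for the !CONFIG_OF stub) */
};

static int probe_plat(const struct fake_dev *d)
{
	return d->has_platdata ? d->plat_ret : -EINVAL;
}

static int probe_dt(const struct fake_dev *d)
{
	return d->dt_ret;
}

/* v1 shape: any platform error falls through to the DT path. */
static int probe_with_fallback(const struct fake_dev *d)
{
	int ret = probe_plat(d);

	if (ret)
		ret = probe_dt(d);
	return ret;
}

/* Either/or shape: platform data, when present, decides the result. */
static int probe_either_or(const struct fake_dev *d)
{
	if (d->has_platdata)
		return probe_plat(d);
	return probe_dt(d);
}
```

With platform data whose GPIO chip is not registered yet, the fallback version reports -EINVAL from the DT stub, while the either/or version keeps -EPROBE_DEFER so the driver core can retry the probe later.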

-Sergey

> Cheers,
> Peter
> 
> >  
> > parent = i2c_get_adapter(mux->data.parent);
> > 
>

Re: [PATCH 3/5] i2c-mux-gpio: Save initial channel number to the idle data field

2019-04-25 Thread Serge Semin

On Wed, Apr 24, 2019 at 09:26:22PM +, Peter Rosin wrote:
> On 2019-04-24 14:34, Serge Semin wrote:
> > If the idle state has been specified in the data structure, the idle
> > variable is left untouched as before, so as to keep a default channel
> > number enabled in the mux idle state. But if a platform doesn't
> > specify which channel is going to be enabled by default, we still
> > don't set up the deselect callback, but the initial state is saved in
> > the idle variable for further initialization. We can safely do this
> > here since that variable is used for initial state setting only, when
> > no idle channel is specified.
> 
> While this subtlety is *maybe* ok, a comment about it belongs in the
> *code* where it will be seen when the next person makes changes.
> 
> But why not extend the struct with the initial state? How many of
> these muxes do you expect to exist in a system? Multiplied by a
> couple of bytes. Who cares?
> 

I actually thought about this when I started working on the patchset.
At that time, saving the initial value in the idle variable seemed like a
good idea. I even put a small comment in the code about it.

Anyway, let's add a new field to the "struct gpiomux" structure and use
it as a container for the initial value. It is also a good alternative.
I'll do this in the v2 patchset.
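A minimal sketch of the dedicated-field approach, assuming simplified, hypothetical structures (the real struct gpiomux and its platform data contain more members, and the field name initial_state is taken from the plan above, not from merged code). The point is that data.idle keeps its original meaning and the initial GPIO value lives in its own field.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's I2C_MUX_GPIO_NO_IDLE, defined as ((unsigned)-1). */
#define I2C_MUX_GPIO_NO_IDLE ((unsigned int)-1)

/* Hypothetical, trimmed-down stand-ins for the driver's structures. */
struct mux_data_sketch {
	const unsigned int *values;
	int n_values;
	unsigned int idle;
};

struct gpiomux_sketch {
	struct mux_data_sketch data;
	unsigned int initial_state;	/* the proposed dedicated field */
	bool has_deselect;		/* stands in for muxc->deselect != NULL */
};

/* Pick the value driven onto the GPIOs at probe time; data.idle is untouched. */
static void setup_initial_state(struct gpiomux_sketch *mux)
{
	assert(mux->data.n_values > 0);
	if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
		mux->initial_state = mux->data.idle;
		mux->has_deselect = true;
	} else {
		mux->initial_state = mux->data.values[0];
		mux->has_deselect = false;
	}
}
```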

-Sergey

> Cheers,
> Peter
> 
> > 
> > The reason for this change is to prepare the code for the future GPIOs
> > request path being split up into of- and plat-based methods. The idle
> > variable here is used as a container for the initial state for both
> > paths in case the idle channel isn't specified.
> > 
> > Signed-off-by: Serge Semin 
> > ---
> >  drivers/i2c/muxes/i2c-mux-gpio.c | 15 ---
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
> > index a14fe132b0c3..535c83c43371 100644
> > --- a/drivers/i2c/muxes/i2c-mux-gpio.c
> > +++ b/drivers/i2c/muxes/i2c-mux-gpio.c
> > @@ -171,7 +171,6 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
> > struct gpiomux *mux;
> > struct i2c_adapter *parent;
> > struct i2c_adapter *root;
> > -   unsigned initial_state;
> > int i, ret;
> >  
> > mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
> > @@ -204,12 +203,14 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
> >  
> > muxc->mux_locked = true;
> >  
> > -   if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
> > -   initial_state = mux->data.idle;
> > +   /*
> > +* Set deselect callback if idle state has been set up, otherwise just
> > +* use the idle variable to store the initial muxer value.
> > +*/
> > +   if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE)
> > muxc->deselect = i2c_mux_gpio_deselect;
> > -   } else {
> > -   initial_state = mux->data.values[0];
> > -   }
> > +   else
> > +   mux->data.idle = mux->data.values[0];
> >  
> > for (i = 0; i < mux->data.n_gpios; i++) {
> > struct device *gpio_dev;
> > @@ -224,7 +225,7 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
> > }
> >  
> > ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
> > -   initial_state & (1 << i));
> > +   mux->data.idle & (1 << i));
> > if (ret) {
> > dev_err(&pdev->dev,
> > "Failed to set direction of GPIO %d to output\n",
> > 
>

Re: [PATCH 0/5] i2c-mux-gpio: Split plat- and dt-specific code up

2019-04-25 Thread Serge Semin

On Thu, Apr 25, 2019 at 07:21:02PM +, Peter Rosin wrote:
> On 2019-04-25 16:37, Serge Semin wrote:
> > On Wed, Apr 24, 2019 at 09:25:24PM +, Peter Rosin wrote:
> > 
> > Hello Peter,
> > 
> >> On 2019-04-24 14:34, Serge Semin wrote:
> >>> The main idea of this patchset was to add the dt-based GPIOs support
> >>> in i2c-mux-gpio driver. In particular we needed to have the full GPIOs
> >>> specifier being handled including the dt-flags like GPIO_ACTIVE_HIGH,
> >>> GPIO_ACTIVE_LOW, etc. Due to using a legacy GPIO interface the former
> >>> driver implementation didn't provide this ability.
> >>
> >> I'm curious why active low/high is of any importance? That will only affect
> >> the state numbering, but I fail to see any relevance in that numbering. It's
> >> just numbers, no?
> >>
> >> If all the pins are inverted (anything else seems very strange), just
> >> reverse the order. I.e. for a 4-way mux, use 3, 2, 1, 0 instead of
> >> 0, 1, 2, 3.
> >>
> >> Why not?
> > 
> > I may misunderstood you, but active low/high flag has nothing to do with
> > pins ordering. It is relevant to an individual pin setting, most likely
> > related with hardware setup.
> 
> I was not talking about pin order. I was obviously referring to the
> state order.
> 
> > Here is a simple example:
> > i2cmux {
> > compatible = "i2c-mux-gpio";
> > mux-gpios = <&gpioa 0 GPIO_ACTIVE_LOW
> >  &control 2 GPIO_ACTIVE_HIGH
> >  &last 5 GPIO_ACTIVE_LOW>;
> > };
> 
> So, with this, instead of having two pins active-low and using state 3,
> you could use state 6 with all pins active-high. Same thing. I.e., use
> "state ^ 5" instead of the "direct" state (whatever that is, the state
> numbers have no real meaning, they are just numbers).
> 
> > In this setup we've got an i2c-mux with a GPIO-driven channel selector.
> > The first channel is selected by GPIO#0 of controller &gpioa, the second
> > one by GPIO#2 of controller &control and the third by GPIO#5 of controller
> > &last. In accordance with the i2c_mux_gpio_set() method of the i2c-mux-gpio
> > driver, a GPIO from this set will be driven high when the corresponding
> > mux channel is enabled. But as you can see from the "mux-gpios" property,
> > these GPIOs aren't identical. First of all, they belong to different
> > controllers and, most importantly, they've got different active attributes.
> > This attribute actually defines a straight or an inverted activation policy.
> > With the ACTIVE_HIGH flag you get a straight policy: if you set the GPIO
> > value, the hardware pin will be driven high, and if you clear the GPIO
> > value, the hardware pin will be pulled to ground. If the ACTIVE_LOW flag
> > is specified, the GPIO and actual hardware pin values are inverted: if you
> > set the GPIO to one, the hardware pin will be driven to zero, and
> > vice versa. All this logic is implemented in the gpiod subsystem of the
> > kernel and can be defined in dts files, while the legacy GPIO subsystem
> > relied on the drivers to handle it.
> > 
> > Yeah, it might be confusing, but some hardware is designed this way, so
> > the ordinary GPIO outputs are inverted on their way to the i2c-mux channel
> > activators. For instance, a level shifter may be used as a single-channel
> > i2c-mux when we don't want the i2c bus to be permanently connected to the
> > bus behind it. Such level shifters are usually enabled by active-low signals.
> 
> See above, you could just adjust the state numbers instead.
> 

Ahh, now I see what you meant. Sorry for explaining the obvious.)

It is definitely a solution when active-low pins are used for channel
selection, but it seems more like a hack than the best choice. The main
problem is that the hardware programmer needs to take the active-{low,high}
flags into account when assigning the reg values of the subnodes. If these
flags are instead supported by the GPIO subsystem itself, the reg
properties can be the same as if all GPIOs were active-high. To me this
is a good simplification, which also makes the i2c-mux-gpio nodes more
readable.
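Peter's alternative and the objection to it can be compared with a short sketch: describing every line as active-high only works if each subnode's reg value is XOR-ed with the mask of active-low lines (0x5 in the example, so state 3 becomes 6). The helper below is hypothetical and only illustrates that arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rewrite mux state values so that a board whose lines are all described
 * as active-high drives the same pin levels as the original description
 * with the given active-low lines: each inverted line flips one bit.
 */
static void remap_states_for_active_high(unsigned int *states, size_t n,
					 unsigned int active_low_mask)
{
	size_t i;

	assert(states != NULL || n == 0);
	for (i = 0; i < n; i++)
		states[i] ^= active_low_mask;
}
```

Applying the mask twice restores the original values, which is why the two descriptions are equivalent; the cost is that every reg value in the dts has to be written in the remapped form rather than as the plain channel selector.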

> > In addition, there are flags other than ACTIVE_LOW/ACTIVE_HIGH available for
> > GPIOs in dts, like GPIO_PUSH_PULL, GPIO_OPEN_DRAIN and GPIO_OPEN_SOURCE, which
> > are specific not only to the GPIO functionality but also to the target port
> > and the
> > hardware design in gene

Re: [PATCH 2/5] i2c-mux-gpio: Return an error if no config data found

2019-04-25 Thread Serge Semin

On Thu, Apr 25, 2019 at 07:28:52PM +, Peter Rosin wrote:
> On 2019-04-25 17:47, Serge Semin wrote:
> > On Wed, Apr 24, 2019 at 09:25:50PM +, Peter Rosin wrote:
> >> On 2019-04-24 14:34, Serge Semin wrote:
> >>> It's pointless and might even be error-prone to proceed with further
> >>> initialization if neither of- nor platform-based settings were discovered.
> >>> Just return an error in this case.
> >>>
> >>> Signed-off-by: Serge Semin 
> >>> ---
> >>>  drivers/i2c/muxes/i2c-mux-gpio.c | 12 +++-
> >>>  1 file changed, 7 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
> >>> index 24cf6ec02e75..a14fe132b0c3 100644
> >>> --- a/drivers/i2c/muxes/i2c-mux-gpio.c
> >>> +++ b/drivers/i2c/muxes/i2c-mux-gpio.c
> >>> @@ -132,7 +132,7 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
> >>>  static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
> >>>   struct platform_device *pdev)
> >>>  {
> >>> - return 0;
> >>> + return -EINVAL;
> >>>  }
> >>>  #endif
> >>>  
> >>> @@ -142,6 +142,9 @@ static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
> >>>   struct i2c_mux_gpio_platform_data *data = dev_get_platdata(&pdev->dev);
> >>>   struct gpio_chip *gpio;
> >>>  
> >>> + if (!data)
> >>> + return -EINVAL;
> >>> +
> >>>   /*
> >>>* If a GPIO chip name is provided, the GPIO pin numbers provided are
> >>>* relative to its base GPIO number. Otherwise they are absolute.
> >>> @@ -175,11 +178,10 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
> >>>   if (!mux)
> >>>   return -ENOMEM;
> >>>  
> >>> - if (!dev_get_platdata(&pdev->dev))
> >>> + ret = i2c_mux_gpio_probe_plat(mux, pdev);
> >>> + if (ret)
> >>>   ret = i2c_mux_gpio_probe_dt(mux, pdev);
> >>> - else
> >>> - ret = i2c_mux_gpio_probe_plat(mux, pdev);
> >>> - if (ret < 0)
> >>> + if (ret)
> >>>   return ret;
> >>
> >> I notice that after this patch, all probe failures from non-dt configs
> >> will return -EINVAL from the dummy i2c_mux_gpio_probe_dt that gets
> >> called on i2c_mux_gpio_probe_plat failure.
> >>
> >> So, any -EPROBE_DEFER is now lost. That probably doesn't fly.
> >>
> > 
> > So what do you suggest then?
> 
> I don't know, I'm just pointing out that you are breaking probe-defer.
> 
> >  We can return to something like:
> > 	if (dev_get_platdata(&pdev->dev))
> > 		ret = i2c_mux_gpio_probe_plat(mux, pdev);
> > 	else
> > 		ret = i2c_mux_gpio_probe_dt(mux, pdev);
> > 
> > In this case there is no falling back to dt. Just either plat- or of-based
> > initialization. The same can be done for i2c_mux_gpio_request_*() methods.
> 
> Works for me, I fail to see why it is interesting with a fallback
> anyway? If you supply platform data, that is supposed to take
> precedence. No?
> 
> If the platform data fails, I'd rather not have the code run into the
> weeds attempting stuff that's not even supposed to work...
> 

Yeah, you are right. Adding the fallback pattern here was a bad idea.
I'll fix it in the next patchset version.

-Sergey

> Cheers,
> Peter
> 
> > -Sergey
> > 
> >> Cheers,
> >> Peter
> >>
> >>>  
> >>>   parent = i2c_get_adapter(mux->data.parent);
> >>>
> >>
>
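The review point above can be reduced to a small user-space model. This is an illustration under assumptions, not driver code: struct probe_env and the probe_* helpers are hypothetical stand-ins for i2c_mux_gpio_probe_plat() and the dummy i2c_mux_gpio_probe_dt(), and EPROBE_DEFER is defined locally with the kernel-internal value 517 because libc does not export it. The sketch contrasts the fallback from the reviewed patch, which overwrites a platform-path -EPROBE_DEFER with the DT stub's -EINVAL, with the plain either/or selection agreed on in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value (include/linux/errno.h); not in libc. */
#define EPROBE_DEFER 517

/* Hypothetical inputs describing one probe attempt. */
struct probe_env {
	bool has_platdata; /* platform data attached to the device */
	int plat_ret;      /* result of the platform-data path */
	int dt_ret;        /* result of the DT path (-EINVAL without CONFIG_OF) */
};

/* Platform path as in the reviewed patch: -EINVAL if no platform data. */
static int probe_plat(const struct probe_env *env)
{
	return env->has_platdata ? env->plat_ret : -EINVAL;
}

/* Fallback pattern: any platform-path error triggers the DT path. */
static int probe_with_fallback(const struct probe_env *env)
{
	int ret = probe_plat(env);

	if (ret)
		ret = env->dt_ret; /* the platform error code is lost here */
	return ret;
}

/* Either/or selection: platform data takes precedence, no fallback. */
static int probe_either_or(const struct probe_env *env)
{
	return env->has_platdata ? env->plat_ret : env->dt_ret;
}
```

With platform data present, the either/or form returns the platform path's result unchanged, so a GPIO chip that is not registered yet still defers the probe.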

[PATCH v2 0/3] i2c-mux-gpio: Split plat- and dt-specific code up

2019-04-25 Thread Serge Semin

The main idea of this patchset was to add full dt GPIO specifier support
to the i2c-mux-gpio driver. In particular, we needed the full GPIO
specifier to be handled, including flags like GPIO_ACTIVE_HIGH,
GPIO_ACTIVE_LOW, GPIO_PUSH_PULL, GPIO_OPEN_DRAIN or GPIO_OPEN_SOURCE.
Because it used the legacy GPIO interface, the previous driver
implementation didn't provide this ability.

While adding full dt-GPIO flags support, a small set of refactorings has
been done in order to keep support for platform_data-based systems and to
make the code more readable and the alterations clearer. In general the
changes can be considered a split of the plat- and dt-specific code. The
first patch splits out the platform-specific method of GPIO-chip probing.
The second patch introduces a new initial_state field into the "gpiomux"
structure and isolates the GPIO request loop into a dedicated function; at
this stage it is a common function for both the dt- and plat- code paths.
Finally, the last patch introduces a full dt-based GPIOs request method,
which uses the gpiod_get_from_of_node() method to parse the corresponding
dt GPIO specifiers with their flags. This last patch does what we intended
this patchset for in the first place: it adds full dt-GPIO specifier
support.

Changelog v2
- Remove the fallback pattern when selecting the dt- or plat-based code paths.
  (This causes the removal of the patch "i2c-mux-gpio: Return an error if no onfig data found".)
- Use a dedicated initial_state variable to keep the initial channel selector
  state. (This causes the removal of the patch "i2c-mux-gpio: Save initial
  channel number to the idle".)
- Mention the open-drain and open-source flags in the patchset descriptions.


Serge Semin (3):
  i2c-mux-gpio: Unpin a platform-based device initialization
  i2c-mux-gpio: Unpin the platform-specific GPIOs request code
  i2c-mux-gpio: Create of-based GPIOs request method

 drivers/i2c/muxes/i2c-mux-gpio.c | 226 ---
 1 file changed, 146 insertions(+), 80 deletions(-)

-- 
2.21.0

[PATCH v2 2/3] i2c-mux-gpio: Unpin the platform-specific GPIOs request code

2019-04-25 Thread Serge Semin

The GPIO request loop can be safely moved to a separate function.
First, this improves code readability. Second, the initialization loop
is currently used for both the of- and platform_data-based
initialization paths, but it will be changed in the next patch, so
isolating the code now simplifies that future work.
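The cleanup rule in the moved loop (the i++ before the goto when gpio_direction_output() fails) is easy to get wrong, so here is a user-space sketch of the same request/unwind shape. The fake_gpio_* helpers and the fixed 64-entry pool are hypothetical stand-ins for the legacy gpio_request(), gpio_direction_output() and gpio_free() calls; only the control flow is meant to mirror i2c_mux_gpio_request_plat().

```c
#include <errno.h>
#include <stdbool.h>

#define NR_FAKE_GPIOS 64

/* Hypothetical model of a GPIO controller: true means "requested". */
static bool fake_requested[NR_FAKE_GPIOS];
static int fake_fail_direction = -1; /* GPIO whose direction setup fails */

static int fake_gpio_request(unsigned int gpio)
{
	if (gpio >= NR_FAKE_GPIOS || fake_requested[gpio])
		return -EBUSY;
	fake_requested[gpio] = true;
	return 0;
}

static int fake_gpio_direction_output(unsigned int gpio, int value)
{
	(void)value;
	return (int)gpio == fake_fail_direction ? -EIO : 0;
}

static void fake_gpio_free(unsigned int gpio)
{
	fake_requested[gpio] = false;
}

/*
 * Request every GPIO, drive it to its initial_state bit and, on failure,
 * release exactly the GPIOs that were successfully requested.
 */
static int request_all(unsigned int base, const unsigned int *gpios,
		       int n, unsigned int initial_state)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = fake_gpio_request(base + gpios[i]);
		if (ret)
			goto err_request_gpio; /* gpios[i] was not requested */

		ret = fake_gpio_direction_output(base + gpios[i],
						 initial_state & (1u << i));
		if (ret) {
			i++; /* gpios[i] was requested, include it in cleanup */
			goto err_request_gpio;
		}
	}
	return 0;

err_request_gpio:
	for (; i > 0; i--)
		fake_gpio_free(base + gpios[i - 1]);
	return ret;
}
```

The unwind loop frees indices i-1 down to 0, so bumping i after a direction failure is what makes the GPIO requested in that same iteration get released.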

Signed-off-by: Serge Semin 

---
Changelog v2
- Create a dedicated initial_state field in the "gpiomux" structure to
  keep the initial channel selector state.
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 113 +++
 1 file changed, 68 insertions(+), 45 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 54158b825acd..e10f72706b99 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -20,7 +20,8 @@
 
 struct gpiomux {
struct i2c_mux_gpio_platform_data data;
-   unsigned gpio_base;
+   unsigned int gpio_base;
+   unsigned int initial_state;
struct gpio_desc **gpios;
 };
 
@@ -162,13 +163,68 @@ static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
return 0;
 }
 
+static int i2c_mux_gpio_request_plat(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_core *muxc = platform_get_drvdata(pdev);
+   struct gpio_desc *gpio_desc;
+   struct i2c_adapter *root;
+   struct device *gpio_dev;
+   int i, ret;
+
+   root = i2c_root_adapter(&muxc->parent->dev);
+
+   for (i = 0; i < mux->data.n_gpios; i++) {
+   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
+  "i2c-mux-gpio");
+   if (ret) {
+   dev_err(&pdev->dev, "Failed to request GPIO %d\n",
+   mux->data.gpios[i]);
+   goto err_request_gpio;
+   }
+
+   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
+   mux->initial_state & (1 << i));
+   if (ret) {
+   dev_err(&pdev->dev,
+   "Failed to set direction of GPIO %d to output\n",
+   mux->data.gpios[i]);
+   i++;/* gpio_request above succeeded, so must free */
+   goto err_request_gpio;
+   }
+
+   gpio_desc = gpio_to_desc(mux->gpio_base + mux->data.gpios[i]);
+   mux->gpios[i] = gpio_desc;
+
+   if (!muxc->mux_locked)
+   continue;
+
+   gpio_dev = &gpio_desc->gdev->dev;
+   muxc->mux_locked = i2c_root_adapter(gpio_dev) == root;
+   }
+
+   return 0;
+
+err_request_gpio:
+   for (; i > 0; i--)
+   gpio_free(mux->gpio_base + mux->data.gpios[i - 1]);
+
+   return ret;
+}
+
+static void i2c_mux_gpio_free(struct gpiomux *mux)
+{
+   int i;
+
+   for (i = 0; i < mux->data.n_gpios; i++)
+   gpiod_free(mux->gpios[i]);
+}
+
 static int i2c_mux_gpio_probe(struct platform_device *pdev)
 {
struct i2c_mux_core *muxc;
struct gpiomux *mux;
struct i2c_adapter *parent;
-   struct i2c_adapter *root;
-   unsigned initial_state;
int i, ret;
 
mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
@@ -198,48 +254,18 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, muxc);
 
-   root = i2c_root_adapter(&parent->dev);
-
muxc->mux_locked = true;
 
if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
-   initial_state = mux->data.idle;
+   mux->initial_state = mux->data.idle;
muxc->deselect = i2c_mux_gpio_deselect;
} else {
-   initial_state = mux->data.values[0];
+   mux->initial_state = mux->data.values[0];
}
 
-   for (i = 0; i < mux->data.n_gpios; i++) {
-   struct device *gpio_dev;
-   struct gpio_desc *gpio_desc;
-
-   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
-  "i2c-mux-gpio");
-   if (ret) {
-   dev_err(&pdev->dev, "Failed to request GPIO %d\n",
-   mux->data.gpios[i]);
-   goto err_request_gpio;
-   }
-
-   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
-   initial_state & (1 << i));
-   if (ret) {
-   dev_err(&pdev->dev,
-   "Failed to set di

[PATCH v2 3/3] i2c-mux-gpio: Create of-based GPIOs request method

2019-04-25 Thread Serge Semin

Most modern platforms provide a dts describing the devices available in
the system. It may also include i2c-gpio-mux nodes. Until now the
i2c-mux-gpio driver supported its dts nodes, but performed the GPIO
requests by means of the legacy GPIO API, which by design knows nothing
about of/dtb/fdt/dts. This means that even though the i2c-gpio-mux dts
nodes are successfully mapped to kernel i2c-mux devices, the GPIOs used
for initialization are requested without the OF_GPIO_* flags applied.
This causes problems on platforms which fully rely on dts and contain,
for instance, i2c-gpio-muxes with active-low, push-pull, open-drain or
open-source GPIOs connected.

This is fixed by implementing a dedicated method for fully dts-based
GPIO requests. It is mostly similar to the platform one, but uses the
gpiod_get_from_of_node() method to request the GPIOs.
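To illustrate what the new request method changes, here is a small model of the per-GPIO decision in i2c_mux_gpio_request_dt(): bit i of initial_state selects GPIOD_OUT_HIGH or GPIOD_OUT_LOW, and with the descriptor API those are logical levels, so gpiolib inverts them on the pin for a line described as GPIO_ACTIVE_LOW. The legacy gpio_direction_output() drives raw values, which is why the DT flags were previously lost. The enum and helpers below are hypothetical stand-ins, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the gpiod output flags used in the patch. */
enum model_gpiod_flags { MODEL_OUT_LOW, MODEL_OUT_HIGH };

/* Channel-selector bit i decides the logical initial level of GPIO i. */
static enum model_gpiod_flags initial_flag(unsigned int initial_state, int i)
{
	return (initial_state & (1u << i)) ? MODEL_OUT_HIGH : MODEL_OUT_LOW;
}

/*
 * Descriptor-based requests work with logical values: for an active-low
 * line the level actually driven on the pin is the inverse.
 */
static int physical_level(enum model_gpiod_flags flag, bool active_low)
{
	int logical = (flag == MODEL_OUT_HIGH);

	return active_low ? !logical : logical;
}
```

For example, with initial_state = 0x2 the second mux GPIO is requested as logically high, which ends up as a low level on the pin if the dts marks that GPIO active-low.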

Signed-off-by: Serge Semin 

---
Changelog v2
- Remove fallback pattern when selecting the dt- or plat-based GPIOs
  request methods.
- Use a dedicated initial_state field in the "gpiomux" structure to
  select a proper channel initially.
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 68 
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index e10f72706b99..d1a9c56fa1ec 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -66,8 +66,8 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
struct device_node *np = pdev->dev.of_node;
struct device_node *adapter_np, *child;
struct i2c_adapter *adapter;
-   unsigned *values, *gpios;
-   int i = 0, ret;
+   unsigned int *values;
+   int i = 0;
 
if (!np)
return -ENODEV;
@@ -110,24 +110,48 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
return -EINVAL;
}
 
-   gpios = devm_kcalloc(&pdev->dev,
-mux->data.n_gpios, sizeof(*mux->data.gpios),
-GFP_KERNEL);
-   if (!gpios) {
-   dev_err(&pdev->dev, "Cannot allocate gpios array");
-   return -ENOMEM;
-   }
+   return 0;
+}
+
+static int i2c_mux_gpio_request_dt(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_core *muxc = platform_get_drvdata(pdev);
+   struct device_node *np = pdev->dev.of_node;
+   struct i2c_adapter *root;
+   struct device *gpio_dev;
+   enum gpiod_flags dflags;
+   int i, ret;
+
+   root = i2c_root_adapter(&muxc->parent->dev);
 
for (i = 0; i < mux->data.n_gpios; i++) {
-   ret = of_get_named_gpio(np, "mux-gpios", i);
-   if (ret < 0)
-   return ret;
-   gpios[i] = ret;
-   }
+   if (mux->initial_state & (1 << i))
+   dflags = GPIOD_OUT_HIGH;
+   else
+   dflags = GPIOD_OUT_LOW;
+
+   mux->gpios[i] = gpiod_get_from_of_node(np, "mux-gpios", i,
+  dflags, "i2c-mux-gpio");
+   if (IS_ERR(mux->gpios[i])) {
+   ret = PTR_ERR(mux->gpios[i]);
+   goto err_request_gpio;
+   }
+
+   if (!muxc->mux_locked)
+   continue;
 
-   mux->data.gpios = gpios;
+   gpio_dev = &mux->gpios[i]->gdev->dev;
+   muxc->mux_locked = i2c_root_adapter(gpio_dev) == root;
+   }
 
return 0;
+
+err_request_gpio:
+   for (i--; i >= 0; i--)
+   gpiod_free(mux->gpios[i]);
+
+   return ret;
 }
 #else
 static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
@@ -135,6 +159,12 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
 {
return -EINVAL;
 }
+
+static int i2c_mux_gpio_request_dt(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   return -EINVAL;
+}
 #endif
 
 static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
@@ -172,6 +202,9 @@ static int i2c_mux_gpio_request_plat(struct gpiomux *mux,
struct device *gpio_dev;
int i, ret;
 
+   if (!mux->data.gpios)
+   return -EINVAL;
+
root = i2c_root_adapter(&muxc->parent->dev);
 
for (i = 0; i < mux->data.n_gpios; i++) {
@@ -263,7 +296,10 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
mux->initial_state = mux->data.values[0];
}
 
-   ret = i2c_mux_gpio_request_plat(mux, pdev);
+   if (!dev_get_platdata(&pdev->dev))
+   ret = i2c_mux_gpio_request_dt(mux, pdev);
+   else
+   ret = i2c_mux_gpio_request_plat(mux, pdev);
if (ret)
goto alloc_failed;
 
-- 
2.21.0

[PATCH v2 1/3] i2c-mux-gpio: Unpin a platform-based device initialization

2019-04-25 Thread Serge Semin

We can split out the code specific to i2c-mux-gpio devices declared as
platform devices. In this case the platform data just needs to be
copied to the private storage and, if the GPIO chip name refers to a
valid GPIO chip, its base number saved for further GPIO request and
initialization. The rest of the code is common to both platform- and
OF-based setups.

It's also pointless, and might even be error-prone, to proceed with
further initialization if the OF kernel config is disabled and no
platform data is provided. Just return an error in this case.
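The base-number rule that i2c_mux_gpio_probe_plat() keeps (pin numbers relative to a named chip's base, absolute otherwise, and -EPROBE_DEFER while the named chip is not registered) can be sketched in user space as follows. struct model_chip and find_chip() are hypothetical replacements for gpiochip_find() with match_gpio_chip_by_label(); EPROBE_DEFER is defined locally because it is a kernel-internal errno value.

```c
#include <stddef.h>
#include <string.h>

#define EPROBE_DEFER 517 /* kernel-internal errno value */

/* Hypothetical registry entry: chip label and its first global GPIO. */
struct model_chip {
	const char *label;
	unsigned int base;
};

static const struct model_chip *find_chip(const struct model_chip *chips,
					  size_t n, const char *label)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(chips[i].label, label) == 0)
			return &chips[i];
	return NULL;
}

/*
 * With a chip name the pin numbers are relative to that chip's base;
 * without one they are absolute (base 0). A named chip that is not
 * registered yet means "try again later".
 */
static int resolve_gpio_base(const struct model_chip *chips, size_t n,
			     const char *chip_label, unsigned int *base)
{
	if (chip_label) {
		const struct model_chip *chip = find_chip(chips, n, chip_label);

		if (!chip)
			return -EPROBE_DEFER;
		*base = chip->base;
	} else {
		*base = 0;
	}
	return 0;
}
```

With a chip labelled "gpio-b" at base 32, platform pin 4 would then be requested as global GPIO 36.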

Signed-off-by: Serge Semin 

---
Changelog v2
- Return an error if the OF kconfig option is disabled while the dt-based
  GPIOs probe is called.
---
 drivers/i2c/muxes/i2c-mux-gpio.c | 69 ++--
 1 file changed, 38 insertions(+), 31 deletions(-)

diff --git a/drivers/i2c/muxes/i2c-mux-gpio.c b/drivers/i2c/muxes/i2c-mux-gpio.c
index 13882a2a4f60..54158b825acd 100644
--- a/drivers/i2c/muxes/i2c-mux-gpio.c
+++ b/drivers/i2c/muxes/i2c-mux-gpio.c
@@ -132,48 +132,55 @@ static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
 static int i2c_mux_gpio_probe_dt(struct gpiomux *mux,
struct platform_device *pdev)
 {
-   return 0;
+   return -EINVAL;
 }
 #endif
 
+static int i2c_mux_gpio_probe_plat(struct gpiomux *mux,
+   struct platform_device *pdev)
+{
+   struct i2c_mux_gpio_platform_data *data = dev_get_platdata(&pdev->dev);
+   struct gpio_chip *gpio;
+
+   /*
+* If a GPIO chip name is provided, the GPIO pin numbers provided are
+* relative to its base GPIO number. Otherwise they are absolute.
+*/
+   if (data->gpio_chip) {
+   gpio = gpiochip_find(data->gpio_chip,
+match_gpio_chip_by_label);
+   if (!gpio)
+   return -EPROBE_DEFER;
+
+   mux->gpio_base = gpio->base;
+   } else {
+   mux->gpio_base = 0;
+   }
+
+   memcpy(&mux->data, data, sizeof(mux->data));
+
+   return 0;
+}
+
 static int i2c_mux_gpio_probe(struct platform_device *pdev)
 {
struct i2c_mux_core *muxc;
struct gpiomux *mux;
struct i2c_adapter *parent;
struct i2c_adapter *root;
-   unsigned initial_state, gpio_base;
+   unsigned initial_state;
int i, ret;
 
mux = devm_kzalloc(&pdev->dev, sizeof(*mux), GFP_KERNEL);
if (!mux)
return -ENOMEM;
 
-   if (!dev_get_platdata(&pdev->dev)) {
+   if (!dev_get_platdata(&pdev->dev))
ret = i2c_mux_gpio_probe_dt(mux, pdev);
-   if (ret < 0)
-   return ret;
-   } else {
-   memcpy(&mux->data, dev_get_platdata(&pdev->dev),
-   sizeof(mux->data));
-   }
-
-   /*
-* If a GPIO chip name is provided, the GPIO pin numbers provided are
-* relative to its base GPIO number. Otherwise they are absolute.
-*/
-   if (mux->data.gpio_chip) {
-   struct gpio_chip *gpio;
-
-   gpio = gpiochip_find(mux->data.gpio_chip,
-match_gpio_chip_by_label);
-   if (!gpio)
-   return -EPROBE_DEFER;
-
-   gpio_base = gpio->base;
-   } else {
-   gpio_base = 0;
-   }
+   else
+   ret = i2c_mux_gpio_probe_plat(mux, pdev);
+   if (ret)
+   return ret;
 
parent = i2c_get_adapter(mux->data.parent);
if (!parent)
@@ -194,7 +201,6 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
root = i2c_root_adapter(&parent->dev);
 
muxc->mux_locked = true;
-   mux->gpio_base = gpio_base;
 
if (mux->data.idle != I2C_MUX_GPIO_NO_IDLE) {
initial_state = mux->data.idle;
@@ -207,14 +213,15 @@ static int i2c_mux_gpio_probe(struct platform_device *pdev)
struct device *gpio_dev;
struct gpio_desc *gpio_desc;
 
-   ret = gpio_request(gpio_base + mux->data.gpios[i], "i2c-mux-gpio");
+   ret = gpio_request(mux->gpio_base + mux->data.gpios[i],
+  "i2c-mux-gpio");
if (ret) {
dev_err(&pdev->dev, "Failed to request GPIO %d\n",
mux->data.gpios[i]);
goto err_request_gpio;
}
 
-   ret = gpio_direction_output(gpio_base + mux->data.gpios[i],
+   ret = gpio_direction_output(mux->gpio_base + mux->data.gpios[i],
initial_state & (1 << i));
if (ret) {
dev_e

Re: [PATCH 04/12] mips: Reserve memory for the kernel image resources

2019-04-25 Thread Serge Semin

On Wed, Apr 24, 2019 at 10:43:48PM +, Paul Burton wrote:
> Hi Serge,
> 
> On Wed, Apr 24, 2019 at 01:47:40AM +0300, Serge Semin wrote:
> > The reserved_end variable had been used by the bootmem_init() code
> > to find a lowest limit of memory available for memmap blob. The original
> > code just tried to find a free memory space higher than kernel was placed.
> > This limitation seems justified for the memmap region search process, but
> > I can't see any obvious reason to reserve the unused space below kernel
> > seeing some platforms place it much higher than standard 1MB.
> 
> There are 2 reasons I'm aware of:
> 
>  1) Older systems generally had something like an ISA bus which used
> addresses below the kernel, and bootloaders like YAMON left behind
> functions that could be called right at the start of RAM. This sort
> of thing should be accounted for by /memreserve/ in DT or similar
> platform-specific reservations though rather than generically, and
> at least Malta & SEAD-3 DTs already have /memreserve/ entries for
> it. So this part I think is OK. Some other older platforms might
> need updating, but that's fine.
> 

Regarding ISA: as far as I remember, devices on that bus can DMA only to
the lowest 16MB. So if the kernel is too big or placed rather high, with
the current implementation they may be left without any reachable memory
at all.
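A rough arithmetic model of the ISA concern, assuming the 16 MiB DMA limit stated above and RAM that is contiguous from PHYS_OFFSET to beyond 16 MiB; the helper names and example addresses are hypothetical. It compares how much ISA-reachable RAM stays free when everything from PHYS_OFFSET up to the end of the kernel is reserved versus when only the kernel image itself is reserved.

```c
#include <stdint.h>

#define ISA_DMA_LIMIT (16ULL << 20) /* ISA DMA reaches the first 16 MiB */

/* Size of [lo, hi) clipped to the ISA-reachable window; 0 if empty. */
static uint64_t below_isa(uint64_t lo, uint64_t hi)
{
	if (hi > ISA_DMA_LIMIT)
		hi = ISA_DMA_LIMIT;
	return hi > lo ? hi - lo : 0;
}

/* Everything in [phys_offset, kernel_end) reserved: only RAM above it. */
static uint64_t isa_free_reserve_below(uint64_t phys_offset,
				       uint64_t kernel_end)
{
	(void)phys_offset;
	return below_isa(kernel_end, UINT64_MAX);
}

/* Only the image [kernel_start, kernel_end) reserved. */
static uint64_t isa_free_image_only(uint64_t phys_offset,
				    uint64_t kernel_start, uint64_t kernel_end)
{
	return below_isa(phys_offset, kernel_start) +
	       below_isa(kernel_end, UINT64_MAX);
}
```

For a kernel loaded at 8 MiB and ending at 20 MiB with RAM starting at 0, the first policy leaves no ISA-reachable RAM, while reserving only the image keeps the 8 MiB below it available.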

>  2) trap_init() only allocates memory for the exception vector if using
> a vectored interrupt mode. In other cases it just uses CAC_BASE
> which currently gets reserved as part of this region between
> PHYS_OFFSET & _text.
> 
> I think this behavior is bogus, and we should instead:
> 
> - Allocate the exception vector memory using memblock_alloc() for
>   CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
>   EBase register). If we're not using vectored interrupts then
>   allocating one page will do, and we already have the size
>   calculation for if we are.
> 
> - Otherwise use CAC_BASE but call memblock_reserve() on the first
>   page.
> 
> I think we should make that change before this one goes in. I can
> try to get to it tomorrow, but feel free to beat me to it.
> 

As far as I understand you and the code, this should be enough to fix
the problem:
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 98ca55d62201..f680253e2617 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2326,6 +2326,8 @@ void __init trap_init(void)
ebase += (read_c0_ebase() & 0x3000);
}
}
+
+   memblock_reserve(ebase, PAGE_SIZE);
}
 
if (cpu_has_mmips) {
---

Allocation is already implemented in the if-branch under the
(cpu_has_veic || cpu_has_vint) condition, so nothing needs to change
there. If vectored interrupts aren't supported, the else-clause is taken
and we need to reserve whatever is set in the exception base address
variable.
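The branch logic above can be modelled in a few lines. One detail worth double-checking: memblock_reserve() takes a physical address, whereas in the else-branch ebase starts out as CAC_BASE, a cached-segment virtual address, so the one-liner presumably needs a conversion such as __pa() or CPHYSADDR(). The sketch assumes 32-bit MIPS, where KSEG0 maps physical memory at 0x80000000, and a 4 KiB page; struct ebase_plan and the helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  4096u
#define MODEL_KSEG0_MASK 0x1fffffffu /* same masking as CPHYSADDR() */

/* Hypothetical record of what trap_init() would ask memblock to do. */
struct ebase_plan {
	bool allocate;         /* vector area obtained via memblock_alloc() */
	uint32_t reserve_phys; /* else: physical start of the page to reserve */
	uint32_t reserve_len;
};

/* KSEG0 virtual address to physical address (32-bit MIPS). */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
	return vaddr & MODEL_KSEG0_MASK;
}

static struct ebase_plan plan_ebase(bool has_veic, bool has_vint,
				    uint32_t ebase_vaddr)
{
	struct ebase_plan p = { false, 0, 0 };

	if (has_veic || has_vint) {
		p.allocate = true; /* memory already comes from memblock */
	} else {
		p.reserve_phys = kseg0_to_phys(ebase_vaddr);
		p.reserve_len = MODEL_PAGE_SIZE;
	}
	return p;
}
```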

I'll add this patch between the 3rd and 4th ones if you are ok with it.

-Sergey

> Thanks,
> Paul
