RE: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-19 Thread Allen Hubbe
From: Serge Semin
> 3) IDT driver redevelopment will take a lot of time, since I don't have much
> free time to do it. It may take half a year or even more.
> 
> From my side, such an improvement will significantly complicate the NTB
> Kernel API. Since you are the subsystem maintainer it's your decision which
> design to choose, but I don't think I'll be able to make the IDT driver
> suitable for this design anytime soon.

I'm sorry to have made you feel that way.

> > I hope we got it settled now. If not, we can have a Skype conversation,
> > since writing such long letters takes a lot of time.

Come join irc.oftc.net #ntb



Re: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-19 Thread Serge Semin
Allen,
There are no inline comments below, just this one.

After a short meditation I realized what you are trying to achieve. Your
primary intention was to unify the NTB interface so it would fit both
Intel/AMD and IDT hardware without any abstraction layer. You may understand
why I was so eager to refuse this. The reason for most of my objections is
that such a unified interface will lead to a complete redevelopment of the
IDT driver.

The IDT driver was developed to fit your previous NTB Kernel API. So of
course I introduced some abstraction to keep it suitable for that API and to
make it as simple as possible. That's why I introduced the coupled Messaging
subsystem and kernel threads to deliver messages.

Here are my conclusions if you still want a new unified interface:
1) I'm still in favor of renaming the ntb_mw_* and ntb_peer_mw_* prefixed
methods (see the illustrated comment in my previous email). It is just a
matter of name syntax unification, so it would not look confusing.

2) We could make the following interface.
Before getting to a possible interface, note that IDT hardware doesn't
enumerate the ports contiguously. For instance, NTB functions can be
activated on ports 0, 2, 4, 6, 8, 12, 16 and 20. Activation is usually done
over an SMBus interface or using EEPROM firmware.

I won't describe all the interface methods' arguments, just the new and
important ones:

 - Link Up/down interface
ntb_link_is_up(ntb, port);
ntb_link_enable(ntb, port);
ntb_link_disable(ntb, port);
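
To make the port argument concrete, here is a minimal sketch of the
prototypes (the signatures are hypothetical, since the argument lists above
are abbreviated):

struct ntb_dev;	/* opaque NTB device, as in include/linux/ntb.h */

/* Hypothetical per-port variants of the current ntb_link_* calls. */
int ntb_link_is_up(struct ntb_dev *ntb, int port);  /* >0 if 'port' is up */
int ntb_link_enable(struct ntb_dev *ntb, int port);
int ntb_link_disable(struct ntb_dev *ntb, int port);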

 - Memory windows interface
ntb_get_port_map(ntb); - return an array of ports with the NTB function
activated. There can be only one NTB function activated per port.

ntb_mw_count(ntb); - total number of local memory windows which can be
initialized (up to 24 for IDT).
ntb_mw_get_maprsc(ntb, idx); - get the mapping resources of the memory
window. The client driver should know from its internal logic which port is
assigned to which memory window.
ntb_mw_get_align(ntb, idx); - return the translation address alignment of
the local memory window.
ntb_mw_set_trans(ntb, idx, port); - set the translation address of the
corresponding local memory window, so it gets connected with the RC memory
of the corresponding port.
ntb_mw_get_trans(ntb, idx, port); - get the translation address of the
corresponding local memory window.

ntb_peer_mw_count(ntb); - total number of peer memory windows (up to 24 for
IDT, but they aren't reachable because of the race conditions I described in
the first emails).
ntb_peer_mw_get_align(ntb, idx); - return the translation address alignment
of the peer memory window.
ntb_peer_mw_set_trans(ntb, idx, port); - set the translation address of the
corresponding peer memory window, so it gets connected with the RC memory of
the corresponding port (it won't work for IDT because of the race
condition).
ntb_peer_mw_get_trans(ntb, idx, port); - get the translation address of the
corresponding peer memory window (it won't work for IDT).
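
To illustrate, a minimal sketch of how a client might use the proposed calls
to connect a local window to a peer port (the signatures are hypothetical
and abbreviated, as in the list above):

static int client_setup_mw(struct ntb_dev *ntb, int idx, int port)
{
	phys_addr_t base;
	resource_size_t size;
	int rc;

	/* BAR region the client must ioremap to access the window. */
	rc = ntb_mw_get_maprsc(ntb, idx, &base, &size);
	if (rc)
		return rc;

	/* Connect the window to the RC memory of 'port'.  A real call
	 * would also carry the translated address received from the
	 * peer; the list above omits the argument for brevity. */
	return ntb_mw_set_trans(ntb, idx, port);
}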

 - Doorbell interface
Doorbells are kind of tricky in IDT. They aren't traditional doorbells like
the AMD/Intel ones, because of the multiple NTB-ports. First of all, there
is a global doorbell register, which is 32 bits wide. Each port has its own
outbound and inbound doorbell registers (each one 32 bits wide). There are
global mask registers, which can mask the ports' outbound doorbell registers
from affecting the global doorbell register, and can mask the ports' inbound
doorbell registers from being affected by the global doorbell register.
Those mask registers cannot be safely accessed from different ports because
of the damn race condition. Instead we can leave them as is, so all the
outbound doorbells affect all the bits of the global doorbell register and
all the inbound doorbells are affected by all the bits of the global
doorbell register.

So to speak, we can leave the doorbell interface as is.
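
For illustration, a minimal sketch of what the unmasked scheme means for a
client, using the existing ntb.h doorbell calls (nothing new is assumed
here):

/* A bit set by any port's outbound doorbell raises the same bit in the
 * 32-bit global doorbell register, so every peer's inbound register
 * sees it. */
static void notify_peers(struct ntb_dev *ntb, int bit)
{
	ntb_peer_db_set(ntb, BIT_ULL(bit));
}

static void handle_db(struct ntb_dev *ntb)	/* from the db_event callback */
{
	u64 bits = ntb_db_read(ntb);	/* every port sees the same bits */

	ntb_db_clear(ntb, bits);	/* ack what we've handled */
}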

 - Scratchpad interface
Since the scratchpad registers are just a kind of shared storage, we can
leave the interface as is. I don't think IDT will introduce scratchpad
registers in any of their new multiport NTB-related hardware.

 - Messaging interface
Partly we can stick to your design, but I would split the inbound and
outbound message statuses; this way a client driver developer won't have to
know which part of the bit-field is related to inbound and which to outbound
messages:
ntb_msg_event(ntb); - received a hardware interrupt for messages (don't read
the message status or anything else).
ntb_msg_read_sts_in(ntb); - read and return the inbound MSGSTS bitmask.
ntb_msg_clear_sts_in(ntb); - clear bits of the inbound MSGSTS bitmask.
ntb_msg_set_mask_in(ntb); - set bits in the inbound part of MSGSTSMSK.
ntb_msg_clear_mask_in(ntb); - clear bits in the inbound part of MSGSTSMSK.
ntb_msg_read_sts_out(ntb); - read and return the outbound MSGSTS bitmask.
ntb_msg_clear_sts_out(ntb); - clear bits of the outbound MSGSTS bitmask.
ntb_msg_set_mask_out(ntb); - set bits in the outbound part of MSGSTSMSK.
ntb_msg_clear_mask_out(ntb); - clear bits in the outbound part of MSGSTSMSK.
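
For illustration, a minimal sketch of the split status handling (the
ntb_msg_* names follow the list above; the mask/status arguments are
abbreviated there, so the exact signatures are hypothetical):

static void msg_work_fn(struct ntb_dev *ntb)	/* after ntb_msg_event() */
{
	u32 in = ntb_msg_read_sts_in(ntb);	/* new inbound messages */
	u32 out = ntb_msg_read_sts_out(ntb);	/* finished outbound sends */

	if (in) {
		/* ...fetch the received message registers here... */
		ntb_msg_clear_sts_in(ntb, in);
	}
	if (out)
		ntb_msg_clear_sts_out(ntb, out);
}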

Re: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-18 Thread Serge Semin
Hello Allen,
Sorry for the delayed response and thanks for the thoughtful review.

On Mon, Aug 08, 2016 at 05:48:42PM -0400, Allen Hubbe  
wrote:
> From: Serge Semin
> > Hello Allen.
> > 
> > Thanks for your careful review. Going through this mailing thread I hope
> > we'll come up with solutions which improve the driver code as well as
> > extend the Linux kernel support of new devices like IDT PCIe-switches.
> > 
> > Before getting to the inline commentaries I need to give some
> > introduction to the IDT NTB-related hardware, so we can speak the same
> > language. Additionally I'll give a brief explanation of how the setup of
> > memory windows works in IDT PCIe-switches.
> 
> I found this to use as a reference for IDT:
> https://www.idt.com/document/man/89hpes24nt24g2-device-user-manual

Yes, it's supported by the IDT driver, although I am using a device with a
smaller number of ports:
https://www.idt.com/document/man/89hpes32nt8ag2-device-user-manual

> 
> > First of all, before getting into the IDT NTB driver development I did
> > some research of the currently developed NTB kernel API and the
> > AMD/Intel hardware drivers. Due to the lack of hardware manuals it might
> > not be in deep detail, but I understand how the AMD/Intel NTB hardware
> > drivers work. At least I understand the concept of memory windowing,
> > which led to the current NTB bus kernel API.
> > 
> > So let's get to IDT PCIe-switches. There is a whole series of
> > NTB-related switches IDT produces. I split all of them into two distinct
> > groups:
> > 1) Two NTB-ported switches (models 89PES8NT2, 89PES16NT2, 89PES12NT3, 
> > 89PES124NT3),
> > 2) Multi NTB-ported switches (models 89HPES24NT6AG2, 89HPES32NT8AG2, 
> > 89HPES32NT8BG2,
> > 89HPES12NT12G2, 89HPES16NT16G2, 89HPES24NT24G2, 89HPES32NT24AG2, 
> > 89HPES32NT24BG2).
> > Just to note, all of these switches are part of the IDT PRECISE(TM)
> > family of PCI Express® switching solutions. Why do I split them up? For
> > the following reasons:
> > 1) The number of upstream ports which have access to NTB functions
> > (obviously, yeah? =)). The switches of the first group can connect just
> > two domains over NTB, unlike the second group of switches, which expose
> > a way to set up an interaction between several PCIe-switch ports which
> > have the NT-function activated.
> > 2) The groups differ significantly in the way the NT-functions are
> > configured.
> > 
> > Before going further, I should note that the uploaded driver supports
> > the second group of devices only. But I'll still give a comparative
> > explanation, since the first group of switches is very similar to the
> > AMD/Intel NTBs.
> > 
> > Let's dive into the configurations a bit deeper. The NT-functions of
> > the first group of switches can be configured the same way as AMD/Intel
> > NTB-functions are. There is a PCIe end-point configuration space, which
> > fully reflects the cross-coupled local and peer PCIe/NTB settings. So
> > the local root complex can set any of the peer registers by directly
> > writing to mapped memory. Here is an image which perfectly explains the
> > configuration registers mapping:
> > https://s8.postimg.org/3nhkzqfxx/IDT_NTB_old_configspace.png
> > Since the first group of switches connects only two root complexes, the
> > race condition of read/write operations to the cross-coupled registers
> > can be easily resolved just by role distribution. So the local root
> > complex sets the translated base address directly in the peer
> > configuration space registers, which correspond to the BAR0-BAR3
> > locally mapped memory windows. Of course 2-4 memory windows are enough
> > to connect just two domains. That's why you made the NTB bus kernel API
> > the way it is.
> > 
> > Things get different when one wants to have access from one domain to
> > multiple domains, coupling up to eight root complexes in the second
> > group of switches. First of all, the hardware doesn't support the
> > configuration space cross-coupling anymore. Instead there are two
> > Global Address Space Access registers provided to get access to a
> > peer's configuration space. In fact this is not a big problem, since
> > there is not much difference between accessing registers over a
> > memory-mapped space or over a pair of fixed Address/Data registers. The
> > problem arises when one wants to share memory windows between eight
> > domains. Five BARs are not enough for that, even if they were
> > configured to be of x32 address type. Instead IDT introduces Lookup
> > table address translation. So BAR2/BAR4 can be configured to translate
> > addresses using 12- or 24-entry lookup tables. Each entry can be
> > initialized with the translated base address of a peer and the IDT
> > switch port the peer is connected to. So when the local root complex
> > locally maps BAR2/BAR4, one can have access to a memory 

RE: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-08 Thread Allen Hubbe
From: Serge Semin
> Hello Allen.
> 
> Thanks for your careful review. Going through this mailing thread I hope
> we'll come up with solutions which improve the driver code as well as
> extend the Linux kernel support of new devices like IDT PCIe-switches.
> 
> Before getting to the inline commentaries I need to give some introduction
> to the IDT NTB-related hardware, so we can speak the same language.
> Additionally I'll give a brief explanation of how the setup of memory
> windows works in IDT PCIe-switches.

I found this to use as a reference for IDT:
https://www.idt.com/document/man/89hpes24nt24g2-device-user-manual

> First of all, before getting into the IDT NTB driver development I did
> some research of the currently developed NTB kernel API and the AMD/Intel
> hardware drivers. Due to the lack of hardware manuals it might not be in
> deep detail, but I understand how the AMD/Intel NTB hardware drivers work.
> At least I understand the concept of memory windowing, which led to the
> current NTB bus kernel API.
> 
> So let's get to IDT PCIe-switches. There is a whole series of NTB-related
> switches IDT produces. I split all of them into two distinct groups:
> 1) Two NTB-ported switches (models 89PES8NT2, 89PES16NT2, 89PES12NT3, 
> 89PES124NT3),
> 2) Multi NTB-ported switches (models 89HPES24NT6AG2, 89HPES32NT8AG2, 
> 89HPES32NT8BG2,
> 89HPES12NT12G2, 89HPES16NT16G2, 89HPES24NT24G2, 89HPES32NT24AG2, 
> 89HPES32NT24BG2).
> Just to note, all of these switches are part of the IDT PRECISE(TM) family
> of PCI Express® switching solutions. Why do I split them up? For the
> following reasons:
> 1) The number of upstream ports which have access to NTB functions
> (obviously, yeah? =)). The switches of the first group can connect just
> two domains over NTB, unlike the second group of switches, which expose a
> way to set up an interaction between several PCIe-switch ports which have
> the NT-function activated.
> 2) The groups differ significantly in the way the NT-functions are
> configured.
> 
> Before going further, I should note that the uploaded driver supports the
> second group of devices only. But I'll still give a comparative
> explanation, since the first group of switches is very similar to the
> AMD/Intel NTBs.
> 
> Let's dive into the configurations a bit deeper. The NT-functions of the
> first group of switches can be configured the same way as AMD/Intel
> NTB-functions are. There is a PCIe end-point configuration space, which
> fully reflects the cross-coupled local and peer PCIe/NTB settings. So the
> local root complex can set any of the peer registers by directly writing
> to mapped memory. Here is an image which perfectly explains the
> configuration registers mapping:
> https://s8.postimg.org/3nhkzqfxx/IDT_NTB_old_configspace.png
> Since the first group of switches connects only two root complexes, the
> race condition of read/write operations to the cross-coupled registers can
> be easily resolved just by role distribution. So the local root complex
> sets the translated base address directly in the peer configuration space
> registers, which correspond to the BAR0-BAR3 locally mapped memory
> windows. Of course 2-4 memory windows are enough to connect just two
> domains. That's why you made the NTB bus kernel API the way it is.
> 
> Things get different when one wants to have access from one domain to
> multiple domains, coupling up to eight root complexes in the second group
> of switches. First of all, the hardware doesn't support the configuration
> space cross-coupling anymore. Instead there are two Global Address Space
> Access registers provided to get access to a peer's configuration space.
> In fact this is not a big problem, since there is not much difference
> between accessing registers over a memory-mapped space or over a pair of
> fixed Address/Data registers. The problem arises when one wants to share
> memory windows between eight domains. Five BARs are not enough for that,
> even if they were configured to be of x32 address type. Instead IDT
> introduces Lookup table address translation. So BAR2/BAR4 can be
> configured to translate addresses using 12- or 24-entry lookup tables.
> Each entry can be initialized with the translated base address of a peer
> and the IDT switch port the peer is connected to. So when the local root
> complex locally maps BAR2/BAR4, one can have access to the memory of a
> peer just by reading/writing with a shift corresponding to the lookup
> table entry. That's how more than five peers can be accessed. The root
> problem is the way the lookup table is accessed. Alas, it is accessed only
> by a pair of "Entry index/Data" registers. So a root complex must write an
> entry index to one register, then read/write data from the other. As you
> might realise, that weak point leads to a race condition of multiple root
> complexes accessing the lookup table of one shared peer.

Re: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-07 Thread Serge Semin
Hello Allen.

Thanks for your careful review. Going through this mailing thread I hope
we'll come up with solutions which improve the driver code as well as extend
the Linux kernel support of new devices like IDT PCIe-switches.

Before getting to the inline commentaries I need to give some introduction
to the IDT NTB-related hardware, so we can speak the same language.
Additionally I'll give a brief explanation of how the setup of memory
windows works in IDT PCIe-switches.

First of all, before getting into the IDT NTB driver development I did some
research of the currently developed NTB kernel API and the AMD/Intel
hardware drivers. Due to the lack of hardware manuals it might not be in
deep detail, but I understand how the AMD/Intel NTB hardware drivers work.
At least I understand the concept of memory windowing, which led to the
current NTB bus kernel API.

So let's get to IDT PCIe-switches. There is a whole series of NTB-related
switches IDT produces. I split all of them into two distinct groups:
1) Two NTB-ported switches (models 89PES8NT2, 89PES16NT2, 89PES12NT3, 
89PES124NT3),
2) Multi NTB-ported switches (models 89HPES24NT6AG2, 89HPES32NT8AG2, 
89HPES32NT8BG2, 89HPES12NT12G2, 89HPES16NT16G2, 89HPES24NT24G2, 
89HPES32NT24AG2, 89HPES32NT24BG2).
Just to note, all of these switches are part of the IDT PRECISE(TM) family
of PCI Express® switching solutions. Why do I split them up? For the
following reasons:
1) The number of upstream ports which have access to NTB functions
(obviously, yeah? =)). The switches of the first group can connect just two
domains over NTB, unlike the second group of switches, which expose a way to
set up an interaction between several PCIe-switch ports which have the
NT-function activated.
2) The groups differ significantly in the way the NT-functions are
configured.

Before going further, I should note that the uploaded driver supports the
second group of devices only. But I'll still give a comparative explanation,
since the first group of switches is very similar to the AMD/Intel NTBs.

Let's dive into the configurations a bit deeper. The NT-functions of the
first group of switches can be configured the same way as AMD/Intel
NTB-functions are. There is a PCIe end-point configuration space, which
fully reflects the cross-coupled local and peer PCIe/NTB settings. So the
local root complex can set any of the peer registers by directly writing to
mapped memory. Here is an image which perfectly explains the configuration
registers mapping:
https://s8.postimg.org/3nhkzqfxx/IDT_NTB_old_configspace.png
Since the first group of switches connects only two root complexes, the race
condition of read/write operations to the cross-coupled registers can be
easily resolved just by role distribution. So the local root complex sets
the translated base address directly in the peer configuration space
registers, which correspond to the BAR0-BAR3 locally mapped memory windows.
Of course 2-4 memory windows are enough to connect just two domains. That's
why you made the NTB bus kernel API the way it is.
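
(For illustration only: in this two-port scheme a driver can program the
peer's translation register directly through the cross-coupled mapped
configuration space. The register offset below is hypothetical, not taken
from an IDT datasheet.)

#define PEER_BAR2_XLAT	0x0458	/* hypothetical cross-coupled offset */

static void set_peer_xlat(void __iomem *peer_cfg, u64 addr)
{
	/* Write the translated base address straight into the peer's
	 * BAR translation register, low DWORD then high DWORD. */
	iowrite32(lower_32_bits(addr), peer_cfg + PEER_BAR2_XLAT);
	iowrite32(upper_32_bits(addr), peer_cfg + PEER_BAR2_XLAT + 4);
}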

Things get different when one wants to have access from one domain to
multiple domains, coupling up to eight root complexes in the second group of
switches. First of all, the hardware doesn't support the configuration space
cross-coupling anymore. Instead there are two Global Address Space Access
registers provided to get access to a peer's configuration space. In fact
this is not a big problem, since there is not much difference between
accessing registers over a memory-mapped space or over a pair of fixed
Address/Data registers. The problem arises when one wants to share memory
windows between eight domains. Five BARs are not enough for that, even if
they were configured to be of x32 address type. Instead IDT introduces
Lookup table address translation. So BAR2/BAR4 can be configured to
translate addresses using 12- or 24-entry lookup tables. Each entry can be
initialized with the translated base address of a peer and the IDT switch
port the peer is connected to. So when the local root complex locally maps
BAR2/BAR4, one can have access to the memory of a peer just by
reading/writing with a shift corresponding to the lookup table entry. That's
how more than five peers can be accessed. The root problem is the way the
lookup table is accessed. Alas, it is accessed only by a pair of "Entry
index/Data" registers. So a root complex must write an entry index to one
register, then read/write data from the other. As you might realise, that
weak point leads to a race condition of multiple root complexes accessing
the lookup table of one shared peer. Alas, I could not come up with a simple
and robust solution to the race.
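
(To illustrate the race: the table is reachable only through the index/data
pair, so the two-step write is not atomic across root complexes. The
register offsets are hypothetical.)

#define NTB_LUT_INDEX	0x0	/* hypothetical "Entry index" register */
#define NTB_LUT_DATA	0x4	/* hypothetical "Data" register */

static void lut_set_entry(void __iomem *regs, u32 idx, u32 xlat_base)
{
	iowrite32(idx, regs + NTB_LUT_INDEX);
	/* If another root complex writes NTB_LUT_INDEX right here, the
	 * data below lands in the wrong lookup table entry. */
	iowrite32(xlat_base, regs + NTB_LUT_DATA);
}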

That's why I've introduced the asynchronous hardware in the NTB bus kernel
API. Since the local root complex can't directly write a translated base
address to a peer, it must wait until a peer asks it to allocate memory and
send the address back using some of a hardwa

RE: [PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-08-05 Thread Allen Hubbe
From: Serge Semin
> Currently supported AMD and Intel Non-transparent PCIe-bridges are
> synchronous devices, so the translated base address of memory windows can
> be directly written to peer registers. But there are some IDT
> PCIe-switches which implement complex interfaces using Lookup Tables of
> translation addresses. Due to the way the table is accessed, it cannot be
> done synchronously from different RCs; that's why an asynchronous
> interface should be developed.
> 
> For this purpose the Memory Window related interface is correspondingly
> split, as it is for Doorbell and Scratchpad registers. The definition of
> Memory Window is the following: "It is a virtual memory region, which
> locally reflects a physical memory of the peer device." So to speak, the
> "ntb_peer_mw_"-prefixed methods control the peer's memory windows, while
> the "ntb_mw_"-prefixed functions work with the local memory windows.
> Here is the description of the Memory Window related NTB-bus callback
> functions:
>  - ntb_mw_count() - number of local memory windows.
>  - ntb_mw_get_maprsc() - get the physical address and size of the local
> memory window to map.
>  - ntb_mw_set_trans() - set translation address of local memory window
> (this address should be somehow retrieved from a peer).
>  - ntb_mw_get_trans() - get translation address of local memory window.
>  - ntb_mw_get_align() - get alignment of translated base address and size
> of local memory window. Additionally one can get the upper size limit of
> the memory window.
>  - ntb_peer_mw_count() - number of peer memory windows (it can differ from
> the local number).
>  - ntb_peer_mw_set_trans() - set translation address of peer memory window.
>  - ntb_peer_mw_get_trans() - get translation address of peer memory window.
>  - ntb_peer_mw_get_align() - get alignment of translated base address and
> size of peer memory window. Additionally one can get the upper size limit
> of the memory window.
> 
> As one can see, the current AMD and Intel NTB drivers mostly implement the
> "ntb_peer_mw_"-prefixed methods. So this patch correspondingly renames the
> driver functions. The IDT NTB driver mostly exposes "ntb_mw_"-prefixed
> methods, since it doesn't have convenient access to the peer Lookup Table.
> 
> In order to pass information from one RC to another, the NTB functions of
> the IDT PCIe-switch implement a Messaging subsystem. They currently
> support four message registers to transfer DWORD sized data to a specified
> peer. So two new callback methods are introduced:
>  - ntb_msg_size() - get the number of DWORDs supported by the NTB function
> to send and receive messages
>  - ntb_msg_post() - send a message of the size retrieved from
> ntb_msg_size() to a peer
> Additionally there is a new event function:
>  - ntb_msg_event() - it is invoked when either a new message was retrieved
> (NTB_MSG_NEW), or the last message was successfully sent
> (NTB_MSG_SENT), or the last message failed to be sent
> (NTB_MSG_FAIL).
> 
> The last change concerns the IDs (practically, names) of NTB-devices on
> the NTB-bus. It is not good to have devices with the same names in the
> system, and it keeps my IDT NTB driver from being loaded =) So I developed
> a simple algorithm for NTB device naming. Particularly, it generates names
> "ntbS{N}" for synchronous devices, "ntbA{N}" for asynchronous devices, and
> "ntbAS{N}" for devices supporting both interfaces.

Thanks for the work that went into writing this driver, and thanks for your
patience with the review.  Please read my initial comments inline.  I would
like to approach this from a top-down api perspective first, and settle on
that before requesting any specific changes in the hardware driver.  My
major concern about these changes is that they introduce a distinct
classification for sync and async hardware, supported by different sets of
methods in the api, neither of which is a subset of the other.

You know the IDT hardware, so if any of my requests below are infeasible, I 
would like your constructive opinion (even if it means significant changes to 
existing drivers) on how to resolve the api so that new and existing hardware 
drivers can be unified under the same api, if possible.

> 
> Signed-off-by: Serge Semin 
> 
> ---
>  drivers/ntb/Kconfig |   4 +-
>  drivers/ntb/hw/amd/ntb_hw_amd.c |  49 ++-
>  drivers/ntb/hw/intel/ntb_hw_intel.c |  59 +++-
>  drivers/ntb/ntb.c   |  86 +-
>  drivers/ntb/ntb_transport.c |  19 +-
>  drivers/ntb/test/ntb_perf.c |  16 +-
>  drivers/ntb/test/ntb_pingpong.c |   5 +
>  drivers/ntb/test/ntb_tool.c |  25 +-
> include/linux/ntb.h | 600 +---

[PATCH v2 1/3] ntb: Add asynchronous devices support to NTB-bus interface

2016-07-28 Thread Serge Semin
Currently supported AMD and Intel Non-transparent PCIe-bridges are
synchronous devices, so the translated base address of memory windows can be
directly written to peer registers. But there are some IDT PCIe-switches
which implement complex interfaces using Lookup Tables of translation
addresses. Due to the way the table is accessed, it cannot be done
synchronously from different RCs; that's why an asynchronous interface
should be developed.

For this purpose the Memory Window related interface is correspondingly
split, as it is for Doorbell and Scratchpad registers. The definition of
Memory Window is the following: "It is a virtual memory region, which
locally reflects a physical memory of the peer device." So to speak, the
"ntb_peer_mw_"-prefixed methods control the peer's memory windows, while the
"ntb_mw_"-prefixed functions work with the local memory windows.
Here is the description of the Memory Window related NTB-bus callback
functions:
 - ntb_mw_count() - number of local memory windows.
 - ntb_mw_get_maprsc() - get the physical address and size of the local memory
 window to map.
 - ntb_mw_set_trans() - set translation address of local memory window (this
address should be somehow retrieved from a peer).
 - ntb_mw_get_trans() - get translation address of local memory window.
 - ntb_mw_get_align() - get alignment of translated base address and size of
local memory window. Additionally one can get the
upper size limit of the memory window.
 - ntb_peer_mw_count() - number of peer memory windows (it can differ from the
 local number).
 - ntb_peer_mw_set_trans() - set translation address of peer memory window
 - ntb_peer_mw_get_trans() - get translation address of peer memory window
 - ntb_peer_mw_get_align() - get alignment of translated base address and size
 of peer memory window. Additionally one can get the
 upper size limit of the memory window.
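
To make the split concrete, a minimal sketch of a client setting a
translation under each model (the ntb_dev_is_sync() check is hypothetical,
standing in for whatever capability test the api settles on):

static int client_set_xlat(struct ntb_dev *ntb, int idx, dma_addr_t addr,
			   resource_size_t size)
{
	if (ntb_dev_is_sync(ntb))	/* hypothetical capability check */
		return ntb_peer_mw_set_trans(ntb, idx, addr, size);

	/* Async hardware: only the local window is programmable; 'addr'
	 * must first be received from the peer, e.g. via messaging. */
	return ntb_mw_set_trans(ntb, idx, addr, size);
}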

As one can see, the current AMD and Intel NTB drivers mostly implement the
"ntb_peer_mw_"-prefixed methods. So this patch correspondingly renames the
driver functions. The IDT NTB driver mostly exposes "ntb_mw_"-prefixed
methods, since it doesn't have convenient access to the peer Lookup Table.

In order to pass information from one RC to another, the NTB functions of
the IDT PCIe-switch implement a Messaging subsystem. They currently support
four message registers to transfer DWORD sized data to a specified peer. So
two new callback methods are introduced:
 - ntb_msg_size() - get the number of DWORDs supported by the NTB function
to send and receive messages
 - ntb_msg_post() - send a message of the size retrieved from ntb_msg_size()
to a peer
Additionally there is a new event function:
 - ntb_msg_event() - it is invoked when either a new message was retrieved
 (NTB_MSG_NEW), or the last message was successfully sent
 (NTB_MSG_SENT), or the last message failed to be sent
 (NTB_MSG_FAIL).
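
For illustration, a minimal sketch of posting a translated base address
through these callbacks (the exact ntb_msg_post() argument list is
simplified here):

static int send_xlat(struct ntb_dev *ntb, u64 xlat_addr)
{
	u32 msg[2] = { lower_32_bits(xlat_addr), upper_32_bits(xlat_addr) };

	if (ntb_msg_size(ntb) < ARRAY_SIZE(msg))
		return -EINVAL;

	/* Completion arrives later via ntb_msg_event() with
	 * NTB_MSG_SENT or NTB_MSG_FAIL. */
	return ntb_msg_post(ntb, msg, ARRAY_SIZE(msg));
}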

The last change concerns the IDs (practically, names) of NTB-devices on the
NTB-bus. It is not good to have devices with the same names in the system,
and it keeps my IDT NTB driver from being loaded =) So I developed a simple
algorithm for NTB device naming. Particularly, it generates names "ntbS{N}"
for synchronous devices, "ntbA{N}" for asynchronous devices, and "ntbAS{N}"
for devices supporting both interfaces.
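
A minimal sketch of that naming scheme (the helper name is hypothetical):

static void ntb_gen_name(char *buf, size_t len, bool sync, bool async, int n)
{
	const char *tag = (sync && async) ? "AS" : (async ? "A" : "S");

	snprintf(buf, len, "ntb%s%d", tag, n);	/* e.g. "ntbAS0" */
}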

Signed-off-by: Serge Semin 

---
 drivers/ntb/Kconfig |   4 +-
 drivers/ntb/hw/amd/ntb_hw_amd.c |  49 ++-
 drivers/ntb/hw/intel/ntb_hw_intel.c |  59 +++-
 drivers/ntb/ntb.c   |  86 +-
 drivers/ntb/ntb_transport.c |  19 +-
 drivers/ntb/test/ntb_perf.c |  16 +-
 drivers/ntb/test/ntb_pingpong.c |   5 +
 drivers/ntb/test/ntb_tool.c |  25 +-
 include/linux/ntb.h | 600 +---
 9 files changed, 701 insertions(+), 162 deletions(-)

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index 95944e5..67d80c4 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -14,8 +14,6 @@ if NTB
 
 source "drivers/ntb/hw/Kconfig"
 
-source "drivers/ntb/test/Kconfig"
-
 config NTB_TRANSPORT
tristate "NTB Transport Client"
help
@@ -25,4 +23,6 @@ config NTB_TRANSPORT
 
 If unsure, say N.
 
+source "drivers/ntb/test/Kconfig"
+
 endif # NTB
diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 6ccba0d..ab6f353 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "ntb_hw_amd.h"
@@ -84,11 +85,8 @@ static int amd_ntb_mw_count(struct ntb_dev *ntb)
return ntb_ndev(ntb)->mw_count;
 }
 
-static int amd_ntb_mw_get_range(struct ntb_dev *ntb, int idx,
-