Re: [PATCH] powerpc: process.c: fix Kconfig typo

2016-10-25 Thread Michael Ellerman
Cyril Bur  writes:

> On Wed, 2016-10-05 at 07:57 +0200, Valentin Rothberg wrote:
>> s/ALIVEC/ALTIVEC/
>> 
>
> Oops, nice catch
>
>> Signed-off-by: Valentin Rothberg 
>
> Reviewed-by: Cyril Bur 

How did we not notice? Sounds like we need a new selftest.

Looks like this should have:

Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware 
transactional memory in use")


And I guess I need to start running checkkconfigsymbols.py on every
commit.
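
(Something like the following should do it from the top of a kernel
tree -- a sketch; check scripts/checkkconfigsymbols.py --help for the
exact flags:)

  ./scripts/checkkconfigsymbols.py --diff HEAD~1..HEAD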

cheers


[PATCH v3 2/2] PCI: Disable VF's memory space on updating IOV BAR in pci_update_resource()

2016-10-25 Thread Gavin Shan
pci_update_resource() might be called to update (shift) IOV BARs
in the PPC PowerNV specific pcibios_sriov_enable() when enabling the
PF's SRIOV capability. At that point, the PF may already be functional
if SRIOV was enabled through the sysfs entry "sriov_numvfs". The PF's
memory decoding (0x2 in PCI_COMMAND) shouldn't be disabled when
updating its IOV BARs with pci_update_resource(). Otherwise, we
receive EEH errors caused by MMIO accesses to the PF's memory BARs
during the window when the PF's memory decoding is disabled.

   sriov_numvfs_store
   pdev->driver->sriov_configure
   mlx5_core_sriov_configure
   pci_enable_sriov
   sriov_enable
   pcibios_sriov_enable
   pnv_pci_sriov_enable
   pnv_pci_vf_resource_shift
   pci_update_resource

This disables the VF's memory space instead of the PF's memory decoding
when 64-bit IOV BARs are updated in pci_update_resource().

Reported-by: Carol Soto 
Suggested-by: Bjorn Helgaas 
Signed-off-by: Gavin Shan 
Tested-by: Carol Soto 
---
 drivers/pci/setup-res.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 66c4d8f..1456896 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -29,10 +29,10 @@
 void pci_update_resource(struct pci_dev *dev, int resno)
 {
struct pci_bus_region region;
-   bool disable;
-   u16 cmd;
+   bool disable = false;
+   u16 cmd, bit;
u32 new, check, mask;
-   int reg;
+   int reg, cmd_reg;
enum pci_bar_type type;
struct resource *res = dev->resource + resno;
 
@@ -81,11 +81,23 @@ void pci_update_resource(struct pci_dev *dev, int resno)
 * disable decoding so that a half-updated BAR won't conflict
 * with another device.
 */
-   disable = (res->flags & IORESOURCE_MEM_64) && !dev->mmio_always_on;
+   if (res->flags & IORESOURCE_MEM_64) {
+   if (resno <= PCI_ROM_RESOURCE) {
+   disable = !dev->mmio_always_on;
+   cmd_reg = PCI_COMMAND;
+   bit = PCI_COMMAND_MEMORY;
+   } else {
+#ifdef CONFIG_PCI_IOV
+   disable = true;
+   cmd_reg = dev->sriov->pos + PCI_SRIOV_CTRL;
+   bit = PCI_SRIOV_CTRL_MSE;
+#endif
+   }
+   }
+
if (disable) {
-   pci_read_config_word(dev, PCI_COMMAND, &cmd);
-   pci_write_config_word(dev, PCI_COMMAND,
- cmd & ~PCI_COMMAND_MEMORY);
+   pci_read_config_word(dev, cmd_reg, &cmd);
+   pci_write_config_word(dev, cmd_reg, cmd & ~bit);
}
 
pci_write_config_dword(dev, reg, new);
@@ -107,7 +119,7 @@ void pci_update_resource(struct pci_dev *dev, int resno)
}
 
if (disable)
-   pci_write_config_word(dev, PCI_COMMAND, cmd);
+   pci_write_config_word(dev, cmd_reg, cmd);
 }
 
 int pci_claim_resource(struct pci_dev *dev, int resource)
-- 
2.1.0



[PATCH v3 0/2] Disable VF's memory space on updating IOV BARs

2016-10-25 Thread Gavin Shan
This moves pcibios_sriov_enable() to the point before VFs and VF BARs
are enabled on the PowerNV platform. Also, pci_update_resource() is used
to update IOV BARs on the PowerNV platform, and the PF might already be
functional when it is called. We shouldn't disable the PF's memory
decoding at that point. Instead, the VF's memory space should be
disabled.

Changelog
=========
v3:
  * Disable VF's memory space when IOV BARs are updated in
pcibios_sriov_enable().
v2:
  * Added one patch calling pcibios_sriov_enable() before the VF
and VF BARs are enabled.

Gavin Shan (2):
  PCI: Call pcibios_sriov_enable() before IOV BARs are enabled
  PCI: Disable VF's memory space on updating IOV BAR in
pci_update_resource()

 drivers/pci/iov.c   | 14 +++---
 drivers/pci/setup-res.c | 28 
 2 files changed, 27 insertions(+), 15 deletions(-)

-- 
2.1.0



[PATCH v3 1/2] PCI: Call pcibios_sriov_enable() before IOV BARs are enabled

2016-10-25 Thread Gavin Shan
In the current implementation, pcibios_sriov_enable() is used by the
PPC PowerNV platform only. In the PowerNV specific
pcibios_sriov_enable(), the PF's IOV BARs might be updated (shifted)
by pci_update_resource(). It means the IOV BARs aren't ready to decode
incoming memory addresses until pcibios_sriov_enable() returns.

This calls pcibios_sriov_enable() earlier, before the IOV BARs are
enabled. As a result, the IOV BARs have been configured correctly by
the time they are enabled.

Signed-off-by: Gavin Shan 
Tested-by: Carol Soto 
---
 drivers/pci/iov.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index e30f05c..d41ec29 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -306,13 +306,6 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
return rc;
}
 
-   pci_iov_set_numvfs(dev, nr_virtfn);
-   iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
-   pci_cfg_access_lock(dev);
-   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
-   msleep(100);
-   pci_cfg_access_unlock(dev);
-
iov->initial_VFs = initial;
if (nr_virtfn < initial)
initial = nr_virtfn;
@@ -323,6 +316,13 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
goto err_pcibios;
}
 
+   pci_iov_set_numvfs(dev, nr_virtfn);
+   iov->ctrl |= PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE;
+   pci_cfg_access_lock(dev);
+   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
+   msleep(100);
+   pci_cfg_access_unlock(dev);
+
for (i = 0; i < initial; i++) {
rc = pci_iov_add_virtfn(dev, i, 0);
if (rc)
-- 
2.1.0



Re: [PATCH v2 2/2] PCI: Don't disable PF's memory decoding when enabling SRIOV

2016-10-25 Thread Gavin Shan
On Mon, Oct 24, 2016 at 10:51:13PM -0500, Bjorn Helgaas wrote:
>On Tue, Oct 25, 2016 at 12:47:28PM +1100, Gavin Shan wrote:
>> On Mon, Oct 24, 2016 at 09:03:16AM -0500, Bjorn Helgaas wrote:
>> >On Mon, Oct 24, 2016 at 10:28:02AM +1100, Gavin Shan wrote:
>> >> On Fri, Oct 21, 2016 at 11:50:34AM -0500, Bjorn Helgaas wrote:
>> >> >On Fri, Sep 30, 2016 at 09:47:50AM +1000, Gavin Shan wrote:

.../...

>
>That specific case (pci_enable_device() followed by
>pci_update_resource()) should *not* work.  pci_enable_device() is
>normally called by a driver's .probe() method, and after we call a
>.probe() method, the PCI core shouldn't touch the device at all
>because there's no means of mutual exclusion between the driver and
>the PCI core.
>
>I think pci_update_resource() should only be called in situations
>where the caller already knows that nobody is using the device.  For
>regular PCI BARs, that doesn't necessarily mean PCI_COMMAND_MEMORY is
>turned off, because firmware leaves PCI_COMMAND_MEMORY enabled for
>many devices, even though nobody is using them.
>
>Anyway, I think that's a project for another day.  That's too much to
>tackle for the limited problem you're trying to solve.
>

Bjorn, it's all about discussion. Please take your time and reply when
you have bandwidth.

Well, some drivers break the order and expect the relaxed order to work.
One example is drivers/char/agp/efficeon-agp.c::agp_efficeon_probe(). I
didn't check all usage cases.

I think it's hard for a caller of pci_update_resource() to know that
nobody is using the device (limited to memory BARs, which are our
concern here). A memory write is usually a posted transaction and it can
be on the way to the target device when pci_update_resource() is called,
so there's no telling which transaction will complete first: disabling
memory decoding, or the memory write. I guess it can happen even with
mutual exclusion, especially on an SMP system. Yes, the situation is
worse without the synchronization.

.../...

>> 
>> Yeah, it would be the solution to have. If you agree, I will post
>> updated version according to this: Clearing PCI_SRIOV_CTRL_MSE when
>> updating IOV BARs. The bit won't be touched if pdev->mmio_always_on
>> is true.
>
>I think you should ignore pdev->mmio_always_on for IOV BARs.
>mmio_always_on is basically a workaround for devices that either don't
>follow the spec or where we didn't completely understand the problem.
>I don't think there's any reason to set mmio_always_on for SR-IOV
>devices.
>

Agree, thanks for the comments again. I will post the updated version
shortly.

Thanks,
Gavin



Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

2016-10-25 Thread Reza Arbab

On Wed, Oct 26, 2016 at 09:34:18AM +1100, Balbir Singh wrote:

> I still believe we need your changes, I was wondering if we've tested
> it against normal memory nodes and checked if any memblock
> allocations end up there. Michael showed me some memblock
> allocations on node 1 of a two node machine with movable_node

The movable_node option is x86-only. Both of those nodes contain normal
memory, so allocations on both are allowed.

>> Longer; if you use "movable_node", x86 can identify these nodes at
>> boot. They call memblock_mark_hotplug() while parsing the SRAT. Then,
>> when the zones are initialized, those markings are used to determine
>> ZONE_MOVABLE.
>>
>> We have no analog of this SRAT information, so our movable nodes can
>> only be created post boot, by hotplugging and explicitly onlining
>> with online_movable.

> Is this true for all of system memory as well or only for nodes
> hotplugged later?

As far as I know, power has nothing like the SRAT that tells us, at
boot, which memory is hotpluggable. So there is nothing to wire the
movable_node option up to.

Of course, any memory you hotplug afterwards is, by definition,
hotpluggable. So we can still create movable nodes that way.


--
Reza Arbab



[PATCH net-next] ibmveth: v1 calculate correct gso_size and set gso_type

2016-10-25 Thread Jon Maxwell
We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because
one side was advertising a zero window repeatedly.

We narrowed this down to the fact that the ibmveth driver did not set
gso_size, which is translated by TCP into the MSS later up the stack.
The MSS is used to calculate the TCP window size and as that was
abnormally large, it was calculating a zero window, even though the
socket's receive buffer was completely empty.
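
For illustration, with a standard 1500-byte MTU and an IPv4/TCP frame,
the gso_size computed here works out to a sane MSS (example numbers
only, not taken from the affected systems):

  hdr_len  = ETH_HLEN + sizeof(struct iphdr) + sizeof(struct tcphdr)
           = 14 + 20 + 20 = 54 bytes
  gso_size = netdev->mtu - hdr_len = 1500 - 54 = 1446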

We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
and Marcelo for all your help and review on this.

The patch fixes both our internal reproduction tests and our customers' tests.

Signed-off-by: Jon Maxwell 
---
 drivers/net/ethernet/ibm/ibmveth.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
b/drivers/net/ethernet/ibm/ibmveth.c
index 29c05d0..c51717e 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int 
budget)
int frames_processed = 0;
unsigned long lpar_rc;
struct iphdr *iph;
+   bool large_packet = 0;
+   u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
 
 restart_poll:
while (frames_processed < budget) {
@@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int 
budget)
iph->check = 0;
iph->check = 
ip_fast_csum((unsigned char *)iph, iph->ihl);
adapter->rx_large_packets++;
+   large_packet = 1;
}
}
}
 
+   if (skb->len > netdev->mtu) {
+   iph = (struct iphdr *)skb->data;
+   if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
+   iph->protocol == IPPROTO_TCP) {
+   hdr_len += sizeof(struct iphdr);
+   skb_shinfo(skb)->gso_type = 
SKB_GSO_TCPV4;
+   skb_shinfo(skb)->gso_size = netdev->mtu 
- hdr_len;
+   } else if (be16_to_cpu(skb->protocol) == 
ETH_P_IPV6 &&
+  iph->protocol == IPPROTO_TCP) {
+   hdr_len += sizeof(struct ipv6hdr);
+   skb_shinfo(skb)->gso_type = 
SKB_GSO_TCPV6;
+   skb_shinfo(skb)->gso_size = netdev->mtu 
- hdr_len;
+   }
+   if (!large_packet)
+   adapter->rx_large_packets++;
+   }
+
napi_gro_receive(napi, skb);/* send it up */
 
netdev->stats.rx_packets++;
-- 
1.8.3.1



Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

2016-10-25 Thread Balbir Singh


On 26/10/16 02:55, Reza Arbab wrote:
> On Tue, Oct 25, 2016 at 11:15:40PM +1100, Balbir Singh wrote:
>> After the ack, I realized there were some more checks needed, IOW
>> questions for you :)
> 
> Hey! No takebacks!
> 

I still believe we need your changes, I was wondering if we've tested
it against normal memory nodes and checked if any memblock
allocations end up there. Michael showed me some memblock
allocations on node 1 of a two node machine with movable_node
I'll double check at my end. See my question below


> The short answer is that neither of these is a concern.
> 
> Longer; if you use "movable_node", x86 can identify these nodes at boot. They 
> call memblock_mark_hotplug() while parsing the SRAT. Then, when the zones are 
> initialized, those markings are used to determine ZONE_MOVABLE.
> 
> We have no analog of this SRAT information, so our movable nodes can only be 
> created post boot, by hotplugging and explicitly onlining with online_movable.
>

Is this true for all of system memory as well or only for nodes
hotplugged later?

Balbir Singh.


Re: [PATCH V6 7/8] powerpc: Check arch.vec earlier during boot for memory features

2016-10-25 Thread Michael Bringmann
> On 09/21/2016 09:17 AM, Michael Bringmann wrote:
>> architecture.vec5 features: The boot-time memory management needs to
>> know the form of the "ibm,dynamic-memory-v2" property early during
>> scanning of the flattened device tree.  This patch moves execution of
>> the function pseries_probe_fw_features() early enough to be before
>> the scanning of the memory properties in the device tree to allow
>> recognition of the supported properties.
>>
>> [V2: No change]
>> [V3: Updated after commit 3808a88985b4f5f5e947c364debce4441a380fb8.]
>> [V4: Update comments]
>> [V5: Resynchronize/resubmit]
>> [V6: Resync to v4.7 kernel code]
>>
>> Signed-off-by: Michael Bringmann 
>> ---
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index 946e34f..2034edc 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -753,6 +753,9 @@ void __init early_init_devtree(void *params)
>>   */
>>  of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line);
>>
>> +/* Now try to figure out if we are running on LPAR and so on */
>> +pseries_probe_fw_features();
>> +
> 
> I'll have to defer to others on whether calling this earlier in boot
> is ok.

It is scanning the flattened device tree supplied by the BMC, though this
is not the first such call to do so.  The relevant content of the device
tree should not change between the new, earlier call site and the
original, later location.

> I do notice that you do not remove the call later on, any reason?

Bug in patch.  Corrected in next patch group submission.

> -Nathan
> 
>>  /* Scan memory nodes and rebuild MEMBLOCKs */
>>  of_scan_flat_dt(early_init_dt_scan_root, NULL);
>>  of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);
>>
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



[net-next PATCH 18/27] arch/powerpc: Add option to skip DMA sync as a part of mapping

2016-10-25 Thread Alexander Duyck
This change allows us to pass DMA_ATTR_SKIP_CPU_SYNC so that we can
avoid invoking cache line invalidation if the driver will just handle
it later via a sync_for_cpu or sync_for_device call.
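
For example, a driver receive path would then look roughly like this
(a sketch only; dma_map_page_attrs() is the attrs-taking helper
introduced earlier in this series):

	/* Map without the implicit sync; we sync explicitly below. */
	dma_addr_t addr = dma_map_page_attrs(dev, page, 0, PAGE_SIZE,
					     DMA_FROM_DEVICE,
					     DMA_ATTR_SKIP_CPU_SYNC);

	/* ... later, once the device has written 'len' bytes ... */
	dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);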

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Alexander Duyck 
---
 arch/powerpc/kernel/dma.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index e64a601..6877e3f 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -203,6 +203,10 @@ static int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl,
for_each_sg(sgl, sg, nents, i) {
sg->dma_address = sg_phys(sg) + get_dma_offset(dev);
sg->dma_length = sg->length;
+
+   if (attrs & DMA_ATTR_SKIP_CPU_SYNC)
+   continue;
+
__dma_sync_page(sg_page(sg), sg->offset, sg->length, direction);
}
 
@@ -235,7 +239,10 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
 unsigned long attrs)
 {
BUG_ON(dir == DMA_NONE);
-   __dma_sync_page(page, offset, size, dir);
+
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+   __dma_sync_page(page, offset, size, dir);
+
return page_to_phys(page) + offset + get_dma_offset(dev);
 }
 



[PATCH V8 0/8] powerpc/devtree: Add support for 2 new DRC properties

2016-10-25 Thread Michael Bringmann
Several properties in the DRC device tree format are replaced by
more compact representations to allow, for example, the encoding
of vast amounts of memory, and/or reduced duplication of information
in related data structures.

"ibm,drc-info": This property, when present, replaces the following
four properties: "ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains".  This property is defined for all
dynamically reconfigurable platform nodes.  The "ibm,drc-info" elements
are intended to provide a more compact representation, and reduce some
search overhead.

"ibm,dynamic-memory-v2": This property replaces the "ibm,dynamic-memory"
node representation within the "ibm,dynamic-reconfiguration-memory"
property provided by the BMC.  This element format is intended to provide
a more compact representation of memory, especially for systems with
massive amounts of RAM.  To simplify portability, this property is
converted to the "ibm,dynamic-memory" property during system boot.
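
As a rough illustration of the compaction, each "ibm,dynamic-memory-v2"
entry describes a whole run of consecutive LMBs sharing the same
attributes (the field names below are assumptions for readability, not
authoritative PAPR text):

	struct of_drconf_cell_v2 {
		u32	seq_lmbs;	/* consecutive LMBs in this set */
		u64	base_addr;	/* address of the first LMB */
		u32	drc_index;	/* DRC index of the first LMB */
		u32	aa_index;	/* associativity array index */
		u32	flags;		/* flags for the whole set */
	} __packed;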

"ibm,architecture.vec": Bidirectional communication mechanism between
the host system and the front end processor indicating what features
the host system supports and what features the front end processor will
actually provide.  In this case, we are indicating that the host system
can support the new device tree structures "ibm,drc-info" and
"ibm,dynamic-memory-v2".

[V1: Initial presentation of PAPR 2.7 changes to device tree.]
[V2: Revise constant names.  Fix some syntax errors.  Improve comments.]
[V3: Revise tests for presence of new properties to always scan devicetree
 instead of depending upon architecture vec, due to reboot issues.]
[V4: Rearrange some code changes in patches to better match application,
 and other code cleanup.]
[V5: Resynchronize patches.]
[V6: Resync to latest kernel commit code]
[V7: Correct mail threading]
[v8: Insert more useful variable names]

Signed-off-by: Michael Bringmann 

---

Michael Bringmann (8):
  powerpc/firmware: Add definitions for new firmware features.
  powerpc/memory: Parse new memory property to register blocks.
  powerpc/memory: Parse new memory property to initialize structures.
  pseries/hotplug init: Convert new DRC memory property for hotplug runtime
  pseries/drc-info: Search new DRC properties for CPU indexes
  hotplug/drc-info: Add code to search new devtree properties
  powerpc: Check arch.vec earlier during boot for memory features
  powerpc: Enable support for new DRC devtree properties



 arch/powerpc/include/asm/firmware.h |5 -
 arch/powerpc/include/asm/prom.h |   38 -
 arch/powerpc/kernel/prom.c  |  103 +++--
 arch/powerpc/kernel/prom_init.c |3 
 arch/powerpc/mm/numa.c  |  168 ++--
 arch/powerpc/platforms/pseries/Makefile |4 
 arch/powerpc/platforms/pseries/firmware.c   |2 
 arch/powerpc/platforms/pseries/hotplug-memory.c |   93 +++
 arch/powerpc/platforms/pseries/pseries_energy.c |  189 ---
 drivers/pci/hotplug/rpadlpar_core.c |   13 +-
 drivers/pci/hotplug/rpaphp.h|4 
 drivers/pci/hotplug/rpaphp_core.c   |  109 ++---
 12 files changed, 628 insertions(+), 103 deletions(-)

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
m...@linux.vnet.ibm.com



Re: [PATCH net-next] ibmveth: calculate correct gso_size and set gso_type

2016-10-25 Thread Jonathan Maxwell
>> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);

> Compiler may optimize this, but maybe move hdr_len to [*] ?

There are other places in the stack where a u16 is used for the
same purpose, so I'd rather stick to that convention.

I'll make the other formatting changes you suggested and
resubmit as v1.

Thanks

Jon

On Tue, Oct 25, 2016 at 9:31 PM, Marcelo Ricardo Leitner
 wrote:
> On Tue, Oct 25, 2016 at 04:13:41PM +1100, Jon Maxwell wrote:
>> We recently encountered a bug where a few customers using ibmveth on the
>> same LPAR hit an issue where a TCP session hung when large receive was
>> enabled. Closer analysis revealed that the session was stuck because
>> one side was advertising a zero window repeatedly.
>>
>> We narrowed this down to the fact that the ibmveth driver did not set
>> gso_size, which is translated by TCP into the MSS later up the stack.
>> The MSS is used to calculate the TCP window size and as that was
>> abnormally large, it was calculating a zero window, even though the
>> socket's receive buffer was completely empty.
>>
>> We were able to reproduce this and worked with IBM to fix this. Thanks Tom
>> and Marcelo for all your help and review on this.
>>
>> The patch fixes both our internal reproduction tests and our customers'
>> tests.
>>
>> Signed-off-by: Jon Maxwell 
>> ---
>>  drivers/net/ethernet/ibm/ibmveth.c | 19 +++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
>> b/drivers/net/ethernet/ibm/ibmveth.c
>> index 29c05d0..3028c33 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int 
>> budget)
>>   int frames_processed = 0;
>>   unsigned long lpar_rc;
>>   struct iphdr *iph;
>> + bool large_packet = 0;
>> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
>
> Compiler may optimize this, but maybe move hdr_len to [*] ?
>
>>
>>  restart_poll:
>>   while (frames_processed < budget) {
>> @@ -1236,10 +1238,27 @@ static int ibmveth_poll(struct napi_struct *napi, 
>> int budget)
>>   iph->check = 0;
>>   iph->check = 
>> ip_fast_csum((unsigned char *)iph, iph->ihl);
>>   adapter->rx_large_packets++;
>> + large_packet = 1;
>>   }
>>   }
>>   }
>>
>> + if (skb->len > netdev->mtu) {
>
> [*]
>
>> + iph = (struct iphdr *)skb->data;
>> + if (be16_to_cpu(skb->protocol) == ETH_P_IP && 
>> iph->protocol == IPPROTO_TCP) {
>
> The if line above is too long, should be broken in two.
>
>> + hdr_len += sizeof(struct iphdr);
>> + skb_shinfo(skb)->gso_type = 
>> SKB_GSO_TCPV4;
>> + skb_shinfo(skb)->gso_size = 
>> netdev->mtu - hdr_len;
>> + } else if (be16_to_cpu(skb->protocol) == 
>> ETH_P_IPV6 &&
>> + iph->protocol == IPPROTO_TCP) {
> ^
> And this one should start 3 spaces later, right below be16_
>
>   Marcelo
>
>> + hdr_len += sizeof(struct ipv6hdr);
>> + skb_shinfo(skb)->gso_type = 
>> SKB_GSO_TCPV6;
>> + skb_shinfo(skb)->gso_size = 
>> netdev->mtu - hdr_len;
>> + }
>> + if (!large_packet)
>> + adapter->rx_large_packets++;
>> + }
>> +
>>   napi_gro_receive(napi, skb);/* send it up */
>>
>>   netdev->stats.rx_packets++;
>> --
>> 1.8.3.1
>>


[PATCHv2 7/7] mm: kill arch_mremap

2016-10-25 Thread Dmitry Safonov
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and
commit 2ae416b142b6 ("mm: new mm hook framework").
It also keeps the same functionality of mremapping the vDSO blob by
introducing a vm_special_mapping mremap op for powerpc.
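
Roughly, that op looks like the sketch below (modeled on the existing
x86 counterpart; the actual powerpc hunk is truncated in this archive
and may differ):

	static int vdso_mremap(const struct vm_special_mapping *sm,
			       struct vm_area_struct *new_vma)
	{
		current->mm->context.vdso_base = new_vma->vm_start;
		return 0;
	}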

Cc: Laurent Dufour 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Kirill A. Shutemov" 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: Andrew Morton 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
v2: use vdso64_pages only under CONFIG_PPC64

 arch/alpha/include/asm/Kbuild|  1 -
 arch/arc/include/asm/Kbuild  |  1 -
 arch/arm/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/Kbuild|  1 -
 arch/avr32/include/asm/Kbuild|  1 -
 arch/blackfin/include/asm/Kbuild |  1 -
 arch/c6x/include/asm/Kbuild  |  1 -
 arch/cris/include/asm/Kbuild |  1 -
 arch/frv/include/asm/Kbuild  |  1 -
 arch/h8300/include/asm/Kbuild|  1 -
 arch/hexagon/include/asm/Kbuild  |  1 -
 arch/ia64/include/asm/Kbuild |  1 -
 arch/m32r/include/asm/Kbuild |  1 -
 arch/m68k/include/asm/Kbuild |  1 -
 arch/metag/include/asm/Kbuild|  1 -
 arch/microblaze/include/asm/Kbuild   |  1 -
 arch/mips/include/asm/Kbuild |  1 -
 arch/mn10300/include/asm/Kbuild  |  1 -
 arch/nios2/include/asm/Kbuild|  1 -
 arch/openrisc/include/asm/Kbuild |  1 -
 arch/parisc/include/asm/Kbuild   |  1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 
 arch/powerpc/kernel/vdso.c   | 25 +
 arch/powerpc/kernel/vdso_common.c|  1 +
 arch/s390/include/asm/Kbuild |  1 -
 arch/score/include/asm/Kbuild|  1 -
 arch/sh/include/asm/Kbuild   |  1 -
 arch/sparc/include/asm/Kbuild|  1 -
 arch/tile/include/asm/Kbuild |  1 -
 arch/um/include/asm/Kbuild   |  1 -
 arch/unicore32/include/asm/Kbuild|  1 -
 arch/x86/include/asm/Kbuild  |  1 -
 arch/xtensa/include/asm/Kbuild   |  1 -
 include/asm-generic/mm-arch-hooks.h  | 16 
 include/linux/mm-arch-hooks.h| 25 -
 mm/mremap.c  |  4 
 36 files changed, 26 insertions(+), 103 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index bf8475ce85ee..0a5e0ec2842b 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -6,7 +6,6 @@ generic-y += exec.h
 generic-y += export.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index c332604606dd..e6059a808463 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,7 +22,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 0745538b26d3..44b717cb4a55 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
-generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
 generic-y += msi.h
 generic-y += param.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..a42a1367aea4 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 241b9b9729d8..519810d0d5e1 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index 91d49c0a3118..c80181e4454f 100644
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild



Re: [PATCH] powerpc: Use pr_warn instead of pr_warning

2016-10-25 Thread Geoff Levand
On 10/24/2016 09:00 PM, Joe Perches wrote:
> At some point, pr_warning will be removed so all logging messages use
> a consistent _warn style.
> 
> Update arch/powerpc/

>  arch/powerpc/platforms/ps3/device-init.c| 12 +---
>  arch/powerpc/platforms/ps3/mm.c |  4 ++--
>  arch/powerpc/platforms/ps3/os-area.c|  2 +-

PS3 parts look OK.

Acked-by: Geoff Levand 


Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

2016-10-25 Thread Reza Arbab

On Tue, Oct 25, 2016 at 11:15:40PM +1100, Balbir Singh wrote:

> After the ack, I realized there were some more checks needed, IOW
> questions for you :)


Hey! No takebacks!

The short answer is that neither of these is a concern.

Longer; if you use "movable_node", x86 can identify these nodes at boot. 
They call memblock_mark_hotplug() while parsing the SRAT. Then, when the 
zones are initialized, those markings are used to determine ZONE_MOVABLE.
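
For reference, the x86 path amounts to something like this (a
simplified sketch with assumed variable names; the real code lives in
arch/x86/mm/srat.c):

	/* While parsing an SRAT memory affinity entry ... */
	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) &&
	    movable_node_is_enabled())
		memblock_mark_hotplug(start, ma->length);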


We have no analog of this SRAT information, so our movable nodes can 
only be created post boot, by hotplugging and explicitly onlining with 
online_movable.



> 1. Have you checked to see if our memblock allocations spill
> over to probably hotpluggable nodes?


Since our nodes don't exist at boot, we don't have that short window 
before the zones are drawn where the node has normal memory, and a 
kernel allocation might occur within.



> 2. Shouldn't we be marking nodes discovered as movable via
> memblock_mark_hotplug()?


Again, this early boot marking mechanism only applies to movable_node.

--
Reza Arbab



[PATCH 4/7] powerpc/vdso: introduce init_vdso{32,64}_pagelist

2016-10-25 Thread Dmitry Safonov
Common code with allocation/initialization of vDSO's pagelist.

Impact: cleanup
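
For context, vdso_common.c relies on token pasting to generate the
per-flavor names; a minimal sketch of the scheme (the helper
definitions are assumed here, since only their #undefs appear in the
diffs):

	#define _CONCAT3(a, b, c)	a##b##c
	#define CONCAT3(a, b, c)	_CONCAT3(a, b, c)

	/* With BITS defined to 32 before including vdso_common.c: */
	#define vdso_pages CONCAT3(vdso, BITS, _pages)	/* -> vdso32_pages */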

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c| 27 ++-
 arch/powerpc/kernel/vdso_common.c | 22 ++
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 8010a0d82049..25d03d773c49 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -382,8 +382,6 @@ early_initcall(vdso_getcpu_init);
 
 static int __init vdso_init(void)
 {
-   int i;
-
 #ifdef CONFIG_PPC64
/*
 * Fill up the "systemcfg" stuff for backward compatibility
@@ -454,32 +452,11 @@ static int __init vdso_init(void)
}
 
 #ifdef CONFIG_VDSO32
-   /* Make sure pages are in the correct state */
-   vdso32_pagelist = kzalloc(sizeof(struct page *) * (vdso32_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso32_pagelist == NULL);
-   for (i = 0; i < vdso32_pages; i++) {
-   struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso32_pagelist[i] = pg;
-   }
-   vdso32_pagelist[i++] = virt_to_page(vdso_data);
-   vdso32_pagelist[i] = NULL;
+   init_vdso32_pagelist();
 #endif
 
 #ifdef CONFIG_PPC64
-   vdso64_pagelist = kzalloc(sizeof(struct page *) * (vdso64_pages + 2),
- GFP_KERNEL);
-   BUG_ON(vdso64_pagelist == NULL);
-   for (i = 0; i < vdso64_pages; i++) {
-   struct page *pg = virt_to_page(vdso64_kbase + i*PAGE_SIZE);
-   ClearPageReserved(pg);
-   get_page(pg);
-   vdso64_pagelist[i] = pg;
-   }
-   vdso64_pagelist[i++] = virt_to_page(vdso_data);
-   vdso64_pagelist[i] = NULL;
+   init_vdso64_pagelist();
 #endif /* CONFIG_PPC64 */
 
get_page(virt_to_page(vdso_data));
diff --git a/arch/powerpc/kernel/vdso_common.c 
b/arch/powerpc/kernel/vdso_common.c
index ac25d66134fb..c97c30606b3f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,6 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
+#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -202,6 +203,25 @@ static __init int vdso_setup(struct lib_elfinfo *v)
return 0;
 }
 
+#define init_vdso_pagelist CONCAT3(init_vdso, BITS, _pagelist)
+static __init void init_vdso_pagelist(void)
+{
+   int i;
+
+   /* Make sure pages are in the correct state */
+   vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
+ GFP_KERNEL);
+   BUG_ON(vdso_pagelist == NULL);
+   for (i = 0; i < vdso_pages; i++) {
+   struct page *pg = virt_to_page(vdso_kbase + i*PAGE_SIZE);
+
+   ClearPageReserved(pg);
+   get_page(pg);
+   vdso_pagelist[i] = pg;
+   }
+   vdso_pagelist[i++] = virt_to_page(vdso_data);
+   vdso_pagelist[i] = NULL;
+}
 
 #undef find_section
 #undef find_symbol
@@ -211,10 +231,12 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 #undef vdso_fixup_datapage
 #undef vdso_fixup_features
 #undef vdso_setup
+#undef init_vdso_pagelist
 
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
+#undef vdso_pagelist
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.0



[PATCH 3/7] powerpc/vdso: separate common code in vdso_common

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

I also switched usage of printk(KERN_..., ...) to pr_...(...)
and used the pr_fmt() macro for the "vDSO{32,64}: " prefix.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c| 352 ++
 arch/powerpc/kernel/vdso_common.c | 221 
 2 files changed, 234 insertions(+), 339 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso_common.c

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 278b9aa25a1c..8010a0d82049 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,13 +51,13 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static void *vdso32_kbase;
 static struct page **vdso32_pagelist;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
 #ifdef CONFIG_VDSO32
 extern char vdso32_start, vdso32_end;
+static void *vdso32_kbase;
 #endif
 
 #ifdef CONFIG_PPC64
@@ -246,250 +246,16 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
 }
 
-
-
 #ifdef CONFIG_VDSO32
-static void * __init find_section32(Elf32_Ehdr *ehdr, const char *secname,
- unsigned long *size)
-{
-   Elf32_Shdr *sechdrs;
-   unsigned int i;
-   char *secnames;
-
-   /* Grab section headers and strings so we can tell who is who */
-   sechdrs = (void *)ehdr + ehdr->e_shoff;
-   secnames = (void *)ehdr + sechdrs[ehdr->e_shstrndx].sh_offset;
-
-   /* Find the section they want */
-   for (i = 1; i < ehdr->e_shnum; i++) {
-   if (strcmp(secnames+sechdrs[i].sh_name, secname) == 0) {
-   if (size)
-   *size = sechdrs[i].sh_size;
-   return (void *)ehdr + sechdrs[i].sh_offset;
-   }
-   }
-   *size = 0;
-   return NULL;
-}
-
-static Elf32_Sym * __init find_symbol32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   unsigned int i;
-   char name[MAX_SYMNAME], *c;
-
-   for (i = 0; i < (lib->dynsymsize / sizeof(Elf32_Sym)); i++) {
-   if (lib->dynsym[i].st_name == 0)
-   continue;
-   strlcpy(name, lib->dynstr + lib->dynsym[i].st_name,
-   MAX_SYMNAME);
-   c = strchr(name, '@');
-   if (c)
-   *c = 0;
-   if (strcmp(symname, name) == 0)
-   return &lib->dynsym[i];
-   }
-   return NULL;
-}
-
-/* Note that we assume the section is .text and the symbol is relative to
- * the library base
- */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   Elf32_Sym *sym = find_symbol32(lib, symname);
-
-   if (sym == NULL) {
-   printk(KERN_WARNING "vDSO32: function %s not found !\n",
-  symname);
-   return 0;
-   }
-   return sym->st_value - VDSO32_LBASE;
-}
-
-static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  const char *orig, const char *fix)
-{
-   Elf32_Sym *sym32_gen, *sym32_fix;
-
-   sym32_gen = find_symbol32(v32, orig);
-   if (sym32_gen == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", orig);
-   return -1;
-   }
-   if (fix == NULL) {
-   sym32_gen->st_name = 0;
-   return 0;
-   }
-   sym32_fix = find_symbol32(v32, fix);
-   if (sym32_fix == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol %s !\n", fix);
-   return -1;
-   }
-   sym32_gen->st_value = sym32_fix->st_value;
-   sym32_gen->st_size = sym32_fix->st_size;
-   sym32_gen->st_info = sym32_fix->st_info;
-   sym32_gen->st_other = sym32_fix->st_other;
-   sym32_gen->st_shndx = sym32_fix->st_shndx;
-
-   return 0;
-}
-#else /* !CONFIG_VDSO32 */
-static unsigned long __init find_function32(struct lib32_elfinfo *lib,
-   const char *symname)
-{
-   return 0;
-}
-
-static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  const char *orig, const char *fix)
-{
-   return 0;
-}
+#include "vdso_common.c"
 #endif /* CONFIG_VDSO32 */
 
-
 #ifdef CONFIG_PPC64
-
-static void * __init find_section64(Elf64_Ehdr *ehdr, const char *secname,
- unsigned long *size)
-{
-   Elf64_Shdr *sechdrs;
-   unsigned int i;
-   char *secnames;
-
-   /* Grab section headers and strings so 

[PATCH 7/7] mm: kill arch_mremap

2016-10-25 Thread Dmitry Safonov
This reverts commit 4abad2ca4a4d ("mm: new arch_remap() hook") and
commit 2ae416b142b6 ("mm: new mm hook framework").
It also keeps the same functionality of mremapping the vDSO blob by
introducing a vm_special_mapping mremap op for powerpc.

Cc: Laurent Dufour 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Kirill A. Shutemov" 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: Andrew Morton 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/alpha/include/asm/Kbuild|  1 -
 arch/arc/include/asm/Kbuild  |  1 -
 arch/arm/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/Kbuild|  1 -
 arch/avr32/include/asm/Kbuild|  1 -
 arch/blackfin/include/asm/Kbuild |  1 -
 arch/c6x/include/asm/Kbuild  |  1 -
 arch/cris/include/asm/Kbuild |  1 -
 arch/frv/include/asm/Kbuild  |  1 -
 arch/h8300/include/asm/Kbuild|  1 -
 arch/hexagon/include/asm/Kbuild  |  1 -
 arch/ia64/include/asm/Kbuild |  1 -
 arch/m32r/include/asm/Kbuild |  1 -
 arch/m68k/include/asm/Kbuild |  1 -
 arch/metag/include/asm/Kbuild|  1 -
 arch/microblaze/include/asm/Kbuild   |  1 -
 arch/mips/include/asm/Kbuild |  1 -
 arch/mn10300/include/asm/Kbuild  |  1 -
 arch/nios2/include/asm/Kbuild|  1 -
 arch/openrisc/include/asm/Kbuild |  1 -
 arch/parisc/include/asm/Kbuild   |  1 -
 arch/powerpc/include/asm/mm-arch-hooks.h | 28 
 arch/powerpc/kernel/vdso.c   | 19 +++
 arch/powerpc/kernel/vdso_common.c|  1 +
 arch/s390/include/asm/Kbuild |  1 -
 arch/score/include/asm/Kbuild|  1 -
 arch/sh/include/asm/Kbuild   |  1 -
 arch/sparc/include/asm/Kbuild|  1 -
 arch/tile/include/asm/Kbuild |  1 -
 arch/um/include/asm/Kbuild   |  1 -
 arch/unicore32/include/asm/Kbuild|  1 -
 arch/x86/include/asm/Kbuild  |  1 -
 arch/xtensa/include/asm/Kbuild   |  1 -
 include/asm-generic/mm-arch-hooks.h  | 16 
 include/linux/mm-arch-hooks.h| 25 -
 mm/mremap.c  |  4 
 36 files changed, 20 insertions(+), 103 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index bf8475ce85ee..0a5e0ec2842b 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -6,7 +6,6 @@ generic-y += exec.h
 generic-y += export.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
 generic-y += trace_clock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index c332604606dd..e6059a808463 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,7 +22,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 0745538b26d3..44b717cb4a55 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += irq_regs.h
 generic-y += kdebug.h
 generic-y += local.h
 generic-y += local64.h
-generic-y += mm-arch-hooks.h
 generic-y += msgbuf.h
 generic-y += msi.h
 generic-y += param.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..a42a1367aea4 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -20,7 +20,6 @@ generic-y += kvm_para.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += mman.h
 generic-y += msgbuf.h
 generic-y += msi.h
diff --git a/arch/avr32/include/asm/Kbuild b/arch/avr32/include/asm/Kbuild
index 241b9b9729d8..519810d0d5e1 100644
--- a/arch/avr32/include/asm/Kbuild
+++ b/arch/avr32/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += irq_work.h
 generic-y += local.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
diff --git a/arch/blackfin/include/asm/Kbuild b/arch/blackfin/include/asm/Kbuild
index 91d49c0a3118..c80181e4454f 100644
--- a/arch/blackfin/include/asm/Kbuild
+++ b/arch/blackfin/include/asm/Kbuild
@@ -21,7 +21,6 @@ 

[PATCH 6/7] powerpc/vdso: switch from legacy_special_mapping_vmops

2016-10-25 Thread Dmitry Safonov
This will allow introducing the mremap hook (in the next patch).

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c| 19 +++
 arch/powerpc/kernel/vdso_common.c |  8 ++--
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index e68601ffc9ad..9ee3fd65c6e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -51,7 +51,7 @@
 #define VDSO_ALIGNMENT (1 << 16)
 
 static unsigned int vdso32_pages;
-static struct page **vdso32_pagelist;
+static struct vm_special_mapping vdso32_mapping;
 unsigned long vdso32_sigtramp;
 unsigned long vdso32_rt_sigtramp;
 
@@ -64,7 +64,7 @@ static void *vdso32_kbase;
 extern char vdso64_start, vdso64_end;
static void *vdso64_kbase = &vdso64_start;
 static unsigned int vdso64_pages;
-static struct page **vdso64_pagelist;
+static struct vm_special_mapping vdso64_mapping;
 unsigned long vdso64_rt_sigtramp;
 #endif /* CONFIG_PPC64 */
 
@@ -143,10 +143,11 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+static int map_vdso(struct vm_special_mapping *vsm, unsigned long vdso_pages,
unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
+   struct vm_area_struct *vma;
int ret = 0;
 
mm->context.vdso_base = 0;
@@ -198,12 +199,14 @@ static int map_vdso(struct page **vdso_pagelist, unsigned 
long vdso_pages,
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   vma = _install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-vdso_pagelist);
-   if (ret)
+vsm);
+   if (IS_ERR(vma)) {
+   ret = PTR_ERR(vma);
current->mm->context.vdso_base = 0;
+   }
 
 out_up_mmap_sem:
up_write(&mm->mmap_sem);
@@ -220,7 +223,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
return 0;
 
if (is_32bit_task())
-   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+   return map_vdso(&vdso32_mapping, vdso32_pages, VDSO32_MBASE);
 #ifdef CONFIG_PPC64
else
/*
@@ -228,7 +231,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 * allows get_unmapped_area to find an area near other mmaps
 * and most likely share a SLB entry.
 */
-   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+   return map_vdso(&vdso64_mapping, vdso64_pages, 0);
 #endif
WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
return -1;
diff --git a/arch/powerpc/kernel/vdso_common.c 
b/arch/powerpc/kernel/vdso_common.c
index c97c30606b3f..047f6b8b230f 100644
--- a/arch/powerpc/kernel/vdso_common.c
+++ b/arch/powerpc/kernel/vdso_common.c
@@ -14,7 +14,7 @@
 #define VDSO_LBASE CONCAT3(VDSO, BITS, _LBASE)
 #define vdso_kbase CONCAT3(vdso, BITS, _kbase)
 #define vdso_pages CONCAT3(vdso, BITS, _pages)
-#define vdso_pagelist  CONCAT3(vdso, BITS, _pagelist)
+#define vdso_mapping   CONCAT3(vdso, BITS, _mapping)
 
 #undef pr_fmt
 #define pr_fmt(fmt)"vDSO" __stringify(BITS) ": " fmt
@@ -207,6 +207,7 @@ static __init int vdso_setup(struct lib_elfinfo *v)
 static __init void init_vdso_pagelist(void)
 {
int i;
+   struct page **vdso_pagelist;
 
/* Make sure pages are in the correct state */
vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 2),
@@ -221,6 +222,9 @@ static __init void init_vdso_pagelist(void)
}
vdso_pagelist[i++] = virt_to_page(vdso_data);
vdso_pagelist[i] = NULL;
+
+   vdso_mapping.pages = vdso_pagelist;
+   vdso_mapping.name = "[vdso]";
 }
 
 #undef find_section
@@ -236,7 +240,7 @@ static __init void init_vdso_pagelist(void)
 #undef VDSO_LBASE
 #undef vdso_kbase
 #undef vdso_pages
-#undef vdso_pagelist
+#undef vdso_mapping
 #undef lib_elfinfo
 #undef BITS
 #undef _CONCAT3
-- 
2.10.0



[PATCH 5/7] powerpc/vdso: split map_vdso from arch_setup_additional_pages

2016-10-25 Thread Dmitry Safonov
It'll be easier to introduce the vm_special_mapping struct in
a smaller map_vdso() function (see the next patches).

Impact: cleanup

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c | 67 +-
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 25d03d773c49..e68601ffc9ad 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -143,52 +143,23 @@ struct lib64_elfinfo
unsigned long   text;
 };
 
-
-/*
- * This is called from binfmt_elf, we create the special vma for the
- * vDSO and insert it into the mm struct tree
- */
-int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+static int map_vdso(struct page **vdso_pagelist, unsigned long vdso_pages,
+   unsigned long vdso_base)
 {
struct mm_struct *mm = current->mm;
-   struct page **vdso_pagelist;
-   unsigned long vdso_pages;
-   unsigned long vdso_base;
int ret = 0;
 
-   if (!vdso_ready)
-   return 0;
-
-#ifdef CONFIG_PPC64
-   if (is_32bit_task()) {
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-   } else {
-   vdso_pagelist = vdso64_pagelist;
-   vdso_pages = vdso64_pages;
-   /*
-* On 64bit we don't have a preferred map address. This
-* allows get_unmapped_area to find an area near other mmaps
-* and most likely share a SLB entry.
-*/
-   vdso_base = 0;
-   }
-#else
-   vdso_pagelist = vdso32_pagelist;
-   vdso_pages = vdso32_pages;
-   vdso_base = VDSO32_MBASE;
-#endif
-
-   current->mm->context.vdso_base = 0;
+   mm->context.vdso_base = 0;
 
-   /* vDSO has a problem and was disabled, just don't "enable" it for the
+   /*
+* vDSO has a problem and was disabled, just don't "enable" it for the
 * process
 */
if (vdso_pages == 0)
return 0;
+
/* Add a page to the vdso size for the data page */
-   vdso_pages ++;
+   vdso_pages++;
 
/*
 * pick a base address for the vDSO in process space. We try to put it
@@ -239,6 +210,30 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
return ret;
 }
 
+/*
+ * This is called from binfmt_elf, we create the special vma for the
+ * vDSO and insert it into the mm struct tree
+ */
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+   if (!vdso_ready)
+   return 0;
+
+   if (is_32bit_task())
+   return map_vdso(vdso32_pagelist, vdso32_pages, VDSO32_MBASE);
+#ifdef CONFIG_PPC64
+   else
+   /*
+* On 64bit we don't have a preferred map address. This
+* allows get_unmapped_area to find an area near other mmaps
+* and most likely share a SLB entry.
+*/
+   return map_vdso(vdso64_pagelist, vdso64_pages, 0);
+#endif
+   WARN_ONCE(1, "task is not 32-bit on non PPC64 kernel");
+   return -1;
+}
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
if (vma->vm_mm && vma->vm_start == vma->vm_mm->context.vdso_base)
-- 
2.10.0



[PATCH 2/7] powerpc/vdso: remove unused params in vdso_do_func_patch{32, 64}

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4ffb82a2d9e9..278b9aa25a1c 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -309,7 +309,6 @@ static unsigned long __init find_function32(struct 
lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf32_Sym *sym32_gen, *sym32_fix;
@@ -344,7 +343,6 @@ static unsigned long __init find_function32(struct 
lib32_elfinfo *lib,
 }
 
 static int __init vdso_do_func_patch32(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
return 0;
@@ -419,8 +417,7 @@ static unsigned long __init find_function64(struct 
lib64_elfinfo *lib,
 #endif
 }
 
-static int __init vdso_do_func_patch64(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64,
+static int __init vdso_do_func_patch64(struct lib64_elfinfo *v64,
   const char *orig, const char *fix)
 {
Elf64_Sym *sym64_gen, *sym64_fix;
@@ -619,11 +616,9 @@ static __init int vdso_fixup_alt_funcs(struct 
lib32_elfinfo *v32,
 * It would be easy to do, but doesn't seem to be necessary,
 * patching the OPD symbol is enough.
 */
-   vdso_do_func_patch32(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch32(v32, patch->gen_name, patch->fix_name);
 #ifdef CONFIG_PPC64
-   vdso_do_func_patch64(v32, v64, patch->gen_name,
-patch->fix_name);
+   vdso_do_func_patch64(v64, patch->gen_name, patch->fix_name);
 #endif /* CONFIG_PPC64 */
}
 
-- 
2.10.0



[PATCH 1/7] powerpc/vdso: unify return paths in setup_additional_pages

2016-10-25 Thread Dmitry Safonov
Impact: cleanup

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andy Lutomirski 
Cc: Oleg Nesterov 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org 
Signed-off-by: Dmitry Safonov 
---
 arch/powerpc/kernel/vdso.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 4111d30badfa..4ffb82a2d9e9 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -154,7 +154,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
struct page **vdso_pagelist;
unsigned long vdso_pages;
unsigned long vdso_base;
-   int rc;
+   int ret = 0;
 
if (!vdso_ready)
return 0;
@@ -203,8 +203,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
  ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
  0, 0);
if (IS_ERR_VALUE(vdso_base)) {
-   rc = vdso_base;
-   goto fail_mmapsem;
+   ret = vdso_base;
+   goto out_up_mmap_sem;
}
 
/* Add required alignment. */
@@ -227,21 +227,16 @@ int arch_setup_additional_pages(struct linux_binprm 
*bprm, int uses_interp)
 * It's fine to use that for setting breakpoints in the vDSO code
 * pages though.
 */
-   rc = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
+   ret = install_special_mapping(mm, vdso_base, vdso_pages << PAGE_SHIFT,
 VM_READ|VM_EXEC|
 VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 vdso_pagelist);
-   if (rc) {
+   if (ret)
current->mm->context.vdso_base = 0;
-   goto fail_mmapsem;
-   }
-
-   up_write(&mm->mmap_sem);
-   return 0;
 
- fail_mmapsem:
+out_up_mmap_sem:
up_write(&mm->mmap_sem);
-   return rc;
+   return ret;
 }
 
 const char *arch_vma_name(struct vm_area_struct *vma)
-- 
2.10.0



[PATCH 0/7] powerpc/mm: refactor vDSO mapping code

2016-10-25 Thread Dmitry Safonov
Cleanup patches for vDSO on powerpc.
Originally, I wanted to add vDSO remapping on arm/aarch64, and
I decided to clean up that part on powerpc first.
I've added a vm_ops hook for the vDSO, just like I did for x86.
The other changes reduce extensive code duplication.
No userspace-visible changes are expected.

Tested on qemu with buildroot rootfs.

Dmitry Safonov (7):
  powerpc/vdso: unify return paths in setup_additional_pages
  powerpc/vdso: remove unused params in vdso_do_func_patch{32,64}
  powerpc/vdso: separate common code in vdso_common
  powerpc/vdso: introduce init_vdso{32,64}_pagelist
  powerpc/vdso: split map_vdso from arch_setup_additional_pages
  powerpc/vdso: switch from legacy_special_mapping_vmops
  mm: kill arch_mremap

 arch/alpha/include/asm/Kbuild|   1 -
 arch/arc/include/asm/Kbuild  |   1 -
 arch/arm/include/asm/Kbuild  |   1 -
 arch/arm64/include/asm/Kbuild|   1 -
 arch/avr32/include/asm/Kbuild|   1 -
 arch/blackfin/include/asm/Kbuild |   1 -
 arch/c6x/include/asm/Kbuild  |   1 -
 arch/cris/include/asm/Kbuild |   1 -
 arch/frv/include/asm/Kbuild  |   1 -
 arch/h8300/include/asm/Kbuild|   1 -
 arch/hexagon/include/asm/Kbuild  |   1 -
 arch/ia64/include/asm/Kbuild |   1 -
 arch/m32r/include/asm/Kbuild |   1 -
 arch/m68k/include/asm/Kbuild |   1 -
 arch/metag/include/asm/Kbuild|   1 -
 arch/microblaze/include/asm/Kbuild   |   1 -
 arch/mips/include/asm/Kbuild |   1 -
 arch/mn10300/include/asm/Kbuild  |   1 -
 arch/nios2/include/asm/Kbuild|   1 -
 arch/openrisc/include/asm/Kbuild |   1 -
 arch/parisc/include/asm/Kbuild   |   1 -
 arch/powerpc/include/asm/mm-arch-hooks.h |  28 --
 arch/powerpc/kernel/vdso.c   | 492 +--
 arch/powerpc/kernel/vdso_common.c| 248 
 arch/s390/include/asm/Kbuild |   1 -
 arch/score/include/asm/Kbuild|   1 -
 arch/sh/include/asm/Kbuild   |   1 -
 arch/sparc/include/asm/Kbuild|   1 -
 arch/tile/include/asm/Kbuild |   1 -
 arch/um/include/asm/Kbuild   |   1 -
 arch/unicore32/include/asm/Kbuild|   1 -
 arch/x86/include/asm/Kbuild  |   1 -
 arch/xtensa/include/asm/Kbuild   |   1 -
 include/asm-generic/mm-arch-hooks.h  |  16 -
 include/linux/mm-arch-hooks.h|  25 --
 mm/mremap.c  |   4 -
 36 files changed, 323 insertions(+), 520 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/mm-arch-hooks.h
 create mode 100644 arch/powerpc/kernel/vdso_common.c
 delete mode 100644 include/asm-generic/mm-arch-hooks.h
 delete mode 100644 include/linux/mm-arch-hooks.h

-- 
2.10.0



Re: [Patch v5 04/12] irqchip: xilinx: Add support for parent intc

2016-10-25 Thread Marc Zyngier
On 25/10/16 15:44, Sören Brinkmann wrote:
> On Tue, 2016-10-25 at 12:49:33 +0200, Thomas Gleixner wrote:
>> On Tue, 25 Oct 2016, Zubair Lutfullah Kakakhel wrote:
>>> On 10/21/2016 10:48 AM, Marc Zyngier wrote:
>>>> Shouldn't you return an error if irq is zero?
>>>>
>>>
>>> I'll add the following for the error case
>>>
>>> pr_err("%s: Parent exists but interrupts property not defined\n" ,
>>> __func__);
>>
>> Please do not use this silly __func__ stuff. It's not giving any value to
>> the printout.
>>
>> Set a proper prefix for your pr_* stuff, so the string is prefixed with
>> 'irq-xilinx:' or whatever you think is appropriate. Then the string itself
>> is good enough to find from which place this printk comes.
> 
> Haven't looked at the real code, but is there perhaps a way to get a
> struct device pointer and use dev_err?

You wish. Interrupt controllers (and timers) are brought up way before
the device model is available, hence no struct device.

I've started untangling that mess a couple of times, and always ran out
of available time (you start pulling the VFS, then the scheduler, the
creation of the first thread, and then things lock up because you need
to context switch and no timer is ready yet).

I may try to spend some time on it again while travelling to LPC...

M.
-- 
Jazz is not dead. It just smells funny...


Re: [Patch v5 04/12] irqchip: xilinx: Add support for parent intc

2016-10-25 Thread Sören Brinkmann
On Tue, 2016-10-25 at 12:49:33 +0200, Thomas Gleixner wrote:
> On Tue, 25 Oct 2016, Zubair Lutfullah Kakakhel wrote:
> > On 10/21/2016 10:48 AM, Marc Zyngier wrote:
> > > Shouldn't you return an error if irq is zero?
> > > 
> > 
> > I'll add the following for the error case
> > 
> > pr_err("%s: Parent exists but interrupts property not defined\n" ,
> > __func__);
> 
> Please do not use this silly __func__ stuff. It's not giving any value to
> the printout.
> 
> Set a proper prefix for your pr_* stuff, so the string is prefixed with
> 'irq-xilinx:' or whatever you think is appropriate. Then the string itself
> is good enough to find from which place this printk comes.

Haven't looked at the real code, but is there perhaps a way to get a
struct device pointer and use dev_err?

Sören


Re: [PATCH] powerpc/pseries: fix spelling mistake: "Attemping" -> "Attempting"

2016-10-25 Thread Nathan Fontenot
On 10/24/2016 05:02 PM, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in pr_debug message
> 
> Signed-off-by: Colin Ian King 

Reviewed-by: Nathan Fontenot 

> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index a1b63e0..c8929cb 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -553,7 +553,7 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, 
> u32 drc_index)
>  {
>   int rc;
> 
> - pr_debug("Attemping to remove CPU %s, drc index: %x\n",
> + pr_debug("Attempting to remove CPU %s, drc index: %x\n",
>dn->name, drc_index);
> 
>   rc = dlpar_offline_cpu(dn);
> 



Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

2016-10-25 Thread Balbir Singh


On 11/10/16 23:26, Balbir Singh wrote:
> 
> 
> On 07/10/16 05:36, Reza Arbab wrote:
>> Currently, CONFIG_MOVABLE_NODE depends on X86_64. In preparation to
>> enable it for other arches, we need to factor a detail which is unique
>> to x86 out of the generic mm code.
>>
>> Specifically, as documented in kernel-parameters.txt, the use of
>> "movable_node" should remain restricted to x86:
>>
>> movable_node[KNL,X86] Boot-time switch to enable the effects
>> of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>>
>> This option tells x86 to find movable nodes identified by the ACPI SRAT.
>> On other arches, it would have no benefit, only the undesired side
>> effect of setting bottom-up memblock allocation.
>>
>> Since #ifdef CONFIG_MOVABLE_NODE will no longer be enough to restrict
>> this option to x86, move it to an arch-specific compilation unit
>> instead.
>>
>> Signed-off-by: Reza Arbab 
> 
> Acked-by: Balbir Singh 
> 

After the ack, I realized there were some more checks needed, IOW
questions for you :)

1. Have you checked to see if our memblock allocations spill
over to potentially hotpluggable nodes?
2. Shouldn't we be marking nodes discovered as movable via
memblock_mark_hotplug()?
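(For reference, the marking in question is a single call from the arch's
early memory scan; a sketch, assuming base/size come from the
firmware-provided affinity information:)

    /* steer bottom-up memblock allocations away from the range */
    memblock_mark_hotplug(base, size);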

Balbir Singh.


Re: [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code

2016-10-25 Thread Gautham R Shenoy
Hi Paul,

On Fri, Oct 21, 2016 at 08:04:17PM +1100, Paul Mackerras wrote:
> This fixes a race condition where one thread that is entering or
> leaving a power-saving state can inadvertently ignore the lock bit
> that was set by another thread, and potentially also clear it.
> The core_idle_lock_held function is called when the lock bit is
> seen to be set.  It polls the lock bit until it is clear, then
> does a lwarx to load the word containing the lock bit and thread
> idle bits so it can be updated.  However, it is possible that the
> value loaded with the lwarx has the lock bit set, even though an
> immediately preceding lwz loaded a value with the lock bit clear.
> If this happens then we go ahead and update the word despite the
> lock bit being set, and when called from pnv_enter_arch207_idle_mode,
> we will subsequently clear the lock bit.
> 
> No identifiable misbehaviour has been attributed to this race.
> 
> This fixes it by checking the lock bit in the value loaded by the
> lwarx.  If it is set then we just go back and keep on polling.
> 
> Fixes: b32aadc1a8ed

This fixes code which has been around since the 4.2 kernel. Should
this be marked for stable as well?
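The effect of the change is easier to see in C than in assembly; a
minimal sketch of the corrected polling logic (illustrative only;
LOCK_BIT stands in for PNV_CORE_IDLE_LOCK_BIT, and the second
atomic_load stands in for the lwarx in the real code):

    #include <stdatomic.h>

    #define LOCK_BIT 0x100UL

    /* Return a state value that is guaranteed to have the lock bit clear.
     * The crucial part is the outer re-check: the bit may have been set
     * again between the cheap poll and the load the update is based on. */
    static unsigned long wait_lock_clear(_Atomic unsigned long *state)
    {
            unsigned long val;

            do {
                    while (atomic_load(state) & LOCK_BIT)
                            ;                     /* HMT_LOW poll in the asm */
                    val = atomic_load(state);     /* the lwarx in the asm */
            } while (val & LOCK_BIT);             /* the added check */

            return val;
    }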

> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kernel/idle_book3s.S | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 0d8712a..72dac0b 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -90,6 +90,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>   * Threads will spin in HMT_LOW until the lock bit is cleared.
>   * r14 - pointer to core_idle_state
>   * r15 - used to load contents of core_idle_state
> + * r9  - used as a temporary variable
>   */
> 
>  core_idle_lock_held:
> @@ -99,6 +100,8 @@ core_idle_lock_held:
>   bne 3b
>   HMT_MEDIUM
>   lwarx   r15,0,r14
> + andi.   r9,r15,PNV_CORE_IDLE_LOCK_BIT
> + bne core_idle_lock_held
>   blr
> 
>  /*
> -- 
> 2.7.4
> 

--
Thanks and Regards
gautham.



Re: [Patch v5 04/12] irqchip: xilinx: Add support for parent intc

2016-10-25 Thread Thomas Gleixner
On Tue, 25 Oct 2016, Zubair Lutfullah Kakakhel wrote:
> On 10/21/2016 10:48 AM, Marc Zyngier wrote:
> > Shouldn't you return an error if irq is zero?
> > 
> 
> I'll add the following for the error case
> 
>   pr_err("%s: Parent exists but interrupts property not defined\n" ,
> __func__);

Please do not use this silly __func__ stuff. It's not giving any value to
the printout.

Set a proper prefix for your pr_* stuff, so the string is prefixed with
'irq-xilinx:' or whatever you think is appropriate. Then the string itself
is good enough to find from which place this printk comes.
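For the record, the usual pattern is a pr_fmt() define as the very first
line of the .c file, before any #include (a minimal sketch):

    #define pr_fmt(fmt) "irq-xilinx: " fmt

    #include <linux/kernel.h>

    /* later, no __func__ needed: */
    pr_err("parent exists but interrupts property is not defined\n");
    /* printed as: "irq-xilinx: parent exists but interrupts property is not defined" */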

Thanks,

tglx


Re: [PATCH net-next] ibmveth: calculate correct gso_size and set gso_type

2016-10-25 Thread Marcelo Ricardo Leitner
On Tue, Oct 25, 2016 at 04:13:41PM +1100, Jon Maxwell wrote:
> We recently encountered a bug where a few customers using ibmveth on the 
> same LPAR hit an issue where a TCP session hung when large receive was
> enabled. Closer analysis revealed that the session was stuck because
> one side was advertising a zero window repeatedly.
> 
> We narrowed this down to the fact that the ibmveth driver did not set gso_size,
> which is translated by TCP into the MSS later up the stack. The MSS is
> used to calculate the TCP window size, and as that was abnormally large,
> it was calculating a zero window even though the socket's receive buffer
> was completely empty.
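(To see why a missing gso_size leads to a zero window: the advertised
window is rounded down to a multiple of the MSS. A toy calculation, not
the kernel's actual __tcp_select_window() code:)

    /* free receive space is quantized to whole-MSS units */
    static unsigned int advertised_window(unsigned int free_space,
                                          unsigned int mss)
    {
            return (free_space / mss) * mss;
    }

    /* advertised_window(48 * 1024, 1448)      == 47784  (sane MSS)  */
    /* advertised_window(48 * 1024, 64 * 1024) == 0      (bogus MSS) */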
> 
> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
> and Marcelo for all your help and review on this.
> 
> The patch fixes both our internal reproduction tests and our customers' tests.
> 
> Signed-off-by: Jon Maxwell 
> ---
>  drivers/net/ethernet/ibm/ibmveth.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
> b/drivers/net/ethernet/ibm/ibmveth.c
> index 29c05d0..3028c33 100644
> --- a/drivers/net/ethernet/ibm/ibmveth.c
> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int 
> budget)
>   int frames_processed = 0;
>   unsigned long lpar_rc;
>   struct iphdr *iph;
> + bool large_packet = 0;
> + u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);

Compiler may optimize this, but maybe move hdr_len to [*]?

>  
>  restart_poll:
>   while (frames_processed < budget) {
> @@ -1236,10 +1238,27 @@ static int ibmveth_poll(struct napi_struct *napi, int 
> budget)
>   iph->check = 0;
>   iph->check = 
> ip_fast_csum((unsigned char *)iph, iph->ihl);
>   adapter->rx_large_packets++;
> + large_packet = 1;
>   }
>   }
>   }
>  
> + if (skb->len > netdev->mtu) {

[*]

> + iph = (struct iphdr *)skb->data;
> + if (be16_to_cpu(skb->protocol) == ETH_P_IP && 
> iph->protocol == IPPROTO_TCP) {

The if line above is too long, should be broken in two.

> + hdr_len += sizeof(struct iphdr);
> + skb_shinfo(skb)->gso_type = 
> SKB_GSO_TCPV4;
> + skb_shinfo(skb)->gso_size = netdev->mtu 
> - hdr_len;
> + } else if (be16_to_cpu(skb->protocol) == 
> ETH_P_IPV6 &&
> + iph->protocol == IPPROTO_TCP) {
^
And this one should start 3 spaces later, right below be16_

  Marcelo

> + hdr_len += sizeof(struct ipv6hdr);
> + skb_shinfo(skb)->gso_type = 
> SKB_GSO_TCPV6;
> + skb_shinfo(skb)->gso_size = netdev->mtu 
> - hdr_len;
> + }
> + if (!large_packet)
> + adapter->rx_large_packets++;
> + }
> +
>   napi_gro_receive(napi, skb);/* send it up */
>  
>   netdev->stats.rx_packets++;
> -- 
> 1.8.3.1
> 


Re: [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest

2016-10-25 Thread Gautham R Shenoy
Hi Paul,

[Added Shreyas's current e-mail address ]

On Fri, Oct 21, 2016 at 08:03:05PM +1100, Paul Mackerras wrote:
> Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering
> nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
> thread entering a KVM guest could switch the MMU context to the guest
> while another thread was still in host kernel context with the MMU on.
> That commit moved the point where a thread entering a power-saving
> mode set its kvm_hstate.hwthread_state field in its PACA to
> KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
> MMU had been switched off.  That commit also added a comment
> explaining that we have to switch to real mode before setting
> hwthread_state to avoid this race.
> 
> Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make
> pnv_powersave_common more generic", 2016-07-08) subsequently moved
> the setting of hwthread_state back to a point where the MMU is on,
> thus reintroducing the race, despite the comment saying that this
> should not be done being included in full in the context lines of
> the patch that did it.
>

Sorry about missing that part. I am at fault, since I reviewed the
4eae2c9ae54a patch. I will keep this in mind in the future.

> This fixes the race again and adds a bigger and shoutier comment
> explaining the potential race condition.
> 
> Cc: sta...@vger.kernel.org # v4.8
> Fixes: 4eae2c9ae54a
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kernel/idle_book3s.S | 32 ++--
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index bd739fe..0d8712a 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -163,12 +163,6 @@ _GLOBAL(pnv_powersave_common)
>   std r9,_MSR(r1)
>   std r1,PACAR1(r13)
> 
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> - /* Tell KVM we're entering idle */
> - li  r4,KVM_HWTHREAD_IN_IDLE
> - stb r4,HSTATE_HWTHREAD_STATE(r13)
> -#endif
> -
>   /*
>* Go to real mode to do the nap, as required by the architecture.
>* Also, we need to be in real mode before setting hwthread_state,
> @@ -185,6 +179,26 @@ _GLOBAL(pnv_powersave_common)
> 
>   .globl pnv_enter_arch207_idle_mode
>  pnv_enter_arch207_idle_mode:
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* Tell KVM we're entering idle */
> + li  r4,KVM_HWTHREAD_IN_IDLE
> + /**/
> + /*  N O T E   W E L L! ! !N O T E   W E L L   */
> + /* The following store to HSTATE_HWTHREAD_STATE(r13)  */
> + /* MUST occur in real mode, i.e. with the MMU off,*/
> + /* and the MMU must stay off until we clear this flag */
> + /* and test HSTATE_HWTHREAD_REQ(r13) in the system*/
> + /* reset interrupt vector in exceptions-64s.S.*/
> + /* The reason is that another thread can switch the   */
> + /* MMU to a guest context whenever this flag is set   */
> + /* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on,*/
> + /* that would potentially cause this thread to start  */
> + /* executing instructions from guest memory in*/
> + /* hypervisor mode, leading to a host crash or data   */
> + /* corruption, or worse.  */
> + /**/
> + stb r4,HSTATE_HWTHREAD_STATE(r13)
> +#endif
>   stb r3,PACA_THREAD_IDLE_STATE(r13)
>   cmpwi   cr3,r3,PNV_THREAD_SLEEP
>   bge cr3,2f
> @@ -250,6 +264,12 @@ enter_winkle:
>   * r3 - requested stop state
>   */
>  power_enter_stop:
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> + /* Tell KVM we're entering idle */
> + li  r4,KVM_HWTHREAD_IN_IDLE
> + /* DO THIS IN REAL MODE!  See comment above. */
> + stb r4,HSTATE_HWTHREAD_STATE(r13)
> +#endif
>  /*
>   * Check if the requested state is a deep idle state.
>   */
> -- 
> 2.7.4
> 



Re: [Patch v5 04/12] irqchip: xilinx: Add support for parent intc

2016-10-25 Thread Zubair Lutfullah Kakakhel

Hi,

Thanks for the review.
Some comments in-line.

On 10/21/2016 10:48 AM, Marc Zyngier wrote:

> On 17/10/16 17:52, Zubair Lutfullah Kakakhel wrote:
>> The MIPS based xilfpga platform has the following IRQ structure
>>
>> Peripherals --> xilinx_intcontroller -> mips_cpu_int controller
>>
>> Add support for the driver to chain the irq handler
>>
>> Signed-off-by: Zubair Lutfullah Kakakhel 
>>
>> ---
>> V4 -> V5
>> Rebased to v4.9-rc1
>> Missing curly braces
>>
>> V3 -> V4
>> Clean up if/else when a parent is found
>> Pass irqchip structure to handler as data
>>
>> V2 -> V3
>> Reused existing parent node instead of finding again.
>> Cleaned up handler based on review
>>
>> V1 -> V2
>>
>> No change
>> ---
>>  drivers/irqchip/irq-xilinx-intc.c | 26 --
>>  1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-xilinx-intc.c
>> b/drivers/irqchip/irq-xilinx-intc.c
>> index 45e5154..dbf8b0c 100644
>> --- a/drivers/irqchip/irq-xilinx-intc.c
>> +++ b/drivers/irqchip/irq-xilinx-intc.c
>> @@ -15,6 +15,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  /* No one else should require these constants, so define them locally here. */
>>  #define ISR 0x00   /* Interrupt Status Register */
>> @@ -154,11 +155,23 @@ static int xintc_map(struct irq_domain *d, unsigned int
>> irq, irq_hw_number_t hw)
>> .map = xintc_map,
>>  };
>>
>> +static void xil_intc_irq_handler(struct irq_desc *desc)
>> +{
>> +   u32 pending;
>> +
>> +   do {
>> +   pending = xintc_get_irq();
>> +   if (pending == -1U)
>> +   break;
>> +   generic_handle_irq(pending);
>> +   } while (true);
>
> This is missing the chained_irq_enter()/exit() calls, which will lead to
> races or lockups on the root irqchip.



I'll fix it up in the next series.
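For the archive, the fixed-up handler would look roughly like this (a
sketch; chained_irq_enter()/exit() come from
include/linux/irqchip/chained_irq.h):

    static void xil_intc_irq_handler(struct irq_desc *desc)
    {
            struct irq_chip *chip = irq_desc_get_chip(desc);
            u32 pending;

            chained_irq_enter(chip, desc);
            do {
                    pending = xintc_get_irq();
                    if (pending == -1U)
                            break;
                    generic_handle_irq(pending);
            } while (true);
            chained_irq_exit(chip, desc);
    }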


>> +}
>> +
>>  static int __init xilinx_intc_of_init(struct device_node *intc,
>> struct device_node *parent)
>>  {
>> u32 nr_irq;
>> -   int ret;
>> +   int ret, irq;
>> struct xintc_irq_chip *irqc;
>>
>> if (xintc_irqc) {
>> @@ -221,7 +234,16 @@ static int __init xilinx_intc_of_init(struct device_node
>> *intc,
>> goto err_alloc;
>> }
>>
>> -   irq_set_default_host(root_domain);
>> +   if (parent) {
>> +   irq = irq_of_parse_and_map(intc, 0);
>> +   if (irq)
>> +   irq_set_chained_handler_and_data(irq,
>> +xil_intc_irq_handler,
>> +irqc);
>> +
>
> Shouldn't you return an error if irq is zero?



I'll add the following for the error case

pr_err("%s: Parent exists but interrupts property not defined\n" , 
__func__);
goto err_alloc;

Thanks
ZubairLK


>> +   } else {
>> +   irq_set_default_host(root_domain);
>> +   }
>>
>> return 0;
>
> Thanks,
>
> M.



Re: [PATCH v4 3/5] powerpc/mm: allow memory hotplug into a memoryless node

2016-10-25 Thread Michael Ellerman
Balbir Singh  writes:
> FYI, these checks were temporary to begin with
>
> I found this in git history
>
> b226e462124522f2f23153daff31c311729dfa2f (powerpc: don't add memory to empty 
> node/zone)

Nice, thanks for digging it up.

  commit b226e462124522f2f23153daff31c311729dfa2f
  Author: Mike Kravetz 
  AuthorDate: Fri Dec 16 14:30:35 2005 -0800

      powerpc: don't add memory to empty node/zone
That is why maintainers don't like to merge "temporary" patches :)

cheers


[GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield

2016-10-25 Thread Christian Borntraeger
Peter,

here is v2 with some improved patch descriptions and some fixes. The
previous version has survived one day of linux-next and I only changed
small parts.
So unless there is some other issue, feel free to pull (or to apply
the patches) to tip/locking.

The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69:

  Linux 4.9-rc2 (2016-10-23 17:10:14 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  
tags/cpurelax

for you to fetch changes up to dcc37f9044436438360402714b7544a8e8779b07:

  processor.h: remove cpu_relax_lowlatency (2016-10-25 09:49:57 +0200)


cpu_relax: drop lowlatency, introduce yield

For spinning loops, people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example, on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding, this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.

So my proposal boils down to:
- lowest latency: use barrier() or mb() if necessary
- low latency: use cpu_relax (e.g. might give up some CPU for the other
  _hardware_ threads)
- really give up the CPU: use cpu_relax_yield
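In code, the split above looks like this (a sketch; "done" is just an
illustrative flag):

    /* low latency: stay on the CPU, at most ease off the SMT thread */
    while (!READ_ONCE(done))
            cpu_relax();

    /* stop_machine-style rendezvous: happily give up the timeslice */
    while (!READ_ONCE(done))
            cpu_relax_yield();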

PS: In the long run I would also try to provide for s390 something
like cpu_relax_yield_to with a cpu number (or just add that to
cpu_relax_yield), since a yield_to is always better than a yield as
long as we know the waiter.


Christian Borntraeger (5):
  processor.h: introduce cpu_relax_yield
  stop_machine: yield CPU during stop machine
  s390: make cpu_relax a barrier again
  processor.h: Remove cpu_relax_lowlatency users
  processor.h: remove cpu_relax_lowlatency

 arch/alpha/include/asm/processor.h  | 2 +-
 arch/arc/include/asm/processor.h| 4 ++--
 arch/arm/include/asm/processor.h| 2 +-
 arch/arm64/include/asm/processor.h  | 2 +-
 arch/avr32/include/asm/processor.h  | 2 +-
 arch/blackfin/include/asm/processor.h   | 2 +-
 arch/c6x/include/asm/processor.h| 2 +-
 arch/cris/include/asm/processor.h   | 2 +-
 arch/frv/include/asm/processor.h| 2 +-
 arch/h8300/include/asm/processor.h  | 2 +-
 arch/hexagon/include/asm/processor.h| 2 +-
 arch/ia64/include/asm/processor.h   | 2 +-
 arch/m32r/include/asm/processor.h   | 2 +-
 arch/m68k/include/asm/processor.h   | 2 +-
 arch/metag/include/asm/processor.h  | 2 +-
 arch/microblaze/include/asm/processor.h | 2 +-
 arch/mips/include/asm/processor.h   | 2 +-
 arch/mn10300/include/asm/processor.h| 2 +-
 arch/nios2/include/asm/processor.h  | 2 +-
 arch/openrisc/include/asm/processor.h   | 2 +-
 arch/parisc/include/asm/processor.h | 2 +-
 arch/powerpc/include/asm/processor.h| 2 +-
 arch/s390/include/asm/processor.h   | 4 ++--
 arch/s390/kernel/processor.c| 4 ++--
 arch/score/include/asm/processor.h  | 2 +-
 arch/sh/include/asm/processor.h | 2 +-
 arch/sparc/include/asm/processor_32.h   | 2 +-
 arch/sparc/include/asm/processor_64.h   | 2 +-
 arch/tile/include/asm/processor.h   | 2 +-
 arch/unicore32/include/asm/processor.h  | 2 +-
 arch/x86/include/asm/processor.h| 2 +-
 arch/x86/um/asm/processor.h | 2 +-
 arch/xtensa/include/asm/processor.h | 2 +-
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c | 4 ++--
 kernel/locking/mcs_spinlock.h   | 4 ++--
 kernel/locking/mutex.c  | 4 ++--
 kernel/locking/osq_lock.c   | 6 +++---
 kernel/locking/qrwlock.c| 6 +++---
 kernel/locking/rwsem-xadd.c | 4 ++--
 kernel/stop_machine.c   | 2 +-
 lib/lockref.c   | 2 +-
 42 files changed, 53 insertions(+), 53 deletions(-)



[GIT PULL v2 1/5] processor.h: introduce cpu_relax_yield

2016-10-25 Thread Christian Borntraeger
For spinning loops, people often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example, on power, sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more: it uses a hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding, this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, let's reverse the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.

Signed-off-by: Christian Borntraeger 
---
 arch/alpha/include/asm/processor.h  | 1 +
 arch/arc/include/asm/processor.h| 2 ++
 arch/arm/include/asm/processor.h| 1 +
 arch/arm64/include/asm/processor.h  | 1 +
 arch/avr32/include/asm/processor.h  | 1 +
 arch/blackfin/include/asm/processor.h   | 1 +
 arch/c6x/include/asm/processor.h| 1 +
 arch/cris/include/asm/processor.h   | 1 +
 arch/frv/include/asm/processor.h| 1 +
 arch/h8300/include/asm/processor.h  | 1 +
 arch/hexagon/include/asm/processor.h| 1 +
 arch/ia64/include/asm/processor.h   | 1 +
 arch/m32r/include/asm/processor.h   | 1 +
 arch/m68k/include/asm/processor.h   | 1 +
 arch/metag/include/asm/processor.h  | 1 +
 arch/microblaze/include/asm/processor.h | 1 +
 arch/mips/include/asm/processor.h   | 1 +
 arch/mn10300/include/asm/processor.h| 1 +
 arch/nios2/include/asm/processor.h  | 1 +
 arch/openrisc/include/asm/processor.h   | 1 +
 arch/parisc/include/asm/processor.h | 1 +
 arch/powerpc/include/asm/processor.h| 1 +
 arch/s390/include/asm/processor.h   | 3 ++-
 arch/s390/kernel/processor.c| 4 ++--
 arch/score/include/asm/processor.h  | 1 +
 arch/sh/include/asm/processor.h | 1 +
 arch/sparc/include/asm/processor_32.h   | 1 +
 arch/sparc/include/asm/processor_64.h   | 1 +
 arch/tile/include/asm/processor.h   | 1 +
 arch/unicore32/include/asm/processor.h  | 1 +
 arch/x86/include/asm/processor.h| 1 +
 arch/x86/um/asm/processor.h | 1 +
 arch/xtensa/include/asm/processor.h | 1 +
 33 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h 
b/arch/alpha/include/asm/processor.h
index 43a7559..0556fda 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -58,6 +58,7 @@ unsigned long get_wchan(struct task_struct *p);
   ((tsk) == current ? rdusp() : task_thread_info(tsk)->pcb.usp)
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 16b630f..6c158d5 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -60,6 +60,7 @@ struct task_struct;
 #ifndef CONFIG_EZNPS_MTM_EXT
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #else
@@ -67,6 +68,7 @@ struct task_struct;
 #define cpu_relax() \
__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() barrier()
 
 #endif
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 8a1e8e9..db660e0 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -82,6 +82,7 @@ unsigned long get_wchan(struct task_struct *p);
 #define cpu_relax() barrier()
 #endif
 
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 #define task_pt_regs(p) \
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 60e3482..3f9b0e5 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -149,6 +149,7 @@ static inline void cpu_relax(void)
asm volatile("yield" ::: "memory");
 }
 
+#define cpu_relax_yield() cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 
 /* Thread switching */
diff --git a/arch/avr32/include/asm/processor.h 
b/arch/avr32/include/asm/processor.h
index 941593c..e412e8b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -92,6 +92,7 @@ extern struct avr32_cpuinfo boot_cpu_data;
 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
 
 #define cpu_relax() barrier()
+#define cpu_relax_yield()  cpu_relax()
 #define cpu_relax_lowlatency() cpu_relax()
 #define cpu_sync_pipeline()

[GIT PULL v2 5/5] processor.h: remove cpu_relax_lowlatency

2016-10-25 Thread Christian Borntraeger
As there are no users left, we can remove cpu_relax_lowlatency.

Signed-off-by: Christian Borntraeger 
---
 arch/alpha/include/asm/processor.h  | 1 -
 arch/arc/include/asm/processor.h| 2 --
 arch/arm/include/asm/processor.h| 1 -
 arch/arm64/include/asm/processor.h  | 1 -
 arch/avr32/include/asm/processor.h  | 1 -
 arch/blackfin/include/asm/processor.h   | 1 -
 arch/c6x/include/asm/processor.h| 1 -
 arch/cris/include/asm/processor.h   | 1 -
 arch/frv/include/asm/processor.h| 1 -
 arch/h8300/include/asm/processor.h  | 1 -
 arch/hexagon/include/asm/processor.h| 1 -
 arch/ia64/include/asm/processor.h   | 1 -
 arch/m32r/include/asm/processor.h   | 1 -
 arch/m68k/include/asm/processor.h   | 1 -
 arch/metag/include/asm/processor.h  | 1 -
 arch/microblaze/include/asm/processor.h | 1 -
 arch/mips/include/asm/processor.h   | 1 -
 arch/mn10300/include/asm/processor.h| 1 -
 arch/nios2/include/asm/processor.h  | 1 -
 arch/openrisc/include/asm/processor.h   | 1 -
 arch/parisc/include/asm/processor.h | 1 -
 arch/powerpc/include/asm/processor.h| 1 -
 arch/s390/include/asm/processor.h   | 1 -
 arch/score/include/asm/processor.h  | 1 -
 arch/sh/include/asm/processor.h | 1 -
 arch/sparc/include/asm/processor_32.h   | 1 -
 arch/sparc/include/asm/processor_64.h   | 1 -
 arch/tile/include/asm/processor.h   | 1 -
 arch/unicore32/include/asm/processor.h  | 1 -
 arch/x86/include/asm/processor.h| 1 -
 arch/x86/um/asm/processor.h | 1 -
 arch/xtensa/include/asm/processor.h | 1 -
 32 files changed, 33 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h 
b/arch/alpha/include/asm/processor.h
index 0556fda..31e8dbe 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -59,7 +59,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define ARCH_HAS_PREFETCH
 #define ARCH_HAS_PREFETCHW
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 6c158d5..d102a49 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -61,7 +61,6 @@ struct task_struct;
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield()  cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #else
 
@@ -69,7 +68,6 @@ struct task_struct;
__asm__ __volatile__ (".word %0" : : "i"(CTOP_INST_SCHD_RW) : "memory")
 
 #define cpu_relax_yield()  cpu_relax()
-#define cpu_relax_lowlatency() barrier()
 
 #endif
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index db660e0..9e71c58b 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -83,7 +83,6 @@ unsigned long get_wchan(struct task_struct *p);
 #endif
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 #define task_pt_regs(p) \
((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index 3f9b0e5..6132f64 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -150,7 +150,6 @@ static inline void cpu_relax(void)
 }
 
 #define cpu_relax_yield() cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Thread switching */
 extern struct task_struct *cpu_switch_to(struct task_struct *prev,
diff --git a/arch/avr32/include/asm/processor.h 
b/arch/avr32/include/asm/processor.h
index e412e8b..ee62365 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -93,7 +93,6 @@ extern struct avr32_cpuinfo boot_cpu_data;
 
 #define cpu_relax() barrier()
 #define cpu_relax_yield()  cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 #define cpu_sync_pipeline() asm volatile("sub pc, -2" : : : "memory")
 
 struct cpu_context {
diff --git a/arch/blackfin/include/asm/processor.h 
b/arch/blackfin/include/asm/processor.h
index 8b8704a..57acfb1 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -93,7 +93,6 @@ unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() smp_mb()
 #define cpu_relax_yield()  cpu_relax()
-#define cpu_relax_lowlatency() cpu_relax()
 
 /* Get the Silicon Revision of the chip */
 static inline uint32_t __pure bfin_revid(void)
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 914d730..1fd22e7 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -122,7 +122,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax() do { } while (0)
 #define cpu_relax_yield()

[GIT PULL v2 3/5] s390: make cpu_relax a barrier again

2016-10-25 Thread Christian Borntraeger
stop_machine seemed to be the only important place for yielding during
cpu_relax. This was fixed by using cpu_relax_yield. Therefore, we can
now redefine cpu_relax to be a barrier on s390 instead, making s390
identical to all other architectures.

Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h 
b/arch/s390/include/asm/processor.h
index d05965b..5d262cf 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -236,7 +236,7 @@ static inline unsigned short stap(void)
  */
 void cpu_relax_yield(void);
 
-#define cpu_relax() cpu_relax_yield()
+#define cpu_relax() barrier()
 #define cpu_relax_lowlatency()  barrier()
 
 #define ECAG_CACHE_ATTRIBUTE   0
-- 
2.5.5



[GIT PULL v2 2/5] stop_machine: yield CPU during stop machine

2016-10-25 Thread Christian Borntraeger
Some time ago, commit 57f2ffe14fd125c2 ("s390: remove diag 44 calls
from cpu_relax()") stopped cpu_relax on s390 from yielding to the
hypervisor.

As it turns out, this made stop_machine run really slow on virtualized,
overcommitted systems. For example, the kprobes test during bootup took
several seconds instead of just running unnoticed with large guests.

Therefore, the yielding was reintroduced with commit 4d92f50249eb
("s390: reintroduce diag 44 calls for cpu_relax()"), but in fact the
stop machine code seems to be the only place where this yielding
was really necessary. This place is probably the most important one
as it makes all but one guest CPUs wait for one guest CPU.

As we now have cpu_relax_yield, we can use this in multi_cpu_stop.
For now let's only add it here. We can add it later in other places
when necessary.

Signed-off-by: Christian Borntraeger 
---
 kernel/stop_machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ec9ab2f..1eb8266 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
/* Simple state machine */
do {
/* Chill out and ensure we re-read multi_stop_state. */
-   cpu_relax();
+   cpu_relax_yield();
if (msdata->state != curstate) {
curstate = msdata->state;
switch (curstate) {
-- 
2.5.5



[GIT PULL v2 4/5] processor.h: Remove cpu_relax_lowlatency users

2016-10-25 Thread Christian Borntraeger
With the s390 special case of a yielding cpu_relax implementation gone,
we can now remove all users of cpu_relax_lowlatency and replace them
with cpu_relax.

Signed-off-by: Christian Borntraeger 
---
 drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
 drivers/vhost/net.c | 4 ++--
 kernel/locking/mcs_spinlock.h   | 4 ++--
 kernel/locking/mutex.c  | 4 ++--
 kernel/locking/osq_lock.c   | 6 +++---
 kernel/locking/qrwlock.c| 6 +++---
 kernel/locking/rwsem-xadd.c | 4 ++--
 lib/lockref.c   | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c 
b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8e..383d134 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -723,7 +723,7 @@ bool __i915_spin_request(const struct drm_i915_gem_request 
*req,
if (busywait_stop(timeout_us, cpu))
break;
 
-   cpu_relax_lowlatency();
+   cpu_relax();
} while (!need_resched());
 
return false;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc128a..5dc3465 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -342,7 +342,7 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(vq->dev, endtime) &&
   vhost_vq_avail_empty(vq->dev, vq))
-   cpu_relax_lowlatency();
+   cpu_relax();
preempt_enable();
r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
  out_num, in_num, NULL, NULL);
@@ -533,7 +533,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net 
*net, struct sock *sk)
while (vhost_can_busy_poll(&net->dev, endtime) &&
    !sk_has_rx_data(sk) &&
    vhost_vq_avail_empty(&net->dev, vq))
-   cpu_relax_lowlatency();
+   cpu_relax();
 
preempt_enable();
 
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index c835270..6a385aa 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -28,7 +28,7 @@ struct mcs_spinlock {
 #define arch_mcs_spin_lock_contended(l)
\
 do {   \
while (!(smp_load_acquire(l)))  \
-   cpu_relax_lowlatency(); \
+   cpu_relax();\
 } while (0)
 #endif
 
@@ -108,7 +108,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct 
mcs_spinlock *node)
return;
/* Wait until the next pointer is set */
while (!(next = READ_ONCE(node->next)))
-   cpu_relax_lowlatency();
+   cpu_relax();
}
 
/* Pass lock to next waiter. */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..4463405 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -241,7 +241,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct 
task_struct *owner)
break;
}
 
-   cpu_relax_lowlatency();
+   cpu_relax();
}
rcu_read_unlock();
 
@@ -377,7 +377,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 * memory barriers as we'll eventually observe the right
 * values at the cost of a few extra spins.
 */
-   cpu_relax_lowlatency();
+   cpu_relax();
}
 
osq_unlock(&lock->osq);
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4ea2710 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -75,7 +75,7 @@ osq_wait_next(struct optimistic_spin_queue *lock,
break;
}
 
-   cpu_relax_lowlatency();
+   cpu_relax();
}
 
return next;
@@ -122,7 +122,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
if (need_resched())
goto unqueue;
 
-   cpu_relax_lowlatency();
+   cpu_relax();
}
return true;
 
@@ -148,7 +148,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
if (smp_load_acquire(&node->locked))
return true;
 
-   cpu_relax_lowlatency();
+   cpu_relax();
 
/*
 * Or we race against a concurrent unqueue()'s step-B, in which
diff --git a/kernel/locking/qrwlock.c 

Re: [PATCH v3 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

2016-10-25 Thread Johannes Thumshirn
On Fri, Oct 14, 2016 at 09:38:21AM +0200, Johannes Thumshirn wrote:
> On Thu, Oct 13, 2016 at 05:55:11PM +0200, Steffen Maier wrote:
> > Hm, still behaves for me like I reported for v2:
> > http://marc.info/?l=linux-scsi=147637177902937=2
> 
> Hi Steffen,
> 
> Can you please try the following on top of 2/16?
> 
> diff --git a/drivers/scsi/scsi_transport_fc.c 
> b/drivers/scsi/scsi_transport_fc.c
> index 4149dac..baebaab 100644
> --- a/drivers/scsi/scsi_transport_fc.c
> +++ b/drivers/scsi/scsi_transport_fc.c
> @@ -3786,6 +3786,12 @@ enum fc_dispatch_result {
>   int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
>   int ret;
>  
> + /* check if we really have all the request data needed */
> + if (job->request_len < cmdlen) {
> + ret = -ENOMSG;
> + goto fail_host_msg;
> + }
> +
>   /* Validate the host command */
>   switch (bsg_request->msgcode) {
>   case FC_BSG_HST_ADD_RPORT:
> @@ -3831,12 +3837,6 @@ enum fc_dispatch_result {
>   goto fail_host_msg;
>   }
>  
> - /* check if we really have all the request data needed */
> - if (job->request_len < cmdlen) {
> - ret = -ENOMSG;
> - goto fail_host_msg;
> - }
> -
>   ret = i->f->bsg_request(job);
>   if (!ret)
>   return FC_DISPATCH_UNLOCKED;
> @@ -3887,6 +3887,12 @@ enum fc_dispatch_result {
>   int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
>   int ret;
>  
> + /* check if we really have all the request data needed */
> + if (job->request_len < cmdlen) {
> + ret = -ENOMSG;
> + goto fail_rport_msg;
> + }
> +
>   /* Validate the rport command */
>   switch (bsg_request->msgcode) {
>   case FC_BSG_RPT_ELS:
> 
> 
> 
> The rationale behind this is: in fc_req_to_bsgjob() we're assigning
> job->request as req->cmd and job->request_len = req->cmd_len. But without
> checking job->request_len we don't know whether we're safe to touch
> job->request (a.k.a. bsg_request).
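(A self-contained illustration of that ordering; the struct below is a
stand-in, not the real bsg job structure:)

    #include <stdint.h>
    #include <stddef.h>

    struct bsg_job_stub {
            size_t len;                  /* req->cmd_len */
            const uint32_t *request;     /* req->cmd; first u32 is the msgcode */
    };

    static int msgcode(const struct bsg_job_stub *job)
    {
            if (job->len < sizeof(uint32_t))
                    return -1;           /* too short: don't dereference */
            return (int)job->request[0]; /* now safe to read the msgcode */
    }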

Hi Steffen,
Did you have a chance to test this? I hacked fcping to work with non-FCoE
and rports as well and tested with FCoE and lpfc. No problems seen from my
side. I've also pushed the series (with this change folded in) to my git
tree at [1] if it helps you in any way.

[1] 
https://git.kernel.org/cgit/linux/kernel/git/jth/linux.git/log/?h=scsi-bsg-rewrite-v4

Thanks a lot,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850