date:20160607

[PATCH V2] watchdog: f71808e_wdt: Add F81866 support

2016-06-07 Thread Ji-Ze Hong (Peter Hong)

Add watchdog enable support for the Fintek F81866 Super-IO chip to
the Fintek wdt driver (f71808e_wdt).

Tested and verified on iBASE MI802 Industrial PC

Datasheet references:
http://www.alldatasheet.com/datasheet-pdf/pdf/459085/FINTEK/F81866AD-I.html

Suggested-by: Guenter Roeck 
Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
Change Log:
V2:
1. Put the new F81866 registers in order with the existing ones.
2. Use the BIT() macro instead of constant register values.

 drivers/watchdog/f71808e_wdt.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/watchdog/f71808e_wdt.c b/drivers/watchdog/f71808e_wdt.c
index d4ba262..3b2de11 100644
--- a/drivers/watchdog/f71808e_wdt.c
+++ b/drivers/watchdog/f71808e_wdt.c
@@ -45,9 +45,11 @@
 #define SIO_REG_DEVREV          0x22    /* Device revision */
 #define SIO_REG_MANID           0x23    /* Fintek ID (2 bytes) */
 #define SIO_REG_ROM_ADDR_SEL    0x27    /* ROM address select */
+#define SIO_F81866_REG_PORT_SEL 0x27    /* F81866 Multi-Function Register */
 #define SIO_REG_MFUNCT1         0x29    /* Multi function select 1 */
 #define SIO_REG_MFUNCT2         0x2a    /* Multi function select 2 */
 #define SIO_REG_MFUNCT3         0x2b    /* Multi function select 3 */
+#define SIO_F81866_REG_GPIO1    0x2c    /* F81866 GPIO1 Enable Register */
 #define SIO_REG_ENABLE          0x30    /* Logical device enable */
 #define SIO_REG_ADDR            0x60    /* Logical device address (2 bytes) */
 
@@ -60,6 +62,7 @@
 #define SIO_F71882_ID  0x0541  /* Chipset ID */
 #define SIO_F71889_ID  0x0723  /* Chipset ID */
 #define SIO_F81865_ID  0x0704  /* Chipset ID */
+#define SIO_F81866_ID  0x1010  /* Chipset ID */
 
 #define F71808FG_REG_WDO_CONF  0xf0
 #define F71808FG_REG_WDT_CONF  0xf5
@@ -116,7 +119,8 @@ module_param(start_withtimeout, uint, 0);
 MODULE_PARM_DESC(start_withtimeout, "Start watchdog timer on module load with"
" given initial timeout. Zero (default) disables this feature.");
 
-enum chips { f71808fg, f71858fg, f71862fg, f71869, f71882fg, f71889fg, f81865 };
+enum chips { f71808fg, f71858fg, f71862fg, f71869, f71882fg, f71889fg, f81865,
+f81866};
 
 static const char *f71808e_names[] = {
"f71808fg",
@@ -126,6 +130,7 @@ static const char *f71808e_names[] = {
"f71882fg",
"f71889fg",
"f81865",
+   "f81866",
 };
 
 /* Super-I/O Function prototypes */
@@ -370,6 +375,22 @@ static int watchdog_start(void)
superio_clear_bit(watchdog.sioaddr, SIO_REG_MFUNCT3, 5);
break;
 
+   case f81866:
+   /* Set pin 70 to WDTRST# */
+   superio_clear_bit(watchdog.sioaddr, SIO_F81866_REG_PORT_SEL,
+ BIT(3) | BIT(0));
+   superio_set_bit(watchdog.sioaddr, SIO_F81866_REG_PORT_SEL,
+   BIT(2));
+   /*
+* GPIO1 Control Register when 27h BIT3:2 = 01 & BIT0 = 0.
+* The PIN 70(GPIO15/WDTRST) is controlled by 2Ch:
+* BIT5: 0 -> WDTRST#
+*   1 -> GPIO15
+*/
+   superio_clear_bit(watchdog.sioaddr, SIO_F81866_REG_GPIO1,
+ BIT(5));
+   break;
+
default:
/*
 * 'default' label to shut up the compiler and catch
@@ -382,7 +403,7 @@ static int watchdog_start(void)
superio_select(watchdog.sioaddr, SIO_F71808FG_LD_WDT);
superio_set_bit(watchdog.sioaddr, SIO_REG_ENABLE, 0);
 
-   if (watchdog.type == f81865)
+   if (watchdog.type == f81865 || watchdog.type == f81866)
superio_set_bit(watchdog.sioaddr, F81865_REG_WDO_CONF,
F81865_FLAG_WDOUT_EN);
else
@@ -788,6 +809,9 @@ static int __init f71808e_find(int sioaddr)
case SIO_F81865_ID:
watchdog.type = f81865;
break;
+   case SIO_F81866_ID:
+   watchdog.type = f81866;
+   break;
default:
pr_info("Unrecognized Fintek device: %04x\n",
(unsigned int)devid);
-- 
1.9.1
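The F81866 case boils down to two read-modify-write updates: in register 27h, bits 3 and 0 are cleared and bit 2 is set (BIT3:2 = 01, BIT0 = 0), and in register 2Ch bit 5 is cleared so that pin 70 acts as WDTRST#. The sketch below is a minimal model that assumes the Super-I/O registers can be treated as a plain byte array; the real driver goes through its superio_* port accessors, and all names here are hypothetical. It keeps masks and bit indices apart on purpose: judging from calls elsewhere in the driver such as superio_clear_bit(watchdog.sioaddr, SIO_REG_MFUNCT3, 5), the single-bit helpers take a bit index, so a helper of that shape expects 5 rather than BIT(5).

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical register file standing in for Super-I/O config space. */
static uint8_t sio_regs[256];

/* Read-modify-write with explicit masks: clear 'clr', then set 'set'. */
static void sio_update(uint8_t reg, uint8_t clr, uint8_t set)
{
	uint8_t val = sio_regs[reg];

	val &= (uint8_t)~clr;
	val |= set;
	sio_regs[reg] = val;
}

/* Single-bit helper taking a bit INDEX, like the driver's superio_clear_bit(). */
static void sio_clear_bit(uint8_t reg, unsigned int bit)
{
	assert(bit < 8);	/* an 8-bit register only has bits 0..7 */
	sio_regs[reg] &= (uint8_t)~BIT(bit);
}

/* Route pin 70 to WDTRST# on the modelled F81866. */
static void f81866_route_wdtrst(void)
{
	/* 27h: BIT3:2 = 01 and BIT0 = 0 selects the GPIO1 control register. */
	sio_update(0x27, BIT(3) | BIT(0), BIT(2));
	/* 2Ch: BIT5 = 0 selects WDTRST# instead of GPIO15. */
	sio_clear_bit(0x2c, 5);
}
```

Starting from 0xff in both registers, 27h ends up as 0xf6 and 2Ch as 0xdf.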

Re: [PATCH v3 3/6] watchdog: add pretimeout read-only device attribute to sysfs

2016-06-07 Thread Wolfram Sang


> + else if (attr == &dev_attr_pretimeout.attr &&
> +  !(wdd->info->options & WDIOF_PRETIMEOUT))
> + mode = 0;

Good catch, this one.
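The hunk above makes the pretimeout attribute invisible for drivers that do not advertise WDIOF_PRETIMEOUT. A user-space model of that visibility decision, assuming the sysfs is_visible convention where a return value of 0 hides the attribute; the enum, struct and function names are hypothetical, and the flag value is the one from the uapi <linux/watchdog.h> header.

```c
#include <assert.h>

#define WDIOF_PRETIMEOUT 0x0200	/* value from <linux/watchdog.h> */

/* Hypothetical stand-ins for the attribute identity and watchdog_info. */
enum wd_attr { ATTR_TIMEOUT, ATTR_PRETIMEOUT };

struct wd_info {
	unsigned int options;
};

/* Return the sysfs mode for an attribute; 0 means "do not show it". */
static unsigned int wd_attr_mode(const struct wd_info *info,
				 enum wd_attr attr, unsigned int default_mode)
{
	if (attr == ATTR_PRETIMEOUT && !(info->options & WDIOF_PRETIMEOUT))
		return 0;
	return default_mode;
}
```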




Re: [PATCH v3 4/6] watchdog: add watchdog pretimeout framework

2016-06-07 Thread Wolfram Sang

On Tue, Jun 07, 2016 at 08:38:45PM +0300, Vladimir Zapolskiy wrote:
> The change adds a simple watchdog pretimeout framework infrastructure;
> its purpose is to allow users to select the desired handling of watchdog
> pretimeout events, which may be generated by some watchdog devices.
> 
> A user selects a default watchdog pretimeout governor during
> compilation stage.
> 
> Watchdogs with WDIOF_PRETIMEOUT capability now have two device
> attributes in sysfs: pretimeout to display currently set pretimeout
> value and pretimeout_governor attribute to display the selected
> watchdog pretimeout governor.
> 
> Watchdogs without the WDIOF_PRETIMEOUT capability have no changes in
> sysfs, and such watchdog devices do not require the framework.
> 
> Signed-off-by: Vladimir Zapolskiy 
> ---
> Changes from v2 to v3:
> * essentially simplified the implementation by removing runtime
>   dynamic selection of watchdog pretimeout governors by a user; this
>   feature is supposed to be added later on

Hmm, your call, but I'm not sure this will make the reviewing process
easier...

> * removed support of sleepable watchdog pretimeout governors

This does.

> * moved sysfs device attributes to watchdog_dev.c; this required
>   adding an exported watchdog_pretimeout_governor_name() interface

Why this move? Before, all the pretimeout stuff was nicely encapsulated
in its own file, which could be compiled out. Now things are mixed. What
was wrong with the approach I took?

> @@ -244,6 +245,13 @@ static int __watchdog_register_device(struct watchdog_device *wdd)
>   }
>   }
>  
> + ret = watchdog_register_pretimeout(wdd);
> + if (ret) {
> + watchdog_dev_unregister(wdd);
> + ida_simple_remove(&watchdog_ida, wdd->id);
> + return ret;
> + }
> +

What is the advantage of adding it here instead of adding it in
watchdog_dev.c? I mean the files to control governors are tied to the
watchdog_device anyhow, so I'd think it's cleaner to move all that
action to watchdog_dev instead of having this stray one in the core.
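For reference, the error path in the quoted hunk follows the usual kernel rule that a failed registration step undoes the earlier steps in reverse order (watchdog_dev_unregister(), then ida_simple_remove()). A sketch of the same pattern with goto-based unwinding; every step function here is a hypothetical stub that only appends a letter to a trace, so the ordering can be checked.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace of what was done (upper case) and undone (lower case). */
static char trace[64];
static int fail_pretimeout;	/* set to make the last step fail */

static int get_id(void)          { strcat(trace, "I"); return 0; }
static void put_id(void)         { strcat(trace, "i"); }
static int dev_register(void)    { strcat(trace, "D"); return 0; }
static void dev_unregister(void) { strcat(trace, "d"); }

static int pretimeout_register(void)
{
	strcat(trace, "P");
	return fail_pretimeout ? -1 : 0;
}

static int register_device(void)
{
	int ret;

	ret = get_id();
	if (ret)
		return ret;
	ret = dev_register();
	if (ret)
		goto err_id;
	ret = pretimeout_register();
	if (ret)
		goto err_dev;
	return 0;

err_dev:
	dev_unregister();	/* undo in reverse order of setup */
err_id:
	put_id();
	return ret;
}
```

A successful run leaves the trace "IDP"; when the pretimeout step fails it becomes "IDPdi", so the device is unregistered before the id is released.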




[PATCH net-next] r8152: replace netdev_alloc_skb_ip_align with napi_alloc_skb

2016-06-07 Thread Hayes Wang

Replace netdev_alloc_skb_ip_align() with napi_alloc_skb() which can save
several CPU cycles by avoiding having to disable and re-enable IRQs.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 3f9f6ed..161c25e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1742,7 +1742,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
pkt_len -= CRC_SIZE;
rx_data += sizeof(struct rx_desc);
 
-   skb = netdev_alloc_skb_ip_align(netdev, pkt_len);
+   skb = napi_alloc_skb(&tp->napi, pkt_len);
if (!skb) {
stats->rx_dropped++;
goto find_next_rx;
-- 
2.4.11

Re: [PATCH v3 2/2] spi: spi-ti-qspi: Add DMA support for QSPI mmap read

2016-06-07 Thread Vignesh R



On Tuesday 07 June 2016 02:47 PM, Peter Ujfalusi wrote:
[...]
>> @@ -637,6 +770,33 @@ static int ti_qspi_probe(struct platform_device *pdev)
>>  if (ret)
>>  goto free_master;
>>  
>> +dma_cap_zero(mask);
>> +dma_cap_set(DMA_MEMCPY, mask);
>> +
>> +qspi->rx_chan = dma_request_channel(mask, NULL, NULL);
> 
> dma_request_channel is deprecated, please use the:
> dma_request_chan_by_mask()
> 

Updated to use dma_request_chan_by_mask() in v4, Thanks.


-- 
Regards
Vignesh

[PATCH 0/3] Bug fixes for octeon driver

2016-06-07 Thread Jan Glauber

While testing ipmi_ssif on ThunderX, several bugs were found that also
apply to the Octeon i2c driver changes coming with 4.7.

I'll need to rebase the pending ThunderX driver series after these
fixes, which I'll do shortly.

Please consider for 4.7.

thanks,
Jan

Jan Glauber (3):
  i2c: octeon: Missing AAK flag in case of I2C_M_RECV_LEN
  i2c: octeon: Add retry logic after receiving STAT_RXADDR_NAK
  i2c: octeon: Avoid printk after too long SMBUS message

 drivers/i2c/busses/i2c-octeon.c | 45 -
 1 file changed, 31 insertions(+), 14 deletions(-)

-- 
2.9.0.rc0.21.g322

[PATCH 1/3] i2c: octeon: Missing AAK flag in case of I2C_M_RECV_LEN

2016-06-07 Thread Jan Glauber

During receive, the controller requires the AAK flag for all bytes
except the final one. This was wrong for I2C_M_RECV_LEN, where the
decision whether the current byte is the final one was made before the
length announced in the first received byte was added.

Set the AAK flag if additional bytes are to be received.

Signed-off-by: Jan Glauber 
---
 drivers/i2c/busses/i2c-octeon.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-octeon.c b/drivers/i2c/busses/i2c-octeon.c
index aa5f01e..1922e4a 100644
--- a/drivers/i2c/busses/i2c-octeon.c
+++ b/drivers/i2c/busses/i2c-octeon.c
@@ -934,8 +934,15 @@ static int octeon_i2c_read(struct octeon_i2c *i2c, int target,
return result;
 
for (i = 0; i < length; i++) {
-   /* for the last byte TWSI_CTL_AAK must not be set */
-   if (i + 1 == length)
+   /*
+* For the last byte to receive TWSI_CTL_AAK must not be set.
+*
+* A special case is I2C_M_RECV_LEN where we don't know the
+* additional length yet. If recv_len is set we assume we're
+* not reading the final byte and therefore need to set
+* TWSI_CTL_AAK.
+*/
+   if ((i + 1 == length) && !(recv_len && i == 0))
final_read = true;
 
/* clear iflg to allow next event */
-- 
2.9.0.rc0.21.g322
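To see why the extra !(recv_len && i == 0) term matters, here is a small model of the receive loop that records, per byte, whether AAK would be set. It assumes, as with SMBus block-read emulation, that an I2C_M_RECV_LEN message starts with a length of 1, so without the extra term byte 0 would be treated as the final byte. The wire contents and the helper name are hypothetical; no controller access is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the receive loop: decide AAK for byte i before reading it, then
 * read it. With recv_len, the first byte carries the number of bytes that
 * follow. Returns the number of bytes received (capped at 'max').
 */
static int model_read(const unsigned char *wire, int length, bool recv_len,
		      bool *aak, int max)
{
	int i;

	for (i = 0; i < length && i < max; i++) {
		bool final_read = (i + 1 == length) && !(recv_len && i == 0);

		aak[i] = !final_read;	/* AAK: acknowledge, more bytes follow */
		if (recv_len && i == 0)
			length += wire[i];	/* add the announced length */
	}
	return i;
}
```

With a length byte of 3, AAK is set for bytes 0 to 2 and withheld only for byte 3.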

[PATCH 2/3] i2c: octeon: Add retry logic after receiving STAT_RXADDR_NAK

2016-06-07 Thread Jan Glauber

The controller specification states that when receiving STAT_RXADDR_NAK
the START should be sent again. Retry several times before finally
failing with -ENXIO.

Without this change, the IPMI SSIF driver fails to execute several
commands, such as 'ipmitool fru', on ThunderX.

Signed-off-by: Jan Glauber 
---
 drivers/i2c/busses/i2c-octeon.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/busses/i2c-octeon.c b/drivers/i2c/busses/i2c-octeon.c
index 1922e4a..8ade7fb 100644
--- a/drivers/i2c/busses/i2c-octeon.c
+++ b/drivers/i2c/busses/i2c-octeon.c
@@ -880,6 +880,10 @@ static int octeon_i2c_write(struct octeon_i2c *i2c, int target,
 {
int i, result;
 
+   result = octeon_i2c_start(i2c);
+   if (result)
+   return result;
+
octeon_i2c_data_write(i2c, target << 1);
octeon_i2c_ctl_write(i2c, TWSI_CTL_ENAB);
 
@@ -918,9 +922,14 @@ static int octeon_i2c_write(struct octeon_i2c *i2c, int target,
 static int octeon_i2c_read(struct octeon_i2c *i2c, int target,
   u8 *data, u16 *rlength, bool recv_len)
 {
-   int i, result, length = *rlength;
+   int i, result, length = *rlength, retries = 10;
bool final_read = false;
 
+restart:
+   result = octeon_i2c_start(i2c);
+   if (result)
+   return result;
+
octeon_i2c_data_write(i2c, (target << 1) | 1);
octeon_i2c_ctl_write(i2c, TWSI_CTL_ENAB);
 
@@ -930,8 +939,17 @@ static int octeon_i2c_read(struct octeon_i2c *i2c, int target,
 
/* address OK ? */
result = octeon_i2c_check_status(i2c, false);
-   if (result)
-   return result;
+   if (result) {
+   /*
+* According to controller specification on STAT_RXADDR_NAK
+* the START should be repeated so retry several times before
+* giving up with -ENXIO.
+*/
+   if (result == -ENXIO && --retries > 0)
+   goto restart;
+   else
+   return result;
+   }
 
for (i = 0; i < length; i++) {
/*
@@ -1019,10 +1037,6 @@ static int octeon_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
break;
}
 
-   ret = octeon_i2c_start(i2c);
-   if (ret)
-   return ret;
-
if (pmsg->flags & I2C_M_RD)
ret = octeon_i2c_read(i2c, pmsg->addr, pmsg->buf,
  &pmsg->len, pmsg->flags & I2C_M_RECV_LEN);
-- 
2.9.0.rc0.21.g322
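The retry logic can be exercised outside the driver with a fake address phase. The sketch mirrors the restart label and the --retries > 0 test from octeon_i2c_read(), which with retries = 10 allows ten attempts in total. It also shows why the START had to move from octeon_i2c_xfer() into the read and write helpers: a retry must re-issue it. struct fake_bus and its fields are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device model: NAKs the address 'naks' times, then ACKs. */
struct fake_bus {
	int naks;
	int err;	/* if non-zero, fail with this error instead */
	int starts;	/* number of START + address phases attempted */
};

static int fake_start_and_address(struct fake_bus *bus)
{
	bus->starts++;
	if (bus->err)
		return bus->err;
	if (bus->naks > 0) {
		bus->naks--;
		return -ENXIO;	/* address NAK */
	}
	return 0;
}

/* Repeat START on -ENXIO, giving up after 10 attempts in total. */
static int read_with_retry(struct fake_bus *bus)
{
	int result, retries = 10;

restart:
	result = fake_start_and_address(bus);
	if (result) {
		if (result == -ENXIO && --retries > 0)
			goto restart;
		return result;
	}
	return 0;	/* address phase OK; the data phase would follow */
}
```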

[PATCH 3/3] i2c: octeon: Avoid printk after too long SMBUS message

2016-06-07 Thread Jan Glauber

Remove the warning about an over-long SMBus message: the ipmi_ssif
driver triggers it so frequently that it spams the message log.

Signed-off-by: Jan Glauber 
---
 drivers/i2c/busses/i2c-octeon.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/i2c/busses/i2c-octeon.c b/drivers/i2c/busses/i2c-octeon.c
index 8ade7fb..83fd6d8 100644
--- a/drivers/i2c/busses/i2c-octeon.c
+++ b/drivers/i2c/busses/i2c-octeon.c
@@ -975,12 +975,8 @@ restart:
 
data[i] = octeon_i2c_data_read(i2c);
if (recv_len && i == 0) {
-   if (data[i] > I2C_SMBUS_BLOCK_MAX + 1) {
-   dev_err(i2c->dev,
-   "%s: read len > I2C_SMBUS_BLOCK_MAX %d\n",
-   __func__, data[i]);
+   if (data[i] > I2C_SMBUS_BLOCK_MAX + 1)
return -EPROTO;
-   }
length += data[i];
}
 
-- 
2.9.0.rc0.21.g322
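After this change the length check still rejects an oversized block length, only silently. A minimal sketch of that check and the length update; I2C_SMBUS_BLOCK_MAX is 32 in <linux/i2c.h>, and the helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define I2C_SMBUS_BLOCK_MAX 32	/* as defined in <linux/i2c.h> */

/*
 * Given the current message length and the received length byte, return the
 * new total length, or -EPROTO if the device announced too many bytes.
 */
static int recv_len_total(int length, unsigned char len_byte)
{
	if (len_byte > I2C_SMBUS_BLOCK_MAX + 1)
		return -EPROTO;
	return length + len_byte;
}
```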

Re: [PATCH v3 1/2] spi: Add DMA support for spi_flash_read()

2016-06-07 Thread Vignesh R



On Wednesday 08 June 2016 03:59 AM, kbuild test robot wrote:
> Hi,
> 
> [auto build test ERROR on spi/for-next]
> [also build test ERROR on v4.7-rc2 next-20160607]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Vignesh-R/spi-Add-DMA-support-for-ti-qspi/20160607-162134
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi for-next
> config: m32r-allyesconfig (attached as .config)
> compiler: m32r-linux-gcc (GCC) 4.9.0
> reproduce:
> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=m32r 
> 
> All errors (new ones prefixed by >>):
> 
>drivers/spi/spi.c: In function 'spi_flash_read':
>>> drivers/spi/spi.c:2758:3: error: implicit declaration of function 'spi_map_buf' [-Werror=implicit-function-declaration]
>   ret = spi_map_buf(master, rx_dev, &msg->rx_sg,
>   ^
>>> drivers/spi/spi.c:2766:3: error: implicit declaration of function 'spi_unmap_buf' [-Werror=implicit-function-declaration]
>   spi_unmap_buf(master, rx_dev, &msg->rx_sg,
>   ^
>cc1: some warnings being treated as errors
> 


Oops, posted v4 fixing these errors.


-- 
Regards
Vignesh

[PATCH v4 0/2] spi: Add DMA support for ti-qspi

2016-06-07 Thread Vignesh R


This series adds support for DMA during QSPI flash read using memory
mapped mode.

Tested on DRA74 EVM, DRA72 EVM and AM437x SK on linux-next.

v3: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1160792.html
v2: https://lkml.org/lkml/2016/4/25/187
v1: https://lkml.org/lkml/2016/4/4/855


Vignesh R (2):
  spi: Add DMA support for spi_flash_read()
  spi: spi-ti-qspi: Add DMA support for QSPI mmap read

 drivers/spi/spi-ti-qspi.c | 188 ++
 drivers/spi/spi.c |  28 +++
 include/linux/spi/spi.h   |   4 +
 3 files changed, 207 insertions(+), 13 deletions(-)

-- 
2.8.3

[PATCH v4 1/2] spi: Add DMA support for spi_flash_read()

2016-06-07 Thread Vignesh R

Some SPI controllers provide accelerated interfaces for reading from
SPI-NOR flash devices. This hardware also supports DMA to transfer data
from flash to memory, either via mem-to-mem DMA or dedicated slave DMA
channels. Hence, add DMA support to improve throughput and reduce CPU
load.
Use spi_map_buf() to get an sg table for the buffer and pass it to the
SPI driver.

Signed-off-by: Vignesh R 
---

v4: Fix build errors reported by kbuild test bot.
v3: No changes.
v2: use cur_msg_mapped flag to indicate success/failure of spi_map_buf()


 drivers/spi/spi.c   | 28 
 include/linux/spi/spi.h |  4 
 2 files changed, 32 insertions(+)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 77e6e45951f4..c9a8d544e467 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -849,6 +849,20 @@ static int __spi_unmap_msg(struct spi_master *master, struct spi_message *msg)
return 0;
 }
 #else /* !CONFIG_HAS_DMA */
+static inline int spi_map_buf(struct spi_master *master,
+ struct device *dev, struct sg_table *sgt,
+ void *buf, size_t len,
+ enum dma_data_direction dir)
+{
+   return -EINVAL;
+}
+
+static inline void spi_unmap_buf(struct spi_master *master,
+struct device *dev, struct sg_table *sgt,
+enum dma_data_direction dir)
+{
+}
+
 static inline int __spi_map_msg(struct spi_master *master,
struct spi_message *msg)
 {
@@ -2725,6 +2739,7 @@ int spi_flash_read(struct spi_device *spi,
 
 {
struct spi_master *master = spi->master;
+   struct device *rx_dev = NULL;
int ret;
 
if ((msg->opcode_nbits == SPI_NBITS_DUAL ||
@@ -2750,9 +2765,22 @@ int spi_flash_read(struct spi_device *spi,
return ret;
}
}
+
mutex_lock(&master->bus_lock_mutex);
+   if (master->dma_rx) {
+   rx_dev = master->dma_rx->device->dev;
+   ret = spi_map_buf(master, rx_dev, &msg->rx_sg,
+ msg->buf, msg->len,
+ DMA_FROM_DEVICE);
+   if (!ret)
+   msg->cur_msg_mapped = true;
+   }
ret = master->spi_flash_read(spi, msg);
+   if (msg->cur_msg_mapped)
+   spi_unmap_buf(master, rx_dev, &msg->rx_sg,
+ DMA_FROM_DEVICE);
mutex_unlock(&master->bus_lock_mutex);
+
if (master->auto_runtime_pm)
pm_runtime_put(master->dev.parent);
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 1f03483f61e5..7b53af4ba5f8 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -1143,6 +1143,8 @@ static inline ssize_t spi_w8r16be(struct spi_device *spi, u8 cmd)
  * @opcode_nbits: number of lines to send opcode
  * @addr_nbits: number of lines to send address
  * @data_nbits: number of lines for data
+ * @rx_sg: Scatterlist for receive data read from flash
+ * @cur_msg_mapped: message has been mapped for DMA
  */
 struct spi_flash_read_message {
void *buf;
@@ -1155,6 +1157,8 @@ struct spi_flash_read_message {
u8 opcode_nbits;
u8 addr_nbits;
u8 data_nbits;
+   struct sg_table rx_sg;
+   bool cur_msg_mapped;
 };
 
 /* SPI core interface for flash read support */
-- 
2.8.3
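The v4 fix for the kbuild error adds spi_map_buf()/spi_unmap_buf() stubs for !CONFIG_HAS_DMA that simply fail with -EINVAL, so spi_flash_read() leaves cur_msg_mapped unset and the controller is expected to use its non-DMA path. The sketch below models that flow; MODEL_HAS_DMA stands in for CONFIG_HAS_DMA and all names are hypothetical. One design choice differs from the patch: the sketch clears cur_msg_mapped before trying to map, instead of relying on the caller to pass in a zeroed message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical message mirroring the flag the patch adds. */
struct flash_msg {
	bool cur_msg_mapped;
	int used_dma;	/* set by the fake driver when it saw a mapped buffer */
};

#ifdef MODEL_HAS_DMA
static int map_buf(struct flash_msg *m) { (void)m; return 0; }
#else
/* Stub, like the !CONFIG_HAS_DMA versions: mapping always fails. */
static int map_buf(struct flash_msg *m) { (void)m; return -EINVAL; }
#endif

static void unmap_buf(struct flash_msg *m) { m->cur_msg_mapped = false; }

/* Fake controller read: uses DMA only if the core mapped the buffer. */
static int driver_flash_read(struct flash_msg *m)
{
	m->used_dma = m->cur_msg_mapped;
	return 0;
}

static int flash_read(struct flash_msg *m, bool have_dma_chan)
{
	int ret;

	m->cur_msg_mapped = false;
	if (have_dma_chan && map_buf(m) == 0)
		m->cur_msg_mapped = true;
	ret = driver_flash_read(m);	/* non-DMA path when not mapped */
	if (m->cur_msg_mapped)
		unmap_buf(m);
	return ret;
}
```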

[PATCH v4 2/2] spi: spi-ti-qspi: Add DMA support for QSPI mmap read

2016-06-07 Thread Vignesh R

Use mem-to-mem DMA to read from flash when reading in mmap mode. This
improves read performance and reduces CPU load.

With this patch, raw-read throughput is ~16 MB/s on DRA74 EVM with CPU
load below 20%. UBIFS reads reach ~13 MB/s.

Signed-off-by: Vignesh R 
---

v4: use dma_request_chan_by_mask() and fix build warnings.
v3: Cleanup code based on review comments for v2.
v2: Handle kmap'd buffers of JFFS2 FS

 drivers/spi/spi-ti-qspi.c | 188 ++
 1 file changed, 175 insertions(+), 13 deletions(-)

diff --git a/drivers/spi/spi-ti-qspi.c b/drivers/spi/spi-ti-qspi.c
index 29ea8d2f9824..427f031c895b 100644
--- a/drivers/spi/spi-ti-qspi.c
+++ b/drivers/spi/spi-ti-qspi.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -41,6 +42,8 @@ struct ti_qspi_regs {
 };
 
 struct ti_qspi {
+   struct completion   transfer_complete;
+
/* list synchronization */
struct mutex list_lock;
 
@@ -54,6 +57,9 @@ struct ti_qspi {
 
struct ti_qspi_regs ctx_reg;
 
+   dma_addr_t  mmap_phys_base;
+   struct dma_chan *rx_chan;
+
u32 spi_max_frequency;
u32 cmd;
u32 dc;
@@ -379,6 +385,72 @@ static int qspi_transfer_msg(struct ti_qspi *qspi, struct spi_transfer *t,
return 0;
 }
 
+static void ti_qspi_dma_callback(void *param)
+{
+   struct ti_qspi *qspi = param;
+
+   complete(&qspi->transfer_complete);
+}
+
+static int ti_qspi_dma_xfer(struct ti_qspi *qspi, dma_addr_t dma_dst,
+   dma_addr_t dma_src, size_t len)
+{
+   struct dma_chan *chan = qspi->rx_chan;
+   struct dma_device *dma_dev = chan->device;
+   dma_cookie_t cookie;
+   enum dma_ctrl_flags flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT;
+   struct dma_async_tx_descriptor *tx;
+   int ret;
+
+   tx = dma_dev->device_prep_dma_memcpy(chan, dma_dst, dma_src,
+len, flags);
+   if (!tx) {
+   dev_err(qspi->dev, "device_prep_dma_memcpy error\n");
+   return -EIO;
+   }
+
+   tx->callback = ti_qspi_dma_callback;
+   tx->callback_param = qspi;
+   cookie = tx->tx_submit(tx);
+
+   ret = dma_submit_error(cookie);
+   if (ret) {
+   dev_err(qspi->dev, "dma_submit_error %d\n", cookie);
+   return -EIO;
+   }
+
+   dma_async_issue_pending(chan);
+   ret = wait_for_completion_timeout(&qspi->transfer_complete,
+ msecs_to_jiffies(len));
+   if (ret <= 0) {
+   dmaengine_terminate_sync(chan);
+   dev_err(qspi->dev, "DMA wait_for_completion_timeout\n");
+   return -ETIMEDOUT;
+   }
+
+   return 0;
+}
+
+static int ti_qspi_dma_xfer_sg(struct ti_qspi *qspi, struct sg_table rx_sg,
+  loff_t from)
+{
+   struct scatterlist *sg;
+   dma_addr_t dma_src = qspi->mmap_phys_base + from;
+   dma_addr_t dma_dst;
+   int i, len, ret;
+
+   for_each_sg(rx_sg.sgl, sg, rx_sg.nents, i) {
+   dma_dst = sg_dma_address(sg);
+   len = sg_dma_len(sg);
+   ret = ti_qspi_dma_xfer(qspi, dma_dst, dma_src, len);
+   if (ret)
+   return ret;
+   dma_src += len;
+   }
+
+   return 0;
+}
+
 static void ti_qspi_enable_memory_map(struct spi_device *spi)
 {
struct ti_qspi  *qspi = spi_master_get_devdata(spi->master);
@@ -426,7 +498,40 @@ static void ti_qspi_setup_mmap_read(struct spi_device *spi,
  QSPI_SPI_SETUP_REG(spi->chip_select));
 }
 
-static int ti_qspi_spi_flash_read(struct  spi_device *spi,
+#ifdef CONFIG_HIGHMEM
+static int ti_qspi_map_buf(struct ti_qspi *qspi, void *buf,
+  unsigned int len, struct sg_table *sgt)
+{
+   unsigned int max_seg_size =
+   dma_get_max_seg_size(qspi->rx_chan->device->dev);
+   unsigned int desc_len = min_t(int, max_seg_size, PAGE_SIZE);
+   int sgs = DIV_ROUND_UP(len + offset_in_page(buf), desc_len);
+   struct page *vm_page;
+   size_t min;
+   int i, ret;
+
+   ret = sg_alloc_table(sgt, sgs, GFP_KERNEL);
+   if (ret)
+   return ret;
+
+   for (i = 0; i < sgs; i++) {
+   min = min_t(size_t, len, desc_len -
+   offset_in_page(buf));
+   vm_page = kmap_to_page(buf);
+   if (!vm_page) {
+   sg_free_table(sgt);
+   return -ENOMEM;
+   }
+   sg_set_page(&sgt->sgl[i], vm_page, min,
+   offset_in_page(buf));
+   buf += min;
+   len -= min;
+   }
+   return 0;
+}
+#endif
+
+static int ti_qspi_spi_flash_read(struct spi_device *spi,
  struct spi_flash_read_message *msg)
 {
struct ti_qspi *
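ti_qspi_map_buf() splits a buffer into scatterlist entries that do not cross a page boundary, and ti_qspi_dma_xfer_sg() then copies each entry from the memory-mapped flash window, advancing the source address by each entry's length. A user-space sketch of both steps, assuming a 4096-byte page and a DMA maximum segment size of at least one page; memcpy() stands in for the DMA memcpy, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Split [addr, addr + len) into segments that never cross a page boundary.
 * Writes up to 'max' segment lengths and returns the segment count.
 */
static size_t split_pages(uintptr_t addr, size_t len, size_t *seg, size_t max)
{
	size_t n = 0;

	while (len && n < max) {
		size_t off = addr % MODEL_PAGE_SIZE;
		size_t chunk = MODEL_PAGE_SIZE - off;

		if (chunk > len)
			chunk = len;
		seg[n++] = chunk;
		addr += chunk;
		len -= chunk;
	}
	return n;
}

/* Copy each segment from the flash image, advancing the source per segment. */
static void copy_segments(uint8_t *dst, const uint8_t *flash, size_t from,
			  const size_t *seg, size_t nsegs)
{
	size_t src = from;

	for (size_t i = 0; i < nsegs; i++) {
		memcpy(dst, flash + src, seg[i]);
		dst += seg[i];
		src += seg[i];
	}
}
```

A 5000-byte buffer starting 100 bytes into a page is split into 3996 + 1004 bytes.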

Re: [PATCH 1/3] dt-bindings: pwm: Add MediaTek display PWM bindings

2016-06-07 Thread weiqing kong

On Fri, 2016-06-03 at 17:03 +0200, Matthias Brugger wrote:
> 
> On 03/06/16 08:45, weiqing kong wrote:
> > On Thu, 2016-06-02 at 17:45 -0500, Rob Herring wrote:
> >> On Mon, May 30, 2016 at 04:41:50PM +0800, Weiqing Kong wrote:
> >>> Add MT2701 compatible string.
> >>>
> >>> Signed-off-by: Weiqing Kong 
> >>> ---
> >>>   Documentation/devicetree/bindings/pwm/pwm-mtk-disp.txt | 5 +++--
> >>>   1 file changed, 3 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/devicetree/bindings/pwm/pwm-mtk-disp.txt b/Documentation/devicetree/bindings/pwm/pwm-mtk-disp.txt
> >>> index f8f59ba..f2f2fa9 100644
> >>> --- a/Documentation/devicetree/bindings/pwm/pwm-mtk-disp.txt
> >>> +++ b/Documentation/devicetree/bindings/pwm/pwm-mtk-disp.txt
> >>> @@ -1,9 +1,10 @@
> >>>   MediaTek display PWM controller
> >>>
> >>>   Required properties:
> >>> - - compatible: should be "mediatek,-disp-pwm":
> >>> -   - "mediatek,mt8173-disp-pwm": found on mt8173 SoC.
> >>> + - compatible: should be like these:
> >>> +   - "mediatek,mt2701-disp-bls": found on mt2701 SoC.
> >>
> >> Why bls instead of pwm?
> >
> > In mt6595/mt8173, we have a hardware module named pwm.
> > In mt2701, we have bls hardware module which includes disp pwm function,
> > so we named it mt2701-disp-bls rather than mt2701-disp-pwm in order to
> > match with hardware spec.
> > thanks.
> >
> 
>  From what I understand pwm module and disp pwm in mt6589/mt8173 is 
> independent from the disp pwm. Actually the disp pwm in mt6589 is part 
> of the bls module, so this should be mediatek,mt2701-disp-pwm as well
> 
> Thanks,
> Matthias
> 

ok, I will modify it into mediatek,mt2701-disp-pwm. 

> >>
> >>>  - "mediatek,mt6595-disp-pwm": found on mt6595 SoC.
> >>> +   - "mediatek,mt8173-disp-pwm": found on mt8173 SoC.
> >>>- reg: physical base address and length of the controller's registers.
> >>>- #pwm-cells: must be 2. See pwm.txt in this directory for a description of
> >>>  the cell format.
> >>> --
> >>> 1.8.1.1.dirty
> >>>
> >
> >

Re: [PATCH V3 2/2] vhost_net: conditionally enable tx polling

2016-06-07 Thread Jason Wang




On 2016-06-07 20:26, Michael S. Tsirkin wrote:

On Wed, Jun 01, 2016 at 01:56:34AM -0400, Jason Wang wrote:

We always poll tx for the socket. This is suboptimal since:

- it is only used when we exceed the sndbuf of the socket.
- since we use two independent polls for tx and vq, this slightly
   increases the waitqueue traversal time and, more importantly, vhost
   cannot benefit from commit
   9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 ("net-tun: restructure
   tun_do_read for better sleep/wakeup efficiency") even if we've
   stopped rx polling during handle_rx, since the tx poll is still left
   in the waitqueue.

Fix this by enabling tx polling conditionally, only when -EAGAIN is
returned.

Test shows about 8% improvement on guest rx pps.

Before: ~135
After:  ~146

Signed-off-by: Jason Wang 
---
  drivers/vhost/net.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1d3e45f..e75ffcc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -378,6 +378,7 @@ static void handle_tx(struct vhost_net *net)
goto out;
  
  	vhost_disable_notify(&net->dev, vq);
+   vhost_net_disable_vq(net, vq);
  
  	hdr_size = nvq->vhost_hlen;
zcopy = nvq->ubufs;
@@ -459,6 +460,8 @@ static void handle_tx(struct vhost_net *net)
% UIO_MAXIOV;
}
vhost_discard_vq_desc(vq, 1);
+   if (err == -EAGAIN)
+   vhost_net_enable_vq(net, vq);
break;
}
if (err != len)

This seems rather risky. What if TX failed for some other reason?
Polling won't ever be re-enabled ...



But why would we need to enable tx polling in this case? Even if we
enabled it, we wouldn't get any wakeup.



--
1.8.3.1
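The behaviour under discussion reduces to a small state change per handle_tx() pass. A hypothetical model, where a bool stands for whether the tx socket poll is on the waitqueue: the poll is dropped at the start of the pass and re-armed only on -EAGAIN. The last case in the test is the one Michael raises (any other error leaves polling off); Jason's point is that no wakeup would arrive for it anyway.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the tx-side socket poll state in vhost_net. */
struct model_tx {
	bool poll_enabled;
};

/*
 * One pass of the patched handle_tx(): stop polling the socket, try to send,
 * and re-arm the poll only when the socket reported -EAGAIN (sndbuf full).
 */
static void model_handle_tx(struct model_tx *tx, int sendmsg_err)
{
	tx->poll_enabled = false;		/* like vhost_net_disable_vq() */
	if (sendmsg_err == -EAGAIN)
		tx->poll_enabled = true;	/* like vhost_net_enable_vq() */
}
```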

Re: [PATCH 1/1] perf/x86/intel: Add extended event constraints for Knights Landing

2016-06-07 Thread Peter Zijlstra

On Wed, Jun 08, 2016 at 06:02:16AM +0200, Lukasz Odzioba wrote:
> For the Knights Landing processor we need to filter OFFCORE_RESPONSE
> events by the config1 parameter to make sure that they end up in
> an appropriate PMC, as the specification requires.
> 
> On Knights Landing:
> MSR_OFFCORE_RSP_1 bits 8, 11, 14 can be used only on PMC1
> MSR_OFFCORE_RSP_0 bit 38 can be used only on PMC0
> 
> This patch introduces INTEL_EEVENT_CONSTRAINT, where the third parameter
> specifies extended config bits allowed only on the given PMCs.
> 

How does this work in the light of intel_alt_er() ?
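The placement rule in the quoted description can be written as a mask-narrowing step. This is only a sketch of the rule, not the INTEL_EEVENT_CONSTRAINT implementation; the PMC bit positions, the helper name and the default mask parameter are assumptions. It also hints at why Peter asks about intel_alt_er(): as we understand it, that helper may move an event from one OFFCORE_RSP MSR to the other, and the applicable restriction depends on which MSR ends up holding config1.

```c
#include <assert.h>
#include <stdint.h>

#define PMC0 (1u << 0)
#define PMC1 (1u << 1)

/* Bits restricted per the quoted Knights Landing description. */
#define RSP0_PMC0_ONLY (1ULL << 38)
#define RSP1_PMC1_ONLY ((1ULL << 8) | (1ULL << 11) | (1ULL << 14))

/*
 * Narrow a counter mask for an OFFCORE_RESPONSE event based on config1.
 * 'rsp' is 0 or 1 for MSR_OFFCORE_RSP_0/1; 'mask' is the set of counters
 * the event could otherwise use (a hypothetical input here).
 */
static unsigned int knl_offcore_mask(int rsp, uint64_t config1,
				     unsigned int mask)
{
	if (rsp == 0 && (config1 & RSP0_PMC0_ONLY))
		mask &= PMC0;
	if (rsp == 1 && (config1 & RSP1_PMC1_ONLY))
		mask &= PMC1;
	return mask;
}
```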

Re: [PATCH 1/2] xen-blkfront: don't call talk_to_blkback when already connected to blkback

2016-06-07 Thread Bob Liu


On 06/07/2016 11:25 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 01, 2016 at 01:49:23PM +0800, Bob Liu wrote:
>>
>> On 06/01/2016 04:33 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, May 31, 2016 at 04:59:16PM +0800, Bob Liu wrote:
 Sometimes blkfront may receive the blkback_changed() notification twice
 after migration; talk_to_blkback() is then called twice as well, which
 confuses xen-blkback.
>>>
>>> Could you enlighten the patch description by having some form of
>>> state transition here? I am curious how you got the frontend
>>> to get in XenbusStateConnected (via blkif_recover right) and then
>>> the backend triggering the update once more?
>>>
>>> Or is just a simple race - the backend moves from XenbusStateConnected->
>>> XenbusStateConnected - which retriggers the frontend to hit in
>>> blkback_changed the XenbusStateConnected state and go in there?
>>> (That would be in connect_ring changing the state). But I don't
>>> see how the frontend_changed code gets there, as we have:
>>>
>>>  /*
>>>   * Ensure we connect even when two watches fire in
>>>   * close succession and we miss the intermediate value
>>>   * of frontend_state.
>>>   */
>>>  if (dev->state == XenbusStateConnected)
>>>          break;
>>>
>>> ?
>>>
>>> Now what about 'blkfront_connect' being called on the second time?
>>>
>>> Ah, info->connected is probably by then in BLKIF_STATE_CONNECTED
>>> (as blkif_recover changed) and we just reread the size of the disk.
>>>
>>> Is that about how the flow goes?
>>
>>  blkfront                            blkback
>> blkfront_resume()   
>>  > talk_to_blkback()
>>   > Set blkfront to XenbusStateInitialised
>>  Front changed()
>>   > Connect()
>>> Set blkback to XenbusStateConnected
>>
>> blkback_changed()
>>  > Skip talk_to_blkback()
>>because frontstate == XenbusStateInitialised
>>  > blkfront_connect()
>>   > Set blkfront to XenbusStateConnected
>>
>>
>> --
>> But sometimes blkfront receives
>> blkback_changed() event more than once!
> 
> I think I know why. The udev scripts that get invoked when
> we attach a disk are a bit custom. As such I think they just
> revalidate the size leading to this.
> 
> And this 'poke-at-XenbusStateConnected' state multiple times
> is allowed. It is used to signal disk changes (or just to revalidate).
> Hence it does not matter why really - we need to deal with this.
> 
> I modified your patch a bit and am testing it:
> 

Looks much better, thank you very much!

Bob

> From e49dc9fc65eda4923b41d903ac51a7ddee182bcd Mon Sep 17 00:00:00 2001
> From: Bob Liu 
> Date: Tue, 7 Jun 2016 10:43:15 -0400
> Subject: [PATCH] xen-blkfront: don't call talk_to_blkback when already
>  connected to blkback
> 
> Sometimes blkfront may receive the blkback_changed() notification
> (XenbusStateConnected) twice after migration, which causes
> talk_to_blkback() to be called twice as well and confuses xen-blkback.
> 
> The flow is as follows:
>    blkfront                           blkback
> blkfront_resume()
>  > talk_to_blkback()
>   > Set blkfront to XenbusStateInitialised
> front changed()
>  > Connect()
>   > Set blkback to XenbusStateConnected
> 
> blkback_changed()
>  > Skip talk_to_blkback()
>because frontstate == XenbusStateInitialised
>  > blkfront_connect()
>   > Set blkfront to XenbusStateConnected
> 
> -
> And here we get another XenbusStateConnected notification leading
> to:
> -
> blkback_changed()
>  > because now frontstate != XenbusStateInitialised
>talk_to_blkback() is also called again
>   > blkfront state changed from
>   XenbusStateConnected to XenbusStateInitialised
> (Which is not correct!)
> 
>   front_changed():
>  > Do nothing because blkback
>already in XenbusStateConnected
> 
> Now blkback is in XenbusStateConnected but blkfront is still
> in XenbusStateInitialised - leading to no disks.
> 
> Poking of the XenbusStateConnected state is allowed (to deal with
> block disk change) and has to be dealt with. The most likely
> cause of this bug is custom udev scripts hooking up the disks
> and then validating the size.
> 
> Signed-off-by: Bob Liu 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  drivers/block/xen-blkfront.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/xen-blkfron
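The rule the commit message describes can be modelled as a guard in front of the handshake: when the backend reports Connected and the frontend is already Initialised or Connected, skip talk_to_blkback() and only (re)connect. This is a hypothetical model of that behaviour, not a reconstruction of the truncated diff; the enum reuses the XenbusState numbering, and the helper names are illustrative.

```c
#include <assert.h>

/* Subset of XenbusState values, numbered as in the Xen xenbus header. */
enum xb_state { XB_INITIALISING = 1, XB_INITIALISED = 3, XB_CONNECTED = 4 };

/* Hypothetical frontend model: tracks its own state and counts handshakes. */
struct front {
	enum xb_state state;
	int handshakes;
};

static void model_talk_to_backend(struct front *f)
{
	f->handshakes++;
	f->state = XB_INITIALISED;
}

static void model_connect(struct front *f)
{
	f->state = XB_CONNECTED;	/* on a repeat poke: just revalidate */
}

/* Backend announced Connected: only handshake if we have not already. */
static void model_backend_connected(struct front *f)
{
	if (f->state != XB_INITIALISED && f->state != XB_CONNECTED)
		model_talk_to_backend(f);
	model_connect(f);
}
```

A repeated Connected notification then leaves the frontend Connected instead of dropping it back to Initialised.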

[RESEND PATCH v9 01/22] net: hns: Add reset function support for RoCE driver

2016-06-07 Thread Lijun Ou

This patch adds a reset function for the RoCE driver. RoCE is a feature
of hns. On the hip06 SoC, the RoCE reset process needs to configure the
dsaf channel reset and the port and SL map info. The RoCE reset function
is located in the dsaf module; the RoCE driver only calls it when needed.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
Signed-off-by: Sheng Li 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 84 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 32 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 57 ---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  | 15 +++-
 4 files changed, 178 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 1c2ddb2..0c4a87c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2685,6 +2686,89 @@ static struct platform_driver g_dsaf_driver = {
 
 module_platform_driver(g_dsaf_driver);
 
+/**
+ * hns_dsaf_roce_reset - reset dsaf and roce
+ * @dsaf_fwnode: Pointer to firmware node for the dsaf
+ * @enable: false - request reset, true - drop reset
+ * return 0 - success, negative - fail
+ */
+int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable)
+{
+   struct dsaf_device *dsaf_dev;
+   struct platform_device *pdev;
+   unsigned int mp;
+   unsigned int sl;
+   unsigned int credit;
+   int i;
+   const u32 port_map[DSAF_ROCE_CREDIT_CHN][DSAF_ROCE_CHAN_MODE_NUM] = {
+   {DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1, DSAF_ROCE_PORT_0},
+   {DSAF_ROCE_PORT_4, DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_4, DSAF_ROCE_PORT_2, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_5, DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1},
+   {DSAF_ROCE_PORT_5, DSAF_ROCE_PORT_3, DSAF_ROCE_PORT_1},
+   };
+   const u32 sl_map[DSAF_ROCE_CREDIT_CHN][DSAF_ROCE_CHAN_MODE_NUM] = {
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_0},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_1, DSAF_ROCE_SL_1},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_2},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_1, DSAF_ROCE_SL_3},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_0},
+   {DSAF_ROCE_SL_1, DSAF_ROCE_SL_1, DSAF_ROCE_SL_1},
+   {DSAF_ROCE_SL_0, DSAF_ROCE_SL_0, DSAF_ROCE_SL_2},
+   {DSAF_ROCE_SL_1, DSAF_ROCE_SL_1, DSAF_ROCE_SL_3},
+   };
+
+   if (!is_of_node(dsaf_fwnode)) {
+   pr_err("hisi_dsaf: Only support DT node!\n");
+   return -EINVAL;
+   }
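+   pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
+   if (!pdev) {
+   pr_err("hisi_dsaf: can't find platform device for node!\n");
+   return -ENODEV;
+   }
+   dsaf_dev = dev_get_drvdata(&pdev->dev);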
+   if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
+   dev_err(dsaf_dev->dev, "%s v1 chip do not support roce!\n",
+   dsaf_dev->ae_dev.name);
+   return -ENODEV;
+   }
+
+   if (!enable) {
+   /* Reset rocee-channels in dsaf and rocee */
+   hns_dsaf_srst_chns(dsaf_dev, DSAF_CHNS_MASK, false);
+   hns_dsaf_roce_srst(dsaf_dev, false);
+   } else {
+   /* Configure dsaf tx roce correspond to port map and sl map */
+   mp = dsaf_read_dev(dsaf_dev, DSAF_ROCE_PORT_MAP_REG);
+   for (i = 0; i < DSAF_ROCE_CREDIT_CHN; i++)
+   dsaf_set_field(mp, 7 << i * 3, i * 3,
+  port_map[i][DSAF_ROCE_6PORT_MODE]);
+   dsaf_set_field(mp, 3 << i * 3, i * 3, 0);
+   dsaf_write_dev(dsaf_dev, DSAF_ROCE_PORT_MAP_REG, mp);
+
+   sl = dsaf_read_dev(dsaf_dev, DSAF_ROCE_SL_MAP_REG);
+   for (i = 0; i < DSAF_ROCE_CREDIT_CHN; i++)
+   dsaf_set_field(sl, 3 << i * 2, i * 2,
+  sl_map[i][DSAF_ROCE_6PORT_MODE]);
+   dsaf_write_dev(dsaf_dev, DSAF_ROCE_SL_MAP_REG, sl);
+
+   /* De-reset rocee-channels in dsaf and rocee */
+   hns_dsaf_srst_chns(dsaf_dev, DSAF_CHNS_MASK, true);
+   msleep(20);
+   hns_dsaf_roce_srst(dsaf_dev, true);
+
+   /* Enable dsaf channel rocee credit */
+   credit = dsaf_read_dev(dsaf_dev, DSAF_SBM_ROCEE_CFG_REG_REG);
+   dsaf_set_bit(credit, DSAF_SBM_ROCEE_CFG_CRD_EN_B, 0);
+   dsaf_write_dev(dsaf_dev, DSAF_SBM_ROCEE_CFG_REG_REG, credit);
+
+   dsaf_set_bit(credit, DSAF_SBM_ROCEE_CFG_CRD_EN_B, 1);
+

[RESEND PATCH v9 08/22] IB/hns: Add icm support

2016-06-07 Thread Lijun Ou

This patch adds ICM support for RoCE. It initializes the ICM, which
manages the memory blocks used by RoCE. The RoCE data structures,
such as the CQ table, QP table and MTPT table, are located in it.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_common.h |  19 ++
 drivers/infiniband/hw/hns/hns_roce_device.h |  30 ++
 drivers/infiniband/hw/hns/hns_roce_icm.c| 458 
 drivers/infiniband/hw/hns/hns_roce_icm.h| 119 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  84 +
 5 files changed, 710 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_icm.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_icm.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 4805852..f15bf1b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -53,6 +53,22 @@
 #define roce_set_bit(origin, shift, val) \
roce_set_field((origin), (1ul << (shift)), (shift), (val))
 
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S 0
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_M   \
+   (((1UL << 19) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_S 19
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_S 20
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_M   \
+   (((1UL << 2) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_MDF_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_S 22
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_BA_H_S)
+
+#define ROCEE_BT_CMD_H_ROCEE_BT_CMD_HW_SYNS_S 31
+
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
(((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
@@ -93,6 +109,8 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_BT_CMD_H_REG 0x204
+
 #define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
 #define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
 
@@ -105,6 +123,7 @@
 
 #define ROCEE_CAEP_CE_INTERVAL_CFG_REG 0x190
 #define ROCEE_CAEP_CE_BURST_NUM_CFG_REG 0x194
+#define ROCEE_BT_CMD_L_REG 0x200
 
 #define ROCEE_MB1_REG  0x210
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index dc0b520..f7d9e0c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -111,6 +111,26 @@ enum {
HNS_ROCE_CMD_SUCCESS= 1,
 };
 
+struct hns_roce_icm_table {
+   /* ICM type: 0 = qpc 1 = mtt 2 = cqc 3 = srq 4 = other */
+   u32 type;
+   /* ICM array element num */
+   unsigned long   num_icm;
+   /* ICM entry record obj total num */
+   unsigned long   num_obj;
+   /* Single obj size */
+   unsigned long   obj_size;
+   int lowmem;
+   int coherent;
+   struct mutex    mutex;
+   struct hns_roce_icm **icm;
+};
+
+struct hns_roce_mr_table {
+   struct hns_roce_icm_table   mtt_table;
+   struct hns_roce_icm_table   mtpt_table;
+};
+
 struct hns_roce_buf_list {
void*buf;
dma_addr_t  map;
@@ -126,11 +146,14 @@ struct hns_roce_cq {
 
 struct hns_roce_qp_table {
spinlock_t  lock;
+   struct hns_roce_icm_table   qp_table;
+   struct hns_roce_icm_table   irrl_table;
 };
 
 struct hns_roce_cq_table {
spinlock_t  lock;
struct radix_tree_root  tree;
+   struct hns_roce_icm_table   table;
 };
 
 struct hns_roce_cmd_context {
@@ -259,6 +282,7 @@ struct hns_roce_hw {
 struct hns_roce_dev {
struct ib_device        ib_dev;
struct platform_device  *pdev;
+   spinlock_t  bt_cmd_lock;
struct hns_roce_ib_iboe iboe;
 
int irq[HNS_ROCE_MAX_IRQ_NUM];
@@ -273,6 +297,7 @@ struct hns_roce_dev {
u32 hw_rev;
 
struct hns_roce_cmdq    cmd;
+   struct hns_roce_mr_table  mr_table;
struct hns_roce_cq_table  cq_table;
struct hns_roce_qp_table  qp_table;
struct hns_roce_eq_table  eq_table;
@@ -282,6 +307,11 @@ struct hns_roce_dev {
struct hns_roce_hw  *hw;
 };
 
+static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
+{
+   __raw_writeq(*(u64 *) val, dest);
+}
+
 static inline struct hns_roce_qp
*__hns_roce_qp_lookup(struct hns_roce_dev *hr_dev, u32 qpn)
 {
diff --git a/drivers/infiniband/hw/hns/hns_roce_icm.c b/drivers/infiniband/hw/hns/hns_roce_icm.c
new file mode 100644
index 000..7404a6f
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_icm.

[RESEND PATCH v9 00/22] Add HiSilicon RoCE driver

2016-06-07 Thread Lijun Ou

The HiSilicon Network Subsystem is a long-term evolution IP which is
supposed to be used in HiSilicon ICT SoCs. HNS (HiSilicon Network
Subsystem) also has hardware support for performing RDMA with
RoCEE.
The driver for HiSilicon RoCEE (RoCE Engine) is a platform driver and
will support multiple versions of SoCs in the future. This version of the
driver is meant to support the Hip06 SoC (which conforms to the RoCEEv1
hardware specifications).

Changes v8 -> v9:
1. delete the definition of ADDR_SHIFT_n, use literal 12, 32 and 44 and
   add comments
2. use roce_read/roce_write/readl/writel instead of roce_readl/roce_writel
3. delete the print error/debug messages for memory allocation errors
4. use exit instead of uninit, for example hw->uninit -> hw->exit
5. use roce_raw_write instead of _raw_writel in eq_set_cons_index
6. modify the label with underscore
7. adjust the indentation for the macro definitions in hns_roce_hw_v1.c
8. simplify some lines in a few functions and structures.
9. fix the alphabetical order in MAINTAINERS

Changes v7 -> v8:
1. add a verbs operation named get_port_immutable. It is an 
   independent patch.
2. add a comment for the definition of ADDR_SHIFT_n, where n is 12, 32
   or 44.
3. restructure the code to align with the Linux naming conventions,
   following Doug Ledford's review.
4. modify the state for all .c and .h files.

Changes v6 -> v7:
1. change some parameter types to bool instead of the original types.
2. add the Signed-off-by signatures in the first patch.
3. delete the improper print statement in hns_roce_create_eq.

Changes v5 -> v6:
1. change the type of obj to unsigned long according to the reviews, and
   make the same change elsewhere in the RoCE module.
2. fix the spelling errors.
3. fix the Signed-off-by signatures.

Changes v4 -> v5:
1. redesign the patchset for RoCE modules in order to split the huge
   patch into small patches.
2. fix the directory path for RoCE module. Delete the hisilicon level.
3. rename roce_v1_hw to roce_hw_v1.

Changes v3 -> v4:
1. rename roce.o to hns-roce.o in the Makefile and Kconfig file.

Changes v2 -> v3:
1. reformat the RoCE driver code base v2 according to the reviewers'
   comments; also use kmalloc_array instead of kmalloc and kcalloc
   instead of kzalloc for array allocations
2. remove some unused functions and unconnected macros
3. update the binding document for the RoCE DT based on v2, adding
   interrupt-names
4. redesign the port_map and sl_map in hns_dsaf_roce_reset
5. add the HiSilicon RoCE driver maintainers to the MAINTAINERS
   document

Changes v1 -> v2:
1. reformat the RoCE driver code according to the reviewers' comments
2. update the bindings file for the RoCE DTS; add the attribute named
   interrupt-names.
3. change the way the port mode is defined in hns_dsaf_main.c
4. move the Kconfig file into the hns directory and send it with RoCE

Lijun Ou (22):
  net: hns: Add reset function support for RoCE driver
  devicetree: bindings: IB: Add binding document for HiSilicon RoCE
  IB/hns: Add initial main frame driver and get cfg info
  IB/hns: Add RoCE engine reset function
  IB/hns: Add initial profile resource
  IB/hns: Add initial cmd operation
  IB/hns: Add event queue support
  IB/hns: Add icm support
  IB/hns: Add hca support
  IB/hns: Add process flow to init RoCE engine
  IB/hns: Add IB device registration
  IB/hns: Set mtu and gid support
  IB/hns: Add interface of the protocol stack registration
  IB/hns: Add operations support for IB device and port
  IB/hns: Add PD operations support
  IB/hns: Add ah operations support
  IB/hns: Add QP operations support
  IB/hns: Add CQ operations support
  IB/hns: Add memory region operations support
  IB/hns: Add operation for getting immutable port
  IB/hns: Kconfig and Makefile for RoCE module
  MAINTAINERS: Add maintainers for HiSilicon RoCE driver

 .../bindings/infiniband/hisilicon-hns-roce.txt |  107 +
 MAINTAINERS|8 +
 drivers/infiniband/Kconfig |1 +
 drivers/infiniband/hw/Makefile |1 +
 drivers/infiniband/hw/hns/Kconfig  |   10 +
 drivers/infiniband/hw/hns/Makefile |8 +
 drivers/infiniband/hw/hns/hns_roce_ah.c|  132 +
 drivers/infiniband/hw/hns/hns_roce_alloc.c |  262 ++
 drivers/infiniband/hw/hns/hns_roce_cmd.c   |  388 +++
 drivers/infiniband/hw/hns/hns_roce_cmd.h   |   84 +
 drivers/infiniband/hw/hns/hns_roce_common.h|  325 +++
 drivers/infiniband/hw/hns/hns_roce_cq.c|  456 
 drivers/infiniband/hw/hns/hns_roce_device.h|  748 ++
 drivers/infiniband/hw/hns/hns_roce_eq.c|  768 ++
 drivers/infiniband/hw/hns/hns_roce_eq.h|  131 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2787 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h |  981 +++
 drivers/infiniband/hw/hns/hns_roce_icm.c

[RESEND PATCH v9 07/22] IB/hns: Add event queue support

2016-06-07 Thread Lijun Ou

This patch adds event queue support to the RoCE driver. The event
queues are used for RoCE interrupts. RoCE has 32 synchronous event
IRQs, 1 asynchronous event IRQ and 1 common overflow IRQ.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c|  22 +
 drivers/infiniband/hw/hns/hns_roce_common.h |  70 +++
 drivers/infiniband/hw/hns/hns_roce_cq.c |  77 +++
 drivers/infiniband/hw/hns/hns_roce_device.h | 136 +
 drivers/infiniband/hw/hns/hns_roce_eq.c | 768 
 drivers/infiniband/hw/hns/hns_roce_eq.h | 131 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |  24 +
 drivers/infiniband/hw/hns/hns_roce_qp.c |  63 +++
 8 files changed, 1291 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cq.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_eq.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_eq.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_qp.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 64e84fe..67b3137 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -45,6 +45,28 @@
 
 #define CMD_MAX_NUM    32
 
+static int hns_roce_status_to_errno(u8 orig_status)
+{
+   if (orig_status == HNS_ROCE_CMD_SUCCESS)
+   return 0;
+   else
+   return -EIO;
+}
+
+void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status,
+   u64 out_param)
+{
+   struct hns_roce_cmd_context
+   *context = &hr_dev->cmd.context[token & hr_dev->cmd.token_mask];
+
+   if (token != context->token)
+   return;
+
+   context->result = hns_roce_status_to_errno(status);
+   context->out_param = out_param;
+   complete(&context->done);
+}
+
 int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
 {
struct device *dev = &hr_dev->pdev->dev;
diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 595cda9..4805852 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -33,7 +33,56 @@
 #ifndef _HNS_ROCE_COMMON_H
 #define _HNS_ROCE_COMMON_H
 
+#define roce_write(dev, reg, val)  writel((val), (dev)->reg_base + (reg))
 #define roce_read(dev, reg)        readl((dev)->reg_base + (reg))
+#define roce_raw_write(value, addr) \
+   __raw_writel((__force u32)cpu_to_le32(value), (addr))
+
+#define roce_get_field(origin, mask, shift) \
+   (((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+   roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+   do { \
+   (origin) &= (~(mask)); \
+   (origin) |= (((u32)(val) << (shift)) & (mask)); \
+   } while (0)
+
+#define roce_set_bit(origin, shift, val) \
+   roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
+   (((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_S 8
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_M   \
+   (((1UL << 4) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_AEQE_SHIFT_S)
+
+#define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQ_ALM_OVF_INT_ST_S 17
+
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_S 0
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_M   \
+   (((1UL << 5) - 1) << ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQ_BT_H_S)
+
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_S 16
+#define ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_M   \
+   (((1UL << 16) - 1) << ROCEE_CAEP_AEQE_CUR_IDX_CAEP_AEQE_CUR_IDX_S)
+
+#define ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_S 0
+#define ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_M   \
+   (((1UL << 16) - 1) << ROCEE_CAEP_AEQE_CONS_IDX_CAEP_AEQE_CONS_IDX_S)
+
+#define ROCEE_CAEP_CEQC_SHIFT_CAEP_CEQ_ALM_OVF_INT_ST_S 16
+#define ROCEE_CAEP_CE_IRQ_MASK_CAEP_CEQ_ALM_OVF_MASK_S 1
+#define ROCEE_CAEP_CEQ_ALM_OVF_CAEP_CEQ_ALM_OVF_S 0
+
+#define ROCEE_CAEP_AE_MASK_CAEP_AEQ_ALM_OVF_MASK_S 0
+#define ROCEE_CAEP_AE_MASK_CAEP_AE_IRQ_MASK_S 1
+
+#define ROCEE_CAEP_AE_ST_CAEP_AEQ_ALM_OVF_S 0
 
 /*ROCEE_REG DEFINITION*/
 #define ROCEE_VENDOR_ID_REG        0x0
@@ -44,8 +93,29 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
+#define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
+
+#define ROCEE_ECC_UCERR_ALM1_REG   0xB38
+#define ROCEE_ECC_UCERR_ALM2_REG   0xB3C
+#define ROCEE_ECC_CERR_ALM1_REG    0xB44
+#define ROCEE_ECC_CERR_ALM2_REG    0xB48
+
 #d

[RESEND PATCH v9 16/22] IB/hns: Add ah operations support

2016-06-07 Thread Lijun Ou

This patch implements the address handle (AH) operations. It adds
three verbs: create AH, query AH and destroy AH. They are handled
entirely by the RoCE driver.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_ah.c | 132 
 drivers/infiniband/hw/hns/hns_roce_device.h |  30 +++
 drivers/infiniband/hw/hns/hns_roce_main.c   |   5 ++
 3 files changed, 167 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_ah.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_ah.c b/drivers/infiniband/hw/hns/hns_roce_ah.c
new file mode 100644
index 000..9397614
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_ah.c
@@ -0,0 +1,132 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_common.h"
+#include "hns_roce_device.h"
+
+#define HNS_ROCE_PORT_NUM_SHIFT    24
+#define HNS_ROCE_VLAN_SL_BIT_MASK  7
+#define HNS_ROCE_VLAN_SL_SHIFT 13
+
+struct ib_ah *hns_roce_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *ah_attr)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ibpd->device);
+   struct device *dev = &hr_dev->pdev->dev;
+   struct ib_gid_attr gid_attr;
+   struct hns_roce_ah *ah;
+   u16 vlan_tag = 0xffff;
+   struct in6_addr in6;
+   union ib_gid sgid;
+   int ret;
+
+   ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
+   if (!ah)
+   return ERR_PTR(-ENOMEM);
+
+   /* Get mac address */
+   memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(ah_attr->grh.dgid.raw));
+   if (rdma_is_multicast_addr(&in6))
+   rdma_get_mcast_mac(&in6, ah->av.mac);
+   else
+   memcpy(ah->av.mac, ah_attr->dmac, sizeof(ah_attr->dmac));
+
+   /* Get source gid */
+   ret = ib_get_cached_gid(ibpd->device, ah_attr->port_num,
+   ah_attr->grh.sgid_index, &sgid, &gid_attr);
+   if (ret) {
+   dev_err(dev, "get sgid failed! ret = %d\n", ret);
+   kfree(ah);
+   return ERR_PTR(ret);
+   }
+
+   if (gid_attr.ndev) {
+   if (is_vlan_dev(gid_attr.ndev))
+   vlan_tag = vlan_dev_vlan_id(gid_attr.ndev);
+   dev_put(gid_attr.ndev);
+   }
+
+   if (vlan_tag < 0x1000)
+   vlan_tag |= (ah_attr->sl & HNS_ROCE_VLAN_SL_BIT_MASK) <<
+HNS_ROCE_VLAN_SL_SHIFT;
+
+   ah->av.port_pd = cpu_to_be32(to_hr_pd(ibpd)->pdn | (ah_attr->port_num <<
+HNS_ROCE_PORT_NUM_SHIFT));
+   ah->av.gid_index = ah_attr->grh.sgid_index;
+   ah->av.vlan = cpu_to_le16(vlan_tag);
+   dev_dbg(dev, "gid_index = 0x%x,vlan = 0x%x\n", ah->av.gid_index,
+   ah->av.vlan);
+
+   if (ah_attr->static_rate)
+   ah->av.stat_rate = IB_RATE_10_GBPS;
+
+   memcpy(ah->av.dgid, ah_attr->grh.dgid.raw, HNS_ROCE_GID_SIZE);
+   ah->av.sl_tclass_flowlabel = cpu_to_le32(ah_attr->sl <<
+HNS_ROCE_SL_SHIFT);
+
+   return &ah->ibah;
+}
+
+int hns_roce_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+{
+   struct hns_roce_ah *ah = to_hr_ah(ibah);
+
+   memset(ah_attr, 0, sizeof(*ah_attr));
+
+   ah_attr->sl = le32_to_cpu(ah->av.sl_tclass_flowlabel) >>
+ HNS_ROCE_SL_SHIFT;
+   ah_attr->port_num = le32_to_cpu(ah->av.port_pd) >>
+   HNS_RO

Re: [PATCH] KVM: s390: fix build failure

2016-06-07 Thread Martin Schwidefsky

On Tue,  7 Jun 2016 22:49:30 +0100
Sudip Mukherjee  wrote:

> etr_ptff definitions are moved and renamed but we missed updating them
> here and as a result s390 defconfig and allmodconfig were failing with
> the error:
> arch/s390/kvm/kvm-s390.c:230:45: error: 'ETR_PTFF_QAF' undeclared
> 
> Fixes: cc8f94656487 ("s390/time: move PTFF definitions")
> Signed-off-by: Sudip Mukherjee 
> ---
> 
> s390 defconfig build log is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/135776067
> 
>  arch/s390/kvm/kvm-s390.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fa51aef..3039eaf 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -29,7 +29,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -227,7 +227,9 @@ static void kvm_s390_cpu_feat_init(void)
>   }
> 
>   if (test_facility(28)) /* TOD-clock steering */
> - etr_ptff(kvm_s390_available_subfunc.ptff, ETR_PTFF_QAF);
> + ptff(kvm_s390_available_subfunc.ptff,
> +  sizeof(kvm_s390_available_subfunc.ptff),
> +  PTFF_QAF);
> 
>   if (test_facility(17)) { /* MSA */
>   __cpacf_query(CPACF_KMAC, kvm_s390_available_subfunc.kmac);

I already sent the exact same patch to Stephen yesterday. But thanks anyway.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

[RESEND PATCH v9 13/22] IB/hns: Add interface of the protocol stack registration

2016-06-07 Thread Lijun Ou

This patch adds the code that registers RoCE with the network stack's
notifier chains. It provides the following interface functions:
1. A RoCE handler that is called when a netdev notifier event is
   generated for a registered network device.
2. A RoCE handler that is called when the IP address of a registered
   network device changes.
In addition, it frees the related resources when RoCE
is removed.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |   3 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 209 
 2 files changed, 212 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 531e488..2d75585 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -256,7 +256,10 @@ struct hns_roce_qp {
 };
 
 struct hns_roce_ib_iboe {
+   spinlock_t  lock;
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   struct notifier_block   nb;
+   struct notifier_block   nb_inet;
/* 16 GID is shared by 6 port in v1 engine. */
union ib_gid            gid_table[HNS_ROCE_MAX_GID_NUM];
u8  phy_port[HNS_ROCE_MAX_PORTS];
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index b1f4e7a..8f8bc7a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -63,6 +63,46 @@
 #include "hns_roce_icm.h"
 
 /**
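+ * hns_roce_addrconf_ifid_eui48 - Build the interface ID of the default GID.
+ * @eui: output buffer for the 8-byte interface ID (gid[8..15])
+ * @vlan_id: VLAN ID, or a value >= 0x1000 if there is no VLAN
+ * @dev: net device
+ * Description:
+ *Convert the MAC address to a GID:
+ *gid[0..7] = fe80 0000 0000 0000
+ *gid[8] = mac[0] ^ 2
+ *gid[9] = mac[1]
+ *gid[10] = mac[2]
+ *gid[11] = VLAN ID high byte (4 MS bits), or ff if there is no VLAN
+ *gid[12] = VLAN ID low byte, or fe if there is no VLAN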
+ *gid[13] = mac[3]
+ *gid[14] = mac[4]
+ *gid[15] = mac[5]
+ */
+static void hns_roce_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
+struct net_device *dev)
+{
+   memcpy(eui, dev->dev_addr, 3);
+   memcpy(eui + 5, dev->dev_addr + 3, 3);
+   if (vlan_id < 0x1000) {
+   eui[3] = vlan_id >> 8;
+   eui[4] = vlan_id & 0xff;
+   } else {
+   eui[3] = 0xff;
+   eui[4] = 0xfe;
+   }
+   eui[0] ^= 2;
+}
+
+void hns_roce_make_default_gid(struct net_device *dev, union ib_gid *gid)
+{
+   memset(gid, 0, sizeof(*gid));
+   gid->raw[0] = 0xFE;
+   gid->raw[1] = 0x80;
+   hns_roce_addrconf_ifid_eui48(&gid->raw[8], 0xffff, dev);
+}
+
+/**
  * hns_get_gid_index - Get gid index.
  * @hr_dev: pointer to structure hns_roce_dev.
  * @port:  port, value range: 0 ~ MAX
@@ -140,6 +180,152 @@ void hns_roce_update_gids(struct hns_roce_dev *hr_dev, int port)
ib_dispatch_event(&event);
 }
 
+static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
+  unsigned long event)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+   struct net_device *netdev;
+   unsigned long flags;
+   union ib_gid gid;
+   int ret = 0;
+
+   netdev = hr_dev->iboe.netdevs[port];
+   if (!netdev) {
+   dev_err(dev, "port(%d) can't find netdev\n", port);
+   return -ENODEV;
+   }
+
+   spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+
+   switch (event) {
+   case NETDEV_UP:
+   case NETDEV_CHANGE:
+   case NETDEV_REGISTER:
+   case NETDEV_CHANGEADDR:
+   hns_roce_set_mac(hr_dev, port, netdev->dev_addr);
+   hns_roce_make_default_gid(netdev, &gid);
+   ret = hns_roce_set_gid(hr_dev, port, 0, &gid);
+   if (!ret)
+   hns_roce_update_gids(hr_dev, port);
+   break;
+   case NETDEV_DOWN:
+   /*
+   * The v1 engine only supports closing all ports together.
+   */
+   break;
+   default:
+   dev_dbg(dev, "NETDEV event = 0x%x!\n", (u32)(event));
+   break;
+   }
+
+   spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+   return ret;
+}
+
+static int hns_roce_netdev_event(struct notifier_block *self,
+unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct hns_roce_ib_iboe *iboe = NULL;
+   struct hns_roce_dev *hr_dev = NULL;
+   u8 port = 0;
+   int ret = 0;
+
+   hr_dev = container_of(self, struct hns_roce_dev, iboe.nb);
+   iboe = &hr_dev->iboe;
+
+   for (port = 0; port < hr_dev->caps.num_ports; port++) {
+   if (dev == iboe->netdevs[port]) {
+   ret = handle_en_event(hr_dev, port, event)

Re: [PATCH] mm/zsmalloc: add trace events for zs_compact

2016-06-07 Thread Ganesh Mahendran

Hi, Minchan:

2016-06-08 13:13 GMT+08:00 Minchan Kim :
> On Wed, Jun 08, 2016 at 09:48:30AM +0800, Ganesh Mahendran wrote:
>> Hi, Minchan:
>>
>> 2016-06-08 8:16 GMT+08:00 Minchan Kim :
>> > Hello Ganesh,
>> >
>> > On Tue, Jun 07, 2016 at 04:56:44PM +0800, Ganesh Mahendran wrote:
>> >> Currently zsmalloc is widely used in Android devices.
>> >> Sometimes we want to see how frequently zs_compact() is
>> >> triggered, how many pages are freed by zs_compact(), or which
>> >> zsmalloc pool is compacted.
>> >>
>> >> Most of the time, users can get brief information from
>> >> trace_mm_shrink_slab_[start | end], but in some scenarios
>> >> they do not use the zsmalloc shrinker and trigger compaction manually.
>> >> So adding trace events to zs_compact() is convenient. We can also
>> >> add zsmalloc-specific information (pool name, total compacted
>> >> pages, etc.) to the zsmalloc trace.
>> >
>> > Sorry, I cannot understand what's the problem now and what you want to
>> > solve. Could you elaborate it a bit?
>> >
>> > Thanks.
>>
>> We have backported zs_compact() to our product (kernel 3.18).
>> It is useful for long-running devices.
>> But there is no convenient way to get detailed information
>> about zs_compact(), which is useful for performance optimization:
>> how much time zs_compact() takes, which pool is
>> compacted, how many pages are freed, etc.
>
> You can find out how many pages are freed by object compaction via
> each /sys/block/zram-id/mm_stat. And you can use function_graph to see
> how much time zs_compact used.

zsmalloc is used not only by zram but also by zswap, and maybe by
others in the future.

I tried function_graph, but it seems to print too much
output:
--
root@leo-test:/sys/kernel/debug/tracing# cat trace
# tracer: function_graph
#
# CPU  DURATION  FUNCTION CALLS
# | |   | |   |   |   |
 2)   |  zs_compact [zsmalloc]() {
 2)   |  /* zsmalloc_compact_start: pool zram0 */
 2)   0.889 us|_raw_spin_lock();
 2)   0.896 us|isolate_zspage [zsmalloc]();
 2)   0.938 us|_raw_spin_lock();
 2)   0.875 us|isolate_zspage [zsmalloc]();
 2)   0.942 us|_raw_spin_lock();
 2)   0.962 us|isolate_zspage [zsmalloc]();
...
 2)   0.879 us|  insert_zspage [zsmalloc]();
 2)   4.520 us|}
 2)   0.975 us|_raw_spin_lock();
 2)   0.890 us|isolate_zspage [zsmalloc]();
 2)   0.882 us|_raw_spin_lock();
 2)   0.894 us|isolate_zspage [zsmalloc]();
 2)   |  /* zsmalloc_compact_end: pool zram0: 0 pages compacted(total 0) */
 2) # 1351.241 us |  }
--
=> 1351.241 us used

It also seems that the overhead of function_graph is larger than that of the trace events:

bash-3682  [002]   1439.180646: zsmalloc_compact_start: pool zram0
bash-3682  [002]   1439.180659: zsmalloc_compact_end: pool zram0: 0 pages compacted(total 0)
=> 13 us, compared with 1351.241 us with function_graph

Thanks.

>
>
>> With this information, we can see what is going on in zs_compact()
>> and correlate free memory with zs_compact() activity.
>>
>> >
>> >>
>> >> This patch adds two trace events for zs_compact(). Below is the trace log:
>> >> -
>> >> root@land:/ # cat /d/tracing/trace
>> >>  kswapd0-125   [007] ...1   174.176979: zsmalloc_compact_start: pool zram0
>> >>  kswapd0-125   [007] ...1   174.181967: zsmalloc_compact_end: pool zram0: 608 pages compacted(total 1794)
>> >>  kswapd0-125   [000] ...1   184.134475: zsmalloc_compact_start: pool zram0
>> >>  kswapd0-125   [000] ...1   184.135010: zsmalloc_compact_end: pool zram0: 62 pages compacted(total 1856)
>> >>  kswapd0-125   [003] ...1   226.927221: zsmalloc_compact_start: pool zram0
>> >>  kswapd0-125   [003] ...1   226.928575: zsmalloc_compact_end: pool zram0: 250 pages compacted(total 2106)
>> >> -
>> >>
>> >> Signed-off-by: Ganesh Mahendran 
>> >> ---
>> >>  include/trace/events/zsmalloc.h | 56 +
>> >>  mm/zsmalloc.c   | 10 
>> >>  2 files changed, 66 insertions(+)
>> >>  create mode 100644 include/trace/events/zsmalloc.h
>> >>
>> >> diff --git a/include/trace/events/zsmalloc.h b/include/trace/events/zsmalloc.h
>> >> new file mode 100644
>> >> index 000..3b6f14e
>> >> --- /dev/null
>> >> +++ b/include/trace/events/zsmalloc.h
>> >> @@ -0,0 +1,56 @@
>> >> +#undef TRACE_SYSTEM
>> >> +#define TRACE_SYSTEM zsmalloc
>> >> +
>> >> +#if !defined(_TRACE_ZSMALLOC_H) || defined(TRACE_HEADER_MULTI_READ)
>> >> +#define _TRACE_ZSMALLOC_H
>> >> +
>> >> +#include 
>> >> +#include 
>> >> +
>> >> +TRACE_EVENT(zsmalloc_compact_start,
>> >> +
>> >> + TP_PROTO(const char *pool_name),
>> >> +
>> >> + TP_ARGS(pool_name),
>> >> +
>> >> + TP_STRUCT__entry(
>> >> + __field(const char *, pool_name)
>> >> + ),
>> >> +

[RESEND PATCH v9 11/22] IB/hns: Add IB device registration

2016-06-07 Thread Lijun Ou

This patch registers the IB device when the driver is loaded and
unregisters it when the driver is removed.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_main.c | 46 +++
 1 file changed, 46 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 7fb0d34..f179a7f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -62,6 +62,41 @@
 #include "hns_roce_device.h"
 #include "hns_roce_icm.h"
 
+void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
+{
+   ib_unregister_device(&hr_dev->ib_dev);
+}
+
+int hns_roce_register_device(struct hns_roce_dev *hr_dev)
+{
+   int ret;
+   struct hns_roce_ib_iboe *iboe = NULL;
+   struct ib_device *ib_dev = NULL;
+   struct device *dev = &hr_dev->pdev->dev;
+
+   iboe = &hr_dev->iboe;
+
+   ib_dev = &hr_dev->ib_dev;
+   strlcpy(ib_dev->name, "hisi_%d", IB_DEVICE_NAME_MAX);
+
+   ib_dev->owner   = THIS_MODULE;
+   ib_dev->node_type   = RDMA_NODE_IB_CA;
+   ib_dev->dma_device  = dev;
+
+   ib_dev->phys_port_cnt   = hr_dev->caps.num_ports;
+   ib_dev->local_dma_lkey  = hr_dev->caps.reserved_lkey;
+   ib_dev->num_comp_vectors= hr_dev->caps.num_comp_vectors;
+   ib_dev->uverbs_abi_ver  = 1;
+
+   ret = ib_register_device(ib_dev, NULL);
+   if (ret) {
+   dev_err(dev, "ib_register_device failed!\n");
+   return ret;
+   }
+
+   return 0;
+}
+
 int hns_roce_get_cfg(struct hns_roce_dev *hr_dev)
 {
int i;
@@ -363,6 +398,17 @@ static int hns_roce_probe(struct platform_device *pdev)
goto error_failed_engine_init;
}
 
+   ret = hns_roce_register_device(hr_dev);
+   if (ret) {
+   dev_err(dev, "register_device failed!\n");
+   goto error_failed_register_device;
+   }
+
+   return 0;
+
+error_failed_register_device:
+   hns_roce_engine_exit(hr_dev);
+
 error_failed_engine_init:
hns_roce_cleanup_bitmap(hr_dev);
 
-- 
1.9.1
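The probe path above unwinds with the usual kernel goto ladder: a failure in hns_roce_register_device() jumps to error_failed_register_device, which calls hns_roce_engine_exit() and then falls through to error_failed_engine_init to run hns_roce_cleanup_bitmap(). A userspace sketch of that ordering; step(), note() and the fail_at switch are hypothetical stand-ins for the real init and teardown calls:

```c
#include <stddef.h>

/* Hypothetical knobs: which step fails (0 = none) and a record of the
 * teardown calls made, in the order they ran. */
static int fail_at;
static char log_buf[8];
static size_t log_len;

static void note(char c)
{
	if (log_len < sizeof(log_buf))
		log_buf[log_len++] = c;
}

static int step(int n)
{
	return n == fail_at ? -1 : 0;
}

/* Same unwind shape as hns_roce_probe(): each label undoes one step and
 * falls through to the labels for the steps done before it. */
static int probe(void)
{
	int ret;

	ret = step(1);		/* e.g. bitmap setup */
	if (ret)
		return ret;

	ret = step(2);		/* engine init */
	if (ret)
		goto error_failed_engine_init;

	ret = step(3);		/* hns_roce_register_device() */
	if (ret)
		goto error_failed_register_device;

	return 0;

error_failed_register_device:
	note('E');		/* hns_roce_engine_exit() */
error_failed_engine_init:
	note('B');		/* hns_roce_cleanup_bitmap() */
	return ret;
}
```

With fail_at = 3 the teardown log reads "EB" (engine exit, then bitmap cleanup); with fail_at = 2 only "B" runs, so nothing is torn down that was never set up.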

[RESEND PATCH v9 06/22] IB/hns: Add initial cmd operation

2016-06-07 Thread Lijun Ou

This patch adds the command (cmd) operations, along with functions
for initializing the EQ table and selecting the command mode.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c| 117 
 drivers/infiniband/hw/hns/hns_roce_cmd.h|  42 ++
 drivers/infiniband/hw/hns/hns_roce_common.h |   2 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  41 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   |  13 
 5 files changed, 215 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_cmd.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
new file mode 100644
index 000..64e84fe
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_common.h"
+#include "hns_roce_device.h"
+#include "hns_roce_cmd.h"
+
+#define CMD_MAX_NUM    32
+
+int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+
+   mutex_init(&hr_dev->cmd.hcr_mutex);
+   sema_init(&hr_dev->cmd.poll_sem, 1);
+   hr_dev->cmd.use_events = 0;
+   hr_dev->cmd.toggle = 1;
+   hr_dev->cmd.max_cmds = CMD_MAX_NUM;
+   hr_dev->cmd.hcr = hr_dev->reg_base + ROCEE_MB1_REG;
+   hr_dev->cmd.pool = dma_pool_create("hns_roce_cmd", dev,
+  HNS_ROCE_MAILBOX_SIZE,
+  HNS_ROCE_MAILBOX_SIZE, 0);
+   if (!hr_dev->cmd.pool)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void hns_roce_cmd_cleanup(struct hns_roce_dev *hr_dev)
+{
+   dma_pool_destroy(hr_dev->cmd.pool);
+}
+
+int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
+{
+   struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
+   int i;
+
+   hr_cmd->context = kmalloc(hr_cmd->max_cmds *
+ sizeof(struct hns_roce_cmd_context),
+ GFP_KERNEL);
+   if (!hr_cmd->context)
+   return -ENOMEM;
+
+   for (i = 0; i < hr_cmd->max_cmds; ++i) {
+   hr_cmd->context[i].token = i;
+   hr_cmd->context[i].next = i + 1;
+   }
+
+   hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
+   hr_cmd->free_head = 0;
+
+   sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+   spin_lock_init(&hr_cmd->context_lock);
+
+   for (hr_cmd->token_mask = 1; hr_cmd->token_mask < hr_cmd->max_cmds;
+hr_cmd->token_mask <<= 1)
+   ;
+   --hr_cmd->token_mask;
+   hr_cmd->use_events = 1;
+
+   down(&hr_cmd->poll_sem);
+
+   return 0;
+}
+
+void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
+{
+   struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
+   int i;
+
+   hr_cmd->use_events = 0;
+
+   for (i = 0; i < hr_cmd->max_cmds; ++i)
+   down(&hr_cmd->event_sem);
+
+   kfree(hr_cmd->context);
+   up(&hr_cmd->poll_sem);
+}
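hns_roce_cmd_use_events() threads the command contexts into a free list (next = i + 1, terminated by -1) and sizes token_mask as the smallest power of two not below max_cmds, minus one; with CMD_MAX_NUM = 32 that gives 31. A userspace sketch of both pieces; the get_ctx()/put_ctx() pop and push helpers are hypothetical, since the code that consumes the list is not part of this excerpt:

```c
#define CMD_MAX_NUM 32

struct cmd_ctx {
	int token;
	int next;	/* index of the next free context, -1 ends the list */
};

/* Same loop as hns_roce_cmd_use_events(): round max_cmds up to a power
 * of two, then subtract one to turn it into a mask. */
static unsigned int token_mask_for(int max_cmds)
{
	unsigned int mask;

	for (mask = 1; mask < (unsigned int)max_cmds; mask <<= 1)
		;
	return mask - 1;
}

/* Chain contexts 0..n-1 into a free list (n >= 1); returns the head. */
static int init_free_list(struct cmd_ctx *ctx, int n)
{
	int i;

	for (i = 0; i < n; ++i) {
		ctx[i].token = i;
		ctx[i].next = i + 1;
	}
	ctx[n - 1].next = -1;
	return 0;
}

/* Hypothetical: pop a free context, or return -1 if none are left. */
static int get_ctx(struct cmd_ctx *ctx, int *free_head)
{
	int i = *free_head;

	if (i >= 0)
		*free_head = ctx[i].next;
	return i;
}

/* Hypothetical: push context i back onto the front of the free list. */
static void put_ctx(struct cmd_ctx *ctx, int *free_head, int i)
{
	ctx[i].next = *free_head;
	*free_head = i;
}
```

Keeping the list as indices rather than pointers lets the same array be freed with a single kfree(), as hns_roce_cmd_use_polling() does.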
diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h
new file mode 100644
index 000..ff8e62d
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed

Re: [PATCH v3 1/6] watchdog: add set_pretimeout interface

2016-06-07 Thread Wolfram Sang

On Tue, Jun 07, 2016 at 08:38:42PM +0300, Vladimir Zapolskiy wrote:
> From: Robin Gong 
> 
> Add set_pretimeout, since our watchdog driver has this interface and
> the new common watchdog framework does not implement it.
> 
> Signed-off-by: Robin Gong 
> [vzapolskiy: rebased, added an inline comment to describe new interface]
> Signed-off-by: Vladimir Zapolskiy 

Why didn't you just take my patch and work on it? That would have
added the documentation Guenter explicitly requested when reviewing
Robin's patch, and it would have gotten rid of the bug you have in the
*_invalid function. I also don't think the split into three patches
is necessary here, but that might just be me.


[RESEND PATCH v9 17/22] IB/hns: Add QP operations support

2016-06-07 Thread Lijun Ou

This patch implements queue pair (QP) operations. A QP consists
of a Send Work Queue and a Receive Work Queue. Send and receive
queues are always created as a pair and remain that way throughout
their lifetime. A Queue Pair is identified by its Queue Pair Number.
The QP operations are as follows:
1. create QP. When a QP is created, a complete set of initial
   attributes must be specified by the Consumer.
2. query QP. Returns the attribute list and current values for
   the specified QP.
3. modify QP. Modifies the attributes of the specified QP.
4. destroy QP. When a QP is destroyed, any outstanding Work
   Requests are no longer considered to be in the scope of
   the Channel Interface. It is the responsibility of the
   Consumer to be able to clean up any resources.
5. post send request. Builds one or more WQEs for the Send Queue
   in the specified QP.
6. post receive request. Builds one or more WQEs for the receive
   Queue in the specified QP.
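Among the helpers this patch adds is hns_roce_bitmap_alloc_range(), which looks for cnt consecutive free bits at an offset aligned to align, searching first from bitmap->last and then again from 0. A simplified userspace sketch of that two-pass search, assuming align is a power of two; find_zero_area() is a hypothetical stand-in for the kernel's bitmap_find_next_zero_area(), and the bitmap->top/mask bookkeeping is left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for bitmap_find_next_zero_area(): first index
 * >= start, aligned to 'align', with 'cnt' free entries; 'max' if none. */
static size_t find_zero_area(const bool *used, size_t max, size_t start,
			     size_t cnt, size_t align)
{
	size_t pos = (start + align - 1) & ~(align - 1);

	for (; pos + cnt <= max; pos += align) {
		size_t i;

		for (i = 0; i < cnt && !used[pos + i]; i++)
			;
		if (i == cnt)
			return pos;
	}
	return max;
}

/* Two-pass search as in hns_roce_bitmap_alloc_range(): try from 'last',
 * then wrap to 0; on success mark the range used and maybe advance last. */
static int alloc_range(bool *used, size_t max, size_t *last,
		       size_t cnt, size_t align, size_t *obj)
{
	size_t i;

	*obj = find_zero_area(used, max, *last, cnt, align);
	if (*obj >= max)
		*obj = find_zero_area(used, max, 0, cnt, align);
	if (*obj >= max)
		return -1;

	for (i = 0; i < cnt; i++)
		used[*obj + i] = true;

	if (*obj == *last) {
		*last = *obj + cnt;
		if (*last >= max)
			*last = 0;
	}
	return 0;
}
```

Starting from last before wrapping to 0 spreads allocations across the table instead of always reusing the lowest free indices.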

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
Signed-off-by: Salil Mehta 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  |  134 +++
 drivers/infiniband/hw/hns/hns_roce_cmd.c|  249 
 drivers/infiniband/hw/hns/hns_roce_cmd.h|   35 +-
 drivers/infiniband/hw/hns/hns_roce_common.h |   58 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  167 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 1629 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  628 ++-
 drivers/infiniband/hw/hns/hns_roce_icm.c|   56 +
 drivers/infiniband/hw/hns/hns_roce_icm.h|9 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |   14 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c |  161 +++
 drivers/infiniband/hw/hns/hns_roce_qp.c |  762 +
 drivers/infiniband/hw/hns/hns_roce_user.h   |   13 +
 13 files changed, 3912 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index d2932c1..786385a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -71,6 +71,45 @@ void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj)
hns_roce_bitmap_free_range(bitmap, obj, 1);
 }
 
+int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
+   int align, unsigned long *obj)
+{
+   int ret = 0;
+   int i;
+
+   if (likely(cnt == 1 && align == 1))
+   return hns_roce_bitmap_alloc(bitmap, obj);
+
+   spin_lock(&bitmap->lock);
+
+   *obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max,
+ bitmap->last, cnt, align - 1);
+   if (*obj >= bitmap->max) {
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   *obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max, 0,
+ cnt, align - 1);
+   }
+
+   if (*obj < bitmap->max) {
+   for (i = 0; i < cnt; i++)
+   set_bit(*obj + i, bitmap->table);
+
+   if (*obj == bitmap->last) {
+   bitmap->last = (*obj + cnt);
+   if (bitmap->last >= bitmap->max)
+   bitmap->last = 0;
+   }
+   *obj |= bitmap->top;
+   } else {
+   ret = -1;
+   }
+
+   spin_unlock(&bitmap->lock);
+
+   return ret;
+}
+
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
unsigned long obj, int cnt)
 {
@@ -118,6 +157,101 @@ void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap)
kfree(bitmap->table);
 }
 
+void hns_roce_buf_free(struct hns_roce_dev *hr_dev, u32 size,
+  struct hns_roce_buf *buf)
+{
+   int i;
+   struct device *dev = &hr_dev->pdev->dev;
+   u32 bits_per_long = BITS_PER_LONG;
+
+   if (buf->nbufs == 1) {
+   dma_free_coherent(dev, size, buf->direct.buf, buf->direct.map);
+   } else {
+   if (bits_per_long == 64)
+   vunmap(buf->direct.buf);
+
+   for (i = 0; i < buf->nbufs; ++i)
+   if (buf->page_list[i].buf)
+   dma_free_coherent(&hr_dev->pdev->dev, PAGE_SIZE,
+ buf->page_list[i].buf,
+ buf->page_list[i].map);
+   kfree(buf->page_list);
+   }
+}
+
+int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, u32 max_direct,
+  struct hns_roce_buf *buf)
+{
+   int i = 0;
+   dma_addr_t t;
+   struct page **pages;
+   struct device *dev = &hr_dev->pdev->dev;
+   u32 bits_per_long = BITS_PER_LONG;
+
+   /* SQ/RQ buf

[RESEND PATCH v9 14/22] IB/hns: Add operations support for IB device and port

2016-06-07 Thread Lijun Ou

This patch mainly registers the verbs for the IB device and its
ports with the kernel. These functions are called by users, for
example:
1. modify device
2. query device
3. query port
4. modify port
and so on.
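The to_hr_dev() and to_hr_ucontext() helpers this patch adds to hns_roce_device.h use container_of() to get from an embedded ib_device (or ib_ucontext) back to the driver's own structure. A minimal userspace sketch of that pattern; the stub structures are hypothetical, and this container_of() omits the typeof()-based type check that the kernel macro performs:

```c
#include <stddef.h>

/* Simplified container_of(): step back from the member to the start of
 * the enclosing structure using the member's offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct ib_device and struct hns_roce_dev. */
struct ib_device_stub {
	int node_type;
};

struct roce_dev_stub {
	int id;
	struct ib_device_stub ib_dev;	/* embedded, not a pointer */
};

/* Same shape as to_hr_dev(): recover the outer struct from the member. */
static struct roce_dev_stub *to_roce_dev(struct ib_device_stub *ib_dev)
{
	return container_of(ib_dev, struct roce_dev_stub, ib_dev);
}
```

This only works because ib_dev is embedded by value in the outer structure; a pointer member would not sit at a fixed offset from its owner.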

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_common.h |   4 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  21 +++
 drivers/infiniband/hw/hns/hns_roce_main.c   | 228 
 drivers/infiniband/hw/hns/hns_roce_user.h   |  40 +
 4 files changed, 293 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_user.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index b66e96f..ee87689 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -33,6 +33,10 @@
 #ifndef _HNS_ROCE_COMMON_H
 #define _HNS_ROCE_COMMON_H
 
+#ifndef assert
+#define assert(cond)
+#endif
+
 #define roce_write(dev, reg, val)  writel((val), (dev)->reg_base + (reg))
 #define roce_read(dev, reg)readl((dev)->reg_base + (reg))
 #define roce_raw_write(value, addr) \
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2d75585..99f2653 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -45,6 +45,7 @@
 #define DRV_NAME "hns_roce"
 
 #define MAC_ADDR_OCTET_NUM 6
+#define HNS_ROCE_MAX_MSG_LEN   0x8000
 
 #define HNS_ROCE_BA_SIZE   (32 * 4096)
 
@@ -57,6 +58,10 @@
 
 #define HNS_ROCE_MAX_PORTS 6
 #define HNS_ROCE_MAX_GID_NUM   16
+#define HNS_ROCE_GID_SIZE  16
+
+#define PKEY_ID0x
+#define NODE_DESC_SIZE 64
 
 #define PAGES_SHIFT_16 16
 
@@ -124,6 +129,11 @@ struct hns_roce_uar {
unsigned long   index;
 };
 
+struct hns_roce_ucontext {
+   struct ib_ucontext  ibucontext;
+   struct hns_roce_uar uar;
+};
+
 struct hns_roce_bitmap {
/* Bitmap Traversal last a bit which is 1 */
unsigned long   last;
@@ -378,6 +388,17 @@ struct hns_roce_dev {
struct hns_roce_hw  *hw;
 };
 
+static inline struct hns_roce_dev *to_hr_dev(struct ib_device *ib_dev)
+{
+   return container_of(ib_dev, struct hns_roce_dev, ib_dev);
+}
+
+static inline struct hns_roce_ucontext
+   *to_hr_ucontext(struct ib_ucontext *ibucontext)
+{
+   return container_of(ibucontext, struct hns_roce_ucontext, ibucontext);
+}
+
 static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
 {
__raw_writeq(*(u64 *) val, dest);
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 8f8bc7a..64cf5c8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -60,6 +60,7 @@
 #include 
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
+#include "hns_roce_user.h"
 #include "hns_roce_icm.h"
 
 /**
@@ -360,6 +361,217 @@ int hns_roce_setup_mtu_gids(struct hns_roce_dev  *hr_dev)
return ret;
 }
 
+static int hns_roce_query_device(struct ib_device *ib_dev,
+struct ib_device_attr *props,
+struct ib_udata *uhw)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
+
+   memset(props, 0, sizeof(*props));
+
+   props->fw_ver = hr_dev->fw_ver;
+   props->sys_image_guid = hr_dev->sys_image_guid;
+   props->max_mr_size = (u64)(~(0ULL));
+   props->page_size_cap = hr_dev->caps.page_size_cap;
+   props->vendor_id = hr_dev->vendor_id;
+   props->vendor_part_id = hr_dev->vendor_part_id;
+   props->hw_ver = hr_dev->hw_rev;
+   props->max_qp = hr_dev->caps.num_qps;
+   props->max_qp_wr = hr_dev->caps.max_wqes;
+   props->device_cap_flags = IB_DEVICE_PORT_ACTIVE_EVENT |
+ IB_DEVICE_RC_RNR_NAK_GEN |
+ IB_DEVICE_LOCAL_DMA_LKEY;
+   props->max_sge = hr_dev->caps.max_sq_sg;
+   props->max_sge_rd = 1;
+   props->max_cq = hr_dev->caps.num_cqs;
+   props->max_cqe = hr_dev->caps.max_cqes;
+   props->max_mr = hr_dev->caps.num_mtpts;
+   props->max_pd = hr_dev->caps.num_pds;
+   props->max_qp_rd_atom = hr_dev->caps.max_qp_dest_rdma;
+   props->max_qp_init_rd_atom = hr_dev->caps.max_qp_init_rdma;
+   props->atomic_cap = IB_ATOMIC_NONE;
+   props->max_pkeys = 1;
+   props->local_ca_ack_delay = hr_dev->caps.local_ca_ack_delay;
+
+   return 0;
+}
+
+static int hns_roce_query_port(struct ib_device *ib_dev, u8 port_num,
+  struct ib_port_attr *props)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);

[RESEND PATCH v9 18/22] IB/hns: Add CQ operations support

2016-06-07 Thread Lijun Ou

This patch implements Completion Queue (CQ) operations.
A CQ can be used to multiplex work completions from multiple work
queues across queue pairs on the same HCA. A CQ serves as the
notification mechanism for Work Request completions.
The CQ operations are as follows:
1. create CQ. CQs are created through the Channel Interface.
   The maximum number of Completion Queue Entries (CQEs) that
   may be outstanding on a CQ must be specified when the CQ
   is created.
2. destroy CQ. Destroys the specified CQ. Resources allocated
   by the Channel Interface to implement the CQ must be
   deallocated during the destroy operation.
3. request completion notification. Requests the CQ event handler
   be called when the next completion entry of the specified type
   is added to the specified CQ.
4. poll CQ. Polls the specified CQ for a Work Completion.
   A Work Completion indicates that a Work Request for a Work
   Queue associated with the CQ is done.
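To make the poll CQ semantics concrete: as with the verbs-level ib_poll_cq(), a poll returns up to num_entries completions and 0 when the CQ is empty. The ring below is a hypothetical software model only, not the hns CQE layout or doorbell protocol; CQ_DEPTH and the toy_* names are made up for illustration:

```c
#define CQ_DEPTH 8	/* hypothetical; must be a power of two */

struct toy_wc {
	unsigned int wr_id;
	int status;
};

struct toy_cq {
	struct toy_wc ring[CQ_DEPTH];
	unsigned int prod;	/* advanced by the producer ("hardware") */
	unsigned int cons;	/* advanced by poll */
};

/* Producer side: append one completion; -1 if the CQ would overrun. */
static int toy_cq_push(struct toy_cq *cq, unsigned int wr_id, int status)
{
	if (cq->prod - cq->cons == CQ_DEPTH)
		return -1;
	cq->ring[cq->prod & (CQ_DEPTH - 1)] = (struct toy_wc){ wr_id, status };
	cq->prod++;
	return 0;
}

/* Consumer side: copy up to num_entries completions into wc and return
 * how many were copied (0 if the CQ is empty). */
static int toy_cq_poll(struct toy_cq *cq, int num_entries, struct toy_wc *wc)
{
	int n = 0;

	while (n < num_entries && cq->cons != cq->prod) {
		wc[n++] = cq->ring[cq->cons & (CQ_DEPTH - 1)];
		cq->cons++;
	}
	return n;
}
```

The free-running unsigned prod/cons counters make "full" (difference equals CQ_DEPTH) and "empty" (counters equal) distinguishable without a separate flag.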

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
Signed-off-by: Salil Mehta 
---
 drivers/infiniband/hw/hns/hns_roce_cq.c | 357 
 drivers/infiniband/hw/hns/hns_roce_device.h |  33 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 345 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 117 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |   9 +
 5 files changed, 861 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 52179fb..c0c3d79 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -34,6 +34,363 @@
 #include 
 #include 
 #include "hns_roce_device.h"
+#include "hns_roce_cmd.h"
+#include "hns_roce_icm.h"
+#include "hns_roce_user.h"
+#include "hns_roce_common.h"
+
+static void hns_roce_ib_cq_comp(struct hns_roce_cq *hr_cq)
+{
+   struct ib_cq *ibcq = &hr_cq->ib_cq;
+
+   ibcq->comp_handler(ibcq, ibcq->cq_context);
+}
+
+static void hns_roce_ib_cq_event(struct hns_roce_cq *hr_cq,
+enum hns_roce_event event_type)
+{
+   struct hns_roce_dev *hr_dev;
+   struct ib_event event;
+   struct ib_cq *ibcq;
+
+   ibcq = &hr_cq->ib_cq;
+   hr_dev = to_hr_dev(ibcq->device);
+
+   if (event_type != HNS_ROCE_EVENT_TYPE_CQ_ID_INVALID &&
+   event_type != HNS_ROCE_EVENT_TYPE_CQ_ACCESS_ERROR &&
+   event_type != HNS_ROCE_EVENT_TYPE_CQ_OVERFLOW) {
+   dev_err(&hr_dev->pdev->dev,
+   "hns_roce_ib: Unexpected event type 0x%x on CQ %06lx\n",
+   event_type, hr_cq->cqn);
+   return;
+   }
+
+   if (ibcq->event_handler) {
+   event.device = ibcq->device;
+   event.event = IB_EVENT_CQ_ERR;
+   event.element.cq = ibcq;
+   ibcq->event_handler(&event, ibcq->cq_context);
+   }
+}
+
+static int hns_roce_sw2hw_cq(struct hns_roce_dev *dev,
+struct hns_roce_cmd_mailbox *mailbox,
+unsigned long cq_num)
+{
+   return hns_roce_cmd_mbox(dev, mailbox->dma, 0, cq_num, 0,
+   HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIME_CLASS_A);
+}
+
+static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, int nent,
+struct hns_roce_mtt *hr_mtt,
+struct hns_roce_uar *hr_uar,
+struct hns_roce_cq *hr_cq, int vector,
+int collapsed)
+{
+   struct hns_roce_cmd_mailbox *mailbox = NULL;
+   struct hns_roce_cq_table *cq_table = NULL;
+   struct device *dev = &hr_dev->pdev->dev;
+   dma_addr_t dma_handle;
+   u64 *mtts = NULL;
+   int ret = 0;
+
+   cq_table = &hr_dev->cq_table;
+
+   /* Get the physical address of cq buf */
+   mtts = hns_roce_table_find(&hr_dev->mr_table.mtt_table,
+  hr_mtt->first_seg, &dma_handle);
+   if (!mtts) {
+   dev_err(dev, "CQ alloc.Failed to find cq buf addr.\n");
+   return -EINVAL;
+   }
+
+   if (vector >= hr_dev->caps.num_comp_vectors) {
+   dev_err(dev, "CQ alloc.Invalid vector.\n");
+   return -EINVAL;
+   }
+   hr_cq->vector = vector;
+
+   ret = hns_roce_bitmap_alloc(&cq_table->bitmap, &hr_cq->cqn);
+   if (ret == -1) {
+   dev_err(dev, "CQ alloc.Failed to alloc index.\n");
+   return -ENOMEM;
+   }
+
+   /* Get CQC memory icm table */
+   ret = hns_roce_table_get(hr_dev, &cq_table->table, hr_cq->cqn);
+   if (ret) {
+   dev_err(dev, "CQ alloc.Failed to get context mem.\n");
+   goto err_out;
+   }
+
+   /* The cq insert radix tree */
+   spin_lock_irq(&cq_table->lock);
+   /* Radix_tree: The associated pointer and long integer key val

[RESEND PATCH v9 10/22] IB/hns: Add process flow to init RoCE engine

2016-06-07 Thread Lijun Ou

This patch initializes the RoCE engine, which is required before
RoCE can run. The initialization configures the DMAE user,
initializes the doorbell and RAQ operations, and enables the ports.
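The *_S/*_M macro pairs added below give each register field's shift and in-place mask, and roce_set_bit() is built on roce_set_field(). A userspace sketch of that read-modify-write idiom, reusing the ROCEE_GLB_CFG port-state and RAQ-drop definitions from this patch; set_field(), set_bit32() and get_field() are hypothetical helpers written for illustration, not the driver's own macros:

```c
#include <stdint.h>

/* Field definitions copied from this patch. */
#define ROCEE_GLB_CFG_ROCEE_PORT_ST_S 10
#define ROCEE_GLB_CFG_ROCEE_PORT_ST_M \
	(((1UL << 6) - 1) << ROCEE_GLB_CFG_ROCEE_PORT_ST_S)
#define ROCEE_GLB_CFG_TRP_RAQ_DROP_EN_S 16

/* Clear the field, then OR in the new value, clipped to the field. */
static void set_field(uint32_t *reg, uint32_t mask, unsigned int shift,
		      uint32_t val)
{
	*reg = (*reg & ~mask) | ((val << shift) & mask);
}

/* A single-bit field is a field whose mask is just that bit. */
static void set_bit32(uint32_t *reg, unsigned int shift, uint32_t val)
{
	set_field(reg, 1u << shift, shift, val);
}

static uint32_t get_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```

Setting the 6-bit port-state field to 0x3f on a zeroed register yields 0xFC00; other bits, such as the RAQ drop enable at bit 16, are left untouched by later updates to that field.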

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_common.h | 107 +++
 drivers/infiniband/hw/hns/hns_roce_device.h |  10 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 443 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |  58 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  21 ++
 5 files changed, 639 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index f15bf1b..776286c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -53,6 +53,93 @@
 #define roce_set_bit(origin, shift, val) \
roce_set_field((origin), (1ul << (shift)), (shift), (val))
 
+#define ROCEE_GLB_CFG_ROCEE_DB_SQ_MODE_S 3
+#define ROCEE_GLB_CFG_ROCEE_DB_OTH_MODE_S 4
+
+#define ROCEE_GLB_CFG_SQ_EXT_DB_MODE_S 5
+
+#define ROCEE_GLB_CFG_OTH_EXT_DB_MODE_S 6
+
+#define ROCEE_GLB_CFG_ROCEE_PORT_ST_S 10
+#define ROCEE_GLB_CFG_ROCEE_PORT_ST_M  \
+   (((1UL << 6) - 1) << ROCEE_GLB_CFG_ROCEE_PORT_ST_S)
+
+#define ROCEE_GLB_CFG_TRP_RAQ_DROP_EN_S 16
+
+#define ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_S 0
+#define ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_M  \
+   (((1UL << 24) - 1) << ROCEE_DMAE_USER_CFG1_ROCEE_STREAM_ID_TB_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_S 24
+#define ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_M  \
+   (((1UL << 4) - 1) << ROCEE_DMAE_USER_CFG1_ROCEE_CACHE_TB_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_S 0
+#define ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_M   \
+   (((1UL << 24) - 1) << ROCEE_DMAE_USER_CFG2_ROCEE_STREAM_ID_PKT_CFG_S)
+
+#define ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_S 24
+#define ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_M   \
+   (((1UL << 4) - 1) << ROCEE_DMAE_USER_CFG2_ROCEE_CACHE_PKT_CFG_S)
+
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_S 0
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_S)
+
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_S 16
+#define ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_SQ_WL_ROCEE_DB_SQ_WL_EMPTY_S)
+
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_S 0
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_S)
+
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_S 16
+#define ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_M   \
+   (((1UL << 16) - 1) << ROCEE_DB_OTHERS_WL_ROCEE_DB_OTH_WL_EMPTY_S)
+
+#define ROCEE_RAQ_WL_ROCEE_RAQ_WL_S 0
+#define ROCEE_RAQ_WL_ROCEE_RAQ_WL_M   \
+   (((1UL << 8) - 1) << ROCEE_RAQ_WL_ROCEE_RAQ_WL_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_S 0
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_M   \
+   (((1UL << 15) - 1) << \
+   ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_POL_TIME_INTERVAL_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_S 16
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_M   \
+   (((1UL << 4) - 1) << \
+   ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_CFG_S)
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_RAQ_TIMEOUT_CHK_EN_S 20
+
+#define ROCEE_WRMS_POL_TIME_INTERVAL_WRMS_EXT_RAQ_MODE 21
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_S 0
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_SHIFT_S)
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_S 5
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_SQ_BA_H_S)
+
+#define ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_S 0
+#define ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_OTH_H_EXT_DB_OTH_SHIFT_S)
+
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_S 5
+#define ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_DB_SQ_H_EXT_DB_OTH_BA_H_S)
+
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_S 0
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_RAQ_H_EXT_RAQ_SHIFT_S)
+
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_S 8
+#define ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_M   \
+   (((1UL << 5) - 1) << ROCEE_EXT_RAQ_H_EXT_RAQ_BA_H_S)
+
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S 0
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_M   \
(((1UL << 19) - 1) << ROCEE_BT_CMD_H_ROCEE_BT_CMD_IN_MDF_S)
@@ -120,6 +207,26 @@
 #define ROCEE_ECC_CERR_ALM2_REG    0xB48
 
 #define ROCEE_ACK_DELAY_REG        0x14
+#define ROCEE_GLB_CFG_REG  0x18
+
+#define ROCEE_DMAE_USER_CFG1_REG   0x40
+#define ROCEE_DMAE_USER_CFG2_REG   0x44
+
+#define ROCEE_DB_SQ_WL_REG 0x154

Re: [PATCH v3] power_supply: power_supply_read_temp only if use_cnt > 0

2016-06-07 Thread Krzysztof Kozlowski

On 06/07/2016 10:26 PM, Rhyland Klein wrote:
> Change power_supply_read_temp() to use power_supply_get_property()
> so that it will check the use_cnt and ensure it is > 0. The use_cnt
> will be incremented at the end of __power_supply_register(), so this
> blocks the case where get_property can be called before the supply
> is fully registered. This fixes the issue shown in the stack below:
> 
> [1.452598] power_supply_read_temp+0x78/0x80
> [1.458680] thermal_zone_get_temp+0x5c/0x11c
> [1.464765] thermal_zone_device_update+0x34/0xb4
> [1.471195] thermal_zone_device_register+0x87c/0x8cc
> [1.477974] __power_supply_register+0x364/0x424
> [1.484317] power_supply_register_no_ws+0x10/0x18
> [1.490833] bq27xxx_battery_setup+0x10c/0x164
> [1.497003] bq27xxx_battery_i2c_probe+0xd0/0x1b0
> [1.503435] i2c_device_probe+0x174/0x240
> [1.509172] driver_probe_device+0x1fc/0x29c
> [1.515167] __driver_attach+0xa4/0xa8
> [1.520643] bus_for_each_dev+0x58/0x98
> [1.526204] driver_attach+0x20/0x28
> [1.531505] bus_add_driver+0x1c8/0x22c
> [1.537067] driver_register+0x68/0x108
> [1.542630] i2c_register_driver+0x38/0x7c
> [1.548457] bq27xxx_battery_i2c_driver_init+0x18/0x20
> [1.555321] do_one_initcall+0x38/0x12c
> [1.560886] kernel_init_freeable+0x148/0x1ec
> [1.566972] kernel_init+0x10/0xfc
> [1.572101] ret_from_fork+0x10/0x40
> 
> Also make the same change to ps_get_max_charge_cntl_limit() and
> ps_get_cur_chrage_cntl_limit() to be safe. Lastly, change the return
> value of power_supply_get_property() to -EAGAIN from -ENODEV if
> use_cnt <= 0.
> 
> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Rhyland Klein 
> ---
> v3:
>  - Changed calls to ->get_property() to use common
>power_supply_get_property()
>  - reworded description, added "Fixes" line
>  - Changed return value of power_supply_get_property() to -EAGAIN
> 
> v2:
>  - Added cc stable
>  - changed return to -EAGAIN in case of use_cnt < 1
>  - Removed WARNING
>  - return value check added in additional patch:
>https://lkml.org/lkml/2016/6/6/706
> 
>  drivers/power/power_supply_core.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/power/power_supply_core.c b/drivers/power/power_supply_core.c
> index 456987c88baa..630bd68e 100644
> --- a/drivers/power/power_supply_core.c
> +++ b/drivers/power/power_supply_core.c
> @@ -492,7 +492,7 @@ int power_supply_get_property(struct power_supply *psy,
>   union power_supply_propval *val)
>  {
>   if (atomic_read(&psy->use_cnt) <= 0)
> - return -ENODEV;
> + return -EAGAIN;

Wait, no. I was thinking of changing the return value in
power_supply_read_temp() if we really want EAGAIN:
ret = power_supply_get_property(...);
if (ret)
return -EAGAIN;

On the other hand, both return values look correct here: the call can
be executed too early (not very common) or too late, after the driver
has been unbound (also somewhat unusual).

>  
>   return psy->desc->get_property(psy, psp, val);
>  }
> @@ -564,12 +564,14 @@ static int power_supply_read_temp(struct thermal_zone_device *tzd,
>   int ret;
>  
>   WARN_ON(tzd == NULL);
> +
>   psy = tzd->devdata;
> - ret = psy->desc->get_property(psy, POWER_SUPPLY_PROP_TEMP, &val);
> + ret = power_supply_get_property(psy, POWER_SUPPLY_PROP_TEMP, &val);
> + if (!ret)
> + return ret;

I think you wanted the reverse:
if (ret)

>  
>   /* Convert tenths of degree Celsius to milli degree Celsius. */
> - if (!ret)
> - *temp = val.intval * 100;
> + *temp = val.intval * 100;
>  
>   return ret;
>  }
> @@ -612,10 +614,12 @@ static int ps_get_max_charge_cntl_limit(struct thermal_cooling_device *tcd,
>   int ret;
>  
>   psy = tcd->devdata;
> - ret = psy->desc->get_property(psy,
> - POWER_SUPPLY_PROP_CHARGE_CONTROL_LIMIT_MAX, &val);
> + ret = power_supply_get_property(psy,
> + POWER_SUPPLY_PROP_CHARGE_CONTROL_LIMIT_MAX, &val);
>   if (!ret)
> - *state = val.intval;
> + return ret;

Wait, again - why are you inverting the logic of 'ret'?

BR,
Krzysztof

> +
> + *state = val.intval;
>  
>   return ret;
>  }
> @@ -628,10 +632,12 @@ static int ps_get_cur_chrage_cntl_limit(struct thermal_cooling_device *tcd,
>   int ret;
>  
>   psy = tcd->devdata;
> - ret = psy->desc->get_property(psy,
> - POWER_SUPPLY_PROP_CHARGE_CONTROL_LIMIT, &val);
> + ret = power_supply_get_property(psy,
> + POWER_SUPPLY_PROP_CHARGE_CONTROL_LIMIT, &val);
>   if (!ret)
> - *state = val.intval;
> + return ret;
> +
> + *state = val.intval;
>  
>   return ret;
>  }
>
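The ordering Krzysztof is asking for is: return early only when the property read fails, and otherwise convert the value. A userspace sketch of that flow with the use_cnt gate; struct fake_psy and both functions are hypothetical stand-ins for struct power_supply, power_supply_get_property() and power_supply_read_temp():

```c
#include <errno.h>

/* Hypothetical stand-in for struct power_supply. */
struct fake_psy {
	int use_cnt;		/* > 0 once registration has completed */
	int temp_tenths_c;	/* what the driver would report */
};

/* Gate on use_cnt, as power_supply_get_property() does. */
static int fake_get_property(struct fake_psy *psy, int *val)
{
	if (psy->use_cnt <= 0)
		return -EAGAIN;	/* v3 of the patch; previously -ENODEV */
	*val = psy->temp_tenths_c;
	return 0;
}

static int fake_read_temp(struct fake_psy *psy, int *temp)
{
	int val;
	int ret = fake_get_property(psy, &val);

	if (ret)		/* propagate the error; leave *temp alone */
		return ret;

	/* Convert tenths of degree Celsius to milli degree Celsius. */
	*temp = val * 100;
	return 0;
}
```

With use_cnt = 0 the call returns -EAGAIN and leaves *temp unchanged; once use_cnt is positive, a reading of 253 (25.3 degrees C) becomes 25300.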

[RESEND PATCH v9 12/22] IB/hns: Set mtu and gid support

2016-06-07 Thread Lijun Ou

This patch sets up the MTU and GID resources. These resources
are used to set up network transmission between nodes.
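The new ROCEE_SMAC_H fields place a 16-bit SMAC_H value at bit 0 and a 4-bit port MTU code at bit 16, alongside ROCEE_SMAC_L_0_REG for the remaining MAC bytes. A sketch of packing one port's values into those two 32-bit words; the byte order (mac[0] in the least significant byte of SMAC_L) is an assumption, as is treating mtu_code as an opaque 4-bit value, and pack_smac() is a hypothetical helper:

```c
#include <stdint.h>

/* Field definitions copied from this patch. */
#define ROCEE_SMAC_H_ROCEE_SMAC_H_S 0
#define ROCEE_SMAC_H_ROCEE_SMAC_H_M \
	(((1UL << 16) - 1) << ROCEE_SMAC_H_ROCEE_SMAC_H_S)
#define ROCEE_SMAC_H_ROCEE_PORT_MTU_S 16
#define ROCEE_SMAC_H_ROCEE_PORT_MTU_M \
	(((1UL << 4) - 1) << ROCEE_SMAC_H_ROCEE_PORT_MTU_S)

static void pack_smac(const uint8_t mac[6], uint8_t mtu_code,
		      uint32_t *smac_l, uint32_t *smac_h)
{
	uint32_t mac_hi = (uint32_t)mac[4] | (uint32_t)mac[5] << 8;

	/* Assumed layout: mac[0] lands in the least significant byte. */
	*smac_l = (uint32_t)mac[0] | (uint32_t)mac[1] << 8 |
		  (uint32_t)mac[2] << 16 | (uint32_t)mac[3] << 24;

	/* Clip each value to its field so neither can spill into the other. */
	*smac_h = (uint32_t)((mac_hi << ROCEE_SMAC_H_ROCEE_SMAC_H_S) &
			     ROCEE_SMAC_H_ROCEE_SMAC_H_M) |
		  (uint32_t)(((uint32_t)mtu_code << ROCEE_SMAC_H_ROCEE_PORT_MTU_S) &
			     ROCEE_SMAC_H_ROCEE_PORT_MTU_M);
}
```

Masking the MTU code means an out-of-range value such as 0x13 is clipped to its low four bits instead of corrupting higher bits of the register.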

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_common.h |  16 
 drivers/infiniband/hw/hns/hns_roce_device.h |  14 
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  65 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   1 +
 drivers/infiniband/hw/hns/hns_roce_main.c   | 123 
 5 files changed, 219 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h
index 776286c..b66e96f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -156,6 +156,14 @@
 
 #define ROCEE_BT_CMD_H_ROCEE_BT_CMD_HW_SYNS_S 31
 
+#define ROCEE_SMAC_H_ROCEE_SMAC_H_S 0
+#define ROCEE_SMAC_H_ROCEE_SMAC_H_M   \
+   (((1UL << 16) - 1) << ROCEE_SMAC_H_ROCEE_SMAC_H_S)
+
+#define ROCEE_SMAC_H_ROCEE_PORT_MTU_S 16
+#define ROCEE_SMAC_H_ROCEE_PORT_MTU_M   \
+   (((1UL << 4) - 1) << ROCEE_SMAC_H_ROCEE_PORT_MTU_S)
+
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S 0
 #define ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_M   \
(((1UL << 2) - 1) << ROCEE_CAEP_AEQC_AEQE_SHIFT_CAEP_AEQC_STATE_S)
@@ -196,8 +204,16 @@
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
+#define ROCEE_PORT_GID_L_0_REG 0x50
+#define ROCEE_PORT_GID_ML_0_REG 0x54
+#define ROCEE_PORT_GID_MH_0_REG 0x58
+#define ROCEE_PORT_GID_H_0_REG 0x5C
+
 #define ROCEE_BT_CMD_H_REG 0x204
 
+#define ROCEE_SMAC_L_0_REG 0x240
+#define ROCEE_SMAC_H_0_REG 0x244
+
 #define ROCEE_CAEP_AEQE_CONS_IDX_REG   0x3AC
 #define ROCEE_CAEP_CEQC_CONS_IDX_0_REG 0x3BC
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index c8f8831..531e488 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -44,6 +44,8 @@
 
 #define DRV_NAME "hns_roce"
 
+#define MAC_ADDR_OCTET_NUM 6
+
 #define HNS_ROCE_BA_SIZE   (32 * 4096)
 
 #define HNS_ROCE_MAX_IRQ_NUM   34
@@ -54,6 +56,9 @@
 #define HNS_ROCE_AEQE_OF_VEC_NUM   1
 
 #define HNS_ROCE_MAX_PORTS 6
+#define HNS_ROCE_MAX_GID_NUM   16
+
+#define PAGES_SHIFT_16 16
 
 enum hns_roce_event {
HNS_ROCE_EVENT_TYPE_PATH_MIG  = 0x01,
@@ -252,6 +257,8 @@ struct hns_roce_qp {
 
 struct hns_roce_ib_iboe {
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   /* 16 GIDs are shared by the 6 ports in the v1 engine. */
+   union ib_gid    gid_table[HNS_ROCE_MAX_GID_NUM];
u8  phy_port[HNS_ROCE_MAX_PORTS];
 };
 
@@ -326,6 +333,11 @@ struct hns_roce_hw {
void (*hw_profile)(struct hns_roce_dev *hr_dev);
int (*hw_init)(struct hns_roce_dev *hr_dev);
void (*hw_exit)(struct hns_roce_dev *hr_dev);
+   void (*set_gid)(struct hns_roce_dev *hr_dev, u8 port, int gid_index,
+   union ib_gid *gid);
+   void (*set_mac)(struct hns_roce_dev *hr_dev, u8 phy_port, u8 *addr);
+   void (*set_mtu)(struct hns_roce_dev *hr_dev, u8 phy_port,
+   enum ib_mtu mtu);
void            *priv;
 };
 
@@ -343,6 +355,7 @@ struct hns_roce_dev {
struct hns_roce_caps    caps;
struct radix_tree_root  qp_table_tree;
 
+   unsigned char   dev_addr[HNS_ROCE_MAX_PORTS][MAC_ADDR_OCTET_NUM];
u64 fw_ver;
u64 sys_image_guid;
u32 vendor_id;
@@ -412,6 +425,7 @@ void hns_roce_bitmap_free_range(struct hns_roce_bitmap 
*bitmap,
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn);
 void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type);
 void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type);
+int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index);
 
 extern struct hns_roce_hw hns_roce_hw_v1;
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 883e181..10acef2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -583,9 +583,74 @@ void hns_roce_v1_exit(struct hns_roce_dev *hr_dev)
hns_roce_db_free(hr_dev);
 }
 
+void hns_roce_v1_set_gid(struct hns_roce_dev *hr_dev, u8 port, int gid_index,
+union ib_gid *gid)
+{
+   u32 *p = NULL;
+   u8 gid_idx = 0;
+
+   gid_idx = hns_get_gid_index(hr_dev, port, gid_index);
+
+   p = (u32 *)&gid->raw[0];
+   roce_raw_write(*p, hr_d

[RESEND PATCH v9 20/22] IB/hns: Add operation for getting immutable port

2016-06-07 Thread Lijun Ou

This patch adds a new verb for getting the immutable port attributes
(get_port_immutable). The verb is needed for the 4.5 and later kernels:
without it, registering the IB device fails.

Signed-off-by: Wei Hu 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_main.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c 
b/drivers/infiniband/hw/hns/hns_roce_main.c
index fb21b8a..63f5a62 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -572,6 +572,25 @@ static int hns_roce_mmap(struct ib_ucontext *context,
return 0;
 }
 
+static int hns_roce_port_immutable(struct ib_device *ib_dev, u8 port_num,
+  struct ib_port_immutable *immutable)
+{
+   struct ib_port_attr attr;
+   int ret;
+
+   ret = hns_roce_query_port(ib_dev, port_num, &attr);
+   if (ret)
+   return ret;
+
+   immutable->pkey_tbl_len = attr.pkey_tbl_len;
+   immutable->gid_tbl_len = attr.gid_tbl_len;
+
+   immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
+   immutable->max_mad_size = IB_MGMT_MAD_SIZE;
+
+   return 0;
+}
+
 void hns_roce_unregister_device(struct hns_roce_dev *hr_dev)
 {
struct hns_roce_ib_iboe *iboe = &hr_dev->iboe;
@@ -657,6 +676,9 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->reg_user_mr = hns_roce_reg_user_mr;
ib_dev->dereg_mr= hns_roce_dereg_mr;
 
+   /* OTHERS */
+   ib_dev->get_port_immutable  = hns_roce_port_immutable;
+
ret = ib_register_device(ib_dev, NULL);
if (ret) {
dev_err(dev, "ib_register_device failed!\n");
-- 
1.9.1

[RESEND PATCH v9 22/22] MAINTAINERS: Add maintainers for HiSilicon RoCE driver

2016-06-07 Thread Lijun Ou

This patch adds maintainers for the HiSilicon RoCE driver.

Signed-off-by: Wei Hu 
Signed-off-by: Lijun Ou 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7304d2e..3de2ef0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10280,6 +10280,14 @@ W: http://www.emulex.com
 S: Supported
 F: drivers/infiniband/hw/ocrdma/
 
+HISILICON ROCE DRIVER
+M: Lijun Ou 
+M: Wei Hu(Xavier) 
+L: linux-r...@vger.kernel.org
+S: Maintained
+F: drivers/infiniband/hw/hns/
+F: Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
+
 SFC NETWORK DRIVER
 M: Solarflare linux maintainers 
 M: Edward Cree 
-- 
1.9.1

[RESEND PATCH v9 04/22] IB/hns: Add RoCE engine reset function

2016-06-07 Thread Lijun Ou

This patch mainly adds the reset flow of the RoCE engine to the RoCE
driver. The reset is required when the RoCE driver is loaded and removed.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  7 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 72 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 40 
 drivers/infiniband/hw/hns/hns_roce_main.c   | 17 ++-
 4 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_hw_v1.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_hw_v1.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index f9de8e4..2e18488 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -56,6 +56,10 @@ struct hns_roce_caps {
u8  num_ports;
 };
 
+struct hns_roce_hw {
+   int (*reset)(struct hns_roce_dev *hr_dev, bool enable);
+};
+
 struct hns_roce_dev {
struct ib_device        ib_dev;
struct platform_device  *pdev;
@@ -67,6 +71,9 @@ struct hns_roce_dev {
 
int cmd_mod;
int loop_idc;
+   struct hns_roce_hw  *hw;
 };
 
+extern struct hns_roce_hw hns_roce_hw_v1;
+
 #endif /* _HNS_ROCE_DEVICE_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
new file mode 100644
index 000..198be3b
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_device.h"
+#include "hns_roce_hw_v1.h"
+
+/**
+ * hns_roce_v1_reset - reset the RoCE engine
+ * @hr_dev: RoCE device struct pointer
+ * @enable: true -- drop reset, false -- reset
+ * Return: 0 on success, negative value on failure
+ */
+int hns_roce_v1_reset(struct hns_roce_dev *hr_dev, bool enable)
+{
+   struct device_node *dsaf_node;
+   struct device *dev = &hr_dev->pdev->dev;
+   struct device_node *np = dev->of_node;
+   int ret;
+
+   dsaf_node = of_parse_phandle(np, "dsaf-handle", 0);
+
+   if (!enable) {
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, false);
+   } else {
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, false);
+   if (ret)
+   return ret;
+
+   msleep(SLEEP_TIME_INTERVAL);
+   ret = hns_dsaf_roce_reset(&dsaf_node->fwnode, true);
+   }
+
+   return ret;
+}
+
+struct hns_roce_hw hns_roce_hw_v1 = {
+   .reset = hns_roce_v1_reset,
+};
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
new file mode 100644
index 000..ca69d0b
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and th

[RESEND PATCH v9 19/22] IB/hns: Add memory region operations support

2016-06-07 Thread Lijun Ou

This patch mainly implements memory region (MR) support.
Memory registration provides mechanisms that allow consumers
to describe a set of virtually contiguous memory locations or
a set of physically contiguous memory locations.
The MR operations are as follows:
1. get a DMA MR in kernel mode
2. register an MR in user mode
3. deregister an MR
The locations of some functions were also adjusted in
some files.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
Signed-off-by: Salil Mehta 
---
 drivers/infiniband/hw/hns/hns_roce_cmd.h|   9 +
 drivers/infiniband/hw/hns/hns_roce_device.h |  45 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 157 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 109 +++-
 drivers/infiniband/hw/hns/hns_roce_icm.h|   1 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |   7 +
 drivers/infiniband/hw/hns/hns_roce_mr.c | 250 
 drivers/infiniband/hw/hns/hns_roce_qp.c |   1 +
 8 files changed, 576 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h 
b/drivers/infiniband/hw/hns/hns_roce_cmd.h
index cb3e85a..7b37bea 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.h
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -36,6 +36,14 @@
 #include 
 
 enum {
+   /* TPT commands */
+   HNS_ROCE_CMD_SW2HW_MPT  = 0xd,
+   HNS_ROCE_CMD_HW2SW_MPT  = 0xf,
+
+   /* CQ commands */
+   HNS_ROCE_CMD_SW2HW_CQ   = 0x16,
+   HNS_ROCE_CMD_HW2SW_CQ   = 0x17,
+
/* QP/EE commands */
HNS_ROCE_CMD_RST2INIT_QP= 0x19,
HNS_ROCE_CMD_INIT2RTR_QP= 0x1a,
@@ -51,6 +59,7 @@ enum {
 
 enum {
HNS_ROCE_CMD_TIME_CLASS_A   = 1,
+   HNS_ROCE_CMD_TIME_CLASS_B   = 1,
HNS_ROCE_CMD_TIME_CLASS_C   = 1,
 };
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 669f915..5e87f69 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -57,6 +57,10 @@
 #define HNS_ROCE_MIN_CQE_NUM   0x40
 #define HNS_ROCE_MIN_WQE_NUM   0x20
 
+/* Hardware specification only for v1 engine */
+#define HNS_ROCE_MAX_INNER_MTPT_NUM 0x7
+#define HNS_ROCE_MAX_MTPT_PBL_NUM  0x10
+
 #define HNS_ROCE_MAX_IRQ_NUM   34
 
 #define HNS_ROCE_COMP_VEC_NUM  32
@@ -73,10 +77,21 @@
 #define HNS_ROCE_MAX_GID_NUM   16
 #define HNS_ROCE_GID_SIZE  16
 
+#define MR_TYPE_MR 0x00
+#define MR_TYPE_DMA 0x03
+
 #define PKEY_ID0x
 #define NODE_DESC_SIZE 64
 
+#define SERV_TYPE_RC   0
+#define SERV_TYPE_RD   1
+#define SERV_TYPE_UC   2
+#define SERV_TYPE_UD   3
+
+#define PAGES_SHIFT_8  8
 #define PAGES_SHIFT_16 16
+#define PAGES_SHIFT_24 24
+#define PAGES_SHIFT_32 32
 
 enum hns_roce_qp_state {
HNS_ROCE_QP_STATE_RST= 0,
@@ -220,6 +235,23 @@ struct hns_roce_mtt {
int page_shift;
 };
 
+/* Only support 4K page size for mr register */
+#define MR_SIZE_4K 0
+
+struct hns_roce_mr {
+   struct ib_mr    ibmr;
+   struct ib_umem  *umem;
+   u64 iova; /* MR's virtual original addr */
+   u64 size; /* Address range of MR */
+   u32 key; /* Key of MR */
+   u32 pd;   /* PD num of MR */
+   u32 access;/* Access permission of MR */
+   int enabled; /* MR's active status */
+   int type;   /* MR's register type */
+   u64 *pbl_buf;/* MR's PBL space */
+   dma_addr_t  pbl_dma_addr;   /* MR's PBL space PA */
+};
+
 struct hns_roce_mr_table {
struct hns_roce_bitmap  mtpt_bitmap;
struct hns_roce_buddy   mtt_buddy;
@@ -487,6 +519,8 @@ struct hns_roce_hw {
void (*set_mac)(struct hns_roce_dev *hr_dev, u8 phy_port, u8 *addr);
void (*set_mtu)(struct hns_roce_dev *hr_dev, u8 phy_port,
enum ib_mtu mtu);
+   int (*write_mtpt)(void *mb_buf, struct hns_roce_mr *mr,
+ unsigned long mtpt_idx);
void (*write_cqc)(struct hns_roce_dev *hr_dev,
  struct hns_roce_cq *hr_cq, void *mb_buf, u64 *mtts,
  dma_addr_t dma_handle, int nent, u32 vector);
@@ -561,6 +595,11 @@ static inline struct hns_roce_ah *to_hr_ah(struct ib_ah 
*ibah)
return container_of(ibah, struct hns_roce_ah, ibah);
 }

[RESEND PATCH v9 21/22] IB/hns: Kconfig and Makefile for RoCE module

2016-06-07 Thread Lijun Ou

This patch adds the Kconfig and Makefile for building the RoCE module.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/Kconfig |  1 +
 drivers/infiniband/hw/Makefile |  1 +
 drivers/infiniband/hw/hns/Kconfig  | 10 ++
 drivers/infiniband/hw/hns/Makefile |  8 
 4 files changed, 20 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/Kconfig
 create mode 100644 drivers/infiniband/hw/hns/Makefile

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 2137adf..767f92b 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -74,6 +74,7 @@ source "drivers/infiniband/hw/mlx5/Kconfig"
 source "drivers/infiniband/hw/nes/Kconfig"
 source "drivers/infiniband/hw/ocrdma/Kconfig"
 source "drivers/infiniband/hw/usnic/Kconfig"
+source "drivers/infiniband/hw/hns/Kconfig"
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index c0c7cf8..2ad851d 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_INFINIBAND_NES)+= nes/
 obj-$(CONFIG_INFINIBAND_OCRDMA)+= ocrdma/
 obj-$(CONFIG_INFINIBAND_USNIC) += usnic/
 obj-$(CONFIG_INFINIBAND_HFI1)  += hfi1/
+obj-$(CONFIG_INFINIBAND_HISILICON_HNS) += hns/
diff --git a/drivers/infiniband/hw/hns/Kconfig 
b/drivers/infiniband/hw/hns/Kconfig
new file mode 100644
index 000..c47c168
--- /dev/null
+++ b/drivers/infiniband/hw/hns/Kconfig
@@ -0,0 +1,10 @@
+config INFINIBAND_HISILICON_HNS
+   tristate "Hisilicon Hns ROCE Driver"
+   depends on NET_VENDOR_HISILICON
+   depends on ARM64 && HNS && HNS_DSAF && HNS_ENET
+   ---help---
+ This is a RoCE/RDMA driver for the Hisilicon RoCE engine. The engine
+ is used in the Hisilicon Hi1610 and later ICT SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called hns-roce.
diff --git a/drivers/infiniband/hw/hns/Makefile 
b/drivers/infiniband/hw/hns/Makefile
new file mode 100644
index 000..40b6307
--- /dev/null
+++ b/drivers/infiniband/hw/hns/Makefile
@@ -0,0 +1,8 @@
+#
+# Makefile for the HISILICON RoCE drivers.
+#
+
+obj-$(CONFIG_INFINIBAND_HISILICON_HNS) += hns-roce.o
+hns-roce-objs := hns_roce_main.o hns_roce_cmd.o hns_roce_eq.o hns_roce_pd.o \
+   hns_roce_ah.o hns_roce_icm.o hns_roce_mr.o hns_roce_qp.o \
+   hns_roce_cq.o hns_roce_alloc.o hns_roce_hw_v1.o
-- 
1.9.1

[RESEND PATCH v9 02/22] devicetree: bindings: IB: Add binding document for HiSilicon RoCE

2016-06-07 Thread Lijun Ou

This patch adds the DT binding document for the HiSilicon RoCE driver.

Signed-off-by: Wei Hu 
Signed-off-by: Lijun Ou 
---
 .../bindings/infiniband/hisilicon-hns-roce.txt | 107 +
 1 file changed, 107 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt

diff --git 
a/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt 
b/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
new file mode 100644
index 000..2c59ed9
--- /dev/null
+++ b/Documentation/devicetree/bindings/infiniband/hisilicon-hns-roce.txt
@@ -0,0 +1,107 @@
+HiSilicon RoCE DT description
+
+The HiSilicon RoCE engine is part of the network subsystem.
+It depends on other parts of the network subsystem, such as the GMAC and
+the DSA fabric.
+
+Additional properties are described here:
+
+Required properties:
+- compatible: Should contain "hisilicon,hns-roce-v1".
+- reg: Physical base address and length of the memory-mapped region
+of the RoCE device.
+- eth-handle: phandle, specifies a reference to a node
+representing an Ethernet device.
+- dsaf-handle: phandle, specifies a reference to a node
+representing a DSAF device.
+- #address-cells: must be 2
+- #size-cells: must be 2
+Optional properties:
+- dma-coherent: Present if DMA operations are coherent.
+- interrupt-parent: the interrupt parent of this device.
+- interrupts: should contain 32 completion event irqs, 1 async event irq
+and 1 event overflow irq.
+- interrupt-names: should contain the names of the 34 irqs of the RoCE device:
+  - hns-roce-comp-0 ~ hns-roce-comp-31: the 32 completion event irqs
+  - hns-roce-async: the async event irq
+  - hns-roce-common: the common exception warning irq
+Example:
+   infiniband@c400 {
+   compatible = "hisilicon,hns-roce-v1";
+   reg = <0x0 0xc400 0x0 0x10>;
+   dma-coherent;
+   eth-handle = <&eth2 &eth3 &eth4 &eth5 &eth6 &eth7>;
+   dsaf-handle = <&soc0_dsa>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+   interrupt-parent = <&mbigen_dsa>;
+   interrupts = <722 1>,
+   <723 1>,
+   <724 1>,
+   <725 1>,
+   <726 1>,
+   <727 1>,
+   <728 1>,
+   <729 1>,
+   <730 1>,
+   <731 1>,
+   <732 1>,
+   <733 1>,
+   <734 1>,
+   <735 1>,
+   <736 1>,
+   <737 1>,
+   <738 1>,
+   <739 1>,
+   <740 1>,
+   <741 1>,
+   <742 1>,
+   <743 1>,
+   <744 1>,
+   <745 1>,
+   <746 1>,
+   <747 1>,
+   <748 1>,
+   <749 1>,
+   <750 1>,
+   <751 1>,
+   <752 1>,
+   <753 1>,
+   <785 1>,
+   <754 4>;
+
+   interrupt-names = "hns-roce-comp-0",
+   "hns-roce-comp-1",
+   "hns-roce-comp-2",
+   "hns-roce-comp-3",
+   "hns-roce-comp-4",
+   "hns-roce-comp-5",
+   "hns-roce-comp-6",
+   "hns-roce-comp-7",
+   "hns-roce-comp-8",
+   "hns-roce-comp-9",
+   "hns-roce-comp-10",
+   "hns-roce-comp-11",
+   "hns-roce-comp-12",
+   "hns-roce-comp-13",
+   "hns-roce-comp-14",
+   "hns-roce-comp-15",
+   "hns-roce-comp-16",
+   "hns-roce-comp-17",
+   "hns-roce-comp-18",
+   "hns-roce-

[RESEND PATCH v9 09/22] IB/hns: Add hca support

2016-06-07 Thread Lijun Ou

This patch mainly sets up the HCA for RoCE. It performs a series of
initialization steps, as follows:
1. initialize the UAR table and allocate the UAR resource
2. initialize the PD table
3. initialize the CQ table
4. initialize the MR table
5. initialize the QP table

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  | 128 +
 drivers/infiniband/hw/hns/hns_roce_cq.c |  22 +++
 drivers/infiniband/hw/hns/hns_roce_device.h |  69 +
 drivers/infiniband/hw/hns/hns_roce_icm.c|  88 
 drivers/infiniband/hw/hns/hns_roce_icm.h|   7 +
 drivers/infiniband/hw/hns/hns_roce_main.c   |  79 +++
 drivers/infiniband/hw/hns/hns_roce_mr.c | 210 
 drivers/infiniband/hw/hns/hns_roce_pd.c |  88 
 drivers/infiniband/hw/hns/hns_roce_qp.c |  30 
 9 files changed, 721 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_alloc.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_mr.c
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_pd.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c 
b/drivers/infiniband/hw/hns/hns_roce_alloc.c
new file mode 100644
index 000..d2932c1
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -0,0 +1,128 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ * Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hns_roce_device.h"
+
+int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, unsigned long *obj)
+{
+   int ret = 0;
+
+   spin_lock(&bitmap->lock);
+   *obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last);
+   if (*obj >= bitmap->max) {
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   *obj = find_first_zero_bit(bitmap->table, bitmap->max);
+   }
+
+   if (*obj < bitmap->max) {
+   set_bit(*obj, bitmap->table);
+   bitmap->last = (*obj + 1);
+   if (bitmap->last == bitmap->max)
+   bitmap->last = 0;
+   *obj |= bitmap->top;
+   } else {
+   ret = -1;
+   }
+
+   spin_unlock(&bitmap->lock);
+
+   return ret;
+}
+
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj)
+{
+   hns_roce_bitmap_free_range(bitmap, obj, 1);
+}
+
+void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
+   unsigned long obj, int cnt)
+{
+   int i;
+
+   obj &= bitmap->max + bitmap->reserved_top - 1;
+
+   spin_lock(&bitmap->lock);
+   for (i = 0; i < cnt; i++)
+   clear_bit(obj + i, bitmap->table);
+
+   bitmap->last = min(bitmap->last, obj);
+   bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
+  & bitmap->mask;
+   spin_unlock(&bitmap->lock);
+}
+
+int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, u32 num, u32 mask,
+u32 reserved_bot, u32 reserved_top)
+{
+   u32 i;
+
+   if (num != roundup_pow_of_two(num))
+   return -EINVAL;
+
+   bitmap->last = 0;
+   bitmap->top = 0;
+   bitmap->max = num - reserved_top;
+   bitmap->mask = mask;
+   bitmap->reserved_top = reserved_top;
+   spin_lock_init(&bitmap->lock);
+   bitmap->table = kcalloc(BITS_TO_LONGS(bitmap->max), sizeof(long),
+   GFP_KERNEL);
+   if (!bitmap->

[RESEND PATCH v9 05/22] IB/hns: Add initial profile resource

2016-06-07 Thread Lijun Ou

This patch mainly configures some profile resources, for example
vendor_id, the hardware version, and the sizes of some data structures.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_common.h | 49 +++
 drivers/infiniband/hw/hns/hns_roce_device.h | 55 -
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  | 76 +
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  | 38 ++-
 drivers/infiniband/hw/hns/hns_roce_main.c   |  7 +++
 5 files changed, 223 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_common.h

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h 
b/drivers/infiniband/hw/hns/hns_roce_common.h
new file mode 100644
index 000..4cc4761
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_COMMON_H
+#define _HNS_ROCE_COMMON_H
+
+#define roce_read(dev, reg)readl((dev)->reg_base + (reg))
+
+/* ROCEE_REG DEFINITION */
+#define ROCEE_VENDOR_ID_REG 0x0
+#define ROCEE_VENDOR_PART_ID_REG   0x4
+
+#define ROCEE_HW_VERSION_REG   0x8
+
+#define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
+#define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
+
+#define ROCEE_ACK_DELAY_REG 0x14
+
+#endif /* _HNS_ROCE_COMMON_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2e18488..5a93670 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -45,6 +45,12 @@
 #define DRV_NAME "hns_roce"
 
 #define HNS_ROCE_MAX_IRQ_NUM   34
+
+#define HNS_ROCE_COMP_VEC_NUM  32
+
+#define HNS_ROCE_AEQE_VEC_NUM  1
+#define HNS_ROCE_AEQE_OF_VEC_NUM   1
+
 #define HNS_ROCE_MAX_PORTS 6
 
 struct hns_roce_ib_iboe {
@@ -53,11 +59,52 @@ struct hns_roce_ib_iboe {
 };
 
 struct hns_roce_caps {
-   u8  num_ports;
+   u64 fw_ver;
+   u8  num_ports;
+   int gid_table_len[HNS_ROCE_MAX_PORTS];
+   int pkey_table_len[HNS_ROCE_MAX_PORTS];
+   int local_ca_ack_delay;
+   int num_uars;
+   u32 phy_num_uars;
+   u32 max_sq_sg;  /* 2 */
+   u32 max_sq_inline;  /* 32 */
+   u32 max_rq_sg;  /* 2 */
+   int num_qps;/* 256k */
+   u32 max_wqes;   /* 16k */
+   u32 max_sq_desc_sz; /* 64 */
+   u32 max_rq_desc_sz; /* 64 */
+   int max_qp_init_rdma;
+   int max_qp_dest_rdma;
+   int sqp_start;
+   int num_cqs;
+   int max_cqes;
+   int reserved_cqs;
+   int num_aeq_vectors;/* 1 */
+   int num_comp_vectors;   /* 32 ceq */
+   int num_other_vectors;
+   int num_mtpts;
+   u32 num_mtt_segs;
+   int reserved_mtts;
+   int reserved_mrws;
+   int reserved_uars;
+   int num_pds;
+   int reserved_pds;
+   u32 mtt_entry_sz;
+   u32 cq_entry_sz;
+   u32 page_size_cap;
+   u32

[RESEND PATCH v9 03/22] IB/hns: Add initial main frame driver and get cfg info

2016-06-07 Thread Lijun Ou

This patch mainly adds the initial bare main driver. It obtains
the related configuration information from the network node.

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  72 ++
 drivers/infiniband/hw/hns/hns_roce_main.c   | 197 
 2 files changed, 269 insertions(+)
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_device.h
 create mode 100644 drivers/infiniband/hw/hns/hns_roce_main.c

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
new file mode 100644
index 000..f9de8e4
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -0,0 +1,72 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_DEVICE_H
+#define _HNS_ROCE_DEVICE_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME "hns_roce"
+
+#define HNS_ROCE_MAX_IRQ_NUM   34
+#define HNS_ROCE_MAX_PORTS 6
+
+struct hns_roce_ib_iboe {
+   struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
+   u8  phy_port[HNS_ROCE_MAX_PORTS];
+};
+
+struct hns_roce_caps {
+   u8  num_ports;
+};
+
+struct hns_roce_dev {
+   struct ib_device        ib_dev;
+   struct platform_device  *pdev;
+   struct hns_roce_ib_iboe iboe;
+
+   int irq[HNS_ROCE_MAX_IRQ_NUM];
+   u8 __iomem  *reg_base;
+   struct hns_roce_caps    caps;
+
+   int cmd_mod;
+   int loop_idc;
+};
+
+#endif /* _HNS_ROCE_DEVICE_H */
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c 
b/drivers/infiniband/hw/hns/hns_roce_main.c
new file mode 100644
index 000..21c5e8e
--- /dev/null
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ * Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#incl

[RESEND PATCH v9 15/22] IB/hns: Add PD operations support

2016-06-07 Thread Lijun Ou

This patch adds the verbs for operating on PDs, mainly the functions for
allocating and deallocating a PD.
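
For readers unfamiliar with the pattern, a PD-number allocator of this kind can be sketched in plain userspace C. This is a simplified model under stated assumptions: the names sketch_pd_alloc()/sketch_pd_free(), the 64-entry limit and the round-robin search are hypothetical and are not the driver's hns_roce_bitmap implementation; the sketch only mirrors the patch's mapping of an allocation failure to -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified PD-number allocator (not hns_roce_bitmap). */
#define SKETCH_NUM_PDS 64u

struct sketch_pd_bitmap {
	uint64_t used;      /* bit n set => PD number n is in use */
	unsigned long last; /* next search starts here (round-robin) */
};

/* Hand out a free PD number; report -ENOMEM when all are taken. */
static int sketch_pd_alloc(struct sketch_pd_bitmap *bm, unsigned long *pdn)
{
	for (unsigned long i = 0; i < SKETCH_NUM_PDS; i++) {
		unsigned long n = (bm->last + i) % SKETCH_NUM_PDS;

		if (!(bm->used & ((uint64_t)1 << n))) {
			bm->used |= (uint64_t)1 << n;
			bm->last = (n + 1) % SKETCH_NUM_PDS;
			*pdn = n;
			return 0;
		}
	}
	return -ENOMEM;
}

/* Release a PD number so it can be handed out again later. */
static void sketch_pd_free(struct sketch_pd_bitmap *bm, unsigned long pdn)
{
	bm->used &= ~((uint64_t)1 << pdn);
}
```

Because the search resumes after the last allocated number, a freed number is not reused immediately; whether the real driver behaves the same way is not shown in this patch.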

Signed-off-by: Wei Hu 
Signed-off-by: Nenglong Zhao 
Signed-off-by: Lijun Ou 
---
 drivers/infiniband/hw/hns/hns_roce_device.h | 17 
 drivers/infiniband/hw/hns/hns_roce_main.c   |  8 +++-
 drivers/infiniband/hw/hns/hns_roce_pd.c | 62 +
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 99f2653..36fd4f3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -134,6 +134,11 @@ struct hns_roce_ucontext {
struct hns_roce_uar uar;
 };
 
+struct hns_roce_pd {
+   struct ib_pd    ibpd;
+   unsigned long   pdn;
+};
+
 struct hns_roce_bitmap {
/* Bitmap Traversal last a bit which is 1 */
unsigned long   last;
@@ -399,6 +404,11 @@ static inline struct hns_roce_ucontext
return container_of(ibucontext, struct hns_roce_ucontext, ibucontext);
 }
 
+static inline struct hns_roce_pd *to_hr_pd(struct ib_pd *ibpd)
+{
+   return container_of(ibpd, struct hns_roce_pd, ibpd);
+}
+
 static inline void hns_roce_write64_k(__be32 val[2], void __iomem *dest)
 {
__raw_writeq(*(u64 *) val, dest);
@@ -446,6 +456,13 @@ int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
unsigned long obj, int cnt);
 
+struct ib_pd *hns_roce_alloc_pd(struct ib_device *ib_dev,
+   struct ib_ucontext *context,
+   struct ib_udata *udata);
+int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, unsigned long *pdn);
+void hns_roce_pd_free(struct hns_roce_dev *hr_dev, unsigned long pdn);
+int hns_roce_dealloc_pd(struct ib_pd *pd);
+
 void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn);
 void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type);
 void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type);
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 64cf5c8..2cebbc8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -604,7 +604,9 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->uverbs_cmd_mask =
(1ULL << IB_USER_VERBS_CMD_GET_CONTEXT) |
(1ULL << IB_USER_VERBS_CMD_QUERY_DEVICE) |
-   (1ULL << IB_USER_VERBS_CMD_QUERY_PORT);
+   (1ULL << IB_USER_VERBS_CMD_QUERY_PORT) |
+   (1ULL << IB_USER_VERBS_CMD_ALLOC_PD) |
+   (1ULL << IB_USER_VERBS_CMD_DEALLOC_PD);
 
/* HCA||device||port */
ib_dev->modify_device   = hns_roce_modify_device;
@@ -618,6 +620,10 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->dealloc_ucontext = hns_roce_dealloc_ucontext;
ib_dev->mmap = hns_roce_mmap;
 
+   /* PD */
+   ib_dev->alloc_pd    = hns_roce_alloc_pd;
+   ib_dev->dealloc_pd  = hns_roce_dealloc_pd;
+
ret = ib_register_device(ib_dev, NULL);
if (ret) {
dev_err(dev, "ib_register_device failed!\n");
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c
index 6ad38f2..f7f8fc0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_pd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
@@ -40,6 +40,28 @@
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
 
+int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, unsigned long *pdn)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+   unsigned long pd_number;
+   int ret = 0;
+
+   ret = hns_roce_bitmap_alloc(&hr_dev->pd_bitmap, &pd_number);
+   if (ret == -1) {
+   dev_err(dev, "alloc pdn from pdbitmap failed\n");
+   return -ENOMEM;
+   }
+
+   *pdn = pd_number;
+
+   return 0;
+}
+
+void hns_roce_pd_free(struct hns_roce_dev *hr_dev, unsigned long pdn)
+{
+   hns_roce_bitmap_free(&hr_dev->pd_bitmap, pdn);
+}
+
 int hns_roce_init_pd_table(struct hns_roce_dev *hr_dev)
 {
return hns_roce_bitmap_init(&hr_dev->pd_bitmap, hr_dev->caps.num_pds,
@@ -52,6 +74,46 @@ void hns_roce_cleanup_pd_table(struct hns_roce_dev *hr_dev)
hns_roce_bitmap_cleanup(&hr_dev->pd_bitmap);
 }
 
+struct ib_pd *hns_roce_alloc_pd(struct ib_device *ib_dev,
+   struct ib_ucontext *context,
+   struct ib_udata *udata)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
+   struct device *dev = &hr_dev->pdev->dev;
+   struct hns_roce_pd *pd;
+   int ret;
+
+   pd = kmalloc(sizeof(*pd), GFP_KERNEL);
+

Re: [PATCH v10 2/7] usb: mux: add generic code for dual role port mux

2016-06-07 Thread Peter Chen

On Tue, Jun 07, 2016 at 06:05:25PM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Roger Quadros  writes:
> >> I might be able to find some time to implement a proof of concept which
> >> would allow your platforms to get dual-role with code we already have,
> >> but I need DWC3's OTG support which, I'm assuming, you already have :-)
> >> 
> >> If you wanna try something offline, just ping me ;-) I'll be happy to
> >> help.
> >
> > What you are proposing is a dwc3 only solution. With the otg/dual-role
> > series we are trying to be generic as much as possible.
> 
> Well, if there is a need for that, sure. Take MUSB for instance. It
> makes use of nothing of the sorts, because it doesn't have to.
> 

Indeed, some centralized IP drivers like MUSB, chipidea and dwc3 do not
need this framework for role switching. But there are some common pieces,
like the OTG FSM (full/simplified), role management and a sysfs interface
for role switching; these can live in a framework whose purpose is to make
dual-role switching easier to implement.

Besides, when the host and device drivers live in different folders on a
platform, e.g. host/ and gadget/udc/, a role switch driver is needed if
we want dual-role functionality.

Dual-role functionality is becoming more and more common for USB; a
framework can avoid duplicated work and standardize role switching.

> > Whether controller drivers want to use it or not is up to the driver
> > maintainers but we should at least ensure that user space ABI if any,
> > is consistent across different implementations.
> 
> Role decisions should not be exposed to userspace unless as debug
> feature (using e.g. DebugFS). That should be done either by the HW or
> within the kernel.
> 
> If we're discussing userspace ABI here, there's something very wrong
> with OTG/DRD layer design.

Currently, there are some use cases that need to switch roles on the
fly (and there will be more with Type-C in the future), so a sysfs
interface for role switching is necessary.

-- 

Best Regards,
Peter Chen

Re: [PATCH 9/9] pinctrl: at91-pio4: make it explicitly non-modular

2016-06-07 Thread Ludovic Desroches

On Mon, Jun 06, 2016 at 10:43:08PM -0400, Paul Gortmaker wrote:
> The Kconfig currently controlling compilation of this code is:
> 
> drivers/pinctrl/Kconfig:config PINCTRL_AT91PIO4
> drivers/pinctrl/Kconfig:bool "AT91 PIO4 pinctrl driver"
> 
> ...meaning that it currently is not being built as a module by anyone.
> 
> Let's remove the modular code that is essentially orphaned, so that
> when reading the driver there is no doubt it is builtin-only.
> 
> We explicitly disallow a driver unbind, since that doesn't have a
> sensible use case anyway, and it allows us to drop the ".remove"
> code for non-modular drivers.
> 
> Since module_platform_driver() uses the same init level priority as
> builtin_platform_driver() the init ordering remains unchanged with
> this commit.
> 
> Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
> 
> We also delete the MODULE_LICENSE tag etc. since all that information
> is already contained at the top of the file in the comments.
> 
> Cc: Ludovic Desroches 
> Cc: Linus Walleij 
> Cc: linux-g...@vger.kernel.org
> Signed-off-by: Paul Gortmaker 

Thanks for cleaning this up.
Acked-by: Ludovic Desroches 

> ---
>  drivers/pinctrl/pinctrl-at91-pio4.c | 22 +++---
>  1 file changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/pinctrl/pinctrl-at91-pio4.c b/drivers/pinctrl/pinctrl-at91-pio4.c
> index a025b40d246b..4438dca85c1c 100644
> --- a/drivers/pinctrl/pinctrl-at91-pio4.c
> +++ b/drivers/pinctrl/pinctrl-at91-pio4.c
> @@ -20,7 +20,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -879,7 +879,6 @@ static const struct of_device_id atmel_pctrl_of_match[] = {
>   /* sentinel */
>   }
>  };
> -MODULE_DEVICE_TABLE(of, atmel_pctrl_of_match);
>  
>  static int atmel_pinctrl_probe(struct platform_device *pdev)
>  {
> @@ -1074,28 +1073,13 @@ clk_prepare_enable_error:
>   return ret;
>  }
>  
> -int atmel_pinctrl_remove(struct platform_device *pdev)
> -{
> - struct atmel_pioctrl *atmel_pioctrl = platform_get_drvdata(pdev);
> -
> - irq_domain_remove(atmel_pioctrl->irq_domain);
> - clk_disable_unprepare(atmel_pioctrl->clk);
> - gpiochip_remove(atmel_pioctrl->gpio_chip);
> -
> - return 0;
> -}
> -
>  static struct platform_driver atmel_pinctrl_driver = {
>   .driver = {
>   .name = "pinctrl-at91-pio4",
>   .of_match_table = atmel_pctrl_of_match,
>   .pm = &atmel_pctrl_pm_ops,
> + .suppress_bind_attrs = true,
>   },
>   .probe = atmel_pinctrl_probe,
> - .remove = atmel_pinctrl_remove,
>  };
> -module_platform_driver(atmel_pinctrl_driver);
> -
> -MODULE_AUTHOR(Ludovic Desroches );
> -MODULE_DESCRIPTION("Atmel PIO4 pinctrl driver");
> -MODULE_LICENSE("GPL v2");
> +builtin_platform_driver(atmel_pinctrl_driver);
> -- 
> 2.8.0
>

Re: [PATCH 06/10] mm, oom: kill all tasks sharing the mm

2016-06-07 Thread Michal Hocko

On Tue 07-06-16 15:15:37, David Rientjes wrote:
> On Tue, 7 Jun 2016, Oleg Nesterov wrote:
> 
> > On 06/06, David Rientjes wrote:
> > >
> > > > There is a potential race where we kill the oom disabled task which is
> > > > highly unlikely but possible. It would happen if __set_oom_adj raced
> > > > with select_bad_process and then it is OK to consider the old value or
> > > > with fork when it should be acceptable as well.
> > > > Let's add a little note to the log so that people would tell us that
> > > > this really happens in the real life and it matters.
> > > >
> > >
> > > We cannot kill oom disabled processes at all, little race or otherwise.
> > 
> > But this change doesn't really make it worse?
> > 
> 
> Why is the patch asking users to report oom killing of a process that 
> raced with setting /proc/pid/oom_score_adj to OOM_SCORE_ADJ_MIN?  What is 
> possibly actionable about it?

Well, the primary point is to learn whether such races happen in real
workloads and whether they actually matter. If so, we can harden the locking
or come up with a less racy solution.
-- 
Michal Hocko
SUSE Labs

RE: [PATCH v10 2/7] usb: mux: add generic code for dual role port mux

2016-06-07 Thread Jun Li

Hi, Baolu

From: Lu Baolu [mailto:baolu...@linux.intel.com] 
Sent: Wednesday, June 08, 2016 1:11 PM
To: Jun Li ; Felipe Balbi ; Roger 
Quadros ; Peter Chen 
Cc: Mathias Nyman ; Greg Kroah-Hartman 
; Lee Jones ; Heikki Krogerus 
; Liam Girdwood ; Mark 
Brown ; linux-...@vger.kernel.org; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 2/7] usb: mux: add generic code for dual role port mux

Hi,

[I have to resend my reply. The previous reply failed to be delivered
to the usb mailing list. Sorry for the inconvenience.]
On 06/08/2016 11:04 AM, Jun Li wrote:
> > Whether controller drivers want to use it or not is up to the driver
> > maintainers but we should at least ensure that user space ABI if any,
> > is consistent across different implementations.
> 
> Role decisions should not be exposed to userspace unless as debug feature
> (using e.g. DebugFS). That should be done either by the HW or within the
> kernel.
> In many cases the role decision is made by userspace, and this should also be
> covered.
> This patchset also exposes it to userspace, but I think it isn't for debug:
> /sys/bus/platform/devices/.../portmux.N/state

> Please don't use this interface for host/gadget role switching; the
> document doesn't tell you to do so either. It is only designed to
> switch the port mux device to the right direction. Host/gadget dual-role
> switching involves other elements, like ID pin detection, Type-C
> events, VBUS management and so on.

I'm confused: then what is its purpose, and how should it be used?
Below is everything the document says about it; it seems to tell me
I can do that, but you say no :)

+What:  /sys/bus/platform/devices/.../portmux.N/name
+   /sys/bus/platform/devices/.../portmux.N/state
+Date:  April 2016
+Contact:   Lu Baolu 
+Description:
+   In some platforms, a single USB port is shared between a USB host
+   controller and a device controller. A USB mux driver is needed to
+   handle the port mux. Read-only attribute "name" shows the name of
+   the port mux device. "state" attribute shows and stores the mux
+   state.
+   For read:
+   'unknown'- the mux hasn't been set yet;
+   'peripheral' - mux has been switched to PERIPHERAL controller;
+   'host'   - mux has been switched to HOST controller.
+   For write:
+   'peripheral' - mux will be switched to PERIPHERAL controller;
+   'host'   - mux will be switched to HOST controller.

> Best regards,
> Lu Baolu
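
The read and write values listed in the ABI text above can be illustrated with a small userspace C sketch of how a "state" show/store pair might map strings to mux states. The enum and function names are hypothetical and not taken from the patch; in kernel code, sysfs_streq() would normally take care of the trailing newline that echo leaves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mux states matching the three documented read values. */
enum sketch_mux_state {
	SKETCH_MUX_UNKNOWN,
	SKETCH_MUX_PERIPHERAL,
	SKETCH_MUX_HOST,
};

/* String reported on read, per the ABI description. */
static const char *sketch_mux_state_name(enum sketch_mux_state s)
{
	switch (s) {
	case SKETCH_MUX_PERIPHERAL:
		return "peripheral";
	case SKETCH_MUX_HOST:
		return "host";
	default:
		return "unknown";
	}
}

/*
 * Parse a write: only "peripheral" and "host" are accepted ("unknown" is
 * read-only). A trailing newline, as left by `echo host > state`, is ignored.
 * Returns 0 on success, -1 for any other input.
 */
static int sketch_mux_state_parse(const char *buf, enum sketch_mux_state *out)
{
	size_t len = strcspn(buf, "\n");

	if (len == strlen("peripheral") && strncmp(buf, "peripheral", len) == 0)
		*out = SKETCH_MUX_PERIPHERAL;
	else if (len == strlen("host") && strncmp(buf, "host", len) == 0)
		*out = SKETCH_MUX_HOST;
	else
		return -1;
	return 0;
}
```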

Re: [PATCH net] Driver: Vmxnet3: segCnt can be 1 for LRO packets

2016-06-07 Thread Shrikrishna Khare



On Tue, 7 Jun 2016, David Miller wrote:

> From: Shrikrishna Khare 
> Date: Tue,  7 Jun 2016 22:55:17 -0700
> 
> > The device emulation may send segCnt of 1 for LRO packets.
> > 
> > Signed-off-by: Shrikrishna Khare 
> > Signed-off-by: Jin Heo 
> 
> Please do not capitalize subsystem prefixes in your Subject line,
> and "Driver: " is so generic that it's pointless to use it as a
> part of the subsystem prefix.  Plain "vmxnet3: " is sufficient.
> 

Ok. Sent v2 of the patch with the subject line fixed.

Re: [PATCH] ext4: mballoc.c: fix ac_g_ex and ac_f_ex misuse bug in EXT4_MB_HINT_TRY_GOAL path

2016-06-07 Thread Darrick J. Wong

On Wed, Jun 08, 2016 at 02:08:21PM +0800, Lin Feng wrote:
> Hi Andreas,
> 
> Thanks for your reply and review.
> 
> On 06/08/2016 05:01 AM, Andreas Dilger wrote:
> >On Jun 2, 2016, at 6:01 AM, Lin Feng  wrote:
> >>
> >>Description:
> >>ext4 block allocation core stack:
> >>ext4_mb_new_blocks
> >>  ext4_mb_normalize_request
> >>  ext4_mb_regular_allocator
> >>    ext4_mb_find_by_goal
> >>      mb_find_extent(e4b, ac->ac_g_ex.fe_start, ac->ac_g_ex.fe_len, &ex);
> >>
> >>The start-block search hint for merging (used with the EXT4_MB_HINT_TRY_GOAL
> >>flag) set in ext4_mb_normalize_request is stored in ac_f_ex, while the
> >>EXT4_MB_HINT_TRY_GOAL path, which ends up in ext4_mb_find_by_goal, always
> >>uses ac_g_ex as the hint, so the hint set in ext4_mb_normalize_request is
> >>never used.
> >>
> >>We can hit this bug by writing a sparse file backward: the file may end up
> >>fragmented even though the physical blocks needed to fill the hole are free
> >>and the result is expected to be merged into a single extent.
> >
> >This looks reasonable.  Do you have any kind of test case that shows the
> >effect of the change (e.g. fragmentation counts per file before/after)?
> 
> I found this bug while experimenting with the ext4 block allocation policy.
> 
> To see the effect of this patch, we can do the following:
> 
> On a freshly created ext4 fs, make a new top-level directory called b so
> that the test lands in a relatively empty block group and is not affected
> by existing physical fragmentation. Then write a new file backward.
> 
> steps:
> 1. mkdir b && cd b
> 
> 2. before this patch:
> [root@CentOS6 b]# rm dat -f && sync && sleep 2 && \
>     dd if=/dev/zero of=dat bs=4k count=256 seek=511 conv=notrunc && sync && sleep 2 && \
>     dd if=/dev/zero of=dat bs=4k count=254 seek=257 conv=notrunc && sync && filefrag -vv dat
> 256+0 records in
> 256+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000825721 s, 1.3 GB/s
> 254+0 records in
> 254+0 records out
> 1040384 bytes (1.0 MB) copied, 0.000766912 s, 1.4 GB/s
> Filesystem type is: ef53
> File size of dat is 3141632 (767 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0     257     9512             254
>    1     511   557056     9766    256 eof
> dat: 2 extents found

Sure would be nice to have an xfstests for this...

--D

> 
> 3. after this patch:
> [root@CentOS6 b]# rm dat -f && sync && sleep 2 && \
>     dd if=/dev/zero of=dat bs=4k count=256 seek=511 conv=notrunc && sync && sleep 2 && \
>     dd if=/dev/zero of=dat bs=4k count=254 seek=257 conv=notrunc && sync && filefrag -vv dat
> 256+0 records in
> 256+0 records out
> 1048576 bytes (1.0 MB) copied, 0.000856416 s, 1.2 GB/s
> 254+0 records in
> 254+0 records out
> 1040384 bytes (1.0 MB) copied, 0.000669862 s, 1.6 GB/s
> Filesystem type is: ef53
> File size of dat is 3141632 (767 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0     257   556802             510 eof
> dat: 1 extent found
> 
> thanks,
> linfeng
> >
> >Reviewed-by: Andreas Dilger 
> >
> >>Signed-off-by: Lin Feng 
> >>---
> >>fs/ext4/mballoc.c | 8 
> >>1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >>diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> >>index c1ab3ec..e31fc63 100644
> >>--- a/fs/ext4/mballoc.c
> >>+++ b/fs/ext4/mballoc.c
> >>@@ -3198,15 +3198,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
> >>if (ar->pright && (ar->lright == (start + size))) {
> >>/* merge to the right */
> >>ext4_get_group_no_and_offset(ac->ac_sb, ar->pright - size,
> >>-   &ac->ac_f_ex.fe_group,
> >>-   &ac->ac_f_ex.fe_start);
> >>+   &ac->ac_g_ex.fe_group,
> >>+   &ac->ac_g_ex.fe_start);
> >>ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
> >>}
> >>if (ar->pleft && (ar->lleft + 1 == start)) {
> >>/* merge to the left */
> >>ext4_get_group_no_and_offset(ac->ac_sb, ar->pleft + 1,
> >>-   &ac->ac_f_ex.fe_group,
> >>-   &ac->ac_f_ex.fe_start);
> >>+   &ac->ac_g_ex.fe_group,
> >>+   &ac->ac_g_ex.fe_start);
> >>ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
> >>}
> >>
> >>--
> >>1.9.3
> >>
> >>
> >
> >
> >Cheers, Andreas

[PATCH net v2] vmxnet3: segCnt can be 1 for LRO packets

2016-06-07 Thread Shrikrishna Khare

The device emulation may send segCnt of 1 for LRO packets.

Signed-off-by: Shrikrishna Khare 
Signed-off-by: Jin Heo 

---
v2: fix subject line
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
 drivers/net/vmxnet3/vmxnet3_int.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index db8022a..6f399b2 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1369,7 +1369,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd;
 
segCnt = rcdlro->segCnt;
-   BUG_ON(segCnt <= 1);
+   BUG_ON(segCnt == 0);
mss = rcdlro->mss;
if (unlikely(segCnt <= 1))
segCnt = 0;
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index c482539..3d2b64e 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.7.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.8.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x01040700
+#define VMXNET3_DRIVER_VERSION_NUM  0x01040800
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
2.8.2
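
The effect of relaxing the assertion can be modelled in userspace C: a segCnt of 0 is still treated as a bug, while a segCnt of 1 is now accepted and normalized to 0 by the existing `if (unlikely(segCnt <= 1))` branch. The helper name and the -1 return standing in for BUG_ON() are assumptions made for illustration only.

```c
/*
 * Userspace model of the patched check (hypothetical helper name).
 * Returns -1 where the driver would now hit BUG_ON(segCnt == 0); otherwise
 * returns segCnt after the existing "segCnt <= 1 => 0" normalization.
 */
static int sketch_lro_seg_count(unsigned int seg_cnt)
{
	if (seg_cnt == 0)
		return -1; /* still rejected: BUG_ON(segCnt == 0) */
	if (seg_cnt <= 1)
		return 0;  /* segCnt == 1 no longer trips the assertion */
	return (int)seg_cnt;
}
```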

Re: [PATCH] ext4: mballoc.c: fix ac_g_ex and ac_f_ex misuse bug in EXT4_MB_HINT_TRY_GOAL path

2016-06-07 Thread Lin Feng


Hi Andreas,

Thanks for your reply and review.

On 06/08/2016 05:01 AM, Andreas Dilger wrote:

On Jun 2, 2016, at 6:01 AM, Lin Feng  wrote:


Description:
ext4 block allocation core stack:
ext4_mb_new_blocks
  ext4_mb_normalize_request
  ext4_mb_regular_allocator
    ext4_mb_find_by_goal
      mb_find_extent(e4b, ac->ac_g_ex.fe_start, ac->ac_g_ex.fe_len, &ex);

The start-block search hint for merging (used with the EXT4_MB_HINT_TRY_GOAL
flag) set in ext4_mb_normalize_request is stored in ac_f_ex, while the
EXT4_MB_HINT_TRY_GOAL path, which ends up in ext4_mb_find_by_goal, always
uses ac_g_ex as the hint, so the hint set in ext4_mb_normalize_request is
never used.

We can hit this bug by writing a sparse file backward: the file may end up
fragmented even though the physical blocks needed to fill the hole are free
and the result is expected to be merged into a single extent.
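
The mismatch can be reduced to a toy model in C: the hint is written into one field while the goal search reads another. The structures below are hypothetical miniatures; only the field names ac_g_ex, ac_f_ex, fe_group and fe_start come from the patch, and sketch_find_by_goal() merely stands in for ext4_mb_find_by_goal() reading ac_g_ex.

```c
/* Hypothetical miniature of the allocation context. */
struct sketch_free_extent {
	unsigned long fe_group;
	unsigned long fe_start;
};

struct sketch_alloc_ctx {
	struct sketch_free_extent ac_g_ex; /* goal: read by the goal search */
	struct sketch_free_extent ac_f_ex; /* not consulted as a goal */
};

/* Record the merge hint; fixed == 0 reproduces the pre-patch field choice. */
static void sketch_normalize(struct sketch_alloc_ctx *ac, unsigned long group,
			     unsigned long start, int fixed)
{
	struct sketch_free_extent *hint = fixed ? &ac->ac_g_ex : &ac->ac_f_ex;

	hint->fe_group = group;
	hint->fe_start = start;
}

/* Stand-in for the goal search: it only ever looks at ac_g_ex. */
static unsigned long sketch_find_by_goal(const struct sketch_alloc_ctx *ac)
{
	return ac->ac_g_ex.fe_start;
}
```

With the pre-patch choice, the search still sees whatever goal was in ac_g_ex before, so the merge hint is silently ignored.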


This looks reasonable.  Do you have any kind of test case that shows the
effect of the change (e.g. fragmentation counts per file before/after)?


I found this bug while experimenting with the ext4 block allocation policy.

To see the effect of this patch, we can do the following:

On a freshly created ext4 fs, make a new top-level directory called b so that
the test lands in a relatively empty block group and is not affected by
existing physical fragmentation. Then write a new file backward.


steps:
1. mkdir b && cd b

2. before this patch:
[root@CentOS6 b]# rm dat -f && sync && sleep 2 && \
    dd if=/dev/zero of=dat bs=4k count=256 seek=511 conv=notrunc && sync && sleep 2 && \
    dd if=/dev/zero of=dat bs=4k count=254 seek=257 conv=notrunc && sync && filefrag -vv dat

256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 0.000825721 s, 1.3 GB/s
254+0 records in
254+0 records out
1040384 bytes (1.0 MB) copied, 0.000766912 s, 1.4 GB/s
Filesystem type is: ef53
File size of dat is 3141632 (767 blocks, blocksize 4096)
 ext logical physical expected length flags
   0     257     9512             254
   1     511   557056     9766    256 eof
dat: 2 extents found

3. after this patch:
[root@CentOS6 b]# rm dat -f && sync && sleep 2 && \
    dd if=/dev/zero of=dat bs=4k count=256 seek=511 conv=notrunc && sync && sleep 2 && \
    dd if=/dev/zero of=dat bs=4k count=254 seek=257 conv=notrunc && sync && filefrag -vv dat

256+0 records in
256+0 records out
1048576 bytes (1.0 MB) copied, 0.000856416 s, 1.2 GB/s
254+0 records in
254+0 records out
1040384 bytes (1.0 MB) copied, 0.000669862 s, 1.6 GB/s
Filesystem type is: ef53
File size of dat is 3141632 (767 blocks, blocksize 4096)
 ext logical physical expected length flags
   0     257   556802             510 eof
dat: 1 extent found

thanks,
linfeng
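
The filefrag numbers are consistent with the "merge to the right" branch of the patch, where the goal is ar->pright - size. Assuming the normalized request size equals the 254-block write (an assumption; normalization can change the size), the arithmetic can be checked in C. The helper names are hypothetical.

```c
/*
 * Worked check of the filefrag numbers (values copied from the output above).
 * sketch_right_merge_goal() mirrors the patch's "merge to the right" formula,
 * goal = pright - size.
 */
static unsigned long sketch_right_merge_goal(unsigned long pright,
					     unsigned long size)
{
	return pright - size;
}

/* filefrag's "expected" column: where the next extent would start if contiguous. */
static unsigned long sketch_expected_next(unsigned long physical,
					  unsigned long length)
{
	return physical + length;
}
```

Under that assumption, the goal equals the physical block 556802 reported after the patch, and the merged extent length of 510 is 254 + 256.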


Reviewed-by: Andreas Dilger 


Signed-off-by: Lin Feng 
---
fs/ext4/mballoc.c | 8 
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c1ab3ec..e31fc63 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3198,15 +3198,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
if (ar->pright && (ar->lright == (start + size))) {
/* merge to the right */
ext4_get_group_no_and_offset(ac->ac_sb, ar->pright - size,
-   &ac->ac_f_ex.fe_group,
-   &ac->ac_f_ex.fe_start);
+   &ac->ac_g_ex.fe_group,
+   &ac->ac_g_ex.fe_start);
ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
}
if (ar->pleft && (ar->lleft + 1 == start)) {
/* merge to the left */
ext4_get_group_no_and_offset(ac->ac_sb, ar->pleft + 1,
-   &ac->ac_f_ex.fe_group,
-   &ac->ac_f_ex.fe_start);
+   &ac->ac_g_ex.fe_group,
+   &ac->ac_g_ex.fe_start);
ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
}

--
1.9.3





Cheers, Andreas

Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU

2016-06-07 Thread Abdul Haleem


Hi Gautham,

Thanks a lot for the fix.

With your patches applied, 4.7.0-rc2 builds fine on ppc64le bare metal.
Boot was successful with no call traces.

Thanks for all your support!

Regards,
Abdul

On Tuesday 07 June 2016 08:44 PM, Gautham R. Shenoy wrote:


With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.

This assumption is currently broken in the workqueue CPU_ONLINE
notification handler, where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first CPU in the unbound
worker's pool->attr->cpumask comes online. Since
set_cpus_allowed_ptr() is called with a pool->attr->cpumask in which
only one CPU is online, and that CPU is not yet active, we get the
following WARN_ON during a CPU online operation:

[ cut here ]
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c00f273ff920] [c010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
[c00f273ffac0] [c00ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c00f273ffb70] [c00f5c58] notifier_call_chain+0x98/0x100
[c00f273ffbc0] [c00c5ed0] __cpu_notify+0x70/0xe0
[c00f273ffc00] [c00c6028] notify_online+0x38/0x50
[c00f273ffc30] [c00c5214] cpuhp_invoke_callback+0x84/0x250
[c00f273ffc90] [c00c562c] cpuhp_up_callbacks+0x5c/0x120
[c00f273ffce0] [c00c64d4] cpuhp_thread_fun+0x184/0x1c0
[c00f273ffd20] [c00fa050] smpboot_thread_fn+0x290/0x2a0
[c00f273ffd80] [c00f45b0] kthread+0x110/0x130
[c00f273ffe30] [c0009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---

This patch sets the affinity of the worker to
a) the only online CPU in the cpumask of the worker pool when it comes
online.
b) the cpumask of the worker pool when the second CPU in the pool's
cpumask comes online.
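
The rule in (a) and (b) can be sketched in userspace C, with 64-bit masks standing in for struct cpumask. This is a hypothetical model of the described policy, not the patch's code; the function name is invented, and the case where no CPU of the pool is online is not expected here because the callback runs when a CPU comes online.

```c
#include <stdint.h>

/*
 * Affinity for an unbound worker (hypothetical helper).
 * pool_mask:   CPUs in the worker pool's cpumask
 * online_mask: CPUs currently online
 */
static uint64_t sketch_unbound_worker_affinity(uint64_t pool_mask,
					       uint64_t online_mask)
{
	uint64_t online_in_pool = pool_mask & online_mask;

	/* (a) exactly one pool CPU is online: pin to that CPU only */
	if (online_in_pool != 0 && (online_in_pool & (online_in_pool - 1)) == 0)
		return online_in_pool;

	/* (b) two or more are online: use the pool's full cpumask */
	return pool_mask;
}
```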

Reported-by: Abdul Haleem 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tejun Heo 
Cc: Michael Ellerman 
Signed-off-by: Gautham R. Shenoy 
---

RE: [PATCH 3/4] dell-wmi: Add information about other WMI event codes

2016-06-07 Thread Mario_Limonciello

> -Original Message-
> From: Pali Rohár [mailto:pali.ro...@gmail.com]
> Sent: Tuesday, June 7, 2016 6:00 PM
> To: Gabriele Mazzotta ; Limonciello, Mario
> 
> Cc: Matthew Garrett ; Darren Hart
> ; Michał Kępień ; Andy Lutomirski
> ; Alex Hung ; platform-driver-
> x...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 3/4] dell-wmi: Add information about other WMI event
> codes
> 
> On Friday 27 May 2016 00:04:23 Gabriele Mazzotta wrote:
> > On 22/05/2016 13:36, Pali Rohár wrote:
> > > ACPI DSDT tables have defined other WMI codes, but does not contain
> > > any description when those codes are emitted. Some other codes can
> > > be found in logs on internet. In this patch are all which I saw, but
> > > lot of them are not tested properly (e.g. for duplicate events with
> > > AT keyboard). Now we have all WMI event codes at one place and in
> > > future after proper testing those codes can be correctly enabled or disabled...
> > >
> > > Signed-off-by: Pali Rohár 
> > > ---
> > >  drivers/platform/x86/dell-wmi.c |   32
> > >  1 file changed, 32 insertions(+)
> > >
> > > diff --git a/drivers/platform/x86/dell-wmi.c
> > > b/drivers/platform/x86/dell-wmi.c index 363d927..7aac1dc 100644
> > > --- a/drivers/platform/x86/dell-wmi.c
> > > +++ b/drivers/platform/x86/dell-wmi.c
> > > @@ -110,6 +110,9 @@ static const struct key_entry dell_wmi_legacy_keymap[] __initconst = {
> > >   /* BIOS error detected */
> > >   { KE_IGNORE, 0xe00d, { KEY_RESERVED } },
> > >
> > > + /* Unknown, defined in ACPI DSDT */
> > > + /* { KE_IGNORE, 0xe00e, { KEY_RESERVED } }, */
> > > +
> >
> > I'm interested in knowing what's the meaning of this 0xe00e. This
> > event is sent multiple times when I suspend/resume my laptop and it's
> > definitely not a keypress.
> 
> From DSDT dumps which I have seen, I guess it could be something with battery
> charging... but that is only my guess.
> 
> Mario, do you have any idea, what these unknown events are?

Offhand, I'm not sure; it would require some more digging.

Can you please remind me on which model and BIOS combinations you have
found 0xe00e in the DSDT, and in what context the events actually happen?
Is it anything released in the past two years?

Re: [PATCH] Staging: unisys: visorhba: visorhba_main: fixed a coding style issue

2016-06-07 Thread Greg KH

On Sat, May 21, 2016 at 12:30:46AM +0530, Rumesh Hapuarachchi wrote:
> fixed checkpatch.pl warning about 'Prefer 'unsigned int' to bare use of 'unsigned'
> 
> Signed-off-by: Rumesh Hapuarahcchi 
> ---
>  drivers/staging/unisys/visorhba/visorhba_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c b/drivers/staging/unisys/visorhba/visorhba_main.c
> index 6a4570d..3b69b33 100644
> --- a/drivers/staging/unisys/visorhba/visorhba_main.c
> +++ b/drivers/staging/unisys/visorhba/visorhba_main.c
> @@ -1122,9 +1122,9 @@ static int visorhba_probe(struct visor_device *dev)
>   if (err < 0)
>   goto err_scsi_host_put;
>  
> - scsihost->max_id = (unsigned)max.max_id;
> - scsihost->max_lun = (unsigned)max.max_lun;
> - scsihost->cmd_per_lun = (unsigned)max.cmd_per_lun;
> + scsihost->max_id = (unsigned int)max.max_id;
> + scsihost->max_lun = (unsigned int)max.max_lun;
> + scsihost->cmd_per_lun = (unsigned int)max.cmd_per_lun;
>   scsihost->max_sectors =
>   (unsigned short)(max.max_io_size >> 9);
>   scsihost->sg_tablesize =

Someone else sent this patch for this issue before you did, sorry :(

Re: [PATCH net] Driver: Vmxnet3: segCnt can be 1 for LRO packets

2016-06-07 Thread David Miller

From: Shrikrishna Khare 
Date: Tue,  7 Jun 2016 22:55:17 -0700

> The device emulation may send segCnt of 1 for LRO packets.
> 
> Signed-off-by: Shrikrishna Khare 
> Signed-off-by: Jin Heo 

Please do not capitalize subsystem prefixes in your Subject line,
and "Driver: " is so generic that it's pointless to use it as a
part of the subsystem prefix.  Plain "vmxnet3: " is sufficient.

RE: [PATCH] sd: remove redundant check for BLK_DEF_MAX_SECTORS

2016-06-07 Thread Long Li

Hi Martin,

Thanks for looking into this. The problem I'm trying to solve is that I want
the lower-layer driver to be able to set max_sectors larger than
BLK_DEF_MAX_SECTORS. In Hyper-V we use a 2MB maximum transfer I/O size, and in
a future version the maximum transfer I/O size will increase to 8MB.

The implementation in sd.c caps max_sectors at BLK_DEF_MAX_SECTORS. Because
sd_revalidate_disk is called late in the SCSI disk initialization process,
there is no way for a lower-layer driver to set this value to its larger
optimal size.

The reason I think it may not be necessary for sd.c to set max_sectors is
that this value may already have been set twice before the code in sd.c is
reached:
1. When the disk device is first scanned or re-scanned (in scsi_scan.c), which
eventually calls __scsi_init_queue() and uses the max_sectors in the
scsi_host_template.
2. In the slave_configure callback of scsi_host_template: a lower-layer driver
that implements it can change this value there.

Long

> -Original Message-
> From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
> Sent: Monday, June 6, 2016 8:42 PM
> To: Long Li 
> Cc: Tom Yan ; James E.J. Bottomley
> ; Martin K. Petersen
> ; linux-s...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] sd: remove redundant check for
> BLK_DEF_MAX_SECTORS
> 
> > "Long" == Long Li  writes:
> 
> Long,
> 
> Long> The reason is that, max_sectors already has value at this point,
> Long> the default value is SCSI_DEFAULT_MAX_SECTORS
> Long> (include/scsi/scsi_host.h). The lower layer host driver can change
> Long> this value in its template.
> 
> The LLD sets max_hw_sectors which indicates the capabilities of the
> controller DMA hardware. Whereas the max_sectors limit is set by sd to
> either follow advice from the device or--if not provided--use the block layer
> default. max_sectors governs the size of READ/WRITE requests and does not
> reflect the capabilities of the DMA hardware.
> 
> Long> I think the drivers care about this value have already set it. So
> Long> it's better not to change it again. If they want max_sectors to be
> Long> set by sd, they can use BLOCK LIMITS VPD to tell it to do so.
> 
> Most drivers don't have the luxury of being able to generate VPDs for their
> attached target devices :)
> 
> --
> Martin K. Petersen      Oracle Linux Engineering

[PATCH net] Driver: Vmxnet3: segCnt can be 1 for LRO packets

2016-06-07 Thread Shrikrishna Khare

The device emulation may send segCnt of 1 for LRO packets.

Signed-off-by: Shrikrishna Khare 
Signed-off-by: Jin Heo 
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
 drivers/net/vmxnet3/vmxnet3_int.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index db8022a..6f399b2 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1369,7 +1369,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd;
 
segCnt = rcdlro->segCnt;
-   BUG_ON(segCnt <= 1);
+   BUG_ON(segCnt == 0);
mss = rcdlro->mss;
if (unlikely(segCnt <= 1))
segCnt = 0;
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index c482539..3d2b64e 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.7.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.8.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x01040700
+#define VMXNET3_DRIVER_VERSION_NUM  0x01040800
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
-- 
2.8.2
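The behaviour change is small enough to model directly. In this hypothetical userspace sketch, assert() stands in for BUG_ON(): a segCnt of 0 is still treated as a device error, while a segCnt of 1 is now accepted and normalised to 0 by the existing unlikely(segCnt <= 1) branch. How the driver uses the normalised value afterwards is outside the quoted hunk.

```c
#include <assert.h>

/*
 * Hypothetical model of the LRO segment-count handling after this patch.
 * assert() plays the role of BUG_ON(segCnt == 0); a single-segment "LRO"
 * completion is no longer fatal and is normalised to 0.
 */
static unsigned int model_lro_seg_cnt(unsigned int seg_cnt)
{
    assert(seg_cnt != 0);       /* mirrors BUG_ON(segCnt == 0) */
    if (seg_cnt <= 1)           /* mirrors if (unlikely(segCnt <= 1)) */
        return 0;
    return seg_cnt;
}
```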

Re: [PATCH v13 10/10] kprobes: Add arm64 case in kprobe example module

2016-06-07 Thread Huang Shijie

On Thu, Jun 02, 2016 at 11:26:24PM -0400, David Long wrote:
> From: Sandeepa Prabhu 
> 
> Add info prints in sample kprobe handlers for ARM64
> 
> Signed-off-by: Sandeepa Prabhu 
> ---
>  samples/kprobes/kprobe_example.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
> index ed0ca0c..aad8e6f 100644
> --- a/samples/kprobes/kprobe_example.c
> +++ b/samples/kprobes/kprobe_example.c
> @@ -46,6 +46,10 @@ static int handler_pre(struct kprobe *p, struct pt_regs *regs)
>   " ex1 = 0x%lx\n",
>   p->symbol_name, p->addr, regs->pc, regs->ex1);
>  #endif
> +#ifdef CONFIG_ARM64
> + pr_info("pre_handler: p->addr = 0x%p, pc = 0x%lx\n",
> + p->addr, (long)regs->pc);
Please add "p->symbol_name" to the log, just like the line above.

> +#endif
>  
>   /* A dump_stack() here will give a stack backtrace */
>   return 0;
> @@ -71,6 +75,10 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
>   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
>   p->symbol_name, p->addr, regs->ex1);
>  #endif
> +#ifdef CONFIG_ARM64
> + pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx\n",
> + p->addr, (long)regs->pc);
> +#endif
>  }
Ditto.

thanks
Huang Shijie
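Concretely, the request amounts to prefixing the arm64 messages with "<symbol_name>" as the other architectures do. Since pr_info() only exists in the kernel, this sketch reproduces the suggested format with snprintf() and hypothetical mock_* stand-ins for struct kprobe and struct pt_regs. It drops the literal "0x" before %p because userspace printf on glibc already prints that prefix for %p.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct mock_kprobe {
    const char *symbol_name;
    void *addr;
};

struct mock_pt_regs {
    unsigned long pc;
};

/* Formats the arm64 pre_handler message with the suggested "<%s>" prefix. */
static int format_pre_handler(char *buf, size_t len,
                              const struct mock_kprobe *p,
                              const struct mock_pt_regs *regs)
{
    int n = snprintf(buf, len, "<%s> pre_handler: p->addr = %p, pc = 0x%lx\n",
                     p->symbol_name, p->addr, regs->pc);

    assert(n >= 0);   /* snprintf only fails on encoding errors */
    return n;
}
```

The post_handler message would get the same "<%s>" prefix.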

Re: [PATCH 0/2] Proper ro_after_init implementation on s390

2016-06-07 Thread Heiko Carstens

On Tue, Jun 07, 2016 at 11:11:17AM -0700, Kees Cook wrote:
> On Tue, Jun 7, 2016 at 11:07 AM, Heiko Carstens
>  wrote:
> > On Tue, Jun 07, 2016 at 08:49:14AM -0700, Kees Cook wrote:
> >> > Heiko Carstens (2):
> >> >   vmlinux.lds.h: allow arch specific handling of ro_after_init data 
> >> > section
> >> >   s390/mm: add proper __ro_after_init support
> >> >
> >> >  arch/s390/include/asm/cache.h |  3 ---
> >> >  arch/s390/include/asm/sections.h  |  1 +
> >> >  arch/s390/kernel/vmlinux.lds.S| 12 +++-
> >> >  arch/s390/mm/init.c   |  7 ---
> >> >  arch/s390/mm/vmem.c   |  7 +++
> >> >  include/asm-generic/vmlinux.lds.h | 10 +-
> >> >  6 files changed, 28 insertions(+), 12 deletions(-)
> >>
> >> Awesome! This looks great to me! Have you had a chance to look through
> >> any of the arch/s390/ __init code for variables that should be marked
> >> __ro_after_init?
> >
> > Not yet, and actually I'm a bit reluctant to do that, since any wrong
> > annotation will lead to kernel crashes sooner or later ;)
> > However I'll look into this as well.
> 
> Yup, though the good news is it's usually discovered very quickly. :)

Eventually it might make sense to add something like
DEBUG_SECTION_MISMATCH, which would only report on _write_ accesses from
non-init sections.

Not sure if this can be done easily and without the need of a new compiler
feature. The new problem class I'm afraid of is more or less the same that
we had when non-init code referenced (already freed) initdata objects.
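As a rough userspace analogy for __ro_after_init (not how the kernel implements it), the sketch below writes a value during an "init" phase, then uses mprotect() to make the page read-only, so a later write faults instead of silently succeeding. That is the failure mode described above for a wrong annotation: the write is caught, but as a runtime crash rather than a build-time report. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* "Init phase": write the value, then drop write permission on the page. */
static int *setup_ro_after_init(int value)
{
    long page = sysconf(_SC_PAGESIZE);
    int *p;

    assert(page > 0);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *p = value;                                  /* allowed during init */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}

/* Returns 1 if a late write to *p faults, 0 if it silently succeeds. */
static int late_write_faults(volatile int *p)
{
    struct sigaction sa, old;
    volatile int faulted = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(fault_env, 1) == 0)
        *p = 0;              /* the write a wrong annotation would allow */
    else
        faulted = 1;

    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}
```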

Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

2016-06-07 Thread Horng-Shyang Liao

Hi Matthias,

On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
> 
> On 03/06/16 15:11, Matthias Brugger wrote:
> >
> >
> [...]
> 
> >> +
> >> +smp_mb(); /* modify jump before enable thread */
> >> +}
> >> +
> >> +cmdq_thread_writel(thread, task->pa_base +
> >> task->command_size,
> >> +   CMDQ_THR_END_ADDR);
> >> +cmdq_thread_resume(thread);
> >> +}
> >> +list_move_tail(&task->list_entry, &thread->task_busy_list);
> >> +spin_unlock_irqrestore(&cmdq->exec_lock, flags);
> >> +}
> >> +
> >> +static void cmdq_handle_error_done(struct cmdq *cmdq,
> >> +   struct cmdq_thread *thread, u32 irq_flag)
> >> +{
> >> +struct cmdq_task *task, *tmp, *curr_task = NULL;
> >> +u32 curr_pa;
> >> +struct cmdq_cb_data cmdq_cb_data;
> >> +bool err;
> >> +
> >> +if (irq_flag & CMDQ_THR_IRQ_ERROR)
> >> +err = true;
> >> +else if (irq_flag & CMDQ_THR_IRQ_DONE)
> >> +err = false;
> >> +else
> >> +return;
> >> +
> >> +curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> >> +
> >> +list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> >> + list_entry) {
> >> +if (curr_pa >= task->pa_base &&
> >> +curr_pa < (task->pa_base + task->command_size))
> >
> > What are you checking here? It seems as if you make some implicit
> > assumptions about pa_base and the order of execution of commands in the
> > thread. Is it safe to do so? Does dma_alloc_coherent give any guarantees
> > about dma_handle?
> 
>  1. Check what is the current running task in this GCE thread.
>  2. Yes.
>  3. Yes, CMDQ doesn't use an IOMMU, so physical addresses are contiguous.
> 
> >>>
> >>> Yes, physical addresses might be contiguous, but AFAIK there is no
> >>> guarantee that the dma_handle address is steadily growing when calling
> >>> dma_alloc_coherent. And if I understand the code correctly, you use this
> >>> assumption to decide if the task picked from task_busy_list is currently
> >>> executing. So I think this mechanism does not work.
> >>
> >> I don't use dma_handle address, and just use physical addresses.
> >> From CPU's point of view, tasks are linked by the busy list.
> >> From GCE's point of view, tasks are linked by the JUMP command.
> >>
> >>> In which cases does the HW thread raise an interrupt?
> >>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
> >>
> >> GCE will raise an interrupt if any task is done or hits an error.
> >> However, GCE is fast, so the CPU may see multiple done tasks
> >> while it is running the ISR.
> >>
> >> In case of error, that GCE thread will pause and raise an interrupt.
> >> So, the CPU may get multiple done tasks and one error task.
> >>
> >
> > I think we should reimplement the ISR mechanism. Can't we just read
> > CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
> > cmdq_handle_error_done to the thread_fn? You will need to pass
> > information from the handler to thread_fn, but that shouldn't be an
> > issue. AFAIK interrupts are disabled in the handler, so we should stay
> > there as short as possible. Traversing task_busy_list is expensive, so
> > we need to do it in a thread context.
> 
>  Actually, our initial implementation was similar to your suggestion,
>  but display needs CMDQ to invoke the callback very precisely,
>  otherwise display will drop frames.
>  For display, the CMDQ interrupt is raised every 16 ~ 17 ms,
>  and CMDQ needs to call the callback function in the ISR.
>  If we defer the callback to a workqueue, the interval may sometimes be
>  larger than 32 ms.
> 
> >>>
> >>> I think the problem is that you implemented the workqueue as an ordered
> >>> workqueue, so there is no parallel processing. I'm still not sure why
> >>> you need the workqueue to be ordered. Can you please explain?
> >>
> >> The order should be kept.
> >> Let me use mouse cursor as an example.
> >> If task 1 means move mouse cursor to point A, task 2 means point B,
> >> and task 3 means point C, our expected result is A -> B -> C.
> >> If the order is not kept, the result could become A -> C -> B.
> >>
> >
> > Got it, thanks for the clarification.
> >
> 
> I think a way to get rid of the workqueue is to use a timer, which gets
> programmed to the time at which the timeout of the first task in the busy
> list would happen. Every time we update the busy list (e.g. because a task
> got finished by the thread), we up

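The range test in cmdq_handle_error_done() can be isolated as below. This is a simplified, hypothetical model (struct model_task and model_find_running_task() are not driver names): it finds the task whose command buffer contains the GCE's current address with a half-open interval check. The test only needs each command buffer to be physically contiguous, not buffers allocated at increasing addresses; whether tasks ahead of the match in task_busy_list can be treated as finished depends on list order matching GCE execution order, which is the point under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a queued GCE task. */
struct model_task {
    uint32_t pa_base;       /* physical start of the command buffer */
    uint32_t command_size;  /* size of the command buffer in bytes */
};

/*
 * Return the index of the task with pa_base <= curr_pa < pa_base + size,
 * or -1 if none matches. Written as curr_pa - pa_base < size to avoid
 * overflow when pa_base + size wraps.
 */
static int model_find_running_task(const struct model_task *tasks, size_t n,
                                   uint32_t curr_pa)
{
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (curr_pa >= tasks[i].pa_base &&
            curr_pa - tasks[i].pa_base < tasks[i].command_size)
            return (int)i;
    }
    return -1;
}
```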
Re: [LKP] [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

2016-06-07 Thread Ye Xiaolong

On Tue, Jun 07, 2016 at 05:56:27PM -0400, Johannes Weiner wrote:
>On Tue, Jun 07, 2016 at 12:48:17PM +0800, Ye Xiaolong wrote:
>> FYI, below is the comparison info between 3ed3a4f, 795ae7a, v4.7-rc2 and the
>> revert commit (eaa7f0d).
>
>Thanks for running this.
>
>Alas, I still can not make heads or tails of this, or reproduce it
>locally for that matter.
>
>With this test run, there seems to be a significant increase in system time:
>
>>  92.03 ±  0%  +5.6%  97.23 ± 11% +30.5% 120.08 ±  1% +30.0% 119.61 ±  0%  pixz.time.system_time
>
>Would it be possible to profile the testruns using perf? Maybe we can
>find out where the kernel is spending the extra time.
>
>But just to make sure I'm looking at the right code, can you first try
>the following patch on top of Linus's current tree and see if that
>gets performance back to normal? It's a partial revert of the
>watermarks that singles out the fair zone allocator:

It seems this patch doesn't help get the performance back.
I've attached the comparison results among 3ed3a4f, 795ae7a, v4.7-rc2 and
1fe49ba5 ("mm: revert fairness batching to before the watermarks were")
with perf profile information. You can find it by searching for 'perf-profile'.

Thanks,
Xiaolong

>
>From 2015eaad688486d65fcf86185e213fff8506b3fe Mon Sep 17 00:00:00 2001
>From: Johannes Weiner 
>Date: Tue, 7 Jun 2016 17:45:03 -0400
>Subject: [PATCH] mm: revert fairness batching to before the watermarks were
> boosted
>
>Signed-off-by: Johannes Weiner 
>---
> include/linux/mmzone.h | 2 ++
> mm/page_alloc.c| 6 --
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>index 02069c2..4565b92 100644
>--- a/include/linux/mmzone.h
>+++ b/include/linux/mmzone.h
>@@ -327,6 +327,8 @@ struct zone {
>   /* zone watermarks, access with *_wmark_pages(zone) macros */
>   unsigned long watermark[NR_WMARK];
> 
>+  unsigned long fairbatch;
>+
>   unsigned long nr_reserved_highatomic;
> 
>   /*
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 6903b69..33387ab 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -2889,7 +2889,7 @@ static void reset_alloc_batches(struct zone *preferred_zone)
> 
>   do {
>   mod_zone_page_state(zone, NR_ALLOC_BATCH,
>-  high_wmark_pages(zone) - low_wmark_pages(zone) -
>+  zone->fairbatch -
>   atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
>   clear_bit(ZONE_FAIR_DEPLETED, &zone->flags);
>   } while (zone++ != preferred_zone);
>@@ -6842,6 +6842,8 @@ static void __setup_per_zone_wmarks(void)
>   zone->watermark[WMARK_MIN] = tmp;
>   }
> 
>+  zone->fairbatch = tmp >> 2;
>+
>   /*
>* Set the kswapd watermarks distance according to the
>* scale factor in proportion to available memory, but
>@@ -6855,7 +6857,7 @@ static void __setup_per_zone_wmarks(void)
>   zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
> 
>   __mod_zone_page_state(zone, NR_ALLOC_BATCH,
>-  high_wmark_pages(zone) - low_wmark_pages(zone) -
>+  zone->fairbatch -
>   atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));
> 
>   spin_unlock_irqrestore(&zone->lock, flags);
>-- 
>2.8.2
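The arithmetic behind this test patch can be sketched as follows. The quoted hunks show that the batch used to be high_wmark - low_wmark, that the patch pins it to tmp >> 2, and that the high watermark is now min + tmp * 2. The rest is an assumption based on mm/page_alloc.c of that period: before 795ae7a the watermarks were low = min + tmp/4 and high = min + tmp/2, and afterwards the gap is max(tmp/4, managed_pages * watermark_scale_factor / 10000). The example numbers are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed old scheme: low = min + tmp/4, high = min + tmp/2. */
static uint64_t old_fair_batch(uint64_t tmp)
{
    return (tmp >> 1) - (tmp >> 2);
}

/* Assumed post-795ae7a scheme: low = min + gap, high = min + 2 * gap,
 * so high - low = gap. */
static uint64_t new_fair_batch(uint64_t tmp, uint64_t managed_pages,
                               uint64_t watermark_scale_factor)
{
    uint64_t scaled, floor_gap;

    assert(watermark_scale_factor > 0);
    scaled = managed_pages * watermark_scale_factor / 10000;
    floor_gap = tmp >> 2;
    return floor_gap > scaled ? floor_gap : scaled;
}

/* The test patch: batch pinned to tmp >> 2 regardless of scaling. */
static uint64_t patched_fair_batch(uint64_t tmp)
{
    return tmp >> 2;
}
```

For a hypothetical zone with 4,000,000 managed pages, tmp = 11264 and the default scale factor of 10, the model gives 2816 pages before 795ae7a, 4000 after it, and 2816 again with the patch. When tmp is not divisible by 4 the old and patched values can differ by one page due to rounding.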
=
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/ivb43/pixz

commit: 
  3ed3a4f0ddffece942bb2661924d87be4ce63cb7
  795ae7a0de6b834a0cc202aa55c190ef81496665
  v4.7-rc2
  1fe49ba5002a50aefd5b6c4913e61eff86ac7253

3ed3a4f0ddffece9  795ae7a0de6b834a0cc202aa55  v4.7-rc2                    1fe49ba5002a50aefd5b6c4913
----------------  --------------------------  --------------------------  --------------------------
       fail:runs   %reproduction  fail:runs   %reproduction  fail:runs   %reproduction  fail:runs
           |             |            |             |            |             |            |
          :4            0%           :7            0%           :4           50%          2:4   kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
          :4           50%          2:7            0%           :4            0%           :4   kmsg.Spurious_LAPIC_timer_interrupt_on_cpu
          :4            0%           :7           14%          1:4           25%          1:4   kmsg.igb#:#:#:exceed_max#second
         %stddev       %change     %stddev       %change     %stddev       %change     %stddev
             \            |           \             |           \             |           \
  78505362 ±  0%        -9.2%   71298182 ±  0%    -11.8%   69280014 ±  0%     -9.1%   71350485 ±  0%  pixz.throughput
   55

[PATCH 5/5] Staging: comedi: dmm32at: Prefer using the BIT macro

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya

This fixes all occurrences of (1<
---
 drivers/staging/comedi/drivers/dmm32at.c | 86 
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dmm32at.c b/drivers/staging/comedi/drivers/dmm32at.c
index 958c0d4..4ca6104 100644
--- a/drivers/staging/comedi/drivers/dmm32at.c
+++ b/drivers/staging/comedi/drivers/dmm32at.c
@@ -46,73 +46,73 @@
 #define DMM32AT_AI_START_CONV_REG  0x00
 #define DMM32AT_AI_LSB_REG 0x00
 #define DMM32AT_AUX_DOUT_REG   0x01
-#define DMM32AT_AUX_DOUT2  (1 << 2)  /* J3.42 - OUT2 (OUT2EN) */
-#define DMM32AT_AUX_DOUT1  (1 << 1)  /* J3.43 */
-#define DMM32AT_AUX_DOUT0  (1 << 0)  /* J3.44 - OUT0 (OUT0EN) */
+#define DMM32AT_AUX_DOUT2  BIT(2)  /* J3.42 - OUT2 (OUT2EN) */
+#define DMM32AT_AUX_DOUT1  BIT(1)  /* J3.43 */
+#define DMM32AT_AUX_DOUT0  BIT(0)  /* J3.44 - OUT0 (OUT0EN) */
 #define DMM32AT_AI_MSB_REG 0x01
 #define DMM32AT_AI_LO_CHAN_REG 0x02
 #define DMM32AT_AI_HI_CHAN_REG 0x03
 #define DMM32AT_AUX_DI_REG 0x04
-#define DMM32AT_AUX_DI_DACBUSY (1 << 7)
-#define DMM32AT_AUX_DI_CALBUSY (1 << 6)
-#define DMM32AT_AUX_DI3 (1 << 3)  /* J3.45 - ADCLK (CLKSEL) */
-#define DMM32AT_AUX_DI2 (1 << 2)  /* J3.46 - GATE12 (GT12EN) */
-#define DMM32AT_AUX_DI1 (1 << 1)  /* J3.47 - GATE0 (GT0EN) */
-#define DMM32AT_AUX_DI0 (1 << 0)  /* J3.48 - CLK0 (SRC0) */
+#define DMM32AT_AUX_DI_DACBUSY BIT(7)
+#define DMM32AT_AUX_DI_CALBUSY BIT(6)
+#define DMM32AT_AUX_DI3 BIT(3)  /* J3.45 - ADCLK (CLKSEL) */
+#define DMM32AT_AUX_DI2 BIT(2)  /* J3.46 - GATE12 (GT12EN) */
+#define DMM32AT_AUX_DI1 BIT(1)  /* J3.47 - GATE0 (GT0EN) */
+#define DMM32AT_AUX_DI0 BIT(0)  /* J3.48 - CLK0 (SRC0) */
 #define DMM32AT_AO_LSB_REG 0x04
 #define DMM32AT_AO_MSB_REG 0x05
 #define DMM32AT_AO_MSB_DACH(x) ((x) << 6)
 #define DMM32AT_FIFO_DEPTH_REG 0x06
 #define DMM32AT_FIFO_CTRL_REG  0x07
-#define DMM32AT_FIFO_CTRL_FIFOEN   (1 << 3)
-#define DMM32AT_FIFO_CTRL_SCANEN   (1 << 2)
-#define DMM32AT_FIFO_CTRL_FIFORST  (1 << 1)
+#define DMM32AT_FIFO_CTRL_FIFOEN   BIT(3)
+#define DMM32AT_FIFO_CTRL_SCANEN   BIT(2)
+#define DMM32AT_FIFO_CTRL_FIFORST  BIT(1)
 #define DMM32AT_FIFO_STATUS_REG 0x07
-#define DMM32AT_FIFO_STATUS_EF (1 << 7)
-#define DMM32AT_FIFO_STATUS_HF (1 << 6)
-#define DMM32AT_FIFO_STATUS_FF (1 << 5)
-#define DMM32AT_FIFO_STATUS_OVF (1 << 4)
-#define DMM32AT_FIFO_STATUS_FIFOEN (1 << 3)
-#define DMM32AT_FIFO_STATUS_SCANEN (1 << 2)
+#define DMM32AT_FIFO_STATUS_EF BIT(7)
+#define DMM32AT_FIFO_STATUS_HF BIT(6)
+#define DMM32AT_FIFO_STATUS_FF BIT(5)
+#define DMM32AT_FIFO_STATUS_OVF BIT(4)
+#define DMM32AT_FIFO_STATUS_FIFOEN BIT(3)
+#define DMM32AT_FIFO_STATUS_SCANEN BIT(2)
 #define DMM32AT_FIFO_STATUS_PAGE_MASK  (3 << 0)
 #define DMM32AT_CTRL_REG   0x08
-#define DMM32AT_CTRL_RESETA (1 << 5)
-#define DMM32AT_CTRL_RESETD (1 << 4)
-#define DMM32AT_CTRL_INTRST (1 << 3)
+#define DMM32AT_CTRL_RESETA BIT(5)
+#define DMM32AT_CTRL_RESETD BIT(4)
+#define DMM32AT_CTRL_INTRST BIT(3)
 #define DMM32AT_CTRL_PAGE_8254 (0 << 0)
-#define DMM32AT_CTRL_PAGE_8255 (1 << 0)
+#define DMM32AT_CTRL_PAGE_8255 BIT(0)
 #define DMM32AT_CTRL_PAGE_CALIB (3 << 0)
 #define DMM32AT_AI_STATUS_REG  0x08
-#define DMM32AT_AI_STATUS_STS  (1 << 7)
-#define DMM32AT_AI_STATUS_SD1  (1 << 6)
-#define DMM32AT_AI_STATUS_SD0  (1 << 5)
+#define DMM32AT_AI_STATUS_STS  BIT(7)
+#define DMM32AT_AI_STATUS_SD1  BIT(6)
+#define DMM32AT_AI_STATUS_SD0  BIT(5)
 #define DMM32AT_AI_STATUS_ADCH_MASK (0x1f << 0)
 #define DMM32AT_INTCLK_REG 0x09
-#define DMM32AT_INTCLK_ADINT   (1 << 7)
-#define DMM32AT_INTCLK_DINT (1 << 6)
-#define DMM32AT_INTCLK_TINT (1 << 5)
-#define DMM32AT_INTCLK_CLKEN   (1 << 1)  /* 1=see below  0=software */
-#define DMM32AT_INTCLK_CLKSEL  (1 << 0)  /* 1=OUT2  0=EXTCLK */
+#define DMM32AT_INTCLK_ADINT   BIT(7)
+#define DMM32AT_INTCLK_DINT BIT(6)
+#define DMM32AT_INTCLK_TINT BIT(5)
+#define DMM32AT_INTCLK_CLKEN   BIT(1)  /* 1=see below  0=software */
+#define DMM32AT_INTCLK_CLKSEL  BIT(0)  /* 1=OUT2  0=EXTCLK */
 #define DMM32AT_CTRDIO_CFG_REG 0x0a
-#define DMM32AT_CTRDIO_CFG_FREQ12  (1 << 7)  /* CLK12 1=100KHz 0=10MHz */
-#define DMM32AT_CTRDIO_CFG_FREQ0   (1 << 6)  /* CLK0  1=10KHz  0

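The conversion in this patch is purely notational: BIT(n) and (1 << n) have the same value, which the sketch below checks at compile time. It assumes BIT() is defined as in include/linux/bitops.h of this era, (1UL << (nr)), so the result is an unsigned long rather than an int. The FIFO_STATUS_* names are shortened, hypothetical copies of the driver's macros.

```c
#include <assert.h>

/* Assumed to match the kernel's definition at the time. */
#define BIT(nr) (1UL << (nr))

#define FIFO_STATUS_EF_OLD    (1 << 7)
#define FIFO_STATUS_EF_NEW    BIT(7)
#define FIFO_STATUS_PAGE_MASK (3 << 0)   /* multi-bit field: shift form kept */

/* Same value, different type (int vs unsigned long). */
static_assert(FIFO_STATUS_EF_OLD == FIFO_STATUS_EF_NEW, "BIT(7) == 1 << 7");
static_assert(FIFO_STATUS_PAGE_MASK == (BIT(1) | BIT(0)), "mask covers bits 1..0");
```

DMM32AT_CTRL_PAGE_8255 is also converted to BIT(0), although it is one value of the two-bit page field (next to PAGE_8254 = 0 and PAGE_CALIB = 3); its numeric value is unchanged.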
Re: [PATCH] KVM: s390: fix build failure

2016-06-07 Thread Heiko Carstens

On Wed, Jun 08, 2016 at 07:17:35AM +0200, Christian Borntraeger wrote:
> On 06/07/2016 11:49 PM, Sudip Mukherjee wrote:
> > etr_ptff definitions are moved and renamed but we missed updating them
> > here and as a result s390 defconfig and allmodconfig were failing with
> > the error:
> > arch/s390/kvm/kvm-s390.c:230:45: error: 'ETR_PTFF_QAF' undeclared
> > 
> > Fixes: cc8f94656487 ("s390/time: move PTFF definitions")
> > Signed-off-by: Sudip Mukherjee 
> 
> Thank you for the report and patch.
> 
> This is linux-next only. It's a conflict between my kvms390 queue and
> Martin's s390 queue. We cannot apply this directly as it would break
> the build of my tree when not merged in next (and it does not apply
> on Martin's tree).
> 
> I will have a look how to fix that up.

We could ask Stephen Rothwell to apply the patch only to linux-next? ;)

> > ---
> > 
> > s390 defconfig build log is at:
> > https://travis-ci.org/sudipm-mukherjee/parport/jobs/135776067
> > 
> >  arch/s390/kvm/kvm-s390.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index fa51aef..3039eaf 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -29,7 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -227,7 +227,9 @@ static void kvm_s390_cpu_feat_init(void)
> > }
> > 
> > if (test_facility(28)) /* TOD-clock steering */
> > -   etr_ptff(kvm_s390_available_subfunc.ptff, ETR_PTFF_QAF);
> > +   ptff(kvm_s390_available_subfunc.ptff,
> > +sizeof(kvm_s390_available_subfunc.ptff),
> > +PTFF_QAF);
> > 
> > if (test_facility(17)) { /* MSA */
> > __cpacf_query(CPACF_KMAC, kvm_s390_available_subfunc.kmac);
> > 
>

Re: [PATCH v6 3/6] crypto: AF_ALG -- add asymmetric cipher interface

2016-06-07 Thread Stephan Mueller

On Tuesday, 7 June 2016, 17:28:07, Mat Martineau wrote:

Hi Mat,

> > +   used = ctx->used;
> > +
> > +   /* convert iovecs of output buffers into scatterlists */
> > +   while (iov_iter_count(&msg->msg_iter)) {
> > +   /* make one iovec available as scatterlist */
> > +   err = af_alg_make_sg(&ctx->rsgl[cnt], &msg->msg_iter,
> > +iov_iter_count(&msg->msg_iter));
> > +   if (err < 0)
> > +   goto unlock;
> > +   usedpages += err;
> > +   /* chain the new scatterlist with previous one */
> > +   if (cnt)
> > +   af_alg_link_sg(&ctx->rsgl[cnt - 1], &ctx->rsgl[cnt]);
> > +
> > +   iov_iter_advance(&msg->msg_iter, err);
> > +   cnt++;
> > +   }
> > +
> > +   /* ensure output buffer is sufficiently large */
> > +   if (usedpages < akcipher_calcsize(ctx)) {
> > +   err = -EMSGSIZE;
> > +   goto unlock;
> > +   }
> 
> Why is the size of the output buffer enforced here instead of depending on
> the algorithm implementation?

akcipher_calcsize calls crypto_akcipher_maxsize to get the maximum size the 
algorithm generates as output during its operation.

The code ensures that the caller provided at least that amount of memory for 
the kernel to store its data in. This check therefore is present to ensure the 
kernel does not overstep memory boundaries in user space.

What is your concern?

Thanks

Ciao
Stephan
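The size check described above can be modelled in userspace: add up the output space the caller supplied across its iovecs and reject the request with -EMSGSIZE if it is smaller than the largest output the algorithm can produce (what crypto_akcipher_maxsize() reports in the kernel, e.g. 256 bytes for a 2048-bit RSA key). The function name is hypothetical; in the quoted loop the running total is the byte count returned by af_alg_make_sg(), the same value passed to iov_iter_advance().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical model of the output-buffer check: the caller must supply at
 * least alg_max_out bytes in total, or the kernel could write past the end
 * of the user buffers.
 */
static int model_check_output_space(const struct iovec *iov, size_t iovcnt,
                                    size_t alg_max_out)
{
    size_t total = 0;

    assert(iov != NULL || iovcnt == 0);
    for (size_t i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    return total < alg_max_out ? -EMSGSIZE : 0;
}
```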

[PATCH 3/5] Staging: comedi: das800: fix comment issue

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya

This fixes up a WARNING: 'Block comments use a trailing */ on a
separate line' found by the checkpatch.pl tool

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das800.c | 102 
 1 file changed, 51 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das800.c b/drivers/staging/comedi/drivers/das800.c
index b02f122..0680d87 100644
--- a/drivers/staging/comedi/drivers/das800.c
+++ b/drivers/staging/comedi/drivers/das800.c
@@ -1,56 +1,56 @@
 /*
-comedi/drivers/das800.c
-Driver for Keitley das800 series boards and compatibles
-Copyright (C) 2000 Frank Mori Hess 
-
-COMEDI - Linux Control and Measurement Device Interface
-Copyright (C) 2000 David A. Schleef 
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-*/
+ * comedi/drivers/das800.c
+ * Driver for Keitley das800 series boards and compatibles
+ * Copyright (C) 2000 Frank Mori Hess 
+ *
+ * COMEDI - Linux Control and Measurement Device Interface
+ * Copyright (C) 2000 David A. Schleef 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
 /*
-Driver: das800
-Description: Keithley Metrabyte DAS800 (& compatibles)
-Author: Frank Mori Hess 
-Devices: [Keithley Metrabyte] DAS-800 (das-800), DAS-801 (das-801),
-  DAS-802 (das-802),
-  [Measurement Computing] CIO-DAS800 (cio-das800),
-  CIO-DAS801 (cio-das801), CIO-DAS802 (cio-das802),
-  CIO-DAS802/16 (cio-das802/16)
-Status: works, cio-das802/16 untested - email me if you have tested it
-
-Configuration options:
-  [0] - I/O port base address
-  [1] - IRQ (optional, required for timed or externally triggered conversions)
-
-Notes:
-   IRQ can be omitted, although the cmd interface will not work without it.
-
-   All entries in the channel/gain list must use the same gain and be
-   consecutive channels counting upwards in channel number (these are
-   hardware limitations.)
-
-   I've never tested the gain setting stuff since I only have a
-   DAS-800 board with fixed gain.
-
-   The cio-das802/16 does not have a fifo-empty status bit!  Therefore
-   only fifo-half-full transfers are possible with this card.
-
-cmd triggers supported:
-   start_src:  TRIG_NOW | TRIG_EXT
-   scan_begin_src: TRIG_FOLLOW
-   scan_end_src:   TRIG_COUNT
-   convert_src:TRIG_TIMER | TRIG_EXT
-   stop_src:   TRIG_NONE | TRIG_COUNT
-*/
+ * Driver: das800
+ * Description: Keithley Metrabyte DAS800 (& compatibles)
+ * Author: Frank Mori Hess 
+ * Devices: [Keithley Metrabyte] DAS-800 (das-800), DAS-801 (das-801),
+ * DAS-802 (das-802),
+ * [Measurement Computing] CIO-DAS800 (cio-das800),
+ * CIO-DAS801 (cio-das801), CIO-DAS802 (cio-das802),
+ * CIO-DAS802/16 (cio-das802/16)
+ * Status: works, cio-das802/16 untested - email me if you have tested it
+ *
+ * Configuration options:
+ * [0] - I/O port base address
+ *  [1] - IRQ (optional, required for timed or externally triggered conversions)
+ *
+ * Notes:
+ * IRQ can be omitted, although the cmd interface will not work without it.
+ *
+ * All entries in the channel/gain list must use the same gain and be
+ * consecutive channels counting upwards in channel number (these are
+ * hardware limitations.)
+ *
+ * I've never tested the gain setting stuff since I only have a
+ * DAS-800 board with fixed gain.
+ *
+ * The cio-das802/16 does not have a fifo-empty status bit!  Therefore
+ * only fifo-half-full transfers are possible with this card.
+ *
+ * cmd triggers supported:
+ * start_src:  TRIG_NOW | TRIG_EXT
+ * scan_begin_src: TRIG_FOLLOW
+ * scan_end_src:   TRIG_COUNT
+ * convert_src:TRIG_TIMER | TRIG_EXT
+ * stop_src:   TRIG_NONE | TRIG_COUNT
+ */
 
 #include 
 #include 
-- 
1.9.1

[PATCH 1/5] Staging: comedi: das16: fix blank line

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya

This fixes up a missing blank line after a function/struct/union/enum
declaration, as found by the checkpatch.pl tool

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das16.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/comedi/drivers/das16.c b/drivers/staging/comedi/drivers/das16.c
index fd8e0b7..4d6e581 100644
--- a/drivers/staging/comedi/drivers/das16.c
+++ b/drivers/staging/comedi/drivers/das16.c
@@ -198,6 +198,7 @@ enum {
das16_pg_1601,
das16_pg_1602,
 };
+
 static const int *const das16_gainlists[] = {
NULL,
das16jr_gainlist,
-- 
1.9.1

[PATCH 2/5] Staging: comedi: das16: fix Block comment

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya

This fixes up a WARNING: 'Block comments use a trailing */ on a
separate line' found by the checkpatch.pl tool.

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das16.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das16.c b/drivers/staging/comedi/drivers/das16.c
index 4d6e581..ef345dc 100644
--- a/drivers/staging/comedi/drivers/das16.c
+++ b/drivers/staging/comedi/drivers/das16.c
@@ -429,8 +429,10 @@ static const struct das16_board das16_boards[] = {
},
 };
 
-/* Period for timer interrupt in jiffies.  It's a function
- * to deal with possibility of dynamic HZ patches  */
+/*
+ * Period for timer interrupt in jiffies.  It's a function
+ * to deal with possibility of dynamic HZ patches
+ */
 static inline int timer_period(void)
 {
return HZ / 20;
-- 
1.9.1

[PATCH 4/5] Staging: comedi: das800: Prefer unsigned int instead of unsigned

2016-06-07 Thread Ravishankar Karkala Mallikarjunayya

This fixes up a WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
found by the checkpatch.pl tool.

Signed-off-by: Ravishankar Karkala Mallikarjunayya 
---
 drivers/staging/comedi/drivers/das800.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/comedi/drivers/das800.c b/drivers/staging/comedi/drivers/das800.c
index 0680d87..ef48c48 100644
--- a/drivers/staging/comedi/drivers/das800.c
+++ b/drivers/staging/comedi/drivers/das800.c
@@ -218,7 +218,7 @@ struct das800_private {
 };
 
 static void das800_ind_write(struct comedi_device *dev,
-unsigned val, unsigned reg)
+unsigned int val, unsigned int reg)
 {
/*
 * Select dev->iobase + 2 to be desired register
@@ -228,7 +228,7 @@ static void das800_ind_write(struct comedi_device *dev,
outb(val, dev->iobase + 2);
 }
 
-static unsigned das800_ind_read(struct comedi_device *dev, unsigned reg)
+static unsigned int das800_ind_read(struct comedi_device *dev, unsigned int reg)
 {
/*
 * Select dev->iobase + 7 to be desired register
-- 
1.9.1

Re: [PATCH 1/3] usb: pci: Remove unnecessary pci_set_drvdata().

2016-06-07 Thread Greg KH

On Wed, May 11, 2016 at 06:08:15PM +0530, Sandhya Bankar wrote:
> Unnecessary [platform|pci]_set_drvdata() have been removed since the driver 
> core clears the driver data to NULL after device release or on probe failure. 
> There is no need to manually clear the
> device driver data to NULL.

Please fix your changelog text to be wrapped at 72 columns like it is
supposed to be.

thanks,

greg k-h

Re: NVMe over Fabrics target implementation

2016-06-07 Thread Nicholas A. Bellinger

On Tue, 2016-06-07 at 12:55 +0200, Christoph Hellwig wrote:
> There is absolutely no point in dragging in an overcomplicated configfs
> structure for a very simple protocol which also is very different from
> SCSI in its nitty-gritty details.

Please be more specific wrt the two individual points that have been
raised.

>  Keeping the nvme target self-contained
> allows it to be both much simpler and much easier to understand, as well
> as much more testable - see the amount of test coverage we could easily
> add for example.

I disagree.

> 
> Or to put it the other way around - if there was any major synergy in
> reusing the SCSI target code that just shows we're missing functionality
> in the block layer or configfs.
> 

To reiterate the points again.

*) Extensible to multiple types of backend drivers.

nvme-target needs a way to absorb new backend drivers that
does not affect the existing configfs group layout or attributes.

Looking at the nvmet/configfs layout as-is, there are no multiple
backend types defined, nor a way to control backend feature bits
exposed to nvme namespaces at runtime.

What is being proposed is a way to share target-core backends via
existing configfs symlinks across SCSI and NVMe targets.

Which means:

   - All I/O state + memory submission is done at RCU protected
 se_device level via sbc_ops
   - percpu reference counting is done outside of target-core
   - Absorb all nvmet/io-cmd optimizations into target_core_iblock.c
   - Base starting point for features in SCSI + NVMe that span
 across multiple endpoints and instances (reservations + APTPL, 
 multipath, copy-offload across fabric types)

Using target-core backends means we get features like T10-PI and
sbc_ops->write_same for free that don't exist in nvmet, and can
utilize a common set of backend drivers for SCSI and NVMe via an
existing configfs ABI and python userspace community.

And to the second, and more important, point: defining a configfs ABI
that works for today's requirements as well as into the 2020s,
without breaking user-space compatibility.

As-is, the initial design using top level nvmet configfs symlinks of
subsystem groups into individual port + host groups does not scale.

That is, it currently does:

  - Sequential list lookup under global rw_mutex of top-level nvmet_port
and nvmet_host symlink ->allow_link() and ->drop_link() configfs
callbacks.
  - nvmet_fabrics_ops->add_port() callback invoked under same global
rw mutex.

This is very bad for several reasons.

As-is, this blocks all other configfs port + host operations from
occurring even during normal operation, which makes it quite useless for
any type of multi-tenant target environment where the individual target
endpoints *must* be able to operate independently.

Seriously, there is never a good reason why configfs group or item
callbacks should be performing list lookup under a global lock at
this level.

Why does it ever make sense for $SUBSYSTEM_NQN_0 with $PORT_DRIVER_FOO
to block operation of $SUBSYSTEM_NQN_1 with $PORT_DRIVER_BAR..?

A simple example where this design quickly breaks down is an NVMf
ops->add_port() call that requires a HW reset, or, say, a firmware
reload that can take multiple seconds (qla2xxx comes to mind).

There is a simple test to highlight this limitation.  Take any
nvme-target driver that is capable of multiple ports, and introduce
a sleep(5) into each ops->add_port() call.

Now create 256 different subsystem NQNs with 256 different ports
across four different user-space processes.

What happens to other subsystems, ports and host groups configfs
symlinks when this occurs..?

What happens to the other user-space processes..?
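The cost of the sleep(5) test can be estimated with simple arithmetic: if every ops->add_port() call is serialized behind one global mutex, the last symlink completes only after the sum of all call durations, whereas with independent per-port locking it completes after the longest single call. The sketch below is a hypothetical, deterministic model of that arithmetic (the function names are illustrative, not nvmet code); it assumes all calls are issued concurrently and that nothing else contends for the lock.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: n concurrent callers each spend durations[i]
 * seconds inside ops->add_port() while holding a lock.
 */

/* One global mutex: calls run back to back, so the last caller
 * finishes after the sum of all durations. */
unsigned long serialized_completion(const unsigned long *durations, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += durations[i];
	return total;
}

/* Independent per-port locking: calls overlap, so the last caller
 * finishes after the longest single call. */
unsigned long independent_completion(const unsigned long *durations, size_t n)
{
	unsigned long longest = 0;

	for (size_t i = 0; i < n; i++)
		if (durations[i] > longest)
			longest = durations[i];
	return longest;
}
```

With 256 ports and a 5 second add_port() each, the serialized model gives 1280 seconds (over 21 minutes) before the last configfs symlink returns, against 5 seconds when ports are independent.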

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Ming Lin

On Tue, 2016-06-07 at 22:49 -0600, Jens Axboe wrote:
> On 06/06/2016 03:21 PM, Christoph Hellwig wrote:
> > From: Ming Lin 
> > 
> > For some protocols like NVMe over Fabrics we need to be able to send
> > initialization commands to a specific queue.
> > 
> > Based on an earlier patch from Christoph Hellwig .
> > 
> > Signed-off-by: Ming Lin 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >   block/blk-mq.c | 33 +
> >   include/linux/blk-mq.h |  2 ++
> >   2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 29cbc1b..7bb45ed 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
> >   }
> >   EXPORT_SYMBOL(blk_mq_alloc_request);
> > 
> > +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> > +   unsigned int flags, unsigned int hctx_idx)
> > +{
> > +   struct blk_mq_hw_ctx *hctx;
> > +   struct blk_mq_ctx *ctx;
> > +   struct request *rq;
> > +   struct blk_mq_alloc_data alloc_data;
> > +   int ret;
> > +
> > +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> > +   if (ret)
> > +   return ERR_PTR(ret);
> > +
> > +   hctx = q->queue_hw_ctx[hctx_idx];
> > +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> > +
> > +   blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
> > +
> > +   rq = __blk_mq_alloc_request(&alloc_data, rw);
> > +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> > +   __blk_mq_run_hw_queue(hctx);
> > +
> > +   rq =  __blk_mq_alloc_request(&alloc_data, rw);
> > +   }
> 
> Why are we duplicating this code here? If NOWAIT isn't set, then we'll
> always return a request. bt_get() will run the queue for us, if it needs
> to. blk_mq_alloc_request() does this too, and I'm guessing that code was
> just copied. I'll fix that up. Looks like this should just be:
>
> 	rq = __blk_mq_alloc_request(&alloc_data, rw);
> 	if (rq)
> 		return rq;
>
> 	blk_queue_exit(q);
> 	return ERR_PTR(-EWOULDBLOCK);
>
> for this case.

Yes,

But the bt_get() reminds me that this patch actually has a problem.

blk_mq_alloc_request_hctx() ->
  __blk_mq_alloc_request() ->
    blk_mq_get_tag() -> 
      __blk_mq_get_tag() ->
        bt_get() ->
          blk_mq_put_ctx(data->ctx);

Here are blk_mq_get_ctx() and blk_mq_put_ctx().

static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{
	return __blk_mq_get_ctx(q, get_cpu());
}

static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
{
	put_cpu();
}

blk_mq_alloc_request_hctx() calls __blk_mq_get_ctx() instead of
blk_mq_get_ctx(). The reason is that the "hctx" could belong to another
CPU, so blk_mq_get_ctx() doesn't work.

But then the put_cpu() in blk_mq_put_ctx() above will trigger a WARNING,
because we never did a get_cpu() in blk_mq_alloc_request_hctx().
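The imbalance can be illustrated with a userspace model in which a counter stands in for the preemption count: get_cpu() increments it, put_cpu() decrements it, and a put without a matching get corresponds to the WARNING. All model_* names are hypothetical stand-ins, not the real blk-mq helpers. The sleep path follows Ming's call chain (bt_get() reaching blk_mq_put_ctx()); the model additionally assumes that bt_get() takes a fresh context with get_cpu() after sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-task preemption count. */
int model_preempt_count;
/* Set when put_cpu() runs without a matching get_cpu(). */
bool model_warned;

void model_reset(void)
{
	model_preempt_count = 0;
	model_warned = false;
}

static void model_get_cpu(void)
{
	model_preempt_count++;
}

static void model_put_cpu(void)
{
	if (model_preempt_count == 0) {
		model_warned = true;	/* the kernel would WARN here */
		return;
	}
	model_preempt_count--;
}

/* bt_get() sleep path: blk_mq_put_ctx() before sleeping, then
 * (assumed) blk_mq_get_ctx() again afterwards. */
static void model_bt_get_sleep(void)
{
	model_put_cpu();
	model_get_cpu();
}

/* blk_mq_alloc_request()-style caller: pins the CPU first. */
void model_alloc_regular(bool must_sleep)
{
	model_get_cpu();		/* blk_mq_get_ctx() */
	if (must_sleep)
		model_bt_get_sleep();
	model_put_cpu();		/* blk_mq_put_ctx() */
}

/* blk_mq_alloc_request_hctx()-style caller: __blk_mq_get_ctx() does
 * not call get_cpu(), so nothing is pinned before bt_get(). */
void model_alloc_hctx(bool must_sleep)
{
	if (must_sleep)
		model_bt_get_sleep();
}
```

The regular path stays balanced whether or not it sleeps, while the hctx path only goes wrong when a tag is not immediately available, which is why the problem would not show up in a quick test.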

Re: [PATCH 3.10 000/143] 3.10.102-stable review

2016-06-07 Thread Willy Tarreau

On Tue, Jun 07, 2016 at 05:52:52PM -0700, Guenter Roeck wrote:
> Here we are;
> 
> Build results:
>   total: 123 pass: 123 fail: 0
> Qemu test results:
>   total: 75 pass: 75 fail: 0
> 
> Details are available at http://kerneltests.org/builders.

Excellent, thank you Guenter!

Willy

Re: [PATCH] KVM: s390: fix build failure

2016-06-07 Thread Christian Borntraeger

On 06/07/2016 11:49 PM, Sudip Mukherjee wrote:
> etr_ptff definitions are moved and renamed but we missed updating them
> here and as a result s390 defconfig and allmodconfig was failing with
> the error:
> arch/s390/kvm/kvm-s390.c:230:45: error: 'ETR_PTFF_QAF' undeclared
> 
> Fixes: cc8f94656487 ("s390/time: move PTFF definitions")
> Signed-off-by: Sudip Mukherjee 

Thank you for the report and patch.

This is linux-next only. It's a conflict between my kvms390 queue and
Martin's s390 queue. We cannot apply this directly, as it would break
the build of my tree when not merged in next (and it does not apply
on Martin's tree).

I will have a look how to fix that up.


> ---
> 
> s390 defconfig build log is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/135776067
> 
>  arch/s390/kvm/kvm-s390.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fa51aef..3039eaf 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -29,7 +29,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -227,7 +227,9 @@ static void kvm_s390_cpu_feat_init(void)
>   }
> 
>   if (test_facility(28)) /* TOD-clock steering */
> - etr_ptff(kvm_s390_available_subfunc.ptff, ETR_PTFF_QAF);
> + ptff(kvm_s390_available_subfunc.ptff,
> +  sizeof(kvm_s390_available_subfunc.ptff),
> +  PTFF_QAF);
> 
>   if (test_facility(17)) { /* MSA */
>   __cpacf_query(CPACF_KMAC, kvm_s390_available_subfunc.kmac);
>

Re: [PATCH] mm/zsmalloc: add trace events for zs_compact

2016-06-07 Thread Minchan Kim

On Wed, Jun 08, 2016 at 09:48:30AM +0800, Ganesh Mahendran wrote:
> Hi, Minchan:
> 
> 2016-06-08 8:16 GMT+08:00 Minchan Kim :
> > Hello Ganesh,
> >
> > On Tue, Jun 07, 2016 at 04:56:44PM +0800, Ganesh Mahendran wrote:
> >> Currently zsmalloc is widely used in android device.
> >> Sometimes, we want to see how frequently zs_compact is
> >> triggered or how may pages freed by zs_compact(), or which
> >> zsmalloc pool is compacted.
> >>
> >> Most of the time, user can get the brief information from
> >> trace_mm_shrink_slab_[start | end], but in some senario,
> >> they do not use zsmalloc shrinker, but trigger compaction manually.
> >> So add some trace events in zs_compact is convenient. Also we
> >> can add some zsmalloc specific information(pool name, total compact
> >> pages, etc) in zsmalloc trace.
> >
> > Sorry, I cannot understand what's the problem now and what you want to
> > solve. Could you elaborate it a bit?
> >
> > Thanks.
> 
> We have backported the zs_compact() to our product(kernel 3.18).
> It is usefull for a longtime running device.
> But there is not a convenient way to get the detailed information
> of zs_comapct() which is usefull for  performance optimization.
> Information about how much time zs_compact used, which pool is
> compacted, how many page freed, etc.

You can see how many pages were freed by object compaction in each
device's /sys/block/zram-id/mm_stat. And you can use the function_graph
tracer to see how much time zs_compact takes.
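Minchan's suggestion can be scripted without new tracepoints by sampling mm_stat before and after compaction (for example around a write to the device's compact attribute). The sketch below parses one mm_stat line into its numeric columns. It assumes the column order documented for zram in this era (orig_data_size, compr_data_size, mem_used_total, mem_limit, mem_used_max, zero_pages, then the compaction counter, named num_migrated or pages_compacted depending on the kernel version); treat the index as an assumption and check the zram documentation for the running kernel. The sample values in the test are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed position of the compaction counter (7th column). */
#define MM_STAT_COMPACTED_IDX 6

/*
 * Parse up to max whitespace-separated unsigned integers from one
 * mm_stat line. Returns the number parsed, or -1 on overflow.
 */
int parse_mm_stat(const char *line, unsigned long long *out, int max)
{
	const char *p = line;
	int n = 0;

	while (n < max) {
		char *end;

		errno = 0;
		unsigned long long v = strtoull(p, &end, 10);
		if (end == p)
			break;		/* no more numbers on the line */
		if (errno == ERANGE)
			return -1;
		out[n++] = v;
		p = end;
	}
	return n;
}
```

Subtracting the compaction column of two samples gives the pages freed in between; timing the same window with function_graph covers the "how much time" part of the request.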


> With these information, we will know what is going on in zs_comapct.
> And draw the relation between free mem and zs_comapct.
> 
> >
> >>
> >> This patch add two trace events for zs_compact(), below the trace log:
> >> -
> >> root@land:/ # cat /d/tracing/trace
> >>  kswapd0-125   [007] ...1   174.176979: zsmalloc_compact_start: pool zram0
> >>  kswapd0-125   [007] ...1   174.181967: zsmalloc_compact_end: pool zram0: 608 pages compacted(total 1794)
> >>  kswapd0-125   [000] ...1   184.134475: zsmalloc_compact_start: pool zram0
> >>  kswapd0-125   [000] ...1   184.135010: zsmalloc_compact_end: pool zram0: 62 pages compacted(total 1856)
> >>  kswapd0-125   [003] ...1   226.927221: zsmalloc_compact_start: pool zram0
> >>  kswapd0-125   [003] ...1   226.928575: zsmalloc_compact_end: pool zram0: 250 pages compacted(total 2106)
> >> -
> >>
> >> Signed-off-by: Ganesh Mahendran 
> >> ---
> >>  include/trace/events/zsmalloc.h | 56 +
> >>  mm/zsmalloc.c   | 10 
> >>  2 files changed, 66 insertions(+)
> >>  create mode 100644 include/trace/events/zsmalloc.h
> >>
> >> diff --git a/include/trace/events/zsmalloc.h b/include/trace/events/zsmalloc.h
> >> new file mode 100644
> >> index 000..3b6f14e
> >> --- /dev/null
> >> +++ b/include/trace/events/zsmalloc.h
> >> @@ -0,0 +1,56 @@
> >> +#undef TRACE_SYSTEM
> >> +#define TRACE_SYSTEM zsmalloc
> >> +
> >> +#if !defined(_TRACE_ZSMALLOC_H) || defined(TRACE_HEADER_MULTI_READ)
> >> +#define _TRACE_ZSMALLOC_H
> >> +
> >> +#include 
> >> +#include 
> >> +
> >> +TRACE_EVENT(zsmalloc_compact_start,
> >> +
> >> + TP_PROTO(const char *pool_name),
> >> +
> >> + TP_ARGS(pool_name),
> >> +
> >> + TP_STRUCT__entry(
> >> + __field(const char *, pool_name)
> >> + ),
> >> +
> >> + TP_fast_assign(
> >> + __entry->pool_name = pool_name;
> >> + ),
> >> +
> >> + TP_printk("pool %s",
> >> +   __entry->pool_name)
> >> +);
> >> +
> >> +TRACE_EVENT(zsmalloc_compact_end,
> >> +
> >> + TP_PROTO(const char *pool_name, unsigned long pages_compacted,
> >> + unsigned long pages_total_compacted),
> >> +
> >> + TP_ARGS(pool_name, pages_compacted, pages_total_compacted),
> >> +
> >> + TP_STRUCT__entry(
> >> + __field(const char *, pool_name)
> >> + __field(unsigned long, pages_compacted)
> >> + __field(unsigned long, pages_total_compacted)
> >> + ),
> >> +
> >> + TP_fast_assign(
> >> + __entry->pool_name = pool_name;
> >> + __entry->pages_compacted = pages_compacted;
> >> + __entry->pages_total_compacted = pages_total_compacted;
> >> + ),
> >> +
> >> + TP_printk("pool %s: %ld pages compacted(total %ld)",
> >> +   __entry->pool_name,
> >> +   __entry->pages_compacted,
> >> +   __entry->pages_total_compacted)
> >> +);
> >> +
> >> +#endif /* _TRACE_ZSMALLOC_H */
> >> +
> >> +/* This part must be outside protection */
> >> +#include 
> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> >> index 213d0e1..441b9f7 100644
> >> --- a/mm/zsmalloc.c
> >> +++ b/mm/zsmalloc.c
> >> @@ -30,6 +30,8 @@
> >>
> >>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >>
> >> +#define CREATE_TRACE_POINTS
> >> +
> >>  #include 
> >>  #include 
> >>  #inc

Re: [PATCH 09/10] x86, asm: Use CC_SET()/CC_OUT() and static_cpu_has() in archrandom.h

2016-06-07 Thread Andy Lutomirski

On Tue, Jun 7, 2016 at 4:31 PM, H. Peter Anvin  wrote:
> Use CC_SET()/CC_OUT() and static_cpu_has().  This produces code good
> enough to eliminate ad hoc use of alternatives in ,
> greatly simplifying the code.

Looks reasonable.

RE: [PATCH 2/2] aer: add support aer interrupt with none MSI/MSI-X/INTx mode

2016-06-07 Thread Po Liu

Hi Bjorn,

Thanks for the kind reply. All of this is helpful.

>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  On Wed, June 08, 2016 6:47 AM
>  
>  On Tue, Jun 07, 2016 at 10:07:40AM +, Po Liu wrote:
>  > Hi Bjorn,
>  >
>  > >  -Original Message-
>  > >
>  > >  On Mon, Jun 06, 2016 at 10:01:44AM -0400, Murali Karicheri wrote:
>  > >  > On 06/06/2016 03:32 AM, Po Liu wrote:
>  > >  > > Hi Bjorn,
>  > >  > > I confirm we met same problem with KeyStone based on DesignWare
>  > >  > > design.
>  > >  > >
>  > >  > >
>  > >  > > Best regards,
>  > >  > > Liu Po
>  > >  > >
>  > >  > >>  -Original Message-
>  > >  > >>  From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  Sent: Saturday, June 04, 2016 11:49 AM
>  > >  > >>  To: Murali Karicheri
>  > >  > >>  Cc: Po Liu; linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>  > >  > >>  linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; Arnd Bergmann;
>  > >  > >>  Roy Zang; Marc Zyngier; Stuart Yoder; Yang-Leo Li; Minghuan Lian;
>  > >  > >>  Bjorn Helgaas; Shawn Guo; Mingkai Hu; Rob Herring
>  > >  > >>  Subject: Re: [PATCH 2/2] aer: add support aer interrupt with none
>  > >  > >>  MSI/MSI-X/INTx mode
>  > >  > >>
>  > >  > >>  On Fri, Jun 03, 2016 at 01:31:11PM -0400, Murali Karicheri wrote:
>  > >  > >>  > Po,
>  > >  > >>  >
>  > >  > >>  > Sorry to hijack your discussion, but the problem seems to be same for
>  > >  > >>  > Keystone PCI controller which is also designware (old version) based.
>  > >  > >>  >
>  > >  > >>  > On 06/03/2016 12:09 AM, Bjorn Helgaas wrote:
>  > >  > >>  > > On Thu, Jun 02, 2016 at 11:37:28AM -0400, Murali Karicheri wrote:
>  > >  > >>  > >> On 06/02/2016 09:55 AM, Bjorn Helgaas wrote:
>  > >  > >>  > >>> On Thu, Jun 02, 2016 at 05:01:19AM +, Po Liu wrote:
>  > >  > >>  > >>>>> -Original Message-
>  > >  > >>  > >>>>> From: Bjorn Helgaas [mailto:helg...@kernel.org]
>  > >  > >>  > >>>>> Sent: Thursday, June 02, 2016 11:48 AM
>  > >  > >>  > >>>>> To: Po Liu
>  > >  > >>  > >>>>> Cc: linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>  > >  > >>  > >>>>> linux-kernel@vger.kernel.org; devicet...@vger.kernel.org; Arnd
>  > >  > >>  > >>>>> Bergmann; Roy Zang; Marc Zyngier; Stuart Yoder; Yang-Leo Li;
>  > >  > >>  > >>>>> Minghuan Lian; Bjorn Helgaas; Shawn Guo; Mingkai Hu; Rob Herring
>  > >  > >>  > >>>>> Subject: Re: [PATCH 2/2] aer: add support aer interrupt with none
>  > >  > >>  > >>>>> MSI/MSI-X/INTx mode
>  > >  > >>  > >>>>>
>  > >  > >>  > >>>>> [+cc Rob]
>  > >  > >>  > >>>>>
>  > >  > >>  > >>>>> Hi Po,
>  > >  > >>  > >>>>>
>  > >  > >>  > >>>>> On Thu, May 26, 2016 at 02:00:06PM +0800, Po Liu wrote:
>  > >  > >>  > >>>>>> On some platforms, root port doesn't support MSI/MSI-X/INTx in RC mode.
>  > >  > >>  > >>>>>> When chip support the aer interrupt with none MSI/MSI-X/INTx mode,
>  > >  > >>  > >>>>>> maybe there is interrupt line for aer pme etc. Search the interrupt
>  > >  > >>  > >>>>>> number in the fdt file.
>  > >  > >>  > >>>>>
>  > >  > >>  > >>>>> My understanding is that AER interrupt signaling can be done via
>  > >  > >>  > >>>>> INTx, MSI, or MSI-X (PCIe spec r3.0, sec 6.2.4.1.2).  Apparently
>  > >  > >>  > >>>>> your device doesn't support MSI or MSI-X.  Are you saying it
>  > >  > >>  > >>>>> doesn't support INTx either?  How is the interrupt you're
>  > >  > >>  > >>>>> requesting here different from INTx?
>  > >  > >>  > >>>>
>  > >  > >>  > >>>> Layerscape use none of MSI or MSI-X or INTx to indicate the
>  > >  > >>  > >>>> devices or root error in RC mode. But use an independent SPI
>  > >  > >>  > >>>> interrupt (arm interrupt controller) line.
>  > >  > >>  > >>>
>  > >  > >>  > >>> The Root Port is a PCI device and should follow the normal PCI
>  > >  > >>  > >>> rules for interrupts.  As far as I understand, that means it
>  > >  > >>  > >>> should use MSI, MSI-X, or INTx.  If your Root Port doesn't use MSI
>  > >  > >>  > >>> or MSI-X, it should use INTx, the PCI_INTERRUPT_PIN register
>  > >  > >>  > >>> should tell us which (INTA/INTB/etc.), and PCI_COMMAND_INTX_DISABLE
>  > >  > >>  > >>> should work to disable it.
>  > >  > >>  > >>> That's all from the PCI point of view, of course.
>  > >  > >>  > >>
>  > >  > >>  > >> I am faced with the same issue on Keystone PCI hardware and it has
>  > >  > >>  > >> been on my TODO list for quite some time.  Keystone PCI hardware
>  > >  > >>  > >> also doesn't use MSI or MSI-X or INTx for reporting errors received
>  > >  > >>  > >> at the root port, but use a platform interrupt instead (not
>  > >  > >>  > >> compliant to PCI standard as per PCI base spec).  So I would need
>  > >  > >>  > >> similar change to have the error interrupt passed to the aer
>  > >  > >>  > >> driver.  So there are hardware out there like Keystone which
>  > >  > >>  > >> requires to support this through platform IRQ.
>  > >  > >>  > >
>  > >  > >>  >
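Bjorn's INTx description can be made concrete. In PCI configuration space the Interrupt Pin register (offset 0x3d) reads 0 when the function uses no INTx pin and 1 through 4 for INTA# through INTD#, and bit 10 of the 16-bit Command register (offset 0x04; PCI_COMMAND_INTX_DISABLE, 0x400, in the kernel's pci_regs.h) disables INTx assertion. The sketch below decodes those two fields from a raw 256-byte config-space snapshot; intx_pin(), intx_disabled() and the CFG_* names are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_COMMAND          0x04	/* 16-bit Command register */
#define CFG_COMMAND_INTX_DIS 0x0400	/* Interrupt Disable, bit 10 */
#define CFG_INTERRUPT_PIN    0x3d	/* 8-bit Interrupt Pin register */

/* Returns 'A'..'D' for INTA#..INTD#, or 0 when no INTx pin is reported
 * (or the value is reserved). */
char intx_pin(const uint8_t cfg[256])
{
	uint8_t pin = cfg[CFG_INTERRUPT_PIN];

	if (pin >= 1 && pin <= 4)
		return (char)('A' + pin - 1);
	return 0;
}

/* Config space is little-endian: assemble a 16-bit register. */
static uint16_t cfg_read16(const uint8_t cfg[256], unsigned int off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* True when the Command register's Interrupt Disable bit is set. */
bool intx_disabled(const uint8_t cfg[256])
{
	return (cfg_read16(cfg, CFG_COMMAND) & CFG_COMMAND_INTX_DIS) != 0;
}
```

A Root Port that signals AER through a separate platform interrupt, as described for Layerscape and Keystone, would be expected to show neither a usable pin here nor MSI/MSI-X capability, which is the gap the patch tries to fill from the device tree.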

Re: [PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

2016-06-07 Thread Horng-Shyang Liao

Hi Matthias,

On Tue, 2016-06-07 at 19:04 +0200, Matthias Brugger wrote:
> 
> On 30/05/16 05:19, HS Liao wrote:
> > This patch is first version of Mediatek Command Queue(CMDQ) driver. The
> > CMDQ is used to help read/write registers with critical time limitation,
> > such as updating display configuration during the vblank. It controls
> > Global Command Engine (GCE) hardware to achieve this requirement.
> > Currently, CMDQ only supports display related hardwares, but we expect
> > it can be extended to other hardwares for future requirements.
> >
> > Signed-off-by: HS Liao 
> > Signed-off-by: CK Hu 
> > ---
> 
> [...]
> 
> > +static void cmdq_handle_error_done(struct cmdq *cmdq,
> > +  struct cmdq_thread *thread, u32 irq_flag)
> > +{
> > +   struct cmdq_task *task, *tmp, *curr_task = NULL;
> > +   u32 curr_pa;
> > +   struct cmdq_cb_data cmdq_cb_data;
> > +   bool err;
> > +
> > +   if (irq_flag & CMDQ_THR_IRQ_ERROR)
> > +   err = true;
> > +   else if (irq_flag & CMDQ_THR_IRQ_DONE)
> > +   err = false;
> > +   else
> > +   return;
> > +
> > +   curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
> > +
> > +   list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
> > +list_entry) {
> > +   if (curr_pa >= task->pa_base &&
> > +   curr_pa < (task->pa_base + task->command_size))
> > +   curr_task = task;
> > +   if (task->cb.cb) {
> > +   cmdq_cb_data.err = curr_task ? err : false;
> > +   cmdq_cb_data.data = task->cb.data;
> > +   task->cb.cb(cmdq_cb_data);
> > +   }
> 
> I think this is not right. If we got an IRQ_DONE, then the current task 
> is in execution, we should not call the callback until it has finished.

Thanks for your finding. This is a bug from CMDQ v6.
I will fix it in next version (CMDQ v9).

> 
> Regards,
> Matthias

Thanks,
HS
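One way to address Matthias' point is to report completion only for tasks that lie entirely before the thread's current address, and to touch the task containing CMDQ_THR_CURR_ADDR only when the IRQ signals an error. The userspace model below sketches that rule over a hypothetical, simplified task array (struct model_task and the model_* functions are illustrative names); it is not the code that went into CMDQ v9, and it does not model removing finished tasks from task_busy_list or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct cmdq_task. */
struct model_task {
	uint32_t pa_base;
	uint32_t command_size;
	int cb_calls;		/* how many times the callback ran */
	bool cb_err;		/* err value passed on the last callback */
};

static void model_callback(struct model_task *t, bool err)
{
	t->cb_calls++;
	t->cb_err = err;
}

/*
 * Tasks are in execution order. Tasks before the one containing curr_pa
 * have finished successfully. On a DONE IRQ the current task is still
 * executing, so it gets no callback; on an ERROR IRQ it is reported with
 * err = true. Returns the number of callbacks invoked.
 */
size_t model_handle_error_done(struct model_task *tasks, size_t n,
			       uint32_t curr_pa, bool err)
{
	size_t calls = 0;

	for (size_t i = 0; i < n; i++) {
		struct model_task *t = &tasks[i];
		bool is_curr = curr_pa >= t->pa_base &&
			       curr_pa < t->pa_base + t->command_size;

		if (!is_curr) {
			model_callback(t, false);
			calls++;
			continue;
		}
		if (err) {
			model_callback(t, true);
			calls++;
		}
		break;	/* later tasks have not started yet */
	}
	return calls;
}
```

In the quoted v8 loop, by contrast, the callback also runs for the task still in execution and for every task after it.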

Re: [PATCH 04/10] x86, asm: define CC_SET() and CC_OUT() macros

2016-06-07 Thread Andy Lutomirski

On Tue, Jun 7, 2016 at 4:31 PM, H. Peter Anvin  wrote:
> From: "H. Peter Anvin" 
>
> The CC_SET() and CC_OUT() macros can be used together to take
> advantage of the new __GCC_ASM_FLAG_OUTPUTS__ feature in gcc 6+ while
> remaining backwards compatible.  CC_SET() generates a SET instruction
> on older compilers; CC_OUT() makes sure the output is received in the
> correct variable.

Nice.

Reviewed-by: Andy Lutomirski
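The pattern described in the changelog can be sketched in userspace. The macros below are hypothetical re-creations named MY_CC_SET()/MY_CC_OUT(), written from the description (a flag output constraint when __GCC_ASM_FLAG_OUTPUTS__ is defined, otherwise an explicit SETcc into a byte-sized output); they are not copied from the kernel headers. The example reads the carry flag after an x86 ADD and falls back to plain C on other architectures.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
# ifdef __GCC_ASM_FLAG_OUTPUTS__
/* gcc 6+ (and recent clang): the compiler consumes the flag directly,
 * so the template only carries a comment. */
#  define MY_CC_SET(c) "\n\t/* output condition code " #c " */\n"
#  define MY_CC_OUT(c) "=@cc" #c
# else
/* Older compilers: materialize the flag with SETcc into a byte operand. */
#  define MY_CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
#  define MY_CC_OUT(c) [_cc_ ## c] "=qm"
# endif
#endif

/* Adds b to *a and returns true if the addition carried out. */
bool add_carry(unsigned long *a, unsigned long b)
{
#if defined(__x86_64__) || defined(__i386__)
	bool carry;

	__asm__("add %[b], %[a]" MY_CC_SET(c)
		: [a] "+r" (*a), MY_CC_OUT(c) (carry)
		: [b] "r" (b));
	return carry;
#else
	unsigned long sum = *a + b;	/* portable fallback */
	bool carry = sum < *a;

	*a = sum;
	return carry;
#endif
}
```

The call site is identical in both configurations; only the macro expansion changes, which is what lets one asm statement serve old and new compilers.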

Re: [PATCH v2 1/6] power: Introduce Broadcom kona reset driver

2016-06-07 Thread Sebastian Reichel

Hi,

On Tue, Jun 07, 2016 at 12:40:41PM -0700, Chris Brand wrote:
> On Mon, Jun 6, 2016 at 6:50 PM, Sebastian Reichel  wrote:
> > Hi,
> >
> > On Mon, Jun 06, 2016 at 09:42:03AM -0700, Chris Brand wrote:
> >> On Thu, Jun 2, 2016 at 7:38 PM, Sebastian Reichel  wrote:
> >> > Feel free to queue it via arm-soc with
> >> >
> >> > Acked-By: Sebastian Reichel 
> >> >
> >> > If I didn't overlook it, it's missing DT documentation, though.
> >>
> >> Thanks, Sebastian. Because this is effectively a move of code from
> >> arch/arm rather than new code, there's already dt documentation in
> >> Documentation/devicetree/bindings/reset/brcm,bcm21664-resetmgr.txt
> >
> > Ok. That directory is usually used for periphal reset controller.
> > Board/System reset controllers are usually documented in
> > .../bindings/power/reset (following kernel strucuture
> > [drivers/reset and drivers/power/reset]).
> >
> > -- Sebastian
> 
> Would you like me to send a separate patch to move that file ?

That would be nice, thanks!

-- Sebastian



linux-next: Tree for Jun 8

2016-06-07 Thread Stephen Rothwell

Hi all,

News: there will be no linux-next releases on Friday or Monday, so the
release following tomorrow's will be next-20160614.

Changes since 20160607:

Removed tree: drm-vc4 (merged into the bcm2835 tree)

Dropped tree: amlogic (build failure)

My fixes tree contains:

  of: silence warnings due to max() usage

The amlogic tree still had its build failure so I dropped it for today.

The net-next tree gained a conflict against the net tree.

The clockevents tree still had its build failure so I used the version
from next-20160606.

I applied a supplied merge fix for a semantic conflict between the s390
and kvms390 trees.

Non-merge commits (relative to Linus' tree): 1881
 1809 files changed, 74075 insertions(+), 32709 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 232 trees (counting Linus' and 34 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (43c082e72745 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fixes/master (b31033aacbd0 of: silence warnings due to max() usage)
Merging kbuild-current/rc-fixes (b36fad65d61f kbuild: Initialize exported variables)
Merging arc-current/for-curr (ed6aefed726a Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff")
Merging arm-current/fixes (e2dfb4b88014 ARM: fix PTRACE_SETVFPREGS on SMP systems)
Merging m68k-current/for-linus (9a6462763b17 m68k/mvme16x: Include generic )
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging powerpc-fixes/fixes (8a934efe9434 powerpc/pseries: Fix PCI config address for DDW)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (6b15d6650c53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging net/master (a03e6fe56971 act_police: fix a crash during removal)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (3ec10d3a2ba5 ipvs: update real-server binding of outgoing connections in SIP-pe)
Merging wireless-drivers/master (182fd9eecb28 MAINTAINERS: Add file patterns for wireless device tree bindings)
Merging mac80211/master (6fe04128f158 mac80211: fix fast_tx header alignment)
Merging sound-current/for-linus (f90d83b30170 ALSA: hda - Fix headset mic detection problem for Dell machine)
Merging pci-current/for-linus (1a695a905c18 Linux 4.7-rc1)
Merging driver-core.current/driver-core-linus (1a695a905c18 Linux 4.7-rc1)
Merging tty.current/tty-linus (1a695a905c18 Linux 4.7-rc1)
Merging usb.current/usb-linus (7b2c17f82954 usb: musb: Stop bulk endpoint while queue is rotated)
Merging usb-gadget-fixes/fixes (50c763f8c1ba usb: dwc3: Set the ClearPendIN bit on Clear Stall EP command)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (1a695a905c18 Linux 4.7-rc1)
Merging char-misc.current/char-misc-linus (1a695a905c18 Linux 4.7-rc1)
Merging input-current/for-linus (540c26087bfb Input: xpad - fix rumble on Xbox One controllers with 2015 firmware)
Merging crypto-current/master (ab6a11a7c8ef crypto: ccp - Fi

Re: [PATCH 1/8] blk-mq: add blk_mq_alloc_request_hctx

2016-06-07 Thread Jens Axboe


On 06/06/2016 03:21 PM, Christoph Hellwig wrote:

> From: Ming Lin
>
> For some protocols like NVMe over Fabrics we need to be able to send
> initialization commands to a specific queue.
>
> Based on an earlier patch from Christoph Hellwig .
>
> Signed-off-by: Ming Lin
> Signed-off-by: Christoph Hellwig
> ---
>   block/blk-mq.c | 33 +
>   include/linux/blk-mq.h |  2 ++
>   2 files changed, 35 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 29cbc1b..7bb45ed 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -266,6 +266,39 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
>   }
>   EXPORT_SYMBOL(blk_mq_alloc_request);
>
> +struct request *blk_mq_alloc_request_hctx(struct request_queue *q, int rw,
> +   unsigned int flags, unsigned int hctx_idx)
> +{
> +   struct blk_mq_hw_ctx *hctx;
> +   struct blk_mq_ctx *ctx;
> +   struct request *rq;
> +   struct blk_mq_alloc_data alloc_data;
> +   int ret;
> +
> +   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> +   if (ret)
> +   return ERR_PTR(ret);
> +
> +   hctx = q->queue_hw_ctx[hctx_idx];
> +   ctx = __blk_mq_get_ctx(q, cpumask_first(hctx->cpumask));
> +
> +   blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
> +
> +   rq = __blk_mq_alloc_request(&alloc_data, rw);
> +   if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
> +   __blk_mq_run_hw_queue(hctx);
> +
> +   rq =  __blk_mq_alloc_request(&alloc_data, rw);
> +   }


Why are we duplicating this code here? If NOWAIT isn't set, then we'll
always return a request. bt_get() will run the queue for us, if it needs
to. blk_mq_alloc_request() does this too, and I'm guessing that code was
just copied. I'll fix that up. Looks like this should just be:

	rq = __blk_mq_alloc_request(&alloc_data, rw);
	if (rq)
		return rq;

	blk_queue_exit(q);
	return ERR_PTR(-EWOULDBLOCK);

for this case.
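The simplified tail keeps the queue usage reference balanced: blk_queue_enter() takes a reference, a successful allocation keeps it, and a failed non-blocking allocation must drop it with blk_queue_exit() before returning -EWOULDBLOCK. The userspace model below checks that invariant; struct model_queue, the integer usage counter and the model_* helpers are hypothetical stand-ins for the blk-mq internals. Only the BLK_MQ_REQ_NOWAIT case is modeled, since without NOWAIT the allocation waits until a request is available.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_queue {
	int usage;		/* stand-in for the queue usage counter */
	int free_tags;		/* requests that can be allocated right now */
};

struct model_request {
	struct model_queue *q;
};

static void model_queue_enter(struct model_queue *q) { q->usage++; }
static void model_queue_exit(struct model_queue *q) { q->usage--; }

static bool model_get_tag(struct model_queue *q)
{
	if (q->free_tags == 0)
		return false;
	q->free_tags--;
	return true;
}

/* Returns 0 and fills *rq on success, -EWOULDBLOCK on failure. */
int model_alloc_request_nowait(struct model_queue *q, struct model_request *rq)
{
	model_queue_enter(q);		/* blk_queue_enter() */

	if (model_get_tag(q)) {		/* __blk_mq_alloc_request() */
		rq->q = q;		/* the request keeps the reference */
		return 0;
	}

	model_queue_exit(q);		/* blk_queue_exit() on failure */
	return -EWOULDBLOCK;
}
```

Without the model_queue_exit() on the failure path, every failed non-blocking allocation would leak one usage reference, which would later stall queue freezing.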

--
Jens Axboe

Re: [PATCH v10 6/7] usb: pci-quirks: add Intel USB drcfg mux device

2016-06-07 Thread Greg Kroah-Hartman

On Thu, Jun 02, 2016 at 09:37:28AM +0800, Lu Baolu wrote:
> In some Intel platforms, a single usb port is shared between USB host
> and device controllers. The shared port is under control of a switch
> which is defined in the Intel vendor defined extended capability for
> xHCI.
> 
> This patch adds the support to detect and create the platform device
> for the port mux switch.

Why do you need a platform device for this?  You do nothing with this
device, why create it at all?

And why is it a platform device, isn't it really a PCI device?  Why
would you ever find a "platform" device below a PCI device?  Don't abuse
platform devices for things that aren't.  It makes me want to delete
that whole interface more and more...

greg k-h

Re: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES

2016-06-07 Thread Ganapatrao Kulkarni

On Wed, Jun 8, 2016 at 7:46 AM, Leizhen (ThunderTown)
 wrote:
>
>
> On 2016/6/7 22:01, Ganapatrao Kulkarni wrote:
>> On Tue, Jun 7, 2016 at 6:27 PM, Leizhen (ThunderTown)
>>  wrote:
>>>
>>>
>>> On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
>>>> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei wrote:
> Some numa nodes may have no memory. For example:
> 1. cpu0 on node0
> 2. cpu1 on node1
> 3. device0 access the momory from node0 and node1 take the same time.

>>>> i am wondering, if access to both nodes is same, then why you need numa.
>>>> the example you are quoting is against the basic principle of "numa"
>>>> what is device0 here? cpu?
>>> The device0 can also be a cpu. I drew a simple diagram:
>>>
>>>   cpu0         cpu1         cpu2/device0
>>>    |            |              |
>>>    |            |              |
>>>   DDR0         DDR1         No DIMM slots or no DIMM plugged
>>>  (node0)      (node1)        (node2)
>>>
>>
>> thanks for the clarification. your example is for 3 node system, where
>> third node is memory less node.
>> do you see any issue in supporting this topology with existing code?
> If opened HAVE_MEMORYLESS_NODES, it will pick the nearest node for the cpus on
> memoryless node.

I see a couple of architectures that enable HAVE_MEMORYLESS_NODES, but I
don't see any code for this in their arch-specific NUMA code.
Does that mean the core code takes care of it?

>
> For example, in include/linux/topology.h
> #ifdef CONFIG_HAVE_MEMORYLESS_NODES
> ...
> static inline int cpu_to_mem(int cpu)
> {
> 	return per_cpu(_numa_mem_, cpu);
> }
> ...
> #else
> ...
> static inline int cpu_to_mem(int cpu)
> {
> 	return cpu_to_node(cpu);
> }
> ...
> #endif
>
>> I think, this use case should be supported with present code.
>>
>
> So, we can not simply classify device0 to node0 or node1, but we can
> define a node2 which distances to node0 and node1 are the same.
>
> Signed-off-by: Zhen Lei 
> ---
>  arch/arm64/Kconfig  |  4 
>  arch/arm64/kernel/smp.c |  1 +
>  arch/arm64/mm/numa.c| 43 +--
>  3 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 05c1bf1..5904a62 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> +config HAVE_MEMORYLESS_NODES
> +   def_bool y
> +   depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index d099306..9e15297 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
> }
>
> bootcpu_valid = true;
> +   early_map_cpu_to_node(0, of_node_to_nid(dn));
>
> /*
>  * cpu_logical_map has already been
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index df5c842..d73b0a0 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
> nid = 0;
>
> cpu_to_node_map[cpu] = nid;
> +
> +   /*
> +* We should set the numa node of cpu0 as soon as possible, because it
> +* has already been set up online before. cpu_to_node(0) will soon be
> +* called.
> +*/
> +   if (!cpu)
> +   set_cpu_numa_node(cpu, nid);
>  }
>
>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
> @@ -215,6 +223,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
> return ret;
>  }
>
> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
> +{
> +   int i, best_nid, distance;
> +   u64 pa;
> +   DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> +   bitmap_zero(nodes_map, MAX_NUMNODES);
> +   bitmap_set(nodes_map, nid, 1);
> +
> +find_nearest_node:
> +   best_nid = NUMA_NO_NODE;
> +   distance = INT_MAX;
> +
> +   for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> +   if (numa_distance[nid][i] < distance) {
> +   best_nid = i;
> +   distance = numa_distance[nid][i];
> +   }
> +
> +   pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
> +   if (!pa) {
> +   BUG_ON(best_nid == NUMA_NO_NODE);
> +   bitmap_set(nodes_map, best_nid, 1);
> +   goto find_nearest_node;
> +   }
> +
>>
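The retry loop in alloc_node_data_from_nearest_node() is a nearest-first search: exclude nodes already tried, pick the remaining node with the smallest distance from nid, attempt the allocation there, and mark the node as tried on failure. The userspace model below sketches that search for the three-node example from earlier in the thread (node2 memoryless, equidistant from node0 and node1). The SLIT-style distances and the has_memory array are assumptions for illustration, and, unlike the patch (which falls through to an allocation with NUMA_NO_NODE and hits BUG_ON() if that fails), the model returns -1 once every node has been tried.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_NR_NODES 3

/* Assumed distances: 10 local, 20 remote. */
static const int model_distance[MODEL_NR_NODES][MODEL_NR_NODES] = {
	{ 10, 20, 20 },
	{ 20, 10, 20 },
	{ 20, 20, 10 },
};

/*
 * Returns the node whose memory satisfies the allocation for nid, or -1.
 * has_memory[i] stands in for "memblock_alloc_nid() succeeds on node i".
 * As in the patch, nid itself is excluded up front.
 */
int model_alloc_from_nearest_node(int nid, const bool has_memory[MODEL_NR_NODES])
{
	bool tried[MODEL_NR_NODES] = { false };

	tried[nid] = true;

	for (;;) {
		int best = -1, best_dist = INT_MAX;

		/* Strict '<' keeps the lowest-numbered node on ties. */
		for (int i = 0; i < MODEL_NR_NODES; i++) {
			if (!tried[i] && model_distance[nid][i] < best_dist) {
				best = i;
				best_dist = model_distance[nid][i];
			}
		}
		if (best < 0)
			return -1;	/* every node has been tried */
		if (has_memory[best])
			return best;	/* allocation succeeded */
		tried[best] = true;	/* failed; try the next nearest */
	}
}
```

For node2, which is equally far from node0 and node1, the tie goes to node0; node1 is used only if node0 cannot satisfy the allocation.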

Re: [PATCH v10 1/7] regulator: fixed: add support for ACPI interface

2016-06-07 Thread Greg Kroah-Hartman

On Thu, Jun 02, 2016 at 09:37:23AM +0800, Lu Baolu wrote:
> Add support to retrieve fixed voltage configure information through
> ACPI interface. This is needed for Intel Bay Trail devices, where a
> GPIO is used to control the USB vbus.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/regulator/fixed.c | 46 ++
>  1 file changed, 46 insertions(+)

Can't do anything with this until I get an ack from the "owners" of this
file.

And what happened to the acks from other Intel developers for this whole
patch series, I don't see that here :(

greg k-h

[PATCH v3 2/2] ARM: at91/dt: sama5d2: Use new compatible for ohci node

2016-06-07 Thread Wenyou Yang

Use the "atmel,sama5d2-ohci" compatible so that the ports can be
suspended during sleep to reduce power consumption.

Signed-off-by: Wenyou Yang 
---

Changes in v3: None
Changes in v2:
 - Use the new compatible for ohci-node.

 arch/arm/boot/dts/sama5d2.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sama5d2.dtsi b/arch/arm/boot/dts/sama5d2.dtsi
index 78996bd..03d6724 100644
--- a/arch/arm/boot/dts/sama5d2.dtsi
+++ b/arch/arm/boot/dts/sama5d2.dtsi
@@ -232,7 +232,7 @@
};
 
usb1: ohci@0040 {
-   compatible = "atmel,at91rm9200-ohci", "usb-ohci";
+   compatible = "atmel,sama5d2-ohci", "usb-ohci";
reg = <0x0040 0x10>;
interrupts = <41 IRQ_TYPE_LEVEL_HIGH 2>;
clocks = <&uhphs_clk>, <&uhphs_clk>, <&uhpck>;
-- 
2.7.4

[PATCH v3 0/2] ARM: ohci-at91: Add support to forcibly suspend ports while sleep

2016-06-07 Thread Wenyou Yang

To reduce power consumption, add a new compatible that supports forcibly
suspending USB PORTA/B/C via the OHCI Interrupt Configuration Register in
the SFRs.

Changes in v3:
 - Make the compatible description more precise.

Changes in v2:
 - Add a compatible to support forcibly suspending the ports.
 - Add soc/at91/at91_sfr.h to accommodate the defines.
 - Add error checking for .sfr_regmap.
 - Remove unnecessary regmap_read() statement.
 - Use the new compatible for ohci-node.

Wenyou Yang (2):
  usb: ohci-at91: Forcibly suspend ports while USB suspend
  ARM: at91/dt: sama5d2: Use new compatible for ohci node

 .../devicetree/bindings/usb/atmel-usb.txt  |  6 +-
 arch/arm/boot/dts/sama5d2.dtsi |  2 +-
 drivers/usb/host/ohci-at91.c   | 80 +-
 include/soc/at91/at91_sfr.h| 29 
 4 files changed, 113 insertions(+), 4 deletions(-)
 create mode 100644 include/soc/at91/at91_sfr.h

-- 
2.7.4

[PATCH v3 1/2] usb: ohci-at91: Forcibly suspend ports while USB suspend

2016-06-07 Thread Wenyou Yang

To reduce power consumption, as a workaround, forcibly suspend USB
PORTA/B/C by setting the SUSPEND_A/B/C bits of the OHCI Interrupt
Configuration Register in the SFRs during OHCI USB suspend.

The suspend operation must be done before the USB clock is disabled, and
the resume operation after the USB clock is enabled.

Signed-off-by: Wenyou Yang 
---

Changes in v3:
 - Make the compatible description more precise.

Changes in v2:
 - Add a compatible to support forcibly suspending the ports.
 - Add soc/at91/at91_sfr.h to accommodate the defines.
 - Add error checking for .sfr_regmap.
 - Remove unnecessary regmap_read() statement.

 .../devicetree/bindings/usb/atmel-usb.txt  |  6 +-
 drivers/usb/host/ohci-at91.c   | 80 +-
 include/soc/at91/at91_sfr.h| 29 
 3 files changed, 112 insertions(+), 3 deletions(-)
 create mode 100644 include/soc/at91/at91_sfr.h

diff --git a/Documentation/devicetree/bindings/usb/atmel-usb.txt b/Documentation/devicetree/bindings/usb/atmel-usb.txt
index 5883b73..888deaa 100644
--- a/Documentation/devicetree/bindings/usb/atmel-usb.txt
+++ b/Documentation/devicetree/bindings/usb/atmel-usb.txt
@@ -3,8 +3,10 @@ Atmel SOC USB controllers
 OHCI
 
 Required properties:
- - compatible: Should be "atmel,at91rm9200-ohci" for USB controllers
-   used in host mode.
+ - compatible: Should be one of the following
+  "atmel,at91rm9200-ohci" for USB controllers used in host mode.
+  "atmel,sama5d2-ohci" for USB controllers used in host mode
+  on SAMA5D2 which can force to suspend.
  - reg: Address and length of the register set for the device
  - interrupts: Should contain ehci interrupt
  - clocks: Should reference the peripheral, host and system clocks
diff --git a/drivers/usb/host/ohci-at91.c b/drivers/usb/host/ohci-at91.c
index d177372..54e8feb 100644
--- a/drivers/usb/host/ohci-at91.c
+++ b/drivers/usb/host/ohci-at91.c
@@ -21,8 +21,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
+#include 
 
 #include "ohci.h"
 
@@ -45,12 +48,18 @@ struct at91_usbh_data {
u8 overcurrent_changed[AT91_MAX_USBH_PORTS];
 };
 
+struct ohci_at91_caps {
+   bool suspend_ctrl;
+};
+
 struct ohci_at91_priv {
struct clk *iclk;
struct clk *fclk;
struct clk *hclk;
bool clocked;
bool wakeup;/* Saved wake-up state for resume */
+   const struct ohci_at91_caps *caps;
+   struct regmap *sfr_regmap;
 };
 /* interface and function clocks; sometimes also an AHB clock */
 
@@ -132,6 +141,17 @@ static void at91_stop_hc(struct platform_device *pdev)
 
 /*-*/
 
+struct regmap *at91_dt_syscon_sfr(void)
+{
+   struct regmap *regmap;
+
+   regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
+   if (IS_ERR(regmap))
+   regmap = NULL;
+
+   return regmap;
+}
+
 static void usb_hcd_at91_remove (struct usb_hcd *, struct platform_device *);
 
 /* configure so an HC device and id are always provided */
@@ -197,6 +217,17 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver,
goto err;
}
 
+   ohci_at91->caps = (const struct ohci_at91_caps *)
+ of_device_get_match_data(&pdev->dev);
+   if (!ohci_at91->caps)
+   return -ENODEV;
+
+   if (ohci_at91->caps->suspend_ctrl) {
+   ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
+   if (!ohci_at91->sfr_regmap)
+   dev_warn(dev, "failed to find sfr node\n");
+   }
+
board = hcd->self.controller->platform_data;
ohci = hcd_to_ohci(hcd);
ohci->num_ports = board->ports;
@@ -440,8 +471,17 @@ static irqreturn_t ohci_hcd_at91_overcurrent_irq(int irq, void *data)
return IRQ_HANDLED;
 }
 
+static const struct ohci_at91_caps at91rm9200_caps = {
+   .suspend_ctrl = false,
+};
+
+static const struct ohci_at91_caps sama5d2_caps = {
+   .suspend_ctrl = true,
+};
+
 static const struct of_device_id at91_ohci_dt_ids[] = {
-   { .compatible = "atmel,at91rm9200-ohci" },
+   { .compatible = "atmel,at91rm9200-ohci", .data = &at91rm9200_caps },
+   { .compatible = "atmel,sama5d2-ohci", .data = &sama5d2_caps },
{ /* sentinel */ }
 };
 
@@ -581,6 +621,38 @@ static int ohci_hcd_at91_drv_remove(struct platform_device *pdev)
return 0;
 }
 
+static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable)
+{
+   u32 regval;
+   int ret;
+
+   if (!regmap)
+   return -EINVAL;
+
+   ret = regmap_read(regmap, SFR_OHCIICR, &regval);
+   if (ret)
+   return ret;
+
+   if (enable)
+   regval &= ~SFR_OHCIICR_USB_SUSPEND;
+   else
+   regval |= SFR_OHCIICR_USB_SUSPEND;
+
+   regmap_write(regmap, SFR_OHCIICR, regval);
+
+   return 0;
+}
+
+
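ohci_at91_port_ctrl() performs a read-modify-write of the OHCIICR register. Clearing the SUSPEND_A/B/C bits lets the ports run, setting them forces suspend, and all other bits are left untouched. The sketch below isolates just that bit manipulation. It assumes the suspend bits sit at positions 8 to 10; the real values are in include/soc/at91/at91_sfr.h, which this message does not show. `ohciicr_update()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of SUSPEND_A/B/C in OHCIICR (see at91_sfr.h). */
#define OHCIICR_SUSPEND_A	(1u << 8)
#define OHCIICR_SUSPEND_B	(1u << 9)
#define OHCIICR_SUSPEND_C	(1u << 10)
#define OHCIICR_USB_SUSPEND	(OHCIICR_SUSPEND_A | OHCIICR_SUSPEND_B | \
				 OHCIICR_SUSPEND_C)

/*
 * New OHCIICR value: @enable clears the suspend bits (ports running),
 * !@enable sets them (ports forcibly suspended). Other bits are kept.
 */
uint32_t ohciicr_update(uint32_t regval, bool enable)
{
	if (enable)
		return regval & ~OHCIICR_USB_SUSPEND;
	return regval | OHCIICR_USB_SUSPEND;
}
```

In the driver, the same update could be written as regmap_update_bits(regmap, SFR_OHCIICR, SFR_OHCIICR_USB_SUSPEND, enable ? 0 : SFR_OHCIICR_USB_SUSPEND). That form would also propagate the write's return value, which ohci_at91_port_ctrl() currently ignores.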

[PATCH] mmc: dw_mmc: remove UBSAN warning in dw_mci_setup_bus()

2016-06-07 Thread Seung-Woo Kim

This patch removes the following UBSAN warnings in dw_mci_setup_bus().
They are caused by shifting a 32-bit variable by more than 31 bits, so
this patch only performs the shift when the shift count is less than 32.

  UBSAN: Undefined behaviour in drivers/mmc/host/dw_mmc.c:1102:14
  shift exponent 250 is too large for 32-bit type 'unsigned int'
  Call trace:
  [] dump_backtrace+0x0/0x380
  [] show_stack+0x14/0x20
  [] dump_stack+0xe0/0x120
  [] ubsan_epilogue+0x18/0x68
  [] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
  [] dw_mci_setup_bus+0x3a0/0x438
  [...]

  UBSAN: Undefined behaviour in drivers/mmc/host/dw_mmc.c:1132:27
  shift exponent 250 is too large for 32-bit type 'unsigned int'
  Call trace:
  [] dump_backtrace+0x0/0x380
  [] show_stack+0x14/0x20
  [] dump_stack+0xe0/0x120
  [] ubsan_epilogue+0x18/0x68
  [] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
  [] dw_mci_setup_bus+0x384/0x438
  [] dw_mci_set_ios+0x184/0x798
  [] mmc_power_up+0x11c/0x260
  [] mmc_start_host+0x88/0x100
  [] mmc_add_host+0x6c/0x128
  [] dw_mci_probe+0x1088/0x1750
  [] dw_mci_pltfm_register+0x108/0x178
  [] dw_mci_exynos_probe+0x4c/0x88
  [] platform_drv_probe+0x78/0x180
  [] driver_probe_device+0x144/0x460
  [] __driver_attach+0xf4/0x140
  [] bus_for_each_dev+0xf0/0x160
  [] driver_attach+0x34/0x58
  [] bus_add_driver+0x2c0/0x398
  [] driver_register+0xbc/0x1e0
  [] __platform_driver_register+0x84/0xa8
  [] dw_mci_exynos_pltfm_driver_init+0x18/0x20
  [] do_one_initcall+0xa0/0x2c8
  [] kernel_init_freeable+0x52c/0x5dc
  [] kernel_init+0x1c/0xf8
  [] ret_from_fork+0x10/0x40

Signed-off-by: Seung-Woo Kim 
---
 drivers/mmc/host/dw_mmc.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 2cc6123..dff045e 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1099,7 +1099,8 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit)
 
div = (host->bus_hz != clock) ? DIV_ROUND_UP(div, 2) : 0;
 
-   if ((clock << div) != slot->__clk_old || force_clkinit)
+   if (((div < 32) ? (clock << div) : 0) != slot->__clk_old ||
+   force_clkinit)
dev_info(&slot->mmc->class_dev,
 "Bus speed (slot %d) = %dHz (slot req %dHz, actual %dHZ div = %d)\n",
 slot->id, host->bus_hz, clock,
@@ -1129,7 +1130,7 @@ static void dw_mci_setup_bus(struct dw_mci_slot *slot, bool force_clkinit)
mci_send_cmd(slot, sdmmc_cmd_bits, 0);
 
/* keep the clock with reflecting clock dividor */
-   slot->__clk_old = clock << div;
+   slot->__clk_old = (div < 32) ? (clock << div) : 0;
}
 
host->current_speed = clock;
-- 
1.7.4.1
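The fix relies on a rule of C: shifting an N-bit unsigned value by N or more bits is undefined behaviour, and UBSAN reported a shift exponent of 250 here. The sketch below factors the same guard into a helper. `clk_shl()` is a hypothetical name, not part of dw_mmc, and it keeps the patch's choice of 0 as the result for out-of-range shifts.

```c
#include <stdint.h>

/*
 * Shift @clock left by @div bits. Shifting a 32-bit value by 32 or
 * more bits is undefined behaviour in C, so, like the patch, treat any
 * out-of-range shift as producing 0.
 */
uint32_t clk_shl(uint32_t clock, unsigned int div)
{
	return (div < 32) ? (clock << div) : 0;
}
```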

[PATCH 1/1] perf/x86/intel: Add extended event constraints for Knights Landing

2016-06-07 Thread Lukasz Odzioba

On the Knights Landing processor, OFFCORE_RESPONSE events must be filtered
by their config1 parameter so that each event ends up on an appropriate
PMC, as the specification requires.

On Knights Landing:
MSR_OFFCORE_RSP_1 bits 8, 11, 14 can be used only on PMC1
MSR_OFFCORE_RSP_0 bit 38 can be used only on PMC0

This patch introduces INTEL_EEVENT_CONSTRAINT, whose third parameter
specifies the extended config (config1) bits that are allowed only on the
given PMCs.

This patch depends on "Change offcore response masks for Knights Landing".

Reported-by: Andi Kleen 
Acked-by: Andi Kleen 
Signed-off-by: Lukasz Odzioba 
---
 arch/x86/events/core.c |  3 ++-
 arch/x86/events/intel/core.c   | 17 ++---
 arch/x86/events/intel/uncore.c |  2 +-
 arch/x86/events/perf_event.h   | 41 -
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 33787ee..a4be71c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -122,6 +122,7 @@ static int x86_pmu_extra_regs(u64 config, struct perf_event *event)
continue;
if (event->attr.config1 & ~er->valid_mask)
return -EINVAL;
+
/* Check if the extra msrs can be safely accessed*/
if (!er->extra_msr_access)
return -ENXIO;
@@ -1736,7 +1737,7 @@ static int __init init_hw_perf_events(void)
 
unconstrained = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
-  0, x86_pmu.num_counters, 0, 0);
+  0, x86_pmu.num_counters, 0, 0, 0);
 
x86_pmu_format_group.attrs = x86_pmu.format_attrs;
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7c66695..794f5c8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -177,6 +177,17 @@ static struct event_constraint intel_slm_event_constraints[] __read_mostly =
EVENT_CONSTRAINT_END
 };
 
+static struct event_constraint intel_knl_event_constraints[] __read_mostly = {
+   FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+   FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+   FIXED_EVENT_CONSTRAINT(0x0300, 2), /* pseudo CPU_CLK_UNHALTED.REF */
+   /* MSR_OFFCORE_RSP_1 bits 8, 11, 14 can be used only on PMC1 */
+   INTEL_EEVENT_CONSTRAINT(0x02b7, 2, 0x4900),
+   /* MSR_OFFCORE_RSP_0 bit 38 can be used only on PMC0 */
+   INTEL_EEVENT_CONSTRAINT(0x01b7, 1, 1ull<<38),
+   EVENT_CONSTRAINT_END
+};
+
 struct event_constraint intel_skl_event_constraints[] = {
FIXED_EVENT_CONSTRAINT(0x00c0, 0),  /* INST_RETIRED.ANY */
FIXED_EVENT_CONSTRAINT(0x003c, 1),  /* CPU_CLK_UNHALTED.CORE */
@@ -2284,16 +2295,16 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
  struct perf_event *event)
 {
struct event_constraint *c;
-
if (x86_pmu.event_constraints) {
for_each_event_constraint(c, x86_pmu.event_constraints) {
if ((event->hw.config & c->cmask) == c->code) {
+   if (c->emask && !(c->emask & event->attr.config1))
+   continue;
event->hw.flags |= c->flags;
return c;
}
}
}
-
return &unconstrained;
 }
 
@@ -3784,7 +3795,7 @@ __init int intel_pmu_init(void)
   knl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
intel_pmu_lbr_init_knl();
 
-   x86_pmu.event_constraints = intel_slm_event_constraints;
+   x86_pmu.event_constraints = intel_knl_event_constraints;
x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
x86_pmu.extra_regs = intel_knl_extra_regs;
 
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index fce7406..fc5b866 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -839,7 +839,7 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
type->pmus = pmus;
type->unconstrainted = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-   0, type->num_counters, 0, 0);
+   0, type->num_counters, 0, 0, 0);
 
if (type->event_descs) {
for (i = 0; type->event_descs[i].attr.attr.name; i++);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8bd764d..47241ed5 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -52,6 +52,7 @@ struct event_constraint {
int weight;
int overlap;
int flags;
+   u64 emask;
 };
 /*
  *
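With the new check in x86_get_event_constraints(), an entry matches only when the event's config1 has at least one of the entry's emask bits set. Otherwise the lookup continues and may fall through to the unconstrained case. The sketch below models that matching with a hypothetical cut-down `struct constraint` and the two KNL entries from the patch. The INTEL_EEVENT_CONSTRAINT definition is cut off in this message, so the 0xffff (event + umask) cmask is an assumption. The counter masks 0x2 (PMC1) and 0x1 (PMC0) and the emask values come from the patch; 0x4900 is exactly bits 8, 11 and 14.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down version of struct event_constraint. */
struct constraint {
	uint64_t code;		/* event code + umask to match */
	uint64_t cmask;		/* config bits that take part in the match */
	uint64_t idxmsk;	/* counters the event may be scheduled on */
	uint64_t emask;		/* config1 bits that trigger the entry (0 = always) */
};

/* The two KNL OFFCORE_RESPONSE entries; cmask 0xffff is assumed. */
static const struct constraint knl_constraints[] = {
	{ 0x02b7, 0xffff, 0x2, 0x4900 },	/* RSP_1 bits 8, 11, 14 -> PMC1 */
	{ 0x01b7, 0xffff, 0x1, 1ULL << 38 },	/* RSP_0 bit 38 -> PMC0 */
};

/* Mirrors the code match plus the new emask test. */
bool constraint_applies(const struct constraint *c, uint64_t config,
			uint64_t config1)
{
	if ((config & c->cmask) != c->code)
		return false;
	if (c->emask && !(c->emask & config1))
		return false;	/* no restricted config1 bit set: keep looking */
	return true;
}

/* Counter mask for an event, or @unconstrained if no entry applies. */
uint64_t allowed_counters(uint64_t config, uint64_t config1,
			  uint64_t unconstrained)
{
	size_t n = sizeof(knl_constraints) / sizeof(knl_constraints[0]);

	for (size_t i = 0; i < n; i++)
		if (constraint_applies(&knl_constraints[i], config, config1))
			return knl_constraints[i].idxmsk;
	return unconstrained;
}
```

An OFFCORE_RESPONSE_1 event (0x02b7) with config1 bit 11 set is therefore limited to PMC1. The same event without any of bits 8, 11 or 14 gets the unconstrained mask.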

Re: [PATCH 06/10] drm/amdgpu: use drm_crtc_vblank_{on,off}()

2016-06-07 Thread Michel Dänzer

On 07.06.2016 23:07, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Replace the legacy drm_vblank_{on,off}() with the new helper functions.
> 
> Signed-off-by: Gustavo Padovan 

Patches 6 & 8-10 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: Files leak from nfsd in 4.7.1-rc1 (and more?)

2016-06-07 Thread Oleg Drokin


On Jun 7, 2016, at 10:22 PM, Oleg Drokin wrote:

> 
> On Jun 7, 2016, at 8:03 PM, Jeff Layton wrote:
> 
 That said, this code is quite subtle. I'd need to look over it in more
 detail before I offer up any fixes. I'd also appreciate it if anyone
 else wants to sanity check my analysis there.
 
>> Yeah, I think you're right. It's fine since r/w opens have a distinct
>> slot, even though the refcounting just tracks the number of read and
>> write references. So yeah, the leak probably is in an error path
>> someplace, or maybe a race someplace.
> 
> So I noticed that set_access is always called locked, but clear_access is not,
> this does not sound right.
> 
> So I placed this strategic WARN_ON:
> @@ -3991,6 +4030,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp,
>goto out_put_access;
>spin_lock(&fp->fi_lock);
>if (!fp->fi_fds[oflag]) {
> +WARN_ON(!test_access(open->op_share_access, stp));
>fp->fi_fds[oflag] = filp;
>filp = NULL;
> 
> This is right in the place where nfsd set the access flag already, discovered
> that the file is not opened and went on to open it, yet some parallel thread
> came in and cleared the flag by the time we got the file opened.
> It did trigger (but there are 30 minutes left till test finish, so I don't
> know yet if this will correspond to the problem at hand yet, so below is 
> speculation).

Duh, I looked for a warning but did not cross-reference it, and it turns out
this one has not hit yet.

Though apparently I am hitting some of the "impossible" warnings, so you might
want to look into that anyway.

status = nfsd4_process_open2(rqstp, resfh, open);
WARN(status && open->op_created,
 "nfsd4_process_open2 failed to open newly-created file! status=%u\n",
 be32_to_cpu(status));

and

filp = find_readable_file(fp);
if (!filp) {
/* We should always have a readable file here */
WARN_ON_ONCE(1);
locks_free_lock(fl);
return -EBADF;
}

Re: [PATCH V3 8/9] cpufreq: Keep policy->freq_table sorted in ascending order

2016-06-07 Thread Viresh Kumar

On 08-06-16, 02:38, Rafael J. Wysocki wrote:
> On Tuesday, June 07, 2016 09:58:07 AM Viresh Kumar wrote:
> > On 06-06-16, 23:56, Rafael J. Wysocki wrote:
> > > Since you are adding new code, you can write it so it doesn't do
> > > unnecessary checks from the start.
> > 
> > Hmm, I will do all that in this series only now.
> > 
> > > While at it, the "if ((freq < policy->min) || (freq > policy->max))"
> > > checks in cpufreq_find_index_l() and cpufreq_find_index_h() don't look
> > > good to me, because they very well may cause those function to return
> > > -EINVAL even when there's a valid table and that may cause
> > > acpi_cpufreq_fast_switch() to do bad things.
> > 
> > Hmm. So, the checks are for sure required here, otherwise we may end up
> > returning a frequency which we aren't allowed to. Also note that 'freq' here
> > isn't the target-freq, but the entry in the freq-table.
> > 
> > This routine should be returning a valid freq within the ranges specified by
> > policy->min/max.
> 
> Which in principle may not be possible if the range doesn't include any
> frequency in the table, eg. min == max and between the table entries.

By within ranges I meant, policy->min <= freq <= policy->max, and that's how all
our checks are. So even if the table will have a single valid frequency, we will
return that only.

> However, the CPU has to run at *some* frequency, even if there's none in the
> min/max range.

I completely agree. But the error fires only if there is no frequency within
the range that we can switch to, and in that case the bug is somewhere else.

> And if we are sure that there is at least one valid frequency between min
> and max, please note that target_freq has already been clamped between them,

Yeah, it's already clamped by the freq-change helpers in the cpufreq core, but
other callers may not be doing it properly.

> > Also note that these routines shall *never* return -EINVAL, otherwise it is
> > mostly a bug we are hitting.
> 
> So make them explicitly return a valid frequency every time.

I thought about returning index 0 on such errors; would that be fine? Anyway,
the new patches add a WARN() for such cases.

> > We have enough checks in place to make sure that there is at least one valid
> > entry in the freq-table which is >= policy->min and <= policy->max.
> 
> That assuming that the driver will always do the right thing in its ->verify
> callback.

Yeah.

-- 
viresh
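Rafael's request that the lookup always return some valid entry can be sketched for an ascending table in three steps. First, clamp the target into [min, max]. Second, return the lowest in-range entry at or above the target, or failing that the highest in-range entry below it. Third, only if no entry lies in [min, max] at all, fall back to index 0, as Viresh suggests. `find_index_l()` below is a hypothetical simplification, not the cpufreq core's cpufreq_find_index_l().

```c
#include <stddef.h>

/*
 * Pick an index in an ascending frequency table for @target, honouring
 * [@min, @max]. Never fails: if no entry lies within the limits, fall
 * back to index 0 (the lowest frequency). Requires n > 0.
 */
size_t find_index_l(const unsigned int *table, size_t n,
		    unsigned int target, unsigned int min, unsigned int max)
{
	size_t best = n;	/* n means "no in-range entry seen yet" */

	if (target < min)
		target = min;
	if (target > max)
		target = max;

	for (size_t i = 0; i < n; i++) {
		if (table[i] < min || table[i] > max)
			continue;
		if (table[i] >= target)
			return i;	/* lowest in-range entry >= target */
		best = i;		/* highest in-range entry below target */
	}

	return best != n ? best : 0;
}
```

Take the table {800, 1200, 1600, 2000} with min == max == 1000, which is Rafael's example of a range falling between two entries. In that case the sketch returns index 0 instead of -EINVAL.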

[PATCH v2] udp reuseport: fix packet of same flow hashed to different socket

2016-06-07 Thread Su Xuemin

From: "Su, Xuemin" 

There is a corner case in which udp packets belonging to the same
flow are hashed to different sockets when hslot->count changes from 10
to 11:

1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
and always passes 'daddr' to udp_ehashfn().

2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
INADDR_ANY instead of some specific addr.

This means that when hslot->count changes from 10 to 11, the hash calculated
by udp_ehashfn() also changes, and udp packets belonging to the same flow
are hashed to a different socket.

This is easily reproduced:
1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
2) From the same host, send udp packets to 127.0.0.1:40000 and record the
index of the socket that receives them.
3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The port 44096
is 40000 + UDP_HASH_SIZE (4096), so the new socket is put into the
same hslot as the aforementioned 10 sockets, and hslot->count
changes from 10 to 11.
4) From the same host, send udp packets to 127.0.0.1:40000 again. The
index of the socket that receives them now differs from the one
recorded in step 2.
This should not happen, as the socket bound to 0.0.0.0:44096 should not
change the behavior of the sockets bound to 0.0.0.0:40000.

The fix: when searching udp_table->hash, if the socket supports
reuseport, pass inet_sk(sk)->inet_rcv_saddr to udp_ehashfn() instead of
daddr. When the sockets are bound to a specific address,
inet_sk(sk)->inet_rcv_saddr should equal daddr, and when the sockets
are bound to INADDR_ANY, INADDR_ANY is passed to udp_ehashfn(), just as
is done when searching udp_table->hash2.

IPv6 has the same problem, and this patch fixes it as well.

Signed-off-by: Su, Xuemin 
---
Patch v1 did not fix the IPv6 code. Thanks to Eric Dumazet for
pointing that out.
I used this tree to generate the patch; I hope it's correct:
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

 net/ipv4/udp.c | 4 +++-
 net/ipv6/udp.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d56c055..57c38f6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -577,7 +577,9 @@ begin:
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
-   hash = udp_ehashfn(net, daddr, hnum,
+   hash = udp_ehashfn(net,
+  inet_sk(sk)->inet_rcv_saddr,
+  hnum,
   saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2da1896..41ca493 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -290,7 +290,9 @@ begin:
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
-   hash = udp6_ehashfn(net, daddr, hnum,
+   hash = udp6_ehashfn(net,
+   &sk->sk_v6_rcv_saddr,
+   hnum,
saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));
-- 
1.8.3.1
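The effect of the fix can be modelled by looking only at which local address is fed to the reuseport hash. Before the fix, the primary-table lookup (hslot->count <= 10) used the packet's daddr. The hash2 lookup effectively used the socket's bound address. Wildcard-bound sockets therefore saw two different hash inputs for one flow. After the fix, both paths use inet_rcv_saddr. The sketch below follows that simplified model. Both helpers are hypothetical: `reuseport_hash_addr()` encodes the commit message's description rather than the kernel code, and `pick_socket()` assumes the hash is scaled to a socket index the way reciprocal_scale() does.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_ANY 0x00000000u	/* INADDR_ANY, host byte order */

/*
 * Local address used as the hash input for a reuseport group.
 * @via_hash2: lookup went through udp_table->hash2 (hslot->count > 10).
 * @patched:   behaviour after this fix.
 * Simplified model of the commit message, not the kernel code.
 */
uint32_t reuseport_hash_addr(bool via_hash2, bool patched,
			     uint32_t daddr, uint32_t rcv_saddr)
{
	if (patched || via_hash2)
		return rcv_saddr;	/* inet_sk(sk)->inet_rcv_saddr */
	return daddr;			/* old primary-table path */
}

/* Scale a 32-bit hash to a socket index in [0, n), reciprocal_scale()-style. */
unsigned int pick_socket(uint32_t hash, unsigned int n)
{
	return (unsigned int)(((uint64_t)hash * n) >> 32);
}
```

Consider a socket bound to INADDR_ANY receiving a packet sent to 127.0.0.1. The unpatched model returns 127.0.0.1 on one path and 0.0.0.0 on the other, so the same flow can map to a different socket index once the slot grows past 10 entries.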

Re: [alsa-devel] [PATCH v2 6/9] ASoC: mediatek: add mt2701 platform driver implementation.

2016-06-07 Thread Garlic Tseng

On Tue, 2016-06-07 at 17:31 +0100, Mark Brown wrote:
> On Fri, Jun 03, 2016 at 12:56:21PM +0800, Garlic Tseng wrote:
> 
> > +   if (val < 0 || val > MT2701_I2S_NUM) {
> > +   dev_err(afe->dev, "%s, num not available, num %d, val %d\n",
> > +   __func__, num, val);
> > +   return -1;
> 
> Real error codes please.

OK I'll fix it.

> 
> > +static const struct snd_kcontrol_new mt2701_afe_multi_ch_out_asrc3[] = {
> > +   SOC_DAPM_SINGLE_AUTODISABLE("Multi ch asrc out3", PWR2_TOP_CON, 7, 1,
> > +   1),
> > +};
> 
> On/off controls should end in Switch.

Do you mean that the name should end in Switch? Something like "Multi
ch asrc out3 Switch" (or maybe a shorter one)?

I'll fix it (if I haven't misunderstood the comment).

Thanks!


Re: [PATCH] dmaengine: xilinx_vdma: Use dma_pool_zalloc

2016-06-07 Thread Vinod Koul

On Wed, Jun 08, 2016 at 12:48:38AM +0530, Amitoj Kaur Chawla wrote:
> Dma_pool_zalloc combines dma_pool_alloc and memset 0.
> 
> The Coccinelle semantic patch used to make this change is as follows:
> @@
> type T;
> T *d;
> expression e;
> statement S;
> @@
> 
> d =
> -dma_pool_alloc
> +dma_pool_zalloc
>  (...);
> if (!d) S
> -   memset(d, 0, sizeof(T));

Thanks for your patch, but I have already applied a similar patch fixing
this.

-- 
~Vinod

[PATCH 3/5] cputime: allow irq time accounting to be selected as an option

2016-06-07 Thread riel

From: Rik van Riel 

Allow CONFIG_IRQ_TIME_ACCOUNTING to be selected as an option, on top
of CONFIG_VIRT_CPU_ACCOUNTING_GEN (and potentially others?).

This allows for the irq time accounting code to be used with nohz_idle
CPUs, which is how several distributions ship their kernels. Using the
same code for several timer modes also allows us to drop duplicate code.

Signed-off-by: Rik van Riel 
---
 init/Kconfig | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 0dfd09d54c65..4c7ee4f136cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -375,9 +375,11 @@ config VIRT_CPU_ACCOUNTING_GEN
 
  If unsure, say N.
 
+endchoice
+
 config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
-   depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
+   depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
help
  Select this option to enable fine granularity task irq time
  accounting. This is done by reading a timestamp on each
@@ -386,8 +388,6 @@ config IRQ_TIME_ACCOUNTING
 
  If in doubt, say N here.
 
-endchoice
-
 config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
depends on MULTIUSER
-- 
2.5.5

[PATCH 5/5] irqtime: drop local_irq_save/restore from irqtime_account_irq

2016-06-07 Thread riel

From: Rik van Riel 

Drop local_irq_save/restore from irqtime_account_irq.
Instead, have softirq and hardirq track their time spent
independently, with the softirq code subtracting hardirq
time that happened during the duration of the softirq run.

The softirq code can be interrupted by hardirq code at
any point in time, but it can check whether it got a
consistent snapshot of the timekeeping variables it wants,
and loop around in the unlikely case that it did not.

Signed-off-by: Rik van Riel 
---
 kernel/sched/cputime.c | 54 --
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e009077aeab6..466aff107f73 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -26,7 +26,9 @@
 DEFINE_PER_CPU(u64, cpu_hardirq_time);
 DEFINE_PER_CPU(u64, cpu_softirq_time);
 
-static DEFINE_PER_CPU(u64, irq_start_time);
+static DEFINE_PER_CPU(u64, hardirq_start_time);
+static DEFINE_PER_CPU(u64, softirq_start_time);
+static DEFINE_PER_CPU(u64, prev_hardirq_time);
 static int sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
@@ -53,36 +55,66 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
  * softirq -> hardirq, hardirq -> softirq
  *
  * When exiting hardirq or softirq time, account the elapsed time.
+ *
+ * When exiting softirq time, subtract the amount of hardirq time that
+ * interrupted this softirq run, to avoid double accounting of that time.
  */
 void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
-   unsigned long flags;
-   s64 delta;
+   u64 prev_softirq_start;
+   u64 prev_hardirq;
+   u64 hardirq_time;
+   s64 delta = 0;
int cpu;
 
if (!sched_clock_irqtime)
return;
 
-   local_irq_save(flags);
-
cpu = smp_processor_id();
-   delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-   __this_cpu_add(irq_start_time, delta);
+   prev_hardirq = __this_cpu_read(prev_hardirq_time);
+   prev_softirq_start = __this_cpu_read(softirq_start_time);
+   /*
+* Softirq context may get interrupted by hardirq context,
+* on the same CPU. At softirq 
+*/
+   if (irqtype == HARDIRQ_OFFSET) {
+   delta = sched_clock_cpu(cpu) - __this_cpu_read(hardirq_start_time);
+   __this_cpu_add(hardirq_start_time, delta);
+   } else do {
+   hardirq_time = READ_ONCE(per_cpu(cpu_hardirq_time, cpu));
+   u64 now = sched_clock_cpu(cpu);
+
+   delta = now - prev_softirq_start;
+   if (in_serving_softirq()) {
+   /*
+* Leaving softirq context. Avoid double counting by
+* subtracting hardirq time from this interval.
+*/
+   delta -= hardirq_time - prev_hardirq;
+   } else {
+   /* Entering softirq context. Note start times. */
+   __this_cpu_write(softirq_start_time, now);
+   __this_cpu_write(prev_hardirq_time, hardirq_time);
+   }
+   /*
+* If a hardirq happened during this calculation, it may not
+* have gotten a consistent snapshot. Try again.
+*/
+   } while (hardirq_time != READ_ONCE(per_cpu(cpu_hardirq_time, cpu)));
 
-   irq_time_write_begin();
/*
 * We do not account for softirq time from ksoftirqd here.
 * We want to continue accounting softirq time to ksoftirqd thread
 * in that case, so as not to confuse scheduler with a special task
 * that do not consume any time, but still wants to run.
 */
-   if (hardirq_count())
+   if (irqtype == HARDIRQ_OFFSET && hardirq_count())
__this_cpu_add(cpu_hardirq_time, delta);
-   else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+   else if (irqtype == SOFTIRQ_OFFSET && in_serving_softirq() &&
+   curr != this_cpu_ksoftirqd())
__this_cpu_add(cpu_softirq_time, delta);
 
irq_time_write_end();
-   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-- 
2.5.5
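On softirq exit, the patch charges the time elapsed since softirq entry minus whatever hardirq time accumulated on the CPU in between. The do/while loop only ensures that the hardirq total used for that subtraction was not changed by an interrupt arriving mid-calculation. The sketch below shows just that arithmetic, with plain integers standing in for sched_clock_cpu(), softirq_start_time, cpu_hardirq_time and prev_hardirq_time. `softirq_delta()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Time charged to softirq on exit: time since softirq entry minus the
 * hardirq time that accumulated on this CPU during the softirq run.
 */
int64_t softirq_delta(uint64_t now, uint64_t softirq_start,
		      uint64_t hardirq_total, uint64_t hardirq_at_start)
{
	int64_t delta = (int64_t)(now - softirq_start);

	delta -= (int64_t)(hardirq_total - hardirq_at_start);
	return delta;
}
```

For example, suppose a softirq runs from t=400 to t=1000 and the hardirq total grows from 100 to 250 in that window. The softirq is then charged 600 - 150 = 450 units, and the 150 units remain accounted to hardirq only.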

[PATCH 4/5] irqtime: add irq type parameter to irqtime_account_irq

2016-06-07 Thread riel

From: Rik van Riel 

Add an irq type parameter and documentation to irqtime_account_irq.
The parameter can be used to distinguish between transitioning from
process context to hardirq time and from process context to softirq time.

This is necessary to be able to remove the local_irq_disable from
irqtime_account_irq.

Signed-off-by: Rik van Riel 
---
 include/linux/hardirq.h | 20 ++--
 include/linux/vtime.h   | 12 ++--
 kernel/sched/cputime.c  |  9 -
 kernel/softirq.c|  6 +++---
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index dfd59d6bc6f0..1ebb31f56285 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,11 +32,11 @@ extern void rcu_nmi_exit(void);
  * always balanced, so the interrupted value of ->hardirq_context
  * will always be restored.
  */
-#define __irq_enter()  \
-   do {\
-   account_irq_enter_time(current);\
-   preempt_count_add(HARDIRQ_OFFSET);  \
-   trace_hardirq_enter();  \
+#define __irq_enter()  \
+   do {\
+   account_irq_enter_time(current, HARDIRQ_OFFSET);\
+   preempt_count_add(HARDIRQ_OFFSET);  \
+   trace_hardirq_enter();  \
} while (0)
 
 /*
@@ -47,11 +47,11 @@ extern void irq_enter(void);
 /*
  * Exit irq context without processing softirqs:
  */
-#define __irq_exit()   \
-   do {\
-   trace_hardirq_exit();   \
-   account_irq_exit_time(current); \
-   preempt_count_sub(HARDIRQ_OFFSET);  \
+#define __irq_exit()   \
+   do {\
+   trace_hardirq_exit();   \
+   account_irq_exit_time(current, HARDIRQ_OFFSET); \
+   preempt_count_sub(HARDIRQ_OFFSET);  \
} while (0)
 
 /*
diff --git a/include/linux/vtime.h b/include/linux/vtime.h
index 3b384bf5ce1a..58f036f3ebea 100644
--- a/include/linux/vtime.h
+++ b/include/linux/vtime.h
@@ -112,21 +112,21 @@ static inline void vtime_account_irq_enter(struct task_struct *tsk)
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-extern void irqtime_account_irq(struct task_struct *tsk);
+extern void irqtime_account_irq(struct task_struct *tsk, int irqtype);
 #else
-static inline void irqtime_account_irq(struct task_struct *tsk) { }
+static inline void irqtime_account_irq(struct task_struct *tsk, int irqtype) { }
 #endif
 
-static inline void account_irq_enter_time(struct task_struct *tsk)
+static inline void account_irq_enter_time(struct task_struct *tsk, int irqtype)
 {
vtime_account_irq_enter(tsk);
-   irqtime_account_irq(tsk);
+   irqtime_account_irq(tsk, irqtype);
 }
 
-static inline void account_irq_exit_time(struct task_struct *tsk)
+static inline void account_irq_exit_time(struct task_struct *tsk, int irqtype)
 {
vtime_account_irq_exit(tsk);
-   irqtime_account_irq(tsk);
+   irqtime_account_irq(tsk, irqtype);
 }
 
 #endif /* _LINUX_KERNEL_VTIME_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 2f862dfdb520..e009077aeab6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -46,8 +46,15 @@ DEFINE_PER_CPU(seqcount_t, irq_time_seq);
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
+ *
+ * There are six possible transitions:
+ * process -> softirq, softirq -> process
+ * process -> hardirq, hardirq -> process
+ * softirq -> hardirq, hardirq -> softirq
+ *
+ * When exiting hardirq or softirq time, account the elapsed time.
  */
-void irqtime_account_irq(struct task_struct *curr)
+void irqtime_account_irq(struct task_struct *curr, int irqtype)
 {
unsigned long flags;
s64 delta;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 17caf4b63342..a311c9622c86 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -245,7 +245,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
current->flags &= ~PF_MEMALLOC;
 
pending = local_softirq_pending();
-   account_irq_enter_time(current);
+   account_irq_enter_time(current, SOFTIRQ_OFFSET);
 
__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
in_hardirq = lockdep_softirq_start();
@@ -295,7 +295,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
}
 
lockdep_softirq_end(in_hardirq);
-   account_irq_exit_time(current);
+   account_irq
