date:20161208

Re: [PATCH] x86/kbuild: enable modversions for symbols exported from asm

2016-12-08 Thread Stanislav Kozina


The question is how to provide a similar guarantee in a different way?

As a tool to aid distro reviewers, modversions has some value, but the
debug info parsing tools that have been mentioned in this thread seem
superior (not that I've tested them).


On the other hand, the big advantage of modversions is that it also 
verifies the checksum at runtime (module load time). In other words, I 
believe that any other solution should still generate some form of 
checksum/watermark that can easily be checked for compatibility when a 
module is loaded.
That should not be hard to add to the DWARF-based tools, though. We would 
just parse DWARF data instead of the C code.


Regards,
-Stanislav

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Cao jin



On 12/09/2016 02:44 PM, Linas Vepstas wrote:
> On Fri, Dec 9, 2016 at 2:37 PM, Cao jin  wrote:
>>
>>
>> On 12/09/2016 02:24 PM, Linas Vepstas wrote:
>>> I suppose I'm confused, but I recall that link resets are non-fatal.
>>> Fatal errors typically require that the PCI adapter be completely
>>> reset, that any adapter firmware be reloaded from scratch, and that the
>>> device driver kill all device state and start from scratch. It's huge.
>>> If the fatal error is on a PCI device that is under a block device
>>> holding a file system, then (usually) there is no way to recover,
>>> because the block layer (and file system) cannot deal with a block
>>> device that disappeared and then reappeared some few seconds later.
>>> (maybe some future zfs or lvm or btrfs might be able to deal with
>>> this, but not today)
>>>
>>> By contrast, link resets are far more gentle: the device driver might
>>> have to discard some half-full FIFO's, or cancel some in-flight
>>> commands, but can otherwise gracefully recover without telling the
>>> higher layers that there were any problems.
>>>
>>> --linas
>>>
>>
>> I am a little confused too; I am not even sure we are talking about the
>> same *fatal error*. I mean the fatal error defined in the PCI Express
>> spec, chapter 6.2.2.2.1:
>>
>> Fatal errors are uncorrectable error conditions which render the
>> particular Link and related hardware unreliable. For Fatal errors, a
>> reset of the components on the Link may be required to return to
>> reliable operation. Platform handling of Fatal errors, and any efforts
>> to limit the effects of these errors, is platform implementation specific.
>>
>> Link reset means setting the *secondary bus reset* bit in the PCI bridge
>> config space. It resets the link and the device simultaneously, and as
>> far as I know it is the strongest kind of reset.
> 
> OK, well, it's been far too many years, and I don't have the PCI spec
> at my fingertips.
> Isn't there a link reset that can be performed, without forcing a device 
> reset?
> 

At least, I can't find any wording in the spec that says so.

-- 
Sincerely,
Cao jin

> The intent was that some PCI link errors are due to vibration,
> ground-bounce, humidity, etc. and that these errors can be detected
> and do not corrupt the device state or the device driver state.  Since
> they are not associated with data corruption (or rather, the
> corruption is local to the link), these can be recovered by resetting
> just the link, without resetting the whole adapter. They may require
> resetting some device-driver state, but not all of it.
> 
> However, this was all decided before the PCI-E spec was written, so
> maybe the newer PCI-E specs now say something different.
> 
> --linas
> 
>>
>>> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:


 On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
> On Thu, 8 Dec 2016 16:16:14 +0800
> Cao jin  wrote:
>
>>  The platform resets the link, and then calls the link_reset() callback
>>  on all affected device drivers.  This is a PCI-Express specific state
>> -and is done whenever a non-fatal error has been detected that can be
>> +and is done whenever a fatal error has been detected that can be
>>  "solved" by resetting the link. This call informs the driver of the
>
> As far as I can tell, the original text was correct here; why do you
> think this change needs to be made?
>

 See do_recovery() in the AER core: reset_link() is called only when a
 fatal error is seen.

 --
 Sincerely,
 Cao jin


>>>
>>>
>>>
>>
>> --
>> Sincerely,
>> Cao jin
>>
>>
> 
> 
> .
>

RE: [PATCH v10 1/6] drivers/platform/x86/p2sb: New Primary to Sideband bridge support driver for Intel SOC's

2016-12-08 Thread Tan, Jui Nee



> -----Original Message-----
> From: linux-gpio-ow...@vger.kernel.org [mailto:linux-gpio-
> ow...@vger.kernel.org] On Behalf Of Andy Shevchenko
> Sent: Friday, November 11, 2016 12:07 AM
> To: Tan, Jui Nee ; mika.westerb...@linux.intel.com;
> heikki.kroge...@linux.intel.com; t...@linutronix.de; dvh...@infradead.org;
> mi...@redhat.com; h...@zytor.com; x...@kernel.org; pty...@xes-inc.com;
> lee.jo...@linaro.org; linus.wall...@linaro.org
> Cc: linux-g...@vger.kernel.org; platform-driver-...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Yong, Jonathan ;
> Yu, Ong Hock ; Luck, Tony ;
> Wan Mohamad, Wan Ahmad Zainie ;
> Sun, Yunying 
> Subject: Re: [PATCH v10 1/6] drivers/platform/x86/p2sb: New Primary to
> Sideband bridge support driver for Intel SOC's
> 
> On Thu, 2016-11-10 at 17:00 +0800, Tan Jui Nee wrote:
> > From: Andy Shevchenko 
> >
> > There is already one user, and at least one more coming, that requires
> > access to the Primary to Sideband bridge (P2SB) in order to get the IO
> > or MMIO BAR hidden by the BIOS.
> > Create a driver to access P2SB for x86 devices.
> >
> > Signed-off-by: Yong, Jonathan 
> > Signed-off-by: Andy Shevchenko 
> 
> 
> > +int p2sb_bar(struct pci_dev *pdev, unsigned int devfn,
> > +   struct resource *res)
> > +{
> > +   u32 base_addr;
> > +   u64 base64_addr;
> > +   unsigned long flags;
> > +
> >
> 
> > +   if (!res)
> > +   return -EINVAL;
> 
> I don't remember the details; one version changed quite a bit, so I think
> these lines are no longer needed.
> 
Noted. These lines will be removed in the next patch version (v12).
> > +   /* Get IO or MMIO BAR */
> > +   pci_bus_read_config_dword(pdev->bus, devfn, SBREG_BAR, &base_addr);
> > +   if ((base_addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
> > +   flags = IORESOURCE_IO;
> > +   base64_addr = base_addr & PCI_BASE_ADDRESS_IO_MASK;
> > +   } else {
> > +   flags = IORESOURCE_MEM;
> > +   base64_addr = base_addr & PCI_BASE_ADDRESS_MEM_MASK;
> > +   if (base_addr & PCI_BASE_ADDRESS_MEM_TYPE_64) {
> > +   flags |= IORESOURCE_MEM_64;
> >
> 
> > +   pci_bus_read_config_dword(pdev->bus, devfn,
> > +   SBREG_BAR + 4, &base_addr);
> 
> Fix indentation.
> 
Thanks for pointing that out. I will fix that in the next patch version (v12).
> > +   base64_addr |= (u64)base_addr << 32;
> > +   }
> > +   }
> > +
> > +   /* Hide the P2SB device */
> > +   pci_bus_write_config_byte(pdev->bus, devfn, SBREG_HIDE, 0x01);
> > +
> > +   spin_unlock(&p2sb_spinlock);
> > +
> 
> > +   /* User provides prefilled resources */
> 
> Not anymore, as far as I can see. You just return the result here.
> 
The current version returns the status of p2sb_bar(), i.e., 0 on success or
an appropriate errno value on error. Could you explain why it should return
the result instead of the status?
> > +   res->start = (resource_size_t)base64_addr;
> > +   res->flags = flags;
> 
> --
> Andy Shevchenko 
> Intel Finland Oy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
> the body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/8] power: supply: tps65217: Use 'poll_task' on unloading the module

2016-12-08 Thread Milo Kim

TPS65217 has two interrupt numbers, so checking a single IRQ number is not
appropriate when the module is removed.
Use the task_struct variable of the polling thread instead: if the polling
task is running, use it to stop the thread.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 2000e59..55371d6 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -202,6 +202,7 @@ static int tps65217_charger_probe(struct platform_device *pdev)
struct tps65217 *tps = dev_get_drvdata(pdev->dev.parent);
struct tps65217_charger *charger;
struct power_supply_config cfg = {};
+   struct task_struct *poll_task;
int irq[NUM_CHARGER_IRQS];
int ret;
int i;
@@ -238,15 +239,16 @@ static int tps65217_charger_probe(struct platform_device *pdev)
 
/* Create a polling thread if an interrupt is invalid */
if (irq[0] < 0 || irq[1] < 0) {
-   charger->poll_task = kthread_run(tps65217_charger_poll_task,
-   charger, "ktps65217charger");
-   if (IS_ERR(charger->poll_task)) {
-   ret = PTR_ERR(charger->poll_task);
+   poll_task = kthread_run(tps65217_charger_poll_task,
+   charger, "ktps65217charger");
+   if (IS_ERR(poll_task)) {
+   ret = PTR_ERR(poll_task);
dev_err(charger->dev,
"Unable to run kthread err %d\n", ret);
return ret;
}
 
+   charger->poll_task = poll_task;
return 0;
}
 
@@ -274,7 +276,7 @@ static int tps65217_charger_remove(struct platform_device *pdev)
 {
struct tps65217_charger *charger = platform_get_drvdata(pdev);
 
-   if (charger->irq == -ENXIO)
+   if (charger->poll_task)
kthread_stop(charger->poll_task);
 
return 0;
-- 
2.9.3

[PATCH v2 7/8] power: supply: tps65217: Use generic name for get_property()

2016-12-08 Thread Milo Kim

Rename tps65217_ac_get_property() to tps65217_charger_get_property().

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 79afeca..63c5556 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -115,9 +115,9 @@ static int tps65217_enable_charging(struct tps65217_charger *charger)
return 0;
 }
 
-static int tps65217_ac_get_property(struct power_supply *psy,
-   enum power_supply_property psp,
-   union power_supply_propval *val)
+static int tps65217_charger_get_property(struct power_supply *psy,
+enum power_supply_property psp,
+union power_supply_propval *val)
 {
struct tps65217_charger *charger = power_supply_get_drvdata(psy);
 
@@ -190,7 +190,7 @@ static int tps65217_charger_poll_task(void *data)
 static const struct power_supply_desc tps65217_charger_desc = {
.name   = "tps65217-ac",
.type   = POWER_SUPPLY_TYPE_MAINS,
-   .get_property   = tps65217_ac_get_property,
+   .get_property   = tps65217_charger_get_property,
.properties = tps65217_charger_props,
.num_properties = ARRAY_SIZE(tps65217_charger_props),
 };
-- 
2.9.3

[PATCH v2 5/8] power: supply: tps65217: Use generic name for power supply structure

2016-12-08 Thread Milo Kim

Replace 'ac' of tps65217_charger structure with 'psy'.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 424a6d3..5daf361 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -42,7 +42,7 @@
 struct tps65217_charger {
struct tps65217 *tps;
struct device *dev;
-   struct power_supply *ac;
+   struct power_supply *psy;
 
int online;
int prev_online;
@@ -157,7 +157,7 @@ static irqreturn_t tps65217_charger_irq(int irq, void *dev)
}
 
if (charger->prev_online != charger->online)
-   power_supply_changed(charger->ac);
+   power_supply_changed(charger->psy);
 
ret = tps65217_reg_read(charger->tps, TPS65217_REG_CHGCONFIG0, &val);
if (ret < 0) {
@@ -218,12 +218,12 @@ static int tps65217_charger_probe(struct platform_device *pdev)
cfg.of_node = pdev->dev.of_node;
cfg.drv_data = charger;
 
-   charger->ac = devm_power_supply_register(&pdev->dev,
-&tps65217_charger_desc,
-&cfg);
-   if (IS_ERR(charger->ac)) {
+   charger->psy = devm_power_supply_register(&pdev->dev,
+ &tps65217_charger_desc,
+ &cfg);
+   if (IS_ERR(charger->psy)) {
dev_err(&pdev->dev, "failed: power supply register\n");
-   return PTR_ERR(charger->ac);
+   return PTR_ERR(charger->psy);
}
 
irq[0] = platform_get_irq_byname(pdev, "USB");
-- 
2.9.3

[PATCH v2 0/8] power: supply: tps65217: Support USB charger feature

2016-12-08 Thread Milo Kim

The TPS65217 device supports two charger inputs: AC and USB.
Currently, only the AC charger is supported. This patch set adds the USB
charger feature. Tested on BeagleBone Black.

Patch 1: Main patch
Patch 2, 3: Clean up for charger driver data
Patch 4 ~ 8: Naming changes for generic power supply class structure

v2:
  Regenerate the patchset for better code review

Milo Kim (8):
  power: supply: tps65217: Support USB charger interrupt
  power: supply: tps65217: Use 'poll_task' on unloading the module
  power: supply: tps65217: Remove IRQ data from driver data
  power: supply: tps65217: Use generic name for charger online
  power: supply: tps65217: Use generic name for power supply structure
  power: supply: tps65217: Use generic name for power supply property
  power: supply: tps65217: Use generic name for get_property()
  power: supply: tps65217: Use generic charger name

 drivers/power/supply/tps65217_charger.c | 99 ++---
 1 file changed, 53 insertions(+), 46 deletions(-)

-- 
2.9.3

[PATCH v2 8/8] power: supply: tps65217: Use generic charger name

2016-12-08 Thread Milo Kim

"tps65217-charger" is a more appropriate name because the driver supports
not only the AC charger but also the USB charger.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 63c5556..29b61e8 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -188,7 +188,7 @@ static int tps65217_charger_poll_task(void *data)
 }
 
 static const struct power_supply_desc tps65217_charger_desc = {
-   .name   = "tps65217-ac",
+   .name   = "tps65217-charger",
.type   = POWER_SUPPLY_TYPE_MAINS,
.get_property   = tps65217_charger_get_property,
.properties = tps65217_charger_props,
-- 
2.9.3

[PATCH v2 1/8] power: supply: tps65217: Support USB charger interrupt

2016-12-08 Thread Milo Kim

TPS65217 has two charger interrupts, one each for AC and USB power status
changes.

Interrupt handler:
  Check the USB charger status as well as the AC charger status.
  In both cases, enable the charging operation.

Interrupt request:
  If an interrupt number is invalid, use the legacy polling thread.
  Otherwise, create IRQ threads to handle AC and USB charger events.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 47 +++--
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 9fd019f..2000e59 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -35,6 +35,8 @@
 #include 
 #include 
 
+#define CHARGER_STATUS_PRESENT (TPS65217_STATUS_ACPWR | TPS65217_STATUS_USBPWR)
+#define NUM_CHARGER_IRQS   2
 #define POLL_INTERVAL  (HZ * 2)
 
 struct tps65217_charger {
@@ -144,8 +146,8 @@ static irqreturn_t tps65217_charger_irq(int irq, void *dev)
 
dev_dbg(charger->dev, "%s: 0x%x\n", __func__, val);
 
-   /* check for AC status bit */
-   if (val & TPS65217_STATUS_ACPWR) {
+   /* check for charger status bit */
+   if (val & CHARGER_STATUS_PRESENT) {
ret = tps65217_enable_charging(charger);
if (ret) {
dev_err(charger->dev,
@@ -200,8 +202,9 @@ static int tps65217_charger_probe(struct platform_device *pdev)
struct tps65217 *tps = dev_get_drvdata(pdev->dev.parent);
struct tps65217_charger *charger;
struct power_supply_config cfg = {};
-   int irq;
+   int irq[NUM_CHARGER_IRQS];
int ret;
+   int i;
 
dev_dbg(&pdev->dev, "%s\n", __func__);
 
@@ -224,10 +227,8 @@ static int tps65217_charger_probe(struct platform_device *pdev)
return PTR_ERR(charger->ac);
}
 
-   irq = platform_get_irq_byname(pdev, "AC");
-   if (irq < 0)
-   irq = -ENXIO;
-   charger->irq = irq;
+   irq[0] = platform_get_irq_byname(pdev, "USB");
+   irq[1] = platform_get_irq_byname(pdev, "AC");
 
ret = tps65217_config_charger(charger);
if (ret < 0) {
@@ -235,29 +236,35 @@ static int tps65217_charger_probe(struct platform_device *pdev)
return ret;
}
 
-   if (irq != -ENXIO) {
-   ret = devm_request_threaded_irq(&pdev->dev, irq, NULL,
+   /* Create a polling thread if an interrupt is invalid */
+   if (irq[0] < 0 || irq[1] < 0) {
+   charger->poll_task = kthread_run(tps65217_charger_poll_task,
+   charger, "ktps65217charger");
+   if (IS_ERR(charger->poll_task)) {
+   ret = PTR_ERR(charger->poll_task);
+   dev_err(charger->dev,
+   "Unable to run kthread err %d\n", ret);
+   return ret;
+   }
+
+   return 0;
+   }
+
+   /* Create IRQ threads for charger interrupts */
+   for (i = 0; i < NUM_CHARGER_IRQS; i++) {
+   ret = devm_request_threaded_irq(&pdev->dev, irq[i], NULL,
tps65217_charger_irq,
0, "tps65217-charger",
charger);
if (ret) {
dev_err(charger->dev,
-   "Unable to register irq %d err %d\n", irq,
+   "Unable to register irq %d err %d\n", irq[i],
ret);
return ret;
}
 
/* Check current state */
-   tps65217_charger_irq(irq, charger);
-   } else {
-   charger->poll_task = kthread_run(tps65217_charger_poll_task,
-   charger, "ktps65217charger");
-   if (IS_ERR(charger->poll_task)) {
-   ret = PTR_ERR(charger->poll_task);
-   dev_err(charger->dev,
-   "Unable to run kthread err %d\n", ret);
-   return ret;
-   }
+   tps65217_charger_irq(-1, charger);
}
 
return 0;
-- 
2.9.3

[PATCH v2 4/8] power: supply: tps65217: Use generic name for charger online

2016-12-08 Thread Milo Kim

This driver supports AC and USB chargers, so a generic name is preferred.
Replace 'ac_online' with 'online'.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 482ee9f..424a6d3 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -44,8 +44,8 @@ struct tps65217_charger {
struct device *dev;
struct power_supply *ac;
 
-   int ac_online;
-   int prev_ac_online;
+   int online;
+   int prev_online;
 
struct task_struct  *poll_task;
 };
@@ -95,7 +95,7 @@ static int tps65217_enable_charging(struct tps65217_charger *charger)
int ret;
 
/* charger already enabled */
-   if (charger->ac_online)
+   if (charger->online)
return 0;
 
dev_dbg(charger->dev, "%s: enable charging\n", __func__);
@@ -110,7 +110,7 @@ static int tps65217_enable_charging(struct tps65217_charger *charger)
return ret;
}
 
-   charger->ac_online = 1;
+   charger->online = 1;
 
return 0;
 }
@@ -122,7 +122,7 @@ static int tps65217_ac_get_property(struct power_supply *psy,
struct tps65217_charger *charger = power_supply_get_drvdata(psy);
 
if (psp == POWER_SUPPLY_PROP_ONLINE) {
-   val->intval = charger->ac_online;
+   val->intval = charger->online;
return 0;
}
return -EINVAL;
@@ -133,7 +133,7 @@ static irqreturn_t tps65217_charger_irq(int irq, void *dev)
int ret, val;
struct tps65217_charger *charger = dev;
 
-   charger->prev_ac_online = charger->ac_online;
+   charger->prev_online = charger->online;
 
ret = tps65217_reg_read(charger->tps, TPS65217_REG_STATUS, &val);
if (ret < 0) {
@@ -153,10 +153,10 @@ static irqreturn_t tps65217_charger_irq(int irq, void *dev)
return IRQ_HANDLED;
}
} else {
-   charger->ac_online = 0;
+   charger->online = 0;
}
 
-   if (charger->prev_ac_online != charger->ac_online)
+   if (charger->prev_online != charger->online)
power_supply_changed(charger->ac);
 
ret = tps65217_reg_read(charger->tps, TPS65217_REG_CHGCONFIG0, &val);
-- 
2.9.3

[PATCH v2 3/8] power: supply: tps65217: Remove IRQ data from driver data

2016-12-08 Thread Milo Kim

The IRQ number is only used when requesting the interrupt, so there is no
need to keep it in the driver data.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 55371d6..482ee9f 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -48,8 +48,6 @@ struct tps65217_charger {
int prev_ac_online;
 
struct task_struct  *poll_task;
-
-   int irq;
 };
 
 static enum power_supply_property tps65217_ac_props[] = {
-- 
2.9.3

[PATCH v2 6/8] power: supply: tps65217: Use generic name for power supply property

2016-12-08 Thread Milo Kim

Replace 'ac_props' with 'charger_props'.

Signed-off-by: Milo Kim 
---
 drivers/power/supply/tps65217_charger.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/power/supply/tps65217_charger.c b/drivers/power/supply/tps65217_charger.c
index 5daf361..79afeca 100644
--- a/drivers/power/supply/tps65217_charger.c
+++ b/drivers/power/supply/tps65217_charger.c
@@ -50,7 +50,7 @@ struct tps65217_charger {
struct task_struct  *poll_task;
 };
 
-static enum power_supply_property tps65217_ac_props[] = {
+static enum power_supply_property tps65217_charger_props[] = {
POWER_SUPPLY_PROP_ONLINE,
 };
 
@@ -191,8 +191,8 @@ static const struct power_supply_desc tps65217_charger_desc = {
.name   = "tps65217-ac",
.type   = POWER_SUPPLY_TYPE_MAINS,
.get_property   = tps65217_ac_get_property,
-   .properties = tps65217_ac_props,
-   .num_properties = ARRAY_SIZE(tps65217_ac_props),
+   .properties = tps65217_charger_props,
+   .num_properties = ARRAY_SIZE(tps65217_charger_props),
 };
 
 static int tps65217_charger_probe(struct platform_device *pdev)
-- 
2.9.3

Re: [PATCH] vfio/pci: Support error recovery

2016-12-08 Thread Cao jin



On 12/09/2016 12:30 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 10:46:59PM +0800, Cao jin wrote:
>>
>>
>> On 12/06/2016 11:35 PM, Alex Williamson wrote:
>>> On Tue, 6 Dec 2016 18:46:04 +0800
>>> Cao jin  wrote:
>>>
 On 12/06/2016 12:59 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 05:55:28 +0200
> "Michael S. Tsirkin"  wrote:
>   
>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:  
>>> If you're going to take the lead for these AER patches, I would
>>> certainly suggest that understanding the reasoning behind the bus reset
>>> behavior is a central aspect to this series.  This effort has dragged
>>> out for nearly two years and I apologize, but I don't really have a lot
>>> of patience for rehashing some of these issues if you're not going to
>>> read the previous discussions or consult with your colleagues to
>>> understand how we got to this point.  If you want to challenge some of
>>> the design points, that's great, it could use some new eyes, but please
>>> understand how we got here first.
>>
>> Well I'm guessing Cao jin here isn't the only one not
>> willing to plough through all historical versions of the patchset
>> just to figure out the motivation for some code.
>>
>> Including a summary of a high level architecture couldn't hurt.
>>
>> Any chance of writing such?  Alternatively, we can try to build it as
>> part of this thread.  Shouldn't be hard as it seems somewhat
>> straight-forward on the surface:
>>
>> - detect link error on the host, don't reset link as we would normally do
>
> This is actually a new approach that I'm not sure I agree with.  By
> skipping the host directed link reset, vfio is taking responsibility
> for doing this, but then we just assume the user will do it.  I have
> issues with this.
>
> The previous approach was to use the error detected notifier to block
> access to the device, allowing the host to perform the link reset.  A
> subsequent notification in the AER process released the user access
> which allowed the user AER process to proceed.  This did result in both
> a host directed and a guest directed link reset, but other than
> coordinating the blocking of the user process during host reset, that
> hasn't been brought up as an issue previously.
>   

 As far as I can tell, tests on previous versions didn't bring up any
 issues, but I think that is because we didn't test completely. As far as
 I know, before August of this year we didn't even have a cable connected
 to the NIC, let alone the NIC connected to a gateway.
>>>
>>> Lack of testing has been a significant issue throughout the development
>>> of this series.
>>>
 Even after fixing the guest oops in the igb driver that Alex found in
 v9, v9 still doesn't work in my test. In my test, disabling the link
 reset (in the host) in the AER core for vfio-pci is the most significant
 step in getting the test to pass.
>>>
>>> But is it the correct step?  I'm not convinced.  Why did blocking guest
>>> access not work?  How do you plan to manage vfio taking the
>>> responsibility to perform a bus reset when you don't know whether QEMU
>>> is the user of the device or whether the user supports AER recovery?
>>>  
>>
>> Maybe we don't currently have enough evidence to prove correctness, but
>> I think I did find some facts showing that the link reset in the host is
>> a big problem, and they answer part of the questions above.
>>
>> First, some thoughts:
>> In pci-error-recovery.txt and do_recovery() in the kernel tree, we can
>> see that a recovery consists of several steps (callbacks). Link reset is
>> one of them, and apart from link reset, the others seem to be device
>> specific. In our case, both the host and the guest perform recovery. I
>> think the host recovery is actually a kind of fake recovery: vfio-pci's
>> error_detected and resume callbacks don't do anything special, they
>> mainly signal the error to the user, but the link reset during this
>> "fake" host recovery does some serious work. In other words, the host
>> performs the recovery incompletely, so I was thinking: why not just drop
>> the incomplete host recovery (drop the link reset) for vfio-pci, and let
>> the guest take care of the whole recovery? This is part of the reason my
>> version looks like this. But yes, I admit the issue Alex mentioned: vfio
>> can't guarantee that the user will do a bus reset. I will keep looking
>> for a solution to that.
>>
>> Second, some facts and analysis from testing:
>> On the host, the relationship between time and the behaviour of each
>> component roughly looks as follows:
>>
>>  +   HW+  host kernel   + qemu  + guest kernel  +
>>  | |(error recovery)|   |   |
>>  | ||   |   |
>>  | | vfio-pci's |   |   |
>>

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Cao jin



On 12/09/2016 02:24 PM, Linas Vepstas wrote:
> I suppose I'm confused, but I recall that link resets are non-fatal.
> Fatal errors typically require that the PCI adapter be completely
> reset, that any adapter firmware be reloaded from scratch, and that the
> device driver kill all device state and start from scratch. It's huge.
> If the fatal error is on a PCI device that is under a block device
> holding a file system, then (usually) there is no way to recover,
> because the block layer (and file system) cannot deal with a block
> device that disappeared and then reappeared some few seconds later.
> (maybe some future zfs or lvm or btrfs might be able to deal with
> this, but not today)
> 
> By contrast, link resets are far more gentle: the device driver might
> have to discard some half-full FIFO's, or cancel some in-flight
> commands, but can otherwise gracefully recover without telling the
> higher layers that there were any problems.
> 
> --linas
> 

I am a little confused too; I am not even sure we are talking about the
same *fatal error*. I mean the fatal error defined in the PCI Express
spec, chapter 6.2.2.2.1:

Fatal errors are uncorrectable error conditions which render the
particular Link and related hardware unreliable. For Fatal errors, a
reset of the components on the Link may be required to return to
reliable operation. Platform handling of Fatal errors, and any efforts
to limit the effects of these errors, is platform implementation specific.

Link reset means setting the *secondary bus reset* bit in the PCI bridge
config space. It resets the link and the device simultaneously, and as
far as I know it is the strongest kind of reset.

> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>>
>>
>> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>>> On Thu, 8 Dec 2016 16:16:14 +0800
>>> Cao jin  wrote:
>>>
  The platform resets the link, and then calls the link_reset() callback
  on all affected device drivers.  This is a PCI-Express specific state
 -and is done whenever a non-fatal error has been detected that can be
 +and is done whenever a fatal error has been detected that can be
  "solved" by resetting the link. This call informs the driver of the
>>>
>>> As far as I can tell, the original text was correct here; why do you
>>> think this change needs to be made?
>>>
>>
>> See do_recovery() in the AER core: reset_link() is called only when a
>> fatal error is seen.
>>
>> --
>> Sincerely,
>> Cao jin
>>
>>
> 
> 
> 

-- 
Sincerely,
Cao jin

Re: [PATCH] vfio/pci: Support error recovery

2016-12-08 Thread Cao jin



On 12/08/2016 10:46 PM, Cao jin wrote:
> 
> 
> On 12/06/2016 11:35 PM, Alex Williamson wrote:
>> On Tue, 6 Dec 2016 18:46:04 +0800
>> Cao jin  wrote:
>>
>>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
 On Tue, 6 Dec 2016 05:55:28 +0200
 "Michael S. Tsirkin"  wrote:
   
> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:  
>> If you're going to take the lead for these AER patches, I would
>> certainly suggest that understanding the reasoning behind the bus reset
>> behavior is a central aspect to this series.  This effort has dragged
>> out for nearly two years and I apologize, but I don't really have a lot
>> of patience for rehashing some of these issues if you're not going to
>> read the previous discussions or consult with your colleagues to
>> understand how we got to this point.  If you want to challenge some of
>> the design points, that's great, it could use some new eyes, but please
>> understand how we got here first.
>
> Well I'm guessing Cao jin here isn't the only one not
> willing to plough through all historical versions of the patchset
> just to figure out the motivation for some code.
>
> Including a summary of a high level architecture couldn't hurt.
>
> Any chance of writing such?  Alternatively, we can try to build it as
> part of this thread.  Shouldn't be hard as it seems somewhat
> straight-forward on the surface:
>
> - detect link error on the host, don't reset link as we would normally do 
>  

 This is actually a new approach that I'm not sure I agree with.  By
 skipping the host directed link reset, vfio is taking responsibility
 for doing this, but then we just assume the user will do it.  I have
 issues with this.

 The previous approach was to use the error detected notifier to block
 access to the device, allowing the host to perform the link reset.  A
 subsequent notification in the AER process released the user access
 which allowed the user AER process to proceed.  This did result in both
 a host directed and a guest directed link reset, but other than
 coordinating the blocking of the user process during host reset, that
 hasn't been brought up as an issue previously.
   
>>>
>>> As far as I can tell, tests on previous versions didn't bring up any
>>> issues, but I think that is because we didn't test completely. As far as
>>> I know, before August of this year we didn't even have a cable connected
>>> to the NIC, let alone the NIC connected to a gateway.
>>
>> Lack of testing has been a significant issue throughout the development
>> of this series.
>>
>>> Even after fixing the guest oops in the igb driver that Alex found in
>>> v9, v9 still doesn't work in my test. In my test, disabling the link
>>> reset (in the host) in the AER core for vfio-pci is the most significant
>>> step in getting the test to pass.
>>
>> But is it the correct step?  I'm not convinced.  Why did blocking guest
>> access not work?  How do you plan to manage vfio taking the
>> responsibility to perform a bus reset when you don't know whether QEMU
>> is the user of the device or whether the user supports AER recovery?
>>  
> 
> We may not currently have enough evidence to prove correctness, but I
> think I did find some facts showing that the link reset in the host is a
> big problem, and they answer some of the questions above.
> 
> 1st, some thoughts:
> In pci-error-recovery.txt and do_recovery() in the kernel tree, we can
> see that a recovery consists of several steps (callbacks). Link reset is
> one of them, and apart from link reset, the others seem fairly device
> specific. In our case, both host and guest do recovery. I think the host
> recovery is really a kind of fake recovery: vfio-pci's error_detected and
> resume callbacks don't do anything special, they mainly signal the error
> to the user. But the link reset in the host's "fake reset" does some
> serious work. In other words, I think the host does the recovery
> incompletely, so I was thinking: why not drop the incomplete host
> recovery (drop the link reset) for vfio-pci, and let the guest take care
> of the whole serious recovery? This is part of the reason my version
> looks like this. But yes, I admit the issue Alex mentioned: vfio can't
> guarantee that the user will do a bus reset. I will keep looking for a
> solution to this issue.
> 
> 2nd, some facts and analysis from testing:
> In the host, the relationship between time and behaviour in each component
> roughly looks as follows:
> 
>  +   HW+  host kernel   + qemu  + guest kernel  +
>  | |(error recovery)|   |   |
>  | ||   |   |
>  | | vfio-pci's |   |   |
>  | | error_detected |   |   |
>  | | +  |   |   |
>  | | |

[git pull] single drm fix

2016-12-08 Thread Dave Airlie

Hi Linus,

Just a single fix for amdgpu to just suspend the gpu on "shutdown"
instead of shutting it down fully, as for some reason the hw was
getting upset in some situations.

Dave.

The following changes since commit 3e5de27e940d00d8d504dfb96625fb654f641509:

  Linux 4.9-rc8 (2016-12-04 12:50:51 -0800)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 4e4f3e984954143fb0b8e5035df7ff22dd07bb6a:

  Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes (2016-12-08 10:32:27 +1000)


Alex Deucher (1):
  drm/amdgpu: just suspend the hw on pci shutdown

Dave Airlie (1):
  Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 5 -
 3 files changed, 6 insertions(+), 2 deletions(-)

Re: [RFC][PATCH] HACK: usb: dwc2: Workaround case where GOTGCTL state is wrong

2016-12-08 Thread John Stultz

On Thu, Dec 8, 2016 at 11:09 PM, Chen Yu  wrote:
> On 2016/12/9 7:29, John Youn wrote:
>> On 12/8/2016 2:43 PM, John Stultz wrote:
>>> On Tue, Dec 6, 2016 at 7:52 PM, John Youn  wrote:
 On 12/6/2016 5:48 PM, John Stultz wrote:
> This patch works around the issue by re-reading the GOTGCTL
> state to check if the GOTGCTL_CONID_B is still set and if not
> restarting the change status logic.

 This also seems weird. The connector id status shouldn't go back to A,
 assuming you've left the cable unplugged.
>>>
>>> So I suspect this has something to do with the way the USB-A host
>>> ports on the board are wired up, as removing the usb-b plug seems to
>>> switch the device back into A mode.
>>>
>>> One quirk with this board is that the USB-A ports on the board do not
>>> function if anything is in the OTG/B plug (which is frustrating to use
>>> at times).
>>>
>>
>> Do you mean there are multiple A-ports on the board hooked up to the
>> same controller?
>>
>> If so, that would go a long way towards explaining things. Because the
>> hsotg is a single-port OTG controller. If there are multiple A-ports,
>> that means a hub has to be hard-wired internally to the port. But if
>> that's the case the OTG function won't work because OTG doesn't work
>> through a hub. It must go directly to the otg port. So there must be
>> some external logic kicking-in to switch routing to the OTG port or to
>> the HUB.
>>
>> This would explain this behavior with the ID pin status. Since hooking
>> up the HUB would make the controller an A-device whereas normally it
>> would be a B-device.
>>
>>> Guodong or Chen Yu understand the hardware details a bit better, and
>>> might be able to explain more if you need more information.
>>>
>>
>> Yeah it would be good to get some insight into this from a hardware
>> point of view.
>>
>
> Actually, I'm not very clear about the hardware details.
>
> In simple terms, there are two Type A USB 2.0 host ports and
> one microUSB OTG port on the front edge of the board.
> The two Type A USB 2.0 host ports connect to a high-speed hub,
> and the hub connects to a USB switch to which the microUSB OTG port
> also connects.
> If the VBUS of the microUSB OTG port is high or the ID of
> the microUSB OTG port is low, the switch routes the DP and DM of the SoC
> to the microUSB OTG port. If no cable is inserted in the microUSB OTG port,
> the switch routes the DP and DM of the SoC to the high-speed hub.
> Another important point: the ID pin of the SoC will be
> pulled high in both cases:
> 1. no cable is inserted in the microUSB OTG port
> 2. a cable is inserted in the microUSB OTG port and the ID of the microUSB
> OTG port is low.
>
> If my explanation is confusing, these documents may help.
>
> 
> 1. https://github.com/96boards/documentation/blob/master/ConsumerEdition/HiKey/HardwareDocs/HardwareNotes.md
>
> USB Ports
>
> There are multiple USB ports on the HiKey board:
>
> One microUSB OTG port on the front edge of the board
> Two Type A USB 2.0 host ports on the front edge of the board
> One USB 2.0 host port on the high-speed expansion bus
>
> 
> 2. https://github.com/96boards/documentation/tree/master/ConsumerEdition/HiKey/AdditionalDocs
> Hardware User Guide

Yea, Page 12 in this pdf seems to explain it:
https://github.com/96boards/documentation/blob/master/ConsumerEdition/HiKey/AdditionalDocs/HiKey_Hardware_User_Manual_Rev0.2.pdf

There is a usb switch which enables the micro-usb-b port if a cable is
present, or switches to using the hub(which has its own limitations
wrt multi-speed support) for the usb-a ports.

thanks
-john
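The routing rule described above reduces to a two-input decision. Below is a minimal C sketch with hypothetical names (`usb_switch_route()`, `enum usb_route`). It models only the external switch as described in this thread, not the SoC-side ID pin. Per Chen Yu, that pin reads high both with no cable and with an ID-low cable attached.

```c
#include <stdbool.h>

/* Where the SoC's DP/DM lines end up, per the description in this thread. */
enum usb_route {
	ROUTE_HUB,       /* on-board high-speed hub feeding the two Type-A ports */
	ROUTE_OTG_PORT,  /* the microUSB OTG connector                           */
};

/*
 * Hypothetical model of the HiKey USB switch: VBUS high or ID low on the
 * microUSB connector selects the OTG port; otherwise the hub is selected.
 */
static enum usb_route usb_switch_route(bool otg_vbus_high, bool otg_id_low)
{
	if (otg_vbus_high || otg_id_low)
		return ROUTE_OTG_PORT;
	return ROUTE_HUB;
}
```

The SoC's ID pin reads high in both of those cases, yet the DP/DM lines are routed to different places. The controller's ID status alone therefore cannot tell the two cases apart, which may be related to the GOTGCTL behaviour discussed in this thread.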

Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial numbers

2016-12-08 Thread Greg KH

On Fri, Dec 09, 2016 at 12:05:53AM +, KY Srinivasan wrote:
> 
> 
> > -Original Message-
> > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > Sent: Thursday, December 8, 2016 7:56 AM
> > To: KY Srinivasan 
> > Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
> > o...@aepfle.de; a...@canonical.com; vkuzn...@redhat.com;
> > jasow...@redhat.com; leann.ogasaw...@canonical.com;
> > bjorn.helg...@gmail.com; Haiyang Zhang 
> > Subject: Re: [PATCH 3/3] hv_netvsc: Implement VF matching based on serial
> > numbers
> > 
> > On Thu, Dec 08, 2016 at 12:33:43AM -0800, k...@exchange.microsoft.com
> > wrote:
> > > From: Haiyang Zhang 
> > >
> > > We currently use MAC address to match VF and synthetic NICs. Hyper-V
> > > provides a serial number to both devices for this purpose. This patch
> > > implements the matching based on VF serial numbers. This is the way
> > > specified by the protocol and more reliable.
> > >
> > > Signed-off-by: Haiyang Zhang 
> > > Signed-off-by: K. Y. Srinivasan 
> > > ---
> > >  drivers/net/hyperv/netvsc_drv.c |   55
> > ---
> > >  1 files changed, 51 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> > > index 9522763..c5778cf 100644
> > > --- a/drivers/net/hyperv/netvsc_drv.c
> > > +++ b/drivers/net/hyperv/netvsc_drv.c
> > > @@ -1165,9 +1165,10 @@ static void netvsc_free_netdev(struct net_device *netdev)
> > >   free_netdev(netdev);
> > >  }
> > >
> > > -static struct net_device *get_netvsc_bymac(const u8 *mac)
> > > +static struct net_device *get_netvsc_byvfser(u32 vfser)
> > >  {
> > >   struct net_device *dev;
> > > + struct net_device_context *ndev_ctx;
> > >
> > >   ASSERT_RTNL();
> > >
> > > @@ -1175,7 +1176,8 @@ static void netvsc_free_netdev(struct net_device *netdev)
> > >   if (dev->netdev_ops != &device_ops)
> > >   continue;   /* not a netvsc device */
> > >
> > > - if (ether_addr_equal(mac, dev->perm_addr))
> > > + ndev_ctx = netdev_priv(dev);
> > > + if (ndev_ctx->vf_serial == vfser)
> > >   return dev;
> > >   }
> > >
> > > @@ -1205,21 +1207,66 @@ static void netvsc_free_netdev(struct net_device *netdev)
> > >   return NULL;
> > >  }
> > >
> > > +static u32 netvsc_get_vfser(struct net_device *vf_netdev)
> > > +{
> > > + struct device *dev;
> > > + struct hv_device *hdev;
> > > + struct hv_pcibus_device *hbus = NULL;
> > > + struct list_head *iter;
> > > + struct hv_pci_dev *hpdev;
> > > + unsigned long flags;
> > > + u32 vfser = 0;
> > > + u32 count = 0;
> > > +
> > > + for (dev = &vf_netdev->dev; dev; dev = dev->parent) {
> > 
> > You are going to walk the whole device tree backwards?  That's crazy.
> > And foolish.  And racy and broken (what happens if the tree changes
> > while you do this?)  Where is the lock being grabbed while this happens?
> > What about reference counts?  Do you see other drivers ever doing this
> > (if you do, point them out and I'll go yell at them too...)
> 
> Greg,
> 
> We are registering for netdev events. Coming into this function, the caller
> guarantees that the list of netdevs does not change - we assert this on entry:
> ASSERT_RTNL(). We are only walking up the device tree for the netdevs whose
> state change is being notified to us - the device tree being walked here is
> limited to netdevs under question.

But a netdev is a child of some type of "real" device, and you are now
walking the tree of all devices up to the "root" parent device, which
means you will hit PCI bridges, USB controllers, and all sorts of fun
things if you are a child of those types of devices.

And can't you tell if the netdev for this event, really is "your"
netdev?  Or are you getting called this for "all" netdevs?  Sorry, I
don't know this api, any pointers to it would be appreciated.

> We have a reference to the device and we know the device is not going away.
> Is it not safe to dereference the parent pointer? After all, the child has
> taken a reference on the parent as part of the device_add() call.

It might be, and might not be.  There's a reason you don't see this
pattern anywhere in the kernel because of this...

> > > + if (!dev_is_vmbus(dev))
> > > + continue;
> > 
> > Ick.
> > 
> > Why isn't your parent pointer a vmbus device all the time?  How could
> > you get buried down in the device hierarchy when you are the driver for
> > a specific bus type in the first place?  How could this function ever be
> > called for a device that is NOT of this type?
> 
> We get notified when state changes on any of the netdev devices in the system.
> Not all netdevs in the system belong to vmbus. Consider for instance the
> emulated NIC that can be configured. This is an emulated PCI NIC. We are only
> interested in netdevs that correspond to the VF instance that we are
> interested in.

Can you "know" this is your net
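
The patch replaces MAC-address matching with matching on the VF serial number that Hyper-V provides to both devices. Below is a minimal userspace sketch of just that lookup rule. A hypothetical `struct fake_netdev` array stands in for the real netdev list, the `netdev_ops` check and `netdev_priv()`. The sketch deliberately leaves out the parent-device walk and locking that Greg is questioning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net_device plus its netvsc private context. */
struct fake_netdev {
	const char *name;
	bool is_netvsc;      /* models the dev->netdev_ops == &device_ops test */
	uint32_t vf_serial;  /* models ndev_ctx->vf_serial                     */
};

/* Return the netvsc device whose VF serial matches, or NULL if none does. */
static struct fake_netdev *find_by_vf_serial(struct fake_netdev *devs,
					     size_t n, uint32_t vfser)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].is_netvsc)
			continue;	/* not a netvsc device */
		if (devs[i].vf_serial == vfser)
			return &devs[i];
	}
	return NULL;
}
```

Non-netvsc entries are skipped before their serial is compared, mirroring the `dev->netdev_ops != &device_ops` check in the quoted hunk. An emulated PCI NIC can therefore never be picked, even if its field happens to match.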

Re: [V9fs-developer] [PATCH 2/5] 9p: store req details and callback in struct p9_req_t

2016-12-08 Thread Dominique Martinet


Nice. I like the idea of async I/Os :)

Stefano Stabellini wrote on Thu, Dec 08, 2016:
> Add a few fields to struct p9_req_t. callback is the function which will
> be called upon request completion. offset, rsize, pagevec and kiocb
> store important information regarding the read or write request,
> essential to complete the request.
> 
> Currently not utilized, but they will be used in a later patch.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  include/net/9p/client.h | 8 
>  net/9p/client.c | 9 -
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index aef19c6..69fc2f0 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -110,6 +110,7 @@ enum p9_req_status_t {
>   *
>   */
>  
> +struct p9_client;
>  struct p9_req_t {
>   int status;
>   int t_err;
> @@ -118,6 +119,13 @@ struct p9_req_t {
>   struct p9_fcall *rc;
>   void *aux;
>  
> +/* Used for async requests */
> + void (*callback)(struct p9_client *c, struct p9_req_t *req, int status);
> + size_t offset;
> + u64 rsize;
> + struct page **pagevec;
> + struct kiocb *kiocb;
> +
>   struct list_head req_list;
>  };
>  
> diff --git a/net/9p/client.c b/net/9p/client.c
> index b5ea9a3..bfe1715 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -405,6 +405,10 @@ static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
>   int tag = r->tc->tag;
>   p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
>  
> + r->offset = 0;
> + r->rsize = 0;
> + r->kiocb = NULL;
> + r->callback = NULL;

You probably want to clean up r->pagevec here too, even if that doesn't
seem to have any short-term implication (from what I've seen, it is only
looked at if callback is set).
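
The suggestion amounts to resetting every async field together. Below is a minimal sketch. It assumes a hypothetical `struct async_req` that mirrors only the five fields this patch adds; it is not the real `struct p9_req_t`.

```c
#include <stddef.h>
#include <stdint.h>

struct page;   /* opaque, as in the kernel */
struct kiocb;  /* opaque */

/* Hypothetical mirror of the fields this patch adds to struct p9_req_t. */
struct async_req {
	void (*callback)(void *clnt, struct async_req *req, int status);
	size_t offset;
	uint64_t rsize;
	struct page **pagevec;
	struct kiocb *kiocb;
};

/* Reset all async state, pagevec included, before the request is reused. */
static void async_req_reset(struct async_req *r)
{
	r->callback = NULL;
	r->offset = 0;
	r->rsize = 0;
	r->pagevec = NULL;
	r->kiocb = NULL;
}
```

Clearing pagevec as well means a stale pointer can never be mistaken for a live page array. That protection matters if a later change starts reading pagevec without first checking callback.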

>   r->status = REQ_STATUS_IDLE;
>   if (tag != P9_NOTAG && p9_idpool_check(tag, c->tagpool))
>   p9_idpool_put(tag, c->tagpool);
> @@ -427,7 +431,10 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
>   smp_wmb();
>   req->status = status;
>  
> - wake_up(req->wq);
> + if (req->callback != NULL)
> + req->callback(c, req, status);
> + else
> + wake_up(req->wq);
>   p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
>  }
>  EXPORT_SYMBOL(p9_client_cb);

Mostly a warning here, but p9_client_cb is called from an interrupt
context in 9P/RDMA.
This has been working up till now because we only do a wake_up and
there's no waiting, but (looking at later patches),
p9_client_read_complete for example does allocations and possibly other
unsafe operations from an interrupt context.

I don't know if the way forward is to move p9_client_cb from that
context or to have the callback be scheduled in a work queue instead;
but we'll need to fix that later.
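
One reading of the work-queue option: the completion path only records what happened. The callback, which may allocate, runs later from a context where that is allowed. Below is a minimal single-threaded userspace model using hypothetical names (`struct deferred_queue`, `defer_completion()`, `run_deferred()`). It has no locking. In the kernel, the equivalent would be built on the workqueue API (for example INIT_WORK() and schedule_work()) rather than on this ring.

```c
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical completion record: a request plus its final status. */
struct pending {
	void (*callback)(void *req, int status);
	void *req;
	int status;
};

/* Hypothetical ring standing in for a kernel workqueue. */
struct deferred_queue {
	struct pending items[MAX_PENDING];
	size_t head, tail;   /* head: next to run; tail: next free slot */
};

/*
 * "Interrupt context" side: do no real work, only record the completion.
 * Returns -1 if the queue is full.
 */
static int defer_completion(struct deferred_queue *q,
			    void (*cb)(void *, int), void *req, int status)
{
	if (q->tail - q->head == MAX_PENDING)
		return -1;
	q->items[q->tail % MAX_PENDING] = (struct pending){ cb, req, status };
	q->tail++;
	return 0;
}

/* "Process context" side: run queued callbacks, where allocation is safe. */
static size_t run_deferred(struct deferred_queue *q)
{
	size_t ran = 0;

	while (q->head != q->tail) {
		struct pending *p = &q->items[q->head % MAX_PENDING];

		p->callback(p->req, p->status);
		q->head++;
		ran++;
	}
	return ran;
}
```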

-- 
Dominique Martinet

Re: [V9fs-developer] [PATCH 4/5] 9p: introduce async read requests

2016-12-08 Thread Dominique Martinet

Stefano Stabellini wrote on Thu, Dec 08, 2016:
> If the read is an async operation, send a 9p request and return
> EIOCBQUEUED. Do not wait for completion.
> 
> Complete the read operation from a callback instead.
> 
> Signed-off-by: Stefano Stabellini 
> ---
>  net/9p/client.c | 88 
> +++--
>  1 file changed, 86 insertions(+), 2 deletions(-)
> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index eb589ef..f9f09db 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1554,13 +1555,68 @@ int p9_client_unlinkat(struct p9_fid *dfid, const char *name, int flags)
>  }
>  EXPORT_SYMBOL(p9_client_unlinkat);
>  
> +static void
> +p9_client_read_complete(struct p9_client *clnt, struct p9_req_t *req, int status)
> +{
> + int err, count, n, i, total = 0;
> + char *dataptr, *to;
> +
> + if (req->status == REQ_STATUS_ERROR) {
> + p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
> + err = req->t_err;
> + goto out;
> + }
> + err = p9_check_errors(clnt, req);
> + if (err)
> + goto out;
> +
> + err = p9pdu_readf(req->rc, clnt->proto_version,
> + "D", &count, &dataptr);
> + if (err) {
> + trace_9p_protocol_dump(clnt, req->rc);
> + goto out;
> + }
> + if (!count) {
> + p9_debug(P9_DEBUG_ERROR, "count=%d\n", count);
> + err = 0;
> + goto out;
> + }
> +
> + p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
> + if (count > req->rsize)
> + count = req->rsize;
> +
> + for (i = 0; i < ((req->rsize + PAGE_SIZE - 1) / PAGE_SIZE); i++) {
> + to = kmap(req->pagevec[i]);
> + to += req->offset;
> + n = PAGE_SIZE - req->offset;
> + if (n > count)
> + n = count;
> + memcpy(to, dataptr, n);
> + kunmap(req->pagevec[i]);
> + req->offset = 0;
> + count -= n;
> + total += n;
> + }
> +
> + err = total;
> + req->kiocb->ki_pos += total;
> +
> +out:
> + req->kiocb->ki_complete(req->kiocb, err, 0);
> +
> + release_pages(req->pagevec, (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE, false);
> + kvfree(req->pagevec);
> + p9_free_req(clnt, req);
> +}
> +
>  int
>  p9_client_read(struct p9_fid *fid, struct kiocb *iocb, u64 offset,
>   struct iov_iter *to, int *err)
>  {
>   struct p9_client *clnt = fid->clnt;
>   struct p9_req_t *req;
> - int total = 0;
> + int total = 0, i;
>   *err = 0;
>  
>   p9_debug(P9_DEBUG_9P, ">>> TREAD fid %d offset %llu %d\n",
> @@ -1587,10 +1643,38 @@ int p9_client_unlinkat(struct p9_fid *dfid, const char *name, int flags)
>   req = p9_client_zc_rpc(clnt, P9_TREAD, to, NULL, rsize,
>  0, 11, "dqd", fid->fid,
>  offset, rsize);
> - } else {
> + /* sync request */
> + } else if(iocb == NULL || is_sync_kiocb(iocb)) {
>   non_zc = 1;
>   req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
>   rsize);
> + /* async request */
> + } else {

I'm not too familiar with iocb/how async IOs should work, but a logic
question just to make sure that has been thought out:
We prefer zc here to async, even if zc can be slow?

Ideally at some point zc and async aren't exclusive so we'll have async
zc and async normal, but for now I'd say async comes before zc - yes
there will be an extra copy in memory, but it will be done
asynchronously.
Was it intentional to prefer zc here?

> + req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> + if (IS_ERR(req)) {
> + *err = PTR_ERR(req);
> + break;
> + }
> + req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec,
> + (size_t)rsize, &req->offset);
> + req->kiocb = iocb;
> + for (i = 0; i < req->rsize; i += PAGE_SIZE)
> + page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
> + req->callback = p9_client_read_complete;
> +
> + *err = clnt->trans_mod->request(clnt, req);
> + if (*err < 0) {
> + clnt->status = Disconnected;
> + release_pages(req->pagevec,
> + (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
> +
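
The copy step in p9_client_read_complete() scatters a contiguous RREAD payload into the pinned page vector. It starts req->offset bytes into the first page. Below is a minimal userspace sketch of that step. It assumes a hypothetical `scatter_to_pages()` helper and a tiny `SKETCH_PAGE_SIZE` so the result is easy to check.

Two choices differ from the hunk as quoted:

- The source pointer is advanced after each page. The quoted loop copies from `dataptr` without advancing it.
- The page count is the ceiling of (offset + length) / page size. Data that starts mid-page and spills into one more page is therefore still covered.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8  /* tiny page size so the example is easy to check */

/*
 * Copy len bytes from a contiguous buffer into an array of "pages",
 * starting first_off bytes into the first page. Returns bytes copied.
 */
static size_t scatter_to_pages(char **pages, size_t first_off,
			       const char *src, size_t len)
{
	size_t npages = (first_off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	size_t off = first_off, total = 0;

	for (size_t i = 0; i < npages && len > 0; i++) {
		size_t n = SKETCH_PAGE_SIZE - off;

		if (n > len)
			n = len;
		memcpy(pages[i] + off, src, n);
		src += n;    /* advance the source along with the destination */
		len -= n;
		total += n;
		off = 0;     /* only the first page starts at an offset */
	}
	return total;
}
```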

Re: uvcvideo logging kernel warnings on device disconnect

2016-12-08 Thread Greg KH

On Fri, Dec 09, 2016 at 01:09:21AM +0200, Laurent Pinchart wrote:
> Hi Dave,
> 
> (CC'ing LKML and Greg KH)
> 
> On Thursday 08 Dec 2016 12:31:55 Dave Stevenson wrote:
> > Hi All.
> > 
> > I'm working with a USB webcam which has been seen to spontaneously
> > disconnect when in use. That's a separate issue, but when it does it
> > throws a load of warnings into the kernel log if there is a file handle
> > on the device open at the time, even if not streaming.
> > 
> > I've reproduced this with a generic Logitech C270 webcam on:
> > - Ubuntu 16.04 (kernel 4.4.0-51) vanilla, and with the latest media tree
> > from linuxtv.org
> > - Ubuntu 14.04 (kernel 4.4.0-42) vanilla
> > - an old 3.10.x tree on an embedded device.
> > 
> > To reproduce:
> > - connect USB webcam.
> > - run a simple app that opens /dev/videoX, sleeps for a while, and then
> > closes the handle.
> > - disconnect the webcam whilst the app is running.
> > - read kernel logs - observe warnings. We get the disconnect logged as
> > it occurs, but the warnings all occur when the file descriptor is
> > closed. (A copy of the logs from my Ubuntu 14.04 machine is below.)
> > 
> > I can fully appreciate that the open file descriptor is holding
> > references to a now invalid device, but is there a way to avoid them? Or
> > do we really not care and have to put up with the log noise when doing
> > such silly things?
> 
> This is a known problem, caused by the driver core trying to remove the same 
> sysfs attributes group twice.

Ick, not good.

> The group is first removed when the USB device is disconnected. The input
> device and media device created by the uvcvideo driver are children of the
> USB interface device, which is deleted from the system when the camera is
> unplugged. Due to the parent-child relationship, all sysfs attribute groups
> of the children are removed.

Wait, why is the USB device being removed from sysfs at this point,
didn't the input and media subsystems grab a reference to it so that it
does not disappear just yet?

> Then, when the device node is closed, the media device and input device are
> unregistered, causing the corresponding devices to be deleted too. The driver
> core tries to remove the sysfs attributes groups related to those devices,
> and issues a warning as they have been removed already.
> 
> I'm not sure how to fix that, any hint from LKML would be appreciated.

Properly grab a reference to the USB device?  :)

If that's already happening, please let me know and I'll see what needs
to be done, but I think that should solve the issue for you.

thanks,

greg k-h

Re: [PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

2016-12-08 Thread Greg KH

On Fri, Dec 09, 2016 at 07:34:04AM +0100, Henrik Austad wrote:
> Instead of using get_user_pages_fast() and kmap_atomic() when writing
> to the trace_marker file, just allocate enough space on the ring buffer
> directly, and write into it via copy_from_user().
> 
> Writing into the trace_marker file used to allocate a temporary buffer
> to perform the copy_from_user(), as we didn't want to write into the
> ring buffer if the copy failed. But as a trace_marker write is supposed
> to be extremely fast, and allocating memory causes other tracepoints to
> trigger, Peter Zijlstra suggested using get_user_pages_fast() and
> kmap_atomic() to keep the user space pages in memory and read them
> directly.
> 
> Instead, just allocate the space in the ring buffer and use
> copy_from_user() directly. If it faults, return -EFAULT and write
> "<faulted>" into the ring buffer.
> 
> On architectures without an arch-specific get_user_pages_fast(), this
> will end up in the generic get_user_pages_fast(), which grabs
> mm->mmap_sem. Once you do this, writing to the trace_marker can
> suddenly cause priority inversions.
> 
> This is a backport of Steven Rostedt's patch [1] applied to 3.10.x, so the
> Signed-off-by chain is somewhat uncertain at this stage.
> 
> The patch compiles, boots and does not immediately explode on impact. By
> definition [2] it must therefore be perfect
> 
> 1) https://www.spinics.net/lists/kernel/msg2400769.html
> 2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html
> 
> Cc: Ingo Molnar 
> Cc: Henrik Austad 
> Cc: Peter Zijlstra 
> Cc: Steven Rostedt 
> Cc: sta...@vger.kernel.org
> 
> Suggested-by: Thomas Gleixner 
> Used-to-be-signed-off-by: Steven Rostedt 
> Backported-by: Henrik Austad 
> Tested-by: Henrik Austad 
> Signed-off-by: Henrik Austad 
> ---
>  kernel/trace/trace.c | 78 
> +++-
>  1 file changed, 22 insertions(+), 56 deletions(-)

What is the git commit id of this patch in Linus's tree?  And what
stable trees do you feel it should be applied to?

thanks,

greg k-h
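
The approach in this backport has three steps: reserve the event in the ring buffer first, copy straight into it, and fall back to a placeholder on a fault. Below is a minimal userspace sketch of that approach under stated assumptions:

- `marker_write()` and `copy_fn` are hypothetical.
- The caller-provided `slot` stands in for the reserved ring-buffer space.
- `copy_fn` follows copy_from_user()'s convention of returning the number of bytes that could not be copied.
- The `<faulted>` string matches the placeholder the upstream patch writes.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define FAULTED_STR "<faulted>"
#define FAULTED_LEN (sizeof(FAULTED_STR) - 1)

/* Models copy_from_user(): returns the number of bytes NOT copied. */
typedef size_t (*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Hypothetical marker write: `slot` (slot_size bytes) has already been
 * reserved in the ring buffer, so copy straight into it.
 */
static ssize_t marker_write(char *slot, size_t slot_size, copy_fn copy,
			    const char *ubuf, size_t cnt)
{
	/* Room for the longer of the text and the placeholder, plus a NUL. */
	size_t text_len = cnt > FAULTED_LEN ? cnt : FAULTED_LEN;

	if (text_len + 1 > slot_size)
		return -ENOSPC;
	if (copy(slot, ubuf, cnt) != 0) {
		/* Copy faulted: record the placeholder instead of partial data. */
		memcpy(slot, FAULTED_STR, sizeof(FAULTED_STR));
		return -EFAULT;
	}
	slot[cnt] = '\0';
	return (ssize_t)cnt;
}
```

Reserving room for the longer of the message and the placeholder up front lets the fault path overwrite the slot without a second allocation.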

Re: [PATCH v2] USB: OHCI: pxa27x:fix warnings and error

2016-12-08 Thread Greg Kroah-Hartman

On Thu, Dec 08, 2016 at 10:30:35PM +, manju goudar wrote:
> 
> 
> On Thu, Dec 8, 2016 at 4:49 PM, Greg Kroah-Hartman 
> 
> wrote:
> 
> On Wed, Dec 07, 2016 at 11:37:45PM +, csmanjuvi...@gmail.com wrote:
> > From: Manjunath Goudar 
> >
> > This patch fixes the following checkpatch.pl warnings and error:
> > WARNING: Block comments use * on subsequent lines
> > WARNING: Block comments use a trailing */ on a separate line
> > WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev,
> > ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
> > ERROR: space prohibited after that open parenthesis '('
> >
> > Signed-off-by: Manjunath Goudar 
> > Cc: Alan Stern 
> > Cc: Greg Kroah-Hartman 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> > changelog V1->V2:
> > Added the warnings and error message to the patch description.
> >
> >  drivers/usb/host/ohci-pxa27x.c | 24 +++-
> >  1 file changed, 11 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/usb/host/ohci-pxa27x.c b/drivers/usb/host/ohci-pxa27x.c
> > index 79efde8f..73445ab 100644
> > --- a/drivers/usb/host/ohci-pxa27x.c
> > +++ b/drivers/usb/host/ohci-pxa27x.c
> > @@ -106,7 +106,8 @@
> >  #define UHCHIE_UPS2IE        (1 << 12)       /* Power Sense Port2 IntEn */
> >  #define UHCHIE_UPS1IE        (1 << 11)       /* Power Sense Port1 IntEn */
> >  #define UHCHIE_TAIE  (1 << 10)       /* HCI Interface Transfer Abort
> > -                                        Interrupt Enable*/
> > +                                      * Interrupt Enable
> > +                                      */
> >  #define UHCHIE_HBAIE (1 << 8)        /* HCI Buffer Active IntEn */
> >  #define UHCHIE_RWIE  (1 << 7)        /* Remote Wake-up IntEn */
> >
> > @@ -128,14 +129,14 @@ struct pxa27x_ohci {
> >  #define to_pxa27x_ohci(hcd)  (struct pxa27x_ohci *)(hcd_to_ohci(hcd)->priv)
> >
> >  /*
> > -  PMM_NPS_MODE -- PMM Non-power switching mode
> > -      Ports are powered continuously.
> > -
> > -  PMM_GLOBAL_MODE -- PMM global switching mode
> > -      All ports are powered at the same time.
> > -
> > -  PMM_PERPORT_MODE -- PMM per port switching mode
> > -      Ports are powered individually.
> > + * PMM_NPS_MODE -- PMM Non-power switching mode
> > + *     Ports are powered continuously.
> > + *
> > + * PMM_GLOBAL_MODE -- PMM global switching mode
> > + *     All ports are powered at the same time.
> > + *
> > + * PMM_PERPORT_MODE -- PMM per port switching mode
> > + *     Ports are powered individually.
> >   */
> >  static int pxa27x_ohci_select_pmm(struct pxa27x_ohci *pxa_ohci, int mode)
> >  {
> > @@ -157,10 +158,7 @@ static int pxa27x_ohci_select_pmm(struct pxa27x_ohci *pxa_ohci, int mode)
> >               uhcrhdb |= (0x7<<17);
> >               break;
> >       default:
> > -             printk( KERN_ERR
> > -                     "Invalid mode %d, set to non-power switch mode.\n",
> > -                     mode );
> > -
> > +             dev_err(mode, "Invalid mode %d,set to non-power switch mode.\n");
> 
> Did you even compile this code?
> 
> 
> Yes, it compiled successfully.

I don't believe you.  Look at your change here and tell me how that
dev_err() function is correct.

> Please do so...
> 
> And don't mix different types of fixes in the same patch please.
> 
> By "don't mix", do you mean each type of warning fix should be a separate patch?

Yes please.

thanks,

greg k-h

Re: [PATCH v2] kexec: add cond_resched into kimage_alloc_crash_control_pages

2016-12-08 Thread Xunlei Pang

On 12/09/2016 at 01:13 PM, zhong jiang wrote:
> On 2016/12/8 17:41, Xunlei Pang wrote:
>> On 12/08/2016 at 10:37 AM, zhongjiang wrote:
>>> From: zhong jiang 
>>>
>>> A soft lockup occurs when I run trinity against the kexec_load syscall.
>>> The corresponding stack trace is as follows.
>>>
>>> [  237.235937] BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
>>> [  237.242699] Kernel panic - not syncing: softlockup: hung tasks
>>> [  237.248573] CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G   O L V---   3.10.0-327.28.3.35.zhongjiang.x86_64 #1
>>> [  237.259984] Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
>>> [  237.269752]  8187626b 18cfde31 88184c803e18 81638f16
>>> [  237.277471]  88184c803e98 8163278f 0008 88184c803ea8
>>> [  237.285190]  88184c803e48 18cfde31 88184c803e67
>>> [  237.292909] Call Trace:
>>> [  237.295404][] dump_stack+0x19/0x1b
>>> [  237.301352]  [] panic+0xd8/0x214
>>> [  237.306196]  [] watchdog_timer_fn+0x1cc/0x1e0
>>> [  237.312157]  [] ? watchdog_enable+0xc0/0xc0
>>> [  237.317955]  [] __hrtimer_run_queues+0xd2/0x260
>>> [  237.324087]  [] hrtimer_interrupt+0xb0/0x1e0
>>> [  237.329963]  [] ? call_softirq+0x1c/0x30
>>> [  237.335500]  [] local_apic_timer_interrupt+0x37/0x60
>>> [  237.342228]  [] smp_apic_timer_interrupt+0x3f/0x60
>>> [  237.348771]  [] apic_timer_interrupt+0x6d/0x80
>>> [  237.354967][] ? kimage_alloc_control_pages+0x80/0x270
>>> [  237.362875]  [] ? kmem_cache_alloc_trace+0x1ce/0x1f0
>>> [  237.369592]  [] ? do_kimage_alloc_init+0x1f/0x90
>>> [  237.375992]  [] kimage_alloc_init+0x12a/0x180
>>> [  237.382103]  [] SyS_kexec_load+0x20a/0x260
>>> [  237.387957]  [] system_call_fastpath+0x16/0x1b
>>>
>>> The first allocation of control pages may take too long because
>>> crashk_res.end can be set to a high value. Add cond_resched() to
>>> avoid the issue.
>>>
>>> The patch has been tested and the above issue no longer appears.
>>>
>>> Signed-off-by: zhong jiang 
>>> ---
>>>  kernel/kexec_core.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>> index 5616755..bfc9621 100644
>>> --- a/kernel/kexec_core.c
>>> +++ b/kernel/kexec_core.c
>>> @@ -441,6 +441,8 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
>>> while (hole_end <= crashk_res.end) {
>>> unsigned long i;
>>>  
>>> +   cond_resched();
>>> +
>> I can't see why it would take a long time to loop inside: the job it does
>> is simply to find a control area not overlapped with image->segment[].
>> In the loop "for (i = 0; i < image->nr_segments; i++)", @hole_end is
>> advanced to the end of the next nearby segment each time an overlap is
>> detected, and there are a limited number (<=16) of segments, so it won't
>> take long to locate the right area.
>>
>> Am I missing something?
>>
>> Regards,
>> Xunlei
> If crashkernel=auto is set on the command line, crashk_res.end will
> exceed 4G, and the first control page allocation will loop millions of
> times. If we set crashk_res.end to a higher value manually, you can imagine

How does "loop million times" happen? See my inline comments prefixed with "pxl".

kimage_alloc_crash_control_pages():
	while (hole_end <= crashk_res.end) {
		unsigned long i;

		if (hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT)
			break;
		/* See if I overlap any of the segments */
		for (i = 0; i < image->nr_segments; i++) {  // pxl: max 16 loops; existing segments don't overlap each other, though they may not be sorted.
			unsigned long mstart, mend;

			mstart = image->segment[i].mem;
			mend   = mstart + image->segment[i].memsz - 1;
			if ((hole_end >= mstart) && (hole_start <= mend)) {
				/* Advance the hole to the end of the segment */
				hole_start = (mend + (size - 1)) & ~(size - 1);
				hole_end   = hole_start + size - 1;
				break;  // pxl: if an overlap is found, break out of the for loop; @hole_end now starts after the overlapped segment, and the while loop runs again
			}
		}
		/* If I don't overlap any segments I have found my hole! */
		if (i == image->nr_segments) {
			pages = pfn_to_page(hole_start >> PAGE_SHIFT);
			image->control_page = hole_end;
			break;   // pxl: no overlap with any segment, so we have the result; break out of the while loop. END.
		}
	}

So in theory the worst case is (image->nr_segments + 1) iterations of the "while" loop, no?
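
The bound can be checked with a userspace model of the loop quoted above. Below is a minimal sketch, assuming a hypothetical `struct seg` and `find_hole()` in place of the kexec structures. It keeps the overlap test and the round-up advance, omits KEXEC_CRASH_CONTROL_MEMORY_LIMIT and pfn_to_page(), and counts passes of the while loop.

```c
#include <stddef.h>

/* Hypothetical segment record mirroring image->segment[i].mem / .memsz. */
struct seg {
	unsigned long mem;
	unsigned long memsz;
};

/*
 * Userspace model of the hole search. size must be a power of two.
 * Returns 1 and stores the hole start in *hole if a hole fits below
 * limit, else 0. *passes receives the number of while-loop passes.
 */
static int find_hole(const struct seg *segs, size_t nr, unsigned long start,
		     unsigned long limit, unsigned long size,
		     unsigned long *hole, unsigned *passes)
{
	unsigned long hole_start = (start + (size - 1)) & ~(size - 1);
	unsigned long hole_end = hole_start + size - 1;

	*passes = 0;
	while (hole_end <= limit) {
		size_t i;

		(*passes)++;
		for (i = 0; i < nr; i++) {
			unsigned long mstart = segs[i].mem;
			unsigned long mend = mstart + segs[i].memsz - 1;

			if (hole_end >= mstart && hole_start <= mend) {
				/* Advance the hole past this segment, as in the quoted loop. */
				hole_start = (mend + (size - 1)) & ~(size - 1);
				hole_end = hole_start + size - 1;
				break;
			}
		}
		if (i == nr) {
			/* No overlap with any segment: this is the hole. */
			*hole = hole_start;
			return 1;
		}
	}
	return 0;
}
```

Each pass that hits an overlap moves the hole past the end of that segment, and the hole only moves upward. With page-aligned segments, no segment can therefore be hit twice. The number of passes depends on nr_segments rather than on how large crashk_res.end is.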

Regards,
Xunlei

Re: [RFC][PATCH] HACK: usb: dwc2: Workaround case where GOTGCTL state is wrong

2016-12-08 Thread Chen Yu



On 2016/12/9 7:29, John Youn wrote:
> On 12/8/2016 2:43 PM, John Stultz wrote:
>> On Tue, Dec 6, 2016 at 7:52 PM, John Youn  wrote:
>>> On 12/6/2016 5:48 PM, John Stultz wrote:
 Hey John,
   Just wanted to send this by you, as it seems something is
 slightly off with the GOTGCTL state when removing an otg adapter
 cable. The following seems to work around the issue I'm seeing.


 When removing a USB-A to USB-otg adapter cable, we get a change
 status irq, and then in dwc2_conn_id_status_change, we
 erroneously see the GOTGCTL_CONID_B flag set. This causes us to
>>>
>>> This is the correct behavior for an OTG controller. When you unplug a
>>> cable or plug in the B end of a cable, the ID pin floats, indicating
>>> it is a B-Device.
>>>
>>> When you plug in an A-cable, which is what your adapter is, it will
>>> ground the pin, meaning A-device.
>>
>> Hrm... So normally, when I plug in the gadget cable into the OTG port,
>> I see the change_status irq comes in and the function sees:
>>
>> dwc2 f72c.usb: gotgctl=401
>> dwc2 f72c.usb: gotgctl.b.conidsts=1
>> dwc2 f72c.usb: Do port resume before switching to device mode
>> dwc2 f72c.usb: dwc2_hsotg_enqueue_setup: failed queue (-11)
>> dwc2 f72c.usb: new device is high-speed
>> dwc2 f72c.usb: new device is high-speed
>> dwc2 f72c.usb: new device is high-speed
>> dwc2 f72c.usb: new address 37
>> configfs-gadget gadget: high-speed config #1: b
>>
>> Then when I unplug the cable:
>>
>> dwc2 f72c.usb: gotgctl=220
>> dwc2 f72c.usb: gotgctl.b.conidsts=0
>> usb 1-1: reset high-speed USB device number 13 using dwc2
>>
>>
>>
>> When I plug in the OTG to USB-A adapter cable w/ a mouse plugged in
>> (note I see no change interrupt):
>>
>> usb 1-1: USB disconnect, device number 13
>> usb 1-1: new low-speed USB device number 14 using dwc2
>> input: Logitech USB Optical Mouse as
>> /devices/platform/soc/f72c.usb/usb1/1-1/1-1:1.0/0003:046D:C058.0003/input/input3
>> hid-generic 0003:046D:C058.0003: input,hidraw0: USB HID v1.11 Mouse
>> [Logitech USB Optical Mouse] on usb-f72c.usb-1/input0
>>
>>
>> Then unplugging the OTG to USB-A adapter cable w/ mouse:
>>
>> dwc2 f72c.usb: gotgctl=401
>> dwc2 f72c.usb: gotgctl.b.conidsts=1
>> dwc2 f72c.usb: Do port resume before switching to device mode
>> dwc2 f72c.usb: Waiting for Peripheral Mode, Mode=Host
>>
>> > patch from this thread>
>>
>> usb 1-1: USB disconnect, device number 14
>> dwc2 f72c.usb: gotgctl=220
>> dwc2 f72c.usb: gotgctl.b.conidsts=0
>> usb 1-1: new high-speed USB device number 15 using dwc2
>> hub 1-1:1.0: USB hub found
>> hub 1-1:1.0: 3 ports detected
>>
>>
>> So I only get the change irq when:
>> * I plug in a micro-usb-B cable for gadget mode
>> * I remove the micro-usb-B cable being used for gadget mode
>> * I remove a OTG to USB-A adapter
>>
> 
> That's very strange. It's opposite of how it's supposed to work.
> 
>> One slight quirk, is that I don't always see the change irq when
>> removing the OTG to USB, as if I plug in a highspeed mass-storage
>> device, instead of the low-speed mouse, I don't see the change
>> interrupt and the device shows up and disappears the same as when I
>> plug into the normal USB-A host ports on the board.
>>
>>
>>>> get stuck in the "while (!dwc2_is_device_mode(hsotg))" loop,
>>>> spitting out "Waiting for Peripheral Mode, Mode=Host" warnings
>>>> until it fails out many seconds later.
>>>
>>> This is weird. Once the ID pin goes to B, the core should become a
>>> peripheral and this should be reflected in the status registers.
>>>

>>>> This patch works around the issue by re-reading the GOTGCTL
>>>> state to check if GOTGCTL_CONID_B is still set and, if not,
>>>> restarting the change status logic.
>>>
>>> This also seems weird. The connector id status shouldn't go back to A,
>>> assuming you've left the cable unplugged.
>>
>> So I suspect this has something to do with the way the USB-A host
>> ports on the board are wired up. As removing the usb-b plug seems to
>> switch the device back into A mode.
>>
>> One quirk with this board is that the USB-A ports on the board do not
>> function if anything is in the OTG/B plug (which is frustrating to use
>> at times).
>>
> 
> Do you mean there are multiple A-ports on the board hooked up to the
> same controller?
> 
> If so, that would go a long way towards explaining things. Because the
> hsotg is a single-port OTG controller. If there are multiple A-ports,
> that means a hub has to be hard-wired internally to the port. But if
> that's the case the OTG function won't work because OTG doesn't work
> through a hub. It must go directly to the otg port. So there must be
> some external logic kicking-in to switch routing to the OTG port or to
> the HUB.
> 
> This would explain this behavior with the ID pin status. Since hooking
> up the HUB would make the controller an A-device whereas normally it
> would be a B-devic

Re: [PATCH v3 1/9] staging: fsl-mc: move bus driver out of staging

2016-12-08 Thread Greg KH

On Fri, Dec 09, 2016 at 12:36:26AM +, Stuart Yoder wrote:
> > -Original Message-
> > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > Sent: Thursday, December 08, 2016 10:05 AM
> > To: Stuart Yoder 
> > Cc: de...@driverdev.osuosl.org; ag...@suse.de; a...@arndb.de; 
> > linux-kernel@vger.kernel.org; Leo Li
> > ; Catalin Horghidan ; Ioana 
> > Ciornei
> > ; Laurentiu Tudor 
> > Subject: Re: [PATCH v3 1/9] staging: fsl-mc: move bus driver out of staging
> > 
> > On Wed, Dec 07, 2016 at 08:19:20PM +, Stuart Yoder wrote:
> > > > -Original Message-
> > > > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > > > Sent: Wednesday, December 07, 2016 9:53 AM
> > > > To: Stuart Yoder 
> > > > Cc: de...@driverdev.osuosl.org; linux-kernel@vger.kernel.org; 
> > > > ag...@suse.de; a...@arndb.de; Leo Li
> > > > ; Ioana Ciornei ; Catalin 
> > > > Horghidan
> > > > ; Laurentiu Tudor ; 
> > > > Ruxandra Ioana Radulescu
> > > > 
> > > > Subject: Re: [PATCH v3 1/9] staging: fsl-mc: move bus driver out of 
> > > > staging
> > > >
> > > > On Thu, Dec 01, 2016 at 04:41:26PM -0600, Stuart Yoder wrote:
> > > > > Move the source files out of staging into their final locations:
> > > > >   -include files in drivers/staging/fsl-mc/include go to include/linux/fsl
> > > > >   -irq-gic-v3-its-fsl-mc-msi.c goes to drivers/irqchip
> > > > >   -source in drivers/staging/fsl-mc/bus goes to drivers/bus/fsl-mc
> > > > >   -README.txt, providing an overview of DPAA, goes to
> > > > >    Documentation/dpaa2/overview.txt
> > > > >   -update MAINTAINERS with new location
> > > > >
> > > > > Delete other remaining staging files-- Makefile, Kconfig, TODO
> > > >
> > > > Ok, given that I haven't ever reviewed this code, I had a few questions
> > > > that I couldn't easily figure out by looking at your code:
> > > > - what is the lifecycle of your 'struct device' usage?  Who
> > > >   creates it, who frees it, and who accesses it?
> > >
> > > We embed a 'struct device' inside our bus specific device struct
> > > 'struct fsl_mc_device'.  So, when a new fsl-mc object is discovered
> > > on the bus during initial enumeration or hotplug we create a new
> > > 'struct fsl_mc_device' and do a device_initialize()/device_add().
> > > (see fsl_mc_device_add() for where this is done)
> > >
> > > 'struct device' is freed when a device is removed-- the reverse
> > > of the above.
> > 
> > Where is the device freed?  I see you trying to do some "odd" stuff in
> > fsl_mc_device_remove() by deleting and then putting a device structure.
> > I can't find a "release()" callback anywhere for your bus, where is it?
> > 
> > What happens when the reference count falls to 0 for your struct device?
> 
> Hrm...something seems wrong in free path, and I think this needs to
> be refactored.
> 
> IIRC, when German (former maintainer) wrote that code he loosely based
> it on the register/unregister platform bus code:
> 
> int platform_device_register(struct platform_device *pdev)
> {
> 	device_initialize(&pdev->dev);
> 	arch_setup_pdev_archdata(pdev);
> 	return platform_device_add(pdev);
> }
>
> void platform_device_unregister(struct platform_device *pdev)
> {
> 	platform_device_del(pdev);
> 	platform_device_put(pdev);
> }
> 
> ...I'm puzzling over how that code handles a refcount of zero
> as I see no 'release' callback anywhere, but I must be missing
> something.
> 
> In any case, we'll get this refactored.

Have you tried removing a device?  The kernel should complain loudly
about there not being a release function for your device.
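To illustrate what a release() callback is for, here is a userspace model of the embedded-device pattern. It assumes simplified stand-ins for struct device, container_of() and put_device(); all model_* names are hypothetical and this is not the driver-core API. The bus-specific structure embeds the generic device, and it is the release() callback, invoked when the last reference is dropped, that frees the containing structure, rather than the remove path freeing it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical model of struct device: a refcount plus release(). */
struct model_device {
	int refcount;
	void (*release)(struct model_device *dev);
};

/* Bus-specific wrapper embedding the generic device, in the way
 * struct fsl_mc_device embeds struct device. */
struct model_mc_device {
	int obj_id;
	struct model_device dev;
};

static int released_count;	/* lets callers observe that release ran */

/* release() frees the containing structure, not just the embedded device. */
static void model_mc_device_release(struct model_device *dev)
{
	struct model_mc_device *mc_dev =
		container_of(dev, struct model_mc_device, dev);

	released_count++;
	free(mc_dev);
}

/* Drop one reference; dropping the last one triggers release(). */
static void model_put_device(struct model_device *dev)
{
	assert(dev->refcount > 0);	/* catches an unbalanced put */
	if (--dev->refcount == 0)
		dev->release(dev);
}

static struct model_mc_device *model_mc_device_alloc(int obj_id)
{
	struct model_mc_device *mc_dev = calloc(1, sizeof(*mc_dev));

	if (!mc_dev)
		return NULL;
	mc_dev->obj_id = obj_id;
	mc_dev->dev.refcount = 1;	/* the initial reference */
	mc_dev->dev.release = model_mc_device_release;
	return mc_dev;
}
```

In this model the remove path only drops its own reference; if another user still holds one, the memory stays valid until that user drops it too.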

> > > > - root_dprc_count, why are you using an atomic variable for
> > > >   this?  What is it for other than "look, I'm running!"?
> > >
> > > There can be multiple root buses, and this variable simply tracks the
> > > count of them.
> > 
> > Why does it matter?
> > 
> > > It's atomic because there might be a theoretical race condition where 2
> > > buses are added at the same time.  The root buses are found in
> > > the device tree, so if there is no chance that device tree
> > > processing happens in parallel on multiple cores, then we could remove
> > > the atomic.
> > 
> > Why not just use a lock, or better yet, not care about a "count" at all?
> > I don't see you doing anything with the count, other than emitting a
> > WARN() if it drops down below 0 for some reason, or when you call
> > fsl_mc_bus_exists() which for some reason is exported yet no one uses
> > it...
> 
> We can drop this count.  At one time I think there was envisioned an 
> external user who needed it, but it's no longer the case.

Please do, we are trying to get rid of atomic_t abuse on other mailing
lists, and this one fits the pattern of "no real need for it" :)

> Given the additional refactoring, I think the fsl-mc bus driver needs
> to stay in staging for a bit.  In order to facilitate further review
> I'm going to refactor the patch series:
>   staging: f

Re: Still OOM problems with 4.9er kernels

2016-12-08 Thread Gerhard Wiesinger


Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes 
better).


./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7 17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Any chance to get it fixed in 4.9.0 release?

Ciao,
Gerhard


On 30.11.2016 08:20, Gerhard Wiesinger wrote:

Hello,

See also:
Bug 1314697 - Kernel 4.4.3-300.fc23.x86_64 is not stable inside a KVM VM
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Ciao,
Gerhard


On 30.11.2016 08:10, Gerhard Wiesinger wrote:

Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get 
more and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/ 


https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernel: 4.9.0-0.rc6.git2.1.fc26.x86_64


But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process gets killed


Please make the VM system work again in kernel 4.9, and make it use swap
correctly again.


Thnx.

Ciao,

Gerhard

Re: [PATCH] Staging: ks7010: ks7010_sdio.h: Fixed coding style errors

2016-12-08 Thread Greg KH

On Fri, Dec 09, 2016 at 12:29:21PM +, Manoj Sawai wrote:
> Errors - complex macro not in parentheses, and trailing whitespace
> Also fixed other small checkpatch warnings and checks.

If you ever say "also" in a changelog, that's a huge hint that the patch
needs to be broken up into multiple patches.  That is the case here,
please only do one type of coding style fix at a time.

thanks,

greg k-h

Re: Tearing down DMA transfer setup after DMA client has finished

2016-12-08 Thread Vinod Koul

On Thu, Dec 08, 2016 at 04:48:18PM +, Måns Rullgård wrote:
> Vinod Koul  writes:
> 
> > To make it efficient, disregarding your Sbox HW issue, the solution is
> > virtual channels. You can delink physical channels and virtual channels. If
> > one has SW controlled MUX then a channel can service any client. For few
> > controllers request lines are hard wired so they cant use any channel. But
> > if you dont have this restriction then driver can queue up many transactions
> > from different controllers.
> 
> Have you been paying attention at all?  This exactly what the driver
> ALREADY DOES.

And have you read what the question was?

-- 
~Vinod

[PATCH] Staging: ks7010: ks7010_sdio.h: Fixed coding style errors

2016-12-08 Thread Manoj Sawai

Errors - complex macro not in parentheses, and trailing whitespace
Also fixed other small checkpatch warnings and checks.

Signed-off-by: Manoj Sawai 
---
 drivers/staging/ks7010/ks7010_sdio.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/ks7010/ks7010_sdio.h b/drivers/staging/ks7010/ks7010_sdio.h
index 0f5fd848e23d..f12d41835b03 100644
--- a/drivers/staging/ks7010/ks7010_sdio.h
+++ b/drivers/staging/ks7010/ks7010_sdio.h
@@ -46,7 +46,7 @@
  */
 #define WSTATUS_RSIZE  0x14
 #define WSTATUS_MASK   0x80/* Write Status Register value */
-#define RSIZE_MASK 0x7F/* Read Data Size Register value [10:4] */
+#define RSIZE_MASK 0x7F// Read Data Size Register value [10:4]
 
 /* ARM to SD interrupt Enable */
 #define INT_ENABLE 0x20
@@ -81,11 +81,11 @@
 
 /* AHB Data Window  0x01-0x01 */
 #define DATA_WINDOW	0x01
-#define WINDOW_SIZE	64*1024
+#define WINDOW_SIZE	(64 * 1024)
 
 #define KS7010_IRAM_ADDRESS	0x0600
 
-/* 
+/*
  * struct define
  */
 struct hw_info_t {
@@ -115,7 +115,7 @@ struct ks_sdio_card {
 struct tx_device_buffer {
unsigned char *sendp;   /* pointer of send req data */
unsigned int size;
-   void (*complete_handler) (void *arg1, void *arg2);
+   void (*complete_handler)(void *arg1, void *arg2);
void *arg1;
void *arg2;
 };
@@ -142,6 +142,7 @@ struct rx_device {
unsigned int qtail; /* rx buffer queue last pointer */
spinlock_t rx_dev_lock;
 };
+
 #define	ROM_FILE "ks7010sd.rom"
 
 #endif /* _KS7010_SDIO_H */
-- 
2.11.0
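Why checkpatch asks for the parentheses on WINDOW_SIZE: the unparenthesized macro changes meaning once it is used inside a larger expression. The two helper functions below are hypothetical callers for illustration, not code from the ks7010 driver.

```c
#include <assert.h>

/* Before the patch: no parentheses around the expression. */
#define WINDOW_SIZE_UNSAFE 64*1024
/* After the patch. */
#define WINDOW_SIZE_SAFE (64 * 1024)

/* Hypothetical caller: how many whole windows fit in 'len' bytes? */
static unsigned long windows_unsafe(unsigned long len)
{
	/* Expands to len / 64*1024, which C evaluates as (len / 64) * 1024. */
	return len / WINDOW_SIZE_UNSAFE;
}

static unsigned long windows_safe(unsigned long len)
{
	/* Expands to len / (64 * 1024), as intended. */
	return len / WINDOW_SIZE_SAFE;
}
```

For a 128 KiB buffer (131072 bytes), windows_safe() returns 2, while windows_unsafe() returns 2097152, because the division binds to 64 before the multiplication by 1024 is applied.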

Re: netlink: GPF in sock_sndtimeo

2016-12-08 Thread Cong Wang

On Thu, Dec 8, 2016 at 10:02 PM, Richard Guy Briggs  wrote:
> I also tried to extend Cong Wang's idea to attempt to proactively respond to a
> NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
> stack dump using mutex_lock(&audit_cmd_mutex) in the notifier callback.
> Eliminating the lock, since the sock is dead anyway, eliminates the error.
>
> Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
> get the test case to compile.

It doesn't look safe, because 'audit_sock', 'audit_nlk_portid' and 'audit_pid'
are updated as a whole, and that update can race between audit_receive_msg()
and NETLINK_URELEASE.
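To illustrate what "updated as a whole" means: in the userspace sketch below (pthreads; struct audit_state and the audit_state_* functions are hypothetical names), every writer and reader of the triple takes the same lock, so a reader can never observe, say, a NULL socket together with a stale pid. This is not the audit code, and it says nothing about which lock is safe to take from the NETLINK_URELEASE notifier.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for audit_sock / audit_pid / audit_nlk_portid. */
struct audit_state {
	void *sock;
	int pid;
	unsigned int portid;
};

static struct audit_state state;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Invariant: either all three fields are set or none are. */
static bool audit_state_consistent(struct audit_state s)
{
	return (s.sock == NULL) == (s.pid == 0) &&
	       (s.pid == 0) == (s.portid == 0);
}

/* Set all three fields as a single unit. */
static void audit_state_set(void *sock, int pid, unsigned int portid)
{
	pthread_mutex_lock(&state_lock);
	state.sock = sock;
	state.pid = pid;
	state.portid = portid;
	pthread_mutex_unlock(&state_lock);
}

/* Reset only if 'sock' is still the registered one, mirroring the
 * "if (sock == audit_sock)" check in audit_net_exit(). */
static void audit_state_reset_if(void *sock)
{
	pthread_mutex_lock(&state_lock);
	if (state.sock == sock) {
		state.sock = NULL;
		state.pid = 0;
		state.portid = 0;
	}
	pthread_mutex_unlock(&state_lock);
}

/* Take a snapshot under the same lock: never a half-updated triple. */
static struct audit_state audit_state_get(void)
{
	struct audit_state snap;

	pthread_mutex_lock(&state_lock);
	snap = state;
	pthread_mutex_unlock(&state_lock);
	assert(audit_state_consistent(snap));
	return snap;
}
```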


> @@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
> struct audit_net *aunet = net_generic(net, audit_net_id);
> struct sock *sock = aunet->nlsk;
> +
> +   mutex_lock(&audit_cmd_mutex);
> if (sock == audit_sock) {
> audit_pid = 0;
> +   audit_nlk_portid = 0;
> audit_sock = NULL;
> }
> +   mutex_unlock(&audit_cmd_mutex);
>

If you decide to use NETLINK_URELEASE notifier, the above piece is no
longer needed, the net_exit path simply releases a refcnt.

[PATCH] firmware: dmi_scan: Always show system identification string

2016-12-08 Thread Kefeng Wang

Print dmi_ids_string consistently for SMBIOS 2.x and SMBIOS 3.x, and
always show the system identification string, such as vendor,
product/board name and BIOS information.

Signed-off-by: Kefeng Wang 
---
 drivers/firmware/dmi_scan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 88bebe1..54be60e 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -560,7 +560,7 @@ static int __init dmi_present(const u8 *buf)
dmi_ver >> 16, (dmi_ver >> 8) & 0xFF);
}
dmi_format_ids(dmi_ids_string, sizeof(dmi_ids_string));
-   printk(KERN_DEBUG "DMI: %s\n", dmi_ids_string);
+   pr_info("DMI: %s\n", dmi_ids_string);
return 0;
}
}
@@ -588,7 +588,7 @@ static int __init dmi_smbios3_present(const u8 *buf)
dmi_ver >> 16, (dmi_ver >> 8) & 0xFF,
dmi_ver & 0xFF);
dmi_format_ids(dmi_ids_string, sizeof(dmi_ids_string));
-   pr_debug("DMI: %s\n", dmi_ids_string);
+   pr_info("DMI: %s\n", dmi_ids_string);
return 0;
}
}
-- 
1.7.12.4

RE: ATH9 driver issues on ARM64

2016-12-08 Thread Bharat Kumar Gogada

Sorry, I forgot to add the kernel version: we are using the 4.6 kernel.

> Hi,
> Can anyone tell when exactly the chip sends ASSERT and DEASSERT in the driver?
> It might help us debug the issue further.
> 
> Thanks & Regards,
> Bharat
> 
> > >  > [+cc Kalle, ath9k list]
> >
> > Thanks, but please also CC linux-wireless. Full thread below for the folks there.
> >
> > >> On Thu, Dec 08, 2016 at 01:49:42PM +, Bharat Kumar Gogada wrote:
> > >> > Hi,
> > >> >
> > >> > Has anyone tested the Atheros ATH9 driver
> > >> > (drivers/net/wireless/ath/ath9k/) on ARM64?  The endpoint is a
> > >> > TP-Link wifi card which supports only legacy interrupts.
> > >>
> > >> If it works on other arches and the arm64 PCI enumeration works, my
> > >> first guess would be an INTx issue, e.g., maybe the driver is
> > >> waiting for an interrupt that never arrives.
> > > We are not sure for now.
> > >>
> > >> > We are trying to test it on ARM64 with
> > >> > (drivers/pci/host/pcie-xilinx-nwl.c) as root port.
> > >> >
> > >> > EP is getting enumerated and able to link up.
> > >> >
> > >> > But when we start a scan, the system hangs.
> > >>
> > >> When you say the system hangs when you start a scan, I assume you
> > >> mean a wifi scan, not the PCI enumeration.  A problem with a wifi
> > >> scan might cause a *process* to hang, but it shouldn't hang the
> > >> entire system.
> > >>
> > > Yes wifi scan.
> > >> > When we took a trace, we saw that after we start the scan an assert
> > >> > message is sent, but there is no deassert from the endpoint.
> > >>
> > >> Are you talking about a trace from a PCIe analyzer?  Do you see an
> > >> Assert_INTx PCIe message on the link?
> > >>
> > > Yes, a LeCroy trace. We do see Assert_INTx and Deassert_INTx
> > > happening when we bring the interface link up.
> > > When we have fewer debug prints in the Atheros driver and do a wifi
> > > scan, we see Assert_INTx but never Deassert_INTx.
> > >> > What might cause the endpoint not to send a deassert?
> > >>
> > >> If the endpoint doesn't send a Deassert_INTx message, I expect that
> > >> would mean the driver didn't service the interrupt and remove the
> > >> condition that caused the device to assert the interrupt in the
> > >> first place.
> > >>
> > >> If the driver didn't receive the interrupt, it couldn't service it,
> > >> of course.  You could add a printk in the ath9k interrupt service
> > >> routine to see if you ever get there.
> > >>
> > > The interrupt behavior changes with the amount of debug prints we
> > > add. (I kept many prints to aid debugging.)
> > > root@Xilinx-ZCU102-2016_3:~# iw dev wlan0 scan
> > > [   83.064675] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.069486] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.074257] ath9k_hw_kill_interrupts793
> > > [   83.078260] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.083107] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.087882] ath9k_hw_kill_interrupts793
> > > [   83.095450] ath9k_hw_enable_interrupts  821
> > > [   83.099557] ath9k_hw_enable_interrupts  825
> > > [   83.103721] ath9k_hw_enable_interrupts  832
> > > [   83.107887] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.112748] AR_SREV_9100 0
> > > [   83.115438] ath9k_hw_enable_interrupts  848
> > > [   83.119607] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.124389] ath9k_hw_intrpend   762
> > > [   83.127761] (AR_SREV_9340(ah) val 0
> > > [   83.131234] ath9k_hw_intrpend   767
> > > [   83.134628] ath_isr 603
> > > [   83.137134] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.141995] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.146771] ath9k_hw_kill_interrupts793
> > > [   83.150864] ath9k_hw_enable_interrupts  821
> > > [   83.154971] ath9k_hw_enable_interrupts  825
> > > [   83.159135] ath9k_hw_enable_interrupts  832
> > > [   83.163300] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.168161] AR_SREV_9100 0
> > > [   83.170852] ath9k_hw_enable_interrupts  848
> > > [   83.170855] ath9k_hw_intrpend   762
> > > [   83.178398] (AR_SREV_9340(ah) val 0
> > > [   83.181873] ath9k_hw_intrpend   767
> > > [   83.185265] ath_isr 603
> > > [   83.187773] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.192635] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.197411] ath9k_hw_kill_interrupts793
> > > [   83.201414] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.206258] ath9k_hw_enable_interrupts  821
> > > [   83.210368] ath9k_hw_enable_interrupts  825
> > > [   83.214531] ath9k_hw_enable_interrupts  832
> > > [   83.218698] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.223558] AR_SREV_9100 0
> > > [   83.226243] ath9k_hw_enable_interrupts  848
> > > [   83.226246] ath9k_hw_intrpend   762
> > > [   83.233794] (AR_SREV_9340(ah) val 0
> > > [   83.237268] ath9k_hw_intrpend   767
> > > [   83.240661] ath_isr 603
> > > [   83.243169] ath9k: ath9k_iowrite32 ff800a400024
> > > [   83.248030] ath9k: ath9k_ioread32 ff800a400024
> > > [   83.252806] ath9
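Bjorn's explanation can be made concrete with a toy model of level-triggered INTx emulation. The structure and function names below are hypothetical and do not correspond to ath9k registers: the endpoint sends Assert_INTx when its (pending & enabled) level goes high, and Deassert_INTx only when the driver has cleared the last pending cause. An ISR that never runs, or that never acknowledges the cause, therefore leaves the interrupt asserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an endpoint's interrupt logic; not ath9k registers. */
struct ep_model {
	uint32_t pending;	/* interrupt cause bits latched by the device */
	uint32_t enabled;	/* interrupt enable mask */
	bool intx_asserted;	/* current virtual INTx level */
	int assert_msgs;	/* Assert_INTx messages sent on the link */
	int deassert_msgs;	/* Deassert_INTx messages sent on the link */
};

/* INTx is level-sensitive: a message is sent only when the level changes. */
static void ep_update_intx(struct ep_model *ep)
{
	bool level = (ep->pending & ep->enabled) != 0;

	if (level && !ep->intx_asserted)
		ep->assert_msgs++;
	else if (!level && ep->intx_asserted)
		ep->deassert_msgs++;
	ep->intx_asserted = level;
	/* Assert and Deassert messages always alternate. */
	assert(ep->assert_msgs - ep->deassert_msgs == (ep->intx_asserted ? 1 : 0));
}

/* Device side: an event latches a cause bit. */
static void ep_raise(struct ep_model *ep, uint32_t cause)
{
	ep->pending |= cause;
	ep_update_intx(ep);
}

/* Driver side: the ISR acknowledges (clears) the causes it serviced. */
static void driver_isr_ack(struct ep_model *ep, uint32_t serviced)
{
	ep->pending &= ~serviced;
	ep_update_intx(ep);
}
```

In the model, acknowledging only one of two pending causes keeps the line asserted; the Deassert_INTx message appears only after the last cause is cleared.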

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko

On Fri 09-12-16 06:38:04, Al Viro wrote:
> On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:
> 
> > > Easier to handle those in vmalloc() itself.
> > 
> > I think there were some attempts in the past but some of the code paths
> > are buried too deep and adding gfp_mask all the way down there seemed
> > like a major surgery.
> 
> No need to propagate gfp_mask - the same trick XFS is doing right now can
> be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
> patches and post them tomorrow after I get some sleep.

That would work as an immediate mitigation, no question about that. But
what I've tried to point out in the reply to Dave is that in the long term we
shouldn't hide this trickiness inside vmalloc; rather, we should handle
the users who request NOFS/NOIO context from vmalloc. We
already have a scope API for NOIO and I want to add the same for NOFS.
I believe a much saner approach is to use the API at the places
which really start/stop the scope where reclaim recursion is dangerous
(e.g. the transaction context), rather than using GFP_NOFS randomly, because
that approach has proven not to work properly over the years. We have so many
places using GFP_NOFS, just because nobody bothered to think whether it is
needed ("it must be safe for sure"), that it is not funny.
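The scope API Michal mentions works by setting a per-task flag where the dangerous section starts and restoring the previous state where it ends; allocations then derive their constraints from the task instead of from each call site's gfp mask. Below is a userspace model of that save/restore pattern, patterned on memalloc_noio_save()/memalloc_noio_restore(); the model_* names and MODEL_PF_NOFS are hypothetical, and the NOFS counterpart is only proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task flag, modelled as a thread-local variable. */
#define MODEL_PF_NOFS 0x1u

static _Thread_local unsigned int task_flags;

/* Enter a NOFS scope; returns the previous state for restore. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave the scope, restoring whatever was in effect before it. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_NOFS) == 0);	/* only the NOFS bit is saved */
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* An allocation site asks the task whether filesystem reclaim is allowed. */
static bool model_may_enter_fs(void)
{
	return !(task_flags & MODEL_PF_NOFS);
}
```

Because restore puts back the saved state rather than clearing the flag unconditionally, nested scopes (e.g. a transaction inside a transaction) behave correctly: leaving the inner scope keeps the outer one in force.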

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Andrew Donnellan


On 09/12/16 17:24, Linas Vepstas wrote:

> I suppose I'm confused, but I recall that link resets are non-fatal.
> Fatal errors typically require that the pci adapter be completely
> reset, any adapter firmware to be reloaded from scratch, the device
> driver has to kill all device state and start from scratch. It's huge.


Is there a difference in terminology between an AER fatal error and what 
EEH/IBM people think of as a fatal error?



> If the fatal error is on pci device that is under a block device
> holding a file system, then (usually) there is no way to recover,
> because the block layer (and file system) cannot deal with a block
> device that disappeared and then reappeared some few seconds later.
> (maybe some future zfs or lvm or btrfs might be able to deal with
> this, but not today)


Is this still true? I'm not at all familiar with the block device side 
of it, but the cxlflash driver has reasonably full EEH support, 
including surviving a full PHB fence and complete reset.


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

Re: [PATCH 4/7] blk-flush: run the queue when inserting blk-mq flush

2016-12-08 Thread Hannes Reinecke


On 12/08/2016 09:13 PM, Jens Axboe wrote:

Currently we pass in to run the queue async, but don't flag the
queue to be run. We don't need to run it async here, but we should
run it. So fixup the parameters.

Signed-off-by: Jens Axboe 
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 1bdbb3d3e5f5..27a42dab5a36 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -426,7 +426,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops) {
-   blk_mq_insert_request(rq, false, false, true);
+   blk_mq_insert_request(rq, false, true, false);
} else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
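A small model of the parameter fix, assuming the 4.9-era prototype blk_mq_insert_request(rq, at_head, run_queue, async), in which async is only consulted when run_queue is true. model_insert() and the MODEL_* values are hypothetical; the point is that the old call asked for an asynchronous run of a queue it never ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of inserting a request. */
enum model_action { MODEL_NOT_RUN, MODEL_RUN_SYNC, MODEL_RUN_ASYNC };

/* Models the (run_queue, async) arguments of blk_mq_insert_request(). */
static enum model_action model_insert(bool run_queue, bool async)
{
	/* 'async' only matters when the queue is actually run. */
	if (!run_queue)
		return MODEL_NOT_RUN;
	return async ? MODEL_RUN_ASYNC : MODEL_RUN_SYNC;
}
```

Under this model, the old arguments (false, false, true) map to MODEL_NOT_RUN and the new arguments (false, true, false) map to MODEL_RUN_SYNC, which matches the changelog.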


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 2/7] blk-mq: abstract out blk_mq_dispatch_rq_list() helper

2016-12-08 Thread Hannes Reinecke


On 12/08/2016 09:13 PM, Jens Axboe wrote:

Takes a list of requests, and dispatches it. Moves any residual
requests to the dispatch list.

Signed-off-by: Jens Axboe 
---
 block/blk-mq.c | 85 --
 block/blk-mq.h |  1 +
 2 files changed, 48 insertions(+), 38 deletions(-)
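The behaviour described in the changelog can be sketched as a loop that stops at the first request the driver cannot take and hands everything left over to the dispatch list. This is a hypothetical userspace model (integers stand in for requests, and model_queue_rq() simply reports busy once a fixed capacity is reached); it does not reproduce blk-mq's list handling, ordering or return codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "driver" that accepts at most 'capacity' requests. */
struct model_hw {
	size_t capacity;
	size_t accepted;
};

static bool model_queue_rq(struct model_hw *hw, int rq)
{
	(void)rq;
	if (hw->accepted == hw->capacity)
		return false;		/* busy: request not taken */
	hw->accepted++;
	return true;
}

/*
 * Dispatch requests from 'list' in order. On the first busy return, move
 * that request and everything after it to 'dispatch' (which must have room
 * for them). Returns the number of requests the driver accepted.
 */
static size_t model_dispatch_rq_list(struct model_hw *hw,
				     const int *list, size_t n,
				     int *dispatch, size_t *ndispatch)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!model_queue_rq(hw, list[i]))
			break;
	}
	/* Residual requests go to the dispatch list for a later run. */
	for (size_t j = i; j < n; j++)
		dispatch[(*ndispatch)++] = list[j];
	return i;
}
```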


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 3/7] elevator: make the rqhash helpers exported

2016-12-08 Thread Hannes Reinecke


On 12/08/2016 09:13 PM, Jens Axboe wrote:

Signed-off-by: Jens Axboe 
---
 block/elevator.c | 8 
 include/linux/elevator.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

On Fri, Dec 9, 2016 at 2:37 PM, Cao jin  wrote:
>
>
> On 12/09/2016 02:24 PM, Linas Vepstas wrote:
>> I suppose I'm confused, but I recall that link resets are non-fatal.
>> Fatal errors typically require that the pci adapter be completely
>> reset, any adapter firmware to be reloaded from scratch, the device
>> driver has to kill all device state and start from scratch. It's huge.
>> If the fatal error is on pci device that is under a block device
>> holding a file system, then (usually) there is no way to recover,
>> because the block layer (and file system) cannot deal with a block
>> device that disappeared and then reappeared some few seconds later.
>> (maybe some future zfs or lvm or btrfs might be able to deal with
>> this, but not today)
>>
>> By contrast, link resets are far more gentle: the device driver might
>> have to discard some half-full FIFO's, or cancel some in-flight
>> commands, but can otherwise gracefully recover without telling the
>> higher layers that there were any problems.
>>
>> --linas
>>
>
> I am little confused too, even not sure if we are talking the same
> *fatal error*, I am talking the fatal error defined in PCI Express spec,
> chapter 6.2.2.2.1:
>
> Fatal errors are uncorrectable error conditions which render the
> particular Link and related hardware unreliable. For Fatal errors, a
> reset of the components on the Link may be required to return to
> reliable operation. Platform handling of Fatal errors, and any efforts
> to limit the effects of these errors, is platform implementation specific.
>
> Link reset means set *secondary bus reset* bit in pci bridge config
> space, can reset the link and device simultaneously, is the strongest
> kind of reset as I know.

OK, well, it's been far too many years, and I don't have the PCI spec
at my fingertips.
Isn't there a link reset that can be performed without forcing a device reset?

The intent was that some PCI link errors are due to vibration,
ground-bounce, humidity, etc., and that these errors can be detected
and do not corrupt the device state or the device driver state.  Since
they are not associated with data corruption (or rather, the
corruption is local to the link), they can be recovered by resetting
just the link, without resetting the whole adapter. They may require
resetting some device-driver state, but not all of it.

However, this was all decided before the PCI-E spec was written, so
maybe the newer PCI-E specs now say something different.
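For reference, the AER behaviour Cao points to can be summarised as a severity check. The sketch below is a hypothetical model, not the kernel's types: the reset-only-on-fatal rule is taken from Cao's reading of do_recovery(), while the frozen/normal channel-state mapping is an assumption about the AER core of this era and should be checked against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's enums. */
enum model_severity { MODEL_AER_NONFATAL, MODEL_AER_FATAL };
enum model_channel_state { MODEL_IO_NORMAL, MODEL_IO_FROZEN };

struct model_recovery {
	enum model_channel_state state;	/* reported to error_detected() */
	bool link_reset;		/* whether reset_link() is attempted */
};

static struct model_recovery model_do_recovery(enum model_severity sev)
{
	struct model_recovery r;

	r.state = (sev == MODEL_AER_FATAL) ? MODEL_IO_FROZEN : MODEL_IO_NORMAL;
	/* Per Cao's reading: the link is reset only for fatal errors. */
	r.link_reset = (sev == MODEL_AER_FATAL);
	return r;
}
```

Under this model, a non-fatal error never reaches the link_reset() callback, which is the crux of the disagreement about the documentation wording.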

--linas

>
>> On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>>>
>>>
>>> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
 On Thu, 8 Dec 2016 16:16:14 +0800
 Cao jin  wrote:

>  The platform resets the link, and then calls the link_reset() callback
>  on all affected device drivers.  This is a PCI-Express specific state
> -and is done whenever a non-fatal error has been detected that can be
> +and is done whenever a fatal error has been detected that can be
>  "solved" by resetting the link. This call informs the driver of the

 As far as I can tell, the original text was correct here; why do you
 think this change needs to be made?

>>>
>>> See do_recovery() in aer core, reset_link() is called only seeing fatal
>>> error.
>>>
>>> --
>>> Sincerely,
>>> Cao jin
>>>
>>>
>>
>>
>>
>
> --
> Sincerely,
> Cao jin
>
>

Re: fs, net: deadlock between bind/splice on af_unix

2016-12-08 Thread Al Viro

On Thu, Dec 08, 2016 at 10:32:00PM -0800, Cong Wang wrote:

> > Why do we do autobind there, anyway, and why is it conditional on
> > SOCK_PASSCRED?  Note that e.g. for SOCK_STREAM we can bloody well get
> > to sending stuff without autobind ever done - just use socketpair()
> > to create that sucker and we won't be going through the connect()
> > at all.
> 
> In the case Dmitry reported, unix_dgram_sendmsg() calls unix_autobind(),
> not SOCK_STREAM.

Yes, I've noticed.  What I'm asking is what in there needs autobind triggered
on sendmsg and why doesn't the same need affect the SOCK_STREAM case?

> I guess some lock, perhaps the u->bindlock could be dropped before
> acquiring the next one (sb_writer), but I need to double check.

Bad idea, IMO - do you *want* autobind being able to come through while
bind(2) is busy with mknod?

Re: [PATCH] linux/types.h: enable endian checks for all sparse builds

2016-12-08 Thread Madhani, Himanshu

Hi Mike/Bart,

On 12/8/16, 8:17 AM, "virtualization-boun...@lists.linux-foundation.org on behalf of Michael S. Tsirkin" wrote:

>On Thu, Dec 08, 2016 at 06:38:11AM +, Bart Van Assche wrote:
>> On 12/07/16 21:54, Michael S. Tsirkin wrote:
>> > On Thu, Dec 08, 2016 at 05:21:47AM +, Bart Van Assche wrote:
>> >> Additionally, there are notable exceptions to the rule that most drivers
>> >> are endian-clean, e.g. drivers/scsi/qla2xxx. I would appreciate it if it
>> >> would remain possible to check such drivers with sparse without enabling
>> >> endianness checks. Have you considered changing #ifdef __CHECK_ENDIAN__
>> >> into e.g. #ifndef __DONT_CHECK_ENDIAN__?
>> >
>> > The right thing is probably just to fix these, isn't it?
>> > Until then, why not just ignore the warnings?
>> 
>> Neither option is realistic. With endian-checking enabled the qla2xxx 
>> driver triggers so many warnings that it becomes a real challenge to 
>> filter the non-endian warnings out manually:
>> 
>> $ for f in "" CF=-D__CHECK_ENDIAN__; do make M=drivers/scsi/qla2xxx C=2\
>>  $f | &grep -c ': warning:'; done
>> 4
>> 752
>
>You can always revert this patch in your tree, or whatever.  It does not
>look like this will get fixed otherwise.
>
>> If you think it would be easy to fix the endian warnings triggered by 
>> the qla2xxx driver, you are welcome to try to fix these.
>> 
>> Bart.
>
>Yea, this hardware was designed by someone who thought mixing
>LE and BE all over the place is a good idea.
>But who said it should be easy?
>
>Maybe this change will be enough to motivate the maintainers.
>
>Here's a minor buglet for you as a motivator:
>
>if (ct_rsp->header.response !=
>cpu_to_be16(CT_ACCEPT_RESPONSE)) {
>ql_dbg(ql_dbg_disc + ql_dbg_buffer, vha, 0x2077,
>"%s failed rejected request on port_id: %02x%02x%02x Compeltion status 0x%x, response 0x%x\n",
>routine, vha->d_id.b.domain,
>vha->d_id.b.area, vha->d_id.b.al_pa, comp_status, ct_rsp->header.response);
>
>
>response is BE and isn't printed correctly.
>
>another:
>
>eiter->a.max_frame_size = cpu_to_be32(eiter->a.max_frame_size);
>size += 4 + 4;
>
>ql_dbg(ql_dbg_disc, vha, 0x20bc,
>"Max_Frame_Size = %x.\n", eiter->a.max_frame_size);
>
>printed too late, it's BE by that time.
>
>Here's another suspicious line
>
>ctio24->u.status1.flags = (atio->u.isp24.attr << 9) |
>cpu_to_le16(CTIO7_FLAGS_STATUS_MODE_1 |
>CTIO7_FLAGS_TERMINATE);
>
>shifting attr by 9 bits gives different results on BE and LE,
>mixing it with le16 looks rather strange.
>
>Another:
>
>ha->flags.dport_enabled =
>(mid_init_cb->init_cb.firmware_options_1 & BIT_7) != 0;
>
>BIT_7 is native endian, firmware_options_1 is LE I think.
>
>
>
>Look at qla27xx_find_valid_image as well.
>
>if (pri_image_status.signature != QLA27XX_IMG_STATUS_SIGN)
>
>qla27xx_image_status seems to be data coming from flash, but is
>somehow native-endian? Maybe ...
>
>
>lun = a->u.isp24.fcp_cmnd.lun;
>
>I think lun here is in hardware format (le?), code treats it
>as native.
>
>
>Not to speak about interface abuse all over the place.
>How about this:
>
>uint32_t *
>qla24xx_read_flash_data(scsi_qla_host_t *vha, uint32_t *dwptr, uint32_t faddr,
>    uint32_t dwords)
>{
>	uint32_t i;
>	struct qla_hw_data *ha = vha->hw;
>
>	/* Dword reads to flash. */
>	for (i = 0; i < dwords; i++, faddr++)
>		dwptr[i] = cpu_to_le32(qla24xx_read_flash_dword(ha,
>		    flash_data_addr(ha, faddr)));
>
>	return dwptr;
>}
>
>OK so we convert to LE ...
>
>qla24xx_read_flash_data(vha, dcode, faddr, 4); 
>
>risc_addr = be32_to_cpu(dcode[2]);
>*srisc_addr = *srisc_addr == 0 ? risc_addr : *srisc_addr;
>risc_size = be32_to_cpu(dcode[3]);
>
>then happily assume it's BE.
>
>And again, coming from flash, it's unlikely to actually be in the native
>endian-ness as callers seem to assume. I'm guessing it's all BE.
>
>I poked at it a bit and was able to cut down # of warnings
>from 1700 to 1400 in an hour. Someone familiar with the code
>should look at it.

We’ll take a look and send patches to resolve these warnings. 

>
>-- 
>MST
>___
>Virtualization mailing list
>virtualizat...@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/virtualization
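The endianness slips listed above (a big-endian field printed raw, a native-endian mask tested against a little-endian word) all break the same rule: convert at the boundary, then work in native order. A minimal userspace sketch of that rule, assuming a hypothetical wire header with one big-endian and one little-endian 16-bit field; get_be16(), get_le16(), struct wire_hdr and OPT_BIT_7 are illustrative names only, and inside the driver the kernel's be16_to_cpu()/le16_to_cpu() helpers would play this role:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a big-endian 16-bit field from raw bytes; correct on any host. */
static uint16_t get_be16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode a little-endian 16-bit field from raw bytes; correct on any host. */
static uint16_t get_le16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical header as delivered by hardware: one BE field, one LE field. */
struct wire_hdr {
	uint8_t response[2];	/* big-endian */
	uint8_t fw_opts[2];	/* little-endian */
};

#define OPT_BIT_7 (1u << 7)	/* native-endian mask */

/* Convert before printing, so the value reads the same on BE and LE hosts. */
static void log_response(const struct wire_hdr *h)
{
	printf("response 0x%x\n", (unsigned int)get_be16(h->response));
}

/* Convert the LE word to native order before applying a native mask. */
static int opt_bit7_set(const struct wire_hdr *h)
{
	return (get_le16(h->fw_opts) & OPT_BIT_7) != 0;
}
```

Because the decoders work on bytes rather than on a host-order integer, the same source gives the same answers on big- and little-endian machines; sparse's __be16/__le16 annotations exist to catch the cases where such a conversion is skipped.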

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Peter Zijlstra

On Fri, Dec 09, 2016 at 06:26:38AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 08, 2016 at 08:49:39PM -, Thomas Gleixner wrote:
> 
> > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> > +{
> > +   u32 dh, dl;
> > +   u64 nsec;
> > +
> > +   dl = delta;
> > +   dh = delta >> 32;
> > +
> > +   nsec = ((u64)dl * tkr->mult) + tkr->xtime_nsec;
> > +   nsec >>= tkr->shift;
> > +   if (unlikely(dh))
> > +   nsec += ((u64)dh * tkr->mult) << (32 - tkr->shift);
> > +   return nsec;
> > +}
> 
> Just for giggles, on tilegx the branch is actually slower than doing the
> mult unconditionally.
> 
> The problem is that the two multiplies would otherwise completely
> pipeline, whereas with the conditional you serialize them.

On my Haswell laptop the unconditional version is faster too.

> (came to light while talking about why the mul_u64_u32_shr() fallback
> didn't work right for them, which was a combination of the above issue
> and the fact that their compiler 'lost' the fact that these are
> 32x32->64 mults and did 64x64 ones instead).

Turns out using GCC-6.2.1 we have the same problem on i386, GCC doesn't
recognise the 32x32 mults and generates crap.

This used to work :/
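The point above is that the conditional serializes two multiplies that could otherwise pipeline. A branch-free variant of the quoted helper, written as a userspace sketch with the tk_read_base fields passed as plain arguments (delta_to_ns_split() and delta_to_ns_128() are hypothetical names). It assumes shift <= 32 and that neither partial product overflows 64 bits. Under those assumptions the split is exact: 2^32 is a multiple of 2^shift, so shifting (dh * mult * 2^32 + X) right by shift equals ((dh * mult) << (32 - shift)) + (X >> shift).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Branch-free split multiply: both 32x32->64 products are always computed,
 * so the CPU can overlap them. When dh == 0 the high term is simply 0.
 */
static inline uint64_t delta_to_ns_split(uint64_t delta, uint32_t mult,
					 unsigned int shift, uint64_t xtime_nsec)
{
	uint32_t dl = (uint32_t)delta;
	uint32_t dh = (uint32_t)(delta >> 32);
	uint64_t lo, hi;

	assert(shift <= 32);
	lo = ((uint64_t)dl * mult + xtime_nsec) >> shift;
	hi = ((uint64_t)dh * mult) << (32 - shift);
	return lo + hi;
}

#ifdef __SIZEOF_INT128__
/* Reference result using the compiler's 128-bit type (GCC/Clang). */
static inline uint64_t delta_to_ns_128(uint64_t delta, uint32_t mult,
				       unsigned int shift, uint64_t xtime_nsec)
{
	unsigned __int128 nsec = (unsigned __int128)delta * mult + xtime_nsec;

	return (uint64_t)(nsec >> shift);
}
#endif
```

Whether this is actually faster depends on the CPU and the compiler. As the GCC 6.2.1 i386 report shows, the multiplies only stay cheap if the compiler still recognizes them as 32x32->64.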

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Al Viro

On Fri, Dec 09, 2016 at 07:22:25AM +0100, Michal Hocko wrote:

> > Easier to handle those in vmalloc() itself.
> 
> I think there were some attempts in the past but some of the code paths
> are buried too deep and adding gfp_mask all the way down there seemed
> like a major surgery.

No need to propagate gfp_mask - the same trick XFS is doing right now can
be done in vmalloc.c in a couple of places and that's it; I'll resurrect the
patches and post them tomorrow after I get some sleep.

[PATCH] tracing: (backport) Replace kmap with copy_from_user() in trace_marker

2016-12-08 Thread Henrik Austad

Instead of using get_user_pages_fast() and kmap_atomic() when writing
to the trace_marker file, just allocate enough space on the ring buffer
directly, and write into it via copy_from_user().

Writing into the trace_marker file used to allocate a temporary buffer
to perform the copy_from_user(), as we didn't want to write into the
ring buffer if the copy failed. But as a trace_marker write is supposed
to be extremely fast, and allocating memory causes other tracepoints to
trigger, Peter Zijlstra suggested using get_user_pages_fast() and
kmap_atomic() to keep the user space pages in memory and read them
directly.

Instead, just allocate the space in the ring buffer and use
copy_from_user() directly. If it faults, return -EFAULT and write
"<faulted>" into the ring buffer.

On architectures without an arch-specific get_user_pages_fast(), this
will end up in the generic get_user_pages_fast(), which grabs
mm->mmap_sem. Once that happens, writing to the trace_marker can
suddenly cause priority inversions.

This is a backport of Steven Rostedt's patch [1] to 3.10.x, so the
Signed-off-by chain is somewhat uncertain at this stage.

The patch compiles, boots and does not immediately explode on impact. By
definition [2] it must therefore be perfect.

1) https://www.spinics.net/lists/kernel/msg2400769.html
2) http://lkml.iu.edu/hypermail/linux/kernel/9804.1/0149.html

Cc: Ingo Molnar 
Cc: Henrik Austad 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: sta...@vger.kernel.org

Suggested-by: Thomas Gleixner 
Used-to-be-signed-off-by: Steven Rostedt 
Backported-by: Henrik Austad 
Tested-by: Henrik Austad 
Signed-off-by: Henrik Austad 
---
 kernel/trace/trace.c | 78 +++-
 1 file changed, 22 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 18cdf91..94eb1ee 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4501,15 +4501,13 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
struct ring_buffer *buffer;
struct print_entry *entry;
unsigned long irq_flags;
-   struct page *pages[2];
-   void *map_page[2];
-   int nr_pages = 1;
+   const char faulted[] = "<faulted>";
ssize_t written;
-   int offset;
int size;
int len;
-   int ret;
-   int i;
+
+/* Used in tracing_mark_raw_write() as well */
+#define FAULTED_SIZE (sizeof(faulted) - 1) /* '\0' is already accounted for */
 
if (tracing_disabled)
return -EINVAL;
@@ -4520,60 +4518,34 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (cnt > TRACE_BUF_SIZE)
cnt = TRACE_BUF_SIZE;
 
-   /*
-* Userspace is injecting traces into the kernel trace buffer.
-* We want to be as non intrusive as possible.
-* To do so, we do not want to allocate any special buffers
-* or take any locks, but instead write the userspace data
-* straight into the ring buffer.
-*
-* First we need to pin the userspace buffer into memory,
-* which, most likely it is, because it just referenced it.
-* But there's no guarantee that it is. By using get_user_pages_fast()
-* and kmap_atomic/kunmap_atomic() we can get access to the
-* pages directly. We then write the data directly into the
-* ring buffer.
-*/
BUILD_BUG_ON(TRACE_BUF_SIZE >= PAGE_SIZE);
 
-   /* check if we cross pages */
-   if ((addr & PAGE_MASK) != ((addr + cnt) & PAGE_MASK))
-   nr_pages = 2;
-
-   offset = addr & (PAGE_SIZE - 1);
-   addr &= PAGE_MASK;
-
-   ret = get_user_pages_fast(addr, nr_pages, 0, pages);
-   if (ret < nr_pages) {
-   while (--ret >= 0)
-   put_page(pages[ret]);
-   written = -EFAULT;
-   goto out;
-   }
+   local_save_flags(irq_flags);
+   size = sizeof(*entry) + cnt + 2; /* add '\0' and possible '\n' */
 
-   for (i = 0; i < nr_pages; i++)
-   map_page[i] = kmap_atomic(pages[i]);
+   /* If less than "<faulted>", then make sure we can still add that */
+   if (cnt < FAULTED_SIZE)
+   size += FAULTED_SIZE - cnt;
 
-   local_save_flags(irq_flags);
-   size = sizeof(*entry) + cnt + 2; /* possible \n added */
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
  irq_flags, preempt_count());
-   if (!event) {
-   /* Ring buffer disabled, return as if not open for write */
-   written = -EBADF;
-   goto out_unlock;
-   }
+
+   if (unlikely(!event))
+   /* Ring buffer disabled, return as if not open for write */
+   return -EBADF;
 
entry = ring_buffer_event_data(event);
entry->ip = _THIS_IP_;
 
-   if (nr_pages == 2) {
-   len = PAGE_SI
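The reserve-then-copy logic of the new tracing_mark_write() can be modelled in userspace. A sketch, assuming the placeholder string is "<faulted>" as in the upstream version of the patch. marker_reserve_size() and marker_fill() are hypothetical helpers, memcpy() stands in for copy_from_user(), and copy_ok simulates whether the user copy faulted:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder stored when the user copy faults (assumed, as upstream). */
static const char faulted[] = "<faulted>";
#define FAULTED_SIZE (sizeof(faulted) - 1)	/* excludes the '\0' */

/* Bytes to reserve: header + payload + '\0' + optional '\n', but never
 * less than the placeholder needs. */
static size_t marker_reserve_size(size_t hdr, size_t cnt)
{
	size_t size = hdr + cnt + 2;

	if (cnt < FAULTED_SIZE)
		size += FAULTED_SIZE - cnt;
	return size;
}

/*
 * Fill the reserved slot directly. On a simulated fault (copy_ok == 0) store
 * the placeholder instead. 'slot' must hold max(cnt, FAULTED_SIZE) + 2 bytes.
 * Returns the stored length, excluding the '\0'.
 */
static size_t marker_fill(char *slot, const char *src, size_t cnt, int copy_ok)
{
	size_t len;

	assert(slot != NULL && (src != NULL || !copy_ok));
	if (copy_ok) {
		memcpy(slot, src, cnt);
		len = cnt;
	} else {
		memcpy(slot, faulted, FAULTED_SIZE);
		len = FAULTED_SIZE;
	}
	if (len == 0 || slot[len - 1] != '\n')
		slot[len++] = '\n';
	slot[len] = '\0';
	return len;
}
```

Reserving room for the placeholder up front means the fault path never needs a second reservation. That is why the patch adds FAULTED_SIZE - cnt for short writes.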

Re: fs, net: deadlock between bind/splice on af_unix

2016-12-08 Thread Cong Wang

On Thu, Dec 8, 2016 at 5:32 PM, Al Viro  wrote:
> On Thu, Dec 08, 2016 at 04:08:27PM -0800, Cong Wang wrote:
>> On Thu, Dec 8, 2016 at 8:30 AM, Dmitry Vyukov  wrote:
>> > Chain exists of:
>> >  Possible unsafe locking scenario:
>> >
>> >        CPU0                    CPU1
>> >
>> >   lock(sb_writers#5);
>> >                                lock(&u->bindlock);
>> >                                lock(sb_writers#5);
>> >   lock(&pipe->mutex/1);
>>
>> This looks false positive, probably just needs lockdep_set_class()
>> to set keys for pipe->mutex and unix->bindlock.
>
> I'm afraid that it's not a false positive at all.

Right, I was totally misled by lockdep's scenario output; the stack
traces are actually much more reasonable.

The deadlock scenario is actually simple compared with the netlink one,
which involves 4 locks. It is:

unix_bind() path:
u->bindlock ==> sb_writer

do_splice() path:
sb_writer ==> pipe->mutex ==> u->bindlock

 *** DEADLOCK ***

>
> Why do we do autobind there, anyway, and why is it conditional on
> SOCK_PASSCRED?  Note that e.g. for SOCK_STREAM we can bloody well get
> to sending stuff without autobind ever done - just use socketpair()
> to create that sucker and we won't be going through the connect()
> at all.

In the case Dmitry reported, it is unix_dgram_sendmsg() that calls
unix_autobind(), not the SOCK_STREAM path.

I guess some lock, perhaps u->bindlock, could be dropped before
acquiring the next one (sb_writer), but I need to double-check.
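The cycle described above can be expressed as a lock-order graph. Each path records "taken while holding" edges, and a deadlock becomes possible once those edges form a cycle. This is a toy sketch of that idea only, not how lockdep is implemented. The lock names mirror the report; the graph type and functions are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Locks from the report; the enum is just an index for the toy graph. */
enum { BINDLOCK, SB_WRITERS, PIPE_MUTEX, NLOCKS };

/* edge[a][b] != 0 means "b has been taken while a was held". */
struct lock_graph {
	unsigned char edge[NLOCKS][NLOCKS];
};

static void add_order(struct lock_graph *g, int held, int taken)
{
	assert(held >= 0 && held < NLOCKS && taken >= 0 && taken < NLOCKS);
	g->edge[held][taken] = 1;
}

/* Depth-first search: can 'to' be reached from 'from' along recorded edges? */
static int reachable(const struct lock_graph *g, int from, int to,
		     unsigned char *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < NLOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return 1;
	return 0;
}

/* Taking 'taken' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'taken'. */
static int would_deadlock(const struct lock_graph *g, int held, int taken)
{
	unsigned char seen[NLOCKS];

	memset(seen, 0, sizeof(seen));
	return reachable(g, taken, held, seen);
}
```

Suppose the do_splice() edges (sb_writer -> pipe->mutex -> u->bindlock) are recorded first. Then would_deadlock(&g, BINDLOCK, SB_WRITERS) for the unix_bind() path returns 1, which is the inversion lockdep flagged.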

[PATCH 4/4] dt-bindings: input: Specify the interrupt number of TPS65217 power button

2016-12-08 Thread Milo Kim

Specify the power button interrupt number, taken from the datasheet.

Signed-off-by: Milo Kim 
---
 Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt b/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt
index 3e5b979..8682ab6 100644
--- a/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt
+++ b/Documentation/devicetree/bindings/input/tps65218-pwrbutton.txt
@@ -8,8 +8,9 @@ This driver provides a simple power button event via an Interrupt.
 Required properties:
 - compatible: should be "ti,tps65217-pwrbutton" or "ti,tps65218-pwrbutton"
 
-Required properties for TPS65218:
+Required properties:
 - interrupts: should be one of the following
+   - <2>: For controllers compatible with tps65217
- <3 IRQ_TYPE_EDGE_BOTH>: For controllers compatible with tps65218
 
 Examples:
@@ -17,6 +18,7 @@ Examples:
 &tps {
tps65217-pwrbutton {
compatible = "ti,tps65217-pwrbutton";
+   interrupts = <2>;
};
 };
 
-- 
2.9.3

[PATCH 3/4] dt-bindings: power/supply: Update TPS65217 properties

2016-12-08 Thread Milo Kim

Add interrupt specifiers for the USB and AC charger inputs. Interrupt
numbers are from the datasheet.
Fix the wrong value of the compatible string.

Signed-off-by: Milo Kim 
---
 .../devicetree/bindings/power/supply/tps65217_charger.txt  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/power/supply/tps65217_charger.txt b/Documentation/devicetree/bindings/power/supply/tps65217_charger.txt
index 98d131a..a11072c 100644
--- a/Documentation/devicetree/bindings/power/supply/tps65217_charger.txt
+++ b/Documentation/devicetree/bindings/power/supply/tps65217_charger.txt
@@ -2,11 +2,16 @@ TPS65217 Charger
 
 Required Properties:
 -compatible: "ti,tps65217-charger"
+-interrupts: TPS65217 interrupt numbers for the AC and USB charger input change.
+ Should be <0> for the USB charger and <1> for the AC adapter.
+-interrupt-names: Should be "USB" and "AC"
 
 This node is a subnode of the tps65217 PMIC.
 
 Example:
 
tps65217-charger {
-   compatible = "ti,tps65090-charger";
+   compatible = "ti,tps65217-charger";
+   interrupts = <0>, <1>;
+   interrupt-names = "USB", "AC";
};
-- 
2.9.3

[PATCH 2/4] dt-bindings: mfd: Remove TPS65217 interrupts

2016-12-08 Thread Milo Kim

Interrupt numbers come from the datasheet, so there is no need to keep
them in the DT ABI. Use the numbers directly in the DT file.

Signed-off-by: Milo Kim 
---
 arch/arm/boot/dts/am335x-bone-common.dtsi |  8 +++-
 include/dt-bindings/mfd/tps65217.h| 26 --
 2 files changed, 3 insertions(+), 31 deletions(-)
 delete mode 100644 include/dt-bindings/mfd/tps65217.h

diff --git a/arch/arm/boot/dts/am335x-bone-common.dtsi b/arch/arm/boot/dts/am335x-bone-common.dtsi
index 14b6269..3e32dd1 100644
--- a/arch/arm/boot/dts/am335x-bone-common.dtsi
+++ b/arch/arm/boot/dts/am335x-bone-common.dtsi
@@ -6,8 +6,6 @@
  * published by the Free Software Foundation.
  */
 
-#include 
-
 / {
cpus {
cpu@0 {
@@ -319,13 +317,13 @@
ti,pmic-shutdown-controller;
 
charger {
-   interrupts = , ;
-   interrupt-names = "AC", "USB";
+   interrupts = <0>, <1>;
+   interrupt-names = "USB", "AC";
status = "okay";
};
 
pwrbutton {
-   interrupts = ;
+   interrupts = <2>;
status = "okay";
};
 
diff --git a/include/dt-bindings/mfd/tps65217.h b/include/dt-bindings/mfd/tps65217.h
deleted file mode 100644
index cafb9e6..000
--- a/include/dt-bindings/mfd/tps65217.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/*
- * This header provides macros for TI TPS65217 DT bindings.
- *
- * Copyright (C) 2016 Texas Instruments
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program.  If not, see .
- */
-
-#ifndef __DT_BINDINGS_TPS65217_H__
-#define __DT_BINDINGS_TPS65217_H__
-
-#define TPS65217_IRQ_USB   0
-#define TPS65217_IRQ_AC1
-#define TPS65217_IRQ_PB2
-
-#endif
-- 
2.9.3

[PATCH 1/4] ARM: dts: am335x: Fix the interrupt name of TPS65217

2016-12-08 Thread Milo Kim

Use the correct property name, 'interrupt-names' (not 'interrupts-names'),
so that the charger interrupt numbers can be looked up by name.

Fixes: 1934e89a769b ("ARM: dts: am335x: Add the charger interrupt")
Signed-off-by: Milo Kim 
---
 arch/arm/boot/dts/am335x-bone-common.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/am335x-bone-common.dtsi b/arch/arm/boot/dts/am335x-bone-common.dtsi
index dc561d5..14b6269 100644
--- a/arch/arm/boot/dts/am335x-bone-common.dtsi
+++ b/arch/arm/boot/dts/am335x-bone-common.dtsi
@@ -320,7 +320,7 @@
 
charger {
interrupts = , ;
-   interrupts-names = "AC", "USB";
+   interrupt-names = "AC", "USB";
status = "okay";
};
 
-- 
2.9.3

[PATCH 0/4] dt-bindings: mfd: Update TPS65217 interrupts

2016-12-08 Thread Milo Kim

This patch set fixes a wrong property name and uses the TPS65217 HW
interrupt numbers from the datasheet instead of the DT ABI. The DT
bindings are also updated.

Milo Kim (4):
  ARM: dts: am335x: Fix the interrupt name of TPS65217
  dt-bindings: mfd: Remove TPS65217 interrupts
  dt-bindings: power/supply: Update TPS65217 properties
  dt-bindings: input: Add interrupt number for TPS65217

 .../bindings/input/tps65218-pwrbutton.txt  |  4 +++-
 .../bindings/power/supply/tps65217_charger.txt |  7 +-
 arch/arm/boot/dts/am335x-bone-common.dtsi  |  8 +++
 include/dt-bindings/mfd/tps65217.h | 26 --
 4 files changed, 12 insertions(+), 33 deletions(-)
 delete mode 100644 include/dt-bindings/mfd/tps65217.h

-- 
2.9.3

Re: [PATCH] pci-error-recover: doc cleanup

2016-12-08 Thread Linas Vepstas

I suppose I'm confused, but I recall that link resets are non-fatal.
Fatal errors typically require that the PCI adapter be completely
reset, any adapter firmware be reloaded from scratch, and the device
driver kill all device state and start from scratch. It's huge.
If the fatal error is on a PCI device that is under a block device
holding a file system, then (usually) there is no way to recover,
because the block layer (and file system) cannot deal with a block
device that disappeared and then reappeared a few seconds later.
(Maybe some future zfs or lvm or btrfs might be able to deal with
this, but not today.)

By contrast, link resets are far more gentle: the device driver might
have to discard some half-full FIFOs, or cancel some in-flight
commands, but can otherwise gracefully recover without telling the
higher layers that there were any problems.

--linas

On Thu, Dec 8, 2016 at 10:13 PM, Cao jin  wrote:
>
>
> On 12/08/2016 10:05 PM, Jonathan Corbet wrote:
>> On Thu, 8 Dec 2016 16:16:14 +0800
>> Cao jin  wrote:
>>
>>>  The platform resets the link, and then calls the link_reset() callback
>>>  on all affected device drivers.  This is a PCI-Express specific state
>>> -and is done whenever a non-fatal error has been detected that can be
>>> +and is done whenever a fatal error has been detected that can be
>>>  "solved" by resetting the link. This call informs the driver of the
>>
>> As far as I can tell, the original text was correct here; why do you
>> think this change needs to be made?
>>
>
> See do_recovery() in the AER core: reset_link() is called only when a fatal
> error is seen.
>
> --
> Sincerely,
> Cao jin
>
>

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko

On Fri 09-12-16 02:00:17, Al Viro wrote:
> On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> > On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > > From: Michal Hocko 
> > > 
> > > Using kmalloc with the vmalloc fallback for larger allocations is a
> > > common pattern in the kernel code. Yet we do not have any common helper
> > > for that and so users have invented their own helpers. Some of them are
> > > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > > it is implemented properly. This implementation makes sure to not make
> > > a large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
> > > to not warn about allocation failures. This also rules out the OOM
> > > killer as the vmalloc is a more appropriate fallback than a disruptive
> > > user visible action.
> > > 
> > > This patch also changes some existing users and removes helpers which
> > > are specific for them. In some cases this is not possible (e.g.
> > > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
> > > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > > in general (note that the page table allocation is GFP_KERNEL). Those
> > > need to be fixed separately.
> > 
> > See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> > kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> > functionality will have to play these memalloc_noio_save/
> > memalloc_noio_restore games to ensure they are GFP_NOFS safe
> 
> Easier to handle those in vmalloc() itself.

I think there were some attempts in the past but some of the code paths
are buried too deep and adding gfp_mask all the way down there seemed
like a major surgery.

> The problem I have with these
> helpers is that different places have different cutoff thresholds for
> switch from kmalloc to vmalloc; has anyone done an analysis of those?

Yes, I have noticed some creativity as well. Some of them didn't bother
to kmalloc at all for size > PAGE_SIZE. Some were playing tricks with
PAGE_ALLOC_COSTLY_ORDER. I believe the right thing to do is to simply
not hammer the system with size > PAGE_SIZE, which means __GFP_NORETRY
for them, and fall back to vmalloc on failure (basically what
seq_buf_alloc did). I cannot offer any numbers, but at least
seq_buf_alloc has proven to do the right thing over time.

-- 
Michal Hocko
SUSE Labs
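The policy described above can be modelled in userspace with pluggable allocators: for requests above a page, try the fast allocator without retrying hard, then fall back. A sketch under stated assumptions. kv_alloc_model(), the hook types and MODEL_PAGE_SIZE are hypothetical. The no_retry flag only stands in for __GFP_NORETRY | __GFP_NOWARN. For requests of a page or less, the sketch assumes plain kmalloc() semantics with no fallback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* 'no_retry' stands in for __GFP_NORETRY | __GFP_NOWARN. */
typedef void *(*fast_alloc_fn)(size_t size, bool no_retry);
typedef void *(*fallback_alloc_fn)(size_t size);

static void *kv_alloc_model(size_t size, fast_alloc_fn fast,
			    fallback_alloc_fn fallback)
{
	void *p;

	assert(fast != NULL && fallback != NULL);
	/* Small requests: ordinary allocation, no fallback (an assumption). */
	if (size <= MODEL_PAGE_SIZE)
		return fast(size, false);

	/* Large requests: don't hammer the system; fall back on failure. */
	p = fast(size, true);
	return p ? p : fallback(size);
}

/* Demo hooks: the fast path gives up whenever it may not retry (simulating
 * fragmentation); the fallback counts how often it is used. */
static int fallback_calls;

static void *demo_fast(size_t size, bool no_retry)
{
	return no_retry ? NULL : malloc(size);
}

static void *demo_fallback(size_t size)
{
	fallback_calls++;
	return malloc(size);
}
```

A single threshold in one helper replaces the per-caller cutoffs Al asked about, which is the main argument for a common kvmalloc().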

Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

2016-12-08 Thread Michal Hocko

On Fri 09-12-16 12:44:17, Dave Chinner wrote:
> On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > Using kmalloc with the vmalloc fallback for larger allocations is a
> > common pattern in the kernel code. Yet we do not have any common helper
> > for that and so users have invented their own helpers. Some of them are
> > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > it is implemented properly. This implementation makes sure to not make
> > a large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
> > to not warn about allocation failures. This also rules out the OOM
> > killer as the vmalloc is a more appropriate fallback than a disruptive
> > user visible action.
> > 
> > This patch also changes some existing users and removes helpers which
> > are specific for them. In some cases this is not possible (e.g.
> > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
> > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > in general (note that the page table allocation is GFP_KERNEL). Those
> > need to be fixed separately.
> 
> See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> functionality will have to play these memalloc_noio_save/
> memalloc_noio_restore games to ensure they are GFP_NOFS safe

Well, I didn't want to play these games in the generic kvmalloc, at least
not now, because none of the converted users really needed it so far,
and I believe that a) the existing users need inspection to check
whether NO{FS,IO} context is really needed and b) the scope nofs API
should be used long-term rather than an explicit GFP_NOFS. I am already
working on ext[34] code.

-- 
Michal Hocko
SUSE Labs
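The memalloc_noio_save()/memalloc_noio_restore() games and the scope API preferred above over an explicit GFP_NOFS share one pattern. Entering the scope sets a per-task flag and returns the previous state; leaving it restores that saved state, so scopes nest correctly. A single-threaded userspace sketch. task_flags stands in for current->flags; MODEL_NOFS, nofs_save(), nofs_restore() and alloc_may_enter_fs() are hypothetical names:

```c
#include <assert.h>

#define MODEL_NOFS 0x1u	/* hypothetical per-task flag bit */

/* Stands in for current->flags; one global since the model is single-threaded. */
static unsigned int task_flags;

/* Enter a NOFS scope: remember the previous state, then set the flag. */
static unsigned int nofs_save(void)
{
	unsigned int old = task_flags & MODEL_NOFS;

	task_flags |= MODEL_NOFS;
	return old;
}

/* Leave the scope: restore exactly the saved state, so nested scopes work. */
static void nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_NOFS) == 0);
	task_flags = (task_flags & ~MODEL_NOFS) | old;
}

/* An allocator would consult the scope instead of taking GFP_NOFS from the caller. */
static int alloc_may_enter_fs(void)
{
	return !(task_flags & MODEL_NOFS);
}
```

Restoring the saved bit, rather than clearing it, is what lets an inner scope end without prematurely lifting the restriction of an outer one.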

RE: [PATCH] ACPI / OSL: Fix a regression by returning table size via acpi_get_table_with_size()

2016-12-08 Thread Zheng, Lv

Hi, Rafael

> From: rjwyso...@gmail.com [mailto:rjwyso...@gmail.com] On Behalf Of Rafael J. Wysocki
> Subject: Re: [PATCH] ACPI / OSL: Fix a regression by returning table size via acpi_get_table_with_size()
> 
> On Fri, Dec 9, 2016 at 3:21 AM, Lv Zheng  wrote:
> > The returned size is still used by the drivers.
> >
> > Reported-by: Dan Williams 
> > Cc: Dan Williams 
> > Signed-off-by: Lv Zheng 
> > ---
> >  drivers/acpi/osl.c |8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > index 5bef0f65..adf1ec4 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -445,8 +445,12 @@ void __ref acpi_os_unmap_memory(void *virt, acpi_size size)
> >
> > status = acpi_get_table(signature, instance, out_table);
> > if (ACPI_SUCCESS(status)) {
> > -   /* No longer used by early_acpi_os_unmap_memory() */
> > -   *tbl_size = 0;
> > +   /*
> > +* No longer used by early_acpi_os_unmap_memory(), but still
> > +* used by the ACPI table drivers.
> > +*/
> > +   if (*out_table)
> > +   *tbl_size = (*out_table)->length;
> > }
> >
> > return (status);
> > --
> 
> The changelog doesn't explain anything.  Please say (a) what the
> problem is and (b) how it is being addressed by your patch.

OK, I'll also add a Fixes tag to it.

Thanks
Lv

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Peter Zijlstra

On Fri, Dec 09, 2016 at 06:11:17AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 08, 2016 at 08:49:39PM -, Thomas Gleixner wrote:
> 
> > +/*
> > + * Enabled when timekeeping is supposed to deal with virtualization keeping
> > + * VMs long enough scheduled out that the 64 * 32 bit multiplication in
> > + * timekeeping_delta_to_ns() overflows 64bit.
> > + */
> > +#ifdef CONFIG_TIMEKEEPING_USE_128BIT_MATH
> > +
> > +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
> > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> > +{
> > +   unsigned __int128 nsec;
> > +
> > +   nsec = ((unsigned __int128)delta * tkr->mult) + tkr->xtime_nsec;
> > +   return (u64) (nsec >> tkr->shift);
> > +}
> > +#else
> > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> > +{
> > +   u32 dh, dl;
> > +   u64 nsec;
> > +
> > +   dl = delta;
> > +   dh = delta >> 32;
> > +
> > +   nsec = ((u64)dl * tkr->mult) + tkr->xtime_nsec;
> > +   nsec >>= tkr->shift;
> > +   if (unlikely(dh))
> > +   nsec += ((u64)dh * tkr->mult) << (32 - tkr->shift);
> > +   return nsec;
> > +}
> > +#endif
> > +
> > +#else /* CONFIG_TIMEKEEPING_USE_128BIT_MATH */
> 
> xtime_nsec confuses me: contrary to its name, it's not actually in nsec,
> it's in shifted nsec units for some reason (and that might well be a good
> reason, but I don't know).
> 
> In any case, it needing to be inside the shift is somewhat unfortunate
> in that it doesn't allow you to use the existing mul_u64_u32_shr().

Wouldn't something like:

nsec = mul_u64_u32_shr(delta, tkr->mult, tkr->shift);
nsec += tkr->xtime_nsec >> tkr->shift;

Be good enough? Sure, you have a slight rounding error, which results in
a few jaggies in the actual timeline, but it would still be monotonic.

That is, we'll observe the ns rollover 'late', but given it's ns, does
anybody really care?
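The size of that rounding error can be checked numerically. A userspace sketch, assuming shift <= 32. mul_u64_u32_shr() below is modelled on the kernel's generic C fallback; delta_to_ns_approx() is a hypothetical name for the two-line variant proposed above. Because floor(a/2^s) + floor(b/2^s) is either floor((a+b)/2^s) or one less, the approximation is never ahead of the exact value and lags it by at most 1 ns:

```c
#include <assert.h>
#include <stdint.h>

/* Modelled on the generic C fallback of mul_u64_u32_shr(): (a * mul) >> shift. */
static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	assert(shift <= 32);
	ret = ((uint64_t)al * mul) >> shift;
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

/* Proposed variant: xtime_nsec is shifted on its own instead of inside the product. */
static inline uint64_t delta_to_ns_approx(uint64_t delta, uint32_t mult,
					  unsigned int shift, uint64_t xtime_nsec)
{
	return mul_u64_u32_shr(delta, mult, shift) + (xtime_nsec >> shift);
}
```

For example, with delta = 1000003, mult = 3, shift = 1 and xtime_nsec = 1, the exact value (3000009 + 1) >> 1 is 1500005, while delta_to_ns_approx() returns 1500004, one nanosecond "late".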

Re: netlink: GPF in sock_sndtimeo

2016-12-08 Thread Richard Guy Briggs

On 2016-11-29 23:52, Richard Guy Briggs wrote:
> On 2016-11-29 15:13, Cong Wang wrote:
> > On Tue, Nov 29, 2016 at 8:48 AM, Richard Guy Briggs  wrote:
> > > On 2016-11-26 17:11, Cong Wang wrote:
> > >> It is racy on audit_sock, especially on the netns exit path.
> > >
> > > I think that is the only place it is racy.  The other places audit_sock
> > > is set is when the socket failure has just triggered a reset.
> > >
> > > Is there a notifier callback for failed or reaped sockets?
> > 
> > Is NETLINK_URELEASE event what you are looking for?
> 
> Possibly, yes.  Thanks, I'll have a look.

I tried a quick compile of the test case (I assume it is a
socket fuzzer) and got the following compile errors:
cc -g -O0 -Wall -D_GNU_SOURCE -o socket_fuzz socket_fuzz.c
socket_fuzz.c:16:1: warning: "_GNU_SOURCE" redefined
: warning: this is the location of the previous definition
socket_fuzz.c: In function ‘segv_handler’:
socket_fuzz.c:89: warning: implicit declaration of function ‘__atomic_load_n’
socket_fuzz.c:89: error: ‘__ATOMIC_RELAXED’ undeclared (first use in this function)
socket_fuzz.c:89: error: (Each undeclared identifier is reported only once
socket_fuzz.c:89: error: for each function it appears in.)
socket_fuzz.c: In function ‘loop’:
socket_fuzz.c:280: warning: unused variable ‘errno0’
socket_fuzz.c: In function ‘test’:
socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_add’
socket_fuzz.c:303: error: ‘__ATOMIC_SEQ_CST’ undeclared (first use in this function)
socket_fuzz.c:303: warning: implicit declaration of function ‘__atomic_fetch_sub’

I also tried to extend Cong Wang's idea to proactively respond to a
NETLINK_URELEASE on the audit_sock and reset it, but ran into a locking error
stack dump when using mutex_lock(&audit_cmd_mutex) in the notifier callback.
Eliminating the lock, since the sock is dead anyway, eliminates the error.

Is it safe?  I'll resubmit if this looks remotely sane.  Meanwhile I'll try to
get the test case to compile.

This is being tracked as https://github.com/linux-audit/audit-kernel/issues/30

Subject: [PATCH] audit: proactively reset audit_sock on matching NETLINK_URELEASE

diff --git a/kernel/audit.c b/kernel/audit.c
index f1ca116..91d222d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -423,6 +423,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
snprintf(s, sizeof(s), "audit_pid=%d reset", audit_pid);
audit_log_lost(s);
audit_pid = 0;
+   audit_nlk_portid = 0;
audit_sock = NULL;
} else {
pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
@@ -1143,6 +1144,28 @@ static int audit_bind(struct net *net, int group)
return 0;
 }
 
+static int audit_sock_netlink_notify(struct notifier_block *nb,
+unsigned long event,
+void *_notify)
+{
+   struct netlink_notify *notify = _notify;
+   struct audit_net *aunet = net_generic(notify->net, audit_net_id);
+
+   if (event == NETLINK_URELEASE && notify->protocol == NETLINK_AUDIT) {
+   if (audit_nlk_portid == notify->portid &&
+   audit_sock == aunet->nlsk) {
+   audit_pid = 0;
+   audit_nlk_portid = 0;
+   audit_sock = NULL;
+   }
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block audit_netlink_notifier = {
+   .notifier_call = audit_sock_netlink_notify,
+};
+
 static int __net_init audit_net_init(struct net *net)
 {
struct netlink_kernel_cfg cfg = {
@@ -1167,10 +1190,14 @@ static void __net_exit audit_net_exit(struct net *net)
 {
struct audit_net *aunet = net_generic(net, audit_net_id);
struct sock *sock = aunet->nlsk;
+
+   mutex_lock(&audit_cmd_mutex);
if (sock == audit_sock) {
audit_pid = 0;
+   audit_nlk_portid = 0;
audit_sock = NULL;
}
+   mutex_unlock(&audit_cmd_mutex);
 
RCU_INIT_POINTER(aunet->nlsk, NULL);
synchronize_net();
@@ -1202,6 +1229,7 @@ static int __init audit_init(void)
audit_enabled = audit_default;
audit_ever_enabled |= !!audit_default;
 
+   netlink_register_notifier(&audit_netlink_notifier);
audit_log(NULL, GFP_KERNEL, AUDIT_KERNEL, "initialized");
 
for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
-- 
1.7.1


> - RGB

- RGB

--
Richard Guy Briggs 
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

linux-next: Tree for Dec 9

2016-12-08 Thread Stephen Rothwell

Hi all,

Changes since 20161208:

The pci tree gained a conflict against Linus' tree.

The spi tree gained a build failure so I used the version from
next-20161208.

The tip tree gained a conflict against the pci tree.

Non-merge commits (relative to Linus' tree): 10377
 9526 files changed, 656932 insertions(+), 222564 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(with KALLSYMS_EXTRA_PASS=1) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 245 trees (counting Linus' and 35 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to adding
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (318c8932ddec Merge branch 'akpm' (patches from Andrew))
Merging fixes/master (30066ce675d3 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (152b695d7437 builddeb: fix cross-building to 
arm64 producing host-arch debs)
Merging arc-current/for-curr (7badf6fefca8 ARC: axs10x: really enable ARC PGU)
Merging arm-current/fixes (8478132a8784 Revert "arm: move exports to 
definitions")
Merging m68k-current/for-linus (7e251bb21ae0 m68k: Fix ndelay() macro)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define 
atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (dadc4a1bb9f0 powerpc/64: Fix placement of .text to 
be immediately following .head.text)
Merging sparc/master (bc3913a5378c Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging net/master (1a31cc86ef3c driver: ipvlan: Unlink the upper dev when 
ipvlan_link_new failed)
Merging ipsec/master (bc3913a5378c Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging netfilter/master (7bbf91ce27dd Merge branch 'master' of 
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec)
Merging ipvs/master (9b6c14d51bd2 net: tcp response should set oif only if it 
is L3 master)
Merging wireless-drivers/master (fcd2042e8d36 mwifiex: printk() overflow with 
32-byte SSIDs)
Merging mac80211/master (9590112241ba tipc: fix link statistics counter errors)
Merging sound-current/for-linus (f73cd43ac3b4 ALSA: hda - Gate the mic jack on 
HP Z1 Gen3 AiO)
Merging pci-current/for-linus (e42010d8207f PCI: Set Read Completion Boundary 
to 128 iff Root Port supports it (_HPX))
Merging driver-core.current/driver-core-linus (a25f0944ba9b Linux 4.9-rc5)
Merging tty.current/tty-linus (a909d3e63699 Linux 4.9-rc3)
Merging usb.current/usb-linus (e5517c2a5a49 Linux 4.9-rc7)
Merging usb-gadget-fixes/fixes (05e78c6933d6 usb: gadget: f_fs: fix wrong 
parenthesis in ffs_func_req_match())
Merging usb-serial-fixes/usb-linus (46490c347df4 USB: serial: option: add dlink 
dwm-158)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move 
the lock initialization to core file)
Merging phy/fixes (4320f9d4c183 phy: sun4i: check PMU presence when poking 
unknown bit of pmu)
Merging staging.current/staging-linus (a25f0944ba9b Linux 4.9-rc5)
Merging char-misc.current/char-misc-linus (a25f0944ba9b Linux 4.9-rc5)
Merging input-current/for-linus (2425f1808123 Input: change KEY_DATA from 0x275 
to 0x277)
Merging crypto-current/master (678b5c6b22fe crypto: algif_aead - fix 
uninitialized variable warning)
Merging ide/master (797cee982eef Merge branch 'stable-4.8' of 
git://git.infradead.org/users/pcmoore/audit)
Merging vfio-fixes/for-linus (05692d7005a3 vfio/pci: Fix integer ove

Re: [PATCH v2] kexec: add cond_resched into kimage_alloc_crash_control_pages

2016-12-08 Thread zhong jiang

On 2016/12/9 13:19, Eric W. Biederman wrote:
> zhong jiang  writes:
>
>> On 2016/12/8 17:41, Xunlei Pang wrote:
>>> On 12/08/2016 at 10:37 AM, zhongjiang wrote:
 From: zhong jiang 

> [snip]
 diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
 index 5616755..bfc9621 100644
 --- a/kernel/kexec_core.c
 +++ b/kernel/kexec_core.c
 @@ -441,6 +441,8 @@ static struct page 
 *kimage_alloc_crash_control_pages(struct kimage *image,
while (hole_end <= crashk_res.end) {
unsigned long i;
  
 +  cond_resched();
 +
>>> I can't see why it would take a long time to loop inside. The job it does
>>> is simply to find a control area not overlapped with image->segment[].
>>> Looking at the loop "for (i = 0; i < image->nr_segments; i++)", @hole_end
>>> is advanced to the end of the next nearby segment each time an overlap is
>>> detected, and there are only a limited number (<=16) of segments, so it
>>> won't take long to locate the right area.
>>>
>>> Am I missing something?
>>>
>>> Regards,
>>> Xunlei
>>   If crashkernel=auto is set on the command line, crashk_res.end can
>>   exceed 4G, and the first allocation of control pages will loop a
>>   million times. If we set crashk_res.end to a higher value
>>   manually, you can imagine.
> Or in short the cond_resched is about keeping things reasonable when the
> loop has worst case behavior.
>
> Eric
>
>
  Yes. Thank you for the reply and comment.

  Regards,
  zhongjiang
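Xunlei's argument can be checked against a small userspace model of the hole search quoted above. This is a sketch, not the kernel function: the struct and function names are hypothetical, the KEXEC_CRASH_CONTROL_MEMORY_LIMIT check is left out, and the pass counter marks where the patch places cond_resched(). In this simplified model each pass either returns or moves the hole past one whole segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for image->segment[]: start address and size. */
struct seg_model {
	uint64_t mem;
	uint64_t memsz;
};

/*
 * Simplified model of the control-page hole search: return the start of the
 * first size-aligned hole in [start, end] that overlaps no segment, or
 * UINT64_MAX if none fits. *passes counts iterations of the outer loop,
 * which is where the patch adds cond_resched().
 */
static uint64_t find_control_hole(uint64_t start, uint64_t end, uint64_t size,
				  const struct seg_model *segs, unsigned int nsegs,
				  unsigned long *passes)
{
	uint64_t hole_start, hole_end;

	assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
	hole_start = (start + size - 1) & ~(size - 1);
	hole_end = hole_start + size - 1;
	*passes = 0;

	while (hole_end <= end) {
		unsigned int i;

		(*passes)++; /* the kernel patch calls cond_resched() here */
		for (i = 0; i < nsegs; i++) {
			uint64_t mstart = segs[i].mem;
			uint64_t mend = mstart + segs[i].memsz - 1;

			if (hole_end >= mstart && hole_start <= mend) {
				/* Overlap: move the hole to the first
				 * size-aligned address past this segment. */
				hole_start = (mend + 1 + size - 1) & ~(size - 1);
				hole_end = hole_start + size - 1;
				break;
			}
		}
		if (i == nsegs)
			return hole_start; /* no overlap: hole found */
	}
	return UINT64_MAX;
}
```

With one segment covering 0x1000-0x2FFF and 4 KiB holes starting at 0x1000, the model finds the hole at 0x3000 on its second pass.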

[PATCH v2] mmc: core: Export device lifetime information through sysf

2016-12-08 Thread Jungseung Lee

In version 5.0 of the eMMC spec, several EXT_CSD fields about
device lifetime were added.

 - Two types of estimated life time (types A and B), reflecting the average wear of the memory
 - A pre-EOL indication, reflecting the average consumption of reserved blocks

Export the information through sysfs.

Signed-off-by: Jungseung Lee 
---
 drivers/mmc/core/mmc.c   | 12 
 include/linux/mmc/card.h |  3 +++
 include/linux/mmc/mmc.h  |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index b61b52f9..c0e2507 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -617,6 +617,12 @@ static int mmc_decode_ext_csd(struct mmc_card *card, u8 
*ext_csd)
card->ext_csd.ffu_capable =
(ext_csd[EXT_CSD_SUPPORTED_MODE] & 0x1) &&
!(ext_csd[EXT_CSD_FW_CONFIG] & 0x1);
+
+   card->ext_csd.pre_eol_info = ext_csd[EXT_CSD_PRE_EOL_INFO];
+   card->ext_csd.device_life_time_est_typ_a =
+   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A];
+   card->ext_csd.device_life_time_est_typ_b =
+   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B];
}
 
/* eMMC v5.1 or later */
@@ -764,6 +770,10 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid);
 MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name);
 MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid);
 MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv);
+MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info);
+MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n",
+   card->ext_csd.device_life_time_est_typ_a,
+   card->ext_csd.device_life_time_est_typ_b);
 MMC_DEV_ATTR(serial, "0x%08x\n", card->cid.serial);
 MMC_DEV_ATTR(enhanced_area_offset, "%llu\n",
card->ext_csd.enhanced_area_offset);
@@ -817,6 +827,8 @@ static struct attribute *mmc_std_attrs[] = {
&dev_attr_name.attr,
&dev_attr_oemid.attr,
&dev_attr_prv.attr,
+   &dev_attr_pre_eol_info.attr,
+   &dev_attr_life_time.attr,
&dev_attr_serial.attr,
&dev_attr_enhanced_area_offset.attr,
&dev_attr_enhanced_area_size.attr,
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index 95d69d4..00449e5 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -121,6 +121,9 @@ struct mmc_ext_csd {
u8  raw_pwr_cl_ddr_200_360; /* 253 */
u8  raw_bkops_status;   /* 246 */
u8  raw_sectors[4]; /* 212 - 4 bytes */
+   u8  pre_eol_info;   /* 267 */
+   u8  device_life_time_est_typ_a; /* 268 */
+   u8  device_life_time_est_typ_b; /* 269 */
 
unsigned int	feature_support;
 #define MMC_DISCARD_FEATURE	BIT(0)  /* CMD38 feature */
diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h
index 672730a..a074082 100644
--- a/include/linux/mmc/mmc.h
+++ b/include/linux/mmc/mmc.h
@@ -339,6 +339,9 @@ struct _mmc_csd {
 #define EXT_CSD_CACHE_SIZE 249 /* RO, 4 bytes */
 #define EXT_CSD_PWR_CL_DDR_200_360 253 /* RO */
 #define EXT_CSD_FIRMWARE_VERSION   254 /* RO, 8 bytes */
+#define EXT_CSD_PRE_EOL_INFO   267 /* RO */
+#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A 268 /* RO */
+#define EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B 269 /* RO */
 #define EXT_CSD_CMDQ_DEPTH 307 /* RO */
 #define EXT_CSD_CMDQ_SUPPORT   308 /* RO */
 #define EXT_CSD_SUPPORTED_MODE 493 /* RO */
-- 
2.10.1
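As a sketch of how userspace might consume the new attributes, the helpers below parse the life_time text produced by the "0x%02x 0x%02x\n" format above and decode the values. The decoding follows my reading of the JEDEC eMMC 5.0 EXT_CSD definitions (life-time estimates 0x01-0x0A meaning 0-10% ... 90-100% of estimated life used, 0x0B meaning the estimate was exceeded; PRE_EOL_INFO 0x01 normal, 0x02 warning, 0x03 urgent) and should be checked against the spec; the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* PRE_EOL_INFO values (assumed from eMMC 5.0; verify against JEDEC). */
enum pre_eol {
	PRE_EOL_UNDEFINED = 0x00,
	PRE_EOL_NORMAL    = 0x01,
	PRE_EOL_WARNING   = 0x02, /* most reserved blocks consumed */
	PRE_EOL_URGENT    = 0x03,
};

/*
 * Parse "0xAA 0xBB\n" as printed by the life_time attribute. %x also
 * accepts values without the 0x prefix, as pre_eol_info prints them.
 */
static int parse_life_time(const char *text, unsigned int *typ_a,
			   unsigned int *typ_b)
{
	assert(text && typ_a && typ_b);
	return sscanf(text, "%x %x", typ_a, typ_b) == 2 ? 0 : -1;
}

/*
 * Upper bound, in percent, of the estimated life used; 0 if the value is
 * not defined, -1 if the device reports it exceeded its estimated life.
 */
static int life_time_used_pct(unsigned int est)
{
	if (est >= 0x01 && est <= 0x0A)
		return (int)est * 10;
	if (est == 0x0B)
		return -1;
	return 0;
}

/* Map a raw PRE_EOL_INFO byte onto the enum; reserved values -> undefined. */
static enum pre_eol decode_pre_eol(unsigned int v)
{
	return v <= PRE_EOL_URGENT ? (enum pre_eol)v : PRE_EOL_UNDEFINED;
}
```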

RE: Re: [PATCH 1/2] arm64: Correcting format specifier for printing 64 bit addresses

2016-12-08 Thread Maninder Singh

 
Hi Will,

>There are a bunch of these you haven't caught:
>

>arch/arm64/mm/mmu.c:pr_warn("fix_to_virt(FIX_BTMAP_END):   
>%08lx\n",
>
>so it would probably make sense to fix these to be consistent.
>
>Will

All the changes are sent in a new patch except the KVM changes, because we
don't know much about KVM.


>arch/arm64/kernel/signal32.c:   pr_info_ratelimited("%s[%d]: bad frame 
>in %s: pc=%08llx sp=%08llx\n",
>arch/arm64/kernel/signal32.c:   pr_info_ratelimited("%s[%d]: bad frame 
>in %s: pc=%08llx sp=%08llx\n",

The signal32 file changes are not required, because that code is meant only for 32-bit.


Thanks and Regards,
Maninder Singh
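For context on Will's consistency point: in printf-style formats the 8 in %08lx is only a minimum field width, so a 64-bit address is never truncated, but small and large values come out at different widths. A small userspace illustration, assuming an LP64 target where unsigned long is 64 bits (as on arm64); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumes LP64, as on arm64 Linux. */
static_assert(sizeof(unsigned long) == 8, "unsigned long must be 64-bit");

/* Old style: pads to at least 8 hex digits, wider values print in full. */
static int fmt_addr08(char *buf, size_t len, unsigned long addr)
{
	return snprintf(buf, len, "%08lx", addr);
}

/* Consistent style: always 16 hex digits for a 64-bit address. */
static int fmt_addr16(char *buf, size_t len, unsigned long addr)
{
	return snprintf(buf, len, "%016lx", addr);
}
```

Both helpers return the number of characters written, which makes the width difference easy to see.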

Re: [PATCH v2] inotify: Convert to using per-namespace limits

2016-12-08 Thread Eric W. Biederman

Nikolay Borisov  writes:

> On  8.12.2016 08:58, Nikolay Borisov wrote:
>> 
>> 
>> On  8.12.2016 03:40, Eric W. Biederman wrote:
>>> Nikolay Borisov  writes:
>>>
 Gentle ping, now that rc1 has shipped and Jan's sysctl concern hopefully
 resolved.
>>>
>>> After getting slowed down by some fixes I am now taking a hard look at
>>> your patch in the hopes of merging it.
>>>
>>> Did you happen to see the kbuild test robot boot failures and did you
>>> happen to look into what caused them?  I have just skimmed them and it
>>> appears to be related to your patch.
>> 
>> I saw them in the beginning but they did look like a generic memory
>> corruption and I believe at the time those patches were submitted there
>> was a lingering memory corruption hitting various patches. Thus I didn't
>> think it was related to my patches. I've since left my job, so I've been
>> taking a bit of time off and haven't looked really hard, so those
>> patches have been kind of lingering.
>> 
>> 
>> But now that you mention it I will try and take a second look to see
>> what might cause the memory corruption? Is there a way to force 0day to
>> re-run them to see whether the failure was indeed caused by my patches
>> or were intermittent?
>
> Ok, I took another look into the report but bear in mind that the
> corruption indeed happened in retire_userns_sysctls. But also this row
> in the report leads me to believe it's not my patch that's the culprit:
>
> [   65.527277] INFO: Allocated in setup_userns_sysctls+0x3f/0xa6 age=5
> cpu=1 pid=418
> [   65.558397] INFO: Freed in free_ctx+0x1d/0x20 age=6 cpu=0 pid=19
>
>
> So a free_ctx function did free it originally, likely causing the
> corruption. And there is no such function involved in the code I'm touching.

Yes.  I read through your patch carefully and it doesn't look like it
could possibly cause that kind of corruption, the code is just too
simple.

So I have (belatedly) placed this change in linux-next.

Eric

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Peter Zijlstra

On Fri, Dec 09, 2016 at 06:22:03AM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra  wrote:
> 
> > On Fri, Dec 09, 2016 at 05:08:26AM +0100, Ingo Molnar wrote:
> > > > +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
> > > > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> > > > +{
> > > > +   unsigned __int128 nsec;
> > > > +
> > > > +   nsec = ((unsigned __int128)delta * tkr->mult) + tkr->xtime_nsec;
> > > > +   return (u64) (nsec >> tkr->shift);
> > > > +}
> > > 
> > > Actually, 128-bit multiplication shouldn't be too horrible - at least on
> > > 64-bit architectures. (128-bit division is another matter, but there's no
> > > division here.)
> > 
> > IIRC there are 64bit architectures that do not have a 64x64->128 mult,
> > only a 64x64->64 mult instruction. It's not immediately apparent that using
> > __int128 will generate optimal code for those, nor is it a given GCC
> > will not require libgcc functions for those.
> 
> Well, if the overflow case is rare (which it is in this case) then it should
> still be relatively straightforward, something like:
> 
> X and Y are 64-bit:
> 
>   X = Xh*2^32 + Xl
>   Y = Yh*2^32 + Yl
> 
>   X*Y = (Xh*2^32 + Xl)*(Yh*2^32 + Yl)
> 
>   =   Xh*2^32*(Yh*2^32 + Yl)
> +  Xl*(Yh*2^32 + Yl)
> 
>   =   Xh*Yh*2^64
> + Xh*Yl*2^32
> + Xl*Yh*2^32
> + Xl*Yl
> 
> Which is four 32x32->64 multiplications in the worst case.

Yeah, that's the full 64x64->128 mult on 32bit. Luckily we only need
64x32->96, which reduces to 2 32x32->64 mults.

But my point was that unconditionally using __int128 might not be the
right thing.

> Where a valid overflow threshold is relatively easy to determine in a hot
> path compatible fashion:
> 
>   if (Xh != 0 || Yh != 0)
>   slow_path();
> 
> And this simple and fast overflow check should still cover the overwhelming
> majority of 'sane' systems. (A more involved 'could it overflow' check of
> counting the high bits with 8 bit granularity by looking at the high bytes
> not at the words could be done in the slow path - to still avoid the 4
> multiplications in most cases.)
> 
> Am I missing something?

Yeah, the fact that we only need the 2 mults and that the fallback
already does the second multiply conditionally :-) But then look at the
email where I said that that condition actually makes the thing vastly
more expensive on some archs (like tilegx).
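Peter's point that only a 64x32->96 product is needed can be sketched without __int128: split the 64-bit delta into 32-bit halves, so delta * mult = dh * mult * 2^32 + dl * mult, and do the second 32x32->64 multiply only when dh is non-zero. This is a minimal userspace sketch in the spirit of the kernel's mul_u64_u32_shr() helper, not the patch under discussion; the function name is hypothetical, it assumes shift <= 32 and that the shifted result fits in 64 bits, and it leaves out the xtime_nsec addition from the quoted timekeeping_delta_to_ns().

```c
#include <assert.h>
#include <stdint.h>

/*
 * (a * mul) >> shift using at most two 32x32->64 multiplies and no
 * 128-bit type. Assumes shift <= 32 and a result that fits in 64 bits.
 * With shift <= 32 the high partial product is an exact multiple of
 * 2^shift, so shifting the two parts separately loses nothing.
 */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	assert(shift <= 32);
	ret = ((uint64_t)al * mul) >> shift;	/* low partial product */
	if (ah)					/* second multiply only if needed */
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}
```

The conditional second multiply is the kind of branch Peter mentions as costly on some architectures; the sketch only shows the arithmetic.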

[PATCH V1 0/2] Implement break control for F81232/F81534

2016-12-08 Thread Ji-Ze Hong (Peter Hong)

The following 2 patches make break control available for
Fintek F81232/F81534.

Ji-Ze Hong (Peter Hong) (2):
  usb:serial: Implement Fintek F81232 break on/off
  usb:serial: Implement Fintek f81534 break on/off

 drivers/usb/serial/f81232.c | 40 ++--
 drivers/usb/serial/f81534.c | 40 
 2 files changed, 74 insertions(+), 6 deletions(-)

-- 
1.9.1

[PATCH V1 2/2] usb:serial: Implement Fintek f81534 break on/off

2016-12-08 Thread Ji-Ze Hong (Peter Hong)

Implement Fintek F81534 break on/off with the LCR register. Its layout is
the same as the 16550A LCR register.

We add a shadow LCR variable to save the last LCR value we set, because
reading it back with "read ep0" operations may slow down the performance
of all the serial ports.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81534.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/drivers/usb/serial/f81534.c b/drivers/usb/serial/f81534.c
index 8282a6a..1b6ba81 100644
--- a/drivers/usb/serial/f81534.c
+++ b/drivers/usb/serial/f81534.c
@@ -126,8 +126,10 @@ struct f81534_serial_private {
 
 struct f81534_port_private {
struct mutex mcr_mutex;
+   struct mutex lcr_mutex;
unsigned long tx_empty;
spinlock_t msr_lock;
+   u8 shadow_lcr;
u8 shadow_mcr;
u8 shadow_msr;
u8 phy_num;
@@ -683,6 +685,7 @@ static void f81534_set_termios(struct tty_struct *tty,
struct usb_serial_port *port,
struct ktermios *old_termios)
 {
+   struct f81534_port_private *port_priv = usb_get_serial_port_data(port);
u8 new_lcr = 0;
int status;
u32 baud;
@@ -721,6 +724,10 @@ static void f81534_set_termios(struct tty_struct *tty,
break;
}
 
+   mutex_lock(&port_priv->lcr_mutex);
+   port_priv->shadow_lcr = new_lcr;
+   mutex_unlock(&port_priv->lcr_mutex);
+
baud = tty_get_baud_rate(tty);
if (!baud)
return;
@@ -812,6 +819,35 @@ static int f81534_read_msr(struct usb_serial_port *port)
return 0;
 }
 
+static void f81534_set_break(struct usb_serial_port *port, bool enable)
+{
+   struct f81534_port_private *port_priv = usb_get_serial_port_data(port);
+   int status;
+
+   mutex_lock(&port_priv->lcr_mutex);
+
+   if (enable)
+   port_priv->shadow_lcr |= UART_LCR_SBC;
+   else
+   port_priv->shadow_lcr &= ~UART_LCR_SBC;
+
+   status = f81534_set_port_register(port, F81534_LINE_CONTROL_REG,
+   port_priv->shadow_lcr);
+   if (status) {
+   dev_err(&port->dev, "%s: set break failed: %x\n", __func__,
+   status);
+   }
+
+   mutex_unlock(&port_priv->lcr_mutex);
+}
+
+static void f81534_break_ctl(struct tty_struct *tty, int break_state)
+{
+   struct usb_serial_port *port = tty->driver_data;
+
+   f81534_set_break(port, break_state);
+}
+
 static int f81534_open(struct tty_struct *tty, struct usb_serial_port *port)
 {
struct f81534_serial_private *serial_priv =
@@ -877,6 +913,8 @@ static void f81534_close(struct usb_serial_port *port)
}
 
mutex_unlock(&serial_priv->urb_mutex);
+
+   f81534_set_break(port, false);
 }
 
 static int f81534_get_serial_info(struct usb_serial_port *port,
@@ -1244,6 +1282,7 @@ static int f81534_port_probe(struct usb_serial_port *port)
 
spin_lock_init(&port_priv->msr_lock);
mutex_init(&port_priv->mcr_mutex);
+   mutex_init(&port_priv->lcr_mutex);
 
/* Assign logic-to-phy mapping */
port_priv->phy_num = f81534_logic_to_phy_port(port->serial, port);
@@ -1389,6 +1428,7 @@ static int f81534_resume(struct usb_serial *serial)
.dtr_rts =  f81534_dtr_rts,
.process_read_urb = f81534_process_read_urb,
.ioctl =f81534_ioctl,
+   .break_ctl =f81534_break_ctl,
.tiocmget = f81534_tiocmget,
.tiocmset = f81534_tiocmset,
.write_bulk_callback =  f81534_write_usb_callback,
-- 
1.9.1
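The shadow-register approach in this patch can be modelled in userspace: set_termios records the line settings in shadow_lcr, and break on/off only flips UART_LCR_SBC in that copy before writing the whole byte, so the register never has to be read back over ep0. A minimal sketch: the struct, the write counter standing in for f81534_set_port_register(), and the function names are hypothetical, and the driver's lcr_mutex is omitted because this model is single-threaded. UART_LCR_WLEN8 and UART_LCR_SBC use the values from <linux/serial_reg.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UART_LCR_WLEN8	0x03 /* 8-bit words, as in <linux/serial_reg.h> */
#define UART_LCR_SBC	0x40 /* set break control */

/* Hypothetical model of one port: the device LCR plus the driver's copy. */
struct port_model {
	uint8_t hw_lcr;		/* what the device register holds */
	uint8_t shadow_lcr;	/* last value the driver wrote */
	unsigned int writes;	/* register writes (stand-in for USB transfers) */
};

static void model_write_lcr(struct port_model *p, uint8_t val)
{
	p->hw_lcr = val;
	p->writes++;
}

/* Roughly mirrors f81534_set_termios() saving new_lcr in the shadow. */
static void model_set_termios(struct port_model *p, uint8_t new_lcr)
{
	p->shadow_lcr = new_lcr;
	model_write_lcr(p, new_lcr);
}

/* Mirrors f81534_set_break(): update the shadow, write it, no read-back. */
static void model_set_break(struct port_model *p, bool enable)
{
	if (enable)
		p->shadow_lcr |= UART_LCR_SBC;
	else
		p->shadow_lcr &= (uint8_t)~UART_LCR_SBC;
	model_write_lcr(p, p->shadow_lcr);
}
```

Because the line settings live in the shadow copy, toggling break preserves them with a single write each time.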

Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-08 Thread John Stultz

On Tue, Dec 6, 2016 at 10:23 AM, Tejun Heo  wrote:
> Hello,
>
> On Tue, Dec 06, 2016 at 10:13:53AM -0800, Andy Lutomirski wrote:
>> > Delegation is an explicit operation and reflected in the ownership of
>> > the subdirectories and cgroup interface files in them.  The
>> > subhierarchy containment is achieved by requiring the user who's
>> > trying to migrate a process to have write perm on cgroup.procs on the
>> > common ancestor of the source and target in addition to the target.
>>
>> OK, I see what you're doing.  That's interesting.
>
> It's something born out of usages of cgroup v1.  People used it that
> way (chowning files and directories) and combined with the uid checks
> it yielded something which is useful sometimes, but it always had
> issues with hierarchical behaviors, which files to chmod and the weird
> combination of uid checks.  cgroup v2 has a clear delegation model but
> the uid checks are still left in as not changing was the default.
>
> It's not necessary and I'm thinking about queueing something like the
> following in the next cycle.
>
> As for the android CAP discussion, I think it'd be nice to share an
> existing CAP but if we can't find a good one to share, let's create a
> new one.

So just to clarify the discussion for my purposes and make sure I
understood, per-cgroup CAP rules was not desired, and instead we
should either utilize an existing cap (are there still objections to
CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
bring back the older CAP_CGROUP_MIGRATE patch).

Tejun: Do you have a more finished version of your patch that I should
add my changes on top of?

thanks
-john

[PATCH V1 1/2] usb:serial: Implement Fintek F81232 break on/off

2016-12-08 Thread Ji-Ze Hong (Peter Hong)

Implement Fintek F81232 break on/off with the LCR register. Its layout is
the same as the 16550A LCR register.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 40 ++--
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 972f5a5..d45a70e 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -131,6 +131,21 @@ static int f81232_set_register(struct usb_serial_port 
*port, u16 reg, u8 val)
return status;
 }
 
+static int f81232_set_mask_register(struct usb_serial_port *port, u16 reg,
+   u8 mask, u8 val)
+{
+   int status;
+   u8 tmp;
+
+   status = f81232_get_register(port, reg, &tmp);
+   if (status)
+   return status;
+
+   tmp = (tmp & ~mask) | (val & mask);
+
+   return f81232_set_register(port, reg, tmp);
+}
+
 static void f81232_read_msr(struct usb_serial_port *port)
 {
int status;
@@ -335,15 +350,27 @@ static void f81232_process_read_urb(struct urb *urb)
tty_flip_buffer_push(&port->port);
 }
 
+static void f81232_set_break(struct usb_serial_port *port, bool enable)
+{
+   int status;
+   u8 tmp = 0;
+
+   if (enable)
+   tmp = UART_LCR_SBC;
+
+   status = f81232_set_mask_register(port, LINE_CONTROL_REGISTER,
+   UART_LCR_SBC, tmp);
+   if (status) {
+   dev_err(&port->dev, "%s: set break failed: %x\n", __func__,
+   status);
+   }
+}
+
 static void f81232_break_ctl(struct tty_struct *tty, int break_state)
 {
-   /* FIXME - Stubbed out for now */
+   struct usb_serial_port *port = tty->driver_data;
 
-   /*
-* break_state = -1 to turn on break, and 0 to turn off break
-* see drivers/char/tty_io.c to see it used.
-* last_set_data_urb_value NEVER has the break bit set in it.
-*/
+   f81232_set_break(port, break_state);
 }
 
 static void f81232_set_baudrate(struct usb_serial_port *port, speed_t baudrate)
@@ -563,6 +590,7 @@ static void f81232_close(struct usb_serial_port *port)
f81232_port_disable(port);
usb_serial_generic_close(port);
usb_kill_urb(port->interrupt_in_urb);
+   f81232_set_break(port, false);
 }
 
 static void f81232_dtr_rts(struct usb_serial_port *port, int on)
-- 
1.9.1
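Unlike the F81534 patch, this one reads the register back and rewrites only the masked bits. The bit arithmetic of f81232_set_mask_register() can be checked in isolation; in this sketch a plain variable stands in for the USB register read and write, and the helper names are hypothetical. It also shows that the break_state of -1 ("turn on break", per the comment the patch removes) converts to true when passed as the bool enable argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UART_LCR_SBC	0x40 /* set break control, as in <linux/serial_reg.h> */

static uint8_t fake_lcr; /* hypothetical stand-in for the device LCR */

/* Same arithmetic as f81232_set_mask_register(): keep the bits outside
 * mask, take the bits inside mask from val. */
static uint8_t apply_mask(uint8_t cur, uint8_t mask, uint8_t val)
{
	return (uint8_t)((cur & ~mask) | (val & mask));
}

static void model_set_break(bool enable)
{
	uint8_t tmp = enable ? UART_LCR_SBC : 0;

	/* read, modify, write */
	fake_lcr = apply_mask(fake_lcr, UART_LCR_SBC, tmp);
}

static void model_break_ctl(int break_state)
{
	model_set_break(break_state); /* -1 (on) -> true, 0 (off) -> false */
}
```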

Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in Cadence GEM.

2016-12-08 Thread Harini Katakam

Hi,

On Thu, Dec 8, 2016 at 8:11 PM,   wrote:
>
>
>> -Original Message-
>> From: Richard Cochran [mailto:richardcoch...@gmail.com]
>> Sent: Wednesday, December 07, 2016 11:04 PM
>> To: Andrei Pistirica - M16132
>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
>> ker...@lists.infradead.org; da...@davemloft.net;
>> nicolas.fe...@atmel.com; harinikatakamli...@gmail.com;
>> harini.kata...@xilinx.com; punn...@xilinx.com; mich...@xilinx.com;
>> anir...@xilinx.com; boris.brezil...@free-electrons.com;
>> alexandre.bell...@free-electrons.com; tbul...@pixelsurmer.com;
>> raf...@cadence.com
>> Subject: Re: [RFC PATCH net-next v3 1/2] macb: Add 1588 support in
>> Cadence GEM.
>>
>> On Wed, Dec 07, 2016 at 08:39:09PM +0100, Richard Cochran wrote:
>> > > +static s32 gem_ptp_max_adj(unsigned int f_nom) {
>> > > + u64 adj;
>> > > +
>> > > + /* The 48 bits of seconds for the GEM overflows every:
>> > > +  * 2^48/(365.25 * 24 * 60 *60) =~ 8 925 512 years (~= 9 mil years),
>> > > +  * thus the maximum adjust frequency must not overflow CNS
>> register:
>> > > +  *
>> > > +  * addend  = 10^9/nominal_freq
>> > > +  * adj_max = +/- addend*ppb_max/10^9
>> > > +  * max_ppb = (2^8-1)*nominal_freq-10^9
>> > > +  */
>> > > + adj = f_nom;
>> > > + adj *= 0x;
>> > > + adj -= 10ULL;
>> >
>> > What is this computation, and how does it relate to the comment?
>
> I considered the following simple equation: increment value at nominal 
> frequency (which is 10^9/nominal frequency nsecs) + the maximum drift value 
> (nsecs) <= maximum increment value at nominal frequency (which is 
> 8bit:0x).
> If maximum drift is written as function of nominal frequency and maximum ppb, 
> then the equation above yields that the maximum ppb is: (2^8 - 1) 
> *nominal_frequency - 10^9. The equation is also simplified by the fact that 
> the drift is written as ppm + 16bit_fractions and the increment value is 
> written as nsec + 16bit_fractions.
>
> Rafal said that this value is hardcoded: 0x64E6, while Harini said: 25000.

@ Andrei, I may have equated max ppb to the max TSU frequency allowed on
the system and set that. That would be wrong.

>
> I need to dig into this...
>
>>
>> I am not sure what you meant, but it sounds like you are on the wrong track.
>> Let me explain...
>
> Thanks.
>
>>
>> The max_adj has nothing at all to do with the width of the time register.
>> Rather, it should reflect the maximum possible change in the tuning word.
>>
>> For example, with a nominal 8 ns period, the tuning word is 0x80000.
>> Looking at running the clock more slowly, the slowest possible word is
>> 0x00001, meaning a difference of 0x7ffff.  This implies an adjustment of
>> 0x7ffff/0x80000 or 999998092 ppb.  Running more quickly, we can already
>> have 0x100000, twice as fast, or just under 2 billion ppb.
>>
>> You should consider the extreme cases to determine the most limited
>> (smallest) max_adj value:
>>
>> Case 1 - high frequency
>> ~~~
>>
>> With a nominal 1 ns period, we have the nominal tuning word 0x10000.
>> The smallest is 0x00001 for a difference of 0xffff.  This corresponds to an
>> adjustment of 0xffff/0x10000 = .9999847412109375 or 999984741 ppb.
>>
>> Case 2 - low frequency
>> ~~
>>
>> With a nominal 255 ns period, the nominal word is 0xFF0000, the largest
>> 0xFFFFFF, and the difference is 0xFFFF.  This corresponds to an adjustment
>> of 0xFFFF/0xFF0000 = .0039215087890625 or 3921508 ppb.
>>
>> Since 3921508 ppb is a huge adjustment, you can simply use that as a safe
>> maximum, ignoring the actual input clock.
>>

Thanks Richard.
So, if I understand right, this is theoretically limited by the
maximum input clock:
So if the highest frequency allowed (also commonly sourced in my case)
is 200MHz,
then with a 5ns time period, considering the adjustment to slowest
possible word,
0x4ffff/0x50000 will be 999996948 ppb.
Shouldn't this be the max_adj?
I'm afraid I don't get why we are choosing the most limited max adj..
Sorry if I'm missing something - could you please help me understand?

Regards,
Harini
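Richard's reasoning can be turned into a small calculation. This sketch assumes, as described in the thread, a 24-bit increment word made of 8 bits of nanoseconds and 16 fractional bits, with 1 as the smallest usable word; the names are hypothetical and this is not the macb driver code. For each nominal period it takes the smaller of the slow-down and speed-up headroom, then the minimum over all periods from 1 to 255 ns, which is the "most limited" bound Richard proposes as safe for any input clock.

```c
#include <assert.h>
#include <stdint.h>

#define NS_FRAC_BITS	16		/* assumed: 16 fractional bits */
#define WORD_MAX	0xFFFFFFULL	/* assumed: 8-bit ns + 16-bit fraction */

/* Nominal increment word for an integer clock period in ns (1..255). */
static uint64_t nominal_word(unsigned int period_ns)
{
	assert(period_ns >= 1 && period_ns <= 255);
	return (uint64_t)period_ns << NS_FRAC_BITS;
}

/*
 * Largest adjustment in ppb reachable in both directions for one period:
 * slowing down is limited by the smallest word (1), speeding up by
 * WORD_MAX. Results are truncated to whole ppb.
 */
static uint64_t max_adj_ppb(unsigned int period_ns)
{
	uint64_t nom = nominal_word(period_ns);
	uint64_t slow = (nom - 1) * 1000000000ULL / nom;
	uint64_t fast = (WORD_MAX - nom) * 1000000000ULL / nom;

	return slow < fast ? slow : fast;
}

/* Most limited value over every supported period. */
static uint64_t safe_max_adj_ppb(void)
{
	uint64_t best = UINT64_MAX;

	for (unsigned int p = 1; p <= 255; p++) {
		uint64_t v = max_adj_ppb(p);

		if (v < best)
			best = v;
	}
	return best;
}
```

For short periods the slow-down limit dominates (999984741 ppb at 1 ns, 999996948 ppb at 5 ns), while at 255 ns the speed-up headroom shrinks to 3921508 ppb, which is why the minimum over all periods lands there.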

[PATCH v9 1/3] of: Add vendor prefix for Lattice Semiconductor

2016-12-08 Thread Joel Holdsworth

Lattice Semiconductor Corporation is a manufacturer of integrated
circuits and IP products, including low-power FPGAs, video connectivity
devices and millimeter wave wireless products.

Website: http://latticesemi.com

Signed-off-by: Joel Holdsworth 
Acked-by: Rob Herring 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
---
 Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 64fdc8c..7a87932 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -158,6 +158,7 @@ kosagi  Sutajio Ko-Usagi PTE Ltd.
 kyo	Kyocera Corporation
 lacie  LaCie
 lantiq Lantiq Semiconductor
+lattice	Lattice Semiconductor
 lenovo Lenovo Group Ltd.
 lg LG Corporation
 linux  Linux-specific binding
-- 
2.7.4

[PATCH v9 2/3] Documentation: Add binding document for Lattice iCE40 FPGA manager

2016-12-08 Thread Joel Holdsworth

This adds documentation of the device tree bindings of the Lattice iCE40
FPGA driver for the FPGA manager framework.

Signed-off-by: Joel Holdsworth 
Acked-by: Rob Herring 
Acked-by: Alan Tull 
Acked-by: Moritz Fischer 
Acked-by: Marek Vasut 
---
 .../bindings/fpga/lattice-ice40-fpga-mgr.txt| 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/fpga/lattice-ice40-fpga-mgr.txt

diff --git a/Documentation/devicetree/bindings/fpga/lattice-ice40-fpga-mgr.txt 
b/Documentation/devicetree/bindings/fpga/lattice-ice40-fpga-mgr.txt
new file mode 100644
index 0000000..7e7a78b
--- /dev/null
+++ b/Documentation/devicetree/bindings/fpga/lattice-ice40-fpga-mgr.txt
@@ -0,0 +1,21 @@
+Lattice iCE40 FPGA Manager
+
+Required properties:
+- compatible:  Should contain "lattice,ice40-fpga-mgr"
+- reg: SPI chip select
+- spi-max-frequency:   Maximum SPI frequency (>=1000000, <=25000000)
+- cdone-gpios: GPIO input connected to CDONE pin
+- reset-gpios: Active-low GPIO output connected to CRESET_B pin. Note
+   that unless the GPIO is held low during startup, the
+   FPGA will enter Master SPI mode and drive SCK with a
+   clock signal potentially jamming other devices on the
+   bus until the firmware is loaded.
+
+Example:
+   ice40: ice40@0 {
+   compatible = "lattice,ice40-fpga-mgr";
+   reg = <0>;
+   spi-max-frequency = <1000000>;
+   cdone-gpios = <&gpio 24 GPIO_ACTIVE_HIGH>;
+   reset-gpios = <&gpio 22 GPIO_ACTIVE_LOW>;
+   };
-- 
2.7.4

[PATCH v9 3/3] fpga: Add support for Lattice iCE40 FPGAs

2016-12-08 Thread Joel Holdsworth

The Lattice iCE40 is a family of FPGAs with a minimalistic architecture
and very regular structure, designed for low-cost, high-volume consumer
and system applications.

This patch adds support to the FPGA manager for configuring the SRAM of
iCE40LM, iCE40LP, iCE40HX, iCE40 Ultra, iCE40 UltraLite and iCE40
UltraPlus devices, through slave SPI.

The iCE40 family is notable because it is the first FPGA family to have
complete reverse engineered bit-stream documentation for the iCE40LP and
iCE40HX devices. Furthermore, there is now a Free Software Verilog
synthesis tool-chain: the "IceStorm" tool-chain.

This project is the work of Clifford Wolf, who is the maintainer of
Yosys Verilog RTL synthesis framework, and Mathias Lasser, with notable
contributions from "Cotton Seed", the main author of "arachne-pnr", a
place-and-route tool for iCE40 FPGAs.

Having a Free Software synthesis tool-chain offers interesting
opportunities for embedded devices that are able to reconfigure themselves
with open firmware that is generated on the device itself. For example
a mobile device might have an application processor with an iCE40 FPGA
attached, which implements slave devices, or through which the processor
communicates with other devices through the FPGA fabric.

A kernel driver for the iCE40 is useful, because in some cases, the FPGA
may need to be configured before other devices can be accessed.

An example of such a device is the icoBoard; a RaspberryPI HAT which
features an iCE40HX8K with a 1 or 8 MBit SRAM and ports for
Digilent-compatible PMOD modules. A PMOD module may contain a device
with which the kernel communicates, via the FPGA.

Signed-off-by: Joel Holdsworth 
---
 drivers/fpga/Kconfig |   6 ++
 drivers/fpga/Makefile|   1 +
 drivers/fpga/ice40-spi.c | 213 +++
 3 files changed, 220 insertions(+)
 create mode 100644 drivers/fpga/ice40-spi.c

diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
index ce861a2..967cda4 100644
--- a/drivers/fpga/Kconfig
+++ b/drivers/fpga/Kconfig
@@ -20,6 +20,12 @@ config FPGA_REGION
  FPGA Regions allow loading FPGA images under control of
  the Device Tree.
 
+config FPGA_MGR_ICE40_SPI
+   tristate "Lattice iCE40 SPI"
+   depends on OF && SPI
+   help
+ FPGA manager driver support for Lattice iCE40 FPGAs over SPI.
+
 config FPGA_MGR_SOCFPGA
tristate "Altera SOCFPGA FPGA Manager"
depends on ARCH_SOCFPGA || COMPILE_TEST
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 8df07bc..cc0d364 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -6,6 +6,7 @@
 obj-$(CONFIG_FPGA) += fpga-mgr.o
 
 # FPGA Manager Drivers
+obj-$(CONFIG_FPGA_MGR_ICE40_SPI)   += ice40-spi.o
 obj-$(CONFIG_FPGA_MGR_SOCFPGA) += socfpga.o
 obj-$(CONFIG_FPGA_MGR_SOCFPGA_A10) += socfpga-a10.o
 obj-$(CONFIG_FPGA_MGR_ZYNQ_FPGA)   += zynq-fpga.o
diff --git a/drivers/fpga/ice40-spi.c b/drivers/fpga/ice40-spi.c
new file mode 100644
index 0000000..3c99859
--- /dev/null
+++ b/drivers/fpga/ice40-spi.c
@@ -0,0 +1,213 @@
+/*
+ * FPGA Manager Driver for Lattice iCE40.
+ *
+ *  Copyright (c) 2016 Joel Holdsworth
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * This driver adds support to the FPGA manager for configuring the SRAM of
+ * Lattice iCE40 FPGAs through slave SPI.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define ICE40_SPI_FPGAMGR_RESET_DELAY 1 /* us (>200ns) */
+#define ICE40_SPI_FPGAMGR_HOUSEKEEPING_DELAY 1200 /* us */
+
+#define ICE40_SPI_FPGAMGR_NUM_ACTIVATION_BYTES DIV_ROUND_UP(49, 8)
+
+struct ice40_fpga_priv {
+   struct spi_device *dev;
+   struct gpio_desc *reset;
+   struct gpio_desc *cdone;
+};
+
+static enum fpga_mgr_states ice40_fpga_ops_state(struct fpga_manager *mgr)
+{
+   struct ice40_fpga_priv *priv = mgr->priv;
+
+   return gpiod_get_value(priv->cdone) ? FPGA_MGR_STATE_OPERATING :
+   FPGA_MGR_STATE_UNKNOWN;
+}
+
+static int ice40_fpga_ops_write_init(struct fpga_manager *mgr,
+struct fpga_image_info *info,
+const char *buf, size_t count)
+{
+   struct ice40_fpga_priv *priv = mgr->priv;
+   struct spi_device *dev = priv->dev;
+   struct spi_message message;
+   struct spi_transfer assert_cs_then_reset_delay = {
+   .cs_change   = 1,
+   .delay_usecs = ICE40_SPI_FPGAMGR_RESET_DELAY
+   };
+   struct spi_transfer housekeeping_delay_then_release_cs = {
+   .delay_usecs = ICE40_SPI_FPGAMGR_HOUSEKEEPING_DELAY
+   };
+   int ret;
+
+   if ((info->flags & FPGA_MGR_PARTIAL_RECONFIG)) {
+   dev_err(&dev->dev,
+   "Partial reconfiguration is not sup

[PATCH] staging: i4l: fix coding style

2016-12-08 Thread Tabrez khan

Remove braces {} for single if statement block.

Signed-off-by: Tabrez khan 
---
 drivers/staging/i4l/act2000/module.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/i4l/act2000/module.c b/drivers/staging/i4l/act2000/module.c
index 99c9c0a..fc14de4 100644
--- a/drivers/staging/i4l/act2000/module.c
+++ b/drivers/staging/i4l/act2000/module.c
@@ -372,9 +372,8 @@ act2000_command(act2000_card *card, isdn_ctrl *c)
if (!(chan = find_channel(card, c->arg & 0x0f)))
break;
if (strlen(c->parm.num)) {
-   if (card->ptype == ISDN_PTYPE_EURO) {
+   if (card->ptype == ISDN_PTYPE_EURO)
chan->eazmask = act2000_find_msn(card, 
c->parm.num, 0);
-   }
if (card->ptype == ISDN_PTYPE_1TR6) {
int i;
chan->eazmask = 0;
-- 
2.7.4

RE: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-12-08 Thread Li, Liang Z

> On 12/08/2016 08:45 PM, Li, Liang Z wrote:
> > What's the conclusion of your discussion? It seems you want some
> > statistics before deciding whether to rip the bitmap out of the ABI,
> > am I right?
> 
> I think Andrea and David feel pretty strongly that we should remove the
> bitmap, unless we have some data to support keeping it.  I don't feel as
> strongly about it, but I think their critique of it is pretty valid.  I think the
> consensus is that the bitmap needs to go.
> 

Thanks for your clarification.

> The only real question IMNHO is whether we should do a power-of-2 or a
> length.  But, if we have 12 bits, then the argument for doing length is pretty
> strong.  We don't need anywhere near 12 bits if doing power-of-2.
> 
So each item can represent at most 16MB, which may not seem big enough,
but it is enough for most cases.
Things become much simpler without the bitmap, and I like simple solutions
too. :)
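As a quick check of the 16MB figure, here is a sketch assuming 4 KiB pages and a 12-bit length field that counts pages (stored as pages - 1, so all 4096 values are usable). Both assumptions, and every name below, are hypothetical and not taken from the virtio-balloon patches; under them one entry covers 2^12 * 4 KiB = 16 MiB.

```c
#include <stdint.h>

/* Assumptions, not from the patches: 4 KiB pages and a 12-bit length
 * field that stores (number of pages - 1). */
#define DEMO_PAGE_SHIFT 12u
#define DEMO_LEN_BITS   12u

/* Largest run of pages one entry can describe: 2^12 = 4096 pages. */
static uint64_t max_pages_per_entry(void)
{
    return (uint64_t)1 << DEMO_LEN_BITS;
}

/* Largest extent one entry can describe in bytes: 4096 * 4 KiB = 16 MiB. */
static uint64_t max_bytes_per_entry(void)
{
    return max_pages_per_entry() << DEMO_PAGE_SHIFT;
}

/* Entries needed for a contiguous run of 'pages' pages when the run is
 * split into maximal chunks (round-up division). */
static uint64_t entries_for_run(uint64_t pages)
{
    uint64_t max = max_pages_per_entry();

    return (pages + max - 1) / max;
}
```

Under these assumptions a 1 GiB contiguous range (262144 pages) would need 64 entries, which is why a 16MB cap per item is still workable for large, mostly contiguous ranges.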

I will prepare v6 and remove all the bitmap-related stuff. Thank you all!

Liang

Re: [patch 0/6] timekeeping: Cure the signed/unsigned wreckage

2016-12-08 Thread Peter Zijlstra

On Thu, Dec 08, 2016 at 08:49:31PM -, Thomas Gleixner wrote:

> Here is the queue:
> 
>   timekeeping: Force unsigned clocksource to nanoseconds conversions
>   timekeeping: Make the conversion call chain consistently unsigned
>   timekeeping: Get rid of pointless typecasts
> 
> These three patches are definitely urgent material
> 
>   timekeeping: Use mul_u64_u32_shr() instead of open coding it
> 

Acked-by: Peter Zijlstra (Intel)

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Peter Zijlstra

On Thu, Dec 08, 2016 at 08:49:39PM -, Thomas Gleixner wrote:

> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> +{
> + u32 dh, dl;
> + u64 nsec;
> +
> + dl = delta;
> + dh = delta >> 32;
> +
> + nsec = ((u64)dl * tkr->mult) + tkr->xtime_nsec;
> + nsec >>= tkr->shift;
> + if (unlikely(dh))
> + nsec += ((u64)dh * tkr->mult) << (32 - tkr->shift);
> + return nsec;
> +}

Just for giggles, on tilegx the branch is actually slower than doing the
mult unconditionally.

The problem is that the two multiplies would otherwise completely
pipeline, whereas with the conditional you serialize them.

(came to light while talking about why the mul_u64_u32_shr() fallback
didn't work right for them, which was a combination of the above issue
and the fact that their compiler 'lost' the fact that these are
32x32->64 mults and did 64x64 ones instead).
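To make the tilegx point concrete, here is a userspace sketch contrasting the quoted branchy conversion with an unconditional one. The function names are hypothetical; like the quoted code, it assumes shift <= 32 and that dl * mult + xtime_nsec fits in 64 bits. The split is exact because dh * mult * 2^32 is a multiple of 2^shift when shift <= 32, so both forms return the same value; which one is faster depends on the CPU and compiler.

```c
#include <stdint.h>

/* Branchy form, mirroring the quoted RFD helper. Assumes shift <= 32
 * and that dl * mult + xtime_nsec does not overflow 64 bits. */
static uint64_t delta_to_ns_branch(uint64_t delta, uint32_t mult,
                                   uint32_t shift, uint64_t xtime_nsec)
{
    uint32_t dl = (uint32_t)delta;
    uint32_t dh = (uint32_t)(delta >> 32);
    uint64_t nsec = ((uint64_t)dl * mult + xtime_nsec) >> shift;

    /* Per the tilegx report, this conditional keeps the second
     * multiply from overlapping with the first. */
    if (dh)
        nsec += ((uint64_t)dh * mult) << (32 - shift);
    return nsec;
}

/* Unconditional form: the two multiplies do not depend on each other or
 * on a branch, so a pipelined multiplier can overlap them. When dh == 0
 * the high term is simply zero. */
static uint64_t delta_to_ns_nobranch(uint64_t delta, uint32_t mult,
                                     uint32_t shift, uint64_t xtime_nsec)
{
    uint64_t lo = (uint64_t)(uint32_t)delta * mult;
    uint64_t hi = (uint64_t)(uint32_t)(delta >> 32) * mult;

    return ((lo + xtime_nsec) >> shift) + (hi << (32 - shift));
}
```

Widening a 32-bit operand before multiplying lets the compiler recognise a 32x32->64 multiply, but, as the tilegx compiler issue shows, it is not guaranteed to emit one.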

[FYI] Output of 'cat /proc/lockdep' after applying crossrelease

2016-12-08 Thread Byungchul Park


all lock classes:
c1c8b858 FD:   38 BD:1 +.+...: cgroup_mutex
 -> [c1c8b7b0] cgroup_idr_lock
 -> [c1c934b8] pcpu_alloc_mutex
 -> [c25d2d9c] &(&idp->lock)->rlock
 -> [c1ccb5f0] simple_ida_lock
 -> [c1c9de78] kernfs_mutex
 -> [c1c8b770] cgroup_file_kn_lock
 -> [c1c8b7f0] css_set_lock
 -> [c1c8bbb8] freezer_mutex

c1cd86c8 FD:1 BD:  104 -.-...: input_pool.lock

c1cd85c8 FD:1 BD:  103 ..-...: nonblocking_pool.lock

c1c80634 FD:2 BD:   16 ..: resource_lock
 -> [c1c805f0] bootmem_resource_lock

c1c7f3f0 FD:1 BD:   21 +.+...: pgd_lock

c1c7e1d8 FD:   12 BD:1 +.+...: acpi_ioapic_lock
 -> [c1c7e6d0] ioapic_lock
 -> [c1c7e698] ioapic_mutex

c1c7e6d0 FD:2 BD:   71 -.-...: ioapic_lock
 -> [c1c7bdb0] i8259A_lock

c1cf63d0 FD:1 BD:1 ..: map_entries_lock

c1c7d7b0 FD:1 BD:1 ..: x86_mce_decoder_chain.lock

c1c89db8 FD:   44 BD:2 +.+...: clocksource_mutex
 -> [c1c89d10] watchdog_lock

c1c89d10 FD:9 BD:4 +.-...: watchdog_lock
 -> [c259f24c] &(&base->lock)->rlock
 -> [c1e0a1a4] &(&pool->lock)->rlock

c1c7ff18 FD:  172 BD:2 +.+.+.: cpu_add_remove_lock
 -> [c25bc904] &(&zone->lock)->rlock
 -> [c25bc6b4] &swhash->hlist_mutex
 -> [c1c7fec0] cpu_hotplug.lock
 -> [c1e0a1a4] &(&pool->lock)->rlock
 -> [c1e08e10] (complete)&work.complete
 -> [c1e08e08] &x->wait#6
 -> [c1e0a8bc] &rq->lock
 -> [c25db814] &x->wait#3
 -> [c1cddc78] gdp_mutex
 -> [c25d2da8] &(&k->list_lock)->rlock
 -> [c25d2d9c] &(&idp->lock)->rlock
 -> [c1ccb5f0] simple_ida_lock
 -> [c1c9de78] kernfs_mutex
 -> [c1ccd680] bus_type_sem
 -> [c1c9dfd0] sysfs_symlink_target_lock
 -> [c25db698] &(&dev->power.lock)->rlock
 -> [c1cdee58] dpm_list_mtx
 -> [c1cde990] req_lock
 -> [c1e098e4] &p->pi_lock
 -> [c25db718] (complete)&req.done
 -> [c25db710] &x->wait#5
 -> [c1ccb6d8] uevent_sock_mutex
 -> [c1e0a1a5] &pool->lock/1
 -> [c1c820b0] running_helpers_waitq.lock
 -> [c1e06400] subsys mutex#18
 -> [c1e0640c] subsys mutex#19
 -> [c1c934b8] pcpu_alloc_mutex
 -> [c1e0a16c] &wq->mutex
 -> [c1c82a10] kthread_create_lock
 -> [c1e0a360] (complete)&done
 -> [c1e0a358] &x->wait
 -> [c1c827b8] wq_pool_mutex
 -> [c259f24c] &(&base->lock)->rlock
 -> [c25c4058] &(&n->list_lock)->rlock
 -> [c2607b50] &(&k->k_lock)->rlock

c1c90058 FD:   42 BD:8 +.+...: jump_label_mutex

c25bc904 FD:1 BD:  164 ..-...: &(&zone->lock)->rlock

c1c934b8 FD:2 BD:   77 +.+.+.: pcpu_alloc_mutex
 -> [c1c934f0] pcpu_lock

c1c934f0 FD:1 BD:   80 ..-...: pcpu_lock

c25c4058 FD:1 BD:8 -.-...: &(&n->list_lock)->rlock

c1c7fec0 FD:   41 BD:3 ++: cpu_hotplug.lock
 -> [c1c7fea8] cpu_hotplug.lock#2

c1c7fea8 FD:   40 BD:   26 +.+.+.: cpu_hotplug.lock#2
 -> [c1c7fe54] cpu_hotplug.wq.lock
 -> [c1e098e4] &p->pi_lock
 -> [c1c830d8] smpboot_threads_lock
 -> [c25bc6b4] &swhash->hlist_mutex
 -> [c25d2d9c] &(&idp->lock)->rlock
 -> [c1ccb5f0] simple_ida_lock
 -> [c1c82a10] kthread_create_lock
 -> [c1e0a360] (complete)&done
 -> [c1e0a358] &x->wait
 -> [c1e0a8bc] &rq->lock
 -> [c1e0a184] &pool->attach_mutex
 -> [c1e0a1a4] &(&pool->lock)->rlock
 -> [c25bc904] &(&zone->lock)->rlock
 -> [c259f1d0] rcu_node_0
 -> [c1c934b8] pcpu_alloc_mutex
 -> [c1c8cf98] relay_channels_mutex
 -> [c1c7c378] smp_alt
 -> [c1c878d8] sparse_irq_lock
 -> [c1e09d9d] (complete)&st->done
 -> [c1e09d95] &x->wait#2

c1c93558 FD:   47 BD:   14 +.+.+.: slab_mutex
 -> [c1c934b8] pcpu_alloc_mutex
 -> [c25d2d9c] &(&idp->lock)->rlock
 -> [c1ccb5f0] simple_ida_lock
 -> [c1c9de78] kernfs_mutex
 -> [c25d2da8] &(&k->list_lock)->rlock
 -> [c1ccb6d8] uevent_sock_mutex
 -> [c1e0a1a5] &pool->lock/1
 -> [c1c820b0] running_helpers_waitq.lock
 -> [c1c9dfd0] sysfs_symlink_target_lock
 -> [c25bc904] &(&zone->lock)->rlock
 -> [c1e0a8bc] &rq->lock
 -> [c25b4244] &(kretprobe_table_locks[i].lock)

c1e0aa3c FD:1 BD:2 +.+...: &dl_b->dl_runtime_lock

c1e0a8bc FD:4 BD:  361 -.-.-.: &rq->lock
 -> [c1e0a9e8] &rt_b->rt_runtime_lock
 -> [c1e0aa54] &cp->lock

c1c6d034 FD:5 BD:1 ..: init_task.pi_lock
 -> [c1e0a8bc] &rq->lock

c259f13f FD:1 BD:1 ..: rcu_read_lock

c259f1d0 FD:1 BD:   82 ..-...: rcu_node_0

c1c8d458 FD:   24 BD:1 +.+.+.: trace_types_lock
 -> [c1c99ff0] pin_fs_lock
 -> [c1cc3234] &sb->s_type->i_mutex_key#6

c1c7f970 FD:1 BD:1 ..: panic_notifier_list.lock

c1c82a50 FD:1 BD:1 ..: die_chain.lock

c1c8d6d8 FD:   25 BD:2 +.+.+.: trace_event_sem
 -> [c1c99ff0] pin_fs_lock
 -> [c1cc3234] &sb->s_type->i_mutex_key#6
 -> [c25c4058] &(&n->list_lock)->rlock
 -> [c25bc904] &(&zone->lock)->rlock

c1c8e238 FD:1 BD:1 ..: trigger_cmd_mutex

c1c7bdb0 FD:1 BD:   72 -.-...: i8259A_lock

c259f0c4 FD:4 BD:   70 -.-...: &irq_desc_lock_class
 -> [c1c7bdb0] i8259A_lock
 -> [c1c7e5b0] vector_lock
 -> [c1c7e6d0] ioapic_lock

c1c87bf8 FD:   10 BD:3 +.+.+.: irq_domain_mutex
 -> [c1e0a8bc] &rq->lock
 -> [c1c7e5b0] vector_lock
 -> [c259f0c4] &irq_desc_lock_class
 -> [c1c7bdb0] i8259A_lock
 -> [c1c87b98] revmap_trees_mute

Re: [PATCH v2] kexec: add cond_resched into kimage_alloc_crash_control_pages

2016-12-08 Thread Eric W. Biederman

zhong jiang  writes:

> On 2016/12/8 17:41, Xunlei Pang wrote:
>> On 12/08/2016 at 10:37 AM, zhongjiang wrote:
>>> From: zhong jiang 
>>>
[snip]
>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>> index 5616755..bfc9621 100644
>>> --- a/kernel/kexec_core.c
>>> +++ b/kernel/kexec_core.c
>>> @@ -441,6 +441,8 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
>>> while (hole_end <= crashk_res.end) {
>>> unsigned long i;
>>>  
>>> +   cond_resched();
>>> +
>> I can't see why it would take a long time to loop inside. The job it does is
>> simply to find a control area not overlapped with image->segment[]; you can
>> see the loop "for (i = 0; i < image->nr_segments; i++)". @hole_end will be
>> advanced to the end of its next nearby segment each time an overlap is
>> detected, and there is a limited number (<=16) of segments, so it won't take
>> long to locate the right area.
>>
>> Am I missing something?
>>
>> Regards,
>> Xunlei
>   If crashkernel=auto is set on the command line, crashk_res.end can exceed
>   4G, and the first allocation of control pages will loop millions of times.
>   If we set crashk_res.end to an even higher value manually, you can imagine.

Or in short the cond_resched is about keeping things reasonable when the
loop has worst case behavior.

Eric

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Fri, Dec 09, 2016 at 05:08:26AM +0100, Ingo Molnar wrote:
> > > +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
> > > +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> > > +{
> > > + unsigned __int128 nsec;
> > > +
> > > + nsec = ((unsigned __int128)delta * tkr->mult) + tkr->xtime_nsec;
> > > + return (u64) (nsec >> tkr->shift);
> > > +}
> > 
> > Actually, 128-bit multiplication shouldn't be too horrible - at least on 64-bit
> > architectures. (128-bit division is another matter, but there's no division here.)
> 
> IIRC there are 64bit architectures that do not have a 64x64->128 mult,
> only a 64x64->64 mult instruction. It's not immediately apparent using
> __int128 will generate optimal code for those, nor is it a given GCC
> will not require libgcc functions for those.

Well, if the overflow case is rare (which it is in this case) then it should
still be relatively straightforward, something like:

X and Y are 64-bit:

X = Xh*2^32 + Xl
Y = Yh*2^32 + Yl

X*Y = (Xh*2^32 + Xl)*(Yh*2^32 + Yl)

=   Xh*2^32*(Yh*2^32 + Yl)
  +  Xl*(Yh*2^32 + Yl)

=   Xh*Yh*2^64
  + Xh*Yl*2^32
  + Xl*Yh*2^32
  + Xl*Yl

Which is four 32x32->64 multiplications in the worst case.

Where a valid overflow threshold is relatively easy to determine in a
hot-path-compatible fashion:

if (Xh != 0 || Yh != 0)
slow_path();

And this simple and fast overflow check should still cover the overwhelming
majority of 'sane' systems. (A more involved 'could it overflow' check, counting
the high bits with 8-bit granularity by looking at the high bytes rather than
the words, could be done in the slow path, to still avoid the 4 multiplications
in most cases.)
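The decomposition above can be sketched in C as follows. This is a hypothetical helper, not an existing kernel function: it returns the 128-bit product as two 64-bit halves, takes the Xh == 0 && Yh == 0 fast path with a single multiply, and otherwise combines the four 32x32->64 partial products, carrying the middle (2^32-weighted) terms explicitly.

```c
#include <stdint.h>

/* X*Y = Xh*Yh*2^64 + (Xh*Yl + Xl*Yh)*2^32 + Xl*Yl, built from
 * 32x32->64 multiplies only. Result returned as *hi:*lo. */
static void mul_u64_u64_split(uint64_t x, uint64_t y,
                              uint64_t *hi, uint64_t *lo)
{
    uint64_t xl = (uint32_t)x, xh = x >> 32;
    uint64_t yl = (uint32_t)y, yh = y >> 32;
    uint64_t ll, lh, hl, hh, cross;

    ll = xl * yl;                 /* Xl*Yl, weight 1 */

    /* Fast path: both high halves zero, the product fits in 64 bits. */
    if (xh == 0 && yh == 0) {
        *hi = 0;
        *lo = ll;
        return;
    }

    /* Slow path: the remaining three partial products. */
    lh = xl * yh;                 /* Xl*Yh, weight 2^32 */
    hl = xh * yl;                 /* Xh*Yl, weight 2^32 */
    hh = xh * yh;                 /* Xh*Yh, weight 2^64 */

    /* Everything landing at weight 2^32; below 2^34, so no overflow. */
    cross = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    *lo = (cross << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (cross >> 32);
}
```

For example, (2^32 + 1)^2 = 2^64 + 2^33 + 1 comes out as hi = 1, lo = 0x200000001.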

Am I missing something?

Thanks,

Ingo

[PATCH v4 04/15] lockdep: Add a function building a chain between two classes

2016-12-08 Thread Byungchul Park

add_chain_cache() should be used in the context where the hlock is
owned, since using it from another context might be racy. However, the
crossrelease feature needs to build a chain between two locks regardless
of context. So introduce a new function that makes this possible.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 56 
 1 file changed, 56 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5df56aa..111839f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2105,6 +2105,62 @@ static int check_no_collision(struct task_struct *curr,
return 1;
 }
 
+/*
+ * This is for building a chain between just two different classes,
+ * instead of adding a new hlock upon current, which is done by
+ * add_chain_cache().
+ *
+ * This can be called in any context with two classes, while
+ * add_chain_cache() must be done within the lock owner's context
+ * since it uses hlock which might be racy in another context.
+ */
+static inline int add_chain_cache_classes(unsigned int prev,
+ unsigned int next,
+ unsigned int irq_context,
+ u64 chain_key)
+{
+   struct hlist_head *hash_head = chainhashentry(chain_key);
+   struct lock_chain *chain;
+
+   /*
+* Allocate a new chain entry from the static array, and add
+* it to the hash:
+*/
+
+   /*
+* We might need to take the graph lock, ensure we've got IRQs
+* disabled to make this an IRQ-safe lock.. for recursion reasons
+* lockdep won't complain about its own locking errors.
+*/
+   if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+   return 0;
+
+   if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+   if (!debug_locks_off_graph_unlock())
+   return 0;
+
+   print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+   dump_stack();
+   return 0;
+   }
+
+   chain = lock_chains + nr_lock_chains++;
+   chain->chain_key = chain_key;
+   chain->irq_context = irq_context;
+   chain->depth = 2;
+   if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+   chain->base = nr_chain_hlocks;
+   nr_chain_hlocks += chain->depth;
+   chain_hlocks[chain->base] = prev - 1;
+   chain_hlocks[chain->base + 1] = next - 1;
+   }
+   hlist_add_head_rcu(&chain->entry, hash_head);
+   debug_atomic_inc(chain_lookup_misses);
+   inc_chains();
+
+   return 1;
+}
+
 static inline int add_chain_cache(struct task_struct *curr,
  struct held_lock *hlock,
  u64 chain_key)
-- 
1.9.1

Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII

2016-12-08 Thread Jie Deng



On 2016/12/9 6:15, Florian Fainelli wrote:
> On 12/06/2016 07:57 PM, Jie Deng wrote:
>> This patch adds phy-mode support for Synopsys XLGMAC
> The functional changes look good, but I would like to see some
> description of what the XL part stands for here.
>
> While you are modifying this, do you also mind submitting a Device Tree
> specification change:
>
> https://www.devicetree.org/specifications/
>
> Thanks!
Thank you for the information.

Currently, XLGMAC is a new IP from Synopsys, and we are using a PCI driver to
test it on an FPGA platform. Is it possible to add these changes first and
submit the Device Tree specification change later?
>> Signed-off-by: Jie Deng 
>> ---
>>  Documentation/devicetree/bindings/net/ethernet.txt | 1 +
>>  include/linux/phy.h| 3 +++
>>  2 files changed, 4 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt
>> index 0515095..2378f00 100644
>> --- a/Documentation/devicetree/bindings/net/ethernet.txt
>> +++ b/Documentation/devicetree/bindings/net/ethernet.txt
>> @@ -28,6 +28,7 @@ The following properties are common to the Ethernet controllers:
>>* "rtbi"
>>* "smii"
>>* "xgmii"
>> +  * "xlgmii"
>>* "trgmii"
>>  - phy-connection-type: the same as "phy-mode" property but described in ePAPR;
>>  - phy-handle: phandle, specifies a reference to a node representing a PHY
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index feb8a98..b52f9f8 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -79,6 +79,7 @@
>>  PHY_INTERFACE_MODE_RTBI,
>>  PHY_INTERFACE_MODE_SMII,
>>  PHY_INTERFACE_MODE_XGMII,
>> +PHY_INTERFACE_MODE_XLGMII,
>>  PHY_INTERFACE_MODE_MOCA,
>>  PHY_INTERFACE_MODE_QSGMII,
>>  PHY_INTERFACE_MODE_TRGMII,
>> @@ -136,6 +137,8 @@ static inline const char *phy_modes(phy_interface_t interface)
>>  return "smii";
>>  case PHY_INTERFACE_MODE_XGMII:
>>  return "xgmii";
>> +case PHY_INTERFACE_MODE_XLGMII:
>> +return "xlgmii";
>>  case PHY_INTERFACE_MODE_MOCA:
>>  return "moca";
>>  case PHY_INTERFACE_MODE_QSGMII:
>>
>
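For context on why the phy_modes() hunk matters: XLGMII is the 40 Gb/s media independent interface from IEEE 802.3ba (XL being the Roman numeral for 40), and the kernel's of_get_phy_mode() maps a Device Tree "phy-mode" string back to a phy_interface_t by comparing it against phy_modes(i) for each mode, so the new case is what lets "xlgmii" be recognised. The sketch below models that round trip with a hypothetical, trimmed-down enum; it uses case-sensitive strcmp() and returns -1 on failure, whereas the kernel compares case-insensitively and returns an error code.

```c
#include <string.h>

/* Hypothetical, trimmed-down stand-in for phy_interface_t. */
enum demo_phy_mode {
    DEMO_MODE_NA,
    DEMO_MODE_XGMII,
    DEMO_MODE_XLGMII,     /* 40G media independent interface */
    DEMO_MODE_MOCA,
    DEMO_MODE_MAX,
};

/* enum -> DT string, in the style of phy_modes(). */
static const char *demo_phy_modes(enum demo_phy_mode mode)
{
    switch (mode) {
    case DEMO_MODE_NA:     return "";
    case DEMO_MODE_XGMII:  return "xgmii";
    case DEMO_MODE_XLGMII: return "xlgmii";
    case DEMO_MODE_MOCA:   return "moca";
    default:               return "unknown";
    }
}

/* DT string -> enum: try every mode until its name matches.
 * Returns -1 if no mode matches. */
static int demo_parse_phy_mode(const char *name)
{
    for (int i = 0; i < DEMO_MODE_MAX; i++)
        if (strcmp(name, demo_phy_modes((enum demo_phy_mode)i)) == 0)
            return i;
    return -1;
}
```

Without the "xlgmii" case, a DT node using that string would fall through to the not-found result.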

[PATCH v4 03/15] lockdep: Refactor lookup_chain_cache()

2016-12-08 Thread Byungchul Park

Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionality in a single function. However, each one is useful on its
own. So this patch makes lookup_chain_cache() do only the lookup and
add_chain_cache() do only the add. This is also more readable than
having both mixed in one function.

The crossrelease feature also needs to use each one separately.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 129 +--
 1 file changed, 81 insertions(+), 48 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a71..5df56aa 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2105,15 +2105,9 @@ static int check_no_collision(struct task_struct *curr,
return 1;
 }
 
-/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
- */
-static inline int lookup_chain_cache(struct task_struct *curr,
-struct held_lock *hlock,
-u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+ struct held_lock *hlock,
+ u64 chain_key)
 {
struct lock_class *class = hlock_class(hlock);
struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2121,49 +2115,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
int i, j;
 
/*
+* Allocate a new chain entry from the static array, and add
+* it to the hash:
+*/
+
+   /*
 * We might need to take the graph lock, ensure we've got IRQs
 * disabled to make this an IRQ-safe lock.. for recursion reasons
 * lockdep won't complain about its own locking errors.
 */
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return 0;
-   /*
-* We can walk it lock-free, because entries only get added
-* to the hash:
-*/
-   hlist_for_each_entry_rcu(chain, hash_head, entry) {
-   if (chain->chain_key == chain_key) {
-cache_hit:
-   debug_atomic_inc(chain_lookup_hits);
-   if (!check_no_collision(curr, hlock, chain))
-   return 0;
 
-   if (very_verbose(class))
-   printk("\nhash chain already cached, key: "
-   "%016Lx tail class: [%p] %s\n",
-   (unsigned long long)chain_key,
-   class->key, class->name);
-   return 0;
-   }
-   }
-   if (very_verbose(class))
-   printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
-   (unsigned long long)chain_key, class->key, class->name);
-   /*
-* Allocate a new chain entry from the static array, and add
-* it to the hash:
-*/
-   if (!graph_lock())
-   return 0;
-   /*
-* We have to walk the chain again locked - to avoid duplicates:
-*/
-   hlist_for_each_entry(chain, hash_head, entry) {
-   if (chain->chain_key == chain_key) {
-   graph_unlock();
-   goto cache_hit;
-   }
-   }
if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
if (!debug_locks_off_graph_unlock())
return 0;
@@ -2215,6 +2178,75 @@ cache_hit:
return 1;
 }
 
+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+   struct hlist_head *hash_head = chainhashentry(chain_key);
+   struct lock_chain *chain;
+
+   /*
+* We can walk it lock-free, because entries only get added
+* to the hash:
+*/
+   hlist_for_each_entry_rcu(chain, hash_head, entry) {
+   if (chain->chain_key == chain_key) {
+   debug_atomic_inc(chain_lookup_hits);
+   return chain;
+   }
+   }
+   return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+struct held_lock *hlock,
+u64 chain_key)
+{
+   struct lock_class *class = hlock_class(hlock);
+   struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+   if (chain) {
+cache_hit:
+   if (!check_no_collision(curr, hlock, chain))
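
The lookup/add split can be illustrated with a toy, single-threaded cache: a lookup that never inserts, an add that assumes the key is absent, and a combined helper that returns 1 for a newly added key and 0 for a cached one, mirroring the return convention described for lookup_chain_cache_add(). All names, sizes and the hash are hypothetical; the kernel's lock-free RCU walk and the graph_lock re-check are not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sizes; not lockdep's. */
#define NR_BUCKETS 64
#define MAX_CHAINS 128

struct demo_chain {
    uint64_t key;
    int next;                     /* next index in bucket, -1 = end */
};

static struct demo_chain chains[MAX_CHAINS];
static int nr_chains;
static int buckets[NR_BUCKETS];   /* head index per bucket, -1 = empty */

static void chain_cache_init(void)
{
    for (int i = 0; i < NR_BUCKETS; i++)
        buckets[i] = -1;
    nr_chains = 0;
}

static unsigned int bucket_of(uint64_t key)
{
    return (unsigned int)(key % NR_BUCKETS);
}

/* Lookup only: returns the cached entry or NULL, never inserts. */
static struct demo_chain *chain_lookup(uint64_t key)
{
    for (int i = buckets[bucket_of(key)]; i != -1; i = chains[i].next)
        if (chains[i].key == key)
            return &chains[i];
    return NULL;
}

/* Add only: the caller has established that the key is absent.
 * Returns 0 when the static array is full. */
static int chain_add(uint64_t key)
{
    unsigned int b = bucket_of(key);

    if (nr_chains >= MAX_CHAINS)
        return 0;
    chains[nr_chains].key = key;
    chains[nr_chains].next = buckets[b];
    buckets[b] = nr_chains++;
    return 1;
}

/* Combined helper: 1 if the key was new and got added, 0 if it was
 * already cached (or the add failed). */
static int chain_lookup_add(uint64_t key)
{
    if (chain_lookup(key))
        return 0;
    return chain_add(key);
}
```

Keeping chain_lookup() free of side effects is what lets a caller such as crossrelease probe the cache without committing to an insert.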

[PATCH v4 10/15] lockdep: Apply crossrelease to completion operation

2016-12-08 Thread Byungchul Park

wait_for_completion() and its family can cause deadlocks. Nevertheless,
they cannot use the lock correctness validator, because complete() is
called in a different context from the one calling
wait_for_completion(), which violates lockdep's assumptions without the
crossrelease feature.

However, thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can apply the lockdep
detector to wait_for_completion() and complete(). This patch does so.

Signed-off-by: Byungchul Park 
---
 include/linux/completion.h | 121 +
 kernel/locking/lockdep.c   |  17 +++
 kernel/sched/completion.c  |  54 +++-
 lib/Kconfig.debug  |   8 +++
 4 files changed, 167 insertions(+), 33 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..67a27af 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include 
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,53 @@
 struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+   struct lockdep_map map;
+   struct cross_lock xlock;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+   lock_acquire_exclusive(&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+   lock_release(&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+   lock_commit_crosslock(&x->map);
+}
+
+#define init_completion(x) \
+do {   \
+   static struct lock_class_key __key; \
+   lockdep_init_map_crosslock(&(x)->map,   \
+   &(x)->xlock,\
+   "(complete)" #x,\
+   &__key, 0); \
+   __init_completion(x);   \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+   { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+   STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work), \
+   &(work).xlock), STATIC_CROSS_LOCK_INIT()}
+#else
 #define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(&work); work; })
@@ -70,7 +116,7 @@ struct completion {
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
 {
x->done = 0;
init_waitqueue_head(&x->wait);
@@ -88,18 +134,75 @@ static inline void reinit_completion(struct completion *x)
x->done = 0;
 }
 
-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
   unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
struct completion *x, unsigned long timeout);
+
+static inline void wait_for_completion(struct completion *x)
+{
+   complete_acquire(x);
+   __wait_for_completion(x);
+   complete_release(x);
+}
+
+static inline void wait_for_completion_io(struct completion *x)
+{
+   complete_acquire(x);
+   __wait_for_completion_io(x);
+   complete_release(x);
+}
+
+static inline int wait_for_completion_interruptible(struct completion *x)
+{
+   int ret;
+   complete_ac

[PATCH v4 06/15] lockdep: Make save_trace() able to skip stack tracing of the current task

2016-12-08 Thread Byungchul Park

Currently, save_trace() always performs save_stack_trace() for the
current task. However, crossrelease needs to use the stack trace data of
another context instead of the current one. So add a parameter for
skipping stack tracing of the current task and, in that case, use trace
data that has already been saved by the crossrelease framework.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 33 -
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3eaa11c..11580ec 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -387,15 +387,22 @@ static void print_lockdep_off(const char *bug_msg)
 #endif
 }
 
-static int save_trace(struct stack_trace *trace)
+static int save_trace(struct stack_trace *trace, int skip_tracing)
 {
-   trace->nr_entries = 0;
-   trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
-   trace->entries = stack_trace + nr_stack_trace_entries;
+   unsigned int nr_avail = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
 
-   trace->skip = 3;
-
-   save_stack_trace(trace);
+   if (skip_tracing) {
+   trace->nr_entries = min(trace->nr_entries, nr_avail);
+   memcpy(stack_trace + nr_stack_trace_entries, trace->entries,
+   trace->nr_entries * sizeof(trace->entries[0]));
+   trace->entries = stack_trace + nr_stack_trace_entries;
+   } else {
+   trace->nr_entries = 0;
+   trace->max_entries = nr_avail;
+   trace->entries = stack_trace + nr_stack_trace_entries;
+   trace->skip = 3;
+   save_stack_trace(trace);
+   }
 
/*
 * Some daft arches put -1 at the end to indicate its a full trace.
@@ -1172,7 +1179,7 @@ static noinline int print_circular_bug(struct lock_list *this,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
 
-   if (!save_trace(&this->trace))
+   if (!save_trace(&this->trace, 0))
return 0;
 
depth = get_lock_depth(target);
@@ -1518,13 +1525,13 @@ print_bad_irq_dependency(struct task_struct *curr,
 
printk("\nthe dependencies between %s-irq-safe lock", irqclass);
printk(" and the holding lock:\n");
-   if (!save_trace(&prev_root->trace))
+   if (!save_trace(&prev_root->trace, 0))
return 0;
print_shortest_lock_dependencies(backwards_entry, prev_root);
 
printk("\nthe dependencies between the lock to be acquired");
printk(" and %s-irq-unsafe lock:\n", irqclass);
-   if (!save_trace(&next_root->trace))
+   if (!save_trace(&next_root->trace, 0))
return 0;
print_shortest_lock_dependencies(forwards_entry, next_root);
 
@@ -1856,7 +1863,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
}
 
if (!own_trace && stack_saved && !*stack_saved) {
-   if (!save_trace(&trace))
+   if (!save_trace(&trace, 0))
return 0;
*stack_saved = 1;
}
@@ -2547,7 +2554,7 @@ print_irq_inversion_bug(struct task_struct *curr,
lockdep_print_held_locks(curr);
 
printk("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
-   if (!save_trace(&root->trace))
+   if (!save_trace(&root->trace, 0))
return 0;
print_shortest_lock_dependencies(other, root);
 
@@ -3134,7 +3141,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 
hlock_class(this)->usage_mask |= new_mask;
 
-   if (!save_trace(hlock_class(this)->usage_traces + new_bit))
+   if (!save_trace(hlock_class(this)->usage_traces + new_bit, 0))
return 0;
 
switch (new_bit) {
-- 
1.9.1
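The new skip_tracing branch can be modelled in userspace as below: a trace captured earlier is clamped to the space left in a shared pool, copied in, and repointed at the pooled copy. The pool size, struct and function names are hypothetical; in this sketch the pool accounting, which the real save_trace() handles after the part shown in the patch, is folded into the same function.

```c
#include <string.h>

/* Hypothetical, shrunken stand-in for lockdep's global trace pool. */
#define DEMO_POOL_ENTRIES 8u

struct demo_trace {
    unsigned int nr_entries;
    unsigned int max_entries;
    unsigned long *entries;
};

static unsigned long demo_pool[DEMO_POOL_ENTRIES];
static unsigned int demo_pool_used;

/* Mirror of the skip_tracing path: the trace was captured elsewhere, so
 * clamp it to the space left, copy it into the pool and repoint
 * 'entries' at the pooled copy. Returns the number of entries kept. */
static unsigned int save_presaved_trace(struct demo_trace *trace)
{
    unsigned int nr_avail = DEMO_POOL_ENTRIES - demo_pool_used;

    if (trace->nr_entries > nr_avail)
        trace->nr_entries = nr_avail;
    memcpy(demo_pool + demo_pool_used, trace->entries,
           trace->nr_entries * sizeof(trace->entries[0]));
    trace->entries = demo_pool + demo_pool_used;
    trace->max_entries = trace->nr_entries;
    demo_pool_used += trace->nr_entries;
    return trace->nr_entries;
}
```

Copying rather than keeping a pointer to the caller's buffer matters because the crossrelease framework may reuse that buffer once the dependency has been recorded.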

[PATCH v4 11/15] pagemap.h: Remove trailing white space

2016-12-08 Thread Byungchul Park

Trailing whitespace is not allowed by the kernel coding style. Remove
it.

Signed-off-by: Byungchul Park 
---
 include/linux/pagemap.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9735410..0cf6980 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -485,7 +485,7 @@ static inline void wake_up_page(struct page *page, int bit)
__wake_up_bit(page_waitqueue(page), &page->flags, bit);
 }
 
-/* 
+/*
  * Wait for a page to be unlocked.
  *
  * This must be called with the caller "holding" the page,
@@ -498,7 +498,7 @@ static inline void wait_on_page_locked(struct page *page)
wait_on_page_bit(compound_head(page), PG_locked);
 }
 
-/* 
+/*
  * Wait for a page to complete writeback
  */
 static inline void wait_on_page_writeback(struct page *page)
-- 
1.9.1

Re: [PATCHv3 perf/core 5/7] samples/bpf: Switch over to libbpf

2016-12-08 Thread Wangnan (F)




On 2016/12/9 13:04, Wangnan (F) wrote:



On 2016/12/9 10:46, Joe Stringer wrote:

[SNIP]


  diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 62d89d50fcbd..616bd55f3be8 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -149,6 +149,8 @@ CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
  +libbpf: all
+


Why do we need this? I tested this patch without it and it seems to work, and
this line causes an extra error:
 $ pwd
 /home/wn/kernel/tools/lib/bpf
 $ make libbpf
 ...
 gcc -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DHAVE_ELF_GETPHDRNUM_SUPPORT 
-Wbad-function-cast -Wdeclaration-after-statement -Wformat-security 
-Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes 
-Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked 
-Wredundant-decls -Wshadow -Wstrict-aliasing=3 -Wstrict-prototypes 
-Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat 
-Werror -Wall -fPIC -I. -I/home/wn/kernel-hydrogen/tools/include 
-I/home/wn/kernel-hydrogen/tools/arch/x86/include/uapi 
-I/home/wn/kernel-hydrogen/tools/include/uapi libbpf.c all   -o libbpf

 gcc: error: all: No such file or directory
 make: *** [libbpf] Error 1

Thank you.


It is not 'caused' by your patch: 'make libbpf' fails even without
your change, because make tries to build an executable from libbpf.c,
which has no main().

I think libbpf should never be used as a make target. Your
new dependency looks strange.

Thank you.

[PATCH v4 08/15] lockdep: Make crossrelease use save_stack_trace_fast()

2016-12-08 Thread Byungchul Park

Currently, the crossrelease feature uses save_stack_trace() to save
backtraces. However, it has significant overhead. So this patch makes it
use save_stack_trace_fast() instead, which has less overhead.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2c8b2c1..fbd07ee 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4768,7 +4768,7 @@ static void add_plock(struct held_lock *hlock, unsigned int prev_gen_id,
plock->trace.max_entries = MAX_PLOCK_TRACE_ENTRIES;
plock->trace.entries = plock->trace_entries;
plock->trace.skip = 3;
-   save_stack_trace(&plock->trace);
+   save_stack_trace_fast(&plock->trace);
}
 }
 
-- 
1.9.1

[PATCH v4 09/15] lockdep: Make print_circular_bug() crosslock-aware

2016-12-08 Thread Byungchul Park

print_circular_bug() and its helpers, which report a circular bug,
assume that the target hlock is owned by the current task. However,
with the crossrelease feature, the target hlock can be owned by any
context.

In this case, the circular bug is caused by the target hlock, which
cannot be released because its dependent lock cannot be released. So
the report format needs to be changed to be aware of this.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 56 +---
 1 file changed, 39 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index fbd07ee..cb1a600 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1126,22 +1126,41 @@ print_circular_lock_scenario(struct held_lock *src,
printk("\n\n");
}
 
-   printk(" Possible unsafe locking scenario:\n\n");
-   printk("   CPU0CPU1\n");
-   printk("   \n");
-   printk("  lock(");
-   __print_lock_name(target);
-   printk(");\n");
-   printk("   lock(");
-   __print_lock_name(parent);
-   printk(");\n");
-   printk("   lock(");
-   __print_lock_name(target);
-   printk(");\n");
-   printk("  lock(");
-   __print_lock_name(source);
-   printk(");\n");
-   printk("\n *** DEADLOCK ***\n\n");
+   if (cross_class(target)) {
+   printk(" Possible unsafe locking scenario by crosslock:\n\n");
+   printk("   CPU0CPU1\n");
+   printk("   \n");
+   printk("  lock(");
+   __print_lock_name(parent);
+   printk(");\n");
+   printk("  lock(");
+   __print_lock_name(target);
+   printk(");\n");
+   printk("   lock(");
+   __print_lock_name(source);
+   printk(");\n");
+   printk("   unlock(");
+   __print_lock_name(target);
+   printk(");\n");
+   printk("\n *** DEADLOCK ***\n\n");
+   } else {
+   printk(" Possible unsafe locking scenario:\n\n");
+   printk("   CPU0CPU1\n");
+   printk("   \n");
+   printk("  lock(");
+   __print_lock_name(target);
+   printk(");\n");
+   printk("   lock(");
+   __print_lock_name(parent);
+   printk(");\n");
+   printk("   lock(");
+   __print_lock_name(target);
+   printk(");\n");
+   printk("  lock(");
+   __print_lock_name(source);
+   printk(");\n");
+   printk("\n *** DEADLOCK ***\n\n");
+   }
 }
 
 /*
@@ -1166,7 +1185,10 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
printk("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(check_src);
-   printk("\nbut task is already holding lock:\n");
+   if (cross_class(hlock_class(check_tgt)))
+   printk("\nbut now in the release context of lock:\n");
+   else
+   printk("\nbut task is already holding lock:\n");
print_lock(check_tgt);
printk("\nwhich lock already depends on the new lock.\n\n");
printk("\nthe existing dependency chain (in reverse order) is:\n");
-- 
1.9.1

[PATCH v4 14/15] lockdep: Move data used in CONFIG_LOCKDEP_PAGELOCK from page to page_ext

2016-12-08 Thread Byungchul Park

CONFIG_LOCKDEP_PAGELOCK keeps the data that lockdep needs to check for
and detect deadlocks involving page locks, i.e. lockdep_map and
cross_lock, in struct page. Move it to page_ext instead: since it is a
debug feature, it is preferable to keep it in struct page_ext rather
than in struct page.

Signed-off-by: Byungchul Park 
---
 include/linux/mm_types.h   |  5 
 include/linux/page-flags.h | 19 ++--
 include/linux/page_ext.h   |  5 
 include/linux/pagemap.h| 28 +++---
 lib/Kconfig.debug  |  1 +
 mm/filemap.c   | 72 ++
 mm/page_alloc.c|  3 --
 mm/page_ext.c  |  4 +++
 8 files changed, 122 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 87db0ac..6558e12 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -224,11 +224,6 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
 #endif
-
-#ifdef CONFIG_LOCKDEP_PAGELOCK
-   struct lockdep_map map;
-   struct cross_lock xlock;
-#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e28f232..9f677ff 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -352,28 +352,41 @@ PAGEFLAG(Idle, idle, PF_ANY)
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include 
+#include 
 
 TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
 
 static __always_inline void __SetPageLocked(struct page *page)
 {
+   struct page_ext *e;
+
__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
page = compound_head(page);
-   lock_acquire_exclusive(&page->map, 0, 1, NULL, _RET_IP_);
+   e = lookup_page_ext(page);
+   if (unlikely(!e))
+   return;
+
+   lock_acquire_exclusive(&e->map, 0, 1, NULL, _RET_IP_);
 }
 
 static __always_inline void __ClearPageLocked(struct page *page)
 {
+   struct page_ext *e;
+
__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
 
page = compound_head(page);
+   e = lookup_page_ext(page);
+   if (unlikely(!e))
+   return;
+
/*
 * lock_commit_crosslock() is necessary for crosslock
 * when the lock is released, before lock_release().
 */
-   lock_commit_crosslock(&page->map);
-   lock_release(&page->map, 0, _RET_IP_);
+   lock_commit_crosslock(&e->map);
+   lock_release(&e->map, 0, _RET_IP_);
 }
 #else
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e1fe7cf..f84e9be 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -48,6 +48,11 @@ struct page_ext {
int last_migrate_reason;
unsigned long trace_entries[8];
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+   struct lockdep_map map;
+   struct cross_lock xlock;
+#endif
 };
 
 extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index dbe7adf..79174ad 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,6 +16,7 @@
 #include 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
 #include 
+#include 
 #endif
 
 /*
@@ -417,28 +418,47 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_LOCKDEP_PAGELOCK
+extern struct page_ext_operations lockdep_pagelock_ops;
+
 #define lock_page_init(p)  \
 do {   \
static struct lock_class_key __key; \
-   lockdep_init_map_crosslock(&(p)->map, &(p)->xlock,  \
+   struct page_ext *e = lookup_page_ext(p);\
+   \
+   if (unlikely(!e))   \
+   break;  \
+   \
+   lockdep_init_map_crosslock(&(e)->map, &(e)->xlock,  \
"(PG_locked)" #p, &__key, 0);   \
 } while (0)
 
 static inline void lock_page_acquire(struct page *page, int try)
 {
+   struct page_ext *e;
+
page = compound_head(page);
-   lock_acquire_exclusive(&page->map, 0, try, NULL, _RET_IP_);
+   e = lookup_page_ext(page);
+   if (unlikely(!e))
+   return;
+
+   lock_acquire_exclusive(&e->map, 0, try, NULL, _RET_IP_);
 }
 
 static inline void lock_page_release(struct page *page)
 {
+   struct page_ext *e;
+
page = compound_head(page);
+   e = lookup_page_ext(page);
+   if (unlikely(!e))
+   return;
+
/*
 * lock_commit_crosslock() is necessary for crosslock
 * when the lock is released, before lock_release().
 */
-   lock_commit_crosslock(&page->map);
-   lock_release(&page->map, 0, _RET_IP_

[PATCH v4 07/15] lockdep: Implement crossrelease feature

2016-12-08 Thread Byungchul Park

The crossrelease feature calls a lock a 'crosslock' if it can be
released in any context. For a crosslock, every lock held in the
crosslock's release context until the crosslock is eventually released
has a dependency with the crosslock.

Using the crossrelease feature, we can detect possible deadlocks even
for lock_page(), wait_for_completion() and so on.

Signed-off-by: Byungchul Park 
---
 include/linux/irqflags.h |  12 +-
 include/linux/lockdep.h  | 122 +++
 include/linux/sched.h|   5 +
 kernel/exit.c|   9 +
 kernel/fork.c|  20 ++
 kernel/locking/lockdep.c | 517 +--
 lib/Kconfig.debug|  13 ++
 7 files changed, 682 insertions(+), 16 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..b1854fa 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,9 +23,17 @@
 # define trace_softirq_context(p)  ((p)->softirq_context)
 # define trace_hardirqs_enabled(p) ((p)->hardirqs_enabled)
 # define trace_softirqs_enabled(p) ((p)->softirqs_enabled)
-# define trace_hardirq_enter() do { current->hardirq_context++; } while (0)
+# define trace_hardirq_enter() \
+do {   \
+   current->hardirq_context++; \
+   crossrelease_hardirq_start();   \
+} while (0)
 # define trace_hardirq_exit()  do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter()   do { current->softirq_context++; } while (0)
+# define lockdep_softirq_enter()   \
+do {   \
+   current->softirq_context++; \
+   crossrelease_softirq_start();   \
+} while (0)
 # define lockdep_softirq_exit()    do { current->softirq_context--; } while (0)
 # define INIT_TRACE_IRQFLAGS   .softirqs_enabled = 1,
 #else
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index eabe013..6b3708b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -108,6 +108,12 @@ struct lock_class {
unsigned long   contention_point[LOCKSTAT_POINTS];
unsigned long   contending_point[LOCKSTAT_POINTS];
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+   /*
+* Flag to indicate whether it's a crosslock or normal.
+*/
+   int cross;
+#endif
 };
 
 #ifdef CONFIG_LOCK_STAT
@@ -143,6 +149,9 @@ struct lock_class_stats lock_stats(struct lock_class *class);
 void clear_lock_stats(struct lock_class *class);
 #endif
 
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+struct cross_lock;
+#endif
 /*
  * Map the lock object (the lock instance) to the lock-class object.
  * This is embedded into specific lock instances:
@@ -155,6 +164,9 @@ struct lockdep_map {
int cpu;
unsigned long   ip;
 #endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+   struct cross_lock   *xlock;
+#endif
 };
 
 static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -258,7 +270,82 @@ struct held_lock {
unsigned int hardirqs_off:1;
unsigned int references:12; /* 32 bits */
unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+   /*
+* This is used to find out the first plock among plocks having
+* been acquired since a crosslock was held. Crossrelease feature
+* uses chain cache between the crosslock and the first plock to
+* avoid building unnecessary dependencies, like how lockdep uses
+* a sort of chain cache for normal locks.
+*/
+   unsigned int gen_id;
+#endif
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_PLOCK_TRACE_ENTRIES	5
+
+/*
+ * This is for keeping locks waiting for commit to happen so that
+ * dependencies are actually built later at commit step.
+ *
+ * Every task_struct has an array of pend_lock. Each entry will be
+ * added with a lock whenever lock_acquire() is called for normal lock.
+ */
+struct pend_lock {
+   /*
+* prev_gen_id is used to check whether any other hlock in the
+* current is already dealing with the xlock, with which commit
+* is performed. If so, this plock can be skipped.
+*/
+   unsigned int    prev_gen_id;
+   /*
+* A kind of global timestamp increased and set when this plock
+* is inserted.
+*/
+   unsigned int    gen_id;
+
+   int hardirq_context;
+   int softirq_context;
+
+   /*
+* Whenever irq happens, these are updated so that we can
+* distinguish each irq context uniquely.
+*/
+   unsigned int    hardirq_id;
+   unsigned int    softirq_id;
+
+   /*
+* Separate stack_trace data. This will be used at commit step.
+*/
+   struct stack_trace  trace;
+   unsign

[PATCH v4 13/15] lockdep: Apply lock_acquire(release) on Set(Clear)PageLocked

2016-12-08 Thread Byungchul Park

Usually the PG_locked bit is updated by lock_page() or unlock_page().
However, it can also be updated through __SetPageLocked() or
__ClearPageLocked(). They have to be considered as well, so that
acquire and release stay paired.

Furthermore, __SetPageLocked() in add_to_page_cache_lru(), for example,
is called frequently. We might miss many chances to check for deadlock
if we ignore it. So consider __Set(__Clear)PageLocked as well.

Signed-off-by: Byungchul Park 
---
 include/linux/page-flags.h | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e5a3244..e28f232 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -249,7 +249,6 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname)						\
TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)
 
-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
 PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
 PAGEFLAG(Referenced, referenced, PF_HEAD)
TESTCLEARFLAG(Referenced, referenced, PF_HEAD)
@@ -351,6 +350,35 @@ TESTCLEARFLAG(Young, young, PF_ANY)
 PAGEFLAG(Idle, idle, PF_ANY)
 #endif
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include 
+
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+   __set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+   page = compound_head(page);
+   lock_acquire_exclusive(&page->map, 0, 1, NULL, _RET_IP_);
+}
+
+static __always_inline void __ClearPageLocked(struct page *page)
+{
+   __clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+   page = compound_head(page);
+   /*
+* lock_commit_crosslock() is necessary for crosslock
+* when the lock is released, before lock_release().
+*/
+   lock_commit_crosslock(&page->map);
+   lock_release(&page->map, 0, _RET_IP_);
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
-- 
1.9.1

[PATCH v4 00/15] lockdep: Implement crossrelease feature

2016-12-08 Thread Byungchul Park

I checked whether the crossrelease feature works well on my qemu-i386
machine. It works without any problem there, but I wonder whether that
is also true on other machines, especially on large systems. Could you
let me know if it doesn't work on yours, or whether you find the
crossrelease feature useful? Please also let me know if you need it
backported to another version and the backport is not easy; then I can
provide a backported version.

I added the output of 'cat /proc/lockdep' on my machine with the
crossrelease feature applied, showing lockdep's dependencies. You can
check what kind of dependencies are added by the crossrelease feature.
Please use '(complete)' or '(PG_locked)' as a keyword to find the
dependencies added by this patch set.

The base is still unchanged (v4.7). I will rebase it onto the latest
kernel once there is a consensus on it. What do you think?

-8<-

Change from v3
- revised document

Change from v2
- rebase on vanilla v4.7 tag
- move lockdep data for page lock from struct page to page_ext
- allocate plocks buffer via vmalloc instead of in struct task
- enhanced comments and document
- optimize performance
- make reporting function crossrelease-aware

Change from v1
- enhanced the document
- removed save_stack_trace() optimizing patch
- made this based on the separated save_stack_trace patchset
  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1182242.html

Can we detect deadlocks below with original lockdep?

Example 1)

PROCESS X               PROCESS Y
--------------          --------------
mutex_lock A
                        lock_page B
lock_page B
                        mutex_lock A // DEADLOCK
unlock_page B
                        mutex_unlock A
mutex_unlock A
                        unlock_page B

where A and B are different lock classes.

No, we cannot.

Example 2)

PROCESS X               PROCESS Y               PROCESS Z
--------------          --------------          --------------
                        mutex_lock A
lock_page B
                        lock_page B
                                                mutex_lock A // DEADLOCK
                                                mutex_unlock A
                                                unlock_page B
                                                (B was held by PROCESS X)
                        unlock_page B
                        mutex_unlock A

where A and B are different lock classes.

No, we cannot.

Example 3)

PROCESS X               PROCESS Y
--------------          --------------
mutex_lock A
                        mutex_lock A
                        mutex_unlock A
wait_for_complete B // DEADLOCK
                        complete B
mutex_unlock A

where A is a lock class and B is a completion variable.

No, we cannot.

Not only lock operations but any operation that waits or spins for
something can cause a deadlock unless it is eventually *released* by
someone. The important point here is that the waiting or spinning
must be *released* by someone.

Using the crossrelease feature, we can check dependencies and detect
possible deadlocks not only for typical locks, but also for
lock_page(), wait_for_xxx() and so on, which might be released in any
context.

See the last patch, which includes the documentation, for more information.

Byungchul Park (15):
  x86/dumpstack: Optimize save_stack_trace
  x86/dumpstack: Add save_stack_trace()_fast()
  lockdep: Refactor lookup_chain_cache()
  lockdep: Add a function building a chain between two classes
  lockdep: Make check_prev_add can use a separate stack_trace
  lockdep: Make save_trace can skip stack tracing of the current
  lockdep: Implement crossrelease feature
  lockdep: Make crossrelease use save_stack_trace_fast()
  lockdep: Make print_circular_bug() crosslock-aware
  lockdep: Apply crossrelease to completion operation
  pagemap.h: Remove trailing white space
  lockdep: Apply crossrelease to PG_locked lock
  lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
  lockdep: Move data used in CONFIG_LOCKDEP_PAGELOCK from page to page_ext
  lockdep: Crossrelease feature documentation

 Documentation/locking/crossrelease.txt | 1053 
 arch/x86/include/asm/stacktrace.h  |1 +
 arch/x86/kernel/dumpstack.c|4 +
 arch/x86/kernel/dumpstack_32.c |2 +
 arch/x86/kernel/stacktrace.c   |   32 +
 include/linux/completion.h |  121 +++-
 include/linux/irqflags.h   |   12 +-
 include/linux/lockdep.h|  122 
 include/linux/mm_types.h   |4 +
 include/linux/page-flags.h |   43 +-
 include/linux/page_ext.h   |5 +
 include/linux/pagemap.h|  124 +++-
 include/linux/sched.h  |5 +
 include/linux/stacktrace.h |2 +
 kernel/exit.c

[PATCH v4 15/15] lockdep: Crossrelease feature documentation

2016-12-08 Thread Byungchul Park

This document describes the concept of the crossrelease feature, which
generalizes what causes a deadlock and how a deadlock can be detected.

Signed-off-by: Byungchul Park 
---
 Documentation/locking/crossrelease.txt | 1053 
 1 file changed, 1053 insertions(+)
 create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 000..7170b2f
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,1053 @@
+Crossrelease
+
+
+Started by Byungchul Park 
+
+Contents:
+
+ (*) Background.
+
+ - What causes deadlock.
+ - What lockdep detects.
+ - How lockdep works.
+
+ (*) Limitation.
+
+ - Limit to typical locks.
+ - Pros from the limitation.
+ - Cons from the limitation.
+
+ (*) Generalization.
+
+ - Relax the limitation.
+
+ (*) Crossrelease.
+
+ - Introduce crossrelease.
+ - Pick true dependencies.
+ - Introduce commit.
+
+ (*) Implementation.
+
+ - Data structures.
+ - How crossrelease works.
+
+ (*) Optimizations.
+
+ - Avoid duplication.
+ - Lockless for hot paths.
+
+
+==
+Background
+==
+
+What causes deadlock
+
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another (or the same) context that can
+trigger the event is also waiting for another (or the same) event to
+happen, which is also impossible for the same reason. One or more
+contexts participate in such a deadlock.
+
+For example,
+
+   A context going to trigger event D is waiting for event A to happen.
+   A context going to trigger event A is waiting for event B to happen.
+   A context going to trigger event B is waiting for event C to happen.
+   A context going to trigger event C is waiting for event D to happen.
+
+A deadlock occurs when these four wait operations run at the same time,
+because event D cannot be triggered if event A does not happen, which in
+turn cannot be triggered if event B does not happen, which in turn
+cannot be triggered if event C does not happen, which in turn cannot be
+triggered if event D does not happen. In the end, no event can be
+triggered, since none of the waiters ever meets its precondition to wake up.
+
+In terms of dependency, a wait for an event creates a dependency if the
+context is going to wake up another waiter by triggering a proper event.
+In other words, a dependency exists if:
+
+   COND 1. There are two waiters, each waiting for an event at the same time.
+   COND 2. The only way to wake up each waiter is to trigger its event.
+   COND 3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like,
+
+   Event D depends on event A.
+   Event A depends on event B.
+   Event B depends on event C.
+   Event C depends on event D.
+
+   NOTE: Precisely speaking, a dependency is one between whether a
+   waiter for an event can be woken up and whether another waiter for
+   another event can be woken up. However from now on, we will describe
+   a dependency as if it's one between an event and another event for
+   simplicity, e.g. 'event D depends on event A'.
+
+And they form circular dependencies like,
+
+        -> D -> A -> B -> C -
+       /                     \
+       \                     /
+        ---------------------
+
+   where A, B,..., D are different events, and '->' represents 'depends
+   on'.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its precondition to wake up if they run simultaneously, as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+What lockdep detects
+
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, i.e. acquire and release. Waiting for a lock to be
+released corresponds to waiting for an event to happen, and releasing a
+lock corresponds to triggering an event. See 'What causes deadlock'
+section.
+
+A deadlock actually occurs when all wait operations creating circular
+dependencies run at the same time. Even if they don't, a potential
+deadlock exists if the problematic dependencies exist. Thus it is
+meaningful to detect not only an actual deadlock but also the
+possibility of one. Lockdep does both.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, the order in which contexts are switched is one factor. Assuming
+circular dependencies exist, a deadlock would occur when contexts are
+switched so that all wait operations creating the problematic
+dependencies run simultaneously.
+
+To detect a potential deadlock, which has not happened yet but might
+happen in the future, lockdep considers all possible combinations of
+dependencies so that the possibility can be detected in advance. To do
+this, lockdep tries to:
+
+1. Use a global dependency graph.
+
+

[PATCH v4 12/15] lockdep: Apply crossrelease to PG_locked lock

2016-12-08 Thread Byungchul Park

lock_page() and its family can cause deadlocks. Nevertheless, they
could not use the lock correctness validator, because unlock_page() can
be called in a different context from the one that called lock_page(),
which violates lockdep's assumptions without the crossrelease feature.

However, thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the
lockdep detector to lock_page(). This patch does so.

Signed-off-by: Byungchul Park 
---
 include/linux/mm_types.h |   9 +
 include/linux/pagemap.h  | 100 ---
 lib/Kconfig.debug|   8 
 mm/filemap.c |   4 +-
 mm/page_alloc.c  |   3 ++
 5 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ca3e517..87db0ac 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -16,6 +16,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include 
+#endif
+
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
 #endif
@@ -220,6 +224,11 @@ struct page {
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
 #endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+   struct lockdep_map map;
+   struct cross_lock xlock;
+#endif
 }
 /*
  * The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0cf6980..dbe7adf 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,9 @@
 #include 
 #include  /* for in_interrupt() */
 #include 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include 
+#endif
 
 /*
  * Bits in mapping->flags.  The lower __GFP_BITS_SHIFT bits are the page
@@ -413,26 +416,90 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
return pgoff;
 }
 
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p)  \
+do {   \
+   static struct lock_class_key __key; \
+   lockdep_init_map_crosslock(&(p)->map, &(p)->xlock,  \
+   "(PG_locked)" #p, &__key, 0);   \
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+   page = compound_head(page);
+   lock_acquire_exclusive(&page->map, 0, try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+   page = compound_head(page);
+   /*
+* lock_commit_crosslock() is necessary for crosslock
+* when the lock is released, before lock_release().
+*/
+   lock_commit_crosslock(&page->map);
+   lock_release(&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
 extern void __lock_page(struct page *page);
 extern int __lock_page_killable(struct page *page);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);
 
-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+   lock_page_release(page);
+   do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
 {
page = compound_head(page);
return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
 }
 
+static inline int trylock_page(struct page *page)
+{
+   if (do_raw_trylock_page(page)) {
+   lock_page_acquire(page, 1);
+   return 1;
+   }
+   return 0;
+}
+
 /*
  * lock_page may only be called if we have the page's inode pinned.
  */
 static inline void lock_page(struct page *page)
 {
might_sleep();
-   if (!trylock_page(page))
+
+   if (!do_raw_trylock_page(page))
__lock_page(page);
+   /*
+* Acquire() must be after actual lock operation of crosslock.
+* This way crosslock and other locks can be serialized like,
+*
+*  CONTEXT 1   CONTEXT 2
+*  LOCK crosslock
+*  ACQUIRE crosslock
+*atomic_inc_return
+*  ~~
+*  ACQUIRE lock1
+*atomic_read_acquire lock1
+*  LOCK lock1
+*  LOCK lock2
+*
+* so that 'crosslock -> lock1 -> lock2' can be seen globally.
+*/
+   lock_page_acquire(page, 0);
 }
 
 /*
@@ -442,9 +509,20 @@ static inline void lock_page(struct page *page)
  */
 static inline int lock_page_killable(struct page *page)
 {
+   int ret;
+
might_sleep();

[PATCH v4 01/15] x86/dumpstack: Optimize save_stack_trace

2016-12-08 Thread Byungchul Park

Currently, the x86 implementation of save_stack_trace() walks the whole
stack region word by word regardless of trace->max_entries. However,
there is no need to keep walking once the caller's requirement is
fulfilled, i.e. once trace->nr_entries >= trace->max_entries is true.

I measured the overhead by printing the difference in sched_clock() on
my QEMU x86 machine. The latency improved by over 70% with
trace->max_entries = 5.

Before this patch:

[2.329573] save_stack_trace() takes 76820 ns
[2.329863] save_stack_trace() takes 62131 ns
[2.33] save_stack_trace() takes 99476 ns
[2.329846] save_stack_trace() takes 62419 ns
[2.33] save_stack_trace() takes 88918 ns
[2.330253] save_stack_trace() takes 73669 ns
[2.330520] save_stack_trace() takes 67876 ns
[2.330671] save_stack_trace() takes 75963 ns
[2.330983] save_stack_trace() takes 95079 ns
[2.330451] save_stack_trace() takes 62352 ns

After this patch:

[2.795000] save_stack_trace() takes 21147 ns
[2.795397] save_stack_trace() takes 20230 ns
[2.795397] save_stack_trace() takes 31274 ns
[2.795739] save_stack_trace() takes 19706 ns
[2.796484] save_stack_trace() takes 20266 ns
[2.796484] save_stack_trace() takes 20902 ns
[2.797000] save_stack_trace() takes 38110 ns
[2.797510] save_stack_trace() takes 20224 ns
[2.798181] save_stack_trace() takes 20172 ns
[2.798837] save_stack_trace() takes 20824 ns

Signed-off-by: Byungchul Park 
---
 arch/x86/include/asm/stacktrace.h | 1 +
 arch/x86/kernel/dumpstack.c   | 4 
 arch/x86/kernel/dumpstack_32.c| 2 ++
 arch/x86/kernel/stacktrace.c  | 7 +++
 4 files changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 0944218..f6d0694 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -41,6 +41,7 @@ struct stacktrace_ops {
/* On negative return stop dumping */
int (*stack)(void *data, char *name);
walk_stack_t    walk_stack;
+   int (*end_walk)(void *data);
 };
 
 void dump_trace(struct task_struct *tsk, struct pt_regs *regs,
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index ef8017c..274d42a 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -113,6 +113,8 @@ print_context_stack(struct task_struct *task,
print_ftrace_graph_addr(addr, data, ops, task, graph);
}
stack++;
+   if (ops->end_walk && ops->end_walk(data))
+   break;
}
return bp;
 }
@@ -138,6 +140,8 @@ print_context_stack_bp(struct task_struct *task,
frame = frame->next_frame;
ret_addr = &frame->return_address;
print_ftrace_graph_addr(addr, data, ops, task, graph);
+   if (ops->end_walk && ops->end_walk(data))
+   break;
}
 
return (unsigned long)frame;
diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c
index fef917e..762d1fd 100644
--- a/arch/x86/kernel/dumpstack_32.c
+++ b/arch/x86/kernel/dumpstack_32.c
@@ -69,6 +69,8 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 
bp = ops->walk_stack(task, stack, bp, ops, data,
 end_stack, &graph);
+   if (ops->end_walk && ops->end_walk(data))
+   break;
 
/* Stop if not on irq stack */
if (!end_stack)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 9ee98ee..a44de4d 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -47,10 +47,17 @@ save_stack_address_nosched(void *data, unsigned long addr, int reliable)
return __save_stack_address(data, addr, reliable, true);
 }
 
+static int save_stack_end(void *data)
+{
+   struct stack_trace *trace = data;
+   return trace->nr_entries >= trace->max_entries;
+}
+
 static const struct stacktrace_ops save_stack_ops = {
.stack  = save_stack_stack,
.address= save_stack_address,
.walk_stack = print_context_stack,
+   .end_walk   = save_stack_end,
 };
 
 static const struct stacktrace_ops save_stack_ops_nosched = {
-- 
1.9.1

[PATCH v4 05/15] lockdep: Make check_prev_add can use a separate stack_trace

2016-12-08 Thread Byungchul Park

check_prev_add() saves a stack trace of the current task. But the
crossrelease feature needs check_prev_add() to use a separate stack
trace recorded in another context. So make it accept a separate stack
trace instead of always using the current one.

Signed-off-by: Byungchul Park 
---
 kernel/locking/lockdep.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 111839f..3eaa11c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1793,7 +1793,8 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-  struct held_lock *next, int distance, int *stack_saved)
+  struct held_lock *next, int distance, int *stack_saved,
+  struct stack_trace *own_trace)
 {
struct lock_list *entry;
int ret;
@@ -1854,7 +1855,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
}
}
 
-   if (!*stack_saved) {
+   if (!own_trace && stack_saved && !*stack_saved) {
if (!save_trace(&trace))
return 0;
*stack_saved = 1;
@@ -1866,14 +1867,14 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 */
ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
   &hlock_class(prev)->locks_after,
-  next->acquire_ip, distance, &trace);
+  next->acquire_ip, distance, own_trace ?: &trace);
 
if (!ret)
return 0;
 
ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
   &hlock_class(next)->locks_before,
-  next->acquire_ip, distance, &trace);
+  next->acquire_ip, distance, own_trace ?: &trace);
if (!ret)
return 0;
 
@@ -1882,7 +1883,8 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 */
if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
/* We drop graph lock, so another thread can overwrite trace. */
-   *stack_saved = 0;
+   if (stack_saved)
+   *stack_saved = 0;
graph_unlock();
printk("\n new dependency: ");
print_lock_name(hlock_class(prev));
@@ -1931,8 +1933,8 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
 * added:
 */
if (hlock->read != 2 && hlock->check) {
-   if (!check_prev_add(curr, hlock, next,
-   distance, &stack_saved))
+   if (!check_prev_add(curr, hlock, next, distance,
+   &stack_saved, NULL))
return 0;
/*
 * Stop after the first non-trylock entry,
-- 
1.9.1
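The patched check_prev_add() follows a common pattern: use a caller-supplied trace when one is given, otherwise capture the current stack lazily, at most once per check_prevs_add() pass. A userspace model with hypothetical names (save_trace_sketch, prev_add_sketch); note that `own_trace ?: &trace` in the patch is the GNU C "elvis" extension, written here as a standard ternary:

```c
#include <assert.h>
#include <stddef.h>

struct trace_sketch { int id; };

static struct trace_sketch current_trace;	/* stands in for lockdep's static trace */
static int saves;				/* how many times the current stack was captured */

/* Hypothetical stand-in for save_trace(): capture the current stack. */
static int save_trace_sketch(struct trace_sketch *t)
{
	assert(t != NULL);
	t->id = ++saves;
	return 1;
}

/*
 * Returns the trace to record for this dependency, or NULL on failure.
 * Only captures the current stack when no own_trace was supplied and it
 * has not already been captured during this pass.
 */
static const struct trace_sketch *prev_add_sketch(int *stack_saved,
						  const struct trace_sketch *own_trace)
{
	if (!own_trace && stack_saved && !*stack_saved) {
		if (!save_trace_sketch(&current_trace))
			return NULL;
		*stack_saved = 1;
	}
	/* Equivalent to GNU C `own_trace ?: &current_trace`. */
	return own_trace ? own_trace : &current_trace;
}
```

Repeated calls with the same stack_saved flag capture the current stack only once, and a caller that passes its own trace (with stack_saved == NULL, as crossrelease would) never triggers a capture.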

[PATCH v4 02/15] x86/dumpstack: Add save_stack_trace()_fast()

2016-12-08 Thread Byungchul Park

In the non-oops case, it is usually not necessary to check every word of
the stack area to extract a backtrace. Instead, we can follow the frame
pointers. So make it possible to save a stack trace cheaply in the normal
case.

I measured the overhead by printing the sched_clock() difference around
each call on my QEMU x86 machine. Latency improved by over 80% with
trace->max_entries = 5.

Before this patch:

[2.795000] save_stack_trace() takes 21147 ns
[2.795397] save_stack_trace() takes 20230 ns
[2.795397] save_stack_trace() takes 31274 ns
[2.795739] save_stack_trace() takes 19706 ns
[2.796484] save_stack_trace() takes 20266 ns
[2.796484] save_stack_trace() takes 20902 ns
[2.797000] save_stack_trace() takes 38110 ns
[2.797510] save_stack_trace() takes 20224 ns
[2.798181] save_stack_trace() takes 20172 ns
[2.798837] save_stack_trace() takes 20824 ns

After this patch:

[3.133807] save_stack_trace() takes 3297 ns
[3.133954] save_stack_trace() takes 3330 ns
[3.134235] save_stack_trace() takes 3517 ns
[3.134711] save_stack_trace() takes 3773 ns
[3.135000] save_stack_trace() takes 3685 ns
[3.135541] save_stack_trace() takes 4757 ns
[3.135865] save_stack_trace() takes 3420 ns
[3.136000] save_stack_trace() takes 3329 ns
[3.137000] save_stack_trace() takes 4058 ns
[3.137000] save_stack_trace() takes 3499 ns
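As a quick check of the "over 80%" figure, the ten samples on each side can be compared by their means. This sketch only does arithmetic on the numbers logged above (the medians, 20545 ns before and 3508 ns after, give a similar reduction of about 83%):

```c
#include <assert.h>
#include <stddef.h>

/* save_stack_trace() latencies in ns, copied from the logs above. */
static const long before_ns[] = { 21147, 20230, 31274, 19706, 20266,
				  20902, 38110, 20224, 20172, 20824 };
static const long after_ns[]  = { 3297, 3330, 3517, 3773, 3685,
				  4757, 3420, 3329, 4058, 3499 };

static long sum(const long *v, size_t n)
{
	long s = 0;

	assert(v != NULL);
	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}

/* Relative reduction of the mean latency: 1 - mean(after) / mean(before). */
static double mean_reduction(void)
{
	size_t n = sizeof(before_ns) / sizeof(before_ns[0]);

	return 1.0 - (double)sum(after_ns, n) / (double)sum(before_ns, n);
}
```

With these samples the mean drops from about 23.3 us to about 3.7 us, a reduction of roughly 84%.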

Signed-off-by: Byungchul Park 
---
 arch/x86/kernel/stacktrace.c | 25 +++++++++++++++++++++++++
 include/linux/stacktrace.h   |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index a44de4d..d8da90f 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -53,6 +53,10 @@ static int save_stack_end(void *data)
return trace->nr_entries >= trace->max_entries;
 }
 
+/*
+ * These operations should be used in the oops case, where the
+ * stack might be broken.
+ */
 static const struct stacktrace_ops save_stack_ops = {
.stack  = save_stack_stack,
.address   = save_stack_address,
@@ -60,6 +64,13 @@ static const struct stacktrace_ops save_stack_ops = {
.end_walk   = save_stack_end,
 };
 
+static const struct stacktrace_ops save_stack_ops_fast = {
+   .stack  = save_stack_stack,
+   .address   = save_stack_address,
+   .walk_stack = print_context_stack_bp,
+   .end_walk   = save_stack_end,
+};
+
 static const struct stacktrace_ops save_stack_ops_nosched = {
.stack  = save_stack_stack,
.address   = save_stack_address_nosched,
@@ -68,6 +79,7 @@ static const struct stacktrace_ops save_stack_ops_nosched = {
 
 /*
  * Save stack-backtrace addresses into a stack_trace buffer.
+ * It works even in the oops case.
  */
 void save_stack_trace(struct stack_trace *trace)
 {
@@ -77,6 +89,19 @@ void save_stack_trace(struct stack_trace *trace)
 }
 EXPORT_SYMBOL_GPL(save_stack_trace);
 
+/*
+ * Save stack-backtrace addresses into a stack_trace buffer.
+ * This is preferred in the normal case, where the stack is
+ * expected to be reliable.
+ */
+void save_stack_trace_fast(struct stack_trace *trace)
+{
+   dump_trace(current, NULL, NULL, 0, &save_stack_ops_fast, trace);
+   if (trace->nr_entries < trace->max_entries)
+   trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+EXPORT_SYMBOL_GPL(save_stack_trace_fast);
+
 void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
 {
dump_trace(current, regs, NULL, 0, &save_stack_ops, trace);
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 0a34489..ddef1d0 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -14,6 +14,7 @@ struct stack_trace {
 };
 
 extern void save_stack_trace(struct stack_trace *trace);
+extern void save_stack_trace_fast(struct stack_trace *trace);
 extern void save_stack_trace_regs(struct pt_regs *regs,
  struct stack_trace *trace);
 extern void save_stack_trace_tsk(struct task_struct *tsk,
@@ -31,6 +32,7 @@ extern void save_stack_trace_user(struct stack_trace *trace);
 
 #else
 # define save_stack_trace(trace)   do { } while (0)
+# define save_stack_trace_fast(trace)  do { } while (0)
 # define save_stack_trace_tsk(tsk, trace)  do { } while (0)
 # define save_stack_trace_user(trace)  do { } while (0)
 # define print_stack_trace(trace, spaces)  do { } while (0)
-- 
1.9.1
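With frame pointers, each x86 frame starts with the caller's saved frame pointer followed by the return address, so a backtrace becomes a linked-list walk rather than a scan of every stack word. A userspace model of that walk with hypothetical names (struct frame_sketch, walk_frames); like save_stack_trace_fast() it stops at max_entries and appends ULONG_MAX as an end marker when there is room. Real unwinding code must also validate each frame pointer against the stack bounds, which this sketch omits:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Frame layout with frame pointers: [saved bp][return address]. */
struct frame_sketch {
	const struct frame_sketch *next;	/* caller's saved frame pointer */
	unsigned long ret_addr;			/* return address into the caller */
};

struct trace_sketch {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
};

/* Follow the frame-pointer chain from bp, recording one address per frame. */
static void walk_frames(const struct frame_sketch *bp, struct trace_sketch *trace)
{
	assert(trace && trace->entries);
	while (bp && trace->nr_entries < trace->max_entries) {
		trace->entries[trace->nr_entries++] = bp->ret_addr;
		bp = bp->next;
	}
	/* Same terminator convention as save_stack_trace_fast(). */
	if (trace->nr_entries < trace->max_entries)
		trace->entries[trace->nr_entries++] = ULONG_MAX;
}
```

The cost is proportional to the number of frames recorded (bounded by max_entries), not to the size of the stack, which is consistent with the speedup reported above for trace->max_entries = 5.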

Re: [PATCH v2] kexec: add cond_resched into kimage_alloc_crash_control_pages

2016-12-08 Thread zhong jiang

On 2016/12/8 17:41, Xunlei Pang wrote:
> On 12/08/2016 at 10:37 AM, zhongjiang wrote:
>> From: zhong jiang 
>>
>> A soft lockup occurs when I run trinity against the kexec_load syscall.
>> The corresponding stack trace is as follows.
>>
>> [  237.235937] BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
>> [  237.242699] Kernel panic - not syncing: softlockup: hung tasks
>> [  237.248573] CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G   O L V---   3.10.0-327.28.3.35.zhongjiang.x86_64 #1
>> [  237.259984] Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
>> [  237.269752]  8187626b 18cfde31 88184c803e18 81638f16
>> [  237.277471]  88184c803e98 8163278f 0008 88184c803ea8
>> [  237.285190]  88184c803e48 18cfde31 88184c803e67
>> [  237.292909] Call Trace:
>> [  237.295404][] dump_stack+0x19/0x1b
>> [  237.301352]  [] panic+0xd8/0x214
>> [  237.306196]  [] watchdog_timer_fn+0x1cc/0x1e0
>> [  237.312157]  [] ? watchdog_enable+0xc0/0xc0
>> [  237.317955]  [] __hrtimer_run_queues+0xd2/0x260
>> [  237.324087]  [] hrtimer_interrupt+0xb0/0x1e0
>> [  237.329963]  [] ? call_softirq+0x1c/0x30
>> [  237.335500]  [] local_apic_timer_interrupt+0x37/0x60
>> [  237.342228]  [] smp_apic_timer_interrupt+0x3f/0x60
>> [  237.348771]  [] apic_timer_interrupt+0x6d/0x80
>> [  237.354967][] ? kimage_alloc_control_pages+0x80/0x270
>> [  237.362875]  [] ? kmem_cache_alloc_trace+0x1ce/0x1f0
>> [  237.369592]  [] ? do_kimage_alloc_init+0x1f/0x90
>> [  237.375992]  [] kimage_alloc_init+0x12a/0x180
>> [  237.382103]  [] SyS_kexec_load+0x20a/0x260
>> [  237.387957]  [] system_call_fastpath+0x16/0x1b
>>
>> The first allocation of control pages may take too long because
>> crashk_res.end can be set to a high value, so add cond_resched()
>> to avoid the issue.
>>
>> The patch has been tested and the above issue no longer appears.
>>
>> Signed-off-by: zhong jiang 
>> ---
>>  kernel/kexec_core.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 5616755..bfc9621 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -441,6 +441,8 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
>>  while (hole_end <= crashk_res.end) {
>>  unsigned long i;
>>  
>> +cond_resched();
>> +
> I can't see why the loop would take a long time. All it does is find a control area
> that does not overlap image->segment[]; see the loop "for (i = 0; i < image->nr_segments; i++)".
> Each time an overlap is detected, @hole_end is advanced to the end of the next nearby segment,
> and there are only a limited number of segments (<= 16), so locating the right area should not take long.
>
> Am I missing something?
>
> Regards,
> Xunlei
  If crashkernel=auto is set on the command line, crashk_res.end can exceed 4G, and the first
  control-page allocation will loop millions of times. The same happens if crashk_res.end is
  manually set to a higher value, as you can imagine.

  Thanks
  zhongjiang
>>  if (hole_end > KEXEC_CRASH_CONTROL_MEMORY_LIMIT)
>>  break;
>>  /* See if I overlap any of the segments */
>
> .
>
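Xunlei's argument can be modeled in userspace: each iteration either finds a hole that overlaps no segment, or moves the hole past the end of an overlapping segment. Since the hole only moves upward and never revisits a segment it has passed, the loop runs at most nr_segments + 1 times. The names below (struct seg_sketch, find_hole) are hypothetical, segments are assumed page-aligned with page-multiple sizes, and the model does not reproduce the crashkernel=auto setup zhongjiang reports, so it does not explain where the extra iterations come from in that case:

```c
#include <assert.h>
#include <stddef.h>

struct seg_sketch {
	unsigned long mem;	/* segment start */
	unsigned long memsz;	/* segment size in bytes */
};

/*
 * Search for a size-aligned hole of `size` bytes at or above `start`,
 * ending at or below `limit`, that overlaps no segment. Mirrors the
 * hole-advancing loop quoted above. size must be a power of two.
 * Returns 1 and stores the hole start in *found on success.
 */
static int find_hole(const struct seg_sketch *segs, size_t nr,
		     unsigned long start, unsigned long size,
		     unsigned long limit, unsigned long *found,
		     unsigned int *iters)
{
	unsigned long hole_start = (start + (size - 1)) & ~(size - 1);
	unsigned long hole_end = hole_start + size - 1;

	assert(size && !(size & (size - 1)));
	*iters = 0;
	while (hole_end <= limit) {
		size_t i;

		(*iters)++;
		for (i = 0; i < nr; i++) {
			unsigned long mstart = segs[i].mem;
			unsigned long mend = mstart + segs[i].memsz - 1;

			if (hole_end >= mstart && hole_start <= mend) {
				/* Advance the hole past the end of this segment. */
				hole_start = (mend + (size - 1)) & ~(size - 1);
				hole_end = hole_start + size - 1;
				break;
			}
		}
		if (i == nr) {	/* no overlap: found the hole */
			*found = hole_start;
			return 1;
		}
	}
	return 0;
}
```

For three back-to-back 4 KiB segments starting at the search address, the model needs four iterations: three to skip the segments and one to confirm the hole.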

Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

2016-12-08 Thread Peter Zijlstra

On Thu, Dec 08, 2016 at 08:49:39PM -, Thomas Gleixner wrote:

> +/*
> + * Enabled when timekeeping has to cope with virtualization keeping VMs
> + * scheduled out long enough that the 64 * 32 bit multiplication in
> + * timekeeping_delta_to_ns() overflows 64 bits.
> + */
> +#ifdef CONFIG_TIMEKEEPING_USE_128BIT_MATH
> +
> +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> +{
> + unsigned __int128 nsec;
> +
> + nsec = ((unsigned __int128)delta * tkr->mult) + tkr->xtime_nsec;
> + return (u64) (nsec >> tkr->shift);
> +}
> +#else
> +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
> +{
> + u32 dh, dl;
> + u64 nsec;
> +
> + dl = delta;
> + dh = delta >> 32;
> +
> + nsec = ((u64)dl * tkr->mult) + tkr->xtime_nsec;
> + nsec >>= tkr->shift;
> + if (unlikely(dh))
> + nsec += ((u64)dh * tkr->mult) << (32 - tkr->shift);
> + return nsec;
> +}
> +#endif
> +
> +#else /* CONFIG_TIMEKEEPING_USE_128BIT_MATH */

xtime_nsec confuses me: contrary to its name, it's not actually in nsec,
it's in shifted nsec units for some reason (and that might well be a good
reason, but I don't know).

In any case, it needing to be inside the shift is somewhat unfortunate,
in that it doesn't allow you to use the existing mul_u64_u32_shr().
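Peter's point about the shift can be illustrated numerically. The split multiply is exact for (delta * mult) >> shift, but shifting xtime_nsec separately and adding it afterwards drops the carry out of the combined low bits, so the two orders can differ by one unit. A sketch with hypothetical helper names; mul_u64_u32_shr_sketch() reproduces the split-multiply idea of the kernel's generic mul_u64_u32_shr() fallback, and `unsigned __int128` assumes GCC or Clang on a 64-bit target:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* (a * mul) >> shift without a 128-bit type; requires shift <= 32. */
static u64 mul_u64_u32_shr_sketch(u64 a, u32 mul, unsigned int shift)
{
	u32 ah = a >> 32, al = (u32)a;
	u64 ret;

	assert(shift <= 32);
	ret = ((u64)al * mul) >> shift;
	if (ah)
		ret += ((u64)ah * mul) << (32 - shift);
	return ret;
}

/* As in the quoted patch: xtime_nsec is added before the shift. */
static u64 delta_to_ns_inside(u64 delta, u32 mult, u64 xtime_nsec,
			      unsigned int shift)
{
	unsigned __int128 nsec = (unsigned __int128)delta * mult + xtime_nsec;

	return (u64)(nsec >> shift);
}

/* Hypothetical variant: shift xtime_nsec separately so mul_u64_u32_shr() fits. */
static u64 delta_to_ns_outside(u64 delta, u32 mult, u64 xtime_nsec,
			       unsigned int shift)
{
	return mul_u64_u32_shr_sketch(delta, mult, shift) + (xtime_nsec >> shift);
}
```

With delta = 1, mult = 1, xtime_nsec = 1 and shift = 1, adding before the shift yields 1 while shifting the terms separately yields 0; for xtime_nsec == 0 both give the same result.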

Re: net: deadlock on genl_mutex

2016-12-08 Thread Cong Wang

On Thu, Dec 8, 2016 at 4:32 PM, Cong Wang  wrote:
> On Thu, Dec 8, 2016 at 9:16 AM, Dmitry Vyukov  wrote:
>> Chain exists of:
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(genl_mutex);
>>                                lock(nlk->cb_mutex);
>>                                lock(genl_mutex);
>>   lock(rtnl_mutex);
>>
>>  *** DEADLOCK ***
>
> This one looks legitimate, because nlk->cb_mutex could be rtnl_mutex.
> Let me think about it.

Never mind. Actually both reports in this thread are legitimate.

I know what happened now: the lock chain is so long that 4 locks are
involved in forming it!

Let me think about how to break the chain.
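The report comes from lockdep's circular-dependency check: a new "acquire next while holding prev" edge is flagged when prev is already reachable from next. The following is a toy depth-first version over the three lock classes named in the report. The chain Cong describes involves four locks, and the fourth is not named in this excerpt, so the existing rtnl_mutex --> nlk->cb_mutex path is modeled as a single edge. Lockdep itself uses a breadth-first search over its dependency lists; the names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes from the report above. */
enum { GENL_MUTEX, CB_MUTEX, RTNL_MUTEX, NR_LOCKS };

/* order[a][b] is true if lock b has been acquired while holding lock a. */
static bool order[NR_LOCKS][NR_LOCKS];

static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (order[from][next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "take `next` while holding `prev`". Returns false (potential
 * deadlock) if the new edge would close a cycle in the lock-order graph.
 */
static bool add_dependency(int prev, int next)
{
	bool seen[NR_LOCKS] = { false };

	assert(prev >= 0 && prev < NR_LOCKS && next >= 0 && next < NR_LOCKS);
	if (reachable(next, prev, seen))
		return false;
	order[prev][next] = true;
	return true;
}
```

Recording rtnl_mutex --> nlk->cb_mutex and then CPU1's nlk->cb_mutex --> genl_mutex succeeds, while CPU0's genl_mutex --> rtnl_mutex closes the cycle and is rejected.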

[PATCH] staging: android: ion: return -ENOMEM in ion_cma_heap allocation failure

2016-12-08 Thread Jaewon Kim

The initial commit 349c9e138551 ("gpu: ion: add CMA heap") returns -1 on
allocation failure. The return value is passed up to userspace through the
ioctl, so userspace may misinterpret the error as -EPERM (1) rather than
-ENOMEM (12).

This patch simply changes it to return -ENOMEM.

Signed-off-by: Jaewon Kim 
---
 drivers/staging/android/ion/ion_cma_heap.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index 6c7de74..22b9582 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -24,8 +24,6 @@
 #include "ion.h"
 #include "ion_priv.h"
 
-#define ION_CMA_ALLOCATE_FAILED -1
-
 struct ion_cma_heap {
struct ion_heap heap;
struct device *dev;
@@ -59,7 +57,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
 
info = kzalloc(sizeof(struct ion_cma_buffer_info), GFP_KERNEL);
if (!info)
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 
info->cpu_addr = dma_alloc_coherent(dev, len, &(info->handle),
GFP_HIGHUSER | __GFP_ZERO);
@@ -88,7 +86,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
dma_free_coherent(dev, len, info->cpu_addr, info->handle);
 err:
kfree(info);
-   return ION_CMA_ALLOCATE_FAILED;
+   return -ENOMEM;
 }
 
 static void ion_cma_free(struct ion_buffer *buffer)
-- 
1.9.1
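Why -1 reads as EPERM: the kernel reports errors as negative errno values, and the libc wrapper around a system call such as ioctl() turns a return in the range -4095..-1 into -1 with errno set to the negated value. A simplified model follows (syscall_return_sketch is hypothetical, not a libc function); on Linux EPERM is 1 and ENOMEM is 12:

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified model of a libc system-call wrapper: kernel returns in
 * [-4095, -1] become -1 with errno = -ret; anything else passes through.
 */
static long syscall_return_sketch(long kernel_ret)
{
	if (kernel_ret < 0 && kernel_ret >= -4095) {
		errno = (int)-kernel_ret;
		return -1;
	}
	return kernel_ret;
}
```

A driver returning -1 therefore surfaces in userspace as errno == EPERM, while returning -ENOMEM surfaces as errno == ENOMEM.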

[PATCH] [RFC] drivers: dma-coherent: pass struct dma_attrs to dma_alloc_from_coherent

2016-12-08 Thread Jaewon Kim

dma_alloc_from_coherent() does not receive struct dma_attrs information.
If the dma_attrs were passed to dma_alloc_from_coherent(), it could act
on them. As an example, this patch adds support for DMA_ATTR_SKIP_ZEROING
to skip zeroing. Depending on the driver implementation, zeroing can then
be skipped or done later.

Signed-off-by: Jaewon Kim 
---
 drivers/base/dma-coherent.c | 6 +++++-
 include/linux/dma-mapping.h | 7 ++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 640a7e6..428eced 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -151,6 +151,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
  * @dma_handle:This will be filled with the correct dma handle
  * @ret:   This pointer will be filled with the virtual address
  * to allocated area.
+ * @attrs: dma_attrs to pass additional information
  *
  * This function should be only called from per-arch dma_alloc_coherent()
  * to support allocation from per-device coherent memory pools.
@@ -159,7 +160,8 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
  * generic memory areas, or !0 if dma_alloc_coherent should return @ret.
  */
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
-  dma_addr_t *dma_handle, void **ret)
+  dma_addr_t *dma_handle, void **ret,
+  struct dma_attrs *attrs)
 {
struct dma_coherent_mem *mem;
int order = get_order(size);
@@ -190,6 +192,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
*ret = mem->virt_base + (pageno << PAGE_SHIFT);
dma_memory_map = (mem->flags & DMA_MEMORY_MAP);
spin_unlock_irqrestore(&mem->spinlock, flags);
+   if (dma_get_attr(DMA_ATTR_SKIP_ZEROING, attrs))
+   return 1;
if (dma_memory_map)
memset(*ret, 0, size);
else
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 08528af..737fd71 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -151,13 +151,14 @@ static inline int is_device_dma_capable(struct device *dev)
  * Don't use them in device drivers.
  */
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
-  dma_addr_t *dma_handle, void **ret);
+  dma_addr_t *dma_handle, void **ret,
+  struct dma_attrs *attrs);
 int dma_release_from_coherent(struct device *dev, int order, void *vaddr);
 
 int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, size_t size, int *ret);
 #else
-#define dma_alloc_from_coherent(dev, size, handle, ret) (0)
+#define dma_alloc_from_coherent(dev, size, handle, ret, attrs) (0)
 #define dma_release_from_coherent(dev, order, vaddr) (0)
 #define dma_mmap_from_coherent(dev, vma, vaddr, order, ret) (0)
 #endif /* CONFIG_HAVE_GENERIC_DMA_COHERENT */
@@ -456,7 +457,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size,
 
BUG_ON(!ops);
 
-   if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr))
+   if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr, attrs))
return cpu_addr;
 
if (!arch_dma_alloc_attrs(&dev, &flag))
-- 
1.9.1
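The effect of the proposed attribute can be modeled as: hand out memory from a per-device pool, then zero it only when the caller did not ask to skip zeroing. The patch tests the attribute with dma_get_attr() on a struct dma_attrs; this sketch instead uses a plain bit mask, and both the flag name SKETCH_ATTR_SKIP_ZEROING and pool_alloc_sketch() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ATTR_SKIP_ZEROING (1UL << 0)	/* hypothetical flag bit */

/*
 * Hand out `size` bytes from a pre-reserved pool, zeroing them unless the
 * caller will initialize the buffer itself. Returns NULL if it does not fit.
 */
static void *pool_alloc_sketch(unsigned char *pool, size_t pool_size,
			       size_t size, unsigned long attrs)
{
	assert(pool != NULL);
	if (size > pool_size)
		return NULL;
	if (!(attrs & SKETCH_ATTR_SKIP_ZEROING))
		memset(pool, 0, size);
	return pool;
}
```

Skipping the memset saves time for large buffers, but the caller then sees whatever the pool previously contained, so it is only appropriate when the driver overwrites or clears the buffer itself.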
