Re: [PATCH v2] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-05-11 Thread Yongji Xie

On 2016/5/11 23:10, Bjorn Helgaas wrote:


On Wed, May 11, 2016 at 07:34:19PM +0800, Yongji Xie wrote:

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio
page may be shared with other BARs. This will cause some
performance issues when we passthrough a PCI device with
this kind of BARs. Guest will be not able to handle the mmio
accesses to the BARs which leads to mmio emulations in host.

However, not all sub-page BARs will share page with other BARs.
We should allow to mmap the sub-page MMIO BARs which we can
make sure will not share page with other BARs.

This patch adds support for this case. And we try to add some
dummy resources to reserve the remaind of the page which
hot-add device's BAR might be assigned into.

This is starting to look more reasonable from a safety perspective.
At least I don't have an allergic reaction to mapping a page that may
contain BARs from other random devices :)


Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/vfio_pci.c |   85 ---
  drivers/vfio/pci/vfio_pci_private.h |8 
  2 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 98059df..33282b8 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -110,16 +110,77 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
  }
  
+static bool vfio_pci_bar_mmap_supported(struct vfio_pci_device *vdev, int index)

+{
+   struct resource *res = vdev->pdev->resource + index;
+   struct vfio_pci_dummy_resource *dummy_res1 = NULL;
+   struct vfio_pci_dummy_resource *dummy_res2 = NULL;
+
+   if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) && res->flags & IORESOURCE_MEM &&
+   resource_size(res) > 0) {
+   if (resource_size(res) >= PAGE_SIZE)
+   return true;
+
+   if ((res->start & ~PAGE_MASK)) {
+   /*
+* Add a dummy resource to reserve the portion
+* before res->start in exclusive page in case
+* that hot-add device's bar is assigned into it.
+*/
+   dummy_res1 = kzalloc(sizeof(*dummy_res1), GFP_KERNEL);

Should check for kzalloc() failure here.


+   dummy_res1->resource.start = res->start & PAGE_MASK;
+   dummy_res1->resource.end = res->start - 1;
+   dummy_res1->resource.flags = res->flags;
+   if (request_resource(res->parent,
+   _res1->resource)) {
+   kfree(dummy_res1);
+   return false;
+   }
+   dummy_res1->index = index;
+   list_add(_res1->res_next,
+   >dummy_resources_list);

The main body of this function is unnecessarily indented.  If you did
this it would be less indented:

   if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
 return false;

   if (!res->flags & IORESOURCE_MEM)
 return false;

   /*
* Not sure this is necessary; the PCI core *shouldn't* set up a
* resource with a type but zero size.  But there may be bugs that
* cause us to do that.
*/
   if (!resource_size(res))
 return false;

   if (resource_size(res) >= PAGE_SIZE)
 return true;

   if ((res->start & ~PAGE_MASK)) {
 ...



Thanks for your comments. I'll send a v3 soon.

Regards,
Yongji



Re: [PATCH v2] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

2016-05-11 Thread Yongji Xie

On 2016/5/11 23:10, Bjorn Helgaas wrote:


On Wed, May 11, 2016 at 07:34:19PM +0800, Yongji Xie wrote:

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio
page may be shared with other BARs. This will cause some
performance issues when we passthrough a PCI device with
this kind of BARs. Guest will be not able to handle the mmio
accesses to the BARs which leads to mmio emulations in host.

However, not all sub-page BARs will share page with other BARs.
We should allow to mmap the sub-page MMIO BARs which we can
make sure will not share page with other BARs.

This patch adds support for this case. And we try to add some
dummy resources to reserve the remaind of the page which
hot-add device's BAR might be assigned into.

This is starting to look more reasonable from a safety perspective.
At least I don't have an allergic reaction to mapping a page that may
contain BARs from other random devices :)


Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/vfio_pci.c |   85 ---
  drivers/vfio/pci/vfio_pci_private.h |8 
  2 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 98059df..33282b8 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -110,16 +110,77 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
  }
  
+static bool vfio_pci_bar_mmap_supported(struct vfio_pci_device *vdev, int index)

+{
+   struct resource *res = vdev->pdev->resource + index;
+   struct vfio_pci_dummy_resource *dummy_res1 = NULL;
+   struct vfio_pci_dummy_resource *dummy_res2 = NULL;
+
+   if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) && res->flags & IORESOURCE_MEM &&
+   resource_size(res) > 0) {
+   if (resource_size(res) >= PAGE_SIZE)
+   return true;
+
+   if ((res->start & ~PAGE_MASK)) {
+   /*
+* Add a dummy resource to reserve the portion
+* before res->start in exclusive page in case
+* that hot-add device's bar is assigned into it.
+*/
+   dummy_res1 = kzalloc(sizeof(*dummy_res1), GFP_KERNEL);

Should check for kzalloc() failure here.


+   dummy_res1->resource.start = res->start & PAGE_MASK;
+   dummy_res1->resource.end = res->start - 1;
+   dummy_res1->resource.flags = res->flags;
+   if (request_resource(res->parent,
+   _res1->resource)) {
+   kfree(dummy_res1);
+   return false;
+   }
+   dummy_res1->index = index;
+   list_add(_res1->res_next,
+   >dummy_resources_list);

The main body of this function is unnecessarily indented.  If you did
this it would be less indented:

   if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
 return false;

   if (!res->flags & IORESOURCE_MEM)
 return false;

   /*
* Not sure this is necessary; the PCI core *shouldn't* set up a
* resource with a type but zero size.  But there may be bugs that
* cause us to do that.
*/
   if (!resource_size(res))
 return false;

   if (resource_size(res) >= PAGE_SIZE)
 return true;

   if ((res->start & ~PAGE_MASK)) {
 ...



Thanks for your comments. I'll send a v3 soon.

Regards,
Yongji



RE: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending

2016-05-11 Thread Yang, Wenyou


> -Original Message-
> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> Sent: 2016年5月12日 11:53
> To: Yang, Wenyou 
> Cc: Alan Stern ; Greg Kroah-Hartman
> ; Ferre, Nicolas ;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending
> 
> Hi,
> 
> On 12/05/2016 at 09:05:34 +0800, Wenyou Yang wrote :
> > In order to get lower consumption, as a workaround, suspend the USB
> > PORTA/B/C via set the SUSPEND_A/B/C bits of OHCI Interrupt
> > Configuration Register while OHCI USB suspending.
> >
> 
> Why is that a workaround?

Because if these bits is not set as this patch, there is 5mA current left
at VDDOSC rail during suspend. If apply this patch, this current will disappear.

So, I think it is a workaround.

> 
> > This suspend operation must be done before stopping the USB clock,
> > resume after the USB clock enabled.
> >
> > Signed-off-by: Wenyou Yang 
> > ---
> >
> >  drivers/usb/host/ohci-at91.c | 63
> > 
> >  1 file changed, 63 insertions(+)
> >
> > diff --git a/drivers/usb/host/ohci-at91.c
> > b/drivers/usb/host/ohci-at91.c index d177372..ce898e0 100644
> > --- a/drivers/usb/host/ohci-at91.c
> > +++ b/drivers/usb/host/ohci-at91.c
> > @@ -21,6 +21,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -51,6 +53,7 @@ struct ohci_at91_priv {
> > struct clk *hclk;
> > bool clocked;
> > bool wakeup;/* Saved wake-up state for resume */
> > +   struct regmap *sfr_regmap;
> >  };
> >  /* interface and function clocks; sometimes also an AHB clock */
> >
> > @@ -132,6 +135,17 @@ static void at91_stop_hc(struct platform_device
> > *pdev)
> >
> >
> > /*
> > -*/
> >
> > +struct regmap *at91_dt_syscon_sfr(void) {
> > +   struct regmap *regmap;
> > +
> > +   regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
> > +   if (IS_ERR(regmap))
> > +   return NULL;
> > +
> > +   return regmap;
> > +}
> > +
> >  static void usb_hcd_at91_remove (struct usb_hcd *, struct
> > platform_device *);
> >
> >  /* configure so an HC device and id are always provided */ @@ -197,6
> > +211,8 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver,
> > goto err;
> > }
> >
> > +   ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
> > +
> > board = hcd->self.controller->platform_data;
> > ohci = hcd_to_ohci(hcd);
> > ohci->num_ports = board->ports;
> > @@ -581,6 +597,49 @@ static int ohci_hcd_at91_drv_remove(struct
> platform_device *pdev)
> > return 0;
> >  }
> >
> > +#define SFR_OHCIICR0x10
> > +#define SFR_OHCIICR_SUSPEND_A  BIT(8)
> > +#define SFR_OHCIICR_SUSPEND_B  BIT(9)
> > +#define SFR_OHCIICR_SUSPEND_C  BIT(10)
> > +
> > +#define SFR_OHCIICR_USB_SUSPEND(SFR_OHCIICR_SUSPEND_A | \
> > +SFR_OHCIICR_SUSPEND_B | \
> > +SFR_OHCIICR_SUSPEND_C)
> > +
> 
> Those defines should go in a header file either in include/soc/at91 or in
> include/linux/mfd
> 
> > +static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable) {
> > +   u32 regval;
> > +   int ret;
> > +
> > +   if (IS_ERR(regmap))
> > +   return PTR_ERR(regmap);
> > +
> > +   ret = regmap_read(regmap, SFR_OHCIICR, );
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (enable)
> > +   regval &= ~SFR_OHCIICR_USB_SUSPEND;
> > +   else
> > +   regval |= SFR_OHCIICR_USB_SUSPEND;
> > +
> > +   regmap_write(regmap, SFR_OHCIICR, regval);
> > +
> > +   regmap_read(regmap, SFR_OHCIICR, );
> > +
> > +   return 0;
> > +}
> > +
> > +static int ohci_at91_port_suspend(struct regmap *regmap) {
> > +   return ohci_at91_port_ctrl(regmap, false); }
> > +
> > +static int ohci_at91_port_resume(struct regmap *regmap) {
> > +   return ohci_at91_port_ctrl(regmap, true); }
> > +
> >  static int __maybe_unused
> >  ohci_hcd_at91_drv_suspend(struct device *dev)  { @@ -618,6 +677,8 @@
> > ohci_hcd_at91_drv_suspend(struct device *dev)
> > ohci_writel(ohci, ohci->hc_control, >regs->control);
> > ohci->rh_state = OHCI_RH_HALTED;
> >
> > +   ohci_at91_port_suspend(ohci_at91->sfr_regmap);
> > +
> 
> The main issue I see here is that you will call that function for all SoCs 
> and it will
> always fail unless it is running on a sama5d2. Maybe we could be smarter about
> that.

Can we use another compatible, such as "atmel,sama5d2-ohci" to differentiate 
it? 


Best Regards,
Wenyou Yang


RE: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending

2016-05-11 Thread Yang, Wenyou


> -Original Message-
> From: Alexandre Belloni [mailto:alexandre.bell...@free-electrons.com]
> Sent: 2016年5月12日 11:53
> To: Yang, Wenyou 
> Cc: Alan Stern ; Greg Kroah-Hartman
> ; Ferre, Nicolas ;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending
> 
> Hi,
> 
> On 12/05/2016 at 09:05:34 +0800, Wenyou Yang wrote :
> > In order to get lower consumption, as a workaround, suspend the USB
> > PORTA/B/C via set the SUSPEND_A/B/C bits of OHCI Interrupt
> > Configuration Register while OHCI USB suspending.
> >
> 
> Why is that a workaround?

Because if these bits is not set as this patch, there is 5mA current left
at VDDOSC rail during suspend. If apply this patch, this current will disappear.

So, I think it is a workaround.

> 
> > This suspend operation must be done before stopping the USB clock,
> > resume after the USB clock enabled.
> >
> > Signed-off-by: Wenyou Yang 
> > ---
> >
> >  drivers/usb/host/ohci-at91.c | 63
> > 
> >  1 file changed, 63 insertions(+)
> >
> > diff --git a/drivers/usb/host/ohci-at91.c
> > b/drivers/usb/host/ohci-at91.c index d177372..ce898e0 100644
> > --- a/drivers/usb/host/ohci-at91.c
> > +++ b/drivers/usb/host/ohci-at91.c
> > @@ -21,6 +21,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -51,6 +53,7 @@ struct ohci_at91_priv {
> > struct clk *hclk;
> > bool clocked;
> > bool wakeup;/* Saved wake-up state for resume */
> > +   struct regmap *sfr_regmap;
> >  };
> >  /* interface and function clocks; sometimes also an AHB clock */
> >
> > @@ -132,6 +135,17 @@ static void at91_stop_hc(struct platform_device
> > *pdev)
> >
> >
> > /*
> > -*/
> >
> > +struct regmap *at91_dt_syscon_sfr(void) {
> > +   struct regmap *regmap;
> > +
> > +   regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
> > +   if (IS_ERR(regmap))
> > +   return NULL;
> > +
> > +   return regmap;
> > +}
> > +
> >  static void usb_hcd_at91_remove (struct usb_hcd *, struct
> > platform_device *);
> >
> >  /* configure so an HC device and id are always provided */ @@ -197,6
> > +211,8 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver,
> > goto err;
> > }
> >
> > +   ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
> > +
> > board = hcd->self.controller->platform_data;
> > ohci = hcd_to_ohci(hcd);
> > ohci->num_ports = board->ports;
> > @@ -581,6 +597,49 @@ static int ohci_hcd_at91_drv_remove(struct
> platform_device *pdev)
> > return 0;
> >  }
> >
> > +#define SFR_OHCIICR0x10
> > +#define SFR_OHCIICR_SUSPEND_A  BIT(8)
> > +#define SFR_OHCIICR_SUSPEND_B  BIT(9)
> > +#define SFR_OHCIICR_SUSPEND_C  BIT(10)
> > +
> > +#define SFR_OHCIICR_USB_SUSPEND(SFR_OHCIICR_SUSPEND_A | \
> > +SFR_OHCIICR_SUSPEND_B | \
> > +SFR_OHCIICR_SUSPEND_C)
> > +
> 
> Those defines should go in a header file either in include/soc/at91 or in
> include/linux/mfd
> 
> > +static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable) {
> > +   u32 regval;
> > +   int ret;
> > +
> > +   if (IS_ERR(regmap))
> > +   return PTR_ERR(regmap);
> > +
> > +   ret = regmap_read(regmap, SFR_OHCIICR, );
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (enable)
> > +   regval &= ~SFR_OHCIICR_USB_SUSPEND;
> > +   else
> > +   regval |= SFR_OHCIICR_USB_SUSPEND;
> > +
> > +   regmap_write(regmap, SFR_OHCIICR, regval);
> > +
> > +   regmap_read(regmap, SFR_OHCIICR, );
> > +
> > +   return 0;
> > +}
> > +
> > +static int ohci_at91_port_suspend(struct regmap *regmap) {
> > +   return ohci_at91_port_ctrl(regmap, false); }
> > +
> > +static int ohci_at91_port_resume(struct regmap *regmap) {
> > +   return ohci_at91_port_ctrl(regmap, true); }
> > +
> >  static int __maybe_unused
> >  ohci_hcd_at91_drv_suspend(struct device *dev)  { @@ -618,6 +677,8 @@
> > ohci_hcd_at91_drv_suspend(struct device *dev)
> > ohci_writel(ohci, ohci->hc_control, >regs->control);
> > ohci->rh_state = OHCI_RH_HALTED;
> >
> > +   ohci_at91_port_suspend(ohci_at91->sfr_regmap);
> > +
> 
> The main issue I see here is that you will call that function for all SoCs 
> and it will
> always fail unless it is running on a sama5d2. Maybe we could be smarter about
> that.

Can we use another compatible, such as "atmel,sama5d2-ohci" to differentiate 
it? 


Best Regards,
Wenyou Yang


linux-next: Tree for May 12

2016-05-11 Thread Stephen Rothwell
Hi all,

Changes since 20160511:

The net-next tree gained a conflict against the net tree.

The sound-asoc tree lost its build failure.

The tip tree gained conflicts against the arc and amr64 trees.

The kvm tree gained conflicts against the tip tree.

The pinctrl tree gained a build failure so I used the version from
next-20160511.

The gpio tree gained a conflict against the mips tree.

Non-merge commits (relative to Linus' tree): 9676
 8370 files changed, 419862 insertions(+), 179824 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 235 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (685764b108a7 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging fixes/master (b507146bb6b9 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on 
module install)
Merging arc-current/for-curr (44549e8f5eea Linux 4.6-rc7)
Merging arm-current/fixes (ec953b70f368 ARM: 8573/1: domain: move 
{set,get}_domain under config guard)
Merging m68k-current/for-linus (7b8ba82ad4ad m68k/defconfig: Update defconfigs 
for v4.6-rc2)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (b4c112114aab powerpc: Fix bad inline asm 
constraint in create_zero_mask())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (33656a1f2ee5 Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging net/master (e271c7b4420d gre: do not keep the GRE header around in 
collect medata mode)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (f28f20da704d Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging wireless-drivers/master (cbbba30f1ac9 Merge tag 
'iwlwifi-for-kalle-2016-05-04' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (e6436be21e77 mac80211: fix statistics leak if 
dev_alloc_name() fails)
Merging sound-current/for-linus (84add303ef95 ALSA: usb-audio: Yet another 
Phoneix Audio device quirk)
Merging pci-current/for-linus (9a2a5a638f8e PCI: Do not treat EPROBE_DEFER as 
device attach failure)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb.current/usb-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix 
use-after-free)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even 
more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: 
change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (44549e8f5eea Linux 4.6-rc7)
Merging char-misc.current/char-misc-linus (44549e8f5eea Linux 4.6-rc7)
Merging input-current/for-linus (c52c545ead97 Input: twl6040-vibra - fix DT 
node memory management)
Merging crypto-current/master (df27b26f04ed crypto: testmgr - Use kmalloc 
memory for RSA input)
Merging ide/master (1993b176a822 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix t

linux-next: Tree for May 12

2016-05-11 Thread Stephen Rothwell
Hi all,

Changes since 20160511:

The net-next tree gained a conflict against the net tree.

The sound-asoc tree lost its build failure.

The tip tree gained conflicts against the arc and amr64 trees.

The kvm tree gained conflicts against the tip tree.

The pinctrl tree gained a build failure so I used the version from
next-20160511.

The gpio tree gained a conflict against the mips tree.

Non-merge commits (relative to Linus' tree): 9676
 8370 files changed, 419862 insertions(+), 179824 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 235 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (685764b108a7 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging fixes/master (b507146bb6b9 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on 
module install)
Merging arc-current/for-curr (44549e8f5eea Linux 4.6-rc7)
Merging arm-current/fixes (ec953b70f368 ARM: 8573/1: domain: move 
{set,get}_domain under config guard)
Merging m68k-current/for-linus (7b8ba82ad4ad m68k/defconfig: Update defconfigs 
for v4.6-rc2)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (b4c112114aab powerpc: Fix bad inline asm 
constraint in create_zero_mask())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (33656a1f2ee5 Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging net/master (e271c7b4420d gre: do not keep the GRE header around in 
collect medata mode)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (f28f20da704d Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging wireless-drivers/master (cbbba30f1ac9 Merge tag 
'iwlwifi-for-kalle-2016-05-04' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (e6436be21e77 mac80211: fix statistics leak if 
dev_alloc_name() fails)
Merging sound-current/for-linus (84add303ef95 ALSA: usb-audio: Yet another 
Phoneix Audio device quirk)
Merging pci-current/for-linus (9a2a5a638f8e PCI: Do not treat EPROBE_DEFER as 
device attach failure)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb.current/usb-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix 
use-after-free)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even 
more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: 
change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (44549e8f5eea Linux 4.6-rc7)
Merging char-misc.current/char-misc-linus (44549e8f5eea Linux 4.6-rc7)
Merging input-current/for-linus (c52c545ead97 Input: twl6040-vibra - fix DT 
node memory management)
Merging crypto-current/master (df27b26f04ed crypto: testmgr - Use kmalloc 
memory for RSA input)
Merging ide/master (1993b176a822 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix t

[PATCH] genirq: export __irq_set_affinity symbol

2016-05-11 Thread Xie XiuQi
__irq_set_affinity is declared in include/linux/interrupt.h, but not
been exported.

We export it now, so we could use __irq_set_affinity, irq_set_affinity
and irq_force_affinity in kernel modules.

Cc: Li Bin 
Cc: Yijing Wang 
Signed-off-by: Xie XiuQi 
---
 kernel/irq/manage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index cc1cc64..e131245 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -239,6 +239,7 @@ int __irq_set_affinity(unsigned int irq, const struct 
cpumask *mask, bool force)
raw_spin_unlock_irqrestore(>lock, flags);
return ret;
 }
+EXPORT_SYMBOL_GPL(__irq_set_affinity);
 
 int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 {
-- 
1.8.3.1



[PATCH] genirq: export __irq_set_affinity symbol

2016-05-11 Thread Xie XiuQi
__irq_set_affinity is declared in include/linux/interrupt.h, but not
been exported.

We export it now, so we could use __irq_set_affinity, irq_set_affinity
and irq_force_affinity in kernel modules.

Cc: Li Bin 
Cc: Yijing Wang 
Signed-off-by: Xie XiuQi 
---
 kernel/irq/manage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index cc1cc64..e131245 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -239,6 +239,7 @@ int __irq_set_affinity(unsigned int irq, const struct 
cpumask *mask, bool force)
raw_spin_unlock_irqrestore(>lock, flags);
return ret;
 }
+EXPORT_SYMBOL_GPL(__irq_set_affinity);
 
 int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 {
-- 
1.8.3.1



Re: [PATCH v2 1/1] dmaengine: slave means at least one of DMA_SLAVE, DMA_CYCLIC

2016-05-11 Thread Vinod Koul
On Tue, May 10, 2016 at 08:43:34PM +0300, Andy Shevchenko wrote:
> When check for capabilities recognize slave support by either DMA_SLAVE or
> DMA_CYCLIC bit set. If we don't do that the user can't get a normally worked
> DMA support for engines that doesn't have one of the mentioned bits set.

Applied, thanks

-- 
~Vinod


Re: [PATCH v2 1/1] dmaengine: slave means at least one of DMA_SLAVE, DMA_CYCLIC

2016-05-11 Thread Vinod Koul
On Tue, May 10, 2016 at 08:43:34PM +0300, Andy Shevchenko wrote:
> When check for capabilities recognize slave support by either DMA_SLAVE or
> DMA_CYCLIC bit set. If we don't do that the user can't get a normally worked
> DMA support for engines that doesn't have one of the mentioned bits set.

Applied, thanks

-- 
~Vinod


Re: [PATCH 04/18] dmaengine: st_fdma: Add xp70 firmware loading mechanism.

2016-05-11 Thread Vinod Koul
On Wed, May 11, 2016 at 08:57:40AM +0100, Peter Griffin wrote:
> > Above two seem to be generic code and should be moved to core, this way
> > other drivers using ELF firmware binaries can reuse...
> 
> Do you mean add to dmaengine.c?

Nope this is not specfic to dmaengine. Perhaps drivers/base/..

> Certainly st_fdma_elf_sanity_check is fairly generic. 
> st_fdma_elf_load_segments
> is less so, especially st_fdma_seg_to_mem().
> 
> > 
> > > @@ -738,6 +925,15 @@ static int st_fdma_slave_config(struct dma_chan 
> > > *chan,
> > >   struct dma_slave_config *slave_cfg)
> > >  {
> > >   struct st_fdma_chan *fchan = to_st_fdma_chan(chan);
> > > + int ret;
> > > +
> > > + if (!atomic_read(>fdev->fw_loaded)) {
> > > + ret = st_fdma_get_fw(fchan->fdev);
> > 
> > this seems quite an odd place to load firmware, can you explain why
> > 
> 
> Good question :) I've been round the houses a bit on the firmware loading.
> 
> I will try and document here what I've tried and the different advantages /
> disadvantages I've encountered.

Per Linus, driver should load firmware on first open. So either you should
do it while channel allocation or first descriptor allocation.

> 
> In V3 patches, I used the device_config() hook. This can sleep (I think..it 
> isn't
> documented here [1]). This hook happens very close to the dma starting and the
> rootfs / firmware is likely to be present. However I've sinced learnt that it
> doesn't get  called in the memcpy case so is not a sufficient solution.
> 
> In part putting it here was to try and get firmware loading out of probe() 
> based
> on your feedback to v1 [2].
> 
> In V1 patches: - I was using request_firmware_nowait() in probe() in 
> conjunction with
> the CONFIG_FW_LOADER_USER_HELPER kernel option. I've since also learnt that 
> this
> kernel option is deprecated and shouldn't be used.

I think nowait version is better. Why do you need
CONFIG_FW_LOADER_USER_HELPER?

> Your feedback in v1 was to move firmware loading to device open [2]. However
> what classes is  device open in the dmaengine context is not obvious.

channel allocation or descriptor allocation

> 
> ===
> 
> device_alloc_chan_resources() hook
> 
> This is probably the hook closest to a device open in dmaengine context.
> This hook can sleep so request_firmware can be called. However if all drivers
> are built-in firmware loading still fails as ALSA drivers calls
> devm_snd_dmaengine_pcm_register() which requests dma channels during *its* 
> probe().
> This happens way before the rootfs is present and thus firmware loading still
> fails, and is still being done indirectly from a probe() anyway.
> 
> I had some discussion with Mark and Arnd on IRC, about the possibility of ALSA
> not rquesting DMA channels during their probe(). Marks opinion I believe was
> this wasn't a good solution as ALSA would be presenting devices to userspace 
> that potentially don't exist & can't work.

Hmmm, there are two parts, one is requesting the firmware and second is the
loading part.

Since descriptor allocation is atomic in nature, I think we should load the
firmware using nowait version and if you can atomically copy then do so at
first descriptor allocation. If not then you should copy in nowait handler

> ===
> 
> *_prep_*() hooks
> 
> + Always get called before a DMA is started
> 
> - *prep* hooks can be called from atomic context so can't sleep [1] and can't 
> use
>   request_firmware().
> - I tried using request_firmwqare_nowait() with GFP_ATOMIC flag firmware. This
>   half works but firmware loading doesn't complete before the DMA is started 
> which
>   means it often fails on the first DMA attempt.
> 
>  ===
> 
> Suggested solution for v4 patches: -
> 
> Both Arnd, Mark and Lee initial suggestion on irc was that if the device is 
> useless
> without firmware, and the firmware isn't available then the driver should call
> request_firmware() in probe() and -EPROBE_DEFER if not available.
> 
> This solution works well, in both the built-in and modules case. As the 
> deffered
> fdma driver re-probes after the rootfs has been mounted when the firmware is
> available. The other ALSA depedencies then also re-probe and everything works
> nicely (and you can be assured that the devices will actually work at this 
> point).
> 
> This fdma driver and the ALSA ones will be enabled in multi_v7_defconfig as 
> modules,
> so it is only likely to be used as a module. However using -EPROBE_DEFER
> mechanism, has the nice quality that if it is compiled as built-in you still 
> end
> up with a working system (albeit with a longer boot time).

I think that makes sense, relying on deferred probing seems a better way to
manage

> 
> Failing that the only other solution I can see is extending the dmaengine API 
> to
> provide another hook which can sleep, and must be called before a DMA
> transation is started. Note that firmware isn't actually *required* until we
> start the dma transaction, but the 

Re: [PATCH 04/18] dmaengine: st_fdma: Add xp70 firmware loading mechanism.

2016-05-11 Thread Vinod Koul
On Wed, May 11, 2016 at 08:57:40AM +0100, Peter Griffin wrote:
> > Above two seem to be generic code and should be moved to core, this way
> > other drivers using ELF firmware binaries can reuse...
> 
> Do you mean add to dmaengine.c?

Nope this is not specfic to dmaengine. Perhaps drivers/base/..

> Certainly st_fdma_elf_sanity_check is fairly generic. 
> st_fdma_elf_load_segments
> is less so, especially st_fdma_seg_to_mem().
> 
> > 
> > > @@ -738,6 +925,15 @@ static int st_fdma_slave_config(struct dma_chan 
> > > *chan,
> > >   struct dma_slave_config *slave_cfg)
> > >  {
> > >   struct st_fdma_chan *fchan = to_st_fdma_chan(chan);
> > > + int ret;
> > > +
> > > + if (!atomic_read(>fdev->fw_loaded)) {
> > > + ret = st_fdma_get_fw(fchan->fdev);
> > 
> > this seems quite an odd place to load firmware, can you explain why
> > 
> 
> Good question :) I've been round the houses a bit on the firmware loading.
> 
> I will try and document here what I've tried and the different advantages /
> disadvantages I've encountered.

Per Linus, driver should load firmware on first open. So either you should
do it while channel allocation or first descriptor allocation.

> 
> In V3 patches, I used the device_config() hook. This can sleep (I think..it 
> isn't
> documented here [1]). This hook happens very close to the dma starting and the
> rootfs / firmware is likely to be present. However I've sinced learnt that it
> doesn't get  called in the memcpy case so is not a sufficient solution.
> 
> In part putting it here was to try and get firmware loading out of probe() 
> based
> on your feedback to v1 [2].
> 
> In V1 patches: - I was using request_firmware_nowait() in probe() in 
> conjunction with
> the CONFIG_FW_LOADER_USER_HELPER kernel option. I've since also learnt that 
> this
> kernel option is deprecated and shouldn't be used.

I think nowait version is better. Why do you need
CONFIG_FW_LOADER_USER_HELPER?

> Your feedback in v1 was to move firmware loading to device open [2]. However
> what classes is  device open in the dmaengine context is not obvious.

channel allocation or descriptor allocation

> 
> ===
> 
> device_alloc_chan_resources() hook
> 
> This is probably the hook closest to a device open in dmaengine context.
> This hook can sleep so request_firmware can be called. However if all drivers
> are built-in firmware loading still fails as ALSA drivers calls
> devm_snd_dmaengine_pcm_register() which requests dma channels during *its* 
> probe().
> This happens way before the rootfs is present and thus firmware loading still
> fails, and is still being done indirectly from a probe() anyway.
> 
> I had some discussion with Mark and Arnd on IRC, about the possibility of ALSA
> not rquesting DMA channels during their probe(). Marks opinion I believe was
> this wasn't a good solution as ALSA would be presenting devices to userspace 
> that potentially don't exist & can't work.

Hmmm, there are two parts, one is requesting the firmware and second is the
loading part.

Since descriptor allocation is atomic in nature, I think we should load the
firmware using nowait version and if you can atomically copy then do so at
first descriptor allocation. If not then you should copy in nowait handler

> ===
> 
> *_prep_*() hooks
> 
> + Always get called before a DMA is started
> 
> - *prep* hooks can be called from atomic context so can't sleep [1] and can't 
> use
>   request_firmware().
> - I tried using request_firmwqare_nowait() with GFP_ATOMIC flag firmware. This
>   half works but firmware loading doesn't complete before the DMA is started 
> which
>   means it often fails on the first DMA attempt.
> 
>  ===
> 
> Suggested solution for v4 patches: -
> 
> Both Arnd, Mark and Lee initial suggestion on irc was that if the device is 
> useless
> without firmware, and the firmware isn't available then the driver should call
> request_firmware() in probe() and -EPROBE_DEFER if not available.
> 
> This solution works well, in both the built-in and modules case. As the 
> deffered
> fdma driver re-probes after the rootfs has been mounted when the firmware is
> available. The other ALSA depedencies then also re-probe and everything works
> nicely (and you can be assured that the devices will actually work at this 
> point).
> 
> This fdma driver and the ALSA ones will be enabled in multi_v7_defconfig as 
> modules,
> so it is only likely to be used as a module. However using -EPROBE_DEFER
> mechanism, has the nice quality that if it is compiled as built-in you still 
> end
> up with a working system (albeit with a longer boot time).

I think that makes sense, relying on deferred probing seems a better way to
manage

> 
> Failing that the only other solution I can see is extending the dmaengine API 
> to
> provide another hook which can sleep, and must be called before a DMA
> transation is started. Note that firmware isn't actually *required* until we
> start the dma transaction, but the 

Re: Question about request queues in I/O scheduling

2016-05-11 Thread Max Kanushin

Hello,

I was trying to use spin_trylock(q->queue_lock) on several queues when 
q->nr_sorted exceeds the per-defined number to block them and unlock the 
them later with spin_unlock(q->queue_lock), but I have faced the 
following problem: my system freezes when I am trying to test it by 
moving files around in my system. I thought it might be caused by 
blocking a queue of a system partition or swap, but I do not know how to 
check this as well.
Is it the right way to stop the queue/elevator? What are freezes might 
be caused by? Is there a simple way to find out which block device is a 
queue belong to?

I would really appreciate your reply.

Best regards,
  Max Kanushin.

On 04/29/2016 07:22 PM, Jeff Moyer wrote:

Max Kanushin  writes:


Thank you very much for the reply. My general idea is take control of all
request queues to block and unblock them manually depending on their load.
One of my steps is to find a length of a queue to decide if to block it.
Actually I think I've found the way. If I got it right, I can move from one
request to another within the list:
struct list_head queue_head;
So that I can calculate how many requests are there in the queue.


Hi, Max,

The queue_head is the dispatch list for I/O, so it does not represent
all I/O queued for a request_queue.  The number of requests in the
scheduler would be reflected by q->nr_sorted.  The number of dispatched
requests is in an array, in_flight.

Cheers,
Jeff





Best regards,
   Max Kanushin.
On Apr 29, 2016 6:32 PM, "Jeff Moyer"  wrote:


Max Kanushin  writes:


Hello,

I was searching for a way to find out the length of a request_queue
(that is defined as a structure in
linux/include/linux/blkdev.h). However I am new to the kernel
development and can't figure out where is the actual list of requests
to be processed by an elevator.
Is there a way to iterate requests in a queue or at least find a
number of them?


Hi, Max,

What exactly are you trying to accomplish?

Cheers,
Jeff



Re: Question about request queues in I/O scheduling

2016-05-11 Thread Max Kanushin

Hello,

I was trying to use spin_trylock(q->queue_lock) on several queues when 
q->nr_sorted exceeds the per-defined number to block them and unlock the 
them later with spin_unlock(q->queue_lock), but I have faced the 
following problem: my system freezes when I am trying to test it by 
moving files around in my system. I thought it might be caused by 
blocking a queue of a system partition or swap, but I do not know how to 
check this as well.
Is it the right way to stop the queue/elevator? What are freezes might 
be caused by? Is there a simple way to find out which block device is a 
queue belong to?

I would really appreciate your reply.

Best regards,
  Max Kanushin.

On 04/29/2016 07:22 PM, Jeff Moyer wrote:

Max Kanushin  writes:


Thank you very much for the reply. My general idea is take control of all
request queues to block and unblock them manually depending on their load.
One of my steps is to find a length of a queue to decide if to block it.
Actually I think I've found the way. If I got it right, I can move from one
request to another within the list:
struct list_head queue_head;
So that I can calculate how many requests are there in the queue.


Hi, Max,

The queue_head is the dispatch list for I/O, so it does not represent
all I/O queued for a request_queue.  The number of requests in the
scheduler would be reflected by q->nr_sorted.  The number of dispatched
requests is in an array, in_flight.

Cheers,
Jeff





Best regards,
   Max Kanushin.
On Apr 29, 2016 6:32 PM, "Jeff Moyer"  wrote:


Max Kanushin  writes:


Hello,

I was searching for a way to find out the length of a request_queue
(that is defined as a structure in
linux/include/linux/blkdev.h). However I am new to the kernel
development and can't figure out where is the actual list of requests
to be processed by an elevator.
Is there a way to iterate requests in a queue or at least find a
number of them?


Hi, Max,

What exactly are you trying to accomplish?

Cheers,
Jeff



[PATCH] kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call

2016-05-11 Thread Serge E. Hallyn
Our caller expects 0 on success, not >0.

This fixes a bug in the patch

cgroup, kernfs: make mountinfo show properly scoped path for cgroup 
namespaces

where /sys does not show up in mountinfo, breaking criu.

Thanks for catching this, Andrei.

Reported-by: Andrei Vagin 
Signed-off-by: Serge Hallyn 
---
 fs/kernfs/mount.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 3b78724..3d670a3 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -50,7 +50,8 @@ static int kernfs_sop_show_path(struct seq_file *sf, struct 
dentry *dentry)
if (scops && scops->show_path)
return scops->show_path(sf, node, root);
 
-   return seq_dentry(sf, dentry, " \t\n\\");
+   seq_dentry(sf, dentry, " \t\n\\");
+   return 0;
 }
 
 const struct super_operations kernfs_sops = {
-- 
2.7.4



[PATCH] kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call

2016-05-11 Thread Serge E. Hallyn
Our caller expects 0 on success, not >0.

This fixes a bug in the patch

cgroup, kernfs: make mountinfo show properly scoped path for cgroup 
namespaces

where /sys does not show up in mountinfo, breaking criu.

Thanks for catching this, Andrei.

Reported-by: Andrei Vagin 
Signed-off-by: Serge Hallyn 
---
 fs/kernfs/mount.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 3b78724..3d670a3 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -50,7 +50,8 @@ static int kernfs_sop_show_path(struct seq_file *sf, struct 
dentry *dentry)
if (scops && scops->show_path)
return scops->show_path(sf, node, root);
 
-   return seq_dentry(sf, dentry, " \t\n\\");
+   seq_dentry(sf, dentry, " \t\n\\");
+   return 0;
 }
 
 const struct super_operations kernfs_sops = {
-- 
2.7.4



Re: [PATCH 0.14] oom detection rework v6

2016-05-11 Thread Joonsoo Kim
On Thu, May 12, 2016 at 11:23:34AM +0900, Joonsoo Kim wrote:
> On Tue, May 10, 2016 at 11:43:48AM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 15:41:04, Joonsoo Kim wrote:
> > > 2016-05-05 3:16 GMT+09:00 Michal Hocko :
> > > > On Wed 04-05-16 23:32:31, Joonsoo Kim wrote:
> > > >> 2016-05-04 17:47 GMT+09:00 Michal Hocko :
> > [...]
> > > >> > progress. What is the usual reason to disable compaction in the first
> > > >> > place?
> > > >>
> > > >> I don't disable it. But, who knows who disable compaction? It's been 
> > > >> *not*
> > > >> a long time that CONFIG_COMPACTION is default enable. Maybe, 3 years?
> > > >
> > > > I would really like to hear about real life usecase before we go and
> > > > cripple otherwise deterministic algorithms. It might be very well
> > > > possible that those configurations simply do not have problems with high
> > > > order allocations because they are too specific.
> > 
> > Sorry for insisting but I would really like to hear some answer for
> > this, please.
> 
> I don't know. Who knows? How you can make sure that? And, I don't like
> below fixup. Theoretically, it could retry forever.
> 
> > 
> > [...]
> > > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > >> > index 2e7e26c5d3ba..f48b9e9b1869 100644
> > > >> > --- a/mm/page_alloc.c
> > > >> > +++ b/mm/page_alloc.c
> > > >> > @@ -3319,6 +3319,24 @@ should_compact_retry(struct alloc_context 
> > > >> > *ac, unsigned int order, int alloc_fla
> > > >> >  enum migrate_mode *migrate_mode,
> > > >> >  int compaction_retries)
> > > >> >  {
> > > >> > +   struct zone *zone;
> > > >> > +   struct zoneref *z;
> > > >> > +
> > > >> > +   if (order > PAGE_ALLOC_COSTLY_ORDER)
> > > >> > +   return false;
> > > >> > +
> > > >> > +   /*
> > > >> > +* There are setups with compaction disabled which would 
> > > >> > prefer to loop
> > > >> > +* inside the allocator rather than hit the oom killer 
> > > >> > prematurely. Let's
> > > >> > +* give them a good hope and keep retrying while the order-0 
> > > >> > watermarks
> > > >> > +* are OK.
> > > >> > +*/
> > > >> > +   for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, 
> > > >> > ac->high_zoneidx,
> > > >> > +   ac->nodemask) {
> > > >> > +   if(zone_watermark_ok(zone, 0, min_wmark_pages(zone),
> > > >> > +   ac->high_zoneidx, 
> > > >> > alloc_flags))
> > > >> > +   return true;
> > > >> > +   }
> > > >> > return false;
> > [...]
> > > My benchmark is too specific so I make another one. It does very
> > > simple things.
> > > 
> > > 1) Run the system with 256 MB memory and 2 GB swap
> > > 2) Run memory-hogger which takes (anonymous memory) 256 MB
> > > 3) Make 1000 new processes by fork (It will take 16 MB order-2 pages)
> > > 
> > > You can do it yourself with above instructions.
> > > 
> > > On current upstream kernel without CONFIG_COMPACTION, OOM doesn't happen.
> > > On next-20160509 kernel without CONFIG_COMPACTION, OOM happens when
> > > roughly *500* processes forked.
> > > 
> > > With CONFIG_COMPACTION, OOM doesn't happen on any kernel.
> > 
> > Does the patch I have posted helped?
> 
> I guess that it will help but please do it by yourself. It's simple.
> 
> > > Other kernels doesn't trigger OOM even if I make 1 new processes.
> > 
> > Is this an usual load on !CONFIG_COMPACTION configurations?
> 
> I don't know. User-space developer doesn't take care about kernel
> configuration and it seems that fork 500 times when memory is full is
> not a corner case to me.
> 
> > > This example is very intuitive and reasonable. I think that it's not
> > > artificial.  It has enough swap space so OOM should not happen.
> > 
> > I am not really convinced this is true actually. You can have an
> > arbitrary amount of the swap space yet it still won't help you
> > because more reclaimed memory simply doesn't imply a more continuous
> > memory. This is a fundamental problem. So I think that relying on
> > !CONFIG_COMPACTION for heavy fork (or other high order) loads simply
> > never works reliably.
> 
> I think that you don't understand how powerful the reclaim and
> compaction are. In the system with large disk swap, what compaction can do
> is also possible for reclaim. Reclaim can do more.
> 
> Think about following examples.
> 
> _: free
> U: used(unmovable)
> M: used(migratable and reclaimable)
> 
> _MUU _U_U  
> 
> With compaction (assume theoretically best algorithm),
> just 3 contiguous region can be made like as following:
> 
> MMUU MUMU ___M 
> 
> With reclaim, we can make 8 contiguous region.
> 
> __UU _U_U  
> 
> Reclaim can be easily affected by thrashing but it is fundamentally
> more powerful than compaction.

Hmm... I uses wrong example here because if there is enough 

Re: [PATCH 0.14] oom detection rework v6

2016-05-11 Thread Joonsoo Kim
On Thu, May 12, 2016 at 11:23:34AM +0900, Joonsoo Kim wrote:
> On Tue, May 10, 2016 at 11:43:48AM +0200, Michal Hocko wrote:
> > On Tue 10-05-16 15:41:04, Joonsoo Kim wrote:
> > > 2016-05-05 3:16 GMT+09:00 Michal Hocko :
> > > > On Wed 04-05-16 23:32:31, Joonsoo Kim wrote:
> > > >> 2016-05-04 17:47 GMT+09:00 Michal Hocko :
> > [...]
> > > >> > progress. What is the usual reason to disable compaction in the first
> > > >> > place?
> > > >>
> > > >> I don't disable it. But, who knows who disable compaction? It's been 
> > > >> *not*
> > > >> a long time that CONFIG_COMPACTION is default enable. Maybe, 3 years?
> > > >
> > > > I would really like to hear about real life usecase before we go and
> > > > cripple otherwise deterministic algorithms. It might be very well
> > > > possible that those configurations simply do not have problems with high
> > > > order allocations because they are too specific.
> > 
> > Sorry for insisting but I would really like to hear some answer for
> > this, please.
> 
> I don't know. Who knows? How you can make sure that? And, I don't like
> below fixup. Theoretically, it could retry forever.
> 
> > 
> > [...]
> > > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > >> > index 2e7e26c5d3ba..f48b9e9b1869 100644
> > > >> > --- a/mm/page_alloc.c
> > > >> > +++ b/mm/page_alloc.c
> > > >> > @@ -3319,6 +3319,24 @@ should_compact_retry(struct alloc_context 
> > > >> > *ac, unsigned int order, int alloc_fla
> > > >> >  enum migrate_mode *migrate_mode,
> > > >> >  int compaction_retries)
> > > >> >  {
> > > >> > +   struct zone *zone;
> > > >> > +   struct zoneref *z;
> > > >> > +
> > > >> > +   if (order > PAGE_ALLOC_COSTLY_ORDER)
> > > >> > +   return false;
> > > >> > +
> > > >> > +   /*
> > > >> > +* There are setups with compaction disabled which would 
> > > >> > prefer to loop
> > > >> > +* inside the allocator rather than hit the oom killer 
> > > >> > prematurely. Let's
> > > >> > +* give them a good hope and keep retrying while the order-0 
> > > >> > watermarks
> > > >> > +* are OK.
> > > >> > +*/
> > > >> > +   for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, 
> > > >> > ac->high_zoneidx,
> > > >> > +   ac->nodemask) {
> > > >> > +   if(zone_watermark_ok(zone, 0, min_wmark_pages(zone),
> > > >> > +   ac->high_zoneidx, 
> > > >> > alloc_flags))
> > > >> > +   return true;
> > > >> > +   }
> > > >> > return false;
> > [...]
> > > My benchmark is too specific so I make another one. It does very
> > > simple things.
> > > 
> > > 1) Run the system with 256 MB memory and 2 GB swap
> > > 2) Run memory-hogger which takes (anonymous memory) 256 MB
> > > 3) Make 1000 new processes by fork (It will take 16 MB order-2 pages)
> > > 
> > > You can do it yourself with above instructions.
> > > 
> > > On current upstream kernel without CONFIG_COMPACTION, OOM doesn't happen.
> > > On next-20160509 kernel without CONFIG_COMPACTION, OOM happens when
> > > roughly *500* processes forked.
> > > 
> > > With CONFIG_COMPACTION, OOM doesn't happen on any kernel.
> > 
> > Does the patch I have posted helped?
> 
> I guess that it will help but please do it by yourself. It's simple.
> 
> > > Other kernels doesn't trigger OOM even if I make 1 new processes.
> > 
> > Is this an usual load on !CONFIG_COMPACTION configurations?
> 
> I don't know. User-space developer doesn't take care about kernel
> configuration and it seems that fork 500 times when memory is full is
> not a corner case to me.
> 
> > > This example is very intuitive and reasonable. I think that it's not
> > > artificial.  It has enough swap space so OOM should not happen.
> > 
> > I am not really convinced this is true actually. You can have an
> > arbitrary amount of the swap space yet it still won't help you
> > because more reclaimed memory simply doesn't imply a more continuous
> > memory. This is a fundamental problem. So I think that relying on
> > !CONFIG_COMPACTION for heavy fork (or other high order) loads simply
> > never works reliably.
> 
> I think that you don't understand how powerful the reclaim and
> compaction are. In the system with large disk swap, what compaction can do
> is also possible for reclaim. Reclaim can do more.
> 
> Think about following examples.
> 
> _: free
> U: used(unmovable)
> M: used(migratable and reclaimable)
> 
> _MUU _U_U  
> 
> With compaction (assume theoretically best algorithm),
> just 3 contiguous region can be made like as following:
> 
> MMUU MUMU ___M 
> 
> With reclaim, we can make 8 contiguous region.
> 
> __UU _U_U  
> 
> Reclaim can be easily affected by thrashing but it is fundamentally
> more powerful than compaction.

Hmm... I uses wrong example here because if there is enough freepage,
compaction using theoretically best 

Re: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR

2016-05-11 Thread Andy Gross
On Wed, May 11, 2016 at 11:04:17PM +0530, Abhishek Sahu wrote:



> >   In qup_i2c_xfer and qup_i2c_xfer_v2 state is set to RESET at the
> >end, when
> >   there is no error. So would it be fine if we do it there
> >unconditionally ?
> >
> >Regards,
> > Sricharan
> 
> RESET the QUP state wouldn't create any issue in the case of multiple calls.
> The existing code also RESET the QUP state for bus_err but it is not
> clearing
> status bits.

It'd be better to not reset the QUP inside the ISR at all.  I think the better
solution is making the reset occur in the xfer function.  So just clear the bits
like you should in the isr, and defer reset till later.


Re: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR

2016-05-11 Thread Andy Gross
On Wed, May 11, 2016 at 11:04:17PM +0530, Abhishek Sahu wrote:



> >   In qup_i2c_xfer and qup_i2c_xfer_v2 state is set to RESET at the
> >end, when
> >   there is no error. So would it be fine if we do it there
> >unconditionally ?
> >
> >Regards,
> > Sricharan
> 
> RESET the QUP state wouldn't create any issue in the case of multiple calls.
> The existing code also RESET the QUP state for bus_err but it is not
> clearing
> status bits.

It'd be better to not reset the QUP inside the ISR at all.  I think the better
solution is making the reset occur in the xfer function.  So just clear the bits
like you should in the isr, and defer reset till later.


Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

2016-05-11 Thread Peter Zijlstra
On Thu, May 12, 2016 at 12:05:37PM +1000, Michael Neuling wrote:
> On Wed, 2016-05-11 at 20:24 +0200, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 02:33:45PM +0200, Peter Zijlstra wrote:
> > > 
> > > Hmm, PPC folks; what does your topology look like?
> > > 
> > > Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c
> > > seems to suggest your cores do not share cache at all.
> > > 
> > > https://en.wikipedia.org/wiki/POWER7 seems to agree and states
> > > 
> > >   "4 MB L3 cache per C1 core"
> > > 
> > > And http://www-03.ibm.com/systems/resources/systems_power_software_i_pe
> > > rfmgmt_underthehood.pdf
> > > also explicitly draws pictures with the L3 per core.
> > > 
> > > _however_, that same document describes L3 inter-core fill and lateral
> > > cast-out, which sounds like the L3s work together to form a node wide
> > > caching system.
> > > 
> > > Do we want to model this co-operative L3 slices thing as a sort of
> > > node-wide LLC for the purpose of the scheduler ?
> > Going back a generation; Power6 seems to have a shared L3 (off package)
> > between the two cores on the package. The current topology does not
> > reflect that at all.
> > 
> > And going forward a generation; Power8 seems to share the per-core
> > (chiplet) L3 amonst all cores (chiplets) + is has the centaur (memory
> > controller) 16M L4.
> 
> Yep, L1/L2/L3 is per core on POWER8 and POWER7.  POWER6 and POWER5 (both
> dual core chips) had a shared off chip cache

But as per the above, Power7 and Power8 have explicit logic to share the
per-core L3 with the other cores.

How effective is that? From some of the slides/documents i've looked at
the L3s are connected with a high-speed fabric. Suggesting that the
cross-core sharing should be fairly efficient.

In which case it would make sense to treat/model the combined L3 as a
single large LLC covering all cores.


Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

2016-05-11 Thread Peter Zijlstra
On Thu, May 12, 2016 at 12:05:37PM +1000, Michael Neuling wrote:
> On Wed, 2016-05-11 at 20:24 +0200, Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 02:33:45PM +0200, Peter Zijlstra wrote:
> > > 
> > > Hmm, PPC folks; what does your topology look like?
> > > 
> > > Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c
> > > seems to suggest your cores do not share cache at all.
> > > 
> > > https://en.wikipedia.org/wiki/POWER7 seems to agree and states
> > > 
> > >   "4 MB L3 cache per C1 core"
> > > 
> > > And http://www-03.ibm.com/systems/resources/systems_power_software_i_pe
> > > rfmgmt_underthehood.pdf
> > > also explicitly draws pictures with the L3 per core.
> > > 
> > > _however_, that same document describes L3 inter-core fill and lateral
> > > cast-out, which sounds like the L3s work together to form a node wide
> > > caching system.
> > > 
> > > Do we want to model this co-operative L3 slices thing as a sort of
> > > node-wide LLC for the purpose of the scheduler ?
> > Going back a generation; Power6 seems to have a shared L3 (off package)
> > between the two cores on the package. The current topology does not
> > reflect that at all.
> > 
> > And going forward a generation; Power8 seems to share the per-core
> > (chiplet) L3 amonst all cores (chiplets) + is has the centaur (memory
> > controller) 16M L4.
> 
> Yep, L1/L2/L3 is per core on POWER8 and POWER7.  POWER6 and POWER5 (both
> dual core chips) had a shared off chip cache

But as per the above, Power7 and Power8 have explicit logic to share the
per-core L3 with the other cores.

How effective is that? From some of the slides/documents i've looked at
the L3s are connected with a high-speed fabric. Suggesting that the
cross-core sharing should be fairly efficient.

In which case it would make sense to treat/model the combined L3 as a
single large LLC covering all cores.


Re: [PATCH] tty: serial: msm: Disable restoring Rx interrupts for DMA Mode

2016-05-11 Thread Andy Gross
On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
> On 05/10, Abhishek Sahu wrote:
> > From: Charanya 
> 
> Was it intentional to only have one name here?
> 
> > 
> > The Data loss was happening with current QCOM MSM serial driver during
> > large file transfer due to simultaneous enabling of both UART and
> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> > watermark) or RXSTALE (stale interrupts) should not be enabled,
> > since these conditions will be handled by DMA controller itself.
> > If these interrupts are enabled then normal UART ISR will read some
> > bytes of data from Rx Buffer and DMA controller will not receive
> > these bytes of data, which will cause data loss.
> > 
> > Now this patch removed the code for enabling of RXLEV and RXSTALE
> > interrupt in DMA Rx completion routine.
> 
> I'm lost, we keep both these irqs masked (well only if uartdm
> version is 1.4 or greater) pretty much the entire time we're
> using DMA for RX. msm_start_rx_dma() will mask them and then when
> the callback completes we'll unmask them (the part that's deleted
> in this patch), but then we'll go back and remask them almost
> immediately because we call msm_start_rx_dma() from the dma
> completion handler.
> 
> Can you clearly describe how this is actually fixing any
> problems? What's the sequence of events that happens to cause
> corruption?
> 
> This does raise the question though why we ever mask/unmask these
> interrupts if we're always going to keep them masked while doing
> DMA RX. Presumably if we can use DMA to RX, we can always use it
> and set things up properly at startup time instead of later on.

Thats probably the right thing to do.  We shouldn't be masking/unmasking
the unused IRQs to begin with.


Re: [PATCH] tty: serial: msm: Disable restoring Rx interrupts for DMA Mode

2016-05-11 Thread Andy Gross
On Wed, May 11, 2016 at 06:41:26PM -0700, Stephen Boyd wrote:
> On 05/10, Abhishek Sahu wrote:
> > From: Charanya 
> 
> Was it intentional to only have one name here?
> 
> > 
> > The Data loss was happening with current QCOM MSM serial driver during
> > large file transfer due to simultaneous enabling of both UART and
> > DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> > watermark) or RXSTALE (stale interrupts) should not be enabled,
> > since these conditions will be handled by DMA controller itself.
> > If these interrupts are enabled then normal UART ISR will read some
> > bytes of data from Rx Buffer and DMA controller will not receive
> > these bytes of data, which will cause data loss.
> > 
> > Now this patch removed the code for enabling of RXLEV and RXSTALE
> > interrupt in DMA Rx completion routine.
> 
> I'm lost, we keep both these irqs masked (well only if uartdm
> version is 1.4 or greater) pretty much the entire time we're
> using DMA for RX. msm_start_rx_dma() will mask them and then when
> the callback completes we'll unmask them (the part that's deleted
> in this patch), but then we'll go back and remask them almost
> immediately because we call msm_start_rx_dma() from the dma
> completion handler.
> 
> Can you clearly describe how this is actually fixing any
> problems? What's the sequence of events that happens to cause
> corruption?
> 
> This does raise the question though why we ever mask/unmask these
> interrupts if we're always going to keep them masked while doing
> DMA RX. Presumably if we can use DMA to RX, we can always use it
> and set things up properly at startup time instead of later on.

Thats probably the right thing to do.  We shouldn't be masking/unmasking
the unused IRQs to begin with.


Re: [PATCH -v2] x86/hweight: Get rid of the special calling convention

2016-05-11 Thread H. Peter Anvin
On May 11, 2016 4:24:09 AM PDT, Peter Zijlstra  wrote:
>On Wed, May 11, 2016 at 07:15:19AM -0400, Brian Gerst wrote:
>
>> I think he meant the out of line version would be asm, so you could
>> control what registers were clobbered.
>
>Yeah, it might save a few cycles on the call, but given that most
>machines should have popcnt these days is it worth the hassle/cost of
>duplicating the lib/hweight.c magic in asm (and remember, twice, once
>for 32bit and once for 64bit) ?

I was thinking it isn't really very complex code even in assembly as it is 
super-regular; you can even crib the gcc-generated code if you wish.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


Re: [PATCH -v2] x86/hweight: Get rid of the special calling convention

2016-05-11 Thread H. Peter Anvin
On May 11, 2016 4:24:09 AM PDT, Peter Zijlstra  wrote:
>On Wed, May 11, 2016 at 07:15:19AM -0400, Brian Gerst wrote:
>
>> I think he meant the out of line version would be asm, so you could
>> control what registers were clobbered.
>
>Yeah, it might save a few cycles on the call, but given that most
>machines should have popcnt these days is it worth the hassle/cost of
>duplicating the lib/hweight.c magic in asm (and remember, twice, once
>for 32bit and once for 64bit) ?

I was thinking it isn't really very complex code even in assembly as it is 
super-regular; you can even crib the gcc-generated code if you wish.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, May 12, 2016 10:21 AM
> 
> On Thu, 12 May 2016 01:19:44 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, May 11, 2016 11:54 PM
> > >
> > > On Wed, 11 May 2016 06:29:06 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > > >
> > > > > On Thu, 5 May 2016 12:15:46 +
> > > > > "Tian, Kevin"  wrote:
> > > > >
> > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > > >
> > > > > > > Hi David and Kevin,
> > > > > > >
> > > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > > >
> > > > > > > > From: Tian, Kevin
> > > > > > > >> Sent: 05 May 2016 10:37
> > > > > > > > ...
> > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > > >>> normal code path such as para-virtualized guest kernel on 
> > > > > > > >>> PPC64.
> > > > > > > >>>
> > > > > > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > > received that writes the required word through that address.
> > > > > > > >
> > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory 
> > > > > > > > write
> > > > > > > > cycle.
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > >
> > > > > > > If we have enough permission to load a malicious driver or
> > > > > > > kernel, we can easily break the guest without exposed
> > > > > > > MSI-X table.
> > > > > > >
> > > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > > the MSI-X table to break other guest or host. The
> > > > > > > capability of IRQ remapping could provide this
> > > > > > > kind of protection.
> > > > > > >
> > > > > >
> > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > > structure to guest. I know actual IRQ remapping might be platform
> > > > > > specific, but at least for Intel VT-d specification, MSI-X entry 
> > > > > > must
> > > > > > be configured with a remappable format by host kernel which
> > > > > > contains an index into IRQ remapping table. The index will find a
> > > > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > > > device. If you allow a malicious program random index into MSI-X
> > > > > > entry of assigned device, the hole is obvious...
> > > > > >
> > > > > > Above might make sense only for a IRQ remapping implementation
> > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > > enabled or not.
> > > > >
> > > > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > > > table to the guest and the guest can make direct use of it.  The end
> > > > > goal here is that the guest on a power system is already
> > > > > paravirtualized to not program the device MSI-X by directly writing to
> > > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > > always run virtualized.  Therefore a) they never intend to touch the
> > > > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > > > can only hurt itself by doing so.
> > > > >
> > > > > On x86 we don't have a), our method of programming the MSI-X vector
> > > > > table is to directly write to it. Therefore we will always require 
> > > > > QEMU
> > > > > to place a MemoryRegion over the vector table to intercept those
> > > > > accesses.  However with interrupt remapping, we do have b) on x86, 
> > > > > which
> > > > > means that we don't need to be so strict in disallowing user accesses
> > > > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > > > the device, but the user should only be able to hurt themselves by
> > > > > writing it directly.  x86 doesn't really get anything out of this
> > > > > change, but it helps this special case on power pretty significantly
> > > > > aiui.  Thanks,
> > > > >
> > > >
> > > > Allowing guest direct write to MSI-x table has system-wide impact.
> > > > As I explained earlier, hypervisor needs to control "interrupt_index"
> 

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, May 12, 2016 10:21 AM
> 
> On Thu, 12 May 2016 01:19:44 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, May 11, 2016 11:54 PM
> > >
> > > On Wed, 11 May 2016 06:29:06 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > > >
> > > > > On Thu, 5 May 2016 12:15:46 +
> > > > > "Tian, Kevin"  wrote:
> > > > >
> > > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > > >
> > > > > > > Hi David and Kevin,
> > > > > > >
> > > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > > >
> > > > > > > > From: Tian, Kevin
> > > > > > > >> Sent: 05 May 2016 10:37
> > > > > > > > ...
> > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > > >>> normal code path such as para-virtualized guest kernel on 
> > > > > > > >>> PPC64.
> > > > > > > >>>
> > > > > > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > > received that writes the required word through that address.
> > > > > > > >
> > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory 
> > > > > > > > write
> > > > > > > > cycle.
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > >
> > > > > > > If we have enough permission to load a malicious driver or
> > > > > > > kernel, we can easily break the guest without exposed
> > > > > > > MSI-X table.
> > > > > > >
> > > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > > the MSI-X table to break other guest or host. The
> > > > > > > capability of IRQ remapping could provide this
> > > > > > > kind of protection.
> > > > > > >
> > > > > >
> > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > > structure to guest. I know actual IRQ remapping might be platform
> > > > > > specific, but at least for Intel VT-d specification, MSI-X entry 
> > > > > > must
> > > > > > be configured with a remappable format by host kernel which
> > > > > > contains an index into IRQ remapping table. The index will find a
> > > > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > > > device. If you allow a malicious program random index into MSI-X
> > > > > > entry of assigned device, the hole is obvious...
> > > > > >
> > > > > > Above might make sense only for a IRQ remapping implementation
> > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > > enabled or not.
> > > > >
> > > > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > > > table to the guest and the guest can make direct use of it.  The end
> > > > > goal here is that the guest on a power system is already
> > > > > paravirtualized to not program the device MSI-X by directly writing to
> > > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > > always run virtualized.  Therefore a) they never intend to touch the
> > > > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > > > can only hurt itself by doing so.
> > > > >
> > > > > On x86 we don't have a), our method of programming the MSI-X vector
> > > > > table is to directly write to it. Therefore we will always require 
> > > > > QEMU
> > > > > to place a MemoryRegion over the vector table to intercept those
> > > > > accesses.  However with interrupt remapping, we do have b) on x86, 
> > > > > which
> > > > > means that we don't need to be so strict in disallowing user accesses
> > > > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > > > the device, but the user should only be able to hurt themselves by
> > > > > writing it directly.  x86 doesn't really get anything out of this
> > > > > change, but it helps this special case on power pretty significantly
> > > > > aiui.  Thanks,
> > > > >
> > > >
> > > > Allowing guest direct write to MSI-x table has system-wide impact.
> > > > As I explained earlier, hypervisor needs to control "interrupt_index"
> > > > programmed in MSI-X entry, which is used to associate a 

Re: [PATCH] tty: serial: msm: Remove duplicate handling of clocks

2016-05-11 Thread Pramod Gurav
On 12 May 2016 at 05:48, Stephen Boyd  wrote:
> On 05/11, Pramod Gurav wrote:
>> msm_serial driver provides a .pm callback to the serial core to enable
>> and disable clock resource in suspend/resume path. This function is
>> also called before msm_startup. msm_startup also enables the clocks which
>> is not needed. Hence remove the duplcate clock operation from msm_startup
>> and msm_shutdown. Same is done in console setup to get rid of duplicate
>> clock operation.
>
> I had to check and I see that for the console case we call the
> .pm callback and don't turn it off until suspend/resume paths
> (would be nice to add suspend/resume to this driver too). I
> guess that's what you meant by this last sentence?
>
Yes the clocks would be kept ON if its console.
I am working to add suspend/resume and runtime pm as it will not
happen by default through core.
>>
>> Tested on DB410C console.
>>
>> Signed-off-by: Pramod Gurav 
>
> Reviewed-by: Stephen Boyd 
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project


Re: [PATCH] tty: serial: msm: Remove duplicate handling of clocks

2016-05-11 Thread Pramod Gurav
On 12 May 2016 at 05:48, Stephen Boyd  wrote:
> On 05/11, Pramod Gurav wrote:
>> msm_serial driver provides a .pm callback to the serial core to enable
>> and disable clock resource in suspend/resume path. This function is
>> also called before msm_startup. msm_startup also enables the clocks which
>> is not needed. Hence remove the duplcate clock operation from msm_startup
>> and msm_shutdown. Same is done in console setup to get rid of duplicate
>> clock operation.
>
> I had to check and I see that for the console case we call the
> .pm callback and don't turn it off until suspend/resume paths
> (would be nice to add suspend/resume to this driver too). I
> guess that's what you meant by this last sentence?
>
Yes the clocks would be kept ON if its console.
I am working to add suspend/resume and runtime pm as it will not
happen by default through core.
>>
>> Tested on DB410C console.
>>
>> Signed-off-by: Pramod Gurav 
>
> Reviewed-by: Stephen Boyd 
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project


RE: [PATCH] usb: gadget: f_fs: report error if excess data received

2016-05-11 Thread Du, Changbin
> On Wed, May 11 2016, Felipe Balbi wrote:
> > Also, returning -EOVERFLOW is not exactly correct here, because you'd
> > violate POSIX specification of read(), right ?
> 
> Maybe we could piggyback on:
> 
>EINVAL fd was created via a call to timerfd_create(2) and the
>   wrong size buffer was given to read();
> 
> But I kinda agree.  I’m not sure how much we need to care about this
> instead of having user space round their buffers up to the nearest max
> packet size boundary.
> 
> --
> Best regards
> ミハウ “퓶퓲퓷퓪86” ナザレヴイツ
> «If at first you don’t succeed, give up skydiving»

This is a good idea that "having user space round their buffers". But kernel
Still cannot hide error silently. :)



RE: [PATCH] usb: gadget: f_fs: report error if excess data received

2016-05-11 Thread Du, Changbin
> On Wed, May 11 2016, Felipe Balbi wrote:
> > Also, returning -EOVERFLOW is not exactly correct here, because you'd
> > violate POSIX specification of read(), right ?
> 
> Maybe we could piggyback on:
> 
>EINVAL fd was created via a call to timerfd_create(2) and the
>   wrong size buffer was given to read();
> 
> But I kinda agree.  I’m not sure how much we need to care about this
> instead of having user space round their buffers up to the nearest max
> packet size boundary.
> 
> --
> Best regards
> ミハウ “퓶퓲퓷퓪86” ナザレヴイツ
> «If at first you don’t succeed, give up skydiving»

This is a good idea that "having user space round their buffers". But kernel
Still cannot hide error silently. :)



RE: [PATCH] usb: gadget: f_fs: report error if excess data received

2016-05-11 Thread Du, Changbin
> Hi,
> 
> changbin...@intel.com writes:
> > From: "Du, Changbin" 
> >
> > Since the buffer size for req is rounded up to maxpacketsize,
> > then we may end up with more data then user space has space
> > for.
> 
> only for OUT direction with the controller you're using ;-)
For sure.

> 
> > If it happen, we can keep the excess data for next i/o, or
> > report an error. But we cannot silently drop data, because
> > USB layer should ensure the data integrality it has transferred,
> > otherwise applications may get corrupt data if it doesn't
> > detect this case.
> 
> and when has this actually happened ? Host should not send more data in
> this case, if it does, it's an error on the host side. Also, returning
> -EOVERFLOW is not exactly correct here, because you'd violate POSIX
> specification of read(), right ?
> 
This can happen if the host side app force kill-restart, not taking care of this
special condition(and we are not documented), or even it is a bug. Usually APPs
may has  a protocol to control the packet size, but protocol mismatch can happen
if either side encounter an error.

Anyway, this is real. If kernel return success and drop data, the error may 
explosion later, or its totally hided (but why some data lost in kernel?
Kernel cannot tell userspace we cannot be trusted sometimes, right?). 
so IMO, if this is an error, we need report an error or fix it, not hide it.

The POSIX didn't say read cannot return "-EOVERFLOW", it says:
" Other errors may occur, depending on the object connected to fd."

If "-EOVERFLOW" is not suitable, EFAULT, or any suggestions?

> > Here, we simply report an error to userspace to let userspace
> > proccess. Actually, userspace applications should negotiate
> 
> no, this violates POSIX. Care to explain what problem are you actually
> facing ?
> 
Why this violates POSIX? Could you give more details?

The problem is device side app sometimes received incorrect data caused
by the dropping. Most times the error can be detected by APP itself, but
sometimes cannot. It depends on the design of its communication protocol.

> --
> Balbi

Best Regards,
Du, Changbin


RE: [PATCH] usb: gadget: f_fs: report error if excess data received

2016-05-11 Thread Du, Changbin
> Hi,
> 
> changbin...@intel.com writes:
> > From: "Du, Changbin" 
> >
> > Since the buffer size for req is rounded up to maxpacketsize,
> > then we may end up with more data then user space has space
> > for.
> 
> only for OUT direction with the controller you're using ;-)
For sure.

> 
> > If it happen, we can keep the excess data for next i/o, or
> > report an error. But we cannot silently drop data, because
> > USB layer should ensure the data integrality it has transferred,
> > otherwise applications may get corrupt data if it doesn't
> > detect this case.
> 
> and when has this actually happened ? Host should not send more data in
> this case, if it does, it's an error on the host side. Also, returning
> -EOVERFLOW is not exactly correct here, because you'd violate POSIX
> specification of read(), right ?
> 
This can happen if the host side app force kill-restart, not taking care of this
special condition(and we are not documented), or even it is a bug. Usually APPs
may has  a protocol to control the packet size, but protocol mismatch can happen
if either side encounter an error.

Anyway, this is real. If kernel return success and drop data, the error may 
explosion later, or its totally hided (but why some data lost in kernel?
Kernel cannot tell userspace we cannot be trusted sometimes, right?). 
so IMO, if this is an error, we need report an error or fix it, not hide it.

The POSIX didn't say read cannot return "-EOVERFLOW", it says:
" Other errors may occur, depending on the object connected to fd."

If "-EOVERFLOW" is not suitable, EFAULT, or any suggestions?

> > Here, we simply report an error to userspace to let userspace
> > proccess. Actually, userspace applications should negotiate
> 
> no, this violates POSIX. Care to explain what problem are you actually
> facing ?
> 
Why this violates POSIX? Could you give more details?

The problem is device side app sometimes received incorrect data caused
by the dropping. Most times the error can be detected by APP itself, but
sometimes cannot. It depends on the design of its communication protocol.

> --
> Balbi

Best Regards,
Du, Changbin


linux-next: manual merge of the gpio tree with the mips tree

2016-05-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the gpio tree got a conflict in:

  arch/mips/Kconfig

between commit:

  cec47c182286 ("MIPS: Loongson: Add Loongson-3A R2 basic support")

from the mips tree and commit:

  d30a2b47d4c2 ("MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB")

from the gpio tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/mips/Kconfig
index ac9bfad794eb,512b5def854d..
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@@ -1354,8 -1340,7 +1352,8 @@@ config CPU_LOONGSON
select CPU_SUPPORTS_HUGEPAGES
select WEAK_ORDERING
select WEAK_REORDERING_BEYOND_LLSC
 +  select MIPS_PGD_C0_CONTEXT
-   select ARCH_REQUIRE_GPIOLIB
+   select GPIOLIB
help
The Loongson 3 processor implements the MIPS64R2 instruction
set with many extensions.


linux-next: manual merge of the gpio tree with the mips tree

2016-05-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the gpio tree got a conflict in:

  arch/mips/Kconfig

between commit:

  cec47c182286 ("MIPS: Loongson: Add Loongson-3A R2 basic support")

from the mips tree and commit:

  d30a2b47d4c2 ("MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB")

from the gpio tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/mips/Kconfig
index ac9bfad794eb,512b5def854d..
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@@ -1354,8 -1340,7 +1352,8 @@@ config CPU_LOONGSON
select CPU_SUPPORTS_HUGEPAGES
select WEAK_ORDERING
select WEAK_REORDERING_BEYOND_LLSC
 +  select MIPS_PGD_C0_CONTEXT
-   select ARCH_REQUIRE_GPIOLIB
+   select GPIOLIB
help
The Loongson 3 processor implements the MIPS64R2 instruction
set with many extensions.


linux-next: build failure after merge of the pinctrl tree

2016-05-11 Thread Stephen Rothwell
Hi Linus,

After merging the pinctrl tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

In file included from include/linux/thread_info.h:11:0,
 from include/asm-generic/preempt.h:4,
 from arch/arm/include/generated/asm/preempt.h:1,
 from include/linux/preempt.h:59,
 from include/linux/spinlock.h:50,
 from include/linux/seqlock.h:35,
 from include/linux/time.h:5,
 from include/linux/stat.h:18,
 from include/linux/module.h:10,
 from drivers/pinctrl/tegra/pinctrl-tegra20.c:20:
include/linux/bug.h:34:45: error: unknown field 'parked_reg' specified in 
initializer
 #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
 ^
include/linux/compiler-gcc.h:64:28: note: in expansion of macro 
'BUILD_BUG_ON_ZERO'
 #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
^
include/linux/kernel.h:54:59: note: in expansion of macro '__must_be_array'
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
   ^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2027:12: note: in expansion of macro 
'ARRAY_SIZE'
   .npins = ARRAY_SIZE(drive_##pg_name##_pins), \
^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2050:2: note: in expansion of macro 
'DRV_PG_EXT'
  DRV_PG_EXT(pg_name, r, 2,  3,  4, 12, 20, 28, 2, 30, 2)
  ^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2178:2: note: in expansion of macro 
'DRV_PG'
  DRV_PG(ao1,0x868),
  ^

and many more.

I cannot figure out what caused it, but using the pinctrl tree from
next-20160511 makes it build again.

I am using gcc 5.2.0 hosted on powerpcle, if that matters.

-- 
Cheers,
Stephen Rothwell


linux-next: build failure after merge of the pinctrl tree

2016-05-11 Thread Stephen Rothwell
Hi Linus,

After merging the pinctrl tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

In file included from include/linux/thread_info.h:11:0,
 from include/asm-generic/preempt.h:4,
 from arch/arm/include/generated/asm/preempt.h:1,
 from include/linux/preempt.h:59,
 from include/linux/spinlock.h:50,
 from include/linux/seqlock.h:35,
 from include/linux/time.h:5,
 from include/linux/stat.h:18,
 from include/linux/module.h:10,
 from drivers/pinctrl/tegra/pinctrl-tegra20.c:20:
include/linux/bug.h:34:45: error: unknown field 'parked_reg' specified in 
initializer
 #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
 ^
include/linux/compiler-gcc.h:64:28: note: in expansion of macro 
'BUILD_BUG_ON_ZERO'
 #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
^
include/linux/kernel.h:54:59: note: in expansion of macro '__must_be_array'
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
   ^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2027:12: note: in expansion of macro 
'ARRAY_SIZE'
   .npins = ARRAY_SIZE(drive_##pg_name##_pins), \
^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2050:2: note: in expansion of macro 
'DRV_PG_EXT'
  DRV_PG_EXT(pg_name, r, 2,  3,  4, 12, 20, 28, 2, 30, 2)
  ^
drivers/pinctrl/tegra/pinctrl-tegra20.c:2178:2: note: in expansion of macro 
'DRV_PG'
  DRV_PG(ao1,0x868),
  ^

and many more.

I cannot figure out what caused it, but using the pinctrl tree from
next-20160511 makes it build again.

I am using gcc 5.2.0 hosted on powerpcle, if that matters.

-- 
Cheers,
Stephen Rothwell


RE: [PATCH v7 10/14] usb: otg: add hcd companion support

2016-05-11 Thread Yoshihiro Shimoda
Hi,

> From: Alan Stern
> Sent: Wednesday, May 11, 2016 11:47 PM
> 
> On Wed, 11 May 2016, Roger Quadros wrote:
> 
> > > What I mean is if you have 2 EHCI controllers with 2 companion
> > > controllers, don't you need to know which companion goes with which EHCI
> > > controller? Just like you do for the otg-controller property.
> > >
> >
> > That is a very good point. I'm not very sure and it seems that current code 
> > won't work
> > with multiple EHCI + companion instances.

I may misunderstand this topic, but if I use the following environment, it 
works correctly.

< My environment >
- an otg controller: Sets hcd-needs-companion.
- ehci0 and ohci0 and a function: They connect to the otg controller using 
"otg-controller" property.
- ehci1 and ohci1: No "otg-controller" property.
- ehci2 and ohci2: No "otg-controller" property.

In this environment, all hosts works correctly.
Also I think if we have 2 otg controlelrs, it should be work because otg_dev 
instance differs.
Or, does this topic assume an otg controller handles 2 EHCI controllers?
I'm not sure such environment actually exists.

> > Alan, does USB core even know which EHCI and OHCI are linked to the same 
> > port
> > or the handoff is software transparent?
> 
> The core knows.  It doesn't use the information for a whole lot of
> things, but it does use it in a couple of places.  Search for
> "companion" in core/hcd-pci.c and you'll see.

Thank you for the information. I didn't know this code.
If my understanding is correct, the core/hcd-pci.c code will not be used by 
non-PCI devices.
In other words, nobody sets "hcd->self.hs_companion" if we use such a device.
So, I will try to add such a code if needed.

Best regards,
Yoshihiro Shimoda

> Alan Stern



RE: [PATCH v7 10/14] usb: otg: add hcd companion support

2016-05-11 Thread Yoshihiro Shimoda
Hi,

> From: Alan Stern
> Sent: Wednesday, May 11, 2016 11:47 PM
> 
> On Wed, 11 May 2016, Roger Quadros wrote:
> 
> > > What I mean is if you have 2 EHCI controllers with 2 companion
> > > controllers, don't you need to know which companion goes with which EHCI
> > > controller? Just like you do for the otg-controller property.
> > >
> >
> > That is a very good point. I'm not very sure and it seems that current code 
> > won't work
> > with multiple EHCI + companion instances.

I may misunderstand this topic, but if I use the following environment, it 
works correctly.

< My environment >
- an otg controller: Sets hcd-needs-companion.
- ehci0 and ohci0 and a function: They connect to the otg controller using 
"otg-controller" property.
- ehci1 and ohci1: No "otg-controller" property.
- ehci2 and ohci2: No "otg-controller" property.

In this environment, all hosts works correctly.
Also I think if we have 2 otg controlelrs, it should be work because otg_dev 
instance differs.
Or, does this topic assume an otg controller handles 2 EHCI controllers?
I'm not sure such environment actually exists.

> > Alan, does USB core even know which EHCI and OHCI are linked to the same 
> > port
> > or the handoff is software transparent?
> 
> The core knows.  It doesn't use the information for a whole lot of
> things, but it does use it in a couple of places.  Search for
> "companion" in core/hcd-pci.c and you'll see.

Thank you for the information. I didn't know this code.
If my understanding is correct, the core/hcd-pci.c code will not be used by 
non-PCI devices.
In other words, nobody sets "hcd->self.hs_companion" if we use such a device.
So, I will try to add such a code if needed.

Best regards,
Yoshihiro Shimoda

> Alan Stern



Re: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending

2016-05-11 Thread Alexandre Belloni
Hi,

On 12/05/2016 at 09:05:34 +0800, Wenyou Yang wrote :
> In order to get lower consumption, as a workaround, suspend
> the USB PORTA/B/C via set the SUSPEND_A/B/C bits of OHCI Interrupt
> Configuration Register while OHCI USB suspending.
> 

Why is that a workaround?

> This suspend operation must be done before stopping the USB clock,
> resume after the USB clock enabled.
> 
> Signed-off-by: Wenyou Yang 
> ---
> 
>  drivers/usb/host/ohci-at91.c | 63 
> 
>  1 file changed, 63 insertions(+)
> 
> diff --git a/drivers/usb/host/ohci-at91.c b/drivers/usb/host/ohci-at91.c
> index d177372..ce898e0 100644
> --- a/drivers/usb/host/ohci-at91.c
> +++ b/drivers/usb/host/ohci-at91.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  
> @@ -51,6 +53,7 @@ struct ohci_at91_priv {
>   struct clk *hclk;
>   bool clocked;
>   bool wakeup;/* Saved wake-up state for resume */
> + struct regmap *sfr_regmap;
>  };
>  /* interface and function clocks; sometimes also an AHB clock */
>  
> @@ -132,6 +135,17 @@ static void at91_stop_hc(struct platform_device *pdev)
>  
>  /*-*/
>  
> +struct regmap *at91_dt_syscon_sfr(void)
> +{
> + struct regmap *regmap;
> +
> + regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
> + if (IS_ERR(regmap))
> + return NULL;
> +
> + return regmap;
> +}
> +
>  static void usb_hcd_at91_remove (struct usb_hcd *, struct platform_device *);
>  
>  /* configure so an HC device and id are always provided */
> @@ -197,6 +211,8 @@ static int usb_hcd_at91_probe(const struct hc_driver 
> *driver,
>   goto err;
>   }
>  
> + ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
> +
>   board = hcd->self.controller->platform_data;
>   ohci = hcd_to_ohci(hcd);
>   ohci->num_ports = board->ports;
> @@ -581,6 +597,49 @@ static int ohci_hcd_at91_drv_remove(struct 
> platform_device *pdev)
>   return 0;
>  }
>  
> +#define SFR_OHCIICR  0x10
> +#define SFR_OHCIICR_SUSPEND_ABIT(8)
> +#define SFR_OHCIICR_SUSPEND_BBIT(9)
> +#define SFR_OHCIICR_SUSPEND_CBIT(10)
> +
> +#define SFR_OHCIICR_USB_SUSPEND  (SFR_OHCIICR_SUSPEND_A | \
> +  SFR_OHCIICR_SUSPEND_B | \
> +  SFR_OHCIICR_SUSPEND_C)
> +

Those defines should go in a header file either in include/soc/at91 or
in include/linux/mfd

> +static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable)
> +{
> + u32 regval;
> + int ret;
> +
> + if (IS_ERR(regmap))
> + return PTR_ERR(regmap);
> +
> + ret = regmap_read(regmap, SFR_OHCIICR, );
> + if (ret)
> + return ret;
> +
> + if (enable)
> + regval &= ~SFR_OHCIICR_USB_SUSPEND;
> + else
> + regval |= SFR_OHCIICR_USB_SUSPEND;
> +
> + regmap_write(regmap, SFR_OHCIICR, regval);
> +
> + regmap_read(regmap, SFR_OHCIICR, );
> +
> + return 0;
> +}
> +
> +static int ohci_at91_port_suspend(struct regmap *regmap)
> +{
> + return ohci_at91_port_ctrl(regmap, false);
> +}
> +
> +static int ohci_at91_port_resume(struct regmap *regmap)
> +{
> + return ohci_at91_port_ctrl(regmap, true);
> +}
> +
>  static int __maybe_unused
>  ohci_hcd_at91_drv_suspend(struct device *dev)
>  {
> @@ -618,6 +677,8 @@ ohci_hcd_at91_drv_suspend(struct device *dev)
>   ohci_writel(ohci, ohci->hc_control, >regs->control);
>   ohci->rh_state = OHCI_RH_HALTED;
>  
> + ohci_at91_port_suspend(ohci_at91->sfr_regmap);
> +

The main issue I see here is that you will call that function for all
SoCs and it will always fail unless it is running on a sama5d2. Maybe we
could be smarter about that.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [PATCH] usb: ohci-at91: Suspend the ports while USB suspending

2016-05-11 Thread Alexandre Belloni
Hi,

On 12/05/2016 at 09:05:34 +0800, Wenyou Yang wrote :
> In order to get lower consumption, as a workaround, suspend
> the USB PORTA/B/C via set the SUSPEND_A/B/C bits of OHCI Interrupt
> Configuration Register while OHCI USB suspending.
> 

Why is that a workaround?

> This suspend operation must be done before stopping the USB clock,
> resume after the USB clock enabled.
> 
> Signed-off-by: Wenyou Yang 
> ---
> 
>  drivers/usb/host/ohci-at91.c | 63 
> 
>  1 file changed, 63 insertions(+)
> 
> diff --git a/drivers/usb/host/ohci-at91.c b/drivers/usb/host/ohci-at91.c
> index d177372..ce898e0 100644
> --- a/drivers/usb/host/ohci-at91.c
> +++ b/drivers/usb/host/ohci-at91.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  
> @@ -51,6 +53,7 @@ struct ohci_at91_priv {
>   struct clk *hclk;
>   bool clocked;
>   bool wakeup;/* Saved wake-up state for resume */
> + struct regmap *sfr_regmap;
>  };
>  /* interface and function clocks; sometimes also an AHB clock */
>  
> @@ -132,6 +135,17 @@ static void at91_stop_hc(struct platform_device *pdev)
>  
>  /*-*/
>  
> +struct regmap *at91_dt_syscon_sfr(void)
> +{
> + struct regmap *regmap;
> +
> + regmap = syscon_regmap_lookup_by_compatible("atmel,sama5d2-sfr");
> + if (IS_ERR(regmap))
> + return NULL;
> +
> + return regmap;
> +}
> +
>  static void usb_hcd_at91_remove (struct usb_hcd *, struct platform_device *);
>  
>  /* configure so an HC device and id are always provided */
> @@ -197,6 +211,8 @@ static int usb_hcd_at91_probe(const struct hc_driver 
> *driver,
>   goto err;
>   }
>  
> + ohci_at91->sfr_regmap = at91_dt_syscon_sfr();
> +
>   board = hcd->self.controller->platform_data;
>   ohci = hcd_to_ohci(hcd);
>   ohci->num_ports = board->ports;
> @@ -581,6 +597,49 @@ static int ohci_hcd_at91_drv_remove(struct 
> platform_device *pdev)
>   return 0;
>  }
>  
> +#define SFR_OHCIICR  0x10
> +#define SFR_OHCIICR_SUSPEND_ABIT(8)
> +#define SFR_OHCIICR_SUSPEND_BBIT(9)
> +#define SFR_OHCIICR_SUSPEND_CBIT(10)
> +
> +#define SFR_OHCIICR_USB_SUSPEND  (SFR_OHCIICR_SUSPEND_A | \
> +  SFR_OHCIICR_SUSPEND_B | \
> +  SFR_OHCIICR_SUSPEND_C)
> +

Those defines should go in a header file either in include/soc/at91 or
in include/linux/mfd

> +static int ohci_at91_port_ctrl(struct regmap *regmap, bool enable)
> +{
> + u32 regval;
> + int ret;
> +
> + if (IS_ERR(regmap))
> + return PTR_ERR(regmap);
> +
> + ret = regmap_read(regmap, SFR_OHCIICR, );
> + if (ret)
> + return ret;
> +
> + if (enable)
> + regval &= ~SFR_OHCIICR_USB_SUSPEND;
> + else
> + regval |= SFR_OHCIICR_USB_SUSPEND;
> +
> + regmap_write(regmap, SFR_OHCIICR, regval);
> +
> + regmap_read(regmap, SFR_OHCIICR, );
> +
> + return 0;
> +}
> +
> +static int ohci_at91_port_suspend(struct regmap *regmap)
> +{
> + return ohci_at91_port_ctrl(regmap, false);
> +}
> +
> +static int ohci_at91_port_resume(struct regmap *regmap)
> +{
> + return ohci_at91_port_ctrl(regmap, true);
> +}
> +
>  static int __maybe_unused
>  ohci_hcd_at91_drv_suspend(struct device *dev)
>  {
> @@ -618,6 +677,8 @@ ohci_hcd_at91_drv_suspend(struct device *dev)
>   ohci_writel(ohci, ohci->hc_control, >regs->control);
>   ohci->rh_state = OHCI_RH_HALTED;
>  
> + ohci_at91_port_suspend(ohci_at91->sfr_regmap);
> +

The main issue I see here is that you will call that function for all
SoCs and it will always fail unless it is running on a sama5d2. Maybe we
could be smarter about that.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [PATCH 20/25] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-05-11 Thread Zhangjian (Bamvor)

Hi, Arnd

On 2016/5/11 22:50, Arnd Bergmann wrote:

On Wednesday 11 May 2016 19:16:44 Zhangjian wrote:

Hi,

On 2016/5/11 18:12, Zhangjian (Bamvor) wrote:

Hi, Arnd

On 2016/5/11 16:09, Arnd Bergmann wrote:
  > On Wednesday 11 May 2016 10:04:16 Zhangjian wrote:
  >>> I don't remember. It's probably not important whether we have the shift
  >>> in there, as long as it's independent of the actual kernel page size and
  >>> user space and kernel agree on the calling conventions.
  >> Well. I am ok with where to shift the pages size because we get the same
  >> result. I was just thinking if we should get rid of the name of mmap2 in 
our
  >> ILP32 porting. Actually, it is mmap but we name it as mmap2. User may 
confused
  >> if they do not know the implementations.
  >
  > That is a good point: If the implementation matches the mmap() behavior 
rather than
  > mmap2(), we should rename the macro by doing
  >
  > #undef __NR_mmap2
  > #define __NR_mmap 222
  >
  > in the uapi/asm/unistd.h file for ilp32 mode.
Do you mean define the following things in kernel:
```
diff --git a/arch/arm64/include/uapi/asm/unistd.h 
b/arch/arm64/include/uapi/asm/unistd.h
index 1caadc2..3f79640 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -14,3 +14,9 @@
* along with this program.  If not, see .
*/
   #include 
+
+#ifdef __ILP32__
+#undef __NR_mmap2
+#define __NR_mmap 222
+#endif /* #ifdef __ILP32__ */
+
```
Then glibc could call mmap instead of mmap2.
I could not try it now. Because after change off_t to 64bit in glibc, stat
is fail. I may need to revert the stat relative patch.

After revert stat relative patch in glibc, mmap01-mmap14 success. But mmap16
success with segfault. I will investigate it later.

There is pointer and size_t in mmap, so, IIUC, we need to clear the top halves
of register by using COMPAT_SYSCALL_WRAP6.


Correct, good catch!


And after check the function in
arch/s390/kernel/compat_linux.c, I feel that we need to do the same thing for
pread64 and pwrite64.




But I got following error when I try to add
COMPAT_SYSCALL_WRAP4(pread64, unsigned int, fd, char __user *, buf,
size_t, count, loff_t, pos);
COMPAT_SYSCALL_WRAP4(pwrite64, unsigned int, fd, const char __user *, buf,
size_t, count, loff_t, pos);



Hmm, that is indeed tricky. I think COMPAT_SYSCALL_WRAP4 rightfully
refuses the loff_t argument here, as the common case is that this is
not possible.

It works if I apply the following patch, I defined the wrong `__TYPE_IS_xxx`
yesterday. Should we merge this into ILP32 series or send the compat.h
and syscalls.h individually? The current series of ILP32 is a little bit
long and hard to review.
diff --git a/include/linux/compat.h b/include/linux/compat.h
index ba6ebe0..22a9565 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -747,7 +747,8 @@ asmlinkage long compat_sys_fanotify_mark(int, unsigned int, 
__u32, __u32,
 #ifndef __SC_COMPAT_CAST
 #define __SC_COMPAT_CAST(t, a) ({  \
BUILD_BUG_ON((sizeof(t) > 4) && !__TYPE_IS_L(t) &&  \
-!__TYPE_IS_UL(t) && !__TYPE_IS_PTR(t));\
+!__TYPE_IS_UL(t) && !__TYPE_IS_PTR(t) &&   \
+!__TYPE_IS_LOFFT(t));  \
((t) ((t)(-1) < 0 ? (s64)(s32)(a) : (u64)(u32)(a)));\
 })
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 6e57d9c..66eb85d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -47,6 +47,7 @@
 #define __TYPE_IS_L(t) (__same_type((t)0, 0L))
 #define __TYPE_IS_UL(t)(__same_type((t)0, 0UL))
 #define __TYPE_IS_LL(t) (__same_type((t)0, 0LL) || __same_type((t)0, 0ULL))
+#define __TYPE_IS_LOFFT(t) (__same_type((t)0, (loff_t)0))
 #define __SC_LONG(t, a) __typeof(__builtin_choose_expr(__TYPE_IS_LL(t), 0LL, 
0L)) a
 #define __SC_CAST(t, a)(t) a
 #define __SC_ARGS(t, a)a
diff --git a/kernel/compat_wrapper.c b/kernel/compat_wrapper.c
index 98b68b8..28f02d0 100644
--- a/kernel/compat_wrapper.c
+++ b/kernel/compat_wrapper.c
@@ -304,3 +304,7 @@ COMPAT_SYSCALL_WRAP3(getpeername, int, fd, struct sockaddr 
__user *, usockaddr,
 COMPAT_SYSCALL_WRAP6(sendto, int, fd, void __user *, buff, size_t, len,
 unsigned int, flags, struct sockaddr __user *, addr,
 int, addr_len);
+COMPAT_SYSCALL_WRAP4(pread64, unsigned int, fd, char __user *, buf,
+   size_t, count, loff_t, pos);
+COMPAT_SYSCALL_WRAP4(pwrite64, unsigned int, fd, const char __user *, buf,
+size_t, count, loff_t, pos);
>
> Can you open-code this using a COMPAT_SYSCALL4 definition similar to what
> arch/tile has, but without the merging of the two halves of the argument?
I am lost here. Tile do not use the wrapper, and it do not use the loff_t
either:
COMPAT_SYSCALL_DEFINE6(pread64, 

Re: [PATCH 20/25] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-05-11 Thread Zhangjian (Bamvor)

Hi, Arnd

On 2016/5/11 22:50, Arnd Bergmann wrote:

On Wednesday 11 May 2016 19:16:44 Zhangjian wrote:

Hi,

On 2016/5/11 18:12, Zhangjian (Bamvor) wrote:

Hi, Arnd

On 2016/5/11 16:09, Arnd Bergmann wrote:
  > On Wednesday 11 May 2016 10:04:16 Zhangjian wrote:
  >>> I don't remember. It's probably not important whether we have the shift
  >>> in there, as long as it's independent of the actual kernel page size and
  >>> user space and kernel agree on the calling conventions.
  >> Well. I am ok with where to shift the pages size because we get the same
  >> result. I was just thinking if we should get rid of the name of mmap2 in 
our
  >> ILP32 porting. Actually, it is mmap but we name it as mmap2. User may 
confused
  >> if they do not know the implementations.
  >
  > That is a good point: If the implementation matches the mmap() behavior 
rather than
  > mmap2(), we should rename the macro by doing
  >
  > #undef __NR_mmap2
  > #define __NR_mmap 222
  >
  > in the uapi/asm/unistd.h file for ilp32 mode.
Do you mean define the following things in kernel:
```
diff --git a/arch/arm64/include/uapi/asm/unistd.h 
b/arch/arm64/include/uapi/asm/unistd.h
index 1caadc2..3f79640 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -14,3 +14,9 @@
* along with this program.  If not, see .
*/
   #include 
+
+#ifdef __ILP32__
+#undef __NR_mmap2
+#define __NR_mmap 222
+#endif /* #ifdef __ILP32__ */
+
```
Then glibc could call mmap instead of mmap2.
I could not try it now. Because after change off_t to 64bit in glibc, stat
is fail. I may need to revert the stat relative patch.

After revert stat relative patch in glibc, mmap01-mmap14 success. But mmap16
success with segfault. I will investigate it later.

There is pointer and size_t in mmap, so, IIUC, we need to clear the top halves
of register by using COMPAT_SYSCALL_WRAP6.


Correct, good catch!


And after check the function in
arch/s390/kernel/compat_linux.c, I feel that we need to do the same thing for
pread64 and pwrite64.




But I got following error when I try to add
COMPAT_SYSCALL_WRAP4(pread64, unsigned int, fd, char __user *, buf,
size_t, count, loff_t, pos);
COMPAT_SYSCALL_WRAP4(pwrite64, unsigned int, fd, const char __user *, buf,
size_t, count, loff_t, pos);



Hmm, that is indeed tricky. I think COMPAT_SYSCALL_WRAP4 rightfully
refuses the loff_t argument here, as the common case is that this is
not possible.

It works if I apply the following patch, I defined the wrong `__TYPE_IS_xxx`
yesterday. Should we merge this into ILP32 series or send the compat.h
and syscalls.h individually? The current series of ILP32 is a little bit
long and hard to review.
diff --git a/include/linux/compat.h b/include/linux/compat.h
index ba6ebe0..22a9565 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -747,7 +747,8 @@ asmlinkage long compat_sys_fanotify_mark(int, unsigned int, 
__u32, __u32,
 #ifndef __SC_COMPAT_CAST
 #define __SC_COMPAT_CAST(t, a) ({  \
BUILD_BUG_ON((sizeof(t) > 4) && !__TYPE_IS_L(t) &&  \
-!__TYPE_IS_UL(t) && !__TYPE_IS_PTR(t));\
+!__TYPE_IS_UL(t) && !__TYPE_IS_PTR(t) &&   \
+!__TYPE_IS_LOFFT(t));  \
((t) ((t)(-1) < 0 ? (s64)(s32)(a) : (u64)(u32)(a)));\
 })
 #endif
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 6e57d9c..66eb85d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -47,6 +47,7 @@
 #define __TYPE_IS_L(t) (__same_type((t)0, 0L))
 #define __TYPE_IS_UL(t)(__same_type((t)0, 0UL))
 #define __TYPE_IS_LL(t) (__same_type((t)0, 0LL) || __same_type((t)0, 0ULL))
+#define __TYPE_IS_LOFFT(t) (__same_type((t)0, (loff_t)0))
 #define __SC_LONG(t, a) __typeof(__builtin_choose_expr(__TYPE_IS_LL(t), 0LL, 
0L)) a
 #define __SC_CAST(t, a)(t) a
 #define __SC_ARGS(t, a)a
diff --git a/kernel/compat_wrapper.c b/kernel/compat_wrapper.c
index 98b68b8..28f02d0 100644
--- a/kernel/compat_wrapper.c
+++ b/kernel/compat_wrapper.c
@@ -304,3 +304,7 @@ COMPAT_SYSCALL_WRAP3(getpeername, int, fd, struct sockaddr 
__user *, usockaddr,
 COMPAT_SYSCALL_WRAP6(sendto, int, fd, void __user *, buff, size_t, len,
 unsigned int, flags, struct sockaddr __user *, addr,
 int, addr_len);
+COMPAT_SYSCALL_WRAP4(pread64, unsigned int, fd, char __user *, buf,
+   size_t, count, loff_t, pos);
+COMPAT_SYSCALL_WRAP4(pwrite64, unsigned int, fd, const char __user *, buf,
+size_t, count, loff_t, pos);
>
> Can you open-code this using a COMPAT_SYSCALL4 definition similar to what
> arch/tile has, but without the merging of the two halves of the argument?
I am lost here. Tile do not use the wrapper, and it do not use the loff_t
either:
COMPAT_SYSCALL_DEFINE6(pread64, 

Re: [linux-sunxi] Re: [PATCH v4 01/11] clk: sunxi: Add display and TCON0 clocks driver

2016-05-11 Thread Priit Laes
On Wed, 2016-05-11 at 15:15 -0700, Stephen Boyd wrote:
> On 05/10, Priit Laes wrote:
> > 
> > On Mon, 2016-05-09 at 15:39 -0700, Stephen Boyd wrote:
> > > 
> > > On 05/09, Stephen Boyd wrote:
> > > > 
> > > > 
> > > > 
> > > > Ok I applied this one to clk-next.
> > > > 
> > > And I squashed this in to silence the following checker warning.
> > > 
> > > drivers/clk/sunxi/clk-sun4i-display.c:110:33: warning: Variable
> > > length array is used.
> > > 
> > > ---8<---
> > > diff --git a/drivers/clk/sunxi/clk-sun4i-display.c
> > > b/drivers/clk/sunxi/clk-sun4i-display.c
> > > index f02e250e64ed..f8ff6c4a5633 100644
> > > --- a/drivers/clk/sunxi/clk-sun4i-display.c
> > > +++ b/drivers/clk/sunxi/clk-sun4i-display.c
> > > @@ -107,7 +107,7 @@ static int
> > > sun4i_a10_display_reset_xlate(struct
> > > reset_controller_dev *rcdev,
> > >  static void __init sun4i_a10_display_init(struct device_node
> > > *node,
> > >     const struct
> > > sun4i_a10_display_clk_data *data)
> > >  {
> > > - const char *parents[data->parents];
> > > + const char *parents[4];
> > This change breaks at least de_[bf]e clocks which have 3 clock
> > parents.

> I just used the largest data->parents number, which was 4. How
> does that break anything?

If you look at the sun4i_a10_display_init, it contains this block:

    ret = of_clk_parent_fill(node, parents, ARRAY_SIZE(parents));
if (ret != ARRAY_SIZE(parents)) {
pr_err("%s: Could not retrieve the parents\n", clk_name);
goto unmap;
}

of_clk_parent_fill returns 3 for de_be/de_fe nodes, and
ARRAY_SIZE(parents) is 4.

Päikest,
Priit :)


Re: [linux-sunxi] Re: [PATCH v4 01/11] clk: sunxi: Add display and TCON0 clocks driver

2016-05-11 Thread Priit Laes
On Wed, 2016-05-11 at 15:15 -0700, Stephen Boyd wrote:
> On 05/10, Priit Laes wrote:
> > 
> > On Mon, 2016-05-09 at 15:39 -0700, Stephen Boyd wrote:
> > > 
> > > On 05/09, Stephen Boyd wrote:
> > > > 
> > > > 
> > > > 
> > > > Ok I applied this one to clk-next.
> > > > 
> > > And I squashed this in to silence the following checker warning.
> > > 
> > > drivers/clk/sunxi/clk-sun4i-display.c:110:33: warning: Variable
> > > length array is used.
> > > 
> > > ---8<---
> > > diff --git a/drivers/clk/sunxi/clk-sun4i-display.c
> > > b/drivers/clk/sunxi/clk-sun4i-display.c
> > > index f02e250e64ed..f8ff6c4a5633 100644
> > > --- a/drivers/clk/sunxi/clk-sun4i-display.c
> > > +++ b/drivers/clk/sunxi/clk-sun4i-display.c
> > > @@ -107,7 +107,7 @@ static int
> > > sun4i_a10_display_reset_xlate(struct
> > > reset_controller_dev *rcdev,
> > >  static void __init sun4i_a10_display_init(struct device_node
> > > *node,
> > >     const struct
> > > sun4i_a10_display_clk_data *data)
> > >  {
> > > - const char *parents[data->parents];
> > > + const char *parents[4];
> > This change breaks at least de_[bf]e clocks which have 3 clock
> > parents.

> I just used the largest data->parents number, which was 4. How
> does that break anything?

If you look at the sun4i_a10_display_init, it contains this block:

    ret = of_clk_parent_fill(node, parents, ARRAY_SIZE(parents));
if (ret != ARRAY_SIZE(parents)) {
pr_err("%s: Could not retrieve the parents\n", clk_name);
goto unmap;
}

of_clk_parent_fill returns 3 for de_be/de_fe nodes, and
ARRAY_SIZE(parents) is 4.

Päikest,
Priit :)


Re: [PATCH] sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()

2016-05-11 Thread Xunlei Pang
On 2016/05/11 at 14:49, Peter Zijlstra wrote:
> On Tue, May 10, 2016 at 11:19:44AM -0700, bseg...@google.com wrote:
>> Xunlei Pang  writes:
>>
>>> Two minor fixes for cfs_rq_clock_task().
>>> 1) If cfs_rq is currently being throttled, we need to subtract the cfs
>>>throttled clock time.
>>>
>>> 2) Make "throttled_clock_task_time" update SMP unrelated. Now UP cases
>>>need it as well.
>>>
>>> Signed-off-by: Xunlei Pang 
>>> ---
>>>  kernel/sched/fair.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 1708729e..fb80a12 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3655,7 +3655,7 @@ static inline struct cfs_bandwidth 
>>> *tg_cfs_bandwidth(struct task_group *tg)
>>>  static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
>>>  {
>>> if (unlikely(cfs_rq->throttle_count))
>>> -   return cfs_rq->throttled_clock_task;
>>> +   return cfs_rq->throttled_clock_task - 
>>> cfs_rq->throttled_clock_task_time;
>>>  
>>> return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
>>>  }
> The alternative is obviously to do the subtraction in
> tg_throttle_down(), were we set ->throttled_clock_task.

It is possible, but throttled_clock_task is a timestamp, I think doing it here 
is semantically better.

>
>>> @@ -3793,13 +3793,11 @@ static int tg_unthrottle_up(struct task_group *tg, 
>>> void *data)
>>> struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
>>>  
>>> cfs_rq->throttle_count--;
>>> -#ifdef CONFIG_SMP
>>> if (!cfs_rq->throttle_count) {
>>> /* adjust cfs_rq_clock_task() */
>>> cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
>>>  cfs_rq->throttled_clock_task;
>>> }
>>> -#endif
>>>  
>>> return 0;
>>>  }
>> [Cc: p...@google.com]
>>
>> This looks reasonable to me (at least the first part; I'm not
>> certain why the CONFIG_SMP ifdef was put in place).
> 64660c864f46 ("sched: Prevent interactions with throttled entities")
>
> Introduced it, because at that time it was about updating shares, which
> is only present on SMP. Then:
>
> f1b17280efbd ("sched: Maintain runnable averages across throttled periods")
>
> Added the clock thing inside it, and:
>
> 82958366cfea ("sched: Replace update_shares weight distribution with 
> per-entity computation")
>
> took out the shares update and left the clock update, resulting in the
> current code.
>
>

Thanks,
Xunlei



Re: [PATCH] sched/fair: Fix the wrong throttled clock time for cfs_rq_clock_task()

2016-05-11 Thread Xunlei Pang
On 2016/05/11 at 14:49, Peter Zijlstra wrote:
> On Tue, May 10, 2016 at 11:19:44AM -0700, bseg...@google.com wrote:
>> Xunlei Pang  writes:
>>
>>> Two minor fixes for cfs_rq_clock_task().
>>> 1) If cfs_rq is currently being throttled, we need to subtract the cfs
>>>throttled clock time.
>>>
>>> 2) Make "throttled_clock_task_time" update SMP unrelated. Now UP cases
>>>need it as well.
>>>
>>> Signed-off-by: Xunlei Pang 
>>> ---
>>>  kernel/sched/fair.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 1708729e..fb80a12 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3655,7 +3655,7 @@ static inline struct cfs_bandwidth 
>>> *tg_cfs_bandwidth(struct task_group *tg)
>>>  static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
>>>  {
>>> if (unlikely(cfs_rq->throttle_count))
>>> -   return cfs_rq->throttled_clock_task;
>>> +   return cfs_rq->throttled_clock_task - 
>>> cfs_rq->throttled_clock_task_time;
>>>  
>>> return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
>>>  }
> The alternative is obviously to do the subtraction in
> tg_throttle_down(), were we set ->throttled_clock_task.

It is possible, but throttled_clock_task is a timestamp, I think doing it here 
is semantically better.

>
>>> @@ -3793,13 +3793,11 @@ static int tg_unthrottle_up(struct task_group *tg, 
>>> void *data)
>>> struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
>>>  
>>> cfs_rq->throttle_count--;
>>> -#ifdef CONFIG_SMP
>>> if (!cfs_rq->throttle_count) {
>>> /* adjust cfs_rq_clock_task() */
>>> cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
>>>  cfs_rq->throttled_clock_task;
>>> }
>>> -#endif
>>>  
>>> return 0;
>>>  }
>> [Cc: p...@google.com]
>>
>> This looks reasonable to me (at least the first part; I'm not
>> certain why the CONFIG_SMP ifdef was put in place).
> 64660c864f46 ("sched: Prevent interactions with throttled entities")
>
> Introduced it, because at that time it was about updating shares, which
> is only present on SMP. Then:
>
> f1b17280efbd ("sched: Maintain runnable averages across throttled periods")
>
> Added the clock thing inside it, and:
>
> 82958366cfea ("sched: Replace update_shares weight distribution with 
> per-entity computation")
>
> took out the shares update and left the clock update, resulting in the
> current code.
>
>

Thanks,
Xunlei



Re: [PATCH v2] mmc: dw_mmc: rockchip: Set the drive phase properly

2016-05-11 Thread Shawn Lin

On 2016/5/12 5:40, Douglas Anderson wrote:

Historically for Rockchip devices we've relied on the power-on
default (or perhaps the firmware setting) to get the correct drive
phase for dw_mmc devices.  This worked OK for the most part, but:

* Relying on the setting just "being right" is a bit fragile.

* As soon as there is an instance where the power on default is wrong or
  where the firmware didn't configure this properly then we'll get a
  mysterious failure.

Let's explicitly set this phase in the kernel.

The comments inside this patch try to explain the situation quite
throughly, but the high level overview of this is:

Before this patch on rk3288 devices tested:
* eMMC: 180 degrees
* SDMMC/SDIO0/SDIO1: 90 degrees

After this patch:
* Use 90 degree phase offset usually.
* Use 180 degree phase offset for MMC_DDR52, SDR104, HS200.


Thanks for doing this.

Reviewed-by: Shawn Lin



That means we are _changing_ behavior for those devices in this way:

* If we have HS200 eMMC or DDR52 eMMC, we'll run ID mode at 90
  degrees (vs 180) but otherwise have no change.

* For any non-HS200 / non-DDR52 eMMC devices we'll now _always_ run at
  90 degrees (vs 180).  It seems fairly unlikely that building modern
  hardware is using an eMMC that isn't using DDR52 or HS200, of course.

* For SDR104 cards we'll now run with 180 degree phase offset (vs 90).
  It's expected that 90 degree phase offset would have worked OK, but
  this gives us extra margin.

I have tested this by inserting my collection of uSD cards (mostly UHS,
though a few not) into a veyron_minnie and confirmed that they still
seem to enumerate properly.  For a subset of them I tried putting a
filesystem on them and also tried running mmc_test.

Signed-off-by: Douglas Anderson 
---
Changes in v2:
- Now use 90 degrees for some modes; updated comments to say why.

 drivers/mmc/host/dw_mmc-rockchip.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc-rockchip.c 
b/drivers/mmc/host/dw_mmc-rockchip.c
index 8c20b81cafd8..8068fa887db8 100644
--- a/drivers/mmc/host/dw_mmc-rockchip.c
+++ b/drivers/mmc/host/dw_mmc-rockchip.c
@@ -66,6 +66,70 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, 
struct mmc_ios *ios)
/* Make sure we use phases which we can enumerate with */
if (!IS_ERR(priv->sample_clk))
clk_set_phase(priv->sample_clk, priv->default_sample_phase);
+
+   /*
+* Set the drive phase offset based on speed mode to achieve hold times.
+*
+* That this is _not_ a value that is dynamically tuned and is also
+* _not_ a value that will vary from board to board.  It is a value
+* that could vary between different SoC models if they had massively
+* different output clock delays inside their dw_mmc IP block (delay_o),
+* but since it's OK to overshoot a little we don't need to do complex
+* calculations and can pick values that will just work for everyone.
+*
+* When picking values we'll stick with picking 0/90/180/270 since
+* those can be made very accurately on all known Rockchip SoCs.
+*
+* Note that these values match values from the DesignWare Databook
+* tables for the most part except for SDR12 and "ID mode".  For those
+* two modes the databook calculations assume a clock in of 50MHz.  As
+* seen above, we always use a clock in rate that is exactly the
+* card's input clock (times RK3288_CLKGEN_DIV, but that gets divided
+* back out before the controller sees it).
+*
+* From measurement of a single device, it appears that delay_o is
+* about .5 ns.  Since we try to leave a bit of margin, it's expected
+* that numbers here will be fine even with much larger delay_o
+* (the 1.4 ns assumed by the DesignWare Databook would result in the
+* same results, for instance).
+*/
+   if (!IS_ERR(priv->drv_clk)) {
+   int phase;
+
+   /*
+* In almost all cases a 90 degree phase offset will provide
+* sufficient hold times across all valid input clock rates
+* assuming delay_o is not absurd for a given SoC.  We'll use
+* that as a default.
+*/
+   phase = 90;
+
+   switch (ios->timing) {
+   case MMC_TIMING_MMC_DDR52:
+   /*
+* Since clock in rate with MMC_DDR52 is doubled when
+* bus width is 8 we need to double the phase offset
+* to get the same timings.
+*/
+   if (ios->bus_width == MMC_BUS_WIDTH_8)
+   phase = 180;
+   break;
+   case MMC_TIMING_UHS_SDR104:
+   

Re: [PATCH v2] mmc: dw_mmc: rockchip: Set the drive phase properly

2016-05-11 Thread Shawn Lin

On 2016/5/12 5:40, Douglas Anderson wrote:

Historically for Rockchip devices we've relied on the power-on
default (or perhaps the firmware setting) to get the correct drive
phase for dw_mmc devices.  This worked OK for the most part, but:

* Relying on the setting just "being right" is a bit fragile.

* As soon as there is an instance where the power on default is wrong or
  where the firmware didn't configure this properly then we'll get a
  mysterious failure.

Let's explicitly set this phase in the kernel.

The comments inside this patch try to explain the situation quite
throughly, but the high level overview of this is:

Before this patch on rk3288 devices tested:
* eMMC: 180 degrees
* SDMMC/SDIO0/SDIO1: 90 degrees

After this patch:
* Use 90 degree phase offset usually.
* Use 180 degree phase offset for MMC_DDR52, SDR104, HS200.


Thanks for doing this.

Reviewed-by: Shawn Lin



That means we are _changing_ behavior for those devices in this way:

* If we have HS200 eMMC or DDR52 eMMC, we'll run ID mode at 90
  degrees (vs 180) but otherwise have no change.

* For any non-HS200 / non-DDR52 eMMC devices we'll now _always_ run at
  90 degrees (vs 180).  It seems fairly unlikely that building modern
  hardware is using an eMMC that isn't using DDR52 or HS200, of course.

* For SDR104 cards we'll now run with 180 degree phase offset (vs 90).
  It's expected that 90 degree phase offset would have worked OK, but
  this gives us extra margin.

I have tested this by inserting my collection of uSD cards (mostly UHS,
though a few not) into a veyron_minnie and confirmed that they still
seem to enumerate properly.  For a subset of them I tried putting a
filesystem on them and also tried running mmc_test.

Signed-off-by: Douglas Anderson 
---
Changes in v2:
- Now use 90 degrees for some modes; updated comments to say why.

 drivers/mmc/host/dw_mmc-rockchip.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/mmc/host/dw_mmc-rockchip.c 
b/drivers/mmc/host/dw_mmc-rockchip.c
index 8c20b81cafd8..8068fa887db8 100644
--- a/drivers/mmc/host/dw_mmc-rockchip.c
+++ b/drivers/mmc/host/dw_mmc-rockchip.c
@@ -66,6 +66,70 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, 
struct mmc_ios *ios)
/* Make sure we use phases which we can enumerate with */
if (!IS_ERR(priv->sample_clk))
clk_set_phase(priv->sample_clk, priv->default_sample_phase);
+
+   /*
+* Set the drive phase offset based on speed mode to achieve hold times.
+*
+* That this is _not_ a value that is dynamically tuned and is also
+* _not_ a value that will vary from board to board.  It is a value
+* that could vary between different SoC models if they had massively
+* different output clock delays inside their dw_mmc IP block (delay_o),
+* but since it's OK to overshoot a little we don't need to do complex
+* calculations and can pick values that will just work for everyone.
+*
+* When picking values we'll stick with picking 0/90/180/270 since
+* those can be made very accurately on all known Rockchip SoCs.
+*
+* Note that these values match values from the DesignWare Databook
+* tables for the most part except for SDR12 and "ID mode".  For those
+* two modes the databook calculations assume a clock in of 50MHz.  As
+* seen above, we always use a clock in rate that is exactly the
+* card's input clock (times RK3288_CLKGEN_DIV, but that gets divided
+* back out before the controller sees it).
+*
+* From measurement of a single device, it appears that delay_o is
+* about .5 ns.  Since we try to leave a bit of margin, it's expected
+* that numbers here will be fine even with much larger delay_o
+* (the 1.4 ns assumed by the DesignWare Databook would result in the
+* same results, for instance).
+*/
+   if (!IS_ERR(priv->drv_clk)) {
+   int phase;
+
+   /*
+* In almost all cases a 90 degree phase offset will provide
+* sufficient hold times across all valid input clock rates
+* assuming delay_o is not absurd for a given SoC.  We'll use
+* that as a default.
+*/
+   phase = 90;
+
+   switch (ios->timing) {
+   case MMC_TIMING_MMC_DDR52:
+   /*
+* Since clock in rate with MMC_DDR52 is doubled when
+* bus width is 8 we need to double the phase offset
+* to get the same timings.
+*/
+   if (ios->bus_width == MMC_BUS_WIDTH_8)
+   phase = 180;
+   break;
+   case MMC_TIMING_UHS_SDR104:
+   case MMC_TIMING_MMC_HS200:
+   

Re: [PATCH v2 0/3] input: rmi4: Regulator supply support

2016-05-11 Thread Bjorn Andersson
On Wed 11 May 16:30 PDT 2016, Andrew Duggan wrote:

> Hi Bjorn,
> 
> On 05/10/2016 08:49 AM, Bjorn Andersson wrote:
[..]
> >So either we duplicate the regulator support in spi/i2c or we make them
> >optional in the core driver. Sounds like you prefer the prior, i.e. v1
> >of my patch.
> 
> Yes, after all this I think it makes sense to put regulator support in the
> spi/i2c transports like in your v1 patch. I essentially duplicated the irq
> handling code in both transports so I would be ok with duplicating regulator
> support too. It doesn't seem like that much code. But, if this is too much
> duplication we could create some sort of common file and put the common irq
> and regulator support functions which could be called in the transports.
> Similar to how rmi_2d_sensor.c defines some common functions shared between
> rmi_f11 and rmi_f12.
> 

Sounds reasonable, I'm okay with this. Did you have any comments on the
implementation I had in v1?

@Dmitry, do you want me to resend v1?

Regards,
Bjorn


Re: [PATCH v2 0/3] input: rmi4: Regulator supply support

2016-05-11 Thread Bjorn Andersson
On Wed 11 May 16:30 PDT 2016, Andrew Duggan wrote:

> Hi Bjorn,
> 
> On 05/10/2016 08:49 AM, Bjorn Andersson wrote:
[..]
> >So either we duplicate the regulator support in spi/i2c or we make them
> >optional in the core driver. Sounds like you prefer the prior, i.e. v1
> >of my patch.
> 
> Yes, after all this I think it makes sense to put regulator support in the
> spi/i2c transports like in your v1 patch. I essentially duplicated the irq
> handling code in both transports so I would be ok with duplicating regulator
> support too. It doesn't seem like that much code. But, if this is too much
> duplication we could create some sort of common file and put the common irq
> and regulator support functions which could be called in the transports.
> Similar to how rmi_2d_sensor.c defines some common functions shared between
> rmi_f11 and rmi_f12.
> 

Sounds reasonable, I'm okay with this. Did you have any comments on the
implementation I had in v1?

@Dmitry, do you want me to resend v1?

Regards,
Bjorn


Re: [PATCH 2/2] dt-bindings: rockchip-dw-mshc: add rockchip,default-drv-phase

2016-05-11 Thread Shawn Lin

在 2016/5/12 5:42, Doug Anderson 写道:

Shawn,

On Wed, May 11, 2016 at 1:25 AM, Shawn Lin  wrote:

On 2016/5/11 11:50, Doug Anderson wrote:


Hi,

On Tue, May 10, 2016 at 7:50 PM, Shawn Lin 
wrote:


maybe. But I think 180(downside) is the better.




NAK my previous comments here. Downside is better for SRD, but won't
work for DDR mode. When running in DDR mode, we should use 90 instead.

So let me elaborate a bit more here.
For DDR mode, one single clk cycle should sending two data bits outside
to the devices. We need a hold time for both. If 180 is used, the first
bit occurs around the downside area, which won't be sampled by devices
on the upside.  So on the upside, the devices will see a zero bit if you
actually send a one-bit, which makes the devices  generate CRC finally.


For this above, 180 for all SDR mode is ok, but 90 should be deployed
for DDR mode. So simply checking the timing to hardcode it should be
fine.



OK, I sent out a patch for 180 always.  I can send v2 to use 90 for
DDR modes tomorrow.  ...or feel free to post that yourself if you
want.

We want 90 for all DDR modes?  So MMC_TIMING_UHS_DDR50,
MMC_TIMING_MMC_DDR52, MMC_TIMING_MMC_HS400? (not that we support HS400
in dw_mmc on Rockchip).



Right.


OK, so I dug into this more.  I don't think it's actually quite that
simple.  The Designware databook explicitly states that 'MMC DDR
8-bit' should use 180, not 90.  ...but that's probably explained by
the fact that "cclk_in" for MMC-DDR 8-bit is double what you'd expect.
You can see that in code:

if (ios->bus_width == MMC_BUS_WIDTH_8 &&
ios->timing == MMC_TIMING_MMC_DDR52)
  cclkin = 2 * ios->clock * RK3288_CLKGEN_DIV;


Yes, this requirment is from dw_mmc databook(v270a), table 10-5.

But you can see that we didn't handle MMC_TIMING_UHS_SDR12.
As recommended, MMC_TIMING_UHS_SDR12 mode need to 50M here to make
the internal devider to be "1".
Because Figure 10-3 (Clock Structure for Host Controller) assumes Soc 
only supplies 50M/100M/200M, these three frequency selection, to 
cclk_in. So SDR12 should be 50M/2 = 25M. But for Rockchip Socs, we have 
RK3288_CLKGEN_DIV before cclk_in that means we don't follow this table.


For DDR52-8bit, it's a mandatory requirment to use 100M as cclk_in.



---

tl;dr of all the below:

* For SD: use 180 for SDR104 to get margin, use 90 elsewhere.
* For mmc, use 180 for DDR52 to meet timings, use 180 for HS200 to get
margin, use 90 elsewhere


yes, see the comments blow.




The nice thing about this is, at least for rk3288, things don't change
much from today where we use 90 degrees for most SD cards.

---

For new patch, see: 

---

Details:

So assuming measurements I saw from an EE are correct, measured
"delay_o_ns" is ~ .5 ns for rk3399.  With that, we can do some math.


Soc specific. But it's a trivial margin.


Please correct any mistakes.  Math is hard.  Note that on all Rockchip
devices I've seen pins are limited to 150 MHz and when we choose 150
MHz we end up at 148.5 MHz.  Calculations aren't too different if we
use 150 for that case.


MMC_TIMING_MMC_DDR52:
  min hold time = 2.5 ns
  min data delay = 2.5 ns + .5 ns = 3 ns
  with 100 MHz clock, 90 degree: 2.5 ns
  with 100 MHz clock, 180 degree: 5.0 ns

  ...need 180 degree



When running in DDR52-8bit, 2 cycle of cclk_in is used
for a single data bit internally. When shifting cclk_in_drv to 180,
we can see:

cclk_in        ___
__|||||||||

cclk_in_drv-180
        ___
 __|||||||||

cclk_in_for_DDR52-8bit
   _   _
__| |_|



So cclk_in_drv shifts 180 from cclk_in, which means it's just in the
90 degrees of cclk_in_for_DDR52-8bit. That is what I'd expect that when
running in DDR mode, the cclk_in_drv should be in the 90 degree of
*Actual Internal CLK*, not cclk_in always.

But fot SD DDR50-4bit, it just use cclk_in internlly, so 90 should
be okay.

So we were not in the same page of what the ref-clk is. But I think now
we are in.

Agree on all the calculation below.


--

SD DDR50:
  min hold time = .8 ns
  min data delay = .8 ns + .5 ns = 1.3 ns
  with 50 MHz clock, 90 degree: 5 ns
  with 50 MHz clock, 180 degree: 10 ns

  ...90 degree is good and 180 is massive overkill.  Use 90.

--

SD SDR104 (limited to 148.5 MHz on existing Rockchip parts):
  min hold time = .8 ns
  min data delay = .8 ns + .5 ns = 1.3 ns
  with 148.5 MHz clock, 90 degree: 1.68 ns
  with 148.5 MHz clock, 180 degree: 3.37 ns

  ...so 90 degrees would probably work OK, but 180 degrees gives more margin.
  ...probably explains why existing rk3288 devices w/ 90 degree work OK.

  if we had 200 MHz clock, 90 degree: 1.25 ns
  if we had 200 MHz clock, 180 degree: 2.50 ns

--

SD SDR50:
  min hold time = .8 

Re: [PATCH 2/2] dt-bindings: rockchip-dw-mshc: add rockchip,default-drv-phase

2016-05-11 Thread Shawn Lin

在 2016/5/12 5:42, Doug Anderson 写道:

Shawn,

On Wed, May 11, 2016 at 1:25 AM, Shawn Lin  wrote:

On 2016/5/11 11:50, Doug Anderson wrote:


Hi,

On Tue, May 10, 2016 at 7:50 PM, Shawn Lin 
wrote:


maybe. But I think 180(downside) is the better.




NAK my previous comments here. Downside is better for SRD, but won't
work for DDR mode. When running in DDR mode, we should use 90 instead.

So let me elaborate a bit more here.
For DDR mode, one single clk cycle should sending two data bits outside
to the devices. We need a hold time for both. If 180 is used, the first
bit occurs around the downside area, which won't be sampled by devices
on the upside.  So on the upside, the devices will see a zero bit if you
actually send a one-bit, which makes the devices  generate CRC finally.


For this above, 180 for all SDR mode is ok, but 90 should be deployed
for DDR mode. So simply checking the timing to hardcode it should be
fine.



OK, I sent out a patch for 180 always.  I can send v2 to use 90 for
DDR modes tomorrow.  ...or feel free to post that yourself if you
want.

We want 90 for all DDR modes?  So MMC_TIMING_UHS_DDR50,
MMC_TIMING_MMC_DDR52, MMC_TIMING_MMC_HS400? (not that we support HS400
in dw_mmc on Rockchip).



Right.


OK, so I dug into this more.  I don't think it's actually quite that
simple.  The Designware databook explicitly states that 'MMC DDR
8-bit' should use 180, not 90.  ...but that's probably explained by
the fact that "cclk_in" for MMC-DDR 8-bit is double what you'd expect.
You can see that in code:

if (ios->bus_width == MMC_BUS_WIDTH_8 &&
ios->timing == MMC_TIMING_MMC_DDR52)
  cclkin = 2 * ios->clock * RK3288_CLKGEN_DIV;


Yes, this requirment is from dw_mmc databook(v270a), table 10-5.

But you can see that we didn't handle MMC_TIMING_UHS_SDR12.
As recommended, MMC_TIMING_UHS_SDR12 mode need to 50M here to make
the internal devider to be "1".
Because Figure 10-3 (Clock Structure for Host Controller) assumes Soc 
only supplies 50M/100M/200M, these three frequency selection, to 
cclk_in. So SDR12 should be 50M/2 = 25M. But for Rockchip Socs, we have 
RK3288_CLKGEN_DIV before cclk_in that means we don't follow this table.


For DDR52-8bit, it's a mandatory requirment to use 100M as cclk_in.



---

tl;dr of all the below:

* For SD: use 180 for SDR104 to get margin, use 90 elsewhere.
* For mmc, use 180 for DDR52 to meet timings, use 180 for HS200 to get
margin, use 90 elsewhere


yes, see the comments blow.




The nice thing about this is, at least for rk3288, things don't change
much from today where we use 90 degrees for most SD cards.

---

For new patch, see: 

---

Details:

So assuming measurements I saw from an EE are correct, measured
"delay_o_ns" is ~ .5 ns for rk3399.  With that, we can do some math.


Soc specific. But it's a trivial margin.


Please correct any mistakes.  Math is hard.  Note that on all Rockchip
devices I've seen pins are limited to 150 MHz and when we choose 150
MHz we end up at 148.5 MHz.  Calculations aren't too different if we
use 150 for that case.


MMC_TIMING_MMC_DDR52:
  min hold time = 2.5 ns
  min data delay = 2.5 ns + .5 ns = 3 ns
  with 100 MHz clock, 90 degree: 2.5 ns
  with 100 MHz clock, 180 degree: 5.0 ns

  ...need 180 degree



When running in DDR52-8bit, 2 cycle of cclk_in is used
for a single data bit internally. When shifting cclk_in_drv to 180,
we can see:

cclk_in        ___
__|||||||||

cclk_in_drv-180
        ___
 __|||||||||

cclk_in_for_DDR52-8bit
   _   _
__| |_|



So cclk_in_drv shifts 180 from cclk_in, which means it's just in the
90 degrees of cclk_in_for_DDR52-8bit. That is what I'd expect that when
running in DDR mode, the cclk_in_drv should be in the 90 degree of
*Actual Internal CLK*, not cclk_in always.

But fot SD DDR50-4bit, it just use cclk_in internlly, so 90 should
be okay.

So we were not in the same page of what the ref-clk is. But I think now
we are in.

Agree on all the calculation below.


--

SD DDR50:
  min hold time = .8 ns
  min data delay = .8 ns + .5 ns = 1.3 ns
  with 50 MHz clock, 90 degree: 5 ns
  with 50 MHz clock, 180 degree: 10 ns

  ...90 degree is good and 180 is massive overkill.  Use 90.

--

SD SDR104 (limited to 148.5 MHz on existing Rockchip parts):
  min hold time = .8 ns
  min data delay = .8 ns + .5 ns = 1.3 ns
  with 148.5 MHz clock, 90 degree: 1.68 ns
  with 148.5 MHz clock, 180 degree: 3.37 ns

  ...so 90 degrees would probably work OK, but 180 degrees gives more margin.
  ...probably explains why existing rk3288 devices w/ 90 degree work OK.

  if we had 200 MHz clock, 90 degree: 1.25 ns
  if we had 200 MHz clock, 180 degree: 2.50 ns

--

SD SDR50:
  min hold time = .8 ns
  min data delay = .8 ns + .5 ns = 1.3 ns
  with 

Re: [PATCH 3/6] mm/page_owner: copy last_migrate_reason in copy_page_owner()

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 05:13:12PM +0200, Vlastimil Babka wrote:
> On 05/03/2016 07:23 AM, js1...@gmail.com wrote:
> >From: Joonsoo Kim 
> >
> >Currently, copy_page_owner() doesn't copy all the owner information.
> >It skips last_migrate_reason because copy_page_owner() is used for
> >migration and it will be properly set soon. But, following patch
> >will use copy_page_owner() and this skip will cause the problem that
> >allocated page has uninitialied last_migrate_reason. To prevent it,
> >this patch also copy last_migrate_reason in copy_page_owner().
> 
> Hmm it's a corner case, but if the "new" page was dumped e.g. due to
> a bug during the migration, is the copied migrate reason from the
> "old" page actually meaningful? I'd say it might be misleading and
> it's simpler to just make sure it's initialized to -1.

Hmm... if it is the case, other fields are also misleading. I think
that we can tolerate this corner case and keeping function semantic as
function name suggests is better practice.

Thanks.



Re: [PATCH 3/6] mm/page_owner: copy last_migrate_reason in copy_page_owner()

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 05:13:12PM +0200, Vlastimil Babka wrote:
> On 05/03/2016 07:23 AM, js1...@gmail.com wrote:
> >From: Joonsoo Kim 
> >
> >Currently, copy_page_owner() doesn't copy all the owner information.
> >It skips last_migrate_reason because copy_page_owner() is used for
> >migration and it will be properly set soon. But, following patch
> >will use copy_page_owner() and this skip will cause the problem that
> >allocated page has uninitialied last_migrate_reason. To prevent it,
> >this patch also copy last_migrate_reason in copy_page_owner().
> 
> Hmm it's a corner case, but if the "new" page was dumped e.g. due to
> a bug during the migration, is the copied migrate reason from the
> "old" page actually meaningful? I'd say it might be misleading and
> it's simpler to just make sure it's initialized to -1.

Hmm... if it is the case, other fields are also misleading. I think
that we can tolerate this corner case and keeping function semantic as
function name suggests is better practice.

Thanks.



Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread Hongzhou Yang
On Wed, 2016-05-11 at 19:09 -0700, Hongzhou Yang wrote:
> On Thu, 2016-05-12 at 09:41 +0800, chunfeng yun wrote:
> > Hi,
> > 
> > On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> > > On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > > > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> > > >  wrote:
> > > > 
> > > > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > > > fixed when set its default mode as usb iddig.
> > > > >
> > > > > Signed-off-by: Chunfeng Yun 
> > > > 
> > > 
> > > Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> > > If you want to set its default mode to iddig, you should set it in dts.
> > > 
> > I set it in DTS, but it didn't work, because when usb driver requested
> > IRQ, pinmux was switched back to default mode set by
> > MTK_EINT_FUNCTION().
> > 
> 
> After confirmed, there are something wrong with data sheet and pinmux
> table, and GPIO16 can only receive interrupt by mode 1. So
> 
> Acked-by: Hongzhou Yang 
> 

Linus,

We find there are some other pins still have the same problem, so please
hold on it. Sorry for so much noise.

Thanks,
Hongzhou



Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread Hongzhou Yang
On Wed, 2016-05-11 at 19:09 -0700, Hongzhou Yang wrote:
> On Thu, 2016-05-12 at 09:41 +0800, chunfeng yun wrote:
> > Hi,
> > 
> > On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> > > On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > > > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> > > >  wrote:
> > > > 
> > > > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > > > fixed when set its default mode as usb iddig.
> > > > >
> > > > > Signed-off-by: Chunfeng Yun 
> > > > 
> > > 
> > > Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> > > If you want to set its default mode to iddig, you should set it in dts.
> > > 
> > I set it in DTS, but it didn't work, because when usb driver requested
> > IRQ, pinmux was switched back to default mode set by
> > MTK_EINT_FUNCTION().
> > 
> 
> After confirmed, there are something wrong with data sheet and pinmux
> table, and GPIO16 can only receive interrupt by mode 1. So
> 
> Acked-by: Hongzhou Yang 
> 

Linus,

We find there are some other pins still have the same problem, so please
hold on it. Sorry for so much noise.

Thanks,
Hongzhou



linux-next: manual merge of the kvm tree with the tip tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  drivers/irqchip/irq-gic.c

between commits:

  dc9722cc57eb ("irqchip/gic: Return an error if GIC initialisation fails")
  f673b9b5cb54 ("irqchip/gic: Store GIC configuration parameters")
  d6490461a102 ("irqchip/gic: Add helper functions for GIC setup and teardown")

from the tip tree and commits:

  bafa9193d00c ("irqchip/gic-v2: Gather ACPI specific data in a single 
structure")
  502d6df11ae3 ("irqchip/gic-v2: Parse and export virtual GIC information")

from the kvm tree.

I fixed it up (hopefully - see below) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/irq-gic.c
index 1de20e14a721,3f1d9fd3a462..
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@@ -1248,30 -1191,29 +1250,53 @@@ static bool gic_check_eoimode(struct de
return true;
  }
  
 +static int gic_of_setup(struct gic_chip_data *gic, struct device_node *node)
 +{
 +  if (!gic || !node)
 +  return -EINVAL;
 +
 +  gic->raw_dist_base = of_iomap(node, 0);
 +  if (WARN(!gic->raw_dist_base, "unable to map gic dist registers\n"))
 +  goto error;
 +
 +  gic->raw_cpu_base = of_iomap(node, 1);
 +  if (WARN(!gic->raw_cpu_base, "unable to map gic cpu registers\n"))
 +  goto error;
 +
 +  if (of_property_read_u32(node, "cpu-offset", >percpu_offset))
 +  gic->percpu_offset = 0;
 +
 +  return 0;
 +
 +error:
 +  gic_teardown(gic);
 +
 +  return -ENOMEM;
 +}
 +
+ static void __init gic_of_setup_kvm_info(struct device_node *node)
+ {
+   int ret;
+   struct resource *vctrl_res = _v2_kvm_info.vctrl;
+   struct resource *vcpu_res = _v2_kvm_info.vcpu;
+ 
+   gic_v2_kvm_info.type = GIC_V2;
+ 
+   gic_v2_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
+   if (!gic_v2_kvm_info.maint_irq)
+   return;
+ 
+   ret = of_address_to_resource(node, 2, vctrl_res);
+   if (ret)
+   return;
+ 
+   ret = of_address_to_resource(node, 3, vcpu_res);
+   if (ret)
+   return;
+ 
+   gic_set_kvm_info(_v2_kvm_info);
+ }
+ 
  int __init
  gic_of_init(struct device_node *node, struct device_node *parent)
  {
@@@ -1294,17 -1235,18 +1319,19 @@@
 * Disable split EOI/Deactivate if either HYP is not available
 * or the CPU interface is too small.
 */
 -  if (gic_cnt == 0 && !gic_check_eoimode(node, _base))
 +  if (gic_cnt == 0 && !gic_check_eoimode(node, >raw_cpu_base))
static_key_slow_dec(_deactivate);
  
 -  if (of_property_read_u32(node, "cpu-offset", _offset))
 -  percpu_offset = 0;
 +  ret = __gic_init_bases(gic, -1, >fwnode);
 +  if (ret) {
 +  gic_teardown(gic);
 +  return ret;
 +  }
  
-   if (!gic_cnt)
 -  __gic_init_bases(gic_cnt, -1, dist_base, cpu_base, percpu_offset,
 -   >fwnode);
+   if (!gic_cnt) {
gic_init_physaddr(node);
+   gic_of_setup_kvm_info(node);
+   }
  
if (parent) {
irq = irq_of_parse_and_map(node, 0);
@@@ -1401,8 -1391,8 +1476,8 @@@ static int __init gic_v2_acpi_init(stru
return -EINVAL;
}
  
-   gic->raw_cpu_base = ioremap(cpu_phy_base, ACPI_GIC_CPU_IF_MEM_SIZE);
 -  cpu_base = ioremap(acpi_data.cpu_phys_base, ACPI_GIC_CPU_IF_MEM_SIZE);
 -  if (!cpu_base) {
++  gic->raw_cpu_base = ioremap(acpi_data.cpu_phys_base, 
ACPI_GIC_CPU_IF_MEM_SIZE);
 +  if (!gic->raw_cpu_base) {
pr_err("Unable to map GICC registers\n");
return -ENOMEM;
}


linux-next: manual merge of the kvm tree with the tip tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  drivers/irqchip/irq-gic.c

between commits:

  dc9722cc57eb ("irqchip/gic: Return an error if GIC initialisation fails")
  f673b9b5cb54 ("irqchip/gic: Store GIC configuration parameters")
  d6490461a102 ("irqchip/gic: Add helper functions for GIC setup and teardown")

from the tip tree and commits:

  bafa9193d00c ("irqchip/gic-v2: Gather ACPI specific data in a single 
structure")
  502d6df11ae3 ("irqchip/gic-v2: Parse and export virtual GIC information")

from the kvm tree.

I fixed it up (hopefully - see below) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/irq-gic.c
index 1de20e14a721,3f1d9fd3a462..
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@@ -1248,30 -1191,29 +1250,53 @@@ static bool gic_check_eoimode(struct de
return true;
  }
  
 +static int gic_of_setup(struct gic_chip_data *gic, struct device_node *node)
 +{
 +  if (!gic || !node)
 +  return -EINVAL;
 +
 +  gic->raw_dist_base = of_iomap(node, 0);
 +  if (WARN(!gic->raw_dist_base, "unable to map gic dist registers\n"))
 +  goto error;
 +
 +  gic->raw_cpu_base = of_iomap(node, 1);
 +  if (WARN(!gic->raw_cpu_base, "unable to map gic cpu registers\n"))
 +  goto error;
 +
 +  if (of_property_read_u32(node, "cpu-offset", >percpu_offset))
 +  gic->percpu_offset = 0;
 +
 +  return 0;
 +
 +error:
 +  gic_teardown(gic);
 +
 +  return -ENOMEM;
 +}
 +
+ static void __init gic_of_setup_kvm_info(struct device_node *node)
+ {
+   int ret;
+   struct resource *vctrl_res = _v2_kvm_info.vctrl;
+   struct resource *vcpu_res = _v2_kvm_info.vcpu;
+ 
+   gic_v2_kvm_info.type = GIC_V2;
+ 
+   gic_v2_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
+   if (!gic_v2_kvm_info.maint_irq)
+   return;
+ 
+   ret = of_address_to_resource(node, 2, vctrl_res);
+   if (ret)
+   return;
+ 
+   ret = of_address_to_resource(node, 3, vcpu_res);
+   if (ret)
+   return;
+ 
+   gic_set_kvm_info(_v2_kvm_info);
+ }
+ 
  int __init
  gic_of_init(struct device_node *node, struct device_node *parent)
  {
@@@ -1294,17 -1235,18 +1319,19 @@@
 * Disable split EOI/Deactivate if either HYP is not available
 * or the CPU interface is too small.
 */
 -  if (gic_cnt == 0 && !gic_check_eoimode(node, _base))
 +  if (gic_cnt == 0 && !gic_check_eoimode(node, >raw_cpu_base))
static_key_slow_dec(_deactivate);
  
 -  if (of_property_read_u32(node, "cpu-offset", _offset))
 -  percpu_offset = 0;
 +  ret = __gic_init_bases(gic, -1, >fwnode);
 +  if (ret) {
 +  gic_teardown(gic);
 +  return ret;
 +  }
  
-   if (!gic_cnt)
 -  __gic_init_bases(gic_cnt, -1, dist_base, cpu_base, percpu_offset,
 -   >fwnode);
+   if (!gic_cnt) {
gic_init_physaddr(node);
+   gic_of_setup_kvm_info(node);
+   }
  
if (parent) {
irq = irq_of_parse_and_map(node, 0);
@@@ -1401,8 -1391,8 +1476,8 @@@ static int __init gic_v2_acpi_init(stru
return -EINVAL;
}
  
-   gic->raw_cpu_base = ioremap(cpu_phy_base, ACPI_GIC_CPU_IF_MEM_SIZE);
 -  cpu_base = ioremap(acpi_data.cpu_phys_base, ACPI_GIC_CPU_IF_MEM_SIZE);
 -  if (!cpu_base) {
++  gic->raw_cpu_base = ioremap(acpi_data.cpu_phys_base, 
ACPI_GIC_CPU_IF_MEM_SIZE);
 +  if (!gic->raw_cpu_base) {
pr_err("Unable to map GICC registers\n");
return -ENOMEM;
}


Re: [PATCH 1/6] mm/compaction: split freepages without holding the zone lock

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 04:56:45PM +0200, Vlastimil Babka wrote:
> On 05/03/2016 07:22 AM, js1...@gmail.com wrote:
> > From: Joonsoo Kim 
> > 
> > We don't need to split freepages with holding the zone lock. It will cause
> > more contention on zone lock so not desirable.
> 
> Fair enough, I just worry about the same thing as Hugh pointed out
> recently [1] in that it increases the amount of tricky stuff in
> compaction.c doing similar but not quite the same stuff as page/alloc.c,
> and which will be forgotten to be updated once somebody updates
> prep_new_page with e.g. a new debugging check. Can you perhaps think of
> a more robust solution here?

I will try it. I think that factoring some part of prep_new_page() out
would be enough to make thing robust.

Thanks.



Re: [PATCH 1/6] mm/compaction: split freepages without holding the zone lock

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 04:56:45PM +0200, Vlastimil Babka wrote:
> On 05/03/2016 07:22 AM, js1...@gmail.com wrote:
> > From: Joonsoo Kim 
> > 
> > We don't need to split freepages with holding the zone lock. It will cause
> > more contention on zone lock so not desirable.
> 
> Fair enough, I just worry about the same thing as Hugh pointed out
> recently [1] in that it increases the amount of tricky stuff in
> compaction.c doing similar but not quite the same stuff as page/alloc.c,
> and which will be forgotten to be updated once somebody updates
> prep_new_page with e.g. a new debugging check. Can you perhaps think of
> a more robust solution here?

I will try it. I think that factoring some part of prep_new_page() out
would be enough to make thing robust.

Thanks.



RE: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR

2016-05-11 Thread Sricharan
Hi,

> -Original Message-
> From: linux-arm-kernel [mailto:linux-arm-kernel-
> boun...@lists.infradead.org] On Behalf Of Abhishek Sahu
> Sent: Wednesday, May 11, 2016 11:04 PM
> To: Sricharan 
> Cc: arch...@codeaurora.org; w...@the-dreams.de; linux-arm-
> m...@vger.kernel.org; ntel...@codeaurora.org; linux-
> ker...@vger.kernel.org; dmaeng...@vger.kernel.org; linux-
> i...@vger.kernel.org; agr...@codeaurora.org; andy.gr...@linaro.org; linux-
> arm-ker...@lists.infradead.org
> Subject: RE: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR
> 
> On 2016-05-11 21:27, Sricharan wrote:
> > Hi,
> >
> >> 1. Current QCOM I2C driver hangs when sending data to address
> >> 0x03-0x07
> >> in some scenarios. The QUP controller generates invalid write in this
> >> case, since these addresses are reserved for different bus formats.
> >>
> >> 2. Also, the error handling is done by I2C QUP ISR in the case of DMA
> >> mode.
> >> The state need to be RESET in case of any error for clearing the
> >> available data in FIFO, which otherwise leaves the BAM DMA controller
> >> in hang state.
> >>
> >> This patch fixes the above two issues by clearing the error bits from
> >> I2C and QUP status in ISR in case of I2C error, QUP error and resets
> >> the QUP state to clear the FIFO data.
> >>
> >> Signed-off-by: Abhishek Sahu 
> >> ---
> >>  drivers/i2c/busses/i2c-qup.c | 14 --
> >>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/i2c/busses/i2c-qup.c
> >> b/drivers/i2c/busses/i2c-qup.c index 23eaabb..8c2f1bc 100644
> >> --- a/drivers/i2c/busses/i2c-qup.c
> >> +++ b/drivers/i2c/busses/i2c-qup.c
> >> @@ -213,14 +213,16 @@ static irqreturn_t qup_i2c_interrupt(int irq,
> >> void
> >> *dev)
> >>bus_err &= I2C_STATUS_ERROR_MASK;
> >>qup_err &= QUP_STATUS_ERROR_FLAGS;
> >>
> >> -  if (qup_err) {
> >> -  /* Clear Error interrupt */
> >> +  /* Clear the error bits in QUP_ERROR_FLAGS */
> >> +  if (qup_err)
> >>writel(qup_err, qup->base + QUP_ERROR_FLAGS);
> >> -  goto done;
> >> -  }
> >>
> >> -  if (bus_err) {
> >> -  /* Clear Error interrupt */
> >> +  /* Clear the error bits in QUP_I2C_STATUS */
> >> +  if (bus_err)
> >> +  writel(bus_err, qup->base + QUP_I2C_STATUS);
> >> +
> >> +  /* Reset the QUP State in case of error */
> >> +  if (qup_err || bus_err) {
> >>writel(QUP_RESET_STATE, qup->base + QUP_STATE);
> >>goto done;
> >>}
> >In qup_i2c_xfer and qup_i2c_xfer_v2 state is set to RESET at
> > the end, when
> >there is no error. So would it be fine if we do it there
> > unconditionally ?
> >
> > Regards,
> >  Sricharan
> 
> RESET the QUP state wouldn't create any issue in the case of multiple
calls.
> The existing code also RESET the QUP state for bus_err but it is not
clearing
> status bits.
So is there a difference that you see by setting it in isr for qup
errors ?
Otherwise, its better to set it in one place unconditionally instead of
doing
it in two places ?

Regards,
 Sricharan



RE: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR

2016-05-11 Thread Sricharan
Hi,

> -Original Message-
> From: linux-arm-kernel [mailto:linux-arm-kernel-
> boun...@lists.infradead.org] On Behalf Of Abhishek Sahu
> Sent: Wednesday, May 11, 2016 11:04 PM
> To: Sricharan 
> Cc: arch...@codeaurora.org; w...@the-dreams.de; linux-arm-
> m...@vger.kernel.org; ntel...@codeaurora.org; linux-
> ker...@vger.kernel.org; dmaeng...@vger.kernel.org; linux-
> i...@vger.kernel.org; agr...@codeaurora.org; andy.gr...@linaro.org; linux-
> arm-ker...@lists.infradead.org
> Subject: RE: [PATCH 1/2] i2c: qup: Cleared the error bits in ISR
> 
> On 2016-05-11 21:27, Sricharan wrote:
> > Hi,
> >
> >> 1. Current QCOM I2C driver hangs when sending data to address
> >> 0x03-0x07
> >> in some scenarios. The QUP controller generates invalid write in this
> >> case, since these addresses are reserved for different bus formats.
> >>
> >> 2. Also, the error handling is done by I2C QUP ISR in the case of DMA
> >> mode.
> >> The state need to be RESET in case of any error for clearing the
> >> available data in FIFO, which otherwise leaves the BAM DMA controller
> >> in hang state.
> >>
> >> This patch fixes the above two issues by clearing the error bits from
> >> I2C and QUP status in ISR in case of I2C error, QUP error and resets
> >> the QUP state to clear the FIFO data.
> >>
> >> Signed-off-by: Abhishek Sahu 
> >> ---
> >>  drivers/i2c/busses/i2c-qup.c | 14 --
> >>  1 file changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/i2c/busses/i2c-qup.c
> >> b/drivers/i2c/busses/i2c-qup.c index 23eaabb..8c2f1bc 100644
> >> --- a/drivers/i2c/busses/i2c-qup.c
> >> +++ b/drivers/i2c/busses/i2c-qup.c
> >> @@ -213,14 +213,16 @@ static irqreturn_t qup_i2c_interrupt(int irq,
> >> void
> >> *dev)
> >>bus_err &= I2C_STATUS_ERROR_MASK;
> >>qup_err &= QUP_STATUS_ERROR_FLAGS;
> >>
> >> -  if (qup_err) {
> >> -  /* Clear Error interrupt */
> >> +  /* Clear the error bits in QUP_ERROR_FLAGS */
> >> +  if (qup_err)
> >>writel(qup_err, qup->base + QUP_ERROR_FLAGS);
> >> -  goto done;
> >> -  }
> >>
> >> -  if (bus_err) {
> >> -  /* Clear Error interrupt */
> >> +  /* Clear the error bits in QUP_I2C_STATUS */
> >> +  if (bus_err)
> >> +  writel(bus_err, qup->base + QUP_I2C_STATUS);
> >> +
> >> +  /* Reset the QUP State in case of error */
> >> +  if (qup_err || bus_err) {
> >>writel(QUP_RESET_STATE, qup->base + QUP_STATE);
> >>goto done;
> >>}
> >In qup_i2c_xfer and qup_i2c_xfer_v2 state is set to RESET at
> > the end, when
> >there is no error. So would it be fine if we do it there
> > unconditionally ?
> >
> > Regards,
> >  Sricharan
> 
> RESET the QUP state wouldn't create any issue in the case of multiple
calls.
> The existing code also RESET the QUP state for bus_err but it is not
clearing
> status bits.
So is there a difference that you see by setting it in isr for qup
errors ?
Otherwise, its better to set it in one place unconditionally instead of
doing
it in two places ?

Regards,
 Sricharan



Re: [PATCH v12 00/10] arm64: Add kernel probes (kprobes) support

2016-05-11 Thread Li Bin


on 2016/5/11 23:33, James Morse wrote:
> Hi David,
> 
> On 27/04/16 19:52, David Long wrote:
>> From: "David A. Long" 
>>
>> This patchset is heavily based on Sandeepa Prabhu's ARM v8 kprobes patches,
>> first seen in October 2013. This version attempts to address concerns raised 
>> by
>> reviewers and also fixes problems discovered during testing.
>>
>> This patchset adds support for kernel probes(kprobes), jump probes(jprobes)
>> and return probes(kretprobes) support for ARM64.
>>
>> The kprobes mechanism makes use of software breakpoint and single stepping
>> support available in the ARM v8 kernel.
> 
> I applied this series on v4.6-rc7, and built the sample kprobes. They work 
> fine,
> unless I throw ftrace into the mix too.
> 
> I enabled the function_graph tracer, then tried to load the jprobe example 
> module:
> -%<-
> root@ubuntu:/sys/kernel/debug/tracing# insmod /root/jprobe_example.ko
> Planted jprobe at ff80080c8f20, handler addr ff8000bb3000
> root@ubuntu:/sys/kernel/debug/tracing# jprobe: clone_flags = 0x1200011, 
> stack_st
> art = 0x0 stack_size = 0x0
> Bad mode in Synchronous Abort handler detected, code 0x8605 -- IABT 
> (current
>  EL)
> CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> Hardware name: ARM Juno development board (r1) (DT)
> task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> PC is at 0x0
> LR is at 0x0
> 
> pc : [<>] lr : [<>] pstate: 6145
> sp : ffc974e4ff00
> x29: 01200011 x28: ffc974e4c000
> x27: ff80088d x26: 00dc
> x25: 0120 x24: 0015
> x23: 6000 x22: 007fa1b40e60
> x21: 007fa1ce70d0 x20: 
> x19:  x18: 0a03
> x17: 007fa1b40d90 x16: ff80080c9708
> x15: 003b9aca x14: 007fddb7e5c0
> x13: 007fa1b40e2c x12: 00d00ff0
> x11: ff8009c4d000 x10: ff800920c000
> x9 : ff8008f5c000 x8 : ffc976c06800
> x7 : 0006daf2 x6 : 0015
> x5 : 0004 x4 : ffc96e8690a0
> x3 : 001ed7cbab74 x2 : ffc96e869000
> x1 :  x0 : 
> 
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
> Modules linked in: jprobe_example
> CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> Hardware name: ARM Juno development board (r1) (DT)
> task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> PC is at 0x0
> LR is at 0x0
> 
> pc : [<>] lr : [<>] pstate: 6145
> sp : ffc974e4ff00
> x29: 01200011 x28: ffc974e4c000
> x27: ff80088d x26: 00dc
> x25: 0120 x24: 0015
> x23: 6000 x22: 007fa1b40e60
> x21: 007fa1ce70d0 x20: 
> x19:  x18: 0a03
> x17: 007fa1b40d90 x16: ff80080c9708
> x15: 003b9aca x14: 007fddb7e5c0
> x13: 007fa1b40e2c x12: 00d00ff0
> x11: ff8009c4d000 x10: ff800920c000
> x9 : ff8008f5c000 x8 : ffc976c06800
> x7 : 0006daf2 x6 : 0015
> x5 : 0004 x4 : ffc96e8690a0
> x3 : 001ed7cbab74 x2 : ffc96e869000
> x1 :  x0 : 
> 
> Process systemd-udevd (pid: 1047, stack limit = 0xffc974e4c020)
> Stack: (0xffc974e4ff00 to 0xffc974e5)
> ff00: 0417 007fa1ce76f0 00dc 0417
> ff20:  007fddb7ecf8 0005 
> ff40: ff01 003b9aca 00555b3868b0 007fa1b40d90
> ff60: 0a03 007fddb7e5c0  007fddb7e5e0
> ff80: 00555b358000 00558f56f0e0  00558f574f00
> ffa0: 00558f574f00 04fa 00558f56f010 007fddb7e600
> ffc0: 007fa1b40e2c 007fddb7e5c0 007fa1b40e60 6000
> ffe0: 01200011 00dc 000484000200 08000200
> Call trace:
> [<  (null)>]   (null)
> Code: bad PC value
> ---[ end trace 35d24aad799c2941 ]---
> -%<-
> 

To solve this, it should pause function tracing before the jprobe handler is 
called
and unpause it before it returns back to the function it probed.

diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
index db2d95c..b21ed00 100644
--- a/arch/arm64/kernel/kprobes.c
+++ b/arch/arm64/kernel/kprobes.c
@@ -714,6 +714,7 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct 
pt_regs *regs)

instruction_pointer_set(regs, (long)jp->entry);
preempt_disable();
+   pause_graph_tracing();
return 1;
 }

@@ -757,6 +758,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, 
struct pt_regs *regs)
show_regs(regs);

Re: [PATCH v12 00/10] arm64: Add kernel probes (kprobes) support

2016-05-11 Thread Li Bin


on 2016/5/11 23:33, James Morse wrote:
> Hi David,
> 
> On 27/04/16 19:52, David Long wrote:
>> From: "David A. Long" 
>>
>> This patchset is heavily based on Sandeepa Prabhu's ARM v8 kprobes patches,
>> first seen in October 2013. This version attempts to address concerns raised 
>> by
>> reviewers and also fixes problems discovered during testing.
>>
>> This patchset adds support for kernel probes(kprobes), jump probes(jprobes)
>> and return probes(kretprobes) support for ARM64.
>>
>> The kprobes mechanism makes use of software breakpoint and single stepping
>> support available in the ARM v8 kernel.
> 
> I applied this series on v4.6-rc7, and built the sample kprobes. They work 
> fine,
> unless I throw ftrace into the mix too.
> 
> I enabled the function_graph tracer, then tried to load the jprobe example 
> module:
> -%<-
> root@ubuntu:/sys/kernel/debug/tracing# insmod /root/jprobe_example.ko
> Planted jprobe at ff80080c8f20, handler addr ff8000bb3000
> root@ubuntu:/sys/kernel/debug/tracing# jprobe: clone_flags = 0x1200011, 
> stack_st
> art = 0x0 stack_size = 0x0
> Bad mode in Synchronous Abort handler detected, code 0x8605 -- IABT 
> (current
>  EL)
> CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> Hardware name: ARM Juno development board (r1) (DT)
> task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> PC is at 0x0
> LR is at 0x0
> 
> pc : [<>] lr : [<>] pstate: 6145
> sp : ffc974e4ff00
> x29: 01200011 x28: ffc974e4c000
> x27: ff80088d x26: 00dc
> x25: 0120 x24: 0015
> x23: 6000 x22: 007fa1b40e60
> x21: 007fa1ce70d0 x20: 
> x19:  x18: 0a03
> x17: 007fa1b40d90 x16: ff80080c9708
> x15: 003b9aca x14: 007fddb7e5c0
> x13: 007fa1b40e2c x12: 00d00ff0
> x11: ff8009c4d000 x10: ff800920c000
> x9 : ff8008f5c000 x8 : ffc976c06800
> x7 : 0006daf2 x6 : 0015
> x5 : 0004 x4 : ffc96e8690a0
> x3 : 001ed7cbab74 x2 : ffc96e869000
> x1 :  x0 : 
> 
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
> Modules linked in: jprobe_example
> CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> Hardware name: ARM Juno development board (r1) (DT)
> task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> PC is at 0x0
> LR is at 0x0
> 
> pc : [<>] lr : [<>] pstate: 6145
> sp : ffc974e4ff00
> x29: 01200011 x28: ffc974e4c000
> x27: ff80088d x26: 00dc
> x25: 0120 x24: 0015
> x23: 6000 x22: 007fa1b40e60
> x21: 007fa1ce70d0 x20: 
> x19:  x18: 0a03
> x17: 007fa1b40d90 x16: ff80080c9708
> x15: 003b9aca x14: 007fddb7e5c0
> x13: 007fa1b40e2c x12: 00d00ff0
> x11: ff8009c4d000 x10: ff800920c000
> x9 : ff8008f5c000 x8 : ffc976c06800
> x7 : 0006daf2 x6 : 0015
> x5 : 0004 x4 : ffc96e8690a0
> x3 : 001ed7cbab74 x2 : ffc96e869000
> x1 :  x0 : 
> 
> Process systemd-udevd (pid: 1047, stack limit = 0xffc974e4c020)
> Stack: (0xffc974e4ff00 to 0xffc974e5)
> ff00: 0417 007fa1ce76f0 00dc 0417
> ff20:  007fddb7ecf8 0005 
> ff40: ff01 003b9aca 00555b3868b0 007fa1b40d90
> ff60: 0a03 007fddb7e5c0  007fddb7e5e0
> ff80: 00555b358000 00558f56f0e0  00558f574f00
> ffa0: 00558f574f00 04fa 00558f56f010 007fddb7e600
> ffc0: 007fa1b40e2c 007fddb7e5c0 007fa1b40e60 6000
> ffe0: 01200011 00dc 000484000200 08000200
> Call trace:
> [<  (null)>]   (null)
> Code: bad PC value
> ---[ end trace 35d24aad799c2941 ]---
> -%<-
> 

To solve this, it should pause function tracing before the jprobe handler is 
called
and unpause it before it returns back to the function it probed.

diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
index db2d95c..b21ed00 100644
--- a/arch/arm64/kernel/kprobes.c
+++ b/arch/arm64/kernel/kprobes.c
@@ -714,6 +714,7 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct 
pt_regs *regs)

instruction_pointer_set(regs, (long)jp->entry);
preempt_disable();
+   pause_graph_tracing();
return 1;
 }

@@ -757,6 +758,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, 
struct pt_regs *regs)
show_regs(regs);
BUG();
}

Re: [PATCH 0.14] oom detection rework v6

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 11:43:48AM +0200, Michal Hocko wrote:
> On Tue 10-05-16 15:41:04, Joonsoo Kim wrote:
> > 2016-05-05 3:16 GMT+09:00 Michal Hocko :
> > > On Wed 04-05-16 23:32:31, Joonsoo Kim wrote:
> > >> 2016-05-04 17:47 GMT+09:00 Michal Hocko :
> [...]
> > >> > progress. What is the usual reason to disable compaction in the first
> > >> > place?
> > >>
> > >> I don't disable it. But, who knows who disable compaction? It's been 
> > >> *not*
> > >> a long time that CONFIG_COMPACTION is default enable. Maybe, 3 years?
> > >
> > > I would really like to hear about real life usecase before we go and
> > > cripple otherwise deterministic algorithms. It might be very well
> > > possible that those configurations simply do not have problems with high
> > > order allocations because they are too specific.
> 
> Sorry for insisting but I would really like to hear some answer for
> this, please.

I don't know. Who knows? How you can make sure that? And, I don't like
below fixup. Theoretically, it could retry forever.

> 
> [...]
> > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > >> > index 2e7e26c5d3ba..f48b9e9b1869 100644
> > >> > --- a/mm/page_alloc.c
> > >> > +++ b/mm/page_alloc.c
> > >> > @@ -3319,6 +3319,24 @@ should_compact_retry(struct alloc_context *ac, 
> > >> > unsigned int order, int alloc_fla
> > >> >  enum migrate_mode *migrate_mode,
> > >> >  int compaction_retries)
> > >> >  {
> > >> > +   struct zone *zone;
> > >> > +   struct zoneref *z;
> > >> > +
> > >> > +   if (order > PAGE_ALLOC_COSTLY_ORDER)
> > >> > +   return false;
> > >> > +
> > >> > +   /*
> > >> > +* There are setups with compaction disabled which would 
> > >> > prefer to loop
> > >> > +* inside the allocator rather than hit the oom killer 
> > >> > prematurely. Let's
> > >> > +* give them a good hope and keep retrying while the order-0 
> > >> > watermarks
> > >> > +* are OK.
> > >> > +*/
> > >> > +   for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, 
> > >> > ac->high_zoneidx,
> > >> > +   ac->nodemask) {
> > >> > +   if(zone_watermark_ok(zone, 0, min_wmark_pages(zone),
> > >> > +   ac->high_zoneidx, alloc_flags))
> > >> > +   return true;
> > >> > +   }
> > >> > return false;
> [...]
> > My benchmark is too specific so I make another one. It does very
> > simple things.
> > 
> > 1) Run the system with 256 MB memory and 2 GB swap
> > 2) Run memory-hogger which takes (anonymous memory) 256 MB
> > 3) Make 1000 new processes by fork (It will take 16 MB order-2 pages)
> > 
> > You can do it yourself with above instructions.
> > 
> > On current upstream kernel without CONFIG_COMPACTION, OOM doesn't happen.
> > On next-20160509 kernel without CONFIG_COMPACTION, OOM happens when
> > roughly *500* processes forked.
> > 
> > With CONFIG_COMPACTION, OOM doesn't happen on any kernel.
> 
> Does the patch I have posted helped?

I guess that it will help but please do it by yourself. It's simple.

> > Other kernels doesn't trigger OOM even if I make 1 new processes.
> 
> Is this an usual load on !CONFIG_COMPACTION configurations?

I don't know. User-space developer doesn't take care about kernel
configuration and it seems that fork 500 times when memory is full is
not a corner case to me.

> > This example is very intuitive and reasonable. I think that it's not
> > artificial.  It has enough swap space so OOM should not happen.
> 
> I am not really convinced this is true actually. You can have an
> arbitrary amount of the swap space yet it still won't help you
> because more reclaimed memory simply doesn't imply a more continuous
> memory. This is a fundamental problem. So I think that relying on
> !CONFIG_COMPACTION for heavy fork (or other high order) loads simply
> never works reliably.

I think that you don't understand how powerful the reclaim and
compaction are. In the system with large disk swap, what compaction can do
is also possible for reclaim. Reclaim can do more.

Think about following examples.

_: free
U: used(unmovable)
M: used(migratable and reclaimable)

_MUU _U_U  

With compaction (assume theoretically best algorithm),
just 3 contiguous region can be made like as following:

MMUU MUMU ___M 

With reclaim, we can make 8 contiguous region.

__UU _U_U  

Reclaim can be easily affected by thrashing but it is fundamentally
more powerful than compaction.

Even, there are not migratable but reclaimable pages and it could weak
power of the compaction.

> > This failure shows that fundamental assumption of your patch is
> > wrong. You triggers OOM even if there is enough reclaimable memory but
> > no high order freepage depending on the fact that we can't guarantee
> > that we can make high order page with reclaiming these 

Re: [PATCH 0.14] oom detection rework v6

2016-05-11 Thread Joonsoo Kim
On Tue, May 10, 2016 at 11:43:48AM +0200, Michal Hocko wrote:
> On Tue 10-05-16 15:41:04, Joonsoo Kim wrote:
> > 2016-05-05 3:16 GMT+09:00 Michal Hocko :
> > > On Wed 04-05-16 23:32:31, Joonsoo Kim wrote:
> > >> 2016-05-04 17:47 GMT+09:00 Michal Hocko :
> [...]
> > >> > progress. What is the usual reason to disable compaction in the first
> > >> > place?
> > >>
> > >> I don't disable it. But, who knows who disable compaction? It's been 
> > >> *not*
> > >> a long time that CONFIG_COMPACTION is default enable. Maybe, 3 years?
> > >
> > > I would really like to hear about real life usecase before we go and
> > > cripple otherwise deterministic algorithms. It might be very well
> > > possible that those configurations simply do not have problems with high
> > > order allocations because they are too specific.
> 
> Sorry for insisting but I would really like to hear some answer for
> this, please.

I don't know. Who knows? How you can make sure that? And, I don't like
below fixup. Theoretically, it could retry forever.

> 
> [...]
> > >> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > >> > index 2e7e26c5d3ba..f48b9e9b1869 100644
> > >> > --- a/mm/page_alloc.c
> > >> > +++ b/mm/page_alloc.c
> > >> > @@ -3319,6 +3319,24 @@ should_compact_retry(struct alloc_context *ac, 
> > >> > unsigned int order, int alloc_fla
> > >> >  enum migrate_mode *migrate_mode,
> > >> >  int compaction_retries)
> > >> >  {
> > >> > +   struct zone *zone;
> > >> > +   struct zoneref *z;
> > >> > +
> > >> > +   if (order > PAGE_ALLOC_COSTLY_ORDER)
> > >> > +   return false;
> > >> > +
> > >> > +   /*
> > >> > +* There are setups with compaction disabled which would 
> > >> > prefer to loop
> > >> > +* inside the allocator rather than hit the oom killer 
> > >> > prematurely. Let's
> > >> > +* give them a good hope and keep retrying while the order-0 
> > >> > watermarks
> > >> > +* are OK.
> > >> > +*/
> > >> > +   for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, 
> > >> > ac->high_zoneidx,
> > >> > +   ac->nodemask) {
> > >> > +   if(zone_watermark_ok(zone, 0, min_wmark_pages(zone),
> > >> > +   ac->high_zoneidx, alloc_flags))
> > >> > +   return true;
> > >> > +   }
> > >> > return false;
> [...]
> > My benchmark is too specific so I make another one. It does very
> > simple things.
> > 
> > 1) Run the system with 256 MB memory and 2 GB swap
> > 2) Run memory-hogger which takes (anonymous memory) 256 MB
> > 3) Make 1000 new processes by fork (It will take 16 MB order-2 pages)
> > 
> > You can do it yourself with above instructions.
> > 
> > On current upstream kernel without CONFIG_COMPACTION, OOM doesn't happen.
> > On next-20160509 kernel without CONFIG_COMPACTION, OOM happens when
> > roughly *500* processes forked.
> > 
> > With CONFIG_COMPACTION, OOM doesn't happen on any kernel.
> 
> Does the patch I have posted helped?

I guess that it will help but please do it by yourself. It's simple.

> > Other kernels doesn't trigger OOM even if I make 1 new processes.
> 
> Is this an usual load on !CONFIG_COMPACTION configurations?

I don't know. User-space developer doesn't take care about kernel
configuration and it seems that fork 500 times when memory is full is
not a corner case to me.

> > This example is very intuitive and reasonable. I think that it's not
> > artificial.  It has enough swap space so OOM should not happen.
> 
> I am not really convinced this is true actually. You can have an
> arbitrary amount of the swap space yet it still won't help you
> because more reclaimed memory simply doesn't imply a more continuous
> memory. This is a fundamental problem. So I think that relying on
> !CONFIG_COMPACTION for heavy fork (or other high order) loads simply
> never works reliably.

I think that you don't understand how powerful the reclaim and
compaction are. In the system with large disk swap, what compaction can do
is also possible for reclaim. Reclaim can do more.

Think about following examples.

_: free
U: used(unmovable)
M: used(migratable and reclaimable)

_MUU _U_U  

With compaction (assume theoretically best algorithm),
just 3 contiguous region can be made like as following:

MMUU MUMU ___M 

With reclaim, we can make 8 contiguous region.

__UU _U_U  

Reclaim can be easily affected by thrashing but it is fundamentally
more powerful than compaction.

Even, there are not migratable but reclaimable pages and it could weak
power of the compaction.

> > This failure shows that fundamental assumption of your patch is
> > wrong. You triggers OOM even if there is enough reclaimable memory but
> > no high order freepage depending on the fact that we can't guarantee
> > that we can make high order page with reclaiming these reclaimable
> > memory. Yes, we can't 

Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI

2016-05-11 Thread Shannon Zhao


On 2016/5/12 7:24, Matt Fleming wrote:
> On Wed, 11 May, at 09:35:47PM, Shannon Zhao wrote:
>>> > > 
>>> > > Also, why do you need to setup efi.runtime_version here? Isn't that
>>> > > done inside uefi_init()?
>>> > > 
>> > I don't see any codes which setup efi.runtime_version in uefi_init().
>  
> Look in the EFI tree,
> 
>   
> https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/tree/drivers/firmware/efi/arm-init.c?h=next#n120
> 
Ah, there are some patches in EFI tree introducing this change, but they
are not in kernel matser but I didn't notice this, sorry.

>>> > > Here is how I would have expected this patch to look:
>>> > > 
>>> > >   - Add FDT code to find hypervisor EFI params
>>> > > 
>>> > >   - Force enable EFI_RUNTIME_SERVICES for Xen and call
>>> > > xen_efi_runtime_setup() inside xen_guest_init()
>>> > > 
>>> > >   - Change arm_enable_runtime_services() to handle the case where
>>> > > EFI_RUNTIME_SERVICES is already set
>>> > > 
>>> > > Am I misunderstanding some ordering issues?
>> > 
>> > Since xen_guest_init() and arm_enable_runtime_services() are both
>> > early_initcall and the calling order is random I think.
> Urgh, right, I missed that.
> 
>> > And when we call xen_efi_runtime_setup() and setup
>> > efi.runtime_version in xen_guest_init(), the efi.systab might be
>> > NULL. So it needs to map the systanle explicitly before we use
>> > efi.systab.
> Could you explain why you need to copy efi.runtime_version instead of
> letting the existing code in uefi_init() set it up?
> 
> Also, efi.systab isn't used by xen_efi_runtime_setup() as far I can
> see, not sure why you need that either?
As said above, I will rebase this patch on top of the EFI next branch.

Thanks,
-- 
Shannon



Re: [PATCH] Xen: EFI: Parse DT parameters for Xen specific UEFI

2016-05-11 Thread Shannon Zhao


On 2016/5/12 7:24, Matt Fleming wrote:
> On Wed, 11 May, at 09:35:47PM, Shannon Zhao wrote:
>>> > > 
>>> > > Also, why do you need to setup efi.runtime_version here? Isn't that
>>> > > done inside uefi_init()?
>>> > > 
>> > I don't see any codes which setup efi.runtime_version in uefi_init().
>  
> Look in the EFI tree,
> 
>   
> https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/tree/drivers/firmware/efi/arm-init.c?h=next#n120
> 
Ah, there are some patches in EFI tree introducing this change, but they
are not in kernel matser but I didn't notice this, sorry.

>>> > > Here is how I would have expected this patch to look:
>>> > > 
>>> > >   - Add FDT code to find hypervisor EFI params
>>> > > 
>>> > >   - Force enable EFI_RUNTIME_SERVICES for Xen and call
>>> > > xen_efi_runtime_setup() inside xen_guest_init()
>>> > > 
>>> > >   - Change arm_enable_runtime_services() to handle the case where
>>> > > EFI_RUNTIME_SERVICES is already set
>>> > > 
>>> > > Am I misunderstanding some ordering issues?
>> > 
>> > Since xen_guest_init() and arm_enable_runtime_services() are both
>> > early_initcall and the calling order is random I think.
> Urgh, right, I missed that.
> 
>> > And when we call xen_efi_runtime_setup() and setup
>> > efi.runtime_version in xen_guest_init(), the efi.systab might be
>> > NULL. So it needs to map the systanle explicitly before we use
>> > efi.systab.
> Could you explain why you need to copy efi.runtime_version instead of
> letting the existing code in uefi_init() set it up?
> 
> Also, efi.systab isn't used by xen_efi_runtime_setup() as far I can
> see, not sure why you need that either?
As said above, I will rebase this patch on top of the EFI next branch.

Thanks,
-- 
Shannon



Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Alex Williamson
On Thu, 12 May 2016 01:19:44 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, May 11, 2016 11:54 PM
> > 
> > On Wed, 11 May 2016 06:29:06 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > >
> > > > On Thu, 5 May 2016 12:15:46 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > >
> > > > > > Hi David and Kevin,
> > > > > >
> > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > >  
> > > > > > > From: Tian, Kevin  
> > > > > > >> Sent: 05 May 2016 10:37  
> > > > > > > ...  
> > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > > > > >>>  
> > > > > > >> Then how do you prevent malicious guest kernel accessing it?  
> > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > received that writes the required word through that address.
> > > > > > >
> > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > > > > cycle.
> > > > > > >
> > > > > > >   David
> > > > > > >  
> > > > > >
> > > > > > If we have enough permission to load a malicious driver or
> > > > > > kernel, we can easily break the guest without exposed
> > > > > > MSI-X table.
> > > > > >
> > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > the MSI-X table to break other guest or host. The
> > > > > > capability of IRQ remapping could provide this
> > > > > > kind of protection.
> > > > > >  
> > > > >
> > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > structure to guest. I know actual IRQ remapping might be platform
> > > > > specific, but at least for Intel VT-d specification, MSI-X entry must
> > > > > be configured with a remappable format by host kernel which
> > > > > contains an index into IRQ remapping table. The index will find a
> > > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > > device. If you allow a malicious program random index into MSI-X
> > > > > entry of assigned device, the hole is obvious...
> > > > >
> > > > > Above might make sense only for a IRQ remapping implementation
> > > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > enabled or not.  
> > > >
> > > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > > table to the guest and the guest can make direct use of it.  The end
> > > > goal here is that the guest on a power system is already
> > > > paravirtualized to not program the device MSI-X by directly writing to
> > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > always run virtualized.  Therefore a) they never intend to touch the
> > > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > > can only hurt itself by doing so.
> > > >
> > > > On x86 we don't have a), our method of programming the MSI-X vector
> > > > table is to directly write to it. Therefore we will always require QEMU
> > > > to place a MemoryRegion over the vector table to intercept those
> > > > accesses.  However with interrupt remapping, we do have b) on x86, which
> > > > means that we don't need to be so strict in disallowing user accesses
> > > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > > the device, but the user should only be able to hurt themselves by
> > > > writing it directly.  x86 doesn't really get anything out of this
> > > > change, but it helps this special case on power pretty significantly
> > > > aiui.  Thanks,
> > > >  
> > >
> > > Allowing guest direct write to MSI-x table has system-wide impact.
> > > As I explained earlier, hypervisor needs to control "interrupt_index"
> > > programmed in MSI-X entry, which is used to associate a specific
> > > IRQ remapping entry. Now if you expose whole MSI-x table to guest,
> > > it can program random index into MSI-X entry to associate with
> > > any IRQ remapping entry and then there won't be any isolation per se.
> > >
> > > You can check "5.5.2 MSI and MSI-X 

Re: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Alex Williamson
On Thu, 12 May 2016 01:19:44 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Wednesday, May 11, 2016 11:54 PM
> > 
> > On Wed, 11 May 2016 06:29:06 +
> > "Tian, Kevin"  wrote:
> >   
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Thursday, May 05, 2016 11:05 PM
> > > >
> > > > On Thu, 5 May 2016 12:15:46 +
> > > > "Tian, Kevin"  wrote:
> > > >  
> > > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > > >
> > > > > > Hi David and Kevin,
> > > > > >
> > > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > > >  
> > > > > > > From: Tian, Kevin  
> > > > > > >> Sent: 05 May 2016 10:37  
> > > > > > > ...  
> > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > > > > >>>  
> > > > > > >> Then how do you prevent malicious guest kernel accessing it?  
> > > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > > the receive buffer ring to contain a single word entry that
> > > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > > then using a loopback mode to cause a specific packet be
> > > > > > > received that writes the required word through that address.
> > > > > > >
> > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > > > > cycle.
> > > > > > >
> > > > > > >   David
> > > > > > >  
> > > > > >
> > > > > > If we have enough permission to load a malicious driver or
> > > > > > kernel, we can easily break the guest without exposed
> > > > > > MSI-X table.
> > > > > >
> > > > > > I think it should be safe to expose MSI-X table if we can
> > > > > > make sure that malicious guest driver/kernel can't use
> > > > > > the MSI-X table to break other guest or host. The
> > > > > > capability of IRQ remapping could provide this
> > > > > > kind of protection.
> > > > > >  
> > > > >
> > > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > > structure to guest. I know actual IRQ remapping might be platform
> > > > > specific, but at least for Intel VT-d specification, MSI-X entry must
> > > > > be configured with a remappable format by host kernel which
> > > > > contains an index into IRQ remapping table. The index will find a
> > > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > > device. If you allow a malicious program random index into MSI-X
> > > > > entry of assigned device, the hole is obvious...
> > > > >
> > > > > Above might make sense only for a IRQ remapping implementation
> > > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > > passthrough based on this fact instead of general IRQ remapping
> > > > > enabled or not.  
> > > >
> > > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > > table to the guest and the guest can make direct use of it.  The end
> > > > goal here is that the guest on a power system is already
> > > > paravirtualized to not program the device MSI-X by directly writing to
> > > > the MSI-X vector table.  They have hypercalls for this since they
> > > > always run virtualized.  Therefore a) they never intend to touch the
> > > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > > can only hurt itself by doing so.
> > > >
> > > > On x86 we don't have a), our method of programming the MSI-X vector
> > > > table is to directly write to it. Therefore we will always require QEMU
> > > > to place a MemoryRegion over the vector table to intercept those
> > > > accesses.  However with interrupt remapping, we do have b) on x86, which
> > > > means that we don't need to be so strict in disallowing user accesses
> > > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > > the device, but the user should only be able to hurt themselves by
> > > > writing it directly.  x86 doesn't really get anything out of this
> > > > change, but it helps this special case on power pretty significantly
> > > > aiui.  Thanks,
> > > >  
> > >
> > > Allowing guest direct write to MSI-x table has system-wide impact.
> > > As I explained earlier, hypervisor needs to control "interrupt_index"
> > > programmed in MSI-X entry, which is used to associate a specific
> > > IRQ remapping entry. Now if you expose whole MSI-x table to guest,
> > > it can program random index into MSI-X entry to associate with
> > > any IRQ remapping entry and then there won't be any isolation per se.
> > >
> > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d
> > > spec.  
> > 
> > I think you're 

Re: [PATCH] x86_64: Disabling read-implies-exec when the stack is executable

2016-05-11 Thread Kees Cook
On Wed, May 11, 2016 at 3:40 PM, Andi Kleen  wrote:
>> However, I would tend to agree: RIE should only be needed on 32-bit
>> since 64-bit started its life knowing about no-exec permissions.
>
> NX was not in the original AMD K8 chips.  Was only added some time later.

So we should retain this behavior for all of 64-bit?

>> set_personality_64bit()'s (which is confusingly just an initializer
>> and not called during the personality() syscall) comment about this
>> makes no sense to me:
>>
>> /* TBD: overwrites user setup. Should have two bits.
>>But 64bit processes have always behaved this way,
>>so it's not too bad. The main problem is just that
>>32bit childs are affected again. */
>> current->personality &= ~READ_IMPLIES_EXEC;
>
> What does not make sense?

I just don't have enough context to make sense of it. What two bits?
Always behaved what way?Affected by what?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH] x86_64: Disabling read-implies-exec when the stack is executable

2016-05-11 Thread Kees Cook
On Wed, May 11, 2016 at 3:40 PM, Andi Kleen  wrote:
>> However, I would tend to agree: RIE should only be needed on 32-bit
>> since 64-bit started its life knowing about no-exec permissions.
>
> NX was not in the original AMD K8 chips.  Was only added some time later.

So we should retain this behavior for all of 64-bit?

>> set_personality_64bit()'s (which is confusingly just an initializer
>> and not called during the personality() syscall) comment about this
>> makes no sense to me:
>>
>> /* TBD: overwrites user setup. Should have two bits.
>>But 64bit processes have always behaved this way,
>>so it's not too bad. The main problem is just that
>>32bit childs are affected again. */
>> current->personality &= ~READ_IMPLIES_EXEC;
>
> What does not make sense?

I just don't have enough context to make sense of it. What two bits?
Always behaved what way?Affected by what?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security


Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread Hongzhou Yang
On Thu, 2016-05-12 at 09:41 +0800, chunfeng yun wrote:
> Hi,
> 
> On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> > On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> > >  wrote:
> > > 
> > > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > > fixed when set its default mode as usb iddig.
> > > >
> > > > Signed-off-by: Chunfeng Yun 
> > > 
> > 
> > Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> > If you want to set its default mode to iddig, you should set it in dts.
> > 
> I set it in DTS, but it didn't work, because when usb driver requested
> IRQ, pinmux was switched back to default mode set by
> MTK_EINT_FUNCTION().
> 

After confirmed, there are something wrong with data sheet and pinmux
table, and GPIO16 can only receive interrupt by mode 1. So

Acked-by: Hongzhou Yang 

Yours,
Hongzhou




Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread Hongzhou Yang
On Thu, 2016-05-12 at 09:41 +0800, chunfeng yun wrote:
> Hi,
> 
> On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> > On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> > >  wrote:
> > > 
> > > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > > fixed when set its default mode as usb iddig.
> > > >
> > > > Signed-off-by: Chunfeng Yun 
> > > 
> > 
> > Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> > If you want to set its default mode to iddig, you should set it in dts.
> > 
> I set it in DTS, but it didn't work, because when usb driver requested
> IRQ, pinmux was switched back to default mode set by
> MTK_EINT_FUNCTION().
> 

After confirmed, there are something wrong with data sheet and pinmux
table, and GPIO16 can only receive interrupt by mode 1. So

Acked-by: Hongzhou Yang 

Yours,
Hongzhou




Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

2016-05-11 Thread Michael Neuling
On Wed, 2016-05-11 at 20:24 +0200, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 02:33:45PM +0200, Peter Zijlstra wrote:
> > 
> > Hmm, PPC folks; what does your topology look like?
> > 
> > Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c
> > seems to suggest your cores do not share cache at all.
> > 
> > https://en.wikipedia.org/wiki/POWER7 seems to agree and states
> > 
> >   "4 MB L3 cache per C1 core"
> > 
> > And http://www-03.ibm.com/systems/resources/systems_power_software_i_pe
> > rfmgmt_underthehood.pdf
> > also explicitly draws pictures with the L3 per core.
> > 
> > _however_, that same document describes L3 inter-core fill and lateral
> > cast-out, which sounds like the L3s work together to form a node wide
> > caching system.
> > 
> > Do we want to model this co-operative L3 slices thing as a sort of
> > node-wide LLC for the purpose of the scheduler ?
> Going back a generation; Power6 seems to have a shared L3 (off package)
> between the two cores on the package. The current topology does not
> reflect that at all.
> 
> And going forward a generation; Power8 seems to share the per-core
> (chiplet) L3 amonst all cores (chiplets) + is has the centaur (memory
> controller) 16M L4.

Yep, L1/L2/L3 is per core on POWER8 and POWER7.  POWER6 and POWER5 (both
dual core chips) had a shared off chip cache

The POWER8 L4 is really a bit different as it's out in the memory
controller.  It's more of a memory DIMM buffer as it can only cache data
associated with the physical addresses on those DIMMS.

> So it seems the current topology setup is not describing these chips
> very well. Also note that the arch topology code can runtime select a
> topology, so you could make that topo setup micro-arch specific.

We are planning on making some topology changes for the upcoming P9 which
will share L2/L3 amongst pairs of cores (24 cores per chip).

FWIW our P9 upstreaming is still in it's infancy since P9 is not released
yet.

Mike


Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

2016-05-11 Thread Michael Neuling
On Wed, 2016-05-11 at 20:24 +0200, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 02:33:45PM +0200, Peter Zijlstra wrote:
> > 
> > Hmm, PPC folks; what does your topology look like?
> > 
> > Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c
> > seems to suggest your cores do not share cache at all.
> > 
> > https://en.wikipedia.org/wiki/POWER7 seems to agree and states
> > 
> >   "4 MB L3 cache per C1 core"
> > 
> > And http://www-03.ibm.com/systems/resources/systems_power_software_i_pe
> > rfmgmt_underthehood.pdf
> > also explicitly draws pictures with the L3 per core.
> > 
> > _however_, that same document describes L3 inter-core fill and lateral
> > cast-out, which sounds like the L3s work together to form a node wide
> > caching system.
> > 
> > Do we want to model this co-operative L3 slices thing as a sort of
> > node-wide LLC for the purpose of the scheduler ?
> Going back a generation; Power6 seems to have a shared L3 (off package)
> between the two cores on the package. The current topology does not
> reflect that at all.
> 
> And going forward a generation; Power8 seems to share the per-core
> (chiplet) L3 amonst all cores (chiplets) + is has the centaur (memory
> controller) 16M L4.

Yep, L1/L2/L3 is per core on POWER8 and POWER7.  POWER6 and POWER5 (both
dual core chips) had a shared off chip cache

The POWER8 L4 is really a bit different as it's out in the memory
controller.  It's more of a memory DIMM buffer as it can only cache data
associated with the physical addresses on those DIMMS.

> So it seems the current topology setup is not describing these chips
> very well. Also note that the arch topology code can runtime select a
> topology, so you could make that topo setup micro-arch specific.

We are planning on making some topology changes for the upcoming P9 which
will share L2/L3 amongst pairs of cores (24 cores per chip).

FWIW our P9 upstreaming is still in it's infancy since P9 is not released
yet.

Mike


Re: [PATCH perf/core v7 02/21] perf buildid: Fix to set correct dso name for kallsyms

2016-05-11 Thread Masami Hiramatsu
On Wed, 11 May 2016 12:45:00 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:51:37PM +0900, Masami Hiramatsu escreveu:
> > Fix to set correct dso name ("[kernel.kallsyms]") for
> > kallsyms, as for vdso does.
> 
> Can you rewrite the above comment and also break this down in two
> patches,

No, could you explain what parts this consists of?
I think this patch is done just one thing. Actually I can rewrite
this as one-line patch like below

if (asprintf(, "%s%s%s", buildid_dir, slash ? "/" : "",
-is_vdso ? DSO__NAME_VDSO : realname) < 0)
+is_vdso ? DSO__NAME_VDSO : (is_kallsyms ? 
"[kernel.kallsyms]": realname)) < 0)

But indeed this is not readable so clean it up like below.

> probably decribing what is the problem that it fixes?

Ah, indeed. This is actually not a fix for existing bug, instead
it will prevent buggy behavior. Current function can get any filepath
with is_vdso = true or is_kallsyms = true, but it seems easily giving
contradictory combination.

Hmm, OK, I think I have to solve this issue in another way.

Thanks!

> 
> - Arnaldo
>  
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/build-id.c |   12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index b6ecf87..234d8a1 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -343,21 +343,25 @@ void disable_buildid_cache(void)
> >  static char *build_id_cache__dirname_from_path(const char *name,
> >bool is_kallsyms, bool is_vdso)
> >  {
> > -   char *realname = (char *)name, *filename;
> > +   const char *realname = name;
> > +   char *filename;
> > bool slash = is_kallsyms || is_vdso;
> >  
> > if (!slash) {
> > realname = realpath(name, NULL);
> > if (!realname)
> > return NULL;
> > -   }
> > +   } else if (is_vdso)
> > +   realname = DSO__NAME_VDSO;
> > +   else
> > +   realname = "[kernel.kallsyms]";
> >  
> > if (asprintf(, "%s%s%s", buildid_dir, slash ? "/" : "",
> > -is_vdso ? DSO__NAME_VDSO : realname) < 0)
> > +realname) < 0)
> > filename = NULL;
> >  
> > if (!slash)
> > -   free(realname);
> > +   free((char *)realname);
> >  
> > return filename;
> >  }


-- 
Masami Hiramatsu 


Re: [PATCH perf/core v7 02/21] perf buildid: Fix to set correct dso name for kallsyms

2016-05-11 Thread Masami Hiramatsu
On Wed, 11 May 2016 12:45:00 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:51:37PM +0900, Masami Hiramatsu escreveu:
> > Fix to set correct dso name ("[kernel.kallsyms]") for
> > kallsyms, as for vdso does.
> 
> Can you rewrite the above comment and also break this down in two
> patches,

No, could you explain what parts this consists of?
I think this patch is done just one thing. Actually I can rewrite
this as one-line patch like below

if (asprintf(, "%s%s%s", buildid_dir, slash ? "/" : "",
-is_vdso ? DSO__NAME_VDSO : realname) < 0)
+is_vdso ? DSO__NAME_VDSO : (is_kallsyms ? 
"[kernel.kallsyms]": realname)) < 0)

But indeed this is not readable so clean it up like below.

> probably decribing what is the problem that it fixes?

Ah, indeed. This is actually not a fix for existing bug, instead
it will prevent buggy behavior. Current function can get any filepath
with is_vdso = true or is_kallsyms = true, but it seems easily giving
contradictory combination.

Hmm, OK, I think I have to solve this issue in another way.

Thanks!

> 
> - Arnaldo
>  
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/build-id.c |   12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index b6ecf87..234d8a1 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -343,21 +343,25 @@ void disable_buildid_cache(void)
> >  static char *build_id_cache__dirname_from_path(const char *name,
> >bool is_kallsyms, bool is_vdso)
> >  {
> > -   char *realname = (char *)name, *filename;
> > +   const char *realname = name;
> > +   char *filename;
> > bool slash = is_kallsyms || is_vdso;
> >  
> > if (!slash) {
> > realname = realpath(name, NULL);
> > if (!realname)
> > return NULL;
> > -   }
> > +   } else if (is_vdso)
> > +   realname = DSO__NAME_VDSO;
> > +   else
> > +   realname = "[kernel.kallsyms]";
> >  
> > if (asprintf(, "%s%s%s", buildid_dir, slash ? "/" : "",
> > -is_vdso ? DSO__NAME_VDSO : realname) < 0)
> > +realname) < 0)
> > filename = NULL;
> >  
> > if (!slash)
> > -   free(realname);
> > +   free((char *)realname);
> >  
> > return filename;
> >  }


-- 
Masami Hiramatsu 


linux-next: manual merge of the tip tree with the arm64 tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/irqchip/irq-gic.c

between commit:

  25fc11aead38 ("irqchip/gic: Restore CPU interface checking")

from the arm64 tree and commit:

  dc9722cc57eb ("irqchip/gic: Return an error if GIC initialisation fails")
  f673b9b5cb54 ("irqchip/gic: Store GIC configuration parameters")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/irq-gic.c
index 095bb5b5c3f2,113e2d02c812..
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@@ -489,8 -486,9 +486,10 @@@ static int gic_cpu_init(struct gic_chip
/*
 * Get what the GIC says our CPU mask is.
 */
-   BUG_ON(cpu >= NR_GIC_CPU_IF);
+   if (WARN_ON(cpu >= NR_GIC_CPU_IF))
+   return -EINVAL;
+ 
 +  gic_check_cpu_features();
cpu_mask = gic_get_cpumask(gic);
gic_cpu_map[cpu] = cpu_mask;
  
@@@ -1012,24 -1029,28 +1030,26 @@@ static const struct irq_domain_ops gic_
.unmap = gic_irq_domain_unmap,
  };
  
- static void __init __gic_init_bases(unsigned int gic_nr, int irq_start,
-  void __iomem *dist_base, void __iomem *cpu_base,
-  u32 percpu_offset, struct fwnode_handle *handle)
+ static int __init __gic_init_bases(struct gic_chip_data *gic, int irq_start,
+  struct fwnode_handle *handle)
  {
irq_hw_number_t hwirq_base;
-   struct gic_chip_data *gic;
-   int gic_irqs, irq_base, i;
- 
-   BUG_ON(gic_nr >= CONFIG_ARM_GIC_MAX_NR);
+   int gic_irqs, irq_base, i, ret;
  
-   gic = _data[gic_nr];
+   if (WARN_ON(!gic || gic->domain))
+   return -EINVAL;
  
 -  gic_check_cpu_features();
 -
/* Initialize irq_chip */
-   if (static_key_true(_deactivate) && gic_nr == 0) {
-   gic->chip = gic_eoimode1_chip;
+   gic->chip = gic_chip;
+ 
+   if (static_key_true(_deactivate) && gic == _data[0]) {
+   gic->chip.irq_mask = gic_eoimode1_mask_irq;
+   gic->chip.irq_eoi = gic_eoimode1_eoi_irq;
+   gic->chip.irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity;
+   gic->chip.name = kasprintf(GFP_KERNEL, "GICv2");
} else {
-   gic->chip = gic_chip;
-   gic->chip.name = kasprintf(GFP_KERNEL, "GIC-%d", gic_nr);
+   gic->chip.name = kasprintf(GFP_KERNEL, "GIC-%d",
+  (int)(gic - _data[0]));
}
  
  #ifdef CONFIG_SMP


linux-next: manual merge of the tip tree with the arm64 tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/irqchip/irq-gic.c

between commit:

  25fc11aead38 ("irqchip/gic: Restore CPU interface checking")

from the arm64 tree and commit:

  dc9722cc57eb ("irqchip/gic: Return an error if GIC initialisation fails")
  f673b9b5cb54 ("irqchip/gic: Store GIC configuration parameters")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/irq-gic.c
index 095bb5b5c3f2,113e2d02c812..
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@@ -489,8 -486,9 +486,10 @@@ static int gic_cpu_init(struct gic_chip
/*
 * Get what the GIC says our CPU mask is.
 */
-   BUG_ON(cpu >= NR_GIC_CPU_IF);
+   if (WARN_ON(cpu >= NR_GIC_CPU_IF))
+   return -EINVAL;
+ 
 +  gic_check_cpu_features();
cpu_mask = gic_get_cpumask(gic);
gic_cpu_map[cpu] = cpu_mask;
  
@@@ -1012,24 -1029,28 +1030,26 @@@ static const struct irq_domain_ops gic_
.unmap = gic_irq_domain_unmap,
  };
  
- static void __init __gic_init_bases(unsigned int gic_nr, int irq_start,
-  void __iomem *dist_base, void __iomem *cpu_base,
-  u32 percpu_offset, struct fwnode_handle *handle)
+ static int __init __gic_init_bases(struct gic_chip_data *gic, int irq_start,
+  struct fwnode_handle *handle)
  {
irq_hw_number_t hwirq_base;
-   struct gic_chip_data *gic;
-   int gic_irqs, irq_base, i;
- 
-   BUG_ON(gic_nr >= CONFIG_ARM_GIC_MAX_NR);
+   int gic_irqs, irq_base, i, ret;
  
-   gic = _data[gic_nr];
+   if (WARN_ON(!gic || gic->domain))
+   return -EINVAL;
  
 -  gic_check_cpu_features();
 -
/* Initialize irq_chip */
-   if (static_key_true(_deactivate) && gic_nr == 0) {
-   gic->chip = gic_eoimode1_chip;
+   gic->chip = gic_chip;
+ 
+   if (static_key_true(_deactivate) && gic == _data[0]) {
+   gic->chip.irq_mask = gic_eoimode1_mask_irq;
+   gic->chip.irq_eoi = gic_eoimode1_eoi_irq;
+   gic->chip.irq_set_vcpu_affinity = gic_irq_set_vcpu_affinity;
+   gic->chip.name = kasprintf(GFP_KERNEL, "GICv2");
} else {
-   gic->chip = gic_chip;
-   gic->chip.name = kasprintf(GFP_KERNEL, "GIC-%d", gic_nr);
+   gic->chip.name = kasprintf(GFP_KERNEL, "GIC-%d",
+  (int)(gic - _data[0]));
}
  
  #ifdef CONFIG_SMP


linux-next: manual merge of the tip tree with the arc tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/irqchip/Kconfig
  drivers/irqchip/Makefile

between commit:

  44df427c894a ("irqchip: add nps Internal and external irqchips")

from the arc tree and commit:

  b8f3ebe630a4 ("irqchip: Add Layerscape SCFG MSI controller support")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/Kconfig
index 1ab632a94db3,81f88ada3a61..
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@@ -245,8 -246,10 +246,16 @@@ config MVEBU_ODM
bool
select GENERIC_MSI_IRQ_DOMAIN
  
 +config EZNPS_GIC
 +  bool "NPS400 Global Interrupt Manager (GIM)"
 +  select IRQ_DOMAIN
 +  help
 +Support the EZchip NPS400 global interrupt controller
++
+ config LS_SCFG_MSI
+   def_bool y if SOC_LS1021A || ARCH_LAYERSCAPE
+   depends on PCI && PCI_MSI
+   select PCI_MSI_IRQ_DOMAIN
+ 
+ config PARTITION_PERCPU
+   bool
diff --cc drivers/irqchip/Makefile
index 9d54d53fe223,f828244b44c2..
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@@ -65,4 -67,4 +67,5 @@@ obj-$(CONFIG_INGENIC_IRQ) += irq-ingen
  obj-$(CONFIG_IMX_GPCV2)   += irq-imx-gpcv2.o
  obj-$(CONFIG_PIC32_EVIC)  += irq-pic32-evic.o
  obj-$(CONFIG_MVEBU_ODMI)  += irq-mvebu-odmi.o
 +obj-$(CONFIG_EZNPS_GIC)   += irq-eznps.o
+ obj-$(CONFIG_LS_SCFG_MSI) += irq-ls-scfg-msi.o


linux-next: manual merge of the tip tree with the arc tree

2016-05-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  drivers/irqchip/Kconfig
  drivers/irqchip/Makefile

between commit:

  44df427c894a ("irqchip: add nps Internal and external irqchips")

from the arc tree and commit:

  b8f3ebe630a4 ("irqchip: Add Layerscape SCFG MSI controller support")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/irqchip/Kconfig
index 1ab632a94db3,81f88ada3a61..
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@@ -245,8 -246,10 +246,16 @@@ config MVEBU_ODM
bool
select GENERIC_MSI_IRQ_DOMAIN
  
 +config EZNPS_GIC
 +  bool "NPS400 Global Interrupt Manager (GIM)"
 +  select IRQ_DOMAIN
 +  help
 +Support the EZchip NPS400 global interrupt controller
++
+ config LS_SCFG_MSI
+   def_bool y if SOC_LS1021A || ARCH_LAYERSCAPE
+   depends on PCI && PCI_MSI
+   select PCI_MSI_IRQ_DOMAIN
+ 
+ config PARTITION_PERCPU
+   bool
diff --cc drivers/irqchip/Makefile
index 9d54d53fe223,f828244b44c2..
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@@ -65,4 -67,4 +67,5 @@@ obj-$(CONFIG_INGENIC_IRQ) += irq-ingen
  obj-$(CONFIG_IMX_GPCV2)   += irq-imx-gpcv2.o
  obj-$(CONFIG_PIC32_EVIC)  += irq-pic32-evic.o
  obj-$(CONFIG_MVEBU_ODMI)  += irq-mvebu-odmi.o
 +obj-$(CONFIG_EZNPS_GIC)   += irq-eznps.o
+ obj-$(CONFIG_LS_SCFG_MSI) += irq-ls-scfg-msi.o


[PATCH v2] usb: ehci-platform: add reset controller number in struct ehci_platform_priv

2016-05-11 Thread Jiancheng Xue
Some ehci compatible controllers have more than one reset signal lines,
e.g., Synopsys DWC USB2.0 Host-AHB Controller has two resets hreset_i_n
and phy_rst_i_n. Two more resets are added in this patch in order for
this kind of controller to use this driver directly.

Signed-off-by: Jiancheng Xue 
Acked-by: Alan Stern 
---
Change Log:
v2:
-Fixed a bug pointed by Alan Stern.
 Called reset_control_put() for the offending reset line.
-Fixed a compiling error.

 drivers/usb/host/ehci-platform.c | 45 +---
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/usb/host/ehci-platform.c b/drivers/usb/host/ehci-platform.c
index 1757ebb..bc33f45 100644
--- a/drivers/usb/host/ehci-platform.c
+++ b/drivers/usb/host/ehci-platform.c
@@ -39,11 +39,12 @@
 
 #define DRIVER_DESC "EHCI generic platform driver"
 #define EHCI_MAX_CLKS 3
+#define EHCI_MAX_RSTS 3
 #define hcd_to_ehci_priv(h) ((struct ehci_platform_priv *)hcd_to_ehci(h)->priv)
 
 struct ehci_platform_priv {
struct clk *clks[EHCI_MAX_CLKS];
-   struct reset_control *rst;
+   struct reset_control *rsts[EHCI_MAX_RSTS];
struct phy **phys;
int num_phys;
bool reset_on_resume;
@@ -149,7 +150,7 @@ static int ehci_platform_probe(struct platform_device *dev)
struct usb_ehci_pdata *pdata = dev_get_platdata(>dev);
struct ehci_platform_priv *priv;
struct ehci_hcd *ehci;
-   int err, irq, phy_num, clk = 0;
+   int err, irq, phy_num, clk = 0, rst;
 
if (usb_disabled())
return -ENODEV;
@@ -234,16 +235,22 @@ static int ehci_platform_probe(struct platform_device 
*dev)
}
}
 
-   priv->rst = devm_reset_control_get_optional(>dev, NULL);
-   if (IS_ERR(priv->rst)) {
-   err = PTR_ERR(priv->rst);
-   if (err == -EPROBE_DEFER)
-   goto err_put_clks;
-   priv->rst = NULL;
-   } else {
-   err = reset_control_deassert(priv->rst);
-   if (err)
-   goto err_put_clks;
+   for (rst = 0; rst < EHCI_MAX_RSTS; rst++) {
+   priv->rsts[rst] = of_reset_control_get_by_index(
+   dev->dev.of_node, rst);
+   if (IS_ERR(priv->rsts[rst])) {
+   err = PTR_ERR(priv->rsts[rst]);
+   if (err == -EPROBE_DEFER)
+   goto err_reset;
+   priv->rsts[rst] = NULL;
+   break;
+   }
+
+   err = reset_control_deassert(priv->rsts[rst]);
+   if (err) {
+   reset_control_put(priv->rsts[rst]);
+   goto err_reset;
+   }
}
 
if (pdata->big_endian_desc)
@@ -300,8 +307,10 @@ err_power:
if (pdata->power_off)
pdata->power_off(dev);
 err_reset:
-   if (priv->rst)
-   reset_control_assert(priv->rst);
+   while (--rst >= 0) {
+   reset_control_assert(priv->rsts[rst]);
+   reset_control_put(priv->rsts[rst]);
+   }
 err_put_clks:
while (--clk >= 0)
clk_put(priv->clks[clk]);
@@ -319,15 +328,17 @@ static int ehci_platform_remove(struct platform_device 
*dev)
struct usb_hcd *hcd = platform_get_drvdata(dev);
struct usb_ehci_pdata *pdata = dev_get_platdata(>dev);
struct ehci_platform_priv *priv = hcd_to_ehci_priv(hcd);
-   int clk;
+   int clk, rst;
 
usb_remove_hcd(hcd);
 
if (pdata->power_off)
pdata->power_off(dev);
 
-   if (priv->rst)
-   reset_control_assert(priv->rst);
+   for (rst = 0; rst < EHCI_MAX_RSTS && priv->rsts[rst]; rst++) {
+   reset_control_assert(priv->rsts[rst]);
+   reset_control_put(priv->rsts[rst]);
+   }
 
for (clk = 0; clk < EHCI_MAX_CLKS && priv->clks[clk]; clk++)
clk_put(priv->clks[clk]);
-- 
1.9.1



[PATCH v2] usb: ehci-platform: add reset controller number in struct ehci_platform_priv

2016-05-11 Thread Jiancheng Xue
Some ehci compatible controllers have more than one reset signal lines,
e.g., Synopsys DWC USB2.0 Host-AHB Controller has two resets hreset_i_n
and phy_rst_i_n. Two more resets are added in this patch in order for
this kind of controller to use this driver directly.

Signed-off-by: Jiancheng Xue 
Acked-by: Alan Stern 
---
Change Log:
v2:
-Fixed a bug pointed by Alan Stern.
 Called reset_control_put() for the offending reset line.
-Fixed a compiling error.

 drivers/usb/host/ehci-platform.c | 45 +---
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/usb/host/ehci-platform.c b/drivers/usb/host/ehci-platform.c
index 1757ebb..bc33f45 100644
--- a/drivers/usb/host/ehci-platform.c
+++ b/drivers/usb/host/ehci-platform.c
@@ -39,11 +39,12 @@
 
 #define DRIVER_DESC "EHCI generic platform driver"
 #define EHCI_MAX_CLKS 3
+#define EHCI_MAX_RSTS 3
 #define hcd_to_ehci_priv(h) ((struct ehci_platform_priv *)hcd_to_ehci(h)->priv)
 
 struct ehci_platform_priv {
struct clk *clks[EHCI_MAX_CLKS];
-   struct reset_control *rst;
+   struct reset_control *rsts[EHCI_MAX_RSTS];
struct phy **phys;
int num_phys;
bool reset_on_resume;
@@ -149,7 +150,7 @@ static int ehci_platform_probe(struct platform_device *dev)
struct usb_ehci_pdata *pdata = dev_get_platdata(>dev);
struct ehci_platform_priv *priv;
struct ehci_hcd *ehci;
-   int err, irq, phy_num, clk = 0;
+   int err, irq, phy_num, clk = 0, rst;
 
if (usb_disabled())
return -ENODEV;
@@ -234,16 +235,22 @@ static int ehci_platform_probe(struct platform_device 
*dev)
}
}
 
-   priv->rst = devm_reset_control_get_optional(>dev, NULL);
-   if (IS_ERR(priv->rst)) {
-   err = PTR_ERR(priv->rst);
-   if (err == -EPROBE_DEFER)
-   goto err_put_clks;
-   priv->rst = NULL;
-   } else {
-   err = reset_control_deassert(priv->rst);
-   if (err)
-   goto err_put_clks;
+   for (rst = 0; rst < EHCI_MAX_RSTS; rst++) {
+   priv->rsts[rst] = of_reset_control_get_by_index(
+   dev->dev.of_node, rst);
+   if (IS_ERR(priv->rsts[rst])) {
+   err = PTR_ERR(priv->rsts[rst]);
+   if (err == -EPROBE_DEFER)
+   goto err_reset;
+   priv->rsts[rst] = NULL;
+   break;
+   }
+
+   err = reset_control_deassert(priv->rsts[rst]);
+   if (err) {
+   reset_control_put(priv->rsts[rst]);
+   goto err_reset;
+   }
}
 
if (pdata->big_endian_desc)
@@ -300,8 +307,10 @@ err_power:
if (pdata->power_off)
pdata->power_off(dev);
 err_reset:
-   if (priv->rst)
-   reset_control_assert(priv->rst);
+   while (--rst >= 0) {
+   reset_control_assert(priv->rsts[rst]);
+   reset_control_put(priv->rsts[rst]);
+   }
 err_put_clks:
while (--clk >= 0)
clk_put(priv->clks[clk]);
@@ -319,15 +328,17 @@ static int ehci_platform_remove(struct platform_device 
*dev)
struct usb_hcd *hcd = platform_get_drvdata(dev);
struct usb_ehci_pdata *pdata = dev_get_platdata(>dev);
struct ehci_platform_priv *priv = hcd_to_ehci_priv(hcd);
-   int clk;
+   int clk, rst;
 
usb_remove_hcd(hcd);
 
if (pdata->power_off)
pdata->power_off(dev);
 
-   if (priv->rst)
-   reset_control_assert(priv->rst);
+   for (rst = 0; rst < EHCI_MAX_RSTS && priv->rsts[rst]; rst++) {
+   reset_control_assert(priv->rsts[rst]);
+   reset_control_put(priv->rsts[rst]);
+   }
 
for (clk = 0; clk < EHCI_MAX_CLKS && priv->clks[clk]; clk++)
clk_put(priv->clks[clk]);
-- 
1.9.1



Re: [PATCH perf/core v7 07/21] perf symbol: Cleanup the code flow of dso__find_kallsyms

2016-05-11 Thread Masami Hiramatsu
On Wed, 11 May 2016 12:55:50 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:52:27PM +0900, Masami Hiramatsu escreveu:
> > Cleanup the code flow of dso__find_kallsyms() to remove
> > redundant checking code and add some comment for readability.
> > 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/symbol.c |   60 
> > +-
> >  1 file changed, 27 insertions(+), 33 deletions(-)
> > 
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index e112bbd..b2b06b7 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -1629,6 +1629,15 @@ static int find_matching_kcore(struct map *map, char 
> > *dir, size_t dir_sz)
> > return ret;
> >  }
> >  
> > +static bool filename__readable(const char *file)
> > +{
> > +   int fd = open(file, O_RDONLY);
> > +   if (fd < 0)
> > +   return false;
> > +   close(fd);
> > +   return true;
> > +}
> 
> Do we really have to use this big hammer just for checking if a file is
> readable? What is wronte with:
> 
>   access(pathname, R_OK)
> 
> ?

Yes, I actually had done that at prototyping. But as the manpage of access
said, access() checks accessibility by real uid/gid, but open() checks
by effective uid/gid. So, I considered this could cause unexpected
issue if we decided to use setuid(). I just wanted to check it was OK.

> I see, you're keeping the existing logic, but since you've gone to the
> trouble of introducing a seemingly generic function like
> filename__readable(), you could as well use the canonical way to check
> if a file is readable, namely access(R_OK), no?

I agreed in general :) If we don't can use access(R_OK) I'd like to use it.

> 
> Adrian, is there something magical about /proc/kcore for this test to be
> done that way? If so, we better document not to waste our time next time
> we look at this...

Ah, right. At least I had to write a comment about it.

Thank you,

> 
> - Arnaldo
> 
> > +
> >  static char *dso__find_kallsyms(struct dso *dso, struct map *map)
> >  {
> > u8 host_build_id[BUILD_ID_SIZE];
> > @@ -1648,45 +1657,33 @@ static char *dso__find_kallsyms(struct dso *dso, 
> > struct map *map)
> >  sizeof(host_build_id)) == 0)
> > is_host = dso__build_id_equal(dso, host_build_id);
> >  
> > -   build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > -
> > -   scnprintf(path, sizeof(path), "%s/%s/%s", buildid_dir,
> > - DSO__NAME_KCORE, sbuild_id);
> > -
> > -   /* Use /proc/kallsyms if possible */
> > +   /* Try a fast path for /proc/kallsyms if possible */
> 
> How that improves the previous comment?
> 
> > if (is_host) {
> > -   DIR *d;
> > -   int fd;
> > -
> > -   /* If no cached kcore go with /proc/kallsyms */
> > -   d = opendir(path);
> > -   if (!d)
> > -   goto proc_kallsyms;
> > -   closedir(d);
> > -
> > /*
> > -* Do not check the build-id cache, until we know we cannot use
> > -* /proc/kcore.
> > +* Do not check the build-id cache, unless we know we cannot use
> > +* /proc/kcore or module maps don't match to /proc/kallsyms.
> >  */
> > -   fd = open("/proc/kcore", O_RDONLY);
> > -   if (fd != -1) {
> > -   close(fd);
> > -   /* If module maps match go with /proc/kallsyms */
> > -   if (!validate_kcore_addresses("/proc/kallsyms", map))
> > -   goto proc_kallsyms;
> > -   }
> > -
> > -   /* Find kallsyms in build-id cache with kcore */
> > -   if (!find_matching_kcore(map, path, sizeof(path)))
> > -   return strdup(path);
> > -
> > -   goto proc_kallsyms;
> > +   if (filename__readable("/proc/kcore") &&
> > +   !validate_kcore_addresses("/proc/kallsyms", map))
> > +   goto proc_kallsyms;
> > }
> >  
> > +   build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > +
> > /* Find kallsyms in build-id cache with kcore */
> > +   scnprintf(path, sizeof(path), "%s/%s/%s",
> > + buildid_dir, DSO__NAME_KCORE, sbuild_id);
> > +
> > if (!find_matching_kcore(map, path, sizeof(path)))
> > return strdup(path);
> >  
> > +   /* Use current /proc/kallsyms if possible */
> > +   if (is_host) {
> > +proc_kallsyms:
> > +   return strdup("/proc/kallsyms");
> > +   }
> > +
> > +   /* Finally, find a cache of kallsyms */
> > scnprintf(path, sizeof(path), "%s/%s/%s",
> >   buildid_dir, DSO__NAME_KALLSYMS, sbuild_id);
> >  
> > @@ -1697,9 +1694,6 @@ static char *dso__find_kallsyms(struct dso *dso, 
> > struct map *map)
> > }
> >  
> > return strdup(path);
> > -
> > -proc_kallsyms:
> > -   return strdup("/proc/kallsyms");
> >  }
> 

Re: [PATCH perf/core v7 07/21] perf symbol: Cleanup the code flow of dso__find_kallsyms

2016-05-11 Thread Masami Hiramatsu
On Wed, 11 May 2016 12:55:50 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:52:27PM +0900, Masami Hiramatsu escreveu:
> > Cleanup the code flow of dso__find_kallsyms() to remove
> > redundant checking code and add some comment for readability.
> > 
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/symbol.c |   60 
> > +-
> >  1 file changed, 27 insertions(+), 33 deletions(-)
> > 
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index e112bbd..b2b06b7 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -1629,6 +1629,15 @@ static int find_matching_kcore(struct map *map, char 
> > *dir, size_t dir_sz)
> > return ret;
> >  }
> >  
> > +static bool filename__readable(const char *file)
> > +{
> > +   int fd = open(file, O_RDONLY);
> > +   if (fd < 0)
> > +   return false;
> > +   close(fd);
> > +   return true;
> > +}
> 
> Do we really have to use this big hammer just for checking if a file is
> readable? What is wronte with:
> 
>   access(pathname, R_OK)
> 
> ?

Yes, I actually had done that at prototyping. But as the manpage of access
said, access() checks accessibility by real uid/gid, but open() checks
by effective uid/gid. So, I considered this could cause unexpected
issue if we decided to use setuid(). I just wanted to check it was OK.

> I see, you're keeping the existing logic, but since you've gone to the
> trouble of introducing a seemingly generic function like
> filename__readable(), you could as well use the canonical way to check
> if a file is readable, namely access(R_OK), no?

I agreed in general :) If we don't can use access(R_OK) I'd like to use it.

> 
> Adrian, is there something magical about /proc/kcore for this test to be
> done that way? If so, we better document not to waste our time next time
> we look at this...

Ah, right. At least I had to write a comment about it.

Thank you,

> 
> - Arnaldo
> 
> > +
> >  static char *dso__find_kallsyms(struct dso *dso, struct map *map)
> >  {
> > u8 host_build_id[BUILD_ID_SIZE];
> > @@ -1648,45 +1657,33 @@ static char *dso__find_kallsyms(struct dso *dso, 
> > struct map *map)
> >  sizeof(host_build_id)) == 0)
> > is_host = dso__build_id_equal(dso, host_build_id);
> >  
> > -   build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > -
> > -   scnprintf(path, sizeof(path), "%s/%s/%s", buildid_dir,
> > - DSO__NAME_KCORE, sbuild_id);
> > -
> > -   /* Use /proc/kallsyms if possible */
> > +   /* Try a fast path for /proc/kallsyms if possible */
> 
> How that improves the previous comment?
> 
> > if (is_host) {
> > -   DIR *d;
> > -   int fd;
> > -
> > -   /* If no cached kcore go with /proc/kallsyms */
> > -   d = opendir(path);
> > -   if (!d)
> > -   goto proc_kallsyms;
> > -   closedir(d);
> > -
> > /*
> > -* Do not check the build-id cache, until we know we cannot use
> > -* /proc/kcore.
> > +* Do not check the build-id cache, unless we know we cannot use
> > +* /proc/kcore or module maps don't match to /proc/kallsyms.
> >  */
> > -   fd = open("/proc/kcore", O_RDONLY);
> > -   if (fd != -1) {
> > -   close(fd);
> > -   /* If module maps match go with /proc/kallsyms */
> > -   if (!validate_kcore_addresses("/proc/kallsyms", map))
> > -   goto proc_kallsyms;
> > -   }
> > -
> > -   /* Find kallsyms in build-id cache with kcore */
> > -   if (!find_matching_kcore(map, path, sizeof(path)))
> > -   return strdup(path);
> > -
> > -   goto proc_kallsyms;
> > +   if (filename__readable("/proc/kcore") &&
> > +   !validate_kcore_addresses("/proc/kallsyms", map))
> > +   goto proc_kallsyms;
> > }
> >  
> > +   build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > +
> > /* Find kallsyms in build-id cache with kcore */
> > +   scnprintf(path, sizeof(path), "%s/%s/%s",
> > + buildid_dir, DSO__NAME_KCORE, sbuild_id);
> > +
> > if (!find_matching_kcore(map, path, sizeof(path)))
> > return strdup(path);
> >  
> > +   /* Use current /proc/kallsyms if possible */
> > +   if (is_host) {
> > +proc_kallsyms:
> > +   return strdup("/proc/kallsyms");
> > +   }
> > +
> > +   /* Finally, find a cache of kallsyms */
> > scnprintf(path, sizeof(path), "%s/%s/%s",
> >   buildid_dir, DSO__NAME_KALLSYMS, sbuild_id);
> >  
> > @@ -1697,9 +1694,6 @@ static char *dso__find_kallsyms(struct dso *dso, 
> > struct map *map)
> > }
> >  
> > return strdup(path);
> > -
> > -proc_kallsyms:
> > -   return strdup("/proc/kallsyms");
> >  }
> >  
> >  static int 

Re: [PATCH perf/core v7 01/21] tools/perf: Fix lsdir to set errno correctly

2016-05-11 Thread Masami Hiramatsu
Hi Arnaldo,

On Wed, 11 May 2016 10:59:39 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:51:27PM +0900, Masami Hiramatsu escreveu:
> > Fix lsdir() to set correct positive error number (ENOMEM).
> > Since "errno" must have a positive error number instead of
> > negative number, fix lsdir to set it correctly.
> 
> Masami, please try to:
> 
> 1) Follow existing convention when writing the patch Subject line, when
> in doubt: git log tools/perf/util/util.c and check, in this case I
> changed it to "perf tools: Fix lsdir to set errno correctly"

OK, I see.

> 2) Add a Fixed line, for this case I used 'git blame' and added:
> 
> Fixes: e1ce726e1db2 ("perf tools: Add lsdir() helper to read a directory")
> 
> This helps reviewing (now and down the line), because one can go more
> quickly check what was the intention in the original patch, to see if it
> is really a fix.

Oops! I just missed that. Of course I will do.

> 
> In some cases this looks trivial, like in this patch, sometimes its
> not...

I see.

Thank you,

> 
> Thanks,
> 
> - Arnaldo
>  
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/util.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> > index 01c9433..eab077a 100644
> > --- a/tools/perf/util/util.c
> > +++ b/tools/perf/util/util.c
> > @@ -139,7 +139,7 @@ struct strlist *lsdir(const char *name,
> >  
> > list = strlist__new(NULL, NULL);
> > if (!list) {
> > -   errno = -ENOMEM;
> > +   errno = ENOMEM;
> > goto out;
> > }
> >  


-- 
Masami Hiramatsu 


Re: [PATCH perf/core v7 01/21] tools/perf: Fix lsdir to set errno correctly

2016-05-11 Thread Masami Hiramatsu
Hi Arnaldo,

On Wed, 11 May 2016 10:59:39 -0300
Arnaldo Carvalho de Melo  wrote:

> Em Wed, May 11, 2016 at 10:51:27PM +0900, Masami Hiramatsu escreveu:
> > Fix lsdir() to set correct positive error number (ENOMEM).
> > Since "errno" must have a positive error number instead of
> > negative number, fix lsdir to set it correctly.
> 
> Masami, please try to:
> 
> 1) Follow existing convention when writing the patch Subject line, when
> in doubt: git log tools/perf/util/util.c and check, in this case I
> changed it to "perf tools: Fix lsdir to set errno correctly"

OK, I see.

> 2) Add a Fixed line, for this case I used 'git blame' and added:
> 
> Fixes: e1ce726e1db2 ("perf tools: Add lsdir() helper to read a directory")
> 
> This helps reviewing (now and down the line), because one can go more
> quickly check what was the intention in the original patch, to see if it
> is really a fix.

Oops! I just missed that. Of course I will do.

> 
> In some cases this looks trivial, like in this patch, sometimes its
> not...

I see.

Thank you,

> 
> Thanks,
> 
> - Arnaldo
>  
> > Signed-off-by: Masami Hiramatsu 
> > ---
> >  tools/perf/util/util.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> > index 01c9433..eab077a 100644
> > --- a/tools/perf/util/util.c
> > +++ b/tools/perf/util/util.c
> > @@ -139,7 +139,7 @@ struct strlist *lsdir(const char *name,
> >  
> > list = strlist__new(NULL, NULL);
> > if (!list) {
> > -   errno = -ENOMEM;
> > +   errno = ENOMEM;
> > goto out;
> > }
> >  


-- 
Masami Hiramatsu 


Re: [PATCH 3/3] clk: bcm2835: Skip PLLC clocks when deciding on a new clock parent

2016-05-11 Thread Eric Anholt
Stephen Boyd  writes:

> On 05/09, Eric Anholt wrote:
>> If the firmware had set up a clock to source from PLLC, go along with
>> it.  But if we're looking for a new parent, we don't want to switch it
>> to PLLC because the firmware will force PLLC (and thus the AXI bus
>> clock) to different frequencies during over-temp/under-voltage,
>> without notification to Linux.
>> 
>> On my system, this moves the Linux-enabled HDMI state machine and DSI1
>> escape clock over to plld_per from pllc_per.  EMMC still ends up on
>> pllc_per, because the firmware had set it up to use that.
>
> Is it ok for EMMC rate to change with over-temp/under-voltage?
> The description makes it sound like PLLC is for clks that want to
> run at some system bus rate and they don't care about exact
> frequencies.

I'm surprised it's OK for it to change, but the firmware is very
intentionally putting it on PLLC (there's this #define for where most
peripherals go, and EMMC doesn't use it and goes for PLLC instead).  If
we do decide we want to override the firmware, I suspect we'll use
assigned-clock-parents for that.

>> Signed-off-by: Eric Anholt 
>> Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio 
>> domain clocks")
>> diff --git a/drivers/clk/bcm/clk-bcm2835.c b/drivers/clk/bcm/clk-bcm2835.c
>> index 1091012ecec6..1d8f29ea9f69 100644
>> --- a/drivers/clk/bcm/clk-bcm2835.c
>> +++ b/drivers/clk/bcm/clk-bcm2835.c
>> @@ -1008,16 +1008,28 @@ static int bcm2835_clock_set_rate(struct clk_hw *hw,
>>  return 0;
>>  }
>>  
>> +static bool
>> +bcm2835_clk_is_pllc(struct clk_hw *hw)
>> +{
>> +if (!hw)
>> +return false;
>> +
>> +return strncmp(clk_hw_get_name(hw), "pllc", 4) == 0;
>
> This strcmp is not great. Any chance we could look for the parent
> by reading the hardware and knowing what bit corresponds to pllc
> as a parent? That would be much nicer so that we don't rely on
> string comparisons for something the hardware can tell us.

We just have the parent index, but which indices are pllc dividers is
different between bcm2835_clock_per_parents and
bcm2835_clock_vpu_parents.  So, I guess it's possible, but seems more
error-prone than the strncmp.


signature.asc
Description: PGP signature


Re: [PATCH 3/3] clk: bcm2835: Skip PLLC clocks when deciding on a new clock parent

2016-05-11 Thread Eric Anholt
Stephen Boyd  writes:

> On 05/09, Eric Anholt wrote:
>> If the firmware had set up a clock to source from PLLC, go along with
>> it.  But if we're looking for a new parent, we don't want to switch it
>> to PLLC because the firmware will force PLLC (and thus the AXI bus
>> clock) to different frequencies during over-temp/under-voltage,
>> without notification to Linux.
>> 
>> On my system, this moves the Linux-enabled HDMI state machine and DSI1
>> escape clock over to plld_per from pllc_per.  EMMC still ends up on
>> pllc_per, because the firmware had set it up to use that.
>
> Is it ok for EMMC rate to change with over-temp/under-voltage?
> The description makes it sound like PLLC is for clks that want to
> run at some system bus rate and they don't care about exact
> frequencies.

I'm surprised it's OK for it to change, but the firmware is very
intentionally putting it on PLLC (there's this #define for where most
peripherals go, and EMMC doesn't use it and goes for PLLC instead).  If
we do decide we want to override the firmware, I suspect we'll use
assigned-clock-parents for that.

>> Signed-off-by: Eric Anholt 
>> Fixes: 41691b8862e2 ("clk: bcm2835: Add support for programming the audio 
>> domain clocks")
>> diff --git a/drivers/clk/bcm/clk-bcm2835.c b/drivers/clk/bcm/clk-bcm2835.c
>> index 1091012ecec6..1d8f29ea9f69 100644
>> --- a/drivers/clk/bcm/clk-bcm2835.c
>> +++ b/drivers/clk/bcm/clk-bcm2835.c
>> @@ -1008,16 +1008,28 @@ static int bcm2835_clock_set_rate(struct clk_hw *hw,
>>  return 0;
>>  }
>>  
>> +static bool
>> +bcm2835_clk_is_pllc(struct clk_hw *hw)
>> +{
>> +if (!hw)
>> +return false;
>> +
>> +return strncmp(clk_hw_get_name(hw), "pllc", 4) == 0;
>
> This strcmp is not great. Any chance we could look for the parent
> by reading the hardware and knowing what bit corresponds to pllc
> as a parent? That would be much nicer so that we don't rely on
> string comparisons for something the hardware can tell us.

We just have the parent index, but which indices are pllc dividers is
different between bcm2835_clock_per_parents and
bcm2835_clock_vpu_parents.  So, I guess it's possible, but seems more
error-prone than the strncmp.


signature.asc
Description: PGP signature


Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread chunfeng yun
Hi,

On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> >  wrote:
> > 
> > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > fixed when set its default mode as usb iddig.
> > >
> > > Signed-off-by: Chunfeng Yun 
> > 
> 
> Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> If you want to set its default mode to iddig, you should set it in dts.
> 
I set it in DTS, but it didn't work, because when usb driver requested
IRQ, pinmux was switched back to default mode set by
MTK_EINT_FUNCTION().


> Yours,
> Hongzhou
> 




Re: [PATCH] tty: serial: msm: Disable restoring Rx interrupts for DMA Mode

2016-05-11 Thread Stephen Boyd
On 05/10, Abhishek Sahu wrote:
> From: Charanya 

Was it intentional to only have one name here?

> 
> The Data loss was happening with current QCOM MSM serial driver during
> large file transfer due to simultaneous enabling of both UART and
> DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> watermark) or RXSTALE (stale interrupts) should not be enabled,
> since these conditions will be handled by DMA controller itself.
> If these interrupts are enabled then normal UART ISR will read some
> bytes of data from Rx Buffer and DMA controller will not receive
> these bytes of data, which will cause data loss.
> 
> Now this patch removed the code for enabling of RXLEV and RXSTALE
> interrupt in DMA Rx completion routine.

I'm lost, we keep both these irqs masked (well only if uartdm
version is 1.4 or greater) pretty much the entire time we're
using DMA for RX. msm_start_rx_dma() will mask them and then when
the callback completes we'll unmask them (the part that's deleted
in this patch), but then we'll go back and remask them almost
immediately because we call msm_start_rx_dma() from the dma
completion handler.

Can you clearly describe how this is actually fixing any
problems? What's the sequence of events that happens to cause
corruption?

This does raise the question though why we ever mask/unmask these
interrupts if we're always going to keep them masked while doing
DMA RX. Presumably if we can use DMA to RX, we can always use it
and set things up properly at startup time instead of later on.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 6/6] pinctrl: mt8173: set GPIO16 to usb iddig mode

2016-05-11 Thread chunfeng yun
Hi,

On Wed, 2016-05-11 at 11:32 -0700, Hongzhou Yang wrote:
> On Wed, 2016-05-11 at 13:56 +0200, Linus Walleij wrote:
> > On Tue, May 10, 2016 at 10:23 AM, Chunfeng Yun
> >  wrote:
> > 
> > > the default mode of GPIO16 pin is gpio, when set EINT16 to
> > > IRQ_TYPE_LEVEL_HIGH, no interrupt is triggered, it can be
> > > fixed when set its default mode as usb iddig.
> > >
> > > Signed-off-by: Chunfeng Yun 
> > 
> 
> Chunfeng, GPIO16 can be used as EINT16 mode, but the pinmux should be 0.
> If you want to set its default mode to iddig, you should set it in dts.
> 
I set it in DTS, but it didn't work, because when usb driver requested
IRQ, pinmux was switched back to default mode set by
MTK_EINT_FUNCTION().


> Yours,
> Hongzhou
> 




Re: [PATCH] tty: serial: msm: Disable restoring Rx interrupts for DMA Mode

2016-05-11 Thread Stephen Boyd
On 05/10, Abhishek Sahu wrote:
> From: Charanya 

Was it intentional to only have one name here?

> 
> The Data loss was happening with current QCOM MSM serial driver during
> large file transfer due to simultaneous enabling of both UART and
> DMA interrupt. When UART operates in DMA mode, RXLEV (Rx FIFO over
> watermark) or RXSTALE (stale interrupts) should not be enabled,
> since these conditions will be handled by DMA controller itself.
> If these interrupts are enabled then normal UART ISR will read some
> bytes of data from Rx Buffer and DMA controller will not receive
> these bytes of data, which will cause data loss.
> 
> Now this patch removed the code for enabling of RXLEV and RXSTALE
> interrupt in DMA Rx completion routine.

I'm lost, we keep both these irqs masked (well only if uartdm
version is 1.4 or greater) pretty much the entire time we're
using DMA for RX. msm_start_rx_dma() will mask them and then when
the callback completes we'll unmask them (the part that's deleted
in this patch), but then we'll go back and remask them almost
immediately because we call msm_start_rx_dma() from the dma
completion handler.

Can you clearly describe how this is actually fixing any
problems? What's the sequence of events that happens to cause
corruption?

This does raise the question though why we ever mask/unmask these
interrupts if we're always going to keep them masked while doing
DMA RX. Presumably if we can use DMA to RX, we can always use it
and set things up properly at startup time instead of later on.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, May 11, 2016 11:54 PM
> 
> On Wed, 11 May 2016 06:29:06 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, May 05, 2016 11:05 PM
> > >
> > > On Thu, 5 May 2016 12:15:46 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > >
> > > > > Hi David and Kevin,
> > > > >
> > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > >
> > > > > > From: Tian, Kevin
> > > > > >> Sent: 05 May 2016 10:37
> > > > > > ...
> > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > > > >>>
> > > > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > the receive buffer ring to contain a single word entry that
> > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > then using a loopback mode to cause a specific packet be
> > > > > > received that writes the required word through that address.
> > > > > >
> > > > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > > > cycle.
> > > > > >
> > > > > > David
> > > > > >
> > > > >
> > > > > If we have enough permission to load a malicious driver or
> > > > > kernel, we can easily break the guest without exposed
> > > > > MSI-X table.
> > > > >
> > > > > I think it should be safe to expose MSI-X table if we can
> > > > > make sure that malicious guest driver/kernel can't use
> > > > > the MSI-X table to break other guest or host. The
> > > > > capability of IRQ remapping could provide this
> > > > > kind of protection.
> > > > >
> > > >
> > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > structure to guest. I know actual IRQ remapping might be platform
> > > > specific, but at least for Intel VT-d specification, MSI-X entry must
> > > > be configured with a remappable format by host kernel which
> > > > contains an index into IRQ remapping table. The index will find a
> > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > device. If you allow a malicious program random index into MSI-X
> > > > entry of assigned device, the hole is obvious...
> > > >
> > > > Above might make sense only for a IRQ remapping implementation
> > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > passthrough based on this fact instead of general IRQ remapping
> > > > enabled or not.
> > >
> > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > table to the guest and the guest can make direct use of it.  The end
> > > goal here is that the guest on a power system is already
> > > paravirtualized to not program the device MSI-X by directly writing to
> > > the MSI-X vector table.  They have hypercalls for this since they
> > > always run virtualized.  Therefore a) they never intend to touch the
> > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > can only hurt itself by doing so.
> > >
> > > On x86 we don't have a), our method of programming the MSI-X vector
> > > table is to directly write to it. Therefore we will always require QEMU
> > > to place a MemoryRegion over the vector table to intercept those
> > > accesses.  However with interrupt remapping, we do have b) on x86, which
> > > means that we don't need to be so strict in disallowing user accesses
> > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > the device, but the user should only be able to hurt themselves by
> > > writing it directly.  x86 doesn't really get anything out of this
> > > change, but it helps this special case on power pretty significantly
> > > aiui.  Thanks,
> > >
> >
> > Allowing guest direct write to MSI-x table has system-wide impact.
> > As I explained earlier, hypervisor needs to control "interrupt_index"
> > programmed in MSI-X entry, which is used to associate a specific
> > IRQ remapping entry. Now if you expose whole MSI-x table to guest,
> > it can program random index into MSI-X entry to associate with
> > any IRQ remapping entry and then there won't be any isolation per se.
> >
> > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d
> > spec.
> 
> I think you're extrapolating beyond the vfio interface.  The change
> here is to remove the vfio protection of the MSI-X vector table when
> the system provides interrupt isolation.  The argument is that this is
> safe to do because the hardware 

RE: [PATCH 5/5] vfio-pci: Allow to mmap MSI-X table if interrupt remapping is supported

2016-05-11 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, May 11, 2016 11:54 PM
> 
> On Wed, 11 May 2016 06:29:06 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, May 05, 2016 11:05 PM
> > >
> > > On Thu, 5 May 2016 12:15:46 +
> > > "Tian, Kevin"  wrote:
> > >
> > > > > From: Yongji Xie [mailto:xyj...@linux.vnet.ibm.com]
> > > > > Sent: Thursday, May 05, 2016 7:43 PM
> > > > >
> > > > > Hi David and Kevin,
> > > > >
> > > > > On 2016/5/5 17:54, David Laight wrote:
> > > > >
> > > > > > From: Tian, Kevin
> > > > > >> Sent: 05 May 2016 10:37
> > > > > > ...
> > > > > >>> Acutually, we are not aimed at accessing MSI-X table from
> > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we
> > > > > >>> can make sure guest kernel would not touch MSI-X table in
> > > > > >>> normal code path such as para-virtualized guest kernel on PPC64.
> > > > > >>>
> > > > > >> Then how do you prevent malicious guest kernel accessing it?
> > > > > > Or a malicious guest driver for an ethernet card setting up
> > > > > > the receive buffer ring to contain a single word entry that
> > > > > > contains the address associated with an MSI-X interrupt and
> > > > > > then using a loopback mode to cause a specific packet be
> > > > > > received that writes the required word through that address.
> > > > > >
> > > > > > Remember the PCIe cycle for an interrupt is a normal memory write
> > > > > > cycle.
> > > > > >
> > > > > > David
> > > > > >
> > > > >
> > > > > If we have enough permission to load a malicious driver or
> > > > > kernel, we can easily break the guest without exposed
> > > > > MSI-X table.
> > > > >
> > > > > I think it should be safe to expose MSI-X table if we can
> > > > > make sure that malicious guest driver/kernel can't use
> > > > > the MSI-X table to break other guest or host. The
> > > > > capability of IRQ remapping could provide this
> > > > > kind of protection.
> > > > >
> > > >
> > > > With IRQ remapping it doesn't mean you can pass through MSI-X
> > > > structure to guest. I know actual IRQ remapping might be platform
> > > > specific, but at least for Intel VT-d specification, MSI-X entry must
> > > > be configured with a remappable format by host kernel which
> > > > contains an index into IRQ remapping table. The index will find a
> > > > IRQ remapping entry which controls interrupt routing for a specific
> > > > device. If you allow a malicious program random index into MSI-X
> > > > entry of assigned device, the hole is obvious...
> > > >
> > > > Above might make sense only for a IRQ remapping implementation
> > > > which doesn't rely on extended MSI-X format (e.g. simply based on
> > > > BDF). If that's the case for PPC, then you should build MSI-X
> > > > passthrough based on this fact instead of general IRQ remapping
> > > > enabled or not.
> > >
> > > I don't think anyone is expecting that we can expose the MSI-X vector
> > > table to the guest and the guest can make direct use of it.  The end
> > > goal here is that the guest on a power system is already
> > > paravirtualized to not program the device MSI-X by directly writing to
> > > the MSI-X vector table.  They have hypercalls for this since they
> > > always run virtualized.  Therefore a) they never intend to touch the
> > > MSI-X vector table and b) they have sufficient isolation that a guest
> > > can only hurt itself by doing so.
> > >
> > > On x86 we don't have a), our method of programming the MSI-X vector
> > > table is to directly write to it. Therefore we will always require QEMU
> > > to place a MemoryRegion over the vector table to intercept those
> > > accesses.  However with interrupt remapping, we do have b) on x86, which
> > > means that we don't need to be so strict in disallowing user accesses
> > > to the MSI-X vector table.  It's not useful for configuring MSI-X on
> > > the device, but the user should only be able to hurt themselves by
> > > writing it directly.  x86 doesn't really get anything out of this
> > > change, but it helps this special case on power pretty significantly
> > > aiui.  Thanks,
> > >
> >
> > Allowing guest direct write to MSI-x table has system-wide impact.
> > As I explained earlier, hypervisor needs to control "interrupt_index"
> > programmed in MSI-X entry, which is used to associate a specific
> > IRQ remapping entry. Now if you expose whole MSI-x table to guest,
> > it can program random index into MSI-X entry to associate with
> > any IRQ remapping entry and then there won't be any isolation per se.
> >
> > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d
> > spec.
> 
> I think you're extrapolating beyond the vfio interface.  The change
> here is to remove the vfio protection of the MSI-X vector table when
> the system provides interrupt isolation.  The argument is that this is
> safe to do because the hardware protects the host from erroneous and
> malicious 

Re: [PATCH v8 6/8] dt-bindings: i2c: rk3x: add support for rk3399

2016-05-11 Thread David.Wu

Hi Doug,

在 2016/5/12 0:35, Doug Anderson 写道:

Hi,

On Tue, May 10, 2016 at 12:30 PM, David Wu  wrote:

The bus clock and function clock are separated at rk3399,
and others use one clock as the bus clock and function clock.

Signed-off-by: David Wu 
Reviewed-by: Douglas Anderson 
---
Change in v8:
- remove error description.

  Documentation/devicetree/bindings/i2c/i2c-rk3x.txt | 16 +---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt 
b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
index 0b4a85f..5429301 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
@@ -6,10 +6,20 @@ RK3xxx SoCs.
  Required properties :

   - reg : Offset and length of the register set for the device
- - compatible : should be "rockchip,rk3066-i2c", "rockchip,rk3188-i2c",
-   "rockchip,rk3228-i2c" or "rockchip,rk3288-i2c".
+ - compatible: should be one of the following:
+   - "rockchip,rk3066-i2c": for rk3066
+   - "rockchip,rk3188-i2c": for rk3188
+   - "rockchip,rk3228-i2c": for rk3228
+   - "rockchip,rk3288-i2c": for rk3288
+   - "rockchip,rk3399-i2c": for rk3399
   - interrupts : interrupt number
- - clocks : parent clock
+ - clocks: See ../clock/clock-bindings.txt
+   - For older hardware (rk3066, rk3188, rk3228, rk3288):
+ - There is one clock that's used both to derive the functional clock
+   for the device and as the bus clock.
+   - For newer hardware (rk3399): specified by name
+ - "i2c": REQUIRED. This is used to derive the functional clock.
+ - "pclk": REQUIRED. This is the bus clock.


Depending on what Rob thinks, it might make sense to remove the above
two "REQUIRED" bits.  That would match his earlier feedback since
we're still in the "Required" section and thus it is redundant.



Okay, i make a little misunderstand for that, so i will fix it in next 
version.



-Doug







Re: [PATCH v8 6/8] dt-bindings: i2c: rk3x: add support for rk3399

2016-05-11 Thread David.Wu

Hi Doug,

在 2016/5/12 0:35, Doug Anderson 写道:

Hi,

On Tue, May 10, 2016 at 12:30 PM, David Wu  wrote:

The bus clock and function clock are separated at rk3399,
and others use one clock as the bus clock and function clock.

Signed-off-by: David Wu 
Reviewed-by: Douglas Anderson 
---
Change in v8:
- remove error description.

  Documentation/devicetree/bindings/i2c/i2c-rk3x.txt | 16 +---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt 
b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
index 0b4a85f..5429301 100644
--- a/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
+++ b/Documentation/devicetree/bindings/i2c/i2c-rk3x.txt
@@ -6,10 +6,20 @@ RK3xxx SoCs.
  Required properties :

   - reg : Offset and length of the register set for the device
- - compatible : should be "rockchip,rk3066-i2c", "rockchip,rk3188-i2c",
-   "rockchip,rk3228-i2c" or "rockchip,rk3288-i2c".
+ - compatible: should be one of the following:
+   - "rockchip,rk3066-i2c": for rk3066
+   - "rockchip,rk3188-i2c": for rk3188
+   - "rockchip,rk3228-i2c": for rk3228
+   - "rockchip,rk3288-i2c": for rk3288
+   - "rockchip,rk3399-i2c": for rk3399
   - interrupts : interrupt number
- - clocks : parent clock
+ - clocks: See ../clock/clock-bindings.txt
+   - For older hardware (rk3066, rk3188, rk3228, rk3288):
+ - There is one clock that's used both to derive the functional clock
+   for the device and as the bus clock.
+   - For newer hardware (rk3399): specified by name
+ - "i2c": REQUIRED. This is used to derive the functional clock.
+ - "pclk": REQUIRED. This is the bus clock.


Depending on what Rob thinks, it might make sense to remove the above
two "REQUIRED" bits.  That would match his earlier feedback since
we're still in the "Required" section and thus it is redundant.



Okay, i make a little misunderstand for that, so i will fix it in next 
version.



-Doug







  1   2   3   4   5   6   7   8   9   10   >