date:20121031

Re: [PATCH] lockd: fix races in per-net NSM client handling

2012-10-31 Thread Paweł Sikora

On Wednesday 31 of October 2012 11:22:06 Greg KH wrote:
> On Wed, Oct 31, 2012 at 11:05:51AM -0700, Jonathan Nieder wrote:
> > Hi,
> > 
> > Greg KH wrote:
> > > On Wed, Oct 31, 2012 at 06:27:36PM +0100, Paweł Sikora wrote:
> > 
> > >> the patch mentioned in https://lkml.org/lkml/2012/10/24/175 seems to fix
> > >> the 3.6.3 oops (while 3.6.2 works fine) on a 16-core Opteron server.
> > >> please queue this patch for 3.6.$next.
> > >
> > > Is it in Linus's tree already?  If so, what is the git commit id?
> > 
> > One of
> > 
> >   a4ee8d978e47 LOCKD: fix races in nsm_client_get
> >   e498daa81295 LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero
> > 
> > both of which were included in v3.6.5.
> 
> Ok, Paweł, does 3.6.5 work properly for you?

After ~12h of uptime under full cpu/nfs load, all servers running 3.6.5 seem
to be stable.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

regulator: tps51632: Seems current code doesn't properly support dvfs_step_20mV case

2012-10-31 Thread Axel Lin

Hi Laxman,

While reading the tps51632 driver, I found a dvfs_step_20mV setting in the
platform data. But it seems the current code doesn't properly handle the case
when dvfs_step_20mV is true.

I guess if pdata->dvfs_step_20mV is true, we need to:

1. Set tps->desc.uV_step = TPS51632_VOLATGE_STEP_20mV;
2. Fix the TPS51632_VOLT_VSEL macro to support the dvfs_step_20mV case.

Also, I'm wondering if either TPS51632_MAX_VSEL/TPS51632_MAX_VOLATGE or
desc.n_voltages needs to change for the dvfs_step_20mV case.

I don't have the datasheet, so my understanding might be wrong.

Regards,
Axel



[Suggestion] net-ipv6: format %8s change to %16s in rt6_info_route function of route.c

2012-10-31 Thread Chen Gang

Hello:

1) For the public (mainline) kernel:

   A) in the rt6_info_route function of net/ipv6/route.c

   B) the length of rt->rt6i_dev->name is 16 (IFNAMSIZ)

   C) using %16s is better than %8s (the output will be more "beautiful")
      (I also suggest deleting RT6_INFO_LEN, since it is no longer useful)



2) For Red Hat RHEL5:

   A) in the rt6_info_route function of net/ipv6/route.c

   B) the length of rt->rt6i_dev->name is 16 (IFNAMSIZ)

   C) RT6_INFO_LEN is still in use there, so this is a correctness issue.

   The relevant patch for RHEL5 is below:

-
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 08ab51f..3c90b4c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2285,7 +2285,7 @@ void inet6_rt_notify(int event, struct rt6_info
*rt, struct nlmsghdr *nlh,

 #ifdef CONFIG_PROC_FS

-#define RT6_INFO_LEN (32 + 4 + 32 + 4 + 32 + 40 + 5 + 1)
+#define RT6_INFO_LEN (32 + 4 + 32 + 4 + 32 + 48 + 5 + 1)

 struct rt6_proc_arg
 {
@@ -2343,7 +2343,7 @@ static int rt6_info_route(struct rt6_info *rt,
void *p_arg)
arg->len += 32;
}
arg->len += sprintf(arg->buffer + arg->len,
-   " %08x %08x %08x %08x %8s\n",
+   " %08x %08x %08x %08x %16s\n",
rt->rt6i_metric,
atomic_read(&rt->u.dst.__refcnt),
rt->u.dst.__use, rt->rt6i_flags,
rt->rt6i_dev ? rt->rt6i_dev->name : "");
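
The width argument can be checked in user space. The following is a minimal sketch, not the kernel code: field_len() is a hypothetical helper and the interface names used to exercise it are made up. It relies only on the standard printf rule that a field width pads short strings but never truncates long ones, so with %8s any name longer than 8 characters lengthens the line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: length of " <name>\n" when the name is printed
 * with the given minimum field width. The width pads short names but
 * never truncates long ones. */
static int field_len(int width, const char *name)
{
	char buf[64];
	int n = snprintf(buf, sizeof(buf), " %*s\n", width, name);

	assert(n > 0 && (size_t)n < sizeof(buf));
	return n;
}
```

With a width of 16, every name of up to 15 characters (IFNAMSIZ minus the terminating NUL) produces the same field length, which is what a fixed per-line length such as RT6_INFO_LEN assumes.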



-- 
Chen Gang

Asianux Corporation

RE: [PATCH 2/2] Revert pad config check in xen_check_mwait

2012-10-31 Thread Liu, Jinsong

Updated, adding version check at mwait routine.

Thanks,
Jinsong


>From 27e28963d4d25e4c998b5b5ea3828a02e6de9470 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Thu, 1 Nov 2012 21:18:43 +0800
Subject: [PATCH 2/2] Revert pad config check in xen_check_mwait

With Xen acpi pad logic added into the kernel, we can now revert the xen mwait
related patch df88b2d96e36d9a9e325bfcd12eb45671cbbc937. The reason is that, when
running under a newer Xen platform, the Xen pad driver is loaded early, so the
native pad driver fails to load, and hence there is no longer any
mwait/monitor #UD risk.

Another point is that only Xen 4.2 or later supports Xen acpi pad, so we won't
expose the mwait cpuid capability when running under an older Xen platform.

Signed-off-by: Liu, Jinsong 
---
 arch/x86/xen/enlighten.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 586d838..9e22e41 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -287,8 +287,7 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 
 static bool __init xen_check_mwait(void)
 {
-#if defined(CONFIG_ACPI) && !defined(CONFIG_ACPI_PROCESSOR_AGGREGATOR) && \
-   !defined(CONFIG_ACPI_PROCESSOR_AGGREGATOR_MODULE)
+#ifdef CONFIG_ACPI
struct xen_platform_op op = {
.cmd= XENPF_set_processor_pminfo,
.u.set_pminfo.id= -1,
@@ -297,6 +296,10 @@ static bool __init xen_check_mwait(void)
uint32_t buf[3];
unsigned int ax, bx, cx, dx;
unsigned int mwait_mask;
+   unsigned int version = HYPERVISOR_xen_version(XENVER_version, NULL);
+   unsigned int major = version >> 16;
+   unsigned int minor = version & 0xffff;
+
 
/* We need to determine whether it is OK to expose the MWAIT
 * capability to the kernel to harvest deeper than C3 states from ACPI
@@ -309,6 +312,13 @@ static bool __init xen_check_mwait(void)
if (!xen_initial_domain())
return false;
 
+   /*
+* When running under platform earlier than Xen4.2, do not expose
+* mwait, to avoid the risk of loading native acpi pad driver
+*/
+   if (((major == 4) && (minor < 2)) || (major < 4))
+   return false;
+
ax = 1;
cx = 0;
 
-- 
1.7.1
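
The version test in the hunk above can be tried in isolation. This is a sketch under the assumption, taken from the shifts in the patch, that the value packs the major number in the upper 16 bits and the minor number in the lower 16 bits. xen_version_at_least_4_2() is a hypothetical name; it returns the logical complement of the patch's early-return condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the packed version (major in the upper
 * 16 bits, minor in the lower 16 bits) is 4.2 or later. */
static bool xen_version_at_least_4_2(uint32_t version)
{
	uint32_t major = version >> 16;
	uint32_t minor = version & 0xffff;

	return major > 4 || (major == 4 && minor >= 2);
}
```

Under this encoding 4.1 and 3.4 are rejected, while 4.2 and 5.0 are accepted.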



Problem with DISCARD and RAID5

2012-10-31 Thread NeilBrown


Hi Shaohua,
 I've been doing some testing and discovered a problem with your discard
 support for RAID5.

 The code in blkdev_issue_discard assumes that the 'granularity' is a power
 of 2, and for example subtracts 1 to get a mask.

 However RAID5 sets the granularity to be the stripe size which often is not
 a power of two.  When this happens you can easily get into an infinite loop.

 I suspect that to make this work properly, blkdev_issue_discard will need to
 be changed to allow 'granularity' to be an arbitrary value.
 When it is a power of two, the current masking can be used.
 When it is anything else, it will need to use sector_div().
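
A minimal user-space sketch of the two rounding strategies described above, for illustration only: the function names are hypothetical, and plain % stands in for the kernel's sector_div(). It is not the actual blkdev_issue_discard code.

```c
#include <assert.h>
#include <stdint.h>

/* Round sector down to a multiple of granularity. The mask shortcut is
 * valid only for powers of two; otherwise fall back to a remainder. */
static uint64_t round_down_granularity(uint64_t sector, uint32_t granularity)
{
	assert(granularity != 0);

	if ((granularity & (granularity - 1)) == 0)
		return sector & ~((uint64_t)granularity - 1);

	return sector - (sector % granularity);
}

/* The mask-only version, shown to illustrate the failure mode. */
static uint64_t round_down_mask_only(uint64_t sector, uint32_t granularity)
{
	return sector & ~((uint64_t)granularity - 1);
}
```

For example, with a granularity of 384 sectors the mask-only version rounds 1000 down to 640, which is not a multiple of 384, while the generic version yields 768.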

 Could you look into this please?

Thanks,
NeilBrown



RE: [Xen-devel] [PATCH 1/2] Xen acpi pad implement

2012-10-31 Thread Liu, Jinsong

Thanks! updated as attached.

Jinsong

=
>From f514b97628945cfac00efb0d456f133d44754c9d Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Thu, 1 Nov 2012 21:02:36 +0800
Subject: [PATCH 1/2] Xen acpi pad implement

PAD is the ACPI Processor Aggregator Device, which provides a control point
that enables the platform to perform specific processor configuration
and control that applies to all processors in the platform.

This patch implements the Xen acpi pad logic. When running under a Xen
virt platform, the native pad driver would not work. Instead the Xen pad driver,
a self-contained and very thin logic level, takes over the acpi pad stuff.
When acpi pad notifies OSPM, the xen pad logic intercepts and parses the _PUR
object and then makes a hypercall to the hypervisor for the rest of the work,
say, core parking.

Signed-off-by: Liu, Jinsong 
---
 drivers/xen/Makefile |1 +
 drivers/xen/xen_acpi_pad.c   |  206 ++
 include/xen/interface/platform.h |   17 +++
 3 files changed, 224 insertions(+), 0 deletions(-)
 create mode 100644 drivers/xen/xen_acpi_pad.c

diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 0e86370..a2af622 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_XEN_MCE_LOG) += mcelog.o
 obj-$(CONFIG_XEN_PCIDEV_BACKEND)   += xen-pciback/
 obj-$(CONFIG_XEN_PRIVCMD)  += xen-privcmd.o
 obj-$(CONFIG_XEN_ACPI_PROCESSOR)   += xen-acpi-processor.o
+obj-$(CONFIG_XEN_DOM0) += xen_acpi_pad.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
diff --git a/drivers/xen/xen_acpi_pad.c b/drivers/xen/xen_acpi_pad.c
new file mode 100644
index 0000000..e8c26a4
--- /dev/null
+++ b/drivers/xen/xen_acpi_pad.c
@@ -0,0 +1,206 @@
+/*
+ * xen_acpi_pad.c - Xen pad interface
+ *
+ * Copyright (c) 2012, Intel Corporation.
+ *Author: Liu, Jinsong 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define ACPI_PROCESSOR_AGGREGATOR_CLASS"acpi_pad"
+#define ACPI_PROCESSOR_AGGREGATOR_DEVICE_NAME "Processor Aggregator"
+#define ACPI_PROCESSOR_AGGREGATOR_NOTIFY 0x80
+
+static DEFINE_MUTEX(xen_pad_lock);
+
+static int xen_pad_set_idle_cpus(int num_cpus)
+{
+   struct xen_platform_op op;
+
+   if (num_cpus < 0)
+   return -EINVAL;
+
+   /* set cpu nums expected to be idled */
+   op.cmd = XENPF_core_parking;
+   op.u.core_parking.type = XEN_CORE_PARKING_SET;
+   op.u.core_parking.idle_nums = num_cpus;
+
+   return HYPERVISOR_dom0_op(&op);
+}
+
+/*
+ * Cannot get idle cpus by using hypercall once (shared with _SET)
+ * because of the characteristic of Xen continue_hypercall_on_cpu
+ */
+static int xen_pad_get_idle_cpus(void)
+{
+   int ret;
+   struct xen_platform_op op;
+
+   /* get cpu nums actually be idled */
+   op.cmd = XENPF_core_parking;
+   op.u.core_parking.type = XEN_CORE_PARKING_GET;
+   ret = HYPERVISOR_dom0_op(&op);
+   if (ret < 0)
+   return ret;
+
+   return op.u.core_parking.idle_nums;
+}
+
+/*
+ * Query firmware how many CPUs should be idle
+ * return -1 on failure
+ */
+static int xen_acpi_pad_pur(acpi_handle handle)
+{
+   struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
+   union acpi_object *package;
+   int num = -1;
+
+   if (ACPI_FAILURE(acpi_evaluate_object(handle, "_PUR", NULL, &buffer)))
+   return num;
+
+   if (!buffer.length || !buffer.pointer)
+   return num;
+
+   package = buffer.pointer;
+
+   if (package->type == ACPI_TYPE_PACKAGE &&
+   package->package.count == 2 &&
+   package->package.elements[0].integer.value == 1) /* rev 1 */
+
+   num = package->package.elements[1].integer.value;
+
+   kfree(buffer.pointer);
+   return num;
+}
+
+/* Notify firmware how many CPUs are idle */
+static void xen_acpi_pad_ost(acpi_handle handle, int stat,
+   uint32_t idle_cpus)
+{
+   union acpi_object params[3] = {
+   {.type = ACPI_TYPE_INTEGER,},
+   {.type = ACPI_TYPE_INTEGER,},
+   {.type = ACPI_TYPE_BUFFER,},
+   };
+   struct acpi_object_list arg_list = {3, params};
+
+   params[0].integer.value = ACPI_PROCESSOR_AGGREGATOR_NOTIFY;
+   params[1].integer.value =  stat;
+   params[2].buffer.length = 4;
+   params[2].buffer.pointer = (v

Re: [PATCH 5/5] ACPI: Add support for platform bus type

2012-10-31 Thread Yinghai Lu

On Wed, Oct 31, 2012 at 2:36 AM, Rafael J. Wysocki  wrote:
> From: Mika Westerberg 
>
> With ACPI 5 it is now possible to enumerate traditional SoC
> peripherals, like serial bus controllers and slave devices behind
> them.  These devices are typically based on IP-blocks used in many
> existing SoC platforms and platform drivers for them may already
> be present in the kernel tree.
>
> To make driver "porting" more straightforward, add ACPI support to
> the platform bus type.  Instead of writing ACPI "glue" drivers for
> the existing platform drivers, register the platform bus type with
> ACPI to create platform device objects for the drivers and bind the
> corresponding ACPI handles to those platform devices.
>
> This should allow us to reuse the existing platform drivers for the
> devices in question with the minimum amount of modifications.
>
> This changeset is based on Mika Westerberg's and Mathias Nyman's
> work.
>
> Signed-off-by: Mathias Nyman 
> Signed-off-by: Mika Westerberg 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/acpi/Makefile|1
>  drivers/acpi/acpi_platform.c |  285 
> +++
>  drivers/acpi/internal.h  |7 +
>  drivers/acpi/scan.c  |   16 ++
>  drivers/base/platform.c  |5
>  5 files changed, 313 insertions(+), 1 deletion(-)

This patch is too big and should be split into at least two patches.

>  create mode 100644 drivers/acpi/acpi_platform.c
>
> Index: linux/drivers/acpi/Makefile
> ===
> --- linux.orig/drivers/acpi/Makefile
> +++ linux/drivers/acpi/Makefile
> @@ -37,6 +37,7 @@ acpi-y+= processor_core.o
>  acpi-y += ec.o
>  acpi-$(CONFIG_ACPI_DOCK)   += dock.o
>  acpi-y += pci_root.o pci_link.o pci_irq.o pci_bind.o
> +acpi-y += acpi_platform.o
>  acpi-y += power.o
>  acpi-y += event.o
>  acpi-y += sysfs.o
> Index: linux/drivers/acpi/acpi_platform.c
> ===
> --- /dev/null
> +++ linux/drivers/acpi/acpi_platform.c
> @@ -0,0 +1,285 @@
> +/*
> + * ACPI support for platform bus type.
> + *
> + * Copyright (C) 2012, Intel Corporation
> + * Authors: Mika Westerberg 
> + *  Mathias Nyman 
> + *  Rafael J. Wysocki 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +ACPI_MODULE_NAME("platform");
> +
> +struct resource_info {
> +   struct device *dev;
> +   struct resource *res;
> +   size_t n, cur;
> +};
> +
> +static acpi_status acpi_platform_count_resources(struct acpi_resource *res,
> +void *data)
> +{
> +   struct acpi_resource_extended_irq *acpi_xirq;
> +   struct resource_info *ri = data;
> +
> +   switch (res->type) {
> +   case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
> +   case ACPI_RESOURCE_TYPE_IRQ:
> +   ri->n++;
> +   break;
> +   case ACPI_RESOURCE_TYPE_EXTENDED_IRQ:
> +   acpi_xirq = &res->data.extended_irq;
> +   ri->n += acpi_xirq->interrupt_count;
> +   break;
> +   case ACPI_RESOURCE_TYPE_ADDRESS32:
> +   if (res->data.address32.resource_type == ACPI_IO_RANGE)
> +   ri->n++;
> +   break;
> +   }
> +
> +   return AE_OK;
> +}
> +
> +static acpi_status acpi_platform_add_resources(struct acpi_resource *res,
> +  void *data)
> +{
> +   struct acpi_resource_fixed_memory32 *acpi_mem;
> +   struct acpi_resource_address32 *acpi_add32;
> +   struct acpi_resource_extended_irq *acpi_xirq;
> +   struct acpi_resource_irq *acpi_irq;
> +   struct resource_info *ri = data;
> +   struct resource *r;
> +   int irq, i;
> +
> +   switch (res->type) {
> +   case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
> +   acpi_mem = &res->data.fixed_memory32;
> +   r = &ri->res[ri->cur++];
> +
> +   r->start = acpi_mem->address;
> +   r->end = r->start + acpi_mem->address_length - 1;
> +   r->flags = IORESOURCE_MEM;
> +
> +   dev_dbg(ri->dev, "Memory32Fixed %pR\n", r);
> +   break;
> +
> +   case ACPI_RESOURCE_TYPE_ADDRESS32:
> +   acpi_add32 = &res->data.address32;
> +
> +   if (acpi_add32->resource_type == ACPI_IO_RANGE) {
> +   r = &ri->res[ri->cur++];
> +   r->start = acpi_add32->minimum;
> +   r->end = r->start + acpi_add32->address_length - 1;
> +

[PATCH] mtd: cmdlinepart: fix the overflow of big mtd partitions

2012-10-31 Thread Huang Shijie

When the kernel parses the following cmdline

#mtdparts=gpmi-nand:16m(boot),16m(kernel),1g(home),4g(test),-(usr)

for a big nand chip Micron MT29F64G08AFAAAWP(8GB), we got the following wrong
result:

.
"mtd: partition size too small (0)"
.

We can not get any partition.

The "4g(test)" partition triggers an overflow of "size". memparse()
returns 4g for "size", but "size" is of type "unsigned long", so an overflow
occurs and "size" ends up as zero.

This patch changes the "size"/"offset" to "unsigned long long" type,
and replaces the UINT_MAX with ULLONG_MAX for macros SIZE_REMAINING and
OFFSET_CONTINUOUS.

Signed-off-by: Huang Shijie 
---
 drivers/mtd/cmdlinepart.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/mtd/cmdlinepart.c b/drivers/mtd/cmdlinepart.c
index 4baab3b..1cfd741 100644
--- a/drivers/mtd/cmdlinepart.c
+++ b/drivers/mtd/cmdlinepart.c
@@ -56,8 +56,8 @@
 
 
 /* special size referring to all the remaining space in a partition */
-#define SIZE_REMAINING UINT_MAX
-#define OFFSET_CONTINUOUS UINT_MAX
+#define SIZE_REMAINING ULLONG_MAX
+#define OFFSET_CONTINUOUS ULLONG_MAX
 
 struct cmdline_mtd_partition {
struct cmdline_mtd_partition *next;
@@ -89,7 +89,7 @@ static struct mtd_partition * newpart(char *s,
  int extra_mem_size)
 {
struct mtd_partition *parts;
-   unsigned long size, offset = OFFSET_CONTINUOUS;
+   unsigned long long size, offset = OFFSET_CONTINUOUS;
char *name;
int name_len;
unsigned char *extra_mem;
@@ -104,7 +104,7 @@ static struct mtd_partition * newpart(char *s,
} else {
size = memparse(s, &s);
if (size < PAGE_SIZE) {
-   printk(KERN_ERR ERRP "partition size too small (%lx)\n", size);
+   printk(KERN_ERR ERRP "partition size too small (%llx)\n", size);
return ERR_PTR(-EINVAL);
}
}
@@ -296,7 +296,7 @@ static int parse_cmdline_partitions(struct mtd_info *master,
struct mtd_partition **pparts,
struct mtd_part_parser_data *data)
 {
-   unsigned long offset;
+   unsigned long long offset;
int i, err;
struct cmdline_mtd_partition *part;
const char *mtd_id = master->name;
-- 
1.7.0.4
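
The truncation can be reproduced in user space. This sketch assumes a platform where unsigned long is 32 bits wide and models it with uint32_t; parse_size() and as_ulong32() are hypothetical helpers, not memparse() itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for a parsed size with a binary suffix:
 * value << shift, e.g. shift 30 for "g". */
static uint64_t parse_size(uint64_t value, unsigned int shift)
{
	return value << shift;
}

/* Model a 32-bit "unsigned long" by truncating to uint32_t. */
static uint32_t as_ulong32(uint64_t size)
{
	return (uint32_t)size;
}
```

Here 4 << 30 is 4294967296, which needs 33 bits. Truncated to 32 bits it becomes 0, matching the "partition size too small (0)" message, while 1 << 30 still fits.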



[patch v2 2/2] regulator: support for DA9053-BC

2012-10-31 Thread Ashish Jangam

>From 7b2e8c26c995d775f17491fcce7258e662009d0d Mon Sep 17 00:00:00 2001
From: Ashish Jangam 
Date: Thu, 1 Nov 2012 11:06:17 +0530
Subject: [PATCH 2/2] [patch v 2/1] regulator: support for DA9053-BC

This patch adds DA9053-BC PMIC support to the existing DA9052/53
regulator driver

Signed-off-by: David Dajun Chen 
Signed-off-by: Ashish Jangam 
---
 drivers/regulator/da9052-regulator.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/regulator/da9052-regulator.c 
b/drivers/regulator/da9052-regulator.c
index 27355b1..22fb8cf 100644
--- a/drivers/regulator/da9052-regulator.c
+++ b/drivers/regulator/da9052-regulator.c
@@ -354,6 +354,7 @@ static inline struct da9052_regulator_info 
*find_regulator_info(u8 chip_id,
case DA9053_AA:
case DA9053_BA:
case DA9053_BB:
+   case DA9053_BC:
for (i = 0; i < ARRAY_SIZE(da9053_regulator_info); i++) {
info = &da9053_regulator_info[i];
if (info->reg_desc.id == id)
-- 
1.7.0.4



[patch v1 1/2] mfd: i2c issue fix for DA9052/53 and support for DA9053-BC

2012-10-31 Thread Ashish Jangam

There is an issue where the DA9052/53-AA/BA/BB PMIC either locks up or fails to
respond following a system reset. This could result in a second write
in which the bus writes the current content of the write buffer to the address
of the last I2C access.

The failure case is where this unwanted write transfers incorrect data to
a critical register.

This patch fixes this issue by following any read or write with a dummy read
to a safe register address. A safe register address is one where the contents
will not affect the operation of the system.
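
The workaround can be pictured with a small simulation. This is a sketch only: the bus is modeled as a variable holding the last accessed address, PARK_REG and CRITICAL_REG are hypothetical addresses, and none of the names below come from the driver.

```c
#include <stdbool.h>

#define PARK_REG	0x01	/* hypothetical safe register address */
#define CRITICAL_REG	0x40	/* hypothetical register that must not be clobbered */

static unsigned char last_addr;	/* address of the last simulated bus access */

/* Hypothetical: treat only PARK_REG as safe in this model. */
static bool is_safe_reg(unsigned char reg)
{
	return reg == PARK_REG;
}

/* Simulated register access: the bus remembers the address it touched. */
static void bus_access(unsigned char reg)
{
	last_addr = reg;
}

/* Access reg, then issue a dummy read so the bus is left parked on a
 * safe address. */
static void access_then_park(unsigned char reg)
{
	bus_access(reg);
	if (!is_safe_reg(reg))
		bus_access(PARK_REG);
}
```

After access_then_park(), a stray write aimed at the last accessed address would land on the parking register rather than on a critical one.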

Apart from this, the patch also adds support for the DA9053-BC PMIC chip.

Signed-off-by: David Dajun Chen 
Signed-off-by: Ashish Jangam 
---
 drivers/mfd/da9052-i2c.c  |   58 +
 drivers/mfd/da9052-spi.c  |1 +
 include/linux/mfd/da9052/da9052.h |   48 --
 include/linux/mfd/da9052/reg.h|3 ++
 4 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/mfd/da9052-i2c.c b/drivers/mfd/da9052-i2c.c
index 352c58b..279e5f8 100644
--- a/drivers/mfd/da9052-i2c.c
+++ b/drivers/mfd/da9052-i2c.c
@@ -27,6 +27,62 @@
 #include 
 #endif
 
+/* safe register to park I2C operation */
+static inline bool is_i2c_safe_reg(unsigned char reg)
+{
+   switch (reg) {
+   case DA9052_STATUS_A_REG:
+   case DA9052_STATUS_B_REG:
+   case DA9052_STATUS_C_REG:
+   case DA9052_STATUS_D_REG:
+   case DA9052_ADC_RES_L_REG:
+   case DA9052_ADC_RES_H_REG:
+   case DA9052_VDD_RES_REG:
+   case DA9052_ICHG_AV_REG:
+   case DA9052_TBAT_RES_REG:
+   case DA9052_ADCIN4_RES_REG:
+   case DA9052_ADCIN5_RES_REG:
+   case DA9052_ADCIN6_RES_REG:
+   case DA9052_TJUNC_RES_REG:
+   case DA9052_TSI_X_MSB_REG:
+   case DA9052_TSI_Y_MSB_REG:
+   case DA9052_TSI_LSB_REG:
+   case DA9052_TSI_Z_MSB_REG:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/*
+ * There is an issue with DA9052 and DA9053_AA/BA/BB PMIC where the PMIC
+ * gets lockup up or fails to respond following a system reset.
+ * This fix is to follow any read or write with a dummy read to a safe
+ * register.
+ */
+int da9052_i2c_fix(struct da9052 *da9052, unsigned char reg)
+{
+   int val;
+
+   switch (da9052->chip_id) {
+   case DA9052:
+   case DA9053_AA:
+   case DA9053_BA:
+   case DA9053_BB:
+   /* A dummy read to a safe register address. */
+   if (!is_i2c_safe_reg(reg))
+   return regmap_read(da9052->regmap, DA9052_PARK_REGISTER,
+  &val);
+   break;
+   default:
+   /* Do nothing */
+   break;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(da9052_i2c_fix);
+
 static int da9052_i2c_enable_multiwrite(struct da9052 *da9052)
 {
int reg_val, ret;
@@ -51,6 +107,7 @@ static const struct i2c_device_id da9052_i2c_id[] = {
{"da9053-aa", DA9053_AA},
{"da9053-ba", DA9053_BA},
{"da9053-bb", DA9053_BB},
+   {"da9053-bc", DA9053_BC},
{}
 };
 
@@ -60,6 +117,7 @@ static const struct of_device_id dialog_dt_ids[] = {
{ .compatible = "dlg,da9053-aa", .data = &da9052_i2c_id[1] },
{ .compatible = "dlg,da9053-ab", .data = &da9052_i2c_id[2] },
{ .compatible = "dlg,da9053-bb", .data = &da9052_i2c_id[3] },
+   { .compatible = "dlg,da9053-bc", .data = &da9052_i2c_id[4] },
{ /* sentinel */ }
 };
 #endif
diff --git a/drivers/mfd/da9052-spi.c b/drivers/mfd/da9052-spi.c
index dbeadc5..524bd76 100644
--- a/drivers/mfd/da9052-spi.c
+++ b/drivers/mfd/da9052-spi.c
@@ -71,6 +71,7 @@ static struct spi_device_id da9052_spi_id[] = {
{"da9053-aa", DA9053_AA},
{"da9053-ba", DA9053_BA},
{"da9053-bb", DA9053_BB},
+   {"da9053-bc", DA9053_BC},
{}
 };
 
diff --git a/include/linux/mfd/da9052/da9052.h 
b/include/linux/mfd/da9052/da9052.h
index 0507c4c..21b934a 100644
--- a/include/linux/mfd/da9052/da9052.h
+++ b/include/linux/mfd/da9052/da9052.h
@@ -83,6 +83,7 @@ enum da9052_chip_id {
DA9053_AA,
DA9053_BA,
DA9053_BB,
+   DA9053_BC,
 };
 
 struct da9052_pdata;
@@ -101,6 +102,16 @@ struct da9052 {
int chip_irq;
 };
 
+/* I2C Fix */
+#if defined(CONFIG_MFD_DA9052_SPI)
+static inline int da9052_i2c_fix(struct da9052 *da9052, unsigned char reg)
+{
+   return 0;
+}
+#else
+int da9052_i2c_fix(struct da9052 *da9052, unsigned char reg);
+#endif
+
 /* ADC API */
 int da9052_adc_manual_read(struct da9052 *da9052, unsigned char channel);
 int da9052_adc_read_temp(struct da9052 *da9052);
@@ -113,32 +124,61 @@ static inline int da9052_reg_read(struct da9052 *da9052, 
unsigned char reg)
ret = regmap_read(da9052->regmap, reg, &val);
if (ret < 0)
return ret;
+
+   ret = da9052_i2c_fix(da9052, reg);
+   if (ret < 0)
+   return ret;
+
retu

Re: [PATCH v3 1/3] Use acpi_os_hotplug_execute() instead of alloc_acpi_hp_work().

2012-10-31 Thread Tang Chen


On 11/01/2012 11:52 AM, Yinghai Lu wrote:
> On Wed, Oct 31, 2012 at 12:27 AM, Tang Chen wrote:
>
> Please check if you can just fold the acpi_hp_cb_execute callers, and use
> acpi_os_hotplug_execute directly.
>
> and have two local context structs too.


I think this could lead to some duplicated work: every time we call
acpi_os_hotplug_execute(), we would need to repeat the work that
acpi_hp_cb_execute() already does.

I can try to modify it and resend a new patch to see if it is better.

Thanks. :)




Re: [PART3 Patch 00/14] introduce N_MEMORY

2012-10-31 Thread Wen Congyang

At 11/01/2012 02:16 AM, David Rientjes Wrote:
> On Wed, 31 Oct 2012, Wen Congyang wrote:
> 
>> From: Lai Jiangshan 
>>
>> This patch is part3 of the following patchset:
>> https://lkml.org/lkml/2012/10/29/319
>>
>> Part1 is here:
>> https://lkml.org/lkml/2012/10/31/30
>>
>> Part2 is here:
>> http://marc.info/?l=linux-kernel&m=135166705909544&w=2
>>
>> You can apply this patchset without the other parts.
>>
>> we need a node which only contains movable memory. This feature is very
>> important for node hotplug. So we will add a new nodemask
>> for all memory. N_MEMORY contains movable memory but N_HIGH_MEMORY
>> doesn't contain it.
>>
>> We don't remove N_HIGH_MEMORY because it can be used to search which
>> nodes contains memory that the kernel can use.
>>
> 
> This doesn't describe why we need the new node state, unfortunately.  It 

1. Sometimes, we need the nodes which contain memory that can be used by the
   kernel.
2. Sometimes, we need the nodes which contain any memory.

In case 1 we use N_HIGH_MEMORY, and in case 2 we use N_MEMORY.

> makes sense to boot with node(s) containing only ZONE_MOVABLE, but it 
> doesn't show why we need a nodemask to specify such nodes and such 

Sorry for confusing you.
We don't add a nodemask to specify nodes which contain only ZONE_MOVABLE.
We want to add a nodemask (N_MEMORY) to specify nodes which contain memory.
In part3, we don't implement nodes which contain only ZONE_MOVABLE, so
N_MEMORY is the same as N_HIGH_MEMORY. We will add this nodemask when we
implement nodes which contain only ZONE_MOVABLE.

In this patchset, we try to change N_HIGH_MEMORY to N_MEMORY for case 2.

Thanks
Wen Congyang

> information should be available from the kernel log or /proc/zoneinfo.
> 
> Node hotplug should fail if all memory cannot be offlined, so why do we 
> need another nodemask?  Only offline the node if all memory is offlined.
> 


[PATCH] regulator: Fix trivial typo for TPS51632 Kconfig help text

2012-10-31 Thread Axel Lin

Signed-off-by: Axel Lin 
---
 drivers/regulator/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 7b920c7..cbc685d 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -341,7 +341,7 @@ config REGULATOR_TPS51632
select REGMAP_I2C
help
  This driver supports TPS51632 voltage regulator chip.
- The TPS52632 is 3-2-1 Phase D-Cap+ Step Down Driverless Controller
+ The TPS51632 is 3-2-1 Phase D-Cap+ Step Down Driverless Controller
  with Serial VID control and DVFS.
  The voltage output can be configure through I2C interface or PWM
  interface.
-- 
1.7.9.5




Re: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel module

2012-10-31 Thread zhangyanfei

On 2012-10-31 17:01, Hatayama, Daisuke wrote:
> 
> 
>> -----Original Message-----
>> From: kexec-boun...@lists.infradead.org
>> [mailto:kexec-boun...@lists.infradead.org] On Behalf Of zhangyanfei
>> Sent: Wednesday, October 31, 2012 12:34 PM
>> To: x...@kernel.org; ke...@lists.infradead.org; Avi Kivity; Marcelo
>> Tosatti
>> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org
>> Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
>> loading kvm_intel module
>>
>> Signed-off-by: Zhang Yanfei 
> 
> [...]
> 
>> @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
>>  if (r)
>>  goto out3;
>>
>> +#ifdef CONFIG_KEXEC
>> +crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
>> +#endif
>> +
> 
> The assignment here does not cover the case where an NMI is raised after VMX
> is turned on in kvm_init and before vmclear_local_loaded_vmcss is assigned.
> This is rare, but it can happen.
> 

By saying "VMX is on in kvm init", do you mean that kvm_init enables the VMX
feature in the logical processor?
No, kvm enables the VMX feature only when a vcpu is about to be created.

I think it makes no difference whether this assignment comes before or after
kvm_init, because the vmcs linked list must be empty before vmx_init has
finished.
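
The hook pattern under discussion can be sketched as follows. This is a single-threaded user-space model with hypothetical names; it only shows that the crash path tolerates an unset hook, and it does not model the NMI timing discussed in this thread.

```c
#include <stddef.h>

/* Hook the crash path calls if a module has registered one. */
static void (*crash_clear_hook)(void);

static int cleared;	/* counts hook invocations, for illustration */

/* Stand-in for the function that would walk the loaded list. */
static void clear_loaded_list(void)
{
	cleared++;
}

/* Module load: register the hook. */
static void module_init_sketch(void)
{
	crash_clear_hook = clear_loaded_list;
}

/* Module unload: unregister the hook. */
static void module_exit_sketch(void)
{
	crash_clear_hook = NULL;
}

/* Crash path: safe to call whether or not the hook is registered. */
static void crash_path(void)
{
	void (*hook)(void) = crash_clear_hook;

	if (hook)
		hook();
}
```

In this model the crash path does nothing before registration or after unregistration, and calls the hook exactly once per crash in between.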

Thanks
Zhang Yanfei

> What happens if vmclear_local_loaded_vmcss is called before kvm_init? I
> think there is no problem, since the list is initially empty.
> 
>>  vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
>>  vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
>>  vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
>> @@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
>>  free_page((unsigned long)vmx_io_bitmap_b);
>>  free_page((unsigned long)vmx_io_bitmap_a);
>>
>> +#ifdef CONFIG_KEXEC
>> +crash_clear_loaded_vmcss = NULL;
>> +#endif
>> +
>>  kvm_exit();
>>  }
> 
> Also, this is converse to the above.
> 
> Thanks.
> HATAYAMA, Daisuke
> 
> 


[net-next v5 7/7] tuntap: choose the txq based on rxq

2012-10-31 Thread Jason Wang

This patch implements a simple multiqueue flow steering policy for tun/tap: tx
follows rx. The idea is simple: it chooses the txq based on the rxq a flow came
from. Flows are identified by the rxhash of an skb, and the hash-to-queue
mappings are recorded in an hlist with an ageing timer to retire old
mappings. A mapping is created when tun receives a packet from userspace, and
is queried in .ndo_select_queue().

I ran concurrent TCP_CRR tests and didn't see any mapping manipulation helpers
in perf top, so the overhead can be neglected.
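
The tx-follows-rx idea can be sketched in user space. This is a simplified model, not the patch's implementation: it keeps one entry per bucket instead of an hlist, uses no RCU or locking, and checks expiry at lookup time instead of from a timer. All names are hypothetical; only the 1024-bucket table and the mask-based bucket function mirror the patch.

```c
#include <stdint.h>

#define NUM_FLOW_ENTRIES 1024	/* mirrors TUN_NUM_FLOW_ENTRIES */
#define FLOW_EXPIRE 3		/* in ticks; stands in for 3 * HZ */

struct flow_entry {
	int used;
	uint32_t rxhash;
	uint16_t queue_index;
	unsigned long updated;
};

static struct flow_entry flows[NUM_FLOW_ENTRIES];

/* Map a receive hash to a bucket by keeping its low 10 bits. */
static uint32_t flow_bucket(uint32_t rxhash)
{
	return rxhash & (NUM_FLOW_ENTRIES - 1);
}

/* Record that a packet with this hash arrived on queue_index at time now. */
static void flow_update(uint32_t rxhash, uint16_t queue_index,
			unsigned long now)
{
	struct flow_entry *e = &flows[flow_bucket(rxhash)];

	e->used = 1;
	e->rxhash = rxhash;
	e->queue_index = queue_index;
	e->updated = now;
}

/* Return the recorded queue, or -1 if the flow is unknown or has aged out. */
static int flow_select_queue(uint32_t rxhash, unsigned long now)
{
	struct flow_entry *e = &flows[flow_bucket(rxhash)];

	if (!e->used || e->rxhash != rxhash)
		return -1;
	if (now - e->updated > FLOW_EXPIRE)
		return -1;
	return e->queue_index;
}
```

A caller that gets -1 would fall back to some other queue selection; what that fallback should be is outside the scope of this sketch.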

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |  227 -
 1 files changed, 224 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 79b6f9e..9e28768 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -115,6 +115,8 @@ struct tap_filter {
  */
 #define MAX_TAP_QUEUES 1024
 
+#define TUN_FLOW_EXPIRE (3 * HZ)
+
 /* A tun_file connects an open character device to a tuntap netdevice. It
  * also contains all socket related strctures (except sock_fprog and 
tap_filter)
  * to serve as one transmit queue for tuntap device. The sock_fprog and
@@ -138,6 +140,18 @@ struct tun_file {
u16 queue_index;
 };
 
+struct tun_flow_entry {
+   struct hlist_node hash_link;
+   struct rcu_head rcu;
+   struct tun_struct *tun;
+
+   u32 rxhash;
+   int queue_index;
+   unsigned long updated;
+};
+
+#define TUN_NUM_FLOW_ENTRIES 1024
+
 /* Since the socket were moved to tun_file, to preserve the behavior of persist
  * device, socket fileter, sndbuf and vnet header size were restore when the
  * file were attached to a persist device.
@@ -163,8 +177,164 @@ struct tun_struct {
 #ifdef TUN_DEBUG
int debug;
 #endif
+   spinlock_t lock;
+   struct kmem_cache *flow_cache;
+   struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
+   struct timer_list flow_gc_timer;
+   unsigned long ageing_time;
 };
 
+static inline u32 tun_hashfn(u32 rxhash)
+{
+   return rxhash & 0x3ff;
+}
+
+static struct tun_flow_entry *tun_flow_find(struct hlist_head *head, u32 rxhash)
+{
+   struct tun_flow_entry *e;
+   struct hlist_node *n;
+
+   hlist_for_each_entry_rcu(e, n, head, hash_link) {
+   if (e->rxhash == rxhash)
+   return e;
+   }
+   return NULL;
+}
+
+static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
+ struct hlist_head *head,
+ u32 rxhash, u16 queue_index)
+{
+   struct tun_flow_entry *e = kmem_cache_alloc(tun->flow_cache,
+   GFP_ATOMIC);
+   if (e) {
+   tun_debug(KERN_INFO, tun, "create flow: hash %u index %u\n",
+ rxhash, queue_index);
+   e->updated = jiffies;
+   e->rxhash = rxhash;
+   e->queue_index = queue_index;
+   e->tun = tun;
+   hlist_add_head_rcu(&e->hash_link, head);
+   }
+   return e;
+}
+
+static void tun_flow_free(struct rcu_head *head)
+{
+   struct tun_flow_entry *e
+   = container_of(head, struct tun_flow_entry, rcu);
+   kmem_cache_free(e->tun->flow_cache, e);
+}
+
+static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
+{
+   tun_debug(KERN_INFO, tun, "delete flow: hash %u index %u\n",
+ e->rxhash, e->queue_index);
+   hlist_del_rcu(&e->hash_link);
+   call_rcu(&e->rcu, tun_flow_free);
+}
+
+static void tun_flow_flush(struct tun_struct *tun)
+{
+   int i;
+
+   spin_lock_bh(&tun->lock);
+   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+   struct tun_flow_entry *e;
+   struct hlist_node *h, *n;
+
+   hlist_for_each_entry_safe(e, h, n, &tun->flows[i], hash_link)
+   tun_flow_delete(tun, e);
+   }
+   spin_unlock_bh(&tun->lock);
+}
+
+static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
+{
+   int i;
+
+   spin_lock_bh(&tun->lock);
+   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+   struct tun_flow_entry *e;
+   struct hlist_node *h, *n;
+
+   hlist_for_each_entry_safe(e, h, n, &tun->flows[i], hash_link) {
+   if (e->queue_index == queue_index)
+   tun_flow_delete(tun, e);
+   }
+   }
+   spin_unlock_bh(&tun->lock);
+}
+
+static void tun_flow_cleanup(unsigned long data)
+{
+   struct tun_struct *tun = (struct tun_struct *)data;
+   unsigned long delay = tun->ageing_time;
+   unsigned long next_timer = jiffies + delay;
+   unsigned long count = 0;
+   int i;
+
+   tun_debug(KERN_INFO, tun, "tun_flow_cleanup\n");
+
+   spin_lock_bh(&tun->lock);
+   for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+

[net-next v5 6/7] tuntap: add ioctl to attach or detach a file form tuntap device

2012-10-31 Thread Jason Wang

Sometimes userspace may need to activate or deactivate a queue. This can be done
by detaching a file from, and attaching it to, a tuntap device.

This patch introduces a new ioctl, TUNSETQUEUE, which can be used to do this.
The flag IFF_ATTACH_QUEUE is introduced to do the attaching, while
IFF_DETACH_QUEUE is introduced to do the detaching.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c   |   56 --
 include/uapi/linux/if_tun.h |3 ++
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2762c55..79b6f9e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -195,6 +195,15 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
return txq;
 }
 
+static inline bool tun_not_capable(struct tun_struct *tun)
+{
+   const struct cred *cred = current_cred();
+
+   return ((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
+ (gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
+   !capable(CAP_NET_ADMIN);
+}
+
 static void tun_set_real_num_queues(struct tun_struct *tun)
 {
netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
@@ -1310,8 +1319,6 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
dev = __dev_get_by_name(net, ifr->ifr_name);
if (dev) {
-   const struct cred *cred = current_cred();
-
if (ifr->ifr_flags & IFF_TUN_EXCL)
return -EBUSY;
if ((ifr->ifr_flags & IFF_TUN) && dev->netdev_ops == &tun_netdev_ops)
@@ -1321,9 +1328,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
else
return -EINVAL;
 
-   if (((uid_valid(tun->owner) && !uid_eq(cred->euid, tun->owner)) ||
-(gid_valid(tun->group) && !in_egroup_p(tun->group))) &&
-   !capable(CAP_NET_ADMIN))
+   if (tun_not_capable(tun))
return -EPERM;
err = security_tun_dev_attach(tfile->socket.sk);
if (err < 0)
@@ -1530,6 +1535,40 @@ static void tun_set_sndbuf(struct tun_struct *tun)
}
 }
 
+static int tun_set_queue(struct file *file, struct ifreq *ifr)
+{
+   struct tun_file *tfile = file->private_data;
+   struct tun_struct *tun;
+   struct net_device *dev;
+   int ret = 0;
+
+   rtnl_lock();
+
+   if (ifr->ifr_flags & IFF_ATTACH_QUEUE) {
+   dev = __dev_get_by_name(tfile->net, ifr->ifr_name);
+   if (!dev) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
+   tun = netdev_priv(dev);
+   if (dev->netdev_ops != &tap_netdev_ops &&
+   dev->netdev_ops != &tun_netdev_ops)
+   ret = -EINVAL;
+   else if (tun_not_capable(tun))
+   ret = -EPERM;
+   else
+   ret = tun_attach(tun, file);
+   } else if (ifr->ifr_flags & IFF_DETACH_QUEUE)
+   __tun_detach(tfile, false);
+   else
+   ret = -EINVAL;
+
+unlock:
+   rtnl_unlock();
+   return ret;
+}
+
 static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
unsigned long arg, int ifreq_len)
 {
@@ -1543,7 +1582,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
int vnet_hdr_sz;
int ret;
 
-   if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89) {
+   if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) {
if (copy_from_user(&ifr, argp, ifreq_len))
return -EFAULT;
} else {
@@ -1554,9 +1593,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
 * This is needed because we never checked for invalid flags on
 * TUNSETIFF. */
return put_user(IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE |
-   IFF_VNET_HDR,
+   IFF_VNET_HDR | IFF_MULTI_QUEUE,
(unsigned int __user*)argp);
-   }
+   } else if (cmd == TUNSETQUEUE)
+   return tun_set_queue(file, &ifr);
 
ret = 0;
rtnl_lock();
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 8ef3a87..958497a 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -54,6 +54,7 @@
 #define TUNDETACHFILTER _IOW('T', 214, struct sock_fprog)
 #define TUNGETVNETHDRSZ _IOR('T', 215, int)
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNSETQUEUE  _IOW('T', 217, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN0x0001
@@ -63,6 +64,8 @@
 #define IFF_VNET_HDR   0x4000
 #define IFF_TUN_EXCL   0x8000
 #define IFF_MULTI_QUEUE 0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define

[net-next v5 5/7] tuntap: multiqueue support

2012-10-31 Thread Jason Wang

This patch converts tun/tap to a multiqueue device and exposes the queues as
multiple file descriptors to userspace. Internally, each tun_file is
abstracted as a queue, and an array of pointers to tun_file structures is
stored in the tun_struct device, so multiple tun_files can be attached to the
device as multiple queues.

When choosing the txq, we first try to identify a flow through its rxhash. If
the skb has no rxhash, we fall back to the recorded rxq and use it to choose
the transmit queue. This policy may be changed in the future.
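
The selection rule above can be tried out in userspace with a small sketch.
The function name select_txq and its parameters are hypothetical stand-ins for
the skb accessors; the arithmetic mirrors tun_select_queue() in this patch
(scale the 32-bit hash into [0, numqueues) with a multiply and shift, otherwise
fold the recorded rxq into range by repeated subtraction). The numqueues == 0
guard is an addition of this sketch so that the loop always terminates.

```c
#include <stdint.h>

/* rxhash == 0 means "no hash available", as in the patch.
 * rxq_recorded says whether rxq carries a valid receive queue number. */
static uint16_t select_txq(uint32_t rxhash, int rxq_recorded,
                           uint16_t rxq, uint32_t numqueues)
{
    uint32_t txq = 0;

    if (numqueues == 0)
        return 0;   /* guard added here; not part of the patch */

    if (rxhash) {
        /* Multiply and shift instead of a divide: maps the full
         * 32-bit hash range evenly onto 0 .. numqueues - 1. */
        txq = (uint32_t)(((uint64_t)rxhash * numqueues) >> 32);
    } else if (rxq_recorded) {
        txq = rxq;
        while (txq >= numqueues)
            txq -= numqueues;
    }
    return (uint16_t)txq;
}
```

With four queues, a hash of 0x80000000 lands on queue 2 because
0x80000000 * 4 = 0x200000000, and shifting right by 32 leaves 2.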

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |  308 ++---
 1 files changed, 220 insertions(+), 88 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bdbb526..2762c55 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -109,6 +109,12 @@ struct tap_filter {
unsigned char   addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
+/* 1024 is probably a high enough limit: modern hypervisors seem to support on
+ * the order of 100-200 CPUs so this leaves us some breathing space if we want
+ * to match a queue per guest CPU.
+ */
+#define MAX_TAP_QUEUES 1024
+
 /* A tun_file connects an open character device to a tuntap netdevice. It
  * also contains all socket related strctures (except sock_fprog and tap_filter)
  * to serve as one transmit queue for tuntap device. The sock_fprog and
@@ -129,6 +135,7 @@ struct tun_file {
struct fasync_struct *fasync;
/* only used for fasnyc */
unsigned int flags;
+   u16 queue_index;
 };
 
 /* Since the socket were moved to tun_file, to preserve the behavior of persist
@@ -136,7 +143,8 @@ struct tun_file {
  * file were attached to a persist device.
  */
 struct tun_struct {
-   struct tun_file __rcu   *tfile;
+   struct tun_file __rcu   *tfiles[MAX_TAP_QUEUES];
+   unsigned intnumqueues;
unsigned intflags;
kuid_t  owner;
kgid_t  group;
@@ -157,56 +165,157 @@ struct tun_struct {
 #endif
 };
 
+/* We try to identify a flow through its rxhash first. The reason that
+ * we do not check rxq no. is becuase some cards(e.g 82599), chooses
+ * the rxq based on the txq where the last packet of the flow comes. As
+ * the userspace application move between processors, we may get a
+ * different rxq no. here. If we could not get rxhash, then we would
+ * hope the rxq no. may help here.
+ */
+static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+   u32 txq = 0;
+   u32 numqueues = 0;
+
+   rcu_read_lock();
+   numqueues = tun->numqueues;
+
+   txq = skb_get_rxhash(skb);
+   if (txq) {
+   /* use multiply and shift instead of expensive divide */
+   txq = ((u64)txq * numqueues) >> 32;
+   } else if (likely(skb_rx_queue_recorded(skb))) {
+   txq = skb_get_rx_queue(skb);
+   while (unlikely(txq >= numqueues))
+   txq -= numqueues;
+   }
+
+   rcu_read_unlock();
+   return txq;
+}
+
+static void tun_set_real_num_queues(struct tun_struct *tun)
+{
+   netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
+   netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
+}
+
+static void __tun_detach(struct tun_file *tfile, bool clean)
+{
+   struct tun_file *ntfile;
+   struct tun_struct *tun;
+   struct net_device *dev;
+
+   tun = rcu_dereference_protected(tfile->tun,
+   lockdep_rtnl_is_held());
+   if (tun) {
+   u16 index = tfile->queue_index;
+   BUG_ON(index >= tun->numqueues);
+   dev = tun->dev;
+
+   rcu_assign_pointer(tun->tfiles[index],
+  tun->tfiles[tun->numqueues - 1]);
+   rcu_assign_pointer(tfile->tun, NULL);
+   ntfile = rcu_dereference_protected(tun->tfiles[index],
+  lockdep_rtnl_is_held());
+   ntfile->queue_index = index;
+
+   --tun->numqueues;
+   sock_put(&tfile->sk);
+
+   synchronize_net();
+   /* Drop read queue */
+   skb_queue_purge(&tfile->sk.sk_receive_queue);
+   tun_set_real_num_queues(tun);
+
+   if (tun->numqueues == 0 && !(tun->flags & TUN_PERSIST))
+   if (dev->reg_state == NETREG_REGISTERED)
+   unregister_netdevice(dev);
+   }
+
+   if (clean) {
+   BUG_ON(!test_bit(SOCK_EXTERNALLY_ALLOCATED,
+&tfile->socket.flags));
+   sk_release_kernel(&tfile->sk);
+   }
+}
+
+static void tun_detach(struct tun_file *tfile, bool clean)
+{
+   rtnl_lock();
+   __tun_detach(tfile, clean);
+   rtnl_unlock();
+}
+
+static vo

[net-next v5 4/7] tuntap: introduce multiqueue flags

2012-10-31 Thread Jason Wang

Add flags to be used when creating a multiqueue tuntap device.

Signed-off-by: Jason Wang 
---
 include/uapi/linux/if_tun.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index 25a585c..8ef3a87 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -34,6 +34,7 @@
 #define TUN_ONE_QUEUE  0x0080
 #define TUN_PERSIST0x0100  
 #define TUN_VNET_HDR   0x0200
+#define TUN_TAP_MQ  0x0400
 
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int) 
@@ -61,6 +62,7 @@
 #define IFF_ONE_QUEUE  0x2000
 #define IFF_VNET_HDR   0x4000
 #define IFF_TUN_EXCL   0x8000
+#define IFF_MULTI_QUEUE 0x0100
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM 0x01/* You can hand me unchecksummed packets. */
-- 
1.7.1


[net-next v5 3/7] tuntap: RCUify dereferencing between tun_struct and tun_file

2012-10-31 Thread Jason Wang

This patch introduces RCU to synchronize the dereferences between tun_struct
and tun_file. All tun_{get|put} are replaced with RCU; the dereference from one
to the other must be done under the rtnl lock or inside an RCU read-side
critical section.

This is needed for the following patches, since one of the goals of multiqueue
tuntap is to allow adding or removing queues during workload. Without RCU, the
control path would hold tx locks when adding or removing queues (which may cause
some delay), and it is hard to change the number of queues without stopping the
net device. With the help of RCU, there is also no need for tun_file to hold a
refcnt on tun_struct.
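
The publish/read half of this scheme can be approximated in userspace with C11
atomics. This is only a loose analogy under stated assumptions, not RCU: a
release store stands in for rcu_assign_pointer(), an acquire load stands in for
rcu_dereference(), and nothing here models grace periods (synchronize_net() in
the patch), so a real implementation could not free objects this way. All type
and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct tun_dev;

struct tun_queue {
    _Atomic(struct tun_dev *) tun;      /* back pointer, NULL when detached */
};

struct tun_dev {
    _Atomic(struct tun_queue *) tfile;  /* attached queue, NULL when none */
    int id;
};

static void dev_init(struct tun_dev *d, int id)
{
    atomic_init(&d->tfile, NULL);
    d->id = id;
}

static void queue_init(struct tun_queue *q)
{
    atomic_init(&q->tun, NULL);
}

/* Control path: publish both pointers (the patch does this under rtnl). */
static void attach(struct tun_dev *d, struct tun_queue *q)
{
    atomic_store_explicit(&q->tun, d, memory_order_release);
    atomic_store_explicit(&d->tfile, q, memory_order_release);
}

/* Control path: unpublish. A real RCU user must wait for a grace period
 * after this before tearing the objects down. */
static void detach(struct tun_dev *d, struct tun_queue *q)
{
    atomic_store_explicit(&d->tfile, NULL, memory_order_release);
    atomic_store_explicit(&q->tun, NULL, memory_order_release);
}

/* Data path: readers must cope with a NULL result at any time. */
static struct tun_dev *queue_get_dev(struct tun_queue *q)
{
    return atomic_load_explicit(&q->tun, memory_order_acquire);
}
```

The point of the sketch is the reader contract: the data path takes no lock
and simply treats a NULL pointer as "detached".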

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |   95 ++---
 1 files changed, 47 insertions(+), 48 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d52ad24..bdbb526 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -115,13 +115,16 @@ struct tap_filter {
  * tap_filter were kept in tun_struct since they were used for filtering for the
  * netdevice not for a specific queue (at least I didn't see the reqirement for
  * this).
+ *
+ * RCU usage:
+ * The tun_file and tun_struct are loosely coupled, the pointer from on to the
+ * other can only be read while rcu_read_lock or rtnl_lock is held.
  */
 struct tun_file {
struct sock sk;
struct socket socket;
struct socket_wq wq;
-   atomic_t count;
-   struct tun_struct *tun;
+   struct tun_struct __rcu *tun;
struct net *net;
struct fasync_struct *fasync;
/* only used for fasnyc */
@@ -133,7 +136,7 @@ struct tun_file {
  * file were attached to a persist device.
  */
 struct tun_struct {
-   struct tun_file *tfile;
+   struct tun_file __rcu   *tfile;
unsigned intflags;
kuid_t  owner;
kgid_t  group;
@@ -179,13 +182,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
if (!err)
goto out;
}
-   tfile->tun = tun;
+   rcu_assign_pointer(tfile->tun, tun);
tfile->socket.sk->sk_sndbuf = tun->sndbuf;
-   tun->tfile = tfile;
+   rcu_assign_pointer(tun->tfile, tfile);
netif_carrier_on(tun->dev);
-   dev_hold(tun->dev);
sock_hold(&tfile->sk);
-   atomic_inc(&tfile->count);
 
 out:
netif_tx_unlock_bh(tun->dev);
@@ -194,34 +195,29 @@ out:
 
 static void __tun_detach(struct tun_struct *tun)
 {
-   struct tun_file *tfile = tun->tfile;
+   struct tun_file *tfile = rcu_dereference_protected(tun->tfile,
+   lockdep_rtnl_is_held());
/* Detach from net device */
-   netif_tx_lock_bh(tun->dev);
netif_carrier_off(tun->dev);
-   tun->tfile = NULL;
-   tfile->tun = NULL;
-   netif_tx_unlock_bh(tun->dev);
-
-   /* Drop read queue */
-   skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
-
-   /* Drop the extra count on the net device */
-   dev_put(tun->dev);
-}
+   rcu_assign_pointer(tun->tfile, NULL);
+   if (tfile) {
+   rcu_assign_pointer(tfile->tun, NULL);
 
-static void tun_detach(struct tun_struct *tun)
-{
-   rtnl_lock();
-   __tun_detach(tun);
-   rtnl_unlock();
+   synchronize_net();
+   /* Drop read queue */
+   skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
+   }
 }
 
 static struct tun_struct *__tun_get(struct tun_file *tfile)
 {
-   struct tun_struct *tun = NULL;
+   struct tun_struct *tun;
 
-   if (atomic_inc_not_zero(&tfile->count))
-   tun = tfile->tun;
+   rcu_read_lock();
+   tun = rcu_dereference(tfile->tun);
+   if (tun)
+   dev_hold(tun->dev);
+   rcu_read_unlock();
 
return tun;
 }
@@ -233,10 +229,7 @@ static struct tun_struct *tun_get(struct file *file)
 
 static void tun_put(struct tun_struct *tun)
 {
-   struct tun_file *tfile = tun->tfile;
-
-   if (atomic_dec_and_test(&tfile->count))
-   tun_detach(tfile->tun);
+   dev_put(tun->dev);
 }
 
 /* TAP filtering */
@@ -357,14 +350,15 @@ static const struct ethtool_ops tun_ethtool_ops;
 static void tun_net_uninit(struct net_device *dev)
 {
struct tun_struct *tun = netdev_priv(dev);
-   struct tun_file *tfile = tun->tfile;
+   struct tun_file *tfile = rcu_dereference_protected(tun->tfile,
+   lockdep_rtnl_is_held());
 
/* Inform the methods they need to stop using the dev.
 */
if (tfile) {
wake_up_all(&tfile->wq.wait);
-   if (atomic_dec_and_test(&tfile->count))
-   __tun_detach(tun);
+   __tun_detach(tun);
+   synchronize_net();
}
 }
 
@@ -386,14 +380,16 @@ static int tun_net_close(struct net_device *dev)

[net-next v5 2/7] tuntap: move socket to tun_file

2012-10-31 Thread Jason Wang

Current tuntap makes use of the socket receive queue as its tx queue. To
implement multiple tx queues for tuntap and enable adding and removing queues
during workload, the first step is to move the socket related structures to
tun_file. Then we can let multiple fds/sockets be attached to the tuntap.

This patch removes tun_sock and moves the socket related structures from
tun_sock or tun_struct to tun_file. Two exceptions are tap_filter and
sock_fprog: they are still kept in tun_struct since they are used to filter
packets for the net device instead of per transmit queue (at least I see no
requirement for them). After these changes, the socket is created and destroyed
during file open and close (instead of device creation and destruction), and
the socket structures can be dereferenced from tun_file instead of from the
tun_struct structure itself.

For a persistent device, since we purge during detaching and don't queue any
packets when no interface is attached, there is no behavior change before and
after this patch, so the changes are transparent to userspace. To keep
attributes such as sndbuf, socket filter and vnet header, those are
re-initialized after a new interface is attached to a persistent device.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |  266 +
 1 files changed, 145 insertions(+), 121 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index f830b1b..d52ad24 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -109,14 +109,29 @@ struct tap_filter {
unsigned char   addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
+/* A tun_file connects an open character device to a tuntap netdevice. It
+ * also contains all socket related strctures (except sock_fprog and tap_filter)
+ * to serve as one transmit queue for tuntap device. The sock_fprog and
+ * tap_filter were kept in tun_struct since they were used for filtering for the
+ * netdevice not for a specific queue (at least I didn't see the reqirement for
+ * this).
+ */
 struct tun_file {
+   struct sock sk;
+   struct socket socket;
+   struct socket_wq wq;
atomic_t count;
struct tun_struct *tun;
struct net *net;
+   struct fasync_struct *fasync;
+   /* only used for fasnyc */
+   unsigned int flags;
 };
 
-struct tun_sock;
-
+/* Since the socket were moved to tun_file, to preserve the behavior of persist
+ * device, socket fileter, sndbuf and vnet header size were restore when the
+ * file were attached to a persist device.
+ */
 struct tun_struct {
struct tun_file *tfile;
unsigned intflags;
@@ -127,29 +142,18 @@ struct tun_struct {
netdev_features_t   set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
  NETIF_F_TSO6|NETIF_F_UFO)
-   struct fasync_struct*fasync;
-
-   struct tap_filter   txflt;
-   struct socket   socket;
-   struct socket_wqwq;
 
int vnet_hdr_sz;
-
+   int sndbuf;
+   struct tap_filter   txflt;
+   struct sock_fprog   fprog;
+   /* protected by rtnl lock */
+   boolfilter_attached;
 #ifdef TUN_DEBUG
int debug;
 #endif
 };
 
-struct tun_sock {
-   struct sock sk;
-   struct tun_struct   *tun;
-};
-
-static inline struct tun_sock *tun_sk(struct sock *sk)
-{
-   return container_of(sk, struct tun_sock, sk);
-}
-
 static int tun_attach(struct tun_struct *tun, struct file *file)
 {
struct tun_file *tfile = file->private_data;
@@ -168,12 +172,19 @@ static int tun_attach(struct tun_struct *tun, struct file *file)
goto out;
 
err = 0;
+
+   /* Re-attach filter when attaching to a persist device */
+   if (tun->filter_attached == true) {
+   err = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+   if (!err)
+   goto out;
+   }
tfile->tun = tun;
+   tfile->socket.sk->sk_sndbuf = tun->sndbuf;
tun->tfile = tfile;
-   tun->socket.file = file;
netif_carrier_on(tun->dev);
dev_hold(tun->dev);
-   sock_hold(tun->socket.sk);
+   sock_hold(&tfile->sk);
atomic_inc(&tfile->count);
 
 out:
@@ -183,14 +194,16 @@ out:
 
 static void __tun_detach(struct tun_struct *tun)
 {
+   struct tun_file *tfile = tun->tfile;
/* Detach from net device */
netif_tx_lock_bh(tun->dev);
netif_carrier_off(tun->dev);
tun->tfile = NULL;
+   tfile->tun = NULL;
netif_tx_unlock_bh(tun->dev);
 
/* Drop read queue */
-   skb_queue_purge(&tun->socket.sk->sk_receive_queue);
+   skb_queue_purge(&tfile->socket.sk->sk_receive_queue);
 
/* Drop the extra count on the net device */
dev_put(tun->dev);
@@ -349,21 +362,12 @@ static void t

[net-next v5 1/7] tuntap: log the unsigned informaiton with %u

2012-10-31 Thread Jason Wang

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3157519..f830b1b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1419,7 +1419,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
if (!tun)
goto unlock;
 
-   tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %d\n", cmd);
+   tun_debug(KERN_INFO, tun, "tun_chr_ioctl cmd %u\n", cmd);
 
ret = 0;
switch (cmd) {
@@ -1459,7 +1459,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
break;
}
tun->owner = owner;
-   tun_debug(KERN_INFO, tun, "owner set to %d\n",
+   tun_debug(KERN_INFO, tun, "owner set to %u\n",
  from_kuid(&init_user_ns, tun->owner));
break;
 
@@ -1471,7 +1471,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
break;
}
tun->group = group;
-   tun_debug(KERN_INFO, tun, "group set to %d\n",
+   tun_debug(KERN_INFO, tun, "group set to %u\n",
  from_kgid(&init_user_ns, tun->group));
break;
 
-- 
1.7.1


[net-next v5 0/7] Multiqueue support in tuntap

2012-10-31 Thread Jason Wang

Hello All:

This is an update (v5) of multiqueue support in tuntap. Please consider
merging.

The main idea of this series is to let tun/tap devices benefit from multiqueue
network cards and multi-core hosts. We used to have a single queue for tuntap,
which could be a bottleneck in a multiqueue/multi-core environment. So this
series lets the device be attached to multiple sockets and exposes them
through fds to userspace as multiple queues. The series was originally designed
to serve as a backend for multiqueue virtio-net in KVM, but the design is
generic enough for other applications to use.

A quick overview of the design:

- Move the socket from tun_struct to tun_file.
- Allow multiple sockets to be attached to a tun/tap device.
- Use RCU to synchronize the data path and system calls.
- A new ioctl (TUNSETQUEUE) with attach and detach flags is added for
  userspace to attach a socket to and detach it from the device.
- API compatibility is maintained with no userspace-visible changes, so legacy
  userspace that only uses one queue won't need any changes.
- A flow (rxhash) to queue table is maintained by tuntap, which chooses the txq
  based on the last rxq the flow came from.
Performance test:

Pktgen is used to generate the traffic, and a simple userspace program that
only does the receiving is used as the sink.

#q  #threads   aggregate kpps  improvement
1q  1 thread    818 kpps       +0%
2q  2 threads  1926 kpps       +135%
3q  3 threads  2642 kpps       +223%
4q  4 threads  3536 kpps       +332%
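
The improvement column appears to be the relative gain over the single-queue
result, which can be checked with a few lines of C. The helper names below are
hypothetical, and rounding to the nearest integer is an assumption that happens
to reproduce the figures in the table.

```c
/* Relative gain of kpps over base_kpps, in percent. */
static double improvement_pct(double base_kpps, double kpps)
{
    return (kpps - base_kpps) / base_kpps * 100.0;
}

/* Round a non-negative percentage to the nearest integer. */
static int rounded_pct(double base_kpps, double kpps)
{
    return (int)(improvement_pct(base_kpps, kpps) + 0.5);
}
```

For example, (1926 - 818) / 818 is about 1.3545, so two queues give roughly
+135% over one queue; the same calculation yields +223% and +332% for three
and four queues.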

Changes from v4:
- Fix style issue found by checkpatch.pl

Changes from V3:
- Rebase to net-next
- A separate RCUifying patch to simplify review
- Add a simple "tx follows rx" policy when choosing txq
- Various bug fixes

Changes from V2:
- Rebase to the latest net-next
- Fix netdev leak when tun_attach fails
- Fix return value of TUNSETOWNER
- Purge the receive queue in socket destructor
- Enable multiqueue tun (V1 and V2 only allowed mq to be enabled for tap)
- Add per-queue u64 statistics
- Fix wrong BUG_ON() check in tun_detach()
- Check numqueues instead of tfile[0] in tun_set_iff() to let tunctl -d work
  correctly
- Set numqueues to MAX_TAP_QUEUES during tun_detach_all() to prevent
  attaching.

Changes from V1:
- Simplify the sockets array management by not leaving NULL in the slot.
- Optimize tx queue selection.
- Fix the bug in tun_detach_all()

Reference:
- V4 https://lkml.org/lkml/2012/10/29/37
- V3 https://lkml.org/lkml/2012/6/25/191
- V2 http://lwn.net/Articles/459270/
- V1 http://www.mail-archive.com/kvm@vger.kernel.org/msg59479.html

Jason Wang (7):
  tuntap: log the unsigned informaiton with %u
  tuntap: move socket to tun_file
  tuntap: RCUify dereferencing between tun_struct and tun_file
  tuntap: introduce multiqueue flags
  tuntap: multiqueue support
  tuntap: add ioctl to attach or detach a file form tuntap device
  tuntap: choose the txq based on rxq

 drivers/net/tun.c   |  836 ---
 include/uapi/linux/if_tun.h |5 +
 2 files changed, 631 insertions(+), 210 deletions(-)


[PATCH -next] Staging: silicom: remove unused including

2012-10-31 Thread Wei Yongjun

From: Wei Yongjun 

Remove including  that don't need it.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 drivers/staging/silicom/bp_mod.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/silicom/bp_mod.c b/drivers/staging/silicom/bp_mod.c
index 018b4ff..58c5f5c 100644
--- a/drivers/staging/silicom/bp_mod.c
+++ b/drivers/staging/silicom/bp_mod.c
@@ -9,7 +9,6 @@
 /*
*/
 /*
*/
 
/**/
-#include 
 
 #include   /* We're doing kernel work */
 #include   /* Specifically, a module */



[PATCH -next] Staging: silicom: bypasslib: remove unused including

2012-10-31 Thread Wei Yongjun

From: Wei Yongjun 

Remove including  that don't need it.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 drivers/staging/silicom/bypasslib/bplibk.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/silicom/bypasslib/bplibk.h b/drivers/staging/silicom/bypasslib/bplibk.h
index a726d90..d8c1d27 100644
--- a/drivers/staging/silicom/bypasslib/bplibk.h
+++ b/drivers/staging/silicom/bypasslib/bplibk.h
@@ -15,7 +15,6 @@
 
 #include "bp_ioctl.h"
 #include "libbp_sd.h"
-#include 
 
 #define IF_NAME"eth"
 #define SILICOM_VID0x1374



Re: [PART2 Patch] node: cleanup node_state_attr

2012-10-31 Thread Wen Congyang

At 11/01/2012 02:29 AM, David Rientjes Wrote:
> On Wed, 31 Oct 2012, Wen Congyang wrote:
> 
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index af1a177..5d7731e 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -614,23 +614,23 @@ static ssize_t show_node_state(struct device *dev,
>>  { __ATTR(name, 0444, show_node_state, NULL), state }
>>  
>>  static struct node_attr node_state_attr[] = {
>> -_NODE_ATTR(possible, N_POSSIBLE),
>> -_NODE_ATTR(online, N_ONLINE),
>> -_NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
>> -_NODE_ATTR(has_cpu, N_CPU),
>> +[N_POSSIBLE] = _NODE_ATTR(possible, N_POSSIBLE),
>> +[N_ONLINE] = _NODE_ATTR(online, N_ONLINE),
>> +[N_NORMAL_MEMORY] = _NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
>>  #ifdef CONFIG_HIGHMEM
>> -_NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
>> +[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
>>  #endif
>> +[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
>>  };
>>  
> 
> Why change the index for N_CPU?

N_CPU > N_HIGH_MEMORY

We use this array to create attr file in sysfs. So changing the index for N_CPU
doesn't cause any other problem.
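
To illustrate why the position inside the initializer list no longer matters
once designated initializers are used, here is a small standalone sketch. The
enum and array below are hypothetical stand-ins for the kernel's node states
and node_state_attr[]; only the idea of indexing by enum value is taken from
the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the node state enum; ST_CPU > ST_HIGH_MEMORY. */
enum node_state {
    ST_POSSIBLE,
    ST_ONLINE,
    ST_NORMAL_MEMORY,
    ST_HIGH_MEMORY,
    ST_CPU,
    NR_STATES
};

/* Each slot is selected by its enum value, so the textual order of the
 * entries is irrelevant: has_cpu is listed before has_high_memory here
 * but still ends up at index ST_CPU. */
static const char *const state_name[NR_STATES] = {
    [ST_POSSIBLE]      = "possible",
    [ST_ONLINE]        = "online",
    [ST_NORMAL_MEMORY] = "has_normal_memory",
    [ST_CPU]           = "has_cpu",
    [ST_HIGH_MEMORY]   = "has_high_memory",
};

/* Reverse lookup: return the state for a name, or -1 if there is none. */
static int state_by_name(const char *name)
{
    size_t i;

    for (i = 0; i < NR_STATES; i++)
        if (state_name[i] && strcmp(state_name[i], name) == 0)
            return (int)i;
    return -1;
}
```

Slots that are not named in the initializer are left NULL, which is why the
lookup checks for NULL before comparing.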

Thanks
Wen Congyang

> 
>>  static struct attribute *node_state_attrs[] = {
>> -&node_state_attr[0].attr.attr,
>> -&node_state_attr[1].attr.attr,
>> -&node_state_attr[2].attr.attr,
>> -&node_state_attr[3].attr.attr,
>> +&node_state_attr[N_POSSIBLE].attr.attr,
>> +&node_state_attr[N_ONLINE].attr.attr,
>> +&node_state_attr[N_NORMAL_MEMORY].attr.attr,
>>  #ifdef CONFIG_HIGHMEM
>> -&node_state_attr[4].attr.attr,
>> +&node_state_attr[N_HIGH_MEMORY].attr.attr,
>>  #endif
>> +&node_state_attr[N_CPU].attr.attr,
>>  NULL
>>  };
>>  
> 


[patch] x86, xen: fix build dependency when USB_SUPPORT is not enabled

2012-10-31 Thread David Rientjes

CONFIG_XEN_DOM0 must depend on CONFIG_USB_SUPPORT, otherwise there is no 
definition of xen_dbgp_reset_prep() and xen_dbgp_external_startup() 
resulting in the following link error:

drivers/built-in.o: In function `dbgp_reset_prep':
(.text+0x1e03c5): undefined reference to `xen_dbgp_reset_prep'
drivers/built-in.o: In function `dbgp_external_startup':
(.text+0x1e0d55): undefined reference to `xen_dbgp_external_startup'

Signed-off-by: David Rientjes 
---
 arch/x86/xen/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -17,6 +17,7 @@ config XEN_DOM0
def_bool y
depends on XEN && PCI_XEN && SWIOTLB_XEN
depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
+   depends on USB_SUPPORT
 
 # Dummy symbol since people have come to rely on the PRIVILEGED_GUEST
 # name in tools.

[patch] mm: fix build warning for uninitialized value

2012-10-31 Thread David Rientjes

do_wp_page() sets mmun_called if mmun_start and mmun_end were initialized 
and, if so, may call mmu_notifier_invalidate_range_end() with these 
values.  This doesn't prevent gcc from emitting a build warning though:

mm/memory.c: In function ‘do_wp_page’:
mm/memory.c:2530: warning: ‘mmun_start’ may be used uninitialized in this function
mm/memory.c:2531: warning: ‘mmun_end’ may be used uninitialized in this function

It's much easier to initialize the variables to impossible values and do a 
simple comparison to determine if they were initialized to remove the bool 
entirely.
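
The idea can be shown outside the kernel with a minimal sketch. The names
struct range, page_range and range_valid, and the 4096-byte PAGE_SZ, are
hypothetical; the technique is the one described above: start with start ==
end == 0, and treat end > start as "the range was set".

```c
#define PAGE_SZ 4096UL   /* assumed page size for this sketch */

struct range {
    unsigned long start;
    unsigned long end;
};

/* The page-aligned range that contains address. */
static struct range page_range(unsigned long address)
{
    struct range r;

    r.start = address & ~(PAGE_SZ - 1);
    r.end = r.start + PAGE_SZ;
    return r;
}

/* An all-zero range (the initial state) is never valid, so no separate
 * "was it set" flag is needed. */
static int range_valid(const struct range *r)
{
    return r->end > r->start;
}
```

One caveat of this sketch: for an address in the very last page of the address
space, start + PAGE_SZ wraps to 0 and the range would look unset, so it assumes
such addresses never occur.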

Signed-off-by: David Rientjes 
---
 mm/memory.c |   10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2527,9 +2527,8 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
int ret = 0;
int page_mkwrite = 0;
struct page *dirty_page = NULL;
-   unsigned long mmun_start;   /* For mmu_notifiers */
-   unsigned long mmun_end; /* For mmu_notifiers */
-   bool mmun_called = false;   /* For mmu_notifiers */
+   unsigned long mmun_start = 0;   /* For mmu_notifiers */
+   unsigned long mmun_end = 0; /* For mmu_notifiers */
 
old_page = vm_normal_page(vma, address, orig_pte);
if (!old_page) {
@@ -2708,8 +2707,7 @@ gotten:
goto oom_free_new;
 
mmun_start  = address & PAGE_MASK;
-   mmun_end= (address & PAGE_MASK) + PAGE_SIZE;
-   mmun_called = true;
+   mmun_end= mmun_start + PAGE_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 
/*
@@ -2778,7 +2776,7 @@ gotten:
page_cache_release(new_page);
 unlock:
pte_unmap_unlock(page_table, ptl);
-   if (mmun_called)
+   if (mmun_end > mmun_start)
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
if (old_page) {
/*
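
The sentinel idea from the mm patch above can be modelled in user space. The
following is only a minimal sketch: struct notify_range and the range_*()
helpers are hypothetical names invented for illustration, and PAGE_SIZE is
assumed to be 4096. It is not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this sketch; the kernel value is per-architecture. */
#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for the mmun_start/mmun_end pair in do_wp_page(). */
struct notify_range {
	unsigned long start;
	unsigned long end;
};

/* Both bounds start at 0, so "end > start" stays false until a range is set. */
static void range_init(struct notify_range *r)
{
	r->start = 0;
	r->end = 0;
}

/* Record the page-aligned range that covers address. */
static void range_set(struct notify_range *r, unsigned long address)
{
	r->start = address & PAGE_MASK;
	r->end = r->start + PAGE_SIZE;
}

/*
 * Replaces the separate bool: a non-empty range means it was set.
 * Caveat of this sketch: a range that wraps past the top of the
 * address space would compare as "not set".
 */
static bool range_was_set(const struct notify_range *r)
{
	return r->end > r->start;
}
```

With this shape the compiler sees both variables initialized on every path,
which is what silences the warning, and the final check needs no extra flag.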

linux-next: Tree for Nov 1

2012-10-31 Thread Stephen Rothwell

Hi all,

New trees: rr-fixes and swiotlb

Changes since 20121031:

The v4l-dvb tree still had its build failure so I used the version from
next-20121026.

The tmem tree gained conflicts against the xen-two tree.

The akpm tree lost a patch that turned up elsewhere.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 209 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (1e207eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending)
Merging fixes/master (12250d8 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by list_head-style lists.)
Merging arm-current/fixes (b43b1ff Merge tag 'fixes-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes)
Merging m68k-current/for-linus (8a745ee m68k: Wire up kcmp)
Merging powerpc-merge/merge (83dac59 cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries.)
Merging sparc/master (f7e8d9f qlogicpti: Fix build warning.)
Merging net/master (aff9c78 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless)
Merging sound-current/for-linus (16c2e1f ALSA: ice1724: Fix rate setup after resume)
Merging pci-current/for-linus (0ff9514 PCI: Don't print anything while decoding is disabled)
Merging wireless/master (6fe7cc7 ath9k: Test for TID only in BlockAcks while checking tx status)
Merging driver-core.current/driver-core-linus (8f0d816 Linux 3.7-rc3)
Merging tty.current/tty-linus (8f0d816 Linux 3.7-rc3)
Merging usb.current/usb-linus (d99e65b USB: fix build with XEN and EARLY_PRINTK_DBGP enabled but USB_SUPPORT disabled)
Merging staging.current/staging-linus (8f0d816 Linux 3.7-rc3)
Merging char-misc.current/char-misc-linus (8f0d816 Linux 3.7-rc3)
Merging input-current/for-linus (32ed191 Input: tsc40 - remove wrong announcement of pressure support)
Merging md-current/for-linus (ed30be0 MD RAID10: Fix oops when creating RAID10 arrays via dm-raid.c)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args)
Merging spi-current/spi/merge (d1c185b of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.)
Merging gpio-current/gpio/merge (96b7064 gpio/tca6424: merge I2C transactions, remove cast)
Merging rr-fixes/fixes (59ef28b module: fix out-by-one error in kallsyms)
Merging asm-generic/master (9b04ebd asm-generic/io.h: remove asm/cacheflush.h include)
Mer

[PATCH -next] CLK: clk-twl6040: fix return value check in twl6040_clk_probe()

2012-10-31 Thread Wei Yongjun

From: Wei Yongjun 

In case of error, the function clk_register() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 drivers/clk/clk-twl6040.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clk-twl6040.c b/drivers/clk/clk-twl6040.c
index f4a3389..bc1e713 100644
--- a/drivers/clk/clk-twl6040.c
+++ b/drivers/clk/clk-twl6040.c
@@ -92,8 +92,8 @@ static int __devinit twl6040_clk_probe(struct platform_device *pdev)
 
clkdata->mcpdm_fclk.init = &wm831x_clkout_init;
clkdata->clk = clk_register(&pdev->dev, &clkdata->mcpdm_fclk);
-   if (!clkdata->clk)
-   return -EINVAL;
+   if (IS_ERR(clkdata->clk))
+   return PTR_ERR(clkdata->clk);
 
dev_set_drvdata(&pdev->dev, clkdata);
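
The error-pointer convention that the clk-twl6040 patch above relies on can be
sketched in user space. The helpers below are approximations of the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() written for illustration only, and
fake_clk_register() and probe() are hypothetical stand-ins, not the driver's
functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are encoded in the top MAX_ERRNO addresses of the pointer space. */
#define MAX_ERRNO 4095

/* User-space approximations of the kernel helpers (assumption of this sketch). */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_clk;

/* Hypothetical registration function: returns an error pointer, never NULL. */
static void *fake_clk_register(int fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return &dummy_clk;
}

/* The checked pattern: test with IS_ERR() and propagate the real error code. */
static int probe(int fail)
{
	void *clk = fake_clk_register(fail);

	if (IS_ERR(clk))
		return (int)PTR_ERR(clk);
	return 0;
}
```

Because an error pointer is a non-NULL value, a plain `if (!clk)` test would
let the failure through; checking with IS_ERR() also lets the caller return
the actual error instead of a fixed -EINVAL.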
 



RE: [PATCH] mtip32xx: remove unused variables from mtip32xx.c

2012-10-31 Thread yongjun_...@trendmicro.com.cn

Hi all

Sorry, please ignore this mail; it was resent by mistake.

Regards,
Yongjun Wei

-----Original Message-----
From: Wei Yongjun [mailto:weiyj...@gmail.com] 
Sent: November 1, 2012 13:09
To: ax...@kernel.dk
Cc: Yongjun Wei (RD-CN); linux-kernel@vger.kernel.org
Subject: [PATCH] mtip32xx: remove unused variables from mtip32xx.c

From: Wei Yongjun 

The variables fis, reply are initialized but never used otherwise, so remove 
the unused variables.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 drivers/block/mtip32xx/mtip32xx.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index adc6f36..d1e0273 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -555,7 +555,6 @@ static void print_tags(struct driver_data *dd,
 static void mtip_timeout_function(unsigned long int data)
 {
struct mtip_port *port = (struct mtip_port *) data;
-   struct host_to_dev_fis *fis;
struct mtip_cmd *command;
int tag, cmdto_cnt = 0;
unsigned int bit, group;
@@ -587,7 +586,6 @@ static void mtip_timeout_function(unsigned long int data)
bit = tag & 0x1F;
 
command = &port->commands[tag];
-   fis = (struct host_to_dev_fis *) command->command;
 
set_bit(tag, tagaccum);
cmdto_cnt++;
@@ -1142,10 +1140,8 @@ static void mtip_issue_non_ncq_command(struct mtip_port *port, int tag)
 static bool mtip_pause_ncq(struct mtip_port *port,
struct host_to_dev_fis *fis)
 {
-   struct host_to_dev_fis *reply;
unsigned long task_file_data;
 
-   reply = port->rxfis + RX_FIS_D2H_REG;
task_file_data = readl(port->mmio+PORT_TFDATA);
 
if (fis->command == ATA_CMD_SEC_ERASE_UNIT)





Re: [net-next v4 0/7] Multiqueue support in tuntap

2012-10-31 Thread Jason Wang


On 10/31/2012 07:52 AM, Stephen Hemminger wrote:

> I am testing BQL for tuntap.
> It wouldn't be hard to do BQL in the multi-queue version.

Yes, if BQL for tuntap goes in first, I will rebase and convert it to the 
multiqueue version.


Thanks

Re: [net-next resend v4 5/7] tuntap: multiqueue support

2012-10-31 Thread Jason Wang


On 11/01/2012 02:16 AM, David Miller wrote:

From: Jason Wang 
Date: Mon, 29 Oct 2012 14:15:49 +0800


@@ -110,6 +110,11 @@ struct tap_filter {
unsigned char   addr[FLT_EXACT_COUNT][ETH_ALEN];
  };
  
+/* 1024 is probably a high enough limit: modern hypervisors seem to support on

+ * the order of 100-200 CPUs so this leaves us some breathing space if we want
+ * to match a queue per guest CPU. */

Please don't format comments like this.  Put that final "*/" on its
own line.

I'm really perplexed how you can get it right elsewhere in your
patches, and then botch it up only in a few select locations :-/


Sorry about this; some parts were copied and pasted from a reviewer's 
comments. I will post a new version that fixes them all.

+/* We try to identify a flow through its rxhash first. The reason that
+ * we do not check rxq no. is becuase some cards(e.g 82599), chooses
+ * the rxq based on the txq where the last packet of the flow comes. As
+ * the userspace application move between processors, we may get a
+ * different rxq no. here. If we could not get rxhash, then we would
+ * hope the rxq no. may help here.
+ */

For example, this one is done right.

Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry

2012-10-31 Thread Minchan Kim

On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> In current code, after flush_all_zero_pkmaps() is invoked,
> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> return index of first flushed entry. With this index,
> we can immediately map highmem page to virtual address represented by index.
> So change return type of flush_all_zero_pkmaps()
> and return index of first flushed entry.
> 
> Additionally, update last_pkmap_nr to this index.
> It is certain that entry which is below this index is occupied by other 
> mapping,
> therefore updating last_pkmap_nr to this index is reasonable optimization.
> 
> Cc: Mel Gorman 
> Cc: Peter Zijlstra 
> Cc: Minchan Kim 
> Signed-off-by: Joonsoo Kim 
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ef788b5..97ad208 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void 
> *vaddr, int size)
>  
>  #ifdef CONFIG_HIGHMEM
>  #include 
> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>  
>  /* declarations for linux/mm/highmem.c */
>  unsigned int nr_free_highpages(void);
> diff --git a/mm/highmem.c b/mm/highmem.c
> index d98b0a9..b365f7b 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>   return virt_to_page(addr);
>  }
>  
> -static void flush_all_zero_pkmaps(void)
> +static unsigned int flush_all_zero_pkmaps(void)
>  {
>   int i;
> - int need_flush = 0;
> + unsigned int index = PKMAP_INVALID_INDEX;
>  
>   flush_cache_kmaps();
>  
> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> &pkmap_page_table[i]);
>  
>   set_page_address(page, NULL);
> - need_flush = 1;
> + if (index == PKMAP_INVALID_INDEX)
> + index = i;
>   }
> - if (need_flush)
> + if (index != PKMAP_INVALID_INDEX)
>   flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +
> + return index;
>  }
>  
>  /**
> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>   */
>  void kmap_flush_unused(void)
>  {
> + unsigned int index;
> +
>   lock_kmap();
> - flush_all_zero_pkmaps();
> + index = flush_all_zero_pkmaps();
> + if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> + last_pkmap_nr = index;

I don't know whether kmap_flush_unused is really a fast path, so I'm not sure
how much my nitpick matters. Anyway,
what problem would there be if we did the following?

lock()
index = flush_all_zero_pkmaps();
if (index != PKMAP_INVALID_INDEX)
last_pkmap_nr = index;
unlock();

Normally, last_pkmap_nr is increased while searching for an empty slot in
map_new_virtual. So I expect the return value of flush_all_zero_pkmaps
in kmap_flush_unused to normally be either less than last_pkmap_nr
or last_pkmap_nr + 1.
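
For readers following the discussion, the behaviour under review can be modelled
outside the kernel. This is a minimal user-space sketch, not the patch itself:
the table is shrunk to 8 slots, the function name carries a _model suffix to
mark it as hypothetical, and the cache and TLB flushes are left out. It assumes
the usual pkmap_count convention, where 0 means free and flushed, 1 means
unused but still mapped, and larger values mean in use.

```c
/* Assumption of this sketch: a tiny table instead of the real LAST_PKMAP. */
#define LAST_PKMAP 8
#define PKMAP_INVALID_INDEX LAST_PKMAP

/* 0 = free and flushed, 1 = unused but still mapped, >1 = in use. */
static int pkmap_count[LAST_PKMAP];

/*
 * Hypothetical model of the patched function: release every slot whose
 * count is 1 and return the index of the first slot released, or
 * PKMAP_INVALID_INDEX when nothing was released.
 */
static unsigned int flush_all_zero_pkmaps_model(void)
{
	unsigned int index = PKMAP_INVALID_INDEX;
	unsigned int i;

	for (i = 0; i < LAST_PKMAP; i++) {
		if (pkmap_count[i] != 1)
			continue;
		pkmap_count[i] = 0;
		if (index == PKMAP_INVALID_INDEX)
			index = i;
	}
	return index;
}
```

A caller that receives a valid index can map at that slot directly instead of
rescanning the table, which is the optimization the quoted patch aims for.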

 
>   unlock_kmap();
>  }
>  
>  static inline unsigned long map_new_virtual(struct page *page)
>  {
>   unsigned long vaddr;
> + unsigned int index = PKMAP_INVALID_INDEX;
>   int count;
>  
>  start:
> @@ -168,40 +176,45 @@ start:
>   for (;;) {
>   last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
>   if (!last_pkmap_nr) {
> - flush_all_zero_pkmaps();
> - count = LAST_PKMAP;
> + index = flush_all_zero_pkmaps();
> + break;
>   }
> - if (!pkmap_count[last_pkmap_nr])
> + if (!pkmap_count[last_pkmap_nr]) {
> + index = last_pkmap_nr;
>   break;  /* Found a usable entry */
> - if (--count)
> - continue;
> -
> - /*
> -  * Sleep for somebody else to unmap their entries
> -  */
> - {
> - DECLARE_WAITQUEUE(wait, current);
> -
> - __set_current_state(TASK_UNINTERRUPTIBLE);
> - add_wait_queue(&pkmap_map_wait, &wait);
> - unlock_kmap();
> - schedule();
> - remove_wait_queue(&pkmap_map_wait, &wait);
> - lock_kmap();
> -
> - /* Somebody else might have mapped it while we slept */
> - if (page_address(page))
> - return (unsigned long)page_address(page);
> -
> - /* Re-start */
> - goto start;
>   }
> + if (--count == 0)
> + break;
>   }
> - vaddr = PKMAP_ADDR(last_pkmap_nr);
> +
> + /*
> +  * Sleep for somebody else to unmap their entries
> +  */
> + if (index == PKMAP_INVALID_INDEX) {
> + DECLARE_WAITQUEUE(wait, current);
> +
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> + add_wait_queue(&pkmap_map_wait, &wait);
> +

Re: [RFC] Second attempt at kernel secure boot support

2012-10-31 Thread joeyli

On Wed, 2012-10-31 at 19:53 +0100, Takashi Iwai wrote:
> At Wed, 31 Oct 2012 17:37:28 +,
> Matthew Garrett wrote:
> > 
> > On Wed, Oct 31, 2012 at 06:28:16PM +0100, Takashi Iwai wrote:
> > 
> > > request_firmware() is used for microcode loading, too, so it's fairly
> > > a core part to cover, I'm afraid.
> > > 
> > > I played a bit about this yesterday.  The patch below is a proof of
> > > concept to (ab)use the module signing mechanism for firmware loading
> > > too.  Sign firmware files via scripts/sign-file, and put to
> > > /lib/firmware/signed directory.
> > 
> > That does still leave me a little uneasy as far as the microcode 
> > licenses go. I don't know that we can distribute signed copies of some 
> > of them, and we obviously can't sign at the user end.
> 
> Yeah, that's a concern.  Although this is a sort of "container" and
> keeping the original data as is, it might be regarded as a
> modification.
> 
> Another approach would be to a signature in a separate file, but I'm
> not sure whether it makes sense.
> 

I think it makes sense, because the private key is still protected by the
signer. An attacker who modifies the firmware still needs a private key
to generate a signature, and the attacker's private key cannot match
the public key that the kernel uses to verify the firmware.

Also, I am afraid we have no choice but to put the firmware signature
in a separate file. Contacting those companies' legal departments
would be very time-consuming, and I am not sure every company would agree
to let us attach the signature to the firmware and then distribute it.


Thanks a lot!
Joey Lee


RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-31 Thread H. Peter Anvin

2) would make most sense to me, but I'd be okay with 3) as well.

"Zhang, Jun"  wrote:

>Hello, Anvin
>
>I want to explain why I made the change in this place. kexec passes three
>parameters: memmap=exactmap memmap=544K@64K memmap=64964K@32768K
>I think my patch modifies the least code. 
>Actually, there are several choices for fixing it. 
>1)  my patch.
>2)  modify kexec to pass only two parameters, memmap=544K@64K
>memmap=64964K@32768K; then in the kernel's setup_memory_map, we can remove
>the RAM range.
>3)  add an extra option, like memmap=REMOVERAM
>
>Which one do you prefer? If you have a better solution, please share it.
>Thanks!
>
>Best Regards!
>
>Jun Zhang
>Inet: 8821-4273
>Dir.Tel: 86-21-6116-4273
>Email: jun.zh...@intel.com
>
>-----Original Message-----
>From: H. Peter Anvin [mailto:h...@zytor.com] 
>Sent: Wednesday, October 31, 2012 1:39 PM
>To: Zhang, Jun
>Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton;
>Fleming, Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH] To crash dump, we need keep other memory type
>except E820_RAM, because other type come from BIOS or firmware is used
>by other code(for example: PCI_MMCONFIG).
>
>On 10/30/2012 10:22 PM, Zhang, Jun wrote:
>> Hello, Anvin
>>You are right. Thanks!
>>
>> Hello, All
>>Please review it again. Thanks!
>>
>>  From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 
>> 2001
>> From: jzha144 
>> Date: Wed, 31 Oct 2012 08:51:18 +0800
>> Subject: [PATCH] When we are doing a crash dump, we still need
>non-E820_RAM
>>   memory type address information in order to do I/O. so only
>>   remove all RAM ranges which need to be dumped.
>>
>> Signed-off-by: jzha144 
>> ---
>>   arch/x86/kernel/e820.c |    9 +++++++++
>>   1 files changed, 9 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 
>> df06ade..77be839 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
>>   * reset.
>>   */
>>  saved_max_pfn = e820_end_of_ram_pfn();
>> +
>> +/*
>> + * We are doing a crash dump, so remove all RAM ranges
>> + * as they are the ones that need to be dumped.
>> + * We still need all non-RAM information in order to do I/O.
>> + */
>> +e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
>> +userdef = 1;
>> +return 0;
>>   #endif
>>  e820.nr_map = 0;
>>  userdef = 1;
>>
>
>The code is still wrong...
>
>   -hpa
>
>
>--
>H. Peter Anvin, Intel Open Source Technology Center I work for Intel. 
>I don't speak on their behalf.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

Re: [PATCH v3 1/3] Use acpi_os_hotplug_execute() instead of alloc_acpi_hp_work().

2012-10-31 Thread Yinghai Lu

On Wed, Oct 31, 2012 at 12:27 AM, Tang Chen  wrote:
> Hi Yinghai,
>
> alloc_acpi_hp_work() just puts the hotplug work onto kacpi_hotplug_wq.
> As mentioned by Toshi Kani, this job is already done by
> acpi_os_hotplug_execute().
> So we should use it instead of alloc_acpi_hp_work().
>
> This patch adds a acpi_hp_cb_data struct, which encapsulates the hotplug
> event notifier's parameters:
> struct acpi_hp_cb_data {
> acpi_handle handle;
> u32 type;
> void *context;
> };
>
> And also a function acpi_hp_cb_execute(), which calls
> acpi_os_hotplug_execute()
> to put the hotplug job onto kacpi_hotplug_wq.
>
> This patch is based on Lu Yinghai's tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git 
> for-pci-split-pci-root-hp-2
>
> Signed-off-by: Tang Chen 
> ---
>  drivers/acpi/osl.c |   28 
>  drivers/acpi/pci_root_hp.c |   25 +++---
>  drivers/pci/hotplug/acpiphp_glue.c |   39 +++
>  include/acpi/acpiosxf.h|7 ++---
>  4 files changed, 55 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 311a921..d441b16 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -52,6 +52,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

not needed.

>
>  #define _COMPONENT ACPI_OS_SERVICES
>  ACPI_MODULE_NAME("osl");
> @@ -1592,23 +1593,22 @@ void acpi_os_set_prepare_sleep(int (*func)(u8 
> sleep_state,
> __acpi_os_prepare_sleep = func;
>  }
>
> -void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
> -   void (*func)(struct work_struct *work))
> +void acpi_hp_cb_execute(acpi_handle handle, u32 type, void *context,
> +   acpi_osd_exec_callback function)
>  {
> -   struct acpi_hp_work *hp_work;
> -   int ret;
> +   acpi_status status;
> +   struct acpi_hp_cb_data *cb_data;
>
> -   hp_work = kmalloc(sizeof(*hp_work), GFP_KERNEL);
> -   if (!hp_work)
> +   cb_data = kmalloc(sizeof(struct acpi_hp_cb_data), GFP_KERNEL);
> +   if (!cb_data)
> return;
>
> -   hp_work->handle = handle;
> -   hp_work->type = type;
> -   hp_work->context = context;
> +   cb_data->handle = handle;
> +   cb_data->type = type;
> +   cb_data->context = context;
>
> -   INIT_WORK(&hp_work->work, func);
> -   ret = queue_work(kacpi_hotplug_wq, &hp_work->work);
> -   if (!ret)
> -   kfree(hp_work);
> +   status = acpi_os_hotplug_execute(function, cb_data);
> +   if (ACPI_FAILURE(status))
> +   kfree(cb_data);
>  }
> -EXPORT_SYMBOL(alloc_acpi_hp_work);
> +EXPORT_SYMBOL(acpi_hp_cb_execute);
> diff --git a/drivers/acpi/pci_root_hp.c b/drivers/acpi/pci_root_hp.c
> index 7d427e6..2ff83f4 100644
> --- a/drivers/acpi/pci_root_hp.c
> +++ b/drivers/acpi/pci_root_hp.c
> @@ -75,19 +75,20 @@ static void handle_root_bridge_removal(struct acpi_device 
> *device)
> acpi_bus_hot_remove_device(ej_event);
>  }
>
> -static void _handle_hotplug_event_root(struct work_struct *work)
> +/* This function is of type acpi_osd_exec_callback */
> +static void _handle_hotplug_event_root(void *context)
>  {
> struct acpi_pci_root *root;
> char objname[64];
> struct acpi_buffer buffer = { .length = sizeof(objname),
>   .pointer = objname };
> -   struct acpi_hp_work *hp_work;
> +   struct acpi_hp_cb_data *cb_data;
> acpi_handle handle;
> u32 type;
>
> -   hp_work = container_of(work, struct acpi_hp_work, work);
> -   handle = hp_work->handle;
> -   type = hp_work->type;
> +   cb_data = (struct acpi_hp_cb_data *)context;
> +   handle = cb_data->handle;
> +   type = cb_data->type;
>
> root = acpi_pci_find_root(handle);
>
> @@ -124,14 +125,22 @@ static void _handle_hotplug_event_root(struct 
> work_struct *work)
> break;
> }
>
> -   kfree(hp_work); /* allocated in handle_hotplug_event_bridge */
> +   kfree(context); /* allocated in handle_hotplug_event_bridge */
>  }
>
>  static void handle_hotplug_event_root(acpi_handle handle, u32 type,
> void *context)
>  {
> -   alloc_acpi_hp_work(handle, type, context,
> -   _handle_hotplug_event_root);
> +   /*
> +* Currently the code adds all hotplug events to the kacpid_wq
> +* queue when it should add hotplug events to the kacpi_hotplug_wq.
> +* The proper way to fix this is to reorganize the code so that
> +* drivers (dock, etc.) do not call acpi_os_execute(), etc.
> +* For now just re-add this work to the kacpi_hotplug_wq so we
> +* don't deadlock on hotplug actions.
> +*/
> +   acpi_hp_cb_execute(handle, type, cont

Re: [PATCH V4 0/4] ARM: tegra: Enable SLINK controller driver

2012-10-31 Thread Laxman Dewangan


On Thursday 01 November 2012 12:46 AM, Stephen Warren wrote:

On 10/31/2012 11:31 AM, Laxman Dewangan wrote:

On Wednesday 31 October 2012 09:59 PM, Stephen Warren wrote:

On 10/31/2012 03:02 AM, Laxman Dewangan wrote:

This series modify the dts file to add the slink addresses,
make AUXDATA in board dt files, enable slink4 for tegra30-cardhu and
enable slink controller defconfig.

I don't appear to have received patch 1/4 this time around. I'll assume
it's identical to V3, since I don't think anything needed to change
there...

Yes, it is identical to V3, no change on the 1/4.

OK, I have applied the series now. Thanks.

BTW, I noticed that the Cardhu board file sets the SPI controller's max
frequency to 25MHz (which is consistent with the commit description) but
the SPI flash node's frequency to 20MHz. Was that intended, or should I
fix it up to be 25MHz?

If the device provides a max frequency, then SPI communication will run at 
this requested frequency.
If the device does not provide a max SPI frequency, then it will 
communicate at the controller's max frequency.


In this case we provide 20MHz for the device and 25MHz for the controller, 
so SPI communication with this device will always be at 20MHz.
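
The rule described above can be written down as a small helper. This is a
sketch under stated assumptions: effective_spi_hz() is a hypothetical name,
0 is taken to mean "the device did not specify a frequency", and clamping a
device request to the controller limit is an extra assumption that the
explanation above does not spell out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the bus speed for one SPI device.
 * device_max_hz == 0 means the device node gave no frequency.
 */
static uint32_t effective_spi_hz(uint32_t device_max_hz,
				 uint32_t controller_max_hz)
{
	/* No device limit: fall back to the controller's maximum. */
	if (device_max_hz == 0)
		return controller_max_hz;

	/* Assumption: never exceed what the controller can do. */
	if (device_max_hz > controller_max_hz)
		return controller_max_hz;

	return device_max_hz;
}
```

With the values from this thread, a 20MHz device on a 25MHz controller would
run at 20MHz, and a device with no frequency set would run at 25MHz.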


linux-next: build warning after merge of the tmem tree

2012-10-31 Thread Stephen Rothwell

Hi Konrad,

After merging the tmem tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

arch/x86/xen/enlighten.c:109:0: warning: "xen_pvh_domain" redefined [enabled by default]
include/xen/xen.h:23:0: note: this is the location of the previous definition

Probably caused by the merge of commit 6056726e851a ("xen/pvh: bootup and
setup (E820) related changes") from the xen-two tree and commit
7282a68f5aea ("PVH: Basic and preparatory changes") from the tmem tree.
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH 3/9] net: xfrm: use this_cpu_ptr per-cpu helper

2012-10-31 Thread Herbert Xu

On Wed, Oct 31, 2012 at 05:35:46PM +, Christoph Lameter wrote:
> On Wed, 31 Oct 2012, Shan Wei wrote:
> 
> > -
> > list_for_each_entry(pos, &ipcomp_tfms_list, list) {
> > struct crypto_comp *tfm;
> >
> > tfms = pos->tfms;
> > -   tfm = *per_cpu_ptr(tfms, cpu);
> > +
> > +   /* This can be any valid CPU ID so we don't need locking. */
> > +   tfm = *this_cpu_ptr(tfms);
> 
> It would be better to use
> 
>   this_cpu_read(tfms)
> 
> since that would also make it atomic vs interrupts. The above code (both
> original and modified) could determine a pointer to a per cpu structure
> and then take an interrupt which would move the task. On return we would
> be accessing the per cpu variable of another processor.

Please refer to the comment in the patch above.

But I think the patch is wrong anyway because it would introduce
a warning, no?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

linux-next: manual merge of the tmem tree with the xen-two tree

2012-10-31 Thread Stephen Rothwell

Hi Konrad,

Today's linux-next merge of the tmem tree got a conflict in
arch/x86/xen/smp.c between commits 1ba23a0f2605 ("xen/smp: Move the
common CPU init code a bit to prep for PVH patch") and 6c6067f26388
("xen/pvh: Extend vcpu_guest_context, p2m, event, and XenBus") from the
xen-two tree and commit 7282a68f5aea ("PVH: Basic and preparatory
changes") from the tmem tree.

I fixed it up (using the xen-two tree version) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



linux-next: manual merge of the tmem tree with the xen-two tree

2012-10-31 Thread Stephen Rothwell

Hi Konrad,

Today's linux-next merge of the tmem tree got a conflict in
arch/x86/xen/setup.c between commit 6056726e851a ("xen/pvh: bootup and
setup (E820) related changes") from the xen-two tree and commit
7282a68f5aea ("PVH: Basic and preparatory changes") from the tmem tree.

I fixed it up (by using the xen-two tree version) and can carry the fix
as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



linux-next: manual merge of the tmem tree with the xen-two tree

2012-10-31 Thread Stephen Rothwell

Hi Konrad,

Today's linux-next merge of the tmem tree got conflicts in
arch/x86/include/asm/xen/interface.h and drivers/xen/cpu_hotplug.c
between commit 6c6067f26388 ("xen/pvh: Extend vcpu_guest_context, p2m,
event, and XenBus") from the xen-two tree and commit 7282a68f5aea ("PVH:
Basic and preparatory changes") from the tmem tree.

I fixed it up (see below and using the xen-two version for cpu_hotplug.c)
and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/x86/include/asm/xen/interface.h
index 20e738a,104fa50..000
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@@ -145,16 -136,8 +145,17 @@@ struct vcpu_guest_context 
  struct cpu_user_regs user_regs; /* User-level CPU registers */
  struct trap_info trap_ctxt[256];/* Virtual IDT  */
  unsigned long ldt_base, ldt_ents;   /* LDT (linear address, # ents) */
 -unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents).*
 -   * PV in HVM: it's GDTR addr/sz */
 +union {
 +  struct {
-   /* PV: GDT (machine frames, # ents).*/
++  /* PV: GDT (machine frames, # ents).
++   * PV in HVM: it's GDTR addr/sz */
 +  unsigned long gdt_frames[16], gdt_ents;
 +  } pv;
 +  struct {
 +  /* PVH: GDTR addr and size */
 +  unsigned long gdtaddr, gdtsz;
 +  } pvh;
 +} u;
  unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1)   */
  /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
  unsigned long ctrlreg[8];   /* CR0-CR7 (control registers)  */



RE: [PATCH] staging: ste_rmi4: Convert to Type-B support

2012-10-31 Thread Alexandra Chin

Hi Henrik,

I see what you mean. The input subsystem handles multi-touch tracking for the 
input driver.
I will update the reporting process and resubmit a patch against 3.7-rcX.
I greatly appreciate your comments.

Alexandra Chin

[Consult] "gifts" to Red Hat.

2012-10-31 Thread Chen Gang

Hello linux-kernel@vger.kernel.org:

1) First, sorry for bothering you.

   A) Asianux has found and analysed some issues that affect only Red Hat
kernels (the public kernel does not have them), and has also worked out
fixes for them. For short, we can call these "gifts".

   B) The relevant people at Red Hat suggested that Asianux send these
"gifts" to the public kernel mailing lists (linux-*@vger.kernel.org); please
see the details at the bottom of this mail.

   C) I want to ask: "is it suitable to discuss these 'gifts' on
linux-*@vger.kernel.org"?

2) For Asianux:

   A) Asianux will contribute to Open Source, just as Asianux has already
benefited from Open Source.

   B) Finding and analysing issues, and providing fixes for them, is one of
the important ways to contribute to Open Source.

   C) Asianux will do what it can for Open Source,
  focusing on the Linux kernel (not only Red Hat's, but also the public kernel).

3) Please give feedback:

   A) Suggestions and additions from any member are welcome.

   B) If there is no reply within 2 weeks, Asianux will take it that "it is suitable".

   C) Thanks.

Appendix:

The communication between gang.c...@asianux.com and rwhee...@redhat.com
-
On 2012-10-24 19:07, Ric Wheeler wrote:
> On 10/24/2012 06:24 AM, Chen Gang wrote:
>> Hi Ric,
>>
>>
>> 1) I have sent to this mail to linux-...@vger.kernel.org.
>
> Great!
>
> What you need to do is to post the question there and also a test
> with the upstream kernel version.
>>
>> 2) I want to know:  "Why send an issue to a public kernel mail list
>> which is only relative with Red Hat own ??".
>
> Our developers will see your post and will follow up based on the
> upstream conversation if we think it is a feature our customer base
> (and supported configurations) will see.
>
> We are *not* paid to support other distributions or their developer
> teams in debugging and supporting your code base.
>

-- 
Chen Gang

Asianux Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch

2012-10-31 Thread Wen Congyang

At 10/31/2012 09:41 PM, Jianguo Wu Wrote:
> On 2012/10/31 19:23, Wen Congyang wrote:
>> NR_FREE_PAGES will be wrong after offlining pages.  We add/dec
>> NR_FREE_PAGES like this now:
>>
>> 1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
>>
>> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is
>>MIGRATE_ISOLATE
>>
>> 3. dec NR_FREE_PAGES when offlining isolated pages.
>>
>> 4. add NR_FREE_PAGES when undoing isolate pages.
>>
>> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
>> NR_FREE_PAGES are right.  When we come to step4, all pages are not in
>> buddy system, so we don't change NR_FREE_PAGES in this step, but we change
>> NR_FREE_PAGES in step3.  So NR_FREE_PAGES is wrong after offlining pages.
>> So there is no need to change NR_FREE_PAGES in step3.
>>
>> This patch also fixes a problem in step2: if the migratetype is
>> MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages from
>> pcppages.
>>
>> Signed-off-by: Wen Congyang 
>> Cc: David Rientjes 
>> Cc: Jiang Liu 
>> Cc: Len Brown 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Christoph Lameter 
>> Cc: Minchan Kim 
>> Cc: KOSAKI Motohiro 
>> Cc: Yasuaki Ishimatsu 
>> Cc: Dave Hansen 
>> Cc: Mel Gorman 
>> Signed-off-by: Andrew Morton 
>> ---
>>  mm/page_alloc.c | 10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 5b74de6..a7cd2d1 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int 
>> count,
>>  /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>>  __free_one_page(page, zone, 0, mt);
>>  trace_mm_page_pcpu_drain(page, 0, mt);
>> -if (is_migrate_cma(mt))
>> -__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 
>> 1);
>> +if (likely(mt != MIGRATE_ISOLATE)) {
> 
> Hi Congyang,
>   I think mt != MIGRATE_ISOLATE is always true here,
> page from PCP's migratetype < MIGRATE_PCPTYPES.
> When isolate page, we change pageblock's migratetype to MIGRATE_ISOLATE,
> but set_freepage_migratetype() isn't called.
> Maybe we can use mt = get_pageblock_migratetype() here ?

Yes, you are right. I have sent a fix patch.

Thanks for pointing it out.

Wen Congyang

> 
> Thanks,
> Jianguo Wu.
> 
>> +__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
>> +if (is_migrate_cma(mt))
>> +__mod_zone_page_state(zone, 
>> NR_FREE_CMA_PAGES, 1);
>> +}
>>  } while (--to_free && --batch_free && !list_empty(list));
>>  }
>> -__mod_zone_page_state(zone, NR_FREE_PAGES, count);
>>  spin_unlock(&zone->lock);
>>  }
>>  
>> @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, 
>> unsigned long end_pfn)
>>  list_del(&page->lru);
>>  rmv_page_order(page);
>>  zone->free_area[order].nr_free--;
>> -__mod_zone_page_state(zone, NR_FREE_PAGES,
>> -  - (1UL << order));
>>  for (i = 0; i < (1 << order); i++)
>>  SetPageReserved((page+i));
>>  pfn += (1 << order);
>>
> 
> 


[PATCH] memory-hotplug: fix NR_FREE_PAGES mismatch's fix

2012-10-31 Thread Wen Congyang

When a page is freed and put into pcp list, get_freepage_migratetype()
doesn't return MIGRATE_ISOLATE even if this pageblock is isolated.
So we should use get_pageblock_migratetype() instead of mt to check
whether it is isolated.

Cc: David Rientjes 
Cc: Jiang Liu 
Cc: Len Brown 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Christoph Lameter 
Cc: Minchan Kim 
Cc: KOSAKI Motohiro 
Cc: Yasuaki Ishimatsu 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Jianguo Wu 
Signed-off-by: Wen Congyang 

---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 027afd0..e9c19d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
-   if (likely(mt != MIGRATE_ISOLATE)) {
+   if (likely(get_pageblock_migratetype(page) != MIGRATE_ISOLATE)) {
__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
if (is_migrate_cma(mt))
__mod_zone_page_state(zone, 
NR_FREE_CMA_PAGES, 1);
-- 
1.8.0
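
The distinction this fix depends on can be illustrated with a small userspace sketch. This is not kernel code: the `sketch_*` names and the two-value enum are hypothetical stand-ins, and the only assumption carried over from the changelog is that the migratetype cached on a page in the pcp list can go stale when its pageblock is isolated afterwards.

```c
/* Hypothetical, simplified stand-ins for the kernel's migrate types. */
enum sketch_migratetype { SK_MOVABLE, SK_ISOLATE };

struct sketch_page {
	/* Type cached when the page was put on the pcp list. */
	enum sketch_migratetype freepage_mt;
	/* Current type of the pageblock that owns the page. */
	const enum sketch_migratetype *pageblock_mt;
};

/*
 * Amount to add to the free-page counter when the page is drained
 * from the pcp list, decided from the pageblock's current type.
 */
long sketch_free_pages_delta(const struct sketch_page *page)
{
	return (*page->pageblock_mt != SK_ISOLATE) ? 1 : 0;
}

/*
 * The same decision made from the cached type. It still counts the
 * page when the pageblock was isolated after the page was cached.
 */
long sketch_stale_delta(const struct sketch_page *page)
{
	return (page->freepage_mt != SK_ISOLATE) ? 1 : 0;
}
```

With a page cached as SK_MOVABLE whose pageblock is later switched to SK_ISOLATE, sketch_free_pages_delta() yields 0 while sketch_stale_delta() still yields 1, which is the kind of mismatch described above.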


Re: How about a gpio_get(device , char ) function?

2012-10-31 Thread Alex Courbot

On Wednesday 31 October 2012 23:25:41 Stephen Warren wrote:
> On 10/31/2012 03:04 AM, Alex Courbot wrote:
> > Hi,
> > 
> > Would anyone be opposed to having a gpio_get() function that works
> > similarly to e.g. regulator_get() and clk_get()?
> 
> One major stumbling block is that with device tree, each individual
> binding gets to decide on the specific naming of the propert{y,ies} that
> define the GPIO(s) for the device, and so there's no way to provide a
> generic implementation of that function.

The idea is not to make every GPIO declared so far accessible through this
function (as you point out, this would be tricky at best) but to also define
how they should be properly declared (similarly to regulators and pals) for
bindings that need to use gpio_get(). Existing bindings and drivers that can
live without it should not be modified. And now that you mention it, the end of
the GPIO declaration anarchy would be another point in favor of this feature.

Now I am aware that almost every subsystem comes with its own scheme for
declaring resources in the DT and there will be a long fight to decide which
one should apply here, but I'm willing to take that road.

> Related, I've always wished that DT nodes looked like:
> 
> device {
> reg = <...>;
> compatible = <...>;
> resources {
> pwms = <...>;
> regulators = <...>;
> clocks = <...>;
> gpios = <...>;
> other-devices = <...>; /* for custom API dependencies */
> };
> config {
> /* device-specific properties */
> };
> child-busses {
> 0 = { ... };
> 1 = { ... };
> };
> };
> 
> ... specifically so that all resource allocation, and perhaps even child
> bus enumeration, could be completely standardized in the DT/device core.
> This could also feed into deferred probe, which could then be purely
> implemented inside the DT/driver core. However, that'd require something
> incompatible like "device tree 2.0"

I think this would be awesome. This could even probably be implemented without 
breaking things, if everything takes place inside well-defined subnodes of the 
device.

Alex.
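
For illustration only, a name-based lookup in the spirit of regulator_get()/clk_get() could be sketched in userspace C as below. Nothing here is an existing kernel API: `sketch_gpio_get()`, the table layout, the device names and the GPIO numbers are all made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from (device name, consumer id) to a GPIO number. */
struct sketch_gpio_lookup {
	const char *dev_name;
	const char *con_id;
	int gpio;
};

/* Made-up board data; a real implementation would get this from DT or pdata. */
static const struct sketch_gpio_lookup sketch_gpio_table[] = {
	{ "backlight.0", "enable", 42 },
	{ "backlight.0", "reset",  43 },
	{ "panel.0",     "enable", 17 },
};

/* Return the GPIO number for (dev_name, con_id), or -1 if none is declared. */
int sketch_gpio_get(const char *dev_name, const char *con_id)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_gpio_table) / sizeof(sketch_gpio_table[0]); i++) {
		const struct sketch_gpio_lookup *e = &sketch_gpio_table[i];

		if (strcmp(e->dev_name, dev_name) == 0 &&
		    strcmp(e->con_id, con_id) == 0)
			return e->gpio;
	}
	return -1;
}
```

A driver-side call would then look like sketch_gpio_get("backlight.0", "reset"), with the consumer never needing to know how the GPIO was declared.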


Re: [PATCH v3 0/3] zram/zsmalloc promotion

2012-10-31 Thread Minchan Kim

On Wed, Oct 31, 2012 at 09:19:00AM -0700, Greg Kroah-Hartman wrote:
> On Wed, Oct 31, 2012 at 04:02:02PM +0900, Minchan Kim wrote:
> > On Tue, Oct 30, 2012 at 07:43:07PM -0700, Greg Kroah-Hartman wrote:
> > > On Wed, Oct 31, 2012 at 11:39:48AM +0900, Minchan Kim wrote:
> > > > Greg, what do you think about LTSI?
> > > > Is it proper feature to add it? For it, still do I need ACK from mm 
> > > > developers?
> > > 
> > > It's already in LTSI, as it's in the 3.4 kernel, right?
> > 
> > Right. But as I look, it seems to be based on 3.4.11 which doesn't have
> > recent bug fix and enhances and current 3.4.16 also doesn't include it.
> 
> You can ask for those bugfixes to get backported to the stable/longterm
> kernel tree, see Documentation/stable_kernel_rules.txt for how to do
> this properly.
> 
> > Just out of curiosity.
> > 
> > Is there any rule about update period in long-term kernel?
> > I mean how often you release long-term kernel.
> 
> About once a week lately.
> 
> > Is there any rule about update period in LTSI kernel based on long-term 
> > kernel?
> 
> No, the LTSI kernel work has been slow due to the lack of time on my
> part lately.
> 
> > If I get the answer on above two quesion, I can expect later what LTSI 
> > kernel
> > version include feature I need.
> > 
> > Another question.
> > For example, There is A feature in mainline and A has no problem but
> > someone invents new wheel "B" which is better than A so it replace A totally
> > in recent mainline. As following stable-kernel rule, it's not a real bug fix
> > so I guess stable kernel will never replace A with B.
> 
> That is correct.
> 
> > It means LTSI never get a chance to use new wheel. Right?
> 
> No, you can submit the same patches for the LTSI kernel as well, they
> will probably be accepted as the rules are much more "loose" for the
> LTSI tree compared to the normal stable/longterm kernel rules.  Which is
> the primary reason it is around.
> 
> Hope this helps,
> 
> greg k-h

Thanks, Greg!

-- 
Kind regards,
Minchan Kim

[PATCH 3/3] staging/comedi: Use dev_ printks in drivers/ni_mio_cs.c

2012-10-31 Thread YAMANE Toshiaki

Fix the checkpatch warning below.
- WARNING: printk() should include KERN_ facility level

Signed-off-by: YAMANE Toshiaki 
---
 drivers/staging/comedi/drivers/ni_mio_cs.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/comedi/drivers/ni_mio_cs.c b/drivers/staging/comedi/drivers/ni_mio_cs.c
index f844684..76c6a13 100644
--- a/drivers/staging/comedi/drivers/ni_mio_cs.c
+++ b/drivers/staging/comedi/drivers/ni_mio_cs.c
@@ -340,13 +340,15 @@ static int mio_cs_attach(struct comedi_device *dev, struct comedi_devconfig *it)
 
irq = link->irq;
 
-   printk("comedi%d: %s: DAQCard: io 0x%04lx, irq %u, ",
-  dev->minor, dev->driver->driver_name, dev->iobase, irq);
+   dev->board_ptr = ni_boards + ni_getboardtype(dev, link);
 
 #if 0
{
int i;
 
+   printk("comedi%d: %s: DAQCard: io 0x%04lx, irq %u, ",
+  dev->minor, dev->driver->driver_name, dev->iobase, irq);
+
printk(" board fingerprint:");
for (i = 0; i < 32; i += 2) {
printk(" %04x %02x", inw(dev->iobase + i),
@@ -357,18 +359,17 @@ static int mio_cs_attach(struct comedi_device *dev, struct comedi_devconfig *it)
for (i = 0; i < 10; i++)
printk(" 0x%04x", win_in(i));
printk("\n");
+
+   printk("boardtype.name: %s\n", boardtype.name);
}
 #endif
 
-   dev->board_ptr = ni_boards + ni_getboardtype(dev, link);
-
-   printk(" %s", boardtype.name);
dev->board_name = boardtype.name;
 
ret = request_irq(irq, ni_E_interrupt, NI_E_IRQ_FLAGS,
  "ni_mio_cs", dev);
if (ret < 0) {
-   printk(" irq not available\n");
+   dev_err(dev->class_dev, "irq not available\n");
return -EINVAL;
}
dev->irq = irq;
@@ -401,7 +402,8 @@ static int ni_getboardtype(struct comedi_device *dev,
return i;
}
 
-   printk("unknown board 0x%04x -- pretend it is a ", link->card_id);
+   dev_err(dev->class_dev,
+   "unknown board 0x%04x -- pretend it is a ", link->card_id);
 
return 0;
 }
-- 
1.7.9.5


[PATCH 2/3] staging/comedi: fix the initialize statics issue in drivers/ni_mio_cs.c

2012-10-31 Thread YAMANE Toshiaki

Fix the checkpatch error below.
- ERROR: do not initialise statics to 0 or NULL

Signed-off-by: YAMANE Toshiaki 
---
 drivers/staging/comedi/drivers/ni_mio_cs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/ni_mio_cs.c b/drivers/staging/comedi/drivers/ni_mio_cs.c
index 69a42c5..f844684 100644
--- a/drivers/staging/comedi/drivers/ni_mio_cs.c
+++ b/drivers/staging/comedi/drivers/ni_mio_cs.c
@@ -251,7 +251,7 @@ static void mio_cs_config(struct pcmcia_device *link);
 static void cs_release(struct pcmcia_device *link);
 static void cs_detach(struct pcmcia_device *);
 
-static struct pcmcia_device *cur_dev = NULL;
+static struct pcmcia_device *cur_dev;
 
 static int cs_attach(struct pcmcia_device *link)
 {
-- 
1.7.9.5


[PATCH 1/3] staging/comedi: fix the spaces issue at the start of line in drivers/ni_mio_cs.c

2012-10-31 Thread YAMANE Toshiaki

Fix the checkpatch warning below.
- WARNING: please, no spaces at the start of a line

Signed-off-by: YAMANE Toshiaki 
---
 drivers/staging/comedi/drivers/ni_mio_cs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/ni_mio_cs.c b/drivers/staging/comedi/drivers/ni_mio_cs.c
index b5b43e4..69a42c5 100644
--- a/drivers/staging/comedi/drivers/ni_mio_cs.c
+++ b/drivers/staging/comedi/drivers/ni_mio_cs.c
@@ -175,7 +175,7 @@ struct ni_private {
 
struct pcmcia_device *link;
 
- NI_PRIVATE_COMMON};
+NI_PRIVATE_COMMON};
 
 /* How we access registers */
 
-- 
1.7.9.5


Re: [PATCH] thermal: solve compilation errors in rcar_thermal

2012-10-31 Thread Fengguang Wu

On Tue, Oct 30, 2012 at 08:21:09PM -0700, Kuninori Morimoto wrote:
> 
> Hi Zhang, Andrew
> 
> This patch is needed on latest linus/master branch.
> Please re-check this patch.

Rui, it'd be better to send Andrew a finalized patch with your
Acked-by or Signed-off-by (because you passed it on), after resolving
the below puzzle:

> And, similar patch was added on linux-next/master branch
> b5da4e6d5603633835a1da267e0e699eea66f317
> (Thermal: Pass zone parameters as argument to tzd_register)
> but it seems wrong (?)

Thanks,
Fengguang

> At Tue, 21 Aug 2012 22:01:36 +0530,
> Devendra Naga wrote:
> > 
> > following were the errors reported
> > 
> > drivers/thermal/rcar_thermal.c: In function ‘rcar_thermal_probe’:
> > drivers/thermal/rcar_thermal.c:214:10: warning: passing argument 3 of 
> > ‘thermal_zone_device_register’ makes integer from pointer without a cast 
> > [enabled by default]
> > include/linux/thermal.h:166:29: note: expected ‘int’ but argument is of 
> > type ‘struct rcar_thermal_priv *’
> > drivers/thermal/rcar_thermal.c:214:10: error: too few arguments to function 
> > ‘thermal_zone_device_register’
> > include/linux/thermal.h:166:29: note: declared here
> > make[1]: *** [drivers/thermal/rcar_thermal.o] Error 1
> > make: *** [drivers/thermal/rcar_thermal.o] Error 2
> > 
> > with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> > 
> > Signed-off-by: Devendra Naga 
> > ---
> >  drivers/thermal/rcar_thermal.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/rcar_thermal.c b/drivers/thermal/rcar_thermal.c
> > index d445271..f7a1b57 100644
> > --- a/drivers/thermal/rcar_thermal.c
> > +++ b/drivers/thermal/rcar_thermal.c
> > @@ -210,7 +210,7 @@ static int rcar_thermal_probe(struct platform_device 
> > *pdev)
> > goto error_free_priv;
> > }
> >  
> > -   zone = thermal_zone_device_register("rcar_thermal", 0, priv,
> > +   zone = thermal_zone_device_register("rcar_thermal", 0, 0, priv,
> > &rcar_thermal_zone_ops, 0, 0);
> > if (IS_ERR(zone)) {
> > dev_err(&pdev->dev, "thermal zone device is NULL\n");
> > -- 
> > 1.7.9.5
> > 
> 
> 
> Best regards
> ---
> Kuninori Morimoto

RE: [PATCH] thermal: solve compilation errors in rcar_thermal

2012-10-31 Thread Zhang, Rui

Hi, Andrew,

Can you take this patch?
It fixes a real build error, and IMO, we should merge it ASAP. Thanks.

> -Original Message-
> From: kuninori morimoto [mailto:kuninori.morimoto...@gmail.com] On
> Behalf Of Kuninori Morimoto
> Sent: Wednesday, October 31, 2012 4:46 PM
> To: Zhang, Rui
> Cc: Andrew Morton; linux-kernel@vger.kernel.org; Wu, Fengguang;
> Devendra Naga
> Subject: [PATCH] thermal: solve compilation errors in rcar_thermal
> Importance: High
> 
> From: Devendra Naga 
> 
> following were the errors reported
> 
> drivers/thermal/rcar_thermal.c: In function 'rcar_thermal_probe':
> drivers/thermal/rcar_thermal.c:214:10: warning: passing argument 3 of
> 'thermal_zone_device_register' makes integer from pointer without a
> cast [enabled by default]
> include/linux/thermal.h:166:29: note: expected 'int' but argument is of
> type 'struct rcar_thermal_priv *'
> drivers/thermal/rcar_thermal.c:214:10: error: too few arguments to
> function 'thermal_zone_device_register'
> include/linux/thermal.h:166:29: note: declared here
> make[1]: *** [drivers/thermal/rcar_thermal.o] Error 1
> make: *** [drivers/thermal/rcar_thermal.o] Error 2
> 
> with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> 
> Signed-off-by: Devendra Naga 


Acked-by: Zhang Rui 

Thanks,
rui
> ---
> Hi Zhang
> 
> This is original patch.
> Please check Author's name after "git am"
> 
>  drivers/thermal/rcar_thermal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/rcar_thermal.c
> b/drivers/thermal/rcar_thermal.c index d445271..f7a1b57 100644
> --- a/drivers/thermal/rcar_thermal.c
> +++ b/drivers/thermal/rcar_thermal.c
> @@ -210,7 +210,7 @@ static int rcar_thermal_probe(struct
> platform_device *pdev)
>   goto error_free_priv;
>   }
> 
> - zone = thermal_zone_device_register("rcar_thermal", 0, priv,
> + zone = thermal_zone_device_register("rcar_thermal", 0, 0, priv,
>   &rcar_thermal_zone_ops, 0, 0);
>   if (IS_ERR(zone)) {
>   dev_err(&pdev->dev, "thermal zone device is NULL\n");
> --
> 1.7.9.5


RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-31 Thread Zhang, Jun

Hello, Anvin

I want to explain why I made the change in this place. kexec passes three
parameters: memmap=exactmap memmap=544K@64K memmap=64964K@32768K
I think my patch changes the least code.
Actually, there are several choices for fixing it:
1)  my patch.
2)  modify kexec to pass only two parameters, memmap=544K@64K
memmap=64964K@32768K; then in the kernel's setup_memory_map we can remove
the RAM ranges.
3)  add an extra option, like memmap=REMOVERAM

Which one do you prefer? If you have a better solution, please share it.
Thanks!

Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com
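
As a rough illustration of the size@start form used in the memmap= parameters above, here is a userspace sketch of how such an argument could be split into a size and a start address. It is not the kernel's parser: `sketch_memparse()` and `sketch_parse_memmap()` are hypothetical helpers, and only the K/M/G suffixes are handled.

```c
#include <stdlib.h>

/*
 * Parse a number with an optional K/M/G suffix.
 * *end is set to the first character that was not consumed.
 */
unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G':
	case 'g':
		v <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		v <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "size@start", e.g. "544K@64K".
 * Returns 0 on success, -1 if the argument does not have that form.
 */
int sketch_parse_memmap(const char *arg, unsigned long long *size,
			unsigned long long *start)
{
	char *p;
	const char *q;

	*size = sketch_memparse(arg, &p);
	if (p == arg || *p != '@')
		return -1;

	q = p + 1;
	*start = sketch_memparse(q, &p);
	if (p == q || *p != '\0')
		return -1;

	return 0;
}
```

Under this sketch, "544K@64K" describes 544 KiB of RAM starting at 64 KiB, while "exactmap" is rejected because it is not a size@start pair and would need separate handling.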

-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 1:39 PM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 10:22 PM, Zhang, Jun wrote:
> Hello, Anvin
>You are right. Thanks!
>
> Hello, All
>Please review it again. Thanks!
>
>  From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 
> 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
>   memory type address information in order to do I/O. so only
>   remove all RAM ranges which need to be dumped.
>
> Signed-off-by: jzha144 
> ---
>   arch/x86/kernel/e820.c |9 +
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 
> df06ade..77be839 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
>* reset.
>*/
>   saved_max_pfn = e820_end_of_ram_pfn();
> +
> + /*
> +  * We are doing a crash dump, so remove all RAM ranges
> +  * as they are the ones that need to be dumped.
> +  * We still need all non-RAM information in order to do I/O.
> +  */
> + e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
> + userdef = 1;
> + return 0;
>   #endif
>   e820.nr_map = 0;
>   userdef = 1;
>

The code is still wrong...

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH] arm-dt: Enable DT proc updates.

2012-10-31 Thread Rob Herring

On 10/31/2012 10:57 AM, Pantelis Antoniou wrote:
> This simple patch enables dynamic changes of the DT tree on runtime
> to be visible to the device-tree proc interface.
> 
> Signed-off-by: Pantelis Antoniou 

Acked-by: Rob Herring 

> ---
>  arch/arm/include/asm/prom.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/include/asm/prom.h b/arch/arm/include/asm/prom.h
> index aeae9c6..6d65ba2 100644
> --- a/arch/arm/include/asm/prom.h
> +++ b/arch/arm/include/asm/prom.h
> @@ -11,6 +11,8 @@
>  #ifndef __ASMARM_PROM_H
>  #define __ASMARM_PROM_H
>  
> +#define HAVE_ARCH_DEVTREE_FIXUPS
> +
>  #ifdef CONFIG_OF
>  
>  extern struct machine_desc *setup_machine_fdt(unsigned int dt_phys);
> 


Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread KOSAKI Motohiro

>> - making zero page daemon and avoid pagesize zero fill at page fault
>> - making new vma or page flags and mark as discardable w/o swap and
>>   vmscan treat it. (like this and/or MADV_FREE)
>
> Thanks for the information.
> I realized by you I'm not first people to think of this idea.
> Rik already tried it(https://lkml.org/lkml/2007/4/17/53) by new page flag
> and even other OSes already have such good feature. And John's concept was
> already tried long time ago (https://lkml.org/lkml/2005/11/1/384)
>
> Hmm, I look over Rik's thread but couldn't find why it wasn't merged
> at that time. Anyone know it?

Dunno. And I like the volatile feature better than the old one. But a big
caveat: please don't trust me 100%; I haven't reviewed the code of your patch
in detail and I don't fully understand it.

Re: [PATCH v3 0/3] ACPI: container hot remove support.

2012-10-31 Thread Tang Chen


Hi Yinghai,

What do you think of the 1st patch? Is the idea OK with you?

And about the memory hotplug thing: as far as I know, we are trying to
limit kernel memory to some nodes, and to support hot-removing only the
nodes without kernel memory. This functionality is called
online_movable. Some of the patches are already in the next tree, and most
of the patches are under review. :)

Thanks. :)

On 11/01/2012 12:48 AM, Yinghai Lu wrote:
> On Wed, Oct 31, 2012 at 4:09 AM, Yasuaki Ishimatsu wrote:
>>> patch 2. Introduce a new function container_device_remove() to handle
>>>   ACPI_NOTIFY_EJECT_REQUEST event for container.
>>
>> If container device contains memory device, the function is
>> very danger. As you know, we are developing a memory hotplug.
>> If memory has kernel memory, memory hot remove operations fails.
>> But container_device_remove() cannot realize it. So even if
>> the memory hot remove operation fails, container_device_remove()
>> keeps hot remove operation. Finally, the function sends _EJ0
>> to firmware. In this case, if the memory is accessed, kernel
>> panic occurs.
>> The example is as follows:
>>
>>   https://lkml.org/lkml/2012/9/26/318
>
> so what is the overall status memory hot-remove?
> how are following memory get processed ?
> 1. memory for kernel text, module
> 2. page table
> 3. vmemmap
> 4. memory for kmalloc, for dma



Re: [PATCH 2/2] kvm, svm: Update MAINTAINERS entry

2012-10-31 Thread Marcelo Tosatti

On Mon, Oct 29, 2012 at 07:08:21PM +0100, Joerg Roedel wrote:
> I have no access to my AMD email address anymore. Update
> entry in MAINTAINERS to the new address.
> 
> Cc: Avi Kivity 
> Cc: Marcelo Tosatti 
> Signed-off-by: Joerg Roedel 
> ---
>  MAINTAINERS |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.

Re: [PATCH v3 0/3] ACPI: container hot remove support.

2012-10-31 Thread Tang Chen

On 10/31/2012 07:09 PM, Yasuaki Ishimatsu wrote:
> Hi Tang,
> 
> If container device contains memory device, the function is
> very danger. As you know, we are developing a memory hotplug.
> If memory has kernel memory, memory hot remove operations fails.
> But container_device_remove() cannot realize it. So even if
> the memory hot remove operation fails, container_device_remove()
> keeps hot remove operation. Finally, the function sends _EJ0
> to firmware. In this case, if the memory is accessed, kernel
> panic occurs.
> The example is as follows:
> 
>   https://lkml.org/lkml/2012/9/26/318
> 
Hi Ishimatsu,

I see, thanks for the info. So we need some kind of rollback.

Is anyone working on this now?
If yes, would you please give me some links to refer to? In that case I
think I should push these patches later.
If not, I think I can try to do it.

Thanks. :)

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Minchan Kim

On Wed, Oct 31, 2012 at 06:15:33PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 3:56 PM, KOSAKI Motohiro
>  wrote:
> >>> > Allocator should call madvise(MADV_NOVOLATILE) before reusing for
> >>> > allocating that area to user. Otherwise, accessing of volatile range
> >>> > will meet SIGBUS error.
> >>>
> >>> Well, why?  It would be easy enough for the fault handler to give
> >>> userspace a new, zeroed page at that address.
> >>
> >> Note: MADV_DONTNEED already has this (nice) property.
> >
> > I don't think I strictly understand this patch. but maybe I can answer why
> > userland and malloc folks don't like MADV_DONTNEED.
> >
> > glibc malloc discard freed memory by using MADV_DONTNEED
> > as tcmalloc. and it is often a source of large performance decrease.
> > because of MADV_DONTNEED discard memory immediately and
> > right after malloc() call fall into page fault and pagesize memset() path.
> > then, using DONTNEED increased zero fill and cache miss rate.
> >
> > At called free() time, malloc don't have a knowledge when next big malloc()
> > is called. then, immediate discarding may or may not get good performance
> > gain. (Ah, ok, the rate is not 5:5. then usually it is worth. but not 
> > everytime)
> >
> 
> Ah; In tcmalloc allocations (and their associated free-lists) are
> binned into separate lists as a function of object-size which helps to
> mitigate this.
> 
> I'd make a separate more general argument here:
> If I'm allocating a large (multi-kilobyte object) the cost of what I'm
> about to do with that object is likely fairly large -- The fault/zero
> cost a probably fairly small proportional cost, which limits the
> optimization value.

While looking through the thread of Rik's earlier attempt, which has the same
goal but a different implementation, I found these numbers:

https://lkml.org/lkml/2007/4/20/390

I believe the optimization is valuable. Of course, I need to do similar
testing to prove it.

> 
> >
> > In past, several developers tryied to avoid such situation, likes
> >
> > - making zero page daemon and avoid pagesize zero fill at page fault
> > - making new vma or page flags and mark as discardable w/o swap and
> >   vmscan treat it. (like this and/or MADV_FREE)
> > - making new process option and avoid page zero fill from page fault path.
> >   (yes, it is big incompatibility and insecure. but some embedded folks 
> > thought
> >they are acceptable downside)
> > - etc
> >
> >
> > btw, I'm not sure this patch is better for malloc because current 
> > MADV_DONTNEED
> > don't need mmap_sem and works very effectively when a lot of threads case.
> > taking mmap_sem might bring worse performance than DONTNEED. dunno.
> 
> MADV_VOLATILE also seems to end up looking quite similar to a
> user-visible (range-based) cleancache.
> 
> A second popular use-case for such semantics is the case of
> discardable cache elements (e.g. web browser).  I suspect we'd want to
> at least mention these in the changelog.  (Alternatively, what does a
> cleancache-backed-fs exposing these semantics look like?)
> 

It's an idea from John Stultz (http://lwn.net/Articles/518130/; there was another
attempt a long time ago, https://lkml.org/lkml/2005/11/1/384), and I want to
extend the concept from file-backed pages to anonymous pages, so this patch
is an attempt at the anonymous-page side. The use case of my patch is therefore
focused on the malloc/free case.
I hope both can be unified.


-- 
Kind regards,
Minchan Kim

Re: [PATCH] ARM: Fix the "WFI" instruction opcode definition.

2012-10-31 Thread Rob Herring

On 10/31/2012 08:24 PM, Yangfei (Felix) wrote:
> The current "WFI" opcode definition causes the CPU hot-plug feature to fail
> if the kernel is built with CONFIG_THUMB2_KERNEL/CONFIG_CPU_ENDIAN_BE8
> defined. An invalid instruction exception will be generated.
> 
> Signed-off-by: yangfei.ker...@gmail.com
> ---
>  arch/arm/mach-exynos/hotplug.c   |8 +++-
>  arch/arm/mach-realview/hotplug.c |8 +++-
>  arch/arm/mach-shmobile/hotplug.c |8 +++-
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/mach-exynos/hotplug.c b/arch/arm/mach-exynos/hotplug.c
> index f4d7dd2..823a0e4 100644
> --- a/arch/arm/mach-exynos/hotplug.c
> +++ b/arch/arm/mach-exynos/hotplug.c
> @@ -18,11 +18,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
>  #include "common.h"
>  
> +/*
> + * Define opcode of the WFI instruction.
> + */
> +#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
> +
>  static inline void cpu_enter_lowpower(void)
>  {
>   unsigned int v;
> @@ -72,7 +78,7 @@ static inline void platform_do_lowpower(unsigned int cpu, 
> int *spurious)
>   /*
>* here's the WFI
>*/
> - asm(".word  0xe320f003\n"
> + asm(__WFI

Wouldn't using the actual wfi instruction fix this? There is a wfi() macro.

Or just call cpu_do_idle(), which will do any other things needed before
wfi, like a dsb instruction.
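
For context, a rough userspace sketch of why a raw .word can go wrong on BE8.
It assumes (my understanding, not something stated in the patch) that a BE8
kernel keeps instructions little-endian in memory while data directives such
as .word are emitted big-endian. The helper names are made up for
illustration. The Thumb-2 case is a separate problem: 0xe320f003 is an ARM
encoding, and the patch supplies 0xbf30 as the 16-bit Thumb one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte layout of a 32-bit value stored little-endian. */
static void store_le32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Byte layout of a 32-bit value stored big-endian. */
static void store_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)((v >> 24) & 0xff);
    out[1] = (uint8_t)((v >> 16) & 0xff);
    out[2] = (uint8_t)((v >> 8) & 0xff);
    out[3] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: returns 1 if emitting `opcode` as a data word
 * produces the bytes an instruction fetch expects (assumed to be
 * little-endian here), 0 otherwise.
 */
static int word_matches_insn(uint32_t opcode, int data_big_endian)
{
    uint8_t as_data[4], as_insn[4];

    assert(data_big_endian == 0 || data_big_endian == 1);
    if (data_big_endian)
        store_be32(opcode, as_data);
    else
        store_le32(opcode, as_data);
    store_le32(opcode, as_insn);
    return memcmp(as_data, as_insn, 4) == 0;
}
```

Under that assumption, word_matches_insn(0xe320f003, 1) is 0, i.e. the bytes
land in the wrong order, which is the kind of mismatch a dedicated opcode
helper (or simply the wfi mnemonic) avoids.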

Rob
>   :
>   :
>   : "memory", "cc");
> diff --git a/arch/arm/mach-realview/hotplug.c 
> b/arch/arm/mach-realview/hotplug.c
> index 53818e5..5271a1a 100644
> --- a/arch/arm/mach-realview/hotplug.c
> +++ b/arch/arm/mach-realview/hotplug.c
> @@ -15,6 +15,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +/*
> + * Define opcode of the WFI instruction.
> + */
> +#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
>  
>  static inline void cpu_enter_lowpower(void)
>  {
> @@ -64,7 +70,7 @@ static inline void platform_do_lowpower(unsigned int cpu, 
> int *spurious)
>   /*
>* here's the WFI
>*/
> - asm(".word  0xe320f003\n"
> + asm(__WFI
>   :
>   :
>   : "memory", "cc");
> diff --git a/arch/arm/mach-shmobile/hotplug.c 
> b/arch/arm/mach-shmobile/hotplug.c
> index b09a0bd..0d7b7d1 100644
> --- a/arch/arm/mach-shmobile/hotplug.c
> +++ b/arch/arm/mach-shmobile/hotplug.c
> @@ -20,6 +20,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +/*
> + * Define opcode of the WFI instruction.
> + */
> +#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
>  
>  static cpumask_t dead_cpus;
>  
> @@ -39,7 +45,7 @@ void shmobile_cpu_die(unsigned int cpu)
>   /*
>* here's the WFI
>*/
> - asm(".word  0xe320f003\n"
> + asm(__WFI
>   :
>   :
>   : "memory", "cc");
> --

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Minchan Kim

On Wed, Oct 31, 2012 at 06:22:58PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 5:50 PM, Minchan Kim  wrote:
> > Hello,
> >
> > On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> >> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
> >>  wrote:
> >> >
> >> > On Tue, 30 Oct 2012 10:29:54 +0900
> >> > Minchan Kim  wrote:
> >> >
> >> > > This patch introudces new madvise behavior MADV_VOLATILE and
> >> > > MADV_NOVOLATILE for anonymous pages. It's different with
> >> > > John Stultz's version which considers only tmpfs while this patch
> >> > > considers only anonymous pages so this cannot cover John's one.
> >> > > If below idea is proved as reasonable, I hope we can unify both
> >> > > concepts by madvise/fadvise.
> >> > >
> >> > > Rationale is following as.
> >> > > Many allocators call munmap(2) when user call free(3) if ptr is
> >> > > in mmaped area. But munmap isn't cheap because it have to clean up
> >> > > all pte entries and unlinking a vma so overhead would be increased
> >> > > linearly by mmaped area's size.
> >> >
> >> > Presumably the userspace allocator will internally manage memory in
> >> > large chunks, so the munmap() call frequency will be much lower than
> >> > the free() call frequency.  So the performance gains from this change
> >> > might be very small.
> >>
> >> I don't think I strictly understand the motivation from a
> >> malloc-standpoint here.
> >>
> >> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> >> to perform discards on Linux.For any reasonable allocator (short
> >> of binding malloc --> mmap, free --> unmap) this seems a better
> >> choice.
> >>
> >> Note also from a performance stand-point I doubt any allocator (which
> >> case about performance) is going to want to pay the cost of even a
> >> null syscall about typical malloc/free usage (consider: a tcmalloc
> >
> > Good point.
> >
> >> malloc/free pairis currently <20ns).  Given then that this cost is
> >> amortized once you start doing discards on larger blocks MADV_DONTNEED
> >> seems a preferable interface:
> >> - You don't need to reconstruct an arena when you do want to allocate
> >> since there's no munmap/mmap for the region to change about
> >> - There are no syscalls involved in later reallocating the block.
> >
> > Above benefits are applied on MADV_VOLATILE, too.
> > But as you pointed out, there is a little bit overhead than DONTNEED
> > because allocator should call madvise(MADV_NOVOLATILE) before allocation.
> > For mavise(NOVOLATILE) does just mark vma flag, it does need mmap_sem
> > and could be a problem on parallel malloc/free workload as KOSAKI pointed 
> > out.
> >
> > In such case, we can change semantic so malloc doesn't need to call
> > madivse(NOVOLATILE) before allocating. Then, page fault handler have to
> > check whether this page fault happen by access of volatile vma. If so,
> > it could return zero page instead of SIGBUS and mark the vma isn't volatile
> > any more.
> 
> I think being able to determine whether the backing was discarded
> (about a atomic transition to non-volatile) would be a required
> property to make this useful for non-malloc use-cases.
> 

Absolutely.

-- 
Kind regards,
Minchan Kim

[PATCH] ARM: Fix the "WFI" instruction opcode definition.

2012-10-31 Thread Yangfei (Felix)

The current "WFI" opcode definition causes the CPU hot-plug feature to fail
if the kernel is built with CONFIG_THUMB2_KERNEL/CONFIG_CPU_ENDIAN_BE8
defined. An invalid instruction exception will be generated.

Signed-off-by: yangfei.ker...@gmail.com
---
 arch/arm/mach-exynos/hotplug.c   |8 +++-
 arch/arm/mach-realview/hotplug.c |8 +++-
 arch/arm/mach-shmobile/hotplug.c |8 +++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-exynos/hotplug.c b/arch/arm/mach-exynos/hotplug.c
index f4d7dd2..823a0e4 100644
--- a/arch/arm/mach-exynos/hotplug.c
+++ b/arch/arm/mach-exynos/hotplug.c
@@ -18,11 +18,17 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 #include "common.h"
 
+/*
+ * Define opcode of the WFI instruction.
+ */
+#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
+
 static inline void cpu_enter_lowpower(void)
 {
unsigned int v;
@@ -72,7 +78,7 @@ static inline void platform_do_lowpower(unsigned int cpu, int 
*spurious)
/*
 * here's the WFI
 */
-   asm(".word  0xe320f003\n"
+   asm(__WFI
:
:
: "memory", "cc");
diff --git a/arch/arm/mach-realview/hotplug.c b/arch/arm/mach-realview/hotplug.c
index 53818e5..5271a1a 100644
--- a/arch/arm/mach-realview/hotplug.c
+++ b/arch/arm/mach-realview/hotplug.c
@@ -15,6 +15,12 @@
 #include 
 #include 
 #include 
+#include 
+
+/*
+ * Define opcode of the WFI instruction.
+ */
+#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
 
 static inline void cpu_enter_lowpower(void)
 {
@@ -64,7 +70,7 @@ static inline void platform_do_lowpower(unsigned int cpu, int 
*spurious)
/*
 * here's the WFI
 */
-   asm(".word  0xe320f003\n"
+   asm(__WFI
:
:
: "memory", "cc");
diff --git a/arch/arm/mach-shmobile/hotplug.c b/arch/arm/mach-shmobile/hotplug.c
index b09a0bd..0d7b7d1 100644
--- a/arch/arm/mach-shmobile/hotplug.c
+++ b/arch/arm/mach-shmobile/hotplug.c
@@ -20,6 +20,12 @@
 #include 
 #include 
 #include 
+#include 
+
+/*
+ * Define opcode of the WFI instruction.
+ */
+#define __WFI __inst_arm_thumb16(0xe320f003, 0xbf30)
 
 static cpumask_t dead_cpus;
 
@@ -39,7 +45,7 @@ void shmobile_cpu_die(unsigned int cpu)
/*
 * here's the WFI
 */
-   asm(".word  0xe320f003\n"
+   asm(__WFI
:
:
: "memory", "cc");
--

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Paul Turner

On Wed, Oct 31, 2012 at 5:50 PM, Minchan Kim  wrote:
> Hello,
>
> On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
>> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
>>  wrote:
>> >
>> > On Tue, 30 Oct 2012 10:29:54 +0900
>> > Minchan Kim  wrote:
>> >
>> > > This patch introudces new madvise behavior MADV_VOLATILE and
>> > > MADV_NOVOLATILE for anonymous pages. It's different with
>> > > John Stultz's version which considers only tmpfs while this patch
>> > > considers only anonymous pages so this cannot cover John's one.
>> > > If below idea is proved as reasonable, I hope we can unify both
>> > > concepts by madvise/fadvise.
>> > >
>> > > Rationale is following as.
>> > > Many allocators call munmap(2) when user call free(3) if ptr is
>> > > in mmaped area. But munmap isn't cheap because it have to clean up
>> > > all pte entries and unlinking a vma so overhead would be increased
>> > > linearly by mmaped area's size.
>> >
>> > Presumably the userspace allocator will internally manage memory in
>> > large chunks, so the munmap() call frequency will be much lower than
>> > the free() call frequency.  So the performance gains from this change
>> > might be very small.
>>
>> I don't think I strictly understand the motivation from a
>> malloc-standpoint here.
>>
>> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
>> to perform discards on Linux.For any reasonable allocator (short
>> of binding malloc --> mmap, free --> unmap) this seems a better
>> choice.
>>
>> Note also from a performance stand-point I doubt any allocator (which
>> case about performance) is going to want to pay the cost of even a
>> null syscall about typical malloc/free usage (consider: a tcmalloc
>
> Good point.
>
>> malloc/free pairis currently <20ns).  Given then that this cost is
>> amortized once you start doing discards on larger blocks MADV_DONTNEED
>> seems a preferable interface:
>> - You don't need to reconstruct an arena when you do want to allocate
>> since there's no munmap/mmap for the region to change about
>> - There are no syscalls involved in later reallocating the block.
>
> Above benefits are applied on MADV_VOLATILE, too.
> But as you pointed out, there is a little bit overhead than DONTNEED
> because allocator should call madvise(MADV_NOVOLATILE) before allocation.
> For mavise(NOVOLATILE) does just mark vma flag, it does need mmap_sem
> and could be a problem on parallel malloc/free workload as KOSAKI pointed out.
>
> In such case, we can change semantic so malloc doesn't need to call
> madivse(NOVOLATILE) before allocating. Then, page fault handler have to
> check whether this page fault happen by access of volatile vma. If so,
> it could return zero page instead of SIGBUS and mark the vma isn't volatile
> any more.

I think being able to determine whether the backing was discarded
(around an atomic transition to non-volatile) would be a required
property to make this useful for non-malloc use-cases.
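
As a toy illustration of that property, here is a userspace model. It is not
the interface from the patch: the state names, try_purge() and
mark_nonvolatile() are all hypothetical. The point is only that the
transition back to non-volatile can be a single atomic exchange that also
reports whether the range was purged in the meantime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical states of one volatile range. */
enum range_state { RANGE_NONVOLATILE, RANGE_VOLATILE, RANGE_PURGED };

struct vrange {
    _Atomic int state;
};

static void vrange_init(struct vrange *r)
{
    assert(r != NULL);
    atomic_init(&r->state, RANGE_NONVOLATILE);
}

/* The owner declares the contents disposable. */
static void mark_volatile(struct vrange *r)
{
    assert(r != NULL);
    atomic_store(&r->state, RANGE_VOLATILE);
}

/* Models reclaim: succeeds only if the range is still volatile. */
static int try_purge(struct vrange *r)
{
    int expected = RANGE_VOLATILE;

    assert(r != NULL);
    return atomic_compare_exchange_strong(&r->state, &expected,
                                          RANGE_PURGED);
}

/*
 * One atomic step back to non-volatile. Returns 1 if the contents were
 * discarded while the range was volatile, 0 if they are still intact.
 */
static int mark_nonvolatile(struct vrange *r)
{
    int prev;

    assert(r != NULL);
    prev = atomic_exchange(&r->state, RANGE_NONVOLATILE);
    return prev == RANGE_PURGED;
}
```

A cache-style user would regenerate its data whenever mark_nonvolatile()
returns 1, while a malloc-style user could ignore the return value.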

>
>>
>> The only real additional cost is address-space.  Are you strongly
>> concerned about the 32-bit case?
>
> No. I believe allocators have a logic to clean up them once address space is
> almost full.
>
> Thanks, Paul.
>
> --
> Kind regards,
> Minchan Kim

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Minchan Kim

Hi KOSAKI,

On Wed, Oct 31, 2012 at 06:56:05PM -0400, KOSAKI Motohiro wrote:
> >> > Allocator should call madvise(MADV_NOVOLATILE) before reusing for
> >> > allocating that area to user. Otherwise, accessing of volatile range
> >> > will meet SIGBUS error.
> >>
> >> Well, why?  It would be easy enough for the fault handler to give
> >> userspace a new, zeroed page at that address.
> >
> > Note: MADV_DONTNEED already has this (nice) property.
> 
> I don't think I strictly understand this patch. but maybe I can answer why
> userland and malloc folks don't like MADV_DONTNEED.
> 
> glibc malloc discard freed memory by using MADV_DONTNEED
> as tcmalloc. and it is often a source of large performance decrease.
> because of MADV_DONTNEED discard memory immediately and
> right after malloc() call fall into page fault and pagesize memset() path.
> then, using DONTNEED increased zero fill and cache miss rate.
> 
> At called free() time, malloc don't have a knowledge when next big malloc()
> is called. then, immediate discarding may or may not get good performance
> gain. (Ah, ok, the rate is not 5:5. then usually it is worth. but not 
> everytime)
> 
> 
> In past, several developers tryied to avoid such situation, likes
> 
> - making zero page daemon and avoid pagesize zero fill at page fault
> - making new vma or page flags and mark as discardable w/o swap and
>   vmscan treat it. (like this and/or MADV_FREE)

Thanks for the information.
Thanks to you, I realized I'm not the first person to think of this idea.
Rik already tried it (https://lkml.org/lkml/2007/4/17/53) with a new page flag,
and other OSes already have such a feature. John's concept was also
tried a long time ago (https://lkml.org/lkml/2005/11/1/384).

Hmm, I looked over Rik's thread but couldn't find why it wasn't merged
at that time. Does anyone know?

> - making new process option and avoid page zero fill from page fault path.
>   (yes, it is big incompatibility and insecure. but some embedded folks 
> thought
>they are acceptable downside)
> - etc
> 
> 
> btw, I'm not sure this patch is better for malloc because current 
> MADV_DONTNEED
> don't need mmap_sem and works very effectively when a lot of threads case.
> taking mmap_sem might bring worse performance than DONTNEED. dunno.

It's a good point. 

Quoting my reply to Paul:
"
In such a case, we can change the semantics so malloc doesn't need to call
madvise(NOVOLATILE) before allocating. Then the page fault handler has to
check whether the fault was caused by an access to a volatile vma. If so,
it could return a zero page instead of SIGBUS and mark the vma as no longer
volatile.
"
> 
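
To make the two semantics easier to compare, a toy decision function follows.
It is only a model of the idea quoted above, not kernel code: struct toy_vma,
toy_fault() and the lazy_semantics switch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum fault_result { FAULT_NORMAL, FAULT_SIGBUS, FAULT_ZERO_PAGE };

/* Hypothetical stand-in for a vma carrying a volatile flag. */
struct toy_vma {
    int is_volatile;
};

/*
 * lazy_semantics == 0: touching a volatile range is an error (SIGBUS),
 *                      so the allocator must clear the flag first.
 * lazy_semantics == 1: the fault itself clears the flag and hands back
 *                      a zero page, so no extra call is needed.
 */
static enum fault_result toy_fault(struct toy_vma *vma, int lazy_semantics)
{
    assert(vma != NULL);

    if (!vma->is_volatile)
        return FAULT_NORMAL;
    if (!lazy_semantics)
        return FAULT_SIGBUS;

    vma->is_volatile = 0;
    return FAULT_ZERO_PAGE;
}
```

With the lazy variant the allocator pays nothing up front, but the first
touch after reuse still goes through the fault path.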

-- 
Kind regards,
Minchan Kim

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Paul Turner

On Wed, Oct 31, 2012 at 3:56 PM, KOSAKI Motohiro
 wrote:
>>> > Allocator should call madvise(MADV_NOVOLATILE) before reusing for
>>> > allocating that area to user. Otherwise, accessing of volatile range
>>> > will meet SIGBUS error.
>>>
>>> Well, why?  It would be easy enough for the fault handler to give
>>> userspace a new, zeroed page at that address.
>>
>> Note: MADV_DONTNEED already has this (nice) property.
>
> I don't think I strictly understand this patch. but maybe I can answer why
> userland and malloc folks don't like MADV_DONTNEED.
>
> glibc malloc discard freed memory by using MADV_DONTNEED
> as tcmalloc. and it is often a source of large performance decrease.
> because of MADV_DONTNEED discard memory immediately and
> right after malloc() call fall into page fault and pagesize memset() path.
> then, using DONTNEED increased zero fill and cache miss rate.
>
> At called free() time, malloc don't have a knowledge when next big malloc()
> is called. then, immediate discarding may or may not get good performance
> gain. (Ah, ok, the rate is not 5:5. then usually it is worth. but not 
> everytime)
>

Ah; in tcmalloc, allocations (and their associated free-lists) are
binned into separate lists as a function of object size, which helps to
mitigate this.
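
For readers unfamiliar with the idea, a minimal sketch of size-based binning
follows. This is not tcmalloc's real size-class table; the power-of-two
classes, the 8-byte minimum and the size_class() name are assumptions chosen
to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CLASS_BYTES ((size_t)8)
#define MAX_CLASS_BYTES ((size_t)1 << 30)

/*
 * Map a request size to a bin index: class k holds objects of up to
 * MIN_CLASS_BYTES << k bytes. Each bin would own its own free-list, so
 * small, frequently recycled objects never share pages with large ones.
 */
static unsigned size_class(size_t n)
{
    unsigned cls = 0;
    size_t cap = MIN_CLASS_BYTES;

    assert(n <= MAX_CLASS_BYTES);
    while (cap < n) {
        cap <<= 1;
        cls++;
    }
    return cls;
}
```

Because discards then happen per bin, the large-object bins can be released
to the kernel while the small-object bins stay resident.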

I'd make a separate, more general argument here:
If I'm allocating a large (multi-kilobyte) object, the cost of what I'm
about to do with that object is likely fairly large. The fault/zero
cost is probably a fairly small proportion of that, which limits the
optimization value.
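
The fault/zero behaviour being discussed can be observed from userspace. The
sketch below assumes Linux and a private anonymous mapping; the function name
dontneed_zeroes_page() is made up for the example. It dirties one page,
discards it with MADV_DONTNEED, and checks that the next access sees zeroes.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a page reads back as all zeroes after MADV_DONTNEED,
 * 0 if old data is still visible, -1 on setup failure.
 */
static int dontneed_zeroes_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    size_t len, i;
    int zeroed = 1;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAA, len);            /* dirty the page */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* This read faults in a fresh zero-filled page. */
    for (i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```

The read loop after madvise() is where the page fault and zero fill are paid,
which is the cost being weighed against whatever the caller does next with
the memory.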

>
> In past, several developers tryied to avoid such situation, likes
>
> - making zero page daemon and avoid pagesize zero fill at page fault
> - making new vma or page flags and mark as discardable w/o swap and
>   vmscan treat it. (like this and/or MADV_FREE)
> - making new process option and avoid page zero fill from page fault path.
>   (yes, it is big incompatibility and insecure. but some embedded folks 
> thought
>they are acceptable downside)
> - etc
>
>
> btw, I'm not sure this patch is better for malloc because current 
> MADV_DONTNEED
> don't need mmap_sem and works very effectively when a lot of threads case.
> taking mmap_sem might bring worse performance than DONTNEED. dunno.

MADV_VOLATILE also seems to end up looking quite similar to a
user-visible (range-based) cleancache.

A second popular use-case for such semantics is the case of
discardable cache elements (e.g. web browser).  I suspect we'd want to
at least mention these in the changelog.  (Alternatively, what does a
cleancache-backed-fs exposing these semantics look like?)

Re: [PATCH] [PATCH] x86: Don't clobber top of pt_regs in nested NMI

2012-10-31 Thread Steven Rostedt

On Mon, 2012-10-01 at 17:29 -0700, Salman Qazi wrote:
> The nested NMI modifies the place (instruction, flags and stack)
> that the first NMI will iret to.  However, the copy of registers
> modified is exactly the one that is the part of pt_regs in
> the first NMI.  This can change the behaviour of the first NMI.
> 
> In particular, Google's arch_trigger_all_cpu_backtrace handler
> also prints regions of memory surrounding addresses appearing in
> registers.  This results in handled exceptions, after which nested NMIs
> start coming in.  These nested NMIs change the value of registers
> in pt_regs.  This can cause the original NMI handler to produce
> incorrect output.
> 
> We solve this problem by interchanging the position of the preserved
> copy of the iret registers ("saved") and the copy subject to being
> trampled by nested NMI ("copied").
> 

I was all ready to push this forward, but on my final review I
found some nits that prevent me from doing so.

> Signed-off-by: Salman Qazi 
> ---
>  arch/x86/kernel/entry_64.S |   41 +++--
>  1 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 44531ac..b5d6e43 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1739,9 +1739,10 @@ nested_nmi:
>  
>  1:
>   /* Set up the interrupted NMIs stack to jump to repeat_nmi */
> - leaq -6*8(%rsp), %rdx
> + leaq -1*8(%rsp), %rdx
>   movq %rdx, %rsp
> - CFI_ADJUST_CFA_OFFSET 6*8
> + CFI_ADJUST_CFA_OFFSET 1*8
> + leaq -10*8(%rsp), %rdx
>   pushq_cfi $__KERNEL_DS
>   pushq_cfi %rdx
>   pushfq_cfi
> @@ -1749,8 +1750,8 @@ nested_nmi:
>   pushq_cfi $repeat_nmi
>  
>   /* Put stack back */
> - addq $(11*8), %rsp
> - CFI_ADJUST_CFA_OFFSET -11*8
> + addq $(6*8), %rsp
> + CFI_ADJUST_CFA_OFFSET -6*8
>  
>  nested_nmi_out:
>   popq_cfi %rdx
> @@ -1776,18 +1777,18 @@ first_nmi:
>    * +-------------------------+
>    * | NMI executing variable  |
>    * +-------------------------+
> -  * | Saved SS                |
> -  * | Saved Return RSP        |
> -  * | Saved RFLAGS            |
> -  * | Saved CS                |
> -  * | Saved RIP               |
> -  * +-------------------------+
>    * | copied SS               |
>    * | copied Return RSP       |
>    * | copied RFLAGS           |
>    * | copied CS               |
>    * | copied RIP              |
>    * +-------------------------+
> +  * | Saved SS                |
> +  * | Saved Return RSP        |
> +  * | Saved RFLAGS            |
> +  * | Saved CS                |
> +  * | Saved RIP               |
> +  * +-------------------------+
>    * | pt_regs                 |
>    * +-------------------------+
>    *
> @@ -1803,9 +1804,14 @@ first_nmi:
>   /* Set the NMI executing variable on the stack. */
>   pushq_cfi $1
>  
> + /*
> +  * Leave room for the "copied" frame
> +  */
> + subq $(5*8), %rsp
> +
>   /* Copy the stack frame to the Saved frame */
>   .rept 5
> - pushq_cfi 6*8(%rsp)
> + pushq_cfi 11*8(%rsp)
>   .endr
>   CFI_DEF_CFA_OFFSET SS+8-RIP
>  
> @@ -1826,12 +1832,15 @@ repeat_nmi:
>* is benign for the non-repeat case, where 1 was pushed just above
>* to this very stack slot).
>*/
> - movq $1, 5*8(%rsp)
> + movq $1, 10*8(%rsp)
>  
>   /* Make another copy, this one may be modified by nested NMIs */
> + addq $(10*8), %rsp

This breaks the CFI magic.

>   .rept 5
> - pushq_cfi 4*8(%rsp)
> + pushq_cfi -6*8(%rsp)
>   .endr
> + subq $(5*8), %rsp

So does this.

This needs to be annotated correctly before I can push it out. But the
good news is, I stress tested this change, and it all works out.

Jan, can you help out here?

-- Steve

> +
>   CFI_DEF_CFA_OFFSET SS+8-RIP
>  end_repeat_nmi:
>  
> @@ -1882,8 +1891,12 @@ nmi_swapgs:
>   SWAPGS_UNSAFE_STACK
>  nmi_restore:
>   RESTORE_ALL 8
> +
> + /* Pop the extra iret frame */
> + addq $(5*8), %rsp
> +
>   /* Clear the NMI executing stack variable */
> - movq $0, 10*8(%rsp)
> + movq $0, 5*8(%rsp)
>   jmp irq_return
>   CFI_ENDPROC
>  END(nmi)



Re: [PATCH 1/4] module: add syscall to load module from fd

2012-10-31 Thread Rusty Russell

Kees Cook  writes:
> Rusty,
>
> I haven't seen this land in your modules-next tree. I just wanted to
> make sure it hadn't gotten lost. I'd like to do some kmod tests
> against linux-next, but I've been waiting for this to appear.

Yes, sorting that out now, they should be in tomorrow's linux-next.
And I've sent the ppc patch to linuxppc-dev for Acks.

Thanks for the prod,
Rusty.

Re: [PATCH] ARM: plat-versatile: move FPGA irq driver to drivers/irqchip

2012-10-31 Thread Rob Herring

On 10/31/2012 04:31 PM, Linus Walleij wrote:
> This moves the Versatile FPGA interrupt controller driver, used in
> the Integrator/AP, Integrator/CP and some Versatile boards, out
> of arch/arm/plat-versatile and down to drivers/irqchip where we
> have consensus that such drivers belong. The header file is
> consequently moved to .
> 
> Signed-off-by: Linus Walleij 
> ---
>  arch/arm/Kconfig   |   4 +-
>  arch/arm/mach-integrator/integrator_ap.c   |   3 +-
>  arch/arm/mach-integrator/integrator_cp.c   |   2 +-
>  arch/arm/mach-versatile/core.c |   2 +-
>  arch/arm/plat-versatile/Kconfig|   9 -
>  arch/arm/plat-versatile/Makefile   |   1 -
>  drivers/irqchip/Kconfig|   9 +-
>  drivers/irqchip/Makefile   |   1 +
>  drivers/irqchip/irq-arm-fpga.c | 204 
> +
>  .../irqchip/irq-versatile-fpga.c   |   4 +-
>  .../linux/platform_data/irq-versatile-fpga.h   |   0
>  11 files changed, 220 insertions(+), 19 deletions(-)
>  create mode 100644 drivers/irqchip/irq-arm-fpga.c
>  rename arch/arm/plat-versatile/fpga-irq.c => 
> drivers/irqchip/irq-versatile-fpga.c (97%)
>  rename arch/arm/plat-versatile/include/plat/fpga-irq.h => 
> include/linux/platform_data/irq-versatile-fpga.h (100%)

I think include/linux/irqchip/ is the right place. Ideally we would not
need the header at all. You can remove some of the function declarations
if you base this on Thomas Petazzoni's series to have a common init
function for DT and also move the fpga_handle_irq init into the
fpga_irq_init function.

Rob

> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 73067ef..2205e3e 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -284,8 +284,8 @@ config ARCH_INTEGRATOR
>   select MULTI_IRQ_HANDLER
>   select NEED_MACH_MEMORY_H
>   select PLAT_VERSATILE
> - select PLAT_VERSATILE_FPGA_IRQ
>   select SPARSE_IRQ
> + select VERSATILE_FPGA_IRQ
>   help
> Support for ARM's Integrator platform.
>  
> @@ -318,7 +318,7 @@ config ARCH_VERSATILE
>   select PLAT_VERSATILE
>   select PLAT_VERSATILE_CLCD
>   select PLAT_VERSATILE_CLOCK
> - select PLAT_VERSATILE_FPGA_IRQ
> + select VERSATILE_FPGA_IRQ
>   help
> This enables support for ARM Ltd Versatile board.
>  
> diff --git a/arch/arm/mach-integrator/integrator_ap.c 
> b/arch/arm/mach-integrator/integrator_ap.c
> index 4f13bc5..caa279f 100644
> --- a/arch/arm/mach-integrator/integrator_ap.c
> +++ b/arch/arm/mach-integrator/integrator_ap.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -56,8 +57,6 @@
>  #include 
>  #include 
>  
> -#include 
> -
>  #include "common.h"
>  
>  /* 
> diff --git a/arch/arm/mach-integrator/integrator_cp.c 
> b/arch/arm/mach-integrator/integrator_cp.c
> index 4423bc8..b50fdc7 100644
> --- a/arch/arm/mach-integrator/integrator_cp.c
> +++ b/arch/arm/mach-integrator/integrator_cp.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -46,7 +47,6 @@
>  #include 
>  
>  #include 
> -#include 
>  #include 
>  
>  #include "common.h"
> diff --git a/arch/arm/mach-versatile/core.c b/arch/arm/mach-versatile/core.c
> index 5b5c1ee..46bfb8c 100644
> --- a/arch/arm/mach-versatile/core.c
> +++ b/arch/arm/mach-versatile/core.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -51,7 +52,6 @@
>  #include 
>  
>  #include 
> -#include 
>  #include 
>  
>  #include "core.h"
> diff --git a/arch/arm/plat-versatile/Kconfig b/arch/arm/plat-versatile/Kconfig
> index 2a4ae8a..619f0fa 100644
> --- a/arch/arm/plat-versatile/Kconfig
> +++ b/arch/arm/plat-versatile/Kconfig
> @@ -6,15 +6,6 @@ config PLAT_VERSATILE_CLOCK
>  config PLAT_VERSATILE_CLCD
>   bool
>  
> -config PLAT_VERSATILE_FPGA_IRQ
> - bool
> - select IRQ_DOMAIN
> -
> -config PLAT_VERSATILE_FPGA_IRQ_NR
> -   int
> -   default 4
> -   depends on PLAT_VERSATILE_FPGA_IRQ
> -
>  config PLAT_VERSATILE_LEDS
>   def_bool y if NEW_LEDS
>   depends on ARCH_REALVIEW || ARCH_VERSATILE
> diff --git a/arch/arm/plat-versatile/Makefile 
> b/arch/arm/plat-versatile/Makefile
> index 74cfd94..f88d448 100644
> --- a/arch/arm/plat-versatile/Makefile
> +++ b/arch/arm/plat-versatile/Makefile
> @@ -2,7 +2,6 @@ ccflags-$(CONFIG_ARCH_MULTIPLATFORM) := 
> -I$(srctree)/$(src)/include
>  
>  obj-$(CONFIG_PLAT_VERSATILE_CLOCK) += clock.o
>  obj-$(CONFIG_PLAT_VERSATILE_CLCD) += clcd.o
> -obj-$(CONFIG_PLAT_VERSATILE_FPGA_IRQ) += fpga-irq.o
>  obj-$(CONFIG_PLAT_VERSATILE_LEDS) += leds.o
>  obj-$(CONFIG_PLAT_VERSATILE_SCHED_CLOCK) += sched-clock.o
>  obj-$(CONFIG_SMP) += headsmp.o platsmp.o
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index

Re: [PATCH 14/16] virtio: Convert dev_printk(KERN_ to dev_(

2012-10-31 Thread Rusty Russell

Joe Perches  writes:

> dev_ calls take less code than dev_printk(KERN_
> and reducing object size is good.
> Convert if (printk_ratelimit()) dev_printk to dev__ratelimited.
>
> Signed-off-by: Joe Perches 

Applied.

Thanks,
Rusty.

Re: [RFC 1/7] capebus: Core capebus support

2012-10-31 Thread Russ Dill

On Wed, Oct 31, 2012 at 3:07 PM, Pantelis Antoniou
 wrote:
>
> On Oct 31, 2012, at 11:55 PM, Russ Dill wrote:
>
>> On Wed, Oct 31, 2012 at 9:52 AM, Pantelis Antoniou
>>  wrote:
>>> Introducing capebus; a bus that allows small boards (capes) to connect
>>> to a complex SoC using simple expansion connectors.
>>>
>
> [snip]
>>> +   if (drv) {
>>> +   /* call the removed bus method (if added prev.) */
>>> +   if (cape_dev->added) {
>>> +   BUG_ON(cape_dev->bus == NULL);
>>> +   BUG_ON(cape_dev->bus->ops == NULL);
>>> +   if (cape_dev->bus->ops->dev_removed)
>>> +   cape_dev->bus->ops->dev_removed(cape_dev);
>>> +   cape_dev->added = 0;
>>> +   }
>>
>> Is there any case where added will not track drv?
>
>
> Yes, there is a corner case here.
>
> There is the case where while the device is created there is no matching
> driver yet. Either that's the case of a not supported cape, or the
> cape driver hasn't been loaded yet.
>
> We do need the device to be created, so that the user can browse in the
> sysfs it's eeprom attributes.
>
> There's some further complications with runtime cape overrides, but
> that's the gist of it.

I'm trying to figure out how that would come about, here is where
added is set to 1:

+   /* all is fine... */
+   cape_dev->driver = drv;
+   cape_dev->added = 1;

This is after calling drv->probe, so drv is not null.

There is a brief time here where added is 0, but driver is not.

+   if (drv) {
+   /* call the removed bus method (if added prev.) */
+   if (cape_dev->added) {
+   BUG_ON(cape_dev->bus == NULL);
+   BUG_ON(cape_dev->bus->ops == NULL);
+   if (cape_dev->bus->ops->dev_removed)
+   cape_dev->bus->ops->dev_removed(cape_dev);
+   cape_dev->added = 0;
+   }
+   if (drv->remove) {
+   pm_runtime_get_sync(dev);
+   drv->remove(cape_dev);
+   pm_runtime_put_noidle(dev);
+   }
+   cape_dev->driver = NULL;

Does one of the remove or resume functions check added in this case?
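
To make the question concrete, here is a minimal user-space sketch of the
probe/remove ordering quoted above. It is a model, not the capebus code: the
struct layouts, cape_probe(), cape_remove() and cape_model_selfcheck() are
hypothetical stand-ins, and only the order of the assignments follows the
quoted hunks. It checks that added is never set without a driver, and that
the reverse window (driver set, added clear) exists during remove.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the capebus structures; not the real layout. */
struct cape_driver {
	int (*probe)(void);
};

struct cape_device {
	const struct cape_driver *driver;
	bool added;
};

/* "added implies driver": the relation the thread is asking about. */
static bool added_tracks_driver(const struct cape_device *dev)
{
	return !dev->added || dev->driver != NULL;
}

/* Mirrors the quoted probe path: fields are set only after probe succeeds. */
static int cape_probe(struct cape_device *dev, const struct cape_driver *drv)
{
	int rc = drv->probe ? drv->probe() : 0;

	if (rc)
		return rc;
	dev->driver = drv;
	dev->added = true;
	return 0;
}

/* Mirrors the quoted remove path; reports whether the relation held. */
static bool cape_remove(struct cape_device *dev)
{
	bool ok = true;

	if (dev->driver) {
		dev->added = false;
		/* Window: added is already 0 while driver is still set. */
		ok = added_tracks_driver(dev) && dev->driver != NULL;
		dev->driver = NULL;
	}
	return ok;
}

static int probe_ok(void) { return 0; }
static int probe_fail(void) { return -1; }

/* Returns 1 if "added implies driver" held at every step, 0 otherwise. */
int cape_model_selfcheck(void)
{
	const struct cape_driver good = { .probe = probe_ok };
	const struct cape_driver bad = { .probe = probe_fail };
	struct cape_device dev = { .driver = NULL, .added = false };

	/* Device exists with no matching driver yet. */
	if (!added_tracks_driver(&dev))
		return 0;
	/* A failed probe must leave both fields untouched. */
	if (cape_probe(&dev, &bad) == 0 || dev.added || dev.driver)
		return 0;
	if (cape_probe(&dev, &good) != 0 || !added_tracks_driver(&dev))
		return 0;
	if (!cape_remove(&dev) || !added_tracks_driver(&dev))
		return 0;
	return dev.driver == NULL && !dev.added;
}
```

In this model added never outlives driver, so a device without a driver is
simply the added == 0, driver == NULL state; whether the real code has other
paths that set added is exactly what the question above is asking.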

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Minchan Kim

Hello,

On Wed, Oct 31, 2012 at 02:59:07PM -0700, Paul Turner wrote:
> On Wed, Oct 31, 2012 at 2:35 PM, Andrew Morton
>  wrote:
> >
> > On Tue, 30 Oct 2012 10:29:54 +0900
> > Minchan Kim  wrote:
> >
> > > This patch introduces new madvise behavior MADV_VOLATILE and
> > > MADV_NOVOLATILE for anonymous pages. It's different with
> > > John Stultz's version which considers only tmpfs while this patch
> > > considers only anonymous pages so this cannot cover John's one.
> > > If below idea is proved as reasonable, I hope we can unify both
> > > concepts by madvise/fadvise.
> > >
> > > Rationale is following as.
> > > Many allocators call munmap(2) when user call free(3) if ptr is
> > > in mmaped area. But munmap isn't cheap because it have to clean up
> > > all pte entries and unlinking a vma so overhead would be increased
> > > linearly by mmaped area's size.
> >
> > Presumably the userspace allocator will internally manage memory in
> > large chunks, so the munmap() call frequency will be much lower than
> > the free() call frequency.  So the performance gains from this change
> > might be very small.
> 
> I don't think I strictly understand the motivation from a
> malloc-standpoint here.
> 
> These days we (tcmalloc) use madvise(..., MADV_DONTNEED) when we want
> to perform discards on Linux.  For any reasonable allocator (short
> of binding malloc --> mmap, free --> unmap) this seems a better
> choice.
> 
> Note also from a performance stand-point I doubt any allocator (which
> cares about performance) is going to want to pay the cost of even a
> null syscall about typical malloc/free usage (consider: a tcmalloc

Good point.

> malloc/free pair is currently <20ns).  Given then that this cost is
> amortized once you start doing discards on larger blocks MADV_DONTNEED
> seems a preferable interface:
> - You don't need to reconstruct an arena when you do want to allocate
> since there's no munmap/mmap for the region to change about
> - There are no syscalls involved in later reallocating the block.

The above benefits apply to MADV_VOLATILE, too.
But as you pointed out, there is a little more overhead than with DONTNEED
because the allocator must call madvise(MADV_NOVOLATILE) before allocation.
Although madvise(NOVOLATILE) only marks a vma flag, it does need mmap_sem,
which could be a problem on parallel malloc/free workloads, as KOSAKI
pointed out.

In such a case, we can change the semantics so malloc doesn't need to call
madvise(NOVOLATILE) before allocating. Then the page fault handler has to
check whether the fault was caused by an access to a volatile vma. If so,
it could return a zero page instead of SIGBUS and mark the vma as no longer
volatile.
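
For reference, the MADV_DONTNEED pattern Paul describes can be sketched in
user space as below. This only illustrates the existing Linux interface, not
the proposed MADV_VOLATILE; discard_and_probe() is a hypothetical helper
name. On Linux, a private anonymous range discarded this way stays mapped
and reads back as zero-filled pages on the next touch, with no further
syscall needed to reuse it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map an anonymous region, dirty it, discard it with MADV_DONTNEED and
 * return the first byte read back afterwards (-1 on failure).
 */
int discard_and_probe(size_t len)
{
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int first;

	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);		/* make every page resident and dirty */

	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* The mapping is still valid; the touch faults in a fresh zero page. */
	first = p[0];
	munmap(p, len);
	return first;
}
```

The point of comparison with the proposal is that the discard here is
immediate and unconditional, whereas a volatile range would only lose its
pages if the kernel actually came under memory pressure.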

> 
> The only real additional cost is address-space.  Are you strongly
> concerned about the 32-bit case?

No. I believe allocators have logic to clean them up once the address space
is almost full.

Thanks, Paul.

-- 
Kind regards,
Minchan Kim

Re: [PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-10-31 Thread Michael Wang

On 11/01/2012 02:57 AM, Paton J. Lewis wrote:
> On 10/30/12 11:32 PM, Michael Wang wrote:
>> On 10/26/2012 08:08 AM, Paton J. Lewis wrote:
>>> From: "Paton J. Lewis" 
>>>
>>> It is not currently possible to reliably delete epoll items when
>>> using the
>>> same epoll set from multiple threads. After calling epoll_ctl with
>>> EPOLL_CTL_DEL, another thread might still be executing code related
>>> to an
>>> event for that epoll item (in response to epoll_wait). Therefore the
>>> deleting
>>> thread does not know when it is safe to delete resources pertaining
>>> to the
>>> associated epoll item because another thread might be using those
>>> resources.
>>>
>>> The deleting thread could wait an arbitrary amount of time after calling
>>> epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
>>> inefficient and could result in the destruction of resources before
>>> another
>>> thread is done handling an event returned by epoll_wait.
>>>
>>> This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which
>>> disables an
>>> epoll item. If epoll_ctl returns -EBUSY in this case, then another
>>> thread may
>>> be handling a return from epoll_wait for this item. Otherwise if epoll_ctl
>>> returns 0, then it is safe to delete the epoll item. This allows
>>> multiple
>>> threads to use a mutex to determine when it is safe to delete an
>>> epoll item
>>> and its associated resources, which allows epoll items to be deleted
>>> both
>>> efficiently and without error in a multi-threaded environment. Note that
>>> EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT,
>>> and using
>>> EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.
>>>
>>> This patch also adds a new test_epoll self-test program to both
>>> demonstrate
>>> the need for this feature and test it.
>>
>> Hi, Paton
>>
>> I'm just thinking that maybe we could do it this way.
>>
>> Seems like currently we are depending on the epoll_ctl() to indicate the
>> start point of safe section and epoll_wait() for the end point, like:
>>
>>  while () {
>>          epoll_wait()                    --------------
>>
>>          fd event arrived                safe section
>>
>>          clear fd epi->event.events
>>                                          --------------
>>          if (fd need stop)
>>                  continue;
>>                                          --------------
>>          ...fd data process...
>>
>>          epoll_ctl(MOD)                  danger section
>>
>>          set fd epi->event.events        --------------
>>
>>          continue;
>>  }
>>
>> So we got a safe section and do delete work in this section won't cause
>> trouble since we have a stop check directly after it.
>>
>> Actually what we want is to make sure no one will touch the fd any more
>> after we DISABLE it.
>>
>> Then what about we add a ref count and a stop flag in epi, maintain it
>> like:
>>
>>  epoll_wait()
>>
>>  check user events and
>>  dec the ref count of fd         ---
>>
>>  ...
>>
>>  fd event arrived                safe sec if ref count is 0
>>
>>  if epi stop flag set
>>          do nothing
>>  else
>>          inc epi ref count       ---
> 
> The pseudocode you provide below (for "DISABLE") seems to indicate that
> this "epi ref count" must be maintained by the kernel. Therefore any
> userspace modification of a ref count associated with an epoll item will
> require a new or changed kernel API.
> 
>>  send event
>>
>> And what DISABLE do is:
>>
>>  set epi stop flag
>>
>>  if epi ref count is not 0
>>  wait until ref count be 0
> 
> Perhaps I don't fully understand what you're proposing, but I don't
> think it's reasonable for a kernel API (epoll_ctl in this case) to block
> while waiting for a userspace action (decrementing the ref count) that
> might never occur.
> 
> Andrew Morton also proposed using ref counting in response to my initial
> patch submission; my reply to his proposal might also be applicable to
> your proposal. A link to that discussion thread:
> http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096
> 
> Sorry if I am misunderstanding your proposal, but I don't see how it
> solves the original problem.

I was just trying to find out whether we could use DISABLE without ONESHOT :)

My current understanding is:

1. we actually want to determine the part between each epoll_wait() in a
while().

2. we can't count on epoll_wait() itself, since no info is passed to the
kernel to indicate whether it was invoked after another epoll_wait() in the
same while().

3. so we need epoll_ctl(MOD) to tell the kernel that the user has finished
processing the data after epoll_wait(), and which epi that data belongs to.

4. because of 3, we need ONESHOT to be enabled.
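
The role of EPOLLONESHOT and epoll_ctl(MOD) in the points above can be seen
with the existing epoll API alone. The sketch below does not use the
proposed EPOLL_CTL_DISABLE; oneshot_rearm_demo() is a hypothetical helper
name. It shows that a one-shot item is reported once, stays silent while
userspace is "processing", and is reported again only after
EPOLL_CTL_MOD re-arms it.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns a three-digit code abc, where each digit is the number of events
 * epoll_wait() reported: a = first wait, b = wait while disarmed,
 * c = wait after EPOLL_CTL_MOD. Returns -1 on setup failure.
 */
int oneshot_rearm_demo(void)
{
	struct epoll_event ev = { 0 }, out;
	int pipefd[2];
	int epfd, a, b, c, code = -1;

	if (pipe(pipefd) != 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto close_pipe;

	ev.events = EPOLLIN | EPOLLONESHOT;
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
		goto close_all;

	/* Make the read end readable; the byte is never consumed. */
	if (write(pipefd[1], "x", 1) != 1)
		goto close_all;

	a = epoll_wait(epfd, &out, 1, 0);	/* event delivered, item disarmed */
	b = epoll_wait(epfd, &out, 1, 0);	/* still readable, but disarmed */

	/* Userspace signals "done with this item" by re-arming it. */
	if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) != 0)
		goto close_all;
	c = epoll_wait(epfd, &out, 1, 0);	/* reported again */

	if (a >= 0 && b >= 0 && c >= 0)
		code = a * 100 + b * 10 + c;
close_all:
	close(epfd);
close_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return code;
}
```

The interval between the first wait and the MOD call is the only point at
which the kernel can know that no thread will be handed the item again,
which is why the discussion ties DISABLE to ONESHOT.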


Is

Re: [PATCH 0/2] Removing the use of VLAIS from the Linux Kernel

2012-10-31 Thread Herbert Xu

On Wed, Oct 31, 2012 at 12:41:32PM -0400, David Miller wrote:
> 
> I don't think imposing the limitations of a non-gcc compiler
> is reasonable.

I agree.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread Minchan Kim

Hi Andrew,

On Wed, Oct 31, 2012 at 02:35:24PM -0700, Andrew Morton wrote:
> On Tue, 30 Oct 2012 10:29:54 +0900
> Minchan Kim  wrote:
> 
> > This patch introduces new madvise behavior MADV_VOLATILE and
> > MADV_NOVOLATILE for anonymous pages. It's different with
> > John Stultz's version which considers only tmpfs while this patch
> > considers only anonymous pages so this cannot cover John's one.
> > If below idea is proved as reasonable, I hope we can unify both
> > concepts by madvise/fadvise.
> > 
> > Rationale is following as.
> > Many allocators call munmap(2) when user call free(3) if ptr is
> > in mmaped area. But munmap isn't cheap because it have to clean up
> > all pte entries and unlinking a vma so overhead would be increased
> > linearly by mmaped area's size.
> 
> Presumably the userspace allocator will internally manage memory in
> large chunks, so the munmap() call frequency will be much lower than
> the free() call frequency.  So the performance gains from this change
> might be very small.
> 
> The whole point of the patch is to improve performance, but we have no
> evidence that it was successful in doing that!  I do think we'll need
> good quantitative testing results before proceeding with such a patch,
> please.

Absolutely. That's why I sent it as an RFC.
At this point, I would like to reach a consensus on whether this idea
makes sense before investigating further, because we have a large pool of
experienced developers and one of them might know whether this is really
needed or not.

> 
> Also, it is very desirable that we involve the relevant userspace
> (glibc, etc) developers in this.  And I understand that the google
> tcmalloc project will probably have interest in this - I've cc'ed
> various people@google in the hope that they can provide input (please).

Thanks! I should have done that. Such input is exactly what I need now.

> 
> Also, it is a userspace API change.  Please cc mtk.manpa...@gmail.com.

This is an RFC, so nothing is fixed yet.
I will Cc him once the open issues are resolved and the interface is
settled.

> 
> Also, I assume that you have userspace test code.  At some stage,
> please consider adding a case to tools/testing/selftests.  Such a test
> would require to creation of memory pressure, which is rather contrary
> to the selftests' current philosophy of being a bunch of short-running
> little tests.  Perhaps you can come up with something.  But I suggest
> that such work be done later, once it becomes clearer that this code is
> actually headed into the kernel.

Yes.

> 
> > Allocator should call madvise(MADV_NOVOLATILE) before reusing for
> > allocating that area to user. Otherwise, accessing of volatile range
> > will meet SIGBUS error.
> 
> Well, why?  It would be easy enough for the fault handler to give
> userspace a new, zeroed page at that address.

Absolutely. It would be convenient, but as a matter of fact I am considering
unifying this with John Stultz's fallocate volatile range, which considers
only tmpfs pages, so madvise/fadvise might be a better candidate as the API.
In the tmpfs case, John implemented it as returning a zero page when someone
accesses a volatile region, as you mentioned, but at this kernel summit Hugh
raised this and wanted SIGBUS returned instead, and I think that makes
debugging easier.

Another option is to put a flag in the API which indicates whether the VM
will return a zero page or SIGBUS when the user accesses a volatile range,
so users can do what they want.

> 
> Or we could simply leave the old page in place at that address.  If the
> page gets touched, we clear MADV_NOVOLATILE on its VMA and give the
> page (or all the not-yet-reclaimed pages) back to userspace at their
> old addresses.
> 
> Various options suggest themselves here.  You've chosen one of them but
> I would like to see a pretty exhaustive description of the reasoning
> behind that decision.

Will do.

> 
> Also, I wonder about the interaction with other vma manipulation
> operations.  For example, can a VMA get split when in the MADV_VOLATILE
> state?  If so, what happens?  

Both VMAs would be volatile, even if one of them never had any pages
reclaimed. I understand that's not optimal, but I expect users will not do
such operations (e.g., mprotect, mremap) frequently on a volatile vma.

If they do, maybe I need to come up with something, but it wouldn't be easy.

> 
> Also, I see no reason why the code shouldn't work OK with nonlinear VMAs,
> but I bet this wasn't tested ;)

Yes. I didn't consider that yet. AFAIK, nonlinear vmas are related to
file-mapped vmas, while this patch considers only anon vmas, which is good
enough for a first step.

> 
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -86,6 +86,22 @@ static long madvise_behavior(struct vm_area_struct * vma,
> > if (error)
> > goto out;
> > break;
> > +   case MADV_VOLATILE:
> > +   if (vma->vm_flags & VM_LOCKED) {
> > +   error = -EINVAL;
> > +   goto out;
> > +   }
> > +   new_fla

Re: [PATCH] emulator test: add "rep ins" mmio access test

2012-10-31 Thread Marcelo Tosatti

On Fri, Oct 19, 2012 at 03:39:08PM +0800, Xiao Guangrong wrote:
> Add the test to trigger the bug that "rep ins" causes vcpu->mmio_fragments
> overflow while moving large data from ioport to MMIO
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  x86/emulator.c |   14 ++
>  1 files changed, 14 insertions(+), 0 deletions(-)

Applied, thanks.


Re: [PATCH] proc: add "Seccomp" to status

2012-10-31 Thread Andrew Morton

On Wed, 31 Oct 2012 13:09:27 -0700
Kees Cook  wrote:

> Adds the seccomp mode to the /proc/$pid/status file so the state of
> seccomp can be externally examined.

There's no reason here for anyone to apply this patch to anything. 
Presumably you see some value to our users - please share your thoughts
with us ;)

> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -327,6 +327,13 @@ static inline void task_cap(struct seq_file *m, struct 
> task_struct *p)
>   render_cap_t(m, "CapBnd:\t", &cap_bset);
>  }
>  
> +static inline void task_seccomp(struct seq_file *m, struct task_struct *p)
> +{
> +#ifdef CONFIG_SECCOMP

hm, OK, cpuset_task_status_allowed() is a no-op if CONFIG_CPUSETS=n, so
there is precedent for fields vanishing with Kconfig changes.

> + seq_printf(m, "Seccomp:\t%d\n", p->seccomp.mode);

Get thee yon unto Documentation/filesystems/proc.txt!

> +#endif
> +}


Re: [PATCH v11] kvm: notify host when the guest is panicked

2012-10-31 Thread Marcelo Tosatti

On Tue, Oct 30, 2012 at 10:30:02PM -0400, Sasha Levin wrote:
> On Tue, Oct 30, 2012 at 9:48 PM, Wen Congyang  wrote:
> > At 10/31/2012 09:12 AM, Marcelo Tosatti Wrote:
> >> It has been asked earlier why a simple virtio device is not usable
> >> for this (with no response IIRC).
> >
> > 1. We can't use virtio device when the kernel is booting.
> 
> So the issue here is the small window between the point the guest
> becomes "self aware" and to the point virtio drivers are loaded,
> right?
> 
> I agree that if something happens during that interval, a
> "virtio-notifier" driver won't catch that, but anything beyond that is
> better done with a virtio driver, so how is the generic infrastructure
> added in this patch useful to anything beyond detecting panics in that
> initial interval?

Asked earlier about quantification of panics in that window (i doubt
early panics are that significant for this usecase). netconsole has
the same issue:

"This module logs kernel printk messages over UDP allowing debugging of
problem where disk logging fails and serial consoles are impractical.

It can be used either built-in or as a module. As a built-in,
netconsole initializes immediately after NIC cards and will bring up
the specified interface as soon as possible. While this doesn't allow
capture of early kernel panics, it does capture most of the boot
process."

> > 2. The virtio's driver can be built as a module, and if it is not loaded
> >and the kernel is panicked, there is no way to notify the host.
> 
> Even if the suggested virtio-notifier driver is built as a module, it
> would get auto-loaded when the guest is booting, so I'm not sure about
> this point?

> > 3. I/O port is more reliable than virtio device.
> >If virtio's driver has some bug, and it cause kernel panicked, we can't
> >use it. The I/O port is more reliable because it only depends on notifier
> >chain(If we use virtio device, it also depends on notifier chain).
> 
> This is like suggesting that we let KVM emulate virtio-blk on it's
> own, parallel to the virtio implementation, so that even if there's a
> problem with virtio-blk, KVM can emulate a virtio-blk on it's own.
> 
> Furthermore, why stop at virtio? What if the KVM code has a bug and it
> doesn't pass IO properly? Or the x86 code? we still want panic
> notifications if that happens...


RE: [PATCH 1/3] alarmtimer: Replace the spinlock rtcdev_lock with mutex

2012-10-31 Thread Liu, Chuansheng



> -----Original Message-----
> From: Oliver Neukum [mailto:oneu...@suse.de]
> Sent: Wednesday, October 31, 2012 5:03 PM
> To: Liu, Chuansheng
> Cc: john.stu...@linaro.org; t...@linutronix.de; gre...@linuxfoundation.org;
> linux-kernel@vger.kernel.org; Liu, Chuansheng
> Subject: Re: [PATCH 1/3] alarmtimer: Replace the spinlock rtcdev_lock with
> mutex
> 
> On Thursday 01 November 2012 00:20:55 Chuansheng Liu wrote:
> > When do code reviewing, found no special requirement to
> > use spin_lock_irqsave/spin_unlock_irqrestore, because
> > alarmtimer_get_rtcdev() is called by posix clock interface.
> > So would like to use mutex to replace it.
> 
> What is gained thereby?
spin_lock_irqsave disables preemption and local irqs, so it is more expensive
than a mutex. Thanks.
> 
>   Regards
>   Oliver


[GIT PULL] KVM fixes for 3.7-rc3

2012-10-31 Thread Marcelo Tosatti


Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the following KVM bug fixes


Xiao Guangrong (1):
  KVM: x86: fix vcpu->mmio_fragments overflow

 arch/x86/kvm/x86.c   |   60 ++-
 include/linux/kvm_host.h |   15 +--
 2 files changed, 36 insertions(+), 39 deletions(-)

[PATCH v2] r8169: Fix WoL on RTL8168d/8111d.

2012-10-31 Thread Cyril Brulebois

This regression was spotted between Debian squeeze and Debian wheezy
kernels (respectively based on 2.6.32 and 3.2). More info about
Wake-on-LAN issues with Realtek's 816x chipsets can be found in the
following thread: http://marc.info/?t=13207921944

Probable regression from d4ed95d796e5126bba51466dc07e287cebc8bd19;
more chipsets are likely affected.

Tested on top of a 3.2.23 kernel.

Reported-by: Florent Fourcot 
Tested-by: Florent Fourcot 
Hinted-by: Francois Romieu 
Signed-off-by: Cyril Brulebois 
---
 drivers/net/ethernet/realtek/r8169.c |2 ++
 1 file changed, 2 insertions(+)

v2: Update link in commit message, and re-send to netdev/Dave according
to Francois Romieu's comment, thanks.

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index e7ff886..eb6a5e4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3827,6 +3827,8 @@ static void rtl_wol_suspend_quirk(struct rtl8169_private 
*tp)
void __iomem *ioaddr = tp->mmio_addr;
 
switch (tp->mac_version) {
+   case RTL_GIGA_MAC_VER_25:
+   case RTL_GIGA_MAC_VER_26:
case RTL_GIGA_MAC_VER_29:
case RTL_GIGA_MAC_VER_30:
case RTL_GIGA_MAC_VER_32:
-- 
1.7.10.4


Re: [PATCH] device_cgroup: fix unchecked cgroup parent usage

2012-10-31 Thread Andrew Morton

On Wed, 31 Oct 2012 12:04:30 -0400
Aristeu Rozanski  wrote:

> In 4cef7299b4786879a3e113e84084a72b24590c5b the cgroup parent usage is
> unchecked. root will not have a parent and trying to use
> device.{allow,deny} will cause problems.

From my reading of the code "problems" means "kernel null pointer
dereference".

> For some reason my stressing
> scripts didn't test the root directory so I didn't catch it on my
> regular tests.
> 
> --- github.orig/security/device_cgroup.c  2012-10-26 17:18:01.739366780 
> -0400
> +++ github/security/device_cgroup.c   2012-10-29 10:03:33.221918003 -0400
> @@ -352,6 +352,8 @@
>   */
>  static inline int may_allow_all(struct dev_cgroup *parent)

offtopic: this function could quite neatly have a bool return type.

>  {
> + if (!parent)
> + return 1;

hm.  Does it need a comment explaining what and why?  I guess not...  just.

>   return parent->behavior == DEVCG_DEFAULT_ALLOW;
>  }
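
A stand-alone sketch of the patched check, using the bool return type
suggested above, might look like this. The enum and struct here are
simplified stand-ins declared only for the example (the kernel's
struct dev_cgroup has more fields), and the function is not static so it can
be exercised directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions; illustration only. */
enum devcg_behavior {
	DEVCG_DEFAULT_NONE,
	DEVCG_DEFAULT_ALLOW,
	DEVCG_DEFAULT_DENY,
};

struct dev_cgroup {
	enum devcg_behavior behavior;
};

/*
 * The root cgroup has no parent, so there is nobody to restrict it:
 * treat a NULL parent as "everything may be allowed" instead of
 * dereferencing it.
 */
bool may_allow_all(const struct dev_cgroup *parent)
{
	if (!parent)
		return true;
	return parent->behavior == DEVCG_DEFAULT_ALLOW;
}
```

The comment inside the function is roughly the "what and why" the review
asks about; without the NULL check, a write to devices.allow in the root
directory would dereference a null parent.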

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 2/6] PM / Runtime: introduce pm_runtime_set[get]_memalloc_noio()

2012-10-31 Thread Ming Lei

On Wed, Oct 31, 2012 at 11:41 PM, Alan Stern  wrote:
>
> Sorry, I misread your message.  You are setting the device's flag, not
> the thread's flag.

Never mind.

>
> This still doesn't help in this case where CONFIG_PM_RUNTIME is
> disabled.  I think it will be simpler to set the noio flag during every
> device reset.

Yes, it's better to set the flag during every device reset now.

Also pppoe or network interface over serial port is a bit difficult to
deal with, as Oliver pointed out.


Thanks,
--
Ming Lei

[PATCH 4/8] ARM: zynq: dts: split up device tree

2012-10-31 Thread Josh Cartwright

The purpose of the created zynq-7000.dtsi file is to describe the
hardware common to all Zynq 7000-based boards.  Also, get rid of the
zynq-ep107 device tree, since it is not hardware anyone can purchase.

Add a zc702 dts file based on the zynq-7000.dtsi.  Add it to the
dts/Makefile so it is built with the 'dtbs' target.

Signed-off-by: Josh Cartwright 
---
 arch/arm/boot/dts/Makefile |  1 +
 .../boot/dts/{zynq-ep107.dts => zynq-7000.dtsi}| 19 +++---
 arch/arm/boot/dts/zynq-zc702.dts   | 30 ++
 arch/arm/mach-zynq/common.c|  3 ++-
 4 files changed, 36 insertions(+), 17 deletions(-)
 rename arch/arm/boot/dts/{zynq-ep107.dts => zynq-7000.dtsi} (79%)
 create mode 100644 arch/arm/boot/dts/zynq-zc702.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index f37cf9f..76ed11e 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -103,5 +103,6 @@ dtb-$(CONFIG_ARCH_VEXPRESS) += vexpress-v2p-ca5s.dtb \
 dtb-$(CONFIG_ARCH_VT8500) += vt8500-bv07.dtb \
wm8505-ref.dtb \
wm8650-mid.dtb
+dtb-$(CONFIG_ARCH_ZYNQ) += zynq-zc702.dtb
 
 endif
diff --git a/arch/arm/boot/dts/zynq-ep107.dts b/arch/arm/boot/dts/zynq-7000.dtsi
similarity index 79%
rename from arch/arm/boot/dts/zynq-ep107.dts
rename to arch/arm/boot/dts/zynq-7000.dtsi
index 5caf100..8b30e59 100644
--- a/arch/arm/boot/dts/zynq-ep107.dts
+++ b/arch/arm/boot/dts/zynq-7000.dtsi
@@ -10,29 +10,16 @@
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  * GNU General Public License for more details.
  */
+/include/ "skeleton.dtsi"
 
-/dts-v1/;
 / {
-   model = "Xilinx Zynq EP107";
-   compatible = "xlnx,zynq-ep107";
-   #address-cells = <1>;
-   #size-cells = <1>;
-   interrupt-parent = <&intc>;
-
-   memory {
-   device_type = "memory";
-   reg = <0x0 0x1000>;
-   };
-
-   chosen {
-   bootargs = "console=ttyPS0,9600 root=/dev/ram rw 
initrd=0x80,8M earlyprintk";
-   linux,stdout-path = &uart0;
-   };
+   compatible = "xlnx,zynq-7000";
 
amba {
compatible = "simple-bus";
#address-cells = <1>;
#size-cells = <1>;
+   interrupt-parent = <&intc>;
ranges;
 
intc: interrupt-controller@f8f01000 {
diff --git a/arch/arm/boot/dts/zynq-zc702.dts b/arch/arm/boot/dts/zynq-zc702.dts
new file mode 100644
index 000..e25a307
--- /dev/null
+++ b/arch/arm/boot/dts/zynq-zc702.dts
@@ -0,0 +1,30 @@
+/*
+ *  Copyright (C) 2011 Xilinx
+ *  Copyright (C) 2012 National Instruments Corp.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+/dts-v1/;
+/include/ "zynq-7000.dtsi"
+
+/ {
+   model = "Zynq ZC702 Development Board";
+   compatible = "xlnx,zynq-zc702", "xlnx,zynq-7000";
+
+   memory {
+   device_type = "memory";
+   reg = <0x0 0x4000>;
+   };
+
+   chosen {
+   bootargs = "console=ttyPS1,115200 earlyprintk";
+   };
+
+};
diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index 0279ea7..447904b 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -115,7 +115,8 @@ static void __init xilinx_map_io(void)
 }
 
 static const char *xilinx_dt_match[] = {
-   "xlnx,zynq-ep107",
+   "xlnx,zynq-zc702",
+   "xlnx,zynq-7000",
NULL
 };
 
-- 
1.8.0



[PATCH 1/8] ARM: zynq: move arm-specific sys_timer out of ttc

2012-10-31 Thread Josh Cartwright

Move the sys_timer definition out of ttc driver and make it part of the
common zynq code.  This is preparation for renaming and COMMON_CLK
support.

Signed-off-by: Josh Cartwright 
---
 arch/arm/mach-zynq/common.c | 13 +
 arch/arm/mach-zynq/common.h |  4 +---
 arch/arm/mach-zynq/timer.c  | 10 +-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index ba8d14f..6f058258 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -93,6 +94,18 @@ static struct map_desc io_desc[] __initdata = {
 
 };
 
+static void __init xilinx_zynq_timer_init(void)
+{
+   xttcpss_timer_init();
+}
+
+/*
+ * Instantiate and initialize the system timer structure
+ */
+static struct sys_timer xttcpss_sys_timer = {
+   .init   = xilinx_zynq_timer_init,
+};
+
 /**
  * xilinx_map_io() - Create memory mappings needed for early I/O.
  */
diff --git a/arch/arm/mach-zynq/common.h b/arch/arm/mach-zynq/common.h
index a009644..954b91c 100644
--- a/arch/arm/mach-zynq/common.h
+++ b/arch/arm/mach-zynq/common.h
@@ -17,8 +17,6 @@
 #ifndef __MACH_ZYNQ_COMMON_H__
 #define __MACH_ZYNQ_COMMON_H__
 
-#include 
-
-extern struct sys_timer xttcpss_sys_timer;
+void __init xttcpss_timer_init(void);
 
 #endif
diff --git a/arch/arm/mach-zynq/timer.c b/arch/arm/mach-zynq/timer.c
index c2c96cc..c93cbe5 100644
--- a/arch/arm/mach-zynq/timer.c
+++ b/arch/arm/mach-zynq/timer.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 
-#include 
 #include 
 #include "common.h"
 
@@ -269,7 +268,7 @@ static struct clock_event_device xttcpss_clockevent = {
  * Initializes the timer hardware and register the clock source and clock event
  * timers with Linux kernal timer framework
  **/
-static void __init xttcpss_timer_init(void)
+void __init xttcpss_timer_init(void)
 {
xttcpss_timer_hardware_init();
clocksource_register_hz(&clocksource_xttcpss, TIMER_RATE);
@@ -289,10 +288,3 @@ static void __init xttcpss_timer_init(void)
xttcpss_clockevent.cpumask = cpumask_of(0);
clockevents_register_device(&xttcpss_clockevent);
 }
-
-/*
- * Instantiate and initialize the system timer structure
- */
-struct sys_timer xttcpss_sys_timer = {
-   .init   = xttcpss_timer_init,
-};
-- 
1.8.0



[PATCH 7/8] serial: xilinx_uartps: get clock rate info from dts

2012-10-31 Thread Josh Cartwright

Add support for specifying clock information for the uart clk via the
device tree.  This eliminates the need to hardcode rates in the device
tree.

Signed-off-by: Josh Cartwright 
---
 arch/arm/boot/dts/zynq-7000.dtsi   |  4 ++--
 drivers/tty/serial/xilinx_uartps.c | 30 +-
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi
index bb3085c..5fb763f 100644
--- a/arch/arm/boot/dts/zynq-7000.dtsi
+++ b/arch/arm/boot/dts/zynq-7000.dtsi
@@ -44,14 +44,14 @@
compatible = "xlnx,xuartps";
reg = <0xE000 0x1000>;
interrupts = <0 27 4>;
-   clock = <5000>;
+   clocks = <&uart_clk 0>;
};
 
uart1: uart@e0001000 {
compatible = "xlnx,xuartps";
reg = <0xE0001000 0x1000>;
interrupts = <0 50 4>;
-   clock = <5000>;
+   clocks = <&uart_clk 0>;
};
 
slcr: slcr@f800 {
diff --git a/drivers/tty/serial/xilinx_uartps.c 
b/drivers/tty/serial/xilinx_uartps.c
index 23efe17..adfecbc 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -944,18 +945,20 @@ static int __devinit xuartps_probe(struct platform_device 
*pdev)
int rc;
struct uart_port *port;
struct resource *res, *res2;
-   int clk = 0;
+   struct clk *clk;
 
-   const unsigned int *prop;
-
-   prop = of_get_property(pdev->dev.of_node, "clock", NULL);
-   if (prop)
-   clk = be32_to_cpup(prop);
+   clk = of_clk_get(pdev->dev.of_node, 0);
if (!clk) {
dev_err(&pdev->dev, "no clock specified\n");
return -ENODEV;
}
 
+   rc = clk_prepare_enable(clk);
+   if (rc) {
+   dev_err(&pdev->dev, "could not enable clock\n");
+   return -EBUSY;
+   }
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENODEV;
@@ -978,7 +981,8 @@ static int __devinit xuartps_probe(struct platform_device *pdev)
port->mapbase = res->start;
port->irq = res2->start;
port->dev = &pdev->dev;
-   port->uartclk = clk;
+   port->uartclk = clk_get_rate(clk);
+   port->private_data = clk;
dev_set_drvdata(&pdev->dev, port);
rc = uart_add_one_port(&xuartps_uart_driver, port);
if (rc) {
@@ -1000,14 +1004,14 @@ static int __devinit xuartps_probe(struct platform_device *pdev)
 static int __devexit xuartps_remove(struct platform_device *pdev)
 {
struct uart_port *port = dev_get_drvdata(&pdev->dev);
-   int rc = 0;
+   struct clk *clk = port->private_data;
+   int rc;
 
/* Remove the xuartps port from the serial core */
-   if (port) {
-   rc = uart_remove_one_port(&xuartps_uart_driver, port);
-   dev_set_drvdata(&pdev->dev, NULL);
-   port->mapbase = 0;
-   }
+   rc = uart_remove_one_port(&xuartps_uart_driver, port);
+   dev_set_drvdata(&pdev->dev, NULL);
+   port->mapbase = 0;
+   clk_disable_unprepare(clk);
return rc;
 }
 
-- 
1.8.0



[PATCH 5/8] ARM: zynq: add COMMON_CLK support

2012-10-31 Thread Josh Cartwright

Add support for COMMON_CLK, and provide simplified models for the
necessary clocks on the zynq-7000.  Currently, the PLLs, the CPU clock
network, and the basic peripheral clock networks (for SDIO, SMC, SPI,
QSPI, UART) are modelled.

Signed-off-by: Josh Cartwright 
---
 .../devicetree/bindings/clock/zynq-7000.txt|  55 
 arch/arm/Kconfig   |   1 +
 arch/arm/boot/dts/zynq-7000.dtsi   |  56 
 arch/arm/boot/dts/zynq-zc702.dts   |   4 +
 arch/arm/mach-zynq/common.c|  11 +
 drivers/clk/Makefile   |   1 +
 drivers/clk/clk-zynq.c | 355 +
 include/linux/clk/zynq.h   |  24 ++
 8 files changed, 507 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/zynq-7000.txt
 create mode 100644 drivers/clk/clk-zynq.c
 create mode 100644 include/linux/clk/zynq.h

diff --git a/Documentation/devicetree/bindings/clock/zynq-7000.txt b/Documentation/devicetree/bindings/clock/zynq-7000.txt
new file mode 100644
index 000..2e21fc1
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/zynq-7000.txt
@@ -0,0 +1,55 @@
+Device Tree Clock bindings for the Zynq 7000 EPP
+
+The Zynq EPP has several different clk providers, each with their own bindings.
+The purpose of this document is to document their usage.
+
+See clock_bindings.txt for more information on the generic clock bindings.
+See Chapter 25 of Zynq TRM for more information about Zynq clocks.
+
+== PLLs ==
+
+Used to describe the ARM_PLL, DDR_PLL, and IO_PLL.
+
+Required properties:
+- #clock-cells : shall be 0 (only one clock is output from this node)
+- compatible : "xlnx,zynq-pll"
+- reg : pair of u32 values, which are the address offsets within the SLCR
+of the relevant PLL_CTRL register and PLL_CFG register respectively
+- clocks : phandle for parent clock.  should be the phandle for ps_clk
+
+Optional properties:
+- clock-output-names : name of the output clock
+
+Example:
+   armpll: armpll {
+   #clock-cells = <0>;
+   compatible = "xlnx,zynq-pll";
+   clocks = <&ps_clk>;
+   reg = <0x100 0x110>;
+   clock-output-names = "armpll";
+   };
+
+== Peripheral clocks ==
+
+Describes clock node for the SDIO, SMC, SPI, QSPI, and UART clocks.
+
+Required properties:
+- #clock-cells : shall be 1
+- compatible : "xlnx,zynq-periph-clock"
+- reg : a single u32 value, describing the offset within the SLCR where
+the CLK_CTRL register is found for this peripheral
+- clocks : phandle for parent clocks.  should hold phandles for
+   the IO_PLL, ARM_PLL, and DDR_PLL in order
+
+Optional properties:
+- clock-output-names : name of the output clock
+
+Example:
+   uart_clk: uart_clk {
+   #clock-cells = <1>;
+   compatible = "xlnx,zynq-periph-clock";
+   clocks = <&iopll &armpll &ddrpll>;
+   reg = <0x154>;
+   clock-output-names = "uart0_ref_clk",
+"uart1_ref_clk";
+   };
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 21ed87b..ccfe0ab 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -959,6 +959,7 @@ config ARCH_ZYNQ
bool "Xilinx Zynq ARM Cortex A9 Platform"
select ARM_AMBA
select ARM_GIC
+   select COMMON_CLK
select CPU_V7
select GENERIC_CLOCKEVENTS
select ICST
diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi
index 8b30e59..bb3085c 100644
--- a/arch/arm/boot/dts/zynq-7000.dtsi
+++ b/arch/arm/boot/dts/zynq-7000.dtsi
@@ -53,5 +53,61 @@
interrupts = <0 50 4>;
clock = <5000>;
};
+
+   slcr: slcr@f800 {
+   compatible = "xlnx,zynq-slcr";
+   reg = <0xF800 0x1000>;
+
+   clocks {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ps_clk: ps_clk {
+   #clock-cells = <0>;
+   compatible = "fixed-clock";
+   /* clock-frequency set in board-specific file */
+   clock-output-names = "ps_clk";
+   };
+   armpll: armpll {
+   #clock-cells = <0>;
+   compatible = "xlnx,zynq-pll";
+   clocks = <&ps_clk>;
+   reg = <0x100 0x110>;
+   clock-output-names = "armpll";
+   };
+   ddrpll: ddrpll {
+

[PATCH 0/8] zynq COMMON_CLK support

2012-10-31 Thread Josh Cartwright

This patchset implements COMMON_CLK support for the zynq.  At this
point, only the basic fundamental clocks are modelled, and only
passively, for rate calculation.  of_clk bindings are implemented to
allow specifying clock/peripheral relationships in the device tree.

Patches 1 and 2 are a follow-up to my earlier patch: "ARM: zynq: move ttc
timer code to drivers/clocksource".  Patch 1 moves the sys_timer
definition out of the ttc code, and into the zynq common code.
Patch 2 is the actual rename, and makefile cleanup.

Patch 3 adds a description of the second uart to zynq-ep107.dts.  I did
this pre-split (patch 4), because I felt it might make reviewing easier.

Patch 4 uses zynq-ep107.dts as a reference to create zynq-7000.dtsi,
which is intended to be a common dtsi snippet for inclusion in
describing zynq-7000 based boards.  zynq-zc702.dts is created as an
example consumer.  The zynq-ep107.dts file is removed entirely (it
describes, presumably, a board not available to consumers).

Patch 5 is the real meat; it adds an implementation of the clk models
for the PLLs, the CPU clock network, and basic (simplified) clk models
for the essential peripherals (UART and the TTC).

Patch 6 removes CONFIG_OF conditional code from the xilinx uart driver.
The zynq kernel requires CONFIG_OF, and this hardware is not currently
used on any non-CONFIG_OF platform.

Patch 7 adds support to the xilinx_uartps driver to allow getting clock
rate information from the device tree.

Patch 8 implements DT support for the ttc, including pulling clock tree
info.

---
There are some specific concerns that I had that I would like some
guidance on:

Two identical timers on the board have historically been statically
allocated to act as the system clocksource and the clockevent_device.
With patch 8, this distinction is made in the device tree, by setting
the compatible property of each timer according to the purpose you want
it used for.

I feel, however, that this is an abuse of the device tree, which should
only be used to describe hardware, not to lay out a policy on how the
hardware is used.

So, if it's not in the device tree, then where?  Do I go back to the
static allocation routine, such that the first matching ttc node in the
tree becomes the clockevent_device, and the second one a clocksource?
That seems like a hack.

Is it somehow possible to have all of the timers registered as both a
clocksource and a clockevent_device, and have some higher level logic
make the policy decision as to which timer is used for what?

An additional question regarding of_clk bindings:

my_clock {
#clock-cells = <0>;
clock-output-names = "my_out_clock";
};
node_a {
clocks = <&clk>;
clock-names = "my_clock";
clock-ranges;

node_b {
/* ... */
};
};

In this scenario, should I be expecting of_clk_get(node_b, 0) to
retrieve a handle to the parent's consumed clock (due to clock-ranges)?  I
could make this work using of_clk_get_by_name(node_b, "my_clock"), but I
was somewhat surprised the former didn't work.

Thanks (and sorry for the novel),
   Josh

---
Josh Cartwright (8):
  ARM: zynq: move arm-specific sys_timer out of ttc
  ARM: zynq: move ttc timer code to drivers/clocksource
  ARM: zynq: dts: add description of the second uart
  ARM: zynq: dts: split up device tree
  ARM: zynq: add COMMON_CLK support
  serial: xilinx_uartps: kill CONFIG_OF conditional
  serial: xilinx_uartps: get clock rate info from dts
  clocksource: xilinx_ttc: add OF_CLK support

 .../devicetree/bindings/clock/zynq-7000.txt|  55 
 arch/arm/Kconfig   |   1 +
 arch/arm/boot/dts/Makefile |   1 +
 arch/arm/boot/dts/zynq-7000.dtsi   | 166 ++
 arch/arm/boot/dts/zynq-ep107.dts   |  63 
 arch/arm/boot/dts/zynq-zc702.dts   |  44 +++
 arch/arm/mach-zynq/Makefile|   2 +-
 arch/arm/mach-zynq/common.c|  29 +-
 arch/arm/mach-zynq/timer.c | 298 -
 drivers/clk/Makefile   |   1 +
 drivers/clk/clk-zynq.c | 355 +
 drivers/clocksource/Makefile   |   1 +
 drivers/clocksource/xilinx_ttc.c   | 326 +++
 drivers/tty/serial/xilinx_uartps.c |  39 +--
 include/linux/clk/zynq.h   |  24 ++
 .../common.h => include/linux/xilinx_ttc.h |   8 +-
 16 files changed, 1022 insertions(+), 391 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/zynq-7000.txt
 create mode 100644 arch/arm/boot/dts/zynq-7000.dtsi
 delete mode 100644 arch/arm/boot/dts/zynq-ep107.dts
 create mode 100644 arch/arm/boot/dts/zynq-zc702.dts
 delete mode 100644 arch

[PATCH 2/8] ARM: zynq: move ttc timer code to drivers/clocksource

2012-10-31 Thread Josh Cartwright

Suggested cleanup by Arnd Bergmann.  Move the ttc timer.c code to
drivers/clocksource, and out of the mach-zynq directory.

The common.h (which only held the timer declaration) was renamed to
xilinx_ttc.h and moved into include/linux.

Signed-off-by: Josh Cartwright 
Cc: Arnd Bergmann 
---
 arch/arm/mach-zynq/Makefile| 2 +-
 arch/arm/mach-zynq/common.c| 2 +-
 drivers/clocksource/Makefile   | 1 +
 arch/arm/mach-zynq/timer.c => drivers/clocksource/xilinx_ttc.c | 1 -
 arch/arm/mach-zynq/common.h => include/linux/xilinx_ttc.h  | 4 ++--
 5 files changed, 5 insertions(+), 5 deletions(-)
 rename arch/arm/mach-zynq/timer.c => drivers/clocksource/xilinx_ttc.c (99%)
 rename arch/arm/mach-zynq/common.h => include/linux/xilinx_ttc.h (91%)

diff --git a/arch/arm/mach-zynq/Makefile b/arch/arm/mach-zynq/Makefile
index 397268c..320faed 100644
--- a/arch/arm/mach-zynq/Makefile
+++ b/arch/arm/mach-zynq/Makefile
@@ -3,4 +3,4 @@
 #
 
 # Common support
-obj-y  := common.o timer.o
+obj-y  := common.o
diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index 6f058258..0279ea7 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,7 +33,6 @@
 #include 
 
 #include 
-#include "common.h"
 
 static struct of_device_id zynq_of_bus_ids[] __initdata = {
{ .compatible = "simple-bus", },
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index 603be36..f27c7b1 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_DW_APB_TIMER_OF) += dw_apb_timer_of.o
 obj-$(CONFIG_CLKSRC_DBX500_PRCMU)  += clksrc-dbx500-prcmu.o
 obj-$(CONFIG_ARMADA_370_XP_TIMER)  += time-armada-370-xp.o
 obj-$(CONFIG_ARCH_BCM2835) += bcm2835_timer.o
+obj-$(CONFIG_ARCH_ZYNQ)+= xilinx_ttc.o
 
 obj-$(CONFIG_CLKSRC_ARM_GENERIC)   += arm_generic.o
diff --git a/arch/arm/mach-zynq/timer.c b/drivers/clocksource/xilinx_ttc.c
similarity index 99%
rename from arch/arm/mach-zynq/timer.c
rename to drivers/clocksource/xilinx_ttc.c
index c93cbe5..ff38b3e 100644
--- a/arch/arm/mach-zynq/timer.c
+++ b/drivers/clocksource/xilinx_ttc.c
@@ -25,7 +25,6 @@
 #include 
 
 #include 
-#include "common.h"
 
 #define IRQ_TIMERCOUNTER0  42
 
diff --git a/arch/arm/mach-zynq/common.h b/include/linux/xilinx_ttc.h
similarity index 91%
rename from arch/arm/mach-zynq/common.h
rename to include/linux/xilinx_ttc.h
index 954b91c..303a3fd 100644
--- a/arch/arm/mach-zynq/common.h
+++ b/include/linux/xilinx_ttc.h
@@ -14,8 +14,8 @@
  * GNU General Public License for more details.
  */
 
-#ifndef __MACH_ZYNQ_COMMON_H__
-#define __MACH_ZYNQ_COMMON_H__
+#ifndef __XILINX_TTC_H__
+#define __XILINX_TTC_H__
 
 void __init xttcpss_timer_init(void);
 
-- 
1.8.0



[PATCH 6/8] serial: xilinx_uartps: kill CONFIG_OF conditional

2012-10-31 Thread Josh Cartwright

The Zynq platform requires the use of CONFIG_OF.  Remove the #ifdef
conditionals in the uartps driver.

Signed-off-by: Josh Cartwright 
---
 drivers/tty/serial/xilinx_uartps.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c
index b627363..23efe17 100644
--- a/drivers/tty/serial/xilinx_uartps.c
+++ b/drivers/tty/serial/xilinx_uartps.c
@@ -946,15 +946,11 @@ static int __devinit xuartps_probe(struct platform_device *pdev)
struct resource *res, *res2;
int clk = 0;
 
-#ifdef CONFIG_OF
const unsigned int *prop;
 
prop = of_get_property(pdev->dev.of_node, "clock", NULL);
if (prop)
clk = be32_to_cpup(prop);
-#else
-   clk = *((unsigned int *)(pdev->dev.platform_data));
-#endif
if (!clk) {
dev_err(&pdev->dev, "no clock specified\n");
return -ENODEV;
@@ -1044,16 +1040,11 @@ static int xuartps_resume(struct platform_device *pdev)
 }
 
 /* Match table for of_platform binding */
-
-#ifdef CONFIG_OF
 static struct of_device_id xuartps_of_match[] __devinitdata = {
{ .compatible = "xlnx,xuartps", },
{}
 };
 MODULE_DEVICE_TABLE(of, xuartps_of_match);
-#else
-#define xuartps_of_match NULL
-#endif
 
 static struct platform_driver xuartps_platform_driver = {
.probe   = xuartps_probe,   /* Probe method */
-- 
1.8.0



[PATCH 8/8] clocksource: xilinx_ttc: add OF_CLK support

2012-10-31 Thread Josh Cartwright

Add support for retrieving TTC configuration from device tree.  This
includes the ability to pull information about the driving clocks from
the of_clk bindings.

Signed-off-by: Josh Cartwright 
---
 arch/arm/boot/dts/zynq-7000.dtsi |  53 
 arch/arm/boot/dts/zynq-zc702.dts |  10 ++
 drivers/clocksource/xilinx_ttc.c | 273 ++-
 3 files changed, 218 insertions(+), 118 deletions(-)

diff --git a/arch/arm/boot/dts/zynq-7000.dtsi b/arch/arm/boot/dts/zynq-7000.dtsi
index 5fb763f..9a2442c 100644
--- a/arch/arm/boot/dts/zynq-7000.dtsi
+++ b/arch/arm/boot/dts/zynq-7000.dtsi
@@ -109,5 +109,58 @@
};
};
};
+
+   ttc0: ttc0@f8001000 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "xlnx,ttc";
+   reg = <0xF8001000 0x1000>;
+   clocks = <&cpu_clk 3>;
+   clock-names = "cpu_1x";
+   clock-ranges;
+
+   ttc0_0: ttc0.0 {
+   status = "disabled";
+   reg = <0>;
+   interrupts = <0 10 4>;
+   };
+   ttc0_1: ttc0.1 {
+   status = "disabled";
+   reg = <1>;
+   interrupts = <0 11 4>;
+   };
+   ttc0_2: ttc0.2 {
+   status = "disabled";
+   reg = <2>;
+   interrupts = <0 12 4>;
+   };
+   };
+
+   ttc1: ttc0@f8002000 {
+   #interrupt-parent = <&intc>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "xlnx,ttc";
+   reg = <0xF8002000 0x1000>;
+   clocks = <&cpu_clk 3>;
+   clock-names = "cpu_1x";
+   clock-ranges;
+
+   ttc1_0: ttc1.0 {
+   status = "disabled";
+   reg = <0>;
+   interrupts = <0 37 4>;
+   };
+   ttc1_1: ttc1.1 {
+   status = "disabled";
+   reg = <1>;
+   interrupts = <0 38 4>;
+   };
+   ttc1_2: ttc1.2 {
+   status = "disabled";
+   reg = <2>;
+   interrupts = <0 39 4>;
+   };
+   };
};
 };
diff --git a/arch/arm/boot/dts/zynq-zc702.dts b/arch/arm/boot/dts/zynq-zc702.dts
index 86f44d5..c772942 100644
--- a/arch/arm/boot/dts/zynq-zc702.dts
+++ b/arch/arm/boot/dts/zynq-zc702.dts
@@ -32,3 +32,13 @@
 &ps_clk {
clock-frequency = <3330>;
 };
+
+&ttc0_0 {
+   status = "ok";
+   compatible = "xlnx,ttc-counter-clocksource";
+};
+
+&ttc0_1 {
+   status = "ok";
+   compatible = "xlnx,ttc-counter-clockevent";
+};
diff --git a/drivers/clocksource/xilinx_ttc.c b/drivers/clocksource/xilinx_ttc.c
index ff38b3e..a4718f7 100644
--- a/drivers/clocksource/xilinx_ttc.c
+++ b/drivers/clocksource/xilinx_ttc.c
@@ -23,30 +23,14 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 
-#define IRQ_TIMERCOUNTER0  42
-
-/*
- * This driver configures the 2 16-bit count-up timers as follows:
- *
- * T1: Timer 1, clocksource for generic timekeeping
- * T2: Timer 2, clockevent source for hrtimers
- * T3: Timer 3, 
- *
- * The input frequency to the timer module for emulation is 2.5MHz which is
- * common to all the timer channels (T1, T2, and T3). With a pre-scaler of 32,
- * the timers are clocked at 78.125KHz (12.8 us resolution).
- *
- * The input frequency to the timer module in silicon will be 200MHz. With the
- * pre-scaler of 32, the timers are clocked at 6.25MHz (160ns resolution).
- */
-#define XTTCPSS_CLOCKSOURCE    0   /* Timer 1 as a generic timekeeping */
-#define XTTCPSS_CLOCKEVENT     1   /* Timer 2 as a clock event */
-
-#define XTTCPSS_TIMER_BASE         TTC0_BASE
-#define XTTCPCC_EVENT_TIMER_IRQ    (IRQ_TIMERCOUNTER0 + 1)
 /*
  * Timer Register Offset Definitions of Timer 1, Increment base address by 4
  * and use same offsets for Timer 2
@@ -63,9 +47,14 @@
 
 #define XTTCPSS_CNT_CNTRL_DISABLE_MASK 0x1
 
-/* Setup the timers to use pre-scaling */
-
-#define TIMER_RATE (PERIPHERAL_CLOCK_RATE / 32)
+/* Setup the timers to use pre-scaling, using a fixed value for now that will
+ * work across most input frequency, but it may need to be more dynamic
+ */
+#define PRESCALE_EXPONENT  11  /* 2 ^
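
The figures quoted in the removed comment above (2.5MHz or 200MHz input,
pre-scaler of 32) can be checked with a small userspace sketch.  The helper
names are hypothetical and are not part of the driver; the sketch only
reproduces the divide-by-32 numbers from the old comment, not the new
PRESCALE_EXPONENT logic, which is cut off in this archive.

```c
#include <assert.h>
#include <stdint.h>

/* Rate seen by the counter after the pre-scaler divides the input clock. */
uint64_t prescaled_rate_hz(uint64_t input_hz, unsigned int divisor)
{
	assert(divisor != 0);
	return input_hz / divisor;
}

/* Length of one counter tick in nanoseconds (integer division, truncating). */
uint64_t tick_period_ns(uint64_t rate_hz)
{
	assert(rate_hz != 0);
	return 1000000000ULL / rate_hz;
}
```

With a 2.5MHz input, prescaled_rate_hz(2500000, 32) gives 78125 Hz and
tick_period_ns(78125) gives 12800 ns, i.e. the 12.8 us in the comment; a
200MHz input gives 6.25MHz and 160 ns.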

[PATCH 3/8] ARM: zynq: dts: add description of the second uart

2012-10-31 Thread Josh Cartwright

The zynq-7000 has an additional UART at 0xE0001000.  Describe it in the
device tree.

Signed-off-by: Josh Cartwright 
---
 arch/arm/boot/dts/zynq-ep107.dts | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/boot/dts/zynq-ep107.dts b/arch/arm/boot/dts/zynq-ep107.dts
index 574bc04..5caf100 100644
--- a/arch/arm/boot/dts/zynq-ep107.dts
+++ b/arch/arm/boot/dts/zynq-ep107.dts
@@ -59,5 +59,12 @@
interrupts = <0 27 4>;
clock = <5000>;
};
+
+   uart1: uart@e0001000 {
+   compatible = "xlnx,xuartps";
+   reg = <0xE0001000 0x1000>;
+   interrupts = <0 50 4>;
+   clock = <5000>;
+   };
};
 };
-- 
1.8.0



Re: [RFC v2] Support volatile range for anon vma

2012-10-31 Thread KOSAKI Motohiro

>> > Allocator should call madvise(MADV_NOVOLATILE) before reusing for
>> > allocating that area to user. Otherwise, accessing of volatile range
>> > will meet SIGBUS error.
>>
>> Well, why?  It would be easy enough for the fault handler to give
>> userspace a new, zeroed page at that address.
>
> Note: MADV_DONTNEED already has this (nice) property.

I don't think I fully understand this patch, but maybe I can answer why
userland and malloc folks don't like MADV_DONTNEED.

glibc malloc discards freed memory by using MADV_DONTNEED, as tcmalloc
does, and it is often a source of large performance decreases. Because
MADV_DONTNEED discards memory immediately, the first access right after
the next malloc() call falls into the page fault and page-size memset()
path. So using DONTNEED increases the zero fill and cache miss rates.
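
To make that cost concrete, here is a minimal userspace sketch (Linux only,
assuming a private anonymous mapping) of the behaviour described above: after
MADV_DONTNEED the old contents are gone, and the next touch faults in a fresh
zero-filled page. The function name is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a dirtied private anonymous page reads back as all zeros
 * after MADV_DONTNEED, 0 if any byte survived, -1 if setup failed.
 */
int dontneed_zeroes_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t i;
	int zeroed = 1;

	assert(page > 0);

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Dirty the page, as a freed malloc chunk would be. */
	memset(p, 0xAA, (size_t)page);

	/* Discard it immediately; the mapping itself stays valid. */
	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* This read faults and is served from a zero-filled page. */
	for (i = 0; i < (size_t)page; i++)
		if (p[i] != 0)
			zeroed = 0;

	munmap(p, (size_t)page);
	return zeroed;
}
```

The fault and zero fill on the read loop is the overhead the allocator pays
when it reuses the range shortly after discarding it.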

At free() time, malloc has no knowledge of when the next big malloc()
will be called, so immediate discarding may or may not give a good
performance gain. (Ah, ok, the rate is not 5:5; usually it is worth it,
but not every time.)


In the past, several developers tried to avoid this situation, e.g. by

- making a zero page daemon to avoid the page-size zero fill at page fault
- making new vma or page flags to mark memory as discardable w/o swap, and
  having vmscan treat it accordingly (like this and/or MADV_FREE)
- making a new process option to avoid page zero fill in the page fault path
  (yes, it is a big incompatibility and insecure, but some embedded folks
   thought that was an acceptable downside)
- etc


btw, I'm not sure this patch is better for malloc, because the current
MADV_DONTNEED doesn't need mmap_sem and works very effectively when there
are a lot of threads. Taking mmap_sem might bring worse performance than
DONTNEED. dunno.

Re: [PATCH 1/2] x86: Provide a comment in uapi/asm/hw_breakpoint.h

2012-10-31 Thread David Howells

Ingo Molnar  wrote:

> Just wanted to inquire about the current status of it, as I'd 
> rather not pull anything that introduces breakages and is still 
> work in progress. Once it's all sorted out I'll have a look.

Should I pull the x86 disintegration and perf fixes onto a tree derived from
tipbot since there are a lot of perf changes in there?

David

Re: [PATCH 16/17 v2] f2fs: move proc files to debugfs

2012-10-31 Thread 'Greg KH'

On Thu, Nov 01, 2012 at 07:38:12AM +0900, Jaegeuk Kim wrote:
> From: Greg Kroah-Hartman 
> 
> This moves all of the f2fs debugging files into debugfs. The files are
> located in /sys/kernel/debug/f2fs/
> 
> Note, I think we are generating all of the same information in each of
> the files for every unique f2fs filesystem in the machine.  This copies
> the functionality that was present in the proc files, but this should be
> fixed up in the future.
> 
> Signed-off-by: Greg Kroah-Hartman 
> [jaegeuk@samsung.com: merged 3 debugfs entries into a *status* entry]
> Signed-off-by: Jaegeuk Kim 

Thanks for the change.

As for the merge, that looks good to me.

greg k-h

Re: [PATCH 3/3] checkpatch: Emit a warning when decimal values are used

2012-10-31 Thread Andrew Morton

On Wed, 31 Oct 2012 11:37:03 +0100
Pavel Machek  wrote:

> Hi!
> 
> > Linux kernel doesn't like decimals, say so.
> 
> ?!
> 
> Linux surely supports decimal constants, like "100". Did you mean
> "octal"?
> 
> If you wanted to add warning for something... I never want to see
> 
> #define CRAPPY_EMBEDDED_REGISTER ((0x1) << (0))
> 
> again

Joe means floating point.  I suggest that the patchset be reworked,
using s/decimal/float/g.

The kernel does have floating point constants, in various graphics
drivers, iirc.  They are used in places where the floatiness gets
handled at compilation time.  Along the lines of:

int foo = 1.1 * 2.2;

And I suppose that's an OK thing to do.  We could instead do

int foo = 2;    /* 1.1 * 2.2 */

but that's taking away a programmer convenience for no good reason. 
It would be highly inconvenient if the "1.1" was in fact a #define in
some other file, or a Kconfig string.
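
A minimal userspace sketch of that pattern, assuming an ordinary hosted C
compiler: the floating point expression appears only in the initializer of
an object with static storage, which C requires to be a constant expression,
so it is evaluated at build time and only the truncated integer is stored.
SCALE, foo's helper functions and the range check are hypothetical names
chosen for the example.

```c
#include <assert.h>

/* Could just as well come from another header or a Kconfig-derived define. */
#define SCALE 1.1

/* 1.1 * 2.2 is roughly 2.42; the conversion to int truncates it to 2. */
static const int foo = SCALE * 2.2;

int folded_value(void)
{
	return foo;
}

/* Run-time users only ever see the integer result. */
int scale_by_foo(int x)
{
	assert(x >= 0 && x <= 1000);	/* keeps the demo free of overflow */
	return x * foo;
}
```

Here folded_value() returns 2, the same value as the hand-written
"int foo = 2;" variant, but the source still shows where the number came
from.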

That being said, I guess it's a worthwhile thing for checkpatch to warn
about.  Hopefully the programmer will say "well thanks, but I meant to
do that".

A much better solution would be to arrange for the kernel to fail to
compile (or to fail to link) if floats are used.  That way, people
could continue to use floats within their compile-time scalar
expressions without getting harassed by checkpatch.  But I don't know
how to arrange this.

hm.

[RFC virtio-next 1/4] virtio: Move definitions to header file vring.h

2012-10-31 Thread Sjur Brændeland

From: Sjur Brændeland 

Move the vring_virtqueue structure, memory barrier and debug
macros out from virtio_ring.c to the new header file vring.h.
This is done in order to allow other kernel modules to access the
virtio internal data-structures.
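
As a rough illustration of why the structure needs to be visible: code that
is handed a struct virtqueue pointer can only reach the surrounding ring
state if it knows the layout of the enclosing structure, which is what
to_vvq() does via container_of().  The userspace sketch below uses cut-down,
hypothetical demo_* types and a stand-in macro, not the real kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The public handle that is passed around. */
struct demo_virtqueue {
	const char *name;
};

/* The private wrapper; the handle is deliberately not the first member
 * here, so that the offset subtraction actually matters in this demo. */
struct demo_vring_virtqueue {
	uint16_t last_used_idx;
	struct demo_virtqueue vq;
};

#define demo_to_vvq(_vq) \
	demo_container_of(_vq, struct demo_vring_virtqueue, vq)

/* Reach private ring state starting from the public handle. */
uint16_t demo_last_used_idx(struct demo_virtqueue *vq)
{
	assert(vq != NULL);
	return demo_to_vvq(vq)->last_used_idx;
}
```

Without the structure definition in a shared header, a second module has no
way to perform the equivalent of demo_to_vvq().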

Signed-off-by: Sjur Brændeland 
---
This patch triggers a couple of checkpatch warnings, but I've
chosen to do a clean copy and not do any corrections.

 drivers/virtio/virtio_ring.c |   96 +
 drivers/virtio/vring.h   |  121 ++
 2 files changed, 122 insertions(+), 95 deletions(-)
 create mode 100644 drivers/virtio/vring.h

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..9027af6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,101 +23,7 @@
 #include 
 #include 
 #include 
-
-/* virtio guest is communicating with a virtual "device" that actually runs on
- * a host processor.  Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
-/* Where possible, use SMP barriers which are more lightweight than mandatory
- * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio-pci does not use). */
-#define virtio_mb(vq) \
-   do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
-#define virtio_rmb(vq) \
-   do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
-#define virtio_wmb(vq) \
-   do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
-#else
-/* We must force memory ordering even if guest is UP since host could be
- * running on another CPU, but SMP barriers are defined to barrier() in that
- * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb(vq) mb()
-#define virtio_rmb(vq) rmb()
-#define virtio_wmb(vq) wmb()
-#endif
-
-#ifdef DEBUG
-/* For development, we want to crash whenever the ring is screwed. */
-#define BAD_RING(_vq, fmt, args...)\
-   do {\
-   dev_err(&(_vq)->vq.vdev->dev,   \
-   "%s:"fmt, (_vq)->vq.name, ##args);  \
-   BUG();  \
-   } while (0)
-/* Caller is supposed to guarantee no reentry. */
-#define START_USE(_vq) \
-   do {\
-   if ((_vq)->in_use)  \
-   panic("%s:in_use = %i\n",   \
- (_vq)->vq.name, (_vq)->in_use);   \
-   (_vq)->in_use = __LINE__;   \
-   } while (0)
-#define END_USE(_vq) \
-   do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
-#else
-#define BAD_RING(_vq, fmt, args...)\
-   do {\
-   dev_err(&_vq->vq.vdev->dev, \
-   "%s:"fmt, (_vq)->vq.name, ##args);  \
-   (_vq)->broken = true;   \
-   } while (0)
-#define START_USE(vq)
-#define END_USE(vq)
-#endif
-
-struct vring_virtqueue
-{
-   struct virtqueue vq;
-
-   /* Actual memory layout for this queue */
-   struct vring vring;
-
-   /* Can we use weak barriers? */
-   bool weak_barriers;
-
-   /* Other side has made a mess, don't try any more. */
-   bool broken;
-
-   /* Host supports indirect buffers */
-   bool indirect;
-
-   /* Host publishes avail event idx */
-   bool event;
-
-   /* Head of free buffer list. */
-   unsigned int free_head;
-   /* Number we've added since last sync. */
-   unsigned int num_added;
-
-   /* Last used index we've seen. */
-   u16 last_used_idx;
-
-   /* How to notify other side. FIXME: commonalize hcalls! */
-   void (*notify)(struct virtqueue *vq);
-
-#ifdef DEBUG
-   /* They're supposed to lock for us. */
-   unsigned int in_use;
-
-   /* Figure out if their kicks are too delayed. */
-   bool last_add_time_valid;
-   ktime_t last_add_time;
-#endif
-
-   /* Tokens for callbacks. */
-   void *data[];
-};
-
-#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+#include "vring.h"
 
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
new file mode 100644
index 000..b997fc3
--- /dev/null
+++ b/drivers/virtio/vring.h
@@ -0,0 +1,121 @@
+/* Virtio ring implementation.
+ *
+ *  Copyright 2007 Rusty Russell IBM Corporation
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Softwa

[RFC virtio-next 2/4] include/vring.h: Add support for reversed virtio rings.

2012-10-31 Thread Sjur Brændeland

From: Sjur Brændeland 

Add the last available index to the vring_virtqueue structure.
This is done to prepare for the implementation of the reversed vring.

Signed-off-by: Sjur Brændeland 
---
 drivers/virtio/vring.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
index b997fc3..3b53961 100644
--- a/drivers/virtio/vring.h
+++ b/drivers/virtio/vring.h
@@ -51,6 +51,9 @@ struct vring_virtqueue
/* Last used index we've seen. */
u16 last_used_idx;
 
+   /* Last avail index seen. NOTE: Only used for reversed rings.*/
+   u16 last_avail_idx;
+
/* How to notify other side. FIXME: commonalize hcalls! */
void (*notify)(struct virtqueue *vq);
 
-- 
1.7.9.5


[RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio

2012-10-31 Thread Sjur Brændeland

From: Sjur Brændeland 

Add the CAIF Virtio Link layer, used for communicating with a
modem over shared memory. Virtio is used as the transport mechanism.

In the TX direction the virtio rings are used in the normal fashion,
sending data in the available ring. But in the RX direction we have
flipped the direction of the virtio ring, and implemented the virtio
access functions similarly to what is found in vhost.c.

CAIF also uses the virtio configuration space for getting
configuration parameters such as headroom, tailroom etc.

Signed-off-by: Vikram ARV 
Signed-off-by: Sjur Brændeland 
---
 drivers/net/caif/Kconfig|9 +
 drivers/net/caif/Makefile   |3 +
 drivers/net/caif/caif_virtio.c  |  627 +++
 include/uapi/linux/virtio_ids.h |1 +
 4 files changed, 640 insertions(+)
 create mode 100644 drivers/net/caif/caif_virtio.c

diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index abf4d7a..a01f617 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -47,3 +47,12 @@ config CAIF_HSI
The caif low level driver for CAIF over HSI.
Be aware that if you enable this then you also need to
enable a low-level HSI driver.
+
+config CAIF_VIRTIO
+   tristate "CAIF virtio transport driver"
+   default n
+   depends on CAIF
+   depends on REMOTEPROC
+   select VIRTIO
+   ---help---
+   The caif driver for CAIF over Virtio.
diff --git a/drivers/net/caif/Makefile b/drivers/net/caif/Makefile
index 91dff86..d9ee26a 100644
--- a/drivers/net/caif/Makefile
+++ b/drivers/net/caif/Makefile
@@ -13,3 +13,6 @@ obj-$(CONFIG_CAIF_SHM) += caif_shm.o
 
 # HSI interface
 obj-$(CONFIG_CAIF_HSI) += caif_hsi.o
+
+# Virtio interface
+obj-$(CONFIG_CAIF_VIRTIO) += caif_virtio.o
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
new file mode 100644
index 000..e50940f
--- /dev/null
+++ b/drivers/net/caif/caif_virtio.c
@@ -0,0 +1,627 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Contact: Sjur Brendeland / sjur.brandel...@stericsson.com
+ * Authors: Vicram Arv / vikram@stericsson.com,
+ * Dmitry Tarnyagin / dmitry.tarnya...@stericsson.com
+ * Sjur Brendeland / sjur.brandel...@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":" fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "../drivers/virtio/vring.h"
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Vicram Arv ");
+MODULE_DESCRIPTION("Virtio CAIF Driver");
+
+/*
+ * struct cfv_info - Caif Virtio control structure
+ * @cfdev: caif common header
+ * @vdev:  Associated virtio device
+ * @vq_rx: rx/downlink virtqueue
+ * @vq_tx: tx/uplink virtqueue
+ * @ndev:  associated netdevice
+ * @queued_tx: number of buffers queued in the tx virtqueue
+ * @watermark_tx: indicates number of buffers the tx queue
+ * should shrink to in order to unblock the datapath
+ * @tx_lock:   protects vq_tx to allow concurrent senders
+ * @tx_hr: transmit headroom
+ * @rx_hr: receive headroom
+ * @tx_tr: transmit tailroom
+ * @rx_tr: receive tailroom
+ * @mtu:   transmit max size
+ * @mru:   receive max size
+ */
+struct cfv_info {
+   struct caif_dev_common cfdev;
+   struct virtio_device *vdev;
+   struct virtqueue *vq_rx;
+   struct virtqueue *vq_tx;
+   struct net_device *ndev;
+   unsigned int queued_tx;
+   unsigned int watermark_tx;
+   /* Protect access to vq_tx */
+   spinlock_t tx_lock;
+   /* Copied from Virtio config space */
+   u16 tx_hr;
+   u16 rx_hr;
+   u16 tx_tr;
+   u16 rx_tr;
+   u32 mtu;
+   u32 mru;
+};
+
+/*
+ * struct token_info - maintains Transmit buffer data handle
+ * @size:  size of transmit buffer
+ * @dma_handle: handle to allocated dma device memory area
+ * @vaddr: virtual address mapping to allocated memory area
+ */
+struct token_info {
+   size_t size;
+   u8 *vaddr;
+   dma_addr_t dma_handle;
+};
+
+/* Default if virtio config space is unavailable */
+#define CFV_DEF_MTU_SIZE 4096
+#define CFV_DEF_HEADROOM 16
+#define CFV_DEF_TAILROOM 16
+
+/* Require IP header to be 4-byte aligned. */
+#define IP_HDR_ALIGN 4
+
+/*
+ * virtqueue_next_avail_desc - get the next available descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor in the ring
+ *
+ * Look for the next available descriptor in the available ring.
+ * Return NULL if there is nothing new in the available ring.
+ */
+static struct vring_desc *virtqueue_next_avail_desc(struct virtqueue *_vq,
+   int *head)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   u16 avail_idx, hd, last_avail_idx = vq->last_avail_idx;
+
+   START_USE(vq);
+
+   if (unlikely(vq->broken))
+   goto

[RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty

2012-10-31 Thread Sjur Brændeland

From: Sjur Brændeland 

Add an option to force the callback function to be called even if
the used ring is empty. This is needed for reversed vrings.
Add a helper function __vring_interrupt with an extra boolean
argument that forces the callback when the interrupt is called.
The original vring_interrupt semantics and signature are
preserved.
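
The pattern can be modelled in user space roughly as follows. This is only a sketch of the wrapper idea, not the kernel code: struct model_vq, enum irq_result and the model_* functions are hypothetical stand-ins for the virtqueue, irqreturn_t and the two interrupt entry points.

```c
#include <stdbool.h>

/* Stand-in for irqreturn_t. */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

/* Stand-in for a virtqueue: only what the example needs. */
struct model_vq {
	unsigned int used_pending;  /* entries waiting in the used ring */
	unsigned int callbacks_run; /* how often the callback was invoked */
};

/*
 * Extended entry point: with force set, the callback runs even when
 * the used ring is empty (the case a reversed ring depends on).
 */
static enum irq_result model_interrupt_forced(struct model_vq *vq, bool force)
{
	if (!force && vq->used_pending == 0)
		return IRQ_RESULT_NONE; /* no work and not forced */

	vq->callbacks_run++; /* models calling the virtqueue callback */
	return IRQ_RESULT_HANDLED;
}

/* Original entry point keeps its signature and behaviour. */
static enum irq_result model_interrupt(struct model_vq *vq)
{
	return model_interrupt_forced(vq, false);
}
```

Keeping the old function as a thin inline wrapper means existing callers need no changes; only a transport that wants the forced behaviour calls the extended variant directly.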

Signed-off-by: Sjur Brændeland 
---
 drivers/remoteproc/remoteproc_virtio.c |    2 +-
 drivers/virtio/virtio_ring.c           |    6 +++---
 include/linux/virtio_ring.h            |    8 +++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index e7a4780..ddde863 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -63,7 +63,7 @@ irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int notifyid)
if (!rvring || !rvring->vq)
return IRQ_NONE;
 
-   return vring_interrupt(0, rvring->vq);
+   return __vring_interrupt(0, rvring->vq, true);
 }
 EXPORT_SYMBOL(rproc_vq_interrupt);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9027af6..af85034 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -504,11 +504,11 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
-irqreturn_t vring_interrupt(int irq, void *_vq)
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
-   if (!more_used(vq)) {
+   if (!force && !more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
}
@@ -522,7 +522,7 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 
return IRQ_HANDLED;
 }
-EXPORT_SYMBOL_GPL(vring_interrupt);
+EXPORT_SYMBOL_GPL(__vring_interrupt);
 
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 63c6ea1..ccb7915 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -20,5 +20,11 @@ void vring_del_virtqueue(struct virtqueue *vq);
 /* Filter out transport-specific feature bits. */
 void vring_transport_features(struct virtio_device *vdev);
 
-irqreturn_t vring_interrupt(int irq, void *_vq);
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force);
+
+static inline irqreturn_t vring_interrupt(int irq, void *_vq)
+{
+   return __vring_interrupt(irq, _vq, false);
+}
+
 #endif /* _LINUX_VIRTIO_RING_H */
-- 
1.7.9.5
