date:20140313

[PATCH] pinctrl: msm: fix up out-of-order merge conflict

2014-03-13 Thread Linus Walleij

Commit 051a58b4622f0e1b732acb750097c64bc00ddb93
"pinctrl: msm: Simplify msm_config_reg() and callers"
removed the local "reg" variable in the msm_config_reg()
function, but the earlier
commit ed118a5fd951bd2def8249ee251842c4f81fe4bd
"pinctrl-msm: Support output-{high,low} configuration"
introduced a new switch clause using it.

Fix this up by removing the offending register assignment.

Reported-by: Kbuild test robot 
Cc: Stephen Boyd 
Cc: Bjorn Andersson 
Signed-off-by: Linus Walleij 
---
 drivers/pinctrl/pinctrl-msm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/pinctrl/pinctrl-msm.c b/drivers/pinctrl/pinctrl-msm.c
index 19d2feb0674f..343f421c7696 100644
--- a/drivers/pinctrl/pinctrl-msm.c
+++ b/drivers/pinctrl/pinctrl-msm.c
@@ -215,7 +215,6 @@ static int msm_config_reg(struct msm_pinctrl *pctrl,
*mask = 7;
break;
case PIN_CONFIG_OUTPUT:
-   *reg = g->ctl_reg;
*bit = g->oe_bit;
*mask = 1;
break;
-- 
1.8.5.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
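
A minimal sketch of the out-parameter pattern this fix restores, for
readers who do not have the driver source at hand. The enum, the struct
and the drv_bit field are simplified stand-ins chosen for illustration,
not the driver's real definitions; only the mask values and the oe_bit
name come from the diff above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down stand-ins for the driver's types. */
enum pin_param { PARAM_DRIVE_STRENGTH, PARAM_OUTPUT, PARAM_UNSUPPORTED };

struct pingroup {
	unsigned int drv_bit;	/* bit offset of the drive-strength field */
	unsigned int oe_bit;	/* bit offset of the output-enable field */
};

/*
 * After the simplification there is no "reg" out-parameter any more:
 * the function only reports which bit field a parameter lives in.
 */
static int config_reg(const struct pingroup *g, enum pin_param param,
		      unsigned int *mask, unsigned int *bit)
{
	switch (param) {
	case PARAM_DRIVE_STRENGTH:
		*bit = g->drv_bit;
		*mask = 7;
		break;
	case PARAM_OUTPUT:
		/* no "*reg = ..." here: that variable no longer exists */
		*bit = g->oe_bit;
		*mask = 1;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

A clause that still wrote through the removed "reg" pointer would simply
fail to compile, which is what the test robot reported.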

Re: [PATCH v2] scsi: Change sense buffer size to 252

2014-03-13 Thread Hannes Reinecke

On 03/14/2014 07:00 AM, Fam Zheng wrote:
> According to SPC-4, section 4.5.2.1, 252 is the limit of sense data. So
> increase the values.
> 
> Tested by hacking QEMU to fake virtio-scsi request sense len to 252.
> Without this patch the driver stops working immediately when it gets the
> request.
> 
> Signed-off-by: Fam Zheng 
> ---
>  include/linux/virtio_scsi.h | 2 +-
>  include/scsi/scsi_cmnd.h| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
> index 4195b97..a437f7f 100644
> --- a/include/linux/virtio_scsi.h
> +++ b/include/linux/virtio_scsi.h
> @@ -28,7 +28,7 @@
>  #define _LINUX_VIRTIO_SCSI_H
>  
>  #define VIRTIO_SCSI_CDB_SIZE   32
> -#define VIRTIO_SCSI_SENSE_SIZE 96
> +#define VIRTIO_SCSI_SENSE_SIZE 252
>  
>  /* SCSI command request, followed by data-out */
>  struct virtio_scsi_cmd_req {
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index 91558a1..a64dac03 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -104,7 +104,7 @@ struct scsi_cmnd {
>   struct request *request;/* The command we are
>  working on */
>  
> -#define SCSI_SENSE_BUFFERSIZE96
> +#define SCSI_SENSE_BUFFERSIZE252
>   unsigned char *sense_buffer;
>   /* obtained by REQUEST SENSE when
>* CHECK CONDITION is received on original
> 
Not without careful review.
Blindly increasing the buffersize is not a good idea; this define is
used at several locations and even within the drivers themselves.
So we cannot just increase the define for the SCSI stack.

And, btw, so far I haven't come across any issue where a sense
buffer overflow occurred. We first need to implement proper sense
code handling (descriptor sense parsing etc.) before we need to worry
about this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
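
As an illustration of the "descriptor sense parsing" mentioned above, here
is a minimal userspace sketch that pulls the sense key, ASC and ASCQ out
of either sense data format. The struct and function names are made up
for this example and are not kernel API; the byte offsets follow the
fixed (response codes 0x70/0x71) and descriptor (0x72/0x73) layouts
defined by SPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sense_info {
	uint8_t key;	/* sense key */
	uint8_t asc;	/* additional sense code */
	uint8_t ascq;	/* additional sense code qualifier */
};

/*
 * Decode key/ASC/ASCQ from raw sense data.
 * Returns 0 on success, -1 if the buffer is too short or the
 * response code is not one of 0x70..0x73.
 */
static int decode_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
	if (len < 1)
		return -1;

	switch (buf[0] & 0x7f) {
	case 0x70:	/* fixed format, current error */
	case 0x71:	/* fixed format, deferred error */
		if (len < 14)
			return -1;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return 0;
	case 0x72:	/* descriptor format, current error */
	case 0x73:	/* descriptor format, deferred error */
		if (len < 8)	/* require the full 8-byte header */
			return -1;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return 0;
	default:
		return -1;
	}
}
```

The sketch deliberately stops at the header; walking the individual
descriptors that follow it is the part that makes a larger buffer
relevant.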

Re: [sched/balance] INFO: possible recursive locking detected

2014-03-13 Thread Fengguang Wu

On Fri, Mar 14, 2014 at 02:26:24PM +0800, Alex Shi wrote:
> On 03/14/2014 02:16 PM, Fengguang Wu wrote:
> > Alex,
> > 
> > Here are the test results for branch alexshi/single-balance
> 
> 
> Thanks a lot! Fengguang.
> Is the nex04 machine 4P*8 core * HT? and are a04/a06 atom box?

nex04 is Nehalem-EX. a04/06 are Atom servers.

> Would you like to share the machine info for nhm8, nhm-white and xps2?

They are all 1S Nehalem PC.

Thanks,
Fengguang

Re: [PATCH] i2c-davinci: Handle signals gracefully

2014-03-13 Thread Mike Looijmans


On 03/10/2014 04:24 PM, Wolfram Sang wrote:
>>> Even more, you should complete the whole transfer. There are devices
>>> where things can really go wrong if you send a half-complete command and
>>> then start with the next one. So, not checking signals at all is the way
>>> to go for I2C drivers. There is some cruft left, so I am happy about
>>> patches fixing that, with testing on real HW. Like yours here.
>>
>> I agree.
>>
>> I know the Zynq (using a cadence controller) also lets signals
>> interrupt I2C transfers, so I'll propose a patch to Xilinx and CC to
>> you and linux-i2c to completely remove signal handling from that
>> driver as well.
>
> Cool, thanks!
>
> Are you going to update the davinci patch as well?

An amended patch is on its way now. I forgot to set the subject to "PATCHv2"
though.

Mike.



Met vriendelijke groet / kind regards,

Mike Looijmans

TOPIC Embedded Systems
Eindhovenseweg 32-C, NL-5683 KH Best
Postbus 440, NL-5680 AK Best
Telefoon: (+31) (0) 499 33 69 79
Telefax:  (+31) (0) 499 33 69 70
E-mail: mike.looijm...@topic.nl
Website: www.topic.nl

Please consider the environment before printing this e-mail

[PATCH] i2c-davinci: Handle signals gracefully

2014-03-13 Thread Mike Looijmans

When a signal is caught while the i2c-davinci bus driver is transferring,
the driver just "abandons" the transfer and leaves the controller to fend
for itself. The next I2C transaction will find the controller in an
undefined state and often results in a stream of "initiating i2c bus recovery"
messages until the controller arrives in a defined state. This behaviour
also sends out "half" or possibly even mixed messages to I2C client
devices which may put them in an undesired state as well.

This patch fixes this issue by always attempting to finish the current
transaction, and only abort on bus errors. This keeps the controller in a
defined state, and is also much friendlier towards client devices, because
it will only send complete messages.

Before this patch, reading an I2C device in a loop and interrupting it
often resulted in an "initiating i2c bus recovery" storm and not being
able to communicate via I2C for several seconds. With this patch, I2C
transactions will not be interrupted or otherwise halfway completed.

v2: Completely ignore signals.

Signed-off-by: Mike Looijmans 
---
 drivers/i2c/busses/i2c-davinci.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-davinci.c b/drivers/i2c/busses/i2c-davinci.c
index af0b583..254d897 100644
--- a/drivers/i2c/busses/i2c-davinci.c
+++ b/drivers/i2c/busses/i2c-davinci.c
@@ -372,9 +372,9 @@ i2c_davinci_xfer_msg(struct i2c_adapter *adap, struct i2c_msg *msg, int stop)
flag |= DAVINCI_I2C_MDR_STP;
davinci_i2c_write_reg(dev, DAVINCI_I2C_MDR_REG, flag);
 
-   r = wait_for_completion_interruptible_timeout(&dev->cmd_complete,
+   r = wait_for_completion_timeout(&dev->cmd_complete,
  dev->adapter.timeout);
-   if (r == 0) {
+   if (unlikely(r == 0)) {
dev_err(dev->dev, "controller timed out\n");
davinci_i2c_recover_bus(dev);
i2c_davinci_init(dev);
@@ -384,7 +384,6 @@ i2c_davinci_xfer_msg(struct i2c_adapter *adap, struct i2c_msg *msg, int stop)
if (dev->buf_len) {
/* This should be 0 if all bytes were transferred
 * or dev->cmd_err denotes an error.
-* A signal may have aborted the transfer.
 */
if (r >= 0) {
dev_err(dev->dev, "abnormal termination buf_len=%i\n",
@@ -436,22 +435,24 @@ i2c_davinci_xfer(struct i2c_adapter *adap, struct i2c_msg msgs[], int num)
ret = i2c_davinci_wait_bus_not_busy(dev, 1);
if (ret < 0) {
dev_warn(dev->dev, "timeout waiting for bus ready\n");
-   return ret;
+   goto error;
}
 
for (i = 0; i < num; i++) {
ret = i2c_davinci_xfer_msg(adap, &msgs[i], (i == (num - 1)));
-   dev_dbg(dev->dev, "%s [%d/%d] ret: %d\n", __func__, i + 1, num,
-   ret);
+   dev_dbg(dev->dev, "%s [%d/%d] %#x ret: %d\n", __func__, i + 1,
+   num, msgs[i].addr, ret);
if (ret < 0)
-   return ret;
+   goto error;
}
+   ret = num;
 
+error:
 #ifdef CONFIG_CPU_FREQ
complete(&dev->xfr_complete);
 #endif
 
-   return num;
+   return ret;
 }
 
 static u32 i2c_davinci_func(struct i2c_adapter *adap)
-- 
1.7.9.5
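
The last hunk of the patch turns the early returns in i2c_davinci_xfer()
into jumps to a common exit label, so the completion is signalled on
error paths too. A minimal userspace sketch of that single-exit pattern
follows; the counter, the callback type and all function names are
hypothetical stand-ins, not driver code.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's completion bookkeeping. */
static int completions_signalled;

static void signal_complete(void)
{
	completions_signalled++;
}

/*
 * Transfer "num" messages; xfer_one() returns a negative value on error.
 * Every exit path funnels through the "out" label, so signal_complete()
 * runs whether the transfer succeeded or not.
 */
static int xfer_all(int (*xfer_one)(int idx), int num)
{
	int ret = 0;
	int i;

	for (i = 0; i < num; i++) {
		ret = xfer_one(i);
		if (ret < 0)
			goto out;
	}
	ret = num;

out:
	signal_complete();
	return ret;
}

/* Two toy message handlers for trying the sketch out. */
static int always_ok(int idx)
{
	(void)idx;
	return 0;
}

static int fail_on_second(int idx)
{
	return idx == 1 ? -5 : 0;
}
```

With an early "return ret" inside the loop instead of the goto, the
failing case would skip signal_complete() entirely.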

[PATCH 1/2] cpufreq: arm_big_little: make vexpress driver dependent on bL core driver

2014-03-13 Thread Viresh Kumar

Currently the vexpress big LITTLE driver selects ARM_BIG_LITTLE_CPUFREQ, so if
CONFIG_BIG_LITTLE isn't enabled and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ is enabled
we get the warning below while compiling:

warning: (ARM_VEXPRESS_SPC_CPUFREQ) selects ARM_BIG_LITTLE_CPUFREQ which has
unmet direct dependencies (ARCH_HAS_CPUFREQ && CPU_FREQ && (ARM || ARM64) && ARM
&& BIG_LITTLE && ARM_CPU_TOPOLOGY && HAVE_CLK)

To fix this, make ARM_VEXPRESS_SPC_CPUFREQ depend on ARM_BIG_LITTLE_CPUFREQ
instead of selecting it.

This also moves the entry for ARM_VEXPRESS_SPC_CPUFREQ next to the other big
LITTLE config entries.

Signed-off-by: Viresh Kumar 
---
Hi Rafael,

Both of these are fixes; please see if they can still make it into 3.14.

Thanks.

 drivers/cpufreq/Kconfig.arm | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 3129749..9fb6270 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -2,6 +2,7 @@
 # ARM CPU Frequency scaling drivers
 #
 
+# big LITTLE core layer and glue drivers
 config ARM_BIG_LITTLE_CPUFREQ
tristate "Generic ARM big LITTLE CPUfreq driver"
depends on ARM && BIG_LITTLE && ARM_CPU_TOPOLOGY && HAVE_CLK
@@ -16,6 +17,14 @@ config ARM_DT_BL_CPUFREQ
  This enables probing via DT for Generic CPUfreq driver for ARM
  big.LITTLE platform. This gets frequency tables from DT.
 
+config ARM_VEXPRESS_SPC_CPUFREQ
+tristate "Versatile Express SPC based CPUfreq driver"
+   depends on ARM_BIG_LITTLE_CPUFREQ && ARCH_VEXPRESS_SPC
+help
+  This add the CPUfreq driver support for Versatile Express
+ big.LITTLE platforms using SPC for power management.
+
+
 config ARM_EXYNOS_CPUFREQ
bool
 
@@ -241,11 +250,3 @@ config ARM_TEGRA_CPUFREQ
default y
help
  This adds the CPUFreq driver support for TEGRA SOCs.
-
-config ARM_VEXPRESS_SPC_CPUFREQ
-tristate "Versatile Express SPC based CPUfreq driver"
-select ARM_BIG_LITTLE_CPUFREQ
-depends on ARCH_VEXPRESS_SPC
-help
-  This add the CPUfreq driver support for Versatile Express
- big.LITTLE platforms using SPC for power management.
-- 
1.7.12.rc2.18.g61b472e

[PATCH 2/2] cpufreq: arm_big_little: set 'physical_cluster' for each cpu

2014-03-13 Thread Viresh Kumar

We have a per-cpu variable that records which cluster a CPU belongs to.
Currently physical_cluster is set only for policy->cpu, which results in the
following on some SoCs:

- There are two clusters:
  - Cluster 0 has four ARM Cortex A7 CPUs (slower ones): 0,1,2,3
  - Cluster 1 has four ARM Cortex A15 CPUs (faster ones): 4,5,6,7
- CPUs are booted in order 0,1..7, so initially policy->cpu for the A7 cluster
  is 0 and for the A15 cluster it is 4.
- Now CPU4 (i.e. A15_0) is hotplugged out, so policy->cpu for the A15 cluster
  becomes 5 (i.e. A15_1).
- But physical_cluster is only set for CPU0 and CPU4 in the ARM big LITTLE
  driver and isn't updated.
- Now a freq change request comes in for the A15 cluster and we try to update
  the freq of the physical_cluster of CPU5, i.e. A15_1. That is currently set
  to zero (the default value of uninitialized global variables).
- So we actually try to change the freq of the A7 cluster instead of the A15.
- This can also crash the kernel, as we might sometimes request a freq above
  the A7's limit and the CPU may behave badly.

Fix this by initializing physical_cluster for all CPUs of a policy.

(Adding signed-off by Xin as he reported this issue and provided this diff)

Signed-off-by: Xin Wang 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/arm_big_little.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 3d87078..bad2ed3 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -446,9 +446,12 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy)
}
 
if (cur_cluster < MAX_CLUSTERS) {
+   int cpu;
+
cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
 
-   per_cpu(physical_cluster, policy->cpu) = cur_cluster;
+   for_each_cpu(cpu, policy->cpus)
+   per_cpu(physical_cluster, cpu) = cur_cluster;
} else {
/* Assumption: during init, we are always running on A15 */
per_cpu(physical_cluster, policy->cpu) = A15_CLUSTER;
-- 
1.7.12.rc2.18.g61b472e
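
The failure described in the commit message can be reproduced with a
small userspace model. This is only a sketch: a plain array stands in for
the per-cpu variable, a bitmask stands in for policy->cpus, and both
function names are invented for the example.

```c
#include <assert.h>

#define NR_CPUS 8

/* Stand-in for the per-CPU variable; zero-initialised like any global. */
static int physical_cluster[NR_CPUS];

/* Old behaviour: only the CPU that owns the policy is recorded. */
static void init_policy_cpu_only(int policy_cpu, int cluster)
{
	physical_cluster[policy_cpu] = cluster;
}

/* Patched behaviour: every CPU in the policy's mask is recorded. */
static void init_all_policy_cpus(unsigned int cpus_mask, int cluster)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus_mask & (1u << cpu))
			physical_cluster[cpu] = cluster;
}
```

After init_policy_cpu_only(4, 1), physical_cluster[5] is still 0, so once
CPU5 takes over the policy it appears to sit in the A7 cluster. Calling
init_all_policy_cpus(0xf0, 1) instead records cluster 1 for CPUs 4 to 7.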

Re: [PATCH v3 36/52] zsmalloc: Fix CPU hotplug callback registration

2014-03-13 Thread Minchan Kim

On Tue, Mar 11, 2014 at 02:09:59AM +0530, Srivatsa S. Bhat wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
> 
>   get_online_cpus();
> 
>   for_each_online_cpu(cpu)
>   init_cpu(cpu);
> 
>   register_cpu_notifier(&foobar_cpu_notifier);
> 
>   put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
> 
> Instead, the correct and race-free way of performing the callback
> registration is:
> 
>   cpu_notifier_register_begin();
> 
>   for_each_online_cpu(cpu)
>   init_cpu(cpu);
> 
>   /* Note the use of the double underscored version of the API */
>   __register_cpu_notifier(&foobar_cpu_notifier);
> 
>   cpu_notifier_register_done();
> 
> 
> Fix the zsmalloc code by using this latter form of callback registration.
> 
> Cc: Minchan Kim 
> Cc: Nitin Gupta 
> Cc: Ingo Molnar 
> Cc: linux...@kvack.org
> Signed-off-by: Srivatsa S. Bhat 

Acked-by: Minchan Kim 

Thanks.

-- 
Kind regards,
Minchan Kim

[RFC 0/6] mm: support madvise(MADV_FREE)

2014-03-13 Thread Minchan Kim

This patchset is an attempt to support MADV_FREE on Linux.

The rationale is as follows.

Allocators call munmap(2) when the user calls free(3) if the pointer is
in an mmaped area. But munmap isn't cheap because it has to clean up
all pte entries, unlink the vma and return the freed pages to the buddy
allocator, so the overhead grows linearly with the size of the mmaped area.
So they prefer madvise_dontneed to munmap.

"dontneed" only holds the read side of mmap_sem, so other threads
of the process can proceed with concurrent page faults; it is therefore
better than munmap as long as address space isn't scarce.
But the problem is that most allocators reuse that address
space soonish, so applications see page fault, page allocation and
page zeroing if the allocator has already called madvise_dontneed
on the address space.

To avoid those overheads, other OSes support MADV_FREE.
The idea is to just mark pages as lazyfree when madvise is called
and purge them if memory pressure happens. Otherwise, the VM doesn't
detach the pages from the address space, so the application can reuse
that memory without the above overheads.

I tweaked jemalloc to use MADV_FREE for the testing.

diff --git a/src/chunk_mmap.c b/src/chunk_mmap.c
index 8a42e75..20e31af 100644
--- a/src/chunk_mmap.c
+++ b/src/chunk_mmap.c
@@ -131,7 +131,7 @@ pages_purge(void *addr, size_t length)
 #  else
 #error "No method defined for purging unused dirty pages."
 #  endif
-   int err = madvise(addr, length, JEMALLOC_MADV_PURGE);
+   int err = madvise(addr, length, 5);
unzeroed = (JEMALLOC_MADV_ZEROS == false || err != 0);
 #  undef JEMALLOC_MADV_PURGE
 #  undef JEMALLOC_MADV_ZEROS


RAM 2G, CPU 4, ebizzy benchmark (./ebizzy -S 30 -n 512)

(1.1) stands for 1 process and 1 thread, so for example
(1.4) is 1 process and 4 threads.

vanilla jemalloc             patched jemalloc

1.1                          1.1
records:  5                  records:  5
avg:  7404.60                avg:  14059.80
std:  116.67(1.58%)          std:  93.92(0.67%)
max:  7564.00                max:  14152.00
min:  7288.00                min:  13893.00

1.4                          1.4
records:  5                  records:  5
avg:  16160.80               avg:  30173.00
std:  509.80(3.15%)          std:  3050.72(10.11%)
max:  16728.00               max:  33989.00
min:  15216.00               min:  25173.00

1.8                          1.8
records:  5                  records:  5
avg:  16003.00               avg:  30080.20
std:  290.40(1.81%)          std:  2063.57(6.86%)
max:  16537.00               max:  32735.00
min:  15727.00               min:  27381.00

4.1                          4.1
records:  5                  records:  5
avg:  4003.60                avg:  8064.80
std:  65.33(1.63%)           std:  143.89(1.78%)
max:  4118.00                max:  8319.00
min:  3921.00                min:  7888.00

4.4                          4.4
records:  5                  records:  5
avg:  3907.40                avg:  7199.80
std:  48.68(1.25%)           std:  80.21(1.11%)
max:  3997.00                max:  7320.00
min:  3863.00                min:  7113.00

4.8                          4.8
records:  5                  records:  5
avg:  3893.00                avg:  7195.20
std:  19.11(0.49%)           std:  101.55(1.41%)
max:  3927.00                max:  7309.00
min:  3869.00                min:  7012.00

8.1                          8.1
records:  5                  records:  5
avg:  1942.00                avg:  3602.80
std:  34.60(1.78%)           std:  22.97(0.64%)
max:  2010.00                max:  3632.00
min:  1913.00                min:  3563.00

8.4                          8.4
records:  5                  records:  5
avg:  1938.00                avg:  3405.60
std:  32.77(1.69%)           std:  36.25(1.06%)
max:  1998.00                max:  3468.00
min:  1905.00                min:  3374.00

8.8                          8.8
records:  5                  records:  5
avg:  1977.80                avg:  3434.20
std:  25.75(1.30%)           std:  57.95(1.69%)
max:  2011.00                max:  3533.00
min:  1937.00                min:  3363.00

So, MADV_FREE is roughly 2 times faster than MADV_DONTNEED in
every case.

I didn't test a lot but it's enough to show the concept and
direction before LSF/MM.

Patchset is based on 3.14-rc6.

Welcome any comment!
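
To make the cost of the existing hint concrete, here is a minimal
userspace sketch, assuming a Linux system: it dirties a private anonymous
mapping, applies an madvise() hint and reads the first byte back. The
helper name is made up for this example. With MADV_DONTNEED the read
faults in a fresh zero page, which is exactly the page fault, allocation
and zeroing work the cover letter wants to avoid.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map "len" bytes of anonymous memory, dirty them, then hand them back
 * with the given madvise() advice. Returns the first byte as seen
 * afterwards, or -1 if mmap()/madvise() failed.
 */
static int first_byte_after_advice(size_t len, int advice)
{
	unsigned char *p;
	int byte;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);		/* fault the pages in and dirty them */

	if (madvise(p, len, advice) != 0) {
		munmap(p, len);
		return -1;
	}

	byte = p[0];			/* may trigger a new page fault */
	munmap(p, len);
	return byte;
}
```

Calling first_byte_after_advice(1 << 16, MADV_DONTNEED) returns 0 on
Linux. With a lazy-free hint the old contents could still be visible
until the kernel actually reclaims the pages, so no fixed result should
be expected there.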

Minchan Kim (6):
  mm: clean up PAGE_MAPPING_FLAGS
  mm: work deactivate_page with anon pages
  mm: support madvise(MADV_FREE)
  mm: add stat about lazyfree pages
  mm: reclaim lazyfree pages in swapless system
  mm: ksm: don't merge lazyfree page

 include/asm-generic/tlb.h  |  9 
 include/linux/mm.h | 39 +-
 include/linux/mm_inline.h  |  9 
 include/linux/mmzone.h |  1 +
 include/linux/rmap.h   |  1 +
 include/linux/swap.h   | 15 +
 include/linux/vm_event_item.h  |  1 +
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/ksm.c   | 18 +++-
 mm/madvise.c   | 17 +--
 mm/memory.c| 12 ++-
 mm/page_alloc.c|  5 -
 m

[RFC 4/6] mm: add stat about lazyfree pages

2014-03-13 Thread Minchan Kim

This patch adds new vmstat counters for lazyfree pages so that an admin
can check how many lazyfree pages remain in each zone
and how many lazyfree pages have been purged so far.

Signed-off-by: Minchan Kim 
---
 include/linux/mm.h| 4 
 include/linux/mmzone.h| 1 +
 include/linux/vm_event_item.h | 1 +
 mm/page_alloc.c   | 5 -
 mm/vmscan.c   | 1 +
 mm/vmstat.c   | 2 ++
 6 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9b048cabce27..498613946991 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -975,6 +975,8 @@ static inline void SetPageLazyFree(struct page *page)
 
page->mapping = (void *)((unsigned long)page->mapping |
PAGE_MAPPING_LZFREE);
+
+   __inc_zone_page_state(page, NR_LAZYFREE_PAGES);
 }
 
 static inline void ClearPageLazyFree(struct page *page)
@@ -984,6 +986,8 @@ static inline void ClearPageLazyFree(struct page *page)
 
page->mapping = (void *)((unsigned long)page->mapping &
~PAGE_MAPPING_LZFREE);
+
+   __dec_zone_page_state(page, NR_LAZYFREE_PAGES);
 }
 
 static inline int PageLazyFree(struct page *page)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 5f2052c83154..7366ec56ea73 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -113,6 +113,7 @@ enum zone_stat_item {
NR_ACTIVE_FILE, /*  " " "   "   " */
NR_UNEVICTABLE, /*  " " "   "   " */
NR_MLOCK,   /* mlock()ed pages found and moved off LRU */
+   NR_LAZYFREE_PAGES,  /* freeable pages at memory pressure */
NR_ANON_PAGES,  /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
   only modified from process context */
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3a712e2e7d76..6b5b870895da 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -25,6 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
FOR_ALL_ZONES(PGALLOC),
PGFREE, PGACTIVATE, PGDEACTIVATE,
PGFAULT, PGMAJFAULT,
+   PGLAZYFREE,
FOR_ALL_ZONES(PGREFILL),
FOR_ALL_ZONES(PGSTEAL_KSWAPD),
FOR_ALL_ZONES(PGSTEAL_DIRECT),
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3bac76ae4b30..596f24ecf397 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -731,8 +731,11 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
trace_mm_page_free(page, order);
kmemcheck_free_shadow(page, order);
 
-   if (PageAnon(page))
+   if (PageAnon(page)) {
+   if (PageLazyFree(page))
+   __dec_zone_page_state(page, NR_LAZYFREE_PAGES);
page->mapping = NULL;
+   }
for (i = 0; i < (1 << order); i++)
bad += free_pages_check(page + i);
if (bad)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0ab38faebe98..98a1c3ffcaab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -832,6 +832,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (!page_freeze_refs(page, 1))
goto keep_locked;
unlock_page(page);
+   count_vm_event(PGLAZYFREE);
goto free_it;
}
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index def5dd2fbe61..4235aeb9b96e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -742,6 +742,7 @@ const char * const vmstat_text[] = {
"nr_active_file",
"nr_unevictable",
"nr_mlock",
+   "nr_lazyfree_pages",
"nr_anon_pages",
"nr_mapped",
"nr_file_pages",
@@ -789,6 +790,7 @@ const char * const vmstat_text[] = {
 
"pgfault",
"pgmajfault",
+   "pglazyfree",
 
TEXTS_FOR_ZONES("pgrefill")
TEXTS_FOR_ZONES("pgsteal_kswapd")
-- 
1.9.0

[RFC 5/6] mm: reclaim lazyfree pages in swapless system

2014-03-13 Thread Minchan Kim

If there are lazyfree pages in the system, shrink the inactive anonymous
LRU to discard lazyfree pages regardless of whether swap is
available.

Signed-off-by: Minchan Kim 
---
 mm/vmscan.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 98a1c3ffcaab..ad73e053c581 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1889,8 +1889,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
if (!global_reclaim(sc))
force_scan = true;
 
-   /* If we have no swap space, do not bother scanning anon pages. */
-   if (!sc->may_swap || (get_nr_swap_pages() <= 0)) {
+   /*
+* If we have no swap space and lazyfree pages,
+* do not bother scanning anon pages.
+*/
+   if (!sc->may_swap ||
+   (get_nr_swap_pages() <= 0 &&
+   zone_page_state(zone, NR_LAZYFREE_PAGES) <= 0)) {
scan_balance = SCAN_FILE;
goto out;
}
-- 
1.9.0
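
The new condition in get_scan_count() reads more easily as a standalone
predicate. The sketch below is a userspace model only; the function and
parameter names are illustrative, whereas the kernel takes these values
from struct scan_control, get_nr_swap_pages() and the zone counters.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when reclaim should look at file pages only, i.e. when
 * swapping is not allowed, or when there is neither free swap space
 * nor any lazyfree page that could be discarded without swap.
 */
static bool scan_file_only(bool may_swap, long nr_swap_pages,
			   long nr_lazyfree_pages)
{
	return !may_swap ||
	       (nr_swap_pages <= 0 && nr_lazyfree_pages <= 0);
}
```

So on a swapless system, scan_file_only(true, 0, 10) is false: the
anonymous LRU is still scanned because lazyfree pages can be dropped.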

[RFC 2/6] mm: work deactivate_page with anon pages

2014-03-13 Thread Minchan Kim

Currently deactivate_page only works for file pages, but MADV_FREE will use
it to move lazyfree pages to the tail of the inactive LRU, so this patch
makes deactivate_page work with anon pages as well as file pages.

Signed-off-by: Minchan Kim 
---
 include/linux/mm_inline.h |  9 +
 mm/swap.c | 20 ++--
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index cf55945c83fb..0503caafd532 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -22,6 +22,15 @@ static inline int page_is_file_cache(struct page *page)
return !PageSwapBacked(page);
 }
 
+static __always_inline void add_page_to_lru_list_tail(struct page *page,
+   struct lruvec *lruvec, enum lru_list lru)
+{
+   int nr_pages = hpage_nr_pages(page);
+   mem_cgroup_update_lru_size(lruvec, lru, nr_pages);
+   list_add_tail(&page->lru, &lruvec->lists[lru]);
+   __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, nr_pages);
+}
+
 static __always_inline void add_page_to_lru_list(struct page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 0092097b3f4c..ac13714b5d8b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -643,14 +643,11 @@ void add_page_to_unevictable_list(struct page *page)
  * If the page isn't page_mapped and dirty/writeback, the page
  * could reclaim asap using PG_reclaim.
  *
- * 1. active, mapped page -> none
- * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
- * 3. inactive, mapped page -> none
- * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
- * 5. inactive, clean -> inactive, tail
- * 6. Others -> none
+ * 1. file mapped page -> none
+ * 2. dirty/writeback page -> head of inactive with PG_reclaim
+ * 3. inactive, clean -> tail of inactive
  *
- * In 4, why it moves inactive's head, the VM expects the page would
+ * In 2, why it moves inactive's head, the VM expects the page would
  * be write it out by flusher threads as this is much more effective
  * than the single-page writeout from reclaim.
  */
@@ -667,7 +664,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
return;
 
/* Some processes are using the page */
-   if (page_mapped(page))
+   if (!PageAnon(page) && page_mapped(page))
return;
 
active = PageActive(page);
@@ -677,7 +674,6 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
del_page_from_lru_list(page, lruvec, lru + active);
ClearPageActive(page);
ClearPageReferenced(page);
-   add_page_to_lru_list(page, lruvec, lru);
 
if (PageWriteback(page) || PageDirty(page)) {
/*
@@ -686,12 +682,16 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
 * is _really_ small and  it's non-critical problem.
 */
SetPageReclaim(page);
+   add_page_to_lru_list(page, lruvec, lru);
} else {
/*
 * The page's writeback ends up during pagevec
 * We moves tha page into tail of inactive.
+*
+* The lazyfree page move into lru's tail to
+* discard easily.
 */
-   list_move_tail(&page->lru, &lruvec->lists[lru]);
+   add_page_to_lru_list_tail(page, lruvec, lru);
__count_vm_event(PGROTATED);
}
 
-- 
1.9.0

[RFC 6/6] mm: ksm: don't merge lazyfree page

2014-03-13 Thread Minchan Kim

I haven't tested this patch; it is just meant to keep lazyfree pages out of KSM.

Signed-off-by: Minchan Kim 
---
 mm/ksm.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 68710e80994a..43ca73aa45e7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -470,7 +470,8 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
page = follow_page(vma, addr, FOLL_GET);
if (IS_ERR_OR_NULL(page))
goto out;
-   if (PageAnon(page) || page_trans_compound_anon(page)) {
+   if ((PageAnon(page) && !PageLazyFree(page)) ||
+   page_trans_compound_anon(page)) {
flush_anon_page(vma, page, addr);
flush_dcache_page(page);
} else {
@@ -1032,13 +1033,20 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 
/*
 * We need the page lock to read a stable PageSwapCache in
-* write_protect_page().  We use trylock_page() instead of
-* lock_page() because we don't want to wait here - we
-* prefer to continue scanning and merging different pages,
+* write_protect_page() and check lazyfree.
+* We use trylock_page() instead of lock_page() because we
+* don't want to wait here - we prefer to continue scanning
+* and merging different pages,
 * then come back to this page when it is unlocked.
 */
if (!trylock_page(page))
goto out;
+
+   if (PageLazyFree(page)) {
+   unlock_page(page);
+   goto out;
+   }
+
/*
 * If this anonymous page is mapped only here, its pte may need
 * to be write-protected.  If it's mapped elsewhere, all of its
@@ -1621,7 +1629,7 @@ next_mm:
cond_resched();
continue;
}
-   if (PageAnon(*page) ||
+   if ((PageAnon(*page) && !PageLazyFree(*page)) ||
page_trans_compound_anon(*page)) {
flush_anon_page(vma, *page, ksm_scan.address);
flush_dcache_page(*page);
-- 
1.9.0

[RFC 1/6] mm: clean up PAGE_MAPPING_FLAGS

2014-03-13 Thread Minchan Kim

This is preparation for squeezing in a new flag, PAGE_MAPPING_LZFREE, so
functions that get an anon_vma from mapping shouldn't assume that
+/- PAGE_MAPPING_ANON is enough.

Signed-off-by: Minchan Kim 
---
 mm/rmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index d9d42316a99a..76069afa6b81 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -412,7 +412,7 @@ struct anon_vma *page_get_anon_vma(struct page *page)
if (!page_mapped(page))
goto out;
 
-   anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
+   anon_vma = page_rmapping(page);
if (!atomic_inc_not_zero(&anon_vma->refcount)) {
anon_vma = NULL;
goto out;
@@ -455,7 +455,7 @@ struct anon_vma *page_lock_anon_vma_read(struct page *page)
if (!page_mapped(page))
goto out;
 
-   anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
+   anon_vma = page_rmapping(page);
root_anon_vma = ACCESS_ONCE(anon_vma->root);
if (down_read_trylock(&root_anon_vma->rwsem)) {
/*
-- 
1.9.0
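
Why subtracting PAGE_MAPPING_ANON stops being enough can be shown with
plain integers. This is a sketch under assumptions: the macro names are
shortened stand-ins, the value chosen for the lazyfree bit is a guess
made for illustration, and the mask-based helper only models the idea of
clearing all flag bits rather than quoting page_rmapping() itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAPPING_ANON	((uintptr_t)1)
#define MAPPING_KSM	((uintptr_t)2)
#define MAPPING_LZFREE	((uintptr_t)4)	/* assumed value, for illustration */
#define MAPPING_FLAGS	(MAPPING_ANON | MAPPING_KSM | MAPPING_LZFREE)

/* Old approach: correct only when ANON is the sole flag that is set. */
static uintptr_t strip_by_subtracting(uintptr_t mapping)
{
	return mapping - MAPPING_ANON;
}

/* Mask-based approach: clears every flag bit, whichever are set. */
static uintptr_t strip_by_masking(uintptr_t mapping)
{
	return mapping & ~MAPPING_FLAGS;
}
```

For a pointer value of 0x1000 tagged with both the anon and the lazyfree
bit, the subtraction yields 0x1004, a bogus anon_vma address, while the
mask recovers 0x1000.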

[RFC 3/6] mm: support madvise(MADV_FREE)

2014-03-13 Thread Minchan Kim

Linux has no way to free pages lazily, while other operating systems
already support this through madvise(MADV_FREE).

The gain is clear: under memory pressure the kernel can discard the freed
pages instead of swapping them out or triggering the OOM killer.

Without memory pressure, userspace can reuse the freed pages without
additional overhead (e.g. page fault + page allocation + page zeroing).

The first heavy users would be general-purpose allocators (e.g. jemalloc;
I hope ptmalloc will support it too). jemalloc already supports the
feature on other operating systems (e.g. FreeBSD).

At the moment this patch breaks the build on architectures other than x86
that have their own TLB flush scheme, but if there is no objection to this
direction, I will add patches handling the other architectures in the
next iteration.
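
As a rough illustration of how an allocator might use the hint from
userspace, here is a minimal sketch. It assumes MADV_FREE may be missing
from the installed headers or rejected by the running kernel, so it falls
back to MADV_DONTNEED; arena_alloc() and arena_release() are hypothetical
names, not part of any allocator.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory; returns NULL on failure. */
static void *arena_alloc(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Tell the kernel the contents are no longer needed. The mapping itself
 * stays valid, so the range can be written again later without a new mmap().
 */
static int arena_release(void *p, size_t len)
{
	assert(p != NULL);
#ifdef MADV_FREE
	/* Lazy free: pages are reclaimed only if memory gets tight. */
	if (madvise(p, len, MADV_FREE) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;
#endif
	/* Assumed fallback where MADV_FREE is unavailable: drop pages now. */
	return madvise(p, len, MADV_DONTNEED);
}
```

A caller would release a range when its free list grows and simply write to
the range again when it needs the memory back.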

Signed-off-by: Minchan Kim 
---
 include/asm-generic/tlb.h  |  9 
 include/linux/mm.h | 35 ++-
 include/linux/rmap.h   |  1 +
 include/linux/swap.h   | 15 ++
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/madvise.c   | 17 +--
 mm/memory.c| 12 ++-
 mm/rmap.c  | 21 +--
 mm/swap_state.c| 38 +-
 mm/vmscan.c| 22 +++-
 10 files changed, 163 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 5672d7ea1fa0..b82ee729a065 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,8 +116,17 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
unsigned long end);
+int __tlb_madvfree_page(struct mmu_gather *tlb, struct page *page);
 int __tlb_remove_page(struct mmu_gather *tlb, struct page *page);
 
+static inline void tlb_madvfree_page(struct mmu_gather *tlb, struct page *page)
+{
+   /* Prevent page free */
+   get_page(page);
+   if (!__tlb_remove_page(tlb, MarkLazyFree(page)))
+   tlb_flush_mmu(tlb);
+}
+
 /* tlb_remove_page
  * Similar to __tlb_remove_page but will call tlb_flush_mmu() itself when
  * required.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c1b7414c7bef..9b048cabce27 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -933,10 +933,16 @@ void page_address_init(void);
  * Please note that, confusingly, "page_mapping" refers to the inode
  * address_space which maps the page from disk; whereas "page_mapped"
  * refers to user virtual address space into which the page is mapped.
+ *
+ * PAGE_MAPPING_LZFREE bit is set along with PAGE_MAPPING_ANON bit
+ * and then page->mapping points to an anon_vma. This flag is used
+ * for lazy freeing the page instead of swap.
  */
 #define PAGE_MAPPING_ANON  1
 #define PAGE_MAPPING_KSM   2
-#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
+#define PAGE_MAPPING_LZFREE    4
+#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM | \
+PAGE_MAPPING_LZFREE)
 
 extern struct address_space *page_mapping(struct page *page);
 
@@ -962,6 +968,32 @@ static inline int PageAnon(struct page *page)
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
+static inline void SetPageLazyFree(struct page *page)
+{
+   BUG_ON(!PageAnon(page));
+   BUG_ON(!PageLocked(page));
+
+   page->mapping = (void *)((unsigned long)page->mapping |
+   PAGE_MAPPING_LZFREE);
+}
+
+static inline void ClearPageLazyFree(struct page *page)
+{
+   BUG_ON(!PageAnon(page));
+   BUG_ON(!PageLocked(page));
+
+   page->mapping = (void *)((unsigned long)page->mapping &
+   ~PAGE_MAPPING_LZFREE);
+}
+
+static inline int PageLazyFree(struct page *page)
+{
+   if (((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
+   (PAGE_MAPPING_ANON|PAGE_MAPPING_LZFREE))
+   return 1;
+   return 0;
+}
+
 /*
  * Return the pagecache index of the passed page.  Regular pagecache pages
  * use ->index whereas swapcache pages use ->private
@@ -1054,6 +1086,7 @@ struct zap_details {
struct address_space *check_mapping;/* Check page->mapping if set */
pgoff_t first_index;    /* Lowest page->index to unmap */
pgoff_t last_index;     /* Highest page->index to unmap */
+   int lazy_free;  /* do lazy free */
 };
 
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1da693d51255..19e74aebb3d5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rm

Re: [sched/balance] INFO: possible recursive locking detected

2014-03-13 Thread Alex Shi

On 03/14/2014 02:16 PM, Fengguang Wu wrote:
> Alex,
> 
> Here are the test results for branch alexshi/single-balance


Thanks a lot, Fengguang!
Is the nex04 machine 4P * 8 cores * HT? And are a04/a06 Atom boxes?
Could you share the machine info for nhm8, nhm-white and xps2?

-- 
Thanks
Alex

Re: [PATCH 0/3] Improve 32 bit vDSO time

2014-03-13 Thread Stefani Seibold

On Thursday, 13.03.2014 at 15:08 -0700, H. Peter Anvin wrote:
> On 03/13/2014 01:11 AM, Stefani Seibold wrote:
> > On Wednesday, 12.03.2014 at 20:48 -0700, H. Peter Anvin wrote:
> >> On 03/12/2014 04:11 PM, stef...@seibold.net wrote:
> >>>
> >>> I will do this when your patch is pulled into tip. For now we have the
> >>> choice, but I prefer our solution of removing the compat vdso.
> >>>
> >>
> >> Sorry, that didn't parse for me.
> > 
> > I thought it was a good idea to wait until the "remove compat vdso
> > support" patch is settled and pulled into tip.
> > 
> > If there are no objections to this patch, I am happy to do the
> > job ;-)
> > 
> > BTW: Thanks to Andy for doing the job of getting rid of this ugly compat
> > vDSO layer.
> > 
> >> Also, if you state a preference, could you please motivate it?
> > 
> > For the next three days I am very busy with an important project, so I
> > will rebase the vdso 32 bit time patch on Monday or Tuesday.
> > 
> > 
> 
> So when I get the updated patch from Andy I will pull it into tip,
> resetting the x86/vdso branch.  I'll then expect an upgraded and
> simplified patchset from you to put on top.
> 
> Sounds like a plan.
> 

I love it when a plan comes together.



[sched] 373e9ec30ad: -11.2% qperf.udp_bw.send_bw.MB_sec

2014-03-13 Thread Fengguang Wu

Alex, we noticed the below changes on

https://github.com/alexshi/power-scheduling.git single-balance
commit 373e9ec30ad1efee2c4ba6b58fc317626e6482d0 ("sched: only do load balance on tick_do_timer_cpu")

test case: lkp-a06/micro/qperf/600s

      v3.14-rc6  373e9ec30ad1efee2c4ba6b58
---------------  --------------------------
       273 ~ 0%     -11.2%         242 ~ 1%  TOTAL qperf.udp_bw.send_bw.MB_sec
       268 ~ 0%      -9.8%         242 ~ 0%  TOTAL qperf.tcp_bw.bw.MB_sec
       266 ~ 0%      -9.8%         240 ~ 1%  TOTAL qperf.udp_bw.recv_bw.MB_sec
     48281 ~ 2%     -81.3%        9035 ~ 1%  TOTAL softirqs.SCHED
   1359067 ~ 4%     -30.7%      942510 ~ 2%  TOTAL interrupts.RES
  10201776 ~14%     -30.1%     7133067 ~10%  TOTAL cpuidle.C3.time
  52277488 ~ 4%     -26.6%    38366608 ~ 6%  TOTAL cpuidle.C2.time
        28 ~14%     -25.3%          21 ~18%  TOTAL cpuidle.POLL.usage
      1516 ~16%     -27.0%        1107 ~22%  TOTAL cpuidle.POLL.time
      3955 ~ 7%     -24.8%        2972 ~14%  TOTAL cpuidle.C3.usage
      1.10 ~ 9%     -17.0%        0.91 ~16%  TOTAL perf-profile.cpu-cycles.menu_select.cpuidle_idle_call.arch_cpu_idle.cpu_startup_entry.start_secondary
   9066611 ~ 0%      -8.3%     8313872 ~ 1%  TOTAL proc-vmstat.pgalloc_dma32
  24350865 ~ 0%      -8.3%    22328921 ~ 1%  TOTAL proc-vmstat.pgfree
  15285179 ~ 0%      -8.3%    14016032 ~ 1%  TOTAL proc-vmstat.pgalloc_normal
      2961 ~ 3%     -19.7%        2378 ~ 3%  TOTAL vmstat.system.in
   3887389 ~20%      +9.7%     4262549 ~ 0%  TOTAL time.voluntary_context_switches
     14152 ~ 1%      +2.2%       14464 ~ 0%  TOTAL vmstat.system.cs

Legend:
    ~XX%    - stddev percent
    [+-]XX% - change percent


[ASCII plot omitted: qperf.udp_bw.send_bw.MB_sec per bisect sample, y-axis
from 235 to 280. Bisect-good samples (*) sit around 270-280; bisect-bad
samples (O) sit around 235-250.]

Thanks,
Fengguang
./qperf
./qperf 127.0.0.1 --time 100 tcp_bw tcp_lat udp_bw udp_lat sctp_bw sctp_lat quit

Re: [sched/balance] INFO: possible recursive locking detected

2014-03-13 Thread Fengguang Wu

Alex,

Here are the test results for branch alexshi/single-balance

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       100 ~ 0%     +69.7%         170 ~ 0%  lkp-nex04/micro/ebizzy/200%-100-10
       100 ~ 0%     +69.7%         170 ~ 0%  TOTAL ebizzy.throughput.per_thread.max

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
        70 ~ 0%     -24.1%          53 ~ 2%  lkp-nex04/micro/ebizzy/200%-100-10
        70 ~ 0%     -24.1%          53 ~ 2%  TOTAL ebizzy.throughput.per_thread.min

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
      8259 ~ 0%     -21.1%        6513 ~ 0%  nhm8/micro/dbench/100%
      8259 ~ 0%     -21.1%        6513 ~ 0%  TOTAL dbench.throughput-MB/sec

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       273 ~ 0%     -12.2%         239 ~ 0%  lkp-a06/micro/qperf/600s
       273 ~ 0%     -12.2%         239 ~ 0%  TOTAL qperf.udp_bw.send_bw.MB_sec

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       266 ~ 0%     -11.5%         236 ~ 1%  lkp-a06/micro/qperf/600s
       266 ~ 0%     -11.5%         236 ~ 1%  TOTAL qperf.udp_bw.recv_bw.MB_sec

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
      1305 ~ 0%      -1.6%        1284 ~ 0%  lkp-a04/micro/netperf/120s-200%-SCTP_RR
      1305 ~ 0%      -1.6%        1284 ~ 0%  TOTAL netperf.Throughput_tps

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
         0 ~ 0%      +Inf%          15 ~37%  xps2/micro/pigz/100%
         0 ~ 0%      +Inf%          15 ~37%  TOTAL cpuidle.POLL.time

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
         0 ~ 0%      +Inf%           1 ~35%  xps2/micro/pigz/100%
         0 ~ 0%      +Inf%           1 ~35%  TOTAL cpuidle.POLL.usage

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       652 ~ 8%  +12247.8%       80606 ~37%  lkp-a06/micro/qperf/600s
       652 ~ 8%  +12247.8%       80606 ~37%  TOTAL interrupts.46:PCI-MSI-edge.eth0-rx-0

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
 2.076e+08 ~ 2%     -38.8%   1.271e+08 ~ 4%  lkp-a06/micro/qperf/600s
 2.076e+08 ~ 2%     -38.8%   1.271e+08 ~ 4%  TOTAL cpuidle.C1.time

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       272 ~28%     -77.7%          60 ~33%  nhm8/micro/dbench/100%
       272 ~28%     -77.7%          60 ~33%  TOTAL cpuidle.C1E-NHM.usage

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
    150897 ~17%    +170.9%      408742 ~25%  xps2/micro/pigz/100%
    150897 ~17%    +170.9%      408742 ~25%  TOTAL cpuidle.C1-NHM.time

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       396 ~ 8%     +44.0%         570 ~15%  xps2/micro/pigz/100%
       396 ~ 8%     +44.0%         570 ~15%  TOTAL cpuidle.C1-NHM.usage

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
    142164 ~38%    +134.0%      332734 ~13%  lkp-a04/micro/netperf/120s-200%-SCTP_RR
    107420 ~11%     +60.7%      172632 ~34%  lkp-a04/micro/netperf/120s-200%-SCTP_STREAM
  52277488 ~ 4%     -27.7%    37785333 ~ 3%  lkp-a06/micro/qperf/600s
  52527073 ~ 5%     -27.1%    38290700 ~ 3%  TOTAL cpuidle.C2.time

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
       259 ~12%     +40.6%         365 ~20%  lkp-a04/micro/netperf/120s-200%-SCTP_STREAM
       259 ~12%     +40.6%         365 ~20%  TOTAL cpuidle.C2.usage

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
     17249 ~44%    +234.0%       57611 ~37%  lkp-a04/micro/netperf/120s-200%-SCTP_RR
    115242 ~18%     +42.2%      163908 ~20%  lkp-a04/micro/netperf/120s-200%-SCTP_STREAM
     15808 ~33%    +323.3%       66910 ~31%  lkp-a04/micro/netperf/120s-200%-TCP_CRR
     21121 ~42%    +194.6%       62220 ~39%  lkp-a04/micro/netperf/120s-200%-TCP_RR
  10201776 ~14%     -44.2%     5690345 ~32%  lkp-a06/micro/qperf/600s
  10371197 ~14%     -41.8%     6040996 ~31%  TOTAL cpuidle.C3.time

      v3.14-rc6  c3ccbcb772e3b1c7b976dfb1c
---------------  --------------------------
     13692 ~ 2%    +115.8%       29549 ~ 1%  lkp-a04/micro/netperf/120s-200%-SCTP_RR
     15001 ~ 2%     +33.6%       20043 ~ 4%  lkp-a04/micro/netperf/120s-200%-SCTP_STREAM
     14224 ~ 6%    +105.2%       29188 ~ 1%  lkp-a04/micro/netperf/120s-200%-SCTP_STREAM_MANY
     13113 ~ 2%    +142.4%       31788 ~10%  lkp-a04/micro/netperf/120s-200%-TCP_CRR
     13398 ~ 3%    +121.7%       29701 ~ 1%  lkp-a04/micro/netperf/120s-200%-TCP_RR
     13536 ~ 3%    +114.7%       29056 ~ 0%  lkp-a04/micro/netperf/120s-200%-TCP_SENDFILE
     13358 ~ 2%    +118.8%

[PATCH v2] scsi: Change sense buffer size to 252

2014-03-13 Thread Fam Zheng

According to SPC-4, section 4.5.2.1, 252 is the limit of sense data. So
increase the values.

Tested by hacking QEMU to fake virtio-scsi request sense len to 252.
Without this patch the driver stops working immediately when it gets the
request.
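
To make the size arithmetic concrete, here is a small sketch of how a
consumer could work out how many sense bytes are valid. It assumes the usual
sense data layout in which byte 7 holds the additional sense length, so the
total is 8 plus that byte, capped at the 252-byte limit cited above;
sense_valid_len() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_HEADER_LEN 8u   /* bytes 0..7; byte 7 is the additional length */
#define SENSE_MAX_LEN    252u /* limit cited in the patch description */

/*
 * Number of valid sense bytes in buf, clamped both to the number of bytes
 * actually transferred (xfer_len) and to the 252-byte limit.
 */
static size_t sense_valid_len(const uint8_t *buf, size_t xfer_len)
{
	size_t len;

	assert(buf != NULL);
	if (xfer_len < SENSE_HEADER_LEN)
		return xfer_len; /* header truncated: byte 7 is not available */

	len = SENSE_HEADER_LEN + buf[7];
	if (len > SENSE_MAX_LEN)
		len = SENSE_MAX_LEN;
	return len < xfer_len ? len : xfer_len;
}
```

With a 96-byte buffer, any sense data whose byte 7 exceeds 88 is truncated,
which is the case the larger buffer size is meant to cover.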

Signed-off-by: Fam Zheng 
---
 include/linux/virtio_scsi.h | 2 +-
 include/scsi/scsi_cmnd.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
index 4195b97..a437f7f 100644
--- a/include/linux/virtio_scsi.h
+++ b/include/linux/virtio_scsi.h
@@ -28,7 +28,7 @@
 #define _LINUX_VIRTIO_SCSI_H
 
 #define VIRTIO_SCSI_CDB_SIZE   32
-#define VIRTIO_SCSI_SENSE_SIZE 96
+#define VIRTIO_SCSI_SENSE_SIZE 252
 
 /* SCSI command request, followed by data-out */
 struct virtio_scsi_cmd_req {
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 91558a1..a64dac03 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -104,7 +104,7 @@ struct scsi_cmnd {
struct request *request;/* The command we are
   working on */
 
-#define SCSI_SENSE_BUFFERSIZE  96
+#define SCSI_SENSE_BUFFERSIZE  252
unsigned char *sense_buffer;
/* obtained by REQUEST SENSE when
 * CHECK CONDITION is received on original
-- 
1.9.0


Oops in follow_page_mask when running 3.14.0-rc6-next-20140312-smp

2014-03-13 Thread Michalis Pappas

Hello,

Last night I ran into an oops on -next-20140312. It hasn't occurred again
since, and I don't know how to reproduce it. Anyway, here it is:

===
Mar 13 21:07:00 darkstar kernel: [10579.371582] BUG: unable to handle kernel 
NULL pointer dereference at 0010
Mar 13 21:07:00 darkstar kernel: [10579.371651] IP: [] 
follow_page_mask+0x27/0x440
Mar 13 21:07:00 darkstar kernel: [10579.371701] *pdpt = 34a06001 *pde = 
 
Mar 13 21:07:00 darkstar kernel: [10579.371747] Oops:  [#1] SMP 
Mar 13 21:07:00 darkstar kernel: [10579.371778] Modules linked in: ipv6 
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss 
snd_mixer_oss nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables cpufreq_ondemand speedstep_lib lp ppdev 
parport_pc parport fuse usb_storage btusb bluetooth uvcvideo videobuf2_vmalloc 
videobuf2_memops videobuf2_core videodev joydev i915 coretemp iwldvm 
drm_kms_helper drm sdhci_pci acer_wmi sparse_keymap video thermal mac80211 
acpi_cpufreq processor i2c_i801 intel_agp thermal_sys r8169 mii sdhci iwlwifi 
snd_hda_codec_realtek snd_hda_codec_generic hwmon intel_gtt mxm_wmi lpc_ich 
jmb38x_ms mmc_core cfg80211 memstick i2c_algo_bit i2c_core snd_hda_intel 
snd_hda_controller snd_hda_codec wmi agpgart ehci_pci uhci_hcd ehci_hcd rfkill 
snd_hwdep psmouse snd_pcm snd_timer snd soundcore serio_raw microcode evdev 
button battery ac loop
Mar 13 21:07:00 darkstar kernel: [10579.372004] CPU: 1 PID: 3976 Comm: ps Not 
tainted 3.14.0-rc6-next-20140312-smp #8
Mar 13 21:07:00 darkstar kernel: [10579.372004] Hardware name: Acer   
Aspire 2930 /Aspire 2930 , BIOS V1.14 11/11/2008
Mar 13 21:07:00 darkstar kernel: [10579.372004] task: eb7d4440 ti: efdc8000 
task.ti: efdc8000
Mar 13 21:07:00 darkstar kernel: [10579.372004] EIP: 0060:[] EFLAGS: 
00010213 CPU: 1
Mar 13 21:07:00 darkstar kernel: [10579.372004] EIP is at 
follow_page_mask+0x27/0x440
Mar 13 21:07:00 darkstar kernel: [10579.372004] EAX:  EBX: f330b060 
ECX: 0016 EDX: 0002
Mar 13 21:07:00 darkstar kernel: [10579.372004] ESI: bfdfc3cc EDI: f3c0c380 
EBP: efdc9e0c ESP: efdc9dd4
Mar 13 21:07:00 darkstar kernel: [10579.372004]  DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Mar 13 21:07:00 darkstar kernel: [10579.372004] CR0: 8005003b CR2: 0010 
CR3: 338a8000 CR4: 07f0
Mar 13 21:07:00 darkstar kernel: [10579.372004] Stack:
Mar 13 21:07:00 darkstar kernel: [10579.372004]  c1b114a9 0016 71d3e7e3 
0780 8001 261b0067 efdc9e54 fffa3fe0
Mar 13 21:07:00 darkstar kernel: [10579.372004]  f330b060 0016 bfdfc3cc 
0016 f330b060 eb7d4440 efdc9e54 c111c9ff
Mar 13 21:07:00 darkstar kernel: [10579.372004]  efdc9e44 f76b5374  
  c114a1f8 f76b5d0c eb7d4440
Mar 13 21:07:00 darkstar kernel: [10579.372004] Call Trace:
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] ? 
apic_timer_interrupt+0x2d/0x34
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
__get_user_pages.part.89+0xcf/0x470
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] ? 
__inode_permission+0x48/0xb0
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
__get_user_pages+0x3c/0x60
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
get_user_pages+0x5c/0x70
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
__access_remote_vm+0xbe/0x170
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
access_process_vm+0x45/0x80
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
get_cmdline+0x75/0x100
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
proc_pid_cmdline+0x12/0x20
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
proc_info_read+0x87/0xd0
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] ? 
pid_getattr+0xe0/0xe0
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
vfs_read+0x85/0x160
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] SyS_read+0x4c/0x90
Mar 13 21:07:00 darkstar kernel: [10579.372004]  [] 
syscall_call+0x7/0xb
Mar 13 21:07:00 darkstar kernel: [10579.372004] Code: 00 00 00 00 55 89 e5 57 
56 53 83 ec 2c 66 66 66 66 90 8b 78 20 89 c3 8b 45 08 89 d6 c1 ea 1e 89 4d ec 
c7 00 00 00 00 00 8b 47 20 <8b> 0c d0 8b 44 d0 04 89 c2 09 ca 0f 84 12 01 00 00 
89 ca 25 00
Mar 13 21:07:00 darkstar kernel: [10579.372004] EIP: [] 
follow_page_mask+0x27/0x440 SS:ESP 0068:efdc9dd4
Mar 13 21:07:00 darkstar kernel: [10579.372004] CR2: 0010
Mar 13 21:07:00 darkstar kernel: [10579.382958] ---[ end trace 1c7459dcda289e35 
]---
===

Hopefully it will be useful to someone. Let me know if any further information 
is needed.

Michalis

Re: [Intel-gfx] i915: black screen after boot on 915GM (Linux >= 3.4-rc1)

2014-03-13 Thread Jani Nikula

On Thu, 13 Mar 2014, Krzysztof Mazur  wrote:
> Hi,
>
> The commit 3f2dc5ac05714711fc14f2bf0ee5e42d5c08c581 (drm/i915: Fix 915GM
> self-refresh enable/disable) causes strange regression on the HP Compaq
> nc6120. During and after boot to framebuffer console with just LVDS
> the screen is black (backlight on, but black). Starting a X server fixes
> the problem. Connecting an external VGA monitor also fixes the problem.
>
> Reverting this commit fixes the problem at least up to Linux 3.14.0-rc6.
>
> I'm still using ACPI video problem workaround that effectively does:
>
>  static int acpi_video_bus_start_devices(struct acpi_video_bus *video)
>  {
> - return acpi_video_bus_DOS(video, 0,
> + return acpi_video_bus_DOS(video, 3,
> acpi_osi_is_win8() ? 1 : 0);
>  }
>
> in drivers/acpi/video.c.
>
> I've added dmesgs from the last good commit and the bad commit at:
> https://bugs.freedesktop.org/show_bug.cgi?id=76103

We track this on the bugzilla. Please add any new information (such as
the other patches you have) there. Thanks.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v3 10/52] arm, kvm: Fix CPU hotplug callback registration

2014-03-13 Thread Srivatsa S. Bhat

On 03/13/2014 04:51 AM, Christoffer Dall wrote:
> On Tue, Mar 11, 2014 at 02:05:38AM +0530, Srivatsa S. Bhat wrote:
>> Subsystems that want to register CPU hotplug callbacks, as well as perform
>> initialization for the CPUs that are already online, often do it as shown
>> below:
>>
>>  get_online_cpus();
>>
>>  for_each_online_cpu(cpu)
>>  init_cpu(cpu);
>>
>>  register_cpu_notifier(&foobar_cpu_notifier);
>>
>>  put_online_cpus();
>>
>> This is wrong, since it is prone to ABBA deadlocks involving the
>> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
>> with CPU hotplug operations).
>>
>> Instead, the correct and race-free way of performing the callback
>> registration is:
>>
>>  cpu_notifier_register_begin();
>>
>>  for_each_online_cpu(cpu)
>>  init_cpu(cpu);
>>
>>  /* Note the use of the double underscored version of the API */
>>  __register_cpu_notifier(&foobar_cpu_notifier);
>>
>>  cpu_notifier_register_done();
>>
>>
>> Fix the kvm code in arm by using this latter form of callback registration.
>>
>> Cc: Christoffer Dall 
>> Cc: Gleb Natapov 
>> Cc: Russell King 
>> Cc: Ingo Molnar 
>> Cc: kvm...@lists.cs.columbia.edu
>> Cc: k...@vger.kernel.org
>> Cc: linux-arm-ker...@lists.infradead.org
>> Acked-by: Paolo Bonzini 
>> Signed-off-by: Srivatsa S. Bhat 
>> ---
>>
>>  arch/arm/kvm/arm.c |7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index bd18bb8..f0e50a0 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -1051,21 +1051,26 @@ int kvm_arch_init(void *opaque)
>>  }
>>  }
>>  
>> +cpu_notifier_register_begin();
>> +
>>  err = init_hyp_mode();
>>  if (err)
>>  goto out_err;
>>  
>> -err = register_cpu_notifier(&hyp_init_cpu_nb);
>> +err = __register_cpu_notifier(&hyp_init_cpu_nb);
>>  if (err) {
>>  kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
>>  goto out_err;
>>  }
>>  
>> +cpu_notifier_register_done();
>> +
>>  hyp_cpu_pm_init();
>>  
>>  kvm_coproc_table_init();
>>  return 0;
>>  out_err:
>> +cpu_notifier_register_done();
>>  return err;
>>  }
>>  
>>
> 
> Just so we're clear, the existing code was simply racy and not prone to
> deadlocks, right?
> 
> This makes it clear that the test above for compatible CPUs can be quite
> easily evaded by using CPU hotplug, but we don't really have a good
> solution for handling that yet...  Hmmm, grumble grumble, I guess if you
> hotplug unsupported CPUs on a KVM/ARM system for now, stuff will break.
> 

In this particular case, there was no deadlock possibility, rather the
existing code had insufficient synchronization against CPU hotplug.

init_hyp_mode() would invoke cpu_init_hyp_mode() on currently online CPUs
using on_each_cpu(). If a CPU came online after this point and before calling
register_cpu_notifier(), that CPU would remain uninitialized because this
subsystem would miss the hot-online event. This patch fixes this bug and
also uses the new synchronization method (instead of get/put_online_cpus())
to ensure that we don't deadlock with CPU hotplug.
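
The window described above can be modelled in userspace. The sketch below is
only an analogy under stated assumptions: a single pthread mutex stands in
for the hotplug lock, and model_cpu_up(), model_register() and init_cpu()
are hypothetical names, not the kernel API. The point it shows is that
initialising the already-online CPUs and installing the notifier inside one
critical section leaves no gap in which a CPU can come online unnoticed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS 4

/* One lock serialises hotplug and notifier registration in this model. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool cpu_online[NR_CPUS];
static bool cpu_initialized[NR_CPUS];
static void (*notifier)(int cpu);

/* Per-CPU setup, standing in for the subsystem's real initialisation. */
static void init_cpu(int cpu)
{
	cpu_initialized[cpu] = true;
}

/* Hotplug path: mark the CPU online and run the notifier under the lock. */
static void model_cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pthread_mutex_lock(&hotplug_lock);
	cpu_online[cpu] = true;
	if (notifier)
		notifier(cpu);
	pthread_mutex_unlock(&hotplug_lock);
}

/*
 * Registration path: initialise the CPUs that are already online and
 * install the notifier in the same critical section.
 */
static void model_register(void (*cb)(int cpu))
{
	pthread_mutex_lock(&hotplug_lock);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			cb(cpu);
	notifier = cb;
	pthread_mutex_unlock(&hotplug_lock);
}
```

If the loop and the assignment to notifier were in separate critical
sections, a model_cpu_up() call landing between them would leave that CPU
uninitialised, which is the bug described above.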

> In any case:
> Acked-by: Christoffer Dall 
> 

Thanks a lot!

Regards,
Srivatsa S. Bhat


Re: [PATCH 3/3] ACPI: Don't re-select SBS battery if it's already selected

2014-03-13 Thread Lan Tianyu

2014-03-12 6:20 GMT+08:00 Matthew Garrett :
> The existing SBS code explicitly sets the selected battery in the SBS
> manager regardless of whether the battery in question is already selected.
> This causes bus timeouts on Apple hardware. Check for this case and avoid
> it.
>

Hi Matthew:
   This patch avoids a redundant battery select operation when the
battery is already selected. But the "bus timeouts" symptom is a bus
transaction issue, right? Will it happen during other SBS read/write
operations? Do we need to increase the wait time of the SMBus transaction?

> Signed-off-by: Matthew Garrett 
> ---
>  drivers/acpi/sbs.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/sbs.c b/drivers/acpi/sbs.c
> index dbd4849..c386505 100644
> --- a/drivers/acpi/sbs.c
> +++ b/drivers/acpi/sbs.c
> @@ -470,17 +470,25 @@ static struct device_attribute alarm_attr = {
>  static int acpi_battery_read(struct acpi_battery *battery)
>  {
> int result = 0, saved_present = battery->present;
> -   u16 state;
> +   u16 state, selected, desired;
>
> if (battery->sbs->manager_present) {
> result = acpi_smbus_read(battery->sbs->hc, SMBUS_READ_WORD,
> ACPI_SBS_MANAGER, 0x01, (u8 *)&state);
> if (!result)
> battery->present = state & (1 << battery->id);
> -   state &= 0x0fff;
> -   state |= 1 << (battery->id + 12);
> -   acpi_smbus_write(battery->sbs->hc, SMBUS_WRITE_WORD,
> - ACPI_SBS_MANAGER, 0x01, (u8 *)&state, 2);
> +   /*
> +* Don't switch battery if the correct one is already selected
> +*/
> +   selected = state & 0xf000;
> +   desired = 1 << (battery->id + 12);
> +   if (selected != desired) {
> +   state &= 0x0fff;
> +   state |= desired;
> +   acpi_smbus_write(battery->sbs->hc, SMBUS_WRITE_WORD,
> +ACPI_SBS_MANAGER, 0x01,
> +(u8 *)&state, 2);
> +   }
> } else if (battery->id == 0)
> battery->present = 1;
> if (result || !battery->present)
> --
> 1.8.5.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best regards
Tianyu Lan

Re: Zynq macb

2014-03-13 Thread Michal Simek

On 03/13/2014 11:33 PM, Sören Brinkmann wrote:
> On Thu, 2014-03-13 at 03:16PM -0700, Sören Brinkmann wrote:
>> Hi Nicolas,
>>
>> I did some testing on the current linux-next tree and ran iperf on Zynq.
>> It seems that network and even the whole system can collapse when doing
>> that.
>> I don't really know what's going on, but once I saw the message:
>>  "inconsistent Rx descriptor chain"
>> printed twice (system frozen afterwards).
>>
>> I don't know what exactly is going wrong, but suspect something around
>> memory/DMA. I have no clue whether it makes any sense or not, but I
>> tried using the macb_* functions instead of the gem_* ones (see diff below).
>> That seems to result in a stable system and working Ethernet.
> 
> That was a little too early. After roughly 25 minutes the system runs
> into a deadlock:
>   BUG: spinlock lockup suspected on CPU#1, iperf/774
>lock: 0xeda0366c, .magic: dead4ead, .owner: swapper/0/0, .owner_cpu: 0
>   CPU: 1 PID: 774 Comm: iperf Tainted: GW
> 3.14.0-rc6-next-20140312-xilinx-dirty #41
>   [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>   [] (show_stack) from [] (dump_stack+0x80/0xcc)
>   [] (dump_stack) from [] (do_raw_spin_lock+0xd4/0x190)
>   [] (do_raw_spin_lock) from [] 
> (_raw_spin_lock_irqsave+0x58/0x64)
>   [] (_raw_spin_lock_irqsave) from [] 
> (macb_start_xmit+0x24/0x2d0)
>   [] (macb_start_xmit) from [] 
> (dev_hard_start_xmit+0x334/0x470)
>   [] (dev_hard_start_xmit) from [] 
> (sch_direct_xmit+0x78/0x2f8)
>   [] (sch_direct_xmit) from [] 
> (__dev_queue_xmit+0x314/0x704)
>   [] (__dev_queue_xmit) from [] 
> (ip_finish_output+0x6c4/0x894)
>   [] (ip_finish_output) from [] (ip_local_out+0x74/0x90)
>   [] (ip_local_out) from [] (ip_queue_xmit+0x400/0x5c4)
>   [] (ip_queue_xmit) from [] 
> (tcp_transmit_skb+0xa18/0xab0)
>   [] (tcp_transmit_skb) from [] (tcp_recvmsg+0x92c/0xae4)
>   [] (tcp_recvmsg) from [] (inet_recvmsg+0x1c0/0x1fc)
>   [] (inet_recvmsg) from [] (sock_recvmsg+0x7c/0x98)
>   [] (sock_recvmsg) from [] (SyS_recvfrom+0x9c/0x108)
>   [] (SyS_recvfrom) from [] (sys_recv+0x14/0x18)
>   [] (sys_recv) from [] (ret_fast_syscall+0x0/0x48)

Do you have this change in your tree?
https://github.com/Xilinx/linux-xlnx/commit/1a85939af40acca2bf963407b497cc31c303ff3e

I don't think we have sent this to mainline yet.

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform





Re: [PATCH v6 5/7] arm64: ftrace: Add dynamic ftrace support

2014-03-13 Thread AKASHI Takahiro


Thank you for your clarification, Steven.

-Takahiro AKASHI

On 03/14/2014 03:33 AM, Steven Rostedt wrote:

On Thu, 2014-03-13 at 18:10 +, Will Deacon wrote:



+#else /* CONFIG_DYNAMIC_FTRACE */
+/*
+ * _mcount() is used to build the kernel with -pg option, but all the branch
+ * instructions to _mcount() are replaced to NOP initially at kernel start up,
+ * and later on, NOP to branch to ftrace_caller() when enabled or branch to
+ * NOP when disabled per-function base.
+ */
+ENTRY(_mcount)
+   ret
+ENDPROC(_mcount)


Judging by your comment then, this should never be called. Is that right? If
so, we could add a BUG-equivalent so we know if we missed an mcount during
patching.


Actually, it can be called before the change to nops are done in early
boot. This is done very early, but everything before ftrace_init() in
init/main.c can still call _mcount.



+   /*
+* Note:
+* Due to modules and __init, code can disappear and change,
+* we need to protect against faulting as well as code changing.
+* We do this by aarch64_insn_*() which use the probe_kernel_*().
+*
+* No lock is held here because all the modifications are run
+* through stop_machine().
+*/
+   if (validate) {
+   if (aarch64_insn_read((void *)pc, &replaced))
+   return -EFAULT;
+
+   if (replaced != old)
+   return -EINVAL;
+   }
+   if (aarch64_insn_patch_text_nosync((void *)pc, new))
+   return -EPERM;


I think you're better off propagating the errors here, rather than
overriding them with EFAULT/EINVAL/EPERM.


The ftrace generic code expects to see these specific errors. Look at
ftrace_bug() in kernel/trace/ftrace.c.
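
To show what "these specific errors" means in practice, here is a userspace
model of the validate-then-patch step. It is a sketch, not the arm64 code:
struct text_model and model_modify_code() are hypothetical, and an array of
32-bit words stands in for kernel text. It keeps one distinct code per
failure stage, so a caller can tell an unreadable address from an unexpected
instruction from a failed write.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for kernel text: an array of 32-bit instructions. */
struct text_model {
	uint32_t *insn;
	size_t count;
	bool writable;
};

/*
 * Validate-then-patch with one error code per stage:
 *   -EFAULT  the old instruction could not be read
 *   -EINVAL  the old instruction is not the expected one
 *   -EPERM   the new instruction could not be written
 */
static int model_modify_code(struct text_model *t, size_t idx,
			     uint32_t old, uint32_t new_insn, bool validate)
{
	assert(t != NULL);
	if (validate) {
		if (idx >= t->count)
			return -EFAULT;
		if (t->insn[idx] != old)
			return -EINVAL;
	}
	if (idx >= t->count || !t->writable)
		return -EPERM;

	t->insn[idx] = new_insn;
	return 0;
}
```

Collapsing these into a single propagated error would lose the stage
information that the generic reporting code is said to rely on.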




+
+   return 0;
+}
+
+/*
+ * Replace tracer function in ftrace_caller()
+ */
+int ftrace_update_ftrace_func(ftrace_func_t func)
+{
+   unsigned long pc;
+   unsigned int new;
+
+   pc = (unsigned long)&ftrace_call;
+   new = aarch64_insn_gen_branch_imm(pc, (unsigned long)func, true);
+
+   return ftrace_modify_code(pc, 0, new, false);
+}
+
+/*
+ * Turn on the call to ftrace_caller() in instrumented function
+ */
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+   unsigned long pc = rec->ip;
+   unsigned int old, new;
+
+   old = aarch64_insn_gen_nop();
+   new = aarch64_insn_gen_branch_imm(pc, addr, true);
+
+   return ftrace_modify_code(pc, old, new, true);
+}
+
+/*
+ * Turn off the call to ftrace_caller() in instrumented function
+ */
+int ftrace_make_nop(struct module *mod,
+   struct dyn_ftrace *rec, unsigned long addr)
+{
+   unsigned long pc = rec->ip;
+   unsigned int old, new;
+
+   old = aarch64_insn_gen_branch_imm(pc, addr, true);
+   new = aarch64_insn_gen_nop();
+
+   return ftrace_modify_code(pc, old, new, true);
+}
+
+int __init ftrace_dyn_arch_init(void *data)
+{
+   *(unsigned long *)data = 0;
+
+   return 0;
+}
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
  /*
   * function_graph tracer expects ftrace_return_to_handler() to be called
@@ -61,4 +144,34 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
return;
}
  }
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+/*
+ * Turn on/off the call to ftrace_graph_caller() in ftrace_caller()
+ * depending on @enable.
+ */
+static int ftrace_modify_graph_caller(bool enable)
+{
+   unsigned long pc = (unsigned long)&ftrace_graph_call;
+   unsigned int branch, nop, old, new;
+
+   branch = aarch64_insn_gen_branch_imm(pc,
+   (unsigned long)ftrace_graph_caller, false);
+   nop = aarch64_insn_gen_nop();
+   old = enable ? nop : branch;
+   new = enable ? branch : nop;
+
+   return ftrace_modify_code(pc, old, new, true);


You could rewrite this as:

if (enable)
return ftrace_modify_code(pc, nop, branch, true);
else
return ftrace_modify_code(pc, branch, nop, true);

which I find easier to read.


Heh, maybe that could be updated in other archs too. I'll have to think
about that one.

-- Steve




Re: [PATCH v6 5/7] arm64: ftrace: Add dynamic ftrace support

2014-03-13 Thread AKASHI Takahiro


On 03/14/2014 03:10 AM, Will Deacon wrote:

On Thu, Mar 13, 2014 at 10:13:48AM +, AKASHI Takahiro wrote:

This patch allows "dynamic ftrace" if CONFIG_DYNAMIC_FTRACE is enabled.
Tracing can then be turned on and off dynamically on a per-function basis.

On arm64, this is done by patching the single branch instruction to _mcount()
inserted by the gcc -pg option. The branch is replaced with a NOP initially at
kernel start-up. Later, the NOP is replaced with a branch to ftrace_caller()
when tracing is enabled, and the branch with a NOP when it is disabled.
Please note that ftrace_caller() is the counterpart of _mcount() in the case
of 'static' ftrace.
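
As a rough user-space illustration of the NOP <-> branch patching described above (not the kernel implementation), the sketch below models one patchable instruction slot. The helper names modify_code(), gen_bl(), make_call() and make_nop() are hypothetical, and returning -EINVAL on a validation mismatch is an assumption. The NOP and BL encodings are the architectural AArch64 ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch64 NOP instruction encoding. */
#define INSN_NOP 0xd503201fu

/*
 * Hypothetical stand-in for patching one instruction slot.
 * If validate is true, the slot must currently hold 'old'.
 */
static int modify_code(uint32_t *slot, uint32_t old, uint32_t new_insn,
		       bool validate)
{
	if (validate && *slot != old)
		return -EINVAL;	/* assumed error code for a mismatch */

	*slot = new_insn;
	return 0;
}

/* Encode "bl target" placed at pc: 0x94000000 | imm26 (offset in words). */
static uint32_t gen_bl(uint64_t pc, uint64_t target)
{
	/* Instruction addresses are 4-byte aligned. */
	assert((pc & 3) == 0 && (target & 3) == 0);

	/* Unsigned wrap-around keeps the two's complement offset bits. */
	return 0x94000000u | ((uint32_t)((target - pc) >> 2) & 0x03ffffffu);
}

/* NOP -> branch: enable tracing for one call site. */
static int make_call(uint32_t *slot, uint64_t pc, uint64_t addr)
{
	return modify_code(slot, INSN_NOP, gen_bl(pc, addr), true);
}

/* branch -> NOP: disable tracing for one call site. */
static int make_nop(uint32_t *slot, uint64_t pc, uint64_t addr)
{
	return modify_code(slot, gen_bl(pc, addr), INSN_NOP, true);
}
```

Validating against the expected old instruction means a call site that is already patched, or was never a recorded mcount site, is rejected instead of being silently overwritten.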

More details on architecture specific requirements are described in
Documentation/trace/ftrace-design.txt.

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/Kconfig   |1 +
  arch/arm64/include/asm/ftrace.h  |   15 +
  arch/arm64/kernel/entry-ftrace.S |   43 +++
  arch/arm64/kernel/ftrace.c   |  113 ++
  4 files changed, 172 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6b3fef6..6954959 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -33,6 +33,7 @@ config ARM64
select HAVE_DMA_API_DEBUG
select HAVE_DMA_ATTRS
select HAVE_DMA_CONTIGUOUS
+   select HAVE_DYNAMIC_FTRACE
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 58ea595..ed5c448 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -18,6 +18,21 @@

  #ifndef __ASSEMBLY__
  extern void _mcount(unsigned long);
+
+struct dyn_arch_ftrace {
+   /* No extra data needed for arm64 */
+};
+
+extern unsigned long ftrace_graph_call;
+
+static inline unsigned long ftrace_call_adjust(unsigned long addr)
+{
+   /*
+* addr is the address of the mcount call instruction.
+* recordmcount does the necessary offset calculation.
+*/
+   return addr;
+}


You could just as easily implement this as a dummy macro, but I guess it
doesn't matter either way.


FYI, all archs define this as an inline function.
Leave it as it is.


  #endif /* __ASSEMBLY__ */

  #endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 0ac31c8..c0fbe10 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -86,6 +86,7 @@
add \reg, \reg, #8
.endm

+#ifndef CONFIG_DYNAMIC_FTRACE
  /*
   * void _mcount(unsigned long return_address)
   * @return_address: return address to instrumented function
@@ -134,6 +135,48 @@ skip_ftrace_call:
  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
  ENDPROC(_mcount)

+#else /* CONFIG_DYNAMIC_FTRACE */
+/*
+ * _mcount() is used to build the kernel with -pg option, but all the branch
+ * instructions to _mcount() are replaced to NOP initially at kernel start up,
+ * and later on, NOP to branch to ftrace_caller() when enabled or branch to
+ * NOP when disabled per-function base.
+ */
+ENTRY(_mcount)
+   ret
+ENDPROC(_mcount)


Judging by your comment then, this should never be called. Is that right? If
so, we could add a BUG-equivalent so we know if we missed an mcount during
patching.


Steven explained this.


+/*
+ * void ftrace_caller(unsigned long return_address)
+ * @return_address: return address to instrumented function
+ *
+ * This function is a counterpart of _mcount() in 'static' ftrace, and
+ * makes calls to:
+ * - tracer function to probe instrumented function's entry,
+ * - ftrace_graph_caller to set up an exit hook
+ */
+ENTRY(ftrace_caller)
+   mcount_enter
+
+   mcount_get_pc0  x0  // function's pc
+   mcount_get_lr   x1  // function's lr
+
+   .global ftrace_call
+ftrace_call:   // tracer(pc, lr);
+   nop // This will be replaced with "bl xxx"
+   // where xxx can be any kind of tracer.
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   .global ftrace_graph_call
+ftrace_graph_call: // ftrace_graph_caller();
+   nop // If enabled, this will be replaced
+   // "b ftrace_graph_caller"
+#endif
+
+   mcount_exit
+ENDPROC(ftrace_caller)
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
  ENTRY(ftrace_stub)
ret
  ENDPROC(ftrace_stub)
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a559ab8..8c26476 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -17,6 +17,89 @@
  #include 
  #include 

+#ifdef CONFIG_DYNAMIC_FTRACE
+/*
+ * Replace a single instruction, which may be a branch or NOP.
+ * If @validate == true, a replaced instruction is checked against 'old'.
+ */
+static int ftrace_modify_code(unsigned long pc, unsigned int old,
+

[PATCH] media: davinci: vpbe: fix build warning

2014-03-13 Thread Lad, Prabhakar

From: "Lad, Prabhakar" 

This patch fixes the following build warning:
drivers/media/platform/davinci/vpbe_display.c: In function 'vpbe_start_streaming':
drivers/media/platform/davinci/vpbe_display.c:344: warning: unused variable 'vpbe_dev'

Signed-off-by: Lad, Prabhakar 
---
 drivers/media/platform/davinci/vpbe_display.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/media/platform/davinci/vpbe_display.c 
b/drivers/media/platform/davinci/vpbe_display.c
index 7a0e40e..b4f12d0 100644
--- a/drivers/media/platform/davinci/vpbe_display.c
+++ b/drivers/media/platform/davinci/vpbe_display.c
@@ -341,7 +341,6 @@ static int vpbe_start_streaming(struct vb2_queue *vq, 
unsigned int count)
 {
struct vpbe_fh *fh = vb2_get_drv_priv(vq);
struct vpbe_layer *layer = fh->layer;
-   struct vpbe_device *vpbe_dev = fh->disp_dev->vpbe_dev;
int ret;
 
/* Get the next frame from the buffer queue */
-- 
1.7.9.5


[PATCH v11 27/27] iommu/exynos: enhanced error messages

2014-03-13 Thread Cho KyongHo

Some redundant error messages are removed, and some messages are
raised from debug level to error level.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4888383..b7f7731 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1012,7 +1012,7 @@ static void exynos_iommu_detach_device(struct 
iommu_domain *domain,
dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n",
__func__, &pgtable);
} else {
-   dev_dbg(dev, "%s: No IOMMU is attached\n", __func__);
+   dev_err(dev, "%s: No IOMMU is attached\n", __func__);
}
 }
 
@@ -1112,10 +1112,8 @@ static int lv2set_page(sysmmu_pte_t *pent, phys_addr_t 
paddr, size_t size,
short *pgcnt)
 {
if (size == SPAGE_SIZE) {
-   if (!lv2ent_fault(pent)) {
-   WARN(1, "Trying mapping on 4KiB where mapping exists");
+   if (WARN_ON(!lv2ent_fault(pent)))
return -EADDRINUSE;
-   }
 
*pent = mk_lv2ent_spage(paddr);
pgtable_flush(pent, pent + 1);
@@ -1123,9 +1121,7 @@ static int lv2set_page(sysmmu_pte_t *pent, phys_addr_t 
paddr, size_t size,
} else { /* size == LPAGE_SIZE */
int i;
for (i = 0; i < SPAGES_PER_LPAGE; i++, pent++) {
-   if (!lv2ent_fault(pent)) {
-   WARN(1,
-   "Trying mapping on 64KiB where mapping exists");
+   if (WARN_ON(!lv2ent_fault(pent))) {
if (i > 0)
memset(pent - i, 0, sizeof(*pent) * i);
return -EADDRINUSE;
@@ -1198,8 +1194,8 @@ static int exynos_iommu_map(struct iommu_domain *domain, 
unsigned long l_iova,
}
 
if (ret)
-   pr_debug("%s: Failed to map iova %#x/%#zx bytes\n",
-   __func__, iova, size);
+   pr_err("%s: Failed(%d) to map %#zx bytes @ %#x\n",
+   __func__, ret, size, iova);
 
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
@@ -1236,7 +1232,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
ent = section_entry(priv->pgtable, iova);
 
if (lv1ent_section(ent)) {
-   if (size < SECT_SIZE) {
+   if (WARN_ON(size < SECT_SIZE)) {
err_pgsize = SECT_SIZE;
goto err;
}
@@ -1271,7 +1267,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain 
*domain,
}
 
/* lv1ent_large(ent) == true here */
-   if (size < LPAGE_SIZE) {
+   if (WARN_ON(size < LPAGE_SIZE)) {
err_pgsize = LPAGE_SIZE;
goto err;
}
@@ -1290,9 +1286,8 @@ done:
 err:
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
-   WARN(1,
-   "%s: Failed due to size(%#zx) @ %#x is smaller than page size %#zx\n",
-   __func__, size, iova, err_pgsize);
+   pr_err("%s: Failed: size(%#zx) @ %#x is smaller than page size %#zx\n",
+   __func__, size, iova, err_pgsize);
 
return 0;
 }
-- 
1.7.9.5


[PATCH v11 25/27] iommu/exynos: use simpler function to get MMU version

2014-03-13 Thread Cho KyongHo

This commit simplifies how the MMU version is obtained.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 6e716cc..3d4dabb 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -251,24 +251,6 @@ static unsigned int __raw_sysmmu_version(struct 
sysmmu_drvdata *data)
return MMU_RAW_VER(__raw_readl(data->sfrbase + REG_MMU_VERSION));
 }
 
-static unsigned int __sysmmu_version(struct sysmmu_drvdata *data,
-unsigned int *minor)
-{
-   unsigned int ver = 0;
-
-   ver = __raw_sysmmu_version(data);
-   if (ver > MAKE_MMU_VER(3, 3)) {
-   dev_err(data->sysmmu, "%s: version(%d.%d) is higher than 3.3\n",
-   __func__, MMU_MAJ_VER(ver), MMU_MIN_VER(ver));
-   BUG();
-   }
-
-   if (minor)
-   *minor = MMU_MIN_VER(ver);
-
-   return MMU_MAJ_VER(ver);
-}
-
 static bool sysmmu_block(void __iomem *sfrbase)
 {
int i = 120;
@@ -422,13 +404,13 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
 static void __sysmmu_init_config(struct sysmmu_drvdata *data)
 {
unsigned int cfg = CFG_LRU | CFG_QOS(15);
-   int maj, min = 0;
+   unsigned int ver;
 
-   maj = __sysmmu_version(data, &min);
-   if (maj == 3) {
-   if (min >= 2) {
+   ver = __raw_sysmmu_version(data);
+   if (MMU_MAJ_VER(ver) == 3) {
+   if (MMU_MIN_VER(ver) >= 2) {
cfg |= CFG_FLPDCACHE;
-   if (min == 3) {
+   if (MMU_MIN_VER(ver) == 3) {
cfg |= CFG_ACGEN;
cfg &= ~CFG_LRU;
} else {
@@ -583,7 +565,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, 
sysmmu_iova_t iova,
 * 1MB page can be cached in one of all sets.
 * 64KB page can be one of 16 consecutive sets.
 */
-   if (__sysmmu_version(data, NULL) == 2)
+   if (MMU_MAJ_VER(__raw_sysmmu_version(data)) == 2)
num_inv = min_t(unsigned int,
size / PAGE_SIZE, 64);
 
-- 
1.7.9.5


Question for ARM926T mtune values in ARM Makefile

2014-03-13 Thread V JobNickname

For ARM926T, the -mtune value in the Makefile is set to arm9tdmi.
In fact, all ARM9xxx CPUs are assigned the value arm9tdmi.

But some cross-compile GCC toolchains, such as the one from Buildroot,
specify a different tune value.
For example, in the Buildroot GCC configuration:

config BR2_GCC_TARGET_TUNE
...
default arm920 if BR2_arm920
default arm920t if BR2_arm920t
default arm922t if BR2_arm922t
default arm926ej-s if BR2_arm926t
default arm1136j-s if BR2_arm1136j_s
default arm1136jf-s if BR2_arm1136jf_s
default arm1176jz-s if BR2_arm1176jz_s
default arm1176jzf-s if BR2_arm1176jzf_s
...

According to "man gcc", -mtune is similar to -mcpu,
and the -mcpu list includes both arm9tdmi and arm926ej-s.

Should the -mtune value for ARM926T in the kernel ARM Makefile be changed to
tune-$(CONFIG_CPU_ARM926T)  =-mtune=arm926ej-s

and would this let GCC tune instructions better for ARM926T?

[PATCH v11 26/27] iommu/exynos: apply workaround of caching fault page table entries

2014-03-13 Thread Cho KyongHo

This patch contains two workarounds for System MMU v3.x.

System MMU v3.2 and v3.3 have an FLPD cache that caches first level page
table entries to reduce page table walking latency. However, the
FLPD cache can be filled with a first level page table entry even though
it is not accessed by a master H/W, because System MMU v3.3
speculatively prefetches page table entries that may be accessed
in the near future by the master H/W.
The prefetched FLPD cache entries are not invalidated by iommu_unmap()
because iommu_unmap() only unmaps and invalidates the page table
entries that are mapped.

Because the exynos-iommu driver discards a second level page table when
it needs to be replaced with another second level page table or
a first level page table entry with a 1MB mapping, it is required to
invalidate the FLPD cache, which may contain the first level page table
entry that points to the second level page table.

Another workaround for System MMU v3.3 is initializing the first level
page table entries with a second level page table that is filled
with all zeros. This prevents System MMU from prefetching a 'fault' first
level page table entry, which may lead to a page fault on access to a
16MiB-wide region.

System MMU 3.x fetches consecutive page table entries in a single page
table walk to maximize bus utilization and to minimize the TLB miss
penalty.
Unfortunately, this fetching behavior raises a functional problem
because it fetches 'fault' page table entries that specify no
translation information and to which valid translation information will
be written in the near future. The logic in the System MMU generates
a page fault from the cached fault entries, which are no longer coherent
with the updated page table.

There is another workaround that must be implemented by the I/O virtual
memory manager: any two consecutive I/O virtual memory areas must have
a hole between them that is larger than or equal to 128KiB.
Also, the next I/O virtual memory area must start from the next
128KiB boundary.

0          128K         256K         384K         512K
|-----------|------------|------------|------------|
|area1----->|....hole....|<--- area2 -----

The constraint is depicted above.
The size is selected by the following calculation:
 - System MMU can fetch 64 consecutive page table entries at once:
   64 * 4KiB = 256KiB. This is the size between 128K ~ 384K in the
   above picture. This style of fetching is called 'block fetch'. It
   fetches a predefined number of consecutive page table entries,
   including the entry that caused the page table walk.
 - System MMU can prefetch up to 32 consecutive page table entries.
   This is the size between 256K ~ 384K.
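
The placement rule above can be sketched in user-space C as follows. This is only an illustration of the arithmetic, not code from the patch: next_area_start() is a hypothetical helper, prev_end is taken to be the first address past the previous area, and deriving the 128KiB hole from the 32-entry prefetch window is one reading of the calculation, not something the message states outright.

```c
#include <assert.h>
#include <stdint.h>

#define SPAGE_SIZE          4096u	/* 4KiB small page */
#define BLOCK_FETCH_ENTRIES 64u		/* entries fetched at once */
#define PREFETCH_ENTRIES    32u		/* entries prefetched at most */

/* 64 * 4KiB = 256KiB covered by one block fetch. */
#define BLOCK_FETCH_SPAN (BLOCK_FETCH_ENTRIES * SPAGE_SIZE)
/* 32 * 4KiB = 128KiB, assumed here to be the source of the hole size. */
#define IOVA_HOLE_SIZE   (PREFETCH_ENTRIES * SPAGE_SIZE)

/*
 * Hypothetical helper: given the end (exclusive) of the previous I/O
 * virtual memory area, return the lowest start address for the next
 * area that leaves a hole of at least 128KiB and sits on a 128KiB
 * boundary.
 */
static uint32_t next_area_start(uint32_t prev_end)
{
	uint32_t earliest;

	/* This sketch does not handle wrap-around at the top of the space. */
	assert(prev_end <= UINT32_MAX - 2 * IOVA_HOLE_SIZE);

	earliest = prev_end + IOVA_HOLE_SIZE;
	/* Round up to the next multiple of the (power-of-two) hole size. */
	return (earliest + IOVA_HOLE_SIZE - 1) & ~(IOVA_HOLE_SIZE - 1);
}
```

An area that ends anywhere in the first 128KiB therefore pushes the next area to 256KiB, which matches the picture.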

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  164 +-
 1 file changed, 147 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 3d4dabb..4888383 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -47,8 +47,12 @@
 #define LPAGE_MASK (~(LPAGE_SIZE - 1))
 #define SPAGE_MASK (~(SPAGE_SIZE - 1))
 
-#define lv1ent_fault(sent) (((*(sent) & 3) == 0) || ((*(sent) & 3) == 3))
-#define lv1ent_page(sent) ((*(sent) & 3) == 1)
+#define lv1ent_fault(sent) ((*(sent) == ZERO_LV2LINK) || \
+  ((*(sent) & 3) == 0) || ((*(sent) & 3) == 3))
+#define lv1ent_zero(sent) (*(sent) == ZERO_LV2LINK)
+#define lv1ent_page_zero(sent) ((*(sent) & 3) == 1)
+#define lv1ent_page(sent) ((*(sent) != ZERO_LV2LINK) && \
+ ((*(sent) & 3) == 1))
 #define lv1ent_section(sent) ((*(sent) & 3) == 2)
 
 #define lv2ent_fault(pent) ((*(pent) & 3) == 0)
@@ -137,6 +141,8 @@ typedef u32 sysmmu_iova_t;
 typedef u32 sysmmu_pte_t;
 
 static struct kmem_cache *lv2table_kmem_cache;
+static sysmmu_pte_t *zero_lv2_table;
+#define ZERO_LV2LINK mk_lv1ent_page(virt_to_phys(zero_lv2_table))
 
 static sysmmu_pte_t *section_entry(sysmmu_pte_t *pgtable, sysmmu_iova_t iova)
 {
@@ -538,6 +544,33 @@ static bool exynos_sysmmu_disable(struct device *dev)
return disabled;
 }
 
+static void __sysmmu_tlb_invalidate_flpdcache(struct sysmmu_drvdata *data,
+ sysmmu_iova_t iova)
+{
+   if (__raw_sysmmu_version(data) == MAKE_MMU_VER(3, 3))
+   __raw_writel(iova | 0x1, data->sfrbase + REG_MMU_FLUSH_ENTRY);
+}
+
+static void sysmmu_tlb_invalidate_flpdcache(struct device *dev,
+   sysmmu_iova_t iova)
+{
+   struct sysmmu_list_data *list;
+
+   for_each_sysmmu_list(dev, list) {
+   unsigned long flags;
+   struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
+
+   __master_clk_enable(data);
+
+   spin_lock_irqsave(&data->lock, flags);
+   if (is_sysmmu_active(data) && data->runtime_active)
+   __sysmmu_tlb_invalidate_flpdcache

[PATCH v11 24/27] iommu/exynos: use exynos-iommu specific typedef

2014-03-13 Thread Cho KyongHo

This commit introduces sysmmu_pte_t for page table entries and
sysmmu_iova_t for I/O virtual addresses that are manipulated by the
exynos-iommu driver. The purpose of the typedefs is to decouple the
driver code from a change of CPU architecture from 32 bit to
64 bit.
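
A small user-space sketch of why the fixed-width typedef matters: sizeof(long) follows the CPU word size, while the page table entries stay 32 bits wide. The typedef mirrors the one added by the patch, using uint32_t in place of the kernel's u32. The SECT_SIZE and SPAGE_SIZE values (1MiB and 4KiB) are assumptions taken from the page sizes mentioned in the driver comments, and the two lv2table_size*() helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors "typedef u32 sysmmu_pte_t" from the patch. */
typedef uint32_t sysmmu_pte_t;

#define SECT_SIZE  (1u << 20)	/* assumed: 1MiB section */
#define SPAGE_SIZE (1u << 12)	/* assumed: 4KiB small page */

#define NUM_LV2ENTRIES (SECT_SIZE / SPAGE_SIZE)

/* Size of one second level table with the new, fixed-width entry type. */
static size_t lv2table_size(void)
{
	return NUM_LV2ENTRIES * sizeof(sysmmu_pte_t);
}

/* Size as computed before the patch; doubles where long is 64 bits. */
static size_t lv2table_size_old(void)
{
	return NUM_LV2ENTRIES * sizeof(long);
}
```

With the typedef the table is always 256 * 4 = 1024 bytes. With sizeof(long) it would grow to 2048 bytes on an LP64 kernel and no longer match the layout the hardware walks.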

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  103 ++
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index e375501..6e716cc 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -56,19 +56,19 @@
 #define lv2ent_large(pent) ((*(pent) & 3) == 1)
 
 #define section_phys(sent) (*(sent) & SECT_MASK)
-#define section_offs(iova) ((iova) & 0xF)
+#define section_offs(iova) ((sysmmu_iova_t)(iova) & 0xF)
 #define lpage_phys(pent) (*(pent) & LPAGE_MASK)
-#define lpage_offs(iova) ((iova) & 0x)
+#define lpage_offs(iova) ((sysmmu_iova_t)(iova) & 0x)
 #define spage_phys(pent) (*(pent) & SPAGE_MASK)
-#define spage_offs(iova) ((iova) & 0xFFF)
+#define spage_offs(iova) ((sysmmu_iova_t)(iova) & 0xFFF)
 
-#define lv1ent_offset(iova) ((iova) >> SECT_ORDER)
-#define lv2ent_offset(iova) (((iova) & 0xFF000) >> SPAGE_ORDER)
+#define lv1ent_offset(iova) ((sysmmu_iova_t)(iova) >> SECT_ORDER)
+#define lv2ent_offset(iova) (((sysmmu_iova_t)(iova) & 0xFF000) >> SPAGE_ORDER)
 
 #define NUM_LV1ENTRIES 4096
-#define NUM_LV2ENTRIES 256
+#define NUM_LV2ENTRIES (SECT_SIZE / SPAGE_SIZE)
 
-#define LV2TABLE_SIZE (NUM_LV2ENTRIES * sizeof(long))
+#define LV2TABLE_SIZE (NUM_LV2ENTRIES * sizeof(sysmmu_pte_t))
 
 #define SPAGES_PER_LPAGE (LPAGE_SIZE / SPAGE_SIZE)
 
@@ -133,16 +133,19 @@
&((struct exynos_iommu_owner *)dev->archdata.iommu)->mmu_list, \
entry)
 
+typedef u32 sysmmu_iova_t;
+typedef u32 sysmmu_pte_t;
+
 static struct kmem_cache *lv2table_kmem_cache;
 
-static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
+static sysmmu_pte_t *section_entry(sysmmu_pte_t *pgtable, sysmmu_iova_t iova)
 {
return pgtable + lv1ent_offset(iova);
 }
 
-static unsigned long *page_entry(unsigned long *sent, unsigned long iova)
+static sysmmu_pte_t *page_entry(sysmmu_pte_t *sent, sysmmu_iova_t iova)
 {
-   return (unsigned long *)phys_to_virt(
+   return (sysmmu_pte_t *)phys_to_virt(
lv2table_base(sent)) + lv2ent_offset(iova);
 }
 
@@ -194,7 +197,7 @@ struct exynos_iommu_owner {
 
 struct exynos_iommu_domain {
struct list_head clients; /* list of sysmmu_drvdata.node */
-   unsigned long *pgtable; /* lv1 page table, 16KB */
+   sysmmu_pte_t *pgtable; /* lv1 page table, 16KB */
short *lv2entcnt; /* free lv2 entry counter for each section */
spinlock_t lock; /* lock for this structure */
spinlock_t pgtablelock; /* lock for modifying page table @ pgtable */
@@ -288,7 +291,7 @@ static void __sysmmu_tlb_invalidate(void __iomem *sfrbase)
 }
 
 static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
-   unsigned long iova, unsigned int num_inv)
+   sysmmu_iova_t iova, unsigned int num_inv)
 {
unsigned int i;
for (i = 0; i < num_inv; i++) {
@@ -299,7 +302,7 @@ static void __sysmmu_tlb_invalidate_entry(void __iomem 
*sfrbase,
 }
 
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
-  unsigned long pgd)
+  phys_addr_t pgd)
 {
__raw_writel(pgd, sfrbase + REG_PT_BASE_ADDR);
 
@@ -308,22 +311,22 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
 
 static void show_fault_information(const char *name,
enum exynos_sysmmu_inttype itype,
-   phys_addr_t pgtable_base, unsigned long fault_addr)
+   phys_addr_t pgtable_base, sysmmu_iova_t fault_addr)
 {
-   unsigned long *ent;
+   sysmmu_pte_t *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at %#lx by %s(Page table base: %pa)\n",
+   pr_err("%s occurred at %#x by %s(Page table base: %pa)\n",
sysmmu_fault_name[itype], fault_addr, name, &pgtable_base);
 
ent = section_entry(phys_to_virt(pgtable_base), fault_addr);
-   pr_err("\tLv1 entry: 0x%lx\n", *ent);
+   pr_err("\tLv1 entry: %#x\n", *ent);
 
if (lv1ent_page(ent)) {
ent = page_entry(ent, fault_addr);
-   pr_err("\t Lv2 entry: 0x%lx\n", *ent);
+   pr_err("\t Lv2 entry: %#x\n", *ent);
}
 }
 
@@ -332,7 +335,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
/* SYSMMU is in blocked when interrupt occurred. */
struct sysmmu_drvdata *data = dev_id;
enum exynos_sysmmu_inttype itype;
-   unsigned long addr = -1;
+   sysmmu_iova_t addr = -1;

[PATCH v11 23/27] iommu/exynos: fix address handling

2014-03-13 Thread Cho KyongHo

Uses of the __pa and __va macros are changed to virt_to_phys and
phys_to_virt, which are recommended in driver code. The printk formatting
of physical addresses is also fixed to use %pa.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   45 +++---
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 2beb197..e375501 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -142,7 +142,8 @@ static unsigned long *section_entry(unsigned long *pgtable, 
unsigned long iova)
 
 static unsigned long *page_entry(unsigned long *sent, unsigned long iova)
 {
-   return (unsigned long *)__va(lv2table_base(sent)) + lv2ent_offset(iova);
+   return (unsigned long *)phys_to_virt(
+   lv2table_base(sent)) + lv2ent_offset(iova);
 }
 
 enum exynos_sysmmu_inttype {
@@ -215,7 +216,7 @@ struct sysmmu_drvdata {
struct iommu_domain *domain;
bool runtime_active;
bool suspended;
-   unsigned long pgtable;
+   phys_addr_t pgtable;
 };
 
 static bool set_sysmmu_active(struct sysmmu_drvdata *data)
@@ -307,17 +308,17 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
 
 static void show_fault_information(const char *name,
enum exynos_sysmmu_inttype itype,
-   unsigned long pgtable_base, unsigned long fault_addr)
+   phys_addr_t pgtable_base, unsigned long fault_addr)
 {
unsigned long *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at 0x%lx by %s(Page table base: 0x%lx)\n",
-   sysmmu_fault_name[itype], fault_addr, name, pgtable_base);
+   pr_err("%s occurred at %#lx by %s(Page table base: %pa)\n",
+   sysmmu_fault_name[itype], fault_addr, name, &pgtable_base);
 
-   ent = section_entry(__va(pgtable_base), fault_addr);
+   ent = section_entry(phys_to_virt(pgtable_base), fault_addr);
pr_err("\tLv1 entry: 0x%lx\n", *ent);
 
if (lv1ent_page(ent)) {
@@ -917,7 +918,7 @@ static void exynos_iommu_domain_destroy(struct iommu_domain 
*domain)
for (i = 0; i < NUM_LV1ENTRIES; i++)
if (lv1ent_page(priv->pgtable + i))
kmem_cache_free(lv2table_kmem_cache,
-   __va(lv2table_base(priv->pgtable + i)));
+   phys_to_virt(lv2table_base(priv->pgtable + i)));
 
free_pages((unsigned long)priv->pgtable, 2);
free_pages((unsigned long)priv->lv2entcnt, 1);
@@ -931,11 +932,12 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*domain,
struct exynos_iommu_owner *owner = dev->archdata.iommu;
struct exynos_iommu_domain *priv = domain->priv;
unsigned long flags;
+   phys_addr_t pgtable = virt_to_phys(priv->pgtable);
int ret;
 
spin_lock_irqsave(&priv->lock, flags);
 
-   ret = __exynos_sysmmu_enable(dev, __pa(priv->pgtable), domain);
+   ret = __exynos_sysmmu_enable(dev, pgtable, domain);
if (ret == 0) {
list_add_tail(&owner->client, &priv->clients);
owner->domain = domain;
@@ -943,13 +945,14 @@ static int exynos_iommu_attach_device(struct iommu_domain 
*domain,
 
spin_unlock_irqrestore(&priv->lock, flags);
 
-   if (ret < 0)
-   dev_err(dev, "%s: Failed to attach IOMMU with pgtable %#x\n",
-   __func__, __pa(priv->pgtable));
-   else
-   dev_dbg(dev, "%s: Attached IOMMU with pgtable 0x%x%s\n",
-   __func__, __pa(priv->pgtable),
-   (ret == 0) ? "" : ", again");
+   if (ret < 0) {
+   dev_err(dev, "%s: Failed to attach IOMMU with pgtable %pa\n",
+   __func__, &pgtable);
+   return ret;
+   }
+
+   dev_dbg(dev, "%s: Attached IOMMU with pgtable %pa %s\n",
+   __func__, &pgtable, (ret == 0) ? "" : ", again");
 
return ret;
 }
@@ -975,11 +978,13 @@ static void exynos_iommu_detach_device(struct 
iommu_domain *domain,
 
spin_unlock_irqrestore(&priv->lock, flags);
 
-   if (owner == dev->archdata.iommu)
-   dev_dbg(dev, "%s: Detached IOMMU with pgtable %#x\n",
-   __func__, __pa(priv->pgtable));
-   else
+   if (owner == dev->archdata.iommu) {
+   phys_addr_t pgtable = virt_to_phys(priv->pgtable);
+   dev_dbg(dev, "%s: Detached IOMMU with pgtable %pa\n",
+   __func__, &pgtable);
+   } else {
dev_dbg(dev, "%s: No IOMMU is attached\n", __func__);
+   }
 }
 
 static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
@@ -998,7 +1003,7 @@ static unsigned l

[PATCH v11 22/27] iommu/exynos: add devices attached to the System MMU to an IOMMU group

2014-03-13 Thread Cho KyongHo

Patch written by Antonios Motakis :

IOMMU groups are expected by certain users of the IOMMU API,
e.g. VFIO. Since each device is behind its own System MMU, we
can allocate a new IOMMU group for each device.

Reviewed-by: Cho KyongHo 
Signed-off-by: Antonios Motakis 
---
 drivers/iommu/exynos-iommu.c |   28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 543ea2e0..2beb197 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1213,6 +1213,32 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct 
iommu_domain *domain,
return phys;
 }
 
+static int exynos_iommu_add_device(struct device *dev)
+{
+   struct iommu_group *group;
+   int ret;
+
+   group = iommu_group_get(dev);
+
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group)) {
+   dev_err(dev, "Failed to allocate IOMMU group\n");
+   return PTR_ERR(group);
+   }
+   }
+
+   ret = iommu_group_add_device(group, dev);
+   iommu_group_put(group);
+
+   return ret;
+}
+
+static void exynos_iommu_remove_device(struct device *dev)
+{
+   iommu_group_remove_device(dev);
+}
+
 static struct iommu_ops exynos_iommu_ops = {
.domain_init = &exynos_iommu_domain_init,
.domain_destroy = &exynos_iommu_domain_destroy,
@@ -1221,6 +1247,8 @@ static struct iommu_ops exynos_iommu_ops = {
.map = &exynos_iommu_map,
.unmap = &exynos_iommu_unmap,
.iova_to_phys = &exynos_iommu_iova_to_phys,
+   .add_device = &exynos_iommu_add_device,
+   .remove_device = &exynos_iommu_remove_device,
.pgsize_bitmap = SECT_SIZE | LPAGE_SIZE | SPAGE_SIZE,
 };
 
-- 
1.7.9.5


[PATCH v11 19/27] iommu/exynos: add support for power management subsystems.

2014-03-13 Thread Cho KyongHo

This adds support for Suspend to RAM and Runtime Power Management.

Since a System MMU is located in the same local power domain as its
master H/W, the System MMU must be initialized before it is used if
its power domain was ever turned off. TLB invalidation following
unmapping in the page tables must also be performed while the power
domain is turned on.

This patch ensures that the resume and runtime_resume (restore_state)
functions in this driver are called before the resume and
runtime_resume callback functions in the drivers of master H/Ws.
Likewise, the suspend and runtime_suspend (save_state) functions in this
driver are called after the suspend and runtime_suspend callbacks in the
drivers of master H/Ws.

To benefit from this support, the master H/W and its System
MMU must reside in the same power domain in terms of the Linux kernel. If
a master H/W does not use the generic I/O power domain, its driver must
call iommu_attach_device() after its local power domain is turned on and
iommu_detach_device() before it is turned off.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  220 ++
 1 file changed, 201 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 9037da0..84ba29a 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -203,6 +204,7 @@ struct sysmmu_drvdata {
int activations;
rwlock_t lock;
struct iommu_domain *domain;
+   bool runtime_active;
unsigned long pgtable;
 };
 
@@ -388,7 +390,8 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
data->pgtable = 0;
data->domain = NULL;
 
-   __sysmmu_disable_nocount(data);
+   if (data->runtime_active)
+   __sysmmu_disable_nocount(data);
 
dev_dbg(data->sysmmu, "Disabled\n");
} else  {
@@ -449,7 +452,8 @@ static int __sysmmu_enable(struct sysmmu_drvdata *data,
data->pgtable = pgtable;
data->domain = domain;
 
-   __sysmmu_enable_nocount(data);
+   if (data->runtime_active)
+   __sysmmu_enable_nocount(data);
 
dev_dbg(data->sysmmu, "Enabled\n");
} else {
@@ -534,13 +538,11 @@ static void sysmmu_tlb_invalidate_entry(struct device 
*dev, unsigned long iova,
data = dev_get_drvdata(owner->sysmmu);
 
read_lock_irqsave(&data->lock, flags);
-   if (is_sysmmu_active(data)) {
-   unsigned int maj;
+   if (is_sysmmu_active(data) && data->runtime_active) {
unsigned int num_inv = 1;
 
__master_clk_enable(data);
 
-   maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
/*
 * L2TLB invalidation required
 * 4KB page: 1 invalidation
@@ -551,7 +553,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, 
unsigned long iova,
 * 1MB page can be cached in one of all sets.
 * 64KB page can be one of 16 consecutive sets.
 */
-   if ((maj >> 28) == 2) /* major version number */
+   if (__sysmmu_version(data, NULL) == 2) /* major version number 
*/
num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
 
if (sysmmu_block(data->sfrbase)) {
@@ -576,7 +578,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
data = dev_get_drvdata(owner->sysmmu);
 
read_lock_irqsave(&data->lock, flags);
-   if (is_sysmmu_active(data)) {
+   if (is_sysmmu_active(data) && data->runtime_active) {
__master_clk_enable(data);
if (sysmmu_block(data->sfrbase)) {
__sysmmu_tlb_invalidate(data->sfrbase);
@@ -677,11 +679,40 @@ static int __init exynos_sysmmu_probe(struct 
platform_device *pdev)
platform_set_drvdata(pdev, data);
 
pm_runtime_enable(dev);
+   data->runtime_active = !pm_runtime_enabled(dev);
 
dev_dbg(dev, "Probed and initialized\n");
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int sysmmu_suspend(struct device *dev)
+{
+   struct sysmmu_drvdata *data = dev_get_drvdata(dev);
+   unsigned long flags;
+   read_lock_irqsave(&data->lock, flags);
+   if (is_sysmmu_active(data) &&
+   (!pm_runtime_enabled(dev) || data->runtime_active))
+   __sysmmu_disable_nocount(data);
+   read_unlock_irqrestore(&data->lock, flags);
+   return 0;
+}
+
+static int sysmmu_resume(struct device *dev)
+{
+   struct sysmmu_drvdata *data = dev_get_drvdata(dev);
+   unsigned long flags;
+   read_lock_irqsave(&data->lock, flags);
+   if (is_sysmmu_active(data) &&
+   (!pm_runtime_enabled(dev) || data->runtime_active))
+   __sysmm

[PATCH v11 21/27] iommu/exynos: change rwlock to spinlock

2014-03-13 Thread Cho KyongHo

Acquiring the read lock is not more frequent than acquiring the write
lock, so an rwlock brings no benefit here. This commit replaces the
rwlock with a spinlock.

Reviewed-by: Grant Grundler 
Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 7489343..543ea2e0 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -211,7 +211,7 @@ struct sysmmu_drvdata {
struct clk *clk;
struct clk *clk_master;
int activations;
-   rwlock_t lock;
+   spinlock_t lock;
struct iommu_domain *domain;
bool runtime_active;
bool suspended;
@@ -334,11 +334,12 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
unsigned long addr = -1;
int ret = -ENOSYS;
 
-   read_lock(&data->lock);
-
WARN_ON(!is_sysmmu_active(data));
 
+   spin_lock(&data->lock);
+
__master_clk_enable(data);
+
itype = (enum exynos_sysmmu_inttype)
__ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN
@@ -371,7 +372,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 
__master_clk_disable(data);
 
-   read_unlock(&data->lock);
+   spin_unlock(&data->lock);
 
return IRQ_HANDLED;
 }
@@ -392,7 +393,7 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
bool disabled;
unsigned long flags;
 
-   write_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
 
disabled = set_sysmmu_inactive(data);
 
@@ -409,7 +410,7 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
data->activations);
}
 
-   write_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 
return disabled;
 }
@@ -457,7 +458,7 @@ static int __sysmmu_enable(struct sysmmu_drvdata *data,
int ret = 0;
unsigned long flags;
 
-   write_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
if (set_sysmmu_active(data)) {
data->pgtable = pgtable;
data->domain = domain;
@@ -475,7 +476,7 @@ static int __sysmmu_enable(struct sysmmu_drvdata *data,
if (WARN_ON(ret < 0))
set_sysmmu_inactive(data); /* decrement count */
 
-   write_unlock_irqrestore(&data->lock, flags);
+   spin_unlock_irqrestore(&data->lock, flags);
 
return ret;
 }
@@ -562,7 +563,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
 
for_each_sysmmu_list(dev, list) {
struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
-   read_lock(&data->lock);
+   spin_lock(&data->lock);
if (is_sysmmu_active(data) && data->runtime_active) {
unsigned int num_inv = 1;
 
@@ -594,7 +595,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
iova);
}
 
-   read_unlock(&data->lock);
+   spin_unlock(&data->lock);
}
 
spin_unlock_irqrestore(&owner->lock, flags);
@@ -610,7 +611,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
 
for_each_sysmmu_list(dev, list) {
struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
-   read_lock(&data->lock);
+   spin_lock(&data->lock);
if (is_sysmmu_active(data) && data->runtime_active) {
__master_clk_enable(data);
if (sysmmu_block(data->sfrbase)) {
@@ -621,7 +622,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
} else {
dev_dbg(dev, "disabled. Skipping TLB invalidation\n");
}
-   read_unlock(&data->lock);
+   spin_unlock(&data->lock);
}
 
spin_unlock_irqrestore(&owner->lock, flags);
@@ -819,7 +820,7 @@ static int __init exynos_sysmmu_probe(struct platform_device *pdev)
if (!ret) {
data->runtime_active = !pm_runtime_enabled(dev);
data->sysmmu = dev;
-   rwlock_init(&data->lock);
+   spin_lock_init(&data->lock);
 
platform_set_drvdata(pdev, data);
}
@@ -1269,12 +1270,12 @@ static int sysmmu_pm_genpd_suspend(struct device *dev)
for_each_sysmmu_list(dev, list) {
struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
unsigned long flags;
-   write_lock_irqsave(&data->lock, flags);
+   spin_lock_irqsave(&data->lock, flags);
if (!data->suspended && is_sysmmu_active(data) &&

[PATCH v11 20/27] iommu/exynos: allow having multiple System MMUs for a master H/W

2014-03-13 Thread Cho KyongHo

Some master device descriptors, such as fimc-is, abstract very complex
H/W and may have multiple System MMUs. For those devices, the design of
the link between a System MMU and its master H/W needs to be
reconsidered.

A link structure, sysmmu_list_data, is introduced. It is linked into a
list owned by the master H/W and holds a pointer to the device
descriptor of a System MMU. Given the device descriptor of a master
H/W, it is possible to traverse all System MMUs that must be controlled
along with the master H/W.
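
The enable path in this patch walks the list and, when one System MMU
fails to enable, disables the ones that were enabled before it. The
following is only a user-space sketch of that pattern, not the driver
code: struct mmu, mmu_enable(), mmu_disable() and master_enable_all()
are hypothetical names, and a singly linked list stands in for the
kernel's list_head.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one System MMU; not the driver's sysmmu_drvdata. */
struct mmu {
	struct mmu *next;     /* next System MMU of the same master H/W */
	bool enabled;
	bool fail_on_enable;  /* hook to simulate an enable error */
};

static int mmu_enable(struct mmu *m)
{
	if (m->fail_on_enable)
		return -1;
	m->enabled = true;
	return 0;
}

static void mmu_disable(struct mmu *m)
{
	m->enabled = false;
}

/*
 * Enable every MMU on the list. On the first failure, disable the
 * entries before the failing one and return the error.
 */
static int master_enable_all(struct mmu *head)
{
	struct mmu *m, *iter;
	int ret = 0;

	for (m = head; m != NULL; m = m->next) {
		ret = mmu_enable(m);
		if (ret < 0) {
			for (iter = head; iter != m; iter = iter->next)
				mmu_disable(iter);
			break;
		}
	}
	return ret;
}
```

With three entries where the last one fails, master_enable_all()
returns the error and leaves all three disabled, which is the "all or
nothing" behaviour the rollback loop in the patch aims for.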

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  534 ++
 1 file changed, 333 insertions(+), 201 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 84ba29a..7489343 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -128,6 +128,10 @@
 #define __master_clk_disable(data) __clk_gate_ctrl(data, clk_master, dis)
 
 #define has_sysmmu(dev)(dev->archdata.iommu != NULL)
+#define for_each_sysmmu_list(dev, list_data)   \
+   list_for_each_entry(list_data,  \
+   &((struct exynos_iommu_owner *)dev->archdata.iommu)->mmu_list, \
+   entry)
 
 static struct kmem_cache *lv2table_kmem_cache;
 
@@ -181,7 +185,7 @@ static char *sysmmu_fault_name[SYSMMU_FAULTS_NUM] = {
 struct exynos_iommu_owner {
struct list_head client; /* entry of exynos_iommu_domain.clients */
struct device *dev;
-   struct device *sysmmu;
+   struct list_head mmu_list;  /* list of sysmmu_list_data.entry */
struct iommu_domain *domain;
void *vmm_data; /* IO virtual memory manager's data */
spinlock_t lock;/* Lock to preserve consistency of System MMU */
@@ -195,6 +199,11 @@ struct exynos_iommu_domain {
spinlock_t pgtablelock; /* lock for modifying page table @ pgtable */
 };
 
+struct sysmmu_list_data {
+   struct list_head entry; /* entry of exynos_iommu_owner.mmu_list */
+   struct device *sysmmu;
+};
+
 struct sysmmu_drvdata {
struct device *sysmmu;  /* System MMU's device descriptor */
struct device *master;  /* Owner of system MMU */
@@ -205,6 +214,7 @@ struct sysmmu_drvdata {
rwlock_t lock;
struct iommu_domain *domain;
bool runtime_active;
+   bool suspended;
unsigned long pgtable;
 };
 
@@ -471,28 +481,39 @@ static int __sysmmu_enable(struct sysmmu_drvdata *data,
 }
 
 /* __exynos_sysmmu_enable: Enables System MMU
- *
- * returns -error if an error occurred and System MMU is not enabled,
- * 0 if the System MMU has been just enabled and 1 if System MMU was already
- * enabled before.
- */
+*
+* returns -error if an error occurred and System MMU is not enabled,
+* 0 if the System MMU has been just enabled and 1 if System MMU was already
+* enabled before.
+*/
 static int __exynos_sysmmu_enable(struct device *dev, unsigned long pgtable,
  struct iommu_domain *domain)
 {
int ret = 0;
unsigned long flags;
struct exynos_iommu_owner *owner = dev->archdata.iommu;
-   struct sysmmu_drvdata *data;
+   struct sysmmu_list_data *list;
 
BUG_ON(!has_sysmmu(dev));
 
spin_lock_irqsave(&owner->lock, flags);
 
-   data = dev_get_drvdata(owner->sysmmu);
-
-   ret = __sysmmu_enable(data, pgtable, domain);
-   if (ret >= 0)
+   for_each_sysmmu_list(dev, list) {
+   struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
data->master = dev;
+   ret = __sysmmu_enable(data, pgtable, domain);
+   if (ret < 0) {
+   struct sysmmu_list_data *iter;
+   for_each_sysmmu_list(dev, iter) {
+   if (iter->sysmmu == list->sysmmu)
+   break;
+   data = dev_get_drvdata(iter->sysmmu);
+   __sysmmu_disable(data);
+   data->master = NULL;
+   }
+   break;
+   }
+   }
 
spin_unlock_irqrestore(&owner->lock, flags);
 
@@ -511,17 +532,19 @@ static bool exynos_sysmmu_disable(struct device *dev)
unsigned long flags;
bool disabled = true;
struct exynos_iommu_owner *owner = dev->archdata.iommu;
-   struct sysmmu_drvdata *data;
+   struct sysmmu_list_data *list;
 
BUG_ON(!has_sysmmu(dev));
 
spin_lock_irqsave(&owner->lock, flags);
 
-   data = dev_get_drvdata(owner->sysmmu);
-
-   disabled = __sysmmu_disable(data);
-   if (disabled)
-   data->master = NULL;
+   /* Every call to __sysmmu_disable() must return same result */
+   for_each_sysmmu_list(dev, list) {
+   struct sysmmu_drvdata *data = dev_get_drvdata(list->sysmmu);
+   disabl

[PATCH v11 15/27] iommu/exynos: use convenient macro to handle gate clocks

2014-03-13 Thread Cho KyongHo

The exynos-iommu driver must handle the master H/W's gate clock as well
as the System MMU's gate clock. To improve readability of the source
code, macros to gate/ungate those clocks are defined.
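
The macros rely on preprocessor token pasting: "en" or "dis" is glued
between "clk_" and "able" to select clk_enable() or clk_disable(), and
the call is skipped when the clock pointer is NULL. Below is a
user-space sketch of that idea. struct clk, struct drvdata and the two
clk functions are hypothetical stand-ins, and the leading double
underscores of the patch's macro names are dropped because such names
are reserved outside the kernel.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct clk. */
struct clk {
	int enable_count;
};

static void clk_enable(struct clk *clk)
{
	clk->enable_count++;
}

static void clk_disable(struct clk *clk)
{
	clk->enable_count--;
}

struct drvdata {
	struct clk *clk;        /* System MMU gate clock, may be NULL */
	struct clk *clk_master; /* master H/W gate clock, may be NULL */
};

/* "en" or "dis" is pasted between "clk_" and "able" to pick the function. */
#define clk_gate_ctrl(data, clk, en) do {		\
		if ((data)->clk)			\
			clk_##en##able((data)->clk);	\
	} while (0)

#define sysmmu_clk_enable(data)   clk_gate_ctrl(data, clk, en)
#define sysmmu_clk_disable(data)  clk_gate_ctrl(data, clk, dis)
#define master_clk_enable(data)   clk_gate_ctrl(data, clk_master, en)
#define master_clk_disable(data)  clk_gate_ctrl(data, clk_master, dis)
```

With clk_master left NULL, master_clk_enable(&d) is a no-op while
sysmmu_clk_enable(&d) still increments the System MMU clock's count,
so callers no longer need their own NULL checks.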

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 71e77f1..cef62d0 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -101,6 +101,16 @@
 #define REG_PB1_SADDR  0x054
 #define REG_PB1_EADDR  0x058
 
+#define __clk_gate_ctrl(data, clk, en) do {\
+   if (data->clk)  \
+   clk_##en##able(data->clk);  \
+   } while (0)
+
+#define __sysmmu_clk_enable(data)  __clk_gate_ctrl(data, clk, en)
+#define __sysmmu_clk_disable(data) __clk_gate_ctrl(data, clk, dis)
+#define __master_clk_enable(data)  __clk_gate_ctrl(data, clk_master, en)
+#define __master_clk_disable(data) __clk_gate_ctrl(data, clk_master, dis)
+
 static struct kmem_cache *lv2table_kmem_cache;
 
 static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
@@ -302,7 +312,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 
WARN_ON(!is_sysmmu_active(data));
 
-   clk_enable(data->clk_master);
+   __master_clk_enable(data);
itype = (enum exynos_sysmmu_inttype)
__ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN
@@ -329,7 +339,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (itype != SYSMMU_FAULT_UNKNOWN)
sysmmu_unblock(data->sfrbase);
 
-   clk_disable(data->clk_master);
+   __master_clk_disable(data);
 
read_unlock(&data->lock);
 
@@ -346,12 +356,12 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
if (!set_sysmmu_inactive(data))
goto finish;
 
-   clk_enable(data->clk_master);
+   __master_clk_enable(data);
 
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
 
-   clk_disable(data->clk);
-   clk_disable(data->clk_master);
+   __sysmmu_clk_disable(data);
+   __master_clk_disable(data);
 
disabled = true;
data->pgtable = 0;
@@ -396,14 +406,14 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
 
data->pgtable = pgtable;
 
-   clk_enable(data->clk_master);
-   clk_enable(data->clk);
+   __master_clk_enable(data);
+   __sysmmu_clk_enable(data);
 
__sysmmu_set_ptbase(data->sfrbase, pgtable);
 
__raw_writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
 
-   clk_disable(data->clk_master);
+   __master_clk_disable(data);
 
data->domain = domain;
 
@@ -462,7 +472,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
unsigned int maj;
unsigned int num_inv = 1;
 
-   clk_enable(data->clk_master);
+   __master_clk_enable(data);
 
maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
/*
@@ -483,7 +493,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
num_inv);
sysmmu_unblock(data->sfrbase);
}
-   clk_disable(data->clk_master);
+   __master_clk_disable(data);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
@@ -499,12 +509,12 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
read_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
-   clk_enable(data->clk_master);
+   __master_clk_enable(data);
if (sysmmu_block(data->sfrbase)) {
__sysmmu_tlb_invalidate(data->sfrbase);
sysmmu_unblock(data->sfrbase);
}
-   clk_disable(data->clk_master);
+   __master_clk_disable(data);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
-- 
1.7.9.5


[PATCH v11 14/27] iommu/exynos: gating clocks of master H/W

2014-03-13 Thread Cho KyongHo

This patch gates the clocks of the master H/W as well as the clocks of
the System MMU if master clocks are specified.

Some Exynos SoCs (e.g. GScalers in Exynos5250) have dependencies
between the gating clocks of a master H/W and its System MMU. For such
H/W, accessing the control registers of the System MMU is prohibited
unless the gating clocks of both the System MMU and its master H/W are
enabled.
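
The ordering used on the enable path can be modelled in user space as
follows. This is a sketch under stated assumptions, not driver code:
struct sysmmu_model, model_write_ctrl(), model_enable() and
MODEL_CTRL_ENABLE are hypothetical, and the "register" is a plain
struct member whose write is rejected unless both clocks are counted
as running.

```c
#include <stdbool.h>

#define MODEL_CTRL_ENABLE 0x1u /* hypothetical value, not taken from the driver */

/* Hypothetical model of one System MMU and its two gate clocks. */
struct sysmmu_model {
	int clk_count;        /* enable count of the System MMU clock */
	int clk_master_count; /* enable count of the master H/W clock */
	unsigned int ctrl;    /* stand-in for the MMU_CTRL register */
};

/* A register write only takes effect while both clocks are running. */
static bool model_write_ctrl(struct sysmmu_model *m, unsigned int val)
{
	if (m->clk_count <= 0 || m->clk_master_count <= 0)
		return false;
	m->ctrl = val;
	return true;
}

/*
 * Same order as the enable path in the patch: ungate the master clock,
 * ungate the System MMU clock, write the register, then gate the
 * master clock again. The System MMU clock stays on while enabled.
 */
static bool model_enable(struct sysmmu_model *m)
{
	bool ok;

	m->clk_master_count++;
	m->clk_count++;
	ok = model_write_ctrl(m, MODEL_CTRL_ENABLE);
	m->clk_master_count--;
	return ok;
}
```

A write attempted before model_enable() is rejected in this model,
which is the situation the added clk_enable(data->clk_master) calls
are meant to avoid on real hardware.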

CC: Tomasz Figa 
Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 34feb04..71e77f1 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -173,6 +173,7 @@ struct sysmmu_drvdata {
struct device *dev; /* Owner of system MMU */
void __iomem *sfrbase;
struct clk *clk;
+   struct clk *clk_master;
int activations;
rwlock_t lock;
struct iommu_domain *domain;
@@ -301,6 +302,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 
WARN_ON(!is_sysmmu_active(data));
 
+   clk_enable(data->clk_master);
itype = (enum exynos_sysmmu_inttype)
__ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN
@@ -327,6 +329,8 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (itype != SYSMMU_FAULT_UNKNOWN)
sysmmu_unblock(data->sfrbase);
 
+   clk_disable(data->clk_master);
+
read_unlock(&data->lock);
 
return IRQ_HANDLED;
@@ -342,10 +346,12 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
if (!set_sysmmu_inactive(data))
goto finish;
 
+   clk_enable(data->clk_master);
+
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
 
-   if (data->clk)
-   clk_disable(data->clk);
+   clk_disable(data->clk);
+   clk_disable(data->clk_master);
 
disabled = true;
data->pgtable = 0;
@@ -388,15 +394,17 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
goto finish;
}
 
-   if (data->clk)
-   clk_enable(data->clk);
-
data->pgtable = pgtable;
 
+   clk_enable(data->clk_master);
+   clk_enable(data->clk);
+
__sysmmu_set_ptbase(data->sfrbase, pgtable);
 
__raw_writel(CTRL_ENABLE, data->sfrbase + REG_MMU_CTRL);
 
+   clk_disable(data->clk_master);
+
data->domain = domain;
 
dev_dbg(data->sysmmu, "Enabled\n");
@@ -453,6 +461,9 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
if (is_sysmmu_active(data)) {
unsigned int maj;
unsigned int num_inv = 1;
+
+   clk_enable(data->clk_master);
+
maj = __raw_readl(data->sfrbase + REG_MMU_VERSION);
/*
 * L2TLB invalidation required
@@ -472,6 +483,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
num_inv);
sysmmu_unblock(data->sfrbase);
}
+   clk_disable(data->clk_master);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
@@ -487,10 +499,12 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
read_lock_irqsave(&data->lock, flags);
 
if (is_sysmmu_active(data)) {
+   clk_enable(data->clk_master);
if (sysmmu_block(data->sfrbase)) {
__sysmmu_tlb_invalidate(data->sfrbase);
sysmmu_unblock(data->sfrbase);
}
+   clk_disable(data->clk_master);
} else {
dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
@@ -544,6 +558,17 @@ static int __init exynos_sysmmu_probe(struct platform_device *pdev)
return ret;
}
 
+   data->clk_master = devm_clk_get(dev, "master");
+   if (IS_ERR(data->clk_master))
+   data->clk_master = NULL;
+
+   ret = clk_prepare(data->clk_master);
+   if (ret) {
+   clk_unprepare(data->clk);
+   dev_err(dev, "Failed to prepare master's clk\n");
+   return ret;
+   }
+
data->sysmmu = dev;
rwlock_init(&data->lock);
INIT_LIST_HEAD(&data->node);
-- 
1.7.9.5


[PATCH v11 18/27] iommu/exynos: turn on useful configuration options

2014-03-13 Thread Cho KyongHo

This turns on FLPD_CACHE, ACGEN and SYSSEL.

FLPD_CACHE is a cache of 1st level page table entries that contain
the address of a 2nd level page table. It reduces the latency of page
table walking.

ACGEN is architectural clock gating: the System MMU gates its own
clocks when it is not active. Note that ACGEN is different from clock
gating by the CPU. ACGEN only gates clocks to the internal logic of
the System MMU, while clock gating by the CPU gates clocks to the
System MMU.

SYSSEL selects the System MMU version in some Exynos SoCs. Some Exynos
SoCs have an option to select System MMU versions exclusively because
those SoCs adopt a new System MMU version experimentally.

This also always selects LRU as the TLB replacement policy. Selecting
the TLB replacement policy is deprecated from System MMU 3.2. The TLB
in System MMU 3.3 has a single TLB replacement policy, LRU. The bit of
MMU_CFG that selected the TLB replacement policy remains as a reserved
bit.

The QoS value of page table walking is set to 15 (the highest value).
System MMU 3.3 can inherit the QoS value of page table walking from its
master H/W's transaction. This new feature is enabled by default and
the QoS value written to MMU_CFG is ignored.
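
The version-dependent choice of MMU_CFG bits can be read as a pure
function of the major and minor version. The sketch below mirrors the
branches of __sysmmu_init_config() in this patch with the CFG_* values
it defines; sysmmu_cfg_for_version() is a hypothetical helper name,
and the register write is left out so the logic can be checked in
user space.

```c
/* Bit definitions as introduced by this patch (unsigned suffixes added). */
#define CFG_LRU        0x1u
#define CFG_QOS(n)     (((n) & 0xFu) << 7)
#define CFG_ACGEN      (1u << 24) /* System MMU 3.3 only */
#define CFG_SYSSEL     (1u << 22) /* System MMU 3.2 only */
#define CFG_FLPDCACHE  (1u << 20) /* System MMU 3.2 and later */

/* Hypothetical helper: returns the value that would be written to MMU_CFG. */
static unsigned int sysmmu_cfg_for_version(unsigned int maj, unsigned int min)
{
	unsigned int cfg = CFG_LRU | CFG_QOS(15);

	if (maj == 3 && min >= 2) {
		cfg |= CFG_FLPDCACHE;
		if (min == 3) {
			cfg |= CFG_ACGEN;
			cfg &= ~CFG_LRU; /* the LRU bit is reserved on 3.3 */
		} else {
			cfg |= CFG_SYSSEL;
		}
	}
	return cfg;
}
```

Under these definitions a version 3.2 System MMU gets 0x500781, a
version 3.3 one gets 0x1100780, and older versions keep the base value
0x781 (LRU plus QoS 15).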

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   52 +-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 6834556..9037da0 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -82,6 +82,13 @@
 #define CTRL_BLOCK 0x7
 #define CTRL_DISABLE   0x0
 
+#define CFG_LRU0x1
+#define CFG_QOS(n) ((n & 0xF) << 7)
+#define CFG_MASK   0x0150FFFF /* Selecting bit 0-15, 20, 22 and 24 */
+#define CFG_ACGEN  (1 << 24) /* System MMU 3.3 only */
+#define CFG_SYSSEL (1 << 22) /* System MMU 3.2 only */
+#define CFG_FLPDCACHE  (1 << 20) /* System MMU 3.2+ only */
+
 #define REG_MMU_CTRL   0x000
 #define REG_MMU_CFG0x004
 #define REG_MMU_STATUS 0x008
@@ -98,6 +105,12 @@
 
 #define REG_MMU_VERSION0x034
 
+#define MMU_MAJ_VER(val)   ((val) >> 7)
+#define MMU_MIN_VER(val)   ((val) & 0x7F)
+#define MMU_RAW_VER(reg)   (((reg) >> 21) & ((1 << 11) - 1)) /* 11 bits */
+
+#define MAKE_MMU_VER(maj, min) ((((maj) & 0xF) << 7) | ((min) & 0x7F))
+
 #define REG_PB0_SADDR  0x04C
 #define REG_PB0_EADDR  0x050
 #define REG_PB1_SADDR  0x054
@@ -217,6 +230,29 @@ static void sysmmu_unblock(void __iomem *sfrbase)
__raw_writel(CTRL_ENABLE, sfrbase + REG_MMU_CTRL);
 }
 
+static unsigned int __raw_sysmmu_version(struct sysmmu_drvdata *data)
+{
+   return MMU_RAW_VER(__raw_readl(data->sfrbase + REG_MMU_VERSION));
+}
+
+static unsigned int __sysmmu_version(struct sysmmu_drvdata *data,
+unsigned int *minor)
+{
+   unsigned int ver = 0;
+
+   ver = __raw_sysmmu_version(data);
+   if (ver > MAKE_MMU_VER(3, 3)) {
+   dev_err(data->sysmmu, "%s: version(%d.%d) is higher than 3.3\n",
+   __func__, MMU_MAJ_VER(ver), MMU_MIN_VER(ver));
+   BUG();
+   }
+
+   if (minor)
+   *minor = MMU_MIN_VER(ver);
+
+   return MMU_MAJ_VER(ver);
+}
+
 static bool sysmmu_block(void __iomem *sfrbase)
 {
int i = 120;
@@ -367,7 +403,21 @@ static bool __sysmmu_disable(struct sysmmu_drvdata *data)
 
 static void __sysmmu_init_config(struct sysmmu_drvdata *data)
 {
-   unsigned long cfg = 0;
+   unsigned long cfg = CFG_LRU | CFG_QOS(15);
+   int maj, min = 0;
+
+   maj = __sysmmu_version(data, &min);
+   if (maj == 3) {
+   if (min >= 2) {
+   cfg |= CFG_FLPDCACHE;
+   if (min == 3) {
+   cfg |= CFG_ACGEN;
+   cfg &= ~CFG_LRU;
+   } else {
+   cfg |= CFG_SYSSEL;
+   }
+   }
+   }
 
__raw_writel(cfg, data->sfrbase + REG_MMU_CFG);
 }
-- 
1.7.9.5


[PATCH v11 13/27] iommu/exynos: support for device tree

2014-03-13 Thread Cho KyongHo

This commit adds device tree support for System MMU.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/Kconfig|5 ++---
 drivers/iommu/exynos-iommu.c |   21 +
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index df56e4c..22af807 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -178,16 +178,15 @@ config TEGRA_IOMMU_SMMU
 
 config EXYNOS_IOMMU
bool "Exynos IOMMU Support"
-   depends on ARCH_EXYNOS && EXYNOS_DEV_SYSMMU
+   depends on ARCH_EXYNOS
select IOMMU_API
+   default n
help
  Support for the IOMMU(System MMU) of Samsung Exynos application
  processor family. This enables H/W multimedia accellerators to see
  non-linear physical memory chunks as a linear memory in their
  address spaces
 
- If unsure, say N here.
-
 config EXYNOS_IOMMU_DEBUG
bool "Debugging log for Exynos IOMMU"
depends on EXYNOS_IOMMU
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 33b424d..34feb04 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -497,7 +498,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
read_unlock_irqrestore(&data->lock, flags);
 }
 
-static int exynos_sysmmu_probe(struct platform_device *pdev)
+static int __init exynos_sysmmu_probe(struct platform_device *pdev)
 {
int irq, ret;
struct device *dev = &pdev->dev;
@@ -557,11 +558,23 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
return 0;
 }
 
-static struct platform_driver exynos_sysmmu_driver = {
-   .probe  = exynos_sysmmu_probe,
-   .driver = {
+#ifdef CONFIG_OF
+static struct of_device_id sysmmu_of_match[] __initconst = {
+   { .compatible   = "samsung,sysmmu-v1", },
+   { .compatible   = "samsung,sysmmu-v2", },
+   { .compatible   = "samsung,sysmmu-v3.1", },
+   { .compatible   = "samsung,sysmmu-v3.2", },
+   { .compatible   = "samsung,sysmmu-v3.3", },
+   { },
+};
+#endif
+
+static struct platform_driver exynos_sysmmu_driver __refdata = {
+   .probe  = exynos_sysmmu_probe,
+   .driver = {
.owner  = THIS_MODULE,
.name   = "exynos-sysmmu",
+   .of_match_table = of_match_ptr(sysmmu_of_match),
}
 };
 
-- 
1.7.9.5


[PATCH v11 16/27] iommu/exynos: remove custom fault handler

2014-03-13 Thread Cho KyongHo

This commit removes the custom fault handler. Device drivers that
need a fault handler can register one with iommu_set_fault_handler().

CC: Grant Grundler 
Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   80 +-
 1 file changed, 24 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index cef62d0..3458349 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -136,16 +136,6 @@ enum exynos_sysmmu_inttype {
SYSMMU_FAULTS_NUM
 };
 
-/*
- * @itype: type of fault.
- * @pgtable_base: the physical address of page table base. This is 0 if @itype
- *is SYSMMU_BUSERROR.
- * @fault_addr: the device (virtual) address that the System MMU tried to
- * translated. This is 0 if @itype is SYSMMU_BUSERROR.
- */
-typedef int (*sysmmu_fault_handler_t)(enum exynos_sysmmu_inttype itype,
-   unsigned long pgtable_base, unsigned long fault_addr);
-
 static unsigned short fault_reg_offset[SYSMMU_FAULTS_NUM] = {
REG_PAGE_FAULT_ADDR,
REG_AR_FAULT_ADDR,
@@ -187,7 +177,6 @@ struct sysmmu_drvdata {
int activations;
rwlock_t lock;
struct iommu_domain *domain;
-   sysmmu_fault_handler_t fault_handler;
unsigned long pgtable;
 };
 
@@ -256,34 +245,17 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
__sysmmu_tlb_invalidate(sfrbase);
 }
 
-static void __set_fault_handler(struct sysmmu_drvdata *data,
-   sysmmu_fault_handler_t handler)
-{
-   unsigned long flags;
-
-   write_lock_irqsave(&data->lock, flags);
-   data->fault_handler = handler;
-   write_unlock_irqrestore(&data->lock, flags);
-}
-
-void exynos_sysmmu_set_fault_handler(struct device *dev,
-   sysmmu_fault_handler_t handler)
-{
-   struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
-
-   __set_fault_handler(data, handler);
-}
-
-static int default_fault_handler(enum exynos_sysmmu_inttype itype,
-unsigned long pgtable_base, unsigned long fault_addr)
+static void show_fault_information(const char *name,
+   enum exynos_sysmmu_inttype itype,
+   unsigned long pgtable_base, unsigned long fault_addr)
 {
unsigned long *ent;
 
if ((itype >= SYSMMU_FAULTS_NUM) || (itype < SYSMMU_PAGEFAULT))
itype = SYSMMU_FAULT_UNKNOWN;
 
-   pr_err("%s occurred at 0x%lx(Page table base: 0x%lx)\n",
-   sysmmu_fault_name[itype], fault_addr, pgtable_base);
+   pr_err("%s occurred at 0x%lx by %s(Page table base: 0x%lx)\n",
+   sysmmu_fault_name[itype], fault_addr, name, pgtable_base);
 
ent = section_entry(__va(pgtable_base), fault_addr);
pr_err("\tLv1 entry: 0x%lx\n", *ent);
@@ -292,12 +264,6 @@ static int default_fault_handler(enum exynos_sysmmu_inttype itype,
ent = page_entry(ent, fault_addr);
pr_err("\t Lv2 entry: 0x%lx\n", *ent);
}
-
-   pr_err("Generating Kernel OOPS... because it is unrecoverable.\n");
-
-   BUG();
-
-   return 0;
 }
 
 static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
@@ -320,24 +286,28 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
else
addr = __raw_readl(data->sfrbase + fault_reg_offset[itype]);
 
-   if (data->domain)
-   ret = report_iommu_fault(data->domain, data->dev, addr, itype);
-
-   if ((ret == -ENOSYS) && data->fault_handler) {
-   unsigned long base = data->pgtable;
-   if (itype != SYSMMU_FAULT_UNKNOWN)
-   base = __raw_readl(data->sfrbase + REG_PT_BASE_ADDR);
-   ret = data->fault_handler(itype, base, addr);
+   if (itype == SYSMMU_FAULT_UNKNOWN) {
+   pr_err("%s: Fault is not occurred by System MMU '%s'!\n",
+   __func__, dev_name(data->sysmmu));
+   pr_err("%s: Please check if IRQ is correctly configured.\n",
+   __func__);
+   BUG();
+   } else {
+   unsigned long base =
+   __raw_readl(data->sfrbase + REG_PT_BASE_ADDR);
+   show_fault_information(dev_name(data->sysmmu),
+   itype, base, addr);
+   if (data->domain)
+   ret = report_iommu_fault(data->domain,
+   data->dev, addr, itype);
}
 
-   if (!ret && (itype != SYSMMU_FAULT_UNKNOWN))
-   __raw_writel(1 << itype, data->sfrbase + REG_INT_CLEAR);
-   else
-   dev_dbg(data->sysmmu, "%s is not handled.\n",
-   sysmmu_fault_name[itype]);
+   /* fault is not recovered by fault handler */
+   BUG_ON(ret != 0);
 
-   if (itype

[PATCH v11 17/27] iommu/exynos: remove calls to Runtime PM API functions

2014-03-13 Thread Cho KyongHo

Runtime power management by the exynos-iommu driver, independent of
the master H/W's runtime PM, is not useful for power saving because
attaching the master H/W at probe time keeps its local power on
indefinitely. Thus this commit removes the runtime PM API calls.
Runtime PM support is added to the exynos-iommu driver in the
following commits.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  369 +++---
 1 file changed, 238 insertions(+), 131 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 3458349..6834556 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -111,6 +113,8 @@
 #define __master_clk_enable(data)  __clk_gate_ctrl(data, clk_master, en)
 #define __master_clk_disable(data) __clk_gate_ctrl(data, clk_master, dis)
 
+#define has_sysmmu(dev)(dev->archdata.iommu != NULL)
+
 static struct kmem_cache *lv2table_kmem_cache;
 
 static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
@@ -159,6 +163,16 @@ static char *sysmmu_fault_name[SYSMMU_FAULTS_NUM] = {
"UNKNOWN FAULT"
 };
 
+/* attached to dev.archdata.iommu of the master device */
+struct exynos_iommu_owner {
+   struct list_head client; /* entry of exynos_iommu_domain.clients */
+   struct device *dev;
+   struct device *sysmmu;
+   struct iommu_domain *domain;
+   void *vmm_data; /* IO virtual memory manager's data */
+   spinlock_t lock;/* Lock to preserve consistency of System MMU */
+};
+
 struct exynos_iommu_domain {
struct list_head clients; /* list of sysmmu_drvdata.node */
unsigned long *pgtable; /* lv1 page table, 16KB */
@@ -168,9 +182,8 @@ struct exynos_iommu_domain {
 };
 
 struct sysmmu_drvdata {
-   struct list_head node; /* entry of exynos_iommu_domain.clients */
struct device *sysmmu;  /* System MMU's device descriptor */
-   struct device *dev; /* Owner of system MMU */
+   struct device *master;  /* Owner of system MMU */
void __iomem *sfrbase;
struct clk *clk;
struct clk *clk_master;
@@ -239,7 +252,6 @@ static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
   unsigned long pgd)
 {
-   __raw_writel(0x1, sfrbase + REG_MMU_CFG); /* 16KB LV1, LRU */
__raw_writel(pgd, sfrbase + REG_PT_BASE_ADDR);
 
__sysmmu_tlb_invalidate(sfrbase);
@@ -299,7 +311,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
itype, base, addr);
if (data->domain)
ret = report_iommu_fault(data->domain,
-   data->dev, addr, itype);
+   data->master, addr, itype);
}
 
/* fault is not recovered by fault handler */
@@ -316,116 +328,148 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
+static void __sysmmu_disable_nocount(struct sysmmu_drvdata *data)
 {
-   unsigned long flags;
-   bool disabled = false;
-
-   write_lock_irqsave(&data->lock, flags);
-
-   if (!set_sysmmu_inactive(data))
-   goto finish;
-
-   __master_clk_enable(data);
+   clk_enable(data->clk_master);
 
__raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
+   __raw_writel(0, data->sfrbase + REG_MMU_CFG);
 
__sysmmu_clk_disable(data);
__master_clk_disable(data);
+}
 
-   disabled = true;
-   data->pgtable = 0;
-   data->domain = NULL;
-finish:
-   write_unlock_irqrestore(&data->lock, flags);
+static bool __sysmmu_disable(struct sysmmu_drvdata *data)
+{
+   bool disabled;
+   unsigned long flags;
+
+   write_lock_irqsave(&data->lock, flags);
+
+   disabled = set_sysmmu_inactive(data);
+
+   if (disabled) {
+   data->pgtable = 0;
+   data->domain = NULL;
+
+   __sysmmu_disable_nocount(data);
 
-   if (disabled)
dev_dbg(data->sysmmu, "Disabled\n");
-   else
-   dev_dbg(data->sysmmu, "%d times left to be disabled\n",
+   } else  {
+   dev_dbg(data->sysmmu, "%d times left to disable\n",
data->activations);
+   }
+
+   write_unlock_irqrestore(&data->lock, flags);
 
return disabled;
 }
 
-/* __exynos_sysmmu_enable: Enables System MMU
- *
- * returns -error if an error occurred and System MMU is not enabled,
- * 0 if the System MMU has been just enabled and 1 if System MMU was already
- * enabled before.
- */
-static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
+static void __sysmmu_init_config(struc

[PATCH v11 12/27] ARM: dts: Add description of System MMU of Exynos SoCs

2014-03-13 Thread Cho KyongHo

This patch adds dts entries for the System MMU devices found on
Exynos4 and Exynos5 SoC series and the System MMU binding
documentation.

CC: Rob Herring 
CC: Sylwester Nawrocki 
Signed-off-by: Cho KyongHo 
---
 .../bindings/iommu/samsung,exynos4210-sysmmu.txt   |   86 +++
 arch/arm/boot/dts/exynos4.dtsi |  107 
 arch/arm/boot/dts/exynos4210.dtsi  |   23 +-
 arch/arm/boot/dts/exynos4x12.dtsi  |   77 +-
 arch/arm/boot/dts/exynos5250.dtsi  |  266 +++-
 arch/arm/boot/dts/exynos5420.dtsi  |  205 ++-
 6 files changed, 758 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/samsung,exynos4210-sysmmu.txt

diff --git a/Documentation/devicetree/bindings/iommu/samsung,exynos4210-sysmmu.txt b/Documentation/devicetree/bindings/iommu/samsung,exynos4210-sysmmu.txt
new file mode 100644
index 000..e4417bb
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/samsung,exynos4210-sysmmu.txt
@@ -0,0 +1,86 @@
+Samsung Exynos IOMMU H/W, System MMU (System Memory Management Unit)
+
+Samsung's Exynos architecture contains System MMUs that make scattered
+physical memory chunks visible as a contiguous region to DMA-capable peripheral
+devices like MFC, FIMC, FIMD, GScaler, FIMC-IS and so forth.
+
+A System MMU is an IOMMU and supports the same translation table format as
+ARMv7 translation tables, with a minimum set of page properties including access
+permissions, shareability and security protection. In addition, a System MMU has
+other capabilities such as an L2 TLB or block-fetch buffers to minimize
+translation latency.
+
+System MMUs are in a many-to-one relation with peripheral devices, i.e. a single
+peripheral device might have multiple System MMUs (usually one for each bus
+master), but one System MMU can handle transactions from only one peripheral
+device. The relation between a System MMU and the peripheral device needs to be
+defined in the device node of the peripheral device.
+
+MFC in all Exynos SoCs, and FIMD, M2M Scalers and G2D in Exynos5420, have 2
+System MMUs each.
+* MFC has one System MMU on each of its left and right buses.
+* FIMD in Exynos5420 has one System MMU for windows 0 and 4, and the other
+  System MMU for windows 1, 2 and 3.
+* M2M Scalers and G2D in Exynos5420 have one System MMU on the read channel and
+  the other System MMU on the write channel.
+The drivers must consider how to handle those System MMUs. One idea is
+to implement child devices or sub-devices which are the client devices of the
+System MMU.
+
+Required properties:
+- compatible: Should be one of:
+   "samsung,sysmmu-v1"
+   "samsung,sysmmu-v2"
+   "samsung,sysmmu-v3.1"
+   "samsung,sysmmu-v3.2"
+   "samsung,sysmmu-v3.3"
+
+- reg: A tuple of base address and size of System MMU registers.
+- interrupt-parent: The phandle of the interrupt controller of System MMU
+- interrupts: An interrupt specifier for interrupt signal of System MMU,
+ according to the format defined by a particular interrupt
+ controller.
+- clock-names: Should be "sysmmu" if the System MMU needs to gate its clock.
+   Please refer to the following documents:
+  Documentation/devicetree/bindings/clock/clock-bindings.txt
+  Documentation/devicetree/bindings/clock/exynos4-clock.txt
+  Documentation/devicetree/bindings/clock/exynos5250-clock.txt
+  Documentation/devicetree/bindings/clock/exynos5420-clock.txt
+  Optionally "master" if the clock to the System MMU is gated by
+  another gate clock other than "sysmmu". The System MMU driver
+  sets "master" as the parent of "sysmmu".
+  On Exynos4 SoCs, no "master" clock is needed.
+  On Exynos5 SoCs, some System MMUs must have "master" clocks.
+- clocks: Required if the System MMU needs to gate its clock.
+ Please refer to the documents listed above.
+- samsung,power-domain: Required if the System MMU needs to gate its power.
+ Please refer to the following document:
+ Documentation/devicetree/bindings/arm/exynos/power_domain.txt
+- mmu-masters: A phandle to the device node representing the master for which
+   the System MMU can provide a translation. Any additional values
+  after the phandle will be ignored because a System MMU never
+  has two or more masters. "#stream-id-cells" specified in the
+  master's node will also be ignored.
+  If more than one phandle is specified, only the first phandle
+  will be used.
+
+Examples:
+   gsc_0: gsc@13e0 {
+   compatible = "samsung,exynos5-gsc";
+   reg = <0x13e0 0x1000>;
+   interrupts = <0 85 0>;
+   samsung,power-domain = <&pd_gsc>;
+   clocks = <&cl

[PATCH v11 11/27] clk: exynos: add gate clock descriptions of System MMU

2014-03-13 Thread Cho KyongHo

This adds gate clocks of all System MMUs and their master IPs
that do not yet appear in clk-exynos5250.c and clk-exynos5420.c.
It also fixes GATE_IP_ACP to 0x18800 and changes GATE_DA to GATE
for System MMU clocks in clk-exynos4.c.

Signed-off-by: Cho KyongHo 
---
 .../devicetree/bindings/clock/exynos5250-clock.txt |3 +++
 .../devicetree/bindings/clock/exynos5420-clock.txt |6 +-
 drivers/clk/samsung/clk-exynos5250.c   |5 +
 drivers/clk/samsung/clk-exynos5420.c   |   13 +++--
 include/dt-bindings/clock/exynos5250.h |4 
 include/dt-bindings/clock/exynos5420.h |6 +-
 6 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/clock/exynos5250-clock.txt b/Documentation/devicetree/bindings/clock/exynos5250-clock.txt
index 72ce617..67e50ba 100644
--- a/Documentation/devicetree/bindings/clock/exynos5250-clock.txt
+++ b/Documentation/devicetree/bindings/clock/exynos5250-clock.txt
@@ -162,6 +162,9 @@ clock which they consume.
   g2d  345
   mdma0    346
   smmu_mdma0   347
+  smmu_tv  348
+  smmu_fimd1   349
+  smmu_2d  350
 
 
[Clock Muxes]
diff --git a/Documentation/devicetree/bindings/clock/exynos5420-clock.txt b/Documentation/devicetree/bindings/clock/exynos5420-clock.txt
index 458f347..62dabc3 100644
--- a/Documentation/devicetree/bindings/clock/exynos5420-clock.txt
+++ b/Documentation/devicetree/bindings/clock/exynos5420-clock.txt
@@ -146,7 +146,8 @@ clock which they consume.
   hdmi 413
   aclk300_disp1 420
   fimd1 421
-  smmu_fimd1   422
+  smmu_fimd1m0 422
+  smmu_fimd1m1 423
   aclk166  430
   mixer    431
   aclk266  440
@@ -172,12 +173,15 @@ clock which they consume.
   mdma0    473
   aclk333_g2d  480
   g2d  481
+  smmu_g2d 482
   aclk333_432_gscl 490
   smmu_3aa 491
   smmu_fimcl0  492
   smmu_fimcl1  493
   smmu_fimcl3  494
   fimc_lite3   495
+  fimc_lite0   496
+  fimc_lite1   497
   aclk_g3d 500
   g3d  501
   smmu_mixer   502
diff --git a/drivers/clk/samsung/clk-exynos5250.c b/drivers/clk/samsung/clk-exynos5250.c
index e7ee442..6605733 100644
--- a/drivers/clk/samsung/clk-exynos5250.c
+++ b/drivers/clk/samsung/clk-exynos5250.c
@@ -615,6 +615,11 @@ static struct samsung_gate_clock exynos5250_gate_clks[] __initdata = {
GATE(CLK_WDT, "wdt", "div_aclk66", GATE_IP_PERIS, 19, 0, 0),
GATE(CLK_RTC, "rtc", "div_aclk66", GATE_IP_PERIS, 20, 0, 0),
GATE(CLK_TMU, "tmu", "div_aclk66", GATE_IP_PERIS, 21, 0, 0),
+   GATE(CLK_SMMU_TV, "smmu_tv", "mout_aclk200_disp1_sub",
+   GATE_IP_DISP1, 2, 0, 0),
+   GATE(CLK_SMMU_FIMD1, "smmu_fimd1", "mout_aclk200_disp1_sub",
+   GATE_IP_DISP1, 8, 0, 0),
+   GATE(CLK_SMMU_2D, "smmu_2d", "div_aclk200", GATE_IP_ACP, 7, 0, 0),
 };
 
 static struct samsung_pll_rate_table vpll_24mhz_tbl[] __initdata = {
diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c
index 60b2681..b58e4d3 100644
--- a/drivers/clk/samsung/clk-exynos5420.c
+++ b/drivers/clk/samsung/clk-exynos5420.c
@@ -82,6 +82,7 @@
 #define GATE_BUS_PERIC1  0x10754
 #define GATE_BUS_PERIS0  0x10760
 #define GATE_BUS_PERIS1  0x10764
+#define GATE_IP_G2D  0x08800
 #define GATE_IP_GSCL0  0x10910
 #define GATE_IP_GSCL1  0x10920
 #define GATE_IP_MFC  0x1092c
@@ -707,6 +708,10 @@ static struct samsung_gate_clock exynos5420_gate_clks[] __initdata = {
GATE(CLK_GSCL_WB, "gscl_wb", "aclk300_gscl", GATE_IP_GSCL1, 13, 0, 0),
GATE(CLK_SMMU_FIMCL3, "smmu_fimcl3,", "aclk333_432_gscl",
GATE_IP_GSCL1, 16, 0, 0),
+   GATE(CLK_FIMC_LITE0, "fimc_lite0", "aclk333_432_gscl",
+   GATE_IP_GSCL0, 5, 0, 0),
+   GATE(CLK_FIMC_LITE1, "fimc_lite1", "aclk333_432_gscl",
+   GATE_IP_GSCL0, 6, 0, 0),
GATE(CLK_FIMC_LITE3, "fimc_lite3", "aclk333_432_gscl",
GATE_IP_GSCL1, 17, 0, 0),
 
@@ -715,8 +720,10 @@ static struct samsung_gate_clock exynos5420_gate_clks[] __initdata = {
GATE(CLK_DP1, "dp1", "aclk200_disp1", GATE_IP_DISP1, 4, 0, 0),
GATE(CLK_MIXER, "mixer", "aclk166", GATE_IP_DISP1, 5, 0, 0),
GATE(CLK_HDMI, "hdmi", "aclk200_disp1", GATE_IP_DISP1, 6, 0, 0),
-   GATE(CLK_SMMU_FIMD1, "smmu_fimd1", "aclk300_disp1", GATE_IP_DISP1, 8, 0,
-   0),
+   GATE(CLK_SMMU_FIMD1M0, "smmu_fimd1m0", "aclk300_disp1", GATE_IP_DISP1,
+   7, 0, 0),
+   GATE(CLK_SMMU_FIMD1M1, "smmu_fimd1m1", "aclk300_disp1", GATE_IP_DISP1,
+   8, 0, 0),

[PATCH v11 10/27] iommu/exynos: use managed device helper functions

2014-03-13 Thread Cho KyongHo

This patch uses managed device helper functions in the probe().
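
The idea behind managed helpers can be sketched in plain userspace C. This is not the kernel's devres implementation; `managed_dev`, `managed_add()`, `managed_release_all()` and `log_release()` are hypothetical names used only for illustration. Each acquired resource registers its own release action, and all actions run in reverse order of acquisition, which is why the chain of error labels in probe() is no longer needed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 8	/* arbitrary limit for this sketch */

typedef void (*release_fn)(void *res);

/* Hypothetical per-device list of release actions. */
struct managed_dev {
	release_fn release[MAX_RES];
	void *res[MAX_RES];
	size_t count;
};

/* Record how to release a resource that was just acquired. */
static int managed_add(struct managed_dev *dev, release_fn fn, void *res)
{
	assert(fn != NULL);
	if (dev->count == MAX_RES)
		return -1;
	dev->release[dev->count] = fn;
	dev->res[dev->count] = res;
	dev->count++;
	return 0;
}

/* Release everything in reverse order of acquisition. */
static void managed_release_all(struct managed_dev *dev)
{
	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->res[dev->count]);
	}
}

/* Example release action: stores the order in which it was called. */
static int release_seq;

static void log_release(void *res)
{
	*(int *)res = ++release_seq;
}
```

With two resources registered in the order a, b, `managed_release_all()` calls the release action for b first and for a second, mirroring the unwinding order of the removed `err_*` labels.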

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   64 +-
 1 file changed, 26 insertions(+), 38 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 36e6b73..33b424d 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -499,51 +499,48 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
 
 static int exynos_sysmmu_probe(struct platform_device *pdev)
 {
-   int ret;
+   int irq, ret;
struct device *dev = &pdev->dev;
struct sysmmu_drvdata *data;
struct resource *res;
 
-   data = kzalloc(sizeof(*data), GFP_KERNEL);
-   if (!data) {
-   dev_dbg(dev, "Not enough memory\n");
-   ret = -ENOMEM;
-   goto err_alloc;
-   }
+   data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
-   dev_dbg(dev, "Unable to find IOMEM region\n");
-   ret = -ENOENT;
-   goto err_init;
+   dev_err(dev, "Unable to find IOMEM region\n");
+   return -ENOENT;
}
 
-   data->sfrbase = ioremap(res->start, resource_size(res));
-   if (!data->sfrbase) {
-   dev_dbg(dev, "Unable to map IOMEM @ PA:%#x\n", res->start);
-   ret = -ENOENT;
-   goto err_res;
-   }
+   data->sfrbase = devm_ioremap_resource(dev, res);
+   if (IS_ERR(data->sfrbase))
+   return PTR_ERR(data->sfrbase);
 
-   ret = platform_get_irq(pdev, 0);
-   if (ret <= 0) {
+   irq = platform_get_irq(pdev, 0);
+   if (irq <= 0) {
dev_dbg(dev, "Unable to find IRQ resource\n");
-   goto err_irq;
+   return irq;
}
 
-   ret = request_irq(ret, exynos_sysmmu_irq, 0,
+   ret = devm_request_irq(dev, irq, exynos_sysmmu_irq, 0,
dev_name(dev), data);
if (ret) {
-   dev_dbg(dev, "Unabled to register interrupt handler\n");
-   goto err_irq;
+   dev_err(dev, "Unabled to register handler of irq %d\n", irq);
+   return ret;
}
 
-   if (dev_get_platdata(dev)) {
-   data->clk = clk_get(dev, "sysmmu");
-   if (IS_ERR(data->clk)) {
-   data->clk = NULL;
-   dev_dbg(dev, "No clock descriptor registered\n");
-   }
+   data->clk = devm_clk_get(dev, "sysmmu");
+   if (IS_ERR(data->clk)) {
+   dev_info(dev, "No gate clock found!\n");
+   data->clk = NULL;
+   }
+
+   ret = clk_prepare(data->clk);
+   if (ret) {
+   dev_err(dev, "Failed to prepare clk\n");
+   return ret;
}
 
data->sysmmu = dev;
@@ -556,17 +553,8 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
 
-   dev_dbg(dev, "Initialized\n");
+   dev_dbg(dev, "Probed and initialized\n");
return 0;
-err_irq:
-   free_irq(platform_get_irq(pdev, 0), data);
-err_res:
-   iounmap(data->sfrbase);
-err_init:
-   kfree(data);
-err_alloc:
-   dev_err(dev, "Failed to initialize\n");
-   return ret;
 }
 
 static struct platform_driver exynos_sysmmu_driver = {
-- 
1.7.9.5

[PATCH v11 08/27] iommu/exynos: always use a single clock descriptor

2014-03-13 Thread Cho KyongHo

System MMU driver is changed to control only a single instance of
System MMU at a time. Since a single instance of System MMU has only
a single clock descriptor for its clock gating, there is no need to
obtain two or more clock descriptors.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |  223 ++
 1 file changed, 72 insertions(+), 151 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 8dc7031..a4499b2 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -171,9 +171,8 @@ struct sysmmu_drvdata {
struct device *sysmmu;  /* System MMU's device descriptor */
struct device *dev; /* Owner of system MMU */
char *dbgname;
-   int nsfrs;
-   void __iomem **sfrbases;
-   struct clk *clk[2];
+   void __iomem *sfrbase;
+   struct clk *clk;
int activations;
rwlock_t lock;
struct iommu_domain *domain;
@@ -294,56 +293,39 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
 {
/* SYSMMU is in blocked when interrupt occurred. */
struct sysmmu_drvdata *data = dev_id;
-   struct resource *irqres;
-   struct platform_device *pdev;
enum exynos_sysmmu_inttype itype;
unsigned long addr = -1;
-
-   int i, ret = -ENOSYS;
+   int ret = -ENOSYS;
 
read_lock(&data->lock);
 
WARN_ON(!is_sysmmu_active(data));
 
-   pdev = to_platform_device(data->sysmmu);
-   for (i = 0; i < (pdev->num_resources / 2); i++) {
-   irqres = platform_get_resource(pdev, IORESOURCE_IRQ, i);
-   if (irqres && ((int)irqres->start == irq))
-   break;
-   }
-
-   if (i == pdev->num_resources) {
+   itype = (enum exynos_sysmmu_inttype)
+   __ffs(__raw_readl(data->sfrbase + REG_INT_STATUS));
+   if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN))))
itype = SYSMMU_FAULT_UNKNOWN;
-   } else {
-   itype = (enum exynos_sysmmu_inttype)
-   __ffs(__raw_readl(data->sfrbases[i] + REG_INT_STATUS));
-   if (WARN_ON(!((itype >= 0) && (itype < SYSMMU_FAULT_UNKNOWN))))
-   itype = SYSMMU_FAULT_UNKNOWN;
-   else
-   addr = __raw_readl(
-   data->sfrbases[i] + fault_reg_offset[itype]);
-   }
+   else
+   addr = __raw_readl(data->sfrbase + fault_reg_offset[itype]);
 
if (data->domain)
-   ret = report_iommu_fault(data->domain, data->dev,
-   addr, itype);
+   ret = report_iommu_fault(data->domain, data->dev, addr, itype);
 
if ((ret == -ENOSYS) && data->fault_handler) {
unsigned long base = data->pgtable;
if (itype != SYSMMU_FAULT_UNKNOWN)
-   base = __raw_readl(
-   data->sfrbases[i] + REG_PT_BASE_ADDR);
+   base = __raw_readl(data->sfrbase + REG_PT_BASE_ADDR);
ret = data->fault_handler(itype, base, addr);
}
 
if (!ret && (itype != SYSMMU_FAULT_UNKNOWN))
-   __raw_writel(1 << itype, data->sfrbases[i] + REG_INT_CLEAR);
+   __raw_writel(1 << itype, data->sfrbase + REG_INT_CLEAR);
else
dev_dbg(data->sysmmu, "(%s) %s is not handled.\n",
data->dbgname, sysmmu_fault_name[itype]);
 
if (itype != SYSMMU_FAULT_UNKNOWN)
-   sysmmu_unblock(data->sfrbases[i]);
+   sysmmu_unblock(data->sfrbase);
 
read_unlock(&data->lock);
 
@@ -354,20 +336,16 @@ static bool __exynos_sysmmu_disable(struct sysmmu_drvdata *data)
 {
unsigned long flags;
bool disabled = false;
-   int i;
 
write_lock_irqsave(&data->lock, flags);
 
if (!set_sysmmu_inactive(data))
goto finish;
 
-   for (i = 0; i < data->nsfrs; i++)
-   __raw_writel(CTRL_DISABLE, data->sfrbases[i] + REG_MMU_CTRL);
+   __raw_writel(CTRL_DISABLE, data->sfrbase + REG_MMU_CTRL);
 
-   if (data->clk[1])
-   clk_disable(data->clk[1]);
-   if (data->clk[0])
-   clk_disable(data->clk[0]);
+   if (data->clk)
+   clk_disable(data->clk);
 
disabled = true;
data->pgtable = 0;
@@ -393,7 +371,7 @@ finish:
 static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
unsigned long pgtable, struct iommu_domain *domain)
 {
-   int i, ret = 0;
+   int ret = 0;
unsigned long flags;
 
write_lock_irqsave(&data->lock, flags);
@@ -410,17 +388,14 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
goto finish;
}
 
-   if (data->clk[0])
-   clk_enable(data->clk[0]);
-   if (data->clk[1])
-

[PATCH v11 09/27] iommu/exynos: remove dbgname from drvdata of a System MMU

2014-03-13 Thread Cho KyongHo

This patch removes dbgname member from sysmmu_drvdata structure.
Kernel message for debugging already has the name of a single
System MMU node.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   32 +---
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index a4499b2..36e6b73 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -170,7 +170,6 @@ struct sysmmu_drvdata {
struct list_head node; /* entry of exynos_iommu_domain.clients */
struct device *sysmmu;  /* System MMU's device descriptor */
struct device *dev; /* Owner of system MMU */
-   char *dbgname;
void __iomem *sfrbase;
struct clk *clk;
int activations;
@@ -321,8 +320,8 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
if (!ret && (itype != SYSMMU_FAULT_UNKNOWN))
__raw_writel(1 << itype, data->sfrbase + REG_INT_CLEAR);
else
-   dev_dbg(data->sysmmu, "(%s) %s is not handled.\n",
-   data->dbgname, sysmmu_fault_name[itype]);
+   dev_dbg(data->sysmmu, "%s is not handled.\n",
+   sysmmu_fault_name[itype]);
 
if (itype != SYSMMU_FAULT_UNKNOWN)
sysmmu_unblock(data->sfrbase);
@@ -354,10 +353,10 @@ finish:
write_unlock_irqrestore(&data->lock, flags);
 
if (disabled)
-   dev_dbg(data->sysmmu, "(%s) Disabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled\n");
else
-   dev_dbg(data->sysmmu, "(%s) %d times left to be disabled\n",
-   data->dbgname, data->activations);
+   dev_dbg(data->sysmmu, "%d times left to be disabled\n",
+   data->activations);
 
return disabled;
 }
@@ -384,7 +383,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
ret = 1;
}
 
-   dev_dbg(data->sysmmu, "(%s) Already enabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Already enabled\n");
goto finish;
}
 
@@ -399,7 +398,7 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
 
data->domain = domain;
 
-   dev_dbg(data->sysmmu, "(%s) Enabled\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Enabled\n");
 finish:
write_unlock_irqrestore(&data->lock, flags);
 
@@ -415,16 +414,15 @@ int exynos_sysmmu_enable(struct device *dev, unsigned long pgtable)
 
ret = pm_runtime_get_sync(data->sysmmu);
if (ret < 0) {
-   dev_dbg(data->sysmmu, "(%s) Failed to enable\n", data->dbgname);
+   dev_dbg(data->sysmmu, "Failed to enable\n");
return ret;
}
 
ret = __exynos_sysmmu_enable(data, pgtable, NULL);
if (WARN_ON(ret < 0)) {
pm_runtime_put(data->sysmmu);
-   dev_err(data->sysmmu,
-   "(%s) Already enabled with page table %#lx\n",
-   data->dbgname, data->pgtable);
+   dev_err(data->sysmmu, "Already enabled with page table %#lx\n",
+   data->pgtable);
} else {
data->dev = dev;
}
@@ -474,9 +472,7 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
sysmmu_unblock(data->sfrbase);
}
} else {
-   dev_dbg(data->sysmmu,
-   "(%s) Disabled. Skipping invalidating TLB.\n",
-   data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
read_unlock_irqrestore(&data->lock, flags);
@@ -495,9 +491,7 @@ void exynos_sysmmu_tlb_invalidate(struct device *dev)
sysmmu_unblock(data->sfrbase);
}
} else {
-   dev_dbg(data->sysmmu,
-   "(%s) Disabled. Skipping invalidating TLB.\n",
-   data->dbgname);
+   dev_dbg(data->sysmmu, "Disabled. Skipping invalidating TLB.\n");
}
 
read_unlock_irqrestore(&data->lock, flags);
@@ -562,7 +556,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
pm_runtime_enable(dev);
 
-   dev_dbg(dev, "(%s) Initialized\n", data->dbgname);
+   dev_dbg(dev, "Initialized\n");
return 0;
 err_irq:
free_irq(platform_get_irq(pdev, 0), data);
-- 
1.7.9.5

[PATCH v11 07/27] iommu/exynos: always enable runtime PM

2014-03-13 Thread Cho KyongHo

Checking whether the probing device has a parent device was only a way to
discover whether the probing device belongs to a power domain, back when
power domains were controlled by Samsung's custom implementation.
Now that the generic IO power domain is used, the check for a parent
device must be removed.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index bee1bb1..8dc7031 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -632,8 +632,7 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 
__set_fault_handler(data, &default_fault_handler);
 
-   if (dev->parent)
-   pm_runtime_enable(dev);
+   pm_runtime_enable(dev);
 
dev_dbg(dev, "(%s) Initialized\n", data->dbgname);
return 0;
-- 
1.7.9.5

[PATCH v11 06/27] iommu/exynos: allocate lv2 page table from own slab

2014-03-13 Thread Cho KyongHo

Since kmalloc() does not guarantee 1KiB alignment when it allocates
1KiB, the lv2 page table must be allocated from a dedicated slab
that guarantees 1KiB alignment.
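
The alignment requirement can be illustrated with a small userspace sketch. It is only an analogy: C11 `aligned_alloc()` stands in for the dedicated slab cache, and `lv2table_is_aligned()` and `lv2table_alloc()` are hypothetical helper names. The mask test is the same one used by the `BUG_ON()` in alloc_lv2entry().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 1KiB, the lv2 table size named in the commit message. */
#define LV2TABLE_SIZE 1024u

/* A table is usable only if the low 10 address bits are all zero. */
static int lv2table_is_aligned(const void *p)
{
	return ((uintptr_t)p & (LV2TABLE_SIZE - 1)) == 0;
}

/*
 * Userspace stand-in (an assumption) for a zeroing allocation from a
 * cache created with both object size and alignment set to 1KiB.
 */
static void *lv2table_alloc(void)
{
	void *p = aligned_alloc(LV2TABLE_SIZE, LV2TABLE_SIZE);

	if (!p)
		return NULL;
	assert(lv2table_is_aligned(p));
	memset(p, 0, LV2TABLE_SIZE);
	return p;
}
```

A general-purpose allocator only promises a small minimum alignment, so the same mask test may fail for its 1KiB blocks; requesting the alignment explicitly removes that dependency.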

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 647fc46..bee1bb1 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -100,6 +100,8 @@
 #define REG_PB1_SADDR  0x054
 #define REG_PB1_EADDR  0x058
 
+static struct kmem_cache *lv2table_kmem_cache;
+
 static unsigned long *section_entry(unsigned long *pgtable, unsigned long iova)
 {
return pgtable + lv1ent_offset(iova);
@@ -726,7 +728,8 @@ static void exynos_iommu_domain_destroy(struct iommu_domain *domain)
 
for (i = 0; i < NUM_LV1ENTRIES; i++)
if (lv1ent_page(priv->pgtable + i))
-   kfree(__va(lv2table_base(priv->pgtable + i)));
+   kmem_cache_free(lv2table_kmem_cache,
+   __va(lv2table_base(priv->pgtable + i)));
 
free_pages((unsigned long)priv->pgtable, 2);
free_pages((unsigned long)priv->lv2entcnt, 1);
@@ -825,7 +828,7 @@ static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
if (lv1ent_fault(sent)) {
unsigned long *pent;
 
-   pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
+   pent = kmem_cache_zalloc(lv2table_kmem_cache, GFP_ATOMIC);
BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
if (!pent)
return ERR_PTR(-ENOMEM);
@@ -855,8 +858,7 @@ static int lv1set_section(unsigned long *sent, unsigned long iova,
return -EADDRINUSE;
}
 
-   kfree(page_entry(sent, 0));
-
+   kmem_cache_free(lv2table_kmem_cache, page_entry(sent, 0));
*pgcnt = 0;
}
 
@@ -1061,11 +1063,31 @@ static int __init exynos_iommu_init(void)
 {
int ret;
 
+   lv2table_kmem_cache = kmem_cache_create("exynos-iommu-lv2table",
+   LV2TABLE_SIZE, LV2TABLE_SIZE, 0, NULL);
+   if (!lv2table_kmem_cache) {
+   pr_err("%s: Failed to create kmem cache\n", __func__);
+   return -ENOMEM;
+   }
+
ret = platform_driver_register(&exynos_sysmmu_driver);
+   if (ret) {
+   pr_err("%s: Failed to register driver\n", __func__);
+   goto err_reg_driver;
+   }
 
-   if (ret == 0)
-   bus_set_iommu(&platform_bus_type, &exynos_iommu_ops);
+   ret = bus_set_iommu(&platform_bus_type, &exynos_iommu_ops);
+   if (ret) {
+   pr_err("%s: Failed to register exynos-iommu driver.\n",
+   __func__);
+   goto err_set_iommu;
+   }
 
+   return 0;
+err_set_iommu:
+   platform_driver_unregister(&exynos_sysmmu_driver);
+err_reg_driver:
+   kmem_cache_destroy(lv2table_kmem_cache);
return ret;
 }
 subsys_initcall(exynos_iommu_init);
-- 
1.7.9.5

[PATCH v11 05/27] iommu/exynos: remove prefetch buffer setting

2014-03-13 Thread Cho KyongHo

The prefetch buffer is a cache in System MMU v3.x that caches a block of
page table entries so that small pages get the effect of a larger page.
However, how prefetch buffers are controlled, and their specifications,
differ between minor versions of System MMU v3.
Prefetch buffers must be controlled with care because there are some
restrictions in the H/W design.

The interface and implementation to initiate prefetch buffers will
be prepared later.

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 0d26aeb..647fc46 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -244,13 +244,6 @@ static void __sysmmu_set_ptbase(void __iomem *sfrbase,
__sysmmu_tlb_invalidate(sfrbase);
 }
 
-static void __sysmmu_set_prefbuf(void __iomem *sfrbase, unsigned long base,
-   unsigned long size, int idx)
-{
-   __raw_writel(base, sfrbase + REG_PB0_SADDR + idx * 8);
-   __raw_writel(size - 1 + base,  sfrbase + REG_PB0_EADDR + idx * 8);
-}
-
 static void __set_fault_handler(struct sysmmu_drvdata *data,
sysmmu_fault_handler_t handler)
 {
@@ -424,15 +417,6 @@ static int __exynos_sysmmu_enable(struct sysmmu_drvdata *data,
 
for (i = 0; i < data->nsfrs; i++) {
__sysmmu_set_ptbase(data->sfrbases[i], pgtable);
-
-   if ((readl(data->sfrbases[i] + REG_MMU_VERSION) >> 28) == 3) {
-   /* System MMU version is 3.x */
-   __raw_writel((1 << 12) | (2 << 28),
-   data->sfrbases[i] + REG_MMU_CFG);
-   __sysmmu_set_prefbuf(data->sfrbases[i], 0, -1, 0);
-   __sysmmu_set_prefbuf(data->sfrbases[i], 0, -1, 1);
-   }
-
__raw_writel(CTRL_ENABLE, data->sfrbases[i] + REG_MMU_CTRL);
}
 
-- 
1.7.9.5

[PATCH v11 04/27] iommu/exynos: fix L2TLB invalidation

2014-03-13 Thread Cho KyongHo

The L2TLB is an 8-way set-associative TLB with 512 entries, so it has
64 sets.
Translation information for a single 4KB (small page) mapping is cached
only in the set whose index equals the lower 6 bits of the page
frame number.
Translation information for a single 64KB (large page) mapping can be
cached in any of the 16 sets whose top two index bits equal
bits [5:4] of the page frame number.
Translation information for a single 1MB (section) or larger mapping can
be cached in any set in the TLB.

All sets that may cache the target translation information must be
invalidated to guarantee that the L2TLB has no stale data.
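
The arithmetic above can be sketched as standalone C. This is an illustration under stated assumptions, not the driver code: `l2tlb_set_of_spage()` and `l2tlb_num_inv()` are hypothetical helpers, and the small page size is assumed to be 4KB. The second helper mirrors the `min_t(..., size / PAGE_SIZE, 64)` expression in the diff below, which the patch applies only when the major version read from REG_MMU_VERSION is 2.

```c
#include <assert.h>
#include <stddef.h>

#define SPAGE_SIZE    4096u	/* assumed small page size */
#define L2TLB_ENTRIES 512u
#define L2TLB_WAYS    8u
#define L2TLB_SETS    (L2TLB_ENTRIES / L2TLB_WAYS)	/* 64 sets */

/* Set that caches a 4KB translation: lower 6 bits of the page frame number. */
static unsigned int l2tlb_set_of_spage(unsigned long iova)
{
	return (unsigned int)((iova / SPAGE_SIZE) % L2TLB_SETS);
}

/* Number of per-entry invalidations needed to cover a mapping of this size. */
static unsigned int l2tlb_num_inv(size_t size)
{
	size_t pages = size / SPAGE_SIZE;

	assert(size >= SPAGE_SIZE);
	return pages < L2TLB_SETS ? (unsigned int)pages : L2TLB_SETS;
}
```

A 64KB mapping spans 16 small pages and a 1MB mapping spans 256, but there are only 64 sets, so the count is capped at 64.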

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4a74ed8..0d26aeb 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -225,9 +225,14 @@ static void __sysmmu_tlb_invalidate(void __iomem *sfrbase)
 }
 
 static void __sysmmu_tlb_invalidate_entry(void __iomem *sfrbase,
-   unsigned long iova)
+   unsigned long iova, unsigned int num_inv)
 {
-   __raw_writel((iova & SPAGE_MASK) | 1, sfrbase + REG_MMU_FLUSH_ENTRY);
+   unsigned int i;
+   for (i = 0; i < num_inv; i++) {
+   __raw_writel((iova & SPAGE_MASK) | 1,
+   sfrbase + REG_MMU_FLUSH_ENTRY);
+   iova += SPAGE_SIZE;
+   }
 }
 
 static void __sysmmu_set_ptbase(void __iomem *sfrbase,
@@ -477,7 +482,8 @@ static bool exynos_sysmmu_disable(struct device *dev)
return disabled;
 }
 
-static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova)
+static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova,
+   size_t size)
 {
unsigned long flags;
struct sysmmu_drvdata *data = dev_get_drvdata(dev->archdata.iommu);
@@ -487,9 +493,24 @@ static void sysmmu_tlb_invalidate_entry(struct device *dev, unsigned long iova)
if (is_sysmmu_active(data)) {
int i;
for (i = 0; i < data->nsfrs; i++) {
+   unsigned int maj;
+   unsigned int num_inv = 1;
+   maj = __raw_readl(data->sfrbases[i] + REG_MMU_VERSION);
+   /*
+* L2TLB invalidations required:
+* 4KB page: 1 invalidation
+* 64KB page: 16 invalidations
+* 1MB page: 64 invalidations
+* because it is a set-associative TLB
+* with 8 ways and 64 sets.
+* A 1MB page can be cached in any of the sets.
+* A 64KB page can be cached in one of 16 consecutive sets.
+*/
+   if ((maj >> 28) == 2) /* major version number */
+   num_inv = min_t(unsigned int, size / PAGE_SIZE, 64);
if (sysmmu_block(data->sfrbases[i])) {
__sysmmu_tlb_invalidate_entry(
-   data->sfrbases[i], iova);
+   data->sfrbases[i], iova, num_inv);
sysmmu_unblock(data->sfrbases[i]);
}
}
@@ -999,7 +1020,7 @@ done:
 
spin_lock_irqsave(&priv->lock, flags);
list_for_each_entry(data, &priv->clients, node)
-   sysmmu_tlb_invalidate_entry(data->dev, iova);
+   sysmmu_tlb_invalidate_entry(data->dev, iova, size);
spin_unlock_irqrestore(&priv->lock, flags);
 
return size;
-- 
1.7.9.5

[PATCH v11 02/27] iommu/exynos: add missing cache flush for removed page table entries

2014-03-13 Thread Cho KyongHo

This commit adds a cache flush for removed small and large page entries
in exynos_iommu_unmap(). A missing cache flush of removed page table
entries can cause a page fault interrupt to be missed when a master IP
accesses an unmapped area.

Reviewed-by: Tomasz Figa 
Tested-by: Grant Grundler 
Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 4876d35..1c3a397 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -958,6 +958,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
if (lv2ent_small(ent)) {
*ent = 0;
size = SPAGE_SIZE;
+   pgtable_flush(ent, ent + 1);
priv->lv2entcnt[lv1ent_offset(iova)] += 1;
goto done;
}
@@ -966,6 +967,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
BUG_ON(size < LPAGE_SIZE);
 
memset(ent, 0, sizeof(*ent) * SPAGES_PER_LPAGE);
+   pgtable_flush(ent, ent + SPAGES_PER_LPAGE);
 
size = LPAGE_SIZE;
priv->lv2entcnt[lv1ent_offset(iova)] += SPAGES_PER_LPAGE;
-- 
1.7.9.5

[PATCH v11 03/27] iommu/exynos: change error handling when page table update is failed

2014-03-13 Thread Cho KyongHo

This patch changes the driver not to panic on any error when updating the
page table. Instead, it prints error messages with a call stack.
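
The error-reporting style this relies on, returning an errno value encoded in a pointer instead of NULL, can be sketched in userspace C. The lower-case helpers below are simplified stand-ins written for this illustration, not the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() definitions, and `lookup_lv2entry()` with its two flag parameters is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* the top 4095 addresses are reserved for errors */

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the reserved error range. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value; only valid for error pointers. */
static long ptr_err(const void *ptr)
{
	assert(is_err(ptr));
	return (long)(intptr_t)ptr;
}

/* Hypothetical lookup that reports why it failed instead of returning NULL. */
static unsigned long *lookup_lv2entry(unsigned long *table,
				      int mapped_as_section, int have_memory)
{
	if (mapped_as_section)
		return err_ptr(-EADDRINUSE);
	if (!have_memory)
		return err_ptr(-ENOMEM);
	return table;
}
```

The caller can then tell an address conflict from an allocation failure and propagate the exact error code, which a bare NULL return cannot express.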

Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |   58 --
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 1c3a397..4a74ed8 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -812,13 +812,18 @@ finish:
 static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
short *pgcounter)
 {
+   if (lv1ent_section(sent)) {
+   WARN(1, "Trying mapping on %#08lx mapped with 1MiB page", iova);
+   return ERR_PTR(-EADDRINUSE);
+   }
+
if (lv1ent_fault(sent)) {
unsigned long *pent;
 
pent = kzalloc(LV2TABLE_SIZE, GFP_ATOMIC);
BUG_ON((unsigned long)pent & (LV2TABLE_SIZE - 1));
if (!pent)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
*sent = mk_lv1ent_page(__pa(pent));
*pgcounter = NUM_LV2ENTRIES;
@@ -829,14 +834,21 @@ static unsigned long *alloc_lv2entry(unsigned long *sent, unsigned long iova,
return page_entry(sent, iova);
 }
 
-static int lv1set_section(unsigned long *sent, phys_addr_t paddr, short *pgcnt)
+static int lv1set_section(unsigned long *sent, unsigned long iova,
+ phys_addr_t paddr, short *pgcnt)
 {
-   if (lv1ent_section(sent))
+   if (lv1ent_section(sent)) {
+   WARN(1, "Trying mapping on 1MiB@%#08lx that is mapped",
+   iova);
return -EADDRINUSE;
+   }
 
if (lv1ent_page(sent)) {
-   if (*pgcnt != NUM_LV2ENTRIES)
+   if (*pgcnt != NUM_LV2ENTRIES) {
+   WARN(1, "Trying mapping on 1MiB@%#08lx that is mapped",
+   iova);
return -EADDRINUSE;
+   }
 
kfree(page_entry(sent, 0));
 
@@ -854,8 +866,10 @@ static int lv2set_page(unsigned long *pent, phys_addr_t paddr, size_t size,
short *pgcnt)
 {
if (size == SPAGE_SIZE) {
-   if (!lv2ent_fault(pent))
+   if (!lv2ent_fault(pent)) {
+   WARN(1, "Trying mapping on 4KiB where mapping exists");
return -EADDRINUSE;
+   }
 
*pent = mk_lv2ent_spage(paddr);
pgtable_flush(pent, pent + 1);
@@ -864,7 +878,10 @@ static int lv2set_page(unsigned long *pent, phys_addr_t paddr, size_t size,
int i;
for (i = 0; i < SPAGES_PER_LPAGE; i++, pent++) {
if (!lv2ent_fault(pent)) {
-   memset(pent, 0, sizeof(*pent) * i);
+   WARN(1,
+   "Trying mapping on 64KiB where mapping exists");
+   if (i > 0)
+   memset(pent - i, 0, sizeof(*pent) * i);
return -EADDRINUSE;
}
 
@@ -892,7 +909,7 @@ static int exynos_iommu_map(struct iommu_domain *domain, unsigned long iova,
entry = section_entry(priv->pgtable, iova);
 
if (size == SECT_SIZE) {
-   ret = lv1set_section(entry, paddr,
+   ret = lv1set_section(entry, iova, paddr,
&priv->lv2entcnt[lv1ent_offset(iova)]);
} else {
unsigned long *pent;
@@ -900,17 +917,16 @@ static int exynos_iommu_map(struct iommu_domain *domain, unsigned long iova,
pent = alloc_lv2entry(entry, iova,
&priv->lv2entcnt[lv1ent_offset(iova)]);
 
-   if (!pent)
-   ret = -ENOMEM;
+   if (IS_ERR(pent))
+   ret = PTR_ERR(pent);
else
ret = lv2set_page(pent, paddr, size,
&priv->lv2entcnt[lv1ent_offset(iova)]);
}
 
-   if (ret) {
+   if (ret)
pr_debug("%s: Failed to map iova 0x%lx/0x%x bytes\n",
__func__, iova, size);
-   }
 
spin_unlock_irqrestore(&priv->pgtablelock, flags);
 
@@ -924,6 +940,7 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
struct sysmmu_drvdata *data;
unsigned long flags;
unsigned long *ent;
+   size_t err_pgsize;
 
BUG_ON(priv->pgtable == NULL);
 
@@ -932,7 +949,10 @@ static size_t exynos_iommu_unmap(struct iommu_domain *domain,
ent = section_entry(priv->pgtable, iova);
 
if (lv1ent_section(ent)) {
-   BUG_ON(size < SECT_S
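
The patch above switches alloc_lv2entry() from returning NULL to returning an error encoded in the pointer, so the caller can tell -ENOMEM from -EADDRINUSE. A minimal userspace sketch of that convention follows. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers here are simplified re-creations for illustration, and alloc_table_sketch() and map_sketch() are hypothetical names, not functions from the driver.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel helpers: a small negative errno
 * is stored in the top MAX_ERRNO values of the address space. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: reports *why* it failed instead of NULL. */
static unsigned long *alloc_table_sketch(int already_section, int out_of_memory)
{
	unsigned long *table;

	if (already_section)
		return ERR_PTR(-EADDRINUSE);
	if (out_of_memory)
		return ERR_PTR(-ENOMEM);

	table = calloc(256, sizeof(*table));
	if (!table)
		return ERR_PTR(-ENOMEM);
	return table;
}

/* Hypothetical caller: forwards the encoded error code unchanged. */
static int map_sketch(int already_section, int out_of_memory)
{
	unsigned long *table = alloc_table_sketch(already_section, out_of_memory);

	if (IS_ERR(table))
		return (int)PTR_ERR(table);

	table[0] = 1;	/* pretend to install a mapping */
	free(table);
	return 0;
}
```

The design point is that a single return value carries both the pointer and the failure reason, so the caller no longer has to assume that every failure means out of memory.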

[PATCH v11 01/27] iommu/exynos: do not include removed header

2014-03-13 Thread Cho KyongHo

Commit 25e9d28d92 (ARM: EXYNOS: remove system mmu initialization from
exynos tree) removed arch/arm/mach-exynos/mach/sysmmu.h header without
removing remaining use of it from exynos-iommu driver, thus causing a
compilation error.

This patch fixes the error by removing respective include line
from exynos-iommu.c.

CC: Tomasz Figa 
Signed-off-by: Cho KyongHo 
---
 drivers/iommu/exynos-iommu.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 0740189..4876d35 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -12,6 +12,7 @@
 #define DEBUG
 #endif
 
+#include 
 #include 
 #include 
 #include 
@@ -29,8 +30,6 @@
 #include 
 #include 
 
-#include 
-
 /* We does not consider super section mapping (16MB) */
 #define SECT_ORDER 20
 #define LPAGE_ORDER 16
-- 
1.7.9.5


[PATCH v11 00/27] iommu/exynos: Fixes and Enhancements of System MMU driver with DT

2014-03-13 Thread Cho KyongHo

Sorry for the delay in posting the v11 patchset.

The current exynos-iommu (System MMU) driver does not work autonomously
because it lacks support for power management of peripheral blocks.
For example, the MFC device driver must ensure that its System MMU is
disabled before the MFC block is powered down, so that the IOTLB in the
System MMU is not invalidated when an I/O memory mapping is changed.
Because a System MMU resides in the same H/W block as its master, access
to the control registers of the System MMU while the H/W block is turned
off must be prohibited.

This set of changes solves the above problem by making each System MMU
the parent of the device that owns it, so that the System MMU is informed
when the device is turned off or on.

Another big change to the driver is the support for devicetree.
The bindings for System MMU are described in
Documentation/devicetree/bindings/arm/samsung/system-mmu.txt

In addition, this patchset also includes several bug fixes and enhancements
of the current driver.

Change log:
v11:
- Rebased on the latest works on clock, arm/samsung, iommu branches
- Change the property to link System MMU and its master H/W
  'iommu' in the master's node -> 'mmu-masters' in the System MMU's node
- Changed compatible string:
  "samsung,sysmmu-v1"
  "samsung,sysmmu-v2"
  "samsung,sysmmu-v3.1"
  "samsung,sysmmu-v3.2"
  "samsung,sysmmu-v3.3"
- Change the implementation of retrieving System MMU version -> simpler
- Check NULL pointer before call to clk_enable() and clk_disable()
- Allow a single master to link to multiple System MMUs.
  (fimc-is, fimd/g2d/Scaler in Exynos5420)
- Workarounds of known problems of System MMU
- Code enhancements:
  * Compilable for 64-bit
  * Enhanced error messages

v10:
- Rebased on the following branches
  git.linaro.org/git-ro/people/mturquette/linux.git/clk-next
  git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git/for-next
  git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/next
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.12-rc3)
- Set parent clock to all System MMU clocks.
- Add clock and DT descriptors for Exynos5420
- Modified error handling in exynos_iommu_init()
- Split "iommu/exynos: support for device tree" patch into the following 6 patches
  iommu/exynos: handle only one instance of System MMU
  iommu/exynos: always enable runtime PM
  iommu/exynos: always use a single clock descriptor
  iommu/exynos: remove dbgname from drvdata of a System MMU
  iommu/exynos: use managed driver helper functions
  iommu/exynos: support for device tree
- Remove 'interrupt-names' and 'status' properties from DT
- Change n:1 relationship between master:System MMU into 1:1 relationship.
- Removed custom fault handler; the status of System MMU is now printed
  whenever a System MMU fault occurs.
- Post Antonios Motakis's commit together:
  "iommu/exynos: add devices attached to the System MMU to an IOMMU group"

v9:
- Rebased on the following branches
  git.linaro.org/git-ro/people/mturquette/linux.git/clk-next
  git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git/samsung-next
  git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master (3.11-rc4)
- Split "add bus notifier for registering System MMU" into 5 patches
- Call clk_prepare() that was missing in v8.
- Fixed base address of sysmmu_tv in exynos4210.dtsi
- BUG_ON() instead of return -EADDRINUSE when trying mapping on a mapped area
- Moved camif_top to 317 in drivers/clk/samsung/clk-exynos5250.c
- Removed 'iommu' property from 'codec'(mfc) node
- Does not make 'master' clock to be the parent of 'sysmmu' clock.
   'master' clock is enabled before accessing control registers of System MMU
   and disabled after the access.

v8:
- Reordered patch list: moved "change rwlock to spinlock" to the last.
- Fixed remaining bug in "fix page table maintenance".
- Always return 0 from exynos_iommu_attach_device().
- Removed prefetch buffer setting when System MMU is enabled
  due to the restriction of prefetch buffers:
  A prefetch buffer must not hit from more than one DMA.
  For instance with GScalers, if a single prefetch buffer is initialized
  with 0x0 ~ 0x and a GScaler works on source buffer at 0x1000
  and target buffer at 0x2000, the System MMU may deadlock.
  Clients must initialize prefetch buffers with custom function defined
  in exynos-iommu drivers whenever they need to enable prefetch buffers.
- The clock of System MMU has no relationship with the clock of its master H/W.
  The clock of master H/W is always enabled when exynos-iommu driver needs to
  access MMIO area and disabled as soon as the access finishes.
- Removed err_page variable used in exynos_iommu_unmap() in the previous patch
  "fix page table maintenance".
- Split a big patch "add bus notifier for registering System MMU".
   Extracted the following 2 patches: 9/12 and 10/12.
- And some additional fixes...

v7:
- Rebased on the stable 3.10
- Registered PM domains and gate clocks with DT
- C

Re: [PATCH 0/2] Add exit_prepare callback to the cpufreq_driver interface.

2014-03-13 Thread Viresh Kumar

On Thu, Mar 13, 2014 at 11:06 PM,   wrote:
> From: Dirk Brandewie 
>
> Some drivers (intel_pstate) need to modify state on a core before it
> is completely offline.  The ->exit() callback is executed during the
> CPU_POST_DEAD phase of the cpu offline process which is too late to
> change the state of the core.
>
> Patch 1 adds an optional callback function to the cpufreq_driver
> interface which is called by the core during the CPU_DOWN_PREPARE
> phase of cpu offline in __cpufreq_remove_dev_prepare().
>
> Patch 2 changes intel_pstate to use the ->exit_prepare callback to do
> its cleanup during cpu offline.

Copying stuff from other mail thread here so that we can discuss on a
single mail chain.

On 14 March 2014 03:09, Rafael J. Wysocki  wrote:
> On Thursday, March 13, 2014 12:56:02 PM Viresh Kumar wrote:
>> On 13 March 2014 08:07, Rafael J. Wysocki  wrote:
>> > On Wednesday, March 12, 2014 02:27:07 PM Dirk Brandewie wrote:
>>
>> >> > I see two possibilities:
> >> >> >   1. Move the exit() callback to __cpufreq_remove_dev_prepare().  I don't
> >> >> >  have a good understanding of what carnage this would cause in the core
> >> >> >  or other scaling drivers.
> >> >> >
> >> >> >   2. Add another callback to the cpufreq_driver interface that would be
> >> >> >  called from __cpufreq_remove_dev_prepare() if the callback was set.
>> >
>> > I prefer 2, the reason being that it pretty much is guaranteed not to break
> >> > anything.  For the record, I'm not a fan of adding new cpufreq driver callbacks,
> >> > but in this particular case it seems we can't really do anything better.
>>
>> I haven't thought a lot about which one of these two looks better, probably
>> Rafael might be correct. But I can see another way out here as this is very
>> much driver specific. Why can't we do a register_hotcpu_notifier() from
>> pstate driver alone?
>
> Why would that be better than adding a new callback?

Because it's becoming more and more confusing. We probably have the problem
right but the wrong solutions for it.

But having considered this issue in detail now, I have more input. All Dirk and
Patrick want is to set a core to its min P-state before it is offlined. We
already have some infrastructure in the core for this today:
cpufreq_generic_suspend(). We can rework that to get a better solution
for both problems.

Beyond that, I don't think Dirk's solution is going to work if we twist the
systems a bit. For example, Dirk probably wanted to set the P-state of every
core to MIN when it goes down. But his solution probably doesn't do that right
now.

exit() is called only when the policy expires or all the CPUs of that policy
are down. Suppose only one CPU out of 4 in a policy goes down; then the
pstate driver would never know that happened, and that core wouldn't go
to the min state.

I think we have two solutions here:
- If it's just about setting a core to a particular freq when it goes down,
that looks like a generic enough problem, so better to fix the core for that.
Probably with the help of the flags field/suspend-freq (maybe renamed) and
without calling the driver's exit() at all.

- If this is highly driver specific (which it doesn't look like, if all we
have to do is set the freq to MIN), then better to have something like
register_hotcpu_notifier() with priority set to -1, so that it gets called after
cpufreq.

--
viresh
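
The "optional callback" option discussed in this thread can be sketched in plain C as follows. This is an illustration only: struct driver_sketch, struct policy_sketch and the function names are hypothetical and do not reproduce the real cpufreq_driver interface. The sketch shows just the calling convention, where the core invokes the hook during the prepare phase only if the driver set it.

```c
#include <stddef.h>

/* Hypothetical per-policy state, reduced to what the sketch needs. */
struct policy_sketch {
	unsigned int cpu;
	unsigned int cur_state;
	unsigned int min_state;
};

/* Hypothetical driver ops: exit_prepare is optional and may be NULL. */
struct driver_sketch {
	int (*exit)(struct policy_sketch *policy);
	int (*exit_prepare)(struct policy_sketch *policy);
};

/* Core side: called while the CPU is still online; drivers that do not
 * provide the hook are unaffected. */
static int remove_dev_prepare_sketch(const struct driver_sketch *drv,
				     struct policy_sketch *policy)
{
	if (drv->exit_prepare)
		return drv->exit_prepare(policy);
	return 0;
}

/* Driver side: drop the core to its minimum state before it goes away. */
static int pstate_exit_prepare_sketch(struct policy_sketch *policy)
{
	policy->cur_state = policy->min_state;
	return 0;
}
```

Because the hook is checked for NULL, existing drivers need no change, which is the "guaranteed not to break anything" property mentioned in the quoted discussion.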

Re: [PATCH v6 6/7] arm64: ftrace: Add CALLER_ADDRx macros

2014-03-13 Thread AKASHI Takahiro


On 03/14/2014 03:07 AM, Steven Rostedt wrote:

On Thu, 2014-03-13 at 15:54 +, Will Deacon wrote:

On Thu, Mar 13, 2014 at 10:13:49AM +, AKASHI Takahiro wrote:

CALLER_ADDRx returns caller's address at specified level in call stacks.
They are used for several tracers like irqsoff and preemptoff.
Strange to say, however, they are referenced even without FTRACE.

Please note that this implementation assumes that we have frame pointers.
(which means kernel should be compiled with -fno-omit-frame-pointer.)


How do you ensure that -fno-omit-frame-pointer is passed?


Perhaps -pg does the same thing?


+#define HAVE_ARCH_CALLER_ADDR
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)return_address(1))
+#define CALLER_ADDR2 ((unsigned long)return_address(2))
+#define CALLER_ADDR3 ((unsigned long)return_address(3))
+#define CALLER_ADDR4 ((unsigned long)return_address(4))
+#define CALLER_ADDR5 ((unsigned long)return_address(5))
+#define CALLER_ADDR6 ((unsigned long)return_address(6))


Could we change the core definitions of these macros (in linux/ftrace.h) to
use return_address, then provide an overridable version of return_address
that defaults to __builtin_return_address, instead of copy-pasting this
sequence?


We could add a new macro:

/* All archs should have this, but we define it for consistency */
#ifndef ftrace_return_address0
# define ftrace_return_address0  __builtin_return_address(0)
#endif
/* Archs may use other ways for ADDR1 and beyond */
#ifndef ftrace_return_address
# define ftrace_return_address(n) __builtin_return_address(n)
#endif

And then have:

#define CALLER_ADDR0 ((unsigned long)ftrace_return_address0)
#define CALLER_ADDR1 ((unsigned long)ftrace_return_address(1))
[...]

And then you would only need to redefine ftrace_return_address.


I'm going to create a separate RFC, including fixes for other archs.

-Takahiro AKASHI


-- Steve
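
The overridable-macro scheme proposed above can be tried out in userspace with GCC or Clang. The following is a sketch under assumptions: arch_return_address_sketch() is a hypothetical stand-in for an architecture's own unwinder (such as the return_address() used in the arm64 patch) and simply returns a recognisable fake value; only the level-0 builtin is actually evaluated.

```c
#include <stdint.h>

/* Hypothetical arch hook: a real port would walk the frame records.
 * Here it returns a fake, recognisable value per level. */
static void *arch_return_address_sketch(unsigned int level)
{
	return (void *)(uintptr_t)(0x1000u * level);
}

/* The "architecture" overrides only the generic level-n accessor. */
#define ftrace_return_address(n) arch_return_address_sketch(n)

/* Generic defaults, used only where the arch did not define its own. */
#ifndef ftrace_return_address0
# define ftrace_return_address0 __builtin_return_address(0)
#endif
#ifndef ftrace_return_address
# define ftrace_return_address(n) __builtin_return_address(n)
#endif

#define CALLER_ADDR0 ((unsigned long)ftrace_return_address0)
#define CALLER_ADDR1 ((unsigned long)ftrace_return_address(1))

/* noinline so that the function has a real return address of its own. */
__attribute__((noinline)) unsigned long who_called_me(void)
{
	return CALLER_ADDR0;
}
```

With this layout the CALLER_ADDRx definitions live in one place, and an architecture that cannot rely on __builtin_return_address(n) for n > 0 redefines a single macro instead of copying the whole sequence.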




Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")

2014-03-13 Thread Yuanhan Liu

On Wed, Mar 12, 2014 at 04:54:47PM +, Mel Gorman wrote:
> On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> > Hi,
> > 
> > Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> > kswapd") caused a big performance regression(73%) for vm-scalability/
> > lru-file-readonce testcase on a system with 256G memory without swap.
> > 
> > That testcase simply looks like this:
> >  truncate -s 1T /tmp/vm-scalability.img
> >  mkfs.xfs -q /tmp/vm-scalability.img
> >  mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> > 
> >  SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
> >  for i in `seq 1 120`; do
> >  truncate $SPARESE_FILE-$i -s 36G
> >  timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
> >  done
> > 
> >  wait
> > 
> 
> The filename implies that it's a sparse file with no IO but does not say
> what the truncate function/program/whatever actually does.

It's actually the /usr/bin/truncate file from coreutils.

> If it's really a
> sparse file then the dd process should be reading zeros and writing them to
> NULL without IO. Where are pages being dirtied?

Sorry, my bad. I was wrong: I meant "the speed of getting new
pages", not "the speed of dirtying pages".

> Does the truncate command
> really create a sparse file or is it something else?
> 
> > Actually, it's not the newly added code (obey proportional scanning)
> > in that commit that caused the regression. Instead, it's the following
> > change:
> > +
> > +   if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > +   continue;
> > +
> > 
> > 
> > -   if (nr_reclaimed >= nr_to_reclaim &&
> > -   sc->priority < DEF_PRIORITY)
> > +   if (global_reclaim(sc) && !current_is_kswapd())
> > break;
> > 
> > The difference is that before, we might reclaim more than requested
> > in the first round of reclaiming (sc->priority == DEF_PRIORITY).
> > 
> > So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> > reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to catch
> > up with the dirty rate. Thus page allocation stalls, and performance drops:
> > 
> >O for e82e0561
> >* for parent commit
> > 
> > [ASCII plot: proc-vmstat.allocstall. O samples lie near 1.8e+06; * samples
> >  sit on the lowest tick above zero.]
> > 
> > [ASCII plot: vm-scalability.throughput. * samples lie near 2e+07; O samples
> >  lie at or just below 8e+06.]
> >
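
The change in loop exit conditions quoted above can be modelled with a toy loop. This is a deliberately simplified sketch, not the kernel's shrink_lruvec(): every batch is assumed to reclaim exactly SWAP_CLUSTER_MAX pages, the proportional-scanning branch (scan_adjusted) is left out, and struct scan_sketch and both functions are hypothetical names.

```c
#define DEF_PRIORITY     12
#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical, reduced stand-in for struct scan_control. */
struct scan_sketch {
	int priority;
	int is_kswapd;
	int global_reclaim;
	unsigned long nr_to_reclaim;
	unsigned long batches;		/* how much scan work is queued */
};

/* Exit rule of the parent commit: never stop early at DEF_PRIORITY. */
static unsigned long reclaim_old(const struct scan_sketch *sc)
{
	unsigned long nr_reclaimed = 0, i;

	for (i = 0; i < sc->batches; i++) {
		nr_reclaimed += SWAP_CLUSTER_MAX;
		if (nr_reclaimed >= sc->nr_to_reclaim &&
		    sc->priority < DEF_PRIORITY)
			break;
	}
	return nr_reclaimed;
}

/* Exit rule after e82e0561: direct reclaim stops as soon as the target
 * is met, whatever the priority. */
static unsigned long reclaim_new(const struct scan_sketch *sc)
{
	unsigned long nr_reclaimed = 0, i;

	for (i = 0; i < sc->batches; i++) {
		nr_reclaimed += SWAP_CLUSTER_MAX;
		if (nr_reclaimed < sc->nr_to_reclaim)
			continue;
		if (sc->global_reclaim && !sc->is_kswapd)
			break;
	}
	return nr_reclaimed;
}
```

Under these assumptions, a direct reclaimer at DEF_PRIORITY with eight queued batches frees all eight batches with the old rule but only one batch with the new rule, which matches the explanation that the first round used to reclaim more than requested.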

Re: [PATCH] fs: fix i_writecount on shmem and friends

2014-03-13 Thread Al Viro

On Thu, Mar 13, 2014 at 04:55:41PM +1100, NeilBrown wrote:

> Can we do direct writes from kernel space yet?  If so I'll change the code to
> do that so that it will work with any filesystem (which supports direct
> writes).

You can - see __swap_writepage() (mm/page_io.c).  However, that area is
about to get a lot of massage, so it would make sense to wait a bit...

> (The documentation says that bitmap files should only be used on ext2 or
> ext3.  Most people use bitmaps on the raw devices so hopefully the few who
> have a need for files will read the documentation :-)
> 
> (and yes, I check for FMODE_WRITE)

Umm...  Where?

Re: [PATCH v6 4/7] arm64: Add ftrace support

2014-03-13 Thread AKASHI Takahiro


On 03/14/2014 02:08 AM, Will Deacon wrote:

On Thu, Mar 13, 2014 at 10:13:47AM +, AKASHI Takahiro wrote:

This patch implements arm64 specific part to support function tracers,
such as function (CONFIG_FUNCTION_TRACER), function_graph
(CONFIG_FUNCTION_GRAPH_TRACER) and function profiler
(CONFIG_FUNCTION_PROFILER).

With 'function' tracer, all the functions in the kernel are traced with
timestamps in ${sysfs}/tracing/trace. If function_graph tracer is
specified, call graph is generated.

The kernel must be compiled with -pg option so that _mcount() is inserted
at the beginning of functions. This function is called on every function's
entry as long as tracing is enabled.
In addition, function_graph tracer also needs to be able to probe function's
exit. ftrace_graph_caller() & return_to_handler do this by faking link
register's value to intercept function's return path.

More details on architecture specific requirements are described in
Documentation/trace/ftrace-design.txt.


[...]


You seem not to like this statement :-)


diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
new file mode 100644
index 000..0ac31c8
--- /dev/null
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -0,0 +1,175 @@
+/*
+ * arch/arm64/kernel/entry-ftrace.S
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ * Gcc with -pg will put the following code in the beginning of each function:
+ *  mov x0, x30
+ *  bl _mcount
+ * [function's body ...]
+ * "bl _mcount" may be replaced to "bl ftrace_caller" or NOP if dynamic
+ * ftrace is enabled.
+ *
+ * Please note that x0 as an argument will not be used here because we can
+ * get lr(x30) of instrumented function at any time by winding up call stack
+ * as long as the kernel is compiled without -fomit-frame-pointer.
+ * (or CONFIG_FRAME_POINTER, this is forced on arm64)
+ *
+ * stack layout after mcount_enter in _mcount():
+ *
+ * current sp/fp =>  0:+-+
+ * in _mcount()   | x29 | -> instrumented function's fp
+ *+-+
+ *| x30 | -> _mcount()'s lr (= instrumented function's pc)
+ * old sp  => +16:+-+
+ * when instrumented   | |
+ * function calls  | ... |
+ * _mcount()  | |
+ *| |
+ * instrumented => +xx:+--- --+
+ * function's fp   | x29 | -> parent's fp
+ *+-+
+ *| x30 | -> instrumented function's lr (= parent's pc)
+ *+-+
+ *| ... |


I guess it's just the diff that's misaligning your ASCII art here?


Yes, I think so. Misaligned due to "tab"


+/*
+ * void return_to_handler(void)
+ *
+ * Run ftrace_return_to_handler() before going back to parent.
+ * @fp is checked against the value passed by ftrace_graph_caller()
+ * only when CONFIG_FUNCTION_GRAPH_FP_TEST is enabled.a
+ */
+   .global return_to_handler
+return_to_handler:


ENTRY(return_to_handler)


Fix it.


+   str x0, [sp, #-16]!
+   mov x0, x29 // parent's fp
+   bl  ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
+   mov x30, x0 // restore the original return address
+   ldr x0, [sp], #16
+   ret


and an ENDPROC here.


Fix it.
But please note that this (return_to_handler) is not a real function.
Declaring it as ENDPROC is not very useful.


+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
new file mode 100644
index 000..a559ab8
--- /dev/null
+++ b/arch/arm64/kernel/ftrace.c
@@ -0,0 +1,64 @@
+/*
+ * arch/arm64/kernel/ftrace.c
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+/*
+ * function_graph tracer expects ftrace_return_to_handler() to be called
+ * on the way back to parent. For this purpose, this function is called
+ * in _mcount() or ftrace_caller() to replace return address (*parent) on
+ * the call stack to return_to_handler.
+ *
+ * Note that @frame_pointer is used only for sanity check later.
+ */
+void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
+  unsigned long frame_pointer)
+{
+   unsigned long return_hooker = (unsigned long)&return_to_handler;
+   unsigned long old;
+   struct ftrace_graph_ent trace;
+   int err;
+
+   if (unlikely(atomic_read(&current->tracing_grap

Re: [PATCH v4] mm: per-thread vma caching

2014-03-13 Thread Andrew Morton

On Fri, 14 Mar 2014 11:05:51 +0800 Li Zefan  wrote:

> Hi Davidlohr,
> 
> On 2014/3/4 11:26, Linus Torvalds wrote:
> > On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso  wrote:
> >>
> >> Yes, I shortly realized that was silly... but I can say for sure it can
> >> happen and a quick qemu run confirms it. So I see your point as to
> >> asking why we need it, so now I'm looking for an explanation in the
> >> code.
> > 
> > We definitely *do* have users.
> > 
> > One example would be ptrace -> access_process_vm -> __access_remote_vm
> > -> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.
> > 
> 
> I saw this oops on 3.14.0-rc5-next-20140307, which is possibly caused by
> your patch? Don't know how it was triggered.
> 
> ...
>
> [ 6072.027007]  [] get_user_pages+0x52/0x60
> [ 6072.027015]  [] __access_remote_vm+0x118/0x1f0
> [ 6072.027023]  [] access_process_vm+0x5b/0x80
> [ 6072.027033]  [] proc_pid_cmdline+0x77/0x120
> [ 6072.027041]  [] proc_info_read+0xa2/0xe0
> [ 6072.027050]  [] vfs_read+0xad/0x1a0
> [ 6072.027057]  [] SyS_read+0x65/0xb0
> [ 6072.027066]  [] system_call_fastpath+0x16/0x1b
> [ 6072.027072] Code: f4 4c 89 f7 89 45 a4 e8 36 0e eb ff 48 3d 00 f0 ff ff 48 
> 89 c3 0f 86 d7 00 00 00 4c 89 e0
>  49 8b 56 40 48 c1 e8 27 25 ff 01 00 00 <48> 8b 0c c2 48 85 c9 75 3e 41 83 e5 
> 08 74 1b 49 8b 87 90 00 00
> [ 6072.027134] RIP  [] follow_page_mask+0x69/0x620
> [ 6072.027142]  RSP 
> [ 6072.027146] CR2: 07f8

Yep.  Please grab whichever of

mm-per-thread-vma-caching-fix-3.patch
mm-per-thread-vma-caching-fix-4.patch
mm-per-thread-vma-caching-fix-5.patch
mm-per-thread-vma-caching-fix-6-checkpatch-fixes.patch
mm-per-thread-vma-caching-fix-6-fix.patch

which you don't have from http://ozlabs.org/~akpm/mmots/broken-out/

Re: [PATCH net-next v3 1/2] r8152: add RTL8152_EARLY_AGG_TIMEOUT_SUPER

2014-03-13 Thread David Miller

From: hayeswang 
Date: Fri, 14 Mar 2014 10:37:21 +0800

>  From: David Miller [mailto:da...@davemloft.net] 
>  Sent: Friday, March 14, 2014 1:22 AM
> [...]
>> And I fundamentally disagree with this being a Kconfig parameter.
>> 
>> Make it run-time calculated _or_ settable via ethtool.
> 
> Excuse me. How should I make it run-time calculated without a
> Kconfig parameter? Should I use module_param? 

You run-time determine the setting based upon the negotiated link
speed and traffic patterns.

[git pull] drm fixes

2014-03-13 Thread Dave Airlie


Hi Linus,

pretty minor set of fixes for radeon, ttm and vmwgfx,
ttm ones are a regression and an oops seen on server chipsets.

Dave.

The following changes since commit fa389e220254c69ffae0d403eac4146171062d08:

  Linux 3.14-rc6 (2014-03-09 19:41:57 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to f042cc4a607d675452134884c26e8d0d395dd979:

  Merge tag 'ttm-fixes-3.14-2014-03-12' of git://people.freedesktop.org/~thomash/linux into drm-fixes (2014-03-13 17:31:24 +1000)



Alex Deucher (4):
  drm/radeon: fix runpm disabling on non-PX harder
  drm/radeon/cik: properly set sdma ring status on disable
  drm/radeon/cik: stop the sdma engines in the enable() function
  drm/radeon/cik: properly set compute ring status on disable

Dave Airlie (3):
  Merge tag 'vmwgfx-fixes-3.14-2014-03-13' of git://people.freedesktop.org/~thomash/linux into drm-fixes
  Merge branch 'drm-fixes-3.14' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Merge tag 'ttm-fixes-3.14-2014-03-12' of git://people.freedesktop.org/~thomash/linux into drm-fixes

Rob Clark (1):
  drm/ttm: don't oops if no invalidate_caches()

Thomas Hellstrom (2):
  drm/ttm: Work around performance regression with VM_PFNMAP
  drm/vmwgfx: Fix a surface reference corner-case in legacy emulation mode

 drivers/gpu/drm/radeon/cik.c|  5 -
 drivers/gpu/drm/radeon/cik_sdma.c   | 14 +++---
 drivers/gpu/drm/radeon/radeon_kms.c | 10 +-
 drivers/gpu/drm/ttm/ttm_bo.c|  8 +---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++-
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 18 ++
 6 files changed, 50 insertions(+), 17 deletions(-)

Re: [PATCH v6 7/7] arm64: ftrace: Add system call tracepoint

2014-03-13 Thread AKASHI Takahiro


On 03/14/2014 01:25 AM, Will Deacon wrote:

On Thu, Mar 13, 2014 at 10:13:50AM +, AKASHI Takahiro wrote:

This patch allows system call entry or exit to be traced as ftrace events,
ie. sys_enter_*/sys_exit_*, if CONFIG_FTRACE_SYSCALLS is enabled.
Those events appear and can be controlled under
 ${sysfs}/tracing/events/syscalls/

Please note that we can't trace compat system calls here because
AArch32 mode does not share the same syscall table with AArch64.
Just define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS in order to avoid unexpected
results (bogus syscalls reported or even hang-up).

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/Kconfig   |1 +
  arch/arm64/include/asm/ftrace.h  |   20 
  arch/arm64/include/asm/syscall.h |1 +
  arch/arm64/include/asm/unistd.h  |2 ++
  arch/arm64/kernel/ptrace.c   |   48 ++
  5 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6954959..b1dcdb4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -43,6 +43,7 @@ config ARM64
select HAVE_MEMBLOCK
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
+   select HAVE_SYSCALL_TRACEPOINTS
select IRQ_DOMAIN
select MODULES_USE_ELF_RELA
select NO_BOOTMEM
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index c44c4b1..4ef06f1 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -44,6 +44,26 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
  #define CALLER_ADDR4 ((unsigned long)return_address(4))
  #define CALLER_ADDR5 ((unsigned long)return_address(5))
  #define CALLER_ADDR6 ((unsigned long)return_address(6))
+
+#include 
+
+/*
+ * Because AArch32 mode does not share the same syscall table with AArch64,
+ * tracing compat syscalls may result in reporting bogus syscalls or even
+ * hang-up, so just do not trace them.
+ * See kernel/trace/trace_syscalls.c
+ *
+ * x86 code says:
+ * If the user realy wants these, then they should use the
+ * raw syscall tracepoints with filtering.


Fair enough.


+ */
+#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1


You don't need the '1' here.


OK.


+static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
+{
+if (is_compat_task())
+return true;
+return false;
+}


return is_compat_task();


Fix it.


  #endif /* ifndef __ASSEMBLY__ */

  #endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 70ba9d4..383771e 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -18,6 +18,7 @@

  #include 

+extern const void *sys_call_table[];

  static inline int syscall_get_nr(struct task_struct *task,
 struct pt_regs *regs)
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 82ce217..c335479 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -28,3 +28,5 @@
  #endif
  #define __ARCH_WANT_SYS_CLONE
  #include 
+
+#define NR_syscalls (__NR_syscalls)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 9993a8f..9c52b3e 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -41,6 +41,9 @@
  #include 
  #include 

+#define CREATE_TRACE_POINTS
+#include 
+
  /*
   * TODO: does not yet catch signals sent when the child dies.
   * in exit.c or in signal.c.
@@ -1062,29 +1065,31 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
  {
unsigned long saved_reg;

-   if (!test_thread_flag(TIF_SYSCALL_TRACE))
-   return regs->syscallno;
+   if (test_thread_flag(TIF_SYSCALL_TRACE)) {
+   /*
+* A scrach register (ip(r12) on AArch32, x7 on AArch64) is
+* used to denote syscall entry/exit:
+*   0 -> entry
+*/
+   if (is_compat_task()) {


if (arch_trace_is_compat_syscall())


I don't mind either way, but this part of code comes from the original
syscall_trace() (ie. ptrace stuff), and has nothing to do with ftrace events.
(You know, arch_trace_is_compat_syscall() is currently defined in asm/ftrace.h.)
So I'd like to keep it unchanged unless you really want.


With those changes:

   Acked-by: Will Deacon 


Thank you,
-Takahiro AKASHI


Will



Re: [PATCHv2 0/8] devfreq: exynos4: Support dt and use common ppmu driver

2014-03-13 Thread Chanwoo Choi

Hi,

On 03/14/2014 01:43 AM, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Thursday, March 13, 2014 05:17:21 PM Chanwoo Choi wrote:
>> This patchset supports devicetree and uses the common ppmu driver instead of
>> the individual code in exynos4_bus.c to remove duplicate code. It also
>> gets the resources for busfreq from dt data by using DT helper functions:
>> - PPMU register address
>> - PPMU clock
>> - Regulator for INT/MIF block
>>
>> This patchset uses the SET_SYSTEM_SLEEP_PM_OPS macro instead of the legacy
>> method. To remove power leakage in the suspend state, it disables the ppmu
>> clocks before entering suspend.
>>
>> Changes from v1:
>> - Add exynos4_bus.txt documentation for devicetree guide
>> - Fix probe failure if CONFIG_PM_OPP is disabled
>> - Fix typo and resource leak (regulator/clock/memory) on probe failure
>> - Add additional comments for PPMU usage instead of previous PPC
>> - Split into separate patches to remove ambiguity
>>
>> Chanwoo Choi (8):
>>   devfreq: exynos4: Support devicetree to get device id of Exynos4 SoC
>>   devfreq: exynos4: Use common ppmu driver and get ppmu address from dt data
>>   devfreq: exynos4: Add ppmu's clock control and code clean about regulator control
>>   devfreq: exynos4: Fix bug of resource leak and code clean on probe()
>>   devfreq: exynos4: Use SET_SYSTEM_SLEEP_PM_OPS macro
>>   devfreq: exynos4: Fix power-leakage of clock on suspend state
>>   devfreq: exynos4: Add CONFIG_PM_OPP dependency to fix probe fail
>>   devfreq: exynos4: Add busfreq driver for exynos4210/exynos4x12
>>
>>  .../devicetree/bindings/devfreq/exynos4_bus.txt|  49 +++
>>  drivers/devfreq/Kconfig|   1 +
>>  drivers/devfreq/exynos/Makefile|   2 +-
>>  drivers/devfreq/exynos/exynos4_bus.c   | 415 ++---
>>  4 files changed, 341 insertions(+), 126 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/devfreq/exynos4_bus.txt
> 
> Thanks for updating this patchset.  There are still some minor issues
> left though:
> 
> - patch #4 should be at beginning of the patch series
> 
> - moving of devfreq_unregister_opp_notifier(dev, data->devfreq) from
>   exynos4_bus_exit() to exynos4_busfreq_remove() should be in patch #4
>   (which should really be at the beginning of the patch series) not #3
> 
> - handling of iounmap(data->ppmu[i].hw_base) should be added to
>   exynos4_bus_exit() in patch #2 not #3
> 
> - patch #8 summary and description should mention the fact that it adds DT
>   binding documentation (not the driver itself) and the patch itself
>   can be slightly polished

OK, I'll re-order the patches in the patchset and fix the minor issues from
your comments.
Also, I'll modify the patch description for patch #8.

> 
> One important note about this patchset not mentioned in the cover
> letter is that it is improving a currently unused driver (because of
> the DT-only mach-exynos conversion the only user was removed in June 2013
> and from reading the code I suspect that even that user hadn't
> worked previously).  As such this patch series should not cause any
> regressions.

I don't understand exactly what you mean. I explained the DT support in the
patchset description above (using DT helper functions) and I added a PPMU
description. Also, each patch includes a detailed description of its content.

What more is needed?

Best Regards,
Chanwoo Choi






Re: [PATCH v4] mm: per-thread vma caching

2014-03-13 Thread Li Zefan

Hi Davidlohr,

On 2014/3/4 11:26, Linus Torvalds wrote:
> On Mon, Mar 3, 2014 at 7:13 PM, Davidlohr Bueso  wrote:
>>
>> Yes, I shortly realized that was silly... but I can say for sure it can
>> happen and a quick qemu run confirms it. So I see your point as to
>> asking why we need it, so now I'm looking for an explanation in the
>> code.
> 
> We definitely *do* have users.
> 
> One example would be ptrace -> access_process_vm -> __access_remote_vm
> -> get_user_pages() -> find_extend_vma() -> find_vma_prev -> find_vma.
> 

I saw this oops on 3.14.0-rc5-next-20140307, which is possibly caused by
your patch. I don't know how it was triggered.

[ 6072.026715] BUG: unable to handle kernel NULL pointer dereference at 07f8
[ 6072.026729] IP: [] follow_page_mask+0x69/0x620
[ 6072.026742] PGD c1975f067 PUD c19479067 PMD 0
[ 6072.026749] Oops:  [#1] SMP
[ 6072.026852] CPU: 2 PID: 13445 Comm: ps Not tainted 3.14.0-rc5-next-20140307-0.1-default+ #4
[ 6072.026863] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
[ 6072.026872] task: 88061d8848a0 ti: 880618854000 task.ti: 880618854000
[ 6072.026880] RIP: 0010:[]  [] follow_page_mask+0x69/0x620
[ 6072.026889] RSP: 0018:880618855c18  EFLAGS: 00010206
[ 6072.026895] RAX: 00ff RBX: ffea RCX: 880618855d0c
[ 6072.026902] RDX:  RSI: 7fff0a474cc7 RDI: 88061aef8f00
[ 6072.026909] RBP: 880618855c88 R08: 0002 R09: 
[ 6072.026916] R10:  R11: 3485 R12: 7fff0a474cc7
[ 6072.026924] R13: 0016 R14: 88061aef8f00 R15: 880c1c842508
[ 6072.026932] FS:  7f4687701700() GS:880c26a0() knlGS:
[ 6072.026940] CS:  0010 DS:  ES:  CR0: 8005003b
[ 6072.026947] CR2: 07f8 CR3: 000c184ee000 CR4: 07e0
[ 6072.026955] Stack:
[ 6072.026959]  880618855c48 880618855d0c 18855c58 0246
[ 6072.026969]   0752 817c975c 
[ 6072.026980]  880618855c88 0016 880c1c842508 88061d8848a0
[ 6072.026989] Call Trace:
[ 6072.026998]  [] __get_user_pages+0x204/0x5a0
[ 6072.027007]  [] get_user_pages+0x52/0x60
[ 6072.027015]  [] __access_remote_vm+0x118/0x1f0
[ 6072.027023]  [] access_process_vm+0x5b/0x80
[ 6072.027033]  [] proc_pid_cmdline+0x77/0x120
[ 6072.027041]  [] proc_info_read+0xa2/0xe0
[ 6072.027050]  [] vfs_read+0xad/0x1a0
[ 6072.027057]  [] SyS_read+0x65/0xb0
[ 6072.027066]  [] system_call_fastpath+0x16/0x1b
[ 6072.027072] Code: f4 4c 89 f7 89 45 a4 e8 36 0e eb ff 48 3d 00 f0 ff ff 48 89 c3 0f 86 d7 00 00 00 4c 89 e0 49 8b 56 40 48 c1 e8 27 25 ff 01 00 00 <48> 8b 0c c2 48 85 c9 75 3e 41 83 e5 08 74 1b 49 8b 87 90 00 00
[ 6072.027134] RIP  [] follow_page_mask+0x69/0x620
[ 6072.027142]  RSP 
[ 6072.027146] CR2: 07f8
[ 6072.134516] ---[ end trace 8d006e01f05d1ba8 ]---



Re: [PATCH v6 6/7] arm64: ftrace: Add CALLER_ADDRx macros

2014-03-13 Thread AKASHI Takahiro


On 03/14/2014 12:54 AM, Will Deacon wrote:

On Thu, Mar 13, 2014 at 10:13:49AM +, AKASHI Takahiro wrote:

CALLER_ADDRx returns the caller's address at the specified level in the call
stack. They are used by several tracers such as irqsoff and preemptoff.
Oddly, however, they are referenced even without FTRACE.

Please note that this implementation assumes that we have frame pointers
(which means the kernel should be compiled with -fno-omit-frame-pointer).


How do you ensure that -fno-omit-frame-pointer is passed?


arm64 selects ARCH_WANT_FRAME_POINTERS, so FRAME_POINTER is on
(lib/Kconfig.debug) and -fno-omit-frame-pointer is appended (${TOP}/Makefile).
(stacktrace.c also assumes FRAME_POINTER.)

Do you think I should remove the comment above?


Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/include/asm/ftrace.h|   13 -
  arch/arm64/kernel/Makefile |3 +-
  arch/arm64/kernel/return_address.c |   55 
  3 files changed, 69 insertions(+), 2 deletions(-)
  create mode 100644 arch/arm64/kernel/return_address.c

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ed5c448..c44c4b1 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -18,6 +18,7 @@

  #ifndef __ASSEMBLY__
  extern void _mcount(unsigned long);
+extern void *return_address(unsigned int);

  struct dyn_arch_ftrace {
/* No extra data needed for arm64 */
@@ -33,6 +34,16 @@ static inline unsigned long ftrace_call_adjust(unsigned long 
addr)
 */
return addr;
  }
-#endif /* __ASSEMBLY__ */
+
+#define HAVE_ARCH_CALLER_ADDR
+
+#define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
+#define CALLER_ADDR1 ((unsigned long)return_address(1))
+#define CALLER_ADDR2 ((unsigned long)return_address(2))
+#define CALLER_ADDR3 ((unsigned long)return_address(3))
+#define CALLER_ADDR4 ((unsigned long)return_address(4))
+#define CALLER_ADDR5 ((unsigned long)return_address(5))
+#define CALLER_ADDR6 ((unsigned long)return_address(6))


Could we change the core definitions of these macros (in linux/ftrace.h) to
use return_address, then provide an overridable version of return_address
that defaults to __builtin_return_address, instead of copy-pasting this
sequence?


I think I understand what you mean, and will try to post a separate RFC,
but I also want to hold off this change on this patch since such a change
may raise a small controversy from other archs' maintainers.


diff --git a/arch/arm64/kernel/return_address.c 
b/arch/arm64/kernel/return_address.c
new file mode 100644
index 000..89102a6
--- /dev/null
+++ b/arch/arm64/kernel/return_address.c
@@ -0,0 +1,55 @@
+/*
+ * arch/arm64/kernel/return_address.c
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+#include 
+
+struct return_address_data {
+   unsigned int level;
+   void *addr;
+};
+
+static int save_return_addr(struct stackframe *frame, void *d)
+{
+   struct return_address_data *data = d;
+
+   if (!data->level) {
+   data->addr = (void *)frame->pc;
+   return 1;
+   } else {
+   --data->level;
+   return 0;
+   }
+}
+
+void *return_address(unsigned int level)
+{
+   struct return_address_data data;
+   struct stackframe frame;
+   register unsigned long current_sp asm ("sp");
+
+   data.level = level + 2;
+   data.addr = NULL;
+
+   frame.fp = (unsigned long)__builtin_frame_address(0);
+   frame.sp = current_sp;
+   frame.pc = (unsigned long)return_address; /* dummy */
+
+   walk_stackframe(&frame, save_return_addr, &data);
+
+   if (!data.level)
+   return data.addr;
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(return_address);


This whole file is basically copied from arch/arm/, but it's not too much
code. Ideally the toolchain would have made use of the frame pointer, but it
looks like it doesn't bother.


I confirmed that __builtin_return_address([123456]) doesn't work
even with -fno-omit-frame-pointer.
Keep this as it is.

-Takahiro AKASHI


Will
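
For readers following the thread, here is a minimal user-space sketch of the
countdown callback that return_address() relies on. It is an illustration
only: walk_frames() and nth_return_address() are hypothetical stand-ins (the
real walk_stackframe() follows frame pointers on a live stack, and the patch
adds 2 to the level to skip its own frames), and the frames here are just an
array of recorded pc values.

```c
#include <stddef.h>

/* Simplified frame record; the kernel's struct stackframe also has fp and sp. */
struct stackframe {
	unsigned long pc;
};

struct return_address_data {
	unsigned int level;
	void *addr;
};

/* Callback: skip 'level' frames, then record the pc and stop the walk. */
static int save_return_addr(const struct stackframe *frame, void *d)
{
	struct return_address_data *data = d;

	if (!data->level) {
		data->addr = (void *)frame->pc;
		return 1;	/* non-zero stops the walk */
	}
	--data->level;
	return 0;
}

/* Hypothetical stand-in for walk_stackframe(): visit recorded frames in order. */
static void walk_frames(const struct stackframe *frames, size_t n,
			int (*fn)(const struct stackframe *, void *), void *d)
{
	for (size_t i = 0; i < n; i++)
		if (fn(&frames[i], d))
			break;
}

/* Returns the pc of the frame at 'level', or NULL if the stack is too shallow. */
static void *nth_return_address(const struct stackframe *frames, size_t n,
				unsigned int level)
{
	struct return_address_data data = { .level = level, .addr = NULL };

	walk_frames(frames, n, save_return_addr, &data);
	return data.level ? NULL : data.addr;
}
```

The NULL result for a too-shallow stack mirrors the final check in the patch,
where a non-zero remaining level means the walk ended early.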



[PATCH] user namespace: fix incorrect memory barriers

2014-03-13 Thread Mikulas Patocka

smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.

In this file, there is only control dependency, no data dependency, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < extents is true and starts executing the
  body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
  instructions in the body of the for loop

The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.
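
For illustration only (this is not part of the patch), the ordering described
above can be sketched in user space with C11 atomics. The fences are assumed
stand-ins that roughly play the role of smp_wmb() and smp_rmb(); struct
id_map, map_add() and MAX_EXTENTS are hypothetical names chosen for this
sketch, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_EXTENTS 5	/* illustrative limit for this sketch */

struct extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

struct id_map {
	atomic_uint nr_extents;
	struct extent extent[MAX_EXTENTS];
};

/* Writer: fill in the extent first, then publish the new count. */
static int map_add(struct id_map *map, struct extent e)
{
	unsigned int n = atomic_load_explicit(&map->nr_extents,
					      memory_order_relaxed);

	if (n >= MAX_EXTENTS)
		return -1;
	map->extent[n] = e;
	/* Assumed analogue of smp_wmb(): extent data before the count. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&map->nr_extents, n + 1, memory_order_relaxed);
	return 0;
}

/* Reader: load the count, then a read barrier, then the extents. */
static uint32_t map_id_down(struct id_map *map, uint32_t id)
{
	unsigned int idx, extents;

	extents = atomic_load_explicit(&map->nr_extents, memory_order_relaxed);
	/*
	 * Assumed analogue of smp_rmb(): the loop below depends on 'extents'
	 * only through the branch (a control dependency), so an explicit
	 * barrier is needed to order the extent loads after the count load.
	 */
	atomic_thread_fence(memory_order_acquire);
	for (idx = 0; idx < extents; idx++) {
		uint32_t first = map->extent[idx].first;
		uint32_t last = first + map->extent[idx].count - 1;

		if (id >= first && id <= last)
			return id - first + map->extent[idx].lower_first;
	}
	return (uint32_t)-1;	/* no matching extent */
}
```

The point of the sketch is the pairing: the writer's barrier sits between the
data and the count, and the reader's barrier sits between the count and the
data.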

Signed-off-by: Mikulas Patocka 
Cc: sta...@vger.kernel.org  # 3.5+

---
 kernel/user_namespace.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

Index: linux-3.14-rc5/kernel/user_namespace.c
===
--- linux-3.14-rc5.orig/kernel/user_namespace.c 2014-03-14 03:28:50.0 
+0100
+++ linux-3.14-rc5/kernel/user_namespace.c  2014-03-14 03:29:36.0 
+0100
@@ -152,7 +152,7 @@ static u32 map_id_range_down(struct uid_
 
/* Find the matching extent */
extents = map->nr_extents;
-   smp_read_barrier_depends();
+   smp_rmb();
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].first;
last = first + map->extent[idx].count - 1;
@@ -176,7 +176,7 @@ static u32 map_id_down(struct uid_gid_ma
 
/* Find the matching extent */
extents = map->nr_extents;
-   smp_read_barrier_depends();
+   smp_rmb();
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].first;
last = first + map->extent[idx].count - 1;
@@ -199,7 +199,7 @@ static u32 map_id_up(struct uid_gid_map 
 
/* Find the matching extent */
extents = map->nr_extents;
-   smp_read_barrier_depends();
+   smp_rmb();
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].lower_first;
last = first + map->extent[idx].count - 1;
@@ -615,9 +615,8 @@ static ssize_t map_write(struct file *fi
 * were written before the count of the extents.
 *
 * To achieve this smp_wmb() is used on guarantee the write
-* order and smp_read_barrier_depends() is guaranteed that we
-* don't have crazy architectures returning stale data.
-*
+* order and smp_rmb() is guaranteed that we don't have crazy
+* architectures returning stale data.
 */
mutex_lock(&id_map_mutex);
 

[PATCH] procfs: make smp_affinity values go+r

2014-03-13 Thread Chema Gonzalez

Includes:
- /proc/irq/default_smp_affinity
- /proc/irq/*/affinity_hint
- /proc/irq/*/smp_affinity
- /proc/irq/*/smp_affinity_list

Users can distill the same information by reading /proc/interrupts.
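
As a quick illustration of what the mode change means (a user-space sketch,
not kernel code), the helpers below check the group/other bits of the old and
new modes. The function names are hypothetical and exist only for this
example.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* True if both group and others may read a file with this mode. */
static int group_other_readable(mode_t mode)
{
	return (mode & S_IRGRP) && (mode & S_IROTH);
}

/* True if group or others may write a file with this mode. */
static int group_other_writable(mode_t mode)
{
	return (mode & (S_IWGRP | S_IWOTH)) != 0;
}
```

With these, 0600 and 0400 are not readable by group/other, while 0644 and
0444 are, and none of the four modes grants write access beyond the owner.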

Signed-off-by: Chema Gonzalez 
---
 kernel/irq/proc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 36f6ee1..ac1ba2f 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -324,15 +324,15 @@ void register_irq_proc(unsigned int irq, struct irq_desc 
*desc)
 
 #ifdef CONFIG_SMP
/* create /proc/irq/<irq>/smp_affinity */
-   proc_create_data("smp_affinity", 0600, desc->dir,
+   proc_create_data("smp_affinity", 0644, desc->dir,
 &irq_affinity_proc_fops, (void *)(long)irq);
 
/* create /proc/irq/<irq>/affinity_hint */
-   proc_create_data("affinity_hint", 0400, desc->dir,
+   proc_create_data("affinity_hint", 0444, desc->dir,
 &irq_affinity_hint_proc_fops, (void *)(long)irq);
 
/* create /proc/irq/<irq>/smp_affinity_list */
-   proc_create_data("smp_affinity_list", 0600, desc->dir,
+   proc_create_data("smp_affinity_list", 0644, desc->dir,
 &irq_affinity_list_proc_fops, (void *)(long)irq);
 
proc_create_data("node", 0444, desc->dir,
@@ -372,7 +372,7 @@ void unregister_handler_proc(unsigned int irq, struct 
irqaction *action)
 static void register_default_affinity_proc(void)
 {
 #ifdef CONFIG_SMP
-   proc_create("irq/default_smp_affinity", 0600, NULL,
+   proc_create("irq/default_smp_affinity", 0644, NULL,
&default_affinity_proc_fops);
 #endif
 }
-- 
1.9.0.279.gdc9e3eb


[tip:x86/vdso] x86, vdso, xen: Remove stray reference to FIX_VDSO

2014-03-13 Thread tip-bot for H. Peter Anvin

Commit-ID:  1f2cbcf648962cdcf511d234cb39745baa9f5d07
Gitweb: http://git.kernel.org/tip/1f2cbcf648962cdcf511d234cb39745baa9f5d07
Author: H. Peter Anvin 
AuthorDate: Thu, 13 Mar 2014 19:44:47 -0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 13 Mar 2014 19:44:47 -0700

x86, vdso, xen: Remove stray reference to FIX_VDSO

Checkin

b0b49f2673f0 x86, vdso: Remove compat vdso support

... removed the VDSO from the fixmap, and thus FIX_VDSO; remove a
stray reference in Xen.

Found by Fengguang Wu's test robot.

Reported-by: Fengguang Wu 
Cc: Andy Lutomirski 
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Link: 
http://lkml.kernel.org/r/4bb4690899106eb11430b1186d5cc66ca9d1660c.1394751608.git.l...@amacapital.net
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/mmu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 256282e..21c6a42 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2058,7 +2058,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t 
phys, pgprot_t prot)
case FIX_RO_IDT:
 #ifdef CONFIG_X86_32
case FIX_WP_TEST:
-   case FIX_VDSO:
 # ifdef CONFIG_HIGHMEM
case FIX_KMAP_BEGIN ... FIX_KMAP_END:
 # endif

Re: [PATCH v3 4/4] x86: Pass memory range via E820 for kdump

2014-03-13 Thread Dave Young

> [..]
> 
> I tested on a prototype system with 231 entries in the map with good results.
> Everything succeeds when using kexec to initiate a fast reboot. For crash, it
> works with and without --pass-memmap-cmdline when using noefi. I hit the
> following panic when initiating a crash leaving EFI enabled in the crash 
> kernel:
> 
>  ? __unmap_pmd_range+0x77/0x190
>  unmap_pmd_range+0xcf/0x1c0
>  populate_pgd+0x16d/0x250
>  __cpa_process_fault+0x15/0xb0
>  __change_page_attr+0x15e/0x2a0
>  __change_page_attr_set_clr+0x59/0xc0
>  kernel_map_pages_in_pgd+0x7a/0xb0
>  __map_region+0x46/0x64
>  ? early_idt_handlers+0x117/0x120
>  efi_map_region_fixed+0xd/0xf
>  efi_enter_virtual_mode+0x4c/0x476
>  ? early_idt_handlers+0x117/0x120
>  ? early_idt_handlers+0x117/0x120
>  start_kernel+0x2e5/0x376
>  ? repair_env_string+0x5b/0x5b
>  ? memblock_reserve+0x49/0x4e
>  x86_64_start_reservations+0x2a/0x2c
>  x86_64_start_kernel+0x19f/0x1ae
> 
> However, I was able to reproduce this panic using an earlier version of
> kexec-tools, so I believe it is unrelated to this patch.

Can you test with matt's tree to see if it works?
If it still happens please post the full log.

Thanks
Dave

Re: [PATCH] procfs: make smp_affinity values 0644

2014-03-13 Thread Chema Gonzalez

0444 maybe?

-Chema

On Thu, Mar 13, 2014 at 7:37 PM, Eric Dumazet  wrote:
> On Thu, 2014-03-13 at 19:05 -0700, Chema Gonzalez wrote:
>> Includes:
>> - /proc/irq/default_smp_affinity
>> - /proc/irq/*/smp_affinity
>> - /proc/irq/*/smp_affinity_list
>>
>> Users can distill the same information by reading /proc/interrupts.
>>
>> Signed-off-by: Chema Gonzalez 
>> ---
>
> Seems good to me, but what about affinity_hint ?
>
>

RE: [PATCH net-next v3 1/2] r8152: add RTL8152_EARLY_AGG_TIMEOUT_SUPER

2014-03-13 Thread hayeswang

 From: David Miller [mailto:da...@davemloft.net] 
 Sent: Friday, March 14, 2014 1:22 AM
[...]
> And I fundamentally disagree with this being a Kconfig parameter.
> 
> Make it run-time calculated _or_ settable via ethtool.

Excuse me. How should I make it run-time calculated without a
Kconfig parameter? Should I use module_param? 

Best Regards,
Hayes


Re: [PATCH] procfs: make smp_affinity values 0644

2014-03-13 Thread Eric Dumazet

On Thu, 2014-03-13 at 19:05 -0700, Chema Gonzalez wrote:
> Includes:
> - /proc/irq/default_smp_affinity
> - /proc/irq/*/smp_affinity
> - /proc/irq/*/smp_affinity_list
> 
> Users can distill the same information by reading /proc/interrupts.
> 
> Signed-off-by: Chema Gonzalez 
> ---

Seems good to me, but what about affinity_hint ?



Re: [RFC 4/6] sched: powerpc: create a dedicated topology table

2014-03-13 Thread Preeti U Murthy

On 03/12/2014 04:34 PM, Dietmar Eggemann wrote:
> On 12/03/14 07:44, Vincent Guittot wrote:
>> On 12 March 2014 05:42, Preeti U Murthy  wrote:
>>> On 03/11/2014 06:48 PM, Vincent Guittot wrote:
>>>> On 11 March 2014 11:08, Preeti U Murthy  wrote:
>>>>> Hi Vincent,
>>>>>
>>>>> On 03/05/2014 12:48 PM, Vincent Guittot wrote:
>>>>>> Create a dedicated topology table for handling asymmetric feature.
>>>>>> The current proposal creates a new level which describes which groups of CPUs
>>>>>> take advantage of SD_ASYM_PACKING. The useless level will be removed during the
>>>>>> build of the sched_domain topology.
>>>>>>
>>>>>> Another solution would be to set SD_ASYM_PACKING in the sd_flags of SMT level
>>>>>> during the boot sequence and before the build of the sched_domain topology.
>>>>>
>>>>> Is the below what you mean as the other solution? If it is so, I would
>>>>> strongly recommend this approach rather than adding another level to the
>>>>> topology level to represent the asymmetric behaviour.
>>>>>
>>>>> +static struct sched_domain_topology_level powerpc_topology[] = {
>>>>> +#ifdef CONFIG_SCHED_SMT
>>>>> +   { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>>>>> SD_INIT_NAME(SMT) | arch_sd_sibling_asym_packing() },
>>>>> +#endif
>>>>> +   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>>>>> +   { NULL, },
>>>>> +};
>>>>
>>>> Not exactly like that but something like below
>>>>
>>>> +static struct sched_domain_topology_level powerpc_topology[] = {
>>>> +#ifdef CONFIG_SCHED_SMT
>>>> + { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>>>> SD_INIT_NAME(SMT) },
>>>> +#endif
>>>> + { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>>>> + { NULL, },
>>>> +};
>>>> +
>>>> +static void __init set_sched_topology(void)
>>>> +{
>>>> +#ifdef CONFIG_SCHED_SMT
>>>> + powerpc_topology[0].sd_flags |= arch_sd_sibling_asym_packing();
>>>> +#endif
>>>> + sched_domain_topology = powerpc_topology;
>>>> +}
>>>
>>> I think we can set it in powerpc_topology[] and not bother about setting
>>> additional flags outside of this array. It is clearer this way.
>>
>> IIRC, the arch_sd_sibling_asym_packing of powerpc can be set at
>> runtime which prevents it from being put directly in the table. Or it
>> means that we should use a function pointer in the table for setting
>> flags instead of a static value like the current proposal.
> 
> Right,
> 
> static struct sched_domain_topology_level powerpc_topology[] = {
> #ifdef CONFIG_SCHED_SMT
> { cpu_asmt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES |
> SD_ASYM_PACKING, SD_INIT_NAME(ASMT) },
> { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES |
> arch_sd_sibling_asym_packing() | SD_SHARE_POWERDOMAIN, SD_INIT_NAME(SMT) },
> #endif
> { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> { NULL, },
> };
> 
> is not compiling:
> 
>   CC  arch/powerpc/kernel/smp.o
> arch/powerpc/kernel/smp.c:787:2: error: initializer element is not constant
> arch/powerpc/kernel/smp.c:787:2: error: (near initialization for
> 'powerpc_topology[1].sd_flags')
> 
> So I'm in favour of a function pointer, even w/o the 'int cpu' parameter
> to circumvent the issue that it is too easy to create broken sd setups.

Alright, this looks fine to me. You could use the function pointer to
retrieve flags and have all initializations of sched domain features
consolidated in the table.
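
To make the idea concrete, here is a rough user-space sketch of a table that
stores a function pointer instead of a constant flags value. Everything in it
is an assumption for illustration: the struct layout, the flag values and the
asym_packing_enabled variable are made up and do not match the scheduler's
real definitions.

```c
#include <stddef.h>

/* Illustrative flag values only; not the kernel's SD_* constants. */
#define SD_SHARE_CPUPOWER	0x01
#define SD_SHARE_PKG_RESOURCES	0x02
#define SD_ASYM_PACKING		0x04

typedef int (*sd_flags_fn)(void);

struct topology_level {
	const char *name;
	sd_flags_fn sd_flags;	/* evaluated when the domains are built */
};

/* Stands in for a CPU feature that is only known at run time. */
static int asym_packing_enabled;

static int smt_flags(void)
{
	int flags = SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;

	if (asym_packing_enabled)
		flags |= SD_ASYM_PACKING;
	return flags;
}

/* A function address is a constant expression, so this initializer compiles. */
static const struct topology_level topology[] = {
	{ "SMT", smt_flags },
	{ "DIE", NULL },
	{ NULL, NULL },
};

/* Levels without a callback contribute no extra flags. */
static int level_flags(const struct topology_level *tl)
{
	return tl->sd_flags ? tl->sd_flags() : 0;
}
```

Because only the function's address goes into the static initializer, the
"initializer element is not constant" error goes away while the flags can
still depend on a run-time check.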

Regards
Preeti U Murthy
> 
> -- Dietmar
> 
>>
>>>
>>> On an additional note, on powerpc we would want to clear the
>>> SD_SHARE_POWERDOMAIN flag at the CPU domain. On Power8, considering we
>>> have 8 threads per core, we would want to consolidate tasks at least up to
>>> 4 threads without significant performance impact before spilling over to
>>> the other cores. By doing so, besides making use of the higher power of
>>> the core we could do cpuidle management at the core level for the
>>> remaining idle cores as a result of this consolidation.
>>
>> OK. i will add the SD_SHARE_POWERDOMAIN like below
>>
>> Thanks
>> Vincent
>>
>>>
>>> So the powerpc_topology[] would be something like the below:
>>>
>>> +static struct sched_domain_topology_level powerpc_topology[] = {
>>> +#ifdef CONFIG_SCHED_SMT
>>> +   { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>>> SD_INIT_NAME(SMT) | arch_sd_sibling_asym_packing() | SD_SHARE_POWERDOMAIN },
>>> +#endif
>>> +   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>>> +   { NULL, },
>>> +};
>>>
>>> The amount of consolidation onto the threads in a core will probably be
>>> handled by cpu capacity or a similar parameter, but the above
>>> topology level would be required to request the scheduler to try
>>> consolidating tasks onto cores until the cpu capacity (3/4/5 threads) has
>>> been exceeded.
>>> Regards
>>> Preeti U Murthy
>>>
>>>
>>
> 
> 


[PATCH V3] serial/uart/8250: Add tunable RX interrupt trigger I/F of FIFO buffers

2014-03-13 Thread Yoshihiro YUNOMAE

Add tunable RX interrupt trigger I/F of FIFO buffers.
Serial devices are used not only as message communication devices but also as
control or sending communication devices. For the latter uses, small amounts of
data are normally exchanged, so user applications want to receive each data
unit as soon as possible for real-time behavior. If a sensor sends 1 byte of
data at a time and a device must be controlled based on the sensor feedback,
the RX interrupt should be triggered for each byte.

According to the HW specification of serial UART devices, the RX interrupt
trigger can be changed, but the driver hard-codes it. For example, the RX
interrupt trigger in 16550A can be set to 1, 4, 8, or 14 bytes in HW, but the
current driver always sets the trigger to 8 bytes.

This patch lets userland change the RX interrupt trigger of a 16550A device.


- Read current setting
 # cat /sys/dev/char/4\:64/rx_int_trig
 8

- Write user setting
 # echo 1 > /sys/dev/char/4\:64/rx_int_trig
 # cat /sys/dev/char/4\:64/rx_int_trig
 1


- 16550A (1, 4, 8, or 14 bytes)
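
A rough stand-alone sketch of the 16550A mapping between the FCR trigger bits
and the byte counts listed above. The constants are local copies of what I
understand the values in linux/serial_reg.h to be, so treat them as
assumptions; fcr_to_bytes_16550a() and bytes_to_fcr_16550a() are hypothetical
helper names, not functions from this patch.

```c
#include <stdint.h>

/* Assumed to match linux/serial_reg.h; redefined here to be self-contained. */
#define UART_FCR_TRIGGER_MASK	0xC0
#define UART_FCR_R_TRIG_00	0x00
#define UART_FCR_R_TRIG_01	0x40
#define UART_FCR_R_TRIG_10	0x80
#define UART_FCR_R_TRIG_11	0xC0

/* Decode the RX trigger level (in bytes) from an FCR value. */
static int fcr_to_bytes_16550a(uint8_t fcr)
{
	switch (fcr & UART_FCR_TRIGGER_MASK) {
	case UART_FCR_R_TRIG_00: return 1;
	case UART_FCR_R_TRIG_01: return 4;
	case UART_FCR_R_TRIG_10: return 8;
	default:                 return 14;	/* UART_FCR_R_TRIG_11 */
	}
}

/*
 * Replace the trigger bits of 'fcr' with the encoding for 'bytes'.
 * Returns 0 on success, -1 if the 16550A has no such trigger level.
 */
static int bytes_to_fcr_16550a(uint8_t fcr, unsigned int bytes, uint8_t *out)
{
	uint8_t trig;

	switch (bytes) {
	case 1:  trig = UART_FCR_R_TRIG_00; break;
	case 4:  trig = UART_FCR_R_TRIG_01; break;
	case 8:  trig = UART_FCR_R_TRIG_10; break;
	case 14: trig = UART_FCR_R_TRIG_11; break;
	default: return -1;
	}
	/* Keep the non-trigger bits (e.g. FIFO enable) untouched. */
	*out = (uint8_t)((fcr & ~UART_FCR_TRIGGER_MASK) | trig);
	return 0;
}
```

Rejecting values other than 1, 4, 8 and 14 is one possible policy for a sysfs
write; whether the driver should reject or round such values is a design
choice this sketch does not settle.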

Changes in V2:
 - Use _IOW for TIOCSFIFORTRIG definition
 - Pass the interrupt trigger value itself

Changes in V3:
 - Change I/F from ioctl(2) to sysfs(rx_int_trig)

Signed-off-by: Yoshihiro YUNOMAE 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Heikki Krogerus 
Cc: Jingoo Han 
Cc: Aaron Sierra 
Cc: Stephen Warren 
Cc: linux-ser...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/tty/serial/8250/8250_core.c |  189 ++-
 drivers/tty/serial/serial_core.c|   18 ++-
 include/linux/serial_8250.h |1 
 include/linux/serial_core.h |4 +
 4 files changed, 198 insertions(+), 14 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_core.c 
b/drivers/tty/serial/8250/8250_core.c
index 69932b7..c945ceb 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -531,11 +531,9 @@ static void serial8250_clear_fifos(struct uart_8250_port 
*p)
 
 void serial8250_clear_and_reinit_fifos(struct uart_8250_port *p)
 {
-   unsigned char fcr;
-
serial8250_clear_fifos(p);
-   fcr = uart_config[p->port.type].fcr;
-   serial_out(p, UART_FCR, fcr);
+   p->fcr = uart_config[p->port.type].fcr;
+   serial_out(p, UART_FCR, p->fcr);
 }
 EXPORT_SYMBOL_GPL(serial8250_clear_and_reinit_fifos);
 
@@ -2325,10 +2323,19 @@ serial8250_do_set_termios(struct uart_port *port, 
struct ktermios *termios,
if ((baud < 2400 && !up->dma) || fifo_bug) {
fcr &= ~UART_FCR_TRIGGER_MASK;
fcr |= UART_FCR_TRIGGER_1;
+   /* Don't use user setting RX trigger */
+   up->fcr = 0;
}
}
 
/*
+* If up->fcr exists, a user has opened this port, changed RX trigger,
+* or read RX trigger before. So, we don't need to change up->fcr here.
+*/
+   if (!up->fcr)
+   up->fcr = fcr;
+
+   /*
 * MCR-based auto flow control.  When AFE is enabled, RTS will be
 * deasserted when the receive FIFO contains more characters than
 * the trigger, or the MCR RTS bit is cleared.  In the case where
@@ -2455,15 +2462,15 @@ serial8250_do_set_termios(struct uart_port *port, 
struct ktermios *termios,
 * is written without DLAB set, this mode will be disabled.
 */
if (port->type == PORT_16750)
-   serial_port_out(port, UART_FCR, fcr);
+   serial_port_out(port, UART_FCR, up->fcr);
 
serial_port_out(port, UART_LCR, cval);  /* reset DLAB */
up->lcr = cval; /* Save LCR */
if (port->type != PORT_16750) {
/* emulated UARTs (Lucent Venus 167x) need two steps */
-   if (fcr & UART_FCR_ENABLE_FIFO)
+   if (up->fcr & UART_FCR_ENABLE_FIFO)
serial_port_out(port, UART_FCR, UART_FCR_ENABLE_FIFO);
-   serial_port_out(port, UART_FCR, fcr);   /* set fcr */
+   serial_port_out(port, UART_FCR, up->fcr);   /* set fcr */
}
serial8250_set_mctrl(port, port->mctrl);
spin_unlock_irqrestore(&port->lock, flags);
@@ -2656,6 +2663,172 @@ static int serial8250_request_port(struct uart_port 
*port)
return ret;
 }
 
+static unsigned char convert_fcr2val(struct uart_8250_port *up,
+unsigned char fcr)
+{
+   unsigned char val = 0, trig_raw = fcr & UART_FCR_TRIGGER_MASK;
+
+   switch (up->port.type) {
+   case PORT_16550A:
+   if (trig_raw == UART_FCR_R_TRIG_00)
+   val = 1;
+   else if (trig_raw == UART_FCR_R_TRIG_01)
+   val = 4;
+   else if (trig_raw == UART_FCR_R_TRIG_10)
+   val = 8;
+   else if (trig_raw == UART_FCR_R_TRIG_11)
+   val = 14;
+   break;
+   }

Re: [PATCH] backlight: lm3639: use devm_backlight_device_register()

2014-03-13 Thread Jingoo Han

On Friday, March 14, 2014 11:14 AM, Daniel Jeong wrote:
> 
>  change to use devm_backlight_device_register() for simple cleanup.
> 
> Signed-off-by: Daniel Jeong 

Acked-by: Jingoo Han 

Lee Jones,
Would you merge this patch into your backlight tree?

Best regards,
Jingoo Han

> ---
>  drivers/video/backlight/lm3639_bl.c |   17 +++--
>  1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/video/backlight/lm3639_bl.c 
> b/drivers/video/backlight/lm3639_bl.c
> index 6fd60ad..5f36808 100644
> --- a/drivers/video/backlight/lm3639_bl.c
> +++ b/drivers/video/backlight/lm3639_bl.c
> @@ -349,8 +349,9 @@ static int lm3639_probe(struct i2c_client *client,
>   props.brightness = pdata->init_brt_led;
>   props.max_brightness = pdata->max_brt_led;
>   pchip->bled =
> - backlight_device_register("lm3639_bled", pchip->dev, pchip,
> -   &lm3639_bled_ops, &props);
> + devm_backlight_device_register(pchip->dev, "lm3639_bled",
> +pchip->dev, pchip, &lm3639_bled_ops,
> +&props);
>   if (IS_ERR(pchip->bled)) {
>   dev_err(&client->dev, "fail : backlight register\n");
>   ret = PTR_ERR(pchip->bled);
> @@ -360,7 +361,7 @@ static int lm3639_probe(struct i2c_client *client,
>   ret = device_create_file(&(pchip->bled->dev), &dev_attr_bled_mode);
>   if (ret < 0) {
>   dev_err(&client->dev, "failed : add sysfs entries\n");
> - goto err_bled_mode;
> + goto err_out;
>   }
> 
>   /* flash */
> @@ -391,8 +392,6 @@ err_torch:
>   led_classdev_unregister(&pchip->cdev_flash);
>  err_flash:
>   device_remove_file(&(pchip->bled->dev), &dev_attr_bled_mode);
> -err_bled_mode:
> - backlight_device_unregister(pchip->bled);
>  err_out:
>   return ret;
>  }
> @@ -407,10 +406,8 @@ static int lm3639_remove(struct i2c_client *client)
>   led_classdev_unregister(&pchip->cdev_torch);
>   if (&pchip->cdev_flash)
>   led_classdev_unregister(&pchip->cdev_flash);
> - if (pchip->bled) {
> + if (pchip->bled)
>   device_remove_file(&(pchip->bled->dev), &dev_attr_bled_mode);
> - backlight_device_unregister(pchip->bled);
> - }
>   return 0;
>  }
> 
> @@ -432,6 +429,6 @@ static struct i2c_driver lm3639_i2c_driver = {
>  module_i2c_driver(lm3639_i2c_driver);
> 
>  MODULE_DESCRIPTION("Texas Instruments Backlight+Flash LED driver for 
> LM3639");
> -MODULE_AUTHOR("Daniel Jeong ");
> -MODULE_AUTHOR("G.Shark Jeong ");
> +MODULE_AUTHOR("Daniel Jeong ");
> +MODULE_AUTHOR("Ldd Mlp ");
>  MODULE_LICENSE("GPL v2");
> --
> 1.7.9.5

Re: [RFC 4/6] sched: powerpc: create a dedicated topology table

2014-03-13 Thread Preeti U Murthy

On 03/12/2014 01:14 PM, Vincent Guittot wrote:
> On 12 March 2014 05:42, Preeti U Murthy  wrote:
>> On 03/11/2014 06:48 PM, Vincent Guittot wrote:
>>> On 11 March 2014 11:08, Preeti U Murthy  wrote:
>>>> Hi Vincent,
>>>>
>>>> On 03/05/2014 12:48 PM, Vincent Guittot wrote:
>>>>> Create a dedicated topology table for handling asymmetric feature.
>>>>> The current proposal creates a new level which describes which groups of CPUs
>>>>> take advantage of SD_ASYM_PACKING. The useless level will be removed during the
>>>>> build of the sched_domain topology.
>>>>>
>>>>> Another solution would be to set SD_ASYM_PACKING in the sd_flags of SMT level
>>>>> during the boot sequence and before the build of the sched_domain topology.
>>>>
>>>> Is the below what you mean as the other solution? If it is so, I would
>>>> strongly recommend this approach rather than adding another level to the
>>>> topology level to represent the asymmetric behaviour.
>>>>
>>>> +static struct sched_domain_topology_level powerpc_topology[] = {
>>>> +#ifdef CONFIG_SCHED_SMT
>>>> +   { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>>>> SD_INIT_NAME(SMT) | arch_sd_sibling_asym_packing() },
>>>> +#endif
>>>> +   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>>>> +   { NULL, },
>>>> +};
>>>
>>> Not exactly like that but something like below
>>>
>>> +static struct sched_domain_topology_level powerpc_topology[] = {
>>> +#ifdef CONFIG_SCHED_SMT
>>> + { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>>> SD_INIT_NAME(SMT) },
>>> +#endif
>>> + { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>>> + { NULL, },
>>> +};
>>> +
>>> +static void __init set_sched_topology(void)
>>> +{
>>> +#ifdef CONFIG_SCHED_SMT
>>> + powerpc_topology[0].sd_flags |= arch_sd_sibling_asym_packing();
>>> +#endif
>>> + sched_domain_topology = powerpc_topology;
>>> +}
>>
>> I think we can set it in powerpc_topology[] and not bother about setting
>> additional flags outside of this array. It is clearer this way.
> 
> IIRC, the arch_sd_sibling_asym_packing of powerpc can be set at
> runtime which prevents it from being put directly in the table. Or it
> means that we should use a function pointer in the table for setting
> flags instead of a static value like the current proposal.

Oh yes you are right. Then the above looks good to me. So we define the
table as usual and set the asymmetric flag in set_sched_topology().
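
To make the agreed approach concrete, here is a userspace sketch of "static table plus one runtime fixup". It is not the kernel code: the struct is reduced to a name and flags, the flag values are made up, and asym_packing_flag() and set_topology() are stand-ins for arch_sd_sibling_asym_packing() and set_sched_topology().

```c
#include <stddef.h>

/* Illustrative flag bits only; the kernel's SD_* values differ. */
#define SD_SHARE_CPUPOWER      0x01
#define SD_SHARE_PKG_RESOURCES 0x02
#define SD_ASYM_PACKING        0x04

struct topology_level {
	const char *name;
	int sd_flags;
};

/* Static part of the table: only flags that are known at compile time. */
static struct topology_level powerpc_topology[] = {
	{ "SMT", SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES },
	{ "DIE", 0 },
	{ NULL, 0 },
};

/* Stand-in for arch_sd_sibling_asym_packing(): the answer depends on the CPU. */
static int asym_packing_flag(int cpu_has_asym_smt)
{
	return cpu_has_asym_smt ? SD_ASYM_PACKING : 0;
}

/* Stand-in for set_sched_topology(): patch the SMT entry once, at boot. */
static struct topology_level *set_topology(int cpu_has_asym_smt)
{
	powerpc_topology[0].sd_flags |= asym_packing_flag(cpu_has_asym_smt);
	return powerpc_topology;
}
```

The point of the split is that the initializer of a static array cannot call a function, so anything decided at runtime has to be OR-ed in afterwards.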
> 
>>
>> On an additional note, on powerpc we would want to clear the
>> SD_SHARE_POWERDOMAIN flag at the CPU domain. On Power8, considering we
>> have 8 threads per core, we would want to consolidate tasks at least up to
>> 4 threads without significant performance impact before spilling over to
>> the other cores. By doing so, besides making use of the higher power of
>> the core we could do cpuidle management at the core level for the
>> remaining idle cores as a result of this consolidation.
> 
> OK. i will add the SD_SHARE_POWERDOMAIN like below

Thanks!

Regards
Preeti U Murthy
> 
> Thanks
> Vincent
> 
>>
>> So the powerpc_topology[] would be something like the below:
>>
>> +static struct sched_domain_topology_level powerpc_topology[] = {
>> +#ifdef CONFIG_SCHED_SMT
>> +   { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES,
>> SD_INIT_NAME(SMT) | arch_sd_sibling_asym_packing() | SD_SHARE_POWERDOMAIN },
>> +#endif
>> +   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
>> +   { NULL, },
>> +};
>>
>> The amount of consolidation to the threads in a core, we will probably
>> take care of it in cpu capacity or a similar parameter, but the above
>> topology level would be required to request the scheduler to try
>> consolidating tasks to cores till the cpu capacity(3/4/5 threads) has
>> exceeded.
>>
>> Regards
>> Preeti U Murthy
>>
>>
> 

Re: [PATCH v3] power: add an API to log wakeup reasons

2014-03-13 Thread Ruchi Kandoi

True, we could create new wakeup sources specifically to
track this information, perhaps as needed once an IRQ is first
observed to trigger a wakeup.

We would want to know which wakeup sources were responsible for the
most recent wakeup, since we keep a timeline of suspend/resume events
with wakeup reasons and durations.  It may require some extra work to
keep track of this information.
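
As a rough illustration of the bookkeeping meant here, the sketch below keeps the list of IRQs seen on the most recent resume and renders it as text. Everything in it is hypothetical (struct wakeup_log, the wakeup_log_*() helpers, the one-IRQ-per-line format); it only models the idea, not the proposed patch.

```c
#include <stdio.h>

#define MAX_WAKEUP_IRQS 8

struct wakeup_log {
	int irqs[MAX_WAKEUP_IRQS];
	int count;
};

/* Called on suspend entry so that each resume starts with an empty list. */
static void wakeup_log_clear(struct wakeup_log *log)
{
	log->count = 0;
}

/* Hypothetical analogue of log_wakeup_reason(irq); extra IRQs are dropped. */
static void wakeup_log_add(struct wakeup_log *log, int irq)
{
	if (log->count < MAX_WAKEUP_IRQS)
		log->irqs[log->count++] = irq;
}

/*
 * Render one IRQ number per line, the way a sysfs show() callback might.
 * Returns the number of characters produced (excluding the final NUL).
 */
static int wakeup_log_show(const struct wakeup_log *log, char *buf, size_t size)
{
	int len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	for (int i = 0; i < log->count && (size_t)len < size; i++)
		len += snprintf(buf + len, size - len, "%d\n", log->irqs[i]);
	return len;
}
```

A userspace timeline would read this text after every resume and pair it with the suspend duration.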

Apart from Google, other vendors like Qualcomm and Nvidia have already
introduced similar kinds of logging in their respective interrupt
controller drivers. We would really like to make this a standardized
logging for debugging purposes.



On Thu, Mar 13, 2014 at 6:06 PM, Rafael J. Wysocki  wrote:
> On Thursday, March 13, 2014 05:43:20 PM Ruchi Kandoi wrote:
>> This should be true most of the times.
>>
>> But there might be cases otherwise too.
>>
>> For instance, there was a bug earlier with wi-fi which would cause the
>> system to wake up but not get hold of a wakeup source because there
>> wasn't any work for it to do. In that case, the wakeup sources would
>> not log such an event.
>>
>> Additionally, there could be a situation where an IRQ caused the
>> system to resume from suspend. And since the system was up, a driver
>> could take a wakeup source. In this case we would assume that the
>> driver would have woken the system, but in reality the driver held the
>> wakeup source only because the system was up and did not cause the
>> wake up to happen.
>
> But you can create special wakeup sources associated with interrupts (in
> addition to the existing ones) and use the statistics for those.
>
> It is possible to define wakeup sources that don't correspond to any
> devices.
>
> Rafael
>
>
>> On Thu, Mar 13, 2014 at 3:18 PM, Rafael J. Wysocki  
>> wrote:
>> > Hi,
>> >
>> > I saw the v4, but I don't have it handy, so replying here.
>> >
>> > On Wednesday, March 12, 2014 12:46:38 PM Ruchi Kandoi wrote:
>> >> For power management diagnostic purposes, it is often useful to know
>> >> what interrupts are frequently waking the system from low power
>> >> suspend mode, especially on battery-powered consumer electronics
>> >> devices that are expected to spend much of their time in low-power
>> >> suspend while not in active use.  For example, reduced battery life on
>> >> a mobile phone may be caused in part by frequent wakeups by broadcast
>> >> traffic on a busy wireless LAN even while the screen is off and the
>> >> phone not in active use.
>> >>
>> >> Add the API log_wakeup_reason() and expose its output to userspace via the
>> >> sysfs path /sys/kernel/wakeup_reasons/last_resume_reason. This API would be
>> >> called from the platform-specific code, or from the driver for the interrupt
>> >> controller, when the system resumes because of an IRQ. It logs the reasons
>> >> which caused the system to wake up from the low-power mode.
>> >
>> > So what exactly is wrong with using wakeup sources for this purpose?
>> >
>> > --
>> > I speak only for myself.
>> > Rafael J. Wysocki, Intel Open Source Technology Center.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

[PATCH] backlight: lm3639: use devm_backlight_device_register()

2014-03-13 Thread Daniel Jeong

Change to use devm_backlight_device_register() to simplify cleanup.
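
For readers unfamiliar with device-managed ("devm") resources, the following toy userspace model shows why the explicit unregister calls can go away. It is a sketch only: struct device, dev_add_action() and dev_release_all() are simplified stand-ins written for this example, not the kernel's devres implementation.

```c
#include <stdlib.h>

typedef void (*release_fn)(void *data);

/* One registered cleanup action. */
struct devres {
	release_fn release;
	void *data;
	struct devres *next;
};

/* Toy device: just a list of cleanup actions, newest first. */
struct device {
	struct devres *resources;
};

/* Attach a cleanup action to the device. Returns 0 on success, -1 on OOM. */
static int dev_add_action(struct device *dev, release_fn fn, void *data)
{
	struct devres *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->release = fn;
	r->data = data;
	r->next = dev->resources;
	dev->resources = r;
	return 0;
}

/* Run on probe failure or driver removal: newest resource is released first. */
static void dev_release_all(struct device *dev)
{
	while (dev->resources) {
		struct devres *r = dev->resources;

		dev->resources = r->next;
		r->release(r->data);
		free(r);
	}
}

/* Example release callback: counts how many times it ran. */
static void count_release(void *data)
{
	++*(int *)data;
}
```

Because the core walks the list on every exit path, the driver no longer needs an error label or a remove() step for each managed object.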

Signed-off-by: Daniel Jeong 
---
 drivers/video/backlight/lm3639_bl.c |   17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/video/backlight/lm3639_bl.c 
b/drivers/video/backlight/lm3639_bl.c
index 6fd60ad..5f36808 100644
--- a/drivers/video/backlight/lm3639_bl.c
+++ b/drivers/video/backlight/lm3639_bl.c
@@ -349,8 +349,9 @@ static int lm3639_probe(struct i2c_client *client,
props.brightness = pdata->init_brt_led;
props.max_brightness = pdata->max_brt_led;
pchip->bled =
-   backlight_device_register("lm3639_bled", pchip->dev, pchip,
- &lm3639_bled_ops, &props);
+   devm_backlight_device_register(pchip->dev, "lm3639_bled",
+  pchip->dev, pchip, &lm3639_bled_ops,
+  &props);
if (IS_ERR(pchip->bled)) {
dev_err(&client->dev, "fail : backlight register\n");
ret = PTR_ERR(pchip->bled);
@@ -360,7 +361,7 @@ static int lm3639_probe(struct i2c_client *client,
ret = device_create_file(&(pchip->bled->dev), &dev_attr_bled_mode);
if (ret < 0) {
dev_err(&client->dev, "failed : add sysfs entries\n");
-   goto err_bled_mode;
+   goto err_out;
}
 
/* flash */
@@ -391,8 +392,6 @@ err_torch:
led_classdev_unregister(&pchip->cdev_flash);
 err_flash:
device_remove_file(&(pchip->bled->dev), &dev_attr_bled_mode);
-err_bled_mode:
-   backlight_device_unregister(pchip->bled);
 err_out:
return ret;
 }
@@ -407,10 +406,8 @@ static int lm3639_remove(struct i2c_client *client)
led_classdev_unregister(&pchip->cdev_torch);
if (&pchip->cdev_flash)
led_classdev_unregister(&pchip->cdev_flash);
-   if (pchip->bled) {
+   if (pchip->bled)
device_remove_file(&(pchip->bled->dev), &dev_attr_bled_mode);
-   backlight_device_unregister(pchip->bled);
-   }
return 0;
 }
 
@@ -432,6 +429,6 @@ static struct i2c_driver lm3639_i2c_driver = {
 module_i2c_driver(lm3639_i2c_driver);
 
 MODULE_DESCRIPTION("Texas Instruments Backlight+Flash LED driver for LM3639");
-MODULE_AUTHOR("Daniel Jeong ");
-MODULE_AUTHOR("G.Shark Jeong ");
+MODULE_AUTHOR("Daniel Jeong ");
+MODULE_AUTHOR("Ldd Mlp ");
 MODULE_LICENSE("GPL v2");
-- 
1.7.9.5

[PATCH v5 0/2] Qualcomm Universal Peripheral (QUP) I2C controller

2014-03-13 Thread Bjorn Andersson

This fifth revision of the QUP I2C driver comes with minor fixes, as per review
comments on the second third revision.

Regards,
Bjorn

Changes from second v3:
 - Reformat device tree binding description related to clocks
 - Minor cleanup related to dt parsing of clock frequency
 - Properly return EINVAL on dt parse error
 - Use i2c_add_adapter instead of numbered version
 - Call pm_runtime_set_active() before we leave probe with clocks enabled
 - Remove debug prints from suspend and resume

Changes from v3:
 - Simplified interrupt handler
 - Corrected the state transition poll timeout
 - Refactored state transition code
 - Refactored the polling functions waiting for transfers to finish
 - Made the write fifo fill function care if there's space
 - Corrected programmed length on writes
 - Made block read and block write work
 - Removed data duplicates from qup_i2c_dev
 - Changed timeout to HZ, to give room for clock stretching
 - Properly reject reads over 256 bytes, as limited by HW
 - Dropped reinitialization of completions
 - Made sure to not re-initiate reads for every block read
 - Added QUP version number to compatible

Changes from v2:
 - Removed unused variables and includes
 - Corrected read logic in irq handler
 - Made the polling loop in qup_i2c_poll_state() less arbitrary
 - Only building suspend/resume if CONFIG_PM_SLEEP

Changes from v1:
 - Cleaned up device tree binding example.
 - Rephrased device tree bindings.
 - Following changes in the i2c framework.
 - Use the core clock to calculate divider for the bus clock, instead of
   explicitly setting it.
 - Remove explicit pinctrl setting.
 - Split/renamed qup_i2c_enable(bool) into enable/disable functions.
 - Return value was overwritten on error in write_one/read_one.
 - Initialize the i2c core every time, so that we actually can execute
   more than 1 transmission per xfer.

Bjorn Andersson (1):
  i2c: New bus driver for the Qualcomm QUP I2C controller

Ivan T. Ivanov (1):
  i2c: qup: Add device tree bindings information

 .../devicetree/bindings/i2c/qcom,i2c-qup.txt   |   46 ++
 drivers/i2c/busses/Kconfig |   10 +
 drivers/i2c/busses/Makefile|1 +
 drivers/i2c/busses/i2c-qup.c   |  768 
 4 files changed, 825 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/qcom,i2c-qup.txt
 create mode 100644 drivers/i2c/busses/i2c-qup.c

-- 
1.7.9.5

[PATCH v5 2/2] i2c: New bus driver for the Qualcomm QUP I2C controller

2014-03-13 Thread Bjorn Andersson

This bus driver supports the QUP i2c hardware controller in the Qualcomm SOCs.
The Qualcomm Universal Peripheral Engine (QUP) is a general purpose data path
engine with input/output FIFOs and an embedded i2c mini-core. The driver
supports FIFO mode (for low bandwidth applications) and block mode (interrupt
generated for each block-size data transfer).

Cc: Andy Gross 
Cc: Stephen Boyd 
Signed-off-by: Ivan T. Ivanov 
Signed-off-by: Bjorn Andersson 
---
 drivers/i2c/busses/Kconfig   |   10 +
 drivers/i2c/busses/Makefile  |1 +
 drivers/i2c/busses/i2c-qup.c |  768 ++
 3 files changed, 779 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-qup.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index f5ed031..f994461 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -648,6 +648,16 @@ config I2C_PXA_SLAVE
  is necessary for systems where the PXA may be a target on the
  I2C bus.
 
+config I2C_QUP
+   tristate "Qualcomm QUP based I2C controller"
+   depends on ARCH_QCOM
+   help
+ If you say yes to this option, support will be included for the
+ built-in I2C interface on the Qualcomm SoCs.
+
+ This driver can also be built as a module.  If so, the module
+ will be called i2c-qup.
+
 config I2C_RIIC
tristate "Renesas RIIC adapter"
depends on ARCH_SHMOBILE || COMPILE_TEST
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index a08931f..bf2257b 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_I2C_PNX) += i2c-pnx.o
 obj-$(CONFIG_I2C_PUV3) += i2c-puv3.o
 obj-$(CONFIG_I2C_PXA)  += i2c-pxa.o
 obj-$(CONFIG_I2C_PXA_PCI)  += i2c-pxa-pci.o
+obj-$(CONFIG_I2C_QUP)  += i2c-qup.o
 obj-$(CONFIG_I2C_RIIC) += i2c-riic.o
 obj-$(CONFIG_I2C_S3C2410)  += i2c-s3c2410.o
 obj-$(CONFIG_I2C_S6000)+= i2c-s6000.o
diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
new file mode 100644
index 000..0c844bf
--- /dev/null
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -0,0 +1,768 @@
+/*
+ * Copyright (c) 2009-2013, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2014, Sony Mobile Communications AB.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* QUP Registers */
+#define QUP_CONFIG 0x000
+#define QUP_STATE  0x004
+#define QUP_IO_MODE0x008
+#define QUP_SW_RESET   0x00c
+#define QUP_OPERATIONAL0x018
+#define QUP_ERROR_FLAGS0x01c
+#define QUP_ERROR_FLAGS_EN 0x020
+#define QUP_HW_VERSION 0x030
+#define QUP_MX_OUTPUT_CNT  0x100
+#define QUP_OUT_FIFO_BASE  0x110
+#define QUP_MX_WRITE_CNT   0x150
+#define QUP_MX_INPUT_CNT   0x200
+#define QUP_MX_READ_CNT0x208
+#define QUP_IN_FIFO_BASE   0x218
+#define QUP_I2C_CLK_CTL0x400
+#define QUP_I2C_STATUS 0x404
+
+/* QUP States and reset values */
+#define QUP_RESET_STATE0
+#define QUP_RUN_STATE  1
+#define QUP_PAUSE_STATE3
+#define QUP_STATE_MASK 3
+
+#define QUP_STATE_VALIDBIT(2)
+#define QUP_I2C_MAST_GEN   BIT(4)
+
+#define QUP_OPERATIONAL_RESET  0x000ff0
+#define QUP_I2C_STATUS_RESET   0xfc
+
+/* QUP OPERATIONAL FLAGS */
+#define QUP_I2C_NACK_FLAG  BIT(3)
+#define QUP_OUT_NOT_EMPTY  BIT(4)
+#define QUP_IN_NOT_EMPTY   BIT(5)
+#define QUP_OUT_FULL   BIT(6)
+#define QUP_OUT_SVC_FLAG   BIT(8)
+#define QUP_IN_SVC_FLAGBIT(9)
+#define QUP_MX_OUTPUT_DONE BIT(10)
+#define QUP_MX_INPUT_DONE  BIT(11)
+
+/* I2C mini core related values */
+#define QUP_CLOCK_AUTO_GATEBIT(13)
+#define I2C_MINI_CORE  (2 << 8)
+#define I2C_N_VAL  15
+/* Most significant word offset in FIFO port */
+#define QUP_MSW_SHIFT  (I2C_N_VAL + 1)
+
+/* Packing/Unpacking words in FIFOs, and IO modes */
+#define QUP_OUTPUT_BLK_MODE(1 << 10)
+#define QUP_INPUT_BLK_MODE (1 << 12)
+#define QUP_UNPACK_EN  BIT(14)
+#define QUP_PACK_ENBIT(15)
+
+#define QUP_REPACK_EN  (QUP_UNPACK_EN | QUP_PACK_EN)
+
+#define QUP_OUTPUT_BLOCK_SIZE(x)(((x) >> 0) & 0x03)
+#define QUP_OUTPUT_FIFO_SIZE(x)(((x) >> 2) & 0x07)
+#define QUP_INPUT_BLOCK_SIZE(x)(((x

[PATCH v5 1/2] i2c: qup: Add device tree bindings information

2014-03-13 Thread Bjorn Andersson

From: "Ivan T. Ivanov" 

The Qualcomm Universal Peripheral (QUP) wraps an I2C mini-core and
provides input and output FIFOs for it. The I2C controller can operate
as master with supported bus speeds of 100Kbps and 400Kbps.
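
A sketch of how a driver might turn the optional clock-frequency property into a bus speed, assuming the 100kHz default from the binding and treating anything above 400kHz as invalid. qup_pick_bus_freq() and both constants are hypothetical names for this illustration; the real parsing lives in the driver patch.

```c
#include <errno.h>

#define I2C_DEFAULT_FREQ_HZ 100000U /* binding default when the property is omitted */
#define I2C_MAX_FREQ_HZ     400000U /* fastest speed named in the description */

/*
 * Hypothetical helper. 'present' says whether clock-frequency was found in
 * the device tree node and 'value' is its content. On success returns 0 and
 * stores the bus frequency in *freq; returns -EINVAL for an unusable value.
 */
static int qup_pick_bus_freq(int present, unsigned int value, unsigned int *freq)
{
	if (!present) {
		*freq = I2C_DEFAULT_FREQ_HZ;
		return 0;
	}
	if (value == 0 || value > I2C_MAX_FREQ_HZ)
		return -EINVAL;

	*freq = value;
	return 0;
}
```

With these assumptions, the 355000 used in the binding example is accepted unchanged, while a 1MHz request is rejected.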

Signed-off-by: Ivan T. Ivanov 
[bjorn: reformulated part of binding description
added version to compatible
cleaned up example]
Signed-off-by: Bjorn Andersson 
---
 .../devicetree/bindings/i2c/qcom,i2c-qup.txt   |   46 
 1 file changed, 46 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/qcom,i2c-qup.txt

diff --git a/Documentation/devicetree/bindings/i2c/qcom,i2c-qup.txt 
b/Documentation/devicetree/bindings/i2c/qcom,i2c-qup.txt
new file mode 100644
index 000..32ef64e
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/qcom,i2c-qup.txt
@@ -0,0 +1,46 @@
+Qualcomm Universal Peripheral (QUP) I2C controller
+
+Required properties:
+ - compatible: Should be:
+   * "qcom,i2c-qup-v1.1.1" for 8660, 8960 and 8064.
+   * "qcom,i2c-qup-v2.1.1" for 8974 v1.
+   * "qcom,i2c-qup-v2.2.1" for 8974 v2 and later.
+ - reg: Should contain QUP register address and length.
+ - interrupts: Should contain I2C interrupt.
+
+ - clocks: A list of phandles + clock-specifiers, one for each entry in
+   clock-names.
+ - clock-names: Should contain:
+   * "core" for the core clock
+   * "iface" for the AHB clock
+
+ - #address-cells: Should be <1> Address cells for i2c device address
+ - #size-cells: Should be <0> as i2c addresses have no size component
+
+Optional properties:
+ - clock-frequency: Should specify the desired i2c bus clock frequency in Hz,
+defaults to 100kHz if omitted.
+
+Child nodes should conform to i2c bus binding.
+
+Example:
+
+ i2c@f9924000 {
+   compatible = "qcom,i2c-qup-v2.2.1";
+   reg = <0xf9924000 0x1000>;
+   interrupts = <0 96 0>;
+
+   clocks = <&gcc GCC_BLSP1_QUP2_I2C_APPS_CLK>, <&gcc GCC_BLSP1_AHB_CLK>;
+   clock-names = "core", "iface";
+
+   clock-frequency = <355000>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   dummy@60 {
+   compatible = "dummy";
+   reg = <0x60>;
+   };
+ };
+
-- 
1.7.9.5

Re: [PATCH 0/3] amd/pci: Add AMD hostbridge supports for newer AMD systems

2014-03-13 Thread Suravee Suthikulpanit

On 3/12/2014 4:13 PM, Bjorn Helgaas wrote:

>I assume the system is fully functional even without these patches,
>right?  The only effect of these changes should be a performance
>improvement.

[Suravee] Yes, the system is fully functional except the numa 
information for PCI ethernet adapters is returning -1.

>
>So the choices are:
>
>   1) Change the BIOS to provide _PXM
>   2) Change Linux with your patches
>
>Either way, the customer has to upgrade something.  Choice 1) gets you
>the performance improvement on all Linux and Windows releases, even
>the ones that are already in the field.

[Suravee] I agree with you on both cases and I am working with the BIOS 
team to see if they can look into this and provide fixes for the future. 
My only concern is that the device _PXM information might be specific to 
each platform vendor, since they are the ones deciding the number of 
northbridges to put on the systems. Also, it's quite difficult to 
influence HW/BIOS vendors to provide fixes for older platforms unless 
some important customers are complaining to them.

>Choice 2) fixes future Linux
>distros and whatever old releases you can convince the distro to
>backport to, and doesn't help Windows at all.  It makes work for all
>the distros and we (i.e., I) have to worry about maintaining this in
>the future.

[Suravee] Actually, this was originally reported by Redhat (please see 
https://bugzilla.redhat.com/show_bug.cgi?id=1040440), and they are 
waiting for the fixes if we decide to fix it in the upstream Linux code 
base.

>I'm curious to see what your BIOS folks say.  It seems strange that
>after all these years, they wouldn't provide something relatively
>simple like _PXM, so I wonder if there's more to the story.

>I looked at this a bit more.  Your patches fill in the
>mp_bus_to_node[] table, and pci_acpi_scan_root() uses
>get_mp_bus_to_node() to get that information back out.  But
>pci_acpi_scan_root() only uses get_mp_bus_to_node() if acpi_get_pxm()
>fails, or if pxm_to_node() returns -1.

[Suravee] Correct
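
The lookup order described above can be written down as a tiny model. This is a sketch under assumptions: root_bus_node() is a made-up function, and its three arguments stand in for the results of acpi_get_pxm(), pxm_to_node() and get_mp_bus_to_node() respectively.

```c
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical model of the node lookup for a PCI root bus.
 *   pxm        - value of the firmware _PXM method, negative if missing
 *   srat_node  - node the SRAT-derived map gives for that proximity domain
 *   table_node - entry stored for the bus by the host bridge quirk
 */
static int root_bus_node(int pxm, int srat_node, int table_node)
{
	int node = NUMA_NO_NODE;

	if (pxm >= 0)
		node = srat_node;	/* pxm_to_node() step */
	if (node == NUMA_NO_NODE)
		node = table_node;	/* get_mp_bus_to_node() fallback */
	return node;
}
```

In this model the quirk table only matters in the two failure cases: no _PXM at all, or a _PXM whose proximity domain has no SRAT mapping.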

>If acpi_get_pxm() failed, that's a pretty good indication that the _PXM
>method is missing or broken.  If pxm_to_node() failed, it looks like
>that could mean the SRAT table is missing or broken, and we didn't
>fill in the relevant pxm_to_node_map[] entry.  So it's possible that
>we do have a _PXM method, but there's something wrong with the SRAT.

[Suravee] I am not sure which table the _PXM method is supposed to be 
a part of.

>Can you collect an acpidump and complete dmesg log from a system with
>the problem?  They might be too big for the mailing list; if so, you
>can attach them to a kernel.org bugzilla and just include the URL
>here.

[Suravee] Please see https://bugzilla.kernel.org/show_bug.cgi?id=72051
I have included SRAT, SLIT and DSDT table as part of the bug. Please let 
me know if you are looking for more information.

Thank you,

Suravee

[PATCH] procfs: make smp_affinity values 0644

2014-03-13 Thread Chema Gonzalez

Includes:
- /proc/irq/default_smp_affinity
- /proc/irq/*/smp_affinity
- /proc/irq/*/smp_affinity_list

Users can distill the same information by reading /proc/interrupts.

Signed-off-by: Chema Gonzalez 
---
 kernel/irq/proc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 36f6ee1..3f80a9a 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -324,7 +324,7 @@ void register_irq_proc(unsigned int irq, struct irq_desc 
*desc)
 
 #ifdef CONFIG_SMP
/* create /proc/irq//smp_affinity */
-   proc_create_data("smp_affinity", 0600, desc->dir,
+   proc_create_data("smp_affinity", 0644, desc->dir,
 &irq_affinity_proc_fops, (void *)(long)irq);
 
/* create /proc/irq//affinity_hint */
@@ -332,7 +332,7 @@ void register_irq_proc(unsigned int irq, struct irq_desc 
*desc)
 &irq_affinity_hint_proc_fops, (void *)(long)irq);
 
/* create /proc/irq//smp_affinity_list */
-   proc_create_data("smp_affinity_list", 0600, desc->dir,
+   proc_create_data("smp_affinity_list", 0644, desc->dir,
 &irq_affinity_list_proc_fops, (void *)(long)irq);
 
proc_create_data("node", 0444, desc->dir,
@@ -372,7 +372,7 @@ void unregister_handler_proc(unsigned int irq, struct 
irqaction *action)
 static void register_default_affinity_proc(void)
 {
 #ifdef CONFIG_SMP
-   proc_create("irq/default_smp_affinity", 0600, NULL,
+   proc_create("irq/default_smp_affinity", 0644, NULL,
&default_affinity_proc_fops);
 #endif
 }
-- 
1.9.0.279.gdc9e3eb

[PATCH 3/3] regulator: bcm590xx: Use array to save desc and *info

2014-03-13 Thread Axel Lin

BCM590XX_NUM_REGS is known at compile time.
Using arrays to save desc and *info makes the code simpler.
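
The idea in isolation, as a userspace sketch: NUM_REGS is a placeholder for BCM590XX_NUM_REGS, the desc/info structs are cut down to stubs, and calloc() stands in for devm_kzalloc().

```c
#include <stdlib.h>

#define NUM_REGS 4 /* placeholder; stands in for BCM590XX_NUM_REGS */

struct desc { const char *name; int id; };
struct info { const char *name; };

/* Before: two extra allocations, each needing its own failure path. */
struct pmu_ptrs {
	struct desc *desc;
	struct info **info;
};

/* After: the storage is part of the structure itself. */
struct pmu_arrays {
	struct desc desc[NUM_REGS];
	struct info *info[NUM_REGS];
};

/* One zero-filled allocation now covers the struct and both arrays. */
static struct pmu_arrays *pmu_alloc(void)
{
	return calloc(1, sizeof(struct pmu_arrays));
}
```

The trade-off is that the struct grows by the size of both arrays, which is fine when the element count is a small compile-time constant.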

Signed-off-by: Axel Lin 
---
 drivers/regulator/bcm590xx-regulator.c | 18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/regulator/bcm590xx-regulator.c 
b/drivers/regulator/bcm590xx-regulator.c
index ab08ca7..fe21855 100644
--- a/drivers/regulator/bcm590xx-regulator.c
+++ b/drivers/regulator/bcm590xx-regulator.c
@@ -151,9 +151,9 @@ static struct bcm590xx_info bcm590xx_regs[] = {
 };
 
 struct bcm590xx_reg {
-   struct regulator_desc *desc;
+   struct regulator_desc desc[BCM590XX_NUM_REGS];
struct bcm590xx *mfd;
-   struct bcm590xx_info **info;
+   struct bcm590xx_info *info[BCM590XX_NUM_REGS];
 };
 
 static int bcm590xx_get_vsel_register(int id)
@@ -319,20 +319,6 @@ static int bcm590xx_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, pmu);
 
-   pmu->desc = devm_kzalloc(&pdev->dev, BCM590XX_NUM_REGS *
-   sizeof(struct regulator_desc), GFP_KERNEL);
-   if (!pmu->desc) {
-   dev_err(&pdev->dev, "Memory alloc fails for desc\n");
-   return -ENOMEM;
-   }
-
-   pmu->info = devm_kzalloc(&pdev->dev, BCM590XX_NUM_REGS *
-   sizeof(struct bcm590xx_info *), GFP_KERNEL);
-   if (!pmu->info) {
-   dev_err(&pdev->dev, "Memory alloc fails for info\n");
-   return -ENOMEM;
-   }
-
info = bcm590xx_regs;
 
for (i = 0; i < BCM590XX_NUM_REGS; i++, info++) {
-- 
1.8.1.2



[PATCH 2/3] regulator: bcm590xx: Remove **rdev from struct bcm590xx_reg

2014-03-13 Thread Axel Lin

The **rdev of 'struct bcm590xx_reg' isn't used anywhere in the driver so
remove it.

Signed-off-by: Axel Lin 
---
 drivers/regulator/bcm590xx-regulator.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/regulator/bcm590xx-regulator.c 
b/drivers/regulator/bcm590xx-regulator.c
index d12d6d6..ab08ca7 100644
--- a/drivers/regulator/bcm590xx-regulator.c
+++ b/drivers/regulator/bcm590xx-regulator.c
@@ -153,7 +153,6 @@ static struct bcm590xx_info bcm590xx_regs[] = {
 struct bcm590xx_reg {
struct regulator_desc *desc;
struct bcm590xx *mfd;
-   struct regulator_dev **rdev;
struct bcm590xx_info **info;
 };
 
@@ -334,13 +333,6 @@ static int bcm590xx_probe(struct platform_device *pdev)
return -ENOMEM;
}
 
-   pmu->rdev = devm_kzalloc(&pdev->dev, BCM590XX_NUM_REGS *
-   sizeof(struct regulator_dev *), GFP_KERNEL);
-   if (!pmu->rdev) {
-   dev_err(&pdev->dev, "Memory alloc fails for rdev\n");
-   return -ENOMEM;
-   }
-
info = bcm590xx_regs;
 
for (i = 0; i < BCM590XX_NUM_REGS; i++, info++) {
@@ -391,8 +383,6 @@ static int bcm590xx_probe(struct platform_device *pdev)
pdev->name);
return PTR_ERR(rdev);
}
-
-   pmu->rdev[i] = rdev;
}
 
return 0;
-- 
1.8.1.2



[PATCH 1/3] regulator: bcm590xx: Make the modalias matches the driver name

2014-03-13 Thread Axel Lin

Signed-off-by: Axel Lin 
---
 drivers/regulator/bcm590xx-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/bcm590xx-regulator.c b/drivers/regulator/bcm590xx-regulator.c
index e6b2e8e..d12d6d6 100644
--- a/drivers/regulator/bcm590xx-regulator.c
+++ b/drivers/regulator/bcm590xx-regulator.c
@@ -410,4 +410,4 @@ module_platform_driver(bcm590xx_regulator_driver);
 MODULE_AUTHOR("Matt Porter ");
 MODULE_DESCRIPTION("BCM590xx voltage regulator driver");
 MODULE_LICENSE("GPL v2");
-MODULE_ALIAS("platform:bcm590xx-regulator");
+MODULE_ALIAS("platform:bcm590xx-vregs");
-- 
1.8.1.2




Re: Trusted kernel patchset for Secure Boot lockdown

2014-03-13 Thread Matthew Garrett

On Thu, 2014-03-13 at 23:21 +, One Thousand Gnomes wrote:
> On Thu, 13 Mar 2014 21:30:48 +
> Matthew Garrett  wrote:
> 
> > On Thu, 2014-03-13 at 21:24 +, One Thousand Gnomes wrote:
> > 
> > > If I have CAP_SYS_RAWIO I can make arbitrary ring 0 calls from userspace,
> > > trivially and in a fashion well known and documented.
> > 
> > How?
> 
> You want a list... there are loads of places all over the kernel that have
> assumptions that RAWIO = safe from the boringly mundane like MSR access
> to the more obscure driver "this is RAWIO trust the user" cases of which
> there are plenty.

Have you actually looked at these patches? I've looked at every case of
RAWIO in the kernel. For cases that are hardware specific and tied to
fairly old hardware, I've ignored them. For cases which provide an
obvious mechanism for exploitation, I've added an additional check. For
cases where I can't see a reasonable mechanism for executing arbitrary
code in the kernel, I've done nothing.

If you have specific examples of processes with CAP_SYS_RAWIO being able
to execute arbitrary code in the kernel even with this patchset applied,
please, give them.

>You can even avoid the userspace issues with a small amount of
>checking. If you don't want to touch capability sets then make the
>default behaviour for capable(x) in fact be
>
>capable(x & ~secure_forbidden)
>
>for a measured kernel and add a 
>
>capable_always()
>
>for the cases you want to not break.

We could do that, but now the behaviour of the patchset is far less
obvious. capable(CAP_SYS_RAWIO) now means something different to every
other use of capable(), and we still need get_trusted_kernel() calls for
cases where the checks have nothing to do with processes and so
capabilities can't be used. It still involves auditing every use of
CAP_SYS_RAWIO. In fact, in some cases we need to *add* CAP_SYS_RAWIO
checks - which, again, breaks userspace.
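
To make the trade-off concrete, here is a minimal user-space sketch in C of the masking idea being discussed. It is not kernel code: the real capable() takes a single capability number rather than a bitmask, and the bit definitions, `struct task_caps` and `capable_masked()` below are hypothetical names used only for illustration (`secure_forbidden` and `capable_always()` are the names from the quoted proposal, modelled here in the simplest way that seemed plausible).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; the kernel uses capability numbers instead. */
#define CAP_BIT_SYS_RAWIO  (1u << 0)
#define CAP_BIT_SYS_ADMIN  (1u << 1)

/* Hypothetical stand-in for a task's credential set. */
struct task_caps {
	uint32_t effective;	/* capabilities the task holds */
};

/* Capabilities that a measured ("trusted") kernel refuses to honour. */
static const uint32_t secure_forbidden = CAP_BIT_SYS_RAWIO;

/* Unmasked check: the task simply has to hold every requested bit. */
static bool capable_always(const struct task_caps *t, uint32_t want)
{
	return (t->effective & want) == want;
}

/* Masked check: forbidden bits count as "not held" when the kernel is trusted. */
static bool capable_masked(const struct task_caps *t, uint32_t want, bool trusted)
{
	uint32_t held = t->effective;

	if (trusted)
		held &= ~secure_forbidden;
	return (held & want) == want;
}
```

In this model a task holding both bits passes capable_always() for CAP_BIT_SYS_RAWIO but fails capable_masked() for the same bit once `trusted` is set. That is the split the reply objects to: the same capability name means something different depending on which helper a call site happens to use.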

> As for mem= and exactmap, it has nothing to do with /dev/mem and
> everything to do with giving the kernel a memory map where some of the
> space it thinks is RAM is in fact devices, rom, space etc. If the kernel
> is given a false memory map it will misbehave. Exploitably - well given
> the kind of things people have achieved in the past - quite possibly.

Sure. That's a worthwhile thing to fix, and it's something that dropping
CAP_SYS_RAWIO would do nothing to help you with.

> If you are not prepared to do the job right, then I don't think it
> belongs upstream. Let's do it right, and if we have to tweak a few bits
> of userspace to make them work in measured mode (but without breaking
> anything in normal modes) then it's worth doing the job properly.

We can do this without unnecessarily breaking any userspace. We just
can't do it by fiddling with capabilities.

> I don't think we need to break any userspace for "normal" mode to do
> this. Userspace in measured mode is going to change anyway. It already
> has just for things like module signing.

This has been discussed at length. Nobody who's actually spent time
working on the problem wants to use capabilities. CAP_SYS_RAWIO is not
semantically identical to the trusted kernel bit. Trying to make them
semantically identical will break existing userspace.

> (As an aside you may also then want to think about whether you allow
> measured userspace elements that secure_forbidden is considered to be 0
> for so you can sign userspace apps that are allowed to do RAWIO)

I'd be amazed if any of the applications that need RAWIO have had any
kind of meaningful security audit, with the possible exception of X (and
then we'd need to add support for signed X modules and sign all the
DDXes and seriously just no). I've no objection to someone doing that
work (and Vivek did a pile of it when looking at implementing kexec via
signed userspace), but I don't see any real use cases - pretty much
everyone using bits of RAWIO that are gated in the trusted kernel case
should be using a real kernel interface instead.

-- 
Matthew Garrett

Re: [PATCH 8/9] PCI: Ignore BAR contents when firmware left decoding disabled

2014-03-13 Thread Ming Lei

On Fri, Mar 14, 2014 at 12:08 AM, Bjorn Helgaas  wrote:
> On Thu, Mar 13, 2014 at 2:51 AM, Ming Lei  wrote:
>> Hi Bjorn,
>>
>> I found this patch broke virtio-pci devices.
>
> Thanks a lot for testing this.
>
>> On Thu, Feb 27, 2014 at 3:37 AM, Bjorn Helgaas  wrote:
>>> Don't rely on BAR contents when the command register says the BAR is
>>> disabled.
>>>
>>> If we receive a PCI device from firmware (or a hot-added device that was
>>> just powered up) with the MEMORY or IO enable bits in the PCI command
>>> register cleared, there's no reason to believe the BARs contain valid
>>> addresses.
>>
>> From PCI LOCAL BUS SPECIFICATION, REV. 3.0, both
>> PCI_COMMAND_IO and PCI_COMMAND_MEM should be
>> cleared after reset, so it looks like the patch sets IORESOURCE_UNSET
>> too early because PCI drivers may call pci_enable_device()
>> (->pci_enable_resources()) to enable the two bits of
>> PCI_COMMAND explicitly.
>
> The point is that it's not safe to enable those two bits unless we're
> certain that the BARs they control contain valid values that don't
> conflict with anything else in the system.
>
> Maybe we should only set IORESOURCE_UNSET when we find a conflict or a
> BAR that's not contained by an upstream bridge window, and we should
> try to reallocate then.  I'm pretty sure we do that at least in some
> cases, but it would probably simplify things if we did it more
> consistently, and maybe we shouldn't set it at all here in
> __pci_read_base().

I think so, because __pci_read_base() is called in the device emulation
path.

>
> But I'd like to understand your situation better, so can you provide
> more details, please?  Complete before/after dmesg logs would go a
> long way toward illustrating the problem you're seeing.

Please see the two attached logs. The memory allocation failure
is caused by a wrong value read from the PCI address after the device
fails to be enabled.


Thanks,
-- 
Ming Lei


dmesg_before_revert.tar.gz
Description: GNU Zip compressed data


dmesg_after_revert.tar.gz
Description: GNU Zip compressed data

[PATCH] mtip32xx: Fix ERO and NoSnoop values in PCIe upstream on AMD systems

2014-03-13 Thread Asai Thambi S P


A hardware quirk in P320h/P420m interferes with PCIe transactions on some
AMD chipsets, making P320h/P420m unusable. The workaround is to disable the
ERO and NoSnoop bits in the parent and root complex for normal functioning
of these devices.

NOTE: This workaround is specific to AMD chipset with a PCIe upstream
device with device id 0x5aXX

Signed-off-by: Asai Thambi S P 
Signed-off-by: Sam Bradshaw 
---
 drivers/block/mtip32xx/mtip32xx.c |   53 +
 1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 624e9d9..8c462d3 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4479,6 +4479,57 @@ static DEFINE_HANDLER(5);
 static DEFINE_HANDLER(6);
 static DEFINE_HANDLER(7);
 
+static void mtip_disable_link_opts(struct driver_data *dd, struct pci_dev *pdev)
+{
+   int pos;
+   unsigned short pcie_dev_ctrl;
+
+   pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+   if (pos) {
+   pci_read_config_word(pdev,
+   pos + PCI_EXP_DEVCTL,
+   &pcie_dev_ctrl);
+   if (pcie_dev_ctrl & (1 << 11) ||
+   pcie_dev_ctrl & (1 << 4)) {
+   dev_info(&dd->pdev->dev,
+   "Disabling ERO/No-Snoop on bridge device %04x:%04x\n",
+   pdev->vendor, pdev->device);
+   pcie_dev_ctrl &= ~(PCI_EXP_DEVCTL_NOSNOOP_EN |
+   PCI_EXP_DEVCTL_RELAX_EN);
+   pci_write_config_word(pdev,
+   pos + PCI_EXP_DEVCTL,
+   pcie_dev_ctrl);
+   }
+   }
+}
+
+static void mtip_fix_ero_nosnoop(struct driver_data *dd, struct pci_dev *pdev)
+{
+   /*
+* This workaround is specific to AMD/ATI chipset with a PCI upstream
+* device with device id 0x5aXX
+*/
+   if (pdev->bus && pdev->bus->self) {
+   if (pdev->bus->self->vendor == PCI_VENDOR_ID_ATI &&
+   ((pdev->bus->self->device & 0xff00) == 0x5a00)) {
+   mtip_disable_link_opts(dd, pdev->bus->self);
+   } else {
+   /* Check further up the topology */
+   struct pci_dev *parent_dev = pdev->bus->self;
+   if (parent_dev->bus &&
+   parent_dev->bus->parent &&
+   parent_dev->bus->parent->self &&
+   parent_dev->bus->parent->self->vendor ==
+PCI_VENDOR_ID_ATI &&
+   (parent_dev->bus->parent->self->device &
+   0xff00) == 0x5a00) {
+   mtip_disable_link_opts(dd,
+   parent_dev->bus->parent->self);
+   }
+   }
+   }
+}
+
 /*
  * Called for each supported PCI device detected.
  *
@@ -4630,6 +4681,8 @@ static int mtip_pci_probe(struct pci_dev *pdev,
goto msi_initialize_err;
}
 
+   mtip_fix_ero_nosnoop(dd, pdev);
+
/* Initialize the block layer. */
rv = mtip_block_initialize(dd);
if (rv < 0) {
-- 
1.7.1
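
The register manipulation in the patch above can be tried out in user space. The following is a small sketch, not driver code: the two bit values and the ATI vendor ID are the ones defined in the kernel headers, while `clear_ero_nosnoop()` and `is_affected_bridge()` are hypothetical helper names introduced only for this illustration.

```c
#include <stdint.h>

/* Bit values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCTL_RELAX_EN   0x0010  /* Enable Relaxed Ordering (bit 4) */
#define PCI_EXP_DEVCTL_NOSNOOP_EN 0x0800  /* Enable No Snoop (bit 11) */

/*
 * Hypothetical helper: given a Device Control register value, return the
 * value with ERO and No-Snoop cleared. All other bits are left untouched.
 */
static uint16_t clear_ero_nosnoop(uint16_t devctl)
{
	return devctl & (uint16_t)~(PCI_EXP_DEVCTL_RELAX_EN |
				    PCI_EXP_DEVCTL_NOSNOOP_EN);
}

/* Hypothetical helper: does this upstream device match the 0x5aXX IDs? */
static int is_affected_bridge(uint16_t vendor, uint16_t device)
{
	const uint16_t vendor_ati = 0x1002;	/* PCI_VENDOR_ID_ATI */

	return vendor == vendor_ati && (device & 0xff00) == 0x5a00;
}
```

Note that `(1 << 11)` and `(1 << 4)` in the patch are the same two bits as PCI_EXP_DEVCTL_NOSNOOP_EN and PCI_EXP_DEVCTL_RELAX_EN, so the test and the clear operate on identical masks.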


Re: [PATCH] control groups: documentation improvements

2014-03-13 Thread Li Zefan

On 2014/3/14 0:04, Glyn Normington wrote:
> Hi Tejun
> 
> Stepping back from the patch for a while, we'd like to explore the issues you 
> raise. Please bear with us as we try to capture the ideas precisely.
> 
> Continued inline...
> 
> Regards,
> Glyn (& Steve Powell, copied)
> 
> On 10/03/2014 14:20, Tejun Heo wrote:
>> Hey,
>>
>> On Mon, Mar 10, 2014 at 02:17:21PM +, Glyn Normington wrote:
>>> Then we missed how to create a hierarchy with no associated
>>> subsystems. The only way I can think of is to use mount, specify no
>>> subsystems on -o (which defaults to all the subsystems defined in
>>> the kernel), and run it in a kernel with no subsystems defined
>>> (which seems unlikely these days).
>>>
>>> Is that what you had in mind or is there some other way of creating
>>> a hierarchy with no subsystems attached?
>> Hierarchy name should be specified "-o name=" for hierarchies w/o any
>> controllers.
> 
> According to cgroups.txt:
> 
>  When passing a name= option for a new hierarchy, you need to
>  specify subsystems manually; the legacy behaviour of mounting all
>  subsystems when none are explicitly specified is not supported when
>  you give a subsystem a name.
> 
> So the documentation certainly does not make it clear that it is valid to 
> specify no subsystems.
> 
> We tried this (on a 3.11 kernel) and can't get it to work:
> 
> # mount -t cgroup -o name=th none /home/glyn/h1

# mount -t cgroup -o name=th,none none /home/glyn/h1

and then

# mount -t cgroup -o name=th none /home/glyn/h2

> 
> mount: wrong fs type, bad option, bad superblock on none,
> missing codepage or helper program, or other error
> (for several filesystems (e.g. nfs, cifs) you might
> need a /sbin/mount. helper program) In some
> cases useful info is found in syslog - try dmesg |
> tail or so
> 
> Please could you supply an example which works?
> 
>>>>> Clarify that subsystems may be attached to multiple hierarchies,
>>>>> although this isn't very useful, and explain what happens.
>>>> And a subsystem may only be attached to a single hierarchy.
>>> Perhaps that's what should happen, but the following experiment
>>> demonstrates a subsystem being attached to two hierarchies:
>>>
>>> $ pwd
>>> /home/vagrant
>>> $ mkdir mem1
>>> $ mkdir mem2
>>> $ sudo su
>>> # mount -t cgroup -o memory none /home/vagrant/mem1
>>> # mount -t cgroup -o memory none /home/vagrant/mem2
>>> # cd mem1
>>> # mkdir inst1
>>> # ls inst1
>>> cgroup.clone_children  memory.failcnt ...
>>> # ls ../mem2
>>> cgroup.clone_children  inst1 memory.limit_in_bytes ...
>>> # cd inst1
>>> # echo 100 > memory.limit_in_bytes
>>> # cat memory.limit_in_bytes
>>> 1003520
>>> # cat ../../mem2/inst1/memory.limit_in_bytes
>>> 1003520
>>> # echo $$ > tasks
>>> # cat tasks
>>> 1365
>>> 1409
>>> # cat ../../mem2/inst1/tasks
>>> 1365
>>> 1411
>> You're mounting the same hierarchy twice.  Those are two views into
>> the same hierarchy.
>>
>> Thanks.
>>
> Yes, it does appear that this is what is going on, but to explain it this way 
> turns out to be more complicated than one might expect. Here's an attempt...
> 

Yeah, this can only confuse people.

I don't think we need extra explanation, because we all know the same
filesystem can have more than one mount point, and cgroupfs is no
different.
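
For readers who prefer code to prose, the "one filesystem, several mount points" behaviour can be modelled in a few lines of C. This is a deliberately simplified sketch that follows the mount/umount description quoted in this thread, not the kernel's implementation: `struct hierarchy`, `mount_view()` and `umount_view()` are hypothetical names, and details such as child cgroups keeping a hierarchy busy are ignored.

```c
#include <stdbool.h>

/* Hypothetical model: one hierarchy, counted by how many mounts show it. */
struct hierarchy {
	unsigned int subsys_mask;	/* set of attached subsystems */
	int views;			/* number of mount points */
	bool alive;
};

/* Mounting with a matching subsystem set adds a view; it never copies. */
static bool mount_view(struct hierarchy *h, unsigned int subsys_mask)
{
	if (!h->alive) {
		h->subsys_mask = subsys_mask;
		h->views = 1;
		h->alive = true;
		return true;
	}
	if (h->subsys_mask != subsys_mask)
		return false;	/* would be a different hierarchy */
	h->views++;
	return true;
}

/* Unmounting drops a view; the hierarchy goes away with the last one. */
static void umount_view(struct hierarchy *h)
{
	if (h->alive && --h->views == 0)
		h->alive = false;
}
```

Two mounts with the same subsystem set leave `views == 2` on a single hierarchy, which matches the mem1/mem2 experiment quoted above: both directories show the same inst1 cgroup because there is only one tree behind them.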

> ---
>  A *hierarchy* is a non-empty set of cgroups arranged in a tree and a
>  non-empty set of subsystems such that the cgroups in the hierarchy
>  partition all the tasks in the system (in other words, every task in the
>  system is in exactly one of the cgroups in the hierarchy) and each
>  subsystem attaches its own state to each cgroup in the hierarchy.
> 
>  There may be zero or more active hierarchies. Each hierarchy has one
>  or more views associated with it. A *view* is an instance of the cgroup
>  virtual filesystem. The tree of cgroups of a hierarchy is represented
>  by the directory tree in the cgroup virtual filesystem of each view of
>  the hierarchy. All the directory trees in the cgroup virtual filesystem
>  of the views of a given hierarchy have identical content.
> 
>  No subsystem may participate in more than one hierarchy.
> ---
> 
> We'd also need to explain the behaviour of mount and umount with respect to 
> views. The first time a cgroup mount is performed with a given set of 
> subsystems, a hierarchy is created and a view of the hierarchy is created and 
> associated with a cgroup filesystem at the mount point. Subsequently, if 
> another cgroup mount is performed with the same set of subsystems, no new 
> hierarchy is created but a new view of the existing hierarchy is created and 
> associated with the cgroup filesystem at the new mount point.
> 
> Unmounting a cgroup mount point destroys a particular view and destroys the 
> hierarchy associated with the view if and only if the view is the only 
> (remaining) view of the hierarchy.
> 
> So does introducing the concept of a view really help? The wording in the 
> patch does with

Re: [RC6 Bell Chime] [PATCH 00/24] rfcomm fixes

2014-03-13 Thread Peter Hurley


Hi Sander,

On 03/13/2014 08:49 PM, Sander Eikelenboom wrote:


> Is it just me .. or is this going at the speed of about a bluetooth connection ..
> and probably missing the boat for 3.14 ? (for no good reason IMHO)
>
> (it was not in John's nor Dave's last pull request, although it seems to be
> reverted in the bluetooth tree now .. i didn't see any formal pull request
> from that .. to get it even *starting* to traverse all the trees up to Linus ... )


Known bugs sometimes roll out into mainline release because the
alternative can be worse.

As I explained in the follow-up to my patch series, I would
not have expected Marcel to pick up any of the fixes for 3.14.
There are a lot of moving parts in usb + bluetooth + rfcomm + tty,
and the unfortunate reality is that -next doesn't get as much
testing as it should.

The fault is mine because Gianluca let me know about the
problems with the conversion to tty_port, but the holidays really
interfered with my ability to put this work first, and I'm sorry
for that.

I know the breakage around RFCOMM is frustrating but I think the
worst is behind us. After 3.15 gets some -rc testing, I will
be happy to cherry-pick the critical fixes for -stable inclusion.

Regards,
Peter Hurley

Re: [RC6 Bell Chime] [PATCH 00/24] rfcomm fixes

2014-03-13 Thread Marcel Holtmann

Hi Sander,

>>>> Since:
>>>> - 3.14-RC6 has been cut
>>>> - this regression is known and reported since the merge window
>>>> - the fix (revert of 3 patches) is known for over a month now
>>>> - but it's still not in mainline
>>>> - my polite ping request from last week seems to have provoked exactly 0 (zero) response.
>>>>
>>>> IT'S TIME TO CHIME SOME BELLS :-)
>>>>
>>>> Hope that WILL be heard somewhere ...
>>>>
>>>> --
>>>> Sander
>>>>
>>>> PS. on the informative side the 3 commits to be reverted are:
>>>>
>>>> f86772af6a0f643d3e13eb3f4f9213ae0c333ee4 Bluetooth: Remove rfcomm_carrier_raised()
>>>> 4a2fb3ecc7467c775b154813861f25a0ddc11aa0 Bluetooth: Always wait for a connection on RFCOMM open()
>>>> e228b63390536f5b737056059a9a04ea016b1abf Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate()
>>> 
>>> Gustavo, should I be expecting a pull request?
> 
>> you have all 3 reverts already in wireless-next as part of a larger RFCOMM 
>> change from Peter that we had to do to get this bug fixed. The whole series 
>> ended up as 24 patches and was way too late in the -rc stage. In addition we 
>> did not have enough exposure from people running that patch set. I did not 
>> want to end up in a ping-pong situation with apply + revert over and over 
>> again until we gave this some test exposure.
> 
>> I think it is pretty safe to revert these 3 patches from 3.14-rc6 since they 
>> broke more than they actually fixed. However everybody has to be aware 
>> that the real fix only comes in 3.15 since the change unfortunately is 
>> large. As far as we can tell, only users of RFCOMM TTY are affected. All 
>> RFCOMM socket users are fine.
> 
>> So I do not know what the best way of getting these reverts merged is. Gustavo 
>> has a tree ready for you to pull from. Or would it be better if they get 
>> cherry-picked from the wireless-next tree?
> 
> 
> 
> Is it just me .. or is this going at the speed of about a bluetooth 
> connection ..
> and probably missing the boat for 3.14 ? (for no good reason IMHO)
> 
> 
> (it was not in John's nor Dave's last pull request, although it seems to be 
> reverted in the bluetooth tree now .. i didn't
> see any formal pull request from that .. to get it even *starting* to 
> traverse all the trees up to Linus … )

The whole set is now in net-next, so it is really up to John or Dave whether
they want to cherry-pick these 3 reverts or pull from the bluetooth.git tree.

Regards

Marcel


[PATCH] arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros

2014-03-13 Thread Chen Gang

Add readl() and writel() for the 'PM_' macros, as other areas within
unicore32 already do; otherwise the file fails to compile.

The related error (allmodconfig for unicore32):

  CC  arch/unicore32/kernel/clock.o
  arch/unicore32/kernel/clock.c: In function ‘clk_set_rate’:
  arch/unicore32/kernel/clock.c:182: warning: initialization makes integer from pointer without a cast
  arch/unicore32/kernel/clock.c:204: error: lvalue required as left operand of assignment
  arch/unicore32/kernel/clock.c:206: error: lvalue required as left operand of assignment
  arch/unicore32/kernel/clock.c:207: error: invalid operands to binary & (have ‘void *’ and ‘long unsigned int’)
  make[1]: *** [arch/unicore32/kernel/clock.o] Error 1
  make: *** [arch/unicore32/kernel] Error 2


Signed-off-by: Chen Gang 
---
 arch/unicore32/kernel/clock.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/unicore32/kernel/clock.c b/arch/unicore32/kernel/clock.c
index 18d4563..b1ca775 100644
--- a/arch/unicore32/kernel/clock.c
+++ b/arch/unicore32/kernel/clock.c
@@ -179,7 +179,7 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
}
 #ifdef CONFIG_CPU_FREQ
if (clk == &clk_mclk_clk) {
-   u32 pll_rate, divstatus = PM_DIVSTATUS;
+   u32 pll_rate, divstatus = readl(PM_DIVSTATUS);
int ret, i;
 
/* lookup mclk_clk_table */
@@ -201,10 +201,10 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
/ (((divstatus & 0xf000) >> 12) + 1);
 
/* set pll sys cfg reg. */
-   PM_PLLSYSCFG = pll_rate;
+   writel(pll_rate, PM_PLLSYSCFG);
 
-   PM_PMCR = PM_PMCR_CFBSYS;
-   while ((PM_PLLDFCDONE & PM_PLLDFCDONE_SYSDFC)
+   writel(PM_PMCR_CFBSYS, PM_PMCR);
+   while ((readl(PM_PLLDFCDONE) & PM_PLLDFCDONE_SYSDFC)
!= PM_PLLDFCDONE_SYSDFC)
udelay(100);
/* about 1ms */
-- 
1.7.9.5
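
The compiler errors quoted in the commit message follow from the 'PM_' macros expanding to pointer values rather than to assignable objects. Below is a small user-space sketch of that situation, under stated assumptions: `fake_pm_regs` stands in for the memory-mapped register block, the macro definitions are simplified stand-ins for the real unicore32 ones, and `sketch_readl()`/`sketch_writel()` are hypothetical analogues of the kernel's readl()/writel(), not the kernel accessors themselves.

```c
#include <stdint.h>

/* Stand-in for the memory-mapped PM block; real code uses mapped I/O memory. */
static uint32_t fake_pm_regs[4];

/* As the error messages indicate, the macros yield addresses, not lvalues. */
#define PM_PLLSYSCFG  ((void *)&fake_pm_regs[0])
#define PM_DIVSTATUS  ((void *)&fake_pm_regs[1])

/* User-space analogues of readl()/writel(): one volatile 32-bit access. */
static uint32_t sketch_readl(const void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static void sketch_writel(uint32_t val, void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* "PM_PLLSYSCFG = rate;" would not compile here: a cast is not an lvalue. */
static uint32_t set_pll_rate(uint32_t rate)
{
	sketch_writel(rate, PM_PLLSYSCFG);
	return sketch_readl(PM_PLLSYSCFG);
}
```

Uncommenting a direct assignment such as `PM_PLLSYSCFG = 1;` in this sketch reproduces the "lvalue required as left operand of assignment" diagnostic, which is why the patch routes every access through an accessor.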

Re: [PATCH v3] power: add an API to log wakeup reasons

2014-03-13 Thread Rafael J. Wysocki

On Thursday, March 13, 2014 05:43:20 PM Ruchi Kandoi wrote:
> This should be true most of the time.
> 
> But there might be cases otherwise too.
> 
> For instance, there was a bug earlier with wi-fi which would cause the
> system to wake up but not get hold of a wakeup source because there
> wasn't any work for it to do. In that case, the wakeup sources would
> not log such an event.
> 
> Additionally, there could be a situation where an IRQ caused the
> system to resume from suspend. And since the system was up, a driver
> could take a wakeup source. In this case we would assume that the
> driver would have woken the system, but in reality the driver held the
> wakeup source only because the system was up and did not cause the
> wake up to happen.

But you can create special wakeup sources associated with interrupts (in
addition to the existing ones) and use the statistics for those.

It is possible to define wakeup sources that don't correspond to any
devices.

Rafael


> On Thu, Mar 13, 2014 at 3:18 PM, Rafael J. Wysocki  wrote:
> > Hi,
> >
> > I saw the v4, but I don't have it handy, so replying here.
> >
> > On Wednesday, March 12, 2014 12:46:38 PM Ruchi Kandoi wrote:
> >> For power management diagnostic purposes, it is often useful to know
> >> what interrupts are frequently waking the system from low power
> >> suspend mode, especially on battery-powered consumer electronics
> >> devices that are expected to spend much of their time in low-power
> >> suspend while not in active use.  For example, reduced battery life on
> >> a mobile phone may be caused in part by frequent wakeups by broadcast
> >> traffic on a busy wireless LAN even while the screen is off and the
> >> phone not in active use.
> >>
> >> Add the API log_wakeup_reason() and expose the result to userspace via the
> >> sysfs path /sys/kernel/wakeup_reasons/last_resume_reason. This API would be
> >> called from platform-specific code, or from the driver for the interrupt
> >> controller, when the system resumes because of an IRQ. It logs the reasons
> >> which caused the system to wake up from the low-power mode.
> >
> > So what exactly is wrong with using wakeup sources for this purpose?
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [RFC] improve_stack: make stack dump output useful again

2014-03-13 Thread Linus Torvalds

On Thu, Mar 13, 2014 at 4:07 PM, Sasha Levin  wrote:
>
> I figured that I'll just read it from System.map (and do the math when
> adding the offset). That should work, right?

Yes, although just reading the symbols from the vmlinux file would be
*much* more convenient, since I know that not everybody saves the
System.map file (cough cough me). But the vmlinux file you need
anyway.

So it would actually be much better to use "nm vmlinux" as the source
of the base information instead, and have just one file you depend on.

  Linus
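
As a rough illustration of the "symbol plus offset" arithmetic being discussed, the sketch below resolves a name against text in the three-column format that "nm vmlinux" prints (address, type, symbol). It is only a sketch: `resolve_symbol()` is a hypothetical helper, it reads from a string rather than running nm, and it is not taken from the improve_stack script.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `name` in nm-style output ("<hex address> <type> <symbol>" per
 * line) and store address + offset in *out. Returns 0 on success, -1 if
 * the symbol is not found.
 */
static int resolve_symbol(const char *nm_output, const char *name,
			  unsigned long long offset, unsigned long long *out)
{
	const char *line = nm_output;

	while (line && *line) {
		unsigned long long addr;
		char type;
		char sym[128];

		/* Lines that do not parse (e.g. undefined symbols) are skipped. */
		if (sscanf(line, "%llx %c %127s", &addr, &type, sym) == 3 &&
		    strcmp(sym, name) == 0) {
			*out = addr + offset;
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Given a stack entry of the form symbol+0x4d, calling resolve_symbol() with the symbol name and offset 0x4d yields the absolute address, which could then be handed to a tool that maps addresses to source lines. Any addresses used to exercise the function are made-up sample values, not real kernel addresses.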
