Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance

2013-10-08 Thread Yuan Zhong
Hi Gu,

> Hi Yuan,
> On 10/08/2013 07:30 PM, Yuan Zhong wrote:
>
>> Hi Gu,
>> 
>>> Hi Yuan,
>>> On 10/08/2013 04:30 PM, Yuan Zhong wrote:
>> 
>>>> Previously, do_checkpoint() called congestion_wait() to wait for the
>>>> previously submitted node/meta/data pages to be written back.
>>>> Because congestion_wait() sleeps for a fixed period (e.g. HZ / 50),
>>>> the checkpoint thread can keep waiting for congestion_wait() to
>>>> return even after the pages have already been written back.
>> 
>>> How do you confirm this issue? 
>> 
>>   I traced the execution path.
>>   In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) is called.
>>   I found that even after the F2FS_WRITEBACK page count has dropped to
>>   zero, the checkpoint thread is still sleeping in congestion_wait,
>>   waiting for that count to become zero.
>
>Yes, maybe. congestion_wait() adds the task to a global wait queue that is
>shared by all backing devices, so even if F2FS_WRITEBACK has reached zero,
>other I/O may still be going on.
>Anyway, using a private wait queue is a better choice. :)
>
>   
>>   So I think this point could be improved.
>>   I wrote a simple test case and ran it on a Micro-SD card, with the
>>   following steps:
>>   (a) create a fixed-size file (4KB)
>>   (b) sync the file
>>   (c) go back to step (a), for a fixed number of iterations (1024)
>>   The results indicated that the execution time is greatly reduced with
>>   this patch.
>
>Yes, the change is an improvement if the issue exists.
>
>  
>> 
>> 
>>> I suspect that the block-core does not have a wake-up mechanism
>>> when the back device is uncongested.
>> 
>> 
>>   Yes, you are right.
>>   So I wake up the checkpoint thread myself when the F2FS_WRITEBACK page
>>   count drops to zero.
>>   In f2fs_end_io_write, f2fs_writeback_wake is called.
>>   You can find this code in my patch.
>
>Saw it. :)
>But one problem is that the checkpoint routine is always a singleton, so the
>wait queue serves only one waiter, which does not seem worthwhile. How about
>just scheduling and waking it up directly? See the following one.


Yes, your point is right.
I used a wait queue because I was influenced by the congestion_wait
function, which also uses a wait queue internally.
And I think your patch is also a more efficient method.
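
To make the cost of the fixed sleep period discussed above concrete, here is
a small arithmetic model in user-space C. It is only a sketch under a stated
assumption: the periodic sleeper is never woken early, so it notices
completion only at the next multiple of the period. The function names and
the 3 ms write time used below are hypothetical, not taken from f2fs.

```c
#include <assert.h>

/*
 * Time (in ms) at which a waiter that re-checks a condition only every
 * period_ms notices that the condition became true at done_ms.
 * Assumption: the sleep is never cut short by an early wakeup.
 */
long polled_wakeup_ms(long done_ms, long period_ms)
{
	assert(period_ms > 0);
	if (done_ms <= 0)
		return 0;	/* condition already true on the first check */
	/* round up to the next multiple of the polling period */
	return ((done_ms + period_ms - 1) / period_ms) * period_ms;
}

/* A waiter that is woken directly notices completion right away. */
long direct_wakeup_ms(long done_ms)
{
	return done_ms < 0 ? 0 : done_ms;
}

/* Total time for back-to-back waits that each complete after done_ms. */
long total_polled_ms(int rounds, long done_ms, long period_ms)
{
	return (long)rounds * polled_wakeup_ms(done_ms, period_ms);
}
```

With a period of roughly 20 ms (HZ / 50) and an illustrative 3 ms write, 1024
rounds cost 20480 ms in this model when polling, against 3072 ms when the
waiter is woken directly. These figures come from the model, not from any
measurement in this thread.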


>
>Signed-off-by: Gu Zheng 
>---
> fs/f2fs/checkpoint.c |   11 +--
> fs/f2fs/f2fs.h   |1 +
> fs/f2fs/segment.c|4 
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
>diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>index d808827..2a5999d 100644
>--- a/fs/f2fs/checkpoint.c
>+++ b/fs/f2fs/checkpoint.c
>@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
> 	f2fs_put_page(cp_page, 1);
> 
> 	/* wait for previous submitted node/meta pages writeback */
>-	while (get_pages(sbi, F2FS_WRITEBACK))
>-		congestion_wait(BLK_RW_ASYNC, HZ / 50);
>+	sbi->cp_task = current;
>+	while (get_pages(sbi, F2FS_WRITEBACK)) {
>+		set_current_state(TASK_UNINTERRUPTIBLE);
>+		if (!get_pages(sbi, F2FS_WRITEBACK))
>+			break;
>+		io_schedule();
>+	}
>+	__set_current_state(TASK_RUNNING);
>+	sbi->cp_task = NULL;
> 
> 	filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
> 	filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
>diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>index a955a59..408ace7 100644
>--- a/fs/f2fs/f2fs.h
>+++ b/fs/f2fs/f2fs.h
>@@ -365,6 +365,7 @@ struct f2fs_sb_info {
>   struct mutex writepages;/* mutex for writepages() */
>   int por_doing;  /* recovery is doing or not */
>   int on_build_free_nids; /* build_free_nids is doing */
>+  struct task_struct *cp_task;/* checkpoint task */
> 
>   /* for orphan inode management */
>   struct list_head orphan_inode_list; /* orphan inode list */
>diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>index bd79bbe..3b20359 100644
>--- a/fs/f2fs/segment.c
>+++ b/fs/f2fs/segment.c
>@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
> 
> 	if (p->is_sync)
> 		complete(p->wait);
>+
>+	if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
>+		wake_up_process(p->sbi->cp_task);
>+
> 	kfree(p);
> 	bio_put(bio);
> }
>-- 
>1.7.7
>
>Regards,
>Gu 
>


Regards,
Yuan


>> 
>> 
>>>> This is a problem, especially when syncing a large number of small
>>>> files or dirs.
>>>> To avoid this, a wait_list is introduced:
>>>> the checkpoint thread is put on the wait_list if the pages have
>>>> not been written back yet, and is woken up once they have.
>> 
>>> Please pay some attention to the mail form, this mail is out of format in 

Re: [PATCH v2 0/3] Move ARCH specific fpu_counter out of task_struct

2013-10-08 Thread Vineet Gupta
Hi Andrew,

On 10/09/2013 01:44 AM, Andrew Morton wrote:
> On Tue, 8 Oct 2013 13:16:26 +0530 Vineet Gupta  
> wrote:
> 
>> When debugging an ARC SMP 3.11 build failure due to a ST insn dealing with
>> task_struct.thread going out of range, I spotted @fpu_counter in task_struct
>> which only SH/x86 happen to use, and can be easily moved out into the
>> corresponding ARCH specific thread_struct. This saves 4 bytes per task_struct
>> instantiated, for all the other 18 arches.
>>
>> While the change is relatively straightforward, I've failed to connect with
>> SH arch folks / Paul Mundt, latter's SH email keeps bouncing back.
>> Hence I'm taking the liberty to use his alternate email (as listed on his
>> github page) and I apologize in advance if this causes any inconvenience.
> 
> I can't find [patch v2 2/3] anywhere.  Resend, please?
> 

Somehow that e-mail was dropped by some fancy "detection" machinery at our email
servers ;-) which I got notified of later on.

Patch reposted to lists.

Thx,
-Vineet

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] rtc: rtc-puv3: use dev_dbg() instead of pr_debug()

2013-10-08 Thread Sangjung Woo
Because this code mixes dev_*() calls with pr_debug(), the debug output is
not consistent. This patch converts pr_debug() to dev_dbg(), since the
dev_*() helpers are preferred in device driver code.

Signed-off-by: Sangjung Woo 
---
 drivers/rtc/rtc-puv3.c |   20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/rtc/rtc-puv3.c b/drivers/rtc/rtc-puv3.c
index 402732c..9167bb0 100644
--- a/drivers/rtc/rtc-puv3.c
+++ b/drivers/rtc/rtc-puv3.c
@@ -57,7 +57,7 @@ static void puv3_rtc_setaie(int to)
 {
unsigned int tmp;
 
-   pr_debug("%s: aie=%d\n", __func__, to);
+   dev_dbg(dev, "%s: aie=%d\n", __func__, to);
 
tmp = readl(RTC_RTSR) & ~RTC_RTSR_ALE;
 
@@ -71,7 +71,7 @@ static int puv3_rtc_setpie(struct device *dev, int enabled)
 {
unsigned int tmp;
 
-   pr_debug("%s: pie=%d\n", __func__, enabled);
+   dev_dbg(dev, "%s: pie=%d\n", __func__, enabled);
 
spin_lock_irq(_rtc_pie_lock);
tmp = readl(RTC_RTSR) & ~RTC_RTSR_HZE;
@@ -90,7 +90,7 @@ static int puv3_rtc_gettime(struct device *dev, struct rtc_time *rtc_tm)
 {
rtc_time_to_tm(readl(RTC_RCNR), rtc_tm);
 
-   pr_debug("read time %02x.%02x.%02x %02x/%02x/%02x\n",
+   dev_dbg(dev, "read time %02x.%02x.%02x %02x/%02x/%02x\n",
 rtc_tm->tm_year, rtc_tm->tm_mon, rtc_tm->tm_mday,
 rtc_tm->tm_hour, rtc_tm->tm_min, rtc_tm->tm_sec);
 
@@ -101,7 +101,7 @@ static int puv3_rtc_settime(struct device *dev, struct rtc_time *tm)
 {
unsigned long rtc_count = 0;
 
-   pr_debug("set time %02d.%02d.%02d %02d/%02d/%02d\n",
+   dev_dbg(dev, "set time %02d.%02d.%02d %02d/%02d/%02d\n",
 tm->tm_year, tm->tm_mon, tm->tm_mday,
 tm->tm_hour, tm->tm_min, tm->tm_sec);
 
@@ -119,7 +119,7 @@ static int puv3_rtc_getalarm(struct device *dev, struct rtc_wkalrm *alrm)
 
alrm->enabled = readl(RTC_RTSR) & RTC_RTSR_ALE;
 
-   pr_debug("read alarm %02x %02x.%02x.%02x %02x/%02x/%02x\n",
+   dev_dbg(dev, "read alarm %02x %02x.%02x.%02x %02x/%02x/%02x\n",
 alrm->enabled,
 alm_tm->tm_year, alm_tm->tm_mon, alm_tm->tm_mday,
 alm_tm->tm_hour, alm_tm->tm_min, alm_tm->tm_sec);
@@ -132,7 +132,7 @@ static int puv3_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm)
struct rtc_time *tm = &alrm->time;
unsigned long rtcalarm_count = 0;
 
-   pr_debug("puv3_rtc_setalarm: %d, %02x/%02x/%02x %02x.%02x.%02x\n",
+   dev_dbg(dev, "puv3_rtc_setalarm: %d, %02x/%02x/%02x %02x.%02x.%02x\n",
 alrm->enabled,
 tm->tm_mday & 0xff, tm->tm_mon & 0xff, tm->tm_year & 0xff,
 tm->tm_hour & 0xff, tm->tm_min & 0xff, tm->tm_sec);
@@ -140,7 +140,7 @@ static int puv3_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm)
rtc_tm_to_time(tm, &rtcalarm_count);
writel(rtcalarm_count, RTC_RTAR);
 
-   puv3_rtc_setaie(alrm->enabled);
+   puv3_rtc_setaie(dev, alrm->enabled);
 
if (alrm->enabled)
enable_irq_wake(puv3_rtc_alarmno);
@@ -227,7 +227,7 @@ static int puv3_rtc_remove(struct platform_device *dev)
rtc_device_unregister(rtc);
 
puv3_rtc_setpie(&dev->dev, 0);
-   puv3_rtc_setaie(0);
+   puv3_rtc_setaie(&dev->dev, 0);
 
release_resource(puv3_rtc_mem);
kfree(puv3_rtc_mem);
@@ -241,7 +241,7 @@ static int puv3_rtc_probe(struct platform_device *pdev)
struct resource *res;
int ret;
 
-   pr_debug("%s: probe=%p\n", __func__, pdev);
+   dev_dbg(&pdev->dev, "%s: probe=%p\n", __func__, pdev);
 
/* find the IRQs */
puv3_rtc_tickno = platform_get_irq(pdev, 1);
@@ -256,7 +256,7 @@ static int puv3_rtc_probe(struct platform_device *pdev)
return -ENOENT;
}
 
-   pr_debug("PKUnity_rtc: tick irq %d, alarm irq %d\n",
+   dev_dbg(&pdev->dev, "PKUnity_rtc: tick irq %d, alarm irq %d\n",
 puv3_rtc_tickno, puv3_rtc_alarmno);
 
/* get the memory region */
-- 
1.7.9.5



Re: [RFC PATCH] device: Add kernel standard devm_k.alloc functions

2013-10-08 Thread Greg KH
On Tue, Oct 08, 2013 at 10:32:27PM -0700, Joe Perches wrote:
> Currently, devm_ managed memory only supports kzalloc.
> 
> Convert the devm_kzalloc implementation to devm_kmalloc
> and remove the complete memset to 0 but still set the
> initial struct devres header and whatever padding before
> data to 0.
> 
> Add the other normal alloc variants as static inlines with
> __GFP_ZERO added to the gfp flag where appropriate:
> 
>   devm_kzalloc
>   devm_kcalloc
>   devm_kmalloc_array
> 
> Add gfp.h to device.h for the newly added static inlines.
> 
> Signed-off-by: Joe Perches 
> ---
>  drivers/base/devres.c  | 27 ---
>  include/linux/device.h | 21 +++--
>  2 files changed, 35 insertions(+), 13 deletions(-)

Makes sense to me, does this let other drivers start to use this where
they were not able to in the past?

thanks,

greg k-h


Re: [RFCv3 1/7] ARM: OMAP2+: hwmod-data: Add SSI information

2013-10-08 Thread Paul Walmsley
On Sun, 6 Oct 2013, Sebastian Reichel wrote:

> This patch adds Synchronous Serial Interface (SSI) hwmod support for
> OMAP34xx SoCs.
> 
> Signed-off-by: Sebastian Reichel 

Thanks, queued this one for v3.13.  You can drop it from any future 
reposts of this series.


- Paul


[PATCH] dma: mmp_tdma: add multiple burst size support for 910-squ

2013-10-08 Thread Qiao Zhou
add multiple burst size support for 910-squ.

Signed-off-by: Qiao Zhou 
---
 drivers/dma/mmp_tdma.c |   25 -
 1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/mmp_tdma.c b/drivers/dma/mmp_tdma.c
index 38cb517..d84354b 100644
--- a/drivers/dma/mmp_tdma.c
+++ b/drivers/dma/mmp_tdma.c
@@ -228,8 +228,31 @@ static int mmp_tdma_config_chan(struct mmp_tdma_chan *tdmac)
 			return -EINVAL;
 		}
 	} else if (tdmac->type == PXA910_SQU) {
-		tdcr |= TDCR_BURSTSZ_SQU_32B;
 		tdcr |= TDCR_SSPMOD;
+
+		switch (tdmac->burst_sz) {
+		case 1:
+			tdcr |= TDCR_BURSTSZ_SQU_1B;
+			break;
+		case 2:
+			tdcr |= TDCR_BURSTSZ_SQU_2B;
+			break;
+		case 4:
+			tdcr |= TDCR_BURSTSZ_SQU_4B;
+			break;
+		case 8:
+			tdcr |= TDCR_BURSTSZ_SQU_8B;
+			break;
+		case 16:
+			tdcr |= TDCR_BURSTSZ_SQU_16B;
+			break;
+		case 32:
+			tdcr |= TDCR_BURSTSZ_SQU_32B;
+			break;
+		default:
+			dev_err(tdmac->dev, "mmp_tdma: unknown burst size.\n");
+			return -EINVAL;
+		}
 	}
 
 	writel(tdcr, tdmac->reg_base + TDCR);
-- 
1.7.0.4
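
The burst-size selection in the patch above can also be expressed as a lookup
table. The user-space sketch below shows that variant with the same behaviour:
a known size yields its register bits, anything else yields -EINVAL. The
demo_* names and the bit encodings are placeholders made up for illustration;
they are not the real TDCR_BURSTSZ_SQU_* values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings: placeholders, NOT the real TDCR field values. */
struct demo_burst {
	unsigned int bytes;
	uint32_t bits;
};

static const struct demo_burst demo_burst_table[] = {
	{  1, 0x1u << 6 },
	{  2, 0x2u << 6 },
	{  4, 0x3u << 6 },
	{  8, 0x4u << 6 },
	{ 16, 0x5u << 6 },
	{ 32, 0x6u << 6 },
};

/* Store the bits for burst_sz in *bits; return 0 or -EINVAL if unknown. */
int demo_burst_to_bits(unsigned int burst_sz, uint32_t *bits)
{
	size_t i;

	assert(bits != NULL);
	for (i = 0; i < sizeof(demo_burst_table) / sizeof(demo_burst_table[0]); i++) {
		if (demo_burst_table[i].bytes == burst_sz) {
			*bits = demo_burst_table[i].bits;
			return 0;
		}
	}
	return -EINVAL;	/* same outcome as the default: branch in the patch */
}
```

A table keeps the mapping in one place, while the switch in the patch keeps
each case next to its symbolic constant. Either form rejects sizes such as 3
or 64.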



[PATCH v1] dma: mmp-tdma: add multiple burst size support for 910-squ

2013-10-08 Thread Qiao Zhou
v1: add multiple burst size support. remove previous fixed 32-byte setting.

Qiao Zhou (1):
  dma: mmp_tdma: add multiple burst size support for 910-squ

 drivers/dma/mmp_tdma.c |   25 -
 1 files changed, 24 insertions(+), 1 deletions(-)



[PATCH v2 2/3] x86: Move fpu_counter into ARCH specific thread_struct

2013-10-08 Thread Vineet Gupta
From: Vineet Gupta 

Only a couple of arches (sh/x86) use fpu_counter in task_struct so it
can be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.

Compile tested i386_defconfig + gcc 4.7.3

Signed-off-by: Vineet Gupta 
Acked-by: Ingo Molnar 
Cc: x...@kernel.org
---
 arch/x86/include/asm/fpu-internal.h | 10 +-
 arch/x86/include/asm/processor.h|  9 +
 arch/x86/kernel/i387.c  |  2 +-
 arch/x86/kernel/process_32.c|  4 ++--
 arch/x86/kernel/process_64.c|  2 +-
 arch/x86/kernel/traps.c |  2 +-
 6 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 4d0bda7b11e3..c49a613c6452 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -365,7 +365,7 @@ static inline void drop_fpu(struct task_struct *tsk)
 * Forget coprocessor state..
 */
preempt_disable();
-   tsk->fpu_counter = 0;
+   tsk->thread.fpu_counter = 0;
__drop_fpu(tsk);
clear_used_math();
preempt_enable();
@@ -424,7 +424,7 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta
 * or if the past 5 consecutive context-switches used math.
 */
fpu.preload = tsk_used_math(new) && (use_eager_fpu() ||
-new->fpu_counter > 5);
+new->thread.fpu_counter > 5);
if (__thread_has_fpu(old)) {
if (!__save_init_fpu(old))
cpu = ~0;
@@ -433,16 +433,16 @@ static inline fpu_switch_t switch_fpu_prepare(struct task_struct *old, struct ta
 
/* Don't change CR0.TS if we just switch! */
if (fpu.preload) {
-   new->fpu_counter++;
+   new->thread.fpu_counter++;
__thread_set_has_fpu(new);
prefetch(new->thread.fpu.state);
} else if (!use_eager_fpu())
stts();
} else {
-   old->fpu_counter = 0;
+   old->thread.fpu_counter = 0;
old->thread.fpu.last_cpu = ~0;
if (fpu.preload) {
-   new->fpu_counter++;
+   new->thread.fpu_counter++;
if (!use_eager_fpu() && fpu_lazy_restore(new, cpu))
fpu.preload = 0;
else
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 987c75ecc334..7b034a4057f9 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -488,6 +488,15 @@ struct thread_struct {
unsigned long   iopl;
/* Max allowed port in the bitmap, in bytes: */
unsignedio_bitmap_max;
+   /*
+* fpu_counter contains the number of consecutive context switches
+* that the FPU is used. If this is over a threshold, the lazy fpu
+* saving becomes unlazy to save the trap. This is an unsigned char
+* so that after 256 times the counter wraps and the behavior turns
+* lazy again; this to deal with bursty apps that only use FPU for
+* a short time
+*/
+   unsigned char fpu_counter;
 };
 
 /*
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 5d576ab34403..e8368c6dd2a2 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -100,7 +100,7 @@ void unlazy_fpu(struct task_struct *tsk)
__save_init_fpu(tsk);
__thread_fpu_end(tsk);
} else
-   tsk->fpu_counter = 0;
+   tsk->thread.fpu_counter = 0;
preempt_enable();
 }
 EXPORT_SYMBOL(unlazy_fpu);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 884f98f69354..6af43b03c8ef 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -153,7 +153,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
childregs->orig_ax = -1;
childregs->cs = __KERNEL_CS | get_kernel_rpl();
childregs->flags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
-   p->fpu_counter = 0;
+   p->thread.fpu_counter = 0;
p->thread.io_bitmap_ptr = NULL;
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));
return 0;
@@ -166,7 +166,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
p->thread.ip = (unsigned long) ret_from_fork;
task_user_gs(p) = get_user_gs(current_pt_regs());
 
-   p->fpu_counter = 0;
+   p->thread.fpu_counter = 0;
p->thread.io_bitmap_ptr = NULL;
tsk = current;
err = -ENOMEM;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 
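
The comment added to thread_struct in the patch above describes a counter
that wraps after 256 increments. The user-space sketch below models just that
arithmetic. It assumes an 8-bit unsigned char and deliberately ignores the
other inputs to the real decision (use_eager_fpu() and tsk_used_math()); the
demo_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_PRELOAD_THRESHOLD 5	/* mirrors the "> 5" test in the patch */

/* Counter value after `switches` consecutive FPU-using context switches. */
unsigned char demo_fpu_counter_after(unsigned int switches)
{
	unsigned char counter = 0;

	assert(UCHAR_MAX == 255);	/* the wrap at 256 relies on 8-bit chars */
	while (switches--)
		counter++;		/* wraps from 255 back to 0 */
	return counter;
}

/* Would the counter alone ask for the FPU state to be preloaded? */
int demo_should_preload(unsigned char counter)
{
	return counter > DEMO_PRELOAD_THRESHOLD;
}
```

In this model the counter asks for preloading from the sixth consecutive
switch onwards, and after 256 switches it is back at 0, so the behaviour
turns lazy again until the threshold is crossed once more.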

Re: [PATCH] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread sangjung.woo

On 10/09/2013 01:59 PM, Joe Perches wrote:

> The commit message doesn't match the patch subject
> (shows kzalloc)
>
> I was a bit surprised to find there isn't a devm_kmalloc.
>
> This seems fine otherwise.


I just sent a second version of the patch with a corrected commit message.
Thank you for your feedback.

Cheers,
Sangjung Woo






[RFC PATCH] device: Add kernel standard devm_k.alloc functions

2013-10-08 Thread Joe Perches
Currently, devm_ managed memory only supports kzalloc.

Convert the devm_kzalloc implementation to devm_kmalloc
and remove the complete memset to 0 but still set the
initial struct devres header and whatever padding before
data to 0.

Add the other normal alloc variants as static inlines with
__GFP_ZERO added to the gfp flag where appropriate:

devm_kzalloc
devm_kcalloc
devm_kmalloc_array

Add gfp.h to device.h for the newly added static inlines.

Signed-off-by: Joe Perches 
---
 drivers/base/devres.c  | 27 ---
 include/linux/device.h | 21 +++--
 2 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 507379e..37e67a2 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -91,7 +91,8 @@ static __always_inline struct devres * alloc_dr(dr_release_t release,
if (unlikely(!dr))
return NULL;
 
-   memset(dr, 0, tot_size);
+   memset(dr, 0, offsetof(struct devres, data));
+
INIT_LIST_HEAD(&dr->node.entry);
dr->node.release = release;
return dr;
@@ -745,58 +746,62 @@ void devm_remove_action(struct device *dev, void (*action)(void *), void *data)
 EXPORT_SYMBOL_GPL(devm_remove_action);
 
 /*
- * Managed kzalloc/kfree
+ * Managed kmalloc/kfree
  */
-static void devm_kzalloc_release(struct device *dev, void *res)
+static void devm_kmalloc_release(struct device *dev, void *res)
 {
/* noop */
 }
 
-static int devm_kzalloc_match(struct device *dev, void *res, void *data)
+static int devm_kmalloc_match(struct device *dev, void *res, void *data)
 {
return res == data;
 }
 
 /**
- * devm_kzalloc - Resource-managed kzalloc
+ * devm_kmalloc - Resource-managed kmalloc
  * @dev: Device to allocate memory for
  * @size: Allocation size
  * @gfp: Allocation gfp flags
  *
- * Managed kzalloc.  Memory allocated with this function is
+ * Managed kmalloc.  Memory allocated with this function is
  * automatically freed on driver detach.  Like all other devres
  * resources, guaranteed alignment is unsigned long long.
  *
  * RETURNS:
  * Pointer to allocated memory on success, NULL on failure.
  */
-void * devm_kzalloc(struct device *dev, size_t size, gfp_t gfp)
+void * devm_kmalloc(struct device *dev, size_t size, gfp_t gfp)
 {
struct devres *dr;
 
/* use raw alloc_dr for kmalloc caller tracing */
-   dr = alloc_dr(devm_kzalloc_release, size, gfp);
+   dr = alloc_dr(devm_kmalloc_release, size, gfp);
if (unlikely(!dr))
return NULL;
 
+   /*
+* This is named devm_kzalloc_release for historical reasons
+* The initial implementation did not support kmalloc, only kzalloc
+*/
set_node_dbginfo(&dr->node, "devm_kzalloc_release", size);
devres_add(dev, dr->data);
return dr->data;
 }
-EXPORT_SYMBOL_GPL(devm_kzalloc);
+EXPORT_SYMBOL_GPL(devm_kmalloc);
 
 /**
  * devm_kfree - Resource-managed kfree
  * @dev: Device this memory belongs to
  * @p: Memory to free
  *
- * Free memory allocated with devm_kzalloc().
+ * Free memory allocated with devm_kmalloc().
  */
 void devm_kfree(struct device *dev, void *p)
 {
int rc;
 
-   rc = devres_destroy(dev, devm_kzalloc_release, devm_kzalloc_match, p);
+   rc = devres_destroy(dev, devm_kmalloc_release, devm_kmalloc_match, p);
WARN_ON(rc);
 }
 EXPORT_SYMBOL_GPL(devm_kfree);
diff --git a/include/linux/device.h b/include/linux/device.h
index ce690ea..46540f8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <linux/gfp.h>
 #include 
 
 struct device;
@@ -614,8 +615,24 @@ extern void devres_close_group(struct device *dev, void *id);
 extern void devres_remove_group(struct device *dev, void *id);
 extern int devres_release_group(struct device *dev, void *id);
 
-/* managed kzalloc/kfree for device drivers, no kmalloc, always use kzalloc */
-extern void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
+/* managed devm_k.alloc/kfree for device drivers */
+extern void *devm_kmalloc(struct device *dev, size_t size, gfp_t gfp);
+static inline void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp)
+{
+   return devm_kmalloc(dev, size, gfp | __GFP_ZERO);
+}
+static inline void *devm_kmalloc_array(struct device *dev,
+  size_t n, size_t size, gfp_t flags)
+{
+   if (size != 0 && n > SIZE_MAX / size)
+   return NULL;
+   return devm_kmalloc(dev, n * size, flags);
+}
+static inline void *devm_kcalloc(struct device *dev,
+size_t n, size_t size, gfp_t flags)
+{
+   return devm_kmalloc_array(dev, n, size, flags | __GFP_ZERO);
+}
 extern void devm_kfree(struct device *dev, void *p);
 
 void __iomem *devm_ioremap_resource(struct device *dev, struct resource *res);
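
The overflow guard in devm_kmalloc_array() above is worth seeing in
isolation. The sketch below reproduces the same check in user space, with
malloc() standing in for devm_kmalloc() and a flag standing in for
__GFP_ZERO. The demo_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns nonzero if n * size would not fit in a size_t. */
int demo_mul_overflows(size_t n, size_t size)
{
	return size != 0 && n > SIZE_MAX / size;
}

/*
 * malloc-backed stand-in for devm_kmalloc_array(); a nonzero `zero`
 * gives the devm_kcalloc()-like zeroed variant.
 */
void *demo_alloc_array(size_t n, size_t size, int zero)
{
	void *p;

	if (demo_mul_overflows(n, size))
		return NULL;	/* refuse rather than allocate a short buffer */
	p = malloc(n * size);
	if (p && zero)
		memset(p, 0, n * size);
	return p;
}
```

Dividing SIZE_MAX by size avoids performing the multiplication that might
overflow, and the size != 0 test keeps that division well defined.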



[PATCH v2] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread Sangjung Woo
To have the memory freed automatically and to simplify the cleanup paths,
use devm_kzalloc() instead of kmalloc().

Signed-off-by: Sangjung Woo 
---
 drivers/rtc/rtc-pl030.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/rtc/rtc-pl030.c b/drivers/rtc/rtc-pl030.c
index 22bacdb..ecd57c6 100644
--- a/drivers/rtc/rtc-pl030.c
+++ b/drivers/rtc/rtc-pl030.c
@@ -106,7 +106,7 @@ static int pl030_probe(struct amba_device *dev, const struct amba_id *id)
if (ret)
goto err_req;
 
-   rtc = kmalloc(sizeof(*rtc), GFP_KERNEL);
+   rtc = devm_kzalloc(&dev->dev, sizeof(*rtc), GFP_KERNEL);
if (!rtc) {
ret = -ENOMEM;
goto err_rtc;
@@ -115,7 +115,7 @@ static int pl030_probe(struct amba_device *dev, const struct amba_id *id)
rtc->base = ioremap(dev->res.start, resource_size(&dev->res));
if (!rtc->base) {
ret = -ENOMEM;
-   goto err_map;
+   goto err_rtc;
}
 
__raw_writel(0, rtc->base + RTC_CR);
@@ -141,8 +141,6 @@ static int pl030_probe(struct amba_device *dev, const struct amba_id *id)
free_irq(dev->irq[0], rtc);
  err_irq:
iounmap(rtc->base);
- err_map:
-   kfree(rtc);
  err_rtc:
amba_release_regions(dev);
  err_req:
@@ -160,7 +158,6 @@ static int pl030_remove(struct amba_device *dev)
free_irq(dev->irq[0], rtc);
rtc_device_unregister(rtc->rtc);
iounmap(rtc->base);
-   kfree(rtc);
amba_release_regions(dev);
 
return 0;
-- 
1.7.9.5
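
The patch above can drop its kfree() calls because device-managed memory is
tied to the device and released with it. The toy user-space model below
sketches that idea: each allocation is linked to its owner and freed in one
pass. It is an illustration with hypothetical demo_* names, not the devres
implementation.

```c
#include <stdlib.h>

/* Toy stand-in for a devres node: list header, then the payload. */
struct demo_res {
	struct demo_res *next;
	unsigned long long data[];	/* payload, aligned to unsigned long long */
};

/* Toy stand-in for struct device: it only tracks what it owns. */
struct demo_dev {
	struct demo_res *resources;
};

/* Allocate zeroed memory owned by dev, in the spirit of devm_kzalloc(). */
void *demo_managed_zalloc(struct demo_dev *dev, size_t size)
{
	struct demo_res *r = calloc(1, sizeof(*r) + size);

	if (!r)
		return NULL;
	r->next = dev->resources;	/* remember it for the later release */
	dev->resources = r;
	return r->data;
}

/* Free everything owned by dev; returns how many allocations were freed. */
int demo_release_all(struct demo_dev *dev)
{
	int n = 0;

	while (dev->resources) {
		struct demo_res *r = dev->resources;

		dev->resources = r->next;
		free(r);
		n++;
	}
	return n;
}
```

With this ownership model an error path only has to unwind resources that
are not managed, which is why the err_map label and both kfree(rtc) calls
disappear in the diff.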



Re: [PATCH V3] PCI: exynos: add support for MSI

2013-10-08 Thread Jingoo Han
On Tuesday, October 08, 2013 3:23 PM, Kishon Vijay Abraham I wrote:
> On Friday 06 September 2013 12:24 PM, Jingoo Han wrote:
> > This patch adds support for Message Signaled Interrupt in the
> > Exynos PCIe driver using Synopsys designware PCIe core IP.
> >
> > Signed-off-by: Siva Reddy Kallam 
> > Signed-off-by: Srikanth T Shivanand 
> > Signed-off-by: Jingoo Han 
> > Cc: Pratyush Anand 
> > Cc: Mohit KUMAR 
> > ---

[.]

> >  int __init dw_pcie_host_init(struct pcie_port *pp)
> >  {
> > struct device_node *np = pp->dev->of_node;
> > @@ -157,6 +372,8 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
> > struct of_pci_range_parser parser;
> > u32 val;
> >
> > +   struct irq_domain *irq_domain;
> > +
> > if (of_pci_range_parser_init(&parser, np)) {
> > dev_err(pp->dev, "missing ranges property\n");
> > return -EINVAL;
> > @@ -223,6 +440,18 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
> > return -EINVAL;
> > }
> >
> > +   if (IS_ENABLED(CONFIG_PCI_MSI)) {
> > +   irq_domain = irq_domain_add_linear(pp->dev->of_node,
> > +   MAX_MSI_IRQS, _domain_ops,
> > +   _pcie_msi_chip);
> > +   if (!irq_domain) {
> > +   dev_err(pp->dev, "irq domain init failed\n");
> > +   return -ENXIO;
> > +   }
> > +
> > +   pp->msi_irq_start = irq_find_mapping(irq_domain, 0);
> 
> Where is the irq_create_mapping done for this irq domain? Is that not needed?
> Without that I'm not getting the correct irq number.

Oh, you're right.
irq_create_mapping() is necessary!

Without irq_create_mapping(), an ugly NULL dereference occurs
when two PCIe controllers are used on Exynos5440.
It is my mistake.

I will add irq_create_mapping() to dw_msi_setup_irq(), as the tegra PCIe
driver does. I will send the patch soon.

I really appreciate your comment. :-)
Thank you.

Best regards,
Jingoo Han
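
The difference between looking up a mapping and creating one, which is the
point of this exchange, can be shown with a toy linear domain in user space.
This is a sketch with hypothetical demo_* names: it only models the idea that
a lookup returns 0 until a mapping has been created, and it is not the
kernel's irqdomain code.

```c
#define DEMO_MAX_HWIRQ 32

/* Toy linear domain: virq[hw] == 0 means "no mapping yet". */
struct demo_domain {
	unsigned int virq[DEMO_MAX_HWIRQ];
	unsigned int next_virq;	/* next free virtual IRQ number, must be > 0 */
};

/* Lookup only: returns 0 when hwirq has never been mapped. */
unsigned int demo_find_mapping(const struct demo_domain *d, unsigned int hwirq)
{
	if (hwirq >= DEMO_MAX_HWIRQ)
		return 0;
	return d->virq[hwirq];
}

/* Return the existing mapping or allocate a new one; 0 on failure. */
unsigned int demo_create_mapping(struct demo_domain *d, unsigned int hwirq)
{
	if (hwirq >= DEMO_MAX_HWIRQ)
		return 0;
	if (!d->virq[hwirq])
		d->virq[hwirq] = d->next_virq++;
	return d->virq[hwirq];
}
```

A caller that only performs the lookup on a freshly added domain gets 0 back,
which matches the symptom reported above of not getting a correct IRQ number.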



Re: [PATCH 3/6] pinctrl: single: Prepare for supporting SoC specific features

2013-10-08 Thread Haojian Zhuang
On Tue, Oct 8, 2013 at 7:55 PM, Linus Walleij  wrote:
> On Mon, Oct 7, 2013 at 7:35 PM, Tony Lindgren  wrote:
>
>> Hi Linus W,
>>
>> Any comments on the pinctrl patches 3 - 5 in this series?
>
> I have no problems with this patch #3, as it is just changing syntax,
> not semantics.
>
> The problems start with patch #4.
>
> I am tormented with mixed feelings about this, because from one point of
> view I feel it is breaking the promise of pinctrl-single being a driver
> for platforms where a pin is controlled by a *single* register.
>
> If this was pinctrl-foo.c I would not have been so much bothered,
> but now it is something that was supposed to be self-contained and
> simple, pertaining to a single register, starting to look like something
> else.
>
> This is a bit like: "oh yeah just one register controls the pins, but under
> some circumstances I also want to mess with this register over here,
> and then this register over there ..." etc.
>
> I'd like Haojian to ACK this to proceed since he's also using this driver
> now. Then I feel better on continuing down this road ...
>
> Then I have a lesser comment on patch #4 since it makes it possible
> for this pin controller to support wake-up interrupt, as I don't see how
> this plays out with front-end GPIO controllers, but let's discuss that
> in the context of that patch.
>
> Yours,
> Linus Walleij
>

I'm OK on both #3 & #4. So Acked-by: Haojian Zhuang 

Regards
Haojian


Re: [PATCH v2 3/3] ARM: shmobile: kzm9d: Use common clock framework

2013-10-08 Thread Magnus Damm
Hi Simon,

On Wed, Oct 9, 2013 at 12:40 PM, Simon Horman  wrote:
> On Tue, Oct 08, 2013 at 02:34:03PM +0900, takas...@ops.dti.ne.jp wrote:
>> Use common clock framework version of clock
>>  drivers/clk/shmobile/clk-emev2.c
>> instead of sh-clkfwk version
>>  arch/arm/mach-shmobile/clock-emev2.c
>> when it is configured as a part of multi-platform.
>>
>> Signed-off-by: Takashi Yoshii 
>
> Thanks.
>
> I plan to add this patch to a new topic branch,
> topic/emev2-common-clock, in the renesas tree and
> queue it up from there for inclusion in mainline
> if/when the first patch of this series is accepted
> by Mike Turquette.

Thanks for picking up patches, Simon.

I think you can simply merge this patch after the following series:

[PATCH 00/05] ARM: shmobile: KZM9D Multiplatform update

There are no build time dependencies on patch 1 and 2 so this patch
can be merged independently. Regarding run-time operation, the
multiplatform series above makes KZM9D DT reference only build for
multiplatform, and in such case CCF is required.

So if you want to keep KZM9D DT reference working until Mike Turquette
accepts the CCF bits, then I recommend you to wait with "[PATCH 00/05]
ARM: shmobile: KZM9D Multiplatform update" until all EMEV2 CCF bits
have been merged.

Another way is to merge "[PATCH 00/05] ARM: shmobile: KZM9D
Multiplatform update" before the EMEV2 CCF bits, but if so you may as
well merge this patch as well IMO. This
multiplatform-update-series-merge-before-CCF plan will result in
untestable KZM9D DT reference until EMEV2 CCF is merged. Either way is
fine with me.

Cheers,

/ magnus


Re: [PATCH] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread Joe Perches
On Wed, 2013-10-09 at 13:36 +0900, sangjung.woo wrote:
> On 10/09/2013 01:07 PM, Joe Perches wrote:
> > On Wed, 2013-10-09 at 13:00 +0900, Sangjung Woo wrote:
> >> In order to be free automatically and make the cleanup paths more
> >> simple, use devm_kzalloc() instead of kzalloc().
> > []
> >> diff --git a/drivers/rtc/rtc-pl030.c b/drivers/rtc/rtc-pl030.c
> > []
> >> @@ -106,7 +106,7 @@ static int pl030_probe(struct amba_device *dev, const struct amba_id *id)
> >>if (ret)
> >>goto err_req;
> >>   
> >> -  rtc = kmalloc(sizeof(*rtc), GFP_KERNEL);
> >> +  rtc = devm_kzalloc(&dev->dev, sizeof(*rtc), GFP_KERNEL);
[]
> > You're not deleting a memset and you're
> > converting a kmalloc.
> You are right.
> >
> > Why do you need the zalloc version?
> >
> The key point of this patch is resource-managed memory allocation.

The commit message doesn't match the patch subject
(shows kzalloc)

I was a bit surprised to find there isn't a devm_kmalloc.

This seems fine otherwise.



RE: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bhushan Bharat-R65777


> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, October 09, 2013 4:27 AM
> To: Bhushan Bharat-R65777
> Cc: alex.william...@redhat.com; j...@8bytes.org; b...@kernel.crashing.org;
> ga...@kernel.crashing.org; linux-kernel@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de;
> io...@lists.linux-foundation.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information
> 
> On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > @@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
> > int len;
> > u32 offset;
> > static const u32 all_avail[] = { 0, NR_MSI_IRQS };
> > +   static int bank_index;
> >
> > match = of_match_device(fsl_of_msi_ids, &dev->dev);
> > if (!match)
> > @@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
> > dev->dev.of_node->full_name);
> > goto error_out;
> > }
> > -   msi->msiir_offset =
> > -   features->msiir_offset + (res.start & 0xf);
> > +   msi->msiir = res.start + features->msiir_offset;
> > +   printk("msi->msiir = %llx\n", msi->msiir);
> 
> dev_dbg or remove

Oops, sorry it was leftover of debugging :(

> 
> > }
> >
> > msi->feature = features->fsl_pic_ip; @@ -470,6 +500,7 @@ static int
> > fsl_of_msi_probe(struct platform_device *dev)
> > }
> > }
> >
> > +   msi->bank_index = bank_index++;
> 
> What if multiple MSIs are being probed in parallel?

Oh, I had not considered that it can be called in parallel.

>  bank_index is not atomic.

Will declare bank_index as atomic_t and use atomic_inc_return(&bank_index)

> 
> > diff --git a/arch/powerpc/sysdev/fsl_msi.h
> > b/arch/powerpc/sysdev/fsl_msi.h index 8225f86..6bd5cfc 100644
> > --- a/arch/powerpc/sysdev/fsl_msi.h
> > +++ b/arch/powerpc/sysdev/fsl_msi.h
> > @@ -29,12 +29,19 @@ struct fsl_msi {
> > struct irq_domain *irqhost;
> >
> > unsigned long cascade_irq;
> > -
> > -   u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
> > +   dma_addr_t msiir; /* MSIIR Address in CCSR */
> 
> Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies that 
> it's
> the output of the DMA API, but I don't think the DMA API is used in the MSI
> driver.  Perhaps it should be, but we still want the raw physical address to
> pass on to VFIO.

Looking through the conversation I will make this phys_addr_t

> 
> > void __iomem *msi_regs;
> > u32 feature;
> > int msi_virqs[NR_MSI_REG];
> >
> > +   /*
> > +* During probe each bank is assigned a index number.
> > +* index number ranges from 0 to 2^32.
> > +* Example  MSI bank 1 = 0
> > +* MSI bank 2 = 1, and so on.
> > +*/
> > +   int bank_index;
> 
> 2^32 doesn't fit in "int" (nor does 2^32 - 1).

Right :(

> 
> Just say that indices start at 0.

Will correct this

Thanks
-Bharat

> 
> -Scott
> 
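
For readers following the bank_index discussion above, here is a minimal user-space sketch of the idea, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_t. The names bank_counter and next_bank_index() are hypothetical and are not part of the patch. One detail worth keeping in mind: the kernel's atomic_inc_return() returns the value after the increment, so a counter that starts at 0 would hand out 1 first. The sketch uses a fetch-and-add, which returns the old value, so that indices start at 0.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's static bank counter. */
atomic_int bank_counter;

/*
 * Hand out a unique index per call, even if several callers race.
 * atomic_fetch_add() returns the value held before the addition,
 * so the first caller gets 0, the next one 1, and so on.
 */
int next_bank_index(void)
{
	return atomic_fetch_add(&bank_counter, 1);
}
```

Each probe would call next_bank_index() once and store the result; no two callers can observe the same value.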



Re: [alsa-devel] [RFC/RFT v2 0/4] ALSA: hda - hdmi: ATI/AMD multi-channel and HBR support

2013-10-08 Thread Olivier Langlois

> 
> Then please test everything again. I.e.
>   o speaker-test -D hdmi:CARD=Generic,DEV=0 -c8 -r192000 -F S32_LE
> 
This works fine.

>   o Is there any difference seen
> with these, in the beginning/end (i.e. fade-out/in):
> speaker-test -D hdmi:CARD=Generic,DEV=0,AES0=0x04 -c2 -r48000
> speaker-test -D hdmi:CARD=Generic,DEV=0,AES0=0x06 -c2 -r48000
> 
I have the same result as before the last fix: AES0=0x04 works,
AES0=0x06 doesn't. speaker-test emits no error, but no sound comes
out of the speakers.

>   o Also, is there a difference in the beginning of these
> (maybe garbage sound and/or slightly slower startup?):
> aplay -Dhdmi:CARD=Generic,DEV=0,AES0=4 -r44100 -f s16_le -c2
> testi.dts.cut.spdif
> aplay -Dhdmi:CARD=Generic,DEV=0,AES0=6 -r44100 -f s16_le -c2
> testi.dts.cut.spdif

No garbage. Maybe a slightly slower startup with AES0=0x6.
> 
>   o Contents of /proc/asound/cardX/eld#0.0
> 
> Thanks a lot :)
> 
monitor_present   1
eld_valid         1
monitor_name      SC-09TX
connection_type   HDMI
eld_version       [0x2] CEA-861D or below
edid_version      [0x0] no CEA EDID Timing Extension block present
manufacture_id    0x2f41
product_id        0x0
port_id           0x105
support_hdcp      0
support_ai        0
audio_sync_delay  0
speakers          [0x4f] FL/FR LFE FC RL/RR RLC/RRC
sad_count         7
sad0_coding_type  [0x1] LPCM
sad0_channels     8
sad0_rates        [0x1ee0] 32000 44100 48000 88200 96000 176400 192000
sad0_bits         [0xe] 16 20 24
sad1_coding_type  [0x2] AC-3
sad1_channels     6
sad1_rates        [0xe0] 32000 44100 48000
sad1_max_bitrate  64
sad2_coding_type  [0x7] DTS
sad2_channels     7
sad2_rates        [0x6c0] 44100 48000 88200 96000
sad2_max_bitrate  1536000
sad3_coding_type  [0xa] E-AC-3/DD+ (Dolby Digital Plus)
sad3_channels     8
sad3_rates        [0xc0] 44100 48000
sad4_coding_type  [0xb] DTS-HD
sad4_channels     8
sad4_rates        [0x1ec0] 44100 48000 88200 96000 176400 192000
sad5_coding_type  [0xc] MLP (Dolby TrueHD)
sad5_channels     8
sad5_rates        [0x1ec0] 44100 48000 88200 96000 176400 192000
sad6_coding_type  [0xe] WMAPro
sad6_channels     8
sad6_rates        [0x6e0] 32000 44100 48000 88200 96000
sad6_profile      3




Re: [PATCH] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread sangjung.woo

On 10/09/2013 01:07 PM, Joe Perches wrote:
> On Wed, 2013-10-09 at 13:00 +0900, Sangjung Woo wrote:
>> In order to be free automatically and make the cleanup paths more
>> simple, use devm_kzalloc() instead of kzalloc().
> []
>> diff --git a/drivers/rtc/rtc-pl030.c b/drivers/rtc/rtc-pl030.c
> []
>> @@ -106,7 +106,7 @@ static int pl030_probe(struct amba_device *dev, const 
>> struct amba_id *id)
>>    if (ret)
>>        goto err_req;
>>
>> -  rtc = kmalloc(sizeof(*rtc), GFP_KERNEL);
>> +  rtc = devm_kzalloc(&dev->dev, sizeof(*rtc), GFP_KERNEL);

First of all, thanks for your review.

> You're not deleting a memset and you're
> converting a kmalloc.

You are right.

> Why do you need the zalloc version?

The key point of this patch is resource-managed memory allocation.
As you already know, memory allocated by devm_kzalloc() is
automatically freed on driver detach. That keeps the code tidy and
reduces the chance of forgetting to call kfree().
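
To illustrate the resource-managed idea in isolation, here is a small user-space sketch. It is not the kernel's devres implementation; struct owner, managed_zalloc() and owner_release_all() are hypothetical names chosen for this example. The owner plays the role of the struct device: every allocation is linked to it and released in one place when the owner goes away, so error paths and the remove path need no individual free calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every tracked allocation. */
struct managed_block {
	struct managed_block *next;
	_Alignas(max_align_t) unsigned char data[];	/* caller's memory */
};

/* Hypothetical owner object, playing the role of struct device. */
struct owner {
	struct managed_block *blocks;
	int nr_blocks;
};

/* Allocate zeroed memory whose lifetime is tied to the owner. */
void *managed_zalloc(struct owner *o, size_t size)
{
	struct managed_block *b = calloc(1, sizeof(*b) + size);

	if (!b)
		return NULL;
	b->next = o->blocks;
	o->blocks = b;
	o->nr_blocks++;
	return b->data;
}

/* Free everything still tracked; returns how many blocks were freed. */
int owner_release_all(struct owner *o)
{
	int freed = 0;

	while (o->blocks) {
		struct managed_block *b = o->blocks;

		o->blocks = b->next;
		free(b);
		freed++;
	}
	o->nr_blocks = 0;
	return freed;
}
```

A caller allocates with managed_zalloc() as often as needed and calls owner_release_all() once at teardown, which mirrors why the patch can drop the err_map label and both kfree() calls.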





Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Benjamin Herrenschmidt
On Tue, 2013-10-08 at 20:55 -0700, H. Peter Anvin wrote:
> Why not add a minimum number to pci_enable_msix(), i.e.:
> 
> pci_enable_msix(pdev, msix_entries, nvec, minvec)
> 
> ... which means "nvec" is the number of interrupts *requested*, and
> "minvec" is the minimum acceptable number (otherwise fail).

Which is exactly what Ben (the other Ben :-) suggested and that I
support...

Cheers,
Ben.




Re: [PATCH] thermal/intel_powerclamp: Add newer CPU models

2013-10-08 Thread Zhang Rui
On Thu, 2013-09-26 at 04:33 -0700, Jacob Pan wrote:
> This will enable intel_powerclamp driver on newer Intel CPUs
> including some Ivy Bridge and Haswell processors.
> 
> Signed-off-by: Jacob Pan 

applied to thermal -next.

thanks,
rui
> ---
>  drivers/thermal/intel_powerclamp.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/thermal/intel_powerclamp.c 
> b/drivers/thermal/intel_powerclamp.c
> index b40b37c..6e3b061 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -675,6 +675,11 @@ static const struct x86_cpu_id intel_powerclamp_ids[] = {
>   { X86_VENDOR_INTEL, 6, 0x2e},
>   { X86_VENDOR_INTEL, 6, 0x2f},
>   { X86_VENDOR_INTEL, 6, 0x3a},
> + { X86_VENDOR_INTEL, 6, 0x3c},
> + { X86_VENDOR_INTEL, 6, 0x3e},
> + { X86_VENDOR_INTEL, 6, 0x3f},
> + { X86_VENDOR_INTEL, 6, 0x45},
> + { X86_VENDOR_INTEL, 6, 0x46},
>   {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, intel_powerclamp_ids);




Re: [f2fs-dev] [PATCH v2] f2fs: avoid congestion_wait when do_checkpoint for better performance

2013-10-08 Thread Gu Zheng
Hi Yuan,
On 10/08/2013 07:30 PM, Yuan Zhong wrote:

> Hi Gu,
> 
>> Hi Yuan,
>> On 10/08/2013 04:30 PM, Yuan Zhong wrote:
> 
>>> Previously, do_checkpoint() will call congestion_wait() for waiting the 
>>> pages (previous submitted node/meta/data pages) to be written back.
>>> Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for 
>>> waiting.
>>> For this reason, there is a situation that after the pages have been 
>>> written back, 
>>> but the checkpoint thread still wait for congestion_wait to exit.
> 
>> How do you confirm this issue? 
> 
>   I traced the execution path.
>   In f2fs_end_io_write, dec_page_count(p->sbi, F2FS_WRITEBACK) will be called.
>   And I found that, when pages of F2FS_WRITEBACK has been zero, but
>   checkpoint thread still congestion_wait for pages of F2FS_WRITEBACK to be 
> zero.

Yes, that may be. congestion_wait() adds the task to a global wait queue that is
shared by all backing devices, so even after F2FS_WRITEBACK has dropped to zero,
other I/O may still be in flight.
Anyway, using a private wait queue is a better choice. :)


>   So, I think this point could be improved.
>   And I wrote a simple test case and tested on Micro-SD card, the steps as 
> following:
>   (a) create a fixed-size file (4KB)
>   (b) go on to sync the file 
>   (c) go back to step #a (fixed numbers of cycling:1024)  
>The results indicated that the execution time is reduced greatly by using 
> this patch.

Yes, the change is an improvement if the issue exists.

  
> 
> 
>> I suspect that the block-core does not have a wake-up mechanism
>> when the back device is uncongested.
> 
> 
>   Yes, you are right.
>   So I wake up the checkpoint thread by myself, when pages of F2FS_WRITEBACK 
> to be zero.
>   In f2fs_end_io_write, f2fs_writeback_wake is called.
>   You could find this code in my patch. 

Saw it.:)
But one problem is that the checkpoint routine is always a singleton, so the wait
queue would only ever serve one waiter, which does not seem worth it. How about
just scheduling and waking it up directly? See the following patch.

Signed-off-by: Gu Zheng 
---
 fs/f2fs/checkpoint.c |   11 +--
 fs/f2fs/f2fs.h   |1 +
 fs/f2fs/segment.c|4 
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d808827..2a5999d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -757,8 +757,15 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool 
is_umount)
f2fs_put_page(cp_page, 1);
 
/* wait for previous submitted node/meta pages writeback */
-   while (get_pages(sbi, F2FS_WRITEBACK))
-   congestion_wait(BLK_RW_ASYNC, HZ / 50);
+   sbi->cp_task = current;
+   while (get_pages(sbi, F2FS_WRITEBACK)) {
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   if (!get_pages(sbi, F2FS_WRITEBACK))
+   break;
+   io_schedule();
+   }
+   __set_current_state(TASK_RUNNING);
+   sbi->cp_task = NULL;
 
filemap_fdatawait_range(sbi->node_inode->i_mapping, 0, LONG_MAX);
filemap_fdatawait_range(sbi->meta_inode->i_mapping, 0, LONG_MAX);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a955a59..408ace7 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -365,6 +365,7 @@ struct f2fs_sb_info {
struct mutex writepages;/* mutex for writepages() */
int por_doing;  /* recovery is doing or not */
int on_build_free_nids; /* build_free_nids is doing */
+   struct task_struct *cp_task;/* checkpoint task */
 
/* for orphan inode management */
struct list_head orphan_inode_list; /* orphan inode list */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index bd79bbe..3b20359 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -597,6 +597,10 @@ static void f2fs_end_io_write(struct bio *bio, int err)
 
if (p->is_sync)
complete(p->wait);
+
+   if (!get_pages(p->sbi, F2FS_WRITEBACK) && p->sbi->cp_task)
+   wake_up_process(p->sbi->cp_task);
+
kfree(p);
bio_put(bio);
 }
-- 
1.7.7

Regards,
Gu 

> 
> 
>>> This is a problem here, especially, when sync a large number of small files 
>>> or dirs.
>>> In order to avoid this, a wait_list is introduced, 
>>> the checkpoint thread will be dropped into the wait_list if the pages have 
>>> not been written back, 
>>> and will be waked up by contrast.
> 
>> Please pay some attention to the mail form, this mail is out of format in my 
>> mail client.
> 
>> Regards,
>> Gu
> 
> Regards,
> Yuan
> 
>>>
>>> Signed-off-by: Yuan Zhong 
>>> ---  
>>>  fs/f2fs/checkpoint.c |3 +--
>>>  fs/f2fs/f2fs.h   |   19 +++
>>>  fs/f2fs/segment.c|1 +
>>>  fs/f2fs/super.c  |1 +
>>>  4 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c 

Re: [PATCH] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread Joe Perches
On Wed, 2013-10-09 at 13:00 +0900, Sangjung Woo wrote:
> In order to be free automatically and make the cleanup paths more
> simple, use devm_kzalloc() instead of kzalloc().
[]
> diff --git a/drivers/rtc/rtc-pl030.c b/drivers/rtc/rtc-pl030.c
[]
> @@ -106,7 +106,7 @@ static int pl030_probe(struct amba_device *dev, const 
> struct amba_id *id)
>   if (ret)
>   goto err_req;
>  
> - rtc = kmalloc(sizeof(*rtc), GFP_KERNEL);
> + rtc = devm_kzalloc(&dev->dev, sizeof(*rtc), GFP_KERNEL);

You're not deleting a memset and you're
converting a kmalloc.

Why do you need the zalloc version?



[PATCH] rtc: pl030: Use devm_kzalloc() instead of kmalloc()

2013-10-08 Thread Sangjung Woo
In order to be free automatically and make the cleanup paths more
simple, use devm_kzalloc() instead of kzalloc().

Signed-off-by: Sangjung Woo 
---
 drivers/rtc/rtc-pl030.c |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/rtc/rtc-pl030.c b/drivers/rtc/rtc-pl030.c
index 22bacdb..ecd57c6 100644
--- a/drivers/rtc/rtc-pl030.c
+++ b/drivers/rtc/rtc-pl030.c
@@ -106,7 +106,7 @@ static int pl030_probe(struct amba_device *dev, const 
struct amba_id *id)
if (ret)
goto err_req;
 
-   rtc = kmalloc(sizeof(*rtc), GFP_KERNEL);
+   rtc = devm_kzalloc(&dev->dev, sizeof(*rtc), GFP_KERNEL);
if (!rtc) {
ret = -ENOMEM;
goto err_rtc;
@@ -115,7 +115,7 @@ static int pl030_probe(struct amba_device *dev, const 
struct amba_id *id)
rtc->base = ioremap(dev->res.start, resource_size(&dev->res));
if (!rtc->base) {
ret = -ENOMEM;
-   goto err_map;
+   goto err_rtc;
}
 
__raw_writel(0, rtc->base + RTC_CR);
@@ -141,8 +141,6 @@ static int pl030_probe(struct amba_device *dev, const 
struct amba_id *id)
free_irq(dev->irq[0], rtc);
  err_irq:
iounmap(rtc->base);
- err_map:
-   kfree(rtc);
  err_rtc:
amba_release_regions(dev);
  err_req:
@@ -160,7 +158,6 @@ static int pl030_remove(struct amba_device *dev)
free_irq(dev->irq[0], rtc);
rtc_device_unregister(rtc->rtc);
iounmap(rtc->base);
-   kfree(rtc);
amba_release_regions(dev);
 
return 0;
-- 
1.7.9.5



Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread H. Peter Anvin
On 10/02/2013 03:29 AM, Alexander Gordeev wrote:
> 
> As result, device drivers will cease to use the overcomplicated
> repeated fallbacks technique and resort to a straightforward
> pattern - determine the number of MSI/MSI-X interrupts required
> before calling pci_enable_msi_block() and pci_enable_msix()
> interfaces:
> 
> 
>   rc = pci_msix_table_size(adapter->pdev);
>   if (rc < 0)
>   return rc;
> 
>   nvec = min(nvec, rc);
>   if (nvec < FOO_DRIVER_MINIMUM_NVEC)
>   return -ENOSPC;
> 
>   for (i = 0; i < nvec; i++)
>   adapter->msix_entries[i].entry = i;
> 
>   rc = pci_enable_msix(adapter->pdev,
>adapter->msix_entries, nvec);
>   return rc;
> 

Why not add a minimum number to pci_enable_msix(), i.e.:

pci_enable_msix(pdev, msix_entries, nvec, minvec)

... which means "nvec" is the number of interrupts *requested*, and
"minvec" is the minimum acceptable number (otherwise fail).

-hpa
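
The requested/minimum semantics proposed above can be sketched as a small stand-alone helper. This is only an illustration of the decision logic, not a kernel API: choose_nvec() and its parameters are hypothetical, and table_size stands for whatever pci_msix_table_size() would report in the quoted pattern.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: decide how many vectors to enable.
 *   table_size - vectors the device offers, or a negative errno
 *   nvec       - number of interrupts requested
 *   minvec     - minimum acceptable number
 * Returns the count to enable, or a negative errno on failure.
 */
int choose_nvec(int table_size, int nvec, int minvec)
{
	if (table_size < 0)
		return table_size;	/* propagate the lookup error */
	if (nvec > table_size)
		nvec = table_size;	/* cannot get more than exists */
	if (nvec < minvec)
		return -ENOSPC;		/* below what the driver needs */
	return nvec;
}
```

With this shape, the clamp and the minimum check live in one place instead of being repeated in every driver, which is the point of folding minvec into the call.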



Re: [PATCH v2 3/3] ARM: shmobile: kzm9d: Use common clock framework

2013-10-08 Thread Simon Horman
On Tue, Oct 08, 2013 at 02:34:03PM +0900, takas...@ops.dti.ne.jp wrote:
> Use common clock framework version of clock
>  drivers/clk/shmobile/clk-emev2.c
> instead of sh-clkfwk version
>  arch/arm/mach-shmobile/clock-emev2.c
> when it is configured as a part of multi-platform.
> 
> Signed-off-by: Takashi Yoshii 

Thanks.

I plan to add this patch to a new topic branch,
topic/emev2-common-clock, in the renesas tree and
queue it up from there for inclusion in mainline
if/when the first patch of this series is accepted
by Mike Turquette.


Re: [PATCH v2 2/3] ARM: shmobile: emev2: Add clock tree description in DT

2013-10-08 Thread Simon Horman
On Tue, Oct 08, 2013 at 02:54:26PM +0900, Magnus Damm wrote:
> On Tue, Oct 8, 2013 at 2:33 PM,   wrote:
> > Add minimum clock tree description to .dts file.
> > This provides same set of clocks as current sh-clkfwk version .c
> > code does.
> >
> > Signed-off-by: Takashi Yoshii 

Thanks.

I plan to add this patch to a new topic branch,
topic/emev2-common-clock, in the renesas tree and
queue it up from there for inclusion in mainline
if/when the first patch of this series is accepted
by Mike Turquette.


Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

2013-10-08 Thread Andi Kleen
Some more comments.

>   - your patches might break apps/ABI

Can you please explain that a bit more? We have a lot of CONFIG options
that disable syscalls, /sys, lots of stuff. Whoever uses them needs
to know what they are doing. I thought it was pretty
much the consensus that Linux is supposed to be a very configurable OS,
which people can tailor from small embedded to large kitchen sink
included using CONFIG_*. 

Are you saying that Linux should not be configurable small to big? (I would
find that hard to believe). If that's really your standpoint I would
like to see some confirmation on this, as it would seem a big 
departure from traditional practice.

Or is the concern about the defaults, i.e. that you want it default y
or behind EXPERT? That sounds reasonable.

Or should it be more modular, like Peter pointed out? (That
would seem like a good solution for generic distros, but not so
good for deeply embedded, like running on Quark.)

BTW, afaik pretty much every other architecture still allows disabling
it; just x86 has this dependency loop problem.

> 
>   - your patch-set unnecessarily complicates things, making the kernel
> less maintainable

I actually simplified some things, like unnecessary
dependencies between perf and profile.

These should be applied in any case as they are independent.
I can repost them.

Given some of the ifdefs/configs were not nice, perhaps there's a better
solution for this from Frederic.

-Andi


Re: [PATCH v2 1/3] clk: emev2: Add support for emev2 SMU clocks with DT

2013-10-08 Thread Simon Horman
On Tue, Oct 08, 2013 at 02:58:08PM +0900, Magnus Damm wrote:
> On Tue, Oct 8, 2013 at 2:32 PM,   wrote:
> > Device tree clock binding document for EMMA Mobile EV2 SMU,
> > And Common clock framework based implementation of it.
> > Following nodes are defined to describe clock tree.
> > - renesas,emev2-smu
> > - renesas,emev2-smu-clkdiv
> > - renesas,emev2-smu-gclk
> >
> > These bindings are designed manually based on
> >  19UH0037EJ1000_SMU : System Management Unit User's Manual
> >
> > So far, reparent is not implemented, and is fixed to index #0.
> > Clock tree description is not included, and should be provided
> > by device-tree.
> >
> > Signed-off-by: Takashi Yoshii 
> > ---
> >  .../devicetree/bindings/clock/emev2-clock.txt  |  98 
> > +++
> >  drivers/clk/Makefile   |   1 +
> >  drivers/clk/shmobile/Makefile  |   3 +
> >  drivers/clk/shmobile/clk-emev2.c   | 104 
> > +
> >  4 files changed, 206 insertions(+)
> 
> Thanks for cleaning up the Makefile bits, Yoshii-san.
> 
> This patch and the bindings look fine to me from a SoC point of view.
> Using these together with the topology information in emev2.dtsi makes
> it possible for us to use CCF and multiplatform as expected on the
> EMEV2 SoC.
> 
> Acked-by: Magnus Damm 

Thanks. I will send a pull request for this change to Mike Turquette.


Re: [PATCH] lockstat: report avg wait and hold times

2013-10-08 Thread Davidlohr Bueso
On Thu, 2013-10-03 at 14:15 +0200, Ingo Molnar wrote:
> * Davidlohr Bueso  wrote:
[...]
> > --- a/kernel/lockdep_proc.c
> > +++ b/kernel/lockdep_proc.c
> > @@ -421,6 +421,7 @@ static void seq_lock_time(struct seq_file *m, struct 
> > lock_time *lt)
> > seq_time(m, lt->min);
> > seq_time(m, lt->max);
> > seq_time(m, lt->total);
> > +   seq_time(m, lt->nr ? lt->total/lt->nr : 0);
> 
> That won't build on 32-bit systems as lt->total is s64.
> 
> You'll need to utilize do_div().

Oops, thanks for pointing that out. Below is v2.

8<-
From: Davidlohr Bueso 
Subject: [PATCH v2] lockstat: Report avg wait and hold times

While both the nr and total times are shown, having the avg
lock hold and wait times shown in the report is quite useful when
working on performance related issues. Furthermore, I find
myself constantly doing the calculations manually.

In addition, some of the documentation examples were changed to
easily update them to show the two new columns. No textual
change otherwise, as descriptions match the lockstat output.

Signed-off-by: Davidlohr Bueso 
Cc: as...@hp.com
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1380746928.2313.14.ca...@buesod1.americas.hpqcorp.net
[ Fixlets: changed a seq_printf() to seq_puts(), converted spaces to tabs. ]
Signed-off-by: Ingo Molnar 

Signed-off-by: Davidlohr Bueso 
---
 Documentation/lockstat.txt | 123 ++---
 kernel/lockdep_proc.c  |  15 +++---
 2 files changed, 70 insertions(+), 68 deletions(-)

diff --git a/Documentation/lockstat.txt b/Documentation/lockstat.txt
index dd2f7b2..72d0106 100644
--- a/Documentation/lockstat.txt
+++ b/Documentation/lockstat.txt
@@ -46,16 +46,14 @@ With these hooks we provide the following statistics:
  contentions   - number of lock acquisitions that had to wait
  wait time min - shortest (non-0) time we ever had to wait for a lock
max - longest time we ever had to wait for a lock
-   total   - total time we spend waiting on this lock
+  total   - total time we spend waiting on this lock
+  avg - average time spent waiting on this lock
  acq-bounces   - number of lock acquisitions that involved x-cpu data
  acquisitions  - number of times we took the lock
  hold time min - shortest (non-0) time we ever held the lock
-   max - longest time we ever held the lock
-   total   - total time this lock was held
-
-From these number various other statistics can be derived, such as:
-
- hold time average = hold time total / acquisitions
+  max - longest time we ever held the lock
+  total   - total time this lock was held
+  avg - average time this lock was held
 
 These numbers are gathered per lock class, per read/write state (when
 applicable).
@@ -84,37 +82,38 @@ Look at the current lock statistics:
 
 # less /proc/lock_stat
 
-01 lock_stat version 0.3
-02 
---
-03   class namecon-bouncescontentions   
waittime-min   waittime-max waittime-totalacq-bounces   acquisitions   
holdtime-min   holdtime-max holdtime-total
-04 
---
+01 lock_stat version 0.4
+02-
+03  class namecon-bouncescontentions   
waittime-min   waittime-max waittime-total   waittime-avgacq-bounces   
acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
+04-
 05
-06  >mmap_sem-W:   233538 
18446744073708   22924.27  607243.51   1342  45806  
 1.718595.89 1180582.34
-07  >mmap_sem-R:   205587 
18446744073708   28403.36  731975.00   1940 412426  
 0.58  187825.45 6307502.88
-08  ---
-09>mmap_sem487  
[] do_page_fault+0x466/0x928
-10>mmap_sem179  
[] sys_mprotect+0xcd/0x21d
-11>mmap_sem279  
[] 

Re: [PATCH 00/16] sched/wait: Collapse __wait_event macros -v5

2013-10-08 Thread Paul E. McKenney
On Tue, Oct 08, 2013 at 08:28:43PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 08, 2013 at 01:40:56PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 08, 2013 at 09:47:18PM +0200, Ingo Molnar wrote:

[ . . . ]

> > > > Should I be thinking about making a kernel/rcu?
> > > 
> > > I wanted to raise it with you at the KS :-)
> > 
> > Sorry for jumping the gun.  ;-)
> > 
> > > To me it would sure look nice to have kernel/rcu/tree.c, 
> > > kernel/rcu/tiny.c, kernel/rcu/core.c, etc.
> > > 
> > > [ ... and we would certainly also break new ground by introducing a
> > >   "torture.c" file, for the first time in Linux kernel history! ;-) ]
> > 
> > Ooh...  I had better act fast!  ;-)
> > 
> > > But it's really your call, this is something you should only do if you 
> > > are 
> > > comfortable with it.
> > 
> > I have actually been thinking about it off and on for some time.
> 
> And here is a first cut.  Just the renaming and needed adjustments,
> no splitting or merging of files.
> 
> Thoughts?

Wow!  I rebased my commits destined for 3.14 on top of this, and "git
rebase" did it with several protests, but with no manual intervention
required.

Now if it actually still builds, boots, and runs...  ;-)

Thanx, Paul



Re: [PATCH 00/16] sched/wait: Collapse __wait_event macros -v5

2013-10-08 Thread Paul E. McKenney
On Tue, Oct 08, 2013 at 01:40:56PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 08, 2013 at 09:47:18PM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney  wrote:
> > 
> > > On Tue, Oct 08, 2013 at 12:23:31PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * Peter Zijlstra  wrote:
> > > > 
> > > > > On Sat, Oct 05, 2013 at 10:04:16AM +0200, Ingo Molnar wrote:
> > > > > > 
> > > > > > * Peter Zijlstra  wrote:
> > > > > > 
> > > > > > > On Fri, Oct 04, 2013 at 10:44:05PM +0200, Peter Zijlstra wrote:
> > > > > > > > 
> > > > > > > > slightly related; do we want to do something like the following 
> > > > > > > > two
> > > > > > > > patches?
> > > > > > > 
> > > > > > > and
> > > > > > 
> > > > > > Yeah, both look good to me - but I'd move them into 
> > > > > > kernel/sched/completion.c and kernel/sched/wait.c if no-one objects.
> > > > > 
> > > > > Do you also want to suck in semaphore.c mutex.c rwsem.c spinlock.c 
> > > > > etc? 
> > > > > Or do you want to create something like kernel/locking/ for all that.
> > > > 
> > > > Yeah, I think kernel/locking/ would be a suitable place for those, and 
> > > > I'd 
> > > > move lockdep*.c there too. (Such things are best done near the end of a 
> > > > merge window, when there's not much pending, to not disrupt 
> > > > development.)
> > > > 
> > > > kernel/*.c is a pretty crowded place with 100+ files currently, I've 
> > > > been 
> > > > gradually working towards depopulating it slowly but surely for 
> > > > subsystems 
> > > > that I co-maintain or where I'm frequently active. We already have:
> > > > 
> > > >   kernel/sched/
> > > >   kernel/events/
> > > >   kernel/irq/
> > > >   kernel/time/
> > > >   kernel/trace/
> > > > 
> > > > and the deeper kernel/*/* hierarchies already host another ~100 .c 
> > > > files. 
> > > > So the transition is half done already I suspect.
> > > 
> > > Should I be thinking about making a kernel/rcu?
> > 
> > I wanted to raise it with you at the KS :-)
> 
> Sorry for jumping the gun.  ;-)
> 
> > To me it would sure look nice to have kernel/rcu/tree.c, 
> > kernel/rcu/tiny.c, kernel/rcu/core.c, etc.
> > 
> > [ ... and we would certainly also break new ground by introducing a
> >   "torture.c" file, for the first time in Linux kernel history! ;-) ]
> 
> Ooh...  I had better act fast!  ;-)
> 
> > But it's really your call, this is something you should only do if you are 
> > comfortable with it.
> 
> I have actually been thinking about it off and on for some time.

And here is a first cut.  Just the renaming and needed adjustments,
no splitting or merging of files.

Thoughts?

Thanx, Paul



diff --git a/kernel/Makefile b/kernel/Makefile
index 1ce47553fb02..f99d908b5550 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -6,9 +6,9 @@ obj-y = fork.o exec_domain.o panic.o \
cpu.o exit.o itimer.o time.o softirq.o resource.o \
sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
-   rcupdate.o extable.o params.o posix-timers.o \
+   extable.o params.o posix-timers.o \
kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
-   hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
+   hrtimer.o rwsem.o nsproxy.o semaphore.o \
notifier.o ksysfs.o cred.o reboot.o \
async.o range.o groups.o lglock.o smpboot.o
 
@@ -27,6 +27,7 @@ obj-y += power/
 obj-y += printk/
 obj-y += cpu/
 obj-y += irq/
+obj-y += rcu/
 
 obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
@@ -81,12 +82,6 @@ obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
-obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
-obj-$(CONFIG_TREE_RCU) += rcutree.o
-obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
-obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
-obj-$(CONFIG_TINY_RCU) += rcutiny.o
-obj-$(CONFIG_TINY_PREEMPT_RCU) += rcutiny.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
new file mode 100644
index ..01e9ec37a3e3
--- /dev/null
+++ b/kernel/rcu/Makefile
@@ -0,0 +1,6 @@
+obj-y += update.o srcu.o
+obj-$(CONFIG_RCU_TORTURE_TEST) += torture.o
+obj-$(CONFIG_TREE_RCU) += tree.o
+obj-$(CONFIG_TREE_PREEMPT_RCU) += tree.o
+obj-$(CONFIG_TREE_RCU_TRACE) += tree_trace.o
+obj-$(CONFIG_TINY_RCU) += tiny.o
diff --git a/kernel/rcu.h b/kernel/rcu/rcu.h
similarity index 100%
rename from kernel/rcu.h
rename to kernel/rcu/rcu.h
diff --git a/kernel/srcu.c b/kernel/rcu/srcu.c
similarity index 100%
rename from kernel/srcu.c
rename to kernel/rcu/srcu.c
diff --git a/kernel/rcutiny.c b/kernel/rcu/tiny.c
similarity index 99%
rename from 

[PATCH 1/3] perf util: Add findnew method to intlist - v2

2013-10-08 Thread David Ahern
Similar to other findnew based methods if the requested
object is not found, add it to the list.

v2: followed format of other findnew methods per acme's request

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/util/intlist.c |   22 ++
 tools/perf/util/intlist.h |1 +
 tools/perf/util/rblist.c  |   27 ---
 tools/perf/util/rblist.h  |1 +
 4 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/intlist.c b/tools/perf/util/intlist.c
index 826d7b38c9a0..89715b64a315 100644
--- a/tools/perf/util/intlist.c
+++ b/tools/perf/util/intlist.c
@@ -58,22 +58,36 @@ void intlist__remove(struct intlist *ilist, struct int_node *node)
 	rblist__remove_node(&ilist->rblist, &node->rb_node);
 }
 
-struct int_node *intlist__find(struct intlist *ilist, int i)
+static struct int_node *__intlist__findnew(struct intlist *ilist,
+  int i, bool create)
 {
-   struct int_node *node;
+   struct int_node *node = NULL;
struct rb_node *rb_node;
 
if (ilist == NULL)
return NULL;
 
-   node = NULL;
-	rb_node = rblist__find(&ilist->rblist, (void *)((long)i));
+	if (create)
+		rb_node = rblist__findnew(&ilist->rblist, (void *)((long)i));
+	else
+		rb_node = rblist__find(&ilist->rblist, (void *)((long)i));
+
if (rb_node)
node = container_of(rb_node, struct int_node, rb_node);
 
return node;
 }
 
+struct int_node *intlist__find(struct intlist *ilist, int i)
+{
+   return __intlist__findnew(ilist, i, false);
+}
+
+struct int_node *intlist__findnew(struct intlist *ilist, int i)
+{
+   return __intlist__findnew(ilist, i, true);
+}
+
 static int intlist__parse_list(struct intlist *ilist, const char *s)
 {
char *sep;
diff --git a/tools/perf/util/intlist.h b/tools/perf/util/intlist.h
index 0eb00ac39e01..aa6877d36858 100644
--- a/tools/perf/util/intlist.h
+++ b/tools/perf/util/intlist.h
@@ -24,6 +24,7 @@ int intlist__add(struct intlist *ilist, int i);
 
 struct int_node *intlist__entry(const struct intlist *ilist, unsigned int idx);
 struct int_node *intlist__find(struct intlist *ilist, int i);
+struct int_node *intlist__findnew(struct intlist *ilist, int i);
 
 static inline bool intlist__has_entry(struct intlist *ilist, int i)
 {
diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index a16cdd2625ad..0dfe27d99458 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -48,10 +48,12 @@ void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node)
rblist->node_delete(rblist, rb_node);
 }
 
-struct rb_node *rblist__find(struct rblist *rblist, const void *entry)
+static struct rb_node *__rblist__findnew(struct rblist *rblist,
+const void *entry,
+bool create)
 {
 	struct rb_node **p = &rblist->entries.rb_node;
-   struct rb_node *parent = NULL;
+   struct rb_node *parent = NULL, *new_node = NULL;
 
while (*p != NULL) {
int rc;
@@ -67,7 +69,26 @@ struct rb_node *rblist__find(struct rblist *rblist, const void *entry)
return parent;
}
 
-   return NULL;
+	if (create) {
+		new_node = rblist->node_new(rblist, entry);
+		if (new_node) {
+			rb_link_node(new_node, parent, p);
+			rb_insert_color(new_node, &rblist->entries);
+			++rblist->nr_entries;
+		}
+	}
+
+   return new_node;
+}
+
+struct rb_node *rblist__find(struct rblist *rblist, const void *entry)
+{
+   return __rblist__findnew(rblist, entry, false);
+}
+
+struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry)
+{
+   return __rblist__findnew(rblist, entry, true);
 }
 
 void rblist__init(struct rblist *rblist)
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 6d0cae5ae83d..ff9913b994c2 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -32,6 +32,7 @@ void rblist__delete(struct rblist *rblist);
 int rblist__add_node(struct rblist *rblist, const void *new_entry);
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node);
 struct rb_node *rblist__find(struct rblist *rblist, const void *entry);
+struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry);
 struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx);
 
 static inline bool rblist__empty(const struct rblist *rblist)
-- 
1.7.10.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
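
The find/findnew split in the patch above can be sketched outside perf. This is a minimal illustration, assuming a plain unbalanced binary search tree instead of perf's rblist; demo_node, demo_tree and the demo_* functions are hypothetical names, not part of tools/perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node type: an unbalanced binary search tree keyed by int. */
struct demo_node {
	int key;
	struct demo_node *left, *right;
};

struct demo_tree {
	struct demo_node *root;
	unsigned int nr_entries;
};

/* Shared walk: return the matching node, or insert one when create is set. */
static struct demo_node *demo__findnew(struct demo_tree *tree, int key,
				       bool create)
{
	struct demo_node **p;

	assert(tree != NULL);
	p = &tree->root;

	while (*p != NULL) {
		if (key < (*p)->key)
			p = &(*p)->left;
		else if (key > (*p)->key)
			p = &(*p)->right;
		else
			return *p;	/* already present */
	}

	if (!create)
		return NULL;

	/* Not found: link a new node at the slot where the walk ended. */
	*p = calloc(1, sizeof(**p));
	if (*p != NULL) {
		(*p)->key = key;
		tree->nr_entries++;
	}
	return *p;
}

/* Lookup only: returns NULL when the key is absent. */
struct demo_node *demo_tree__find(struct demo_tree *tree, int key)
{
	return demo__findnew(tree, key, false);
}

/* Lookup, adding the key when it is absent. */
struct demo_node *demo_tree__findnew(struct demo_tree *tree, int key)
{
	return demo__findnew(tree, key, true);
}
```

Both public functions are thin wrappers, so the tree walk exists in one place only, and a second findnew call for the same key returns the node created by the first.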


[PATCH 3/3] perf record: mmap output file

2013-10-08 Thread David Ahern
When recording raw_syscalls for the entire system, e.g.,
perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1

you end up with a feedback loop, as perf itself calls
write() fairly often. This patch handles the problem by mmap'ing the
file in chunks of 64M at a time and copying events from the event buffers
to the file, avoiding write system calls.

Before (with write syscall):

perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
[ perf record: Woken up 0 times to write data ]
[ perf record: Captured and wrote 81.843 MB /tmp/perf.data (~3575786 samples) ]

After (using mmap):

perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
[ perf record: Woken up 31 times to write data ]
[ perf record: Captured and wrote 8.203 MB /tmp/perf.data (~358388 samples) ]

In addition to the perf-trace benefits, using mmap lowers the overhead of
perf-record. For example,

  perf stat -i -- perf record -g -o /tmp/perf.data openssl speed aes

shows that time, CPU cycles, and instructions all drop by more than a
factor of 3. Jiri also ran a test that showed a big improvement.

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Mike Galbraith 
Cc: Stephane Eranian 
---
 tools/perf/builtin-record.c |   87 +++
 1 file changed, 87 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index da1384012505..45bb565e0bb1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,6 +29,9 @@
 #include 
 #include 
 
+/* mmap file big chunks at a time */
+#define MMAP_OUTPUT_SIZE   (64*1024*1024)
+
 #ifndef HAVE_ON_EXIT
 #ifndef ATEXIT_MAX
 #define ATEXIT_MAX 32
@@ -64,6 +67,14 @@ static void __handle_on_exit_funcs(void)
 struct perf_record {
struct perf_tooltool;
struct perf_record_opts opts;
+
+   /* for MMAP based file writes */
+   void*mmap_addr;
+   u64 bytes_at_mmap_start; /* bytes in file when mmap use starts */
+   u64 mmap_offset; /* current location within mmap */
+   size_t  mmap_size;  /* size of mmap segments */
+   booluse_mmap;
+
u64 bytes_written;
const char  *output_name;
struct perf_evlist  *evlist;
@@ -82,8 +93,66 @@ static void advance_output(struct perf_record *rec, size_t size)
rec->bytes_written += size;
 }
 
+static int do_mmap_output(struct perf_record *rec, void *buf, size_t size)
+{
+   u64 remaining;
+   off_t offset;
+
+   if (rec->mmap_addr == NULL) {
+do_mmap:
+   offset = rec->bytes_at_mmap_start + rec->bytes_written;
+   if (offset < (ssize_t) rec->mmap_size) {
+   rec->mmap_offset = offset;
+   offset = 0;
+   } else
+   rec->mmap_offset = 0;
+
+   rec->mmap_addr = mmap(NULL, rec->mmap_size,
+PROT_WRITE | PROT_READ,
+MAP_SHARED,
+rec->output,
+offset);
+
+   if (rec->mmap_addr == MAP_FAILED) {
+   pr_err("mmap failed: %d: %s\n", errno, strerror(errno));
+   return -1;
+   }
+
+   /* expand file to include this mmap segment */
+   if (ftruncate(rec->output, offset + rec->mmap_size) != 0) {
+   pr_err("ftruncate failed\n");
+   return -1;
+   }
+   }
+
+   remaining = rec->mmap_size - rec->mmap_offset;
+
+   if (size > remaining) {
+   memcpy(rec->mmap_addr + rec->mmap_offset, buf, remaining);
+   rec->bytes_written += remaining;
+
+   size -= remaining;
+   buf  += remaining;
+
+   msync(rec->mmap_addr, rec->mmap_size, MS_ASYNC);
+   munmap(rec->mmap_addr, rec->mmap_size);
+   goto do_mmap;
+   }
+
+   if (size) {
+   memcpy(rec->mmap_addr + rec->mmap_offset, buf, size);
+   rec->bytes_written += size;
+   rec->mmap_offset += size;
+   }
+
+   return 0;
+}
+
 static int write_output(struct perf_record *rec, void *buf, size_t size)
 {
+   if (rec->use_mmap)
+   return do_mmap_output(rec, buf, size);
+
while (size) {
int ret = write(rec->output, buf, size);
 
@@ -546,6 +615,11 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
if (forks)
perf_evlist__start_workload(evsel_list);
 
+   if (!rec->opts.pipe_output && stat(output_name, ) == 0) {
+   rec->use_mmap = true;
+   
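
The chunked-mmap idea described in the commit message can be sketched in userspace as follows. This is an independent illustration, not the code from the patch: it assumes output starts at file offset 0, that the chunk size is a multiple of the page size, and that the descriptor was opened read-write. The mmap_out structure and its functions are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writer state; chunk must be a multiple of the page size. */
struct mmap_out {
	int fd;
	size_t chunk;		/* size of each mapped window */
	off_t win_start;	/* file offset of the current window */
	size_t win_used;	/* bytes already copied into the window */
	char *addr;		/* current mapping, or NULL */
	off_t written;		/* total bytes written so far */
};

/* Drop the current window, if any, and advance past it. */
static void mmap_out_unmap(struct mmap_out *o)
{
	if (o->addr != NULL) {
		munmap(o->addr, o->chunk);
		o->addr = NULL;
		o->win_start += (off_t)o->chunk;
		o->win_used = 0;
	}
}

/* Grow the file to cover the next window, then map that window. */
static int mmap_out_map(struct mmap_out *o)
{
	void *p;

	if (ftruncate(o->fd, o->win_start + (off_t)o->chunk) != 0)
		return -1;

	p = mmap(NULL, o->chunk, PROT_READ | PROT_WRITE, MAP_SHARED,
		 o->fd, o->win_start);
	if (p == MAP_FAILED)
		return -1;

	o->addr = p;
	return 0;
}

/* Copy size bytes into the file through the mapping; no write() calls. */
int mmap_out_write(struct mmap_out *o, const void *buf, size_t size)
{
	const char *src = buf;

	assert(o != NULL && o->chunk > 0);

	while (size > 0) {
		size_t room, n;

		/* Window full: release it so the next one gets mapped. */
		if (o->addr != NULL && o->win_used == o->chunk)
			mmap_out_unmap(o);
		if (o->addr == NULL && mmap_out_map(o) != 0)
			return -1;

		room = o->chunk - o->win_used;
		n = size < room ? size : room;
		memcpy(o->addr + o->win_used, src, n);
		o->win_used += n;
		o->written += (off_t)n;
		src += n;
		size -= n;
	}
	return 0;
}

/* Unmap and trim the file back to the bytes actually written. */
int mmap_out_finish(struct mmap_out *o)
{
	if (o->addr != NULL) {
		munmap(o->addr, o->chunk);
		o->addr = NULL;
	}
	return ftruncate(o->fd, o->written);
}
```

Because the file is extended one window at a time, the final ftruncate() in mmap_out_finish() is needed so that the file does not end with unused zero bytes from the last window.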

[PATCH 2/3] perf trace: Add summary option to dump syscall statistics

2013-10-08 Thread David Ahern
When enabled, dumps a summary of all syscalls by task with the usual
statistics -- min, max, average and relative stddev. For example,

make - 26341 :   3344   [ 17.4% ]  0.000 ms

            read :     52     0.000      4.802      0.644    30.08
           write :     20     0.004      0.036      0.010    21.72
            open :     24     0.003      0.046      0.014    23.68
           close :     64     0.002      0.055      0.008    22.53
            stat :   2714     0.002      0.222      0.004     4.47
           fstat :     18     0.001      0.041      0.006    46.26
            mmap :     30     0.003      0.009      0.006     5.71
        mprotect :      8     0.006      0.039      0.016    32.16
          munmap :     12     0.007      0.077      0.020    38.25
             brk :     48     0.002      0.014      0.004    10.18
    rt_sigaction :     18     0.002      0.002      0.002     2.11
  rt_sigprocmask :     60     0.002      0.128      0.010    32.88
          access :      2     0.006      0.006      0.006     0.00
            pipe :     12     0.004      0.048      0.013    35.98
           vfork :     34     0.448      0.980      0.692     3.04
          execve :     20     0.000      0.387      0.046    56.66
           wait4 :     34     0.017   9923.287    593.221    68.45
           fcntl :      8     0.001      0.041      0.013    48.79
        getdents :     48     0.002      0.079      0.013    19.62
          getcwd :      2     0.005      0.005      0.005     0.00
           chdir :      2     0.070      0.070      0.070     0.00
       getrlimit :      2     0.045      0.045      0.045     0.00
      arch_prctl :      2     0.002      0.002      0.002     0.00
       setrlimit :      2     0.002      0.002      0.002     0.00
          openat :     94     0.003      0.005      0.003     2.11

Signed-off-by: David Ahern 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Stephane Eranian 
---
 tools/perf/Documentation/perf-trace.txt |4 ++
 tools/perf/builtin-trace.c  |  110 +++
 2 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index 1a224862118f..54139c6457f8 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -93,6 +93,10 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
 --comm::
 Show process COMM right beside its ID, on by default, disable with --no-comm.
 
+--summary::
+   Show a summary of syscalls by thread with min, max, and average times (in
+msec) and relative stddev.
+
 SEE ALSO
 
 linkperf:perf-record[1], linkperf:perf-script[1]
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 27c4269b7fe3..8a3f0bdb0f79 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -10,6 +10,7 @@
 #include "util/strlist.h"
 #include "util/intlist.h"
 #include "util/thread_map.h"
+#include "util/stat.h"
 
 #include 
 #include 
@@ -892,6 +893,8 @@ struct thread_trace {
int   max;
char  **table;
} paths;
+
+   struct intlist *syscall_stats;
 };
 
 static struct thread_trace *thread_trace__new(void)
@@ -901,6 +904,8 @@ static struct thread_trace *thread_trace__new(void)
if (ttrace)
ttrace->paths.max = -1;
 
+   ttrace->syscall_stats = intlist__new(NULL);
+
return ttrace;
 }
 
@@ -947,6 +952,7 @@ struct trace {
struct intlist  *pid_list;
boolsched;
boolmultiple_threads;
+   boolsummary;
boolshow_comm;
double  duration_filter;
double  runtime_ms;
@@ -1274,10 +1280,8 @@ typedef int (*tracepoint_handler)(struct trace *trace, struct perf_evsel *evsel,
  struct perf_sample *sample);
 
 static struct syscall *trace__syscall_info(struct trace *trace,
-  struct perf_evsel *evsel,
-  struct perf_sample *sample)
+  struct perf_evsel *evsel, int id)
 {
-   int id = perf_evsel__intval(evsel, sample, "id");
 
if (id < 0) {
 
@@ -1318,6 +1322,32 @@ out_cant_read:
return NULL;
 }
 
+static void thread__update_stats(struct thread_trace *ttrace,
+int id, struct perf_sample *sample)
+{
+   struct int_node *inode;
+   struct stats *stats;
+   u64 duration = 0;
+
+   inode = intlist__findnew(ttrace->syscall_stats, id);
+   if (inode == NULL)
+   return;
+
+   stats = inode->priv;
+   if (stats == NULL) {
+   stats = malloc(sizeof(struct stats));
+   if (stats == NULL)
+   

[PATCH 0/3] perf trace enhancements

2013-10-08 Thread David Ahern
Hi Arnaldo:

Revision to intlist per your comment with the summary option
updated per your perf/core branch.

The mmap output file has survived local testing without problems
so please consider it for inclusion as well. It lowers the overhead
of perf-record.

David Ahern (3):
  perf util: Add findnew method to intlist - v2
  perf trace: Add summary option to dump syscall statistics
  perf record: mmap output file

 tools/perf/Documentation/perf-trace.txt |4 ++
 tools/perf/builtin-record.c |   87 
 tools/perf/builtin-trace.c  |  110 +++
 tools/perf/util/intlist.c   |   22 +--
 tools/perf/util/intlist.h   |1 +
 tools/perf/util/rblist.c|   27 +++-
 tools/perf/util/rblist.h|1 +
 7 files changed, 233 insertions(+), 19 deletions(-)

-- 
1.7.10.1



Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

2013-10-08 Thread Andi Kleen
> So I test-built a config close to your config with both tracing and perf 
> on and off (note, I had OPROFILE and KVM in a module), and got the 
> following kernel sizes:

Yes I mistakenly included KVM (I think that was the difference)
Without KVM it's ~272k text, 96k BSS data delta.

Still big, but not quite as bad as I made it out to be.

Thanks for catching.

-Andi


Re: [PATCH 8/8] ARM: add initial support for Marvell Berlin SoCs

2013-10-08 Thread Jisheng Zhang
Dear Sebastian,

On Tue, 8 Oct 2013 05:24:33 -0700
Sebastian Hesselbarth  wrote:

> This adds initial support for the Marvell Berlin (88DE3xxx) SoC family
> and basic machine setup for Armada 1500 (88DE3100) SoCs.

First of all, thanks for these patches. I have worked, and am still working, on
the Marvell Berlin Linux kernel BSP at Marvell. As the person who brought up the
Linux kernel for various Berlin SoCs since BG2, I have some comments to share
with you.

> 
> Signed-off-by: Sebastian Hesselbarth 
> Reviewed-by: Jason Cooper 
> Reviewed-by: Thomas Petazzoni 
> Reviewed-by: Arnd Bergmann 
> ---
> Changelog:
> RFCv2->v1:
> - remove custom .init_time, adds dependency for arch-wide of_clk_init call
> RFCv1->RFCv2:
> - nuke .map_io (Reported by Arnd Bergmann)
> - add copyright reference
> - switch to mach-berlin instead of mach-mvebu
> 
> Cc: Jason Cooper 
> Cc: Thomas Petazzoni 
> Cc: Arnd Bergmann 
> Cc: Russell King 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm/Kconfig  |2 ++
>  arch/arm/Makefile |1 +
>  arch/arm/mach-berlin/Kconfig  |   24 
>  arch/arm/mach-berlin/Makefile |1 +
>  arch/arm/mach-berlin/berlin.c |   39 +++
>  5 files changed, 67 insertions(+)
>  create mode 100644 arch/arm/mach-berlin/Kconfig
>  create mode 100644 arch/arm/mach-berlin/Makefile
>  create mode 100644 arch/arm/mach-berlin/berlin.c
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 1ad6fb6..5692426 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -932,6 +932,8 @@ source "arch/arm/mach-bcm/Kconfig"
>  
>  source "arch/arm/mach-bcm2835/Kconfig"
>  
> +source "arch/arm/mach-berlin/Kconfig"
> +
>  source "arch/arm/mach-clps711x/Kconfig"
>  
>  source "arch/arm/mach-cns3xxx/Kconfig"
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index a37a50f..3ba332b 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -147,6 +147,7 @@ textofs-$(CONFIG_ARCH_MSM8960) := 0x00208000
>  machine-$(CONFIG_ARCH_AT91)  += at91
>  machine-$(CONFIG_ARCH_BCM)   += bcm
>  machine-$(CONFIG_ARCH_BCM2835)   += bcm2835
> +machine-$(CONFIG_ARCH_BERLIN)+= berlin
>  machine-$(CONFIG_ARCH_CLPS711X)  += clps711x
>  machine-$(CONFIG_ARCH_CNS3XXX)   += cns3xxx
>  machine-$(CONFIG_ARCH_DAVINCI)   += davinci
> diff --git a/arch/arm/mach-berlin/Kconfig b/arch/arm/mach-berlin/Kconfig
> new file mode 100644
> index 000..56a671e
> --- /dev/null
> +++ b/arch/arm/mach-berlin/Kconfig
> @@ -0,0 +1,24 @@
> +config ARCH_BERLIN
> + bool "Marvell Berlin (88DE3xxx) SoCs" if ARCH_MULTI_V7
> + select GENERIC_CLOCKEVENTS
> + select GENERIC_IRQ_CHIP
> + select COMMON_CLK
> + select DW_APB_ICTL
> + select DW_APB_TIMER_OF
> +
> +if ARCH_BERLIN
> +
> +menu "Marvell Berlin (88DE3xxx) SoC variants"
It would be better to s/88DE3xxx/88DE or remove 88DE3xxx totally
> +
> +config MACH_MV88DE3100
Can you please use MACH_BERLIN2? This is what we used internally in latest BSP
> + bool "Marvell 88DE3100 (Armada 1500)"
> + select ARM_GIC
> + select CACHE_L2X0
Tauros3 and PL310 are different, although their programming interfaces are
compatible. In PJ4B and Tauros3, the CP15 cache maintenance commands cover
both L1 and L2, so memory-mapped PA-based maintenance operations in L2 are not
needed.
How should this be handled in cache-l2x0.c?
> + select CPU_PJ4B
> + select HAVE_ARM_TWD if LOCAL_TIMERS
> + select HAVE_SMP
> + select LOCAL_TIMERS if SMP
> +
> +endmenu
> +
> +endif
> diff --git a/arch/arm/mach-berlin/Makefile b/arch/arm/mach-berlin/Makefile
> new file mode 100644
> index 000..ab69fe9
> --- /dev/null
> +++ b/arch/arm/mach-berlin/Makefile
> @@ -0,0 +1 @@
> +obj-y += berlin.o
> diff --git a/arch/arm/mach-berlin/berlin.c b/arch/arm/mach-berlin/berlin.c
> new file mode 100644
> index 000..54b3ba7
> --- /dev/null
> +++ b/arch/arm/mach-berlin/berlin.c
> @@ -0,0 +1,39 @@
> +/*
> + * Device Tree support for Marvell Berlin (88DE3xxx) platforms.
ditto
> + *
> + * Sebastian Hesselbarth 
> + *
> + * based on GPL'ed 2.6 kernel sources
> + *  (c) Marvell International Ltd.
> + *
> + * This file is licensed under the terms of the GNU General Public
> + * License version 2.  This program is licensed "as is" without any
> + * warranty of any kind, whether express or implied.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static void __init berlin_init_machine(void)
> +{
> + /*
> +  * with DT probing for L2CCs, berlin_init_machine can be removed.
> +  * Note: 88DE3005 (Armada 1500-mini) uses pl310 l2cc
> +  */
> + l2x0_of_init(0x70c0, 0xfeff);
> + of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
> +}
> +
> +static const char * const berlin_dt_compat[] = {
> + "marvell,berlin",
> + NULL,
> +};
> +

Re: [PATCH 6/6] x86: Allow disabling HW_BREAKPOINTS and PERF_EVENTS

2013-10-08 Thread Andi Kleen
> You'd think that, but for whatever reason, ftrace/perf oopses still happen.

Hiding bugs seems like a poor use of the CONFIG option. It would
be better to figure out a way to catch them earlier. Perhaps
trinity needs to run more often? Any chance of a fengguang-style nightly
service for mainline?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [dm-devel] A review of dm-writeboost

2013-10-08 Thread Akira Hayakawa
Mikulas,

> Waking up every 100ms in flush_proc is not good because it wastes CPU time 
> and energy if the driver is idle.
Yes, 100ms is too short. I will change it to 1 second.
Waiting up to 1 second at termination is acceptable.

> The problem is that if you fill up the whole cache device in less time 
> than 1 second. Then, you are waiting for 1 second until the migration 
> process notices that it has some work to do. That waiting is pointless and 
> degrades performance - the new write requests received by your driver sit 
> in queue_current_buffer, waiting for data to be migrated. And the 
> migration process sits in 
> schedule_timeout_interruptible(msecs_to_jiffies(1000)), waiting for the 
> end of the timeout. Your driver does absolutely nothing in this moment.
> 
> For this reason, you should wake the migration process as soon as 
> it is needed.
I see the reason, and I agree.

Writeboost does not migrate continuously.
When the cache device is full and migration is needed to make room for
new writes, that is an "urgent" situation.
Setting reserving_segment_id to a non-zero value tells
the migration daemon that the migration is urgent.

There is also non-urgent migration,
which runs when the load on the backing store is below a threshold.
Migrating to the backing store continuously may affect the
whole system. For example, it may delay read requests
to the backing store.
So, migrating only during idle time can be a good strategy.

> Pointless modulator_proc
> 
> 
> This thread does no work, it just turns cache->allow_migrate on and off. 
> Thus, the thread is useless, you can just remove it. In the place where 
> you read cache->allow_migrate (in migrate_proc) you could just do the work 
> that used to be performed in modulator_proc.
True, it only turns the flag on and off, but this daemon is still needed.
It calculates the load of the backing store, and
moving that code into migrate_proc to repeat the calculation
on every loop iteration would consume too much CPU.
Making the decision once per second seems reasonable.

However, some systems do not want to delay migration at all,
because the backing store has a large write-back cache
and wants it kept full so that its optimizations
(e.g. reordering) are effective.
In this case, setting both enable_migration_modulator
and allow_migrate to 0 will do.

Also note, regarding migration, that
nr_max_batched_migration determines how many segments
can be migrated at a time.

Back to the urgent migration:
the problem can be solved easily.
How about waking up the migration daemon
just after setting reserving_segment_id to non-zero?
It is similar to waking up the flush daemon when a flush job is queued.

void wait_for_migration(struct wb_cache *cache, u64 id)
{
	struct segment_header *seg = get_segment_header_by_id(cache, id);

	/*
	 * Set reserving_segment_id to non-zero
	 * to force the migration daemon
	 * to complete migration of this segment
	 * immediately.
	 */
	cache->reserving_segment_id = id;
	// HERE
	wait_for_completion(&seg->migrate_done);
	cache->reserving_segment_id = 0;
}

> flush_proc is woken up correctly. But the other threads aren't. 
> migrate_proc, modulator_proc, recorder_proc, sync_proc all do polling.
As for the other daemons:
modulator: turns migration on and off according to the load of the backing store, every second (default ON)
recorder: updates the superblock record every T seconds (default T=60)
sync: makes all transient data persistent every T seconds (default T=60)

They simply loop on their own timers.

Maybe recorder and sync should be turned off by default.
- The recorder daemon is only for fast rebooting. The record section contains
  the last_migrated_segment_id, which is used in recover_cache()
  to reduce the number of segments to recover.
- The sync daemon is for SLAs in enterprise use. Some users want to
  make the storage system persistent at a given interval.
  This is intrinsically unnecessary, so turning it off by default is appropriate.

Akira
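
The "sleep with a timeout, but wake immediately when work becomes urgent" behaviour discussed above can be illustrated with a userspace analogy. This is only a sketch: it uses a pipe and poll() rather than the kernel's wake_up_process() and schedule_timeout_interruptible(), and the waker type and its functions are hypothetical names, not dm-writeboost code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum wake_reason { WAKE_ERROR = -1, WAKE_TIMEOUT = 0, WAKE_REQUESTED = 1 };

/* Hypothetical wake-up channel: the daemon reads, producers write. */
struct waker {
	int rfd, wfd;
};

int waker_init(struct waker *w)
{
	int fds[2];

	assert(w != NULL);
	if (pipe(fds) != 0)
		return -1;
	w->rfd = fds[0];
	w->wfd = fds[1];
	return 0;
}

/* Called by a producer when work becomes urgent. */
int waker_kick(struct waker *w)
{
	char c = 1;

	return write(w->wfd, &c, 1) == 1 ? 0 : -1;
}

/* Called by the daemon loop: sleep up to timeout_ms, return early on a kick. */
enum wake_reason waker_wait(struct waker *w, int timeout_ms)
{
	struct pollfd pfd = { .fd = w->rfd, .events = POLLIN };
	char c;
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return WAKE_ERROR;
	if (rc == 0)
		return WAKE_TIMEOUT;	/* periodic, non-urgent pass */

	/* Consume one kick so the next wait can sleep again. */
	if (read(w->rfd, &c, 1) != 1)
		return WAKE_ERROR;
	return WAKE_REQUESTED;
}
```

With this shape the timeout can be long, because an urgent request no longer has to wait for the next poll interval. A real daemon would also want to retry when poll() is interrupted by a signal, which this sketch reports as an error.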


Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Joe Perches
On Wed, 2013-10-09 at 13:22 +1100, Ryan Mallon wrote:

> Anyway, updated patch below:

nit:

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> @@ -1312,11 +1313,36 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
>   spec.field_width = default_width;
>   return string(buf, end, "pK-error", spec);
>   }
> - if (!((kptr_restrict == 0) ||
> -   (kptr_restrict == 1 &&
> -has_capability_noaudit(current, CAP_SYSLOG
> +
> + switch (kptr_restrict) {
> + case 0:
> + /* Always print %pK values */
> + break;
> + case 1: {
> + /*
> +  * Only print the real pointer value if the current
> +  * proccess has CAP_SYSLOG and is running with the

s/proccess/process/ typo




Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 7:06 PM, Al Viro  wrote:
> On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
>> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro  wrote:
>> >
>> > Point, but I would argue that we should yell very loud if we get 0 from
>> > vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
>> > to return that, but a lot of userland code won't be expecting that and
>> > won't be able to cope...
>>
>> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
>> a possible cause, in addition to zero-sized writes themselves, of
>> course.
>
> Umm...  What it says is "If some data can be written without blocking the
> thread, write() shall write what it can and return the number of bytes
> written. Otherwise, it shall return -1 and set errno to EAGAIN."

Look closer.

  ".. most historical implementations return zero (with the O_NDELAY
flag set, which is the historical predecessor of O_NONBLOCK .."

>> Also, writing to (but not past) the end of a block device returns 0
>> for "end of device", iirc.
>
> What do you mean?  If the starting position is below the end of device,
> we get a non-zero length write, not exceeding the end.  If it's at
> the end of device, we get -ENOSPC.  It's out of scope for POSIX, but
> Linux is definitely acting that way...

Hmm. I'm pretty sure I've seen zero returns for EOF somewhere..

 Linus
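
The concern in this thread, a zero return from a write of non-zero size, matters to any caller that loops until all data is written. The following is a userspace sketch, not the kernel's dump_emit(); write_all is a hypothetical helper, and mapping a zero return to EIO is an assumption made here to avoid spinning forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write all len bytes or fail; returns 0 on success, -1 with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;
		}
		if (n == 0) {
			/* No progress on a non-empty request: give up. */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A loop that treated a zero return as "try again" would never terminate if the descriptor kept returning zero, which is why the zero case is handled separately from short writes.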


Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Ryan Mallon
On 09/10/13 13:00, Joe Perches wrote:
> On Wed, 2013-10-09 at 12:55 +1100, Ryan Mallon wrote:
>> On 09/10/13 12:30, Joe Perches wrote:
>>> On Tue, 2013-10-08 at 17:49 -0700, Joe Perches wrote:
>>>> On Wed, 2013-10-09 at 11:15 +1100, Ryan Mallon wrote:
>>>>> Some setuid binaries will allow reading of files which have read
>>>>> permission by the real user id. This is problematic with files which
>>>>> use %pK because the file access permission is checked at open() time,
>>>>> but the kptr_restrict setting is checked at read() time. If a setuid
>>>>> binary opens a %pK file as an unprivileged user, and then elevates
>>>>> permissions before reading the file, then kernel pointer values may be
>>>>> leaked.
>>>> I think it should explicitly test 0.
>>> Also, Documentation/sysctl/kernel.txt should be updated too.
>>>
>>> Here's a suggested patch:
>>>
>>> ---
>>>  Documentation/sysctl/kernel.txt | 14 --
>>>  lib/vsprintf.c  | 38 ++
>>>  2 files changed, 34 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
>>> index 9d4c1d1..eac53d5 100644
>>> --- a/Documentation/sysctl/kernel.txt
>>> +++ b/Documentation/sysctl/kernel.txt
>>> @@ -290,13 +290,15 @@ Default value is "/sbin/hotplug".
>>>  kptr_restrict:
>>>  
>>>  This toggle indicates whether restrictions are placed on
>>> -exposing kernel addresses via /proc and other interfaces.  When
>>> -kptr_restrict is set to (0), there are no restrictions.  When
>>> -kptr_restrict is set to (1), the default, kernel pointers
>>> +exposing kernel addresses via /proc and other interfaces.
>>> +
>>> +When kptr_restrict is set to (0), there are no restrictions.
>>> +When kptr_restrict is set to (1), the default, kernel pointers
>>>  printed using the %pK format specifier will be replaced with 0's
>>> -unless the user has CAP_SYSLOG.  When kptr_restrict is set to
>>> -(2), kernel pointers printed using %pK will be replaced with 0's
>>> -regardless of privileges.
>>> +unless the user has CAP_SYSLOG and effective user and group ids
>>> +are equal to the real ids.
>>> +When kptr_restrict is set to (2), kernel pointers printed using
>>> +%pK will be replaced with 0's regardless of privileges.
>> I'll add this, thanks.
>>
>> I'm less fussed about the suggestions for the logic. The current test is
>> small and concise.
> The logic ends up the same to the compiler, but it's
> human readers that want simple and clear.
>
>> The original also does the in_irq tests regardless of
>> the kptr_restrict setting since they are mostly intended to catch
>> internal kernel bugs.
> Not so.
>
> http://marc.info/?l=linux-security-module=129303800912245=4
> https://lkml.org/lkml/2012/7/13/428
>

Ah, I misread it. It does however check when kptr_restrict != 0, not
just when kptr_restrict is 1. I've left the in_irq test as-is, but used
a switch as suggested. I don't really care either way, I think the
original check is quite readable. Anyway, updated patch below:

~Ryan

---

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d1..eac53d5 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -290,13 +290,15 @@ Default value is "/sbin/hotplug".
 kptr_restrict:
 
 This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces.  When
-kptr_restrict is set to (0), there are no restrictions.  When
-kptr_restrict is set to (1), the default, kernel pointers
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), there are no restrictions.
+When kptr_restrict is set to (1), the default, kernel pointers
 printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG.  When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+unless the user has CAP_SYSLOG and effective user and group ids
+are equal to the real ids.
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
 
 ==
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 26559bd..6dd8c5d 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include   /* for PAGE_SIZE */
@@ -1312,11 +1313,36 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
spec.field_width = default_width;
return string(buf, end, "pK-error", spec);
}
-   if (!((kptr_restrict == 0) ||
- (kptr_restrict == 1 &&
-  has_capability_noaudit(current, CAP_SYSLOG))))
+
+   switch (kptr_restrict) {
+   case 0:
+   /* Always 

[GIT PULL] MTD fixes for 3.12-rc

2013-10-08 Thread Brian Norris
Hi Linus!

David Woodhouse and I have queued up a few MTD fixes for 3.12. As David seems
to be MIA again, I am sending the following pull request with his permission.

Thanks,
Brian

The following changes since commit 7b9e3a6ac00be4f3d654a711573b1794b046c22f:

  Merge tag 'fixes-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2013-09-19 18:49:08 
-0500)

are available in the git repository at:


  git://git.infradead.org/linux-mtd.git tags/for-linus-20131008

for you to fetch changes up to 2b468ef0e7959b703626b64c4d264ef822c9267a:

  mtd: m25p80: Fix 4 byte addressing mode for Micron devices. (2013-09-27 
05:56:22 -0500)



- fix a small memory leak in some new ONFI code
- account for additional odd variations of Micron SPI flash


Brian Norris (1):
  mtd: nand: fix memory leak in ONFI extended parameter page

Elie De Brauwer (1):
  mtd: m25p80: Fix 4 byte addressing mode for Micron devices.

 drivers/mtd/devices/m25p80.c | 17 +++--
 drivers/mtd/nand/nand_base.c |  8 +++-
 2 files changed, 18 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


gma500_gfx: Black VGA display with Intel D2500CC board

2013-10-08 Thread Jan-Benedict Glaw
Hi!

I'll try an up-to-date kernel tomorrow, but with 3.10.x running, I
thought the usual fixes for black screens should be included.

  The board features a VGA as well as a DVI connector, VGA is
connected. No LVDS, no DisplayPort. With DRM debugging enabled, I get
this:

[0.00] Linux version 3.10-3-amd64 (debian-ker...@lists.debian.org) (gcc 
version 4.7.3 (Debian 4.7.3-7) ) #1 SMP Debian 3.10.11-1 (2013-09-10)
[...]
[   15.714564] [drm] Initialized drm 1.1.0 20060810
[   28.544053] [drm:drm_pci_init], 
[   28.544088] [drm:drm_get_pci_dev], 
[   28.544279] [drm:drm_get_minor], 
[   28.544482] [drm:drm_get_minor], new minor assigned 64
[   28.544487] [drm:drm_get_minor], 
[   28.544632] [drm:drm_get_minor], new minor assigned 0
[   28.544657] gma500 :00:02.0: setting latency timer to 64
[   28.544782] [drm:psb_intel_opregion_setup], OpRegion detected at 0xcef92018
[   28.544797] [drm:psb_intel_opregion_setup], Public ACPI methods supported
[   28.544801] [drm:psb_intel_opregion_setup], ASLE supported
[   28.544838] gma500 :00:02.0: irq 49 for MSI/MSI-X
[   28.544862] [drm:psb_intel_init_bios], Using VBT from OpRegion: $VBT 
CEDARVIEW  d
[   28.544871] [drm:drm_mode_debug_printmodeline], Modeline 0:"1920x1080" 0 
144000 1920 2016 2080 2176 1080 1088 1092 1100 0x8 0xa
[   28.544881] [drm:parse_sdvo_device_mapping], No SDVO device info is found in 
VBT
[   28.544888] [drm:parse_edp], EDP timing in vbt t1_t3 2000 t8 10 t9 2000 t10 
500 t11_t12 5000
[   28.544894] [drm:parse_edp], VBT reports EDP: Lane_count 1, Lane_rate 6, Bpp 
24
[   28.544899] [drm:parse_edp], VBT reports EDP: VSwing  0, Preemph 0
[   28.560830] acpi device:29: registered as cooling_device0
[   28.561260] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   28.561428] input: Video Bus as 
/devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:00/input/input1
[   28.561515] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   28.561553] [drm] No driver support for vblank timestamp query.
[   28.561591] [drm:drm_irq_install], irq=49
[   28.562110] [drm:drm_sysfs_connector_add], adding "VGA-1" to sysfs
[   28.562160] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.623667] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter 
intel drm LVDSDDC_C
[   28.623678] [drm:drm_sysfs_connector_add], adding "LVDS-1" to sysfs
[   28.623731] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.623861] [drm:drm_sysfs_connector_add], adding "DVI-D-1" to sysfs
[   28.623909] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.623931] [drm:drm_sysfs_connector_add], adding "DP-1" to sysfs
[   28.623978] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.624490] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-B
[   28.624995] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[   28.625000] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[   28.625503] [drm:cdv_intel_dp_aux_ch], dp_aux_ch timeout status 0x51440064
[   28.625507] [drm:cdv_intel_dp_i2c_aux_ch], aux_ch failed -110
[   28.625669] [drm:drm_sysfs_connector_add], adding "DVI-D-2" to sysfs
[   28.625718] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.625741] [drm:drm_sysfs_connector_add], adding "DP-2" to sysfs
[   28.625786] [drm:drm_sysfs_hotplug_event], generating hotplug event
[   28.626298] [drm:cdv_intel_dp_i2c_init], i2c_init DPDDC-C
[   28.626603] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[   28.626806] [drm:cdv_intel_dp_i2c_aux_ch], aux_i2c nack
[   28.666898] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[   28.686768] gma500 :00:02.0: trying to get vblank count for disabled 
pipe 1
[   28.686812] gma500 :00:02.0: trying to get vblank count for disabled 
pipe 1
[   28.726865] [drm:cdv_intel_single_pipe_active], pipe enabled 0
[   28.746737] [drm:drm_helper_probe_single_connector_modes], 
[CONNECTOR:7:VGA-1]
[   28.872705] [drm:drm_helper_probe_single_connector_modes], 
[CONNECTOR:7:VGA-1] probed modes :
[   28.872712] [drm:drm_mode_debug_printmodeline], Modeline 23:"1280x1024" 60 
108000 1280 1328 1440 1688 1024 1025 1028 1066 0x48 0x5
[   28.872720] [drm:drm_mode_debug_printmodeline], Modeline 33:"1280x1024" 75 
135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[   28.872728] [drm:drm_mode_debug_printmodeline], Modeline 34:"1024x768" 75 
78800 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[   28.872736] [drm:drm_mode_debug_printmodeline], Modeline 25:"1024x768" 75 
78750 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[   28.872743] [drm:drm_mode_debug_printmodeline], Modeline 35:"1024x768" 70 
75000 1024 1048 1184 1328 768 771 777 806 0x40 0xa
[   28.872751] [drm:drm_mode_debug_printmodeline], Modeline 36:"1024x768" 60 
65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[   28.872759] [drm:drm_mode_debug_printmodeline], Modeline 37:"800x600" 75 
49500 800 816 896 1056 600 601 604 625 0x40 0x5
[   28.872766] [drm:drm_mode_debug_printmodeline], 

Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Al Viro
On Tue, Oct 08, 2013 at 06:38:47PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 6:18 PM, Al Viro  wrote:
> >
> > Point, but I would argue that we should yell very loud if we get 0 from
> > vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> > to return that, but a lot of userland code won't be expecting that and
> > won't be able to cope...
> 
> Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
> a possible cause, in addition to zero-sized writes themselves, of
> course.

Umm...  What it says is "If some data can be written without blocking the
thread, write() shall write what it can and return the number of bytes
written. Otherwise, it shall return -1 and set errno to EAGAIN."  For
sockets EWOULDBLOCK is also allowed as a possible errno value.  I hadn't
dug through the streams-related part, but we don't have that mess anyway.
 
> Also, writing to (but not past) the end of a block device returns 0
> for "end of device", iirc.

What do you mean?  If the starting position is below the end of device,
we get a non-zero length write, not exceeding the end.  If it's at
the end of device, we get -ENOSPC.  It's out of scope for POSIX, but
Linux is definitely acting that way...
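
For userland code that wants to be defensive about the case discussed here, a write loop can treat a zero return for a non-zero request as an error instead of retrying forever. This is only a minimal sketch assuming plain POSIX write(2); write_all() is a hypothetical helper name, and reporting EIO for the zero-progress case is an illustrative choice, not something POSIX or the kernel mandates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on failure with errno set.
 * (Hypothetical helper, not part of any library.)
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data was written */
			return -1;		/* EAGAIN, ENOSPC, EBADF, ... */
		}
		if (n == 0) {
			/* No progress on a non-zero request: report it, do not spin. */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would use it as `if (write_all(fd, buf, len) < 0) perror("write");`, so both a -1/ENOSPC result and an unexpected zero return end up on the same error path.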


Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Joe Perches
On Wed, 2013-10-09 at 12:55 +1100, Ryan Mallon wrote:
> On 09/10/13 12:30, Joe Perches wrote:
> > On Tue, 2013-10-08 at 17:49 -0700, Joe Perches wrote:
> >> On Wed, 2013-10-09 at 11:15 +1100, Ryan Mallon wrote:
> >>> Some setuid binaries will allow reading of files which have read
> >>> permission by the real user id. This is problematic with files which
> >>> use %pK because the file access permission is checked at open() time,
> >>> but the kptr_restrict setting is checked at read() time. If a setuid
> >>> binary opens a %pK file as an unprivileged user, and then elevates
> >>> permissions before reading the file, then kernel pointer values may be
> >>> leaked.
> >>
> >> I think it should explicitly test 0.
> > 
> > Also, Documentation/sysctl/kernel.txt should be updated too.
> > 
> > Here's a suggested patch:
> > 
> > ---
> >  Documentation/sysctl/kernel.txt | 14 --
> >  lib/vsprintf.c  | 38 ++
> >  2 files changed, 34 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/sysctl/kernel.txt 
> > b/Documentation/sysctl/kernel.txt
> > index 9d4c1d1..eac53d5 100644
> > --- a/Documentation/sysctl/kernel.txt
> > +++ b/Documentation/sysctl/kernel.txt
> > @@ -290,13 +290,15 @@ Default value is "/sbin/hotplug".
> >  kptr_restrict:
> >  
> >  This toggle indicates whether restrictions are placed on
> > -exposing kernel addresses via /proc and other interfaces.  When
> > -kptr_restrict is set to (0), there are no restrictions.  When
> > -kptr_restrict is set to (1), the default, kernel pointers
> > +exposing kernel addresses via /proc and other interfaces.
> > +
> > +When kptr_restrict is set to (0), there are no restrictions.
> > +When kptr_restrict is set to (1), the default, kernel pointers
> >  printed using the %pK format specifier will be replaced with 0's
> > -unless the user has CAP_SYSLOG.  When kptr_restrict is set to
> > -(2), kernel pointers printed using %pK will be replaced with 0's
> > -regardless of privileges.
> > +unless the user has CAP_SYSLOG and effective user and group ids
> > +are equal to the real ids.
> > +When kptr_restrict is set to (2), kernel pointers printed using
> > +%pK will be replaced with 0's regardless of privileges.
> 
> I'll add this, thanks.
> 
> I'm less fussed about the suggestions for the logic. The current test is
> small and concise.

The logic ends up the same to the compiler, but it's
human readers that want simple and clear.
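
To make the readability argument concrete, the switch-style check from the suggested patch can be modelled in userspace roughly as follows. This is a sketch only: struct pk_ctx, enum pk_result and pk_decide() are hypothetical names, and the booleans stand in for in_irq()/in_serving_softirq()/in_nmi() and has_capability_noaudit(current, CAP_SYSLOG).

```c
#include <stdbool.h>

enum pk_result {
	PK_SHOW,	/* print the real pointer value */
	PK_ZERO,	/* print the pointer as 0 */
	PK_ERROR,	/* print "pK-error" */
};

/* Hypothetical userspace snapshot of the state the kernel consults. */
struct pk_ctx {
	int kptr_restrict;	/* 0, 1 or 2 */
	bool in_interrupt;	/* stands in for in_irq() || in_serving_softirq() || in_nmi() */
	bool has_cap_syslog;	/* stands in for the CAP_SYSLOG capability test */
	unsigned int uid, euid;
	unsigned int gid, egid;
};

static enum pk_result pk_decide(const struct pk_ctx *c)
{
	switch (c->kptr_restrict) {
	case 0:		/* no restrictions */
		return PK_SHOW;
	case 1:		/* restricted */
		if (c->in_interrupt)
			return PK_ERROR;	/* a CAP_SYSLOG test is meaningless here */
		if (!c->has_cap_syslog ||
		    c->euid != c->uid || c->egid != c->gid)
			return PK_ZERO;
		return PK_SHOW;
	case 2:		/* always hidden */
	default:
		return PK_ZERO;
	}
}
```

With kptr_restrict == 1, a setuid-root process started by uid 1000 (euid 0, uid 1000) gets PK_ZERO even though it holds the capability, which is the leak the patch is meant to close.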

> The original also does the in_irq tests regardless of
> the kptr_restrict setting since they are mostly intended to catch
> internal kernel bugs.

Not so.

http://marc.info/?l=linux-security-module&m=129303800912245&w=4
https://lkml.org/lkml/2012/7/13/428




Re: [patch 2/3] pvclock: detect watchdog reset at pvclock read

2013-10-08 Thread Marcelo Tosatti
On Tue, Oct 08, 2013 at 11:58:10AM +0200, Paolo Bonzini wrote:
> Il 08/10/2013 03:05, Marcelo Tosatti ha scritto:
> > +void pvclock_touch_watchdogs(void)
> > +{
> > +   touch_softlockup_watchdog_sync();
> > +   clocksource_touch_watchdog();
> > +   rcu_cpu_stall_reset();
> > +   reset_hung_task_detector();
> > +}
> > +
> 
> Should this function be in kernel/ instead?  It's not really
> pvclock-specific.
> 
> Paolo

kernel/watchdog.c is configurable via CONFIG_LOCKUP_DETECTOR, so it's not
appropriate.

Also, the choice of watchdogs to reset might differ from caller to caller.



Re: [PATCH v2 12/15] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

2013-10-08 Thread Marcelo Tosatti
On Tue, Oct 08, 2013 at 12:02:32PM +0800, Xiao Guangrong wrote:
> 
> Hi Marcelo,
> 
> On Oct 8, 2013, at 9:23 AM, Marcelo Tosatti  wrote:
> 
> >> 
> >> +  if (kvm->arch.rcu_free_shadow_page) {
> >> +  kvm_mmu_isolate_pages(invalid_list);
> >> +  sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> >> +  list_del_init(invalid_list);
> >> +  call_rcu(&sp->rcu, free_pages_rcu);
> >> +  return;
> >> +  }
> > 
> > This is unbounded (there was a similar problem with early fast page fault
> > implementations):
> > 
> > From RCU/checklist.txt:
> > 
> > "An especially important property of the synchronize_rcu()
> >primitive is that it automatically self-limits: if grace periods
> >are delayed for whatever reason, then the synchronize_rcu()
> >primitive will correspondingly delay updates.  In contrast,
> >code using call_rcu() should explicitly limit update rate in
> >cases where grace periods are delayed, as failing to do so can
> >result in excessive realtime latencies or even OOM conditions.
> > "
> 
> I understand what you are worried about… Hmm, can it be avoided by
> just using kvm->arch.rcu_free_shadow_page in a small window? Then
> there is only a slight chance that a page needs to be freed by call_rcu.

The point that must be addressed is that you cannot allow an unlimited
number of sp's to be freed via call_rcu between two grace periods.

So something like:

- For every 17MB worth of shadow pages.
- Guarantee a grace period has passed.

If you control kvm->arch.rcu_free_shadow_page, you could periodically
check how many MBs worth of shadow pages are queued for RCU freeing and
force a grace period once that amount passes a certain limit.
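
A rough userspace sketch of that accounting, for illustration only. The names (struct rcu_free_budget, queue_page_for_rcu_free(), wait_for_grace_period()) are hypothetical, the 4096-byte page size is an assumption, and wait_for_grace_period() merely counts calls where kernel code would block in something like synchronize_rcu().

```c
#include <stddef.h>

#define PAGE_BYTES	4096u
/* The 17MB figure is the example threshold from the discussion above. */
#define RCU_FREE_LIMIT	(17u * 1024u * 1024u)

struct rcu_free_budget {
	size_t pending_bytes;		/* queued for deferred freeing since the last grace period */
	unsigned int forced_gps;	/* how many times a grace period was forced */
};

/* Hypothetical stand-in for a blocking grace-period wait; it only counts calls. */
static void wait_for_grace_period(struct rcu_free_budget *b)
{
	b->forced_gps++;
	b->pending_bytes = 0;
}

/* Account for one page handed to deferred freeing; throttle at the limit. */
static void queue_page_for_rcu_free(struct rcu_free_budget *b)
{
	b->pending_bytes += PAGE_BYTES;
	if (b->pending_bytes >= RCU_FREE_LIMIT)
		wait_for_grace_period(b);
}
```

The point of the sketch is only that the amount of memory waiting on a grace period stays bounded by the limit, no matter how fast pages are queued.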

> > Moreover, freeing pages differently depending on some state should 
> > be avoided.
> > 
> > Alternatives:
> > 
> > - Disable interrupts at write protect sites.
> 
> Write protection can be triggered by a KVM ioctl that is not in VCPU
> context. If we do this, we also need to send an IPI to the KVM thread when
> doing a TLB flush.

Yes. However for the case being measured, simultaneous page freeing by vcpus 
should be minimal (therefore not affecting the latency of GET_DIRTY_LOG).

> And we cannot do much work while interrupts are disabled, because of
> interrupt latency.
> 
> > - Rate limit the number of pages freed via call_rcu
> > per grace period.
> 
> Seems complex. :(
> 
> > - Some better alternative.
> 
> Gleb has an idea that uses RCU_DESTORY to protect the shadow page table
> and encodes the page level into the spte (since we need to check whether
> the spte is the last spte).  How about this?

Pointer please? Why is DESTROY_SLAB_RCU any safer than call_rcu with
regards to limitation? (maybe it is).

> I planned to do it after this patchset is merged. If you like it and if you
> think that "using kvm->arch.rcu_free_shadow_page in a small window" cannot
> avoid the issue, I am happy to do it in the next version. :)

Unfortunately the window can be large (as it depends on the size of the
memslot), so it would be best if this problem can be addressed before 
merging. What is your idea for reducing the rcu_free_shadow_page=1 window?

Thank you for the good work.



Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Mark Lord
On 13-10-02 06:29 AM, Alexander Gordeev wrote:
..
> This update converts pci_enable_msix() and pci_enable_msi_block()
> interfaces to canonical kernel functions and makes them return an
> error code in case of failure or 0 in case of success.

Rather than silently break dozens of drivers in mysterious ways,
please invent new function names for the replacements to the
existing pci_enable_msix() and pci_enable_msi_block() functions.

That way, both in-tree and out-of-tree drivers will notice the API change,
rather than having it go unseen and just failing for unknown reasons.

Thanks.


Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Ryan Mallon
On 09/10/13 12:30, Joe Perches wrote:
> On Tue, 2013-10-08 at 17:49 -0700, Joe Perches wrote:
>> On Wed, 2013-10-09 at 11:15 +1100, Ryan Mallon wrote:
>>> Some setuid binaries will allow reading of files which have read
>>> permission by the real user id. This is problematic with files which
>>> use %pK because the file access permission is checked at open() time,
>>> but the kptr_restrict setting is checked at read() time. If a setuid
>>> binary opens a %pK file as an unprivileged user, and then elevates
>>> permissions before reading the file, then kernel pointer values may be
>>> leaked.
>>
>> I think it should explicitly test 0.
> 
> Also, Documentation/sysctl/kernel.txt should be updated too.
> 
> Here's a suggested patch:
> 
> ---
>  Documentation/sysctl/kernel.txt | 14 --
>  lib/vsprintf.c  | 38 ++
>  2 files changed, 34 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 9d4c1d1..eac53d5 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -290,13 +290,15 @@ Default value is "/sbin/hotplug".
>  kptr_restrict:
>  
>  This toggle indicates whether restrictions are placed on
> -exposing kernel addresses via /proc and other interfaces.  When
> -kptr_restrict is set to (0), there are no restrictions.  When
> -kptr_restrict is set to (1), the default, kernel pointers
> +exposing kernel addresses via /proc and other interfaces.
> +
> +When kptr_restrict is set to (0), there are no restrictions.
> +When kptr_restrict is set to (1), the default, kernel pointers
>  printed using the %pK format specifier will be replaced with 0's
> -unless the user has CAP_SYSLOG.  When kptr_restrict is set to
> -(2), kernel pointers printed using %pK will be replaced with 0's
> -regardless of privileges.
> +unless the user has CAP_SYSLOG and effective user and group ids
> +are equal to the real ids.
> +When kptr_restrict is set to (2), kernel pointers printed using
> +%pK will be replaced with 0's regardless of privileges.

I'll add this, thanks.

I'm less fussed about the suggestions for the logic. The current test is
small and concise. The original also does the in_irq tests regardless of
the kptr_restrict setting since they are mostly intended to catch
internal kernel bugs.

Anyway, I am mostly interested to hear if the solution is acceptable, or
if a more involved open() vs read() time check is required.

~Ryan

>  
>  ==
>  
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 26559bd..986fdbe 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include /* for PAGE_SIZE */
> @@ -1302,20 +1303,33 @@ char *pointer(const char *fmt, char *buf, char *end, 
> void *ptr,
>   return buf;
>   }
>   case 'K':
> - /*
> -  * %pK cannot be used in IRQ context because its test
> -  * for CAP_SYSLOG would be meaningless.
> -  */
> - if (kptr_restrict && (in_irq() || in_serving_softirq() ||
> -   in_nmi())) {
> - if (spec.field_width == -1)
> - spec.field_width = default_width;
> - return string(buf, end, "pK-error", spec);
> + switch (kptr_restrict) {
> + case 0: /* None */
> + break;
> + case 1: {   /* Restricted (the default) */
> + const struct cred *cred;
> +
> + if (in_irq() || in_serving_softirq() || in_nmi()) {
> + /*
> +  * This cannot be used in IRQ context because
> +  * the test for CAP_SYSLOG would be meaningless
> +  */
> + if (spec.field_width == -1)
> + spec.field_width = default_width;
> + return string(buf, end, "pK-error", spec);
> + }
> + cred = current_cred();
> + if (!has_capability_noaudit(current, CAP_SYSLOG) ||
> + !uid_eq(cred->euid, cred->uid) ||
> + !gid_eq(cred->egid, cred->gid))
> + ptr = NULL;
> + break;
>   }
> - if (!((kptr_restrict == 0) ||
> -   (kptr_restrict == 1 &&
> -has_capability_noaudit(current, CAP_SYSLOG))))
> + case 2: /* Forbidden - Always 0 */
> + default:
>   ptr = NULL;
> + break;
> + }
>   break;
>   case 'N':
> 

Re: [RFC PATCH v2 0/1] FPGA subsystem core

2013-10-08 Thread Greg Kroah-Hartman
On Tue, Oct 08, 2013 at 06:47:41PM -0500, delicious quinoa wrote:
> On Tue, Oct 8, 2013 at 4:44 PM, Greg Kroah-Hartman
>  wrote:
> > On Tue, Oct 08, 2013 at 12:00:14PM -0500, Alan Tull wrote:
> >> On Fri, 2013-10-04 at 16:33 -0700, Greg Kroah-Hartman wrote:
> >> > On Fri, Oct 04, 2013 at 11:12:13AM -0700, H. Peter Anvin wrote:
> >> > > On 10/04/2013 10:44 AM, Michal Simek wrote:
> >> > > >
> >> > > > If you look at it in general I believe that there is wide range of
> >> > > > applications which just contain one bitstream per fpga and the
> >> > > > bitstream is replaced by newer version in upgrade. For them
> >> > > > firmware interface should be pretty useful. Just setup firmware
> >> > > > name with bitstream and it will be automatically loaded in startup
> >> > > > phase.
> >> > > >
> >> > > > Then there is another set of applications especially in connection
> >> > > > to partial reconfiguration where this can be done statically by
> >> > > > pregenerated partial bitstreams or automatically generated on
> >> > > > target cpu. For doing everything on the target firmware interface
> >> > > > is not the best because everything can be handled by user
> >> > > > application and it is easier just to push this bitstream to do
> >> > > > device and not to save it to the fs.
> >> > > >
> >> > > > I think the question here is if this subsystem could have several
> >> > > > interfaces. For example Alan is asking for adding char support.
> >> > > > Does it even make sense to have more interfaces with the same
> >> > > > backend driver? When this is answered then we can talk which one
> >> > > > make sense to have. In v2 is sysfs and firmware one. Adding char
> >> > > > is also easy to do.
> >> > > >
> >> > >
> >> > > Greg, what do you think?
> >> > >
> >> > > I agree that the firmware interface makes sense when the use of the
> >> > > FPGA is an implementation detail in a fixed hardware configuration,
> >> > > but that is a fairly restricted use case all things considered.
> >> >
> >> > Ideally I thought this would be just like "firmware", you dump the file
> >> > to the FPGA, it validates it and away you go with a new image running in
> >> > the chip.
> >> >
> >> > But, it sounds like this is much more complicated, so much so that
> >> > configfs might be the correct interface for it, as you can do lots of
> >> > things there, and it is very flexible (some say too flexible...)
> >> >
> >> > A char device, with a zillion different custom ioctls is also a way to
> >> > do it, but one that I really want to avoid as that gets messy really
> >> > quickly.
> >>
> >> Hi Greg,
> >>
> >> We are discussing a char device that has very few interfaces:
> >>  - a way of writing the image to fpga
> >>  - a way of getting fpga manager status
> >>  - a way of setting fpga manager state
> >>
> >> This all looks like standard char driver interface to me.  Writing the
> >> image could be writing to the devnode (cat image.bin > /dev/fpga0). The
> >> status stuff would be sysfs attributes.  All normal stuff any char
> >> driver in the kernel would do.  Why not just go with that?
> >
> > Because we really hate to add new ioctls to the kernel if at all
> > possible.
> 
> I don't see any need for adding any ioctls.
> 
> > Using sysfs (and it's one-value-per-file rule), makes
> > userspace tools easier, and managing the different devices in the system
> > easier (you know _exactly_ which device you are talking to, you don't
> > have to guess based on minor number).
> 
> That's cool.  The interface we could use is writing the raw fpga data
> to /sys/class/fpga_manager/fpga0/fpga_config_data
> 
> Reading or setting the fpga state could be from
> /sys/class/fpga_manager/fpga0/fpga_config_state

Ok, that's fine, I don't object to that, but you are giving up the
notification and loading ability of the kernel for the image files by
doing this, which will require you to use/write/maintain userspace
tools.  If you use the firmware interface, no userspace tool is needed
at all, which I can see some people really wanting, right?
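
For comparison, the kind of userspace helper the sysfs route implies is small but does have to exist somewhere. A minimal sketch, assuming the one-value-per-file convention; read_attr() is a hypothetical helper, and the attribute path in the usage note is the one proposed in this thread, not an existing kernel ABI.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-value-per-file attribute into buf and drop the trailing
 * newline. Returns 0 on success, -1 on failure.
 * (Hypothetical helper for illustration.)
 */
static int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A tool would call it as `read_attr("/sys/class/fpga_manager/fpga0/fpga_config_state", state, sizeof(state))` and then compare the string against whatever state names the driver ends up exporting.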

> Or do I misunderstand?  Do you include sysfs attributes when you
> are talking about ioctls?

You can't do ioctls on sysfs files, so no.

greg k-h


Re: [x86] BUG: unable to handle kernel paging request at 00740060

2013-10-08 Thread Mike Galbraith
On Tue, 2013-10-08 at 21:05 +0200, Jakub Jelinek wrote: 
> On Tue, Oct 08, 2013 at 08:51:54PM +0200, Oleg Nesterov wrote:
> > On 10/08, Linus Torvalds wrote:
> > >
> > > (not yet merged), see:
> > >
> > > 
> > > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=0c44c2d0f459cd7e275242b72f500137c4fa834d
> > 
> > I do not really understand inline assembly constraints, but I'll ask
> > anyway.
> > 
> > +#define __GEN_RMWcc(fullop, var, cc, ...) \
> > +do { \
> > + asm volatile goto (fullop "; j" cc " %l[cc_label]" \
> > + : : "m" (var), ## __VA_ARGS__ \
> >   ^
> > 
> > don't we need
> > 
> > "+m" (var)
> > 
> > here?
> 
> You actually can't have output operands with asm goto, only inputs
> and clobbers.  But the "memory" clobber should be enough here.
> 
> If you suspect a compiler bug, can somebody please narrow it down to
> a single object file (if I've skimmed the patch right, it is just an
> optimization, where object files compiled without and with the patch
> should actually coexist fine in the same kernel), ideally to a single
> routine if possible and post a preprocessed source + gcc command line
> + version of gcc?

gcc version 4.6.2 (SUSE Linux) won't produce output, but where it dies
might point in the general direction of newer gcc troubles? 

  CC [M]  net/sunrpc/xprtsock.o
net/sunrpc/xprtsock.c: In function ‘xs_setup_tcp’:
net/sunrpc/xprtsock.c:2844:1: internal compiler error: in move_insn, at 
haifa-sched.c:2353

-Mike



Re: [PATCH] mmc: sdhci-esdhc-imx: Check the return value from clk_prepare_enable()

2013-10-08 Thread Shawn Guo
On Tue, Oct 08, 2013 at 10:47:28AM -0300, Fabio Estevam wrote:
> clk_prepare_enable() may fail, so let's check its return value and
> propagate it in the case of error.
> 
> Also, fix the sequence for disabling the clock in the probe error path and 
> also in the remove function.
> 
> Signed-off-by: Fabio Estevam 
> ---
>  drivers/mmc/host/sdhci-esdhc-imx.c | 24 +---
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c 
> b/drivers/mmc/host/sdhci-esdhc-imx.c
> index b9899e9..abc9836 100644
> --- a/drivers/mmc/host/sdhci-esdhc-imx.c
> +++ b/drivers/mmc/host/sdhci-esdhc-imx.c
> @@ -877,9 +877,17 @@ static int sdhci_esdhc_imx_probe(struct platform_device 
> *pdev)
>  
>   pltfm_host->clk = imx_data->clk_per;
>  
> - clk_prepare_enable(imx_data->clk_per);
> - clk_prepare_enable(imx_data->clk_ipg);
> - clk_prepare_enable(imx_data->clk_ahb);
> + err = clk_prepare_enable(imx_data->clk_per);
> + if (err)
> + goto free_sdhci;
> +
> + err = clk_prepare_enable(imx_data->clk_ipg);
> + if (err)
> + goto err_ipg;
> +
> + err = clk_prepare_enable(imx_data->clk_ahb);
> + if (err)
> + goto err_ahb;
>  
>   imx_data->pinctrl = devm_pinctrl_get(&pdev->dev);
>   if (IS_ERR(imx_data->pinctrl)) {
> @@ -995,9 +1003,11 @@ static int sdhci_esdhc_imx_probe(struct platform_device 
> *pdev)
>   return 0;
>  
>  disable_clk:
> - clk_disable_unprepare(imx_data->clk_per);
> - clk_disable_unprepare(imx_data->clk_ipg);
>   clk_disable_unprepare(imx_data->clk_ahb);
> +err_ahb:

The existing labels seem to be named after the action they perform rather
than the reason for the jump, so I guess 'disable_ipg:' might fit better here?

Shawn

> + clk_disable_unprepare(imx_data->clk_ipg);
> +err_ipg:
> + clk_disable_unprepare(imx_data->clk_per);
>  free_sdhci:
>   sdhci_pltfm_free(pdev);
>   return err;
> @@ -1012,9 +1022,9 @@ static int sdhci_esdhc_imx_remove(struct 
> platform_device *pdev)
>  
>   sdhci_remove_host(host, dead);
>  
> - clk_disable_unprepare(imx_data->clk_per);
> - clk_disable_unprepare(imx_data->clk_ipg);
>   clk_disable_unprepare(imx_data->clk_ahb);
> + clk_disable_unprepare(imx_data->clk_ipg);
> + clk_disable_unprepare(imx_data->clk_per);
>  
>   sdhci_pltfm_free(pdev);
>  
> -- 
> 1.8.1.2
> 
> 
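
The unwinding order in the patch above, with labels named after the action they perform, can be sketched in userspace like this. The fake_clk_enable()/fake_clk_disable() stubs are hypothetical stand-ins for clk_prepare_enable()/clk_disable_unprepare(); they only record what happened so the ordering can be checked.

```c
enum { CLK_PER, CLK_IPG, CLK_AHB, NR_CLKS };

static int fail_at = -1;	/* clock whose enable should fail, or -1 for none */
static int disabled[NR_CLKS];	/* order in which clocks were disabled */
static int nr_disabled;

/* Hypothetical stand-in for clk_prepare_enable(). */
static int fake_clk_enable(int clk)
{
	return clk == fail_at ? -1 : 0;
}

/* Hypothetical stand-in for clk_disable_unprepare(). */
static void fake_clk_disable(int clk)
{
	disabled[nr_disabled++] = clk;
}

static int probe_clocks(void)
{
	int err;

	err = fake_clk_enable(CLK_PER);
	if (err)
		goto out;

	err = fake_clk_enable(CLK_IPG);
	if (err)
		goto disable_per;

	err = fake_clk_enable(CLK_AHB);
	if (err)
		goto disable_ipg;

	return 0;

	/* Undo in the reverse order of the enables above. */
disable_ipg:
	fake_clk_disable(CLK_IPG);
disable_per:
	fake_clk_disable(CLK_PER);
out:
	return err;
}
```

Each label undoes exactly the steps that had succeeded before the jump, so a failure enabling the third clock disables the second and then the first.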



Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 6:18 PM, Al Viro  wrote:
>
> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

Actually POSIX very much allows zero returns. O_NDELAY is mentioned as
a possible cause, in addition to zero-sized writes themselves, of
course.

Also, writing to (but not past) the end of a block device returns 0
for "end of device", iirc.

Linus


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread Michael Ellerman
On Tue, Oct 08, 2013 at 09:33:02AM +0200, Alexander Gordeev wrote:
> On Tue, Oct 08, 2013 at 03:33:30PM +1100, Michael Ellerman wrote:
> > On Wed, Oct 02, 2013 at 12:29:04PM +0200, Alexander Gordeev wrote:
> > > This technique proved to be confusing and error-prone. Vast share
> > > of device drivers simply fail to follow the described guidelines.
> > 
> > To clarify "Vast share of device drivers":
> > 
> >  - 58 drivers call pci_enable_msix()
> >  - 24 try a single allocation and then fallback to MSI/LSI
> >  - 19 use the loop style allocation as above
> >  - 14 try an allocation, and if it fails retry once
> >  - 1  incorrectly continues when pci_enable_msix() returns > 0
> > 
> > So 33 drivers (> 50%) successfully make use of the "confusing and
> > error-prone" return value.
> 
> Ok, you caught me - 'vast share' is incorrect and subject to
> rewording. But out of 19/58 how many drivers tested fallbacks on
> real hardware? IOW, which drivers are affected by the pSeries quota?

It's not 19/58, it's 33/58.

As to how many we care about on powerpc I can't say, so you have a point
there. But I still think the interface is not actually that terrible.
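
For reference, the loop-style allocation counted above looks roughly like the following sketch. fake_enable_msix() is a hypothetical stand-in that mirrors the return convention under discussion (0 on success, a negative errno on failure, or a positive count of vectors that could be allocated instead); it is not the real pci_enable_msix() signature.

```c
#include <errno.h>

static int vectors_available = 4;	/* what the fake platform can grant */

/*
 * Hypothetical stand-in: 0 on success, negative errno on failure, or a
 * positive number of vectors that could have been allocated instead.
 */
static int fake_enable_msix(int nvec)
{
	if (vectors_available <= 0)
		return -ENOSPC;
	if (nvec > vectors_available)
		return vectors_available;
	return 0;
}

/* Loop-style allocation: returns the number of vectors enabled, or a negative errno. */
static int enable_msix_loop(int want)
{
	int nvec = want;
	int rc;

	while ((rc = fake_enable_msix(nvec)) > 0)
		nvec = rc;	/* retry with the count the platform offered */

	return rc < 0 ? rc : nvec;
}
```

The one driver flagged as incorrect above corresponds to treating a positive return as success, which this loop avoids by retrying with the offered count.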

cheers


Re: [x86] BUG: unable to handle kernel NULL pointer dereference at (null)

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 6:09 PM, Fengguang Wu  wrote:
> [   27.189229] BUG: unable to handle kernel NULL pointer dereference at 0108
> [   27.190165] IP: [] rw_verify_area+0xa0/0x1b0

This looks like file->f_inode is NULL, and it's trying to access inode->i_flock.

There's a number of other ones too (that last delayed __fput one, for
example) that might be due to a NULL inode.

And a lot of the rest are file-handling too, with the BUG_ON() in
__put_cred() when there's something wrong with file->f_cred.

.. but I have no clue how that could happen and why the asm goto
should matter for this.

  Linus


[PATCH] xhci-hub.c: handle command_trb that may be link TRB

2013-10-08 Thread Xiao Jin
From: xiao jin 
Date: Wed, 9 Oct 2013 09:38:45 +0800
Subject: [PATCH] xhci-hub.c: handle command_trb that may be link TRB

When xhci stops a device, the cmd_ring enqueue pointer may be left
pointing to a link TRB after queuing the last-but-one stop endpoint
command. We must handle this by making command_trb point to the first
TRB of the next segment. Otherwise xhci stop device will time out
because command_trb can't match the cmd_ring dequeue pointer.

The patch makes command_trb point to the first TRB of the next segment
if the cmd_ring enqueue pointer points to a link TRB.

Signed-off-by: xiao jin 
---
 drivers/usb/host/xhci-hub.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 1d35459..4872640 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -287,6 +287,13 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
xhci_queue_stop_endpoint(xhci, slot_id, i, suspend);
}
cmd->command_trb = xhci->cmd_ring->enqueue;
+   /* Enqueue pointer can be left pointing to the link TRB,
+* we must handle that
+*/
+   if (TRB_TYPE_LINK_LE32(cmd->command_trb->link.control))
+   cmd->command_trb =
+   xhci->cmd_ring->enq_seg->next->trbs;
+
list_add_tail(&cmd->cmd_list, &virt_dev->cmd_list);
xhci_queue_stop_endpoint(xhci, slot_id, 0, suspend);
xhci_ring_cmd_db(xhci);
-- 
1.7.1
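
The situation the patch describes can be modelled in user space. The sketch below is a toy, not the xhci driver: the `struct trb`, `struct seg` and `struct ring` layouts and the `next_command_trb()` helper are assumptions made for illustration. It only shows the idea that when the enqueue pointer sits on a link TRB, the next command will really be placed in the first TRB of the following segment.

```c
#include <stdbool.h>
#include <stddef.h>

#define TRBS_PER_SEG 4          /* toy size, chosen for the example */

struct trb {
    bool is_link;               /* true for the last TRB of a segment */
};

struct seg {
    struct trb trbs[TRBS_PER_SEG];
    struct seg *next;
};

struct ring {
    struct seg *enq_seg;        /* segment holding the enqueue pointer */
    struct trb *enqueue;        /* where the next TRB would be written */
};

/*
 * Return the TRB that the next queued command will actually occupy.
 * If the enqueue pointer was left on a link TRB, that is the first
 * TRB of the next segment.
 */
static struct trb *next_command_trb(const struct ring *r)
{
    if (r->enqueue->is_link)
        return r->enq_seg->next->trbs;
    return r->enqueue;
}
```

A caller that remembers `next_command_trb()` rather than the raw enqueue pointer ends up comparing against the TRB the command really landed in.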





Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Joe Perches
On Tue, 2013-10-08 at 17:49 -0700, Joe Perches wrote:
> On Wed, 2013-10-09 at 11:15 +1100, Ryan Mallon wrote:
> > Some setuid binaries will allow reading of files which have read
> > permission by the real user id. This is problematic with files which
> > use %pK because the file access permission is checked at open() time,
> > but the kptr_restrict setting is checked at read() time. If a setuid
> > binary opens a %pK file as an unprivileged user, and then elevates
> > permissions before reading the file, then kernel pointer values may be
> > leaked.
> 
> I think it should explicitly test 0.

Also, Documentation/sysctl/kernel.txt should be updated too.

Here's a suggested patch:

---
 Documentation/sysctl/kernel.txt | 14 --
 lib/vsprintf.c  | 38 ++
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d1..eac53d5 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -290,13 +290,15 @@ Default value is "/sbin/hotplug".
 kptr_restrict:
 
 This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces.  When
-kptr_restrict is set to (0), there are no restrictions.  When
-kptr_restrict is set to (1), the default, kernel pointers
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), there are no restrictions.
+When kptr_restrict is set to (1), the default, kernel pointers
 printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG.  When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+unless the user has CAP_SYSLOG and effective user and group ids
+are equal to the real ids.
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
 
 ==
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 26559bd..986fdbe 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include   /* for PAGE_SIZE */
@@ -1302,20 +1303,33 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
return buf;
}
case 'K':
-   /*
-* %pK cannot be used in IRQ context because its test
-* for CAP_SYSLOG would be meaningless.
-*/
-   if (kptr_restrict && (in_irq() || in_serving_softirq() ||
- in_nmi())) {
-   if (spec.field_width == -1)
-   spec.field_width = default_width;
-   return string(buf, end, "pK-error", spec);
+   switch (kptr_restrict) {
+   case 0: /* None */
+   break;
+   case 1: {   /* Restricted (the default) */
+   const struct cred *cred;
+
+   if (in_irq() || in_serving_softirq() || in_nmi()) {
+   /*
+* This cannot be used in IRQ context because
+* the test for CAP_SYSLOG would be meaningless
+*/
+   if (spec.field_width == -1)
+   spec.field_width = default_width;
+   return string(buf, end, "pK-error", spec);
+   }
+   cred = current_cred();
+   if (!has_capability_noaudit(current, CAP_SYSLOG) ||
+   !uid_eq(cred->euid, cred->uid) ||
+   !gid_eq(cred->egid, cred->gid))
+   ptr = NULL;
+   break;
}
-   if (!((kptr_restrict == 0) ||
- (kptr_restrict == 1 &&
-  has_capability_noaudit(current, CAP_SYSLOG))))
+   case 2: /* Forbidden - Always 0 */
+   default:
ptr = NULL;
+   break;
+   }
break;
case 'N':
switch (fmt[1]) {
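
The decision logic of the suggested patch can be restated as a small user-space model. This is a sketch under stated assumptions: `struct ids` and `filter_pK()` are hypothetical names, the capability check is reduced to a boolean, and the IRQ-context "pK-error" branch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced view of the credentials the patch compares. */
struct ids {
    unsigned int uid, euid;
    unsigned int gid, egid;
};

/*
 * Return the pointer that %pK would print under the patched rules,
 * or NULL when the value must be hidden.
 */
static const void *filter_pK(int kptr_restrict, bool has_cap_syslog,
                             const struct ids *cred, const void *ptr)
{
    switch (kptr_restrict) {
    case 0:     /* no restrictions */
        return ptr;
    case 1:     /* need CAP_SYSLOG and effective ids equal to real ids */
        if (!has_cap_syslog ||
            cred->euid != cred->uid ||
            cred->egid != cred->gid)
            return NULL;
        return ptr;
    case 2:     /* always hidden */
    default:
        return NULL;
    }
}
```

In this model a setuid process (euid differing from uid) gets NULL at level 1 even when it holds the capability, which is the leak the thread is trying to close.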




Re: [PATCH] DMA: extend documentation to provide more API details

2013-10-08 Thread Dan Williams
On Mon, Oct 7, 2013 at 12:40 AM, Guennadi Liakhovetski
 wrote:
> Hi Russell
>
> On Sun, 6 Oct 2013, Russell King - ARM Linux wrote:
>
>> On Sat, Oct 05, 2013 at 11:00:45PM +0200, Guennadi Liakhovetski wrote:
>> > On Sat, 5 Oct 2013, Russell King - ARM Linux wrote:
>> >
>> > > On Sat, Oct 05, 2013 at 07:36:20PM +0200, Guennadi Liakhovetski wrote:
>> > > > +   A DMA transaction descriptor, returned to the user by one of "prep"
>> > > > +   methods is considered as belonging to the user until it is submitted
>> > > > +   to the dmaengine driver for transfer. However, it is recommended, that
>> > > > +   the dmaengine driver keeps references to prepared descriptors too,
>> > > > +   because if dmaengine_terminate_all() is called at that time, the driver
>> > > > +   will have to recycle all allocated descriptors for the respective
>> > > > +   channel.
>> > >
>> > > No.  That's quite dangerous.  think about what can happen.
>> > >
>> > > Thread 1                Thread 2
>> > > Driver calls prepare
>> > >                         dmaengine_terminate_all()
>> > >                         dmaengine driver frees prepared descriptor
>> > > driver submits descriptor
>> > >
>> > > You now have a descriptor which has been freed submitted to the DMA engine
>> > > queue.  This will cause chaos.
>> >
>> > Yes, I understand this. So, it is intentional, that after a *_prep_* a
>> > descriptor belongs to the user and - if the user fails - it will simply be
>> > leaked. A terminate-all shouldn't recycle them and dmaengine driver
>> > unbinding is impossible at that time, as long as the user hasn't released
>> > the channel. Ok, I can rework the above just to explain this.
>>
>> "the user fails" should be very difficult.  One of the requirements here
>> is that any "user" submits the prepared descriptor as soon as possible
>> after preparation.  Some DMA engine implementations hold a spinlock
>> between preparation and submission so this is absolutely necessary.
>
> Ouch...

The pain here is that some engines have a pre-allocated descriptor
ring, and the prep methods write directly to that ring.  Without
holding a lock a parallel submission could advance the ring before the
last one was submitted.  This was done because the original
implementation of the prep routines started the pattern of writing
directly to the ring.

I think that is the original sin of dmaengine as every other driver
subsystem in the universe submits a generic request object to a driver
who then copies the values into the hardware control structure.  For
dma-slave that overhead is almost certainly worth paying.
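
As an illustration of the alternative described here (a generic request object that the driver copies into the hardware structure only at submit time), a toy user-space model might look like the following. All names (`struct dma_request`, `struct hw_desc`, `struct hw_ring`, `ring_submit()`) are hypothetical and do not correspond to the dmaengine API; locking is omitted and only noted in a comment.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4             /* toy ring size for the example */

/* Generic request, owned by the client until it is submitted. */
struct dma_request {
    uint64_t src, dst;
    uint32_t len;
};

/* Stand-in for one slot of a pre-allocated hardware descriptor ring. */
struct hw_desc {
    uint64_t src, dst;
    uint32_t len;
    uint32_t valid;
};

struct hw_ring {
    struct hw_desc slots[RING_SIZE];
    unsigned int head;          /* next slot to fill */
    unsigned int used;          /* slots currently occupied */
};

/*
 * Copy a prepared request into the next free ring slot.
 * Returns the slot index, or -1 if the ring is full.
 * A real driver would only need its lock around this copy,
 * not between prepare and submit.
 */
static int ring_submit(struct hw_ring *ring, const struct dma_request *req)
{
    unsigned int idx;
    struct hw_desc *d;

    if (ring->used == RING_SIZE)
        return -1;

    idx = ring->head;
    d = &ring->slots[idx];
    d->src = req->src;
    d->dst = req->dst;
    d->len = req->len;
    d->valid = 1;

    ring->head = (idx + 1) % RING_SIZE;
    ring->used++;
    return (int)idx;
}
```

Because the ring is only touched inside `ring_submit()`, two requests prepared in one order can be submitted in another without one overtaking a half-written slot.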

>
>> As Dan Williams explained to me, the reason for the separate submission
>> step is there to allow the callback information to be set.  So literally
>> any "user" of this should do this:
>>
>>   desc = prepare();
>>   if (!desc)
>>   goto failed_to_prepare;
>>
>>   desc->callback = function;
>>   desc->callback_param = param;
>>   dmaengine_submit(desc);
>>
>> There should be very little possibility of the user failing between those
>> two calls.
>
> I see.
>
>> > > > -   On completion of each DMA operation, the next in queue is started and
>> > > > -   a tasklet triggered. The tasklet will then call the client driver
>> > > > -   completion callback routine for notification, if set.
>> > > > +   On completion of each DMA operation, the next active transaction in queue is
>> > > > +   started and the ISR bottom half, e.g. a tasklet or a kernel thread is
>> > > > +   triggered.
>> > >
>> > > Or a kernel thread?  I don't think that's right.  It's always been
>> > > specified that the callback will happen from tasklet context.
>> >
>> > Do you see any problems using, say, a threaded interrupt handler, apart
>> > from possible performance issues? That seems to be pretty convenient.
>> > Otherwise we should really mandate somewhere, that bottom half processing
>> > should take place in a tasklet?
>>
>> The documentation has always stated that callbacks will be made from
>> tasklet context.  The problem with allowing different contexts from
>> different drivers is that spinlocking becomes problematical.  Remember
>> that we have _bh() variants which lock against tasklets but allow IRQs.
>
> Spinlocks are local to dmaengine drivers, and I currently see quite a few
> of them doing spin_lock_irq(save)() instead of _bh(). Some also take that
> lock in their ISR, like the pl330 does.

Yes, but in that example the driver still arranges for the callback to
be in tasklet context (pl330_tasklet).  Threaded irqs should be fine.
The callbacks just expect no stricter than bh context and cannot
assume process context.

>
> The disadvantage of using a threaded IRQ, that I see so far, is that you
> then can only wake up the thread from the ISR, not from other contexts,
> but even then it 

Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().

2013-10-08 Thread Michael Ellerman
On Wed, Oct 09, 2013 at 12:03:19PM +1100, Michael Ellerman wrote:
> On Tue, 2013-10-08 at 12:31 -0700, Sukadev Bhattiprolu wrote:
> > Michael Ellerman [mich...@ellerman.id.au] wrote:
> > | bool is_load_store(int ext_opcode)
> > | {
> > | upper = ext_opcode >> 5;
> > | lower = ext_opcode & 0x1f;
> > | 
> > | /* Short circuit as many misses as we can */
> > | if (lower < 3 || lower > 23)
> > | return false;
> > 
> > I see some loads/stores like these which are not covered by
> > the above check. Is it ok to ignore them ?
> > 
> > lower == 29: ldepx, stdepx, eviddepx, evstddepx
> > 
> > lower == 31: lwepx, lbepx, lfdepx, stfdepx,
> 
> Those are the external process ID instructions, which I've never heard
> of anyone using, I think we can ignore them.
> 
> > Looking through the opcode maps, I also see these for primary
> > op code 4:
> > 
> > evldd, evlddx, evldwx, evldw, evldh, evldhx.
> > 
> > Should we include those also ?
> 
> Yes I think so. I didn't check any of the other opcodes for you.

Paul points out these are for the SPE extension, which we also don't
care about. So ignore those as well.

cheers
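
The short-circuit test quoted in this thread can be tried in isolation. The sketch below reproduces only that first filter on the low five bits of the extended opcode; `may_be_load_store()` is a hypothetical name, and a `true` result still needs the per-opcode check that the full function would perform (not shown in the thread).

```c
#include <stdbool.h>

/*
 * First-stage filter only: reject extended opcodes whose low five bits
 * fall outside the 3..23 range used by the quoted check.
 */
static bool may_be_load_store(unsigned int ext_opcode)
{
    unsigned int lower = ext_opcode & 0x1f;

    /* Short circuit as many misses as we can */
    if (lower < 3 || lower > 23)
        return false;

    return true;    /* candidate; a real check must look further */
}
```

Extended opcodes with lower == 29 or lower == 31, the values listed above for the external process ID instructions, are rejected by this filter, which matches the decision in the thread to ignore them.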


[PATCH] kobject: show debug info on delayed kobject release

2013-10-08 Thread Fengguang Wu
Useful for locating buggy drivers on kernel oops.

It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is
hopefully only enabled in debug kernels (like maybe the Fedora rawhide
one, or at developers), so being a bit more verbose is likely ok.

CC: Russell King - ARM Linux 
CC: Greg Kroah-Hartman 
Signed-off-by: Fengguang Wu 
---
 lib/kobject.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/kobject.c b/lib/kobject.c
index 669bf19..084f7b1 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -592,7 +592,7 @@ static void kobject_release(struct kref *kref)
 {
struct kobject *kobj = container_of(kref, struct kobject, kref);
 #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
-   pr_debug("kobject: '%s' (%p): %s, parent %p (delayed)\n",
+   pr_info("kobject: '%s' (%p): %s, parent %p (delayed)\n",
 kobject_name(kobj), kobj, __func__, kobj->parent);
INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup);
schedule_delayed_work(&kobj->release, HZ);
-- 
1.7.10.4



Re: linux-next: manual merge of the imx-mxs tree

2013-10-08 Thread Shawn Guo
On Tue, Oct 08, 2013 at 03:44:12PM +0200, Thierry Reding wrote:
> Today's linux-next merge of the imx-mxs tree got conflicts in
> 
>   arch/arm/mach-imx/mach-imx6q.c
> 
> caused by commits 4d9d18a (ARM: imx: remove custom .init_time hook),
> e709f38 (ARM: imx6: report soc info via soc device) and 3087d32 (ARM:
> imx6q: use common soc revision helpers).
> 
> I fixed it up (see below). Please verify that the resolution looks
> correct.

Thanks, Thierry.  The resolution is correct.

Shawn



Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Al Viro
On Wed, Oct 09, 2013 at 02:18:33AM +0100, Al Viro wrote:

> Point, but I would argue that we should yell very loud if we get 0 from
> vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
> to return that, but a lot of userland code won't be expecting that and
> won't be able to cope...

PS: I agree that we should abort that loop if we get nr == 0, of course,
but I believe that we should report the pathname of the offending file,
at the very least.


Re: [PATCH v2] mmc: sdhci-pci: add Intel Merrifield support

2013-10-08 Thread David Cohen

Kindly ping :)
Any comments?

Br, David Cohen

On 10/01/2013 01:18 PM, David Cohen wrote:

Implement initial SDHCI Intel Merrifield support.
This patch is based on previous one from Yunpeng Gao 

Signed-off-by: David Cohen 
---
  drivers/mmc/host/sdhci-pci.c | 30 ++
  1 file changed, 30 insertions(+)

diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
index d7d6bc8..06f026a 100644
--- a/drivers/mmc/host/sdhci-pci.c
+++ b/drivers/mmc/host/sdhci-pci.c
@@ -37,6 +37,7 @@
  #define PCI_DEVICE_ID_INTEL_BYT_SDIO  0x0f15
  #define PCI_DEVICE_ID_INTEL_BYT_SD0x0f16
  #define PCI_DEVICE_ID_INTEL_BYT_EMMC2 0x0f50
+#define PCI_DEVICE_ID_INTEL_MRFL_MMC   0x1190

  /*
   * PCI registers
@@ -356,6 +357,28 @@ static const struct sdhci_pci_fixes sdhci_intel_byt_sd = {
.allow_runtime_pm = true,
  };

+/* Define Host controllers for Intel Merrifield platform */
+#define INTEL_MRFL_EMMC_0  0
+#define INTEL_MRFL_EMMC_1  1
+
+static int intel_mrfl_mmc_probe_slot(struct sdhci_pci_slot *slot)
+{
+   if ((PCI_FUNC(slot->chip->pdev->devfn) != INTEL_MRFL_EMMC_0) &&
+   (PCI_FUNC(slot->chip->pdev->devfn) != INTEL_MRFL_EMMC_1))
+   /* SD support is not ready yet */
+   return -ENODEV;
+
+   slot->host->mmc->caps |= MMC_CAP_8_BIT_DATA | MMC_CAP_NONREMOVABLE |
+MMC_CAP_1_8V_DDR;
+
+   return 0;
+}
+
+static const struct sdhci_pci_fixes sdhci_intel_mrfl_mmc = {
+   .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
+   .probe_slot = intel_mrfl_mmc_probe_slot,
+};
+
  /* O2Micro extra registers */
  #define O2_SD_LOCK_WP 0xD3
  #define O2_SD_MULTI_VCC3V 0xEE
@@ -940,6 +963,13 @@ static const struct pci_device_id pci_ids[] = {
},

{
+   .vendor = PCI_VENDOR_ID_INTEL,
+   .device = PCI_DEVICE_ID_INTEL_MRFL_MMC,
+   .subvendor  = PCI_ANY_ID,
+   .subdevice  = PCI_ANY_ID,
+   .driver_data= (kernel_ulong_t)&sdhci_intel_mrfl_mmc,
+   },
+   {
.vendor = PCI_VENDOR_ID_O2,
.device = PCI_DEVICE_ID_O2_8120,
.subvendor  = PCI_ANY_ID,





Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

2013-10-08 Thread Dave Jones
On Tue, Oct 08, 2013 at 05:45:37PM -0700, Linus Torvalds wrote:

 > normally that whole DEBUG_KOBJECT_RELEASE
 > thing is hopefully only enabled in debug kernels (like maybe the
 > Fedora rawhide one

Nope. After spending a couple of days fruitlessly trying to get my machine to boot
with it enabled, I think we decided it's not worth the pain.  Debugging
"hangs instantly and silently on boot" bugs are hard enough when you're in front of the machine,
doing so via bugzilla isn't something I'd wish on my worst enemy.

Dave



Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Al Viro
On Tue, Oct 08, 2013 at 05:52:42PM -0700, Linus Torvalds wrote:
> On Tue, Oct 8, 2013 at 5:15 PM, Al Viro  wrote:
> >
> > ... and deal with short writes properly
> 
> .. except you don't.
> 
> > +   while (nr) {
> > +   if (dump_interrupted())
> > +   return 0;
> > +   n = vfs_write(file, addr, nr, &pos);
> > +   if (n < 0)
> > +   return 0;
> > +   file->f_pos = pos;
> > +   cprm->written += n;
> > +   nr -= n;
> > +   }
> 
> Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
> or ENOSPC), but write returning zero is actually possible and a valid
> return value and traditional for "end of media". Looping forever is
> not a good idea.

Point, but I would argue that we should yell very loud if we get 0 from
vfs_write() for non-zero size.  I'm not sure if POSIX allows write(2)
to return that, but a lot of userland code won't be expecting that and
won't be able to cope...


[PATCH] xhci: correct the usage of USB_CTRL_SET_TIMEOUT

2013-10-08 Thread Xiao Jin
From: xiao jin 
Date: Wed, 9 Oct 2013 09:09:46 +0800
Subject: [PATCH] xhci: correct the usage of USB_CTRL_SET_TIMEOUT

The usage of USB_CTRL_SET_TIMEOUT is incorrect. USB_CTRL_SET_TIMEOUT
is defined as 5000 (milliseconds), but the timeout argument of
wait_for_completion_interruptible_timeout is in jiffies. That makes
the timeout longer than what we want, such as 50s on some platforms.

The patch converts USB_CTRL_SET_TIMEOUT to jiffies for use as the
command completion event timeout.

Signed-off-by: xiao jin 
---
 drivers/usb/host/xhci-hub.c |2 +-
 drivers/usb/host/xhci.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 1d35459..78cf294 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -295,7 +295,7 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
/* Wait for last stop endpoint command to finish */
timeleft = wait_for_completion_interruptible_timeout(
cmd->completion,
-   USB_CTRL_SET_TIMEOUT);
+   msecs_to_jiffies(USB_CTRL_SET_TIMEOUT));
if (timeleft <= 0) {
xhci_warn(xhci, "%s while waiting for stop endpoint command\n",
timeleft == 0 ? "Timeout" : "Signal");
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9478caa..f9ebc72 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3486,7 +3486,7 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev)
/* Wait for the Reset Device command to finish */
timeleft = wait_for_completion_interruptible_timeout(
reset_device_cmd->completion,
-   USB_CTRL_SET_TIMEOUT);
+   msecs_to_jiffies(USB_CTRL_SET_TIMEOUT));
if (timeleft <= 0) {
xhci_warn(xhci, "%s while waiting for reset device command\n",
timeleft == 0 ? "Timeout" : "Signal");
-- 
1.7.1
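
The arithmetic behind the "50s" figure can be checked with a small user-space sketch. The helpers `ms_to_jiffies()` and `jiffies_to_ms()` below are simplified, hypothetical stand-ins (the kernel's msecs_to_jiffies() handles more cases), and the tick rate is passed in as a parameter rather than taken from HZ.

```c
/*
 * Simplified conversions for a given tick rate 'hz' (ticks per second).
 * ms_to_jiffies() rounds up so a timeout is never shortened.
 */
static unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

static unsigned long jiffies_to_ms(unsigned long j, unsigned long hz)
{
    return j * 1000 / hz;
}
```

At 100 ticks per second, a 5000 ms timeout is 500 jiffies, while passing the raw value 5000 as jiffies waits 50000 ms, i.e. 50 s. At 1000 ticks per second the two happen to coincide, which may be why the bug goes unnoticed on some configurations.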





Re: Reworking dm-writeboost [was: Re: staging: Add dm-writeboost]

2013-10-08 Thread Akira Hayakawa
Mike,

I am happy to see that
guys from filesystem to the block subsystem
have been discussing how to handle barriers in each layer
almost independently.

>> Merging the barriers and replacing it with a single FLUSH
>> by accepting a lot of writes
>> is the reason for deferring barriers in writeboost.
>> If you want to know further I recommend you to
>> look at the source code to see
>> how queue_barrier_io() is used and
>> how the barriers are kidnapped in queue_flushing().
> 
> AFAICT, this is an unfortunate hack resulting from dm-writeboost being a
> bio-based DM target.  The block layer already has support for FLUSH
> merging, see commit ae1b1539622fb4 ("block: reimplement FLUSH/FUA to
> support merge")

I have read the comments on this patch.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae

My understanding is that
REQ_FUA and REQ_FLUSH are decomposed into more primitive flags
in accordance with the properties of the device.
{PRE|POST}FLUSH requests are queued in flush_queue[one of the two]
(which is often called the "pending" queue), and
blk_kick_flush is called, which defers flushing; later,
if a few conditions are satisfied, it actually inserts "a single" flush request
no matter how many flush requests are in the pending queue
(just judged by !list_empty(pending)).

If my understanding is correct,
we are deferring flush across three layers.

Let me summarize.
- For filesystem, Dave said that metadata journaling defers
  barriers.
- For device-mapper, writeboost, dm-cache and dm-thin defer
  barriers.
- For block, it defers barriers and ends up
  merging several requests into one.

I think writeboost cannot discard this deferring hack, because
deferring the barriers is usually very effective: it makes it
more likely that the RAM buffer fills up, which
raises the write throughput and lowers the CPU usage.
However, in particular cases such as the one Dave pointed out,
this hack is just a disturbance.
Unfortunately, even for writeboost, the hack in the patch
is just a disturbance too.
An upper layer disliking a lower layer's hidden optimization is
just a limitation of the layered architecture of the Linux kernel.

I think these three layers are all thinking almost the same thing:
that these hacks are all good, and that each layer
providing a switch to turn the optimization on/off
is what we have to do as a compromise.

All these problems originate from the fact that
we have volatile caches; persistent memory can
take these problems away.

With persistent memory provided,
writeboost can switch off the deferring of barriers.
However,
I think all servers being equipped with
persistent memory is still a tale of the future.
So, my idea is to maintain both modes
for the RAM buffer type (volatile, non-volatile),
and in the case of the former type
the deferring hack is a good compromise.

Akira


Re: [PATCH] kernel/pid.c: check pid whether be NULL in __change_pid()

2013-10-08 Thread Chen Gang
On 10/09/2013 01:56 AM, Oleg Nesterov wrote:
> On 10/08, Chen Gang wrote:
>>
>> On 10/07/2013 08:43 PM, Oleg Nesterov wrote:
>>>
>>>> but still recommend to check it
>>>> in __change_pid() to let itself consistency.
>>>
>>> Why?
>>>
>>> Contrary, I think we should not hide the problem. If __change_pid() is
>>> called when task->pids[type].pid is already NULL there is something
>>> seriously wrong.
>>>
>>
>> Hmm... In my opinion, it means need BUG_ON() for original 'link->pid'.
>>
>> patch begin-
>>
>> [PATCH] kernel/pid.c: add BUG_ON() for "!pid" in __change_pid()
>>
>>   Within __change_pid(), 'new' may be NULL if it comes from detach_pid(),
> 
> Yes, this is fine,
> 
>>   and 'link->pid' also may be NULL ("link->pid = new"),
>> ...
>>   the original 'link->pid' may be NULL, too.
> 
> Too? You mean, it becomes NULL after detach_pid().
> 
>>   But in real world, all related extern functions always assume "if
>>   'link->pid' is already NULL, there must be something seriously wrong",
>>   although __change_pid() can accept parameter 'new' as NULL.
> 
> I simply can't understand why you mix "new == NULL" and "link->pid == NULL".
> 
>>   So in __change_pid(), need add BUG_ON() for it: "it should not happen,
>>   when it really happen, OS must be continuing blindly,
> 
> OS will crash a couple of lines below trying to dereference this pointer.
> 
>> --- a/kernel/pid.c
>> +++ b/kernel/pid.c
>> @@ -396,6 +396,12 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
>>  link = &task->pids[type];
>>  pid = link->pid;
>>
>> +/*
>> + * If task->pids[type].pid is already NULL, there must be something
>> + * seriously wrong
>> + */
>> +BUG_ON(!pid);
> 
> Following this logic you should also add
> 
>   BUG_ON(!task);
>   BUG_ON(!link->node.next);
>   BUG_ON(!link->node.prev || link->node.prev == LIST_POISON2);
>   ...
> 
> Seriously, I do not understand the point. Yes, detach_pid() should not
> be called twice. And it has a single caller. And this caller will crash
> too if it is called twice. So you can also add BUG_ON() into
> __unhash_process(). And so on.
> 

In my opinion, using BUG_ON() has 3 requirements:

 - The OS would otherwise just continue blindly.

 - Continuing would cause a real issue (otherwise WARN_ON should be used
   instead).

 - It keeps the related code self-consistent (otherwise it only adds waste).


Your examples match 2 requirements, but not the 3rd one: "it is
reasonable to assume 'task', 'link', and 'link->node' are valid in
__change_pid()".

But for link->pid, the function name '__change_pid' tells us it is only
for changing pid; if 'new' can be NULL, 'link->pid' can also be NULL,
so the original 'link->pid' can be NULL, too.

So for self-consistency, we can also change the function name from
'__change_pid' to another one (e.g. 'change_orig_valid_pid'), to keep
it self-consistent (so BUG_ON is not needed).


The related patch is below, please check, thanks.

patch begin-

kernel/pid.c: use 'change_orig_valid_pid' instead of '__change_pid' for function name

  The function name '__change_pid' suggests it is only for changing pid. In
  fact, it always assumes the original pid is valid, while the new pid can be
  NULL, so recommend using 'change_orig_valid_pid' instead.

Signed-off-by: Chen Gang 
---
 kernel/pid.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a266..408a3b5 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -386,7 +386,7 @@ void attach_pid(struct task_struct *task, enum pid_type type)
hlist_add_head_rcu(&link->node, &link->pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
+static void change_orig_valid_pid(struct task_struct *task, enum pid_type type,
struct pid *new)
 {
struct pid_link *link;
@@ -408,13 +408,13 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 
 void detach_pid(struct task_struct *task, enum pid_type type)
 {
-   __change_pid(task, type, NULL);
+   change_orig_valid_pid(task, type, NULL);
 }
 
 void change_pid(struct task_struct *task, enum pid_type type,
struct pid *pid)
 {
-   __change_pid(task, type, pid);
+   change_orig_valid_pid(task, type, pid);
attach_pid(task, type);
 }
 
-- 
1.7.7.6

patch end---

Thanks.
-- 
Chen Gang


Re: [PATCH 5/9][v5] powerpc: implement is_instr_load_store().

2013-10-08 Thread Michael Ellerman
On Tue, 2013-10-08 at 12:31 -0700, Sukadev Bhattiprolu wrote:
> Michael Ellerman [mich...@ellerman.id.au] wrote:
> | bool is_load_store(int ext_opcode)
> | {
> | upper = ext_opcode >> 5;
> | lower = ext_opcode & 0x1f;
> | 
> | /* Short circuit as many misses as we can */
> | if (lower < 3 || lower > 23)
> | return false;
> 
> I see some loads/stores like these which are not covered by
> the above check. Is it ok to ignore them ?
> 
>   lower == 29: ldepx, stdepx, eviddepx, evstddepx
> 
>   lower == 31: lwepx, lbepx, lfdepx, stfdepx,

Those are the external process ID instructions, which I've never heard
of anyone using, I think we can ignore them.

> Looking through the opcode maps, I also see these for primary
> op code 4:
> 
>   evldd, evlddx, evldwx, evldw, evldh, evldhx.
> 
> Should we include those also ?

Yes I think so. I didn't check any of the other opcodes for you.

cheers



Re: [RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 5:15 PM, Al Viro  wrote:
>
> ... and deal with short writes properly

.. except you don't.

> +   while (nr) {
> +   if (dump_interrupted())
> +   return 0;
> +   n = vfs_write(file, addr, nr, &pos);
> +   if (n < 0)
> +   return 0;
> +   file->f_pos = pos;
> +   cprm->written += n;
> +   nr -= n;
> +   }

Please handle 'n == 0' too. Maybe it never happens (ie you get EPIPE
or ENOSPC), but write returning zero is actually possible and a valid
return value and traditional for "end of media". Looping forever is
not a good idea.

   Linus
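
A minimal userspace sketch of the loop shape requested above, not the kernel code. The names emit_all, short_writer and full_device are hypothetical, and the writer callback merely stands in for vfs_write(). The point is that a return value of 0 ends the loop as a failure instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in for a write primitive: returns bytes written, 0, or a negative error. */
typedef ssize_t (*write_fn)(void *ctx, const void *buf, size_t len);

/*
 * Write all len bytes, tolerating short writes.
 * Returns 1 on success, 0 on error or on a zero-length write.
 */
static int emit_all(write_fn wr, void *ctx, const void *buf, size_t len,
                    size_t *written)
{
    const char *p = buf;

    assert(wr != NULL && written != NULL);
    while (len > 0) {
        ssize_t n = wr(ctx, p, len);

        /* n < 0 is an error; n == 0 means no progress (e.g. end of media). */
        if (n <= 0)
            return 0;
        p += n;
        len -= (size_t)n;
        *written += (size_t)n;
    }
    return 1;
}

/* Hypothetical writer that accepts at most 3 bytes per call. */
static ssize_t short_writer(void *ctx, const void *buf, size_t len)
{
    (void)ctx;
    (void)buf;
    return len > 3 ? 3 : (ssize_t)len;
}

/* Hypothetical writer that never makes progress. */
static ssize_t full_device(void *ctx, const void *buf, size_t len)
{
    (void)ctx;
    (void)buf;
    (void)len;
    return 0;
}
```

Treating n == 0 the same as an error keeps the loop bounded: every iteration either consumes at least one byte or exits.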


Re: [PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Joe Perches
On Wed, 2013-10-09 at 11:15 +1100, Ryan Mallon wrote:
> Some setuid binaries will allow reading of files which have read
> permission by the real user id. This is problematic with files which
> use %pK because the file access permission is checked at open() time,
> but the kptr_restrict setting is checked at read() time. If a setuid
> binary opens a %pK file as an unprivileged user, and then elevates
> permissions before reading the file, then kernel pointer values may be
> leaked.

I think it should explicitly test 0.

Dan? Might this be any problem?

Otherwise, just style notes:

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> @@ -1312,10 +1312,26 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
>   spec.field_width = default_width;
>   return string(buf, end, "pK-error", spec);
>   }
> - if (!((kptr_restrict == 0) ||
> -   (kptr_restrict == 1 &&
> -has_capability_noaudit(current, CAP_SYSLOG))))
> - ptr = NULL;
> +
> + /*
> +  * If kptr_restrict is set to 2, then %pK always prints as
> +  * NULL. If it is set to 1, then only print the real pointer
> +  * value if the current process has CAP_SYSLOG and is running
> +  * with the same credentials it started with. This is because
> +  * access to files is checked at open() time, but %pK checks
> +  * permission at read() time. We don't want to leak pointer
> +  * values if a binary opens a file using %pK and then elevates
> +  * privileges before reading it.
> +  */
> + {
> + const struct cred *cred = current_cred();

Please add #include 

> + if (kptr_restrict == 2 || (kptr_restrict == 1 &&
> +  (!has_capability_noaudit(current, CAP_SYSLOG) ||
> +   !uid_eq(cred->euid, cred->uid) ||
> +   !gid_eq(cred->egid, cred->gid))))
> + ptr = NULL;
> + }
>   break;

Also, it might be easier to read as:

if (kptr_restrict == 0)
break;
else if (kptr_restrict == 1) {
const struct cred *cred = current_cred();

if (!has_capability_noaudit(current, CAP_SYSLOG) ||
!uid_eq(cred->euid, cred->uid) ||
!gid_eq(cred->egid, cred->gid))
ptr = NULL;
} else {
ptr = NULL;
}
break;

>   case 'N':
>   switch (fmt[1]) {
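
The decision logic of the suggested rewrite can be modelled outside the kernel as a pure function. This is only a sketch under stated assumptions: pk_hide_pointer is a hypothetical name, and the three boolean parameters stand in for has_capability_noaudit(current, CAP_SYSLOG), uid_eq(cred->euid, cred->uid) and gid_eq(cred->egid, cred->gid).

```c
#include <stdbool.h>

/*
 * Returns true when %pK should print NULL instead of the real pointer.
 * Mirrors the three-way test on kptr_restrict from the review comment:
 *   0      -> always show the pointer
 *   1      -> show only with CAP_SYSLOG and unchanged uid/gid
 *   other  -> always hide
 */
static bool pk_hide_pointer(int kptr_restrict, bool has_cap_syslog,
                            bool euid_is_uid, bool egid_is_gid)
{
    if (kptr_restrict == 0)
        return false;
    if (kptr_restrict == 1)
        return !has_cap_syslog || !euid_is_uid || !egid_is_gid;
    return true;
}
```

Written this way, the kptr_restrict == 0 case is tested explicitly, which is what the review asks for.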




Re: [PATCH 6/7] mmc: core: protect references to host->areq with host->lock

2013-10-08 Thread Grant Grundler
Ulf,
While this patch might be correct, it's not solving the problem I
claimed and my explanation was wrong. See comments in this code
review:
https://chromium-review.googlesource.com/#/c/170880/1//COMMIT_MSG

While I no longer see the same crash with this change in our "ToT
tree", I'm able to reproduce the original mmcqd crash on a different
kernel variant (also based on chromeos-3.4 kernel).

I think I need to review references to mqrq_prev and mqrq_cur since
those appear to be protected by mq->thread_sem and I suspect
references are happening from dw_mmc tasklet without holding this
semaphore.

apologies,
grant
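
For readers following the quoted patch below, the pattern it applies to host->areq can be sketched in userspace C: read the shared pointer and clear it while holding the lock, so that only one caller ever owns a given request. This is an illustration only. The struct and function names are hypothetical, and a pthread mutex stands in for the spinlock taken with spin_lock_irqsave() in the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mmc_async_req and struct mmc_host. */
struct async_req {
    int id;
};

struct host {
    pthread_mutex_t lock;
    struct async_req *areq; /* in-flight request, or NULL */
};

/*
 * Take ownership of the in-flight request, if any. The read and the
 * reset to NULL happen under the same lock, so two callers can never
 * both walk away with the same request.
 */
static struct async_req *claim_areq(struct host *h)
{
    struct async_req *saved;

    pthread_mutex_lock(&h->lock);
    saved = h->areq;
    h->areq = NULL;
    pthread_mutex_unlock(&h->lock);
    return saved;
}
```

As the message above notes, this kind of locking may be correct without addressing the crash that was originally reported.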


On Thu, Sep 26, 2013 at 12:22 PM, Grant Grundler  wrote:
> Races between host->areq being NULL or not are resulting in mmcqd
> hung_task panics. Like this one:
>
> <3>[  240.501202] INFO: task mmcqd/1:85 blocked for more than 120 seconds.
> <3>[  240.501213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[  240.501223] mmcqd/1 D 80528020 085  2 0x
> <5>[  240.501254] [<80528020>] (__schedule+0x614/0x780) from [<80528550>] (schedule+0x94/0x98)
> <5>[  240.501269] [<80528550>] (schedule+0x94/0x98) from [<80526270>] (schedule_timeout+0x38/0x2d0)
> <5>[  240.501284] [<80526270>] (schedule_timeout+0x38/0x2d0) from [<805283a4>] (wait_for_common+0x164/0x1a0)
> <5>[  240.501298] [<805283a4>] (wait_for_common+0x164/0x1a0) from [<805284b8>] (wait_for_completion+0x20/0x24)
> <5>[  240.501313] [<805284b8>] (wait_for_completion+0x20/0x24) from [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84)
> <5>[  240.501327] [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) from [<803d81c0>] (mmc_start_req+0x60/0x120)
> <5>[  240.501342] [<803d81c0>] (mmc_start_req+0x60/0x120) from [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8)
> <5>[  240.501355] [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803e4758>] (mmc_blk_issue_rq+0x424/0x478)
> <5>[  240.501368] [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) from [<803e587c>] (mmc_queue_thread+0xb0/0x118)
> <5>[  240.501382] [<803e587c>] (mmc_queue_thread+0xb0/0x118) from [<8004d61c>] (kthread+0xa8/0xbc)
> <5>[  240.501396] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
> <0>[  240.501407] Kernel panic - not syncing: hung_task: blocked tasks
> <5>[  240.501421] [<800150a4>] (unwind_backtrace+0x0/0x114) from [<80521920>] (dump_stack+0x20/0x24)
> <5>[  240.501434] [<80521920>] (dump_stack+0x20/0x24) from [<80521a90>] (panic+0xa8/0x1f4)
> <5>[  240.501447] [<80521a90>] (panic+0xa8/0x1f4) from [<80086d3c>] (watchdog+0x1f4/0x25c)
> <5>[  240.501459] [<80086d3c>] (watchdog+0x1f4/0x25c) from [<8004d61c>] (kthread+0xa8/0xbc)
> <5>[  240.501471] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8)
>
> I was able to reproduce the mmcqd "hung task" timeout consistently
> with this fio command line on an Exynos5250 system with Toshiba HS200
> eMMC running in HS200 mode:
> fio --name=short_randwrite --size=2G --time_based --runtime=3m \
> --readwrite=randwrite --numjobs=2 --bs=4k --norandommap \
> --ioengine=psync --direct=0 --filename=/dev/mmcblk0p5
>
> I believe the key parameters are "--numjobs=2" (or more) and "randwrite"
> workload.  Then the completions are happening around the same time as
> mmc_start_req() is referencing and/or updating host->areq.
>
> I was NOT able to consistently reproduce the problem on a similar
> Exynos 5250 system which had "engineering samples" of Samsung HS200
> capable eMMC installed. Just my clue that the timing is different
> (and the fio performance numbers are different too).
>
> Signed-off-by: Grant Grundler 
> ---
>  drivers/mmc/core/core.c | 34 +++---
>  1 file changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 36cfe91..e5a9599 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -529,29 +529,40 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host,
>  {
> int saved_err = 0;
> int start_err = 0;
> -   struct mmc_async_req *saved_areq = host->areq;
> +   struct mmc_async_req *saved_areq;
> +   unsigned long flags;
>
> -   if (!saved_areq && !areq)
> -   /* Nothing to do...some code is polling. */
> +   spin_lock_irqsave(&host->lock, flags);
> +   saved_areq = host->areq;
> +   if (!saved_areq && !areq) {
> +   /* Nothing? Code is racing to harvest a completion. */
> +   spin_unlock_irqrestore(&host->lock, flags);
> goto set_error;
> +   }
>
> /* Prepare a new request */
> if (areq)
> mmc_pre_req(host, areq->mrq, !saved_areq);
>
> if (saved_areq) {
> +   /* This thread owns this IO (saved_areq) for now. */
> +   host->areq = NULL;
> +   

Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

2013-10-08 Thread Linus Torvalds
On Tue, Oct 8, 2013 at 3:48 PM, Greg Kroah-Hartman
 wrote:
>
> It's going to make things really noisy at boot time, but then it should
> settle down and not be bad at all.  Let's try it and see if it helps or
> not.

Yeah. And quite frankly, normally that whole DEBUG_KOBJECT_RELEASE
thing is hopefully only enabled in debug kernels (like maybe the
Fedora rawhide one, or at developers), so being a bit more verbose is
likely ok.

  Linus


Re: [xen] double fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

2013-10-08 Thread Fengguang Wu
On Tue, Oct 08, 2013 at 03:48:40PM -0700, Greg KH wrote:
> On Tue, Oct 08, 2013 at 11:14:17PM +0100, Russell King - ARM Linux wrote:
> > On Tue, Oct 08, 2013 at 08:17:42PM +0800, Fengguang Wu wrote:
> > > I find the above debug messages very helpful in locating the buggy
> > > driver. How about enabling it whenever CONFIG_DEBUG_KOBJECT_RELEASE is
> > > enabled? Something like
> > > 
> > >  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> > > -   pr_debug("kobject: '%s' (%p): %s, parent %p (delayed)\n",
> > > +   printk(KERN_INFO "kobject: '%s' (%p): %s, parent %p (delayed)\n",
> > >  kobject_name(kobj), kobj, __func__, kobj->parent);
> > > 
> > > pr_debug() won't be displayed by default, and it depends on
> > > CONFIG_DYNAMIC_DEBUG.
> > 
> > Please consider using pr_info() instead of printk(KERN_INFO - it's
> > slightly less typing.

Good suggestion!

> > I can see that being a useful change while we have these problematical
> > instances to track down, but I do wonder whether it'll make the thing
> > too noisy.  Does anyone have other opinions on this point?  Linus?
> > Greg?
> 
> It's going to make things really noisy at boot time, but then it should
> settle down and not be bad at all.

FYI, in the randconfig kernel involved in this thread, it will emit
about 20 lines of dmesg.

> Let's try it and see if it helps or not.

OK, I'll submit a patch. These messages would allow me to do a
statistical analysis of similar bugs: since I'm testing 100+ new
randconfigs every day, some of which will include the buggy drivers and
some not, we can collect all these lines

[2.929669] kobject: 'drm' (880006db2848): kobject_release, parent 88189648 (delayed)
...
[3.750200] kobject: 'mc13783-adc' (880007707000): kobject_release, parent 88189248 (delayed)

, count how many times each one shows up in the GOOD kernel boots and
how many show up in the BAD boots. Then we may be able to learn which
drivers are likely buggy and should be manually checked.

Thanks,
Fengguang
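
The counting step described above could look roughly like the following C sketch. It is an assumption-laden illustration rather than the actual test tooling: the struct and function names are hypothetical, the line format is taken from the two sample dmesg lines quoted in this message, and each line is expected to arrive unwrapped.

```c
#include <string.h>

#define MAX_NAMES 64
#define NAME_LEN 64

/* Per-kobject counters: how often it appeared in good and bad boots. */
struct tally {
    char name[NAME_LEN];
    int good;
    int bad;
};

struct tally_table {
    struct tally rows[MAX_NAMES];
    int n;
};

/* Copy the quoted kobject name out of a "kobject_release" dmesg line. */
static int extract_kobject_name(const char *line, char *out, size_t outsz)
{
    const char *start = strstr(line, "kobject: '");
    const char *end;
    size_t len;

    if (!start || !strstr(line, "kobject_release"))
        return 0;
    start += strlen("kobject: '");
    end = strchr(start, '\'');
    if (!end)
        return 0;
    len = (size_t)(end - start);
    if (len == 0 || len >= outsz)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Account one dmesg line to the good or bad column of its kobject. */
static void tally_line(struct tally_table *t, const char *line, int boot_was_bad)
{
    char name[NAME_LEN];
    int i;

    if (!extract_kobject_name(line, name, sizeof(name)))
        return;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->rows[i].name, name) == 0)
            break;
    if (i == t->n) {
        if (t->n == MAX_NAMES)
            return; /* table full: drop the line */
        strcpy(t->rows[i].name, name);
        t->rows[i].good = 0;
        t->rows[i].bad = 0;
        t->n++;
    }
    if (boot_was_bad)
        t->rows[i].bad++;
    else
        t->rows[i].good++;
}
```

Names with a high bad count and a low good count would then be the first candidates for manual review.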


[RFC][PATCH 09/13] binfmt_elf: count notes towards coredump limit

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 fs/binfmt_elf.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index bb59220..42ef312 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2034,7 +2034,6 @@ static int elf_core_dump(struct coredump_params *cprm)
int has_dumped = 0;
mm_segment_t fs;
int segs;
-   size_t size = 0;
struct vm_area_struct *vma, *gate_vma;
struct elfhdr *elf = NULL;
loff_t offset = 0, dataoff;
@@ -2155,7 +2154,6 @@ static int elf_core_dump(struct coredump_params *cprm)
if (!elf_core_write_extra_phdrs(cprm, offset))
goto end_coredump;
 
-   size = cprm->written;
/* write out the notes section */
if (!write_note_info(, cprm))
goto end_coredump;
@@ -2167,7 +2165,6 @@ static int elf_core_dump(struct coredump_params *cprm)
if (!dump_seek(cprm->file, dataoff - cprm->written))
goto end_coredump;
 
-   cprm->written = size;
for (vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma)) {
unsigned long addr;
-- 
1.7.2.5




[RFC][PATCH 01/13] restore 32bit aout coredump

2013-10-08 Thread Al Viro

just getting rid of bitrot

Signed-off-by: Al Viro 
---
 arch/x86/ia32/ia32_aout.c |   70 +++--
 1 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index bae3aba..80361c0 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -33,14 +34,18 @@
 #include 
 
 #undef WARN_OLD
-#undef CORE_DUMP /* definitely broken */
 
 static int load_aout_binary(struct linux_binprm *);
 static int load_aout_library(struct file *);
 
-#ifdef CORE_DUMP
-static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file,
- unsigned long limit);
+#ifdef CONFIG_COREDUMP
+static int aout_core_dump(struct coredump_params *);
+
+static unsigned long get_dr(int n)
+{
+   struct perf_event *bp = current->thread.ptrace_bps[n];
+   return bp ? bp->hw.info.address : 0;
+}
 
 /*
  * fill in the user structure for a core dump..
@@ -48,6 +53,7 @@ static int aout_core_dump(long signr, struct pt_regs *regs, 
struct file *file,
 static void dump_thread32(struct pt_regs *regs, struct user32 *dump)
 {
u32 fs, gs;
+   memset(dump, 0, sizeof(*dump));
 
 /* changed the size calculations - should hopefully work better. lbt */
dump->magic = CMAGIC;
@@ -57,15 +63,12 @@ static void dump_thread32(struct pt_regs *regs, struct 
user32 *dump)
dump->u_dsize = ((unsigned long)
 (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
-   dump->u_ssize = 0;
-   dump->u_debugreg[0] = current->thread.debugreg0;
-   dump->u_debugreg[1] = current->thread.debugreg1;
-   dump->u_debugreg[2] = current->thread.debugreg2;
-   dump->u_debugreg[3] = current->thread.debugreg3;
-   dump->u_debugreg[4] = 0;
-   dump->u_debugreg[5] = 0;
+   dump->u_debugreg[0] = get_dr(0);
+   dump->u_debugreg[1] = get_dr(1);
+   dump->u_debugreg[2] = get_dr(2);
+   dump->u_debugreg[3] = get_dr(3);
dump->u_debugreg[6] = current->thread.debugreg6;
-   dump->u_debugreg[7] = current->thread.debugreg7;
+   dump->u_debugreg[7] = current->thread.ptrace_dr7;
 
if (dump->start_stack < 0xc000) {
unsigned long tmp;
@@ -74,24 +77,24 @@ static void dump_thread32(struct pt_regs *regs, struct 
user32 *dump)
dump->u_ssize = tmp >> PAGE_SHIFT;
}
 
-   dump->regs.bx = regs->bx;
-   dump->regs.cx = regs->cx;
-   dump->regs.dx = regs->dx;
-   dump->regs.si = regs->si;
-   dump->regs.di = regs->di;
-   dump->regs.bp = regs->bp;
-   dump->regs.ax = regs->ax;
+   dump->regs.ebx = regs->bx;
+   dump->regs.ecx = regs->cx;
+   dump->regs.edx = regs->dx;
+   dump->regs.esi = regs->si;
+   dump->regs.edi = regs->di;
+   dump->regs.ebp = regs->bp;
+   dump->regs.eax = regs->ax;
dump->regs.ds = current->thread.ds;
dump->regs.es = current->thread.es;
savesegment(fs, fs);
dump->regs.fs = fs;
savesegment(gs, gs);
dump->regs.gs = gs;
-   dump->regs.orig_ax = regs->orig_ax;
-   dump->regs.ip = regs->ip;
+   dump->regs.orig_eax = regs->orig_ax;
+   dump->regs.eip = regs->ip;
dump->regs.cs = regs->cs;
-   dump->regs.flags = regs->flags;
-   dump->regs.sp = regs->sp;
+   dump->regs.eflags = regs->flags;
+   dump->regs.esp = regs->sp;
dump->regs.ss = regs->ss;
 
 #if 1 /* FIXME */
@@ -107,7 +110,7 @@ static struct linux_binfmt aout_format = {
.module = THIS_MODULE,
.load_binary= load_aout_binary,
.load_shlib = load_aout_library,
-#ifdef CORE_DUMP
+#ifdef CONFIG_COREDUMP
.core_dump  = aout_core_dump,
 #endif
.min_coredump   = PAGE_SIZE
@@ -122,7 +125,7 @@ static void set_brk(unsigned long start, unsigned long end)
vm_brk(start, end - start);
 }
 
-#ifdef CORE_DUMP
+#ifdef CONFIG_COREDUMP
 /*
  * These are the only things you should do on a core-file: use only these
  * macros to write out all the necessary info.
@@ -131,14 +134,14 @@ static void set_brk(unsigned long start, unsigned long 
end)
 #include 
 
 #define DUMP_WRITE(addr, nr)\
-   if (!dump_write(file, (void *)(addr), (nr))) \
+   if (!dump_write(cprm->file, (void *)(addr), (nr))) \
goto end_coredump;
 
 #define DUMP_SEEK(offset)  \
-   if (!dump_seek(file, offset))   \
+   if (!dump_seek(cprm->file, offset)) \
goto end_coredump;
 
-#define START_DATA()   (u.u_tsize << PAGE_SHIFT)
+#define START_DATA(u)  (u.u_tsize << PAGE_SHIFT)
 #define START_STACK(u) (u.start_stack)
 
 /*
@@ -151,8 +154,7 @@ static void set_brk(unsigned long start, unsigned long end)
  * dumping of the process results in another error..
  */
 

[RFC][PATCH 11/13] dump_skip(): dump_seek() replacement taking coredump_params

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/x86/ia32/ia32_aout.c |2 +-
 fs/binfmt_aout.c  |2 +-
 fs/binfmt_elf.c   |4 ++--
 fs/binfmt_elf_fdpic.c |   11 +++
 fs/coredump.c |   43 ++-
 include/linux/coredump.h  |3 +--
 6 files changed, 22 insertions(+), 43 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 9e26e9e..d21ff89 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -187,7 +187,7 @@ static int aout_core_dump(struct coredump_params *cprm)
if (!dump_emit(cprm, &dump, sizeof(dump)))
goto end_coredump;
/* Now dump all of the user data.  Include malloced stuff as well */
-   if (!dump_seek(cprm->file, PAGE_SIZE - sizeof(dump)))
+   if (!dump_skip(cprm, PAGE_SIZE - sizeof(dump)))
goto end_coredump;
/* now we start writing out the user space info */
set_fs(USER_DS);
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index a4f847f..ca0ba15 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -87,7 +87,7 @@ static int aout_core_dump(struct coredump_params *cprm)
if (!dump_emit(cprm, &dump, sizeof(dump)))
goto end_coredump;
 /* Now dump all of the user data.  Include malloced stuff as well */
-   if (!dump_seek(cprm->file, PAGE_SIZE - sizeof(dump)))
+   if (!dump_skip(cprm, PAGE_SIZE - sizeof(dump)))
goto end_coredump;
 /* now we start writing out the user space info */
set_fs(USER_DS);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 42ef312..da33c1d 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2162,7 +2162,7 @@ static int elf_core_dump(struct coredump_params *cprm)
goto end_coredump;
 
/* Align to page */
-   if (!dump_seek(cprm->file, dataoff - cprm->written))
+   if (!dump_skip(cprm, dataoff - cprm->written))
goto end_coredump;
 
for (vma = first_vma(current, gate_vma); vma != NULL;
@@ -2183,7 +2183,7 @@ static int elf_core_dump(struct coredump_params *cprm)
kunmap(page);
page_cache_release(page);
} else
-   stop = !dump_seek(cprm->file, PAGE_SIZE);
+   stop = !dump_skip(cprm, PAGE_SIZE);
if (stop)
goto end_coredump;
}
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 1806e25..723fa02 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1510,7 +1510,7 @@ static bool elf_fdpic_dump_segments(struct 
coredump_params *cprm,
kunmap(page);
page_cache_release(page);
} else {
-   res = dump_seek(file, PAGE_SIZE);
+   res = dump_skip(cprm, PAGE_SIZE);
}
if (!res)
return false;
@@ -1548,11 +1548,10 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
int has_dumped = 0;
mm_segment_t fs;
int segs;
-   size_t size = 0;
int i;
struct vm_area_struct *vma;
struct elfhdr *elf = NULL;
-   loff_t offset = 0, dataoff, foffset;
+   loff_t offset = 0, dataoff;
int numnote;
struct memelfnote *notes = NULL;
struct elf_prstatus *prstatus = NULL;   /* NT_PRSTATUS */
@@ -1683,7 +1682,6 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
 
offset += sizeof(*elf); /* Elf header */
offset += segs * sizeof(struct elf_phdr);   /* Program headers */
-   foffset = offset;
 
/* Write notes phdr entry */
{
@@ -1752,8 +1750,6 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
if (!elf_core_write_extra_phdrs(cprm, offset))
goto end_coredump;
 
-   size = cprm->written;
-   cprm->written = foffset;
/* write out the notes section */
for (i = 0; i < numnote; i++)
if (!writenote(notes + i, cprm))
@@ -1769,10 +1765,9 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
goto end_coredump;
}
 
-   if (!dump_seek(cprm->file, dataoff - cprm->written))
+   if (!dump_skip(cprm->file, dataoff - cprm->written))
goto end_coredump;
 
-   cprm->written = size;
if (!elf_fdpic_dump_segments(cprm))
goto end_coredump;
 
diff --git a/fs/coredump.c b/fs/coredump.c
index 478ebad..489372e 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -685,14 +685,6 @@ fail:
  * do on a core-file: use only these functions to write out all the
  * necessary info.
  */
-int dump_write(struct file *file, const void *addr, int nr)
-{
-   

[RFC][PATCH 12/13] spufs: get rid of dump_emit() wrappers

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/powerpc/platforms/cell/spufs/coredump.c |   69 +++--
 1 files changed, 20 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
index 5d9b0a2..158 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -50,33 +50,6 @@ static ssize_t do_coredump_read(int num, struct spu_context 
*ctx, void *buffer,
return ++ret; /* count trailing NULL */
 }
 
-/*
- * These are the only things you should do on a core-file: use only these
- * functions to write out all the necessary info.
- */
-static int spufs_dump_write(struct coredump_params *cprm, const void *addr, int nr)
-{
-   if (!dump_emit(cprm, addr, nr))
-   return -EIO;
-   return 0;
-}
-
-static int spufs_dump_align(struct coredump_params *cprm, char *buf, loff_t new_off)
-{
-   int rc, size;
-
-   size = min((loff_t)PAGE_SIZE, new_off - cprm->written);
-   memset(buf, 0, size);
-
-   rc = 0;
-   while (rc == 0 && new_off > cprm->written) {
-   size = min((loff_t)PAGE_SIZE, new_off - cprm->written);
-   rc = spufs_dump_write(cprm, buf, size);
-   }
-
-   return rc;
-}
-
 static int spufs_ctx_note_size(struct spu_context *ctx, int dfd)
 {
int i, sz, total = 0;
@@ -159,7 +132,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, 
int i,
  struct coredump_params *cprm, int dfd)
 {
loff_t pos = 0;
-   int sz, rc, nread, total = 0;
+   int sz, rc, total = 0;
const int bufsz = PAGE_SIZE;
char *name;
char fullname[80], *buf;
@@ -177,38 +150,36 @@ static int spufs_arch_write_note(struct spu_context *ctx, 
int i,
en.n_descsz = sz;
en.n_type = NT_SPU;
 
-   rc = spufs_dump_write(cprm, &en, sizeof(en));
-   if (rc)
-   goto out;
+   if (!dump_emit(cprm, &en, sizeof(en)))
+   goto Eio;
 
-   rc = spufs_dump_write(cprm, fullname, en.n_namesz);
-   if (rc)
-   goto out;
+   if (!dump_emit(cprm, fullname, en.n_namesz))
+   goto Eio;
 
-   rc = spufs_dump_align(cprm, buf, roundup(cprm->written, 4));
-   if (rc)
-   goto out;
+   if (!dump_skip(cprm, roundup(cprm->written, 4) - cprm->written))
+   goto Eio;
 
do {
-   nread = do_coredump_read(i, ctx, buf, bufsz, &pos);
-   if (nread > 0) {
-   rc = spufs_dump_write(cprm, buf, nread);
-   if (rc)
-   goto out;
-   total += nread;
+   rc = do_coredump_read(i, ctx, buf, bufsz, &pos);
+   if (rc > 0) {
+   if (!dump_emit(cprm, buf, rc))
+   goto Eio;
+   total += rc;
}
-   } while (nread == bufsz && total < sz);
+   } while (rc == bufsz && total < sz);
 
-   if (nread < 0) {
-   rc = nread;
+   if (rc < 0)
goto out;
-   }
-
-   rc = spufs_dump_align(cprm, buf, roundup(cprm->written - total + sz, 4));
 
+   if (!dump_skip(cprm,
+  roundup(cprm->written - total + sz, 4) - cprm->written))
+   goto Eio;
 out:
free_page((unsigned long)buf);
return rc;
+Eio:
+   free_page((unsigned long)buf);
+   return -EIO;
 }
 
 int spufs_coredump_extra_notes_write(struct coredump_params *cprm)
-- 
1.7.2.5

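The two dump_skip() lengths in the patch above are plain alignment arithmetic, sketched here in standalone C. The helper names are hypothetical. round_up follows the usual "round x up to a multiple of align" definition, which is assumed to match the kernel's roundup() for these positive values.

```c
#include <stddef.h>

/* Round x up to the next multiple of align (align > 0). */
static size_t round_up(size_t x, size_t align)
{
    return ((x + align - 1) / align) * align;
}

/* Bytes to skip so that the next write starts on an align-byte boundary. */
static size_t pad_to_align(size_t written, size_t align)
{
    return round_up(written, align) - written;
}

/*
 * Bytes to skip after emitting `total` bytes of a note descriptor whose
 * declared size is `sz`: move to the end of the declared descriptor,
 * round that up to 4, and subtract the current position.
 */
static size_t note_tail_skip(size_t written, size_t total, size_t sz)
{
    return round_up(written - total + sz, 4) - written;
}
```

For example, with 100 bytes written, of which 10 belong to a descriptor declared as 16 bytes, the descriptor ends at offset 106, the next boundary is 108, and 8 bytes are skipped.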



[RFC][PATCH 07/13] switch elf_coredump_extra_notes_write() to dump_emit()

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/powerpc/include/asm/spu.h   |3 +-
 arch/powerpc/platforms/cell/spu_syscalls.c   |5 ++-
 arch/powerpc/platforms/cell/spufs/coredump.c |   44 ++
 arch/powerpc/platforms/cell/spufs/spufs.h|3 +-
 fs/binfmt_elf.c  |7 ++--
 include/linux/elf.h  |6 ++--
 6 files changed, 30 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/spu.h b/arch/powerpc/include/asm/spu.h
index 93f280e..37b7ca3 100644
--- a/arch/powerpc/include/asm/spu.h
+++ b/arch/powerpc/include/asm/spu.h
@@ -235,6 +235,7 @@ extern long spu_sys_callback(struct spu_syscall_block *s);
 
 /* syscalls implemented in spufs */
 struct file;
+struct coredump_params;
 struct spufs_calls {
long (*create_thread)(const char __user *name,
unsigned int flags, umode_t mode,
@@ -242,7 +243,7 @@ struct spufs_calls {
long (*spu_run)(struct file *filp, __u32 __user *unpc,
__u32 __user *ustatus);
int (*coredump_extra_notes_size)(void);
-   int (*coredump_extra_notes_write)(struct file *file, loff_t *foffset);
+   int (*coredump_extra_notes_write)(struct coredump_params *cprm);
void (*notify_spus_active)(void);
struct module *owner;
 };
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index db4e638..3844f13 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -126,7 +127,7 @@ int elf_coredump_extra_notes_size(void)
return ret;
 }
 
-int elf_coredump_extra_notes_write(struct file *file, loff_t *foffset)
+int elf_coredump_extra_notes_write(struct coredump_params *cprm)
 {
struct spufs_calls *calls;
int ret;
@@ -135,7 +136,7 @@ int elf_coredump_extra_notes_write(struct file *file, 
loff_t *foffset)
if (!calls)
return 0;
 
-   ret = calls->coredump_extra_notes_write(file, foffset);
+   ret = calls->coredump_extra_notes_write(cprm);
 
spufs_calls_put(calls);
 
diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
index c9500ea..5d9b0a2 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -27,6 +27,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -52,35 +54,24 @@ static ssize_t do_coredump_read(int num, struct spu_context 
*ctx, void *buffer,
  * These are the only things you should do on a core-file: use only these
  * functions to write out all the necessary info.
  */
-static int spufs_dump_write(struct file *file, const void *addr, int nr, loff_t *foffset)
+static int spufs_dump_write(struct coredump_params *cprm, const void *addr, int nr)
 {
-   unsigned long limit = rlimit(RLIMIT_CORE);
-   ssize_t written;
-
-   if (*foffset + nr > limit)
+   if (!dump_emit(cprm, addr, nr))
return -EIO;
-
-   written = file->f_op->write(file, addr, nr, &file->f_pos);
-   *foffset += written;
-
-   if (written != nr)
-   return -EIO;
-
return 0;
 }
 
-static int spufs_dump_align(struct file *file, char *buf, loff_t new_off,
-   loff_t *foffset)
+static int spufs_dump_align(struct coredump_params *cprm, char *buf, loff_t new_off)
 {
int rc, size;
 
-   size = min((loff_t)PAGE_SIZE, new_off - *foffset);
+   size = min((loff_t)PAGE_SIZE, new_off - cprm->written);
memset(buf, 0, size);
 
rc = 0;
-   while (rc == 0 && new_off > *foffset) {
-   size = min((loff_t)PAGE_SIZE, new_off - *foffset);
-   rc = spufs_dump_write(file, buf, size, foffset);
+   while (rc == 0 && new_off > cprm->written) {
+   size = min((loff_t)PAGE_SIZE, new_off - cprm->written);
+   rc = spufs_dump_write(cprm, buf, size);
}
 
return rc;
@@ -165,7 +156,7 @@ int spufs_coredump_extra_notes_size(void)
 }
 
 static int spufs_arch_write_note(struct spu_context *ctx, int i,
- struct file *file, int dfd, loff_t *foffset)
+ struct coredump_params *cprm, int dfd)
 {
loff_t pos = 0;
int sz, rc, nread, total = 0;
@@ -186,22 +177,22 @@ static int spufs_arch_write_note(struct spu_context *ctx, 
int i,
en.n_descsz = sz;
en.n_type = NT_SPU;
 
-   rc = spufs_dump_write(file, &en, sizeof(en), foffset);
+   rc = spufs_dump_write(cprm, &en, sizeof(en));
if (rc)
goto out;
 
-   rc = spufs_dump_write(file, fullname, en.n_namesz, foffset);
+   rc = spufs_dump_write(cprm, fullname, en.n_namesz);
if (rc)
goto out;
 
-   rc = 

[RFC][PATCH 06/13] convert the rest of binfmt_elf_fdpic to dump_emit()

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c |  109 ++---
 1 files changed, 31 insertions(+), 78 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 77bf7e3..1806e25 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1267,35 +1267,23 @@ static int notesize(struct memelfnote *en)
 
 /* #define DEBUG */
 
-#define DUMP_WRITE(addr, nr, foffset)  \
-   do { if (!dump_write(file, (addr), (nr))) return 0; *foffset += (nr); } while(0)
-
-static int alignfile(struct file *file, loff_t *foffset)
+static int alignfile(struct coredump_params *cprm)
 {
static const char buf[4] = { 0, };
-   DUMP_WRITE(buf, roundup(*foffset, 4) - *foffset, foffset);
-   return 1;
+   return dump_emit(cprm, buf, roundup(cprm->written, 4) - cprm->written);
 }
 
-static int writenote(struct memelfnote *men, struct file *file,
-   loff_t *foffset)
+static int writenote(struct memelfnote *men, struct coredump_params *cprm)
 {
struct elf_note en;
en.n_namesz = strlen(men->name) + 1;
en.n_descsz = men->datasz;
en.n_type = men->type;
 
-   DUMP_WRITE(&en, sizeof(en), foffset);
-   DUMP_WRITE(men->name, en.n_namesz, foffset);
-   if (!alignfile(file, foffset))
-   return 0;
-   DUMP_WRITE(men->data, men->datasz, foffset);
-   if (!alignfile(file, foffset))
-   return 0;
-
-   return 1;
+   return dump_emit(cprm, &en, sizeof(en)) &&
+   dump_emit(cprm, men->name, en.n_namesz) && alignfile(cprm) &&
+   dump_emit(cprm, men->data, men->datasz) && alignfile(cprm);
 }
-#undef DUMP_WRITE
 
 static inline void fill_elf_fdpic_header(struct elfhdr *elf, int segs)
 {
@@ -1500,12 +1488,10 @@ static void fill_extnum_info(struct elfhdr *elf, struct 
elf_shdr *shdr4extnum,
 /*
  * dump the segments for an MMU process
  */
-#ifdef CONFIG_MMU
-static int elf_fdpic_dump_segments(struct file *file, size_t *size,
-  unsigned long *limit, unsigned long mm_flags)
+static bool elf_fdpic_dump_segments(struct coredump_params *cprm,
+  unsigned long mm_flags)
 {
struct vm_area_struct *vma;
-   int err = 0;
 
for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
unsigned long addr;
@@ -1513,53 +1499,30 @@ static int elf_fdpic_dump_segments(struct file *file, 
size_t *size,
if (!maydump(vma, mm_flags))
continue;
 
+#ifdef CONFIG_MMU
for (addr = vma->vm_start; addr < vma->vm_end;
addr += PAGE_SIZE) {
+   bool res;
struct page *page = get_dump_page(addr);
if (page) {
void *kaddr = kmap(page);
-   *size += PAGE_SIZE;
-   if (*size > *limit)
-   err = -EFBIG;
-   else if (!dump_write(file, kaddr, PAGE_SIZE))
-   err = -EIO;
+   res = dump_emit(cprm, kaddr, PAGE_SIZE);
kunmap(page);
page_cache_release(page);
-   } else if (!dump_seek(file, PAGE_SIZE))
-   err = -EFBIG;
-   if (err)
-   goto out;
+   } else {
+   res = dump_seek(file, PAGE_SIZE);
+   }
+   if (!res)
+   return false;
}
-   }
-out:
-   return err;
-}
-#endif
-
-/*
- * dump the segments for a NOMMU process
- */
-#ifndef CONFIG_MMU
-static int elf_fdpic_dump_segments(struct file *file, size_t *size,
-  unsigned long *limit, unsigned long mm_flags)
-{
-   struct vm_area_struct *vma;
-
-   for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
-   if (!maydump(vma, mm_flags))
-   continue;
-
-   if ((*size += PAGE_SIZE) > *limit)
-   return -EFBIG;
-
-   if (!dump_write(file, (void *) vma->vm_start,
+#else
+   if (!dump_emit(cprm, (void *) vma->vm_start,
vma->vm_end - vma->vm_start))
-   return -EIO;
+   return false;
+#endif
}
-
-   return 0;
+   return true;
 }
-#endif
 
 static size_t elf_core_vma_data_size(unsigned long mm_flags)
 {
@@ -1755,13 +1718,10 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
 
offset = dataoff;
 
-   size += sizeof(*elf);
-   if (size > cprm->limit || !dump_write(cprm->file, elf, sizeof(*elf)))
+   if (!dump_emit(cprm, elf, sizeof(*elf)))
 

[PATCH v2] vsprintf: Check real user/group id for %pK

2013-10-08 Thread Ryan Mallon
Some setuid binaries will allow reading of files which have read
permission by the real user id. This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time. If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.

This happens for example with the setuid pppd application on Ubuntu 12.04:

  $ head -1 /proc/kallsyms
   T startup_32

  $ pppd file /proc/kallsyms
  pppd: In file /proc/kallsyms: unrecognized option 'c100'

This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.

Fix this by checking that, in addition to the current process having
CAP_SYSLOG, the effective user and group ids are equal to the real
ids. If a setuid binary reads the contents of a file which uses %pK,
then the pointer values will be printed as NULL if the real user is
unprivileged.

Signed-off-by: Ryan Mallon 
---
Changes since v1:

 * Fix the description to say 'vsprintf' instead of 'printk'.
 * Clarify the open() vs read() time checks in the patch description and
code comment.
 * Remove comment about 'badly written' setuid binaries. This occurs
with setuid binaries which correctly handle opening files.
 * Added extra people to the Cc list.
---

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 26559bd..c02faf3 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -1312,10 +1312,26 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
spec.field_width = default_width;
return string(buf, end, "pK-error", spec);
}
-   if (!((kptr_restrict == 0) ||
- (kptr_restrict == 1 &&
-  has_capability_noaudit(current, CAP_SYSLOG))))
-   ptr = NULL;
+
+   /*
+* If kptr_restrict is set to 2, then %pK always prints as
+* NULL. If it is set to 1, then only print the real pointer
+* value if the current process has CAP_SYSLOG and is running
+* with the same credentials it started with. This is because
+* access to files is checked at open() time, but %pK checks
+* permission at read() time. We don't want to leak pointer
+* values if a binary opens a file using %pK and then elevates
+* privileges before reading it.
+*/
+   {
+   const struct cred *cred = current_cred();
+
+   if (kptr_restrict == 2 || (kptr_restrict == 1 &&
+(!has_capability_noaudit(current, CAP_SYSLOG) ||
+ !uid_eq(cred->euid, cred->uid) ||
+ !gid_eq(cred->egid, cred->gid))))
+   ptr = NULL;
+   }
break;
case 'N':
switch (fmt[1]) {




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
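
The decision made by the %pK patch above can be modeled outside the kernel
as a small pure function. What follows is only an illustrative userspace
sketch, not the kernel code: struct fake_cred and restrict_pointer() are
invented names, the has_syslog flag stands in for
has_capability_noaudit(current, CAP_SYSLOG), and plain integer comparisons
stand in for uid_eq()/gid_eq().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the credential fields the patch compares. */
struct fake_cred {
	unsigned int uid, euid;   /* real and effective user id */
	unsigned int gid, egid;   /* real and effective group id */
};

/*
 * Return the pointer that %pK would print, or NULL if it must be hidden.
 * Mirrors the condition in the patch: always hide when kptr_restrict is 2;
 * when it is 1, hide unless the caller has the capability and is still
 * running with the credentials it started with.
 */
const void *restrict_pointer(int kptr_restrict, bool has_syslog,
			     const struct fake_cred *cred, const void *ptr)
{
	if (kptr_restrict == 2 ||
	    (kptr_restrict == 1 &&
	     (!has_syslog ||
	      cred->euid != cred->uid ||
	      cred->egid != cred->gid)))
		return NULL;
	return ptr;
}
```

With kptr_restrict set to 1, a process whose effective uid differs from its
real uid (the setuid case from the description) gets NULL even if it holds
the capability.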


[RFC][PATCH 04/13] switch elf_core_write_extra_data() to dump_emit()

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/ia64/kernel/elfcore.c |6 ++
 arch/x86/um/elfcore.c  |8 ++--
 fs/binfmt_elf.c|4 +++-
 fs/binfmt_elf_fdpic.c  |4 +++-
 include/linux/elfcore.h|2 +-
 5 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/ia64/kernel/elfcore.c b/arch/ia64/kernel/elfcore.c
index 798ce54..04bc8fd 100644
--- a/arch/ia64/kernel/elfcore.c
+++ b/arch/ia64/kernel/elfcore.c
@@ -40,8 +40,7 @@ int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
return 1;
 }
 
-int elf_core_write_extra_data(struct file *file, size_t *size,
- unsigned long limit)
+int elf_core_write_extra_data(struct coredump_params *cprm)
 {
const struct elf_phdr *const gate_phdrs =
(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
@@ -52,8 +51,7 @@ int elf_core_write_extra_data(struct file *file, size_t *size,
void *addr = (void *)gate_phdrs[i].p_vaddr;
size_t memsz = PAGE_ALIGN(gate_phdrs[i].p_memsz);
 
-   *size += memsz;
-   if (*size > limit || !dump_write(file, addr, memsz))
+   if (!dump_emit(cprm, addr, memsz))
return 0;
break;
}
diff --git a/arch/x86/um/elfcore.c b/arch/x86/um/elfcore.c
index fc21f98..7bb89a2 100644
--- a/arch/x86/um/elfcore.c
+++ b/arch/x86/um/elfcore.c
@@ -38,8 +38,7 @@ int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
return 1;
 }
 
-int elf_core_write_extra_data(struct file *file, size_t *size,
- unsigned long limit)
+int elf_core_write_extra_data(struct coredump_params *cprm)
 {
if ( vsyscall_ehdr ) {
const struct elfhdr *const ehdrp =
@@ -52,10 +51,7 @@ int elf_core_write_extra_data(struct file *file, size_t *size,
if (phdrp[i].p_type == PT_LOAD) {
void *addr = (void *) phdrp[i].p_vaddr;
size_t filesz = phdrp[i].p_filesz;
-
-   *size += filesz;
-   if (*size > limit
-   || !dump_write(file, addr, filesz))
+   if (!dump_emit(cprm, addr, filesz))
return 0;
}
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index aba4199..666a5a5 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2196,8 +2196,10 @@ static int elf_core_dump(struct coredump_params *cprm)
}
}
 
-   if (!elf_core_write_extra_data(cprm->file, &size, cprm->limit))
+   cprm->written = size;
+   if (!elf_core_write_extra_data(cprm))
goto end_coredump;
+   size = cprm->written;
 
if (e_phnum == PN_XNUM) {
size += sizeof(*shdr4extnum);
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 44db8b9..77bf7e3 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1818,8 +1818,10 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
cprm->mm_flags) < 0)
goto end_coredump;
 
-   if (!elf_core_write_extra_data(cprm->file, &size, cprm->limit))
+   cprm->written = size;
+   if (!elf_core_write_extra_data(cprm))
goto end_coredump;
+   size = cprm->written;
 
if (e_phnum == PN_XNUM) {
size += sizeof(*shdr4extnum);
diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index 1b92a8c..698d51a 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -67,7 +67,7 @@ extern Elf_Half elf_core_extra_phdrs(void);
 extern int
 elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset);
 extern int
-elf_core_write_extra_data(struct file *file, size_t *size, unsigned long limit);
+elf_core_write_extra_data(struct coredump_params *cprm);
 extern size_t elf_core_extra_data_size(void);
 
 #endif /* _LINUX_ELFCORE_H */
-- 
1.7.2.5




[RFC][PATCH 13/13] new helper: dump_align()

2013-10-08 Thread Al Viro

dump_skip to given alignment...

Signed-off-by: Al Viro 
---
 arch/powerpc/platforms/cell/spufs/coredump.c |2 +-
 fs/binfmt_elf.c  |   10 ++
 fs/binfmt_elf_fdpic.c|   10 ++
 fs/coredump.c|6 ++
 include/linux/coredump.h |1 +
 5 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
index 158..be6212d 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -156,7 +156,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i,
if (!dump_emit(cprm, fullname, en.n_namesz))
goto Eio;
 
-   if (!dump_skip(cprm, roundup(cprm->written, 4) - cprm->written))
+   if (!dump_align(cprm, 4))
goto Eio;
 
do {
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index da33c1d..b28d141 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1225,12 +1225,6 @@ static int notesize(struct memelfnote *en)
return sz;
 }
 
-static int alignfile(struct coredump_params *cprm)
-{
-   static const char buf[4] = { 0, };
-   return dump_emit(cprm, buf, roundup(cprm->written, 4) - cprm->written);
-}
-
 static int writenote(struct memelfnote *men, struct coredump_params *cprm)
 {
struct elf_note en;
@@ -1239,8 +1233,8 @@ static int writenote(struct memelfnote *men, struct coredump_params *cprm)
en.n_type = men->type;
 
return dump_emit(cprm, &en, sizeof(en)) &&
-   dump_emit(cprm, men->name, en.n_namesz) && alignfile(cprm) &&
-   dump_emit(cprm, men->data, men->datasz) && alignfile(cprm);
+   dump_emit(cprm, men->name, en.n_namesz) && dump_align(cprm, 4) &&
+   dump_emit(cprm, men->data, men->datasz) && dump_align(cprm, 4);
 }
 
 static void fill_elf_header(struct elfhdr *elf, int segs,
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 723fa02..8847a91 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1267,12 +1267,6 @@ static int notesize(struct memelfnote *en)
 
 /* #define DEBUG */
 
-static int alignfile(struct coredump_params *cprm)
-{
-   static const char buf[4] = { 0, };
-   return dump_emit(cprm, buf, roundup(cprm->written, 4) - cprm->written);
-}
-
 static int writenote(struct memelfnote *men, struct coredump_params *cprm)
 {
struct elf_note en;
@@ -1281,8 +1275,8 @@ static int writenote(struct memelfnote *men, struct coredump_params *cprm)
en.n_type = men->type;
 
return dump_emit(cprm, &en, sizeof(en)) &&
-   dump_emit(cprm, men->name, en.n_namesz) && alignfile(cprm) &&
-   dump_emit(cprm, men->data, men->datasz) && alignfile(cprm);
+   dump_emit(cprm, men->name, en.n_namesz) && dump_align(cprm, 4) &&
+   dump_emit(cprm, men->data, men->datasz) && dump_align(cprm, 4);
 }
 
 static inline void fill_elf_fdpic_header(struct elfhdr *elf, int segs)
diff --git a/fs/coredump.c b/fs/coredump.c
index 489372e..ed68432 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -728,3 +728,9 @@ int dump_skip(struct coredump_params *cprm, size_t nr)
}
 }
 EXPORT_SYMBOL(dump_skip);
+
+int dump_align(struct coredump_params *cprm, int align)
+{
+   return dump_skip(cprm, roundup(cprm->written, align) - cprm->written);
+}
+EXPORT_SYMBOL(dump_align);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 07a0af9..d8eb880be 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -13,6 +13,7 @@
 struct coredump_params;
 extern int dump_skip(struct coredump_params *cprm, size_t nr);
 extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
+extern int dump_align(struct coredump_params *cprm, int align);
 #ifdef CONFIG_COREDUMP
 extern void do_coredump(siginfo_t *siginfo);
 #else
-- 
1.7.2.5
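
The arithmetic behind dump_align() above is just the distance from
cprm->written to the next multiple of the alignment. As a rough standalone
illustration (pad_to_align() is a made-up name, and roundup() is open-coded
here because the kernel macro is not available in userspace):

```c
#include <stddef.h>

/*
 * Number of padding bytes needed to bring 'written' up to the next
 * multiple of 'align'. Returns 0 when 'written' is already aligned.
 * 'align' is assumed to be non-zero.
 */
size_t pad_to_align(size_t written, size_t align)
{
	/* Same result as the kernel's roundup(written, align). */
	size_t rounded = ((written + align - 1) / align) * align;

	return rounded - written;
}
```

For the ELF notes written by writenote(), align is 4, so a 5-byte name
would be followed by 3 bytes of padding.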



[RFC][PATCH 08/13] aout: switch to dump_emit

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/x86/ia32/ia32_aout.c |   20 
 fs/binfmt_aout.c  |7 +++
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 80361c0..9e26e9e 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -133,14 +133,6 @@ static void set_brk(unsigned long start, unsigned long end)
 
 #include 
 
-#define DUMP_WRITE(addr, nr)\
-   if (!dump_write(cprm->file, (void *)(addr), (nr))) \
-   goto end_coredump;
-
-#define DUMP_SEEK(offset)  \
-   if (!dump_seek(cprm->file, offset)) \
-   goto end_coredump;
-
 #define START_DATA(u)  (u.u_tsize << PAGE_SHIFT)
 #define START_STACK(u) (u.start_stack)
 
@@ -192,22 +184,26 @@ static int aout_core_dump(struct coredump_params *cprm)
 
set_fs(KERNEL_DS);
/* struct user */
-   DUMP_WRITE(&dump, sizeof(dump));
+   if (!dump_emit(cprm, &dump, sizeof(dump)))
+   goto end_coredump;
/* Now dump all of the user data.  Include malloced stuff as well */
-   DUMP_SEEK(PAGE_SIZE - sizeof(dump));
+   if (!dump_seek(cprm->file, PAGE_SIZE - sizeof(dump)))
+   goto end_coredump;
/* now we start writing out the user space info */
set_fs(USER_DS);
/* Dump the data area */
if (dump.u_dsize != 0) {
dump_start = START_DATA(dump);
dump_size = dump.u_dsize << PAGE_SHIFT;
-   DUMP_WRITE(dump_start, dump_size);
+   if (!dump_emit(cprm, (void *)dump_start, dump_size))
+   goto end_coredump;
}
/* Now prepare to dump the stack area */
if (dump.u_ssize != 0) {
dump_start = START_STACK(dump);
dump_size = dump.u_ssize << PAGE_SHIFT;
-   DUMP_WRITE(dump_start, dump_size);
+   if (!dump_emit(cprm, (void *)dump_start, dump_size))
+   goto end_coredump;
}
 end_coredump:
set_fs(fs);
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index faaa819..a4f847f 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -45,7 +45,6 @@ static int load_aout_library(struct file*);
  */
 static int aout_core_dump(struct coredump_params *cprm)
 {
-   struct file *file = cprm->file;
mm_segment_t fs;
int has_dumped = 0;
void __user *dump_start;
@@ -85,7 +84,7 @@ static int aout_core_dump(struct coredump_params *cprm)
 
set_fs(KERNEL_DS);
 /* struct user */
-   if (!dump_write(file, &dump, sizeof(dump)))
+   if (!dump_emit(cprm, &dump, sizeof(dump)))
goto end_coredump;
 /* Now dump all of the user data.  Include malloced stuff as well */
if (!dump_seek(cprm->file, PAGE_SIZE - sizeof(dump)))
@@ -96,14 +95,14 @@ static int aout_core_dump(struct coredump_params *cprm)
if (dump.u_dsize != 0) {
dump_start = START_DATA(dump);
dump_size = dump.u_dsize << PAGE_SHIFT;
-   if (!dump_write(file, dump_start, dump_size))
+   if (!dump_emit(cprm, dump_start, dump_size))
goto end_coredump;
}
 /* Now prepare to dump the stack area */
if (dump.u_ssize != 0) {
dump_start = START_STACK(dump);
dump_size = dump.u_ssize << PAGE_SHIFT;
-   if (!dump_write(file, dump_start, dump_size))
+   if (!dump_emit(cprm, dump_start, dump_size))
goto end_coredump;
}
 end_coredump:
-- 
1.7.2.5




[RFC][PATCH 10/13] make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

2013-10-08 Thread Al Viro

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.

Signed-off-by: Al Viro 
---
 fs/coredump.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 319f973..478ebad 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -696,13 +696,20 @@ EXPORT_SYMBOL(dump_write);
 int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
 {
struct file *file = cprm->file;
-   if (dump_interrupted() || !access_ok(VERIFY_READ, addr, nr))
-   return 0;
+   loff_t pos = file->f_pos;
+   ssize_t n;
if (cprm->written + nr > cprm->limit)
return 0;
-   if (file->f_op->write(file, addr, nr, &file->f_pos) != nr)
-   return 0;
-   cprm->written += nr;
+   while (nr) {
+   if (dump_interrupted())
+   return 0;
+   n = vfs_write(file, addr, nr, &pos);
+   if (n <= 0)
+   return 0;
+   file->f_pos = pos;
+   cprm->written += n;
+   nr -= n;
+   }
return 1;
 }
 EXPORT_SYMBOL(dump_emit);
-- 
1.7.2.5
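
The short-write handling described in this patch follows the usual pattern
of looping until every byte has been accepted. A userspace analogue, given
only as a sketch under stated assumptions: write_all() is an invented name,
POSIX write() stands in for vfs_write(), and retrying on EINTR is a
userspace convention (the kernel code instead gives up when
dump_interrupted() is true).

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all 'len' bytes of 'buf' to 'fd', continuing after short writes.
 * Returns 0 once everything has been written, -1 on error or no progress.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;   /* interrupted before writing: retry */
			return -1;
		}
		if (n == 0)
			return -1;          /* no progress: stop rather than spin */
		p += n;                     /* skip what was already written */
		len -= (size_t)n;
	}
	return 0;
}
```

A pipe reader that drains slowly is the typical case where a single write()
accepts fewer bytes than requested and the loop has to go around again.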




[RFC][PATCH 02/13] new helper: dump_emit()

2013-10-08 Thread Al Viro

dump_write() analog, takes coredump_params instead of file,
keeps track of the amount written in cprm->written and checks for
cprm->limit.  Start using it in binfmt_elf.c...
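
To illustrate the accounting described above, here is a minimal userspace
model. It is a sketch, not the helper from this patch: struct dump_state and
the emit()/emit_record() functions are hypothetical names, and a memory
buffer plays the role of cprm->file. The buffer is assumed to hold at least
'limit' bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct coredump_params. */
struct dump_state {
	char *out;        /* destination buffer, in place of cprm->file */
	size_t written;   /* bytes emitted so far, like cprm->written */
	size_t limit;     /* maximum total size, like cprm->limit */
};

/* Append nr bytes; return 1 on success, 0 if the limit would be exceeded. */
int emit(struct dump_state *st, const void *addr, size_t nr)
{
	if (st->written + nr > st->limit)
		return 0;
	memcpy(st->out + st->written, addr, nr);
	st->written += nr;
	return 1;
}

/* Callers can chain writes with &&, as the converted writenote() does. */
int emit_record(struct dump_state *st, const char *name, const char *data)
{
	return emit(st, name, strlen(name) + 1) &&
	       emit(st, data, strlen(data) + 1);
}
```

Because the size check and the running total live in one place, callers no
longer need their own "size += nr; if (size > limit)" bookkeeping.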

Signed-off-by: Al Viro 
---
 fs/binfmt_elf.c  |   62 +
 fs/coredump.c|   14 ++
 include/linux/binfmts.h  |1 +
 include/linux/coredump.h |2 +
 4 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 501c8a4..4bdf6b6 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1225,35 +1225,23 @@ static int notesize(struct memelfnote *en)
return sz;
 }
 
-#define DUMP_WRITE(addr, nr, foffset)  \
-   do { if (!dump_write(file, (addr), (nr))) return 0; *foffset += (nr); } while(0)
-
-static int alignfile(struct file *file, loff_t *foffset)
+static int alignfile(struct coredump_params *cprm)
 {
static const char buf[4] = { 0, };
-   DUMP_WRITE(buf, roundup(*foffset, 4) - *foffset, foffset);
-   return 1;
+   return dump_emit(cprm, buf, roundup(cprm->written, 4) - cprm->written);
 }
 
-static int writenote(struct memelfnote *men, struct file *file,
-   loff_t *foffset)
+static int writenote(struct memelfnote *men, struct coredump_params *cprm)
 {
struct elf_note en;
en.n_namesz = strlen(men->name) + 1;
en.n_descsz = men->datasz;
en.n_type = men->type;
 
-   DUMP_WRITE(&en, sizeof(en), foffset);
-   DUMP_WRITE(men->name, en.n_namesz, foffset);
-   if (!alignfile(file, foffset))
-   return 0;
-   DUMP_WRITE(men->data, men->datasz, foffset);
-   if (!alignfile(file, foffset))
-   return 0;
-
-   return 1;
+   return dump_emit(cprm, &en, sizeof(en)) &&
+   dump_emit(cprm, men->name, en.n_namesz) && alignfile(cprm) &&
+   dump_emit(cprm, men->data, men->datasz) && alignfile(cprm);
 }
-#undef DUMP_WRITE
 
 static void fill_elf_header(struct elfhdr *elf, int segs,
u16 machine, u32 flags)
@@ -1702,7 +1690,7 @@ static size_t get_note_info_size(struct elf_note_info *info)
  * process-wide notes are interleaved after the first thread-specific note.
  */
 static int write_note_info(struct elf_note_info *info,
-  struct file *file, loff_t *foffset)
+  struct coredump_params *cprm)
 {
bool first = 1;
struct elf_thread_core_info *t = info->thread;
@@ -1710,22 +1698,22 @@ static int write_note_info(struct elf_note_info *info,
do {
int i;
 
-   if (!writenote(&t->notes[0], file, foffset))
+   if (!writenote(&t->notes[0], cprm))
return 0;
 
-   if (first && !writenote(&info->psinfo, file, foffset))
+   if (first && !writenote(&info->psinfo, cprm))
return 0;
-   if (first && !writenote(&info->signote, file, foffset))
+   if (first && !writenote(&info->signote, cprm))
return 0;
-   if (first && !writenote(&info->auxv, file, foffset))
+   if (first && !writenote(&info->auxv, cprm))
return 0;
if (first && info->files.data &&
-   !writenote(&info->files, file, foffset))
+   !writenote(&info->files, cprm))
return 0;
 
for (i = 1; i < info->thread_notes; ++i)
if (t->notes[i].data &&
-   !writenote(&t->notes[i], file, foffset))
+   !writenote(&t->notes[i], cprm))
return 0;
 
first = 0;
@@ -1934,14 +1922,14 @@ static size_t get_note_info_size(struct elf_note_info *info)
return sz;
 }
 
-static int write_note_info(struct elf_note_info *info,
-  struct file *file, loff_t *foffset)
+static int write_note_info(struct elf_note_info *info,
+  struct coredump_params *cprm)
 {
int i;
struct list_head *t;
 
for (i = 0; i < info->numnote; i++)
-   if (!writenote(info->notes + i, file, foffset))
+   if (!writenote(info->notes + i, cprm))
return 0;
 
/* write out the thread status notes section */
@@ -1950,7 +1938,7 @@ static int write_note_info(struct elf_note_info *info,
list_entry(t, struct elf_thread_status, list);
 
for (i = 0; i < tmp->num_notes; i++)
-   if (!writenote(&tmp->notes[i], file, foffset))
+   if (!writenote(&tmp->notes[i], cprm))
return 0;
}
 
@@ -2136,13 +2124,10 @@ static int elf_core_dump(struct coredump_params *cprm)
 
offset = dataoff;
 
-   size += sizeof(*elf);
-   if (size > cprm->limit || !dump_write(cprm->file, elf, sizeof(*elf)))
+   if 

[RFC][PATCH 05/13] binfmt_elf: convert writing actual dump pages to dump_emit()

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 fs/binfmt_elf.c |   14 +++---
 1 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 666a5a5..bc01aaf 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2093,7 +2093,6 @@ static int elf_core_dump(struct coredump_params *cprm)
 
offset += sizeof(*elf); /* Elf header */
offset += segs * sizeof(struct elf_phdr);   /* Program headers */
-   foffset = offset;
 
/* Write notes phdr entry */
{
@@ -2157,7 +2156,6 @@ static int elf_core_dump(struct coredump_params *cprm)
goto end_coredump;
 
size = cprm->written;
-   cprm->written = foffset;/* will disappear */
/* write out the notes section */
if (!write_note_info(&info, cprm))
goto end_coredump;
@@ -2170,6 +2168,7 @@ static int elf_core_dump(struct coredump_params *cprm)
if (!dump_seek(cprm->file, dataoff - foffset))
goto end_coredump;
 
+   cprm->written = size;
for (vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma)) {
unsigned long addr;
@@ -2184,9 +2183,7 @@ static int elf_core_dump(struct coredump_params *cprm)
page = get_dump_page(addr);
if (page) {
void *kaddr = kmap(page);
-   stop = ((size += PAGE_SIZE) > cprm->limit) ||
-   !dump_write(cprm->file, kaddr,
-   PAGE_SIZE);
+   stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
kunmap(page);
page_cache_release(page);
} else
@@ -2196,16 +2193,11 @@ static int elf_core_dump(struct coredump_params *cprm)
}
}
 
-   cprm->written = size;
if (!elf_core_write_extra_data(cprm))
goto end_coredump;
-   size = cprm->written;
 
if (e_phnum == PN_XNUM) {
-   size += sizeof(*shdr4extnum);
-   if (size > cprm->limit
-   || !dump_write(cprm->file, shdr4extnum,
-  sizeof(*shdr4extnum)))
+   if (!dump_emit(cprm, shdr4extnum, sizeof(*shdr4extnum)))
goto end_coredump;
}
 
-- 
1.7.2.5




[RFC][PATCH 00/13] coredump cleanups

2013-10-08 Thread Al Viro
This series tries to regularize the coredump writes/seeks/rlimit handling
etc.  Quite a bit of boilerplate code removed, another open-coded caller of
->write() converted to use of normal codepath (which was the original reason
I've got into that mess).  RLIMIT_CORE handling got a lot more uniform,
short writes handling fixed (we had weird cases where a very large ->write()
to pipe might happen and where short write had been treated as an error).

It needs more testing; I don't have any test setups that could deal
with spufs, for example, and while it seems to produce normal coredumps on
what I have tested, I'd still like more eyes on it.

The same series is in the tip of vfs.git#experimental, so if you prefer
git access for review...

Shortlog:
  restore 32bit aout coredump
  new helper: dump_emit()
  switch elf_core_write_extra_phdrs() to dump_emit()
  switch elf_core_write_extra_data() to dump_emit()
  binfmt_elf: convert writing actual dump pages to dump_emit()
  convert the rest of binfmt_elf_fdpic to dump_emit()
  switch elf_coredump_extra_notes_write() to dump_emit()
  aout: switch to dump_emit
  binfmt_elf: count notes towards coredump limit
  make dump_emit() use vfs_write() instead of banging at ->f_op->write directly
  dump_skip(): dump_seek() replacement taking coredump_params
  spufs: get rid of dump_emit() wrappers
  new helper: dump_align()

Diffstat:
 arch/ia64/kernel/elfcore.c   |   12 +--
 arch/powerpc/include/asm/spu.h   |3 +-
 arch/powerpc/platforms/cell/spu_syscalls.c   |5 +-
 arch/powerpc/platforms/cell/spufs/coredump.c |   89 ++--
 arch/powerpc/platforms/cell/spufs/spufs.h|3 +-
 arch/x86/ia32/ia32_aout.c|   86 ++--
 arch/x86/um/elfcore.c|   15 +---
 fs/binfmt_aout.c |9 +-
 fs/binfmt_elf.c  |   84 ++-
 fs/binfmt_elf_fdpic.c|  114 +++---
 fs/coredump.c|   64 +--
 include/linux/binfmts.h  |1 +
 include/linux/coredump.h |6 +-
 include/linux/elf.h  |6 +-
 include/linux/elfcore.h  |7 +-
 kernel/elfcore.c |   10 +--
 16 files changed, 196 insertions(+), 318 deletions(-)



[RFC][PATCH 03/13] switch elf_core_write_extra_phdrs() to dump_emit()

2013-10-08 Thread Al Viro

Signed-off-by: Al Viro 
---
 arch/ia64/kernel/elfcore.c |6 ++
 arch/x86/um/elfcore.c  |7 ++-
 fs/binfmt_elf.c|4 ++--
 fs/binfmt_elf_fdpic.c  |4 +++-
 include/linux/elfcore.h|5 +++--
 kernel/elfcore.c   |   10 +++---
 6 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/ia64/kernel/elfcore.c b/arch/ia64/kernel/elfcore.c
index bac1639..798ce54 100644
--- a/arch/ia64/kernel/elfcore.c
+++ b/arch/ia64/kernel/elfcore.c
@@ -11,8 +11,7 @@ Elf64_Half elf_core_extra_phdrs(void)
return GATE_EHDR->e_phnum;
 }
 
-int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
-  unsigned long limit)
+int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
 {
const struct elf_phdr *const gate_phdrs =
(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
@@ -35,8 +34,7 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
phdr.p_offset += ofs;
}
phdr.p_paddr = 0; /* match other core phdrs */
-   *size += sizeof(phdr);
-   if (*size > limit || !dump_write(file, &phdr, sizeof(phdr)))
+   if (!dump_emit(cprm, &phdr, sizeof(phdr)))
return 0;
}
return 1;
diff --git a/arch/x86/um/elfcore.c b/arch/x86/um/elfcore.c
index 6bb49b6..fc21f98 100644
--- a/arch/x86/um/elfcore.c
+++ b/arch/x86/um/elfcore.c
@@ -11,8 +11,7 @@ Elf32_Half elf_core_extra_phdrs(void)
return vsyscall_ehdr ? (((struct elfhdr *)vsyscall_ehdr)->e_phnum) : 0;
 }
 
-int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
-  unsigned long limit)
+int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
 {
if ( vsyscall_ehdr ) {
const struct elfhdr *const ehdrp =
@@ -32,9 +31,7 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
phdr.p_offset += ofs;
}
phdr.p_paddr = 0; /* match other core phdrs */
-   *size += sizeof(phdr);
-   if (*size > limit
-   || !dump_write(file, &phdr, sizeof(phdr)))
+   if (!dump_emit(cprm, &phdr, sizeof(phdr)))
return 0;
}
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 4bdf6b6..aba4199 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2152,11 +2152,11 @@ static int elf_core_dump(struct coredump_params *cprm)
if (!dump_emit(cprm, &phdr, sizeof(phdr)))
goto end_coredump;
}
-   size = cprm->written;
 
-   if (!elf_core_write_extra_phdrs(cprm->file, offset, &size, cprm->limit))
+   if (!elf_core_write_extra_phdrs(cprm, offset))
goto end_coredump;
 
+   size = cprm->written;
cprm->written = foffset;/* will disappear */
/* write out the notes section */
if (!write_note_info(&info, cprm))
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index ea4c627..44db8b9 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1791,9 +1791,11 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
goto end_coredump;
}
 
-   if (!elf_core_write_extra_phdrs(cprm->file, offset, &size, cprm->limit))
+   cprm->written = size;
+   if (!elf_core_write_extra_phdrs(cprm, offset))
goto end_coredump;
 
+   size = cprm->written;
/* write out the notes section */
for (i = 0; i < numnote; i++)
if (!writenote(notes + i, cprm->file, &foffset))
diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index cdd3d13..1b92a8c 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -6,6 +6,8 @@
 #include 
 #include 
 
+struct coredump_params;
+
 static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
 {
 #ifdef ELF_CORE_COPY_REGS
@@ -63,8 +65,7 @@ static inline int elf_core_copy_task_xfpregs(struct task_struct *t, elf_fpxregse
  */
 extern Elf_Half elf_core_extra_phdrs(void);
 extern int
-elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
-  unsigned long limit);
+elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset);
 extern int
 elf_core_write_extra_data(struct file *file, size_t *size, unsigned long limit);
 extern size_t elf_core_extra_data_size(void);
diff --git a/kernel/elfcore.c b/kernel/elfcore.c
index ff915ef..e556751 100644
--- a/kernel/elfcore.c
+++ b/kernel/elfcore.c
@@ -1,23 +1,19 @@
 #include 
 #include 
 #include 
-
-#include 
-
+#include 
 
 Elf_Half __weak elf_core_extra_phdrs(void)
 {
return 0;
 }
 
-int __weak elf_core_write_extra_phdrs(struct file 



Re: [RFC PATCH v2 0/1] FPGA subsystem core

2013-10-08 Thread delicious quinoa
On Tue, Oct 8, 2013 at 4:44 PM, Greg Kroah-Hartman
 wrote:
> On Tue, Oct 08, 2013 at 12:00:14PM -0500, Alan Tull wrote:
>> On Fri, 2013-10-04 at 16:33 -0700, Greg Kroah-Hartman wrote:
>> > On Fri, Oct 04, 2013 at 11:12:13AM -0700, H. Peter Anvin wrote:
>> > > On 10/04/2013 10:44 AM, Michal Simek wrote:
>> > > >
>> > > > If you look at it in general I believe that there is wide range of
>> > > > applications which just contain one bitstream per fpga and the
>> > > > bitstream is replaced by newer version in upgrade. For them
>> > > > firmware interface should be pretty useful. Just setup firmware
>> > > > name with bitstream and it will be automatically loaded in startup
>> > > > phase.
>> > > >
>> > > > Then there is another set of applications especially in connection
>> > > > to partial reconfiguration where this can be done statically by
>> > > > pregenerated partial bitstreams or automatically generated on
>> > > > target cpu. For doing everything on the target firmware interface
>> > > > is not the best because everything can be handled by user
>> > > > application and it is easier just to push this bitstream to the
>> > > > device and not to save it to the fs.
>> > > >
>> > > > I think the question here is if this subsystem could have several
>> > > > interfaces. For example Alan is asking for adding char support.
>> > > > Does it even make sense to have more interfaces with the same
>> > > > backend driver? When this is answered then we can talk which one
>> > > > make sense to have. In v2 is sysfs and firmware one. Adding char
>> > > > is also easy to do.
>> > > >
>> > >
>> > > Greg, what do you think?
>> > >
>> > > I agree that the firmware interface makes sense when the use of the
>> > > FPGA is an implementation detail in a fixed hardware configuration,
>> > > but that is a fairly restricted use case all things considered.
>> >
>> > Ideally I thought this would be just like "firmware", you dump the file
>> > to the FPGA, it validates it and away you go with a new image running in
>> > the chip.
>> >
>> > But, it sounds like this is much more complicated, so much so that
>> > configfs might be the correct interface for it, as you can do lots of
>> > things there, and it is very flexible (some say too flexible...)
>> >
>> > A char device, with a zillion different custom ioctls is also a way to
>> > do it, but one that I really want to avoid as that gets messy really
>> > quickly.
>>
>> Hi Greg,
>>
>> We are discussing a char device that has very few interfaces:
>>  - a way of writing the image to fpga
>>  - a way of getting fpga manager status
>>  - a way of setting fpga manager state
>>
>> This all looks like standard char driver interface to me.  Writing the
>> image could be writing to the devnode (cat image.bin > /dev/fpga0). The
>> status stuff would be sysfs attributes.  All normal stuff any char
>> driver in the kernel would do.  Why not just go with that?
>
> Because we really hate to add new ioctls to the kernel if at all
> possible.

I don't see any need for adding any ioctls.

> Using sysfs (and its one-value-per-file rule), makes
> userspace tools easier, and managing the different devices in the system
> easier (you know _exactly_ which device you are talking to, you don't
> have to guess based on minor number).

That's cool.  The interface we could use is writing the raw fpga data
to /sys/class/fpga_manager/fpga0/fpga_config_data

Reading or setting the fpga state could be from
/sys/class/fpga_manager/fpga0/fpga_config_state

Or do I misunderstand?  Do you include sysfs attributes when you
are talking about ioctls?
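
For illustration, a minimal userspace sketch of that flow: copy an image file into the data attribute, then read the state attribute back. The two paths are the ones proposed above and are not an existing kernel ABI; push_image and read_attr are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Attribute paths as proposed in this thread (assumption, not a merged ABI). */
#define FPGA_CONFIG_DATA  "/sys/class/fpga_manager/fpga0/fpga_config_data"
#define FPGA_CONFIG_STATE "/sys/class/fpga_manager/fpga0/fpga_config_state"

/* Copy all of src_path into dst_path.  Returns 0 on success, -1 on error.  */
static int
push_image (const char *src_path, const char *dst_path)
{
  FILE *src = fopen (src_path, "rb");
  if (src == NULL)
    return -1;
  FILE *dst = fopen (dst_path, "wb");
  if (dst == NULL)
    {
      fclose (src);
      return -1;
    }
  char buf[4096];
  size_t n;
  int ret = 0;
  while ((n = fread (buf, 1, sizeof buf, src)) > 0)
    if (fwrite (buf, 1, n, dst) != n)
      {
        ret = -1;
        break;
      }
  if (ferror (src))
    ret = -1;
  fclose (src);
  if (fclose (dst) != 0)
    ret = -1;
  return ret;
}

/* Read a one-value-per-file attribute into buf and strip the trailing
   newline.  Returns 0 on success, -1 on error.  */
static int
read_attr (const char *path, char *buf, size_t size)
{
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  if (fgets (buf, (int) size, f) == NULL)
    {
      fclose (f);
      return -1;
    }
  fclose (f);
  buf[strcspn (buf, "\n")] = '\0';
  return 0;
}
```

With those helpers, push_image ("image.bin", FPGA_CONFIG_DATA) would do the same job as cat image.bin > fpga_config_data, and read_attr (FPGA_CONFIG_STATE, state, sizeof state) would fetch the state string. No ioctl is involved in either step.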

Alan

>
> thanks,
>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] MAINTAINERS / Documentation: Update Rafael's e-mail address

2013-10-08 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The e-mail address r...@sisk.pl that I have been using for quite some
time is going to expire at one point, so replace it with a new one,
r...@rjwysocki.net, everywhere in MAINTAINERS and Documentation/ABI.

Signed-off-by: Rafael J. Wysocki 
---
 Documentation/ABI/testing/sysfs-devices-power |   32 +-
 Documentation/ABI/testing/sysfs-power |   22 -
 MAINTAINERS   |   12 -
 3 files changed, 33 insertions(+), 33 deletions(-)

Index: linux-pm/Documentation/ABI/testing/sysfs-devices-power
===
--- linux-pm.orig/Documentation/ABI/testing/sysfs-devices-power
+++ linux-pm/Documentation/ABI/testing/sysfs-devices-power
@@ -1,6 +1,6 @@
 What:  /sys/devices/.../power/
 Date:  January 2009
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../power directory contains attributes
allowing the user space to check and modify some power
@@ -8,7 +8,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup
 Date:  January 2009
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../power/wakeup attribute allows the user
space to check if the device is enabled to wake up the system
@@ -34,7 +34,7 @@ Description:
 
 What:  /sys/devices/.../power/control
 Date:  January 2009
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../power/control attribute allows the user
space to control the run-time power management of the device.
@@ -53,7 +53,7 @@ Description:
 
 What:  /sys/devices/.../power/async
 Date:  January 2009
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../async attribute allows the user space to
enable or diasble the device's suspend and resume callbacks to
@@ -79,7 +79,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_count
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_count attribute contains the number
of signaled wakeup events associated with the device.  This
@@ -88,7 +88,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_active_count
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_active_count attribute contains the
number of times the processing of wakeup events associated with
@@ -98,7 +98,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_abort_count
 Date:  February 2012
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_abort_count attribute contains the
number of times the processing of a wakeup event associated with
@@ -109,7 +109,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_expire_count
 Date:  February 2012
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_expire_count attribute contains the
number of times a wakeup event associated with the device has
@@ -119,7 +119,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_active
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_active attribute contains either 1,
or 0, depending on whether or not a wakeup event associated with
@@ -129,7 +129,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_total_time_ms
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_total_time_ms attribute contains
the total time of processing wakeup events associated with the
@@ -139,7 +139,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_max_time_ms
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_max_time_ms attribute contains
the maximum time of processing a single wakeup event associated
@@ -149,7 +149,7 @@ Description:
 
 What:  /sys/devices/.../power/wakeup_last_time_ms
 Date:  September 2010
-Contact:   Rafael J. Wysocki 
+Contact:   Rafael J. Wysocki 
 Description:
The /sys/devices/.../wakeup_last_time_ms attribute contains

Re: [PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions

2013-10-08 Thread Joseph S. Myers
On Tue, 8 Oct 2013, Joseph S. Myers wrote:

> I'll send as a followup the testcase I used for verifying that the
> instructions (other than the theoretical conversions to 64-bit
> integers) produce the correct results.  In addition, this has been
> tested with the glibc testsuite (with the e500 port as posted at
> , where it
> improves the libm test results.

Here is that testcase.

#include <stdio.h>
#include <stddef.h>

#define INFF __builtin_inff ()
#define INFD __builtin_inf ()
#define NANF __builtin_nanf ("")
#define NAND __builtin_nan ("")

/* e500 rounding modes: 0 = nearest, 1 = zero, 2 = up, 3 = down.  */

static inline void
set_rm (unsigned int mode)
{
  unsigned int spefscr;
  asm volatile ("mfspefscr %0" : "=r" (spefscr));
  spefscr = (spefscr & ~3) | mode;
  asm volatile ("mtspefscr %0" : : "r" (spefscr));
}

static int success_count, failure_count;

struct float_test_data
{
  float input;
  unsigned int expected[4];
};

struct double_test_data
{
  double input;
  unsigned int expected[4];
};

typedef float vfloat __attribute__ ((vector_size (8)));
typedef unsigned int vuint __attribute__ ((vector_size (8)));

union vfloat_union
{
  vfloat vf;
  float f[2];
};

union vuint_union
{
  vuint vui;
  unsigned int ui[2];
};

#define T(A, B, C, D, E) { (A), { (B), (C), (D), (E) } }
#define TZ(A, B) T (A, B, B, B, B)

static void
check_result (const char *insn, double input, unsigned int rm,
              unsigned int expected, unsigned int res)
{
  if (res == expected)
    success_count++;
  else
    {
      failure_count++;
      printf ("%s %a mode %u expected 0x%x (%d) got 0x%x (%d)\n",
              insn, input, rm, expected, (int) expected, res, (int) res);
    }
}

#define RUN_FLOAT_TESTS(INSN) \
static void \
test_##INSN (void) \
{ \
  size_t i; \
  for (i = 0; \
       i < sizeof (INSN##_test_data) / sizeof (INSN##_test_data[0]); \
       i++) \
    { \
      unsigned int rm; \
      for (rm = 0; rm <= 3; rm++) \
        { \
          set_rm (rm); \
          unsigned int res; \
          asm volatile (#INSN " %0, %1" \
                        : "=r" (res) \
                        : "r" (INSN##_test_data[i].input)); \
          check_result (#INSN, INSN##_test_data[i].input, rm, \
                        INSN##_test_data[i].expected[rm], res); \
        } \
    } \
}

#define RUN_VFLOAT_TESTS(INSN, TINSN) \
static void \
test_##INSN (void) \
{ \
  size_t i; \
  for (i = 0; \
       i < sizeof (TINSN##_test_data) / sizeof (TINSN##_test_data[0]); \
       i++) \
    { \
      unsigned int rm; \
      for (rm = 0; rm <= 3; rm++) \
        { \
          set_rm (rm); \
          union vfloat_union varg; \
          union vuint_union vres; \
          varg.f[0] = TINSN##_test_data[i].input; \
          varg.f[1] = 0; \
          asm volatile (#INSN " %0, %1" \
                        : "=r" (vres.vui) \
                        : "r" (varg.vf)); \
          check_result (#INSN " (high)", TINSN##_test_data[i].input, \
                        rm, TINSN##_test_data[i].expected[rm], \
                        vres.ui[0]);

[PATCH 0/2] Update my e-mail address and ACPI info in MAINTAINERS

2013-10-08 Thread Rafael J. Wysocki
Hi All,

The following two patches update my e-mail address (the one I use for kernel
development) and ACPI-related information in MAINTAINERS.

Kind regards,
Rafael



[PATCH 2/2] MAINTAINERS / ACPI: Update links and git tree information

2013-10-08 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Update the ACPI subsystem's git tree and Web links in the
MAINTAINERS file.

Signed-off-by: Rafael J. Wysocki 
---
 MAINTAINERS |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-pm/MAINTAINERS
===
--- linux-pm.orig/MAINTAINERS
+++ linux-pm/MAINTAINERS
@@ -239,9 +239,9 @@ ACPI
 M: Len Brown 
 M: Rafael J. Wysocki 
 L: linux-a...@vger.kernel.org
-W: http://www.lesswatts.org/projects/acpi/
-Q: http://patchwork.kernel.org/project/linux-acpi/list/
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
+W: https://01.org/linux-acpi
+Q: https://patchwork.kernel.org/project/linux-acpi/list/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
 S: Supported
 F: drivers/acpi/
 F: drivers/pnp/pnpacpi/



[PATCH] powerpc: fix e500 SPE float to integer and fixed-point conversions

2013-10-08 Thread Joseph S. Myers
From: Joseph Myers 

The e500 SPE floating-point emulation code has several problems in how
it handles conversions to integer and fixed-point fractional types.

There are the following 20 relevant instructions.  These can convert
to signed or unsigned 32-bit integers, either rounding towards zero
(as correct for C casts from floating-point to integer) or according
to the current rounding mode, or to signed or unsigned 32-bit
fixed-point values (values in the range [-1, 1) or [0, 1)).  For
conversion from double precision there are also instructions to
convert to 64-bit integers, rounding towards zero, although as far as
I know those instructions are completely theoretical (they are only
defined for implementations that support both SPE and classic 64-bit,
and I'm not aware of any such hardware even though the architecture
definition permits that combination).

#define EFSCTUI         0x2d4
#define EFSCTSI         0x2d5
#define EFSCTUF         0x2d6
#define EFSCTSF         0x2d7
#define EFSCTUIZ        0x2d8
#define EFSCTSIZ        0x2da

#define EVFSCTUI        0x294
#define EVFSCTSI        0x295
#define EVFSCTUF        0x296
#define EVFSCTSF        0x297
#define EVFSCTUIZ       0x298
#define EVFSCTSIZ       0x29a

#define EFDCTUIDZ       0x2ea
#define EFDCTSIDZ       0x2eb

#define EFDCTUI         0x2f4
#define EFDCTSI         0x2f5
#define EFDCTUF         0x2f6
#define EFDCTSF         0x2f7
#define EFDCTUIZ        0x2f8
#define EFDCTSIZ        0x2fa

The emulation code, for the instructions that come in variants
rounding either towards zero or according to the current rounding
direction, uses "if (func & 0x4)" as a condition for using _FP_ROUND
(otherwise _FP_ROUND_ZERO is used).  The condition is correct, but the
code it controls isn't.  Whether _FP_ROUND or _FP_ROUND_ZERO is used
makes no difference, as the effect of those soft-fp macros is to round
an intermediate floating-point result using the low three bits (the
last one sticky) of the working format.  As these operations are
dealing with a freshly unpacked floating-point input, those low bits
are zero and no rounding occurs.  The emulation code then uses the
FP_TO_INT_* macros for the actual integer conversion, with the effect
of always rounding towards zero; for rounding according to the current
rounding direction, it should be using FP_TO_INT_ROUND_*.
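
The difference between always truncating and honouring the rounding direction can be sketched in plain C. This is not the soft-fp code and does not use the FP_TO_INT_* macros; to_int_round and the RM_* names are made up for illustration, with the modes numbered in the e500 order (0 = nearest, 1 = zero, 2 = up, 3 = down). It assumes the rounded result fits in 32 bits and does no NaN or saturation handling.

```c
#include <stdint.h>

/* e500 rounding mode numbering: 0 = nearest, 1 = zero, 2 = up, 3 = down.  */
enum { RM_NEAREST = 0, RM_ZERO = 1, RM_UP = 2, RM_DOWN = 3 };

/* Convert x to int32_t according to rm.  Hypothetical helper: assumes the
   rounded result is representable and x is not a NaN.  */
static int32_t
to_int_round (double x, int rm)
{
  int32_t t = (int32_t) x;        /* a C cast always truncates toward zero */
  double frac = x - (double) t;   /* discarded part, same sign as x */

  if (frac == 0.0)
    return t;
  switch (rm)
    {
    case RM_UP:
      return frac > 0.0 ? t + 1 : t;
    case RM_DOWN:
      return frac < 0.0 ? t - 1 : t;
    case RM_NEAREST:
      {
        double mag = frac < 0.0 ? -frac : frac;
        /* Ties go to the even neighbour.  */
        int away = mag > 0.5 || (mag == 0.5 && (t & 1));
        if (!away)
          return t;
        return frac > 0.0 ? t + 1 : t - 1;
      }
    default:                      /* RM_ZERO */
      return t;
    }
}
```

An emulation that only ever takes the RM_ZERO path for every mode behaves like the truncating conversion described above, whatever rounding mode is set.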

The instructions in question have semantics defined (in the Power ISA
documents) for out-of-range values and NaNs: out-of-range values
saturate and NaNs are converted to zero.  The emulation does nothing
to follow those semantics for NaNs (the soft-fp handling is to treat
them as infinities), and messes up the saturation semantics.  For
single-precision conversion to integers, (((func & 0x3) != 0) || SB_s)
is the condition used for doing a signed conversion.  The first part
is correct, but the second isn't: negative numbers should result in
saturation to 0 when converted to unsigned.  Double-precision
conversion to 64-bit integers correctly uses ((func & 0x1) == 0).
Double-precision conversion to 32-bit integers uses (((func & 0x3) !=
0) || DB_s), with correct first part and incorrect second part.  And
vector float conversion to integers uses (((func & 0x3) != 0) ||
SB0_s) (and similar for the other vector element), where the sign bit
check is again wrong.
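
As a reference for the semantics described here (out-of-range values saturate, NaNs become zero, negative inputs to an unsigned conversion give 0), a minimal C sketch of the truncating 32-bit conversions follows. The function names sat_ctuiz and sat_ctsiz are hypothetical and only loosely echo the mnemonics; this is not the emulation code.

```c
#include <stdint.h>

/* float -> unsigned 32-bit, rounding toward zero.
   NaN -> 0, negative -> 0, too large -> 0xffffffff.  */
static uint32_t
sat_ctuiz (float x)
{
  if (x != x)                   /* only a NaN compares unequal to itself */
    return 0;
  if (x <= 0.0f)                /* includes -Inf: saturate to 0 */
    return 0;
  if (x >= 4294967296.0f)       /* 2^32 and above, includes +Inf */
    return UINT32_MAX;
  return (uint32_t) x;
}

/* float -> signed 32-bit, rounding toward zero.
   NaN -> 0, out of range -> INT32_MIN or INT32_MAX.  */
static int32_t
sat_ctsiz (float x)
{
  if (x != x)
    return 0;
  if (x >= 2147483648.0f)       /* 2^31 and above */
    return INT32_MAX;
  if (x <= -2147483648.0f)      /* -2^31 and below */
    return INT32_MIN;
  return (int32_t) x;
}
```

Note that the unsigned variant never looks at the sign to pick a signed conversion; a negative input simply hits the lower saturation bound.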

The incorrect handling of negative numbers converted to unsigned was
introduced in commit afc0a07d4a283599ac3a6a31d7454e9baaeccca0.  The
rationale given there was a C testcase with cast from float to
unsigned int.  Conversion of out-of-range floating-point numbers to
integer types in C is undefined behavior in the base standard, defined
in Annex F to produce an unspecified value.  That is, the C testcase
used to justify that patch is incorrect - there is no ISO C
requirement for a particular value resulting from this conversion -
and in any case, the correct semantics for such emulation are the
semantics for the instruction (unsigned saturation, which is what it
does in hardware when the emulation is disabled).

The conversion to fixed-point values has its own problems.  That code
doesn't try to do a full emulation; it relies on the trap handler only
being called for arguments that are infinities, NaNs, subnormal or out
of range.  That's fine, but the logic ((vb.wp[1] >> 23) == 0xff &&
((vb.wp[1] & 0x7f) > 0)) for NaN detection won't detect negative
NaNs as being NaNs (the same applies for the double-precision case),
and subnormals are mapped to 0 rather than respecting the rounding
mode; the code should also explicitly raise the "invalid" exception.
The code for vectors works by executing the scalar float instruction
with the trapping disabled, meaning at least subnormals won't be
handled correctly.
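
The NaN detection problem can be reproduced on raw single-precision bit patterns. In the sketch below the function names are hypothetical, and the fraction mask 0x7fffff (the full 23-bit fraction field) is an assumption on my part rather than a quote from the emulation code. The first check compares the sign bit together with the exponent, so it only matches positive NaNs; the second masks the sign off first.

```c
#include <stdint.h>

/* Sign-sensitive check: bits >> 23 keeps the sign bit above the 8
   exponent bits, so a negative NaN yields 0x1ff and never equals 0xff.  */
static int
is_nan_sign_sensitive (uint32_t bits)
{
  return (bits >> 23) == 0xff && (bits & 0x7fffff) > 0;
}

/* Sign-agnostic check: exponent all ones and a non-zero fraction.  */
static int
is_nan_any_sign (uint32_t bits)
{
  return ((bits >> 23) & 0xff) == 0xff && (bits & 0x7fffff) != 0;
}
```

For example, 0x7fc00000 (a positive quiet NaN) passes both checks, while 0xffc00000 (the same NaN with the sign bit set) passes only the second.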

As well as all those problems in the main emulation code, the rounding
handler - used to emulate rounding upward and downward when not
supported in hardware and when no higher priority exception occurred -
has its own problems.

* It gets 

Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Scott Wood
On Tue, 2013-10-08 at 17:25 -0600, Bjorn Helgaas wrote:
> >> - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
> >> + dma_addr_t msiir; /* MSIIR Address in CCSR */
> >
> > Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies
> > that it's the output of the DMA API, but I don't think the DMA API is
> > used in the MSI driver.  Perhaps it should be, but we still want the raw
> > physical address to pass on to VFIO.
> 
> I don't know what "msiir" is used for, but if it's an address you
> program into a PCI device, then it's a dma_addr_t even if you didn't
> get it from the DMA API.  Maybe "bus_addr_t" would have been a more
> suggestive name than "dma_addr_t".  That said, I have no idea how this
> relates to VFIO.

It's a bit awkward because it gets used both as something to program
into a PCI device (and it's probably a bug that the DMA API doesn't get
used), and also (if I understand the current plans correctly) as a
physical address to give to VFIO to be a destination address in an IOMMU
mapping.  So I think the value we keep here should be a phys_addr_t (it
comes straight from the MMIO address in the device tree), which gets
trivially turned into a dma_addr_t by the non-VFIO code path because
there's currently no translation there.
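
Roughly, the arrangement described above would look like the following. This is a userspace sketch with stand-in typedefs, not kernel code; struct msi_region_sketch and both accessor names are hypothetical, and the identity conversion holds only under the stated assumption that there is no translation on the non-VFIO path.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel types of the same names.  */
typedef uint64_t phys_addr_t;
typedef uint64_t dma_addr_t;

/* Hypothetical container: keep the raw physical address of MSIIR,
   taken straight from the MMIO address in the device tree.  */
struct msi_region_sketch
{
  phys_addr_t msiir;
};

/* Non-VFIO path: the value programmed into the PCI device.  With no
   translation in place this is a trivial conversion.  */
static dma_addr_t
msiir_for_device (const struct msi_region_sketch *r)
{
  return (dma_addr_t) r->msiir;
}

/* VFIO path: the physical address used as the destination of an
   IOMMU mapping.  */
static phys_addr_t
msiir_for_iommu (const struct msi_region_sketch *r)
{
  return r->msiir;
}
```

If a translation were ever introduced on the device side, only msiir_for_device would need to change; the stored value and the VFIO path would stay the same.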

-Scott





[Trivial PATCH] media: Remove unnecessary semicolons

2013-10-08 Thread Joe Perches
These aren't necessary after switch and while statements.

Signed-off-by: Joe Perches 
---
 drivers/media/common/b2c2/flexcop-sram.c | 6 +++---
 drivers/media/dvb-frontends/cx24110.c| 2 +-
 drivers/media/dvb-frontends/cx24123.c| 2 +-
 drivers/media/dvb-frontends/tda8083.c| 4 ++--
 drivers/media/i2c/soc_camera/ov9640.c| 2 +-
 drivers/media/platform/exynos4-is/fimc-isp.c | 2 +-
 drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v5.c  | 2 +-
 drivers/media/platform/s5p-mfc/s5p_mfc_cmd_v6.c  | 2 +-
 drivers/media/platform/s5p-mfc/s5p_mfc_opr_v5.c  | 2 +-
 drivers/media/platform/s5p-tv/mixer_grp_layer.c  | 2 +-
 drivers/media/platform/s5p-tv/mixer_vp_layer.c   | 2 +-
 drivers/media/radio/si470x/radio-si470x-common.c | 2 +-
 drivers/media/v4l2-core/v4l2-ctrls.c | 2 +-
 13 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/media/common/b2c2/flexcop-sram.c 
b/drivers/media/common/b2c2/flexcop-sram.c
index f2199e4..185c285 100644
--- a/drivers/media/common/b2c2/flexcop-sram.c
+++ b/drivers/media/common/b2c2/flexcop-sram.c
@@ -85,7 +85,7 @@ static void flexcop_sram_write(struct adapter *adapter, u32 
bank, u32 addr, u8 *
while (((read_reg_dw(adapter, 0x700) & 0x8000) != 0) && 
(retries > 0)) {
mdelay(1);
retries--;
-   };
+   }
 
if (retries == 0)
printk("%s: SRAM timeout\n", __func__);
@@ -110,7 +110,7 @@ static void flex_sram_read(struct adapter *adapter, u32 
bank, u32 addr, u8 *buf,
while (((read_reg_dw(adapter, 0x700) & 0x8000) != 0) && 
(retries > 0)) {
mdelay(1);
retries--;
-   };
+   }
 
if (retries == 0)
printk("%s: SRAM timeout\n", __func__);
@@ -122,7 +122,7 @@ static void flex_sram_read(struct adapter *adapter, u32 
bank, u32 addr, u8 *buf,
while (((read_reg_dw(adapter, 0x700) & 0x8000) != 0) && 
(retries > 0)) {
mdelay(1);
retries--;
-   };
+   }
 
if (retries == 0)
printk("%s: SRAM timeout\n", __func__);
diff --git a/drivers/media/dvb-frontends/cx24110.c 
b/drivers/media/dvb-frontends/cx24110.c
index 0cd6927..95b981c 100644
--- a/drivers/media/dvb-frontends/cx24110.c
+++ b/drivers/media/dvb-frontends/cx24110.c
@@ -378,7 +378,7 @@ static int cx24110_set_voltage (struct dvb_frontend* fe, 
fe_sec_voltage_t voltag
return 
cx24110_writereg(state,0x76,(cx24110_readreg(state,0x76)&0x3b)|0x40);
default:
return -EINVAL;
-   };
+   }
 }
 
 static int cx24110_diseqc_send_burst(struct dvb_frontend* fe, 
fe_sec_mini_cmd_t burst)
diff --git a/drivers/media/dvb-frontends/cx24123.c 
b/drivers/media/dvb-frontends/cx24123.c
index a771da3..72fb583 100644
--- a/drivers/media/dvb-frontends/cx24123.c
+++ b/drivers/media/dvb-frontends/cx24123.c
@@ -739,7 +739,7 @@ static int cx24123_set_voltage(struct dvb_frontend *fe,
return 0;
default:
return -EINVAL;
-   };
+   }
 
return 0;
 }
diff --git a/drivers/media/dvb-frontends/tda8083.c 
b/drivers/media/dvb-frontends/tda8083.c
index 9d08350..69e62f4 100644
--- a/drivers/media/dvb-frontends/tda8083.c
+++ b/drivers/media/dvb-frontends/tda8083.c
@@ -189,7 +189,7 @@ static int tda8083_set_tone (struct tda8083_state* state, 
fe_sec_tone_mode_t ton
return tda8083_writereg (state, 0x29, 0x80);
default:
return -EINVAL;
-   };
+   }
 }
 
 static int tda8083_set_voltage (struct tda8083_state* state, fe_sec_voltage_t 
voltage)
@@ -201,7 +201,7 @@ static int tda8083_set_voltage (struct tda8083_state* 
state, fe_sec_voltage_t vo
return tda8083_writereg (state, 0x20, 0x11);
default:
return -EINVAL;
-   };
+   }
 }
 
 static int tda8083_send_diseqc_burst (struct tda8083_state* state, 
fe_sec_mini_cmd_t burst)
diff --git a/drivers/media/i2c/soc_camera/ov9640.c 
b/drivers/media/i2c/soc_camera/ov9640.c
index e968c3f..bc74224 100644
--- a/drivers/media/i2c/soc_camera/ov9640.c
+++ b/drivers/media/i2c/soc_camera/ov9640.c
@@ -371,7 +371,7 @@ static void ov9640_alter_regs(enum v4l2_mbus_pixelcode code,
alt->com13  = OV9640_COM13_RGB_AVG;
alt->com15  = OV9640_COM15_RGB_565;
break;
-   };
+   }
 }
 
 /* Setup registers according to resolution and color encoding */
diff --git a/drivers/media/platform/exynos4-is/fimc-isp.c 
b/drivers/media/platform/exynos4-is/fimc-isp.c
index d2e6cba..f3c6136 100644
--- a/drivers/media/platform/exynos4-is/fimc-isp.c
+++ b/drivers/media/platform/exynos4-is/fimc-isp.c
@@ -511,7 +511,7 @@ static int __ctrl_set_metering(struct fimc_is *is, 

Re: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bjorn Helgaas
>> - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
>> + dma_addr_t msiir; /* MSIIR Address in CCSR */
>
> Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies
> that it's the output of the DMA API, but I don't think the DMA API is
> used in the MSI driver.  Perhaps it should be, but we still want the raw
> physical address to pass on to VFIO.

I don't know what "msiir" is used for, but if it's an address you
program into a PCI device, then it's a dma_addr_t even if you didn't
get it from the DMA API.  Maybe "bus_addr_t" would have been a more
suggestive name than "dma_addr_t".  That said, I have no idea how this
relates to VFIO.

Bjorn

