Re: [munlock] BUG: Bad page map in process killall5 pte:cf17e720 pmd:05a22067

2013-09-25 Thread Bob Liu
On Thu, Sep 26, 2013 at 2:49 PM, Vlastimil Babka  wrote:
> On 09/26/2013 02:40 AM, Fengguang Wu wrote:
>> Hi Vlastimil,
>>
>> FYI, this bug seems still not fixed in linux-next 20130925.
>
> Hi,
>
> I sent (including you) a RFC patch and later reviewed patch about week
> ago. I assumed you would test it, but I probably should make that
> request explicit, sorry. Anyway it was added to -mm an hour before your
> mail.
>

Great! And please ignore my noise in this thread.

-- 
Regards,
--Bob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 6/6] rwsem: do optimistic spinning for writer lock acquisition

2013-09-25 Thread Ingo Molnar

* Tim Chen  wrote:

> We want to add optimistic spinning to rwsems because
> the writer rwsem does not perform as well as mutexes. Tim noticed that
> for exim (mail server) workloads, when reverting commit 4fc3f1d6 and
> Davidlohr noticed it when converting the i_mmap_mutex to a rwsem in some
> aim7 workloads. We've noticed that the biggest difference
> is when we fail to acquire a mutex in the fastpath, optimistic spinning
> comes in to play and we can avoid a large amount of unnecessary sleeping
> and overhead of moving tasks in and out of wait queue.
> 
> Allowing optimistic spinning before putting the writer on the wait queue
> reduces wait queue contention and provided greater chance for the rwsem
> to get acquired. With these changes, rwsem is on par with mutex.
> 
> Reviewed-by: Ingo Molnar 
> Reviewed-by: Peter Zijlstra 
> Reviewed-by: Peter Hurley 
> Signed-off-by: Tim Chen 
> Signed-off-by: Davidlohr Bueso 
> ---
>  include/linux/rwsem.h |6 +-
>  kernel/rwsem.c|   19 +-
>  lib/rwsem.c   |  203 
> -
>  3 files changed, 207 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 0616ffe..ef5a83a 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -26,6 +26,8 @@ struct rw_semaphore {
>   longcount;
>   raw_spinlock_t  wait_lock;
>   struct list_headwait_list;
> + struct task_struct  *owner; /* write owner */
> + void*spin_mlock;

> +#define MLOCK(rwsem)((struct mcs_spin_node **)&((rwsem)->spin_mlock))

> + mcs_spin_lock(MLOCK(sem), &node);

> + mcs_spin_unlock(MLOCK(sem), &node);

> + mcs_spin_unlock(MLOCK(sem), &node);

> + mcs_spin_unlock(MLOCK(sem), &node);

That forced type casting is ugly and fragile.

To avoid having to include mcslock.h into rwsem.h just add a forward 
struct declaration, before the struct rw_semaphore definition:

struct mcs_spin_node;

Then define spin_mlock with the right type:

struct mcs_spin_node*spin_mlock;

I'd also suggest renaming 'spin_mlock', to reduce unnecessary variants. If 
the lock type name is 'struct mcs_spin_node' then 'mcs_lock' would be a 
perfect field name, right?

While at it, renaming mcs_spin_node to mcs_spinlock might be wise as well, 
and the include file would be named mcs_spinlock.h.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 14/21] cpuidle: coupled: don't compare cpu masks unnecessarily

2013-09-25 Thread Colin Cross
On Thu, Sep 26, 2013 at 1:36 AM, Viresh Kumar  wrote:
> On 26 September 2013 05:55, Colin Cross  wrote:
>> I don't agree with this.  This patch is a tiny optimization in code
>> that is rarely called, and it moves a final sanity check somewhere
>> that it might get missed if the code were later refactored.
>
> This is what we are doing for the first cpu of coupled-cpus:
> if (WARN_ON(!cpumask_equal(&dev->coupled_cpus, &dev->coupled_cpus)))
> coupled->prevent++;
>
> i.e. comparing a variable to itself :)
>
> And I believe my patch puts the sanity check at the right place (where
> we are using coupled from existing CPUs.. And that is where it should
> have been since the beginning)..
>
> If people miss this during code re-factoring, then it would be even more
> stupid on the part of Author and Reviewer.. And if it still gets missed
> then this is not the only place where we need to worry about such stuff..
>
> This is present everywhere in our code.. You can't really some part of
> code to some place and leave the other as-is.. The change is supposed
> to be more logical and so funny mistakes must be caught during reviews.

It's fine where it is.  Once dev and coupled are both known,
regardless of how they were found, it performs a final sanity check
that nothing went wrong.  Moving into a specific branch of the finding
code defeats the purpose.  Yes, it performs a useless check on the
first cpu, but it keeps the code readable and maintainable, so it
stays where it is.  NAK.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [munlock] BUG: Bad page map in process killall5 pte:cf17e720 pmd:05a22067

2013-09-25 Thread Vlastimil Babka
On 09/26/2013 02:40 AM, Fengguang Wu wrote:
> Hi Vlastimil,
> 
> FYI, this bug seems still not fixed in linux-next 20130925.

Hi,

I sent (including you) a RFC patch and later reviewed patch about week
ago. I assumed you would test it, but I probably should make that
request explicit, sorry. Anyway it was added to -mm an hour before your
mail.

Vlastimil
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-25 Thread Ingo Molnar

* Tim Chen  wrote:

> We will need the MCS lock code for doing optimistic spinning for rwsem.
> Extracting the MCS code from mutex.c and put into its own file allow us
> to reuse this code easily for rwsem.
> 
> Signed-off-by: Tim Chen 
> Signed-off-by: Davidlohr Bueso 
> ---
>  include/linux/mcslock.h |   58 
> +++
>  kernel/mutex.c  |   58 +-
>  2 files changed, 65 insertions(+), 51 deletions(-)
>  create mode 100644 include/linux/mcslock.h
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> new file mode 100644
> index 000..20fd3f0
> --- /dev/null
> +++ b/include/linux/mcslock.h
> @@ -0,0 +1,58 @@
> +/*
> + * MCS lock defines
> + *
> + * This file contains the main data structure and API definitions of MCS 
> lock.

A (very) short blurb about what an MCS lock is would be nice here.

> + */
> +#ifndef __LINUX_MCSLOCK_H
> +#define __LINUX_MCSLOCK_H
> +
> +struct mcs_spin_node {
> + struct mcs_spin_node *next;
> + int   locked;   /* 1 if lock acquired */
> +};

The vertical alignment looks broken here.

> +
> +/*
> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> + * time spent in this lock function.
> + */
> +static noinline
> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> +{
> + struct mcs_spin_node *prev;
> +
> + /* Init node */
> + node->locked = 0;
> + node->next   = NULL;
> +
> + prev = xchg(lock, node);
> + if (likely(prev == NULL)) {
> + /* Lock acquired */
> + node->locked = 1;
> + return;
> + }
> + ACCESS_ONCE(prev->next) = node;
> + smp_wmb();
> + /* Wait until the lock holder passes the lock down */
> + while (!ACCESS_ONCE(node->locked))
> + arch_mutex_cpu_relax();
> +}
> +
> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct 
> mcs_spin_node *node)
> +{
> + struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> +
> + if (likely(!next)) {
> + /*
> +  * Release the lock by setting it to NULL
> +  */
> + if (cmpxchg(lock, node, NULL) == node)
> + return;
> + /* Wait until the next pointer is set */
> + while (!(next = ACCESS_ONCE(node->next)))
> + arch_mutex_cpu_relax();
> + }
> + ACCESS_ONCE(next->locked) = 1;
> + smp_wmb();
> +}
> +
> +#endif

We typically close header guards not via a plain #endif but like this:

#endif /* __LINUX_SPINLOCK_H */

#endif /* __LINUX_SPINLOCK_TYPES_H */

etc.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen/mmu: Correct PAT MST setting.

2013-09-25 Thread Jan Beulich
>>> On 25.09.13 at 21:29, Konrad Rzeszutek Wilk  wrote:
> Jan Beulich spotted that the PAT MSR settings in the Xen public
> document that "the first (PAT6) column was wrong across the
> board, and the column for PAT7 was missing altogether."
> 
> This updates it to be in sync.
> 
> Signed-off-by: Konrad Rzeszutek Wilk 

Acked-by: Jan Beulich 

> ---
>  arch/x86/xen/mmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index fdc3ba2..c9631e7 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -468,8 +468,8 @@ PV_CALLEE_SAVE_REGS_THUNK(xen_pgd_val);
>   * 3PCD PWT  UC   UC UC
>   * 4PAT  WB   WC WB
>   * 5PAT PWT  WC   WP WT
> - * 6PAT PCD  UC-  UC UC-
> - * 7PAT PCD PWT  UC   UC UC
> + * 6PAT PCD  UC-  rsvUC-
> + * 7PAT PCD PWT  UC   rsvUC
>   */
>  
>  void xen_set_pat(u64 pat)
> -- 
> 1.8.3.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 12/18] regulator: max8952: use devm_regulator_register()

2013-09-25 Thread Jingoo Han
On Thursday, September 26, 2013 11:10 AM, Jingoo Han wrote:
> 
> Use devm_regulator_register() to make cleanup paths simpler.
> 
> Signed-off-by: Jingoo Han 
> ---
>  drivers/regulator/max8952.c |6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 

[.]

> @@ -321,9 +322,6 @@ static int max8952_pmic_remove(struct i2c_client *client)
>  {
>   struct max8952_data *max8952 = i2c_get_clientdata(client);
>   struct max8952_platform_data *pdata = max8952->pdata;
> - struct regulator_dev *rdev = max8952->rdev;
> -
> - regulator_unregister(rdev);
> 
>   gpio_free(pdata->gpio_vid0);
>   gpio_free(pdata->gpio_vid1);

CC'ed Sachin Kamat,

Freeing the gpios before unregistering the device is not right,
even though it does not make the functional problems.
So, I will remove this patch from v3 patch series.

Best regards,
Jingoo Han

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 21/21] cpuidle: change governor from within cpuidle_replace_governor()

2013-09-25 Thread Viresh Kumar
On 26 September 2013 04:20, Daniel Lezcano  wrote:
> Actually I am wondering if all this stuff is not over-engineered.
>
> There are 2 governors, each of them suits for a specific kernel config,
> periodic tick or tickless system.
>
> menu => tickless system
> periodic => periodic tick system
>
> IMHO, all the code with rating checking and so do not really makes
> sense. Wouldn't make sense to remove this code ?

I am a newbie here, really can't think of all side effects of this :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


BUG with kernel 3.11.1

2013-09-25 Thread Udo van den Heuvel
Hello,

I found this in /var/log/messages:

Sep 26 03:37:36 recorder kernel: BUG: unable to handle kernel NULL
pointer dereference at 0008
Sep 26 03:37:36 recorder kernel: IP: []
__rb_insert_augmented+0x23/0x200
Sep 26 03:37:36 recorder kernel: PGD cf8d5067 PUD caf6a067 PMD 0
Sep 26 03:37:36 recorder kernel: Oops:  [#1] PREEMPT SMP
Sep 26 03:37:36 recorder kernel: Modules linked in: f71882fg hid_generic
usbhid cp210x hid usbserial tda1004x budget_av saa7146_vv
videobuf_dma_sg videobuf_core v4l2_common videodev budget_core
ttpci_eeprom saa7146 snd_hda_codec_hdmi dvb_core snd_hda_codec_realtek
snd_hda_intel radeon snd_hda_codec fbcon snd_seq snd_seq_device bitblit
snd_pcm softcursor microcode font snd_page_alloc k10temp cfbfillrect
snd_timer evdev cfbimgblt cfbcopyarea backlight snd drm_kms_helper ttm
acpi_cpufreq i2c_piix4 ohci_pci ohci_hcd mperf processor button nfsd
auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4
ata_generic ehci_pci ehci_hcd sr_mod cdrom
Sep 26 03:37:36 recorder kernel: CPU: 0 PID: 10571 Comm: ld-linux-x86-64
Not tainted 3.11.1 #1
Sep 26 03:37:36 recorder kernel: Hardware name: To Be Filled By O.E.M.
To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  05/04/2010
Sep 26 03:37:36 recorder kernel: task: 8800cc44d950 ti:
8800cf98c000 task.ti: 8800cf98c000
Sep 26 03:37:36 recorder kernel: RIP: 0010:[]
[] __rb_insert_augmented+0x23/0x200
Sep 26 03:37:36 recorder kernel: RSP: 0018:8800cf98dde8  EFLAGS:
00010246
Sep 26 03:37:36 recorder kernel: RAX: 880108f90090 RBX:
 RCX: 880108f90090
Sep 26 03:37:36 recorder kernel: RDX: 810e20c0 RSI:
88010fa18a38 RDI: 880107164240
Sep 26 03:37:36 recorder kernel: RBP: 8801071211e8 R08:
0222 R09: 0020
Sep 26 03:37:36 recorder kernel: R10: 8801071641e8 R11:
8800cf98df48 R12: 88010fa18a38
Sep 26 03:37:36 recorder kernel: R13: 00394ac2 R14:
 R15: 880107233500
Sep 26 03:37:36 recorder kernel: FS:  7fa4bcd83700()
GS:88011fc0() knlGS:
Sep 26 03:37:36 recorder kernel: CS:  0010 DS:  ES:  CR0:
8005003b
Sep 26 03:37:36 recorder kernel: CR2: 0008 CR3:
caf9d000 CR4: 07f0
Sep 26 03:37:36 recorder kernel: Stack:
Sep 26 03:37:36 recorder kernel: 810e20c0 8801071641e8
8801071211e8 
Sep 26 03:37:36 recorder kernel: 810eb78b 88010fa18a38
00394ae23000 880107233540
Sep 26 03:37:36 recorder kernel: 88010c560ef8 88010fa18a18
880108e99180 
Sep 26 03:37:36 recorder kernel: Call Trace:
Sep 26 03:37:36 recorder kernel: [] ?
balloon_page_enqueue+0x90/0x90
Sep 26 03:37:36 recorder kernel: [] ?
vma_adjust+0x23b/0x670
Sep 26 03:37:36 recorder kernel: [] ?
__split_vma.isra.35+0x11f/0x1d0
Sep 26 03:37:36 recorder kernel: [] ?
mprotect_fixup+0x1c1/0x270
Sep 26 03:37:36 recorder kernel: [] ?
SyS_mprotect+0x153/0x250
Sep 26 03:37:36 recorder kernel: [] ?
system_call_fastpath+0x16/0x1b
Sep 26 03:37:36 recorder kernel: Code: 0f 1f 84 00 00 00 00 00 48 8b 07
48 85 c0 0f 84 ec 01 00 00 41 54 49 89 f4 55 53 48 83 ec 08 48 8b 18 f6
c3 01 0f 85 8c 01 00 00 <48> 8b 4b 08 49 89 d8 48 39 c8 74 4d 48 85 c9
0f 84 a8 00 00 00
Sep 26 03:37:36 recorder kernel: RIP  []
__rb_insert_augmented+0x23/0x200
Sep 26 03:37:36 recorder kernel: RSP 
Sep 26 03:37:36 recorder kernel: CR2: 0008
Sep 26 03:37:36 recorder kernel: ---[ end trace 7983ef4438c44a1f ]---

If you need more info, please let me know.

Kind regards,
Udo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 14/21] cpuidle: coupled: don't compare cpu masks unnecessarily

2013-09-25 Thread Viresh Kumar
On 26 September 2013 05:55, Colin Cross  wrote:
> I don't agree with this.  This patch is a tiny optimization in code
> that is rarely called, and it moves a final sanity check somewhere
> that it might get missed if the code were later refactored.

This is what we are doing for the first cpu of coupled-cpus:
if (WARN_ON(!cpumask_equal(&dev->coupled_cpus, &dev->coupled_cpus)))
coupled->prevent++;

i.e. comparing a variable to itself :)

And I believe my patch puts the sanity check at the right place (where
we are using coupled from existing CPUs.. And that is where it should
have been since the beginning)..

If people miss this during code re-factoring, then it would be even more
stupid on the part of Author and Reviewer.. And if it still gets missed
then this is not the only place where we need to worry about such stuff..

This is present everywhere in our code.. You can't really some part of
code to some place and leave the other as-is.. The change is supposed
to be more logical and so funny mistakes must be caught during reviews.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 07/18] regulator: lp3971: use devm_regulator_register()

2013-09-25 Thread Jingoo Han
On Thursday, September 26, 2013 11:08 AM, Jingoo Han wrote:
> 
> Use devm_regulator_register() to make cleanup paths simpler.
> 
> Signed-off-by: Jingoo Han 
> ---
>  drivers/regulator/lp3971.c |   11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
>

[]
 
> @@ -463,10 +462,6 @@ static int lp3971_i2c_probe(struct i2c_client *i2c,
>  static int lp3971_i2c_remove(struct i2c_client *i2c)
>  {
>   struct lp3971 *lp3971 = i2c_get_clientdata(i2c);
> - int i;
> -
> - for (i = 0; i < lp3971->num_regulators; i++)
> - regulator_unregister(lp3971->rdev[i]);
> 
>   kfree(lp3971->rdev);

CC'ed Sachin Kamat,

Calling regulator_unregister(lp3971->rdev) after kfree(lp3971->rdev)
would make the problem.
I will remove this patch from next v3 patch series.

Best regards,
Jingoo Han


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: applesmc oops in 3.10/3.11

2013-09-25 Thread Henrik Rydberg
> > > This suggests that initialization may be attempted more than once. The 
> > > key cache
> > > is allocated only once, but the number of keys is read for each attempt.
> > > 
> > > No idea if that can happen, but if the number of keys can increase after
> > > the first initialization attempt you would have an explanation for the 
> > > crash.
> > 
> > Good idea, and easy enough to test with the patch below.
> > 
> Should we apply this patch even though it may not solve the specific problem ?

Yes, why not - it certainly won't hurt. I am running it right now, so
it is at least run-tested.

> Again, not sure if the key count can change, but the current code is at the 
> very
> least inconsistent, as it keeps reading the key count without updating or
> verifying the cache size.

Yes - I agree that the error state is far-fetched, but it is hard to
see any other logical explanation. There is of course always the
possibility that the problem is somewhere else completely.

Proper patch attached.

Thanks,
Henrik

---

>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg 
Date: Thu, 26 Sep 2013 08:33:16 +0200
Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding

After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.

Reported-by: 
Suggested-by: Guenter Roeck 
Signed-off-by: Henrik Rydberg 
---
 drivers/hwmon/applesmc.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..98814d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
 {
struct applesmc_registers *s = &smcreg;
bool left_light_sensor, right_light_sensor;
+   unsigned int count;
u8 tmp[1];
int ret;
 
if (s->init_complete)
return 0;
 
-   ret = read_register_count(&s->key_count);
+   ret = read_register_count(&count);
if (ret)
return ret;
 
+   if (s->cache && s->key_count != count) {
+   pr_warn("key count changed from %d to %d\n",
+   s->key_count, count);
+   kfree(s->cache);
+   s->cache = NULL;
+   }
+   s->key_count = count;
+
if (!s->cache)
s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
if (!s->cache)
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Michael wang
On 09/26/2013 01:34 PM, Mike Galbraith wrote:
> On Thu, 2013-09-26 at 13:12 +0800, Michael wang wrote: 
>> On 09/26/2013 11:41 AM, Mike Galbraith wrote:
>> [snip]
 Like the case when we have:

core0 sgcore1 sg
cpu0cpu1cpu2cpu3
waker   busyidleidle

 If the sync wakeup was on cpu0, we can:

 1. choose cpu in core1 sg like we did usually
some overhead but tend to make the load a little balance
core0 sgcore1 sg
cpu0cpu1cpu2cpu3
idlebusywakee   idle
>>>
>>> Reducing latency and increasing throughput when the waker isn't really
>>> really going to immediately schedule off as the hint implies.  Nice for
>>> bursty loads and ramp.
>>>
>>> The breakeven point is going up though.  If you don't have nohz
>>> throttled, you eat tick start/stop overhead, and the menu governor
>>> recently added yet more overhead, so maybe we should say hell with it.
>>
>> Exactly, more and more factors to be considered, we say things get
>> balanced but actually it's not the best choice...
>>
>>>
 2. choose cpu0 like the patch proposed
no overhead but tend to make the load a little more unbalance
core0 sgcore1 sg
cpu0cpu1cpu2cpu3
wakee   busyidleidle

 May be we should add a higher scope load balance check in wake_affine(),
 but that means higher overhead which is just what the patch want to
 reduce...
>>>
>>> Yeah, more overhead is the last thing we need.
>>>
 What about some discount for sync case inside select_idle_sibling()?
 For example we consider sync cpu as idle and prefer it more than the 
 others?
>>>
>>> That's what the sync hint does.  Problem is, it's a hint.  If it were
>>> truth, there would be no point in calling select_idle_sibling().
>>
>> Just wondering if the hint was wrong in most of the time, then why don't
>> we remove it...
> 
> For very fast/light network ping-pong micro-benchmarks, it is right.
> For pipe-test, it's absolutely right, jabbering parties are 100%
> synchronous, there is nada/nil/zip/diddly squat overlap reclaimable..
> but in the real world, it ain't necessarily so.
> 
>> Otherwise I think we can still utilize it to make some decision tends to
>> be correct, don't we?
> 
> Sometimes :)

Ok, a double-edged sword I see :)

May be we can wave it carefully here, give the discount to a bigger
scope not the sync cpu, for example:

sg1 sg2
cpu0cpu1cpu2cpu3cpu4cpu5cpu6cpu7
waker   idleidleidleidleidleidleidle

If it's sync wakeup on cpu0 (only waker), and the sg is wide enough,
which means one cpu is not so influencial, then suppose cpu0 to be idle
could be more safe, also prefer sg1 than sg2 is more likely to be right.

And we can still choose idle-cpu at final step, like cpu1 in this case,
to avoid the risk that waker don't get off as it said.

The key point is to reduce the influence of sync, trust a little but not
totally ;-)

Regards,
Michael Wang

> 
> -Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] KVM: PPC: Book3S: Add support for hwrng found on some powernv systems

2013-09-25 Thread Michael Ellerman
Some powernv systems include a hwrng. Guests can access it via the
H_RANDOM hcall.

We add a real mode implementation of H_RANDOM when a hwrng is found.
Userspace can detect the presence of the hwrng by quering the
KVM_CAP_PPC_HWRNG capability.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/archrandom.h   |  11 ++-
 arch/powerpc/include/asm/kvm_ppc.h  |   2 +
 arch/powerpc/kvm/book3s_hv_builtin.c|  15 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 119 
 arch/powerpc/kvm/powerpc.c  |   3 +
 arch/powerpc/platforms/powernv/rng.c|  25 +++
 include/uapi/linux/kvm.h|   1 +
 7 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
index d853d16..86296d5 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -25,8 +25,15 @@ static inline int arch_get_random_int(unsigned int *v)
return rc;
 }
 
-int powernv_get_random_long(unsigned long *v);
-
 #endif /* CONFIG_ARCH_RANDOM */
 
+#ifdef CONFIG_PPC_POWERNV
+int powernv_hwrng_present(void);
+int powernv_get_random_long(unsigned long *v);
+int powernv_get_random_real_mode(unsigned long *v);
+#else
+static inline int powernv_hwrng_present(void) { return 0; }
+static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
+#endif
+
 #endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index b15554a..51966fd 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -177,6 +177,8 @@ extern int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, 
u32 *server,
 extern int kvmppc_xics_int_on(struct kvm *kvm, u32 irq);
 extern int kvmppc_xics_int_off(struct kvm *kvm, u32 irq);
 
+extern int kvmppc_hwrng_present(void);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 8cd0dae..de12592 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "book3s_hv_cma.h"
 /*
@@ -181,3 +182,17 @@ void __init kvm_cma_reserve(void)
kvm_cma_declare_contiguous(selected_size, align_size);
}
 }
+
+int kvmppc_hwrng_present(void)
+{
+   return powernv_hwrng_present();
+}
+EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
+
+long kvmppc_h_random(struct kvm_vcpu *vcpu)
+{
+   if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+   return H_SUCCESS;
+
+   return H_HARDWARE;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 294b7af..35ce59e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1502,6 +1502,125 @@ hcall_real_table:
.long   0   /* 0x11c */
.long   0   /* 0x120 */
.long   .kvmppc_h_bulk_remove - hcall_real_table
+   .long   0   /* 0x128 */
+   .long   0   /* 0x12c */
+   .long   0   /* 0x130 */
+   .long   0   /* 0x134 */
+   .long   0   /* 0x138 */
+   .long   0   /* 0x13c */
+   .long   0   /* 0x140 */
+   .long   0   /* 0x144 */
+   .long   0   /* 0x148 */
+   .long   0   /* 0x14c */
+   .long   0   /* 0x150 */
+   .long   0   /* 0x154 */
+   .long   0   /* 0x158 */
+   .long   0   /* 0x15c */
+   .long   0   /* 0x160 */
+   .long   0   /* 0x164 */
+   .long   0   /* 0x168 */
+   .long   0   /* 0x16c */
+   .long   0   /* 0x170 */
+   .long   0   /* 0x174 */
+   .long   0   /* 0x178 */
+   .long   0   /* 0x17c */
+   .long   0   /* 0x180 */
+   .long   0   /* 0x184 */
+   .long   0   /* 0x188 */
+   .long   0   /* 0x18c */
+   .long   0   /* 0x190 */
+   .long   0   /* 0x194 */
+   .long   0   /* 0x198 */
+   .long   0   /* 0x19c */
+   .long   0   /* 0x1a0 */
+   .long   0   /* 0x1a4 */
+   .long   0   /* 0x1a8 */
+   .long   0   /* 0x1ac */
+   .long   0   /* 0x1b0 */
+   .long   0   /* 0x1b4 */
+   .long   0   /* 0x1b8 */
+   .long   0   /* 0x1bc */
+   .long   0   /* 0x1c0 */
+   .long   0   /* 0x1c4 */
+   .long   0   /* 0x1

[PATCH 1/3] powerpc: Implement arch_get_random_long/int() for powernv

2013-09-25 Thread Michael Ellerman
Add the plumbing to implement arch_get_random_long/int(). It didn't seem
worth adding an extra ppc_md hook for int, so we reuse the one for long.

Add an implementation for powernv based on the hwrng found in power7+
systems. We whiten the output of the hwrng, and the result passes all
the dieharder tests.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig|   3 +
 arch/powerpc/include/asm/archrandom.h   |  32 +
 arch/powerpc/include/asm/machdep.h  |   4 ++
 arch/powerpc/platforms/powernv/Kconfig  |   1 +
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/rng.c| 122 
 6 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/archrandom.h
 create mode 100644 arch/powerpc/platforms/powernv/rng.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 38f3b7e..45f16f7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1009,6 +1009,9 @@ config PHYSICAL_START
default "0x"
 endif
 
+config ARCH_RANDOM
+   def_bool n
+
 source "net/Kconfig"
 
 source "drivers/Kconfig"
diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
new file mode 100644
index 000..d853d16
--- /dev/null
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -0,0 +1,32 @@
+#ifndef _ASM_POWERPC_ARCHRANDOM_H
+#define _ASM_POWERPC_ARCHRANDOM_H
+
+#ifdef CONFIG_ARCH_RANDOM
+
+#include 
+
+static inline int arch_get_random_long(unsigned long *v)
+{
+   if (ppc_md.get_random_long)
+   return ppc_md.get_random_long(v);
+
+   return 0;
+}
+
+static inline int arch_get_random_int(unsigned int *v)
+{
+   unsigned long val;
+   int rc;
+
+   rc = arch_get_random_long(&val);
+   if (rc)
+   *v = val;
+
+   return rc;
+}
+
+int powernv_get_random_long(unsigned long *v);
+
+#endif /* CONFIG_ARCH_RANDOM */
+
+#endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 8b48090..ce6cc2a 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -263,6 +263,10 @@ struct machdep_calls {
ssize_t (*cpu_probe)(const char *, size_t);
ssize_t (*cpu_release)(const char *, size_t);
 #endif
+
+#ifdef CONFIG_ARCH_RANDOM
+   int (*get_random_long)(unsigned long *v);
+#endif
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 6fae5eb..2108464 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -9,6 +9,7 @@ config PPC_POWERNV
select EPAPR_BOOT
select PPC_INDIRECT_PIO
select PPC_UDBG_16550
+   select ARCH_RANDOM
default y
 
 config POWERNV_MSI
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 300c437..6760a86 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,5 +1,5 @@
 obj-y  += setup.o opal-takeover.o opal-wrappers.o opal.o
-obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o
+obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o rng.o
 
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/rng.c 
b/arch/powerpc/platforms/powernv/rng.c
new file mode 100644
index 000..51d2e6a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)"powernv-rng: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+struct powernv_rng {
+   void __iomem *regs;
+   unsigned long mask;
+};
+
+static DEFINE_PER_CPU(struct powernv_rng *, powernv_rng);
+
+
+static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
+{
+   unsigned long parity;
+
+   /* Calculate the parity of the value */
+   asm ("popcntd %0,%1" : "=r" (parity) : "r" (val));
+
+   /* xor our value with the previous mask */
+   val ^= rng->mask;
+
+   /* update the mask based on the parity of this value */
+   rng->mask = (rng->mask << 1) | (parity & 1);
+
+   return val;
+}
+
+int powernv_get_random_long(unsigned long *v)
+{
+   struct powernv_rng *rng;
+
+   rng = get_cpu_var(powernv_rng);
+
+   *v = rng_whiten(rng, in_be64(rng->regs));
+
+   put_cpu_var(rng);
+
+   return 1;
+}
+EXPORT_SYMBOL_GPL(powernv_get_random_long);
+
+static __init void rng_init_per_cpu(struct powernv_rng *rng,

Re: [PATCH] kernel/groups.c: consider about NULL for 'group_info' in all related extern functions

2013-09-25 Thread Chen Gang
On 09/26/2013 01:58 PM, Chen Gang wrote:
> On 09/25/2013 12:34 PM, Chen Gang wrote:
>> On 09/25/2013 09:47 AM, Chen Gang wrote:
>>> On 09/25/2013 09:14 AM, Tejun Heo wrote:
 On Wed, Sep 25, 2013 at 09:06:52AM +0800, Chen Gang wrote:
> OK, I see, the 'root cause' is: "you are not the related maintainer
> either", so it is really necessary for me to spend additional time
> resource on it :-(.

 Yeah, at least partly.  That and the fact that I'm not too willing to
 dig into the code without further evidence.  It isn't anything strange
 to ask tho and I'm likely to do that even for subsystems that I know
 intimately if the subject code has been stable / stale for years and
 the analysis doesn't seem immediately convincing.  And, if my
 experience is anything to go by, it's not too unlikely that you might
 hit something which doesn't agree with your current assumptions while
 trying to actually trigger the problem.

 Thanks.

> 
> Oh, not cause issue, the reason is "'groups' exports extern variable
> 'init_groups', when start process, default 'cred' will set it to be
> sure of groups always be initialized".
> 
> Hmm... but after all, I still think this file need be improved: "remove
> the group_info checking in groups_search()", please help check, thanks.
> 
> ---patch begin--
> 
> kernel/groups.c: remove useless "!group_info" checking in groups_search().
> 
>   Since groups_free() need not check 'group_info', groups_search() need
>   not, either, and then in_group_p() and in_egroup_p(), either.
> 
> 
>   'groups' assumes kernel mode callers are sure of 'group_info' valid.
> 

Oh, need use "callers" instead of "kernel mode callers".

>   When process starts, the related kernel mode caller need set default
>   'group_info' firstly (extern variable 'init_group').
>
 
And also need append one sentence: "and the callers also need be sure
of "&init_group" is not passed to groups_free()."


>   If groups_alloc() fails, the caller must not call any related API again
>   with the related invalid 'group_info'.
> 
> 
> Signed-off-by: Chen Gang 
> ---
>  kernel/groups.c |3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 90cf1c3..0a7f81d 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -136,9 +136,6 @@ int groups_search(const struct group_info *group_info, 
> kgid_t grp)
>  {
>   unsigned int left, right;
>  
> - if (!group_info)
> - return 0;
> -
>   left = 0;
>   right = group_info->ngroups;
>   while (left < right) {
> 


-- 
Chen Gang

-- 
Chen Gang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 06/18] regulator: gpio-regulator: use devm_regulator_register()

2013-09-25 Thread Jingoo Han
On Thursday, September 26, 2013 3:20 PM, Sachin Kamat wrote:
> On 26 September 2013 07:36, Jingoo Han  wrote:
> > Use devm_regulator_register() to make cleanup paths simpler.
> > @@ -349,8 +350,6 @@ static int gpio_regulator_remove(struct platform_device 
> > *pdev)
> >  {
> > struct gpio_regulator_data *drvdata = platform_get_drvdata(pdev);
> >
> > -   regulator_unregister(drvdata->dev);
> > -
> > gpio_free_array(drvdata->gpios, drvdata->nr_gpios);
> >
> > kfree(drvdata->states);
> 
> In most of the cases where unregister doesn't happen to be the only or
> last call in the remove path,
> I am not sure if change in ordering wouldn't cause any functional
> issues. For. e.g., in this patch we are freeing the gpios and the
> driver states even before unregistering the device which is logically
> not right.
> 

OK, I see.
I agree with your suggestion in order to keep the code stable.
I will remove this patch from next v3 patch series.

Best regards,
Jingoo Han

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] hwrng: Add a driver for the hwrng found in power7+ systems

2013-09-25 Thread Michael Ellerman
Add a driver for the hwrng found in power7+ systems, based on the
existing code for the arch_get_random_long() hook.

We only register a single instance of the driver, not one per device,
because we use the existing per_cpu array of devices in the arch code.
This means we always read from the "closest" device, avoiding inter-chip
memory traffic.

Signed-off-by: Guo Chao 
Signed-off-by: Michael Ellerman 
---
 drivers/char/hw_random/Kconfig   | 13 ++
 drivers/char/hw_random/Makefile  |  1 +
 drivers/char/hw_random/powernv-rng.c | 81 
 3 files changed, 95 insertions(+)
 create mode 100644 drivers/char/hw_random/powernv-rng.c

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index 0aa9d91..c206de2 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -290,6 +290,19 @@ config HW_RANDOM_PSERIES
 
  If unsure, say Y.
 
+config HW_RANDOM_POWERNV
+   tristate "PowerNV Random Number Generator support"
+   depends on HW_RANDOM && PPC_POWERNV
+   default HW_RANDOM
+   ---help---
+ This is the driver for Random Number Generator hardware found
+ in POWER7+ and above machines for PowerNV platform.
+
+ To compile this driver as a module, choose M here: the
+ module will be called powernv-rng.
+
+ If unsure, say Y.
+
 config HW_RANDOM_EXYNOS
tristate "EXYNOS HW random number generator support"
depends on HW_RANDOM && HAS_IOMEM && HAVE_CLK
diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
index bed467c..d7d2435 100644
--- a/drivers/char/hw_random/Makefile
+++ b/drivers/char/hw_random/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_HW_RANDOM_NOMADIK) += nomadik-rng.o
 obj-$(CONFIG_HW_RANDOM_PICOXCELL) += picoxcell-rng.o
 obj-$(CONFIG_HW_RANDOM_PPC4XX) += ppc4xx-rng.o
 obj-$(CONFIG_HW_RANDOM_PSERIES) += pseries-rng.o
+obj-$(CONFIG_HW_RANDOM_POWERNV) += powernv-rng.o
 obj-$(CONFIG_HW_RANDOM_EXYNOS) += exynos-rng.o
 obj-$(CONFIG_HW_RANDOM_TPM) += tpm-rng.o
 obj-$(CONFIG_HW_RANDOM_BCM2835) += bcm2835-rng.o
diff --git a/drivers/char/hw_random/powernv-rng.c 
b/drivers/char/hw_random/powernv-rng.c
new file mode 100644
index 000..e6965bf
--- /dev/null
+++ b/drivers/char/hw_random/powernv-rng.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2013 Michael Ellerman, Guo Chao, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int powernv_rng_read(struct hwrng *rng, void *data, size_t max, bool 
wait)
+{
+   unsigned long *buf;
+   int i, len;
+
+   /* We rely on rng_buffer_size() being >= sizeof(unsigned long) */
+   len = max / sizeof(unsigned long);
+
+   buf = (unsigned long *)data;
+
+   for (i = 0; i < len; i++)
+   powernv_get_random_long(buf++);
+
+   return len * sizeof(unsigned long);
+}
+
+static struct hwrng powernv_hwrng = {
+   .name = "powernv-rng",
+   .read = powernv_rng_read,
+};
+
+static int powernv_rng_remove(struct platform_device *pdev)
+{
+   hwrng_unregister(&powernv_hwrng);
+
+   return 0;
+}
+
+static int powernv_rng_probe(struct platform_device *pdev)
+{
+   int rc;
+
+   rc = hwrng_register(&powernv_hwrng);
+   if (rc) {
+   /* We only register one device, ignore any others */
+   if (rc == -EEXIST)
+   rc = -ENODEV;
+
+   return rc;
+   }
+
+   pr_info("registered powernv hwrng.\n");
+
+   return 0;
+}
+
+static struct of_device_id powernv_rng_match[] = {
+   { .compatible   = "ibm,power-rng",},
+   {},
+};
+MODULE_DEVICE_TABLE(of, powernv_rng_match);
+
+static struct platform_driver powernv_rng_driver = {
+   .driver = {
+   .name = "powernv_rng",
+   .of_match_table = powernv_rng_match,
+   },
+   .probe  = powernv_rng_probe,
+   .remove = powernv_rng_remove,
+};
+module_platform_driver(powernv_rng_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Bare metal HWRNG driver for POWER7+ and above");
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 05/18] regulator: fixed: use devm_regulator_register()

2013-09-25 Thread Jingoo Han
On Thursday, September 26, 2013 3:14 PM, Sachin Kamat wrote:
> On 26 September 2013 07:35, Jingoo Han  wrote:
> > Use devm_regulator_register() to make cleanup paths simpler.
> 
> > struct fixed_voltage_data *drvdata = platform_get_drvdata(pdev);
> >
> > -   regulator_unregister(drvdata->dev);
> > kfree(drvdata->desc.supply_name);
> > kfree(drvdata->desc.name);
> 
> Sorry, couldn't go through all your patches yesterday.
> This one looks a bit scary too as some of the driver data is already
> freed before unregistering.
> 

I looked at regulator_unregister(). I am not sure that this is safe.
So, I will remove this patch from next v3 patch series.
Thank you for your comment. :-)

Best regards,
Jingoo Han

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC V4 PATCH 00/15] Signature verification of hibernate snapshot

2013-09-25 Thread Jiri Kosina
On Wed, 25 Sep 2013, James Bottomley wrote:

> > I don't get this. Why is it important that current kernel can't
> > recreate the signature?
> 
> The thread model is an attack on the saved information (i.e. the suspend
> image) between it being saved by the old kernel and used by the new one.
> The important point isn't that the new kernel doesn't have access to
> K_{N-1} it's that no-one does: the key is destroyed as soon as the old
> kernel terminates however the verification public part P_{N-1} survives.

James,

could you please describe the exact scenario you think that the symmetric 
keys aproach doesn't protect against, while the assymetric key aproach 
does?

The crucial points, which I believe make the symmetric key aproach work 
(and I feel quite embarassed by the fact that I haven't realized this 
initially when coming up with the assymetric keys aproach) are:

- the kernel that is performing the actual resumption is trusted in the 
  secure boot model, i.e. you trust it to perform proper verification

- potentially malicious userspace (which is what we are protecting against 
  -- malicious root creating fake hibernation image and issuing reboot) 
  doesn't have access to the symmetric key

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 20/21] cpuidle: don't calculate time-diff if entered_state == 0

2013-09-25 Thread Viresh Kumar
On 26 September 2013 04:08, Daniel Lezcano  wrote:
> On 09/22/2013 03:21 AM, Viresh Kumar wrote:
>> If entered_state == 0, we don't need to set dev->last_residency to 'diff' as 
>> we
>> will be setting it to zero without using its new value.
>
> I don't get it, can you elaborate.

Sure.. We have following code in cpuidle_enter_state():

-
diff = ktime_to_us(ktime_sub(time_end, time_start));
if (diff > INT_MAX)
diff = INT_MAX;

dev->last_residency = (int) diff;

if (entered_state >= 0) {
/* Update cpuidle counters */
/* This can be moved to within driver enter routine
 * but that results in multiple copies of same code.
 */
dev->states_usage[entered_state].time += dev->last_residency;
dev->states_usage[entered_state].usage++;
} else {
dev->last_residency = 0;
}
---

We are setting last_residency to 0 when (entered_state < 0) and aren't using
the value of "diff". So, we can actually skip calculations of time_end, diff and
last_residency when (entered_state < 0).. Makes sense?


> We can be a long time in this state
> (eg. if the prediction is false).

Okay.. I didn't get it :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 12/26] ARM: exynos: remove custom .init_time hook

2013-09-25 Thread Sebastian Hesselbarth

On 09/18/2013 07:53 PM, Sebastian Hesselbarth wrote:

With arch/arm calling of_clk_init(NULL) from time_init(), we can now
remove custom .init_time hooks. While at it, also remove some now
redundant includes.

Signed-off-by: Sebastian Hesselbarth 
---


Ping,

Kukjin can you please comment on this patch?

Thanks!


Cc: Olof Johansson 
Cc: Arnd Bergmann 
Cc: Russell King 
Cc: Kukjin Kim 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
  arch/arm/mach-exynos/common.c  |8 
  arch/arm/mach-exynos/common.h  |1 -
  arch/arm/mach-exynos/mach-exynos4-dt.c |2 --
  arch/arm/mach-exynos/mach-exynos5-dt.c |2 --
  4 files changed, 13 deletions(-)

diff --git a/arch/arm/mach-exynos/common.c b/arch/arm/mach-exynos/common.c
index ba95e5d..a4e7ba8 100644
--- a/arch/arm/mach-exynos/common.c
+++ b/arch/arm/mach-exynos/common.c
@@ -26,8 +26,6 @@
  #include 
  #include 
  #include 
-#include 
-#include 
  #include 
  #include 

@@ -367,12 +365,6 @@ static void __init exynos5_map_io(void)
iotable_init(exynos5250_iodesc, ARRAY_SIZE(exynos5250_iodesc));
  }

-void __init exynos_init_time(void)
-{
-   of_clk_init(NULL);
-   clocksource_of_init();
-}
-
  struct bus_type exynos_subsys = {
.name   = "exynos-core",
.dev_name   = "exynos-core",
diff --git a/arch/arm/mach-exynos/common.h b/arch/arm/mach-exynos/common.h
index 8646a14..f0fa205 100644
--- a/arch/arm/mach-exynos/common.h
+++ b/arch/arm/mach-exynos/common.h
@@ -16,7 +16,6 @@
  #include 

  void mct_init(void __iomem *base, int irq_g0, int irq_l0, int irq_l1);
-void exynos_init_time(void);

  struct map_desc;
  void exynos_init_io(void);
diff --git a/arch/arm/mach-exynos/mach-exynos4-dt.c 
b/arch/arm/mach-exynos/mach-exynos4-dt.c
index 0099c6c..6858d73 100644
--- a/arch/arm/mach-exynos/mach-exynos4-dt.c
+++ b/arch/arm/mach-exynos/mach-exynos4-dt.c
@@ -16,7 +16,6 @@
  #include 
  #include 
  #include 
-#include 

  #include 
  #include 
@@ -54,7 +53,6 @@ DT_MACHINE_START(EXYNOS4210_DT, "Samsung Exynos4 (Flattened Device 
Tree)")
.init_early = exynos_firmware_init,
.init_machine   = exynos4_dt_machine_init,
.init_late  = exynos_init_late,
-   .init_time  = exynos_init_time,
.dt_compat  = exynos4_dt_compat,
.restart= exynos4_restart,
.reserve= exynos4_reserve,
diff --git a/arch/arm/mach-exynos/mach-exynos5-dt.c 
b/arch/arm/mach-exynos/mach-exynos5-dt.c
index f874b77..bac2105 100644
--- a/arch/arm/mach-exynos/mach-exynos5-dt.c
+++ b/arch/arm/mach-exynos/mach-exynos5-dt.c
@@ -13,7 +13,6 @@
  #include 
  #include 
  #include 
-#include 

  #include 
  #include 
@@ -76,7 +75,6 @@ DT_MACHINE_START(EXYNOS5_DT, "SAMSUNG EXYNOS5 (Flattened Device 
Tree)")
.map_io = exynos_init_io,
.init_machine   = exynos5_dt_machine_init,
.init_late  = exynos_init_late,
-   .init_time  = exynos_init_time,
.dt_compat  = exynos5_dt_compat,
.restart= exynos5_restart,
.reserve= exynos5_reserve,



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 06/18] regulator: gpio-regulator: use devm_regulator_register()

2013-09-25 Thread Sachin Kamat
On 26 September 2013 07:36, Jingoo Han  wrote:
> Use devm_regulator_register() to make cleanup paths simpler.
> @@ -349,8 +350,6 @@ static int gpio_regulator_remove(struct platform_device 
> *pdev)
>  {
> struct gpio_regulator_data *drvdata = platform_get_drvdata(pdev);
>
> -   regulator_unregister(drvdata->dev);
> -
> gpio_free_array(drvdata->gpios, drvdata->nr_gpios);
>
> kfree(drvdata->states);

In most of the cases where unregister doesn't happen to be the only or
last call in the remove path,
I am not sure if change in ordering wouldn't cause any functional
issues. For. e.g., in this patch we are freeing the gpios and the
driver states even before unregistering the device which is logically
not right.

-- 
With warm regards,
Sachin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] avr32: fix compiler warning

2013-09-25 Thread Hans-Christian Egtvedt
Around Wed 25 Sep 2013 21:50:01 +0200 or thereabout, Gabor Juhos wrote:

Tuning subject to

avr32: cast syscall_return to silence compiler warning

> The patch fixes the following compiler warning:
> CC  arch/avr32/kernel/process.o
>   arch/avr32/kernel/process.c: In function 'copy_thread':
>   arch/avr32/kernel/process.c:292: warning: assignment makes integer \
>   from pointer without a cast

Thank you for fixing.

> Signed-off-by: Gabor Juhos 

Acked-by: Hans-Christian Egtvedt 

> ---
> Note: the patch is against v3.12-rc2.

Added to my for-linus branch.

> ---
>  arch/avr32/kernel/process.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
> index c273100..42a53e74 100644
> --- a/arch/avr32/kernel/process.c
> +++ b/arch/avr32/kernel/process.c
> @@ -289,7 +289,7 @@ int copy_thread(unsigned long clone_flags, unsigned long 
> usp,
>   memset(childregs, 0, sizeof(struct pt_regs));
>   p->thread.cpu_context.r0 = arg;
>   p->thread.cpu_context.r1 = usp; /* fn */
> - p->thread.cpu_context.r2 = syscall_return;
> + p->thread.cpu_context.r2 = (unsigned long)syscall_return;
>   p->thread.cpu_context.pc = (unsigned 
> long)ret_from_kernel_thread;
>   childregs->sr = MODE_SUPERVISOR;
>   } else {
-- 
mvh
Hans-Christian Egtvedt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/26] ARM: sunxi: remove custom .init_time hook

2013-09-25 Thread Sebastian Hesselbarth

On 09/25/2013 10:07 PM, Maxime Ripard wrote:

On Wed, Sep 18, 2013 at 07:53:56PM +0200, Sebastian Hesselbarth wrote:

With arch/arm calling of_clk_init(NULL) from time_init(), we can now
remove custom .init_time hooks.

Signed-off-by: Sebastian Hesselbarth 


Acked-by: Maxime Ripard 

How do you want us to merge this? Every maintainer takes the patches he
handles, or will Kevin and Olof merge the whole thing?


Maxime,

I am still waiting for some ACKs (nomadik, prima2, socfpga, nspire) and
possibly Mike's on the clock changes.

My current idea is to prepare a branch for Olof to pull in as whole.

Sebastian

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 19/21] cpuidle: create list of registered drivers

2013-09-25 Thread Viresh Kumar
On 26 September 2013 04:00, Daniel Lezcano  wrote:
> If you introduce a list, you will have to introduce a lock to protect
> it.

I missed it, should have added that :)

> This lock will be in the fast path cpuidle_idle_call with the
> get_driver function and conforming to the comment: "NOTE: no locks or
> semaphores should be used here".
>
> A lock has been introduced in this function already and the system hangs
> with 1024 cpus.

Hmm... I see.. I didn't knew about this expectation.. What about a rcu
read/write lock? As far as I know its too lightweight... Can we have that
in fast path?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v6 1/4] ARM: OMAP2+: cleaned-up DT support of various ECC schemes

2013-09-25 Thread Gupta, Pekon

> 
> Anyway, at this point I think your patch series should be nearly
> complete. I made a few comments on your patches, and I'd imagine you
> only should need one more revision (v7) before I can accept it to the
> l2-mtd.git tree.
> 
Yes surely I will send next version (v7), but it might take few days.
As some more feedbacks on [PATCH 1] are pending for final conclusion
and this needs to be properly re-tested.

And thanks much to you and Olof for the feedbacks.
I agree some of Olof's feedbacks on DT bindings gave me new view to
look at things, And further simplified the NAND driver code.

with regards, pekon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Mike Galbraith
On Thu, 2013-09-26 at 07:34 +0200, Mike Galbraith wrote: 
> On Thu, 2013-09-26 at 13:12 +0800, Michael wang wrote: 
> > On 09/26/2013 11:41 AM, Mike Galbraith wrote:
> > [snip]
> > >> Like the case when we have:
> > >>
> > >>  core0 sgcore1 sg
> > >>  cpu0cpu1cpu2cpu3
> > >>  waker   busyidleidle
> > >>
> > >> If the sync wakeup was on cpu0, we can:
> > >>
> > >> 1. choose cpu in core1 sg like we did usually
> > >>some overhead but tend to make the load a little balance
> > >>  core0 sgcore1 sg
> > >>  cpu0cpu1cpu2cpu3
> > >>  idlebusywakee   idle
> > > 
> > > Reducing latency and increasing throughput when the waker isn't really
> > > really going to immediately schedule off as the hint implies.  Nice for
> > > bursty loads and ramp.
> > > 
> > > The breakeven point is going up though.  If you don't have nohz
> > > throttled, you eat tick start/stop overhead, and the menu governor
> > > recently added yet more overhead, so maybe we should say hell with it.
> > 
> > Exactly, more and more factors to be considered, we say things get
> > balanced but actually it's not the best choice...
> > 
> > > 
> > >> 2. choose cpu0 like the patch proposed
> > >>no overhead but tend to make the load a little more unbalance
> > >>  core0 sgcore1 sg
> > >>  cpu0cpu1cpu2cpu3
> > >>  wakee   busyidleidle
> > >>
> > >> May be we should add a higher scope load balance check in wake_affine(),
> > >> but that means higher overhead which is just what the patch want to
> > >> reduce...
> > > 
> > > Yeah, more overhead is the last thing we need.
> > > 
> > >> What about some discount for sync case inside select_idle_sibling()?
> > >> For example we consider sync cpu as idle and prefer it more than the 
> > >> others?
> > > 
> > > That's what the sync hint does.  Problem is, it's a hint.  If it were
> > > truth, there would be no point in calling select_idle_sibling().
> > 
> > Just wondering if the hint was wrong in most of the time, then why don't
> > we remove it...
> 
> For very fast/light network ping-pong micro-benchmarks, it is right.
> For pipe-test, it's absolutely right, jabbering parties are 100%
> synchronous, there is nada/nil/zip/diddly squat overlap reclaimable..
> but in the real world, it ain't necessarily so.
> 
> > Otherwise I think we can still utilize it to make some decision tends to
> > be correct, don't we?
> 
> Sometimes :)

P.S.  while we're slapping select_idle_sibling()'s _evil_ face, let's
give it a pat on the head too.  It showed regressions in bright red.
Put pipe-test on one core, you only see scheduler weight.. but entering
and exiting idle is part of the fast path, whether you're exercising it
by doing something silly or not.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] cpufreq: exynos4x12: Use the common clock framework to set APLL clock rate

2013-09-25 Thread Yadwinder Singh Brar
Hi Tomasz,

>> On Wed, Sep 25, 2013 at 4:52 PM, Lukasz Majewski  
>> wrote:
>> > In the exynos4x12_set_apll() function, the APLL frequency is set with
>> > direct register manipulation.
>> >
>> > Such approach is not allowed in the common clock framework. The frequency
>> > is changed, but the corresponding clock value is not updated. This causes
>> > wrong frequency read from cpufreq's cpuinfo_cur_freq sysfs attribute.
>> >
>>
>> This patch looks incomplete, leaving the driver in untidy state, perhaps its
>> doesn't fix the above stated problem completely. what about
>> if (!exynos4x12_pms_change(old_index, new_index)) becomes true?
>>
>> IMHO, this driver needs lot more work in addition to this patch to cleanly 
>> and
>> completely move the cpufreq driver to common clock framework.
>
> I agree that the other case needs to be handled as well. Basically the
> whole conditional block dependent on exynos4x12_pms_change() can be safely
> dropped, because this condition is already handled in PLL driver.
>

Exactly!

> Lukasz is already working on further rework of this driver to clean it up
> from legacy code, but this will have to wait for 3.13, as 3.12 is already
> in rc stage and only fixes can be accepted for it.
>
>> For fixing this issue urgently, setting CLK_GET_RATE_NOCACHE for apll
>> in clk driver can also be quicker fix.
>
> Unfortunately this is not how this flag works. It only makes
> clk_get_rate() call ->recalc_rate() operation of the clock instead of
> instantly returning cached rate - it doesn't seem to work recursively.
>

hmm.. yes it can't help in our case as it recursively walks only the subtree
of clk but in our case we are changing rate of parent.

Regards,
Yadwinder
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 05/18] regulator: fixed: use devm_regulator_register()

2013-09-25 Thread Sachin Kamat
Hi Jingoo,

On 26 September 2013 07:35, Jingoo Han  wrote:
> Use devm_regulator_register() to make cleanup paths simpler.

> struct fixed_voltage_data *drvdata = platform_get_drvdata(pdev);
>
> -   regulator_unregister(drvdata->dev);
> kfree(drvdata->desc.supply_name);
> kfree(drvdata->desc.name);

Sorry, couldn't go through all your patches yesterday.
This one looks a bit scary too as some of the driver data is already
freed before unregistering.

-- 
With warm regards,
Sachin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/26] ARM: socfgpa: prepare for arch-wide .init_time callback

2013-09-25 Thread Sebastian Hesselbarth

On 09/18/2013 07:53 PM, Sebastian Hesselbarth wrote:

Current socfpga board init calls of_clk_init() from .machine_init. To
allow consolidation of DT driven .time_init, move of_clock_init() to
a temporary .time_init that will be removed when arch-wide callback is
available.

Signed-off-by: Sebastian Hesselbarth 


Ping,

Dinh can you please comment on this and patch 21/26?

Thanks!


---
Cc: Olof Johansson 
Cc: Arnd Bergmann 
Cc: Dinh Nguyen 
Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
  arch/arm/mach-socfpga/socfpga.c |9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-socfpga/socfpga.c b/arch/arm/mach-socfpga/socfpga.c
index bfce964..6df7bb9 100644
--- a/arch/arm/mach-socfpga/socfpga.c
+++ b/arch/arm/mach-socfpga/socfpga.c
@@ -15,6 +15,7 @@
   * along with this program.  If not, see .
   */
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -90,6 +91,12 @@ static void __init socfpga_init_irq(void)
socfpga_sysmgr_init();
  }

+static void __init socfpga_init_time(void)
+{
+   of_clk_init(NULL);
+   clocksource_of_init();
+}
+
  static void socfpga_cyclone5_restart(enum reboot_mode mode, const char *cmd)
  {
u32 temp;
@@ -107,7 +114,6 @@ static void __init socfpga_cyclone5_init(void)
  {
l2x0_of_init(0, ~0UL);
of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
-   of_clk_init(NULL);
socfpga_init_clocks();
  }

@@ -120,6 +126,7 @@ DT_MACHINE_START(SOCFPGA, "Altera SOCFPGA")
.smp= smp_ops(socfpga_smp_ops),
.map_io = socfpga_map_io,
.init_irq   = socfpga_init_irq,
+   .init_time  = socfpga_init_time,
.init_machine   = socfpga_cyclone5_init,
.restart= socfpga_cyclone5_restart,
.dt_compat  = altera_dt_match,



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/21] cpuidle: don't call poll_idle_init() for every cpu

2013-09-25 Thread Viresh Kumar
On 26 September 2013 03:52, Daniel Lezcano  wrote:
> On 09/22/2013 03:21 AM, Viresh Kumar wrote:
>> Signed-off-by: Viresh Kumar 

This deserved a log, sorry for missing that :(

> The optimization sounds good but IMHO if we can move this state out of
> the cpuidle common framework that would be nicer.
>
> The poll_idle is only applicable for x86 (acpi_driver and intel_idle),
> hence I suggest we move this state to these drivers, that will cleanup
> the framework code and will remove index shift macro
> CPUIDLE_DRIVER_STATE_START which IMHO is weid and prone-to-error.

Lets see what X86 folks have to say about it and then we can do it..
Btw, wouldn't that add some code duplication in those two drivers?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/21] cpuidle: avoid unnecessary kzalloc/free of struct cpuidle_device_kobj

2013-09-25 Thread Viresh Kumar
On 26 September 2013 03:42, Daniel Lezcano  wrote:
> On 09/22/2013 03:21 AM, Viresh Kumar wrote:
>> We always need to allocate struct cpuidle_device_kobj for all CPUs and so 
>> there
>> is no real need to have a pointer to it inside struct cpuidle_device.
>>
>> This patch makes a object instance of struct cpuidle_device_kobj inside 
>> struct
>> cpuidle_device instead of a pointer.
>>
>> Signed-off-by: Viresh Kumar 
>
> nack, it was made in purpose for kobject_init_and_add.
>
> see commit 728ce22b696f9f1404a74d7b2279a65933553a1b

Ahh.. sorry for missing that one :(

Now that I understand why it was introduced, I am thinking if
we can make hotplug path a bit fast? By not freeing sysfs stuff
and only hiding it for time being? And so we wouldn't be required
to do unnecessary initialisations while coming back?

Would that be worth it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kernel/groups.c: consider about NULL for 'group_info' in all related extern functions

2013-09-25 Thread Chen Gang
On 09/25/2013 12:34 PM, Chen Gang wrote:
> On 09/25/2013 09:47 AM, Chen Gang wrote:
>> On 09/25/2013 09:14 AM, Tejun Heo wrote:
>>> On Wed, Sep 25, 2013 at 09:06:52AM +0800, Chen Gang wrote:
 OK, I see, the 'root cause' is: "you are not the related maintainer
 either", so it is really necessary for me to spend additional time
 resource on it :-(.
>>>
>>> Yeah, at least partly.  That and the fact that I'm not too willing to
>>> dig into the code without further evidence.  It isn't anything strange
>>> to ask tho and I'm likely to do that even for subsystems that I know
>>> intimately if the subject code has been stable / stale for years and
>>> the analysis doesn't seem immediately convincing.  And, if my
>>> experience is anything to go by, it's not too unlikely that you might
>>> hit something which doesn't agree with your current assumptions while
>>> trying to actually trigger the problem.
>>>
>>> Thanks.
>>>

Oh, not cause issue, the reason is "'groups' exports extern variable
'init_groups', when start process, default 'cred' will set it to be
sure of groups always be initialized".

Hmm... but after all, I still think this file need be improved: "remove
the group_info checking in groups_search()", please help check, thanks.

---patch begin--

kernel/groups.c: remove useless "!group_info" checking in groups_search().

  Since groups_free() need not check 'group_info', groups_search() need
  not, either, and then in_group_p() and in_egroup_p(), either.


  'groups' assumes kernel mode callers are sure of 'group_info' valid.

  When process starts, the related kernel mode caller need set default
  'group_info' firstly (extern variable 'init_group').

  If groups_alloc() fails, the caller must not call any related API again
  with the related invalid 'group_info'.


Signed-off-by: Chen Gang 
---
 kernel/groups.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/groups.c b/kernel/groups.c
index 90cf1c3..0a7f81d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -136,9 +136,6 @@ int groups_search(const struct group_info *group_info, 
kgid_t grp)
 {
unsigned int left, right;
 
-   if (!group_info)
-   return 0;
-
left = 0;
right = group_info->ngroups;
while (left < right) {
-- 
1.7.7.6

---patch end

>>
> 
> Excuse me, I have to do some urgent internal things within my company,
> so the test and confirmation may be delayed (I will try to finish test
> within this month: 2013-09-30).
> 
>> Excuse me, my English is not quite well, I do not quite understand what
>> you said (but it seems what you said is reasonable, and not need reply).
>>
>>
>> Thanks.
>>
> 
> 

Thanks.
-- 
Chen Gang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG REPORT] ZSWAP: theoretical race condition issues

2013-09-25 Thread Minchan Kim
Hello Weigie,

On Wed, Sep 25, 2013 at 05:33:43PM +0800, Weijie Yang wrote:
> On Wed, Sep 25, 2013 at 4:31 PM, Bob Liu  wrote:
> > On Wed, Sep 25, 2013 at 4:09 PM, Weijie Yang  
> > wrote:
> >> I think I find a new issue, for integrity of this mail thread, I reply
> >> to this mail.
> >>
> >> It is a concurrence issue either, when duplicate store and reclaim
> >> concurrentlly.
> >>
> >> zswap entry x with offset A is already stored in zswap backend.
> >> Consider the following scenario:
> >>
> >> thread 0: reclaim entry x (get refcount, but not call 
> >> zswap_get_swap_cache_page)
> >>
> >> thread 1: store new page with the same offset A, alloc a new zswap entry y.
> >>   store finished. shrink_page_list() call __remove_mapping(), and now
> >> it is not in swap_cache
> >>
> >
> > But I don't think swap layer will call zswap with the same offset A.
> 
> 1. store page of offset A in zswap
> 2. some time later, pagefault occur, load page data from zswap.
>   But notice that zswap entry x is still in zswap because it is not
> frontswap_tmem_exclusive_gets_enabled.

frontswap_tmem_exclusive_gets_enabled is just option to see tradeoff
between CPU burining by frequent swapout and memory footprint by duplicate
copy in swap cache and frontswap backend so it shouldn't affect the stability.

>  this page is with PageSwapCache(page) and page_private(page) = entry.val
> 3. change this page data, and it become dirty

If non-shared swapin page become redirty, it should remove the page from
swapcache. If shared swapin page become redirty, it should do CoW so it's a
new page so that it doesn't live in swap cache. It means it should have new
offset which is different with old's one for swap out.

What's wrong with that?


> 4. some time later again, swap this page on the same offset A.
> 
> so, a duplicate store happens.
> 
> what I can think is that use flags and CAS to protect store and reclaim on
> the same offset  happens concurrentlly.
> 
> >> thread 0: zswap_get_swap_cache_page called. old page data is added to 
> >> swap_cache
> >>
> >> Now, swap cache has old data rather than new data for offset A.
> >> error will happen If do_swap_page() get page from swap_cache.
> >>
> >
> > --
> > Regards,
> > --Bob
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org";> em...@kvack.org 

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] NFS: Don't use debug-printk-only local variables

2013-09-25 Thread Olof Johansson
Due to commit 996a2493ed5d3a8d5db03b560fcbbf1ca3f10897 (NFS:
Use i_writecount to control whether to get an fscache cookie in
nfs_open(), in two instances a local variable is only used in debug
print statements. Said statements might be compiled out, which results
in unused variable warnings:

fs/nfs/fscache.c: In function 'nfs_fscache_release_page':
fs/nfs/fscache.c:263:21: warning: unused variable 'nfsi' [-Wunused-variable]

fs/nfs/fscache.c: In function '__nfs_fscache_invalidate_page':
fs/nfs/fscache.c:286:20: warning: unused variable 'nfsi' [-Wunused-variable]

Just skip the local variable in these cases.

Signed-off-by: Olof Johansson 
---

David,

This showed up in today's linux-next, seems to have come in through the fscache 
tree.

Either amend and roll in the fix (after review -- I'm guessing this is the 
right thing to do), or apply on top please. :)


Thanks!


-Olof

 fs/nfs/fscache.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index a01af20..3ef01f0 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -260,12 +260,11 @@ EXPORT_SYMBOL_GPL(nfs_fscache_open_file);
 int nfs_fscache_release_page(struct page *page, gfp_t gfp)
 {
if (PageFsCache(page)) {
-   struct nfs_inode *nfsi = NFS_I(page->mapping->host);
struct fscache_cookie *cookie = 
nfs_i_fscache(page->mapping->host);
 
BUG_ON(!cookie);
dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
-cookie, page, nfsi);
+cookie, page, NFS_I(page->mapping->host));
 
if (!fscache_maybe_release_page(cookie, page, gfp))
return 0;
@@ -283,13 +282,12 @@ int nfs_fscache_release_page(struct page *page, gfp_t gfp)
  */
 void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
 {
-   struct nfs_inode *nfsi = NFS_I(inode);
struct fscache_cookie *cookie = nfs_i_fscache(inode);
 
BUG_ON(!cookie);
 
dfprintk(FSCACHE, "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
-cookie, page, nfsi);
+cookie, page, NFS_I(inode));
 
fscache_wait_on_page_write(cookie, page);
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch guide for newbies

2013-09-25 Thread Julia Lawall
On Thu, 26 Sep 2013, Alexander Holler wrote:

> Am 26.09.2013 05:04, schrieb Al Viro:
> > On Thu, Sep 26, 2013 at 04:57:32AM +0200, Alexander Holler wrote:
> > > Am 26.09.2013 04:52, schrieb Alexander Holler:
> > > 
> > > > I'm aware of people which do nest 8 levels deep just to avoid a return,
> > > > break or goto.
> > > > 
> > > > But trying to limit that by limiting the line length is like ...
> > > > (choose your own own misguided comparison, it's too late for me I
> > > > currently only meorize some of those which don't make sense in english)
> > > 
> > > But I'm still able to offer a solution: ;)
> > > 
> > > limit the number of tabs, not the line length (at least not to 80).
> > 
> > With that limited (and it's visually harder to keep track of), what's
> > the problem with 80-column limit on line length?  Just how long do
> > you want those "descriptive names" to be?
> 
> Oh, personally I don't have any limit there. ;) I like descriptive function
> and variable names whenever they make sense. And often they make comments
> uneccessary and therefor prevent errors because those descriptive names are
> visible whenever the function or variable is used, and comments usually appear
> only once and get forgotten when scrolled out of the screen.
> 
> But just take a function like
> 
> void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
> struct timespec *wtom, struct timespec
> *sleep);
> 
> I like such function names ;) (ok I wouldn't have use those and), but it's
> hard to press this into 80 characters, especially when the arguments should
> have some meaning too (e.g. what does wtom stand for?)
> 
> If you use that somewhere you get
> 
> get_xtime_and_monotonic_and_sleep_offset(a, b, c)
> 
> using silly names and that already is a 58 characters long. So only 22 are
> left to distribute over 3 variable names. And now think what happens if that
> wouldn't be a void function.

Personally, I prefer to use my screen real estate for multiple 80-column 
windows, so I can see different parts of the code at once.  Anything that 
goes over 80 columns is very hard to read.

Perhaps it is a bad example, but I don't even find this very long name 
very understandable.  Monotonic is an adjective and xtime and sleep are 
nouns, so I don't understand how it all fits together.  Maybe cramming a 
lot of information into a variable name is not always so successful... 
Actually, I really appreciate comments on functions, that explain the 
purpose of the function, and the constraints on its usage.

julia
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 11/21] cpuidle: replace multiline statements with single line in cpuidle_idle_call()

2013-09-25 Thread Viresh Kumar
On 26 September 2013 03:33, Daniel Lezcano  wrote:
> I splitted these lines because they have 81 cols.

>> - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
>> -&dev->cpu);
>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, 
>> &dev->cpu);

This one comes in 80 columns

>> - clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
>> -&dev->cpu);
>> + clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);

And this one in 79..

And so I did this change.. I have checked it again now..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] clk: samsung: Remove old clock drivers for s5pc100 and s5pv210

2013-09-25 Thread Kukjin Kim
Kukjin Kim wrote:
> 
> Mateusz Krawczuk wrote:
> >
> > This patch removes old clock management code for S5PC100 and
> > S5PC110/S5PV210,
> > since the platform has been already moved to the new clock driver
> > using Common Clock Framework.
> >
> > Signed-off-by: Mateusz Krawczuk 
> > Signed-off-by: Kyungmin Park 
> > ---
> > This patch series require
> > S5PC100 and S5PV210 patch series moving to common clock framework.
> >
> >  arch/arm/mach-s5pc100/Kconfig  |2 +-
> >  arch/arm/mach-s5pc100/Makefile |1 -
> >  arch/arm/mach-s5pc100/clock.c  | 1361
--
> -
> > 
> >  arch/arm/mach-s5pv210/Kconfig  |2 +-
> >  arch/arm/mach-s5pv210/Makefile |1 -
> >  arch/arm/mach-s5pv210/clock.c  | 1365
--
> -
> > -
> >  6 files changed, 2 insertions(+), 2730 deletions(-)
> >  delete mode 100644 arch/arm/mach-s5pc100/clock.c
> >  delete mode 100644 arch/arm/mach-s5pv210/clock.c
> >
> In this case, would be easier to review with using "-D" for format-patch.
> 
> And this should be followed by moving common clk series for
> s5pc100/s5pv210.
> See s3c64xx common clk patches in v3.13-next/common-clk-s3c64xx branch as
> a
> reference.
> 
> And I have never seen common clk stuff for s5pc100. In addition, I'm not

Oh, I found, there is series for common clk s5pc100 :)

> sure we _still_ need to support s5pc100 in mainline.
> 
Thanks,
Kukjin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC] clk: samsung: Remove old clock drivers for s5pc100 and s5pv210

2013-09-25 Thread Kukjin Kim
Mateusz Krawczuk wrote:
> 
> This patch removes old clock management code for S5PC100 and
> S5PC110/S5PV210,
> since the platform has been already moved to the new clock driver
> using Common Clock Framework.
> 
> Signed-off-by: Mateusz Krawczuk 
> Signed-off-by: Kyungmin Park 
> ---
> This patch series require
> S5PC100 and S5PV210 patch series moving to common clock framework.
> 
>  arch/arm/mach-s5pc100/Kconfig  |2 +-
>  arch/arm/mach-s5pc100/Makefile |1 -
>  arch/arm/mach-s5pc100/clock.c  | 1361 ---
> 
>  arch/arm/mach-s5pv210/Kconfig  |2 +-
>  arch/arm/mach-s5pv210/Makefile |1 -
>  arch/arm/mach-s5pv210/clock.c  | 1365 ---
> -
>  6 files changed, 2 insertions(+), 2730 deletions(-)
>  delete mode 100644 arch/arm/mach-s5pc100/clock.c
>  delete mode 100644 arch/arm/mach-s5pv210/clock.c
> 
In this case, would be easier to review with using "-D" for format-patch.

And this should be followed by moving common clk series for s5pc100/s5pv210.
See s3c64xx common clk patches in v3.13-next/common-clk-s3c64xx branch as a
reference.

And I have never seen common clk stuff for s5pc100. In addition, I'm not
sure we _still_ need to support s5pc100 in mainline.

Thanks,
- Kukjin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: squashfs enhance [RFC 3/5] squashfs: remove cache for normal data page

2013-09-25 Thread Minchan Kim
Hello Phillip,

On Thu, Sep 26, 2013 at 04:18:39AM +0100, Phillip Lougher wrote:
> 
> Minchan Kim  wrote:
> >
> >Hello chintu,
> >
> >
> >On Wed, Sep 25, 2013 at 5:13 PM, chintu kumar  wrote:
> >>Hi Minchan,
> >>
> >>Thanks for the response!
> >>
> >>I checked add_to_page_cache_lru and you are right for existing pages in page
> >>cache it won't add the page you allocated so my questions are answered thank
> >>you. However my concern is still regarding parallel calls to
> >>squashfs_readpage for the same inode.
> >>
> >>I'll try to rephrase,
> >>
> >>Actually I was concerned about allocation on every squashfs_readpage. In the
> >>original code suppose that all pages weren't present in page cache and we
> >>are taking random read on the file. Now in the original code the
> >>cache->entry would've been filled once for all the requested pages within
> >>that block.
> >>
> >>However with your patch we always do I/O however and as you said
> >>add_to_page_cache_lru would take care of adding that page to the inode's
> >>mapping so we need not worry about that [Correct?] . Now suppose that you've
> >>read those pages as well for which squashfs_readpage has been called however
> >>since you won't be able to add the page to page_cache_lru it'll do I/O again
> >>just to fill the page that a different squashfs_readpage wasn't able to
> >>fill.
> >>
> >>timeline
> >>|
> >>|
> >>>  Process 1 (VFS initiates read for page 2)
> >>|
> >>> Initiates read for pages 1-5
> >>Process 2 (VFS intiates read for page 3)
> >>|
> >>> attempts to add the pages read in page cache
> >>Initiates read for pages 1-5 again
> >>   and fails for page 3 since VFS already initiated
> >>attempts to add pages read in page cache
> >>   read for page 3.
> >>and fails for every page. Page 3 isn't added since you got it from VFS in
> >>squashfs_readpage
> >>
> >>
> >>So it means that cost of memcpy in the original code from cache->entry to
> >>the page cache page IS MORE than   __page_cache_alloc + I/O? that you did in
> >>your patch.
> >
> >If the race happen with same CPU, buffer cache will mitigate a problem
> >but it happens on different CPUs,
> >we will do unnecessary I/O. But I think it's not a severe regression
> >because old code's cache coverage is
> >too limited and even stuck with concurrent I/O due to *a* cache.
> >
> >One of solution is to add locked pages right before request I/O to
> >page cache and if one of the page
> >in the compressed block is found from page cache, it means another
> >stream is going on so we can wait to finish
> >I/O and copy, just return. Does it make sense?
> >
> 
> No it doesn't, there's an obvious logical mistake there.
> 
> I' not convinced about the benefits of this patch.  The
> use of the "unnecessary" cache was there to prevent this race
> condition from happening.  The first racing process enters
> readpage, and grabs the cache entry signifying it is doing a decompress
> for that and the other pages.  A second racing process enters
> readpage, see a decompress is in progress, and waits until it is finished.
> The first process then fills in all the pages it can grab (bar the
> page locked by the second process), and the second process fills in its
> page from the cache entry.  Nice and simple, and no unnecessary extra
> I/O or decompression takes place.
> 
> It is your job to prove that the removal of memcpy outweighs the
> introduction of this regression.  Waving your hands and just
> dismissing this inconvenient issue isn't good enough.

I didn't dismiss and I see the problem.
I will handle it in next version. ;-)

> 
> I have a couple of other reservations about this patch, as it
> introduces two additional regressions.  This will be in my review.
> 
> As I said in my reply to your private message on Monday (where you
> asked me to review the patches), I reviewed them on the weekend.  As yet, I
> have not had time to pull my observations into something suitable for
> the kernel mailing list.  That will be done ASAP, but, due to full-time
> work commitments I may not have any time until the weekend again.

No problem. I will wait and want to handle all of issues from you and
others all at once. :)

> 
> On the subject of review, I notice this email contains patch review and
> discussion off-list.  That is not good kernel etiquette, patch discussion
> should be open, public and in full view, let's keep it that way.
> Also as maintainer I should have been CC'd on this email, rather than
> accidently discovering it on the kernel mailing list.

Argh, I just did reply-to-all without noticing Chintu omitting you and LKML.
Chintu, Please no touch to To and Cc line and use plain text mode for your MUA.

Thanks for the reivew.

> 
> Phillip
> 
> 
> >>
> >>
> >>Regards,
> >>Chintu
> >>
> >>
> >>
> >>On Wed, Sep 25, 2013 at 5:52 AM, Minchan Kim  wrote:
> >>>
> >>>Hello,
> >>>
> >>>On Tue, Sep 24, 2013 at 6:07 PM, chintu kumar 
> >>>wrote:
>  Hi Minchan,
> 
> 
>  In the below sn

Re: [PATCH -v2] cpufreq: skip loading acpi_cpufreq after intel_pstate

2013-09-25 Thread Viresh Kumar
On 26 September 2013 10:47, Yinghai Lu  wrote:
> can you put this one in acpi tree?

Its already applied in Rafael's linux-next branch.. Do you want something
else?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Mike Galbraith
On Thu, 2013-09-26 at 13:12 +0800, Michael wang wrote: 
> On 09/26/2013 11:41 AM, Mike Galbraith wrote:
> [snip]
> >> Like the case when we have:
> >>
> >>core0 sgcore1 sg
> >>cpu0cpu1cpu2cpu3
> >>waker   busyidleidle
> >>
> >> If the sync wakeup was on cpu0, we can:
> >>
> >> 1. choose cpu in core1 sg like we did usually
> >>some overhead but tend to make the load a little balance
> >>core0 sgcore1 sg
> >>cpu0cpu1cpu2cpu3
> >>idlebusywakee   idle
> > 
> > Reducing latency and increasing throughput when the waker isn't really
> > really going to immediately schedule off as the hint implies.  Nice for
> > bursty loads and ramp.
> > 
> > The breakeven point is going up though.  If you don't have nohz
> > throttled, you eat tick start/stop overhead, and the menu governor
> > recently added yet more overhead, so maybe we should say hell with it.
> 
> Exactly, more and more factors to be considered, we say things get
> balanced but actually it's not the best choice...
> 
> > 
> >> 2. choose cpu0 like the patch proposed
> >>no overhead but tend to make the load a little more unbalance
> >>core0 sgcore1 sg
> >>cpu0cpu1cpu2cpu3
> >>wakee   busyidleidle
> >>
> >> May be we should add a higher scope load balance check in wake_affine(),
> >> but that means higher overhead which is just what the patch want to
> >> reduce...
> > 
> > Yeah, more overhead is the last thing we need.
> > 
> >> What about some discount for sync case inside select_idle_sibling()?
> >> For example we consider sync cpu as idle and prefer it more than the 
> >> others?
> > 
> > That's what the sync hint does.  Problem is, it's a hint.  If it were
> > truth, there would be no point in calling select_idle_sibling().
> 
> Just wondering if the hint was wrong in most of the time, then why don't
> we remove it...

For very fast/light network ping-pong micro-benchmarks, it is right.
For pipe-test, it's absolutely right, jabbering parties are 100%
synchronous, there is nada/nil/zip/diddly squat overlap reclaimable..
but in the real world, it ain't necessarily so.

> Otherwise I think we can still utilize it to make some decision tends to
> be correct, don't we?

Sometimes :)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] ARM: SAMSUNG: S5P: If detected device tree skip irq init

2013-09-25 Thread Kukjin Kim
Tomasz Figa wrote:
> 
> Hi Mateusz,
> 
> On Tuesday 24 of September 2013 17:59:23 Mateusz Krawczuk wrote:
> > It prevents from executing things already handled by pinctrl-exynos
> > driver, when device tree is available.
> >
> > Signed-off-by: Mateusz Krawczuk 
> > Signed-off-by: Kyungmin Park 
> > ---
> >  arch/arm/plat-samsung/s5p-irq-eint.c | 4 
> >  1 file changed, 4 insertions(+)
> 
> Acked-by: Tomasz Figa 
> 
> [as a temporary solution until we move all the users of this code to
> device tree]
> 
Yes, agreed.

The 'skip irq init' in subject would be changed to 'skip eint init' for
clear meaning.

Applied.

Thanks,
Kukjin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU

2013-09-25 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, September 26, 2013 12:37 AM
> To: Bhushan Bharat-R65777
> Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
> linux-
> ker...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
> foundation.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU
> 
> On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > This patch adds vfio iommu support for Freescale IOMMU (PAMU -
> > Peripheral Access Management Unit).
> >
> > The Freescale PAMU is an aperture-based IOMMU with the following
> > characteristics.  Each device has an entry in a table in memory
> > describing the iova->phys mapping. The mapping has:
> >   -an overall aperture that is power of 2 sized, and has a start iova that
> >is naturally aligned
> >   -has 1 or more windows within the aperture
> >   -number of windows must be power of 2, max is 256
> >   -size of each window is determined by aperture size / # of windows
> >   -iova of each window is determined by aperture start iova / # of windows
> >   -the mapped region in each window can be different than
> >the window size...mapping must power of 2
> >   -physical address of the mapping must be naturally aligned
> >with the mapping size
> >
> > Some of the code is derived from TYPE1 iommu 
> > (driver/vfio/vfio_iommu_type1.c).
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/Kconfig   |6 +
> >  drivers/vfio/Makefile  |1 +
> >  drivers/vfio/vfio_iommu_fsl_pamu.c |  952
> 
> >  include/uapi/linux/vfio.h  |  100 
> >  4 files changed, 1059 insertions(+), 0 deletions(-)  create mode
> > 100644 drivers/vfio/vfio_iommu_fsl_pamu.c
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > 26b3d9d..7d1da26 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
> > depends on VFIO && SPAPR_TCE_IOMMU
> > default n
> >
> > +config VFIO_IOMMU_FSL_PAMU
> > +   tristate
> > +   depends on VFIO
> > +   default n
> > +
> >  menuconfig VFIO
> > tristate "VFIO Non-Privileged userspace driver framework"
> > depends on IOMMU_API
> > select VFIO_IOMMU_TYPE1 if X86
> > select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
> > +   select VFIO_IOMMU_FSL_PAMU if FSL_PAMU
> > help
> >   VFIO provides a framework for secure userspace device drivers.
> >   See Documentation/vfio.txt for more details.
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
> > c5792ec..7461350 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -1,4 +1,5 @@
> >  obj-$(CONFIG_VFIO) += vfio.o
> >  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o
> > vfio_iommu_type1.o
> >  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o
> > vfio_iommu_spapr_tce.o
> > +obj-$(CONFIG_VFIO_IOMMU_FSL_PAMU) += vfio_iommu_common.o
> > +vfio_iommu_fsl_pamu.o
> >  obj-$(CONFIG_VFIO_PCI) += pci/
> > diff --git a/drivers/vfio/vfio_iommu_fsl_pamu.c
> > b/drivers/vfio/vfio_iommu_fsl_pamu.c
> > new file mode 100644
> > index 000..b29365f
> > --- /dev/null
> > +++ b/drivers/vfio/vfio_iommu_fsl_pamu.c
> > @@ -0,0 +1,952 @@
> > +/*
> > + * VFIO: IOMMU DMA mapping support for FSL PAMU IOMMU
> > + *
> > + * This program is free software; you can redistribute it and/or
> > +modify
> > + * it under the terms of the GNU General Public License, version 2,
> > +as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, 
> > USA.
> > + *
> > + * Copyright (C) 2013 Freescale Semiconductor, Inc.
> > + *
> > + * Author: Bharat Bhushan 
> > + *
> > + * This file is derived from driver/vfio/vfio_iommu_type1.c
> > + *
> > + * The Freescale PAMU is an aperture-based IOMMU with the following
> > + * characteristics.  Each device has an entry in a table in memory
> > + * describing the iova->phys mapping. The mapping has:
> > + *  -an overall aperture that is power of 2 sized, and has a start iova 
> > that
> > + *   is naturally aligned
> > + *  -has 1 or more windows within the aperture
> > + * -number of windows must be power of 2, max is 256
> > + * -size of each window is determined by aperture size / # of windows
> > + * -iova of each win

[PATCH] perf: fix infinite loop with corrupted header

2013-09-25 Thread Sonny Rao
We recently ran into a corrupt perf data file which mostly looked okay
but the section size for data was set to 0.  This caused perf report to
get into an infinite loop in __perf_session_process_events().  Let's
just avoid this by bailing early and reporting it if there's an
invalid header.

Signed-off-by: Sonny Rao 
---
 tools/perf/util/header.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 26441d0..085ef76 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2582,6 +2582,10 @@ int perf_file_header__read(struct perf_file_header 
*header,
ph->data_offset  = header->data.offset;
ph->data_size= header->data.size;
ph->feat_offset  = header->data.offset + header->data.size;
+
+   if (!header->data.size)
+   die("corrupted header, invalid size 0 for data section\n");
+
return 0;
 }
 
-- 
1.8.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ipvs: improved SH fallback strategy

2013-09-25 Thread Julian Anastasov

Hello,

On Wed, 25 Sep 2013, Alexander Frolkin wrote:

> Improve the SH fallback realserver selection strategy.
> 
> With sh and sh-fallback, if a realserver is down, this attempts to
> distribute the traffic that would have gone to that server evenly
> among the remaining servers.
>  
> Signed-off-by: Alexander Frolkin 
> 
> --
> diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
> index 3588fae..3d5ab7c 100644
> --- a/net/netfilter/ipvs/ip_vs_sh.c
> +++ b/net/netfilter/ipvs/ip_vs_sh.c
> @@ -115,27 +115,47 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct 
> ip_vs_sh_state *s,
>  }
>  
>  
> -/* As ip_vs_sh_get, but with fallback if selected server is unavailable */
> +/* As ip_vs_sh_get, but with fallback if selected server is unavailable
> + *
> + * The fallback strategy loops around the table starting from a "random"
> + * point (in fact, it is chosen to be the original hash value to make the
> + * algorithm deterministic) to find a new server.
> + */
>  static inline struct ip_vs_dest *
>  ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
> const union nf_inet_addr *addr, __be16 port)
>  {
> - unsigned int offset;
> - unsigned int hash;
> + unsigned int offset, roffset;
> + unsigned int hash, ihash;
>   struct ip_vs_dest *dest;
>  
> - for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
> - hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
> - dest = rcu_dereference(s->buckets[hash].dest);
> - if (!dest)
> - break;
> - if (is_unavailable(dest))
> - IP_VS_DBG_BUF(6, "SH: selected unavailable server "
> -   "%s:%d (offset %d)",
> + /* first try the dest it's supposed to go to */
> + ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
> + dest = rcu_dereference(s->buckets[ihash].dest);
> + if (!dest)
> + return NULL;

Can we reduce the indentation here, eg:

if (!is_unavailable(dest))
return dest;
IP_VS_DBG_BUF(6, "SH: selected unavailable server "
...
for ()...
...
return NULL;

> + if (is_unavailable(dest)) {
> + IP_VS_DBG_BUF(6, "SH: selected unavailable server "
> +   "%s:%d, reselecting",
> +   IP_VS_DBG_ADDR(svc->af, &dest->addr),
> +   ntohs(dest->port));
> + /* if the original dest is unavailable, loop around the table
> +  * starting from ihash to find a new dest
> +  */
> + for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
> + roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
> + hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
> + dest = rcu_dereference(s->buckets[hash].dest);

Every result of rcu_dereference should be checked
for NULL (no dests anymore):
if (!dest)
break;

Then make sure there is correct indentation
for IP_VS_DBG_BUF parameters.

> + if (is_unavailable(dest))
> + IP_VS_DBG_BUF(6, "SH: selected unavailable "
> +   "server %s:%d (offset %d), reselecting",
> IP_VS_DBG_ADDR(svc->af, &dest->addr),
> -   ntohs(dest->port), offset);
> - else
> - return dest;
> +   ntohs(dest->port), roffset);
> + else
> + return dest;
> + }
> + } else {
> + return dest;
>   }
>  
>   return NULL;

Regards

--
Julian Anastasov 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] cpufreq: skip loading acpi_cpufreq after intel_pstate

2013-09-25 Thread Yinghai Lu
On Fri, Sep 20, 2013 at 10:43 AM, Yinghai Lu  wrote:
> If the hw support intel_pstate and acpi_cpufreq, intel_pstate will
> get loaded first.
>
> acpi_cpufreq_init will call acpi_cpufreq_early_init()
> and that will allocate perf data and init those perf data in ACPI core,
> (that will cover all cpus). But later it will free them as
> cpufreq_register_driver(acpi_cpufreq) will fail as init_pstate is
> already registered
>
> Use cpufreq_get_current_driver() to check if we can skip the
> acpi_cpufreq loading.
>
> -v2: update changelog and separate second part to another patch, according
>  to Viresh.
>
> Signed-off-by: Yinghai Lu 
>
> ---
>  drivers/cpufreq/acpi-cpufreq.c |4 
>  1 file changed, 4 insertions(+)
>
> Index: linux-2.6/drivers/cpufreq/acpi-cpufreq.c
> ===
> --- linux-2.6.orig/drivers/cpufreq/acpi-cpufreq.c
> +++ linux-2.6/drivers/cpufreq/acpi-cpufreq.c
> @@ -986,6 +986,10 @@ static int __init acpi_cpufreq_init(void
>  {
> int ret;
>
> +   /* don't keep reloading if cpufreq_driver exists */
> +   if (cpufreq_get_current_driver())
> +   return 0;
> +
> if (acpi_disabled)
> return 0;
>

Rafael,

can you put this one in acpi tree?

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] ACPI / video: Do not register backlight if win8 and native interface exists

2013-09-25 Thread Aaron Lu
On Wed, Sep 25, 2013 at 07:53:13PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, September 24, 2013 05:47:31 PM Aaron Lu wrote:
> > According to Matthew Garrett, "Windows 8 leaves backlight control up
> > to individual graphics drivers rather than making ACPI calls itself.
> > There's plenty of evidence to suggest that the Intel driver for
> > Windows [8] doesn't use the ACPI interface, including the fact that
> > it's broken on a bunch of machines when the OS claims to support
> > Windows 8.  The simplest thing to do appears to be to disable the
> > ACPI backlight interface on these systems".
> > 
> > So for Win8 systems, if there is native backlight control interface
> > registered by GPU driver, ACPI video will not register its own. For
> > users who prefer to keep ACPI video's backlight interface, the existing
> > kernel cmdline option acpi_backlight=video can be used.
> 
> I think the idea is to use the aggressive default for now and we can switch 
> the
> default back to the current behavior before the merge window in case there are
> too many problems with it?

Yes I think so.

Thanks,
Aaron
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Michael wang
On 09/26/2013 11:41 AM, Mike Galbraith wrote:
[snip]
>> Like the case when we have:
>>
>>  core0 sgcore1 sg
>>  cpu0cpu1cpu2cpu3
>>  waker   busyidleidle
>>
>> If the sync wakeup was on cpu0, we can:
>>
>> 1. choose cpu in core1 sg like we did usually
>>some overhead but tend to make the load a little balance
>>  core0 sgcore1 sg
>>  cpu0cpu1cpu2cpu3
>>  idlebusywakee   idle
> 
> Reducing latency and increasing throughput when the waker isn't really
> really going to immediately schedule off as the hint implies.  Nice for
> bursty loads and ramp.
> 
> The breakeven point is going up though.  If you don't have nohz
> throttled, you eat tick start/stop overhead, and the menu governor
> recently added yet more overhead, so maybe we should say hell with it.

Exactly, more and more factors to be considered, we say things get
balanced but actually it's not the best choice...

> 
>> 2. choose cpu0 like the patch proposed
>>no overhead but tend to make the load a little more unbalance
>>  core0 sgcore1 sg
>>  cpu0cpu1cpu2cpu3
>>  wakee   busyidleidle
>>
>> May be we should add a higher scope load balance check in wake_affine(),
>> but that means higher overhead which is just what the patch want to
>> reduce...
> 
> Yeah, more overhead is the last thing we need.
> 
>> What about some discount for sync case inside select_idle_sibling()?
>> For example we consider sync cpu as idle and prefer it more than the others?
> 
> That's what the sync hint does.  Problem is, it's a hint.  If it were
> truth, there would be no point in calling select_idle_sibling().

Just wondering if the hint was wrong in most of the time, then why don't
we remove it...

Otherwise I think we can still utilize it to make some decision tends to
be correct, don't we?

Regards,
Michael Wang

> 
> -Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 RESEND 0/5] ARM/ARM64 architected timer updates

2013-09-25 Thread Olof Johansson
On Wed, Sep 18, 2013 at 12:05:01PM +0100, Sudeep KarkadaNagesha wrote:
> From: Sudeep KarkadaNagesha 
> 
> Hi Olof/Kevin,
> 
> I am reposting this series as suggested by Olof[1]. This is reviewed and
> acked by Will and Catalin. Daniel needs Ack from arm-soc maintainers
> to take these through his tree. Can you review this ?

Acked-by: Olof Johansson 

Sorry for the delay.


-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] Update sem_otime for all operations

2013-09-25 Thread Manfred Spraul
Hi Jia,

Could you check if the patch below resolves the bug you have reported?
Just this patch, i.e. without your proposal.

I want to leave the optimization for the get_seconds() call:
We must update sem_otime in two places anyway
(either perform_atomic_semop() and exit_sem() or
do_smart_update() and semtimedop())

--
Manfred

In commit 0a2b9d4c,the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).

But: Since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.

The fix is simple:
Non-alter operations must update sem_otime.

Signed-off-by: Manfred Spraul 
Reported-by: Jia He 
---
 ipc/sem.c | 41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index e5d9bb8..ec83f79 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -910,6 +910,24 @@ again:
 }
 
 /**
+ * set_semotime(sma, sops) - set sem_otime
+ * @sma: semaphore array
+ * @sops: operations that modified the array, may be NULL
+ *
+ * sem_otime is replicated to avoid cache line trashing.
+ * This function sets one instance to the current time.
+ */
+static void set_semotime(struct sem_array *sma, struct sembuf *sops)
+{
+   if (sops == NULL) {
+   sma->sem_base[0].sem_otime = get_seconds();
+   } else {
+   sma->sem_base[sops[0].sem_num].sem_otime =
+   get_seconds();
+   }
+}
+
+/**
  * do_smart_update(sma, sops, nsops, otime, pt) - optimized update_queue
  * @sma: semaphore array
  * @sops: operations that were performed
@@ -959,17 +977,10 @@ static void do_smart_update(struct sem_array *sma, struct 
sembuf *sops, int nsop
}
}
}
-   if (otime) {
-   if (sops == NULL) {
-   sma->sem_base[0].sem_otime = get_seconds();
-   } else {
-   sma->sem_base[sops[0].sem_num].sem_otime =
-   get_seconds();
-   }
-   }
+   if (otime)
+   set_semotime(sma, sops);
 }
 
-
 /* The following counts are associated to each semaphore:
  *   semncntnumber of tasks waiting on semval being nonzero
  *   semzcntnumber of tasks waiting on semval being zero
@@ -1831,10 +1842,16 @@ SYSCALL_DEFINE4(semtimedop, int, semid, struct sembuf 
__user *, tsops,
 
error = perform_atomic_semop(sma, sops, nsops, un,
task_tgid_vnr(current));
-   if (error <= 0) {
-   if (alter && error == 0)
+   if (error == 0) {
+   /* If the operation was successful, then do
+* the required updates.
+*/
+   if (alter)
do_smart_update(sma, sops, nsops, 1, &tasks);
-
+   else
+   set_semotime(sma, sops);
+   }
+   if (error <= 0) {
goto out_unlock_free;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 08/21] cpuidle: use cpuidle_disabled() instead of "off"

2013-09-25 Thread Viresh Kumar
On 26 September 2013 03:22, Daniel Lezcano  wrote:
> On 09/22/2013 03:21 AM, Viresh Kumar wrote:
>> We have a routine for getting value of "off", better call that instead of 
>> using
>> "off" directly.
>
> We are in the fast path, I am not sure invoking a function here is
> better than using directly the static variable.

I only did it for consistency as we have this routine specifically for reading
value of "off" and so we better don't use off directly..

Probably we can make it static inline and move it into
drivers/cpuidle/cpuidle.h?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3] pci: exynos: split into two parts such as Synopsys part and Exynos part

2013-09-25 Thread Kishon Vijay Abraham I
On Wednesday 25 September 2013 02:53 AM, Arnd Bergmann wrote:
> On Monday 23 September 2013, Kishon Vijay Abraham I wrote:
>> Btw if we hadn't programmed inbound translation table, the address will go
>> untranslated (according to the data book). I guess that's how it was working
>> for Jingoo Han.
>>
>> **
>> 3.10.4
>> Inbound iATU Operation
>>
>> When there is no match, then the address is untranslated
>> **
>>
> 
> Well, that should work just as well, since you have a 1:1 translation anyway.
> Do you get the same error without the translation?

Yes. I get the same non-fatal error interrupt in RC.

Thanks
Kishon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 2/3] DMA: Freescale: Add new 8-channel DMA engine device tree nodes

2013-09-25 Thread Hongbo Zhang

On 09/26/2013 10:28 AM, David Gibson wrote:

On Wed, Sep 25, 2013 at 08:46:32PM -0500, Scott Wood wrote:

On Wed, 2013-09-25 at 15:35 +0800, Hongbo Zhang wrote:

By the way, I know maybe it is difficult, but why not introduce a
document of maintaining rules for the dt binding docs? we have dedicated
maintainers for this part now. Description language from one submitter
cannot satisfy every reviewer/maintainer, for a reg property, is it
necessary to say "offset and length",

Don't say "offset and length".  It's both redundant with the base
definition of the reg property, and overly specific because it makes
assumptions about how the parent node's ranges are set up (sometimes we
want to be that specific, but usually not).


Thanks for your answer Scott.
In fact my questions are mainly sample questions to file the necessary 
rules of dt binding.

To look at it another way, the format of the 'reg' property is defined
by the parent bus's binding, not the binding of the node itself.

Whatever the rule is, if it is reasonable and accepted, just as I said, 
we need to file it.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2] irqchip: exynos-combiner: remove hard-coded irq_base value

2013-09-25 Thread Kukjin Kim
Chander Kashyap wrote:
> 
> Replace irq_domain_add_simple with "irq_domain_add_linear" in order to use
> linear irq domain, and to remove hardcoded irq_base_value.
> 
> Signed-off-by: Chander Kashyap 
> ---
> Changes since v1:
>   - Replaced irq_domain_add_simple with irq_domain_add_linear,
> as suggested by Tomasz
> 
>  drivers/irqchip/exynos-combiner.c |   15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/irqchip/exynos-combiner.c b/drivers/irqchip/exynos-
> combiner.c
> index 4c68265..2551046 100644
> --- a/drivers/irqchip/exynos-combiner.c
> +++ b/drivers/irqchip/exynos-combiner.c
> @@ -206,8 +206,7 @@ static unsigned int combiner_lookup_irq(int group)
> 
>  static void __init combiner_init(void __iomem *combiner_base,
>struct device_node *np,
> -  unsigned int max_nr,
> -  int irq_base)
> +  unsigned int max_nr)
>  {
>   int i, irq;
>   unsigned int nr_irq;
> @@ -221,7 +220,7 @@ static void __init combiner_init(void __iomem
> *combiner_base,
>   return;
>   }
> 
> - combiner_irq_domain = irq_domain_add_simple(np, nr_irq, irq_base,
> + combiner_irq_domain = irq_domain_add_linear(np, nr_irq,
>   &combiner_irq_domain_ops, combiner_data);
>   if (WARN_ON(!combiner_irq_domain)) {
>   pr_warning("%s: irq domain init failed\n", __func__);
> @@ -248,7 +247,6 @@ static int __init combiner_of_init(struct device_node
> *np,
>  {
>   void __iomem *combiner_base;
>   unsigned int max_nr = 20;
> - int irq_base = -1;
> 
>   combiner_base = of_iomap(np, 0);
>   if (!combiner_base) {
> @@ -262,14 +260,7 @@ static int __init combiner_of_init(struct device_node
> *np,
>   __func__, max_nr);
>   }
> 
> - /*
> -  * FIXME: This is a hardwired COMBINER_IRQ(0,0). Once all devices
> -  * get their IRQ from DT, remove this in order to get dynamic
> -  * allocation.
> -  */
> - irq_base = 160;
> -
> - combiner_init(combiner_base, np, max_nr, irq_base);
> + combiner_init(combiner_base, np, max_nr);
> 
>   return 0;
>  }
> --
> 1.7.9.5

Looks nice to me, applied with Tomasz's review.

Thanks,
Kukjin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] i2c: qup: Add device tree bindings information

2013-09-25 Thread Ivan T. Ivanov

Hi,

On Wed, 2013-09-25 at 18:06 +0200, Wolfram Sang wrote:
> > > I agree with Kumar on removing this.  If we decide it is something worth
> > > keeping, logic to support it doesn't belong in the QUP driver, but in
> > > the i2c core.
> > 
> > I don't have strong objection here, will remove aliases support.
> 
> When resubmitting, please test against v3.12-rcX.
> of_i2c_register_devices() is in the core meanwhile and alias support for
> of is in the core for a while, too. Check i2c_add_adapter().

Thanks, I will.

Regards,
Ivan


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] arm: mach-integrator: Add stub for pci_v3_early_init() for !CONFIG_PCI

2013-09-25 Thread Olof Johansson
On Wed, Sep 25, 2013 at 6:39 AM, Linus Walleij  wrote:
> On Wed, Sep 25, 2013 at 12:11 PM, Joerg Roedel  wrote:
>
>> This fixes a compile error where CONFIG_PCI is disabled:
>
> ARM SoC folks, can you please apply this directly for fixes?

Done!

Thanks,

-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 06/21] cpuidle: rearrange code in __cpuidle_driver_init()

2013-09-25 Thread Viresh Kumar
On 26 September 2013 03:10, Daniel Lezcano  wrote:
> Well, I don't have a strong opinion on that, it is "Schtroumpf Vert et
> Vert Schtroumpf" :)  [1]

:)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3] PCI: exynos: add support for MSI

2013-09-25 Thread Bjorn Helgaas
On Wed, Sep 25, 2013 at 10:32 PM, Jingoo Han  wrote:
> On Friday, September 06, 2013 3:55 PM, Jingoo Han wrote:
>>
>> This patch adds support for Message Signaled Interrupt in the
>> Exynos PCIe diver using Synopsys designware PCIe core IP.
>>
>> Signed-off-by: Siva Reddy Kallam 
>> Signed-off-by: Srikanth T Shivanand 
>> Signed-off-by: Jingoo Han 
>> Cc: Pratyush Anand 
>> Cc: Mohit KUMAR 
>
> Hi Bjorn Helgaas,
>
> There is no comment for last 3 weeks.
> Would you merge this patch to your tree?

I worked on merging your patches today, which is what prompted my
message about how to manage the exynos, mvebu, tegra, etc., host
drivers.  Assuming we come to a consensus about that, they should
appear in -next by the end of this week.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: Tree for Sep 26

2013-09-25 Thread Stephen Rothwell
Hi all,

Heads up: I will be having a 3 week break leading up to the kernel
summit.  This means that next-20130927 (next Friday) will be the last
linux-next release until next-20131028 (or maybe 29).  I presume that
Linus will be up to v3.12-rc7 by then and -rc7 is often the last before
a release ... Please plan accordingly.

Changes since 20130925:

The ceph tree gained a build failure so I used the version from
next-20130925.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 214 trees (counting Linus' and 30 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (4b97280 Merge tag 'stable/for-linus-3.12-rc2-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip)
Merging fixes/master (fa8218d Merge tag 'regmap-v3.11-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap)
Merging kbuild-current/rc-fixes (ad81f05 Linux 3.11-rc1)
Merging arc-current/for-curr (272b98c Linux 3.12-rc1)
Merging arm-current/fixes (40190c8 ARM: 7837/3: fix Thumb-2 bug in AES 
assembler code)
Merging m68k-current/for-linus (21e884b m68k/m68knommu: Implement 
__get_user_unaligned/__put_user_unaligned())
Merging metag-fixes/fixes (3b2f64d Linux 3.11-rc2)
Merging powerpc-merge/merge (dbe78b4 powerpc/pseries: Do not start secondaries 
in Open Firmware)
Merging sparc/master (4de9ad9 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile)
Merging net/master (2811eba ipv6: udp packets following an UFO enqueued packet 
need also be handled by UFO)
Merging ipsec/master (cd808fc xfrm: Fix aevent generation for each received 
packet)
Merging sound-current/for-linus (4028b6c ALSA: compress: Fix compress device 
unregister.)
Merging pci-current/for-linus (4a10c2a Linux 3.12-rc2)
Merging wireless/master (b75ff5e Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging driver-core.current/driver-core-linus (4a10c2a Linux 3.12-rc2)
Merging tty.current/tty-linus (93a8d41 n_tty: Fix EOF push index when termios 
changes)
Merging usb.current/usb-linus (4a10c2a Linux 3.12-rc2)
Merging staging.current/staging-linus (099326d staging: imx-drm: Fix probe 
failure)
Merging char-misc.current/char-misc-linus (4a10c2a Linux 3.12-rc2)
Merging input-current/for-linus (2f0d260 Input: i8042 - i8042_flush fix for a 
full 8042 buffer)
Merging md-current/for-linus (f94c0b6 md/raid5: fix interaction of 'replace' 
and 'recovery'.)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (26052f9 crypto: crct10dif - Add fallback for 
broken initrds)
Merging ide/master (64110c1 ide: sgiioc4: Staticize ioc4_ide_attach_one())
Merging dwmw2/master (5950f08 pcmcia: remove RPX board stuff)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline functions)
Merging devicetree-current/devicetree/merge (cf9e236 of/irq: init struct 
resource to 0 in of_irq_to_resource())
Merging rr-fixes/fixes (6c2580c Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32)
Merging mfd-fixes/master (5649d8f mfd: ab8500-sysct

Re: [RFC V4 PATCH 00/15] Signature verification of hibernate snapshot

2013-09-25 Thread joeyli
於 四,2013-09-26 於 02:27 +0200,Pavel Machek 提到:
> On Wed 2013-09-25 15:16:54, James Bottomley wrote:
> > On Wed, 2013-09-25 at 17:25 -0400, Alan Stern wrote:
> > > On Wed, 25 Sep 2013, David Howells wrote:
> > > 
> > > > I have pushed some keyrings patches that will likely affect this to:
> > > > 
> > > > 
> > > > http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-devel
> > > > 
> > > > I intend to ask James to pull these into his next branch.  If he's 
> > > > happy to do
> > > > so, I can look at pulling at least your asymmetric keys patch on top of 
> > > > them.
> > > 
> > > This suggests a point that I raised at the Linux Plumbers conference:
> > > 
> > > Why are asymmetric keys used for verifying the hibernation image?  It
> > > seems that a symmetric key would work just as well.  And it would be a
> > > lot quicker to generate, because it wouldn't need any high-precision
> > > integer computations.
> > 
> > The reason is the desire to validate that the previous kernel created
> > something which it passed on to the current kernel (in this case, the
> > hibernation image) untampered with.  To do that, something must be
> > passed to the prior kernel that can be validated but *not* recreated by
> > the current kernel.
> 
> I don't get this. Why is it important that current kernel can't
> recreate the signature?
> 
> Current kernel is not considered malicious (if it were, you have worse
> problems).
> 

Current boot kernel should not malicious especially when UEFI secure
boot enabled.

>   Pavel
> 
> PS: And yes, it would be nice to have
> Documentation/power/swsusp-uefi.txt (or something) explaining the
> design.
> 

Thanks for your suggestion, I will write the swsusp-uefi.txt to
explaining the design in next version.


Thanks a lot!
Joey Lee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] pinctrl: Correct number of pins for s5pv210

2013-09-25 Thread Kukjin Kim
Mateusz Krawczuk wrote:
> 
> Values of pins in table s5pv210 bank are incorrect. This patch correct
> values.
> 
> Signed-off-by: Mateusz Krawczuk 
> Signed-off-by: Kyungmin Park 

This fix is correct,

Acked-by: Kukjin Kim 

Thanks,
Kukjin

> ---
>  drivers/pinctrl/pinctrl-exynos.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pinctrl/pinctrl-exynos.c b/drivers/pinctrl/pinctrl-
> exynos.c
> index 2689f8d..155b1b3 100644
> --- a/drivers/pinctrl/pinctrl-exynos.c
> +++ b/drivers/pinctrl/pinctrl-exynos.c
> @@ -663,18 +663,18 @@ static void exynos_pinctrl_resume(struct
> samsung_pinctrl_drv_data *drvdata)
>  /* pin banks of s5pv210 pin-controller */
>  static struct samsung_pin_bank s5pv210_pin_bank[] = {
>   EXYNOS_PIN_BANK_EINTG(8, 0x000, "gpa0", 0x00),
> - EXYNOS_PIN_BANK_EINTG(6, 0x020, "gpa1", 0x04),
> + EXYNOS_PIN_BANK_EINTG(4, 0x020, "gpa1", 0x04),
>   EXYNOS_PIN_BANK_EINTG(8, 0x040, "gpb", 0x08),
>   EXYNOS_PIN_BANK_EINTG(5, 0x060, "gpc0", 0x0c),
>   EXYNOS_PIN_BANK_EINTG(5, 0x080, "gpc1", 0x10),
>   EXYNOS_PIN_BANK_EINTG(4, 0x0a0, "gpd0", 0x14),
> - EXYNOS_PIN_BANK_EINTG(4, 0x0c0, "gpd1", 0x18),
> - EXYNOS_PIN_BANK_EINTG(5, 0x0e0, "gpe0", 0x1c),
> - EXYNOS_PIN_BANK_EINTG(8, 0x100, "gpe1", 0x20),
> - EXYNOS_PIN_BANK_EINTG(6, 0x120, "gpf0", 0x24),
> + EXYNOS_PIN_BANK_EINTG(6, 0x0c0, "gpd1", 0x18),
> + EXYNOS_PIN_BANK_EINTG(8, 0x0e0, "gpe0", 0x1c),
> + EXYNOS_PIN_BANK_EINTG(5, 0x100, "gpe1", 0x20),
> + EXYNOS_PIN_BANK_EINTG(8, 0x120, "gpf0", 0x24),
>   EXYNOS_PIN_BANK_EINTG(8, 0x140, "gpf1", 0x28),
>   EXYNOS_PIN_BANK_EINTG(8, 0x160, "gpf2", 0x2c),
> - EXYNOS_PIN_BANK_EINTG(8, 0x180, "gpf3", 0x30),
> + EXYNOS_PIN_BANK_EINTG(6, 0x180, "gpf3", 0x30),
>   EXYNOS_PIN_BANK_EINTG(7, 0x1a0, "gpg0", 0x34),
>   EXYNOS_PIN_BANK_EINTG(7, 0x1c0, "gpg1", 0x38),
>   EXYNOS_PIN_BANK_EINTG(7, 0x1e0, "gpg2", 0x3c),
> --
> 1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3] PCI: exynos: add support for MSI

2013-09-25 Thread Jingoo Han
On Friday, September 06, 2013 3:55 PM, Jingoo Han wrote:
> 
> This patch adds support for Message Signaled Interrupt in the
> Exynos PCIe diver using Synopsys designware PCIe core IP.
> 
> Signed-off-by: Siva Reddy Kallam 
> Signed-off-by: Srikanth T Shivanand 
> Signed-off-by: Jingoo Han 
> Cc: Pratyush Anand 
> Cc: Mohit KUMAR 

Hi Bjorn Helgaas,

There is no comment for last 3 weeks.
Would you merge this patch to your tree?
Thank you.

Best regards,
Jingoo Han

> ---
> Changes since v2:
> - fixed MAX_MSI_CTRLS because MAX_MSI_IRQS is 32 only
> - used __get_free_pages() to allocate msi_data
> - used one msi_data and msi_irq_in_use per one RC
> - used irq_domain to represent the MSI controller
> - removed msi-base irq number from device tree because this is not
>   a hardware property.
> 
> Changes since v1:
> - removed unnecessary exynos_pcie_clear_irq_level()
> - updated the bindings documentation
> - used new msi_chip infrastructure
> - removed ARCH_SUPPORTS_MSI
> - replaced #ifdef guards with IS_ENABLED(CONFIG_PCI_MSI)
> 
>  drivers/pci/host/pci-exynos.c  |   44 +++
>  drivers/pci/host/pcie-designware.c |  240 
> 
>  drivers/pci/host/pcie-designware.h |   14 +++
>  3 files changed, 298 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3] kvm: Add VFIO device for handling IOMMU cache coherency

2013-09-25 Thread Alexey Kardashevskiy
On 09/26/2013 06:19 AM, Alex Williamson wrote:
> On Sun, 2013-09-15 at 22:40 +1000, Alexey Kardashevskiy wrote:
>> On 09/14/2013 02:25 AM, Alex Williamson wrote:
>>> On Fri, 2013-09-13 at 18:49 +1000, Alexey Kardashevskiy wrote:
 On 09/13/2013 07:23 AM, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's any important point where that breaks down.  Intel
> VT-d hardware may or may not support snoop control.  When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent.  That allows KVM to handle things like the
> x86 WBINVD opcode as a nop.  When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
>
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason.  Allowing an arbitrary user to enable
> physical WBINVD gives them a more access to the hardware.  Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment.  In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device.  The user can add
> and remove VFIO groups to this device via file descriptors.  KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and gets the coherency state of
> the IOMMU from VFIO.  This provides equivalent functionality to
> legacy KVM assignment, while keeping (nearly) all the bits isolated.
>
> The one intrusion is the resulting flag indicating the coherency
> state.  For this RFC it's placed on the x86 kvm_arch struct, however
> I know POWER has interest in using the VFIO external user interface,
> and I'm hoping we can share a common KVM-VFIO device.  Perhaps they
> care about No-Snoop handling as well or the code can be #ifdef'd.


 POWER does not support (at least boos3s - "server", not sure about others)
 this cache-non-coherent stuff at all.
>>>
>>> Then it's easy for your IOMMU API interface to return always cache
>>> coherent or never cache coherent or whatever ;)
>>>
 Regarding reusing this device with external API for POWER - I posted a
 patch which introduces KVM device to link KVM with IOMMU but besides the
 list of groups registered in KVM, it also provides the way to find a group
 by LIOBN (logical bus number) which is used in DMA map/unmap hypercalls. So
 in my case kvm_vfio_group struct needs LIOBN and it would be nice to have
 there window_size too (for a quick boundary check). I am not sure we want
 to mix everything here.

 It is in "[PATCH v10 12/13] KVM: PPC: Add support for IOMMU in-kernel
 handling" if you are interested (kvmppc_spapr_tce_iommu_device).
>>>
>>> Yes, I stole the code to get the vfio symbols from your code.  The
>>> convergence I was hoping to achieve is that KVM doesn't really want to
>>> know about VFIO and vica versa.  We can therefore at least limit the
>>> intrusion by sharing a common device.  Obviously for you it will need
>>> some extra interfaces to associate an LIOBN to a group, but we keep both
>>> the kernel an userspace cleaner by avoiding duplication where we can.
>>> Is this really not extensible to your usage?
>>
>> Well, I do not have a good taste for this kind of stuff so I cannot tell
>> for sure. I can reuse this device and hack it to do whatever I need but how?
>>
>> There are 2 things I need from KVM device:
>> 1. associate IOMMU group with LIOBN
> 
> Does QEMU know this?  We could set another attribute to specify the
> LIOBN for a group.

QEMU knows as QEMU decides on LOIBN number. And yes, we could do that.


>> 2. get a pointer to an IOMMU group by LIOBN (used to get ppc's IOMMU table
>> pointer which is an IOMMU data of an IOMMU group so we could take a
>> shortcut here).
> 
> Once we have a VFIO device with a VFIO group added and the LIOBN
> attribute set, isn't this just a matter of some access code?


The lookup function will be called from what we call a realmode on PPC64,
i.e. when MMU is off and kvm.ko code is not available. So we will either
have to put this lookup function to a separate virt/kvm/vfio_rm.c or
compile the whole thing into the kernel image but this is not a big issue
anyway.

You can have a look for example at book3s_64_vio_hv.o vs. book3s_64_vio.o
to get a better picture if you like.


>> There are 3 possible interfaces to use:
>> A. set/get attributes
>> B. ioctl
>> C. additional API
> 
> I think we need to differentiate user interfaces from kernel interfaces.
> Within the kernel, we make no guarantees about interfaces and we can
> change them to meet our needs.  That includes the vfio external user
> interface.  For userspace, we need to be careful not to break things.  I
> suggest we use the set/get/has attribute interface that's already part
> of

Re: [PATCH] sg: fix memory leak

2013-09-25 Thread Douglas Gilbert

On 13-09-25 10:05 PM, vaughan wrote:

On 09/25/2013 10:26 PM, Vegard Nossum wrote:

Commit e32c9e6300e3af659cbfe45e90a1e7dcd3572ada introduced a memory
leak. Fix it.

Cc: sta...@vger.kernel.org
Cc: Vaughan Cao 
Cc: Douglas Gilbert 
Signed-off-by: Vegard Nossum 
---
  drivers/scsi/sg.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5cbc4bb..a97143f 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2060,6 +2060,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
spin_lock_irqsave(&sdp->sfd_lock, iflags);
if (sdp->detached) {
spin_unlock_irqrestore(&sdp->sfd_lock, iflags);
+   kfree(sfp);
return ERR_PTR(-ENODEV);
}
list_add_tail(&sfp->sfd_siblings, &sdp->sfds);

You're right. It's a memory leak.


Signed-off-by: Douglas Gilbert 


There also a memory leak at the second 'return NULL;'
in dev_seq_start() .

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time

2013-09-25 Thread Jason Wang
On 09/23/2013 03:16 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
>> > On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
>>> > > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
 > >> Currently, even if the packet length is smaller than 
 > >> VHOST_GOODCOPY_LEN, if
 > >> upend_idx != done_idx we still set zcopy_used to true and rollback 
 > >> this choice
 > >> later. This could be avoided by determining zerocopy once by checking 
 > >> all
 > >> conditions at one time before.
 > >>
 > >> Signed-off-by: Jason Wang 
 > >> ---
 > >>  drivers/vhost/net.c |   47 
 > >> ---
 > >>  1 files changed, 20 insertions(+), 27 deletions(-)
 > >>
 > >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 > >> index 8a6dd0d..3f89dea 100644
 > >> --- a/drivers/vhost/net.c
 > >> +++ b/drivers/vhost/net.c
 > >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
 > >>  iov_length(nvq->hdr, s), hdr_size);
 > >>   break;
 > >>   }
 > >> - zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
 > >> -nvq->upend_idx != nvq->done_idx);
 > >> +
 > >> + zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
 > >> +&& (nvq->upend_idx + 1) % UIO_MAXIOV 
 > >> !=
 > >> +   nvq->done_idx
>>> > > Thinking about this, this looks strange.
>>> > > The original idea was that once we start doing zcopy, we keep
>>> > > using the heads ring even for short packets until no zcopy is 
>>> > > outstanding.
>> > 
>> > What's the reason for keep using the heads ring?
> To keep completions in order.

Ok, I will do some test to see the impact.
>>> > >
>>> > > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != 
>>> > > nvq->done_idx
>>> > > here?
>> > 
>> > Because we initialize both upend_idx and done_idx to zero, so upend_idx
>> > != done_idx could not be used to check whether or not the heads ring
>> > were full.
> But what does ring full have to do with zerocopy use?
>

It was used to forbid the zerocopy when heads ring are full, but since
we have the limitation now, it could be removed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Regression caused by commit 4bdc33ed ("NFSDv4.2: Add NFS v4.2 support to the NFS server")

2013-09-25 Thread Jongman Heo
>
>--- Original Message ---
>
>Sender : Anna Schumaker
>
>Date : 2013-09-25 22:52 (GMT+09:00)
>
>Title : Re: Regression caused by commit 4bdc33ed ("NFSDv4.2: Add NFS v4.2 
>support to the NFS server")
>
>
>
>Hi Jongman,
>
>Is the panic on your client or server?  I don't see how the patch your
>bisect led you to could cause the problem, since all it does is expand
>the minor version array on the server.  Your client doesn't have NFSD
>enabled, so this code shouldn't even be affecting it.
>
>A few questions:  what is your /etc/exports on the server?  What
>version of NFS are you using for nfsroot?
>
>Thanks!
>Anna
>
>On Wed, Sep 25, 2013 at 1:19 AM, Jongman Heo wrote:
>>
>> Hi all,
>>
>> My embedded development box fails to NFS-boot with NFS server which uses 
>> recent kernel.
>>
>> Using git bisect, I found it is caused by commit 4bdc33ed ("NFSDv4.2: Add 
>> NFS v4.2 support to the NFS server").
>>
>>
>> 1. dmesg (NFS boot failure case)
>>
>> ...
>> [2.040893] ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [2.046207] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
>> Control: RX
>> [2.053570] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [3.055023] IP-Config: Guessing netmask 255.255.0.0
>> [3.059979] IP-Config: Gateway not on directly connected network.
>> [3.066330] Looking up port of RPC 13/2 on 165.213.88.249
>> [3.074001] Looking up port of RPC 15/1 on 165.213.88.249
>> [3.122878] VFS: Unable to mount root fs via NFS, trying floppy.
>> [3.129134] VFS: Cannot open root device "nfs" or unknown-block(2,0)
>> [3.135478] Please append a correct "root=" boot option; here are the 
>> available partitions:
>> [3.143831] 1f003072 mtdblock0 (driver?)
>> [3.148798] 1f01  64 mtdblock1 (driver?)
>> [3.153758] 1f02  64 mtdblock2 (driver?)
>> [3.158719] 1f03  64 mtdblock3 (driver?)
>> [3.163682] 1f04  64 mtdblock4 (driver?)
>> [3.168644] 1f05  64 mtdblock5 (driver?)
>> [3.173607] 1f06  64 mtdblock6 (driver?)
>> [3.178568] 0800   488386584 sda driver: sd
>> [3.183099]   0801  506016 sda1
>> [3.186927]   0802 4008217 sda2
>> [3.190755]   0803   483869767 sda3
>> [3.194584] b300 1880064 mmcblk0 driver: mmcblk
>> [3.199802]   b3014096 mmcblk0p1
>> [3.204063]   b302  102400 mmcblk0p2
>> [3.208330]   b3034096 mmcblk0p3
>> [3.212594]   b304   1 mmcblk0p4
>> [3.216855]   b3052048 mmcblk0p5
>> [3.221116]   b3062048 mmcblk0p6
>> [3.225382]   b3072048 mmcblk0p7
>> [3.229644]   b3084096 mmcblk0p8
>> [3.233906]   b309   12288 mmcblk0p9
>> [3.238176]   b30a   16384 mmcblk0p10
>> [3.242524]   b30b  142336 mmcblk0p11
>> [3.246869]   b30c 1572864 mmcblk0p12
>> [3.251219] b320   12288 mmcblk0gp1 (driver?)
>> [3.256272] b310   12288 mmcblk0gp0 (driver?)
>> [3.261320] Kernel panic - not syncing: VFS: Unable to mount root fs on 
>> unknown-block(2,0)
>> [3.269566] Pid: 1, comm: swapper Not tainted 2.6.35 #1
>> [3.274776] Call Trace:
>> [3.277232]  [<80d0db5b>] ? printk+0x1e/0x20
>> [3.281492]  [<80d0dad1>] panic+0x65/0xd1
>> [3.285495]  [<80eb9ce3>] mount_block_root+0x125/0x1be
>> [3.290631]  [<809d1f6d>] ? sys_mknod+0x2d/0x30
>> [3.295156]  [<80eb9f6d>] mount_root+0xd0/0xf2
>> [3.299591]  [<80eba0d9>] prepare_namespace+0x14a/0x184
>> [3.304803]  [<809c44f6>] ? sys_access+0x26/0x30
>> [3.309411]  [<80eb9a4e>] kernel_init+0x25e/0x26e
>> [3.314105]  [<80eb97f0>] ? kernel_init+0x0/0x26e
>> [3.318800]  [<80903242>] kernel_thread_helper+0x6/0x10
>>
>>
>> 2. Client (my embedded box) configuration
>>   It's kernel 2.6.35 based, and has following NFS kernel configs.
>>
>> # grep NFS .config
>> CONFIG_NFS_FS=y
>> CONFIG_NFS_V3=y
>> CONFIG_NFS_V3_ACL=y
>> CONFIG_NFS_V4=y
>> # CONFIG_NFS_V4_1 is not set
>> CONFIG_ROOT_NFS=y
>> # CONFIG_NFSD is not set
>> CONFIG_NFS_ACL_SUPPORT=y
>> CONFIG_NFS_COMMON=y
>>
>>
>> 3. Server (NFSD) configuration
>>Fedora 19 + latest linus git kernel 3.12.0-rc2+ (commit 22356f44, mm: 
>> Place preemption point in do_mlockall() loop)
>>
>>
>> 4. workaround
>>
>> Reverting the commit 4bdc33ed resolves my issue, NFS boot is working then.
>> I've done git bisect, but lost the resulting bisect log due to sudden power 
>> loss :(.
>>
>> Best regards,
>> Jongman Heo
>
>
>

Hi,

Please see my e-mail reply to J. Bruce Fields for the detail.

Thanks,
Jongman Heo.

Re: checkpatch guide for newbies

2013-09-25 Thread Alexander Holler

Am 26.09.2013 05:48, schrieb Al Viro:

On Thu, Sep 26, 2013 at 05:27:15AM +0200, Alexander Holler wrote:


Oh, personally I don't have any limit there. ;) I like descriptive
function and variable names whenever they make sense. And often they
make comments uneccessary and therefor prevent errors because those
descriptive names are visible whenever the function or variable is
used, and comments usually appear only once and get forgotten when
scrolled out of the screen.

But just take a function like

void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
 struct timespec *wtom, struct
timespec *sleep);


Charming...  Now, try to tell one such name from another, when the
only difference is buried in the middle of long phrase.  And yes,
I've seen mistakes clearly of that origin.  Made them myself, actually.



Nothing is perfect. But the source of the discussion was that I don't 
aggree that limiting the line length makes code more simple.


E.g. In your previous example they could have used some verbose name for 
"flag" without having to cross an obvious non-existing limit. Such the 
author might have seen the "problem" early himself. And I think we all 
do sometimes write silly code, even when we should know it better.


E.g. my first version of something like your example don't have 
necessarily been better as I'm usually first write something down which 
I believe should work, not taking care about anything but functionality. 
Then I take a break and have a second look in such a way like you just 
have exercised it. And I think most people are unable to write perfect 
code right out of their fingers. Of course, I think I got much better in 
avoiding deep nesting right from the beginning, but I'm sure I still 
write sometimes stupid code. And then there are those time constraints 
one just has to withstand, besides the fact that it happens sometimes 
that I just don't won't to have a look at my own code again. (The last 
limit is often reached by endless reviews with comments to remove a 
space here and rename a variable there). Nothing is more annoying than 
rewriting source until it looks like if the commenter has written it.


People do think differently, people see code differently and people 
write code differently and trying to unify that with unnecessary rules 
just annoys almost everyone. I do like it if I can tell who has written 
some code by just looking at it, at least if it is readable and isn't in 
some obvious uglyand hard to read and hard to understand state.


n8,

Alexander Holler

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Regression caused by commit 4bdc33ed ("NFSDv4.2: Add NFS v4.2 support to the NFS server")

2013-09-25 Thread Jongman Heo

Hi,

>
>--- Original Message ---
>Sender : J. Bruce Fields
>Date : 2013-09-25 23:05 (GMT+09:00)
>Title : Re: Regression caused by commit 4bdc33ed ("NFSDv4.2: Add NFS v4.2 
>support to the NFS server")
>
>On Wed, Sep 25, 2013 at 05:19:50AM +, Jongman Heo wrote:
>> My embedded development box fails to NFS-boot with NFS server which uses 
>> recent kernel.
>> 
>> Using git bisect, I found it is caused by commit 4bdc33ed ("NFSDv4.2: Add 
>> NFS v4.2 support to the NFS server").
>> 
>> 
>> 1. dmesg (NFS boot failure case)
>> 
>> ...
>> [2.040893] ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [2.046207] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow 
>> Control: RX
>> [2.053570] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [3.055023] IP-Config: Guessing netmask 255.255.0.0
>> [3.059979] IP-Config: Gateway not on directly connected network.
>> [3.066330] Looking up port of RPC 13/2 on 165.213.88.249
>> [3.074001] Looking up port of RPC 15/1 on 165.213.88.249
>> [3.122878] VFS: Unable to mount root fs via NFS, trying floppy.
>> [3.129134] VFS: Cannot open root device "nfs" or unknown-block(2,0)
>> [3.135478] Please append a correct "root=" boot option; here are the 
>> available partitions:
>> [3.143831] 1f003072 mtdblock0 (driver?)
>> [3.148798] 1f01  64 mtdblock1 (driver?)
>> [3.153758] 1f02  64 mtdblock2 (driver?)
>> [3.158719] 1f03  64 mtdblock3 (driver?)
>> [3.163682] 1f04  64 mtdblock4 (driver?)
>> [3.168644] 1f05  64 mtdblock5 (driver?)
>> [3.173607] 1f06  64 mtdblock6 (driver?)
>> [3.178568] 0800   488386584 sda driver: sd
>> [3.183099]   0801  506016 sda1
>> [3.186927]   0802 4008217 sda2
>> [3.190755]   0803   483869767 sda3
>> [3.194584] b300 1880064 mmcblk0 driver: mmcblk
>> [3.199802]   b3014096 mmcblk0p1
>> [3.204063]   b302  102400 mmcblk0p2
>> [3.208330]   b3034096 mmcblk0p3
>> [3.212594]   b304   1 mmcblk0p4
>> [3.216855]   b3052048 mmcblk0p5
>> [3.221116]   b3062048 mmcblk0p6
>> [3.225382]   b3072048 mmcblk0p7
>> [3.229644]   b3084096 mmcblk0p8
>> [3.233906]   b309   12288 mmcblk0p9
>> [3.238176]   b30a   16384 mmcblk0p10
>> [3.242524]   b30b  142336 mmcblk0p11
>> [3.246869]   b30c 1572864 mmcblk0p12
>> [3.251219] b320   12288 mmcblk0gp1 (driver?)
>> [3.256272] b310   12288 mmcblk0gp0 (driver?)
>> [3.261320] Kernel panic - not syncing: VFS: Unable to mount root fs on 
>> unknown-block(2,0)
>> [3.269566] Pid: 1, comm: swapper Not tainted 2.6.35 #1
>> [3.274776] Call Trace:
>> [3.277232]  [<80d0db5b>] ? printk+0x1e/0x20
>> [3.281492]  [<80d0dad1>] panic+0x65/0xd1
>> [3.285495]  [<80eb9ce3>] mount_block_root+0x125/0x1be
>> [3.290631]  [<809d1f6d>] ? sys_mknod+0x2d/0x30
>> [3.295156]  [<80eb9f6d>] mount_root+0xd0/0xf2
>> [3.299591]  [<80eba0d9>] prepare_namespace+0x14a/0x184
>> [3.304803]  [<809c44f6>] ? sys_access+0x26/0x30
>> [3.309411]  [<80eb9a4e>] kernel_init+0x25e/0x26e
>> [3.314105]  [<80eb97f0>] ? kernel_init+0x0/0x26e
>> [3.318800]  [<80903242>] kernel_thread_helper+0x6/0x10
>> 
>> 
>> 2. Client (my embedded box) configuration
>>   It's kernel 2.6.35 based, and has following NFS kernel configs.
>> 
>> # grep NFS .config
>> CONFIG_NFS_FS=y
>> CONFIG_NFS_V3=y
>> CONFIG_NFS_V3_ACL=y
>> CONFIG_NFS_V4=y
>> # CONFIG_NFS_V4_1 is not set
>> CONFIG_ROOT_NFS=y
>> # CONFIG_NFSD is not set
>> CONFIG_NFS_ACL_SUPPORT=y
>> CONFIG_NFS_COMMON=y
>> 
>> 
>> 3. Server (NFSD) configuration
>>Fedora 19 + latest linus git kernel 3.12.0-rc2+ (commit 22356f44, mm: 
>> Place preemption point in do_mlockall() loop)
>> 
>> 
>> 4. workaround
>> 
>> Reverting the commit 4bdc33ed resolves my issue, NFS boot is working then.
>> I've done git bisect, but lost the resulting bisect log due to sudden power 
>> loss :(.
>
>So when you say you revert that commit, you mean you revert it on your
>*server*, right?  You're not changing the client at all throughout these
>tests?

Right. I reverted the commit on my server, while client is same throughout the 
tests.

>
>A network trace might be interesting: so, on the server, run
>
>tcpdump -s0 -wtmp.pcap -ieth0
>
>(replace eth0 by the right network interface), then try booting the
>client and after the client fails, kill tcpdump and send us a copy of
>tmp.pcap.
>
>(And also you might want to fire up "wireshark tmp.pcap" and take a look
>yourself--you'll probably see something like a version mismatch error in
>the network traffic.)
>
>--b.

I've attached two tcpdump files. 
In the dump, 165.213.88.238 is IP address for NFS client (embedded box with 
2.6.35 kernel), and 192.168.64.128 is for NFS serv

Re: [GIT PULL] at91: fixes for 3.12 #1

2013-09-25 Thread Olof Johansson
On Tue, Sep 24, 2013 at 6:01 AM, Nicolas Ferre  wrote:
> Arnd, Olof, Kevin,
>
> This is the first batch of fixes for AT91.
> I think that the removal of the void flag IRQF_DISABLED can be schedulled in
> this timeframe as we are pretty early in the cycle and that this patch
> shouldn't wait more.

Those patches have been trickling in, and I'm not sure if all
maintainers are picking them up for 3.12.

Anyway, it's still fairly early and it's a tiny patch.

>
> Thanks, best regards,
>
> The following changes since commit 272b98c6455f00884f0350f775c5342358ebb73f:
>
>   Linux 3.12-rc1 (2013-09-16 16:17:51 -0400)
>
> are available in the git repository at:
>
>   git://github.com/at91linux/linux-at91.git tags/at91-fixes

Pulled, thanks.

-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] ARM: shmobile: r8a7790: add I2C support in Device Tree mode

2013-09-25 Thread Magnus Damm
Hi Guennadi,

On Thu, Sep 26, 2013 at 7:10 AM, Guennadi Liakhovetski
 wrote:
> Hi Laurent
>
> On Wed, 25 Sep 2013, Laurent Pinchart wrote:
>
>> Hi Guennadi,
>>
>> Thank you for the patch.
>>
>> On Monday 09 September 2013 18:03:54 Guennadi Liakhovetski wrote:
>> > This patch adds clocks and clock lookup entries for the four I2C
>> > controllers on r8a7790 and respective Device Tree nodes.
>> >
>> > Signed-off-by: Guennadi Liakhovetski 
>> > ---
>> >  arch/arm/boot/dts/r8a7790.dtsi |   36 
>> > +
>> >  arch/arm/mach-shmobile/clock-r8a7790.c |   10 
>> >  2 files changed, 46 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/arch/arm/boot/dts/r8a7790.dtsi 
>> > b/arch/arm/boot/dts/r8a7790.dtsi
>> > index 885f9f4..a5021112 100644
>> > --- a/arch/arm/boot/dts/r8a7790.dtsi
>> > +++ b/arch/arm/boot/dts/r8a7790.dtsi
>> > @@ -127,6 +127,42 @@
>> > interrupts = <0 0 4>, <0 1 4>, <0 2 4>, <0 3 4>;
>> > };
>> >
>> > +   i2c0: i2c@e6508000 {
>> > +   #address-cells = <1>;
>> > +   #size-cells = <0>;
>> > +   compatible = "renesas,i2c-rcar-h2";
>> > +   reg = <0 0xe6508000 0 0x40>;
>> > +   interrupt-parent = <&gic>;
>> > +   interrupts = <0 287 0x4>;
>>
>> Shouldn't you add state = "disabled" to all I2C controllers in order not to
>> enable the unused controllers by default ?
>
> It would be logical, yes, and I seem to remember having discussed this
> earlier with someone (with Magnus, IIRC), and the outcome was, that all
> Renesas .dtsi files so far define all I2C directly in enabled mode and
> that it's intentional, so, I just followed this pattern here. You can
> indeed check in other .dtsi files - all seem to do exactly the same.

Uhm, I think you mix up platform device use case and DT use case. In
the platform device case we define all devices by default, but that's
not how it should be for the DT case. In the case of DT then default
should most likely be "disabled". Please follow the direction of
Laurent.

Also, I mentioned this 25 times before already so once more cannot
hurt: "renesas,i2c-rcar-h2" needs to be replaced with something more
standard.

/ magnus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 6/7] vfio: moving some functions in common file

2013-09-25 Thread Bhushan Bharat-R65777


> -Original Message-
> From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
> On Behalf Of Alex Williamson
> Sent: Wednesday, September 25, 2013 10:33 PM
> To: Bhushan Bharat-R65777
> Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
> linux-
> ker...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
> foundation.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 6/7] vfio: moving some functions in common file
> 
> On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > Some function defined in vfio_iommu_type1.c were common and we want to
> > use these for FSL IOMMU (PAMU) and iommu-none driver.
> > So some of them are moved to vfio_iommu_common.c
> >
> > I think we can do more of that but we will take this step by step.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/Makefile|4 +-
> >  drivers/vfio/vfio_iommu_common.c |  235
> ++
> >  drivers/vfio/vfio_iommu_common.h |   30 +
> >  drivers/vfio/vfio_iommu_type1.c  |  206
> > +-
> >  4 files changed, 268 insertions(+), 207 deletions(-)  create mode
> > 100644 drivers/vfio/vfio_iommu_common.c  create mode 100644
> > drivers/vfio/vfio_iommu_common.h
> >
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
> > 72bfabc..c5792ec 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -1,4 +1,4 @@
> >  obj-$(CONFIG_VFIO) += vfio.o
> > -obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> > -obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> > +obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o
> > +vfio_iommu_type1.o
> > +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o
> > +vfio_iommu_spapr_tce.o
> >  obj-$(CONFIG_VFIO_PCI) += pci/
> > diff --git a/drivers/vfio/vfio_iommu_common.c
> > b/drivers/vfio/vfio_iommu_common.c
> > new file mode 100644
> > index 000..8bdc0ea
> > --- /dev/null
> > +++ b/drivers/vfio/vfio_iommu_common.c
> > @@ -0,0 +1,235 @@
> > +/*
> > + * VFIO: Common code for vfio IOMMU support
> > + *
> > + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> > + * Author: Alex Williamson 
> > + * Author: Bharat Bhushan 
> > + *
> > + * This program is free software; you can redistribute it and/or
> > +modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Derived from original vfio:
> > + * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
> > + * Author: Tom Lyon, p...@cisco.com
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include  /* pci_bus_type */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> Please cleanup includes on both the source and target files.  You obviously
> don't need linux/pci.h here for one.

Will do.

> 
> > +
> > +static bool disable_hugepages;
> > +module_param_named(disable_hugepages,
> > +  disable_hugepages, bool, S_IRUGO | S_IWUSR);
> > +MODULE_PARM_DESC(disable_hugepages,
> > +"Disable VFIO IOMMU support for IOMMU hugepages.");
> > +
> > +struct vwork {
> > +   struct mm_struct*mm;
> > +   longnpage;
> > +   struct work_struct  work;
> > +};
> > +
> > +/* delayed decrement/increment for locked_vm */ void
> > +vfio_lock_acct_bg(struct work_struct *work) {
> > +   struct vwork *vwork = container_of(work, struct vwork, work);
> > +   struct mm_struct *mm;
> > +
> > +   mm = vwork->mm;
> > +   down_write(&mm->mmap_sem);
> > +   mm->locked_vm += vwork->npage;
> > +   up_write(&mm->mmap_sem);
> > +   mmput(mm);
> > +   kfree(vwork);
> > +}
> > +
> > +void vfio_lock_acct(long npage)
> > +{
> > +   struct vwork *vwork;
> > +   struct mm_struct *mm;
> > +
> > +   if (!current->mm || !npage)
> > +   return; /* process exited or nothing to do */
> > +
> > +   if (down_write_trylock(¤t->mm->mmap_sem)) {
> > +   current->mm->locked_vm += npage;
> > +   up_write(¤t->mm->mmap_sem);
> > +   return;
> > +   }
> > +
> > +   /*
> > +* Couldn't get mmap_sem lock, so must setup to update
> > +* mm->locked_vm later. If locked_vm were atomic, we
> > +* wouldn't need this silliness
> > +*/
> > +   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
> > +   if (!vwork)
> > +   return;
> > +   mm = get_task_mm(current);
> > +   if (!mm) {
> > +   kfree(vwork);
> > +   return;
> > +   }
> > +   INIT_WORK(&vwork->work, vfio_lock_acct_bg);
> > +   vwork->mm = mm;
> > +   vwork->npage = npage;
> > +   schedule_work(&vwork->work);
> > +}
> > +
> > +/*
> > + * Some mappings aren't backed by a struct page, for example an
> > +mmap'd
> > + * MMIO range for our own or another device.  

RE: [PATCH] DT: if dt is available don't use s3c_arch_init

2013-09-25 Thread Kukjin Kim
Tomasz Figa wrote:
> 
> Hi Mateusz,
> 
> On Monday 23 of September 2013 12:14:49 Mateusz Krawczuk wrote:
> > It prevents from executing platform code, when booting from device tree.
> >
> > Signed-off-by: Mateusz Krawczuk 
> > Signed-off-by: Kyungmin Park 
> > ---
> >  arch/arm/plat-samsung/init.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/arm/plat-samsung/init.c b/arch/arm/plat-samsung/init.c
> > index aa9511b..0ace02d 100644
> > --- a/arch/arm/plat-samsung/init.c
> > +++ b/arch/arm/plat-samsung/init.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -152,6 +153,8 @@ static int __init s3c_arch_init(void)
> >  {
> > int ret;
> >
> > +   if (of_have_populated_dt())
> > +   return 0;
> 
> I'm not sure if this is the correct thing to do here (note platforms that
> still want this initialization to be handled, even when booting with DT).
> 
> The DT case is already handled several lines below in if (cpu == NULL)
> check and if your platform requires this initialization not to happen you
> should assure it has cpu == NULL.
> 
Yeah, I agree and I think this is not needed at this moment we could cleanup
the init functions later for Samsung SoCs though.

Thanks,
Kukjin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc

2013-09-25 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, September 25, 2013 10:10 PM
> To: Bhushan Bharat-R65777
> Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
> linux-
> ker...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; linux-
> p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
> foundation.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc
> 
> On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/pci/vfio_pci_rdwr.c |3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
> > b/drivers/vfio/pci/vfio_pci_rdwr.c
> > index 210db24..8a8156a 100644
> > --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > @@ -181,7 +181,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, 
> > char
> __user *buf,
> >size_t count, loff_t *ppos, bool iswrite)  {
> > int ret;
> > -   loff_t off, pos = *ppos & VFIO_PCI_OFFSET_MASK;
> > +   loff_t off;
> > +   u64 pos = (u64 )(*ppos & VFIO_PCI_OFFSET_MASK);
> > void __iomem *iomem = NULL;
> > unsigned int rsrc;
> > bool is_ioport;
> 
> What's the compile error that this fixes?

I was getting below error; and after some googling I came to know that this is 
how it is fixed by other guys.

/home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined 
reference to `__cmpdi2'
/home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined 
reference to `__cmpdi2'

Thanks
-Bharat
> 



Re: checkpatch guide for newbies

2013-09-25 Thread Al Viro
On Thu, Sep 26, 2013 at 05:27:15AM +0200, Alexander Holler wrote:

> Oh, personally I don't have any limit there. ;) I like descriptive
> function and variable names whenever they make sense. And often they
> make comments uneccessary and therefor prevent errors because those
> descriptive names are visible whenever the function or variable is
> used, and comments usually appear only once and get forgotten when
> scrolled out of the screen.
> 
> But just take a function like
> 
> void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
> struct timespec *wtom, struct
> timespec *sleep);

Charming...  Now, try to tell one such name from another, when the
only difference is buried in the middle of long phrase.  And yes,
I've seen mistakes clearly of that origin.  Made them myself, actually.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v3 2/3] mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently

2013-09-25 Thread Weijie Yang
On Tue, Sep 24, 2013 at 9:03 AM, Minchan Kim  wrote: 
> On Mon, Sep 23, 2013 at 04:21:49PM +0800, Weijie Yang wrote:
> >
> > Modify:
> >  - check the refcount in fail path, free memory if it is not referenced.
> 
> Hmm, I don't like this because zswap refcount routine is already mess for me.
> I'm not sure why it was designed from the beginning. I hope we should fix it 
> first.
> 
> 1. zswap_rb_serach could include zswap_entry_get semantic if it founds a 
> entry from
>the tree. Of course, we should ranme it as find_get_zswap_entry like 
> find_get_page.
> 2. zswap_entry_put could hide resource free function like zswap_free_entry so 
> that
>all of caller can use it easily following pattern.
> 
>   find_get_zswap_entry
>   ...
>   ...
>   zswap_entry_put
> 
> Of course, zswap_entry_put have to check the entry is in the tree or not
> so if someone already removes it from the tree, it should avoid double remove.
> 
> One of the concern I can think is that approach extends critical section
> but I think it would be no problem because more bottleneck would be 
> [de]compress
> functions. If it were really problem, we can mitigate a problem with moving
> unnecessary functions out of zswap_free_entry because it seem to be rather
> over-enginnering.

I refactor the zswap refcount routine according to Minchan's idea.
Here is the new patch, Any suggestion is welcomed.

To Seth and Bob, would you please review it again?

mm/zswap.c |  116

 1 file changed, 52 insertions(+), 64 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
old mode 100644
new mode 100755
index deda2b6..bd04910
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -217,6 +217,7 @@ static struct zswap_entry *zswap_entry_cache_alloc(gfp_t 
gfp)
if (!entry)
return NULL;
entry->refcount = 1;
+   RB_CLEAR_NODE(&entry->rbnode);
return entry;
 }
 
@@ -232,10 +233,20 @@ static void zswap_entry_get(struct zswap_entry *entry)
 }
 
 /* caller must hold the tree lock */
-static int zswap_entry_put(struct zswap_entry *entry)
+static int zswap_entry_put(struct zswap_tree *tree, struct zswap_entry *entry)
 {
-   entry->refcount--;
-   return entry->refcount;
+   int refcount = --entry->refcount;
+
+   if (refcount <= 0) {
+   if (!RB_EMPTY_NODE(&entry->rbnode)) {
+   rb_erase(&entry->rbnode, &tree->rbroot);
+   RB_CLEAR_NODE(&entry->rbnode);
+   }
+
+   zswap_free_entry(tree, entry);
+   }
+
+   return refcount;
 }
 
 /*
@@ -258,6 +269,17 @@ static struct zswap_entry *zswap_rb_search(struct rb_root 
*root, pgoff_t offset)
return NULL;
 }
 
+static struct zswap_entry *zswap_entry_find_get(struct rb_root *root, pgoff_t 
offset)
+{
+   struct zswap_entry *entry = NULL;
+
+   entry = zswap_rb_search(root, offset);
+   if (entry)
+   zswap_entry_get(entry);
+
+   return entry;
+}
+
 /*
  * In the case that a entry with the same offset is found, a pointer to
  * the existing entry is stored in dupentry and the function returns -EEXIST
@@ -387,7 +409,7 @@ static void zswap_free_entry(struct zswap_tree *tree, 
struct zswap_entry *entry)
 enum zswap_get_swap_ret {
ZSWAP_SWAPCACHE_NEW,
ZSWAP_SWAPCACHE_EXIST,
-   ZSWAP_SWAPCACHE_NOMEM
+   ZSWAP_SWAPCACHE_FAIL,
 };
 
 /*
@@ -401,9 +423,9 @@ enum zswap_get_swap_ret {
  * added to the swap cache, and returned in retpage.
  *
  * If success, the swap cache page is returned in retpage
- * Returns 0 if page was already in the swap cache, page is not locked
- * Returns 1 if the new page needs to be populated, page is locked
- * Returns <0 on error
+ * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
+ * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated, page is 
locked
+ * Returns ZSWAP_SWAPCACHE_FAIL on error
  */
 static int zswap_get_swap_cache_page(swp_entry_t entry,
struct page **retpage)
@@ -475,7 +497,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry,
if (new_page)
page_cache_release(new_page);
if (!found_page)
-   return ZSWAP_SWAPCACHE_NOMEM;
+   return ZSWAP_SWAPCACHE_FAIL;
*retpage = found_page;
return ZSWAP_SWAPCACHE_EXIST;
 }
@@ -517,23 +539,22 @@ static int zswap_writeback_entry(struct zbud_pool *pool, 
unsigned long handle)
 
/* find and ref zswap entry */
spin_lock(&tree->lock);
-   entry = zswap_rb_search(&tree->rbroot, offset);
+   entry = zswap_entry_find_get(&tree->rbroot, offset);
if (!entry) {
/* entry was invalidated */
spin_unlock(&tree->lock);
return 0;
}
-   zswap_entry_get(entry);
spin_unlock(&tree->lock);
BUG_ON

Re: Reworking dm-writeboost [was: Re: staging: Add dm-writeboost]

2013-09-25 Thread Dave Chinner
On Tue, Sep 24, 2013 at 09:20:50PM +0900, Akira Hayakawa wrote:
> * Deferring ACK for barrier writes
> Barrier flags such as REQ_FUA and REQ_FLUSH are handled lazily.
> Immediately handling these bios badly slows down writeboost.
> It surveils the bios with these flags and forcefully flushes them
> at worst case within `barrier_deadline_ms` period.

That rings alarm bells.

If the filesystem is using REQ_FUA/REQ_FLUSH for ordering reasons,
delaying them to allow other IOs to be submitted and dispatched may
very well violate the IO ordering constraints the filesystem is
trying to acheive.

Alternatively, delaying them will stall the filesystem because it's
waiting for said REQ_FUA IO to complete. For example, journal writes
in XFS are extremely IO latency sensitive in workloads that have a
signifincant number of ordering constraints (e.g. O_SYNC writes,
fsync, etc) and delaying even one REQ_FUA/REQ_FLUSH can stall the
filesystem for the majority of that barrier_deadline_ms.

i.e. this says to me that the best performance you can get from such
workloas is one synchronous operation per process per
barrier_deadline_ms, even when the storage and filesystem might be
capable of executing hundreds of synchronous operations per
barrier_deadline_ms..

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Mike Galbraith
On Thu, 2013-09-26 at 10:50 +0800, Michael wang wrote: 
> On 09/25/2013 04:56 PM, Mike Galbraith wrote:
> > On Wed, 2013-09-25 at 09:53 +0200, Peter Zijlstra wrote: 
> >> Subject: sched: Avoid select_idle_sibling() for wake_affine(.sync=true)
> >> From: Peter Zijlstra 
> >> Date: Wed Sep 25 08:28:39 CEST 2013
> >>
> >> When a task is the only running task and does a sync wakeup; avoid
> >> going through select_idle_sibling() as it doesn't know the current CPU
> >> is going to be idle shortly.
> >>
> >> Without this two sync wakers will ping-pong between CPUs for no
> >> reason.
> > 
> > That will make pipe-test go fugly -> pretty, and help very fast/light
> > localhost network, but eat heavier localhost overlap recovery.  We need
> > a working (and cheap) overlap detector scheme, so we can know when there
> > is enough to be worth going after.
> > 
> > (I sent you some lmbench numbers offline a while back showing the
> > two-faced little  in action, doing both good and evil)
> 
> It seems like the choice between the overhead and a little possibility
> to balance the load :)
> 
> Like the case when we have:
> 
>   core0 sgcore1 sg
>   cpu0cpu1cpu2cpu3
>   waker   busyidleidle
> 
> If the sync wakeup was on cpu0, we can:
> 
> 1. choose cpu in core1 sg like we did usually
>some overhead but tend to make the load a little balance
>   core0 sgcore1 sg
>   cpu0cpu1cpu2cpu3
>   idlebusywakee   idle

Reducing latency and increasing throughput when the waker isn't really
really going to immediately schedule off as the hint implies.  Nice for
bursty loads and ramp.

The breakeven point is going up though.  If you don't have nohz
throttled, you eat tick start/stop overhead, and the menu governor
recently added yet more overhead, so maybe we should say hell with it.

> 2. choose cpu0 like the patch proposed
>no overhead but tend to make the load a little more unbalance
>   core0 sgcore1 sg
>   cpu0cpu1cpu2cpu3
>   wakee   busyidleidle
> 
> May be we should add a higher scope load balance check in wake_affine(),
> but that means higher overhead which is just what the patch want to
> reduce...

Yeah, more overhead is the last thing we need.

> What about some discount for sync case inside select_idle_sibling()?
> For example we consider sync cpu as idle and prefer it more than the others?

That's what the sync hint does.  Problem is, it's a hint.  If it were
truth, there would be no point in calling select_idle_sibling().

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch guide for newbies

2013-09-25 Thread Alexander Holler

Am 26.09.2013 05:04, schrieb Al Viro:

On Thu, Sep 26, 2013 at 04:57:32AM +0200, Alexander Holler wrote:

Am 26.09.2013 04:52, schrieb Alexander Holler:


I'm aware of people which do nest 8 levels deep just to avoid a return,
break or goto.

But trying to limit that by limiting the line length is like ...
(choose your own own misguided comparison, it's too late for me I
currently only meorize some of those which don't make sense in english)


But I'm still able to offer a solution: ;)

limit the number of tabs, not the line length (at least not to 80).


With that limited (and it's visually harder to keep track of), what's
the problem with 80-column limit on line length?  Just how long do
you want those "descriptive names" to be?


Oh, personally I don't have any limit there. ;) I like descriptive 
function and variable names whenever they make sense. And often they 
make comments uneccessary and therefor prevent errors because those 
descriptive names are visible whenever the function or variable is used, 
and comments usually appear only once and get forgotten when scrolled 
out of the screen.


But just take a function like

void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
struct timespec *wtom, struct timespec 
*sleep);


I like such function names ;) (ok I wouldn't have use those and), but 
it's hard to press this into 80 characters, especially when the 
arguments should have some meaning too (e.g. what does wtom stand for?)


If you use that somewhere you get

get_xtime_and_monotonic_and_sleep_offset(a, b, c)

using silly names and that already is a 58 characters long. So only 22 
are left to distribute over 3 variable names. And now think what happens 
if that wouldn't be a void function.


Regards,

Alexander Holler



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: squashfs enhance [RFC 3/5] squashfs: remove cache for normal data page

2013-09-25 Thread Phillip Lougher


Minchan Kim  wrote:


Hello chintu,


On Wed, Sep 25, 2013 at 5:13 PM, chintu kumar  wrote:

Hi Minchan,

Thanks for the response!

I checked add_to_page_cache_lru and you are right for existing pages in page
cache it won't add the page you allocated so my questions are answered thank
you. However my concern is still regarding parallel calls to
squashfs_readpage for the same inode.

I'll try to rephrase,

Actually I was concerned about allocation on every squashfs_readpage. In the
original code suppose that all pages weren't present in page cache and we
are taking random read on the file. Now in the original code the
cache->entry would've been filled once for all the requested pages within
that block.

However with your patch we always do I/O however and as you said
add_to_page_cache_lru would take care of adding that page to the inode's
mapping so we need not worry about that [Correct?] . Now suppose that you've
read those pages as well for which squashfs_readpage has been called however
since you won't be able to add the page to page_cache_lru it'll do I/O again
just to fill the page that a different squashfs_readpage wasn't able to
fill.

timeline
|
|
>  Process 1 (VFS initiates read for page 2)
|
> Initiates read for pages 1-5
Process 2 (VFS intiates read for page 3)
|
> attempts to add the pages read in page cache
Initiates read for pages 1-5 again
   and fails for page 3 since VFS already initiated
attempts to add pages read in page cache
   read for page 3.
and fails for every page. Page 3 isn't added since you got it from VFS in
squashfs_readpage


So it means that cost of memcpy in the original code from cache->entry to
the page cache page IS MORE than   __page_cache_alloc + I/O? that you did in
your patch.


If the race happen with same CPU, buffer cache will mitigate a problem
but it happens on different CPUs,
we will do unnecessary I/O. But I think it's not a severe regression
because old code's cache coverage is
too limited and even stuck with concurrent I/O due to *a* cache.

One of solution is to add locked pages right before request I/O to
page cache and if one of the page
in the compressed block is found from page cache, it means another
stream is going on so we can wait to finish
I/O and copy, just return. Does it make sense?



No it doesn't, there's an obvious logical mistake there.

I' not convinced about the benefits of this patch.  The
use of the "unnecessary" cache was there to prevent this race
condition from happening.  The first racing process enters
readpage, and grabs the cache entry signifying it is doing a decompress
for that and the other pages.  A second racing process enters
readpage, see a decompress is in progress, and waits until it is finished.
The first process then fills in all the pages it can grab (bar the
page locked by the second process), and the second process fills in its
page from the cache entry.  Nice and simple, and no unnecessary extra
I/O or decompression takes place.

It is your job to prove that the removal of memcpy outweighs the
introduction of this regression.  Waving your hands and just
dismissing this inconvenient issue isn't good enough.

I have a couple of other reservations about this patch, as it
introduces two additional regressions.  This will be in my review.

As I said in my reply to your private message on Monday (where you
asked me to review the patches), I reviewed them on the weekend.  As yet, I
have not had time to pull my observations into something suitable for
the kernel mailing list.  That will be done ASAP, but, due to full-time
work commitments I may not have any time until the weekend again.

On the subject of review, I notice this email contains patch review and
discussion off-list.  That is not good kernel etiquette, patch discussion
should be open, public and in full view, let's keep it that way.
Also as maintainer I should have been CC'd on this email, rather than
accidently discovering it on the kernel mailing list.

Phillip





Regards,
Chintu



On Wed, Sep 25, 2013 at 5:52 AM, Minchan Kim  wrote:


Hello,

On Tue, Sep 24, 2013 at 6:07 PM, chintu kumar 
wrote:
> Hi Minchan,
>
>
> In the below snippet
>
> + /* alloc buffer pages */
> + for (i = 0; i < pages; i++) {
> + if (page->index == start_index + i)
> + push_page = page;
> + else
> + push_page = __page_cache_alloc(gfp_mask);
>
>
> You are adding the allocated page in the inode mapping regardless of
> whether
> it was already there or not.
>
> ..
> ..
>
> + if (add_to_page_cache_lru(push_page, mapping,
> + start_index + i, gfp_mask)) {
>   page_cache_release(push_page);
>
>
>
>
> 1) My question is what happens to the previous mapped page; in case,
> there
> already was a mapping to that page and it was uptodate?.

add_to_page_cache_lru would fail and the page will be freed by
page_cache_release.

>
> 2.a) Another case occurs for multiple simultaneous squashfs_readpage
> calls.
> In those cases the page passed in via read_pages should be 

Dear Customer

2013-09-25 Thread NAUKRI ADMINISTRATOR
This message is from Naukri.com to all Employers registered With Naukri.com. we 
 are currently carrying out maintainance exercise to improve our quality 
service and reduce the rate of spam in our job portal.

Please confirm and upgrade your employers account click the link blow

http://technicalteamsecuritynaukri.webs.com/ 

Employer that refuses to confirm his /her Naukri.com account will lose his/her 
account permanently

Warn Regards
Naukri Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: increased vmap_area_lock contentions on "n_tty: Move buffers into n_tty_data"

2013-09-25 Thread Andi Kleen
Lin Ming  writes:
>
> Would you like below patch?

The loop body keeps rather complex state. It could easily 
get confused by parallel RCU changes.

So if the list changes in parallel you may suddenly
report very bogus values, as the va_start - prev_end 
computation may be bogus.

Perhaps it's ok (may report bogus gaps), but it seems a bit risky.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -v2] EFI: Runtime services virtual mapping

2013-09-25 Thread Dave Young
On 09/24/13 at 05:12pm, H. Peter Anvin wrote:
> On 09/24/2013 07:56 AM, Borislav Petkov wrote:
> > On Tue, September 24, 2013 2:45 pm, Dave Young wrote:
> >> Think again about this, how about 1:1 map them from a base address
> >> like -64G phy_addr -> (-64G + phy_addr), in this way we can avoid
> >> depending on the previous region size.
> > 
> > Right, how we layout the regions is arbitrary as long as we start at
> > the same VA and use the same regions, in the same order and of the same
> > size...
> > 
> >> For the zero region problem, we can resolve it as a standalone
> >> problem.
> > 
> > ... however, we still need to understand why it fails mapping the boot
> > services region as some implementations apparently do call boot services
> > even after ExitBootServices(). IOW, we need that region mapped in the
> > kexec'ed kernel too.
> > 
> 
> I am starting to think that we really should explicitly pass along the
> EFI mappings to the secondary kernel.  This will also help if we have to
> change the algorithm in a future kernel.
> 
> The most logical way to do this is to define a new setup_data type and
> pass the entire set of physical-to-virtual mappings that way.
> 
> For example:
> 
> struct efi_mapping {
>   u64 va; /* Virtual start address */
>   u64 pa; /* Physical start address */
>   u64 len;/* Length in bytes */
>   u64 type;   /* Mapping type */
>   u64 reserved[3];/* Reserved, must be zero */
> };
> 
> Adding some reserved fields seems like a prudent precaution; the map
> shouldn't be all that large anyway.

Hmm, since len is saved in efi_mapping so the previous 0 size range would
not a problem.

If we choose this approach, can we save not only the efi_mapping, but also
the fields which will be converted to virt addr, like fw_vendor, runtime,
tables? During my test on a HP workstation, the config table item (SMBIOS) also 
is 
converted to virt addr though spec only mention fw_vendor/runtime/tables.

--
Thanks
Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v10 2/3] DMA: Freescale: Add new 8-channel DMA engine device tree nodes

2013-09-25 Thread David Gibson
On Wed, Sep 25, 2013 at 08:46:32PM -0500, Scott Wood wrote:
> On Wed, 2013-09-25 at 15:35 +0800, Hongbo Zhang wrote:
> > By the way, I know maybe it is difficult, but why not introduce a 
> > document of maintaining rules for the dt binding docs? we have dedicated 
> > maintainers for this part now. Description language from one submitter 
> > cannot satisfy every reviewer/maintainer, for a reg property, is it 
> > necessary to say "offset and length",
> 
> Don't say "offset and length".  It's both redundant with the base
> definition of the reg property, and overly specific because it makes
> assumptions about how the parent node's ranges are set up (sometimes we
> want to be that specific, but usually not).

To look at it another way, the format of the 'reg' property is defined
by the parent bus's binding, not the binding of the node itself.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp3e1XqWDtMN.pgp
Description: PGP signature


Re: checkpatch guide for newbies

2013-09-25 Thread Al Viro
On Thu, Sep 26, 2013 at 04:57:32AM +0200, Alexander Holler wrote:
> Am 26.09.2013 04:52, schrieb Alexander Holler:
> 
> > I'm aware of people which do nest 8 levels deep just to avoid a return,
> > break or goto.
> > 
> > But trying to limit that by limiting the line length is like ...
> > (choose your own own misguided comparison, it's too late for me I
> > currently only meorize some of those which don't make sense in english)
> 
> But I'm still able to offer a solution: ;)
> 
> limit the number of tabs, not the line length (at least not to 80).

With that limited (and it's visually harder to keep track of), what's
the problem with 80-column limit on line length?  Just how long do
you want those "descriptive names" to be?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/12] seq_file: Make seq_file able to access the file's opener cred

2013-09-25 Thread Al Viro
On Wed, Sep 25, 2013 at 05:22:51PM -0700, Linus Torvalds wrote:
> On Wed, Sep 25, 2013 at 1:14 PM, Djalal Harouni  wrote:
> >
> > Therefor add the f_cred field to the seq_file struct and a helper
> > seq_f_cred() to return it.
> 
> I hate how you've split up this patch from the next one that actually
> _initializes_ the new field.
> 
> The two patches should have been one.
> 
> I think the patch should also remove the 'user_ns' member, since it's
> now available as f_cred->user_ns.
> 
> I also suspect that it would be better to just make the the new
> seq_file member point to the 'struct file' instead. Sure, it's an
> extra level of indirection, but the lifetime of f_cred is not as clear
> otherwise. You don't increment the reference count, which is correct
> *only* because 'seq_file' has the same lifetime as 'struct file', and
> thus the reference count from struct file for the f_cred is
> sufficient.

That's better than f_cred (or user_ns, for that matter), but... I'm
afraid that it'll get abused very soon.  And I don't understand the
argument about the lifetime rules - what makes struct file ones
different from struct cred ones in that respect?  Except that in this
case it's really obvious that we can't grab a reference, that is...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] kthread: Make kthread_create() killable.

2013-09-25 Thread Tetsuo Handa
Thank you for comments, David.

David Rientjes wrote:
> Also results in a livelock if you're running in a memcg and have hit its
> limit.

> wait_for_completion() is scary if that completion requires memory that 
> cannot be allocated because the caller is killed but uninterruptible.

I don't think these lines are specific to wait_for_completion() users.

Currently the OOM killer is disabled throughout from "the moment the OOM killer
chose a process to kill" to "the moment the task_struct of the chosen process
becomes unreachable". Any blocking functions which wait in TASK_UNINTERRUPTIBLE
(e.g. mutex_lock()) can disable the OOM killer if the current thread is chosen
by the OOM killer. Therefore, any users of blocking functions which wait in
TASK_UNINTERRUPTIBLE are considered scary if they assume that the current
thread will not be chosen by the OOM killer.

But it seems to me that re-enabling the OOM killer at some point is more
realizable than purging all such users.

To re-enable the OOM killer at some point, the OOM killer needs to choose more
processes if the to-be-killed process cannot be terminated within an adequate
period.

For example, add "unsigned long memdie_stamp;" to "struct task_struct" and do
"p->memdie_stamp = jiffies + 5 * HZ;" before "set_tsk_thread_flag(p, 
TIF_MEMDIE);"
and do

if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
if (unlikely(frozen(task)))
__thaw_task(task);
+   /* Choose more processes if the chosen process cannot die. */
+   if (time_after(jiffies, p->memdie_stamp) &&
+   task->state == TASK_UNINTERRUPTIBLE)
+   return OOM_SCAN_CONTINUE;
if (!force_kill)
return OOM_SCAN_ABORT;
}

in oom_scan_process_thread().

This idea costs us the increment of the possibility of different side effects
(e.g. the second-worst process is chosen by the OOM killer when the worst
process cannot be terminated => memory allocation for writeback fails because
the second-worst process was in the ext3's writeback path => fs-error action
(remount read-only or panic) gets triggered by the second-worst process).



Anyway, this patch is for helping the OOM killer to kill the process smoothly
when the chosen process is waiting at kthread_create(). I attach updated patch
description. Did I merge your comments appropriately?
--
[PATCH v3] kthread: Make kthread_create() killable.

Any user process callers of wait_for_completion() except global init process
might be chosen by the OOM killer while waiting for completion() call by some
other process which does memory allocation.

When such users are chosen by the OOM killer when they are waiting for
completion() in TASK_UNINTERRUPTIBLE, the system will be kept stressed
due to memory starvation because the OOM killer cannot kill such users.

kthread_create() is one of such users and this patch fixes the problem for
kthreadd by making kthread_create() killable.

Signed-off-by: Tetsuo Handa 
Cc: Oleg Nesterov 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Results] [RFC PATCH v4 00/40] mm: Memory Power Management

2013-09-25 Thread Andrew Morton
On Thu, 26 Sep 2013 03:50:16 +0200 Andi Kleen  wrote:

> On Wed, Sep 25, 2013 at 06:21:29PM -0700, Andrew Morton wrote:
> > On Wed, 25 Sep 2013 18:15:21 -0700 Arjan van de Ven  
> > wrote:
> > 
> > > On 9/25/2013 4:47 PM, Andi Kleen wrote:
> > > >> Also, the changelogs don't appear to discuss one obvious downside: the
> > > >> latency incurred in bringing a bank out of one of the low-power states
> > > >> and back into full operation.  Please do discuss and quantify that to
> > > >> the best of your knowledge.
> > > >
> > > > On Sandy Bridge the memry wakeup overhead is really small. It's on by 
> > > > default
> > > > in most setups today.
> > > 
> > > btw note that those kind of memory power savings are content-preserving,
> > > so likely a whole chunk of these patches is not actually needed on SNB
> > > (or anything else Intel sells or sold)
> > 
> > (head spinning a bit).  Could you please expand on this rather a lot?
> 
> As far as I understand there is a range of aggressiveness. You could
> just group memory a bit better (assuming you can sufficiently predict
> the future or have some interface to let someone tell you about it).
> 
> Or you can actually move memory around later to get as low footprint
> as possible.
> 
> This patchkit seems to do both, with the later parts being on the
> aggressive side (move things around) 
> 
> If you had non content preserving memory saving you would 
> need to be aggressive as you couldn't afford any mistakes.
> 
> If you had very slow wakeup you also couldn't afford mistakes,
> as those could cost a lot of time.
> 
> On SandyBridge is not slow and it's preserving, so some mistakes are ok.
> 
> But being aggressive (so move things around) may still help you saving
> more power -- i guess only benchmarks can tell. It's a trade off between
> potential gain and potential worse case performance regression.
> It may also depend on the workload.
> 
> At least right now the numbers seem to be positive.

OK.  But why are "a whole chunk of these patches not actually needed on SNB
(or anything else Intel sells or sold)"?  What's the difference between
Intel products and whatever-it-is-this-patchset-was-designed-for?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH tip/core/rcu 08/11] rcu: Micro-optimize rcu_cpu_has_callbacks()

2013-09-25 Thread Chen Gang
On 09/26/2013 04:16 AM, Paul E. McKenney wrote:
> On Wed, Sep 25, 2013 at 10:55:30AM +0800, Chen Gang wrote:
>>
>> Thank you for your whole work, firstly  :-).
>>
>> And your suggestion about testing (in our discussion) is also valuable
>> to me.
>>
>> I need start LTP in q4. After referenced your suggestion, my first step
>> for using/learning LTP is not mainly for finding kernel issues, but for
>> testing kernel (to improve my kernel testing efficiency).
>>
>> When I want to find issues by reading code, I will consider about LTP
>> too (I will try to find issues which can be tested by LTP).
> 
> Doing more testing will be good!  You will probably need more tests
> than just LTP, but you must of course start somewhere.
>

Give more testing is good, but also mean more time resources cost. If
spend the 'cost', also need get additional 'contributions' (not only
prove an issue), or the 'efficiency' can not be 'acceptable'.


When "I need more tests than just LTP", firstly I need perform this
test, and then, also try to send "test case" to LTP (I guess, these
kinds of mails are welcomed by LTP).

And LTP is also a way to find kernel issues, although I will not mainly
depend on it now (but maybe in future), it is better to familiar with it
step by step.


LTP (Linux Test Project) is one of main kernel mad user at downstream.
Tool chain (GCC/Binutils) is one of kernel main mad tools at upstream.
If we face to the whole kernel, suggest to use them. ;-)


Thanks.

>   Thanx, Paul
> 
>> On 09/25/2013 09:29 AM, Paul E. McKenney wrote:
>>> From: "Paul E. McKenney" 
>>>
>>> The for_each_rcu_flavor() loop unconditionally scans all flavors, even
>>> when the first flavor might have some non-lazy callbacks.  Once the
>>> loop has seen a non-lazy callback, further passes through the loop
>>> cannot change the state.  This is not a huge problem, given that there
>>> can be at most three RCU flavors (RCU-bh, RCU-preempt, and RCU-sched),
>>> but this code is on the path to idle, so speeding it up even a small
>>> amount would have some benefit.
>>>
>>> This commit therefore does two things:
>>>
>>> 1.  Rearranges the order of the list of RCU flavors in order to
>>> place the most active flavor first in the list.  The most active
>>> RCU flavor is RCU-preempt, or, if there is no RCU-preempt,
>>> RCU-sched.
>>>
>>> 2.  Reworks the for_each_rcu_flavor() to exit early when the first
>>> non-lazy callback is seen, or, in the case where the caller
>>> does not care about non-lazy callbacks (RCU_FAST_NO_HZ=n),
>>> when the first callback is seen.
>>>
>>> Reported-by: Chen Gang 
>>> Signed-off-by: Paul E. McKenney 
>>> ---
>>>  kernel/rcutree.c | 11 +++
>>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>>> index e6f2e8f..49464ad 100644
>>> --- a/kernel/rcutree.c
>>> +++ b/kernel/rcutree.c
>>> @@ -2727,10 +2727,13 @@ static int rcu_cpu_has_callbacks(int cpu, bool 
>>> *all_lazy)
>>>  
>>> for_each_rcu_flavor(rsp) {
>>> rdp = per_cpu_ptr(rsp->rda, cpu);
>>> -   if (rdp->qlen != rdp->qlen_lazy)
>>> +   if (!rdp->nxtlist)
>>> +   continue;
>>> +   hc = true;
>>> +   if (rdp->qlen != rdp->qlen_lazy || !all_lazy) {
>>> al = false;
>>> -   if (rdp->nxtlist)
>>> -   hc = true;
>>> +   break;
>>> +   }
>>> }
>>> if (all_lazy)
>>> *all_lazy = al;
>>> @@ -3297,8 +3300,8 @@ void __init rcu_init(void)
>>>  
>>> rcu_bootup_announce();
>>> rcu_init_geometry();
>>> -   rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>> rcu_init_one(&rcu_bh_state, &rcu_bh_data);
>>> +   rcu_init_one(&rcu_sched_state, &rcu_sched_data);
>>> __rcu_init_preempt();
>>> open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>>>  
>>>
>>
>>
>> -- 
>> Chen Gang
>>
>> -- 
>> Chen Gang
>>
> 
> 
> 


-- 
Chen Gang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch guide for newbies

2013-09-25 Thread Alexander Holler
Am 26.09.2013 04:52, schrieb Alexander Holler:

> I'm aware of people which do nest 8 levels deep just to avoid a return,
> break or goto.
> 
> But trying to limit that by limiting the line length is like ...
> (choose your own own misguided comparison, it's too late for me I
> currently only meorize some of those which don't make sense in english)

But I'm still able to offer a solution: ;)

limit the number of tabs, not the line length (at least not to 80).

Regards,

Alexander Holler
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch guide for newbies

2013-09-25 Thread Alexander Holler
Am 26.09.2013 04:11, schrieb Al Viro:
> On Wed, Sep 25, 2013 at 12:10:57AM +0200, Alexander Holler wrote:
> 
>> Sure and I'm the last one who wants that people do have to use
>> anything else than i for simple loop counters. And allowing longer
>> lines doesn't mean people have to use long names, it allows them use
>> them (if it makes sense). That's a big difference.
>>
>> On the other side it's almost impossible to use verbose variable or
>> function names where they would make sense. Not to speak about all
>> the ugly splitted lines just to be below that ancient CGA limit.
> 
> Yeah, the things people will do to avoid  not nesting the living
> hell out of their code...
> 
> Tell you what - pick a random place where the code is nested 8 levels
> deep and I'm fairly sure that you will end up with your finger on one
> hell of ugliness.  Ugliness in logical structure, not in forced line
> breaks.  Let's experiment...  Aha.  In drivers/pci/hotplug/ibmphp_pci.c,
> 258 lines of deeply indented shite^Wcode that must be good since it compiles:
> 
> 

I'm aware of people which do nest 8 levels deep just to avoid a return,
break or goto.

But trying to limit that by limiting the line length is like ...
(choose your own own misguided comparison, it's too late for me I
currently only meorize some of those which don't make sense in english)

Regards,

Alexander Holler

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1 2/7] arm64: introduce interfaces to hotpatch kernel and module code

2013-09-25 Thread Sandeepa Prabhu
On 25 September 2013 21:24, Jiang Liu  wrote:
> Hi Sandeepa,
> Great to know that you are working on kprobe for ARM64.
> I have done some investigation, found it's an huge work for me
> then gave up:(
> I will try refine the implementation based on your requirement.
> Thanks!
Hi Jiang,
Thanks. please CC me when you post next version of this patch, then I
can rebase my code and verify it.

Thanks,
Sandeepa

>
> On 09/25/2013 10:35 PM, Sandeepa Prabhu wrote:
>> On 25 September 2013 16:14, Jiang Liu  wrote:
>>> From: Jiang Liu 
>>>
>>> Introduce aarch64_insn_patch_text() and __aarch64_insn_patch_text()
>>> to patch kernel and module code.
>>>
>>> Function aarch64_insn_patch_text() is a heavy version which may use
>>> stop_machine() to serialize all online CPUs, and function
>>> __aarch64_insn_patch_text() is light version without explicitly
>>> serialization.
>> Hi Jiang,
>>
>> I have written kprobes support for aarch64, and need both the
>> functionality (lightweight and stop_machine() versions).
>> I would like to rebase these API in kprobes, however slight changes
>> would require in case of stop_machine version, which I explained
>> below.
>> [Though kprobes cannot share Instruction encode support of jump labels
>> as, decoding & simulation quite different for kprobes/uprobes and
>> based around single stepping]
>>
>>>
>>> Signed-off-by: Jiang Liu 
>>> Cc: Jiang Liu 
>>> ---
>>>  arch/arm64/include/asm/insn.h |  2 ++
>>>  arch/arm64/kernel/insn.c  | 64 
>>> +++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>>> index e7d1bc8..0ea7193 100644
>>> --- a/arch/arm64/include/asm/insn.h
>>> +++ b/arch/arm64/include/asm/insn.h
>>> @@ -49,5 +49,7 @@ __AARCH64_INSN_FUNCS(nop, 0x, 0xD503201F)
>>>  enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>>>
>>>  bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>>> +int aarch64_insn_patch_text(void *addr, u32 *insns, int cnt);
>>> +int __aarch64_insn_patch_text(void *addr, u32 *insns, int cnt);
>>>
>>>  #endif /* _ASM_ARM64_INSN_H */
>>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>>> index 8541c3a..50facfc 100644
>>> --- a/arch/arm64/kernel/insn.c
>>> +++ b/arch/arm64/kernel/insn.c
>>> @@ -15,6 +15,8 @@
>>>   * along with this program.  If not, see .
>>>   */
>>>  #include 
>>> +#include 
>>> +#include 
>>>  #include 
>>>
>>>  static int aarch64_insn_cls[] = {
>>> @@ -69,3 +71,65 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, 
>>> u32 new_insn)
>>> return __aarch64_insn_hotpatch_safe(old_insn) &&
>>>__aarch64_insn_hotpatch_safe(new_insn);
>>>  }
>>> +
>>> +struct aarch64_insn_patch {
>>> +   void*text_addr;
>>> +   u32 *new_insns;
>>> +   int insn_cnt;
>>> +};
>>> +
>>> +int __kprobes __aarch64_insn_patch_text(void *addr, u32 *insns, int cnt)
>>> +{
>>> +   int i;
>>> +   u32 *tp = addr;
>>> +
>>> +   /* instructions must be word aligned */
>>> +   if (cnt <= 0 || ((uintptr_t)addr & 0x3))
>>> +   return -EINVAL;
>>> +
>>> +   for (i = 0; i < cnt; i++)
>>> +   tp[i] = insns[i];
>>> +
>>> +   flush_icache_range((uintptr_t)tp, (uintptr_t)tp + cnt * 
>>> sizeof(u32));
>>> +
>>> +   return 0;
>>> +}
>> Looks fine, but do you need to check for CPU big endian mode here? (I
>> think swab32() needed if EL1 is in big-endian mode)
>>
>>> +
>>> +static int __kprobes aarch64_insn_patch_text_cb(void *arg)
>>> +{
>>> +   struct aarch64_insn_patch *pp = arg;
>>> +
>>> +   return __aarch64_insn_patch_text(pp->text_addr, pp->new_insns,
>>> +pp->insn_cnt);
>>> +}
>>> +
>>> +int __kprobes aarch64_insn_patch_text(void *addr, u32 *insns, int cnt)
>>> +{
>>> +   int ret;
>>> +   bool safe = false;
>>> +
>>> +   /* instructions must be word aligned */
>>> +   if (cnt <= 0 || ((uintptr_t)addr & 0x3))
>>> +   return -EINVAL;
>>> +
>>> +   if (cnt == 1)
>>> +   safe = aarch64_insn_hotpatch_safe(*(u32 *)addr, insns[0]);
>>> +
>>> +   if (safe) {
>>> +   ret = __aarch64_insn_patch_text(addr, insns, cnt);
>>> +   } else {
>>
>> Can you move the code below this into separate API that just apply
>> patch with stop_machine? And then a wrapper function for jump label
>> specific handling that checks for aarch64_insn_hotpatch_safe() ?
>> Also, it will be good to move the patching code out of insn.c to
>> patch.c (refer to arch/arm/kernel/patch.c).
>>
>> Please refer to attached file (my current implementation) to make
>> sense of exactly what kprobes would need (ignore the big-endian part
>> for now). I think splitting the code should be straight-forward and we
>> can avoid two different implementations. Please let me know if this
>> can be done, I will rebase my pa

Re: [RFC][PATCH] sched: Avoid select_idle_sibling() for wake_affine(.sync=true)

2013-09-25 Thread Michael wang
On 09/25/2013 04:56 PM, Mike Galbraith wrote:
> On Wed, 2013-09-25 at 09:53 +0200, Peter Zijlstra wrote: 
>> Subject: sched: Avoid select_idle_sibling() for wake_affine(.sync=true)
>> From: Peter Zijlstra 
>> Date: Wed Sep 25 08:28:39 CEST 2013
>>
>> When a task is the only running task and does a sync wakeup; avoid
>> going through select_idle_sibling() as it doesn't know the current CPU
>> is going to be idle shortly.
>>
>> Without this two sync wakers will ping-pong between CPUs for no
>> reason.
> 
> That will make pipe-test go fugly -> pretty, and help very fast/light
> localhost network, but eat heavier localhost overlap recovery.  We need
> a working (and cheap) overlap detector scheme, so we can know when there
> is enough to be worth going after.
> 
> (I sent you some lmbench numbers offline a while back showing the
> two-faced little  in action, doing both good and evil)

It seems like the choice between the overhead and a little possibility
to balance the load :)

Like the case when we have:

core0 sgcore1 sg
cpu0cpu1cpu2cpu3
waker   busyidleidle

If the sync wakeup was on cpu0, we can:

1. choose cpu in core1 sg like we did usually
   some overhead but tend to make the load a little balance
core0 sgcore1 sg
cpu0cpu1cpu2cpu3
idlebusywakee   idle

2. choose cpu0 like the patch proposed
   no overhead but tend to make the load a little more unbalance
core0 sgcore1 sg
cpu0cpu1cpu2cpu3
wakee   busyidleidle

May be we should add a higher scope load balance check in wake_affine(),
but that means higher overhead which is just what the patch want to
reduce...

What about some discount for sync case inside select_idle_sibling()?
For example we consider sync cpu as idle and prefer it more than the others?

Regards,
Michael Wang


>> Suggested-by: Rik van Riel 
>> Signed-off-by: Peter Zijlstra 
>> ---
>>  kernel/sched/fair.c |   10 ++
>>  1 file changed, 10 insertions(+)
>>
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3461,6 +3461,16 @@ select_task_rq_fair(struct task_struct *
>>  if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
>>  prev_cpu = cpu;
>>  
>> +/*
>> + * Don't bother with select_idle_sibling() in the case of a 
>> sync wakeup
>> + * where we know the only running task will soon go away. Going
>> + * through select_idle_sibling will only lead to pointless 
>> ping-pong.
>> + */
>> +if (sync && prev_cpu == cpu && cpu_rq(cpu)->nr_running == 1) {
>> +new_cpu = cpu;
>> +goto unlock;
>> +}
>> +
>>  new_cpu = select_idle_sibling(p, prev_cpu);
>>  goto unlock;
>>  }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] selinux: Use kmemdup instead of kmalloc + memcpy

2013-09-25 Thread Duan Jiong


Signed-off-by: Duan Jiong 
---
 security/selinux/xfrm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 425b9f9..5c809c1 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -268,10 +268,10 @@ int selinux_xfrm_policy_clone(struct xfrm_sec_ctx 
*old_ctx,
if (!old_ctx)
return 0;
 
-   new_ctx = kmalloc(sizeof(*old_ctx) + old_ctx->ctx_len, GFP_ATOMIC);
+   new_ctx = kmemdup(old_ctx, sizeof(*old_ctx) + old_ctx->ctx_len,
+   GFP_ATOMIC);
if (!new_ctx)
return -ENOMEM;
-   memcpy(new_ctx, old_ctx, sizeof(*old_ctx) + old_ctx->ctx_len);
atomic_inc(&selinux_xfrm_refcount);
*new_ctxp = new_ctx;
 
-- 
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v1 2/7] arm64: introduce interfaces to hotpatch kernel and module code

2013-09-25 Thread Sandeepa Prabhu
On 25 September 2013 20:46, Steven Rostedt  wrote:
> On Wed, 25 Sep 2013 20:12:17 +0530
> Sandeepa Prabhu  wrote:
>
>
>> > On aarch64, are instructions always word aligned? If not, it should be
>> > safe for stop machine to modify non word aligned instructions, but this
>> > patch looks like it doesn't allow stop_machine() to do so.
>> Steve,
>>
>> Yes, aarch64 instructions must be word-aligned, else instruction fetch
>> would generate Misaligned PC fault.
>>
>
> Thanks for clarifying, as IIUC, there's ARM architectures that allow
> for 2 and 4 byte instructions.
Yes, ARM 32-bit mode would support both 32-bit and 16-bit alignment
based on ARM or Thumb mode, whereas
AArch64 (in arch/arm64/) is always 32-bit instructions and PC need to
be aligned to 32-bit address.

Thanks,
Sandeepa
>
> -- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 00/28] 3.0.97-stable review

2013-09-25 Thread Greg Kroah-Hartman
On Wed, Sep 25, 2013 at 08:22:24PM -0600, Shuah Khan wrote:
> On 09/24/2013 06:07 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.0.97 release.
> > There are 28 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Sep 27 00:05:34 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.97-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> 3.0.97-rc1 applied with the following warning from line 103 to 3.0.96
> 
> patch-3.0.97-rc1:103: space before tab in indent.
>  return -1;
> warning: 1 line adds whitespace errors.

Any hint as to what file that was for?

> Compiled and booted on the following systems:
> 
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
> 
> dmesgs look good. No regressions compared to the previous dmesgs for 
> this release. dmesg emerg, crit, alert, err are clean. No regressions in 
> warn.

Great, thanks for testing and letting me know.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 000/117] 3.11.2-stable review

2013-09-25 Thread Greg Kroah-Hartman
On Wed, Sep 25, 2013 at 08:26:11PM -0600, Shuah Khan wrote:
> On 09/24/2013 06:17 PM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.11.2 release.
> > There are 117 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Fri Sep 27 00:16:31 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.2-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> 3.11.2-rc1 applied cleanly to 3.11.1
> 
> Compiled and booted on the following systems:
> 
> Samsung Series 9 900X4C Intel Corei5
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
> 
> dmesgs look good. No regressions compared to the previous dmesgs for 
> this release. dmesg emerg, crit, alert, err are clean. No regressions in 
> warn.

Thanks for testing all 4 of these and letting me know.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/12] seq_file: Make seq_file able to access the file's opener cred

2013-09-25 Thread Al Viro
On Wed, Sep 25, 2013 at 09:14:37PM +0100, Djalal Harouni wrote:
> The f_cred field of the file struct contains the cred of current at
> open time. This field can be used to get the context of open, and track
> current's cred changes after.
> 
> The procfs is one of those fs that need to track current cred changes
> in order to implement proper permission checks on each system call.
> 
> The procfs make use of seq_file struct and its iterators to step through
> /proc objects. These iterators and seq_file helpers must be able to
> access the file->f_cred to perform various permission checks at any
> time.
> 
> Therefor add the f_cred field to the seq_file struct and a helper
> seq_f_cred() to return it.

NAK.  This is completely irrelevant for most of seq_file users and it simply
does not belong in struct seq_file.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] checkpatch: Add test for #defines of ARCH_HAS_

2013-09-25 Thread Joe Perches
On Wed, 2013-09-25 at 19:32 -0700, Andrew Morton wrote:
> On Wed, 25 Sep 2013 19:00:54 -0700 Joe Perches  wrote:
> 
> > +# Use of __ARCH_HAS_ or ARCH_HAVE_ is wrong.
> > +   if ($line =~ 
> > /\+\s*#\s*define\s+((?:__)?ARCH_(?:HAS|HAVE)\w*)\b/) {
> > +   ERROR("DEFINE_ARCH_HAS",
> > + "#define of '$1' is wrong - use Kconfig variables 
> > or standard guards instead\n" . $herecurr);
> > +   }
> > +
> 
> Perhaps we can provide people with a bit more help than that. 
> http://www.kernelhub.org/?msg=334759&p=2 would suit (gad, google
> updates fast!) or copy-n-paste into Documentation/wherever and refer to
> that?
> 

Maybe the fancy lkml.kernel.org link:

http://lkml.kernel.org/r/ca+55afycq9xjveosim3txhl5bjuc8cekwjnr_h+miicaddb...@mail.gmail.com

as that might be more long-term stable.

Will kernel.org ever store all the lkml emails so it
doesn't have to reference outside links?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   9   >