date:20160429

Re: linux-next: build warning after merge of the akpm-current tree

2016-04-29 Thread Stephen Rothwell

Hi Josh,

On Fri, 29 Apr 2016 09:03:42 -0500 Josh Poimboeuf  wrote:
>
> On Fri, Apr 29, 2016 at 08:32:19AM -0500, Josh Poimboeuf wrote:
> > On Fri, Apr 29, 2016 at 04:45:43PM +1000, Stephen Rothwell wrote:  
> > > Hi Andrew,
> > > 
> > > After merging the akpm-current tree, today's linux-next build (x86_64
> > > allmodconfig) produced this warning:
> > > 
> > > drivers/scsi/ipr.c: In function 'ipr_show_device_id':
> > > drivers/scsi/ipr.c:4462:34: warning: format '%llx' expects argument of 
> > > type 'long long unsigned int', but argument 4 has type 'long unsigned 
> > > int' [-Wformat=]
> > >len = snprintf(buf, PAGE_SIZE, "0x%llx\n", be64_to_cpu(res->dev_id));
> > >   ^
> > > 
> > > Lots and lots like this :-(
> > > 
> > > Probably introduced by commit
> > > 
> > >   eef17fb79096 ("byteswap: try to avoid __builtin_constant_p gcc bug")
> > > 
> > > I guess __builtin_bswap64() has type "unsigned long int" :-(  
> > 
> > Hm, I suppose this is cross-compiled on a powerpc host?
> > 
> > We probably need to add a (__u64) cast to the return value of
> > __builtin_bswap64(), like:
> > 
> > diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
> > index de56fd5..d737804 100644
> > --- a/include/uapi/linux/swab.h
> > +++ b/include/uapi/linux/swab.h
> > @@ -123,7 +123,7 @@ static inline __attribute_const__ __u32 
> > __fswahb32(__u32 val)
> >   * @x: value to byteswap
> >   */
> >  #ifdef __HAVE_BUILTIN_BSWAP64__
> > -#define __swab64(x) __builtin_bswap64((__u64)(x))
> > +#define __swab64(x) (__u64)__builtin_bswap64((__u64)(x))
> >  #else
> >  #define __swab64(x)\
> > (__builtin_constant_p((__u64)(x)) ? \  
> 
> 
> Never mind about cross-compiling on powerpc, this has nothing to do with
> that.  But the above patch does seem to fix it.

Thanks.  I have added Andrew's tidied up version to linux-next to
replace the revert.

-- 
Cheers,
Stephen Rothwell

Re: linux-next: build warning after merge of the akpm-current tree

2016-04-29 Thread Stephen Rothwell

Hi Josh,

On Fri, 29 Apr 2016 09:03:42 -0500 Josh Poimboeuf  wrote:
>
> On Fri, Apr 29, 2016 at 08:32:19AM -0500, Josh Poimboeuf wrote:
> > On Fri, Apr 29, 2016 at 04:45:43PM +1000, Stephen Rothwell wrote:  
> > > Hi Andrew,
> > > 
> > > After merging the akpm-current tree, today's linux-next build (x86_64
> > > allmodconfig) produced this warning:
> > > 
> > > drivers/scsi/ipr.c: In function 'ipr_show_device_id':
> > > drivers/scsi/ipr.c:4462:34: warning: format '%llx' expects argument of 
> > > type 'long long unsigned int', but argument 4 has type 'long unsigned 
> > > int' [-Wformat=]
> > >len = snprintf(buf, PAGE_SIZE, "0x%llx\n", be64_to_cpu(res->dev_id));
> > >   ^
> > > 
> > > Lots and lots like this :-(
> > > 
> > > Probably introduced by commit
> > > 
> > >   eef17fb79096 ("byteswap: try to avoid __builtin_constant_p gcc bug")
> > > 
> > > I guess __builtin_bswap64() has type "unsigned long int" :-(  
> > 
> > Hm, I suppose this is cross-compiled on a powerpc host?
> > 
> > We probably need to add a (__u64) cast to the return value of
> > __builtin_bswap64(), like:
> > 
> > diff --git a/include/uapi/linux/swab.h b/include/uapi/linux/swab.h
> > index de56fd5..d737804 100644
> > --- a/include/uapi/linux/swab.h
> > +++ b/include/uapi/linux/swab.h
> > @@ -123,7 +123,7 @@ static inline __attribute_const__ __u32 
> > __fswahb32(__u32 val)
> >   * @x: value to byteswap
> >   */
> >  #ifdef __HAVE_BUILTIN_BSWAP64__
> > -#define __swab64(x) __builtin_bswap64((__u64)(x))
> > +#define __swab64(x) (__u64)__builtin_bswap64((__u64)(x))
> >  #else
> >  #define __swab64(x)\
> > (__builtin_constant_p((__u64)(x)) ? \  
> 
> 
> Never mind about cross-compiling on powerpc, this has nothing to do with
> that.  But the above patch does seem to fix it.

Thanks.  I have added Andrew's tidied up version to linux-next to
replace the revert.

-- 
Cheers,
Stephen Rothwell

Re: [PATCH v2 5/7] perf: Introduce address range filtering

2016-04-29 Thread Alexander Shishkin

Mathieu Poirier  writes:

> I see two things in this work:

[trimmed 700+ lines of context that had no purpose]

> 1) A framework to deal with filters described in user space.
> 2) An implementation for address filtering that will work for both
> Intel and ARM.
>
> This will work well for address filtering (for both PT and CS) but
> what happens when we want to introduce new filters?  This is
> inevitable and some filters will be architecture agnostic while others
> architecture specific.

Haven't we been through this [1] already?

[1] http://marc.info/?l=linux-kernel=145013911827358

Regards,
--
Alex

Re: [PATCH v2 5/7] perf: Introduce address range filtering

2016-04-29 Thread Alexander Shishkin

Mathieu Poirier  writes:

> I see two things in this work:

[trimmed 700+ lines of context that had no purpose]

> 1) A framework to deal with filters described in user space.
> 2) An implementation for address filtering that will work for both
> Intel and ARM.
>
> This will work well for address filtering (for both PT and CS) but
> what happens when we want to introduce new filters?  This is
> inevitable and some filters will be architecture agnostic while others
> architecture specific.

Haven't we been through this [1] already?

[1] http://marc.info/?l=linux-kernel=145013911827358

Regards,
--
Alex

Re: [PATCH] Use existing helper to convert "on/off" to boolean

2016-04-29 Thread Minfei Huang

On 04/29/16 at 02:21P, Andrew Morton wrote:
> On Fri, 29 Apr 2016 13:47:04 +0800 Minfei Huang  wrote:
> 
> > It's more convenient to use existing function helper to convert string
> > "on/off" to boolean.
> > 
> > ...
> >
> > --- a/lib/kstrtox.c
> > +++ b/lib/kstrtox.c
> > @@ -326,7 +326,7 @@ EXPORT_SYMBOL(kstrtos8);
> >   * @s: input string
> >   * @res: result
> >   *
> > - * This routine returns 0 iff the first character is one of 'Yy1Nn0', or
> > + * This routine returns 0 if the first character is one of 'Yy1Nn0', or
> 
> That isn't actually a typo.  "iff" is shorthand for "if and only if". 
> ie: kstrtobool() will not return 0 in any other case.
> 
> Use of "iff" is a bit pretentious but I guess it does convey some
> conceivably useful info.
> 

Got it. Thanks for your explanation.

Thanks
Minfei

Re: [PATCH] Use existing helper to convert "on/off" to boolean

2016-04-29 Thread Minfei Huang

On 04/29/16 at 02:21P, Andrew Morton wrote:
> On Fri, 29 Apr 2016 13:47:04 +0800 Minfei Huang  wrote:
> 
> > It's more convenient to use existing function helper to convert string
> > "on/off" to boolean.
> > 
> > ...
> >
> > --- a/lib/kstrtox.c
> > +++ b/lib/kstrtox.c
> > @@ -326,7 +326,7 @@ EXPORT_SYMBOL(kstrtos8);
> >   * @s: input string
> >   * @res: result
> >   *
> > - * This routine returns 0 iff the first character is one of 'Yy1Nn0', or
> > + * This routine returns 0 if the first character is one of 'Yy1Nn0', or
> 
> That isn't actually a typo.  "iff" is shorthand for "if and only if". 
> ie: kstrtobool() will not return 0 in any other case.
> 
> Use of "iff" is a bit pretentious but I guess it does convey some
> conceivably useful info.
> 

Got it. Thanks for your explanation.

Thanks
Minfei

[PATCH] Tolerate signal > 32 in sig_fatal (apparent bug detected by UBSAN)

2016-04-29 Thread Adam Richter

I apologize if there is some more specialized mailing list to which I
should have first directed this unreviewed patch.  Please feel free to
redirect me.

The undefined behavior sanitizer ("UBSAN") on 32-bit i386 detected a
case in linux-4.6-rc5/kernel/signal/signal.c where complete_signal
called the sig_fatal test macro on a signal greater than SIGRTMIN (32
on all architectures, I think), which causes it to attempt to shift 32
bits or more of a value that is only 32 bits long.

If the behavior of shifting a 32 bit value by 32 bits or more is shift
it by that value mod 32, then I suspect that this may prevent signals
32+SIGKILL, 32+SIGSUSP and 32+SIGCONT from being caught.  I have not
tried to produce a test program to actually show that this results in
incorrect behavior, but, in case anyone wants to try, I'll mention
that the situation where UBSAN caught the problem appeared to
originate from user level call to tgkill, judging by the stack trace:

__sigqueue_alloc+0x7f/0x190
complete_signal+0x2d6/0x3e0
? release_pages+0x11f/0x4c0
 __send_signal+0x20c/0x740
send_signal+0x34/0x80
do_send_sig_info+0x3a/0x80
do_send_specific+0x63/0xa0
do_tkill+0xb5/0x130
SyS_tgkill+0x1e/0x30

I think the simplest solution to this is pretty trivial, which would
be to add to sig_fatal the same "sig < SIGRTMIN" safeguard that is the
other sig_ test macros that include/linux/signal.h defines.  However,
it happens that sig_fatal uses the macro siginmask, and all other
users siginmask already have the check, so I propose the following
patch, which moves that test into the siginmask macro and removes it
from the callers of siginmask.

Signed-off-by: Adam Richter 

---

diff --git a/include/linux/signal.h b/include/linux/signal.h
index 92557bb..332aa2a 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -385,7 +385,7 @@ int unhandled_signal(struct task_struct *tsk, int sig);
 #else
 #define rt_sigmask(sig)sigmask(sig)
 #endif
-#define siginmask(sig, mask) (rt_sigmask(sig) & (mask))
+#define siginmask(sig, mask) (((sig) < SIGRTMIN) && (rt_sigmask(sig) & (mask)))

 #define SIG_KERNEL_ONLY_MASK (\
 rt_sigmask(SIGKILL)   |  rt_sigmask(SIGSTOP))
@@ -406,14 +406,10 @@ int unhandled_signal(struct task_struct *tsk, int sig);
 rt_sigmask(SIGCONT)   |  rt_sigmask(SIGCHLD)   | \
 rt_sigmask(SIGWINCH)  |  rt_sigmask(SIGURG))

-#define sig_kernel_only(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_ONLY_MASK))
-#define sig_kernel_coredump(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_COREDUMP_MASK))
-#define sig_kernel_ignore(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_IGNORE_MASK))
-#define sig_kernel_stop(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_STOP_MASK))
+#define sig_kernel_only(sig) siginmask(sig, SIG_KERNEL_ONLY_MASK)
+#define sig_kernel_coredump(sig) siginmask(sig, SIG_KERNEL_COREDUMP_MASK)
+#define sig_kernel_ignore(sig)   siginmask(sig, SIG_KERNEL_IGNORE_MASK)
+#define sig_kernel_stop(sig) siginmask(sig, SIG_KERNEL_STOP_MASK)

 #define sig_user_defined(t, signr) \
 (((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_DFL) &&\

[PATCH] Tolerate signal > 32 in sig_fatal (apparent bug detected by UBSAN)

2016-04-29 Thread Adam Richter

I apologize if there is some more specialized mailing list to which I
should have first directed this unreviewed patch.  Please feel free to
redirect me.

The undefined behavior sanitizer ("UBSAN") on 32-bit i386 detected a
case in linux-4.6-rc5/kernel/signal/signal.c where complete_signal
called the sig_fatal test macro on a signal greater than SIGRTMIN (32
on all architectures, I think), which causes it to attempt to shift 32
bits or more of a value that is only 32 bits long.

If the behavior of shifting a 32 bit value by 32 bits or more is shift
it by that value mod 32, then I suspect that this may prevent signals
32+SIGKILL, 32+SIGSUSP and 32+SIGCONT from being caught.  I have not
tried to produce a test program to actually show that this results in
incorrect behavior, but, in case anyone wants to try, I'll mention
that the situation where UBSAN caught the problem appeared to
originate from user level call to tgkill, judging by the stack trace:

__sigqueue_alloc+0x7f/0x190
complete_signal+0x2d6/0x3e0
? release_pages+0x11f/0x4c0
 __send_signal+0x20c/0x740
send_signal+0x34/0x80
do_send_sig_info+0x3a/0x80
do_send_specific+0x63/0xa0
do_tkill+0xb5/0x130
SyS_tgkill+0x1e/0x30

I think the simplest solution to this is pretty trivial, which would
be to add to sig_fatal the same "sig < SIGRTMIN" safeguard that is the
other sig_ test macros that include/linux/signal.h defines.  However,
it happens that sig_fatal uses the macro siginmask, and all other
users siginmask already have the check, so I propose the following
patch, which moves that test into the siginmask macro and removes it
from the callers of siginmask.

Signed-off-by: Adam Richter 

---

diff --git a/include/linux/signal.h b/include/linux/signal.h
index 92557bb..332aa2a 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -385,7 +385,7 @@ int unhandled_signal(struct task_struct *tsk, int sig);
 #else
 #define rt_sigmask(sig)sigmask(sig)
 #endif
-#define siginmask(sig, mask) (rt_sigmask(sig) & (mask))
+#define siginmask(sig, mask) (((sig) < SIGRTMIN) && (rt_sigmask(sig) & (mask)))

 #define SIG_KERNEL_ONLY_MASK (\
 rt_sigmask(SIGKILL)   |  rt_sigmask(SIGSTOP))
@@ -406,14 +406,10 @@ int unhandled_signal(struct task_struct *tsk, int sig);
 rt_sigmask(SIGCONT)   |  rt_sigmask(SIGCHLD)   | \
 rt_sigmask(SIGWINCH)  |  rt_sigmask(SIGURG))

-#define sig_kernel_only(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_ONLY_MASK))
-#define sig_kernel_coredump(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_COREDUMP_MASK))
-#define sig_kernel_ignore(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_IGNORE_MASK))
-#define sig_kernel_stop(sig) \
-(((sig) < SIGRTMIN) && siginmask(sig, SIG_KERNEL_STOP_MASK))
+#define sig_kernel_only(sig) siginmask(sig, SIG_KERNEL_ONLY_MASK)
+#define sig_kernel_coredump(sig) siginmask(sig, SIG_KERNEL_COREDUMP_MASK)
+#define sig_kernel_ignore(sig)   siginmask(sig, SIG_KERNEL_IGNORE_MASK)
+#define sig_kernel_stop(sig) siginmask(sig, SIG_KERNEL_STOP_MASK)

 #define sig_user_defined(t, signr) \
 (((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_DFL) &&\

Re: [PATCH v5 09/21] IB/hns: Add hca support

2016-04-29 Thread Or Gerlitz

On Wed, Apr 27, 2016 at 6:34 AM, oulijun  wrote:
> On 2016/4/26 22:25, Jiri Pirko wrote:
>> Tue, Apr 26, 2016 at 04:18:21PM CEST, l...@kernel.org wrote:

 I appreciate your keen eye. this code is meant for ARM64bit therefore 
 should run corretly for 64-bit AARCH64.

>> The driver should run correctly on any arch.

>  Hi Jiri Pirko,
> Our driver run in ARM64 platform by depending on Kconfig. It will be 
> configure in Kconfig file.

Can you elaborate what design aspects in the driver or anywhere else
should impose that limitation?

Or.

Re: [PATCH v5 09/21] IB/hns: Add hca support

2016-04-29 Thread Or Gerlitz

On Wed, Apr 27, 2016 at 6:34 AM, oulijun  wrote:
> On 2016/4/26 22:25, Jiri Pirko wrote:
>> Tue, Apr 26, 2016 at 04:18:21PM CEST, l...@kernel.org wrote:

 I appreciate your keen eye. this code is meant for ARM64bit therefore 
 should run corretly for 64-bit AARCH64.

>> The driver should run correctly on any arch.

>  Hi Jiri Pirko,
> Our driver run in ARM64 platform by depending on Kconfig. It will be 
> configure in Kconfig file.

Can you elaborate what design aspects in the driver or anywhere else
should impose that limitation?

Or.

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

> Not doing it for 64-bit constants makes no sense if it just uses the
> trivial Booth's algorithm version.

AFAICT, gcc 5 *does* optimize 64-bit multiplies by constants.
Does the belief that it doesn't date back to some really old
version?

There's still a threshold where it just punts to the multiplier.

Some examples, x86-64 (gcc 6.0.1) and aarch64 (gcc 5.3.1).
Note the difference in the multiply-by-12345 routine.

return x*9;
mul9:
leaq(%rdi,%rdi,8), %rax
ret
mul9:
add x0, x0, x0, lsl 3
ret

return x*10;
mul10:
leaq(%rdi,%rdi,4), %rax
addq%rax, %rax
ret
mul10:
lsl x1, x0, 3
add x0, x1, x0, lsl 1
ret

return x*127;
mul127:
movq%rdi, %rax
salq$7, %rax
subq%rdi, %rax
ret
mul127:
lsl x1, x0, 7
sub x0, x1, x0
ret

return x*12345;
mul12345:
imulq   $12345, %rdi, %rax
ret
mul12345:
lsl x1, x0, 3
sub x1, x1, x0
lsl x1, x1, 1
sub x1, x1, x0
lsl x1, x1, 3
sub x1, x1, x0
lsl x1, x1, 3
sub x0, x1, x0
lsl x1, x0, 4
sub x0, x1, x0
ret

uint64_t y = (x << 9) - (x << 3) + x;
return x + (x << 14) - (y << 3);
mul12345_manual:
movq%rdi, %rdx
salq$14, %rax
salq$9, %rdx
addq%rdi, %rax
addq%rdi, %rdx
salq$3, %rdi
subq%rdi, %rdx
salq$3, %rdx
subq%rdx, %rax
ret
mul12345_manual:
lsl x2, x0, 9
lsl x1, x0, 14
add x2, x2, x0
add x1, x1, x0
sub x0, x2, x0, lsl 3
sub x0, x1, x0, lsl 3
ret

return x*2654435769:
mul2654435769:
movl$2654435769, %eax
imulq   %rdi, %rax
ret
mul2654435769:
mov x1, 31161
movkx1, 0x9e37, lsl 16
mul x0, x0, x1
ret

The problem with variant code paths like mul12345_manual is that the
infrastructure required to determine which to use is many times larger
than the code itself.  :-(

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

> Not doing it for 64-bit constants makes no sense if it just uses the
> trivial Booth's algorithm version.

AFAICT, gcc 5 *does* optimize 64-bit multiplies by constants.
Does the belief that it doesn't date back to some really old
version?

There's still a threshold where it just punts to the multiplier.

Some examples, x86-64 (gcc 6.0.1) and aarch64 (gcc 5.3.1).
Note the difference in the multiply-by-12345 routine.

return x*9;
mul9:
leaq(%rdi,%rdi,8), %rax
ret
mul9:
add x0, x0, x0, lsl 3
ret

return x*10;
mul10:
leaq(%rdi,%rdi,4), %rax
addq%rax, %rax
ret
mul10:
lsl x1, x0, 3
add x0, x1, x0, lsl 1
ret

return x*127;
mul127:
movq%rdi, %rax
salq$7, %rax
subq%rdi, %rax
ret
mul127:
lsl x1, x0, 7
sub x0, x1, x0
ret

return x*12345;
mul12345:
imulq   $12345, %rdi, %rax
ret
mul12345:
lsl x1, x0, 3
sub x1, x1, x0
lsl x1, x1, 1
sub x1, x1, x0
lsl x1, x1, 3
sub x1, x1, x0
lsl x1, x1, 3
sub x0, x1, x0
lsl x1, x0, 4
sub x0, x1, x0
ret

uint64_t y = (x << 9) - (x << 3) + x;
return x + (x << 14) - (y << 3);
mul12345_manual:
movq%rdi, %rdx
salq$14, %rax
salq$9, %rdx
addq%rdi, %rax
addq%rdi, %rdx
salq$3, %rdi
subq%rdi, %rdx
salq$3, %rdx
subq%rdx, %rax
ret
mul12345_manual:
lsl x2, x0, 9
lsl x1, x0, 14
add x2, x2, x0
add x1, x1, x0
sub x0, x2, x0, lsl 3
sub x0, x1, x0, lsl 3
ret

return x*2654435769:
mul2654435769:
movl$2654435769, %eax
imulq   %rdi, %rax
ret
mul2654435769:
mov x1, 31161
movkx1, 0x9e37, lsl 16
mul x0, x0, x1
ret

The problem with variant code paths like mul12345_manual is that the
infrastructure required to determine which to use is many times larger
than the code itself.  :-(

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Doug Anderson

Hi,

On Fri, Apr 29, 2016 at 5:31 PM, Peter Hurley  wrote:
> On 04/29/2016 05:03 PM, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley  
>> wrote:
>>
>> On 04/29/2016 04:01 PM, Doug Anderson wrote:
>> > * serial allows numbering devices by alias.
>>
>> Which is in fact a total nightmare.
>>
>> While stable device order is mandatory in serial because of
>> console command line parameters and existing userspace expectations,
>> it is the number one barrier to providing a shared ttyS namespace
>> for mixed uart platforms.
>>
>> Stable device order has a very real (and often unforeseen) maintenance
>> burden.
>>
>>
>> Interesting. I wonder if these burdens are unique to serial or shared
>> by all the other subsystems that allow ordering? Maybe this is all
>> because of legacy reasons?
>
> Well, the specific issue is certainly unique to serial.
> But what I was suggesting is that 5 years from now, these patches
> could be the "legacy reasons" in mmc.
>
> FWIW, there is already a defacto expectation by boot configurations in the
> field that a given mmc block device is stable across boots. The reality
> is that 10's of kernel command lines look like:
>
> root=/dev/mmcblk0p2
>
> This was a recent regression fixed by Ulf in commit 9aaf3437aa72
> ("mmc: block: Use the mmc host device index as the mmcblk device index")

Ah.  Well, in this case it sounds like we've already got an
expectation of stable numbering from boot to boot.  I had missed Ulf's
patch, so I guess part 3 of my series isn't actually needed and can be
dropped.

So it's just a question of whether we allow people to manually specify
via device tree.

Note: if we really think using root=/dev/mmcblkNpM is a bad idea then
we should deprecate it and yell about it in the boot log.  Then 5 (or
20) years down the road we could remove the feature when the legacy
burden become a pain.  Note that even if we deprecated
root=/dev/mmcblkNpM I'd still love to see the numbering be consistent
to help folks parse dmesg.

Thanks for your thoughts!

-Doug

Re: [PATCH v2] pstore: add lzo/lz4 compression support

2016-04-29 Thread Geliang Tang

On Thu, Feb 18, 2016 at 09:37:26AM -0800, Kees Cook wrote:
> On Thu, Feb 18, 2016 at 6:04 AM, Geliang Tang  wrote:
> > Like zlib compression in pstore, this patch added lzo and lz4
> > compression support so that users can have more options and better
> > compression ratio.
> >
> > The original code treats the compressed data together with the
> > uncompressed ECC correction notice by using zlib decompress. The
> > ECC correction notice is missing in the decompression process. The
> > treatment also makes lzo and lz4 not working. So I treat them
> > separately by using pstore_decompress() to treat the compressed
> > data, and memcpy() to treat the uncompressed ECC correction notice.
> >
> > Signed-off-by: Geliang Tang 
> 
> Thanks for the updates!
> 
> Reviewed-by: Kees Cook 
> 
> -Kees

Hi there,

It's been three months since this patch has been reviewed. But it hasn't
been replied yet by far. Can I please know if there's anything wrong with
it?

-Geliang Tang

> > ---
> > Changes in v2:
> >  revise the patch as Kees suggested:
> >  - drop 'depends on PSTORE' in config choice PSTORE_ZLIB/LZO/LZ4_COMPRESS;
> >  - set big_oops_buf_sz to 0 in free_zlib/lzo/lz4;
> >  - change zbackend[] to zbackend.
> > ---
> >  drivers/firmware/efi/efi-pstore.c |   1 +
> >  fs/pstore/Kconfig |  31 +-
> >  fs/pstore/platform.c  | 218 
> > --
> >  fs/pstore/ram.c   |  10 +-
> >  include/linux/pstore.h|   3 +-
> >  5 files changed, 248 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi-pstore.c 
> > b/drivers/firmware/efi/efi-pstore.c
> > index eac76a7..cd8c35f 100644
> > --- a/drivers/firmware/efi/efi-pstore.c
> > +++ b/drivers/firmware/efi/efi-pstore.c
> > @@ -210,6 +210,7 @@ static int efi_pstore_sysfs_entry_iter(void *data, 
> > struct efivar_entry **pos)
> >  static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
> >int *count, struct timespec *timespec,
> >char **buf, bool *compressed,
> > +  ssize_t *ecc_notice_size,
> >struct pstore_info *psi)
> >  {
> > struct pstore_read_data data;
> > diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> > index 360ae43..be40813 100644
> > --- a/fs/pstore/Kconfig
> > +++ b/fs/pstore/Kconfig
> > @@ -1,8 +1,6 @@
> >  config PSTORE
> > tristate "Persistent store support"
> > default n
> > -   select ZLIB_DEFLATE
> > -   select ZLIB_INFLATE
> > help
> >This option enables generic access to platform level
> >persistent storage via "pstore" filesystem that can
> > @@ -14,6 +12,35 @@ config PSTORE
> >If you don't have a platform persistent store driver,
> >say N.
> >
> > +choice
> > +prompt "Choose compression algorithm"
> > +depends on PSTORE
> > +default PSTORE_ZLIB_COMPRESS
> > +help
> > +  This option chooses compression algorithm.
> > +
> > +config PSTORE_ZLIB_COMPRESS
> > +bool "ZLIB"
> > +select ZLIB_DEFLATE
> > +select ZLIB_INFLATE
> > +help
> > +  This option enables ZLIB compression algorithm support.
> > +
> > +config PSTORE_LZO_COMPRESS
> > +bool "LZO"
> > +select LZO_COMPRESS
> > +select LZO_DECOMPRESS
> > +help
> > +  This option enables LZO compression algorithm support.
> > +
> > +config PSTORE_LZ4_COMPRESS
> > +bool "LZ4"
> > +select LZ4_COMPRESS
> > +select LZ4_DECOMPRESS
> > +help
> > +  This option enables LZ4 compression algorithm support.
> > +endchoice
> > +
> >  config PSTORE_CONSOLE
> > bool "Log kernel console messages"
> > depends on PSTORE
> > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> > index 588461b..7246d50 100644
> > --- a/fs/pstore/platform.c
> > +++ b/fs/pstore/platform.c
> > @@ -28,7 +28,15 @@
> >  #include 
> >  #include 
> >  #include 
> > +#ifdef CONFIG_PSTORE_ZLIB_COMPRESS
> >  #include 
> > +#endif
> > +#ifdef CONFIG_PSTORE_LZO_COMPRESS
> > +#include 
> > +#endif
> > +#ifdef CONFIG_PSTORE_LZ4_COMPRESS
> > +#include 
> > +#endif
> >  #include 
> >  #include 
> >  #include 
> > @@ -69,10 +77,23 @@ struct pstore_info *psinfo;
> >  static char *backend;
> >
> >  /* Compression parameters */
> > +#ifdef CONFIG_PSTORE_ZLIB_COMPRESS
> >  #define COMPR_LEVEL 6
> >  #define WINDOW_BITS 12
> >  #define MEM_LEVEL 4
> >  static struct z_stream_s stream;
> > +#else
> > +static unsigned char *workspace;
> > +#endif
> > +
> > +struct pstore_zbackend {
> > +   int (*compress)(const void *in, void *out, size_t inlen, size_t 
> > outlen);
> > +   int (*decompress)(void *in, void *out, size_t inlen, size_t outlen);
> > +   void (*allocate)(void);
> > +   void

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Doug Anderson

Hi,

On Fri, Apr 29, 2016 at 5:31 PM, Peter Hurley  wrote:
> On 04/29/2016 05:03 PM, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley  
>> wrote:
>>
>> On 04/29/2016 04:01 PM, Doug Anderson wrote:
>> > * serial allows numbering devices by alias.
>>
>> Which is in fact a total nightmare.
>>
>> While stable device order is mandatory in serial because of
>> console command line parameters and existing userspace expectations,
>> it is the number one barrier to providing a shared ttyS namespace
>> for mixed uart platforms.
>>
>> Stable device order has a very real (and often unforeseen) maintenance
>> burden.
>>
>>
>> Interesting. I wonder if these burdens are unique to serial or shared
>> by all the other subsystems that allow ordering? Maybe this is all
>> because of legacy reasons?
>
> Well, the specific issue is certainly unique to serial.
> But what I was suggesting is that 5 years from now, these patches
> could be the "legacy reasons" in mmc.
>
> FWIW, there is already a defacto expectation by boot configurations in the
> field that a given mmc block device is stable across boots. The reality
> is that 10's of kernel command lines look like:
>
> root=/dev/mmcblk0p2
>
> This was a recent regression fixed by Ulf in commit 9aaf3437aa72
> ("mmc: block: Use the mmc host device index as the mmcblk device index")

Ah.  Well, in this case it sounds like we've already got an
expectation of stable numbering from boot to boot.  I had missed Ulf's
patch, so I guess part 3 of my series isn't actually needed and can be
dropped.

So it's just a question of whether we allow people to manually specify
via device tree.

Note: if we really think using root=/dev/mmcblkNpM is a bad idea then
we should deprecate it and yell about it in the boot log.  Then 5 (or
20) years down the road we could remove the feature when the legacy
burden become a pain.  Note that even if we deprecated
root=/dev/mmcblkNpM I'd still love to see the numbering be consistent
to help folks parse dmesg.

Thanks for your thoughts!

-Doug

Re: [PATCH v2] pstore: add lzo/lz4 compression support

2016-04-29 Thread Geliang Tang

On Thu, Feb 18, 2016 at 09:37:26AM -0800, Kees Cook wrote:
> On Thu, Feb 18, 2016 at 6:04 AM, Geliang Tang  wrote:
> > Like zlib compression in pstore, this patch added lzo and lz4
> > compression support so that users can have more options and better
> > compression ratio.
> >
> > The original code treats the compressed data together with the
> > uncompressed ECC correction notice by using zlib decompress. The
> > ECC correction notice is missing in the decompression process. The
> > treatment also makes lzo and lz4 not working. So I treat them
> > separately by using pstore_decompress() to treat the compressed
> > data, and memcpy() to treat the uncompressed ECC correction notice.
> >
> > Signed-off-by: Geliang Tang 
> 
> Thanks for the updates!
> 
> Reviewed-by: Kees Cook 
> 
> -Kees

Hi there,

It's been three months since this patch has been reviewed. But it hasn't
been replied yet by far. Can I please know if there's anything wrong with
it?

-Geliang Tang

> > ---
> > Changes in v2:
> >  revise the patch as Kees suggested:
> >  - drop 'depends on PSTORE' in config choice PSTORE_ZLIB/LZO/LZ4_COMPRESS;
> >  - set big_oops_buf_sz to 0 in free_zlib/lzo/lz4;
> >  - change zbackend[] to zbackend.
> > ---
> >  drivers/firmware/efi/efi-pstore.c |   1 +
> >  fs/pstore/Kconfig |  31 +-
> >  fs/pstore/platform.c  | 218 
> > --
> >  fs/pstore/ram.c   |  10 +-
> >  include/linux/pstore.h|   3 +-
> >  5 files changed, 248 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi-pstore.c 
> > b/drivers/firmware/efi/efi-pstore.c
> > index eac76a7..cd8c35f 100644
> > --- a/drivers/firmware/efi/efi-pstore.c
> > +++ b/drivers/firmware/efi/efi-pstore.c
> > @@ -210,6 +210,7 @@ static int efi_pstore_sysfs_entry_iter(void *data, 
> > struct efivar_entry **pos)
> >  static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
> >int *count, struct timespec *timespec,
> >char **buf, bool *compressed,
> > +  ssize_t *ecc_notice_size,
> >struct pstore_info *psi)
> >  {
> > struct pstore_read_data data;
> > diff --git a/fs/pstore/Kconfig b/fs/pstore/Kconfig
> > index 360ae43..be40813 100644
> > --- a/fs/pstore/Kconfig
> > +++ b/fs/pstore/Kconfig
> > @@ -1,8 +1,6 @@
> >  config PSTORE
> > tristate "Persistent store support"
> > default n
> > -   select ZLIB_DEFLATE
> > -   select ZLIB_INFLATE
> > help
> >This option enables generic access to platform level
> >persistent storage via "pstore" filesystem that can
> > @@ -14,6 +12,35 @@ config PSTORE
> >If you don't have a platform persistent store driver,
> >say N.
> >
> > +choice
> > +prompt "Choose compression algorithm"
> > +depends on PSTORE
> > +default PSTORE_ZLIB_COMPRESS
> > +help
> > +  This option chooses compression algorithm.
> > +
> > +config PSTORE_ZLIB_COMPRESS
> > +bool "ZLIB"
> > +select ZLIB_DEFLATE
> > +select ZLIB_INFLATE
> > +help
> > +  This option enables ZLIB compression algorithm support.
> > +
> > +config PSTORE_LZO_COMPRESS
> > +bool "LZO"
> > +select LZO_COMPRESS
> > +select LZO_DECOMPRESS
> > +help
> > +  This option enables LZO compression algorithm support.
> > +
> > +config PSTORE_LZ4_COMPRESS
> > +bool "LZ4"
> > +select LZ4_COMPRESS
> > +select LZ4_DECOMPRESS
> > +help
> > +  This option enables LZ4 compression algorithm support.
> > +endchoice
> > +
> >  config PSTORE_CONSOLE
> > bool "Log kernel console messages"
> > depends on PSTORE
> > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> > index 588461b..7246d50 100644
> > --- a/fs/pstore/platform.c
> > +++ b/fs/pstore/platform.c
> > @@ -28,7 +28,15 @@
> >  #include 
> >  #include 
> >  #include 
> > +#ifdef CONFIG_PSTORE_ZLIB_COMPRESS
> >  #include 
> > +#endif
> > +#ifdef CONFIG_PSTORE_LZO_COMPRESS
> > +#include 
> > +#endif
> > +#ifdef CONFIG_PSTORE_LZ4_COMPRESS
> > +#include 
> > +#endif
> >  #include 
> >  #include 
> >  #include 
> > @@ -69,10 +77,23 @@ struct pstore_info *psinfo;
> >  static char *backend;
> >
> >  /* Compression parameters */
> > +#ifdef CONFIG_PSTORE_ZLIB_COMPRESS
> >  #define COMPR_LEVEL 6
> >  #define WINDOW_BITS 12
> >  #define MEM_LEVEL 4
> >  static struct z_stream_s stream;
> > +#else
> > +static unsigned char *workspace;
> > +#endif
> > +
> > +struct pstore_zbackend {
> > +   int (*compress)(const void *in, void *out, size_t inlen, size_t 
> > outlen);
> > +   int (*decompress)(void *in, void *out, size_t inlen, size_t outlen);
> > +   void (*allocate)(void);
> > +   void (*free)(void);
> > +
> > +   const char *name;
> > +};
> >
> >

[PATCH v2 1/3] USB: serial: cp210x: Fixed RTS mode setting by the CRTSCTS flag.

2016-04-29 Thread Konstantin Shkolnyy

A bug in the CRTSCTS handling caused RTS to alternate between
CRTSCTS=0 => "RTS transmits active signal" and
CRTSCTS=1 => "RTS receives flow control"
instead of
CRTSCTS=0 => "RTS is statically active" and
CRTSCTS=1 => "RTS receives flow control"
This only happened after first having enabled CRTSCTS.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index dd47823..abcdb63 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -163,10 +163,6 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x1843, 0x0200) }, /* Vaisala USB Instrument Cable */
{ USB_DEVICE(0x18EF, 0xE00F) }, /* ELV USB-I2C-Interface */
{ USB_DEVICE(0x18EF, 0xE025) }, /* ELV Marble Sound Board 1 */
-   { USB_DEVICE(0x1901, 0x0190) }, /* GE B850 CP2105 Recorder interface */
-   { USB_DEVICE(0x1901, 0x0193) }, /* GE B650 CP2104 PMC interface */
-   { USB_DEVICE(0x1901, 0x0194) }, /* GE Healthcare Remote Alarm Box */
-   { USB_DEVICE(0x19CF, 0x3000) }, /* Parrot NMEA GPS Flight Recorder */
{ USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
{ USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */
{ USB_DEVICE(0x1BA4, 0x0002) }, /* Silicon Labs 358x factory default */
@@ -967,8 +963,7 @@ static void cp210x_set_termios(struct tty_struct *tty,
} else {
modem_ctl[0] &= ~0x7B;
modem_ctl[0] |= 0x01;
-   /* FIXME - OR here instead of assignment looks wrong */
-   modem_ctl[4] |= 0x40;
+   modem_ctl[4] = 0x40;
dev_dbg(dev, "%s - flow control = NONE\n", __func__);
}
 
-- 
1.8.4.5

[PATCH v2 1/3] USB: serial: cp210x: Fixed RTS mode setting by the CRTSCTS flag.

2016-04-29 Thread Konstantin Shkolnyy

A bug in the CRTSCTS handling caused RTS to alternate between
CRTSCTS=0 => "RTS transmits active signal" and
CRTSCTS=1 => "RTS receives flow control"
instead of
CRTSCTS=0 => "RTS is statically active" and
CRTSCTS=1 => "RTS receives flow control"
This only happened after first having enabled CRTSCTS.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index dd47823..abcdb63 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -163,10 +163,6 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x1843, 0x0200) }, /* Vaisala USB Instrument Cable */
{ USB_DEVICE(0x18EF, 0xE00F) }, /* ELV USB-I2C-Interface */
{ USB_DEVICE(0x18EF, 0xE025) }, /* ELV Marble Sound Board 1 */
-   { USB_DEVICE(0x1901, 0x0190) }, /* GE B850 CP2105 Recorder interface */
-   { USB_DEVICE(0x1901, 0x0193) }, /* GE B650 CP2104 PMC interface */
-   { USB_DEVICE(0x1901, 0x0194) }, /* GE Healthcare Remote Alarm Box */
-   { USB_DEVICE(0x19CF, 0x3000) }, /* Parrot NMEA GPS Flight Recorder */
{ USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
{ USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */
{ USB_DEVICE(0x1BA4, 0x0002) }, /* Silicon Labs 358x factory default */
@@ -967,8 +963,7 @@ static void cp210x_set_termios(struct tty_struct *tty,
} else {
modem_ctl[0] &= ~0x7B;
modem_ctl[0] |= 0x01;
-   /* FIXME - OR here instead of assignment looks wrong */
-   modem_ctl[4] |= 0x40;
+   modem_ctl[4] = 0x40;
dev_dbg(dev, "%s - flow control = NONE\n", __func__);
}
 
-- 
1.8.4.5

[PATCH v2 3/3] USB: serial: cp210x: Cleaned up CRTSCTS flag code.

2016-04-29 Thread Konstantin Shkolnyy

The CRTSCTS flag code cleared (and inconsistently) bits unrelated to
CRTSCTS functionality. It was also harder than necessary to read.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index da0567c..d3e1dd2 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -988,24 +988,21 @@ static void cp210x_set_termios(struct tty_struct *tty,
dev_dbg(dev, "%s - read ulControlHandshake=%08x 
ulFlowReplace=%08x\n",
__func__, ControlHandshake, FlowReplace);
 
+   ControlHandshake &= ~SERIAL_DSR_HANDSHAKE;
+   ControlHandshake &= ~SERIAL_DCD_HANDSHAKE;
+   ControlHandshake &= ~SERIAL_DSR_SENSITIVITY;
+   ControlHandshake &= ~SERIAL_DTR_MASK;
+   ControlHandshake |= SERIAL_DTR_ACTIVE;
if (cflag & CRTSCTS) {
-   ControlHandshake &= ~(SERIAL_DTR_MASK |
-   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
-   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
-   ControlHandshake |= SERIAL_DTR_ACTIVE;
ControlHandshake |= SERIAL_CTS_HANDSHAKE;
-   /* FIXME why clear bits unrelated to flow control */
-   /* FIXME why clear _XOFF_CONTINUE which is never set */
-   FlowReplace &= ~0x;
+
+   FlowReplace &= ~SERIAL_RTS_MASK;
FlowReplace |= SERIAL_RTS_FLOW_CTL;
dev_dbg(dev, "%s - flow control = CRTSCTS\n", __func__);
} else {
-   ControlHandshake &= ~(SERIAL_DTR_MASK |
-   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
-   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
-   ControlHandshake |= SERIAL_DTR_ACTIVE;
-   /* FIXME - why clear bits unrelated to flow control */
-   FlowReplace &= ~0xff;
+   ControlHandshake &= ~SERIAL_CTS_HANDSHAKE;
+
+   FlowReplace &= ~SERIAL_RTS_MASK;
FlowReplace |= SERIAL_RTS_ACTIVE;
dev_dbg(dev, "%s - flow control = NONE\n", __func__);
}
-- 
1.8.4.5

[PATCH v2 2/3] USB: serial: cp210x: Added comments to CRTSCTS flag code.

2016-04-29 Thread Konstantin Shkolnyy

Replaced magic numbers used in the CRTSCTS flag code with symbolic names
from the chip specification.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 93 +
 1 file changed, 69 insertions(+), 24 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index abcdb63..da0567c 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -323,6 +323,40 @@ struct cp210x_comm_status {
  */
 #define PURGE_ALL  0x000f
 
+/* CP210X_GET_FLOW/CP210X_SET_FLOW read/write these 0x10 bytes */
+struct cp210x_flow_ctl {
+   __le32  ulControlHandshake;
+   __le32  ulFlowReplace;
+   __le32  ulXonLimit;
+   __le32  ulXoffLimit;
+} __packed;
+
+/* cp210x_flow_ctl::ulControlHandshake */
+#define SERIAL_DTR_MASK0x0003
+#define SERIAL_CTS_HANDSHAKE   0x0008
+#define SERIAL_DSR_HANDSHAKE   0x0010
+#define SERIAL_DCD_HANDSHAKE   0x0020
+#define SERIAL_DSR_SENSITIVITY 0x0040
+
+/* values for cp210x_flow_ctl::ulControlHandshake::SERIAL_DTR_MASK */
+#define SERIAL_DTR_INACTIVE0x
+#define SERIAL_DTR_ACTIVE  0x0001
+#define SERIAL_DTR_FLOW_CTL0x0002
+
+/* cp210x_flow_ctl::ulFlowReplace */
+#define SERIAL_AUTO_TRANSMIT   0x0001
+#define SERIAL_AUTO_RECEIVE0x0002
+#define SERIAL_ERROR_CHAR  0x0004
+#define SERIAL_NULL_STRIPPING  0x0008
+#define SERIAL_BREAK_CHAR  0x0010
+#define SERIAL_RTS_MASK0x00c0
+#define SERIAL_XOFF_CONTINUE   0x8000
+
+/* values for cp210x_flow_ctl::ulFlowReplace::SERIAL_RTS_MASK */
+#define SERIAL_RTS_INACTIVE0x
+#define SERIAL_RTS_ACTIVE  0x0040
+#define SERIAL_RTS_FLOW_CTL0x0080
+
 /*
  * Reads a variable-sized block of CP210X_ registers, identified by req.
  * Returns data into buf in native USB byte order.
@@ -690,7 +724,7 @@ static void cp210x_get_termios_port(struct usb_serial_port 
*port,
 {
struct device *dev = >dev;
unsigned int cflag;
-   u8 modem_ctl[16];
+   struct cp210x_flow_ctl flow_ctl;
u32 baud;
u16 bits;
 
@@ -788,9 +822,9 @@ static void cp210x_get_termios_port(struct usb_serial_port 
*port,
break;
}
 
-   cp210x_read_reg_block(port, CP210X_GET_FLOW, modem_ctl,
-   sizeof(modem_ctl));
-   if (modem_ctl[0] & 0x08) {
+   cp210x_read_reg_block(port, CP210X_GET_FLOW, _ctl,
+   sizeof(flow_ctl));
+   if (le32_to_cpu(flow_ctl.ulControlHandshake) & SERIAL_CTS_HANDSHAKE) {
dev_dbg(dev, "%s - flow control = CRTSCTS\n", __func__);
cflag |= CRTSCTS;
} else {
@@ -859,7 +893,6 @@ static void cp210x_set_termios(struct tty_struct *tty,
struct device *dev = >dev;
unsigned int cflag, old_cflag;
u16 bits;
-   u8 modem_ctl[16];
 
cflag = tty->termios.c_cflag;
old_cflag = old_termios->c_cflag;
@@ -944,33 +977,45 @@ static void cp210x_set_termios(struct tty_struct *tty,
 
if ((cflag & CRTSCTS) != (old_cflag & CRTSCTS)) {
 
-   /* Only bytes 0, 4 and 7 out of first 8 have functional bits */
+   struct cp210x_flow_ctl flow_ctl;
+   u32 ControlHandshake;
+   u32 FlowReplace;
 
-   cp210x_read_reg_block(port, CP210X_GET_FLOW, modem_ctl,
-   sizeof(modem_ctl));
-   dev_dbg(dev, "%s - read modem controls = %02x .. .. .. %02x .. 
.. %02x\n",
-   __func__, modem_ctl[0], modem_ctl[4], modem_ctl[7]);
+   cp210x_read_reg_block(port, CP210X_GET_FLOW, _ctl,
+   sizeof(flow_ctl));
+   ControlHandshake = le32_to_cpu(flow_ctl.ulControlHandshake);
+   FlowReplace  = le32_to_cpu(flow_ctl.ulFlowReplace);
+   dev_dbg(dev, "%s - read ulControlHandshake=%08x 
ulFlowReplace=%08x\n",
+   __func__, ControlHandshake, FlowReplace);
 
if (cflag & CRTSCTS) {
-   modem_ctl[0] &= ~0x7B;
-   modem_ctl[0] |= 0x09;
-   modem_ctl[4] = 0x80;
-   /* FIXME - why clear reserved bits just read? */
-   modem_ctl[5] = 0;
-   modem_ctl[6] = 0;
-   modem_ctl[7] = 0;
+   ControlHandshake &= ~(SERIAL_DTR_MASK |
+   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
+   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
+   ControlHandshake |= SERIAL_DTR_ACTIVE;
+   ControlHandshake |= SERIAL_CTS_HANDSHAKE;
+   /* FIXME why clear bits unrelated to flow

[PATCH v2 3/3] USB: serial: cp210x: Cleaned up CRTSCTS flag code.

2016-04-29 Thread Konstantin Shkolnyy

The CRTSCTS flag code cleared (and inconsistently) bits unrelated to
CRTSCTS functionality. It was also harder than necessary to read.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index da0567c..d3e1dd2 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -988,24 +988,21 @@ static void cp210x_set_termios(struct tty_struct *tty,
dev_dbg(dev, "%s - read ulControlHandshake=%08x 
ulFlowReplace=%08x\n",
__func__, ControlHandshake, FlowReplace);
 
+   ControlHandshake &= ~SERIAL_DSR_HANDSHAKE;
+   ControlHandshake &= ~SERIAL_DCD_HANDSHAKE;
+   ControlHandshake &= ~SERIAL_DSR_SENSITIVITY;
+   ControlHandshake &= ~SERIAL_DTR_MASK;
+   ControlHandshake |= SERIAL_DTR_ACTIVE;
if (cflag & CRTSCTS) {
-   ControlHandshake &= ~(SERIAL_DTR_MASK |
-   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
-   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
-   ControlHandshake |= SERIAL_DTR_ACTIVE;
ControlHandshake |= SERIAL_CTS_HANDSHAKE;
-   /* FIXME why clear bits unrelated to flow control */
-   /* FIXME why clear _XOFF_CONTINUE which is never set */
-   FlowReplace &= ~0x;
+
+   FlowReplace &= ~SERIAL_RTS_MASK;
FlowReplace |= SERIAL_RTS_FLOW_CTL;
dev_dbg(dev, "%s - flow control = CRTSCTS\n", __func__);
} else {
-   ControlHandshake &= ~(SERIAL_DTR_MASK |
-   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
-   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
-   ControlHandshake |= SERIAL_DTR_ACTIVE;
-   /* FIXME - why clear bits unrelated to flow control */
-   FlowReplace &= ~0xff;
+   ControlHandshake &= ~SERIAL_CTS_HANDSHAKE;
+
+   FlowReplace &= ~SERIAL_RTS_MASK;
FlowReplace |= SERIAL_RTS_ACTIVE;
dev_dbg(dev, "%s - flow control = NONE\n", __func__);
}
-- 
1.8.4.5

[PATCH v2 2/3] USB: serial: cp210x: Added comments to CRTSCTS flag code.

2016-04-29 Thread Konstantin Shkolnyy

Replaced magic numbers used in the CRTSCTS flag code with symbolic names
from the chip specification.

Signed-off-by: Konstantin Shkolnyy 
---
Changes in v2:
Improved CRTSCTS fix based on feedback. Dropped get_termios error handling.

 drivers/usb/serial/cp210x.c | 93 +
 1 file changed, 69 insertions(+), 24 deletions(-)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index abcdb63..da0567c 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -323,6 +323,40 @@ struct cp210x_comm_status {
  */
 #define PURGE_ALL  0x000f
 
+/* CP210X_GET_FLOW/CP210X_SET_FLOW read/write these 0x10 bytes */
+struct cp210x_flow_ctl {
+   __le32  ulControlHandshake;
+   __le32  ulFlowReplace;
+   __le32  ulXonLimit;
+   __le32  ulXoffLimit;
+} __packed;
+
+/* cp210x_flow_ctl::ulControlHandshake */
+#define SERIAL_DTR_MASK0x0003
+#define SERIAL_CTS_HANDSHAKE   0x0008
+#define SERIAL_DSR_HANDSHAKE   0x0010
+#define SERIAL_DCD_HANDSHAKE   0x0020
+#define SERIAL_DSR_SENSITIVITY 0x0040
+
+/* values for cp210x_flow_ctl::ulControlHandshake::SERIAL_DTR_MASK */
+#define SERIAL_DTR_INACTIVE0x
+#define SERIAL_DTR_ACTIVE  0x0001
+#define SERIAL_DTR_FLOW_CTL0x0002
+
+/* cp210x_flow_ctl::ulFlowReplace */
+#define SERIAL_AUTO_TRANSMIT   0x0001
+#define SERIAL_AUTO_RECEIVE0x0002
+#define SERIAL_ERROR_CHAR  0x0004
+#define SERIAL_NULL_STRIPPING  0x0008
+#define SERIAL_BREAK_CHAR  0x0010
+#define SERIAL_RTS_MASK0x00c0
+#define SERIAL_XOFF_CONTINUE   0x8000
+
+/* values for cp210x_flow_ctl::ulFlowReplace::SERIAL_RTS_MASK */
+#define SERIAL_RTS_INACTIVE0x
+#define SERIAL_RTS_ACTIVE  0x0040
+#define SERIAL_RTS_FLOW_CTL0x0080
+
 /*
  * Reads a variable-sized block of CP210X_ registers, identified by req.
  * Returns data into buf in native USB byte order.
@@ -690,7 +724,7 @@ static void cp210x_get_termios_port(struct usb_serial_port 
*port,
 {
struct device *dev = >dev;
unsigned int cflag;
-   u8 modem_ctl[16];
+   struct cp210x_flow_ctl flow_ctl;
u32 baud;
u16 bits;
 
@@ -788,9 +822,9 @@ static void cp210x_get_termios_port(struct usb_serial_port 
*port,
break;
}
 
-   cp210x_read_reg_block(port, CP210X_GET_FLOW, modem_ctl,
-   sizeof(modem_ctl));
-   if (modem_ctl[0] & 0x08) {
+   cp210x_read_reg_block(port, CP210X_GET_FLOW, _ctl,
+   sizeof(flow_ctl));
+   if (le32_to_cpu(flow_ctl.ulControlHandshake) & SERIAL_CTS_HANDSHAKE) {
dev_dbg(dev, "%s - flow control = CRTSCTS\n", __func__);
cflag |= CRTSCTS;
} else {
@@ -859,7 +893,6 @@ static void cp210x_set_termios(struct tty_struct *tty,
struct device *dev = >dev;
unsigned int cflag, old_cflag;
u16 bits;
-   u8 modem_ctl[16];
 
cflag = tty->termios.c_cflag;
old_cflag = old_termios->c_cflag;
@@ -944,33 +977,45 @@ static void cp210x_set_termios(struct tty_struct *tty,
 
if ((cflag & CRTSCTS) != (old_cflag & CRTSCTS)) {
 
-   /* Only bytes 0, 4 and 7 out of first 8 have functional bits */
+   struct cp210x_flow_ctl flow_ctl;
+   u32 ControlHandshake;
+   u32 FlowReplace;
 
-   cp210x_read_reg_block(port, CP210X_GET_FLOW, modem_ctl,
-   sizeof(modem_ctl));
-   dev_dbg(dev, "%s - read modem controls = %02x .. .. .. %02x .. 
.. %02x\n",
-   __func__, modem_ctl[0], modem_ctl[4], modem_ctl[7]);
+   cp210x_read_reg_block(port, CP210X_GET_FLOW, _ctl,
+   sizeof(flow_ctl));
+   ControlHandshake = le32_to_cpu(flow_ctl.ulControlHandshake);
+   FlowReplace  = le32_to_cpu(flow_ctl.ulFlowReplace);
+   dev_dbg(dev, "%s - read ulControlHandshake=%08x 
ulFlowReplace=%08x\n",
+   __func__, ControlHandshake, FlowReplace);
 
if (cflag & CRTSCTS) {
-   modem_ctl[0] &= ~0x7B;
-   modem_ctl[0] |= 0x09;
-   modem_ctl[4] = 0x80;
-   /* FIXME - why clear reserved bits just read? */
-   modem_ctl[5] = 0;
-   modem_ctl[6] = 0;
-   modem_ctl[7] = 0;
+   ControlHandshake &= ~(SERIAL_DTR_MASK |
+   SERIAL_CTS_HANDSHAKE | SERIAL_DSR_HANDSHAKE |
+   SERIAL_DCD_HANDSHAKE | SERIAL_DSR_SENSITIVITY);
+   ControlHandshake |= SERIAL_DTR_ACTIVE;
+   ControlHandshake |= SERIAL_CTS_HANDSHAKE;
+   /* FIXME why clear bits unrelated to flow control */
+

[PATCH 2/6] HSI: omap_ssi: fix module unloading

2016-04-29 Thread Sebastian Reichel

Removal of ssi controller debugfs directory must
happen after the clients have been removed from
it.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index c582229d1cd2..2dd46b219af2 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -526,6 +526,9 @@ static int __exit ssi_remove(struct platform_device *pd)
 {
struct hsi_controller *ssi = platform_get_drvdata(pd);
 
+   /* cleanup of of_platform_populate() call */
+   device_for_each_child(>dev, NULL, ssi_remove_ports);
+
 #ifdef CONFIG_DEBUG_FS
ssi_debug_remove_ctrl(ssi);
 #endif
@@ -534,9 +537,6 @@ static int __exit ssi_remove(struct platform_device *pd)
 
pm_runtime_disable(>dev);
 
-   /* cleanup of of_platform_populate() call */
-   device_for_each_child(>dev, NULL, ssi_remove_ports);
-
return 0;
 }
 
-- 
2.8.1

[PATCH 2/6] HSI: omap_ssi: fix module unloading

2016-04-29 Thread Sebastian Reichel

Removal of ssi controller debugfs directory must
happen after the clients have been removed from
it.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index c582229d1cd2..2dd46b219af2 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -526,6 +526,9 @@ static int __exit ssi_remove(struct platform_device *pd)
 {
struct hsi_controller *ssi = platform_get_drvdata(pd);
 
+   /* cleanup of of_platform_populate() call */
+   device_for_each_child(>dev, NULL, ssi_remove_ports);
+
 #ifdef CONFIG_DEBUG_FS
ssi_debug_remove_ctrl(ssi);
 #endif
@@ -534,9 +537,6 @@ static int __exit ssi_remove(struct platform_device *pd)
 
pm_runtime_disable(>dev);
 
-   /* cleanup of of_platform_populate() call */
-   device_for_each_child(>dev, NULL, ssi_remove_ports);
-
return 0;
 }
 
-- 
2.8.1

[PATCH 5/6] HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module

2016-04-29 Thread Sebastian Reichel

Merge omap_ssi and omap_ssi_port into one module. This
fixes problems with module cycle dependencies introduced
by future patches.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/Kconfig |  5 -
 drivers/hsi/controllers/Makefile|  4 ++--
 drivers/hsi/controllers/omap_ssi.h  |  2 ++
 drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} | 17 -
 drivers/hsi/controllers/omap_ssi_port.c | 16 +---
 5 files changed, 21 insertions(+), 23 deletions(-)
 rename drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} (97%)

diff --git a/drivers/hsi/controllers/Kconfig b/drivers/hsi/controllers/Kconfig
index 6aba27808172..084ec97eec64 100644
--- a/drivers/hsi/controllers/Kconfig
+++ b/drivers/hsi/controllers/Kconfig
@@ -12,8 +12,3 @@ config OMAP_SSI
  If you say Y here, you will enable the OMAP SSI hardware driver.
 
  If unsure, say N.
-
-config OMAP_SSI_PORT
-   tristate
-   default m if OMAP_SSI=m
-   default y if OMAP_SSI=y
diff --git a/drivers/hsi/controllers/Makefile b/drivers/hsi/controllers/Makefile
index d2665cf9c545..7aba9c7f71bb 100644
--- a/drivers/hsi/controllers/Makefile
+++ b/drivers/hsi/controllers/Makefile
@@ -2,5 +2,5 @@
 # Makefile for HSI controllers drivers
 #
 
-obj-$(CONFIG_OMAP_SSI) += omap_ssi.o
-obj-$(CONFIG_OMAP_SSI_PORT)+= omap_ssi_port.o
+omap_ssi-objs  += omap_ssi_core.o omap_ssi_port.o
+obj-$(CONFIG_OMAP_SSI) += omap_ssi.o
diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index 1fa028078a3c..e493321cb0c3 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -164,4 +164,6 @@ struct omap_ssi_controller {
 #endif
 };
 
+extern struct platform_driver ssi_port_pdriver;
+
 #endif /* __LINUX_HSI_OMAP_SSI_H__ */
diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi_core.c
similarity index 97%
rename from drivers/hsi/controllers/omap_ssi.c
rename to drivers/hsi/controllers/omap_ssi_core.c
index 68dfdaa19938..535c76038288 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi_core.c
@@ -605,7 +605,22 @@ static struct platform_driver ssi_pdriver = {
},
 };
 
-module_platform_driver(ssi_pdriver);
+static int __init ssi_init(void) {
+   int ret;
+
+   ret = platform_driver_register(_pdriver);
+   if (ret)
+   return ret;
+
+   return platform_driver_register(_port_pdriver);
+}
+module_init(ssi_init);
+
+static void __exit ssi_exit(void) {
+   platform_driver_unregister(_port_pdriver);
+   platform_driver_unregister(_pdriver);
+}
+module_exit(ssi_exit);
 
 MODULE_ALIAS("platform:omap_ssi");
 MODULE_AUTHOR("Carlos Chinea ");
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index 530095ed39e7..1569bbb53ee8 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -1117,11 +1117,6 @@ static int ssi_port_probe(struct platform_device *pd)
 
dev_dbg(>dev, "init ssi port...\n");
 
-   if (!try_module_get(ssi->owner)) {
-   dev_err(>dev, "could not increment parent module 
refcount\n");
-   return -ENODEV;
-   }
-
if (!ssi->port || !omap_ssi->port) {
dev_err(>dev, "ssi controller not initialized!\n");
err = -ENODEV;
@@ -1242,7 +1237,6 @@ static int ssi_port_remove(struct platform_device *pd)
 
omap_ssi->port[omap_port->port_id] = NULL;
platform_set_drvdata(pd, NULL);
-   module_put(ssi->owner);
pm_runtime_disable(>dev);
 
return 0;
@@ -1369,7 +1363,7 @@ MODULE_DEVICE_TABLE(of, omap_ssi_port_of_match);
 #define omap_ssi_port_of_match NULL
 #endif
 
-static struct platform_driver ssi_port_pdriver = {
+struct platform_driver ssi_port_pdriver = {
.probe = ssi_port_probe,
.remove = ssi_port_remove,
.driver = {
@@ -1378,11 +1372,3 @@ static struct platform_driver ssi_port_pdriver = {
.pm = DEV_PM_OPS,
},
 };
-
-module_platform_driver(ssi_port_pdriver);
-
-MODULE_ALIAS("platform:omap_ssi_port");
-MODULE_AUTHOR("Carlos Chinea ");
-MODULE_AUTHOR("Sebastian Reichel ");
-MODULE_DESCRIPTION("Synchronous Serial Interface Port Driver");
-MODULE_LICENSE("GPL v2");
-- 
2.8.1

[PATCH 6/6] HSI: omap-ssi: add clk change support

2016-04-29 Thread Sebastian Reichel

This adds support for frequency changes of the SSI
functional clock, which may occur due to DVFS.

Signed-off-By: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.h  |  6 
 drivers/hsi/controllers/omap_ssi_core.c | 63 +
 drivers/hsi/controllers/omap_ssi_port.c | 20 +++
 3 files changed, 89 insertions(+)

diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index e493321cb0c3..7b4dec2c69ff 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -134,6 +134,8 @@ struct gdd_trn {
  * @gdd_tasklet: bottom half for DMA transfers
  * @gdd_trn: Array of GDD transaction data for ongoing GDD transfers
  * @lock: lock to serialize access to GDD
+ * @fck_nb: DVFS notfifier block
+ * @fck_rate: clock rate
  * @loss_count: To follow if we need to restore context or not
  * @max_speed: Maximum TX speed (Kb/s) set by the clients.
  * @sysconfig: SSI controller saved context
@@ -151,6 +153,7 @@ struct omap_ssi_controller {
struct tasklet_struct   gdd_tasklet;
struct gdd_trn  gdd_trn[SSI_MAX_GDD_LCH];
spinlock_t  lock;
+   struct notifier_block   fck_nb;
unsigned long   fck_rate;
u32 loss_count;
u32 max_speed;
@@ -164,6 +167,9 @@ struct omap_ssi_controller {
 #endif
 };
 
+void omap_ssi_port_update_fclk(struct hsi_controller *ssi,
+  struct omap_ssi_port *omap_port);
+
 extern struct platform_driver ssi_port_pdriver;
 
 #endif /* __LINUX_HSI_OMAP_SSI_H__ */
diff --git a/drivers/hsi/controllers/omap_ssi_core.c 
b/drivers/hsi/controllers/omap_ssi_core.c
index 535c76038288..15b2a600d77b 100644
--- a/drivers/hsi/controllers/omap_ssi_core.c
+++ b/drivers/hsi/controllers/omap_ssi_core.c
@@ -290,6 +290,64 @@ static unsigned long ssi_get_clk_rate(struct 
hsi_controller *ssi)
return rate;
 }
 
+static int ssi_clk_event(struct notifier_block *nb, unsigned long event,
+   void *data)
+{
+   struct omap_ssi_controller *omap_ssi = container_of(nb,
+   struct omap_ssi_controller, fck_nb);
+   struct hsi_controller *ssi = to_hsi_controller(omap_ssi->dev);
+   struct clk_notifier_data *clk_data = data;
+   struct omap_ssi_port *omap_port;
+   int i;
+
+   switch (event) {
+   case PRE_RATE_CHANGE:
+   dev_dbg(>device, "pre rate change\n");
+
+   for (i = 0; i < ssi->num_ports; i++) {
+   omap_port = omap_ssi->port[i];
+
+   if (!omap_port)
+   continue;
+
+   /* Workaround for SWBREAK + CAwake down race in CMT */
+   tasklet_disable(_port->wake_tasklet);
+
+   /* stop all ssi communication */
+   pinctrl_pm_select_idle_state(omap_port->pdev);
+   udelay(1); /* wait for racing frames */
+   }
+
+   break;
+   case ABORT_RATE_CHANGE:
+   dev_dbg(>device, "abort rate change\n");
+   /* Fall through */
+   case POST_RATE_CHANGE:
+   dev_dbg(>device, "post rate change (%lu -> %lu)\n",
+   clk_data->old_rate, clk_data->new_rate);
+   omap_ssi->fck_rate = DIV_ROUND_CLOSEST(clk_data->new_rate, 
1000); /* KHz */
+
+   for (i = 0; i < ssi->num_ports; i++) {
+   omap_port = omap_ssi->port[i];
+
+   if (!omap_port)
+   continue;
+
+   omap_ssi_port_update_fclk(ssi, omap_port);
+
+   /* resume ssi communication */
+   pinctrl_pm_select_default_state(omap_port->pdev);
+   tasklet_enable(_port->wake_tasklet);
+   }
+
+   break;
+   default:
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
 static int ssi_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
@@ -369,6 +427,10 @@ static int ssi_add_controller(struct hsi_controller *ssi,
goto out_err;
}
 
+   omap_ssi->fck_nb.notifier_call = ssi_clk_event;
+   omap_ssi->fck_nb.priority = INT_MAX;
+   clk_notifier_register(omap_ssi->fck, _ssi->fck_nb);
+
/* TODO: find register, which can be used to detect context loss */
omap_ssi->get_loss = NULL;
 
@@ -432,6 +494,7 @@ static void ssi_remove_controller(struct hsi_controller 
*ssi)
int id = ssi->id;
tasklet_kill(_ssi->gdd_tasklet);
hsi_unregister_controller(ssi);
+   clk_notifier_unregister(omap_ssi->fck, _ssi->fck_nb);
ida_simple_remove(_omap_ssi_ida, id);
 }
 
diff --git

[PATCH 5/6] HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module

2016-04-29 Thread Sebastian Reichel

Merge omap_ssi and omap_ssi_port into one module. This
fixes problems with module cycle dependencies introduced
by future patches.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/Kconfig |  5 -
 drivers/hsi/controllers/Makefile|  4 ++--
 drivers/hsi/controllers/omap_ssi.h  |  2 ++
 drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} | 17 -
 drivers/hsi/controllers/omap_ssi_port.c | 16 +---
 5 files changed, 21 insertions(+), 23 deletions(-)
 rename drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} (97%)

diff --git a/drivers/hsi/controllers/Kconfig b/drivers/hsi/controllers/Kconfig
index 6aba27808172..084ec97eec64 100644
--- a/drivers/hsi/controllers/Kconfig
+++ b/drivers/hsi/controllers/Kconfig
@@ -12,8 +12,3 @@ config OMAP_SSI
  If you say Y here, you will enable the OMAP SSI hardware driver.
 
  If unsure, say N.
-
-config OMAP_SSI_PORT
-   tristate
-   default m if OMAP_SSI=m
-   default y if OMAP_SSI=y
diff --git a/drivers/hsi/controllers/Makefile b/drivers/hsi/controllers/Makefile
index d2665cf9c545..7aba9c7f71bb 100644
--- a/drivers/hsi/controllers/Makefile
+++ b/drivers/hsi/controllers/Makefile
@@ -2,5 +2,5 @@
 # Makefile for HSI controllers drivers
 #
 
-obj-$(CONFIG_OMAP_SSI) += omap_ssi.o
-obj-$(CONFIG_OMAP_SSI_PORT)+= omap_ssi_port.o
+omap_ssi-objs  += omap_ssi_core.o omap_ssi_port.o
+obj-$(CONFIG_OMAP_SSI) += omap_ssi.o
diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index 1fa028078a3c..e493321cb0c3 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -164,4 +164,6 @@ struct omap_ssi_controller {
 #endif
 };
 
+extern struct platform_driver ssi_port_pdriver;
+
 #endif /* __LINUX_HSI_OMAP_SSI_H__ */
diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi_core.c
similarity index 97%
rename from drivers/hsi/controllers/omap_ssi.c
rename to drivers/hsi/controllers/omap_ssi_core.c
index 68dfdaa19938..535c76038288 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi_core.c
@@ -605,7 +605,22 @@ static struct platform_driver ssi_pdriver = {
},
 };
 
-module_platform_driver(ssi_pdriver);
+static int __init ssi_init(void) {
+   int ret;
+
+   ret = platform_driver_register(_pdriver);
+   if (ret)
+   return ret;
+
+   return platform_driver_register(_port_pdriver);
+}
+module_init(ssi_init);
+
+static void __exit ssi_exit(void) {
+   platform_driver_unregister(_port_pdriver);
+   platform_driver_unregister(_pdriver);
+}
+module_exit(ssi_exit);
 
 MODULE_ALIAS("platform:omap_ssi");
 MODULE_AUTHOR("Carlos Chinea ");
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index 530095ed39e7..1569bbb53ee8 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -1117,11 +1117,6 @@ static int ssi_port_probe(struct platform_device *pd)
 
dev_dbg(>dev, "init ssi port...\n");
 
-   if (!try_module_get(ssi->owner)) {
-   dev_err(>dev, "could not increment parent module 
refcount\n");
-   return -ENODEV;
-   }
-
if (!ssi->port || !omap_ssi->port) {
dev_err(>dev, "ssi controller not initialized!\n");
err = -ENODEV;
@@ -1242,7 +1237,6 @@ static int ssi_port_remove(struct platform_device *pd)
 
omap_ssi->port[omap_port->port_id] = NULL;
platform_set_drvdata(pd, NULL);
-   module_put(ssi->owner);
pm_runtime_disable(>dev);
 
return 0;
@@ -1369,7 +1363,7 @@ MODULE_DEVICE_TABLE(of, omap_ssi_port_of_match);
 #define omap_ssi_port_of_match NULL
 #endif
 
-static struct platform_driver ssi_port_pdriver = {
+struct platform_driver ssi_port_pdriver = {
.probe = ssi_port_probe,
.remove = ssi_port_remove,
.driver = {
@@ -1378,11 +1372,3 @@ static struct platform_driver ssi_port_pdriver = {
.pm = DEV_PM_OPS,
},
 };
-
-module_platform_driver(ssi_port_pdriver);
-
-MODULE_ALIAS("platform:omap_ssi_port");
-MODULE_AUTHOR("Carlos Chinea ");
-MODULE_AUTHOR("Sebastian Reichel ");
-MODULE_DESCRIPTION("Synchronous Serial Interface Port Driver");
-MODULE_LICENSE("GPL v2");
-- 
2.8.1

[PATCH 6/6] HSI: omap-ssi: add clk change support

2016-04-29 Thread Sebastian Reichel

This adds support for frequency changes of the SSI
functional clock, which may occur due to DVFS.

Signed-off-By: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.h  |  6 
 drivers/hsi/controllers/omap_ssi_core.c | 63 +
 drivers/hsi/controllers/omap_ssi_port.c | 20 +++
 3 files changed, 89 insertions(+)

diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index e493321cb0c3..7b4dec2c69ff 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -134,6 +134,8 @@ struct gdd_trn {
  * @gdd_tasklet: bottom half for DMA transfers
  * @gdd_trn: Array of GDD transaction data for ongoing GDD transfers
  * @lock: lock to serialize access to GDD
+ * @fck_nb: DVFS notfifier block
+ * @fck_rate: clock rate
  * @loss_count: To follow if we need to restore context or not
  * @max_speed: Maximum TX speed (Kb/s) set by the clients.
  * @sysconfig: SSI controller saved context
@@ -151,6 +153,7 @@ struct omap_ssi_controller {
struct tasklet_struct   gdd_tasklet;
struct gdd_trn  gdd_trn[SSI_MAX_GDD_LCH];
spinlock_t  lock;
+   struct notifier_block   fck_nb;
unsigned long   fck_rate;
u32 loss_count;
u32 max_speed;
@@ -164,6 +167,9 @@ struct omap_ssi_controller {
 #endif
 };
 
+void omap_ssi_port_update_fclk(struct hsi_controller *ssi,
+  struct omap_ssi_port *omap_port);
+
 extern struct platform_driver ssi_port_pdriver;
 
 #endif /* __LINUX_HSI_OMAP_SSI_H__ */
diff --git a/drivers/hsi/controllers/omap_ssi_core.c 
b/drivers/hsi/controllers/omap_ssi_core.c
index 535c76038288..15b2a600d77b 100644
--- a/drivers/hsi/controllers/omap_ssi_core.c
+++ b/drivers/hsi/controllers/omap_ssi_core.c
@@ -290,6 +290,64 @@ static unsigned long ssi_get_clk_rate(struct 
hsi_controller *ssi)
return rate;
 }
 
+static int ssi_clk_event(struct notifier_block *nb, unsigned long event,
+   void *data)
+{
+   struct omap_ssi_controller *omap_ssi = container_of(nb,
+   struct omap_ssi_controller, fck_nb);
+   struct hsi_controller *ssi = to_hsi_controller(omap_ssi->dev);
+   struct clk_notifier_data *clk_data = data;
+   struct omap_ssi_port *omap_port;
+   int i;
+
+   switch (event) {
+   case PRE_RATE_CHANGE:
+   dev_dbg(>device, "pre rate change\n");
+
+   for (i = 0; i < ssi->num_ports; i++) {
+   omap_port = omap_ssi->port[i];
+
+   if (!omap_port)
+   continue;
+
+   /* Workaround for SWBREAK + CAwake down race in CMT */
+   tasklet_disable(_port->wake_tasklet);
+
+   /* stop all ssi communication */
+   pinctrl_pm_select_idle_state(omap_port->pdev);
+   udelay(1); /* wait for racing frames */
+   }
+
+   break;
+   case ABORT_RATE_CHANGE:
+   dev_dbg(>device, "abort rate change\n");
+   /* Fall through */
+   case POST_RATE_CHANGE:
+   dev_dbg(>device, "post rate change (%lu -> %lu)\n",
+   clk_data->old_rate, clk_data->new_rate);
+   omap_ssi->fck_rate = DIV_ROUND_CLOSEST(clk_data->new_rate, 
1000); /* KHz */
+
+   for (i = 0; i < ssi->num_ports; i++) {
+   omap_port = omap_ssi->port[i];
+
+   if (!omap_port)
+   continue;
+
+   omap_ssi_port_update_fclk(ssi, omap_port);
+
+   /* resume ssi communication */
+   pinctrl_pm_select_default_state(omap_port->pdev);
+   tasklet_enable(_port->wake_tasklet);
+   }
+
+   break;
+   default:
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
 static int ssi_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
@@ -369,6 +427,10 @@ static int ssi_add_controller(struct hsi_controller *ssi,
goto out_err;
}
 
+   omap_ssi->fck_nb.notifier_call = ssi_clk_event;
+   omap_ssi->fck_nb.priority = INT_MAX;
+   clk_notifier_register(omap_ssi->fck, _ssi->fck_nb);
+
/* TODO: find register, which can be used to detect context loss */
omap_ssi->get_loss = NULL;
 
@@ -432,6 +494,7 @@ static void ssi_remove_controller(struct hsi_controller 
*ssi)
int id = ssi->id;
tasklet_kill(_ssi->gdd_tasklet);
hsi_unregister_controller(ssi);
+   clk_notifier_unregister(omap_ssi->fck, _ssi->fck_nb);
ida_simple_remove(_omap_ssi_ida, id);
 }
 
diff --git a/drivers/hsi/controllers/omap_ssi_port.c

[PATCH 4/6] HSI: omap_ssi: fix removal of port platform device

2016-04-29 Thread Sebastian Reichel

This avoids removal of the HSI port device when
only the platform port device should be removed
and clears the POPULATED bit in the DT node, so
that a new platform device is created when the
driver is probed again.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index ffb921482e76..68dfdaa19938 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -451,6 +451,10 @@ static int ssi_remove_ports(struct device *dev, void *c)
 {
struct platform_device *pdev = to_platform_device(dev);
 
+   if (!dev->of_node)
+   return 0;
+
+   of_node_clear_flag(dev->of_node, OF_POPULATED);
of_device_unregister(pdev);
 
return 0;
-- 
2.8.1

[PATCH 4/6] HSI: omap_ssi: fix removal of port platform device

2016-04-29 Thread Sebastian Reichel

This avoids removal of the HSI port device when
only the platform port device should be removed
and clears the POPULATED bit in the DT node, so
that a new platform device is created when the
driver is probed again.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index ffb921482e76..68dfdaa19938 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -451,6 +451,10 @@ static int ssi_remove_ports(struct device *dev, void *c)
 {
struct platform_device *pdev = to_platform_device(dev);
 
+   if (!dev->of_node)
+   return 0;
+
+   of_node_clear_flag(dev->of_node, OF_POPULATED);
of_device_unregister(pdev);
 
return 0;
-- 
2.8.1

[PATCH 1/6] HSI: omap_ssi_port: switch to gpiod API

2016-04-29 Thread Sebastian Reichel

Simplify driver by switching to new gpio descriptor based API.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c  |  1 -
 drivers/hsi/controllers/omap_ssi.h  |  4 ++--
 drivers/hsi/controllers/omap_ssi_port.c | 31 ++-
 3 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index 27b91f14ba7a..c582229d1cd2 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index f9aaf37262be..1fa028078a3c 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -97,7 +97,7 @@ struct omap_ssi_port {
struct list_headbrkqueue;
unsigned intirq;
int wake_irq;
-   int wake_gpio;
+   struct gpio_desc*wake_gpio;
struct tasklet_struct   pio_tasklet;
struct tasklet_struct   wake_tasklet;
boolwktest:1; /* FIXME: HACK to be removed */
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index e80a66e20998..948bdc7946fb 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 
 #include "omap_ssi_regs.h"
@@ -43,7 +43,7 @@ static inline int hsi_dummy_cl(struct hsi_client *cl 
__maybe_unused)
 static inline unsigned int ssi_wakein(struct hsi_port *port)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
-   return gpio_get_value(omap_port->wake_gpio);
+   return gpiod_get_value(omap_port->wake_gpio);
 }
 
 #ifdef CONFIG_DEBUG_FS
@@ -1036,12 +1036,12 @@ static int __init ssi_wake_irq(struct hsi_port *port,
int cawake_irq;
int err;
 
-   if (omap_port->wake_gpio == -1) {
+   if (!omap_port->wake_gpio) {
omap_port->wake_irq = -1;
return 0;
}
 
-   cawake_irq = gpio_to_irq(omap_port->wake_gpio);
+   cawake_irq = gpiod_to_irq(omap_port->wake_gpio);
 
omap_port->wake_irq = cawake_irq;
tasklet_init(_port->wake_tasklet, ssi_wake_tasklet,
@@ -,7 +,7 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
struct omap_ssi_port *omap_port;
struct hsi_controller *ssi = dev_get_drvdata(pd->dev.parent);
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
-   int cawake_gpio = 0;
+   struct gpio_desc *cawake_gpio = NULL;
u32 port_id;
int err;
 
@@ -1147,20 +1147,10 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
goto error;
}
 
-   err = of_get_named_gpio(np, "ti,ssi-cawake-gpio", 0);
-   if (err < 0) {
-   dev_err(>dev, "DT data is missing cawake gpio (err=%d)\n",
-   err);
-   goto error;
-   }
-   cawake_gpio = err;
-
-   err = devm_gpio_request_one(>device, cawake_gpio, GPIOF_DIR_IN,
-   "cawake");
-   if (err) {
-   dev_err(>dev, "could not request cawake gpio (err=%d)!\n",
-   err);
-   err = -ENXIO;
+   cawake_gpio = devm_gpiod_get(>dev, "ti,ssi-cawake", GPIOD_IN);
+   if (IS_ERR(cawake_gpio)) {
+   err = PTR_ERR(cawake_gpio);
+   dev_err(>dev, "couldn't get cawake gpio (err=%d)!\n", err);
goto error;
}
 
@@ -1219,8 +1209,7 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
 
hsi_add_clients_from_dt(port, np);
 
-   dev_info(>dev, "ssi port %u successfully initialized (cawake=%d)\n",
-   port_id, cawake_gpio);
+   dev_info(>dev, "ssi port %u successfully initialized\n", port_id);
 
return 0;
 
-- 
2.8.1

[PATCH 3/6] HSI: omap_ssi: make sure probe stays available

2016-04-29 Thread Sebastian Reichel

device can be unbind/rebind, so probe should
stay available.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c  | 17 +
 drivers/hsi/controllers/omap_ssi_port.c | 19 ++-
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index 2dd46b219af2..ffb921482e76 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -140,7 +140,7 @@ static const struct file_operations ssi_gdd_regs_fops = {
.release= single_release,
 };
 
-static int __init ssi_debug_add_ctrl(struct hsi_controller *ssi)
+static int ssi_debug_add_ctrl(struct hsi_controller *ssi)
 {
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
struct dentry *dir;
@@ -290,7 +290,7 @@ static unsigned long ssi_get_clk_rate(struct hsi_controller 
*ssi)
return rate;
 }
 
-static int __init ssi_get_iomem(struct platform_device *pd,
+static int ssi_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
struct resource *mem;
@@ -310,7 +310,7 @@ static int __init ssi_get_iomem(struct platform_device *pd,
return 0;
 }
 
-static int __init ssi_add_controller(struct hsi_controller *ssi,
+static int ssi_add_controller(struct hsi_controller *ssi,
struct platform_device *pd)
 {
struct omap_ssi_controller *omap_ssi;
@@ -386,7 +386,7 @@ out_err:
return err;
 }
 
-static int __init ssi_hw_init(struct hsi_controller *ssi)
+static int ssi_hw_init(struct hsi_controller *ssi)
 {
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
unsigned int i;
@@ -456,7 +456,7 @@ static int ssi_remove_ports(struct device *dev, void *c)
return 0;
 }
 
-static int __init ssi_probe(struct platform_device *pd)
+static int ssi_probe(struct platform_device *pd)
 {
struct platform_device *childpdev;
struct device_node *np = pd->dev.of_node;
@@ -522,7 +522,7 @@ out1:
return err;
 }
 
-static int __exit ssi_remove(struct platform_device *pd)
+static int ssi_remove(struct platform_device *pd)
 {
struct hsi_controller *ssi = platform_get_drvdata(pd);
 
@@ -592,7 +592,8 @@ MODULE_DEVICE_TABLE(of, omap_ssi_of_match);
 #endif
 
 static struct platform_driver ssi_pdriver = {
-   .remove = __exit_p(ssi_remove),
+   .probe = ssi_probe,
+   .remove = ssi_remove,
.driver = {
.name   = "omap_ssi",
.pm = DEV_PM_OPS,
@@ -600,7 +601,7 @@ static struct platform_driver ssi_pdriver = {
},
 };
 
-module_platform_driver_probe(ssi_pdriver, ssi_probe);
+module_platform_driver(ssi_pdriver);
 
 MODULE_ALIAS("platform:omap_ssi");
 MODULE_AUTHOR("Carlos Chinea ");
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index 948bdc7946fb..530095ed39e7 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -171,7 +171,7 @@ static int ssi_div_set(void *data, u64 val)
 
 DEFINE_SIMPLE_ATTRIBUTE(ssi_sst_div_fops, ssi_div_get, ssi_div_set, "%llu\n");
 
-static int __init ssi_debug_add_port(struct omap_ssi_port *omap_port,
+static int ssi_debug_add_port(struct omap_ssi_port *omap_port,
 struct dentry *dir)
 {
struct hsi_port *port = to_hsi_port(omap_port->dev);
@@ -1007,7 +1007,7 @@ static irqreturn_t ssi_wake_isr(int irq __maybe_unused, 
void *ssi_port)
return IRQ_HANDLED;
 }
 
-static int __init ssi_port_irq(struct hsi_port *port,
+static int ssi_port_irq(struct hsi_port *port,
struct platform_device *pd)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
@@ -1029,7 +1029,7 @@ static int __init ssi_port_irq(struct hsi_port *port,
return err;
 }
 
-static int __init ssi_wake_irq(struct hsi_port *port,
+static int ssi_wake_irq(struct hsi_port *port,
struct platform_device *pd)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
@@ -1060,7 +1060,7 @@ static int __init ssi_wake_irq(struct hsi_port *port,
return err;
 }
 
-static void __init ssi_queues_init(struct omap_ssi_port *omap_port)
+static void ssi_queues_init(struct omap_ssi_port *omap_port)
 {
unsigned int ch;
 
@@ -1071,7 +1071,7 @@ static void __init ssi_queues_init(struct omap_ssi_port 
*omap_port)
INIT_LIST_HEAD(_port->brkqueue);
 }
 
-static int __init ssi_port_get_iomem(struct platform_device *pd,
+static int ssi_port_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
struct hsi_port *port = platform_get_drvdata(pd);
@@ -1104,7 +1104,7 @@ static int __init

[PATCH 1/6] HSI: omap_ssi_port: switch to gpiod API

2016-04-29 Thread Sebastian Reichel

Simplify driver by switching to new gpio descriptor based API.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c  |  1 -
 drivers/hsi/controllers/omap_ssi.h  |  4 ++--
 drivers/hsi/controllers/omap_ssi_port.c | 31 ++-
 3 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index 27b91f14ba7a..c582229d1cd2 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -24,7 +24,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/hsi/controllers/omap_ssi.h 
b/drivers/hsi/controllers/omap_ssi.h
index f9aaf37262be..1fa028078a3c 100644
--- a/drivers/hsi/controllers/omap_ssi.h
+++ b/drivers/hsi/controllers/omap_ssi.h
@@ -27,7 +27,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -97,7 +97,7 @@ struct omap_ssi_port {
struct list_headbrkqueue;
unsigned intirq;
int wake_irq;
-   int wake_gpio;
+   struct gpio_desc*wake_gpio;
struct tasklet_struct   pio_tasklet;
struct tasklet_struct   wake_tasklet;
boolwktest:1; /* FIXME: HACK to be removed */
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index e80a66e20998..948bdc7946fb 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 
 #include "omap_ssi_regs.h"
@@ -43,7 +43,7 @@ static inline int hsi_dummy_cl(struct hsi_client *cl 
__maybe_unused)
 static inline unsigned int ssi_wakein(struct hsi_port *port)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
-   return gpio_get_value(omap_port->wake_gpio);
+   return gpiod_get_value(omap_port->wake_gpio);
 }
 
 #ifdef CONFIG_DEBUG_FS
@@ -1036,12 +1036,12 @@ static int __init ssi_wake_irq(struct hsi_port *port,
int cawake_irq;
int err;
 
-   if (omap_port->wake_gpio == -1) {
+   if (!omap_port->wake_gpio) {
omap_port->wake_irq = -1;
return 0;
}
 
-   cawake_irq = gpio_to_irq(omap_port->wake_gpio);
+   cawake_irq = gpiod_to_irq(omap_port->wake_gpio);
 
omap_port->wake_irq = cawake_irq;
tasklet_init(_port->wake_tasklet, ssi_wake_tasklet,
@@ -,7 +,7 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
struct omap_ssi_port *omap_port;
struct hsi_controller *ssi = dev_get_drvdata(pd->dev.parent);
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
-   int cawake_gpio = 0;
+   struct gpio_desc *cawake_gpio = NULL;
u32 port_id;
int err;
 
@@ -1147,20 +1147,10 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
goto error;
}
 
-   err = of_get_named_gpio(np, "ti,ssi-cawake-gpio", 0);
-   if (err < 0) {
-   dev_err(>dev, "DT data is missing cawake gpio (err=%d)\n",
-   err);
-   goto error;
-   }
-   cawake_gpio = err;
-
-   err = devm_gpio_request_one(>device, cawake_gpio, GPIOF_DIR_IN,
-   "cawake");
-   if (err) {
-   dev_err(>dev, "could not request cawake gpio (err=%d)!\n",
-   err);
-   err = -ENXIO;
+   cawake_gpio = devm_gpiod_get(>dev, "ti,ssi-cawake", GPIOD_IN);
+   if (IS_ERR(cawake_gpio)) {
+   err = PTR_ERR(cawake_gpio);
+   dev_err(>dev, "couldn't get cawake gpio (err=%d)!\n", err);
goto error;
}
 
@@ -1219,8 +1209,7 @@ static int __init ssi_port_probe(struct platform_device 
*pd)
 
hsi_add_clients_from_dt(port, np);
 
-   dev_info(>dev, "ssi port %u successfully initialized (cawake=%d)\n",
-   port_id, cawake_gpio);
+   dev_info(>dev, "ssi port %u successfully initialized\n", port_id);
 
return 0;
 
-- 
2.8.1

[PATCH 3/6] HSI: omap_ssi: make sure probe stays available

2016-04-29 Thread Sebastian Reichel

device can be unbind/rebind, so probe should
stay available.

Signed-off-by: Sebastian Reichel 
---
 drivers/hsi/controllers/omap_ssi.c  | 17 +
 drivers/hsi/controllers/omap_ssi_port.c | 19 ++-
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/hsi/controllers/omap_ssi.c 
b/drivers/hsi/controllers/omap_ssi.c
index 2dd46b219af2..ffb921482e76 100644
--- a/drivers/hsi/controllers/omap_ssi.c
+++ b/drivers/hsi/controllers/omap_ssi.c
@@ -140,7 +140,7 @@ static const struct file_operations ssi_gdd_regs_fops = {
.release= single_release,
 };
 
-static int __init ssi_debug_add_ctrl(struct hsi_controller *ssi)
+static int ssi_debug_add_ctrl(struct hsi_controller *ssi)
 {
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
struct dentry *dir;
@@ -290,7 +290,7 @@ static unsigned long ssi_get_clk_rate(struct hsi_controller 
*ssi)
return rate;
 }
 
-static int __init ssi_get_iomem(struct platform_device *pd,
+static int ssi_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
struct resource *mem;
@@ -310,7 +310,7 @@ static int __init ssi_get_iomem(struct platform_device *pd,
return 0;
 }
 
-static int __init ssi_add_controller(struct hsi_controller *ssi,
+static int ssi_add_controller(struct hsi_controller *ssi,
struct platform_device *pd)
 {
struct omap_ssi_controller *omap_ssi;
@@ -386,7 +386,7 @@ out_err:
return err;
 }
 
-static int __init ssi_hw_init(struct hsi_controller *ssi)
+static int ssi_hw_init(struct hsi_controller *ssi)
 {
struct omap_ssi_controller *omap_ssi = hsi_controller_drvdata(ssi);
unsigned int i;
@@ -456,7 +456,7 @@ static int ssi_remove_ports(struct device *dev, void *c)
return 0;
 }
 
-static int __init ssi_probe(struct platform_device *pd)
+static int ssi_probe(struct platform_device *pd)
 {
struct platform_device *childpdev;
struct device_node *np = pd->dev.of_node;
@@ -522,7 +522,7 @@ out1:
return err;
 }
 
-static int __exit ssi_remove(struct platform_device *pd)
+static int ssi_remove(struct platform_device *pd)
 {
struct hsi_controller *ssi = platform_get_drvdata(pd);
 
@@ -592,7 +592,8 @@ MODULE_DEVICE_TABLE(of, omap_ssi_of_match);
 #endif
 
 static struct platform_driver ssi_pdriver = {
-   .remove = __exit_p(ssi_remove),
+   .probe = ssi_probe,
+   .remove = ssi_remove,
.driver = {
.name   = "omap_ssi",
.pm = DEV_PM_OPS,
@@ -600,7 +601,7 @@ static struct platform_driver ssi_pdriver = {
},
 };
 
-module_platform_driver_probe(ssi_pdriver, ssi_probe);
+module_platform_driver(ssi_pdriver);
 
 MODULE_ALIAS("platform:omap_ssi");
 MODULE_AUTHOR("Carlos Chinea ");
diff --git a/drivers/hsi/controllers/omap_ssi_port.c 
b/drivers/hsi/controllers/omap_ssi_port.c
index 948bdc7946fb..530095ed39e7 100644
--- a/drivers/hsi/controllers/omap_ssi_port.c
+++ b/drivers/hsi/controllers/omap_ssi_port.c
@@ -171,7 +171,7 @@ static int ssi_div_set(void *data, u64 val)
 
 DEFINE_SIMPLE_ATTRIBUTE(ssi_sst_div_fops, ssi_div_get, ssi_div_set, "%llu\n");
 
-static int __init ssi_debug_add_port(struct omap_ssi_port *omap_port,
+static int ssi_debug_add_port(struct omap_ssi_port *omap_port,
 struct dentry *dir)
 {
struct hsi_port *port = to_hsi_port(omap_port->dev);
@@ -1007,7 +1007,7 @@ static irqreturn_t ssi_wake_isr(int irq __maybe_unused, 
void *ssi_port)
return IRQ_HANDLED;
 }
 
-static int __init ssi_port_irq(struct hsi_port *port,
+static int ssi_port_irq(struct hsi_port *port,
struct platform_device *pd)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
@@ -1029,7 +1029,7 @@ static int __init ssi_port_irq(struct hsi_port *port,
return err;
 }
 
-static int __init ssi_wake_irq(struct hsi_port *port,
+static int ssi_wake_irq(struct hsi_port *port,
struct platform_device *pd)
 {
struct omap_ssi_port *omap_port = hsi_port_drvdata(port);
@@ -1060,7 +1060,7 @@ static int __init ssi_wake_irq(struct hsi_port *port,
return err;
 }
 
-static void __init ssi_queues_init(struct omap_ssi_port *omap_port)
+static void ssi_queues_init(struct omap_ssi_port *omap_port)
 {
unsigned int ch;
 
@@ -1071,7 +1071,7 @@ static void __init ssi_queues_init(struct omap_ssi_port 
*omap_port)
INIT_LIST_HEAD(_port->brkqueue);
 }
 
-static int __init ssi_port_get_iomem(struct platform_device *pd,
+static int ssi_port_get_iomem(struct platform_device *pd,
const char *name, void __iomem **pbase, dma_addr_t *phy)
 {
struct hsi_port *port = platform_get_drvdata(pd);
@@ -1104,7 +1104,7 @@ static int __init ssi_port_get_iomem(struct 
platform_device

[PATCH 0/6] omap-ssi cleanups + dvfs support

2016-04-29 Thread Sebastian Reichel

Hi,

The following patches add a few cleanups to the omap-ssi driver,
fix module unloading (and reloading) and merge omap-ssi and
omap-ssi-port into the same module to avoid a circular dependency
introduced by the last patch.

P.S.: The last patch has already been sent in this patchset
(https://lkml.org/lkml/2016/1/30/283), but could not be applied
due to the circular dependency.

-- Sebastian

Sebastian Reichel (6):
  HSI: omap_ssi_port: switch to gpiod API
  HSI: omap_ssi: fix module unloading
  HSI: omap_ssi: make sure probe stays available
  HSI: omap_ssi: fix removal of port platform device
  HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module
  HSI: omap-ssi: add clk change support

 drivers/hsi/controllers/Kconfig|   5 -
 drivers/hsi/controllers/Makefile   |   4 +-
 drivers/hsi/controllers/omap_ssi.h |  12 ++-
 .../controllers/{omap_ssi.c => omap_ssi_core.c}| 106 ++---
 drivers/hsi/controllers/omap_ssi_port.c|  84 
 5 files changed, 146 insertions(+), 65 deletions(-)
 rename drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} (86%)

-- 
2.8.1

[PATCH 0/6] omap-ssi cleanups + dvfs support

2016-04-29 Thread Sebastian Reichel

Hi,

The following patches add a few cleanups to the omap-ssi driver,
fix module unloading (and reloading) and merge omap-ssi and
omap-ssi-port into the same module to avoid a circular dependency
introduced by the last patch.

P.S.: The last patch has already been sent in this patchset
(https://lkml.org/lkml/2016/1/30/283), but could not be applied
due to the circular dependency.

-- Sebastian

Sebastian Reichel (6):
  HSI: omap_ssi_port: switch to gpiod API
  HSI: omap_ssi: fix module unloading
  HSI: omap_ssi: make sure probe stays available
  HSI: omap_ssi: fix removal of port platform device
  HSI: omap_ssi: built omap_ssi and omap_ssi_port into one module
  HSI: omap-ssi: add clk change support

 drivers/hsi/controllers/Kconfig|   5 -
 drivers/hsi/controllers/Makefile   |   4 +-
 drivers/hsi/controllers/omap_ssi.h |  12 ++-
 .../controllers/{omap_ssi.c => omap_ssi_core.c}| 106 ++---
 drivers/hsi/controllers/omap_ssi_port.c|  84 
 5 files changed, 146 insertions(+), 65 deletions(-)
 rename drivers/hsi/controllers/{omap_ssi.c => omap_ssi_core.c} (86%)

-- 
2.8.1

Re: [RFC PATCH 1/1] clk: imx7d: move clk setting out of imx7d_clocks_init

2016-04-29 Thread Stefan Agner

On 2016-04-29 02:45, Dong Aisheng wrote:
> During kernel early booting(e.g. in time_init()), there's only one
> init idle task running, and the idle sched class indicates that it's
> not valid to schedule for idle task. If it happens the kernel
> will complain with a error message as follows:
> [0.00] bad: scheduling from the idle thread!
> 
> We can observe this warning on an i.MX7D SDB board. See full log below.
> It is caused by imx7d_clocks_init function called in time_init
> invokes a lot clk_prepare_enable to enable many clocks and it happens
> that the Audio/Video PLLs need large delay causes a sleep.
> 
> Since we should not sleep during time_init, this patch fundamentally
> moves all clk_prepare_enable and clk_set_parent out of imx7d_clocks_init
> and use a postcore init function imx7d_clocks_setup to do it later instead.
> Then we simply reply on the bootloader settings to do early boot.

Is this really a good idea? What happens if something requests a clock
before postcore initcalls get called? E.g. clock source is initialized
before that, which might request a clock...

Ok, we can just say that all those clocks need to be bootloader on, but
how do we know? Do we catch if the bootloader does not adhere to that? 

--
Stefan


> 
> [0.00] bad: scheduling from the idle thread!
> [0.00] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW  
> 4.6.0-rc3-00064-gded8bc08fb45-dirty #836
> [0.00] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [0.00] Backtrace:
> [0.00] [] (dump_backtrace) from []
> (show_stack+0x18/0x1c)
> [0.00]  r6:6053 r5: r4:c0d21f9c r3:
> [0.00] [] (show_stack) from []
> (dump_stack+0xb4/0xe8)
> [0.00] [] (dump_stack) from []
> (dequeue_task_idle+0x20/0x30)
> [0.00]  r10:c08f5554 r9:c0d02b0c r8:c0d06844 r7:ef7c14d0
> r6:0001 r5:ef7c14c0
> [0.00]  r4:ef7c14c0 r3:
> [0.00] [] (dequeue_task_idle) from []
> (deactivate_task+0x64/0x68)
> [0.00]  r4:c0d06500 r3:c0154368
> [0.00] [] (deactivate_task) from []
> (__schedule+0x2e8/0x738)
> [0.00]  r6:c0d06500 r5:ef7c14c0 r4:c0c744c0 r3:0002
> [0.00] [] (__schedule) from [] 
> (schedule+0x48/0xa0)
> [0.00]  r10:01b6 r9:0003 r8:0036 r7:
> r6:0007a120 r5:
> [0.00]  r4:c0d0
> [0.00] [] (schedule) from []
> (schedule_hrtimeout_range_clock+0xac/0x12c)
> [0.00]  r4:0006ddd0 r3:c0d06500
> [0.00] [] (schedule_hrtimeout_range_clock) from
> [] (schedule_hrtimeout_range+0x24/0x2c)
> [0.00]  r7:c0d5d434 r6:c0d02100 r5:8ad1 r4:ef00c040
> [0.00] [] (schedule_hrtimeout_range) from
> [] (usleep_range+0x54/0x5c)
> [0.00] [] (usleep_range) from []
> (clk_pllv3_wait_lock+0x7c/0xb4)
> [0.00] [] (clk_pllv3_wait_lock) from []
> (clk_pllv3_prepare+0x2c/0x30)
> [0.00]  r6: r5:c15603f4 r4:ef007680 r3:201b
> [0.00] [] (clk_pllv3_prepare) from []
> (clk_core_prepare+0x98/0xc8)
> [0.00] [] (clk_core_prepare) from []
> (clk_prepare+0x20/0x38)
> [0.00]  r5:c15603f4 r4:ef00c100
> [0.00] [] (clk_prepare) from []
> (imx7d_clocks_init+0x6108/0x6188)
> [0.00]  r4:ef00c100 r3:0001
> [0.00] [] (imx7d_clocks_init) from []
> (of_clk_init+0x10c/0x1d4)
> [0.00]  r10:0001 r9:c0d01f68 r8: r7:c0d01f60
> r6:ef7e3d60 r5:ef004340
> [0.00]  r4:0002
> [0.00] [] (of_clk_init) from []
> (time_init+0x2c/0x38)
> [0.00]  r10:efffcac0 r9:c0c5ca48 r8:c0d76000 r7:c0d028c0
> r6: r5:c0d76000
> [0.00]  r4:
> [0.00] [] (time_init) from []
> (start_kernel+0x218/0x398)
> [0.00] [] (start_kernel) from [<8000807c>] (0x8000807c)
> [0.00]  r10: r9:410fc075 r8:8000406a r7:c0d07e8c
> r6:c0c5ca44 r5:c0d0296c
> [0.00]  r4:c0d76294
> [0.00] Architected cp15 timer(s) running at 8.00MHz (virt).
> [0.00] clocksource: arch_sys_counter: mask: 0xff
> max_cycles: 0x1d854df40, max_idle_ns: 440795202120 ns
> 
> Cc: Michael Turquette 
> Cc: Stephen Boyd 
> Cc: Shawn Guo 
> Cc: Stefan Agner 
> Cc: Lucas Stach 
> Signed-off-by: Dong Aisheng 
> ---
>  drivers/clk/imx/clk-imx7d.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/clk/imx/clk-imx7d.c b/drivers/clk/imx/clk-imx7d.c
> index 7912be83c4af..3be2e9371491 100644
> --- a/drivers/clk/imx/clk-imx7d.c
> +++ b/drivers/clk/imx/clk-imx7d.c
> @@ -378,7 +378,6 @@ static void __init imx7d_clocks_init(struct
> device_node *ccm_node)
>  {
>   struct device_node *np;
>   void __iomem *base;
> - int i;
>  
>   clks[IMX7D_CLK_DUMMY] = imx_clk_fixed("dummy", 0);
>   clks[IMX7D_OSC_24M_CLK] =

Re: [RFC PATCH 1/1] clk: imx7d: move clk setting out of imx7d_clocks_init

2016-04-29 Thread Stefan Agner

On 2016-04-29 02:45, Dong Aisheng wrote:
> During kernel early booting(e.g. in time_init()), there's only one
> init idle task running, and the idle sched class indicates that it's
> not valid to schedule for idle task. If it happens the kernel
> will complain with a error message as follows:
> [0.00] bad: scheduling from the idle thread!
> 
> We can observe this warning on an i.MX7D SDB board. See full log below.
> It is caused by imx7d_clocks_init function called in time_init
> invokes a lot clk_prepare_enable to enable many clocks and it happens
> that the Audio/Video PLLs need large delay causes a sleep.
> 
> Since we should not sleep during time_init, this patch fundamentally
> moves all clk_prepare_enable and clk_set_parent out of imx7d_clocks_init
> and use a postcore init function imx7d_clocks_setup to do it later instead.
> Then we simply reply on the bootloader settings to do early boot.

Is this really a good idea? What happens if something requests a clock
before postcore initcalls get called? E.g. clock source is initialized
before that, which might request a clock...

Ok, we can just say that all those clocks need to be bootloader on, but
how do we know? Do we catch if the bootloader does not adhere to that? 

--
Stefan


> 
> [0.00] bad: scheduling from the idle thread!
> [0.00] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW  
> 4.6.0-rc3-00064-gded8bc08fb45-dirty #836
> [0.00] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [0.00] Backtrace:
> [0.00] [] (dump_backtrace) from []
> (show_stack+0x18/0x1c)
> [0.00]  r6:6053 r5: r4:c0d21f9c r3:
> [0.00] [] (show_stack) from []
> (dump_stack+0xb4/0xe8)
> [0.00] [] (dump_stack) from []
> (dequeue_task_idle+0x20/0x30)
> [0.00]  r10:c08f5554 r9:c0d02b0c r8:c0d06844 r7:ef7c14d0
> r6:0001 r5:ef7c14c0
> [0.00]  r4:ef7c14c0 r3:
> [0.00] [] (dequeue_task_idle) from []
> (deactivate_task+0x64/0x68)
> [0.00]  r4:c0d06500 r3:c0154368
> [0.00] [] (deactivate_task) from []
> (__schedule+0x2e8/0x738)
> [0.00]  r6:c0d06500 r5:ef7c14c0 r4:c0c744c0 r3:0002
> [0.00] [] (__schedule) from [] 
> (schedule+0x48/0xa0)
> [0.00]  r10:01b6 r9:0003 r8:0036 r7:
> r6:0007a120 r5:
> [0.00]  r4:c0d0
> [0.00] [] (schedule) from []
> (schedule_hrtimeout_range_clock+0xac/0x12c)
> [0.00]  r4:0006ddd0 r3:c0d06500
> [0.00] [] (schedule_hrtimeout_range_clock) from
> [] (schedule_hrtimeout_range+0x24/0x2c)
> [0.00]  r7:c0d5d434 r6:c0d02100 r5:8ad1 r4:ef00c040
> [0.00] [] (schedule_hrtimeout_range) from
> [] (usleep_range+0x54/0x5c)
> [0.00] [] (usleep_range) from []
> (clk_pllv3_wait_lock+0x7c/0xb4)
> [0.00] [] (clk_pllv3_wait_lock) from []
> (clk_pllv3_prepare+0x2c/0x30)
> [0.00]  r6: r5:c15603f4 r4:ef007680 r3:201b
> [0.00] [] (clk_pllv3_prepare) from []
> (clk_core_prepare+0x98/0xc8)
> [0.00] [] (clk_core_prepare) from []
> (clk_prepare+0x20/0x38)
> [0.00]  r5:c15603f4 r4:ef00c100
> [0.00] [] (clk_prepare) from []
> (imx7d_clocks_init+0x6108/0x6188)
> [0.00]  r4:ef00c100 r3:0001
> [0.00] [] (imx7d_clocks_init) from []
> (of_clk_init+0x10c/0x1d4)
> [0.00]  r10:0001 r9:c0d01f68 r8: r7:c0d01f60
> r6:ef7e3d60 r5:ef004340
> [0.00]  r4:0002
> [0.00] [] (of_clk_init) from []
> (time_init+0x2c/0x38)
> [0.00]  r10:efffcac0 r9:c0c5ca48 r8:c0d76000 r7:c0d028c0
> r6: r5:c0d76000
> [0.00]  r4:
> [0.00] [] (time_init) from []
> (start_kernel+0x218/0x398)
> [0.00] [] (start_kernel) from [<8000807c>] (0x8000807c)
> [0.00]  r10: r9:410fc075 r8:8000406a r7:c0d07e8c
> r6:c0c5ca44 r5:c0d0296c
> [0.00]  r4:c0d76294
> [0.00] Architected cp15 timer(s) running at 8.00MHz (virt).
> [0.00] clocksource: arch_sys_counter: mask: 0xff
> max_cycles: 0x1d854df40, max_idle_ns: 440795202120 ns
> 
> Cc: Michael Turquette 
> Cc: Stephen Boyd 
> Cc: Shawn Guo 
> Cc: Stefan Agner 
> Cc: Lucas Stach 
> Signed-off-by: Dong Aisheng 
> ---
>  drivers/clk/imx/clk-imx7d.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/clk/imx/clk-imx7d.c b/drivers/clk/imx/clk-imx7d.c
> index 7912be83c4af..3be2e9371491 100644
> --- a/drivers/clk/imx/clk-imx7d.c
> +++ b/drivers/clk/imx/clk-imx7d.c
> @@ -378,7 +378,6 @@ static void __init imx7d_clocks_init(struct
> device_node *ccm_node)
>  {
>   struct device_node *np;
>   void __iomem *base;
> - int i;
>  
>   clks[IMX7D_CLK_DUMMY] = imx_clk_fixed("dummy", 0);
>   clks[IMX7D_OSC_24M_CLK] = of_clk_get_by_name(ccm_node, "osc");
> @@ -409,13 +408,6 @@ static void __init imx7d_clocks_init(struct
> device_node *ccm_node)
>

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Rik van Riel

On Fri, 2016-04-29 at 16:51 -0700, Linus Torvalds wrote:

> There's presumably a few optimal values from a "spread bits out
> evenly" standpoint, and they won't have anything to do with random
> irrational constants, and will have everything to do with having nice
> bitpatterns.
> 
> I'm adding Rik to the cc, because the original broken constants came
> from him long long ago (they go back to 2002, originally only used
> for
> the waitqueue hashing. Maybe he had some other input that caused him
> to believe that the primeness actually mattered.

I do not remember where I got that hashing algorithm and
magic constant from 14 years ago, but the changelog suggests
I got it from Chuck Lever's paper.

Chuck Lever's paper does mention that primeness "adds
certain desirable qualities", and I may have read too
much into that.

I really do not remember the "bit sparse" thing at all,
and have no idea
where that came from. Googling old email
threads about the code mostly
makes me wonder "hey, where
did that person go?"

I am all for magic numbers that work better.

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Rik van Riel

On Fri, 2016-04-29 at 16:51 -0700, Linus Torvalds wrote:

> There's presumably a few optimal values from a "spread bits out
> evenly" standpoint, and they won't have anything to do with random
> irrational constants, and will have everything to do with having nice
> bitpatterns.
> 
> I'm adding Rik to the cc, because the original broken constants came
> from him long long ago (they go back to 2002, originally only used
> for
> the waitqueue hashing. Maybe he had some other input that caused him
> to believe that the primeness actually mattered.

I do not remember where I got that hashing algorithm and
magic constant from 14 years ago, but the changelog suggests
I got it from Chuck Lever's paper.

Chuck Lever's paper does mention that primeness "adds
certain desirable qualities", and I may have read too
much into that.

I really do not remember the "bit sparse" thing at all,
and have no idea
where that came from. Googling old email
threads about the code mostly
makes me wonder "hey, where
did that person go?"

I am all for magic numbers that work better.

-- 
All Rights Reversed.

signature.asc
Description: This is a digitally signed message part

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 5:32 PM, George Spelvin  wrote:
>
> Odd is important.  If the multiplier is even, the msbit of the input
> doesn't affect the hash result at all.

Fair enough. My test-set was incomplete.

>> Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.
>
> It's not as clever as it could be; it just does the same Booth
> recoding thing, a simple series of shifts with add/subtract.

Ahh. I thought gcc did the Bernstein's algorithm thing, which is
exponential in the bit size. That would have explained why it only
does it for 32-bit constants.

Not doing it for 64-bit constants makes no sense if it just uses the
trivial Booth's algorithm version.

So the odd "we don't do it for 64-bit" is apparently just an
oversight, not because gcc does something clever.

Oh well.

 Linus

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 5:32 PM, George Spelvin  wrote:
>
> Odd is important.  If the multiplier is even, the msbit of the input
> doesn't affect the hash result at all.

Fair enough. My test-set was incomplete.

>> Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.
>
> It's not as clever as it could be; it just does the same Booth
> recoding thing, a simple series of shifts with add/subtract.

Ahh. I thought gcc did the Bernstein's algorithm thing, which is
exponential in the bit size. That would have explained why it only
does it for 32-bit constants.

Not doing it for 64-bit constants makes no sense if it just uses the
trivial Booth's algorithm version.

So the odd "we don't do it for 64-bit" is apparently just an
oversight, not because gcc does something clever.

Oh well.

 Linus

Re:Good Day

2016-04-29 Thread Mr. Hirano Nobuyuki

I am Mr. Hirano Nobuyuki , from Japan, I will like to seek your 
partnership and

mutual understanding regarding a beneficial transaction.

Reply Immediately to my email below for Details .
 ( hiranonobuyuki...@lycos.com )

Regards For Hirano
Sincerely,
Mr.Nobuyuki Hirano
Bank Of Tokyo-Mitsubishi UFJ

Re:Good Day

2016-04-29 Thread Mr. Hirano Nobuyuki

I am Mr. Hirano Nobuyuki , from Japan, I will like to seek your 
partnership and

mutual understanding regarding a beneficial transaction.

Reply Immediately to my email below for Details .
 ( hiranonobuyuki...@lycos.com )

Regards For Hirano
Sincerely,
Mr.Nobuyuki Hirano
Bank Of Tokyo-Mitsubishi UFJ

Re: [PATCH 00/14] staging: fsl-mc: misc updates

2016-04-29 Thread Greg KH

On Mon, Apr 11, 2016 at 11:48:25AM -0500, Stuart Yoder wrote:
> From: Stuart Yoder 
> 
> This patch series makes further progress towards completing the fsl-mc
> TODO list.

I don't seem to have gotten patch 14/14, can you resend just that one?

thanks,

greg k-h

Re: [PATCH 00/14] staging: fsl-mc: misc updates

2016-04-29 Thread Greg KH

On Mon, Apr 11, 2016 at 11:48:25AM -0500, Stuart Yoder wrote:
> From: Stuart Yoder 
> 
> This patch series makes further progress towards completing the fsl-mc
> TODO list.

I don't seem to have gotten patch 14/14, can you resend just that one?

thanks,

greg k-h

[PATCH] x86, perf: Add model numbers for Kabylake CPUs

2016-04-29 Thread Andi Kleen

From: Andi Kleen 

Everything the same as Skylake, just new model numbers.

Signed-off-by: Andi Kleen 
---
 arch/x86/events/intel/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 79b59437f5ee..90ba3ae3074e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3794,6 +3794,8 @@ __init int intel_pmu_init(void)
pr_cont("Knights Landing events, ");
break;
 
+   case 142: /* 14nm Kabylake Mobile */
+   case 158: /* 14nm Kabylake Desktop */
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
case 85: /* 14nm Skylake Server */
-- 
2.5.5

[PATCH] x86, perf: Add model numbers for Kabylake CPUs

2016-04-29 Thread Andi Kleen

From: Andi Kleen 

Everything the same as Skylake, just new model numbers.

Signed-off-by: Andi Kleen 
---
 arch/x86/events/intel/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 79b59437f5ee..90ba3ae3074e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3794,6 +3794,8 @@ __init int intel_pmu_init(void)
pr_cont("Knights Landing events, ");
break;
 
+   case 142: /* 14nm Kabylake Mobile */
+   case 158: /* 14nm Kabylake Desktop */
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
case 85: /* 14nm Skylake Server */
-- 
2.5.5

Re: [PATCH v2 1/4] drivers: staging: fix parameter alignment

2016-04-29 Thread Greg KH

On Mon, Apr 18, 2016 at 10:35:28AM +1000, Tobin C Harding wrote:
> drivers/staging/android/ion/ion.c checkpatch produces alignment checks.
> 
> This patch is whitespace only and fixes these checks.
> 
> Signed-off-by: Tobin C Harding 
> ---
>  drivers/staging/android/ion/ion.c | 64 
> +++
>  1 file changed, 32 insertions(+), 32 deletions(-)

Please put the driver's name in the subject line to make it easier to
see.  For this one it would be:
Subject: [PATCH v2 1/4] staging: android: ion: fix parameter alignment

Please fix up and resend the series after dropping the one patch.

thanks,

greg k-h

Re: [PATCH v2 1/4] drivers: staging: fix parameter alignment

2016-04-29 Thread Greg KH

On Mon, Apr 18, 2016 at 10:35:28AM +1000, Tobin C Harding wrote:
> drivers/staging/android/ion/ion.c checkpatch produces alignment checks.
> 
> This patch is whitespace only and fixes these checks.
> 
> Signed-off-by: Tobin C Harding 
> ---
>  drivers/staging/android/ion/ion.c | 64 
> +++
>  1 file changed, 32 insertions(+), 32 deletions(-)

Please put the driver's name in the subject line to make it easier to
see.  For this one it would be:
Subject: [PATCH v2 1/4] staging: android: ion: fix parameter alignment

Please fix up and resend the series after dropping the one patch.

thanks,

greg k-h

Re: [PATCH] staging: rts5208: Unnecessary parantheses around chip->sd_card

2016-04-29 Thread Greg KH

On Fri, Apr 29, 2016 at 11:15:27AM +0530, Manav Batra wrote:
> Removes unnecessary parantheses around chip->sd_card
> Signed-off-by: Manav Batra 
> ---
>  drivers/staging/rts5208/sd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

HTML patches do not work, sorry :(

Also, put a blank line between your changelog text and the signed-off-by
line.

thanks,

greg k-h

Re: [PATCH] staging: rts5208: Unnecessary parantheses around chip->sd_card

2016-04-29 Thread Greg KH

On Fri, Apr 29, 2016 at 11:15:27AM +0530, Manav Batra wrote:
> Removes unnecessary parantheses around chip->sd_card
> Signed-off-by: Manav Batra 
> ---
>  drivers/staging/rts5208/sd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

HTML patches do not work, sorry :(

Also, put a blank line between your changelog text and the signed-off-by
line.

thanks,

greg k-h

Re: [PATCH] staging: rts5208: Avoid multiple assignment in one line

2016-04-29 Thread Greg KH

On Thu, Apr 28, 2016 at 10:42:07PM -0700, Manav Batra wrote:
> Signed-off-by: Manav Batra 
> 
> Separates out assignment in one line to two lines.

signed-off-by goes at the end of the text, not at the top.

Please fix all of these and resend.

thanks,

greg k-h

Re: [PATCH] staging: rts5208: Avoid multiple assignment in one line

2016-04-29 Thread Greg KH

On Thu, Apr 28, 2016 at 10:42:07PM -0700, Manav Batra wrote:
> Signed-off-by: Manav Batra 
> 
> Separates out assignment in one line to two lines.

signed-off-by goes at the end of the text, not at the top.

Please fix all of these and resend.

thanks,

greg k-h

Re: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

2016-04-29 Thread Dave Hansen

On 04/29/2016 04:12 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:32:15PM -0700, Dave Hansen wrote:
>> The reason I haven't acked this patch is that I want to be _sure_ that
>> we've audited all of the call paths that access the XSAVE buffer to
>> ensure that they can all either handle the XSAVES format *or* don't care
>> for whatever reason.
>>
>> Could you share the steps that you've taken to assure yourself that all
>> of the call paths are handled and we don't have more bugs?
> 
> We tested for signal, ptrace, context switch, avx, and mpx.  We also run
> these tests with your audit patch to detect any format mis-match.
> That said, I cannot be sure there are no more bugs.  As you said, we want
> to get this feature tested in the field and find potential issues early. 

That's better than what we had before, but it relies entirely on testing
coverage and runtime checks.

Is it too much to ask that you also take a look and audit all the places
the XSAVE buffer is accessed in the kernel and ensure that they either
have code to handle standard vs. compacted/supervisor or don't care for
some reason?

I did such an audit once upon a time, but I think it would be a good
exercise to repeat both by a second set of eyes and because some time
has passed.

Re: [PATCH v4 10/10] x86/xsaves: Re-enable XSAVES

2016-04-29 Thread Dave Hansen

On 04/29/2016 04:12 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:32:15PM -0700, Dave Hansen wrote:
>> The reason I haven't acked this patch is that I want to be _sure_ that
>> we've audited all of the call paths that access the XSAVE buffer to
>> ensure that they can all either handle the XSAVES format *or* don't care
>> for whatever reason.
>>
>> Could you share the steps that you've taken to assure yourself that all
>> of the call paths are handled and we don't have more bugs?
> 
> We tested for signal, ptrace, context switch, avx, and mpx.  We also run
> these tests with your audit patch to detect any format mis-match.
> That said, I cannot be sure there are no more bugs.  As you said, we want
> to get this feature tested in the field and find potential issues early. 

That's better than what we had before, but it relies entirely on testing
coverage and runtime checks.

Is it too much to ask that you also take a look and audit all the places
the XSAVE buffer is accessed in the kernel and ensure that they either
have code to handle standard vs. compacted/supervisor or don't care for
some reason?

I did such an audit once upon a time, but I think it would be a good
exercise to repeat both by a second set of eyes and because some time
has passed.

Re: [PATCH v2 00/13] De-stage Sync File Framework

2016-04-29 Thread Greg Kroah-Hartman

On Thu, Apr 28, 2016 at 10:46:47AM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Hi,
> 
> This patchset sits on top of Sync ABI Rework v13:
> 
> https://www.spinics.net/lists/dri-devel/msg105667.html
> 
> The first eight clean up and prepare sync_file for de-staging. The last four
> patches do the de-staging, moving files to drivers/dma-buf/ and include/linux/
> plus adding Documentation.
> 
> As the de-stage depends upon many changes on the staging tree it would
> be good to get all the patches merged through the staging tree if Sumit
> agrees with that.
> 
> The next step on the Sync de-stage is clean up the remaining bits
> of the Sync Framework, mainly SW_SYNC, which is only used for testing.
> 
> v2: - Add Reviewed-by: tag from Daniel Vetter to all patches.
> - Take in sugestions for the Sync File Documentation (Daniel)
> - Remove name arg from sync_file_crate() (Daniel)
> - Revome leftover EXPORT_SYMBOL(sync_file_merge) (Daniel)

Thanks for sticking with this, both series of patches are now applied,
thanks.

greg k-h

Re: [PATCH v2 00/13] De-stage Sync File Framework

2016-04-29 Thread Greg Kroah-Hartman

On Thu, Apr 28, 2016 at 10:46:47AM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Hi,
> 
> This patchset sits on top of Sync ABI Rework v13:
> 
> https://www.spinics.net/lists/dri-devel/msg105667.html
> 
> The first eight clean up and prepare sync_file for de-staging. The last four
> patches do the de-staging, moving files to drivers/dma-buf/ and include/linux/
> plus adding Documentation.
> 
> As the de-stage depends upon many changes on the staging tree it would
> be good to get all the patches merged through the staging tree if Sumit
> agrees with that.
> 
> The next step on the Sync de-stage is clean up the remaining bits
> of the Sync Framework, mainly SW_SYNC, which is only used for testing.
> 
> v2: - Add Reviewed-by: tag from Daniel Vetter to all patches.
> - Take in sugestions for the Sync File Documentation (Daniel)
> - Remove name arg from sync_file_crate() (Daniel)
> - Revome leftover EXPORT_SYMBOL(sync_file_merge) (Daniel)

Thanks for sticking with this, both series of patches are now applied,
thanks.

greg k-h

Re: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

2016-04-29 Thread Dave Hansen

On 04/29/2016 03:43 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:09:07PM -0700, Dave Hansen wrote:
>> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
>>> +static int may_copy_fpregs_to_sigframe(void)
>>> +{
>>> +   /*
>>> +* In signal handling path, the kernel already checks if
>>> +* FPU instructions have been used before it calls
>>> +* copy_fpstate_to_sigframe(). We check this here again
>>> +* to detect any potential mis-use and saving invalid
>>> +* register values directly to a signal frame.
>>> +*/
>>> +   WARN_ONCE(!current->thread.fpu.fpstate_active,
>>> + "direct FPU save with no math use\n");
>>
>> This is probably an OK check for this _particular_ context (since this
>> context is all ready to copy_to_user() the fpu state).  But is it good
>> generally?  Why couldn't you have a !fpstate_active thread that _was_
>> fpregs_active?
>>
>> Such a thread _could_ do a direct XSAVE with no issues.
> 
> But it won't come to this function unless fpstate_active is ture?

If may_copy_fpregs_to_sigframe() were called from a slightly different
context, or if we change the call-site, what breaks?

In other words. if we can still "may_copy_fpregs_to_sigframe()" no
matter the state of fpu.fpstate_active, then I don't think we should be
checking it in may_copy_fpregs_to_sigframe().

Re: [PATCH v4 04/10] x86/xsaves: Introduce a new check that allows correct xstates copy from kernel to user directly

2016-04-29 Thread Dave Hansen

On 04/29/2016 03:43 PM, Yu-cheng Yu wrote:
> On Fri, Apr 29, 2016 at 01:09:07PM -0700, Dave Hansen wrote:
>> On 03/04/2016 10:12 AM, Yu-cheng Yu wrote:
>>> +static int may_copy_fpregs_to_sigframe(void)
>>> +{
>>> +   /*
>>> +* In signal handling path, the kernel already checks if
>>> +* FPU instructions have been used before it calls
>>> +* copy_fpstate_to_sigframe(). We check this here again
>>> +* to detect any potential mis-use and saving invalid
>>> +* register values directly to a signal frame.
>>> +*/
>>> +   WARN_ONCE(!current->thread.fpu.fpstate_active,
>>> + "direct FPU save with no math use\n");
>>
>> This is probably an OK check for this _particular_ context (since this
>> context is all ready to copy_to_user() the fpu state).  But is it good
>> generally?  Why couldn't you have a !fpstate_active thread that _was_
>> fpregs_active?
>>
>> Such a thread _could_ do a direct XSAVE with no issues.
> 
> But it won't come to this function unless fpstate_active is ture?

If may_copy_fpregs_to_sigframe() were called from a slightly different
context, or if we change the call-site, what breaks?

In other words. if we can still "may_copy_fpregs_to_sigframe()" no
matter the state of fpu.fpstate_active, then I don't think we should be
checking it in may_copy_fpregs_to_sigframe().

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

> At least for my tests, even that seems to actually be a total
> non-issue. Yes, odd values *might* be better, but as mentioned in my
> crossing email, it doesn't actually seem to matter for any case the
> kernel cares about, since we tend to want to hash down to 10-20 bits
> of data, so the least significant bit (particularly for the 64-bit
> case) just doesn't matter all that much.

Odd is important.  If the multiplier is even, the msbit of the input
doesn't affect the hash result at all.  x and (x + 0x8000) hash to
the same value, always.  That just seems like a crappy hash function.

> Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.

It's not as clever as it could be; it just does the same Booth
recoding thing, a simple series of shifts with add/subtract.

Here's the ARM code that GCC produces (9 instructions, all dependent):

mult1:
add r3, r0, r0, lsl #1
rsb r3, r0, r3, lsl #5
add r3, r3, r3, lsl #4
rsb r3, r3, r3, lsl #5
add r3, r0, r3, lsl #5
add r3, r0, r3, lsl #1
add r3, r0, r3, lsl #3
add r3, r0, r3, lsl #3
rsb r0, r0, r3, lsl #3
bx  lr

versus the clever code (6 instructions, #4 and #5 could dual-issue):
mult2:
add r3, r0, r0, lsl #19
add r2, r3, r0, lsl #9
add r0, r2, r0, lsl #23
add r3, r3, r2, lsl #8
rsb r0, r0, r0, lsl #6
add r0, r0, r3, lsl #3
bx  lr

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:03 PM, Doug Anderson wrote:
> Hi,
> 
> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley  
> wrote:
> 
> On 04/29/2016 04:01 PM, Doug Anderson wrote:
> > * serial allows numbering devices by alias.
> 
> Which is in fact a total nightmare.
> 
> While stable device order is mandatory in serial because of
> console command line parameters and existing userspace expectations,
> it is the number one barrier to providing a shared ttyS namespace
> for mixed uart platforms.
> 
> Stable device order has a very real (and often unforeseen) maintenance
> burden.
> 
> 
> Interesting. I wonder if these burdens are unique to serial or shared
> by all the other subsystems that allow ordering? Maybe this is all
> because of legacy reasons?

Well, the specific issue is certainly unique to serial.
But what I was suggesting is that 5 years from now, these patches
could be the "legacy reasons" in mmc.

FWIW, there is already a defacto expectation by boot configurations in the
field that a given mmc block device is stable across boots. The reality
is that 10's of kernel command lines look like:

root=/dev/mmcblk0p2

This was a recent regression fixed by Ulf in commit 9aaf3437aa72
("mmc: block: Use the mmc host device index as the mmcblk device index")

> Note that for MMC devices we wouldn't be asserting that _every_
> device has a stable order. Only those known at boot.
> 
> 
> For example, I noticed these patches are strictly for DT.
> I'm assuming that's because guaranteeing stable device order for
> mmc in general is more difficult; eg., by performing minimal scan
> serially and more expensive portion of the probe async.
> 
> 
> Presumably if there were another system similar to DT where we knew
> all the possible built-in devices right at boot then we could
> extend.
> 
> ...but yes, obviously this wouldn't be a sane thing to do for
> anything probable like PCI or USB.

SCSI does. Here's what Linus said:

"So the *simple* model is to just scan the devices minimally serially,
and allocate the names at that point (so the names are reliable
between boots for the same hardware configuration). And then do the
more expensive device setup asynchronously (ie querying device
information, spinning up disks, whatever - things that can take
anything from milliseonds to several seconds, because they are doing
actual IO). So you'd do some very basic (and _often_ fairly quick)
operations serially, but then try to do the expensive parts
concurrently.

The SCSI layer actually goes a bit further than that: it has a fairly
asynchronous scanning thing, but it does allocate the actual host
device nodes serially, and then it even has an ordered list of
"scanning_hosts" that is used to complete the scanning in-order, so
that the sysfs devices show up in the right order even if things
actually got scanned out-of-order. So scans that finished early will
wait for other scans that are for "earlier" devices, and you end up
with what *looks* ordered to the outside, even if internally it was
all done out-of-order.
"

Regards,
Peter Hurley

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

> At least for my tests, even that seems to actually be a total
> non-issue. Yes, odd values *might* be better, but as mentioned in my
> crossing email, it doesn't actually seem to matter for any case the
> kernel cares about, since we tend to want to hash down to 10-20 bits
> of data, so the least significant bit (particularly for the 64-bit
> case) just doesn't matter all that much.

Odd is important.  If the multiplier is even, the msbit of the input
doesn't affect the hash result at all.  x and (x + 0x8000) hash to
the same value, always.  That just seems like a crappy hash function.

> Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.

It's not as clever as it could be; it just does the same Booth
recoding thing, a simple series of shifts with add/subtract.

Here's the ARM code that GCC produces (9 instructions, all dependent):

mult1:
add r3, r0, r0, lsl #1
rsb r3, r0, r3, lsl #5
add r3, r3, r3, lsl #4
rsb r3, r3, r3, lsl #5
add r3, r0, r3, lsl #5
add r3, r0, r3, lsl #1
add r3, r0, r3, lsl #3
add r3, r0, r3, lsl #3
rsb r0, r0, r3, lsl #3
bx  lr

versus the clever code (6 instructions, #4 and #5 could dual-issue):
mult2:
add r3, r0, r0, lsl #19
add r2, r3, r0, lsl #9
add r0, r2, r0, lsl #23
add r3, r3, r2, lsl #8
rsb r0, r0, r0, lsl #6
add r0, r0, r3, lsl #3
bx  lr

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:03 PM, Doug Anderson wrote:
> Hi,
> 
> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley  
> wrote:
> 
> On 04/29/2016 04:01 PM, Doug Anderson wrote:
> > * serial allows numbering devices by alias.
> 
> Which is in fact a total nightmare.
> 
> While stable device order is mandatory in serial because of
> console command line parameters and existing userspace expectations,
> it is the number one barrier to providing a shared ttyS namespace
> for mixed uart platforms.
> 
> Stable device order has a very real (and often unforeseen) maintenance
> burden.
> 
> 
> Interesting. I wonder if these burdens are unique to serial or shared
> by all the other subsystems that allow ordering? Maybe this is all
> because of legacy reasons?

Well, the specific issue is certainly unique to serial.
But what I was suggesting is that 5 years from now, these patches
could be the "legacy reasons" in mmc.

FWIW, there is already a defacto expectation by boot configurations in the
field that a given mmc block device is stable across boots. The reality
is that 10's of kernel command lines look like:

root=/dev/mmcblk0p2

This was a recent regression fixed by Ulf in commit 9aaf3437aa72
("mmc: block: Use the mmc host device index as the mmcblk device index")

> Note that for MMC devices we wouldn't be asserting that _every_
> device has a stable order. Only those known at boot.
> 
> 
> For example, I noticed these patches are strictly for DT.
> I'm assuming that's because guaranteeing stable device order for
> mmc in general is more difficult; eg., by performing minimal scan
> serially and more expensive portion of the probe async.
> 
> 
> Presumably if there were another system similar to DT where we knew
> all the possible built-in devices right at boot then we could
> extend.
> 
> ...but yes, obviously this wouldn't be a sane thing to do for
> anything probable like PCI or USB.

SCSI does. Here's what Linus said:

"So the *simple* model is to just scan the devices minimally serially,
and allocate the names at that point (so the names are reliable
between boots for the same hardware configuration). And then do the
more expensive device setup asynchronously (ie querying device
information, spinning up disks, whatever - things that can take
anything from milliseonds to several seconds, because they are doing
actual IO). So you'd do some very basic (and _often_ fairly quick)
operations serially, but then try to do the expensive parts
concurrently.

The SCSI layer actually goes a bit further than that: it has a fairly
asynchronous scanning thing, but it does allocate the actual host
device nodes serially, and then it even has an ordered list of
"scanning_hosts" that is used to complete the scanning in-order, so
that the sysfs devices show up in the right order even if things
actually got scanned out-of-order. So scans that finished early will
wait for other scans that are for "earlier" devices, and you end up
with what *looks* ordered to the outside, even if internally it was
all done out-of-order.
"

Regards,
Peter Hurley

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 02:04:18PM +0200, Michal Hocko wrote:
> I would also like to revisit generic inode/dentry shrinker and see
> whether it could be more __GFP_FS friendly. As you say many FS might
> even not depend on some FS internal locks so pushing GFP_FS check down
> the layers might make a lot of sense and allow to clean some [id]cache
> even for __GFP_FS context.

That's precisely my point about passing a context to the shrinker.
It's recursion within a single superblock context that makes up the
majority of cases GFP_NOFS is used for, so passing the superblock
immediately allows for reclaim to run the superblock shrinker on
every other superblock.

We can refine it further from there, but I strongly suspect that
further refinement is going to require filesystems to specifically
configure the superblock shrinker.

e.g. in XFS, we can't allow evict() even on clean VFS inodes in a
PF_FSTRANS context, because we may run a transaction on a clean
VFS inode to prepare it for reclaim.  We can, however,
allow the fs-specific shrinker callouts to run (i.e. call into
.free_cached_objects) so that it can reclaim clean XFS inodes
because that doesn't require transactions

i.e. the infrastructure I suggested we use is aimed directly at
providing the mechanism required for finer-grained inode/dentry
cache reclaim in contexts that it is currently disallowed
completely. I was also implying that once the infrastructure to pass
contexts is in place, I'd then make the changes to the generic
superblock shrinker code to enable finer grained reclaim and
optimise the XFS shrinkers to make use of it...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 02:04:18PM +0200, Michal Hocko wrote:
> I would also like to revisit generic inode/dentry shrinker and see
> whether it could be more __GFP_FS friendly. As you say many FS might
> even not depend on some FS internal locks so pushing GFP_FS check down
> the layers might make a lot of sense and allow to clean some [id]cache
> even for __GFP_FS context.

That's precisely my point about passing a context to the shrinker.
It's recursion within a single superblock context that makes up the
majority of cases GFP_NOFS is used for, so passing the superblock
immediately allows for reclaim to run the superblock shrinker on
every other superblock.

We can refine it further from there, but I strongly suspect that
further refinement is going to require filesystems to specifically
configure the superblock shrinker.

e.g. in XFS, we can't allow evict() even on clean VFS inodes in a
PF_FSTRANS context, because we may run a transaction on a clean
VFS inode to prepare it for reclaim.  We can, however,
allow the fs-specific shrinker callouts to run (i.e. call into
.free_cached_objects) so that it can reclaim clean XFS inodes
because that doesn't require transactions

i.e. the infrastructure I suggested we use is aimed directly at
providing the mechanism required for finer-grained inode/dentry
cache reclaim in contexts that it is currently disallowed
completely. I was also implying that once the infrastructure to pass
contexts is in place, I'd then make the changes to the generic
superblock shrinker code to enable finer grained reclaim and
optimise the XFS shrinkers to make use of it...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v2] ecryptfs: open lower files using kernel creds

2016-04-29 Thread Tyler Hicks

On 2016-04-12 03:15:44, Ricky Zhou wrote:
> In LSMs such as SELinux, files can be associated with state from the
> credentials of the task that opens it. Since ecryptfs shares a single
> handle to lower files across tasks that access it, others tasks can
> later be denied access to the lower file as a result.
> 
> This change removes the kthread and unconditionally opens lower files
> with kernel service credentials. Under SELinux, users will need to allow
> the FD__USE permissions on the kernel context to process that need to
> access files on an ecryptfs filesystem.
> 
> Signed-off-by: Ricky Zhou 
> ---
>  fs/ecryptfs/Makefile  |   2 +-
>  fs/ecryptfs/ecryptfs_kernel.h |   4 -
>  fs/ecryptfs/kthread.c | 172 
> --
>  fs/ecryptfs/main.c|  78 +++
>  4 files changed, 65 insertions(+), 191 deletions(-)
>  delete mode 100644 fs/ecryptfs/kthread.c
> 
> diff --git a/fs/ecryptfs/Makefile b/fs/ecryptfs/Makefile
> index 49678a6..f0bcf51 100644
> --- a/fs/ecryptfs/Makefile
> +++ b/fs/ecryptfs/Makefile
> @@ -5,6 +5,6 @@
>  obj-$(CONFIG_ECRYPT_FS) += ecryptfs.o
>  
>  ecryptfs-y := dentry.o file.o inode.o main.o super.o mmap.o read_write.o \
> -   crypto.o keystore.o kthread.o debug.o
> +   crypto.o keystore.o debug.o
>  
>  ecryptfs-$(CONFIG_ECRYPT_FS_MESSAGING) += messaging.o miscdev.o
> diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
> index d123fba..6434736 100644
> --- a/fs/ecryptfs/ecryptfs_kernel.h
> +++ b/fs/ecryptfs/ecryptfs_kernel.h
> @@ -695,10 +695,6 @@ int ecryptfs_find_daemon_by_euid(struct ecryptfs_daemon 
> **daemon);
>  #endif
>  int ecryptfs_init_kthread(void);
>  void ecryptfs_destroy_kthread(void);

The ecryptfs_{init,destroy}_kthread() declarations also need to be
removed.

> -int ecryptfs_privileged_open(struct file **lower_file,
> -  struct dentry *lower_dentry,
> -  struct vfsmount *lower_mnt,
> -  const struct cred *cred);
>  int ecryptfs_get_lower_file(struct dentry *dentry, struct inode *inode);
>  void ecryptfs_put_lower_file(struct inode *inode);
>  int
> diff --git a/fs/ecryptfs/kthread.c b/fs/ecryptfs/kthread.c
> deleted file mode 100644
> index 866bb18..000
> --- a/fs/ecryptfs/kthread.c
> +++ /dev/null
> @@ -1,172 +0,0 @@
> -/**
> - * eCryptfs: Linux filesystem encryption layer
> - *
> - * Copyright (C) 2008 International Business Machines Corp.
> - *   Author(s): Michael A. Halcrow 
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License as
> - * published by the Free Software Foundation; either version 2 of the
> - * License, or (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - * General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
> - * 02111-1307, USA.
> - */
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include "ecryptfs_kernel.h"
> -
> -struct ecryptfs_open_req {
> - struct file **lower_file;
> - struct path path;
> - struct completion done;
> - struct list_head kthread_ctl_list;
> -};
> -
> -static struct ecryptfs_kthread_ctl {
> -#define ECRYPTFS_KTHREAD_ZOMBIE 0x0001
> - u32 flags;
> - struct mutex mux;
> - struct list_head req_list;
> - wait_queue_head_t wait;
> -} ecryptfs_kthread_ctl;
> -
> -static struct task_struct *ecryptfs_kthread;
> -
> -/**
> - * ecryptfs_threadfn
> - * @ignored: ignored
> - *
> - * The eCryptfs kernel thread that has the responsibility of getting
> - * the lower file with RW permissions.
> - *
> - * Returns zero on success; non-zero otherwise
> - */
> -static int ecryptfs_threadfn(void *ignored)
> -{
> - set_freezable();
> - while (1)  {
> - struct ecryptfs_open_req *req;
> -
> - wait_event_freezable(
> - ecryptfs_kthread_ctl.wait,
> - (!list_empty(_kthread_ctl.req_list)
> -  || kthread_should_stop()));
> - mutex_lock(_kthread_ctl.mux);
> - if (ecryptfs_kthread_ctl.flags & ECRYPTFS_KTHREAD_ZOMBIE) {
> - mutex_unlock(_kthread_ctl.mux);
> - goto out;
> - }
> - while (!list_empty(_kthread_ctl.req_list)) {
> - req = list_first_entry(_kthread_ctl.req_list,
> -struct ecryptfs_open_req,
> -

Re: [PATCH v2] ecryptfs: open lower files using kernel creds

2016-04-29 Thread Tyler Hicks

On 2016-04-12 03:15:44, Ricky Zhou wrote:
> In LSMs such as SELinux, files can be associated with state from the
> credentials of the task that opens it. Since ecryptfs shares a single
> handle to lower files across tasks that access it, others tasks can
> later be denied access to the lower file as a result.
> 
> This change removes the kthread and unconditionally opens lower files
> with kernel service credentials. Under SELinux, users will need to allow
> the FD__USE permissions on the kernel context to process that need to
> access files on an ecryptfs filesystem.
> 
> Signed-off-by: Ricky Zhou 
> ---
>  fs/ecryptfs/Makefile  |   2 +-
>  fs/ecryptfs/ecryptfs_kernel.h |   4 -
>  fs/ecryptfs/kthread.c | 172 
> --
>  fs/ecryptfs/main.c|  78 +++
>  4 files changed, 65 insertions(+), 191 deletions(-)
>  delete mode 100644 fs/ecryptfs/kthread.c
> 
> diff --git a/fs/ecryptfs/Makefile b/fs/ecryptfs/Makefile
> index 49678a6..f0bcf51 100644
> --- a/fs/ecryptfs/Makefile
> +++ b/fs/ecryptfs/Makefile
> @@ -5,6 +5,6 @@
>  obj-$(CONFIG_ECRYPT_FS) += ecryptfs.o
>  
>  ecryptfs-y := dentry.o file.o inode.o main.o super.o mmap.o read_write.o \
> -   crypto.o keystore.o kthread.o debug.o
> +   crypto.o keystore.o debug.o
>  
>  ecryptfs-$(CONFIG_ECRYPT_FS_MESSAGING) += messaging.o miscdev.o
> diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
> index d123fba..6434736 100644
> --- a/fs/ecryptfs/ecryptfs_kernel.h
> +++ b/fs/ecryptfs/ecryptfs_kernel.h
> @@ -695,10 +695,6 @@ int ecryptfs_find_daemon_by_euid(struct ecryptfs_daemon 
> **daemon);
>  #endif
>  int ecryptfs_init_kthread(void);
>  void ecryptfs_destroy_kthread(void);

The ecryptfs_{init,destroy}_kthread() declarations also need to be
removed.

> -int ecryptfs_privileged_open(struct file **lower_file,
> -  struct dentry *lower_dentry,
> -  struct vfsmount *lower_mnt,
> -  const struct cred *cred);
>  int ecryptfs_get_lower_file(struct dentry *dentry, struct inode *inode);
>  void ecryptfs_put_lower_file(struct inode *inode);
>  int
> diff --git a/fs/ecryptfs/kthread.c b/fs/ecryptfs/kthread.c
> deleted file mode 100644
> index 866bb18..000
> --- a/fs/ecryptfs/kthread.c
> +++ /dev/null
> @@ -1,172 +0,0 @@
> -/**
> - * eCryptfs: Linux filesystem encryption layer
> - *
> - * Copyright (C) 2008 International Business Machines Corp.
> - *   Author(s): Michael A. Halcrow 
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License as
> - * published by the Free Software Foundation; either version 2 of the
> - * License, or (at your option) any later version.
> - *
> - * This program is distributed in the hope that it will be useful, but
> - * WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> - * General Public License for more details.
> - *
> - * You should have received a copy of the GNU General Public License
> - * along with this program; if not, write to the Free Software
> - * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
> - * 02111-1307, USA.
> - */
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include "ecryptfs_kernel.h"
> -
> -struct ecryptfs_open_req {
> - struct file **lower_file;
> - struct path path;
> - struct completion done;
> - struct list_head kthread_ctl_list;
> -};
> -
> -static struct ecryptfs_kthread_ctl {
> -#define ECRYPTFS_KTHREAD_ZOMBIE 0x0001
> - u32 flags;
> - struct mutex mux;
> - struct list_head req_list;
> - wait_queue_head_t wait;
> -} ecryptfs_kthread_ctl;
> -
> -static struct task_struct *ecryptfs_kthread;
> -
> -/**
> - * ecryptfs_threadfn
> - * @ignored: ignored
> - *
> - * The eCryptfs kernel thread that has the responsibility of getting
> - * the lower file with RW permissions.
> - *
> - * Returns zero on success; non-zero otherwise
> - */
> -static int ecryptfs_threadfn(void *ignored)
> -{
> - set_freezable();
> - while (1)  {
> - struct ecryptfs_open_req *req;
> -
> - wait_event_freezable(
> - ecryptfs_kthread_ctl.wait,
> - (!list_empty(_kthread_ctl.req_list)
> -  || kthread_should_stop()));
> - mutex_lock(_kthread_ctl.mux);
> - if (ecryptfs_kthread_ctl.flags & ECRYPTFS_KTHREAD_ZOMBIE) {
> - mutex_unlock(_kthread_ctl.mux);
> - goto out;
> - }
> - while (!list_empty(_kthread_ctl.req_list)) {
> - req = list_first_entry(_kthread_ctl.req_list,
> -struct ecryptfs_open_req,
> -kthread_ctl_list);
> -

Re: [RFC][PATCH 14/31] locking,metag: Implement atomic_fetch_{add,sub,and,or,xor}()

2016-04-29 Thread James Hogan

Hi Peter,

On Fri, Apr 22, 2016 at 11:04:27AM +0200, Peter Zijlstra wrote:
> Implement FETCH-OP atomic primitives, these are very similar to the
> existing OP-RETURN primitives we already have, except they return the
> value of the atomic variable _before_ modification.
> 
> This is especially useful for irreversible operations -- such as
> bitops (because it becomes impossible to reconstruct the state prior
> to modification).
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/metag/include/asm/atomic.h|2 +
>  arch/metag/include/asm/atomic_lnkget.h |   36 
> +
>  arch/metag/include/asm/atomic_lock1.h  |   33 ++
>  3 files changed, 63 insertions(+), 8 deletions(-)
> 
> --- a/arch/metag/include/asm/atomic.h
> +++ b/arch/metag/include/asm/atomic.h
> @@ -17,6 +17,8 @@
>  #include 
>  #endif
>  
> +#define atomic_fetch_or atomic_fetch_or
> +
>  #define atomic_add_negative(a, v)   (atomic_add_return((a), (v)) < 0)
>  
>  #define atomic_dec_return(v) atomic_sub_return(1, (v))
> --- a/arch/metag/include/asm/atomic_lnkget.h
> +++ b/arch/metag/include/asm/atomic_lnkget.h
> @@ -69,16 +69,44 @@ static inline int atomic_##op##_return(i
>   return result;  \
>  }
>  
> -#define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op)
> +#define ATOMIC_FETCH_OP(op)  \
> +static inline int atomic_fetch_##op(int i, atomic_t *v)  
> \
> +{\
> + int result, temp;   \
> + \
> + smp_mb();   \
> + \
> + asm volatile (  \
> + "1: LNKGETD %1, [%2]\n" \
> + "   " #op " %0, %1, %3\n"   \

i was hoping never to have to think about meta asm constraints again :-P

and/or/xor are only available in the data units, as determined by %1 in
this case, so the constraint for result shouldn't have "a" in it.

diff --git a/arch/metag/include/asm/atomic_lnkget.h 
b/arch/metag/include/asm/atomic_lnkget.h
index 50ad05050947..def2c642f053 100644
--- a/arch/metag/include/asm/atomic_lnkget.h
+++ b/arch/metag/include/asm/atomic_lnkget.h
@@ -84,7 +84,7 @@ static inline int atomic_fetch_##op(int i, atomic_t *v)   
\
"   ANDT%0, %0, #HI(0x3f00)\n"  \
"   CMPT%0, #HI(0x0200)\n"  \
"   BNZ 1b\n"   \
-   : "=" (temp), "=" (result) \
+   : "=" (temp), "=" (result)  \
: "da" (>counter), "bd" (i)  \
: "cc");\

That also ensures the "bd" constraint for %3 (meaning "an op2 register
where op1 [%1 in this case] is a data unit register and the instruction
supports O2R") is consistent.

So with that change this patch looks good to me:
Acked-by: James Hogan 


Note that for the ATOMIC_OP_RETURN() case (add/sub only) either address
or data units can be used (hence the "da" for %1), but then the "bd"
constraint on %3 is wrong as op1 [%1] may not be in data unit (sorry I
didn't spot that at the time). I'll queue a fix, something like below
probably ("br" means "An Op2 register and the instruction supports O2R",
i.e. op1/%1 doesn't have to be a data unit register):

diff --git a/arch/metag/include/asm/atomic_lnkget.h 
b/arch/metag/include/asm/atomic_lnkget.h
index 50ad05050947..def2c642f053 100644
--- a/arch/metag/include/asm/atomic_lnkget.h
+++ b/arch/metag/include/asm/atomic_lnkget.h
@@ -61,7 +61,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v)
\
"   CMPT%0, #HI(0x0200)\n"  \
"   BNZ 1b\n"   \
: "=" (temp), "=" (result) \
-   : "da" (>counter), "bd" (i)  \
+   : "da" (>counter), "br" (i)  \
: "cc");\
\
smp_mb();   \
\   
\

Thanks
James

> + "   LNKSETD [%2], %0\n" \
> +

Re: [RFC][PATCH 14/31] locking,metag: Implement atomic_fetch_{add,sub,and,or,xor}()

2016-04-29 Thread James Hogan

Hi Peter,

On Fri, Apr 22, 2016 at 11:04:27AM +0200, Peter Zijlstra wrote:
> Implement FETCH-OP atomic primitives, these are very similar to the
> existing OP-RETURN primitives we already have, except they return the
> value of the atomic variable _before_ modification.
> 
> This is especially useful for irreversible operations -- such as
> bitops (because it becomes impossible to reconstruct the state prior
> to modification).
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/metag/include/asm/atomic.h|2 +
>  arch/metag/include/asm/atomic_lnkget.h |   36 
> +
>  arch/metag/include/asm/atomic_lock1.h  |   33 ++
>  3 files changed, 63 insertions(+), 8 deletions(-)
> 
> --- a/arch/metag/include/asm/atomic.h
> +++ b/arch/metag/include/asm/atomic.h
> @@ -17,6 +17,8 @@
>  #include 
>  #endif
>  
> +#define atomic_fetch_or atomic_fetch_or
> +
>  #define atomic_add_negative(a, v)   (atomic_add_return((a), (v)) < 0)
>  
>  #define atomic_dec_return(v) atomic_sub_return(1, (v))
> --- a/arch/metag/include/asm/atomic_lnkget.h
> +++ b/arch/metag/include/asm/atomic_lnkget.h
> @@ -69,16 +69,44 @@ static inline int atomic_##op##_return(i
>   return result;  \
>  }
>  
> -#define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op)
> +#define ATOMIC_FETCH_OP(op)  \
> +static inline int atomic_fetch_##op(int i, atomic_t *v)  
> \
> +{\
> + int result, temp;   \
> + \
> + smp_mb();   \
> + \
> + asm volatile (  \
> + "1: LNKGETD %1, [%2]\n" \
> + "   " #op " %0, %1, %3\n"   \

i was hoping never to have to think about meta asm constraints again :-P

and/or/xor are only available in the data units, as determined by %1 in
this case, so the constraint for result shouldn't have "a" in it.

diff --git a/arch/metag/include/asm/atomic_lnkget.h 
b/arch/metag/include/asm/atomic_lnkget.h
index 50ad05050947..def2c642f053 100644
--- a/arch/metag/include/asm/atomic_lnkget.h
+++ b/arch/metag/include/asm/atomic_lnkget.h
@@ -84,7 +84,7 @@ static inline int atomic_fetch_##op(int i, atomic_t *v)   
\
"   ANDT%0, %0, #HI(0x3f00)\n"  \
"   CMPT%0, #HI(0x0200)\n"  \
"   BNZ 1b\n"   \
-   : "=" (temp), "=" (result) \
+   : "=" (temp), "=" (result)  \
: "da" (>counter), "bd" (i)  \
: "cc");\

That also ensures the "bd" constraint for %3 (meaning "an op2 register
where op1 [%1 in this case] is a data unit register and the instruction
supports O2R") is consistent.

So with that change this patch looks good to me:
Acked-by: James Hogan 


Note that for the ATOMIC_OP_RETURN() case (add/sub only) either address
or data units can be used (hence the "da" for %1), but then the "bd"
constraint on %3 is wrong as op1 [%1] may not be in data unit (sorry I
didn't spot that at the time). I'll queue a fix, something like below
probably ("br" means "An Op2 register and the instruction supports O2R",
i.e. op1/%1 doesn't have to be a data unit register):

diff --git a/arch/metag/include/asm/atomic_lnkget.h 
b/arch/metag/include/asm/atomic_lnkget.h
index 50ad05050947..def2c642f053 100644
--- a/arch/metag/include/asm/atomic_lnkget.h
+++ b/arch/metag/include/asm/atomic_lnkget.h
@@ -61,7 +61,7 @@ static inline int atomic_##op##_return(int i, atomic_t *v)
\
"   CMPT%0, #HI(0x0200)\n"  \
"   BNZ 1b\n"   \
: "=" (temp), "=" (result) \
-   : "da" (>counter), "bd" (i)  \
+   : "da" (>counter), "br" (i)  \
: "cc");\
\
smp_mb();   \
\   
\

Thanks
James

> + "   LNKSETD [%2], %0\n" \
> + "   DEFR%0, TXSTAT\n"

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 03:35:42PM +1000, NeilBrown wrote:
> On Tue, Apr 26 2016, Michal Hocko wrote:
> 
> > Hi,
> > we have discussed this topic at LSF/MM this year. There was a general
> > interest in the scope GFP_NOFS allocation context among some FS
> > developers. For those who are not aware of the discussion or the issue
> > I am trying to sort out (or at least start in that direction) please
> > have a look at patch 1 which adds memalloc_nofs_{save,restore} api
> > which basically copies what we have for the scope GFP_NOIO allocation
> > context. I haven't converted any of the FS myself because that is way
> > beyond my area of expertise but I would be happy to help with further
> > changes on the MM front as well as in some more generic code paths.
> >
> > Dave had an idea on how to further improve the reclaim context to be
> > less all-or-nothing wrt. GFP_NOFS. In short he was suggesting an opaque
> > and FS specific cookie set in the FS allocation context and consumed
> > by the FS reclaim context to allow doing some provably save actions
> > that would be skipped due to GFP_NOFS normally.  I like this idea and
> > I believe we can go that direction regardless of the approach taken here.
> > Many filesystems simply need to cleanup their NOFS usage first before
> > diving into a more complex changes.>
> 
> This strikes me as over-engineering to work around an unnecessarily
> burdensome interface but without details it is hard to be certain.
> 
> Exactly what things happen in "FS reclaim context" which may, or may
> not, be safe depending on the specific FS allocation context?  Do they
> need to happen at all?
> 
> My research suggests that for most filesystems the only thing that
> happens in reclaim context that is at all troublesome is the final
> 'evict()' on an inode.  This needs to flush out dirty pages and sync the
> inode to storage.  Some time ago we moved most dirty-page writeout out
> of the reclaim context and into kswapd.  I think this was an excellent
> advance in simplicity.

No, we didn't move dirty page writeout to kswapd - we moved it back
to the background writeback threads where it can be done
efficiently.  kswapd should almost never do single page writeback
because of how inefficient it is from an IO perspective, even though
it can. i.e. if we are doing any significant amount of dirty page
writeback from memory reclaim (direct, kswapd or otherwise) then
we've screwed something up.

> If we could similarly move evict() into kswapd (and I believe we can)
> then most file systems would do nothing in reclaim context that
> interferes with allocation context.

When lots of GFP_NOFS allocation is being done, this already
happens. The shrinkers that can't run due to context accumulate the
work on the shrinker structure, and when the shrinker can next run
(e.g. run from kswapd) it runs all the deferred work from GFP_NOFS
reclaim contexts.

IOWs, we already move shrinker work from direct reclaim to kswapd
when appropriate.

> The exceptions include:
>  - nfs and any filesystem using fscache can block for up to 1 second
>in ->releasepage().  They used to block waiting for some IO, but that
>caused deadlocks and wasn't really needed.  I left the timeout because
>it seemed likely that some throttling would help.  I suspect that a
>careful analysis will show that there is sufficient throttling
>elsewhere.
> 
>  - xfs_qm_shrink_scan is nearly unique among shrinkers in that it waits
>for IO so it can free some quotainfo things. 

No it's not. evict() can block on IO - waiting for data or inode
writeback to complete, or even for filesystems to run transactions
on the inode. Hence the superblock shrinker can and does block in
inode cache reclaim.

Indeed, blocking the superblock shrinker in reclaim is a key part of
balancing inode cache pressure in XFS. If the shrinker starts
hitting dirty inodes, it blocks on cleaning them, thereby slowing
the rate of allocation to that which inodes can be cleaned and
reclaimed. There are also background threads that walk ahead freeing
clean inodes, but we have to throttle direct reclaim in this manner
otherwise the allocation pressure vastly outweighs the ability to
reclaim inodes. if we don't balance this, inode allocation triggers
the OOM killer because reclaim keeps reporting "no progress being
made" because dirty inodes are skipped. BY blocking on such inodes,
the shrinker makes progress (slowly) and reclaim sees that memory is
being freed and so continues without invoking the OOM killer...

>If it could be changed
>to just schedule the IO without waiting for it then I think this
>would be safe to be called in any FS allocation context.  It already
>uses a 'trylock' in xfs_dqlock_nowait() to avoid deadlocking
>if the lock is held.

We could, but then we have the same problem as the inode cache -
there's no indication of progress going back to the memory reclaim
subsystem, nor is reclaim able to throttle

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 03:35:42PM +1000, NeilBrown wrote:
> On Tue, Apr 26 2016, Michal Hocko wrote:
> 
> > Hi,
> > we have discussed this topic at LSF/MM this year. There was a general
> > interest in the scope GFP_NOFS allocation context among some FS
> > developers. For those who are not aware of the discussion or the issue
> > I am trying to sort out (or at least start in that direction) please
> > have a look at patch 1 which adds memalloc_nofs_{save,restore} api
> > which basically copies what we have for the scope GFP_NOIO allocation
> > context. I haven't converted any of the FS myself because that is way
> > beyond my area of expertise but I would be happy to help with further
> > changes on the MM front as well as in some more generic code paths.
> >
> > Dave had an idea on how to further improve the reclaim context to be
> > less all-or-nothing wrt. GFP_NOFS. In short he was suggesting an opaque
> > and FS specific cookie set in the FS allocation context and consumed
> > by the FS reclaim context to allow doing some provably save actions
> > that would be skipped due to GFP_NOFS normally.  I like this idea and
> > I believe we can go that direction regardless of the approach taken here.
> > Many filesystems simply need to cleanup their NOFS usage first before
> > diving into a more complex changes.>
> 
> This strikes me as over-engineering to work around an unnecessarily
> burdensome interface but without details it is hard to be certain.
> 
> Exactly what things happen in "FS reclaim context" which may, or may
> not, be safe depending on the specific FS allocation context?  Do they
> need to happen at all?
> 
> My research suggests that for most filesystems the only thing that
> happens in reclaim context that is at all troublesome is the final
> 'evict()' on an inode.  This needs to flush out dirty pages and sync the
> inode to storage.  Some time ago we moved most dirty-page writeout out
> of the reclaim context and into kswapd.  I think this was an excellent
> advance in simplicity.

No, we didn't move dirty page writeout to kswapd - we moved it back
to the background writeback threads where it can be done
efficiently.  kswapd should almost never do single page writeback
because of how inefficient it is from an IO perspective, even though
it can. i.e. if we are doing any significant amount of dirty page
writeback from memory reclaim (direct, kswapd or otherwise) then
we've screwed something up.

> If we could similarly move evict() into kswapd (and I believe we can)
> then most file systems would do nothing in reclaim context that
> interferes with allocation context.

When lots of GFP_NOFS allocation is being done, this already
happens. The shrinkers that can't run due to context accumulate the
work on the shrinker structure, and when the shrinker can next run
(e.g. run from kswapd) it runs all the deferred work from GFP_NOFS
reclaim contexts.

IOWs, we already move shrinker work from direct reclaim to kswapd
when appropriate.

> The exceptions include:
>  - nfs and any filesystem using fscache can block for up to 1 second
>in ->releasepage().  They used to block waiting for some IO, but that
>caused deadlocks and wasn't really needed.  I left the timeout because
>it seemed likely that some throttling would help.  I suspect that a
>careful analysis will show that there is sufficient throttling
>elsewhere.
> 
>  - xfs_qm_shrink_scan is nearly unique among shrinkers in that it waits
>for IO so it can free some quotainfo things. 

No it's not. evict() can block on IO - waiting for data or inode
writeback to complete, or even for filesystems to run transactions
on the inode. Hence the superblock shrinker can and does block in
inode cache reclaim.

Indeed, blocking the superblock shrinker in reclaim is a key part of
balancing inode cache pressure in XFS. If the shrinker starts
hitting dirty inodes, it blocks on cleaning them, thereby slowing
the rate of allocation to that which inodes can be cleaned and
reclaimed. There are also background threads that walk ahead freeing
clean inodes, but we have to throttle direct reclaim in this manner
otherwise the allocation pressure vastly outweighs the ability to
reclaim inodes. if we don't balance this, inode allocation triggers
the OOM killer because reclaim keeps reporting "no progress being
made" because dirty inodes are skipped. BY blocking on such inodes,
the shrinker makes progress (slowly) and reclaim sees that memory is
being freed and so continues without invoking the OOM killer...

>If it could be changed
>to just schedule the IO without waiting for it then I think this
>would be safe to be called in any FS allocation context.  It already
>uses a 'trylock' in xfs_dqlock_nowait() to avoid deadlocking
>if the lock is held.

We could, but then we have the same problem as the inode cache -
there's no indication of progress going back to the memory reclaim
subsystem, nor is reclaim able to throttle

Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 4:27 PM, "Josh Poimboeuf"  wrote:
>
> On Fri, Apr 29, 2016 at 02:38:02PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 29, 2016 at 1:50 PM, Josh Poimboeuf  wrote:
> > > On Fri, Apr 29, 2016 at 12:39:16PM -0700, Andy Lutomirski wrote:
> > >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf  
> > >> wrote:
> > >> > Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> > >> > stacks start at the same offset right above their saved pt_regs,
> > >> > regardless of which syscall was used to enter the kernel.  That creates
> > >> > a nice convention which makes it straightforward to identify the
> > >> > "bottom" of the stack, which can be useful for stack walking code which
> > >> > needs to verify the stack is sane.
> > >> >
> > >> > However there are still a few types of tasks which don't yet follow 
> > >> > that
> > >> > convention:
> > >> >
> > >> > 1) CPU idle tasks, aka the "swapper" tasks
> > >> >
> > >> > 2) freshly forked TIF_FORK tasks which don't have a stack at all
> > >> >
> > >> > Make the idle tasks conform to the new stack bottom convention by
> > >> > starting their stack at a sizeof(pt_regs) offset from the end of the
> > >> > stack page.
> > >> >
> > >> > Signed-off-by: Josh Poimboeuf 
> > >> > ---
> > >> >  arch/x86/kernel/head_64.S | 7 ---
> > >> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > >> > index 6dbd2c0..0b12311 100644
> > >> > --- a/arch/x86/kernel/head_64.S
> > >> > +++ b/arch/x86/kernel/head_64.S
> > >> > @@ -296,8 +296,9 @@ ENTRY(start_cpu)
> > >> >  *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > >> >  *  address given in m16:64.
> > >> >  */
> > >> > -   movqinitial_code(%rip),%rax
> > >> > -   pushq   $0  # fake return address to stop unwinder
> > >> > +   call1f  # put return address on stack for 
> > >> > unwinder
> > >> > +1: xorq%rbp, %rbp  # clear frame pointer
> > >> > +   movqinitial_code(%rip), %rax
> > >> > pushq   $__KERNEL_CS# set correct cs
> > >> > pushq   %rax# target address in negative space
> > >> > lretq
> > >> > @@ -325,7 +326,7 @@ ENDPROC(start_cpu0)
> > >> > GLOBAL(initial_gs)
> > >> > .quad   INIT_PER_CPU_VAR(irq_stack_union)
> > >> > GLOBAL(initial_stack)
> > >> > -   .quad  init_thread_union+THREAD_SIZE-8
> > >> > +   .quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
> > >>
> > >> As long as you're doing this, could you also set orig_ax to -1?  I
> > >> remember running into some oddities resulting from orig_ax containing
> > >> garbage at some point.
> > >
> > > I assume you mean to initialize the orig_rax value in the pt_regs at the
> > > bottom of the stack of the idle task?
> > >
> > > How could that cause a problem?  Since the idle task never returns from
> > > a system call, I'd assume that memory never gets accessed?
> > >
> >
> > Look at collect_syscall in lib/syscall.c
>
> I don't see how collect_syscall() can be called for the per-cpu idle
> "swapper" tasks (which is what the above code affects).  They don't have
> pids or /proc entries so you can't do /proc//syscall on them.

If so, then never mind.

--Andy

>
> --
> Josh

Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 4:27 PM, "Josh Poimboeuf"  wrote:
>
> On Fri, Apr 29, 2016 at 02:38:02PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 29, 2016 at 1:50 PM, Josh Poimboeuf  wrote:
> > > On Fri, Apr 29, 2016 at 12:39:16PM -0700, Andy Lutomirski wrote:
> > >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf  
> > >> wrote:
> > >> > Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> > >> > stacks start at the same offset right above their saved pt_regs,
> > >> > regardless of which syscall was used to enter the kernel.  That creates
> > >> > a nice convention which makes it straightforward to identify the
> > >> > "bottom" of the stack, which can be useful for stack walking code which
> > >> > needs to verify the stack is sane.
> > >> >
> > >> > However there are still a few types of tasks which don't yet follow 
> > >> > that
> > >> > convention:
> > >> >
> > >> > 1) CPU idle tasks, aka the "swapper" tasks
> > >> >
> > >> > 2) freshly forked TIF_FORK tasks which don't have a stack at all
> > >> >
> > >> > Make the idle tasks conform to the new stack bottom convention by
> > >> > starting their stack at a sizeof(pt_regs) offset from the end of the
> > >> > stack page.
> > >> >
> > >> > Signed-off-by: Josh Poimboeuf 
> > >> > ---
> > >> >  arch/x86/kernel/head_64.S | 7 ---
> > >> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > >> > index 6dbd2c0..0b12311 100644
> > >> > --- a/arch/x86/kernel/head_64.S
> > >> > +++ b/arch/x86/kernel/head_64.S
> > >> > @@ -296,8 +296,9 @@ ENTRY(start_cpu)
> > >> >  *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > >> >  *  address given in m16:64.
> > >> >  */
> > >> > -   movqinitial_code(%rip),%rax
> > >> > -   pushq   $0  # fake return address to stop unwinder
> > >> > +   call1f  # put return address on stack for 
> > >> > unwinder
> > >> > +1: xorq%rbp, %rbp  # clear frame pointer
> > >> > +   movqinitial_code(%rip), %rax
> > >> > pushq   $__KERNEL_CS# set correct cs
> > >> > pushq   %rax# target address in negative space
> > >> > lretq
> > >> > @@ -325,7 +326,7 @@ ENDPROC(start_cpu0)
> > >> > GLOBAL(initial_gs)
> > >> > .quad   INIT_PER_CPU_VAR(irq_stack_union)
> > >> > GLOBAL(initial_stack)
> > >> > -   .quad  init_thread_union+THREAD_SIZE-8
> > >> > +   .quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
> > >>
> > >> As long as you're doing this, could you also set orig_ax to -1?  I
> > >> remember running into some oddities resulting from orig_ax containing
> > >> garbage at some point.
> > >
> > > I assume you mean to initialize the orig_rax value in the pt_regs at the
> > > bottom of the stack of the idle task?
> > >
> > > How could that cause a problem?  Since the idle task never returns from
> > > a system call, I'd assume that memory never gets accessed?
> > >
> >
> > Look at collect_syscall in lib/syscall.c
>
> I don't see how collect_syscall() can be called for the per-cpu idle
> "swapper" tasks (which is what the above code affects).  They don't have
> pids or /proc entries so you can't do /proc//syscall on them.

If so, then never mind.

--Andy

>
> --
> Josh

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 3:11 PM, "Jiri Kosina"  wrote:
>
> On Fri, 29 Apr 2016, Andy Lutomirski wrote:
>
> > > NMI, MCE and interrupts aren't a problem because they have dedicated
> > > stacks, which are easy to detect.  If the tasks' stack is on an
> > > exception stack or an irq stack, we consider it unreliable.
> >
> > Only on x86_64.
>
> Well, MCEs are more or less x86-specific as well. But otherwise good
> point, thanks Andy.
>
> So, how does stack layout generally look like in case when NMI is actually
> running on proper kernel stack? I thought it's guaranteed to contain
> pt_regs anyway in all cases. Is that not guaranteed to be the case?
>

On x86, at least, there will still be pt_regs for the NMI.  For the
interrupted state, though, there might not be pt_regs, as the NMI
might have happened while still populating pt_regs.  In fact, the NMI
stack could overlap task_pt_regs.

For x86_32, there's no guarantee that pt_regs contains sp due to
hardware silliness.  You need to parse it more carefully, as,
!user_mode(regs), then the old sp is just above pt_regs.

--Andy

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 3:11 PM, "Jiri Kosina"  wrote:
>
> On Fri, 29 Apr 2016, Andy Lutomirski wrote:
>
> > > NMI, MCE and interrupts aren't a problem because they have dedicated
> > > stacks, which are easy to detect.  If the tasks' stack is on an
> > > exception stack or an irq stack, we consider it unreliable.
> >
> > Only on x86_64.
>
> Well, MCEs are more or less x86-specific as well. But otherwise good
> point, thanks Andy.
>
> So, how does stack layout generally look like in case when NMI is actually
> running on proper kernel stack? I thought it's guaranteed to contain
> pt_regs anyway in all cases. Is that not guaranteed to be the case?
>

On x86, at least, there will still be pt_regs for the NMI.  For the
interrupted state, though, there might not be pt_regs, as the NMI
might have happened while still populating pt_regs.  In fact, the NMI
stack could overlap task_pt_regs.

For x86_32, there's no guarantee that pt_regs contains sp due to
hardware silliness.  You need to parse it more carefully, as,
!user_mode(regs), then the old sp is just above pt_regs.

--Andy

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 3:41 PM, "Josh Poimboeuf"  wrote:
>
> On Fri, Apr 29, 2016 at 02:37:41PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 29, 2016 at 2:25 PM, Josh Poimboeuf  wrote:
> > > I think the easiest way to make it work would be to modify the idtentry
> > > macro to put all the idt entries in a dedicated section.  Then the
> > > unwinder could easily detect any calls from that code.
> >
> > That would work.  Would it make sense to do the same for the irq entries?
>
> Yes, I think so.
>
> > >> I suppose we could try to rejigger the code so that rbp points to
> > >> pt_regs or similar.
> > >
> > > I think we should avoid doing something like that because it would break
> > > gdb and all the other unwinders who don't know about it.
> >
> > How so?
> >
> > Currently, rbp in the entry code is meaningless.  I'm suggesting that,
> > when we do, for example, 'call \do_sym' in idtentry, we point rbp to
> > the pt_regs.  Currently it points to something stale (which the
> > dump_stack code might be relying on.  Hmm.)  But it's probably also
> > safe to assume that if you unwind to the 'call \do_sym', then pt_regs
> > is the next thing on the stack, so just doing the section thing would
> > work.
>
> Yes, rbp is meaningless on the entry from user space.  But if an
> in-kernel interrupt occurs (e.g. page fault, preemption) and you have
> nested entry, rbp keeps its old value, right?  So the unwinder can walk
> past the nested entry frame and keep going until it gets to the original
> entry.

Yes.

It would be nice if we could do better, though, and actually notice
the pt_regs and identify the entry.  For example, I'd love to see
"page fault, RIP=xyz" printed in the middle of a stack dump on a
crash.  Also, I think that just following rbp links will lose the
actual function that took the page fault (or whatever function
pt_regs->ip actually points to).

>
> > We should really re-add DWARF some day.
>
> Working on it :-)

Excellent.

Have you looked at my vdso unwinding test at all?  If we could do
something similar for the kernel, IMO it would make testing much more
pleasant.

--Andy

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-04-29 Thread Andy Lutomirski

On Apr 29, 2016 3:41 PM, "Josh Poimboeuf"  wrote:
>
> On Fri, Apr 29, 2016 at 02:37:41PM -0700, Andy Lutomirski wrote:
> > On Fri, Apr 29, 2016 at 2:25 PM, Josh Poimboeuf  wrote:
> > > I think the easiest way to make it work would be to modify the idtentry
> > > macro to put all the idt entries in a dedicated section.  Then the
> > > unwinder could easily detect any calls from that code.
> >
> > That would work.  Would it make sense to do the same for the irq entries?
>
> Yes, I think so.
>
> > >> I suppose we could try to rejigger the code so that rbp points to
> > >> pt_regs or similar.
> > >
> > > I think we should avoid doing something like that because it would break
> > > gdb and all the other unwinders who don't know about it.
> >
> > How so?
> >
> > Currently, rbp in the entry code is meaningless.  I'm suggesting that,
> > when we do, for example, 'call \do_sym' in idtentry, we point rbp to
> > the pt_regs.  Currently it points to something stale (which the
> > dump_stack code might be relying on.  Hmm.)  But it's probably also
> > safe to assume that if you unwind to the 'call \do_sym', then pt_regs
> > is the next thing on the stack, so just doing the section thing would
> > work.
>
> Yes, rbp is meaningless on the entry from user space.  But if an
> in-kernel interrupt occurs (e.g. page fault, preemption) and you have
> nested entry, rbp keeps its old value, right?  So the unwinder can walk
> past the nested entry frame and keep going until it gets to the original
> entry.

Yes.

It would be nice if we could do better, though, and actually notice
the pt_regs and identify the entry.  For example, I'd love to see
"page fault, RIP=xyz" printed in the middle of a stack dump on a
crash.  Also, I think that just following rbp links will lose the
actual function that took the page fault (or whatever function
pt_regs->ip actually points to).

>
> > We should really re-add DWARF some day.
>
> Working on it :-)

Excellent.

Have you looked at my vdso unwinding test at all?  If we could do
something similar for the kernel, IMO it would make testing much more
pleasant.

--Andy

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 4:31 PM, George Spelvin  wrote:
>
> After researching it, I think that the "high bits of a multiply" is
> in fact a decent way to do such a hash.

Our emails crossed. Yes. My test harness actually likes the
multiplication more than most of the specialized "spread out bits"
versions I've found, but only if the constants are better-chosen than
the ones we have now.

> One thing I note is that the advice in the comments to choose a prime
> number is misquoting Knuth!  Knuth says (vol. 3 section 6.4) the number
> should be *relatively* prime to the word size, which for binary computers
> simply means odd.

At least for my tests, even that seems to actually be a total
non-issue. Yes, odd values *might* be better, but as mentioned in my
crossing email, it doesn't actually seem to matter for any case the
kernel cares about, since we tend to want to hash down to 10-20 bits
of data, so the least significant bit (particularly for the 64-bit
case) just doesn't matter all that much.

For the 32-bit case I suspect it's more noticeable, since we might be
using even half or more of the result.

But as mentioned: that least-order bit seems to be a *lot* less
important than the mix of the bits in the middle. Because even if your
input ends up being all zeroes in the low bits (it that worst-case
"page aligned pointers" case that Thomas had), and the hash multiplies
by an even number, you'll still end up just dropping all those zero
bits anyway.

> When we have a hardware multiplier, keeping the Hamming weight low is
> a waste of time.  When we don't, clever organization can do
> better than the very naive addition/subtraction chain in the
> current hash_64().

Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.

Nothing I know of does it for the 64-bit case, which is why our
current hand-written one isn't even the smark one.

> To multiply by the 32-bit constant 1640531527 = 0x61c88647 (which is
> the negative of the golden ratio, so has identical distribution
> properties) can be done in 6 shifts + adds, with a critical path
> length of 7 operations (3 shifts + 4 adds).

So the reason we don't do this for the 32-bit case is exactly that gcc
can already do this.

If you can do the same for the 64-bit case, that might be worth it.

Linus

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 4:31 PM, George Spelvin  wrote:
>
> After researching it, I think that the "high bits of a multiply" is
> in fact a decent way to do such a hash.

Our emails crossed. Yes. My test harness actually likes the
multiplication more than most of the specialized "spread out bits"
versions I've found, but only if the constants are better-chosen than
the ones we have now.

> One thing I note is that the advice in the comments to choose a prime
> number is misquoting Knuth!  Knuth says (vol. 3 section 6.4) the number
> should be *relatively* prime to the word size, which for binary computers
> simply means odd.

At least for my tests, even that seems to actually be a total
non-issue. Yes, odd values *might* be better, but as mentioned in my
crossing email, it doesn't actually seem to matter for any case the
kernel cares about, since we tend to want to hash down to 10-20 bits
of data, so the least significant bit (particularly for the 64-bit
case) just doesn't matter all that much.

For the 32-bit case I suspect it's more noticeable, since we might be
using even half or more of the result.

But as mentioned: that least-order bit seems to be a *lot* less
important than the mix of the bits in the middle. Because even if your
input ends up being all zeroes in the low bits (it that worst-case
"page aligned pointers" case that Thomas had), and the hash multiplies
by an even number, you'll still end up just dropping all those zero
bits anyway.

> When we have a hardware multiplier, keeping the Hamming weight low is
> a waste of time.  When we don't, clever organization can do
> better than the very naive addition/subtraction chain in the
> current hash_64().

Yeah. gcc will actually do the clever stuff for the 32-bit case, afaik.

Nothing I know of does it for the 64-bit case, which is why our
current hand-written one isn't even the smark one.

> To multiply by the 32-bit constant 1640531527 = 0x61c88647 (which is
> the negative of the golden ratio, so has identical distribution
> properties) can be done in 6 shifts + adds, with a critical path
> length of 7 operations (3 shifts + 4 adds).

So the reason we don't do this for the 32-bit case is exactly that gcc
can already do this.

If you can do the same for the 64-bit case, that might be worth it.

Linus

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 04:01 PM, Doug Anderson wrote:
> * serial allows numbering devices by alias.

Which is in fact a total nightmare.

While stable device order is mandatory in serial because of
console command line parameters and existing userspace expectations,
it is the number one barrier to providing a shared ttyS namespace
for mixed uart platforms.

Stable device order has a very real (and often unforeseen) maintenance
burden.

For example, I noticed these patches are strictly for DT.
I'm assuming that's because guaranteeing stable device order for
mmc in general is more difficult; eg., by performing minimal scan
serially and more expensive portion of the probe async.

Regards,
Peter Hurley

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 04:01 PM, Doug Anderson wrote:
> * serial allows numbering devices by alias.

Which is in fact a total nightmare.

While stable device order is mandatory in serial because of
console command line parameters and existing userspace expectations,
it is the number one barrier to providing a shared ttyS namespace
for mixed uart platforms.

Stable device order has a very real (and often unforeseen) maintenance
burden.

For example, I noticed these patches are strictly for DT.
I'm assuming that's because guaranteeing stable device order for
mmc in general is more difficult; eg., by performing minimal scan
serially and more expensive portion of the probe async.

Regards,
Peter Hurley

Re: [PATCH 3/3] input: touchscreen: mxs-lradc: Add support for touchscreen

2016-04-29 Thread Marek Vasut

On 04/30/2016 01:36 AM, Dmitry Torokhov wrote:
> Hi Ksenija,

Hi all,

> On Fri, Apr 29, 2016 at 01:49:11PM +0200, Ksenija Stanojevic wrote:
>> Add mxs-lradc touchscreen driver.
>>
>> Signed-off-by: Ksenija Stanojevic 
>> ---
>>  drivers/input/touchscreen/Kconfig|  14 +-
>>  drivers/input/touchscreen/Makefile   |   1 +
>>  drivers/input/touchscreen/mxs-lradc-ts.c | 729 
>> +++
>>  3 files changed, 742 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/input/touchscreen/mxs-lradc-ts.c
>>
>> diff --git a/drivers/input/touchscreen/Kconfig 
>> b/drivers/input/touchscreen/Kconfig
>> index 8ecdc38..d614d248 100644
>> --- a/drivers/input/touchscreen/Kconfig
>> +++ b/drivers/input/touchscreen/Kconfig
>> @@ -566,7 +566,7 @@ config TOUCHSCREEN_HP600
>>  depends on SH_HP6XX && SH_ADC
>>  help
>>Say Y here if you have a HP Jornada 620/660/680/690 and want to
>> -  support the built-in touchscreen.
>> +  support the built-in touchscreen.
>>  
>>To compile this driver as a module, choose M here: the
>>module will be called hp680_ts_input.
>> @@ -685,7 +685,7 @@ config TOUCHSCREEN_UCB1400
>>This enables support for the Philips UCB1400 touchscreen interface.
>>The UCB1400 is an AC97 audio codec.  The touchscreen interface
>>will be initialized only after the ALSA subsystem has been
>> -  brought up and the UCB1400 detected.  You therefore have to
>> +  brought up and the UCB1400 detected.  You therefore have to
> 
> Why do we have the tab in the middle of the text?

This shouldn't be a part of the patch.

>>configure ALSA support as well (either built-in or modular,
>>independently of whether this driver is itself built-in or
>>modular) for this driver to work.

[...]

>> +
>> +return 0;
>> +}
>> +
>> +static struct platform_driver mxs_lradc_ts_driver = {
>> +.driver = {
>> +.name = DRIVER_NAME_TS,
>> +},
>> +.probe  = mxs_lradc_ts_probe,
>> +.remove = mxs_lradc_ts_remove,
>> +};
>> +module_platform_driver(mxs_lradc_ts_driver);
>> +
>> +MODULE_LICENSE("GPL v2");
> 
> "GPL" since you are not limiting to v2 only.

The original driver ( drivers/iio/adc/mxs-lradc.c ) is GPLv2 , but
unless the license gets changed to BSD or somesuch, I don't think anyone
will really complain if it's changed to a more fitting version(s) of
GPL. I'm fine with any GPL version here.

Thanks for the thorough review, some points were new to me.

> Thanks.
> 


-- 
Best regards,
Marek Vasut

Re: [PATCH 3/3] input: touchscreen: mxs-lradc: Add support for touchscreen

2016-04-29 Thread Marek Vasut

On 04/30/2016 01:36 AM, Dmitry Torokhov wrote:
> Hi Ksenija,

Hi all,

> On Fri, Apr 29, 2016 at 01:49:11PM +0200, Ksenija Stanojevic wrote:
>> Add mxs-lradc touchscreen driver.
>>
>> Signed-off-by: Ksenija Stanojevic 
>> ---
>>  drivers/input/touchscreen/Kconfig|  14 +-
>>  drivers/input/touchscreen/Makefile   |   1 +
>>  drivers/input/touchscreen/mxs-lradc-ts.c | 729 
>> +++
>>  3 files changed, 742 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/input/touchscreen/mxs-lradc-ts.c
>>
>> diff --git a/drivers/input/touchscreen/Kconfig 
>> b/drivers/input/touchscreen/Kconfig
>> index 8ecdc38..d614d248 100644
>> --- a/drivers/input/touchscreen/Kconfig
>> +++ b/drivers/input/touchscreen/Kconfig
>> @@ -566,7 +566,7 @@ config TOUCHSCREEN_HP600
>>  depends on SH_HP6XX && SH_ADC
>>  help
>>Say Y here if you have a HP Jornada 620/660/680/690 and want to
>> -  support the built-in touchscreen.
>> +  support the built-in touchscreen.
>>  
>>To compile this driver as a module, choose M here: the
>>module will be called hp680_ts_input.
>> @@ -685,7 +685,7 @@ config TOUCHSCREEN_UCB1400
>>This enables support for the Philips UCB1400 touchscreen interface.
>>The UCB1400 is an AC97 audio codec.  The touchscreen interface
>>will be initialized only after the ALSA subsystem has been
>> -  brought up and the UCB1400 detected.  You therefore have to
>> +  brought up and the UCB1400 detected.  You therefore have to
> 
> Why do we have the tab in the middle of the text?

This shouldn't be a part of the patch.

>>configure ALSA support as well (either built-in or modular,
>>independently of whether this driver is itself built-in or
>>modular) for this driver to work.

[...]

>> +
>> +return 0;
>> +}
>> +
>> +static struct platform_driver mxs_lradc_ts_driver = {
>> +.driver = {
>> +.name = DRIVER_NAME_TS,
>> +},
>> +.probe  = mxs_lradc_ts_probe,
>> +.remove = mxs_lradc_ts_remove,
>> +};
>> +module_platform_driver(mxs_lradc_ts_driver);
>> +
>> +MODULE_LICENSE("GPL v2");
> 
> "GPL" since you are not limiting to v2 only.

The original driver ( drivers/iio/adc/mxs-lradc.c ) is GPLv2 , but
unless the license gets changed to BSD or somesuch, I don't think anyone
will really complain if it's changed to a more fitting version(s) of
GPL. I'm fine with any GPL version here.

Thanks for the thorough review, some points were new to me.

> Thanks.
> 


-- 
Best regards,
Marek Vasut

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 2:10 PM, Linus Torvalds
 wrote:
> On Thu, Apr 28, 2016 at 11:32 AM, Linus Torvalds
>  wrote:
>>
>> hash_long/ptr really shouldn't care about the low bits being zero at
>> all, because it should mix in all the bits (using a prime multiplier
>> and taking the high bits).
>
> Looking at this assertion, it doesn't actually pan out very well at all.
>
> The 32-bit hash looks to be ok. Certainly not perfect, but not horrible.
>
> The 64-bit hash seems to be quite horribly bad with lots of values.

Ok, I have tried to research this a bit more. There's a lot of
confusion here that causes the fact that hash_64() sucks donkey balls.

The basic method for the hashing method by multiplication is fairly
sane. It's well-documented, and attributed to Knuth as the comment
above it says.

However, there's two problems in there that degrade the hash, and
particularly so for the 64-bit case.

The first is the confusion about multiplying with a prime number..
That actually makes no sense at all, and is in fact entirely wrong.
There's no reason to try to pick a prime number for the
multiplication, and I'm not seeing Knuth having ever suggested that.

Knuth's suggestion is to do the multiplication with a floating point
value A somewhere in between 0 and 1, and multiplying the integer with
it, and then just taking the fractional part and multiply it up by 'm'
and do the floor of that to get a number in the range 0..m

At no point are primes involved.

And our multiplication really does approximate that - except it's
being done in fixed-point arithmetic (so the thing you multiply with
is basically n./2**64, and the overflow is what gets rid of the
fractional part - then getting the "high bits" of the result is really
just multiplying by a power of two and taking the floor of the
result).

So the first mistake is thinking that the thing you should multiply
with should be prime. The primality matters for when you use a
division to get a modulus, which is presumably where the confusion
came from.

Now, what value 'A' you use doesn't seem to really matter much. Knuth
suggested the fractional part of the golden ratio, but I suspect that
is purely because it's an irrational number that is not near to 0 or
1. You could use anything, although since "random bit pattern" is part
of the issue, irrational numbers are a good starting point. I suspect
that with our patterns, there's actually seldom a good reason to do
lots of high-order bits, so you might as well pick something closer to
0, but whatever - it's only going to matter for the overflow part that
gets thrown away anyway.

The second mistake - and the one that actually causes the real problem
- is to believe that the "bit sparseness" is a good thing. It's not.
It's _very_ much not. If you don't mix the bits well in the
multiplication, you get exactly the problem we hit: certain bit
patternsjust  will not mix up into the higher order bits.

So if you look at what the actual golden ratio representation *should* have bee:

  #define GOLDEN_RATIO_32 0x9e3779b9
  #define GOLDEN_RATIO_64 0x9e3779b97f4a7c16

and then compare it to the values we actually _use_ (bit-sparse primes
closeish to those values):

  #define GOLDEN_RATIO_PRIME_32 0x9e370001UL
  #define GOLDEN_RATIO_PRIME_64 0x9e37fffc0001UL

you start to see the problem. The right set of values have roughly 50%
of the bits set in a random pattern. The wrong set of values do not.

But as far as I an tell, you might as well use the fractional part of
'e' or 'pi' or just make random shit up that has a reasonable bit
distribution.

So we could use the fractional part of the golden ratio (phi):
  0x9e3779b9
  0x9e3779b97f4a7c16

or pi:
  0x243f6a88
  0x243f6a8885a308d3

or e:
  0xb7e15162
  0xb7e151628aed2a6b

or whatever. The constants don't have to be prime. They don't even
have to be odd, because the low bits will always be masked off anyway.
The whole "hash one integer value down to X bits" is very different in
this respect to things like string hashes, where you really do tend to
want primes (because you keep all bits).

But none of those are sparse. I think *some* amount of sparseness
might be ok if it allows people with bad CPU's to do it using
shift-and-adds, it just must not be as sparse as the current number,
the 64-bit one on particular.

There's presumably a few optimal values from a "spread bits out
evenly" standpoint, and they won't have anything to do with random
irrational constants, and will have everything to do with having nice
bitpatterns.

I'm adding Rik to the cc, because the original broken constants came
from him long long ago (they go back to 2002, originally only used for
the waitqueue hashing. Maybe he had some other input that caused him
to believe that the primeness actually mattered.

 Linus

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Linus Torvalds

On Fri, Apr 29, 2016 at 2:10 PM, Linus Torvalds
 wrote:
> On Thu, Apr 28, 2016 at 11:32 AM, Linus Torvalds
>  wrote:
>>
>> hash_long/ptr really shouldn't care about the low bits being zero at
>> all, because it should mix in all the bits (using a prime multiplier
>> and taking the high bits).
>
> Looking at this assertion, it doesn't actually pan out very well at all.
>
> The 32-bit hash looks to be ok. Certainly not perfect, but not horrible.
>
> The 64-bit hash seems to be quite horribly bad with lots of values.

Ok, I have tried to research this a bit more. There's a lot of
confusion here that causes the fact that hash_64() sucks donkey balls.

The basic method for the hashing method by multiplication is fairly
sane. It's well-documented, and attributed to Knuth as the comment
above it says.

However, there's two problems in there that degrade the hash, and
particularly so for the 64-bit case.

The first is the confusion about multiplying with a prime number..
That actually makes no sense at all, and is in fact entirely wrong.
There's no reason to try to pick a prime number for the
multiplication, and I'm not seeing Knuth having ever suggested that.

Knuth's suggestion is to do the multiplication with a floating point
value A somewhere in between 0 and 1, and multiplying the integer with
it, and then just taking the fractional part and multiply it up by 'm'
and do the floor of that to get a number in the range 0..m

At no point are primes involved.

And our multiplication really does approximate that - except it's
being done in fixed-point arithmetic (so the thing you multiply with
is basically n./2**64, and the overflow is what gets rid of the
fractional part - then getting the "high bits" of the result is really
just multiplying by a power of two and taking the floor of the
result).

So the first mistake is thinking that the thing you should multiply
with should be prime. The primality matters for when you use a
division to get a modulus, which is presumably where the confusion
came from.

Now, what value 'A' you use doesn't seem to really matter much. Knuth
suggested the fractional part of the golden ratio, but I suspect that
is purely because it's an irrational number that is not near to 0 or
1. You could use anything, although since "random bit pattern" is part
of the issue, irrational numbers are a good starting point. I suspect
that with our patterns, there's actually seldom a good reason to do
lots of high-order bits, so you might as well pick something closer to
0, but whatever - it's only going to matter for the overflow part that
gets thrown away anyway.

The second mistake - and the one that actually causes the real problem
- is to believe that the "bit sparseness" is a good thing. It's not.
It's _very_ much not. If you don't mix the bits well in the
multiplication, you get exactly the problem we hit: certain bit
patternsjust  will not mix up into the higher order bits.

So if you look at what the actual golden ratio representation *should* have bee:

  #define GOLDEN_RATIO_32 0x9e3779b9
  #define GOLDEN_RATIO_64 0x9e3779b97f4a7c16

and then compare it to the values we actually _use_ (bit-sparse primes
closeish to those values):

  #define GOLDEN_RATIO_PRIME_32 0x9e370001UL
  #define GOLDEN_RATIO_PRIME_64 0x9e37fffc0001UL

you start to see the problem. The right set of values have roughly 50%
of the bits set in a random pattern. The wrong set of values do not.

But as far as I an tell, you might as well use the fractional part of
'e' or 'pi' or just make random shit up that has a reasonable bit
distribution.

So we could use the fractional part of the golden ratio (phi):
  0x9e3779b9
  0x9e3779b97f4a7c16

or pi:
  0x243f6a88
  0x243f6a8885a308d3

or e:
  0xb7e15162
  0xb7e151628aed2a6b

or whatever. The constants don't have to be prime. They don't even
have to be odd, because the low bits will always be masked off anyway.
The whole "hash one integer value down to X bits" is very different in
this respect to things like string hashes, where you really do tend to
want primes (because you keep all bits).

But none of those are sparse. I think *some* amount of sparseness
might be ok if it allows people with bad CPU's to do it using
shift-and-adds, it just must not be as sparse as the current number,
the 64-bit one on particular.

There's presumably a few optimal values from a "spread bits out
evenly" standpoint, and they won't have anything to do with random
irrational constants, and will have everything to do with having nice
bitpatterns.

I'm adding Rik to the cc, because the original broken constants came
from him long long ago (they go back to 2002, originally only used for
the waitqueue hashing. Maybe he had some other input that caused him
to believe that the primeness actually mattered.

 Linus

Re: [RFC PATCH v1 13/18] x86: DMA support for memory encryption

2016-04-29 Thread Tom Lendacky

On 04/29/2016 11:27 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 29, 2016 at 10:12:45AM -0500, Tom Lendacky wrote:
>> On 04/29/2016 02:17 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Apr 26, 2016 at 05:58:12PM -0500, Tom Lendacky wrote:
 Since DMA addresses will effectively look like 48-bit addresses when the
 memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
 device performing the DMA does not support 48-bits. SWIOTLB will be
 initialized to create un-encrypted bounce buffers for use by these devices.

>>>
>>>
>>> I presume the sme_me_mask does not use the lower 48 bits?
>>
>> The sme_me_mask will actually be bit 47. So, when applied, the address
>> will become a 48-bit address.
>>
>>>
>>>
>>> ..snip..
 diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
 index 7d56d1b..594dc65 100644
 --- a/arch/x86/mm/mem_encrypt.c
 +++ b/arch/x86/mm/mem_encrypt.c
 @@ -12,6 +12,8 @@
  
  #include 
  #include 
 +#include 
 +#include 
  
  #include 
  #include 
 @@ -168,6 +170,25 @@ void __init sme_early_init(void)
  }
  
  /* Architecture __weak replacement functions */
 +void __init mem_encrypt_init(void)
 +{
 +  if (!sme_me_mask)
 +  return;
 +
 +  /* Make SWIOTLB use an unencrypted DMA area */
 +  swiotlb_clear_encryption();
 +}
 +
 +unsigned long swiotlb_get_me_mask(void)
 +{
 +  return sme_me_mask;
 +}
 +
 +void swiotlb_set_mem_dec(void *vaddr, unsigned long size)
 +{
 +  sme_set_mem_dec(vaddr, size);
 +}
 +
  void __init *efi_me_early_memremap(resource_size_t paddr,
   unsigned long size)
  {
 diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
 index 017fced..121b9de 100644
 --- a/include/linux/swiotlb.h
 +++ b/include/linux/swiotlb.h
 @@ -30,6 +30,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long 
 nslabs, int verbose);
  extern unsigned long swiotlb_nr_tbl(void);
  unsigned long swiotlb_size_or_default(void);
  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 +extern void __init swiotlb_clear_encryption(void);
  
  /*
   * Enumeration for sync targets
 diff --git a/init/main.c b/init/main.c
 index b3c6e36..1013d1c 100644
 --- a/init/main.c
 +++ b/init/main.c
 @@ -458,6 +458,10 @@ void __init __weak thread_info_cache_init(void)
  }
  #endif
  
 +void __init __weak mem_encrypt_init(void)
 +{
 +}
 +
  /*
   * Set up kernel memory allocators
   */
 @@ -597,6 +601,8 @@ asmlinkage __visible void __init start_kernel(void)
 */
locking_selftest();
  
 +  mem_encrypt_init();
 +
  #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
>>>
>>> What happens if devices use the bounce buffer before mem_encrypt_init()?
>>
>> The call to mem_encrypt_init is early in the boot process, I may have
>> overlooked something, but what devices would be performing DMA before
>> this?
> 
> I am not saying that you overlooked. Merely wondering if somebody re-orders 
> these
> calls what would happen. It maybe also good to have a comment right before
> mem_encrpyt_init stating what will happen if the device does DMA before the 
> function
> is called.
> 

Ah, ok. Before mem_encrypt_init is called the bounce buffers will be
marked as encrypted in the page tables. The use of the bounce buffers
will not have the memory encryption bit as part of the DMA address so a
device will DMA into memory in the clear. When the bounce buffer is
copied to the original buffer it will be accessed by a virtual address
that has the memory encryption bit set in the page tables. So the
plaintext data that was DMA'd in will be decrypted resulting in invalid
data in the destination buffer.

I'll be sure to add a comment before the call.

Thanks,
Tom

Re: [RFC PATCH v1 13/18] x86: DMA support for memory encryption

2016-04-29 Thread Tom Lendacky

On 04/29/2016 11:27 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 29, 2016 at 10:12:45AM -0500, Tom Lendacky wrote:
>> On 04/29/2016 02:17 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Apr 26, 2016 at 05:58:12PM -0500, Tom Lendacky wrote:
 Since DMA addresses will effectively look like 48-bit addresses when the
 memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
 device performing the DMA does not support 48-bits. SWIOTLB will be
 initialized to create un-encrypted bounce buffers for use by these devices.

>>>
>>>
>>> I presume the sme_me_mask does not use the lower 48 bits?
>>
>> The sme_me_mask will actually be bit 47. So, when applied, the address
>> will become a 48-bit address.
>>
>>>
>>>
>>> ..snip..
 diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
 index 7d56d1b..594dc65 100644
 --- a/arch/x86/mm/mem_encrypt.c
 +++ b/arch/x86/mm/mem_encrypt.c
 @@ -12,6 +12,8 @@
  
  #include 
  #include 
 +#include 
 +#include 
  
  #include 
  #include 
 @@ -168,6 +170,25 @@ void __init sme_early_init(void)
  }
  
  /* Architecture __weak replacement functions */
 +void __init mem_encrypt_init(void)
 +{
 +  if (!sme_me_mask)
 +  return;
 +
 +  /* Make SWIOTLB use an unencrypted DMA area */
 +  swiotlb_clear_encryption();
 +}
 +
 +unsigned long swiotlb_get_me_mask(void)
 +{
 +  return sme_me_mask;
 +}
 +
 +void swiotlb_set_mem_dec(void *vaddr, unsigned long size)
 +{
 +  sme_set_mem_dec(vaddr, size);
 +}
 +
  void __init *efi_me_early_memremap(resource_size_t paddr,
   unsigned long size)
  {
 diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
 index 017fced..121b9de 100644
 --- a/include/linux/swiotlb.h
 +++ b/include/linux/swiotlb.h
 @@ -30,6 +30,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long 
 nslabs, int verbose);
  extern unsigned long swiotlb_nr_tbl(void);
  unsigned long swiotlb_size_or_default(void);
  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 +extern void __init swiotlb_clear_encryption(void);
  
  /*
   * Enumeration for sync targets
 diff --git a/init/main.c b/init/main.c
 index b3c6e36..1013d1c 100644
 --- a/init/main.c
 +++ b/init/main.c
 @@ -458,6 +458,10 @@ void __init __weak thread_info_cache_init(void)
  }
  #endif
  
 +void __init __weak mem_encrypt_init(void)
 +{
 +}
 +
  /*
   * Set up kernel memory allocators
   */
 @@ -597,6 +601,8 @@ asmlinkage __visible void __init start_kernel(void)
 */
locking_selftest();
  
 +  mem_encrypt_init();
 +
  #ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
>>>
>>> What happens if devices use the bounce buffer before mem_encrypt_init()?
>>
>> The call to mem_encrypt_init is early in the boot process, I may have
>> overlooked something, but what devices would be performing DMA before
>> this?
> 
> I am not saying that you overlooked. Merely wondering if somebody re-orders 
> these
> calls what would happen. It maybe also good to have a comment right before
> mem_encrpyt_init stating what will happen if the device does DMA before the 
> function
> is called.
> 

Ah, ok. Before mem_encrypt_init is called the bounce buffers will be
marked as encrypted in the page tables. The use of the bounce buffers
will not have the memory encryption bit as part of the DMA address so a
device will DMA into memory in the clear. When the bounce buffer is
copied to the original buffer it will be accessed by a virtual address
that has the memory encryption bit set in the page tables. So the
plaintext data that was DMA'd in will be decrypted resulting in invalid
data in the destination buffer.

I'll be sure to add a comment before the call.

Thanks,
Tom

Re: [PATCH 26/32] perf/x86/intel/cqm: integrate CQM cgroups with scheduler

2016-04-29 Thread Vikas Shivappa




On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


When an event is terminated, intel_cqm_event_stop calls
pqr_cache_update_rmid and sets state->next_rmid to the rmid of its
parent in the RMID hierarchy. That would make next call to
__pqr_update to update PQR_ASSOC.


What about all the cases like context swith to a pid2 which is not monitored.. 
(assume event1 still exists)




On Fri, Apr 29, 2016 at 2:32 PM Vikas Shivappa
 wrote:




On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


if __intel_cqm_no_event_sched_in does nothing, the PQR_ASSOC msr is
still updated if state->rmid != state->next_rmid  in __pqr_update,


But due to 2 and 3 below they are equal ?



even if next_rmid_mode == PQR_RMID_MODE_NOEVENT .

On Fri, Apr 29, 2016 at 2:01 PM, Vikas Shivappa
 wrote:



On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


Not sure I see the problem you point here. In step 3, PQR_ASSOC is
updated with RMID1, __pqr_update is the one called using the scheduler
hook, right after perf sched_in .

On Fri, Apr 29, 2016 at 1:25 PM, Vikas Shivappa
 wrote:




On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:


Allow monitored cgroups to update the PQR MSR during task switch even
without an associated perf_event.

The package RMID for the current monr associated with a monitored
cgroup is written to hw during task switch (after perf_events is run)
if perf_event did not write a RMID for an event.

perf_event and any other caller of pqr_cache_update_rmid can update the
CPU's RMID using one of two modes:
 - PQR_RMID_MODE_NOEVENT: A RMID that do not correspond to an event.
   e.g. the RMID of the root pmonr when no event is scheduled.
 - PQR_RMID_MODE_EVENT:   A RMID used by an event. Set during pmu::add
unset on pmu::del. This mode prevents from using a non-event
cgroup RMID.

This patch also introduces caching of writes to PQR MSR within the
per-pcu
pqr state variable. This interface to update RMIDs and CLOSIDs will be
also utilized in upcoming versions of Intel's MBM and CAT drivers.

Reviewed-by: Stephane Eranian 
Signed-off-by: David Carrillo-Cisneros 
---
arch/x86/events/intel/cqm.c   | 65
+--
arch/x86/events/intel/cqm.h   |  2 --
arch/x86/include/asm/pqr_common.h | 53 +++
arch/x86/kernel/cpu/pqr_common.c  | 46 +++
4 files changed, 135 insertions(+), 31 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index daf9fdf..4ece0a4 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -198,19 +198,6 @@ static inline int cqm_prmid_update(struct prmid
*prmid)
return __cqm_prmid_update(prmid, __rmid_min_update_time);
}

-/*
- * Updates caller cpu's cache.
- */
-static inline void __update_pqr_prmid(struct prmid *prmid)
-{
-   struct intel_pqr_state *state = this_cpu_ptr(_state);
-
-   if (state->rmid == prmid->rmid)
-   return;
-   state->rmid = prmid->rmid;
-   wrmsr(MSR_IA32_PQR_ASSOC, prmid->rmid, state->closid);
-}
-
static inline bool __valid_pkg_id(u16 pkg_id)
{
return pkg_id < PQR_MAX_NR_PKGS;
@@ -2531,12 +2518,11 @@ static inline bool cqm_group_leader(struct
perf_event *event)
static inline void __intel_cqm_event_start(
struct perf_event *event, union prmid_summary summary)
{
-   u16 pkg_id = topology_physical_package_id(smp_processor_id());
if (!(event->hw.state & PERF_HES_STOPPED))
return;
-
event->hw.state &= ~PERF_HES_STOPPED;
-   __update_pqr_prmid(__prmid_from_rmid(pkg_id,
summary.sched_rmid));
+
+   pqr_cache_update_rmid(summary.sched_rmid, PQR_RMID_MODE_EVENT);
}

static void intel_cqm_event_start(struct perf_event *event, int mode)
@@ -2566,7 +2552,7 @@ static void intel_cqm_event_stop(struct perf_event
*event, int mode)
/* Occupancy of CQM events is obtained at read. No need to read
 * when event is stopped since read on inactive cpus succeed.
 */
-   __update_pqr_prmid(__prmid_from_rmid(pkg_id,
summary.sched_rmid));
+   pqr_cache_update_rmid(summary.sched_rmid,
PQR_RMID_MODE_NOEVENT);
}

static int intel_cqm_event_add(struct perf_event *event, int mode)
@@ -2977,6 +2963,8 @@ static void intel_cqm_cpu_starting(unsigned int
cpu)

state->rmid = 0;
state->closid = 0;
+   state->next_rmid = 0;
+   state->next_closid = 0;

/* XXX: lock */
/* XXX: Make sure this case is handled when hotplug happens. */
@@ -3152,6 +3140,12 @@ static int __init intel_cqm_init(void)
pr_info("Intel CQM monitoring enabled with at least %u rmids per
package.\n",
min_max_rmid + 1);

+   /* Make sure pqr_common_enable_key is enabled after
+* cqm_initialized_key.
+*/
+   barrier();
+
+

Re: [PATCH 26/32] perf/x86/intel/cqm: integrate CQM cgroups with scheduler

2016-04-29 Thread Vikas Shivappa




On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


When an event is terminated, intel_cqm_event_stop calls
pqr_cache_update_rmid and sets state->next_rmid to the rmid of its
parent in the RMID hierarchy. That would make next call to
__pqr_update to update PQR_ASSOC.


What about all the cases like context swith to a pid2 which is not monitored.. 
(assume event1 still exists)




On Fri, Apr 29, 2016 at 2:32 PM Vikas Shivappa
 wrote:




On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


if __intel_cqm_no_event_sched_in does nothing, the PQR_ASSOC msr is
still updated if state->rmid != state->next_rmid  in __pqr_update,


But due to 2 and 3 below they are equal ?



even if next_rmid_mode == PQR_RMID_MODE_NOEVENT .

On Fri, Apr 29, 2016 at 2:01 PM, Vikas Shivappa
 wrote:



On Fri, 29 Apr 2016, David Carrillo-Cisneros wrote:


Not sure I see the problem you point here. In step 3, PQR_ASSOC is
updated with RMID1, __pqr_update is the one called using the scheduler
hook, right after perf sched_in .

On Fri, Apr 29, 2016 at 1:25 PM, Vikas Shivappa
 wrote:




On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:


Allow monitored cgroups to update the PQR MSR during task switch even
without an associated perf_event.

The package RMID for the current monr associated with a monitored
cgroup is written to hw during task switch (after perf_events is run)
if perf_event did not write a RMID for an event.

perf_event and any other caller of pqr_cache_update_rmid can update the
CPU's RMID using one of two modes:
 - PQR_RMID_MODE_NOEVENT: A RMID that do not correspond to an event.
   e.g. the RMID of the root pmonr when no event is scheduled.
 - PQR_RMID_MODE_EVENT:   A RMID used by an event. Set during pmu::add
unset on pmu::del. This mode prevents from using a non-event
cgroup RMID.

This patch also introduces caching of writes to PQR MSR within the
per-pcu
pqr state variable. This interface to update RMIDs and CLOSIDs will be
also utilized in upcoming versions of Intel's MBM and CAT drivers.

Reviewed-by: Stephane Eranian 
Signed-off-by: David Carrillo-Cisneros 
---
arch/x86/events/intel/cqm.c   | 65
+--
arch/x86/events/intel/cqm.h   |  2 --
arch/x86/include/asm/pqr_common.h | 53 +++
arch/x86/kernel/cpu/pqr_common.c  | 46 +++
4 files changed, 135 insertions(+), 31 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index daf9fdf..4ece0a4 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -198,19 +198,6 @@ static inline int cqm_prmid_update(struct prmid
*prmid)
return __cqm_prmid_update(prmid, __rmid_min_update_time);
}

-/*
- * Updates caller cpu's cache.
- */
-static inline void __update_pqr_prmid(struct prmid *prmid)
-{
-   struct intel_pqr_state *state = this_cpu_ptr(_state);
-
-   if (state->rmid == prmid->rmid)
-   return;
-   state->rmid = prmid->rmid;
-   wrmsr(MSR_IA32_PQR_ASSOC, prmid->rmid, state->closid);
-}
-
static inline bool __valid_pkg_id(u16 pkg_id)
{
return pkg_id < PQR_MAX_NR_PKGS;
@@ -2531,12 +2518,11 @@ static inline bool cqm_group_leader(struct
perf_event *event)
static inline void __intel_cqm_event_start(
struct perf_event *event, union prmid_summary summary)
{
-   u16 pkg_id = topology_physical_package_id(smp_processor_id());
if (!(event->hw.state & PERF_HES_STOPPED))
return;
-
event->hw.state &= ~PERF_HES_STOPPED;
-   __update_pqr_prmid(__prmid_from_rmid(pkg_id,
summary.sched_rmid));
+
+   pqr_cache_update_rmid(summary.sched_rmid, PQR_RMID_MODE_EVENT);
}

static void intel_cqm_event_start(struct perf_event *event, int mode)
@@ -2566,7 +2552,7 @@ static void intel_cqm_event_stop(struct perf_event
*event, int mode)
/* Occupancy of CQM events is obtained at read. No need to read
 * when event is stopped since read on inactive cpus succeed.
 */
-   __update_pqr_prmid(__prmid_from_rmid(pkg_id,
summary.sched_rmid));
+   pqr_cache_update_rmid(summary.sched_rmid,
PQR_RMID_MODE_NOEVENT);
}

static int intel_cqm_event_add(struct perf_event *event, int mode)
@@ -2977,6 +2963,8 @@ static void intel_cqm_cpu_starting(unsigned int
cpu)

state->rmid = 0;
state->closid = 0;
+   state->next_rmid = 0;
+   state->next_closid = 0;

/* XXX: lock */
/* XXX: Make sure this case is handled when hotplug happens. */
@@ -3152,6 +3140,12 @@ static int __init intel_cqm_init(void)
pr_info("Intel CQM monitoring enabled with at least %u rmids per
package.\n",
min_max_rmid + 1);

+   /* Make sure pqr_common_enable_key is enabled after
+* cqm_initialized_key.
+*/
+   barrier();
+
+   static_branch_enable(_common_enable_key);
return ret;

error_init_mutex:
@@ -3163,4 +3157,41 @@ error:
return ret;
}

+/* Schedule task without a CQM

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > > [Trim the CC list]
> > > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > > [...]
> > > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > > lockdep gets very unhappy about the same functions being called with
> > > > different reclaim contexts. e.g.  directory block mapping might
> > > > occur from readdir (no transaction context) or within transactions
> > > > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > > > stop lockdep emitting false positive warnings
> > > 
> > > As already said in other email, I have tried to revert the above
> > > commit and tried to run it with some fs workloads but didn't manage
> > > to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> > > have tried to find reports which led to this commit but didn't succeed
> > > much. Everything is from much earlier or later. Do you happen to
> > > remember which loads triggered them, what they looked like or have an
> > > idea what to try to reproduce them? So far I was trying heavy parallel
> > > fs_mark, kernbench inside a tiny virtual machine so any of those have
> > > triggered direct reclaim all the time.
> > 
> > Most of those issues were reported by users and not reproducable by
> > any obvious means.
> 
> I would really appreciate a reference to some of those (my google-fu has
> failed me) or at least a pattern of those splats

If you can't find them with google, then I won't. Google is mostly
useless as a patch/mailing list search tool these days. You can try
looking through this list:

https://www.google.com.au/search?q=XFS+lockdep+site:oss.sgi.com+-splice

but I'm not seeing anything particularly relevant in that list -
there isn't a single reclaim related lockdep report in that...

> - was it 
> "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> or a different class reports?

Typically that was involved, but it quite often there'd be a number
of locks and sometimes even interrupt stacks in an interaction
between 5 or 6 different processes. Lockdep covers all sorts of
stuff now (like fs freeze annotations as well as locks and memory
reclaim) so sometimes the only thing we can do is remove the
reclaim context from the stack and see if that makes it go away...
> 
> > They may have been fixed since, but I'm sceptical
> > of that because, generally speaking, developer testing only catches
> > the obvious lockdep issues. i.e. it's users that report all the
> > really twisty issues, and they are generally not reproducable except
> > under their production workloads...
> > 
> > IOWs, the absence of reports in your testing does not mean there
> > isn't a problem, and that is one of the biggest problems with
> > lockdep annotations - we have no way of ever knowing if they are
> > still necessary or not without exposing users to regressions and
> > potential deadlocks.
> 
> I understand your points here but if we are sure that those lockdep
> reports are just false positives then we should rather provide an api to
> silence lockdep for those paths

I agree with this - please provide such infrastructure before we
need it...

> than abusing GFP_NOFS which a) hurts
> the overal reclaim healthiness

Which doesn't actually seem to be a problem for the vast majority of
users.

> and b) works around a non-existing
> problem with lockdep disabled which is the vast majority of
> configurations.

But the moment we have a lockdep problem, we get bug reports from
all over the place and people complaining about it, so we are
*required* to silence them one way or another. And, like I said,
when the choice is simply adding GFP_NOFS or spending a week or two
completely reworking complex code that has functioned correctly for
15 years, the risk/reward *always* falls on the side of "just add
GFP_NOFS".

Please keep in mind that there is as much code in fs/xfs as there is
in the mm/ subsystem, and XFS has twice that in userspace as well.
I say this, because we have only have 3-4 full time developers to do
all the work required on this code base, unlike the mm/ subsystem
which had 30-40 full time MM developers attending LSFMM. This is why
I push back on suggestions that require significant redesign of
subsystem code to handle memory allocation/reclaim quirks - most
subsystems simply don't have the resources available to do such
work, and so will always look for the quick 2 minute fix when it is
available

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-29 Thread Dave Chinner

On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > > [Trim the CC list]
> > > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > > [...]
> > > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > > lockdep gets very unhappy about the same functions being called with
> > > > different reclaim contexts. e.g.  directory block mapping might
> > > > occur from readdir (no transaction context) or within transactions
> > > > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > > > stop lockdep emitting false positive warnings
> > > 
> > > As already said in other email, I have tried to revert the above
> > > commit and tried to run it with some fs workloads but didn't manage
> > > to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> > > have tried to find reports which led to this commit but didn't succeed
> > > much. Everything is from much earlier or later. Do you happen to
> > > remember which loads triggered them, what they looked like or have an
> > > idea what to try to reproduce them? So far I was trying heavy parallel
> > > fs_mark, kernbench inside a tiny virtual machine so any of those have
> > > triggered direct reclaim all the time.
> > 
> > Most of those issues were reported by users and not reproducable by
> > any obvious means.
> 
> I would really appreciate a reference to some of those (my google-fu has
> failed me) or at least a pattern of those splats

If you can't find them with google, then I won't. Google is mostly
useless as a patch/mailing list search tool these days. You can try
looking through this list:

https://www.google.com.au/search?q=XFS+lockdep+site:oss.sgi.com+-splice

but I'm not seeing anything particularly relevant in that list -
there isn't a single reclaim related lockdep report in that...

> - was it 
> "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> or a different class reports?

Typically that was involved, but it quite often there'd be a number
of locks and sometimes even interrupt stacks in an interaction
between 5 or 6 different processes. Lockdep covers all sorts of
stuff now (like fs freeze annotations as well as locks and memory
reclaim) so sometimes the only thing we can do is remove the
reclaim context from the stack and see if that makes it go away...
> 
> > They may have been fixed since, but I'm sceptical
> > of that because, generally speaking, developer testing only catches
> > the obvious lockdep issues. i.e. it's users that report all the
> > really twisty issues, and they are generally not reproducable except
> > under their production workloads...
> > 
> > IOWs, the absence of reports in your testing does not mean there
> > isn't a problem, and that is one of the biggest problems with
> > lockdep annotations - we have no way of ever knowing if they are
> > still necessary or not without exposing users to regressions and
> > potential deadlocks.
> 
> I understand your points here but if we are sure that those lockdep
> reports are just false positives then we should rather provide an api to
> silence lockdep for those paths

I agree with this - please provide such infrastructure before we
need it...

> than abusing GFP_NOFS which a) hurts
> the overal reclaim healthiness

Which doesn't actually seem to be a problem for the vast majority of
users.

> and b) works around a non-existing
> problem with lockdep disabled which is the vast majority of
> configurations.

But the moment we have a lockdep problem, we get bug reports from
all over the place and people complaining about it, so we are
*required* to silence them one way or another. And, like I said,
when the choice is simply adding GFP_NOFS or spending a week or two
completely reworking complex code that has functioned correctly for
15 years, the risk/reward *always* falls on the side of "just add
GFP_NOFS".

Please keep in mind that there is as much code in fs/xfs as there is
in the mm/ subsystem, and XFS has twice that in userspace as well.
I say this, because we have only have 3-4 full time developers to do
all the work required on this code base, unlike the mm/ subsystem
which had 30-40 full time MM developers attending LSFMM. This is why
I push back on suggestions that require significant redesign of
subsystem code to handle memory allocation/reclaim quirks - most
subsystems simply don't have the resources available to do such
work, and so will always look for the quick 2 minute fix when it is
available

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 3/3] input: touchscreen: mxs-lradc: Add support for touchscreen

2016-04-29 Thread Dmitry Torokhov

Hi Ksenija,

On Fri, Apr 29, 2016 at 01:49:11PM +0200, Ksenija Stanojevic wrote:
> Add mxs-lradc touchscreen driver.
> 
> Signed-off-by: Ksenija Stanojevic 
> ---
>  drivers/input/touchscreen/Kconfig|  14 +-
>  drivers/input/touchscreen/Makefile   |   1 +
>  drivers/input/touchscreen/mxs-lradc-ts.c | 729 
> +++
>  3 files changed, 742 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/input/touchscreen/mxs-lradc-ts.c
> 
> diff --git a/drivers/input/touchscreen/Kconfig 
> b/drivers/input/touchscreen/Kconfig
> index 8ecdc38..d614d248 100644
> --- a/drivers/input/touchscreen/Kconfig
> +++ b/drivers/input/touchscreen/Kconfig
> @@ -566,7 +566,7 @@ config TOUCHSCREEN_HP600
>   depends on SH_HP6XX && SH_ADC
>   help
> Say Y here if you have a HP Jornada 620/660/680/690 and want to
> -  support the built-in touchscreen.
> +   support the built-in touchscreen.
>  
> To compile this driver as a module, choose M here: the
> module will be called hp680_ts_input.
> @@ -685,7 +685,7 @@ config TOUCHSCREEN_UCB1400
> This enables support for the Philips UCB1400 touchscreen interface.
> The UCB1400 is an AC97 audio codec.  The touchscreen interface
> will be initialized only after the ALSA subsystem has been
> -   brought up and the UCB1400 detected.  You therefore have to
> +   brought up and the UCB1400 detected.  You therefore have to

Why do we have the tab in the middle of the text?

> configure ALSA support as well (either built-in or modular,
> independently of whether this driver is itself built-in or
> modular) for this driver to work.
> @@ -842,6 +842,16 @@ config TOUCHSCREEN_MX25
> To compile this driver as a module, choose M here: the
> module will be called fsl-imx25-tcq.
>  
> +config TOUCHSCREEN_MXS_LRADC
> + tristate "Freescale i.MX23/i.MX28 LRADC touchscreen"
> + depends on MFD_MXS_LRADC
> + help
> +   Say Y here if you have a touchscreen connected to the low-resolution
> +   analog-to-digital converter (LRADC) on an i.MX23 or i.MX28 processor.
> +
> +   To compile this driver as a module, choose M here: the module will be
> +   called mxs-lradc-ts.
> +
>  config TOUCHSCREEN_MC13783
>   tristate "Freescale MC13783 touchscreen input driver"
>   depends on MFD_MC13XXX
> diff --git a/drivers/input/touchscreen/Makefile 
> b/drivers/input/touchscreen/Makefile
> index f42975e..513a6ff 100644
> --- a/drivers/input/touchscreen/Makefile
> +++ b/drivers/input/touchscreen/Makefile
> @@ -54,6 +54,7 @@ obj-$(CONFIG_TOUCHSCREEN_MIGOR) += migor_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_MMS114) += mms114.o
>  obj-$(CONFIG_TOUCHSCREEN_MTOUCH) += mtouch.o
>  obj-$(CONFIG_TOUCHSCREEN_MK712)  += mk712.o
> +obj-$(CONFIG_TOUCHSCREEN_MXS_LRADC)  += mxs-lradc-ts.o
>  obj-$(CONFIG_TOUCHSCREEN_HP600)  += hp680_ts_input.o
>  obj-$(CONFIG_TOUCHSCREEN_HP7XX)  += jornada720_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_IPAQ_MICRO) += ipaq-micro-ts.o
> diff --git a/drivers/input/touchscreen/mxs-lradc-ts.c 
> b/drivers/input/touchscreen/mxs-lradc-ts.c
> new file mode 100644
> index 000..27abb8e
> --- /dev/null
> +++ b/drivers/input/touchscreen/mxs-lradc-ts.c
> @@ -0,0 +1,729 @@
> +/*
> + * Freescale MXS LRADC driver
> + *
> + * Copyright (c) 2012 DENX Software Engineering, GmbH.
> + * Marek Vasut 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Touchscreen handling
> + */
> +enum mxs_lradc_ts_plate {
> + LRADC_TOUCH = 0,
> + LRADC_SAMPLE_X,
> + LRADC_SAMPLE_Y,
> + LRADC_SAMPLE_PRESSURE,
> + LRADC_SAMPLE_VALID,
> +};
> +
> +struct mxs_lradc_ts {
> + struct mxs_lradc*lradc;
> + struct device   *dev;
> + /*
> +  * When the touchscreen is enabled, we give it two private virtual
> +  * channels: #6 and #7. This means that only 6 virtual channels (instead
> +  * of 8) will be available for buffered capture.
> +  */
> +#define TOUCHSCREEN_VCHANNEL17
> +#define TOUCHSCREEN_VCHANNEL26
> +
> + struct input_dev*ts_input;
> +
> + enum mxs_lradc_ts_plate cur_plate; /* state machine */
> + boolts_valid;
> + unsignedts_x_pos;
> + unsignedts_y_pos;
> +

Re: [PATCH 3/3] input: touchscreen: mxs-lradc: Add support for touchscreen

2016-04-29 Thread Dmitry Torokhov

Hi Ksenija,

On Fri, Apr 29, 2016 at 01:49:11PM +0200, Ksenija Stanojevic wrote:
> Add mxs-lradc touchscreen driver.
> 
> Signed-off-by: Ksenija Stanojevic 
> ---
>  drivers/input/touchscreen/Kconfig|  14 +-
>  drivers/input/touchscreen/Makefile   |   1 +
>  drivers/input/touchscreen/mxs-lradc-ts.c | 729 
> +++
>  3 files changed, 742 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/input/touchscreen/mxs-lradc-ts.c
> 
> diff --git a/drivers/input/touchscreen/Kconfig 
> b/drivers/input/touchscreen/Kconfig
> index 8ecdc38..d614d248 100644
> --- a/drivers/input/touchscreen/Kconfig
> +++ b/drivers/input/touchscreen/Kconfig
> @@ -566,7 +566,7 @@ config TOUCHSCREEN_HP600
>   depends on SH_HP6XX && SH_ADC
>   help
> Say Y here if you have a HP Jornada 620/660/680/690 and want to
> -  support the built-in touchscreen.
> +   support the built-in touchscreen.
>  
> To compile this driver as a module, choose M here: the
> module will be called hp680_ts_input.
> @@ -685,7 +685,7 @@ config TOUCHSCREEN_UCB1400
> This enables support for the Philips UCB1400 touchscreen interface.
> The UCB1400 is an AC97 audio codec.  The touchscreen interface
> will be initialized only after the ALSA subsystem has been
> -   brought up and the UCB1400 detected.  You therefore have to
> +   brought up and the UCB1400 detected.  You therefore have to

Why do we have the tab in the middle of the text?

> configure ALSA support as well (either built-in or modular,
> independently of whether this driver is itself built-in or
> modular) for this driver to work.
> @@ -842,6 +842,16 @@ config TOUCHSCREEN_MX25
> To compile this driver as a module, choose M here: the
> module will be called fsl-imx25-tcq.
>  
> +config TOUCHSCREEN_MXS_LRADC
> + tristate "Freescale i.MX23/i.MX28 LRADC touchscreen"
> + depends on MFD_MXS_LRADC
> + help
> +   Say Y here if you have a touchscreen connected to the low-resolution
> +   analog-to-digital converter (LRADC) on an i.MX23 or i.MX28 processor.
> +
> +   To compile this driver as a module, choose M here: the module will be
> +   called mxs-lradc-ts.
> +
>  config TOUCHSCREEN_MC13783
>   tristate "Freescale MC13783 touchscreen input driver"
>   depends on MFD_MC13XXX
> diff --git a/drivers/input/touchscreen/Makefile 
> b/drivers/input/touchscreen/Makefile
> index f42975e..513a6ff 100644
> --- a/drivers/input/touchscreen/Makefile
> +++ b/drivers/input/touchscreen/Makefile
> @@ -54,6 +54,7 @@ obj-$(CONFIG_TOUCHSCREEN_MIGOR) += migor_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_MMS114) += mms114.o
>  obj-$(CONFIG_TOUCHSCREEN_MTOUCH) += mtouch.o
>  obj-$(CONFIG_TOUCHSCREEN_MK712)  += mk712.o
> +obj-$(CONFIG_TOUCHSCREEN_MXS_LRADC)  += mxs-lradc-ts.o
>  obj-$(CONFIG_TOUCHSCREEN_HP600)  += hp680_ts_input.o
>  obj-$(CONFIG_TOUCHSCREEN_HP7XX)  += jornada720_ts.o
>  obj-$(CONFIG_TOUCHSCREEN_IPAQ_MICRO) += ipaq-micro-ts.o
> diff --git a/drivers/input/touchscreen/mxs-lradc-ts.c 
> b/drivers/input/touchscreen/mxs-lradc-ts.c
> new file mode 100644
> index 000..27abb8e
> --- /dev/null
> +++ b/drivers/input/touchscreen/mxs-lradc-ts.c
> @@ -0,0 +1,729 @@
> +/*
> + * Freescale MXS LRADC driver
> + *
> + * Copyright (c) 2012 DENX Software Engineering, GmbH.
> + * Marek Vasut 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Touchscreen handling
> + */
> +enum mxs_lradc_ts_plate {
> + LRADC_TOUCH = 0,
> + LRADC_SAMPLE_X,
> + LRADC_SAMPLE_Y,
> + LRADC_SAMPLE_PRESSURE,
> + LRADC_SAMPLE_VALID,
> +};
> +
> +struct mxs_lradc_ts {
> + struct mxs_lradc*lradc;
> + struct device   *dev;
> + /*
> +  * When the touchscreen is enabled, we give it two private virtual
> +  * channels: #6 and #7. This means that only 6 virtual channels (instead
> +  * of 8) will be available for buffered capture.
> +  */
> +#define TOUCHSCREEN_VCHANNEL17
> +#define TOUCHSCREEN_VCHANNEL26
> +
> + struct input_dev*ts_input;
> +
> + enum mxs_lradc_ts_plate cur_plate; /* state machine */
> + boolts_valid;
> + unsignedts_x_pos;
> + unsignedts_y_pos;
> + unsignedts_pressure;
> +
> +

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

On Fri, Apr 29, 2016 at 9:32 PM, Linus Torvalds
 wrote:
wrote:
> For example, that _long_ range of bits set ("7fffc" in the middle)
> is effectively just one bit set with a subtraction. And it's *right*
> in that bit area that is supposed to shuffle bits 14-40 to the high bits
> (which is what we actually *use*. So it effectively shuffles none of those
> bits around at all, and if you have a stride of 4096, your'e pretty much
> done for.

Gee, I recall saying something a lot like that.
> 64 bits is just as bad...  0x9e37fffc0001 becomes
> 0x7fffc0001, which is 2^51 - 2^18 + 1.

After researching it, I think that the "high bits of a multiply" is
in fact a decent way to do such a hash.  Interestingly, for a randomly
chosen odd multiplier A, the high k bits of the w-bit product A*x is a
universal hash function in the cryptographic sense.  See section 2.3 of
http://arxiv.org/abs/1504.06804

One thing I note is that the advice in the comments to choose a prime
number is misquoting Knuth!  Knuth says (vol. 3 section 6.4) the number
should be *relatively* prime to the word size, which for binary computers
simply means odd.

When we have a hardware multiplier, keeping the Hamming weight low is
a waste of time.  When we don't, clever organization can do
better than the very naive addition/subtraction chain in the
current hash_64().

To multiply by the 32-bit constant 1640531527 = 0x61c88647 (which is
the negative of the golden ratio, so has identical distribution
properties) can be done in 6 shifts + adds, with a critical path
length of 7 operations (3 shifts + 4 adds).

#define GOLDEN_RATIO_32 0x61c88647  /* phi^2 = 1-phi */
/* Returns x * GOLDEN_RATIO_32 without a hardware multiplier */
unsigned hash_32(unsigned x)
{
unsigned y, z;
/* Path length */
y = (x << 19) + x;  /* 1 shift + 1 add */
z = (x << 9) + y;   /* 1 shift + 2 add */
x = (x << 23) + z;  /* 1 shift + 3 add */
z = (z << 8) + y;   /* 2 shift + 3 add */
x = (x << 6) - x;   /* 2 shift + 4 add */
return (z << 3) + x;/* 3 shift + 4 add */
}

Finding a similarly efficient chain for the 64-bit golden ratio
0x9E3779B97F4A7C15 = 11400714819323198485
or
0x61C8864680B583EB = 7046029254386353131

is a bit of a challenge, but algorithms are known.

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread George Spelvin

On Fri, Apr 29, 2016 at 9:32 PM, Linus Torvalds
 wrote:
wrote:
> For example, that _long_ range of bits set ("7fffc" in the middle)
> is effectively just one bit set with a subtraction. And it's *right*
> in that bit area that is supposed to shuffle bits 14-40 to the high bits
> (which is what we actually *use*. So it effectively shuffles none of those
> bits around at all, and if you have a stride of 4096, your'e pretty much
> done for.

Gee, I recall saying something a lot like that.
> 64 bits is just as bad...  0x9e37fffc0001 becomes
> 0x7fffc0001, which is 2^51 - 2^18 + 1.

After researching it, I think that the "high bits of a multiply" is
in fact a decent way to do such a hash.  Interestingly, for a randomly
chosen odd multiplier A, the high k bits of the w-bit product A*x is a
universal hash function in the cryptographic sense.  See section 2.3 of
http://arxiv.org/abs/1504.06804

One thing I note is that the advice in the comments to choose a prime
number is misquoting Knuth!  Knuth says (vol. 3 section 6.4) the number
should be *relatively* prime to the word size, which for binary computers
simply means odd.

When we have a hardware multiplier, keeping the Hamming weight low is
a waste of time.  When we don't, clever organization can do
better than the very naive addition/subtraction chain in the
current hash_64().

To multiply by the 32-bit constant 1640531527 = 0x61c88647 (which is
the negative of the golden ratio, so has identical distribution
properties) can be done in 6 shifts + adds, with a critical path
length of 7 operations (3 shifts + 4 adds).

#define GOLDEN_RATIO_32 0x61c88647  /* phi^2 = 1-phi */
/* Returns x * GOLDEN_RATIO_32 without a hardware multiplier */
unsigned hash_32(unsigned x)
{
unsigned y, z;
/* Path length */
y = (x << 19) + x;  /* 1 shift + 1 add */
z = (x << 9) + y;   /* 1 shift + 2 add */
x = (x << 23) + z;  /* 1 shift + 3 add */
z = (z << 8) + y;   /* 2 shift + 3 add */
x = (x << 6) - x;   /* 2 shift + 4 add */
return (z << 3) + x;/* 3 shift + 4 add */
}

Finding a similarly efficient chain for the 64-bit golden ratio
0x9E3779B97F4A7C15 = 11400714819323198485
or
0x61C8864680B583EB = 7046029254386353131

is a bit of a challenge, but algorithms are known.

Re: [PATCH] arm64: Relocate screen_info.lfb_base on PCI BAR allocation

2016-04-29 Thread Bjorn Helgaas

[+cc Matt, because I'm out of my depth in this UEFI stuff]

On Fri, Apr 29, 2016 at 11:52:47PM +0200, Alexander Graf wrote:
> On 29.04.16 23:37, Bjorn Helgaas wrote:
> > On Fri, Apr 29, 2016 at 10:51:34PM +0200, Alexander Graf wrote:

> >> I think the only thing we can really take as granted is the physical
> >> address we receive. Any efi device topology may be as implementation
> >> dependent as your firmware vendor wants to it to be.
> >>
> >> So couldn't we just try to see whether the efifb is within a pci mmio
> >> window? We could then go through all devices on it, search for
> >> overlapping BAR allocations and know the device. From there it would
> >> just be a matter of officially reserving the region - or setting the
> >> resource to FIXED, right?
> > 
> > You could do something like a quirk that ran on every device after
> > reading the BARs.  Then you could see if any of the device's resources
> > contain the framebuffer address, and if so, set the FIXED bit for that
> > resource.  Or you could save the pci_dev and the BAR number and make
> > the framebuffer driver actually claim that device.  That would have
> > the advantage of cleaner ownership and allowing the core to reassign
> > it if necessary.
> 
> Unfortunately the framebuffer driver gets initialized after the BARs got
> reassigned, so that wouldn't work :).

Quirks (fixups) can run at several different times, including before
we read the BARs, after we read them but before any reassignment, etc.
I'll attach a call graph of the enumeration process (somewhat out of
date now, but enough to show the fixup points).  If we went this
route, a header quirk (DECLARE_PCI_FIXUP_HEADER) would do what we
need.

> Can you point me to the code that does "read[ing] the BARs"? So far my
> impression was that we don't even try to read out the current BAR values
> from pci config space, but I can't claim I fully grasp the Linux pci
> code quite yet.

The efifb.c driver doesn't do anything at all with PCI (it includes
linux/pci.h, but probably doesn't need it).  That's part of what I'm
suggesting -- if it *did* register as a PCI device driver, then it
would look at pci_dev->resource[n], which is populated by the PCI
core based on the BAR values.

> > It would be a lot nicer if UEFI told us which device was the console.
> > We can't figure this out from the ConOut variable or the HCDP or PCDP
> > table or something?
> 
> I guess you could try to look at the device path for the GOP handle, but
> I'm not sure any layout of the device paths is mandated by the spec. For
> all I know a valid implementation could just claim the GOP interface
> fell from the skies and it'd be legitimate. But I'm happy to stand
> corrected if someone points me to the respective part of the spec.

Is ConOut what you're after?  I.e., is the whole point of this
exercise to get a framebuffer driver attached to the device that was
the firmware console?  I would think the ConOut path should be
decodable -- it has to tell you how to navigate the interconnect from
the CPU to the device.  But I don't know how to do it.

It looks like on x86, at least, setup_gop32()/setup_gop64() might be
extracting the framebuffer address from the ConOut device and stuffing
it into screen_info, which is what efifb.c later looks at (maybe this
is what Ard was referring to).

> I suppose HCDP and PCDP are ACPI tables? We can't rely on those :).
> AArch64 systems can (and some do) boot perfectly fine without any ACPI
> tables exposed via uEFI. And the less we have to marry ACPI with uEFI
> the happier I am - uEFI seems like a pretty nice path towards
> standardized booting while ACPI was quite a nightmare every time I
> looked at it so far ;).

HCDP/PCDP was from DIG64, originally for ia64, but I can't find a copy
of the spec anymore.  The http://www.dig64.org/specifications/ pointer
in pcdp.h appears broken.  So forget I brought this up; this looks dead.

Bjorn

PCI enumeration call graph:

  pci_scan_root_bus
pci_create_root_bus
  pci_alloc_host_bridge
  device_register(pci_host_bridge)
  device_register(pci_bus)
  pcibios_add_bus(pci_bus)  # pcibios

pci_scan_child_bus
  pci_scan_slot
pci_scan_single_device
  pci_scan_device
pci_alloc_dev
pci_setup_device
  printk("[%04x:%04x] type ...")
  pci_fixup_device(pci_fixup_early) # fixup
  PCI_HEADER_TYPE_NORMAL:
pci_read_irq
pci_read_bases  # <-- read BARs
  PCI_HEADER_TYPE_BRIDGE:
pci_read_irq
pci_read_bases
  PCI_HEADER_TYPE_CARDBUS:
pci_read_irq
pci_read_bases
  pci_device_add
pci_fixup_device(pci_fixup_header)  # fixup
pci_reassigndev_resource_alignment
pci_init_capabilities
pcibios_add_device  # pcibios
device_add

Re: [PATCH] arm64: Relocate screen_info.lfb_base on PCI BAR allocation

2016-04-29 Thread Bjorn Helgaas

[+cc Matt, because I'm out of my depth in this UEFI stuff]

On Fri, Apr 29, 2016 at 11:52:47PM +0200, Alexander Graf wrote:
> On 29.04.16 23:37, Bjorn Helgaas wrote:
> > On Fri, Apr 29, 2016 at 10:51:34PM +0200, Alexander Graf wrote:

> >> I think the only thing we can really take as granted is the physical
> >> address we receive. Any efi device topology may be as implementation
> >> dependent as your firmware vendor wants to it to be.
> >>
> >> So couldn't we just try to see whether the efifb is within a pci mmio
> >> window? We could then go through all devices on it, search for
> >> overlapping BAR allocations and know the device. From there it would
> >> just be a matter of officially reserving the region - or setting the
> >> resource to FIXED, right?
> > 
> > You could do something like a quirk that ran on every device after
> > reading the BARs.  Then you could see if any of the device's resources
> > contain the framebuffer address, and if so, set the FIXED bit for that
> > resource.  Or you could save the pci_dev and the BAR number and make
> > the framebuffer driver actually claim that device.  That would have
> > the advantage of cleaner ownership and allowing the core to reassign
> > it if necessary.
> 
> Unfortunately the framebuffer driver gets initialized after the BARs got
> reassigned, so that wouldn't work :).

Quirks (fixups) can run at several different times, including before
we read the BARs, after we read them but before any reassignment, etc.
I'll attach a call graph of the enumeration process (somewhat out of
date now, but enough to show the fixup points).  If we went this
route, a header quirk (DECLARE_PCI_FIXUP_HEADER) would do what we
need.

> Can you point me to the code that does "read[ing] the BARs"? So far my
> impression was that we don't even try to read out the current BAR values
> from pci config space, but I can't claim I fully grasp the Linux pci
> code quite yet.

The efifb.c driver doesn't do anything at all with PCI (it includes
linux/pci.h, but probably doesn't need it).  That's part of what I'm
suggesting -- if it *did* register as a PCI device driver, then it
would look at pci_dev->resource[n], which is populated by the PCI
core based on the BAR values.

> > It would be a lot nicer if UEFI told us which device was the console.
> > We can't figure this out from the ConOut variable or the HCDP or PCDP
> > table or something?
> 
> I guess you could try to look at the device path for the GOP handle, but
> I'm not sure any layout of the device paths is mandated by the spec. For
> all I know a valid implementation could just claim the GOP interface
> fell from the skies and it'd be legitimate. But I'm happy to stand
> corrected if someone points me to the respective part of the spec.

Is ConOut what you're after?  I.e., is the whole point of this
exercise to get a framebuffer driver attached to the device that was
the firmware console?  I would think the ConOut path should be
decodable -- it has to tell you how to navigate the interconnect from
the CPU to the device.  But I don't know how to do it.

It looks like on x86, at least, setup_gop32()/setup_gop64() might be
extracting the framebuffer address from the ConOut device and stuffing
it into screen_info, which is what efifb.c later looks at (maybe this
is what Ard was referring to).

> I suppose HCDP and PCDP are ACPI tables? We can't rely on those :).
> AArch64 systems can (and some do) boot perfectly fine without any ACPI
> tables exposed via uEFI. And the less we have to marry ACPI with uEFI
> the happier I am - uEFI seems like a pretty nice path towards
> standardized booting while ACPI was quite a nightmare every time I
> looked at it so far ;).

HCDP/PCDP was from DIG64, originally for ia64, but I can't find a copy
of the spec anymore.  The http://www.dig64.org/specifications/ pointer
in pcdp.h appears broken.  So forget I brought this up; this looks dead.

Bjorn

PCI enumeration call graph:

  pci_scan_root_bus
pci_create_root_bus
  pci_alloc_host_bridge
  device_register(pci_host_bridge)
  device_register(pci_bus)
  pcibios_add_bus(pci_bus)  # pcibios

pci_scan_child_bus
  pci_scan_slot
pci_scan_single_device
  pci_scan_device
pci_alloc_dev
pci_setup_device
  printk("[%04x:%04x] type ...")
  pci_fixup_device(pci_fixup_early) # fixup
  PCI_HEADER_TYPE_NORMAL:
pci_read_irq
pci_read_bases  # <-- read BARs
  PCI_HEADER_TYPE_BRIDGE:
pci_read_irq
pci_read_bases
  PCI_HEADER_TYPE_CARDBUS:
pci_read_irq
pci_read_bases
  pci_device_add
pci_fixup_device(pci_fixup_header)  # fixup
pci_reassigndev_resource_alignment
pci_init_capabilities
pcibios_add_device  # pcibios
device_add

Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks

2016-04-29 Thread Josh Poimboeuf

On Fri, Apr 29, 2016 at 02:38:02PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:50 PM, Josh Poimboeuf  wrote:
> > On Fri, Apr 29, 2016 at 12:39:16PM -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf  
> >> wrote:
> >> > Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> >> > stacks start at the same offset right above their saved pt_regs,
> >> > regardless of which syscall was used to enter the kernel.  That creates
> >> > a nice convention which makes it straightforward to identify the
> >> > "bottom" of the stack, which can be useful for stack walking code which
> >> > needs to verify the stack is sane.
> >> >
> >> > However there are still a few types of tasks which don't yet follow that
> >> > convention:
> >> >
> >> > 1) CPU idle tasks, aka the "swapper" tasks
> >> >
> >> > 2) freshly forked TIF_FORK tasks which don't have a stack at all
> >> >
> >> > Make the idle tasks conform to the new stack bottom convention by
> >> > starting their stack at a sizeof(pt_regs) offset from the end of the
> >> > stack page.
> >> >
> >> > Signed-off-by: Josh Poimboeuf 
> >> > ---
> >> >  arch/x86/kernel/head_64.S | 7 ---
> >> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> >> > index 6dbd2c0..0b12311 100644
> >> > --- a/arch/x86/kernel/head_64.S
> >> > +++ b/arch/x86/kernel/head_64.S
> >> > @@ -296,8 +296,9 @@ ENTRY(start_cpu)
> >> >  *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> >> >  *  address given in m16:64.
> >> >  */
> >> > -   movqinitial_code(%rip),%rax
> >> > -   pushq   $0  # fake return address to stop unwinder
> >> > +   call1f  # put return address on stack for 
> >> > unwinder
> >> > +1: xorq%rbp, %rbp  # clear frame pointer
> >> > +   movqinitial_code(%rip), %rax
> >> > pushq   $__KERNEL_CS# set correct cs
> >> > pushq   %rax# target address in negative space
> >> > lretq
> >> > @@ -325,7 +326,7 @@ ENDPROC(start_cpu0)
> >> > GLOBAL(initial_gs)
> >> > .quad   INIT_PER_CPU_VAR(irq_stack_union)
> >> > GLOBAL(initial_stack)
> >> > -   .quad  init_thread_union+THREAD_SIZE-8
> >> > +   .quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
> >>
> >> As long as you're doing this, could you also set orig_ax to -1?  I
> >> remember running into some oddities resulting from orig_ax containing
> >> garbage at some point.
> >
> > I assume you mean to initialize the orig_rax value in the pt_regs at the
> > bottom of the stack of the idle task?
> >
> > How could that cause a problem?  Since the idle task never returns from
> > a system call, I'd assume that memory never gets accessed?
> >
> 
> Look at collect_syscall in lib/syscall.c

I don't see how collect_syscall() can be called for the per-cpu idle
"swapper" tasks (which is what the above code affects).  They don't have
pids or /proc entries so you can't do /proc//syscall on them.

-- 
Josh

Re: [RFC PATCH v2 03/18] x86/asm/head: standardize the bottom of the stack for idle tasks

2016-04-29 Thread Josh Poimboeuf

On Fri, Apr 29, 2016 at 02:38:02PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:50 PM, Josh Poimboeuf  wrote:
> > On Fri, Apr 29, 2016 at 12:39:16PM -0700, Andy Lutomirski wrote:
> >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf  
> >> wrote:
> >> > Thanks to all the recent x86 entry code refactoring, most tasks' kernel
> >> > stacks start at the same offset right above their saved pt_regs,
> >> > regardless of which syscall was used to enter the kernel.  That creates
> >> > a nice convention which makes it straightforward to identify the
> >> > "bottom" of the stack, which can be useful for stack walking code which
> >> > needs to verify the stack is sane.
> >> >
> >> > However there are still a few types of tasks which don't yet follow that
> >> > convention:
> >> >
> >> > 1) CPU idle tasks, aka the "swapper" tasks
> >> >
> >> > 2) freshly forked TIF_FORK tasks which don't have a stack at all
> >> >
> >> > Make the idle tasks conform to the new stack bottom convention by
> >> > starting their stack at a sizeof(pt_regs) offset from the end of the
> >> > stack page.
> >> >
> >> > Signed-off-by: Josh Poimboeuf 
> >> > ---
> >> >  arch/x86/kernel/head_64.S | 7 ---
> >> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> >> > index 6dbd2c0..0b12311 100644
> >> > --- a/arch/x86/kernel/head_64.S
> >> > +++ b/arch/x86/kernel/head_64.S
> >> > @@ -296,8 +296,9 @@ ENTRY(start_cpu)
> >> >  *  REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> >> >  *  address given in m16:64.
> >> >  */
> >> > -   movqinitial_code(%rip),%rax
> >> > -   pushq   $0  # fake return address to stop unwinder
> >> > +   call1f  # put return address on stack for 
> >> > unwinder
> >> > +1: xorq%rbp, %rbp  # clear frame pointer
> >> > +   movqinitial_code(%rip), %rax
> >> > pushq   $__KERNEL_CS# set correct cs
> >> > pushq   %rax# target address in negative space
> >> > lretq
> >> > @@ -325,7 +326,7 @@ ENDPROC(start_cpu0)
> >> > GLOBAL(initial_gs)
> >> > .quad   INIT_PER_CPU_VAR(irq_stack_union)
> >> > GLOBAL(initial_stack)
> >> > -   .quad  init_thread_union+THREAD_SIZE-8
> >> > +   .quad  init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
> >>
> >> As long as you're doing this, could you also set orig_ax to -1?  I
> >> remember running into some oddities resulting from orig_ax containing
> >> garbage at some point.
> >
> > I assume you mean to initialize the orig_rax value in the pt_regs at the
> > bottom of the stack of the idle task?
> >
> > How could that cause a problem?  Since the idle task never returns from
> > a system call, I'd assume that memory never gets accessed?
> >
> 
> Look at collect_syscall in lib/syscall.c

I don't see how collect_syscall() can be called for the per-cpu idle
"swapper" tasks (which is what the above code affects).  They don't have
pids or /proc entries so you can't do /proc//syscall on them.

-- 
Josh

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1642 matches

Mail list logo