Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-06-07 Thread Sergey Senozhatsky
On (24/06/07 10:40), Nhat Pham wrote:
> Personally, I'm not super convinced about class locks. We're
> essentially relying on the post-compression size of the data to
> load-balance the queries - I can imagine a scenario where a workload
> has a concentrated distribution of post-compression data (i.e its
> pages are compressed to similar-ish sizes), and we're once again
> contending for a (few) lock(s) again.
> 
> That said, I'll let the data tell the story :) We don't need a perfect
> solution, just a good enough solution for now.

Speaking of size class locks:

One thing to mention is that zsmalloc merges size classes, we never have
documented/claimed 256 size classe, the actual number is always much
much lower.  Each such "cluster" (merged size classes) holds a range of
objects' sizes (e.g. 3504-3584 bytes).  The wider the cluster's size range
the more likely the (size class) lock contention is.

Setting CONFIG_ZSMALLOC_CHAIN_SIZE to 10 or higher makes zsmalloc pool
to be configured with more size class clusters (which means that clusters
hold narrower size intervals).


Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-06-05 Thread Sergey Senozhatsky
On (24/06/06 12:46), Chengming Zhou wrote:
> >> Agree, I think we should try to improve locking scalability of zsmalloc.
> >> I have some thoughts to share, no code or test data yet:
> >>
> >> 1. First, we can change the pool global lock to per-class lock, which
> >>is more fine-grained.
> > 
> > Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
> > and size_class's locks" [1] claimed no significant difference
> > between class->lock and pool->lock.
> 
> Ok, I haven't looked into the history much, that seems preparation of trying
> to introduce reclaim in the zsmalloc? Not sure. But now with the reclaim code
> in zsmalloc has gone, should we change back to the per-class lock? Which is

Well, the point that commit made was that Nhat (and Johannes?) were
unable to detect any impact of pool->lock on a variety of cases.  So
we went on with code simplification.

> obviously more fine-grained than the pool lock. Actually, I have just done it,
> will test to get some data later.

Thanks, we'll need data on this.  I'm happy to take the patch, but
jumping back and forth between class->lock and pool->lock merely
"for obvious reasons" is not what I'm extremely excited about.


Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

2024-06-05 Thread Sergey Senozhatsky
On (24/06/06 10:49), Chengming Zhou wrote:
> > Thanks for trying this out. This is interesting, so even two zpools is
> > too much fragmentation for your use case.
> > 
> > I think there are multiple ways to go forward here:
> > (a) Make the number of zpools a config option, leave the default as
> > 32, but allow special use cases to set it to 1 or similar. This is
> > probably not preferable because it is not clear to users how to set
> > it, but the idea is that no one will have to set it except special use
> > cases such as Erhard's (who will want to set it to 1 in this case).
> > 
> > (b) Make the number of zpools scale linearly with the number of CPUs.
> > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > approach is that with a large number of CPUs, too many zpools will
> > start having diminishing returns. Fragmentation will keep increasing,
> > while the scalability/concurrency gains will diminish.
> > 
> > (c) Make the number of zpools scale logarithmically with the number of
> > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > of zpools from increasing too much and close to the status quo. The
> > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > with a more fancy magic equation (e.g. 4log2(nr_cpus/4)).
> > 
> > (d) Make the number of zpools scale linearly with memory. This makes
> > more sense than scaling with CPUs because increasing the number of
> > zpools increases fragmentation, so it makes sense to limit it by the
> > available memory. This is also more consistent with other magic
> > numbers we have (e.g. SWAP_ADDRESS_SPACE_SHIFT).
> > 
> > The problem is that unlike zswap trees, the zswap pool is not
> > connected to the swapfile size, so we don't have an indication for how
> > much memory will be in the zswap pool. We can scale the number of
> > zpools with the entire memory on the machine during boot, but this
> > seems like it would be difficult to figure out, and will not take into
> > consideration memory hotplugging and the zswap global limit changing.
> > 
> > (e) A creative mix of the above.
> > 
> > (f) Something else (probably simpler).
> > 
> > I am personally leaning toward (c), but I want to hear the opinions of
> > other people here. Yu, Vlastimil, Johannes, Nhat? Anyone else?
> > 
> > In the long-term, I think we may want to address the lock contention
> > in zsmalloc itself instead of zswap spawning multiple zpools.

Sorry, I'm sure I'm not following this discussion closely enough,
has the lock contention been demonstrated/proved somehow? lock-stats?

> Agree, I think we should try to improve locking scalability of zsmalloc.
> I have some thoughts to share, no code or test data yet:
> 
> 1. First, we can change the pool global lock to per-class lock, which
>is more fine-grained.

Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
and size_class's locks" [1] claimed no significant difference
between class->lock and pool->lock.

[1] https://lkml.kernel.org/r/20221128191616.1261026-4-npha...@gmail.com


Re: [RFC PATCH] mm: z3fold: rename CONFIG_Z3FOLD to CONFIG_Z3FOLD_DEPRECATED

2024-01-15 Thread Sergey Senozhatsky
On (24/01/15 08:47), Yosry Ahmed wrote:
> > At this point I NACK this patch. We're about to submit an allocator
> > which is clearly better that z3fold and is faster that zsmalloc in
> > most cases and that submission will mark z3fold as deprecated. But for
> > now this move is premature.
>
> I think unless there are current users of z3fold that cannot use
> zsmalloc, the introduction of a new allocator should be irrelevant to
> deprecating z3fold. Do you know of such users? Can you explain why
> zsmalloc is not usable for them?

I second this. I believe "do we know whether z3fold has users"
is a legit question that means no offense.


Re: [PATCH v2 25/44] printk: Remove trace_.*_rcuidle() usage

2022-09-19 Thread Sergey Senozhatsky
On (22/09/19 12:00), Peter Zijlstra wrote:
> The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console
> tracepoint"), was printk usage from the cpuidle path where RCU was
> already disabled.
> 
> Per the patches earlier in this series, this is no longer the case.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Reviewed-by: Sergey Senozhatsky 
> Acked-by: Petr Mladek 

Acked-by: Sergey Senozhatsky 


Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-13 Thread Sergey Senozhatsky
il.com, vgu...@kernel.org, linux-...@vger.kernel.org, mon...@monstr.eu, 
rost...@goodmis.org, r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, Peter Zijlstra 
, amakha...@vmware.com, bjorn.anders...@linaro.org, 
h...@zytor.com, sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, Arnd Bergmann , rich...@nod.at, 
x...@kernel.org, li...@armlinux.org.uk, mi...@redhat.com, 
a...@eecs.berkeley.edu, paul...@kernel.org, h...@linux.ibm.com, 
stefan.kristians...@saunalahti.fi, openr...@lists.librecores.org, 
paul.walms...@sifive.com, linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
pv-driv...@vmware.com, linux-m...@vger.kernel.org, pal...@dabbelt.com, 
anup@brainfa
 ult.org, i...@jurassic.park.msu.ru, johan...@sipsolutions.net, 
linuxppc-dev@lists.ozlabs.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On Thu, Jun 9, 2022 at 10:02 PM Petr Mladek  wrote:
>
> On Thu 2022-06-09 20:30:58, Sergey Senozhatsky wrote:
> > My emails are getting rejected... Let me try web-interface
>
> Bad day for mail sending. I have problems as well ;-)

For me the problem is still there and apparently it's an "too many
recipients" error.

> > I'm somewhat curious whether we can actually remove that trace event.
>
> Good question.
>
> Well, I think that it might be useful. It allows to see trace and
> printk messages together.

Fair enough. Seems that back in 2011 people were pretty happy with it
https://lore.kernel.org/all/1322161388.5366.54.ca...@jlt3.sipsolutions.net/T/#m7bf6416f469119372191f22a6ecf653c5f7331d2

but... reportedly, one of the folks who Ack-ed it (*cough cough*
PeterZ) has never used it.

> It was ugly when it was in the console code. The new location
> in vprintk_store() allows to have it even "correctly" sorted
> (timestamp) against other tracing messages.

That's true.


Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-13 Thread Sergey Senozhatsky
il.com, vgu...@kernel.org, linux-...@vger.kernel.org, mon...@monstr.eu, 
rost...@goodmis.org, r...@vger.kernel.org, b...@alien8.de, bc...@quicinc.com, 
tsbog...@alpha.franken.de, linux-par...@vger.kernel.org, sudeep.ho...@arm.com, 
shawn...@kernel.org, da...@davemloft.net, dal...@libc.org, Peter Zijlstra 
, amakha...@vmware.com, bjorn.anders...@linaro.org, 
h...@zytor.com, sparcli...@vger.kernel.org, linux-hexa...@vger.kernel.org, 
linux-ri...@lists.infradead.org, anton.iva...@cambridgegreys.com, 
jo...@southpole.se, Arnd Bergmann , rich...@nod.at, 
x...@kernel.org, li...@armlinux.org.uk, mi...@redhat.com, 
a...@eecs.berkeley.edu, paul...@kernel.org, h...@linux.ibm.com, 
stefan.kristians...@saunalahti.fi, openr...@lists.librecores.org, 
paul.walms...@sifive.com, linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
pv-driv...@vmware.com, linux-m...@vger.kernel.org, pal...@dabbelt.com, 
anup@brainfa
 ult.org, i...@jurassic.park.msu.ru, johan...@sipsolutions.net, 
linuxppc-dev@lists.ozlabs.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


On Thu, Jun 9, 2022 at 10:06 PM Petr Mladek  wrote:
>
> Makes sense. Feel free to use for this patch:
>
> Acked-by: Petr Mladek 

Reviewed-by: Sergey Senozhatsky 


Re: [PATCH 24/36] printk: Remove trace_.*_rcuidle() usage

2022-06-10 Thread Sergey Senozhatsky
e>, ulli.kr...@googlemail.com, vgu...@kernel.org, linux-...@vger.kernel.org, 
j...@joshtriplett.org, rost...@goodmis.org, r...@vger.kernel.org, 
b...@alien8.de, bc...@quicinc.com, tsbog...@alpha.franken.de, 
linux-par...@vger.kernel.org, sudeep.ho...@arm.com, shawn...@kernel.org, 
da...@davemloft.net, dal...@libc.org, t...@atomide.com, amakha...@vmware.com, 
bjorn.anders...@linaro.org, h...@zytor.com, sparcli...@vger.kernel.org, 
linux-hexa...@vger.kernel.org, linux-ri...@lists.infradead.org, 
anton.iva...@cambridgegreys.com, jo...@southpole.se, yury.no...@gmail.com, 
rich...@nod.at, x...@kernel.org, li...@armlinux.org.uk, mi...@redhat.com, 
a...@eecs.berkeley.edu, paul...@kernel.org, h...@linux.ibm.com, 
stefan.kristians...@saunalahti.fi, openr...@lists.librecores.org, 
paul.walms...@sifive.com, linux-te...@vger.kernel.org, namhy...@kernel.org, 
andriy.shevche...@linux.intel.com, jpoim...@kernel.org, jgr...@suse.com, 
mon...@monstr.eu, linux-m...@vger.kernel.org, pal...@dabbelt.com, 
a...@brainfault.org
 , i...@jurassic.park.msu.ru, johan...@sipsolutions.net, 
linuxppc-dev@lists.ozlabs.org
Errors-To: linuxppc-dev-bounces+archive=mail-archive@lists.ozlabs.org
Sender: "Linuxppc-dev" 


My emails are getting rejected... Let me try web-interface

Kudos to Petr for the questions and thanks to PeterZ for the answers.

On Thu, Jun 9, 2022 at 7:02 PM Peter Zijlstra  wrote:
> This is the tracepoint used to spool all of printk into ftrace, I
> suspect there's users, but I haven't used it myself.

I'm somewhat curious whether we can actually remove that trace event.


Re: [PATCH printk v1 00/10] printk: introduce atomic consoles and sync mode

2021-08-30 Thread Sergey Senozhatsky
On (21/08/05 17:47), Petr Mladek wrote:
[..]
> 3. After introducing console kthread(s):
> 
>   int printk(...)
>   {
>   vprintk_store();
>   wake_consoles_via_irqwork();
>   }
> 
>   + in panic:
> 
>   + with atomic console like after this patchset?
>   + without atomic consoles?
> 
>   + during early boot?

I guess I'd also add netconsole to the list.


Re: [PATCH printk v2 2/5] printk: remove safe buffers

2021-04-01 Thread Sergey Senozhatsky
On (21/04/01 16:17), Petr Mladek wrote:
> > For the long term, we should introduce a printk-context API that allows
> > callers to perfectly pack their multi-line output into a single
> > entry. We discussed [0][1] this back in August 2020.
> 
> We need a "short" term solution. There are currently 3 solutions:
> 
> 1. Keep nmi_safe() and all the hacks around.
> 
> 2. Serialize nmi_cpu_backtrace() by a spin lock and later by
>the special lock used also by atomic consoles.
> 
> 3. Tell complaining people how to sort the messed logs.

Are we talking about nmi_cpu_backtrace()->dump_stack() or some
other path?

dump_stack() seems to be already serialized by `dump_lock`. Hmm,
show_regs() is not serialized, seems like it should be under the
same `dump_lock` as dump_stack().


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-21 Thread Sergey Senozhatsky
On (21/03/17 00:33), John Ogness wrote:
[..]
>  void printk_nmi_direct_enter(void)
>  {
> @@ -324,27 +44,8 @@ void printk_nmi_direct_exit(void)
>   this_cpu_and(printk_context, ~PRINTK_NMI_DIRECT_CONTEXT_MASK);
>  }
>  
> -#else
> -
> -static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
> -{
> - return 0;
> -}
> -
>  #endif /* CONFIG_PRINTK_NMI */
>  
> -/*
> - * Lock-less printk(), to avoid deadlocks should the printk() recurse
> - * into itself. It uses a per-CPU buffer to store the message, just like
> - * NMI.
> - */
> -static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args)
> -{
> - struct printk_safe_seq_buf *s = this_cpu_ptr(_print_seq);
> -
> - return printk_safe_log_store(s, fmt, args);
> -}
> -
>  /* Can be preempted by NMI. */
>  void __printk_safe_enter(void)
>  {
> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list 
> args)
>* Use the main logbuf even in NMI. But avoid calling console
>* drivers that might have their own locks.
>*/
> - if ((this_cpu_read(printk_context) & PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> + if (this_cpu_read(printk_context) &
> + (PRINTK_NMI_DIRECT_CONTEXT_MASK |
> +  PRINTK_NMI_CONTEXT_MASK |
> +  PRINTK_SAFE_CONTEXT_MASK)) {

Do we need printk_nmi_direct_enter/exit() and PRINTK_NMI_DIRECT_CONTEXT_MASK?
Seems like all printk_safe() paths are now DIRECT - we store messages to the
prb, but don't call console drivers.

-ss


[PATCH] hvc: unify console setup naming

2020-06-19 Thread Sergey Senozhatsky
Use the 'common' foo_console_setup() naming scheme. There are 71
foo_console_setup() callbacks and only one foo_setup_console().

Signed-off-by: Sergey Senozhatsky 
Cc: Andy Shevchenko 
Cc: Steven Rostedt 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
---
 drivers/tty/hvc/hvc_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 5ef08905fe05..2a0e51a20e34 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -603,7 +603,7 @@ static void xen_hvm_early_write(uint32_t vtermno, const 
char *str, int len) { }
 #endif
 
 #ifdef CONFIG_EARLY_PRINTK
-static int __init xenboot_setup_console(struct console *console, char *string)
+static int __init xenboot_console_setup(struct console *console, char *string)
 {
static struct xencons_info xenboot;
 
@@ -647,7 +647,7 @@ static void xenboot_write_console(struct console *console, 
const char *string,
 struct console xenboot_console = {
.name   = "xenboot",
.write  = xenboot_write_console,
-   .setup  = xenboot_setup_console,
+   .setup  = xenboot_console_setup,
.flags  = CON_PRINTBUFFER | CON_BOOT | CON_ANYTIME,
.index  = -1,
 };
-- 
2.27.0



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Sergey Senozhatsky
On (20/04/09 10:08), Minchan Kim wrote:
> > > Even though I don't know how many usecase we have using zsmalloc as
> > > module(I heard only once by dumb reason), it could affect existing
> > > users. Thus, please include concrete explanation in the patch to
> > > justify when the complain occurs.
> > 
> > The justification is 'we can unexport functions that have no sane reason
> > of being exported in the first place'.
> > 
> > The Changelog pretty much says that.
> 
> Okay, I hope there is no affected user since this patch.
> If there are someone, they need to provide sane reason why they want
> to have zsmalloc as module.

I'm one of those who use zsmalloc as a module - mainly because I use zram
as a compressing general purpose block device, not as a swap device.
I create zram0, mkfs, mount, checkout and compile code, once done -
umount, rmmod. This reduces the number of writes to SSD. Some people use
tmpfs, but zram device(-s) can be much larger in size. That's a niche use
case and I'm not against the patch.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-14 Thread Sergey Senozhatsky
On (19/11/13 10:39), Steven Rostedt wrote:
[..]
> > void show_stack(struct task_struct *task, unsigned long *sp, int log_level)
> > {
> > printk_emergency_enter(log_level);
> > __show_stack(task, sp);
> > printk_emergency_exit();
> > }
> > // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - //
> >
> > show_stack() never schedules, disabling preemption around it should
> > not change anything. Should it be interrupted, we will handle it via
> > preempt count.
>
> Please no! The whole point of the printk rewrite was to allow for
> printk to be preemptible and used in more contexts. The show_stack() can
> be all over the place and is not a fast function. Let's not disable
> preemption for it.

I never said that this code should be used all over the place. What
I did say several times was that this code is for quick debugging,
when one sits behind a slow serial console and wants to tweak loglevel
of all printk-s of a particular context/function only.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-14 Thread Sergey Senozhatsky
On (19/11/13 16:40), Dmitry Safonov wrote:
> It's also not very fun for me to create those patches.
> But they fix console_loglevel issues (I hope we could un-export it in
> the end) and also I need it for my other patches those will produce
> warnings with debug loglevel when configured through sysctl.

No objections.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-13 Thread Sergey Senozhatsky
On (19/11/13 02:25), Dmitry Safonov wrote:
> I guess I've pointed that in my point of view price for one-time testing
> code is cheaper than adding a new printk feature to swap log levels on
> the fly.
[..]
> I've gone through functions used by sysrq driver and the same changes
> introducing log level parameter would be needed for: sched_show_task(),
> debug_show_all_locks(), show_regs(), show_state(), show_mem(). Some of
> them don't need any platform changes, but at least show_regs() needs.

Good points and nice conclusion.

Well, here we go. There is a number of generally useful functions that
print nice data and where people might want to have better loglevel control
(for debugging purposes). show_stack() is just one of them. Patching all
those functions, which you have mentioned above, is hardly a fun task to do.
Hence the printk() per-CPU per-context loglevel proposition. The code there
is not clever or complicated and is meant for debugging purposes only, but
with just 3 lines of code one can do some stuff:

/* @BTW you can't do this with "%s" KERN_FOO ;P */
+   printk_emergency_enter(LOGLEVEL_SCHED);
+   debug_show_all_locks();
+   printk_emergency_exit();

Now...
I'm not sure if this whole thing is up to "printk maintainers only".
If no one is going to use "emergency printk contexts" then there is
no point in having that code in the kernel.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-12 Thread Sergey Senozhatsky
On (19/11/13 02:41), Dmitry Safonov wrote:
[..]
> 
> I don't strongly disagree, but if you look at those results:
> git grep 'printk("%s.*", \(lvl\|level\)'
> 
> it seems to be used in quite a few places.

Yes, you are right, it is used in some places. That's why I said
that I'd prefer to keep that number low (minimize it). But it's
not 0 (that ship has sailed).

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-12 Thread Sergey Senozhatsky
On (19/11/12 19:12), Sergey Senozhatsky wrote:
> On (19/11/12 09:35), Petr Mladek wrote:
> [..]
> > This is getting too complicated. It would introduce too many
> > hidden rules. While the explicitly passed loglevel parameter
> > is straightforward and clear.
>
> If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it
> (either downgrade or upgrade). That's one rule, basically. Not too
> complicated, I guess.

Can be taken even a bit further than

show_stack(NULL, NULL, LOGLEVEL_DEBUG);
or
show_stack(NULL, NULL, LOGLEVEL_ERR);

For instance,

spin_lock_irqsave(>lock, flags);
printk_emergency_enter(LOGLEVEL_SCHED);
...
show_stack(...);
printk();
printk();
...
spin_unlock_irqrestore(>lock, flags);

or

spin_lock_irqsave(_port->lock, flags);
printk_emergency_enter(LOGLEVEL_SCHED);
...
printk();
printk();
...

and so on.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-12 Thread Sergey Senozhatsky
On (19/11/12 09:35), Petr Mladek wrote:
[..]
> This is getting too complicated. It would introduce too many
> hidden rules. While the explicitly passed loglevel parameter
> is straightforward and clear.

If loglevel is DEFAULT or NOTICE or INFO then we can overwrite it
(either downgrade or upgrade). That's one rule, basically. Not too
complicated, I guess.

> I am getting more inclined to the solution introduced by this
> patchset. It looks reasonable for the different use-cases.

No pressure from my side. Up to arch maintainers.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Sergey Senozhatsky
On (19/11/12 13:44), Sergey Senozhatsky wrote:
[..]
> > But yes, this per-code-section loglevel is problematic. The feedback
> > against the patchset shows that people want it also the other way.
> > I mean to keep pr_debug() as pr_debug().
> 
> Hmm. Right.
> 
> > A solution might be to use the per-code-section loglevel only instead
> > of some special loglevel.
> 
> So maybe we can "overwrite" only KERN_DEFAULT loglevels?

LOGLEVEL_DEFAULT, LOGLEVEL_NOTICE, LOGLEVEL_INFO?

So we can downgrade some messages (log_store() only) or promote
some messages.

DEBUG perhaps should stay debug.

> We certainly should not mess with SCHED or with anything in between
> EMERG and ERR.

  [EMERG, WARN]

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Sergey Senozhatsky
On (19/11/11 10:12), Petr Mladek wrote:
[..]
> > I do recall that we talked about per-CPU printk state bit which would
> > start/end "just print it" section. We probably can extend it to "just
> > log_store" type of functionality. Doesn't look like a very bad idea.
> 
> The problem with per-CPU printk is that we would need to disable
> interrupts.

Or disable preemption and have loglevel per-CPU and per-context.
preempt_count can navigate us to the right context loglevel on
particular CPU. I'm talking here only about backtrace (error)
reporting contexts. Those can be atomic perfectly fine.

I posted a silly code snippet.

[..]
> But yes, this per-code-section loglevel is problematic. The feedback
> against the patchset shows that people want it also the other way.
> I mean to keep pr_debug() as pr_debug().

Hmm. Right.

> A solution might be to use the per-code-section loglevel only instead
> of some special loglevel.

So maybe we can "overwrite" only KERN_DEFAULT loglevels?
We certainly should not mess with SCHED or with anything in between
EMERG and ERR.

> The explicitly passed loglevel makes me feel more confident that
> all needed printk() calls were updated. But it might be a false
> feeling. I do not really have any strong preference.

I'm not like really objecting, just trying to explore some other
options.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Sergey Senozhatsky
On (19/11/12 02:40), Dmitry Safonov wrote:
[..]
> In my point of view the cost of one-time [mostly build] testing every
> architecture is cheaper than introducing some new smart code that will
> live forever.

Well, there may be the need to pass loglevel deeper due to "hey __show_stack()
on that arch invokes bar(), which invokes foo() and now foo() does printk(),
but we don't see it". The context which decided to backtaraces decided
to do so for a reason, probably, so I guess we can look at it as "a special
error reporting code block".

The proposed patch set passes loglevel via slightly unusual channel -
via sprintf(). We probably can do it, but I would prefer to minimize
the number of such printk-s in the kernel. The code snippet which I
posted also does pretty unusual thing w.r.t loglevel. Both approaches
are "non-standard" from that POV.

> I'll reply to your suggestion tomorrow, it's a bit late in my tz.

Sure.

To anyone who will comment on that code snippet - this is not a
"look, here is what you need to do" type of proposal. Just an
alternative approach with its pros and cons.

We had several requests over the years to have something like "forcibly
allow all underlying printk-s from here to here" or "forcibly suppress
or postpone underlying printk-s from here to here", etc.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-11 Thread Sergey Senozhatsky
On (19/11/11 19:47), Dmitry Safonov wrote:
[..]
> I don't see how bits on task_struct or in per-cpu are easier than
> supplying a log level parameter down the stack.
> How would it work if sysrq_handle_crash() called by key-press?
> How would that interact with deferred printing?
> How would it make visible prints from current context, but not from
> something that preempted it?

[..]

per-context log_level works pretty much the same way as per-message
log_level.

// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - //
static DEFINE_PER_CPU(int, cpu_loglevels[4]); // @INITME.. LOGLEVEL_DEBUG + 1?

static int __printing_context(void)
{
unsigned int preempt = preempt_count();

if (!(preempt & (NMI_MASK | HARDIRQ_MASK | SOFITRQ_OFFSET)))
return 0;
if (preempt & SOFITRQ_OFFSET)
return 1;
if (preempt & HARDIRQ_MASK)
return 2;
return 3;
}

static int adj_context_loglevel(int level)
{
int ctx = __printing_context();
int cpu_level = this_cpu_read(cpu_loglevels[ctx]);

// this one is important
if (level == LOGLEVEL_SCHED)
return level;
// we are not in emergency context
if (cpu_level == LOGLEVEL_DEBUG + 1)
return level;
// we better not override these
if (LOGLEVEL_EMERG <= level && level <= LOGLEVEL_ERR)
return level;
return cpu_level;
}

void printk_emergency_enter(int log_level)
{
int ctx;

preempt_disable();
ctx = __printing_context();
this_cpu_write(cpu_loglevels[ctx], log_level);
}

void printk_emergency_exit(void)
{
int ctx = __printing_context();

this_cpu_write(cpu_loglevels[ctx], LOGLEVEL_DEBUG + 1);
preempt_enable();
}

void vprintk_emit(...)
{
level = adj_context_loglevel(level);
}
//
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - //
//
static void __show_stack(struct task_struct *task, unsigned long *sp)
{
printk();
...
printk();
}

void show_stack(struct task_struct *task, unsigned long *sp, int log_level)
{
printk_emergency_enter(log_level);
__show_stack(task, sp);
printk_emergency_exit();
}
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - //

show_stack() never schedules, disabling preemption around it should
not change anything. Should it be interrupted, we will handle it via
preempt count.

printk_emergency_enter(log_level) handles every printk() that
__show_stack() and friends do. Not worse than printk("%s Stack", lvl);
all over the place.

> What I'm going to do - is to fix all build and reported issues, I'll
> send v2 this week and feel free to NAK it, I will forget about those
> patches and won't be offended.

Lovely.
And - no, I'm not going to NAK platform specific changes. Just so you know.

*All* I'm talking about is an alternative, less "go and touch a ton of
platform code" approach. The argument "I patched so many files that I'm
not even going to discuss anything now" is not productive, to say the
least. Hope this clarifies.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-10 Thread Sergey Senozhatsky
On (19/11/08 14:04), Petr Mladek wrote:
[..]
> I agree that it is complicated to pass the loglevel as
> a parameter. It would be better define the default
> log level for a given code section. It might be stored
> in task_struct for the normal context and in per-CPU
> variables for interrupt contexts.

I do recall that we talked about per-CPU printk state bit which would
start/end "just print it" section. We probably can extend it to "just
log_store" type of functionality. Doesn't look like a very bad idea.
"This task/context is in trouble, whatever it printk()-s is important".

Per-console loglevel also might help sometimes. Slower consoles would
->write() only critical messages, faster consoles everything.

Passing log_level as part of message payload, which printk machinery
magically hides is not entirely exciting. What we have in the code
now - printk("%s blah\n", lvl) - is not what we see in the logs.
Because the leading '%s' becomes special. And printk()/sprintf()
documentation should reflect that: '%s' prints a string, but sometimes
it doesn't.

-ss


Re: [PATCH 00/50] Add log level to show_stack()

2019-11-08 Thread Sergey Senozhatsky
On (19/11/06 09:35), Petr Mladek wrote:
> I agree with all the other justification.
> 
> I would add. The backtrace is really useful for debugging. It should
> be possible to print it even in less critical situations.

Hmm, I don't know.
Do we really need debug/info level backtraces? May be all backtraces
can be converted to something more severe (so we can stop playing games
with loglvl) and then we can clean up "(ab)users"?

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread Sergey Senozhatsky
On (05/14/19 21:13), Geert Uytterhoeven wrote:
> I would immediately understand there's a missing IS_ERR() check in a
> function that can return  -EINVAL, without having to add a new printk()
> to find out what kind of bogus value has been received, and without
> having to reboot, and trying to reproduce...

But chances are that missing IS_ERR() will crash the kernel sooner
or later (in general case), if not in sprintf() then somewhere else.

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Sergey Senozhatsky
On (05/14/19 11:07), Sergey Senozhatsky wrote:
> How about this:
> 
>   if ptr < PAGE_SIZE  -> "(null)"

No, this is totally stupid. Forget about it. Sorry.


>   if IS_ERR_VALUE(ptr)-> "(fault)"

But Steven's "(fault)" is nice.

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Sergey Senozhatsky
On (05/13/19 14:42), Petr Mladek wrote:
> > The "(null)" is good enough by itself and already an established
> > practice..
> 
> (efault) made more sense with the probe_kernel_read() that
> checked wide range of addresses. Well, I still think that
> it makes sense to distinguish a pure NULL. And it still
> used also for IS_ERR_VALUE().

Wouldn't anything within first PAGE_SIZE bytes be reported as
a NULL deref?

   char *p = (char *)(PAGE_SIZE - 2);
   *p = 'a';

gives

 kernel: BUG: kernel NULL pointer dereference, address = 0ffe
 kernel: #PF: supervisor-privileged write access from kernel code
 kernel: #PF: error_code(0x0002) - not-present page


And I like Steven's "(fault)" idea.
How about this:

if ptr < PAGE_SIZE  -> "(null)"
if IS_ERR_VALUE(ptr)-> "(fault)"

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-10 Thread Sergey Senozhatsky
On (05/10/19 10:42), Petr Mladek wrote:
[..]
> Fixes: 3e5903eb9cff70730 ("vsprintf: Prevent crash when dereferencing invalid 
> pointers")
> Signed-off-by: Petr Mladek 

FWIW
Reviewed-by: Sergey Senozhatsky 

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-10 Thread Sergey Senozhatsky
On (05/10/19 10:06), Petr Mladek wrote:
[..]
> I am going to send a patch replacing probe_kernel_address() with
> a simple check:
> 
>   if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>   return "(efault)";

I'm OK with this.
Probing ptrs was a good idea, it just didn't work out.

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-09 Thread Sergey Senozhatsky
On (05/09/19 21:47), Linus Torvalds wrote:
>[ Sorry about html and mobile crud, I'm not at the computer right now ]
>How about we just undo the whole misguided probe_kernel_address() thing?

But the problem will remain - %pS/%pF on PPC (and some other arch-s)
do dereference_function_descriptor(), which calls probe_kernel_address().
So if probe_kernel_address() starts to dump_stack(), then we are heading
towards stack overflow. Unless I'm totally missing something.

-ss


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-09 Thread Sergey Senozhatsky
On (05/09/19 14:19), Petr Mladek wrote:
> 1. Report on Power:
> 
> Kernel crashes very early during boot with with CONFIG_PPC_KUAP and
> CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG
> 
> The problem is the combination of some new code called via printk(),
> check_pointer() which calls probe_kernel_read(). That then calls
> allow_user_access() (PPC_KUAP) and that uses mmu_has_feature() too early
> (before we've patched features). With the JUMP_LABEL debug enabled that
> causes us to call printk() & dump_stack() and we end up recursing and
> overflowing the stack.

Hmm... hmm... PPC does an .opd-based symbol dereference, which
eventually probe_kernel_read()-s. So early printk(%pS) will do

printk(%pS)
 dereference_function_descriptor()
  probe_kernel_address()
   dump_stack()
printk(%pS)
 dereference_function_descriptor()
  probe_kernel_address()
   dump_stack()
printk(%pS)
 ...

I'd say... that it's not vsprintf that we want to fix, it's
the idea that probe_kernel_address() can dump_stack() on any
platform. On some archs probe_kernel_address()->dump_stack()
is going nowhere:
 dump_stack() does probe_kernel_address(), which calls dump_stack(),
 which calls printk(%pS)->probe_kernel_address() again and again,
 and again.

-ss


Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-10 Thread Sergey Senozhatsky
On (02/08/19 13:55), Steven Rostedt wrote:
[..]
> > +   if (len) {
> > +   /*
> > +* Order the stores above in vsnprintf() vs the store of the
> > +* space below which joins the two strings. Note this doesn't
> > +* make the code truly race free because there is no barrier on
> > +* the read side. ie. Another CPU might load the uninitialised
> > +* tail of the buffer first and then the space below (rather
> > +* than the NULL that was there previously), and so print the
> > +* uninitialised tail. But the whole string lives in BSS so in
> > +* practice it should just see NULLs.
> > +*/
> > +   smp_wmb();
> 
> This shows me that this can be called at a time when more than one CPU is
> active. What happens if we have two CPUs calling dump_stack_add_arch_desc() at
> the same time? Can't that corrupt the dump_stack_arch_desc_str?

Can overwrite part of it, I guess (but it seems that Michael
is OK with this). The string is still NULL terminated.

The worst case scenario I can think of is not the one when
two CPUs call dump_stack_add_arch_desc(), but when CPUA calls
dump_stack_add_arch_desc() to append some data and at the
same time CPUB calls dump_stack_set_arch_desc() and simply
overwrites dump_stack_arch_desc_str. Not sure if this is
critical (or possible).

-ss


Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-07 Thread Sergey Senozhatsky
Cc-ing Steven

  https://lore.kernel.org/lkml/20190207124635.3885-1-...@ellerman.id.au/T/#u

On (02/07/19 23:46), Michael Ellerman wrote:
> Arch code can set a "dump stack arch description string" which is
> displayed with oops output to describe the hardware platform.
> 
> It is useful to initialise this as early as possible, so that an early
> oops will have the hardware description.
> 
> However in practice we discover the hardware platform in stages, so it
> would be useful to be able to incrementally fill in the hardware
> description as we discover it.
> 
> This patch adds that ability, by creating dump_stack_add_arch_desc().
> 
> If there is no existing string it behaves exactly like
> dump_stack_set_arch_desc(). However if there is an existing string it
> appends to it, with a leading space.
> 
> This makes it easy to call it multiple times from different parts of the
> code and get a reasonable looking result.
> 
> Signed-off-by: Michael Ellerman 

You probably can have a __init buffer somewhere in ppc code, append
data to it, step by step, and call dump_stack_set_arch_desc() all
the time.

But no real objections; dump_stack_add_arch_desc() can do.

FWIW,
Reviewed-by: Sergey Senozhatsky 

-ss


[PATCH] powerpc: use a CONSOLE_LOGLEVEL_DEBUG macro

2019-01-07 Thread Sergey Senozhatsky
Use a CONSOLE_LOGLEVEL_DEBUG macro for console_loglevel rather
than a naked number.

Signed-off-by: Sergey Senozhatsky 
---
 arch/powerpc/kernel/udbg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
index 7cc38b5b58bc..8db4891acdaf 100644
--- a/arch/powerpc/kernel/udbg.c
+++ b/arch/powerpc/kernel/udbg.c
@@ -74,7 +74,7 @@ void __init udbg_early_init(void)
 #endif
 
 #ifdef CONFIG_PPC_EARLY_DEBUG
-   console_loglevel = 10;
+   console_loglevel = CONSOLE_LOGLEVEL_DEBUG;
 
register_early_udbg_console();
 #endif
-- 
2.20.1



Re: [PATCH v9 4/6] init: allow initcall tables to be emitted using relative references

2018-06-27 Thread Sergey Senozhatsky
On (06/26/18 20:27), Ard Biesheuvel wrote:
>  /*
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 247808333ba4..688a27b0888c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2772,7 +2772,8 @@ EXPORT_SYMBOL(unregister_console);
>  void __init console_init(void)
>  {
>   int ret;
> - initcall_t *call;
> + initcall_t call;
> + initcall_entry_t *ce;
>  
>   /* Setup the default TTY line discipline. */
>   n_tty_init();
> @@ -2781,13 +2782,14 @@ void __init console_init(void)
>* set up the console device so that later boot sequences can
>* inform about problems etc..
>*/
> - call = __con_initcall_start;
> + ce = __con_initcall_start;
>   trace_initcall_level("console");
> - while (call < __con_initcall_end) {
> - trace_initcall_start((*call));
> - ret = (*call)();
> - trace_initcall_finish((*call), ret);
> - call++;
> + while (ce < __con_initcall_end) {
> + call = initcall_from_entry(ce);
> + trace_initcall_start(call);
> + ret = call();
> + trace_initcall_finish(call, ret);
> + ce++;
>   }
>  }

printk bits look OK to me.
The patch set works fine on my x86_64 and does reduce the size of vmlinux.

Acked-by: Sergey Senozhatsky 

-ss


Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-12-06 Thread Sergey Senozhatsky
On (12/06/17 11:32), Petr Mladek wrote:
[..]
> > diff --git a/Documentation/printk-formats.txt 
> > b/Documentation/printk-formats.txt
> > index aa0a776c817a..02745028e909 100644
> > --- a/Documentation/printk-formats.txt
> > +++ b/Documentation/printk-formats.txt
> > @@ -61,41 +61,31 @@ Symbols/Function Pointers
> >  
> >  ::
> >  
> > -   %pF versatile_init+0x0/0x110
> > -   %pf versatile_init
> > -   %pS versatile_init+0x0/0x110
> > -   %pSRversatile_init+0x9/0x110
> > +   %pS versatile_init+0x0/0x110
> > +   %ps versatile_init
> > +   %pF versatile_init+0x0/0x110
> > +   %pf versatile_init
> > +   %pSRversatile_init+0x9/0x110
> > (with __builtin_extract_return_addr() translation)
> > -   %ps versatile_init
> > -   %pB prev_fn_of_versatile_init+0x88/0x88
> > +   %pB prev_fn_of_versatile_init+0x88/0x88
> 
> I was curious why so many lines were changed here. You converted
> the 2nd tab to spaces. I put back the tab. The result is:

ew... how did that happen. thanks for fixing up.

> > +static inline void *dereference_symbol_descriptor(void *ptr)
> > +{
> > +#ifdef HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR
> > +   struct module *mod;
> > +
> > +   ptr = dereference_kernel_function_descriptor(ptr);
> > +   if (is_ksym_addr((unsigned long)ptr))
> > +   return ptr;
> > +
> > +   preempt_disable();
> > +   mod = __module_address((unsigned long)ptr);
> > +   preempt_enable();
> > +
> > +   if (mod)
> > +   ptr = dereference_module_function_descriptor(mod, ptr);
> > +#endif
> > +   return ptr;
> > +}
> 
> It is a bit too long for an inline function but I did not find a
> better solution. It should always be defined and all suitable
> .c files are compiled only under certain configuration. Well,
> it is a nop on most architectures.

or we can move dereference_symbol_descriptor() to vsprintf.c,
since all the functions it depends on are now available either
as exported symbols or via kallsyms header file. not that it
annoys me, so we can keep it as it is.

-ss


Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-12-05 Thread Sergey Senozhatsky
Hello,

so we got a number of build-error reports [somehow I
thought 0day has compile tested the patches already; well, I
was wrong] basically on congifs that have no KALLSYMS.


Petr, can we replace 0006 with the following patch?

8<--- --- ---

From: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Subject: [PATCH] symbol lookup: introduce dereference_symbol_descriptor()

dereference_symbol_descriptor() invokes appropriate ARCH specific
function descriptor dereference callbacks:
- dereference_kernel_function_descriptor() if the pointer is a
  kernel symbol;

- dereference_module_function_descriptor() if the pointer is a
  module symbol.

This is the last step needed to make '%pS/%ps' smart enough to
handle function descriptor dereference on affected ARCHs and
to retire '%pF/%pf'.

To refresh it:
  Some architectures (ia64, ppc64, parisc64) use an indirect pointer
  for C function pointers - the function pointer points to a function
  descriptor and we need to dereference it to get the actual function
  pointer.

  Function descriptors live in .opd elf section and all affected
  ARCHs (ia64, ppc64, parisc64) handle it properly for kernel and
  modules. So we, technically, can decide if the dereference is
  needed by simply looking at the pointer: if it belongs to .opd
  section then we need to dereference it.

  The kernel and modules have their own .opd sections, obviously,
  that's why we need to split dereference_function_descriptor()
  and use separate kernel and module dereference arch callbacks.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 Documentation/printk-formats.txt | 42 ---
 include/linux/kallsyms.h | 53 
 kernel/kallsyms.c| 33 -
 lib/vsprintf.c   |  5 ++--
 4 files changed, 71 insertions(+), 62 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index aa0a776c817a..02745028e909 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -61,41 +61,31 @@ Symbols/Function Pointers
 
 ::
 
-   %pF versatile_init+0x0/0x110
-   %pf versatile_init
-   %pS versatile_init+0x0/0x110
-   %pSRversatile_init+0x9/0x110
+   %pS versatile_init+0x0/0x110
+   %ps versatile_init
+   %pF versatile_init+0x0/0x110
+   %pf versatile_init
+   %pSRversatile_init+0x9/0x110
(with __builtin_extract_return_addr() translation)
-   %ps versatile_init
-   %pB prev_fn_of_versatile_init+0x88/0x88
+   %pB prev_fn_of_versatile_init+0x88/0x88
 
-The ``F`` and ``f`` specifiers are for printing function pointers,
-for example, f->func,  They have the same result as
-``S`` and ``s`` specifiers. But they do an extra conversion on
-ia64, ppc64 and parisc64 architectures where the function pointers
-are actually function descriptors.
+The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic
+format. They result in the symbol name with (``S``) or without (``s``)
+offsets. If KALLSYMS are disabled then the symbol address is printed instead.
 
-The ``S`` and ``s`` specifiers can be used for printing symbols
-from direct addresses, for example, __builtin_return_address(0),
-(void *)regs->ip. They result in the symbol name with (``S``) or
-without (``s``) offsets. If KALLSYMS are disabled then the symbol
-address is printed instead.
+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
+parisc64 function pointers are indirect and, in fact, are function
+descriptors, which require additional dereferencing before we can lookup
+the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
+platforms (when needed), so ``F`` and ``f`` exist for compatibility
+reasons only.
 
 The ``B`` specifier results in the symbol name with offsets and should be
 used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
 when tail-call``s are used and marked with the noreturn GCC attribute.
 
-Examples::
-
-   printk("Going to call: %pF\n", gettimeofday);
-   printk("Going to call: %pF\n", p->func);
-   printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
-   printk("%s: called from %pS\n", __func__,
-   (void *)__builtin_return_address(0));
-   printk("Faulted at %pS\n", (void *)regs->ip);
-   printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
-
 Kernel Pointers
 ===
 
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index bd118a6c60cb..1bcfe221e62c 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ 

Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-28 Thread Sergey Senozhatsky
On (11/28/17 16:47), Petr Mladek wrote:
> On Fri 2017-11-10 08:48:24, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > A reworked version. There is a new dereference_symbol_descriptor()
> > function now, where "the magic happens", so I don't touch kallsyms_lookup()
> > and module_address_lookup() anymore.
> 
> The new version looks good to me. Thanks a lot for reworking it.
> I feel much better now. For the whole series:
> 
> Reviewed-by: Petr Mladek <pmla...@suse.com>
> 
> > All Ack-s/Tested-by-s were dropped, since the patch set has been
> > reworked. I'm kindly asking arch-s maintainers and developers to test it
> > once again. Sorry for any inconveniences and thanks for your help in
> > advance.
> 
> I see that it was tested on all affected architectures. Thanks a lot
> all testers.
> 
> It seems that we are ready to go. I am going to push this into
> for-4.16 branch in printk.git.

thanks.

-ss


Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-13 Thread Sergey Senozhatsky
On (11/13/17 18:17), Helge Deller wrote:
> On 10.11.2017 00:48, Sergey Senozhatsky wrote:
> > All Ack-s/Tested-by-s were dropped, since the patch set has been
> > reworked. I'm kindly asking arch-s maintainers and developers to test it
> > once again. Sorry for any inconveniences and thanks for your help in
> > advance.
> 
> I tested it successfully on parisc64.
> You can add back the
> Tested-by: Helge Deller <del...@gmx.de> #parisc64

thanks!

-ss


Re: [PATCHv4 3/6] powerpc64: Add .opd based function descriptor dereference

2017-11-13 Thread Sergey Senozhatsky
On (11/13/17 12:41), Santosh Sivaraj wrote:
> * Sergey Senozhatsky <sergey.senozhat...@gmail.com> wrote (on 2017-11-10 
> 08:48:27 +0900):
> 
> > We are moving towards separate kernel and module function descriptor
> > dereference callbacks. This patch enables it for powerpc64.
> > 
> > For pointers that belong to the kernel
> > -  Added __start_opd and __end_opd pointers, to track the kernel
> >.opd section address range;
> > 
> > -  Added dereference_kernel_function_descriptor(). Now we
> >will dereference only function pointers that are within
> >[__start_opd, __end_opd);
> > 
> > For pointers that belong to a module
> > -  Added dereference_module_function_descriptor() to handle module
> >function descriptor dereference. Now we will dereference only
> >pointers that are within [module->opd.start, module->opd.end).
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
> > ---
> >  arch/powerpc/include/asm/module.h   |  3 +++
> >  arch/powerpc/include/asm/sections.h | 12 
> >  arch/powerpc/kernel/module_64.c | 14 ++
> >  arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
> >  4 files changed, 31 insertions(+)
> >
> 
> Looks good on powerpc. If you wish:
> 
> Tested-by: Santosh Sivaraj <sant...@fossix.org> # for powerpc

thanks!

-ss


Re: [PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-11-10 Thread Sergey Senozhatsky
On (11/10/17 10:09), Luck, Tony wrote:
> On Fri, Nov 10, 2017 at 08:48:29AM +0900, Sergey Senozhatsky wrote:
> > -Examples::
> > -
> > -   printk("Going to call: %pF\n", gettimeofday);
> > -   printk("Going to call: %pF\n", p->func);
> > -   printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
> > -   printk("%s: called from %pS\n", __func__,
> > -   (void *)__builtin_return_address(0));
> > -   printk("Faulted at %pS\n", (void *)regs->ip);
> > -   printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
> 
> Did you mean to delete the Examples completely?  Wouldn't it
> be better to just update (s/%pF/%pS/g)?

good question. yes, I think I did it deliberately :) we still
kinda have some sort of "examples", right at the beginning of
section "Symbols/Function Pointers"


>  Symbols/Function Pointers
>  =
>
>  ::
>
> %pS versatile_init+0x0/0x110
>  %ps versatile_init
>  %pF versatile_init+0x0/0x110
>  %pf versatile_init
>  %pSRversatile_init+0x9/0x110
> (with __builtin_extract_return_addr() translation)
>  %pB prev_fn_of_versatile_init+0x88/0x88
>
>  The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic
>  format. They result in the symbol name with (``S``) or without (``s``)
>  offsets. If KALLSYMS are disabled then the symbol address is printed instead.
>
>  Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
>  and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
>  parisc64 function pointers are indirect and, in fact, are function
>  descriptors, which require additional dereferencing before we can lookup
>  the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
>  platforms (when needed), so ``F`` and ``f`` exist for compatibility
>  reasons only.
>
>  The ``B`` specifier results in the symbol name with offsets and should be
>  used when printing stack backtraces. The specifier takes into
>  consideration the effect of compiler optimisations which may occur
>  when tail-call``s are used and marked with the noreturn GCC attribute.

I can return Examples back. don't really have a strong opinion
on this. let me know.

-ss


Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-10 Thread Sergey Senozhatsky
On (11/10/17 10:11), Luck, Tony wrote:
> On Fri, Nov 10, 2017 at 08:48:24AM +0900, Sergey Senozhatsky wrote:
> > All Ack-s/Tested-by-s were dropped, since the patch set has been
> > reworked. I'm kindly asking arch-s maintainers and developers to test it
> > once again. Sorry for any inconveniences and thanks for your help in
> > advance.
> 
> You can add back the:
> 
> Tested-by: Tony Luck <tony.l...@intel.com> #ia64

Thanks a ton, Tony!

-ss


[PATCHv4 6/6] checkpatch: add pF/pf deprecation warning

2017-11-09 Thread Sergey Senozhatsky
We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
enough to handle function pointer dereference on platforms where such
dereference is required.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Signed-off-by: Joe Perches <j...@perches.com>
Cc: Andy Whitcroft <a...@canonical.com>
Reviewed-by: Petr Mladek <pmla...@suse.com>
---
 scripts/checkpatch.pl | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3453df9f90ab..d081a2b7166e 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5752,18 +5752,25 @@ sub process {
for (my $count = $linenr; $count <= $lc; $count++) {
my $fmt = get_quoted_string($lines[$count - 1], 
raw_line($count, 0));
$fmt =~ s/%%//g;
-   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) {
+   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) {
$bad_extension = $1;
last;
}
}
if ($bad_extension ne "") {
my $stat_real = raw_line($linenr, 0);
+   my $ext_type = "Invalid";
+   my $use = "";
for (my $count = $linenr + 1; $count <= $lc; 
$count++) {
$stat_real = $stat_real . "\n" . 
raw_line($count, 0);
}
+   if ($bad_extension =~ /p[Ff]/) {
+   $ext_type = "Deprecated";
+   $use = " - use %pS instead";
+   $use =~ s/pS/ps/ if ($bad_extension =~ 
/pf/);
+   }
WARN("VSPRINTF_POINTER_EXTENSION",
-"Invalid vsprintf pointer extension 
'$bad_extension'\n" . "$here\n$stat_real\n");
+"$ext_type vsprintf pointer extension 
'$bad_extension'$use\n" . "$here\n$stat_real\n");
}
}
 
-- 
2.15.0



[PATCHv4 5/6] symbol lookup: introduce dereference_symbol_descriptor()

2017-11-09 Thread Sergey Senozhatsky
dereference_symbol_descriptor() invokes appropriate ARCH specific
function descriptor dereference callbacks:
- dereference_kernel_function_descriptor() if the pointer is a
  kernel symbol;

- dereference_module_function_descriptor() if the pointer is a
  module symbol.

This is the last step needed to make '%pS/%ps' smart enough to
handle function descriptor dereference on affected ARCHs and
to retire '%pF/%pf'.

To refresh it:
  Some architectures (ia64, ppc64, parisc64) use an indirect pointer
  for C function pointers - the function pointer points to a function
  descriptor and we need to dereference it to get the actual function
  pointer.

  Function descriptors live in .opd elf section and all affected
  ARCHs (ia64, ppc64, parisc64) handle it properly for kernel and
  modules. So we, technically, can decide if the dereference is
  needed by simply looking at the pointer: if it belongs to .opd
  section then we need to dereference it.

  The kernel and modules have their own .opd sections, obviously,
  that's why we need to split dereference_function_descriptor()
  and use separate kernel and module dereference arch callbacks.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 Documentation/printk-formats.txt | 49 
 include/linux/kallsyms.h |  2 ++
 kernel/kallsyms.c| 19 
 lib/vsprintf.c   |  5 ++--
 4 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 361789df51ec..2f17e684b72e 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -50,42 +50,31 @@ Symbols/Function Pointers
 
 ::
 
-   %pF versatile_init+0x0/0x110
-   %pf versatile_init
-   %pS versatile_init+0x0/0x110
-   %pSRversatile_init+0x9/0x110
-   (with __builtin_extract_return_addr() translation)
-   %ps versatile_init
-   %pB prev_fn_of_versatile_init+0x88/0x88
-
-The ``F`` and ``f`` specifiers are for printing function pointers,
-for example, f->func,  They have the same result as
-``S`` and ``s`` specifiers. But they do an extra conversion on
-ia64, ppc64 and parisc64 architectures where the function pointers
-are actually function descriptors.
-
-The ``S`` and ``s`` specifiers can be used for printing symbols
-from direct addresses, for example, __builtin_return_address(0),
-(void *)regs->ip. They result in the symbol name with (``S``) or
-without (``s``) offsets. If KALLSYMS are disabled then the symbol
-address is printed instead.
+%pS versatile_init+0x0/0x110
+%ps versatile_init
+%pF versatile_init+0x0/0x110
+%pf versatile_init
+%pSRversatile_init+0x9/0x110
+(with __builtin_extract_return_addr() translation)
+%pB prev_fn_of_versatile_init+0x88/0x88
+
+The ``S`` and ``s`` specifiers are used for printing a pointer in symbolic
+format. They result in the symbol name with (``S``) or without (``s``)
+offsets. If KALLSYMS are disabled then the symbol address is printed instead.
+
+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
+parisc64 function pointers are indirect and, in fact, are function
+descriptors, which require additional dereferencing before we can lookup
+the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
+platforms (when needed), so ``F`` and ``f`` exist for compatibility
+reasons only.
 
 The ``B`` specifier results in the symbol name with offsets and should be
 used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
 when tail-call``s are used and marked with the noreturn GCC attribute.
 
-Examples::
-
-   printk("Going to call: %pF\n", gettimeofday);
-   printk("Going to call: %pF\n", p->func);
-   printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
-   printk("%s: called from %pS\n", __func__,
-   (void *)__builtin_return_address(0));
-   printk("Faulted at %pS\n", (void *)regs->ip);
-   printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
-
-
 Kernel Pointers
 ===
 
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index 11dd93e42580..73f7e874c3e1 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -16,6 +16,8 @@
 
 struct module;
 
+void *dereference_symbol_descriptor(void *ptr);
+
 #ifdef CONFIG_KALLSYMS
 /* Lookup the address for a symbol. Returns 0 if not found. */
 unsigned long kallsyms_lookup_name(const char *name);
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 1966fe1c2b57..18b3dffc4b84 100644
--- a/kernel/kallsyms.c
+++ b/kern

[PATCHv4 4/6] parisc64: Add .opd based function descriptor dereference

2017-11-09 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for parisc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd);

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end).

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Signed-off-by: Helge Deller <del...@gmx.de>
---
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  6 ++
 arch/parisc/kernel/module.c   | 16 
 arch/parisc/kernel/process.c  |  9 +
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 5 files changed, 35 insertions(+)

diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S 
b/arch/parisc/boot/compressed/vmlinux.lds.S
index a4ce3314e78e..4ebd4e65524c 100644
--- a/arch/parisc/boot/compressed/vmlinux.lds.S
+++ b/arch/parisc/boot/compressed/vmlinux.lds.S
@@ -29,7 +29,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index accdf40aa5b7..5a40b51df80c 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -6,8 +6,14 @@
 #include 
 
 #ifdef CONFIG_64BIT
+
+#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
+
 #undef dereference_function_descriptor
 void *dereference_function_descriptor(void *);
+
+#undef dereference_kernel_function_descriptor
+void *dereference_kernel_function_descriptor(void *);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index f1a76935a314..b5b3cb00f1fb 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -66,6 +66,7 @@
 
 #include 
 #include 
+#include 
 
 #if 0
 #define DEBUGP printk
@@ -954,3 +955,18 @@ void module_arch_cleanup(struct module *mod)
 {
deregister_unwind_table(mod);
 }
+
+#ifdef CONFIG_64BIT
+void *dereference_module_function_descriptor(struct module *mod, void *ptr)
+{
+   unsigned long start_opd = (Elf64_Addr)mod->core_layout.base +
+  mod->arch.fdesc_offset;
+   unsigned long end_opd = start_opd +
+   mod->arch.fdesc_count * sizeof(Elf64_Fdesc);
+
+   if (ptr < (void *)start_opd || ptr >= (void *)end_opd)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
+#endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 30f92391a93e..6c4585103a91 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -276,6 +276,15 @@ void *dereference_function_descriptor(void *ptr)
ptr = p;
return ptr;
 }
+
+void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd ||
+   ptr >= (void *)__end_opd)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
 #endif
 
 static inline unsigned long brk_rnd(void)
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 159a2ec0b4e0..da2e31190efa 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -100,7 +100,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
-- 
2.15.0



[PATCHv4 3/6] powerpc64: Add .opd based function descriptor dereference

2017-11-09 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for powerpc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd);

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end).

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/powerpc/include/asm/module.h   |  3 +++
 arch/powerpc/include/asm/sections.h | 12 
 arch/powerpc/kernel/module_64.c | 14 ++
 arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
 4 files changed, 31 insertions(+)

diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 6c0132c7212f..7e28442827f1 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -45,6 +45,9 @@ struct mod_arch_specific {
unsigned long tramp;
 #endif
 
+   /* For module function descriptor dereference */
+   unsigned long start_opd;
+   unsigned long end_opd;
 #else /* powerpc64 */
/* Indices of PLT sections within module. */
unsigned int core_plt_section;
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 82bec63bbd4f..e335a8f846af 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -66,6 +66,9 @@ static inline int overlaps_kvm_tmp(unsigned long start, 
unsigned long end)
 }
 
 #ifdef PPC64_ELF_ABI_v1
+
+#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
+
 #undef dereference_function_descriptor
 static inline void *dereference_function_descriptor(void *ptr)
 {
@@ -76,6 +79,15 @@ static inline void *dereference_function_descriptor(void 
*ptr)
ptr = p;
return ptr;
 }
+
+#undef dereference_kernel_function_descriptor
+static inline void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
 #endif /* PPC64_ELF_ABI_v1 */
 
 #endif
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 759104b99f9f..218971ac7e04 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -93,6 +93,15 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 {
return 0;
 }
+
+void *dereference_module_function_descriptor(struct module *mod, void *ptr)
+{
+   if (ptr < (void *)mod->arch.start_opd ||
+   ptr >= (void *)mod->arch.end_opd)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
 #endif
 
 #define STUB_MAGIC 0x73747562 /* stub */
@@ -344,6 +353,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) {
+   me->arch.start_opd = sechdrs[i].sh_addr;
+   me->arch.end_opd = sechdrs[i].sh_addr +
+  sechdrs[i].sh_size;
+   }
 
/* We don't handle .init for the moment: rename to _init */
while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 0494e1566ee2..5dac5ab22fa2 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -278,7 +278,9 @@ SECTIONS
}
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
. = ALIGN(256);
-- 
2.15.0



[PATCHv4 2/6] ia64: Add .opd based function descriptor dereference

2017-11-09 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for IA64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd);

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end).

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/ia64/include/asm/sections.h | 10 +-
 arch/ia64/kernel/module.c| 12 
 arch/ia64/kernel/vmlinux.lds.S   |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index f3481408594e..cea15f2dd38d 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -27,6 +27,8 @@ extern char __start_gate_brl_fsys_bubble_down_patchlist[], 
__end_gate_brl_fsys_b
 extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
+#define HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR 1
+
 #undef dereference_function_descriptor
 static inline void *dereference_function_descriptor(void *ptr)
 {
@@ -38,6 +40,12 @@ static inline void *dereference_function_descriptor(void 
*ptr)
return ptr;
 }
 
+#undef dereference_kernel_function_descriptor
+static inline void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
+   return ptr;
+   return dereference_function_descriptor(ptr);
+}
 
 #endif /* _ASM_IA64_SECTIONS_H */
-
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 853b5611a894..326448f9df16 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 
 #define ARCH_MODULE_DEBUG 0
 
@@ -918,3 +919,14 @@ module_arch_cleanup (struct module *mod)
if (mod->arch.core_unw_table)
unw_remove_unwind_table(mod->arch.core_unw_table);
 }
+
+void *dereference_module_function_descriptor(struct module *mod, void *ptr)
+{
+   Elf64_Shdr *opd = mod->arch.opd;
+
+   if (ptr < (void *)opd->sh_addr ||
+   ptr >= (void *)(opd->sh_addr + opd->sh_size))
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 58db59da0bd8..31e688981b4b 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -108,7 +108,9 @@ SECTIONS {
RODATA
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
/*
-- 
2.15.0



[PATCHv4 1/6] sections: split dereference_function_descriptor()

2017-11-09 Thread Sergey Senozhatsky
There are two format specifiers to print out a pointer in symbolic
format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two
mean exactly the same thing, but some architectures (ia64, ppc64,
parisc64) use an indirect pointer for C function pointers, where
the function pointer points to a function descriptor (which in
turn contains the actual pointer to the code). The '%pF/%pf, when
used appropriately, automatically does the appropriate function
descriptor dereference on such architectures.

The "when used appropriately" part is tricky. Basically this is
a subtle ABI detail, specific to some platforms, that made it to
the API level and people can be unaware of it and miss the whole
"we need to dereference the function" business out. [1] proves
that point (note that it fixes only '%pF' and '%pS', there might
be '%pf' and '%ps' cases as well).

It appears that we can handle everything within the affected
arches and make '%pS/%ps' smart enough to retire '%pF/%pf'.
Function descriptors live in .opd elf section and all affected
arches (ia64, ppc64, parisc64) handle it properly for kernel
and modules. So we, technically, can decide if the dereference
is needed by simply looking at the pointer: if it belongs to
.opd section then we need to dereference it.

The kernel and modules have their own .opd sections, obviously,
that's why we need to split dereference_function_descriptor()
and use separate kernel and module dereference arch callbacks.

This patch does the first step, it
a) adds dereference_kernel_function_descriptor() function.
b) adds a weak alias to dereference_module_function_descriptor()
   function.

So, for the time being, we will have:
1) dereference_function_descriptor()
   A generic function, that simply dereferences the pointer. There is
   bunch of places that call it: kgdbts, init/main.c, extable, etc.

2) dereference_kernel_function_descriptor()
   A function to call on kernel symbols that does kernel .opd section
   address range test.

3) dereference_module_function_descriptor()
   A function to call on modules' symbols that does modules' .opd
   section address range test.

[1] https://marc.info/?l=linux-kernel=150472969730573

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 include/asm-generic/sections.h | 8 ++--
 include/linux/module.h | 3 +++
 kernel/module.c| 6 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 03cc5f9bba71..849cd8eb5ca0 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -30,6 +30,7 @@
  * __ctors_start, __ctors_end
  * __irqentry_text_start, __irqentry_text_end
  * __softirqentry_text_start, __softirqentry_text_end
+ * __start_opd, __end_opd
  */
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
@@ -49,12 +50,15 @@ extern char __start_once[], __end_once[];
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
 
+/* Start and end of .opd section - used for function descriptors. */
+extern char __start_opd[], __end_opd[];
+
 extern __visible const void __nosave_begin, __nosave_end;
 
-/* function descriptor handling (if any).  Override
- * in asm/sections.h */
+/* Function descriptor handling (if any).  Override in asm/sections.h */
 #ifndef dereference_function_descriptor
 #define dereference_function_descriptor(p) (p)
+#define dereference_kernel_function_descriptor(p) (p)
 #endif
 
 /* random extra sections (if any).  Override
diff --git a/include/linux/module.h b/include/linux/module.h
index c69b49abe877..9dac6973b001 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -606,6 +606,9 @@ int ref_module(struct module *a, struct module *b);
__mod ? __mod->name : "kernel"; \
 })
 
+/* Dereference module function descriptor */
+void *dereference_module_function_descriptor(struct module *mod, void *ptr);
+
 /* For kallsyms to ask for address resolution.  namebuf should be at
  * least KSYM_NAME_LEN long: a pointer to namebuf is returned if
  * found, otherwise NULL. */
diff --git a/kernel/module.c b/kernel/module.c
index ab2978e4239c..1d6e996caa13 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3938,6 +3938,12 @@ static const char *get_ksymbol(struct module *mod,
return symname(kallsyms, best);
 }
 
+void * __weak dereference_module_function_descriptor(struct module *mod,
+void *ptr)
+{
+   return ptr;
+}
+
 /* For kallsyms to ask for address resolution.  NULL means not found.  Careful
  * not to lock to avoid deadlock on oopses, simply disable preemption. */
 const char *module_address_lookup(unsigned long addr,
-- 
2.15.0



[PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-11-09 Thread Sergey Senozhatsky
Hello,

A reworked version. There is a new dereference_symbol_descriptor()
function now, where "the magic happens", so I don't touch kallsyms_lookup()
and module_address_lookup() anymore.

All Ack-s/Tested-by-s were dropped, since the patch set has been
reworked. I'm kindly asking arch-s maintainers and developers to test it
once again. Sorry for any inconveniences and thanks for your help in
advance.

===

On some arches C function pointers are indirect and point to
a function descriptor, which contains the actual pointer to the code.
This mostly doesn't matter, except for cases when people want to print
out function pointers in symbolic format, because the usual '%pS/%ps'
does not work on those arches as expected. That's the reason why we
have '%pF/%pf', but since it's here because of a subtle ABI detail
specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
'%pF/%pf' and '%pS/%ps' (see [1], for example).

This patch set attempts to move ia64/ppc64/parisc64 C function
pointer ABI details out of printk() to arch code. Function dereference
code now checks if a pointer belongs to a .opd ELF section and dereferences
that pointer only if it does. The kernel and modules have their own .opd
sections that's why I use two different ARCH functions: for kernel and
for module pointer dereference.

I planned to remove dereference_function_descriptor() entirely,
but then I discovered a bunch other uses cases (kgdbts, init/main.c,
extable, etc.), so I decided to keep dereference_function_descriptor()
around because the main point of this patch set is to deprecate %pF/%pf.
But at the same time, I think I can go further and handle both kernel
and module descriptor dereference in dereference_function_descriptor().
We need a module pointer for module .opd check, so that will come at an
extra cost of module lookup (may be there will some other issues along
the way, haven't checked it).

Right now we've got:

- dereference_function_descriptor(addr)
a generic (old) function. it simply attempts to dereference
whatever pointer we give it.

- dereference_kernel_function_descriptor(addr)
dereferences a kernel pointer if it's within the kernel's .opd
section.

- dereference_module_function_descriptor(module, addr)
dereference a module pointer if it's within the module's .opd
section.

v4:
-- don't switch to ulong in derefence functions, keep using void pointer
-- introduce dereference_symbol_descriptor() function
-- avoid any dereference in kallsyms lookup/module address lookup
-- improved documentation
-- since this is a new version, I dropped all the the Ack-s and Tested-by-s

v3:
-- picked up ACKs and Tested-by
-- tweaked checkpatch warning (Joe)
-- updated Documentation

v2:
-- convert dereference_function_descriptor() to unsigned long
-- fix kernel descriptor range checks (Helge)
-- fix parisc module descriptor range check (Helge)
-- fix ppc64 module range check
-- add checkpatch patch

Sergey Senozhatsky (6):
  sections: split dereference_function_descriptor()
  ia64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  parisc64: Add .opd based function descriptor dereference
  symbol lookup: introduce dereference_symbol_descriptor()
  checkpatch: add pF/pf deprecation warning

 Documentation/printk-formats.txt  | 49 ---
 arch/ia64/include/asm/sections.h  | 10 ++-
 arch/ia64/kernel/module.c | 12 
 arch/ia64/kernel/vmlinux.lds.S|  2 ++
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  6 
 arch/parisc/kernel/module.c   | 16 ++
 arch/parisc/kernel/process.c  |  9 ++
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 arch/powerpc/include/asm/module.h |  3 ++
 arch/powerpc/include/asm/sections.h   | 12 
 arch/powerpc/kernel/module_64.c   | 14 +
 arch/powerpc/kernel/vmlinux.lds.S |  2 ++
 include/asm-generic/sections.h|  8 +++--
 include/linux/kallsyms.h  |  2 ++
 include/linux/module.h|  3 ++
 kernel/kallsyms.c | 19 
 kernel/module.c   |  6 
 lib/vsprintf.c|  5 ++--
 scripts/checkpatch.pl | 11 +--
 20 files changed, 155 insertions(+), 38 deletions(-)

-- 
2.15.0




Re: [v5,22/22] powerpc/mm: Add speculative page fault

2017-11-06 Thread Sergey Senozhatsky
On (11/02/17 15:11), Laurent Dufour wrote:
> On 26/10/2017 10:14, kemi wrote:
> > Some regression is found by LKP-tools(linux kernel performance) on this 
> > patch series
> > tested on Intel 2s/4s Skylake platform. 
> > The regression result is sorted by the metric will-it-scale.per_process_ops.
> 
> Hi Kemi,
> 
> Thanks for reporting this, I'll try to address it by turning some features
> of the SPF path off when the process is monothreaded.

make them madvice()-able?
not all multi-threaded apps will necessarily benefit of SPF. right?
just an idea.

-ss


Re: [PATCHv3 6/7] symbol lookup: use new kernel and module dereference functions

2017-10-23 Thread Sergey Senozhatsky
On (10/20/17 15:08), Petr Mladek wrote:
> On Thu 2017-10-19 15:42:35, Sergey Senozhatsky wrote:
> > Sorry for the delay and thanks for taking a look.
> > 
> > I'll try to re-spin the patch set by the end of this week/early next
> > week.
> > 
> > 
> > On (10/04/17 13:53), Petr Mladek wrote:
> > [..]
> > > Note that kallsyms_lookup() and module_address_lookup() is used
> > > in many other situations.
> > 
> > we dereference only things that can be dereferenced.
> > so calling it on already dereferenced address, or address
> > that does need to be dereferenced is OK.
> 
> My concern is that it changes the behavior. It will suddenly return
> another information for addresses that were not dereference before.

OK. I'd be really-really surprised to find out that anyone did
kallsyms_lookup()/module_address_lookup() on func descriptors,
but I understand your concerns. I'll try to keep everything
within vsprintf().

-ss


Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-10-19 Thread Sergey Senozhatsky
Michael,

On (09/27/17 15:01), Michael Ellerman wrote:
> Sergey Senozhatsky <sergey.senozhatsky.w...@gmail.com> writes:
> 
> > On (09/22/17 16:48), Luck, Tony wrote:
> > [..]
> >> Tested patch series on ia64 successfully.
> >> 
> >> Tested-by: Tony Luck <tony.l...@intel.com>
> >
> > thanks!
> >
> >> After this goes upstream, you should submit a patch to get rid of
> >> all uses of %pF (70 instances in 35 files) and %pf (63 in 34)
> >> 
> >> Perhaps break the patch by top-level directory (e.g. get all the %pF
> >> and %pF in the 17 files under drivers/ in one patch).
> >
> > frankly, I was going to have some sort of a lazy deprecation process:
> > didn't plan to send out a patch set that would hunt down all pf/pF-s.
> > hm...
> 
> That never works though, we have lots of cruft left over from times when
> that's happened and the conversion never quite got finished.

this time around it's different, I promise! :)


well...
I guess I can send out a tree wide pf/pF removal patch set. later.
when we will see that .opd based dereference does not make anyone
unhappy.

and I think we can't remove pf/pF from the kernel completely. it
will stay in vscnprintf() for some time. old habits die hard, I suppose,
there might be people using it for debugging/etc.


> At least if you send out the patches to do the removal they might
> eventually get merged.
> 
> > speaking of upstream, any objections if this patch set will go through
> > the printk tree, in one piece?
> 
> Do you mind putting it in a topic branch (based on rc2) and then merge
> that into the printk tree? That way I can merge the topic branch iff
> there are conflicts later down the line towards 4.15.

ok, let me re-spin the series. there are some changes here
and there, so I'll drop Tested-by/Reviewed-by tags and will
ask platforms' maintainers to re-test the patch set :(

if everything goes OK, then we can ask Petr to do the topic
branch (I don't have a kernel.org account).

-ss


Re: [PATCHv3 4/7] powerpc64: Add .opd based function descriptor dereference

2017-10-19 Thread Sergey Senozhatsky
Hello,

Michael, sorry for the delay. I'm catching up with the emails
after... absence.

On (10/04/17 22:06), Michael Ellerman wrote:
> Petr Mladek <pmla...@suse.com> writes:
> > On Sat 2017-09-30 11:53:16, Sergey Senozhatsky wrote:
> >> diff --git a/arch/powerpc/kernel/module_64.c 
> >> b/arch/powerpc/kernel/module_64.c
> >> index 0b0f89685b67..94caec045a90 100644
> >> --- a/arch/powerpc/kernel/module_64.c
> >> +++ b/arch/powerpc/kernel/module_64.c
> >> @@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
> >>return 0;
> >>  }
> >>  
> >> +#ifdef PPC64_ELF_ABI_v1
> >> +unsigned long dereference_module_function_descriptor(struct module *mod,
> >> +   unsigned long addr)
> >> +{
> >> +  if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd)
> >> +  return addr;
> >> +
> >> +  return dereference_function_descriptor(addr);
> >> +}
> >> +#endif /* PPC64_ELF_ABI_v1 */
> >
> > I would personally move this up in the source file. It is related to
> > the definition of func_desc() and other functions that are
> > also PPC_ELF_ABI-specific.
> 
> Yeah that would be neater. There's already a PPC64_ELF_ABI_v2 block, you
> could put this in the else case of that.
> 
> But we can do that later if you're not respinning otherwise.

will do.

-ss


Re: [PATCHv3 1/7] switch dereference_function_descriptor() to `unsigned long'

2017-10-19 Thread Sergey Senozhatsky
On (10/04/17 10:24), Petr Mladek wrote:
[..]
> To make it clear. All these comments are not a big deal and I do
> not want to invalidate all the acked-by and tested-by just because
> of them.
>
> But please, consider removing this change if we need to do
> another iteration of this patchset. IMHO, there is no real win
> and it would just pollute the git history.

I tend to rather keep it. would that cause any problems on your side?
it saves a ton of ugly casts later in the patch set and lets us to drop
some casts from the modules code/etc. the patch is here because otherwise
I had to add a bunch of new casts.

-ss


Re: [PATCHv3 2/7] sections: split dereference_function_descriptor()

2017-10-19 Thread Sergey Senozhatsky
On (10/04/17 11:00), Petr Mladek wrote:
[..]
> >  /* random extra sections (if any).  Override
> > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> > index 4d0cb9bba93e..172904e9cded 100644
> > --- a/include/linux/moduleloader.h
> > +++ b/include/linux/moduleloader.h
> > @@ -85,6 +85,10 @@ void module_arch_cleanup(struct module *mod);
> >  /* Any cleanup before freeing mod->module_init */
> >  void module_arch_freeing_init(struct module *mod);
> >  
> > +/* Dereference module function descriptor */
> > +unsigned long dereference_module_function_descriptor(struct module *mod,
> > +unsigned long addr);
> > +
> 
> The function is used when the module is already loaded. IMHO,
> include/linux/module.h would be a better place.
> 
> One advantage would be that we could use the same trick
> as in include/asm-generic/sections.h. I mean:
> 
> #define dereference_module_function_descriptor(mod, addr) (addr)
> 
> and redefine it in the three affected
> arch//include/asm/module.h headers. Then it might be completely
> optimized out on all architectures.

will take a look.

-ss


Re: [PATCHv3 4/7] powerpc64: Add .opd based function descriptor dereference

2017-10-19 Thread Sergey Senozhatsky
On (10/04/17 11:21), Petr Mladek wrote:
[..]
> > +#ifdef PPC64_ELF_ABI_v1
> > +unsigned long dereference_module_function_descriptor(struct module *mod,
> > +unsigned long addr)
> > +{
> > +   if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd)
> > +   return addr;
> > +
> > +   return dereference_function_descriptor(addr);
> > +}
> > +#endif /* PPC64_ELF_ABI_v1 */
> 
> I would personally move this up in the source file. It is related to
> the definition of func_desc() and other functions that are
> also PPC_ELF_ABI-specific.
> 
> Otherwise, it looks good to me.
> 
> Reviewed-by: Petr Mladek 

OK, will move.

-ss


Re: [PATCHv3 5/7] parisc64: Add .opd based function descriptor dereference

2017-10-19 Thread Sergey Senozhatsky
On (10/04/17 12:40), Petr Mladek wrote:
> > +unsigned long dereference_module_function_descriptor(struct module *mod,
> > +unsigned long addr)
> > +{
> > +   unsigned long start_opd = (Elf64_Addr)mod->core_layout.base +
> > +  mod->arch.fdesc_offset;
> > +   unsigned long end_opd = start_opd +
> > +   mod->arch.fdesc_count * sizeof(Elf64_Fdesc);
> 
> I know that this is used in rather slow paths. But it still might
> make sense to have these section borders pre-computed and
> stored in struct mod_arch_specific. I mean to do similar
> thing that we do on powerpc.
> 
> Well, we could do this in a followup patch if parisc people
> wanted it.
> 
> 
> > +   if (addr < start_opd || addr >= end_opd)
> > +   return addr;
> > +
> > +   return dereference_function_descriptor(addr);
> > +}
> > +#endif
> 
> Otherwise the patch looks fine to me.
> 
> Reviewed-by: Petr Mladek 

let's do it later, if need be.

-ss


Re: [PATCHv3 6/7] symbol lookup: use new kernel and module dereference functions

2017-10-19 Thread Sergey Senozhatsky
Sorry for the delay and thanks for taking a look.

I'll try to re-spin the patch set by the end of this week/early next
week.


On (10/04/17 13:53), Petr Mladek wrote:
[..]
> Note that kallsyms_lookup() and module_address_lookup() is used
> in many other situations.

we dereference only things that can be dereferenced.
so calling it on already dereferenced address, or address
that does need to be dereferenced is OK.

besides, not all of those "other" places are available on
ppc64, ia64, parisc.

[..]
> > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> > index 127e7cfafa55..e2fc09ea9509 100644
> > --- a/kernel/kallsyms.c
> > +++ b/kernel/kallsyms.c
> > @@ -322,6 +322,7 @@ const char *kallsyms_lookup(unsigned long addr,
> > if (is_ksym_addr(addr)) {
> 
> is_ksym_addr() ignores the special .opd elf sections if
> CONFIG_KALLSYMS_ALL is disabled. We should dereference before
> this call.

I'll move it.

> > unsigned long pos;
> >  
> > +   addr = dereference_kernel_function_descriptor(addr);
> > pos = get_symbol_pos(addr, symbolsize, offset);
> 
> I still wonder if doing the dereference in the widely used kallsyms
> might cause any regression.

more testing wouldn't hurt, yes.

> Also get_symbol_pos() is called in several other helpers
> but the dereference is done only here. It would be
> confusing if for example kallsyms_lookup_size_offset()
> and kallsyms_lookup() give different result.

hm, so there is no change in this regard, right? there was no
deference before, there is no dereference now. what am I missing?


I'm touching the pf/pF part in this patch set. if there are cases
of missing dereferences anywhere else then we need to address it
in a separate patch set, I think.

> I would feel much more comfortable if we keep the derefenrece
> only in vsprintf.

at a price of extra module lookup, because we need `struct module *'
for module address dereference.

-ss


[PATCHv3 7/7] checkpatch: add pF/pf deprecation warning

2017-09-29 Thread Sergey Senozhatsky
We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
enough to handle function pointer dereference on platforms where such
dereference is required.

checkpatch warning example:

WARNING: Deprecated vsprintf pointer extension '%pF' - use %pS instead

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Signed-off-by: Joe Perches <j...@perches.com>
Cc: Andy Whitcroft <a...@canonical.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 scripts/checkpatch.pl | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 03eb2551477d..387c453413e0 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5762,18 +5762,25 @@ sub process {
for (my $count = $linenr; $count <= $lc; $count++) {
my $fmt = get_quoted_string($lines[$count - 1], 
raw_line($count, 0));
$fmt =~ s/%%//g;
-   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) {
+   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) {
$bad_extension = $1;
last;
}
}
if ($bad_extension ne "") {
my $stat_real = raw_line($linenr, 0);
+   my $ext_type = "Invalid";
+   my $use = "";
for (my $count = $linenr + 1; $count <= $lc; 
$count++) {
$stat_real = $stat_real . "\n" . 
raw_line($count, 0);
}
+   if ($bad_extension =~ /p[Ff]/) {
+   $ext_type = "Deprecated";
+   $use = " - use %pS instead";
+   $use =~ s/pS/ps/ if ($bad_extension =~ 
/pf/);
+   }
WARN("VSPRINTF_POINTER_EXTENSION",
-"Invalid vsprintf pointer extension 
'$bad_extension'\n" . "$here\n$stat_real\n");
+"$ext_type vsprintf pointer extension 
'$bad_extension'$use\n" . "$here\n$stat_real\n");
}
}
 
-- 
2.14.2



[PATCHv3 6/7] symbol lookup: use new kernel and module dereference functions

2017-09-29 Thread Sergey Senozhatsky
Call appropriate function descriptor dereference ARCH callbacks:
- dereference_kernel_function_descriptor() if the pointer is a
  kernel symbol;

- dereference_module_function_descriptor() if the pointer is a
  module symbol.

This patch also removes dereference_function_descriptor() from
'%pF/%pf' vsprintf handler, because it has the same behavior with
'%pS/%ps' now.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 Documentation/printk-formats.txt | 20 ++--
 kernel/kallsyms.c|  1 +
 kernel/module.c  |  1 +
 lib/vsprintf.c   |  5 +
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 361789df51ec..3adbc4fdd482 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -50,26 +50,28 @@ Symbols/Function Pointers
 
 ::
 
+   %pS versatile_init+0x0/0x110
+   %ps versatile_init
%pF versatile_init+0x0/0x110
%pf versatile_init
-   %pS versatile_init+0x0/0x110
%pSRversatile_init+0x9/0x110
(with __builtin_extract_return_addr() translation)
-   %ps versatile_init
%pB prev_fn_of_versatile_init+0x88/0x88
 
-The ``F`` and ``f`` specifiers are for printing function pointers,
-for example, f->func,  They have the same result as
-``S`` and ``s`` specifiers. But they do an extra conversion on
-ia64, ppc64 and parisc64 architectures where the function pointers
-are actually function descriptors.
-
 The ``S`` and ``s`` specifiers can be used for printing symbols
 from direct addresses, for example, __builtin_return_address(0),
 (void *)regs->ip. They result in the symbol name with (``S``) or
 without (``s``) offsets. If KALLSYMS are disabled then the symbol
 address is printed instead.
 
+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
+parisc64 function pointers are indirect and, in fact, are function
+descriptors, which require additional dereferencing before we can lookup
+the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
+platforms (when needed), so ``F`` and ``f`` exist for compatibility
+reasons only.
+
 The ``B`` specifier results in the symbol name with offsets and should be
 used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
@@ -77,8 +79,6 @@ when tail-call``s are used and marked with the noreturn GCC 
attribute.
 
 Examples::
 
-   printk("Going to call: %pF\n", gettimeofday);
-   printk("Going to call: %pF\n", p->func);
printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
printk("%s: called from %pS\n", __func__,
(void *)__builtin_return_address(0));
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..e2fc09ea9509 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -322,6 +322,7 @@ const char *kallsyms_lookup(unsigned long addr,
if (is_ksym_addr(addr)) {
unsigned long pos;
 
+   addr = dereference_kernel_function_descriptor(addr);
pos = get_symbol_pos(addr, symbolsize, offset);
/* Grab name */
kallsyms_expand_symbol(get_symbol_offset(pos),
diff --git a/kernel/module.c b/kernel/module.c
index b792e814150a..63361de377ad 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3948,6 +3948,7 @@ const char *module_address_lookup(unsigned long addr,
preempt_disable();
mod = __module_address(addr);
if (mod) {
+   addr = dereference_module_function_descriptor(mod, addr);
if (modname)
*modname = mod->name;
ret = get_ksymbol(mod, addr, size, offset);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index bcd906a39010..bf04b4f5d8e7 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -40,7 +40,6 @@
 #include "../mm/internal.h"/* For the trace_print_flags arrays */
 
 #include   /* for PAGE_SIZE */
-#include   /* for dereference_function_descriptor() */
 #include  /* cpu_to_le16 */
 
 #include 
@@ -1721,10 +1720,8 @@ char *pointer(const char *fmt, char *buf, char *end, 
void *ptr,
}
 
switch (*fmt) {
-   case 'F':
+   case 'F': /* %pF and %pf are kept for compatibility reasons only */
case 'f':
-   ptr = (void *)dereference_function_descriptor((unsigned 
long)ptr);
-   /* Fallthrough */
case 'S':
case 's':
case 'B':
-- 
2.14.2



[PATCHv3 5/7] parisc64: Add .opd based function descriptor dereference

2017-09-29 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for parisc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Signed-off-by: Helge Deller <del...@gmx.de>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  2 ++
 arch/parisc/kernel/module.c   | 17 +
 arch/parisc/kernel/process.c  |  9 +
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 5 files changed, 32 insertions(+)

diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S 
b/arch/parisc/boot/compressed/vmlinux.lds.S
index a4ce3314e78e..4ebd4e65524c 100644
--- a/arch/parisc/boot/compressed/vmlinux.lds.S
+++ b/arch/parisc/boot/compressed/vmlinux.lds.S
@@ -29,7 +29,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index 59fbe0067112..845ddc9a3421 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -7,6 +7,8 @@
 #ifdef CONFIG_64BIT
 #undef dereference_function_descriptor
 unsigned long dereference_function_descriptor(unsigned long);
+#undef dereference_kernel_function_descriptor
+unsigned long dereference_kernel_function_descriptor(unsigned long);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index f1a76935a314..28f89b3dcc11 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -66,6 +66,7 @@
 
 #include 
 #include 
+#include 
 
 #if 0
 #define DEBUGP printk
@@ -954,3 +955,19 @@ void module_arch_cleanup(struct module *mod)
 {
deregister_unwind_table(mod);
 }
+
+#ifdef CONFIG_64BIT
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   unsigned long start_opd = (Elf64_Addr)mod->core_layout.base +
+  mod->arch.fdesc_offset;
+   unsigned long end_opd = start_opd +
+   mod->arch.fdesc_count * sizeof(Elf64_Fdesc);
+
+   if (addr < start_opd || addr >= end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d350aa913acc..423bbfe90e2b 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -276,6 +276,15 @@ unsigned long dereference_function_descriptor(unsigned 
long ptr)
ptr = (unsigned long)p;
return ptr;
 }
+
+unsigned long dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
 #endif
 
 static inline unsigned long brk_rnd(void)
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index ffe2cbf52d1a..ab030895dd1e 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -99,7 +99,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
-- 
2.14.2



[PATCHv3 4/7] powerpc64: Add .opd based function descriptor dereference

2017-09-29 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for powerpc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 arch/powerpc/include/asm/module.h   |  3 +++
 arch/powerpc/include/asm/sections.h | 11 +++
 arch/powerpc/kernel/module_64.c | 16 
 arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 6c0132c7212f..7e28442827f1 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -45,6 +45,9 @@ struct mod_arch_specific {
unsigned long tramp;
 #endif
 
+   /* For module function descriptor dereference */
+   unsigned long start_opd;
+   unsigned long end_opd;
 #else /* powerpc64 */
/* Indices of PLT sections within module. */
unsigned int core_plt_section;
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 67379b8945e8..6b4ee0d1645f 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -75,6 +75,17 @@ static inline unsigned long 
dereference_function_descriptor(unsigned long ptr)
ptr = (unsigned long)p;
return ptr;
 }
+
+#undef dereference_kernel_function_descriptor
+static inline unsigned long
+dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
 #endif /* PPC64_ELF_ABI_v1 */
 
 #endif
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0b0f89685b67..94caec045a90 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -344,6 +344,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) {
+   me->arch.start_opd = sechdrs[i].sh_addr;
+   me->arch.end_opd = sechdrs[i].sh_addr +
+  sechdrs[i].sh_size;
+   }
 
/* We don't handle .init for the moment: rename to _init */
while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
@@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
return 0;
 }
 
+#ifdef PPC64_ELF_ABI_v1
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif /* PPC64_ELF_ABI_v1 */
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 #ifdef CC_USING_MPROFILE_KERNEL
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 882628fa6987..70e10251e083 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -277,7 +277,9 @@ SECTIONS
}
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
. = ALIGN(256);
-- 
2.14.2



[PATCHv3 3/7] ia64: Add .opd based function descriptor dereference

2017-09-29 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for IA64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 arch/ia64/include/asm/sections.h | 10 +-
 arch/ia64/kernel/module.c| 13 +
 arch/ia64/kernel/vmlinux.lds.S   |  2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index de6bfa1ef8fb..3ba7ce9d8bc8 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -37,6 +37,14 @@ static inline unsigned long 
dereference_function_descriptor(unsigned long ptr)
return ptr;
 }
 
+#undef dereference_kernel_function_descriptor
+static inline unsigned long
+dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+   return dereference_function_descriptor(addr);
+}
 
 #endif /* _ASM_IA64_SECTIONS_H */
-
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index d1d945c6bd05..0741ae6fa957 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -35,6 +35,7 @@
 
 #include 
 #include 
+#include 
 
 #define ARCH_MODULE_DEBUG 0
 
@@ -917,3 +918,15 @@ module_arch_cleanup (struct module *mod)
if (mod->arch.core_unw_table)
unw_remove_unwind_table(mod->arch.core_unw_table);
 }
+
+unsigned long
+dereference_module_function_descriptor(struct module *mod, unsigned long addr)
+{
+   Elf64_Shdr *opd = mod->arch.opd;
+
+   if (addr < opd->sh_addr ||
+   addr >= (opd->sh_addr + opd->sh_size))
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 798026dde52e..f872ba5ff82a 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -107,7 +107,9 @@ SECTIONS {
RODATA
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
/*
-- 
2.14.2



[PATCHv3 2/7] sections: split dereference_function_descriptor()

2017-09-29 Thread Sergey Senozhatsky
There are two format specifiers to print out a pointer in symbolic
format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two
mean exactly the same thing, but some architectures (ia64, ppc64,
parisc64) use an indirect pointer for C function pointers, where
the function pointer points to a function descriptor (which in
turn contains the actual pointer to the code). The '%pF/%pf, when
used appropriately, automatically does the appropriate function
descriptor dereference on such architectures.

The "when used appropriately" part is tricky. Basically this is
a subtle ABI detail, specific to some platforms, that made it to
the API level and people can be unaware of it and miss the whole
"we need to dereference the function" business out. [1] proves
that point (note that it fixes only '%pF' and '%pS', there might
be '%pf' and '%ps' cases as well).

It appears that we can handle everything within the affected
arches and make '%pS/%ps' smart enough to retire '%pF/%pf'.
Function descriptors live in .opd elf section and all affected
arches (ia64, ppc64, parisc64) handle it properly for kernel
and modules. So we, technically, can decide if the dereference
is needed by simply looking at the pointer: if it belongs to
.opd section then we need to dereference it.

The kernel and modules have their own .opd sections, obviously,
that's why we need to split dereference_function_descriptor()
and use separate kernel and module dereference arch callbacks.

This patch does the first step, it
a) adds dereference_kernel_function_descriptor() function.
b) adds a weak alias to dereference_module_function_descriptor()
   function.

So, for the time being, we will have:
1) dereference_function_descriptor()
   A generic function, that simply dereferences the pointer. There is
   bunch of places that call it: kgdbts, init/main.c, extable, etc.

2) dereference_kernel_function_descriptor()
   A function to call on kernel symbols that does kernel .opd section
   address range test.

3) dereference_module_function_descriptor()
   A function to call on modules' symbols that does modules' .opd
   section address range test.

[1] https://marc.info/?l=linux-kernel=150472969730573

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 include/asm-generic/sections.h | 8 ++--
 include/linux/moduleloader.h   | 4 
 kernel/module.c| 6 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..387f22c41e0d 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -29,6 +29,7 @@
  * __ctors_start, __ctors_end
  * __irqentry_text_start, __irqentry_text_end
  * __softirqentry_text_start, __softirqentry_text_end
+ * __start_opd, __end_opd
  */
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
@@ -47,12 +48,15 @@ extern char __softirqentry_text_start[], 
__softirqentry_text_end[];
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
 
+/* Start and end of .opd section - used for function descriptors. */
+extern char __start_opd[], __end_opd[];
+
 extern __visible const void __nosave_begin, __nosave_end;
 
-/* function descriptor handling (if any).  Override
- * in asm/sections.h */
+/* Function descriptor handling (if any).  Override in asm/sections.h */
 #ifndef dereference_function_descriptor
 #define dereference_function_descriptor(p) (p)
+#define dereference_kernel_function_descriptor(p) (p)
 #endif
 
 /* random extra sections (if any).  Override
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 4d0cb9bba93e..172904e9cded 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -85,6 +85,10 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+/* Dereference module function descriptor */
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index ea77ab13bead..b792e814150a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2121,6 +2121,12 @@ void __weak module_arch_freeing_init(struct module *mod)
 {
 }
 
+unsigned long __weak dereference_module_function_descriptor(struct module *mod,
+   unsigned long addr)
+{
+   return addr;
+}
+
 /* Free a module,

[PATCHv3 1/7] switch dereference_function_descriptor() to `unsigned long'

2017-09-29 Thread Sergey Senozhatsky
Convert dereference_function_descriptor() to accept and return
`unsigned long'. There will be two new ARCH function for kernel
and module function pointer dereference, which will work with
`unsigned long', so the patch unifies interfaces.

Besides, dereference_function_descriptor() mostly work with
`unsigned long':

drivers/misc/kgdbts.c:
addr = (unsigned long) dereference_function_descriptor((void *)addr);

init/main.c:
addr = (unsigned long) dereference_function_descriptor(fn);

kernel/extable.c:
addr = (unsigned long) dereference_function_descriptor(ptr);

kernel/module.c:
unsigned long a = (unsigned long)dereference_function_descriptor(addr);

Convert dereference_function_descriptor() users tree-wide.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Tested-by: Helge Deller <del...@gmx.de> # parisc64
Tested-by: Santosh Sivaraj <sant...@fossix.org> # powerpc64
Acked-by: Michael Ellerman <m...@ellerman.id.au> # powerpc64
Tested-by: Tony Luck <tony.l...@intel.com> # ia64
---
 arch/ia64/include/asm/sections.h| 6 +++---
 arch/parisc/include/asm/sections.h  | 2 +-
 arch/parisc/kernel/process.c| 6 +++---
 arch/parisc/mm/init.c   | 4 ++--
 arch/powerpc/include/asm/sections.h | 6 +++---
 drivers/misc/kgdbts.c   | 2 +-
 init/main.c | 2 +-
 kernel/extable.c| 2 +-
 kernel/module.c | 2 +-
 lib/vsprintf.c  | 2 +-
 10 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 2ab2003698ef..de6bfa1ef8fb 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -27,13 +27,13 @@ extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
 #undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
+static inline unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   struct fdesc *desc = ptr;
+   struct fdesc *desc = (struct fdesc *)ptr;
void *p;
 
if (!probe_kernel_address(>ip, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index 9d13c3507ad6..59fbe0067112 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -6,7 +6,7 @@
 
 #ifdef CONFIG_64BIT
 #undef dereference_function_descriptor
-void *dereference_function_descriptor(void *);
+unsigned long dereference_function_descriptor(unsigned long);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 30f92391a93e..d350aa913acc 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -267,13 +267,13 @@ get_wchan(struct task_struct *p)
 }
 
 #ifdef CONFIG_64BIT
-void *dereference_function_descriptor(void *ptr)
+unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   Elf64_Fdesc *desc = ptr;
+   Elf64_Fdesc *desc = (Elf64_Fdesc *)ptr;
void *p;
 
if (!probe_kernel_address(>addr, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 #endif
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 1ca9a2b4239f..06e1b79e2946 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -389,10 +389,10 @@ static void __init setup_bootmem(void)
 static int __init parisc_text_address(unsigned long vaddr)
 {
static unsigned long head_ptr __initdata;
+   unsigned long addr = (unsigned long)_kernel_start;
 
if (!head_ptr)
-   head_ptr = PAGE_MASK & (unsigned long)
-   dereference_function_descriptor(_kernel_start);
+   head_ptr = PAGE_MASK & dereference_function_descriptor(addr);
 
return core_kernel_text(vaddr) || vaddr == head_ptr;
 }
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 7902d6358854..67379b8945e8 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -66,13 +66,13 @@ static inline int overlaps_kvm_tmp(unsigned long start, 
unsigned long end)
 
 #ifdef PPC64_ELF_ABI_v1
 #undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
+static inline unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   struct ppc64_opd_entry *desc = ptr;
+   struct ppc64_opd_entry *desc = (struct ppc64_opd_entry *)ptr;
void *p;
 
if (!probe_kernel_address(>funcaddr, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 #endif /* PPC64_ELF_ABI_v1 */
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index fc7efedbc4be..6a5a159dfb75 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kg

[PATCHv3 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-29 Thread Sergey Senozhatsky
Hello


Petr, could you please pick up the series?

==

On some arches C function pointers are indirect and point to
a function descriptor, which contains the actual pointer to the code.
This mostly doesn't matter, except for cases when people want to print
out function pointers in symbolic format, because the usual '%pS/%ps'
does not work on those arches as expected. That's the reason why we
have '%pF/%pf', but since it's here because of a subtle ABI detail
specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
'%pF/%pf' and '%pS/%ps' (see [1], for example).

This patch set attempts to move ia64/ppc64/parisc64 C function
pointer ABI details out of printk() to arch code. Function dereference
code now checks if a pointer belongs to a .opd ELF section and dereferences
that pointer only if it does. The kernel and modules have their own .opd
sections that's why I use two different ARCH functions: for kernel and
for module pointer dereference.

I planned to remove dereference_function_descriptor() entirely,
but then I discovered a bunch other uses cases (kgdbts, init/main.c,
extable, etc.), so I decided to keep dereference_function_descriptor()
around because the main point of this patch set is to deprecate %pF/%pf.
But at the same time, I think I can go further and handle both kernel
and module descriptor dereference in dereference_function_descriptor().
We need a module pointer for module .opd check, so that will come at an
extra cost of module lookup (may be there will some other issues along
the way, haven't checked it).

Right now we've got:

- dereference_function_descriptor(addr)
a generic (old) function. it simply attempts to dereference
whatever pointer we give it.

- dereference_kernel_function_descriptor(addr)
dereferences a kernel pointer if it's within the kernel's .opd
section.

- dereference_module_function_descriptor(module, addr)
dereference a module pointer if it's within the module's .opd
section.

v3:
-- picked up ACKs and Tested-by
-- tweaked checkpatch warning (Joe)
-- updated Documentation

v2:
-- convert dereference_function_descriptor() to unsigned long
-- fix kernel descriptor range checks (Helge)
-- fix parisc module descriptor range check (Helge)
-- fix ppc64 module range check
-- add checkpatch patch

Sergey Senozhatsky (7):
  switch dereference_function_descriptor() to `unsigned long'
  sections: split dereference_function_descriptor()
  ia64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  parisc64: Add .opd based function descriptor dereference
  symbol lookup: use new kernel and module dereference functions
  checkpatch: add pF/pf deprecation warning

 Documentation/printk-formats.txt  | 20 ++--
 arch/ia64/include/asm/sections.h  | 16 
 arch/ia64/kernel/module.c | 13 +
 arch/ia64/kernel/vmlinux.lds.S|  2 ++
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  4 +++-
 arch/parisc/kernel/module.c   | 17 +
 arch/parisc/kernel/process.c  | 15 ---
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 arch/parisc/mm/init.c |  4 ++--
 arch/powerpc/include/asm/module.h |  3 +++
 arch/powerpc/include/asm/sections.h   | 17 ++---
 arch/powerpc/kernel/module_64.c   | 16 
 arch/powerpc/kernel/vmlinux.lds.S |  2 ++
 drivers/misc/kgdbts.c |  2 +-
 include/asm-generic/sections.h|  8 ++--
 include/linux/moduleloader.h  |  4 
 init/main.c   |  2 +-
 kernel/extable.c  |  2 +-
 kernel/kallsyms.c |  1 +
 kernel/module.c   |  9 -
 lib/vsprintf.c|  5 +
 scripts/checkpatch.pl | 11 +--
 23 files changed, 142 insertions(+), 35 deletions(-)

-- 
2.14.2



Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-27 Thread Sergey Senozhatsky
On (09/27/17 16:26), Michael Ellerman wrote:
[..]
> > Tested-by: Santosh Sivaraj 
> 
> Thanks Santosh.
> 
> I also gave it a quick spin. I'll give you an ack for the powerpc changes.
> 
> Acked-by: Michael Ellerman  (powerpc)
> 
> 
> Thanks for cleaning this up Sergey.

thanks!

-ss


Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-25 Thread Sergey Senozhatsky
On (09/22/17 16:48), Luck, Tony wrote:
[..]
> Tested patch series on ia64 successfully.
> 
> Tested-by: Tony Luck 

thanks!

> After this goes upstream, you should submit a patch to get rid of
> all uses of %pF (70 instances in 35 files) and %pf (63 in 34)
> 
> Perhaps break the patch by top-level directory (e.g. get all the %pF
> and %pF in the 17 files under drivers/ in one patch).

frankly, I was going to have some sort of a lazy deprecation process:
didn't plan to send out a patch set that would hunt down all pf/pF-s.
hm...


speaking of upstream, any objections if this patch set will go through
the printk tree, in one piece?

I'll wait for several more days and then resend v3 with updated
Documentation and tweaked checkpatch warning message.

-ss


Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-22 Thread Sergey Senozhatsky
On (09/22/17 11:04), Santosh Sivaraj wrote:
[..]
> > *** A BIG NOTE ***
> > I don't own ia64/ppc64/parisc64 hardware, so the patches are not
> > tested. Sorry about that!
> 
> Tested patch series on ppc64 sucessfully.
> 
> You may add tested by to the series.
> 
> Tested-by: Santosh Sivaraj 


thanks!

-ss


Re: [RFC][PATCH v2 6/7] symbol lookup: use new kernel and module dereference functions

2017-09-21 Thread Sergey Senozhatsky
On (09/21/17 01:29), Sergey Senozhatsky wrote:
[..]
> + %pS versatile_init+0x0/0x110
> + %ps versatile_init
>   %pF versatile_init+0x0/0x110
>   %pf versatile_init
> - %pS versatile_init+0x0/0x110
>   %pSRversatile_init+0x9/0x110
>   (with __builtin_extract_return_addr() translation)
> - %ps versatile_init
>   %pB prev_fn_of_versatile_init+0x88/0x88
>  
> -The ``F`` and ``f`` specifiers are for printing function pointers,
> -for example, f->func,  They have the same result as
> -``S`` and ``s`` specifiers. But they do an extra conversion on
> -ia64, ppc64 and parisc64 architectures where the function pointers
> -are actually function descriptors.
> -
>  The ``S`` and ``s`` specifiers can be used for printing symbols
>  from direct addresses, for example, __builtin_return_address(0),
>  (void *)regs->ip. They result in the symbol name with (``S``) or
>  without (``s``) offsets. If KALLSYMS are disabled then the symbol
>  address is printed instead.
>  
> +Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
> +and thus deprecated.

JFI,

I have updated this part. it's probably too early to completely
wipe out pF/pf info.

the updated Doc goes like this:

+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated. We have ``F`` and ``f`` because on ia64, ppc64 and
+parisc64 function pointers are indirect and, in fact, are function
+descriptors, which require additional dereferencing before we can lookup
+the symbol. As of now, ``S`` and ``s`` perform dereferencing on those
+platforms (when needed), so ``F`` and ``f`` exist for compatibility
+reasons only.

-ss


Re: [RFC][PATCH v2 7/7] checkpatch: add pF/pf deprecation warning

2017-09-21 Thread Sergey Senozhatsky
On (09/20/17 11:24), Joe Perches wrote:
> On Wed, 2017-09-20 at 19:53 +0200, Helge Deller wrote:
[..]
> > Is it worth to mention, that it's still needed in older kernels?
> > Just in case some patch get's backported.

good question.

> I think probably not.
> 
> There are relatively few references and
> modifications are unlikely to be backported.

I tend to agree.

unlikely anyone backports printk message updates. I have quickly glanced
through stable-4.9 and haven't seen such backports. well, may be there
are some.

can tweak the warning a bit, probably. e.g. "if you are planning to
backport your change to kernels older than 4.14 then ignore this
warning". but not sure if it's worth it.

-ss


Re: [RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-20 Thread Sergey Senozhatsky
On (09/20/17 22:14), Helge Deller wrote:
> On 20.09.2017 18:29, Sergey Senozhatsky wrote:
> >  This patch set attempts to move ia64/ppc64/parisc64 C function
> > pointer ABI details out of printk() to arch code. Function dereference
> > code now checks if a pointer belongs to a .opd ELF section and dereferences
> > that pointer only if it does. The kernel and modules have their own .opd
> > sections that's why I use two different ARCH functions: for kernel and
> > for module pointer dereference.
> > ...> *** A BIG NOTE ***
> >  I don't own ia64/ppc64/parisc64 hardware, so the patches are not
> >  tested. Sorry about that!
> 
> 
> I just now tested your patch series successfully on parisc64.
> 
> You may add to the whole series:
> Tested-by: Helge Deller <del...@gmx.de> # parisc64

thanks, Helge!

> > Another note:
> >  I need to check what is BPF symbol lookup and do we need to
> >  do any dereference there.
> 
> Not relevant for parisc, since we don't support it yet.

so that was my suspicion as well. at glance it didn't look like
bpf symbol resolution would work on platforms that do description
dereference.

-ss


Re: [RFC][PATCH v2 7/7] checkpatch: add pF/pf deprecation warning

2017-09-20 Thread Sergey Senozhatsky
On (09/20/17 10:38), Joe Perches wrote:
> On Thu, 2017-09-21 at 01:29 +0900, Sergey Senozhatsky wrote:
> > We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
> > enough to handle function pointer dereference on platforms where such
> > dereference is required.
> > 
> > checkpatch warning example:
> > 
> > WARNING: Use '%pS/%ps' instead. This pointer extension was deprecated: '%pF'
> 
> If this series is accepted,  I think this message
> is unclear and would prefer something like:

sure, can tweak the patch.

[..]
>   if ($bad_extension ne "") {
>   my $stat_real = raw_line($linenr, 0);
> + my $ext_type = "Invalid";
> + my $use = "";
>   for (my $count = $linenr + 1; $count <= $lc; 
> $count++) {
>   $stat_real = $stat_real . "\n" . 
> raw_line($count, 0);
>   }
> + if ($bad_extension =~ /p[Ff]/i) {
I think /i is not necessary here

> + $ext_type = "Deprecated";
> + $use = " - use %pS instead";
> + $use =~ s/pS/ps/ if ($bad_extension =~ 
> /pf/);
ok, handy :)

-ss


Re: [RFC][PATCH v2 7/7] checkpatch: add pF/pf deprecation warning

2017-09-20 Thread Sergey Senozhatsky
On (09/21/17 01:29), Sergey Senozhatsky wrote:
> We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
> enough to handle function pointer dereference on platforms where such
> dereference is required.
> 
> checkpatch warning example:
> 
> WARNING: Use '%pS/%ps' instead. This pointer extension was deprecated: '%pF'

weird... somehow I sent this patch twice. please ignore this one.

-ss


Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-20 Thread Sergey Senozhatsky
On (09/20/17 12:20), Helge Deller wrote:
[..]
> > I've not looked at the specifics case...
> > 
> > Another option is using a struct with a single member and
> > passing it by value.
> 
> Actually, we do already have correct structs which could be referenced:
> parisc: struct Elf64_Fdesc
> ia64: struct fdesc
> ppc64:struct ppc64_opd_entry
> 
> One could "#define platform_opd_entry" to each of those depending on the 
> platform and use it.
> It might be misleading though, because the pointer which is handed over to
> dereference_function_descriptor() can be such a pointer but isn't necessary.
> I'll leave it up to Sergey to decide.

Hello,

I think I'll keep 'unsigned long' for now.

-ss


[RFC][PATCH v2 7/7] checkpatch: add pF/pf deprecation warning

2017-09-20 Thread Sergey Senozhatsky
We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
enough to handle function pointer dereference on platforms where such
dereference is required.

checkpatch warning example:

WARNING: Use '%pS/%ps' instead. This pointer extension was deprecated: '%pF'

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Cc: Andy Whitcroft <a...@canonical.com>
Cc: Joe Perches <j...@perches.com>
---
 scripts/checkpatch.pl | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dd2c262aebbf..5945e4843466 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5762,18 +5762,20 @@ sub process {
for (my $count = $linenr; $count <= $lc; $count++) {
my $fmt = get_quoted_string($lines[$count - 1], 
raw_line($count, 0));
$fmt =~ s/%%//g;
-   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) {
+   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) {
$bad_extension = $1;
last;
}
}
if ($bad_extension ne "") {
my $stat_real = raw_line($linenr, 0);
+   my $error_msg = "Invalid vsprintf pointer 
extension ";
for (my $count = $linenr + 1; $count <= $lc; 
$count++) {
$stat_real = $stat_real . "\n" . 
raw_line($count, 0);
}
+   $error_msg = "Use '%pS/%ps' instead. This 
pointer extension was deprecated:" if ($bad_extension =~ /pF|pf/);
WARN("VSPRINTF_POINTER_EXTENSION",
-"Invalid vsprintf pointer extension 
'$bad_extension'\n" . "$here\n$stat_real\n");
+"$error_msg '$bad_extension'\n" . 
"$here\n$stat_real\n");
}
}
 
-- 
2.14.1



[RFC][PATCH v2 7/7] checkpatch: add pF/pf deprecation warning

2017-09-20 Thread Sergey Senozhatsky
We deprecated '%pF/%pf' printk specifiers, since '%pS/%ps' is now smart
enough to handle function pointer dereference on platforms where such
dereference is required.

checkpatch warning example:

WARNING: Use '%pS/%ps' instead. This pointer extension was deprecated: '%pF'

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Cc: Andy Whitcroft <a...@canonical.com>
Cc: Joe Perches <j...@perches.com>
---
 scripts/checkpatch.pl | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dd2c262aebbf..5945e4843466 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5762,18 +5762,20 @@ sub process {
for (my $count = $linenr; $count <= $lc; $count++) {
my $fmt = get_quoted_string($lines[$count - 1], 
raw_line($count, 0));
$fmt =~ s/%%//g;
-   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WFfSsBKRraEhMmIiUDdgVCbGNO]).)/) {
+   if ($fmt =~ 
/(\%[\*\d\.]*p(?![\WSsBKRraEhMmIiUDdgVCbGNO]).)/) {
$bad_extension = $1;
last;
}
}
if ($bad_extension ne "") {
my $stat_real = raw_line($linenr, 0);
+   my $error_msg = "Invalid vsprintf pointer 
extension ";
for (my $count = $linenr + 1; $count <= $lc; 
$count++) {
$stat_real = $stat_real . "\n" . 
raw_line($count, 0);
}
+   $error_msg = "Use '%pS/%ps' instead. This 
pointer extension was deprecated:" if ($bad_extension =~ /pF|pf/);
WARN("VSPRINTF_POINTER_EXTENSION",
-"Invalid vsprintf pointer extension 
'$bad_extension'\n" . "$here\n$stat_real\n");
+"$error_msg '$bad_extension'\n" . 
"$here\n$stat_real\n");
}
}
 
-- 
2.14.1



[RFC][PATCH v2 6/7] symbol lookup: use new kernel and module dereference functions

2017-09-20 Thread Sergey Senozhatsky
Call appropriate function descriptor dereference ARCH callbacks:
- dereference_kernel_function_descriptor() if the pointer is a
  kernel symbol;

- dereference_module_function_descriptor() if the pointer is a
  module symbol.

This patch also removes dereference_function_descriptor() from
'%pF/%pf' vsprintf handler, because it has the same behavior with
'%pS/%ps' now.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 Documentation/printk-formats.txt | 15 +--
 kernel/kallsyms.c|  1 +
 kernel/module.c  |  1 +
 lib/vsprintf.c   |  5 +
 4 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 361789df51ec..b2afafc84638 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -50,26 +50,23 @@ Symbols/Function Pointers
 
 ::
 
+   %pS versatile_init+0x0/0x110
+   %ps versatile_init
%pF versatile_init+0x0/0x110
%pf versatile_init
-   %pS versatile_init+0x0/0x110
%pSRversatile_init+0x9/0x110
(with __builtin_extract_return_addr() translation)
-   %ps versatile_init
%pB prev_fn_of_versatile_init+0x88/0x88
 
-The ``F`` and ``f`` specifiers are for printing function pointers,
-for example, f->func,  They have the same result as
-``S`` and ``s`` specifiers. But they do an extra conversion on
-ia64, ppc64 and parisc64 architectures where the function pointers
-are actually function descriptors.
-
 The ``S`` and ``s`` specifiers can be used for printing symbols
 from direct addresses, for example, __builtin_return_address(0),
 (void *)regs->ip. They result in the symbol name with (``S``) or
 without (``s``) offsets. If KALLSYMS are disabled then the symbol
 address is printed instead.
 
+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated.
+
 The ``B`` specifier results in the symbol name with offsets and should be
 used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
@@ -77,8 +74,6 @@ when tail-call``s are used and marked with the noreturn GCC 
attribute.
 
 Examples::
 
-   printk("Going to call: %pF\n", gettimeofday);
-   printk("Going to call: %pF\n", p->func);
printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
printk("%s: called from %pS\n", __func__,
(void *)__builtin_return_address(0));
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..e2fc09ea9509 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -322,6 +322,7 @@ const char *kallsyms_lookup(unsigned long addr,
if (is_ksym_addr(addr)) {
unsigned long pos;
 
+   addr = dereference_kernel_function_descriptor(addr);
pos = get_symbol_pos(addr, symbolsize, offset);
/* Grab name */
kallsyms_expand_symbol(get_symbol_offset(pos),
diff --git a/kernel/module.c b/kernel/module.c
index b792e814150a..63361de377ad 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3948,6 +3948,7 @@ const char *module_address_lookup(unsigned long addr,
preempt_disable();
mod = __module_address(addr);
if (mod) {
+   addr = dereference_module_function_descriptor(mod, addr);
if (modname)
*modname = mod->name;
ret = get_ksymbol(mod, addr, size, offset);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index bcd906a39010..bf04b4f5d8e7 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -40,7 +40,6 @@
 #include "../mm/internal.h"/* For the trace_print_flags arrays */
 
 #include   /* for PAGE_SIZE */
-#include   /* for dereference_function_descriptor() */
 #include  /* cpu_to_le16 */
 
 #include 
@@ -1721,10 +1720,8 @@ char *pointer(const char *fmt, char *buf, char *end, 
void *ptr,
}
 
switch (*fmt) {
-   case 'F':
+   case 'F': /* %pF and %pf are kept for compatibility reasons only */
case 'f':
-   ptr = (void *)dereference_function_descriptor((unsigned 
long)ptr);
-   /* Fallthrough */
case 'S':
case 's':
case 'B':
-- 
2.14.1



[RFC][PATCH v2 5/7] parisc64: Add .opd based function descriptor dereference

2017-09-20 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for parisc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
Signed-off-by: Helge Deller <del...@gmx.de>
---
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  2 ++
 arch/parisc/kernel/module.c   | 17 +
 arch/parisc/kernel/process.c  |  9 +
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 5 files changed, 32 insertions(+)

diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S 
b/arch/parisc/boot/compressed/vmlinux.lds.S
index a4ce3314e78e..4ebd4e65524c 100644
--- a/arch/parisc/boot/compressed/vmlinux.lds.S
+++ b/arch/parisc/boot/compressed/vmlinux.lds.S
@@ -29,7 +29,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index 59fbe0067112..845ddc9a3421 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -7,6 +7,8 @@
 #ifdef CONFIG_64BIT
 #undef dereference_function_descriptor
 unsigned long dereference_function_descriptor(unsigned long);
+#undef dereference_kernel_function_descriptor
+unsigned long dereference_kernel_function_descriptor(unsigned long);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index f1a76935a314..28f89b3dcc11 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -66,6 +66,7 @@
 
 #include 
 #include 
+#include 
 
 #if 0
 #define DEBUGP printk
@@ -954,3 +955,19 @@ void module_arch_cleanup(struct module *mod)
 {
deregister_unwind_table(mod);
 }
+
+#ifdef CONFIG_64BIT
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   unsigned long start_opd = (Elf64_Addr)mod->core_layout.base +
+  mod->arch.fdesc_offset;
+   unsigned long end_opd = start_opd +
+   mod->arch.fdesc_count * sizeof(Elf64_Fdesc);
+
+   if (addr < start_opd || addr >= end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index f00a5f93492a..ff13726b2d2d 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -276,6 +276,15 @@ unsigned long dereference_function_descriptor(unsigned 
long ptr)
ptr = (unsigned long)p;
return ptr;
 }
+
+unsigned long dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
 #endif
 
 static inline unsigned long brk_rnd(void)
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index ffe2cbf52d1a..ab030895dd1e 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -99,7 +99,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
-- 
2.14.1



[RFC][PATCH v2 4/7] powerpc64: Add .opd based function descriptor dereference

2017-09-20 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for powerpc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/powerpc/include/asm/module.h   |  3 +++
 arch/powerpc/include/asm/sections.h | 11 +++
 arch/powerpc/kernel/module_64.c | 16 
 arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
 4 files changed, 32 insertions(+)

diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 6c0132c7212f..7e28442827f1 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -45,6 +45,9 @@ struct mod_arch_specific {
unsigned long tramp;
 #endif
 
+   /* For module function descriptor dereference */
+   unsigned long start_opd;
+   unsigned long end_opd;
 #else /* powerpc64 */
/* Indices of PLT sections within module. */
unsigned int core_plt_section;
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 67379b8945e8..6b4ee0d1645f 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -75,6 +75,17 @@ static inline unsigned long 
dereference_function_descriptor(unsigned long ptr)
ptr = (unsigned long)p;
return ptr;
 }
+
+#undef dereference_kernel_function_descriptor
+static inline unsigned long
+dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
 #endif /* PPC64_ELF_ABI_v1 */
 
 #endif
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0b0f89685b67..94caec045a90 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -344,6 +344,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (!strcmp(secstrings + sechdrs[i].sh_name, ".opd")) {
+   me->arch.start_opd = sechdrs[i].sh_addr;
+   me->arch.end_opd = sechdrs[i].sh_addr +
+  sechdrs[i].sh_size;
+   }
 
/* We don't handle .init for the moment: rename to _init */
while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
@@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
return 0;
 }
 
+#ifdef PPC64_ELF_ABI_v1
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   if (addr < mod->arch.start_opd || addr >= mod->arch.end_opd)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif /* PPC64_ELF_ABI_v1 */
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 #ifdef CC_USING_MPROFILE_KERNEL
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 882628fa6987..70e10251e083 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -277,7 +277,9 @@ SECTIONS
}
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
. = ALIGN(256);
-- 
2.14.1



[RFC][PATCH v2 3/7] ia64: Add .opd based function descriptor dereference

2017-09-20 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for IA64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/ia64/include/asm/sections.h | 10 +-
 arch/ia64/kernel/module.c| 13 +
 arch/ia64/kernel/vmlinux.lds.S   |  2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index de6bfa1ef8fb..3ba7ce9d8bc8 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -37,6 +37,14 @@ static inline unsigned long 
dereference_function_descriptor(unsigned long ptr)
return ptr;
 }
 
+#undef dereference_kernel_function_descriptor
+static inline unsigned long
+dereference_kernel_function_descriptor(unsigned long addr)
+{
+   if (addr < (unsigned long)__start_opd ||
+   addr >= (unsigned long)__end_opd)
+   return addr;
+   return dereference_function_descriptor(addr);
+}
 
 #endif /* _ASM_IA64_SECTIONS_H */
-
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index d1d945c6bd05..0741ae6fa957 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -35,6 +35,7 @@
 
 #include 
 #include 
+#include 
 
 #define ARCH_MODULE_DEBUG 0
 
@@ -917,3 +918,15 @@ module_arch_cleanup (struct module *mod)
if (mod->arch.core_unw_table)
unw_remove_unwind_table(mod->arch.core_unw_table);
 }
+
+unsigned long
+dereference_module_function_descriptor(struct module *mod, unsigned long addr)
+{
+   Elf64_Shdr *opd = mod->arch.opd;
+
+   if (addr < opd->sh_addr ||
+   addr >= (opd->sh_addr + opd->sh_size))
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 798026dde52e..f872ba5ff82a 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -107,7 +107,9 @@ SECTIONS {
RODATA
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
/*
-- 
2.14.1



[RFC][PATCH v2 2/7] sections: split dereference_function_descriptor()

2017-09-20 Thread Sergey Senozhatsky
There are two format specifiers to print out a pointer in symbolic
format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two
mean exactly the same thing, but some architectures (ia64, ppc64,
parisc64) use an indirect pointer for C function pointers, where
the function pointer points to a function descriptor (which in
turn contains the actual pointer to the code). The '%pF/%pf, when
used appropriately, automatically does the appropriate function
descriptor dereference on such architectures.

The "when used appropriately" part is tricky. Basically this is
a subtle ABI detail, specific to some platforms, that made it to
the API level and people can be unaware of it and miss the whole
"we need to dereference the function" business out. [1] proves
that point (note that it fixes only '%pF' and '%pS', there might
be '%pf' and '%ps' cases as well).

It appears that we can handle everything within the affected
arches and make '%pS/%ps' smart enough to retire '%pF/%pf'.
Function descriptors live in .opd elf section and all affected
arches (ia64, ppc64, parisc64) handle it properly for kernel
and modules. So we, technically, can decide if the dereference
is needed by simply looking at the pointer: if it belongs to
.opd section then we need to dereference it.

The kernel and modules have their own .opd sections, obviously,
that's why we need to split dereference_function_descriptor()
and use separate kernel and module dereference arch callbacks.

This patch does the first step, it
a) adds dereference_kernel_function_descriptor() function.
b) adds a weak alias to dereference_module_function_descriptor()
   function.

So, for the time being, we will have:
1) dereference_function_descriptor()
   A generic function, that simply dereferences the pointer. There is
   bunch of places that call it: kgdbts, init/main.c, extable, etc.

2) dereference_kernel_function_descriptor()
   A function to call on kernel symbols that does kernel .opd section
   address range test.

3) dereference_module_function_descriptor()
   A function to call on modules' symbols that does modules' .opd
   section address range test.

[1] https://marc.info/?l=linux-kernel=150472969730573

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 include/asm-generic/sections.h | 8 ++--
 include/linux/moduleloader.h   | 4 
 kernel/module.c| 6 ++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..387f22c41e0d 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -29,6 +29,7 @@
  * __ctors_start, __ctors_end
  * __irqentry_text_start, __irqentry_text_end
  * __softirqentry_text_start, __softirqentry_text_end
+ * __start_opd, __end_opd
  */
 extern char _text[], _stext[], _etext[];
 extern char _data[], _sdata[], _edata[];
@@ -47,12 +48,15 @@ extern char __softirqentry_text_start[], 
__softirqentry_text_end[];
 /* Start and end of .ctors section - used for constructor calls. */
 extern char __ctors_start[], __ctors_end[];
 
+/* Start and end of .opd section - used for function descriptors. */
+extern char __start_opd[], __end_opd[];
+
 extern __visible const void __nosave_begin, __nosave_end;
 
-/* function descriptor handling (if any).  Override
- * in asm/sections.h */
+/* Function descriptor handling (if any).  Override in asm/sections.h */
 #ifndef dereference_function_descriptor
 #define dereference_function_descriptor(p) (p)
+#define dereference_kernel_function_descriptor(p) (p)
 #endif
 
 /* random extra sections (if any).  Override
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 4d0cb9bba93e..172904e9cded 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -85,6 +85,10 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+/* Dereference module function descriptor */
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index ea77ab13bead..b792e814150a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2121,6 +2121,12 @@ void __weak module_arch_freeing_init(struct module *mod)
 {
 }
 
+unsigned long __weak dereference_module_function_descriptor(struct module *mod,
+   unsigned long addr)
+{
+   return addr;
+}
+
 /* Free a module, remove from lists, etc. */
 static void free_module(struct module *mod)
 {
-- 
2.14.1



[RFC][PATCH v2 1/7] switch dereference_function_descriptor() to `unsigned long'

2017-09-20 Thread Sergey Senozhatsky
Convert dereference_function_descriptor() to accept and return
`unsigned long'. There will be two new ARCH function for kernel
and module function pointer dereference, which will work with
`unsigned long', so the patch unifies interfaces.

Besides, dereference_function_descriptor() mostly work with
`unsigned long':

drivers/misc/kgdbts.c:
addr = (unsigned long) dereference_function_descriptor((void *)addr);

init/main.c:
addr = (unsigned long) dereference_function_descriptor(fn);

kernel/extable.c:
addr = (unsigned long) dereference_function_descriptor(ptr);

kernel/module.c:
unsigned long a = (unsigned long)dereference_function_descriptor(addr);

Convert dereference_function_descriptor() users tree-wide.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/ia64/include/asm/sections.h| 6 +++---
 arch/parisc/include/asm/sections.h  | 2 +-
 arch/parisc/kernel/process.c| 6 +++---
 arch/parisc/mm/init.c   | 4 ++--
 arch/powerpc/include/asm/sections.h | 6 +++---
 drivers/misc/kgdbts.c   | 2 +-
 init/main.c | 2 +-
 kernel/extable.c| 2 +-
 kernel/module.c | 2 +-
 lib/vsprintf.c  | 2 +-
 10 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 2ab2003698ef..de6bfa1ef8fb 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -27,13 +27,13 @@ extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
 #undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
+static inline unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   struct fdesc *desc = ptr;
+   struct fdesc *desc = (struct fdesc *)ptr;
void *p;
 
if (!probe_kernel_address(>ip, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index 9d13c3507ad6..59fbe0067112 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -6,7 +6,7 @@
 
 #ifdef CONFIG_64BIT
 #undef dereference_function_descriptor
-void *dereference_function_descriptor(void *);
+unsigned long dereference_function_descriptor(unsigned long);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index a45a67d526f8..f00a5f93492a 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -267,13 +267,13 @@ get_wchan(struct task_struct *p)
 }
 
 #ifdef CONFIG_64BIT
-void *dereference_function_descriptor(void *ptr)
+unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   Elf64_Fdesc *desc = ptr;
+   Elf64_Fdesc *desc = (Elf64_Fdesc *)ptr;
void *p;
 
if (!probe_kernel_address(>addr, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 #endif
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 1ca9a2b4239f..06e1b79e2946 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -389,10 +389,10 @@ static void __init setup_bootmem(void)
 static int __init parisc_text_address(unsigned long vaddr)
 {
static unsigned long head_ptr __initdata;
+   unsigned long addr = (unsigned long)_kernel_start;
 
if (!head_ptr)
-   head_ptr = PAGE_MASK & (unsigned long)
-   dereference_function_descriptor(_kernel_start);
+   head_ptr = PAGE_MASK & dereference_function_descriptor(addr);
 
return core_kernel_text(vaddr) || vaddr == head_ptr;
 }
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 7902d6358854..67379b8945e8 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -66,13 +66,13 @@ static inline int overlaps_kvm_tmp(unsigned long start, 
unsigned long end)
 
 #ifdef PPC64_ELF_ABI_v1
 #undef dereference_function_descriptor
-static inline void *dereference_function_descriptor(void *ptr)
+static inline unsigned long dereference_function_descriptor(unsigned long ptr)
 {
-   struct ppc64_opd_entry *desc = ptr;
+   struct ppc64_opd_entry *desc = (struct ppc64_opd_entry *)ptr;
void *p;
 
if (!probe_kernel_address(>funcaddr, p))
-   ptr = p;
+   ptr = (unsigned long)p;
return ptr;
 }
 #endif /* PPC64_ELF_ABI_v1 */
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index fc7efedbc4be..6a5a159dfb75 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -225,7 +225,7 @@ static unsigned long lookup_addr(char *arg)
addr = (unsigned long)_do_fork;
else if (!strcmp(arg, "hw_break_val"))
addr = (unsigned long)_break_v

[RFC][PATCH v2 0/7] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-20 Thread Sergey Senozhatsky
Hello

RFC

On some arches C function pointers are indirect and point to
a function descriptor, which contains the actual pointer to the code.
This mostly doesn't matter, except for cases when people want to print
out function pointers in symbolic format, because the usual '%pS/%ps'
does not work on those arches as expected. That's the reason why we
have '%pF/%pf', but since it's here because of a subtle ABI detail
specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
'%pF/%pf' and '%pS/%ps' (see [1], for example).

This patch set attempts to move ia64/ppc64/parisc64 C function
pointer ABI details out of printk() to arch code. Function dereference
code now checks if a pointer belongs to a .opd ELF section and dereferences
that pointer only if it does. The kernel and modules have their own .opd
sections that's why I use two different ARCH functions: for kernel and
for module pointer dereference.

I planned to remove dereference_function_descriptor() entirely,
but then I discovered a bunch other uses cases (kgdbts, init/main.c,
extable, etc.), so I decided to keep dereference_function_descriptor()
around because the main point of this patch set is to deprecate %pF/%pf.
But at the same time, I think I can go further and handle both kernel
and module descriptor dereference in dereference_function_descriptor().
We need a module pointer for module .opd check, so that will come at an
extra cost of module lookup (may be there will some other issues along
the way, haven't checked it).

Right now we've got:

- dereference_function_descriptor(addr)
a generic (old) function. it simply attempts to dereference
whatever pointer we give it.

- dereference_kernel_function_descriptor(addr)
dereferences a kernel pointer if it's within the kernel's .opd
section.

- dereference_module_function_descriptor(module, addr)
dereference a module pointer if it's within the module's .opd
section.


*** A BIG NOTE ***
I don't own ia64/ppc64/parisc64 hardware, so the patches are not
tested. Sorry about that!

Another note:
I need to check what is BPF symbol lookup and do we need to
do any dereference there.

v2:
-- convert dereference_function_descriptor() to unsigned long
-- fix kernel descriptor range checks (Helge)
-- fix parisc module descriptor range check (Helge)
-- fix ppc64 module range check
-- add checkpatch patch


Sergey Senozhatsky (7):
  switch dereference_function_descriptor() to `unsigned long'
  sections: split dereference_function_descriptor()
  ia64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  parisc64: Add .opd based function descriptor dereference
  symbol lookup: use new kernel and module dereference functions
  checkpatch: add pF/pf deprecation warning

 Documentation/printk-formats.txt  | 15 +--
 arch/ia64/include/asm/sections.h  | 16 
 arch/ia64/kernel/module.c | 13 +
 arch/ia64/kernel/vmlinux.lds.S|  2 ++
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  4 +++-
 arch/parisc/kernel/module.c   | 17 +
 arch/parisc/kernel/process.c  | 15 ---
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 arch/parisc/mm/init.c |  4 ++--
 arch/powerpc/include/asm/module.h |  3 +++
 arch/powerpc/include/asm/sections.h   | 17 ++---
 arch/powerpc/kernel/module_64.c   | 16 
 arch/powerpc/kernel/vmlinux.lds.S |  2 ++
 drivers/misc/kgdbts.c |  2 +-
 include/asm-generic/sections.h|  8 ++--
 include/linux/moduleloader.h  |  4 
 init/main.c   |  2 +-
 kernel/extable.c  |  2 +-
 kernel/kallsyms.c |  1 +
 kernel/module.c   |  9 -
 lib/vsprintf.c|  5 +
 scripts/checkpatch.pl |  6 --
 23 files changed, 132 insertions(+), 35 deletions(-)

-- 
2.14.1



Re: [PATCH 3/5] powerpc64: Add .opd based function descriptor dereference

2017-09-20 Thread Sergey Senozhatsky
On (09/20/17 11:51), Michael Ellerman wrote:
[..]
> > unlike ppc_function_entry(), printk() can get called on any symbol,
> > not just function pointers.
> >
> > for example,
> >
> > cat /proc/kallsyms | grep shrinker_rwsem
> > 81a4b1e0 d shrinker_rwsem
> 
> Yep, good point. So your patch is probably good then. Maybe someone
> other than me can find time to test it ;)

Hi,

I'll re-spin the series today/tomorrow.

-ss


Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-19 Thread Sergey Senozhatsky
On (09/19/17 22:03), Helge Deller wrote:
[..]
> Your implementation of dereference_module_function_descriptor() in
> arch/parisc/kernel/module.c is faulty.
> mod->arch.fdesc_offset is relative to the base address of the module,
> so you need to add to mod->core_layout.base.

aha, got it. I should have figured that out.
many thanks!


> Here is the relevant patch to fix this issue (against mainline).
> Additionally I compare against mod->arch.fdesc_count instead of
> mod->arch.fdesc_max.

hmm, .fdesc_max looked relevant to me. it's count_fdescs() - the
number of R_PARISC_FPTR64 relocation entries.

but ok, will use .fdesc_count.


> Can you please fold it into your patch
> [PATCH 4/5] parisc64: Add .opd based function descriptor dereference
> for the next round?

sure, will fold. + SoB.

I think I'll try to re-spin the series today (or tomorrow, I'm slightly
overloaded with another stuff right now). I've received enough bug reports
no need to wait for another week ;)

-ss


Re: [PATCH 3/5] powerpc64: Add .opd based function descriptor dereference

2017-09-19 Thread Sergey Senozhatsky
On (09/19/17 20:22), Michael Ellerman wrote:
> > On 2017/09/16 12:53PM, Sergey Senozhatsky wrote:
> >> We are moving towards separate kernel and module function descriptor
> >> dereference callbacks. This patch enables it for powerpc64.
> >> 
> >> For pointers that belong to the kernel
> >> -  Added __start_opd and __end_opd pointers, to track the kernel
> >>.opd section address range;
> >> 
> >> -  Added dereference_kernel_function_descriptor(). Now we
> >>will dereference only function pointers that are within
> >>[__start_opd, __end_opd];
> >> 
> >> For pointers that belong to a module
> >> -  Added dereference_module_function_descriptor() to handle module
> >>function descriptor dereference. Now we will dereference only
> >>pointers that are within [module->opd.start, module->opd.end].
> >
> > Would it be simpler to just use kernel_text_address() and dereference 
> > everything else? See commit 83e840c770f2c5 ("powerpc64/elfv1: Only 
> > dereference function descriptor for non-text symbols") for a related 
> > patch.
> 
> Yeah that would be a lot simpler and probably work perfectly well.

unlike ppc_function_entry(), printk() can get called on any symbol,
not just function pointers.

for example,

cat /proc/kallsyms | grep shrinker_rwsem
81a4b1e0 d shrinker_rwsem

or

cat /proc/kallsyms | grep vm_total_pages
81dcd418 B vm_total_pages


and so on.

-ss


Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Sergey Senozhatsky
On (09/18/17 10:44), Luck, Tony wrote:
[..]
> 
> A few new warnings when building on ia64:
> 
> arch/ia64/kernel/module.c:931: warning: passing argument 1 of 
> 'dereference_function_descriptor' makes pointer from integer without a cast
> arch/ia64/kernel/module.c:931: warning: return makes integer from pointer 
> without a cast
> kernel/kallsyms.c:325: warning: assignment makes integer from pointer without 
> a cast
> kernel/kallsyms.c:325: warning: passing argument 1 of 
> 'dereference_kernel_function_descriptor' makes pointer from integer without a 
> cast

got it, will address in v2.

[..]
> Which looks like what you wanted. People unaware of the vagaries
> of ppc64/ia64/parisc64 can use the wrong %p[SF] variant, but still
> get the right output.

thanks!

-ss


Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Sergey Senozhatsky
On (09/18/17 20:39), Helge Deller wrote:
[..]
> > A few new warnings when building on ia64:
> > 
> > arch/ia64/kernel/module.c:931: warning: passing argument 1 of 
> > 'dereference_function_descriptor' makes pointer from integer without a cast
> > arch/ia64/kernel/module.c:931: warning: return makes integer from pointer 
> > without a cast
> > kernel/kallsyms.c:325: warning: assignment makes integer from pointer 
> > without a cast
> > kernel/kallsyms.c:325: warning: passing argument 1 of 
> > 'dereference_kernel_function_descriptor' makes pointer from integer without 
> > a cast
> 
> 
> I got similiar warnings on parisc.
> This patch on top of yours fixed those:
> 

Tony, Helge,

thanks for the reports!

I'll simply convert everything to `unsigned long'. including the
dereference_function_descriptor() function [I believe there are
still some casts happening when we pass addr from kernel/module
dereference functions to dereference_function_descriptor(), or
when we return `void *' back to symbol resolution code, etc.)
besides, it seems that everything that uses
dereference_function_descriptor() wants `unsigned long' anyway:

drivers/misc/kgdbts.c:  addr = (unsigned long) 
dereference_function_descriptor((void *)addr);
init/main.c:addr = (unsigned long) dereference_function_descriptor(fn);
kernel/extable.c:   addr = (unsigned long) 
dereference_function_descriptor(ptr);
kernel/module.c:unsigned long a = (unsigned 
long)dereference_function_descriptor(addr);

so I'll just switch it to ulong.


> I did tried your testcases too.
> 
> "echo 1 > /proc/sys/vm/drop_caches" gave correct output:
>  printk#1 schedule_timeout+0x0/0x4a8
>  printk#2 schedule_timeout+0x0/0x4a8
>  printk#3 proc_sys_call_handler+0x120/0x180
>  printk#4 proc_sys_call_handler+0x120/0x180
>  printk#5 proc_sys_call_handler+0x120/0x180
>  printk#6 proc_sys_call_handler+0x120/0x180
> 
> and here is "modprobe zram":
>  printk#7 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
>  printk#8 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
>  printk#9 do_one_initcall+0x194/0x290
>  printk#10 do_one_initcall+0x194/0x290
>  printk#11 do_one_initcall+0x194/0x290
>  printk#12 do_one_initcall+0x194/0x290
>  printk#13 zram_init+0x22c/0x2a0 [zram]
>  printk#14 zram_init+0x22c/0x2a0 [zram]
>  printk#15 zram_init+0x22c/0x2a0 [zram]
>  printk#16 zram_init+0x22c/0x2a0 [zram]
> 
> I wonder why printk#7 and printk#8 don't show "zram_init"...

interesting... what does the unpatched kernel show?


> Regarding your patches:
> 
> In arch/parisc/kernel/process.c:
> +void *dereference_kernel_function_descriptor(void *ptr)
> +{
> +   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)
> 
> This needs to be (__end_opd is outside):
> +   if (ptr < (void *)__start_opd || (void *)__end_opd <= ptr)
> 
> The same is true for the checks in the other arches.

um... yeah. __end_opd is definitely not a valid place for a descriptor!
I think I had `if (!(ptr >= __start_opd && ptr < __end_opd))' which I
wrongly converted. "shame, shame, shame".

thanks!


> I'd suggest to move the various
>   extern char __start_opd[], __end_opd[];
> out of arch//include/asm/sections.h and into 

ok, will take a look.

-ss


Re: [PATCH 3/5] powerpc64: Add .opd based function descriptor dereference

2017-09-16 Thread Sergey Senozhatsky
On (09/16/17 15:13), Naveen N. Rao wrote:
[..]
> Would it be simpler to just use kernel_text_address() and dereference 
> everything else? See commit 83e840c770f2c5 ("powerpc64/elfv1: Only 
> dereference function descriptor for non-text symbols") for a related 
> patch.

I had this idea, see

lkml.kernel.org/r/20170908172528.qc2vdtxzqh777...@intel.com


-ss


[PATCH 5/5] symbol lookup: use new kernel and module dereference functions

2017-09-15 Thread Sergey Senozhatsky
Call appropriate function descriptor dereference ARCH callbacks:
- dereference_kernel_function_descriptor() if the pointer is a
  kernel symbol;

- dereference_module_function_descriptor() if the pointer is a
  module symbol.

This patch also removes dereference_function_descriptor() from
'%pF/%pf' vsprintf handler, because it has the same behavior with
'%pS/%ps' now.

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 Documentation/printk-formats.txt | 15 +--
 kernel/kallsyms.c|  1 +
 kernel/module.c  |  1 +
 lib/vsprintf.c   |  5 +
 4 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 361789df51ec..b2afafc84638 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -50,26 +50,23 @@ Symbols/Function Pointers
 
 ::
 
+   %pS versatile_init+0x0/0x110
+   %ps versatile_init
%pF versatile_init+0x0/0x110
%pf versatile_init
-   %pS versatile_init+0x0/0x110
%pSRversatile_init+0x9/0x110
(with __builtin_extract_return_addr() translation)
-   %ps versatile_init
%pB prev_fn_of_versatile_init+0x88/0x88
 
-The ``F`` and ``f`` specifiers are for printing function pointers,
-for example, f->func,  They have the same result as
-``S`` and ``s`` specifiers. But they do an extra conversion on
-ia64, ppc64 and parisc64 architectures where the function pointers
-are actually function descriptors.
-
 The ``S`` and ``s`` specifiers can be used for printing symbols
 from direct addresses, for example, __builtin_return_address(0),
 (void *)regs->ip. They result in the symbol name with (``S``) or
 without (``s``) offsets. If KALLSYMS are disabled then the symbol
 address is printed instead.
 
+Note, that the ``F`` and ``f`` specifiers are identical to ``S`` (``s``)
+and thus deprecated.
+
 The ``B`` specifier results in the symbol name with offsets and should be
 used when printing stack backtraces. The specifier takes into
 consideration the effect of compiler optimisations which may occur
@@ -77,8 +74,6 @@ when tail-call``s are used and marked with the noreturn GCC 
attribute.
 
 Examples::
 
-   printk("Going to call: %pF\n", gettimeofday);
-   printk("Going to call: %pF\n", p->func);
printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
printk("%s: called from %pS\n", __func__,
(void *)__builtin_return_address(0));
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cfafa55..e2fc09ea9509 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -322,6 +322,7 @@ const char *kallsyms_lookup(unsigned long addr,
if (is_ksym_addr(addr)) {
unsigned long pos;
 
+   addr = dereference_kernel_function_descriptor(addr);
pos = get_symbol_pos(addr, symbolsize, offset);
/* Grab name */
kallsyms_expand_symbol(get_symbol_offset(pos),
diff --git a/kernel/module.c b/kernel/module.c
index 87cdb46863cd..4f591f2bbf5a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3948,6 +3948,7 @@ const char *module_address_lookup(unsigned long addr,
preempt_disable();
mod = __module_address(addr);
if (mod) {
+   addr = dereference_module_function_descriptor(mod, addr);
if (modname)
*modname = mod->name;
ret = get_ksymbol(mod, addr, size, offset);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 86c3385b9eb3..bf04b4f5d8e7 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -40,7 +40,6 @@
 #include "../mm/internal.h"/* For the trace_print_flags arrays */
 
 #include   /* for PAGE_SIZE */
-#include   /* for dereference_function_descriptor() */
 #include  /* cpu_to_le16 */
 
 #include 
@@ -1721,10 +1720,8 @@ char *pointer(const char *fmt, char *buf, char *end, 
void *ptr,
}
 
switch (*fmt) {
-   case 'F':
+   case 'F': /* %pF and %pf are kept for compatibility reasons only */
case 'f':
-   ptr = dereference_function_descriptor(ptr);
-   /* Fallthrough */
case 'S':
case 's':
case 'B':
-- 
2.14.1



[PATCH 4/5] parisc64: Add .opd based function descriptor dereference

2017-09-15 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for parisc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  3 +++
 arch/parisc/kernel/module.c   | 14 ++
 arch/parisc/kernel/process.c  | 10 ++
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 5 files changed, 31 insertions(+)

diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S 
b/arch/parisc/boot/compressed/vmlinux.lds.S
index a4ce3314e78e..4ebd4e65524c 100644
--- a/arch/parisc/boot/compressed/vmlinux.lds.S
+++ b/arch/parisc/boot/compressed/vmlinux.lds.S
@@ -29,7 +29,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
diff --git a/arch/parisc/include/asm/sections.h 
b/arch/parisc/include/asm/sections.h
index 9d13c3507ad6..e3cde650b2f9 100644
--- a/arch/parisc/include/asm/sections.h
+++ b/arch/parisc/include/asm/sections.h
@@ -6,7 +6,10 @@
 
 #ifdef CONFIG_64BIT
 #undef dereference_function_descriptor
+#undef dereference_kernel_function_descriptor
+
 void *dereference_function_descriptor(void *);
+void *dereference_kernel_function_descriptor(void *);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index f1a76935a314..bc2eae8634fd 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -954,3 +954,17 @@ void module_arch_cleanup(struct module *mod)
 {
deregister_unwind_table(mod);
 }
+
+#ifdef CONFIG_64BIT
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   void *opd_sz = mod->arch.fdesc_offset +
+  mod->arch.fdesc_max * sizeof(Elf64_Fdesc);
+
+   if (addr < mod->arch.fdesc_offset || opd_sz < addr)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 30f92391a93e..f30776bdaa79 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -267,6 +267,8 @@ get_wchan(struct task_struct *p)
 }
 
 #ifdef CONFIG_64BIT
+extern char __start_opd[], __end_opd[];
+
 void *dereference_function_descriptor(void *ptr)
 {
Elf64_Fdesc *desc = ptr;
@@ -276,6 +278,14 @@ void *dereference_function_descriptor(void *ptr)
ptr = p;
return ptr;
 }
+
+void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
 #endif
 
 static inline unsigned long brk_rnd(void)
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index ffe2cbf52d1a..ab030895dd1e 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -99,7 +99,9 @@ SECTIONS
. = ALIGN(16);
/* Linkage tables */
.opd : {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
} PROVIDE (__gp = .);
.plt : {
*(.plt)
-- 
2.14.1



[PATCH 3/5] powerpc64: Add .opd based function descriptor dereference

2017-09-15 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for powerpc64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/powerpc/include/asm/module.h   |  3 +++
 arch/powerpc/include/asm/sections.h | 13 +
 arch/powerpc/kernel/module_64.c | 16 
 arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index 6c0132c7212f..7e28442827f1 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -45,6 +45,9 @@ struct mod_arch_specific {
unsigned long tramp;
 #endif
 
+   /* For module function descriptor dereference */
+   unsigned long start_opd;
+   unsigned long end_opd;
 #else /* powerpc64 */
/* Indices of PLT sections within module. */
unsigned int core_plt_section;
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 7902d6358854..7cc4db86952b 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -16,6 +16,9 @@ extern char __end_interrupts[];
 extern char __prom_init_toc_start[];
 extern char __prom_init_toc_end[];
 
+extern char __start_opd[];
+extern char __end_opd[];
+
 static inline int in_kernel_text(unsigned long addr)
 {
if (addr >= (unsigned long)_stext && addr < (unsigned long)__init_end)
@@ -66,6 +69,8 @@ static inline int overlaps_kvm_tmp(unsigned long start, 
unsigned long end)
 
 #ifdef PPC64_ELF_ABI_v1
 #undef dereference_function_descriptor
+#undef dereference_kernel_function_descriptor
+
 static inline void *dereference_function_descriptor(void *ptr)
 {
struct ppc64_opd_entry *desc = ptr;
@@ -75,6 +80,14 @@ static inline void *dereference_function_descriptor(void 
*ptr)
ptr = p;
return ptr;
 }
+
+static inline void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)
+   return ptr;
+
+   return dereference_function_descriptor(ptr);
+}
 #endif /* PPC64_ELF_ABI_v1 */
 
 #endif
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 0b0f89685b67..52aa5d668364 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -344,6 +344,11 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (strcmp(secstrings + sechdrs[i].sh_name, ".opd")==0) {
+   me->arch.start_opd = sechdrs[i].sh_offset;
+   me->arch.end_opd = me->arch.start_opd +
+   sechdrs[i].sh_size;
+   }
 
/* We don't handle .init for the moment: rename to _init */
while ((p = strstr(secstrings + sechdrs[i].sh_name, ".init")))
@@ -712,6 +717,17 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
return 0;
 }
 
+#ifdef PPC64_ELF_ABI_v1
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   if (addr < mod->arch.start_opd || mod->arch.end_opd < addr)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
+#endif /* PPC64_ELF_ABI_v1 */
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 #ifdef CC_USING_MPROFILE_KERNEL
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 882628fa6987..70e10251e083 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -277,7 +277,9 @@ SECTIONS
}
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
. = ALIGN(256);
-- 
2.14.1



[PATCH 2/5] ia64: Add .opd based function descriptor dereference

2017-09-15 Thread Sergey Senozhatsky
We are moving towards separate kernel and module function descriptor
dereference callbacks. This patch enables it for IA64.

For pointers that belong to the kernel
-  Added __start_opd and __end_opd pointers, to track the kernel
   .opd section address range;

-  Added dereference_kernel_function_descriptor(). Now we
   will dereference only function pointers that are within
   [__start_opd, __end_opd];

For pointers that belong to a module
-  Added dereference_module_function_descriptor() to handle module
   function descriptor dereference. Now we will dereference only
   pointers that are within [module->opd.start, module->opd.end].

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 arch/ia64/include/asm/sections.h | 14 +-
 arch/ia64/kernel/module.c| 13 +
 arch/ia64/kernel/vmlinux.lds.S   |  2 ++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 2ab2003698ef..bff3f3535609 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -25,8 +25,11 @@ extern char __start_gate_fsyscall_patchlist[], 
__end_gate_fsyscall_patchlist[];
 extern char __start_gate_brl_fsys_bubble_down_patchlist[], 
__end_gate_brl_fsys_bubble_down_patchlist[];
 extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
+extern char __start_opd[], __end_opd[];
 
 #undef dereference_function_descriptor
+#undef dereference_kernel_function_descriptor
+
 static inline void *dereference_function_descriptor(void *ptr)
 {
struct fdesc *desc = ptr;
@@ -37,6 +40,15 @@ static inline void *dereference_function_descriptor(void 
*ptr)
return ptr;
 }
 
+static inline void *dereference_kernel_function_descriptor(void *ptr)
+{
+   /*
+* Check if the ptr is a function descriptor and thus needs to
+* be dereferenced.
+*/
+   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)
+   return ptr;
+   return dereference_function_descriptor(ptr);
+}
 
 #endif /* _ASM_IA64_SECTIONS_H */
-
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index d1d945c6bd05..d42f1e19d75d 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -35,6 +35,7 @@
 
 #include 
 #include 
+#include 
 
 #define ARCH_MODULE_DEBUG 0
 
@@ -917,3 +918,15 @@ module_arch_cleanup (struct module *mod)
if (mod->arch.core_unw_table)
unw_remove_unwind_table(mod->arch.core_unw_table);
 }
+
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr)
+{
+   Elf64_Shdr *opd = mod->arch.opd;
+
+   if (addr < opd->sh_addr ||
+   (opd->sh_addr + opd->sh_size) < addr)
+   return addr;
+
+   return dereference_function_descriptor(addr);
+}
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index 798026dde52e..f872ba5ff82a 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -107,7 +107,9 @@ SECTIONS {
RODATA
 
.opd : AT(ADDR(.opd) - LOAD_OFFSET) {
+   __start_opd = .;
*(.opd)
+   __end_opd = .;
}
 
/*
-- 
2.14.1



[PATCH 1/5] sections: split dereference_function_descriptor()

2017-09-15 Thread Sergey Senozhatsky
There are two format specifiers to print out a pointer in symbolic
format: '%pS/%ps' and '%pF/%pf'. On most architectures, the two
mean exactly the same thing, but some architectures (ia64, ppc64,
parisc64) use an indirect pointer for C function pointers, where
the function pointer points to a function descriptor (which in
turn contains the actual pointer to the code). The '%pF/%pf, when
used appropriately, automatically does the appropriate function
descriptor dereference on such architectures.

The "when used appropriately" part is tricky. Basically this is
a subtle ABI detail, specific to some platforms, that made it to
the API level and people can be unaware of it and miss the whole
"we need to dereference the function" business out. [1] proves
that point (note that it fixes only '%pF' and '%pS', there might
be '%pf' and '%ps' cases as well).

It appears that we can handle everything within the affected
arches and make '%pS/%ps' smart enough to retire '%pF/%pf'.
Function descriptors live in .opd elf section and all affected
arches (ia64, ppc64, parisc64) handle it properly for kernel
and modules. So we, technically, can decide if the dereference
is needed by simply looking at the pointer: if it belongs to
.opd section then we need to dereference it.

The kernel and modules have their own .opd sections, obviously,
that's why we need to split dereference_function_descriptor()
and use separate kernel and module dereference arch callbacks.

This patch does the first step, it
a) adds dereference_kernel_function_descriptor() function.
b) adds a weak alias to dereference_module_function_descriptor()
   function.

So, for the time being, we will have:
1) dereference_function_descriptor()
   A generic function, that simply dereferences the pointer. There is
   bunch of places that call it: kgdbts, init/main.c, extable, etc.

2) dereference_kernel_function_descriptor()
   A function to call on kernel symbols that does kernel .opd section
   address range test.

3) dereference_module_function_descriptor()
   A function to call on modules' symbols that does modules' .opd
   section address range test.

[1] https://marc.info/?l=linux-kernel=150472969730573

Signed-off-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>
---
 include/asm-generic/sections.h | 4 ++--
 include/linux/moduleloader.h   | 4 
 kernel/module.c| 6 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index e5da44eddd2f..21d2165e531a 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -49,10 +49,10 @@ extern char __ctors_start[], __ctors_end[];
 
 extern __visible const void __nosave_begin, __nosave_end;
 
-/* function descriptor handling (if any).  Override
- * in asm/sections.h */
+/* Function descriptor handling (if any).  Override in asm/sections.h */
 #ifndef dereference_function_descriptor
 #define dereference_function_descriptor(p) (p)
+#define dereference_kernel_function_descriptor(p) (p)
 #endif
 
 /* random extra sections (if any).  Override
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 4d0cb9bba93e..172904e9cded 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -85,6 +85,10 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
+/* Dereference module function descriptor */
+unsigned long dereference_module_function_descriptor(struct module *mod,
+unsigned long addr);
+
 #ifdef CONFIG_KASAN
 #include 
 #define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
diff --git a/kernel/module.c b/kernel/module.c
index de66ec825992..87cdb46863cd 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2121,6 +2121,12 @@ void __weak module_arch_freeing_init(struct module *mod)
 {
 }
 
+unsigned long __weak dereference_module_function_descriptor(struct module *mod,
+   unsigned long addr)
+{
+   return addr;
+}
+
 /* Free a module, remove from lists, etc. */
 static void free_module(struct module *mod)
 {
-- 
2.14.1



[PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-15 Thread Sergey Senozhatsky
Hello

RFC

On some arches C function pointers are indirect and point to
a function descriptor, which contains the actual pointer to the code.
This mostly doesn't matter, except for cases when people want to print
out function pointers in symbolic format, because the usual '%pS/%ps'
does not work on those arches as expected. That's the reason why we
have '%pF/%pf', but since it's here because of a subtle ABI detail
specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
'%pF/%pf' and '%pS/%ps' (see [1], for example).

This patch set attempts to move ia64/ppc64/parisc64 C function
pointer ABI details out of printk() to arch code. Function dereference
code now checks if a pointer belongs to a .opd ELF section and dereferences
that pointer only if it does. The kernel and modules have their own .opd
sections that's why I use two different ARCH functions: for kernel and
for module pointer dereference.

I planned to remove dereference_function_descriptor() entirely,
but then I discovered a bunch other uses cases (kgdbts, init/main.c,
extable, etc.), so I decided to keep dereference_function_descriptor()
around because the main point of this patch set is to deprecate %pF/%pf.
But at the same time, I think I can go further and handle both kernel
and module descriptor dereference in dereference_function_descriptor().
We need a module pointer for module .opd check, so that will come at an
extra cost of module lookup (may be there will some other issues along
the way, haven't checked it).

Right now we've got:

- dereference_function_descriptor(addr)
a generic (old) function. it simply attempts to dereference
whatever pointer we give it.

- dereference_kernel_function_descriptor(addr)
dereferences a kernel pointer if it's within the kernel's .opd
section.

- dereference_module_function_descriptor(module, addr)
dereference a module pointer if it's within the module's .opd
section.


*** A BIG NOTE ***
I don't own ia64/ppc64/parisc64 hardware, so the patches are not
tested. Sorry about that!


Another note:
I need to check what is BPF symbol lookup and do we need to
do any dereference there.


[1] https://marc.info/?l=linux-kernel=150472969730573

Sergey Senozhatsky (5):
  sections: split dereference_function_descriptor()
  ia64: Add .opd based function descriptor dereference
  powerpc64: Add .opd based function descriptor dereference
  parisc64: Add .opd based function descriptor dereference
  symbol lookup: use new kernel and module dereference functions

 Documentation/printk-formats.txt  | 15 +--
 arch/ia64/include/asm/sections.h  | 14 +-
 arch/ia64/kernel/module.c | 13 +
 arch/ia64/kernel/vmlinux.lds.S|  2 ++
 arch/parisc/boot/compressed/vmlinux.lds.S |  2 ++
 arch/parisc/include/asm/sections.h|  3 +++
 arch/parisc/kernel/module.c   | 14 ++
 arch/parisc/kernel/process.c  | 10 ++
 arch/parisc/kernel/vmlinux.lds.S  |  2 ++
 arch/powerpc/include/asm/module.h |  3 +++
 arch/powerpc/include/asm/sections.h   | 13 +
 arch/powerpc/kernel/module_64.c   | 16 
 arch/powerpc/kernel/vmlinux.lds.S |  2 ++
 include/asm-generic/sections.h|  4 ++--
 include/linux/moduleloader.h  |  4 
 kernel/kallsyms.c |  1 +
 kernel/module.c   |  7 +++
 lib/vsprintf.c|  5 +
 18 files changed, 113 insertions(+), 17 deletions(-)

-- 
2.14.1



Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-14 Thread Sergey Senozhatsky
On (09/14/17 11:15), Laurent Dufour wrote:
> On 14/09/2017 11:11, Sergey Senozhatsky wrote:
> > On (09/14/17 10:58), Laurent Dufour wrote:
> > [..]
> >> That's right, but here this is the  sequence counter mm->mm_seq, not the
> >> vm_seq one.
> > 
> > d'oh... you are right.
> 
> So I'm doubting about the probability of a deadlock here, but I don't like
> to see lockdep complaining. Is there an easy way to make it happy ?


 /*
  * well... answering your question - it seems raw versions of seqcount
  * functions don't call lockdep's lock_acquire/lock_release...
  *
  * but I have never told you that. never.
  */


lockdep, perhaps, can be wrong sometimes, and may be it's one of those
cases. may be not... I'm not a MM guy myself.

below is a lockdep splat I got yesterday. that's v3 of SPF patch set.


[ 2763.365898] ==
[ 2763.365899] WARNING: possible circular locking dependency detected
[ 2763.365902] 4.13.0-next-20170913-dbg-00039-ge3c06ea4b028-dirty #1837 Not 
tainted
[ 2763.365903] --
[ 2763.365905] khugepaged/42 is trying to acquire lock:
[ 2763.365906]  (>i_mmap_rwsem){}, at: [] 
rmap_walk_file+0x5a/0x142
[ 2763.365913] 
   but task is already holding lock:
[ 2763.365915]  (fs_reclaim){+.+.}, at: [] 
fs_reclaim_acquire+0x12/0x35
[ 2763.365920] 
   which lock already depends on the new lock.

[ 2763.365922] 
   the existing dependency chain (in reverse order) is:
[ 2763.365924] 
   -> #3 (fs_reclaim){+.+.}:
[ 2763.365930]lock_acquire+0x176/0x19e
[ 2763.365932]fs_reclaim_acquire+0x32/0x35
[ 2763.365934]__alloc_pages_nodemask+0x6d/0x1f9
[ 2763.365937]pte_alloc_one+0x17/0x62
[ 2763.365940]__pte_alloc+0x1f/0x83
[ 2763.365943]move_page_tables+0x2c3/0x5a2
[ 2763.365944]move_vma.isra.25+0xff/0x29f
[ 2763.365946]SyS_mremap+0x41b/0x49e
[ 2763.365949]entry_SYSCALL_64_fastpath+0x18/0xad
[ 2763.365951] 
   -> #2 (>vm_sequence/1){+.+.}:
[ 2763.365955]lock_acquire+0x176/0x19e
[ 2763.365958]write_seqcount_begin_nested+0x1b/0x1d
[ 2763.365959]__vma_adjust+0x1c4/0x5f1
[ 2763.365961]__split_vma+0x12c/0x181
[ 2763.365963]do_munmap+0x128/0x2af
[ 2763.365965]vm_munmap+0x5a/0x73
[ 2763.365968]elf_map+0xb1/0xce
[ 2763.365970]load_elf_binary+0x91e/0x137a
[ 2763.365973]search_binary_handler+0x70/0x1f3
[ 2763.365974]do_execveat_common+0x45e/0x68e
[ 2763.365978]call_usermodehelper_exec_async+0xf7/0x11f
[ 2763.365980]ret_from_fork+0x27/0x40
[ 2763.365981] 
   -> #1 (>vm_sequence){+.+.}:
[ 2763.365985]lock_acquire+0x176/0x19e
[ 2763.365987]write_seqcount_begin_nested+0x1b/0x1d
[ 2763.365989]__vma_adjust+0x1a9/0x5f1
[ 2763.365991]__split_vma+0x12c/0x181
[ 2763.365993]do_munmap+0x128/0x2af
[ 2763.365994]vm_munmap+0x5a/0x73
[ 2763.365996]elf_map+0xb1/0xce
[ 2763.365998]load_elf_binary+0x91e/0x137a
[ 2763.365999]search_binary_handler+0x70/0x1f3
[ 2763.366001]do_execveat_common+0x45e/0x68e
[ 2763.366003]call_usermodehelper_exec_async+0xf7/0x11f
[ 2763.366005]ret_from_fork+0x27/0x40
[ 2763.366006] 
   -> #0 (>i_mmap_rwsem){}:
[ 2763.366010]__lock_acquire+0xa72/0xca0
[ 2763.366012]lock_acquire+0x176/0x19e
[ 2763.366015]down_read+0x3b/0x55
[ 2763.366017]rmap_walk_file+0x5a/0x142
[ 2763.366018]page_referenced+0xfc/0x134
[ 2763.366022]shrink_active_list+0x1ac/0x37d
[ 2763.366024]shrink_node_memcg.constprop.72+0x3ca/0x567
[ 2763.366026]shrink_node+0x3f/0x14c
[ 2763.366028]try_to_free_pages+0x288/0x47a
[ 2763.366030]__alloc_pages_slowpath+0x3a7/0xa49
[ 2763.366032]__alloc_pages_nodemask+0xf1/0x1f9
[ 2763.366035]khugepaged+0xc8/0x167c
[ 2763.366037]kthread+0x133/0x13b
[ 2763.366039]ret_from_fork+0x27/0x40
[ 2763.366040] 
   other info that might help us debug this:

[ 2763.366042] Chain exists of:
 >i_mmap_rwsem --> >vm_sequence/1 --> fs_reclaim

[ 2763.366048]  Possible unsafe locking scenario:

[ 2763.366049]CPU0CPU1
[ 2763.366050]
[ 2763.366051]   lock(fs_reclaim);
[ 2763.366054]lock(>vm_sequence/1);
[ 2763.366056]lock(fs_reclaim);
[ 2763.366058]   lock(>i_mmap_rwsem);
[ 2763.366061] 
*** DEADLOCK ***

[ 2763.366063] 1 lock held by khugepaged/42:
[ 2763.366064]  #0:  (fs_reclaim){+.+.}, at: [] 
fs_reclaim_acquire+0x12/0x35
[ 2763.366068] 
   stack backtrace:
[ 2763.366071] CPU: 2 PID: 42 Comm: khugepa

Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-14 Thread Sergey Senozhatsky
On (09/14/17 10:58), Laurent Dufour wrote:
[..]
> That's right, but here this is the  sequence counter mm->mm_seq, not the
> vm_seq one.

d'oh... you are right.

-ss


Re: [PATCH v3 04/20] mm: VMA sequence count

2017-09-14 Thread Sergey Senozhatsky
Hi,

On (09/14/17 09:55), Laurent Dufour wrote:
[..]
> > so if there are two CPUs, one doing write_seqcount() and the other one
> > doing read_seqcount() then what can happen is something like this
> > 
> > CPU0CPU1
> > 
> > fs_reclaim_acquire()
> > write_seqcount_begin()
> > fs_reclaim_acquire()read_seqcount_begin()
> > write_seqcount_end()
> > 
> > CPU0 can't write_seqcount_end() because of fs_reclaim_acquire() from
> > CPU1, CPU1 can't read_seqcount_begin() because CPU0 did 
> > write_seqcount_begin()
> > and now waits for fs_reclaim_acquire(). makes sense?
> 
> Yes, this makes sense.
> 
> But in the case of this series, there is no call to
> __read_seqcount_begin(), and the reader (the speculative page fault
> handler), is just checking for (vm_seq & 1) and if this is true, simply
> exit the speculative path without waiting.
> So there is no deadlock possibility.

probably lockdep just knows that those locks interleave at some
point.


by the way, I think there is one path that can spin

find_vma_srcu()
 read_seqbegin()
  read_seqcount_begin()
   raw_read_seqcount_begin()
__read_seqcount_begin()

-ss


  1   2   >