Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-08 Thread Mike Galbraith
On Fri, 2016-04-08 at 15:19 -0700, Doug Smythies wrote:

> Could you send me, or point me to, the program "pipe-test"?
> So far, I have only found one, but it is both old and not
> the same program you are running (based on print statements).

It's the same old pipe-test, just bent up a little to suit my usage.

-Mike


Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-08 Thread Mike Galbraith
On Fri, 2016-04-08 at 22:59 +0200, Rafael J. Wysocki wrote:
> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
> > On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
> > 
> > > Cute, I thought you used governor=performance for your runs?
> > 
> > I do, and those numbers are with it thus set.
> 
> Well, this is a trade-off.
> 
> 4.5 introduced a power regression here so this one goes back to the previous
> state of things.

That sounds somewhat reasonable.  Too bad I don't have a super duper
watt meter handy.. seeing that you really really are saving me money
would perhaps make me less fond of those prettier numbers.

-Mike


Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP

2016-04-08 Thread Mike Galbraith
On Fri, 2016-04-08 at 16:11 -0400, Tejun Heo wrote:

> > That's just plain broken... That is not how a proportional weight based
> > hierarchical controller works.
> 
> That's a strong statement.  When the hierarchy is composed of
> equivalent objects as in CPU, not distinguishing internal and leaf
> nodes would be a more natural way to organize; however...

You almost said it yourself, you want to make the natural organization
of cpu, cpuacct and cpuset controllers a prohibited organization. 
 There is no "however..." that can possibly justify that.  It's akin to
mandating: henceforth "horse" shall be spelled "cow", riders thereof
shall teach their "cow" the proper enunciation of "moo".  It's silly.

Like it or not, these controllers have thread encoded in their DNA,
it's an integral part of what they are, and how real users in the real
world use them.  No rationalization will change that cold hard fact.

Make an "Aunt Tilly" button for those incapable of comprehending the
complexities if you will, but please don't make cgroups so rigid and
idiot proof that only idiots (hi system thing) can use it.

-Mike

 


[GIT PULL] parisc fixes for 4.6-rc2

2016-04-08 Thread Helge Deller
Hi Linus,

please pull some important fixes for the parisc architecture from:

  git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git parisc-4.6-3

Since commit 0de7985 (parisc: Use generic extable search and sort routines)
module loading is broken on parisc, because the parisc module loader wasn't
prepared for the new R_PARISC_PCREL32 relocations.

In addition, due to that breakage, Mikulas Patocka noticed that handling
exceptions from modules probably never worked on parisc. It was just masked by
the fact that exceptions from modules don't happen during normal use.

This patch series fixes those issues and survives the tests of the
lib/test_user_copy kernel module test.  Some patches are tagged for stable.

Thanks,
Helge


Helge Deller (5):
  parisc: Handle R_PARISC_PCREL32 relocations in kernel modules
  parisc: Avoid function pointers for kernel exception routines
  parisc: Fix kernel crash with reversed copy_from_user()
  parisc: Unbreak handling exceptions from kernel modules
  parisc: Update comment regarding relative extable support

 arch/parisc/include/asm/uaccess.h | 11 +--
 arch/parisc/kernel/asm-offsets.c  |  1 +
 arch/parisc/kernel/module.c   |  8 
 arch/parisc/kernel/parisc_ksyms.c | 10 +-
 arch/parisc/kernel/traps.c|  3 +++
 arch/parisc/lib/fixup.S   |  6 ++
 arch/parisc/mm/fault.c|  1 +
 7 files changed, 29 insertions(+), 11 deletions(-)


Re: Kernel crash on startup - bisected to commit 3b24d854cb35

2016-04-08 Thread Eric Dumazet
On Fri, Apr 8, 2016 at 10:28 PM, Larry Finger  wrote:
> Following a recent pull of the wireless-drivers-next repo. my system got a
> kernel panic on startup at native_apic_msr_write+0x27. The problem was
> bisected to commit 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt
> under synflood"). I am confident of the bisection as a kernel built with the
> previous commit (3a5d1c0) boots OK.
>
> I have not yet captured the entire traceback for the crash, but I do have a
> crappy photo of the screen that I have attached. The RIP is at
> native_apic_msr_write+0x27. As this crash is likely configuration dependent,
> a copy of my .config is also attached. Note that IPv6 is turned off on my
> machine.
>
> Please let me know if any other info is needed.

Can you double check you have this fix ?

commit 8501786929de4616b10b8059ad97abd304a7dddf
Author: Eric Dumazet 
Date:   Wed Apr 6 22:07:34 2016 -0700

tcp/dccp: fix inet_reuseport_add_sock()

David Ahern reported panics in __inet_hash() caused by my recent commit.

The reason is inet_reuseport_add_sock() was still using
sk_nulls_for_each_rcu() instead of sk_for_each_rcu().
SO_REUSEPORT enabled listeners were causing an instant crash.

While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE
flag, as it is inherited from the parent at clone time.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt
under synflood")
Signed-off-by: Eric Dumazet 
Reported-by: David Ahern 
Tested-by: David Ahern 
Signed-off-by: David S. Miller 


Re: [PATCH v11 00/60] PCI: Resource allocation cleanup for v4.7

2016-04-08 Thread Yinghai Lu
On Thu, Apr 7, 2016 at 5:51 PM, Linus Torvalds
 wrote:
>
> So please, try to split this up sanely, and let's merge it in sane
> pieces. I see that you have that M7101 quirk removal randomly in the
> middle of this series, for example, and it doesn't seem to be the only
> such random patch. That's entirely independent of all the other
> patches in the series (and I thought I acked it already, but
> whatever).

OK, I will split them into smaller series.

Bjorn,

Can you review patches 1-16 and put them into pci-next first?

patch 1-11: parse MEM64 for sparc and other system with OF

patch 12-16: MMIO64 allocation enhancement
treat non-pref mmio64 if parent bridges are all pcie.
restore old pref allocation logic if hostbridge does not support mmio64.

Bjorn Helgaas (2):
  PCI: Fix iomem_is_exclusive() checking in pci_mmap_resource()
  alpha/PCI: Only check iomem_is_exclusive() for IORESOURCE_MEM, not
IORESOURCE_IO

Yinghai Lu (58):
  PCI: Add pci_find_bus_resource()
  sparc/PCI: Use correct offset for bus address to resource
  sparc/PCI: Reserve legacy mmio after PCI mmio
  sparc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  sparc/PCI: Keep resource idx order with bridge register number
  PCI: Kill wrong quirk about M7101
  powerpc/PCI: Keep resource idx order with bridge register number
  powerpc/PCI: Add IORESOURCE_MEM_64 for 64-bit resource in OF parsing
  OF/PCI: Add IORESOURCE_MEM_64 for 64-bit resource
  PCI: Check pref compatible bit for mem64 resource of PCIe device
  PCI: Only treat non-pref mmio64 as pref if all bridges have MEM_64
  PCI: Add has_mem64 for struct host_bridge
  PCI: Only treat non-pref mmio64 as pref if host bridge has mmio64
  PCI: Restore pref MMIO allocation logic for host bridge without mmio64

I will sort out the others later.

Thanks

Yinghai


Re: [PATCH 4/4] irqchip: bcm2836: Use a more generic memory barrier call

2016-04-08 Thread Stephen Warren

On 04/08/2016 12:20 PM, Eric Anholt wrote:
> Stephen Warren writes:
>
>> On 04/04/2016 09:44 PM, Eric Anholt wrote:
>>> dsb() requires an argument on arm64, so we needed to add "sy".
>>> Instead, take this opportunity to switch to the same smp_wmb() call
>>> that gic uses for its IPIs.  This is a less strong barrier than we
>>> were doing before (dmb(ishst) compared to dsb(sy)), but it seems to be
>>> the correct one.
>>
>> I assume all MMIO is part of the ish domain?
>>
>> If so, the series,
>> Acked-by: Stephen Warren
>
> I don't know if this barrier implies ordering all the way out to AXI on
> this HW, but I don't think that's a requirement of this function.

My understanding was that the barrier was explicitly to work around a
bug in the bus fabric of the SoC, and hence the barrier very much does
have to affect the transaction all the way out to AXI. Re-reading
BCM2835-ARM-Peripherals.pdf section 1.3 "Peripheral access precautions
for correct memory ordering" seems to confirm this.


[GIT] Networking

2016-04-08 Thread David Miller

1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
   from Haishuang Yan.

2) Fix multicast frame handling in mac80211 AP code, from Felix
   Fietkau.

3) mac80211 station hashtable insert errors not handled properly, fix
   from Johannes Berg.

4) Fix TX descriptor count limit handling in e1000, from Alexander Duyck.

5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.

6) Must assign rtnl_link_ops of the device before registering it,
   fix in ip6_tunnel from Thadeu Lima de Souza Cascardo.

7) Memory leak fix in tc action net exit, from WANG Cong.

8) Add missing AF_KCM entries to name tables, from Dexuan Cui.

9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
   Duyck.

10) Fix memory allocation alignment and congestion map corruption in
    RDS, from Shamir Rabinovitch.

11) Fix default qdisc regression in tuntap driver, from Jason Wang.

Please pull, thanks a lot!

The following changes since commit 05cf8077e54b20dddb756eaa26f3aeb5c38dd3cf:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-04-01 20:03:33 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 30d237a6c2e9be1bb816fe8e787b88fd7aad833b:

  Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 (2016-04-08 16:41:28 -0400)


Alexander Duyck (3):
  e1000: Do not overestimate descriptor counts in Tx pre-check
  e1000: Double Tx descriptors needed check for 82544
  GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU

Arik Nemtsov (3):
  mac80211: TDLS: always downgrade invalid chandefs
  mac80211: TDLS: change BW calculation for WIDER_BW peers
  mac80211: recalc min_def chanctx even when chandef is identical

Bastien Philbert (1):
  bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr

Ben Greear (1):
  mac80211: ensure no limits on station rhashtable

Bjorn Helgaas (1):
  Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"

Dave Jones (1):
  af_packet: tone down the Tx-ring unsupported spew.

David S. Miller (3):
  Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue
  Revert "bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr"
  Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/.../jberg/mac80211

Dexuan Cui (1):
  net: add the AF_KCM entries to family name tables

Emmanuel Grumbach (2):
  mac80211: don't send deferred frames outside the SP
  mac80211: close the SP when we enqueue frames during the SP

Felix Fietkau (1):
  mac80211: fix AP buffered multicast frames with queue control and txq

Giuseppe CAVALLARO (1):
  stmmac: fix adjust link call in case of a switch is attached

Haishuang Yan (2):
  ipv4: l2tp: fix a potential issue in l2tp_ip_recv
  ipv6: l2tp: fix a potential issue in l2tp_ip6_recv

Hariprasad Shenai (1):
  cxgb4: Add pci device id for chelsio t520-cr adapter

Ilan Peer (1):
  mac80211: Fix BW upgrade for TDLS peers

Jakub Sitnicki (1):
  ipv6: Count in extension headers in skb->network_header

Jason Wang (1):
  tuntap: restore default qdisc

Jeff Mahoney (1):
  mac80211: fix "warning: ‘target_metric’ may be used uninitialized"

Jesse Brandeburg (1):
  i40e: fix errant PCIe bandwidth message

Jiri Benc (1):
  MAINTAINERS: intel-wired-lan list is moderated

Johannes Berg (1):
  mac80211: properly deal with station hashtable insert errors

Jorgen Hansen (1):
  VSOCK: Detach QP check should filter out non matching QPs.

Luis de Bethencourt (2):
  mac80211: add doc for RX_FLAG_DUP_VALIDATED flag
  mac80211: remove description of dropped member

Marcelo Ricardo Leitner (2):
  sctp: flush if we can't fit another DATA chunk
  sctp: use list_* in sctp_list_dequeue

Naveen N. Rao (7):
  samples/bpf: Fix build breakage with map_perf_test_user.c
  samples/bpf: Use llc in PATH, rather than a hardcoded value
  samples/bpf: Enable powerpc support
  lib/test_bpf: Fix JMP_JSET tests
  lib/test_bpf: Add tests for unsigned BPF_JGT
  lib/test_bpf: Add test to check for result of 32-bit add that overflows
  lib/test_bpf: Add additional BPF_ADD tests

Roopa Prabhu (1):
  mpls: find_outdev: check for err ptr in addition to NULL check

Thadeu Lima de Souza Cascardo (1):
  ip6_tunnel: set rtnl_link_ops before calling register_netdevice

WANG Cong (1):
  net_sched: fix a memory leak in tc action

shamir rabinovitch (2):
  RDS: memory allocated must be align to 8
  RDS: fix congestion map corruption for PAGE_SIZE > 4k

stephen hemminger (1):
  bridge, netem: mark mailing lists as moderated

 MAINTAINERS|   6 +-
 drivers/net/ethernet/che

Re: [PATCH 2/3] oom, oom_reaper: Try to reap tasks which skip regular OOM killer path

2016-04-08 Thread Tetsuo Handa
Michal Hocko wrote:
> On Fri 08-04-16 20:19:28, Tetsuo Handa wrote:
> > I looked at next-20160408 but I again came to think that we should remove
> > these shortcuts (something like a patch shown bottom).
>
> feel free to send the patch with the full description. But I would
> really encourage you to check the history to learn why those have been
> added and describe why those concerns are not valid/important anymore.

I believe that past discussions and decisions about current code are too
optimistic because they did not take 'The "too small to fail" memory-
allocation rule' problem into account.

If you ignore me with "check the history to learn why those have been added
and describe why those concerns are not valid/important anymore", I can do
nothing. What are valid/important concerns that have higher priority than
keeping 'The "too small to fail" memory-allocation rule' problem and continue
telling a lie to end users? Please enumerate such concerns.

> Your way of throwing a large patch based on an extreme load which is
> basically DoSing the machine is not the ideal one.

You are not paying attention to real world's limitations I'm facing.
I have to waste my resource trying to identify and fix on behalf of
customers before they determine the kernel version to use for their
systems, for your way of thinking is that "We don't need to worry about
it because I have never received such report" while the reality of
customers is that "I'm not skillful enough to catch the problematic
behavior and make a reproducer" or "I have a little skill but I'm not
permitted to modify my systems for reporting the problematic behavior".
If you listen to me, I don't need to do such thing. It is very very sad.


Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

2016-04-08 Thread Xunlei Pang
On 2016/04/09 at 03:28, Steven Rostedt wrote:
> On Fri, 8 Apr 2016 15:15:42 -0400
> Steven Rostedt  wrote:
>
>> From what I understand, the slowfn() modifies the task pi_list (or
>> rbtree, as it is today). As this is an unlock, the task being woken
>> (the next one to grab the lock) is removed from the previous task's pi
>> list.
>>
>> In rt_mutex_adjust_prio(current) I see it simply grabs current's
>> pi_lock and calls __rt_mutex_adjust_prio(current). This calls
>> rt_mutex_getprio(current) which returns current's normal prio if it
>> doesn't have any pi waiters, or it looks at the top pi waiter on the
>> tasks list and returns that. Which wouldn't be the task on wake_q,
>> otherwise we wouldn't be deboosting in the first place.
>>
> OK, I now see that your previous patch is changing what I'm looking
> at :-) This is what happens when you go away and try to catch up on
> email and not read the emails by threads. I see the
> rt_mutex_adjust_prio() is being changed.
>
> I'll go back and look at your previous patch (as I looked at that while
> traveling and didn't think too hard about it).

Sorry for that, I should add more comments about it, will add more next version.

Regards,
Xunlei

>
> -- Steve



Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

2016-04-08 Thread Xunlei Pang
On 2016/04/09 at 02:59, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 02:50:55PM -0400, Steven Rostedt wrote:
>> On Fri, 8 Apr 2016 19:38:35 +0200
>> Peter Zijlstra  wrote:
>>
>>> On Fri, Apr 08, 2016 at 12:25:10PM -0400, Steven Rostedt wrote:
>>>
>>>> So the preempt_disable() is to allow us to set current back to its
>>>> normal priority first before waking up the other task because we don't
>>>> want two tasks at the same priority?
>>>> What's the point of swapping deboost and the wake up again?
>>> In the context of this patch, it ensures the new pi_task pointer points
>>> to something that exists -- this is a rather useful property for a
>>> pointer to have.
>> It's not clear to what would make the new pi_task pointer object no
>> longer exist from this patch. I take it that waking up the wake_q, will
>> cause something to change in the code of rt_mutex_adjust_prio(current).
>> If so, it should probably be stated in a comment, because nothing is
>> obvious here.
> Its pretty obvious that a running task can exit :-)
>
> But also, wake_q holds a task ref.
>
>>> It furthermore guarantees that it points to a blocked task, another
>>> useful property.
>> I would think that the slowfn() would have removed anything to do with
>> what's on the wake_q removed from current.
> It cannot.
>
>> What task on what pointer.
>> I'm only looking at this current patch, not anything to do with the
>> original patch of this thread. That is, just the swap of waking up
>> wake_q and calling rt_mutex_adjust_prio().
> This whole patch was in the context of the previous patch, as should be
> clear from the thread.
>
> In any case, I just realized we do not in fact provide this guarantee
> (of pointing to a blocked task) that needs a bit more work.

The current patch calls rt_mutex_adjust_prio() before wake_up_q() wakes the
wakee. At that moment the wakee has already been removed from current's
pi_waiters by rt_mutex_slowunlock()->mark_wakeup_next_waiter(), so
rt_mutex_adjust_prio() won't see this wakee. I think this should not be a
problem.

Regards,
Xunlei



[PATCH] nvme/host: Add missing blk_integrity tag_size + flags assignments

2016-04-08 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

While doing recent bring-up of nvme/host with target-core T10-PI,
I noticed /sys/block/nvme*/integrity/device_is_integrity_capable
was false, and /sys/block/nvme*/integrity/tag_size contained
a bogus value.

AFAICT outside of blk_integrity_compare() for DM + MD these
are informational values, but go ahead and add the missing
assignments for nvme/host to match what SCSI does within
sd_dif_config_host() for consistency's sake.

Cc: Keith Busch 
Cc: Jay Freyensee 
Cc: Martin K. Petersen 
Cc: Sagi Grimberg 
Cc: Christoph Hellwig 
Cc: Jens Axboe 
Signed-off-by: Nicholas Bellinger 
---
 drivers/nvme/host/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b8e22fe..cbd08f8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -674,10 +674,14 @@ static void nvme_init_integrity(struct nvme_ns *ns)
 	switch (ns->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
 		integrity.profile = &t10_pi_type3_crc;
+		integrity.tag_size = sizeof(u16) + sizeof(u32);
+		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
 		break;
 	case NVME_NS_DPS_PI_TYPE1:
 	case NVME_NS_DPS_PI_TYPE2:
 		integrity.profile = &t10_pi_type1_crc;
+		integrity.tag_size = sizeof(u16);
+		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
 		break;
 	default:
 		integrity.profile = NULL;
-- 
1.9.1
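
For readers following the tag_size values in the patch above, here is a minimal user-space sketch of how they relate to an 8-byte T10 protection information tuple. The struct, enum, and function names are hypothetical and exist only for illustration; they are not the kernel's definitions, and the sketch only mirrors the two sizes the patch assigns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of one T10 PI tuple: guard, application and reference tags. */
struct sketch_pi_tuple {
	uint16_t guard_tag;
	uint16_t app_tag;
	uint32_t ref_tag;
};

static_assert(sizeof(struct sketch_pi_tuple) == 8, "a PI tuple is 8 bytes");

/* Hypothetical stand-ins for the NVME_NS_DPS_PI_TYPE* values. */
enum sketch_pi_type {
	SKETCH_PI_NONE,
	SKETCH_PI_TYPE1,
	SKETCH_PI_TYPE2,
	SKETCH_PI_TYPE3,
};

/* Return the tag size the patch assigns for each PI type, 0 if no profile. */
static size_t sketch_tag_size(enum sketch_pi_type type)
{
	switch (type) {
	case SKETCH_PI_TYPE3:
		/* Same expression as the patch: sizeof(u16) + sizeof(u32). */
		return sizeof(uint16_t) + sizeof(uint32_t);
	case SKETCH_PI_TYPE1:
	case SKETCH_PI_TYPE2:
		/* Same expression as the patch: sizeof(u16). */
		return sizeof(uint16_t);
	default:
		return 0;
	}
}
```

With these assumptions, Type 3 reports 6 tag bytes and Types 1 and 2 report 2, which is what /sys/block/nvme*/integrity/tag_size would be expected to show once the assignments are in place.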



Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

2016-04-08 Thread Arnd Bergmann
On Friday 08 April 2016, Andrew Pinski wrote:
> On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski  wrote:
> > On Wed, 6 Apr 2016, Geert Uytterhoeven wrote:
> >> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov  
> >> wrote:
> >>>  v6:
> >>>  - time_t, __kernel_off_t and other types turned to be 32-bit
> >>>for compatibility reasons (after v5 discussion);
> >
> > Introducing a new arch today with y2038 problems is not a good idea.
> > Linus said so with appropriately pointy words in 2011.

This was before we made the decision to fix the y2038 problem for all
architectures.

> This is the third time we had this discussion on time_t for ILP32.  I
> had originally it as 32bit, then Catalin suggested I change it to
> 64bit and then Arnd (with his work for 2038 issue on 32bit arch) said
> ILP32 should match all other 32bit targets and the other 64bit time_t
> be fixed by the current work he was working on.  Now you are
> suggesting we change it again.
> Arnd can you please comment more on why we want 32bit time_t instead
> of the 64bit one?  I Know there was some POSIX (or was it C90)
> violation but I suspect there is an easy way to workaround this inside
> the kernel but the discussion to move over to 32bit time_t was already
> made by the time I started to look into that.

x32 still runs into new problems today, and will continue to have problems
with newly added drivers that pass time_t (or other __kernel_long_t) arguments
through ioctl.

To avoid having to audit every new driver for interfaces that behave
differently based on the __kernel_long_t definition, arm64 is not following
the same route as x86 here and instead uses the normal 32-bit ABI like
any other architecture. This means we use 32-bit time_t, aio_context_t,
size_t and clock_t and share the system call implementation with the
compat handling for arm (aarch32) mode.

Once we have the interfaces for 64-bit time_t in place in the kernel,
we will be able to rebuild glibc on all 32-bit architectures including
arm and arm64/ilp32 that way.

The POSIX and C99 incompatibility you mention is about struct timespec,
which uses 'long' as the type for the tv_nsec member. This is vaguely
related to the issue of 64-bit time_t, but is not the reason for
starting out with 32-bit time_t for the new ABI here.

[side note:
How to precisely handle tv_nsec on 32-bit architectures is still an open
issue that will have to be solved when we nail down the new system call
interfaces:
The issue specifically is what happens when the upper half of the
second 64-bit word in struct timespec argument passed into a system
call is nonzero: the normal 64-bit syscalls must return an error,
while the 32-bit user space expects the kernel to ignore the upper bits.
This means something between the application and the native system call
has to clear the bits, and this can either be done by copying the
data inside of glibc (as done on x32) or by adding an extra system
call entry point in the kernel.]

> >> We're already closer to the (future) y2038 than to the (past) introduction 
> >> of
> >> LP64...
> >>
> >> These unfixable legacy applications have been spreading through x32 to
> >> the shiny new arm64 server architecture (does ppc64el also have an ILP32 
> >> mode,
> >> or is it planned)? Lots of resources are spent on maintaining the status 
> >> quo,
> >> instead of on fixing the real problems.
> >
> > As an x32 (userland) porter, I can tell you that time_t!=long _did_ cause
> > non-trivial amounts of work.  But that work is already done (at least in
> > Debian), so you might as well benefit from it.
> 
> There is actually private code out there which uses timespec and
> timeval to pass time over the wire; yes I know bad coding style and
> all but they did it that way.  This is code which was working for x86
> and we are porting it to ARM64; a data center code by the way; not
> some networking code even.  This means they have not ported the code
> to fully 64bit yet and they might never.

This code will run into the same problem on arm64/ilp32 when built against
a future libc implementation that defines time_t as 64-bit, but at least
the glibc maintainers so far plan to leave this as a per-application
option for the foreseeable future: even on a system that uses 64-bit time_t
in user space and kernel by default, you should be able to build an
application using a 32-bit time_t.

Arnd
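
The tv_nsec side note in the message above can be sketched in user-space C. This is only an illustration under stated assumptions: the struct and function names are hypothetical, the range check stands in for whatever a native 64-bit system call would do, and the shim shows the "clear the upper bits before the native call" option without saying where (glibc or a kernel entry point) it would live.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC INT64_C(1000000000)

/* Hypothetical layout handed to a system call that takes 64-bit time. */
struct sketch_timespec64 {
	int64_t tv_sec;   /* first 64-bit word */
	int64_t tv_nsec;  /* second 64-bit word; 32-bit user space only sets the lower half */
};

static_assert(sizeof(struct sketch_timespec64) == 16, "expected two 64-bit words");

/* Native 64-bit behaviour: an out-of-range tv_nsec is an error. */
static int sketch_native_check(const struct sketch_timespec64 *ts)
{
	if (ts->tv_nsec < 0 || ts->tv_nsec >= SKETCH_NSEC_PER_SEC)
		return -EINVAL;
	return 0;
}

/* Shim for 32-bit callers: drop the upper half of tv_nsec, then validate. */
static int sketch_ilp32_check(const struct sketch_timespec64 *user_ts)
{
	struct sketch_timespec64 ts = *user_ts;  /* work on a copy */

	ts.tv_nsec = (int64_t)((uint64_t)ts.tv_nsec & UINT64_C(0xffffffff));
	return sketch_native_check(&ts);
}
```

A value with garbage in the upper half of the second word is rejected by the native check but accepted by the shim, while a lower half that is itself out of range is rejected by both.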


Re: [PATCH v5 6/9] irqchip/gic-v3: Parse and export virtual GIC information

2016-04-08 Thread Shanker Donthineni
Hi Julien,

On 04/04/2016 06:37 AM, Julien Grall wrote:
> Fill up the recently introduced gic_kvm_info with the hardware
> information used for virtualization.
>
> Signed-off-by: Julien Grall 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
>
> ---
> Changes in v5:
> - Remove the alignment check for GICV. It's already done in the
> KVM code.
> - Fix initialization of KVM with ACPI.
>
> Changes in v4:
> - Change the flow to call gic_kvm_set_info only when all the
> mandatory information are valid.
> - Remove unnecessary code in ACPI parsing (the virtual control
> interface doesn't exist for GICv3).
> - Rework commit message
> - Rework the ACPI support as it didn't collect hardware info for
> virtualization when there is more than 1 redistributor region
>
> Changes in v3:
> - Add ACPI support
>
> Changes in v2:
> - Use 0 rather than a negative value to know when the maintenance
> IRQ
> is not present.
> - Use resource for vcpu and vctrl
> ---
>  drivers/irqchip/irq-gic-v3.c   | 110
> -
>  include/linux/irqchip/arm-gic-common.h |   1 +
>  2 files changed, 110 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 50e87e6..08afbfe 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -28,6 +28,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -56,6 +57,8 @@ struct gic_chip_data {
>  static struct gic_chip_data gic_data __read_mostly;
>  static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
>  
> +static struct gic_kvm_info gic_v3_kvm_info;
> +
>  #define gic_data_rdist()
> (this_cpu_ptr(gic_data.rdists.rdist))
>  #define gic_data_rdist_rd_base() (gic_data_rdist()->rd_base)
>  #define gic_data_rdist_sgi_base()(gic_data_rdist_rd_base() +
> SZ_64K)
> @@ -901,6 +904,30 @@ static int __init gic_validate_dist_version(void
> __iomem *dist_base)
>   return 0;
>  }
>  
> +static void __init gic_of_setup_kvm_info(struct device_node *node)
> +{
> + int ret;
> + struct resource r;
> + u32 gicv_idx;
> +
> + gic_v3_kvm_info.type = GIC_V3;
> +
> + gic_v3_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
> + if (!gic_v3_kvm_info.maint_irq)
> + return;
> +
> + if (of_property_read_u32(node, "#redistributor-regions",
> +  &gicv_idx))
> + gicv_idx = 1;
> +
> + gicv_idx += 3;  /* Also skip GICD, GICC, GICH */
> + ret = of_address_to_resource(node, gicv_idx, &r);
> + if (!ret)
> + gic_v3_kvm_info.vcpu = r;
> +
> + gic_set_kvm_info(&gic_v3_kvm_info);
> +}
> +
>  static int __init gic_of_init(struct device_node *node, struct
> device_node *parent)
>  {
>   void __iomem *dist_base;
> @@ -952,8 +979,10 @@ static int __init gic_of_init(struct device_node
> *node, struct device_node *pare
>  
>   err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
>redist_stride, &node->fwnode);
> - if (!err)
> + if (!err) {
> + gic_of_setup_kvm_info(node);
>   return 0;
> + }
>  
>  out_unmap_rdist:
>   for (i = 0; i < nr_redist_regions; i++)
> @@ -974,6 +1003,9 @@ static struct
>   struct redist_region *redist_regs;
>   u32 nr_redist_regions;
>   bool single_redist;
> + u32 maint_irq;
> + int maint_irq_mode;
> + phys_addr_t vcpu_base;
>  } acpi_data __initdata;
>  
>  static void __init
> @@ -1110,7 +1142,81 @@ static bool __init acpi_validate_gic_table(struct
> acpi_subtable_header *header,
>   return true;
>  }
>  
> +static int __init gic_acpi_parse_virt_madt_gicc(struct
> acpi_subtable_header *header,
> + const unsigned long end)
> +{
> + struct acpi_madt_generic_interrupt *gicc =
> + (struct acpi_madt_generic_interrupt *)header;
> + int maint_irq_mode;
> + static int first_madt = true;
> +
> + maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
> + ACPI_EDGE_SENSITIVE : ACPI_LEVEL_SENSITIVE;
> +
Do you think GICC parameters are valid for an unusable processor?
If not, we need a validation check here, something like this, to skip
the GICC subtable entry:

	if (!(gicc->flags & ACPI_MADT_ENABLED))
		return 0;

> + if (first_madt) {
> + first_madt = false;
> +
> + acpi_data.maint_irq = gicc->vgic_interrupt;
> + acpi_data.maint_irq_mode = maint_irq_mode;
> + acpi_data.vcpu_base = gicc->gicv_base_address;
> +
> + return 0;
> + }
> +
> + /*
> +  * The maintenance interrupt and GICV should be the same for every
> CPU
> +  */
> + if ((acpi_data.maint_irq != gicc->vgic_interrupt) ||
> + (acpi_data.maint_irq_m

Re: [PATCH] arm64: CONFIG_DEVPORT should not be used when PCI is being used

2016-04-08 Thread Arnd Bergmann
On Thursday 07 April 2016, Al Stone wrote:
> >>>  config DEVPORT
> >>>   bool
> >>> - depends on !M68K
> >>> + depends on !M68K && !ARM64
> >>
> >> Why not fix the real bug here, it's odd that only these two arches need
> >> this disabled, don't you agree?
> 
> Agreed.  It does seem odd.  I'm not sure I understand which bug you're 
> thinking
> is the real one, though -- that DEVPORT should be disabled in all places that
> don't have ISA or that arm64 needs to have /dev/port work properly?  Or 
> perhaps
> I missed something else entirely...

We've had a similar problem recently with ISA drivers crashing when no PCI
host registers itself for the first 0x1000 I/O port addresses.

I think both the request_resource() function and /dev/port should be
changed to interface with the dynamic registration of I/O port ranges
so they only ever allow access on ports that are mapped into virtual memory.

Arnd


Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code

2016-04-08 Thread Luis R. Rodriguez
On Thu, Mar 17, 2016 at 03:56:47PM -0600, Toshi Kani wrote:
> On Wed, 2016-03-16 at 00:29 +0100, Luis R. Rodriguez wrote:
> > On x86 Linux code we now have ioremap_uc() that can't use MTRR behind the
> > scenes, why would something like this on the BIOS not be possible? That
> > ultimately uses set_pte_at(). What limitations are there on the BIOS
> > that prevent us from just using strong UC for PAT on the BIOS?
> 
> Because it requires to run in virtual mode with page tables.

I see now. Specifically, BIOSes run in real mode, and PAT uses
paging. Paging requires bit 31 on CR0 set (PG), and PG has no
effect if the PE flag (Protection Enable) bit 0 on CR0 is clear.
If PE is clear we have real mode, which is what the BIOS uses.
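
As a small illustration of the two CR0 bits described above, here is a sketch that classifies a CR0 value. The enum and function names are made up for this example; only the bit positions (PE is bit 0, PG is bit 31) come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE (UINT32_C(1) << 0)	/* Protection Enable */
#define CR0_PG (UINT32_C(1) << 31)	/* Paging */

enum cpu_mode { MODE_REAL, MODE_PROTECTED, MODE_PROTECTED_PAGED };

/* Paging, and with it PAT, only applies when both PE and PG are set. */
static enum cpu_mode cr0_mode(uint32_t cr0)
{
	if (!(cr0 & CR0_PE))
		return MODE_REAL;	/* PG is not considered without PE */

	return (cr0 & CR0_PG) ? MODE_PROTECTED_PAGED : MODE_PROTECTED;
}

/* Usage example: a few CR0 values and the mode each one selects. */
static void cr0_mode_examples(void)
{
	assert(cr0_mode(0x00000010) == MODE_REAL);
	assert(cr0_mode(0x00000011) == MODE_PROTECTED);
	assert(cr0_mode(0x80000011) == MODE_PROTECTED_PAGED);
}
```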

Stupid question then:

Is there no use case for a BIOS to enter PE, even if just
limited to setting paging attributes, for instance? For the simple
sake of burying MTRR this seems worthwhile, but I wonder if there
are other paging needs a BIOS might find use for.

  Luis


Re: [PATCH 1/4] acpi,pci,irq: reduce resource requirements

2016-04-08 Thread Sinan Kaya
Hi Bjorn,

On 4/5/2016 7:21 PM, Bjorn Helgaas wrote:
>> I know you want to validate that all PCI interrupts are level, it looks like
>> the code is allowing other combinations.
> It's not that we need to validate that all PCI interrupts are level.  What
> we need is to make sure all users sharing an IRQ agree on the mode.
> 

I posted v2. Let me know if the implementation feels like what you are
looking for.
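
The requirement quoted above, that all users sharing an IRQ agree on the mode, can be sketched in plain C as follows. This is not the code from the patch: struct link_irq and irq_mode_agrees() are hypothetical, and the links are held in an array rather than a kernel list. An explicit 0xff sentinel is used because a uint8_t compared against ~0 (an int equal to -1) never matches after integer promotion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODE_UNSET 0xff	/* explicit 8-bit "not seen yet" marker */

/* Hypothetical stand-in for one link's IRQ description. */
struct link_irq {
	int irq;
	uint8_t polarity;
	uint8_t triggering;
};

/*
 * Return 0 if every link using @irq reports the same polarity and
 * triggering, and -1 if two users disagree or nobody uses @irq.
 */
static int irq_mode_agrees(const struct link_irq *links, size_t n, int irq,
			   uint8_t *polarity, uint8_t *triggering)
{
	size_t i;
	int found = 0;

	assert(polarity && triggering);

	*polarity = MODE_UNSET;
	*triggering = MODE_UNSET;

	for (i = 0; i < n; i++) {
		if (links[i].irq != irq)
			continue;

		if (!found) {
			/* First user defines the mode. */
			*polarity = links[i].polarity;
			*triggering = links[i].triggering;
			found = 1;
		} else if (*polarity != links[i].polarity ||
			   *triggering != links[i].triggering) {
			return -1;
		}
	}

	return found ? 0 : -1;
}
```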

Sinan

-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project


[PATCH V2 1/3] acpi,pci,irq: reduce resource requirements

2016-04-08 Thread Sinan Kaya
Code has been redesigned to calculate penalty requirements on the fly. This
significantly simplifies the implementation and removes some of the init
calls from x86 architecture. Command line penalty assignment has been
limited to ISA interrupts only.

Signed-off-by: Sinan Kaya 
---
 drivers/acpi/pci_link.c | 176 ++--
 1 file changed, 140 insertions(+), 36 deletions(-)

diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
index ededa90..25695ea 100644
--- a/drivers/acpi/pci_link.c
+++ b/drivers/acpi/pci_link.c
@@ -437,17 +437,15 @@ static int acpi_pci_link_set(struct acpi_pci_link *link, int irq)
  * enabled system.
  */
 
-#define ACPI_MAX_IRQS  256
 #define ACPI_MAX_ISA_IRQ   16
 
-#define PIRQ_PENALTY_PCI_AVAILABLE (0)
 #define PIRQ_PENALTY_PCI_POSSIBLE  (16*16)
 #define PIRQ_PENALTY_PCI_USING (16*16*16)
 #define PIRQ_PENALTY_ISA_TYPICAL   (16*16*16*16)
 #define PIRQ_PENALTY_ISA_USED  (16*16*16*16*16)
 #define PIRQ_PENALTY_ISA_ALWAYS    (16*16*16*16*16*16)
 
-static int acpi_irq_penalty[ACPI_MAX_IRQS] = {
+static int acpi_isa_irq_penalty[ACPI_MAX_ISA_IRQ] = {
PIRQ_PENALTY_ISA_ALWAYS,/* IRQ0 timer */
PIRQ_PENALTY_ISA_ALWAYS,/* IRQ1 keyboard */
PIRQ_PENALTY_ISA_ALWAYS,/* IRQ2 cascade */
@@ -457,9 +455,9 @@ static int acpi_irq_penalty[ACPI_MAX_IRQS] = {
PIRQ_PENALTY_ISA_TYPICAL,   /* IRQ6 */
PIRQ_PENALTY_ISA_TYPICAL,   /* IRQ7 parallel, spurious */
PIRQ_PENALTY_ISA_TYPICAL,   /* IRQ8 rtc, sometimes */
-   PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ9  PCI, often acpi */
-   PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ10 PCI */
-   PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ11 PCI */
+   0,  /* IRQ9  PCI, often acpi */
+   0,  /* IRQ10 PCI */
+   0,  /* IRQ11 PCI */
PIRQ_PENALTY_ISA_USED,  /* IRQ12 mouse */
PIRQ_PENALTY_ISA_USED,  /* IRQ13 fpe, sometimes */
PIRQ_PENALTY_ISA_USED,  /* IRQ14 ide0 */
@@ -467,6 +465,121 @@ static int acpi_irq_penalty[ACPI_MAX_IRQS] = {
/* >IRQ15 */
 };
 
+static int acpi_link_trigger(int irq, u8 *polarity, u8 *triggering)
+{
+   struct acpi_pci_link *link;
+   bool found = false;
+
+   *polarity = ~0;
+   *triggering = ~0;
+
+   list_for_each_entry(link, &acpi_link_list, list) {
+   int i;
+
+   if (link->irq.active && link->irq.active == irq) {
+   if (*polarity == ~0)
+   *polarity = link->irq.polarity;
+
+   if (*triggering == ~0)
+   *triggering = link->irq.triggering;
+
+   if (*polarity != link->irq.polarity)
+   return -EINVAL;
+
+   if (*triggering != link->irq.triggering)
+   return -EINVAL;
+
+   found = true;
+   }
+
+   for (i = 0; i < link->irq.possible_count; i++)
+   if (link->irq.possible[i] == irq) {
+   if (*polarity == ~0)
+   *polarity = link->irq.polarity;
+
+   if (*triggering == ~0)
+   *triggering = link->irq.triggering;
+
+   if (*polarity != link->irq.polarity)
+   return -EINVAL;
+
+   if (*triggering != link->irq.triggering)
+   return -EINVAL;
+
+   found = true;
+   }
+   }
+
+   return found ? 0 : -EINVAL;
+}
+
+static int acpi_pci_compatible_trigger(int irq)
+{
+   u8 polarity;
+   u8 triggering;
+
+   return acpi_link_trigger(irq, &polarity, &triggering);
+}
+
+static int acpi_irq_pci_sharing_penalty(int irq)
+{
+   struct acpi_pci_link *link;
+   int penalty = 0;
+
+   if (acpi_pci_compatible_trigger(irq))
+   return ~0;
+
+   list_for_each_entry(link, &acpi_link_list, list) {
+   /*
+* If a link is active, penalize its IRQ heavily
+* so we try to choose a different IRQ.
+*/
+   if (link->irq.active && link->irq.active == irq)
+   penalty += PIRQ_PENALTY_PCI_USING;
+   else {
+   int i;
+
+   /*
+* If a link is inactive, penalize the IRQs it
+* might use, but not as severely.
+*/
+   for (i = 0; i < link->irq.possible_count; i++)
+   if (link->irq.possible[i] == irq)
+   penalty +

[PATCH V2 3/3] acpi,pci,irq: remove SCI penalize function

2016-04-08 Thread Sinan Kaya
Removing the SCI penalize function as the penalty is now calculated on the
fly.

Signed-off-by: Sinan Kaya 
---
 arch/x86/kernel/acpi/boot.c | 1 -
 drivers/acpi/pci_link.c | 9 -
 include/linux/acpi.h| 1 -
 3 files changed, 11 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index e759076..5e99f22 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -445,7 +445,6 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger,
polarity = acpi_sci_flags & ACPI_MADT_POLARITY_MASK;
 
mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
-   acpi_penalize_sci_irq(bus_irq, trigger, polarity);
 
/*
 * stash over-ride to indicate we've been here
diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
index 63dae95..5b9a2d0 100644
--- a/drivers/acpi/pci_link.c
+++ b/drivers/acpi/pci_link.c
@@ -909,15 +909,6 @@ bool acpi_isa_irq_available(int irq)
 }
 
 /*
- * Penalize IRQ used by ACPI SCI. If ACPI SCI pin attributes conflict with
- * PCI IRQ attributes, mark ACPI SCI as ISA_ALWAYS so it won't be use for
- * PCI IRQs.
- */
-void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
-{
-}
-
-/*
  * Over-ride default table to reserve additional IRQs for use by ISA
  * e.g. acpi_irq_isa=5
  * Useful for telling ACPI how not to interfere with your ISA sound card.
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 06ed7e5..0f41317 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -311,7 +311,6 @@ struct pci_dev;
 int acpi_pci_irq_enable (struct pci_dev *dev);
 void acpi_penalize_isa_irq(int irq, int active);
 bool acpi_isa_irq_available(int irq);
-void acpi_penalize_sci_irq(int irq, int trigger, int polarity);
 void acpi_pci_irq_disable (struct pci_dev *dev);
 
 extern int ec_read(u8 addr, u8 *val);
-- 
1.8.2.1



[PATCH V2 2/3] acpi,pci,irq: remove redundant code in acpi_irq_penalty_init

2016-04-08 Thread Sinan Kaya
acpi_irq_get_penalty now calculates the penalty on the fly.
There is no need to maintain a global list of penalties or calculate
them at init time. Remove the duplicate code in acpi_irq_penalty_init.

Signed-off-by: Sinan Kaya 
---
 arch/x86/pci/acpi.c |  1 -
 drivers/acpi/pci_link.c | 36 
 include/acpi/acpi_drivers.h |  1 -
 3 files changed, 38 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 3cd6983..b2a4e2a 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -396,7 +396,6 @@ int __init pci_acpi_init(void)
return -ENODEV;
 
printk(KERN_INFO "PCI: Using ACPI for IRQ routing\n");
-   acpi_irq_penalty_init();
pcibios_enable_irq = acpi_pci_irq_enable;
pcibios_disable_irq = acpi_pci_irq_disable;
x86_init.pci.init_irq = x86_init_noop;
diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
index 25695ea..63dae95 100644
--- a/drivers/acpi/pci_link.c
+++ b/drivers/acpi/pci_link.c
@@ -580,42 +580,6 @@ static int acpi_irq_get_penalty(int irq)
return penalty;
 }
 
-int __init acpi_irq_penalty_init(void)
-{
-   struct acpi_pci_link *link;
-   int i;
-
-   /*
-* Update penalties to facilitate IRQ balancing.
-*/
-   list_for_each_entry(link, &acpi_link_list, list) {
-
-   /*
-* reflect the possible and active irqs in the penalty table --
-* useful for breaking ties.
-*/
-   if (link->irq.possible_count) {
-   int penalty =
-   PIRQ_PENALTY_PCI_POSSIBLE /
-   link->irq.possible_count;
-
-   for (i = 0; i < link->irq.possible_count; i++) {
-   if (link->irq.possible[i] < ACPI_MAX_ISA_IRQ)
-   acpi_isa_irq_penalty[link->irq.
-possible[i]] +=
-   penalty;
-   }
-
-   } else if (link->irq.active &&
-  (link->irq.active < ACPI_MAX_ISA_IRQ)) {
-   acpi_isa_irq_penalty[link->irq.active] +=
-   PIRQ_PENALTY_PCI_POSSIBLE;
-   }
-   }
-
-   return 0;
-}
-
 static int acpi_irq_balance = -1;  /* 0: static, 1: balance */
 
 static int acpi_pci_link_allocate(struct acpi_pci_link *link)
diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h
index 29c6912..797ae2e 100644
--- a/include/acpi/acpi_drivers.h
+++ b/include/acpi/acpi_drivers.h
@@ -78,7 +78,6 @@
 
 /* ACPI PCI Interrupt Link (pci_link.c) */
 
-int acpi_irq_penalty_init(void);
 int acpi_pci_link_allocate_irq(acpi_handle handle, int index, int *triggering,
   int *polarity, char **name);
 int acpi_pci_link_free_irq(acpi_handle handle);
-- 
1.8.2.1
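
To make "calculating the penalty on the fly" concrete, here is a minimal sketch in plain C. It is an illustration under stated assumptions, not the patch itself: struct link, irq_penalty() and the all-zero ISA baseline are hypothetical, and the per-candidate increment (PIRQ_PENALTY_PCI_POSSIBLE divided by possible_count) is borrowed from the removed acpi_irq_penalty_init() code because the new expression is cut off in the posted diff.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ISA_IRQ		16
#define PENALTY_PCI_POSSIBLE	(16 * 16)
#define PENALTY_PCI_USING	(16 * 16 * 16)
#define MAX_POSSIBLE		8

/* Hypothetical, simplified link: one active IRQ plus candidate IRQs. */
struct link {
	int active;			/* 0 means "no active IRQ" */
	int possible[MAX_POSSIBLE];
	int possible_count;
};

/* Static ISA baseline; left at zero in this sketch. */
static int isa_penalty[MAX_ISA_IRQ];

/* Walk the links each time instead of keeping a per-IRQ table. */
static int irq_penalty(const struct link *links, size_t n, int irq)
{
	int penalty = 0;
	size_t i;
	int j;

	assert(irq >= 0);

	if (irq < MAX_ISA_IRQ)
		penalty += isa_penalty[irq];

	for (i = 0; i < n; i++) {
		const struct link *l = &links[i];

		/* A link actively using this IRQ weighs heavily. */
		if (l->active && l->active == irq) {
			penalty += PENALTY_PCI_USING;
			continue;
		}

		/* Otherwise count it only as a candidate (assumed weight). */
		for (j = 0; j < l->possible_count; j++)
			if (l->possible[j] == irq)
				penalty += PENALTY_PCI_POSSIBLE /
					   l->possible_count;
	}

	return penalty;
}
```

The trade-off is a list walk per query in exchange for dropping the 256-entry table and the init-time pass that filled it.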



RE: [PATCH -v2] drivers: net: ethernet: intel: e1000e: fix ethtool autoneg off for non-copper

2016-04-08 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Daniel Walker
> Sent: Tuesday, April 5, 2016 11:30 AM
> To: Ruinskiy, Dima ; Kirsher, Jeffrey T
> ; Brandeburg, Jesse
> ; Nelson, Shannon
> ; Wyborny, Carolyn
> ; Skidmore, Donald C
> ; Allan, Bruce W ;
> Ronciak, John ; Williams, Mitch A
> 
> Cc: Steve Shih ; xe-ker...@external.cisco.com; Daniel
> Walker ; intel-wired-...@lists.osuosl.org;
> net...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH -v2] drivers: net: ethernet: intel: e1000e: fix ethtool
> autoneg off for non-copper
> 
> From: Steve Shih 
> 
> This patch fixes the issues for disabling auto-negotiation and forcing
> speed and duplex settings for the non-copper media.
> 
> For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID
> for eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent
> e1000_set_settings call would not fail with -EOPNOTSUPP.
> 
> e1000_set_spd_dplx should not automatically turn autoneg back on for
> forced 1000 Mbps full duplex settings for non-copper media.
> 
> Cc: xe-ker...@external.cisco.com
> Cc: Daniel Walker 
> Signed-off-by: Steve Shih 
> ---
>  drivers/net/ethernet/intel/e1000e/ethtool.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown 



Re: [RFC][PATCH] MAINTAINERS: Add Android Ion as a separate entry

2016-04-08 Thread Greg Kroah-Hartman
On Fri, Apr 08, 2016 at 04:35:25PM -0700, Laura Abbott wrote:
> The android drivers have a few other people reviewing patches.
> Add a separate entry to ensure patches go to the right people.
> 
> Signed-off-by: Laura Abbott 
> ---
> Sumit and I have been doing review anyway so I think it makes sense for
> us to be cc-ed on patches in addition to the generic Android maintainers.
> Anyone else who wants to join in is welcome.

Thanks for this, I'll queue it up soon.

greg k-h


Re: [RFT v2] iommu/amd: use subsys_initcall() on amdv2 iommu

2016-04-08 Thread Luis R. Rodriguez
On Tue, Mar 29, 2016 at 10:41 AM, Luis R. Rodriguez  wrote:
> We need to ensure amd iommu v2 initializes before
> driver uses such as drivers/gpu/drm/amd/amdkfd/kfd_module.c,
> to do this make its init routine a subsys_initcall() which
> ensures its load init is called first than modules when
> built-in.
>
> This reverts the old work around implemented through commit
> 1bacc894c227fad8a7 ("drivers: Move iommu/ before gpu/ in Makefile"),
> instead of making the dependency implicit by linker order this
> makes the ordering requirement explicit through proper kernel
> APIs.
>
> Cc: Oded Gabbay 
> Cc: Christian König 
> Signed-off-by: Luis R. Rodriguez 

*poke*

 Luis


[PATCH v1 0/2] x86/init: extend quirk use

2016-04-08 Thread Luis R. Rodriguez
This extends use of the quirks to other platforms, as hinted as possible
and confirmed by hpa [0]. This small series depends on the work that added
this functionality [1] [2] to replace the paravirt_enabled() hacks,
which is currently under review, so I am sending this series separately.
It's worth reviewing already.

What seems a bit odd is that CE4100 leaves RTC enabled. Can someone
confirm whether it really needs it, or can it also disable it,
as with Xen, lguest, and Intel MID?

[0] http://lkml.kernel.org/r/5702b5c2.7070...@zytor.com
[1] http://lkml.kernel.org/r/1460158825-13117-1-git-send-email-mcg...@kernel.org
[2] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20160408-pv-disabled-v5

Luis R. Rodriguez (2):
  x86/init: disable pnpbios for X86_SUBARCH_INTEL_MID
  x86/init: disable pnpbios for X86_SUBARCH_CE4100

 arch/x86/kernel/platform-quirks.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.7.2



[PATCH v1 1/2] x86/init: disable pnpbios for X86_SUBARCH_INTEL_MID

2016-04-08 Thread Luis R. Rodriguez
As per hpa Intel MID platforms can also disable pnpbios [0].

[0] http://lkml.kernel.org/r/5702b5c2.7070...@zytor.com

Suggested-by: H. Peter Anvin 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/platform-quirks.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/platform-quirks.c b/arch/x86/kernel/platform-quirks.c
index ab643825a7aa..853919484340 100644
--- a/arch/x86/kernel/platform-quirks.c
+++ b/arch/x86/kernel/platform-quirks.c
@@ -16,10 +16,8 @@ void __init x86_early_init_platform_quirks(void)
break;
case X86_SUBARCH_XEN:
case X86_SUBARCH_LGUEST:
-   x86_platform.legacy.devices.pnpbios = 0;
-   x86_platform.legacy.rtc = 0;
-   break;
case X86_SUBARCH_INTEL_MID:
+   x86_platform.legacy.devices.pnpbios = 0;
x86_platform.legacy.rtc = 0;
break;
}
-- 
2.7.2



[PATCH v1 2/2] x86/init: disable pnpbios for X86_SUBARCH_CE4100

2016-04-08 Thread Luis R. Rodriguez
As per hpa CE4100 platforms can also disable pnpbios [0].

[0] http://lkml.kernel.org/r/5702b5c2.7070...@zytor.com

Suggested-by: H. Peter Anvin 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/platform-quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/platform-quirks.c b/arch/x86/kernel/platform-quirks.c
index 853919484340..1b63daf0dd06 100644
--- a/arch/x86/kernel/platform-quirks.c
+++ b/arch/x86/kernel/platform-quirks.c
@@ -20,6 +20,9 @@ void __init x86_early_init_platform_quirks(void)
x86_platform.legacy.devices.pnpbios = 0;
x86_platform.legacy.rtc = 0;
break;
+   case X86_SUBARCH_CE4100:
+   x86_platform.legacy.devices.pnpbios = 0;
+   break;
}
 
if (x86_platform.set_legacy_features)
-- 
2.7.2
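
For readers following the two small patches above, the resulting per-subarch decisions can be sketched in plain C. This is a standalone illustration: the enum, the struct and early_platform_quirks() are hypothetical stand-ins, and the starting values (both features enabled) are an assumption, since the top of the real function is not shown in the diff context.

```c
#include <assert.h>

/* Hypothetical stand-ins for the X86_SUBARCH_* values. */
enum subarch {
	SUBARCH_PC,
	SUBARCH_LGUEST,
	SUBARCH_XEN,
	SUBARCH_INTEL_MID,
	SUBARCH_CE4100,
};

struct legacy_features {
	int rtc;
	int pnpbios;
};

static struct legacy_features early_platform_quirks(enum subarch s)
{
	/* Assumed defaults: everything present unless a quirk says no. */
	struct legacy_features f = { .rtc = 1, .pnpbios = 1 };

	switch (s) {
	case SUBARCH_PC:
		break;
	case SUBARCH_XEN:
	case SUBARCH_LGUEST:
	case SUBARCH_INTEL_MID:
		f.pnpbios = 0;
		f.rtc = 0;
		break;
	case SUBARCH_CE4100:
		f.pnpbios = 0;	/* RTC stays enabled here */
		break;
	default:
		assert(!"unknown subarch");
		break;
	}

	return f;
}
```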



Re: [PATCH v5 00/14] x86: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
BTW also here's a tree if someone needs it:

https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20160408-pv-disabled-v5

  Luis


[PATCH v5 10/14] x86/cpu/intel: remove not needed paravirt_enabled() for f00f work around

2016-04-08 Thread Luis R. Rodriguez
The X86_BUG_F00F workaround is responsible for fixing up the error
generated on attempted F00F exploitation from an OOPS to a SIGILL.
There is no reason why this code should not be allowed to run on
a PV guest on an F00F-affected CPU -- it would simply never trigger.
The paravirt_enabled() check was there only to avoid printing the f00f
workaround message, so removing the check is purely a cosmetic change.

Suggested-by: Andy Lutomirski 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/cpu/intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f71a34944b56..66509285ffdd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -233,7 +233,7 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
 * The Quark is also family 5, but does not have the same bug.
 */
clear_cpu_bug(c, X86_BUG_F00F);
-   if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
+   if (c->x86 == 5 && c->x86_model < 9) {
static int f00f_workaround_enabled;
 
set_cpu_bug(c, X86_BUG_F00F);
-- 
2.7.2
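
The condition left after this patch depends only on the CPU family and model. A tiny sketch of it as a standalone predicate, with a hypothetical function name and plain integers in place of struct cpuinfo_x86:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the remaining check: family 5 with a model number below 9. */
static bool intel_has_f00f_bug(unsigned int family, unsigned int model)
{
	return family == 5 && model < 9;
}

/* Usage example covering both sides of the model cut-off. */
static void f00f_examples(void)
{
	assert(intel_has_f00f_bug(5, 4));
	assert(!intel_has_f00f_bug(5, 9));	/* model >= 9 is excluded */
	assert(!intel_has_f00f_bug(6, 1));	/* other families are excluded */
}
```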



[PATCH v5 14/14] x86/paravirt: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
Now that paravirt_enabled() is replaced with proper
x86 semantics, we can remove it.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/include/asm/paravirt.h   | 5 -
 arch/x86/include/asm/paravirt_types.h | 1 -
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/kvm.c | 8 
 arch/x86/kernel/paravirt.c| 1 -
 arch/x86/lguest/boot.c| 2 --
 arch/x86/xen/enlighten.c  | 1 -
 7 files changed, 19 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6c7a4a192032..dff26bc91b17 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,11 +15,6 @@
 #include 
 #include 
 
-static inline int paravirt_enabled(void)
-{
-   return pv_info.paravirt_enabled;
-}
-
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 6acc1b26cf40..7fedf24bd811 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -69,7 +69,6 @@ struct pv_info {
u16 extra_user_64bit_cs;  /* __USER_CS if none */
 #endif
 
-   int paravirt_enabled;
const char *name;
 };
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0c70c7daa6b8..8d326e822cb8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -473,7 +473,6 @@ static inline unsigned long current_top_of_stack(void)
 #include 
 #else
 #define __cpuid			native_cpuid
-#define paravirt_enabled() 0
 
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index dc1207e2f193..eea2a6f72b31 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -285,14 +285,6 @@ static void __init paravirt_ops_setup(void)
 {
pv_info.name = "KVM";
 
-   /*
-* KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
-* guest kernel works like a bare metal kernel with additional
-* features, and paravirt_enabled is about features that are
-* missing.
-*/
-   pv_info.paravirt_enabled = 0;
-
if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
pv_cpu_ops.io_delay = kvm_io_delay;
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index f08ac28b8136..71a2d8a05a66 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -294,7 +294,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 
 struct pv_info pv_info = {
.name = "bare hardware",
-   .paravirt_enabled = 0,
.kernel_rpl = 0,
.shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
 
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index f5497ee5fd2f..3847e736702e 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1408,8 +1408,6 @@ __init void lguest_init(void)
 {
/* We're under lguest. */
pv_info.name = "lguest";
-   /* Paravirt is enabled. */
-   pv_info.paravirt_enabled = 1;
/* We're running at privilege level 1, not 0 as normal. */
pv_info.kernel_rpl = 1;
/* Everyone except Xen runs with this set. */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e066fcf87c3d..7c1da39623f4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1186,7 +1186,6 @@ static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
 }
 
 static const struct pv_info xen_info __initconst = {
-   .paravirt_enabled = 1,
.shared_kernel_pmd = 0,
 
 #ifdef CONFIG_X86_64
-- 
2.7.2



[PATCH v5 09/14] x86/tboot: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
There is already a check for boot_params.tboot_addr prior
to paravirt_enabled(). Both Xen and lguest, which are also the
only ones that set paravirt_enabled to true, never set
boot_params.tboot_addr. The Xen folks are sure a force disable
to 0 is not needed, and we recently force disabled this on lguest.
With this in place this check is no longer needed.

The Xen folks are sure a force disable to 0 is not needed because
apm_info lives in .bss. We recently force disabled this on
lguest, and on the Xen side, just to be sure, Boris zeroed out
the .bss for PV guests through commit 04b6b4a56884327c1648
("xen/x86: Zero out .bss for PV guests"). With this care taken
into consideration the paravirt_enabled() check is simply not
needed anymore.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/tboot.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index e72a07f20b05..9b0185fbe3eb 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -74,12 +74,6 @@ void __init tboot_probe(void)
return;
}
 
-   /* only a natively booted kernel should be using TXT */
-   if (paravirt_enabled()) {
-   pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
-   return;
-   }
-
/* Map and check for tboot UUID. */
set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
-- 
2.7.2



[PATCH v5 07/14] tools/lguest: force disable tboot and apm

2016-04-08 Thread Luis R. Rodriguez
The paravirt_enabled() check is going away, and the area tossed to
the kernel on lguest is not zeroed out, so ensure lguest force
disables tboot and apm just in case the kernel file being read might
have this set for whatever reason.

Signed-off-by: Luis R. Rodriguez 
---
 tools/lguest/lguest.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/lguest/lguest.c b/tools/lguest/lguest.c
index ff0aa580c6e1..0aa75af6e862 100644
--- a/tools/lguest/lguest.c
+++ b/tools/lguest/lguest.c
@@ -3357,6 +3357,12 @@ int main(int argc, char *argv[])
/* Tell the entry path not to try to reload segment registers. */
boot->hdr.loadflags |= KEEP_SEGMENTS;
 
+   /* We don't support tboot */
+   boot->tboot_addr = 0;
+
+   /* Ensure this is 0 to prevent apm from loading */
+   boot->apm_bios_info.version = 0;
+
/* We tell the kernel to initialize the Guest. */
tell_kernel(start);
 
-- 
2.7.2
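
The idea behind the force disable can be sketched as follows. Everything here is a simplified, hypothetical model: struct boot_info stands in for the two boot_params fields touched by the patch, and the guest-side checks are assumed to reduce to a test for a non-zero value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the boot_params fields involved. */
struct boot_info {
	uint64_t tboot_addr;
	uint16_t apm_version;
};

/* Launcher side: clear both fields before handing over to the guest. */
static void launcher_sanitize(struct boot_info *b)
{
	assert(b);
	b->tboot_addr = 0;	/* tboot is not supported */
	b->apm_version = 0;	/* keeps apm from loading */
}

/* Guest side (assumed): tboot is only probed for a non-zero address. */
static int tboot_should_probe(const struct boot_info *b)
{
	assert(b);
	return b->tboot_addr != 0;
}

/* Guest side (assumed): apm only loads for a non-zero BIOS version. */
static int apm_should_load(const struct boot_info *b)
{
	assert(b);
	return b->apm_version != 0;
}
```

Clearing the fields in the launcher means stale bytes in a buffer that was never zeroed cannot be mistaken for a real tboot address or APM BIOS.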



[PATCH v5 05/14] x86, ACPI: move ACPI_FADT_NO_CMOS_RTC check to ACPI boot code

2016-04-08 Thread Luis R. Rodriguez
This moves the ACPI specific check into the ACPI boot code.
It also takes advantage of x86_platform.legacy.rtc, which
is already checked in the RTC initialization code. This
lets us remove the nasty #ifdefery and consolidate the checks
to use only one toggle to disable the RTC init code.

This works because the RTC is initialized by
device_initcall(add_rtc_cmos), which runs late in boot from
start_kernel() during rest_init(), whereas acpi_parse_fadt() gets
called earlier, during setup_arch().

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/acpi/boot.c | 4 
 arch/x86/kernel/rtc.c   | 8 
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 8c2f1ef6ca23..8c9c2bdba092 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -913,6 +913,10 @@ late_initcall(hpet_insert_resource);
 
 static int __init acpi_parse_fadt(struct acpi_table_header *table)
 {
+   if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) {
+   pr_debug("ACPI: not registering RTC platform device\n");
+   x86_platform.legacy.rtc = 0;
+   }
 
 #ifdef CONFIG_X86_PM_TIMER
/* detect the location of the ACPI PM Timer */
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 62c48da3889d..ff4f4180fefd 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -189,14 +189,6 @@ static __init int add_rtc_cmos(void)
if (of_have_populated_dt())
return 0;
 
-#ifdef CONFIG_ACPI
-   if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) {
-   /* This warning can likely go away again in a year or two. */
-   pr_info("ACPI: not registering RTC platform device\n");
-   return -ENODEV;
-   }
-#endif
-
if (!x86_platform.legacy.rtc)
return -ENODEV;
 
-- 
2.7.2
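
The ordering argument in this patch can be modelled in a few lines of plain C. The sketch is illustrative only: struct legacy, parse_fadt() and the local add_rtc_cmos() are stand-ins, and FADT_NO_CMOS_RTC is a placeholder whose value (bit 5) is an assumption about ACPI_FADT_NO_CMOS_RTC rather than something shown in the diff.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder for ACPI_FADT_NO_CMOS_RTC; the bit position is assumed. */
#define FADT_NO_CMOS_RTC (1u << 5)

struct legacy {
	int rtc;
};

/* Early step (setup_arch() time): the FADT flag clears the toggle. */
static void parse_fadt(struct legacy *l, unsigned int boot_flags)
{
	assert(l);
	if (boot_flags & FADT_NO_CMOS_RTC)
		l->rtc = 0;
}

/* Late step (initcall time): only the single toggle is consulted. */
static int add_rtc_cmos(const struct legacy *l)
{
	assert(l);
	if (!l->rtc)
		return -ENODEV;
	return 0;
}
```

Because the early step always runs before the late one, the late step no longer needs an ACPI-specific branch of its own.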



[PATCH v5 02/14] x86/xen: use X86_SUBARCH_XEN for PV guest boots

2016-04-08 Thread Luis R. Rodriguez
The use of subarch should have no current effect on Xen
PV guests, as such this should have no current functional
effects.

Reviewed-by: David Vrabel 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/xen/enlighten.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 9b8f1eacc110..40487f1ecb4c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1661,6 +1661,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
boot_params.hdr.ramdisk_image = initrd_start;
boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
+   boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
 
if (!xen_initial_domain()) {
add_preferred_console("xenboot", 0, NULL);
-- 
2.7.2



[PATCH v5 04/14] x86/rtc: replace paravirt rtc check with platform legacy quirk

2016-04-08 Thread Luis R. Rodriguez
We have 4 types of x86 platforms that disable RTC:

  * Intel MID
  * Lguest - uses paravirt
  * Xen dom-U - uses paravirt
  * x86 on legacy systems annotated with an ACPI legacy flag

We can consolidate all of these into a platform specific legacy
quirk set early in boot through i386_start_kernel() and through
x86_64_start_reservations(). This deals with the RTC quirks which
we can rely on through the hardware subarch, the ACPI check can
be dealt with separately.

For Xen things are a bit more complex, given that the @X86_SUBARCH_XEN
x86_hardware_subarch is shared by Xen domU and dom0, which both use
the PV path. Since the semantics for differentiating between
the two are Xen specific we provide a platform helper to help override
default legacy features -- x86_platform.set_legacy_features(). Use
of this helper is highly discouraged; its only purpose should be
to account for the lack of semantics available within your given
x86_hardware_subarch.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+70     +62    +62         +43

Only 8 bytes overhead total, as the main increase in size is
all removed via __init.

v2: split the subarch check from the ACPI check, clarify
on the ACPI change commit log why ordering works
v3: add x86_platform.set_legacy_features() to account for dom0,
add also size impact on vmlinux as per 0-day report

Suggested-by: Ingo Molnar 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/Makefile |  1 +
 arch/x86/include/asm/paravirt.h   |  6 --
 arch/x86/include/asm/paravirt_types.h |  5 -
 arch/x86/include/asm/processor.h  |  1 -
 arch/x86/include/asm/x86_init.h   | 21 +
 arch/x86/kernel/Makefile  |  6 +-
 arch/x86/kernel/head32.c  |  2 ++
 arch/x86/kernel/head64.c  |  1 +
 arch/x86/kernel/platform-quirks.c | 21 +
 arch/x86/kernel/rtc.c |  7 ++-
 arch/x86/lguest/boot.c|  1 -
 arch/x86/xen/enlighten.c  | 10 +++---
 12 files changed, 60 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/platform-quirks.c

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4086abca0b32..f9ed8a7ce2b6 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -209,6 +209,7 @@ endif
 head-y := arch/x86/kernel/head_$(BITS).o
 head-y += arch/x86/kernel/head$(BITS).o
 head-y += arch/x86/kernel/head.o
+head-y += arch/x86/kernel/platform-quirks.o
 
 libs-y  += arch/x86/lib/
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 601f1b8f9961..6c7a4a192032 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -20,12 +20,6 @@ static inline int paravirt_enabled(void)
return pv_info.paravirt_enabled;
 }
 
-static inline int paravirt_has_feature(unsigned int feature)
-{
-   WARN_ON_ONCE(!pv_info.paravirt_enabled);
-   return (pv_info.features & feature);
-}
-
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index e8c2326478c8..6acc1b26cf40 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -70,14 +70,9 @@ struct pv_info {
 #endif
 
int paravirt_enabled;
-   unsigned int features;/* valid only if paravirt_enabled is set */
const char *name;
 };
 
-#define paravirt_has(x) paravirt_has_feature(PV_SUPPORTED_##x)
-/* Supported features */
-#define PV_SUPPORTED_RTC        (1<<0)
-
 struct pv_init_ops {
/*
 * Patch may replace one of the defined code sequences with
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9264476f3d57..0c70c7daa6b8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuid			native_cpuid
 #define paravirt_enabled() 0
-#define paravirt_has(x)		0
 
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 1ae89a2721d6..8bb8c1a4615a 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -142,6 +142,15 @@ struct x86_cpuinit_ops {
 struct timespec;
 
 /**
+ * struct x86_legacy_features - legacy x86 features
+ *
+ * @rtc: this device has a CMOS real-time clock present
+ */
+struct x86_legacy_features {
+   int rtc;
+};
+
+/**
  * struct x86_platform_ops - platform specific runtime functions
  * @calibrate_tsc: calibrate TSC
  * @get_wallclock: get time from HW clock like RTC etc.
@@ -152,6 +161,14 @@ struct timespe

Re: [PATCH v2] x86: Calculate MHz using APERF/MPERF for cpuinfo and scaling_cur_freq

2016-04-08 Thread Len Brown
> I have a minor ABI concern with this patch.  It seems that there is much more
> variance in the output of "cpu MHz" with this patch, and I think that
> needs to be noted in the changelog.
>
> ISTR having a conversation a while ago (with you Len?  with Srinivas?)
> where I mentioned that "cpu MHz" used to just reflect the "marketing"
> frequency of the processors on the system.  Is it worth going back to
> that static state, and leaving the calculation for the current frequency to
> userspace programs like turbostat, cpupower, etc.?
>
> FWIW: I *regularly* get bugzillas filed from people who do not understand
> that "cpu MHz" shows the current frequency of the core.  I've often
> thought it would be easier to make that value static ...

I am fine with always printing static cpu_khz in /proc/cpuinfo on all machines.

If it were up to me, I would not have allowed the cpufreq sub-system
to start messing with this.

But it did, and I figured the genie was out of the bottle.
Assuming I'd never be able to get the community to agree to stuff the
genie back in the bottle, I figured that this file should show a value
that actually means something, and isn't completely different depending
on the choice of cpufreq driver being used on that system.  Indeed, your
comment on variability is right on the money: this solution is less
"variable" than some drivers, such as intel_pstate, and more variable
than others, such as acpi-cpufreq.  Neither of those drivers returns a
value that is particularly meaningful.  This solution, at least, has a
semantic definition.

Len Brown, Intel Open Source Technology Center
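
For readers following along, the "semantic definition" can be illustrated
as a ratio of two counters sampled over an interval. This is only a
minimal userspace sketch, not the code from the patch: avg_khz() and its
parameters are hypothetical names, and base_khz is assumed to be the
fixed rate at which MPERF counts.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the average frequency over an interval
 * from APERF/MPERF deltas.  APERF advances at the actual clock rate and
 * MPERF at a fixed reference rate, so their ratio scales base_khz.
 * Note: base_khz * aperf_delta can overflow for very long intervals;
 * a real implementation would have to guard against that.
 */
static uint64_t avg_khz(uint64_t base_khz, uint64_t aperf_delta,
			uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return base_khz;	/* no samples: report the static value */
	return base_khz * aperf_delta / mperf_delta;
}
```

With equal deltas the result is base_khz itself; a core that spent the
interval at half speed reports half of it.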


Re: [PATCH v1 06/10] device property: switch to use UUID API

2016-04-08 Thread huang ying
On Fri, Apr 8, 2016 at 6:00 PM, Andy Shevchenko
 wrote:
> On Fri, 2016-04-08 at 09:27 +0800, Huang, Ying wrote:
>> Andy Shevchenko  writes:
>>
>> >
>> > On Fri, 2016-02-26 at 16:11 +0200, Andy Shevchenko wrote:
>> > >
>> > > On Thu, 2016-02-18 at 01:03 +0100, Rafael J. Wysocki wrote:
>> > > >
>> > > >
>> > > > On Wednesday, February 17, 2016 02:17:24 PM Andy Shevchenko
>> > > > wrote:
>> > > > >
>> > > > >
>> > > > > Switch to use a generic UUID API instead of custom approach.
>> > > > > It
>> > > > > allows to
>> > > > > define UUIDs, compare them, and validate.
>> > > []
>> > >
>> > Summon initial author of the UUID library.
>> >
>> > Summary: the API of comparison functions is rather strange. What is the
>> > point of not taking pointers directly? (Moreover I hope the compiler is
>> > clever enough not to make a copy of constant arguments there)
>> >
>> > I could only imagine the case you are trying to avoid temporary
>> > variables for constants like NULL_UUID.
>> >
>> > The issue with this is the ugliness in the users of that,
>> > particularly
>> > in ACPI (drivers/acpi/apei/ghes.c).
>> >
>> > I would like to have more clear interface for that. Perhaps we may
>> > add
>> > something like
>> >
>> > cmp_p(pointer, non-pointer);
>> > cmp_pp(pointer, pointer);
>> >
>> > to not break existing API for now.
>> >
>> > It would be useful for many cases in the kernel.
>> You can take a look at the drivers/acpi/apei/erst.c for uuid_le_cmp
>> usage.
>>
>> #define
>> CPER_CREATOR_PSTORE \
>> UUID_LE(0x75a574e3, 0x5052, 0x4b29, 0x8a, 0x8e, 0xbe,
>> 0x2c, \
>> 0x64, 0x90, 0xb8, 0x9d)
>>
>> if (uuid_le_cmp(rcd->hdr.creator_id, CPER_CREATOR_PSTORE) !=
>> 0)
>> goto skip;
>>
>> Looks better?
>
> I don't quite understand the issues with
>
> if (uuid_le_cmp(&rcd->hdr.creator_id, &CPER_CREATOR_PSTORE) != 0)

I tried to make uuid_le look like a primitive data type and UUID
constants look like primitive type constants where possible.  If we can
define data as uuid_le/be, then it will look just like that.  But if
there are too many places where we cannot use uuid_le/be directly, I am
OK with converting the interface to use pointers instead.

> or, like I mentioned previously, we may introduce _cmp_p() and use like
>
> if (uuid_le_cmp_p(&rcd->hdr.creator_id, CPER_CREATOR_PSTORE) != 0)

Personally, I don't like this interface. It is better for the two
parameters to have the same data type.

> if it looks better (again, I don't know if compiler is going to copy the last 
> argument).
>
>>
>> This is the typical use case in mind when I write the uuid.h.
>>
>> As for uuid_le_cmp usage in drivers/acpi/apei/ghes.c,
>>
>>   if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>>CPER_SEC_PLATFORM_MEM)) {
>
> Ditto
>
> if (!uuid_le_cmp_p((uuid_le *)gdata->section_type,
> CPER_SEC_PLATFORM_MEM)) {
>
>>
>> The code looks not good mainly because acpi_hest_generic_data is not
>> defined with uuid_le in mind.
>>
>> struct acpi_hest_generic_data {
>>   u8 section_type[16];
>>   u32 error_severity;
>>   u16 revision;
>>   u8 validation_bits;
>>   u8 flags;
>>   u32 error_data_length;
>>   u8 fru_id[16];
>>   u8 fru_text[20];
>> };
>>
>> If section_type was defined as uuid_le instead of u8[16], the
>> uuid_le_cmp usage would look better.  So I suggest to use uuid_le/be
>> in
>> data structure definition in new code if possible.
>
> This is understandable for such structures, but we might get a UUID from
> a buffer which is pointer to u8. It's not possible to convert to uuid_*
> since it's too generic stuff and might require to introduce
> ACPI_TYPE_UUID with standardization and all necessary work. Apparently
> not the shortest way.

If this is just a special case that happens seldom, we can just work
around it with *(uuid_le/be *)buf.  If it is common, we can change the
interface or add a new interface.

Best Regards,
Huang, Ying
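
To make the two interface shapes under discussion concrete, here is a
small self-contained sketch. It assumes a stand-in type demo_uuid rather
than the kernel's uuid_le, and all demo_* names are hypothetical; the
by-value function mirrors the style of uuid_le_cmp() as used in the
examples above, while the pointer variant follows the proposed cmp_pp()
shape.

```c
#include <string.h>

/* Stand-in for a 16-byte UUID type. */
typedef struct { unsigned char b[16]; } demo_uuid;

/* By-value comparison: constants can be passed without a temporary. */
static int demo_uuid_cmp(const demo_uuid u1, const demo_uuid u2)
{
	return memcmp(&u1, &u2, sizeof(demo_uuid));
}

/* Pointer/pointer comparison: both parameters have the same type. */
static int demo_uuid_cmp_pp(const demo_uuid *u1, const demo_uuid *u2)
{
	return memcmp(u1, u2, sizeof(demo_uuid));
}

/* A UUID that arrives as raw bytes, like a u8 section_type[16] field. */
static int demo_match_raw(const unsigned char *buf, const demo_uuid *want)
{
	demo_uuid tmp;

	memcpy(&tmp, buf, sizeof(tmp));	/* copy instead of casting the pointer */
	return demo_uuid_cmp_pp(&tmp, want) == 0;
}
```

The raw-buffer helper shows one way to handle the u8 pointer case
without a *(uuid_le *)buf style cast, at the cost of a 16-byte copy.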

>> >
>> > >
>> > > >
>> > > >
>> > > > >
>> > > > >
>> > > > > +static const uuid_le ads_uuid =
>> > > > > + UUID_LE(0xdbb8e3e6, 0x5886, 0x4ba6,
>> > > > > + 0x87, 0x95, 0x13, 0x19, 0xf5, 0x2a, 0x96,
>> > > > > 0x6b);
>> > > > >
>> > > > >  static bool acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > > > >  const union
>> > > > > acpi_object
>> > > > > *desc,
>> > > > > @@ -138,7 +136,7 @@ static bool
>> > > > > acpi_enumerate_nondev_subnodes(acpi_handle scope,
>> > > > >   || links->type != ACPI_TYPE_PACKAGE)
>> > > > >   break;
>> > > > >
>> > > > > - if (memcmp(uuid->buffer.pointer, ads_uuid,
>> > > > > sizeof(ads_uuid)))
>> > > > > + if (uuid_le_cmp(*(uuid_le *)uuid-
>> > > > > >buffer.pointer,
>> > > > > ads_uuid))
>> > > > Maybe it's too late, but I don't quite understand the pointer
>> > > > manipulations here.
>> > > >
>> > > > I can see why you need 

[PATCH v5 00/14] x86: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
This v5 updates the subarch documentation to annotate that
X86_SUBARCH_XEN can be used for both Xen dom0 and domU, and
adds an optional x86_platform.set_legacy_features() in order
to deal with further platform legacy fine tunings when the
platform requires further semantics than what is currently
available generically, and we've determined we don't need
these semantics added in a generic form to x86. In this case
this was needed for Xen given X86_SUBARCH_XEN can be used for
both dom0 and domU and Xen needs to enable RTC for dom0.

I suspect the hook can possibly later be used for further
fine tunings for HVMLite as it will use X86_SUBARCH_PC, so
it's placed towards the end of x86_early_init_platform_quirks()
to enable any platform to take advantage of this.
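
The ordering described above (generic subarch defaults first, optional
platform hook last) can be modelled in a few lines. This is a simplified
sketch with hypothetical demo_* names, not the kernel code: in
particular, the hook here takes a pointer argument so the example stays
self-contained, whereas the series calls
x86_platform.set_legacy_features() with no arguments.

```c
#include <stddef.h>

enum demo_subarch { DEMO_PC, DEMO_LGUEST, DEMO_XEN, DEMO_INTEL_MID };

struct demo_platform {
	int rtc;
	/* Optional: may be NULL when no fine tuning is needed. */
	void (*set_legacy_features)(struct demo_platform *p);
};

static void demo_init_quirks(struct demo_platform *p, enum demo_subarch s)
{
	p->rtc = 1;

	switch (s) {
	case DEMO_XEN:
	case DEMO_LGUEST:
	case DEMO_INTEL_MID:
		p->rtc = 0;
		break;
	default:
		break;
	}

	/* Runs last so a platform can refine the generic defaults. */
	if (p->set_legacy_features)
		p->set_legacy_features(p);
}

/* Hypothetical dom0 override: dom0 needs the RTC enabled again. */
static void demo_dom0_features(struct demo_platform *p)
{
	p->rtc = 1;
}
```

A domU-like platform leaves the hook NULL and keeps rtc at 0, while a
dom0-like platform installs the hook and ends up with rtc at 1 even
though both share the same subarch value.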

This also updates the commit logs a bit more to make some
clarifications, and lists the impact on vmlinux size as per
trusty good ol' 0-day.

The total size impact on vmlinux using i386-tinyconfig:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+136    +125   +125        +96

In total that's only an 11 byte overhead; the other 125 bytes are all
.init.text and are freed after boot.

Luis R. Rodriguez (14):
  x86/boot: enumerate documentation for the x86 hardware_subarch
  x86/xen: use X86_SUBARCH_XEN for PV guest boots
  tools/lguest: make lguest launcher use X86_SUBARCH_LGUEST explicitly
  x86/rtc: replace paravirt rtc check with platform legacy quirk
  x86, ACPI: move ACPI_FADT_NO_CMOS_RTC check to ACPI boot code
  x86/init: use a platform legacy quirk for ebda
  tools/lguest: force disable tboot and apm
  apm32: remove paravirt_enabled() use
  x86/tboot: remove paravirt_enabled()
  x86/cpu/intel: remove not needed paravirt_enabled() for f00f work around
  pnpbios: replace paravirt_enabled() check with legacy device check
  x86, ACPI: parse ACPI_FADT_LEGACY_DEVICES
  x86/init: rename ebda code file
  x86/paravirt: remove paravirt_enabled()

 arch/x86/Makefile |  3 ++-
 arch/x86/include/asm/paravirt.h   | 11 
 arch/x86/include/asm/paravirt_types.h |  6 -
 arch/x86/include/asm/processor.h  |  2 --
 arch/x86/include/asm/x86_init.h   | 50 +++
 arch/x86/include/uapi/asm/bootparam.h | 37 +-
 arch/x86/kernel/Makefile  |  6 -
 arch/x86/kernel/acpi/boot.c   |  9 +++
 arch/x86/kernel/apm_32.c  |  2 +-
 arch/x86/kernel/cpu/intel.c   |  2 +-
 arch/x86/kernel/{head.c => ebda.c}|  2 +-
 arch/x86/kernel/head32.c  |  2 ++
 arch/x86/kernel/head64.c  |  1 +
 arch/x86/kernel/kvm.c |  8 --
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/platform-quirks.c | 36 +
 arch/x86/kernel/rtc.c | 15 ++-
 arch/x86/kernel/tboot.c   |  6 -
 arch/x86/lguest/boot.c|  3 ---
 arch/x86/xen/enlighten.c  | 12 ++---
 drivers/pnp/pnpbios/core.c|  3 ++-
 include/linux/pnp.h   |  2 ++
 tools/lguest/lguest.c | 10 +--
 23 files changed, 166 insertions(+), 63 deletions(-)
 rename arch/x86/kernel/{head.c => ebda.c} (98%)
 create mode 100644 arch/x86/kernel/platform-quirks.c

-- 
2.7.2



[PATCH v5 11/14] pnpbios: replace paravirt_enabled() check with legacy device check

2016-04-08 Thread Luis R. Rodriguez
Since we are removing paravirt_enabled() replace it with a
logical equivalent. Even though PNPBIOS is x86 specific we
add an arch-specific type call, which can be implemented by
any architecture to show how other legacy attribute devices
can later be also checked for with other ACPI legacy attribute
flags.

This implicates the first ACPI 5.2.9.3 IA-PC Boot Architecture
ACPI_FADT_LEGACY_DEVICES flag device, and shows how to add more.

The reason pnpbios gets a defined structure and as such uses
a different approach than the RTC legacy quirk is that ACPI
has a respective RTC flag, while pnpbios does not. We fold
the pnpbios quirk under ACPI_FADT_LEGACY_DEVICES ACPI flag
use case, and use a struct of possible devices to enable
future extensions of this.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+32     +28    +28         +28

That's a 4 byte overhead in total, the rest is cleared out on init
as it's all __init text.

v2: split out subarch handling on switch to make it easier
later to add other subarchs. The 'fall-through' switch
handling can be confusing and we'll remove it later
when we add handling for X86_SUBARCH_CE4100.
v3: document vmlinux size impact as per 0-day, and also
explain why pnpbios is treated differently than the
RTC legacy feature.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/include/asm/x86_init.h   | 26 ++
 arch/x86/kernel/platform-quirks.c | 11 +++
 drivers/pnp/pnpbios/core.c|  3 ++-
 include/linux/pnp.h   |  2 ++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 89d9d57e145d..4dcdf74dfed8 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -142,15 +142,41 @@ struct x86_cpuinit_ops {
 struct timespec;
 
 /**
+ * struct x86_legacy_devices - legacy x86 devices
+ *
+ * @pnpbios: this platform can have a PNPBIOS. If this is disabled the platform
+ * is known to never have a PNPBIOS.
+ *
+ * These are devices known to require LPC or ISA bus. The definition of legacy
+ * devices adheres to the ACPI 5.2.9.3 IA-PC Boot Architecture flag
+ * ACPI_FADT_LEGACY_DEVICES. These devices consist of user visible devices on
+ * the LPC or ISA bus. User visible devices are devices that have end-user
+ * accessible connectors (for example, LPT parallel port). Legacy devices on
+ * the LPC bus consist for example of serial and parallel ports, PS/2 keyboard
+ * / mouse, and the floppy disk controller. A system that lacks all known
+ * legacy devices can assume all devices can be detected exclusively via
+ * standard device enumeration mechanisms including the ACPI namespace.
+ *
+ * A system which does not have ACPI_FADT_LEGACY_DEVICES enabled must not
+ * have any of the legacy devices enumerated below present.
+ */
+struct x86_legacy_devices {
+   int pnpbios;
+};
+
+/**
  * struct x86_legacy_features - legacy x86 features
  *
  * @rtc: this device has a CMOS real-time clock present
  * @ebda_search: it's safe to search for the EBDA signature in the hardware's
  * low RAM
+ * @devices: legacy x86 devices, refer to struct x86_legacy_devices
+ * documentation for further details.
  */
 struct x86_legacy_features {
int rtc;
int ebda_search;
+   struct x86_legacy_devices devices;
 };
 
 /**
diff --git a/arch/x86/kernel/platform-quirks.c b/arch/x86/kernel/platform-quirks.c
index 01b159781d96..ab643825a7aa 100644
--- a/arch/x86/kernel/platform-quirks.c
+++ b/arch/x86/kernel/platform-quirks.c
@@ -8,6 +8,7 @@ void __init x86_early_init_platform_quirks(void)
 {
x86_platform.legacy.rtc = 1;
x86_platform.legacy.ebda_search = 0;
+   x86_platform.legacy.devices.pnpbios = 1;
 
switch (boot_params.hdr.hardware_subarch) {
case X86_SUBARCH_PC:
@@ -15,6 +16,9 @@ void __init x86_early_init_platform_quirks(void)
break;
case X86_SUBARCH_XEN:
case X86_SUBARCH_LGUEST:
+   x86_platform.legacy.devices.pnpbios = 0;
+   x86_platform.legacy.rtc = 0;
+   break;
case X86_SUBARCH_INTEL_MID:
x86_platform.legacy.rtc = 0;
break;
@@ -23,3 +27,10 @@ void __init x86_early_init_platform_quirks(void)
if (x86_platform.set_legacy_features)
x86_platform.set_legacy_features();
 }
+
+#if defined(CONFIG_PNPBIOS)
+bool __init arch_pnpbios_disabled(void)
+{
+   return x86_platform.legacy.devices.pnpbios == 0;
+}
+#endif
diff --git a/drivers/pnp/pnpbios/core.c b/drivers/pnp/pnpbios/core.c
index facd43b8516c..81603d99082b 100644
--- a/drivers/pnp/pnpbios/core.c
+++ b/drivers/pnp/pnpbios/core.c
@@ -521,10 +521,11 @@ static int __init pnpbios_init(void)
int ret;
 
if (pnpbios_disabled || dmi_check_system(pnpbios_dmi_table) ||
-   paravirt_enabled

[PATCH v5 09/14] x86/tboot: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
There is already a check for boot_params.tboot_addr prior
to paravirt_enabled(). Both Xen and lguest, which are also the
only ones that set paravirt_enabled to true, never set
boot_params.tboot_addr. The Xen folks are sure a force disable
to 0 is not needed; we recently force disabled this on lguest.
With this in place this check is no longer needed.

Xen folks are sure force disable to 0 is not needed because
apm_info lives in .bss; we recently force disabled this on
lguest, and on the Xen side, just to be sure, Boris zeroed out
the .bss for PV guests through commit 04b6b4a56884327c1648
("xen/x86: Zero out .bss for PV guests"). With this care taken
into consideration the paravirt_enabled() check is simply not
needed anymore.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/tboot.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index e72a07f20b05..9b0185fbe3eb 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -74,12 +74,6 @@ void __init tboot_probe(void)
return;
}
 
-   /* only a natively booted kernel should be using TXT */
-   if (paravirt_enabled()) {
-   pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
-   return;
-   }
-
/* Map and check for tboot UUID. */
set_fixmap(FIX_TBOOT_BASE, boot_params.tboot_addr);
tboot = (struct tboot *)fix_to_virt(FIX_TBOOT_BASE);
-- 
2.7.2



[PATCH v5 05/14] x86, ACPI: move ACPI_FADT_NO_CMOS_RTC check to ACPI boot code

2016-04-08 Thread Luis R. Rodriguez
This moves the ACPI specific check into the ACPI boot code. It
also takes advantage of x86_platform.legacy.rtc, which is
already checked in the RTC initialization code. This lets us
remove the nasty #ifdefery and consolidate the checks to use
only one toggle to disable the RTC init code.

This works because the RTC is initialized by device_initcall(add_rtc_cmos),
which runs late in boot from start_kernel() during rest_init(), while
acpi_parse_fadt() gets called earlier during setup_arch().
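
The reliance on ordering can be modelled in plain C. The sketch below
uses hypothetical demo_* names and a file-local variable in place of
x86_platform.legacy.rtc; the flag value is assumed to mirror
ACPI_FADT_NO_CMOS_RTC and the error value to mirror ENODEV on Linux.

```c
#define DEMO_FADT_NO_CMOS_RTC	(1u << 5)	/* assumed flag bit */
#define DEMO_ENODEV		19		/* assumed errno value */

/* Single toggle, standing in for x86_platform.legacy.rtc. */
static int demo_legacy_rtc = 1;

/* Early step: models the FADT parse clearing the toggle. */
static void demo_parse_fadt(unsigned int boot_flags)
{
	if (boot_flags & DEMO_FADT_NO_CMOS_RTC)
		demo_legacy_rtc = 0;
}

/* Late step: models the RTC initcall consulting only the toggle. */
static int demo_add_rtc(void)
{
	if (!demo_legacy_rtc)
		return -DEMO_ENODEV;
	return 0;
}
```

As long as the parse step runs before the initcall step, the late code
needs no ACPI knowledge of its own.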

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/acpi/boot.c | 4 
 arch/x86/kernel/rtc.c   | 8 
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 8c2f1ef6ca23..8c9c2bdba092 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -913,6 +913,10 @@ late_initcall(hpet_insert_resource);
 
 static int __init acpi_parse_fadt(struct acpi_table_header *table)
 {
+   if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) {
+   pr_debug("ACPI: not registering RTC platform device\n");
+   x86_platform.legacy.rtc = 0;
+   }
 
 #ifdef CONFIG_X86_PM_TIMER
/* detect the location of the ACPI PM Timer */
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 62c48da3889d..ff4f4180fefd 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -189,14 +189,6 @@ static __init int add_rtc_cmos(void)
if (of_have_populated_dt())
return 0;
 
-#ifdef CONFIG_ACPI
-   if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) {
-   /* This warning can likely go away again in a year or two. */
-   pr_info("ACPI: not registering RTC platform device\n");
-   return -ENODEV;
-   }
-#endif
-
if (!x86_platform.legacy.rtc)
return -ENODEV;
 
-- 
2.7.2



[PATCH v5 03/14] tools/lguest: make lguest launcher use X86_SUBARCH_LGUEST explicitly

2016-04-08 Thread Luis R. Rodriguez
Be explicit and make use of X86_SUBARCH_LGUEST directly.

Signed-off-by: Luis R. Rodriguez 
---
 tools/lguest/lguest.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lguest/lguest.c b/tools/lguest/lguest.c
index 80159e6811c2..ff0aa580c6e1 100644
--- a/tools/lguest/lguest.c
+++ b/tools/lguest/lguest.c
@@ -3351,8 +3351,8 @@ int main(int argc, char *argv[])
/* Boot protocol version: 2.07 supports the fields for lguest. */
boot->hdr.version = 0x207;
 
-   /* The hardware_subarch value of "1" tells the Guest it's an lguest. */
-   boot->hdr.hardware_subarch = 1;
+   /* X86_SUBARCH_LGUEST tells the Guest it's an lguest. */
+   boot->hdr.hardware_subarch = X86_SUBARCH_LGUEST;
 
/* Tell the entry path not to try to reload segment registers. */
boot->hdr.loadflags |= KEEP_SEGMENTS;
-- 
2.7.2



[PATCH v5 06/14] x86/init: use a platform legacy quirk for ebda

2016-04-08 Thread Luis R. Rodriguez
This replaces the paravirt_enabled() check with a
proper x86 legacy platform quirk.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+39     +35    +35         +25

That's a 4 byte total overhead, the rest is all cleared out
upon init as it's all __init text.

v2: document 0-day vmlinux size impact

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/include/asm/x86_init.h   | 3 +++
 arch/x86/kernel/head.c| 2 +-
 arch/x86/kernel/platform-quirks.c | 4 
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 8bb8c1a4615a..89d9d57e145d 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -145,9 +145,12 @@ struct timespec;
  * struct x86_legacy_features - legacy x86 features
  *
  * @rtc: this device has a CMOS real-time clock present
+ * @ebda_search: it's safe to search for the EBDA signature in the hardware's
+ * low RAM
  */
 struct x86_legacy_features {
int rtc;
+   int ebda_search;
 };
 
 /**
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c
index 992f442ca155..afe65dffee80 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/head.c
@@ -38,7 +38,7 @@ void __init reserve_ebda_region(void)
 * that the paravirt case can handle memory setup
 * correctly, without our help.
 */
-   if (paravirt_enabled())
+   if (!x86_platform.legacy.ebda_search)
return;
 
/* end of low (conventional) memory */
diff --git a/arch/x86/kernel/platform-quirks.c b/arch/x86/kernel/platform-quirks.c
index 021a5f973ce3..01b159781d96 100644
--- a/arch/x86/kernel/platform-quirks.c
+++ b/arch/x86/kernel/platform-quirks.c
@@ -7,8 +7,12 @@
 void __init x86_early_init_platform_quirks(void)
 {
x86_platform.legacy.rtc = 1;
+   x86_platform.legacy.ebda_search = 0;
 
switch (boot_params.hdr.hardware_subarch) {
+   case X86_SUBARCH_PC:
+   x86_platform.legacy.ebda_search = 1;
+   break;
case X86_SUBARCH_XEN:
case X86_SUBARCH_LGUEST:
case X86_SUBARCH_INTEL_MID:
-- 
2.7.2



[PATCH v5 10/14] x86/cpu/intel: remove not needed paravirt_enabled() for f00f work around

2016-04-08 Thread Luis R. Rodriguez
The X86_BUG_F00F work around is responsible for fixing up the error
generated on attempted F00F exploitation from an OOPS to a SIGILL.
There is no reason why this code should not be allowed to run on
PV guest on a F00F-affected CPU -- it would simply never trigger.
The paravirt_enabled() check was there only to avoid printing the f00f
workaround, so removing the check is purely a cosmetic change.

Suggested-by: Andy Lutomirski 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/cpu/intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f71a34944b56..66509285ffdd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -233,7 +233,7 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
 * The Quark is also family 5, but does not have the same bug.
 */
clear_cpu_bug(c, X86_BUG_F00F);
-   if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
+   if (c->x86 == 5 && c->x86_model < 9) {
static int f00f_workaround_enabled;
 
set_cpu_bug(c, X86_BUG_F00F);
-- 
2.7.2



[PATCH v5 07/14] tools/lguest: force disable tboot and apm

2016-04-08 Thread Luis R. Rodriguez
The paravirt_enabled() check is going away and the area tossed to
the kernel on lguest is not zeroed out, so ensure lguest force
disables tboot and apm just in case the kernel file being read might
have this set for whatever reason.

Signed-off-by: Luis R. Rodriguez 
---
 tools/lguest/lguest.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/lguest/lguest.c b/tools/lguest/lguest.c
index ff0aa580c6e1..0aa75af6e862 100644
--- a/tools/lguest/lguest.c
+++ b/tools/lguest/lguest.c
@@ -3357,6 +3357,12 @@ int main(int argc, char *argv[])
/* Tell the entry path not to try to reload segment registers. */
boot->hdr.loadflags |= KEEP_SEGMENTS;
 
+   /* We don't support tboot */
+   boot->tboot_addr = 0;
+
+   /* Ensure this is 0 to prevent apm from loading */
+   boot->apm_bios_info.version = 0;
+
/* We tell the kernel to initialize the Guest. */
tell_kernel(start);
 
-- 
2.7.2



[PATCH v5 01/14] x86/boot: enumerate documentation for the x86 hardware_subarch

2016-04-08 Thread Luis R. Rodriguez
Although hardware_subarch has been in place since the x86 boot
protocol 2.07 it hasn't been used much. Enumerate current possible
values to avoid misuses and help with semantics later at boot
time should this be used further.

These enums should only ever be used by architecture x86 code,
and all that code should be well contained and compartmentalized;
clarify that as well.

v2: updates documentation further -- be a bit more pedantic about
annotating care and use of these guys.
v3: Use s/SOC/SoC and also annotate that domU and dom0 are
both currently supported through the PV boot path.

Cc: Andy Shevchenko 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/include/uapi/asm/bootparam.h | 37 ++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 329254373479..bf9fea2f4591 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -157,7 +157,42 @@ struct boot_params {
__u8  _pad9[276];   /* 0xeec */
 } __attribute__((packed));
 
-enum {
+/**
+ * enum x86_hardware_subarch - x86 hardware subarchitecture
+ *
+ * The x86 hardware_subarch and hardware_subarch_data were added as of the x86
+ * boot protocol 2.07 to help distinguish and support custom x86 boot
+ * sequences. This enum represents accepted values for the x86
+ * hardware_subarch.  Custom x86 boot sequences (not X86_SUBARCH_PC) do not
+ * have or simply *cannot* make use of natural stubs like BIOS or EFI, the
+ * hardware_subarch can be used on the Linux entry path to revector to a
+ * subarchitecture stub when needed. This subarchitecture stub can be used to
+ * set up Linux boot parameters or for special care to account for nonstandard
+ * handling of page tables.
+ *
+ * These enums should only ever be used by x86 code, and the code that uses
+ * them should be well contained and compartmentalized.
+ *
+ * KVM and Xen HVM do not have a subarch as these are expected to follow
+ * standard x86 boot entries. If there is a genuine need for "hypervisor" type
+ * that should be considered separately in the future. Future guest types
+ * should seriously consider working with standard x86 boot stubs such as
+ * the BIOS or EFI boot stubs.
+ *
+ * @X86_SUBARCH_PC: Should be used if the hardware is enumerable using standard
+ * PC mechanisms (PCI, ACPI) and doesn't need a special boot flow.
+ * @X86_SUBARCH_LGUEST: Used for x86 hypervisor demo, lguest
+ * @X86_SUBARCH_XEN: Used for Xen guest types which follow the PV boot path,
+ * which start at asm startup_xen() entry point and later jump to the C
+ * xen_start_kernel() entry point. Both domU and dom0 type of guests are
+ * currently supported through this PV boot path.
+ * @X86_SUBARCH_INTEL_MID: Used for Intel MID (Mobile Internet Device) platform
+ * systems which do not have the PCI legacy interfaces.
+ * @X86_SUBARCH_CE4100: Used for Intel CE media processor (CE4100) SoC for
+ * settop boxes and media devices, the use of a subarch for CE4100
+ * is more of a hack...
+ */
+enum x86_hardware_subarch {
X86_SUBARCH_PC = 0,
X86_SUBARCH_LGUEST,
X86_SUBARCH_XEN,
-- 
2.7.2



[PATCH v5 08/14] apm32: remove paravirt_enabled() use

2016-04-08 Thread Luis R. Rodriguez
There is already a check for apm_info.bios == 0, the
apm_info.bios is set from the boot_params.apm_bios_info.
Both Xen and lguest, which are also the only ones that set
paravirt_enabled to true, never set the apm_bios.info.

The Xen folks are sure force disable to 0 is not needed because
apm_info lives in .bss; we recently force disabled this on
lguest, and on the Xen side, just to be sure, Boris zeroed out
the .bss for PV guests through commit 04b6b4a56884327c1648
("xen/x86: Zero out .bss for PV guests"). With this care taken
into consideration the paravirt_enabled() check is simply not
needed anymore.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/apm_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 9307f182fe30..c7364bd633e1 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -2267,7 +2267,7 @@ static int __init apm_init(void)
 
dmi_check_system(apm_dmi_table);
 
-   if (apm_info.bios.version == 0 || paravirt_enabled() || machine_is_olpc()) {
+   if (apm_info.bios.version == 0 || machine_is_olpc()) {
printk(KERN_INFO "apm: BIOS not found.\n");
return -ENODEV;
}
-- 
2.7.2



Re: [PATCH] mtd: nand: nuc900: allow compiling with COMPILE_TEST

2016-04-08 Thread kbuild test robot
Hi Rafał,

[auto build test WARNING on mtd/master]
[also build test WARNING on v4.6-rc2 next-20160408]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Rafa-Mi-ecki/mtd-nand-nuc900-allow-compiling-with-COMPILE_TEST/20160408-185814
base:   git://git.infradead.org/linux-mtd.git master
config: blackfin-allyesconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=blackfin 

All warnings (new ones prefixed by >>):

>> drivers/mtd/nand/nuc900_nand.c:36:0: warning: "SWRST" redefined [enabled by default]
   arch/blackfin/mach-bf533/include/mach/defBF532.h:25:0: note: this is the location of the previous definition

vim +/SWRST +36 drivers/mtd/nand/nuc900_nand.c

8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  20  
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  21  #include 
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  22  #include 
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  23  #include 
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  24  
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  25  #define REG_FMICSR 0x00
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  26  #define REG_SMCSR 0xa0
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  27  #define REG_SMISR 0xac
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  28  #define REG_SMCMD 0xb0
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  29  #define REG_SMADDR 0xb4
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  30  #define REG_SMDATA 0xb8
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  31  
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  32  #define RESET_FMI 0x01
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  33  #define NAND_EN 0x08
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  34  #define READYBUSY (0x01 << 18)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  35  
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10 @36  #define SWRST 0x01
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  37  #define PSIZE (0x01 << 3)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  38  #define DMARWEN (0x03 << 1)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  39  #define BUSWID (0x01 << 4)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  40  #define ECC4EN (0x01 << 5)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  41  #define WP (0x01 << 24)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  42  #define NANDCS (0x01 << 25)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  43  #define ENDADDR (0x01 << 31)
8bff82cb drivers/mtd/nand/w90p910_nand.c Wan ZongShun 2009-07-10  44  

:: The code at line 36 was first introduced by commit
:: 8bff82cbc30884fc52969608d090d874641e7196 mtd: add nand support for w90p910 (v2)

:: TO: Wan ZongShun 
:: CC: David Woodhouse 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


[PATCH v5 13/14] x86/init: rename ebda code file

2016-04-08 Thread Luis R. Rodriguez
This makes it clearer what this is.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/Makefile  | 2 +-
 arch/x86/kernel/Makefile   | 2 +-
 arch/x86/kernel/{head.c => ebda.c} | 0
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename arch/x86/kernel/{head.c => ebda.c} (100%)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index f9ed8a7ce2b6..6fce7f096b88 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -208,7 +208,7 @@ endif
 
 head-y := arch/x86/kernel/head_$(BITS).o
 head-y += arch/x86/kernel/head$(BITS).o
-head-y += arch/x86/kernel/head.o
+head-y += arch/x86/kernel/ebda.o
 head-y += arch/x86/kernel/platform-quirks.o
 
 libs-y  += arch/x86/lib/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 7a9e44d935de..0503f5bfb18d 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -4,7 +4,7 @@
 
 extra-y:= head_$(BITS).o
 extra-y+= head$(BITS).o
-extra-y+= head.o
+extra-y+= ebda.o
 extra-y+= platform-quirks.o
 extra-y+= vmlinux.lds
 
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/ebda.c
similarity index 100%
rename from arch/x86/kernel/head.c
rename to arch/x86/kernel/ebda.c
-- 
2.7.2



[PATCH v5 02/14] x86/xen: use X86_SUBARCH_XEN for PV guest boots

2016-04-08 Thread Luis R. Rodriguez
The use of subarch should have no current effect on Xen
PV guests, as such this should have no current functional
effects.

Reviewed-by: David Vrabel 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/xen/enlighten.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 9b8f1eacc110..40487f1ecb4c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1661,6 +1661,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
boot_params.hdr.ramdisk_image = initrd_start;
boot_params.hdr.ramdisk_size = xen_start_info->mod_len;
boot_params.hdr.cmd_line_ptr = __pa(xen_start_info->cmd_line);
+   boot_params.hdr.hardware_subarch = X86_SUBARCH_XEN;
 
if (!xen_initial_domain()) {
add_preferred_console("xenboot", 0, NULL);
-- 
2.7.2



[PATCH v5 12/14] x86, ACPI: parse ACPI_FADT_LEGACY_DEVICES

2016-04-08 Thread Luis R. Rodriguez
ACPI 5.2.9.3 IA-PC Boot Architecture flag ACPI_FADT_LEGACY_DEVICES
can be used to determine if a system has legacy LPC or ISA
devices. The x86 platform already has a struct which lists
known associated legacy devices; we start off carefully by
only disabling root devices we should not regress with. The struct
and device list can be expanded with time to cover more root
legacy components.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/kernel/acpi/boot.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 8c9c2bdba092..c9a06e573fa5 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -913,6 +913,11 @@ late_initcall(hpet_insert_resource);
 
 static int __init acpi_parse_fadt(struct acpi_table_header *table)
 {
+   if (!(acpi_gbl_FADT.boot_flags & ACPI_FADT_LEGACY_DEVICES)) {
+   pr_debug("ACPI: no legacy devices present\n");
+   x86_platform.legacy.devices.pnpbios = 0;
+   }
+
if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_CMOS_RTC) {
pr_debug("ACPI: not registering RTC platform device\n");
x86_platform.legacy.rtc = 0;
-- 
2.7.2



[PATCH v5 04/14] x86/rtc: replace paravirt rtc check with platform legacy quirk

2016-04-08 Thread Luis R. Rodriguez
We have 4 types of x86 platforms that disable RTC:

  * Intel MID
  * Lguest - uses paravirt
  * Xen dom-U - uses paravirt
  * x86 on legacy systems annotated with an ACPI legacy flag

We can consolidate all of these into a platform specific legacy
quirk set early in boot through i386_start_kernel() and through
x86_64_start_reservations(). This deals with the RTC quirks which
we can rely on through the hardware subarch, the ACPI check can
be dealt with separately.

For Xen things are a bit more complex given that the @X86_SUBARCH_XEN
x86_hardware_subarch is shared by Xen, which uses the PV path for
both domU and dom0. Since the semantics for differentiating between
the two are Xen specific we provide a platform helper to help override
default legacy features -- x86_platform.set_legacy_features(). Use
of this helper is highly discouraged, its only purpose should be
to account for the lack of semantics available within your given
x86_hardware_subarch.
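
To illustrate the flow described above, here is a minimal userspace sketch, not the patch itself. All names in it (enum subarch, struct platform_ops, early_init_platform_quirks(), dom0_set_legacy_features()) are hypothetical stand-ins for illustration; the only assumption taken from the description is that the subarch picks the default and the optional helper may override it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hardware subarch values. */
enum subarch { SUBARCH_PC, SUBARCH_LGUEST, SUBARCH_XEN, SUBARCH_INTEL_MID };

struct legacy_features {
	int rtc;	/* 1 if a CMOS RTC should be assumed present */
};

struct platform_ops {
	struct legacy_features legacy;
	/* Optional override for cases the subarch alone cannot express. */
	void (*set_legacy_features)(struct platform_ops *ops);
};

/* Pick defaults from the subarch, then let the platform refine them. */
static void early_init_platform_quirks(struct platform_ops *ops,
				       enum subarch sub)
{
	assert(ops != NULL);

	ops->legacy.rtc = 1;

	switch (sub) {
	case SUBARCH_XEN:
	case SUBARCH_LGUEST:
	case SUBARCH_INTEL_MID:
		ops->legacy.rtc = 0;
		break;
	default:
		break;
	}

	if (ops->set_legacy_features)
		ops->set_legacy_features(ops);
}

/* Example override: dom0 shares the Xen subarch but keeps its RTC. */
static void dom0_set_legacy_features(struct platform_ops *ops)
{
	ops->legacy.rtc = 1;
}
```

With this shape a domU-like caller leaves set_legacy_features NULL and ends up with rtc == 0, while a dom0-like caller installs the override and ends up with rtc == 1.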

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+70     +62    +62         +43

Only 8 bytes overhead total, as the main increase in size is
all removed via __init.

v2: split the subarch check from the ACPI check, clarify
on the ACPI change commit log why ordering works
v3: add x86_platform.set_legacy_features() to account for dom0,
add also size impact on vmlinux as per 0-day report

Suggested-by: Ingo Molnar 
Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/Makefile |  1 +
 arch/x86/include/asm/paravirt.h   |  6 --
 arch/x86/include/asm/paravirt_types.h |  5 -
 arch/x86/include/asm/processor.h  |  1 -
 arch/x86/include/asm/x86_init.h   | 21 +
 arch/x86/kernel/Makefile  |  6 +-
 arch/x86/kernel/head32.c  |  2 ++
 arch/x86/kernel/head64.c  |  1 +
 arch/x86/kernel/platform-quirks.c | 21 +
 arch/x86/kernel/rtc.c |  7 ++-
 arch/x86/lguest/boot.c|  1 -
 arch/x86/xen/enlighten.c  | 10 +++---
 12 files changed, 60 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/platform-quirks.c

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 4086abca0b32..f9ed8a7ce2b6 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -209,6 +209,7 @@ endif
 head-y := arch/x86/kernel/head_$(BITS).o
 head-y += arch/x86/kernel/head$(BITS).o
 head-y += arch/x86/kernel/head.o
+head-y += arch/x86/kernel/platform-quirks.o
 
 libs-y  += arch/x86/lib/
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 601f1b8f9961..6c7a4a192032 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -20,12 +20,6 @@ static inline int paravirt_enabled(void)
return pv_info.paravirt_enabled;
 }
 
-static inline int paravirt_has_feature(unsigned int feature)
-{
-   WARN_ON_ONCE(!pv_info.paravirt_enabled);
-   return (pv_info.features & feature);
-}
-
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index e8c2326478c8..6acc1b26cf40 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -70,14 +70,9 @@ struct pv_info {
 #endif
 
int paravirt_enabled;
-   unsigned int features;/* valid only if paravirt_enabled is set */
const char *name;
 };
 
-#define paravirt_has(x) paravirt_has_feature(PV_SUPPORTED_##x)
-/* Supported features */
-#define PV_SUPPORTED_RTC(1<<0)
-
 struct pv_init_ops {
/*
 * Patch may replace one of the defined code sequences with
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 9264476f3d57..0c70c7daa6b8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ static inline unsigned long current_top_of_stack(void)
 #else
 #define __cpuidnative_cpuid
 #define paravirt_enabled() 0
-#define paravirt_has(x)0
 
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 1ae89a2721d6..8bb8c1a4615a 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -142,6 +142,15 @@ struct x86_cpuinit_ops {
 struct timespec;
 
 /**
+ * struct x86_legacy_features - legacy x86 features
+ *
+ * @rtc: this device has a CMOS real-time clock present
+ */
+struct x86_legacy_features {
+   int rtc;
+};
+
+/**
  * struct x86_platform_ops - platform specific runtime functions
  * @calibrate_tsc: calibrate TSC
  * @get_wallclock: get time from HW clock like RTC etc.
@@ -152,6 +161,14 @@ struct timespe

[PATCH v5 14/14] x86/paravirt: remove paravirt_enabled()

2016-04-08 Thread Luis R. Rodriguez
Now that paravirt_enabled() is replaced with proper
x86 semantics we can remove it.

Signed-off-by: Luis R. Rodriguez 
---
 arch/x86/include/asm/paravirt.h   | 5 -
 arch/x86/include/asm/paravirt_types.h | 1 -
 arch/x86/include/asm/processor.h  | 1 -
 arch/x86/kernel/kvm.c | 8 
 arch/x86/kernel/paravirt.c| 1 -
 arch/x86/lguest/boot.c| 2 --
 arch/x86/xen/enlighten.c  | 1 -
 7 files changed, 19 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6c7a4a192032..dff26bc91b17 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,11 +15,6 @@
 #include 
 #include 
 
-static inline int paravirt_enabled(void)
-{
-   return pv_info.paravirt_enabled;
-}
-
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 6acc1b26cf40..7fedf24bd811 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -69,7 +69,6 @@ struct pv_info {
u16 extra_user_64bit_cs;  /* __USER_CS if none */
 #endif
 
-   int paravirt_enabled;
const char *name;
 };
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0c70c7daa6b8..8d326e822cb8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -473,7 +473,6 @@ static inline unsigned long current_top_of_stack(void)
 #include 
 #else
 #define __cpuidnative_cpuid
-#define paravirt_enabled() 0
 
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index dc1207e2f193..eea2a6f72b31 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -285,14 +285,6 @@ static void __init paravirt_ops_setup(void)
 {
pv_info.name = "KVM";
 
-   /*
-* KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
-* guest kernel works like a bare metal kernel with additional
-* features, and paravirt_enabled is about features that are
-* missing.
-*/
-   pv_info.paravirt_enabled = 0;
-
if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
pv_cpu_ops.io_delay = kvm_io_delay;
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index f08ac28b8136..71a2d8a05a66 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -294,7 +294,6 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 
 struct pv_info pv_info = {
.name = "bare hardware",
-   .paravirt_enabled = 0,
.kernel_rpl = 0,
.shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
 
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index f5497ee5fd2f..3847e736702e 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1408,8 +1408,6 @@ __init void lguest_init(void)
 {
/* We're under lguest. */
pv_info.name = "lguest";
-   /* Paravirt is enabled. */
-   pv_info.paravirt_enabled = 1;
/* We're running at privilege level 1, not 0 as normal. */
pv_info.kernel_rpl = 1;
/* Everyone except Xen runs with this set. */
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e066fcf87c3d..7c1da39623f4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1186,7 +1186,6 @@ static unsigned xen_patch(u8 type, u16 clobbers, void 
*insnbuf,
 }
 
 static const struct pv_info xen_info __initconst = {
-   .paravirt_enabled = 1,
.shared_kernel_pmd = 0,
 
 #ifdef CONFIG_X86_64
-- 
2.7.2



[RFC][PATCH] MAINTAINERS: Add Android Ion as a separate entry

2016-04-08 Thread Laura Abbott
The android drivers have a few other people reviewing patches.
Add a separate entry to ensure patches go to the right people.

Signed-off-by: Laura Abbott 
---
Sumit and I have been doing review anyway so I think it makes sense for
us to be cc-ed on patches in addition to the generic Android maintainers.
Anyone else who wants to join in is welcome.
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 40eb1db..c697c6e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -776,6 +776,15 @@ S: Supported
 F: drivers/android/
 F: drivers/staging/android/
 
+ANDROID ION DRIVER
+M: Laura Abbott 
+M: Sumit Semwal 
+L: de...@driverdev.osuosl.org
+S: Supported
+F: drivers/staging/android/ion
+F: drivers/staging/android/uapi/ion.h
+F: drivers/staging/android/uapi/ion_test.h
+
 AOA (Apple Onboard Audio) ALSA DRIVER
 M: Johannes Berg 
 L: linuxppc-...@lists.ozlabs.org
-- 
2.5.5



[GIT PULL] SCSI fixes for 4.6-rc2

2016-04-08 Thread James Bottomley
This is a set of 8 fixes.  Two are trivial gcc-6 updates (brace
additions and unused variable removal).  There are a couple of cxlflash
regressions and a correction for sd being overly chatty on revalidation
(causing excess log increases).  There is also a fix for a VPD issue
which could crash USB devices because they seem very intolerant of VPD
inquiries, an ALUA deadlock fix and a mpt3sas buffer overrun fix.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Arnd Bergmann (1):
  aacraid: add missing curly braces

Bart Van Assche (2):
  scsi_dh_alua: Fix a recently introduced deadlock
  scsi: Declare local symbols static

Calvin Owens (1):
  mpt3sas: Don't overreach ioc->reply_post[] during initialization

Hannes Reinecke (1):
  scsi: Do not attach VPD to devices that don't support it

Manoj N. Kumar (2):
  cxlflash: Move to exponential back-off when cmd_room is not available
  cxlflash: Fix regression issue with re-ordering patch

Martin K. Petersen (1):
  sd: Fix excessive capacity printing on devices with blocks bigger than 
512 bytes
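
For the cxlflash back-off entry in the shortlog above, a small userspace sketch of how much total delay a linear versus an exponential back-off accumulates when room never becomes available. RETRY_CNT is an assumed value for illustration (the driver's MC_ROOM_RETRY_CNT may differ), and total_delay_us() is a hypothetical helper that only mirrors the shape of the driver's do/while retry loop.

```c
#include <assert.h>

/* Assumed retry limit for illustration only. */
#define RETRY_CNT 10

/*
 * Sum the delays of a loop shaped like
 *	do { ...; udelay(delay); } while (nretry++ < RETRY_CNT);
 * that never finds room. Linear uses nretry, exponential 1 << nretry.
 */
static unsigned long total_delay_us(int exponential)
{
	unsigned long total = 0;
	int nretry = 0;

	assert(RETRY_CNT < 31);	/* keep the shift well defined */

	do {
		total += exponential ? (1UL << nretry) : (unsigned long)nretry;
	} while (nretry++ < RETRY_CNT);

	return total;
}
```

Under these assumptions the loop body runs for nretry = 0..10, so the linear variant waits 55us in total and the exponential variant 2047us.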

And the diffstat

 drivers/scsi/aacraid/linit.c   |   3 +-
 drivers/scsi/cxlflash/main.c   | 138 -
 drivers/scsi/cxlflash/main.h   |   5 +-
 drivers/scsi/device_handler/scsi_dh_alua.c |   4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c|  33 ---
 drivers/scsi/scsi.c|   3 +-
 drivers/scsi/scsi_sysfs.c  |   8 +-
 drivers/scsi/sd.c  |  47 ++
 drivers/scsi/sd.h  |   7 +-
 include/scsi/scsi_device.h |  25 ++
 10 files changed, 164 insertions(+), 109 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 21a67ed..ff6caab 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -452,10 +452,11 @@ static int aac_slave_configure(struct scsi_device *sdev)
else if (depth < 2)
depth = 2;
scsi_change_queue_depth(sdev, depth);
-   } else
+   } else {
scsi_change_queue_depth(sdev, 1);
 
sdev->tagged_supported = 1;
+   }
 
return 0;
 }
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 35968bd..8fb9643 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -289,7 +289,7 @@ static void context_reset(struct afu_cmd *cmd)
atomic64_set(&afu->room, room);
if (room)
goto write_rrin;
-   udelay(nretry);
+   udelay(1 << nretry);
} while (nretry++ < MC_ROOM_RETRY_CNT);
 
pr_err("%s: no cmd_room to send reset\n", __func__);
@@ -303,7 +303,7 @@ write_rrin:
if (rrin != 0x1)
break;
/* Double delay each time */
-   udelay(2 << nretry);
+   udelay(1 << nretry);
} while (nretry++ < MC_ROOM_RETRY_CNT);
 }
 
@@ -338,7 +338,7 @@ retry:
atomic64_set(&afu->room, room);
if (room)
goto write_ioarrin;
-   udelay(nretry);
+   udelay(1 << nretry);
} while (nretry++ < MC_ROOM_RETRY_CNT);
 
dev_err(dev, "%s: no cmd_room to send 0x%X\n",
@@ -352,7 +352,7 @@ retry:
 * afu->room.
 */
if (nretry++ < MC_ROOM_RETRY_CNT) {
-   udelay(nretry);
+   udelay(1 << nretry);
goto retry;
}
 
@@ -683,28 +683,23 @@ static void stop_afu(struct cxlflash_cfg *cfg)
 }
 
 /**
- * term_mc() - terminates the master context
+ * term_intr() - disables all AFU interrupts
  * @cfg:   Internal structure associated with the host.
  * @level: Depth of allocation, where to begin waterfall tear down.
  *
  * Safe to call with AFU/MC in partially allocated/initialized state.
  */
-static void term_mc(struct cxlflash_cfg *cfg, enum undo_level level)
+static void term_intr(struct cxlflash_cfg *cfg, enum undo_level level)
 {
-   int rc = 0;
struct afu *afu = cfg->afu;
struct device *dev = &cfg->dev->dev;
 
if (!afu || !cfg->mcctx) {
-   dev_err(dev, "%s: returning from term_mc with NULL afu or MC\n",
-  __func__);
+   dev_err(dev, "%s: returning with NULL afu or MC\n", __func__);
return;
}
 
switch (level) {
-   case UNDO_START:
-   rc = cxl_stop_context(cfg->mcctx);
-   BUG_ON(rc);
case UNMAP_THREE:
cxl_unmap_afu_irq(cfg->mcctx, 3, afu);
case UNMAP_TWO:
@@ -713,9 +708,34 @@ static void term_mc(struct cxlflash_cfg *

Re: [PATCH] Revert "Input: atmel_mxt_ts - disable interrupt for 50ms after reset"

2016-04-08 Thread Tom Rini
On Fri, Apr 08, 2016 at 10:30:02PM +0100, Nick Dyer wrote:
> On 2016-04-08 13:39, Tom Rini wrote:
>  I have a Pixel 2 here - can you advise how to reproduce?
> >>>
> >>> I (and a bunch of other folks, the linux-samus people now point people
> >>> at using mxt-app every boot to reset the device) see this every time I
> >>> either suspend the laptop or do a warm boot into a new kernel (I didn't
> >>> try kexec but it too is probably broken).  Note that I'm not using
> >>> mainline to boot ChromeOS but I've got a regular Linux distro in ROOT-C.
> >>
> >> OK. I will try it. My Pixel is running Ubuntu with a mainline kernel, so
> >> should be able to repro.
> > 
> > Thanks.  Happy to test patches when you get there and feel free to shoot
> > me patches to have more info get dumped out or whatever if needed.
> 
> Could you try the below patch to correctly acquire the IRQ after soft reset on
> systems using IRQF_TRIGGER_FALLING.
> 
> Appears to work correctly on my Pixel 2 during a brief test.

This also works for me so:

Tested-by: Tom Rini 

... and adding in the linux-samus github project person so it can get
fixed there too.

On an unrelated note and since you have a Pixel 2 as well, the
touchscreen doesn't work for input after suspend (before and after this
patch) but is fine on cold and warm reboots.  Any chance you can debug
that one as well?  Thanks!

> 
> A workaround also seems to be to reconfig T18 COMMSCONFIG to enable
> the RETRIGEN bit using mxt-app:
> mxt-app -W -T18 44
> mxt-app --backup
> ---
>  drivers/input/touchscreen/atmel_mxt_ts.c | 28 ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/atmel_mxt_ts.c 
> b/drivers/input/touchscreen/atmel_mxt_ts.c
> index 2160512e..5af7907 100644
> --- a/drivers/input/touchscreen/atmel_mxt_ts.c
> +++ b/drivers/input/touchscreen/atmel_mxt_ts.c
> @@ -1093,6 +1093,19 @@ static int mxt_t6_command(struct mxt_data *data, u16 
> cmd_offset,
>   return 0;
>  }
>  
> +static int mxt_acquire_irq(struct mxt_data *data)
> +{
> + int error;
> +
> + enable_irq(data->irq);
> +
> + error = mxt_process_messages_until_invalid(data);
> + if (error)
> + return error;
> +
> + return 0;
> +}
> +
>  static int mxt_soft_reset(struct mxt_data *data)
>  {
>   struct device *dev = &data->client->dev;
> @@ -,7 +1124,7 @@ static int mxt_soft_reset(struct mxt_data *data)
>   /* Ignore CHG line for 100ms after reset */
>   msleep(100);
>  
> - enable_irq(data->irq);
> + mxt_acquire_irq(data);
>  
>   ret = mxt_wait_for_completion(data, &data->reset_completion,
> MXT_RESET_TIMEOUT);
> @@ -1466,19 +1479,6 @@ static int mxt_update_cfg(struct mxt_data *data, const 
> struct firmware *cfg)
>   return ret;
>  }
>  
> -static int mxt_acquire_irq(struct mxt_data *data)
> -{
> - int error;
> -
> - enable_irq(data->irq);
> -
> - error = mxt_process_messages_until_invalid(data);
> - if (error)
> - return error;
> -
> - return 0;
> -}
> -
>  static int mxt_get_info(struct mxt_data *data)
>  {
>   struct i2c_client *client = data->client;

-- 
Tom




Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-08 Thread Bjorn Andersson
On Fri 08 Apr 16:01 PDT 2016, Timur Tabi wrote:

> Bjorn Andersson wrote:
> 
> >It sounds like you're trying to say that the pins used can be
> >muxed as GPIO or MDIO, in the TLMM.
> 
> I'm not 100% sure, but I think that's correct.  If you don't want to have
> normal networking, you could connect those external pins to some GPIO device
> (like an LED or whatever), and then configure the pin muxing for GPIO
> purposes.  But if that's true, it's only true on the FSM9900. On the
> QDF2432, those lines are not connected to the TLMM.  They are instead
> hard-wired to the Emac.
> 

Then through proper use of the pinctrl framework you should configure
the FSM9900 to mux these pins appropriately and the two solutions are
equivalent.

> >In the downstream kernel this is often seen with the drivers calling
> >gpio_request() to "reserve" said pins, but all you should do is
> >describe the desired configuration and muxing in the pinctrl node,
> >reference that from your driver and simply ignore the fact that those
> >pins could have been used as GPIO pins.
> 
> That makes sense, but I think the driver already does that.
> 
> https://patchwork.ozlabs.org/patch/561667/
> 
> Function emac_probe_resources() has a call to of_get_named_gpio().  And then
> emac_mac_up() calls gpio_request().  As far as I can tell, that's it.
> 
> I'm guessing that the of_get_named_gpio() call needs to be changed somehow,
> but I'm not sure how.
> 

Thanks for the link.

In short those calls to the gpio framework should just be removed. They
should only be there if you're using the gpiolib to control the state of
those pins, and you're not as far as I can see.


The general outline of what you should have in your dts instead is:

soc {
tlmm {
compatible = "qcom,pinctrl-xyz";

mdio_pins_a: mdio {
state {
pins = "gpio0", "gpio1";
function = "mdio";
};
};
};

emac {
compatible = "qcom,somthing-emac";

pinctrl-names = "default";
pinctrl-0 = <&mdio_pins_a>;
};
};

Regards,
Bjorn


Re: [PATCH 4/4] arm64: pmu: add A72 cpu type, support multiple PMU types

2016-04-08 Thread kbuild test robot
Hi Jeremy,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.6-rc2 next-20160408]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jeremy-Linton/arm-pmu-Fix-non-devicetree-probing/20160409-060104
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux 
for-next/core
config: arm-multi_v5_defconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

   In file included from include/asm-generic/percpu.h:6:0,
from arch/arm/include/asm/percpu.h:50,
from include/linux/percpu.h:12,
from include/linux/topology.h:34,
from include/linux/gfp.h:8,
from include/linux/slab.h:14,
from include/linux/resource_ext.h:19,
from include/linux/acpi.h:26,
from drivers/perf/arm_pmu.c:14:
   drivers/perf/arm_pmu.c: In function 'probe_plat_pmu':
>> include/linux/percpu-defs.h:250:31: warning: initialization from 
>> incompatible pointer type
#define per_cpu_ptr(ptr, cpu) ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
  ^
>> drivers/perf/arm_pmu.c:880:33: note: in expansion of macro 'per_cpu_ptr'
  struct cpuinfo_arm64 *cinfo = per_cpu_ptr(&cpu_data, cpu);
^
>> drivers/perf/arm_pmu.c:881:29: error: dereferencing pointer to incomplete 
>> type
  unsigned int cpuid = cinfo->reg_midr;
^
   In file included from include/asm-generic/percpu.h:6:0,
from arch/arm/include/asm/percpu.h:50,
from include/linux/percpu.h:12,
from include/linux/topology.h:34,
from include/linux/gfp.h:8,
from include/linux/slab.h:14,
from include/linux/resource_ext.h:19,
from include/linux/acpi.h:26,
from drivers/perf/arm_pmu.c:14:
   drivers/perf/arm_pmu.c: In function 'arm_pmu_device_probe':
>> include/linux/percpu-defs.h:250:31: warning: initialization from 
>> incompatible pointer type
#define per_cpu_ptr(ptr, cpu) ({ (void)(cpu); VERIFY_PERCPU_PTR(ptr); })
  ^
   drivers/perf/arm_pmu.c:1030:34: note: in expansion of macro 'per_cpu_ptr'
   struct cpuinfo_arm64 *cinfo = per_cpu_ptr(&cpu_data, 0);
 ^
   drivers/perf/arm_pmu.c:1031:30: error: dereferencing pointer to incomplete 
type
   unsigned int cpuid = cinfo->reg_midr;
 ^

vim +881 drivers/perf/arm_pmu.c

   874  GFP_KERNEL);
   875  if (!pmu->irq_affinity)
   876  return -ENOMEM;
   877  }
   878  
   879  for_each_possible_cpu(cpu) {
 > 880  struct cpuinfo_arm64 *cinfo = per_cpu_ptr(&cpu_data, 
 > cpu);
 > 881  unsigned int cpuid = cinfo->reg_midr;
   882  
   883  if (cpuid == pmuid) {
   884  cpumask_set_cpu(cpu, &pmu->supported_cpus);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v1 06/12] serial: 8250_dma: stop ongoing RX DMA on exception

2016-04-08 Thread Peter Hurley
On 04/08/2016 01:07 AM, Andy Shevchenko wrote:
> On Fri, Apr 8, 2016 at 2:54 AM, Peter Hurley  wrote:
>> On 04/07/2016 01:37 PM, Andy Shevchenko wrote:
>>> If we get an exception interrupt, i.e. UART_IIR_RLSI, stop any ongoing RX DMA
>>> transfer, otherwise it might generate more spurious interrupts and make the
>>> port unavailable.
>>
>> Then how to know which rx byte the error is for if dma continues anyway?
>> What if there are multiple error bytes?
> 
> And how should it work?
> We get an interrupt during DMA, if we don't stop DMA it will be racy
> with direct readings.

It makes sense to me that the ongoing DMA needs paused, flushed & terminated,
but the UART should have already aborted the DMA at the first error byte,
so it doesn't make sense to me that the DMA hardware went sideways.

Have you verified that the actual byte in error is reported as the frame/parity
byte and that error-free data is unmangled? Like with a data pattern and a logic
analyzer?


>>
>>
>>> As has been seen on Intel Broxton system:
>>
>> This system shouldn't be setup for UART DMA imo.
> 
> Same approach is done in 8250_omap.

Well, omap8250 has totally different (and possibly unnecessary) rx dma flow.

During the development of the omap8250 driver, it was discovered that the
normal 8250 rx dma flow didn't work reliably on OMAP; ie., the rx dma wouldn't
start once rx uart interrupt had already happened.

*So omap8250 sets up rx dma before any data has been received*
That's the dma that is cancelled when an RLSI interrupt is received;
on OMAP the residue is always 0.

Well, it turns out that the omap8250 rx dma flow *may* be limited to only
1 specific design, the am335x, which has a bunch of other dma issues, with
both tx and rx dma. So all that omap8250 dma handling might be going
away anyway.

IOW, omap8250 is a terrible dma model; do not use.
[Granted the current model needs some work as well; e.g., using ping-pong
dma buffers to weather dmaengine descriptor completion latency.]

Regards,
Peter Hurley


Re: [PATCH 01/13] devpts: Teach /dev/ptmx to find the associated devpts via path lookup

2016-04-08 Thread Eric W. Biederman
Linus Torvalds  writes:

> But more fundamentally I still don't actually understand why you even
> really care.

At this point I care because there is a failure of communication.
Until this email no one has ever said:  "Ok that actually could happen
but we don't actually care."

Right now I am a bit paranoid because I have seen a few too many cases
where some little detail was glossed over and someone clever turned it
into a great big CVE they could drive a truck through.  So I am once
bitten twice shy and all of that.

> We get the wrong pts case *today*. We'd get a different wrong pts
> namespace when somebody tries to do odd things. Why would we care? It
> would be a _better_ guess.
>
> I don't see the security issue. If you do tricks to get pty's in
> another group, what's the problem? You have to do it consciously, and
> I don't see what the downside is. You get what you ask for, and I
> don't see a new attack surface.
>
> The whole "somebody used chmod on /dev/pts/" argument sounds bogus.
> That's an insane thing to do. If you want a private namespace, you
> make *all* of /dev private, you don't go "oh, I'll just make the pts
> subdirectory private".

Oh I pretty much agree it is an insane thing to do.  At the same time I
know that people can make a lot of little sane decisions that can lead
to an insane situation, so just because it is insane I can't rule
it out automatically.

The actual sane thing to do, and what I think most of userspace does
at this point, is to create its own mount namespace so nothing is
visible to outsiders.

> In other words, your whole scenario sounds totally made up to begin
> with. And even if it happens, I don't see what would be so disastrous
> about it.

In general I agree.  The scenario is made up.  I would be surprised if
it happens.

> I mean, right now, /dev/ptmx is world read-write in the root container
> and everybody gets access to the same underlying set of ptys. And
> that's not some horrible security issue. It's how things are
> *supposed* to work.

I agree.

> So I really don't see the argument. You guys are just making shit up.

I don't see why we have the linux extension of supporting anything
except mode 0666 on /dev/ptmx or /dev/pts/ptmx.  This is really about
not breaking that linux extension by overlooking some little detail.

On the attack analysis front the worst thing I can see happening is a
denial of service attack.  I see two possible denial of service attacks.
One possible attack creates a pty and prevents devpts from being
unmounted.  Another possible attack creates all possible ptys on a
devpts instance, and prevents legitimate tty creations from happening.

At the end of the day as you say it would be a pretty crazy person who
isolated a mount of devpts with just the permissions of /dev/pts/ptmx.
So if we don't want to care knowing those stupid attacks above are
possible I am happy not to care.  They don't look all that serious to
me.

Eric


Re: [PATCH] ion: scatterlist offset not used for buffer map

2016-04-08 Thread Colin Cross
On Thu, Apr 7, 2016 at 11:56 PM, John Einar Reitan
 wrote:
> On Thu, Apr 07, 2016 at 12:37:50PM -0700, Laura Abbott wrote:
>> On 04/07/2016 04:29 AM, John Einar Reitan wrote:
>> > ion's default user/kernel page mapping code doesn't honor the offset
>> > option for scatterlists. It uses sg_page and expects the whole page to be
>> > mapped, while the offset could dictate an offset within a large page.
>> >
>> > sg_phys correctly accounts for the offset, so should be used instead.
>> >
>>
>> Can you be more specific about which heap and which allocation pattern
>> is exposing this bug?
>
> The heap that exposed the bug is one I'm developing and will be posting
> as an RFC soon. It uses compound pages and sub-divides them into surface
> buffers. The ion buffers are configured to hold sgl's with the compound
> page and the correct offset of the buffer, via
> sg_set_page(.., compound_page, .., offset_of_logical_buffer);

I don't think this is right.  A compound_page still has a page struct
for every page, you should be passing the page struct where your data
starts.  Using an offset > PAGE_SIZE is going to break lots of places,
for example anywhere that uses kmap(sg_page(sg)).
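
As a rough userspace sketch of what I mean (the 4K page size and the names SKETCH_PAGE_SHIFT, struct sg_target and split_compound_offset() are assumptions for illustration, not kernel API): split the byte offset into a constituent page index plus an in-page offset, so the offset handed to the scatterlist always stays below the page size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4K pages for the sketch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

struct sg_target {
	size_t page_index;	/* which constituent page of the compound page */
	size_t offset;		/* offset within that page, always < page size */
};

/* Turn a byte offset into the compound page into (page, in-page offset). */
static struct sg_target split_compound_offset(size_t byte_offset)
{
	struct sg_target t;

	t.page_index = byte_offset >> SKETCH_PAGE_SHIFT;
	t.offset = byte_offset & (SKETCH_PAGE_SIZE - 1);
	assert(t.offset < SKETCH_PAGE_SIZE);

	return t;
}
```

The heap would then hand sg_set_page() the struct page for page_index together with the small in-page offset, instead of the head page and a large offset.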

> sg_phys/sg_virt  includes this offset, but if you poke the sg and extract
> the page with sg_page yourself you must include this offset in your
> calculations too.


[PATCH] [media] bt8xx: remove needless module refcounting

2016-04-08 Thread Alexey Khoroshilov
It is the responsibility of the caller of fops->open()
to make sure the owner of the fops is available until the file is closed.
So, there is no need to lock THIS_MODULE explicitly.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/media/pci/bt8xx/dst_ca.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/media/pci/bt8xx/dst_ca.c b/drivers/media/pci/bt8xx/dst_ca.c
index da8b414fd824..8681b9143a35 100644
--- a/drivers/media/pci/bt8xx/dst_ca.c
+++ b/drivers/media/pci/bt8xx/dst_ca.c
@@ -655,7 +655,6 @@ static long dst_ca_ioctl(struct file *file, unsigned int 
cmd, unsigned long ioct
 static int dst_ca_open(struct inode *inode, struct file *file)
 {
dprintk(verbose, DST_CA_DEBUG, 1, " Device opened [%p] ", file);
-   try_module_get(THIS_MODULE);
 
return 0;
 }
@@ -663,7 +662,6 @@ static int dst_ca_open(struct inode *inode, struct file 
*file)
 static int dst_ca_release(struct inode *inode, struct file *file)
 {
dprintk(verbose, DST_CA_DEBUG, 1, " Device closed.");
-   module_put(THIS_MODULE);
 
return 0;
 }
-- 
1.9.1



[PATCH v6 2/2] mm, thp: avoid unnecessary swapin in khugepaged

2016-04-08 Thread Ebru Akagunduz
Currently khugepaged performs swapin readahead to improve
the THP collapse rate. This patch checks vm statistics
to avoid the swapin work when it is unnecessary, so that
when the system is under pressure, khugepaged won't consume
resources on swapin and won't trigger direct reclaim
during swapin readahead.
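
The check can be sketched in isolation as below. This is a simplified userspace illustration, assuming the decision depends only on whether the allocation-stall counter has moved since it was last saved and on whether the mm has any swap entries; should_swapin() and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Swap in only when no allocation stalls were recorded since the
 * counter was last saved (no memory pressure) and there is something
 * to swap in at all.
 */
static bool should_swapin(unsigned long saved_allocstall,
			  unsigned long curr_allocstall,
			  unsigned long swap_entries)
{
	return saved_allocstall == curr_allocstall && swap_entries != 0;
}
```

For example, should_swapin(100, 100, 8) is true, while should_swapin(100, 105, 8) is false because the counter advanced in the meantime.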

The patch was tested with a test program that allocates
800MB of memory, writes to it, and then sleeps. The system
was forced to swap it all out. Afterwards, the test program
touches the area by writing, skipping one page in every
20 pages of the area. While waiting for swapin readahead of
the remaining part of the test, memory was kept busy
doing page reclaim. Although there was enough free memory during
the test, khugepaged did not do swapin readahead because the
system was busy.

Test results:

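After swapped out
-------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction  |
-------------------------------------------------------------------
With patch    | 0 kB      | 0 kB          | 80 kB     | %100      |
-------------------------------------------------------------------
Without patch | 0 kB      | 0 kB          | 80 kB     | %100      |
-------------------------------------------------------------------

After swapped in
-------------------------------------------------------------------
              | Anonymous | AnonHugePages | Swap      | Fraction  |
-------------------------------------------------------------------
With patch    | 385120 kB | 102400 kB     | 414880 kB | %26       |
-------------------------------------------------------------------
Without patch | 389728 kB | 194560 kB     | 410272 kB | %49       |
-------------------------------------------------------------------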

Signed-off-by: Ebru Akagunduz 
Acked-by: Rik van Riel 
---
Changes in v2:
 - Add reference to specify which patch fixed (Ebru Akagunduz)
 - Fix commit subject line (Ebru Akagunduz)

Changes in v3:
 - Remove default values of allocstall (Kirill A. Shutemov)

Changes in v4:
 - define unsigned long allocstall instead of unsigned long int
   (Vlastimil Babka)
 - compare allocstall when khugepaged goes to sleep
   (Rik van Riel, Vlastimil Babka)

Changes in v5:
 - Drop fixes sha part because fixed patch is not in Linus's tree
   (Michal Hocko)
 - Save allocstall where khugepaged exactly sleeps (Michal Hocko)

Changes in v6:
 - Fix build error (test robot)

Note: I didn't include the optimistic swapin and mmap_sem changes in
  this patch series; I haven't been able to get them working yet.
  I'll send them after this series is finished.

 mm/huge_memory.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 23740cd..ae99524 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -105,6 +105,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  */
 static unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly = HPAGE_PMD_NR/8;
+static unsigned long allocstall;
 
 static int khugepaged(void *none);
 static int khugepaged_slab_init(void);
@@ -2451,7 +2452,7 @@ static void collapse_huge_page(struct mm_struct *mm,
struct page *new_page;
spinlock_t *pmd_ptl, *pte_ptl;
int isolated = 0, result = 0;
-   unsigned long hstart, hend;
+   unsigned long hstart, hend, swap, curr_allocstall;
struct mem_cgroup *memcg;
unsigned long mmun_start;   /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
@@ -2506,7 +2507,14 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out;
}
 
-   __collapse_huge_page_swapin(mm, vma, address, pmd);
+   swap = get_mm_counter(mm, MM_SWAPENTS);
+   curr_allocstall = sum_vm_event(ALLOCSTALL);
+   /*
+* When system under pressure, don't swapin readahead.
+* So that avoid unnecessary resource consuming.
+*/
+   if (allocstall == curr_allocstall && swap != 0)
+   __collapse_huge_page_swapin(mm, vma, address, pmd);
 
anon_vma_lock_write(vma->anon_vma);
 
@@ -2900,14 +2908,17 @@ static void khugepaged_wait_work(void)
if (!khugepaged_scan_sleep_millisecs)
return;
 
+   allocstall = sum_vm_event(ALLOCSTALL);
wait_event_freezable_timeout(khugepaged_wait,
 kthread_should_stop(),
msecs_to_jiffies(khugepaged_scan_sleep_millisecs));
return;
}
 
-   if (khugepaged_enabled())
+   if (khugepaged_enabled()) {
+   allocstall = sum_vm_event(ALLOCSTALL);
wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
+   }
 }
 
 static int khugepaged(void *none)
@@ -2916,6 +2927,7 @@ static int khugepaged(void *none)
 
set_freezable();
set_user_nice(current, MAX_NICE);

Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-08 Thread Timur Tabi

Bjorn Andersson wrote:


It sounds like you're trying to say that the pins used can be
muxed as either GPIO or MDIO in the TLMM.


I'm not 100% sure, but I think that's correct.  If you don't want to 
have normal networking, you could connect those external pins to some 
GPIO device (like an LED or whatever), and then configure the pin muxing 
for GPIO purposes.  But if that's true, it's only true on the FSM9900. 
On the QDF2432, those lines are not connected to the TLMM.  They are 
instead hard-wired to the Emac.



In the downstream kernel this is often seen with the drivers calling
gpio_request() to "reserve" said pins, but all you should do is
describe the desired configuration and muxing in the pinctrl node,
reference that from your driver and simply ignore the fact that those
pins could have been used as GPIO pins.


That makes sense, but I think the driver already does that.

https://patchwork.ozlabs.org/patch/561667/

Function emac_probe_resources() has a call to of_get_named_gpio().  And 
then emac_mac_up() calls gpio_request().  As far as I can tell, that's it.


I'm guessing that the of_get_named_gpio() call needs to be changed 
somehow, but I'm not sure how.


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.


[PATCH v6 1/2] mm, vmstat: calculate particular vm event

2016-04-08 Thread Ebru Akagunduz
Currently, vmstat can only obtain the value of a specific vm event via
all_vm_events(), which sums all vm events at once. This patch introduces
a new function that sums only a single event at a time.
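
To make the difference concrete, here is a small userspace sketch of
the two approaches. It only mirrors the shape of the per-CPU summation;
the array, the enum, and the *_sketch names are hypothetical, and the
CPU hotplug locking that the kernel function takes is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical, shortened event list. */
enum vm_event_sketch {
    EV_PGFAULT_SKETCH,
    EV_ALLOCSTALL_SKETCH,
    EV_PSWPIN_SKETCH,
    NR_EVENTS_SKETCH
};

struct vm_event_state_sketch {
    unsigned long event[NR_EVENTS_SKETCH];
};

/* Stand-in for the per-CPU vm_event_states. */
struct vm_event_state_sketch per_cpu_states_sketch[NR_CPUS_SKETCH];

/* Sums every event over every CPU: NR_EVENTS * NR_CPUS reads. */
void all_events_sketch(unsigned long *ret)
{
    for (int i = 0; i < NR_EVENTS_SKETCH; i++) {
        ret[i] = 0;
        for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
            ret[i] += per_cpu_states_sketch[cpu].event[i];
    }
}

/* Sums one event over every CPU: only NR_CPUS reads. */
unsigned long sum_event_sketch(enum vm_event_sketch item)
{
    unsigned long ret = 0;

    assert((int)item >= 0 && (int)item < NR_EVENTS_SKETCH);
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        ret += per_cpu_states_sketch[cpu].event[item];
    return ret;
}
```

A caller that needs one counter avoids both the extra reads and the
full-size result buffer that the all-events variant requires.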

Signed-off-by: Ebru Akagunduz 
Suggested-by: Kirill A. Shutemov 
Acked-by: Kirill A. Shutemov 
Reviewed-by: Rik van Riel 
Acked-by: Vlastimil Babka 
Acked-by: Christoph Lameter 
---
Changes in v2:
 - this patch newly created in this version
 - create sum event function to
   calculate particular vm event (Kirill A. Shutemov)

Changes in v3:
 - add dummy definition of sum_vm_event
   when CONFIG_VM_EVENTS is not set
   (Kirill A. Shutemov)

Changes in v4:
 - add Suggested-by tag (Vlastimil Babka)

Changes in v5:
 - CC'ed Christoph Lameter  (Andrew Morton)

Changes in v6:
 - Fix commit log (Christoph Lameter)

 include/linux/vmstat.h |  6 ++
 mm/vmstat.c| 12 
 2 files changed, 18 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 02fce41..723be2c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -53,6 +53,8 @@ static inline void count_vm_events(enum vm_event_item item, 
long delta)
 
 extern void all_vm_events(unsigned long *);
 
+extern unsigned long sum_vm_event(enum vm_event_item item);
+
 extern void vm_events_fold_cpu(int cpu);
 
 #else
@@ -73,6 +75,10 @@ static inline void __count_vm_events(enum vm_event_item 
item, long delta)
 static inline void all_vm_events(unsigned long *ret)
 {
 }
+static inline unsigned long sum_vm_event(enum vm_event_item item)
+{
+   return 0;
+}
 static inline void vm_events_fold_cpu(int cpu)
 {
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 070fd90..d6b6c03 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -34,6 +34,18 @@
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
 EXPORT_PER_CPU_SYMBOL(vm_event_states);
 
+unsigned long sum_vm_event(enum vm_event_item item)
+{
+   int cpu;
+   unsigned long ret = 0;
+
+   get_online_cpus();
+   for_each_online_cpu(cpu)
+   ret += per_cpu(vm_event_states, cpu).event[item];
+   put_online_cpus();
+   return ret;
+}
+
 static void sum_vm_events(unsigned long *ret)
 {
int cpu;
-- 
1.9.1



[PATCH v6 0/2] mm, thp: Fix unnecessarry resource consuming in swapin

2016-04-08 Thread Ebru Akagunduz
This patch series fixes unnecessary resource consumption
in khugepaged swapin and introduces a new function to
calculate the value of a specific vm event.

Ebru Akagunduz (2):
  mm, vmstat: calculate particular vm event
  mm, thp: avoid unnecessary swapin in khugepaged

 include/linux/vmstat.h |  6 ++
 mm/huge_memory.c   | 18 +++---
 mm/vmstat.c| 12 
 3 files changed, 33 insertions(+), 3 deletions(-)

-- 
1.9.1



[PATCH v16 3/6] of, numa: Add NUMA of binding implementation.

2016-04-08 Thread David Daney
From: David Daney 

Add device tree parsing for NUMA topology using device
"numa-node-id" property in distance-map and cpu nodes.

This is a complete rewrite of a previous patch by:
   Ganapatrao Kulkarni

Signed-off-by: David Daney 
Acked-by: Rob Herring 
---
 drivers/of/Kconfig   |   3 +
 drivers/of/Makefile  |   1 +
 drivers/of/of_numa.c | 211 +++
 include/linux/of.h   |   9 +++
 4 files changed, 224 insertions(+)
 create mode 100644 drivers/of/of_numa.c

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index e2a4841..b3bec3a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -112,4 +112,7 @@ config OF_OVERLAY
  While this option is selected automatically when needed, you can
  enable it manually to improve device tree unit test coverage.
 
+config OF_NUMA
+   bool
+
 endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index 156c072..bee3fa9 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_OF_MTD)  += of_mtd.o
 obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
 obj-$(CONFIG_OF_RESOLVE)  += resolver.o
 obj-$(CONFIG_OF_OVERLAY) += overlay.o
+obj-$(CONFIG_OF_NUMA) += of_numa.o
 
 obj-$(CONFIG_OF_UNITTEST) += unittest-data/
diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
new file mode 100644
index 000..0f2784b
--- /dev/null
+++ b/drivers/of/of_numa.c
@@ -0,0 +1,211 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 - 2016 Cavium Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+
+/*
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now for all cpus.
+*/
+static void __init of_numa_parse_cpu_nodes(void)
+{
+   u32 nid;
+   int r;
+   struct device_node *cpus;
+   struct device_node *np = NULL;
+
+   cpus = of_find_node_by_path("/cpus");
+   if (!cpus)
+   return;
+
+   for_each_child_of_node(cpus, np) {
+   /* Skip things that are not CPUs */
+   if (of_node_cmp(np->type, "cpu") != 0)
+   continue;
+
+   r = of_property_read_u32(np, "numa-node-id", &nid);
+   if (r)
+   continue;
+
+   pr_debug("NUMA: CPU on %u\n", nid);
+   if (nid >= MAX_NUMNODES)
+   pr_warn("NUMA: Node id %u exceeds maximum value\n",
+   nid);
+   else
+   node_set(nid, numa_nodes_parsed);
+   }
+}
+
+static int __init of_numa_parse_memory_nodes(void)
+{
+   struct device_node *np = NULL;
+   struct resource rsrc;
+   u32 nid;
+   int r = 0;
+
+   for (;;) {
+   np = of_find_node_by_type(np, "memory");
+   if (!np)
+   break;
+
+   r = of_property_read_u32(np, "numa-node-id", &nid);
+   if (r == -EINVAL)
+   /*
+* property doesn't exist if -EINVAL, continue
+* looking for more memory nodes with
+* "numa-node-id" property
+*/
+   continue;
+   else if (r)
+   /* some other error */
+   break;
+
+   r = of_address_to_resource(np, 0, &rsrc);
+   if (r) {
+   pr_err("NUMA: bad reg property in memory node\n");
+   break;
+   }
+
+   pr_debug("NUMA:  base = %llx len = %llx, node = %u\n",
+rsrc.start, rsrc.end - rsrc.start + 1, nid);
+
+   r = numa_add_memblk(nid, rsrc.start,
+   rsrc.end - rsrc.start + 1);
+   if (r)
+   break;
+   }
+   of_node_put(np);
+
+   return r;
+}
+
+static int __init of_numa_parse_distance_map_v1(struct device_node *map)
+{
+   const __be32 *matrix;
+   int entry_count;
+   int i;
+
+   pr_info("NUMA: parsing numa-distance-map-v1\n");
+
+   matrix = of_get_property(map, "distance-matrix", NULL);
+   if (!matrix) {
+   pr_err("NUMA: No distance-matrix property in distance-map\n");
+   return -EINVAL;
+   }

[PATCH v16 1/6] efi: ARM/arm64: ignore DT memory nodes instead of removing them

2016-04-08 Thread David Daney
From: Ard Biesheuvel 

There are two problems with the UEFI stub DT memory node removal
routine:
- it deletes nodes as it traverses the tree, which happens to work
  but is not supported, as deletion invalidates the node iterator;
- deleting memory nodes entirely may discard annotations in the form
  of additional properties on the nodes.

Since the discovery of DT memory nodes occurs strictly before the
UEFI init sequence, we can simply clear the memblock memory table
before parsing the UEFI memory map. This way, it is no longer
necessary to remove the nodes, so we can remove that logic from the
stub as well.

Signed-off-by: Ard Biesheuvel 
Signed-off-by: David Daney 
---
 drivers/firmware/efi/arm-init.c|  8 
 drivers/firmware/efi/libstub/fdt.c | 24 +---
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index aa1f743..5d6945b 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -143,6 +143,14 @@ static __init void reserve_regions(void)
if (efi_enabled(EFI_DBG))
pr_info("Processing EFI memory map:\n");
 
+   /*
+* Discard memblocks discovered so far: if there are any at this
+* point, they originate from memory nodes in the DT, and UEFI
+* uses its own memory map instead.
+*/
+   memblock_dump_all();
+   memblock_remove(0, ULLONG_MAX);
+
for_each_efi_memory_desc(&memmap, md) {
paddr = md->phys_addr;
npages = md->num_pages;
diff --git a/drivers/firmware/efi/libstub/fdt.c 
b/drivers/firmware/efi/libstub/fdt.c
index 6dba78a..e58abfa 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void 
*orig_fdt,
unsigned long map_size, unsigned long desc_size,
u32 desc_ver)
 {
-   int node, prev, num_rsv;
+   int node, num_rsv;
int status;
u32 fdt_val32;
u64 fdt_val64;
@@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void 
*orig_fdt,
goto fdt_set_fail;
 
/*
-* Delete any memory nodes present. We must delete nodes which
-* early_init_dt_scan_memory may try to use.
-*/
-   prev = 0;
-   for (;;) {
-   const char *type;
-   int len;
-
-   node = fdt_next_node(fdt, prev, NULL);
-   if (node < 0)
-   break;
-
-   type = fdt_getprop(fdt, node, "device_type", &len);
-   if (type && strncmp(type, "memory", len) == 0) {
-   fdt_del_node(fdt, node);
-   continue;
-   }
-
-   prev = node;
-   }
-
-   /*
 * Delete all memory reserve map entries. When booting via UEFI,
 * kernel will use the UEFI memory map to find reserved regions.
 */
-- 
1.8.3.1



[PATCH v16 4/6] arm64: Move unflatten_device_tree() call earlier.

2016-04-08 Thread David Daney
From: David Daney 

In order to extract NUMA information from the device tree, we need to
have the tree in its unflattened form.

Move the call to bootmem_init() in the tail of paging_init() into
setup_arch, and adjust header files so that its declaration is
visible.

Move the unflatten_device_tree() call between the calls to
paging_init() and bootmem_init().  Follow on patches add NUMA handling
to bootmem_init().

Signed-off-by: David Daney 
---
 arch/arm64/include/asm/mmu.h |  1 +
 arch/arm64/kernel/setup.c| 13 +
 arch/arm64/mm/mm.h   |  1 -
 arch/arm64/mm/mmu.c  |  2 --
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 990124a..97b1d8f 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -29,6 +29,7 @@ typedef struct {
 #define ASID(mm)   ((mm)->context.id.counter & 0x)
 
 extern void paging_init(void);
+extern void bootmem_init(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
 extern void init_mem_pgprot(void);
 extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 9dc6776..9bd237e 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -327,6 +327,12 @@ void __init setup_arch(char **cmdline_p)
acpi_boot_table_init();
 
paging_init();
+
+   if (acpi_disabled)
+   unflatten_device_tree();
+
+   bootmem_init();
+
relocate_initrd();
 
kasan_init();
@@ -335,12 +341,11 @@ void __init setup_arch(char **cmdline_p)
 
early_ioremap_reset();
 
-   if (acpi_disabled) {
-   unflatten_device_tree();
+   if (acpi_disabled)
psci_dt_init();
-   } else {
+   else
psci_acpi_init();
-   }
+
xen_early_init();
 
cpu_read_bootcpu_ops();
diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
index ef47d99..71fe989 100644
--- a/arch/arm64/mm/mm.h
+++ b/arch/arm64/mm/mm.h
@@ -1,3 +1,2 @@
-extern void __init bootmem_init(void);
 
 void fixup_init(void);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f3e5c74..267903b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -564,8 +564,6 @@ void __init paging_init(void)
 */
memblock_free(__pa(swapper_pg_dir) + PAGE_SIZE,
  SWAPPER_DIR_SIZE - PAGE_SIZE);
-
-   bootmem_init();
 }
 
 /*
-- 
1.8.3.1



[PATCH v16 6/6] arm64, mm, numa: Add NUMA balancing support for arm64.

2016-04-08 Thread David Daney
From: Ganapatrao Kulkarni 

Enable NUMA balancing for arm64 platforms.
Add pte, pmd protnone helpers for use by automatic NUMA balancing.

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
Signed-off-by: David Daney 
---
 arch/arm64/Kconfig   |  1 +
 arch/arm64/include/asm/pgtable.h | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 99f9b55..a578080 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,7 @@ config ARM64
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_SUPPORTS_ATOMIC_RMW
+   select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 989fef1..89b8f20 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -272,6 +272,21 @@ static inline pgprot_t mk_sect_prot(pgprot_t prot)
return __pgprot(pgprot_val(prot) & ~PTE_TABLE_BIT);
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+/*
+ * See the comment in include/asm-generic/pgtable.h
+ */
+static inline int pte_protnone(pte_t pte)
+{
+   return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+   return pte_protnone(pmd_pte(pmd));
+}
+#endif
+
 /*
  * THP definitions.
  */
-- 
1.8.3.1



[PATCH v16 5/6] arm64, numa: Add NUMA support for arm64 platforms.

2016-04-08 Thread David Daney
From: Ganapatrao Kulkarni 

Attempt to get the memory and CPU NUMA node via of_numa.  If that
fails, default to the dummy NUMA node and map all memory and CPUs to
node 0.
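
The fallback behaviour can be pictured with the following userspace
sketch. It is only an illustration under assumed names: the parser
callbacks, the cpu_to_node_sketch table, and numa_init_sketch() are
hypothetical and do not correspond to functions in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

/* Hypothetical CPU-to-node table. */
int cpu_to_node_sketch[NR_CPUS_SKETCH];

static void map_all_to_node0_sketch(void)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        cpu_to_node_sketch[cpu] = 0;
}

/* Pretend parser: succeeds and puts CPUs 2 and 3 on node 1. */
int parse_two_nodes_sketch(void)
{
    cpu_to_node_sketch[2] = 1;
    cpu_to_node_sketch[3] = 1;
    return 0;
}

/* Pretend parser: writes partial state, then fails. */
int parse_broken_sketch(void)
{
    cpu_to_node_sketch[1] = 3;
    return -1;
}

/* Returns 0 if the parsed topology is used, 1 if the dummy node is. */
int numa_init_sketch(int (*parse)(void))
{
    map_all_to_node0_sketch();
    if (parse != NULL && parse() == 0)
        return 0;

    /* Discard anything a failed parser left behind. */
    map_all_to_node0_sketch();
    return 1;
}

/* Usage example: a good parser is honoured, a broken one is not. */
void numa_fallback_demo(void)
{
    assert(numa_init_sketch(parse_two_nodes_sketch) == 0);
    assert(cpu_to_node_sketch[2] == 1);
    assert(numa_init_sketch(parse_broken_sketch) == 1);
    assert(cpu_to_node_sketch[1] == 0);
}
```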

Tested-by: Shannon Zhao 
Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
Signed-off-by: David Daney 
---
 arch/arm64/Kconfig|  26 +++
 arch/arm64/include/asm/mmzone.h   |  12 ++
 arch/arm64/include/asm/numa.h |  45 +
 arch/arm64/include/asm/topology.h |  10 +
 arch/arm64/kernel/pci.c   |  10 +
 arch/arm64/kernel/setup.c |   4 +
 arch/arm64/kernel/smp.c   |   4 +
 arch/arm64/mm/Makefile|   1 +
 arch/arm64/mm/init.c  |  35 +++-
 arch/arm64/mm/numa.c  | 396 ++
 10 files changed, 538 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4f43622..99f9b55 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -76,6 +76,7 @@ config ARM64
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
+   select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
@@ -98,6 +99,7 @@ config ARM64
select SYSCTL_EXCEPTION_TRACE
select HAVE_CONTEXT_TRACKING
select HAVE_ARM_SMCCC
+   select OF_NUMA if NUMA && OF
help
  ARM 64-bit (AArch64) Linux support.
 
@@ -546,6 +548,30 @@ config HOTPLUG_CPU
  Say Y here to experiment with turning CPUs off and on.  CPUs
  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+   bool "Numa Memory Allocation and Scheduler Support"
+   depends on SMP
+   help
+ Enable NUMA (Non Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory of the CPU and add some more
+ NUMA awareness to the kernel.
+
+config NODES_SHIFT
+   int "Maximum NUMA Nodes (as a power of 2)"
+   range 1 10
+   default "2"
+   depends on NEED_MULTIPLE_NODES
+   help
+ Specify the maximum number of NUMA Nodes available on the target
+ system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+   def_bool y
+   depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz
 
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 000..a0de9e6
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,12 @@
+#ifndef __ASM_MMZONE_H
+#define __ASM_MMZONE_H
+
+#ifdef CONFIG_NUMA
+
+#include 
+
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_MMZONE_H */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 000..e9b4f29
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,45 @@
+#ifndef __ASM_NUMA_H
+#define __ASM_NUMA_H
+
+#include 
+
+#ifdef CONFIG_NUMA
+
+/* currently, arm64 implements flat NUMA topology */
+#define parent_node(node)  (node)
+
+int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+extern nodemask_t numa_nodes_parsed __initdata;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+void numa_clear_node(unsigned int cpu);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+   return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_free_distance(void);
+void __init early_map_cpu_to_node(unsigned int cpu, int nid);
+void numa_store_cpu_info(unsigned int cpu);
+
+#else  /* CONFIG_NUMA */
+
+static inline void numa_store_cpu_info(unsigned int cpu) { }
+static inline void arm64_numa_init(void) { }
+static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
+
+#endif /* CONFIG_NUMA */
+
+#endif /* __ASM_NUMA_H */
diff --git a/arch/arm64/include/asm/topology.h 
b/arch/arm64/include/asm/topology.h
index a3e9d6f..8b57339 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -22,6 +22,16 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#ifdef CONFIG_NUMA
+
+struct pci_bus;
+int pcibus_to_node(struct pci_bus *bus);
+#define cpumask_of_pcibus

[PATCH v16 2/6] Documentation, dt, numa: dt bindings for NUMA.

2016-04-08 Thread David Daney
From: Ganapatrao Kulkarni 

Add DT bindings for numa mapping of memory, CPUs and IOs.
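
As an informal illustration of the distance-matrix rules in the binding
text below, here is a userspace sketch that loads <from to distance>
triplets into a table and rejects entries that break the stated rules.
All names are hypothetical, the remote default of 20 is an assumption
(the binding only says a default matrix is used), and mirroring each
entry is a design choice made here because the binding states that
distances are equal in either direction.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES_SKETCH 4
#define LOCAL_DISTANCE_SKETCH 10u   /* node-to-self, per the binding */
#define REMOTE_DISTANCE_SKETCH 20u  /* assumed default for other pairs */

unsigned int node_distance_sketch[MAX_NODES_SKETCH][MAX_NODES_SKETCH];

/* Fill the table with defaults, as if no distance-map were present. */
void distance_reset_sketch(void)
{
    for (int i = 0; i < MAX_NODES_SKETCH; i++)
        for (int j = 0; j < MAX_NODES_SKETCH; j++)
            node_distance_sketch[i][j] = (i == j) ?
                LOCAL_DISTANCE_SKETCH : REMOTE_DISTANCE_SKETCH;
}

/* Returns 0 on success, -1 on a malformed or rule-breaking matrix. */
int distance_parse_sketch(const unsigned int *cells, size_t ncells)
{
    if (cells == NULL || ncells % 3 != 0)
        return -1;

    for (size_t i = 0; i < ncells; i += 3) {
        unsigned int from = cells[i];
        unsigned int to = cells[i + 1];
        unsigned int dist = cells[i + 2];

        if (from >= MAX_NODES_SKETCH || to >= MAX_NODES_SKETCH)
            return -1;
        if (from == to && dist != LOCAL_DISTANCE_SKETCH)
            return -1;  /* local distance must be 10 */
        if (from != to && dist <= LOCAL_DISTANCE_SKETCH)
            return -1;  /* internode distance must exceed 10 */

        node_distance_sketch[from][to] = dist;
        node_distance_sketch[to][from] = dist;
    }
    return 0;
}

/* The 4-node ring from the binding's example. */
const unsigned int ring_cells_sketch[] = {
    0, 0, 10,  0, 1, 20,  0, 2, 40,  0, 3, 20,
    1, 0, 20,  1, 1, 10,  1, 2, 20,  1, 3, 40,
    2, 0, 40,  2, 1, 20,  2, 2, 10,  2, 3, 20,
    3, 0, 20,  3, 1, 40,  3, 2, 20,  3, 3, 10,
};

/* Usage example: load the ring and check a two-hop distance. */
void distance_demo(void)
{
    size_t n = sizeof(ring_cells_sketch) / sizeof(ring_cells_sketch[0]);

    distance_reset_sketch();
    assert(distance_parse_sketch(ring_cells_sketch, n) == 0);
    assert(node_distance_sketch[0][2] == 40);
}
```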

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
Signed-off-by: David Daney 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/numa.txt | 275 +
 1 file changed, 275 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/numa.txt

diff --git a/Documentation/devicetree/bindings/numa.txt 
b/Documentation/devicetree/bindings/numa.txt
new file mode 100644
index 000..21b3505
--- /dev/null
+++ b/Documentation/devicetree/bindings/numa.txt
@@ -0,0 +1,275 @@
+==
+NUMA binding description.
+==
+
+==
+1 - Introduction
+==
+
+Systems employing a Non Uniform Memory Access (NUMA) architecture contain
+collections of hardware resources including processors, memory, and I/O buses,
+that comprise what is commonly known as a NUMA node.
+Processor accesses to memory within the local NUMA node are generally faster
+than processor accesses to memory outside of the local NUMA node.
+DT defines interfaces that allow the platform to convey NUMA node
+topology information to the OS.
+
+==
+2 - numa-node-id
+==
+
+For the purpose of identification, each NUMA node is associated with a unique
+token known as a node id. For the purpose of this binding
+a node id is a 32-bit integer.
+
+A device node is associated with a NUMA node by the presence of a
+numa-node-id property which contains the node id of the device.
+
+Example:
+   /* numa node 0 */
+   numa-node-id = <0>;
+
+   /* numa node 1 */
+   numa-node-id = <1>;
+
+==
+3 - distance-map
+==
+
+The optional device tree node distance-map describes the relative
+distance (memory latency) between all numa nodes.
+
+- compatible : Should at least contain "numa-distance-map-v1".
+
+- distance-matrix
+  This property defines a matrix to describe the relative distances
+  between all numa nodes.
+  It is represented as a list of node pairs and their relative distance.
+
+  Note:
+   1. Each entry represents distance from first node to second node.
+   The distances are equal in either direction.
+   2. The distance from a node to self (local distance) is represented
+   with value 10 and all internode distance should be represented with
+   a value greater than 10.
+   3. distance-matrix should have entries in lexicographical ascending
+   order of nodes.
+   4. There must be only one device node distance-map which must
+   reside in the root node.
+   5. If the distance-map node is not present, a default
+   distance-matrix is used.
+
+Example:
+   4 nodes connected in mesh/ring topology as below,
+
+   0___20__1
+   |   |
+   |   |
+   20 20
+   |   |
+   |   |
+   |___|
+   3   20  2
+
+   if relative distance for each hop is 20,
+   then internode distance would be,
+ 0 -> 1 = 20
+ 1 -> 2 = 20
+ 2 -> 3 = 20
+ 3 -> 0 = 20
+ 0 -> 2 = 40
+ 1 -> 3 = 40
+
+ and dt presentation for this distance matrix is,
+
+   distance-map {
+compatible = "numa-distance-map-v1";
+distance-matrix = <0 0  10>,
+  <0 1  20>,
+  <0 2  40>,
+  <0 3  20>,
+  <1 0  20>,
+  <1 1  10>,
+  <1 2  20>,
+  <1 3  40>,
+  <2 0  40>,
+  <2 1  20>,
+  <2 2  10>,
+  <2 3  20>,
+  <3 0  20>,
+  <3 1  40>,
+  <3 2  20>,
+  <3 3  10>;
+   };
+
+==
+4 - Example dts
+==

[PATCH v16 0/6] arm64, numa: Add numa support for arm64 platforms

2016-04-08 Thread David Daney
From: David Daney 

v16:

- No functional change.

- Rebase to v4.6-rc2 to avoid merge conflicts.

v15:

- Make the distance-map node optional (again), if it is not in
  the device tree, default values are used.

- Minor cleanups to of_numa.c as suggested by Rob Herring.

v14:
- Revised patch to unflatten the device tree earlier.

- Cleanups and added EXPORT_SYMBOL to of_numa.c as suggested
  by Rob Herring

v13:
- Added patch to unflatten the device tree earlier.

- Rewrote of_numa.c to work on the unflattened device tree.

- Cleanup of EXPORTs in arch/arm64/mm/numa.c as suggested by
  Will Deacon.

v12:

- Replaced 6 patches from Ard Biesheuvel with new simpler, and
  more correct, single patch, also from Ard.

v11:
- Dropped cleanup patches for other architectures, they will be
  submitted as a separate set after more testing.

- Added patch set from Ard Biesheuvel that is needed to make
  the whole thing actually work.  Previously this was a
  separate set.

- Kconfig and other fixes and simplifications as suggested by
  Rob Herring.

- Rearranged, refactored and reordered so that we don't patch
  new files multiple times.

- Summary:

o 6 patches from Ard Biesheuvel to allow use of
  "memory" nodes with efi stub.

o 2 patches to document and add of_numa.c

o 1 patch to add arm64 NUMA support.

o 1 patch to add NUMA balancing support for arm64.

v10:
- Incorporated review comments from Rob Herring.
- Moved numa binding and implementation to devicetree core.
- Added cleanup patch to remove redundant NODE_DATA macro from asm
  header files
- Include numa balancing support for arm64 patch in this series.
- Fix tile build issue reported by the kbuild robot(patch 7)

v9:
- Added cleanup patch to reuse and avoid redefinition of
  cpumask_of_pcibus as suggested by Will Deacon and Bjorn Helgaas.
- Included patch to make pci-host-generic driver numa aware.
- Incorporated comment from Shannon Zhao.

v8:
- Incorporated review comments of Mark Rutland and Will Deacon.
- Added pci helper function and macro for numa.

v7:
- managing numa memory mapping using memblock.
- Incorporated review comments of Mark Rutland.

v6:
- defined and implemented the numa dt binding using
node property proximity and device node distance-map.
- renamed dt_numa to of_numa

v5:
- Created base version of numa.c which creates dummy numa without using
  dt on single socket platforms. Then added patches for dt support.
- Incorporated review comments from Hanjun Guo.

v4:
Made changes as per Arnd's review comments.

v3:
Added changes to support numa on arm64 based platforms.
Tested these patches on cavium's multinode(2 node topology) platform.
In this patchset, defined and implemented dt bindings for numa mapping
for core and memory using device node property arm,associativity.

v2:
Defined and implemented numa map for memory, cores to node and
proximity distance matrix of nodes.

v1:
Initial patchset to support numa on arm64 platforms.

Note: 1. This patchset is tested for NUMA and without NUMA with dt
(both with and without NUMA bindings) on thunderx single
socket and dual socket boards.

Ard Biesheuvel (1):
  efi: ARM/arm64: ignore DT memory nodes instead of removing them

David Daney (2):
  of, numa: Add NUMA of binding implementation.
  arm64: Move unflatten_device_tree() call earlier.

Ganapatrao Kulkarni (3):
  Documentation, dt, numa: dt bindings for NUMA.
  arm64, numa: Add NUMA support for arm64 platforms.
  arm64, mm, numa: Add NUMA balancing support for arm64.

 Documentation/devicetree/bindings/numa.txt | 275 
 arch/arm64/Kconfig |  27 ++
 arch/arm64/include/asm/mmu.h   |   1 +
 arch/arm64/include/asm/mmzone.h|  12 +
 arch/arm64/include/asm/numa.h  |  45 
 arch/arm64/include/asm/pgtable.h   |  15 ++
 arch/arm64/include/asm/topology.h  |  10 +
 arch/arm64/kernel/pci.c|  10 +
 arch/arm64/kernel/setup.c  |  17 +-
 arch/arm64/kernel/smp.c|   4 +
 arch/arm64/mm/Makefile |   1 +
 arch/arm64/mm/init.c   |  35 ++-
 arch/arm64/mm/mm.h |   1 -
 arch/arm64/mm/mmu.c|   2 -
 arch/arm64/mm/numa.c   | 396 +
 drivers/firmware/efi/arm-init.c|   8 +
 drivers/firmware/efi/libstub/fdt.c |  24 +-
 drivers/of/Kconfig |   3 +
 drivers/of/Makefile|   1 +
 drivers/of

[PATCH] mm: memcontrol: let v2 cgroups follow changes in system swappiness

2016-04-08 Thread Johannes Weiner
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We
might want to add one later - that's a different discussion - but
until we do, the cgroups should always follow the system setting.
Otherwise it will be unchangeably set to whatever the ancestor
inherited from the system setting at the time of cgroup creation.
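
The intended lookup order can be sketched in userspace as follows. This
is an illustration only: struct memcg_sketch, the two globals, and
memcg_swappiness_sketch() are hypothetical stand-ins for the kernel's
mem_cgroup, vm_swappiness, the default-hierarchy check, and
mem_cgroup_swappiness().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the system-wide setting. */
int vm_swappiness_sketch = 60;

/* true models the memory controller running on the v2 hierarchy. */
bool on_default_hierarchy_sketch;

struct memcg_sketch {
    struct memcg_sketch *parent;
    int swappiness;  /* value copied in at cgroup creation time */
};

int memcg_swappiness_sketch(const struct memcg_sketch *memcg)
{
    assert(memcg != NULL);

    /* v2 has no per-cgroup knob: always follow the system setting. */
    if (on_default_hierarchy_sketch)
        return vm_swappiness_sketch;

    /* The root cgroup follows the system setting as well. */
    if (memcg->parent == NULL)
        return vm_swappiness_sketch;

    return memcg->swappiness;
}
```

Without the first check, a v2 child keeps returning whatever value it
inherited at creation, even after the system setting changes.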

Signed-off-by: Johannes Weiner 
Cc: sta...@vger.kernel.org # 4.5
---
 include/linux/swap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index e58dba3..15d17c8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -534,6 +534,10 @@ static inline swp_entry_t get_swap_page(void)
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
+   /* Cgroup2 doesn't have per-cgroup swappiness */
+   if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+   return vm_swappiness;
+
/* root ? */
if (mem_cgroup_disabled() || !memcg->css.parent)
return vm_swappiness;
-- 
2.8.0



Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-08 Thread Bjorn Andersson
On Fri, Apr 8, 2016 at 12:06 PM, Timur Tabi  wrote:
> Andrew Lunn wrote:
>
>> There are two different things here. One is configuring the pin to be
>> a GPIO. The second is using the GPIO as a GPIO. In this case,
>> bit-banging the MDIO bus.
>>
>> The firmware could be doing the configuration, setting the pin as a
>> GPIO. However, the firmware cannot be doing the MDIO bit-banging to
>> make an MDIO bus available. Linux has to do that.
>>
>> Or it could be we have all completely misunderstood the hardware, and
>> we are not doing bit-banging GPIO MDIO. There is a real MDIO
>> controller there, we don't use these pins as GPIOs, etc
>
>
> Actually, I think there is a misunderstanding.
>
> On the FSM9900 SOC (which uses device-tree), the two pins that connect to
> the external PHY are gpio pins.  However, the driver needs to reprogram the
> pinmux so that those pins are wired to the Emac controller.  That's what the
> gpio code in this driver is doing: it's just configuring the pins so
> that they connect directly between the Emac and the external PHY.  After
> that, they are no longer GPIO pins, and you cannot use the "GPIO controlled
> MDIO bus".  There is no MDIO controller on the SOC.  The external PHY is
> controlled directly from the Emac and also from the internal PHY.  It is
> screwy, I know, but that's what Gilad was trying to explain.
>

It sounds like you're trying to say that the pins used can be muxed
as either GPIO or MDIO in the TLMM.

In the downstream kernel this is often seen with the drivers calling
gpio_request() to "reserve" said pins, but all you should do is
describe the desired configuration and muxing in the pinctrl node,
reference that from your driver and simply ignore the fact that those
pins could have been used as GPIO pins.

Regards,
Bjorn


Re: [PATCH] ion: scatterlist offset not used for buffer map

2016-04-08 Thread Laura Abbott

On 04/07/2016 11:56 PM, John Einar Reitan wrote:
> On Thu, Apr 07, 2016 at 12:37:50PM -0700, Laura Abbott wrote:
>> On 04/07/2016 04:29 AM, John Einar Reitan wrote:
>>> ion's default user/kernel page mapping code doesn't honor the offset
>>> option for scatterlists. It uses sg_page and expects the whole page to be
>>> mapped, while the offset could dictate an offset within a large page.
>>>
>>> sg_phys correctly accounts for the offset, so should be used instead.
>>
>> Can you be more specific about which heap and which allocation pattern
>> is exposing this bug?
>
> The heap that exposed the bug is one I'm developing and will be posting
> as an RFC soon. It uses compound pages and sub-divides them into surface
> buffers. The ion buffers are configured to hold sgl's with the compound
> page and the correct offset of the buffer, via
> sg_set_page(.., compound_page, .., offset_of_logical_buffer);
>
> sg_phys/sg_virt include this offset, but if you poke the sg and extract
> the page with sg_page yourself you must include this offset in your
> calculations too.



This patch should be re-sent when you have the RFC for the heap. Unless
there is a heap available in tree we don't really need this patch.

Thanks,
Laura
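
The difference between the two lookups discussed in this thread can be shown with a small user-space model. This is a sketch under stated assumptions: `struct sg_model`, `model_page_phys()` and `model_sg_phys()` are hypothetical names that imitate `page_to_phys(sg_page(sg))` and `sg_phys(sg)`; they are not the kernel's scatterlist implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one scatterlist entry; not the kernel struct. */
struct sg_model {
	uint64_t page_phys;	/* physical address of the (compound) page */
	unsigned int offset;	/* offset of the buffer inside that page */
	unsigned int length;	/* length of the buffer in bytes */
};

/* Models page_to_phys(sg_page(sg)): the offset is ignored. */
static uint64_t model_page_phys(const struct sg_model *sg)
{
	assert(sg != NULL);
	return sg->page_phys;
}

/* Models sg_phys(sg): start of the page plus the entry's offset. */
static uint64_t model_sg_phys(const struct sg_model *sg)
{
	assert(sg != NULL);
	return sg->page_phys + sg->offset;
}
```

For a buffer carved out of a compound page at offset 0x3000, the first helper returns the start of the compound page and the second returns the start of the buffer, which is the mismatch the patch is about.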


Re: [PATCH v1 09/12] serial: 8250_lpss: split LPSS driver to separate module

2016-04-08 Thread Peter Hurley
On 04/08/2016 01:17 AM, Andy Shevchenko wrote:
> On Fri, Apr 8, 2016 at 4:42 AM, Peter Hurley  wrote:
>> On 04/07/2016 01:37 PM, Andy Shevchenko wrote:
>>> Intel SoCs, such as Braswell, have DesignWare UART. Split out to separate
>>> module which also will be used for Intel Quark later.
>>
>> What's the rationale?
> 
> 1. Not poison 8250_pci with too many quirks.
> 2. They all use same DMA engine, otherwise we might end up in all
> possible DMA engines included in one file.
> 3. All of them are actually DesignWare, so, in the future we might
> share code between 8250_dw and 8250_lpss.

Just my opinion, but I like to see the rationale in the changelog.


>> And this really isn't a split; this patch introduces a number of significant
>> changes from the pci version.
> 
> Some style changes, yes, but "significant"?
> For example?

I'm just pointing out that the changelog doesn't really match the
commit. I'm not necessarily suggesting you redo the series, just that the
changelog should more adequately reflect the change. See below.


>>
>>
>>> Signed-off-by: Andy Shevchenko 
>>> ---
>>>  drivers/tty/serial/8250/8250_lpss.c | 279 
>>> 
>>>  drivers/tty/serial/8250/8250_pci.c  | 227 ++---
>>>  drivers/tty/serial/8250/Kconfig |  14 +-
>>>  drivers/tty/serial/8250/Makefile|   1 +
>>>  4 files changed, 301 insertions(+), 220 deletions(-)
>>>  create mode 100644 drivers/tty/serial/8250/8250_lpss.c
>>>
>>> diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
>>> new file mode 100644
>>> index 000..bca4adb
>>> --- /dev/null
>>> +++ b/drivers/tty/serial/8250/8250_lpss.c
>>> @@ -0,0 +1,279 @@
>>> +/*
>>> + * 8250_lpss.c - Driver for UART on Intel Braswell and various other Intel SoCs
>>> + *
>>> + * Copyright (C) 2016 Intel Corporation
>>> + * Author: Andy Shevchenko 
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +#include "8250.h"
>>> +
>>> +#define PCI_DEVICE_ID_INTEL_BYT_UART1	0x0f0a
>>> +#define PCI_DEVICE_ID_INTEL_BYT_UART2	0x0f0c
>>> +
>>> +#define PCI_DEVICE_ID_INTEL_BSW_UART1	0x228a
>>> +#define PCI_DEVICE_ID_INTEL_BSW_UART2	0x228c
>>> +
>>> +#define PCI_DEVICE_ID_INTEL_BDW_UART1	0x9ce3
>>> +#define PCI_DEVICE_ID_INTEL_BDW_UART2	0x9ce4
>>> +
>>> +/* Intel LPSS specific registers */
>>> +
>>> +#define BYT_PRV_CLK  0x800
>>> +#define BYT_PRV_CLK_EN   BIT(0)
>>> +#define BYT_PRV_CLK_M_VAL_SHIFT  1
>>> +#define BYT_PRV_CLK_N_VAL_SHIFT  16
>>> +#define BYT_PRV_CLK_UPDATE   BIT(31)
>>> +
>>> +#define BYT_TX_OVF_INT   0x820
>>> +#define BYT_TX_OVF_INT_MASK  BIT(1)
>>> +
>>> +struct lpss8250;
>>> +
>>> +struct lpss8250_board {
>>> + unsigned long freq;
>>> + unsigned int base_baud;
>>> + int (*setup)(struct lpss8250 *, struct uart_port *p);
>>> +};
>>
>> New concept.
>>
>>> +
>>> +struct lpss8250 {
>>> + int line;
>>> + struct lpss8250_board *board;
>>> +
>>> + /* DMA parameters */
>>> + struct uart_8250_dma dma;
>>> + struct dw_dma_slave dma_param;
>>> + u8 dma_maxburst;
>>> +};
>>> +
>>> +/*/
>>
>> Please remove.
>>
>>> +
>>> +static void lpss8250_set_termios(struct uart_port *p,
>>> +  struct ktermios *termios,
>>> +  struct ktermios *old)
>>> +{
>>> + unsigned int baud = tty_termios_baud_rate(termios);
>>> + struct lpss8250 *lpss = p->private_data;
>>> + unsigned long fref = lpss->board->freq, fuart = baud * 16;
>>> + unsigned long w = BIT(15) - 1;
>>> + unsigned long m, n;
>>> + u32 reg;
>>> +
>>> + /* Get Fuart closer to Fref */
>>> + fuart *= rounddown_pow_of_two(fref / fuart);
>>> +
>>> + /*
>>> +  * For baud rates 0.5M, 1M, 1.5M, 2M, 2.5M, 3M, 3.5M and 4M the
>>> +  * dividers must be adjusted.
>>> +  *
>>> +  * uartclk = (m / n) * 100 MHz, where m <= n
>>> +  */
>>> + rational_best_approximation(fuart, fref, w, w, &m, &n);
>>> + p->uartclk = fuart;
>>> +
>>> + /* Reset the clock */
>>> + reg = (m << BYT_PRV_CLK_M_VAL_SHIFT) | (n << BYT_PRV_CLK_N_VAL_SHIFT);
>>> + writel(reg, p->membase + BYT_PRV_CLK);
>>> + reg |= BYT_PRV_CLK_EN | BYT_PRV_CLK_UPDATE;
>>> + writel(reg, p->membase + BYT_PRV_CLK);
>>> +
>>> + p->status &= ~UPSTAT_AUTOCTS;
>>> + if (termios->c_cflag & CRTSCTS)
>>> + p->status |= UPSTAT_AUTOCTS;
>>> +
>>> + serial8250_do_set_termios(p, termios, old);
>>> +}
>>> +
>>> +/**

Re: [Xen-devel] Does __KERNEL_DS serve a purpose?

2016-04-08 Thread Andrew Cooper
On 08/04/16 23:06, Andy Lutomirski wrote:
> On Fri, Apr 8, 2016 at 10:12 AM, Paolo Bonzini  wrote:
>>
>> On 08/04/2016 18:00, Andy Lutomirski wrote:
>>> But %ss can be loaded with 0 on 64-bit kernels.  (I assume that
>>> loading 0 into %ss sets SS.DPL to 0 if done at CPL0, but I'm vague on
>>> this, since it only really matters to hypervisor code AFAIK.)
>> It's even simpler, unless CPL=0 SS cannot be loaded with 0 while in a
>> 64-bit code segment (SS can never be loaded with 0 if you're not in a
>> 64-bit code segment).
>>
>> Thus indeed SS=0 implies SS.DPL=0 on 64-bit kernels.
> I think we are stuck with __KERNEL_DS: SYSCALL uses it.

SYSCALL expects the OS to keep the programmed selector in sync with its
descriptor entry.  It specifically loads fixed attributes, and doesn't
re-read the GDT.

> Unless we start fiddling with conforming code segments (ugh)

I don't see how this would help.

> , I don't think
> there's a valid GDT layout that doesn't have two flat data segments.

My gut feeling is that nothing good can possibly come of having the GDT
entry out of sync with the fixed attributes SYSCALL loads.  It would
break code which manually reloads %ss, for example by constructing an IRET
frame using PUSH %ss.

> Oh well, chalk it up to historical accident.

Feel very glad that SYSCALL and SYSENTER (appear to) behave identically
in their expectations of GDT layout and fixed attributes...

I for one wouldn't bet on it, knowing the x86 architecture.

~Andrew
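
The fixed selector arithmetic behind this discussion can be written down as a small user-space sketch. The offsets (+8 for SS on SYSCALL, +16 and +8 for CS and SS on a 64-bit SYSRET) follow the architectural definition of the instructions; the helper names and the example selector values used with them are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Build an IA32_STAR value from the two selector bases it holds. */
static uint64_t make_star(uint16_t sysret_base, uint16_t syscall_base)
{
	/* SYSCALL enters CPL0, so its base selector must have RPL 0. */
	assert((syscall_base & 3) == 0);
	return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_base << 32);
}

/* SYSCALL: CS comes from STAR[47:32], with the RPL bits cleared. */
static uint16_t syscall_cs(uint64_t star)
{
	return (uint16_t)((star >> 32) & 0xfffc);
}

/* SYSCALL: SS is always the next GDT slot after CS. */
static uint16_t syscall_ss(uint64_t star)
{
	return (uint16_t)(((star >> 32) & 0xffff) + 8);
}

/* SYSRET to 64-bit mode: CS is STAR[63:48] + 16, forced to RPL 3. */
static uint16_t sysret64_cs(uint64_t star)
{
	return (uint16_t)((((star >> 48) & 0xffff) + 16) | 3);
}

/* SYSRET: SS is STAR[63:48] + 8, forced to RPL 3. */
static uint16_t sysret64_ss(uint64_t star)
{
	return (uint16_t)((((star >> 48) & 0xffff) + 8) | 3);
}
```

With a SYSCALL base of 0x10 and a SYSRET base of 0x23 (values assumed here for illustration), the kernel stack selector lands at 0x18 and the user one at 0x2b. Each needs its own flat data descriptor directly after the corresponding code selector base, which is why the layout ends up with two flat data segments.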


[net-next][PATCH 0/2] RDS: couple of fixes for 4.6

2016-04-08 Thread Santosh Shilimkar
The patches are also available in the git tree below.

git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git 
for_4.6/net-next/rds-fixes

Qing Huang (1):
  RDS: fix endianness for dp_ack_seq

Santosh Shilimkar (1):
  RDS: Fix the atomicity for congestion map update

 net/rds/cong.c  | 4 ++--
 net/rds/ib_cm.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

-- 
1.9.1



[net-next][PATCH 2/2] RDS: Fix the atomicity for congestion map update

2016-04-08 Thread Santosh Shilimkar
Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Let's use atomics to avoid such an issue.

Full credit to Wengang for
finding the issue, analysing it and also pointing out
the offending code with a spin-lock-based fix.

Reviewed-by: Leon Romanovsky 
Signed-off-by: Wengang Wang 
Signed-off-by: Santosh Shilimkar 
---
 net/rds/cong.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/cong.c b/net/rds/cong.c
index e6144b8..6641bcf 100644
--- a/net/rds/cong.c
+++ b/net/rds/cong.c
@@ -299,7 +299,7 @@ void rds_cong_set_bit(struct rds_cong_map *map, __be16 port)
 	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
 	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;
 
-	__set_bit_le(off, (void *)map->m_page_addrs[i]);
+	set_bit_le(off, (void *)map->m_page_addrs[i]);
 }
 
 void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port)
@@ -313,7 +313,7 @@ void rds_cong_clear_bit(struct rds_cong_map *map, __be16 port)
 	i = be16_to_cpu(port) / RDS_CONG_MAP_PAGE_BITS;
 	off = be16_to_cpu(port) % RDS_CONG_MAP_PAGE_BITS;
 
-	__clear_bit_le(off, (void *)map->m_page_addrs[i]);
+	clear_bit_le(off, (void *)map->m_page_addrs[i]);
 }
 
 static int rds_cong_test_bit(struct rds_cong_map *map, __be16 port)
-- 
1.9.1
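
The lost-update problem the changelog above describes can be illustrated in user space with C11 atomics. This is a sketch under assumptions, not the RDS code: `cong_word` and the `cong_*` helpers are hypothetical, a single 64-bit word stands in for the congestion map, and the little-endian bit numbering of the kernel's `*_bit_le()` helpers is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define WORD_BITS 64u

/* Hypothetical stand-in for one word of the congestion map. */
static _Atomic uint64_t cong_word;

/*
 * Non-atomic variant, analogous to __set_bit_le(): a separate load, OR
 * and store.  Two threads hitting the same word can both load the old
 * value, and the later store then discards the other thread's bit.
 */
static void cong_set_bit_racy(unsigned int off)
{
	uint64_t old;

	assert(off < WORD_BITS);
	old = atomic_load_explicit(&cong_word, memory_order_relaxed);
	atomic_store_explicit(&cong_word, old | (UINT64_C(1) << off),
			      memory_order_relaxed);
}

/* Atomic variant, analogous to set_bit_le(): one indivisible update. */
static void cong_set_bit(unsigned int off)
{
	assert(off < WORD_BITS);
	atomic_fetch_or(&cong_word, UINT64_C(1) << off);
}

/* Atomic variant, analogous to clear_bit_le(). */
static void cong_clear_bit(unsigned int off)
{
	assert(off < WORD_BITS);
	atomic_fetch_and(&cong_word, ~(UINT64_C(1) << off));
}

static int cong_test_bit(unsigned int off)
{
	assert(off < WORD_BITS);
	return (int)((atomic_load(&cong_word) >> off) & 1);
}
```

Single-threaded, both set variants give the same result; the difference only shows when two updates to the same word interleave, which is exactly the case of two ports mapping to one word.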



[net-next][PATCH 1/2] RDS: fix endianness for dp_ack_seq

2016-04-08 Thread Santosh Shilimkar
From: Qing Huang 

dp->dp_ack_seq is used in big-endian format. We need to convert
to big endian when we assign a value in host format
to it.

Signed-off-by: Qing Huang 
Signed-off-by: Santosh Shilimkar 
---
 net/rds/ib_cm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 8764970..310cabc 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -194,7 +194,7 @@ static void rds_ib_cm_fill_conn_param(struct rds_connection *conn,
 		dp->dp_protocol_major = RDS_PROTOCOL_MAJOR(protocol_version);
 		dp->dp_protocol_minor = RDS_PROTOCOL_MINOR(protocol_version);
 		dp->dp_protocol_minor_mask = cpu_to_be16(RDS_IB_SUPPORTED_PROTOCOLS);
-		dp->dp_ack_seq = rds_ib_piggyb_ack(ic);
+		dp->dp_ack_seq = cpu_to_be64(rds_ib_piggyb_ack(ic));
 
 		/* Advertise flow control */
 		if (ic->i_flowctl) {
-- 
1.9.1
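
What the conversion in this patch achieves can be sketched portably in user space. The names `struct wire_ack`, `put_be64()` and `get_be64()` are hypothetical; they play the role of a `__be64` field, `cpu_to_be64()` on the sender and `be64_to_cpu()` on the receiver, and are not part of RDS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire-format field: 8 bytes, most significant byte first. */
struct wire_ack {
	unsigned char ack_seq[8];
};

/* Store a host-order value in big-endian byte order. */
static void put_be64(unsigned char out[8], uint64_t v)
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Read a big-endian field back into host order. */
static uint64_t get_be64(const unsigned char in[8])
{
	uint64_t v = 0;

	assert(in != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}
```

Assigning the host-order value directly, as the old code did, only produces the right bytes on a big-endian machine; on a little-endian host the peer would read a byte-swapped sequence number.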



[PATCH RFT 2/2] macb: kill PHY reset code

2016-04-08 Thread Sergei Shtylyov
With the 'phylib' now being aware of the "reset-gpios" PHY node property,
there should be no need to frob the PHY reset in this driver anymore...

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/cadence/macb.c |   17 -
 drivers/net/ethernet/cadence/macb.h |1 -
 2 files changed, 18 deletions(-)

Index: net-next/drivers/net/ethernet/cadence/macb.c
===
--- net-next.orig/drivers/net/ethernet/cadence/macb.c
+++ net-next/drivers/net/ethernet/cadence/macb.c
@@ -2884,7 +2884,6 @@ static int macb_probe(struct platform_de
  = macb_clk_init;
int (*init)(struct platform_device *) = macb_init;
struct device_node *np = pdev->dev.of_node;
-   struct device_node *phy_node;
const struct macb_config *macb_config = NULL;
struct clk *pclk, *hclk = NULL, *tx_clk = NULL;
unsigned int queue_mask, num_queues;
@@ -2977,18 +2976,6 @@ static int macb_probe(struct platform_de
else
macb_get_hwaddr(bp);
 
-   /* Power up the PHY if there is a GPIO reset */
-   phy_node =  of_get_next_available_child(np, NULL);
-   if (phy_node) {
-   int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-
-   if (gpio_is_valid(gpio)) {
-   bp->reset_gpio = gpio_to_desc(gpio);
-   gpiod_direction_output(bp->reset_gpio, 1);
-   }
-   }
-   of_node_put(phy_node);
-
err = of_get_phy_mode(np);
if (err < 0) {
pdata = dev_get_platdata(&pdev->dev);
@@ -3054,10 +3041,6 @@ static int macb_remove(struct platform_d
mdiobus_unregister(bp->mii_bus);
mdiobus_free(bp->mii_bus);
 
-   /* Shutdown the PHY if there is a GPIO reset */
-   if (bp->reset_gpio)
-   gpiod_set_value(bp->reset_gpio, 0);
-
unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);
clk_disable_unprepare(bp->hclk);
Index: net-next/drivers/net/ethernet/cadence/macb.h
===
--- net-next.orig/drivers/net/ethernet/cadence/macb.h
+++ net-next/drivers/net/ethernet/cadence/macb.h
@@ -832,7 +832,6 @@ struct macb {
unsigned intdma_burst_length;
 
phy_interface_t phy_interface;
-   struct gpio_desc*reset_gpio;
 
/* AT91RM9200 transmit */
struct sk_buff *skb;/* holds skb until xmit 
interrupt completes */



[PATCH RFT 1/2] phylib: add device reset GPIO support

2016-04-08 Thread Sergei Shtylyov
The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees,
led to adding the PHY reset GPIO properties to the MAC device node, with
one exception: Cadence MACB driver which could handle the "reset-gpios"
prop in a PHY device subnode. I believe that the correct approach is to
teach the 'phylib' to get the MDIO device reset GPIO from the device tree
node corresponding to this device -- which this patch is doing...

Note that I had to modify the AT803x PHY driver as it would stop working
otherwise as it made use of the reset GPIO for its own purposes...

Signed-off-by: Sergei Shtylyov 

---
 Documentation/devicetree/bindings/net/phy.txt |2 +
 drivers/net/phy/at803x.c  |   19 ++
 drivers/net/phy/mdio_bus.c|4 +++
 drivers/net/phy/mdio_device.c |   27 +++--
 drivers/net/phy/phy_device.c  |   33 --
 drivers/of/of_mdio.c  |   16 
 include/linux/mdio.h  |3 ++
 include/linux/phy.h   |5 +++
 8 files changed, 89 insertions(+), 20 deletions(-)

Index: net-next/Documentation/devicetree/bindings/net/phy.txt
===
--- net-next.orig/Documentation/devicetree/bindings/net/phy.txt
+++ net-next/Documentation/devicetree/bindings/net/phy.txt
@@ -35,6 +35,8 @@ Optional Properties:
 - broken-turn-around: If set, indicates the PHY device does not correctly
   release the turn around line low at the end of a MDIO transaction.
 
+- reset-gpios: The GPIO phandle and specifier for the PHY reset signal.
+
 Example:
 
 ethernet-phy@0 {
Index: net-next/drivers/net/phy/at803x.c
===
--- net-next.orig/drivers/net/phy/at803x.c
+++ net-next/drivers/net/phy/at803x.c
@@ -65,7 +65,6 @@ MODULE_LICENSE("GPL");
 
 struct at803x_priv {
bool phy_reset:1;
-   struct gpio_desc *gpiod_reset;
 };
 
 struct at803x_context {
@@ -271,22 +270,10 @@ static int at803x_probe(struct phy_devic
 {
struct device *dev = &phydev->mdio.dev;
struct at803x_priv *priv;
-   struct gpio_desc *gpiod_reset;
 
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
-
-   if (phydev->drv->phy_id != ATH8030_PHY_ID)
-   goto does_not_require_reset_workaround;
-
-   gpiod_reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
-   if (IS_ERR(gpiod_reset))
-   return PTR_ERR(gpiod_reset);
-
-   priv->gpiod_reset = gpiod_reset;
-
-does_not_require_reset_workaround:
phydev->priv = priv;
 
return 0;
@@ -361,14 +348,14 @@ static void at803x_link_change_notify(st
 */
if (phydev->drv->phy_id == ATH8030_PHY_ID) {
if (phydev->state == PHY_NOLINK) {
-   if (priv->gpiod_reset && !priv->phy_reset) {
+   if (phydev->mdio.reset && !priv->phy_reset) {
struct at803x_context context;
 
at803x_context_save(phydev, &context);
 
-   gpiod_set_value(priv->gpiod_reset, 1);
+   phy_device_reset(phydev, 1);
msleep(1);
-   gpiod_set_value(priv->gpiod_reset, 0);
+   phy_device_reset(phydev, 0);
msleep(1);
 
at803x_context_restore(phydev, &context);
Index: net-next/drivers/net/phy/mdio_bus.c
===
--- net-next.orig/drivers/net/phy/mdio_bus.c
+++ net-next/drivers/net/phy/mdio_bus.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -371,6 +372,9 @@ void mdiobus_unregister(struct mii_bus *
if (!mdiodev)
continue;
 
+   if (mdiodev->reset)
+   gpiod_put(mdiodev->reset);
+
mdiodev->device_remove(mdiodev);
mdiodev->device_free(mdiodev);
}
Index: net-next/drivers/net/phy/mdio_device.c
===
--- net-next.orig/drivers/net/phy/mdio_device.c
+++ net-next/drivers/net/phy/mdio_device.c
@@ -12,6 +12,8 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -103,6 +105,13 @@ void mdio_device_remove(struct mdio_devi
 }
 EXPORT_SYMBOL(mdio_device_remove);
 
+void mdio

[PATCH RFT 0/2] Teach phylib hard-resetting devices

2016-04-08 Thread Sergei Shtylyov
Hello.

   Here's the set of 2 patches against DaveM's 'net-next.git' repo. They add to
'phylib' support for resetting devices via GPIO and do some clean up after
doing that...

[1/2] phylib: add device reset GPIO support
[2/2] macb: kill PHY reset code

MBR, Sergei



RE: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-08 Thread Doug Smythies
On 2016.04.08 14:00 Rafael J. Wysocki wrote:
> On Friday, April 08, 2016 08:50:54 AM Mike Galbraith wrote:
>> On Fri, 2016-04-08 at 08:45 +0200, Peter Zijlstra wrote:
>> 
>>> Cute, I thought you used governor=performance for your runs?
>> 
>> I do, and those numbers are with it thus set.

> Well, this is a trade-off.
>
> 4.5 introduced a power regression here so this one goes back to the previous
> state of things.

Mike:

Could you send me, or point me to, the program "pipe-test"?
So far, I have only found one, but it is both old and not
the same program you are running (based on print statements).

I realize I might not be able to recreate your problem scenario anyhow,
I just want to try.

... Doug




Re: [PATCH net-next 1/8] perf: optimize perf_fetch_caller_regs

2016-04-08 Thread Steven Rostedt
On Tue, 5 Apr 2016 14:06:26 +0200
Peter Zijlstra  wrote:

> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> > avoid memset in perf_fetch_caller_regs, since it's the critical path of all
> > tracepoints. It's called from perf_sw_event_sched, perf_event_task_sched_in
> > and all of perf_trace_##call with this_cpu_ptr(&__perf_regs[..]) which are
> > zero initialized by percpu_alloc
> 
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP IIRC.
> 
> > and
> > subsequent call to perf_arch_fetch_caller_regs initializes the same fields 
> > on all archs,
> > so we can safely drop memset from all of the above cases and   
> 
> Indeed.
> 
> > move it into
> > perf_ftrace_function_call that calls it with stack allocated pt_regs.  
> 
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?

Well, what do you do when you are tracing with regs in an interrupt
that already set the per cpu regs field? We could create our own
per-cpu one as well, but then that would require checking which level
we are in, as we can have one for normal context, one for softirq
context, one for irq context and one for nmi context.

-- Steve



> 
> > Signed-off-by: Alexei Starovoitov   
> 
> In any case,
> 
> Acked-by: Peter Zijlstra (Intel) 



Re: [Xen-devel] Does __KERNEL_DS serve a purpose?

2016-04-08 Thread Andy Lutomirski
On Fri, Apr 8, 2016 at 10:12 AM, Paolo Bonzini  wrote:
>
>
> On 08/04/2016 18:00, Andy Lutomirski wrote:
>> But %ss can be loaded with 0 on 64-bit kernels.  (I assume that
>> loading 0 into %ss sets SS.DPL to 0 if done at CPL0, but I'm vague on
>> this, since it only really matters to hypervisor code AFAIK.)
>
> It's even simpler, unless CPL=0 SS cannot be loaded with 0 while in a
> 64-bit code segment (SS can never be loaded with 0 if you're not in a
> 64-bit code segment).
>
> Thus indeed SS=0 implies SS.DPL=0 on 64-bit kernels.

I think we are stuck with __KERNEL_DS: SYSCALL uses it.  Unless we
start fiddling with conforming code segments (ugh), I don't think
there's a valid GDT layout that doesn't have two flat data segments.

Oh well, chalk it up to historical accident.


[PATCH 1/4] arm: pmu: Fix non-devicetree probing

2016-04-08 Thread Jeremy Linton
From: Mark Salter 

There is a problem in the non-devicetree PMU probing where some
probe functions may get the number of supported events through
smp_call_function_any() using the arm_pmu supported_cpus mask.
But at the time the probe function is called, the supported_cpus
mask is empty so the call fails. This patch makes sure the mask
is set before calling the init function rather than after.

Signed-off-by: Mark Salter 
Signed-off-by: Jeremy Linton 
---
 drivers/perf/arm_pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 32346b5..49fa845 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -997,8 +997,8 @@ int arm_pmu_device_probe(struct platform_device *pdev,
 		if (!ret)
 			ret = init_fn(pmu);
 	} else {
-		ret = probe_current_pmu(pmu, probe_table);
 		cpumask_setall(&pmu->supported_cpus);
+		ret = probe_current_pmu(pmu, probe_table);
 	}
 
 	if (ret) {
-- 
2.4.3
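
The ordering problem fixed by this patch can be modelled in a few lines of user-space C. Everything here is a hypothetical sketch: `struct pmu_model`, `call_on_any()` (standing in for `smp_call_function_any()`), the error code it returns and the event count of 6 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the parts of struct arm_pmu that matter here. */
struct pmu_model {
	uint64_t supported_cpus;	/* one bit per CPU */
	int num_events;
};

/* Models smp_call_function_any(): fails when the mask names no CPU. */
static int call_on_any(uint64_t mask, void (*fn)(void *), void *arg)
{
	if (mask == 0)
		return -ENXIO;	/* error code chosen for this model */
	fn(arg);
	return 0;
}

/* Pretend to read the number of supported events on the target CPU. */
static void read_num_events(void *arg)
{
	struct pmu_model *pmu = arg;

	pmu->num_events = 6;	/* arbitrary value for the model */
}

/* Models a probe function that relies on the supported_cpus mask. */
static int probe_model(struct pmu_model *pmu)
{
	assert(pmu != NULL);
	return call_on_any(pmu->supported_cpus, read_num_events, pmu);
}

/* Old order: the probe runs while the mask is still empty. */
static int probe_then_set_mask(struct pmu_model *pmu)
{
	int ret = probe_model(pmu);

	pmu->supported_cpus = ~UINT64_C(0);
	return ret;
}

/* New order: the mask is populated before the probe uses it. */
static int set_mask_then_probe(struct pmu_model *pmu)
{
	pmu->supported_cpus = ~UINT64_C(0);
	return probe_model(pmu);
}
```

In the model, the old order fails and never reads the event count, while the new order succeeds; the only change between the two is the position of the mask assignment, as in the patch.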



Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

2016-04-08 Thread Luis R. Rodriguez
On Fri, Apr 08, 2016 at 03:16:14PM +0100, George Dunlap wrote:
> On 07/04/16 19:51, Luis R. Rodriguez wrote:
> > While Andrew's position is right in that perhaps only Xen tools have to deal
> > with the HVMLite specific entry, it would also still mean diverging from
> > ARM's own EFI entry only position. To clarify: ARM has no custom Xen entry,
> > and we should strive to match that. Anything far from that to me really
> > deserves an explanation, especially if we are going to argue that HVMLite is
> > the best that x86 Xen can do.
> > 
> > Ultimately unifying entry approaches for Xen in a streamlined fashion seems
> > like a sensible thing to strive for. Anything we push in the other
> > direction, as small as it can be, should deserve at least a 'hey, wait a
> > minute'...
> 
> Quick factual correction here.
> 
> "Since ARM guests only use the EFI entry point, x86 guests should also
> only use the EFI entry point" is certainly a reasonable argument to make.
> 
> However, dom0 on ARM does not use the EFI entry point.  When starting
> dom0, Xen uses the native entry point (the one that UBoot uses) and
> hands dom0 a device-tree node.  The reason this is possible on ARM is
> that there are no assumptions made about what hardware is or is not
> present on the system -- everything that needs to be communicated about
> what is or is not present can be passed in DT.
> 
> So it is incorrect to say that ARM has an "EFI entry only" position.
> 
> (On ACPI systems, it does apparently generate some UEFI informational
> tables, which it passes to the dom0 kernel via DT; and the kernel
> unpacks and puts in the right place.  Normal Xen ARM guests can use EFI,
> but that's because we start OVMF in the guest context to provide the EFI
> services.  These may be where the idea that ARM guests use only the UEFI
> entry point came from.)
> 
> Obviously it would be nice if we could use the native entry point on x86
> as well, but there's decades of legacy hardware and backwards
> compatibility to deal with there.

OK thanks for the clarification -- still no custom entries for Xen!
We should strive for that, at the very least.

You do have a point about the legacy stuff. There are two options there:

  * Fold legacy support under HVMLite -- which seems to be what we
currently want to do (we should evaluate the implications and
requirements here for that); or

  * Leave legacy stuff on the old PV path; this may be something to
bring to the table if we had a proactive solution in place to
avoid further fallout from the huge architectural differences
between the entries. The work I'm doing should help with that. (We
should also evaluate the implications and requirements here as
well.)

  Luis


[PATCH 4/4] arm64: pmu: add A72 cpu type, support multiple PMU types

2016-04-08 Thread Jeremy Linton
ARM big/little machines can have PMUs with differing PMU counters.
ACPI systems should be able to support this as well. Also add support
for A72 PMU counters.

Signed-off-by: Jeremy Linton 
---
 arch/arm64/include/asm/cputype.h |   1 +
 arch/arm64/kernel/perf_event.c   |   1 +
 drivers/perf/arm_pmu.c   |  54 +++--
 drivers/perf/arm_pmu_acpi.c  | 229 +++
 4 files changed, 204 insertions(+), 81 deletions(-)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 87e1985..1e40799 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -74,6 +74,7 @@
 
 #define ARM_CPU_PART_AEM_V8		0xD0F
 #define ARM_CPU_PART_FOUNDATION		0xD00
+#define ARM_CPU_PART_CORTEX_A72		0xD08
 #define ARM_CPU_PART_CORTEX_A57		0xD07
 #define ARM_CPU_PART_CORTEX_A53		0xD03
 
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 8f12eac..1893f77 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -870,6 +870,7 @@ static const struct of_device_id armv8_pmu_of_device_ids[] = {
 static const struct pmu_probe_info armv8_pmu_probe_table[] = {
ARMV8_PMU_PART_PROBE(ARM_CPU_PART_CORTEX_A53, armv8_a53_pmu_init),
ARMV8_PMU_PART_PROBE(ARM_CPU_PART_CORTEX_A57, armv8_a57_pmu_init),
+   ARMV8_PMU_PART_PROBE(ARM_CPU_PART_CORTEX_A72, armv8_a72_pmu_init),
PMU_PROBE(0, 0, armv8_pmuv3_init), /* if all else fails... */
{ /* sentinel value */ }
 };
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 49fa845..ffca517 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -11,6 +11,7 @@
  */
 #define pr_fmt(fmt) "hw perfevents: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -24,6 +25,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -853,25 +855,51 @@ static void cpu_pmu_destroy(struct arm_pmu *cpu_pmu)
 }
 
 /*
- * CPU PMU identification and probing.
+ * CPU PMU identification and probing. Its possible to have
+ * multiple CPU types in an ARM machine. Assure that we are
+ * picking the right PMU types based on the CPU in question
  */
-static int probe_current_pmu(struct arm_pmu *pmu,
-const struct pmu_probe_info *info)
+static int probe_plat_pmu(struct arm_pmu *pmu,
+const struct pmu_probe_info *info,
+unsigned int pmuid)
 {
-   int cpu = get_cpu();
-   unsigned int cpuid = read_cpuid_id();
int ret = -ENODEV;
+   int cpu;
+   int aff_ctr = 0;
+   struct platform_device *pdev = pmu->plat_device;
+   int irq = platform_get_irq(pdev, 0);
 
-   pr_info("probing PMU on CPU %d\n", cpu);
+   if (irq >= 0 && !irq_is_percpu(irq)) {
+   pmu->irq_affinity = kcalloc(pdev->num_resources, sizeof(int),
+   GFP_KERNEL);
+   if (!pmu->irq_affinity)
+   return -ENOMEM;
+   }
 
+   for_each_possible_cpu(cpu) {
+   struct cpuinfo_arm64 *cinfo = per_cpu_ptr(&cpu_data, cpu);
+   unsigned int cpuid = cinfo->reg_midr;
+
+   if (cpuid == pmuid) {
+   cpumask_set_cpu(cpu, &pmu->supported_cpus);
+   pr_devel("enable pmu on cpu %d\n", cpu);
+   if (pmu->irq_affinity) {
+   pmu->irq_affinity[aff_ctr] = cpu;
+   aff_ctr++;
+   }
+   }
+   }
+
+   pr_debug("probing PMU %X\n", pmuid);
+   /* find the type of PMU given the CPU */
for (; info->init != NULL; info++) {
-   if ((cpuid & info->mask) != info->cpuid)
+   if ((pmuid & info->mask) != info->cpuid)
continue;
+   pr_devel("Found PMU\n");
ret = info->init(pmu);
break;
}
 
-   put_cpu();
return ret;
 }
 
@@ -997,8 +1025,14 @@ int arm_pmu_device_probe(struct platform_device *pdev,
if (!ret)
ret = init_fn(pmu);
} else {
-   cpumask_setall(&pmu->supported_cpus);
-   ret = probe_current_pmu(pmu, probe_table);
+   if (acpi_disabled) {
+   /* use the boot cpu. */
+   struct cpuinfo_arm64 *cinfo = per_cpu_ptr(&cpu_data, 0);
+   unsigned int cpuid = cinfo->reg_midr;
+
+   ret = probe_plat_pmu(pmu, probe_table, cpuid);
+   } else
+   ret = probe_plat_pmu(pmu, probe_table, pdev->id);
}
 
if (ret) {
diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
index 722f4ca..793092c 100644
--- a/drivers/perf/arm_pmu_acpi.c
+++ b/drivers/perf/arm_pmu_acpi.c
@@ -2,6 +2,7 @@
  * PMU supp

[PATCH 2/4] arm64: pmu: add fallback probe table

2016-04-08 Thread Jeremy Linton
From: Mark Salter 

In preparation for ACPI support, add a pmu_probe_info table to
the arm_pmu_device_probe() call. This table gets used when
probing in the absence of a devicetree node for PMU.

Signed-off-by: Mark Salter 
Signed-off-by: Jeremy Linton 
---
 arch/arm64/kernel/perf_event.c | 10 +-
 include/linux/perf/arm_pmu.h   |  3 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index f419a7c..8f12eac 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -867,9 +867,17 @@ static const struct of_device_id armv8_pmu_of_device_ids[] = {
{},
 };
 
+static const struct pmu_probe_info armv8_pmu_probe_table[] = {
+   ARMV8_PMU_PART_PROBE(ARM_CPU_PART_CORTEX_A53, armv8_a53_pmu_init),
+   ARMV8_PMU_PART_PROBE(ARM_CPU_PART_CORTEX_A57, armv8_a57_pmu_init),
+   PMU_PROBE(0, 0, armv8_pmuv3_init), /* if all else fails... */
+   { /* sentinel value */ }
+};
+
 static int armv8_pmu_device_probe(struct platform_device *pdev)
 {
-   return arm_pmu_device_probe(pdev, armv8_pmu_of_device_ids, NULL);
+   return arm_pmu_device_probe(pdev, armv8_pmu_of_device_ids,
+   armv8_pmu_probe_table);
 }
 
 static struct platform_driver armv8_pmu_driver = {
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 4196c90..495332f 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -145,6 +145,9 @@ struct pmu_probe_info {
 #define XSCALE_PMU_PROBE(_version, _fn) \
PMU_PROBE(ARM_CPU_IMP_INTEL << 24 | _version, ARM_PMU_XSCALE_MASK, _fn)
 
+#define ARMV8_PMU_PART_PROBE(_part, _fn) \
+   PMU_PROBE((_part) << MIDR_PARTNUM_SHIFT, MIDR_PARTNUM_MASK, _fn)
+
 int arm_pmu_device_probe(struct platform_device *pdev,
 const struct of_device_id *of_table,
 const struct pmu_probe_info *probe_table);
-- 
2.4.3
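
How a table built with ARMV8_PMU_PART_PROBE() selects an entry can be sketched in user space. The shift and mask reflect the MIDR layout (part number in bits [15:4]); `struct probe_entry`, the `PMU_*` identifiers and `probe_pmu()` are hypothetical stand-ins for `struct pmu_probe_info`, the init callbacks and the matching loop, so this is an illustration and not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_PARTNUM_SHIFT	4
#define MIDR_PARTNUM_MASK	(0xfffu << MIDR_PARTNUM_SHIFT)

/* Hypothetical identifiers standing in for the init callbacks. */
enum pmu_kind { PMU_NONE = 0, PMU_A53, PMU_A57, PMU_V3 };

/* Hypothetical model of struct pmu_probe_info. */
struct probe_entry {
	uint32_t cpuid;		/* expected value after masking */
	uint32_t mask;		/* which MIDR bits to compare */
	enum pmu_kind init;	/* PMU_NONE marks the sentinel */
};

#define PART_PROBE(_part, _init) \
	{ (uint32_t)(_part) << MIDR_PARTNUM_SHIFT, MIDR_PARTNUM_MASK, _init }

static const struct probe_entry probe_table[] = {
	PART_PROBE(0xD03, PMU_A53),
	PART_PROBE(0xD07, PMU_A57),
	{ 0, 0, PMU_V3 },	/* mask 0 matches any MIDR: the fallback */
	{ 0, 0, PMU_NONE },	/* sentinel */
};

/* Return the first entry whose masked MIDR matches, as the probe loop does. */
static enum pmu_kind probe_pmu(uint32_t midr)
{
	const struct probe_entry *e;

	for (e = probe_table; e->init != PMU_NONE; e++) {
		if ((midr & e->mask) == e->cpuid)
			return e->init;
	}
	return PMU_NONE;
}
```

Because the fallback entry uses a zero mask and a zero cpuid, it matches every MIDR, so it has to come after the part-specific entries for those to be reachable.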



[PATCH 0/4 v3] arm64/perf: Add ACPI support

2016-04-08 Thread Jeremy Linton
Enable ARM performance monitoring units on ACPI/arm64 machines.

This patch expands and reworks the patches published by Mark Salter
in order to clean up a few of the previous review comments, as well as
add support for A72's and big/little configurations.

I've been testing this patch in concert with an assortment of ACPI
patches to enable things like PCIe. It's been tested on juno, seattle
and some xgene systems.

Thanks,

Jeremy Linton (1):
  arm64: pmu: add A72 cpu type, support multiple PMU types

Mark Salter (3):
  arm: pmu: Fix non-devicetree probing
  arm64: pmu: add fallback probe table
  arm64: pmu: Add support for probing with ACPI

 arch/arm64/include/asm/cputype.h |   1 +
 arch/arm64/kernel/perf_event.c   |  11 +-
 arch/arm64/kernel/smp.c  |   5 +
 drivers/perf/Kconfig |   4 +
 drivers/perf/Makefile|   1 +
 drivers/perf/arm_pmu.c   |  54 --
 drivers/perf/arm_pmu_acpi.c  | 212 +++
 include/linux/perf/arm_pmu.h |  10 ++
 8 files changed, 287 insertions(+), 11 deletions(-)
 create mode 100644 drivers/perf/arm_pmu_acpi.c

-- 
2.4.3



[PATCH 3/4] arm64: pmu: Add support for probing with ACPI

2016-04-08 Thread Jeremy Linton
From: Mark Salter 

In the case of ACPI, the PMU IRQ information is contained in the
MADT table. Also, since the PMU does not exist as a device in the
ACPI DSDT table, it is necessary to create a platform device so
that the appropriate driver probing is triggered.

Signed-off-by: Mark Salter 
Signed-off-by: Jeremy Linton 
---
 arch/arm64/kernel/smp.c  |   5 ++
 drivers/perf/Kconfig |   4 ++
 drivers/perf/Makefile|   1 +
 drivers/perf/arm_pmu_acpi.c  | 125 +++
 include/linux/perf/arm_pmu.h |   7 +++
 5 files changed, 142 insertions(+)
 create mode 100644 drivers/perf/arm_pmu_acpi.c

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b2d5f4e..c6f2c53 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -502,6 +503,7 @@ acpi_map_gic_cpu_interface(struct 
acpi_madt_generic_interrupt *processor)
return;
}
bootcpu_valid = true;
+   arm_pmu_parse_acpi(0, processor);
return;
}
 
@@ -522,6 +524,9 @@ acpi_map_gic_cpu_interface(struct 
acpi_madt_generic_interrupt *processor)
 */
acpi_set_mailbox_entry(cpu_count, processor);
 
+   /* get PMU irq info */
+   arm_pmu_parse_acpi(cpu_count, processor);
+
cpu_count++;
 }
 
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 04e2653..818fa3b 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -12,4 +12,8 @@ config ARM_PMU
  Say y if you want to use CPU performance monitors on ARM-based
  systems.
 
+config ARM_PMU_ACPI
+   def_bool y
+   depends on ARM_PMU && ACPI
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index acd2397..fd8090d 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o
+obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c
new file mode 100644
index 000..722f4ca
--- /dev/null
+++ b/drivers/perf/arm_pmu_acpi.c
@@ -0,0 +1,125 @@
+/*
+ * PMU support
+ *
+ * Copyright (C) 2015 Red Hat Inc.
+ * Author: Mark Salter 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PMU_PDEV_NAME "armv8-pmu"
+
+struct pmu_irq {
+   int gsi;
+   int trigger;
+};
+
+static struct pmu_irq pmu_irqs[NR_CPUS] __initdata;
+
+void __init arm_pmu_parse_acpi(int cpu, struct acpi_madt_generic_interrupt 
*gic)
+{
+   pmu_irqs[cpu].gsi = gic->performance_interrupt;
+   if (gic->flags & ACPI_MADT_PERFORMANCE_IRQ_MODE)
+   pmu_irqs[cpu].trigger = ACPI_EDGE_SENSITIVE;
+   else
+   pmu_irqs[cpu].trigger = ACPI_LEVEL_SENSITIVE;
+}
+
+#ifndef CONFIG_SMP
+/*
+ * In !SMP case, we parse for boot CPU IRQ here.
+ */
+static int __init acpi_parse_pmu_irqs(struct acpi_subtable_header *header,
+ const unsigned long end)
+{
+   struct acpi_madt_generic_interrupt *gic;
+
+   gic = (struct acpi_madt_generic_interrupt *)header;
+
+   if (cpu_logical_map(0) == (gic->arm_mpidr & MPIDR_HWID_BITMASK))
+   arm_pmu_parse_acpi(0, gic);
+
+   return 0;
+}
+
+static void __init acpi_parse_boot_cpu(void)
+{
+   acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
+ acpi_parse_pmu_irqs, 0);
+}
+#else
+#define acpi_parse_boot_cpu() do {} while (0)
+#endif
+
+static int __init pmu_acpi_init(void)
+{
+   struct platform_device *pdev;
+   struct pmu_irq *pirq = pmu_irqs;
+   struct resource *res, *r;
+   int err = -ENOMEM;
+   int i, count, irq;
+
+   if (acpi_disabled)
+   return 0;
+
+   acpi_parse_boot_cpu();
+
+   /* Must have irq for boot cpu, at least */
+   if (pirq->gsi == 0)
+   return -EINVAL;
+
+   irq = acpi_register_gsi(NULL, pirq->gsi, pirq->trigger,
+   ACPI_ACTIVE_HIGH);
+
+   if (irq_is_percpu(irq))
+   count = 1;
+   else
+   for (i = 1, count = 1; i < NR_CPUS; i++)
+   if (pmu_irqs[i].gsi)
+   ++count;
+
+   pdev = platform_device_alloc(PMU_PDEV_NAME, -1);
+   if (!pdev)
+   goto err_free_gsi;
+
+   res = kcalloc(count, sizeof(*res), GFP_KERNEL);
+   if (!res)
+   goto err_free_device;
+
+   for (i = 0, r = res; i < count; i++, pirq++, r++) {
+   if (i)
+   irq = acpi_register_gsi(NULL, pirq->gsi, pirq->trigger,
+   ACPI_ACTIVE_HIGH);
+   r->start = r->end = irq;
+   r->flags = IORESOURCE_IRQ;
+
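
The IRQ bookkeeping in the patch above (record a GSI and trigger mode per CPU, then decide how many IRQ resources the platform device needs) can be sketched in user space roughly as follows. This is an illustration only, not the patch's code: the struct layouts, SKETCH_NR_CPUS, the flag value used for SKETCH_IRQ_MODE_EDGE and all sketch_ names are assumptions, and the per-CPU versus per-interrupt decision is passed in as a plain boolean instead of being queried from the IRQ core.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8
/* Assumed stand-in for ACPI_MADT_PERFORMANCE_IRQ_MODE. */
#define SKETCH_IRQ_MODE_EDGE (1u << 1)

enum sketch_trigger { SKETCH_LEVEL, SKETCH_EDGE };

/* Only the two MADT GICC fields the sketch needs. */
struct sketch_gicc {
	uint32_t flags;
	uint32_t performance_interrupt;
};

struct sketch_pmu_irq {
	int gsi;
	enum sketch_trigger trigger;
};

/* Record the PMU interrupt described by one GICC entry. */
static void sketch_parse(struct sketch_pmu_irq *irqs, int cpu,
			 const struct sketch_gicc *gic)
{
	irqs[cpu].gsi = (int)gic->performance_interrupt;
	irqs[cpu].trigger = (gic->flags & SKETCH_IRQ_MODE_EDGE) ?
			    SKETCH_EDGE : SKETCH_LEVEL;
}

/*
 * Number of IRQ resources to allocate: -1 if the boot CPU has no PMU
 * interrupt, 1 for a per-CPU interrupt, otherwise one per CPU that
 * reported a non-zero GSI.
 */
static int sketch_count_resources(const struct sketch_pmu_irq *irqs,
				  bool percpu)
{
	int i, count = 1;

	if (irqs[0].gsi == 0)
		return -1;
	if (percpu)
		return 1;
	for (i = 1; i < SKETCH_NR_CPUS; i++)
		if (irqs[i].gsi)
			++count;
	return count;
}
```

One thing the sketch makes visible: counting non-zero entries and then walking the first `count` slots in order only lines up if the populated entries are contiguous from index 0.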

Re: [PATCH 01/13] devpts: Teach /dev/ptmx to find the associated devpts via path lookup

2016-04-08 Thread Andy Lutomirski
On Fri, Apr 8, 2016 at 2:29 PM, Eric W. Biederman  wrote:
> Andy Lutomirski  writes:
>
>> On Apr 8, 2016 12:05 PM, "Linus Torvalds"  
>> wrote:
>>>
>>> On Fri, Apr 8, 2016 at 11:51 AM, Eric W. Biederman
>>>  wrote:
>>> >
>>> > Given that concern under the rule we don't break userspace we have to
>>> > check the permissions of /dev/pts/ptmx when we are creating a new pty,
>>> > on an instance of devpts that was created with newinstance.
>>>
>>> The rule is that we don't break existing installations.
>>>
>>> If somebody has root and installs a "ptmx" node in an existing mount
>>> space next to a pts subdirectory, that's not a security issue, nor is
>>> it going to break any existing installation.
>>
>> What Eric's saying is that you don't have to be root for this.
>>
>> But Eric, I think there might be a better mitigation.  For a ptmx
>> chardev, just fail the open if the chardev's vfsmount or the devpts's
>> vfsmount doesn't belong to the same userns as the devpts's superblock.
>> After all, setting this attack up requires the caps on one of the
>> vfsmounts, and if you have those caps you could attack your own devpts
>> instance quite easily.  Would that work?
>
> I don't think so.  For one it depends on getting s_user_ns which should
> happen but is not there yet.  For another the way you describe
> it you would break the case of
>
> unshare(CLONE_NEWUSER);
> unshare(CLONE_NEWNS);
> open("/dev/ptmx");
>
> Which is actually more likely to break userspace than anything else we
> have considered.  I know people actually do that.
>

Hmm, you're right.  Never mind.

--Andy


Re: [PATCH 01/13] devpts: Teach /dev/ptmx to find the associated devpts via path lookup

2016-04-08 Thread Linus Torvalds
On Fri, Apr 8, 2016 at 2:29 PM, Eric W. Biederman  wrote:
>
> I don't think so.  For one it depends on getting s_user_ns which should
> happen but is not there yet.  For another the way you describe
> it you would break the case of
>
> unshare(CLONE_NEWUSER);
> unshare(CLONE_NEWNS);
> open("/dev/ptmx");
>
> Which is actually more likely to break userspace than anything else we
> have considered.  I know people actually do that.

.. but you could just check that the ptmx node is actually the same
superblock that the pts directory is mounted on. If it's a bind mount,
that wouldn't be true.

But more fundamentally I still don't actually understand why you even
really care.

We get the wrong pts case *today*. We'd get a different wrong pts
namespace when somebody tries to do odd things. Why would we care? It
would be a _better_ guess.

I don't see the security issue. If you do tricks to get pty's in
another group, what's the problem? You have to do it consciously, and
I don't see what the downside is. You get what you ask for, and I
don't see a new attack surface.

The whole "somebody used chmod on /dev/pts/" argument sounds bogus.
That's an insane thing to do. If you want a private namespace, you
make *all* of /dev private, you don't go "oh, I'll just make the pts
subdirectory private".

In other words, your whole scenario sounds totally made up to begin
with. And even if it happens, I don't see what would be so disastrous
about it.

I mean, right now, /dev/ptmx is world read-write in the root container
and everybody gets access to the same underlying set of ptys. And
that's not some horrible security issue. It's how things are
*supposed* to work.

So I really don't see the argument. You guys are just making shit up.

Linus
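
As a rough user-space analogy for the "same superblock" check suggested above, one can compare the st_dev of two paths. This is only a sketch under assumptions: the kernel would compare superblocks directly, st_dev is merely the closest thing visible from user space, and sketch_same_filesystem() is a hypothetical helper, not an existing API.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths resolve to objects on the same filesystem
 * (equal st_dev), 0 if they do not, and -1 if either stat() fails.
 */
static int sketch_same_filesystem(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev;
}
```

For instance, sketch_same_filesystem("/dev/ptmx", "/dev") asks whether the ptmx node sits on the filesystem that holds the directory where pts is mounted; a ptmx node bind-mounted in from a different filesystem would make it return 0.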


Re: [PATCH] cpufreq: Skip all governor-related actions for cpufreq_suspended set

2016-04-08 Thread Rafael J. Wysocki
On Friday, April 08, 2016 11:14:14 AM Viresh Kumar wrote:
> On 08-04-16, 00:05, Rafael J. Wysocki wrote:
> > On Thursday, April 07, 2016 05:35:03 PM Viresh Kumar wrote:
> 
> > > That's *ugly* and it works by chance, unless I am misreading it
> > > completely.
> > 
> > I'm assuming that what you mean by "ugly" here is "not really 
> > straightforward",
> > which I agree with,
> 
> Yeah.
> 
> > but then it is really disappointing to see comments like
> > that from you about the code that you helped to write.
> 
> I was just trying to say that this isn't how I feel it should be done.
> :(

Fair enough.

> > Moreover, runtime CPU offline *also* doesn't have to run the governor 
> > exit/init
> > for the same reason why the policy directory doesn't have to be removed on
> > CPU offline: it is just pointless to do that.  The governor has been stopped
> > already and it won't do anything more.  The only problem here is to prevent
> > governor tunable sysfs attributes from triggering actions in that state,
> > but that shouldn't be too difficult to arrange for.  If that's done,
> 
> Isn't that already guaranteed as userspace should have been frozen
> by the time we reach cpufreq_suspend()?

For the "offline/online during suspend/resume" case it is guaranteed, but for
the "runtime offline/online" case it isn't.

I essentially would like those two cases to be as similar as reasonably
possible, if not identical.

Thanks,
Rafael



Re: [Xen-devel] HVMLite / PVHv2 - using x86 EFI boot entry

2016-04-08 Thread Luis R. Rodriguez
On Wed, Apr 06, 2016 at 12:23:47PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 06, 2016 at 12:05:16PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 06, 2016 at 04:02:40PM +0100, Matt Fleming wrote:
> > > On Wed, 06 Apr, at 12:07:36PM, George Dunlap wrote:
> > > > 
> > > > So rather than make a new entry point which does just the minimal
> > > > amount of work to run on a software interface (Xen), you want to take
> > > > an interface designed for hardware (EFI) and put in hacks so that it
> > > > knows that sometimes some EFI services are not available?  That sounds
> > > > like it's going to make the EFI path just as unmanageable as the
> > > > current PV path.
> > >  
> > > Requiring code in the new entry point to manipulate control registers
> > > and do the switch to long-mode does not seem like a minimal amount of
> > > code to me,
> > > 
> > >   
> > > http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg00134.html
> > > 
> > > What's likely to happen in the future is that startup_(32|64) will be
> > > entered with different settings depending on whether coming from
> > > HVMlite or bare metal, due to the natural tendency for these kinds of
> > > code paths to diverge.
> > 
> > I hope they do not have the same churn as the rest of Linux code.
> > 
> > The startup_(32|64) are to be called from divergent
> > bootloaders - and they are responsible to set the stage. Or in other
> > words - startup_(32|64) has some expectations of what the world
> > will look like. Changing those means the bootloaders stub have to change
> > too.
> > 
> > But if there is churn it surely is less than what the PV code paths
> > are enforcing now in x86 code.

It's better for sure. But we can also look at other options to make it
even better. It's worth some review at the very least.

> Let me expand on that since I was not sure if I was clear.
> 
> Currently Boris tirelessly ends up fixing on almost every merge window
> Xen related fallout. That is new functionality that breaks Xen.
> He has been doing this for years and before him I was doing it.

FWIW the work I'm doing with linker tables and x86's use of this on
the boot side of things should help avoid these issues proactively.
Sounds too good to be true? I know. I thought it was rather impossible,
but it's what I've come up with and I think it should really help with
that.

This should help either avoid these issues moving forward proactively
to let us keep the old PV path for legacy junk if we want that, or if
we really want to remove the PV path completely and replace it with
HVMLite it should help us avoid issues proactively until we are
ready to nuke the old PV path completely.

So let's say we plan to remove old PV path in 5 years, with the work I'm
doing on the old PV path it means we'll have in place a proactive
framework to avoid Xen fallout *now*, while we churn away towards the
HVMLite lofty goals.

> This is what a maintainer does - and with the HVMLite/PVH stub
> paths that will still continue - that is fallout from the
> startup_(32|64) code changes will be handled as before.

Right, that's because we did not have a proactive solution to
the problem.

> However the bigger goals are that:
>  - This churn will be much much lower than the existing one,
> 
>  - baremetal won't have to deal with some rather odd semantics
>placed by the pvops paths that are funky and drive x86
>maintainers to lose hair (amongst other things).

Right on. We are all in strong agreement that the old PV path is a
grand piece of fecal matter.

  Luis


Re: [PATCH] cpufreq: Skip all governor-related actions for cpufreq_suspended set

2016-04-08 Thread Rafael J. Wysocki
On Friday, April 08, 2016 11:15:13 AM Viresh Kumar wrote:
> On 07-04-16, 03:29, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki 
> > 
> > Since governor operations are generally skipped if cpufreq_suspended
> > is set, do nothing at all in cpufreq_start_governor() and
> > cpufreq_exit_governor() in that case.
> > 
> > In particular, this prevents fast frequency switching from being
> > disabled after a suspend-to-RAM cycle on all CPUs except for the
> > boot one.
> > 
> > Fixes: b7898fda5bc7 (cpufreq: Support for fast frequency switching)
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> >  drivers/cpufreq/cpufreq.c |6 ++
> >  1 file changed, 6 insertions(+)
> 
> Acked-by: Viresh Kumar 

Thanks!

However, since I'm going to apply https://patchwork.kernel.org/patch/8777561/
to pm-cpufreq-sched, I will only apply the first hunk of the $subject one,
ie. the patch below.  I assume that the ACK still applies. :-)

I'll take it for v4.6, because it fixes up a commit already in there.

---
From: Rafael J. Wysocki 
Subject: [PATCH] cpufreq: Skip all governor-related actions for 
cpufreq_suspended set

Since governor operations are generally skipped if cpufreq_suspended
is set, do nothing at all in cpufreq_start_governor() in that case.

That function is called in the cpufreq_online() path, and may also
be called from cpufreq_offline() in some cases, which are invoked
by the nonboot CPUs disabling/enabling code during system suspend
to RAM and resume.  That happens when all devices have been
suspended, so if the cpufreq driver relies on things like I2C to
get the current frequency, it may not be ready to do that then.

The change here prevents problems from happening for this reason.

Fixes: 3bbf8fe3ae08 (cpufreq: Always update current frequency before startig 
governor)
Signed-off-by: Rafael J. Wysocki 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-pm/drivers/cpufreq/cpufreq.c
===
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -2053,6 +2053,9 @@ static int cpufreq_start_governor(struct
 {
int ret;
 
+   if (cpufreq_suspended)
+   return 0;
+
if (cpufreq_driver->get && !cpufreq_driver->setpolicy)
cpufreq_update_current_freq(policy);
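
The effect of the early return in the hunk above can be sketched outside the kernel as below. This is a toy model, not cpufreq code: struct sketch_policy, the sketch_suspended flag, the fixed value returned by sketch_read_hw_freq() and the start counter are all made up for illustration.

```c
#include <stdbool.h>

/* Toy stand-in for a cpufreq policy. */
struct sketch_policy {
	unsigned int cur_khz;
	int governor_starts;
};

/* Models the global "cpufreq is suspended" flag. */
static bool sketch_suspended;

/* Pretend hardware read; may be unusable while devices are suspended. */
static unsigned int sketch_read_hw_freq(void)
{
	return 1200000;
}

static int sketch_start_governor(struct sketch_policy *p)
{
	/* While suspended, touch neither the hardware nor the governor. */
	if (sketch_suspended)
		return 0;

	p->cur_khz = sketch_read_hw_freq();
	p->governor_starts++;
	return 0;
}
```

Returning 0 instead of an error keeps the online path for the nonboot CPUs going during resume; the frequency read simply does not happen until the flag is cleared.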
 



Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-04-08 Thread Timur Tabi

Vikram Sethi wrote:


On the FSM9900 SOC (which uses device-tree), the two pins that connect to the external 
PHY are gpio pins.  However, the driver needs to reprogram the pinmux so that those pins 
are wired to the Emac controller.  That's what the gpio code in this driver is doing: 
it's just configuring the pins so that they connect directly between the Emac and the 
external PHY.  After that, they are no longer GPIO pins, and you cannot use the 
"GPIO controlled MDIO bus".  There is no MDIO controller on the SOC.  The 
external PHY is controlled directly from the Emac and also from the internal PHY.  It is 
screwy, I know, but that's what Gilad was trying to explain.

It is incorrect to say there's no MDIO controller on the SoC. The EMAC Core on 
the SoC itself has a MDIO controller which talks to the external PHY. The 
internal SGMII is not on MDIO however.
Please see the EMAC specification.


Sorry, I should have said that there is no *independent* MDIO controller 
(one that has its own driver).  As you said, you can only talk to the 
external PHY through the Emac.


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

